Building your first AI agent: what to decide before writing a line of code

When a client comes to us saying they want to "build an agent," we almost never open a code editor first. We open a blank document and start asking questions. Not to slow things down — to avoid building the wrong thing at speed.

There are five decisions that, if made poorly at the start, will cause problems at every stage after. They're not glamorous decisions. They don't involve choosing a model or picking an orchestration framework. They're operational decisions, and they matter more.

Decision 1: What is the trigger?

An agent needs to know when to start. This sounds obvious. It's frequently left vague.

Triggers fall into three categories: scheduled (run every night at 11pm), event-driven (run when a new row appears in this spreadsheet), and on-demand (run when a human clicks this button). Each has a different failure mode. Scheduled agents fail silently: no one knows they stopped running until damage surfaces. Event-driven agents can loop if the event fires incorrectly, for example when the agent's own write to the spreadsheet counts as a new row. On-demand agents become a crutch when the human who has to start them turns into the bottleneck.

Pick one trigger type per agent. If a process needs more than one trigger, that's a sign you're describing more than one process.
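
Once the answer is written down in plain language, it can be mirrored in the build. The sketch below is one possible shape, not a prescribed one: the names TriggerType, AgentSpec and split_by_trigger are hypothetical. It stores a single trigger per agent and turns a process that claims several triggers into several agent specs.

```python
from dataclasses import dataclass
from enum import Enum


class TriggerType(Enum):
    SCHEDULED = "scheduled"        # e.g. every night at 11pm
    EVENT_DRIVEN = "event_driven"  # e.g. a new row appears in a spreadsheet
    ON_DEMAND = "on_demand"        # e.g. a human clicks a button


# What to watch for with each trigger type (paraphrased, not exhaustive).
WATCH_FOR = {
    TriggerType.SCHEDULED: "silent failure: nobody notices the runs stopped",
    TriggerType.EVENT_DRIVEN: "loops when the event fires incorrectly",
    TriggerType.ON_DEMAND: "the human who starts it becomes the bottleneck",
}


@dataclass(frozen=True)
class AgentSpec:
    name: str
    trigger: TriggerType  # a single value, not a list: one trigger per agent
    trigger_detail: str

    def watch_for(self) -> str:
        return WATCH_FOR[self.trigger]


def split_by_trigger(process_name, triggers):
    """Turn a process with several (type, detail) triggers into one spec each."""
    return [
        AgentSpec(f"{process_name}-{kind.value}", kind, detail)
        for kind, detail in triggers
    ]
```

Making trigger a single field rather than a list is the point of the design: a second trigger cannot be added without creating a second, separately named agent.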

Decision 2: What data does it read, and from where?

List every data source the agent needs access to. Be specific: not "CRM data" but "the Deals table in HubSpot, filtered to status = Qualified, updated in the last 48 hours." The more precisely you describe the data inputs, the clearer the agent's actual scope becomes.

This step usually reveals two things. First, that some of the data the agent theoretically needs doesn't exist yet, or isn't structured consistently. Second, that some data sources require access permissions that haven't been discussed with IT or management. Both of these are better discovered on day one than on day thirty.

Write the data sources as a list. If the list has more than five items, the scope is probably too wide for a first agent.
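
The list can also be kept in a form that is easy to check. The following is a minimal sketch under our own assumptions: DataSource, review_sources and the field names are hypothetical, and the two boolean flags are filled in by a person, not detected automatically.

```python
from dataclasses import dataclass

MAX_SOURCES_FIRST_AGENT = 5  # rule of thumb from the text, not a hard limit


@dataclass(frozen=True)
class DataSource:
    system: str           # e.g. "HubSpot"
    table: str            # e.g. "Deals"
    filter_desc: str      # e.g. "status = Qualified, updated in the last 48 hours"
    exists: bool          # is the data there and consistently structured?
    access_granted: bool  # has access been agreed with IT or management?


def review_sources(sources):
    """Return plain-language issues to resolve before any building starts."""
    issues = []
    if len(sources) > MAX_SOURCES_FIRST_AGENT:
        issues.append(
            f"{len(sources)} sources listed: scope is probably too wide for a first agent"
        )
    for src in sources:
        label = f"{src.system}/{src.table}"
        if not src.exists:
            issues.append(f"{label}: data missing or inconsistently structured")
        if not src.access_granted:
            issues.append(f"{label}: access not yet approved")
    return issues
```

An empty result does not prove the scope is right. It only shows that the two day-one surprises described above have been checked for.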

Decision 3: What actions is it allowed to take?

This is the decision most teams skip, and it's the most consequential one.

Actions fall on a spectrum from read-only (the agent looks at things and generates a report) to fully autonomous (the agent moves money, sends emails to customers, closes tickets without review). The right position on that spectrum depends on how reversible the actions are and how much trust the team has built with the system.

For a first agent in any organisation, we recommend starting closer to the read-only end — even if the eventual goal is full automation. Let the agent run for two weeks generating output it hands to a human. If the human consistently agrees with what the agent proposes, that's evidence to extend autonomy. If the human frequently overrides it, that's information about what the agent's logic is missing.

Document the allowed actions explicitly. "The agent may send internal Slack notifications. It may not send external emails. It may update the Status field in the project tracker. It may not delete records." This document becomes the reference point for every subsequent decision.
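
When the build starts, that document can be enforced as an allowlist. This is a sketch under stated assumptions: the action identifiers, the execute function and the handlers mapping are hypothetical names, and the allowed set simply copies the example policy above.

```python
# Allowed actions copied from the example policy; identifiers are hypothetical.
ALLOWED_ACTIONS = frozenset({
    "send_internal_slack_notification",
    "update_tracker_status",
})


class ActionNotAllowed(Exception):
    """Raised when the agent attempts something outside the agreed document."""


def execute(action, payload, handlers, propose_only=True):
    """Run an action only if it is on the allowlist.

    Anything not listed is denied by default, so "send_external_email" or
    "delete_record" fail without needing an explicit deny rule.
    With propose_only=True the agent only describes what it would do,
    which suits the initial period where a human reviews its output.
    """
    if action not in ALLOWED_ACTIONS:
        raise ActionNotAllowed(f"'{action}' is not in the allowed-actions document")
    if propose_only:
        return {"status": "proposed", "action": action, "payload": payload}
    result = handlers[action](payload)
    return {"status": "done", "action": action, "result": result}
```

Defaulting propose_only to True means extending autonomy is a deliberate change, made only after the review period has produced evidence for it.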

Decision 4: What does a successful run look like?

Define success before you build. Not in a vague way ("it works") but in a measurable way that you can check against actual output.

For a data extraction agent: success means extracting all records matching criterion X, with fewer than Y% errors, and producing output in format Z. For a classification agent: success means categorising items into one of four buckets with agreement rate above N% compared to how a human would categorise them. For a notification agent: success means sending the correct message to the correct person within the correct time window, with no duplicate sends.

If you can't define success in measurable terms before building, you won't be able to evaluate the agent objectively after building. You'll be left with subjective impressions ("it seems to be working") that are not useful when something breaks at scale.
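
Two of the criteria above are simple enough to check mechanically. The sketch below is illustrative only: the function names and the message-tuple format are our own assumptions, and the threshold N is something the team has to choose.

```python
from collections import Counter


def agreement_rate(agent_labels, human_labels):
    """Share of items where the agent chose the same bucket as the human."""
    if len(agent_labels) != len(human_labels):
        raise ValueError("label lists must be the same length")
    if not agent_labels:
        raise ValueError("cannot measure agreement on an empty sample")
    matches = sum(a == h for a, h in zip(agent_labels, human_labels))
    return matches / len(agent_labels)


def meets_target(agent_labels, human_labels, threshold):
    """True when agreement is strictly above the agreed threshold (0.0 to 1.0)."""
    return agreement_rate(agent_labels, human_labels) > threshold


def duplicate_sends(sent):
    """Return (recipient, message_id) pairs that were sent more than once."""
    counts = Counter(sent)
    return sorted(pair for pair, n in counts.items() if n > 1)
```

For example, if the agent matches the human on three of four items, agreement_rate returns 0.75, which passes a target of 0.7 and fails a target of 0.8.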

Decision 5: What happens when something goes wrong?

This is the question almost no one asks in the design phase.

Agents encounter unexpected situations. A data source returns an empty response. An API times out. The document the agent was told to read is in a format it doesn't recognise. A record it needs to update has been deleted by someone else in the meantime.

For each of these scenarios, the agent needs a defined behaviour. Does it retry? How many times? Does it fail gracefully and log the issue? Does it notify a human? Does it halt and wait for manual intervention?

A useful framework: categorise failure modes as recoverable (the agent can try again), escalated (a human needs to look at this), or terminal (stop the run, log everything, alert the team). Design a response for each category before you start building. This turns error handling from an afterthought into a first-class part of the system.
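
The three categories map naturally onto a small dispatcher. What follows is a minimal sketch, not a definitive implementation: FailureKind, AgentError, run_step and the notify callback are hypothetical names, and a real system would add a delay between retries and write to a proper log.

```python
from enum import Enum


class FailureKind(Enum):
    RECOVERABLE = "recoverable"  # the agent can try again
    NEEDS_HUMAN = "needs_human"  # a human needs to look at this
    TERMINAL = "terminal"        # stop the run, log everything, alert the team


class AgentError(Exception):
    def __init__(self, message, kind):
        super().__init__(message)
        self.kind = kind


def run_step(step, max_retries=3, notify=print):
    """Call step() and apply the response agreed for each failure category."""
    retries = 0
    while True:
        try:
            return "ok", step()
        except AgentError as err:
            if err.kind is FailureKind.RECOVERABLE and retries < max_retries:
                retries += 1
                continue  # try the same step again
            if err.kind is FailureKind.TERMINAL:
                notify(f"TERMINAL: {err}")
                raise  # re-raise so the whole run halts
            # Needs a human, or retries are exhausted: hand over and move on.
            notify(f"NEEDS REVIEW: {err}")
            return "escalated", None
```

Note the choice made for exhausted retries: a recoverable failure that keeps failing is handed to a human instead of being retried forever. That is one reasonable answer to "how many times?", and the design document should state whichever answer the team picks.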

What this looks like in practice

The answers to these five decisions fit on two pages. We write them in plain language, not pseudocode. The document gets reviewed by the person who owns the process being automated — not by a technical lead alone. If the process owner can't confirm the document accurately describes how things work, the agent isn't ready to be built.

Once the document exists and is agreed on, building is mostly an engineering task. The hard thinking has already happened. That's how it should be.

The teams that spend a week on this document before touching code consistently have fewer problems in production than the teams that spend a week writing code and then try to reverse-engineer what the agent is supposed to do.

One more thing: the document is a living record. When the agent's behaviour needs to change — because the process changes, because the data structure changes, because the business changes — the document changes first. The code follows the document. If you update the code without updating the document, you'll lose track of what the agent was supposed to do and what it's now doing instead. That gap is where most maintenance problems originate.