When the steps to solve a task aren't known in advance, a fixed pipeline isn't enough. This study shows how I think about agents (LLMs that reason, call tools, keep memory, and coordinate with other specialised agents) and, just as importantly, how to keep such a system bounded, observable and evaluable.
A workflow follows a path I designed. An agent decides the path itself: given a goal, it reasons about what to do, picks a tool, observes the result, and repeats until the goal is met. That autonomy is what makes agents powerful for open-ended tasks — and what makes them risky if left unbounded.
The four ingredients I look for:

- Planning: an LLM that can break a goal into steps and decide the next action (the "think" in think–act–observe).
- Tools: functions the agent can call (search, retrieval, calculators, internal APIs) to act on the world and gather facts.
- Memory: a short-term scratchpad for the current task plus longer-term memory across sessions.
- Reflection: the agent observes each tool result and adjusts, including self-critique to catch its own mistakes.
The building blocks behind the word "agentic".
- Agent: an LLM that, given a goal, decides its own next action, runs it, looks at the result, and repeats until done, rather than following a path you hard-coded.
- Tool: a typed, well-described function the agent may call to act or gather facts (search, retrieval, an internal API), for example `@tool def get_policy(id): ...`
- ReAct loop: the core rhythm of think (decide), act (call a tool), observe (read the result), repeat, or `think -> act -> observe`.
- Supervisor: breaks a goal into sub-tasks and delegates them to focused worker agents, then assembles the results.
- Critic: a verifier agent that scores or rejects a result before it is finalised; the quality gate of a multi-agent system.
- Memory: a short-term scratchpad for the current task plus longer-term memory carried across sessions.
- MCP: Model Context Protocol, a standard way to expose tools and data so any MCP-aware agent can discover and call them, instead of bespoke glue per integration.
- Stopping condition: explicit "done" criteria plus a step/token budget so the agent can't loop forever, for example `while steps < MAX:`
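The split between a short-term scratchpad and longer-term memory can be sketched without any framework. This is a minimal illustration under stated assumptions, not a production design: `AgentMemory`, `remember`, `end_task` and `recall` are hypothetical names, the stored notes are made-up sample data, and the keyword-overlap scoring merely stands in for whatever retrieval a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Hypothetical two-tier memory: a per-task scratchpad plus a long-term store."""
    scratchpad: list[str] = field(default_factory=list)  # short-term, current task only
    long_term: list[str] = field(default_factory=list)   # carried across tasks/sessions

    def remember(self, note: str) -> None:
        """Record an observation for the current task."""
        self.scratchpad.append(note)

    def end_task(self, summary: str) -> None:
        """Keep only a summary long-term, then clear the scratchpad."""
        self.long_term.append(summary)
        self.scratchpad.clear()

    def recall(self, query: str, k: int = 2) -> list[str]:
        """Return up to k long-term notes sharing the most words with the query."""
        words = set(query.lower().split())
        scored = [(len(words & set(note.lower().split())), note) for note in self.long_term]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [note for _, note in ranked[:k]]


memory = AgentMemory()
memory.remember("get_policy_status returned active")            # made-up observation
memory.end_task("policy A-1024 is active and renews in March")  # made-up summary
memory.end_task("customer prefers email contact")
print(memory.recall("when does policy A-1024 renew"))
```

Only the summary survives the task; the raw scratchpad is discarded, which keeps later prompts small.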
An agent is, at heart, a loop around an LLM that can call tools. Here it is with one tool and a budget.
```python
# A minimal ReAct agent: one tool, a hard step budget, no framework magic
# Assumed to exist elsewhere: `crm` (internal client) and `llm_decide` (model call).

def get_policy_status(policy_id):                     # 1. the only tool we expose
    return crm.lookup(policy_id)                      # calls an internal system

tools = {"get_policy_status": get_policy_status}
goal = "Is policy A-1024 active and when does it renew?"
scratch, MAX = [], 5                                  # memory + stopping condition

for step in range(MAX):                               # the agent loop
    decision = llm_decide(goal, scratch, tools)       # 2. THINK: tool call or final answer?
    if decision.is_final:
        print(decision.answer)
        break                                         # done -> stop
    result = tools[decision.tool](**decision.args)    # 3. ACT: run the chosen tool
    scratch.append((decision.tool, result))           # 4. OBSERVE: remember, then loop
else:
    # for/else: runs only if the loop finished without hitting `break`
    print("Step budget exhausted: escalating to a human.")  # guardrail
```
A supervisor (planner) decomposes the goal and delegates to specialised agents, each with its own tools; a critic verifies before anything is finalised.
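The supervisor, worker and critic roles can be sketched in plain Python. Everything here is a hypothetical stand-in: the two workers return canned strings where a real worker would wrap its own LLM and tools, `plan` is hard-coded where a real supervisor would ask a model to decompose the goal, and the critic's rule is deliberately trivial.

```python
from typing import Callable


# Hypothetical worker agents returning canned results.
def status_worker(goal: str) -> str:
    return "status: active"


def renewal_worker(goal: str) -> str:
    return "renewal: unknown"  # deliberately weak so the critic has something to reject


WORKERS: dict[str, Callable[[str], str]] = {
    "status": status_worker,
    "renewal": renewal_worker,
}


def plan(goal: str) -> list[str]:
    """Supervisor, step 1: decompose the goal (hard-coded for this sketch)."""
    return ["status", "renewal"]


def critic(result: str) -> bool:
    """Quality gate: reject results that admit they found nothing."""
    return "unknown" not in result


def supervise(goal: str) -> dict:
    """Delegate each sub-task, gate each result, then assemble."""
    accepted, rejected = {}, []
    for sub_task in plan(goal):
        result = WORKERS[sub_task](goal)
        if critic(result):
            accepted[sub_task] = result
        else:
            rejected.append(sub_task)
    return {"accepted": accepted, "needs_review": rejected}


report = supervise("Is policy A-1024 active and when does it renew?")
print(report)
```

The point of the gate is that a rejected sub-task is surfaced for review or retry instead of being folded silently into the final answer.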
An agent is only as useful as the tools it can call. Each tool needs a clear name, description and typed schema so the model knows when and how to use it. The Model Context Protocol (MCP) standardises this: tools and data sources are exposed by MCP servers and any MCP-aware agent can discover and call them — instead of bespoke glue per integration.
```python
# A tool is just a typed, well-described function the agent can choose to call
# Assumed to exist elsewhere: crm, llm, POLICY, search_documents, calculator,
# and a create_agent helper.
from langchain_core.tools import tool

@tool
def get_policy_status(policy_id: str) -> dict:
    """Return the current status and key dates for an insurance policy.
    Use when the user asks about a specific policy by its ID."""
    # The docstring above is the description that guides tool choice.
    return crm.lookup(policy_id)                      # calls an internal system

tools = [get_policy_status, search_documents, calculator]
agent = create_agent(llm, tools, system=POLICY)       # model decides which tool, when, with what args
result = agent.invoke({"goal": "Is policy A-1024 active and when does it renew?"})
```
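To make "clear name, description and typed schema" concrete without a framework, here is a sketch that validates model-proposed arguments against a schema before running the tool. The spec layout is JSON-Schema-like but illustrative only; `TOOL_SPEC`, `validate_args` and `call_tool` are hypothetical names, and the tool body is a stub returning made-up data rather than a real lookup.

```python
# Hypothetical tool spec in a JSON-Schema-like shape; field names are illustrative.
TOOL_SPEC = {
    "name": "get_policy_status",
    "description": "Return the current status and key dates for an insurance policy.",
    "parameters": {
        "type": "object",
        "properties": {"policy_id": {"type": "string"}},
        "required": ["policy_id"],
    },
}

PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_args(spec: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call is well-formed."""
    params = spec["parameters"]
    problems = [f"missing: {name}" for name in params["required"] if name not in args]
    for name, value in args.items():
        prop = params["properties"].get(name)
        if prop is None:
            problems.append(f"unexpected: {name}")
        elif not isinstance(value, PY_TYPES[prop["type"]]):
            problems.append(f"wrong type: {name}")
    return problems


def get_policy_status(policy_id: str) -> dict:
    # Stub standing in for the internal lookup; the returned data is made up.
    return {"policy_id": policy_id, "status": "active"}


def call_tool(args: dict) -> dict:
    """Validate model-proposed arguments before running the tool."""
    problems = validate_args(TOOL_SPEC, args)
    if problems:
        return {"error": problems}  # can be fed back to the model as an observation
    return get_policy_status(**args)


print(call_tool({"policy_id": "A-1024"}))
print(call_tool({"id": 1024}))
```

Returning the validation problems as an observation, rather than raising, gives the model a chance to correct its own call on the next step.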
| Decision | Why it matters |
|---|---|
| Single agent vs. multi-agent | One agent is simpler; specialised agents give cleaner prompts, focused tools and easier evaluation — at the cost of coordination overhead. |
| Autonomy vs. control | More freedom solves more tasks but is harder to predict. I bound steps, budgets and tool scope, and keep a fixed graph where the steps are actually known. |
| Tool granularity | Many narrow tools are easier for the model to use correctly than a few overloaded ones — but too many bloats the context. |
| Memory strategy | Persisting everything is costly and leaks irrelevant context; I summarise and retrieve memory on demand. |
| Stopping conditions | Without explicit success criteria and a step/token budget, agents loop. Define "done" up front. |
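The stopping-conditions row can be sketched as a small budget guard that combines a step limit, a token limit and an explicit success check. `Budget` and `run` are hypothetical names, and the loop is simulated: `step_costs` stands in for per-step token usage, and `done_at` stands in for the step at which a real success criterion would become true.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical guard that tracks steps and tokens against hard limits."""
    max_steps: int
    max_tokens: int
    steps: int = 0
    tokens: int = 0

    def charge(self, tokens_used: int) -> None:
        """Record one step and the tokens it consumed."""
        self.steps += 1
        self.tokens += tokens_used

    def exhausted(self) -> bool:
        return self.steps >= self.max_steps or self.tokens >= self.max_tokens


def run(step_costs: list[int], done_at: int, budget: Budget) -> str:
    """Simulated agent loop that stops on success or on an exhausted budget."""
    for i, cost in enumerate(step_costs, start=1):
        if budget.exhausted():
            return "escalate"  # budget hit before "done": hand off to a human
        budget.charge(cost)
        if i == done_at:
            return "done"      # explicit success criterion met
    return "escalate"          # ran out of steps without reaching "done"


print(run([300, 300, 300], done_at=2, budget=Budget(max_steps=5, max_tokens=2000)))
print(run([900, 900, 900], done_at=3, budget=Budget(max_steps=5, max_tokens=1500)))
```

The first run finishes within budget; the second crosses the token limit after two steps and escalates instead of taking a third.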
Agents are non-deterministic, so evaluation is not optional — it is the control system.
- Outcome: did it reach the goal? Outcome-based scoring on a fixed suite of representative tasks.
- Trajectory: were the right tools called in a sensible order, without wasteful loops or unsafe actions?
- Cost and guardrails: steps, tokens and latency per task, plus guardrail hits, tracked so regressions are caught before users notice them.
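These three checks can be combined into one scoring function per run. The sketch below is an assumption-laden illustration: `Run` and `evaluate` are hypothetical names, the sample runs are made up, substring matching is a crude stand-in for real outcome scoring, and "same tool called twice in a row" is only a rough proxy for a wasteful loop.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical record of one agent run on one task."""
    answer: str
    tool_calls: list[str]
    tokens: int


def evaluate(run: Run, expected: str, allowed_tools: set[str], max_calls: int) -> dict:
    """Score one run on outcome, trajectory and cost."""
    repeated = any(a == b for a, b in zip(run.tool_calls, run.tool_calls[1:]))
    return {
        "outcome_ok": expected.lower() in run.answer.lower(),
        "trajectory_ok": (
            set(run.tool_calls) <= allowed_tools   # no out-of-scope tools
            and len(run.tool_calls) <= max_calls   # no runaway trajectory
            and not repeated                       # no back-to-back repeat of a tool
        ),
        "steps": len(run.tool_calls),
        "tokens": run.tokens,
    }


# Made-up sample runs for illustration.
good = Run("Policy A-1024 is active.", ["get_policy_status"], tokens=420)
loopy = Run("Policy A-1024 is active.", ["get_policy_status"] * 3, tokens=1900)

allowed = {"get_policy_status", "search_documents"}
print(evaluate(good, "active", allowed, max_calls=3))
print(evaluate(loopy, "active", allowed, max_calls=3))
```

Both runs reach the right answer, but only the first passes the trajectory check, which is why outcome scoring alone is not enough.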