Agentic AI Systems — Architecture Case Study

01 What makes a system "agentic"

A workflow follows a path I designed. An agent decides the path itself: given a goal, it reasons about what to do, picks a tool, observes the result, and repeats until the goal is met. That autonomy is what makes agents powerful for open-ended tasks — and what makes them risky if left unbounded.

The four ingredients I look for:

Reasoning

An LLM that can break a goal into steps and decide the next action (the "think" in think–act–observe).

Tools

Functions the agent can call — search, retrieval, calculators, internal APIs — to act on the world and gather facts.

Memory

Short-term scratchpad for the current task plus longer-term memory across sessions.

Feedback loop

Observe each tool result and adjust — including self-critique to catch its own mistakes.

Honest positioning. My agentic AI work is at an exploration / PoC level today: strong on the patterns, trade-offs and guardrails, with hands-on depth still being built. This page is how I would design and reason about such a system.

02 Key terms in plain English

The building blocks behind the word "agentic".

Agent

An LLM that, given a goal, decides its own next action, runs it, looks at the result, and repeats until done — rather than following a path you hard-coded.

Tool

A typed, well-described function the agent may call to act or gather facts (search, retrieval, an internal API). Example: @tool def get_policy(id): ...

ReAct loop

The core rhythm: think (decide), act (call a tool), observe (read the result), repeat. In short: think -> act -> observe

Planner / Executor

A supervisor breaks a goal into sub-tasks and delegates them to focused worker agents, then assembles the results.

Critic

A verifier agent that scores or rejects a result before it is finalised — the quality gate of a multi-agent system.

Memory

Short-term scratchpad for the current task plus longer-term memory carried across sessions.

MCP

Model Context Protocol — a standard way to expose tools and data so any MCP-aware agent can discover and call them, instead of bespoke glue per integration.

Stopping condition

Explicit "done" criteria plus a step/token budget so the agent can't loop forever. Example: while steps < MAX:
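The stopping condition above can be sketched as a small budget object that the loop checks before every step. This is a minimal illustration under stated assumptions, not a prescribed implementation: the `Budget` class, its `max_steps` / `max_tokens` limits and the per-step token costs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical step/token budget; the default limits are illustrative."""
    max_steps: int = 5
    max_tokens: int = 2_000
    steps: int = 0
    tokens: int = 0

    def charge(self, tokens_used: int) -> None:
        # Record one loop iteration and the tokens it consumed.
        self.steps += 1
        self.tokens += tokens_used

    def exhausted(self) -> bool:
        # True once either limit has been reached.
        return self.steps >= self.max_steps or self.tokens >= self.max_tokens


def run(step_costs, budget):
    """Spend one cost per step; stop early if the budget is used up first."""
    for cost in step_costs:
        if budget.exhausted():
            return "escalate"   # budget hit before the work finished
        budget.charge(cost)
    return "done"


print(run([300, 400], Budget()))                     # two cheap steps fit
print(run([900, 900, 900], Budget(max_tokens=1500))) # token limit reached after two steps
```

Checking the budget before each step, rather than after, means the agent never starts work it is no longer allowed to pay for.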

03 Minimal worked example: the whole thing, broken down

An agent is, at heart, a loop around an LLM that can call tools. Here it is with one tool and a budget.

1. Give it tools. A dictionary of functions the model is allowed to call.
2. Think. Ask the LLM what to do next given the goal and what it has seen so far.
3. Act. If it asked for a tool, run that tool and capture the result.
4. Observe & repeat. Feed the result back; stop when the model answers or the step budget runs out.

# A minimal ReAct agent: one tool, a hard step budget, no framework magic.
# Assumed to exist elsewhere: crm (an internal client) and llm_decide, which
# returns an object with .is_final, .answer, .tool and .args.
def get_policy_status(policy_id):                 # 1. the only tool we expose
    return crm.lookup(policy_id)                    #    calls an internal system

tools = {"get_policy_status": get_policy_status}
goal = "Is policy A-1024 active and when does it renew?"
scratch, MAX = [], 5                              # memory + stopping condition

for step in range(MAX):                          # the agent loop
    decision = llm_decide(goal, scratch, tools)    # 2. THINK: tool call or final answer?
    if decision.is_final:
        print(decision.answer)                     #    done -> stop
        break
    if decision.tool not in tools:                 #    guardrail: the model named an unknown tool
        scratch.append((decision.tool, "error: unknown tool"))
        continue
    result = tools[decision.tool](**decision.args) # 3. ACT: run the chosen tool
    scratch.append((decision.tool, result))        # 4. OBSERVE: remember, then loop
else:
    print("Step budget exhausted; escalating to a human.")   # guardrail

Everything else is hardening this loop: more (and safer) tools, a planner that splits the goal across worker agents, a critic that checks the answer, and human approval on state-changing actions — the design in the next section.
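As a rough illustration of two of those hardening steps, the sketch below wires a planner, a worker and a critic together. It is not the design from this study: `plan`, `worker`, `critic` and `supervise` are hypothetical stand-ins, and each would be backed by an LLM call in a real system rather than by the string handling used here.

```python
# Hypothetical stand-ins for LLM-backed components.
def plan(goal: str) -> list[str]:
    """Planner: split a goal into sub-tasks (here, naively on ' and ')."""
    return [part.strip() for part in goal.split(" and ")]


def worker(task: str) -> str:
    """Executor: a focused worker that handles exactly one sub-task."""
    return f"result for '{task}'"


def critic(task: str, result: str) -> bool:
    """Critic: accept a result only if it refers to the task it answers."""
    return task in result


def supervise(goal, planner=plan, executor=worker, verifier=critic, max_retries=1):
    """Delegate each sub-task, retry rejected results, and collect the outcome."""
    accepted, rejected = {}, []
    for task in planner(goal):
        for _ in range(max_retries + 1):     # bounded retries per sub-task
            result = executor(task)
            if verifier(task, result):
                accepted[task] = result
                break
        else:
            rejected.append(task)            # never passed the critic
    return accepted, rejected


accepted, rejected = supervise("check status and find renewal date")
print(accepted)
print(rejected)
```

Rejected sub-tasks are returned rather than silently dropped, so the caller can escalate them to a human.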

05 Tools & MCP: how agents touch the real world

An agent is only as useful as the tools it can call. Each tool needs a clear name, description and typed schema so the model knows when and how to use it. The Model Context Protocol (MCP) standardises this: tools and data sources are exposed by MCP servers and any MCP-aware agent can discover and call them — instead of bespoke glue per integration.

# A tool is just a typed, well-described function the agent can choose to call.
# Assumed to be defined elsewhere: crm, llm, POLICY, search_documents, calculator,
# and create_agent (a stand-in for your framework's agent constructor; its real
# name, arguments and input format depend on the framework and version).
from langchain_core.tools import tool

@tool
def get_policy_status(policy_id: str) -> dict:
    """Return the current status and key dates for an insurance policy.
    Use when the user asks about a specific policy by its ID."""   # description guides tool choice
    return crm.lookup(policy_id)                                      # calls an internal system

tools = [get_policy_status, search_documents, calculator]
agent = create_agent(llm, tools, system=POLICY)   # model decides which tool, when, with what args
result = agent.invoke({"goal": "Is policy A-1024 active and when does it renew?"})
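To make "name, description and typed schema" concrete, here is a framework-free sketch that derives a JSON-Schema-style description from a function's signature and docstring. It only approximates what a tool framework or an MCP server would publish; `describe_tool`, the `_JSON_TYPES` mapping and the stubbed return value are assumptions made for illustration, not the format any particular library emits.

```python
import inspect
from typing import get_type_hints

# Illustrative subset: map Python types to JSON-Schema-style type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number",
               bool: "boolean", dict: "object", list: "array"}


def describe_tool(fn):
    """Build a schema-like description from a function's name, docstring and hints."""
    hints = get_type_hints(fn)
    hints.pop("return", None)                      # only parameters go in the schema
    params = inspect.signature(fn).parameters
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": {
                name: {"type": _JSON_TYPES.get(hints.get(name), "string")}
                for name in params
            },
            # Parameters without a default value are required.
            "required": [name for name, p in params.items()
                         if p.default is inspect.Parameter.empty],
        },
    }


def get_policy_status(policy_id: str) -> dict:
    """Return the current status and key dates for an insurance policy."""
    return {"policy_id": policy_id, "status": "active"}   # stubbed, no real CRM call


schema = describe_tool(get_policy_status)
print(schema["name"], schema["parameters"]["required"])
```

The model never sees the function body, only a description like this one, which is why the docstring and type hints carry so much weight in tool selection.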

Guardrail first. Tools that change state (write/act) get validation, allow-lists and — for high-stakes actions — a human approval step, reusing the human-in-the-loop pattern from my LangGraph study.
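The three guardrails named above (allow-list, validation, human approval) could be layered roughly as follows. This is a sketch under assumptions: the tool names, the "A-" prefix check on policy IDs and the `approve` callback are hypothetical, and a production system would route approval through a real review step rather than a function argument.

```python
READ_ONLY = {"get_policy_status"}        # hypothetical allow-listed read tools
STATE_CHANGING = {"cancel_policy"}       # hypothetical tools that need approval


def guarded_call(tools, name, args, approve):
    """Run a tool only if it is allow-listed, valid and, when state-changing, approved."""
    if name not in (READ_ONLY | STATE_CHANGING) or name not in tools:
        return {"ok": False, "reason": "tool not on allow-list"}
    policy_id = args.get("policy_id", "")
    if not (isinstance(policy_id, str) and policy_id.startswith("A-")):
        return {"ok": False, "reason": "invalid policy_id"}      # assumed ID format
    if name in STATE_CHANGING and not approve(name, args):
        return {"ok": False, "reason": "human approval denied"}
    return {"ok": True, "result": tools[name](**args)}


# Stubbed tools so the sketch runs without any internal system.
tools = {
    "get_policy_status": lambda policy_id: {"status": "active"},
    "cancel_policy": lambda policy_id: {"status": "cancelled"},
}


def deny_all(name, args):
    """Stand-in for a human reviewer who rejects every request."""
    return False


print(guarded_call(tools, "get_policy_status", {"policy_id": "A-1024"}, deny_all))
print(guarded_call(tools, "cancel_policy", {"policy_id": "A-1024"}, deny_all))
```

Read-only tools pass straight through, while the state-changing call is blocked until the approval callback returns true.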

06 Design trade-offs

Decision	Why it matters
Single agent vs. multi-agent	One agent is simpler; specialised agents give cleaner prompts, focused tools and easier evaluation — at the cost of coordination overhead.
Autonomy vs. control	More freedom solves more tasks but is harder to predict. I bound steps, budgets and tool scope, and keep a fixed graph where the steps are actually known.
Tool granularity	Many narrow tools are easier for the model to use correctly than a few overloaded ones, but too many bloat the context.
Memory strategy	Persisting everything is costly and leaks irrelevant context; I summarise and retrieve memory on demand.
Stopping conditions	Without explicit success criteria and a step/token budget, agents loop. Define "done" up front.
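The memory row above ("summarise and retrieve on demand") can be illustrated with a toy store. Everything here is an assumption made for the sketch: the `Memory` class is hypothetical, truncation stands in for LLM summarisation, and word overlap stands in for embedding-based retrieval.

```python
def overlap(query: str, note: str) -> int:
    """Count the words shared by a query and a stored note (case-insensitive)."""
    return len(set(query.lower().split()) & set(note.lower().split()))


class Memory:
    """Toy long-term memory: store short notes, recall only the relevant ones."""

    def __init__(self, max_chars: int = 80):
        self.notes = []
        self.max_chars = max_chars

    def remember(self, text: str) -> None:
        # Truncation is a placeholder for a real summarisation step.
        self.notes.append(text[: self.max_chars])

    def recall(self, query: str, k: int = 2) -> list:
        # Rank notes by overlap and drop those that share nothing with the query.
        ranked = sorted(self.notes, key=lambda n: overlap(query, n), reverse=True)
        return [n for n in ranked[:k] if overlap(query, n) > 0]


memory = Memory()
memory.remember("policy A-1024 renews in March")
memory.remember("user prefers email contact")
memory.remember("policy B-7 was cancelled")
print(memory.recall("when does policy A-1024 renew", k=1))
```

Only the recalled notes are placed in the prompt, which keeps irrelevant history out of the context window.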

07 Evaluating & operating agents

Agents are non-deterministic, so evaluation is not optional — it is the control system.

Task success

Did it reach the goal? Outcome-based scoring on a fixed suite of representative tasks.

Trajectory quality

Were the right tools called in a sensible order, without wasteful loops or unsafe actions?

Cost & safety

Steps, tokens and latency per task, plus guardrail hits, tracked so that regressions are caught before users notice them.
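The three dimensions above can be computed from logged runs of a fixed task suite. The sketch below assumes a hypothetical `Run` record and uses consecutive repeated tool calls as a crude proxy for wasteful loops; the sample runs are invented for illustration and are not measured results.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One logged agent run on a suite task (hypothetical record shape)."""
    succeeded: bool
    tool_calls: list = field(default_factory=list)
    tokens: int = 0
    guardrail_hits: int = 0


def repeated_calls(calls):
    """Count calls that repeat the immediately preceding call (a crude loop signal)."""
    return sum(1 for prev, cur in zip(calls, calls[1:]) if prev == cur)


def summarise(runs):
    """Aggregate task success, trajectory quality and cost over a suite."""
    n = len(runs)
    return {
        "success_rate": sum(r.succeeded for r in runs) / n,
        "avg_steps": sum(len(r.tool_calls) for r in runs) / n,
        "avg_tokens": sum(r.tokens for r in runs) / n,
        "repeated_calls": sum(repeated_calls(r.tool_calls) for r in runs),
        "guardrail_hits": sum(r.guardrail_hits for r in runs),
    }


# Invented sample data, for illustration only.
runs = [
    Run(True, ["search", "get_policy_status"], tokens=800),
    Run(False, ["search", "search", "search"], tokens=2400, guardrail_hits=1),
]
report = summarise(runs)
print(report)
```

Tracking these numbers per release makes it possible to see whether a prompt or tool change improved success at the expense of cost or safety.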

LLM-as-judge, carefully. Automated graders scale evaluation, but I anchor them with a human-labelled gold set so the judge itself stays honest.
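One simple way to anchor a judge is to measure how often it agrees with the human-labelled gold set and to trust it only above a threshold. The sketch below assumes plain "pass" / "fail" labels and a hypothetical 0.9 threshold; the labels are invented for illustration.

```python
def agreement(judge_labels, gold_labels):
    """Fraction of gold-set items on which the judge matches the human label."""
    if not gold_labels or len(judge_labels) != len(gold_labels):
        raise ValueError("judge and gold labels must be non-empty and aligned")
    matches = sum(j == g for j, g in zip(judge_labels, gold_labels))
    return matches / len(gold_labels)


def judge_is_trusted(judge_labels, gold_labels, threshold=0.9):
    """Only rely on the automated judge if it agrees with humans often enough."""
    return agreement(judge_labels, gold_labels) >= threshold


# Invented labels, for illustration only.
gold = ["pass", "fail", "pass", "pass", "fail"]
judge = ["pass", "fail", "pass", "fail", "fail"]

print(agreement(judge, gold))
print(judge_is_trusted(judge, gold))
```

If agreement falls below the threshold, the judge prompt is revised and re-checked against the same gold set before its scores are used again.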

08 How the three studies fit together

RAG gives agents trustworthy, grounded knowledge to act on.
LangGraph provides the controllable state machine — including bounded loops and human-in-the-loop — that keeps an agent safe.
Agentic systems add autonomy and multi-agent coordination on top, for tasks whose steps aren't known in advance.
Honest scope: architecture / PoC-level throughout — designed to walk through, reason about, and build on.

Agentic AI: systems that plan, use tools, and check their work
