Milton Martin
Proof of concept · Architecture exploration

Orchestrating LLM workflows with LangChain & LangGraph

Moving beyond a single prompt: how I structure multi-step LLM applications as an explicit, stateful graph — with branching, tool calls, retries, checkpointing and a human-in-the-loop — so the system is controllable, debuggable and safe to run in production.

LangChain · LangGraph · State machine · Tool calling · Checkpointing · Human-in-the-loop

01 Why a graph, not a script

A single prompt works for a single turn. Real tasks need several steps — classify, retrieve, call a tool, validate, maybe loop back and try again. Wiring that as ad-hoc if/else around chained prompts becomes brittle fast: there is no shared state, no clean way to retry one step, and no visibility into why a run went the way it did.

LangGraph models the workflow as a directed graph of nodes (units of work) and edges (what happens next), all operating over a shared, typed state. That makes the flow explicit, inspectable, resumable, and testable — the difference between a demo and something you can operate.
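To make the nodes-edges-state idea concrete, here is a minimal sketch in plain Python with no LangGraph import. The names `NODES`, `EDGES`, `run` and the `END` sentinel are hypothetical and exist only for this illustration; the real library does much more (typing, validation, persistence), but the control loop has the same shape.

```python
from typing import Callable, Dict, List, TypedDict


class State(TypedDict):
    text: str
    steps: List[str]


END = "__end__"  # hypothetical sentinel meaning "stop the run"


def upper(state: State) -> State:
    # Node: does one job and returns the updated state.
    return {"text": state["text"].upper(), "steps": state["steps"] + ["upper"]}


def exclaim(state: State) -> State:
    return {"text": state["text"] + "!", "steps": state["steps"] + ["exclaim"]}


# Nodes are units of work; edges say what happens next.
NODES: Dict[str, Callable[[State], State]] = {"upper": upper, "exclaim": exclaim}
EDGES: Dict[str, str] = {"upper": "exclaim", "exclaim": END}


def run(entry: str, state: State) -> State:
    current = entry
    while current != END:
        state = NODES[current](state)  # run the node over the shared state
        current = EDGES[current]       # follow the edge to the next node
    return state


result = run("upper", {"text": "hello", "steps": []})
print(result)
```

Because the flow lives in data (`NODES`, `EDGES`) rather than nested if/else, the path a run took can be read straight from `steps`.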

Mental model. LangChain gives me the building blocks (models, prompts, tools, retrievers); LangGraph is the control plane that decides which block runs next and carries state between them.

02 Key terms in plain English

The vocabulary of a LangGraph workflow, before any code.

State

One typed object that travels through the whole workflow; every step reads and updates it instead of using hidden globals.
class State(TypedDict): ...

Node

A plain function that does one job (classify, retrieve, generate) and returns the updated state.
def generate(state): ...

Edge

A fixed "after A, do B" connection between two nodes.
g.add_edge("retrieve", "generate")

Conditional edge

Routing that looks at the state to choose the next node; this is where branching and loops live.
g.add_conditional_edges("grade", route)

Tool node

A node that calls an external system (search, API, calculator) and feeds the result back into state.

Checkpointer

Saves the state after each step so a run can pause, resume, or recover, and so a human can step in mid-flow.
g.compile(checkpointer=memory)

Human-in-the-loop

An interrupt where a person approves or corrects before the graph continues — for high-stakes steps.

START / END

The graph's entry and exit markers; routing to END finishes the run.
g.add_edge(START, "retrieve")
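The terms above can be tied together in a few lines of plain Python. This is only a sketch: `classify` uses a hypothetical keyword rule in place of an LLM call, and `merge` is an illustrative helper showing how a node's partial update is folded into the shared state (LangGraph performs this merge itself when a node returns only the keys it changed).

```python
from typing import TypedDict


class State(TypedDict, total=False):
    question: str
    intent: str


def classify(state: State) -> dict:
    # Node: one job. Hypothetical keyword rule standing in for an LLM classifier.
    intent = "lookup" if "covered" in state["question"].lower() else "chat"
    return {"intent": intent}  # partial update: only the key that changed


def route(state: State) -> str:
    # Conditional edge: inspect the state and name the next node.
    return "retrieve" if state["intent"] == "lookup" else "generate"


def merge(state: State, update: dict) -> State:
    # Illustrative helper: fold a node's update into the shared state.
    return {**state, **update}


state: State = {"question": "Is flood damage covered?"}
state = merge(state, classify(state))
next_node = route(state)
print(state["intent"], next_node)
```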

03 Minimal worked example: the whole thing, broken down

The smallest useful graph: generate, self-check, and loop back a bounded number of times.

# The smallest self-correcting graph: generate -> check -> (loop or stop)
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

MAX_ATTEMPTS = 3                     # hard budget for the retry loop

def llm_answer(question: str) -> str:    # placeholder: swap in a real model call
    return f"Draft answer to: {question}"

def is_good(draft: str) -> bool:         # placeholder: swap in a real grader
    return bool(draft.strip())

class State(TypedDict):              # 1. shared state
    question: str
    draft: str
    ok: bool
    attempts: int

def generate(state: State) -> dict:  # 2a. node: produce a draft
    # Return only the keys that changed; LangGraph merges them into the state.
    return {"draft": llm_answer(state["question"]), "attempts": state["attempts"] + 1}

def check(state: State) -> dict:     # 2b. node: self-grade the draft
    return {"ok": is_good(state["draft"])}

def route(state: State) -> str:      # 3. conditional edge = control logic
    if state["ok"]:
        return END                   # good enough -> stop
    if state["attempts"] < MAX_ATTEMPTS:
        return "generate"            # retry (bounded!)
    return END                       # give up gracefully

g = StateGraph(State)                # 4. wire it together
g.add_node("generate", generate)
g.add_node("check", check)
g.add_edge(START, "generate")
g.add_edge("generate", "check")
g.add_conditional_edges("check", route)
app = g.compile()
result = app.invoke({"question": "Is flood damage covered?", "attempts": 0})
print(result["draft"], result["attempts"])

That hard cap (attempts < MAX_ATTEMPTS, here 3) is the whole point: self-correction without a budget can loop indefinitely and burn tokens. The fuller flow below adds retrieval and a human escalation path.
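The effect of the cap can be checked without a model or the library. The sketch below re-implements only the routing rule in plain Python and simulates the worst case, a check that never passes; `simulate` and the string sentinel `END` are hypothetical names for this illustration.

```python
from typing import TypedDict

MAX_ATTEMPTS = 3
END = "END"  # stand-in for the library's END marker


class State(TypedDict):
    ok: bool
    attempts: int


def route(state: State) -> str:
    if state["ok"]:
        return END                    # good enough: stop
    if state["attempts"] < MAX_ATTEMPTS:
        return "generate"             # retry, still within budget
    return END                        # budget spent: give up gracefully


def simulate(check_passes: bool) -> State:
    # Drive generate -> check -> route until the router says END.
    state: State = {"ok": False, "attempts": 0}
    next_node = "generate"
    while next_node != END:
        state = {"ok": check_passes, "attempts": state["attempts"] + 1}
        next_node = route(state)
    return state


worst = simulate(check_passes=False)  # the check never passes
best = simulate(check_passes=True)    # the first draft passes
print(worst, best)
```

Even when the check never passes, the run stops after `MAX_ATTEMPTS` generations, so the worst-case token cost is known in advance.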

04 Example flow: a self-correcting document assistant

A graph that answers a question over documents, checks whether it actually has enough grounding, and loops to refine the query if not — escalating to a human when confidence stays low.

[Flow diagram: START → Classify intent + route → Retrieve context → Generate draft answer → Grounded? (self-check). Yes → Answer. No → refine query & retry (max N). Still low → escalate to Human review on low confidence.]
Solid = normal flow. Dashed = conditional edges driven by the self-check (loop or escalate).

05 The core building blocks

State

A single typed object passed between nodes (question, retrieved context, draft, attempt count, confidence). Every node reads and updates it — no hidden globals.

Nodes

Plain functions that do one thing: classify, retrieve, generate, grade. Easy to unit-test in isolation.

Conditional edges

Routing logic that inspects the state to decide the next node — this is where branching, looping and escalation live.

Tool nodes

Nodes that call external tools/APIs (search, calculators, internal systems) and feed results back into state.

Checkpointing

Persist state after each step so a run can pause, resume, or recover from a crash — and so a human can step in mid-flow.

Human-in-the-loop

An interrupt point where a person approves or corrects before the graph continues — vital for high-stakes actions.
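Because nodes are plain functions, they can be exercised without building a graph. The sketch below shows one way to do that, assuming the external dependency (a retriever) is injected so a test can replace it; `make_retrieve` and `fake_search` are hypothetical names, not part of LangGraph.

```python
from typing import Callable, List, TypedDict


class State(TypedDict):
    question: str
    context: List[str]


def make_retrieve(search: Callable[[str], List[str]]):
    # Factory: the node closes over its retriever, so tests can swap it out.
    def retrieve(state: State) -> dict:
        return {"context": search(state["question"])}
    return retrieve


def fake_search(query: str) -> List[str]:
    # Test double standing in for a real search tool or vector store.
    return [f"doc about {query}"]


retrieve = make_retrieve(fake_search)
update = retrieve({"question": "flood damage", "context": []})
print(update)
```

The same pattern applies to generate and grade nodes: inject the model or grader, then assert on the returned update.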

06 Defining the graph: annotated snippet

Conceptual LangGraph-style pseudocode showing state, nodes and the conditional loop.

from typing import List, TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END

# search, llm_answer, is_supported and human_review are placeholders for your own
# retriever, model call, grounding check and review node; they are not defined here.

class State(TypedDict):                 # shared, typed state flows through every node
    question: str
    context: List[str]
    draft: str
    attempts: int
    grounded: bool

def retrieve(state: State) -> dict:
    return {"context": search(state["question"])}     # tool/retriever call

def generate(state: State) -> dict:
    return {"draft": llm_answer(state["question"], state["context"])}

def grade(state: State) -> dict:
    return {
        "grounded": is_supported(state["draft"], state["context"]),  # self-check
        "attempts": state["attempts"] + 1,
    }

def route(state: State) -> str:           # conditional edge = the control logic
    if state["grounded"]:
        return "answer"
    if state["attempts"] < 3:
        return "retrieve"                 # loop back and refine
    return "human"                        # give up gracefully → escalate

memory = MemorySaver()                    # in-memory checkpointer; use a persistent one in production

g = StateGraph(State)
g.add_node("retrieve", retrieve)
g.add_node("generate", generate)
g.add_node("grade", grade)
g.add_node("human", human_review)
g.add_edge(START, "retrieve")
g.add_edge("retrieve", "generate")
g.add_edge("generate", "grade")
g.add_edge("human", END)                  # the escalation path also finishes the run
g.add_conditional_edges("grade", route, {"answer": END, "retrieve": "retrieve", "human": "human"})
app = g.compile(checkpointer=memory)      # checkpointing → resumable + human-in-the-loop
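What a checkpointer buys can be sketched without the library. The plain-Python example below saves state after every step, pauses before a gated step, lets a "human" edit the saved state, and then resumes from the checkpoint. `DictCheckpointer`, `run` and the `pause_before` parameter are hypothetical names for this illustration and do not mirror LangGraph's actual API.

```python
import copy
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, Callable[[dict], dict]]


class DictCheckpointer:
    """Keeps (index of next step, state snapshot) per thread id, in memory."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[int, dict]] = {}

    def save(self, thread_id: str, next_step: int, state: dict) -> None:
        self._store[thread_id] = (next_step, copy.deepcopy(state))

    def load(self, thread_id: str) -> Optional[Tuple[int, dict]]:
        return self._store.get(thread_id)


def run(steps: List[Step], state: dict, cp: DictCheckpointer, thread_id: str,
        pause_before: Optional[str] = None) -> Tuple[dict, bool]:
    # Resume from the last checkpoint if one exists, else start at step 0.
    saved = cp.load(thread_id)
    start, state = saved if saved is not None else (0, state)
    for i in range(start, len(steps)):
        name, fn = steps[i]
        if name == pause_before:
            cp.save(thread_id, i, state)  # stop here and wait for a person
            return state, False
        state = {**state, **fn(state)}
        cp.save(thread_id, i + 1, state)  # checkpoint after every step
    return state, True


def draft(state: dict) -> dict:
    return {"draft": f"Reply to: {state['question']}"}


def send(state: dict) -> dict:
    return {"sent": state.get("approved", False)}


steps: List[Step] = [("draft", draft), ("send", send)]
cp = DictCheckpointer()

# First run stops before the gated "send" step.
paused_state, finished = run(steps, {"question": "Is flood damage covered?"},
                             cp, "thread-1", pause_before="send")

# A reviewer approves by updating the checkpointed state, then the run resumes.
cp.save("thread-1", 1, {**paused_state, "approved": True})
final_state, finished_after = run(steps, {}, cp, "thread-1")
print(final_state)
```

The resumed call passes an empty state on purpose: everything it needs comes from the checkpoint, which is also what makes crash recovery possible.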

07 Design trade-offs

Decision | Why it matters
Explicit graph vs. a single mega-prompt | More upfront structure, but each step becomes testable, retryable and observable, which makes the system far easier to operate and debug.
Bounded loops (max attempts) | Self-correction is powerful but can spin forever or burn tokens. A hard cap plus graceful escalation keeps cost and latency predictable.
Where to put the human | Too many approvals kill throughput; too few are risky. I gate only high-stakes or low-confidence steps.
Checkpoint granularity | Per-node checkpoints enable resume and audit, at some storage cost; worth it for long or critical flows.
Graph vs. autonomous agent | A fixed graph is predictable; a free agent is flexible. I reach for a graph when the steps are known, and lean agentic when they are not (see the agentic systems study).
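The "where to put the human" rule can be written as a small predicate that a conditional edge could call. This is a sketch under stated assumptions: the action names in `HIGH_STAKES_ACTIONS` and the `CONFIDENCE_FLOOR` value are hypothetical and would need tuning for a real workflow.

```python
from typing import TypedDict

HIGH_STAKES_ACTIONS = {"issue_refund", "delete_record"}  # hypothetical examples
CONFIDENCE_FLOOR = 0.7                                   # hypothetical threshold


class State(TypedDict):
    action: str
    confidence: float


def needs_human(state: State) -> bool:
    # Gate only high-stakes or low-confidence steps; everything else flows through.
    return (state["action"] in HIGH_STAKES_ACTIONS
            or state["confidence"] < CONFIDENCE_FLOOR)


routine = needs_human({"action": "answer_question", "confidence": 0.9})
risky = needs_human({"action": "issue_refund", "confidence": 0.95})
unsure = needs_human({"action": "answer_question", "confidence": 0.4})
print(routine, risky, unsure)
```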

08 Observability & LLMOps

Tracing

Every node, prompt, tool call and state transition is traced, so a run reads like a story you can replay.

Evaluation

Test the graph end-to-end and node-by-node against fixed cases; track success rate, loop counts and escalation rate.
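A minimal evaluation harness for those three metrics might look like the sketch below. It assumes the graph is wrapped in a function that returns the answer, the attempt count and an escalation flag; `fake_graph` is a stand-in for that wrapper, the two cases are made up for illustration, and substring matching is a deliberately crude success criterion.

```python
from typing import Callable, Dict, List, TypedDict


class Result(TypedDict):
    answer: str
    attempts: int
    escalated: bool


class Case(TypedDict):
    question: str
    expected: str


def evaluate(run_graph: Callable[[str], Result], cases: List[Case]) -> Dict[str, float]:
    results = [run_graph(case["question"]) for case in cases]
    n = len(cases)
    passed = sum(
        1 for case, res in zip(cases, results)
        if not res["escalated"] and case["expected"] in res["answer"]
    )
    return {
        "success_rate": passed / n,
        "mean_attempts": sum(res["attempts"] for res in results) / n,
        "escalation_rate": sum(res["escalated"] for res in results) / n,
    }


def fake_graph(question: str) -> Result:
    # Stand-in for invoking the compiled graph on one question.
    if "flood" in question:
        return {"answer": "Flood damage is covered.", "attempts": 1, "escalated": False}
    return {"answer": "", "attempts": 3, "escalated": True}


cases: List[Case] = [
    {"question": "Is flood damage covered?", "expected": "covered"},
    {"question": "Is meteor damage covered?", "expected": "not covered"},
]
metrics = evaluate(fake_graph, cases)
print(metrics)
```

Running the same fixed cases after every prompt or routing change turns regressions in loop count or escalation rate into a visible diff.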

Cost & latency

Measure tokens and time per node to find the expensive step before scaling out.
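Tracing and per-node cost can share one wrapper. The sketch below wraps each node function, records which keys it changed, and accumulates wall-clock time and token counts per node. The `traced` helper and the `tokens` key are hypothetical; in practice the token count would come from the model provider's usage data.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List

trace: List[dict] = []
seconds_per_node: Dict[str, float] = defaultdict(float)
tokens_per_node: Dict[str, int] = defaultdict(int)


def traced(name: str, fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
    def wrapper(state: dict) -> dict:
        start = time.perf_counter()
        update = fn(state)
        elapsed = time.perf_counter() - start
        seconds_per_node[name] += elapsed
        tokens_per_node[name] += update.get("tokens", 0)
        # One trace event per node run: which node ran and which keys it wrote.
        trace.append({"node": name, "seconds": elapsed, "changed": sorted(update)})
        return update
    return wrapper


def retrieve(state: dict) -> dict:
    return {"context": ["doc"]}


def generate(state: dict) -> dict:
    return {"draft": "answer", "tokens": 12}  # hypothetical token count


state = {"question": "Is flood damage covered?"}
for name, fn in [("retrieve", retrieve), ("generate", generate)]:
    state = {**state, **traced(name, fn)(state)}

most_expensive = max(tokens_per_node, key=tokens_per_node.get)
print([event["node"] for event in trace], most_expensive)
```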

09 Limitations & next steps