Every agent is a Deep Agent.

Not a chatbot. Not a prompt. A team — planner, specialists, and reviewer — running inside one button.

Sentrul ships 257 agents across every department. Every one of them is a Deep Agent: three tiers, composed in LangGraph, each tier visible in the trace. Your team hires one agent; a team of ~5 shows up for work.

Start free trial →
Choose Enterprise →

No credit card. 14 days. All 257 Deep Agents. Bring your own key.

BYOK on every tier
Three tiers, every run traced in Langfuse
interrupt_before gate catches ~7% of outbound actions
Built on LangGraph · Multi-tenant on Azure

Tier 1

Planner

Decomposes the task. Picks which specialists run on which model.

Tier 2

Specialists

3–7 sub-agents, each with one job. Fan out in parallel.

Tier 3

Validator

Checks output. Raises the HITL gate when risk crosses the line.

What's a Deep Agent?

The word “agent” has gotten slippery. It covers chat prompts with personas, single-shot LLM calls, script wrappers, and full autonomous rigs — all sharing one label. When you buy an agent, you don't know which one you're getting.

A Deep Agent is specific. Three tiers, every time:

Tier 1 — Planner

Reads the input. Decomposes the task. Picks which specialists run in what order, on which model. Writes the plan into shared state so every later step (and every audit) can see why it happened.

Tier 2 — Specialists

A small team of sub-agents — usually three to seven per Deep Agent — each with one job and an explicit input/output contract. They run in parallel wherever the planner says they can. Researcher, extractor, classifier, scorer, writer — role-specific, swappable, independently observable.

Tier 3 — Validator

Runs last. Checks the combined output against policy, schema, risk, and confidence thresholds. When risk crosses the line, it raises the human-in-the-loop gate — interrupt_before in LangGraph — and pauses the graph for your approval. Either passes the output through or waits for a human.

Every tier is a span in the Langfuse trace. Open any run, count them. The three tiers aren't a marketing claim — they're a trace assertion.
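To make the three-tier shape concrete, here is a minimal, framework-free sketch in plain Python. It is an illustration under stated assumptions, not Sentrul's implementation: the state dictionary layout, the two toy specialists, and the names `SPECIALISTS` and `run_deep_agent` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

State = Dict[str, object]

# Tier 2: each specialist has one job and a fixed contract (text in, dict out).
def extractor(text: str) -> dict:
    return {"emails": [w for w in text.split() if "@" in w]}

def classifier(text: str) -> dict:
    return {"label": "billing" if "invoice" in text.lower() else "general"}

# Hypothetical registry; swapping a specialist means replacing one entry.
SPECIALISTS: Dict[str, Callable[[str], dict]] = {
    "extractor": extractor,
    "classifier": classifier,
}

# Tier 1: the planner writes its plan into shared state so later steps can see it.
def planner(state: State) -> State:
    return {**state, "plan": ["extractor", "classifier"]}

# Tier 2 fan-out: run every planned specialist in parallel and collect results.
def specialists(state: State) -> State:
    names: List[str] = state["plan"]
    with ThreadPoolExecutor() as pool:
        futures = {n: pool.submit(SPECIALISTS[n], state["input"]) for n in names}
        results = {n: f.result() for n, f in futures.items()}
    return {**state, "results": results}

# Tier 3: the validator runs last and checks that every planned step reported.
def validator(state: State) -> State:
    missing = [n for n in state["plan"] if n not in state["results"]]
    return {**state, "valid": not missing}

def run_deep_agent(text: str) -> State:
    state: State = {"input": text}
    for tier in (planner, specialists, validator):
        state = tier(state)
    return state
```

Calling `run_deep_agent("Invoice 42 overdue, contact ap@example.com")` returns a state that carries the plan, each specialist's result, and the validator's verdict, which is the property that makes each tier separately inspectable.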

If you run operations

You're not hiring an agent. You're hiring a team.

One Deep Agent has:

A team lead who decides what gets done.
Three to seven specialists who do the actual work — in parallel when they can.
A reviewer who checks the output before it reaches anyone external.

You pay for one catalog entry. You get five people's worth of division of labor.

And the reviewer actually rejects things. On our production workloads, roughly 7% of outbound actions get caught at the reviewer tier and gated for human approval. That's the safety net doing real work, not theater.

Start free trial →

If you build platforms

A Deep Agent is a three-tier swarm composed in LangGraph:

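# Conceptual sketch: every Deep Agent looks like this.
# DeepAgentState, route_on_risk, and the *_fn callables are defined per agent (not shown).
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.postgres import PostgresSaver

graph = StateGraph(DeepAgentState)

graph.add_node("planner",       planner_fn)        # tier 1
graph.add_node("specialists",   specialist_fn)     # tier 2 (fan-out)
graph.add_node("validator",     validator_fn)      # tier 3
graph.add_node("hitl_approver", hitl_approver_fn)  # gated step; function name is a placeholder

graph.add_edge(START, "planner")                   # entry point
graph.add_edge("planner", "specialists")
graph.add_edge("specialists", "validator")

graph.add_conditional_edges(
    "validator",
    route_on_risk,                                 # returns "pass" or "gate"
    {"pass": END, "gate": "hitl_approver"},
)
graph.add_edge("hitl_approver", END)

# interrupt_before pauses the run before hitl_approver executes.
app = graph.compile(
    checkpointer=PostgresSaver(...),
    interrupt_before=["hitl_approver"],
)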

Planner is a routing step, not always an LLM call. On ~60% of runs it's a lightweight classification that picks which specialists to spawn and which model lane to use.
Specialists fan out. Each is a pure function over state with an explicit contract — swappable without re-testing the outer graph.
Validator is the interrupt_before trigger. Risk score, policy violation, or low confidence raises the HITL gate; otherwise the graph closes.

Every node is a span in the Langfuse trace with a stable ID you can paste into a ticket. The graph is the code — our CI compiles every Deep Agent on every push.
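The sketch above references `route_on_risk` without defining it. One plausible shape, assuming the validator writes a risk score, a list of policy violations, and a confidence value into state, is shown below. The field names and both thresholds are hypothetical and chosen only for illustration.

```python
from typing import List, TypedDict

class ValidatorVerdict(TypedDict):
    risk_score: float              # 0.0 (safe) to 1.0 (risky); scale is an assumption
    policy_violations: List[str]   # names of violated policies, empty if none
    confidence: float              # validator's confidence in the combined output

# Hypothetical thresholds; real values would be configured per Deep Agent.
RISK_THRESHOLD = 0.7
MIN_CONFIDENCE = 0.5

def route_on_risk(state: ValidatorVerdict) -> str:
    """Return "gate" to pause for human approval, or "pass" to finish the run."""
    if state["policy_violations"]:
        return "gate"
    if state["risk_score"] >= RISK_THRESHOLD:
        return "gate"
    if state["confidence"] < MIN_CONFIDENCE:
        return "gate"
    return "pass"
```

The return values match the keys of the conditional-edge mapping, so any one of the three conditions is enough to send the run to the gated node.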

See the architecture docs →

Why three tiers, not one big prompt?

Reason 1

Reliability goes up because specialists have contracts

A single generalist prompt fails on edge cases roughly 15% of the time in our internal benchmarks. Five specialists composed with explicit input/output contracts fail ~2% of the time. Specialization compounds.

Reason 2

Model costs go down because the planner picks

The planner chooses the model lane per step — not per agent. Classifications run on Haiku 4.5 or local tier-1; reasoning runs on Claude Sonnet 4.6; the occasional heavy synthesis runs on Opus 4.7. Median Sentrul run costs ~60% less than putting the whole task on the biggest model.
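Per-step routing can be pictured as a lookup from the kind of work to a model lane. The sketch below is an assumption about how such a table might look: the lane names, the `Step` type, and the fallback to the reasoning lane are hypothetical, and only the model names come from the text above.

```python
from dataclasses import dataclass
from typing import Dict, List

# Lane -> model, using the models named above. Lane names are hypothetical.
MODEL_LANES: Dict[str, str] = {
    "classification": "Haiku 4.5",
    "reasoning": "Claude Sonnet 4.6",
    "synthesis": "Opus 4.7",
}

@dataclass(frozen=True)
class Step:
    specialist: str
    kind: str  # expected to be one of the MODEL_LANES keys

def assign_models(steps: List[Step]) -> Dict[str, str]:
    """Pick a model per step; unknown kinds fall back to the reasoning lane."""
    return {
        s.specialist: MODEL_LANES.get(s.kind, MODEL_LANES["reasoning"])
        for s in steps
    }
```

With a plan of one classifier step and one writer step, only the writer lands on the larger model, which is where the per-step saving comes from.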

Reason 3

The safety net actually catches things

The validator tier is the reason the human-in-the-loop gate isn't theater. On our production workloads the validator rejects or gates ~7% of outbound actions. That's roughly one action in every fourteen where the validator catches something real. That number isn't an agent quirk; it's the tier doing its job.

See all plan tiers and how model routing is configured per subscription level.

See the three tiers in every trace

When a Deep Agent runs, every tier becomes a span in Langfuse. You see the planner's decomposition, the specialists' parallel calls, and the validator's check — with timings, token counts, and the exact model used per step.

Paste any trace ID into a support ticket or audit query. The three tiers aren't inferred; they're logged.
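If you export a trace's spans, the three-tier claim can be checked mechanically. The helper below is a sketch that does not use the Langfuse SDK; it assumes a hypothetical export format in which each span is a dictionary with a `name` key matching the tier names.

```python
from typing import Dict, List

REQUIRED_TIERS = ("planner", "specialists", "validator")

def missing_tiers(spans: List[Dict[str, str]]) -> List[str]:
    """Return the tiers that have no span in the trace, in tier order."""
    seen = {span["name"] for span in spans}
    return [tier for tier in REQUIRED_TIERS if tier not in seen]
```

An empty result means all three tiers were logged for that run; anything else names the tier that is absent.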

[SCREENSHOT — anonymized Langfuse trace view, three spans labeled planner / specialists / validator, with a visible interrupt_before pause between validator and the outbound edge]

Production trace from a pilot compliance workflow. Total wall-clock: 18 seconds. Total cost: $0.04. Validator triggered the HITL gate; human approved in 4 minutes.

Security overview →
Read the docs →

FAQ

Is "Deep Agent" the same as "multi-agent"?

Multi-agent usually means you're composing several independent agents into a workflow: a research agent here, a writing agent there, glued together with orchestration code. Deep Agent means one catalog entry is internally three tiers. You can absolutely compose Deep Agents into a department (that's what our Demo 2 walkthrough shows), but even a single Deep Agent, on its own, has a team inside it.

How is this different from an LLM framework?

Frameworks (LangChain, CrewAI, AutoGen, LangGraph) give you the parts. You build the planner. You define the specialists. You wire the validator. You test the graph. Per agent. Forever. Sentrul ships 257 Deep Agents where the tiers are pre-assembled, pre-tested, and pre-wired into the multi-tenancy, billing, and observability layers. If you want to build them yourself, our architecture is open; our blog walks through the exact LangGraph patterns we use. Most teams find the 3-to-4-week-per-Deep-Agent build cost isn't the best use of their engineers.

Can I customize a Deep Agent's specialists?

Yes, on the Operations tier and up. You can swap a specialist (for example, replace our default classifier with your tuned classifier that understands your taxonomy), override the model lane for any tier, and change the validator's risk threshold. You can't break the three-tier contract, and that's intentional. The contract is what keeps audit trails consistent and traces debuggable.

What happens when the validator triggers the HITL gate?

The graph pauses at interrupt_before on the outbound edge. State persists to our PostgreSQL checkpointer, so nothing is lost. The approver (a human on your team, configurable per Deep Agent) gets notified by email, Slack, or whatever your MCP integration routes to. They approve or reject in the app. The graph resumes, or it doesn't. Every pause and resume is its own span in the trace.
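The pause-and-resume lifecycle can be illustrated without any framework. The sketch below is a plain-Python stand-in, not LangGraph's API and not Sentrul's code: `PausableRun`, its in-memory checkpoint dictionary, and the `thread_id` key are all hypothetical, and a real deployment would persist snapshots to a database rather than a dict.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, Callable[[dict], dict]]

class PausableRun:
    """Runs steps in order, checkpointing state and pausing before gated steps."""

    def __init__(self, steps: List[Step], interrupt_before: List[str]):
        self.steps = steps
        self.interrupt_before = set(interrupt_before)
        self.checkpoints: Dict[str, str] = {}  # thread_id -> JSON snapshot

    def _run(self, thread_id: str, state: dict, start: int, approved: bool) -> Optional[dict]:
        for i in range(start, len(self.steps)):
            name, fn = self.steps[i]
            resuming_here = approved and i == start
            if name in self.interrupt_before and not resuming_here:
                # Persist everything needed to resume later, then stop.
                self.checkpoints[thread_id] = json.dumps({"state": state, "next": i})
                return None
            state = fn(state)
        self.checkpoints.pop(thread_id, None)  # run finished; snapshot no longer needed
        return state

    def start(self, thread_id: str, state: dict) -> Optional[dict]:
        return self._run(thread_id, state, 0, approved=False)

    def resume(self, thread_id: str) -> Optional[dict]:
        # Called after a human approves; continues from the saved step.
        snap = json.loads(self.checkpoints[thread_id])
        return self._run(thread_id, snap["state"], snap["next"], approved=True)
```

`start` returns `None` when the run is waiting on a human, and `resume` picks up from the saved snapshot, so the work done before the pause is never repeated.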

“The pause was the feature, not the bug. We just hadn't had a system that paused before.”
— VP Operations, 120-FTE B2B SaaS (pilot, public quote approved)

Ready to hire a Deep Agent?

Start free. No card. All 257 Deep Agents.

14 days. Your keys. Full three-tier observability. No lock-in.