Case study
Multi-agent AI workflow system
Orchestrated agents with explicit task graphs, shared memory boundaries, and durable execution. LangGraph and CrewAI are coordinated behind FastAPI with Redis-backed checkpoints.
Platform engineer — orchestration, state, observability
- Orchestration: LangGraph + CrewAI
- Durability: Redis checkpoints
- Transport: SSE progress stream
Reference architecture
Problem
- Single-shot prompts could not decompose long operational runbooks into verifiable substeps.
- Teams needed delegation patterns: researcher, verifier, and executor roles with different tool access.
Architecture
- LangGraph defines the state machine: nodes are agents or tools, and edges encode branching and human approval gates.
- CrewAI crews encapsulate role prompts and toolkits for bounded subtasks invoked from graph nodes.
- Redis stores run checkpoints, dedupe keys, and distributed locks for fan-out/fan-in segments.
- FastAPI exposes run-management endpoints and streams progress events and partial artifacts to clients over SSE.
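
The explicit-graph idea with a human approval gate can be sketched in a few lines of plain Python. This deliberately does not use the LangGraph API: `Graph`, `run`, `resume`, and the three node functions are hypothetical names chosen for illustration, and the sketch assumes each node is a function from state to state while each edge is a router that picks the next node.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Set

State = Dict[str, Any]


@dataclass
class Graph:
    """Hypothetical explicit task graph: named nodes, routed edges, gated nodes."""
    nodes: Dict[str, Callable[[State], State]] = field(default_factory=dict)
    # A router inspects the state and returns the next node name, or None to stop.
    routers: Dict[str, Callable[[State], Optional[str]]] = field(default_factory=dict)
    # Nodes whose output a human must approve before the run continues.
    gated: Set[str] = field(default_factory=set)

    def run(self, state: State, start: str) -> State:
        return self._loop(dict(state), start)

    def resume(self, state: State) -> State:
        # Called after approval: continue from the edge leaving the paused node.
        next_node = self.routers[state["paused_at"]](state)
        return self._loop({**state, "paused_at": None}, next_node)

    def _loop(self, state: State, current: Optional[str]) -> State:
        while current is not None:
            state = self.nodes[current](state)
            state["trace"] = state.get("trace", []) + [current]
            if current in self.gated:
                state["paused_at"] = current  # stop here until a human approves
                return state
            current = self.routers[current](state)
        state["paused_at"] = None
        return state


def research(state: State) -> State:
    return {**state, "findings": ["disk usage at 91%"]}


def verify(state: State) -> State:
    return {**state, "verified": bool(state["findings"])}


def execute(state: State) -> State:
    return {**state, "applied": True}


graph = Graph(
    nodes={"research": research, "verify": verify, "execute": execute},
    routers={
        "research": lambda s: "verify",
        "verify": lambda s: "execute" if s["verified"] else None,
        "execute": lambda s: None,
    },
    gated={"verify"},
)

paused = graph.run({}, "research")  # stops after "verify" for review
done = graph.resume(paused)         # continues to "execute" once approved
```

Because every node and edge is named up front, the set of possible paths can be read directly from `routers`, which is the property that makes this style reviewable.
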
Challenges
- Agent memory: keep short-term thread state separate from durable facts, which are written only after validation.
- Failure recovery: resume from the last good checkpoint without double-applying side effects.
- Observability: correlate spans across agents by propagating a single run_id on every tool call.
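
One way to reason about resuming without double-applying side effects is a dedupe key claimed per step. The sketch below is an assumption about how that could look, not the system's actual code: `InMemoryStore`, `run_steps`, and the step names are hypothetical, and the in-memory set stands in for Redis, where `SET` with the `NX` option gives the same first-writer-wins behavior.

```python
from typing import Callable, Dict, List, Set, Tuple

Step = Tuple[str, Callable[[List[str]], None]]


class InMemoryStore:
    """Stand-in for Redis: one checkpoint per run plus a set of dedupe keys."""

    def __init__(self) -> None:
        self.checkpoints: Dict[str, int] = {}
        self.dedupe: Set[str] = set()

    def claim(self, key: str) -> bool:
        # First caller wins; later callers see the key and skip the effect.
        if key in self.dedupe:
            return False
        self.dedupe.add(key)
        return True


def run_steps(run_id: str, steps: List[Step], store: InMemoryStore,
              applied: List[str]) -> None:
    """Run steps from the last checkpoint, skipping effects already claimed."""
    start = store.checkpoints.get(run_id, 0)
    for index in range(start, len(steps)):
        name, effect = steps[index]
        # The run_id is part of the key, so runs never block each other.
        if store.claim(f"{run_id}:{index}:{name}"):
            effect(applied)
        store.checkpoints[run_id] = index + 1


def demo() -> List[str]:
    store = InMemoryStore()
    applied: List[str] = []
    crash_once = {"armed": True}

    def restart_service(log: List[str]) -> None:
        log.append("restart_service")
        if crash_once["armed"]:
            crash_once["armed"] = False
            # Simulate a worker dying after the effect, before the checkpoint.
            raise RuntimeError("worker died before checkpoint write")

    steps: List[Step] = [
        ("rotate_logs", lambda log: log.append("rotate_logs")),
        ("restart_service", restart_service),
        ("notify", lambda log: log.append("notify")),
    ]
    try:
        run_steps("run-42", steps, store, applied)
    except RuntimeError:
        pass  # the run is interrupted; the checkpoint still points at step 1
    run_steps("run-42", steps, store, applied)  # resume
    return applied
```

Claiming the key before applying the effect gives at-most-once behavior: a crash between the claim and the effect would skip that effect on resume. A production design would likely need the effect itself to be idempotent or to record completion atomically with the claim.
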
Technologies
- LangGraph
- CrewAI
- FastAPI
- Redis
- PostgreSQL
Engineering decisions
- Chose explicit graphs over implicit agent loops so that security stakeholders can review agent behavior.
- Enforced tool allowlists per role, with no blanket internet access for execution agents.
- Required structured outputs at boundaries between agents to reduce ambiguous handoffs.
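
The allowlist and structured-handoff decisions can be illustrated with standard-library Python. Everything named here is hypothetical (`ROLE_TOOLS`, `call_tool`, `Finding`, `parse_finding`, and the tool names), and the actual system may express the same rules through CrewAI toolkits or a schema library instead. It is a sketch of the two checks, assuming a handoff arrives as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List

# Hypothetical per-role allowlists; the executor has no search or network tool.
ROLE_TOOLS: Dict[str, FrozenSet[str]] = {
    "researcher": frozenset({"search_docs", "read_ticket"}),
    "verifier": frozenset({"read_ticket", "run_check"}),
    "executor": frozenset({"apply_change"}),
}

# Toy tool implementations so the example runs without external services.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_docs": lambda query: f"docs matching {query!r}",
    "read_ticket": lambda ticket: f"ticket {ticket}",
    "run_check": lambda check: f"check {check}: ok",
    "apply_change": lambda change: f"applied {change}",
}


class ToolNotAllowed(PermissionError):
    """Raised when a role calls a tool outside its allowlist."""


def call_tool(role: str, tool: str, arg: str) -> str:
    # Unknown roles get an empty allowlist, so the default is deny.
    if tool not in ROLE_TOOLS.get(role, frozenset()):
        raise ToolNotAllowed(f"{role} may not call {tool}")
    return TOOLS[tool](arg)


@dataclass(frozen=True)
class Finding:
    """Structured handoff from the researcher to the verifier."""
    claim: str
    evidence: List[str]
    confidence: float

    def __post_init__(self) -> None:
        if not self.claim.strip():
            raise ValueError("claim must not be empty")
        if not self.evidence:
            raise ValueError("at least one piece of evidence is required")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def parse_finding(payload: Dict[str, Any]) -> Finding:
    """Reject payloads with missing or unexpected fields before handing off."""
    expected = {"claim", "evidence", "confidence"}
    if set(payload) != expected:
        raise ValueError(f"expected fields {sorted(expected)}, got {sorted(payload)}")
    return Finding(
        claim=str(payload["claim"]),
        evidence=list(payload["evidence"]),
        confidence=float(payload["confidence"]),
    )
```

Rejecting a malformed handoff at the boundary means the downstream agent never has to guess what an upstream agent meant, and the deny-by-default lookup keeps a misconfigured role from gaining tools by accident.
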
Outcome
- Durable multi-step workflows suitable for production change management and document-heavy pipelines.
- Operators can inspect each node's output before downstream agents proceed when policy requires it.