Case study
AI voice ordering platform
Realtime voice commerce with WebRTC audio, model streaming, and multilingual turn-taking—built for sub-second perceived latency and reliable order completion.
Lead engineer — voice pipeline, backend services, deployment
- Order completion time: −40%
- Realtime path: WebRTC + streaming
- Languages: EN / AR / UR
Reference architecture
Problem
- Phone and kiosk ordering needed a conversational layer that could handle noisy environments, code-switching, and mid-sentence corrections without restarting the flow.
- Menu data and cart state had to stay consistent while the model streamed partial intents and tool calls across an unreliable network.
Architecture
- The browser captures microphone audio as PCM and streams it over WebRTC; signaling and session metadata route through a thin FastAPI edge.
- The realtime model session holds tool definitions for menu lookup, cart mutations, and handoff to a human agent when confidence drops.
- Redis holds ephemeral session state, idempotency keys for cart writes, and short-lived per-device rate limits.
- AWS hosts the API tier, async workers for post-call analytics, and encrypted object storage for optional call artifacts where policy allows.
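The tool layer described above can be sketched as a set of function schemas plus a server-side dispatcher that logs each call and escalates to a human when confidence is low. This is a minimal sketch, not the production code: the tool names (`lookup_menu_item`, `add_to_cart`, `handoff_to_agent`), the `ToolCall` type, the `HANDOFF_THRESHOLD` value, and the per-call `confidence` score are all hypothetical. The Realtime API does not hand back a confidence number on its own, so that score is assumed to come from elsewhere in the pipeline. The schema dicts roughly follow the flat function-tool shape used in Realtime session configuration. Check the current API reference before relying on the exact fields.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical tool schemas (JSON Schema parameters, flat "function" tool shape).
TOOLS: list[dict[str, Any]] = [
    {
        "type": "function",
        "name": "lookup_menu_item",
        "description": "Find a menu item by name for the current store.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "type": "function",
        "name": "add_to_cart",
        "description": "Add a menu item to the cart.",
        "parameters": {
            "type": "object",
            "properties": {
                "item_id": {"type": "string"},
                "quantity": {"type": "integer", "minimum": 1},
            },
            "required": ["item_id", "quantity"],
        },
    },
    {
        "type": "function",
        "name": "handoff_to_agent",
        "description": "Transfer the caller to a human agent.",
        "parameters": {"type": "object", "properties": {}},
    },
]

HANDOFF_THRESHOLD = 0.6  # assumed value; would be tuned per deployment


@dataclass
class ToolCall:
    name: str
    arguments: dict[str, Any]
    confidence: float  # assumed: per-turn score supplied by the pipeline, not the model API


# Every executed call is recorded so it can be audited and replayed later.
audit_log: list[dict[str, Any]] = []


def dispatch(call: ToolCall, handlers: dict[str, Callable[..., Any]]) -> Any:
    """Route a model tool call, redirecting to human handoff when confidence is low."""
    if call.confidence < HANDOFF_THRESHOLD:
        target, args = "handoff_to_agent", {}
    else:
        target, args = call.name, call.arguments
    handler = handlers.get(target)
    if handler is None:
        raise ValueError(f"unknown tool: {target}")
    audit_log.append(
        {"requested": call.name, "executed": target, "args": args, "confidence": call.confidence}
    )
    return handler(**args)
```

Keeping the dispatch on the server means a low-confidence `add_to_cart` never touches the cart. It is logged as a redirected handoff, which matches the audit goal of server-side orchestration.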
Challenges
- Interruption handling: cancel in-flight tool calls when the user barges in, and reconcile partial transcripts against the cart snapshot.
- Multilingual routing: detect the language on each turn, pin the matching system prompt, and keep tool payloads in a single language.
- Latency budget: colocate media and API regions, trim the JSON schemas sent to the model, and prefetch menu slices by store context.
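Barge-in handling comes down to holding a handle on the in-flight tool call and cancelling it when the user starts speaking, so a half-finished mutation never reaches the cart. Below is a minimal asyncio sketch under that assumption. `TurnController` and `slow_add_to_cart` are hypothetical names, and the sleep stands in for a slow downstream service. The key design choice is that the tool coroutine commits to the cart only as its last step, so cancellation at any earlier `await` leaves the cart snapshot untouched.

```python
import asyncio
from typing import Awaitable, Optional


class TurnController:
    """Tracks the single in-flight tool call for the current turn (hypothetical helper)."""

    def __init__(self) -> None:
        self._inflight: Optional[asyncio.Task] = None

    def start_tool_call(self, coro: Awaitable) -> asyncio.Task:
        # A new tool call supersedes any previous one that is still running.
        self.cancel_inflight()
        self._inflight = asyncio.ensure_future(coro)
        return self._inflight

    def cancel_inflight(self) -> bool:
        """Cancel the pending call on barge-in; return True if something was cancelled."""
        if self._inflight is not None and not self._inflight.done():
            self._inflight.cancel()
            return True
        return False


async def slow_add_to_cart(cart: list[str], item: str) -> None:
    await asyncio.sleep(0.2)  # stands in for a slow menu or cart service
    cart.append(item)  # commit happens only if the call was not cancelled


async def demo() -> list[str]:
    cart: list[str] = []
    ctl = TurnController()
    task = ctl.start_tool_call(slow_add_to_cart(cart, "large fries"))
    await asyncio.sleep(0.01)  # the user barges in before the call finishes
    ctl.cancel_inflight()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the interrupted call was cancelled
    return cart


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Running `demo()` returns an empty cart because the add was cancelled mid-flight. After that, the next final transcript can be reconciled against an unchanged snapshot.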
Technologies
- OpenAI Realtime API
- WebRTC
- FastAPI
- Redis
- AWS
- Docker
Engineering decisions
- Chose explicit server-side orchestration over client-only prompts so that every tool invocation is logged, versioned, and replayable for audits.
- Separated the "listening" state from the "acting" state so that bursts of partial ASR results do not enqueue duplicate order lines.
- Used Redis sessions with TTLs and versioned cart documents to keep retries safe across duplicate WebSocket reconnects.
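The retry-safety and listening/acting decisions can be illustrated with a versioned cart document, idempotency keys that expire, and a gate that only lets final intents write. This is a sketch under stated assumptions. `InMemoryCartStore` is a hypothetical in-process stand-in for Redis so the example runs without a server. A Redis version might instead store the idempotency key with `SET ... NX EX` and guard the version check with `WATCH`/`MULTI`/`EXEC` or a Lua script. The idempotency key is assumed to be some stable ID attached to each tool call. `apply_intent` and its `is_final` flag are also hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


class VersionConflict(Exception):
    """Raised when a write is based on a stale cart version."""


@dataclass
class CartDoc:
    version: int = 0
    lines: list[tuple[str, int]] = field(default_factory=list)


class InMemoryCartStore:
    """Stand-in for Redis: versioned carts plus idempotency keys with a TTL."""

    def __init__(self, idem_ttl_s: float = 300.0) -> None:
        self._carts: dict[str, CartDoc] = {}
        self._idem: dict[str, tuple[float, int]] = {}  # key -> (expires_at, resulting version)
        self._ttl = idem_ttl_s

    def get(self, session_id: str) -> CartDoc:
        return self._carts.setdefault(session_id, CartDoc())

    def add_line(self, session_id: str, expected_version: int,
                 item_id: str, qty: int, idem_key: str) -> int:
        now = time.monotonic()
        seen = self._idem.get(idem_key)
        if seen is not None and seen[0] > now:
            return seen[1]  # duplicate retry: return the original result, write nothing
        cart = self.get(session_id)
        if cart.version != expected_version:
            raise VersionConflict(f"expected v{expected_version}, found v{cart.version}")
        cart.lines.append((item_id, qty))
        cart.version += 1
        self._idem[idem_key] = (now + self._ttl, cart.version)
        return cart.version


def apply_intent(store: InMemoryCartStore, session_id: str, item_id: str,
                 qty: int, call_id: str, is_final: bool) -> Optional[int]:
    """Only final ('acting') intents write; partial ('listening') ones are ignored."""
    if not is_final:
        return None
    cart = store.get(session_id)
    return store.add_line(session_id, cart.version, item_id, qty, call_id)
```

A reconnect that replays the same call returns the version it produced the first time instead of adding a second line. A write based on an outdated version fails loudly and never merges silently.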
Outcome
- Shipped a production voice path that tolerates interruptions and keeps cart integrity under concurrent partial intents.
- Reduced median time from intent to confirmed order by tightening the tool graph and prefetching high-traffic menu paths.