Case study

AI voice ordering platform

Realtime voice commerce with WebRTC audio, model streaming, and multilingual turn-taking—built for sub-second perceived latency and reliable order completion.

Lead engineer — voice pipeline, backend services, deployment

Order completion time: −40%
Realtime path: WebRTC + streaming
Languages: EN / AR / UR

Reference architecture

Problem

  • Phone and kiosk ordering needed a conversational layer that could handle noisy environments, code-switching, and mid-sentence corrections without restarting the flow.
  • Menu data and cart state had to stay consistent while the model streamed partial intents and tool calls across an unreliable network.

Architecture

  • Browser captures PCM via WebRTC; signaling and session metadata route through a thin FastAPI edge.
  • Realtime model session maintains tool definitions for menu lookup, cart mutations, and handoff to human agents when confidence drops.
  • Redis holds ephemeral session state, idempotency keys for cart writes, and short-lived rate limits per device.
  • AWS hosts the API tier, async workers for post-call analytics, and encrypted object storage for optional call artifacts where policy allows.
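
The idempotency keys for cart writes can be illustrated with a minimal sketch. It is not the production code: `FakeRedis`, `apply_cart_write`, and the `idem:<session>:<key>` format are hypothetical names chosen for this example. The in-memory class only imitates the `set(name, value, nx=True, ex=...)` call of the redis-py client so the example runs without a server.

```python
import time
from typing import Optional


class FakeRedis:
    """In-memory stand-in for the one redis-py call used below (assumption)."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[str, Optional[float]]] = {}

    def set(self, name: str, value: str, nx: bool = False, ex: Optional[int] = None):
        now = time.monotonic()
        current = self._data.get(name)
        # Treat an expired key as absent, as Redis would.
        if current is not None and current[1] is not None and current[1] <= now:
            current = None
        if nx and current is not None:
            return None  # redis-py returns None when NX blocks the write
        expires_at = now + ex if ex is not None else None
        self._data[name] = (value, expires_at)
        return True


def apply_cart_write(client, session_id: str, idempotency_key: str,
                     cart: list[str], item: str, ttl_seconds: int = 300) -> bool:
    """Append `item` only the first time this idempotency key is seen."""
    key = f"idem:{session_id}:{idempotency_key}"  # hypothetical key format
    first_time = client.set(key, "1", nx=True, ex=ttl_seconds)
    if not first_time:
        return False  # duplicate delivery: skip the mutation
    cart.append(item)
    return True


client = FakeRedis()
cart: list[str] = []
first = apply_cart_write(client, "sess-1", "call-42", cart, "chicken biryani")
retry = apply_cart_write(client, "sess-1", "call-42", cart, "chicken biryani")
```

Here the retried tool call with the same key is dropped, so the cart holds a single line. The TTL keeps keys short-lived, in line with the ephemeral session state described above.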

Challenges

  • Interruption handling: cancel in-flight tool calls when the user barges in, and reconcile partial transcripts with the cart snapshot.
  • Multilingual routing: detect language per turn, pin system prompts, and avoid mixed-language tool payloads.
  • Latency budget: colocate media and API regions, trim JSON schemas sent to the model, and prefetch menu slices by store context.
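
The interruption handling above can be sketched with `asyncio` task cancellation. This is an illustrative sketch under assumptions, not the shipped pipeline: `TurnController`, its methods, and the simulated delay are hypothetical. The tool call mutates the cart only after its slow step finishes, so cancelling it mid-flight leaves the cart snapshot unchanged.

```python
import asyncio
from typing import Optional


class TurnController:
    """Tracks the in-flight tool call for one voice session (hypothetical)."""

    def __init__(self) -> None:
        self.cart: list[str] = []
        self._inflight: Optional[asyncio.Task] = None

    async def _add_item(self, item: str, delay: float) -> None:
        # Simulated slow tool call. The cart changes only after the await,
        # so a cancellation during the wait leaves the cart untouched.
        await asyncio.sleep(delay)
        self.cart.append(item)

    def start_tool_call(self, item: str, delay: float = 0.05) -> None:
        self._inflight = asyncio.create_task(self._add_item(item, delay))

    async def barge_in(self) -> bool:
        """Cancel the in-flight tool call; return True if one was cancelled."""
        task, self._inflight = self._inflight, None
        if task is None or task.done():
            return False
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass  # expected: the tool call was interrupted
        return True

    async def wait_idle(self) -> None:
        if self._inflight is not None:
            await self._inflight
            self._inflight = None


async def demo() -> tuple[bool, list[str]]:
    ctl = TurnController()
    ctl.start_tool_call("large fries")   # first intent
    await asyncio.sleep(0)               # let the tool call start
    cancelled = await ctl.barge_in()     # user interrupts with a correction
    ctl.start_tool_call("small fries")   # corrected intent
    await ctl.wait_idle()
    return cancelled, ctl.cart


cancelled, final_cart = asyncio.run(demo())
```

Only the corrected item reaches the cart. A real session would also need to reconcile the partial transcript with the cart, which this sketch leaves out.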

Technologies

  • OpenAI Realtime API
  • WebRTC
  • FastAPI
  • Redis
  • AWS
  • Docker

Engineering decisions

  • Chose explicit server-side orchestration over client-only prompts so every tool invocation is logged, versioned, and replayable for audits.
  • Separated "listening" state from "acting" state so bursts of partial ASR transcripts do not enqueue duplicate order lines.
  • Used Redis sessions with TTLs and versioned cart documents to make retries safe under duplicate WebSocket reconnects.
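
One way to picture the versioned cart documents is an optimistic version check: each write states the version it was based on, and a stale or replayed write is rejected. The sketch below is an assumption about the general technique, not the actual schema. `CartDoc`, `CartStore`, and `VersionConflict` are hypothetical names, and the in-memory dict stands in for Redis, where the check and the write would need to happen atomically.

```python
from dataclasses import dataclass, field


@dataclass
class CartDoc:
    version: int = 0
    lines: list[str] = field(default_factory=list)


class VersionConflict(Exception):
    """Raised when a write is based on a stale cart version."""


class CartStore:
    """In-memory stand-in for the session store (assumption)."""

    def __init__(self) -> None:
        self._docs: dict[str, CartDoc] = {}

    def get(self, session_id: str) -> CartDoc:
        doc = self._docs.setdefault(session_id, CartDoc())
        return CartDoc(doc.version, list(doc.lines))  # hand out a copy

    def add_line(self, session_id: str, expected_version: int, line: str) -> CartDoc:
        doc = self._docs.setdefault(session_id, CartDoc())
        if doc.version != expected_version:
            raise VersionConflict(
                f"expected v{expected_version}, store has v{doc.version}"
            )
        doc.lines.append(line)
        doc.version += 1  # every accepted write bumps the version
        return CartDoc(doc.version, list(doc.lines))


store = CartStore()
snapshot = store.get("sess-1")
updated = store.add_line("sess-1", snapshot.version, "mango lassi")

# A reconnect replays the same write against the old snapshot version.
try:
    store.add_line("sess-1", snapshot.version, "mango lassi")
    replay_rejected = False
except VersionConflict:
    replay_rejected = True

final = store.get("sess-1")
```

The replayed write fails the version check, so the cart keeps one line. A client that hits the conflict can re-read the cart and decide whether the write is still needed.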

Outcome

  • Shipped a production voice path that tolerates interruptions and keeps cart integrity under concurrent partial intents.
  • Reduced median time from intent to confirmed order by tightening the tool graph and prefetching high-traffic menu paths.
