Twilio Call Flow
Inbound phone calls flow through Twilio into the API service, which bridges the audio to an AI provider (OpenAI Realtime or Gemini Live).
Flow
Detailed Steps
Caller dials Twilio number
|
v
POST /api/twiml -- Twilio webhook (serve.js)
|
+- AI disabled? -> <Dial> redirect to ai.redirectPhone
|
+- AI enabled? -> <Connect><Stream> to /api/media-stream
|
v
WS /api/media-stream -- Twilio opens WebSocket (twilioHandler.js)
|
+- "start" event:
| 1. Extract caller phone + dialed number from stream params
| 2. lookupBusinessByPhone() -- find business by Twilio number in Firestore
| 3. Create interaction ID (pre-generated for tool references)
| 4. Auto-register caller via getOrCreateCaller()
| 5. Check return caller status (recently verified? skip re-verification)
| 6. Load tool definitions + build system prompt
| 7. Connect to AI provider (GPT or Gemini, with fallback)
|
+- "media" events:
| Audio conversion: Twilio mulaw 8kHz -> PCM16 (24kHz for OpenAI, 16kHz for Gemini)
| Forward to AI provider WebSocket
|
+- AI audio responses:
| Audio conversion: PCM16 -> mulaw 8kHz
| Send back to Twilio WebSocket as "media" events
|
+- AI tool calls:
| executeTool() with call context (callSid, callerPhone, channel="phone")
| Result sent back to AI -> AI continues speaking
| Special cases: transfer_call/end_call stop further AI responses
|
+- "stop" event:
  Write interaction to Firestore (transcript, actions, outcome, duration)
Business Lookup
The business is found by matching the dialed phone number against integrations.twilio.phoneNumber in Firestore. If no match, the stream is closed immediately.
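The lookup-then-close behavior can be sketched as follows. This is a minimal illustration, not the handler's actual code: an in-memory array stands in for the Firestore query, and `findBusinessByDialedNumber`, `resolveBusinessOrClose`, and the `stream` object are hypothetical names introduced here.

```javascript
// Hypothetical stand-in for lookupBusinessByPhone(): an array replaces Firestore.
function findBusinessByDialedNumber(businesses, dialedNumber) {
  const match = businesses.find(
    (b) => b.integrations?.twilio?.phoneNumber === dialedNumber
  );
  return match ?? null;
}

// Returns the matched business, or closes the stream and returns null.
// `stream` is assumed to expose a close() method, as a WebSocket does.
function resolveBusinessOrClose(businesses, dialedNumber, stream) {
  const business = findBusinessByDialedNumber(businesses, dialedNumber);
  if (!business) {
    stream.close(); // no match: end the media stream immediately
    return null;
  }
  return business;
}
```

An exact string match is assumed here; whether the real lookup normalizes number formats first is not stated in this document.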
Return Caller Recognition
Before building the prompt, the handler checks if the caller was recently verified:
- Looks up the caller by phone in businesses/{uid}/callers
- Checks that ai.returnCallerEnabled is true
- Checks that caller.lastVerifiedAt is within ai.returnCallerWindow minutes (default 60)
- If ai.returnCallerMobileOnly is set (default true), the caller must be calling from a mobile number
If all conditions pass, callerVerified=true is set in the call context and the caller's name is injected into the system prompt so the AI greets them by name and skips verification.
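The conditions above can be combined into a single predicate. The sketch below is an illustration under assumptions: the setting names come from this document, but the function name `isReturnCaller`, the epoch-millisecond format of `lastVerifiedAt`, and passing the mobile-number result in as `isMobile` are all hypothetical.

```javascript
// Hypothetical helper. Assumes caller.lastVerifiedAt is epoch milliseconds and
// that the mobile-number check has already been done elsewhere (isMobile).
function isReturnCaller(caller, ai, { now = Date.now(), isMobile = false } = {}) {
  if (!caller || !ai || ai.returnCallerEnabled !== true) return false;
  if (typeof caller.lastVerifiedAt !== "number") return false;

  const windowMinutes = ai.returnCallerWindow ?? 60; // default 60 minutes
  const mobileOnly = ai.returnCallerMobileOnly ?? true; // default true

  const ageMs = now - caller.lastVerifiedAt;
  if (ageMs < 0 || ageMs > windowMinutes * 60_000) return false;
  if (mobileOnly && !isMobile) return false;
  return true;
}
```

Keeping the check free of I/O makes each condition easy to unit test; the caller lookup and the mobile-number detection stay outside the predicate.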
AI Provider Selection + Fallback
The provider comes from business.ai.voiceProvider (default "gpt"). If the primary provider fails to connect, it falls back:
- GPT fails → try Gemini
- Gemini fails → try GPT
Fallback is attempted once. If both fail, the call ends with an error interaction logged.
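A single-attempt fallback of this kind might look like the sketch below. It is not the handler's actual code: `connectWithFallback` and the `connectors` map are hypothetical, and `"gemini"` is assumed as the non-default value of voiceProvider.

```javascript
// Hypothetical sketch. `connectors` maps a provider name to an async function
// that resolves to a session object or rejects on connection failure.
async function connectWithFallback(voiceProvider, connectors) {
  const primary = voiceProvider === "gemini" ? "gemini" : "gpt"; // default "gpt"
  const secondary = primary === "gpt" ? "gemini" : "gpt";

  try {
    return { provider: primary, session: await connectors[primary]() };
  } catch (primaryError) {
    try {
      // Exactly one fallback attempt, against the other provider.
      return { provider: secondary, session: await connectors[secondary]() };
    } catch (secondaryError) {
      // Both failed: the caller of this function is expected to end the call
      // and log an error interaction.
      throw new AggregateError(
        [primaryError, secondaryError],
        "All AI providers failed to connect"
      );
    }
  }
}
```

Returning the provider name alongside the session lets the rest of the handler pick the matching audio conversion.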
Call Context
Tools receive this context during phone calls:
{
  callSid,           // Twilio call SID
  accountSid,        // Twilio account SID
  authToken,         // Twilio auth token (for API calls like transfer)
  businessPhone,     // The dialed Twilio number
  callerPhone,       // Caller's phone number
  channel: "phone",  // Distinguishes from "web"/"portal"
  interactionId,     // Pre-generated ID for cross-referencing
  callerVerified,    // true if return caller recognized
  callerId,          // Firestore caller doc ID if recognized
}
Call-Ending Tools
transfer_call and end_call have special behavior: after either one executes, the handler stops forwarding AI responses so the AI does not keep speaking while the call is being redirected or hung up.
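One way to implement this is a small gate that flips once a call-ending tool has run. The sketch below is an assumption about structure, not the handler's actual code: `createAiResponseGate` and the `send` callback are hypothetical. The outbound message shape (`event`, `streamSid`, `media.payload`) follows Twilio's Media Streams format for sending audio back to the call.

```javascript
const CALL_ENDING_TOOLS = new Set(["transfer_call", "end_call"]);

// Hypothetical gate: once a call-ending tool has executed, AI audio is dropped.
function createAiResponseGate() {
  let callEnding = false;
  return {
    onToolExecuted(toolName) {
      if (CALL_ENDING_TOOLS.has(toolName)) callEnding = true;
    },
    // `send` is assumed to write a string to the Twilio WebSocket.
    forwardAudio(streamSid, mulawBase64, send) {
      if (callEnding) return false; // suppress speech during transfer or hangup
      send(JSON.stringify({
        event: "media",
        streamSid,
        media: { payload: mulawBase64 },
      }));
      return true;
    },
  };
}
```

The gate is one-way by design: nothing resets it, since the call is already on its way out once either tool has run.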
Audio Conversion
Handled by audioUtils.js:
- twilioToOpenAI(): mulaw 8kHz base64 → PCM16 24kHz base64
- twilioToGemini(): mulaw 8kHz base64 → PCM16 16kHz base64
- openAIToTwilio(): PCM16 24kHz base64 → mulaw 8kHz base64
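To illustrate the inbound direction, the sketch below decodes G.711 mu-law bytes to 16-bit PCM and upsamples by an integer factor (3 for 8kHz to 24kHz, 2 for 8kHz to 16kHz). It is not the code in audioUtils.js: the function names are hypothetical, and the linear interpolation and little-endian output are assumptions, since the document does not say which resampling method is used.

```javascript
// Standard G.711 mu-law expansion of one byte to a signed 16-bit sample.
function decodeMulawSample(byte) {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Hypothetical converter: mulaw 8kHz base64 -> PCM16 little-endian base64,
// upsampled by `factor` using linear interpolation (an assumed method).
function mulawToPcm16Sketch(mulawBase64, factor) {
  const input = Buffer.from(mulawBase64, "base64");
  const out = Buffer.alloc(input.length * factor * 2);
  for (let i = 0; i < input.length; i++) {
    const cur = decodeMulawSample(input[i]);
    // The last sample has no successor, so it is simply repeated.
    const next = i + 1 < input.length ? decodeMulawSample(input[i + 1]) : cur;
    for (let k = 0; k < factor; k++) {
      const value = Math.round(cur + ((next - cur) * k) / factor);
      out.writeInt16LE(value, (i * factor + k) * 2);
    }
  }
  return out.toString("base64");
}
```

Linear interpolation is cheap but does not filter imaging artifacts; a production converter may use a proper low-pass resampler instead.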
Interaction Logging
On stream end, writeInteraction() persists the interaction to businesses/{uid}/interactions/{id}, including:
- Outcome: "handled" (normal), "escalated" (transferred), or "failed" (error)
- Full transcript (user, assistant, tool entries)
- Actions taken with success/failure flags
- Duration in seconds
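The record described above could be assembled as in the sketch below. Only the three outcome values and the listed contents come from this document. The function name, the input and output field names, and the choice to let an error take precedence over a transfer are assumptions.

```javascript
// Hypothetical builder for the document that writeInteraction() would persist.
function buildInteractionRecord({
  startedAtMs,
  endedAtMs,
  transcript = [],
  actions = [],
  transferred = false,
  error = null,
}) {
  // Assumed precedence: an error marks the call "failed" even if it was transferred.
  const outcome = error ? "failed" : transferred ? "escalated" : "handled";
  return {
    outcome,
    transcript, // entries such as { role: "user" | "assistant" | "tool", text }
    actions, // entries such as { tool: "transfer_call", success: true }
    durationSeconds: Math.max(0, Math.round((endedAtMs - startedAtMs) / 1000)),
  };
}
```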