
Twilio Call Flow

Inbound phone calls flow through Twilio into the API service, which bridges the audio to an AI provider (OpenAI Realtime or Gemini Live).

Flow

Detailed Steps

Caller dials Twilio number
        |
        v
POST /api/twiml                    -- Twilio webhook (serve.js)
   |
   +- AI disabled? -> <Dial> redirect to ai.redirectPhone
   |
   +- AI enabled? -> <Connect><Stream> to /api/media-stream
        |
        v
WS /api/media-stream               -- Twilio opens WebSocket (twilioHandler.js)
   |
   +- "start" event:
   |     1. Extract caller phone + dialed number from stream params
   |     2. lookupBusinessByPhone() -- find business by Twilio number in Firestore
   |     3. Create interaction ID (pre-generated for tool references)
   |     4. Auto-register caller via getOrCreateCaller()
   |     5. Check return caller status (recently verified? skip re-verification)
   |     6. Load tool definitions + build system prompt
   |     7. Connect to AI provider (GPT or Gemini, with fallback)
   |
   +- "media" events:
   |     Audio conversion: Twilio mulaw 8kHz -> PCM16 (24kHz for OpenAI, 16kHz for Gemini)
   |     Forward to AI provider WebSocket
   |
   +- AI audio responses:
   |     Audio conversion: PCM16 -> mulaw 8kHz
   |     Send back to Twilio WebSocket as "media" events
   |
   +- AI tool calls:
   |     executeTool() with call context (callSid, callerPhone, channel="phone")
   |     Result sent back to AI -> AI continues speaking
   |     Special cases: transfer_call/end_call stop further AI responses
   |
   +- "stop" event:
         Write interaction to Firestore (transcript, actions, outcome, duration)

Business Lookup

The business is found by matching the dialed phone number against integrations.twilio.phoneNumber in Firestore. If no business matches, the stream is closed immediately.
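The match-or-close behavior can be illustrated with an in-memory stand-in for Firestore. Everything here is a sketch: findBusinessByPhone, resolveBusinessOrClose, and the closeStream callback are hypothetical names, and the real lookupBusinessByPhone() presumably runs a Firestore query rather than scanning an array.

```javascript
// In-memory stand-in for the Firestore lookup; the real lookupBusinessByPhone() may differ.
function findBusinessByPhone(businesses, dialedNumber) {
  const match = businesses.find(
    (business) => business.integrations?.twilio?.phoneNumber === dialedNumber
  );
  return match ?? null;
}

// Returns the business, or closes the stream and returns null when nothing matches.
function resolveBusinessOrClose(businesses, dialedNumber, closeStream) {
  const business = findBusinessByPhone(businesses, dialedNumber);
  if (!business) {
    closeStream(); // no business owns this number, so there is nothing to bridge
    return null;
  }
  return business;
}
```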

Return Caller Recognition

Before building the prompt, the handler checks whether the caller was recently verified:

  1. Looks up the caller by phone number in businesses/{uid}/callers
  2. Checks that ai.returnCallerEnabled is true
  3. Checks that caller.lastVerifiedAt falls within the last ai.returnCallerWindow minutes (default 60)
  4. If ai.returnCallerMobileOnly is true (the default), requires the caller to be on a mobile number

If all conditions pass, callerVerified=true is set in the call context and the caller's name is injected into the system prompt so the AI greets them by name and skips verification.
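The four checks above could be combined roughly as follows. This is a sketch under assumptions: isReturnCaller is a hypothetical name, lastVerifiedAt is treated as epoch milliseconds (the stored value may well be a Firestore Timestamp), and how a number is classified as mobile is not described, so it is passed in as a boolean.

```javascript
// Hypothetical sketch of the return-caller decision; field defaults follow the list above.
function isReturnCaller(ai, caller, { now = Date.now(), isMobile = false } = {}) {
  if (!caller) return false; // no caller record found for this phone number
  if (ai.returnCallerEnabled !== true) return false;

  const windowMinutes = ai.returnCallerWindow ?? 60; // default 60 minutes
  const mobileOnly = ai.returnCallerMobileOnly ?? true; // default true

  // Assumption: lastVerifiedAt is a number of milliseconds since the epoch.
  if (typeof caller.lastVerifiedAt !== "number") return false;
  const ageMs = now - caller.lastVerifiedAt;
  if (ageMs < 0 || ageMs > windowMinutes * 60000) return false;

  if (mobileOnly && !isMobile) return false;
  return true;
}
```

When the function returns true, the handler would set callerVerified in the call context and add the caller's name to the system prompt.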

AI Provider Selection + Fallback

The provider comes from business.ai.voiceProvider (default "gpt"). If the primary provider fails to connect, the handler falls back to the other provider:

  • GPT fails → try Gemini
  • Gemini fails → try GPT

Fallback is attempted only once. If both providers fail to connect, the call ends and an error interaction is logged.
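A single-attempt fallback like the one described could look like this. It is a sketch, not the handler's code: connectWithFallback and the connectors map are hypothetical, and "gemini" is an assumed identifier for the second provider (only "gpt" appears above as a value of voiceProvider).

```javascript
// Hypothetical sketch: try the primary provider, then the other one exactly once.
// connectors maps a provider id to an async function that opens a session.
async function connectWithFallback(primary, connectors) {
  const secondary = primary === "gpt" ? "gemini" : "gpt"; // "gemini" is an assumed id
  try {
    return { provider: primary, session: await connectors[primary]() };
  } catch (primaryError) {
    try {
      return { provider: secondary, session: await connectors[secondary]() };
    } catch (secondaryError) {
      // Both failed: surface both causes so the caller can log an error interaction.
      throw new AggregateError(
        [primaryError, secondaryError],
        "Both AI providers failed to connect"
      );
    }
  }
}
```

Injecting the connect functions keeps the fallback logic testable without opening real provider WebSockets.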

Call Context

Tools receive this context during phone calls:

{
  callSid,           // Twilio call SID
  accountSid,        // Twilio account SID
  authToken,         // Twilio auth token (for API calls like transfer)
  businessPhone,     // The dialed Twilio number
  callerPhone,       // Caller's phone number
  channel: "phone",  // Distinguishes from "web"/"portal"
  interactionId,     // Pre-generated ID for cross-referencing
  callerVerified,    // true if return caller recognized
  callerId,          // Firestore caller doc ID if recognized
}

Call-Ending Tools

transfer_call and end_call are handled specially: once either tool has executed, the handler stops forwarding AI responses, so the AI does not keep speaking while the call is being redirected or hung up.
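One way to picture this behavior is a small gate that sits between the AI provider and the Twilio WebSocket. The sketch below is an assumption about structure, not the code in twilioHandler.js: createAudioGate, onToolExecuted, onAiAudio, and sendToTwilio are hypothetical names. The outbound message shape (event, streamSid, media.payload) follows Twilio's Media Streams format.

```javascript
// Hypothetical sketch: drop AI audio once a call-ending tool has run.
const CALL_ENDING_TOOLS = new Set(["transfer_call", "end_call"]);

function createAudioGate(streamSid, sendToTwilio) {
  let callEnding = false;
  return {
    // Call after executeTool() returns for any tool.
    onToolExecuted(toolName) {
      if (CALL_ENDING_TOOLS.has(toolName)) callEnding = true;
    },
    // Call with each mulaw 8kHz base64 chunk produced for the caller.
    // Returns true if the chunk was forwarded, false if it was dropped.
    onAiAudio(mulawBase64) {
      if (callEnding) return false;
      sendToTwilio(
        JSON.stringify({
          event: "media",
          streamSid,
          media: { payload: mulawBase64 },
        })
      );
      return true;
    },
  };
}
```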

Audio Conversion

Handled by audioUtils.js:

  • twilioToOpenAI() — mulaw 8kHz base64 → PCM16 24kHz base64
  • twilioToGemini() — mulaw 8kHz base64 → PCM16 16kHz base64
  • openAIToTwilio() — PCM16 24kHz base64 → mulaw 8kHz base64
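The inbound direction can be approximated with a standard G.711 mu-law decoder followed by integer upsampling. This is a minimal sketch, not the contents of audioUtils.js: the function names are hypothetical, the output is assumed to be little-endian PCM16, and linear interpolation stands in for whatever resampling the real helpers use. A factor of 3 gives 24kHz (OpenAI) and a factor of 2 gives 16kHz (Gemini).

```javascript
// Decode one G.711 mu-law byte to a signed 16-bit PCM sample.
function mulawToPcm16(byte) {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return u & 0x80 ? -magnitude : magnitude;
}

// Upsample by an integer factor using linear interpolation between neighbors.
// Assumption: a simple interpolator is good enough for illustration.
function upsampleLinear(samples, factor) {
  const out = new Int16Array(samples.length * factor);
  for (let i = 0; i < samples.length; i++) {
    const current = samples[i];
    const next = i + 1 < samples.length ? samples[i + 1] : current;
    for (let k = 0; k < factor; k++) {
      out[i * factor + k] = Math.round(current + ((next - current) * k) / factor);
    }
  }
  return out;
}

// mulaw 8kHz base64 -> PCM16 base64 at 8kHz * factor (little-endian samples).
function mulawBase64ToPcm16Base64(payload, factor) {
  const mulaw = Buffer.from(payload, "base64");
  const pcm8k = Int16Array.from(mulaw, (byte) => mulawToPcm16(byte));
  const pcm = upsampleLinear(pcm8k, factor);
  const out = Buffer.alloc(pcm.length * 2);
  for (let i = 0; i < pcm.length; i++) out.writeInt16LE(pcm[i], i * 2);
  return out.toString("base64");
}
```

The return path (PCM16 back to mulaw 8kHz) would reverse these steps: downsample, then mu-law encode each sample.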

Interaction Logging

When the stream ends, writeInteraction() persists the interaction to businesses/{uid}/interactions/{id}:

  • Outcome: "handled" (normal), "escalated" (transferred), or "failed" (error)
  • Full transcript (user, assistant, tool entries)
  • Actions taken with success/failure flags
  • Duration in seconds
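As a rough illustration of how such a record might be assembled before it is written, the sketch below derives the outcome and duration from the call state. All names are hypothetical (buildInteraction, the field names, and the shape of the actions entries), and the rule for choosing the outcome is an assumption based on the list above, not a description of writeInteraction().

```javascript
// Hypothetical sketch of assembling the interaction record; field names are assumed.
function buildInteraction({ startedAt, endedAt, transcript, actions, error }) {
  let outcome = "handled"; // normal completion
  if (error) {
    outcome = "failed";
  } else if (actions.some((a) => a.tool === "transfer_call" && a.success)) {
    outcome = "escalated"; // the call was transferred
  }
  return {
    outcome,
    transcript, // user, assistant, and tool entries in order
    actions, // each entry carries a success flag
    durationSeconds: Math.round((endedAt - startedAt) / 1000),
  };
}
```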
