ADR-001: Firebase Functions for API + Cloud Run for Voice/Chat
Status
Accepted
Date
2025-01-15
Context
The Humanlike platform needs two distinct backend services:
- Dashboard/admin APIs -- CRUD operations for business management, staff, calendar OAuth, Stripe billing, and static site generation. These are short-lived HTTP request/response cycles initiated by the web dashboard.
- Voice and chat APIs -- real-time audio streaming (Twilio WebSocket media streams), OpenAI Realtime API WebSocket connections, and streaming chat completions. These are long-lived connections that can last minutes per call.
Firebase Cloud Functions do not natively support WebSocket upgrades, and their timeouts are capped: 540 seconds for 1st gen functions and for 2nd gen event-driven functions, and at most 60 minutes for 2nd gen HTTP functions. Voice calls require persistent bidirectional WebSocket connections between Twilio and the AI provider, with audio format conversion happening in real time.
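To make the real-time conversion step concrete, here is a minimal sketch of one possible conversion. It assumes the audio arriving from Twilio is 8-bit G.711 mu-law (the encoding Twilio media streams use) and that the target is 16-bit linear PCM. The function names are hypothetical, and the ADR does not state which conversion the service actually performs.

```typescript
// Bias added during G.711 mu-law encoding; removed again when decoding.
const MULAW_BIAS = 0x84;

/** Decode one 8-bit mu-law byte into a signed 16-bit linear PCM sample. */
function decodeMuLawSample(byte: number): number {
  const u = ~byte & 0xff; // mu-law bytes are transmitted bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS;
  return sign !== 0 ? -magnitude : magnitude;
}

/**
 * Decode a whole audio frame (hypothetical helper).
 * Takes raw bytes; base64 decoding of the WebSocket payload is assumed
 * to have happened before this call.
 */
function decodeMuLawFrame(frame: Uint8Array): Int16Array {
  return Int16Array.from(frame, (b) => decodeMuLawSample(b));
}
```

Because each sample is decoded independently, a frame can be converted as soon as it arrives on the socket, which is why the work fits inside a long-lived connection handler rather than a request/response function.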
Additionally, the voice/chat service needs to be written in plain JavaScript (no build step) for rapid iteration on prompt engineering and tool dispatch, while the dashboard APIs benefit from TypeScript and Express router structure shared with the rest of the monorepo.
Decision
Split the backend into two deployment targets:
- Firebase Cloud Functions (`functions/`) -- TypeScript Express routers for dashboard APIs: `adminApi`, `calendarApi`, `siteApi`, `staffApi`, `stripeApi`. Deployed automatically on push to `main` when `functions/**` or `shared/**` change.
- Cloud Run (`services/api/`) -- plain JavaScript HTTP + WebSocket server for voice and chat: Twilio TwiML webhook, Twilio media stream WebSocket handler, OpenAI ephemeral token creation, tool execution, chat completions proxy, portal resolution. Deployed as a Docker container on push when `services/api/**` changes.
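The path-based deploy rule for the two targets can be sketched as a small pure function. This is an illustration only: the actual CI configuration is not shown in this ADR, and the names `DeployTarget`, `TRIGGERS`, and `targetsForChange` are hypothetical. Only the path prefixes come from the text.

```typescript
type DeployTarget = "functions" | "cloud-run";

// Path prefixes that trigger each deployment, as described in the decision.
const TRIGGERS: Record<DeployTarget, readonly string[]> = {
  functions: ["functions/", "shared/"],
  "cloud-run": ["services/api/"],
};

/** Return the deployment targets affected by a set of changed file paths. */
function targetsForChange(changedFiles: readonly string[]): DeployTarget[] {
  const targets = Object.keys(TRIGGERS) as DeployTarget[];
  return targets.filter((target) =>
    changedFiles.some((file) =>
      TRIGGERS[target].some((prefix) => file.startsWith(prefix)),
    ),
  );
}
```

For example, a push touching only `shared/` files would select the Cloud Functions deployment alone, while a push touching neither prefix would deploy nothing.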
Both services share Firebase Auth for authentication and Firestore for data, using the same @humanlike/shared types and path builders (Cloud Functions via TypeScript import, Cloud Run via runtime reference).
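As a rough illustration of what a shared path builder might look like, the sketch below builds Firestore document paths from typed arguments. The collection names (`businesses`, `staff`) and the `paths` object are assumptions made for this example; the real `@humanlike/shared` exports are not shown in this ADR.

```typescript
/** Reject IDs that would break a Firestore path (empty, or containing "/"). */
function segment(id: string): string {
  if (id.length === 0 || id.includes("/")) {
    throw new Error(`Invalid Firestore path segment: "${id}"`);
  }
  return id;
}

// Hypothetical builders; collection names are placeholders, not the real schema.
const paths = {
  business: (businessId: string): string =>
    `businesses/${segment(businessId)}`,
  staffMember: (businessId: string, staffId: string): string =>
    `businesses/${segment(businessId)}/staff/${segment(staffId)}`,
};
```

Keeping path construction in one place means both services resolve the same document for the same IDs, whether the builder is imported from TypeScript or referenced from plain JavaScript at runtime.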
Alternatives Considered
| Alternative | Why Rejected |
|---|---|
| Everything in Firebase Cloud Functions | Cloud Functions do not support the WebSocket upgrades needed for Twilio media streams and OpenAI Realtime. Function timeouts (540s for event-driven functions, at most 60 minutes for 2nd gen HTTP functions) would also constrain long calls. |
| Everything in Cloud Run | Would lose Firebase Functions conveniences (automatic scaling, tight integration with Firestore triggers, Express router pattern). Dashboard APIs do not need long-lived connections. |
| Single Cloud Run service with TypeScript | The voice/chat service benefits from plain JS for fast iteration without a build step. Mixing TS and JS in one service adds complexity. |
| AWS Lambda + API Gateway WebSocket | Would require migrating away from Firebase ecosystem (Auth, Firestore, Hosting) and introducing cross-cloud complexity. |
| Self-hosted server (GCE/EC2) | Adds operational burden for scaling, patching, and availability. Cloud Run provides managed scaling with pay-per-use. |
Consequences
Positive:
- Voice/chat service supports long-lived WebSocket connections, bounded only by the Cloud Run request timeout (configurable up to 60 minutes)
- Each service can scale independently based on its traffic pattern
- Plain JS in `services/api/` enables rapid prompt and tool iteration without build steps
- Dashboard APIs get TypeScript safety and Express router structure
- Path-based CI deploys only what changed
Negative:
- Two deployment targets to monitor and debug
- Shared logic (Firebase Auth verification, Firestore access) is duplicated across both services rather than shared as a library
- Cloud Run requires a Dockerfile and container registry management
Neutral:
- Both services authenticate via Firebase Auth tokens, so the auth model is consistent
- Firestore is the shared data layer, keeping both services in sync without inter-service calls