A real-time voice agent built on the Pipecat framework. It uses the Google Gemini Live API for speech-to-speech conversations over Plivo telephony.
- Pipecat Framework: Modular pipeline architecture for voice AI
- Speech-to-Speech: Native audio using Gemini Live API
- Multi-turn Conversations: Maintains context across conversation turns
- Voice Activity Detection: Silero VAD for natural turn-taking
- Low Latency: Real-time bidirectional audio streaming
- Python 3.10+
- uv package manager (recommended)
- Google AI API key with Gemini Live API access
- Plivo account with a phone number
- ngrok (for local development)
cd gemini2.5-live-pipecat
uv sync
Or with pip:
pip install -e .
cp .env.example .env
Edit .env with your credentials:
GEMINI_API_KEY=your_gemini_api_key
PLIVO_AUTH_ID=your_plivo_auth_id
PLIVO_AUTH_TOKEN=your_plivo_auth_token
PUBLIC_URL=https://your-ngrok-url.ngrok-free.app
ngrok http 8000
Copy the ngrok URL to PUBLIC_URL in your .env file.
Set your Plivo phone number's Answer URL to:
https://your-ngrok-url.ngrok-free.app/answer
uv run python voice_agent.py
Call your Plivo phone number and start talking to the agent.
gemini2.5-live-pipecat/
├── voice_agent.py # Main application with Pipecat pipeline
├── pyproject.toml # Project dependencies
├── .env.example # Environment variable template
└── README.md # This file
┌─────────┐     ┌─────────────┐     ┌─────────────┐
│  Phone  │────▶│    Plivo    │────▶│   Server    │
│  Call   │◀────│   (PSTN)    │◀────│  (FastAPI)  │
└─────────┘     └─────────────┘     └──────┬──────┘
                                           │
                     WebSocket (μ-law 8kHz)│
                                           ▼
                                    ┌─────────────┐
                                    │   Pipecat   │
                                    │  Pipeline   │
                                    │             │
                                    │ ┌─────────┐ │
                                    │ │ Gemini  │ │
                                    │ │  Live   │ │
                                    │ └─────────┘ │
                                    └─────────────┘
- Incoming Call: Plivo receives the call and hits the /answer webhook
- WebSocket Setup: Server returns XML to establish a bidirectional stream
- Audio Streaming: Plivo streams μ-law 8kHz audio via WebSocket
- Pipecat Pipeline: Audio flows through the pipeline with VAD and context management
- Gemini Processing: Gemini Live processes speech and generates response
- Response Streaming: Audio is streamed back through Pipecat to Plivo
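The WebSocket setup step above can be sketched as a small function that builds the XML returned from /answer. This is a minimal illustration, not the code in voice_agent.py: the `/ws` route name is hypothetical, and the `Stream` attributes (`bidirectional`, `keepCallAlive`, `contentType`) follow Plivo's audio streaming XML as best understood here, so check them against the Plivo documentation before relying on them.

```python
import xml.etree.ElementTree as ET


def build_answer_xml(public_url: str, ws_path: str = "/ws") -> str:
    """Return Plivo XML asking Plivo to open a bidirectional audio stream.

    `ws_path` is a hypothetical route name; use whatever the server exposes.
    """
    base = public_url.rstrip("/")
    # The stream runs over WebSocket, so swap the HTTP scheme for its WS twin.
    if base.startswith("https://"):
        ws_url = "wss://" + base[len("https://"):]
    elif base.startswith("http://"):
        ws_url = "ws://" + base[len("http://"):]
    else:
        raise ValueError(f"PUBLIC_URL must start with http:// or https://: {public_url!r}")

    response = ET.Element("Response")
    stream = ET.SubElement(
        response,
        "Stream",
        {
            "bidirectional": "true",  # audio flows both to and from the server
            "keepCallAlive": "true",  # keep the call up while the stream is open
            "contentType": "audio/x-mulaw;rate=8000",  # μ-law at 8 kHz
        },
    )
    stream.text = ws_url + ws_path
    return ET.tostring(response, encoding="unicode")
```

Calling `build_answer_xml(os.environ["PUBLIC_URL"])` inside the /answer handler would return a `<Response>` document whose `<Stream>` element points at the server's WebSocket endpoint.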
The pipeline uses Pipecat's modular architecture with Gemini Multimodal Live:
Pipeline([
    transport.input(),   # Receive audio from Plivo
    llm,                 # Gemini Multimodal Live (speech-to-speech)
    transport.output(),  # Send audio to Plivo
])
Since Gemini Multimodal Live handles both speech recognition and synthesis natively, the pipeline is simpler than traditional STT → LLM → TTS architectures.
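To make the three-stage shape concrete, here is a dependency-free toy model of a linear pipeline. It does not use Pipecat's API: `AudioFrame`, `run_pipeline`, and the `fake_*` stages are hypothetical names invented for this sketch, and the "model" simply reverses bytes as a stand-in for a spoken reply.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class AudioFrame:
    """A chunk of audio plus a label for where it came from."""
    source: str
    payload: bytes


# A stage takes one frame and yields zero or more frames.
Stage = Callable[[AudioFrame], Iterable[AudioFrame]]


def run_pipeline(stages: List[Stage], frames: Iterable[AudioFrame]) -> List[AudioFrame]:
    """Push every frame through each stage in order."""
    current = list(frames)
    for stage in stages:
        next_frames: List[AudioFrame] = []
        for frame in current:
            next_frames.extend(stage(frame))
        current = next_frames
    return current


def fake_input(frame: AudioFrame) -> Iterable[AudioFrame]:
    # Stand-in for transport.input(): relabel audio as coming from the caller.
    yield AudioFrame("caller", frame.payload)


def fake_llm(frame: AudioFrame) -> Iterable[AudioFrame]:
    # Stand-in for the speech-to-speech model: skip silence, "reply" otherwise.
    if frame.payload:
        yield AudioFrame("model", frame.payload[::-1])


def fake_output(frame: AudioFrame) -> Iterable[AudioFrame]:
    # Stand-in for transport.output(): pass the reply through unchanged.
    yield frame
```

A real pipeline streams frames asynchronously rather than in batches, but the ordering is the same: input, model, output, with no separate STT or TTS stage in between.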
| Variable | Description | Default |
|---|---|---|
| GEMINI_API_KEY | Google AI API key | Required |
| PLIVO_AUTH_ID | Plivo Auth ID | Required |
| PLIVO_AUTH_TOKEN | Plivo Auth Token | Required |
| PUBLIC_URL | Public URL for webhooks (ngrok) | Required |
| SERVER_PORT | Server port | 8000 |
| GEMINI_MODEL | Gemini model name | models/gemini-2.5-flash-native-audio-preview-12-2025 |
| GEMINI_VOICE | Gemini voice name | Puck |
| SYSTEM_PROMPT | Custom system prompt | Default assistant prompt |
Available voices: Aoede, Charon, Fenrir, Kore, Puck
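The variables in the table above could be read and validated at startup along these lines. This is a sketch under assumptions: `Settings` and `load_settings` are hypothetical names, the values are taken from the process environment only (loading the .env file itself is left to whatever the project already uses), and a missing SYSTEM_PROMPT is represented as `None` so the application can fall back to its own default.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("GEMINI_API_KEY", "PLIVO_AUTH_ID", "PLIVO_AUTH_TOKEN", "PUBLIC_URL")
DEFAULT_MODEL = "models/gemini-2.5-flash-native-audio-preview-12-2025"


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    plivo_auth_id: str
    plivo_auth_token: str
    public_url: str
    server_port: int
    gemini_model: str
    gemini_voice: str
    system_prompt: Optional[str]  # None means "use the built-in default prompt"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ), failing fast."""
    env = os.environ if env is None else env
    # Report every missing required variable at once, not just the first.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing required variables: " + ", ".join(missing))
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],
        plivo_auth_id=env["PLIVO_AUTH_ID"],
        plivo_auth_token=env["PLIVO_AUTH_TOKEN"],
        public_url=env["PUBLIC_URL"].rstrip("/"),
        server_port=int(env.get("SERVER_PORT", "8000")),
        gemini_model=env.get("GEMINI_MODEL", DEFAULT_MODEL),
        gemini_voice=env.get("GEMINI_VOICE", "Puck"),
        system_prompt=env.get("SYSTEM_PROMPT"),
    )
```

Failing fast with the full list of missing names tends to shorten the "server won't start" loop, since all credential gaps show up in a single error.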
| Feature | gemini2.5-live-pipecat | gemini2.5-live-native |
|---|---|---|
| Framework | Pipecat | None (direct API) |
| Code complexity | Lower | Higher |
| Customization | Via Pipecat processors | Full control |
| Audio conversion | Handled by Pipecat | Manual implementation |
| VAD | Silero via Pipecat | Custom or none |
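To give a sense of what "Manual implementation" of audio conversion involves, the sketch below expands μ-law bytes into 16-bit linear PCM using the standard G.711 expansion. It is not taken from either project and is not how Pipecat does it internally; the function names are hypothetical, and resampling from 8 kHz to the model's input rate is not shown.

```python
import struct


def ulaw_byte_to_pcm16(byte: int) -> int:
    """Expand one G.711 μ-law byte into a signed 16-bit PCM sample."""
    u = ~byte & 0xFF              # μ-law bytes are stored bit-inverted
    sign = u & 0x80               # top bit: 1 means negative
    exponent = (u >> 4) & 0x07    # 3-bit segment number
    mantissa = u & 0x0F           # 4-bit step within the segment
    # 0x84 (132) is the bias added during encoding; remove it after scaling.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# Precompute all 256 values so decoding a packet is one lookup per byte.
_ULAW_TABLE = [ulaw_byte_to_pcm16(b) for b in range(256)]


def decode_ulaw(data: bytes) -> bytes:
    """Convert μ-law audio to little-endian 16-bit PCM bytes."""
    samples = [_ULAW_TABLE[b] for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)
```

Each input byte becomes two output bytes, so a 20 ms packet of 160 μ-law bytes at 8 kHz decodes to 320 bytes of PCM.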
Use gemini2.5-live-pipecat when you want:
- Quick setup with less code
- Built-in VAD and audio handling
- Easy pipeline customization
- Integration with other Pipecat services
Use gemini2.5-live-native when you need:
- Maximum control over audio processing
- Custom audio format handling
- Minimal dependencies
- Production-optimized performance
- Verify GEMINI_API_KEY is correct
- Check that Plivo credentials are set
- Ensure ngrok is running and its URL matches PUBLIC_URL
- Check Plivo webhook configuration
- Verify ngrok tunnel is active
- Review server logs for errors
- Verify Gemini Live API access is enabled for your API key
- Check the model name is correct
- Review logs for Gemini API errors
MIT