In this tutorial, we’ll create a real-time voice agent that responds to queries via speech in ~500ms. This flexible implementation lets you swap in any Large Language Model (LLM) or Text-to-Speech (TTS) model. It’s ideal for voice-based use cases like customer support bots and receptionists. To create this app, we use Pipecat, a framework that handles component integration, user interruptions, and audio data processing. We’ll demonstrate this by joining a meeting room with our voice agent using Daily (Pipecat’s creators) and deploying the app on Cerebrium for seamless deployment and scaling. Essentially, our application will have three main parts:
- Your Pipecat agent which acts as the orchestrator
- Your Deepgram TTS/STT service (Requires a Deepgram Enterprise account)
- A self-hosted LLM using the vLLM framework

Deepgram deployment
For the sake of conciseness, see our Partner Services page for how to deploy a Deepgram service on Cerebrium. You need a Deepgram Enterprise license to deploy Deepgram on Cerebrium; otherwise, you must use their API endpoint below.
LLM Deployment
For our LLM, we deploy an OpenAI-compatible Llama-3 endpoint using the vLLM framework. To keep the time to first token (TTFT) low, we deploy a quantized version (RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8). Run cerebrium init llama-llm and add the following code to your cerebrium.toml:
HF_TOKEN.
Then run cerebrium deploy to make it live. You should see it in your Cerebrium dashboard. We will use your deployment URL in the next step.
Based on your GPU hardware and replica-concurrency in your cerebrium.toml, you can set how many concurrent calls the LLM can take.
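Once the endpoint is live, a quick way to check it is to send a streaming chat request and watch how fast the first tokens arrive. The sketch below uses only the Python standard library and assumes the deployment exposes vLLM's OpenAI-compatible /chat/completions route under a /v1 base path. The base URL, the API key, and the helper names build_chat_request and iter_stream_text are hypothetical placeholders, not values from this tutorial.

```python
import json
import urllib.request

# Model name from this tutorial; the URL you pass in is your own deployment URL.
MODEL = "RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8"


def build_chat_request(base_url, api_key, model, messages, stream=True):
    """Build (but do not send) a POST request for the chat completions route.

    Assumption: the deployment serves the OpenAI-compatible API and
    base_url already ends in /v1.
    """
    payload = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def iter_stream_text(lines):
    """Yield text deltas from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        choices = json.loads(data).get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text
```

To send the request, pass it to urllib.request.urlopen and feed the response object to iter_stream_text; the response can be iterated line by line. Timing the gap between sending the request and the first yielded chunk gives a rough TTFT measurement.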
Pipecat setup
In your IDE, run the following command to create our pipecat-agent: cerebrium init pipecat-agent. We will use the Pipecat framework to orchestrate our services into a voice agent.
Add the following pip packages to your cerebrium.toml to create your deployment environment:
- We are using the WebRTC functionality from Daily to create our room, which you can switch out for Twilio/Telnyx. We then have two functions to create and authenticate our meeting room: create_room() and create_token()
- For the Deepgram and LLM services, we use a local URL to connect to the services within the Cerebrium cluster. We are working on improving this, but for now just edit it with your project key.
- For TTS, we are using the Cartesia service to showcase how versatile Pipecat is but you can use the TTS service from Deepgram too!
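As an illustration of what create_room() and create_token() need to do, here is a minimal sketch against Daily's REST API using only the standard library. It is not the tutorial's original implementation: the helper names, the five-minute expiry, and the choice of room properties are assumptions made for the example. The builders only construct the requests; send() performs the network call.

```python
import json
import time
import urllib.request

DAILY_API = "https://api.daily.co/v1"


def _post(path, api_key, properties):
    """Build an authenticated JSON POST for Daily's REST API (not sent yet)."""
    return urllib.request.Request(
        url=f"{DAILY_API}/{path}",
        data=json.dumps({"properties": properties}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_create_room_request(api_key, ttl_seconds=300, now=None):
    """Request a temporary room that expires after ttl_seconds (assumed value)."""
    now = int(time.time()) if now is None else now
    return _post("rooms", api_key, {
        "exp": now + ttl_seconds,    # room expiry as a Unix timestamp
        "eject_at_room_exp": True,   # remove participants when the room expires
    })


def build_create_token_request(api_key, room_name, ttl_seconds=300, now=None):
    """Request a meeting token that lets the bot join room_name as owner."""
    now = int(time.time()) if now is None else now
    return _post("meeting-tokens", api_key, {
        "room_name": room_name,
        "is_owner": True,
        "exp": now + ttl_seconds,
    })


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read())
```

In use, the room response carries the room's name and url, and the token response carries a token field; the bot then joins the room URL with that token.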
main() function:
This code handles these events:
- First participant joins: Bot introduces itself via a conversation message
- Additional participants join: Bot listens and responds to all participants
- Participant leaves or call ends: Bot terminates itself
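The event rules above can be modelled in plain Python to make the control flow explicit. This is a simplified stand-in, not Pipecat's API: BotSession, its method names, and the action strings are hypothetical, and the "left" call state is an assumption about how the transport reports that the call has ended. In the real app these branches would live inside the transport's event handlers.

```python
from dataclasses import dataclass, field


@dataclass
class BotSession:
    """Tracks participants and records what the bot should do for each event."""
    participants: set = field(default_factory=set)
    running: bool = True
    actions: list = field(default_factory=list)

    def on_participant_joined(self, participant_id):
        if not self.running:
            return
        is_first = not self.participants
        self.participants.add(participant_id)
        if is_first:
            # First participant: the bot introduces itself.
            self.actions.append("introduce")
        else:
            # Later participants: the bot listens and responds to them too.
            self.actions.append(f"listen:{participant_id}")

    def on_participant_left(self, participant_id):
        self.participants.discard(participant_id)
        self._terminate()

    def on_call_state_updated(self, state):
        # Assumption: the transport reports "left" once the call has ended.
        if state == "left":
            self._terminate()

    def _terminate(self):
        # Terminate only once, even if several end-of-call events arrive.
        if self.running:
            self.running = False
            self.actions.append("terminate")
```

Recording actions in a list keeps the sketch testable without audio or network access; a real handler would queue frames to the pipeline instead of appending strings.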

Run python main.py. Your code should then work.
That’s it! You now have a fully functioning AI bot that interacts with users through speech in ~500ms. Imagine the possibilities!
Now, let’s create a user interface for the bot.
Deploy to Cerebrium
Deploy the app to Cerebrium by running this command in your terminal: cerebrium deploy
We’ll add these endpoints to our frontend interface.