Inter-cluster routing enables direct, low-latency communication between your Cerebrium apps within the same region. This private networking feature lets apps communicate without traversing the public internet, reducing latency and improving performance. It suits latency-sensitive use cases and lets each application or service scale independently according to its own configured scaling parameters. Inter-cluster routing provides:
- Low latency: Direct container-to-container communication within the same region (~0.3–1 ms typical)
- High bandwidth: Up to 50 Gbps between containers
- No public internet: Apps communicate directly without external routing
- Observability: All requests appear in your Cerebrium dashboard with full logs, payloads, and latency metrics
How It Works

Apps reach each other through an internal endpoint of the form:
http://api.aws/v4/<project_id>/<app_name>/<func_name>
This endpoint pattern is the same in every region, so you don’t need to update URLs when deploying to multiple locations. Inter-cluster routing only works between apps deployed in the same region, which keeps traffic private and low-latency. Requests pass through the local proxy for authentication and routing, but they never traverse the public internet: they stay fully contained within the cluster network, with typical latencies of 0.3–1 ms and bandwidth of up to 50 Gbps between containers.
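As a rough illustration of the endpoint pattern above, the following Python sketch builds the internal URL and sends a request to another app using only the standard library. Several details are assumptions rather than documented behavior: the helper names `build_internal_url` and `call_app`, the identifiers `p-abc123`, `embedder`, and `predict`, the use of a JSON `POST` body, and the optional bearer-token header. Check each of these against your own deployment before relying on them.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote

# Base of the internal endpoint pattern quoted in this section.
INTERNAL_BASE = "http://api.aws/v4"


def build_internal_url(project_id: str, app_name: str, func_name: str) -> str:
    """Fill in http://api.aws/v4/<project_id>/<app_name>/<func_name>."""
    parts = (project_id, app_name, func_name)
    if not all(parts):
        raise ValueError("project_id, app_name and func_name must be non-empty")
    # Percent-encode each segment so stray characters cannot alter the path.
    return "/".join([INTERNAL_BASE, *(quote(p, safe="") for p in parts)])


def call_app(url: str, payload: dict, token: Optional[str] = None,
             timeout: float = 5.0) -> dict:
    """POST a JSON payload to another app and decode the JSON response.

    Assumptions (not confirmed by this page): the target function accepts a
    JSON body via POST, replies with JSON, and any credential is passed as
    a bearer token.
    """
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical identifiers. The URL only resolves from inside an app
    # running in the same region as the target, so this just prints it.
    print(build_internal_url("p-abc123", "embedder", "predict"))
```

From inside a deployed app, a call would then look like `call_app(build_internal_url(project_id, "embedder", "predict"), {"text": "hello"})`. Because the URL pattern does not change between regions, the same code can be deployed to several regions unmodified, provided the calling and target apps share a region.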
gRPC is not currently supported, but it is on our roadmap.