Why Latency Matters
Latency under 800ms feels natural. Above 1.5s, it becomes frustrating. Optimizing every component in the pipeline is what separates a good experience from an excellent one.
- <800ms: natural conversation
- 800-1500ms: acceptable
- >1500ms: frustrating
Latency Breakdown
| Component | Typical | Optimized | Notes |
|---|---|---|---|
| Network (Client → Server) | 20-50ms | 10-30ms | CDN, edge deployment |
| Audio Encoding | 10-20ms | 5-10ms | Opus low-delay mode |
| ASR Processing | 150-300ms | 100-200ms | Deepgram Nova-2 |
| LLM Inference | 500-1500ms | 200-500ms | Streaming, caching |
| TTS Generation | 200-400ms | 80-200ms | Cartesia/ElevenLabs Turbo |
| Audio Decoding | 5-10ms | 2-5ms | Hardware decode |
| Network (Server → Client) | 20-50ms | 10-30ms | Streaming chunks |
| TOTAL | 905-2330ms | 407-975ms | 2-3x improvement |
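As a sanity check, the totals are just the sums of the per-component ranges. A minimal sketch that reproduces them, with the numbers copied straight from the table above:

```python
# Latency budget per component, in ms:
# (typical_min, typical_max, optimized_min, optimized_max)
BUDGET = {
    "network_up":     (20, 50, 10, 30),
    "audio_encoding": (10, 20, 5, 10),
    "asr":            (150, 300, 100, 200),
    "llm":            (500, 1500, 200, 500),
    "tts":            (200, 400, 80, 200),
    "audio_decoding": (5, 10, 2, 5),
    "network_down":   (20, 50, 10, 30),
}

typical = [sum(v[i] for v in BUDGET.values()) for i in (0, 1)]
optimized = [sum(v[i] for v in BUDGET.values()) for i in (2, 3)]
print(f"Typical:   {typical[0]}-{typical[1]}ms")      # 905-2330ms
print(f"Optimized: {optimized[0]}-{optimized[1]}ms")  # 407-975ms
```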
Optimization Techniques
| Technique | Saving | How |
|---|---|---|
| Edge Deployment | -50ms | Deploy closer to users |
| Streaming TTS | -200ms | Start playback before full generation |
| LLM Streaming | -500ms | Token-by-token to TTS |
| Prompt Caching | -100ms | Cache system prompts |
| Speculative Execution | -150ms | Pre-generate likely responses |
| Connection Pooling | -30ms | Reuse connections |
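The two biggest wins above, LLM streaming and streaming TTS, combine naturally: flush text to TTS at sentence boundaries instead of waiting for the full response. A minimal sketch, assuming hypothetical `llm_stream` (an async iterable of text tokens), `tts.synthesize()`, and `play()` interfaces; swap in your real SDKs:

```python
import re

# Matches a buffer that ends at a sentence boundary.
SENTENCE_END = re.compile(r"[.!?]\s*$")

async def stream_reply_to_tts(llm_stream, tts, play):
    """Synthesize sentence-by-sentence as LLM tokens arrive, so the
    first audio plays after roughly one sentence rather than after
    the entire response is generated."""
    buffer = ""
    async for token in llm_stream:  # hypothetical token stream
        buffer += token
        if SENTENCE_END.search(buffer):
            audio = await tts.synthesize(buffer)  # hypothetical TTS call
            await play(audio)
            buffer = ""
    if buffer:  # flush any trailing partial sentence
        await play(await tts.synthesize(buffer))
```

The design choice here is the flush granularity: sentence boundaries keep TTS prosody natural, while smaller chunks shave more latency at the cost of choppier audio.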
Target Latencies by Use Case
| Use Case | Target | Acceptable |
|---|---|---|
| Conversational AI | <800ms | <1200ms |
| Customer Support | <1000ms | <1500ms |
| IVR Navigation | <500ms | <800ms |
| Real-time Translation | <300ms | <500ms |
Streaming Pipeline
User speaks → ASR streams words → LLM starts on partial transcript → TTS starts on first tokens → Audio plays
With end-to-end streaming, the user hears the response while the AI is still generating it.
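Because the stages overlap, the number worth measuring is time-to-first-audio rather than total generation time. A minimal measurement sketch, assuming a hypothetical `pipeline.run()` that is an async generator yielding audio chunks:

```python
import time

async def measure_ttfa(pipeline, utterance):
    """Time-to-first-audio: the delay between end of user speech and
    the first audio chunk played back -- the latency users perceive."""
    t0 = time.perf_counter()
    async for _chunk in pipeline.run(utterance):  # hypothetical pipeline
        ttfa_ms = (time.perf_counter() - t0) * 1000
        print(f"Time to first audio: {ttfa_ms:.0f}ms")
        break  # only the first chunk matters for perceived latency
```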
Kallina: Optimized by Default
Our stack comes pre-optimized for minimal latency.
Test Your Latency →