Speed = Natural Conversation
Latency above 300ms makes a conversation feel unnatural. Target: under 200ms end-to-end for a seamless experience.
Latency Breakdown
Optimization Techniques
🌐 Network Layer
- ✓ Edge deployment (PoP near users)
- ✓ Direct peering with carriers
- ✓ UDP for media (not TCP)
- ✓ Minimize hops
- ✓ Geographic load balancing
🎤 Audio Layer
- ✓ Small audio frames (20ms)
- ✓ Adaptive jitter buffer
- ✓ Early packet processing
- ✓ Streaming STT (not batch)
- ✓ Streaming TTS output
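To make the 20ms frame size concrete, here is a minimal sketch of the arithmetic and of slicing a PCM buffer into frames. The 16 kHz sample rate, 16-bit mono format, and the helper names `frameSize` and `splitIntoFrames` are assumptions for illustration, not part of any specific stack.

```javascript
// Samples and bytes contained in one audio frame of the given duration.
// Defaults assume 16-bit (2-byte) mono PCM; adjust for your codec.
function frameSize(sampleRateHz, frameMs, bytesPerSample = 2, channels = 1) {
  const samples = (sampleRateHz * frameMs) / 1000;
  return { samples, bytes: samples * bytesPerSample * channels };
}

// Yield fixed-duration frames from a mono Int16Array without copying
// (subarray returns a view on the same memory).
// A trailing partial frame is dropped in this sketch.
function* splitIntoFrames(pcm, sampleRateHz, frameMs) {
  const { samples } = frameSize(sampleRateHz, frameMs);
  for (let off = 0; off + samples <= pcm.length; off += samples) {
    yield pcm.subarray(off, off + samples);
  }
}
```

At 16 kHz, `frameSize(16000, 20)` gives 320 samples (640 bytes), so each frame adds only 20ms of buffering before it can be sent downstream.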
🧠 AI Layer
- ✓ Streaming LLM responses
- ✓ Speculative execution
- ✓ Model quantization
- ✓ GPU acceleration
- ✓ Parallel processing
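One common way to connect streaming LLM output to streaming TTS is to flush text to the synthesizer at sentence boundaries instead of waiting for the full response. The sketch below assumes tokens arrive as plain strings; `sentenceChunks` is a hypothetical helper, and the punctuation rule is deliberately naive (it would also split on abbreviations or decimals).

```javascript
// Group incoming LLM tokens into sentence-sized chunks for TTS.
// Flushes whenever the buffered text ends in '.', '!' or '?'.
function* sentenceChunks(tokens) {
  let buf = '';
  for (const t of tokens) {
    buf += t;
    if (/[.!?]\s*$/.test(buf)) {
      yield buf.trim();
      buf = '';
    }
  }
  // Flush whatever is left when the token stream ends.
  if (buf.trim()) yield buf.trim();
}
```

With this approach the first sentence can be synthesized while the model is still generating the rest of the reply.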
⚙️ Infrastructure
- ✓ Connection pooling
- ✓ Warm containers
- ✓ In-memory caching
- ✓ Async operations
- ✓ Zero-copy audio
Streaming Architecture
// Streaming pipeline - process audio as it arrives
// stt, llm, and tts are streaming client objects assumed to be initialized elsewhere
async function processAudioStream(audioStream) {
  // Stream 1: Audio → STT (streaming)
  const transcriptStream = stt.streamingRecognize(audioStream);

  // Stream 2: Transcript → LLM (streaming)
  const responseStream = llm.streamCompletion(transcriptStream);

  // Stream 3: Response → TTS (streaming)
  const audioResponse = tts.streamSynthesize(responseStream);

  // Start playing audio as soon as the first chunk is ready
  // Don't wait for the full response!
  return audioResponse;
}
// Result: User hears the response starting ~150ms after speaking
// vs ~800ms+ with batch processing
❌ Batch Processing
Wait for user to finish → Process all → Return all
~800ms+ latency
✓ Stream Processing
Process chunks as they arrive → Return immediately
~150ms latency
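The difference between the two modes can be seen in the order of events rather than in wall-clock time. The following is a toy sketch using async generators; the stage names are placeholders that only wrap strings, and `source`, `stage`, `runStreaming`, and `runBatch` are hypothetical helpers, not a real STT/LLM/TTS API.

```javascript
// Emits input chunks one by one and records when each is consumed.
async function* source(chunks, log) {
  for (const c of chunks) {
    log.push(`in:${c}`);
    yield c;
  }
}

// A pipeline stage that transforms each chunk as soon as it arrives.
async function* stage(name, input) {
  for await (const c of input) yield `${name}(${c})`;
}

// Streaming: each chunk flows through all stages before the next is read.
async function runStreaming(chunks, log) {
  const out = stage('tts', stage('llm', stage('stt', source(chunks, log))));
  for await (const c of out) log.push(`out:${c}`);
}

// Batch: all input is collected first, then processed, then returned.
async function runBatch(chunks, log) {
  const all = [];
  for await (const c of source(chunks, log)) all.push(c);
  for (const c of all) log.push(`out:tts(llm(stt(${c})))`);
}
```

In the streaming log the first `out:` entry appears right after the first `in:` entry, while in the batch log it appears only after every input chunk has been read. That ordering is what produces the lower time-to-first-audio.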
Edge Deployment
Global PoP Locations
Voice AI processing runs at the PoP nearest to the user to minimize round-trip time.
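A minimal sketch of how a client or load balancer might choose that PoP from measured round-trip times. The PoP names and the use of the median of several probes are assumptions for illustration; real deployments often rely on anycast or DNS-based steering instead.

```javascript
// Median of a list of numbers (resists a single slow probe).
function median(values) {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Pick the PoP with the lowest median RTT.
// rttSamplesByPop maps a PoP name to an array of RTT probes in ms.
function pickNearestPop(rttSamplesByPop) {
  let best = null;
  for (const [pop, samples] of Object.entries(rttSamplesByPop)) {
    if (samples.length === 0) continue; // no measurements, skip
    const rttMs = median(samples);
    if (best === null || rttMs < best.rttMs) best = { pop, rttMs };
  }
  return best;
}
```

For example, `pickNearestPop({ fra: [12, 15, 90], ams: [18, 19, 20] })` selects `fra` with a median of 15ms even though one of its probes was slow.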
Jitter Buffer Tuning
Static Buffer
Fixed delay: predictable, but may be longer than needed on good networks or too short to absorb jitter on bad ones.
Adaptive Buffer (Recommended)
Adjusts its delay based on measured network conditions, giving lower latency on good networks.
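One way to sketch an adaptive buffer is to track interarrival jitter with the smoothed estimator defined in RFC 3550 (gain 1/16) and derive the playout delay from it. The delay policy here (a multiplier of 4, clamped between 20ms and 200ms) and the class name `AdaptiveJitterBuffer` are assumptions for illustration, not tuned recommendations.

```javascript
class AdaptiveJitterBuffer {
  constructor({ minDelayMs = 20, maxDelayMs = 200, multiplier = 4 } = {}) {
    this.minDelayMs = minDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.multiplier = multiplier;
    this.jitterMs = 0; // smoothed interarrival jitter estimate
    this.prev = null;  // timestamps of the previous packet
  }

  // Update the jitter estimate from one packet's send/receive timestamps (ms)
  // and return the new target playout delay.
  onPacket(sendTsMs, recvTsMs) {
    if (this.prev) {
      // Difference in transit time between this packet and the previous one.
      const d = (recvTsMs - this.prev.recv) - (sendTsMs - this.prev.send);
      this.jitterMs += (Math.abs(d) - this.jitterMs) / 16;
    }
    this.prev = { send: sendTsMs, recv: recvTsMs };
    return this.targetDelayMs();
  }

  // Playout delay grows with jitter, within fixed bounds.
  targetDelayMs() {
    const t = this.multiplier * this.jitterMs;
    return Math.min(this.maxDelayMs, Math.max(this.minDelayMs, t));
  }
}
```

On a steady network the estimate stays near zero and the buffer sits at its minimum delay; when packet spacing becomes irregular, the target delay rises to absorb it.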
Latency Monitoring
Latency Impact on Experience
Feels like real-time, natural conversation
Slight delay, still comfortable
Noticeable delay, may cause interruptions
Frustrating, users talk over each other
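For latency monitoring, averages tend to hide the slow calls that users actually notice, so tracking percentiles is a common choice. The sketch below uses the nearest-rank method and checks the 95th percentile against the 200ms end-to-end target stated at the top; gating on p95 and the function names are assumptions for illustration.

```javascript
// Nearest-rank percentile of a list of latency samples (ms).
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Summarize end-to-end latency samples against a target.
function summarizeLatency(samplesMs, targetMs = 200) {
  const p50 = percentile(samplesMs, 50);
  const p95 = percentile(samplesMs, 95);
  return { p50, p95, withinTarget: p95 <= targetMs };
}
```

A single 400ms outlier among ten otherwise fast samples leaves the median untouched but pushes p95 over the target, which is the kind of regression this summary is meant to surface.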