Codec = Quality + Efficiency
Codecs compress and decompress audio. The right choice balances voice quality against bandwidth use and latency.
Codec Comparison
| Codec | Bitrate | Sample Rate | MOS | Latency | Best For |
|---|---|---|---|---|---|
| G.711 μ-law | 64 kbps | 8 kHz | 4.1 | Very Low | PSTN compatibility |
| G.711 A-law | 64 kbps | 8 kHz | 4.1 | Very Low | EU PSTN |
| Opus | 6-510 kbps | 8-48 kHz | 4.5+ | Low | WebRTC, quality |
| G.729 | 8 kbps | 8 kHz | 3.9 | Medium | Bandwidth savings |
| G.722 | 64 kbps | 16 kHz | 4.3 | Low | HD Voice |
| SILK | 6-40 kbps | 8-24 kHz | 4.2 | Low | Variable networks |
Opus: Recommended for Voice AI
Why Opus?
- ✓ Adaptive bitrate based on network
- ✓ Wideband audio (16 kHz+) for clear speech
- ✓ Low latency (2.5-60ms frames)
- ✓ Excellent packet loss resilience
- ✓ Open and royalty-free
- ✓ Native WebRTC support
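To make the frame-size point concrete, here is a minimal sketch of how frame duration trades packet rate against payload size at a fixed bitrate. The function name `opus_packet_profile` is hypothetical, the calculation assumes one frame per packet, and the result is an average, since Opus normally runs with a variable bitrate.

```python
# Standard Opus frame durations in milliseconds (the 2.5-60 ms range above).
OPUS_FRAME_MS = (2.5, 5, 10, 20, 40, 60)


def opus_packet_profile(bitrate_bps: int, frame_ms: float) -> tuple:
    """Return (packets_per_second, average_payload_bytes) for one frame per packet."""
    if frame_ms not in OPUS_FRAME_MS:
        raise ValueError(f"unsupported Opus frame duration: {frame_ms} ms")
    packets_per_second = 1000 / frame_ms
    # bits per second * seconds per frame / 8 bits per byte
    payload_bytes = bitrate_bps * frame_ms / 1000 / 8
    return packets_per_second, payload_bytes


if __name__ == "__main__":
    for ms in OPUS_FRAME_MS:
        pps, size = opus_packet_profile(24000, ms)
        print(f"{ms:>4} ms frames: {pps:5.0f} packets/s, ~{size:.1f} payload bytes each")
```

Shorter frames lower the buffering delay but send more packets per second, so a larger share of the bandwidth goes to packet headers.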
Opus Configuration
{
  "codec": "opus",
  "bitrate": 24000,     // 24 kbps
  "sampleRate": 16000,  // 16 kHz
  "channels": 1,        // Mono
  "frameSize": 20,      // 20ms frames
  "fec": true,          // Forward error correction
  "dtx": true           // Discontinuous transmission
}
G.711: PSTN Standard
μ-law (PCMU)
Used in North America and Japan. Optimized for voice frequencies.
A-law (PCMA)
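Used in Europe and most of the rest of the world. Slightly lower quantization distortion on quiet signals than μ-law, at the cost of a little dynamic range.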
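The two G.711 variants differ only in their companding curve. As an illustration, the continuous forms of both curves can be sketched as below, with input samples normalized to [-1, 1]. Real G.711 encoders use a segmented 8-bit approximation of these curves rather than the formulas directly, so treat this as a picture of the idea, not an encoder. The function names are hypothetical.

```python
import math

MU = 255.0  # μ-law compression parameter
A = 87.6    # A-law compression parameter


def mu_law_compress(x: float) -> float:
    """Continuous μ-law curve: sgn(x) * ln(1 + μ|x|) / ln(1 + μ)."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def a_law_compress(x: float) -> float:
    """Continuous A-law curve: linear below 1/A, logarithmic above it."""
    ax = abs(x)
    denom = 1.0 + math.log(A)
    if ax < 1.0 / A:
        y = A * ax / denom
    else:
        y = (1.0 + math.log(A * ax)) / denom
    return math.copysign(y, x)


if __name__ == "__main__":
    for level in (0.001, 0.01, 0.1, 1.0):
        print(f"x={level:<6} mu-law={mu_law_compress(level):.3f} "
              f"A-law={a_law_compress(level):.3f}")
```

Both curves boost quiet samples before quantization, which is how 8 bits per sample can cover the dynamic range of speech.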
Codec Selection Strategy
PSTN Calls (Inbound/Outbound)
Use G.711 (PCMU/PCMA) - universal compatibility, no transcoding needed.
WebRTC Browser Calls
Use Opus - best quality, adaptive to network conditions.
Low Bandwidth Scenarios
Use G.729 or Opus at low bitrate - efficient compression.
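The three rules above can be sketched as a small selection function. Everything named here (`CallType`, `select_codec`, the `low_bandwidth` and `mu_law_region` flags) is hypothetical. The 12 kbps low-bandwidth Opus bitrate is an illustrative assumption, and the 24 kbps default simply mirrors the Opus configuration shown earlier.

```python
from enum import Enum


class CallType(Enum):
    PSTN = "pstn"
    WEBRTC = "webrtc"


def select_codec(call_type: CallType, low_bandwidth: bool = False,
                 mu_law_region: bool = True) -> dict:
    """Pick a codec following the strategy above; returns a small config dict."""
    if call_type is CallType.WEBRTC:
        # Opus adapts to the network; drop the target bitrate on constrained links.
        # 12000 is an assumed illustrative value, not a recommendation from the source.
        return {"codec": "opus", "bitrate": 12000 if low_bandwidth else 24000}
    if low_bandwidth:
        return {"codec": "G729", "bitrate": 8000}
    # G.711 avoids transcoding on the PSTN: μ-law in North America and Japan,
    # A-law elsewhere.
    return {"codec": "PCMU" if mu_law_region else "PCMA", "bitrate": 64000}


if __name__ == "__main__":
    print(select_codec(CallType.PSTN))
    print(select_codec(CallType.PSTN, mu_law_region=False))
    print(select_codec(CallType.WEBRTC, low_bandwidth=True))
```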
Quality vs Bandwidth
[Chart: quality vs. bandwidth by codec. Bandwidth figures include RTP/UDP/IP overhead; quality is shown as a MOS-equivalent percentage.]
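The RTP/UDP/IP overhead matters most for low-bitrate codecs. A rough sketch of the arithmetic follows, assuming IPv4 (20 bytes), UDP (8 bytes), and RTP (12 bytes) headers with no header compression, one packet every 20 ms, and no link-layer framing. The function name is hypothetical.

```python
IPV4_HEADER = 20  # bytes
UDP_HEADER = 8    # bytes
RTP_HEADER = 12   # bytes


def wire_bandwidth_kbps(codec_kbps: float, packet_ms: float = 20) -> float:
    """Codec bitrate plus per-packet RTP/UDP/IPv4 header cost, in kbps."""
    packets_per_second = 1000 / packet_ms
    header_bits = (IPV4_HEADER + UDP_HEADER + RTP_HEADER) * 8
    overhead_kbps = header_bits * packets_per_second / 1000
    return codec_kbps + overhead_kbps


if __name__ == "__main__":
    for name, kbps in (("G.711", 64), ("G.729", 8), ("Opus @ 24 kbps", 24)):
        print(f"{name}: {wire_bandwidth_kbps(kbps):.0f} kbps on the wire")
```

At 20 ms packetization the headers add a fixed 16 kbps, so G.729 triples from 8 to 24 kbps on the wire while G.711 grows only from 64 to 80 kbps.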
Transcoding Considerations
Avoid When Possible
Transcoding adds latency and can degrade quality.
- Adds 5-20 ms of latency per transcode
- Quality loss (especially lossy→lossy)
- Consumes CPU resources
When Necessary
Use dedicated transcoding resources.
- WebRTC ↔ PSTN calls
- Different codec endpoints
- Recording in specific format
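As a rough way to budget for this, the sketch below counts the codec changes along a call path and applies the 5-20 ms per-transcode range quoted above. The function name and the idea of describing a call as a list of per-leg codec names are assumptions made for illustration.

```python
# Per-transcode latency range in milliseconds, taken from the figures above.
TRANSCODE_MS_MIN = 5
TRANSCODE_MS_MAX = 20


def transcode_penalty_ms(codec_path: list) -> tuple:
    """Return (min_ms, max_ms) of added latency for a path of per-leg codecs."""
    # Every change of codec between adjacent legs costs one transcode.
    hops = sum(1 for a, b in zip(codec_path, codec_path[1:]) if a != b)
    return hops * TRANSCODE_MS_MIN, hops * TRANSCODE_MS_MAX


if __name__ == "__main__":
    print(transcode_penalty_ms(["opus", "opus"]))          # same codec end to end
    print(transcode_penalty_ms(["opus", "PCMU"]))          # WebRTC to PSTN
    print(transcode_penalty_ms(["opus", "PCMU", "G729"]))  # two transcodes
```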
Voice AI Codec Requirements
| Component | Preferred Format | Reason |
|---|---|---|
| STT (Speech-to-Text) | 16 kHz, 16-bit PCM | Optimal for speech recognition |
| TTS (Text-to-Speech) | 24 kHz, 16-bit PCM | High quality synthesis output |
| Recording Storage | Opus or MP3 | Storage efficiency |
| Live Playback | Match caller codec | Avoid transcoding |
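For sizing buffers around the STT and TTS formats in the table, the raw PCM arithmetic is simple enough to sketch. The helper names are hypothetical, and the 20 ms chunk size is an assumption chosen to match a typical packetization interval.

```python
def pcm_frame_bytes(sample_rate_hz: int, bits_per_sample: int = 16,
                    channels: int = 1, frame_ms: int = 20) -> int:
    """Bytes of uncompressed PCM in one frame of the given duration."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * frame_ms // 1000


def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int = 16,
                     channels: int = 1) -> int:
    """Uncompressed PCM bitrate in kbps."""
    return sample_rate_hz * bits_per_sample * channels // 1000


if __name__ == "__main__":
    for label, rate in (("STT input", 16000), ("TTS output", 24000)):
        print(f"{label}: {pcm_frame_bytes(rate)} bytes per 20 ms, "
              f"{pcm_bitrate_kbps(rate)} kbps raw")
```

The gap between these raw rates and the codec bitrates in the comparison table shows why audio is decoded to PCM only at the STT and TTS boundaries.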