Media Processing Foundation
Media servers handle the audio streams: mixing, transcoding, recording, and the interface to voice AI processing.
Media Server Options
FreeSWITCH
Open-source, highly scalable. Best for complex deployments.
- ✓ High performance
- ✓ Lua/JavaScript scripting
- ✓ WebRTC native
- ✓ Enterprise features
Asterisk
Mature, well-documented. Great for traditional PBX.
- ✓ Huge community
- ✓ Dialplan flexibility
- ✓ AGI scripting
- ✓ Proven stability
Janus
WebRTC gateway focused. Lightweight and modern.
- ✓ WebRTC native
- ✓ Plugin architecture
- ✓ Low footprint
- ✓ Modern C code
FreeSWITCH Architecture
Core
Event-driven architecture, handles call state, routing, and module coordination.
Endpoints
SIP, WebRTC, Verto
Codecs
G.711, Opus, G.729
Applications
IVR, Conference, Record
Event Socket
External control interface for Voice AI integration. Send commands, receive events.
Voice AI Integration
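// FreeSWITCH ESL integration for Voice AI
const { EventEmitter } = require('events');
const { Connection } = require('modesl');

// Placeholder for the Voice AI client; swap in a real client that
// emits 'transcription' events.
const voiceAI = new EventEmitter();

// 'ClueCon' is the FreeSWITCH default ESL password; change it in production.
const esl = new Connection('127.0.0.1', 8021, 'ClueCon');

esl.on('esl::ready', () => {
  // Subscribe to call events
  esl.subscribe(['CHANNEL_CREATE', 'CHANNEL_ANSWER',
    'CHANNEL_HANGUP', 'DTMF']);
});

// modesl appends the call UUID to channel event names, hence the wildcard
esl.on('esl::event::CHANNEL_ANSWER::*', (event) => {
  const uuid = event.getHeader('Unique-ID');
  // Start Voice AI processing (needs the third-party mod_audio_fork module;
  // its mix types are mono, mixed, and stereo)
  esl.api('uuid_audio_fork',
    `${uuid} start ws://voice-ai:8080/audio mixed`);
  // Play greeting
  esl.execute('speak',
    'flite|kal|Hello, how can I help you?', uuid);
});

// Receive transcription from Voice AI
voiceAI.on('transcription', (text, uuid) => {
  // Process with LLM and respond...
});
Media Server Functions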
Transcoding
Convert between codecs (G.711 ↔ Opus).
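Transcoding normally passes through linear PCM: the server decodes one codec to PCM and re-encodes it in the other. The sketch below covers only the G.711 μ-law side of that path, using the widely published reference algorithm. The function names are illustrative, and the Opus side would need an encoder library that is not shown here.

```javascript
// G.711 mu-law <-> 16-bit linear PCM, following the classic reference algorithm.
const BIAS = 0x84;   // 132, added before companding
const CLIP = 32635;  // largest magnitude that still fits after adding the bias

// Decode one mu-law byte (0-255) to a signed 16-bit PCM sample.
function ulawToPcm16(ulawByte) {
  const u = ~ulawByte & 0xff;                // mu-law bytes are stored bit-inverted
  let t = ((u & 0x0f) << 3) + BIAS;          // 4-bit mantissa plus bias
  t <<= (u & 0x70) >> 4;                     // scale by the 3-bit exponent
  return (u & 0x80) ? BIAS - t : t - BIAS;   // sign bit set means negative
}

// Encode one signed 16-bit PCM sample to a mu-law byte.
function pcm16ToUlaw(sample) {
  const sign = sample < 0 ? 0x80 : 0;
  const magnitude = Math.min(Math.abs(sample), CLIP) + BIAS;
  let exponent = 7;
  // Find the position of the highest set bit (segment number).
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Decode a whole G.711 payload (Uint8Array or Buffer) into an Int16Array.
function decodeUlawFrame(payload) {
  return Int16Array.from(payload, ulawToPcm16);
}
```

The decoded Int16Array is the form an Opus encoder or a speech recognizer would typically consume, usually after resampling from 8 kHz.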
Mixing
Combine multiple audio streams (conferencing).
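As a rough illustration of mixing, a conference bridge can sum the PCM samples of every participant and clamp the result to the 16-bit range. This is a minimal sketch with a hypothetical mixPcm16 helper. Production mixers usually also exclude each speaker's own audio from the mix they hear and apply smarter gain handling than hard clipping.

```javascript
// Mix several 16-bit PCM streams (Int16Array) into one by summing
// sample by sample and saturating at the 16-bit limits.
function mixPcm16(streams) {
  const length = streams.reduce((max, s) => Math.max(max, s.length), 0);
  const mixed = new Int16Array(length);
  for (let i = 0; i < length; i++) {
    let sum = 0;
    for (const stream of streams) {
      if (i < stream.length) sum += stream[i];  // shorter streams count as silence
    }
    mixed[i] = Math.max(-32768, Math.min(32767, sum));  // clamp instead of wrapping
  }
  return mixed;
}
```

Clamping matters because writing an out-of-range sum straight into an Int16Array wraps around, which is heard as a loud click.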
Recording
Capture audio to file (compliance, training).
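A recording is often just raw PCM with a container header in front. The sketch below builds the standard 44-byte WAV (RIFF) header for uncompressed PCM. The buildWavHeader name and its defaults (8 kHz, mono, 16-bit) are assumptions for illustration, not settings taken from any particular server.

```javascript
// Build the 44-byte RIFF/WAVE header for uncompressed PCM audio.
function buildWavHeader(dataBytes, sampleRate = 8000, channels = 1, bitsPerSample = 16) {
  const blockAlign = (channels * bitsPerSample) / 8;  // bytes per sample frame
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + dataBytes, 4);            // file size minus the first 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);                       // size of the fmt chunk
  header.writeUInt16LE(1, 20);                        // format tag 1 = linear PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28);  // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(dataBytes, 40);                // size of the PCM payload
  return header;
}
```

Concatenating this header with the PCM bytes, for example with Buffer.concat, yields a file that standard audio players can open.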
DTMF
Generate and detect touch-tones.
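Each DTMF key is the sum of two sine waves, one from a low-frequency row group and one from a high-frequency column group. The sketch below generates a tone and detects it with the Goertzel algorithm, a common choice for measuring energy at a few known frequencies. The function names are illustrative, and the detector has no silence or noise threshold, so it always returns some key. A real detector would also check minimum power and tone duration.

```javascript
// Standard DTMF grid: each key is one row (low) plus one column (high) frequency.
const ROW_HZ = [697, 770, 852, 941];
const COL_HZ = [1209, 1336, 1477, 1633];
const KEYPAD = ['123A', '456B', '789C', '*0#D'];

// Generate a DTMF tone as float samples in [-1, 1].
function generateDtmf(key, durationMs = 100, sampleRate = 8000) {
  const row = key.length === 1 ? KEYPAD.findIndex((keys) => keys.includes(key)) : -1;
  if (row < 0) throw new Error(`Not a DTMF key: ${key}`);
  const col = KEYPAD[row].indexOf(key);
  const samples = new Float32Array(Math.round((sampleRate * durationMs) / 1000));
  for (let i = 0; i < samples.length; i++) {
    const t = (2 * Math.PI * i) / sampleRate;
    samples[i] = 0.5 * (Math.sin(ROW_HZ[row] * t) + Math.sin(COL_HZ[col] * t));
  }
  return samples;
}

// Goertzel algorithm: signal power at a single target frequency.
function goertzelPower(samples, freqHz, sampleRate) {
  const coeff = 2 * Math.cos((2 * Math.PI * freqHz) / sampleRate);
  let prev = 0;
  let prev2 = 0;
  for (const x of samples) {
    const s = x + coeff * prev - prev2;
    prev2 = prev;
    prev = s;
  }
  return prev * prev + prev2 * prev2 - coeff * prev * prev2;
}

// Pick the strongest row and column frequency and map them back to a key.
function detectDtmf(samples, sampleRate = 8000) {
  const strongest = (freqs) => {
    const powers = freqs.map((f) => goertzelPower(samples, f, sampleRate));
    return powers.indexOf(Math.max(...powers));
  };
  return KEYPAD[strongest(ROW_HZ)][strongest(COL_HZ)];
}
```

A quick round trip such as detectDtmf(generateDtmf('5')) exercises both halves of the sketch.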
Audio Fork
Split audio stream to Voice AI.
Playback
Play audio files or TTS output.
VAD
Voice Activity Detection for barge-in.
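One simple way to approximate VAD for barge-in is an energy threshold: treat a frame as speech when its RMS level exceeds a threshold, and trigger only after several consecutive speech frames so that a single click does not interrupt playback. The sketch below assumes 16-bit PCM frames. The threshold of 500 and the three-frame minimum are arbitrary illustrative values that would need tuning, and production systems often use model-based VAD instead.

```javascript
// Root-mean-square level of one frame of 16-bit PCM samples.
function frameRms(frame) {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (const sample of frame) sumSquares += sample * sample;
  return Math.sqrt(sumSquares / frame.length);
}

// Return the index of the frame at which barge-in triggers, or -1 if it never does.
// Barge-in triggers once `minFrames` consecutive frames reach the threshold.
function detectBargeIn(frames, threshold = 500, minFrames = 3) {
  let consecutive = 0;
  for (let i = 0; i < frames.length; i++) {
    consecutive = frameRms(frames[i]) >= threshold ? consecutive + 1 : 0;
    if (consecutive >= minFrames) return i;
  }
  return -1;
}
```

With 20 ms frames at 8 kHz (160 samples each), three frames correspond to about 60 ms of sustained speech before TTS playback would be stopped.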
AGC
Automatic Gain Control for levels.
Deployment Best Practices
Resource Allocation
- 1 CPU core per 50-100 concurrent calls
- 4GB RAM minimum, 8GB+ recommended
- SSD storage for recordings
- Dedicated network interface
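The CPU guideline above translates into a quick sizing estimate. The sketch below only applies the 50-100 calls-per-core range from the list. The estimateCores helper is hypothetical, and real capacity depends heavily on transcoding, recording, and conferencing load.

```javascript
// Estimate the CPU core range for a target number of concurrent calls,
// using the rule of thumb of 50-100 concurrent calls per core.
function estimateCores(concurrentCalls) {
  return {
    minCores: Math.ceil(concurrentCalls / 100),  // optimistic: 100 calls per core
    maxCores: Math.ceil(concurrentCalls / 50),   // conservative: 50 calls per core
  };
}

// Example: sizing for 1,000 concurrent calls.
const estimate = estimateCores(1000);
console.log(`Plan for ${estimate.minCores} to ${estimate.maxCores} cores`);
```

Planning toward the conservative end leaves headroom for CPU-heavy features such as transcoding.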
Network Configuration
- Disable firewall connection tracking (conntrack) for RTP ports
- Use jumbo frames if available
- Mark RTP packets for QoS (DSCP EF)
- Separate signaling and media interfaces
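For the QoS item, DSCP EF (Expedited Forwarding) is code point 46. Tools and socket options that take the whole IP TOS / Traffic Class byte expect that value shifted left by two bits, because the low two bits of the byte are reserved for ECN. The helper below is a small illustrative sketch of that conversion, with a hypothetical function name.

```javascript
// DSCP Expedited Forwarding code point, commonly used for voice media.
const DSCP_EF = 46;

// Convert a 6-bit DSCP value to the 8-bit TOS / Traffic Class byte.
// DSCP occupies the upper six bits; the lower two bits are ECN and stay zero.
function dscpToTos(dscp) {
  return (dscp & 0x3f) << 2;
}

const efTos = dscpToTos(DSCP_EF);
console.log(`DSCP EF = ${DSCP_EF}, TOS byte = 0x${efTos.toString(16)}`);
```

This explains why the same marking appears as 46 in some configurations and as 184 (0xb8) in others.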