Why VAD Is Critical

VAD (voice activity detection) determines when the user's speech starts and ends. It is essential for endpointing (deciding when the AI should respond) and barge-in (detecting when the user interrupts).

Detection latency: <50 ms
Accuracy target: >95%
Real-time, frame-by-frame processing

VAD Types

Energy-based

Compare audio energy to threshold

Pros

+ Very fast
+ Low CPU
+ Simple

Cons

- Fails in noise
- Needs tuning
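
As a rough illustration of the energy comparison, the sketch below computes a frame's RMS level in dB relative to full scale and compares it to a fixed threshold. It assumes samples are floats in [-1, 1]. The function names are hypothetical, and the -35 dB default simply mirrors the energyThreshold value used later on this page.

```javascript
// Minimal energy-based VAD sketch (assumption: samples are floats in [-1, 1]).
function frameEnergyDb(frame) {
  let sumSquares = 0;
  for (const s of frame) sumSquares += s * s;
  const rms = Math.sqrt(sumSquares / frame.length);
  // Floor the RMS so digital silence gives a finite value instead of -Infinity.
  return 20 * Math.log10(Math.max(rms, 1e-10));
}

// A frame counts as speech when its level exceeds the threshold.
function isSpeechByEnergy(frame, thresholdDb = -35) {
  return frameEnergyDb(frame) > thresholdDb;
}

// Usage: a 20 ms frame at 16 kHz holds 320 samples.
const silence = new Float32Array(320); // all zeros
const tone = Float32Array.from(
  { length: 320 },
  (_, i) => 0.5 * Math.sin((2 * Math.PI * 440 * i) / 16000)
);
console.log(isSpeechByEnergy(silence)); // false
console.log(isSpeechByEnergy(tone));    // true
```

A fixed threshold is what makes this approach fail in noise: any background sound louder than the threshold is reported as speech, which is why the threshold needs tuning per environment.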

Zero-Crossing Rate

Count signal sign changes

Pros

+ Fast
+ Combines well with energy-based detection

Cons

- Not robust alone
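
The sign-change count can be sketched as follows. The second helper shows one possible way to combine it with an energy level. Its name and both default thresholds are hypothetical, and the rule "low ZCR means voiced speech" is a simplification, since unvoiced consonants also have a high zero-crossing rate.

```javascript
// Fraction of adjacent sample pairs whose signs differ (0 = none, 1 = all).
function zeroCrossingRate(frame) {
  if (frame.length < 2) return 0;
  let crossings = 0;
  for (let i = 1; i < frame.length; i++) {
    // A crossing occurs when consecutive samples have opposite signs.
    if ((frame[i - 1] >= 0) !== (frame[i] >= 0)) crossings++;
  }
  return crossings / (frame.length - 1);
}

// Hypothetical combined rule: loud enough AND not noise-like.
// Both thresholds are illustrative assumptions, not tuned values.
function looksLikeVoicedSpeech(energyDb, zcr, { energyThresholdDb = -35, maxZcr = 0.25 } = {}) {
  return energyDb > energyThresholdDb && zcr <= maxZcr;
}

console.log(zeroCrossingRate([1, -1, 1, -1]));       // 1
console.log(zeroCrossingRate([0.2, 0.3, 0.4, 0.5])); // 0
console.log(looksLikeVoicedSpeech(-20, 0.1));        // true
console.log(looksLikeVoicedSpeech(-20, 0.6));        // false
```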

GMM-based

Statistical model of speech/non-speech

Pros

+ More robust
+ Adaptive

Cons

- Higher latency
- More complex
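
A heavily simplified sketch of the statistical idea: model speech and non-speech as Gaussian mixtures over a single feature (frame energy in dB) and classify by log-likelihood ratio. The model parameters below are hand-picked for illustration only. Real GMM-based detectors typically use several features and adapt their parameters over time, which this sketch does not do.

```javascript
// Log-density of a 1-D Gaussian.
function logGaussian(x, mean, variance) {
  return -0.5 * (Math.log(2 * Math.PI * variance) + ((x - mean) ** 2) / variance);
}

// Log-likelihood of x under a mixture: log(sum_k w_k * N(x; mean_k, var_k)).
function logMixture(x, components) {
  const terms = components.map(
    (c) => Math.log(c.weight) + logGaussian(x, c.mean, c.variance)
  );
  const max = Math.max(...terms);
  // log-sum-exp keeps the sum numerically stable for very small densities.
  return max + Math.log(terms.reduce((acc, t) => acc + Math.exp(t - max), 0));
}

// Hypothetical, hand-picked models over frame energy in dB.
const speechModel = [
  { weight: 0.6, mean: -20, variance: 36 },
  { weight: 0.4, mean: -30, variance: 25 },
];
const noiseModel = [{ weight: 1.0, mean: -55, variance: 49 }];

// Speech when the log-likelihood ratio exceeds the decision threshold.
function isSpeechByGmm(energyDb, llrThreshold = 0) {
  const llr = logMixture(energyDb, speechModel) - logMixture(energyDb, noiseModel);
  return llr > llrThreshold;
}

console.log(isSpeechByGmm(-20)); // true
console.log(isSpeechByGmm(-55)); // false
```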

Neural Network

Deep learning classification

Pros

+ Best accuracy
+ Handles noise

Cons

- Higher compute cost (CPU or GPU)
- Added inference latency

WebRTC VAD Modes

Mode	Aggressiveness	Description
0	Quality	Least aggressive; keeps the most audio, so speech is rarely clipped
1	Low bitrate	Slightly more aggressive at filtering out non-speech
2	Aggressive	Filters non-speech more strongly
3	Very aggressive	Most aggressive; may clip quiet speech

Use Cases

Endpointing

Detect when user stops speaking

Yes

Barge-in

Detect when user interrupts AI

Yes

Bandwidth Saving

Don't transmit silence

Medium

ASR Optimization

Only process speech regions

Medium

Recording Trimming

Remove silence from recordings

Low
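
For bandwidth saving, ASR optimization, and recording trimming, the common step is turning per-frame VAD decisions into time ranges that are worth keeping. A minimal sketch, assuming fixed-length frames (the 20 ms default and the function name are assumptions):

```javascript
// Convert per-frame speech flags into { startMs, endMs } segments.
// endMs is exclusive: it marks the start of the first non-speech frame.
function speechSegments(flags, frameMs = 20) {
  const segments = [];
  let start = -1; // index of the first frame of the open segment, or -1
  for (let i = 0; i <= flags.length; i++) {
    // Treat the position past the last frame as silence to close any open segment.
    const speech = i < flags.length && Boolean(flags[i]);
    if (speech && start < 0) start = i;
    if (!speech && start >= 0) {
      segments.push({ startMs: start * frameMs, endMs: i * frameMs });
      start = -1;
    }
  }
  return segments;
}

console.log(speechSegments([false, true, true, false, false, true]));
// [ { startMs: 20, endMs: 60 }, { startMs: 100, endMs: 120 } ]
```

Only the returned ranges would then be transmitted, sent to ASR, or kept in the trimmed recording.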

Endpointing Configuration

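// VAD-based endpointing configuration (all durations in milliseconds)
const vadConfig = {
  // Minimum continuous speech before speechStart fires; filters clicks and coughs
  minSpeechDuration: 200,  // ms

  // Continuous silence after speech before the endpoint (speechEnd) fires
  endpointSilence: 700,    // ms

  // Hangover: buffer kept after the last detected speech frame
  hangoverTime: 300,       // ms

  // Energy threshold (dB)
  energyThreshold: -35,

  // VAD aggressiveness mode (0-3, see WebRTC VAD Modes)
  vadMode: 2,

  // Use neural VAD (more accurate)
  useNeuralVAD: true
};

// Events
// `vad` is assumed to be a detector instance already created from vadConfig;
// stopAIPlayback and triggerASRFinalization are application-defined handlers.
vad.on('speechStart', () => {
  // User started speaking: stop AI audio so the user can barge in
  stopAIPlayback();
});

vad.on('speechEnd', () => {
  // User finished speaking: ask the ASR to finalize the transcript
  triggerASRFinalization();
});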
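
One way the two duration settings could drive the speechStart and speechEnd events is a small state machine fed with one VAD decision per frame. This is a sketch under stated assumptions, not the implementation behind the configuration above: the Endpointer class and the frameMs parameter are hypothetical, and hangoverTime is not modeled.

```javascript
// Sketch of an endpointing state machine driven by per-frame VAD decisions.
class Endpointer {
  constructor({ minSpeechDuration = 200, endpointSilence = 700, frameMs = 20 } = {}) {
    this.minSpeechMs = minSpeechDuration;
    this.endpointSilenceMs = endpointSilence;
    this.frameMs = frameMs;
    this.speaking = false; // true between speechStart and speechEnd
    this.speechMs = 0;     // consecutive speech seen while not yet speaking
    this.silenceMs = 0;    // consecutive silence seen while speaking
  }

  // Feed one frame decision; returns 'speechStart', 'speechEnd', or null.
  push(isSpeech) {
    if (!this.speaking) {
      // Any silent frame resets the run, so short blips never trigger a start.
      this.speechMs = isSpeech ? this.speechMs + this.frameMs : 0;
      if (this.speechMs >= this.minSpeechMs) {
        this.speaking = true;
        this.silenceMs = 0;
        return 'speechStart';
      }
      return null;
    }
    // While speaking, any speech frame resets the silence run.
    this.silenceMs = isSpeech ? 0 : this.silenceMs + this.frameMs;
    if (this.silenceMs >= this.endpointSilenceMs) {
      this.speaking = false;
      this.speechMs = 0;
      return 'speechEnd';
    }
    return null;
  }
}

// Usage: 10 speech frames (200 ms), then 35 silent frames (700 ms).
const endpointer = new Endpointer();
const frames = [...Array(10).fill(true), ...Array(35).fill(false)];
const events = frames.map((f) => endpointer.push(f)).filter(Boolean);
console.log(events); // [ 'speechStart', 'speechEnd' ]
```

The trade-off sits in endpointSilence: a shorter value makes the AI respond sooner but risks cutting the user off during a natural pause, while a longer value feels sluggish.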

Quality Metrics

Metric	What it measures	Target
False Acceptance Rate	Noise classified as speech	<5%
False Rejection Rate	Speech classified as silence	<2%
Detection Latency	Time to detect speech start	<50 ms
Hangover Time	Buffer after speech ends	200-500 ms
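
The two error rates can be measured by comparing VAD output with hand-labeled reference frames. The sketch below assumes one common frame-level convention: false acceptance is computed over non-speech frames and false rejection over speech frames. The function name is hypothetical.

```javascript
// Frame-level error rates from predicted vs. reference speech flags.
function vadErrorRates(predicted, reference) {
  if (predicted.length !== reference.length) {
    throw new Error('predicted and reference must have the same length');
  }
  let falseAccepts = 0;
  let falseRejects = 0;
  let nonSpeech = 0;
  let speech = 0;
  for (let i = 0; i < reference.length; i++) {
    if (reference[i]) {
      speech++;
      if (!predicted[i]) falseRejects++; // speech classified as silence
    } else {
      nonSpeech++;
      if (predicted[i]) falseAccepts++;  // noise classified as speech
    }
  }
  return {
    falseAcceptanceRate: nonSpeech ? falseAccepts / nonSpeech : 0,
    falseRejectionRate: speech ? falseRejects / speech : 0,
  };
}

const reference = [0, 0, 0, 0, 1, 1, 1, 1];
const predicted = [0, 1, 0, 0, 1, 1, 0, 1];
console.log(vadErrorRates(predicted, reference));
// { falseAcceptanceRate: 0.25, falseRejectionRate: 0.25 }
```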

Kallina: Intelligent VAD

Neural VAD for precise, natural endpointing.
