Why Streaming
Without streaming, the user waits 2-3 seconds for the LLM to finish generating. With streaming, they hear the response start within ~500ms. That is a huge difference in experience.
Without Streaming
Wait 2-3s → Hear full response
With Streaming
Wait 500ms → Hear response flowing
Benefits
Lower Perceived Latency
User hears response within 500ms instead of waiting 2-3s
Natural Pacing
Response flows naturally like human speech
Early TTS Start
TTS can start synthesizing audio while the LLM is still producing tokens
Interruptibility
Can stop generation if user interrupts
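The interruptibility benefit can be sketched with a standard AbortController: the consumer checks the abort signal between tokens and stops as soon as the user barges in. `makeLlmStream` and `consumeUntilAborted` are hypothetical stand-ins, not a real client API.

```javascript
// Sketch: stop consuming an LLM stream when the user interrupts.
// `makeLlmStream` simulates a streaming LLM client (hypothetical).
async function* makeLlmStream(tokens) {
  for (const text of tokens) yield { text };
}

// Consume tokens until the abort signal fires; remaining tokens are dropped.
async function consumeUntilAborted(stream, signal, onToken) {
  const received = [];
  for await (const chunk of stream) {
    if (signal.aborted) break; // user interrupted: discard the rest
    received.push(chunk.text);
    onToken?.(chunk.text, received.length);
  }
  return received;
}

// Usage: simulate the user interrupting after the second token.
const controller = new AbortController();
consumeUntilAborted(
  makeLlmStream(['Hello', ' there', ' friend']),
  controller.signal,
  (_tok, count) => { if (count === 2) controller.abort(); }
).then((received) => console.log(received.join(''))); // "Hello there"
```

In a real pipeline the same signal would also be passed to the LLM client so the provider stops billing for unused tokens.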
Streaming Pipeline
```javascript
// LLM streaming with sentence buffering for TTS
async function streamToTTS(llmStream) {
  let buffer = '';
  const sentenceEnders = /[.!?]/;

  for await (const chunk of llmStream) {
    buffer += chunk.text;

    // Check for a complete sentence
    const match = buffer.match(sentenceEnders);
    if (match) {
      const sentenceEnd = match.index + 1;
      const sentence = buffer.slice(0, sentenceEnd);
      buffer = buffer.slice(sentenceEnd).trim();

      // Send to TTS immediately
      await tts.speak(sentence);
    }
  }

  // Send any remaining text
  if (buffer.trim()) {
    await tts.speak(buffer);
  }
}
```

Challenges & Solutions
| Challenge | Solution |
|---|---|
| Sentence Boundaries | Buffer until punctuation or natural break |
| TTS Sync | Queue sentences for smooth playback |
| Function Calls | Detect and handle mid-stream |
| Error Recovery | Graceful fallback if stream fails |
Sentence Buffering Strategy
Token: "Your" → Buffer: "Your"
Token: " order" → Buffer: "Your order"
Token: " arrives" → Buffer: "Your order arrives"
Token: " tomorrow." → SEND TO TTS: "Your order arrives tomorrow."
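A trace like this can be replayed in a few lines: the buffering rule is just "append the token, flush when the buffer contains sentence-ending punctuation." `traceBuffering` is an illustrative helper, not part of the pipeline.

```javascript
// Replay the buffering trace: append each token, flush on [.!?].
function traceBuffering(tokens) {
  const events = [];
  let buffer = '';
  for (const tok of tokens) {
    buffer += tok;
    if (/[.!?]/.test(buffer)) {
      events.push(`SEND TO TTS: "${buffer}"`); // complete sentence: flush
      buffer = '';
    } else {
      events.push(`Buffer: "${buffer}"`);      // still accumulating
    }
  }
  return events;
}

console.log(traceBuffering(['Your', ' order', ' arrives', ' tomorrow.']));
// → ['Buffer: "Your"', 'Buffer: "Your order"',
//    'Buffer: "Your order arrives"',
//    'SEND TO TTS: "Your order arrives tomorrow."']
```

A production version would also flush on natural breaks (commas, long pauses) so very long sentences do not delay TTS indefinitely.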