GPT-4 Turbo Overview
GPT-4 Turbo offers the best balance of capabilities, latency, and cost for most voice AI applications.
Context Window: 128K tokens
TTFT (streaming): ~500-800ms
Function Calling: Parallel, reliable
Knowledge Cutoff: December 2023
Best Practices
Use Streaming
Always enable streaming for voice, and start TTS as soon as the first tokens arrive.
stream: true

System Prompt Caching
Keep the system prompt constant to benefit from caching.
Use identical system message

Short Responses
Prompt for concise responses suitable for voice.
"Respond in 1-2 sentences"

Function Descriptions
Write clear function descriptions for reliable calling.
Detailed parameter descriptions

Voice-Optimized System Prompt
const systemPrompt = `You are a friendly AI assistant for [Company].
You are having a phone conversation with a customer.

VOICE GUIDELINES:
- Keep responses under 2 sentences
- Use natural, conversational language
- Don't use bullet points or lists
- Confirm information naturally
- Say phone numbers digit by digit
- Ask one question at a time

CAPABILITIES:
- Check order status
- Schedule appointments
- Answer product questions
- Transfer to human agent

If you can't help, offer to transfer to a human.`;
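As a rough sketch of how the streaming and constant-system-prompt practices might fit together with a prompt like the one above: the helper names `buildMessages` and `createSentenceBuffer` are hypothetical and not part of any SDK. The first keeps the system message identical on every turn. The second accumulates streamed text deltas and releases complete sentences so that TTS can start before the full reply arrives. The sentence-boundary rule (a `.`, `!`, or `?` followed by whitespace) is an assumption and will mis-split abbreviations such as "Dr. Smith".

```javascript
// Hypothetical helper: the system message always comes first and is never
// modified, so every request begins with the same prefix.
function buildMessages(systemPrompt, history, userText) {
  return [
    { role: "system", content: systemPrompt },
    ...history,
    { role: "user", content: userText },
  ];
}

// Hypothetical helper: collects streamed text deltas and hands each complete
// sentence to onSentence (for example, a function that enqueues TTS audio).
function createSentenceBuffer(onSentence) {
  let buffer = "";
  return {
    push(delta) {
      buffer += delta;
      // Assumed boundary: ., ! or ? followed by at least one whitespace char.
      let match;
      while ((match = buffer.match(/^([\s\S]*?[.!?])\s+/)) !== null) {
        onSentence(match[1].trim());
        buffer = buffer.slice(match[0].length);
      }
    },
    // Call once the stream ends to release any trailing text.
    flush() {
      const rest = buffer.trim();
      buffer = "";
      if (rest) onSentence(rest);
    },
  };
}
```

In a real call with `stream: true`, each chunk's text delta (in the OpenAI Node SDK, `chunk.choices[0]?.delta?.content`) would be passed to `push`, with `flush` called when the stream ends.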
Prompt Tips for Voice
Instruct to be concise: "Keep responses under 50 words"
Avoid lists in voice: "Use flowing sentences, not bullet points"
Conversation context: "You are in a phone conversation"
Pronunciation hints: "Say numbers digit by digit for phone numbers"
Natural confirmations: "Confirm information naturally, don't repeat verbatim"
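A model may occasionally ignore a prompt instruction, so a post-processing step before TTS can act as a fallback. The sketch below does this for the pronunciation tip only. It is an illustration, not something the tips above prescribe: `spellOutDigits` is a hypothetical helper, and treating any run of seven or more digits (allowing spaces, dashes, dots, and parentheses between them) as a phone number is an assumption that will not suit every locale.

```javascript
// Hypothetical fallback for the "digit by digit" tip: rewrites phone-like
// digit runs so a TTS engine reads them one digit at a time.
// Assumption: 7 or more digits, optionally separated by spaces, dashes,
// dots, or parentheses, are treated as a phone number.
const PHONE_LIKE = /\+?\(?\d(?:[\s\-.()]*\d){6,}/g;

function spellOutDigits(text) {
  return text.replace(PHONE_LIKE, (match) =>
    // Drop everything except digits, then separate them with spaces.
    match.replace(/\D/g, "").split("").join(" ")
  );
}
```

Shorter numbers, such as a five-digit order ID, are left untouched by this rule.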
Function Calling Example
const functions = [{
  name: "check_order_status",
  description: "Check the status of a customer order",
  parameters: {
    type: "object",
    properties: {
      order_id: {
        type: "string",
        description: "The order ID (e.g., ORD-12345)"
      }
    },
    required: ["order_id"]
  }
}];
// GPT-4 Turbo can call this when the user asks about an order, e.g.
// "Where is my order 12345?"
// -> function_call: check_order_status({order_id: "12345"})
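To complete the round trip, the application has to run the function the model requested and send the result back. Here is a minimal sketch, assuming the `function_call` response shape shown in the comments above (a `name` plus `arguments` delivered as a JSON string). The names `handlers` and `handleFunctionCall` and the stubbed order lookup are hypothetical and exist only for illustration. The handlers are synchronous to keep the sketch short; a real lookup would likely be asynchronous.

```javascript
// Hypothetical local implementations, keyed by function name.
// The order lookup is a stub; a real one would query an order system.
const handlers = new Map([
  ["check_order_status", ({ order_id }) => ({ order_id, status: "shipped" })],
]);

// Builds the message that carries a function result back to the model.
function functionMessage(name, payload) {
  return { role: "function", name, content: JSON.stringify(payload) };
}

// Runs the requested function. Assumes `arguments` arrives as a JSON string.
function handleFunctionCall(functionCall) {
  const handler = handlers.get(functionCall.name);
  if (!handler) {
    return functionMessage(functionCall.name, { error: "unknown function" });
  }
  let args;
  try {
    args = JSON.parse(functionCall.arguments);
  } catch {
    // The model can emit malformed JSON, so report it instead of throwing.
    return functionMessage(functionCall.name, { error: "invalid arguments" });
  }
  return functionMessage(functionCall.name, handler(args));
}
```

The returned message would be appended to the conversation before the next request, so the model can phrase the result for the caller. Newer versions of the Chat Completions API express the same idea through `tools` and `tool_calls`; the dispatch logic stays much the same.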