
Pre-Flight: Voice Agent Auditor
Catches the production bugs in your voice agent's prompt before your users do.

About this Agent
A pre-flight check for voice AI agents, built by an engineer who has shipped production voice agents on Bolna, Vapi, Retell, LiveKit, and ElevenLabs Conversational AI, and debugged the bugs that only show up at 2am on a live call.

Paste in your voice agent's system prompt, opening line, and a one-line context (platform, use case, language). Get back a senior-engineer code review of your agent's configuration covering:
- Race conditions in turn-taking: the "user-says-hello-during-the-welcome-message" bug that breaks most first deployments on Bolna and Vapi.
- Prompt ambiguity and missing fallback paths: instructions the LLM can interpret two ways, undefined trigger conditions, no exit conditions, no escalation paths.
- TTS-readability and latency issues: URLs and currency symbols read literally, monologues that bloat first-token latency, prompts that exceed practical token budgets for voice.
- Safety, PII, and human-handoff gaps: missing consent disclosures, promises the agent structurally can't keep, no guardrails for hostile or distressed callers.

Every issue comes with the exact phrase that's risky, why it matters in production, the user-experience impact, and a copy-paste-ready rewrite. Not generic advice.

Best used on the day before you ship a new voice agent, or right after you fix a production incident, to catch the next one before it bites.
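
To make the TTS-readability category concrete, here is a minimal sketch of the kind of check it describes: flagging URLs and currency symbols that a text-to-speech engine may read out literally. The function name `find_tts_risks` and both regular expressions are hypothetical illustrations, not the auditor's actual rules.

```python
import re

# Hypothetical patterns for illustration only; real prompts need broader coverage.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
CURRENCY_RE = re.compile(r"[$€£₹¥]")


def find_tts_risks(prompt: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for text a TTS engine may read literally."""
    # URLs first, then currency symbols, each in order of appearance.
    risks = [("url", m.group()) for m in URL_RE.finditer(prompt)]
    risks += [("currency_symbol", m.group()) for m in CURRENCY_RE.finditer(prompt)]
    return risks


if __name__ == "__main__":
    for kind, text in find_tts_risks("Visit https://example.com to pay $20 today"):
        print(f"{kind}: {text}")
```

A prompt that spells these out for speech ("example dot com", "twenty dollars") returns no findings from this sketch.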
Key Features
- Catches the race conditions other tools miss. Specifically tuned for the welcome-message-vs-pickup bug on Bolna, the first-message-mode gaps on Vapi, and the begin_message issues on Retell — production failure modes that don't surface in dev testing and have broken more first deployments than any other category.
- Every issue comes with a copy-paste-ready rewrite. Not "consider rephrasing for clarity." The exact replacement text, ready to drop into your system prompt. If the auditor can't write the fix, it doesn't raise the issue.
- Severity tiers that reflect production reality. Critical / High / Medium / Low — anchored to actual ship decisions, not opinion. A Critical means don't deploy. A Low means polish when you have time. The overall 0–10 score tells you in one number whether you're shipping today.
- Audits four categories in one pass. Race conditions and turn-taking, prompt ambiguity and missing fallbacks, TTS-readability and latency, and safety/PII/escalation gaps — the four places voice agents fail in production, covered in a single audit.
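
As a rough illustration of what one audit finding might look like in structured form, the sketch below models the fields the description lists (risky phrase, why it matters, user-experience impact, rewrite) and the rule that a Critical means don't deploy. The class names, field names, and `blocks_deploy` helper are assumptions made for this example, and the sample finding is invented; the auditor's real output format is not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "Critical"  # don't deploy
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"            # polish when you have time


@dataclass
class Issue:
    # Hypothetical field names mirroring what each finding is said to include.
    category: str
    severity: Severity
    risky_phrase: str
    why_it_matters: str
    ux_impact: str
    rewrite: str


def blocks_deploy(issues: list[Issue]) -> bool:
    """True if any finding is Critical, which the tiers treat as a no-ship signal."""
    return any(issue.severity is Severity.CRITICAL for issue in issues)


if __name__ == "__main__":
    # Invented sample finding, for illustration only.
    sample = Issue(
        category="TTS-readability and latency",
        severity=Severity.MEDIUM,
        risky_phrase="Visit https://example.com",
        why_it_matters="The URL may be read out character by character.",
        ux_impact="Caller hears a long, unusable string.",
        rewrite="Visit example dot com",
    )
    print(blocks_deploy([sample]))
```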