How to Create Voice-Enabled Chatbots with Audio Input and AI Speech Output

Adding voice capabilities to your chatbot can elevate user engagement, accessibility, and interactivity. In this guide, we’ll show how to enable audio input and generate AI speech output using Release0’s built-in tools and integrations with ElevenLabs and OpenAI.
🎤 Audio Input: Let Users Speak
Every Text Input block in Release0 can optionally include microphone support. When enabled, users can click a mic icon and speak instead of typing.
How to Enable:
- Add a Text Input block.
- Enable the 🎤 Allow voice input setting.
- Users will see a mic icon next to the text field.
Voice input is especially helpful for mobile users, accessibility flows, and quick replies.
🧠 AI Speech Output with ElevenLabs
To convert chatbot replies into spoken audio, use the ElevenLabs AI integration block.
Setup Steps:
- Add the ElevenLabs block after any AI or Text block.
- Provide your ElevenLabs API key.
- Pass in the {{response_text}} variable from a previous block.
- Choose a voice (e.g., Rachel, Adam).
- Use the output variable (e.g., {{audio_url}}) in an Audio block to play the sound.
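Under the hood, a text-to-speech step like this has to call ElevenLabs' API. The sketch below shows roughly what such a call looks like against ElevenLabs' public REST endpoint (`POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}` with an `xi-api-key` header), using only Python's standard library. It is an illustration, not Release0's implementation: we assume the block sends something similar, and `build_tts_request` and `synthesize` are hypothetical helper names. Voice names such as Rachel correspond to voice IDs in your ElevenLabs account, so `YOUR_VOICE_ID` is a placeholder. The API returns raw audio bytes; how Release0 turns those into `{{audio_url}}` is not covered here.

```python
import json
import urllib.request

# Public ElevenLabs text-to-speech endpoint; {voice_id} is filled in per request.
TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(api_key: str, voice_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for ElevenLabs."""
    if not text.strip():
        raise ValueError("text must not be empty")
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL.format(voice_id=voice_id),
        data=body,
        headers={
            "xi-api-key": api_key,  # ElevenLabs authenticates with this header
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",  # ask for MP3 audio in the response
        },
        method="POST",
    )


def synthesize(api_key: str, voice_id: str, text: str) -> bytes:
    """Send the request and return the raw audio bytes (needs network access)."""
    req = build_tts_request(api_key, voice_id, text)
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # Only builds the request; nothing is sent without calling synthesize().
    req = build_tts_request("YOUR_API_KEY", "YOUR_VOICE_ID", "The capital of France is Paris.")
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to inspect exactly what would be sent to ElevenLabs before spending API credits.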
Example Flow:
- The user asks a question → the AI answers via an OpenAI block
- ElevenLabs converts the answer to speech
- An Audio block plays it back to the user
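The example flow is a chain in which each block writes a variable that the next block reads. The Python sketch below models one voice turn that way. `run_voice_turn`, `fake_answer`, `fake_tts`, and the `user_question` variable are hypothetical stand-ins for the OpenAI, ElevenLabs, and Audio blocks, and the URL is a dummy value, not real Release0 output.

```python
from typing import Callable, Dict


def run_voice_turn(
    question: str,
    answer_fn: Callable[[str], str],
    tts_fn: Callable[[str], str],
) -> Dict[str, str]:
    """Run one question/answer turn, passing values along as named variables."""
    variables = {"user_question": question}
    # Step 1: the AI block answers the question and stores {{response_text}}.
    variables["response_text"] = answer_fn(variables["user_question"])
    # Step 2: the text-to-speech block turns that answer into {{audio_url}}.
    variables["audio_url"] = tts_fn(variables["response_text"])
    # Step 3: an Audio block would play variables["audio_url"] to the user.
    return variables


# Hypothetical stand-ins for the real OpenAI and ElevenLabs blocks.
def fake_answer(question: str) -> str:
    return "The capital of France is Paris."


def fake_tts(text: str) -> str:
    return "https://example.com/audio/reply.mp3"


if __name__ == "__main__":
    result = run_voice_turn("What is the capital of France?", fake_answer, fake_tts)
    print(result["response_text"], "->", result["audio_url"])
```

Passing the AI and speech steps in as functions mirrors how you can swap the AI block (OpenAI, Groq) or the voice without changing the rest of the flow.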
🌍 Voice Bot Example: World Capitals Quiz
Try this agent that lets you ask for a country and get the capital city spoken back:
Flow Overview:
- Text Input block with voice input enabled
- AI block (OpenAI or Groq) that answers: "What is the capital of {{country}}?"
- ElevenLabs block transforms that response into audio
- Audio block plays the AI’s spoken reply
This flow creates a fully voice-interactive experience — ideal for educational bots, multilingual use cases, and accessibility-driven UX.
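The AI block's prompt relies on variable substitution: `{{country}}` is replaced with whatever the user said before the prompt is sent. Here is a minimal Python sketch of that substitution. The `render` helper is hypothetical, and raising an error on an undefined variable is our own choice; Release0's behavior for missing variables isn't described here.

```python
import re

# Matches {{name}} placeholders, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Replace each {{name}} in template with variables[name]."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # Fail loudly rather than sending a half-filled prompt to the AI.
            raise KeyError(f"undefined variable: {name}")
        return str(variables[name])

    return PLACEHOLDER.sub(replace, template)


if __name__ == "__main__":
    prompt = render("What is the capital of {{country}}?", {"country": "Japan"})
    print(prompt)  # What is the capital of Japan?
```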
🔊 When to Use Voice Interactions
- Education & Quizzes: Reinforce learning through spoken feedback
- Customer Support: Let users speak instead of typing long messages
- Language Learning: Teach pronunciation with AI voices
- Hands-Free Access: Great for mobile or voice-first interfaces
Audio input uses the browser’s built-in speech recognition. Accuracy may vary by device and language.
🔧 Combine with Other Blocks
- Use Condition blocks to check whether the audio transcription contains certain phrases
- Trigger different ElevenLabs voices based on user preferences
- Store voice interaction logs with Submissions or export to Sheets
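The first two ideas above boil down to simple checks on flow variables. Below is a Python sketch of a case-insensitive phrase check that matches whole words (so "yes" does not match "yesterday") and a preference-to-voice lookup. The function names and the `VOICE_BY_PREFERENCE` mapping are hypothetical; in Release0 you would configure the equivalent logic in Condition blocks rather than write code, and the available Condition operators may behave differently.

```python
import re
from typing import Iterable

# Hypothetical mapping from a stored user preference to an ElevenLabs voice name.
VOICE_BY_PREFERENCE = {"female": "Rachel", "male": "Adam"}


def _normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def contains_phrase(transcript: str, phrases: Iterable[str]) -> bool:
    """True if any non-empty phrase appears in the transcript as whole words."""
    text = _normalize(transcript)
    for phrase in phrases:
        cleaned = _normalize(phrase)
        if not cleaned:
            continue  # an empty phrase would otherwise match everything
        if re.search(r"\b" + re.escape(cleaned) + r"\b", text):
            return True
    return False


def pick_voice(preference: str, default: str = "Rachel") -> str:
    """Return the voice for a preference, falling back to a default voice."""
    return VOICE_BY_PREFERENCE.get(_normalize(preference), default)


if __name__ == "__main__":
    heard = "Can I talk to a human, please?"
    if contains_phrase(heard, ["talk to a human", "agent"]):
        print("Route to support")
    print(pick_voice("Male"))
```

Normalizing case and whitespace first matters for voice input, since browser speech recognition may capitalize or punctuate the same phrase differently from one attempt to the next.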
Ready to bring your bots to life with voice?
Try building your own agent using voice input + speech output, or fork our World Capitals Voice Bot above to get started!