Published July 26, 2025 © GPL3+

Real-Time Voice Transcription with XIAO ESP32S3

Turn speech into text using the XIAO ESP32S3 Sense’s built-in microphone and the ElevenLabs Speech-to-Text API in this compact, AI-powered IoT project.

Beginner | Protip | 1 hour


Things used in this project

Hardware components

Seeed Studio XIAO ESP32S3 Sense

USB Cable, USB Type C Plug

microSD card

Software apps and online services

Arduino IDE

Story

Overview

In this guide, we’ll explore how to use the Seeed XIAO ESP32S3 Sense to capture voice input using its built-in microphone and convert it into text using the ElevenLabs Speech-to-Text API. Powered by the dual-core ESP32-S3 and on-chip PSRAM, this tiny board makes it easy to prototype voice-interactive applications with minimal hardware.

Test the Mic – Record and Save Audio Locally

Before diving into live transcription, let’s verify that the XIAO ESP32S3 Sense’s microphone works as expected. In this section, we’ll use the ESP32-S3’s I2S peripheral to record a short WAV clip from the onboard digital (PDM) microphone and save it to the microSD card.

What This Code Does

Captures 10 seconds of mono audio at 16 kHz
Saves it as a proper .wav file to the SD card
Uses built-in PSRAM for buffering
Writes a valid WAV header so the file plays in standard audio players
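For reference, the header mentioned in the last bullet is a standard RIFF/WAVE header. The sketch below builds the canonical 44-byte PCM form for the 16 kHz, 16-bit mono format used here. It is plain, portable C++ with a hypothetical function name (`makeWavHeader`), not the code from xiao_s3_audio; only the field layout comes from the WAV format itself.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// WAV fields are little-endian: least significant byte first.
static void putLE16(uint8_t* p, uint16_t v) {
  p[0] = static_cast<uint8_t>(v & 0xFF);
  p[1] = static_cast<uint8_t>(v >> 8);
}

static void putLE32(uint8_t* p, uint32_t v) {
  for (int i = 0; i < 4; ++i) p[i] = static_cast<uint8_t>((v >> (8 * i)) & 0xFF);
}

// Hypothetical helper: builds a canonical 44-byte PCM WAV header
// for `dataBytes` bytes of samples.
std::array<uint8_t, 44> makeWavHeader(uint32_t dataBytes, uint32_t sampleRate = 16000,
                                      uint16_t bitsPerSample = 16, uint16_t channels = 1) {
  assert(bitsPerSample % 8 == 0);  // sample sizes are whole bytes here
  std::array<uint8_t, 44> h{};
  const uint16_t blockAlign = static_cast<uint16_t>(channels * (bitsPerSample / 8));  // bytes per frame
  const uint32_t byteRate = sampleRate * blockAlign;                                   // bytes per second
  std::memcpy(&h[0], "RIFF", 4);
  putLE32(&h[4], 36 + dataBytes);  // RIFF size = total file size - 8
  std::memcpy(&h[8], "WAVE", 4);
  std::memcpy(&h[12], "fmt ", 4);
  putLE32(&h[16], 16);             // size of the PCM fmt chunk body
  putLE16(&h[20], 1);              // audio format 1 = uncompressed PCM
  putLE16(&h[22], channels);
  putLE32(&h[24], sampleRate);
  putLE32(&h[28], byteRate);
  putLE16(&h[32], blockAlign);
  putLE16(&h[34], bitsPerSample);
  std::memcpy(&h[36], "data", 4);
  putLE32(&h[40], dataBytes);      // number of sample bytes that follow
  return h;
}
```

For the 10-second recording described above, the data chunk holds 16,000 samples/s × 2 bytes × 10 s = 320,000 bytes, so the complete file should be 320,044 bytes.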

xiao_s3_audio code

Recording Output

File: /arduino_rec.wav
Format: 16-bit PCM, mono, 16 kHz
Can be played using VLC, Audacity, etc.
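If the file does not play, or you simply want to confirm the format, the header can be checked on a PC. The hypothetical `parseWavHeader` helper below is an illustration, not part of the original project: it walks the RIFF chunks (so it tolerates extra chunks before `data`), reports the format fields, and computes the duration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct WavInfo {
  uint16_t audioFormat = 0;  // 1 = PCM
  uint16_t channels = 0;
  uint32_t sampleRate = 0;
  uint16_t bitsPerSample = 0;
  uint32_t dataBytes = 0;
  double durationSeconds() const {
    const uint32_t bytesPerSecond = sampleRate * channels * (bitsPerSample / 8u);
    return bytesPerSecond ? static_cast<double>(dataBytes) / bytesPerSecond : 0.0;
  }
};

static uint16_t getLE16(const uint8_t* p) { return static_cast<uint16_t>(p[0] | (p[1] << 8)); }
static uint32_t getLE32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Fills `out` from the "fmt " and "data" chunks. Returns false if `buf` is not
// a RIFF/WAVE file or a required chunk is missing or truncated.
bool parseWavHeader(const uint8_t* buf, std::size_t len, WavInfo& out) {
  assert(buf != nullptr);
  if (len < 12 || std::memcmp(buf, "RIFF", 4) != 0 || std::memcmp(buf + 8, "WAVE", 4) != 0)
    return false;
  bool haveFmt = false;
  std::size_t pos = 12;  // first sub-chunk follows "RIFF" <size> "WAVE"
  while (pos + 8 <= len) {
    const uint8_t* id = buf + pos;
    const uint32_t size = getLE32(buf + pos + 4);
    const uint8_t* body = buf + pos + 8;
    if (std::memcmp(id, "data", 4) == 0) {
      out.dataBytes = size;  // samples follow; only the header is needed here
      return haveFmt;
    }
    if (size > len - pos - 8) return false;  // chunk runs past the buffer
    if (std::memcmp(id, "fmt ", 4) == 0) {
      if (size < 16) return false;
      out.audioFormat = getLE16(body);
      out.channels = getLE16(body + 2);
      out.sampleRate = getLE32(body + 4);
      out.bitsPerSample = getLE16(body + 14);
      haveFmt = true;
    }
    pos += 8 + size + (size & 1u);  // chunks are padded to an even length
  }
  return false;  // no "data" chunk found
}
```

Reading the first few hundred bytes of /arduino_rec.wav into a buffer and passing them in should report 16 kHz, 1 channel, 16 bits, and a duration of about 10 seconds.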

Serial Output

Create an ElevenLabs API Key

To use ElevenLabs' Speech-to-Text API, you’ll need an API key. Here's how to generate one:

Go to https://elevenlabs.io and sign in or create a free account.
Once logged in, navigate to your Account Settings or API section from the dashboard.
Find the API Key area and click “Create New Key” (give it a name like “XIAO_STT_Test”).
Copy the generated API key and store it somewhere safe; you’ll need it in the Arduino sketch later.
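Rather than pasting the key directly into the main sketch, one common approach is to keep credentials in a separate header that stays out of version control. The file name `secrets.h`, the constant names, and the `maskKey` helper below are hypothetical, not taken from the project's code.

```cpp
// secrets.h (hypothetical): credentials kept out of the main sketch.
#ifndef SECRETS_H
#define SECRETS_H

#include <cassert>
#include <string>

// Placeholder values: replace them with your own before uploading.
constexpr const char* WIFI_SSID = "your-ssid";
constexpr const char* WIFI_PASSWORD = "your-wifi-password";
constexpr const char* ELEVENLABS_API_KEY = "your-elevenlabs-api-key";

// Hides all but the last four characters so the key can be logged safely.
inline std::string maskKey(const std::string& key) {
  if (key.size() <= 4) return std::string(key.size(), '*');
  return std::string(key.size() - 4, '*') + key.substr(key.size() - 4);
}

#endif  // SECRETS_H
```

The sketch would then include this header and use `ELEVENLABS_API_KEY` when building the request; printing `maskKey(ELEVENLABS_API_KEY).c_str()` to Serial confirms which key is loaded without exposing it.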

Create API Key

Send Audio to ElevenLabs and Get Transcription

With our audio successfully recorded and saved to the SD card as a .wav file, it's time to bring in the power of ElevenLabs. In this section, we'll walk through how to send the recorded file to the ElevenLabs Speech-to-Text API and receive a transcription in return.

What This Code Does

Connects your XIAO ESP32S3 to Wi-Fi
Records 5 seconds of 16-bit mono audio at 16 kHz using the built-in mic
Saves the recording to the SD card with a valid .wav header
Finds the most recently recorded .wav file on the card
Sends that file to the ElevenLabs Speech-to-Text (STT) API in a multipart/form-data HTTP POST request
Parses the JSON response and prints the transcribed text to the Serial Monitor
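The multipart upload is the least obvious of these steps, so here is a minimal sketch of how the request body can be assembled. It assumes the form fields described in ElevenLabs' API reference at the time of writing: a `model_id` text field (for example `scribe_v1`) and a `file` field holding the audio, sent as `POST https://api.elevenlabs.io/v1/speech-to-text` with the key in an `xi-api-key` header. Check these names against the current documentation; the function and struct names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The body is split in two so a sketch can stream the .wav bytes from the SD card
// between `head` and `tail` instead of holding the whole request in RAM.
struct MultipartParts {
  std::string contentType;  // value for the Content-Type request header
  std::string head;         // form fields plus the file part's headers
  std::string tail;         // closing boundary, sent after the file bytes
};

// Hypothetical helper; field names assume ElevenLabs' documented STT form fields.
MultipartParts buildSttMultipart(const std::string& boundary, const std::string& modelId,
                                 const std::string& fileName) {
  assert(!boundary.empty());
  const std::string crlf = "\r\n";
  MultipartParts p;
  p.contentType = "multipart/form-data; boundary=" + boundary;
  // Text field selecting the transcription model.
  p.head += "--" + boundary + crlf;
  p.head += "Content-Disposition: form-data; name=\"model_id\"" + crlf + crlf;
  p.head += modelId + crlf;
  // File field; the raw .wav bytes are sent immediately after these headers.
  p.head += "--" + boundary + crlf;
  p.head += "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"" + crlf;
  p.head += "Content-Type: audio/wav" + crlf + crlf;
  // Closing delimiter: CRLF, then --boundary--.
  p.tail = crlf + "--" + boundary + "--" + crlf;
  return p;
}

// Value for the Content-Length header: head + file bytes + tail.
std::size_t multipartContentLength(const MultipartParts& p, std::size_t fileBytes) {
  return p.head.size() + fileBytes + p.tail.size();
}
```

On the ESP32, after sending the request line and the `Host`, `xi-api-key`, `Content-Type`, and `Content-Length` headers over a TLS connection (for example `WiFiClientSecure`), the sketch can write `head`, then the file in chunks read from the SD card, then `tail`. Streaming this way avoids copying the whole recording into memory a second time.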

xiao_s3_eleven_api

Output

Once the recording is saved to the SD card, the XIAO ESP32S3 sends the .wav file to ElevenLabs' Speech-to-Text API.
Upon success, the transcription result is displayed in the Arduino Serial Monitor.
In the output shown, the API transcribed multilingual speech accurately and returned detailed metadata, such as word-level timestamps and confidence scores, along with the text.
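Because the response carries per-word metadata as well as the full transcript, the parser has to pick out the right field. The dependency-free sketch below assumes, as in ElevenLabs' API reference, that the transcript is the top-level `"text"` key and that word entries (inside a `"words"` array) may carry their own `"text"` keys; verify this against a real response. On the device, a JSON library such as ArduinoJson is the more usual choice; the function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Parses 4 hex digits at s[pos..pos+3] into cp.
static bool parseHex4(const std::string& s, std::size_t pos, unsigned& cp) {
  if (pos + 4 > s.size()) return false;
  cp = 0;
  for (std::size_t k = pos; k < pos + 4; ++k) {
    const char h = s[k];
    cp <<= 4;
    if (h >= '0' && h <= '9') cp |= static_cast<unsigned>(h - '0');
    else if (h >= 'a' && h <= 'f') cp |= static_cast<unsigned>(h - 'a' + 10);
    else if (h >= 'A' && h <= 'F') cp |= static_cast<unsigned>(h - 'A' + 10);
    else return false;
  }
  return true;
}

// Appends a code point below 0x10000 as UTF-8.
// Surrogate pairs (e.g. emoji) are not combined in this sketch.
static void appendUtf8(std::string& out, unsigned cp) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Reads the JSON string starting at s[i] == '"', decoding escapes into out.
// On success, i is left just past the closing quote.
static bool readJsonString(const std::string& s, std::size_t& i, std::string& out) {
  assert(i < s.size() && s[i] == '"');
  out.clear();
  for (++i; i < s.size(); ++i) {
    const char c = s[i];
    if (c == '"') { ++i; return true; }
    if (c != '\\') { out += c; continue; }  // raw UTF-8 bytes pass through unchanged
    if (++i >= s.size()) return false;
    switch (s[i]) {
      case 'n': out += '\n'; break;
      case 't': out += '\t'; break;
      case 'r': out += '\r'; break;
      case 'b': out += '\b'; break;
      case 'f': out += '\f'; break;
      case 'u': {
        unsigned cp = 0;
        if (!parseHex4(s, i + 1, cp)) return false;
        appendUtf8(out, cp);
        i += 4;
        break;
      }
      default: out += s[i];  // covers \" \\ and \/
    }
  }
  return false;  // unterminated string
}

// Stores the top-level "text" value in out, ignoring nested "text" keys.
bool extractTopLevelText(const std::string& json, std::string& out) {
  auto skipWs = [&json](std::size_t k) -> std::size_t {
    while (k < json.size() && (json[k] == ' ' || json[k] == '\t' || json[k] == '\n' || json[k] == '\r')) ++k;
    return k;
  };
  int depth = 0;  // number of currently open objects/arrays
  std::size_t i = 0;
  while (i < json.size()) {
    const char c = json[i];
    if (c == '"') {
      std::string token;
      if (!readJsonString(json, i, token)) return false;  // malformed string
      std::size_t k = skipWs(i);
      // A string followed by ':' is a key; only keys of the outermost object count.
      if (depth == 1 && token == "text" && k < json.size() && json[k] == ':') {
        k = skipWs(k + 1);
        return k < json.size() && json[k] == '"' && readJsonString(json, k, out);
      }
      continue;  // i already points past the string
    }
    if (c == '{' || c == '[') ++depth;
    else if (c == '}' || c == ']') --depth;
    ++i;
  }
  return false;  // no top-level "text" key
}
```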

Serial Output

Schematics

Schematic for XIAO ESP32S3

Code

Speech to Text

Credits

Dev Bhavsar


I like turning ideas into real things — whether it’s with code, hardware, or both. Always building, always learning.
