Smart Home Microphone Module is a compact IoT device based on the ESP32-S3 microcontroller, designed for use in voice-controlled systems, audio monitoring, and integration with smart-home infrastructure. The device captures ambient sound, isolates voice data, processes it, and transmits it over a distributed network.
Note
This project is based on the voice_activity_detection example from Espressif’s esp-skainet repository.
The project comprises the following components:
- Microphone Module (source code, schematic, PCB layout, enclosure)
- Server (server-side source code)
- voice_activity_detection example from Espressif, adapted for the ESP32-S3 DevKit with one or two INMP441 MEMS microphones
The core components of the device are the ESP32-S3-WROOM-N16R8 module and two INMP441 I²S MEMS microphones.
Warning
This project works only on ESP32 variants with at least 4 MB of PSRAM!
The module reads audio data from the I²S microphones and processes it using Espressif’s Audio Front End (AFE). During processing, it performs Voice Activity Detection (VAD) and Noise Suppression (NS). The voice data then undergoes Automatic Gain Control (AGC), is converted to mono, packaged into RTP packets, and sent to the server over Wi-Fi via UDP.
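The RTP packaging step can be illustrated with a minimal sketch. It is written in Python (the server's language) for readability and is not the firmware's actual code. It builds the fixed 12-byte RTP header defined in RFC 3550 and prepends it to a PCM frame. The payload type 96, the function names, and the frame sizes are assumptions made for illustration only.

```python
import struct

RTP_VERSION = 2
PAYLOAD_TYPE = 96  # assumption: a dynamic payload type; the project may use another value


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     marker: bool = False, payload_type: int = PAYLOAD_TYPE) -> bytes:
    """Prepend a fixed 12-byte RTP header (RFC 3550) to one audio frame."""
    byte0 = RTP_VERSION << 6  # version 2, no padding, no extension, zero CSRCs
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def packetize(frames, ssrc: int, samples_per_frame: int):
    """Yield RTP packets for consecutive PCM frames (hypothetical helper).

    The sequence number advances by one per packet and the timestamp by the
    number of samples per frame; both wrap around at their field width.
    """
    seq = 0
    timestamp = 0
    for frame in frames:
        # The marker bit flags the first packet of the stream in this sketch.
        yield build_rtp_packet(frame, seq, timestamp, ssrc, marker=(seq == 0))
        seq = (seq + 1) & 0xFFFF
        timestamp = (timestamp + samples_per_frame) & 0xFFFFFFFF
```

Each packet produced this way could then be handed to a UDP socket, for example with `sock.sendto(packet, (server_ip, port))`, where `server_ip` and `port` stand in for whatever the device is configured to use.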
[Figure: schematic of AFE audio processing]

The project also includes a server application that runs on the local network and performs speech recognition.
Key features of the device:
- Audio processing: VAD / NS / AGC / MISO (multiple-input, single-output conversion to mono)
- Logging processed voice data to an SD card (for debugging)
- Transmitting only voice-active raw PCM data over Wi-Fi (UDP/RTP)
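The idea of transmitting only voice-active audio can be sketched as a simple gate over frames that carry a VAD flag. This is an illustrative Python sketch, not the firmware's implementation: the function name, the `(pcm, is_speech)` frame shape, and the optional `hangover` parameter (which keeps a few trailing frames after speech ends) are all assumptions.

```python
from typing import Iterable, Iterator, Tuple


def gate_voiced(frames: Iterable[Tuple[bytes, bool]], hangover: int = 0) -> Iterator[bytes]:
    """Yield only the PCM frames that should be transmitted.

    frames   -- iterable of (pcm_bytes, is_speech) pairs, one per audio frame
    hangover -- assumed extra parameter: number of non-speech frames to keep
                after the last speech frame, so word endings are not clipped
    """
    remaining = 0
    for pcm, is_speech in frames:
        if is_speech:
            remaining = hangover  # re-arm the hangover counter on every speech frame
            yield pcm
        elif remaining > 0:
            remaining -= 1        # still inside the hangover window
            yield pcm
        # otherwise the frame is silence and is dropped


if __name__ == "__main__":
    demo = [(b"a", False), (b"b", True), (b"c", False), (b"d", False), (b"e", True)]
    print(list(gate_voiced(demo)))              # [b'b', b'e']
    print(list(gate_voiced(demo, hangover=1)))  # [b'b', b'c', b'e']
```

With `hangover=0` the gate passes exactly the frames flagged as speech; a small positive value trades a little extra traffic for smoother phrase endings.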
Tip
You can easily reproduce this on a breadboard and test it yourself.
Check the module/ directory for the microphone module’s source code and its README with quick-start setup instructions.
Caution
There may be errors in the schematic or PCB layout. Before fabricating the PCB, we recommend consulting a professional!
See the implementation notes.
The server application, written in Python, receives RTP packets from the device over the local network. Depending on user configuration, it can:
- play back the incoming audio in real time,
- save it to a file,
- perform offline speech recognition using the VOSK API framework.
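The receiving side can be sketched as follows. This is a minimal, hedged example and not the project's server code: it parses the RTP header per RFC 3550 and appends each payload to a WAV file. The port 5004, the 16 kHz sample rate, the 16-bit little-endian mono sample format, and all function names are assumptions; check the server README for the real values.

```python
import socket
import struct
import wave
from typing import NamedTuple


class RtpPacket(NamedTuple):
    payload_type: int
    marker: bool
    sequence: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(datagram: bytes) -> RtpPacket:
    """Split one UDP datagram into RTP header fields and payload (RFC 3550)."""
    if len(datagram) < 12:
        raise ValueError("datagram shorter than the fixed RTP header")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", datagram[:12])
    if byte0 >> 6 != 2:
        raise ValueError("unsupported RTP version")
    offset = 12 + 4 * (byte0 & 0x0F)          # skip any CSRC identifiers
    if byte0 & 0x10:                          # header extension present
        if len(datagram) < offset + 4:
            raise ValueError("truncated header extension")
        ext_words = struct.unpack("!H", datagram[offset + 2:offset + 4])[0]
        offset += 4 + 4 * ext_words
    end = len(datagram)
    if byte0 & 0x20:                          # padding: last byte holds its length
        end -= datagram[-1]
    if offset > end:
        raise ValueError("malformed RTP packet")
    return RtpPacket(byte1 & 0x7F, bool(byte1 & 0x80), seq, timestamp, ssrc,
                     datagram[offset:end])


def record_to_wav(path: str, port: int = 5004, sample_rate: int = 16000,
                  max_packets: int = 500) -> None:
    """Receive up to max_packets RTP datagrams and save their payloads as WAV.

    Assumes the payload is 16-bit little-endian mono PCM, which matches the
    WAV sample layout; packets are written in arrival order without reordering.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            wave.open(path, "wb") as wav:
        sock.bind(("0.0.0.0", port))
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        for _ in range(max_packets):
            datagram, _addr = sock.recvfrom(2048)
            wav.writeframes(parse_rtp(datagram).payload)
```

Because UDP may drop or reorder datagrams, a more careful receiver would inspect `sequence` and `timestamp` before writing; this sketch leaves that out to stay short.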
Tip
See the server/ directory for the server application’s source code and its README with quick-start setup instructions.
The original VAD example from Espressif’s esp-skainet repository supports only official evaluation boards (Korvo, Eye, etc.). In this project, it has been extended to run on custom boards (including the ESP32 DevKit) with one or two INMP441 microphones (or other I²S MEMS mics).
Tip
See the examples/voice_activity_detection directory for the example’s description, source code, and its README with quick-start setup instructions.

