Smart Home Microphone Module

Smart Home Microphone Module is a compact IoT device based on the ESP32-S3 microcontroller, designed for use in voice-controlled systems, audio monitoring, and integration with smart-home infrastructure. The device captures ambient sound, isolates speech, processes it, and transmits the voice data over Wi-Fi to a server on the local network.

Note

This project is based on the voice_activity_detection example from Espressif’s esp-skainet repository.

Overview

The project comprises the following components:

Microphone Module (source code, schematic, PCB layout, enclosure)
Server (server-side source code)
voice_activity_detection example from Espressif, adapted for the ESP32-S3 DevKit with one or two INMP441 MEMS microphones

Microphone Module

The core components of the device are the ESP32-S3-WROOM-N16R8 module and two INMP441 I²S MEMS microphones.

Warning

This project works only with an ESP32 variant that supports at least 4 MB of PSRAM!

The module reads audio data from the I²S microphones and processes it using Espressif’s Audio Front End (AFE). During processing, it performs Voice Activity Detection (VAD) and Noise Suppression (NS). The voice data then undergoes Automatic Gain Control (AGC), is converted to mono, packaged into RTP packets, and sent to the server over Wi-Fi via UDP.
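The RTP packaging step can be illustrated with a minimal Python sketch of the 12-byte fixed header defined in RFC 3550. The module firmware runs on the ESP32-S3, so this is not its code. The payload type (96, a dynamic value), the SSRC, and the class name `RtpPacketizer` are assumptions chosen for illustration, and the payload is assumed to be 16-bit mono PCM.

```python
import struct

# Fixed 12-byte RTP header (RFC 3550), network byte order:
# V/P/X/CC byte, M/PT byte, sequence number, timestamp, SSRC.
RTP_HEADER = struct.Struct("!BBHII")
RTP_VERSION = 2
PAYLOAD_TYPE = 96  # assumption: dynamic payload type, not confirmed by the project


class RtpPacketizer:
    """Hypothetical helper that wraps 16-bit mono PCM frames in RTP packets."""

    def __init__(self, ssrc: int, initial_seq: int = 0, initial_ts: int = 0):
        self.ssrc = ssrc & 0xFFFFFFFF
        self.seq = initial_seq & 0xFFFF
        self.timestamp = initial_ts & 0xFFFFFFFF

    def packetize(self, pcm_frame: bytes, marker: bool = False) -> bytes:
        if len(pcm_frame) % 2:
            raise ValueError("16-bit PCM frame must have an even byte length")
        byte0 = RTP_VERSION << 6  # padding, extension and CSRC count are all zero
        byte1 = (0x80 if marker else 0) | PAYLOAD_TYPE
        header = RTP_HEADER.pack(byte0, byte1, self.seq, self.timestamp, self.ssrc)
        # The sequence number advances by one per packet; the timestamp
        # advances by the number of samples in the frame (2 bytes per sample).
        self.seq = (self.seq + 1) & 0xFFFF
        self.timestamp = (self.timestamp + len(pcm_frame) // 2) & 0xFFFFFFFF
        return header + pcm_frame
```

For example, at an assumed 16 kHz sample rate, a 32 ms frame of 512 samples (1024 bytes) becomes a 1036-byte packet, and each packet's timestamp is 512 higher than the previous one.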

Schematic of AFE audio processing:

The project also includes a server application that runs on the local network and performs speech recognition.

Key features of the device:

Audio processing: VAD / NS / AGC / MISO
Logging processed voice data to an SD card (for debugging)
Transmitting only voice-active raw PCM data over Wi-Fi (UDP/RTP)

Tip

You can easily reproduce this on a breadboard and test it yourself.
Check the module/ directory for the microphone module’s source code and its README with quick-start setup instructions.

Schematic

Caution

There may be errors in the schematic or PCB layout. Before fabricating the PCB, we recommend consulting a professional!
See the implementation notes.

Enclosure

Server

The server application, written in Python, receives RTP packets from the device over the local network. Depending on user configuration, it can:

play back the incoming audio in real time,
save it to a file,
perform offline speech recognition using the VOSK API framework.
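On the receiving side, a minimal server parses each RTP packet and appends its payload to an audio sink. The sketch below uses only the Python standard library and is not the project's server code. The UDP port (5004), the 16 kHz sample rate, and the assumption that payloads are little-endian 16-bit mono PCM (the ESP32-S3's native byte order) are illustrative guesses. Check the server/ README for the actual settings.

```python
import socket
import struct
import wave

SAMPLE_RATE = 16_000  # assumption: must match the device's configured rate
LISTEN_PORT = 5004    # hypothetical port, not taken from the project


def parse_rtp(packet: bytes) -> tuple[int, int, bytes]:
    """Return (sequence number, timestamp, payload) from one RTP packet (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    byte0, _byte1, seq, timestamp, _ssrc = struct.unpack("!BBHII", packet[:12])
    if byte0 >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    offset = 12 + 4 * (byte0 & 0x0F)  # skip CSRC identifiers, if any
    if byte0 & 0x10:  # extension: 4-byte header, then length in 32-bit words
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if byte0 & 0x20:  # padding: the last byte holds the padding length
        end -= packet[-1]
    if offset > end:
        raise ValueError("malformed RTP packet")
    return seq, timestamp, packet[offset:end]


def record(path: str, max_packets: int = 1000) -> None:
    """Receive RTP packets over UDP and write their PCM payloads to a WAV file."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            wave.open(path, "wb") as wav:
        sock.bind(("0.0.0.0", LISTEN_PORT))
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        for _ in range(max_packets):
            packet, _addr = sock.recvfrom(2048)
            try:
                _seq, _ts, payload = parse_rtp(packet)
            except ValueError:
                continue  # ignore datagrams that are not valid RTP
            wav.writeframes(payload)
```

This sketch writes payloads in arrival order. Because UDP can drop or reorder datagrams, a fuller implementation would use the sequence number to detect gaps before passing audio to playback or speech recognition.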

Tip

See the server/ directory for the server application’s source code and its README with quick-start setup instructions.

voice_activity_detection example

The original VAD example from Espressif’s esp-skainet repository supports only official evaluation boards (Korvo, Eye, etc.). In this project, it has been extended to run on custom boards (including the ESP32 DevKit) with one or two INMP441 microphones (or other I²S MEMS mics).
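Supporting INMP441 microphones largely comes down to reshaping raw I²S data for the AFE. The INMP441 outputs 24-bit two's-complement samples left-aligned in 32-bit slots, while ESP-SR's AFE is fed 16-bit samples, so one common approach is to keep the top 16 bits of each slot. The Python sketch below shows that conversion and the splitting of interleaved left/right slots when two microphones share one bus. It is an assumption about the approach, not the example's actual C code, and the function name `i2s_slots_to_int16` is hypothetical.

```python
from typing import Sequence


def i2s_slots_to_int16(slots: Sequence[int], channels: int = 2) -> list[list[int]]:
    """Hypothetical helper: split interleaved 32-bit I2S slots into per-microphone
    16-bit sample lists by keeping the top 16 bits of each slot."""
    if channels < 1:
        raise ValueError("channels must be positive")
    if len(slots) % channels:
        raise ValueError("slot count is not a multiple of the channel count")
    out: list[list[int]] = [[] for _ in range(channels)]
    for i, raw in enumerate(slots):
        # Normalize to a 32-bit word, then interpret it as signed two's complement.
        word = raw & 0xFFFFFFFF
        value = word - (1 << 32) if word & 0x80000000 else word
        # The 24-bit sample occupies bits 31..8; an arithmetic shift by 16
        # keeps its 16 most significant bits and preserves the sign.
        out[i % channels].append(value >> 16)
    return out
```

Dropping the low 8 bits of each 24-bit sample sacrifices some resolution; depending on microphone sensitivity, an implementation may shift by fewer bits and saturate instead to raise the input level.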

Tip

See the examples/voice_activity_detection directory for the example’s description, source code, and its README with quick-start setup instructions.

Repository: alecproj/microphone-module