🧩 1. What Is “Voice Detection”?
Voice activity detection (VAD) means detecting when someone is talking, as opposed to silence or background noise.
It doesn't transcribe speech; it only identifies that speech is happening.
This is the foundation of:
- Voice assistants
- Smart recording tools
- Real-time speech analytics
🧰 2. Tools You’ll Need (All Free)
| Library | Purpose | Offline? |
|---|---|---|
| `sounddevice` | Access microphone input | ✅ |
| `numpy` | Handle audio data | ✅ |
| `webrtcvad` | Voice Activity Detection (from Google's open-source WebRTC project) | ✅ |
| `wave` | Save audio files (Python standard library) | ✅ |
Install them (Python ≥3.8):
```bash
pip install sounddevice numpy webrtcvad
```
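Not sure which microphone `sounddevice` will pick up? A quick sanity check (a minimal sketch using sounddevice's `query_devices` helper) lists your audio devices, with the default input marked:

```python
import sounddevice as sd

# Print all audio devices; the default input device is marked with ">"
print(sd.query_devices())
```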
⚙️ 3. Basic Concept
We’ll:
- Capture short chunks of microphone audio.
- Use WebRTC VAD to detect if that chunk contains voice.
- Print a message (“Voice detected”) when someone talks.
💻 4. Full Example: voice_detector.py
```python
import sounddevice as sd
import numpy as np
import webrtcvad

# -----------------------------
# SETTINGS
# -----------------------------
SAMPLE_RATE = 16000  # samples per second
FRAME_DURATION = 30  # ms
FRAME_SIZE = int(SAMPLE_RATE * FRAME_DURATION / 1000)  # samples per frame (480)
VAD_MODE = 2  # aggressiveness: 0 = most permissive, 3 = strictest

vad = webrtcvad.Vad(VAD_MODE)

# -----------------------------
# Helper: convert numpy audio chunk to 16-bit PCM bytes
# -----------------------------
def audio_to_bytes(audio):
    # Clip to [-1, 1] before scaling so full-scale samples can't overflow int16
    ints = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
    return ints.tobytes()

# -----------------------------
# Main Loop
# -----------------------------
def main():
    print("🎙️ Voice Detection started (Ctrl+C to stop)")
    with sd.InputStream(channels=1, samplerate=SAMPLE_RATE, blocksize=FRAME_SIZE) as stream:
        while True:
            audio_chunk, _ = stream.read(FRAME_SIZE)  # blocks until a full frame arrives
            audio_chunk = np.squeeze(audio_chunk)     # (FRAME_SIZE, 1) -> (FRAME_SIZE,)
            audio_bytes = audio_to_bytes(audio_chunk)
            if vad.is_speech(audio_bytes, SAMPLE_RATE):
                print("🟢 Voice detected!")
            else:
                print("⚪ Silence...", end="\r")

if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        print("\nStopped.")
```
▶️ 5. Run It
```bash
python voice_detector.py
```
Then speak near your microphone. When it hears speech, you'll see:

```
🟢 Voice detected!
```
🧠 6. How It Works
- `sounddevice` streams real-time microphone data.
- Each frame (~30 ms of sound) is analyzed independently.
- `webrtcvad` applies a lightweight Gaussian-mixture-model classifier to detect speech patterns in each frame.
- The model runs completely offline, using CPU only.
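One practical detail: `webrtcvad` only accepts 16-bit mono PCM at 8, 16, 32, or 48 kHz, and each frame must be exactly 10, 20, or 30 ms long (that's why `FRAME_DURATION` is 30 above). A minimal sketch that probes each valid duration with a synthetic all-zero (silent) frame:

```python
import webrtcvad

vad = webrtcvad.Vad(2)
SAMPLE_RATE = 16000

for duration_ms in (10, 20, 30):        # the only frame lengths webrtcvad accepts
    n_samples = SAMPLE_RATE * duration_ms // 1000
    frame = b"\x00\x00" * n_samples     # synthetic silent 16-bit PCM frame
    print(f"{duration_ms} ms frame -> speech: {vad.is_speech(frame, SAMPLE_RATE)}")
```

Any other frame length raises an error, which is a common gotcha when you change `SAMPLE_RATE` without recomputing `FRAME_SIZE`.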
🎧 7. Optional: Record Only When Voice Is Detected
You can extend it to save audio segments that contain voice:
```python
import wave
import time

# Add this to voice_detector.py; it reuses SAMPLE_RATE, FRAME_SIZE,
# vad, and audio_to_bytes defined above.

def record_voice_segments():
    print("🎙️ Recording voice segments (Ctrl+C to stop)")
    with sd.InputStream(channels=1, samplerate=SAMPLE_RATE, blocksize=FRAME_SIZE) as stream:
        buffer = []
        speaking = False
        while True:
            audio_chunk, _ = stream.read(FRAME_SIZE)
            audio_chunk = np.squeeze(audio_chunk)
            audio_bytes = audio_to_bytes(audio_chunk)
            if vad.is_speech(audio_bytes, SAMPLE_RATE):
                buffer.append(audio_bytes)
                if not speaking:
                    print("🟢 Voice detected — recording...")
                    speaking = True
            else:
                if speaking and len(buffer) > 0:
                    # Speech just ended: flush the buffered frames to a WAV file
                    filename = f"voice_{int(time.time())}.wav"
                    with wave.open(filename, "wb") as wf:
                        wf.setnchannels(1)        # mono
                        wf.setsampwidth(2)        # 16-bit samples
                        wf.setframerate(SAMPLE_RATE)
                        wf.writeframes(b"".join(buffer))
                    print(f"💾 Saved segment to {filename}")
                    buffer.clear()
                    speaking = False

if __name__ == "__main__":
    try:
        record_voice_segments()
    except KeyboardInterrupt:
        print("\nStopped.")
```
🎯 This script saves a `.wav` file every time you speak, all offline.
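To double-check a saved segment, you can read it back with the same `wave` module. A minimal sketch (the filename below is hypothetical; substitute one the script actually printed):

```python
import wave

# Hypothetical filename: use one printed by record_voice_segments()
with wave.open("voice_1700000000.wav", "rb") as wf:
    print("channels:    ", wf.getnchannels())
    print("sample rate: ", wf.getframerate())
    print("duration (s):", wf.getnframes() / wf.getframerate())
```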
🧱 8. Improvements You Can Add
- Noise filtering → use `pydub` or `scipy` to denoise the input
- Visualization → use `matplotlib` to plot waveforms in real time (see the sketch after the example below)
- Trigger command → when voice is detected, call another script or AI agent
Example idea:
```python
import subprocess

if vad.is_speech(audio_bytes, SAMPLE_RATE):
    print("🟢 Voice detected — launching local AI...")
    subprocess.run(["python", "local_agent.py"])
```
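And for the visualization idea, here's a minimal sketch (assuming the same `SAMPLE_RATE` and `FRAME_SIZE` settings as above) that redraws the live waveform with matplotlib:

```python
import matplotlib.pyplot as plt
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16000
FRAME_SIZE = 480  # 30 ms at 16 kHz, as in voice_detector.py

plt.ion()  # interactive mode so the plot updates while audio streams
fig, ax = plt.subplots()
line, = ax.plot(np.zeros(FRAME_SIZE))
ax.set_ylim(-1, 1)
ax.set_xlabel("Sample")
ax.set_ylabel("Amplitude")

with sd.InputStream(channels=1, samplerate=SAMPLE_RATE, blocksize=FRAME_SIZE) as stream:
    while plt.fignum_exists(fig.number):   # stop when the plot window is closed
        chunk, _ = stream.read(FRAME_SIZE)
        line.set_ydata(np.squeeze(chunk))  # update the waveform in place
        plt.pause(0.001)                   # give matplotlib time to redraw
```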
🔒 9. Advantages of Local Voice Detection
✅ 100% offline — no Google, no cloud
✅ Zero cost
✅ Privacy-safe
✅ Low CPU usage (works on any laptop)
🚀 Summary
You just built a real-time local voice detection system using:
- Python 🐍
- `sounddevice` 🎤
- `webrtcvad` 🧠