Voice Commander 🎙️

Local voice transcription with AI-powered refinement for developers

Transform your speech into clean, structured prompts using Whisper.cpp (local, GPU-accelerated) + Gemini API (cloud refinement).

✨ Features

🎤 Hotkey Recording: F8/F9 to start/stop
🚀 GPU Acceleration: CUDA-powered Whisper transcription
🤖 AI Refinement: Gemini cleans up filler words, fixes grammar, structures output
📝 Structured Output: XML/JSON/plain text formats
🔒 Privacy-First: Audio is transcribed locally; only the transcribed text is sent to the Gemini API for refinement
⚡ Auto-Paste: Seamlessly inserts text at cursor
🔌 VS Code Extension: Integrated workflow

🎬 Demo

test_lq.mp4

🎯 Use Cases

Dictate code comments without "um" and "uh"
Convert rambling thoughts into structured prompts
Hands-free coding when keyboard is unavailable
Faster brainstorming and documentation

Setup

Linux (GPU-accelerated)

Build whisper.cpp with CUDA:

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
mkdir build && cd build
cmake .. -DGGML_CUDA=ON -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release -j$(nproc)
cd ../..

Download model:

cd whisper.cpp/models
bash download-ggml-model.sh medium.en
cd ../..

Install Python dependencies:

pip install sounddevice scipy numpy pyperclip pynput python-dotenv google-genai

Configure AI refinement (optional but recommended):

Copy the example config:
```
cp .env.example .env
```
Edit .env and add your Gemini API key:
```
GEMINI_API_KEY=your-api-key-here
VC_ENABLE_LLM=true
VC_LLM_FORMAT=xml  # Options: plain, xml, json
```
Get a free API key: https://aistudio.google.com/apikey

Run Voice Commander:
```
python Linux/portable_commander_gpu.py
```

macOS/Windows

Install whisper.cpp:

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
make

Download model:

bash ./models/download-ggml-model.sh medium.en

Install Python dependencies:

pip install sounddevice scipy numpy pyperclip pynput

Run Voice Commander:
```
python portable_commander.py
```

VS Code Extension

See VScode_extension/ folder for VS Code integration.

Usage

Press F8 to start recording
Press F9 to stop and paste text
Works in any application
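
The F8/F9 behavior above can be sketched as a small state machine. This is an illustrative sketch, not the project's actual code: `HotkeyRecorder`, `handle`, and `listen` are hypothetical names, and the only library calls used are `pynput.keyboard.Listener` and `pynput.keyboard.Key`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HotkeyRecorder:
    """Tracks recording state in response to the start/stop hotkeys."""

    on_start: Callable[[], None]
    on_stop: Callable[[], None]
    recording: bool = False

    def handle(self, key_name: str) -> None:
        # F8 starts a recording only if none is running;
        # F9 stops only if one is running. Other keys are ignored.
        if key_name == "f8" and not self.recording:
            self.recording = True
            self.on_start()
        elif key_name == "f9" and self.recording:
            self.recording = False
            self.on_stop()


def listen(recorder: HotkeyRecorder) -> None:
    """Block and forward F8/F9 presses (needs pynput and a desktop session)."""
    # Imported lazily so HotkeyRecorder can be used without pynput installed.
    from pynput import keyboard

    names = {keyboard.Key.f8: "f8", keyboard.Key.f9: "f9"}

    def on_press(key):
        name = names.get(key)
        if name is not None:
            recorder.handle(name)

    with keyboard.Listener(on_press=on_press) as listener:
        listener.join()
```

Keeping the state logic separate from the listener means repeated F8 presses or a stray F9 are harmless, and the logic can be exercised without a keyboard attached.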

⚙️ Configuration

Edit .env file:

| Variable | Options | Default | Description |
| --- | --- | --- | --- |
| `VC_ENABLE_LLM` | `true`/`false` | `true` | Enable AI refinement |
| `VC_LLM_FORMAT` | `plain`/`xml`/`json` | `xml` | Output structure |
| `GEMINI_API_KEY` | Your API key | - | Required for refinement |
| `VC_PASTE_MODE` | `auto`/`ctrl_v`/`ctrl_shift_v` | `auto` | Paste behavior |
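
As a rough illustration of how these variables might be read and validated, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names, and the fallback to raw transcription when no API key is set is an assumption, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_FORMATS = ("plain", "xml", "json")
VALID_PASTE_MODES = ("auto", "ctrl_v", "ctrl_shift_v")


@dataclass(frozen=True)
class Settings:
    enable_llm: bool
    llm_format: str
    api_key: Optional[str]
    paste_mode: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the VC_* variables, applying the defaults from the table."""
    enable_llm = env.get("VC_ENABLE_LLM", "true").strip().lower() == "true"
    llm_format = env.get("VC_LLM_FORMAT", "xml").strip().lower()
    paste_mode = env.get("VC_PASTE_MODE", "auto").strip().lower()
    api_key = env.get("GEMINI_API_KEY") or None

    if llm_format not in VALID_FORMATS:
        raise ValueError(f"VC_LLM_FORMAT must be one of {VALID_FORMATS}")
    if paste_mode not in VALID_PASTE_MODES:
        raise ValueError(f"VC_PASTE_MODE must be one of {VALID_PASTE_MODES}")

    # Assumption: without a key, refinement is skipped rather than failing.
    if enable_llm and not api_key:
        enable_llm = False

    return Settings(enable_llm, llm_format, api_key, paste_mode)
```

To pick up values from the `.env` file, call `load_dotenv()` from the `python-dotenv` package before `load_settings()`.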

📋 Requirements

Python 3.7+
CUDA-capable GPU (for acceleration)
whisper.cpp compiled in parent directory
Microphone access
Gemini API key (free tier available)

🧠 How It Works

Press F8 → Start recording
Speak naturally → "um, so like, I need a function that uh calculates fibonacci"
Press F9 → Stop recording
Whisper transcribes (local, GPU-accelerated)
Gemini refines → Removes fillers, fixes grammar, structures output
Auto-pastes → Clean text appears at cursor
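
The local transcription step can be sketched as follows, using only the standard library. This is an assumption-laden outline rather than the project's implementation: `write_wav`, `whisper_command`, and `transcribe` are hypothetical helpers, and the binary path depends on your build (recent whisper.cpp builds typically produce `build/bin/whisper-cli`, older ones `main`). The `-m`, `-f`, and `-nt` (no timestamps) flags are whisper.cpp CLI options.

```python
import struct
import subprocess
import wave
from pathlib import Path
from typing import List, Sequence

SAMPLE_RATE = 16_000  # whisper.cpp processes 16 kHz mono audio


def write_wav(path: Path, samples: Sequence[float]) -> None:
    """Write mono float samples in [-1, 1] as 16-bit PCM WAV."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    frames = struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)


def whisper_command(binary: Path, model: Path, wav_path: Path) -> List[str]:
    """Build the whisper.cpp CLI invocation for one recording."""
    return [str(binary), "-m", str(model), "-f", str(wav_path), "-nt"]


def transcribe(binary: Path, model: Path, wav_path: Path) -> str:
    """Run whisper.cpp and return the transcript printed on stdout."""
    result = subprocess.run(
        whisper_command(binary, model, wav_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

With the setup above, the model downloaded by `download-ggml-model.sh medium.en` would be `whisper.cpp/models/ggml-medium.en.bin`.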

Example:

Input:  "um so like I want to [NOISE] create a function that uh calculates fibonacci"
Output: <prompt><task>Create a function that calculates the Fibonacci sequence</task></prompt>
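
A refinement step like the one in this example might look roughly like the sketch below. The helper names, the prompt wording, the JSON field name, and the model name `gemini-2.0-flash` are all assumptions for illustration; only `genai.Client` and `client.models.generate_content` come from the `google-genai` package.

```python
import re

# Bracketed markers such as [NOISE] or [BLANK_AUDIO] that can appear in transcripts.
MARKER = re.compile(r"\[[A-Z_ ]+\]")

# Hypothetical per-format instructions; the xml shape mirrors the example above.
FORMAT_HINTS = {
    "plain": "Return plain text only.",
    "xml": "Wrap the result as <prompt><task>...</task></prompt>.",
    "json": 'Return a JSON object with a single "task" field.',
}


def strip_markers(transcript: str) -> str:
    """Remove bracketed markers and collapse leftover whitespace."""
    return " ".join(MARKER.sub(" ", transcript).split())


def build_prompt(transcript: str, output_format: str = "xml") -> str:
    """Compose the refinement instruction sent to the LLM."""
    if output_format not in FORMAT_HINTS:
        raise ValueError(f"unknown format: {output_format}")
    return (
        "Remove filler words, fix grammar, and keep the speaker's intent. "
        f"{FORMAT_HINTS[output_format]}\n\n"
        f"Transcript: {strip_markers(transcript)}"
    )


def refine(transcript: str, api_key: str, output_format: str = "xml",
           model: str = "gemini-2.0-flash") -> str:
    """Send the prompt to Gemini and return the refined text."""
    # Imported lazily so the helpers above work without google-genai installed.
    from google import genai

    client = genai.Client(api_key=api_key)
    response = client.models.generate_content(
        model=model, contents=build_prompt(transcript, output_format)
    )
    return (response.text or "").strip()
```

Stripping markers before the API call keeps transcription noise out of the prompt, so the model only has to deal with filler words and grammar.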

🚀 Publishing to Portfolio

This project demonstrates:

End-to-end ML pipeline integration
GPU optimization (CUDA)
API integration (Gemini)
Production-ready error handling
Real-world developer tooling

🤝 Contributing

PRs welcome! Areas for improvement:

Additional LLM providers (OpenAI, Anthropic)
Custom prompt templates
Multi-language support
Voice command macros

📄 License

MIT License - see LICENSE file
