Aura - Intelligent Voice Refinement Engine

Aura is an open-source, free, privacy-first voice refinement tool. Its core experience is the capsule: press a hotkey to record, press again to finish, then Aura transcribes and refines your speech and inserts the result into the active input field (or falls back to clipboard).

Aura is inspired by the Typeless interaction model, but built independently with its own implementation and product identity.

Highlights

Fast capture: One hotkey to start, one to finish
Intelligent refinement: Transforms spoken drafts into polished, structured text ready to use
Auto insert: Writes directly into the focused input field or clipboard
Multi-language support: Primarily tested with Simplified Chinese and English; other languages depend on model capabilities
Local or cloud: ASR and LLM can run locally or via cloud providers
Minimal UI: Main app is for settings, model management, and history only
Multiple output modes: Notes, emails, reports, and social media formats
Resource adaptive: Automatically selects optimal models based on system resources
Cross-platform: Built with Tauri 2.0 for macOS, Windows, and Linux (Windows/Linux not fully tested yet)

How It Works

Hotkey: Option + Shift + Space (macOS)
Capsule flow:
- First press: Start recording
- Second press: Stop and process
Processing pipeline:
- Audio capture → Speech recognition (ASR) → Denoising → LLM refinement → Structured output
Auto insert:
- Cursor in input field → Auto paste
- Otherwise → Clipboard
Cancel operation: Press Esc to cancel at any time

Development Setup

cd aura
npm install
npm run tauri -- dev

Quick start:

./start.sh

Requirements

Rust 1.70+
Node.js 18+
ffmpeg (for audio conversion)
Ollama (only required for local LLM refinement)

Platform Support

macOS: Fully tested and supported
Windows: Functional but not fully tested
Linux: Functional but not fully tested

Note: The application is built with Tauri 2.0 and is cross-platform by design. Windows and Linux builds should work, but have not undergone comprehensive testing. Some platform-specific features (like auto-paste) may behave differently.

Ollama Example

brew install ollama
ollama serve
ollama pull qwen3.5:2b

Features

1. Speech Recognition (ASR)

Local Options:

Whisper models: tiny, base, small, medium, large-v3
Automatic model download from HuggingFace mirrors
Primarily tested with Chinese and English; other languages supported based on Whisper capabilities
Optimized for 16kHz mono audio

Cloud Providers:

OpenAI (Whisper API, GPT-4o transcription)
Groq (Whisper large-v3-turbo)
Deepgram (Nova-2)
AssemblyAI
Azure Speech Services
Google Speech-to-Text
Custom OpenAI-compatible endpoints

2. Text Refinement (LLM)

Processing Pipeline:

Denoising: Removes filler words ("um", "uh", "那个", "嗯")
Context application: User terminology, name mappings, forbidden words
LLM refinement: Transforms spoken language into polished text
Structure detection: Automatically formats enumerated lists
Language normalization: Converts to Simplified Chinese when applicable

Local Options:

Ollama integration (qwen3.5:2b, llama3.2, gemma3, mistral, etc.)
Automatic model preloading for faster response
Resource-aware model selection

Cloud Providers:

OpenAI (GPT-4.1, GPT-4o)
Anthropic (Claude 3.5 Sonnet, Haiku)
Google Gemini (1.5 Pro, 2.0 Flash)
DeepSeek
Qwen (Alibaba Cloud)
GLM (Zhipu AI)
Kimi (Moonshot)
Minimax
OpenRouter
Custom OpenAI-compatible endpoints

3. Output Modes

Note (default): Natural, clear text suitable for notes
Email: Concise, professional email body
Report: Structured brief for daily/weekly reports
Social: Social media-friendly short content

4. Language Support

Fully Tested:

Simplified Chinese
English

Other Languages:

Supported based on underlying model capabilities (Whisper for ASR, selected LLM for refinement)
Not comprehensively tested; functionality may vary
Language detection and processing depend on model performance

Settings

The desktop UI provides comprehensive configuration:

General: Interface language (English/Chinese), audio input device
Speech Recognition: Local/cloud routing, provider selection, model configuration
Refinement: LLM provider settings, model selection with recommendations
Wake Word: Enable/disable wake word detection (in development)
History: Paginated view of recent transcriptions and refinements
Diagnostics: Real-time environment health checks for ASR, LLM, delivery, and runtime

Recommended Models

Local ASR:

whisper-base: Recommended default, balanced speed and accuracy (~142MB)
whisper-tiny: Lowest resource usage, suitable for older machines (~75MB)
whisper-small: Higher accuracy, heavier download (~466MB)

Local LLM:

qwen3.5:2b: Recommended default, fast and efficient
llama3.2:3b: Good alternative for English
gemma3:4b: Balanced performance

Cloud ASR:

OpenAI: gpt-4o-mini-transcribe (fast and affordable)
Groq: whisper-large-v3-turbo (fastest)
Deepgram: nova-2 (high accuracy)

Cloud LLM:

OpenAI: gpt-4.1-mini (fast and cost-effective)
Anthropic: claude-3-5-sonnet-latest (highest quality)
DeepSeek: deepseek-chat (excellent Chinese support)

Auto Insert

macOS: Auto-paste requires Accessibility permission. Without it, Aura falls back to clipboard output.

Enable at: System Settings → Privacy & Security → Accessibility

Windows/Linux: Auto-paste behavior may differ from macOS. The application will attempt to paste or fall back to clipboard depending on system capabilities.

Build & Release

Build the app

npm run build

macOS release build

./build-release.sh

Primary output:

Aura-macos-universal.dmg

macOS signing & notarization

For public distribution, configure Apple Developer certificates and follow:

RELEASE_CHECKLIST.md
build-release.sh

Project Structure

aura/
├── src/                    # Frontend (React + TypeScript)
│   ├── App.tsx            # Main application component
│   ├── components/        # UI components
│   └── assets/            # Frontend assets
├── src-tauri/             # Backend (Rust)
│   ├── src/
│   │   ├── asr/          # Speech recognition module
│   │   ├── llm/          # LLM client
│   │   ├── processing/   # Text processing (denoising)
│   │   ├── storage/      # Data storage
│   │   ├── learning/     # Learning and correction
│   │   ├── core.rs       # Core refinement engine
│   │   ├── models.rs     # Data models
│   │   ├── settings.rs   # Configuration management
│   │   └── lib.rs        # Tauri commands
│   ├── icons/            # Application icons
│   └── Cargo.toml        # Rust dependencies
├── scripts/               # Release tooling scripts
├── package.json          # Node.js dependencies
└── README.md             # Project documentation

Core Architecture

Frontend (React + TypeScript):

Capsule UI state management
Settings interface and history view
Audio recording (WebRTC)
Real-time audio level visualization

Backend (Rust + Tauri):

ASR Engine: Local Whisper and cloud API integration
LLM Client: Ollama and multi-cloud provider support
Core Engine: Denoising → Context application → LLM refinement → Structuring
Storage Layer: SQLite (user context) + LanceDB (vector retrieval)
System Integration: Global shortcuts, Accessibility API, clipboard

Technology Stack

Frontend: React 19, TypeScript, Vite
Backend: Rust, Tauri 2.0
Speech Recognition: whisper-rs, cloud APIs
LLM: Ollama, OpenAI, Anthropic, Gemini, etc.
Audio Processing: cpal, hound, ffmpeg
Database: rusqlite, LanceDB
System APIs: macOS Accessibility, Core Graphics

Roadmap

Current (v0.1.0):

✅ End-to-end voice flow
✅ Local + cloud dual stack
✅ Typeless-style capsule interaction
✅ Multiple output modes
✅ Resource-adaptive model selection

Planned:

🔄 Wake word real-time listening
🔄 Comprehensive Windows and Linux testing
🔄 Enhanced provider auto-detection
🔄 User context and correction memory (personalization features)
🔄 Additional output modes and scenarios
🔄 Multi-language testing and optimization

License

MIT

Contributing

Issues and Pull Requests are welcome!

Acknowledgments

Inspired by: Typeless
Speech recognition: OpenAI Whisper
LLM support: Ollama, OpenAI, Anthropic, and more

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
assets		assets
public		public
scripts		scripts
src-tauri		src-tauri
src		src
.gitignore		.gitignore
README-zh.md		README-zh.md
README.md		README.md
build-release.sh		build-release.sh
index.html		index.html
package-lock.json		package-lock.json
package.json		package.json
start.sh		start.sh
tsconfig.json		tsconfig.json
tsconfig.node.json		tsconfig.node.json
verify-offline.sh		verify-offline.sh
vite.config.ts		vite.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Aura - Intelligent Voice Refinement Engine

Highlights

How It Works

Development Setup

Requirements

Platform Support

Ollama Example

Features

1. Speech Recognition (ASR)

2. Text Refinement (LLM)

3. Output Modes

4. Language Support

Settings

Recommended Models

Auto Insert

Build & Release

Build the app

macOS release build

macOS signing & notarization

Project Structure

Core Architecture

Technology Stack

Roadmap

License

Contributing

Acknowledgments

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Aura - Intelligent Voice Refinement Engine

Highlights

How It Works

Development Setup

Requirements

Platform Support

Ollama Example

Features

1. Speech Recognition (ASR)

2. Text Refinement (LLM)

3. Output Modes

4. Language Support

Settings

Recommended Models

Auto Insert

Build & Release

Build the app

macOS release build

macOS signing & notarization

Project Structure

Core Architecture

Technology Stack

Roadmap

License

Contributing

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages