Local models let HyperWhisper transcribe fully offline — once a model is downloaded, audio never leaves your device and no internet connection is required to transcribe. HyperWhisper ships two families of speech-to-text models:
  • Whisper — OpenAI’s general-purpose multilingual models, multiple sizes from Tiny to Large v3
  • Parakeet — NVIDIA’s models, fast and accurate for English (v2) and 25 European languages (v3)
For local post-processing (cleanup, formatting), variants of Google’s Gemma 4 are also available on Windows x64.

Downloading models

Open Model Management from the app, click Download on any entry, and watch the circular progress indicator. You can cancel mid-stream with the × button if needed. Downloaded models are stored on disk and stay there until you remove them.

| Platform | Storage location |
| --- | --- |
| Windows | `%LOCALAPPDATA%\HyperWhisper\Models\` |
| macOS | `~/Library/Application Support/HyperWhisper/Models/` |
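
To see what is taking up space, you can inspect the storage folder directly. The following is a minimal sketch that resolves the documented location for each platform and sums the size of each entry. The function names `models_dir` and `list_models` are illustrative, and the idea that each downloaded model is one file or one subfolder directly under `Models` is an assumption, not something the app documents.

```python
import os
import sys
from pathlib import Path


def models_dir(platform: str = sys.platform, env=os.environ) -> Path:
    """Return the documented HyperWhisper model folder for a platform."""
    if platform.startswith("win"):
        # %LOCALAPPDATA%\HyperWhisper\Models\
        return Path(env["LOCALAPPDATA"]) / "HyperWhisper" / "Models"
    if platform == "darwin":
        # ~/Library/Application Support/HyperWhisper/Models/
        return (Path.home() / "Library" / "Application Support"
                / "HyperWhisper" / "Models")
    raise ValueError(f"no documented storage location for {platform!r}")


def list_models(root: Path) -> dict:
    """Map each top-level entry under root to its total size in bytes.

    Assumption: one entry (file or folder) per downloaded model.
    """
    sizes = {}
    if not root.is_dir():
        return sizes  # nothing downloaded yet, or the app is not installed
    for entry in sorted(root.iterdir()):
        if entry.is_file():
            sizes[entry.name] = entry.stat().st_size
        else:
            sizes[entry.name] = sum(
                f.stat().st_size for f in entry.rglob("*") if f.is_file()
            )
    return sizes
```

On a machine with HyperWhisper installed, `list_models(models_dir())` returns a dictionary you can print or sort by size. Use Model Management to actually remove a model rather than deleting files by hand.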

Does local mean offline?

Yes. After a model finishes downloading, HyperWhisper transcribes entirely on your machine with no network calls. This is the strongest privacy guarantee — your audio is never uploaded anywhere. See Data Privacy for details.

GPU vs CPU

Local transcription engines use your GPU for acceleration when available, and fall back to CPU automatically if you don’t have a dedicated GPU or there isn’t enough VRAM. Local Gemma post-processing is available on Windows x64 only: it uses CUDA on NVIDIA GPUs and falls back to CPU on x64 systems that have an AMD or Intel GPU, no GPU at all, or too little VRAM.

| Engine | Backend | GPU support | CPU fallback |
| --- | --- | --- | --- |
| Whisper (Windows) | WhisperNet / DirectCompute | NVIDIA, AMD, Intel (any DirectX 11 GPU) | Yes |
| Whisper (macOS) | libwhisper / Metal | Apple Silicon GPU + Neural Engine | Yes |
| Parakeet (Windows) | sherpa-onnx / DirectML | NVIDIA, AMD, Intel | Yes |
| Parakeet (macOS) | sherpa-onnx / CoreML | Apple Silicon | Yes |
| Local Gemma post-processing (Windows) | LLamaSharp / GGUF | NVIDIA CUDA only on x64 | Yes |

The VRAM values listed below are the recommended GPU memory for full GPU acceleration. If your GPU has less than the recommended VRAM, the model still runs — it just uses CPU (or partial GPU offload), which is slower but works on any modern machine.
No dedicated GPU? Pick a Tiny or Base model and CPU transcription will still feel responsive on push-to-talk-length clips. For longer files, consider HyperWhisper Cloud instead — see Providers.

Whisper models

| Model | Size | Recommended VRAM | Languages | Best for |
| --- | --- | --- | --- | --- |
| Tiny | 78 MB | ~1 GB | Multilingual | Lowest-end machines, quick drafts |
| Tiny (English) | 78 MB | ~1 GB | English only | Same as Tiny, slightly better English |
| Base | 148 MB | ~1 GB | Multilingual | Light hardware, basic dictation |
| Base (English) | 148 MB | ~1 GB | English only | Same as Base, slightly better English |
| Small | 488 MB | ~2 GB | Multilingual | Best balance for most users |
| Small (English) | 488 MB | ~2 GB | English only | Same as Small, slightly better English |
| Medium | 1.5 GB | ~5 GB | Multilingual | Higher accuracy, mid-range GPUs |
| Medium (English) | 1.5 GB | ~5 GB | English only | Same as Medium, slightly better English |
| Large v3 Turbo | 1.5 GB | ~6 GB | Multilingual | Near-Large accuracy, much faster |
| Large v2 | 3.1 GB | ~10 GB | Multilingual | Highest Whisper accuracy (older) |
| Large v3 | 3.1 GB | ~10 GB | Multilingual | Highest Whisper accuracy (latest) |

English-only variants (.en) are the same architecture as the multilingual model but trained only on English data. If you only ever dictate in English, they’re slightly more accurate at the same size — but you lose multilingual support entirely.
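
If you know roughly how much VRAM your GPU has, the Whisper table can be read as a simple lookup. The sketch below copies the recommended VRAM figures from the table and returns the largest multilingual model that fits. The name `largest_fitting` and the "largest that fits" rule are illustrative assumptions, not how the app chooses a model. A result of `None` only means no model fits fully on the GPU; per the note above, the model would still run on CPU.

```python
from typing import Optional

# (model, recommended VRAM in GB) in the table's order, smallest to largest.
# Large v2 is left out because the table marks it as the older Large model.
# English-only variants have the same VRAM figures as their multilingual twins.
WHISPER_VRAM_GB = [
    ("Tiny", 1),
    ("Base", 1),
    ("Small", 2),
    ("Medium", 5),
    ("Large v3 Turbo", 6),
    ("Large v3", 10),
]


def largest_fitting(vram_gb: float) -> Optional[str]:
    """Return the largest listed model whose recommended VRAM fits, or None."""
    fitting = [name for name, need in WHISPER_VRAM_GB if need <= vram_gb]
    return fitting[-1] if fitting else None
```

For example, `largest_fitting(8)` gives `"Large v3 Turbo"`, since Large v3 recommends about 10 GB.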

Parakeet models

NVIDIA Parakeet models are typically faster than equivalent-size Whisper models and very accurate for the languages they support.

| Model | Size | Languages | Best for |
| --- | --- | --- | --- |
| Parakeet v2 (English) | 661 MB | English only | Fastest accurate English transcription |
| Parakeet v3 (Multilingual) | 671 MB | 25 European languages | Multilingual European dictation |

Parakeet v3 covers: English, German, Spanish, French, Italian, Portuguese, Dutch, Polish, Romanian, Swedish, Danish, Finnish, Norwegian, Czech, Slovak, Hungarian, Croatian, Slovenian, Bulgarian, Ukrainian, Greek, Lithuanian, Latvian, Estonian, Catalan, Basque.
On Windows, Parakeet runs on both x64 and ARM64, while Whisper is currently x64-only. If you’re on a Snapdragon / ARM Windows device, choose Parakeet.
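
The language and platform rules above can be combined into one check. This is a rough sketch, not the app's logic: it prefers Parakeet whenever the language is covered (an assumption based on the speed claim above), and falls back to Whisper only where Whisper is available. The language set is copied from the list above as written, and `pick_engine` and its arguments are hypothetical names.

```python
from typing import Optional

# Copied verbatim from the "Parakeet v3 covers" list above.
PARAKEET_V3_LANGUAGES = {
    "English", "German", "Spanish", "French", "Italian", "Portuguese",
    "Dutch", "Polish", "Romanian", "Swedish", "Danish", "Finnish",
    "Norwegian", "Czech", "Slovak", "Hungarian", "Croatian", "Slovenian",
    "Bulgarian", "Ukrainian", "Greek", "Lithuanian", "Latvian", "Estonian",
    "Catalan", "Basque",
}


def whisper_available(os_name: str, arch: str) -> bool:
    """Whisper is x64-only on Windows; no restriction is documented elsewhere."""
    return not (os_name == "windows" and arch == "arm64")


def pick_engine(language: str, os_name: str, arch: str) -> Optional[str]:
    """Suggest a local speech-to-text model, or None if nothing local fits."""
    if language == "English":
        return "Parakeet v2 (English)"
    if language in PARAKEET_V3_LANGUAGES:
        return "Parakeet v3 (Multilingual)"
    if whisper_available(os_name, arch):
        return "Whisper (multilingual)"
    return None  # e.g. an uncovered language on Windows ARM64
```

A `None` result is the case where a cloud provider is the practical option.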

Local Gemma post-processing models

Local Gemma models clean up and format transcript text after transcription. They run offline after download, but they are separate from speech-to-text models and are currently available only in the Windows x64 app.

| Model | Size | Recommended VRAM | Best for |
| --- | --- | --- | --- |
| Gemma 4 E2B (Q4_K_M) | 3.1 GB | ~4 GB | Recommended local cleanup model |
| Gemma 4 E4B (Q4_K_M) | 5.0 GB | ~6 GB | Higher quality local cleanup |
| Gemma 4 26B A4B MoE (UD-Q4_K_M) | 16.9 GB | ~18 GB | High-memory workstations |
| Gemma 4 31B Dense (Q4_K_M) | 18.3 GB | ~20 GB | Highest local quality, slowest |

On AMD and Intel GPUs, local Gemma post-processing uses CPU fallback in the current Windows build. Transcription can still use GPU acceleration independently.
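
To make the Gemma rules concrete, here is a small sketch of which backend to expect for a given machine. It encodes only what this page states: Windows x64 only, CUDA on NVIDIA GPUs, CPU otherwise. The function `gemma_backend`, its arguments, and the simplification that too little VRAM always means plain CPU (rather than partial GPU offload) are assumptions for illustration.

```python
from typing import Optional

# Recommended VRAM in GB, copied from the table above.
GEMMA_VRAM_GB = {
    "Gemma 4 E2B": 4,
    "Gemma 4 E4B": 6,
    "Gemma 4 26B A4B MoE": 18,
    "Gemma 4 31B Dense": 20,
}


def gemma_backend(os_name: str, arch: str, gpu_vendor: Optional[str],
                  vram_gb: float, model: str) -> Optional[str]:
    """Return "cuda", "cpu", or None when local Gemma is not available."""
    if (os_name, arch) != ("windows", "x64"):
        return None  # local Gemma ships only in the Windows x64 app
    if gpu_vendor == "nvidia" and vram_gb >= GEMMA_VRAM_GB[model]:
        return "cuda"
    # AMD, Intel, no GPU, or not enough VRAM: CPU fallback
    return "cpu"
```

For instance, an 8 GB NVIDIA card would be expected to run Gemma 4 E4B on CUDA, while the same model on an AMD card would run on CPU.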

Which model should I pick?

No dedicated GPU / older laptop

Whisper Tiny or Base — runs on CPU comfortably and still produces good transcripts for short dictation.

Modern desktop with mid-range GPU

Whisper Small (488 MB, ~2 GB VRAM) — the best balance of accuracy and speed for most users. This is the recommended default.

English-only and want speed

Parakeet v2 (English) — typically faster than Whisper Small with comparable English accuracy.

High-end GPU (RTX 3080+ / Apple Silicon Pro/Max)

Whisper Large v3 Turbo for speed, or Whisper Large v3 for maximum accuracy on noisy / accented audio.
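
The four scenarios above amount to a short decision procedure, sketched below. The function `recommend` and its flags are hypothetical, and the order in which the checks run (no GPU first, then the English-speed case, then high-end hardware) is an assumption about how to break ties when more than one scenario applies.

```python
def recommend(has_dedicated_gpu: bool,
              high_end_gpu: bool = False,
              english_only_speed: bool = False,
              prefer_speed: bool = False) -> str:
    """Map a hardware and usage profile to the model suggested above."""
    if not has_dedicated_gpu:
        # No dedicated GPU / older laptop
        return "Whisper Tiny or Base"
    if english_only_speed:
        # English-only and want speed
        return "Parakeet v2 (English)"
    if high_end_gpu:
        # RTX 3080+ / Apple Silicon Pro/Max
        return "Whisper Large v3 Turbo" if prefer_speed else "Whisper Large v3"
    # Modern desktop with mid-range GPU: the recommended default
    return "Whisper Small"
```

Calling `recommend(True)` with no other flags returns the default, `"Whisper Small"`.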

When to switch to cloud instead

Local models are great for privacy, offline use, and zero per-minute cost. But cloud providers are noticeably more accurate than even the largest local Whisper model, especially for accented English, technical vocabulary, and noisy audio. If local accuracy isn’t cutting it, try a HyperWhisper Cloud tier: there’s no markup over the underlying provider’s pricing, and a Pro license includes 5,000 credits to start. See Providers for tiers and pricing.

Boosting local accuracy

You can squeeze more accuracy out of any local model by:
  • Adding custom vocabulary for product names, jargon, and colleagues’ names
  • Reducing background noise — see Best Practices
  • Speaking at a natural pace (not too fast, not too slow)

Removing models

To free up disk space, click the trash icon next to any downloaded model in Model Management. The file is removed immediately and can be re-downloaded any time.