Commander Flow: A Year of Voice Typing Evolution

In June 2025 I shipped a build that crashed every 40 minutes and transcribed "hi" as "h-uh-i." I was still thrilled. I knew I was building, on Windows, something that didn't exist there — a fully offline voice-input tool with human-grade polishing. It's April 2026 now, and the same laptop runs Commander Flow in the background for days at a time.

This isn't a product history written from outside. It's the view from the inside: how the product changed, and how my own working habits with text changed in step with it.

Summer 2025: living with Whisper-tiny

The first build used ggml-tiny (75 MB). Recognition was tolerable on clean English, mediocre on other languages, and catastrophic on mixed speech. "Deploy to staging" came out close, but rarely exact. There was no polishing yet — just raw ASR.

I rebuilt my own workflow around the weaknesses of the thing I was writing. Short phrases. No slang. Awkward, but still faster than typing — and even at that stage I knew I wasn't going back.

"When you're building your own tool, you fall in love with the idea before the code is honest enough to reflect it."

Fall 2025: polishing arrived and the product stopped flinching at fillers

I added a local LLM. That was the turning point. You could say "so basically I think we need to, uh, redo this module," and the text field would show "I think we need to redo this module."

That's the moment I stopped pre-formatting my own speech. Before that, I'd internally edit each sentence before pressing the hotkey. Now the thought comes out as a thought, and the tool peels away the chaff. And I knew, watching myself, that anyone who used this seriously would feel the same shift.

Winter 2025/2026: Parakeet, and I learned what "fast" really means

Figure: ASR latency on CPU. Whisper-large (before): ~900 ms. Parakeet TDT v3 (after): ~140 ms. 5–10× faster on the same CPU, via the sherpa-onnx C# bindings. "140 ms is shorter than I can perceive a pause."
January 2026: switching to Parakeet via sherpa-onnx — the most palpable improvement of the year.

In January I switched the default ASR to Parakeet-TDT-0.6B-v3 via sherpa-onnx. On a CPU-only machine, with no GPU involved, it turned out to be 5–10× faster than Whisper-large. Latency between releasing the hotkey and seeing text dropped from ~900 ms to ~140 ms, a bit more than a sixfold cut.

140 milliseconds is shorter than a person can consciously register as a pause. The line between "press hotkey" and "text appears" disappears in perception. From that point on — both for me and for everyone testing the build — dictation stopped feeling like "issuing a command to a tool"; it became simply a continuation of thought.

Spring 2026: the latest Google AI and polishing modes

Figure: polishing-mode menu. Modes: friendly, business, code-comments, prompt-engineering, accountant, minimal-edit. Model: Google AI · balanced. Six modes, switched by hotkey or from the tray. Example day: morning client emails in business, daytime team Slack in friendly, evening family chat in minimal-edit.
Tray menu: modes switch on the fly — but most of the time I just say the tone I want as a voice command.

The current default is the latest Google AI model. And the headline feature, the one I built the whole thing around: polishing modes. You pick a style from the tray or by hotkey: business / friendly / minimal-edit / accountant / academic / code-comments.

My own typical Monday — the first place I tested every new mode:

  • Morning client emails — business
  • Team Slack — friendly
  • Code comments — code-comments (keeps technical terms in their original casing, doesn't "fix" variable names)
  • Evening reply in the family group chat — minimal-edit (just removes fillers, doesn't smooth out the tone)
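
To make the idea of a mode concrete, here is a minimal Python sketch of how a mode name could steer the polishing step. It is not Commander Flow's actual code: the `MODE_INSTRUCTIONS` table, the instruction wording, and `build_polish_prompt` are hypothetical, and only the mode names and their described behavior come from the list above.

```python
# Hypothetical sketch: mapping a polishing mode to an LLM instruction.
# Mode names come from the article; the instruction texts are assumptions.
MODE_INSTRUCTIONS = {
    "business": "Rewrite as a concise, professional message.",
    "friendly": "Rewrite in a warm, casual tone.",
    "code-comments": (
        "Rewrite as a code comment. Keep technical terms in their original "
        "casing and do not change variable names."
    ),
    "minimal-edit": "Only remove filler words. Do not change the tone.",
}


def build_polish_prompt(mode: str, raw_transcript: str) -> str:
    """Combine the mode's instruction with the raw ASR transcript."""
    if mode not in MODE_INSTRUCTIONS:
        raise ValueError(f"unknown polishing mode: {mode!r}")
    return f"{MODE_INSTRUCTIONS[mode]}\n\nTranscript:\n{raw_transcript}"


prompt = build_polish_prompt(
    "minimal-edit", "so basically I think we need to, uh, redo this module"
)
print(prompt)
```

Keeping the modes in one table means a tray click, a hotkey, or a spoken tone request only has to resolve to a key; the rest of the pipeline stays the same.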

Retrospective: what changed in me, while I was building this

If I made a bullet list it would sound boring. So I'll put it this way: in a year of using my own tool every day, three things changed in how I deal with text, and none of them were on the roadmap.

First, the ideas in my emails got longer. Typing trims a phrase to whatever your fingers can produce. Voice doesn't rush. Testers noticed my reports had become more structured before I noticed it myself.

Second, English emails stopped making me anxious. I dictate in my native language, ask for a rewrite in business English, and get text indistinguishable from a native speaker's. It's no longer a separate stress point; it's just the next step in the same dictation.

And third, the strangest one: my hands ache less at the end of the day. I never thought of typing as physical labor until I stopped doing it.

Rough edges I run into — and that I keep on the roadmap

An honest list of what still bugs me about my own product:

Polishing sometimes "improves" terms that shouldn't be touched. Someone says "kubectl apply" and gets "Kubernetes apply." I solved it with the dictionary in settings (PolishOptions.Dictionary) — add your own terms and the LLM leaves them alone. But I know first-time users don't find this hook fast enough; surfacing it earlier is on the list.
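
One way a term dictionary like this can be enforced is to mask protected terms before the text reaches the model and restore them afterwards. The Python sketch below shows that masking approach under stated assumptions: it is not how `PolishOptions.Dictionary` is actually implemented (the article doesn't say), and `polish_with_protected_terms` and `fake_polish` are hypothetical names, with `fake_polish` standing in for the LLM.

```python
import re
from typing import Callable, Iterable

# Private-use characters as token delimiters, unlikely to occur in dictation.
OPEN, CLOSE = "\uE000", "\uE001"


def polish_with_protected_terms(
    text: str,
    protected_terms: Iterable[str],
    polish: Callable[[str], str],
) -> str:
    """Mask protected terms, run the polisher, then restore the terms."""
    terms = list(protected_terms)
    if not terms:
        return polish(text)
    # Longest first, so "kubectl apply" is preferred over a shorter "kubectl".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    canonical = {t.lower(): t for t in terms}
    restored = []  # dictionary spellings, indexed by placeholder number

    def mask(match):
        spoken = match.group(0)
        restored.append(canonical.get(spoken.lower(), spoken))
        return f"{OPEN}{len(restored) - 1}{CLOSE}"

    def unmask(match):
        index = int(match.group(1))
        return restored[index] if index < len(restored) else match.group(0)

    polished = polish(pattern.sub(mask, text))
    return re.sub(f"{OPEN}(\\d+){CLOSE}", unmask, polished)


def fake_polish(s: str) -> str:
    # Stand-in for the LLM: drops a filler and makes the over-eager "fix".
    return s.replace("uh, ", "").replace("kubectl", "Kubernetes")


result = polish_with_protected_terms(
    "uh, run kubectl apply on staging", ["kubectl apply"], fake_polish
)
print(result)  # run kubectl apply on staging
```

The sketch matches terms as plain substrings and assumes the polisher passes the placeholder tokens through unchanged; a real model may not, in which case instructing it through the prompt is the alternative.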

Cold-start model warmup. The first dictation after Windows boots is noticeably slower than the next ones. I added AudioDeviceWarmup (saves 40–80 ms), but the LLM's KV-cache still needs to warm up. For now you just say any phrase into the mic right after startup — like a stretch. There's more to do here.
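
The manual "stretch" can in principle be automated: a common way to hide a cold start is to send a throwaway request on a background thread at launch. The Python sketch below illustrates only that general pattern. `SlowStartModel` is a hypothetical stand-in that simulates a one-time setup cost with a sleep; it says nothing about how Commander Flow's LLM or `AudioDeviceWarmup` behave internally.

```python
import threading
import time


class SlowStartModel:
    """Stand-in for a model whose first call pays a one-time setup cost."""

    def __init__(self, cold_start_s: float = 0.2) -> None:
        self._cold_start_s = cold_start_s
        self._warm = False
        self._lock = threading.Lock()

    def run(self, text: str) -> str:
        with self._lock:
            if not self._warm:
                time.sleep(self._cold_start_s)  # simulated load / cache fill
                self._warm = True
        return text.strip()


def warm_up_in_background(model: SlowStartModel) -> threading.Thread:
    """Send a throwaway request at startup so the cold call happens early."""
    thread = threading.Thread(target=model.run, args=("warm-up",), daemon=True)
    thread.start()
    return thread


model = SlowStartModel()
warm_up_in_background(model).join()  # a real app would not block here

start = time.perf_counter()
first_result = model.run(" first real dictation ")
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"first real call took {elapsed_ms:.1f} ms")
```

The lock matters: if the user dictates while the warm-up is still running, the real request waits for it to finish instead of paying the setup cost a second time.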

Sometimes I want to hot-swap modes mid-sentence. Like "business opening, then a funny P.S." Not a thing yet — but I like the direction, and it's something I'm keeping in mind.

What I'd say to myself a year ago

"Be patient. The thing that looks like a toy now will become the most-used app after the browser in 10 months. Keep a journal. This will be the story of how private AI on a specific device becomes the new normal."

Alpha is done. Beta is almost done. I'm no longer shipping a prototype — I'm responsible for a product that people use every day.

And I'm proud of that.

Try it yourself

Download Commander Flow and hold Caps Lock in any app. Recognition runs locally, no cloud — free trial included.
