Back to blog

Cleaning up dictated text with AI: the role of the LLM

How AI transforms raw dictation (filler words, repetitions, false starts) into clean, punctuated text. The role of the cleanup LLM, with concrete before/after examples.

You dictate an idea out loud, and the result on screen makes you wince: “um” scattered everywhere, the same sentence repeated twice, an abandoned false start halfway through, not a single comma. The transcription is accurate — too accurate. It writes exactly what you said, hesitations and all, while what you wanted was clean text. That’s precisely the job an AI layer can do for you. Here’s how to clean up dictated text with AI, what the cleanup LLM actually does, and what the before/after looks like.

Why raw transcription is never “clean”

It helps to distinguish two steps that are often confused. Transcription (or speech recognition) converts your voice into words. Its goal is accuracy: reproduce what was said, without interpretation. That’s why it dutifully captures every “um,” every “I mean,” every repetition.

Spoken language is inherently messy. When we talk without a script, we hesitate, backtrack, restart sentences, think out loud. In writing, those crutches become noise. macOS’s built-in dictation stops at this first step: it transcribes, but doesn’t write. The cleanup is still your problem — hence the feeling that you have to rewrite everything from scratch.

What the cleanup LLM actually does

This is where the second step comes in: a large language model (LLM) takes the raw transcript and rewrites it into presentable text. Not a simple spell-check — a genuine editing pass. In practice, it does several things in one move:

  • Removes filler words: “um,” “uh,” throwaway “like,” trailing “you know,” “right” at the end of sentences.
  • Eliminates repetitions and false starts: when you restart a sentence, it keeps only the final version.
  • Adds punctuation automatically: commas, periods, question marks, capitalization — without you having to say “comma” out loud.
  • Structures the text: it breaks a monologue into readable sentences, sometimes into paragraphs.
  • Adapts tone to context: short and direct in a messaging app, more polished in an email.

The key point: the LLM works on meaning, not just words. It understands that an abandoned sentence later rephrased is a single idea, and it preserves only the final intent. That’s what sets it apart from a standard spellchecker.

Before and after: three concrete examples

Nothing beats examples. Here are typical raw dictations and their LLM-cleaned versions.

Raw dictation (verbatim transcript)After AI cleanup
“uh so basically I wanted to let you know that the meeting uh it’s been moved to Thursday actually Thursday 3 pm”“The meeting has been moved to Thursday at 3 p.m.”
“ok so for the project we have two options either we launch now or no wait we wait for the client’s sign-off instead”“For the project, two options: launch now, or wait for the client’s sign-off.”
“can you uh can you send me the file the excel file when you have five minutes thanks”“Could you send me the Excel file when you have a moment? Thanks.”

The pattern is clear: filler words disappear, false starts (“or no wait”) are resolved, punctuation appears, and the restart becomes a clean sentence. The content itself is untouched — the AI doesn’t add information, it removes noise.

Where you should stay alert

Let’s be honest about the limits, because no tool is magic:

  1. Proper nouns. A model can’t guess the exact spelling of an unusual last name or an obscure brand. Keep a habit of proofreading those.
  2. Meaning depends on transcription. If speech recognition mishears a word at the start, the LLM will neatly rewrite… a mistake. A good microphone is still the foundation.
  3. Paraphrasing. Overly aggressive cleanup can rephrase something to the point of shifting a nuance. Good tools stay conservative: they clean without reinventing.

In short, AI saves considerable time on formatting, but doesn’t replace a quick proofread on sensitive passages.

Where this cleanup fits in your workflow

The advantage of an LLM built into dictation is that the cleanup is invisible and instant: you speak, and already-clean text is inserted at the cursor — not the raw version you’d have to fix afterward. That’s the approach of Speech Flow, a native macOS app (Apple Silicon) that weighs in at ~50 MB. You hold Ctrl, speak, release; an LLM cleans, punctuates, and adapts the tone to the app you’re writing in. Mixed FR/EN/ES/IT mid-sentence is handled.

On the privacy side, the details matter when you’re trusting an AI with your voice: Speech Flow runs on BYOK (you bring your own OpenAI, Gemini, or Groq key). Your audio goes directly to that provider to be processed, then is not stored. If you’d like to compare this approach with cloud subscription solutions, the Speech Flow vs Wispr Flow comparison covers the differences in detail.

FAQ

What’s the difference between transcription and AI cleanup?
Transcription converts voice to words, verbatim (filler words included). LLM cleanup then rewrites that raw text: it removes the “ums,” resolves repetitions, adds punctuation, and formats the result. These are two distinct steps; Apple’s built-in dictation only handles the first.

Can AI change the meaning of what I dictated?
The risk exists with overly aggressive cleanup, but serious tools stay conservative: they remove noise without adding information or rephrasing your ideas. A quick proofread on proper nouns and figures is still recommended.

Do you need to dictate punctuation when an LLM cleans the text?
No. That’s the whole point: the LLM punctuates automatically based on the meaning and rhythm of the sentence. You speak naturally, without saying “comma” or “period.”


A faithful transcript is just a starting point; it’s the LLM layer that turns a halting monologue into clean text. If dictating without cleaning up afterward would save you time every day, Speech Flow does that cleanup on the fly — worth trying if you’re on an Apple Silicon Mac and comfortable with the BYOK model, with an all-inclusive plan if you’d rather not manage any keys.