Distribution fine-tuning


Textsplain

explained in texts

Monday, May 18, 2026 · 9:41 AM

ok what is distribution fine-tuning and why are people acting like it fixes AI slop

short version: it tries to make the model’s whole batch of writing look human, not just make one answer win a taste test.

whole batch?

yeah. one AI paragraph can look fine. ask for 500 and you start seeing the fingerprints.

same phrases, same structure, that weird polished sameness

exactly. same pacing, same tidy argument shape, same generic detail level.

like a playlist where every song is different but they all somehow have the same chorus.

wait, isn’t regular fine-tuning supposed to fix that?

supervised fine-tuning (SFT) teaches the model to imitate a reference answer for each prompt.

but imitating good examples one by one does not guarantee the full population has human variety.

so it learns the costume, not the crowd

right. it can answer a prompt well and still overuse certain moves across a lot of prompts.

what kind of moves

overused words, favorite phrases, too many neat transitions, symmetrical reasoning, details that feel correctly placed but stock.

painfully familiar

distribution fine-tuning (DFT) looks at that statistically instead of only asking “did a judge like this one output?”

give me the non-paper version

one check counts word and short-phrase patterns. if humans use a phrase sometimes and the model uses it constantly, that shows up.

repeated chorus detector
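
rough sketch of that phrase check, assuming plain whitespace-split bigrams and a made-up 3x cutoff. ngram_counts and overused_ngrams are hypothetical names, not the paper’s code:

```python
from collections import Counter


def ngram_counts(texts, n=2):
    """Count word n-grams across a batch (lowercased, split on whitespace)."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def overused_ngrams(model_texts, human_texts, n=2, min_ratio=3.0, smoothing=1.0):
    """Flag n-grams whose per-text rate in model output is at least
    `min_ratio` times the (smoothed) per-text rate in human writing."""
    model = ngram_counts(model_texts, n)
    human = ngram_counts(human_texts, n)
    flagged = []
    for gram, count in model.items():
        model_rate = count / len(model_texts)
        # add-`smoothing` so n-grams humans never used don't divide by zero
        human_rate = (human[gram] + smoothing) / (len(human_texts) + smoothing)
        ratio = model_rate / human_rate
        if ratio >= min_ratio:
            flagged.append((" ".join(gram), round(ratio, 2)))
    return sorted(flagged, key=lambda item: -item[1])


model_texts = ["in conclusion it works", "in conclusion it fails", "in conclusion we wait"]
human_texts = ["it works fine", "we wait here", "in short it fails"]
print(overused_ngrams(model_texts, human_texts))
```

every model text opens with “in conclusion” and no human one does, so that bigram gets flagged. phrases both sides use at similar rates stay under the cutoff.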

yep. another check turns each text into an embedding (a vector summary of the text) and asks whether model writing clusters like human writing, or drifts into its own weird style cloud.

less phrase counting, more overall shape
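
one way to put a number on “overall shape”: compare the two clouds of embedding vectors with maximum mean discrepancy (MMD). the source doesn’t say which statistic they use, so MMD is a stand-in, and the vectors are assumed to come from some embedding model you already have:

```python
import math


def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel: 1 for identical vectors, near 0 for distant ones."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared maximum mean discrepancy between two
    sets of vectors; close to 0 when both look like the same distribution."""
    k_xx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2 * k_xy


# toy 2-D "embeddings"; real ones have many more dimensions
human = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
model_nearby = [[0.1, 0.0], [1.0, 0.1], [0.0, 0.9]]
model_drifted = [[5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
print(mmd_squared(human, model_nearby), mmd_squared(human, model_drifted))
```

near 0 means the model cloud sits where the human cloud does; bigger means it drifted into its own style cloud. gamma controls how local the comparison is and usually needs tuning to the embedding scale.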

right. and a third check has a judge model compare outputs against human references.
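
the judge check reduced to its skeleton. the judge is a hypothetical callable (in practice, another model prompted to compare a model output with a human reference); the function just turns its verdicts into a rate:

```python
from typing import Callable, Sequence

# Hypothetical judge: returns True if it rates `candidate` at least as
# human-like as `reference` (e.g. a prompted model's yes/no verdict).
Judge = Callable[[str, str], bool]


def judge_pass_rate(outputs: Sequence[str], references: Sequence[str], judge: Judge) -> float:
    """Fraction of prompts where the judge rates the model output at least
    as good as the paired human reference."""
    if not outputs or len(outputs) != len(references):
        raise ValueError("need exactly one human reference per model output")
    passes = sum(judge(out, ref) for out, ref in zip(outputs, references))
    return passes / len(outputs)


# stand-in judge for demonstration only: prefers the longer text
def toy_judge(candidate: str, reference: str) -> bool:
    return len(candidate) >= len(reference)


print(judge_pass_rate(["aa", "b", "ccc"], ["a", "bb", "cc"], toy_judge))
```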

why not just turn up temperature until it sounds less samey

because temperature is a blunt knob.

one setting might improve phrase variety but hurt the embedding match. another might help the judge metric but not the phrase counts.

so “make it randomer” is not the cure

not by itself. randomness can make the playlist less repetitive without making it match the human playlist.
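
toy numbers for why temperature is blunt: it divides every logit by the same value, so it flattens or sharpens the whole next-token distribution but never changes which option is favored. the logits are made up:

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing each one by `temperature`."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats: higher means a flatter, less predictable distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# made-up next-token logits where token 0 is the model's "favorite phrase"
logits = [4.0, 1.0, 1.0, 0.5]
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

entropy rises with temperature, but token 0 stays on top at every setting. that’s “less repetitive” without “more like the human playlist.”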

then DFT trains on the playlist shape directly

that’s the idea. optimize for the batch distribution: spread, frequency, style cloud, not just one target answer.
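
a sketch of grading the batch instead of the sample, assuming each output is tagged with a simple style label (here, how the reply opens). KL divergence is one standard way to compare the two distributions; it’s an assumption, since the source doesn’t spell out DFT’s actual objective:

```python
import math
from collections import Counter


def label_distribution(labels):
    """Empirical distribution of style labels across one batch of outputs."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q): large when q (the model batch) rarely produces things p (humans) does.
    `eps` keeps the log finite when q is missing a label entirely."""
    return sum(
        p_k * math.log((p_k + eps) / (q.get(k, 0.0) + eps))
        for k, p_k in p.items()
        if p_k > 0
    )


# hypothetical labels: how each reply opens
human = ["so", "honestly", "ok", "well"]
samey = ["honestly"] * 4  # each reply is fine alone; the batch is not
varied = ["well", "ok", "so", "honestly"]

target = label_distribution(human)
print(kl_divergence(target, label_distribution(samey)))
print(kl_divergence(target, label_distribution(varied)))
```

every reply in the samey batch is a fine opener on its own, but the batch scores badly because it never uses the other openers humans use. a one-output-at-a-time judge can’t see that.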

did it actually work

in the reported setup, yes. DFT beat a strong baseline that cherry-picked the best SFT-plus-sampling configuration separately for each metric.
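
what that baseline does, mechanically: for each metric, take whichever SFT-plus-sampling configuration scored best on that metric alone. config names, metric names, and “higher is better” are assumptions for the sketch:

```python
def per_metric_oracle(scores):
    """scores maps config name -> {metric name: value}, higher is better.
    Returns {metric: (best_config, best_value)}, choosing the winning
    config separately for every metric."""
    metrics = sorted({m for per_config in scores.values() for m in per_config})
    best = {}
    for metric in metrics:
        candidates = [(cfg, s[metric]) for cfg, s in scores.items() if metric in s]
        best[metric] = max(candidates, key=lambda item: item[1])
    return best


# made-up scores, purely to show the selection logic
scores = {
    "sft_temp_0.7": {"phrase_match": 0.60, "embedding_match": 0.80},
    "sft_temp_1.2": {"phrase_match": 0.90, "embedding_match": 0.50},
}
print(per_metric_oracle(scores))
```

no single config gets every winning number at once, so this oracle is a tougher bar than any one setting you could actually ship.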

they also report smaller DFT models beating larger SFT baselines on some of their metrics.

spicy, but with the usual “reported setup” asterisk

exactly. don’t turn it into a universal law.

what was the product angle

they weren’t just saying “make it human.” they gave it a prompt, an outline, a style, and a use case.

and copy-paste friction: random fruits or animals injected into copied text, plus apparently no public API.
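
a guess at what fruit-and-animal friction could look like. the word list, how many words get inserted, and where they go are all hypothetical; the source only says random fruits or animals get injected into copied text:

```python
import random

# hypothetical word list; the source doesn't say which words are used
FRUITS_AND_ANIMALS = ["banana", "kiwi", "mango", "otter", "heron", "lynx"]


def add_copy_friction(text, n_words=2, seed=None):
    """Insert `n_words` random fruit/animal words at random word positions."""
    rng = random.Random(seed)
    words = text.split()
    for _ in range(n_words):
        position = rng.randint(0, len(words))  # inclusive: may append at the end
        words.insert(position, rng.choice(FRUITS_AND_ANIMALS))
    return " ".join(words)


print(add_copy_friction("the quick brown fox jumps over the lazy dog", seed=0))
```

pasting the text elsewhere now takes a cleanup pass: trivial for one reader, annoying at spam scale.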

anti-spam banana moat

basically. the product constraints match the goal: better writing, not infinite fake-human spam.

so the point is distribution, not vibes

yep. human writing has population statistics. if you only optimize single samples, the batch can still smell machine-made.

better playlist, not just a nicer song

that’s the clean takeaway.

Read Mon, May 18 · 10:02 AM