Cogitate

Genetic programming, deconstructed: four of five assumptions are 1990s artifacts

Björn Roberg, Claude

The metaphor is doing too much work

Genetic programming reads, at first, like a single algorithm: population, mutation, crossover, fitness, selection, generations. The biology analogy welds the pieces together so tightly that people treat “do GP” as one decision.

It isn’t. It’s five independent choices stacked into a trench coat.

  1. Search space. What set of programs can be expressed at all? Trees, linear token strings, grammars, ASTs, free-form text.
  2. Variation operator. How does program → program' happen? Random edits, structured mutations, crossover, LLM rewrites.
  3. Fitness function. How does program → scalar work? Tests, benchmarks, heuristics, human eyeball, learned judges.
  4. Selection rule. Given fitness, what’s the probability distribution over who breeds? Tournament, rank, truncation, novelty-biased.
  5. Compute budget. How many evaluations are you willing to pay for?
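The five knobs can be made literal. Here is a minimal sketch — all names are invented for the example, no particular library — that treats each choice as an independently swappable field, with a toy integer search standing in for a program space:

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, List

# Illustrative sketch: the five choices as independent, swappable fields.
@dataclass
class SearchConfig:
    init: Callable[[], Any]                 # 1. search space: sample a starting program
    variation: Callable[[Any], Any]         # 2. program -> program'
    fitness: Callable[[Any], float]         # 3. program -> scalar
    selection: Callable[[List[Any]], Any]   # 4. given fitness, who breeds
    budget: int                             # 5. evaluations you will pay for

def run(cfg: SearchConfig) -> Any:
    pop = [cfg.init() for _ in range(8)]
    best = max(pop, key=cfg.fitness)
    for _ in range(cfg.budget):             # budget counted loosely, per variation step
        child = cfg.variation(cfg.selection(pop))
        if cfg.fitness(child) > cfg.fitness(best):
            best = child
        pop.append(child)
    return best

# Toy instance: "programs" are integers, the target is 10.
random.seed(0)
cfg = SearchConfig(
    init=lambda: 0,
    variation=lambda x: x + random.choice([-1, 1]),
    fitness=lambda x: -abs(x - 10),
    selection=lambda pop: max(pop, key=lambda x: -abs(x - 10)),
    budget=100,
)
best = run(cfg)  # climbs from 0 toward 10
```

Classical GP, hill-climbing, and an LLM-guided loop all instantiate this same record with different fields; nothing forces the five to move together.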

Each of these is independently adjustable. The biology metaphor hides that. It also anchors every choice to what made sense in 1990, when CPU was the scarce resource and the field had no better prior over “plausible program” than random.

Pull the pieces apart and ask which constraints are actually load-bearing.

What’s hard, what’s soft, what’s assumption

| Constraint | Type | Reality |
| --- | --- | --- |
| Need a population | soft | (1+1) ES and hill-climbing often match GP on real problems |
| Need crossover | assumption | mutation-only algorithms frequently match or beat crossover-based ones |
| Scalar fitness | soft | MAP-Elites / Pareto selection beat scalar on deceptive landscapes |
| Tree/AST representation | soft | linear GP, grammar GP, token-level GP all work |
| Random mutation | assumption | LLM-guided mutation is directed; randomness was a compute-era artifact |
| Generational loop | soft | async / steady-state / island-model all valid |
| Fitness = tests passing | assumption | overfits fast, Goodharts faster |
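The first two rows are easy to demonstrate. A (1+1) evolution strategy — one parent, one child per step, mutation only, no crossover, no population — solves OneMax, the standard sanity benchmark, directly. A minimal sketch, with all names illustrative:

```python
import random

def one_plus_one_es(init, mutate, fitness, budget, rng):
    """(1+1) ES: single parent, single child per step; keep the child
    iff it is at least as fit. No population, no crossover."""
    parent, f_parent = init, fitness(init)
    for _ in range(budget):
        child = mutate(parent, rng)
        f_child = fitness(child)
        if f_child >= f_parent:
            parent, f_parent = child, f_child
    return parent, f_parent

# OneMax: maximize the number of 1-bits in a fixed-length bitstring,
# a stand-in here for a linear program representation.
def flip_one(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

best, score = one_plus_one_es([0] * 32, flip_one, sum, budget=2000,
                              rng=random.Random(1))  # reaches all ones
```

Expected time on OneMax is on the order of n·ln(n) accepted steps, so a budget of 2000 is generous for 32 bits — no population bookkeeping required.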

Exactly one constraint is hard: no free lunch plus evaluation cost. You cannot search faster than you can evaluate. Everything else in the classical recipe is a design choice that got frozen by convention.

Reconstruction from first principles

Keep only what’s actually true:

  1. Evaluation cost is the binding constraint: you cannot search faster than you can evaluate.
  2. There is now a strong learned prior over “plausible program” (the LLM), so variation does not have to be random.
  3. Scalar fitness on discrete program spaces is deceptive, so selection has to preserve diversity explicitly.

What comes out of that is not GP. It’s this:

```
loop:
  parent ← select from archive (novelty × fitness, not fitness alone)
  child  ← LLM_mutate(parent, context=failing_tests)    # directed, not random
  score  ← multi_fitness(child)                         # tests + perf + size + behavior
  archive.add_if_novel_or_dominant(child, score)        # MAP-Elites
```

No trees. No crossover. No generations. No population in the classical sense — just an archive indexed by behavior. Mutation is directed by a learned prior, not sampled from a uniform distribution over edits. Selection is biased toward novelty as much as toward fitness, because scalar fitness on discrete program spaces is a known trap.
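As a concrete — and heavily stubbed — sketch of that loop: the archive is a dict keyed by a behavior descriptor, each cell keeps its best occupant, and the LLM mutation is stood in for by a random edit. The names `behavior`, `add_if_dominant`, and the descriptors are all invented for the example:

```python
import random

# "Programs" here are bit lists; a real system would hold source code.
def behavior(prog):
    # Behavior descriptor: (length bucket, content bucket). In real use this
    # would be something like a runtime profile or an output signature.
    return (len(prog) // 4, sum(prog) // 8)

def fitness(prog):
    # Toy stand-in for multi_fitness: reward 1-bits, penalize size.
    return sum(prog) - 0.1 * len(prog)

archive = {}  # cell -> (fitness, program); no population, just an archive

def add_if_dominant(prog):
    cell = archive.get(behavior(prog))
    if cell is None or fitness(prog) > cell[0]:     # novel cell, or dominates it
        archive[behavior(prog)] = (fitness(prog), prog)

rng = random.Random(0)
def mutate(prog):                                   # stub for LLM_mutate
    prog = list(prog)
    op = rng.random()
    if op < 0.4 and prog:
        prog[rng.randrange(len(prog))] ^= 1         # flip a bit
    elif op < 0.7:
        prog.append(rng.randrange(2))               # grow
    elif prog:
        prog.pop(rng.randrange(len(prog)))          # shrink
    return prog

add_if_dominant([0])
for _ in range(3000):
    parent = rng.choice(list(archive.values()))[1]  # uniform over cells
    add_if_dominant(mutate(parent))
```

Uniform selection over occupied cells is the standard MAP-Elites stand-in for novelty pressure: every cell, fit or not, breeds equally often, so under-explored regions of behavior space keep getting visited.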

The honest name for this is LLM-guided quality-diversity search over programs. The biology framing contributes nothing once the LLM is in the loop.

What changed

Four things, all from the last five years:

  1. Mutation became directed: an LLM conditioned on the program and its failing tests proposes plausible edits instead of random ones.
  2. Fitness became multi-signal: tests plus performance, size, and behavior, with learned judges filling in where tests can’t reach.
  3. Selection became archive-based: quality-diversity methods like MAP-Elites keep behaviorally distinct solutions instead of a single scalar leaderboard.
  4. The cost curve flipped: generating a candidate is now cheap relative to evaluating it, so the bottleneck moved from variation to evaluation.

None of these are small adjustments. Together they break enough of GP’s premises that calling the result “evolutionary” is a historical courtesy, not a technical description.

Where the metaphor still earns its keep

Three narrow places:

Everywhere else, the metaphor costs more in confused intuitions than it earns in vocabulary.

The tell

If you find yourself arguing about whether to use crossover, or whether tree-based or linear representations are “more evolutionary,” or whether your fitness function is “biologically realistic,” you’re optimising a 1990s compute profile on 2026 hardware.

The useful questions are: what’s my search space, what’s my variation operator, what’s my fitness signal, what’s my selection rule, and what’s my budget. Five independent knobs. Answer each on its own terms. The biology was never the point.


Companion post: Verifier-native search: the 2026 shape picks up where this leaves off — once you’ve dropped the metaphor, the cost curve reshapes the algorithm again.

Companion: MeMo prompting, deconstructed does the same first-principles strip on a different framing — the “mental models” prompting trick whose actual mechanism turns out to be much smaller than its packaging.

