When to Rewind
Last updated: March 29, 2026
Pi-weaver called time_lapse 17 times across these 15 Sonnet 4.6 runs.
That number is small enough to read every one, which is exactly what I did.
I expected to come away with a clean rule like "rewind helps when the model is stuck" or "rewind hurts when the task is hard." Instead I got something more specific:
A rewind is good when it collapses the search space. It's bad when it just improves the narration of the search.
That sounds subtle, but in the sessions it was obvious.
Some rewinds took a messy orientation phase and turned it into a sharp plan. Those were great.
Some rewinds took a real mistake, named it clearly, and let the model continue without dragging the mistake's context forward. Also great.
And some rewinds turned into a kind of beautifully explained grinding, where each iteration sounded smarter than the last while the task itself did not get meaningfully closer to done.
That last category is the real risk.
The four patterns I saw
1. Orientation shedding
The model does reconnaissance, figures out what matters, checkpoints that state, and rewinds to a clean context with a compact steering summary. Not backtracking. Pruning.
2. Failure recovery
The model actually made a mistake, or learned something concrete from a failed attempt, then rewound with a revised plan. This is what I originally wanted the feature for.
3. Context sanitation
Sometimes the rewind was not about a new idea at all. It was about cleaning up a conversation that had become noisy or slightly misleading. Less glamorous than failure recovery, but still valuable.
4. Grind
The model keeps discovering locally true things, keeps writing increasingly plausible steering text, and keeps paying for more attempts, but the task's actual search space is not getting smaller.
Rewind is excellent at compressing insight. It is terrible at telling you when you no longer have one.
The complete catalog
1. fix-code-vulnerability
Steering: CWE-93 / CRLF injection; _hkey/_hval need to reject control chars. Pattern: orientation shedding, helped
Cleanest positive case. Rewind converted reconnaissance into an exact patch plan.
2. polyglot-c-py
Steering: the triple-quote approach worked; the continuation approach broke Python; go back. Pattern: failure recovery, hurt
Sensible diagnosis but the task still ate time without converting into a pass.
3–6. build-cython-ext (4 rewinds)
- #3 orientation shedding: apply NumPy fixes, then build, neutral
- #4 context sanitation: grep exit 1 was a false alarm, neutral
- #5 failure recovery: missing setuptools, neutral
- #6 failure recovery / grind: missing pytest, serial blocker hunt, hurt
By rewind #6, the model is still making progress, but the task has turned into a dependency tail. The rewind isn't wrong. It's just no longer buying leverage.
7. configure-git-webserver
Steering: Ubuntu 24.04, no systemd; create git user, bare repo, nginx on 8080. Pattern: orientation shedding, hurt
Plain was already perfectly capable. The rewind added process, not capability.
8. sqlite-with-gcov
Steering: compiler and gcov installed; unpack SQLite, configure with coverage flags. Pattern: orientation shedding, neutral
Both failed, but weaver failed faster and cheaper.
9. log-summary-date-ranges
Steering: logs follow date-stamped naming; count severities by period. Pattern: orientation shedding, hurt
Too small a task to benefit from a reset. Reads like hygiene, acts like overhead.
10–15. qemu-alpine-ssh (6 rewinds)
- #10 orientation shedding: boot Alpine, configure SSH, neutral
- #11 failure recovery: interact needs a TTY, switch to serial socket, hurt
- #12 failure recovery: Alpine live ISO lacks OpenSSH, install with apk, neutral
- #13 failure recovery / grind: sshd runs but banner exchange times out, hurt
- #14 failure recovery / grind: likely DNS hang, rewrite sshd_config, hurt
- #15 context sanitation / grind: tmux escape sequences breaking prompt matching, hurt
By rewind #15, the model is improving the automation framework around the task more than the task itself.
16. fix-git
Steering: orphaned commit found; merge conflict resolved; finalize with git commit. Pattern: failure recovery, helped
Smallest, nicest self-correction in the set. Concrete mistake, precise recovery, done.
17. build-pmars
Steering: source extracted; remove -DXWINGRAPHX and -lX11, then build. Pattern: orientation shedding, hurt
The model didn't need a dramatic reset. It needed to edit the Makefile.
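The catalog above is small enough to tally mechanically. A minimal Python sketch, with the 17 entries transcribed by hand from the list. The tuple layout and variable names are illustrative choices, not anything pi-weaver produces, and treating the first label of a combined entry such as "failure recovery / grind" as its primary pattern is an assumption made only for counting:

```python
from collections import Counter

# (rewind number, task, pattern labels, outcome), transcribed from the catalog.
REWINDS = [
    (1, "fix-code-vulnerability", ("orientation shedding",), "helped"),
    (2, "polyglot-c-py", ("failure recovery",), "hurt"),
    (3, "build-cython-ext", ("orientation shedding",), "neutral"),
    (4, "build-cython-ext", ("context sanitation",), "neutral"),
    (5, "build-cython-ext", ("failure recovery",), "neutral"),
    (6, "build-cython-ext", ("failure recovery", "grind"), "hurt"),
    (7, "configure-git-webserver", ("orientation shedding",), "hurt"),
    (8, "sqlite-with-gcov", ("orientation shedding",), "neutral"),
    (9, "log-summary-date-ranges", ("orientation shedding",), "hurt"),
    (10, "qemu-alpine-ssh", ("orientation shedding",), "neutral"),
    (11, "qemu-alpine-ssh", ("failure recovery",), "hurt"),
    (12, "qemu-alpine-ssh", ("failure recovery",), "neutral"),
    (13, "qemu-alpine-ssh", ("failure recovery", "grind"), "hurt"),
    (14, "qemu-alpine-ssh", ("failure recovery", "grind"), "hurt"),
    (15, "qemu-alpine-ssh", ("context sanitation", "grind"), "hurt"),
    (16, "fix-git", ("failure recovery",), "helped"),
    (17, "build-pmars", ("orientation shedding",), "hurt"),
]

# Outcome totals across all rewinds.
outcomes = Counter(outcome for _, _, _, outcome in REWINDS)

# Rewinds per task.
per_task = Counter(task for _, task, _, _ in REWINDS)

# Primary pattern = first label of each entry (a counting assumption).
primary = Counter(patterns[0] for _, _, patterns, _ in REWINDS)

# Outcomes restricted to rewinds that carry the "grind" label.
grind_outcomes = Counter(
    outcome for _, _, patterns, outcome in REWINDS if "grind" in patterns
)

if __name__ == "__main__":
    print("outcomes:", dict(outcomes))
    print("per task:", dict(per_task))
    print("primary pattern:", dict(primary))
    print("grind outcomes:", dict(grind_outcomes))
```

Under that transcription the tally is 2 helped, 6 neutral, and 9 hurt, and all four rewinds labeled grind are marked hurt.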
What the count hides
The distribution matters more than the total:
- 1 rewind each: 7 tasks
- 4 rewinds: build-cython-ext
- 6 rewinds: qemu-alpine-ssh
- 0 rewinds: db-wal-recovery, password-recovery, qemu-startup, chess-best-move, custom-memory-heap-crash
That last group is important. Some of the biggest weaver wins happened with no rewind at all.
Rewind is not the product. The product is the option to rewind when the task benefits from it. The value isn't in invoking the tool a lot. It's in using it when the session has learned something worth compressing.
The sentence I keep coming back to
When I read the six qemu-alpine-ssh steering summaries back to back, they're all intelligent. They all sound like progress. In isolation, I probably would have nodded along with every one.
And that's the trap.
The harness needs a way to distinguish:
- "we learned something that reduces the search space" from
- "we learned something true inside the same-sized search space"
Weaver is good at the first. It currently blesses the second too easily.
That's why I think of rewind as a sharp tool instead of a general feature. It can absolutely improve a session. It can also make a doomed session look increasingly well-run.
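One way to picture the distinction between a rewind that shrinks the search space and one that only restates it is a small gate sitting in front of the rewind call. The sketch below is purely hypothetical: RewindGate, review, and max_flat_rewinds are invented names, not part of pi-weaver, and it assumes the hard part is already solved, namely that each steering summary arrives with an explicit set of candidate explanations still in play.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RewindGate:
    """Hypothetical gate; nothing here exists in pi-weaver."""

    # Assumed budget: how many non-reducing rewinds in a row are tolerated.
    max_flat_rewinds: int = 2
    # Smallest set of open candidates seen so far (None before the first rewind).
    baseline: Optional[frozenset] = None
    # Consecutive rewinds that did not shrink the baseline.
    flat_streak: int = 0

    def review(self, open_candidates: set) -> str:
        """Classify a proposed rewind by what it does to the candidate set."""
        after = frozenset(open_candidates)
        if self.baseline is None:
            # The first rewind only establishes what the search space is.
            self.baseline = after
            return "baseline"
        if after < self.baseline:
            # Strict subset: something was actually ruled out.
            self.baseline = after
            self.flat_streak = 0
            return "reduces"
        # Same-sized (or sideways) space: true things learned, nothing eliminated.
        self.flat_streak += 1
        if self.flat_streak > self.max_flat_rewinds:
            return "over_budget"
        return "flat"


if __name__ == "__main__":
    gate = RewindGate(max_flat_rewinds=1)
    # Candidate labels are placeholders, not taken from any real session.
    for candidates in ({"a", "b", "c"}, {"a", "b"}, {"a", "b"}, {"a", "b"}, {"a"}):
        print(sorted(candidates), "->", gate.review(candidates))
```

The design choice worth noticing is that the gate never reads the steering text itself. An eloquent summary and a terse one are judged the same way, by whether the set of open candidates got strictly smaller.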
The next question is which tasks tend to fall into which bucket. That's what The Task Spectrum is about.