How to Near-One-Shot a Feature

Atticus Li

One-shotting a feature with AI isn't luck — it's a method. The goal isn't the prettiest first draft; it's the fewest total tokens to a change that compiles…

By Atticus Li · July 4, 2026 · 9 min read

"One-shot" has become a brag: paste a prompt, hit enter, get a working feature, screenshot it. But the version worth chasing isn't a party trick. It's an economic strategy — and it's learnable.

In the flagship piece on choosing a model, I argued the metric that matters is cost per accepted production change, not tokens or benchmark scores. Near-one-shotting is the prompt-level lever on that metric. Every re-prompt you avoid saves subscription usage, credits, retries, and, most expensively, your own steering time. The winner isn't the model that writes the prettiest first answer. It's the workflow that produces the most accepted, tested, deployed changes per week. This is how you tilt the odds toward that on the first pass.

"One-shot" means accepted, not impressive

Redefine the target before you optimize it.

A pretty draft is code that looks right in the chat window. A near-one-shot is a change that compiles, passes the tests, does what you asked, and survives review — on the first real attempt, or close to it. Those are different goals, and optimizing for the first one actively hurts the second. A confident, plausible, 300-line first draft that turns out to be aimed at the wrong module is worse than a smaller change that lands. You paid for the draft, you'll pay to unwind it, and you paid attention you can't get back.

So the question isn't "how do I get a great first response?" It's "how do I minimize the total tokens — and total loops — between my request and a change I actually accept?" Framed that way, near-one-shotting is mostly about eliminating the reasons re-prompts happen, before they happen.

Where wasted loops come from

Almost every correction loop traces to one of four failures, and all four are preventable at the prompt:

1. Missing context: the model didn't know a constraint, a convention, or where a thing lives, so it guessed wrong.
2. Ambiguous spec: you under-described the task, the model filled the gap with a reasonable-but-wrong assumption, and you discover the mismatch after it's built.
3. Wrong target: it edited the wrong file or chose the wrong approach, and nobody caught it until the diff was huge.
4. No way to self-check: there was no test, no acceptance criterion, so the model had no signal it was drifting and neither did you until you ran it.

Each of the moves below kills one of these. That's the whole method.

Move 1: Front-load the full spec — task, intent, and constraints

The single biggest source of re-prompts is the spec that dribbles out over five turns. "Add a rate limiter." (builds one) "No, per-user." (rebuilds) "Sliding window, not fixed." (rebuilds) "And it needs to fail open." Each correction is a full loop you paid for.

Say all of it up front. Anthropic's own prompting guidance is, underneath, a token-efficiency guide: the clearer the instruction, the fewer correction cycles. State the task (what to build), the intent (why — what the change enables), and the constraints (the non-negotiables) in the first message.

The intent matters more than people expect. "Add a per-user rate limiter" gets you a rate limiter. "Add a per-user rate limiter because we're getting scraped and need to fail open so real users are never blocked" gets you the right rate limiter — the model connects the task to the constraint you care about instead of inferring priorities on its own. Give the reason, not just the request.
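To see why each dribbled-out correction forces a rebuild, here is a minimal Python sketch of what the full spec implies. The class name, method name, and defaults are hypothetical and chosen only for illustration. The point is that "per-user", "sliding window", and "fail open" each land in a different part of the design, so learning any of them late means reworking the code.

```python
import time
from collections import defaultdict, deque


class PerUserRateLimiter:
    """Hypothetical sliding-window limiter; names and defaults are illustrative."""

    def __init__(self, limit=100, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the behavior can be tested
        self._hits = defaultdict(deque)  # user_id -> timestamps of recent requests

    def allow(self, user_id):
        try:
            now = self.clock()
            # Constraint "per-user": each user ID gets its own history.
            hits = self._hits[user_id]
            # Constraint "sliding window": drop only hits older than the window.
            while hits and now - hits[0] >= self.window:
                hits.popleft()
            if len(hits) >= self.limit:
                return False
            hits.append(now)
            return True
        except Exception:
            # Constraint "fail open": a limiter bug must never block a real user.
            return True
```

Each commented constraint corresponds to one of the corrections in the dribbled-out conversation above. Stated up front, they shape the first draft instead of triggering a rewrite.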

Move 2: Hand it the shape of "done"

A model that can check its own work stops guessing. A model that can't will hand you a plausible draft and let you be the test suite — which means every miss is a round trip through you.

So define "done" in a form the model can verify against:

Point it at the tests. "Make test_rate_limit.py pass" is a target with a pass/fail signal. Tell it to run the tests and iterate to green. The guess-check-guess loop collapses into a loop the model runs itself, cheaply, without your attention.
Give acceptance criteria when there's no test yet — concrete, checkable statements ("returns 429 with a Retry-After header," "allows 100 req/min per user ID, not per IP"), not vibes ("make it robust").
Show an example of the tricky part. A few-shot example of the exact input/output shape, or a similar existing function to mirror, is worth more than a paragraph describing it. Examples are the highest-bandwidth way to steer a model — one good one prevents a dozen clarifying re-prompts.
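As a sketch of what a verifiable "done" can look like, the two acceptance criteria quoted above can be written as checks. Everything here is an assumption for illustration: `handle_request` is a toy stand-in for whatever handler the model is asked to build, and its signature is hypothetical. In a real project the tests would import the actual handler instead.

```python
# test_rate_limit.py (sketch): acceptance criteria written as checks.
# handle_request is a hypothetical stand-in for the handler under test.

LIMIT = 100  # requests per minute per user ID


def handle_request(counts, user_id, ip):
    """Toy handler: counts requests per user ID within a single window."""
    counts[user_id] = counts.get(user_id, 0) + 1
    if counts[user_id] > LIMIT:
        return 429, {"Retry-After": "60"}
    return 200, {}


def test_returns_429_with_retry_after():
    counts = {}
    for _ in range(LIMIT):
        status, _headers = handle_request(counts, "alice", "10.0.0.1")
        assert status == 200
    status, headers = handle_request(counts, "alice", "10.0.0.1")
    assert status == 429
    assert "Retry-After" in headers


def test_limit_is_per_user_not_per_ip():
    counts = {}
    for _ in range(LIMIT + 1):
        handle_request(counts, "alice", "10.0.0.1")
    # Same IP, different user: must not inherit alice's exhausted budget.
    status, _headers = handle_request(counts, "bob", "10.0.0.1")
    assert status == 200
```

With a file like this in place, "make it robust" becomes two named checks the model can run and iterate against on its own.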

Move 3: Plan before you let it write

This is the cheapest insurance in AI coding, and it's the same lever the flagship model-choice piece and the token-saving guide both point at from different angles.

Before the model writes a line, ask for the plan: "Figure out how you'd implement this and lay out the approach and the files you'll touch. Don't write code yet." You get a few-thousand-token checkpoint where a wrong target (the wrong file, the wrong abstraction, a misread of the requirement) is cheap to catch. Approve the plan, and the implementation that follows is far more likely to land in one pass because you've already removed failures #1 and #3 (missing context and wrong target) before the expensive part starts.

Skipping the plan to "save time" is how you buy a 40,000-token implementation of the wrong thing.
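The trade-off can be sketched as rough expected-cost arithmetic. The 40,000-token implementation comes from the sentence above; the plan size and both failure probabilities are made-up assumptions, so treat the output as an illustration of the shape of the argument, not a measurement.

```python
def expected_tokens(impl_tokens, p_wrong_target, plan_tokens=0):
    """Expected spend if a wrong-target run has to be redone once in full."""
    return plan_tokens + impl_tokens * (1 + p_wrong_target)


# All four numbers are illustrative assumptions, not measurements.
IMPL_TOKENS = 40_000       # the implementation size used in the text
PLAN_TOKENS = 3_000        # "a few thousand" tokens for the plan
P_WRONG_NO_PLAN = 0.30     # assumed chance of a wrong target without a plan
P_WRONG_WITH_PLAN = 0.05   # assumed chance after the plan was reviewed

no_plan = expected_tokens(IMPL_TOKENS, P_WRONG_NO_PLAN)
with_plan = expected_tokens(IMPL_TOKENS, P_WRONG_WITH_PLAN, PLAN_TOKENS)
print(f"no plan:   {no_plan:,.0f} expected tokens")
print(f"with plan: {with_plan:,.0f} expected tokens")
```

Under these assumptions the planned route costs about 45,000 expected tokens against about 52,000 without a plan. More generally, in this simple model the plan pays for itself whenever its token cost is smaller than the implementation size multiplied by the drop in wrong-target probability.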

Move 4: Structure the prompt so it can't miss the important part

Order and structure change outcomes. Anthropic's prompting framework puts the task and role first, then context and data, then detailed step-by-step instructions, then examples, then a reminder of anything critical at the end — because models weight structure heavily and a critical constraint buried in the middle of a wall of text gets underweighted.

For a coding prompt, that translates to a simple shape:

What you're doing and why (task + intent).
The relevant context (the files, the conventions, the constraint) — or where to find it.
The steps or the order you want it to work in ("read the form-handling code first, then the validator, then make the change").
The example or the test it should match.
The one reminder that matters most, restated at the end ("Do not change the public API signature").
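One way to keep that order honest is to assemble the prompt from named parts. The helper below is a minimal sketch: the function name, the section labels, and the sample strings are hypothetical, and nothing about the exact label text is prescribed by any prompting guide.

```python
def build_prompt(task, intent, context, steps, example, reminder):
    """Assemble sections in the order listed above; labels are illustrative."""
    sections = [
        ("Task", f"{task}\nWhy: {intent}"),
        ("Context", context),
        ("Steps", "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))),
        ("Example or test to match", example),
        ("Reminder", reminder),  # the critical constraint goes last
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)


prompt = build_prompt(
    task="Add a per-user rate limiter.",
    intent="We're getting scraped and need to fail open so real users are never blocked.",
    context="Sliding window, 100 req/min per user ID, not per IP.",
    steps=["Read the request-handling code first.", "Then make the change."],
    example="Make test_rate_limit.py pass.",
    reminder="Do not change the public API signature.",
)
print(prompt)
```

Because every argument is required, a missing reminder or example fails loudly when the prompt is built, not several turns later.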

You don't need a template for every prompt. You need to make sure the load-bearing constraint isn't something the model has to infer.

Move 5: Scope each pass to one accepted change

The instinct is to ask for the whole feature in one heroic prompt. That maximizes the surface area for a wrong assumption and produces a sprawling diff you can't review, so you accept it half-blind or send it back whole.

Sequence it instead. Have the model make one small, coherent change, run the tests, commit, and move on. Small diffs are reviewable, so you catch drift while it's cheap to fix. And commits are checkpoints: if a step goes sideways, you reset to the last good commit instead of paying to untangle a large bad edit. "Near-one-shot" at the feature level is usually several genuinely-one-shot steps, each verified, not one giant lucky guess.
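The change, test, commit, reset rhythm can be sketched as a small driver loop. This is an assumption-heavy illustration: `apply_step` is a hypothetical hook standing in for "the model makes one small change", the test command assumes a pytest project, and the function expects to run inside a git repository.

```python
import subprocess


def run(cmd):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(cmd, check=False).returncode == 0


def land_steps(steps, apply_step, run_cmd=run):
    """Apply steps one at a time; commit on green, reset on red.

    Returns the list of steps that were committed.
    """
    landed = []
    for step in steps:
        apply_step(step)  # hypothetical hook: the model makes one small change
        if run_cmd(["python", "-m", "pytest", "-q"]):
            run_cmd(["git", "add", "-A"])
            run_cmd(["git", "commit", "-m", step])
            landed.append(step)
        else:
            # Discard the bad edit and return to the last good commit.
            # Note: this does not delete newly created untracked files.
            run_cmd(["git", "reset", "--hard", "HEAD"])
            break
    return landed
```

The loop stops at the first red step on purpose: continuing past a failed change would stack new edits on top of an unverified one, which is the sprawling diff this move is meant to avoid.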

Move 6: Interrupt at the first sign of drift

Even a good setup occasionally goes wrong mid-run. The move is to not let it finish. The moment you see the model heading for the wrong file, inventing an abstraction you didn't ask for, or misreading the task, stop it and redirect. A correction you issue 500 tokens in is trivially cheap; the same correction after it's generated 5,000 tokens of confident wrong work is not. Watching the run and knowing when to cut in is one of the highest-leverage habits in agentic coding — it's how you keep a near-miss from becoming a full retry.

The paradox that makes it work

Near-one-shotting looks like it should mean shorter prompts. It's the opposite. You spend more tokens up front (a fuller spec, an example, a plan, a stated intent) precisely so you spend far fewer overall. A hundred extra tokens of specification can save thousands of tokens of correction, plus a bug-cleanup pass, plus your attention.

That's the entire thesis of the cost-per-accepted-change lens, applied at the level of a single prompt: optimize for the total cost of getting to a change you accept, not the size of any one message. The cheapest feature is the one you didn't have to ask for three times.

FAQ

Isn't "one-shotting" mostly about how good the model is?

The model sets the ceiling; your prompt decides how close you get to it. The same frontier model will one-shot a well-specified, verifiable task and flail on an ambiguous one. Model choice and prompt method are two dials on the same outcome; for the first dial, see the flagship piece on which model to use.

What's the fastest habit to adopt if I only change one thing?

Ask for a plan before any code. It's a cheap checkpoint that catches the two most expensive failures — wrong file and wrong approach — before you pay for the implementation loop.

How is this different from just writing a longer prompt?

Length isn't the point; verifiability and structure are. A long, rambling prompt with no test target and a buried constraint still produces re-prompts. A tighter prompt that states the intent, points at a test, and names the one hard constraint one-shots more reliably.

Doesn't front-loading a big spec waste tokens if the task turns out simple?

The spec is cheap relative to a single correction loop. Even on a simple task, one avoided re-prompt more than pays for the extra context — and on a non-simple task it's the difference between one pass and four.

How does this connect to saving tokens overall?

Directly. Cutting correction loops is the biggest token lever there is. The Claude Code token-saving guide covers the session-and-context side; this covers the prompt side. Same goal: fewer total tokens to a working change.

Where to go from here

Near-one-shotting isn't about being lucky or terse. It's about removing — before you hit enter — the reasons a re-prompt would happen: the missing context, the ambiguous spec, the wrong target, the missing test. Do that and your first attempts start landing, which is the cheapest outcome there is.

For the strategic layer under all of this — which model, what stack, how to think about the whole tokens-to-revenue ladder — start with Which AI Model Should You Actually Use to Build Software?, then work with me or subscribe to the newsletter for more field notes from building in public.

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified in behavioral economics. Led 100+ in-house experiments at NRG in 2025, with project evidence and limits documented in the case studies.
