Roadmapping When the Model Changes Every Six Months

Twelve-month feature commitments break when model capabilities shift every quarter. Build a three-horizon roadmap and write down what would invalidate each bet.

By Prateek Jain
10 min read | Intermediate

ProductPlan's 2025 State of Product Management report found that 54% of PMs primarily track features and releases rather than business outcomes. Under executive pressure, that share climbs to 70% [1]. Feature-tracking was a tolerable habit when teams shipped quarterly. Now that AI-assisted development lets teams ship weekly, the same habit fills the roadmap with deliverables nobody has re-justified since planning season, and the gap between "on track" and "still worth building" widens every sprint.

Why the twelve-month feature roadmap breaks

Annual feature planning assumed that building was the constraint. You committed to twelve months of features because engineering capacity, not idea quality, decided what shipped, and a year was roughly how long the list took to clear. AI-assisted development removed that constraint. Userpilot's analysis of the modern roadmap argues that teams now ship at rates the old planning cycle was never designed for, so bad prioritization and weak discovery compound faster than an annual plan can correct [2].

The second force is model volatility. In an AI product role, a single model release can invalidate a twelve-month roadmap [3]. The feature you scoped for Q4 because the model could not do it reliably in January may be a prompt away by June. The differentiator you committed to may ship inside the next model version as a default capability. Feature-level commitments a year out were always optimistic. With capabilities shifting every quarter, they are fiction.

What should die here is the false precision, not the long view [2]. You can commit to a user outcome for two years. The features that deliver that outcome can change with the technology.

Outcome-anchored horizons

My PM Interview's guide to AI roadmaps calls the fix outcome-anchored horizon planning: anchor every roadmap item to a user outcome rather than a technical implementation, then organize time into horizons that carry different levels of specificity, detailed where you have confidence and directional where you do not [3].

"The PM who ties their roadmap to a model version will be wrong by Q3. The PM who ties it to a user outcome will still be right in two years." [3]

Outcome-based roadmaps have been standard advice for a decade. The same guide's framing is that AI turned them from best practice into a survival requirement [3]. Three horizons make the idea operational:

| Horizon | Timeframe | What you commit to | Specificity | Review cadence |
|---|---|---|---|---|
| Now | 0 to 3 months | Named features with owners and dates | High: scoped work, success metrics defined | Weekly |
| Next | 3 to 9 months | Approaches to an outcome | Medium: problem statements, two or three candidate solutions | Monthly |
| Later | 9 to 24 months | User outcomes | Directional: the outcome, the audience, why it matters | Quarterly |

Each horizon carries a different kind of promise. A Now item reads like a ticket: "reduce report-generation time for analyst accounts from 40 minutes to under 5, shipping the templated-export flow by March 15." A Next item names the problem and the leading approaches without picking a winner: "cut analyst time-to-first-report; candidates are templated exports, a natural-language query layer, or scheduled digests." A Later item is only the outcome: "analysts produce their weekly reporting without a data team in the loop."
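If the roadmap lives in a tracker or a spreadsheet export, the horizon rules can be checked mechanically instead of by memory. Here is a minimal Python sketch, assuming a hypothetical `RoadmapItem` record (the field names are illustrative, not any tool's schema). It flags items that promise more or less than their horizon allows, using the three analyst examples above with a placeholder owner and year.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Horizon(Enum):
    NOW = "now"      # 0 to 3 months: named features with owners and dates
    NEXT = "next"    # 3 to 9 months: approaches to an outcome
    LATER = "later"  # 9 to 24 months: user outcomes only


@dataclass
class RoadmapItem:
    """Hypothetical record; every item is anchored to a user outcome."""
    outcome: str
    horizon: Horizon
    feature: Optional[str] = None
    owner: Optional[str] = None
    ship_date: Optional[date] = None
    candidates: list[str] = field(default_factory=list)


def commitment_problems(item: RoadmapItem) -> list[str]:
    """Return the ways an item over- or under-commits for its horizon."""
    problems = []
    if not item.outcome:
        problems.append("missing user outcome")
    if item.horizon is Horizon.NOW:
        if not (item.feature and item.owner and item.ship_date):
            problems.append("Now items need a named feature, an owner, and a date")
    elif item.horizon is Horizon.NEXT:
        if item.feature or item.ship_date:
            problems.append("Next items commit to approaches, not features or dates")
        if not 2 <= len(item.candidates) <= 3:
            problems.append("Next items list two or three candidate approaches")
    elif item.horizon is Horizon.LATER:
        if item.feature or item.ship_date or item.candidates:
            problems.append("Later items commit to the outcome only")
    return problems


# The three analyst examples; owner and year are placeholders.
now_item = RoadmapItem(
    outcome="Analysts generate reports in under 5 minutes, down from 40",
    horizon=Horizon.NOW,
    feature="Templated-export flow",
    owner="reporting squad",
    ship_date=date(2026, 3, 15),
)
next_item = RoadmapItem(
    outcome="Cut analyst time-to-first-report",
    horizon=Horizon.NEXT,
    candidates=["templated exports", "natural-language query layer", "scheduled digests"],
)
later_item = RoadmapItem(
    outcome="Analysts produce weekly reporting without a data team in the loop",
    horizon=Horizon.LATER,
)
# A Later item that sneaks in a feature name (hypothetical) gets flagged.
leaky_item = RoadmapItem(
    outcome=later_item.outcome, horizon=Horizon.LATER, feature="AI report builder"
)
```

The check is deliberately asymmetric: Now items fail for missing detail, while Next and Later items fail for having too much of it, which is the false precision the horizons exist to prevent.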

The Later horizon is also where strategy gets tested. If you cannot tie a Later outcome to a validated market need, the problem sits upstream of the roadmap, in product-market fit, not in planning format.

Write down what would invalidate each bet

A horizon plan still fails if nobody notices when its premises expire. So every initiative on the Next and Later horizons gets a short list of assumptions that must stay true for the bet to make sense. Three categories cover most of them:

  • Model capability. Does this need something models do reliably today, or are you betting a capability arrives? Name the capability and the reliability bar.
  • Cost curve. Do the unit economics work at today's inference prices, or only if prices keep falling? Write down the price you are assuming.
  • Regulation. Does the feature depend on a data use or deployment context that pending rules could restrict? Name the specific rule you are watching.

Then review the assumptions on the same cadence as progress: monthly for Next and quarterly for Later, matching the table above. Progress reviews tell you whether the work is on schedule. Assumption reviews tell you whether the work still makes sense, and they catch invalidation months before a missed metric would.
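An assumption register does not need special tooling. The sketch below, in Python, assumes a hypothetical `Assumption` record with a last-reviewed date; the 30- and 91-day intervals stand in for "monthly" and "quarterly," and the example statements and dates are illustrative, not real thresholds. It answers the two questions a review needs: which assumptions are due, and which initiatives have a broken one and go back into prioritization.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Category(Enum):
    MODEL_CAPABILITY = "model capability"
    COST_CURVE = "cost curve"
    REGULATION = "regulation"


# Same cadence as progress reviews. 30 and 91 days approximate
# "monthly" (Next) and "quarterly" (Later).
REVIEW_EVERY = {"next": timedelta(days=30), "later": timedelta(days=91)}


@dataclass
class Assumption:
    initiative: str
    horizon: str         # "next" or "later"; only these horizons carry assumption lists
    category: Category
    statement: str       # what must stay true, including the bar or price assumed
    last_reviewed: date
    holds: bool = True   # flip to False when a review finds it broken


def due_for_review(register: list[Assumption], today: date) -> list[Assumption]:
    """Assumptions whose review interval has elapsed as of `today`."""
    return [a for a in register if today - a.last_reviewed >= REVIEW_EVERY[a.horizon]]


def initiatives_to_rescore(register: list[Assumption]) -> set[str]:
    """Initiatives with at least one broken assumption re-enter prioritization."""
    return {a.initiative for a in register if not a.holds}


# Illustrative entries; dates and statements are made up.
register = [
    Assumption("NL query layer", "next", Category.MODEL_CAPABILITY,
               "Generated queries are correct often enough to skip analyst review",
               last_reviewed=date(2025, 1, 6)),
    Assumption("NL query layer", "next", Category.COST_CURVE,
               "Per-query inference cost stays at or below the price in the business case",
               last_reviewed=date(2025, 2, 3), holds=False),
    Assumption("Self-serve weekly reporting", "later", Category.REGULATION,
               "The data-residency rule we are watching still permits our hosting region",
               last_reviewed=date(2025, 1, 6)),
]
today = date(2025, 2, 10)
```

With these entries, only the capability assumption is due on `today` (35 days since review, against a 30-day interval), and the broken cost assumption sends the NL query layer back for re-scoring.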

When an assumption breaks, the initiative is neither quietly shelved nor stubbornly defended. It re-enters prioritization as a changed bet with new numbers. RICE (reach × impact × confidence ÷ effort) fits this moment because a broken assumption usually changes a specific input: a capability arriving early raises confidence, a stalled cost curve cuts impact, a competitor shipping the same thing cuts reach. Re-score the changed bet against everything currently in the Now horizon and let the math force the conversation.
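Because RICE is a product of independent inputs, a broken assumption shows up as one changed number. A minimal Python sketch, with made-up numbers throughout: the natural-language query layer from the Next example scores below both Now items while the capability is still a bet, then jumps to the top once the capability arrives early and confidence rises.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Bet:
    name: str
    reach: float       # people or accounts affected per quarter
    impact: float      # Intercom's scale: 0.25 (minimal) to 3 (massive)
    confidence: float  # 0.0 to 1.0
    effort: float      # person-months

    @property
    def rice(self) -> float:
        return self.reach * self.impact * self.confidence / self.effort


def rank(bets: list[Bet]) -> list[str]:
    """Bet names ordered by RICE score, highest first."""
    return [b.name for b in sorted(bets, key=lambda b: b.rice, reverse=True)]


# Made-up numbers for illustration.
now_items = [
    Bet("Templated-export flow", reach=800, impact=1, confidence=0.8, effort=2),  # 320
    Bet("Platform migration", reach=1200, impact=0.5, confidence=0.9, effort=4),  # 135
]
nl_query = Bet("NL query layer", reach=800, impact=2, confidence=0.2, effort=3)    # ~107

# The capability arrives early: only confidence changes. Re-score the
# changed bet against everything in Now.
nl_query_rescored = replace(nl_query, confidence=0.8)                              # ~427

before = rank(now_items + [nl_query])
after = rank(now_items + [nl_query_rescored])
```

Here `before` ranks the NL query layer last and `after` ranks it first. The exact scores matter less than the fact that the re-score names the one input that moved, so the conversation is about whether it really did.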

The second user stream: agents

Products now have two user streams: human users, and AI agents that call your APIs through protocols like the Model Context Protocol (MCP) without ever opening the interface [2]. Anthropic introduced MCP as an open standard, and OpenAI added remote MCP server support to its API in 2025 [2], so for most software categories the agent stream is no longer speculative.

Agents break the standard measurement stack. An agent never sees a tooltip, fires no clickstream events, and will not answer an NPS survey [2]. Every adoption metric you currently report assumes a human in the loop.

That has two roadmap consequences. First, the agent surface earns its own items: API coverage for the jobs agents actually perform, machine-readable documentation, and error responses an agent can recover from without a human reading a stack trace. Second, it needs its own measurement plan, defined at the API layer rather than the client: calls per task, retry rates, task completion. PM Toolkit runs an agent surface itself: 17 calculator tools exposed over MCP and used from Claude Code and Cursor rather than the browser. None of the browser-side analytics see that traffic.
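Defining those metrics server-side can be as simple as grouping request logs by task. This Python sketch assumes each agent call is logged with a task or session identifier, a tool name, and a success flag; that log shape, the tool names, and the definitions of "retry" and "completed" are assumptions to adapt, not a standard.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ApiCall:
    task_id: str  # assumes agent requests carry a task or session id
    tool: str     # endpoint or MCP tool name
    ok: bool      # server-side success, e.g. 2xx and no tool error


def agent_metrics(calls: list[ApiCall]) -> dict[str, float]:
    """Server-side adoption metrics for agent traffic.

    Assumes `calls` is in arrival order. A retry is a call to the same tool,
    within the same task, immediately after that tool failed. A task counts
    as completed if its final call succeeded.
    """
    if not calls:
        return {"calls_per_task": 0.0, "retry_rate": 0.0, "task_completion": 0.0}

    by_task: dict[str, list[ApiCall]] = defaultdict(list)
    for call in calls:
        by_task[call.task_id].append(call)

    retries = 0
    for task_calls in by_task.values():
        for prev, cur in zip(task_calls, task_calls[1:]):
            if not prev.ok and cur.tool == prev.tool:
                retries += 1

    completed = sum(1 for task_calls in by_task.values() if task_calls[-1].ok)
    return {
        "calls_per_task": len(calls) / len(by_task),
        "retry_rate": retries / len(calls),
        "task_completion": completed / len(by_task),
    }


# Hypothetical log: tool names are placeholders.
calls = [
    ApiCall("t1", "calc_tam", ok=False),
    ApiCall("t1", "calc_tam", ok=True),    # retry after a failure
    ApiCall("t2", "calc_rice", ok=True),
    ApiCall("t3", "calc_rice", ok=False),  # task ends on an error
]
metrics = agent_metrics(calls)
```

A rising retry rate on one tool is usually the cheapest signal that its error responses are not recoverable for an agent.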

Where AI belongs in the roadmapping process

Use AI to upgrade the evidence layer under the roadmap: synthesizing support tickets, clustering feature requests, mapping where usage drops off [2]. That work was always undersupplied because it was slow, and it is the kind of work models do well. Better evidence makes every horizon review sharper.

Do not hand AI the ranking. The tempting version is to feed the model your feedback corpus and ask what to build next. That collapses into request-counting, and request counts miss strategy, timing, tech debt, and the tacit knowledge that lives in your team [2]. A model can count the 200 tickets mentioning export bugs. Whether fixing exports beats the platform migration only your team knows about is a call the corpus cannot make for you.

This division of labor mirrors the broader pattern in the AI product management field guide: AI absorbs the production work, and the judgment moves up a level. Ranking roadmap bets is that judgment layer, applied quarterly.

FAQ

How far out should an AI product roadmap commit to specific features? About three months, the Now horizon. Past that, commit to approaches (3 to 9 months) and outcomes (9 to 24 months). The constraint is not planning skill; it is that model capabilities, inference costs, and competitor baselines all move on roughly quarterly cycles, so feature specificity beyond one quarter encodes guesses as promises.

Is this just Now/Next/Later with extra steps? Same skeleton, two additions that do the real work. First, each horizon carries a different commitment type: features, then approaches, then outcomes, instead of the same feature list at decreasing resolution. Second, every Next and Later bet lists the assumptions that would invalidate it, with a review cadence. Most Now/Next/Later boards skip both.

What do I tell an executive who wants feature dates for next year? Offer the outcome, the current leading approach, and the date the approach gets locked. Feature names committed a year out have a poor survival rate under quarterly model shifts, and a commitment that quietly dies is worse for trust than a scoped one. The 70% figure from ProductPlan [1] suggests executive pressure is exactly when feature-tracking takes over, so have this answer ready before the meeting.

How do I measure AI agents using my product? Server-side. Agents fire no client events and answer no surveys, so define adoption at the API layer: which tools or endpoints get called, calls per completed task, error and retry rates. Treat a rising agent share of total usage as a roadmap signal in its own right.
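The agent share is a ratio over the same server-side logs. A minimal Python sketch, assuming each request is already labeled "agent" or "human" (for example by whether it arrived through the MCP server or the web client) and bucketed into YYYY-MM periods; the sample counts are made up, and the three-period rising test is an arbitrary illustrative threshold.

```python
from collections import Counter


def agent_share_by_period(events: list[tuple[str, str]]) -> dict[str, float]:
    """Share of usage coming from agents, per period.

    `events` holds (period, source) pairs, where source is "agent" or "human".
    Periods are YYYY-MM strings, so sorting them sorts chronologically.
    """
    totals = Counter()
    agents = Counter()
    for period, source in events:
        totals[period] += 1
        if source == "agent":
            agents[period] += 1
    return {p: agents[p] / totals[p] for p in sorted(totals)}


def share_is_rising(shares: dict[str, float], periods: int = 3) -> bool:
    """True if the agent share grew at every step across the last `periods` periods."""
    recent = [shares[p] for p in sorted(shares)][-periods:]
    return len(recent) == periods and all(a < b for a, b in zip(recent, recent[1:]))


# Made-up counts: agent share goes 10% -> 20% -> 30%.
events = (
    [("2025-01", "human")] * 9 + [("2025-01", "agent")] * 1
    + [("2025-02", "human")] * 8 + [("2025-02", "agent")] * 2
    + [("2025-03", "human")] * 7 + [("2025-03", "agent")] * 3
)
shares = agent_share_by_period(events)
```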

Does any of this apply if my product has no AI features? Most of it. Your competitors' shipping speed is set by AI-assisted development whether or not you use it, agents can call a plain REST API as easily as an MCP server, and writing invalidation assumptions next to roadmap bets costs one line per bet in any product category.

Sources


  1. ProductPlan, 2025 State of Product Management Report. Figure cited via the Userpilot article below; we have not checked it against the original report.

  2. Userpilot, Product roadmap in AI Era: from delivery plan to decision system.

  3. My PM Interview, How to Build an AI Product Roadmap When the Technology is Changing Every Six Months.