AI Feature Unit Economics: Cost per Interaction Guide | PM Toolkit

In April 2026, the weighted average price businesses paid for AI tokens was about $0.72 per million, according to Ramp's spend data¹. Seventy-two cents per million sounds like a rounding error until you multiply it through a real feature: thousands of tokens per call, more than one call per task, millions of tasks per month. Every AI feature has a marginal cost per use, and the choices that set it (context size, model tier, retries, agent loops) are product decisions. A PM who ships an AI feature without a cost-per-interaction model is making a pricing and margin decision without knowing it. This article gives you the math.

Where AI features break the SaaS margin model

Classic SaaS economics rest on near-zero marginal cost. Once the product exists, serving the next user costs hosting and a sliver of support, which is how the industry settled into the unit economics covered in our CAC/LTV guide. AI features break the assumption underneath all of it. Each model call is metered, and the meter runs every time the feature runs.

The gap shows up in margins. Pricing analyses from 2026 put AI-first B2B SaaS gross margins in the 50-60% range, against 80-90% for traditional SaaS²³. Andreessen Horowitz flagged the same structural problem in 2020, finding AI companies at 50-60% gross margin while software companies ran 60-80% or better⁴. None of these figures come from audited filings, so treat the bands as directional. The direction itself is hard to argue with: inference cost scales with usage, and usage is the thing you are trying to grow.

The CFO sees the margin compression after the quarter closes. The PM sets it months earlier, because the cost drivers are feature design choices: how much context the prompt carries, which model tier serves the request, how many retries the error handler allows, and whether a task runs as one call or an agent loop of five. Engineering implements those choices, but they get decided (or defaulted) at spec time.

Where the cost lands on the P&L: inference for a live feature is cost of goods sold, not R&D. It compresses gross margin directly, which is the number investors and acquirers anchor on. Drivetrain's CFO guide covers the accounting treatment in detail⁵. For the PM, the practical consequence is simple: AI serving costs belong in the same gross-margin math you already use for LTV.

Build the cost-per-interaction model

One definition first. An interaction is one user-triggered task: summarize this ticket, draft this reply. An interaction may involve several model calls, and that multiplier is where budgets quietly die.

cost per interaction = (input tokens + output tokens) × price per token × calls per interaction
monthly AI COGS      = cost per interaction × interactions per month

Take a fictional feature: a support-ticket summarizer inside a helpdesk product. Whenever an agent opens a ticket, the feature condenses the thread into a five-line summary.

Line item	Value
System prompt and instructions	1,000 tokens
Ticket thread (trimmed)	3,000 tokens
Output summary	300 tokens
Calls per interaction (including retries)	1.15
Tokens per interaction (4,300 per call × 1.15)	~4,950
Blended token price (Ramp weighted average, April 2026)¹	$0.72 per million
Cost per interaction	~$0.0036
Monthly cost at 50,000 tickets	~$178
Monthly cost at 1,000,000 tickets	~$3,560
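
The table can be reproduced in a few lines, which makes it easy to swap in your own numbers. This is a minimal sketch: the function and variable names are ours, and the inputs are the values from the summarizer table.

```python
def cost_per_interaction(input_tokens, output_tokens, price_per_million, calls_per_interaction):
    """Dollar cost of one user-triggered task at a single blended token price."""
    tokens_per_call = input_tokens + output_tokens
    return tokens_per_call * calls_per_interaction * price_per_million / 1_000_000


def monthly_cogs(per_interaction, interactions_per_month):
    """Monthly AI cost of goods sold for one feature."""
    return per_interaction * interactions_per_month


# Values from the summarizer table: 1,000 + 3,000 input tokens, 300 output tokens.
summarizer = cost_per_interaction(
    input_tokens=1_000 + 3_000,
    output_tokens=300,
    price_per_million=0.72,      # blended price, dollars per million tokens
    calls_per_interaction=1.15,  # includes retries
)

print(f"cost per interaction:    ${summarizer:.4f}")
print(f"50,000 tickets/month:    ${monthly_cogs(summarizer, 50_000):,.0f}")
print(f"1,000,000 tickets/month: ${monthly_cogs(summarizer, 1_000_000):,.0f}")
```

A single blended price is a simplification. If your provider prices input and output tokens differently, split the price argument in two.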

Two things to hold onto from this table. First, the blended average smooths over a wide spread: frontier models price at many multiples of it and small models at a fraction, so the model-tier decision moves the whole column. Second, every row above the price is a product decision. Trim the thread harder and the input shrinks. Cap the summary at three lines and the output shrinks. Allow three retries instead of one and the multiplier grows.

Agent loops multiply the multiplier. Rebuild the same summarizer as an agent that reads the ticket, pulls two related tickets, and drafts a reply, and you have four model calls carrying more context each. In our worked example the cost per interaction lands around six times the single-call version. Nothing on the product dashboard changes. The invoice does.
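
To see how a multiple of that size can arise, here is an illustrative sketch. The per-step token counts are assumptions we picked for the illustration, not measurements; the mechanism is that each call re-sends a context that has grown with the previous step's results.

```python
PRICE_PER_MILLION = 0.72  # blended price from the summarizer table
RETRY_MULTIPLIER = 1.15   # same retry overhead as the single-call version

# (input_tokens, output_tokens) per model call. The agent's per-step counts are
# illustrative assumptions: context grows as tool results accumulate.
single_call = [(4_000, 300)]
agent_loop = [
    (4_000, 200),  # read the ticket, decide what to look up
    (5_500, 200),  # context now includes the first related ticket
    (7_000, 200),  # context now includes the second related ticket
    (8_200, 500),  # draft the reply with everything in context
]


def interaction_cost(calls):
    """Dollar cost of one interaction made up of several model calls."""
    total_tokens = sum(inp + out for inp, out in calls)
    return total_tokens * RETRY_MULTIPLIER * PRICE_PER_MILLION / 1_000_000


single = interaction_cost(single_call)
agent = interaction_cost(agent_loop)
print(f"single call: ${single:.4f}")
print(f"agent loop:  ${agent:.4f} ({agent / single:.1f}x)")
```

Four calls produce more than four times the cost because the later calls are the expensive ones.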

What businesses actually spend in 2026

Ramp publishes AI cost data drawn from spend across the businesses on its platform. Three figures from its April 2026 data¹:

Weighted average price paid: about $0.72 per million tokens
Median business AI spend: about $2,246 per month
Mean business AI spend: about $140,842 per month

The median and the mean sit roughly 60x apart, and that gap is the most useful number on the list. Most businesses buy a handful of assistant seats and stop. A small number run inference at production scale and spend orders of magnitude more, which drags the mean far above the median. The distribution has a heavy tail, and shipping a successful AI feature is precisely how a company migrates from the median group into the tail, so budget for tail-level spend rather than today's median. One caveat on the source: Ramp's sample is companies that run spend through Ramp, which skews toward US startups and mid-market firms, so read the levels as directional.

Cheaper tokens will not shrink your bill on their own. Listed per-token prices have trended down since 2023, but teams respond by spending the savings: longer contexts, more retries, heavier agent loops, longer outputs. We have not tied this to a single dataset; the intuition is that usage per task grows at least as fast as prices fall. Model your costs on tokens per task, which you control, not on the hope that next year's models are cheaper.

The five levers

Once the model exists, cost reduction is a design exercise. These are the levers, roughly in order of how often they pay off.

Lever	What it does	What it costs you
Prompt caching	Bills repeated context (system prompt, tool definitions, shared docs) at a cache-read discount instead of full input price	Only works when context is stable across calls; unique-per-call content gets nothing
Model routing	A small, cheap model handles the easy majority of requests; hard cases escalate to a frontier model	An eval set that defines "easy", plus an escalation path
Context trimming	Sends the 3,000 tokens the task needs instead of the 30,000 in the record	Engineering work on retrieval and history summarization
Batching	Non-urgent jobs run through discounted batch endpoints	Latency: results arrive in minutes or hours, not seconds
Output caps	Max-token limits and terse formats cut output tokens, which are priced above input tokens on vendor price lists	Occasional truncation, so you need a retry path
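
Model routing is the easiest lever to put a number on. The sketch below blends two price tiers by the share of traffic the small model can handle. Both prices are hypothetical placeholders rather than vendor quotes, and it assumes requests carry similar token counts on either tier and that the router decides up front, so hard cases do not pay for a failed small-model attempt first.

```python
def routed_price_per_million(easy_share, small_price, frontier_price):
    """Blended price when a router sends `easy_share` of requests to the small model."""
    if not 0.0 <= easy_share <= 1.0:
        raise ValueError("easy_share must be between 0 and 1")
    return easy_share * small_price + (1.0 - easy_share) * frontier_price


# Hypothetical prices in dollars per million tokens, not vendor quotes.
SMALL, FRONTIER = 0.20, 5.00

for share in (0.0, 0.5, 0.8, 0.95):
    blended = routed_price_per_million(share, SMALL, FRONTIER)
    print(f"{share:.0%} routed to small model -> ${blended:.2f} per million tokens")
```

The frontier share dominates the blend, so the quality of the eval set that defines "easy" matters more than the exact small-model price.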

Caching deserves the extra paragraph because its payoff depends entirely on feature shape. Providers bill cache reads at a steep discount to fresh input tokens, but the discounts and minimums differ by vendor and change often, so model against the current pricing page rather than any blog post, including this one⁶. The variable you control is the hit rate. A feature with a long, stable system prompt and shared reference docs can serve most of its input from cache. A feature whose input is mostly unique user content, say a fresh document on every call, caches almost nothing regardless of the discount.
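
A rough way to model that dependence on feature shape is sketched below. The `cache_read_multiplier` value is a placeholder, not any provider's rate, and the sketch ignores cache-write surcharges and minimum cacheable lengths, which some providers apply.

```python
def effective_input_tokens(input_tokens, cacheable_share, hit_rate, cache_read_multiplier):
    """Input tokens for one call, expressed in full-price-equivalent tokens.

    cacheable_share: fraction of the input that is stable across calls.
    hit_rate: fraction of calls on which that stable part is served from cache.
    cache_read_multiplier: cached-token price as a fraction of the full input price.
    """
    cached = input_tokens * cacheable_share * hit_rate
    fresh = input_tokens - cached
    return fresh + cached * cache_read_multiplier


CACHE_READ_MULTIPLIER = 0.25  # placeholder; use the provider's current pricing page

# Long, stable system prompt and shared docs: most of the input is cacheable.
stable_heavy = effective_input_tokens(
    4_000, cacheable_share=0.90, hit_rate=0.95,
    cache_read_multiplier=CACHE_READ_MULTIPLIER)

# A fresh document on every call: almost nothing is cacheable.
mostly_unique = effective_input_tokens(
    4_000, cacheable_share=0.05, hit_rate=0.95,
    cache_read_multiplier=CACHE_READ_MULTIPLIER)

print(f"stable-heavy feature pays  {stable_heavy / 4_000:.0%} of full input cost")
print(f"mostly-unique feature pays {mostly_unique / 4_000:.0%} of full input cost")
```

With these placeholder inputs the stable-heavy feature pays roughly 36% of the uncached input cost and the mostly-unique one roughly 96%, at the same discount.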

Structure prompts for cache hits. Most providers cache a prefix, so order matters: stable content first (system prompt, tool definitions, reference material), variable content last (the user's input). One variable token early in the prompt breaks the cache for everything after it.
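
The ordering effect is easy to demonstrate. The sketch below uses the shared character prefix between two prompts as a stand-in for what a prefix cache could reuse. Real caches work on tokens or token blocks, and the placeholder strings are ours.

```python
from os.path import commonprefix

# Content that is identical on every call (placeholder strings).
STABLE = ["<system prompt>", "<tool definitions>", "<reference docs>"]


def stable_first(user_input):
    """Stable content first, variable content last."""
    return "\n".join(STABLE + [user_input])


def variable_first(user_input):
    """Variable content first: the prompts diverge immediately."""
    return "\n".join([user_input] + STABLE)


def shared_prefix_chars(prompt_a, prompt_b):
    """Length of the leading run of characters two prompts have in common."""
    return len(commonprefix([prompt_a, prompt_b]))


ticket_a = "Printer is offline after the update."
ticket_b = "Login page loops back to itself."

good = shared_prefix_chars(stable_first(ticket_a), stable_first(ticket_b))
bad = shared_prefix_chars(variable_first(ticket_a), variable_first(ticket_b))
print(f"stable-first shares {good} leading characters; variable-first shares {bad}")
```

The same content appears in both layouts, but only the stable-first layout gives a prefix cache anything to reuse.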

Where it lands in LTV and the feature review

Per-interaction costs become per-user costs, and per-user costs change your unit economics. Suppose a drafting assistant costs $0.02 per interaction blended, and the median active user triggers 20 interactions a month. That is $0.40 per active user per month in COGS that did not exist before the feature shipped.

Now run it through the margin math. A product at $50 ARPU with 80% gross margin contributes $40 per user per month. The $0.40 feature drops that to $39.60, margin to 79.2%, barely visible. But a heavier agentic feature at $4 per active user per month drops contribution to $36 and margin to 72%. At 4% monthly churn (a 25-month average lifetime), LTV, which is monthly contribution times average lifetime, falls from $1,000 to $900. If your CAC is $300, the LTV:CAC ratio slides from 3.3 to 3.0, right at the floor of the healthy range in the 2026 SaaS benchmarks. The feature did not touch acquisition, pricing, or churn, and it still moved the ratio.

Plug your own numbers in: set gross margin to the post-AI figure and recompute lifetime value.
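
The margin and LTV arithmetic above fits in a short script. This is a sketch using the simple formula LTV = ARPU × gross margin ÷ monthly churn. The function names are ours, and the inputs are the example figures from this section.

```python
def margin_after_ai(arpu, gross_margin, ai_cost_per_user):
    """Gross margin once per-user AI serving cost is added to COGS."""
    return (arpu * gross_margin - ai_cost_per_user) / arpu


def ltv(arpu, gross_margin, monthly_churn):
    """Lifetime value: monthly contribution times average lifetime (1 / churn)."""
    return arpu * gross_margin / monthly_churn


ARPU, BASE_MARGIN, CHURN, CAC = 50.0, 0.80, 0.04, 300.0

for label, ai_cost in [("no AI feature", 0.0), ("light feature", 0.40), ("agentic feature", 4.00)]:
    margin = margin_after_ai(ARPU, BASE_MARGIN, ai_cost)
    value = ltv(ARPU, margin, CHURN)
    print(f"{label:16s} margin {margin:.1%}  LTV ${value:,.0f}  LTV:CAC {value / CAC:.1f}")
```

Swap in your own ARPU, margin, churn, and CAC to see where your feature pushes the ratio.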

The same math belongs in the build decision. Cost per task should sit next to retention impact in every AI feature review: a feature that costs $0.003 per task and lifts retention clears easily, while one that costs $4 per user per month needs a price increase, a usage cap, or a measured retention lift large enough to cover it. The pricing strategy guide covers the pass-through options (usage-based add-ons, credit packs, tier gating). For the build-versus-run comparison, put the development cost against the ongoing margin impact.

FAQ

Do AI feature costs belong in CAC or COGS? Serving costs for a live feature are COGS: they recur with usage and compress gross margin. AI spend on acquisition, like ad-creative generation or outbound automation, belongs in fully-loaded CAC. Keeping the two separate matters because they hit different sides of the LTV:CAC ratio.

How do I estimate token counts before launch? Run 20 to 30 realistic samples through the target model's tokenizer (every major provider ships one) and take the distribution, not the average. A rough fallback for English text is about four characters per token. Then instrument actual token usage from day one, because production inputs are always longer than your samples.
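
A minimal sketch of that estimate follows. It uses the four-characters-per-token fallback rather than a real tokenizer, and the sample lengths are synthetic stand-ins for real tickets.

```python
import statistics


def rough_token_count(text, chars_per_token=4):
    """Fallback estimate for English text; a real tokenizer will differ."""
    return max(1, round(len(text) / chars_per_token))


def token_distribution(counts):
    """Mean, median, 95th percentile, and max of per-sample token counts."""
    cuts = statistics.quantiles(counts, n=20, method="inclusive")  # 19 cut points
    return {
        "mean": statistics.fmean(counts),
        "p50": statistics.median(counts),
        "p95": cuts[-1],
        "max": max(counts),
    }


# Synthetic stand-ins for 20 to 30 real ticket threads; replace with actual samples.
sample_threads = ["x" * n for n in (2_000, 3_500, 4_000, 6_000, 9_000, 12_000, 40_000)]
counts = [rough_token_count(thread) for thread in sample_threads]
stats = token_distribution(counts)

for name, value in stats.items():
    print(f"{name}: {value:,.0f} tokens")
```

One long thread pulls the mean well above the median, which is why the budget should be sized from the upper percentiles.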

What cost per interaction is too high? There is no universal threshold. Compare monthly AI cost per active user against per-user contribution margin (ARPU times gross margin). Below 1% of contribution, ship it and monitor. Above a few percent, the feature needs a pricing answer before launch, not after.

Will falling token prices fix my margins on their own? Unlikely. Listed prices fall, but tokens per task grow as features adopt bigger contexts and agent loops, so bills tend to hold or rise. Design choices (routing, trimming, caching) cut costs faster than the market does, and they compound with price declines instead of substituting for them.

Should an AI feature be free or paid? Start from cost per active user versus ARPU. A cost under 1% of ARPU can ride along free as retention insurance. A cost above 5-10% of ARPU usually needs its own monetization: a usage allowance, a credit pack, or a higher tier. The pricing strategy guide walks through the structures.
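
The rule of thumb can be written as a quick screen. This is a sketch, not a pricing policy: the function name and labels are ours, and the 5% cutoff takes the low end of the 5-10% band.

```python
def monetization_hint(ai_cost_per_user, arpu, free_below=0.01, paid_above=0.05):
    """Classify a feature by monthly AI cost per active user as a share of ARPU."""
    share = ai_cost_per_user / arpu
    if share < free_below:
        return share, "bundle free"
    if share > paid_above:
        return share, "needs its own monetization"
    return share, "judgment call"


for cost in (0.40, 1.50, 4.00):
    share, hint = monetization_hint(cost, arpu=50.0)
    print(f"${cost:.2f} per user on $50 ARPU = {share:.1%} of ARPU: {hint}")
```

On the $50 ARPU example used earlier, the $0.40 drafting assistant rides along free and the $4 agentic feature needs a pricing answer.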

Sources

1. How much do AI tokens cost businesses? Ramp. Figures as published by Ramp from anonymized spend data on its platform, April 2026: weighted average ~$0.72 per million tokens, median monthly AI spend ~$2,246, mean ~$140,842.
2. The Economics of AI-First B2B SaaS in 2026, Monetizely.
3. AI Is Killing SaaS Margins, Fraction.
4. The New Business of AI (and How It's Different From Traditional Software), Andreessen Horowitz, 2020.
5. Unit Economics of AI SaaS Companies: A CFO's Guide, Drivetrain.
6. Provider cache-read discounts, cache minimums, and batch-tier pricing change frequently and vary by vendor. We have deliberately not quoted per-vendor figures; model your costs against the provider's current pricing page.