What does AI actually cost a company?

More than the API or subscription line suggests. The true cost of an AI outcome includes the tokens consumed, the capacity cost of the GPUs running the model (much of which is often idle), data preparation and integration, the human review needed to trust the output, retries when the model gets it wrong, and governance. Studies of total cost of ownership put the full figure well above the visible API price; the only way to know your own number is to cost one outcome end to end.

Why are AI bills rising if the price per token is falling?

Because cheaper tokens invite far more token use. Reasoning models and agentic systems re-send context and loop through many steps, consuming several times more tokens per task than a simple chatbot call. Token consumption is growing faster than the per-token price is falling, so the total bill climbs even as each token gets cheaper. This is the Jevons effect applied to AI.

How do you calculate the unit cost of AI?

Treat it as an activity. Cost the GPU capacity at its practical-capacity rate: the full cost of the resource divided by the capacity it can realistically deliver. Use the token as the cost driver, with the price per token as its rate. Write the AI-assisted process as a time equation that combines tokens, GPU-seconds and human-review minutes, then attribute the result to the customer, product or process that triggered it. This is Time-Driven Activity-Based Costing applied to AI.

Is our AI profitable?

You can only answer that once you have a unit cost. With the cost of one AI outcome established, you set it against the value that outcome creates and rank customers, products or features from most to least profitable to serve with AI. The result is a whale curve for AI: a profitable core, a flat middle, and a tail where the AI quietly gives margin back. Most organisations have never drawn it.
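As a rough illustration, the ranking behind a whale curve can be sketched in a few lines of Python. The segment names and contribution figures are invented for the example, and `whale_curve` is a hypothetical helper, not part of any product or library. It sorts segments from most to least profitable and accumulates their net contribution.

```python
from itertools import accumulate


def whale_curve(contributions: dict[str, float]) -> list[tuple[str, float]]:
    """Rank segments by net contribution and return the running total.

    `contributions` maps a customer, product or feature to its net
    contribution from AI (value minus unit cost, summed over outcomes).
    """
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    running = accumulate(value for _, value in ranked)
    return [(name, total) for (name, _), total in zip(ranked, running)]


# Hypothetical net contributions per customer segment, in currency units.
segments = {"A": 120.0, "B": 40.0, "C": 0.0, "D": -30.0, "E": -50.0}

curve = whale_curve(segments)
peak = max(total for _, total in curve)
final = curve[-1][1]

for name, total in curve:
    print(f"{name}: cumulative contribution {total:.1f}")
print(f"peak {peak:.1f}, final {final:.1f}, given back {peak - final:.1f}")
```

In this made-up data the curve peaks at 160 after the first two segments, stays flat through the third, and ends at 80: the tail gives back half of what the core earned.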

What is AI FinOps, and is it enough?

AI FinOps, sometimes called tokenomics, is the discipline of metering and attributing AI spend, and it is a genuine step forward in visibility. It is not enough on its own because tag-based showback tells you where cost landed, not why it occurred or how much capacity sat unused. Activity-based costing supplies the missing allocation logic and makes the cost of unused capacity visible, which is what turns spend tracking into profitability management.

Analysis · The cost of AI

The true cost of AI is a unit cost. Almost no one is measuring it.

The cost of AI in a company is not the API bill. It is the full cost of producing one useful AI outcome, once you count inference, the capacity of the GPUs behind it, data and integration, human review, retries and governance. The price per token has fallen roughly tenfold a year, yet enterprise AI bills are rising, because reasoning and agentic models consume far more tokens per task. The companies that will win the next phase are the ones that can answer a deceptively simple question: what does one unit of our AI actually cost, and is it profitable to serve?

The cost of AI, in one line

Cost of one AI outcome = tokens × price per token + GPU-seconds × practical-capacity rate + human-review minutes × loaded cost + retry and governance overhead.
Net contribution = value of the outcome − that cost.
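The one-line formula translates directly into code. The sketch below is a minimal version, assuming all rates are already known; every number in it is hypothetical, and the names `OutcomeUsage`, `Rates`, `outcome_cost` and `net_contribution` are illustrative rather than taken from any tool.

```python
from dataclasses import dataclass


@dataclass
class OutcomeUsage:
    """Resources consumed by one AI outcome (hypothetical figures)."""
    tokens: int
    gpu_seconds: float
    review_minutes: float
    overhead: float  # retry and governance overhead, in currency units


@dataclass
class Rates:
    price_per_token: float         # currency per token
    gpu_rate_per_second: float     # practical-capacity rate per GPU-second
    review_cost_per_minute: float  # loaded cost of a human reviewer


def outcome_cost(usage: OutcomeUsage, rates: Rates) -> float:
    """Cost of one AI outcome, term by term as in the formula above."""
    return (
        usage.tokens * rates.price_per_token
        + usage.gpu_seconds * rates.gpu_rate_per_second
        + usage.review_minutes * rates.review_cost_per_minute
        + usage.overhead
    )


def net_contribution(value: float, cost: float) -> float:
    """Value of the outcome minus the cost of producing it."""
    return value - cost


usage = OutcomeUsage(tokens=40_000, gpu_seconds=12.0, review_minutes=2.0, overhead=0.05)
rates = Rates(price_per_token=0.000002, gpu_rate_per_second=0.002, review_cost_per_minute=0.75)

cost = outcome_cost(usage, rates)
print(f"cost of one outcome: {cost:.3f}")
print(f"net contribution:    {net_contribution(2.50, cost):.3f}")
```

With these invented inputs the token charge is the smallest term and human review the largest, which is the point of costing the outcome end to end rather than reading the API bill.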

Where the field is, in 2025-2026

95% of organisations report no measurable P&L impact from generative AI, six months after the pilot (MIT NANDA, 2025).

~10× cheaper per token, per year, at constant quality, yet total AI bills keep rising as tasks get more token-hungry (a16z LLMflation, 2024).

>$3 forecast cost per AI resolution in customer service by 2030, above many offshore human agents (Gartner, 2026).

Figures are attributed to their source and reflect the state of reporting in 2025-2026. The MIT figure measures organisations with no measurable P&L impact, not a technical failure rate; it is preliminary research and should be cited as such.

There is a contradiction at the centre of the AI conversation. Vendor-sponsored studies report several dollars of return for every dollar spent, while independent research finds most organisations cannot point to a bottom-line effect at all. Both can be true at once, because AI return is widely claimed and rarely measured. When no one has costed a single AI outcome, no one can say whether it earns its keep. That gap is not a technology problem. It is a cost accounting problem, and it is the one we exist to solve.

Why the bill rises while the token gets cheaper

The single most misunderstood fact about AI cost is that cheaper tokens do not mean lower bills. The price to generate a token of a given quality has collapsed, a trend a16z named LLMflation. But cheaper tokens invite far more token-hungry use. A simple linear workflow from 2023 might cost a few cents per interaction; an orchestrated agentic system in 2026, with tools, reasoning loops and re-sent context, can cost over a dollar for the same interaction, roughly thirty times more by one EY estimate. This is the classic Jevons effect: when a resource gets cheaper per unit, total consumption can grow faster than the price falls. The result is that flat-priced AI features quietly turn loss-making for heavy users, which is why coding-tool vendors spent 2025 repricing away from flat plans.
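The arithmetic is worth seeing once. The sketch below uses made-up numbers, not the EY or a16z figures: a tenfold fall in price per token combined with a fiftyfold rise in tokens per task. `bill_multiplier` is a hypothetical helper.

```python
def bill_multiplier(price_drop_factor: float, token_growth_factor: float) -> float:
    """Ratio of the new cost per task to the old one.

    The price per token is divided by `price_drop_factor` while tokens
    per task are multiplied by `token_growth_factor`.
    """
    return token_growth_factor / price_drop_factor


# Hypothetical figures for one task, before and after.
old_price = 10.0 / 1_000_000   # currency per token
old_tokens = 2_000             # a simple linear workflow
new_price = old_price / 10     # the token is ten times cheaper
new_tokens = old_tokens * 50   # agentic loops and re-sent context

old_bill = old_price * old_tokens
new_bill = new_price * new_tokens

print(f"old cost per task: {old_bill:.2f}")
print(f"new cost per task: {new_bill:.2f}")
print(f"multiplier: {bill_multiplier(10, 50):.1f}x")
```

Under these assumptions each token costs a tenth of what it did, and the task costs five times more.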

Figure (illustrative): the cost of AI is an iceberg. The API price per token is the visible tip. Below the surface sit the capacity cost of the GPUs (most of which often runs idle), data preparation and integration, the human review that validates AI output, retries, and governance. A standard P&L shows none of these as the cost of AI.

The method: treat AI like any other capacity

Cost accounting has already solved a problem that looks exactly like AI cost. A GPU you rent by the hour but use a fraction of the time is the same shape as a machine or a team you pay for whether or not it is busy. Time-Driven Activity-Based Costing (TDABC), the method developed by Kaplan and Anderson, costs a resource at its practical-capacity rate: the full cost of supplying the resource divided by the capacity it can realistically deliver, with the unused portion made visible rather than buried in an inflated rate. Industry data puts average enterprise GPU utilisation in the single digits, which means most of the GPU bill is the cost of unused capacity, a line item TDABC was built to surface.
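A small worked example makes the practical-capacity rate concrete. Everything here is an assumption for illustration: one GPU rented at 2.00 per hour for a 30-day month, practical capacity set at 80% of theoretical hours, and 207,360 GPU-seconds actually consumed.

```python
def practical_capacity_rate(period_cost: float, practical_seconds: float) -> float:
    """Full cost of supplying the resource divided by its practical capacity."""
    return period_cost / practical_seconds


# Hypothetical inputs for one rented GPU over a 30-day month.
hours_in_period = 30 * 24
period_cost = 2.00 * hours_in_period         # paid whether or not the GPU is busy
theoretical_seconds = hours_in_period * 3600
practical_seconds = theoretical_seconds * 0.80  # assumed practical share
used_seconds = 207_360                       # GPU-seconds actually consumed

rate = practical_capacity_rate(period_cost, practical_seconds)
used_cost = used_seconds * rate              # cost assigned to AI outcomes
unused_cost = period_cost - used_cost        # reported as its own line

print(f"rate per GPU-second:     {rate:.6f}")
print(f"cost of used capacity:   {used_cost:.2f}")
print(f"cost of unused capacity: {unused_cost:.2f}")
```

With these figures roughly 144 of the 1,440 spent is assigned to work done and roughly 1,296 is reported as unused capacity. Dividing the bill by the seconds actually used instead would load the whole 1,440 onto the outcomes and make each one look ten times dearer.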

From there the rest follows. The token becomes the cost driver, and the price per token its driver rate. An AI-assisted process becomes an activity with a short time equation that mixes units: so many tokens, so many GPU-seconds, so many minutes of human review. Roll those costs up and you can attribute AI cost to a process, a product, a customer or a use case, exactly as activity-based costing has attributed overhead for thirty years. The FinOps world is rediscovering this under new names, tokenomics and showback, but tagging tells you where the cost landed, not why it occurred or what capacity sat idle. That rigour is what classic cost accounting adds.
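The roll-up can be sketched as follows, assuming each AI-assisted activity is logged with its driver quantities and the customer and process that triggered it. The rates, the event log and the helper names `activity_cost` and `cost_by` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical driver rates, in currency units per driver unit.
RATES = {"tokens": 0.000002, "gpu_seconds": 0.002, "review_minutes": 0.75}

# Hypothetical log: one row per AI-assisted activity.
events = [
    {"customer": "A", "process": "triage", "tokens": 5_000, "gpu_seconds": 2.0, "review_minutes": 0.0},
    {"customer": "A", "process": "drafting", "tokens": 10_000, "gpu_seconds": 4.0, "review_minutes": 1.0},
    {"customer": "B", "process": "drafting", "tokens": 200_000, "gpu_seconds": 60.0, "review_minutes": 4.0},
]


def activity_cost(event: dict, rates: dict = RATES) -> float:
    """Time equation: sum of driver quantity times driver rate."""
    return sum(event[driver] * rate for driver, rate in rates.items())


def cost_by(events: list, key: str) -> dict:
    """Attribute activity costs to a customer, product or process."""
    totals = defaultdict(float)
    for event in events:
        totals[event[key]] += activity_cost(event)
    return dict(totals)


by_customer = cost_by(events, "customer")
by_process = cost_by(events, "process")

for name, total in by_customer.items():
    print(f"customer {name}: {total:.3f}")
for name, total in by_process.items():
    print(f"process {name}: {total:.3f}")
```

The same log answers both questions: which customer the cost belongs to, and which process drove it. A tag on the invoice answers only the first.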

Showback tells you where the AI cost landed. Activity-based costing tells you why it occurred, and which of it you are paying for without using.


Keep exploring

Find the real unit cost of your AI.

The Profit Check shows where your cost to serve, AI included, is hiding, in five minutes, with no data upload.

