What an LLM tends to get wrong about overhead

Overhead is where AI-built cost models go to embarrass themselves. Direct costs are easy, because they more or less announce themselves. Indirect cost is judgement, and judgement is exactly what a model assembled from patterns tends to fake convincingly. This is an informal teardown of the errors we keep seeing, written from the validation chair.

Wrong drivers, chosen because they were nearby

The most common fault. An AI-built model needs to allocate, say, quality assurance cost, and it reaches for headcount or revenue because those columns were in the data and they are the conventional default. Quality assurance is not consumed in proportion to revenue. It is consumed by the products that generate inspections, returns, and rework, which are often the low-revenue, high-fuss ones.

The result looks orderly. Every product gets a quality allocation, the column sums correctly, nothing throws an error. It is also pointed in the wrong direction, and it will flatter exactly the products that are quietly expensive. A human who has costed a factory floor smells this immediately. A pattern-matcher does not. The model has no nose for which products generate trouble, only for which columns correlate, and correlation in a cost ledger is a poor guide to causation. Revenue correlates with almost everything, which is exactly why it is both so tempting and so dangerous as a default driver.
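To make the direction of the error concrete, here is a minimal sketch that splits one quality assurance pool two ways: by revenue, and by a count of quality events. The product names, the figures, and the `qa_events` driver (inspections plus returns plus rework jobs) are all hypothetical, chosen only to show a low-revenue, high-fuss product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Product:
    name: str
    revenue: float
    qa_events: int  # hypothetical driver: inspections + returns + rework jobs


def allocate(pool: float, weights: dict[str, float]) -> dict[str, float]:
    """Split a cost pool in proportion to each product's driver quantity."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("driver total must be positive")
    return {name: pool * w / total for name, w in weights.items()}


# Illustrative data: A sells a lot and causes little trouble; B is the opposite.
products = [
    Product("A", revenue=900_000, qa_events=40),
    Product("B", revenue=100_000, qa_events=160),
]
qa_pool = 200_000.0

by_revenue = allocate(qa_pool, {p.name: p.revenue for p in products})
by_activity = allocate(qa_pool, {p.name: p.qa_events for p in products})

for p in products:
    print(f"{p.name}: revenue driver {by_revenue[p.name]:,.0f}, "
          f"activity driver {by_activity[p.name]:,.0f}")
```

On these made-up figures, revenue hands product A 180,000 of quality cost and product B 20,000; the activity driver reverses that to 40,000 and 160,000. Both columns sum to the same pool, which is why the revenue version passes a casual glance.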

Capacity that quietly goes missing

Time-driven activity-based costing (TDABC) lives and dies on capacity. You estimate the practical capacity of a resource, you charge products only for the capacity they actually consume, and the gap between the two is the cost of unused capacity, which is a finding, not a rounding error. AI-built models routinely spread the full cost of a resource across whatever was produced, which silently buries idle capacity inside product cost.

The effect is corrosive. Unit costs rise in slow periods purely because volume fell, which tells managers the products got more expensive when really the factory got emptier. We see this constantly. The model is not wrong about arithmetic. It is wrong about what unused capacity means, and that distinction is the entire reason TDABC exists. Unused capacity is a management decision wearing a number. It tells you whether you bought too much resource, or whether demand fell, and it belongs in front of the people who can act on it, not smeared invisibly across the cost of every unit that happened to be made that month.
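A minimal sketch of the difference, assuming a single resource measured in minutes: TDABC divides resource cost by practical capacity to get a cost per minute, charges each unit for the minutes it uses, and reports the idle remainder separately. The absorption function shows the failure mode described above. The 50,000 monthly cost, 10,000 practical minutes, and 20 minutes per unit are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class TdabcResult(NamedTuple):
    unit_cost: float    # cost charged to each unit actually produced
    unused_cost: float  # cost of idle capacity, reported on its own line


def tdabc_unit_cost(resource_cost: float, practical_minutes: float,
                    minutes_per_unit: float, units: int) -> TdabcResult:
    """Charge units only for the capacity they consume; report the rest as unused."""
    rate = resource_cost / practical_minutes  # capacity cost rate per minute
    used_minutes = minutes_per_unit * units
    if used_minutes > practical_minutes:
        raise ValueError("demand exceeds practical capacity; revisit the estimate")
    return TdabcResult(
        unit_cost=rate * minutes_per_unit,
        unused_cost=rate * (practical_minutes - used_minutes),
    )


def absorption_unit_cost(resource_cost: float, units: int) -> float:
    """The failure mode: spread the full resource cost over whatever was made."""
    return resource_cost / units


# Hypothetical resource: 50,000 per month, 10,000 practical minutes, 20 min/unit.
for label, units in [("busy month", 450), ("slow month", 300)]:
    r = tdabc_unit_cost(50_000, 10_000, 20, units)
    print(f"{label}: TDABC unit cost {r.unit_cost:.2f}, "
          f"unused capacity {r.unused_cost:,.2f}, "
          f"absorption unit cost {absorption_unit_cost(50_000, units):.2f}")
```

Here the TDABC unit cost stays at 100 in both months while the unused-capacity line grows from 5,000 to 20,000. The absorption unit cost climbs from about 111 to about 167, which is the "products got more expensive" illusion. Note that unit cost times units plus unused cost still equals the 50,000 resource cost, so nothing is lost by showing the idle capacity.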

Allocations that refuse to reconcile

A surprising number of AI-built models do not tie back to the ledger, and nobody has checked. Total allocated cost should equal total actual cost, within a tolerance you can defend. When it does not, you have leakage: cost double-counted, cost dropped, or a pool allocated twice. The model still produces a tidy cost per unit. The tidiness is the trap.

Reconciliation is the cheapest, highest-yield check there is, and it is the one most often skipped because the output already looks finished. If a model cannot reconcile to actuals, every margin it reports is decorative.
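The check itself is a few lines. This sketch assumes the model's output can be flattened into (pool, cost object, amount) rows and that ledger actuals are available per pool; the pool names, amounts, and the 0.5% tolerance are hypothetical. It compares pool by pool rather than only at the grand total, because errors in different pools can offset each other and hide in the total.

```python
from __future__ import annotations


def reconcile(ledger: dict[str, float],
              allocations: list[tuple[str, str, float]],
              tolerance: float = 0.005) -> list[str]:
    """Tie allocated cost back to ledger actuals, pool by pool.

    ledger:      pool -> actual cost posted in the general ledger
    allocations: (pool, cost_object, amount) rows produced by the model
    tolerance:   allowed gap as a fraction of the pool's actual (hypothetical 0.5%)
    Returns a list of problems; an empty list means the model ties out.
    """
    allocated: dict[str, float] = {}
    for pool, _cost_object, amount in allocations:
        allocated[pool] = allocated.get(pool, 0.0) + amount

    problems: list[str] = []
    for pool in sorted(set(ledger) | set(allocated)):
        alloc = allocated.get(pool, 0.0)
        if pool not in ledger:
            problems.append(f"{pool}: {alloc:,.2f} allocated with no ledger balance")
            continue
        gap = alloc - ledger[pool]
        if abs(gap) > tolerance * abs(ledger[pool]):
            kind = ("over-allocated (double count?)" if gap > 0
                    else "under-allocated (cost dropped?)")
            problems.append(f"{pool}: {kind} by {abs(gap):,.2f}")
    return problems


ledger = {"QA": 200_000.0, "Maintenance": 120_000.0, "IT": 80_000.0}
rows = [
    ("QA", "A", 40_000.0), ("QA", "B", 160_000.0),
    ("Maintenance", "A", 60_000.0), ("Maintenance", "B", 60_000.0),
    ("Maintenance", "A", 60_000.0),  # the same allocation posted twice
    ("IT", "A", 50_000.0),           # 30,000 of IT cost never reached a product
]
for problem in reconcile(ledger, rows):
    print(problem)
```

With these illustrative rows the check reports IT as under-allocated by 30,000.00 and Maintenance as over-allocated by 60,000.00, while the per-unit costs the model would print look perfectly normal.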

Confident commentary that is confidently wrong

The newest failure mode, and in some ways the most dangerous. Ask the model to explain its results and it will write fluent, plausible narrative about why product A is more profitable than product B. The prose is good. The prose is also generated to sound right, not checked against the model’s own internals, and it will happily explain a result that the allocation logic does not support.

A reader who trusts the commentary inherits the errors with a layer of confidence painted over them. We have learned to read AI-generated cost commentary the way you would read a cover letter: pleasant, but not evidence. The tell is usually specificity. The commentary asserts a cause with a confidence the underlying numbers never earn, naming a driver of profitability that the model never isolates. Good commentary points back at a figure you can open and check. Generated commentary points at a feeling.

So what do you actually do?

Validate the boring things first, because the boring things are where the money is. Reconcile to the ledger. Open the largest cost pools and interrogate the driver. Check that unused capacity is shown rather than smeared. Then, and only then, read the commentary, against the numbers, not instead of them. This sequence is roughly the spine of the Trust Score, and the dimension-by-dimension version lives at /ai-profitability/trust-score/.
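The middle step, opening the largest pools and interrogating their drivers, is mostly careful reading, but the triage can be mechanical. A small sketch, assuming pools arrive as name mapped to (cost, driver) and treating "revenue" and "headcount" as the convenient defaults worth a second look; the 80% coverage threshold, the suspect list, and the pool data are illustrative choices, not part of the Trust Score.

```python
from __future__ import annotations

# Hypothetical list of conventional default drivers that deserve scrutiny.
SUSPECT_DRIVERS = {"revenue", "headcount"}


def pools_to_review(pools: dict[str, tuple[float, str]],
                    coverage: float = 0.8) -> list[str]:
    """List the largest pools, biggest first, until `coverage` of total overhead
    is included, marking any pool allocated on a suspect default driver."""
    total = sum(cost for cost, _driver in pools.values())
    ranked = sorted(pools.items(), key=lambda kv: kv[1][0], reverse=True)
    review: list[str] = []
    running = 0.0
    for name, (cost, driver) in ranked:
        if running >= coverage * total:
            break  # remaining pools are too small to matter first
        flag = "  <- check causation" if driver in SUSPECT_DRIVERS else ""
        review.append(f"{name}: {cost:,.0f} on {driver}{flag}")
        running += cost
    return review


pools = {
    "QA": (200_000.0, "revenue"),
    "Maintenance": (120_000.0, "machine hours"),
    "IT": (80_000.0, "headcount"),
    "Canteen": (20_000.0, "headcount"),
}
for line in pools_to_review(pools):
    print(line)
```

The flag only says where to start asking questions; a revenue driver can occasionally be defensible, and a causal-sounding driver can still be wrong.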

Takeaway: AI-built overhead allocations fail in a small number of repeatable ways. Wrong driver, missing capacity, no reconciliation, confident narration. None of them are exotic. All of them are catchable by someone who reads the model instead of admiring it.