Resource Usage

Veridi, Pragma, and Praxis all run on top of a large language model (currently Claude, by Anthropic). That model consumes electricity and — depending on where the datacenter is — cooling water. We think readers should be able to weigh those costs against the value the methodologies provide, so this page documents what we can estimate and flags what we can’t.

These figures are derived externally, from public hyperscaler disclosures and the broader AI-energy literature. Anthropic has not published per-token energy or water figures. If they do, expect the numbers below to shift by as much as 1.5–2×.

A more recent empirical baseline

In 2026, researchers at Microsoft (Oviedo et al.) published “Energy use of AI inference, efficiency pathways, and test-time scaling” in Joule (open-access preprint on arXiv). Working from production telemetry rather than extrapolated benchmarks, they report:

  • Typical inference: median ~0.34 Wh per query (interquartile range 0.18–0.67 Wh) for frontier-scale models on H100-class hardware
  • Test-time-scaled inference (15× more tokens per query): median ~4.3 Wh
  • Their core claim: non-production estimates of LLM inference energy overstate true production energy by 4–20×

Conflict-of-interest note. Microsoft has a direct commercial interest in the public perceiving AI resource consumption as low. Microsoft is a major investor in OpenAI, operates the Azure infrastructure that serves a substantial share of frontier-model traffic, and would benefit from regulatory and consumer environments that treat AI energy and water use as overstated. This does not invalidate the methodology — production telemetry is still better evidence than extrapolated benchmarks — but the specific “overstate by 4–20×” framing aligns with the authors’ employer’s commercial incentives and warrants extra scrutiny. Independent replication on Anthropic, AWS, or GCP infrastructure by researchers without a commercial stake in the answer would substantially strengthen the finding.

The figures on this page are non-production estimates — they apply per-MTok ranges from the broader literature to measured Veridi token counts, rather than measuring production hardware directly. By Oviedo et al.’s reasoning, the central estimates below may be high by roughly 4×. With the COI caveat above, the more honest framing is that the true value likely sits somewhere between the page’s current central estimates and the Microsoft figures. We have not yet re-baselined this page, and we present the existing ranges as upper-bound external estimates rather than ground truth.

The paper’s bibliography is also a substantial entry point into the broader AI-inference-energy literature, and is recommended for any reader who wants primary sources beyond what is cited here.


Per million tokens

Metric                            Input (per MTok)    Output (per MTok)
Energy (80% CI)                   40–280 Wh           300–1,800 Wh
Energy (central estimate)         ~80–150 Wh          ~550–1,000 Wh
Cooling water (80% CI)            0.004–0.22 L        0.03–1.4 L
Cooling water (central estimate)  ~0.02–0.06 L        ~0.10–0.40 L

“MTok” = one million tokens. A token is roughly three-quarters of an English word. A typical Veridi Standard-tier assessment produces several thousand output tokens; a Full or Forensic-tier assessment can produce tens of thousands.

Output is markedly more expensive than input because the model performs more compute per generated token than per ingested token. The output:input energy ratio in the literature ranges roughly 5–10×, and our central estimates land at about 6–7×, which lines up with Anthropic’s $3 input / $15 output pricing ratio. That correspondence is a sanity check, not a proof: pricing also reflects margin and demand, not only compute cost.
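
For readers who want to reproduce the sanity check, a minimal sketch using only numbers already on this page (the $3/$15 pricing and the central energy ranges; nothing here is measured):

    # Sanity check: compare the output:input pricing ratio with the ratio of
    # the central energy estimates quoted above.
    price_ratio = 15 / 3                      # $15 output / $3 input = 5.0
    energy_ratio_low = 550 / 80               # low ends of the central ranges, ~6.9
    energy_ratio_high = 1000 / 150            # high ends, ~6.7
    print(price_ratio, round(energy_ratio_low, 1), round(energy_ratio_high, 1))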


What this means per assessment

The figures above are per million tokens, but a single Veridi assessment is a small slice of that. A real production batch of 49 assessments produced the following measured token usage:

Per assessment (49-assessment batch)    Average
Fresh input                             ~23,500 tokens
Output                                  ~2,400 tokens
Cache read                              ~582,700 tokens
Cache write                             ~47,300 tokens
Direct API cost                         ~$0.46 USD

Most of the volume is cache reads. The methodology files — decision trees, source hierarchy, gaming countermeasures, the Institutional Reliability Index — are large, and they are loaded once into Anthropic’s prompt cache and re-used across many queries. Without prompt caching, the same workload would cost roughly 4× more in dollars and energy (~$2.00 per assessment, ~55–100 Wh).
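
A minimal sketch of that dollar arithmetic, assuming the $3 input / $15 output per-MTok pricing cited above and the 10% cache-read / 125% cache-write price ratios described in the next paragraph:

    # Sketch: reproduce the per-assessment dollar figures from the measured
    # token averages. All prices are assumptions taken from this page.
    MTOK = 1_000_000
    PRICE_IN, PRICE_OUT = 3.00, 15.00
    PRICE_CACHE_READ = PRICE_IN * 0.10    # 10% of fresh input
    PRICE_CACHE_WRITE = PRICE_IN * 1.25   # 125% of fresh input

    tokens = {"fresh_input": 23_500, "output": 2_400,
              "cache_read": 582_700, "cache_write": 47_300}

    cached = (tokens["fresh_input"] * PRICE_IN
              + tokens["output"] * PRICE_OUT
              + tokens["cache_read"] * PRICE_CACHE_READ
              + tokens["cache_write"] * PRICE_CACHE_WRITE) / MTOK

    # Counterfactual: every cached token billed as fresh input instead.
    uncached = ((tokens["fresh_input"] + tokens["cache_read"]
                 + tokens["cache_write"]) * PRICE_IN
                + tokens["output"] * PRICE_OUT) / MTOK

    print(f"with caching: ${cached:.2f}, without: ${uncached:.2f}")
    # with caching: $0.46, without: $2.00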

Cache reads consume substantially less compute than fresh input: they are mostly memory retrieval rather than full prefill. We approximate their energy cost at ~10% of fresh input, and cache writes at ~125%, in each case matching the corresponding pricing ratio. Multiplying the measured tokens through the per-MTok ranges gives the table below (a runnable sketch of the arithmetic follows it):

Per assessment    80% confidence interval    Central estimate
Energy            6–44 Wh                    ~13–24 Wh
Cooling water     1–34 mL                    ~3–10 mL
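
The energy row of that table as a runnable sketch. The per-MTok ranges and the 10%/125% cache multipliers are the assumptions stated above; endpoint multiplication is a simplification, since a real 80% interval would combine distributions rather than their endpoints:

    # Sketch: multiply the measured token averages through the per-MTok
    # energy ranges, with cache reads at 10% and cache writes at 125% of
    # the fresh-input rate.
    MTOK = 1_000_000
    tokens = {"fresh_input": 23_500, "output": 2_400,
              "cache_read": 582_700, "cache_write": 47_300}

    def energy_wh(rate_in, rate_out):
        """Per-assessment Wh for one (input, output) per-MTok rate pair."""
        return (tokens["fresh_input"] * rate_in
                + tokens["output"] * rate_out
                + tokens["cache_read"] * rate_in * 0.10
                + tokens["cache_write"] * rate_in * 1.25) / MTOK

    print(energy_wh(80, 550), energy_wh(150, 1000))   # ~12.6, ~23.5 -> ~13–24 Wh
    print(energy_wh(40, 300), energy_wh(280, 1800))   # ~6.4, ~43.8  -> 6–44 Wh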

For rough orientation: 18 Wh (the central midpoint for energy) is about a 5 W LED bulb running for three and a half hours, or roughly 1.5 full smartphone charges. 6 mL of cooling water (the central midpoint) is about a teaspoon — though a query routed to a water-stressed Arizona datacenter could easily push this an order of magnitude higher.

These are best treated as order-of-magnitude figures. The cache-energy assumption alone could shift the energy total by 30–50%, and routing geography can shift the water total by 10×.

Follow-up queries are much cheaper. In the same batch, four follow-up queries (refinements on existing assessments) averaged ~6,000 input tokens, ~330 output tokens, and ~$0.02 in direct API cost — because they reuse already-cached methodology context and don’t re-load the full decision tree.
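
The same pricing arithmetic roughly reproduces the follow-up figure, assuming the ~6,000 input tokens are billed as fresh input and any residual cache-read charge is negligible:

    # Sketch: follow-up query cost from the measured averages.
    followup = (6_000 * 3.00 + 330 * 15.00) / 1_000_000
    print(f"${followup:.3f}")   # ~$0.023, consistent with the observed ~$0.02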


How the water number was derived

Cooling water is a function of two things: how much electricity the model consumes, and how much water the datacenter evaporates to reject the heat that electricity becomes. The second factor is captured by Water Usage Effectiveness (WUE), measured in litres of cooling water per kWh of energy consumed.

Reported on-site WUE for the hyperscalers Anthropic is most likely to use:

Datacenter class                                                   On-site WUE (L/kWh)
Best-in-class new facilities (closed-loop, immersion, cold-plate)  ~0.05–0.20
Typical fleet average                                              ~0.15–0.40
Older air-cooled with evaporative towers                           ~0.5–1.8
AWS global (2023 reported)                                         0.18
Microsoft global fleet                                             ~0.30
Google global fleet (2024)                                         ~0.91

Google’s number is higher because Google leans more heavily on evaporative cooling. Anthropic’s compute mix across AWS, GCP, and other providers is not public, so we use a 0.10–0.80 L/kWh range for effective on-site WUE, with a central estimate of 0.20–0.40 L/kWh. Confidence on that WUE assumption alone is roughly 55%.

The cooling-water figures in the table above are the energy figures multiplied through that WUE range.
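
A sketch of that multiplication, reproducing the per-MTok water table from the energy ranges and the WUE assumptions above (endpoint multiplication again, so treat it as illustrative rather than a proper interval computation):

    # Sketch: per-MTok cooling water = per-MTok energy x WUE.
    def water_l(energy_wh_range, wue_range):
        (e_lo, e_hi), (w_lo, w_hi) = energy_wh_range, wue_range
        return (e_lo / 1000 * w_lo, e_hi / 1000 * w_hi)   # Wh -> kWh -> litres

    CENTRAL_WUE, WIDE_WUE = (0.20, 0.40), (0.10, 0.80)
    print(water_l((80, 150), CENTRAL_WUE))     # ~(0.016, 0.060) -> ~0.02–0.06 L, input central
    print(water_l((550, 1000), CENTRAL_WUE))   # ~(0.11, 0.40)   -> ~0.10–0.40 L, output central
    print(water_l((40, 280), WIDE_WUE))        # ~(0.004, 0.224) -> 0.004–0.22 L, input 80% CI
    print(water_l((300, 1800), WIDE_WUE))      # ~(0.03, 1.44)   -> 0.03–1.4 L, output 80% CI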


Confidence

Claim                     Confidence the interval contains the true value
Energy (80% CI)           ~75%
Cooling water (80% CI)    ~65%

The water interval is less reliable than the energy interval because (a) WUE varies more than energy efficiency does, and (b) Anthropic’s specific datacenter assignments are opaque, so the population we are averaging over is itself uncertain.


Caveats worth knowing

Adaptive thinking tokens count as output. When the model produces internal reasoning before the user-visible answer, those reasoning tokens are billed and metered as output tokens. A query that produces 500 reasoning tokens before 300 visible output tokens is paying the output-energy cost on 800 tokens. This doesn’t change the per-MTok figures, but it means total per-request resource consumption is higher than naïve estimates suggest.
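
A toy illustration of the accounting, using the 500/300 split from this paragraph and the midpoint of the central output-energy range as the assumed per-token rate:

    # Toy accounting: reasoning tokens bill as output, so the effective
    # output-token count per request is visible + reasoning.
    reasoning, visible = 500, 300
    rate = (550 + 1000) / 2 / 1_000_000       # Wh per output token, central midpoint
    naive = visible * rate
    actual = (visible + reasoning) * rate
    print(f"naive {naive:.3f} Wh vs actual {actual:.3f} Wh")   # ~0.23 vs ~0.62 Wh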

Cooling water is highly geographic. A query routed to an Oregon datacenter (cool climate, hydroelectric power, water-efficient cooling) versus an Arizona datacenter (hot, evaporative cooling, water-stressed region) can differ by roughly 10× on the water metric. The ranges above try to span this, but the actual figure for any specific query depends on routing we cannot observe.

The output:input ratio is itself an estimate. If the true ratio for a given model is at the high end of the 5–10× literature range cited above, output figures shift up roughly 15% and input figures shift down roughly 30%. The 80% intervals attempt to absorb this.

These figures are not Anthropic-validated. No first-party disclosure from Anthropic confirms them. The Oviedo et al. baseline above (Microsoft production telemetry) is the closest empirical anchor currently available, and it pulls the central estimate downward by roughly 4× — though see the conflict-of-interest caveat there. If Anthropic itself publishes per-token figures, expect a further shift in either direction; the magnitude depends on how Anthropic’s deployment differs from Microsoft’s. We will update this page when better numbers are available.


What would tighten these estimates

The missing inputs that would let us replace ranges with point estimates are:

  • Anthropic’s actual hardware allocation across H100, H200, Blackwell, and Trainium accelerators
  • The specific parameter count of the model serving each request (Sonnet, Opus, Haiku tiers differ)
  • The geographic distribution of inference traffic across datacenters

None of these are public as of this writing. If they become public, this page will be updated and the version noted in the changelog.


Sources and methodology

The figures above are synthesized from:

  • Oviedo, Kazhamiaka, Choukse, Kim, Luers, Nakagawa, Bianchini, and Lavista Ferres (Microsoft), “Energy use of AI inference, efficiency pathways, and test-time scaling,” Joule (2026) — production-grounded per-query energy figures and a substantial bibliography on AI inference energy. Open-access preprint on arXiv
  • AWS, Google, and Microsoft sustainability reports (2023–2024 reporting years) for fleet-wide WUE
  • Public hyperscaler disclosures on closed-loop, immersion, and cold-plate cooling efficiency
  • The published academic and industry literature on per-token inference energy for transformer LLMs
  • Anthropic’s public pricing as a corroborating signal (not a primary source) on input/output compute ratio

This page intentionally does not cite a single authoritative source, because no single authoritative source exists for the specific question “what does one million tokens of Anthropic inference cost in energy and water.” It is a synthesis with disclosed confidence bounds, in the same spirit as the rest of the methodology.