
What does it actually cost?

Real prices for content you recognise. An email, a legal contract, an entire novel. Pick your model. Spot the 200K context cliff. Toggle Batch and Cache pricing. Watch costs compound.

No sign-up · runs entirely in your browser

Who are you paying?
Every reply re-reads the entire conversation
6 words / 8 tokens / $0.000024 ↓ prompt
0 words / 0 tokens / $0.00 ↑ reply
total this exchange / 86 tokens / $0.001194
About a tenth of a cent. But at a million messages a day, that is $1,194 daily. Scale is everything, which is what the tool below shows.
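The arithmetic behind that exchange can be sketched as follows. The per-token rates are assumptions inferred from the figures shown, not stated by the page: 8 prompt tokens at $0.000024 implies $3 per million input tokens, and the 78 reply tokens (the 86-token total minus the 8-token prompt) account for the remaining $0.001170, which implies $15 per million output tokens.

```python
# Assumed rates in USD per million tokens, inferred from the example figures.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def exchange_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one prompt plus one reply at the assumed rates."""
    input_cost = input_tokens * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost + output_cost


# 8 prompt tokens plus 78 reply tokens, as in the example exchange.
per_exchange = exchange_cost(input_tokens=8, output_tokens=78)
per_million_messages = per_exchange * 1_000_000

print(f"one exchange: ${per_exchange:.6f}")
print(f"a million messages: ${per_million_messages:,.0f}")
```

Swapping in another model's rates changes only the two constants at the top.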
Email ~300 words / ~400 tokens
Standard: full rate
Batch API: 50% off
Cache hit: 90% off input
Now try it with your own content

Type a word count, paste any text, or pick a content type. Drag the conversation slider. Toggle Batch and Cache. Everything you just saw — but live, with your numbers.

What does it actually cost to run an AI API call?

It depends on the model and how much text goes in and out. TokenScale shows real examples. Writing the whole of The Hobbit costs about $0.06 on Gemini Flash-Lite, the cheapest tier tracked. The same job costs far more on a frontier model. Pick a content size and a provider to see the live figure.

Why does the output price matter more than the input price?

Most providers charge more for the tokens a model generates than for the tokens you send. Output often costs three to five times more than input, so a long answer to a short question can cost far more than the short prompt suggests. TokenScale splits every price into input and output so the gap is visible.
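A minimal sketch of that split, using hypothetical rates of $1 per million input tokens and $4 per million output tokens (a 4× gap chosen for illustration, not any provider's actual price):

```python
def split_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Return (input_cost, output_cost) in USD; rates are USD per million tokens."""
    input_cost = input_tokens * input_rate / 1_000_000
    output_cost = output_tokens * output_rate / 1_000_000
    return input_cost, output_cost


# A short question (20 tokens) that triggers a long answer (1,000 tokens).
in_cost, out_cost = split_cost(20, 1_000, input_rate=1.0, output_rate=4.0)
output_share = out_cost / (in_cost + out_cost)

print(f"input ${in_cost:.6f}, output ${out_cost:.6f}")
print(f"output share of the bill: {output_share:.1%}")
```

With these illustrative numbers almost the entire bill is output, even though the prompt itself looks negligible.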

How much do AI prices differ between providers?

A lot. Across the 21 providers TokenScale tracks, the cheapest and most expensive rates differ by more than thirty times for similar work. Choosing the right provider for a task can cut the bill by an order of magnitude. The comparison table sorts every provider cheapest first.

What is the cheapest way to run a large AI workload?

Three things lower the bill: pick a low-cost provider, use Batch mode (where offered) for fifty percent off, and reuse cached context for up to ninety percent off input. TokenScale lets you toggle Standard, Batch and Cache to see each saving. Open-weight models on inference hosts are usually the cheapest of all.
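The three modes can be compared with a short sketch. It assumes the Batch discount applies to both input and output, that the Cache discount applies only to the cached share of the input, and it uses hypothetical rates of $3 and $15 per million tokens; actual provider rules differ in detail.

```python
def call_cost(input_tokens, output_tokens, input_rate, output_rate,
              mode="standard", cached_fraction=1.0):
    """Cost in USD of one call; rates are USD per million tokens."""
    input_cost = input_tokens * input_rate / 1_000_000
    output_cost = output_tokens * output_rate / 1_000_000
    if mode == "standard":
        return input_cost + output_cost
    if mode == "batch":
        # Assumption: 50% off the whole call.
        return 0.5 * (input_cost + output_cost)
    if mode == "cache":
        # Assumption: the cached share of the input is billed at 10% of full rate.
        cached = input_cost * cached_fraction
        return (input_cost - cached) + 0.1 * cached + output_cost
    raise ValueError(f"unknown mode: {mode}")


# A large prompt (100K tokens) with a short reply (1K tokens).
for mode in ("standard", "batch", "cache"):
    cost = call_cost(100_000, 1_000, 3.0, 15.0, mode=mode)
    print(f"{mode:>8}: ${cost:.4f}")
```

For input-heavy calls like this one, caching saves more than batching because most of the cost sits in the input.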

Is TokenScale free?

Yes. TokenScale is free, with no sign-up and no account. It runs entirely in your browser. Nothing you type is sent anywhere or stored.

How often is the pricing updated?

Every night. An automated check reads each provider's pricing and records it, building a verified price history you can scroll back through. The latest verified date is shown on the site.

Energy is estimated from token count, split ~80% input / 20% output. Output decode uses ~3–5× more energy per token than input prefill (Luccioni et al. 2023). Output-heavy tasks (code generation, long-form writing) may use up to 2× more than shown.
Lite ~0.0001 / 0.0004  ·  Mid ~0.0003 / 0.0012  ·  Pro ~0.0008 / 0.0035 Wh per 1K in / out tokens.
MoE models (DeepSeek, some open-hosted) are adjusted to ~30% of the dense equivalent (DeepSeek-V3 report 2024). The range is an illustrative band, not a confidence interval; it reflects hardware generation, PUE 1.1–1.6, and batch size (IEA 2024). Carbon uses a provider-specific grid mix (IEA 2023): DeepSeek ~580, US providers ~380, global fallback 436 gCO2/kWh. The idea started with watching a kettle boil: one litre takes ~0.1 kWh, enough for 30–300 queries like this.
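The estimate described above can be sketched roughly as follows. The tier rates, the 80/20 split, the ~30% MoE adjustment and the grid figures are taken at face value from the text; the function and dictionary names are hypothetical, and this is not necessarily how the site computes its numbers.

```python
# Wh per 1K tokens as (input, output), per tier, as quoted in the text.
WH_PER_1K = {
    "lite": (0.0001, 0.0004),
    "mid": (0.0003, 0.0012),
    "pro": (0.0008, 0.0035),
}
# Grid carbon intensity in gCO2 per kWh, as quoted in the text.
GRID_G_PER_KWH = {"deepseek": 580, "us": 380, "global": 436}
MOE_FACTOR = 0.30  # MoE models adjusted to ~30% of the dense equivalent


def energy_wh(total_tokens, tier, moe=False):
    """Estimated energy in Wh, splitting tokens 80% input / 20% output."""
    in_rate, out_rate = WH_PER_1K[tier]
    input_tokens = 0.8 * total_tokens
    output_tokens = 0.2 * total_tokens
    wh = input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate
    return wh * MOE_FACTOR if moe else wh


def carbon_g(wh, grid="global"):
    """Estimated emissions in grams of CO2 for a given energy in Wh."""
    return wh / 1000 * GRID_G_PER_KWH[grid]


wh = energy_wh(1_000, "mid")
print(f"{wh:.5f} Wh, {carbon_g(wh):.6f} gCO2 on the global fallback grid")
```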
Content that exceeds the 200K-token limit must be chunked across multiple API calls.
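A rough way to check whether content needs chunking is to estimate tokens from a word count. The sketch below assumes about 4 tokens per 3 words, the ratio implied by the "~300 words / ~400 tokens" email example; real tokenizers vary by model and language, and a real call also needs headroom for instructions and the reply.

```python
# Assumption: ~4 tokens per 3 words, from the email example on this page.
TOKENS_PER_3_WORDS = 4
CONTEXT_LIMIT_TOKENS = 200_000


def estimate_tokens(words: int) -> int:
    """Rough token estimate from a word count, rounded up."""
    return -(-(words * TOKENS_PER_3_WORDS) // 3)


def chunks_needed(tokens: int, limit: int = CONTEXT_LIMIT_TOKENS) -> int:
    """Minimum number of calls if each call carries at most `limit` tokens."""
    return max(1, -(-tokens // limit))


for words in (300, 200_000):
    tokens = estimate_tokens(words)
    print(f"{words:,} words -> ~{tokens:,} tokens -> {chunks_needed(tokens)} call(s)")
```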
THE COMPOUNDING EFFECT
Every reply re-reads the entire conversation
Each time the AI responds, it processes all previous messages from scratch, not just your question. By reply 5, you're paying for 15 context re-reads (1 + 2 + 3 + 4 + 5). By reply 10, context dominates your bill.
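The compounding can be sketched with a simplified model in which reply n re-reads n turns of equal size, which reproduces the count of 15 re-reads by reply 5. The token sizes and the $3 / $15 per million token rates are hypothetical values chosen for illustration.

```python
def cumulative_rereads(replies: int) -> int:
    """Total context re-reads after `replies` replies: 1 + 2 + ... + replies."""
    return replies * (replies + 1) // 2


def conversation_cost(replies, context_tokens_per_turn, output_tokens_per_reply,
                      input_rate, output_rate):
    """Return (context_cost, output_cost) in USD; rates are USD per million tokens."""
    context_cost = 0.0
    output_cost = 0.0
    for reply in range(1, replies + 1):
        # Reply number `reply` re-reads every turn so far.
        context_cost += reply * context_tokens_per_turn * input_rate / 1_000_000
        # The reply itself is written once, at a fixed size.
        output_cost += output_tokens_per_reply * output_rate / 1_000_000
    return context_cost, output_cost


print(cumulative_rereads(5))
for n in (1, 5, 10):
    ctx, out = conversation_cost(n, 1_000, 250, input_rate=3.0, output_rate=15.0)
    print(f"reply {n:>2}: context ${ctx:.4f}, output ${out:.4f}")
```

Context cost grows with the square of the reply count while output cost grows linearly, which is why context overtakes output as the conversation lengthens.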
Chart: replies against cost. Blue = context re-read (grows). Red = output written (fixed).
PRODUCTION REALITY CHECK
Fractions of a cent become real budget at scale
A $0.0001 per-call cost looks free. At 1,000 calls/day it's $3/month — manageable. At 100K calls/day it's $300/month. Set your actual volume and discover where AI fits in your budget.
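The scaling is plain multiplication, sketched here assuming a 30-day month and a 365-day year; the function name is hypothetical.

```python
def budget(per_call_usd: float, calls_per_day: int) -> dict:
    """Project a per-call cost to daily, monthly and yearly spend in USD."""
    per_day = per_call_usd * calls_per_day
    return {
        "per_day": per_day,
        "per_month": per_day * 30,   # assumed 30-day month
        "per_year": per_day * 365,   # assumed 365-day year
    }


for calls in (1_000, 100_000):
    b = budget(0.0001, calls)
    print(f"{calls:>7,} calls/day: ${b['per_month']:,.2f}/month, ${b['per_year']:,.2f}/year")
```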
TokenScale · Learn
Everything you need to understand AI model pricing, providers and how to choose.