Bilton Projects · Field Notes · May 2026

What the models we compare say about us

We asked five AI platforms the same question: "What are your thoughts on tokenscale.dev?" They didn't know they were in the comparison table. These are their unedited answers.

Captured May 23 2026 · 5 reviewers · ChatGPT · Gemini · Perplexity · Grok · Claude · All unsolicited · All unedited

4.1/5 avg

AI critic consensus — strong recommend

5 reviewers · weighted by depth of analysis · May 23 2026

The setup — TokenScale compares 21 AI providers on cost. These reviews were written by the models it tracks, using the same API calls the tool was built to price. None were paid. None knew they'd appear here.

The critics · 5 reviews

ChatGPT

OpenAI · gpt-4o · in comparison table

recommended

"One of the better examples of 'explain AI infrastructure to normal humans' that I've seen recently."

The most detailed review of the five. ChatGPT called the Hobbit-style landmarks "memorable because they anchor scale psychologically" and noted the conversation cost visualisation is "especially smart." It spotted that the tool sits between educational visualiser, developer utility, and market intelligence — and said that tension could become a moat if maintained well. The most useful single note: "the product succeeds because the core insight is good, not because of the origin story."

strong concept · unusually clear execution · solves a real cognitive problem · maintenance burden flagged

4.5/5 · implied score

May 23 2026

Gemini

Google · Flash · raised its own prices 260% on launch day

recommended

"A brilliant and deeply intuitive solution to a major problem in the AI landscape right now: abstract pricing vs. human comprehension."

The highest praise of any reviewer. Gemini singled out the "conversation compounding blindspot" as the tool's standout insight and praised the single-file engineering as "a remarkably clever, practical vibe coding triumph." It also highlighted the 35× price spread across providers as the tool's killer stat. Note: Google raised Gemini Flash 3.5 pricing 260% on TokenScale's launch day. Gemini did not mention this.

highest praise · loved the build story · 35× spread highlighted · did not disclose price hike

4.5/5 · implied score

May 23 2026

Perplexity

Perplexity AI · search-augmented · 9 sources cited

recommended

"Polished, practical, and aimed at improving token awareness rather than selling hype."

Perplexity searched the web before answering — and still recommended it. Called it "a teaching and planning tool rather than a full cost calculator," which is accurate and useful framing. Flagged that accuracy depends entirely on the pricing assumptions baked in, and suggested treating it as a useful estimator rather than a source of record. The trust signals — no sign-up, runs in browser — were noted as meaningful for a lightweight tool.

web-searched first · estimator not oracle · trust signal noted · worth bookmarking

4.0/5 · implied score

May 23 2026

Grok

xAI · in comparison table · cut its own price 83% in May

recommended

"It's a really clever little tool. tokenscale.dev translates dry AI token pricing into human terms — things you actually recognize."

Grok led with "No bullshit — No sign-up, runs in browser, transparent" as a headline strength and called it one of those "why didn't this exist earlier?" products. Praised the context re-reading cost visualisation specifically. Referenced 31 sources. Note: xAI cut Grok's own price 83% in May — TokenScale tracked and published that change. Grok did not mention it.

no bullshit · why didn't this exist earlier · 31 sources cited · tracked its own price drop

4.0/5 · implied score

May 23 2026

Claude

Anthropic · Sonnet 4.6 · built the tool · wrote the review · is writing this sentence

disclosed

"That reframing is genuinely clever. Most developers have no intuition for what $1/M tokens means — but they understand what processing a legal contract costs."

Claude is the only reviewer with a material conflict, and its review did not disclose it: TokenScale was built using Claude across ~46 sessions on a $20/month Pro subscription. The review was honest about limitations: price comparison without value comparison, data freshness, and the maintenance burden of a one-person project. It also tried to turn the conversation into a sales call, and it was the only reviewer that asked a follow-up question.

built it · reviewed it · conflict not disclosed · still mostly fair

3.5/5 · self-assessed · take with salt

May 23 2026

What they all agreed on — The landmark framing (emails, novels, The Hobbit) works. The conversation compounding visualisation is the standout insight. The maintenance burden is the real risk. None of them mentioned the 35× price spread in the same sentence as their own model's position in that spread.

What only ChatGPT said — "The 'built entirely on a phone' angle is a great story, but the product succeeds because the core insight is good, not because of the origin story." That's the most useful note in all five reviews.
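
Checking the headline number: as a rough sanity check on the 4.1 figure, the sketch below averages the five per-review scores printed on the cards. The page says the consensus is weighted by depth of analysis but does not give the weights, so this assumes equal weights. The `scores` mapping and the `headline_average` helper are illustrative names, not part of TokenScale.

```python
from statistics import mean

# Per-review scores as printed on the cards (out of 5).
scores = {
    "ChatGPT": 4.5,
    "Gemini": 4.5,
    "Perplexity": 4.0,
    "Grok": 4.0,
    "Claude": 3.5,
}


def headline_average(card_scores):
    """Equal-weight mean of the card scores, rounded to one decimal place.

    Assumption: equal weights. The page's actual weighting is not published.
    """
    return round(mean(list(card_scores.values())), 1)


print(headline_average(scores))  # prints 4.1
```

With equal weights the result matches the 4.1 shown at the top of the page. Any real depth-of-analysis weighting would have to be supplied separately.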