🤖 LLM Token Counter and Cost Estimator

Estimate tokens and API cost for GPT, Claude, Gemini, and Llama before you send the request.

Last updated: May 18, 2026 · By Λ

Expected output tokens: (used for cost estimate; many requests produce ~10x more output than input)

Per-model breakdown (2026 pricing)

Model | Input tokens | Input cost | Output cost | Total

Token counts are estimates based on character-to-token ratios for each model family. Expect them to differ from exact tokenizer output by 5-10%. Prices were last updated in May 2026; check provider docs before making billing decisions.


Why a token counter saved me from a billing surprise

I once ran a batch-summarization job over 14,000 customer support tickets through GPT-4 without estimating cost first. I expected maybe twenty dollars. The actual bill was three hundred and twelve. The tickets were longer than I had assumed, the model's responses ran longer than I had assumed, and I had set the temperature too high, which caused retries. Every developer who has paid for an LLM API has a version of this story. The fix is simple: count tokens before you send.

How the estimate works

Different models use different tokenizers. OpenAI's GPT-4 family uses cl100k_base; GPT-4o/GPT-5 use o200k_base; Claude uses an Anthropic-proprietary tokenizer; Gemini uses SentencePiece; Llama used SentencePiece through Llama 2 and moved to a tiktoken-style BPE tokenizer with Llama 3. The characters-per-token ratio varies by model and by content type; the per-family figures this tool assumes are listed in the FAQ below.

This tool uses model-specific ratios with content-type detection to estimate within 5-10% of the true count from the official tokenizer. For exact counts on OpenAI models, use the tiktoken library or OpenAI's tokenizer page. For Claude, Anthropic does not publish a tokenizer for its current models, so offline third-party counts are estimates; for an exact figure, use the token-counting endpoint in Anthropic's API.

How to read the cost estimate

The total cost is (input tokens / 1M) × input_price + (output tokens / 1M) × output_price, where both prices are quoted per million tokens. Output tokens are usually billed at 3-5x the input price because they consume more compute. The "expected output tokens" field lets you adjust the assumed response length: 100 for short answers, 1000-2000 for detailed responses, 4000+ for long-form generation.
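The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual code: the function name `estimate_cost` and the prices in the example are placeholders I picked, not figures from any provider's price list.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimated USD cost of one request; prices are USD per one million tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Hypothetical prices for illustration only: $2.50 / 1M input, $10.00 / 1M output.
cost = estimate_cost(input_tokens=1_200, output_tokens=1_000,
                     input_price_per_m=2.50, output_price_per_m=10.00)
print(f"${cost:.4f}")  # prints $0.0130
```

Note that at these placeholder prices the 1,000 output tokens cost more than three times as much as the 1,200 input tokens, which is why the expected-output field matters.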

Practical patterns I use

Estimate before each batch. A per-request cost that looks trivial adds up quickly across thousands of requests, so multiply it out to a batch total before you start.

Pick the cheapest model that works. For summarization and classification, GPT-4o-mini or Claude Haiku often produce results that are hard to tell apart from those of larger models, at 10-50x lower cost. The token counter helps you compare cost per request across models.

Cap your output tokens. Almost every "why was this so expensive" investigation traces back to a runaway response. Setting max_tokens (or your provider's equivalent parameter) in your API request caps the worst case.
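Once a cap is in place, the worst case for a batch becomes a number you can compute up front. The sketch below assumes every response hits the cap and that nothing is retried; the function name, the 800-token input size, the 300-token cap, and the prices are all hypothetical values chosen for illustration.

```python
def worst_case_batch_cost(n_requests: int, input_tokens_per_request: int,
                          max_tokens: int, input_price_per_m: float,
                          output_price_per_m: float) -> float:
    """Upper bound in USD if every response uses the full max_tokens cap.

    Assumes no retries; prices are USD per one million tokens.
    """
    per_request = (input_tokens_per_request * input_price_per_m
                   + max_tokens * output_price_per_m) / 1_000_000
    return n_requests * per_request


# 14,000 requests, hypothetical 800 input tokens each, capped at 300 output tokens,
# with placeholder prices of $2.50 / 1M input and $10.00 / 1M output.
bound = worst_case_batch_cost(14_000, 800, 300, 2.50, 10.00)
print(f"worst case: ${bound:.2f}")  # prints worst case: $70.00
```

If the bound is higher than you are willing to pay, lower the cap or pick a cheaper model before the batch starts.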

Use system prompts efficiently. Long static system prompts are billed on every request. If you have a 2000-token system prompt and process 10,000 requests, that is 20M tokens of repeated content. Many providers now support prompt caching (Anthropic, OpenAI, Gemini), which bills the repeated prefix at a reduced rate on later requests while the cache entry is still live.
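To put a dollar figure on the repeated system prompt, here is a rough sketch. It uses a deliberately simplified caching model: the first request pays full price and every later request pays a fixed fraction of it. The `cached_multiplier` value and the input price are hypothetical; real providers differ on discounts, cache-write surcharges, and cache expiry, none of which are modeled here.

```python
def repeated_prompt_cost(prompt_tokens: int, requests: int,
                         input_price_per_m: float,
                         cached_multiplier: float = 1.0) -> float:
    """USD spent on a static system prompt across a batch.

    cached_multiplier is the fraction of the input price charged on requests
    after the first (1.0 means no caching). Simplified: ignores cache-write
    surcharges and cache expiry.
    """
    full_price = prompt_tokens / 1_000_000 * input_price_per_m
    return full_price * (1 + (requests - 1) * cached_multiplier)


PROMPT_TOKENS = 2_000
REQUESTS = 10_000
print(PROMPT_TOKENS * REQUESTS)  # 20000000 repeated tokens

# Placeholder input price of $2.50 / 1M tokens; 0.5 is a hypothetical cache discount.
uncached = repeated_prompt_cost(PROMPT_TOKENS, REQUESTS, 2.50)
cached = repeated_prompt_cost(PROMPT_TOKENS, REQUESTS, 2.50, cached_multiplier=0.5)
print(f"uncached ${uncached:.2f}, cached ${cached:.2f}")
```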

FAQ

Why do the models disagree on the token count for the same text?

Each family is assigned its own characters-per-token ratio in this tool (4.0 for the GPT line, 4.5 for Claude, 4.3 for Gemini, 3.5 for Llama), mirroring how their tokenizers segment text differently. The same paragraph therefore lands on different counts per row.
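As a rough sketch of that per-family arithmetic, using the ratios quoted above: the function name and the choice to round up are my assumptions, since the page does not say how it rounds.

```python
import math

# Characters-per-token ratios quoted in the answer above.
CHARS_PER_TOKEN = {"gpt": 4.0, "claude": 4.5, "gemini": 4.3, "llama": 3.5}


def estimate_tokens(text: str, family: str) -> int:
    """Estimate tokens as character count divided by the family's ratio.

    Rounding up is an assumption made for this sketch.
    """
    return math.ceil(len(text) / CHARS_PER_TOKEN[family])


sample = "x" * 1260  # stand-in for a 1,260-character paragraph
for family in CHARS_PER_TOKEN:
    print(family, estimate_tokens(sample, family))
# gpt 315, claude 280, gemini 294, llama 360
```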

How does the content-type detection work?

The page scores your text by the density of braces, brackets, semicolons, and line breaks; a high score flags it as code and tightens the ratio by about 22%. If more than 30% of characters fall outside the extended Latin range, it switches to a 1.5 chars-per-token assumption for non-Latin scripts.
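A minimal sketch of that heuristic might look like the following. Several details are assumptions on my part because the answer above does not specify them: the code-density threshold of 0.05, treating "extended Latin" as code points up to U+024F, checking the non-Latin rule first, and reading "tightens by about 22%" as multiplying chars-per-token by 0.78.

```python
CODE_MARKS = set("{}[];\n")   # braces, brackets, semicolons, line breaks
LATIN_MAX = 0x024F            # assumed end of the extended Latin range


def adjusted_chars_per_token(text: str, base_ratio: float,
                             code_density_threshold: float = 0.05) -> float:
    """Return the chars-per-token ratio after content-type detection.

    The threshold and the order of the checks are assumptions for this sketch.
    """
    if not text:
        return base_ratio
    non_latin = sum(1 for ch in text if ord(ch) > LATIN_MAX)
    if non_latin / len(text) > 0.30:
        return 1.5                      # non-Latin scripts
    code_marks = sum(1 for ch in text if ch in CODE_MARKS)
    if code_marks / len(text) > code_density_threshold:
        return base_ratio * (1 - 0.22)  # code: fewer characters per token
    return base_ratio


print(adjusted_chars_per_token("The quick brown fox jumps over the lazy dog.", 4.0))
print(adjusted_chars_per_token("if (x) { y[0] = 1; }\n" * 3, 4.0))
print(adjusted_chars_per_token("こんにちは世界", 4.0))
```

A lower chars-per-token ratio means more estimated tokens for the same text, which matches the general observation that code and non-Latin scripts tokenize less compactly than English prose.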

Is the dollar figure what I will actually be billed?

Treat it as a planning number, not an invoice. It multiplies the estimated input tokens and your expected-output value against per-million prices recorded in May 2026, and providers change pricing, add caching discounts, and bill exact tokenizer counts rather than estimates.

What should I enter for expected output tokens?

Match it to the task: a classification label might be under 20, a summary a few hundred, long-form generation several thousand. Since output is priced well above input on most rows, this field often dominates the total.

Related

For prompt engineering and structured outputs, see the JSON schema tool. On privacy: the estimator does all of its math in this page's JavaScript, so pasted prompts stay on your machine; if you are pasting actual API requests, see also the client-side philosophy post.