LLM Cost Optimizer — Compare Real Prompt Cost Across Models
Paste your actual system prompt and user prompt. We tokenize them, project monthly cost across GPT-4o, Claude, Gemini, and Llama at your call volume, and recommend the cheapest model that meets your capability bar and latency budget.
How the optimizer works
We tokenize your actual system prompt and user prompt with a BPE-style estimator (within ±3% of true OpenAI/Anthropic billing), then project monthly cost across 10+ frontier models using your call volume. Each model gets a capability tier and a latency band; we recommend the cheapest model that meets both your capability bar and your latency budget.
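For the curious, here is a minimal sketch of that selection loop in Python. Everything in it is illustrative: the prices, tiers, and latency bands are placeholder values rather than our live pricing table, and tiktoken's public BPE stands in for our estimator.

```python
# A minimal sketch of the selection loop. All prices, tiers, and latency
# bands are illustrative placeholders, not live pricing data; tiktoken's
# o200k_base BPE stands in for the estimator described above.
import tiktoken

# model -> (input $/Mtok, output $/Mtok, capability tier, latency band)
MODELS = {
    "gpt-4o":        (2.50, 10.00, "mid",      "fast"),
    "gpt-4o-mini":   (0.15,  0.60, "light",    "fast"),
    "claude-sonnet": (3.00, 15.00, "mid",      "fast"),
    "claude-opus":   (15.00, 75.00, "flagship", "slow"),
    "gemini-flash":  (0.10,  0.40, "light",    "fast"),
}

TIER_RANK = {"light": 0, "mid": 1, "flagship": 2}

def recommend(system_prompt, user_prompt, calls_per_month,
              min_tier="mid", allowed_latency=("fast",),
              expected_output_tokens=500):
    """Return (monthly_cost, model) for the cheapest model clearing both bars."""
    enc = tiktoken.get_encoding("o200k_base")
    prompt_tokens = len(enc.encode(system_prompt + user_prompt))
    candidates = []
    for name, (in_price, out_price, tier, latency) in MODELS.items():
        if TIER_RANK[tier] < TIER_RANK[min_tier] or latency not in allowed_latency:
            continue  # fails the capability bar or the latency budget
        per_call = (prompt_tokens * in_price
                    + expected_output_tokens * out_price) / 1e6
        candidates.append((per_call * calls_per_month, name))
    return min(candidates)  # cheapest survivor
```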
Cache-aware pricing
Mark portions of your system prompt as cacheable (e.g. tool schemas, persistent context, few-shot examples). We project savings at provider-specific cache-read discounts: 90% off for Anthropic prompt caching, 50% off for OpenAI, 75% off for Gemini implicit caching.
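In formula terms: on a cache hit, the cacheable prefix bills at (1 − discount) of the normal input rate while the dynamic remainder bills at full price. The sketch below shows that adjustment; the hit-rate parameter is an assumption added for illustration, and provider cache-write surcharges are ignored since they amortize away at volume.

```python
# Cache-aware input cost per call. The hit_rate parameter is an assumption
# for illustration; provider cache-write surcharges are ignored here because
# they amortize away at high call volumes.
CACHE_READ_DISCOUNT = {"anthropic": 0.90, "openai": 0.50, "gemini": 0.75}

def cached_input_cost(provider, cacheable_tokens, dynamic_tokens,
                      input_price_per_mtok, hit_rate=0.95):
    discount = CACHE_READ_DISCOUNT[provider]
    # On a hit, the cacheable prefix bills at (1 - discount) of the normal rate.
    hit  = (cacheable_tokens * (1 - discount) + dynamic_tokens) / 1e6 * input_price_per_mtok
    miss = (cacheable_tokens + dynamic_tokens) / 1e6 * input_price_per_mtok
    return hit_rate * hit + (1 - hit_rate) * miss

# A 4,000-token tool schema plus 300 dynamic tokens at $3/Mtok input pricing:
# cached_input_cost("anthropic", 4000, 300, 3.00) -> ~$0.0026/call vs ~$0.0129 uncached
```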
Capability bar
We assign each model a tier based on standard benchmarks: flagship (Opus, GPT-5, Gemini 2.5 Pro) for hard reasoning; mid (Sonnet, GPT-4o, Flash) for general tasks; light (Haiku, GPT-5 mini) for classification and routing. Your selected bar filters the recommendation set.
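Continuing the sketch above, here is how the bar changes the outcome (same illustrative models and prices; the prompts are stand-ins):

```python
# SYSTEM and USER are stand-in prompts for the sketch above.
SYSTEM = "You are a support-ticket triage assistant. " * 40
USER = "Customer reports login failures after the 2FA rollout."

# Routing workload: light models clear the bar, so the cheapest one wins.
cost, model = recommend(SYSTEM, USER, calls_per_month=2_000_000,
                        min_tier="light")
print(f"{model}: ~${cost:,.0f}/month")

# Hard-reasoning workload with a relaxed latency budget: only the
# flagship tier qualifies, so the price floor rises accordingly.
cost, model = recommend(SYSTEM, USER, calls_per_month=50_000,
                        min_tier="flagship", allowed_latency=("fast", "slow"))
print(f"{model}: ~${cost:,.0f}/month")
```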
Frequently asked
- Which models are covered? GPT-4o, GPT-4o mini, GPT-4.1, Claude Opus 4.7, Sonnet 4.6, Haiku 4.5, Gemini 2.5 Pro/Flash, Llama 3.1 405B/70B, and Mistral Large.
- How accurate are the token counts? We use BPE tokenization for OpenAI/Anthropic estimates, accurate to within ±3% of real billing. Thinking tokens for reasoning models are estimated separately.
- Can I model prompt caching? Yes. Pro accounts can mark portions of the system prompt as cacheable; we project cache-hit savings at each provider's read discount (90% Anthropic, 50% OpenAI, 75% Gemini).