Anthropic prompt caching with `cache_control`: code samples, up to 90% off cached input tokens, plus how OpenAI and Google handle caching and the edge cases that break it.
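As a minimal sketch of the technique the article covers: in Anthropic's Messages API, you mark a content block with `cache_control` of type `"ephemeral"`, and the prefix up to that block becomes cacheable, with cache reads billed at a fraction of the base input price. The helper name, model string, and document text below are illustrative assumptions, not the article's own code; the payload shape follows the documented API.

```python
# Sketch: building a Messages API request body with a cacheable system block.
# Model name and texts are placeholders; only the payload structure matters here.

def build_cached_request(reference_doc: str, question: str) -> dict:
    """Return a request payload whose large system block is tagged for caching.

    Anthropic caches the prompt prefix up to and including each block that
    carries "cache_control"; subsequent requests reusing that exact prefix
    read it from cache at a reduced input-token rate.
    """
    return {
        "model": "claude-sonnet-4-20250514",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": "You answer questions about the document."},
            {
                "type": "text",
                "text": reference_doc,
                # "ephemeral" is the documented cache type (short TTL).
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [{"role": "user", "content": question}],
    }

payload = build_cached_request("...long reference text...", "What does section 2 say?")
```

Sending this payload (e.g. via the `anthropic` SDK or a plain HTTP POST) on repeated requests with an identical system prefix is what triggers the cached-read pricing the article discusses.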