Token Optimization
Tokens are the currency of AI. Every word you send costs money and consumes context. Learn what tokens are, how they work, and how to use fewer of them.
Token Types & Their Efficiency
Not all text tokenizes equally. Understanding which content creates the most tokens helps you optimize.
Word tokens
Example: "hello". Common English words are often a single token.
Subword tokens
Rare or complex words are split into subwords by the BPE algorithm.
Character tokens
Example: "朋友". Non-Latin scripts (such as Chinese and Arabic) and emoji often tokenize less efficiently than English, sometimes 2–4 tokens per character depending on the tokenizer.
Code tokens
Code is tokenized by symbol boundaries — usually efficient but punctuation adds tokens.
Whitespace tokens
Leading whitespace (indentation) can create extra tokens. Minified code is often more token-efficient.
Number tokens
Long numbers split into multiple tokens. Use approximate numbers or remove trailing precision.
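One reason non-Latin text tends to cost more: many tokenizers use byte-level BPE, which starts from UTF-8 bytes, and characters outside ASCII take more bytes each. The sketch below uses only the standard library and counts UTF-8 bytes per character as a rough proxy. It is not a token count, and the helper name `utf8_bytes_per_char` is made up for illustration.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average UTF-8 bytes per character (a rough proxy, not a token count)."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)


# Compare an English word with Chinese, Arabic, and an emoji.
for sample in ["hello", "朋友", "مرحبا", "🙂"]:
    print(sample, utf8_bytes_per_char(sample))
```

For real token counts, run the text through the tokenizer of the model you actually call; byte counts only indicate which content is likely to be more expensive.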
Tokenization efficiency by content type
| Content type | Efficiency |
|---|---|
| English prose (concise) | 95% |
| Source code (well-formatted) | 80% |
| JSON / structured data | 55% |
| Repeated text / boilerplate | 40% |
| CJK / Arabic script | 25% |
| Binary / base64 encoded data | 10% |
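Part of the overhead in structured data comes from indentation and repeated punctuation. As a minimal sketch, the snippet below serializes one made-up record twice with the standard `json` module, once pretty-printed and once compact, and compares character counts. Character count is only a proxy for tokens, and the `record` contents are purely illustrative.

```python
import json

# Hypothetical record used only for the comparison.
record = {"id": 1, "name": "Ada", "tags": ["a", "b"], "active": True}

# Pretty-printed: newlines and four-space indentation on every level.
pretty = json.dumps(record, indent=4)

# Compact: no spaces after separators, no newlines.
compact = json.dumps(record, separators=(",", ":"))

print(len(pretty), len(compact))
```

Both strings decode to the same data, so the compact form is usually the safer default when the JSON is only being read by the model.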
Tokens per API call (input context)
| Scenario | Tokens/call | Monthly cost* |
|---|---|---|
| Current (unoptimized) | 8,000 | $24.00 |
| After 40% reduction | 4,800 | $14.40 |
| Savings | 3,200 | $9.60 |
*These figures are consistent with 1,000 calls per month at $3 per million input tokens.
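The arithmetic behind a table like this is a single multiplication. The sketch below assumes 1,000 calls per month and $3 per million input tokens (the call volume is an assumption chosen because it reproduces the $24.00 baseline), and `monthly_cost` is a hypothetical helper name. A 40% cut in tokens leaves 60% of the cost.

```python
def monthly_cost(tokens_per_call: int, calls_per_month: int, usd_per_million: float) -> float:
    """Monthly input cost in USD for a fixed number of tokens per call."""
    return tokens_per_call * calls_per_month * usd_per_million / 1_000_000


CALLS_PER_MONTH = 1_000  # assumption, not stated in the text
INPUT_PRICE = 3.0        # USD per million input tokens

before = monthly_cost(8_000, CALLS_PER_MONTH, INPUT_PRICE)
after = monthly_cost(4_800, CALLS_PER_MONTH, INPUT_PRICE)

print(f"before=${before:.2f} after=${after:.2f} saved=${before - after:.2f}")
# prints: before=$24.00 after=$14.40 saved=$9.60
```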
Input Tokens
System prompt + conversation history + file contents + tool results
Claude Sonnet: $3 / million
Write shorter prompts, use fewer files, compact conversation history
Prompt caching reduces repeated input costs by up to 90%
Output Tokens
The model's response — text, code, tool calls, reasoning
Claude Sonnet: $15 / million (5× input price)
Set a max_tokens limit, ask for concise responses, and avoid open-ended "explain everything" requests
Output is always freshly generated — no caching possible
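To see how input price, output price, and caching combine, here is a minimal per-call cost sketch. It uses the prices quoted above ($3 and $15 per million tokens) and treats the "up to 90%" caching figure as a flat best-case discount on cached input. It ignores any extra charge for writing to the cache, and `call_cost` is a hypothetical helper, not part of any SDK.

```python
INPUT_USD_PER_M = 3.0    # input price quoted above
OUTPUT_USD_PER_M = 15.0  # output price quoted above
CACHE_DISCOUNT = 0.90    # best-case "up to 90%" figure, assumed flat here


def call_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimated USD cost of one call; cached input is billed at a discount."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_input_tokens
    billable_input = fresh_tokens + cached_input_tokens * (1 - CACHE_DISCOUNT)
    input_cost = billable_input * INPUT_USD_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_M / 1_000_000
    return input_cost + output_cost


# 8,000 input tokens and 500 output tokens, without and with 6,000 tokens cached.
print(f"{call_cost(8_000, 500):.4f}")
print(f"{call_cost(8_000, 500, cached_input_tokens=6_000):.4f}")
```

Because output is billed at five times the input rate, even a short response can account for a noticeable share of the total.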
One of the easiest savings is removing filler from prompts. Every word that doesn't change the output is a wasted token.
Before (47 tokens)
Could you please help me by writing a function
in the Python programming language that will
take a list as input and return the items in
that list sorted in ascending order?
After (11 tokens)
Python function: sort list ascending.
Cut filler words
10–40% fewer input tokens
Use CSV over JSON
2–5× cheaper data format
Enable prompt caching
90% off repeated input
Compress history
50–80% fewer history tokens
Set max_tokens
Hard cap on output cost
Route to right model
10–50× cost reduction
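As an illustration of the CSV-over-JSON tip: JSON repeats every key in every record, while CSV states the column names once. The sketch below serializes the same made-up rows both ways with the standard library and compares character counts. Characters are only a proxy for tokens, and the actual saving depends on the data and the tokenizer.

```python
import csv
import io
import json

# Hypothetical rows used only for the comparison.
rows = [
    {"id": 1, "name": "Ada", "score": 91},
    {"id": 2, "name": "Grace", "score": 88},
    {"id": 3, "name": "Alan", "score": 79},
]

# JSON: keys are repeated for each record.
as_json = json.dumps(rows)

# CSV: one header line, then one line of values per record.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "score"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
as_csv = buffer.getvalue()

print(len(as_json), len(as_csv))
```

The gap widens as the number of rows grows, since the header cost is paid once. CSV suits flat, uniform records; nested data may still need JSON.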