Token Optimization

Tokens are the currency of AI. Every word you send costs money and consumes context. Learn what tokens are, how they work, and how to use fewer of them.

What Is a Token?

Fundamentals

A token is the basic unit an LLM uses to read and write text. It is NOT the same as a word, character, or letter — it sits somewhere in between.

Modern LLMs use Byte-Pair Encoding (BPE) to split text into tokens. BPE learns which character sequences appear most often in training data and groups them into single tokens. Common sequences become one token; rare sequences are split into many.

Rule of thumb

1 token ≈ ¾ of an English word.
100 tokens ≈ 75 words ≈ 1 short paragraph.

How BPE works (simplified)

Start

Every character is its own token

Step 1

Find the most frequent pair: "h" + "e" → "he"

Step 2

Find next pair: "he" + "l" → "hel"

Step 3

Find next pair: "hel" + "lo" → "hello" (1 token!)

Result

"hello" is one token; "héllo" might be 3

Token vs Word vs Character

Text	Characters	Words	Tokens
`hello`	5	1	1
`Hello, world!`	13	2	4
`unbelievable`	12	1	4
`ChatGPT`	7	1	3
`朋友 (friend)`	9	1	5
`function() {}`	13	1	5
`(3 spaces)`	3	0	1

The same meaning expressed in different ways can cost 2–5× more tokens. Writing concisely is not just good style — it saves real money.

Token Types & Their Efficiency

6 types

Not all text tokenizes equally. Understanding which content creates the most tokens helps you optimize.

Word tokens

1 token

hello

Common English words are often a single token.

Subword tokens

4 tokens

"un"

"believ"

"able"

"..."

Rare or complex words are split into subwords by the BPE algorithm.

Character tokens

4 tokens

朋友

Non-Latin scripts (Chinese, Arabic, emoji) tokenize very inefficiently — often 2–4 tokens per character.

Code tokens

6 tokens

"function"

" get"

"Data"

"()"

" {"

Code is tokenized by symbol boundaries — usually efficient but punctuation adds tokens.

Whitespace tokens

3 tokens

" "

"indent"

Leading whitespace (indentation) can create extra tokens. Minified code is often more token-efficient.

Number tokens

6 tokens

"3"

"."

"14"

"159"

"26"

"5"

Long numbers split into multiple tokens. Use approximate numbers or remove trailing precision.

Tokenization efficiency by content type

English prose (concise)

95% efficient

Source code (well-formatted)

80% efficient

JSON / structured data

55% efficient

Repeated text / boilerplate

40% efficient

CJK / Arabic script

25% efficient

Binary / base64 encoded data

10% efficient

Interactive: Token Counter & OptimizerType text in both boxes to compare token counts in real time

Original prompt

~18 tokens

71 characters · ~18 tokens · ~$0.000054 per call

Optimized prompt

~8 tokens

29 characters · ~8 tokens · saved ~10 tokens (56%)

Saved 10 tokens (56%)

At Claude Sonnet pricing ($3/M tokens), that's $0.000030 per call. At 10,000 calls/day that's $9.00/month.

Cost Impact CalculatorSee how optimization affects your monthly bill

Tokens per API call (input context)

Scenario	Tokens/call	Monthly cost*
Current (unoptimized)	8,000	$24.00
After 40% reduction	4,800	$9.60
Savings	3,200	$14.40

*Assumes 1,000 calls/day at Claude Sonnet $3/M input tokens

Input Tokens vs Output TokensThey are priced differently — and behave differently

Input Tokens

Everything you send

System prompt + conversation history + file contents + tool results

Cheaper

Claude Sonnet: $3 / million

You control this

Write shorter prompts, use fewer files, compact conversation history

Cache discount

Prompt caching reduces repeated input costs by up to 90%

Output Tokens

Everything AI writes

The model's response — text, code, tool calls, reasoning

More expensive

Claude Sonnet: $15 / million (5× input price)

Harder to control

Use max_tokens limit, ask for concise responses, avoid "explain everything"

No cache discount

Output is always freshly generated — no caching possible

How to Save Tokens — 6 Strategies

The single biggest saving is removing filler from prompts. Every word that doesn't change the output is a wasted token.

Cut these from every prompt

Pleasantries — "Please kindly help me with..."
Redundant context — restating things the model already knows
Over-explanation of the obvious
Trailing summaries — "In conclusion, as I mentioned above..."

Before (47 tokens)

BAD

Could you please help me by writing a function
in the Python programming language that will
take a list as input and return the items in
that list sorted in ascending order?

After (11 tokens)

GOOD

Python function: sort list ascending.

Token Optimization Quick Reference

Cut filler words

10–40% fewer input tokens

Use CSV over JSON

2–5× cheaper data format

Enable prompt caching

90% off repeated input

Compress history

50–80% fewer history tokens

Set max_tokens

Hard cap on output cost

Route to right model

10–50× cost reduction

Golden rule

Every token you don't send is a token you don't pay for and a token that can't crowd out useful context. Optimization is both a cost play and a quality play.