AI Knowledge Hub

Token Optimization

Tokens are the currency of AI. Every word you send costs money and consumes context. Learn what tokens are, how they work, and how to use fewer of them.
What Is a Token?
Fundamentals

A token is the basic unit an LLM uses to read and write text. It is NOT the same as a word, character, or letter — it sits somewhere in between.

Modern LLMs use Byte-Pair Encoding (BPE) to split text into tokens. BPE learns which character sequences appear most often in training data and groups them into single tokens. Common sequences become one token; rare sequences are split into many.

How BPE works (simplified)
Start

Every character is its own token

Step 1

Find the most frequent pair: "h" + "e" → "he"

Step 2

Find next pair: "he" + "l" → "hel"

Step 3

Find next pair: "hel" + "lo" → "hello" (1 token!)

Result

"hello" is one token; "héllo" might be 3

Token vs Word vs Character
TextCharactersWordsTokens
hello51
1
Hello, world!132
4
unbelievable121
4
ChatGPT71
3
朋友 (friend)91
5
function() {}131
5
(3 spaces)30
1

Not all text tokenizes equally. Understanding which content creates the most tokens helps you optimize.

Word tokens
1 token
hello
hello

Common English words are often a single token.

Subword tokens
4 tokens
"un"
"believ"
"able"
"..."

Rare or complex words are split into subwords by the BPE algorithm.

Character tokens
4 tokens
朋友
朋友

Non-Latin scripts (Chinese, Arabic, emoji) tokenize very inefficiently — often 2–4 tokens per character.

Code tokens
6 tokens
"function"
" get"
"Data"
"()"
" {"
""

Code is tokenized by symbol boundaries — usually efficient but punctuation adds tokens.

Whitespace tokens
3 tokens
" "
" "
"indent"

Leading whitespace (indentation) can create extra tokens. Minified code is often more token-efficient.

Number tokens
6 tokens
"3"
"."
"14"
"159"
"26"
"5"

Long numbers split into multiple tokens. Use approximate numbers or remove trailing precision.

Tokenization efficiency by content type

English prose (concise)

95% efficient

Source code (well-formatted)

80% efficient

JSON / structured data

55% efficient

Repeated text / boilerplate

40% efficient

CJK / Arabic script

25% efficient

Binary / base64 encoded data

10% efficient

Interactive: Token Counter & OptimizerType text in both boxes to compare token counts in real time
~18 tokens
71 characters · ~18 tokens · ~$0.000054 per call
~8 tokens
29 characters · ~8 tokens · saved ~10 tokens (56%)
Cost Impact CalculatorSee how optimization affects your monthly bill

Tokens per API call (input context)

ScenarioTokens/callMonthly cost*
Current (unoptimized)8,000$24.00
After 40% reduction4,800$9.60
Savings3,200$14.40
*Assumes 1,000 calls/day at Claude Sonnet $3/M input tokens
Input Tokens vs Output TokensThey are priced differently — and behave differently
Input Tokens
Everything you send

System prompt + conversation history + file contents + tool results


Cheaper

Claude Sonnet: $3 / million


You control this

Write shorter prompts, use fewer files, compact conversation history


Cache discount

Prompt caching reduces repeated input costs by up to 90%

Output Tokens
Everything AI writes

The model's response — text, code, tool calls, reasoning


More expensive

Claude Sonnet: $15 / million (5× input price)


Harder to control

Use max_tokens limit, ask for concise responses, avoid "explain everything"


No cache discount

Output is always freshly generated — no caching possible

How to Save Tokens — 6 Strategies

The single biggest saving is removing filler from prompts. Every word that doesn't change the output is a wasted token.

Before (47 tokens)
BAD
Could you please help me by writing a function in the Python programming language that will take a list as input and return the items in that list sorted in ascending order?
After (11 tokens)
GOOD
Python function: sort list ascending.
Token Optimization Quick Reference
Cut filler words

10–40% fewer input tokens

Use CSV over JSON

2–5× cheaper data format

Enable prompt caching

90% off repeated input

Compress history

50–80% fewer history tokens

Set max_tokens

Hard cap on output cost

Route to right model

10–50× cost reduction