What Are Tokens?
Tokens are the fundamental units that large language models (LLMs) use to process text. Models don't read words or characters directly — they split text into a sequence of tokens. A token can be a full word (like hello), a subword (like un + believ + able), or even a single character or punctuation mark.
For CJK (Chinese, Japanese, Korean) characters, each character typically encodes to 1-2 tokens, meaning the same semantic content in Chinese usually consumes more tokens than in English.
Token Examples
| Text | Approx. Tokens | Explanation |
|---|---|---|
| Hello world | 2 | Common English words = 1 token each |
| 你好世界 | 4 | Each CJK character ≈ 1-2 tokens |
| unbelievable | 3 | Long words split into subwords |
| ChatGPT is amazing! | 5 | Proper nouns may be split |
| const x = 42; | 5 | Symbols and numbers each take tokens |
| https://example.com/path | 7-9 | URLs are split into many tokens |
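The per-character ratios above can be turned into a quick back-of-the-envelope estimator. The sketch below is a hypothetical `estimate_tokens` helper, not any model's real tokenizer: it assumes roughly 4 ASCII characters per token and ~1.5 tokens per CJK character, in line with the table.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (heuristic only, not a real tokenizer).

    Assumption: ~4 characters per token for non-CJK text and
    ~1.5 tokens per CJK character, per the table above.
    """
    # Count characters in the main CJK Unified Ideographs block
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    return round(cjk * 1.5 + other / 4)
```

For exact counts you would still need the model's official tokenizer; this is only useful for rough budgeting before a request is sent.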
Token Limits & Pricing by Model
Below are the context window sizes and API pricing for major AI models as of late 2024 (price per 1M tokens, USD):
| Model | Context Window | Input / 1M | Output / 1M |
|---|---|---|---|
| GPT-4o | 128K | $2.50 | $10.00 |
| GPT-4 Turbo | 128K | $10.00 | $30.00 |
| GPT-3.5 Turbo | 16K | $0.50 | $1.50 |
| Claude 3.5 Sonnet | 200K | $3.00 | $15.00 |
| Claude 3 Opus | 200K | $15.00 | $75.00 |
| Claude 3 Haiku | 200K | $0.25 | $1.25 |
| Gemini 1.5 Pro | 1M | $1.25 | $5.00 |
| Gemini 1.5 Flash | 1M | $0.075 | $0.30 |
| Llama 3.1 405B | 128K | Varies by provider | Varies by provider |
| Llama 3.1 70B | 128K | Varies by provider | Varies by provider |
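Since prices are quoted per 1M tokens, the cost of a single API call is just a weighted sum of input and output tokens. A minimal sketch (the `api_cost` helper and the token counts are illustrative, the prices come from the table above):

```python
def api_cost(input_tokens: int, output_tokens: int,
             input_per_m: float, output_per_m: float) -> float:
    """USD cost of one call, given per-1M-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# Example: GPT-4o at $2.50 input / $10.00 output per 1M tokens,
# with a 10K-token prompt and a 2K-token response.
cost = api_cost(10_000, 2_000, 2.50, 10.00)  # 0.045 USD
```

Note that output tokens are typically several times more expensive than input tokens, which is why capping response length (see the tips below) has an outsized effect on cost.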
How Tokenization Works
Modern LLMs mostly use BPE (Byte Pair Encoding) or its variants to convert text into tokens. The core idea behind BPE is:
- Start from bytes: Initially treat each byte as an individual token.
- Iterative merging: Count all adjacent token pair frequencies and merge the most frequent pair into a new token.
- Repeat until vocabulary target: Keep merging until the vocabulary reaches a preset limit (e.g., cl100k_base has ~100K tokens).
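The three steps above can be sketched as a toy trainer. This simplified version starts from characters rather than bytes and stops after a fixed number of merges instead of a vocabulary target, but the merge loop is the same idea:

```python
from collections import Counter

def bpe_train(text: str, num_merges: int):
    """Toy BPE: repeatedly merge the most frequent adjacent token pair.

    Simplification: starts from characters; real byte-level BPE
    (e.g. cl100k_base) starts from the 256 possible bytes.
    """
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        # Replace every occurrence of the pair (a, b) with the merged token.
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges
```

Running `bpe_train("aaabdaaabac", 3)` first merges the frequent pair "aa", then "aa"+"a", then "aaa"+"b", leaving the sequence tokenized as `['aaab', 'd', 'aaab', 'a', 'c']`.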
Different models use different tokenizers:
- OpenAI GPT-4: Uses the cl100k_base encoder with a ~100K vocabulary (GPT-4o moved to the newer o200k_base encoder with a ~200K vocabulary).
- Anthropic Claude: Uses a proprietary tokenizer with efficiency similar to cl100k_base, slightly better for natural language.
- Google Gemini: Uses SentencePiece tokenizer, optimized for multilingual text, slightly better CJK efficiency.
- Meta Llama 3: Uses a BPE-based tokenizer with ~128K vocabulary.
This tool uses a heuristic algorithm for approximate token estimation. For exact counts, use each model's official tokenizer library (e.g., OpenAI's tiktoken).
Tips for Reducing Token Usage
- Write concise prompts: Remove redundant phrasing and repeated instructions. Direct, concise prompts use fewer tokens and often produce better results.
- Use system messages: Place fixed background instructions in the system message to avoid repeating them in every conversation turn.
- Limit output length: Use the max_tokens parameter to cap response length, or explicitly ask for brief answers in your prompt.
- Avoid large code blocks: Paste only relevant code snippets instead of entire files. Code typically uses 1 token per 2-3 characters, making it less token-efficient than prose.
- Compress with summaries: For long conversations, periodically ask the model to summarize previous context and replace full history with the summary.
- Choose the right model: Use cost-effective models like GPT-3.5 or Claude Haiku for simple tasks; reserve GPT-4o or Claude Opus for complex reasoning.
Related Tools
- Advanced Text Statistics — Character count, word frequency, readability scores
- AI API Pricing Comparison — Compare API pricing across major models
- AI Models Comparison — Compare model capabilities, context windows, speed