Lesson 2: What is a Token?
The basic unit
A token is the basic unit an AI model works with. It is not a word — it is a chunk of text. Every piece of text the model reads or writes is first broken down into tokens through a process called tokenisation.
How tokenisation works
Consider the word “understanding”:
- It might be split into ["under", "stand", "ing"] — three tokens
- A shorter word like “cat” is typically a single token
- Numbers, punctuation, and spaces are also tokens
The exact split depends on the model’s tokeniser — a fixed vocabulary of token patterns learned during training. Different model families use different tokenisers, so the same text may produce different token counts.
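To make the idea concrete, here is a toy sketch of greedy longest-match tokenisation over a tiny hand-picked vocabulary. This is illustrative only: `TOY_VOCAB` and `tokenise` are invented for this example, and real tokenisers learn much larger vocabularies from data rather than using a hand-written set.

```python
# Illustrative only: a toy greedy longest-match tokeniser.
# Real tokenisers have vocabularies of tens of thousands of entries
# learned during training; this hand-picked set just shows the mechanics.
TOY_VOCAB = {"under", "stand", "ing", "cat", "s", " "}

def tokenise(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in TOY_VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as its own single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenise("understanding"))  # ['under', 'stand', 'ing']
```

Because the split is driven entirely by the vocabulary, a different vocabulary would split the same text differently — which is exactly why different model families report different token counts for identical input.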
Why tokens matter
Tokens matter because:
- Pricing — Most AI APIs charge per token (input + output)
- Context limits — Every model has a maximum number of tokens it can handle at once
- Speed — More tokens means longer processing time
- Quality — How text is tokenised affects the model’s ability to understand and generate it
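The pricing point can be turned into a quick back-of-envelope calculation. The per-token prices below are hypothetical, chosen only to show the arithmetic; check your provider’s current pricing for real figures.

```python
# Hypothetical prices for illustration only — not any real provider's rates.
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000    # e.g. $3 per million input tokens
PRICE_PER_OUTPUT_TOKEN = 15.00 / 1_000_000  # e.g. $15 per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for one call: input and output are billed separately."""
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)

# A 2,000-token prompt producing a 500-token reply:
print(f"${estimate_cost(2000, 500):.4f}")  # $0.0135
```

Note that output tokens are often priced several times higher than input tokens, so verbose responses can dominate the bill even when the prompt is long.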
Rules of thumb
- 1 token is roughly 4 characters in English or ¾ of a word
- 100 tokens is roughly 75 words
- A typical page of text is around 300-400 tokens
- Code tends to use more tokens than prose because of syntax, indentation, and special characters
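The 4-characters-per-token rule of thumb is easy to code up as a rough estimator. `estimate_tokens` is a hypothetical helper, and its result is only an approximation — a real count requires the model’s own tokeniser.

```python
def estimate_tokens(text: str) -> int:
    # Rule of thumb for English prose: ~4 characters per token.
    # Actual counts vary by tokeniser, and code usually tokenises
    # to more tokens than prose of the same length.
    return max(1, round(len(text) / 4))

paragraph = "Tokens are the basic units an AI model reads and writes."
print(estimate_tokens(paragraph))  # 14  (56 characters / 4)
```

For budgeting purposes an estimate like this is usually enough; switch to an exact count only when you are close to a hard context limit.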
Practical impact
When working with AI tools, awareness of tokens helps you:
- Estimate costs before running expensive operations
- Stay within limits by keeping prompts focused
- Understand errors when a model says your input is too long
- Optimise performance by reducing unnecessary context
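Staying within limits often comes down to trimming old context. Here is a minimal sketch, assuming the 4-characters-per-token heuristic from above; `trim_to_budget` is a hypothetical helper, and a production system would count tokens with the model’s actual tokeniser.

```python
def trim_to_budget(chunks: list[str], budget_tokens: int) -> list[str]:
    """Keep the most recent chunks that fit a rough token budget.

    Walks the history newest-first, adding chunks until the estimated
    budget would be exceeded, then restores chronological order.
    """
    kept: list[str] = []
    used = 0
    for chunk in reversed(chunks):              # newest context first
        cost = max(1, round(len(chunk) / 4))    # ~4 chars per token heuristic
        if used + cost > budget_tokens:
            break
        kept.append(chunk)
        used += cost
    kept.reverse()                              # back to original order
    return kept

history = ["oldest message " * 3, "middle message " * 3, "newest message " * 3]
print(trim_to_budget(history, 25))
```

Dropping the oldest context first is a common default because recent turns usually matter most, but summarising old chunks instead of discarding them is another option when every turn carries information.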