Lesson 2: What is a Token?

A token is the basic unit an AI model works with. It is not necessarily a word: a token is a chunk of text that may be a whole word, part of a word, a number, or a punctuation mark. Every piece of text the model reads or writes is first broken down into tokens through a process called tokenisation.

Consider the word “understanding”:

  • It might be split into ["under", "stand", "ing"] — three tokens
  • A shorter word like “cat” is typically a single token
  • Numbers, punctuation, and spaces are also tokens

The exact split depends on the model’s tokeniser — a fixed vocabulary of token patterns learned during training. Different model families use different tokenisers, so the same text may produce different token counts.
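To make the idea concrete, here is a minimal sketch of tokenisation as greedy longest-match against a fixed vocabulary. The vocabulary below is invented for the example, and real tokenisers (such as byte-pair encoding) use learned merge rules rather than this simple scheme:

```python
# Toy illustration of tokenisation: greedy longest-match against a
# fixed vocabulary. The vocabulary here is invented for the example;
# real tokenisers (e.g. byte-pair encoding) are more sophisticated.
VOCAB = {"under", "stand", "ing", "cat",
         "u", "n", "d", "e", "r", "s", "t", "a", "i", "g", "c"}

def tokenise(text: str) -> list[str]:
    """Split text by repeatedly taking the longest vocabulary match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible match first, shrinking until one fits.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as its own single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenise("understanding"))  # ['under', 'stand', 'ing']
print(tokenise("cat"))            # ['cat']
```

The same text can split differently under a different vocabulary, which is why token counts vary between model families.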

Tokens matter because:

  1. Pricing — Most AI APIs charge per token (input + output)
  2. Context limits — Every model has a maximum number of tokens it can handle at once
  3. Speed — More tokens mean longer processing time
  4. Quality — How text is tokenised affects the model’s ability to understand and generate it
Some useful rules of thumb:

  • 1 token is roughly 4 characters of English text, or about ¾ of a word
  • 100 tokens is roughly 75 words
  • A typical page of text is around 300-400 tokens
  • Code tends to use more tokens than prose because of syntax, indentation, and special characters
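The rules of thumb above can be turned into quick estimators. These are heuristics, not exact counts; only the model's own tokeniser gives the true number:

```python
# Rough token estimates from the rules of thumb: ~4 characters per token,
# or ~3/4 of a word per token, for English prose. Heuristics only.
def estimate_tokens_by_chars(text: str) -> int:
    """Estimate token count from character length (~4 chars per token)."""
    return max(1, round(len(text) / 4)) if text else 0

def estimate_tokens_by_words(text: str) -> int:
    """Estimate token count from word count (~4 tokens per 3 words)."""
    words = len(text.split())
    return round(words * 4 / 3)

prose = "word " * 75  # 75 words, so about 100 tokens by the rule of thumb
print(estimate_tokens_by_words(prose))  # 100
```

Expect the two estimates to disagree somewhat, especially on code, where the character-based estimate tends to run higher.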

When working with AI tools, awareness of tokens helps you:

  • Estimate costs before running expensive operations
  • Stay within limits by keeping prompts focused
  • Understand errors when a model says your input is too long
  • Optimise performance by reducing unnecessary context
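These habits can be combined into a simple pre-flight check before sending a prompt: estimate the token count, the cost, and whether the input fits the model's context window. The price and context limit below are hypothetical placeholders, not any real provider's figures:

```python
# Sketch of a pre-flight check before calling an AI API. The price and
# context limit are hypothetical placeholders; check your provider's
# actual pricing and model limits.
PRICE_PER_1K_INPUT_TOKENS = 0.001  # hypothetical: $0.001 per 1K tokens
CONTEXT_LIMIT = 8_000              # hypothetical context window

def preflight(prompt: str) -> dict:
    """Estimate tokens, cost, and context fit for a prompt."""
    est_tokens = max(1, len(prompt) // 4)  # ~4 chars per token heuristic
    return {
        "estimated_tokens": est_tokens,
        "estimated_cost_usd": est_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS,
        "fits_in_context": est_tokens <= CONTEXT_LIMIT,
    }

report = preflight("Summarise this document. " * 200)
print(report)
```

A check like this catches "input too long" errors before the request is sent, rather than after it fails.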