From prompt to output

What actually happens between hitting "send" and seeing a reply? Step through the pipeline below — click any stage for a plain-language explanation and a worked example. Start with a live token demo using the real tokenizer.

Live demo · real tokenizer (o200k_base, exact for OpenAI)

1 · Text prompt

Everything starts as plain text: your system prompt, any context, and the user message, concatenated into one string. The model never sees "words" or "meaning" yet — just characters.

Example: Hello, tokenswise!
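As a rough sketch of that concatenation step, the snippet below joins the parts into a single string. The `build_prompt` helper and its "System:" / "User:" labels are assumptions made for illustration; real chat models use their own model-specific templates and special tokens.

```python
# Hypothetical helper: the labels and blank-line separators are illustrative,
# not the template any particular model actually uses.
def build_prompt(system: str, context: str, user: str) -> str:
    """Join the non-empty parts into one plain string."""
    parts = [("System", system), ("Context", context), ("User", user)]
    return "\n\n".join(f"{label}: {text}" for label, text in parts if text)


prompt = build_prompt(
    system="You are a helpful assistant.",
    context="",  # empty parts are skipped
    user="Hello, tokenswise!",
)
print(prompt)
print(len(prompt), "characters")
```

At this point the result is only a sequence of characters; nothing has been split into tokens yet.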

2 · Tokenization → IDs

A byte-pair-encoding (BPE) tokenizer chops the text into tokens — sub-word chunks from a fixed vocabulary — and maps each to an integer ID. Common words are one token; rare words, code, and punctuation split into more. This is what you're billed on and what fills the context window. The live demo above shows the real split.

Example (illustrative): "Hello" → 1 token; "," + " tokens" + "wise" + "!" → 4 more, each becoming an ID like 13225.
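To make the merge idea concrete, here is a minimal byte-level BPE sketch. The merge rules in `MERGES` are made up for this example and are not the o200k_base rules; real tokenizers also pre-split text with a regular expression before merging, which is omitted here.

```python
# Toy merge rules in priority order (earlier = higher priority).
# Assumption: invented for illustration, NOT taken from any real vocabulary.
MERGES = [
    (b"l", b"l"), (b"H", b"e"), (b"He", b"ll"), (b"Hell", b"o"),
    (b"t", b"o"), (b"k", b"e"), (b"to", b"ke"), (b"toke", b"n"),
]
RANK = {pair: i for i, pair in enumerate(MERGES)}
# IDs 0-255 are the raw bytes; each merged piece gets 256 + its merge rank.
PIECE_TO_ID = {a + b: 256 + i for i, (a, b) in enumerate(MERGES)}


def bpe_split(text: str) -> list[bytes]:
    """Start from single bytes and repeatedly apply the best-ranked merge."""
    pieces = [bytes([b]) for b in text.encode("utf-8")]
    while True:
        candidates = [
            (RANK[pair], i)
            for i, pair in enumerate(zip(pieces, pieces[1:]))
            if pair in RANK
        ]
        if not candidates:
            return pieces
        _, i = min(candidates)  # lowest rank wins; ties go to the leftmost pair
        pieces[i:i + 2] = [pieces[i] + pieces[i + 1]]


def encode(text: str) -> list[int]:
    """Map each piece to its integer ID."""
    return [p[0] if len(p) == 1 else PIECE_TO_ID[p] for p in bpe_split(text)]


print(bpe_split("Hello, tokenswise!"))
print(encode("Hello, tokenswise!"))
```

With this tiny rule set, "Hello" collapses into one token while the unfamiliar "wise" stays as single bytes, which is the same effect that makes rare words cost more tokens in a real vocabulary.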

3 · Embeddings

Each token ID is looked up in an embedding table and turned into a vector — a long list of numbers (often hundreds or thousands of dimensions) that places the token in a "meaning space," with similar tokens nearby. Positional information is added so order matters.

Example (illustrative): token 13225 → [0.014, -0.22, 0.08, …] (e.g. 2,048 numbers).
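The lookup-plus-position step can be sketched as below. The table here is filled with random numbers (a trained model would have learned values), the sizes are toy assumptions, and the sinusoidal position scheme is just one option; many current models use learned or rotary position encodings instead.

```python
import math
import random

VOCAB_SIZE = 300  # assumed toy vocabulary size
D_MODEL = 8       # assumed toy vector width (real models use far more)

# Random stand-in for a trained embedding table: one row per token ID.
rng = random.Random(0)
EMBEDDING_TABLE = [
    [rng.gauss(0.0, 0.02) for _ in range(D_MODEL)] for _ in range(VOCAB_SIZE)
]


def positional_encoding(position: int, d_model: int) -> list[float]:
    """Sinusoidal position vector: sine on even indices, cosine on odd ones."""
    vec = []
    for i in range(d_model):
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def embed(token_ids: list[int]) -> list[list[float]]:
    """Look up each ID's row and add the vector for its position."""
    out = []
    for pos, tid in enumerate(token_ids):
        row = EMBEDDING_TABLE[tid]
        pe = positional_encoding(pos, D_MODEL)
        out.append([r + p for r, p in zip(row, pe)])
    return out


vectors = embed([13, 7, 42])  # arbitrary made-up IDs
print(len(vectors), "vectors of", len(vectors[0]), "numbers each")
```

Because the position vector differs at each index, the same token ID appearing twice in a prompt ends up with two different input vectors.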

4 · Transformer layers

The vectors flow through dozens of transformer layers. In each, an attention mechanism lets every token "look at" the others (in GPT-style models, only the tokens before it) and pull in relevant context, and a feed-forward network transforms the result. Layer by layer, the representation of the last position accumulates everything needed to predict what comes next.

Example: by the top layer, the vector at the end of "The capital of France is" strongly encodes "a city name, probably Paris."
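A stripped-down view of the attention step, assuming a GPT-style causal mask, might look like this. To stay short it uses the input vectors directly as queries, keys, and values (real layers apply learned projections and multiple heads) and leaves out the feed-forward network, residual connections, and normalization.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(x: list[list[float]]) -> list[list[float]]:
    """Single-head attention where queries = keys = values = x (a simplification)."""
    d = len(x[0])
    out = []
    for i, query in enumerate(x):
        # Causal mask: position i may only attend to positions 0..i.
        scores = [
            sum(q * k for q, k in zip(query, x[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # New vector for position i: weighted average of the visible vectors.
        out.append([
            sum(w * x[j][k] for j, w in enumerate(weights)) for k in range(d)
        ])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
out = causal_self_attention(x)
for row in out:
    print([round(v, 3) for v in row])
```

The last row mixes information from all three positions, which is why the final position is the one used to predict the next token.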

5 · Decoding: logits → probabilities → next token

The final vector is projected to logits — one raw score per token in the whole vocabulary (~200,000). A softmax turns those into probabilities, then one token is sampled:

  • Temperature divides the logits before the softmax: a high value flattens the distribution (more random/creative), a low value sharpens it (more deterministic).
  • Top-p / top-k restrict sampling to the most likely tokens before choosing.
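The sketch below shows one way these knobs can fit together. The logits are made-up values for four candidate tokens rather than a full vocabulary, and the order used here (temperature, then top-k, then top-p) is an assumption; implementations differ in the details.

```python
import math
import random


def next_token_probs(logits, temperature=1.0, top_k=None, top_p=None):
    """Return {token: probability} after temperature, top-k and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: divide logits before the softmax.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda kv: kv[1],
        reverse=True,
    )
    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the k most likely tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, p in ranked:  # keep the smallest set whose mass reaches top_p
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept
    norm = sum(p for _, p in ranked)
    return {tok: p / norm for tok, p in ranked}


def sample(probs, rng):
    """Draw one token according to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


logits = {"Paris": 6.0, "the": 2.6, "a": 1.5, "Lyon": 1.0}  # made-up scores
for t in (0.5, 1.0, 2.0):
    print(t, round(next_token_probs(logits, temperature=t)["Paris"], 3))
print(sample(next_token_probs(logits, top_k=2), random.Random(0)))
```

Running it shows the probability of "Paris" shrinking as the temperature rises, while `top_k=1` would make the choice fully deterministic.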

This produces one token at a time. The chosen token is appended to the input and stages 3–5 repeat — the autoregressive loop — until a stop token or limit is hit.

Example (illustrative): next-token probabilities Paris 0.91, the 0.03, a 0.01… → samples "Paris".
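The autoregressive loop itself is short. In this sketch, `toy_model` is a hypothetical stand-in for the whole network: it only looks at the last token and returns a fixed next word, whereas a real model would process the full sequence and sample from probabilities.

```python
# Hypothetical lookup table standing in for the model; "<eos>" is an
# illustrative stop token.
TOY_NEXT = {
    "The": "capital", "capital": "of", "of": "France",
    "France": "is", "is": "Paris", "Paris": "<eos>",
}


def toy_model(tokens: list[str]) -> str:
    """Pretend forward pass: pick the next token from the last one."""
    return TOY_NEXT.get(tokens[-1], "<eos>")


def generate(prompt_tokens, max_new_tokens=10, stop_token="<eos>"):
    """Append one token at a time until a stop token or the limit is hit."""
    tokens = list(prompt_tokens)
    generated = []
    for _ in range(max_new_tokens):
        next_token = toy_model(tokens)
        if next_token == stop_token:
            break
        tokens.append(next_token)  # the output becomes part of the next input
        generated.append(next_token)
    return generated


print(generate(["The", "capital", "of", "France", "is"]))  # ['Paris']
print(generate(["The"], max_new_tokens=2))                 # ['capital', 'of']
```

The second call stops because of the token limit rather than a stop token, which is the other way a real generation ends.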

6 · Detokenization

Each generated token ID is mapped back to its text piece and the pieces are concatenated — the reverse of stage 2 — turning the stream of IDs back into readable text.

Example: [40, 1079, 6291] → "I am Paris" (illustrative IDs).
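A minimal sketch of that reverse lookup, assuming a byte-level vocabulary, is shown below. The ID-to-bytes table is invented for the example (it reuses the illustrative IDs above); real pieces come from the tokenizer's own vocabulary.

```python
# Made-up ID -> bytes table for illustration only.
ID_TO_BYTES = {
    40: b"I",
    1079: b" am",
    6291: b" Paris",
    900: b" caf",
    901: b"\xc3",  # first byte of "é" in UTF-8
    902: b"\xa9",  # second byte of "é"
}


def detokenize(ids: list[int]) -> str:
    """Concatenate each token's bytes, then decode the result as UTF-8."""
    data = b"".join(ID_TO_BYTES[i] for i in ids)
    # errors="replace" shows U+FFFD where a character is still incomplete.
    return data.decode("utf-8", errors="replace")


print(detokenize([40, 1079, 6291]))  # I am Paris
print(detokenize([900, 901, 902]))   #  café
print(detokenize([900, 901]))        # incomplete character shows as U+FFFD
```

The last call hints at why streaming clients sometimes hold back a token: a single character can be spread across several tokens, so the bytes are only decodable once all of them have arrived.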

7 · Output

The detokenized text is streamed back to you, usually token by token, which is why responses appear gradually. Total cost = input tokens × the model's input rate + output tokens × its output rate, counting every output token the loop produced.
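The cost arithmetic can be written out as a small function. The prices and token counts below are hypothetical placeholders, and per-million-token pricing is an assumption about how rates are quoted; check the provider's current price list for real numbers.

```python
def call_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one call, with input and output tokens priced separately."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# Hypothetical numbers: 1,200 prompt tokens, 350 generated tokens,
# $2.50 per million input tokens and $10.00 per million output tokens.
cost = call_cost(1200, 350, 2.50, 10.00)
print(f"${cost:.4f}")  # $0.0065
```

In this example the 350 output tokens cost more than the 1,200 input tokens, because the assumed output rate is four times the input rate.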

Numbers in the stage examples (vector sizes, probabilities, IDs) are illustrative to show the shape of each step — only the live demo at the top uses a real tokenizer. Real models differ in dimensions, layer counts, and vocabulary.
