This site gives a behind-the-scenes look at how LLMs work.
We'll start by covering the few core concepts that underpin the tech behind LLMs, with simple interactive demos to get an intuition for how they work.
Once that's done, we'll look at the full structure of an LLM and how it is trained.
When we provide a prompt to an LLM, the first thing that happens is that the prompt is broken down into tokens.
Much of the time, this just means breaking the prompt down into individual words and punctuation characters. Sometimes, though, particularly for obscure words, longer words, or proper nouns, a word is broken down into sub-word chunks. This mainly helps the LLM handle unknown or rare words, and it also keeps the vocabulary (i.e. the number of tokens the LLM needs to know) down to a manageable size.
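To make the word versus sub-word split concrete, here is a minimal sketch of a greedy longest-match tokeniser. The vocabulary `VOCAB` and the helper names are made up for illustration. Real LLM tokenisers learn their vocabulary from data (for example with byte-pair encoding), so treat this as a simplified stand-in rather than how any particular model does it.

```python
import re

# Toy vocabulary (hypothetical): a few whole words plus some sub-word pieces.
VOCAB = {"the", "cat", "sat", "un", "believ", "able", "token", "is", "ation", ".", ","}

def split_word(word, vocab):
    """Greedily split one word into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # Nothing matched: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces

def tokenize(prompt, vocab=VOCAB):
    """Split a prompt into words and punctuation, then into vocabulary pieces."""
    words = re.findall(r"\w+|[^\w\s]", prompt.lower())
    tokens = []
    for word in words:
        tokens.extend(split_word(word, vocab))
    return tokens

print(tokenize("The cat sat."))                # ['the', 'cat', 'sat', '.']
print(tokenize("Unbelievable tokenisation."))  # ['un', 'believ', 'able', 'token', 'is', 'ation', '.']
```

Common words come through whole, while the rarer words are covered by a handful of reusable pieces, which is why the vocabulary can stay small.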
Once a prompt has been converted to tokens, each token is then represented as a point in a high-dimensional geometrical space. This is because LLMs are neural networks, and neural networks need mathematical objects to work with instead of text data.
This process is called token embedding.
In theory, you could assign any old point in space to each token at random, and it would be a valid token embedding. But the interesting thing about LLM token embeddings is that, as a result of training, words with similar meanings end up near each other in space. So, for example, king, queen, and prince would end up close to one another, and cat, dog, and rabbit would end up close to one another, but the two groups of points would be far apart.
Try it out below. Here we embed tokens in 3D space to illustrate the concept. Real LLMs use much higher dimensional spaces, but the principle is the same.
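The same idea can be sketched in a few lines of code. The 3D coordinates below are hand-picked for illustration and do not come from a trained model; `EMBEDDINGS`, `distance`, and `nearest` are hypothetical names.

```python
import math

# Hand-picked 3D points (hypothetical values, not from a trained model).
EMBEDDINGS = {
    "king":   (0.9, 0.8, 0.1),
    "queen":  (0.8, 0.9, 0.1),
    "prince": (0.7, 0.6, 0.2),
    "cat":    (0.1, 0.2, 0.9),
    "dog":    (0.2, 0.1, 0.9),
    "rabbit": (0.3, 0.3, 0.7),
}

def distance(a, b):
    """Straight-line (Euclidean) distance between two token embeddings."""
    return math.dist(EMBEDDINGS[a], EMBEDDINGS[b])

def nearest(token):
    """Return the other token whose embedding is closest to the given one."""
    others = [t for t in EMBEDDINGS if t != token]
    return min(others, key=lambda t: distance(token, t))

print(nearest("king"), nearest("cat"))
print(distance("king", "queen") < distance("king", "cat"))
```

With these made-up points, each token's nearest neighbour sits in its own group, and distances within a group are much smaller than distances between groups.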
Once a prompt has been tokenised and embedded, the next key step is self-attention. This is the mechanism that allows each token to look at all the other tokens in the sequence and decide which ones are most relevant to it.
The point of this is that, without self-attention, a neural network struggles to relate tokens that are far apart in the sequence but that give each other important context. Consider this example:
The book that I borrowed from the library last week, despite its damaged cover and missing index, was surprisingly useful.
The words book and useful are far apart, but they need to be considered together to understand the core meaning of the sentence:
The book was useful
Self-attention is a way to model this within the neural network. As the model undergoes training, it learns to enrich each token's embedding with the embeddings of the other tokens that matter most for understanding it. We do this using a self-attention module.
A self-attention module has many attention heads running in parallel, each learning to track a different kind of relationship — some focus on grammar, some on meaning, some on position.
Switch between the attention heads below to see how each one tracks a different kind of relationship. You can also click a word to select it and see how the amber highlight — showing what it attends to — shifts.
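A single attention head can be sketched roughly as follows. This is a deliberately stripped-down version: it uses the token embeddings directly as queries, keys, and values, whereas a real head first passes them through learned projection matrices. The three tokens and their vectors are invented for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Scaled dot-product self-attention without learned projections (a simplification)."""
    dim = len(embeddings[0])
    all_weights, enriched = [], []
    for query in embeddings:
        # Score this token against every token in the sequence, itself included.
        scores = [dot(query, key) / math.sqrt(dim) for key in embeddings]
        weights = softmax(scores)
        # The enriched embedding is a weighted average of all the embeddings.
        mixed = [sum(w * value[i] for w, value in zip(weights, embeddings))
                 for i in range(dim)]
        all_weights.append(weights)
        enriched.append(mixed)
    return all_weights, enriched

tokens = ["book", "library", "useful"]
vectors = [            # hypothetical embeddings
    [1.0, 0.0, 1.0],   # book
    [0.0, 1.0, 0.0],   # library
    [1.0, 0.2, 0.8],   # useful
]

weights, enriched = self_attention(vectors)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence. With these vectors, useful puts more weight on book than on library, which is the kind of long-range link the example sentence needs.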
After the self-attention module comes the feed-forward layer. Once self-attention has enriched each token with information from other tokens in the input, the feed-forward layer re-interprets each token in terms of learned features, some of which may correspond to higher-level concepts or abstractions.
Which features get picked out for a given token is decided by an activation function. It acts like a switch, turning each hidden feature on or off (or somewhere in between) depending on the input — which is what lets different tokens light up different features in the diagram below.
Pick an input token below to see which learned feature it triggers. Notice that different tokens light up different concept units — and that tokens sharing a concept (like cat and dog) activate the very same hidden unit.
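The behaviour described above can be sketched as a tiny feed-forward layer. Everything here is an assumption made for illustration: the weights are set by hand rather than learned, the two hidden units are simply labelled "animal" and "royalty", and ReLU is used as the activation function (it is one common choice, and many LLMs use smoother variants such as GELU).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def relu(x):
    """Activation function: pass positive values through, switch negatives off."""
    return max(0.0, x)

def feed_forward(x, w1, b1, w2, b2):
    """Expand the token into hidden features, then project back to embedding size."""
    hidden = [relu(dot(row, x) + b) for row, b in zip(w1, b1)]
    output = [dot(row, hidden) + b for row, b in zip(w2, b2)]
    return hidden, output

# Hand-set weights (hypothetical; a real model learns these during training).
W1 = [[-1.0, -1.0, 1.0],   # hidden unit 0: "animal"
      [1.0, 1.0, -1.0]]    # hidden unit 1: "royalty"
B1 = [0.0, 0.0]
W2 = [[0.0, 0.5],
      [0.0, 0.5],
      [0.5, 0.0]]
B2 = [0.0, 0.0, 0.0]

TOKENS = {                 # hypothetical 3D embeddings
    "cat":  [0.1, 0.2, 0.9],
    "dog":  [0.2, 0.1, 0.9],
    "king": [0.9, 0.8, 0.1],
}

for name, vec in TOKENS.items():
    hidden, _ = feed_forward(vec, W1, B1, W2, B2)
    print(name, [round(h, 2) for h in hidden])
```

With these weights, cat and dog switch on the same hidden unit while king switches on the other one. The activation function is what zeroes out the unit that does not apply to a given token.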