This page explains how large language models (LLMs) work, from first principles, for readers who want a clear intuition without needing a background in machine learning.
It is organised around three areas: the basic concepts (tokens and embeddings), how models are trained, and how they generate text at inference time.
Before a model can process text, the text is split into tokens: short chunks that are usually sub-word pieces rather than whole words or single characters. Characters would make every sequence very long while carrying little meaning per symbol, and a vocabulary of whole words would be enormous yet still fail on rare or unseen words. Sub-word tokens strike the balance: the vocabulary stays manageable, and any text, including typos and new words, can be built from known pieces.
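To make this concrete, here is a minimal sketch of tokenisation in Python using the tiktoken library; the library choice, the cl100k_base encoding name, and the sample sentence are illustrative assumptions, not something the models require.

```python
# Minimal tokenisation sketch (assumes tiktoken is installed: pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a BPE tokenizer used by several OpenAI models

text = "Tokenisation splits text into sub-word pieces."
token_ids = enc.encode(text)                     # text -> list of integer token IDs
pieces = [enc.decode([t]) for t in token_ids]    # decode each ID back to its text piece

print(token_ids)  # a short list of integers, one per token
print(pieces)     # roughly: ['Token', 'isation', ' splits', ' text', ' into', ...]
```

Note that many tokens begin with a leading space: the tokenizer treats " splits" and "splits" as different pieces, which is part of how it reconstructs the original text exactly.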
Each word is represented as a point in a high-dimensional space, and its position encodes meaning: words with similar meanings end up close together. Below we show a 3D projection of 50-dimensional GloVe word vectors; because 50 dimensions are squashed into 3, some distance relationships are only approximate.
These are static word embeddings (GloVe). Modern LLMs use contextual embeddings — the same word gets a different vector depending on surrounding words — but the core idea of meaning as position in space is the same.
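The "nearby in space" idea can be checked numerically with cosine similarity. The sketch below assumes a local copy of the pretrained 50-dimensional GloVe vectors (the glove.6B.50d.txt file from the Stanford GloVe release); the file path and the example words are illustrative choices.

```python
# Load pretrained 50-dimensional GloVe vectors and compare words by cosine similarity.
# Assumes glove.6B.50d.txt has been downloaded into the working directory.
import numpy as np

def load_glove(path):
    """Read GloVe's plain-text format: one word per line, followed by its vector."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def cosine(a, b):
    """Cosine similarity: close to 1.0 for similar directions, near 0 for unrelated ones."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

glove = load_glove("glove.6B.50d.txt")

print(cosine(glove["king"], glove["queen"]))   # related words: relatively high
print(cosine(glove["king"], glove["banana"]))  # unrelated words: noticeably lower
```

In a contextual model the lookup table above would be replaced by vectors computed from the whole sentence, but the distance comparison works the same way.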