Rainbow Roxy

Thanks for writing this, it clarifies a lot. I'm really curious about the 'tokens' you mentioned – are they essentially numerical embeddings of text segments? So keen to read the rest.

Francis Tan

Almost, but there is an important distinction. Tokens are the text chunks themselves, while embeddings are the numerical representations of those chunks. The process goes like this: the text is first split into tokens, each token is assigned an integer ID from the model's fixed vocabulary, and the model then looks up a vector of numbers (the embedding) for each ID so it can do math on them. So tokens are the LEGO bricks you can see, and embeddings are the invisible coordinates that tell the model how each brick relates to every other brick in meaning and context.
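To make the two stages concrete, here is a minimal Python sketch. It assumes a toy whitespace tokenizer and a randomly initialized embedding table. Real models use learned subword tokenizers (such as BPE) and learned embedding values. The function names `tokenize`, `build_vocab`, and `make_embedding_table` are hypothetical and chosen only for illustration.

```python
import random

# Toy tokenizer (assumption): real models split text into subword pieces,
# but splitting on whitespace is enough to show "text -> tokens".
def tokenize(text: str) -> list[str]:
    return text.lower().split()

# Assign each distinct token an integer ID in order of first appearance
# (hypothetical helper; real vocabularies are fixed before training).
def build_vocab(tokens: list[str]) -> dict[str, int]:
    vocab: dict[str, int] = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

# One row of `dim` numbers per token ID. The values are random here;
# in a trained model they are learned so that related tokens end up close.
def make_embedding_table(vocab_size: int, dim: int, seed: int = 0) -> list[list[float]]:
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

text = "the red brick sits on the blue brick"
tokens = tokenize(text)                      # the LEGO bricks you can see
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]             # each brick gets an ID number
table = make_embedding_table(len(vocab), dim=4)
embeddings = [table[i] for i in ids]         # each ID is looked up as a vector

print(tokens)  # ['the', 'red', 'brick', 'sits', 'on', 'the', 'blue', 'brick']
print(ids)     # [0, 1, 2, 3, 4, 0, 5, 2]
print(len(embeddings), len(embeddings[0]))  # 8 4
```

In this sketch the lookup itself ignores context, so both occurrences of "the" get the same ID and the same vector. In transformer models, context is blended in by the layers that process these vectors afterward.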