
Model Surgery for New Tokens

When you add new special tokens to a pre-trained model, you need to make some adjustments to the model's parameters. In particular:

  1. The embedding matrix needs to be resized to accommodate the new tokens. Typically, the new rows are initialized with small random values.
  2. The final output layer (often called the "LM head" in GPT-style models) needs to be resized to output logits for the new tokens.

This process is sometimes called "model surgery". It's a bit delicate, as you need to make sure the new parameters are correctly integrated with the existing ones. Fortunately, most modern frameworks (like Hugging Face's transformers library) make this easy: you register the new special tokens with the tokenizer, then resize the model's embeddings to match the new vocabulary size.
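The two adjustments above can be sketched with NumPy; the shapes and variable names here are illustrative, not taken from any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, d_model, n_new = 50257, 768, 3  # GPT-2-like sizes, 3 new tokens

# Stand-ins for the pre-trained parameters.
wte = rng.normal(0, 0.02, size=(vocab_size, d_model))      # token embedding matrix
lm_head = rng.normal(0, 0.02, size=(vocab_size, d_model))  # output projection ("LM head")

# 1. Resize the embedding matrix: append one row per new token,
#    initialized with small random values.
new_embed_rows = rng.normal(0, 0.02, size=(n_new, d_model))
wte = np.concatenate([wte, new_embed_rows], axis=0)

# 2. Resize the LM head so it produces logits for the new tokens too.
new_head_rows = rng.normal(0, 0.02, size=(n_new, d_model))
lm_head = np.concatenate([lm_head, new_head_rows], axis=0)

print(wte.shape)      # (50260, 768)
print(lm_head.shape)  # (50260, 768)
```

In the transformers library, the equivalent is calling `tokenizer.add_special_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`. Note also that in GPT-2-style models the embedding matrix and the LM head are often weight-tied (they share the same tensor), in which case resizing one resizes both.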

That covers the basics of special tokens in the GPT tokenizers. In the next section, we'll put everything we've learned together and write our own complete GPT-style tokenizer from scratch!


Last update: 2024-08-21