Introduction
I’ve always wanted to build a GPT-2 model from scratch, and this is my attempt at it. This repository contains a fully working GPT-2 model that generates a text completion of a specified length from a given prompt.
Update of Dec 15, 2024:
Created and tested the code that pretrains the model, and it works. There are some caveats, though: I trained the model on a very small dataset, the-verdict.txt, which has barely 18,000 characters in total, so as you can probably infer, the resulting weights are not very good. Training the model on a larger dataset with more compute is definitely recommended if you can, but I am extremely GPU-poor, and the free GPU resources I tried (Kaggle in particular) are limited. You can take a look at the notebook I published on Kaggle here, which uses the tiny-textbooks dataset from Hugging Face.
There is also another option: loading pretrained weights from OpenAI’s GPT-2 model. I have loaded my model with the GPT-2 weights and it works very well. See the my-model-w-oai-weights.ipynb notebook for how I did it.
This is how it looks:

Setup and Installation
1. To start, use the terminal to go to the directory where you want to clone the repository and clone it:
git clone https://gitlab.com/sumitdoesml/gpt2-from-scratch.git
2. Navigate inside the repository:
cd gpt2-from-scratch
3. Create a virtual environment:
python -m venv .venv
# or if you are using uv
uv venv --python 3.12
4. Activate the virtual environment:
source .venv/bin/activate
5. Finally, install all the dependencies:
pip install -r requirements.txt
# or if you are using uv
uv pip install -r requirements.txt
Note: pip should be installed automatically when you create a virtual environment, but if your terminal throws errors, you can also install it manually.
And that’s it! We have set up the environment.
Running the Code
With OpenAI Weights
Please go to my-model-w-oai-weights.ipynb to see how to load the pretrained weights from OpenAI’s GPT-2 model and generate text.
With Your Own Pretrained Weights
For this, you will first need to train the model on your own dataset. You can do this by configuring your dataset and training parameters in the train.py file. There are plenty of datasets available online that you can use, but pretty much all of them require GPUs to pretrain the model to a half-decent level. I am extremely GPU-poor, so I could only do it on Kaggle (see the Kaggle notebook I published).
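As a rough sketch of what a pretraining loop like the one in train.py does (the model, data, and hyperparameters below are placeholders for illustration, not the repository's actual code):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the GPT-2 model defined in this repository.
class TinyLM(nn.Module):
    def __init__(self, vocab_size=256, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, idx):
        return self.head(self.embed(idx))  # (batch, seq, vocab)

torch.manual_seed(0)
model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Toy "dataset": random token ids; a real run would tokenize and chunk
# a text corpus into input/target pairs shifted by one position.
inputs = torch.randint(0, 256, (4, 16))
targets = torch.randint(0, 256, (4, 16))

for step in range(5):
    logits = model(inputs)
    # Flatten (batch, seq, vocab) -> (batch*seq, vocab) for cross-entropy
    loss = nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.3f}")
```

The real training script does the same thing at a larger scale: batches from your dataset, the full GPT-2 architecture, and many more steps.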
Once the model is trained, you can load the weights into it (see my-model-w-oai-weights.ipynb as well as the end of train.py for how to do this) and then use generatetext.py to generate text.
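Saving and reloading the trained weights follows the standard PyTorch state-dict pattern (the filename and model class here are illustrative, not the repository's exact names):

```python
import torch
import torch.nn as nn

# Illustrative model; substitute the GPT-2 model class from this repository.
model = nn.Linear(8, 8)
torch.save(model.state_dict(), "model_weights.pt")

# Later (e.g. in generatetext.py), rebuild the same architecture
# and load the saved weights into it.
restored = nn.Linear(8, 8)
restored.load_state_dict(torch.load("model_weights.pt", weights_only=True))
restored.eval()  # disable dropout etc. before generating text
```

Saving only the state dict (rather than the whole model object) keeps the checkpoint portable across code changes, as long as the architecture matches.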
You can change the input text and the desired length of the output in the generatetext.py file (lines 20 and 56, respectively). The model will then generate a completion of the input text at that length.
To run the code, you can use the following command:
python generatetext.py
This prints the generated text completion to the console.
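Conceptually, generation extends the prompt one token at a time, feeding the growing sequence back into the model. Here is a minimal greedy-decoding sketch; the names start_context and max_new_tokens mirror the script's parameters, but the model below is a dummy placeholder, not the repository's GPT-2:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder "model": embedding + linear head instead of the full GPT-2.
vocab_size = 50
model = nn.Sequential(nn.Embedding(vocab_size, 16), nn.Linear(16, vocab_size))
model.eval()

def generate(idx, max_new_tokens):
    # idx: (1, seq_len) tensor of token ids for the start context
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(idx)  # (1, seq, vocab)
        # Greedy decoding: pick the highest-probability next token.
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_id], dim=1)  # append and feed back in
    return idx

start_ids = torch.tensor([[1, 2, 3]])  # stand-in for tokenized start_context
out = generate(start_ids, max_new_tokens=10)
print(out.shape)  # context length 3 + 10 new tokens
```

In the actual script the start context is tokenized text, and the model's context window limits how much of the sequence is fed back in at each step.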
It looks something like the following (based on start_context="Happy birthday to" and max_new_tokens=200):

This is a screenshot of the model without any pretraining, so the output is not very good. I did all this to understand the inner workings of the model, and it was fun! For real use, though, you should train on a larger dataset with more compute, or just stick with a pretrained model. For study purposes, however, doing something like this is definitely a good idea.
References
This has been made possible largely due to the book “Build a Large Language Model from Scratch” by Sebastian Raschka. God bless the man.
Additional resources that I referred to:
- Language Models are Unsupervised Multitask Learners, the original GPT-2 paper by Radford et al.
- Let’s build a GPT: from scratch, in code, spelled out by Andrej Karpathy
- Attention is all you need, the original Transformer paper by Vaswani et al.
- Understanding LLMs: A Comprehensive Overview from Training to Inference by Liu et al.
- The Transformer Family by Lilian Weng
