nanoGPT KV cache first attempt

1. Run basic nano-gpt

git clone https://github.com/karpathy/nanoGPT.git

Install necessary packages

pip install -r requirements.txt

My requirements.txt contains these packages:

blobfile==2.0.1
certifi==2022.12.7
charset-normalizer==3.0.1
filelock==3.9.0
idna==3.4
lxml==4.9.2
numpy==1.24.2
pycryptodomex==3.17
pytz==2022.7.1
regex==2022.10.31
requests==2.28.2
tokenizers==0.13.2
torch==2.0.0
typing_extensions==4.4.0
urllib3==1.26.14
transformers==4.28.1
datasets==2.11.0
tiktoken==0.3.3
wandb==0.14.2
tqdm==4.65.0

Follow the quick start guide in the nanoGPT repo to make sure that training and inference both run successfully.

python data/shakespeare_char/prepare.py
python train.py --compile=False config/train_shakespeare_char.py
python sample.py --out_dir=out-shakespeare-char

My Python version is 3.11, which torch.compile in torch 2.0.0 does not support, so I added --compile=False to the training command.
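The flag can also be chosen automatically by checking the interpreter version before building the command. This is a minimal sketch, assuming torch 2.0.x (the version pinned in the requirements above), where torch.compile is not available on Python 3.11 and later. The helper names `supports_compile` and `train_command` are hypothetical and not part of nanoGPT.

```python
import sys


def supports_compile(python_version=None):
    """Return True if torch.compile is expected to work on this interpreter.

    Assumption: torch 2.0.x, which rejects Python 3.11 and newer.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    return tuple(python_version) < (3, 11)


def train_command(config="config/train_shakespeare_char.py", python_version=None):
    """Build the training command, adding --compile=False only when needed."""
    parts = ["python", "train.py"]
    if not supports_compile(python_version):
        parts.append("--compile=False")
    parts.append(config)
    return " ".join(parts)


# Prints the command suited to the interpreter running this script.
print(train_command())
```

With a newer torch release the version threshold would need to be adjusted, so treat the `(3, 11)` cutoff as specific to this setup.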

On my A800 GPU, the training loss reaches 0.0449 after 5000 iterations.

iter 4970: loss 0.0461, time 18.12ms, mfu 20.21%
iter 4980: loss 0.0441, time 18.14ms, mfu 20.24%
iter 4990: loss 0.0464, time 18.13ms, mfu 20.27%
step 5000: train loss 0.0383, val loss 4.7262
iter 5000: loss 0.0449, time 3352.84ms, mfu 18.26%
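To compare numbers across runs, it helps to parse these lines instead of reading them by eye. Below is a minimal sketch, assuming the log format is exactly what is printed above. `parse_log` is a hypothetical helper, not something nanoGPT provides.

```python
import re

LOG = """\
iter 4970: loss 0.0461, time 18.12ms, mfu 20.21%
iter 4980: loss 0.0441, time 18.14ms, mfu 20.24%
iter 4990: loss 0.0464, time 18.13ms, mfu 20.27%
step 5000: train loss 0.0383, val loss 4.7262
iter 5000: loss 0.0449, time 3352.84ms, mfu 18.26%
"""

ITER_RE = re.compile(r"iter (\d+): loss ([\d.]+), time ([\d.]+)ms, mfu ([\d.]+)%")
EVAL_RE = re.compile(r"step (\d+): train loss ([\d.]+), val loss ([\d.]+)")


def parse_log(text):
    """Split the log into per-iteration records and evaluation records."""
    iters, evals = [], []
    for line in text.splitlines():
        m = ITER_RE.match(line)
        if m:
            iters.append({"iter": int(m[1]), "loss": float(m[2]),
                          "time_ms": float(m[3]), "mfu": float(m[4])})
            continue
        m = EVAL_RE.match(line)
        if m:
            evals.append({"step": int(m[1]), "train": float(m[2]),
                          "val": float(m[3])})
    return iters, evals


iters, evals = parse_log(LOG)
last_eval = evals[-1]
gap = last_eval["val"] - last_eval["train"]
print(f"step {last_eval['step']}: val - train = {gap:.4f}")
```

On this log the validation loss is far above the training loss. The 3352.84 ms at iteration 5000 is probably because that step's timing also covers the evaluation pass.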

2. Load GPT-2 models checkpoints and test performance

Downloading the GPT-2 checkpoint from Hugging Face failed for me with an SSL/proxy error. The same problem is discussed here:

https://stackoverflow.com/questions/75110981/sslerror-httpsconnectionpoolhost-huggingface-co-port-443-max-retries-exce

https://github.com/huggingface/transformers/issues/17611

First, downgrade requests to version 2.27.1:

pip install requests==2.27.1

Then adding these two lines to train.py and sample.py fixed the proxy connection issue for me:

os.environ['CURL_CA_BUNDLE'] = ''
os.environ['HF_ENDPOINT']= 'https://hf-mirror.com'

Run sample.py to test the GPT-2 model with parameters downloaded from Hugging Face:

python sample.py --init_from='gpt2'

I started with the prompt “please tell me a joke”. The output is nothing like a joke, but it is still very readable.

please tell me a joke

[…]

My name is Zarek, but I am extremely sad for you.

You can't even come to my house anymore

I'm sorry, I know

I have a dream

I don't know how long this thing will last

My name Is Zarek

I'm an adult who believes that

The problem with your friend is that he doesnt know

He doesn't know how to act

Running time for 10 inference runs:

---------------
Elapsed time: 25.4s
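A wall-clock number like this can be collected with a small timing wrapper. This is only a sketch: `generate_once` is a hypothetical placeholder for one call to the model's generation routine, and `time_runs` is a made-up helper name. On a GPU the clock should only be read after the device has finished its work, for example after torch.cuda.synchronize().

```python
import time


def time_runs(fn, repeats=10):
    """Call fn `repeats` times and return the total wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    # On a GPU, synchronize the device here before reading the clock.
    return time.perf_counter() - start


def generate_once():
    # Hypothetical placeholder: stands in for one sampling call.
    sum(i * i for i in range(10_000))


elapsed = time_runs(generate_once, repeats=10)
print("---------------")
print(f"Elapsed time: {elapsed:.1f}s")
```

Keeping the timing code identical before and after adding the KV cache makes the two numbers directly comparable.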

3. Implement KV cache for faster inference
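The basic idea of a KV cache is that during autoregressive generation the keys and values of earlier tokens never change, so they can be stored and only the newest token needs to be projected at each step. Below is a minimal single-head sketch in plain Python. It is not nanoGPT's code: `CachedHead`, `full_recompute` and the tiny random weights are all made up for illustration. It checks that the cached path gives the same output as recomputing everything from scratch.

```python
import math
import random


def project(x, W):
    """Multiply row vector x by matrix W (given as a list of rows)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def attend(q, keys, values):
    """Softmax attention of one query over all given keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


class CachedHead:
    """One attention head that keeps past keys and values between steps."""

    def __init__(self, Wq, Wk, Wv):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv
        self.keys, self.values = [], []  # the KV cache

    def step(self, x):
        """Process one new token embedding, reusing the cached keys and values."""
        self.keys.append(project(x, self.Wk))
        self.values.append(project(x, self.Wv))
        return attend(project(x, self.Wq), self.keys, self.values)


def full_recompute(xs, Wq, Wk, Wv):
    """Baseline without a cache: re-project the whole prefix every step."""
    keys = [project(x, Wk) for x in xs]
    values = [project(x, Wv) for x in xs]
    return attend(project(xs[-1], Wq), keys, values)


def random_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
dim = 4
Wq, Wk, Wv = (random_matrix(dim, dim) for _ in range(3))
tokens = random_matrix(5, dim)  # five made-up token embeddings

head = CachedHead(Wq, Wk, Wv)
for t in range(len(tokens)):
    cached = head.step(tokens[t])
    baseline = full_recompute(tokens[: t + 1], Wq, Wk, Wv)
    assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(cached, baseline))
print("cached and recomputed outputs match for", len(tokens), "steps")
```

The cache grows by one key and one value per generated token, so the projection work per step stays constant, while the attention itself still scans the whole cache. A real implementation would keep one such cache per layer and per head, stored as tensors.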

4. Test KV cache performance

References

YouTube video: LLM KV cache explanation

requirements.txt to run nanoGPT

nanoGPT KV cache PR example

Hugging Face transformers KV cache source code on GitHub

https://zhuanlan.zhihu.com/p/646577898

https://zhuanlan.zhihu.com/p/624740065

Hugging Face transformers API documentation



