How 8000 lines of code finally made language models make sense to me

Language models feel like magic until you see the code. Andrej Karpathy's NanoChat project reduces the entire AI training pipeline to just 8000 lines of PyTorch.

I used to look at large language models as impenetrable black boxes. I knew the high-level concepts like attention mechanisms and gradient descent, but the actual day-to-day mechanics felt completely out of reach. I assumed the code powering modern AI was a tangled web of millions of lines of proprietary logic.

Then I spent a weekend reading through NanoChat. It turns out that when you strip away the corporate wrappers and production boilerplate, an entire artificial intelligence pipeline fits into roughly 8000 lines of PyTorch. Reading it completely changed my perspective on what these systems actually are.

The abstraction trap in software

Software engineering loves abstractions. We hide complex systems behind simple interfaces so we do not have to think about them. This is generally a good thing, because nobody wants to write assembly language just to render a web page.

But when it comes to machine learning, these abstractions have become a trap. The major AI labs have wrapped their models in so many layers of API clients and deployment frameworks that the underlying math is completely obscured. When a developer starts learning AI today, they usually just learn how to format a JSON request.

This creates a false sense of complexity. We assume that because the outputs of these models are so impressive, the underlying code must be impossibly complicated. NanoChat proves that the core logic is actually quite small. The complexity comes from the scale of the data and the hardware, not the lines of code.

Reading the NanoChat codebase

Andrej Karpathy released NanoChat in late 2025 as an open source project aimed at education. He previously built nanoGPT, which was a brilliant but limited look at pretraining. NanoChat is the full stack. It is the entire journey from raw text to a chat interface.

The best way to experience this project is not to run it immediately, but to read the code from top to bottom.

Because it is written in PyTorch, it reads almost like regular Python. You can follow the data as it flows through the system. You see the exact mathematical operations that take a word, turn it into a vector, and multiply it against billions of parameters to guess the next word. There is no magic here. It is just matrix multiplication wrapped in standard programming loops.
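That "matrix multiplication wrapped in loops" claim is easy to see in miniature. The sketch below is purely illustrative, not NanoChat's actual code, and uses tiny toy sizes: a token ID is looked up as a vector, multiplied against a weight matrix, and the largest score becomes the guess for the next word.

```python
import torch

# Toy sizes for illustration; real models use vocabularies of ~50k tokens
# and hidden sizes in the thousands.
vocab_size, d_model = 1000, 64

embedding = torch.randn(vocab_size, d_model)  # one learned vector per token
lm_head = torch.randn(d_model, vocab_size)    # projects hidden state back to vocabulary

token_id = torch.tensor([42])                 # some input token ID
hidden = embedding[token_id]                  # look up its vector: shape (1, d_model)
logits = hidden @ lm_head                     # matrix multiply: shape (1, vocab_size)
next_token = logits.argmax(dim=-1)            # the model's "guess" for the next word
```

A real transformer stacks attention and feed-forward layers between the lookup and the final projection, but the shape of the computation is exactly this.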

From tokenizer to web interface

The 8000 lines of code are not just the neural network architecture. That number includes the entire lifecycle of the application.

You start with the tokenizer code. This section demystifies one of the most confusing parts of working with AI. You get to see the algorithms that decide how to chop up a sentence, and suddenly it makes complete sense why language models are so bad at tasks that require counting letters.
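The letter-counting weakness falls out of the tokenizer directly. Here is a toy greedy longest-match tokenizer with a hand-built vocabulary (real BPE tokenizers, like NanoChat's, learn their merges from data; this is only a sketch of the idea):

```python
# A hand-built mini-vocabulary: two common chunks plus single-character fallbacks.
vocab = {"straw": 101, "berry": 102}
vocab.update({ch: ord(ch) for ch in "strawbery"})

def tokenize(text, vocab):
    tokens = []
    i = 0
    while i < len(text):
        # take the longest vocabulary entry that matches at position i
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

print(tokenize("strawberry", vocab))  # ['straw', 'berry']
```

The model never sees the ten letters of "strawberry"; it sees two opaque chunks. Asking it to count the r's is asking it about structure the tokenizer threw away.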

Then you read the training loop. This is where the model actually learns. You see the code that calculates the error rate and updates the weights. It is repetitive, mechanical, and surprisingly straightforward.
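The shape of that loop is worth seeing. This schematic swaps NanoChat's transformer for a stand-in linear layer and real text batches for random tensors, but the forward-loss-backward-step rhythm is the same:

```python
import torch

# Schematic training loop; NanoChat's real loop has the same skeleton.
model = torch.nn.Linear(64, 1000)             # stand-in for the transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = torch.nn.CrossEntropyLoss()

for step in range(20):
    inputs = torch.randn(32, 64)              # a batch of token representations
    targets = torch.randint(0, 1000, (32,))   # the "next tokens" we should predict
    logits = model(inputs)                    # forward pass
    loss = loss_fn(logits, targets)           # how wrong were we?
    optimizer.zero_grad()
    loss.backward()                           # compute gradients
    optimizer.step()                          # nudge every weight slightly
```

Everything else in a production trainer (checkpointing, learning-rate schedules, distributed data loading) is scaffolding around these six lines.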

Finally, the codebase includes the inference logic and a small web server. You see exactly how the model takes a prompt from a user, feeds it through the network, and streams the generated text back to the browser. You realize that ChatGPT is just a really fast autocomplete loop running on a very large spreadsheet of numbers.
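That "autocomplete loop" can be sketched in a dozen lines. Here `model` is a placeholder callable returning logits of shape (batch, seq_len, vocab_size); it is not NanoChat's actual interface, just the general pattern:

```python
import torch

def generate(model, token_ids, max_new_tokens=20, temperature=0.8):
    for _ in range(max_new_tokens):
        logits = model(token_ids)[:, -1, :]                 # scores for the next token only
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)   # sample one token
        token_ids = torch.cat([token_ids, next_id], dim=1)  # append and loop again
    return token_ids

# Dummy "model" so the sketch runs end to end: random logits over 10 tokens.
dummy = lambda ids: torch.randn(ids.shape[0], ids.shape[1], 10)
out = generate(dummy, torch.zeros(1, 1, dtype=torch.long), max_new_tokens=5)
print(out.shape)  # torch.Size([1, 6])
```

Streaming to the browser is just yielding each new token as it comes off this loop instead of waiting for the whole sequence.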

Training a GPT-2 level model in two hours

Reading the code is one thing, but running it is where the concepts really solidify. The project is optimized to run on an 8xH100 GPU node. This sounds expensive, but it translates to about $100 in cloud computing costs.

Recent updates to the codebase mean you can train a model roughly equivalent to the original GPT-2 in just two hours.

I genuinely do not know how to feel about the fact that what used to be a massive research breakthrough can now be replicated over a lunch break by a curious developer. But watching those loss numbers drop in your own terminal is a profound experience. You are not just downloading a model. You are watching a statistical system slowly build an understanding of language from scratch.

The beauty of minimal implementations

We need more projects like NanoChat in the software industry. Production code is necessary for running businesses, but it is terrible for learning.

When you learn from a minimal implementation, you grasp the physics of the system. You start to understand the hard limits of what language models can and cannot do. You stop expecting them to reason like humans and start treating them like the advanced pattern matching engines they are.

If you write software and you feel intimidated by the current pace of AI development, I highly recommend blocking off a weekend. Grab a coffee, open the NanoChat repository, and just start reading. It will take the magic out of AI and replace it with something much better: understanding.

  • GitHub Repository: https://github.com/karpathy/nanochat

Conclusion

The barrier to understanding language models has never been lower. You do not need a PhD to grasp the core concepts of machine learning anymore. You just need the patience to read 8000 lines of Python and the willingness to look past the hype. NanoChat is a gift to the engineering community, and studying it will make you a significantly better developer in the AI era.

Frequently Asked Questions

Why is NanoChat only 8000 lines of code?

It is designed to be a minimal, educational implementation of a language model training pipeline, stripping away the complex abstractions found in production codebases.

What programming language does NanoChat use?

The project is written entirely in Python using the PyTorch library.

Can I run NanoChat on my laptop?

While you can read and study the code on a laptop, actually training the model requires access to GPUs, typically a cloud node with H100s.

What does the code cover?

It includes everything from the tokenizer and dataset processing to the transformer architecture and the final web interface.

Is this related to nanoGPT?

Yes, NanoChat builds upon the ideas of nanoGPT but expands the scope from just pretraining to a full conversational chatbot pipeline.

Do I need advanced math to understand the code?

Basic linear algebra and calculus help, but the code is structured to be readable for software engineers who want to learn the mechanics of AI.

How is NanoChat different from nanoGPT?

While nanoGPT focused strictly on pretraining a base model, NanoChat covers the entire pipeline including instruction tuning, which is what actually makes the model conversational.
