The AI industry has spent the last few years convincing us that training a language model requires a massive server farm and millions of dollars. They want developers hooked on their APIs. They want us to treat these models like magical black boxes that only the big tech companies have the resources to build.
But Andrej Karpathy just completely undermined that narrative with his new open source project. You can actually build your own conversational AI pipeline for the cost of a nice dinner. It is called NanoChat, and it is the most honest look at AI development I have seen in a long time.
The problem with modern AI development
Right now most developers working in the artificial intelligence space are just routing text to an API endpoint. We send a prompt to OpenAI or Anthropic, wait for a response, and parse the JSON. It is useful work, but it is not machine learning. We are effectively renting someone else's brain.
When you only interact with models through an API, you lose all intuition for how they actually work. You do not see the tokenization quirks. You miss the mechanics of the attention layers. You have no idea what the loss curve looked like during training.
This creates a massive knowledge gap. The people building the foundation models understand the physics of AI. Everyone else is just playing with the user interface. We have an entire generation of software engineers who know how to prompt a model but have absolutely no idea what happens under the hood when they hit submit.
What exactly is NanoChat
Karpathy describes NanoChat as the best ChatGPT that $100 can buy. It is a complete pipeline for training a conversational model. He released it in late 2025, and it has already become the default starting point for engineers who want to understand how these systems operate.
Unlike massive corporate repositories that hide the core logic behind layers of enterprise abstractions, NanoChat is painfully simple. It gives you the raw PyTorch code to go from a pile of text documents to a working web interface where you can chat with your creation.
It builds on his previous work with nanoGPT, but it takes things a step further. Instead of just pretraining a base model to predict the next word, NanoChat walks you through the entire process of making that model useful. It includes the steps to make it actually respond to questions instead of just rambling.
Breaking down the pipeline
The genius of NanoChat is how it exposes every single step of the model creation process. Most tutorials stop at the model architecture. Karpathy forces you to look at the entire data lifecycle.
First, you have to deal with the tokenizer. This is the piece of code that chops words up into little numbers the model can digest. You get to see exactly why language models struggle with spelling or math, because you literally watch the code compress the text into awkward chunks.
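The core idea can be sketched with a toy byte-pair-encoding step. This is a simplified illustration of how a BPE tokenizer merges frequent pairs, not NanoChat's actual tokenizer code:

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair, new_token):
    """Replace every occurrence of `pair` with a single `new_token`."""
    out, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

# Start from individual characters, the way byte-level BPE starts from raw bytes.
tokens = list("low lower lowest")
for step in range(3):
    pair = most_frequent_pair(tokens)
    tokens = merge_pair(tokens, pair, "".join(pair))
print(tokens)
```

After a few merges, "low" becomes a single token while rarer suffixes stay fragmented. That greedy, frequency-driven chunking is exactly why a model can "know" a word without being able to spell it letter by letter.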
Then comes the pretraining phase. This is where the model reads gigabytes of internet text and learns the basic structure of human language. It is a brute force statistical exercise.

After that, the project walks you through the instruction tuning phase. This is the magic step that turns a text predictor into a helpful assistant. You see the exact format of the conversations used to teach the model how to be polite and answer questions directly.
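In spirit, instruction tuning just means rendering conversations into flat strings with role markers and training on them. Here is a minimal sketch; the special tokens below are illustrative placeholders, not NanoChat's actual schema:

```python
def render_chat(messages):
    """Flatten a conversation into one training string with role markers.

    The <|role|> and <|end|> tokens here are made-up placeholders to show
    the idea; real projects define their own special-token vocabulary.
    """
    return "".join(
        f"<|{msg['role']}|>{msg['content']}<|end|>" for msg in messages
    )

conversation = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
text = render_chat(conversation)
print(text)
```

During fine-tuning, the loss is typically masked so the model is only graded on predicting the assistant's tokens, which is what teaches it to answer rather than to continue the user's question.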
Finally, you get a simple web interface. You can type a message in your browser and watch your custom model generate a response, token by token.
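The token-by-token effect is just a generator loop on the server side. This sketch uses a canned stand-in for the model's next-token sampler, so it shows the streaming shape without any real inference:

```python
def fake_next_token(prefix):
    """Stand-in for a real model's next-token sampler (illustrative only)."""
    canned = ["Hello", ",", " world", "!", "<eos>"]
    return canned[len(prefix)]

def stream_tokens(max_tokens=16):
    """Yield tokens one at a time, the way a chat UI receives them."""
    generated = []
    for _ in range(max_tokens):
        token = fake_next_token(generated)
        if token == "<eos>":  # stop token ends the reply
            break
        generated.append(token)
        yield token

reply = "".join(stream_tokens())
print(reply)  # → Hello, world!
```

A real web interface does the same thing over HTTP: sample a token, flush it to the browser, repeat until the stop token appears.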
The $100 training run
The most compelling part of this project is the accessibility. You do not need to raise venture capital to run this code.
The entire pipeline is designed to run on a single node with eight NVIDIA H100 GPUs. Renting one of those nodes from a cloud provider costs roughly $100 for the time it takes to complete the training run. Recent updates to the project even show how you can train a GPT-2 level model in just two hours.
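The back-of-envelope math is worth making explicit. The hourly rate below is an assumption based on typical cloud prices for an 8x H100 node, not a quote from the project:

```python
# Assumed figures, not official numbers: ~$24/hour for an 8x H100 node,
# and roughly a 4-hour run for the smallest training tier.
node_rate_per_hour = 24.0
run_hours = 4.0
cost = node_rate_per_hour * run_hours
print(f"${cost:.0f}")  # → $96, right around the $100 figure
```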
I keep thinking about the developers who are currently in university or just starting their careers. For $100, they can get hands-on experience training an entire language model. They get to watch the loss go down in real time. They get to see exactly what happens when you mess up the learning rate. That kind of education used to require getting hired at a massive research lab.
Why building from scratch matters
I know what some people are thinking. Why would I spend $100 to train a weak model when I can use the latest frontier models for pennies?
The point is not to build a production model that replaces Claude or Gemini. The point is to build a mental model. When you write the code that converts words into numbers, you stop thinking of the AI as a thinking machine. You start seeing it as a statistical engine.
This changes how you build applications on top of AI. When you understand the underlying math, you write better prompts. You understand why the model hallucinates certain facts but gets others right. You can guess how a model will fail before you even run the test. You gain an intuition that cannot be learned by reading documentation or watching tech talks.
Official Links
- GitHub Repository: https://github.com/karpathy/nanochat
Conclusion
The era of treating AI like magic is ending. The tools to build these systems are becoming smaller, cheaper, and easier to understand. If you have been relying exclusively on APIs to build your applications, take a weekend to run NanoChat. Spend the $100. It is the best investment you can make in your understanding of the technology that is driving the software industry forward.