GPT-5.3-Codex-Spark: Why 1,000 tokens per second changes how we code

OpenAI's new Spark model hits 1,000 tokens/s on Cerebras hardware. It's not smarter, but the instant speed changes the coding workflow forever.

I remember the first time I used GitHub Copilot in 2022. It felt like magic until it didn't. You'd hit tab, wait a beat, and watch the gray text ghost in. It was faster than typing, sure, but it still felt like a transaction: I ask, you give.

That transaction is gone.

This morning, OpenAI released GPT-5.3-Codex-Spark. It’s a dedicated coding model that runs at—I had to double-check this—1,000 tokens per second.

To put that in perspective, people read at roughly 5-10 tokens per second. At 1,000 tokens per second, Spark emits more than a dozen tokens in the time a 60 Hz monitor draws a single frame. It doesn't "write" code; it just manifests it. You blink, and the function is there.

I've been using it for six hours. It’s messy, it’s chaotic, and it’s arguably the most fun I’ve had in a text editor in years.

The hardware twist (Goodbye, Nvidia?)

Here is the interesting part that tech Twitter is obsessing over: Spark isn't running on Nvidia H200s or B200s.

OpenAI partnered with Cerebras for this specific deployment. They are using those massive wafer-scale chips (the ones that look like a dinner plate).

Why does this matter to us? Because wafer-scale integration collapses memory latency: the model's weights live in on-chip SRAM, so activations never have to hop across an interconnect between separate GPUs. Everything stays on one giant piece of silicon. That is how they hit 1,000 tokens per second.

For years, we've been told that "bigger is better" (more parameters, more reasoning). Spark flips that. It says speed is a feature. When latency drops to zero, the AI stops feeling like a tool and starts feeling like an extension of your nervous system.

Deep reasoning vs. instant manifestation

Let’s be honest about what Spark is not. It is not "smarter" than GPT-5.3 Base or o3. In fact, it's dumber.

If you ask Spark to refactor a legacy monolithic architecture into microservices, it will hallucinate a mess that compiles but makes no sense. It doesn't "think" deeply. It has zero "chain of thought" capabilities.

But for 90% of daily coding? It’s incredible.

The "Flow" state

I was building a React component for a settings page. Usually, I'd type <div className=..., wait for the suggestion, tab, wait, tab.

With Spark, I typed // settings card with toggle for dark mode and the entire 50-line component appeared instantly. Not line-by-line. Instantly.
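
For the record, here's roughly the shape of what came back, trimmed to the core. The names and structure are my reconstruction from memory, not Spark's verbatim output:

    import { useState } from "react";

    // Trimmed reconstruction of the generated component.
    // Prop names, class names, and structure are my approximation.
    export function SettingsCard() {
      const [darkMode, setDarkMode] = useState(false);

      return (
        <div className="settings-card">
          <h2>Appearance</h2>
          <label className="settings-row">
            <span>Dark mode</span>
            <input
              type="checkbox"
              checked={darkMode}
              onChange={(e) => setDarkMode(e.target.checked)}
            />
          </label>
        </div>
      );
    }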

It shifts the workflow from "Write -> Review -> Write" to "Prompt -> Reject -> Prompt -> Accept". You are iterating so fast that it feels like sculpting clay rather than laying bricks.

The new workflow: The toggle

The UI in VS Code has changed to accommodate this. There is now a literal toggle switch in the AI pane (or the Cmd+Shift+L hotkey, L for "Lightning"; a rebinding sketch follows the list below).

  • Spark Mode (Lightning Icon): This is your default. It’s for boilerplate, regex, unit tests, CSS, and quick scripts. It’s virtually free and infinite.
  • Base/o3 Mode (Brain Icon): You switch to this when you hit a wall. When Spark generates a bug you can't see, or when you need to plan a database schema.
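
If you want the Lightning toggle on a different chord, the standard keybindings.json route works. Fair warning: I'm guessing at the command ID below; grab the real one from the Keyboard Shortcuts editor:

    // keybindings.json (Preferences: Open Keyboard Shortcuts (JSON))
    // The command ID is a placeholder guess, not confirmed against the
    // extension's docs; copy the real one from the Keyboard Shortcuts UI.
    [
      {
        "key": "cmd+shift+l",
        "command": "openai.codex.toggleSparkMode"
      }
    ]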

I found myself staying in Spark mode for 45 minutes straight, just vibrating through code. Then I hit a nasty state management bug, switched to Base (which felt agonizingly slow by comparison), fixed it, and switched back.

Where it breaks

I don't want to overhype this. There are frustrations.

  1. Context drifting: Because Spark is so fast, you tend to generate too much code. I ended up with three different utility functions doing the same thing because I was moving too fast to check if I already had one.
  2. Subtle bugs: It’s confident. It will write a regex that looks 99% correct but fails on an edge case. Because it appears instantly, you trust it more than you should. You have to force yourself to slow down and read the code (a sketch of this failure mode follows the list).
  3. The "slop" factor: My codebase grew by 20% in size today. Spark loves verbose, explicit code. It doesn't write elegant, one-line map/reduce functions. It writes for loops.

Conclusion

We finally have a model that keeps up with thought.

For the last three years, we've been constrained by the GPU. We learned to think in "prompts" and "waits." Spark removes the wait. It’s a specialized tool—a scalpel, not a Swiss Army knife—but it proves that we don't just need smarter models. We needed faster ones.

If you have the update, look for the little lightning bolt icon in your copilot settings. Turn it on. Just don't blame me when you have to clean up the mess you made in record time.