Symphony decoded: How OpenAI wants us to stop supervising agents

OpenAI's experimental Symphony framework shifts from coding copilots to autonomous agents that handle entire projects, providing proof of work via PRs and videos.

I have spent the last two years feeling like a helicopter parent to my code editor. We all know the drill. You write a prompt, watch the AI spit out 50 lines of code, and then spend the next ten minutes hunting down the hallucinated variable it sneaked in. It is exhausting. We traded writing code for micromanaging bots.

Today, OpenAI quietly dropped a GitHub repository that tries to fix this exact problem. It is called Symphony. The promise is simple but massive: stop supervising coding agents and start managing actual work.

The end of the babysitting era

Right now, most of us use AI as a very fast, sometimes confused junior developer sitting right next to us. You have to watch every keystroke. Symphony takes the agent out of your code editor and puts it in the background.

According to the repository, Symphony turns project work into isolated, autonomous implementation runs. Instead of you prompting an AI to write a function, Symphony looks at your issue tracker, figures out what needs to be done, and gets to work entirely on its own.

How the workflow actually operates

The demo video on their GitHub paints a clear picture of what this looks like in practice. Symphony monitors a Linear board for new tickets. When a task appears, it spawns an agent to handle it.

The agent goes off and writes the code, but it does not just throw a pull request over the wall and hope for the best. It has to provide proof of work: a passing CI status, responses to PR review feedback, an analysis of the change's complexity, and walkthrough videos of the finished feature.

You only get involved at the very end. If the proof of work looks good, you accept the PR and the agent safely lands the code.
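The loop described above can be sketched in a few lines. This is a minimal illustration, not Symphony's actual code (the reference implementation is in Elixir); every name here, from `ProofOfWork` to `review_and_land`, is a hypothetical stand-in for the pattern the repository describes:

```python
from dataclasses import dataclass

@dataclass
class ProofOfWork:
    # Evidence the agent attaches before a human sees anything:
    # CI status, addressed review feedback, a complexity analysis,
    # and a walkthrough video of the finished feature.
    ci_passed: bool
    review_feedback_addressed: bool
    complexity_report: str
    walkthrough_video_url: str

    def is_complete(self) -> bool:
        return self.ci_passed and self.review_feedback_addressed

def run_task(ticket_title: str) -> ProofOfWork:
    # Placeholder for the autonomous implementation run: the agent
    # writes the code in the background, then assembles its proof.
    return ProofOfWork(
        ci_passed=True,
        review_feedback_addressed=True,
        complexity_report=f"low-risk change for '{ticket_title}'",
        walkthrough_video_url="https://example.com/walkthrough",
    )

def review_and_land(proof: ProofOfWork) -> str:
    # The only human touchpoint: accept the PR if the proof checks out,
    # otherwise kick it back to the agent.
    return "landed" if proof.is_complete() else "sent back to agent"
```

The point of the shape is that the human reviews evidence, not keystrokes: the decision surface shrinks from every line of code to a single accept-or-reject call at the end.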

You need harness engineering first

There is a catch. You cannot just drop Symphony into a messy, untested ten-year-old codebase and expect magic.

OpenAI notes that Symphony works best in environments that have adopted harness engineering. This means your project needs strong automated testing, clear CI pipelines, and solid guardrails. The AI needs a way to verify its own work before it shows anything to a human. If your codebase does not have tests, the agent is flying blind and will likely break things.
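At its core, "harness engineering" means the agent has an automated gate it can run against its own changes. A minimal sketch of such a gate, under the assumption that verification is just a list of commands (test suite, linter, type checker) that must all exit cleanly:

```python
import subprocess

def harness_check(commands: list[list[str]]) -> bool:
    # Run each verification command; the agent may only open a PR
    # if every one of them exits with status 0.
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            return False
    return True
```

If that list is empty or the tests are flaky, the gate proves nothing, which is exactly the "flying blind" failure mode the repository warns about.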

It is strictly an engineering preview

OpenAI is being honest that this is not a polished consumer product. They call it a low-key engineering preview meant for testing in trusted environments.

If you want to try it, you have two options. You can use their experimental reference implementation, which is written in Elixir. Alternatively, they suggest an incredibly meta approach: just feed the provided specification document to your favorite coding agent and ask it to build a custom version of Symphony for your specific tech stack.

Conclusion

We are clearly moving up the abstraction ladder. The days of fighting with autocomplete are ending. The new challenge will be figuring out how to clearly define work so an agent can do it without asking us questions every five minutes. I am genuinely curious to see how many teams can actually pull off the harness engineering required to make this work, but the direction is undeniable.

Frequently Asked Questions

What is OpenAI Symphony?

Symphony is an experimental framework from OpenAI that turns project tasks into isolated, autonomous implementation runs, allowing developers to manage work rather than micromanaging coding agents.

How does Symphony verify its code?

Agents in Symphony provide proof of work before submitting code. This includes passing continuous integration (CI) checks, responding to review feedback, and generating walkthrough videos.

Can I use Symphony on any codebase?

OpenAI recommends using Symphony on codebases that have adopted harness engineering, meaning they have strong automated testing and infrastructure to validate the agent's work.