PaperBanana: Automated academic diagrams for scientists

PaperBanana is a new AI agent framework that automates the creation of academic illustrations, letting researchers focus on the science instead of the drawing.

I have a confession: I hate making diagrams for papers.

You spend months on the research, the code, and the experiments. You finally get the results. And then you spend three days fighting with TikZ, PowerPoint, or draw.io just to make a figure that doesn't look like it was drawn by a tired toddler.

It’s a massive time sink. We all know visuals are critical—a good diagram gets your paper read—but the effort to produce "publication-ready" figures is disproportionately high.

That’s why I was excited to read about PaperBanana, a new framework from researchers at Google and others that basically says: "Let the AI handle the drawing."

The problem with scientific illustration

Right now, creating a diagram is a manual, disjointed mess. You have a vague idea in your head. You look at other papers for inspiration. You try to sketch it out. Then you struggle with the tools.

If you aren't a graphic designer, you end up with something functional but ugly. If you are a perfectionist, you lose days tweaking arrow alignment in LaTeX.

The team behind PaperBanana realized that while we have AI that can write code (Copilot) and AI that can generate art (Midjourney), we don't really have an AI that understands the specific, rigid, and high-stakes language of academic illustration.

Enter PaperBanana

PaperBanana isn't just a text-to-image prompter. It’s an agentic framework.

This distinction matters. If you ask DALL-E to "draw a neural network architecture," you usually get a hallucinated mess of sci-fi wires that makes no technical sense. You can't put that in a NeurIPS submission.

PaperBanana works differently. It treats illustration as a multi-step engineering problem, not a creative art project. It breaks the process down into four distinct agents (there's a rough code sketch of the loop after the list):

  1. Retrieval: It looks at existing papers and references to understand what standard diagrams in your field actually look like.
  2. Planning: It drafts a blueprint. It decides what components are needed and how they relate, rather than just guessing pixels.
  3. Rendering: It generates the actual image code (often using programmatic tools that ensure straight lines and correct text).
  4. Refinement: This is the cool part. It critiques its own work. It looks at the output, compares it to the plan, and iterates to fix errors.
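To make that concrete, here's a minimal sketch of what such a four-stage loop could look like. To be clear: every function here is a stand-in stub I invented for illustration; the paper's real agents are LLM-driven, and its actual interfaces are certainly different.

```python
"""Hypothetical sketch of a retrieve -> plan -> render -> refine loop.
All function bodies are toy stubs, not PaperBanana's real agents."""

from dataclasses import dataclass


@dataclass
class Critique:
    acceptable: bool
    feedback: str


def retrieve_references(description: str) -> list[str]:
    # Stage 1: in a real system, search related papers for example diagrams.
    return ["encoder-decoder block diagram", "training pipeline flowchart"]


def draft_plan(description: str, references: list[str]) -> str:
    # Stage 2: decide on components and relations before touching pixels.
    return f"boxes: encoder, decoder; arrows: encoder -> decoder ({description})"


def render(plan: str, feedback: str = "") -> str:
    # Stage 3: emit program code (e.g. TikZ or matplotlib source), not raw
    # pixels, so lines stay straight and labels stay legible.
    return f"% TikZ source generated from plan\n% plan: {plan}\n% feedback: {feedback}"


def review(diagram: str, plan: str) -> Critique:
    # Stage 4: compare the rendered output against the blueprint.
    ok = "encoder" in diagram and "decoder" in diagram
    return Critique(acceptable=ok, feedback="" if ok else "missing components")


def generate_diagram(description: str, max_rounds: int = 3) -> str:
    refs = retrieve_references(description)
    plan = draft_plan(description, refs)
    diagram = render(plan)
    for _ in range(max_rounds):
        critique = review(diagram, plan)
        if critique.acceptable:
            break
        diagram = render(plan, feedback=critique.feedback)
    return diagram


print(generate_diagram("a transformer with an encoder and a decoder"))
```

The key design choice sits in stage 3: the agent emits code that deterministically renders to an image, rather than sampling pixels directly. That's what keeps text readable and arrows straight.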

It’s essentially simulating the workflow of a human illustrator who drafts, reviews, and polishes.

Does it actually work?

The authors didn't just release a tool; they built a benchmark called PaperBananaBench.

They took 292 methodology diagrams from NeurIPS 2025 papers—top-tier AI research—and tried to recreate them.

The results were impressive. They compared PaperBanana against standard vision-language models and other image generation baselines. PaperBanana consistently outperformed them in faithfulness (does it match the text description?), readability (can you actually read the labels?), and aesthetics.

Because it uses an underlying code-based approach for many tasks (like plotting), it handles statistical charts surprisingly well, avoiding the "made-up numbers" problem common in pure generative image models.
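To see why the code path sidesteps that problem, here's a toy contrast (mine, not the paper's): when a chart is produced by plotting code, every bar height and label is computed from the input data, so there's nothing for the model to hallucinate. The numbers below are placeholders for the demo.

```python
# A chart rendered from the actual numbers, the way a code-based pipeline
# can, rather than a generative model painting bars whose heights mean
# nothing. The data here is made up purely for this demo.
import matplotlib.pyplot as plt

methods = ["Baseline", "Prior SOTA", "Ours"]
accuracy = [71.2, 78.5, 83.9]  # stand-in values

fig, ax = plt.subplots(figsize=(4, 3))
bars = ax.bar(methods, accuracy, color=["#999999", "#777777", "#22aa77"])
ax.bar_label(bars, fmt="%.1f")  # labels come straight from the data
ax.set_ylabel("Accuracy (%)")
ax.set_ylim(0, 100)
ax.set_title("Every number is traceable to the input")
fig.tight_layout()
fig.savefig("accuracy.png", dpi=300)
```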

Why this matters

We are moving toward "autonomous AI scientists"—systems that can conduct research, write code, and potentially write papers.

Until now, the visual component was a blocker. An AI could write a theorem, but it couldn't draw the diagram explaining it. PaperBanana fills that gap.

For the rest of us who aren't AI agents, it just means we might finally get our weekends back. Instead of learning a new plotting library or fighting with vector graphics, we might soon be able to just describe our experiment and let the banana handle the rest.

The paper is up on arXiv now (2601.23265). It’s worth a look if you’ve ever lost a fight with a figure caption.