I genuinely don't know how some teams sleep at night knowing their AI agents are exposed to the public internet. Building a wrapper around an API was fun in 2023. Building autonomous agents that can read databases and execute code is terrifying if you don't test them properly.
Every time I read about a new company accidentally leaking customer data through a prompt injection, I cringe. We are basically giving natural language interfaces the keys to our systems. The problem is that manual red teaming is slow and boring. You can't just sit there typing malicious prompts into your own app all day hoping to find the edge cases before the hackers do.
This is where Promptfoo comes in. It is an open-source CLI and library that automates the process of attacking your own AI applications.
The problem with manual testing
If you are like most developers, your prompt testing strategy probably involves tweaking a prompt, reloading your app, and trying a few tricky inputs to see if it breaks. This works for simple customer service chatbots. It completely falls apart when you build complex agents with tool access.
You might add a strict rule telling the model never to output personally identifiable information. Then you try a simple trick like asking the model to base64 encode the output, and suddenly your guardrails are useless. Real attackers are far more creative. They use indirect prompt injections hiding in webpages your agent reads, or they craft elaborate hypothetical scenarios to bypass safety filters.
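The base64 trick is easy to demonstrate. Here is a minimal sketch (a hypothetical `pii_filter` shell function, nothing from Promptfoo itself) of a regex guardrail that blocks plaintext email addresses but waves the exact same data through once it is encoded:

```shell
# Hypothetical guardrail: reject any output containing an email-like string.
pii_filter() {
  # exit 0 (pass) only when the input contains no email address
  grep -Eqv '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+'
}

echo "contact jane@example.com" | pii_filter && echo "passed" || echo "blocked"            # blocked
echo "contact jane@example.com" | base64 | pii_filter && echo "passed" || echo "blocked"   # passed
```

The base64 alphabet contains no `@`, so the encoded leak sails past the filter. This is exactly the kind of trivially mechanical bypass an automated red team will find in seconds.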
I keep coming back to the realization that attackers are more creative than any manual test plan can anticipate. We need automated systems to simulate thousands of different attack vectors systematically.
Automated red teaming
Promptfoo runs locally on your machine. You install it via npm, run a setup command, and it starts generating custom attacks targeted specifically at your application's logic.
Instead of writing tests manually, the tool uses its own language models to probe your application for weaknesses. It checks for direct and indirect prompt injections. It attempts to bypass your guardrails using known jailbreak techniques. It actively tries to extract sensitive data, violate your business rules, or force the agent to misuse its connected tools.
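To give a sense of what that looks like in practice, here is a sketch of a red team config. The overall shape (a `redteam` block with `plugins` and `strategies`) matches my reading of the docs, but the specific plugin and strategy ids below are assumptions; check the current schema before copying:

```yaml
# promptfooconfig.yaml -- illustrative sketch, ids are assumptions
targets:
  - id: https
    config:
      url: https://example.com/api/chat   # hypothetical endpoint for your agent

redteam:
  purpose: "Customer support agent with read access to the orders database"
  plugins:
    - pii                # tries to extract personally identifiable information
    - prompt-injection   # direct and indirect injection attempts
  strategies:
    - jailbreak          # known jailbreak techniques
    - base64             # re-encodes attacks to slip past string-matching guardrails
```

With something like this in place, running the red team subcommand (`promptfoo redteam run`, assuming the current CLI) generates the attacks, fires them at your target, and grades the responses.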
Half the developer community is focused on building faster models, while the other half is trying to figure out how to stop those models from doing stupid things. Promptfoo bridges that gap by letting you test for over 50 types of vulnerabilities before you merge your code. They draw on threat intelligence from a community of over 300,000 developers to keep their attack vectors up to date.
Running in continuous integration
The best part about Promptfoo is that it integrates directly into your regular development workflow. You can set it up to run in GitHub Actions, GitLab, Jenkins, or your preferred CI/CD pipeline.
Every time you modify a prompt or update a model version, Promptfoo runs its suite of attacks. If an update makes your agent more susceptible to a prompt injection, the build fails. You get a report right in your pull request showing exactly what broke and how to fix it.
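A GitHub Actions version of this might look like the sketch below. The workflow structure is standard; the exact Promptfoo subcommand and flag are assumptions, so verify them against the CLI docs:

```yaml
# .github/workflows/redteam.yml -- illustrative sketch
name: redteam
on: [pull_request]

jobs:
  redteam:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Assumed invocation: run the red team suite against the repo's config
      # and fail the job (and therefore the PR) if vulnerabilities are found.
      - run: npx promptfoo@latest redteam run --config promptfooconfig.yaml
```

Because the step exits non-zero on failure, a prompt change that opens a new injection hole blocks the merge instead of shipping.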
This shifts security left. You find out about vulnerabilities while you are writing the code, rather than hearing about them from an angry user on social media.
It runs entirely locally
Privacy is a big deal when testing enterprise applications. Many teams cannot send their proprietary prompts, business logic, or customer data to a third-party testing service.
Promptfoo evaluates everything on your machine. Your prompts stay in your environment. You can even run tests against local models using Ollama. This means you can thoroughly test your applications without waiting for security compliance approvals for a new SaaS tool. The fact that 127 of the Fortune 500 companies use it tells you something about its enterprise readiness.
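Pointing the tests at a local model is a one-line change in the config. Promptfoo has an Ollama provider; the exact provider id format below is my assumption from the docs, so double-check it:

```yaml
# Fragment of promptfooconfig.yaml -- provider id format is an assumption
providers:
  - ollama:chat:llama3   # local model served by Ollama, nothing leaves your machine
```

This makes it practical to run the full attack suite on proprietary prompts without any external API calls at all.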
Official Links
- GitHub Repository: https://github.com/promptfoo/promptfoo
- Project Page / Demo: https://www.promptfoo.dev/
Conclusion
We are moving past the era of experimental AI toys. If you are building agents that handle real money or sensitive data, you need to test them like traditional software. I highly recommend pulling down the Promptfoo repository and running a red team setup on your latest project. You might be surprised by what it finds.