Stop playing whack-a-mole with jailbreaks: how Promptfoo secures your AI apps

Promptfoo is an open-source tool that automates red teaming for your AI applications. Here is how it helps you find vulnerabilities before they hit production.

I genuinely don't know how some teams sleep at night knowing their AI agents are exposed to the public internet. Building a wrapper around an API was fun in 2023. Building autonomous agents that can read databases and execute code is terrifying if you don't test them properly.

Every time I read about a new company accidentally leaking customer data through a prompt injection, I cringe. We are basically giving natural language interfaces the keys to our systems. The problem is that manual red teaming is slow and boring. You can't just sit there typing malicious prompts into your own app all day hoping to find the edge cases before the hackers do.

This is where Promptfoo comes in. It is an open-source CLI and library that automates the process of attacking your own AI applications.

The problem with manual testing

If you are like most developers, your prompt testing strategy probably involves tweaking a prompt, reloading your app, and trying a few tricky inputs to see if it breaks. This works for simple customer service chatbots. It completely falls apart when you build complex agents with tool access.

You might add a strict rule telling the model never to output personally identifiable information. Then you try a simple trick like asking the model to base64 encode the output, and suddenly your guardrails are useless. Real attackers are far more creative. They use indirect prompt injections hiding in webpages your agent reads, or they craft elaborate hypothetical scenarios to bypass safety filters.

I keep coming back to the realization that humans are simply too creative at breaking things for any manual test plan to keep up. We need automated systems that simulate thousands of different attack vectors systematically.

Automated red teaming

Promptfoo runs locally on your machine. You install it via npm, run a setup command, and it starts generating custom attacks targeted specifically at your application's logic.
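In practice that is only a few commands. This sketch follows the promptfoo docs, but treat exact subcommands and flags as something to verify against the current CLI:

```shell
# Install the CLI globally (or invoke it ad hoc with npx promptfoo@latest)
npm install -g promptfoo

# Interactive wizard that scaffolds a red team configuration for your app
promptfoo redteam init

# Generate adversarial test cases and run them against your target
promptfoo redteam run
```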

Instead of writing tests manually, the tool uses its own language models to probe your application for weaknesses. It checks for direct and indirect prompt injections. It attempts to bypass your guardrails using known jailbreak techniques. It actively tries to extract sensitive data, violate your business rules, or force the agent to misuse its connected tools.
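The attack run is driven by a declarative config. The shape below follows the promptfoo documentation, but the specific plugin and strategy identifiers are examples to check against the current schema, not a definitive list:

```yaml
# promptfooconfig.yaml — hedged sketch, not a complete schema
targets:
  - id: openai:gpt-4o-mini   # the application or model under attack

redteam:
  # Describing the app's purpose lets the attacker model tailor its probes
  purpose: Answer order questions without revealing other customers' data
  plugins:
    - pii                # attempts to extract personally identifiable information
    - prompt-extraction  # tries to leak your system prompt
  strategies:
    - jailbreak          # iterative jailbreak attempts
    - prompt-injection   # direct and indirect injection payloads
```

Each plugin defines what the attacker is trying to achieve; each strategy defines how the payloads are mutated and delivered.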

Half the developer community is focused on building faster models, while the other half is trying to figure out how to stop those models from doing stupid things. Promptfoo bridges that gap by letting you test for over 50 types of vulnerabilities before you merge your code. They draw on threat intelligence from a community of over 300,000 developers to keep their attack vectors up to date.

Running in continuous integration

The best part about Promptfoo is that it integrates directly into your regular development workflow. You can set it up to run in GitHub Actions, GitLab, Jenkins, or your preferred CI/CD pipeline.

Every time you modify a prompt or update a model version, Promptfoo runs its suite of attacks. If an update makes your agent more susceptible to a prompt injection, the build fails. You get a report right in your pull request showing exactly what broke and how to fix it.
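A minimal GitHub Actions workflow for this might look like the following sketch. The CLI invocation follows the promptfoo docs, and a non-zero exit code from the run is what fails the build; pin and verify action versions yourself:

```yaml
# .github/workflows/redteam.yml — minimal sketch
name: AI red team
on: [pull_request]

jobs:
  redteam:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Exits non-zero if generated attacks get through, failing the build
      - run: npx promptfoo@latest redteam run
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```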

This shifts security to the left. You find out about vulnerabilities while you are writing code, rather than hearing about them from an angry user on social media.

It runs entirely locally

Privacy is a big deal when testing enterprise applications. Many teams cannot send their proprietary prompts, business logic, or customer data to a third-party testing service.

Promptfoo evaluates everything on your machine. Your prompts stay in your environment. You can even run tests against local models using Ollama. This means you can thoroughly test your applications without waiting for security compliance approvals for a new SaaS tool. The fact that 127 of the Fortune 500 companies use it tells you something about its enterprise readiness.
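Pointing the harness at a local model is a one-line provider change. This fragment uses the `ollama:` provider prefix from the promptfoo docs; the model name and prompt are illustrative:

```yaml
# promptfooconfig.yaml fragment — evaluate against a local Ollama model,
# so nothing leaves your machine
providers:
  - ollama:chat:llama3.1   # any model you have pulled locally works
prompts:
  - "Summarize this support ticket: {{ticket}}"
tests:
  - vars:
      ticket: "My order arrived damaged."
```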

  • GitHub Repository: https://github.com/promptfoo/promptfoo
  • Project Page / Demo: https://www.promptfoo.dev/

Conclusion

We are moving past the era of experimental AI toys. If you are building agents that handle real money or sensitive data, you need to test them like traditional software. I highly recommend pulling down the Promptfoo repository and running a red team setup on your latest project. You might be surprised by what it finds.

SmallAI Team

From Gems of AI

Frequently Asked Questions

What is Promptfoo used for?

Promptfoo is a CLI and library used for evaluating and red-teaming large language model applications to find vulnerabilities like prompt injections and data leaks.

Does Promptfoo run locally?

Yes, Promptfoo evaluations can run entirely locally on your machine so your prompts and data do not have to leave your environment.

Can Promptfoo run in a CI/CD pipeline?

Promptfoo is designed to integrate into continuous integration workflows to automatically test your prompts and models on every code change.

What types of AI vulnerabilities does Promptfoo test for?

It tests for direct and indirect prompt injections, jailbreaks, PII leaks, insecure tool use in agents, and toxic content generation.

Is Promptfoo open source?

Yes, Promptfoo is open source and available under the MIT license, though they also offer enterprise security solutions.

Which models does Promptfoo support?

It supports testing against OpenAI, Anthropic, Gemini, Llama, Azure, Bedrock, Ollama, and many other model providers.

How does Promptfoo test for vulnerabilities?

Promptfoo uses its own language models to automatically generate thousands of custom attacks and edge cases tailored specifically to your application's logic.

Do I need to be a security expert to use Promptfoo?

No, Promptfoo comes with built-in attack templates and red teaming scenarios that automate the testing process for developers without specialized security training.

How often are Promptfoo's attack vectors updated?

The attack vectors are updated regularly based on new threat intelligence and jailbreak techniques discovered by the community and security researchers.

What happens if my app fails a Promptfoo red team test?

Promptfoo provides a detailed report of the vulnerability, showing exactly which input bypassed your defenses so you can implement the necessary guardrails or system prompts to fix it.
