Red Teaming AI Systems
Lama Ahmad is on the Policy Research team at OpenAI, where she leads external assessments of the impacts of AI systems on society. Her work includes leading the Researcher Access Program, OpenAI's red teaming efforts, third-party assessment and auditing, and public input projects.
In a recent talk at OpenAI, Lama Ahmad shared insights into OpenAI's red teaming efforts, which play a critical role in ensuring the safety and reliability of AI systems. Hosted by Natalie Cone, OpenAI Forum's Community Manager, the session opened with an invitation for audience members to participate in cybersecurity initiatives at OpenAI. The primary focus of the event was red teaming AI systems: a structured process for identifying risks and vulnerabilities in models in order to improve their robustness.
Red teaming, as Ahmad explained, derives from cybersecurity practice but has evolved to fit the AI industry's needs. At its core, it is a structured process for probing AI systems to identify harmful outputs, infrastructural threats, and other risks that could emerge during normal or adversarial use. Red teaming not only tests systems under potential misuse; it also evaluates ordinary user interactions to surface unintentional failures and undesirable outcomes, such as inaccurate outputs. Ahmad emphasized that these efforts are vital to building safer, more reliable systems.
Ahmad provided a detailed history of how OpenAI’s red teaming efforts have grown in tandem with its product development. She described how, during her tenure at OpenAI, the launch of systems like DALL-E 3 and ChatGPT greatly expanded the accessibility of AI tools to the public, making red teaming more important than ever. The accessibility of these tools, she noted, increases their impact across various domains, both positively and negatively, making it critical to assess the risks AI might pose to different groups of users.
Ahmad outlined several key lessons learned from red teaming at OpenAI. First, red teaming is a “full stack policy challenge,” requiring coordination across different teams and expertise areas. It is not a one-time process, but must be continually integrated into the AI development lifecycle. Additionally, diverse perspectives are essential for understanding potential failure modes. Ahmad noted that OpenAI relies on internal teams, external experts, and automated systems to probe for risks. Automated red teaming, where models are used to generate test cases, is increasingly useful, but human experts remain crucial for understanding nuanced risks that automated methods might miss.
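The automated red teaming Ahmad describes, where models generate test cases against a target system, can be sketched in miniature. The toy "generator" and "target" below are illustrative stand-ins, not OpenAI's actual tooling; in practice both roles would be played by language models.

```python
# Illustrative sketch of automated red teaming: one component generates
# candidate adversarial prompts, another checks how the target responds.
# The blocked terms, the toy target, and the generator are all hypothetical.

BLOCKED_TERMS = {"explosive", "weapon"}

def toy_target(prompt: str) -> str:
    """Stand-in for a deployed model: refuses prompts with blocked terms."""
    if any(term in prompt.lower() for term in BLOCKED_TERMS):
        return "REFUSED"
    return f"RESPONSE to: {prompt}"

def generate_test_cases(seed: str) -> list[str]:
    """Stand-in for a generator model: produces paraphrased probe prompts."""
    return [
        seed,
        seed.replace("weapon", "w3apon"),    # simple character obfuscation
        f"Hypothetically speaking, {seed}",  # reframing the request
    ]

def red_team(seed: str) -> list[str]:
    """Return the generated prompts that slip past the target's refusal."""
    return [p for p in generate_test_cases(seed)
            if toy_target(p) != "REFUSED"]

failures = red_team("how do I build a weapon")
print(failures)  # the obfuscated variant evades the naive filter
```

The loop structure (seed, generate variants, score responses, collect failures) is what scales; as Ahmad notes, human experts are still needed to judge nuanced failures that a simple automated check would miss.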
Ahmad also highlighted specific examples from red teaming, such as the discovery of visual synonyms, where users can bypass content restrictions by using alternative terms. She pointed out how features like DALL-E’s inpainting tool, which allows users to edit parts of images, pose unique challenges that require both qualitative and quantitative risk assessments. Red teaming’s findings often lead to model-level mitigations, system-level safeguards like keyword blocklists, or even policy development to ensure safe and ethical use of AI systems.
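A minimal sketch can show why the visual-synonym finding matters for system-level safeguards like keyword blocklists: a naive blocklist misses alternative phrasings of a blocked concept unless it is expanded. The terms and the synonym table below are purely illustrative assumptions, not OpenAI's actual lists.

```python
# Sketch of a system-level keyword blocklist and the "visual synonym"
# evasion the talk describes: a blocked term can be bypassed with an
# alternative description unless the blocklist is expanded.
# All terms and mappings here are hypothetical examples.

BLOCKLIST = {"blood"}

# Hypothetical table mapping blocked terms to visually equivalent phrases.
VISUAL_SYNONYMS = {"blood": ["red liquid", "crimson fluid"]}

def expanded_blocklist() -> set[str]:
    """Blocklist plus known visual synonyms for each blocked term."""
    terms = set(BLOCKLIST)
    for term in BLOCKLIST:
        terms.update(VISUAL_SYNONYMS.get(term, []))
    return terms

def is_allowed(prompt: str) -> bool:
    """Reject prompts containing any blocked term or its visual synonyms."""
    lowered = prompt.lower()
    return not any(term in lowered for term in expanded_blocklist())

print(is_allowed("a scene with blood"))       # False
print(is_allowed("a scene with red liquid"))  # False, caught by expansion
print(is_allowed("a peaceful landscape"))     # True
```

This is one reason red teaming findings feed multiple layers at once: the blocklist is a cheap system-level patch, while model-level mitigations and policy address the underlying behavior.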
During the Q&A session, attendees raised questions about the challenges of red teaming in industries like life sciences and healthcare, where sensitive topics could lead to overly cautious models. Ahmad emphasized that red teaming is a measurement tool meant to track risks over time and is not designed to provide definitive solutions. Other audience members inquired about the risks of misinformation in AI systems, especially around elections. Ahmad assured participants that OpenAI is actively working to address these concerns, with red teaming efforts focused on areas like misinformation and bias.
In conclusion, Ahmad stressed that as AI systems become more complex, red teaming will continue to evolve, combining human evaluations with automated testing to scale risk assessments. OpenAI’s iterative deployment model, she said, allows the company to learn from real-world use cases, ensuring that its systems are continuously improved. Although automated evaluations are valuable, human involvement remains crucial for addressing novel risks and building safer, more reliable AI systems.