Skip to content
OpenAI partners with US CAISI and UK AISI to strengthen AI safety and security
Source: openai.com

OpenAI partners with US CAISI and UK AISI to strengthen AI safety and security

Sources: https://openai.com/index/us-caisi-uk-aisi-ai-update, OpenAI

TL;DR

  • OpenAI continues and expands its voluntary partnerships with the US Center for AI Standards and Innovation (CAISI) and the UK AI Security Institute (UK AISI) to strengthen safe deployment of frontier AI. OpenAI CAISI update
  • The collaborations include joint red-teaming of safeguards against biological misuse, end-to-end testing of products for security issues, and rapid feedback loops to resolve vulnerabilities. OpenAI CAISI update
  • CAISI identified two novel security vulnerabilities in OpenAI’s ChatGPT Agent with a proof-of-concept attack that could bypass protections with about 50% success; OpenAI fixed the issues within one business day. OpenAI CAISI update
  • Since May, UK AISI has been red-teaming safeguards against biological misuse across OpenAI’s Agent and GPT‑5, with a weekly cadence and access to non-public tools to accelerate improvements. OpenAI CAISI update
  • The collaboration demonstrates how government and industry can work together to raise safety standards, improve product security, and foster responsible adoption of AI.

Context and background

OpenAI notes that developing and deploying AI that is secure and useful is core to its mission of ensuring that AGI benefits all of humanity, and that this requires ongoing work with national authorities and standards bodies. OpenAI entered into voluntary agreements with CAISI (the US Center for AI Standards and Innovation) and the UK AI Security Institute (UK AISI) as part of its approach to secure frontier AI deployment. These partnerships reflect the belief that frontier AI development should occur in close collaboration with allied governments that bring expertise in machine learning, national security, and metrology. For more than a year, OpenAI has partnered with CAISI to evaluate OpenAI models’ capabilities in cyber, chemical-biological, and other national security-relevant domains. This collaboration has expanded to include emerging product security challenges and red-teaming of OpenAI’s agentic AI systems, including the ChatGPT Agent product. OpenAI CAISI update The work with CAISI builds on OpenAI’s broader security program and internal testing, and the collaboration with UK AISI complements earlier efforts on safeguarding against biological misuse. UK AISI’s involvement began in May with red-teaming of safeguards across OpenAI’s systems, including the safeguards in both ChatGPT Agent and GPT‑5, as part of an ongoing program rather than tied to a single launch. The collaboration emphasizes rapid feedback loops and close coordination between OpenAI’s technical teams and external evaluators. OpenAI CAISI update

What’s new

The update highlights several new aspects of the CAISI/UK AISI collaborations:

  • Agentic AI security focus: OpenAI and CAISI conducted red-teaming of OpenAI’s agentic AI systems, including external evaluators working with OpenAI to identify and fix security vulnerabilities in real time. This included a preliminary step into testing approaches for agentic systems. OpenAI CAISI update
  • July collaboration results: CAISI gained early access to ChatGPT Agent, helping to understand system architecture and later performing red-teaming of the released system. The outcome included discovery of novel vulnerabilities and remediation. OpenAI CAISI update
  • Vulnerabilities and remediation: CAISI identified two novel security vulnerabilities that, under certain conditions, could let a sophisticated attacker bypass safeguards and remotely control the agent’s session and impersonate a user on other websites. A proof-of-concept attack demonstrated a ~50% success rate. After reporting, OpenAI fixed these issues within one business day. The work underscores the need to chain traditional cyber vulnerabilities with AI vulnerabilities to test guardrails. OpenAI CAISI update
  • Biological safeguards testing with UK AISI: As part of ongoing collaboration, UK AISI began red-teaming OpenAI’s safeguards against biological misuse (as defined by OpenAI policies) in May, covering safeguards in both ChatGPT Agent and GPT‑5. The collaboration uses iterative testing, with frequent meetings (roughly weekly) and bespoke configurations to test weaknesses.
  • Access and testing environment: UK AISI received in-depth access to systems and non-public testing resources to enable deeper testing, which helped surface failures that would be harder for external attackers to reproduce. The teams operated in iterative cycles of probing, strengthening safeguards, and retesting. OpenAI CAISI update Together, these efforts have led to improvements across monitoring, product configuration, and policy enforcement, and have produced concrete vulnerability reports and fixes that benefited end users and the security of widely deployed OpenAI products. UK AISI’s involvement also contributed to strengthening the full moderation stack and related safeguards. OpenAI CAISI update

Why it matters (impact for developers/enterprises)

The collaboration with CAISI and UK AISI signals a multi-layered approach to AI security that combines external evaluation with internal hardening. By validating agentic capabilities, stress-testing safeguards against misuse, and rapidly addressing discovered vulnerabilities, OpenAI seeks to raise industry standards and bolster trust in AI deployments. As OpenAI notes, close technical partnerships with organizations equipped to rigorously evaluate AI systems help strengthen confidence in the security of these systems for end users and enterprises alike. OpenAI CAISI update For developers and enterprises, the implications include more robust safeguards across products that rely on agentic AI, improved monitoring and moderation, and faster turnaround on fixes when vulnerabilities are discovered. The ongoing collaboration demonstrates how governments, standards bodies, and industry can work together to evaluate, improve, and responsibly deploy frontier AI. OpenAI CAISI update

Technical details or Implementation

The joint program blends traditional cybersecurity testing with AI-specific red-teaming, yielding concrete improvements in safeguards and product security. Key elements include:

  • Dual-domain red-teaming: CAISI’s expertise in cybersecurity and AI agent security was applied to test OpenAI’s agentic systems, including the ChatGPT Agent, for vulnerabilities that could affect system integrity and user safety. OpenAI CAISI update
  • End-to-end testing: OpenAI and partners conducted end-to-end testing of product configurations and responses, addressing vulnerabilities that could arise anywhere from model outputs to the user experience. OpenAI CAISI update
  • Rapid vulnerability triage: UK AISI contributed to a rapid feedback loop, surfacing more than a dozen vulnerability reports, with some leading to engineering fixes and policy or classifier improvements. OpenAI CAISI update
  • Monitoring and safeguards hardening: Improvements to the monitoring stack were measured against universal jailbreaks identified by UK AISI, strengthening detection and mitigation capabilities. OpenAI CAISI update
  • Custom testing configurations: OpenAI created testing configurations tailored to results from UK AISI to enable more effective evaluation of safeguards. OpenAI CAISI update
  • Non-public testing resources: The collaboration allowed access to non-public tools and design details, enabling more thorough red-teaming than would be possible with public materials alone. OpenAI CAISI update
  • Broader safeguards improvements: The agentic safeguards build on prior work to prevent biological misuse, illustrating how multiple layers of safety interact across the product. The ongoing program includes testing across model responses and the end-to-end product experience. OpenAI CAISI update

Key takeaways

  • External evaluation accelerates internal security improvements for AI systems.
  • Red-teaming for agentic AI and biological-misuse safeguards can reveal novel attack paths that mix traditional software vulnerabilities with AI-specific weaknesses.
  • Rapid triage and fixes—often within a business day—are achievable through close collaboration and access to non-public testing resources.
  • Ongoing, iterative testing and weekly coordination help sustain stronger safeguards across deployment scenarios.
  • Partnerships with standards bodies and security institutes can raise industry-wide confidence in AI safety.

FAQ

  • What is CAISI?

    The US Center for AI Standards and Innovation, a US research and standards body OpenAI has voluntary agreements with.

  • What is UK AISI?

    The UK AI Security Institute, with which OpenAI engaged in ongoing red-teaming of safeguards against biological misuse and other risk areas.

  • What vulnerabilities were found and how were they handled?

    CAISI identified two novel security vulnerabilities in ChatGPT Agent that could, under certain conditions, allow bypassing protections and remote control of a session; a proof-of-concept attack achieved ~50% success. OpenAI fixed the issues within one business day after disclosure. [OpenAI CAISI update](https://openai.com/index/us-caisi-uk-aisi-ai-update)

  • Why does this matter for developers and enterprises?

    The collaboration strengthens safeguards and product security, improves monitoring and testing, and demonstrates constructive government-industry collaboration aimed at safer deployment of frontier AI.

References

More news