
OpenAI launches gpt-oss-120b and gpt-oss-20b under Apache 2.0 license

Source: https://openai.com/index/introducing-gpt-oss

TL;DR

  • OpenAI released two open-weight language models: gpt-oss-120b and gpt-oss-20b, available under the Apache 2.0 license. They are designed for strong real-world performance at a lower cost and are optimized for open deployment.
  • gpt-oss-120b has 36 layers and a total of 117B parameters, with 128 total experts and 4 active experts per token. It can run efficiently on a single 80 GB GPU.
  • gpt-oss-20b has 24 layers and a total of 21B parameters, with 32 total experts and 4 active experts per token. It can run on edge devices with 16 GB of memory.
  • Both models demonstrate strong tool use, few-shot function calling, chain-of-thought reasoning, and performance on health and math benchmarks, while supporting OpenAI’s Responses API and on-device workflows.
  • Safety remains foundational: both models underwent safety training and evaluations, including testing of an adversarially fine-tuned variant under OpenAI’s Preparedness Framework, with external review and published model cards.

Context and background

OpenAI’s gpt-oss models are the company’s first open-weight language models since GPT-2. The two new models are designed to offer high-quality reasoning at lower cost, with an Apache 2.0 license that supports broad customization and deployment. The effort complements existing API offerings by enabling open-weight models to be hosted on user infrastructure, from on-prem deployments to edge devices, while emphasizing safety evaluations and alignment with OpenAI’s Model Spec. The release is accompanied by partnerships and real-world testing with organizations such as AI Sweden, Orange, and Snowflake. These collaborations explore hosting, data-security considerations, and fine-tuning on specialized datasets to address practical use cases, from data protection to industry-specific tasks.

What’s new

Two models are being released: gpt-oss-120b and gpt-oss-20b. Both are transformers that use a mixture-of-experts (MoE) design to reduce the number of active parameters needed to process input (a minimal routing sketch follows this list). Specifics include:

  • gpt-oss-120b: 36 layers, 117B total parameters, 5.1B active parameters per token, 128 total experts, 4 active experts per token, context length up to 128k.
  • gpt-oss-20b: 24 layers, 21B total parameters, 3.6B active parameters per token, 32 total experts, 4 active experts per token, context length up to 128k.
  • Both models use alternating dense and locally banded sparse attention patterns, similar to GPT-3, and employ grouped multi-query attention with a group size of 8. They natively support context lengths up to 128k and use Rotary Positional Embedding (RoPE) for positional encoding.
  • Data and training focus: a mostly English, text-only dataset with emphasis on STEM, coding, and general knowledge. Data was tokenized with o200k_harmony, a superset of the tokenizer used for OpenAI’s o4-mini and GPT-4o models.
  • Training and alignment: post-training mirrors the process used for o4-mini, including supervised fine-tuning followed by a high-compute RL stage. The objective was to align the models with the OpenAI Model Spec and teach them to apply chain-of-thought reasoning and tool use before producing answers.
  • Capability and safety: the models support three levels of reasoning effort (low, medium, high) to trade latency for performance. They underwent comprehensive safety training and evaluations, including an adversarially fine-tuned variant tested under OpenAI’s Preparedness Framework, and results are documented in a research paper and model card.
  • Ecosystem and accessibility: the gpt-oss models are designed to be used with OpenAI’s Responses API and integrated into agentic workflows, with strong instruction following and tool-use capabilities (e.g., web search, Python execution) and the ability to tune reasoning effort for latency-critical tasks.
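To make the MoE mechanism concrete, here is a minimal sketch of top-k expert routing in PyTorch. The router design, expert shapes, and dimensions are illustrative assumptions for exposition, not the released gpt-oss architecture; only the top-4-of-N routing pattern mirrors the published figures (4 of 128 experts for gpt-oss-120b, 4 of 32 for gpt-oss-20b).

```python
# Illustrative top-k mixture-of-experts layer (an assumption-laden sketch,
# not the released gpt-oss code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int = 512, n_experts: int = 128, k: int = 4,
                 d_hidden: int = 1024):
        super().__init__()
        self.k = k
        # A linear "router" scores every expert for each token.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        top_vals, top_idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)  # normalize over the k chosen experts
        out = torch.zeros_like(x)
        # Only the k selected experts run for each token, which is why the
        # active parameter count per token stays far below the total count.
        for slot in range(self.k):
            for e in top_idx[:, slot].unique().tolist():
                mask = top_idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

layer = TopKMoE()
print(layer(torch.randn(8, 512)).shape)  # torch.Size([8, 512])
```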

Why it matters (impact for developers/enterprises)

The gpt-oss release provides a significant step toward broadly accessible open-weight LLMs with robust reasoning and tool-use capabilities. The key implications include:

  • Ownership and customization: organizations can run and customize these models on their own infrastructure, enabling on-premises data handling, data security, and alignment with internal policies.
  • Hardware-friendly deployment: gpt-oss-120b can run efficiently on a single 80 GB GPU, while gpt-oss-20b is designed to run on edge devices with 16 GB of memory. This lowers the barrier to on-device inference and rapid iteration without large-scale infrastructure.
  • Flexibility in performance and latency: the models expose adjustable reasoning effort (low/medium/high) to balance latency and performance for diverse workflows.
  • Safety and governance: OpenAI emphasizes safety as foundational, combining safety training, evaluation, and external review (via the Preparedness Framework and model cards) to maintain consistent safety standards with proprietary frontier models.
  • Ecosystem compatibility: with support for the Responses API and collaboration with partners, developers can design end-to-end AI workflows that combine local inference with cloud capabilities, depending on deployment and latency needs (a minimal request sketch follows this list).
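Because the models are designed to work with the Responses API, a self-hosted deployment can typically be queried with the standard OpenAI Python SDK pointed at a local server. A hedged sketch follows; the base_url, API key, and exact parameter support depend on the serving stack, so all values below are placeholders rather than an official endpoint.

```python
from openai import OpenAI

# Placeholder endpoint: assumes a local server that implements OpenAI's
# Responses API for a hosted gpt-oss model; adjust base_url and api_key
# for whatever serving stack you actually run.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")

resp = client.responses.create(
    model="gpt-oss-120b",
    reasoning={"effort": "high"},  # low | medium | high: trade latency for quality
    input="Outline a proof that sqrt(2) is irrational.",
)
print(resp.output_text)
```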

Technical details and implementation

The gpt-oss models employ a mixture-of-experts (MoE) approach to reduce the number of active parameters processed per token. Notable technical characteristics include:

  • Architecture: Transformer architecture with alternating dense and locally banded sparse attention patterns, reminiscent of GPT-3, and grouped multi-query attention with a group size of 8.
  • Positional encoding: Rotary Positional Embedding (RoPE) for efficient and scalable positional representation (see the sketch after this list).
  • Context length: native support for context lengths up to 128k tokens.
  • Tokenization: data tokenized with a superset of the o4-mini and GPT-4o tokenizer (o200k_harmony), which is being open-sourced alongside the models.
  • Training and post-training: models post-trained with a process similar to o4-mini, combining supervised fine-tuning with a high-compute RL stage to align with the OpenAI Model Spec and to enhance CoT reasoning and tool use before final output.
  • Active vs total experts: gpt-oss-120b uses 128 total experts with 4 active per token; gpt-oss-20b uses 32 total experts with 4 active per token.
  • Performance notes: across standard benchmarks, gpt-oss-120b demonstrates competitive or superior results to several OpenAI reasoning models (e.g., o3-mini, o4-mini) on coding, math, health, and tool-use tasks; gpt-oss-20b shows similar performance to o3-mini and can outperform it in certain domains despite its smaller size.
  • Safety and governance: in addition to internal safety training and evaluations, OpenAI tested an adversarially fine-tuned gpt-oss-120b version under its Preparedness Framework, with results reported in a research paper and model card. A separate emphasis is placed on ensuring open models meet comparable safety standards to frontier proprietary models.
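As a reference point for the positional-encoding bullet above, here is a minimal RoPE sketch. The rotate-pairs formulation and the base frequency of 10000 follow the commonly published RoPE recipe; the exact constants used in gpt-oss are not stated in this article and may differ.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x by position-dependent angles.

    x: (seq_len, d) with d even. Pair i at position p is rotated by
    p * base**(-2i/d), so attention scores between rotated queries and
    keys depend only on their relative offset.
    """
    seq_len, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=x.dtype) * 2.0 / d)   # (half,)
    angles = torch.arange(seq_len, dtype=x.dtype)[:, None] * freqs   # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(16, 64)      # 16 positions, one 64-dim attention head
print(apply_rope(q).shape)   # torch.Size([16, 64])
```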

Model specifications snapshot

| Model | Layers | Total Params | Active Params Per Token | Total Experts | Active Experts Per Token | Context Length |
|---|---|---|---|---|---|---|
| gpt-oss-120b | 36 | 117B | 5.1B | 128 | 4 | 128k |
| gpt-oss-20b | 24 | 21B | 3.6B | 32 | 4 | 128k |
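The efficiency claim can be read straight off the table: dividing active by total parameters gives the fraction of the network that runs for any single token.

```python
# Active-parameter fraction implied by the table above (values in billions).
specs = {"gpt-oss-120b": (5.1, 117.0), "gpt-oss-20b": (3.6, 21.0)}
for name, (active, total) in specs.items():
    print(f"{name}: {active}B / {total}B = {active / total:.1%} active per token")
# gpt-oss-120b: 5.1B / 117.0B = 4.4% active per token
# gpt-oss-20b: 3.6B / 21.0B = 17.1% active per token
```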

Key takeaways

  • OpenAI provides two open-weight LLMs with strong reasoning and tool-use capabilities under the Apache 2.0 license, enabling broad customization and on-prem/on-device deployment.
  • The models leverage mixture-of-experts to optimize efficiency: relatively few active parameters are used per token during inference, contributing to lower compute needs for large models.
  • Both models support context lengths up to 128k tokens and are designed to run on accessible hardware: a single 80 GB GPU for gpt-oss-120b and edge devices with 16 GB of memory for gpt-oss-20b.
  • Safety is a central pillar, with comprehensive safety training, adversarial fine-tuning, and external review; detailed safety results are shared in related research and model-card documentation.
  • The release is accompanied by ecosystem partnerships and documentation to support real-world deployments, from on-prem hosting to specialized fine-tuning datasets.

FAQ

  • What are gpt-oss-120b and gpt-oss-20b?

    They are open-weight language models released by OpenAI, built as transformers with mixture-of-experts, designed for strong reasoning, tool use, and efficient deployment. They are available under the Apache 2.0 license.

  • Under what license are they released?

    Apache 2.0.

  • How do they perform relative to other OpenAI models?

    gpt-oss-120b outperforms OpenAI o3-mini and matches or exceeds OpenAI o4-mini on several benchmarks; gpt-oss-20b matches or exceeds o3-mini and can outperform it in some areas like competition mathematics and health.

  • What hardware considerations are there?

    gpt-oss-120b can run on a single 80 GB GPU; gpt-oss-20b can run on edge devices with 16 GB of memory, enabling on-device use cases and rapid iteration without heavy infrastructure.

  • What safety approaches were used?

    Both models underwent extensive safety training and evaluations, including testing of an adversarially fine-tuned variant under OpenAI’s Preparedness Framework, with findings published in a research paper and a model card.
