Items tagged with “RL”.

GPT-5-Codex Addendum: Agentic Coding Optimized GPT-5 with Safety Measures
An addendum detailing GPT-5-Codex, a GPT-5 variant optimized for agentic coding within Codex, with safety mitigations and multi-platform availability.
A new RL approach, presented at ICLR 2025, uses instruction-derived checklists to guide alignment and outperforms fixed-criteria reward models across multiple benchmarks on Qwen2.5-7B-Instruct.
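A minimal sketch, not the paper's implementation, of how an instruction-derived checklist can be turned into a scalar reward: each checklist item is graded independently and the item scores are averaged. The `judge_item` callable and the checklist format here are assumptions for illustration.

```python
from typing import Callable, List

def checklist_reward(
    response: str,
    checklist: List[str],
    judge_item: Callable[[str, str], float],
) -> float:
    """Score a model response against an instruction-derived checklist.

    Each checklist item is judged independently on a 0-1 scale and the
    reward is the mean item score. `judge_item` stands in for an LLM- or
    rule-based grader (hypothetical helper, not from the paper).
    """
    if not checklist:
        return 0.0
    scores = [judge_item(response, item) for item in checklist]
    return sum(scores) / len(scores)

# Toy usage with a keyword-matching "judge" standing in for an LLM grader.
toy_judge = lambda resp, item: 1.0 if item.lower() in resp.lower() else 0.0
print(checklist_reward("The loop runs in O(n) time.", ["O(n)", "loop"], toy_judge))
```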
NVIDIA NeMo-RL v0.3 adds Megatron-Core support to boost training throughput for large models, addressing DTensor limitations, with long-context training, MoE support, and simplified configuration.
Overview of NeMo-RL v0.3's Megatron-Core backend for post-training large models, detailing 6D/4D parallelism, GPU-optimized kernels, and simplified configuration to boost reinforcement learning throughput at scale.
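As a rough, library-agnostic illustration of the parallelism bookkeeping such backends manage (not NeMo-RL's actual configuration schema), the product of the parallel degrees must equal the GPU count; every number below is hypothetical.

```python
# Hypothetical parallelism degrees for a large-model post-training job;
# the specific values are illustrative, not a NeMo-RL configuration.
parallel_degrees = {
    "tensor": 8,    # TP: shard weight matrices across GPUs within a node
    "pipeline": 4,  # PP: split layers into sequential stages
    "data": 16,     # DP: replicate the model over data shards
    "context": 2,   # CP: split long sequences across GPUs
}

required_gpus = 1
for degree in parallel_degrees.values():
    required_gpus *= degree

print(f"GPUs required: {required_gpus}")  # 8 * 4 * 16 * 2 = 1024
```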
NVIDIA Research introduces ProRL v2, the latest evolution of Prolonged Reinforcement Learning for LLMs. It explores thousands of extra RL steps, new stabilization techniques, and broad benchmarking to push sustained improvements beyond traditional RL schedules.
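A minimal, generic sketch of the kind of stabilization prolonged RL runs lean on, assuming a KL penalty toward a periodically reset reference policy; the loss form, coefficient, and reset schedule below are illustrative assumptions, not ProRL v2's published recipe.

```python
import torch

def kl_regularized_policy_loss(logprobs, ref_logprobs, advantages, kl_coef=0.01):
    """Policy-gradient loss with a KL penalty toward a reference policy.

    The KL term discourages the policy from drifting too far from the
    reference model during long training runs. A real recipe would use
    PPO-style ratios; this is a simplified REINFORCE-style stand-in.
    """
    pg_loss = -(advantages * logprobs).mean()
    # Sample-based approximation of KL(policy || reference) per token.
    kl_penalty = (logprobs - ref_logprobs).mean()
    return pg_loss + kl_coef * kl_penalty

def maybe_reset_reference(step, policy, reference, reset_interval=2000):
    """Periodically copy the current policy into the reference model.

    The interval is an illustrative assumption; resetting the anchor lets
    training continue to improve without the KL term pinning it forever.
    """
    if step > 0 and step % reset_interval == 0:
        reference.load_state_dict(policy.state_dict())
```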
NVIDIA Cosmos Reason is an open, fully customizable reasoning vision-language model for physical AI and robotics. It enables step-by-step multimodal reasoning and boosts robotics performance through post-training refinements.
NVIDIA Research explores World Foundation Models (WFMs) for synthetic data generation and data curation to accelerate robot and industrial AI training, detailing Cosmos Predict, Cosmos Transfer, Cosmos Reason, and related workflows.
A detailed analysis of the worst-case frontier risks when releasing open-weight LLMs, introducing Malicious Fine-Tuning (MFT) to probe biology and cybersecurity capabilities and comparing against open- and closed-weight baselines.
A rigorous case that true AGI requires embodied intelligence and physical-world grounding, not mere scaling of multimodal models. Critiques world-model hypotheses and highlights evidence that language models may rely on memorized rules rather than physics.
Berkeley researchers deployed 100 RL-controlled vehicles on a live highway to dampen stop-and-go waves, improving traffic flow and cutting energy use for all drivers.
One hundred RL-controlled cars were deployed on I-24 during rush hour to dampen stop-and-go waves, improve throughput, and reduce fuel use for all road users. The decentralized controllers rely on basic radar sensors and local observations.
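The deployed controllers are learned RL policies; the toy classical stand-in below only illustrates the constraint the summary highlights, namely that each car acts on local radar observations (ego speed, lead speed, gap) and absorbs rather than amplifies speed oscillations. All gains and thresholds are invented for illustration.

```python
def smoothing_speed_command(ego_speed, lead_speed, gap,
                            target_time_gap=2.0, max_accel=1.5,
                            max_decel=3.0, dt=0.1):
    """Toy decentralized wave-damping speed controller (illustrative only).

    Uses only quantities a forward radar provides: the ego speed, the lead
    vehicle's speed, and the gap. It steers toward a desired time gap with
    rate-limited acceleration, so the ego car smooths out the lead car's
    speed oscillations instead of passing them downstream.
    """
    desired_gap = target_time_gap * ego_speed + 5.0  # 5 m standstill buffer
    gap_error = gap - desired_gap
    # Proportional control on gap error plus relative speed, then rate-limit.
    accel = 0.1 * gap_error + 0.5 * (lead_speed - ego_speed)
    accel = max(-max_decel, min(max_accel, accel))
    return max(0.0, ego_speed + accel * dt)

# Example: ego at 25 m/s, lead slowing to 20 m/s, 40 m gap.
print(smoothing_speed_command(25.0, 20.0, 40.0))
```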
Overview of BAIR Lab's 2024 AI PhD graduates, with profiles covering research areas, advisors, short research blurbs, and contact links for recruiting and collaboration.