Scaling RL for Traffic Smoothing: 100-AV Highway Deployment
Source: http://bair.berkeley.edu/blog/2025/03/25/rl-av-smoothing/ (BAIR Blog)
Overview
Stop-and-go waves are common in dense highway traffic and lead to wasted energy and higher emissions. Researchers deployed 100 reinforcement learning (RL) controlled cars on a real highway (I-24 near Nashville) to learn driving strategies that dampen these waves, improve energy efficiency, and maintain throughput around human drivers. A key finding is that a small fraction of well-controlled autonomous vehicles (AVs) can meaningfully improve traffic flow and reduce fuel use for all road users.

The team built fast, data-driven simulations from experimental highway data to train RL agents to optimize energy use while operating safely around humans. Highway trajectory data from I-24 was replayed to generate unstable traffic patterns in simulation, letting AVs learn smoothing strategies behind human-driven traffic.

The approach emphasizes local sensing: the observations fed to the RL agent are the AV's own speed, the speed of the vehicle in front, and the gap between them. From these signals, the agent prescribes either an instantaneous acceleration or a desired speed for the AV. The reward function balances energy efficiency with throughput and safe, reasonable driving, and it incorporates dynamic minimum and maximum gap thresholds to rule out degenerate, unsafe behaviors. It also penalizes the fuel consumption of human drivers behind the AV to discourage selfish optimization by the RL controller.

In simulation, the learned behavior typically keeps slightly larger gaps than humans do, allowing AVs to absorb upcoming slowdowns more effectively. In the most congested scenarios, simulations reported up to ~20% total fuel savings across road users with fewer than 5% AV penetration. The controllers are designed to work with standard adaptive cruise control (ACC) hardware and operate in a decentralized fashion, using only basic sensor input available on most modern vehicles and requiring no special infrastructure.

Following simulation validation, the team deployed the RL controllers in the field in what they called the MegaVanderTest: a large-scale, 100-vehicle experiment on I-24 during peak traffic hours. Overhead cameras reconstructed millions of vehicle trajectories, enabling detailed analysis of traffic dynamics and energy use. The project chronicles the path from simulation to the field, detailing the steps taken to bridge the gap between training and real-world deployment.
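To make the observation and reward design above concrete, the sketch below shows one way such a per-step reward could be structured. It is a minimal illustration, not the authors' implementation: the weights, the quadratic energy proxy, and the speed-scaled gap thresholds are all assumptions.

```python
def step_reward(av_speed, lead_speed, gap, accel, follower_energy,
                w_energy=1.0, w_throughput=0.5, w_gap=2.0):
    """Hypothetical per-step reward in the spirit of the blog post:
    trade off energy use, throughput, and safe gaps. All weights and
    models here are illustrative assumptions."""
    # Energy proxy: harsh accelerations burn fuel; also charge the fuel
    # consumed by human drivers behind the AV to discourage selfish behavior.
    energy_cost = accel ** 2 + follower_energy

    # Throughput proxy: stay close to the lead vehicle's speed.
    throughput = -abs(av_speed - lead_speed)

    # Dynamic gap thresholds, scaled with speed, so the agent cannot
    # "win" by stopping (huge gap) or by tailgating (tiny gap).
    min_gap = max(5.0, 0.5 * av_speed)   # meters
    max_gap = max(30.0, 3.0 * av_speed)  # meters
    if gap < min_gap:
        gap_penalty = (min_gap - gap) ** 2
    elif gap > max_gap:
        gap_penalty = (gap - max_gap) ** 2
    else:
        gap_penalty = 0.0

    return -w_energy * energy_cost + w_throughput * throughput - w_gap * gap_penalty
```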
Key features
- Data-driven RL training in fast, realistic traffic simulations built from experimental highway data.
- Local observations: AV speed, lead-vehicle speed, and inter-vehicle gap.
- Reward shaping that balances energy efficiency, throughput, and safety; dynamic gap thresholds prevent degenerate, unsafe behaviors (see the controller sketch after this list).
- Decentralized deployment compatible with standard ACC hardware and radar sensors; no special infrastructure required.
- Large-scale field validation (MegaVanderTest): 100 vehicles during morning rush hours with millions of trajectories collected.
- Reported energy savings of up to ~20% in congested scenarios; notable reductions in speed/acceleration variance as a proxy for wave dampening.
- Observed in field data that AVs following at closer gaps can reduce energy use in the traffic behind them and shrink congestion footprints.
- Evidence of potential future gains with faster simulations, better human-driver models, and exploration of 5G-enabled coordination.
- The field test was conducted without explicit inter-AV communication, aligning with current autonomous-vehicle deployments.
- Integrated deployment with existing adaptive cruise control (ACC) systems to enable scale.
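As referenced in the reward-shaping bullet above, a decentralized controller of this kind can be sketched as a thin safety wrapper around a trained policy: local observations go in, and a clamped desired-speed command comes out for a stock ACC interface. The class, the `policy` callable, and the threshold values below are illustrative assumptions, not the deployed code.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    av_speed: float    # m/s, ego speed from the speedometer
    lead_speed: float  # m/s, estimated from radar
    gap: float         # m, distance to the lead vehicle from radar

def safe_desired_speed(policy, obs: Observation) -> float:
    """Hypothetical decentralized wrapper: local observations in,
    a clamped desired-speed command for a stock ACC interface out."""
    proposed = policy(obs)  # trained RL policy proposes a desired speed

    # Speed-dependent gap thresholds (illustrative values).
    min_gap = max(5.0, 0.5 * obs.av_speed)
    max_gap = max(30.0, 3.0 * obs.av_speed)

    if obs.gap < min_gap:
        # Too close: back off toward the lead vehicle's speed.
        return min(proposed, obs.lead_speed - 1.0)
    if obs.gap > max_gap:
        # Gap has opened up: close it to preserve throughput.
        return max(proposed, obs.lead_speed)
    return proposed
```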
Common use cases
- Smoothing traffic and reducing fuel consumption on congested highways with minimal infrastructure changes.
- Deploying RL-based traffic-smoothing controllers on existing AVs equipped with ACC to achieve broader energy efficiency benefits.
- Bridging simulation-to-reality gaps in mixed-autonomy traffic research and informing future large-scale experiments.
- Exploring data-driven human-driving models and advanced sensing methods to improve model fidelity and robustness.
Setup & installation
The source does not provide setup or installation commands. The work describes training RL agents in fast simulations built from real data, validating the controllers on hardware, and then deploying them on 100 vehicles; no command-level steps are included.
# Setup commands not provided in the source
Quick start
The source provides a high-level blueprint but does not include runnable code or a ready-to-run quickstart. A minimal outline derived from the content follows, with an illustrative sketch after it:
- Build fast, data-driven highway simulations using real trajectory data.
- Train RL agents to optimize energy efficiency while maintaining throughput and safety around human drivers.
- Validate controllers in hardware, then deploy on a small fleet of AVs.
- Collect and analyze field data to quantify energy savings and traffic smoothing effects.
# Quick start not provided by the source
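As a stand-in for that outline, here is a self-contained toy: a one-AV car-following simulation with a synthetic stop-and-go wave, and a linear policy tuned by random search in place of a full RL algorithm. Every constant and dynamic here is an assumption for illustration; the project's real simulator is built from I-24 trajectory data and is not published in the source.

```python
import random

DT = 0.5  # simulation timestep in seconds (illustrative)

def rollout(params, steps=400):
    """Simulate one AV behind a lead vehicle whose speed oscillates,
    emulating a stop-and-go wave. Returns the cumulative toy reward."""
    k_speed, k_gap, target_gap = params
    lead, av, gap = 25.0, 25.0, 30.0  # m/s, m/s, m
    total = 0.0
    for t in range(steps):
        # Lead vehicle alternates between ~33 and ~17 m/s every 20 s.
        lead = 25.0 + 8.0 * (1.0 if (t // 40) % 2 == 0 else -1.0)
        # Linear policy: track the lead speed and a target gap.
        accel = k_speed * (lead - av) + k_gap * (gap - target_gap)
        accel = max(-3.0, min(1.5, accel))  # comfort/safety limits
        av = max(0.0, av + accel * DT)
        gap += (lead - av) * DT
        # Toy reward: penalize acceleration (energy proxy) and unsafe gaps.
        total += -accel ** 2 - (100.0 if gap < 5.0 else 0.0)
        if gap < 1.0:  # near-collision ends the episode
            break
    return total

def random_search(iters=200, seed=0):
    """Crude policy search standing in for RL training."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(iters):
        params = (rng.uniform(0, 1), rng.uniform(0, 0.2), rng.uniform(10, 60))
        score = rollout(params)
        if score > best_score:
            best, best_score = params, score
    return best, best_score

if __name__ == "__main__":
    params, score = random_search()
    print("best (k_speed, k_gap, target_gap):", params, "score:", score)
```

The gap-tracking gain and target gap found by the search play the same qualitative role as the learned policy described above: smoothing accelerations behind a wave rather than mirroring the lead vehicle's oscillations.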
Pros and cons
- Pros
- Demonstrates scalable, decentralized control that can be deployed on standard AVs without new infrastructure.
- Evidence of meaningful energy savings (15–20%) around controlled vehicles in field data.
- Field deployment (MegaVanderTest) on 100 cars represents one of the largest mixed-autonomy experiments to date.
- Uses local sensor information, enabling deployment with existing radar-based sensing and ACC.
- Smoothing effects observed as reduced speed/acceleration variance, indicating dampened stop-and-go waves.
- Cons
- Bridging the sim-to-reality gap remains a challenge; the authors emphasize the need for faster, more accurate simulations and better human-driver models.
- Future gains may rely on enhanced data sharing or explicit inter-AV communication (e.g., over 5G), which was not deployed in the field test.
- The reward design requires careful balancing to avoid unsafe or suboptimal behaviors; dynamic gap thresholds are used to mitigate this risk.
Alternatives (brief comparisons)
| Approach | Key traits | Pros | Cons |
| - | - | - | - |
| Ramp metering / infrastructure control | Centralized, infrastructure-based traffic management | Can shape traffic at network scale without relying on vehicle penetration | Requires infrastructure, coordination, and investment |
| Variable speed limits | Infrastructure-based speed control on corridors | Simple policy that can reduce stop-and-go waves | Needs sensor/communication coverage; limited adaptivity to mixed autonomy |
| RL-based AV smoothing (this work) | Decentralized, vehicle-level control using local observations | Scales with vehicle adoption; can operate without new infrastructure; leverages existing ACC hardware | Sim-to-real challenges; benefits depend on AV penetration; field results depend on driver behavior behind AVs |
Pricing or License
Not specified in the source.