Noise-Aware Generative Microscopic Traffic Simulation

Abstract

Accurately modeling individual vehicle behavior in microscopic traffic simulation remains a key challenge in intelligent transportation systems, as it requires vehicles to realistically generate and respond to complex traffic phenomena such as phantom traffic jams. While traditional human driver simulation models like the Intelligent Driver Model offer computational tractability, they do so by abstracting away the very complexity that defines human driving. On the other hand, recent advances in infrastructure-mounted camera-based roadway sensing have enabled the extraction of vehicle trajectory data, presenting an opportunity to shift toward generative, agent-based models that learn to reproduce driving behaviors directly from data. Yet, a major bottleneck remains: most existing datasets are either overly sanitized or lack standardization, failing to reflect the noisy, imperfect nature of real-world sensing. Unlike data from vehicle-mounted sensors—which can mitigate sensing artifacts like occlusion through overlapping fields of view and sensor fusion— infrastructure-based sensors surface a messier, more practical view of challenges that traffic engineers face every day. To this end, we present the I-24 MOTION Scenario Dataset (I24-MSD)—a standardized, curated dataset designed to preserve a realistic level of sensor imperfection, embracing these errors as part of the learning problem rather than an obstacle to overcome purely from preprocessing. Drawing from noise-aware learning strategies in computer vision, we further adapt existing generative models in the autonomous driving community for I24-MSD with noise-aware loss functions. Our results show that such models outperform traditional baselines in terms of simulation realism.

Motivation

Existing human driver models used in microscopic traffic simulation are often simplified ODEs that account only for the leading vehicle, neglecting interactions with other surrounding vehicles as well as roadway structure and signage. With the greater availability of driving data collected through roadway sensing and of high-fidelity road maps, we revisit microscopic traffic simulation from a data-driven generative modeling perspective.
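For reference, the Intelligent Driver Model mentioned in the abstract is a typical example of such an ODE: each vehicle's acceleration depends only on its own speed, the gap to its leader, and their speed difference. The sketch below is a minimal Python illustration of the standard IDM formula with a simple Euler update. The parameter values, the time step, the vehicle length, and the function names are illustrative assumptions, not the calibrated settings used for the baseline in this work.

```python
import math


def idm_acceleration(v, v_lead, gap, v0=30.0, T=1.5, s0=2.0,
                     a_max=1.0, b=1.5, delta=4.0):
    """Standard IDM acceleration (m/s^2) for one follower.

    v, v_lead: follower and leader speeds (m/s)
    gap:       bumper-to-bumper distance to the leader (m)
    v0, T, s0, a_max, b, delta: illustrative parameters (assumed values)
    """
    gap = max(gap, 0.1)  # guard against division by zero
    dv = v - v_lead      # positive when closing in on the leader
    # Desired dynamic gap: standstill gap + time headway + braking term.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


def euler_step(v, x, v_lead, x_lead, dt=0.1, vehicle_length=5.0):
    """Advance one follower by dt seconds; returns (new speed, new position)."""
    gap = x_lead - x - vehicle_length
    acc = idm_acceleration(v, v_lead, gap)
    v_next = max(0.0, v + acc * dt)  # vehicles do not reverse
    x_next = x + v_next * dt         # position uses the updated speed
    return v_next, x_next
```

Note that the only inputs are the follower's state and its leader's state, which is exactly the limitation described above: adjacent lanes, road geometry, and signage never enter the update.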

Comparison of ODE Baseline and Generative Models

We take inspiration from the autonomous driving community, where generative modeling has already been applied to simulate traffic scenarios (e.g., Waymo Sim Agent). However, unlike in autonomous driving, generative microscopic traffic simulation often has to deal with noisy driving data collected through pole-mounted cameras along the roadside. This noise arises from many sources, including occlusion and camera jitter, so accounting for it is a necessary part of the problem formulation.

In light of this, we present the I-24 MOTION Scenario Dataset (I24-MSD), a structured collection of traffic scenarios derived from human driving data on Interstate 24 in Nashville, Tennessee. The dataset reflects the natural noise and imperfections inherent in real-world data collection, yet has undergone the targeted post-processing steps that traffic engineers are most likely to apply in practice. These steps balance improving data quality with the practical constraints of available resources and expertise, leaving a realistic level of residual noise.

I-24 MOTION Scenario Dataset (I24-MSD)

Illustrations of the I-24 MOTION traffic testbed and the infrastructure-based multi-camera system used to collect the data presented in I24-MSD.

The I-24 MOTION Scenario Dataset (I24-MSD) is a standardized dataset designed to advance generative microscopic traffic simulation. It is based on the largest traffic testbed in the world, located on Interstate 24 in Nashville, Tennessee. It contains over 3.29 million vehicle trajectories with a total duration of 40 hours of driving across 6.5 km of interstate, presented as 9-second-long traffic scenarios, each with up to 32 vehicles. The scenario-based dataset is derived from the I-24 MOTION dataset.
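To make the scenario format concrete, the following Python sketch cuts raw trajectories into fixed 9-second windows and caps each window at 32 vehicles. It is only an illustration of the idea, not the pipeline used to build I24-MSD: the input layout (a dictionary of `(t, x, y)` samples per vehicle), the function name `slice_into_scenarios`, and the rule for choosing which vehicles to keep are all assumptions.

```python
from collections import defaultdict

SCENARIO_SECONDS = 9.0  # scenario length stated in the dataset description
MAX_AGENTS = 32         # per-scenario vehicle cap stated in the description


def slice_into_scenarios(trajectories,
                         scenario_seconds=SCENARIO_SECONDS,
                         max_agents=MAX_AGENTS):
    """Split trajectories into fixed-length scenarios.

    trajectories: dict mapping vehicle_id -> list of (t, x, y) samples,
                  with t in seconds (hypothetical input layout).
    Returns a list of scenarios ordered by time; each scenario is a dict
    mapping vehicle_id -> list of (t, x, y) samples inside that window.
    """
    windows = defaultdict(lambda: defaultdict(list))
    for vehicle_id, samples in trajectories.items():
        for t, x, y in samples:
            window_index = int(t // scenario_seconds)
            windows[window_index][vehicle_id].append((t, x, y))

    scenarios = []
    for window_index in sorted(windows):
        agents = windows[window_index]
        # Assumed selection rule: keep the vehicles observed the longest.
        kept = sorted(agents, key=lambda vid: len(agents[vid]),
                      reverse=True)[:max_agents]
        scenarios.append({vid: agents[vid] for vid in kept})
    return scenarios
```

A real pipeline would also need to handle fragmented tracks and coordinate transforms, which this sketch deliberately leaves out.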

How does noise in the dataset affect generative modeling?

We present a preliminary analysis of how noise affects the accuracy of generative models built for microscopic traffic simulation. Specifically, we optimize existing generative models from the autonomous driving community for microscopic traffic simulation using noise-aware loss functions, adapting the state-of-the-art SMART model for this purpose.

Table 2 below summarizes the evaluation results using the standard metrics from Waymo Sim Agent. We compare against two baselines: the Intelligent Driver Model (IDM) and a constant velocity model. As loss functions, we use cross-entropy (CE), cross-entropy with label smoothing (CE + LS), focal loss (Focal), and symmetric cross-entropy (SCE).
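The four loss functions listed above can be compared on a single categorical prediction. The sketch below gives textbook-style definitions in plain Python for one sample, assuming the model outputs a probability vector over discrete classes and the label is a class index. The hyperparameter values (`eps`, `gamma`, `alpha`, `beta`, `label_floor`) are illustrative assumptions, not the settings used in the experiments.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _safe_log(p, floor=1e-12):
    """Log with a small floor so that p == 0 does not raise."""
    return math.log(max(p, floor))


def cross_entropy(probs, target):
    """CE: negative log-probability of the labeled class."""
    return -_safe_log(probs[target])


def cross_entropy_label_smoothing(probs, target, eps=0.1):
    """CE + LS: mix the one-hot label with a uniform distribution."""
    k = len(probs)
    loss = 0.0
    for i, p in enumerate(probs):
        q = (1.0 - eps) * (1.0 if i == target else 0.0) + eps / k
        loss -= q * _safe_log(p)
    return loss


def focal_loss(probs, target, gamma=2.0):
    """Focal: down-weight samples the model already predicts confidently."""
    p = probs[target]
    return -((1.0 - p) ** gamma) * _safe_log(p)


def symmetric_cross_entropy(probs, target, alpha=1.0, beta=1.0,
                            label_floor=1e-4):
    """SCE: alpha * CE + beta * reverse CE.

    The reverse term swaps prediction and label, with the zero entries
    of the one-hot label clamped to label_floor so the log is finite.
    """
    ce = cross_entropy(probs, target)
    rce = 0.0
    for i, p in enumerate(probs):
        q = 1.0 if i == target else label_floor
        rce -= p * math.log(q)
    return alpha * ce + beta * rce
```

With `eps=0`, `gamma=0`, or `beta=0`, the respective variant reduces to plain cross-entropy, which makes it easy to check an implementation before tuning the noise-aware terms.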

Data License Agreement

You are free to use the data in academic and commercial work.
The dataset contains anonymous trajectories. Any activities to re-identify individuals in the dataset or activities that may cause harm to individuals in the dataset are prohibited. You may not use the dataset in any manner that violates applicable privacy laws.

When you use the I-24 MOTION Scenario Dataset (I24-MSD) in published academic work, you are required to include the following citation. This allows us to aggregate statistics on the use of the data in publications: V. Jayawardana, C. Tang, J. Ji, J. Philion, X. Peng, C. Wu, Noise-Aware Generative Microscopic Traffic Simulation, 2025.

The bibtex version of the reference is:

@misc{jayawardana2025genmicrosim,
          title={Noise-Aware Generative Microscopic Traffic Simulation}, 
          author={Vindula Jayawardana and Catherine Tang and Junyi Ji and Jonah Philion and Xue Bin Peng and Cathy Wu},
          year={2025},
          eprint={2508.07453},
          archivePrefix={arXiv},
          primaryClass={eess.SY},
          url={https://arxiv.org/abs/2508.07453}, 
        }

Attribution to Original Work: This dataset was made using I-24 MOTION data under the I-24 MOTION license agreement, and your access and use of such work are governed by the terms and conditions therein.

The bibtex version of the reference to the original work is:

@article{gloudemans202324,
          title={I-24 MOTION: An instrument for freeway traffic science},
          author={Gloudemans, Derek and Wang, Yanbing and Ji, Junyi and Zachar, Gergely and Barbour, William and Hall, Eric and Cebelak, Meredith and Smith, Lee and Work, Daniel B},
          journal={Transportation Research Part C: Emerging Technologies},
          volume={155},
          pages={104311},
          year={2023},
          publisher={Elsevier}
        }

You are free to share the data and create derivative products as long as you maintain the terms above.
The data is provided “As is.” We make no other warranties, express or implied, and hereby disclaim all implied warranties, including any warranty of merchantability and warranty of fitness for a particular purpose.

Acknowledgements

The authors would like to thank Cameron Hickert, Zhengbing He, Han Zheng, and Tsung-Han Lin for constructive discussions and their feedback on this work. The authors also thank Derek Gloudemans and Gergely Zachar for providing the data to create the vectorized road maps of Interstate 24 and the coordination system transformation code.

