Accurately modeling individual vehicle behavior in microscopic traffic simulation remains a key challenge in intelligent transportation systems, as it requires vehicles to realistically generate and respond to complex traffic phenomena such as phantom traffic jams. While traditional human driver simulation models like the Intelligent Driver Model offer computational tractability, they do so by abstracting away the very complexity that defines human driving. On the other hand, recent advances in infrastructure-mounted camera-based roadway sensing have enabled the extraction of vehicle trajectory data, presenting an opportunity to shift toward generative, agent-based models that learn to reproduce driving behaviors directly from data. Yet a major bottleneck remains: most existing datasets are either overly sanitized or lack standardization, failing to reflect the noisy, imperfect nature of real-world sensing. Unlike data from vehicle-mounted sensors, which can mitigate sensing artifacts like occlusion through overlapping fields of view and sensor fusion, infrastructure-based sensors surface a messier, more practical view of the challenges that traffic engineers face every day. To this end, we present the I-24 MOTION Scenario Dataset (I24-MSD), a standardized, curated dataset designed to preserve a realistic level of sensor imperfection, embracing these errors as part of the learning problem rather than an obstacle to overcome purely through preprocessing. Drawing from noise-aware learning strategies in computer vision, we further adapt existing generative models from the autonomous driving community to I24-MSD with noise-aware loss functions. Our results show that such models outperform traditional baselines in terms of simulation realism.
Existing human driver models used in microscopic traffic simulation are often simplified ODEs that neglect interactions with vehicles other than the leader, as well as roadway structure and signage. With greater availability of driving data collected through roadway sensing and high-fidelity road maps, we revisit the modeling of microscopic traffic simulations from a data-driven generative modeling perspective.
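To make the "simplified ODE" point concrete, here is a minimal sketch of the Intelligent Driver Model, a standard car-following model of this kind. The follower's acceleration depends only on its own speed, the gap to the leader, and the approach rate. The parameter values are illustrative defaults rather than values used in this work, and the function names and the forward-Euler rollout are assumptions made for the example.

```python
import math


def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=2.0, s0=2.0, delta=4.0):
    """IDM acceleration (m/s^2) of a following vehicle.

    v   : follower speed (m/s)
    gap : bumper-to-bumper distance to the leader (m)
    dv  : approach rate, follower speed minus leader speed (m/s)
    Remaining arguments are illustrative parameter values (assumed).
    """
    gap = max(gap, 1e-3)  # guard against division by zero
    # Desired gap: jam distance plus a speed- and approach-dependent term.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road term minus interaction term with the single leading vehicle.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


def simulate_follower(leader_speeds, v_init, gap_init, dt=0.1):
    """Forward-Euler rollout of one follower behind a leader speed profile."""
    v, gap = v_init, gap_init
    speeds = []
    for v_lead in leader_speeds:
        acc = idm_acceleration(v, gap, v - v_lead)
        gap = max(gap + (v_lead - v) * dt, 1e-3)  # gap grows when the leader is faster
        v = max(0.0, v + acc * dt)                # no reversing
        speeds.append(v)
    return speeds


if __name__ == "__main__":
    # Follower starts 50 m behind a leader cruising at 20 m/s.
    trace = simulate_follower([20.0] * 100, v_init=20.0, gap_init=50.0)
    print(f"follower speed after 10 s: {trace[-1]:.2f} m/s")
```

Nothing in this update rule refers to adjacent lanes, road geometry, or signage, which is the limitation that motivates a data-driven alternative.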
We take inspiration from the autonomous driving community, where generative modeling has already been applied to simulate traffic scenarios (e.g., Waymo Sim Agent). However, unlike in autonomous driving, generative microscopic traffic simulation often has to deal with noisy driving data collected through pole-mounted cameras on the sides of the roadway. Such noise arises from many sources, including occlusion and jitter, which makes accounting for it a necessary part of the problem formulation.
In light of this, we present the I-24 MOTION Scenario Dataset (I24-MSD), a structured collection of traffic scenarios derived from human driving data on Interstate 24 in Nashville, Tennessee. The dataset reflects the natural noise and imperfections inherent in real-world data collection, yet has undergone the targeted post-processing steps that traffic engineers are most likely to apply in practice. These steps balance improving data quality with the practical constraints of available resources and expertise, leaving a realistic level of residual noise.
The I-24 MOTION Scenario Dataset (I24-MSD) is a standardized dataset based on the largest traffic test bed in the world, located on Interstate 24 in Nashville, Tennessee, and is designed to advance generative microscopic traffic simulation. It contains over 3.29 million vehicle trajectories with a total duration of 40 hours of driving across 6.5 km of interstate, presented as 9-second-long traffic scenarios, each with up to 32 vehicles. The scenario-based dataset is derived from the I-24 MOTION dataset.
We present a preliminary analysis of how noise affects the accuracy of generative models built for microscopic traffic simulation. We then look at optimizing existing generative models from the autonomous driving community for microscopic traffic simulation with noise-aware loss functions, adapting the state-of-the-art SMART model for this purpose.
Table 2 below summarizes the evaluation results using the standard metrics from Waymo Sim Agent. We compare against two baselines, the Intelligent Driver Model (IDM) and a Constant Velocity Model. As loss functions, we use cross-entropy (CE), cross-entropy with label smoothing (CE + LS), focal loss (Focal), and symmetric cross-entropy (SCE).
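For readers unfamiliar with the four losses, the sketch below writes each one out for a single prediction over a discrete set of classes. It assumes the model emits logits over a categorical vocabulary (for instance, motion tokens), which is an assumption of this example rather than a description of the training code. The hyperparameters `eps`, `gamma`, `alpha`, `beta`, and `log_zero` are illustrative placeholders, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def safe_log(p, floor=1e-12):
    """Logarithm with a floor so that underflowed probabilities do not raise."""
    return math.log(max(p, floor))


def cross_entropy(logits, target):
    """Standard CE: negative log-probability of the labeled class."""
    return -safe_log(softmax(logits)[target])


def cross_entropy_label_smoothing(logits, target, eps=0.1):
    """CE against a smoothed target: (1 - eps) on the label, eps spread uniformly."""
    probs = softmax(logits)
    k = len(probs)
    loss = 0.0
    for i, p in enumerate(probs):
        q = eps / k + ((1.0 - eps) if i == target else 0.0)
        loss -= q * safe_log(p)
    return loss


def focal_loss(logits, target, gamma=2.0):
    """Focal loss: CE down-weighted by (1 - p_t)^gamma for confident predictions."""
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * safe_log(p_t)


def symmetric_cross_entropy(logits, target, alpha=1.0, beta=1.0, log_zero=-4.0):
    """SCE: alpha * CE + beta * reverse CE, with log(0) replaced by log_zero.

    For a one-hot label the reverse term reduces to -log_zero * (1 - p_t).
    """
    p_t = softmax(logits)[target]
    ce = -safe_log(p_t)
    rce = -log_zero * (1.0 - p_t)
    return alpha * ce + beta * rce


if __name__ == "__main__":
    logits, label = [2.0, 0.5, -1.0], 0
    print("CE     ", cross_entropy(logits, label))
    print("CE + LS", cross_entropy_label_smoothing(logits, label))
    print("Focal  ", focal_loss(logits, label))
    print("SCE    ", symmetric_cross_entropy(logits, label))
```

Label smoothing and SCE both limit how much a single, possibly mislabeled, target can dominate the loss. Focal loss instead reweights examples by how confidently they are already predicted.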
@misc{jayawardana2025genmicrosim,
  title={Noise-Aware Generative Microscopic Traffic Simulation},
  author={Vindula Jayawardana and Catherine Tang and Junyi Ji and Jonah Philion and Xue Bin Peng and Cathy Wu},
  year={2025},
  eprint={2508.07453},
  archivePrefix={arXiv},
  primaryClass={eess.SY},
  url={https://arxiv.org/abs/2508.07453},
}
@article{gloudemans202324,
  title={I-24 MOTION: An instrument for freeway traffic science},
  author={Gloudemans, Derek and Wang, Yanbing and Ji, Junyi and Zachar, Gergely and Barbour, William and Hall, Eric and Cebelak, Meredith and Smith, Lee and Work, Daniel B},
  journal={Transportation Research Part C: Emerging Technologies},
  volume={155},
  pages={104311},
  year={2023},
  publisher={Elsevier}
}
The authors would like to thank Cameron Hickert, Zhengbing He, Han Zheng, and Tsung-Han Lin for constructive discussions and their feedback on this work. The authors also thank Derek Gloudemans and Gergely Zachar for providing the data used to create the vectorized road maps of Interstate 24 and the coordinate system transformation code.