ATLAS collaboration
Cite as: ATLAS collaboration (2025). ATLAS $t\bar{t}$ simulation for ML-based jet flavour tagging (JetSet). CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.QG8W.TO8P
Dataset Derived Simulated Datascience ATLAS CERN-LHC
Flavour-tagging — the task of identifying the flavour of jets — is essential for many physics analyses at the ATLAS experiment. This dataset, released for public use, can be used to train and evaluate machine learning models for jet flavour-tagging at ATLAS. It aims to facilitate broader interest and further development of innovative machine learning techniques to improve flavour-tagging performance.
The dataset consists of approximately 50 million events from simulated top quark pair production at a centre-of-mass energy of 13.6 TeV. It is stored in HDF5 format and contains structured event-level, jet-level, track-level and truth hadron information. This dataset is designed to be compatible with the flavour-tagging algorithm development pipeline used at ATLAS, and is supported by accompanying instructions and example configurations provided in open-source repositories.
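Because the files are plain HDF5, their layout can be inspected before committing to a full pipeline. The following is a minimal sketch, assuming the `h5py` package is installed and one of the files has been downloaded locally. The helper name `summarize_hdf5` is hypothetical, and no dataset or field names are assumed: the function simply reports whatever the file contains.

```python
from pathlib import Path

import h5py


def summarize_hdf5(path):
    """Map each dataset name in the file to (shape, compound field names or None)."""
    summary = {}

    def visit(name, obj):
        # Groups are skipped; only datasets carry array data.
        if isinstance(obj, h5py.Dataset):
            # dtype.names is a tuple of field names for structured (compound)
            # dtypes and None for plain numeric dtypes.
            summary[name] = (obj.shape, obj.dtype.names)

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return summary


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the file was downloaded.
    path = Path("mc-flavtag-ttbar-small.h5")
    if path.exists():
        for name, (shape, fields) in summarize_hdf5(path).items():
            print(name, shape, fields)
    else:
        print(f"{path} not found; download it from the record first")
```

Opening the file in read-only mode and visiting it lazily means only metadata is read, so this should stay fast even on the largest file.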
To improve usability, the dataset is split into three mutually exclusive HDF5 files:
mc-flavtag-ttbar-small.h5: ~1.36 million events (~5.6 million jets)
mc-flavtag-ttbar-medium.h5: ~6.23 million events (~25.6 million jets)
mc-flavtag-ttbar-large.h5: ~41.1 million events (~168 million jets)

Downloading all three files provides access to the complete dataset. The smaller subsets are useful for quick exploration or prototyping workflows.
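As a quick consistency check, the per-file counts quoted for the three files can be summed and compared with the stated total of approximately 50 million events. The sketch below uses only the rounded figures given in this record (in millions), so the results are approximate; the variable names are illustrative.

```python
# Approximate (events, jets) per file, in millions, as quoted in this record.
subsets = {
    "mc-flavtag-ttbar-small.h5": (1.36, 5.6),
    "mc-flavtag-ttbar-medium.h5": (6.23, 25.6),
    "mc-flavtag-ttbar-large.h5": (41.1, 168.0),
}

total_events = sum(events for events, _ in subsets.values())
total_jets = sum(jets for _, jets in subsets.values())

for name, (events, jets) in subsets.items():
    share = events / total_events
    print(f"{name}: {jets / events:.2f} jets per event, {share:.1%} of all events")

print(f"total: ~{total_events:.2f} million events, ~{total_jets:.1f} million jets")
```

With these rounded inputs the files add up to roughly 48.7 million events and about 199 million jets, or about 4.1 jets per event in each file.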
A detailed explanation of this dataset and instructions for the pre-processing, training, and evaluation workflows are provided in the accompanying GitLab repository. If this dataset is used in a publication, please cite this dataset record along with the accompanying ATLAS paper describing GN2, an ATLAS flavour-tagging algorithm with a transformer-like architecture.
Transforming Jet Flavour: Documentation and training pipeline
These open data are released under the Creative Commons Zero v1.0 Universal license.
Neither the experiment(s) ( ATLAS ) nor CERN endorse any works, scientific or otherwise, produced using these data.
This release has a unique DOI that you are requested to cite in any applications or publications.