ATLAS collaboration
Cite as: ATLAS collaboration (2024). ATLAS top tagging open data set with systematic uncertainties. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.SOAY.LABE
Dataset Derived Datascience ATLAS CERN-LHC
Boosted top tagging is an essential binary classification task for experiments at the Large Hadron Collider (LHC) to measure the properties of the top quark. The ATLAS Top Tagging Open Data Set is a publicly available dataset for the development of machine learning (ML) based boosted top tagging algorithms. The dataset consists of a nominal piece, used for the training and evaluation of algorithms, and a systematic piece, used for estimating the size of the systematic uncertainties produced by an algorithm. The nominal data are split into two orthogonal sets, named train and test. The systematically varied data are split into many more pieces, which in most cases should be used only for evaluation. Both nominal sets are composed of equal parts signal (jets initiated by a boosted top quark) and background (jets initiated by light quarks or gluons).
A brief overview of these datasets follows. For more detailed information, see arXiv:2407.20127.
For each jet, the datasets contain:
There are two rules for using this data set: the contribution to a loss function from any jet should always be weighted by the training weight, and any performance claim is incomplete without an estimate of the systematic uncertainties via the method illustrated in this repository. The ideal model shows high performance but also small systematic uncertainties.
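The first rule can be illustrated with a minimal sketch of a per-jet weighted binary cross-entropy. This is not the reference implementation from the repository: the function name `weighted_bce`, the toy labels, scores and weights, and the choice to normalise by the sum of weights are all assumptions made for illustration. In practice the weights would be the training weights stored with each jet.

```python
import math


def weighted_bce(labels, scores, weights, eps=1e-7):
    """Binary cross-entropy in which each jet's term is scaled by its weight.

    labels  : 1 for signal (top quark jets), 0 for background
    scores  : tagger outputs in [0, 1]
    weights : per-jet training weights (hypothetical values in this sketch)
    Dividing by the sum of weights is an assumed normalisation choice.
    """
    total = 0.0
    weight_sum = 0.0
    for y, p, w in zip(labels, scores, weights):
        # Clip the score so the logarithms stay finite.
        p = min(max(p, eps), 1.0 - eps)
        per_jet = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += w * per_jet
        weight_sum += w
    return total / weight_sum


# Toy example with made-up numbers, not taken from the dataset.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.6, 0.4]
toy_weights = [1.0, 0.5, 2.0, 1.5]
print(weighted_bce(toy_labels, toy_scores, toy_weights))
```

Most ML frameworks accept per-sample weights directly in their loss functions, which achieves the same effect without a hand-written loop.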
This dataset supersedes an earlier data release which did not include data for estimating systematic uncertainties.
A detailed explanation of this dataset, with examples demonstrating how to train a tagger and assess systematic uncertainties, is provided in this repository.
If this dataset is used in a publication, please cite this dataset record along with the accompanying paper, arXiv:2407.20127.
The open data are released under the Creative Commons CC0 waiver. Neither the experiment(s) (ATLAS) nor CERN endorse any works, scientific or otherwise, produced using these data. All releases will have a unique DOI that you are requested to cite in any applications or publications.