The Compact Muon Solenoid (CMS) is one of the large particle detectors at CERN's Large Hadron Collider. The CMS Collaboration consists of more than 4000 scientists, engineers, technicians and students from around 240 institutes and universities from more than 50 countries. You can find more information about the CMS detector on the official CMS website.
You can find instructions and suggestions for using CMS Open Data for different purposes in:
This page gives a brief overview of CMS Open Data contents:
The following are provided through this portal:
Collision data in the primary datasets are typically in a format known as AOD (Analysis Object Data), while simulated data are in a format called AODSIM. Beginning in Run 2, smaller data formats called MiniAOD and NanoAOD were developed in CMS to implement common physics object processing and to remove information that is not often needed for analysis.
AOD(SIM) and MiniAOD(SIM) files
AOD/AODSIM files are provided for Run 1 primary datasets and contain the information that is needed for analysis:
See the Getting Started page for AOD data to learn more about analyzing AOD files.
Starting from Run 2 (2015), MiniAOD/MiniAODSIM files are provided. These files contain similar information to AOD, but physics objects are processed to include more identification and selection information within a lighter C++ object, transverse momentum thresholds for storing objects are increased, and some lower-level information has been removed. MiniAOD datasets are approximately one tenth of the size of AOD datasets. More information about MiniAOD:
AOD and MiniAOD files do not contain the final event interpretation with a simple list of particles. The files can be read in ROOT, but they cannot be opened (and understood) as simple data tables. A file typically contains several instances of the same physics object (e.g. a jet reconstructed with different algorithms), and some physics objects may be "double-counted" (e.g. a physics object may appear as a single object of its own type, but it may also be part of a jet).
Additional knowledge is needed to define a "good" physics object, and this definition can be different in each analysis. Only the runs that are validated by data quality monitoring should be used in any analysis. The list of the validated runs is provided.
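The run selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the portal's own tooling: it assumes the validated-runs list is a JSON mapping from run number to inclusive ranges of luminosity sections, and the run numbers, ranges, and helper names (`load_validated`, `is_validated`) are all made up for the example.

```python
import json

# Hypothetical excerpt of a validated-runs list (assumed layout):
# run number -> inclusive [first, last] luminosity-section ranges.
VALIDATED_JSON = '{"1001": [[10, 110]], "1002": [[1, 3], [7, 20]]}'


def load_validated(text):
    """Parse the JSON text into {run: [(first, last), ...]} with integer keys."""
    return {int(run): [tuple(r) for r in ranges]
            for run, ranges in json.loads(text).items()}


def is_validated(validated, run, lumi):
    """Return True if (run, lumi) falls inside any validated range."""
    return any(first <= lumi <= last for first, last in validated.get(run, []))


validated = load_validated(VALIDATED_JSON)

# Made-up (run, luminosity section) pairs standing in for real events.
events = [(1001, 50), (1001, 200), (1002, 5), (1002, 7), (9999, 1)]
kept = [ev for ev in events if is_validated(validated, *ev)]
print(kept)  # [(1001, 50), (1002, 7)]
```

Events from runs that are absent from the list, or from sections outside the listed ranges, are dropped before any physics selection is applied.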
NanoAOD(SIM) files
Starting from data collected in 2016, datasets in NanoAOD format are provided alongside MiniAOD. Only a limited set of observables for each physics object is kept, with limited numerical precision. For example, detector information is typically dropped in favor of pre-computed identification algorithm results. The Particle Flow candidates are also dropped, since they are primarily used as inputs to higher-level physics object reconstruction. The NanoAOD format is about 20 times smaller than MiniAOD, or about 200 times smaller than AOD. NanoAOD files can be read in ROOT as a basic TTree containing standard data types.
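To make the idea of a flat tree of standard data types concrete, the sketch below mimics that layout with plain Python lists: one counter per event plus one variable-length array per observable. The branch names follow the `nMuon` / `Muon_pt` naming pattern used in NanoAOD, but the values and the `select_events` helper are invented for illustration. With a real file, the same columns would be read with ROOT or a library such as uproot rather than typed by hand.

```python
# Three made-up events, stored column-wise the way a flat tree stores them:
# a per-event counter plus one variable-length array per observable.
columns = {
    "nMuon":    [2, 0, 1],
    "Muon_pt":  [[41.5, 22.0], [], [63.2]],   # GeV, invented values
    "Muon_eta": [[0.3, -1.1], [], [2.0]],
}


def select_events(cols, min_pt):
    """Return indices of events with at least one muon above min_pt (GeV)."""
    return [i for i, pts in enumerate(cols["Muon_pt"])
            if any(pt > min_pt for pt in pts)]


# Consistency check: the counter matches the array length in every event.
assert all(n == len(pts)
           for n, pts in zip(columns["nMuon"], columns["Muon_pt"]))

print(select_events(columns, 30.0))  # [0, 2]
```

Because every column holds only numbers, no experiment-specific C++ classes are needed to loop over events, which is the main practical difference from AOD and MiniAOD.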
More information about NanoAOD:
NanoAOD files may still contain several instances of the same physics object (e.g. a jet reconstructed with different algorithms), and some physics objects may be "double-counted" (e.g. a physics object may appear as a single object of its own type, but it may also be part of a jet).
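One common way to deal with this kind of double-counting is to discard jets that lie too close to an already selected lepton. The sketch below is an illustration under stated assumptions, not a CMS recommendation: it uses the angular distance ΔR = sqrt(Δη² + Δφ²), the threshold of 0.4 is an assumed value, and the objects and helper names (`delta_r`, `clean_jets`) are made up.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the eta-phi plane, with delta-phi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def clean_jets(jets, leptons, min_dr=0.4):
    """Keep jets that are farther than min_dr (assumed value) from every lepton."""
    return [j for j in jets
            if all(delta_r(j["eta"], j["phi"], l["eta"], l["phi"]) > min_dr
                   for l in leptons)]


# Made-up objects for a single event.
muons = [{"eta": 0.50, "phi": 3.10}]
jets = [
    {"pt": 45.0, "eta": 0.52, "phi": -3.12},  # overlaps the muon once phi wraps
    {"pt": 60.0, "eta": -1.20, "phi": 0.40},  # well separated
]

print([j["pt"] for j in clean_jets(jets, muons)])  # [60.0]
```

The first jet is removed because the muon is most likely one of its constituents. Whether to drop the jet, drop the lepton, or keep both depends on the analysis.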
Additional knowledge is needed to define a "good" physics object, and this definition can be different in each analysis. Only the runs that are validated by data quality monitoring should be used in any analysis. The list of the validated runs is provided.
RECO files
Some datasets, such as those containing heavy-ion data, are provided in a format called RECO, which contains more information than the AOD format. This is done when the original analyses by the CMS collaboration were performed using this particular format.
Raw data
Small samples of raw data are also provided.