To analyse CMS data collected in 2011 and 2012, you need version 5.3.32 of CMSSW, which is supported only on Scientific Linux 6. If you are unfamiliar with Linux, take a look at this short introduction to Linux or this tutorial. Once you have installed the CMS open data container or the CMS-specific CERN Virtual Machine, you need to open a terminal.
If you are using the VM, always use the "CMS Shell" terminal, available from the "CMS Shell" icon on the desktop, for all CMSSW-specific commands, such as compiling and running code. In the VM "CMS Shell", execute the following command in the terminal if you haven't done so before; it ensures that you have this version of CMSSW available:
$ cmsrel CMSSW_5_3_32
Note that if you get a warning message about the current OS not being slc6, you are using the wrong terminal (the "Outer Shell"), which runs CERN CentOS 7 (cc7). Open a "CMS Shell" terminal as explained above and execute the cmsrel command there.
In the VM, the CMS analysis environment needs to be properly set up by entering the following commands in the terminal (you must do so every time you boot the VM before you can proceed):
$ cd CMSSW_5_3_32/src/
$ cmsenv # do not execute this command if you are working in the container
Make sure that you are always in the CMSSW_5_3_32/src/ directory, both in the CMS open data container and in the VM (and in the "CMS Shell" terminal in VM).
The primary data provided by CMS on the CERN Open Data Portal are in a format called "Analysis Object Data", or AOD for short. These AOD files are prepared by piecing together raw data collected by the various sub-detectors of CMS and contain all the information that is needed for analysis. The files cannot be opened and understood as simple data tables; to read them you need ROOT, a framework used by several particle-physics experiments to work with the collected data.
But let's first see what an AOD file contains.
Make sure that you are in the CMSSW_5_3_32/src/ folder (and, in the VM, that you have executed the cmsenv command in your terminal).
Select a dataset, for example the ElectronHad dataset from Run2012A. You can select a file (a listing is available for each dataset record) and print out its contents with:
$ edmDumpEventContent root://eospublic.cern.ch//eos/opendata/cms/Run2012A/ElectronHad/AOD/22Jan2013-v1/20000/FEE9E03A-F581-E211-8758-002618943901.root
The output is a list of the different objects that the file contains, such as
Type                          Module                    Label     Process
----------------------------------------------------------------------------------------------
edm::TriggerResults           "TriggerResults"          ""        "HLT"
trigger::TriggerEvent         "hltTriggerSummaryAOD"    ""        "HLT"
[...]
vector<reco::GsfElectron>     "gsfElectrons"            ""        "RECO"
[...]
vector<reco::Muon>            "muons"                   ""        "RECO"
[...]
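If you want to search such a listing programmatically, the rows can be split into their four columns. The following is only a minimal sketch in plain Python, not a CMSSW tool: the names Product, ROW and parse_dump are made up for this illustration, and the sample text simply repeats the rows shown above.

```python
import re
from collections import namedtuple

# One row of the listing: C++ type, module label, product instance label, process name.
# (Product, ROW and parse_dump are illustrative names, not part of CMSSW.)
Product = namedtuple("Product", ["cpp_type", "module", "label", "process"])

# A data row is a type followed by three double-quoted fields; the label is often empty.
ROW = re.compile(
    r'^\s*(?P<cpp_type>\S.*?)\s+"(?P<module>[^"]*)"'
    r'\s+"(?P<label>[^"]*)"\s+"(?P<process>[^"]*)"\s*$'
)


def parse_dump(text):
    """Return a Product for every row; header, separator and [...] lines are skipped."""
    products = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            products.append(Product(**match.groupdict()))
    return products


SAMPLE = """\
Type Module Label Process
----------------------------------------------------------------------------------------------
edm::TriggerResults "TriggerResults" "" "HLT"
trigger::TriggerEvent "hltTriggerSummaryAOD" "" "HLT"
vector<reco::GsfElectron> "gsfElectrons" "" "RECO"
vector<reco::Muon> "muons" "" "RECO"
"""

products = parse_dump(SAMPLE)
# Module labels of everything produced by the RECO process.
reco_modules = [p.module for p in products if p.process == "RECO"]
print(reco_modules)
```

The module label (for example "muons") is the name you later pass to the framework when requesting a collection in an analyzer.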
Documentation of the objects of main interest to physics analysis is available in the CMS Open Data guide. The objects are implemented as C++ classes in the CMS software package CMSSW, and detailed reference documentation of all classes is available in the class list of the CMSSW reference manual. To see the properties of electrons, you would navigate to the namespace "reco" and find the entry for GsfElectron. The reco::GsfElectron Class Reference lists all member functions through which the different properties of a reconstructed electron can be accessed. Note that many of the basic properties are "inherited" from the parent classes, and are listed separately under "Public Member Functions inherited from ... ". You can find more information on each object in the CMS Open Data guide (e.g. electrons).
These objects can be accessed in a software module which can be built with a helper script available in the CMS open data environment. Do the following:
$ mkdir Demo
$ cd Demo
$ mkedanlzr DemoAnalyzer
$ cd DemoAnalyzer
This will create several template files in the new DemoAnalyzer directory. For more information about CMSSW analyzer modules, have a look at the CMS Open Data guide.
Compile the code with:
$ scram b
You can ignore the message
****WARNING: No need to export library once you have declared your library as plugin.
Please cleanup src/Demo/DemoAnalyzer/BuildFile by removing the <export></export> section.
or take action and remove the indicated section from BuildFile.xml.
Change the file name in the configuration file demoanalyzer_cfg.py in the DemoAnalyzer directory. Take, for example, a file from the SingleMu dataset from Run2012D, i.e. replace file:myfile.root with root://eospublic.cern.ch//eos/opendata/cms/Run2012D/SingleMu/AOD/22Jan2013-v1/10000/0015EC7D-EAA7-E211-A9B9-E0CB4E5536A7.root.
Change the maximum number of events to 10, i.e. change -1 to 10 in process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) ).
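The two edits can also be pictured as a plain text substitution. The sketch below is purely illustrative: TEMPLATE_LINES is an assumed excerpt of what the generated demoanalyzer_cfg.py looks like (your file may be laid out differently), and patch_config is a hypothetical helper, not a CMSSW utility. Editing the file by hand in a text editor is equally valid.

```python
# Assumed excerpt of the generated configuration; the real file may differ in layout.
TEMPLATE_LINES = """\
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring(
        'file:myfile.root'
    )
)
"""

# The SingleMu file used in this guide.
INPUT_FILE = (
    "root://eospublic.cern.ch//eos/opendata/cms/Run2012D/SingleMu/AOD/"
    "22Jan2013-v1/10000/0015EC7D-EAA7-E211-A9B9-E0CB4E5536A7.root"
)


def patch_config(text, input_file, max_events):
    """Swap the placeholder input file and the event limit in configuration text."""
    if "file:myfile.root" not in text or "cms.untracked.int32(-1)" not in text:
        raise ValueError("expected placeholders not found in the configuration text")
    text = text.replace("file:myfile.root", input_file)
    text = text.replace(
        "cms.untracked.int32(-1)", "cms.untracked.int32(%d)" % max_events
    )
    return text


patched = patch_config(TEMPLATE_LINES, INPUT_FILE, 10)
print(patched)
```

A value of -1 means "process all events in the input", so limiting the job to 10 events keeps this first test quick.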
Run the code with:
$ cmsRun demoanalyzer_cfg.py
and you will get an output like:
221119 18:53:23 1032 Xrd: XrdClientConn: Error resolving this host's domain name.
221119 18:53:23 1032 secgsi_InitProxy: cannot access private key file: /home/cmsusr/.globus/userkey.pem
221119 18:53:23 1032 Xrd: CheckErrorStatus: Server [eospublic.cern.ch] declared: (error code: 3005)
19-Nov-2022 18:53:23 CET Initiating request to open file root://eospublic.cern.ch//eos/opendata/cms/Run2012D/SingleMu/AOD/22Jan2013-v1/10000/0015EC7D-EAA7-E211-A9B9-E0CB4E5536A7.root
19-Nov-2022 18:53:26 CET Successfully opened file root://eospublic.cern.ch//eos/opendata/cms/Run2012D/SingleMu/AOD/22Jan2013-v1/10000/0015EC7D-EAA7-E211-A9B9-E0CB4E5536A7.root
Begin processing the 1st record. Run 206401, Event 240060474, LumiSection 178 at 19-Nov-2022 18:54:37.199 CET
Begin processing the 2nd record. Run 206401, Event 240069594, LumiSection 178 at 19-Nov-2022 18:54:37.227 CET
Begin processing the 3rd record. Run 206401, Event 240049754, LumiSection 178 at 19-Nov-2022 18:54:37.228 CET
Begin processing the 4th record. Run 206401, Event 240115594, LumiSection 178 at 19-Nov-2022 18:54:37.228 CET
Begin processing the 5th record. Run 206401, Event 240154770, LumiSection 178 at 19-Nov-2022 18:54:37.229 CET
Begin processing the 6th record. Run 206401, Event 240103386, LumiSection 178 at 19-Nov-2022 18:54:37.229 CET
Begin processing the 7th record. Run 206401, Event 240173338, LumiSection 178 at 19-Nov-2022 18:54:37.230 CET
Begin processing the 8th record. Run 206401, Event 240127898, LumiSection 178 at 19-Nov-2022 18:54:37.230 CET
Begin processing the 9th record. Run 206401, Event 240103970, LumiSection 178 at 19-Nov-2022 18:54:37.231 CET
Begin processing the 10th record. Run 206401, Event 240129066, LumiSection 178 at 19-Nov-2022 18:54:37.231 CET
19-Nov-2022 18:54:37 CET Closed file root://eospublic.cern.ch//eos/opendata/cms/Run2012D/SingleMu/AOD/22Jan2013-v1/10000/0015EC7D-EAA7-E211-A9B9-E0CB4E5536A7.root
=============================================
MessageLogger Summary

 type     category        sev    module        subroutine        count    total
 ---- -------------------- -- ---------------- ----------------  -----    -----
    1 fileAction           -s file_close                             1        1
    2 fileAction           -s file_open                              2        2

 type    category    Examples: run/evt        run/evt          run/evt
 ---- -------------------- ---------------- ---------------- ----------------
    1 fileAction           PostEndRun
    2 fileAction           pre-events       pre-events

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------
System                  3                   3
This is a simple loop over the first 10 events in the file. To access the physics object information, for example that of muons, add the following lines to src/DemoAnalyzer.cc (the lines immediately before and after the additions are also shown):
[...]
#include "FWCore/ParameterSet/interface/ParameterSet.h"

//classes to extract Muon information
#include "DataFormats/MuonReco/interface/Muon.h"
#include "DataFormats/MuonReco/interface/MuonFwd.h"
#include <iostream>
#include <vector>

//
// class declaration
[...]
      // ----------member data ---------------------------
      std::vector<float> muon_e; //energy values for muons in the event
};
[...]
   using namespace edm;

   //clear the container so that energies from the previous event are not kept
   muon_e.clear();

   //define the handle and get the muon collection by its label
   Handle<reco::MuonCollection> mymuons;
   iEvent.getByLabel("muons", mymuons);

   //if the collection is valid, loop over the muons in the event
   if (mymuons.isValid()) {
      for (reco::MuonCollection::const_iterator itmuon = mymuons->begin(); itmuon != mymuons->end(); ++itmuon) {
         muon_e.push_back(itmuon->energy());
      }
   }

   //print the energies stored for this event
   for (unsigned int i = 0; i < muon_e.size(); i++) {
      std::cout << "Muon # " << i << " with E = " << muon_e.at(i) << " GeV." << std::endl;
   }

#ifdef THIS_IS_AN_EVENT_EXAMPLE
[...]
Modify BuildFile.xml to include the DataFormats/MuonReco dependency so that it becomes:
<use name="FWCore/Framework"/>
<use name="FWCore/PluginManager"/>
<use name="DataFormats/MuonReco"/>
<use name="FWCore/ParameterSet"/>
<flags EDM_PLUGIN="1"/>
Compile and run again with:
$ scram b
$ cmsRun demoanalyzer_cfg.py
The output gives the energy of muons in these events:
19-Nov-2022 19:53:08 CET Initiating request to open file root://eospublic.cern.ch//eos/opendata/cms/Run2012D/SingleMu/AOD/22Jan2013-v1/10000/0015EC7D-EAA7-E211-A9B9-E0CB4E5536A7.root
19-Nov-2022 19:53:10 CET Successfully opened file root://eospublic.cern.ch//eos/opendata/cms/Run2012D/SingleMu/AOD/22Jan2013-v1/10000/0015EC7D-EAA7-E211-A9B9-E0CB4E5536A7.root
Begin processing the 1st record. Run 206401, Event 240060474, LumiSection 178 at 19-Nov-2022 19:53:50.971 CET
Muon # 0 with E = 31.2151 GeV.
Begin processing the 2nd record. Run 206401, Event 240069594, LumiSection 178 at 19-Nov-2022 19:53:51.000 CET
Muon # 0 with E = 62.6309 GeV.
Begin processing the 3rd record. Run 206401, Event 240049754, LumiSection 178 at 19-Nov-2022 19:53:51.001 CET
Muon # 0 with E = 71.6465 GeV.
Muon # 1 with E = 3.99535 GeV.
Begin processing the 4th record. Run 206401, Event 240115594, LumiSection 178 at 19-Nov-2022 19:53:51.001 CET
Muon # 0 with E = 137.55 GeV.
Muon # 1 with E = 2.70864 GeV.
Muon # 2 with E = 4.33524 GeV.
Begin processing the 5th record. Run 206401, Event 240154770, LumiSection 178 at 19-Nov-2022 19:53:51.002 CET
Muon # 0 with E = 87.9848 GeV.
Muon # 1 with E = 4.34456 GeV.
Begin processing the 6th record. Run 206401, Event 240103386, LumiSection 178 at 19-Nov-2022 19:53:51.002 CET
Muon # 0 with E = 30.2197 GeV.
Muon # 1 with E = 11.064 GeV.
Muon # 2 with E = 10.8193 GeV.
Begin processing the 7th record. Run 206401, Event 240173338, LumiSection 178 at 19-Nov-2022 19:53:51.003 CET
Muon # 0 with E = 6.84971 GeV.
Muon # 1 with E = 12.0909 GeV.
Muon # 2 with E = 3.20224 GeV.
Muon # 3 with E = 7.04104 GeV.
Muon # 4 with E = 7.90646 GeV.
Muon # 5 with E = 6.20379 GeV.
Begin processing the 8th record. Run 206401, Event 240127898, LumiSection 178 at 19-Nov-2022 19:53:51.003 CET
Muon # 0 with E = 42.8793 GeV.
Muon # 1 with E = 3.31122 GeV.
Muon # 2 with E = 3.85927 GeV.
Muon # 3 with E = 3.0424 GeV.
Begin processing the 9th record. Run 206401, Event 240103970, LumiSection 178 at 19-Nov-2022 19:53:51.003 CET
Muon # 0 with E = 55.7221 GeV.
Muon # 1 with E = 2.80195 GeV.
Begin processing the 10th record. Run 206401, Event 240129066, LumiSection 178 at 19-Nov-2022 19:53:51.004 CET
Muon # 0 with E = 33.7197 GeV.
Muon # 1 with E = 4.90223 GeV.
Muon # 2 with E = 5.61441 GeV.
19-Nov-2022 19:53:51 CET Closed file root://eospublic.cern.ch//eos/opendata/cms/Run2012D/SingleMu/AOD/22Jan2013-v1/10000/0015EC7D-EAA7-E211-A9B9-E0CB4E5536A7.root
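If you redirect this output to a file, a few lines of plain Python are enough to group the printed energies by event. This is only a sketch that assumes the exact line formats shown above; the function name muons_per_event and the dictionary keys are made up for the illustration, and the sample text repeats two of the records from the output.

```python
import re

# Line formats assumed from the output above.
RECORD = re.compile(
    r"Begin processing the \d+\w* record\. Run (\d+), Event (\d+), LumiSection (\d+)"
)
MUON = re.compile(r"Muon # (\d+) with E = ([0-9.]+) GeV\.")


def muons_per_event(log_text):
    """Return one dict per event with run, event, lumi and the list of muon energies."""
    events = []
    for line in log_text.splitlines():
        record = RECORD.search(line)
        if record:
            run, event, lumi = (int(value) for value in record.groups())
            events.append({"run": run, "event": event, "lumi": lumi, "energies": []})
            continue
        muon = MUON.search(line)
        if muon and events:
            # A muon line belongs to the most recently started event.
            events[-1]["energies"].append(float(muon.group(2)))
    return events


LOG = """\
Begin processing the 3rd record. Run 206401, Event 240049754, LumiSection 178 at 19-Nov-2022 19:53:51.001 CET
Muon # 0 with E = 71.6465 GeV.
Muon # 1 with E = 3.99535 GeV.
Begin processing the 4th record. Run 206401, Event 240115594, LumiSection 178 at 19-Nov-2022 19:53:51.001 CET
Muon # 0 with E = 137.55 GeV.
Muon # 1 with E = 2.70864 GeV.
Muon # 2 with E = 4.33524 GeV.
"""

events = muons_per_event(LOG)
for ev in events:
    # Event number, muon multiplicity and the largest muon energy in the event.
    print(ev["event"], len(ev["energies"]), max(ev["energies"], default=None))
```

For anything beyond a quick check, writing the values to a ROOT file from the analyzer is more robust than parsing printed text.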
In AOD files, reconstructed physics objects are included without checking their "quality". For example, the reconstructed objects in the muon collection that you printed out are not guaranteed to be from validated data. In order to analyse only the "good quality" data, you must apply some selection criteria.
First of all, you will need to apply a filter for validated data. Then, you will want to apply some identification and selection criteria, such as whether the objects in your analysis are isolated from or close to other particles in the same collision.
For a quick start on how to do this and to write out the most common objects and their properties, use the "Physics Object Extractor Tool (POET)" available in this repository. You can use ROOT to inspect reconstructed particles and the distributions of their properties.
Start by getting the code and compiling it. Make sure that you are back in the CMSSW_5_3_32/src/ folder. If you are using the VM, run the git command that fetches the code in the "Outer Shell" terminal, after going to the right folder with cd ~/CMSSW_5_3_32/src. In the container, keep using the normal container shell and go to the right folder with cd $CMSSW_BASE/src.
$ git clone https://github.com/cms-opendata-analyses/PhysObjectExtractorTool.git
$ cd PhysObjectExtractorTool
$ git checkout 2012
$ scram b
NOTE: To analyse the full event content, the analysis job needs access to the "condition data", such as trigger information or jet-energy corrections. In the VM, the condition database is made available through the cvmfs file system, and in the container, the condition data can be read from predefined condition data servers. In both cases, reading the condition data for the first time can take a very long time. For the 2011 and 2012 collision and simulated data, a selection of condition databases is provided locally in the cmssw_5_3_32-slc6_amd64_gcc472 container, and the access is much faster. Comment or uncomment the lines related to condition data depending on your environment, following the instructions in the configuration file PhysObjectExtractor/python/poet_cfg.py. See detailed instructions for the use of condition data for different data-taking years in the guide to the CMS condition database.
Note also how only the validated runs are selected in the configuration file. The relevant lines are:
import FWCore.ParameterSet.Config as cms
import FWCore.PythonUtilities.LumiList as LumiList
import FWCore.ParameterSet.Types as CfgTypes
[...]
goodJSON = "data/Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt"
myLumis = LumiList.LumiList(filename=goodJSON).getCMSSWString().split(",")
process.source.lumisToProcess = CfgTypes.untracked(
    CfgTypes.VLuminosityBlockRange())
process.source.lumisToProcess.extend(myLumis)
This selection must always be applied to any analysis on CMS open data, and to do so you must have the validation file downloaded to your local area.
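To see what this selection amounts to, here is a minimal sketch in plain Python. It assumes that the validation file is a JSON object mapping each run number to a list of inclusive [first, last] luminosity-section ranges; the excerpt below is invented for illustration and does not reproduce any real certified range. The function is_validated is a hypothetical stand-in, not the LumiList implementation used in the configuration file.

```python
import json

# Invented excerpt in the assumed layout: run number -> list of [first, last] ranges.
EXAMPLE_JSON = '{"1": [[1, 50], [60, 75]], "2": [[10, 20]]}'


def is_validated(good_lumis, run, lumi):
    """True if (run, lumi) falls inside one of the listed ranges for that run."""
    ranges = good_lumis.get(str(run), [])
    return any(first <= lumi <= last for first, last in ranges)


good_lumis = json.loads(EXAMPLE_JSON)

print(is_validated(good_lumis, 1, 60))   # inside the second range of run 1
print(is_validated(good_lumis, 1, 55))   # in the gap between the two ranges
print(is_validated(good_lumis, 3, 1))    # run not listed at all
```

In a CMSSW job you do not need to write such a check yourself: setting lumisToProcess on the source, as in the lines above, makes the framework skip every luminosity section that is not listed.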
To produce a ROOT file with selected objects, do the following:
$ cd PhysObjectExtractor
$ cmsRun python/poet_cfg.py
The configuration file sets the job to run over 1000 events of a simulated dataset.
If you are using the CMS open data container with the VNC application installed (see the container guide page), you need to start the VNC application in the container before you can open a graphical user interface. Do so by typing
$ start_vnc
and then start a VNC viewer on your local computer using the password cms.cern. The http option for a GUI in the browser is not guaranteed to work in the container with this CMSSW version.
You can now open the POET output file in ROOT:
$ root myoutput.root
You will see the ROOT logo appear on screen. You can now open the ROOT GUI by entering:
TBrowser t
and the ROOT browser window will open.
Now, let us take a closer look at some collections of physics objects.
In the left-hand pane of the ROOT browser, double-click on the file name (myoutput.root). You should see a list of names, each corresponding to a collection of reconstructed data.
Let us take a peek, for example, at the muons, which are found in mymuons. Look in there by double-clicking on that line and then double-clicking on Events. Here, you can have a look at various properties of this collection, such as the transverse momentum of the muon: muon_pt. Double-click on it to draw the distribution.
You can exit the ROOT browser through the GUI by clicking on Browser in the menu and then clicking on Quit Root, or by entering .q in the terminal.
That's it! We hope you enjoyed this exercise. Feel free to play around with the rest of the data and to write your own analyzers and analysis code. Learn more in the CMS Open Data guide, and have a look at the other example analysis workflows: the tool to produce the reduced "NanoAOD" format for outreach and education; the example analyses on its output implemented in Python for the di-muon spectrum or the Higgs boson decay to two tau leptons; the Higgs decay to four leptons implemented in C++ or using ROOT's RDataFrame; and the di-muon spectrum analysis using Julia.