All datasets available at SPRACE (which are AOD or AODSIM) can be found in this link. For 2012, the following are available:

- Run2012 real-data datasets can be found in this link
- Summer12 MC datasets can be found in this link
- JSON files for the 2012 runs at 8 TeV can be found in this link

These links should be useful.
In general, we advocate the following strategy: break the analysis down hierarchically. Run over the large datasets on the GRID, make smaller datasets to run over at SPRACE, and make PATtuples / ntuples that you can run over on your own computer. I think this is the most efficient strategy.
To run only over data you have not analyzed yet, subtract the JSON of the data you already used from the most recent certification JSON:

compareJSON.py --sub <mostRecent.json> <dataAlreadyUsed.json> <fileForNewDataOnly.json>
As an example, let us run over the /MET/Run2012A-PromptReco-v1/AOD dataset with the rsanalyzer_JetMET_skimming_Run2012A_cfg.py configuration file. We set up a task with around 75 jobs, and we copy the output to the remote directory /MET_Run2012A-PromptReco_v1_2012May10, which lives in srm://osg-se.sprace.org.br:8443/srm/managerv2?SFN=/pnfs/sprace.org.br/data/cms/store/user/yourUserName/MET_Run2012A-PromptReco_v1_2012May10. Naturally, you have to set these values to the ones you want. The CRAB configuration file looks like this:

[CRAB]
jobtype = cmssw
scheduler = glite
use_server = 0

[CMSSW]
datasetpath=/MET/Run2012A-PromptReco-v1/AOD
pset=rsanalyzer_JetMET_skimming_Run2012A_cfg.py
total_number_of_lumis=-1
number_of_jobs = 75
lumi_mask=fileForNewDataOnly.json
get_edm_output = 1

[USER]
copy_data = 1
return_data = 0
storage_element = T2_BR_SPRACE
user_remote_dir = /MET_Run2012A-PromptReco_v1_2012May10
ui_working_dir = myWorkingDirName

[GRID]
ce_white_list = T2_BR_SPRACE
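If the configuration above is saved as crab.cfg (the default file name CRAB looks for; here just an assumption about how you name it), the task is created and submitted with the usual CRAB commands:

crab -create
crab -submit -c myWorkingDirName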
Once the task is done, check its status, retrieve the output and generate the job report:

crab -status -c myWorkingDirName
crab -getoutput -c myWorkingDirName
crab -report -c myWorkingDirName

The report step creates the file myWorkingDirName/res/lumiSummary.json. This file represents exactly the data you ran over, taking into account failed jobs, blocks of data which were not yet available, etc.
You can calculate the integrated luminosity corresponding to that JSON file with the lumiCalc2.py script:

lumiCalc2.py -b stable -i lumiSummary.json overview
Finally, merge the JSON of the data you just ran over with the JSON of the data you had already used, to keep track of the total data analyzed:

mergeJSON.py previousData.json dataYouJustRanOver.json --output=totalData.json
The following picture shows this process schematically:
Naturally, the preselection depends on your specific analysis channel. Remember that the goal is to separate the analysis hierarchically: run over large datasets using the GRID, preselect/reduce them to more manageable sizes, bring them to SPRACE, and run the rest of the analysis more or less locally. If you make a very complicated preselection on the GRID, it becomes comparable to doing the whole analysis there, which defeats the whole idea. So, some general points:
Skimming on trigger decisions can be done with the TriggerResultsFilter module. You can see an example of trigger-based skimming in this link.
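As a rough sketch (not the content of the linked example), a trigger-based skim in a CMSSW configuration could look like the following; the input file, the trigger path HLT_PFJet320_v* and the module labels are placeholders to adapt to your analysis:

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("SKIM")

process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring("file:input_AOD.root")   # placeholder input file
)
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(-1))

# Keep only events that fired the chosen HLT path(s)
process.triggerSelection = cms.EDFilter("TriggerResultsFilter",
    triggerConditions = cms.vstring("HLT_PFJet320_v*"),        # hypothetical trigger path
    hltResults = cms.InputTag("TriggerResults", "", "HLT"),
    l1tResults = cms.InputTag("gtDigis"),
    l1tIgnoreMask = cms.bool(False),
    l1techIgnorePrescales = cms.bool(False),
    daqPartitions = cms.uint32(1),
    throw = cms.bool(False)                                     # do not abort on missing paths
)

process.skimPath = cms.Path(process.triggerSelection)

# Write out only the events accepted by the skim path
process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("skimmed.root"),
    SelectEvents = cms.untracked.PSet(SelectEvents = cms.vstring("skimPath"))
)
process.endpath = cms.EndPath(process.out)
```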
In CMS we use the Physics Analysis Toolkit (PAT) to steer our analyses. It is a set of standard EDModules and configuration files that act as building blocks for you to build your analysis.
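For orientation only, a minimal PAT configuration built from the stock template looks roughly like the sketch below. This assumes a 2012-era CMSSW release where PhysicsTools/PatAlgos provides patTemplate_cfg and patDefaultSequence; the file names are placeholders:

```python
# Minimal PAT configuration sketch based on the standard template in
# PhysicsTools/PatAlgos; file names are placeholders.
from PhysicsTools.PatAlgos.patTemplate_cfg import *

process.source.fileNames = ["file:input_AOD.root"]   # placeholder input file
process.maxEvents.input = 1000                        # test on a limited number of events

# The default PAT sequence builds pat::Muons, pat::Electrons, pat::Jets, pat::METs, ...
process.p = cms.Path(process.patDefaultSequence)

process.out.fileName = "patTuple.root"                # placeholder output file
```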
The SPRACE Package is accessible on GitHub and contains some code used for the EXOTICA analyses at SPRACE.
See: https://github.com/trtomei/SpracePackage
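If you want to try it out, one possibility (an assumption, not an official recipe from the package) is to clone it into an existing CMSSW area and build it with scram:

cd $CMSSW_BASE/src
git clone https://github.com/trtomei/SpracePackage.git
scram b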
-- ThiagoTomei - 30 May 2012
Attachment: Slide1.png (125.6 K, uploaded 2012-05-14 by ThiagoTomei)