Alpgen Time and Size Studies
Goal
To measure the time and size performance of ALPGEN runs (grid production, event production) on a typical CMS T2 farm, such as SPRACE.
Modus Operandi
IMPORTANT NOTE: do everything in a subdirectory of /hdacs, so that you have more working space and the WNs can see that space.
- Set up a CMSSW area:
cmsrel CMSSW_2_2_9; cd CMSSW_2_2_9/src; cmsenv
- Use whatever method you prefer to be able to check out code from the CERN CVS repository.
- Get the ALPGEN grids:
cvs co -r CMSSW_2_2_9 GeneratorInterface/AlpgenInterface
- Set up Condor:
. /OSG/setup.sh
- Use the script sprace_ALPGEN_submit.py to submit ALPGEN jobs to SPRACE.
The script takes three arguments: the config file for ALPGEN, the name of the ALPGEN executable you want to run, and the name of the grid.
It is the user's responsibility to make sure these three are consistent with one another! For instance, if you want to run ttbar+4 jets at a 40 GeV threshold, starting from an already existing grid, you should use
python sprace_ALPGEN_submit.py input /OSG_app/app/cmssoft/cms/slc4_ia32_gcc345/external/alpgen/213-cms2/bin/2Qgen GeneratorInterface/AlpgenInterface/data/ttbar4j_40GeV.grid2
where input is of the form
1 ! generation mode
ttbar4j_40 ! string labeling the output files
0 ! start with: 0=new grid, 1=previous warmup grid, 2=previous generation grid
0 0 ! N(events)/iteration and N(iter's) for initial grid optimiz. Use 0 0 if starting from existing grid
50000000 ! number evts to generate
*** The above 5 lines provide mandatory inputs for all processes
*** (Comment lines are introduced by the three asteriscs)
*** The lines below modify existing defaults for the hard process under study
*** For a complete list of accessible parameters and their values,
*** input 'print 1' (to display on the screen) or 'print 2' to write to file
njets 4
ebeam 5000
ih2 1
ickkw 1
ptjmin 40
etajmax 5
drjmin 0.7
mt 175
ihvy 6
itdecmode 7
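Since keeping the three arguments consistent is left to the user, it can help to generate them from the same parameters. The following is only a minimal sketch, not part of sprace_ALPGEN_submit.py: the helper names make_input and make_command are hypothetical, the card values are copied from the example above, and the grid naming pattern ttbar<N>j_<PT>GeV.grid2 is an assumption extrapolated from the single grid name shown, so check that the grid actually exists before submitting.

```python
# Sketch: build an ALPGEN input card and the matching submit command
# from the same (njets, ptjmin) pair, so label, card and grid agree.
# Helper names are hypothetical; values are copied from the example card.

ALPGEN_EXE = ("/OSG_app/app/cmssoft/cms/slc4_ia32_gcc345/external/"
              "alpgen/213-cms2/bin/2Qgen")
GRID_DIR = "GeneratorInterface/AlpgenInterface/data"

INPUT_TEMPLATE = """\
1 ! generation mode
{label} ! string labeling the output files
{start} ! start with: 0=new grid, 1=previous warmup grid, 2=previous generation grid
0 0 ! N(events)/iteration and N(iter's) for initial grid optimiz. Use 0 0 if starting from existing grid
{nevents} ! number evts to generate
*** The above 5 lines provide mandatory inputs for all processes
njets {njets}
ebeam 5000
ih2 1
ickkw 1
ptjmin {ptjmin}
etajmax 5
drjmin 0.7
mt 175
ihvy 6
itdecmode 7
"""


def make_input(njets, ptjmin, nevents, start=0):
    """Return the text of an input card for ttbar + njets at the given threshold."""
    label = "ttbar{0}j_{1}".format(njets, ptjmin)
    return INPUT_TEMPLATE.format(label=label, start=start, nevents=nevents,
                                 njets=njets, ptjmin=ptjmin)


def make_command(input_path, njets, ptjmin):
    """Return the argument list for the submit script (it is not executed here)."""
    # Assumption: grids follow the naming of the one example, ttbar4j_40GeV.grid2.
    grid = "{0}/ttbar{1}j_{2}GeV.grid2".format(GRID_DIR, njets, ptjmin)
    return ["python", "sprace_ALPGEN_submit.py", input_path, ALPGEN_EXE, grid]
```

Writing make_input(4, 40, 50000000) to a file called input and passing "input" to make_command(...) reproduces the command line shown above.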
Tasks
- Linearity with number of events: Check running times and event sizes for ttbar+4 jets at 40 GeV threshold, starting from already existing grids, asking for different numbers of events. DONE
- Independence of linearity with respect to jet multiplicity: Check running times and event sizes for ttbar+2 jets at 40 GeV threshold, starting from already existing grids, asking for different numbers of events. DONE
- Time as function of jet multiplicity: Check running times and event sizes for ttbar+1,3,4 jets at 40 GeV threshold, starting from already existing grids, asking for 1000000 events. DONE
- Time as function of matching threshold: Check running times and event sizes for ttbar+2 jets at 20, 30, 40, 50, 60 GeV threshold, starting from already existing grids, asking for 1000000 events. DONE
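The four tasks above amount to a small scan over (jet multiplicity, threshold, events asked). As a rough sketch, the scan points could be enumerated as below. The list of event counts is taken from the Results tables further down; the function name scan_points is hypothetical, and removing duplicate points between tasks is a convenience of this sketch, not something the original study is known to have done.

```python
# Sketch: enumerate the (njets, ptjmin in GeV, events asked) points of the four tasks.

EVENT_COUNTS = [1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000,
                500000, 1000000, 2000000, 5000000, 10000000, 20000000,
                50000000]


def scan_points():
    """Return the sorted list of unique scan points covered by the tasks."""
    points = set()
    # Linearity with number of events: ttbar+4 jets at 40 GeV.
    points.update((4, 40, n) for n in EVENT_COUNTS)
    # Independence of linearity w.r.t. jet multiplicity: ttbar+2 jets at 40 GeV.
    points.update((2, 40, n) for n in EVENT_COUNTS)
    # Time as function of jet multiplicity: ttbar+1,3,4 jets, 1000000 events.
    points.update((njets, 40, 1000000) for njets in (1, 3, 4))
    # Time as function of matching threshold: ttbar+2 jets, 1000000 events.
    points.update((2, pt, 1000000) for pt in (20, 30, 40, 50, 60))
    return sorted(points)


if __name__ == "__main__":
    for njets, ptjmin, nevents in scan_points():
        print("ttbar+%d jets, %d GeV, %d events" % (njets, ptjmin, nevents))
```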
Tools
- Standard location of ALPGEN executables (from CMSSW):
$CMS_PATH/slc4_ia32_gcc345/external/alpgen
(there are many subdirectories with different versions, pick the latest one).
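Picking the latest version by eye is error-prone when the names mix digits and letters (for example 213-cms2). A possible helper is sketched below. It assumes that "latest" means the highest name under a natural (numeric-aware) sort, which is an assumption about the naming scheme rather than a documented rule. The function names are hypothetical.

```python
# Sketch: pick the "latest" ALPGEN version directory by natural sort order.
import os
import re


def natural_key(name):
    """Split a name into text and integer chunks, so that cms10 sorts after cms2."""
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", name)]


def latest_alpgen_dir(base):
    """Return the path of the subdirectory of base with the highest version name."""
    versions = [d for d in os.listdir(base)
                if os.path.isdir(os.path.join(base, d))]
    if not versions:
        raise RuntimeError("no version subdirectories found in " + base)
    return os.path.join(base, max(versions, key=natural_key))


# Example use on a node where CMS_PATH is defined:
#   base = os.path.join(os.environ["CMS_PATH"],
#                       "slc4_ia32_gcc345", "external", "alpgen")
#   print(latest_alpgen_dir(base))
```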
Results
For reference, these jobs were processed in this hardware configuration, using Condor 7.0.3 as batch scheduler and Scientific Linux 4.6. Also, "events" in this context means weighted events.
Linearity / independence with respect to jet multiplicity
| events asked | average time (s) | standard deviation | size (b) | events produced | unweighted | lum (pb^-1) |
| 1000 | 0.000 | 0.000 | 1653 | 29 | 0 | - |
| 2000 | 0.000 | 0.000 | 2394 | 42 | 1 | 0.0894107602 |
| 5000 | 0.000 | 0.000 | 3420 | 60 | 6 | 0.46386027 |
| 10000 | 0.000 | 0.000 | 6612 | 116 | 4 | 0.28711817 |
| 20000 | 0.000 | 0.000 | 9804 | 172 | 3 | 0.138082356 |
| 50000 | 2.000 | 1.414 | 17157 | 301 | 8 | 0.393754979 |
| 100000 | 9.333 | 1.700 | 24282 | 426 | 9 | 0.28055974 |
| 200000 | 21.000 | 2.944 | 30153 | 529 | 1 | 0.0138315699 |
| 500000 | 55.000 | 2.160 | 34941 | 613 | 4 | 0.0630326254 |
| 1000000 | 113.667 | 3.300 | 45372 | 796 | 7 | 0.115740976 |
| 2000000 | 234.667 | 3.771 | 64467 | 1131 | 11 | 0.174470775 |
| 5000000 | 593.000 | 2.160 | 104253 | 1829 | 18 | 0.332504291 |
| 10000000 | 1199.667 | 2.643 | 167523 | 2939 | 56 | 1.15591618 |
| 20000000 | 2411.000 | 5.099 | 265050 | 4650 | 74 | 1.46726676 |
| 50000000 | 6015.667 | 21.714 | 513000 | 9000 | 143 | 2.75454531 |

| events asked | average time (s) | standard deviation | size (b) | events produced |
| 1000 | 0.000 | 0.000 | 1653 | 29 |
| 2000 | 0.000 | 0.000 | 3021 | 53 |
| 5000 | 0.000 | 0.000 | 5757 | 101 |
| 10000 | 0.667 | 0.943 | 6099 | 107 |
| 20000 | 6.667 | 1.247 | 7125 | 125 |
| 50000 | 20.667 | 2.494 | 9975 | 175 |
| 100000 | 43.667 | 2.055 | 15333 | 269 |
| 200000 | 96.333 | 3.399 | 21090 | 370 |
| 500000 | 248.667 | 2.867 | 29412 | 516 |
| 1000000 | 498.667 | 3.301 | 36765 | 645 |
| 2000000 | 1013.667 | 1.687 | 47652 | 836 |
| 5000000 | 2562.000 | 33.978 | 69711 | 1223 |
| 10000000 | 5095.750 | 37.456 | 90288 | 1584 |
| 20000000 | 10149.250 | 56.167 | 121866 | 2138 |
| 50000000 | 25425.750 | 192.689 | 195567 | 3431 |
Conclusion: the running time is linear in the number of events ASKED, but not in the number of events actually produced.
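This conclusion can be cross-checked with a plain least-squares fit. The sketch below is not the analysis used for this page: it simply refits the first results table (events asked, average time, events produced) with a straight line and prints the ratio of produced to asked events, which would be constant if production were proportional to the request. The helper name linear_fit is hypothetical.

```python
# Sketch: straight-line fit of running time vs. events asked (first results table).

# (events asked, average time in s, events produced), copied from the table.
ROWS = [
    (1000, 0.000, 29), (2000, 0.000, 42), (5000, 0.000, 60),
    (10000, 0.000, 116), (20000, 0.000, 172), (50000, 2.000, 301),
    (100000, 9.333, 426), (200000, 21.000, 529), (500000, 55.000, 613),
    (1000000, 113.667, 796), (2000000, 234.667, 1131),
    (5000000, 593.000, 1829), (10000000, 1199.667, 2939),
    (20000000, 2411.000, 4650), (50000000, 6015.667, 9000),
]


def linear_fit(xs, ys):
    """Ordinary least squares y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


asked = [row[0] for row in ROWS]
times = [row[1] for row in ROWS]
slope, intercept, r2 = linear_fit(asked, times)

# Fraction of asked events that were actually produced, per row.
yields = [float(row[2]) / row[0] for row in ROWS]

print("time per asked event (s): %.3e, R^2 = %.6f" % (slope, r2))
print("produced/asked: first row %.5f, last row %.5f" % (yields[0], yields[-1]))
```

An R^2 close to 1 supports the linearity in events asked, while the produced/asked ratio dropping by more than two orders of magnitude between the smallest and largest runs shows that the time cannot also be linear in the events produced.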
Time as function of jet multiplicity
Conclusion: the running time grows exponentially with the number of extra jets asked.
Time as function of matching threshold
Conclusion: the running time is approximately independent of the matching threshold.
--
ThiagoTomei - 05 Jun 2009