This is Angelo's log book of work on the Exotica analysis "Search for Heavy Resonances in the H-tagged Dijet Mass Spectrum in pp Collisions at 8 TeV". The log was started on Oct 28th, 2013. It is meant to be a summary of what has been written in CMS Analysis Notes 13-152 and 213/347, as well as a report of Angelo's related activities.
Searches for signs of new physics at the LHC aim to observe high-mass particles, generally heavier than 1 TeV. The decay products of such resonances therefore undergo a Lorentz boost and emerge with very close trajectories. Many searches for massive particles have been developed in CMS, for example the analysis of the decay channel pp → X → (H → bb)(H → bb), in which X can be, for instance, a Radion or a Graviton that decays into two Higgs bosons.
Instead of producing two jets, the hadronization of the bb pair produces a single jet, called a "fat" jet, as a result of the Lorentz boost. This topology requires the development of new data analysis techniques that make it possible to identify the number of b quarks embedded in those "fat" jets, as well as to obtain the invariant mass of the jet. Applying these techniques would allow the scenario of H bosons decaying into four jets originating from b quarks to be reconstructed. Jet substructure techniques are also employed.
b quarks are present in many channels of physics interest. Since, in the Standard Model, the probability of the decay H → bb is about 57% for a Higgs boson with a mass of 125 GeV/c², the recent discovery of this boson depended essentially on an efficient identification of those quarks. Because hadrons containing b quarks have a relatively long lifetime, jets resulting from the hadronization of these quarks have a topology different from that of jets produced from light quarks. Those hadrons can travel a few millimeters from the proton-proton collision region before decaying. With the signature provided by the decay products, it is possible to reconstruct secondary vertices whose characteristics allow b quarks to be identified efficiently.
As a member of the CMS ExoDiBoson group, Angelo Santos has been taking part in this search for massive particles predicted by theories beyond the Standard Model, carrying out studies that employ b-quark identification and jet substructure techniques.
This is a search for massive resonances decaying into a pair of Higgs bosons, each reconstructed in hadronic final states. The search is optimized for large resonance masses, for which the decay products of each Higgs boson merge into one massive jet. The QCD background is suppressed using jet substructure techniques. The data sample corresponds to an integrated luminosity of 19.6/fb of proton-proton collisions collected by the CMS experiment at the LHC in 2012 at a center-of-mass energy of 8 TeV.
This analysis searches for new particles in physics scenarios beyond the Standard Model. These particles could be either a spin-0 Radion or an excited state of the Graviton (spin 2). The X particle (Radion or Graviton) decays to two Higgs bosons, each of which decays to a b quark and a b antiquark: the X → HH → 4b channel. For these heavy X particles (decaying nearly at rest in the lab frame), the Higgs bosons appear back-to-back and highly boosted because of their very high momentum. The final state then has only two merged, fat jets (a dijet state) instead of four separate jets, because the b-bbar pair from each Higgs boson is merged into a single jet.
The branching fraction of X → HH will be around 25%. H → b-bbar is the preferred decay, since b is the most massive quark with a mass below one half of the Higgs mass. A collision happens only between the constituent quarks or gluons, which carry only a fraction of the total energy of the proton. Since this analysis uses data at sqrt(s) = 8 TeV, a reasonable effective energy is 3 TeV, so the spectrum studied in this analysis ranges up to 3 TeV.
First studies were performed using only QCD background, whose events were generated with MadGraph5 interfaced with Pythia6 for showering and hadronization. These events include only QCD interactions, without electroweak bosons or top quarks. Since the simulated events do not lead to a very precise background estimate, the background is estimated with a data-driven technique.
Event selections are enumerated as follows.
A possible discriminator between background and signal events is the N-subjettiness ratio τ21 = τ2/τ1: the smaller τ21 is, the closer the jet is to a dipole (rather than monopole) structure.
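To make the τ21 variable concrete, here is a minimal sketch of the usual N-subjettiness definition, τN = Σk pT,k · min(ΔR to the N axes) / Σk pT,k · R0. The `PtEtaPhi` struct, the function names and the way the candidate subjet axes are supplied are illustrative assumptions, not the code used in the analysis; in practice the axes come from a jet-clustering step, and R0 would be the jet radius parameter (0.8 is assumed in the usage below).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical minimal record for a jet constituent or a subjet axis.
// phi is assumed to lie in [-pi, pi].
struct PtEtaPhi {
    double pt;
    double eta;
    double phi;
};

// Angular distance deltaR = sqrt(deta^2 + dphi^2), with dphi wrapped to [0, pi].
double deltaR(const PtEtaPhi& a, const PtEtaPhi& b) {
    const double kPi = 3.14159265358979323846;
    double dphi = std::fabs(a.phi - b.phi);
    if (dphi > kPi) dphi = 2.0 * kPi - dphi;
    const double deta = a.eta - b.eta;
    return std::sqrt(deta * deta + dphi * dphi);
}

// tau_N for a given, non-empty set of N candidate axes.
double tauN(const std::vector<PtEtaPhi>& constituents,
            const std::vector<PtEtaPhi>& axes, double R0) {
    double numerator = 0.0;
    double normalization = 0.0;
    for (const auto& c : constituents) {
        double dmin = std::numeric_limits<double>::max();
        for (const auto& axis : axes) dmin = std::min(dmin, deltaR(axis, c));
        numerator += c.pt * dmin;
        normalization += c.pt * R0;
    }
    return normalization > 0.0 ? numerator / normalization : 0.0;
}

// Ratio tau2/tau1. Returning 1.0 when tau1 vanishes is a convention chosen
// here (assumption), so that such jets fail a cut like tau21 < 0.5.
double tau21(const std::vector<PtEtaPhi>& constituents,
             const PtEtaPhi& oneAxis,
             const std::vector<PtEtaPhi>& twoAxes, double R0) {
    const double tau1 = tauN(constituents, {oneAxis}, R0);
    const double tau2 = tauN(constituents, twoAxes, R0);
    return tau1 > 0.0 ? tau2 / tau1 : 1.0;
}
```

For a jet made of two well-separated prongs, τ2 is close to zero when the two axes sit on the prongs, so τ21 is small, which is the dipole-like behaviour described above.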
B-tagging is a method used to identify jets originating from b (anti)quarks, and is based on the long lifetime of the hadrons formed from the b quark. Hadrons containing b quarks frequently live long enough to travel a measurable distance in the detector, producing a secondary vertex of charged tracks within the jet. This vertex is reconstructed using the adaptive vertex fitter in a cone of ΔR = 0.3 around the primary vertex. The secondary vertex is rejected if it is either too similar to the primary vertex or too far from it. The Combined Secondary Vertex (CSV) algorithm combines secondary vertices and lifetime information to construct a probability discriminator (between 0 and 1) that distinguishes b-quark jets from other jets, resulting in two "working points":
Two baseline b-tagging approaches have been investigated:
The background is estimated using a data-driven technique called the ABCD method. A sideband is defined with a different jet mass window cut on the second jet (the first jet remains in the 110 - 135 GeV window). Then, the background for n b-tags is estimated using the spectrum of n - 1 b-tags. With the mass and b-tag windows as in the table below, the background in the signal region D can be estimated as D = (C/A)·B, that is, the n - 1 b-tag yield in the signal window (C) scaled by the ratio B/A measured in the sideband.
Mass window | cut _n -1_ | cut _n_ |
---|---|---|
70 - 110 GeV | A | B |
110 - 135 GeV | C | D |
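As a sketch of the arithmetic behind the table above: the ratio of n to n - 1 b-tag yields measured in the 70 - 110 GeV sideband is B/A, and applying it to the n - 1 yield in the 110 - 135 GeV window gives the prediction D = C·B/A. The struct and function names are illustrative, and the uncertainty assumes independent Poisson counts in A, B and C, which is an assumption of this sketch rather than a statement about the analysis.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical container for a prediction and its statistical uncertainty.
struct Estimate {
    double value;
    double error;
};

// ABCD prediction D = B * C / A.
// The relative error sqrt(1/A + 1/B + 1/C) assumes independent Poisson
// counts (assumption). Returns NaN when any control region is empty,
// since no meaningful prediction can be made in that case.
Estimate abcdEstimate(double A, double B, double C) {
    if (A <= 0.0 || B <= 0.0 || C <= 0.0) {
        const double nan = std::numeric_limits<double>::quiet_NaN();
        return {nan, nan};
    }
    const double value = B * C / A;
    const double relativeError = std::sqrt(1.0 / A + 1.0 / B + 1.0 / C);
    return {value, value * relativeError};
}
```

With A = 400, B = 100 and C = 200 events, this predicts D = 50 events with a statistical uncertainty of about 6.6 events.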
Two assumptions are made when using this method:
Double subjet b-tagging with CSV shows better discrimination than fat-jet b-tagging; only at very high pT are fat-jet b-tagging and subjet b-tagging equally good. Therefore, the different b-tagging cuts are applied to the four subjets rather than to the fat jets:
Categories |
---|
≥ 1 loose |
≥ 2 loose |
≥ 3 loose |
exactly 3 loose |
4 loose |
≥ 1 medium |
≥ 2 medium |
≥ 3 medium |
exactly 3 medium |
4 medium |
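The categories above amount to counting how many of the four subjets pass a CSV working point. A minimal sketch follows; the function names are hypothetical, and the thresholds 0.244 (loose) and 0.679 (medium) are the CSV working points commonly quoted for 2012 CMS data, assumed here because this log does not list the values.

```cpp
#include <cassert>
#include <vector>

// Assumed CSV thresholds (not taken from this log).
const double kCsvLoose = 0.244;
const double kCsvMedium = 0.679;

// Number of subjets whose CSV discriminator exceeds the working point.
int countSubjetBtags(const std::vector<double>& subjetCsv, double workingPoint) {
    int nTags = 0;
    for (double csv : subjetCsv) {
        if (csv > workingPoint) ++nTags;
    }
    return nTags;
}

// True if the event belongs to a category such as ">= 2 loose"
// (exact = false) or "exactly 3 loose" (exact = true).
bool passesCategory(int nTags, int required, bool exact) {
    return exact ? (nTags == required) : (nTags >= required);
}
```

For four subjets with CSV values 0.9, 0.5, 0.3 and 0.1, this gives 3 loose and 1 medium b-tag, so the event enters "≥ 3 loose" and "exactly 3 loose" but not "≥ 2 medium".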
Studies with the Punzi significance showed that, when increasing from 1 to 4 b-tags, b-tagging strongly reduces the background while keeping most of the signal events. For loose b-tags, the significance increases as more b-tags are applied. For medium b-tags, requiring 4 tags reduces the signal so much that the selection becomes less effective.
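For reference, the Punzi figure of merit is usually written as ε / (a/2 + √B), where ε is the signal efficiency of the selection, B the expected background yield, and a the number of standard deviations of the desired significance. The sketch below evaluates it; the function name and the choice of a are illustrative assumptions, since the log does not state which value was used.

```cpp
#include <cassert>
#include <cmath>

// Punzi figure of merit: efficiency / (a/2 + sqrt(B)).
// nSigma (a) is chosen by the analyst; its value here is an assumption.
// Negative background yields are treated as zero.
double punziSignificance(double signalEfficiency, double background, double nSigma) {
    const double b = background > 0.0 ? background : 0.0;
    return signalEfficiency / (nSigma / 2.0 + std::sqrt(b));
}
```

Comparing this quantity between categories shows the trade-off described above: a tighter b-tag requirement helps as long as the background falls faster than the signal efficiency.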
To test the effectiveness of N-subjettiness, a cut τ21 < 0.5 on both jets is applied to all the b-tagging categories. Independently of the b-tagging cut, the N-subjettiness cut reduces the signal by a factor of 1.3 and QCD by roughly a factor of 3, so the two variables are considered uncorrelated.
The categories used for limit setting (via CLs techniques 1 and 2) are:
In order to put exact limits on the production cross section of a heavy resonance, several feasibility studies have been done to get more accurate background estimations.
Using the fact that N-subjettiness and b-tagging are uncorrelated, a sideband is constructed from those variables. The signal region and sidebands are then:
Region | Signal | B-tag sideband |
---|---|---|
Signal | 2 subjet b-tags, τ21 < 0.5 | 0 subjet b-tags, τ21 < 0.5 |
τ21 sideband 1 | 0.5 < τ21 < 0.75 | 0.5 < τ21 < 0.75 |
τ21 sideband 2 | τ21 > 0.75 | τ21 > 0.75 |
page 13 (3.4.1 N-Subjetiness and b-Tagging as Sideband): What is the meaning of "Jet 1 randomly chosen"?
At least 3 background events need to remain in the sideband for the ABCD method to work. Since too much signal is lost (< 20%) and since the sensitivity at higher resonance masses is better without N-subjettiness, this is not considered a useful method.
One way to improve the initial background estimate (with the ABCD method) is to vary the parameters on different jets, rather than using only one jet in the sideband. This estimate will work well enough if the correlation between those parameters is sufficiently small.
In this approach, the signal region and sidebands are defined as:
Region | b-tags | Closure | Signal |
---|---|---|---|
Signal | 110 < massjet1 < 135 GeV, 0 subjet b-tags on jet 2 | 110 < massjet1 < 135 GeV, 1 subjet b-tag on jet 2 | 110 < massjet1 < 135 GeV, 2 subjet b-tags on jet 2 |
Low mass sideband | 70 < massjet1 < 110 GeV, 0 subjet b-tags on jet 2 | 70 < massjet1 < 110 GeV, 1 subjet b-tag on jet 2 | 70 < massjet1 < 110 GeV, 2 subjet b-tags on jet 2 |
High mass sideband | 135 < massjet1 < 150 GeV, 0 subjet b-tags on jet 2 | 135 < massjet1 < 150 GeV, 1 subjet b-tag on jet 2 | 135 < massjet1 < 150 GeV, 2 subjet b-tags on jet 2 |
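The region definitions in the table above can be written as a small classifier: the mass of jet 1 selects the row and the number of subjet b-tags on jet 2 selects the column. The enum and function names are illustrative, and treating the window edges as half-open intervals (for example, exactly 110 GeV belongs to the signal window) is an assumption, since the table does not say how the boundaries are handled.

```cpp
#include <cassert>

// Rows of the table: mass window of jet 1.
enum class MassRegion { LowSideband, Signal, HighSideband, Outside };

// Columns of the table: number of subjet b-tags on jet 2.
enum class BtagColumn { BtagSideband, Closure, Signal, Outside };

// Half-open windows [70,110), [110,135), [135,150) in GeV (assumed edges).
MassRegion classifyMass(double massJet1GeV) {
    if (massJet1GeV >= 70.0 && massJet1GeV < 110.0) return MassRegion::LowSideband;
    if (massJet1GeV >= 110.0 && massJet1GeV < 135.0) return MassRegion::Signal;
    if (massJet1GeV >= 135.0 && massJet1GeV < 150.0) return MassRegion::HighSideband;
    return MassRegion::Outside;
}

// 0 tags: b-tag sideband, 1 tag: closure test, 2 tags: signal column.
BtagColumn classifyBtags(int nSubjetBtagsJet2) {
    switch (nSubjetBtagsJet2) {
        case 0: return BtagColumn::BtagSideband;
        case 1: return BtagColumn::Closure;
        case 2: return BtagColumn::Signal;
        default: return BtagColumn::Outside;
    }
}
```

An event with massjet1 = 125 GeV and 2 subjet b-tags on jet 2 falls in the signal cell; the same mass with 1 subjet b-tag falls in the closure column.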
First checks were done using 0+1 subjet b-tags (0 subjet b-tags on the second jet, 1 subjet b-tag on the first jet) to estimate 1+1 subjet b-tags. The estimates for 1+1 subjet b-tags agree very reasonably with the actual values. The estimate for the 2+1 subjet b-tags high band is more difficult to compare because of the lack of statistics.
It has been found that subjet b-tagging, in a boosted dijet topology, works very effectively as a background discriminator. Almost all of the QCD background is removed, leaving most of the signal intact. The categories with the highest efficiencies are 3 or 4 loose b-tags and 3 or 4 medium b-tags. N-subjettiness applied to both jets (τ21 < 0.5) reduces signal events by a factor of 1.3 and QCD background events by a factor of 3, independently of the applied b-tagging cuts.
The first background estimation, where an ABCD method was attempted using a jet mass and a b-tagging sideband, with the primary jet still "signal-tagged", failed because of the correlation between those variables. The second background estimation, using the uncorrelated N-subjettiness and b-tagging, failed due to the lack of statistics in the sideband region. A third method, where jet mass and b-tagging were varied on different jets, seems much more successful so far.
This analysis has not yet been able to quantify this with actual numbers and uncertainties, but this will be done later, and the same method will be used in further analysis and in setting limits on the production cross section.
The Ntuples (from Tijs) are located in /store/cmst3/user/mgouzevi/HH4B/TIJS_TREES. They are:
/store/cmst3/user/mgouzevi/HH4B/TIJS_TREES/dijetWtag_Moriond_Data.root
/store/cmst3/user/mgouzevi/HH4B/TIJS_TREES/dijetWtag_Moriond_QCD500.root
/store/cmst3/user/mgouzevi/HH4B/TIJS_TREES/dijetWtag_Moriond_QCD1000.root
/afs/cern.ch/user/t/tomei/work/workWithAngelo/CMSSW_5_3_9/src/Ntuples/TNMc1/higgs_tagged_dijet_analysis_with_btag_TTJets.root
/afs/cern.ch/user/t/tomei/work/workWithAngelo/CMSSW_5_3_9/src/Ntuples/TNMc1/higgs_tagged_dijet_analysis_with_btag_Wbb.root
/store/cmst3/user/mgouzevi/HH4B/TIJS_TREES/dijetWtag_Moriond_HHPy61000.root
/store/cmst3/user/mgouzevi/HH4B/TIJS_TREES/dijetWtag_Moriond_HHPy61500.root
/store/cmst3/user/mgouzevi/HH4B/TIJS_TREES/dijetWtag_Moriond_HHPy62000.root
/store/cmst3/user/mgouzevi/HH4B/TIJS_TREES/dijetWtag_Moriond_HHPy62500.root
/store/cmst3/user/mgouzevi/HH4B/TIJS_TREES/dijetWtag_Moriond_HHPy63000.root
/store/cmst3/user/mgouzevi/HH4B/TIJS_TREES/dijetWtag_Moriond_Allsignal.root
The files can be copied locally with srmcp:
srmcp -2 srm://srm-eoscms.cern.ch:8443/srm/v2/server?SFN=/eos/cms/store/cmst3/user/mgouzevi/HH4B/TIJS_TREES/<root_file_name> file:///<root_file_name>
However, this is not recommended, since some of the files are larger than 1 GB. Instead, open them directly through their Physical File Name (PFN), like this:
TFile *file = TFile::Open("root://eoscms//eos/cms/store/cmst3/user/mgouzevi/HH4B/TIJS_TREES/dijetWtag_Moriond_Data.root");
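Comparing the /store/... paths listed above with the PFN in the TFile::Open call suggests that the PFN is obtained by prefixing the path with root://eoscms//eos/cms. The helper below is a plain C++ sketch of that mapping; the function name is hypothetical and the prefix is inferred only from the single example in this log, so it should be checked before being relied on for other files.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Builds an EOS PFN from a logical path starting with "/store/".
// The prefix is inferred from the example in this log (assumption).
std::string toEosPfn(const std::string& logicalPath) {
    const std::string expectedStart = "/store/";
    if (logicalPath.compare(0, expectedStart.size(), expectedStart) != 0) {
        throw std::invalid_argument("path must start with /store/");
    }
    return "root://eoscms//eos/cms" + logicalPath;
}
```

The returned string can then be passed to TFile::Open in a ROOT session, as in the line above.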
How to give someone access to files in your lxplus workspace:
fs setacl -dir /afs/cern.ch/user/t/tomei/work/workWithAngelo/CMSSW_5_3_9/src/Ntuples/TNMc1/ -acl <user_name> read
Dijet masses are fitted with the formula given in the attachment FitFormula.gif, where P0 sets the normalization, while P1 and P2 control the shape of the distribution. These parameters may be correlated. The correlation can be checked through the correlation matrix of the fit, as in this example with a Gaussian fit:
```
TH1F *h = new TH1F("h","h",100,-5,5);
h->FillRandom("gaus", 50000);
TFitResultPtr result = h->Fit("gaus", "S");
 NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE
  1  Constant     2.00141e+03   1.09275e+01   4.14460e-02  -3.06373e-05
  2  Mean        -1.42906e-03   4.45701e-03   2.06212e-05  -4.26207e-03
  3  Sigma        9.94981e-01   3.11521e-03   3.94839e-06  -3.03397e-01
TFitResult* r = result.Get();
r->GetErrors()[0]
1.09274960858893486e+01
r->GetErrors()[1]
4.45701208205878576e-03
r->GetErrors()[2]
3.11520927801539615e-03
r->PrintCovMatrix(cout)
Covariance Matrix:
              Constant         Mean        Sigma
Constant        119.41  -0.00017258    -0.019493
Mean       -0.00017258   1.9865e-05   8.4435e-08
Sigma        -0.019493   8.4435e-08   9.7045e-06
Correlation Matrix:
              Constant         Mean        Sigma
Constant             1   -0.0035434     -0.57263
Mean        -0.0035434            1    0.0060812
Sigma         -0.57263    0.0060812            1
```
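The correlation matrix printed above follows from the covariance matrix through corr(i,j) = cov(i,j) / sqrt(cov(i,i)·cov(j,j)). The following plain C++ sketch, independent of ROOT and with an illustrative function name, reproduces that step so the printed numbers can be cross-checked by hand; it assumes a square matrix with positive diagonal elements.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Converts a covariance matrix into a correlation matrix.
// Assumes a square matrix with strictly positive diagonal elements.
Matrix correlationFromCovariance(const Matrix& cov) {
    const std::size_t n = cov.size();
    Matrix corr(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            corr[i][j] = cov[i][j] / std::sqrt(cov[i][i] * cov[j][j]);
        }
    }
    return corr;
}
```

Feeding in the covariance matrix from the session above gives a Constant-Sigma correlation of about -0.573, in agreement with the printed value, which shows that the normalization and width of the Gaussian are noticeably anti-correlated while the mean is essentially uncorrelated with both.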
-- Main.assantos - 2013-10-28
Attachment | History | Size | Date | Who | Comment |
---|---|---|---|---|---|
FitFormula.gif | r1 | 1.0 K | 2013-11-21 - 19:57 | UnknownUser | Fit formula |