Online calibration with neutral pions
The details of the π0 → γγ method for the ECAL calibration, as well as the description of its implementation within the KaliCalo package, can be found in the internal notes.
At first run
To run the calibration within the online framework one needs an online account. To get one, contact lbonsupp@cern.ch. For the test run, log in to a plus node
$> ssh -Y lxplus
$> ssh -Y lbgw
$> ssh -Y plus
and set up the build environment as shown here:
lb-dev AlignmentOnline v11r0
cd AlignmentOnlineDev_v11r0
getpack PRConfig head
cd PRConfig
ln -s . v1r999
Besides PRConfig, one also needs the KaliOnline package:
cd $User_release_area/AlignmentOnlineDev_v11r0
getpack AlignmentOnline/PyKaliOnline head
cd AlignmentOnline/PyKaliOnline/python/PyKaliOnline
Note: if getpack asks for a password, try
getpack -p anonymous AlignmentOnline/PyKaliOnline head
Put the scripts necessary for the offline tests into the appropriate places:
cp offline_tests/* .
cp -rf offline_tests/CalibTests ../../../../PRConfig/python/.
mv offline_tests/KaliOnlineTest.py ../../../../PRConfig/scripts/.
rm -rf offline_tests/
One also needs local copies of Brunel and DaVinci:
cd $User_release_area
lb-dev Brunel v50r1
cd BrunelDev_v50r1
make configure
make -j 5 install
cd ../
lb-dev DaVinci v40r1p3
cd DaVinciDev_v40r1p3
make configure
make -j 5 install
Don't forget to compile
cd $User_release_area/AlignmentOnlineDev_v11r0
make configure
make -j 5 install
Before running a test, create a working directory, for example
mkdir /home/your_user_name/tests
Then get some data: a small sample of test data files is located on hlte0902 at /localdisk/hlt1/test_align (copy it to a locally available working directory).
At each run
Follow the instructions for logging in to a suitable machine:
- hltperf-action-x5650
- hltperf-dell-x5650
- hltperf-asus-amd627
Beware that you need to load the LHCb environment on these machines:
source /cvmfs/lhcb.cern.ch/group_login.sh
And set up the running environment as shown here:
cd $User_release_area/AlignmentOnlineDev_v11r0
./run bash
export PYTHONPATH=/home/raaij/pydim/lib/python2.7/site-packages:/scratch/jenkins/benchmark/python:$PYTHONPATH
export HLTTCKROOT=/group/hlt/sattelite/MooreOnlinePit_v24r0p1/TCK/HltTCK/
cd $User_release_area/AlignmentOnlineDev_v11r0/PRConfig/scripts
To run the calibration test instead of the alignment, use the command:
python KaliOnlineTest.py --numa --nodes=2 --workers=10 --task-log=log/align.log --directory=/home/your_user_name/tests/ --viewers KaliCalibration
Output
The output of the calibration will be written to /home/your_user_name/tests/Store (or whatever working directory you use). It will contain:
- a list of zipped files starting with 'Histograms' – the full database of pi0 histograms saved at each iteration;
- a list of zipped files starting with 'Lambdas' – the databases with calibration constants at each iteration;
- a list of zipped files starting with 'Problematic' – the databases with lists of badly-fitted cells at each iteration;
- root files 'OutputHistograms.root' and 'OutputCoefficients.root' – histograms with visual information of the calibration output;
- a text file 'CalibCoefficients.txt' – the list of calibration constants for each cell.
The OutputHistograms.root file contains histograms that allow one to estimate the convergence and to follow the evolution of the neutral pion signal during the calibration. These are:
- Conv1 (Conv2, Conv3, ... - here the number corresponds to the number of a "secondary" iteration) - the neutral pion peak position at each of the "primary" iterations;
- Sigm1 (Sigm2, Sigm3, ... - here the number corresponds to the number of a "secondary" iteration) - the neutral pion resolution at each of the "primary" iterations;
- MassInnerPassNItM, MassMiddlePassNItM, MassOuterPassNItM, MassPassNItM (where N and M are the numbers of the "secondary" and the "primary" iteration respectively) - the diphoton invariant mass distributions at each iteration for the Inner, Middle and Outer parts of the ECAL and for the whole electromagnetic calorimeter respectively; a sketch for reading these by name follows below.
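For a quick look at these histograms from Python, the naming convention above can be used directly. A minimal PyROOT sketch (the pass and iteration numbers are just an example):
# Pick up one of the invariant-mass histograms by its constructed name.
import ROOT

f = ROOT.TFile.Open('OutputHistograms.root')   # calibration output file described above
pass_n, it_m = 1, 3                            # "secondary" pass 1, "primary" iteration 3
h = f.Get('MassInnerPass%dIt%d' % (pass_n, it_m))
if h:
    print(h.GetName(), h.GetMean(), h.GetRMS())
f.Close()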
The OutputCoefficients.root file contains the distributions of the calibration constants:
- It1distr(Inner/Middle/Outer) - the distributions of the coefficients obtained during the first "secondary" iteration, fitted with a Gaussian function;
- It2distr(Inner/Middle/Outer) - the distributions of the coefficients obtained during the second "secondary" iteration, fitted with a Gaussian function;
- It12distr(Inner/Middle/Outer) - the total distributions of the coefficients, fitted with a Gaussian function;
- comp2DFace(Inner/Middle/Outer) - the 2D distribution of the total coefficients over the surface of the Inner, Middle and Outer zones of the ECAL (one bin corresponds to one cell).
Several notes: to reduce the time spent on the tests one can reduce the number of iterations (by default there are 2 "secondary" iterations with 7 "primary" iterations in each). For this, open $User_release_area/AlignmentOnlineDev_v11r0/AlignmentOnline/PyKaliOnline/python/PyKaliOnline/Paths.py and change the PassIt (twice the number of primary iterations) and MaxIt (total number of iterations) parameters. For example, if you need 2 "secondary" iterations with n "primary" iterations in each, set
PassIt = 2n,
MaxIt = 4n+1.
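As an illustration only (the actual Paths.py may define these values differently), for 2 "secondary" iterations with n = 3 "primary" iterations each the settings would be:
# Illustrative values for Paths.py, following the formula above with n = 3
PassIt = 2 * 3        # 2n, with n = 3 primary iterations per "secondary" pass
MaxIt  = 4 * 3 + 1    # 4n + 1, the total number of iterations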
All the data is transmitted between the Iterator and the Analyser by saving it to files. The file names always contain the host name of the node on which the calibration is run. So, if you stopped the task and wish to restart it from the existing point (for example to reuse already existing DSTs instead of rerunning the reconstruction step), please make sure that you run it on the same node.
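As a toy illustration of this point (the actual file-naming pattern used by PyKaliOnline may differ):
# Toy example: the transitional file names carry the host name of the node,
# so a task restarted on a different node will not find its earlier files.
import socket
host = socket.gethostname()
histo_file = 'Histograms_%s.gz' % host   # hypothetical naming pattern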
An example of an output database with histograms may be downloaded from here: https://cernbox.cern.ch/index.php/s/NocZidC69TUxkGX. An example script to read it is ReadHistoMap.py.txt (remove the .txt extension before use). The DaVinci environment is needed to run the script.
The calibration is run from a control panel for the LHCb Align partition (aka LHCbA), which is shared between several tasks. Note that there is a timetable for the partition usage. The calibration also shouldn't be run during data taking (which can be checked with the vistar).
Please, note that at the moment one should not use the partition, unless an explicit permission is given!
Code and data location
The code is available in the AlignmentOnline/PyKaliOnline package under the AlignmentOnline project. The copy used for the online calibration is located at /group/calo/cmtuser/AlignmentOnlineDev_v10r4/KaliOnline/PyKaliOnline/python/PyKaliOnline/ (accessible from any plus or hlt machine).
The startup scripts point to the debug build. So, after making any modifications, don't forget to compile that build as well:
>> echo $CMTCONFIG
x86_64-slc6-gcc49-opt
>> export CMTCONFIG=x86_64-slc6-gcc49-dbg
>> make
>> export CMTCONFIG=x86_64-slc6-gcc49-opt
>> make
The startup scripts are /group/online/dataflow/cmtuser/OnlineDev_v5r19/Online/FarmConfig/job/Alig{Drv,Wrk}.sh. You have to be the user online to change them. The raw input data on each node is located in the /localdisk/Alignment/BWDivision/ folder.
To open the run control, log in to a ui machine:
>> ssh -Y lbgw
>> ssh -Y ui
>> /group/online/ecs/Shortcuts311/LHCb/ECS/ECS_UI_FSM.sh &
That will start the run control. Right-click on LHCbAlign, and the control panel will open. To be able to see the output logs, open the error logger. For this, in another shell:
>> ssh -Y lbgw
>> ssh -Y plus
>> errorLog LHCbA
(three windows with black background will open).
Back to the LHCbA partition control panel. When the partition is released, all the buttons are grey. Click the lock sign to take it. This will open a small new window; click the "Take" button. This could take a few minutes. Click "Dismiss" on the popup.
The partition is now in state "NOT_ALLOCATED". To prepare for running:
- Enter your name in the window and click "Reserve alignment", so that everyone knows who is using the partition at the moment.
- Select the Calo task from the menu.
- Choose the runs to be used.
After clicking on "Choose runs for alignment" a new window will open. Usually not much data is saved under the Calo task, so the data from the BWDivision task is used for the calibration. Select the "BWDivision" task from the menu, then select the runs (Shift+Up or Shift+Down to select several runs). Close the window by clicking "OK".
Getting back to the control panel, one can now send the "ALLOCATE" command from the top:
The partition will become "NOT_READY". Send "CONFIGURE" from the top.
Sometimes during the configuration the partition can go to an "ERROR" state, as not all the nodes are ready to work. This can be overcome by recovering the partition, which will bring it back into the "NOT_READY" state. Then try to configure once again; the nodes causing trouble will be excluded. Another way to exclude nodes is to click the "HLT" button.
A new window will open, in which one can look at the states of the nodes and include or exclude them. To include (exclude) nodes, select them from the right (left) column and move them to the left (right) one by clicking on the left (right) arrow. Then click the "Include" ("Remove") button below. The same can be done for a whole sub-farm.
When the partition is finally in the "READY" state, send "START_RUN" from the top. That will start everything.
Error logs
The error logs are three windows that allow you to watch the output of the job.
The Message log shows the output from all the nodes in real time. Usually it is more convenient to set its output level to "Warning" or "Error". The History log allows one to look at the output from a particular node. The Error logger allows one to change the output settings:
To look at the history output from a certain node, just enter its name and press "Enter", for example "hlta0101". The Iterator does its job on the "hlt0x" machines (where "x" is usually "1" or "2"); its output can also be inspected through the history display. To set the Message log output level, change the value in the "Severity of the messages" window: press the "Up" key twice and then select the necessary value with the ">" and "<" buttons. To change the number of output lines, press the "Up" key, enter the number of lines you would like to see in the output and press "Enter".
Output location
Output histograms and calibration coefficients go to
/group/calo/CalibWork/Store/
The fmDSTs, DSTs and root files created during the work are saved to
/localdisk/Alignment/Calo
on each node.
Independent of whether the test or the full calibration is being run, the scripts which govern the job are the Iterator and the Analyser. The Iterator runs on a single node, counting the iterations and performing those tasks which do not need to be parallelized. The Analyser script runs on each node of the farm, having access to the piece of data saved on that node and allowing large tasks to be split into many parallel processes.
Not being able to pass data to each other directly, the Iterator and the Analyser communicate through the Communicator by sending and receiving information about each other's states. All the other data which needs to be passed between different algorithms is saved as files in specific folders. The DSTs, fmDSTs and root files are saved on the local disk of each node (accessible by the Analyser only). The databases with histograms and calibration constants are saved to a common folder and can be picked up both by the Analysers and by the Iterator.
The Paths.py file contains the locations and names under which the transitional data is saved, and several functions to get the list of existing files of a certain type (for example, the list of databases with the calibration constants obtained from several nodes). This is done for convenience, as the same files may be used by several scripts. Paths.py also contains the variables for the required number of iterations and an import of some Online configuration.
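A minimal sketch of the kind of helper Paths.py provides (the function name, file pattern and path here are illustrative, not the actual Paths.py API):
# Illustrative helper: list the 'Lambdas*' databases produced by the different nodes.
import glob
import os

STORE_DIR = '/group/calo/CalibWork/Store'   # common output folder mentioned above

def lambda_databases(store_dir=STORE_DIR):
    # return the existing calibration-constant database files, sorted by name
    return sorted(glob.glob(os.path.join(store_dir, 'Lambdas*')))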
The RunBrunel.py (RunKali.py) script is meant to set up the Brunel (DaVinci) environment and run the reconstruction (the Kali functions) in this environment, as if it were run from the command line.
MultiBrunelStep is just a usual Brunel job with some tuning allowing it to run online. To make the reconstruction on each node faster, the job is also split between the cores of the node using the Python multiprocessing module.
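A minimal sketch of this idea (function and file names are illustrative, not the actual MultiBrunelStep code):
# Illustrative only: split a per-file reconstruction job between the cores of a node.
import multiprocessing

def reconstruct(input_file):
    # placeholder for running one reconstruction job on one input file
    print('reconstructing %s' % input_file)

if __name__ == '__main__':
    files = ['file1.raw', 'file2.raw', 'file3.raw']         # illustrative input list
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    pool.map(reconstruct, files)                            # one file per worker process
    pool.close()
    pool.join()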
The MultiKaliPi0_producefmDST script is a Kali (DaVinci) job performing the neutral pion selection and creating the fmDSTs and root files. It can be run in two modes. In case no calibration constants are given (which normally means the very first iteration, the "first pass"), it produces the fmDSTs and root files using the DSTs as input. In case a database with constants is received, it runs a re-reconstruction from the fmDSTs (the "second pass"). Again, to increase the processing speed, this task is split between the cores of each node using the multiprocessing module.
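The two modes can be pictured roughly as follows (a sketch only; the function names are illustrative and not the actual script's API):
# Illustrative dispatch between the two running modes described above.
def run_selection(dsts):
    print('first pass: pi0 selection on %d DST(s)' % len(dsts))

def rerun_selection(fmdsts, lambdas_db):
    print('second pass: re-reconstruction of %d fmDST(s) with %s' % (len(fmdsts), lambdas_db))

def produce_fmdst(dsts, fmdsts, lambdas_db=None):
    if lambdas_db is None:
        run_selection(dsts)                    # no constants yet: start from the DSTs
    else:
        rerun_selection(fmdsts, lambdas_db)    # constants available: start from the fmDSTs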
In the Run.py file several functions are gathered which perform different tasks at different stages of the iterative procedure. These are:
- FillTheHistograms - given a list of root files, fills the histograms for each cell using the corresponding Kali function;
- SplitHistos - collects the databases with histograms filled on the different nodes and merges them, so that for each cell the full statistics is used. The full database is then split into groups of cells and saved to different files, to be processed by the fitting function on different nodes (see the sketch after this list);
- FitTheHistograms - given a database with histograms, fits them and calculates the calibration constants using the corresponding Kali functions;
- CollectLambdas - collects the databases with calibration constants for each group of cells obtained on the different nodes and saves them to a single file;
- GetOutput - at the end of the calibration, produces files with some statistical histograms and the final list of calibration constants.
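A toy illustration of the SplitHistos idea (the data structures are deliberately simplified; the real code works on histogram databases):
# Toy example: merge per-node maps so the full statistics is used for each cell,
# then split the merged map into groups of cells for parallel fitting.
def merge_and_split(node_maps, n_groups):
    merged = {}
    for node_map in node_maps:                 # node_map: {cell_id: entries}
        for cell, entries in node_map.items():
            merged[cell] = merged.get(cell, 0) + entries
    cells = sorted(merged)
    return [dict((c, merged[c]) for c in cells[i::n_groups]) for i in range(n_groups)]

# Two nodes contribute statistics for cell 1; the merged result is split into 2 groups.
print(merge_and_split([{1: 10, 2: 5}, {1: 3, 3: 7}], 2))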
Reconstruction and fmDST production steps
Some problems (errors like 'PropertyConfigSvc: could not obtain alias TCK/0x11291600', or even zero output without any visible errors) may arise at the reconstruction/fmDST production steps due to out-of-date tags or software versions. Currently these versions are set manually in the code. One may look up the latest versions of Online and OnlineBrunel available online here:
/group/online/dataflow/cmtuser/
Then open '/group/calo/cmtuser/AlignmentOnlineDev_v10r4/KaliOnline/PyKaliOnline/python/PyKaliOnline/RunBrunel.py' and in the string
brunel_path = "/group/online/dataflow/cmtuser/OnlineBrunel_%s" %'vXrYY'
replace the OnlineBrunel version ('vXrYY') with the one you wish to use.
The latest available CondDB tags can be found here:
/group/online/hlt/conditions/
Just look for everything starting with 'LHCBCOND_'. The part of the filename containing 'cond-yyyymmdd' (where yyyymmdd are 8 digits) is the name of the tag. Then open '/group/calo/cmtuser/AlignmentOnlineDev_v10r4/KaliOnline/PyKaliOnline/python/PyKaliOnline/Paths.py' and set the tag in the function
def importOnline():
[...]
Online.CondDBTag = 'cond-yyyymmdd'
If you wish to use a newer version of DaVinci, first check it out from the repository (getpack Phys/KaliCalo if needed):
cd /group/calo/cmtuser
lb-dev DaVinci vXXrY
cd DaVinciDev_vXXrY
make configure
make -j 5 install
then do
source /group/online/dataflow/scripts/shell_macros.sh
cd /path/to/DaVinciDev_vXrY
do_configure
do_install
cmsetup
Don't forget to make. Then go to '/group/calo/cmtuser/AlignmentOnlineDev_v10r4/KaliOnline/PyKaliOnline/python/PyKaliOnline/RunKali.py' and in the string
dv_path = "/group/calo/cmtuser/DaVinciDev_v40r1p3"
replace the DaVinci version with the one you wish to use. Don't forget to build the AlignmentOnline package as well.
During the run
The configuration of the run is picked up when the "CONFIGURE" command is sent from the top. So, if any changes were made in the calibration code and one wishes them to be included, then after compiling (both the opt and the debug build) it is necessary to configure once again. For this:
- Stop run;
- Send the "RESET" command; the partition will go into the "NOT_READY" state;
- Send the "CONFIGURE" command.
Sometimes it happens that during an iteration one or several nodes stall, which hangs everything in the "RUNNING" state with no visible activity (no output in the logs for these nodes). One can exclude them as shown above, without stopping the run. The job will then continue (the partition will go to the "READY" state, as it should between iterations).
--
DariaSavrina - 2016-02-05