
Guide for Monitoring Experts

This guide is meant as a reference for Monitoring experts of the different subdetectors/subsystems. It is mainly focused on the online monitoring framework, but the use of the common tools in the data quality offline environment is also described.


Before you start

Make sure that:

  • you know the password of the HIST_WRITER account of the histogram DB (ask the HistDB maintainer or another monitoring expert);
  • if you intend to maintain online reference histograms, you are in the hstwriter group (ask lbonsupp@cern.ch if not);
  • you are on the lhcb-online-monitoring mailing list, which should be used for any request or bug report.

The Histogramming Framework

The main components of the online histogramming framework are:

  • monitoring tasks, maintained by monitoring experts, publishing histograms via DIM through the MonitorSvc;
  • the Adder/Saver applications (or the MonitorSvc itself for most tasks having a single instance) that regularly save histograms to "Savesets" (ROOT Files with standard naming) and reset them;
  • a Histogram DB (HistDB), storing the list of available histograms (not their content) and the associated display/analysis options. These are maintained by the monitoring experts using the Presenter, the HistDB web interface https://lbhistogramdb.cern.ch, or the HistDB C++/Python API;
  • the Presenter application, providing the GUI to look at online and saved histograms according to the configurations stored in the HistDB;
  • analysis jobs performing automatic checks on the savesets and reporting possible alarms to the shifters.

histoFlow.png

Accessing the Histogram DB

The Histogram DB is hosted on the Oracle server at the pit, where it is called HISTDB. Users can connect with the read-only account HIST_READER (password "reader") or with the HIST_WRITER account for write access. All online applications know the correct connection parameters, so users need not worry about them.

For the offline environment, there are presently two options for using the production DB:

  • the DQ farm PCs pclbocr[123].cern.ch have access to the DB at the pit. You can use the Presenter from those nodes with the following commands:
  setenv CMTCONFIG slc4_ia32_gcc34
  SetupProject Online
  presenter.exe -C /afs/cern.ch/lhcb/group/dataquality/ROOT/presenter_2.cfg
    

where the relevant lines in the presenter config file are:

tnsnames-path = /afs/cern.ch/lhcb/group/dataquality/ROOT
databases = HISTDB
database-credentials = HIST_READER:reader;HIST_WRITER

  • a read-only mirror of the HistDB, called LHCBR_HIST, can be accessed worldwide. You can thus use the Presenter even outside CERN, with the following commands:
 SetupProject Online
 presenter.exe -C  /afs/cern.ch/lhcb/group/dataquality/ROOT/presenter_1.cfg

In this case the connection lines are:

databases = lhcbr_hist
database-credentials = HIST_READER:reader

The mirror is updated every 8 hours, so the most recent changes may not be visible yet.

Histogram Creation

All histograms created using the HistogramSvc in a Gaudi job are published using DIM if you use the MonitorSvc and run the application using GaudiOnline.

Important: for online monitoring tasks, all histograms should be booked at initialization in order to be handled properly by the online framework.

For the monitoring framework, histograms are uniquely identified by their full path name (including the GaudiAlgorithm name) and the TaskName. The latter identifies the monitoring task and is specified by the UTGID environment variable. For most tasks, UTGID has the format

<Partition>_<node>_<Taskname>_<nodenumber>

e.g. LHCb_MONA0805_VeloDAQMon_00 for task VeloDAQMon. All changes to the histogram identifier (TaskName or the histogram name) have to be propagated to the HistDB by hand and are not recommended. Instead, the histogram title (the only thing that is displayed to users by the Presenter) can be changed at any time without problems.
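As an illustration of the UTGID format, here is a minimal C++ sketch that splits a UTGID into its four fields. It assumes the fields are separated by single underscores and that none of them (in particular the TaskName) contains an underscore itself; `UtgidFields` and `parseUtgid` are hypothetical names, not part of the Online software.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper type: the fields of <Partition>_<node>_<Taskname>_<nodenumber>.
struct UtgidFields {
  std::string partition;
  std::string node;
  std::string taskName;
  std::string nodeNumber;
};

// Splits the UTGID on '_'. Returns std::nullopt unless there are exactly
// four non-empty fields (assumption: no field contains an underscore).
std::optional<UtgidFields> parseUtgid(const std::string& utgid) {
  std::vector<std::string> parts;
  std::size_t start = 0;
  while (true) {
    const std::size_t pos = utgid.find('_', start);
    parts.push_back(utgid.substr(
        start, pos == std::string::npos ? std::string::npos : pos - start));
    if (pos == std::string::npos) break;
    start = pos + 1;
  }
  if (parts.size() != 4) return std::nullopt;
  for (const auto& p : parts)
    if (p.empty()) return std::nullopt;
  return UtgidFields{parts[0], parts[1], parts[2], parts[3]};
}
```

With this sketch, parseUtgid("LHCb_MONA0805_VeloDAQMon_00") yields the TaskName "VeloDAQMon".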

Two strings have special meaning in the histogram names:

  • "/" can be used to organize histograms in a folder tree structure for ROOT and the Presenter as well;
  • "_$" can be used to group histograms in sets, for example histograms
    Task= MuonDAQMon  Name= MuonMonitor/Pads by Region_$Q1
    Task= MuonDAQMon  Name= MuonMonitor/Pads by Region_$Q2
    

are part of the histogram set "MuonDAQMon/MuonMonitor/Pads by Region". Sets are meant for similar histograms that should have the same display or analysis options. The HistDB interface allows to specify options at set level, keeping the possibility to modify them at the level of single histograms.
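The naming rules can be sketched in a few lines of C++. The helpers below are hypothetical (not taken from the Presenter or HistDB code) and assume that the set name is the TaskName followed by the histogram name up to the first "_$", and that the folder is everything before the last "/".

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: set name "<Task>/<name before _$>", or std::nullopt
// if the histogram name has no "_$" marker (i.e. it belongs to no set).
std::optional<std::string> histogramSetName(const std::string& task,
                                            const std::string& histName) {
  const std::size_t pos = histName.find("_$");
  if (pos == std::string::npos) return std::nullopt;
  return task + "/" + histName.substr(0, pos);
}

// Hypothetical helper: folder part of the name, split on the last "/".
std::string histogramFolder(const std::string& histName) {
  const std::size_t pos = histName.rfind('/');
  return pos == std::string::npos ? std::string() : histName.substr(0, pos);
}
```

Both "MuonMonitor/Pads by Region_$Q1" and "..._$Q2" of task MuonDAQMon map to the same set name, and both live in the folder "MuonMonitor".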

Important: histograms for different time periods are added dynamically by the Presenter or analysis tools using the ROOT merging methods. You should publish only histograms that can be merged meaningfully (normal 1D or 2D counting histograms, or profile histograms). If you want to display, e.g., a ratio such as an efficiency plot, you should define it as a virtual histogram.
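Why a published ratio cannot be merged is easy to check with plain arithmetic. The sketch below is ordinary C++ with made-up numbers (not ROOT code): merging the counts and then dividing gives the correct efficiency, while adding two already-computed ratios, which is what merging a published ratio histogram amounts to, does not.

```cpp
// Passed/total counts in one bin for one time period (illustrative only).
struct Period {
  double passed;
  double total;
};

// Correct: add the counting histograms first, then divide.
double mergedThenDivided(const Period& a, const Period& b) {
  return (a.passed + b.passed) / (a.total + b.total);
}

// Wrong: the ratio was published and the merge adds it bin by bin.
double ratiosAdded(const Period& a, const Period& b) {
  return a.passed / a.total + b.passed / b.total;
}
```

For a = {90, 100} and b = {10, 100}, the first function gives 0.5 while the second gives 1.0, an "efficiency" of 100%.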

Histogram Declaration to HistDB

Every time histograms are added to the monitoring tasks, they have to be declared to the HistDB in order to be used by the Presenter or analysis tasks. This can be done in multiple ways:

Using the Saveset2HistDB tool

This is the recommended way.

This program is included in the Online/OMAlib package. Online, use

 
  SetupProject Online 
 Saveset2HistDB.exe -t <TaskName> <mysaveset.root>
    

The TaskName field is mandatory: be sure to use the correct one for your saveset. The program requires the HIST_WRITER password and asks for confirmation before committing changes to the DB.

Outside the pit, you can use the tool only from the DQ farm pcs pclbocrX.cern.ch. Use

 
  setenv CMTCONFIG slc4_ia32_gcc34
  SetupProject Online
  setenv TNS_ADMIN /afs/cern.ch/lhcb/group/dataquality/ROOT
  Saveset2HistDB.exe -t <TaskName> <mysaveset.root>
    

Using the Presenter

Use Tools->Page Editor Online to declare histograms from DIM, or Tools->Page Editor Offline to declare histograms from a saveset. You are asked to reconnect to the DB: use the HIST_WRITER account. Then select histograms in the upper right tree and declare them to the DB. See the Presenter documentation for more details.

Note that in offline mode the TaskName is taken from the saveset file name, which must follow the correct naming convention (only the file name matters, not its path).

Using the InsertDB application

When a run (LHCb or FEST) is taking place, InsertDB can be used to add Moore and HLT/Lumi monitoring histograms to the HistDB. It also creates a hierarchy of pages per trigger type (nickname).

Changing the TaskName

A tool is provided in the Online/OnlineHistDB package if you need to change your TaskName for some reason and want to keep all the display/analysis options of your histograms. You can change the TaskName in the HistDB using

 
  SetupProject Online
  ChangeTaskName.exe <oldTaskName> <newTaskName>
 

Virtual Histograms

In order to allow some online processing of the published histograms, and to avoid any redundancy in the published information, "virtual" histograms can be defined in the HistDB via its web interface (recommended) or its API. Virtual histograms are computed on the fly by the Presenter from the published histograms or from savesets, making it possible, e.g., to plot online the ratio of two published histograms.

Once defined in the HistDB, they can be used like any other histogram, transparently for the users: you can include them in pages and define their display or analysis options. The definition consists of an algorithm name, a set of input histograms, some optional input parameters (depending on the algorithm) and the output histogram name, which will also be the histogram title. Algorithms are maintained in the Online/OMAlib package (objects of class OMAHcreatorAlg in the algorithms folder). You can ask the OMAlib maintainer to add your favorite. Some of the algorithms currently available are:

  • Divide: bin-by-bin ratio of two independent histograms;
  • Efficiency: bin-by-bin ratio of two histograms with binomial errors;
  • Scale: divide all bins of the first histogram by the content of a bin of the second histogram. If the second histogram is the same as the first, the normalization bin is removed from the output;
  • HMerge: merge histograms into a single histogram with possibly variable bin sizes. This can be used to merge detector maps with different granularity without introducing extra bins in the published histograms (since variable bin sizes are not supported by the MonitorSvc). An example is shown in merged.png.
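To make the bin-by-bin definitions of Divide, Efficiency and Scale concrete, here is a plain C++ sketch working on vectors of bin contents. It is not the OMAlib implementation: the error formulas (quadrature propagation for independent inputs in Divide, the simple binomial error sqrt(e(1-e)/N) for Efficiency), the 0-based bin index and the choice of returning 0 for empty denominators are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Bin contents and errors of a 1D histogram (0-based, unlike ROOT).
struct Bins {
  std::vector<double> content;
  std::vector<double> error;
};

// Divide: num/den bin by bin, errors of independent inputs in quadrature.
Bins divide(const Bins& num, const Bins& den) {
  const std::size_t n = num.content.size();
  if (den.content.size() != n) throw std::invalid_argument("binning mismatch");
  Bins out{std::vector<double>(n, 0.0), std::vector<double>(n, 0.0)};
  for (std::size_t i = 0; i < n; ++i) {
    const double d = den.content[i];
    if (d == 0.0) continue;  // assumption: empty denominator gives 0 +- 0
    const double r = num.content[i] / d;
    out.content[i] = r;
    out.error[i] = std::sqrt(num.error[i] * num.error[i] +
                             r * r * den.error[i] * den.error[i]) / std::fabs(d);
  }
  return out;
}

// Efficiency: passed/total bin by bin with binomial errors sqrt(e(1-e)/N).
Bins efficiency(const std::vector<double>& passed,
                const std::vector<double>& total) {
  const std::size_t n = passed.size();
  if (total.size() != n) throw std::invalid_argument("binning mismatch");
  Bins out{std::vector<double>(n, 0.0), std::vector<double>(n, 0.0)};
  for (std::size_t i = 0; i < n; ++i) {
    if (total[i] <= 0.0) continue;  // assumption: empty bin gives 0 +- 0
    const double e = passed[i] / total[i];
    out.content[i] = e;
    out.error[i] = std::sqrt(std::max(0.0, e * (1.0 - e)) / total[i]);
  }
  return out;
}

// Scale: divide every bin of h by ref[normBin]; if ref is h itself,
// the normalization bin is dropped from the output.
std::vector<double> scale(const std::vector<double>& h,
                          const std::vector<double>& ref,
                          std::size_t normBin, bool sameHistogram) {
  const double norm = ref.at(normBin);
  std::vector<double> out;
  for (std::size_t i = 0; i < h.size(); ++i) {
    if (sameHistogram && i == normBin) continue;
    out.push_back(norm != 0.0 ? h[i] / norm : 0.0);
  }
  return out;
}
```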

See the up-to-date full list of algorithms for virtual histograms.

For the virtual histogram definition, use the "Create Virtual Histogram" link of the web interface (bottom left). You can choose the algorithm from a menu (a short documentation is provided). For algorithms without a fixed number of inputs (such as HMerge), you can specify as input a histogram set (all histograms in the set will be used as input) and/or up to 8 single histograms.

Virtual histograms can be nested (input of a virtual histogram can be a virtual histogram itself).

The task name of the virtual histogram is MotherTaskName_ANALYSIS, where MotherTaskName is the TaskName of the first input histogram. Note that input histograms can also belong to different tasks, but in that case the virtual histogram can only be displayed online, not in history mode, and cannot be analyzed by the automatic tasks. The algorithm name is the name of the OMAlib algorithm and the histogram name is specified at creation time.
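The two rules above (naming after the first input, and the restriction for inputs from different tasks) can be sketched as follows. `InputHisto` and both functions are hypothetical helpers for illustration, not part of the HistDB API.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of one input of a virtual histogram.
struct InputHisto {
  std::string taskName;
  std::string name;
};

// TaskName of the virtual histogram: "<MotherTaskName>_ANALYSIS",
// the mother task being the task of the first input.
std::string virtualTaskName(const std::vector<InputHisto>& inputs) {
  if (inputs.empty()) throw std::invalid_argument("no inputs");
  return inputs.front().taskName + "_ANALYSIS";
}

// True if all inputs come from the same task, the condition for using the
// virtual histogram in history mode and in the automatic analysis.
bool usableInHistoryAndAnalysis(const std::vector<InputHisto>& inputs) {
  for (const auto& in : inputs)
    if (in.taskName != inputs.front().taskName) return false;
  return true;
}
```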

Note that virtual histograms are transient objects produced on the fly for display or analysis and are never saved to savesets.

Defining and Editing your Pages

As described below in more detail, you can define and edit your Presenter pages interactively in an (almost) user-friendly way using the Presenter GUI itself. The HistDB web interface lets you see and change all the recorded information interactively using forms. You can also use the HistDB APIs to maintain your configurations in C++ or Python code.

Using the Presenter

See the Presenter documentation for directions.

In short:

  • "Tools -> Editor Online (or Offline) Mode" (use Online if your source is online histograms, Offline if you loaded a saveset)
  • pick your histograms from the lower right menu, right mouse button-> "Add checked histograms to page"
  • rearrange your histograms interactively, you can change the display options from "Edit -> Histogram Properties" using the ROOT panel
  • "File -> Save Page to Database" when you're done

Note that you need write access to the HistDB: log in as HIST_WRITER when requested. Offline, you must use the DQ farm PCs (pclbocr[123].cern.ch).

To use a saveset as your source, you should first load it in History mode ("set file" in the history menu), then go to "Tools -> Editor Offline Mode". The saveset should follow the standard naming convention containing the TaskName, as is true for all online savesets. Offline DQ files, which do not follow the convention, should first be renamed as TaskName-run0.root (where TaskName is Brunel or DaVinci).

Using the Web Interface

You can verify/edit the recorded display settings of your pages/histograms using the HistDB web interface. Use the navigation buttons on the left to browse through histogram or page records. You can also edit the task records (only the associated detectors are currently meaningful). To see in the Presenter the changes made via the web interface, you need to reload the HistDB settings by right-clicking on the page tree on the left and selecting "Refresh". If you modify histogram-specific options and see no effect, you probably have page-specific options, which can be seen/edited/removed from the page records.

Coding your pages

To keep your configuration in code, and to automate the configuration of many similar pages, you can use the OnlineHistDB C++ API (see documentation). An example is provided in the Online/OnlineHistDB package, file doc/muon_example.cpp.

A python interface to the C++ API is provided in the Online/HistDBPython package, developed and maintained by N.Chiapolini. See the doc folder for examples.

Important notice: all changes done through the API have to be committed explicitly. Open uncommitted transactions lock the DB, preventing other users from modifying it. This behaviour will be changed in the future; in the meantime, please check your code carefully so that transactions are closed (by deleting the OnlineHistDB objects) at the end of the job and in case of errors. If a Presenter editing session hangs or crashes, please check that the process has been killed and that no ghost transactions survive. Transactions locking the DB can be killed by hand using this emergency tool.
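One way to make sure the DB object is deleted on every exit path, errors included, is to let a std::unique_ptr own it (RAII). The sketch below does not use the real OnlineHistDB class: `HistDBSession` is a hypothetical stand-in whose constructor and destructor only count open sessions, so the pattern can be shown and checked on its own.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for an OnlineHistDB object (NOT the real class):
// constructing it opens a session, destroying it closes the session and
// releases the lock; openSessions counts sessions still alive.
class HistDBSession {
 public:
  HistDBSession() { ++openSessions; }
  ~HistDBSession() { --openSessions; }
  HistDBSession(const HistDBSession&) = delete;
  HistDBSession& operator=(const HistDBSession&) = delete;
  void commit() { committed = true; }  // stands in for the explicit commit
  bool committed = false;
  inline static int openSessions = 0;
};

// Edits pages inside one session. Because the session is owned by a
// std::unique_ptr, it is deleted on every exit path, including when an
// exception propagates out of the function.
void configurePages(bool failHalfway) {
  auto db = std::make_unique<HistDBSession>();
  // ... declare histograms, create pages, set display options ...
  if (failHalfway) throw std::runtime_error("page creation failed");
  db->commit();  // nothing is kept unless committed explicitly
}
```

With a raw `new` and a `delete` at the end of the function instead, the exception path would skip the `delete` and leave the session (and the lock) open.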

Display Options

The list of available display options can be seen looking at the histogram records on the web interface.

dispopt2.png

Most of the parameters correspond to ROOT display options. Special options are:

  • the alternative title to be shown;
  • the normalization option, which allows drawing the histogram with a normalized area, e.g. to 1;
  • the reference option, to switch off reference overlaying or to specify the reference normalization;
  • the fit option, to fit the histogram on every display with the specified function. Initial fit parameters can be specified. The list of possible fit functions is maintained in the Online/OMAlib package. You can ask the OMAlib maintainer to include your favorite;
  • the pattern option, which can be used to specify a filename containing a ROOT graphical object to be drawn over the histogram (to provide custom grids, text, etc.).
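As a reminder of what "normalized area" means, the sketch below rescales bin contents so that their sum equals a target value. It assumes the area is the plain sum of bin contents (ignoring bin widths and under/overflow, as ROOT's TH1::Integral() does by default); the Presenter's exact definition may differ, and `normalizeArea` is a hypothetical helper.

```cpp
#include <numeric>
#include <vector>

// Rescales bin contents so that their sum (the "area", ignoring bin
// widths and under/overflow) equals `target`, e.g. 1.
std::vector<double> normalizeArea(std::vector<double> bins, double target = 1.0) {
  const double area = std::accumulate(bins.begin(), bins.end(), 0.0);
  if (area == 0.0) return bins;  // empty histogram: nothing to normalize
  for (double& b : bins) b *= target / area;
  return bins;
}
```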

Editing the histogram record from the web interface you can also:

  • specify custom static bin labels. If you need the bin labels to be changed dynamically or you want them to be stored on savesets, use this trick in your monitoring task;
  • link a page to the histogram: when clicking on the histogram on a Presenter page, the linked page will be loaded.

Note that histogram display options can be defined at three levels:

  1. Histogram set
  2. Single Histogram
  3. Single Histogram on given page

The most specific level (higher number in the list) has priority. Using the web interface or the API you can explicitly select the option level. When saving a page with the Presenter, the level is chosen automatically as the first unused one. For example, if you create a new set with ten histograms, the first histogram whose options are recorded defines the default for all histograms in the set; when you save options for another histogram (or re-edit the options for the first one), the options become specific to that histogram only. In the same way, if you add the same histogram to two pages, the options for the second instance default to the options of the first, and can then be edited to make them page-specific.
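The lookup order can be modelled as follows for a single display option (for instance a line colour). `OptionLevels` and `effectiveOption` are hypothetical names; the sketch only illustrates the "most specific level wins" rule, not how the HistDB stores options.

```cpp
#include <map>
#include <optional>
#include <string>

// One display option stored at the three levels; std::nullopt or a missing
// page entry means "not set at this level".
struct OptionLevels {
  std::optional<std::string> set;           // 1. histogram set
  std::optional<std::string> histogram;     // 2. single histogram
  std::map<std::string, std::string> page;  // 3. histogram on a page, keyed by page
};

// The most specific level wins: page, then histogram, then set.
std::optional<std::string> effectiveOption(const OptionLevels& o,
                                           const std::string& pageName) {
  if (auto it = o.page.find(pageName); it != o.page.end()) return it->second;
  if (o.histogram) return o.histogram;
  return o.set;
}
```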

Savesets

Savesets are available online under $HISTDIR/$HISTSAVESETSPATH (presently /hist/Savesets) with the following naming convention:

$HISTDIR/$HISTSAVESETSPATH/year/partition/TaskName/month/day/TaskName-(run-)YYYYMMDDTHHMMSS(-EOR).root

where the optional -EOR string indicates that the saveset has been produced at end of run. The run number is present only if the saving task knows about it.
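The convention can be turned into a small path builder, e.g. for scripts that look for savesets. This is a sketch under assumptions not stated above: the month and day directories are zero-padded to two digits, and the run number is written in decimal followed by "-". `savesetPath` is a hypothetical helper; `base` stands for the expanded $HISTDIR/$HISTSAVESETSPATH.

```cpp
#include <ctime>
#include <optional>
#include <string>

// Builds <base>/year/partition/TaskName/month/day/TaskName-(run-)YYYYMMDDTHHMMSS(-EOR).root
std::string savesetPath(const std::string& base, const std::string& partition,
                        const std::string& task, const std::tm& t,
                        std::optional<long> run, bool endOfRun) {
  char year[8], monthDay[8], stamp[32];
  std::strftime(year, sizeof year, "%Y", &t);
  std::strftime(monthDay, sizeof monthDay, "%m/%d", &t);  // assumption: zero-padded
  std::strftime(stamp, sizeof stamp, "%Y%m%dT%H%M%S", &t);
  std::string file = task + "-";
  if (run) file += std::to_string(*run) + "-";  // run number only if known
  file += stamp;
  if (endOfRun) file += "-EOR";
  return base + "/" + year + "/" + partition + "/" + task + "/" + monthDay +
         "/" + file + ".root";
}
```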

The Presenter in history mode can show the pages integrated over any time or run interval: choose the appropriate partition and select "set interval" in the history menu.

You can choose to see either the histograms summed over the time/run range (default) or the trend of the histogram mean and RMS (check the "sum" button and reload the page); see the example below. In trend mode every bin corresponds to a saveset (or run number), and the time (or run) is shown in the bin labels every few bins (unless there is a jump larger than 1 hour (or 1 run) between two consecutive bins).

histmode2.png

Savesets get aggregated by run by a dedicated service. Aggregated savesets appear, with a delay of 10 to 20 minutes from the run end, as

$HISTDIR/$HISTSAVESETSPATH/ByRun/run10k/run1k/TaskName-runrun.root

where run is the run number, run10k = int(run/10000)*10000 and run1k = int(run/1000)*1000.
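The two directory levels are obtained with integer division, which truncates just like int() in the formulas. The sketch below assumes the aggregated file name is "TaskName-run" followed by the run number (reading "TaskName-runrun.root" that way); `byRunPath` is a hypothetical helper and `base` stands for the expanded $HISTDIR/$HISTSAVESETSPATH.

```cpp
#include <string>

// Path of the run-aggregated saveset for a given task and run number.
std::string byRunPath(const std::string& base, const std::string& task, long run) {
  const long run10k = (run / 10000) * 10000;  // integer division truncates
  const long run1k = (run / 1000) * 1000;
  return base + "/ByRun/" + std::to_string(run10k) + "/" +
         std::to_string(run1k) + "/" + task + "-run" + std::to_string(run) + ".root";
}
```

For run 87654 this gives the directories 80000/87000.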

You can navigate through aggregated savesets using the run widget on the top of the Presenter window.

Reference Histograms

Online reference histograms are maintained by monitoring experts under $HISTDIR/$HISTREFPATH (currently /hist/Reference). Write access requires membership of the hstwriter group. Reference histograms must be in a file called

$HISTDIR/$HISTREFPATH/TaskName/default_1.root

which can simply be a copy of a "good" saveset. Single histograms can be modified by hand using ROOT, or using the Presenter in Online Editor mode (select the histogram and use Edit->Save As Reference Histogram); this feature still needs to be fixed in Offline Editor mode.

Multiple reference files for different run types and run ranges are foreseen, distinguished by the name of the reference file:

$HISTDIR/$HISTREFPATH/TaskName/runtype-StartRunRange.root

At the moment, the only supported runtype is the HLT TCK (if available as DIM service), while the run range is not yet supported.

A button is available on the Presenter to switch on/off reference overlay (the blue one on the upper right). You can inhibit reference overlay or control the reference normalization using the dedicated display option.

Automatic Histogram Analysis

Automatic analysis jobs run on every saveset as soon as it is available, to provide immediate feedback to the shifters. A generic job (DbDrivenAnalysisTask) performs the checks that the monitoring experts associate with histograms in the HistDB. These checks are maintained through the HistDB web interface or API, without having to write any code. The DbDrivenAnalysisTask output consists of messages to the shift crew.

More specific analysis requiring custom code (performing calibrations, publishing new histograms, etc) can be implemented as well using common tools provided by the Online/OMAlib package.

HistDB Driven

The recommended way to associate an analysis with histograms in the DB is to use the web interface. From your histogram record, use the "Add Automatic Analysis" button. You can choose among a list of predefined algorithms, maintained in the Online/OMAlib package (objects of class OMACheckAlg in the algorithms folder; you can ask the OMAlib maintainer to add your favorite). The analysis is defined by the algorithm name, an optional set of input parameters (depending on the algorithm), and two threshold levels (warning and alarm) for every output parameter of the algorithm.
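How two threshold levels turn an output parameter into a message severity can be sketched as below. This is only an illustration: it assumes that larger values are worse and that reaching a threshold triggers it, which need not match every OMAlib algorithm (some checks may flag values that are too small); `Severity` and `classify` are hypothetical names.

```cpp
#include <stdexcept>

enum class Severity { OK, WARNING, ALARM };

// Classifies one output parameter of a check against its warning and alarm
// thresholds, assuming "larger is worse" (alarm threshold >= warning threshold).
Severity classify(double value, double warningThreshold, double alarmThreshold) {
  if (alarmThreshold < warningThreshold)
    throw std::invalid_argument("alarm threshold below warning threshold");
  if (value >= alarmThreshold) return Severity::ALARM;
  if (value >= warningThreshold) return Severity::WARNING;
  return Severity::OK;
}
```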

See the up-to-date full list of algorithms for automatic analysis.

You can also specify requirements on the detector status before performing the analysis on a saveset. Conditions you can currently require are:

  • LHC in PHYSICS state
  • VELO closed
  • HV in ready state (for all included detectors)

Conditions must be met both at the beginning of the run and at the time the saveset is written for the analysis to be performed. Conditions that are not required in the analysis definition are ignored.
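The rule reads naturally as a boolean check. In the sketch below, the structs and field names are hypothetical and only mirror the three conditions listed above; a condition that is not required never blocks the analysis.

```cpp
// Detector status at one moment (hypothetical field names).
struct DetectorStatus {
  bool lhcPhysics;
  bool veloClosed;
  bool hvReady;
};

// Requirements of one analysis; false means "not required, ignore".
struct Requirements {
  bool lhcPhysics = false;
  bool veloClosed = false;
  bool hvReady = false;
};

// True if every required condition holds in the given status.
bool meets(const Requirements& req, const DetectorStatus& s) {
  return (!req.lhcPhysics || s.lhcPhysics) &&
         (!req.veloClosed || s.veloClosed) &&
         (!req.hvReady || s.hvReady);
}

// The analysis runs only if the requirements hold both at run start and
// when the saveset was written.
bool shouldAnalyze(const Requirements& req, const DetectorStatus& atRunStart,
                   const DetectorStatus& atSaveset) {
  return meets(req, atRunStart) && meets(req, atSaveset);
}
```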

You can also tune the statistics threshold to perform the analysis, in order to avoid fake alarms due to statistical fluctuations. You can specify a minimum number of average entries per bin and a minimum fraction of bins above this threshold. If those values are not set, a default depending on the algorithm is used.
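Here is one possible reading of the two settings, written as a check. It is an assumption, not the OMAlib code: the average number of entries per bin must reach the minimum, and at least the given fraction of bins must individually contain that many entries. `enoughStatistics` is a hypothetical helper.

```cpp
#include <cstddef>
#include <vector>

// Returns true if the histogram has enough statistics to be analyzed,
// under the interpretation described above.
bool enoughStatistics(const std::vector<double>& bins,
                      double minEntriesPerBin, double minFraction) {
  if (bins.empty()) return false;
  double sum = 0.0;
  std::size_t above = 0;
  for (double b : bins) {
    sum += b;
    if (b >= minEntriesPerBin) ++above;
  }
  const double n = static_cast<double>(bins.size());
  return sum / n >= minEntriesPerBin && above / n >= minFraction;
}
```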

You can choose the algorithm and set the parameters interactively on the web page, where documentation on the meaning of each parameter is provided.

defineAna.png

If you have a histogram set, the analysis can be defined on the set rather than on a single histogram; you can then adjust the settings for each histogram in the set.

You can edit the analysis parameters and mask/unmask the analysis at any time; the changes take effect immediately.

Custom Analysis

You can implement your custom analysis by writing a Gaudi application with an algorithm inheriting from the AnalysisTask class of the Online/OMAlib package (which in turn inherits from GaudiHistoAlg). The algorithm class must contain a method

virtual StatusCode analyze(std::string& SaveSet,
                           std::string Task);

that will be called every time a new saveset is available for a given monitoring Task. In the (default) online mode, the analysis job waits indefinitely for new savesets after initialization and never exits. The analysis can also operate in test mode, running only on some static input files and then exiting. The analysis task behavior is controlled by the following options:

  • useHistDB to use the HistDB (default false);
  • HistDB, HistDBuser, HistDBpw to choose an alternative HistDB account. The default one is the read-only account of HISTDB. Switch to the HIST_WRITER account if you want to store messages to the HistDB;
  • HistRefRoot to change the histogram reference path (the online path is the default);
  • InputFiles is a vector of strings, defining the input files for test mode: if specified, the analysis will run on those files (one by one) and then exit without waiting for new savesets;
  • InputTasks (mandatory, unless you have InputFiles) is a vector of strings, listing the monitoring task(s) to be monitored. If the first element is "any", the analyze method will be called for every saveset of every task;
  • Partition is the partition to monitor, default is LHCb;
  • TextLog to specify an optional log file where alarms are reported (other than the MessageSvc);
  • HistDBMsgPersistency to switch on/off saving of messages in the HistDB (default true);
  • StopAlgSequence: with the default value (true), the algorithm stops the Gaudi sequence after initialization, waiting for new savesets. If you want to run multiple analysis algorithms in the same sequence, set this option to false for all algorithms except the last one.

Examples of option files are available in the Online/OMAlib/options folder.

Analysis Messages

Text messages (warnings or alarms) are published through the Gaudi MsgService (for the logViewer) and DIM (for the Alarm Screen), and (if requested) stored in the HistDB. They can be seen in various ways:

  • using the Presenter Alarm Display (View->Show alarm list), together with the associated histograms. The list of messages is updated automatically when it changes. Messages are sorted by subsystem and a different symbol is shown depending on severity (warning or alarm). Messages that have been disabled are kept in the DB and can still be seen in the "archive" folder. Note that the analysis cannot modify the histogram on the saveset persistently, so any histogram manipulation performed by the analysis (e.g. a fit) is not shown;
  • optionally, they are written to a log file. The log files of the DbDrivenAnalysisTask can be found under /group/online/HistTools/Analysis/log;
  • using the standard logViewer;
  • they will be sent to the LHCb Alarm Screen (not yet implemented);
  • using the script /group/online/scripts/dumpOMAlarms [ -t AnalysisTaskName ]
    (use -h for other options).

Messages are not resent if already present in the DB (from the same analysis check on the same histogram), unless the severity increases. If an analysis that raised an alarm is performed again on a new sample with sufficient statistics (according to the statistics threshold that can be set at analysis level) and the alarm condition is no longer present, the alarm is considered "solved" and gets disabled. It is nevertheless kept in the database for 60 days and can still be seen in the Presenter Alarm Display under the "archive" folder. If the alarm condition appears again on a later sample, the alarm is retriggered. For every alarm, the Presenter Alarm Display shows the total number of occurrences (how many samples triggered the alarm), how many times the alarm was solved, how many times it was retriggered, and the times of the first and of the last occurrence (or last solving).
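The life cycle described above can be modelled as a small state machine for one stored message (one check on one histogram). This is a hypothetical model for illustration, not the HistDB schema: severity 1 stands for warning and 2 for alarm, both events are assumed to come from samples with sufficient statistics, and the 60-day retention is not modelled.

```cpp
// Hypothetical model of one stored message.
struct StoredMessage {
  int severity = 0;       // 0 = nothing recorded yet, 1 = warning, 2 = alarm
  bool active = false;    // false once the alarm is "solved"
  int occurrences = 0;    // samples that triggered the message
  int timesSolved = 0;
  int timesRetriggered = 0;
};

// A sample triggers the condition. Returns true if the message must be
// (re)sent: it is new, it reappears after being solved, or it got worse.
bool onTriggered(StoredMessage& m, int severity) {
  ++m.occurrences;
  if (!m.active) {
    if (m.timesSolved > 0) ++m.timesRetriggered;
    m.active = true;
    m.severity = severity;
    return true;
  }
  if (severity > m.severity) {
    m.severity = severity;
    return true;
  }
  return false;  // already present with the same or higher severity
}

// A sample no longer shows the condition: disable (archive) the message.
void onClean(StoredMessage& m) {
  if (m.active) {
    m.active = false;
    ++m.timesSolved;
  }
}
```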

Analysis Messages from Custom Analysis

If you are implementing a custom analysis, it is your responsibility to send the alarm messages.

Analysis messages are identified by the combination of

  • the analysis task name (the name of the analysis gaudi algorithm)
  • the name of the associated check, set by the user with
     inline void AnalysisTask::setAnaName(std::string &name)
  • the ROOT name of the checked histogram
  • the input monitoring task name

You can send a message using

 
void raiseMessage(OMAMessage::OMAMsgLevel level,
                    std::string& message,
                    std::string& histogramName); 

where OMAMsgLevel is

 typedef enum { NOSTAT=2, INFO=3, WARNING, ALARM} OMAMsgLevel;

After running the analysis on a new saveset, all previous messages from the same analysis that have not been confirmed via a raiseMessage call with level NOSTAT (not enough statistics), WARNING or ALARM will be disabled.
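The bookkeeping can be sketched with the four identifying fields as a key. In this hypothetical model, `OMAMessage` is a local copy of the enum declared above so that the sketch compiles on its own, and "messages from the same analysis" is assumed to mean same analysis task name and same input monitoring task.

```cpp
#include <map>
#include <set>
#include <string>
#include <tuple>

// Local copy of the level enum shown above (for a self-contained sketch).
struct OMAMessage {
  typedef enum { NOSTAT = 2, INFO = 3, WARNING, ALARM } OMAMsgLevel;
};

// Identity of an analysis message: the four fields listed above.
struct MessageKey {
  std::string analysisTask;   // name of the analysis Gaudi algorithm
  std::string checkName;      // set with setAnaName()
  std::string histogramName;  // ROOT name of the checked histogram
  std::string inputTask;      // monitoring task of the saveset
  bool operator<(const MessageKey& o) const {
    return std::tie(analysisTask, checkName, histogramName, inputTask) <
           std::tie(o.analysisTask, o.checkName, o.histogramName, o.inputTask);
  }
};

// After one saveset: active messages of this analysis/input task that were
// not confirmed with NOSTAT, WARNING or ALARM are the ones to disable.
std::set<MessageKey> toDisable(
    const std::set<MessageKey>& active,
    const std::map<MessageKey, OMAMessage::OMAMsgLevel>& raisedNow,
    const std::string& analysisTask, const std::string& inputTask) {
  std::set<MessageKey> result;
  for (const auto& key : active) {
    if (key.analysisTask != analysisTask || key.inputTask != inputTask) continue;
    const auto it = raisedNow.find(key);
    const bool confirmed = it != raisedNow.end() && it->second != OMAMessage::INFO;
    if (!confirmed) result.insert(key);
  }
  return result;
}
```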

Example:

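#include <memory>
#include <string>
#include "TFile.h"
#include "TH1.h"
#include "GaudiKernel/AlgFactory.h"
#include "OMAlib/AnalysisTask.h"

class MyAnalysis: public AnalysisTask {
public:
 MyAnalysis( const std::string& name, ISvcLocator* pSvcLocator );
 virtual ~MyAnalysis( ) {}

 virtual StatusCode initialize();    ///< Algorithm initialization
 virtual StatusCode finalize  ();    ///< Algorithm finalization
 virtual StatusCode analyze(std::string& SaveSet,
                            std::string Task = "any");
};

// factory declaration, so that the algorithm can be instantiated from job options
DECLARE_ALGORITHM_FACTORY( MyAnalysis );

MyAnalysis::MyAnalysis( const std::string& name, ISvcLocator* pSvcLocator ):
   AnalysisTask(name, pSvcLocator) {
 // declare your own job options here
}

StatusCode MyAnalysis::initialize() { return AnalysisTask::initialize(); }
StatusCode MyAnalysis::finalize()   { return AnalysisTask::finalize(); }

StatusCode MyAnalysis::analyze(std::string& SaveSet, std::string Task) {
 AnalysisTask::analyze(SaveSet, Task);
 // TFile::Open returns 0 on failure; the unique_ptr closes the file on return
 std::unique_ptr<TFile> f( TFile::Open(SaveSet.c_str()) );
 if ( !f || f->IsZombie() ) return StatusCode::SUCCESS; // nothing to analyze
 // get the histogram you want to analyze (TFile::Get returns a TObject*)
 TH1* h = dynamic_cast<TH1*>( f->Get("MyAlgorithm/MyHistogramName") );
 if (h) {
   std::string anaName = "MyAnalysisCheck1";
   setAnaName( anaName );
   bool alarmCondition = false;
   // .... perform your analysis on h and set alarmCondition ....
   if (alarmCondition) {
      std::string alarmMessage = "Alarm on my system for this and that reason";
      std::string hname( h->GetName() );
      raiseMessage(OMAMessage::ALARM,
               alarmMessage,
               hname);
   }
 }
 return StatusCode::SUCCESS;
}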

Monitoring Tools in the Offline DQ Environment

The Presenter, HistDB and OMAlib packages can be used for offline DQ, acting only on root files from the reconstruction jobs.

Since DIM is not used offline, TaskNames are assigned to the savesets by convention (presently: "Brunel" or "DaVinci").

See the previous paragraphs for accessing the HistDB from the Presenter and declaring the histograms from the offline world. Just ignore the "DIM server not found" message at the Presenter startup.

Note that no standard saveset and reference paths have been defined yet in the DQ environment, but the Presenter requires those paths to point to existing directories with read access, so take care to specify the paths in your Presenter config file.

A centralized automatic analysis as in the online case is not foreseen (yet) for the offline DQ. Analysis checks can be defined as in the online case (DB driven or custom) and messages can be stored in the DB, but the analysis jobs have to be run by hand in test mode, providing the input files via options.

You can find some examples here:

/afs/cern.ch/user/g/ggiacomo/public/OMAlib_DQ_lbocr (to run the analysis job from the DQ farm, where it's possible to store messages to the HistDB)

/afs/cern.ch/user/g/ggiacomo/public/OMAlib_DQ (to run the analysis job from wherever, but without persistence for messages)

Todo List

Features/fixes already requested and planned for some future release:

  • analysis alarms to ECS Alarm Screen
  • analysis with fit: show fit function on histogram with alarm

-- Main.ggiacomo - 15 Jan 2010

Topic revision: r29 - 2011-08-03 - GiacomoGraziani
 
