RAL Tier-1 Statistics

We currently use Ganglia (an open-source distributed monitoring and execution system) to collect state information from all nodes in the Tier-1 service. In addition to the standard metrics such as CPU load and memory usage, we have added state-gathering scripts, using gmetric, that collect queue data from our OpenPBS batch server (as seen on the previous page) and information from the Atlas DataStore.
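
As an illustration of how batch-queue data might be fed into Ganglia, the following Python sketch parses plain `qstat` output and builds one gmetric call per queue and job state. It assumes the default OpenPBS `qstat` column layout (job id, name, user, time used, state, queue). The sample text, the server name `pbsserver` and the `pbs_<queue>_<state>` metric names are hypothetical, and this is not the script used at RAL.

```python
import subprocess
from collections import Counter

# Hypothetical sample of plain `qstat` output in the assumed default OpenPBS
# layout: two header lines, then one row per job with the columns
# Job id, Name, User, Time Use, S (state letter), Queue.
SAMPLE_QSTAT = """\
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
101.pbsserver    atlas_sim        alice            01:02:03 R prod
102.pbsserver    lhcb_reco        bob              00:00:00 Q prod
103.pbsserver    cms_ana          carol            00:10:00 R short
"""

# PBS state letters mapped to readable suffixes for the metric names.
STATE_NAMES = {"R": "running", "Q": "queued", "H": "held"}


def count_jobs(qstat_text):
    """Count jobs per (queue, state letter) in plain `qstat` output."""
    counts = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Skip the column header, the dashed separator and short/blank lines.
        if len(fields) < 6 or fields[0] == "Job" or fields[0].startswith("-"):
            continue
        state, queue = fields[4], fields[5]
        counts[(queue, state)] += 1
    return counts


def gmetric_commands(counts):
    """Build one gmetric command line (as an argument list) per (queue, state)."""
    commands = []
    for (queue, state), n in sorted(counts.items()):
        # Hypothetical naming scheme, e.g. pbs_prod_running.
        name = f"pbs_{queue}_{STATE_NAMES.get(state, state)}"
        commands.append(["gmetric", f"--name={name}", f"--value={n}",
                         "--type=uint32", "--units=jobs"])
    return commands


def publish(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False (needs gmetric)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    publish(gmetric_commands(count_jobs(SAMPLE_QSTAT)))
```

By default `publish` only prints the commands; passing `dry_run=False` runs them, which requires gmetric to be installed and gmond to be configured on the node. A script like this could be run periodically, for example from cron. Queues or states with no jobs produce no metric in this sketch; a production script would probably want to report explicit zeros so that the graphs drop to zero rather than going stale.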

The last day's load on our SL4 batch workers:
[Plot: SL4 workers, load over the last day]
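
Ganglia load graphs such as the SL4 worker plot are built from per-host metrics like `load_one` (the one-minute load average). The Python sketch below reads the XML dump that gmond writes to any client connecting to its TCP port (8649 by default) and computes the mean `load_one` across hosts. The sample XML is trimmed to the elements the code uses, and the cluster and host names are made up; this only shows where the underlying numbers come from, not how the RAL plots are produced.

```python
import socket
import xml.etree.ElementTree as ET

# Hypothetical gmond XML, trimmed to the elements and attributes used below.
SAMPLE_XML = """\
<GANGLIA_XML VERSION="3.0.7" SOURCE="gmond">
  <CLUSTER NAME="SL4 Workers">
    <HOST NAME="wn001">
      <METRIC NAME="load_one" VAL="3.50" TYPE="float"/>
      <METRIC NAME="mem_free" VAL="1024000" TYPE="uint32"/>
    </HOST>
    <HOST NAME="wn002">
      <METRIC NAME="load_one" VAL="4.50" TYPE="float"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""


def fetch_gmond_xml(host, port=8649, timeout=10.0):
    """Read the complete XML dump that gmond writes to a new TCP connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def cluster_load(xml_text, metric="load_one"):
    """Return (number of hosts reporting `metric`, mean value across them)."""
    root = ET.fromstring(xml_text)
    values = [
        float(m.get("VAL"))
        for host in root.iter("HOST")
        for m in host.findall("METRIC")
        if m.get("NAME") == metric
    ]
    if not values:
        return 0, 0.0
    return len(values), sum(values) / len(values)


if __name__ == "__main__":
    hosts, mean_load = cluster_load(SAMPLE_XML)
    print(f"{hosts} hosts, mean load_one {mean_load:.2f}")
    # Against a live node (the hostname is a placeholder):
    # print(cluster_load(fetch_gmond_xml("gmond-host.example")))
```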

The last day's network traffic to and from the disk servers:
[Plot: disk store network traffic over the last day]

The full range of available information can be seen on our main monitoring site, at http://ganglia.gridpp.rl.ac.uk/.

In addition to the Ganglia-based statistics, we process the OpenPBS log files for completed jobs and store the results in a database for more detailed analysis. Plots are also produced from this data:
[Chart: pie chart of CPU usage by group]
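
A minimal Python sketch of the log-processing step, assuming the standard PBS accounting-log format (`date time;record type;job id;key=value ...`, where record type `E` marks a completed job). It stores one row per finished job in an SQLite database and sums CPU time per group, which is the kind of figure a per-group CPU pie chart is drawn from. The table layout, column names and sample records are hypothetical; the production system may use a different database and schema.

```python
import sqlite3

# Hypothetical accounting records in the assumed PBS format
# "MM/DD/YYYY HH:MM:SS;record type;job id;key=value key=value ...".
# Only "E" (job ended) records describe completed jobs.
SAMPLE_LOG = """\
07/22/2009 10:15:02;E;101.pbsserver;user=alice group=atlas queue=prod resources_used.cput=02:00:00 resources_used.walltime=02:10:00
07/22/2009 10:20:00;Q;104.pbsserver;queue=prod
07/22/2009 11:00:00;E;102.pbsserver;user=bob group=lhcb queue=prod resources_used.cput=00:30:00 resources_used.walltime=00:31:00
07/22/2009 12:00:00;E;103.pbsserver;user=carol group=atlas queue=short resources_used.cput=01:00:00 resources_used.walltime=01:02:00
"""


def hms_to_seconds(hms):
    """Convert PBS HH:MM:SS durations (hours may exceed 24) to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


def parse_completed_jobs(lines):
    """Yield one row per "E" record: (job_id, ended, user, group, queue, cpu_s, wall_s)."""
    for line in lines:
        parts = line.strip().split(";", 3)
        if len(parts) != 4 or parts[1] != "E":
            continue
        ended, _, job_id, message = parts
        attrs = dict(item.split("=", 1) for item in message.split() if "=" in item)
        yield (
            job_id,
            ended,
            attrs.get("user"),
            attrs.get("group"),
            attrs.get("queue"),
            hms_to_seconds(attrs.get("resources_used.cput", "0:0:0")),
            hms_to_seconds(attrs.get("resources_used.walltime", "0:0:0")),
        )


def load_jobs(conn, lines):
    """Store completed jobs; re-processing a log does not duplicate rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (job_id TEXT PRIMARY KEY, ended TEXT,"
        " username TEXT, grp TEXT, queue TEXT, cpu_s INTEGER, wall_s INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO jobs VALUES (?, ?, ?, ?, ?, ?, ?)",
        parse_completed_jobs(lines),
    )
    conn.commit()


def cpu_by_group(conn):
    """Total CPU seconds per group, largest first (the data behind a pie chart)."""
    return conn.execute(
        "SELECT grp, SUM(cpu_s) FROM jobs GROUP BY grp ORDER BY SUM(cpu_s) DESC"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_jobs(conn, SAMPLE_LOG.splitlines())
    rows = cpu_by_group(conn)
    total = sum(cpu for _, cpu in rows)
    for group, cpu in rows:
        print(f"{group}: {cpu} CPU s ({100 * cpu / total:.1f}%)")
```

Keying the table on the job id means a log file can be re-read safely, for example when the current day's log is processed repeatedly while it is still growing. The per-group totals can then be passed to any plotting tool.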

Further statistics can be found at http://www.gridpp.rl.ac.uk/stats/.



Last modified Wed 22 July 2009.
Built with GridSite 1.4.3
For more about GridPP, please contact Neasan O'Neill.