RAL Tier-1 Statistics
We currently use Ganglia (an open-source distributed monitoring and execution system) to collect state information from all nodes in the Tier-1 service. As well as the standard monitoring of CPU load, memory usage and similar metrics, we have added state-gathering scripts, using gmetric, to collect queue data from our OpenPBS batch server (as seen on the previous page) and information from the Atlas DataStore.
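As an illustration of the kind of state-gathering script described above, here is a minimal Python sketch. It is not the script we run: it assumes the default column layout of `qstat` output (job id, name, user, time used, state, queue), and the metric names `pbs_jobs_running` and `pbs_jobs_queued` are hypothetical. The `gmetric` options used (`--name`, `--value`, `--type`, `--units`) are the standard ones.

```python
import subprocess
from collections import Counter


def count_job_states(qstat_output):
    """Count jobs per state letter (R, Q, ...) in default qstat output."""
    counts = Counter()
    for line in qstat_output.splitlines():
        fields = line.split()
        # Job rows have at least six columns and a job id that starts with
        # a digit; this skips the header and the dashed separator line.
        if len(fields) >= 6 and fields[0][0].isdigit():
            counts[fields[-2]] += 1  # state is the second-to-last column
    return counts


def gmetric_command(name, value, units="jobs"):
    """Build the gmetric argument list that publishes one integer metric."""
    return [
        "gmetric",
        f"--name={name}",
        f"--value={value}",
        "--type=uint32",
        f"--units={units}",
    ]


def publish_queue_metrics(qstat_output, run=subprocess.run):
    """Publish running and queued job counts (metric names are hypothetical)."""
    counts = count_job_states(qstat_output)
    for state, metric in (("R", "pbs_jobs_running"), ("Q", "pbs_jobs_queued")):
        run(gmetric_command(metric, counts[state]), check=True)
```

A script like this would typically be run periodically (for example from cron) on a host that has both `qstat` and `gmetric` available, passing the captured text of `qstat` to `publish_queue_metrics`.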
The last day's load on our SL4 batch workers:
The last day's network traffic to and from the disk servers:
The extent of the available information can be seen on our main monitoring site, at http://ganglia.gridpp.rl.ac.uk/.
In addition to the Ganglia-based statistics, we process the OpenPBS log files for completed jobs, and store the information in a database for more detailed investigation. Plots are also made from this data:

Further statistics can be found at http://www.gridpp.rl.ac.uk/stats/.
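The processing of OpenPBS logs for completed jobs, mentioned above, could be sketched roughly as follows. This is an illustration under stated assumptions, not our production code: it assumes the usual PBS accounting record layout (`date time;record type;job id;key=value ...`, with `E` marking a finished job), uses SQLite purely for convenience, and the `jobs` table schema is hypothetical.

```python
import sqlite3


def parse_end_record(line):
    """Return a dict for an 'E' (job end) record, or None for other lines."""
    parts = line.rstrip("\n").split(";", 3)
    if len(parts) != 4 or parts[1] != "E":
        return None
    timestamp, _, job_id, message = parts
    record = {"job_id": job_id, "ended": timestamp}
    # The message is a space-separated list of key=value pairs.
    for token in message.split():
        key, sep, value = token.partition("=")
        if sep:
            record[key] = value
    return record


def hms_to_seconds(text):
    """Convert an HH:MM:SS duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def store_completed_jobs(lines, conn):
    """Insert every completed-job record into a (hypothetical) jobs table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id TEXT PRIMARY KEY, user TEXT, queue TEXT, "
        "ended TEXT, walltime_s INTEGER)"
    )
    for line in lines:
        rec = parse_end_record(line)
        if rec is None:
            continue
        walltime = hms_to_seconds(rec.get("resources_used.walltime", "0:0:0"))
        conn.execute(
            "INSERT OR REPLACE INTO jobs VALUES (?, ?, ?, ?, ?)",
            (rec["job_id"], rec.get("user"), rec.get("queue"),
             rec["ended"], walltime),
        )
    conn.commit()
```

With the records in a table, per-user or per-queue totals for plotting reduce to simple queries such as `SELECT queue, SUM(walltime_s) FROM jobs GROUP BY queue`.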
Last modified Wed 22 July 2009.