Wednesday, April 10, 2013

The Queen Mary Grid Cluster

The qmul grid/htc cluster is a high throughput (htc) research computing cluster based at Queen Mary, University of London. We primarily serve the scientific grid community and are funded by the griddpp
collaboration (i.e. uk stfc research council). By high throughput we mean the ability to do lots of individual separate jobs. Our main workload is data analysis for the ATLAS experiment at cern. We are the top site in the UK for this type of work, and one of the leading sites for the ATLAS LHC experiment in the world. We are part of the LondonGrid (hence the post to this blog!)

Our cluster comprises of:

For running the actual jobs
30 Dell C6100 using X5650s processors, contributing a total of 2880 job slots, and
60 older streamline nodes using E5420 processors, contributing a total of 480 job slots.

For Storage we run the Lustre parallel file system using
72 Dell R510s with 1800 TBytes of disk and
12 older Dell 1950s with MD100 disk arrays with 360TB of disk
Our actual provision is about 1600TB due to the use of raid 6 and "real" disk sizes.

We have a lot of development work to do over the next year which I hope to describe over the coming month in this blog including...

  • A new monitoring system probably based on opennms.
  • A new deployment system, to replace our hand made perl/mason/kickstart system probably using razor and puppet.
  • A cloud stack, we've been doing scientific computing using the grid software, but this model of computing is likely to be replaced with a cloud type model, we will need to look at the various options (OpenStack, CloudStack or OpenNebula).

The 11 racks of the QMUL cluster

No comments: