Friday, July 31, 2009

Comparing ATLAS analysis at RHUL using the file-staging and RFIO approaches

I have been looking at the performance of the Royal Holloway cluster during HammerCloud tests in which data was read directly from the DPM pool nodes using the RFIO protocol, and comparing it with the recent UK-wide file-staging test (540).

For the RFIO approach two identical tests (537 and 538) were requested in order to ensure that enough jobs arrived on site. The RFIO IOBUFSIZE was set to 4KB. Job CPU efficiencies and cluster throughput (the product of the number of running jobs and the average job efficiency) were extracted using Sam and Dug's script. The job throughput climbed steadily up to a peak at around 320 running jobs. Beyond this point the throughput started to decline, probably compounded by the fact that one of the disk servers lost a disk and became overloaded.
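
The script itself isn't reproduced here, but the metric is simple enough to recompute. Here is a minimal Python sketch of the calculation, assuming hypothetical per-job CPU-time and walltime records; the function names and record format are illustrative, not Sam and Dug's actual code:

    # Cluster throughput = (number of running jobs) x (average CPU efficiency).
    # The per-job record format here is hypothetical.

    def job_efficiency(cpu_time, walltime):
        # CPU efficiency of a single job, in the range 0.0 - 1.0.
        return cpu_time / walltime if walltime > 0 else 0.0

    def cluster_throughput(jobs):
        # jobs: list of (cpu_time_s, walltime_s) tuples for the jobs
        # running at a given moment.
        if not jobs:
            return 0.0
        mean_eff = sum(job_efficiency(c, w) for c, w in jobs) / len(jobs)
        return len(jobs) * mean_eff

    # e.g. 320 jobs each at 70% efficiency gives a throughput metric of 224
    print(cluster_throughput([(700.0, 1000.0)] * 320))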

[Graph: cluster throughput vs. number of running jobs]

The CPU efficiency declined fairly steadily as the number of running jobs increased:

[Graph: average job CPU efficiency vs. number of running jobs]

Each job was reading data at about 1 MB/s, so at the peak the total bandwidth was around 350 MB/s, roughly 30 MB/s per disk server. The disk servers were clearly working hard: iostat reported %util values around 100%, together with high CPU iowait.
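
As an aside, a quick way to spot saturated disk servers is to parse iostat's extended output. A rough Python sketch, assuming the classic sysstat column layout with the device name first and %util in the last column:

    # Flag block devices whose %util is near saturation.
    import subprocess

    def saturated_devices(threshold=90.0):
        # Two 5-second samples: the first is the since-boot average,
        # so only the second reflects current load.
        out = subprocess.check_output(["iostat", "-dx", "5", "2"]).decode()
        lines = out.splitlines()
        headers = [i for i, l in enumerate(lines) if l.startswith("Device")]
        busy = []
        for line in lines[headers[-1] + 1:]:
            parts = line.split()
            if len(parts) < 2:
                continue
            dev, util = parts[0], float(parts[-1])
            if util >= threshold:
                busy.append((dev, util))
        return busy

    print(saturated_devices())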

So how do these results compare to those obtained when staging files to the worker node prior to analysis? This graph shows the same RFIO throughput data together with results from the recently run file-staging test:

[Graph: cluster throughput vs. running jobs, RFIO and file-staging tests overlaid]

The throughput during file-staging leveled off earlier, at around 175 running jobs. Likewise, the average job efficiency dropped more steeply:

[Graph: average job CPU efficiency vs. running jobs for the file-staging test]

The job failure rate for the RFIO tests was 4%, compared with 17% for the file-staging test.
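
For anyone wanting to reproduce the comparison plot, overlaying the two throughput curves is straightforward. A matplotlib sketch, with placeholder numbers standing in for the real HammerCloud data:

    # Overlay throughput vs. running jobs for the two access modes.
    # The data points below are placeholders, not the real test numbers.
    import matplotlib.pyplot as plt

    running_jobs = [50, 100, 175, 250, 320, 400]   # hypothetical
    rfio_tp      = [40,  80, 130, 180, 220, 200]   # hypothetical
    staging_tp   = [35,  70, 110, 115, 110, 100]   # hypothetical

    plt.plot(running_jobs, rfio_tp, "o-", label="RFIO (537/538)")
    plt.plot(running_jobs, staging_tp, "s-", label="file staging (540)")
    plt.xlabel("running jobs")
    plt.ylabel("throughput (jobs x mean efficiency)")
    plt.legend()
    plt.savefig("throughput_comparison.png")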

1 comment:

rhcl said...

=> In the test with file staging to the WN, did the WN have only one disk? What if the files had been staged to two disks per WN?

=> Would locality-conscious process scheduling help? Possible with RHUL's scheduler?

http://www2.computer.org/portal/web/csdl/transactions/tpds;jsessionid=CD59A4504FE04B01E087C90E845A5B11#4

=> It would be interesting to see if QMUL shows the same CPU/Walltime decrease.