Monday, April 15, 2013

virtualization performance hit


Like the rest of the world, there is a lot of discussion going on in GridPP about the use of clouds and virtualization.

http://www.gridpp.ac.uk/gridpp30/mcnab-lhcb-vmclouds-march-2013.pdf

http://www.admin-magazine.com/HPC/articles/the_cloud_s_role_in_hpc

Using virtualization will have a performance impact, so it may not be the best solution for our type of computing (HPC/HTC). However, just what impact does it have? A quick search of the web suggests anywhere between 3% and 30%. Most of the overhead appears to be in the kernel and in I/O.

http://serverfault.com/questions/261974/how-much-overhead-does-x86-x64-virtualization-have

http://www.altechnative.net/2012/08/04/virtual-performance-part-1-vmware/

http://www.anandtech.com/show/3827/virtualization-ask-the-experts-1
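
To see why kernel-heavy work should suffer more, it helps to separate CPU-bound from syscall-bound work. Here is a toy probe (not one of the benchmarks below; the loop counts are arbitrary) that can be run on bare metal and inside a guest to compare:

```python
#!/usr/bin/env python
# Toy probe: contrast CPU-bound work (stays in user space) with
# syscall-bound work (crosses into the kernel on every iteration).
# Run on bare metal and inside the guest and compare the times.
import os
import time

def cpu_bound(n=10 ** 7):
    """Pure arithmetic; should virtualize with little overhead."""
    s = i = 0
    while i < n:
        s += i * i
        i += 1
    return s

def syscall_bound(n=10 ** 5):
    """Each os.stat() is a real system call, so this exercises the
    guest/host transitions that virtualization makes expensive."""
    for _ in range(n):
        os.stat(".")

for fn in (cpu_bound, syscall_bound):
    t0 = time.time()
    fn()
    print("%-14s %.2f s" % (fn.__name__, time.time() - t0))
```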

I decided that I wanted to do some of my own tests, focused on the type of work we do in GridPP.

Testbed: a 24-thread Westmere processor running at 2.66 GHz with 48 GB of memory, running Scientific Linux 6.3 (basically RHEL 6). I'm using the default install of KVM, with the virtual image stored as a local file and the guest set up to use all 24 threads.
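
The guest definition itself is nothing exotic; a minimal sketch of the equivalent definition via the libvirt Python bindings might look like the following (the guest name, disk path, and memory size here are placeholders, not the exact values used):

```python
#!/usr/bin/env python
# Sketch of a file-backed, 24-vCPU KVM guest defined through libvirt.
# The name, disk path, and memory size are placeholders.
import libvirt

DOMAIN_XML = """
<domain type='kvm'>
  <name>sl63-bench</name>
  <memory>41943040</memory>            <!-- 40 GiB, expressed in KiB -->
  <vcpu>24</vcpu>                      <!-- expose all 24 threads -->
  <os><type arch='x86_64'>hvm</type></os>
  <devices>
    <disk type='file' device='disk'>   <!-- image is a local file -->
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/sl63-bench.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

conn = libvirt.open("qemu:///system")
dom = conn.defineXML(DOMAIN_XML)  # make the definition persistent
dom.create()                      # boot the guest
```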

Benchmarks: 1) I unpack and build the ROOT analysis package using 24 threads; 2) as 1, but using only one thread; 3) I generate 500,000 Monte Carlo events using the Herwig++ generator; 4) as 3, but also including the time taken to unpack and install Herwig++; 5) I run the HEP-SPEC06 benchmark. For tests 1 to 4 I use the time command to obtain the real (wall-clock) time taken (smaller is better); for test 5 I report the HEP-SPEC06 score (larger is better). I will run the benchmarks on the bare-metal install and on the VM on the same hardware and compare the results.
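
The harness amounts to little more than timing each step; a minimal sketch (the build and run commands shown are illustrative placeholders, not the exact recipes used):

```python
#!/usr/bin/env python
# Minimal timing harness: run each benchmark command in a shell and
# record the elapsed (real) time. Commands are placeholders.
import subprocess
import time

BENCHMARKS = [
    # 1) unpack and build ROOT with 24 threads
    ("root-make-24", "tar xzf root.tar.gz && cd root && ./configure && make -j24"),
    # 2) as 1, but single-threaded
    ("root-make-1",  "tar xzf root.tar.gz && cd root && ./configure && make -j1"),
    # 3) generate 500,000 events with Herwig++
    ("herwig-500k",  "Herwig++ run LHC.run -N 500000"),
]

def wall_time(cmd):
    """Run cmd in a shell and return the elapsed wall-clock seconds."""
    t0 = time.time()
    subprocess.check_call(cmd, shell=True)
    return time.time() - t0

for name, cmd in BENCHMARKS:
    print("%-14s %8.1f s" % (name, wall_time(cmd)))
```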

Results:

[Results figure: real times for benchmarks 1-4 and the HEP-SPEC06 score, bare metal vs. KVM guest.]
Out-of-the-box performance of KVM shows a ~3% (CPU-intensive) to ~20% (system-call-intensive) reduction in performance. There is some indication of a correlation with the ratio of sys time to user time (a particular effect with make/tar/gzip?). This is not seen in the HEP-SPEC06 result. Sys time is the CPU time spent within the kernel, and from previous studies we expect this to incur a high performance hit under virtualization.
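
A trivial way to check that correlation is to compute the overhead and the sys/user ratio per test; a sketch with placeholder numbers (not my measurements):

```python
#!/usr/bin/env python
# Sketch: VM overhead vs. sys/user ratio per test. The figures below
# are illustrative placeholders, NOT the measured results.
TESTS = {
    #               bare real, VM real, user,    sys   (all seconds)
    "root-make-24": (900.0,    1080.0,  18000.0, 2400.0),
    "herwig-500k":  (3600.0,   3710.0,  3550.0,  30.0),
}

for name in sorted(TESTS):
    bare, vm, user, sys_ = TESTS[name]
    overhead = 100.0 * (vm - bare) / bare   # % slowdown in the guest
    ratio = sys_ / user                     # kernel share of CPU time
    print("%-14s overhead %5.1f%%  sys/user %.3f" % (name, overhead, ratio))
```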

If I get the time, I intend to: repeat the analysis using optimisations (e.g. the guest image on LVM); repeat the analysis using Fedora 18 (~RHEL 7); repeat it on a Sandy Bridge CPU; and look at network performance (e.g. IOzone with Lustre).
