Exploring the Performance Impact of Virtualization on an HPC Cloud
Nuttapong Chakthranont†, Phonlawat Khunphet†, Ryousei Takano‡, Tsutomu Ikegami‡
†King Mongkut's University of Technology North Bangkok, Thailand
‡National Institute of Advanced Industrial Science and Technology, Japan
IEEE CloudCom 2014 @ Singapore, 18 Dec. 2014
Outline
• HPC Cloud
• AIST Super Green Cloud (ASGC)
• Experiment
• Conclusion
HPC Cloud
• HPC users are beginning to take an interest in the cloud.
  – e.g., CycleCloud, Penguin on Demand
• Virtualization is a key technology.
  – Pro: a customized software environment, elasticity, etc.
  – Con: a large overhead that spoils I/O performance.
• VMM-bypass I/O technologies, e.g., PCI passthrough and SR-IOV, can significantly mitigate the overhead.
Motivating Observation
• Performance evaluation of an HPC cloud
  – (Para-)virtualized I/O incurs a large overhead.
  – PCI passthrough significantly mitigates the overhead.
[Figure: execution time of the NAS Parallel Benchmarks 3.3.1 (class C, 64 processes; BT, CG, EP, FT, LU) on BMM (IB), BMM (10GbE), KVM (IB, PCI passthrough), and KVM (virtio); PCI passthrough clearly reduces the I/O virtualization overhead. BMM: Bare Metal Machine; IB: InfiniBand QDR HCA; 10GbE NIC.]
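In the KVM (IB) configuration above, the InfiniBand HCA is handed to the guest by PCI passthrough. As a minimal sketch of how such a device is typically attached under KVM/libvirt (not necessarily the authors' exact setup; the PCI address and domain name below are hypothetical):

import libvirt

# Hypothetical example: pass a host PCI device (e.g. an InfiniBand HCA at
# 0000:82:00.0) through to the guest "vm1" so it bypasses the VMM I/O path.
HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x82' slot='0x00' function='0x0'/>
  </source>
</hostdev>
"""

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("vm1")
# Attach to both the live domain and its persistent configuration.
dom.attachDeviceFlags(HOSTDEV_XML,
                      libvirt.VIR_DOMAIN_AFFECT_LIVE | libvirt.VIR_DOMAIN_AFFECT_CONFIG)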
HPC Cloud (cont.)
We quantitatively assess the performance impact of virtualization to demonstrate the feasibility of the HPC Cloud.
Outline
• HPC Cloud
• AIST Super Green Cloud (ASGC)
• Experiment
• Conclusion
Easy to Use Supercomputer - Usage Model of AIST Cloud -
Users are allowed to customize their own virtual clusters (HPC, Big Data, web apps) with ease of use, launching virtual machines only when necessary:
1. Choose a template image of a virtual machine.
2. Add the required software packages.
3. Save the result as a user-customized template image.
Virtual clusters are deployed from the VM template image files, and users can take snapshots of their customized environments.
Easy to Use Supercomputer - Elastic Virtual Cluster -
A user submits jobs to the frontend node (a VM hosting the login node, the job scheduler, NFSd, and sgc-tools); the virtual cluster is created on demand, its compute-node VMs are interconnected via InfiniBand and scale in or out as needed, and VM images are served from the image repository.
Note: a single VM runs on a node.
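As a rough illustration of the scale-in/scale-out step, the following hypothetical Python sketch shows the kind of decision the frontend could make from the scheduler queue; the function names are placeholders and are not part of sgc-tools:

# Hypothetical elastic-cluster reconciliation sketch; all functions are stubs
# standing in for queries to the job scheduler and the cloud controller.
def pending_jobs():      return 3     # jobs waiting in the scheduler queue
def idle_nodes():        return 0     # compute VMs with no running job
def cluster_size():      return 16    # current number of compute-node VMs
def add_compute_vm():    print("scale out: boot one more compute-node VM")
def remove_compute_vm(): print("scale in: shut down an idle compute-node VM")

def reconcile(min_nodes=1, max_nodes=128):
    """One pass of the elastic scale-in/scale-out decision."""
    if pending_jobs() > 0 and cluster_size() < max_nodes:
        add_compute_vm()
    elif pending_jobs() == 0 and idle_nodes() > 0 and cluster_size() > min_nodes:
        remove_compute_vm()

if __name__ == "__main__":
    reconcile()   # a real frontend would run this periodically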
ASGC Hardware Spec.
• 155-node cluster built from Cray H2312 blade servers
• The theoretical peak performance is 69.44 TFLOPS
• Operation started in July 2014
Compute node:
  – CPU: Intel Xeon E5-2680v2 2.8 GHz (10 cores) × 2 CPUs
  – Memory: 128 GB DDR3-1866
  – InfiniBand: Mellanox ConnectX-3 (FDR)
  – Ethernet: Intel X520-DA2 (10 GbE)
  – Disk: Intel SSD DC S3500 600 GB
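The quoted peak is consistent with the node and core counts, assuming the usual 8 double-precision FLOPs per cycle of an AVX-capable Ivy Bridge core:
\[ R_{\text{peak}} = 155~\text{nodes} \times 2~\text{CPUs} \times 10~\text{cores} \times 2.8~\text{GHz} \times 8~\tfrac{\text{FLOP}}{\text{cycle}} = 69.44~\text{TFLOPS} \]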
Outline
• HPC Cloud
• AIST Super Green Cloud (ASGC)
• Experiment
• Conclusion
Benchmark Programs
• Micro benchmark
  – Intel MPI Benchmarks (IMB) version 3.2.4
• Application-level benchmarks
  – HPC Challenge (HPCC) version 1.4.3
    • G-HPL
    • EP-STREAM
    • G-RandomAccess
    • G-FFT
  – OpenMX version 3.7.4
  – Graph 500 version 2.1.4
IMB MPI Point-to-Point Communication
[Figure: point-to-point throughput (GB/s) vs. message size (KB), physical vs. virtual cluster; peak throughput is 5.85 GB/s (physical) vs. 5.69 GB/s (virtual).]
The overhead is less than 3% with large messages, though it is up to 25% with small messages.
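As a quick check against the peak throughputs shown above, the large-message overhead works out to
\[ \frac{5.85 - 5.69}{5.85} \approx 2.7\%, \]
which is consistent with the stated "less than 3%".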
IMB MPI Collectives (64 bytes)
[Figure: execution time (µsec) vs. number of nodes (up to 128) for Allgather, Allreduce, and Alltoall, physical vs. virtual cluster; at 128 nodes the virtual cluster is slower by +77% and +88% on Allgather and Allreduce, and by +43% on Alltoall.]
The overhead becomes significant as the number of nodes increases ... load imbalance?
HPCC G-HPL (LINPACK)
[Figure: G-HPL performance (TFLOPS) vs. number of nodes (up to 128), physical vs. virtual cluster.]
Performance degradation: 5.4 - 6.6%
Efficiency (Rmax / Rpeak) on 128 nodes: Physical 90%, Virtual 84%
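A rough consistency check, assuming the 128-node share of the theoretical peak from the hardware slide ($128 \times 20~\text{cores} \times 2.8~\text{GHz} \times 8~\text{FLOP/cycle} \approx 57.3~\text{TFLOPS}$):
\[ R_{\max}^{\text{phys}} \approx 0.90 \times 57.3 \approx 51.6~\text{TFLOPS}, \qquad R_{\max}^{\text{virt}} \approx 0.84 \times 57.3 \approx 48.2~\text{TFLOPS}, \]
\[ \text{degradation} \approx \frac{51.6 - 48.2}{51.6} \approx 6.6\%, \]
which matches the upper end of the quoted 5.4 - 6.6% range.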
HPCC EP-STREAM and G-FFT
The overheads are negligible.
[Figure: EP-STREAM performance (GB/s; memory intensive, no communication) and G-FFT performance (GFLOPS; all-to-all communication with large messages) vs. number of nodes, physical vs. virtual cluster.]
Graph500
[Figure: Graph500 performance (TEPS, log scale) vs. number of nodes (up to 64), replicated-csc, scale 26, physical vs. virtual cluster.]
Performance degradation: 2% (64 nodes)
Graph500 is a hybrid parallel program (MPI + OpenMP). We used a combination of 2 MPI processes and 10 OpenMP threads per node.
Findings
• PCI passthrough is effective in improving I/O performance; however, it still cannot reach the low communication latency of a physical cluster because of virtual interrupt injection.
• VCPU pinning improves performance for HPC applications (see the sketch below).
• Almost all MPI collectives suffer from the scalability issue.
• The overhead of virtualization has less impact on actual applications.
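As a minimal sketch of the 1:1 vCPU pinning mentioned above, using the libvirt Python bindings (the domain name and core count are assumptions, not the authors' configuration):

import libvirt

PHYSICAL_CORES = 20   # assumed: 2 x 10-core Xeon E5-2680v2 per node, hyperthreading disabled

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("vm1")            # hypothetical domain name
for vcpu in range(PHYSICAL_CORES):
    # cpumap is a tuple of booleans, one per host CPU: pin vCPU i to core i.
    cpumap = tuple(i == vcpu for i in range(PHYSICAL_CORES))
    dom.pinVcpu(vcpu, cpumap)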
Outline
• HPC Cloud
• AIST Super Green Cloud (ASGC)
• Experiment
• Conclusion
Conclusion and Future Work
• The HPC Cloud is promising.
  – Micro benchmarks: MPI collectives have a scalability issue.
  – Application-level benchmarks: the negative impact is limited; the virtualization overhead is about 5%.
  – Our HPC Cloud operation started in July 2014.
• Virtualization can contribute to improving system utilization.
  – SR-IOV
  – VM placement optimization based on the workloads of virtual clusters
Questions?
Thank you for your attention!
Acknowledgments: The authors would like to thank Assoc. Prof. Vara Varavithya, KMUTNB, and Dr. Yoshio Tanaka, AIST, for valuable guidance and advice. The authors would also like to thank the ASGC support team for the preparation and troubleshooting of the experiments. This work was partly supported by JSPS KAKENHI Grant Number 24700040.