June 2016 Intel High-Performance Computing Training (2016年6月Intel高性能计算培训.pdf)
Software Optimization Case Study
Yu-Ping Zhao
Yuping.zhao@intel.com

Agenda
• RELION Background
• RELION ITAC and VTune Analysis
• RELION Auto-Refine Workload Optimization
• RELION 2D Classification Workload Optimization
• Further Optimization

Background
• Cryo-electron microscopy (cryo-EM) is a form of transmission electron microscopy (TEM) in which the sample is studied at cryogenic temperatures.
• RELION (for REgularised LIkelihood OptimisatioN, pronounced "rely-on") is a stand-alone computer program that employs an empirical Bayesian approach to the refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM).
• http://www2.mrc-lmb.cam.ac.uk/relion/index.php/Main_Page
• RELION uses MPI (Message Passing Interface) for distributed-memory parallelisation and POSIX threads for shared-memory parallelisation.

RELION ITAC and VTune Analysis

RELION ITAC Report
• The load is not well balanced across MPI processes

RELION VTune Analysis
• High spin time (mostly in MPI communication)
• CPU usage is not balanced between processes

RELION VTune Analysis of Different Workloads
• 3D Auto-Refine hotspots and 2D Classification hotspots

RELION Auto-Refine Workload Optimization

Optimization (1) - Data Alignment
• Align memory allocations to 64 bytes
• Overload new() and delete() in multidim_array.h; the aligned allocation also helps the other optimizations

Optimization (2) - Vectorization
• No. 1 hotspot
• Optimized code (the major loop was successfully vectorized with SIMD)
• Original code
• Intel VTune analysis comparison (before/after optimization)

Optimization (3) - Inline Function
• Inline the function so that the loop can be vectorized
• # vi src/ml_optimizer.optrpt

Before vectorization:
  LOOP BEGIN at src/ml_optimiser.cpp(3652,13)
    remark #15523: loop was not vectorized: loop control variable n was found, but loop iteration count cannot be computed before executing the loop
  LOOP END

After vectorization:
  LOOP BEGIN at src/ml_optimiser.cpp(3654,97)
    …
    remark #15300: LOOP WAS VECTORIZED
    ..
    remark #15478: estimated potential speedup: 3.990
  LOOP END
  ... ...

• Intel VTune hotspots analysis

RELION 2D Classification Workload Optimization

Optimize (1) - Remove vector dependence
• Get hotspot information from the VTune analysis
• Line 4610: the for loop is not vectorized
• Split the hot loop into three parts; the first two loops are vectorized
• Reduce the number of iterations of the third loop
• Vectorization report: the first and second loops were vectorized

Optimize (2) - Remove exception calls
• Get hotspot information from the VTune analysis
• Line 410: the for loop is not vectorized
• Vectorized the for loop
For(int i=0;i
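
The "remove exception calls" step can be illustrated with a minimal sketch. This is not the RELION code from the slides (which is not reproduced in this text); the names checked_at, square_checked and square_hoisted are hypothetical. The sketch assumes the hot loop called an accessor that can throw on every iteration, and shows one common remedy: validate the range once before the loop so that the loop body contains no exception paths, which gives the compiler a better chance to vectorize it.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical checked accessor: the possible throw adds control flow
// to every loop iteration that calls it.
inline float checked_at(const std::vector<float>& v, std::size_t i) {
    if (i >= v.size()) throw std::out_of_range("index out of range");
    return v[i];
}

// Before: each iteration contains two potential exception paths
// (checked_at on the input, std::vector::at on the output).
void square_checked(const std::vector<float>& in, std::vector<float>& out,
                    std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        float x = checked_at(in, i);
        out.at(i) = x * x;
    }
}

// After: the range is validated once, then the loop runs over raw
// pointers with a plain arithmetic body and no exception paths.
void square_hoisted(const std::vector<float>& in, std::vector<float>& out,
                    std::size_t n) {
    if (n > in.size() || n > out.size()) {
        throw std::out_of_range("n exceeds buffer size");
    }
    const float* src = in.data();
    float* dst = out.data();
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * src[i];
    }
}
```

Both functions produce the same output for valid input and still reject an out-of-range n; only the position of the check changes. Whether the second loop is actually vectorized depends on the compiler and flags, so the optimization report should be checked as in the slides.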

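The "split the hot loop into three parts" step under "Remove vector dependence" is a form of loop fission. The sketch below is an assumption-laden illustration, not the loop at line 4610: select_fused, select_split and their parameters are hypothetical, and both inputs are assumed to have the same length. It shows a loop whose element-wise arithmetic is mixed with a sequential append, and the same work split into two dependence-free loops followed by a third loop that keeps the sequential part.

```cpp
#include <cstddef>
#include <vector>

// Fused version: the append to `picked` depends on earlier iterations,
// so the whole loop body is treated as carrying a dependence.
std::vector<std::size_t> select_fused(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      float scale, float threshold) {
    std::vector<std::size_t> picked;
    for (std::size_t i = 0; i < a.size(); ++i) {
        float d = a[i] - b[i];
        float w = d * d * scale;
        if (w > threshold) picked.push_back(i);
    }
    return picked;
}

// Split version: loops 1 and 2 are purely element-wise and are
// candidates for vectorization; loop 3 keeps the sequential append.
std::vector<std::size_t> select_split(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      float scale, float threshold) {
    const std::size_t n = a.size();
    std::vector<float> d(n), w(n);

    for (std::size_t i = 0; i < n; ++i) {   // loop 1: differences
        d[i] = a[i] - b[i];
    }
    for (std::size_t i = 0; i < n; ++i) {   // loop 2: weights
        w[i] = d[i] * d[i] * scale;
    }

    std::vector<std::size_t> picked;
    for (std::size_t i = 0; i < n; ++i) {   // loop 3: sequential selection
        if (w[i] > threshold) picked.push_back(i);
    }
    return picked;
}
```

The split version trades extra temporary storage for two loops the compiler can analyze independently. The slides additionally reduce the number of iterations of the third loop; how that was done is not shown in the text, so it is not sketched here.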