Improving the Performance of HiRep Lattice Simulations Software by Exploiting the CPU Hardware Architecture Details and Algorithm Characteristics

Rahman, Md Shidur

dc.contributor.supervisor	Kelefouras, Vasilios
dc.contributor.author	Rahman, Md Shidur
dc.contributor.other	Faculty of Science and Engineering	en_US
dc.date.accessioned	2024-11-14T09:49:02Z
dc.date.available	2024-11-14T09:49:02Z
dc.date.issued	2024
dc.identifier	10698869	en_US
dc.identifier.uri	https://pearl.plymouth.ac.uk/handle/10026.1/22611
dc.description.abstract	In the scientific exploration of Quantum Chromodynamics (QCD), the theory governing the strong interaction among quarks and gluons, large-scale numerical simulations are performed using the framework of lattice gauge theories. Lattice Gauge Theory (LGT) simulations involve the formulation of gauge field theories on a space-time lattice. HiRep is a simulation suite designed for running lattice simulations on high-performance computing platforms. It is flexible enough to study a wide range of strongly interacting systems, particularly those pertinent to novel physics investigations at CERN’s Large Hadron Collider (LHC). Improving the execution time of HiRep is a challenging and non-trivial task, yet even marginal improvements can have a significant impact on paving the way to new discoveries in the field of particle physics. A detailed study, analysis, and profiling of the HiRep application revealed that the implementation of the Dirac operator is one of the most computationally intensive routines, serving as the main performance bottleneck. Consequently, this routine was optimized for CPU-based distributed-memory hardware platforms. The main performance inefficiencies include communication overhead due to extensive data exchanges between MPI processes, workload imbalances in OpenMP regions, inefficient data reuse of lattice sites, and ineffective auto-vectorization. To address these, both algorithmic and hardware-dependent optimization strategies are employed. These strategies include efficient hybrid parallelization (using both the MPI and OpenMP parallel programming frameworks), optimization of OpenMP parallelism through loop collapsing, optimization of memory access patterns, and vectorization (using both AVX2 and the Clang compiler’s vector intrinsics). 
Based on experimental results obtained from two distinct High-Performance Computing (HPC) platforms, the proposed optimizations boost the performance of HiRep, achieving an overall speedup of up to ×1.80 compared to the baseline MPI version.	en_US
dc.language.iso	en
dc.publisher	University of Plymouth
dc.subject	Lattice simulation, Dirac operator, Performance optimization, Hybrid programming (MPI+OpenMP), Memory access patterns, Vectorization	en_US
dc.subject.classification	PhD	en_US
dc.title	Improving the Performance of HiRep Lattice Simulations Software by Exploiting the CPU Hardware Architecture Details and Algorithm Characteristics	en_US
dc.type	Thesis
plymouth.version	publishable	en_US
dc.identifier.doi	http://dx.doi.org/10.24382/5244
dc.rights.embargoperiod	No embargo	en_US
dc.type.qualification	Doctorate	en_US
rioxxterms.version	NA
plymouth.orcid_id	https://orcid.org/0000-0001-5192-4527	en_US

Files in this item

Name: 2024rahman10698869phd.pdf
Size: 2.119Mb
Format: PDF
Description: Thesis - full version

Name: license.txt
Size: 3.016Kb
Format: Text file

This item appears in the following Collection(s)

01 Research Theses Main Collection
Research Theses Main
