Accelerating HPC Applications using Vector Offloading Technique and Custom Vector ISA Design

Title
Accelerating HPC Applications using Vector Offloading Technique and Custom Vector ISA Design
Author
손영빈
Advisor(s)
서지원
Issue Date
2023. 2
Publisher
Hanyang University (한양대학교)
Degree
Master
Abstract
As the amount of data and computation required by emerging applications has rapidly increased, most modern processors include a vector accelerator or internal vector units to speed up target applications. However, it is difficult to accelerate applications efficiently with a general-purpose vector ISA and vectorization. Because a fixed SIMD ISA lacks instructions for complex operations, it often suffers from low applicability and requires program rewriting. Thus, to handle complex operations, either the throughput of vector instructions must be increased or vector instructions cannot be used at all. Moreover, general application code often has low data parallelism and carries data dependencies, which makes vectorization difficult. As a result, vector units are often underutilized or remain idle because of the challenges in vector code generation.

In this paper, we propose two methods for accelerating applications effectively with the Hwacha vector unit provided by the RISC-V infrastructure, based on profiling results from the Polybench benchmark suite, which requires high-performance computing.

1) Custom ISA: Based on the profiling results, we suggest custom vector instructions that can improve application performance and design a vector ISA around them. Specifically, we propose new reduction-based, permutation, and complex vector instructions to support several program patterns that the existing vector architecture does not execute effectively. We implement the proposed custom instructions in the compiler and in an ISA-level functional simulator and verify that they are generated correctly.

2) Vector Offloader: To address the underutilization problem, we propose the Vector Offloader, which executes scalar programs by treating the vector unit as a scalar operation unit. By using vector masking, an appropriate partition of the vector unit can be used to support scalar instructions. To utilize all execution units efficiently, including the vector unit, the Vector Offloader runs the target application concurrently on both the central processing unit (CPU) and the decoupled vector unit by offloading part of the program to the vector unit. Furthermore, a profile-guided optimization technique determines the optimal offloading ratio for balancing the load between the CPU and the vector unit. Experimental results show that the proposed technique achieves up to a 1.31× performance improvement over simple CPU-only execution in a field-programmable gate array (FPGA)-level evaluation with the Polybench benchmark set.
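The idea of choosing an offloading ratio that balances load between the CPU and the vector unit can be illustrated with a small model. The sketch below is not the thesis's implementation. It assumes, hypothetically, that profiling yields an average per-iteration time for each unit, that iterations are independent, and that offloading overhead is negligible. The function names and the sample timings are invented for illustration.

```python
def balanced_offload_ratio(cpu_time_per_iter, vec_time_per_iter):
    """Return the fraction of iterations to offload to the vector unit
    so that both units finish at the same time (assumed model)."""
    if cpu_time_per_iter <= 0 or vec_time_per_iter <= 0:
        raise ValueError("per-iteration times must be positive")
    # Balance condition: (1 - r) * t_cpu == r * t_vec  =>  r = t_cpu / (t_cpu + t_vec)
    return cpu_time_per_iter / (cpu_time_per_iter + vec_time_per_iter)


def estimated_speedup(cpu_time_per_iter, vec_time_per_iter, ratio):
    """Estimate speedup over CPU-only execution for a given offload ratio.
    Concurrent run time is set by whichever unit finishes last."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    cpu_only = cpu_time_per_iter
    concurrent = max((1.0 - ratio) * cpu_time_per_iter,
                     ratio * vec_time_per_iter)
    return cpu_only / concurrent


# Hypothetical profile: the vector unit, used as a scalar unit,
# is 4x slower per iteration than the CPU.
ratio = balanced_offload_ratio(1.0, 4.0)
speedup = estimated_speedup(1.0, 4.0, ratio)
print(f"offload ratio = {ratio:.2f}, estimated speedup = {speedup:.2f}")
```

Under this model, any ratio other than the balanced one leaves one unit waiting for the other, which is why a profile-driven choice matters even when the vector unit is slower per iteration than the CPU.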
URI
http://hanyang.dcollection.net/common/orgView/200000651642
https://repository.hanyang.ac.kr/handle/20.500.11754/179814
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > ARTIFICIAL INTELLIGENCE(인공지능학과) > Theses(Master)