
OPTIMUS: OPTImized matrix MUltiplication Structure for Transformer neural network accelerator

Author
최정욱 (Jungwook Choi)
Issue Date
2020-03
Publisher
Machine Learning and Systems
Citation
Proceedings of Machine Learning and Systems 2 (MLSys 2020), pp. 1-16
Abstract
We present a high-performance Transformer neural network inference accelerator named OPTIMUS. OPTIMUS has several performance-enhancing features, such as a redundant-computation-skipping method that accelerates the decoding process and the Set-Associative RCSC (SA-RCSC) sparse matrix format, which maintains high utilization even when a large number of MACs is used in hardware. OPTIMUS also has a flexible hardware architecture that supports diverse matrix multiplications, keeps all intermediate computation values fully local, and completely eliminates DRAM accesses to achieve exceptionally fast single-batch inference. It further reduces data-transfer overhead by carefully matching the data compute and load cycles. Simulation on the WMT15 (EN-DE) dataset shows that the latency of OPTIMUS is 41.62×, 24.23×, and 16.01× lower than that of an Intel(R) i7-6900K CPU, an NVIDIA Titan Xp GPU, and the baseline custom hardware, respectively. In addition, the throughput of OPTIMUS is 43.35×, 25.45×, and 19.00× higher, and its energy efficiency is 2393.85×, 1464×, and 19.01× better than that of the CPU, GPU, and the baseline custom hardware, respectively.
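The abstract does not describe the internals of the SA-RCSC format or the MAC scheduling, so the following is only a minimal Python sketch, under assumed names and a hypothetical row-to-lane mapping, of a plain CSC-style sparse matrix-vector product that also counts the MAC operations assigned to each lane. It is meant only to illustrate the kind of per-lane load balance that a format like SA-RCSC is designed to maintain, not the paper's actual hardware or data layout.

```python
# Minimal sketch (not the paper's implementation): a CSC-style sparse
# matrix-vector product with a hypothetical "row % num_lanes" assignment of
# nonzeros to MAC lanes. Function and variable names are illustrative only.
import numpy as np
from scipy.sparse import csc_matrix


def spmv_with_lanes(W: csc_matrix, x: np.ndarray, num_lanes: int = 4):
    """Compute y = W @ x column by column and count MACs per lane."""
    y = np.zeros(W.shape[0], dtype=x.dtype)
    lane_macs = np.zeros(num_lanes, dtype=np.int64)  # per-lane work counter
    for col in range(W.shape[1]):
        start, end = W.indptr[col], W.indptr[col + 1]
        for k in range(start, end):
            row = W.indices[k]
            lane = row % num_lanes            # hypothetical lane assignment
            y[row] += W.data[k] * x[col]      # one MAC per stored nonzero
            lane_macs[lane] += 1
    return y, lane_macs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense = rng.random((8, 8)) * (rng.random((8, 8)) < 0.3)  # ~70% zeros
    W = csc_matrix(dense)
    x = rng.random(8)
    y, lane_macs = spmv_with_lanes(W, x)
    assert np.allclose(y, dense @ x)
    print("per-lane MAC counts:", lane_macs)  # imbalance here is what a
                                              # set-associative format targets
```

In this toy model, a skewed `lane_macs` vector means some MAC lanes sit idle while others work, which is the utilization problem the abstract attributes to large MAC arrays; the actual SA-RCSC scheme in the paper addresses it at the storage-format level.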
URI
https://proceedings.mlsys.org/paper/2020/hash/903ce9225fca3e988c2af215d4e544d3-Abstract.html
https://repository.hanyang.ac.kr/handle/20.500.11754/165451
Appears in Collections:
COLLEGE OF ENGINEERING [S] (공과대학) > ELECTRONIC ENGINEERING (융합전자공학부) > Articles
Files in This Item:
There are no files associated with this item.