
OPTIMUS: OPTImized matrix MUltiplication Structure for Transformer neural network accelerator

Author
최정욱 (Jungwook Choi)
Issue Date
2020-03
Publisher
Machine Learning and Systems
Citation
Proceedings of Machine Learning and Systems 2 (MLSys 2020), pp. 1-16
Abstract
We present a high-performance Transformer neural network inference accelerator named OPTIMUS. OPTIMUS has several performance-enhancing features, such as a redundant-computation-skipping method that accelerates the decoding process and the Set-Associative RCSC (SA-RCSC) sparse matrix format, which maintains high utilization even when a large number of MACs is used in hardware. OPTIMUS also has a flexible hardware architecture that supports diverse matrix multiplications, keeps all intermediate computation values fully local, and completely eliminates DRAM accesses to achieve exceptionally fast single-batch inference. It further reduces data-transfer overhead by carefully matching the data compute and load cycles. Simulation on the WMT15 (EN-DE) dataset shows that the latency of OPTIMUS is 41.62×, 24.23×, and 16.01× lower than that of an Intel(R) i7-6900K CPU, an NVIDIA Titan Xp GPU, and the baseline custom hardware, respectively. In addition, the throughput of OPTIMUS is 43.35×, 25.45×, and 19.00× higher, and its energy efficiency is 2393.85×, 1464×, and 19.01× better than that of the CPU, GPU, and the baseline custom hardware, respectively.
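The abstract does not describe the internals of the SA-RCSC format or the MAC scheduling, so the following is only a minimal Python sketch, under assumed names and a hypothetical row-to-lane mapping, of a plain CSC-style sparse matrix-vector product that also counts the MAC operations assigned to each lane. It is meant only to illustrate the kind of per-lane load balance that a format like SA-RCSC is designed to maintain, not the paper's actual hardware or data layout.

```python
# Minimal sketch (not the paper's implementation): a CSC-style sparse
# matrix-vector product with a hypothetical "row % num_lanes" assignment of
# nonzeros to MAC lanes. Function and variable names are illustrative only.
import numpy as np
from scipy.sparse import csc_matrix


def spmv_with_lanes(W: csc_matrix, x: np.ndarray, num_lanes: int = 4):
    """Compute y = W @ x column by column and count MACs per lane."""
    y = np.zeros(W.shape[0], dtype=x.dtype)
    lane_macs = np.zeros(num_lanes, dtype=np.int64)  # per-lane work counter
    for col in range(W.shape[1]):
        start, end = W.indptr[col], W.indptr[col + 1]
        for k in range(start, end):
            row = W.indices[k]
            lane = row % num_lanes            # hypothetical lane assignment
            y[row] += W.data[k] * x[col]      # one MAC per stored nonzero
            lane_macs[lane] += 1
    return y, lane_macs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense = rng.random((8, 8)) * (rng.random((8, 8)) < 0.3)  # ~70% zeros
    W = csc_matrix(dense)
    x = rng.random(8)
    y, lane_macs = spmv_with_lanes(W, x)
    assert np.allclose(y, dense @ x)
    print("per-lane MAC counts:", lane_macs)  # imbalance here is what a
                                              # set-associative format targets
```

In this toy model, a skewed `lane_macs` vector means some MAC lanes sit idle while others work, which is the utilization problem the abstract attributes to large MAC arrays; the actual SA-RCSC scheme in the paper addresses it at the storage-format level.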
URI
https://proceedings.mlsys.org/paper/2020/hash/903ce9225fca3e988c2af215d4e544d3-Abstract.html
https://repository.hanyang.ac.kr/handle/20.500.11754/165451
Appears in Collections:
COLLEGE OF ENGINEERING [S] (공과대학) > ELECTRONIC ENGINEERING (융합전자공학부) > Articles
Files in This Item:
There are no files associated with this item.