
Full metadata record

DC Field | Value | Language
dc.contributor.author | 최정욱 (Choi, Jungwook) | -
dc.date.accessioned | 2021-10-13T02:35:02Z | -
dc.date.available | 2021-10-13T02:35:02Z | -
dc.date.issued | 2020-03 | -
dc.identifier.citation | Proceedings of Machine Learning and Systems 2 (MLSys 2020), pp. 1-16 | en_US
dc.identifier.uri | https://proceedings.mlsys.org/paper/2020/hash/903ce9225fca3e988c2af215d4e544d3-Abstract.html | -
dc.identifier.uri | https://repository.hanyang.ac.kr/handle/20.500.11754/165451 | -
dc.description.abstract | We present a high-performance Transformer neural network inference accelerator named OPTIMUS. OPTIMUS has several features for performance enhancement, such as a redundant-computation-skipping method that accelerates the decoding process and the Set-Associative RCSC (SA-RCSC) sparse matrix format, which maintains high utilization even when a large number of MACs is used in hardware (a toy CSC sketch follows the record below). OPTIMUS also has a flexible hardware architecture that supports diverse matrix multiplications; it keeps all intermediate computation values fully local and completely eliminates DRAM accesses to achieve exceptionally fast single-batch inference. It further reduces data-transfer overhead by carefully matching the compute and load cycles. Simulations on the WMT15 (EN-DE) dataset show that the latency of OPTIMUS is 41.62×, 24.23×, and 16.01× lower than that of an Intel(R) i7-6900K CPU, an NVIDIA Titan Xp GPU, and a baseline custom hardware design, respectively. In addition, the throughput of OPTIMUS is 43.35×, 25.45×, and 19.00× higher, and its energy efficiency is 2393.85×, 1464×, and 19.01× better than that of the CPU, GPU, and baseline custom hardware, respectively. | en_US
dc.description.sponsorship | This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ICT Consilience Creative program (IITP-2019-2011-1-00783) supervised by the IITP (Institute for Information & communications Technology Promotion). | en_US
dc.language.iso | en | en_US
dc.publisher | Machine Learning and Systems | en_US
dc.title | OPTIMUS: OPTImized matrix MUltiplication Structure for Transformer neural network accelerator | en_US
dc.type | Article | en_US
dc.relation.page | 1-16 | -
dc.contributor.googleauthor | Park, Junki | -
dc.contributor.googleauthor | Yoon, Hyunsung | -
dc.contributor.googleauthor | Ahn, Daehyun | -
dc.contributor.googleauthor | Choi, Jungwook | -
dc.contributor.googleauthor | Kim, Jae-Joon | -
dc.sector.campus | S | -
dc.sector.daehak | COLLEGE OF ENGINEERING[S] | -
dc.sector.department | DEPARTMENT OF ELECTRONIC ENGINEERING | -
dc.identifier.pid | choij | -
Appears in Collections:
COLLEGE OF ENGINEERING[S](공과대학) > ELECTRONIC ENGINEERING(융합전자공학부) > Articles
Files in This Item:
There are no files associated with this item.
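
The abstract credits the SA-RCSC format with keeping many MAC units busy during sparse matrix multiplication. As a point of reference, below is a minimal sketch of the plain CSC (compressed sparse column) matrix-vector multiply that RCSC-style formats reorganize, plus a hypothetical (row mod n_lanes) lane assignment to illustrate the set-associative load-balancing idea. The actual SA-RCSC layout and scheduling are defined in the paper; csc_matvec, assign_lanes, and n_lanes are illustrative names, not from the source.

    import numpy as np

    def csc_matvec(n_rows, values, row_idx, col_ptr, x):
        # y = A @ x for A stored in CSC form: values[k] is a nonzero,
        # row_idx[k] is its row index, and column j owns the slice
        # col_ptr[j]:col_ptr[j+1] of both arrays.
        y = np.zeros(n_rows)
        for j in range(len(col_ptr) - 1):          # walk the columns
            for k in range(col_ptr[j], col_ptr[j + 1]):
                y[row_idx[k]] += values[k] * x[j]  # one MAC per nonzero
        return y

    def assign_lanes(row_idx, n_lanes=4):
        # Hypothetical set-associative steering: nonzeros whose rows fall
        # into the same (row mod n_lanes) set share a MAC lane, spreading
        # work across lanes even when per-column nonzero counts are very
        # uneven. The real SA-RCSC mapping is specified in the paper.
        return [int(r) % n_lanes for r in row_idx]

    # Toy 3x3 matrix  A = [[2, 0, 0],
    #                      [0, 0, 1],
    #                      [4, 3, 0]]
    values  = np.array([2.0, 4.0, 3.0, 1.0])
    row_idx = np.array([0, 2, 2, 1])
    col_ptr = np.array([0, 2, 3, 4])
    x = np.array([1.0, 2.0, 3.0])
    print(csc_matvec(3, values, row_idx, col_ptr, x))  # expected: [ 2.  3. 10.]
    print(assign_lanes(row_idx))                       # expected: [0, 2, 2, 1]

The inner loop of csc_matvec is the multiply-accumulate that OPTIMUS parallelizes across its MAC array; the lane mapping hints at why grouping rows into sets can keep utilization high when column lengths are irregular.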
