Repository at Hanyang University: Hardware-aware Optimization of Layer Fusion for Latency-optimal CNN Inference


Full metadata record

DC Field	Value	Language
dc.contributor.advisor	최정욱	-
dc.contributor.author	윤민용	-
dc.date.accessioned	2023-09-27T02:09:18Z	-
dc.date.available	2023-09-27T02:09:18Z	-
dc.date.issued	2023. 8	-
dc.identifier.uri	http://hanyang.dcollection.net/common/orgView/200000684461	en_US
dc.identifier.uri	https://repository.hanyang.ac.kr/handle/20.500.11754/187228	-
dc.description.abstract	Layer fusion is an effective technique for accelerating latency-critical, real-time inference on resource-constrained DNN accelerators. Prior studies on layer fusion, however, have focused on optimizing memory access and have not considered the accelerator's hardware architecture, which significantly affects latency. As network structures become more efficient and more layers can be fused into a single fusion group, the influence of the hardware architecture on layer fusion performance grows. This thesis proposes a layer fusion method that minimizes latency by taking the accelerator's hardware architecture into account. First, we present an analytical cost model for layer fusion that covers the hardware parameters of a 2D systolic array-based DNN accelerator: array dimension, buffer size, memory bandwidth, and dataflow. Next, we show how the hardware architecture, data reuse optimization in layer fusion, and dataflow optimization through additional tiling affect layer fusion performance. Finally, we propose a latency-optimal layer fusion technique that considers all of these optimization methods together. Applying the proposed analytical cost model and latency-optimal layer fusion technique to various CNN models, hardware architectures, and batch sizes reduced end-to-end network inference latency by up to 41.1% compared with layer fusion that considers memory access only.	-
dc.publisher	한양대학교	-
dc.title	Hardware-aware Optimization of Layer Fusion for Latency-optimal CNN Inference	-
dc.title.alternative	지연 시간 최적의 CNN 추론을 위한 하드웨어를 고려하는 레이어 병합 최적화 기법	-
dc.type	Theses	-
dc.contributor.googleauthor	윤민용	-
dc.contributor.alternativeauthor	Minyong Yoon	-
dc.sector.campus	S	-
dc.sector.daehak	대학원	-
dc.sector.department	융합전자공학과	-
dc.description.degree	Master	-
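
Illustrative sketch (not taken from the thesis): the abstract describes an analytical cost model driven by array dimension, buffer size, and memory bandwidth, and a search for the latency-optimal grouping of layers. The Python below is a deliberately simplified, roofline-style stand-in for that idea. Every name in it (Layer, Hw, group_latency, best_partition) is hypothetical, and it assumes perfect array utilization, full overlap of compute and memory transfer, and no dataflow or tiling effects, none of which the thesis is known to assume.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Layer:
    macs: int       # multiply-accumulate operations in the layer
    in_bytes: int   # input feature-map size
    out_bytes: int  # output feature-map size
    w_bytes: int    # weight size


@dataclass
class Hw:
    rows: int             # systolic array rows
    cols: int             # systolic array columns
    buffer_bytes: int     # on-chip buffer capacity
    bytes_per_cycle: int  # off-chip memory bandwidth


def group_latency(layers, hw):
    """Estimated cycles for one fusion group, or None if it cannot be fused."""
    # Assumption: a fused group keeps intermediate feature maps on chip, so the
    # largest input+output pair of any layer in the group must fit in the buffer.
    if len(layers) > 1:
        peak = max(l.in_bytes + l.out_bytes for l in layers)
        if peak > hw.buffer_bytes:
            return None
    # Assumption: every processing element does one MAC per cycle.
    compute = sum(ceil(l.macs / (hw.rows * hw.cols)) for l in layers)
    # Only the group's first input, last output, and all weights go off chip.
    traffic = layers[0].in_bytes + layers[-1].out_bytes + sum(l.w_bytes for l in layers)
    memory = ceil(traffic / hw.bytes_per_cycle)
    # Assumption: compute and memory transfer overlap fully (roofline bound).
    return max(compute, memory)


def best_partition(layers, hw):
    """Dynamic program over cut points; returns (total cycles, list of groups)."""
    n = len(layers)
    best = [0] + [None] * n  # best[i]: minimum cycles for the first i layers
    cut = [0] * (n + 1)      # cut[i]: start index of the last group ending at i
    for i in range(1, n + 1):
        for j in range(i):
            lat = group_latency(layers[j:i], hw)
            if lat is None:
                continue
            total = best[j] + lat
            if best[i] is None or total < best[i]:
                best[i], cut[i] = total, j
    groups, i = [], n
    while i > 0:
        groups.append((cut[i], i))  # half-open range of layer indices
        i = cut[i]
    return best[n], groups[::-1]


if __name__ == "__main__":
    net = [Layer(1024, 100, 400, 10), Layer(1024, 400, 100, 10)]
    hw = Hw(rows=4, cols=4, buffer_bytes=1000, bytes_per_cycle=4)
    print(best_partition(net, hw))
```

In this toy model, fusing pays off only when the group is memory-bound without fusion and the intermediate feature map fits in the buffer; shrinking buffer_bytes forces the search back to layer-by-layer execution, which is the kind of hardware dependence the abstract argues an access-only model misses.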

Appears in Collections: GRADUATE SCHOOL[S](대학원) > DEPARTMENT OF ELECTRONIC ENGINEERING(융합전자공학과) > Theses (Master)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
