
HaRP: Hardware Free Reduction Pattern Based Layerwise Approximation Framework

Title
HaRP: Hardware Free Reduction Pattern Based Layerwise Approximation Framework
Author
김태정
Alternative Author(s)
Tae Jung KIM
Advisor(s)
박영준
Issue Date
2022. 8
Publisher
Hanyang University (한양대학교)
Degree
Master
Abstract
Although convolutional neural networks (CNNs) have been successfully adopted in many emerging fields, their high computational cost limits their performance in environments with scarce computational resources. Several model-compression techniques have therefore been studied. Existing compression strategies for CNNs usually control the number of bits or reduce the number of channels inside a network layer. Half precision, a representative bit-control technique, is a widely used approximation approach that makes neural networks lightweight; however, if a system does not support it, its efficiency cannot be guaranteed. Channel pruning analyzes channels with a pooling method and keeps only the important ones, yet there is still potential to reduce weights in smaller units. In this study, we propose HaRP, a reduction-pattern-based approximation method. To improve speed even on systems with limited computational resources, HaRP skips operations at the element level, similar to channel pruning. The loss from the skipped calculations is compensated by multiplying in a compensation value determined by the number of skipped calculations. In addition, performing the approximation at the warp level allows it to be carried out efficiently. Because element-wise approximation may skip operations on important elements, the approximation is applied carefully by analyzing the output accuracy of each layer. Finally, based on this analysis, the approximation level of each layer is stored as a parameter so that it can be reused on other systems. In our experiments, the framework achieved a speedup of 45.4% on AlexNet on an NVIDIA Titan XP when used alone, and a speedup of up to 79.41% on an NVIDIA RTX 3090 when combined with half precision.
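
The following is a minimal sketch of the mechanism described in the abstract, assuming NumPy; the function names approx_dot and choose_layer_stride, the stride candidates, and the error tolerance are illustrative assumptions and are not taken from the thesis. It shows element-level skipping under a fixed reduction pattern, compensation of the skipped work by a scaling factor, and a per-layer sweep that records the chosen approximation level as a reusable parameter.

import numpy as np

def approx_dot(x, w, stride):
    # Evaluate only every `stride`-th element of the reduction and scale the
    # partial sum by the ratio of total to evaluated elements (compensation).
    kept = x[::stride] * w[::stride]
    compensation = x.size / kept.size
    return kept.sum() * compensation

def choose_layer_stride(x, w, max_rel_error=0.05, candidates=(1, 2, 4, 8)):
    # Pick the most aggressive stride whose relative output error stays within
    # the tolerance; the result can be stored as a per-layer parameter.
    exact = float(np.dot(x, w))
    best = 1
    for s in candidates:
        approx = approx_dot(x, w, s)
        if exact != 0 and abs(approx - exact) / abs(exact) <= max_rel_error:
            best = s
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random(4096)
    w = rng.random(4096)
    s = choose_layer_stride(x, w)
    print("chosen stride:", s)
    print("approx:", approx_dot(x, w, s), "exact:", float(np.dot(x, w)))

In this sketch the compensation factor is simply the ratio of total to evaluated elements; the thesis applies the same idea at the warp level on the GPU, where the per-layer approximation level is chosen from the measured output accuracy of each layer.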
URI
http://hanyang.dcollection.net/common/orgView/200000627757
https://repository.hanyang.ac.kr/handle/20.500.11754/186800
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > ARTIFICIAL INTELLIGENCE(인공지능학과) > Theses(Master)
Files in This Item:
There are no files associated with this item.