HaRP: Hardware Free Reduction Pattern Based Layerwise Approximation Framework
- Title
- HaRP: Hardware Free Reduction Pattern Based Layerwise Approximation Framework
- Author
- 김태정
- Alternative Author(s)
- Tae Jung KIM
- Advisor(s)
- 박영준
- Issue Date
- 2022. 8
- Publisher
- Hanyang University (한양대학교)
- Degree
- Master
- Abstract
- Although convolutional neural networks (CNNs) have been successfully adopted in many
emerging fields, their heavy computational demands limit their performance in environments
with scarce computational resources. Therefore, several model-compression techniques have
been studied. Existing model-compression strategies for CNNs usually either control the
number of bits or reduce the number of channels inside the network layers. Half precision,
a representative bit-control technique, is a widely used approximation approach that makes
neural networks lightweight. However, if a system does not support half precision, its
efficiency cannot be guaranteed. Channel pruning is a technique that analyzes channels
using a pooling method and retains only the important ones; however, weights could
potentially be reduced at a finer granularity than whole channels.
In this study, we propose an approximation framework called HaRP that applies a reduction-
pattern-based approximation method. To improve speed effectively even on systems with
limited computational resources, HaRP skips operations at the element level, a finer
granularity than channel pruning. The loss caused by the skipped calculations is compensated
for by multiplying the partial result by a compensation value derived from the number of
skipped operations. In addition, performing the approximation at the warp level allows it
to be carried out efficiently on GPUs. However, because element-wise approximation may skip
operations on important elements, the approximation is applied carefully by analyzing the
output accuracy of each layer. Finally, based on this analysis, the approximation level of
each layer is stored as a parameter so that it can be reused on other systems. Among the
models we approximated, our framework achieved a speedup of 45.4% on an NVIDIA Titan XP
when used alone on AlexNet, and a speedup of up to 79.41% on an NVIDIA RTX 3090 when
combined with half precision.
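
To make the compensation idea concrete, the following is a minimal CUDA sketch of a
warp-level sum reduction that skips part of its input and rescales the result. It is an
illustration under assumptions, not the thesis implementation: the kernel name
approx_reduce, the fixed SKIP factor, and the single-block launch are hypothetical choices
for the example.

    // Minimal sketch (hypothetical, not the HaRP implementation): a sum
    // reduction that skips elements by a fixed stride pattern and compensates
    // by scaling partial sums with SKIP (the total-to-processed ratio).
    #include <cstdio>
    #include <cuda_runtime.h>

    #define N 1024
    #define SKIP 2  // hypothetical approximation level: keep 1 of every SKIP elements

    __global__ void approx_reduce(const float* in, float* out, int n) {
        float sum = 0.0f;
        // Each thread visits only every SKIP-th element of its assigned range,
        // so roughly 1/SKIP of the loads and additions are executed.
        for (int i = threadIdx.x * SKIP; i < n; i += blockDim.x * SKIP)
            sum += in[i];
        // Warp-level tree reduction using shuffle intrinsics.
        for (int offset = warpSize / 2; offset > 0; offset >>= 1)
            sum += __shfl_down_sync(0xffffffff, sum, offset);
        // Lane 0 of each warp adds its compensated partial sum to the result.
        if ((threadIdx.x & (warpSize - 1)) == 0)
            atomicAdd(out, sum * SKIP);
    }

    int main() {
        float *in, *out;
        cudaMallocManaged(&in, N * sizeof(float));
        cudaMallocManaged(&out, sizeof(float));
        for (int i = 0; i < N; ++i) in[i] = 1.0f;  // exact sum is N
        *out = 0.0f;
        approx_reduce<<<1, 256>>>(in, out, N);
        cudaDeviceSynchronize();
        printf("approximate sum = %.1f, exact sum = %d\n", *out, N);
        cudaFree(in);
        cudaFree(out);
        return 0;
    }

Here the compensation value is simply SKIP, the ratio of total to processed elements: on
uniform data the compensated sum matches the exact sum, while skewed data would expose the
approximation error, which is why per-layer accuracy analysis is needed before choosing an
approximation level.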
- URI
- http://hanyang.dcollection.net/common/orgView/200000627757
- https://repository.hanyang.ac.kr/handle/20.500.11754/186800
- Appears in Collections:
- GRADUATE SCHOOL[S](대학원) > ARTIFICIAL INTELLIGENCE(인공지능학과) > Theses(Master)