Full metadata record

DC Field | Value | Language
dc.contributor.advisor | 서지원 | -
dc.contributor.author | 원준호 | -
dc.date.accessioned | 2024-03-01T07:38:43Z | -
dc.date.available | 2024-03-01T07:38:43Z | -
dc.date.issued | 2024-02 | -
dc.identifier.uri | http://hanyang.dcollection.net/common/orgView/200000726139 | en_US
dc.identifier.uri | https://repository.hanyang.ac.kr/handle/20.500.11754/188388 | -
dc.description.abstract | As deep learning models advance, so does interest in using deep learning applications in daily life. Edge devices, which serve a wide range of applications such as autonomous vehicles, smart home appliances, and the Internet of Things (IoT), are devices that connect to a network and act as an entry point for sending data out for computation, or that operate without a network connection. For edge devices that mainly process real-time data, the inference speed of deep learning models is one of the most important factors. Existing approaches include the on-device method, which optimizes the model for the edge device and performs all computation on the device itself, and the cloud method, which collects data on the edge device, transmits it to a computation server over the network, performs the actual computation on the server, and returns the result to the edge device. However, the on-device method often suffers from the low computational performance of edge devices, and the cloud method has limitations such as overloading the computation server and privacy concerns. To compensate for these drawbacks, collaborative inference has been proposed, in which the deep learning model is divided and computation is performed on both the server and the edge device. However, as deep learning models have become increasingly complex, partitioning them by hand has become very difficult. This thesis introduces a framework that compiles deep learning models with TVM, a deep learning compiler, optimizes various models, converts them into intermediate representation (IR) graphs, and automatically partitions the models by analyzing the connectivity of those graphs (an illustrative sketch of this graph analysis appears after this metadata record). In an experiment in which the partitioned models were distributed across an NVIDIA Jetson Xavier NX board as the edge device and a GTX 1060 desktop as the server, the partitioned ResNet101 and ResNet152 ran faster than edge-only execution at partition ratios of 40%~60% and 10%~33%, respectively, and partition points were found at which server load could also be reduced. In addition, to reduce data transfer overhead, the most important bottleneck in collaborative inference between server and edge, the framework automatically captures the points in the model where the intermediate data size shrinks, which can serve as one of the criteria for partitioning the model. Using these data-minimization points together with the automatic partitioning method, the transmitted and received data were reduced by up to 33% for the basic partition and up to 82% for the quantized partition on U-Net, ResNet50, ResNet101, and ResNet152. | -
dc.publisher | 한양대학교 대학원 (Hanyang University Graduate School) | -
dc.title | Optimizing Deep Learning Model Inference using Efficient Model Partitioning on Edge Devices | -
dc.type | Theses | -
dc.contributor.googleauthor | 원준호 | -
dc.contributor.alternativeauthor | WOHN JUN HO | -
dc.sector.campus | S | -
dc.sector.daehak | 대학원 (Graduate School) | -
dc.sector.department | 컴퓨터·소프트웨어학과 (Computer Science) | -
dc.description.degree | Master | -
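
The two mechanisms named in the abstract, compiling a model into a TVM Relay IR graph and scanning that graph for points where the intermediate activation becomes small, can be illustrated with a short sketch. This is a minimal, hypothetical example rather than the thesis framework itself: the model (ResNet-50 from relay.testing), batch size 1, and the input-size threshold used to rank partition candidates are illustrative assumptions; only standard TVM Relay calls are used.

    # Illustrative sketch (not the thesis framework): list Relay operators whose
    # output activation is small enough to be a cheap edge-to-server cut point.
    import numpy as np
    from tvm import relay
    from tvm.relay import testing

    # Build a ResNet-50 Relay module with randomly initialised parameters.
    mod, params = testing.resnet.get_workload(num_layers=50, batch_size=1)
    mod = relay.transform.InferType()(mod)   # annotate every node with its output type

    sizes = []                               # (visit order, operator name, output bytes)

    def record(node):
        # Record the output activation size of every operator call in the IR graph.
        if isinstance(node, relay.Call) and isinstance(node.checked_type, relay.TensorType):
            ttype = node.checked_type
            nbytes = int(np.prod([int(d) for d in ttype.shape])) * np.dtype(ttype.dtype).itemsize
            op_name = getattr(node.op, "name", "function")
            sizes.append((len(sizes), op_name, nbytes))

    # Walk the operator graph in post-order, visiting each node exactly once.
    relay.analysis.post_order_visit(mod["main"].body, record)

    # Rank partition candidates: operators whose output is smaller than the raw
    # 1x3x224x224 float32 input, so cutting there transfers less data than sending the input.
    input_bytes = 1 * 3 * 224 * 224 * 4
    candidates = [c for c in sizes if c[2] < input_bytes]
    for idx, name, nbytes in sorted(candidates, key=lambda c: c[2])[:5]:
        print(f"node {idx:4d}  {name:<24s}  output = {nbytes / 1024:.1f} KiB")

In the setting the abstract describes, the lowest-size candidates would mark layers after which the intermediate tensor can be sent from the Jetson edge device to the server with the least network traffic; quantizing that tensor before transfer would shrink it further, in line with the larger reduction reported for the quantized partition.
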
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > COMPUTER SCIENCE(컴퓨터·소프트웨어학과) > Theses (Master)
Files in This Item:
There are no files associated with this item.
