Revealing Optimal Solutions in Deep Reinforcement Learning through Knowledge Distillation

Author
이봉준
Alternative Author(s)
LI FENGJUN
Advisor(s)
조인휘
Issue Date
February 2024
Publisher
Graduate School, Hanyang University
Degree
Master
Abstract
Deep reinforcement learning combines the strong perceptual capabilities of deep learning in visual tasks with the decision-making abilities of reinforcement learning. A3C (Asynchronous Advantage Actor-Critic) is an asynchronous deep reinforcement learning method that exploits CPU multi-threading to run multiple agents in parallel environments. However, it often converges to suboptimal solutions, its policy evaluation is inefficient, and the resulting estimates carry high bias. Moreover, A3C requires substantial computational resources to support parallelized training, which is difficult to provide on some devices and in some environments. To address these issues, an explainable deep reinforcement learning knowledge distillation method (ERL-KD) is proposed. The method collects scores from every sub-agent, with the main agent acting as the primary teacher network and the sub-agents as auxiliary teacher networks, and derives a globally optimal solution through interpretable techniques such as Shapley values. In addition, a similarity constraint is introduced to adjust how closely the student network must match the teacher networks, leaving the student room to explore freely. Experimental results in the Atari 2600 environment show that the student network achieves performance comparable to that of a large-scale teacher network. A comparison of ERL-KD with traditional A3C in the same environment shows that ERL-KD mitigates A3C's problems of suboptimal solutions and resource-intensive training: the ERL-KD student network surpasses A3C's performance and approaches that of the larger teacher network. The similarity constraint notably improves the student network's exploration, demonstrating ERL-KD's effectiveness in resource-constrained scenarios.
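
Since no files are attached to this record, the following is only an illustrative sketch of how a multi-teacher distillation objective of the kind the abstract describes might be written in PyTorch. The function names, the temperature, and the lambda_sim weight are assumptions for illustration, not the thesis's actual implementation.

# Illustrative only: policy distillation with a similarity-constraint weight
# and per-teacher contribution weights (e.g. Shapley-style attributions).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0, lambda_sim=0.5):
    # Soften both action distributions with a temperature, then measure
    # KL(teacher || student); lambda_sim scales how strongly the student is
    # pulled toward the teacher (a smaller value leaves more room to explore).
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
    return lambda_sim * kd

def multi_teacher_loss(student_logits, teacher_logits_list, contributions, **kw):
    # Weight the main teacher and the auxiliary (sub-agent) teachers by their
    # contribution scores, normalised to sum to 1.
    w = torch.tensor(contributions, dtype=torch.float32)
    w = w / w.sum()
    return sum(wi * distillation_loss(student_logits, t, **kw)
               for wi, t in zip(w, teacher_logits_list))

# Usage with random logits for one main teacher and two sub-agent teachers:
student = torch.randn(32, 6)                      # batch of 32 states, 6 discrete actions
teachers = [torch.randn(32, 6) for _ in range(3)]
loss = multi_teacher_loss(student, teachers, contributions=[0.5, 0.3, 0.2])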
URI
http://hanyang.dcollection.net/common/orgView/200000720489
https://repository.hanyang.ac.kr/handle/20.500.11754/188394
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > COMPUTER SCIENCE(컴퓨터·소프트웨어학과) > Theses (Master)
Files in This Item:
There are no files associated with this item.