Revealing Optimal Solutions in Deep Reinforcement Learning through Knowledge Distillation

Author
이봉준
Alternative Author(s)
LI FENGJUN
Advisor(s)
조인휘
Issue Date
February 2024
Publisher
Graduate School, Hanyang University
Degree
Master
Abstract
Deep reinforcement learning combines the strong perceptual capabilities of deep learning in visual tasks with the decision-making abilities of reinforcement learning. A3C (Asynchronous Advantage Actor-Critic) is an asynchronous deep reinforcement learning method that exploits CPU multi-threading to run multiple agents in parallel environments. However, it often converges to suboptimal solutions, its policy evaluation is inefficient, and the resulting estimates carry high bias. Moreover, A3C requires substantial computational resources to support parallelized training, which is difficult to provide on some devices and in some environments. To address these issues, an explainable deep reinforcement learning knowledge distillation method (ERL-KD) is proposed. The method collects scores from every sub-agent, with the main agent acting as the primary teacher network and the sub-agents as auxiliary teacher networks, and derives a globally optimal solution through interpretable techniques such as Shapley values. In addition, a similarity constraint is introduced to adjust how closely the student network must match the teacher networks, leaving the student room to explore freely. Experimental results in the Atari 2600 environment show that the student network achieves performance comparable to that of a large-scale teacher network. A comparison of ERL-KD with traditional A3C in the same environment shows that ERL-KD mitigates A3C's problems of suboptimal solutions and resource-intensive training: the ERL-KD student network surpasses A3C's performance and approaches that of the larger teacher network. The similarity constraint notably improves the student network's exploration, demonstrating ERL-KD's effectiveness in resource-constrained scenarios.
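
Since no files are attached to this record, the following is only an illustrative sketch of how a multi-teacher distillation objective of the kind the abstract describes might be written in PyTorch. The function names, the temperature, and the lambda_sim weight are assumptions for illustration, not the thesis's actual implementation.

# Illustrative only: policy distillation with a similarity-constraint weight
# and per-teacher contribution weights (e.g. Shapley-style attributions).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0, lambda_sim=0.5):
    # Soften both action distributions with a temperature, then measure
    # KL(teacher || student); lambda_sim scales how strongly the student is
    # pulled toward the teacher (a smaller value leaves more room to explore).
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
    return lambda_sim * kd

def multi_teacher_loss(student_logits, teacher_logits_list, contributions, **kw):
    # Weight the main teacher and the auxiliary (sub-agent) teachers by their
    # contribution scores, normalised to sum to 1.
    w = torch.tensor(contributions, dtype=torch.float32)
    w = w / w.sum()
    return sum(wi * distillation_loss(student_logits, t, **kw)
               for wi, t in zip(w, teacher_logits_list))

# Usage with random logits for one main teacher and two sub-agent teachers:
student = torch.randn(32, 6)                      # batch of 32 states, 6 discrete actions
teachers = [torch.randn(32, 6) for _ in range(3)]
loss = multi_teacher_loss(student, teachers, contributions=[0.5, 0.3, 0.2])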
URI
http://hanyang.dcollection.net/common/orgView/200000720489
https://repository.hanyang.ac.kr/handle/20.500.11754/188394
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > COMPUTER SCIENCE(컴퓨터·소프트웨어학과) > Theses (Master)
Files in This Item:
There are no files associated with this item.