Repository at Hanyang University: Combat Algorithm based on Reinforcement Learning through Division of State Space (상태공간 분할을 통한 강화학습 기반의 교전 알고리즘)


Title: 상태공간 분할을 통한 강화학습 기반의 교전 알고리즘

Other Titles: Combat Algorithm based on Reinforcement Learning through Division of State Space

Author: 이성후

Alternative Author(s): Lee, Sung hu

Advisor(s): 조인휘

Issue Date: 2020-08

Publisher: Hanyang University (한양대학교)

Degree: Master

Abstract: Artificial intelligence techniques have long been studied through computer games and simulated environments, mainly with the aim of defeating the player or adapting to the player to keep the game engaging. Reinforcement learning, a representative AI learning method, has an agent interact repeatedly with an environment to pursue an optimal goal, gradually updating its policy function so that it earns more reward. Because this learning takes place in a predefined environment, it is of limited use in undefined real-world situations. This thesis designs a model that supports reinforcement learning on a simulation engine capable of creating situations similar to undefined real-world ones. It also proposes a reinforcement learning algorithm that modifies the existing approach: instead of learning from the full set of environment information and input values obtained from a situation, it decomposes the environment information and input values and learns from the parts. To measure performance, the existing DQN reinforcement learning algorithm and the state-space-partitioned reinforcement learning algorithm each engaged a rule-based agent built with a Behavior Tree, and also engaged each other. In these experiments the proposed algorithm outperformed the existing reinforcement learning algorithm in the number of effective hits and in survival time, and it also needed less training time to reach the same level of performance.

URI: https://repository.hanyang.ac.kr/handle/20.500.11754/153296 http://hanyang.dcollection.net/common/orgView/200000438541

Appears in Collections: GRADUATE SCHOOL OF ENGINEERING[S](공학대학원) > ELECTRICAL ENGINEERING AND COMPUTER SCIENCE(전기ㆍ전자ㆍ컴퓨터공학과) > Theses (Master)

