Repository at Hanyang University: Learning in Real-world : Deep Reinforcement Learning System for the Robot’s Powder Grip

Title: 현실 세계에서 학습 : 로봇의 분말 그립을 위한 심층 강화학습 시스템

Other Titles: Learning in Real-world : Deep Reinforcement Learning System for the Robot’s Powder Grip

Author: 유승환

Alternative Author(s): Seunghwan Yu

Advisor(s): 박태준

Issue Date: 2022. 2

Publisher: Hanyang University

Degree: Master

Abstract: Powder subdivision is an important process for manufacturers, and because a specified quantity of powder must be portioned accurately, it consumes a great deal of labor and time. Small and medium-sized manufacturers in particular face many difficulties from rising labor costs and economic fluctuations, so they need to use labor more efficiently by automating the powder subdivision process. However, automated powder subdivision lines are expensive and poorly suited to high-mix, low-volume production, which makes them a poor fit for small and medium-sized manufacturers. For this reason, many such manufacturers still subdivide powder by hand. When an operator makes an error while weighing the powder, it creates a bottleneck in the process, so productivity needs to be improved.
To address this problem, this thesis proposes a deep reinforcement learning system with which a robot can effectively learn how to pick up powder. Implementing the system raises two problems. First, a Markov decision process, in particular suitable definitions of the state and the reward, is needed for an amorphous object such as powder. Second, a reinforcement learning model is needed that shortens training time in the real world. The states used in this system are color (RGB) images and depth images, the reward functions are a sparse function and a smooth gradient function, and the reinforcement learning models are Soft Actor-Critic (SAC) and Data Regularized Q-v2 (DrQ-v2). To propose a suitable state, the average returns of SAC-RGB-sparse and SAC-Depth-sparse were compared. To propose a suitable reward function, the average returns of SAC-RGB-sparse and SAC-RGB-smooth were compared. Finally, to select a suitable reinforcement learning model, the average return at the end of training and the training time of SAC-RGB-smooth and DrQv2-RGB-smooth were compared.
The RGB image is proposed as the state suited to this system: the average return of SAC-RGB-sparse was 0.152 higher than that of SAC-Depth-sparse. The smooth gradient function is proposed as the reward function: the average return of SAC-RGB-smooth was 0.414 higher than that of SAC-RGB-sparse. Finally, DrQ-v2 is proposed as the reinforcement learning model: after 20 hours of training, the average return of DrQv2-RGB-smooth was 3.681 higher than that of SAC-RGB-smooth. Taken together, these results show that DrQv2-RGB-smooth is a promising candidate for a reinforcement learning system with which a robot can effectively learn to pick up powder in the real world.
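The abstract contrasts a sparse reward with a smooth gradient reward but does not give either formula. The Python sketch below is only an illustration of that general distinction, not the thesis's definitions. It assumes, hypothetically, that the reward depends on how close the gripped powder weight is to a target weight; the function names, the tolerance `tol_g`, the width `scale_g`, and the Gaussian shape are all assumptions made for this example.

```python
import math


def sparse_reward(measured_g: float, target_g: float, tol_g: float = 0.5) -> float:
    """Hypothetical sparse reward: 1.0 only inside the tolerance band, else 0.0."""
    return 1.0 if abs(measured_g - target_g) <= tol_g else 0.0


def smooth_reward(measured_g: float, target_g: float, scale_g: float = 2.0) -> float:
    """Hypothetical smooth reward: peaks at 1.0 on target, decays gradually with error."""
    err = measured_g - target_g
    return math.exp(-((err / scale_g) ** 2))


if __name__ == "__main__":
    # Compare the two signals as the gripped weight moves away from a 10 g target.
    for grams in (10.0, 10.4, 11.0, 12.0, 15.0):
        print(grams, sparse_reward(grams, 10.0), round(smooth_reward(grams, 10.0), 3))
```

Under these assumptions, a near miss still earns partial credit from the smooth function, whereas the sparse function returns zero for every attempt outside the tolerance band. That difference is one commonly cited reason shaped rewards can give an agent a more informative learning signal.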

URI: http://hanyang.dcollection.net/common/orgView/200000589816 https://repository.hanyang.ac.kr/handle/20.500.11754/167816

Appears in Collections: GRADUATE SCHOOL[S] > DEPARTMENT OF CONVERGENCE ROBOT SYSTEM > Theses (Master)
