Full metadata record

DC Field | Value | Language
dc.contributor.author | 이주현 | -
dc.date.accessioned | 2021-11-30T02:21:33Z | -
dc.date.available | 2021-11-30T02:21:33Z | -
dc.date.issued | 2021-06 | -
dc.identifier.citation | CHINA COMMUNICATIONS, v. 18, no. 6, pp. 12-23 | en_US
dc.identifier.issn | 1673-5447 | -
dc.identifier.uri | https://ieeexplore.ieee.org/document/9459561 | -
dc.identifier.uri | https://repository.hanyang.ac.kr/handle/20.500.11754/166588 | -
dc.description.abstract | This paper proposes a reinforcement learning (RL) algorithm to find an optimal scheduling policy that minimizes delay under a given energy constraint in a communication system where environment parameters, such as traffic arrival rates, are not known in advance and may change over time. The problem is formulated as an infinite-horizon constrained Markov decision process (CMDP). To handle the constrained optimization, we first apply the Lagrangian relaxation technique. We then propose Q-greedyUCB, a variant of Q-learning that combines ε-greedy exploration with the Upper Confidence Bound (UCB) algorithm, to solve the relaxed MDP. We mathematically prove that Q-greedyUCB converges to an optimal solution. Simulation results show that Q-greedyUCB finds an optimal scheduling strategy and incurs lower cumulative regret than Q-learning with ε-greedy, R-learning, and the Average-payoff RL (ARL) algorithm. We also show that the algorithm can learn and adapt to changes in the environment, recovering an optimal scheduling strategy under the given power constraint. (An illustrative sketch of such a learner follows this record.) | en_US
dc.description.sponsorship | This work was supported by the research fund of Hanyang University (HY-2019-N) and by the National Key Research & Development Program (2018YFA0701601). | en_US
dc.language.iso | en_US | en_US
dc.publisher | CHINA INST COMMUNICATIONS | en_US
dc.subject | reinforcement learning for average rewards | en_US
dc.subject | infinite-horizon Markov decision process | en_US
dc.subject | upper confidence bound | en_US
dc.subject | queue scheduling | en_US
dc.title | Q-greedyUCB: a New Exploration Policy to Learn Resource-Efficient Scheduling | en_US
dc.type | Article | en_US
dc.relation.no | 6 | -
dc.relation.volume | 18 | -
dc.identifier.doi | 10.23919/JCC.2021.06.002 | -
dc.relation.page | 12-23 | -
dc.relation.journal | CHINA COMMUNICATIONS | -
dc.contributor.googleauthor | Zhao, Yu | -
dc.contributor.googleauthor | Lee, Joohyun | -
dc.contributor.googleauthor | Chen, Wei | -
dc.relation.code | 2021007020 | -
dc.sector.campus | E | -
dc.sector.daehak | COLLEGE OF ENGINEERING SCIENCES[E] | -
dc.sector.department | DIVISION OF ELECTRICAL ENGINEERING | -
dc.identifier.pid | joohyunlee | -
Appears in Collections:
COLLEGE OF ENGINEERING SCIENCES[E](공학대학) > ELECTRICAL ENGINEERING(전자공학부) > Articles
Files in This Item:
There are no files associated with this item.
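The abstract above combines three pieces: a Lagrangian relaxation that folds the energy constraint into the reward, an average-reward (infinite-horizon) Q-learning update, and an exploration rule mixing ε-greedy with UCB. As a rough illustration only, not the authors' code, the Python sketch below shows one plausible shape of such a learner. Everything here is an assumption made for readability: the names (QGreedyUCBAgent, lagrangian_reward), the phase switch from ε-greedy to UCB at switch_step, the step sizes, and the toy queue dynamics in the usage stub.

```python
# Hedged sketch of a Q-greedyUCB-style learner for a Lagrangian-relaxed
# queue-scheduling CMDP. All names and constants are illustrative
# assumptions; the paper's exact schedule is not recoverable from this record.
import math
import random
from collections import defaultdict

def lagrangian_reward(delay_cost, energy_cost, lam):
    """Relaxed reward: delay penalty plus a lambda-weighted energy penalty."""
    return -(delay_cost + lam * energy_cost)

class QGreedyUCBAgent:
    def __init__(self, actions, alpha=0.1, beta=0.01, eps=0.1,
                 c_ucb=2.0, switch_step=10_000):
        self.actions = actions
        self.alpha = alpha              # Q-value step size
        self.beta = beta                # slower step size for the average reward
        self.eps = eps                  # ε-greedy rate (early phase)
        self.c_ucb = c_ucb              # UCB exploration coefficient
        self.switch_step = switch_step  # assumed hand-off from ε-greedy to UCB
        self.q = defaultdict(float)     # Q(s, a) estimates
        self.n = defaultdict(int)       # visit counts N(s, a)
        self.rho = 0.0                  # average-reward estimate
        self.t = 0

    def act(self, s):
        self.t += 1
        if self.t < self.switch_step:
            # Early phase: ε-greedy (one reading of "combines ε-greedy and UCB").
            if random.random() < self.eps:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(s, a)])
        # Later phase: greedy on Q plus a confidence bonus.
        def ucb(a):
            n_sa = self.n[(s, a)]
            if n_sa == 0:
                return float("inf")     # try unvisited actions first
            return self.q[(s, a)] + self.c_ucb * math.sqrt(math.log(self.t) / n_sa)
        return max(self.actions, key=ucb)

    def update(self, s, a, r, s_next):
        """Average-reward (infinite-horizon) Q-learning update."""
        self.n[(s, a)] += 1
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        td_error = r - self.rho + best_next - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * td_error
        self.rho += self.beta * (r - self.rho)

if __name__ == "__main__":
    # Toy usage: state = queue length, action = packets to send per slot.
    agent = QGreedyUCBAgent(actions=[0, 1, 2])
    s = 0
    for _ in range(1000):
        a = agent.act(s)
        served = min(a, s)                  # cannot send more than is queued
        delay_cost = s                      # queue length as a delay proxy
        energy_cost = served * served       # convex power-rate curve (toy)
        s_next = s - served + random.randint(0, 1)
        r = lagrangian_reward(delay_cost, energy_cost, lam=0.5)
        agent.update(s, a, r, s_next)
        s = s_next
    print("rho (avg reward estimate):", round(agent.rho, 3))
```

The UCB bonus c·sqrt(ln t / N(s,a)) shrinks as a state-action pair is revisited, so exploration concentrates on under-sampled actions; the cumulative-regret advantage reported in the abstract plausibly hinges on that targeted exploration rather than the uniform randomness of ε-greedy.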