Full metadata record

DC Field | Value | Language
dc.contributor.author | 이주현 | -
dc.date.accessioned | 2021-11-30T02:21:33Z | -
dc.date.available | 2021-11-30T02:21:33Z | -
dc.date.issued | 2021-06 | -
dc.identifier.citation | CHINA COMMUNICATIONS, v. 18, no. 6, pp. 12-23 | en_US
dc.identifier.issn | 1673-5447 | -
dc.identifier.uri | https://ieeexplore.ieee.org/document/9459561 | -
dc.identifier.uri | https://repository.hanyang.ac.kr/handle/20.500.11754/166588 | -
dc.description.abstract | This paper proposes a reinforcement learning (RL) algorithm to find an optimal scheduling policy that minimizes delay under a given energy constraint in a communication system where environment parameters, such as traffic arrival rates, are not known in advance and may change over time. The problem is formulated as an infinite-horizon constrained Markov decision process (CMDP). To handle the constrained optimization, we first apply the Lagrangian relaxation technique. We then propose Q-greedyUCB, a variant of Q-learning that combines ε-greedy exploration with the Upper Confidence Bound (UCB) algorithm, to solve the relaxed MDP. We mathematically prove that Q-greedyUCB converges to an optimal solution. Simulation results show that Q-greedyUCB finds an optimal scheduling strategy and incurs lower cumulative regret than Q-learning with ε-greedy, R-learning, and the Average-payoff RL (ARL) algorithm. We also show that the algorithm can learn and adapt to changes in the environment, recovering an optimal scheduling strategy under the given power constraint. (An illustrative sketch of such a learner follows this record.) | en_US
dc.description.sponsorship | This work was supported by the research fund of Hanyang University (HY-2019-N) and by the National Key Research & Development Program (2018YFA0701601). | en_US
dc.language.iso | en_US | en_US
dc.publisher | CHINA INST COMMUNICATIONS | en_US
dc.subject | reinforcement learning for average rewards | en_US
dc.subject | infinite-horizon Markov decision process | en_US
dc.subject | upper confidence bound | en_US
dc.subject | queue scheduling | en_US
dc.title | Q-greedyUCB: a New Exploration Policy to Learn Resource-Efficient Scheduling | en_US
dc.type | Article | en_US
dc.relation.no | 6 | -
dc.relation.volume | 18 | -
dc.identifier.doi | 10.23919/JCC.2021.06.002 | -
dc.relation.page | 12-23 | -
dc.relation.journal | CHINA COMMUNICATIONS | -
dc.contributor.googleauthor | Zhao, Yu | -
dc.contributor.googleauthor | Lee, Joohyun | -
dc.contributor.googleauthor | Chen, Wei | -
dc.relation.code | 2021007020 | -
dc.sector.campus | E | -
dc.sector.daehak | COLLEGE OF ENGINEERING SCIENCES[E] | -
dc.sector.department | DIVISION OF ELECTRICAL ENGINEERING | -
dc.identifier.pid | joohyunlee | -
Appears in Collections:
COLLEGE OF ENGINEERING SCIENCES[E](공학대학) > ELECTRICAL ENGINEERING(전자공학부) > Articles
Files in This Item:
There are no files associated with this item.
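The abstract above combines three pieces: a Lagrangian relaxation that folds the energy constraint into the reward, an average-reward (infinite-horizon) Q-learning update, and an exploration rule mixing ε-greedy with UCB. As a rough illustration only, not the authors' code, the Python sketch below shows one plausible shape of such a learner. Everything here is an assumption made for readability: the names (QGreedyUCBAgent, lagrangian_reward), the phase switch from ε-greedy to UCB at switch_step, the step sizes, and the toy queue dynamics in the usage stub.

```python
# Hedged sketch of a Q-greedyUCB-style learner for a Lagrangian-relaxed
# queue-scheduling CMDP. All names and constants are illustrative
# assumptions; the paper's exact schedule is not recoverable from this record.
import math
import random
from collections import defaultdict

def lagrangian_reward(delay_cost, energy_cost, lam):
    """Relaxed reward: delay penalty plus a lambda-weighted energy penalty."""
    return -(delay_cost + lam * energy_cost)

class QGreedyUCBAgent:
    def __init__(self, actions, alpha=0.1, beta=0.01, eps=0.1,
                 c_ucb=2.0, switch_step=10_000):
        self.actions = actions
        self.alpha = alpha              # Q-value step size
        self.beta = beta                # slower step size for the average reward
        self.eps = eps                  # ε-greedy rate (early phase)
        self.c_ucb = c_ucb              # UCB exploration coefficient
        self.switch_step = switch_step  # assumed hand-off from ε-greedy to UCB
        self.q = defaultdict(float)     # Q(s, a) estimates
        self.n = defaultdict(int)       # visit counts N(s, a)
        self.rho = 0.0                  # average-reward estimate
        self.t = 0

    def act(self, s):
        self.t += 1
        if self.t < self.switch_step:
            # Early phase: ε-greedy (one reading of "combines ε-greedy and UCB").
            if random.random() < self.eps:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(s, a)])
        # Later phase: greedy on Q plus a confidence bonus.
        def ucb(a):
            n_sa = self.n[(s, a)]
            if n_sa == 0:
                return float("inf")     # try unvisited actions first
            return self.q[(s, a)] + self.c_ucb * math.sqrt(math.log(self.t) / n_sa)
        return max(self.actions, key=ucb)

    def update(self, s, a, r, s_next):
        """Average-reward (infinite-horizon) Q-learning update."""
        self.n[(s, a)] += 1
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        td_error = r - self.rho + best_next - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * td_error
        self.rho += self.beta * (r - self.rho)

if __name__ == "__main__":
    # Toy usage: state = queue length, action = packets to send per slot.
    agent = QGreedyUCBAgent(actions=[0, 1, 2])
    s = 0
    for _ in range(1000):
        a = agent.act(s)
        served = min(a, s)                  # cannot send more than is queued
        delay_cost = s                      # queue length as a delay proxy
        energy_cost = served * served       # convex power-rate curve (toy)
        s_next = s - served + random.randint(0, 1)
        r = lagrangian_reward(delay_cost, energy_cost, lam=0.5)
        agent.update(s, a, r, s_next)
        s = s_next
    print("rho (avg reward estimate):", round(agent.rho, 3))
```

The UCB bonus c·sqrt(ln t / N(s,a)) shrinks as a state-action pair is revisited, so exploration concentrates on under-sampled actions; the cumulative-regret advantage reported in the abstract plausibly hinges on that targeted exploration rather than the uniform randomness of ε-greedy.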