Q-greedyUCB: a New Exploration Policy to Learn Resource-Efficient Scheduling
- Title
- Q-greedyUCB: a New Exploration Policy to Learn Resource-Efficient Scheduling
- Author
- 이주현 (Joohyun Lee)
- Keywords
- reinforcement learning for average rewards; infinite-horizon Markov decision process; upper confidence bound; queue scheduling
- Issue Date
- 2021-06
- Publisher
- CHINA INST COMMUNICATIONS
- Citation
- CHINA COMMUNICATIONS, v. 18, no. 6, pp. 12-23
- Abstract
- This paper proposes a reinforcement learning (RL) algorithm to find an optimal scheduling policy that minimizes delay under a given energy constraint in a communication system where environment parameters, such as traffic arrival rates, are not known in advance and can change over time. The problem is formulated as an infinite-horizon Constrained Markov Decision Process (CMDP). To handle the constrained optimization, we first apply the Lagrangian relaxation technique. We then propose Q-greedyUCB, a variant of Q-learning that combines the ε-greedy and Upper Confidence Bound (UCB) algorithms, to solve the relaxed MDP. We mathematically prove that the Q-greedyUCB algorithm converges to an optimal solution. Simulation results show that Q-greedyUCB finds an optimal scheduling strategy and achieves lower cumulative regret than Q-learning with ε-greedy, R-learning, and the Average-payoff RL (ARL) algorithm. We also show that our algorithm can learn and adapt to changes in the environment, obtaining an optimal scheduling strategy under a given power constraint for the new environment.
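The abstract's core idea, a tabular Q-learning update on the Lagrangian-relaxed reward (delay cost minus λ times energy cost) with a UCB exploration bonus guiding action selection, can be illustrated with a minimal sketch. This is not the paper's implementation: the state/reward model below is a hypothetical toy, the update uses a standard Q-learning target rather than the paper's average-reward formulation, and the function names and constants are assumptions for illustration only.

```python
import math
import random
from collections import defaultdict

def ucb_action(Q, counts, state, n_actions, t, c=2.0):
    """Pick the action maximizing Q plus a UCB exploration bonus.
    Untried (state, action) pairs get an infinite bonus so each is
    sampled at least once before the bonus starts shrinking."""
    best, best_val = 0, float("-inf")
    for a in range(n_actions):
        n_sa = counts[(state, a)]
        bonus = float("inf") if n_sa == 0 else c * math.sqrt(math.log(t) / n_sa)
        val = Q[(state, a)] + bonus
        if val > best_val:
            best, best_val = a, val
    return best

def q_update(Q, counts, state, action, reward, energy, next_state,
             n_actions, alpha=0.1, lam=0.5):
    """One tabular update toward the Lagrangian reward r - lam * energy
    (simplified target; the paper uses an average-reward formulation)."""
    counts[(state, action)] += 1
    target = (reward - lam * energy) + max(Q[(next_state, a)]
                                           for a in range(n_actions))
    Q[(state, action)] += alpha * (target - Q[(state, action)])

# Hypothetical toy environment: 2 states, 2 service actions,
# random transitions, delay cost -|state - action|, energy cost = action.
random.seed(0)
Q = defaultdict(float)
counts = defaultdict(int)
state = 0
for t in range(1, 501):
    a = ucb_action(Q, counts, state, n_actions=2, t=t)
    next_state = random.randrange(2)
    reward = -abs(state - a)          # toy delay penalty
    q_update(Q, counts, state, a, reward, energy=a,
             next_state=next_state, n_actions=2)
    state = next_state
```

Because the UCB bonus is infinite for unvisited pairs, every action in every visited state is tried at least once, after which exploration decays at the familiar sqrt(log t / n) rate while the Lagrange multiplier λ trades delay against energy in the learned values.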
- URI
- https://ieeexplore.ieee.org/document/9459561
- https://repository.hanyang.ac.kr/handle/20.500.11754/166588
- ISSN
- 1673-5447
- DOI
- 10.23919/JCC.2021.06.002
- Appears in Collections:
- COLLEGE OF ENGINEERING SCIENCES[E](공학대학) > ELECTRICAL ENGINEERING(전자공학부) > Articles