Multi-Armed Bandit for Slotted Random Access Systems
- Title
- Multi-Armed Bandit for Slotted Random Access Systems
- Author
- 이동우
- Alternative Author(s)
- 이동우
- Advisor(s)
- 이주현
- Issue Date
- 2021. 8
- Publisher
- Hanyang University (한양대학교)
- Degree
- Master
- Abstract
- This work investigates a random access (RA) game for time-slotted RA systems in both single-cell and multi-cell settings. In the single-cell RA system, there is a single access point (AP), each frame consists of M time slots, and each of the N players chooses a set of slots in a frame. We obtain the pure-strategy Nash equilibria (PNEs) of this RA game, in which slots are fully utilized as in centralized scheduling. To realize the PNEs, we propose a multi-agent (MA) learning algorithm based on the Exponential-weight algorithm for Exploration and Exploitation (EXP3). EXP3 is a bandit algorithm designed to find an optimal strategy in a multi-armed bandit (MAB) problem in which users do not know the expected payoff of each strategy. Our simulation results show that the proposed algorithm can achieve PNEs. Moreover, it can adapt to time-varying environments in which the number of players varies over time. For the multi-cell RA system, our goal is to maximize the system throughput of a time-slotted uplink multi-cell RA system. To this end, we propose a two-stage reinforcement learning (RL)-based algorithm built on EXP3. In each macro-time slot, which consists of multiple time slots, players run the RL-based algorithm to choose an AP. Then, a transmission policy determines the sub-time slot in which the player transmits data in each time slot. Another RL-based learning algorithm is used to obtain an optimal transmission policy. To evaluate the efficiency of our method, we compare the proposed algorithm with the ε-greedy algorithm in two different scenarios. The simulation results show that our algorithm achieves a higher average system throughput than the ε-greedy algorithm.
- URI
- http://hanyang.dcollection.net/common/orgView/200000498737
- https://repository.hanyang.ac.kr/handle/20.500.11754/163642
- Appears in Collections:
- GRADUATE SCHOOL[S](대학원) > DEPARTMENT OF ELECTRICAL AND ELECTRONIC ENGINEERING(전자공학과) > Theses (Master)
- Files in This Item:
There are no files associated with this item.
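As a rough illustration of the single-cell setting described in the abstract, the following Python sketch runs one independent EXP3 learner per player over the M slots of a frame. It is not the thesis's algorithm: the class `Exp3Player`, the function `simulate_frames`, the exploration rate `gamma`, the simplification that each player picks exactly one slot per frame, and the binary reward (1 for a collision-free transmission, 0 otherwise) are all assumptions made here for illustration.

```python
import math
import random


class Exp3Player:
    """One player running textbook EXP3 over the slots of a frame (hypothetical helper)."""

    def __init__(self, n_slots, gamma=0.1):
        self.n_slots = n_slots
        self.gamma = gamma  # exploration rate (assumed value)
        self.weights = [1.0] * n_slots

    def probabilities(self):
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n_slots
                for w in self.weights]

    def choose(self, rng):
        return rng.choices(range(self.n_slots), weights=self.probabilities())[0]

    def update(self, slot, reward):
        # Importance-weighted reward estimate for the chosen slot only.
        p = self.probabilities()[slot]
        self.weights[slot] *= math.exp(self.gamma * (reward / p) / self.n_slots)
        # Rescale so the largest weight is 1; this avoids overflow and
        # leaves the probabilities unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def simulate_frames(n_players, n_slots, n_frames, gamma=0.1, seed=0):
    """Return the per-frame fraction of slots carrying exactly one transmission."""
    rng = random.Random(seed)
    players = [Exp3Player(n_slots, gamma) for _ in range(n_players)]
    throughput = []
    for _ in range(n_frames):
        picks = [pl.choose(rng) for pl in players]
        counts = [picks.count(s) for s in range(n_slots)]
        for pl, s in zip(players, picks):
            # Assumed reward: 1 if the player was alone in its slot, else 0.
            pl.update(s, 1.0 if counts[s] == 1 else 0.0)
        throughput.append(sum(1 for c in counts if c == 1) / n_slots)
    return throughput


if __name__ == "__main__":
    history = simulate_frames(n_players=4, n_slots=4, n_frames=2000)
    print("mean throughput over last 200 frames:", sum(history[-200:]) / 200)
```

With N equal to M, a frame in which every slot carries exactly one transmission corresponds to the fully utilized outcome that the abstract associates with the PNEs; whether this simplified sketch settles there depends on `gamma` and the number of frames.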