Repository at Hanyang University: Multi-Armed Bandit for Slotted Random Access Systems


Multi-Armed Bandit for Slotted Random Access Systems

Title: Multi-Armed Bandit for Slotted Random Access Systems

Author: 이동우

Alternative Author(s): 이동우

Advisor(s): 이주현

Issue Date: 2021-08

Publisher: Hanyang University (한양대학교)

Degree: Master

Abstract: This work investigates a random access (RA) game for time-slotted RA systems in both single-cell and multi-cell settings. In the single-cell RA system, there is a single access point (AP), each frame consists of M time slots, and each of N players chooses a set of slots in the frame. We obtain the pure-strategy Nash equilibria (PNEs) of this RA game, at which the slots are fully utilized, as in centralized scheduling. To realize the PNEs, we propose a multi-agent (MA) learning algorithm based on the Exponential-weight algorithm for Exploration and Exploitation (EXP3). EXP3 is a bandit algorithm designed to find an optimal strategy in a multi-armed bandit (MAB) problem in which users do not know the expected payoff of each strategy. Our simulation results show that the proposed algorithm can achieve PNEs and can adapt to time-varying environments in which the number of players varies over time. For the multi-cell RA system, our goal is to maximize the system throughput of a time-slotted uplink multi-cell RA system. To this end, we propose a two-stage reinforcement learning (RL)-based algorithm built on EXP3. In each macro-time slot, which consists of multiple time slots, players run the RL-based algorithm to choose an AP. Then, in each time slot, a transmission policy determines the sub-time slot in which the player transmits data. Another RL-based learning algorithm is used to obtain an optimal transmission policy. To evaluate the efficiency of our method, we compare the proposed algorithm with the ε-greedy algorithm in two different scenarios. The simulation results show that the proposed algorithm achieves a higher average system throughput than the ε-greedy algorithm.
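
To make the EXP3 idea in the abstract concrete, the following is a minimal sketch, not the thesis's algorithm. It assumes a simplified single-cell game in which each player picks exactly one slot per frame and earns a reward of 1 only if no other player picked the same slot. The names `Exp3`, `simulate`, `gamma`, and all default parameter values are hypothetical choices made for illustration.

```python
import math
import random


class Exp3:
    """Textbook EXP3 learner over `n_arms` arms with rewards in [0, 1]."""

    def __init__(self, n_arms, gamma, rng):
        self.n_arms = n_arms
        self.gamma = gamma  # exploration rate in (0, 1] (assumed value)
        self.rng = rng
        self.weights = [1.0] * n_arms

    def probabilities(self):
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        return [(1.0 - self.gamma) * w / total + self.gamma / self.n_arms
                for w in self.weights]

    def choose(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        # Importance-weighted estimate: only the played arm is updated.
        estimate = reward / prob
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale to keep weights bounded; the probabilities are unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def simulate(n_players=4, n_slots=4, n_frames=5000, gamma=0.1, seed=0):
    """Return the fraction of slots that carried exactly one transmission."""
    rng = random.Random(seed)
    players = [Exp3(n_slots, gamma, rng) for _ in range(n_players)]
    successes = 0
    for _ in range(n_frames):
        picks = [p.choose() for p in players]
        counts = [0] * n_slots
        for slot, _ in picks:
            counts[slot] += 1
        for player, (slot, prob) in zip(players, picks):
            # Assumed payoff: 1 for a collision-free slot, 0 otherwise.
            reward = 1.0 if counts[slot] == 1 else 0.0
            player.update(slot, prob, reward)
            successes += int(reward)
    return successes / (n_frames * n_slots)


if __name__ == "__main__":
    print("average slot utilization:", simulate())
```

Each player learns only from its own success or collision feedback, which mirrors the bandit setting described above, where the expected payoff of each strategy is unknown. Whether and how quickly such a sketch settles on a collision-free assignment depends on `gamma` and the number of frames, so it should be read as an illustration rather than a reproduction of the reported results.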

URI: http://hanyang.dcollection.net/common/orgView/200000498737 https://repository.hanyang.ac.kr/handle/20.500.11754/163642

Appears in Collections: GRADUATE SCHOOL[S] > DEPARTMENT OF ELECTRICAL AND ELECTRONIC ENGINEERING > Theses (Master)

