Repository at Hanyang University: A Novel Procedure for Implementing a Turbo Decoder on a GPU with Coalesced Memory Access

Full metadata record

DC Field	Value	Language
dc.contributor.author	최승원	-
dc.date.accessioned	2019-11-25T05:35:36Z	-
dc.date.available	2019-11-25T05:35:36Z	-
dc.date.issued	2017-05	-
dc.identifier.citation	IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, v. E100A, no. 5, page. 1188-1196	en_US
dc.identifier.issn	1745-1337	-
dc.identifier.uri	https://www.jstage.jst.go.jp/article/transfun/E100.A/5/E100.A_1188/_article	-
dc.identifier.uri	https://repository.hanyang.ac.kr/handle/20.500.11754/114121	-
dc.description.abstract	The sub-blocking algorithm is known as a core component in implementing a turbo decoder on a Graphics Processing Unit (GPU), because it lets the decoder use as many GPU cores as possible for parallel processing. However, even though the sub-blocking algorithm allows a large number of threads in a given GPU to process a large number of sub-blocks in parallel, each thread must access the global memory with strided addresses, which results in uncoalesced memory access. Because uncoalesced memory access causes many unnecessary memory transactions, the memory bandwidth efficiency drops significantly, possibly as low as 1/8 in the case of a Long Term Evolution (LTE) turbo decoder, depending upon the compute capability of the GPU. In this paper, we present a novel method for converting uncoalesced memory access into coalesced access in a way that completely recovers the memory bandwidth efficiency to 100% without additional overhead. Our experimental tests, performed with NVIDIA's GeForce GTX 780 Ti GPU, show that the proposed method can enhance the throughput by nearly 30% compared with a conventional turbo decoder that suffers from uncoalesced memory access. The throughput provided by the proposed method was observed to be 51.4 Mbps in our experimental tests when the number of iterations and the number of sub-blocks were set to 6 and 32, respectively, which far exceeds the performance of previous works that implemented the Max-Log-MAP algorithm.	en_US
dc.description.sponsorship	This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2016- H8501-16-1006) supervised by the IITP (Institute for Information & communications Technology Promotion).	en_US
dc.language.iso	en_US	en_US
dc.publisher	IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG	en_US
dc.subject	GPU	en_US
dc.subject	CUDA	en_US
dc.subject	turbo decoder	en_US
dc.subject	coalesced memory access	en_US
dc.subject	SDR	en_US
dc.title	A Novel Procedure for Implementing a Turbo Decoder on a GPU with Coalesced Memory Access	en_US
dc.type	Article	en_US
dc.relation.no	5	-
dc.relation.volume	E100A	-
dc.identifier.doi	10.1587/transfun.E100.A.1188	-
dc.relation.page	1188-1196	-
dc.relation.journal	IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES	-
dc.contributor.googleauthor	Ahn, Heungseop	-
dc.contributor.googleauthor	Choi, Seungwon	-
dc.relation.code	2017003823	-
dc.sector.campus	S	-
dc.sector.daehak	COLLEGE OF ENGINEERING[S]	-
dc.sector.department	DEPARTMENT OF ELECTRONIC ENGINEERING	-
dc.identifier.pid	choiseungwon	-
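
Note on the abstract: the bandwidth loss it describes can be illustrated with a small model that counts memory transactions for one warp-wide load. The sketch below is not the paper's procedure. It assumes 32 threads per warp, one 4-byte value per thread, and a 32-byte transaction granularity, which is one way to arrive at a figure of 1/8. The sub-block length of 192, the function names, and the step-major "interleaved" layout are hypothetical choices made for illustration only.

```python
WARP_SIZE = 32       # threads issuing one load together
ELEM_BYTES = 4       # assumption: one 32-bit value per thread
SEGMENT_BYTES = 32   # assumption: memory transaction granularity


def bandwidth_efficiency(addresses):
    """Useful bytes divided by bytes moved for one warp-wide load."""
    # Every distinct segment touched costs one full transaction.
    segments = {addr // SEGMENT_BYTES for addr in addresses}
    useful = len(addresses) * ELEM_BYTES
    moved = len(segments) * SEGMENT_BYTES
    return useful / moved


def strided_addresses(step, sub_block_len):
    # Sub-block-major layout: thread t owns sub-block t, stored contiguously,
    # so neighbouring threads are sub_block_len elements apart.
    return [(t * sub_block_len + step) * ELEM_BYTES for t in range(WARP_SIZE)]


def interleaved_addresses(step, num_sub_blocks):
    # Step-major layout (hypothetical): element `step` of every sub-block
    # is stored side by side, so neighbouring threads read adjacent words.
    return [(step * num_sub_blocks + t) * ELEM_BYTES for t in range(WARP_SIZE)]


SUB_BLOCK_LEN = 192  # illustrative: 6144 values split into 32 sub-blocks

strided_eff = bandwidth_efficiency(strided_addresses(5, SUB_BLOCK_LEN))
coalesced_eff = bandwidth_efficiency(interleaved_addresses(5, WARP_SIZE))
print(strided_eff, coalesced_eff)  # prints 0.125 1.0
```

Under these assumptions each strided load touches 32 separate segments to fetch 128 useful bytes, while the interleaved layout fetches the same 128 bytes from 4 segments. Real hardware behaviour depends on the GPU's compute capability and cache configuration.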

Appears in Collections: COLLEGE OF ENGINEERING[S] > ELECTRONIC ENGINEERING > Articles
