Repository at Hanyang University: A Parallelization Approach for Performance Improvement of Video Decoders

Title: A Parallelization Approach for Performance Improvement of Video Decoders

Other Titles: 비디오 복호화기의 성능향상을 위한 병렬화 접근방식

Author: 조송현

Alternative Author(s): 조송현

Advisor(s): 송용호

Issue Date: 2014-08

Publisher: Hanyang University (한양대학교)

Degree: Doctor

Abstract: As demand for video coding has grown significantly in many areas, including personal computers, digital TV, mobile electronics, and, more recently, the Internet of Things (IoT), video coding standards have evolved to improve coding efficiency. This improvement has been realized by adopting advanced algorithms in the coding tools, many of which require a huge amount of computation. Moreover, many recent video applications need to decode high-resolution, high-frame-rate video in real time, which requires processing large amounts of data. Recent video decoders therefore need high performance. Industry has widely met this demand by raising the operating frequency. However, this is no longer a promising way to improve video decoder performance, because raising the operating frequency increases power consumption, and video decoding is now often performed in energy-constrained environments such as mobile devices. Software optimization techniques have also been widely used to improve performance, but they often cannot guarantee real-time decoding. Meanwhile, many recent electronic devices have adopted multi-core architectures, which offer advantages in both performance and power consumption. Multi-core systems are therefore a promising way to handle the high computational requirements of video coding. However, the throughput gain of parallel video decoding is often limited by data dependencies within the decoder: the dependencies serialize decoding and thus lower the utilization of processing elements, resulting in poor parallel performance. In addition, video decoders on recent multi-core systems often suffer from memory bandwidth starvation, because many threads compete for limited memory bandwidth. This potential lack of memory bandwidth often leads to poor quality-of-service (QoS) performance.
Thus, this dissertation proposes a parallelization approach that focuses on two issues of video decoding in multi-core systems: (i) how to increase parallel performance under data dependency constraints, and (ii) how to guarantee QoS performance in terms of memory bandwidth. For the first issue, this dissertation proposes a technique that improves the parallel performance of an MPEG-4 AVC/H.264 decoder by coordinating the execution of parallel threads. Experimental results show that the proposed technique achieves up to 2.9x speedup on a quad-core processor. Furthermore, a method is proposed to identify the essential, or actual, data dependency in High Efficiency Video Coding (HEVC) decoding. To the best of the author's knowledge, the amount of actual data dependency is smaller than that of the conventional data dependency used in all previous research on parallel HEVC decoding. Experimental results show that, on a six-core processor, the speedup of parallel HEVC decoding based on the actual dependency is up to 23.4% better than that based on the conventional data dependency.
For the second issue, to meet QoS requirements in terms of memory bandwidth, this dissertation proposes a prediction scheme for dynamic memory bandwidth reservation for the motion compensator (MC) of HEVC. The proposed technique predicts memory bandwidth by analyzing metadata in the video bitstream. Once the predicted bandwidth is reserved in a memory bandwidth reservation system, a minimum bandwidth is guaranteed. A straightforward alternative is to statically reserve enough memory bandwidth to cope with worst-case data traffic. However, the memory bandwidth actually consumed by the MC is, in most cases, much less than the worst case. By contrast, the dynamic bandwidth reservation scheme that uses the proposed prediction algorithm considerably reduces the amount of over-allocated memory bandwidth.
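
The abstract's contrast between static worst-case reservation and dynamic, prediction-based reservation can be illustrated with a small calculation. The sketch below is not the dissertation's algorithm: the per-frame bandwidth figures, the worst-case value, and all function names are hypothetical and chosen only for illustration. The predictor is simply assumed to return a value at or slightly above the actual demand of each frame.

```python
def over_allocation(actual, reserved):
    """Total bandwidth reserved but not used, summed over all frames."""
    return sum(max(r - a, 0) for a, r in zip(actual, reserved))


def meets_demand(actual, reserved):
    """True if every frame's reservation covers its actual demand."""
    return all(r >= a for a, r in zip(actual, reserved))


def static_reservation(num_frames, worst_case):
    """Reserve the worst-case bandwidth for every frame."""
    return [worst_case] * num_frames


def dynamic_reservation(predicted):
    """Reserve exactly the predicted bandwidth for each frame."""
    return list(predicted)


# Hypothetical per-frame motion-compensation bandwidth, in MB/s.
actual = [120, 300, 180, 90, 240]
# Hypothetical predictions (assumed to come from bitstream metadata).
predicted = [130, 310, 200, 100, 250]
# Hypothetical worst-case bandwidth used for static reservation.
worst_case = 400

static = static_reservation(len(actual), worst_case)
dynamic = dynamic_reservation(predicted)

static_waste = over_allocation(actual, static)
dynamic_waste = over_allocation(actual, dynamic)

print(f"static:  covers demand={meets_demand(actual, static)}, "
      f"over-allocated={static_waste} MB/s")
print(f"dynamic: covers demand={meets_demand(actual, dynamic)}, "
      f"over-allocated={dynamic_waste} MB/s")
```

With these made-up numbers both schemes cover every frame's demand, but the static scheme leaves far more reserved bandwidth unused. How close a real predictor gets to the actual demand depends on the prediction method described in the dissertation itself.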

URI: https://repository.hanyang.ac.kr/handle/20.500.11754/129816 http://hanyang.dcollection.net/common/orgView/200000424643

Appears in Collections: GRADUATE SCHOOL[S](대학원) > ELECTRONICS AND COMPUTER ENGINEERING(전자컴퓨터통신공학과) > Theses (Ph.D.)
