
Multithreaded Chunking for Many Core Architecture

Title
Multithreaded Chunking for Many Core Architecture
Other Titles
다수의 프로세서 코어를 가지는 컴퓨터 구조를 위한 멀티쓰레드 파일분할방법 (A Multithreaded File Chunking Method for Computer Architectures with Many Processor Cores)
Author
민재홍
Advisor(s)
원유집
Issue Date
2010-02
Publisher
Hanyang University(한양대학교)
Degree
Master
Abstract
In this work, we develop a novel variable-size file chunking method, MUCH, which exploits the multicore architecture of modern state-of-the-art microprocessors. To remove inter-file and intra-file information redundancy, a deduplication system partitions a file into a number of comparison units called "chunks" and determines whether a given chunk is redundant. The first task is called file chunking, and the second is often termed index lookup. Most existing work puts great emphasis on devising an efficient method for index lookup. Despite its importance for performance, file chunking has not received proper attention. File chunking is a very CPU-intensive task. A modern IO interface can deliver 500 MByte/sec of bandwidth. With a single thread, file chunking cannot keep up with the IO bandwidth and can limit the overall deduplication speed. We address this problem by parallelizing the file chunking operation. We develop MUCH, Multithreaded Content Based File Chunking, which can exploit the IO bandwidth and generates identical results independent of the degree of parallelism. We expedite the file chunking process by imposing lower and upper bounds on chunk size. This independence from the degree of parallelism is the key ingredient of MUCH. There are two types of threads in MUCH: master and chunkers. The master thread is responsible for distributing segments to the chunker threads and for post-processing the generated chunks so that the final chunking result is identical to the result obtained from legacy single-threaded chunking (chunking consistency). A chunker thread works in either bare mode or accelerated mode. The master thread performs inter-segment chunk coalescing to preserve chunking consistency. We formally prove that MUCH guarantees chunking consistency. We develop an elaborate performance model for MUCH, which enables us to determine the optimal segment size and the optimal number of chunker threads.
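For readers unfamiliar with content-based chunking, the single-threaded baseline that MUCH's output must match can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the rolling hash, the constants (`WINDOW`, `MASK`, `MIN_SIZE`, `MAX_SIZE`, `BASE`, `MOD`) and the function name `chunk_boundaries` are all assumptions chosen for the example. It shows a chunk boundary being declared where a hash of the most recent bytes matches a pattern, subject to lower and upper bounds on chunk size.

```python
from typing import List

# All constants below are illustrative assumptions, not values from the thesis.
WINDOW = 16             # number of bytes covered by the rolling hash
MASK = (1 << 10) - 1    # cut when the low 10 bits of the hash are zero
MIN_SIZE = 256          # lower bound on chunk size (bytes)
MAX_SIZE = 4096         # upper bound on chunk size (bytes)
BASE = 257              # polynomial rolling-hash base
MOD = (1 << 31) - 1     # polynomial rolling-hash modulus


def chunk_boundaries(data: bytes) -> List[int]:
    """Return the exclusive end offset of every chunk in `data`."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window
    cuts: List[int] = []
    start = 0  # offset where the current chunk begins
    h = 0      # hash of the last (up to) WINDOW bytes of the current chunk
    for i, b in enumerate(data):
        pos = i - start
        if pos >= WINDOW:
            # Window is full: drop the oldest byte before adding the new one.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        size = pos + 1
        # Cut on a hash match once the lower bound is met, or force a cut
        # when the upper bound is reached.
        if (size >= MIN_SIZE and (h & MASK) == 0) or size == MAX_SIZE:
            cuts.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        cuts.append(len(data))  # trailing chunk may be shorter than MIN_SIZE
    return cuts
```

Because `MIN_SIZE` is at least `WINDOW`, the hash at any candidate cut depends only on the preceding `WINDOW` bytes. A parallel scheme that splits the input into segments would still have to reconcile chunks near segment borders to reproduce exactly these offsets, which is the role the abstract assigns to the master thread.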
We implement MUCH in our prototype deduplication system, PRUNE. Through physical experiments, we examine the performance of MUCH while varying the number of threads and the segment size under different file sets. When a file is completely loaded into memory, a single thread delivers 95 MByte/sec of chunking bandwidth. Using three threads (on a quad-core CPU), we achieved a chunking speed of approximately 260 MByte/sec. The speed of modern IO interface standards increases exponentially: the Serial ATA (SATA) 2.0 and 3.0 specifications deliver up to 200 MByte/sec and 600 MByte/sec of IO bandwidth, respectively. By parallelizing the file chunking operation while guaranteeing chunking consistency, MUCH successfully addresses the performance issue of file chunking, which is one of the critical problems in current deduplication systems.
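The reported figures imply a speedup and a per-thread efficiency that can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the abstract; the variable names are illustrative, and "efficiency" here means speedup divided by thread count, which is an assumed definition rather than the thesis's performance model.

```python
# Figures quoted in the abstract (MByte/sec).
single_thread_bw = 95.0
three_thread_bw = 260.0
threads = 3
interface_bw = {"SATA 2.0": 200.0, "SATA 3.0": 600.0}

# Speedup over one thread, and speedup per thread (assumed efficiency measure).
speedup = three_thread_bw / single_thread_bw
efficiency = speedup / threads

# Which quoted interface bandwidths the three-thread chunking rate can match.
keeps_up = {name: three_thread_bw >= bw for name, bw in interface_bw.items()}

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
for name, ok in keeps_up.items():
    print(f"{name}: {'keeps up' if ok else 'falls short'}")
```

Under these numbers, three chunker threads exceed the quoted SATA 2.0 figure but remain below the quoted SATA 3.0 figure.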
URI
https://repository.hanyang.ac.kr/handle/20.500.11754/142479
http://hanyang.dcollection.net/common/orgView/200000413210
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > ELECTRONICS AND COMPUTER ENGINEERING(전자컴퓨터통신공학과) > Theses (Master)
Files in This Item:
There are no files associated with this item.