471 0

High-Performance Graph Data Processing on a Single Machine

Title
High-Performance Graph Data Processing on a Single Machine
Author
조용연
Advisor(s)
김상욱
Issue Date
2018-02
Publisher
한양대학교
Degree
Doctor
Abstract
A graph consisting of nodes and edges is widely used in various real-world domains to represent relationships between objects. Such real-world graphs have unique characteristics as follows: (1) high sparsity, the number of relationships is much smaller than that of all the possible relationships among nodes, and (2) power-law degree, hub nodes have an extremely high degree while ordinary nodes have a very low degree distribution. Recently, the size of the real-world graph has dramatically increased, and there are many attempts to derive a meaningful information. However, existing graph processing methods have been developed without taking these characteristics of real-world graphs in consideration. This dissertation addresses ways to effectively process real-world graphs on a single machine. First, we propose method to multiplication of two sparse matrices on the graphic processing units (GPU) for social network analysis. A social network can be represented to an adjacency matrix (one of the graph representations), and its analysis can be performed through the matrix operations. Two sparse matrices multiplication is one of building blocks. However, GPU-based existing methods have been developed without leveraging characteristics of real-world graphs. We observe that existing methods leads to poor load balancing over threads to underutilized many cores in GPU, and then devise a method to maximize the performance of GPU by load balancing on threads as well as thread blocks. Experimental results show that our method outperforms existing methods up to two orders-of-magnitude. Furthermore, we suggest another approach to handle even larger graph data through a graph engine. Since graph engines store and retrieve the graph data into/from the storage and also provide various graph algorithms, they could be applied a wider range of fields requiring the large-scale graph (i.e., big graph) processing. Existing graph engines, however, do not consider the real-world graph. Therefore, we develop the graph engine, RealGraph, leveraging the characteristics of real-world big graphs.We embed techniques for efficiently processing real-world graph into RealGraph such as the block-based workload distribution for a nice load balancing for threads and the hierarchical indicator to reduce an inefficient scanning operation. As a result, the RealGraph outperforms state-of-the-art graph engines up to about 44 times. In terms of graph storage, the data layout is very important because it improves performance of CPU and I/O processing in graph engines. Previous graph engines, however, overlook the data layout, arrangement of nodes and edges into storage space. They simply arrange nodes and edges into storage space, according to given node IDs in the ascending order. For the better data layout, we analyze the node access patterns by graph algorithms on a graph engine. Based on these patterns, we propose the BF data layout that nodes processed together are located into a single and consecutive storage space. We verify the performance evaluation such that the BF data layout achieves higher performance of several graph engines up to 24 times.
URI
https://repository.hanyang.ac.kr/handle/20.500.11754/68623http://hanyang.dcollection.net/common/orgView/200000432056
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > COMPUTER SCIENCE(컴퓨터·소프트웨어학과) > Theses (Ph.D.)
Files in This Item:
There are no files associated with this item.
Export
RIS (EndNote)
XLS (Excel)
XML


qrcode

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

BROWSE