Integration of graphs from different data sources using crowdsourcing

Title
Integration of graphs from different data sources using crowdsourcing
Author
김영훈
Keywords
Graph integration; Crowdsourcing; Entity resolution; Algorithms; Distance
Issue Date
2017-04
Publisher
ELSEVIER SCIENCE INC
Citation
Information Sciences, v. 385, pp. 438-456
Abstract
Data integration is the process of identifying pairs of records from different databases that refer to the same real-world entity. It has been studied extensively under the names entity resolution, record linkage, duplicate detection, and network alignment. With the increasing use of crowdsourcing platforms as a means of answering queries manually at low cost, many studies have begun to consider ways to exploit crowdsourcing systems for efficient data integration. In this paper, we present an efficient algorithm that integrates two graphs collected from different sources using crowdsourcing systems. Given two graphs, we repeatedly select a query node from one graph and ask a human annotator to find its matching node in the other graph, that is, the node that refers to the same entity as the query node. The proposed method chooses the query node that would increase precision the most if it were labeled. Through experiments with both simulated answers and labels collected through real crowdsourcing, we show that our algorithm finds more accurate graph matches at a lower crowdsourcing cost than the baseline algorithms. (C) 2017 Elsevier Inc. All rights reserved.
URI
https://www.sciencedirect.com/science/article/pii/S002002551730018X
https://repository.hanyang.ac.kr/handle/20.500.11754/72026
ISSN
0020-0255; 1872-6291
DOI
10.1016/j.ins.2017.01.006
Appears in Collections:
COLLEGE OF COMPUTING[E] > COMPUTER SCIENCE(소프트웨어학부) > Articles