Integration of graphs from different data sources using crowdsourcing
- Graph integration; Crowdsourcing; Entity resolution; Algorithms; Distance
- Issue Date
- ELSEVIER SCIENCE INC
- Information Sciences, vol. 385, pp. 438-456
- Data integration is the process of identifying pairs of records from different databases that refer to the same real-world entity. It has been studied extensively under the names entity resolution, record linkage, duplicate detection, and network alignment. With the increasing use of crowdsourcing platforms as a low-cost means of answering queries manually, many studies have begun to consider ways to exploit crowdsourcing systems for efficient data integration.
In this paper, we present an efficient algorithm for integrating two graphs collected from different sources using crowdsourcing systems. Given two graphs, we repeatedly select a query node from one graph and ask a human annotator to find its matching node in the other graph, that is, the node that refers to the same entity as the query node. The proposed method chooses the query node whose label would increase precision the most. Through experiments with both simulated answers and labels collected through real crowdsourcing, we show that our algorithm finds more accurate graph matches at a lower crowdsourcing cost than the baseline algorithms. (C) 2017 Elsevier Inc. All rights reserved.
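The query-and-label loop described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the expected precision gain is estimated, so the code below swaps in a simple stand-in: it queries the node whose best and second-best candidates are hardest to tell apart. The names `candidate_scores`, `ambiguity`, `integrate`, and the `ask` callback (a simulated annotator) are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, Hashable, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]  # node -> set of neighbouring nodes


def candidate_scores(u, g1: Graph, g2: Graph, matched: Dict) -> Dict:
    """Score every still-unused node v of g2 as a match for u.

    Assumed heuristic (not the paper's measure): count the neighbours of u
    that are already matched and whose partner is a neighbour of v.
    """
    used = set(matched.values())
    partners = {matched[n] for n in g1[u] if n in matched}
    return {v: len(partners & g2[v]) for v in g2 if v not in used}


def ambiguity(scores: Dict) -> float:
    """Negated margin between the two best scores; larger means less certain."""
    top = sorted(scores.values(), reverse=True)
    if len(top) < 2:
        return float("-inf")  # at most one candidate left, nothing to decide
    return -(top[0] - top[1])


def integrate(g1: Graph, g2: Graph, ask: Callable, budget: int,
              seed: Optional[Dict] = None) -> Dict:
    """Spend `budget` crowd queries, then match the remaining nodes greedily."""
    matched = dict(seed or {})
    for _ in range(budget):
        pending = [u for u in g1 if u not in matched]
        if not pending:
            break
        # Stand-in selection rule: ask about the most ambiguous node.
        query = max(pending,
                    key=lambda u: ambiguity(candidate_scores(u, g1, g2, matched)))
        matched[query] = ask(query)  # the annotator's answer is trusted as-is
    for u in g1:  # greedy completion using the labels collected so far
        if u not in matched:
            scores = candidate_scores(u, g1, g2, matched)
            if scores:
                matched[u] = max(scores, key=scores.get)
    return matched


# Toy usage: two four-node path graphs and a simulated, always-correct annotator.
g1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
g2 = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
truth = {"a": 1, "b": 2, "c": 3, "d": 4}
asked = []


def simulated_annotator(node):
    asked.append(node)
    return truth[node]


result = integrate(g1, g2, simulated_annotator, budget=2)
```

A real crowdsourcing setting would also have to handle noisy or missing answers, which this sketch ignores by trusting every label.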
- 0020-0255; 1872-6291
- Appears in Collections:
- COLLEGE OF COMPUTING[E] > COMPUTER SCIENCE(소프트웨어학부) > Articles