Recognising Informative Web Page Blocks Using Visual Segmentation for Efficient Information Extraction

Title
Recognising Informative Web Page Blocks Using Visual Segmentation for Efficient Information Extraction
Author
Choi, Joongmin (최중민)
Keywords
information extraction; informative block; visual block
Issue Date
2008-09
Publisher
SPRINGER
Citation
JOURNAL OF UNIVERSAL COMPUTER SCIENCE, v. 14, No. 11, Page. 1893-1910
Abstract
As web sites are getting more complicated, the construction of web information extraction systems becomes more troublesome and time-consuming. A common theme is the difficulty in locating the segments of a page in which the target information is contained, which we call the informative blocks. This article reports on the Recognising Informative Page Blocks algorithm (RIPB), which is able to identify the informative block in a web page so that information extraction algorithms can work on it more efficiently. RIPB relies on an existing algorithm for vision-based page block segmentation to analyse and partition a web page into a set of visual blocks, and then groups related blocks with similar content structures into block clusters by using a tree edit distance method. RIPB recognises the informative block cluster by using tree alignment and tree matching. A series of experiments were performed, and the conclusions were that RIPB was more than 95% accurate in recognising informative block clusters, and improved the efficiency of information extraction by 17%.
URI
http://www.jucs.org/jucs_14_11/recognising_informative_web_page
https://repository.hanyang.ac.kr/handle/20.500.11754/80658
ISSN
0948-695X
DOI
10.3217/jucs-014-11-1893
Appears in Collections:
COLLEGE OF ENGINEERING SCIENCES[E](공학대학) > COMPUTER SCIENCE AND ENGINEERING(컴퓨터공학과) > Articles