Recognising Informative Web Page Blocks Using Visual Segmentation for Efficient Information Extraction
- Title
- Recognising Informative Web Page Blocks Using Visual Segmentation for Efficient Information Extraction
- Author
- 최중민
- Keywords
- information extraction; informative block; visual block
- Issue Date
- 2008-09
- Publisher
- SPRINGER
- Citation
- JOURNAL OF UNIVERSAL COMPUTER SCIENCE, v. 14, No. 11, Page. 1893-1910
- Abstract
- As web sites are getting more complicated, the construction of web information extraction systems becomes more troublesome and time-consuming. A common theme is the difficulty in locating the segments of a page in which the target information is contained, which we call the informative blocks. This article reports on the Recognising Informative Page Blocks algorithm (RIPB), which is able to identify the informative block in a web page so that information extraction algorithms can work on it more efficiently. RIPB relies on an existing algorithm for vision-based page block segmentation to analyse and partition a web page into a set of visual blocks, and then groups related blocks with similar content structures into block clusters by using a tree edit distance method. RIPB recognises the informative block cluster by using tree alignment and tree matching. A series of experiments were performed, and the conclusions were that RIPB was more than 95% accurate in recognising informative block clusters, and improved the efficiency of information extraction by 17%.
- URI
- http://www.jucs.org/jucs_14_11/recognising_informative_web_page
- https://repository.hanyang.ac.kr/handle/20.500.11754/80658
- ISSN
- 0948-695X
- DOI
- 10.3217/jucs-014-11-1893
- Appears in Collections:
- COLLEGE OF ENGINEERING SCIENCES[E] > COMPUTER SCIENCE AND ENGINEERING > Articles
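
The clustering step described in the abstract (grouping visual blocks with similar content structures by tree edit distance) can be illustrated with a minimal sketch. This is not the RIPB implementation: it assumes the visual segmentation has already produced one small tag tree per block, and it swaps in a simplified top-down (Selkow-style) tree edit distance with a greedy grouping rule. The names `Node`, `tree_distance`, `cluster_blocks`, and the threshold of 0.3 are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A tag tree for one visual block (hypothetical structure)."""
    tag: str
    children: tuple = ()


def size(tree):
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in tree.children)


def tree_distance(a, b):
    """Simplified top-down tree edit distance (an assumption, not the paper's method).

    Relabelling a node costs 1; inserting or deleting a child subtree
    costs the number of nodes in that subtree.
    """
    relabel = 0 if a.tag == b.tag else 1
    xs, ys = a.children, b.children
    m, n = len(xs), len(ys)
    # dp[i][j] = cost of turning the first i children of a into the first j of b
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + size(xs[i - 1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + size(ys[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(xs[i - 1]),                      # delete subtree
                dp[i][j - 1] + size(ys[j - 1]),                      # insert subtree
                dp[i - 1][j - 1] + tree_distance(xs[i - 1], ys[j - 1]),  # match subtrees
            )
    return relabel + dp[m][n]


def normalised_distance(a, b):
    """Distance scaled into [0, 1) by the combined tree size."""
    return tree_distance(a, b) / (size(a) + size(b))


def cluster_blocks(blocks, threshold=0.3):
    """Greedy grouping: a block joins the first cluster whose first member is close enough."""
    clusters = []
    for block in blocks:
        for cluster in clusters:
            if normalised_distance(cluster[0], block) <= threshold:
                cluster.append(block)
                break
        else:
            clusters.append([block])
    return clusters


# Two structurally similar "item" blocks and one navigation-like block.
item1 = Node("div", (Node("a"), Node("span")))
item2 = Node("div", (Node("a"), Node("span"), Node("span")))
nav = Node("ul", (Node("li"), Node("li"), Node("li"), Node("li")))

clusters = cluster_blocks([item1, item2, nav])
print([len(c) for c in clusters])
```

In this toy input the two `div` blocks differ by one inserted `span` and fall into the same cluster, while the `ul` block forms a cluster of its own. Identifying which cluster is the informative one (the tree alignment and tree matching step in the abstract) is not covered by this sketch.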