Building Intelligent Systems for Mining Information Extraction Rules from Web Pages by Using Domain Knowledge

Title
Building Intelligent Systems for Mining Information Extraction Rules from Web Pages by Using Domain Knowledge
Author
Joongmin Choi (최중민)
Issue Date
2001-06
Publisher
IEEE
Citation
ISIE 2001. 2001 IEEE International Symposium on Industrial Electronics Proceedings (Cat. No.01TH8570), vol. 1, pp. 322-327
Abstract
Previous research on automatic information extraction experienced difficulties in acquiring and representing useful domain knowledge and in coping with the structural heterogeneity among different information sources. As a result, many real-world information sources with complex document structures could not be correctly analyzed. In order to resolve these problems, this paper presents a method of building intelligent systems for mining information extraction rules from semi-structured Web pages by using domain knowledge. This system automatically generates a wrapper for each information source and performs information extraction and information integration by applying this wrapper to the corresponding source. Both the domain knowledge and the wrapper are represented by XML documents to increase flexibility and interoperability. By testing our prototype system on several real-estate information sites, we can claim that it creates the correct wrappers for most Web sources and consequently facilitates effective information extraction for heterogeneous information sources.
URI
https://ieeexplore.ieee.org/document/931807?arnumber=931807&SID=EBSCO:edseee
https://repository.hanyang.ac.kr/handle/20.500.11754/158335
ISBN
0-7803-7090-2; 978-0-7803-7090-6
DOI
10.1109/ISIE.2001.931807
Appears in Collections:
COLLEGE OF ENGINEERING SCIENCES[E](공학대학) > COMPUTER SCIENCE AND ENGINEERING(컴퓨터공학과) > Articles
Files in This Item:
There are no files associated with this item.