Full metadata record

DC Field | Value | Language
dc.contributor.author | 최중민 | -
dc.date.accessioned | 2019-07-10T06:17:08Z | -
dc.date.available | 2019-07-10T06:17:08Z | -
dc.date.issued | 2007-11 | -
dc.identifier.citation | 2007 International Symposium on Information Technology Convergence (ISITC 2007), pp. 306-310 | en_US
dc.identifier.isbn | 0-7695-3045-1 | -
dc.identifier.uri | https://ieeexplore.ieee.org/document/4410655 | -
dc.identifier.uri | https://repository.hanyang.ac.kr/handle/20.500.11754/107275 | -
dc.description.abstract | As the structure of Web pages becomes more complicated, constructing wrapper induction rules becomes more difficult and time-consuming. The main problem in most wrapper induction methods is the difficulty of discriminating the meaningful blocks that contain the target information from the noise blocks that contain irrelevant information such as advertisements, menus, or copyright statements. To solve this problem, this paper proposes the RIPB (recognizing informative page blocks) algorithm, which detects the informative blocks in a Web page by exploiting a visual block segmentation scheme. RIPB uses a visual page segmentation algorithm to analyze and partition a Web page into a set of logical blocks, groups related blocks with similar structures into block clusters, and then recognizes the informative block clusters by applying heuristic rules to the cluster information. The results of a series of experiments indicate that RIPB improves the accuracy of information extraction by allowing the wrapper induction module to focus only on the informative blocks and to ignore noise information when building extraction rules. | en_US
dc.language.iso | en_US | en_US
dc.publisher | IEEE | en_US
dc.title | Detecting Informative Web Page Blocks for Efficient Information Extraction Using Visual Block Segmentation | en_US
dc.type | Article | en_US
dc.identifier.doi | 10.1109/ISITC.2007.6 | -
dc.contributor.googleauthor | Kang, Jinbeom | -
dc.contributor.googleauthor | Choi, Joongmin | -
dc.sector.campus | E | -
dc.sector.daehak | COLLEGE OF ENGINEERING SCIENCES[E] | -
dc.sector.department | DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING | -
dc.identifier.pid | jmchoi | -
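The abstract describes RIPB only at a high level: segment the page into logical blocks, cluster blocks with similar structure, then keep the clusters that heuristic rules judge informative. The Python sketch below illustrates that cluster-then-filter idea under loudly stated assumptions. Visual segmentation is assumed to have already run; each block is summarized by a hypothetical `Block` record; blocks are clustered by an identical tag path; and the thresholds in `is_informative` (minimum cluster size, maximum link density, minimum average text length) are invented for illustration. None of these names or rules come from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Block:
    """One logical block from an (assumed, already completed) visual segmentation step."""
    block_id: str
    tag_path: tuple[str, ...]  # hypothetical structural signature, e.g. a DOM tag sequence
    text: str
    link_chars: int            # number of text characters that sit inside links


def cluster_blocks(blocks):
    """Group blocks whose structural signature (tag_path) is identical."""
    clusters = defaultdict(list)
    for block in blocks:
        clusters[block.tag_path].append(block)
    return list(clusters.values())


def link_density(cluster):
    """Fraction of a cluster's text that is link text; navigation menus score high."""
    total = sum(len(b.text) for b in cluster)
    links = sum(b.link_chars for b in cluster)
    return links / total if total else 1.0


def is_informative(cluster, min_size=2, max_link_density=0.5, min_avg_text=20):
    """Hypothetical heuristics: repeated structure, mostly non-link text, enough text."""
    avg_text = sum(len(b.text) for b in cluster) / len(cluster)
    return (
        len(cluster) >= min_size
        and link_density(cluster) <= max_link_density
        and avg_text >= min_avg_text
    )


def informative_blocks(blocks):
    """Return the ids of all blocks that belong to clusters judged informative."""
    result = []
    for cluster in cluster_blocks(blocks):
        if is_informative(cluster):
            result.extend(b.block_id for b in cluster)
    return result


if __name__ == "__main__":
    # Toy page (invented data): a link menu, two repeated product rows, one footer.
    page = [
        Block("menu1", ("div", "ul", "li"), "Home", 4),
        Block("menu2", ("div", "ul", "li"), "About us", 8),
        Block("item1", ("div", "table", "tr"), "Product A: 10.1 MP camera, $899", 0),
        Block("item2", ("div", "table", "tr"), "Product B: 10.2 MP camera, $749", 0),
        Block("footer", ("div", "p"), "Copyright 2007 Example Corp.", 0),
    ]
    print(informative_blocks(page))  # ['item1', 'item2']
```

In this toy page the menu cluster is rejected for its high link density and the lone footer for not repeating, so only the product rows would be handed to a wrapper induction step. Clustering on an exact tag path is a deliberately simple stand-in; a real system would more likely use a tolerant structural similarity measure.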
Appears in Collections:
COLLEGE OF ENGINEERING SCIENCES[E] > COMPUTER SCIENCE AND ENGINEERING > Articles