Block Classification of a Web Page by Using a Combination of Multiple Classifiers

Title
Block Classification of a Web Page by Using a Combination of Multiple Classifiers
Author
최중민
Keywords
web block classification; web data mining; combining multiple classifiers
Issue Date
2008-09
Publisher
IEEE
Citation
2008 Fourth International Conference on Networked Computing and Advanced Information Management, pp. 290-295
Abstract
Web mining over the varied data on the World Wide Web has recently become an active research area. Because Web pages are generally semi-structured, informative blocks are difficult to identify, so content-detection techniques that remove unnecessary data (e.g., advertisements) from Web pages have become important. A Web page generally consists of many blocks containing various data and structural information. In this paper, we propose a method that classifies the blocks of a Web page into appropriate categories by building a Tree Alignment model that represents the HTML structure and a Vector model that represents the features of the blocks. Web sites normally have their own templates, and blocks may belong to different categories even when they occupy the same position in the Web browser or are structurally similar. It is therefore difficult to classify blocks accurately with a single classifier. To solve this problem, our approach builds multiple classifiers, one for each training domain, and classifies blocks by combining them. © 2008 IEEE.
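The idea of training one classifier per domain and combining their outputs can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not say how the classifiers are combined, so the majority vote below is an assumption, as are the nearest-centroid classifier, the two-element feature vectors (a hypothetical link density and text density), the category labels, and every name in the code.

```python
from collections import Counter, defaultdict
from math import sqrt


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class DomainClassifier:
    """Nearest-centroid classifier trained on labelled blocks from one site.

    Hypothetical stand-in for a per-domain classifier; the paper's own
    models (Tree Alignment and Vector) are not reproduced here.
    """

    def __init__(self, labelled_blocks):
        by_label = defaultdict(list)
        for features, label in labelled_blocks:
            by_label[label].append(features)
        self.centroids = {lbl: centroid(vs) for lbl, vs in by_label.items()}

    def predict(self, features):
        # Return the label whose centroid is closest to the block.
        return min(self.centroids,
                   key=lambda lbl: distance(features, self.centroids[lbl]))


def combine(classifiers, features):
    """Majority vote over per-domain predictions (assumed combination rule)."""
    votes = Counter(clf.predict(features) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Made-up training data: [link_density, text_density] per block.
site_a = [([0.1, 0.9], "content"), ([0.9, 0.1], "ad")]
site_b = [([0.2, 0.8], "content"), ([0.8, 0.2], "ad")]
site_c = [([0.3, 0.3], "content"), ([0.9, 0.9], "ad")]  # different template

classifiers = [DomainClassifier(s) for s in (site_a, site_b, site_c)]
block = [0.7, 0.8]
predictions = [clf.predict(block) for clf in classifiers]
combined = combine(classifiers, block)
```

In this toy data the third site's template assigns the block to a different category than the other two, and the vote settles the disagreement in favour of the majority. A weighted vote, for example by each site's structural similarity to the page being classified, would be an equally plausible combination rule.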
URI
https://ieeexplore.ieee.org/document/4624157
https://repository.hanyang.ac.kr/handle/20.500.11754/104868
ISBN
978-0-7695-3322-3
DOI
10.1109/NCM.2008.170
Appears in Collections:
COLLEGE OF ENGINEERING SCIENCES[E] > COMPUTER SCIENCE AND ENGINEERING > Articles