Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 최중민 (Choi, Joongmin) | - |
dc.date.accessioned | 2019-07-10T02:42:31Z | - |
dc.date.available | 2019-07-10T02:42:31Z | - |
dc.date.issued | 2007-11 | - |
dc.identifier.citation | 2007 International Conference on Convergence Information Technology (ICCIT 2007), Page. 2455-2460 | en_US |
dc.identifier.isbn | 0-7695-3038-9 | - |
dc.identifier.uri | https://ieeexplore.ieee.org/document/4420619 | - |
dc.identifier.uri | https://repository.hanyang.ac.kr/handle/20.500.11754/107257 | - |
dc.description.abstract | The main issue in effective Web information extraction is how to recognize similar patterns in a Web page. Previous work has shown that pattern matching over the HTML DOM tree is more efficient than simple string matching. However, existing tree-based pattern matching methods assume that all HTML tags are equally important, and therefore assign the same weight to every node in an HTML tree. This paper proposes an enhanced tree matching algorithm that improves on the tree edit distance method by taking the characteristics of HTML features into account. Each HTML tree node is assigned a value according to its weight in displaying the corresponding data object in the browser. HTML patterns are then matched by computing the maximum mapping value between two HTML trees built with these weighted node values. Experiments on several Web commerce sites evaluate the effectiveness of the proposed HTML tree matching algorithm. | en_US |
dc.language.iso | en_US | en_US |
dc.publisher | IEEE | en_US |
dc.title | Web Information Extraction by HTML Tree Edit Distance Matching | en_US |
dc.type | Article | en_US |
dc.identifier.doi | 10.1109/ICCIT.2007.19 | - |
dc.contributor.googleauthor | Kim, Yeonjung | - |
dc.contributor.googleauthor | Park, Jeahyun | - |
dc.contributor.googleauthor | Kim, Taehwan | - |
dc.contributor.googleauthor | Choi, Joongmin | - |
dc.sector.campus | E | - |
dc.sector.daehak | COLLEGE OF ENGINEERING SCIENCES[E] | - |
dc.sector.department | DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING | - |
dc.identifier.pid | jmchoi | - |
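The abstract describes matching two HTML trees by the maximum mapping value over weighted nodes. The sketch below shows one way that idea can work. It adapts the Simple Tree Matching recurrence, a restricted top-down form of tree mapping often used in Web data extraction, so that a matched node contributes its tag weight instead of a constant 1. It is not the paper's implementation: the `TAG_WEIGHTS` values, the `Node` class, `weighted_tree_match`, `similarity`, and the normalization by total tree weight are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical per-tag weights (assumption): tags that carry more visible
# content in the browser get larger values. These are NOT the paper's numbers.
TAG_WEIGHTS = {"img": 3.0, "a": 2.0, "b": 1.5, "td": 1.0, "tr": 1.0, "table": 1.0}
DEFAULT_WEIGHT = 1.0


@dataclass
class Node:
    """A minimal ordered HTML tree node (hypothetical helper type)."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def weight(node: Node) -> float:
    """Look up the assumed weight of a node's tag."""
    return TAG_WEIGHTS.get(node.tag, DEFAULT_WEIGHT)


def weighted_tree_match(a: Node, b: Node) -> float:
    """Maximum total weight of a top-down, order-preserving mapping of a onto b.

    Roots with different tags cannot be mapped, so the score is 0.
    Otherwise the children are aligned in order by dynamic programming,
    and the root's weight is added to the best child alignment.
    """
    if a.tag != b.tag:
        return 0.0
    m, n = len(a.children), len(b.children)
    # dp[i][j]: best score aligning the first i children of a with the first j of b
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j],  # skip child i of a
                dp[i][j - 1],  # skip child j of b
                dp[i - 1][j - 1]
                + weighted_tree_match(a.children[i - 1], b.children[j - 1]),
            )
    return dp[m][n] + weight(a)


def total_weight(node: Node) -> float:
    """Sum of the weights of every node in the tree."""
    return weight(node) + sum(total_weight(c) for c in node.children)


def similarity(a: Node, b: Node) -> float:
    """Normalize the match score into [0, 1] (assumed normalization)."""
    return 2 * weighted_tree_match(a, b) / (total_weight(a) + total_weight(b))


# Two hypothetical product rows: the second lacks the bold price tag.
rec1 = Node("tr", [Node("td", [Node("img")]), Node("td", [Node("a"), Node("b")])])
rec2 = Node("tr", [Node("td", [Node("img")]), Node("td", [Node("a")])])

if __name__ == "__main__":
    print(weighted_tree_match(rec1, rec2))  # 8.0
    print(round(similarity(rec1, rec2), 3))  # 0.914
```

Because only nodes with identical tags are mapped, each matched pair adds the same weight on both sides, so the score never exceeds the smaller tree's total weight and the normalized similarity stays within [0, 1]. Under these assumed weights, a missing `b` node lowers the similarity less than a missing `img` node would, which is the kind of behavior that weighting HTML nodes is meant to capture.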