Full metadata record

DC Field | Value | Language
dc.contributor.author | 최중민 | -
dc.date.accessioned | 2019-10-14T01:13:51Z | -
dc.date.available | 2019-10-14T01:13:51Z | -
dc.date.issued | 2005-06 | -
dc.identifier.citation | International Conference on Intelligent Data Engineering and Automated Learning; IDEAL 2005: Intelligent Data Engineering and Automated Learning, pp. 94-101 | en_US
dc.identifier.isbn | 978-3-540-26972-4 | -
dc.identifier.uri | https://link.springer.com/chapter/10.1007/11508069_13 | -
dc.identifier.uri | https://repository.hanyang.ac.kr/handle/20.500.11754/111004 | -
dc.description.abstract | The focused crawler is a topic-driven document-collecting crawler that has been suggested as a promising alternative for maintaining up-to-date Web document indices in search engines. A major problem inherent in previous focused crawlers is their tendency to miss highly relevant documents that are linked from off-topic documents. This problem originates mainly from a lack of consideration of the structural information in a document. Traditional weighting methods such as TF-IDF, as employed in document classification, can lead to this problem. To improve the performance of focused crawlers, this paper proposes a scheme of locality-based document segmentation for determining the relevance of a document to a specific topic. We segment a document into a set of sub-documents using contextual features around the hyperlinks. This information is used to determine whether the crawler should fetch the documents that are linked from hyperlinks in an off-topic document. | en_US
dc.language.iso | en_US | en_US
dc.publisher | SPRINGER-VERLAG BERLIN | en_US
dc.subject | Parent Node | en_US
dc.subject | Implicit Relation | en_US
dc.subject | Anchor Text | en_US
dc.subject | Context Graph | en_US
dc.subject | Focus Crawler | en_US
dc.title | A Focused Crawler with Document Segmentation | en_US
dc.type | Article | en_US
dc.identifier.doi | 10.1007/11508069_13 | -
dc.relation.journal | LECTURE NOTES IN COMPUTER SCIENCE | -
dc.contributor.googleauthor | Yang, Jaeyoung | -
dc.contributor.googleauthor | Kang, Jinbeom | -
dc.contributor.googleauthor | Choi, Joongmin | -
dc.relation.code | 2007206327 | -
dc.sector.campus | E | -
dc.sector.daehak | COLLEGE OF ENGINEERING SCIENCES[E] | -
dc.sector.department | DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING | -
dc.identifier.pid | jmchoi | -
Appears in Collections:
COLLEGE OF ENGINEERING SCIENCES[E](공학대학) > COMPUTER SCIENCE AND ENGINEERING(컴퓨터공학과) > Articles
Files in This Item:
There are no files associated with this item.

