A Focused Crawler with Document Segmentation
- Title
- A Focused Crawler with Document Segmentation
- Author
- 최중민
- Keywords
- Parent Node; Implicit Relation; Anchor Text; Context Graph; Focus Crawler
- Issue Date
- 2005-06
- Publisher
- SPRINGER-VERLAG BERLIN
- Citation
- International Conference on Intelligent Data Engineering and Automated Learning; IDEAL 2005: Intelligent Data Engineering and Automated Learning, pp. 94-101
- Abstract
- The focused crawler is a topic-driven document-collecting crawler that has been proposed as a promising alternative for maintaining up-to-date Web document indices in search engines. A major problem inherent in previous focused crawlers is their tendency to miss highly relevant documents that are linked from off-topic documents. This problem mainly originates from the lack of consideration of structural information in a document. Traditional weighting methods such as TF-IDF, as employed in document classification, can lead to this problem. In order to improve the performance of focused crawlers, this paper proposes a scheme of locality-based document segmentation to determine the relevance of a document to a specific topic. We segment a document into a set of sub-documents using contextual features around the hyperlinks. This information is used to determine whether the crawler should fetch the documents that are linked from hyperlinks in an off-topic document.
- URI
- https://link.springer.com/chapter/10.1007/11508069_13; https://repository.hanyang.ac.kr/handle/20.500.11754/111004
- ISBN
- 978-3-540-26972-4
- DOI
- 10.1007/11508069_13
- Appears in Collections:
- COLLEGE OF ENGINEERING SCIENCES[E] > COMPUTER SCIENCE AND ENGINEERING > Articles
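
The abstract describes segmenting a document into sub-documents around its hyperlinks and judging each segment, rather than the whole page, against the topic. The sketch below is one possible reading of that idea, not the paper's actual method: the fixed token window, the term-overlap score, the threshold, and all names (`LinkContextParser`, `segment_by_links`, `links_to_fetch`) are assumptions made for illustration, and the simple overlap score stands in for whatever contextual features the authors use.

```python
from html.parser import HTMLParser
import re


class LinkContextParser(HTMLParser):
    """Collect text tokens in document order and record where each link's anchor text sits."""

    def __init__(self):
        super().__init__()
        self.tokens = []   # all lowercase word tokens, in document order
        self.links = []    # (href, start_index, end_index) for each <a href=...>
        self._open = None  # (href, start_index) of the anchor currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._open = (href, len(self.tokens))

    def handle_endtag(self, tag):
        if tag == "a" and self._open is not None:
            href, start = self._open
            self.links.append((href, start, len(self.tokens)))
            self._open = None

    def handle_data(self, data):
        self.tokens.extend(re.findall(r"[a-z0-9]+", data.lower()))


def segment_by_links(html, window=5):
    """Return (href, tokens) pairs: anchor text plus `window` tokens on each side.

    The fixed-size window is an assumed stand-in for "contextual features around the hyperlinks".
    """
    parser = LinkContextParser()
    parser.feed(html)
    parser.close()
    segments = []
    for href, start, end in parser.links:
        lo = max(0, start - window)
        hi = min(len(parser.tokens), end + window)
        segments.append((href, parser.tokens[lo:hi]))
    return segments


def links_to_fetch(html, topic_terms, window=5, threshold=0.2):
    """Keep links whose surrounding segment has a topic-term fraction >= threshold (assumed rule)."""
    topic = {t.lower() for t in topic_terms}
    selected = []
    for href, tokens in segment_by_links(html, window):
        if not tokens:
            continue
        score = sum(1 for t in tokens if t in topic) / len(tokens)
        if score >= threshold:
            selected.append(href)
    return selected


# A mostly off-topic page that contains one on-topic link (made-up example data).
page = (
    "<p>Our club bakes bread and cakes every weekend for the whole town. "
    "Members share recipes and photos of pies. "
    "<a href='http://example.org/recipes'>more recipes</a> are posted monthly by volunteers.</p>"
    "<p>One member also maintains a web crawler tutorial: "
    "<a href='http://example.org/crawler'>focused crawler guide</a> "
    "covering search engine indexing.</p>"
)
topic = ["crawler", "focused", "search", "engine", "indexing", "web"]

print(links_to_fetch(page, topic))  # ['http://example.org/crawler']
```

In this toy page only 7 of the 41 tokens are topic terms, so a whole-document score would fall below the assumed 0.2 threshold and both links would be skipped. Scoring the segment around each hyperlink keeps the on-topic link while still dropping the recipe link, which is the failure mode the abstract attributes to earlier focused crawlers.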