Full metadata record

DC Field | Value | Language
dc.contributor.author | 김상욱 | -
dc.date.accessioned | 2017-11-17T00:43:13Z | -
dc.date.available | 2017-11-17T00:43:13Z | -
dc.date.issued | 2016-01 | -
dc.identifier.citation | Research Journal of Applied Sciences, Engineering & Technology, v. 12, no. 2, pp. 214-222 | en_US
dc.identifier.issn | 2040-7467 | -
dc.identifier.issn | 2040-7459 | -
dc.identifier.uri | http://maxwellsci.com/jp/mspabstract.php?jid=RJASET&doi=rjaset.12.2323 | -
dc.identifier.uri | http://hdl.handle.net/20.500.11754/31477 | -
dc.description.abstract | Document similarity is used to search for documents similar to a given query document. Text-based document similarity is computed by comparing the words in documents. Cosine similarity is the most popular text-based document similarity measure; it computes the similarity of two documents from the frequencies of the words they share. Because it counts only exactly matching words, it cannot reflect the semantic similarity between different words that have the same meaning. We propose a new document similarity measure that solves this problem by using the Earth Mover's Distance (EMD). The EMD makes it possible to compute the semantic similarity of documents. To apply the EMD to the similarity measure, we need to address its high computational complexity and to define the distance between attributes. The high computational complexity comes from the large number of words in documents. Thus, we extract topics from the documents by using Latent Dirichlet Allocation (LDA), a generative document model. Since the number of topics is much smaller than the number of words, LDA helps reduce the computational complexity. We define the distance between topics using the cosine similarity. Experimental results on real-world document databases show that the proposed measure finds similar documents more accurately than the cosine similarity because it reflects semantic similarity. | en_US
dc.description.sponsorship | This study was supported by (1) the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (NRF-2014R1A2A1A10054151) and (2) the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2015-H8501-15-1013) supervised by the IITP (Institute for Information & communication Technology Promotion). | en_US
dc.language.iso | en | en_US
dc.publisher | Medwell | en_US
dc.subject | Cosine similarity | en_US
dc.subject | document similarity | en_US
dc.subject | earth mover's distance | en_US
dc.subject | latent dirichlet allocation | en_US
dc.subject | semantic similarity | en_US
dc.title | Document Similarity Measure Based on the Earth Mover's Distance Utilizing Latent Dirichlet Allocation | en_US
dc.type | Article | en_US
dc.relation.no | 2 | -
dc.relation.volume | 12 | -
dc.identifier.doi | 10.19026/rjaset.12.2323 | -
dc.relation.page | 214-222 | -
dc.relation.journal | Research Journal of Applied Sciences | -
dc.contributor.googleauthor | Jang, Min-Hee | -
dc.contributor.googleauthor | Eom, Tae-Hwan | -
dc.contributor.googleauthor | Kim, Sang-Wook | -
dc.contributor.googleauthor | Hwang, Young-Sup | -
dc.relation.code | 2016038405 | -
dc.sector.campus | S | -
dc.sector.daehak | COLLEGE OF ENGINEERING[S] | -
dc.sector.department | DEPARTMENT OF COMPUTER SCIENCE | -
dc.identifier.pid | wook | -
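
The measure described in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: it assumes each document is already given as a vector of topic proportions (for example, from an LDA model), takes the distance between two topics to be 1 minus the cosine similarity of their word distributions (the abstract only says the distance is defined "using the cosine similarity"), and solves the EMD transportation problem with a simple successive-shortest-path routine. All function names and the toy topic vectors are hypothetical.

```python
import math

def cosine_distance(u, v):
    """Assumed topic distance: 1 - cosine similarity, clamped at 0."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return max(0.0, 1.0 - dot / (norm_u * norm_v))

def emd(p, q, dist, tol=1e-12):
    """Earth Mover's Distance between two distributions with equal total
    mass, solved as a min-cost flow by successive shortest paths."""
    m, n = len(p), len(q)
    src, sink = m + n, m + n + 1
    graph = [[] for _ in range(m + n + 2)]  # node -> indices into edges
    edges = []  # [target, residual capacity, cost]; edge k ^ 1 is its reverse

    def add_edge(u, v, cap, cost):
        graph[u].append(len(edges)); edges.append([v, cap, cost])
        graph[v].append(len(edges)); edges.append([u, 0.0, -cost])

    for i in range(m):
        add_edge(src, i, p[i], 0.0)               # supply of topic i
        for j in range(n):
            add_edge(i, m + j, math.inf, dist[i][j])
    for j in range(n):
        add_edge(m + j, sink, q[j], 0.0)          # demand of topic j

    total_cost = 0.0
    while True:
        # Bellman-Ford: cheapest path from src to sink in the residual graph.
        best = [math.inf] * (m + n + 2)
        via = [-1] * (m + n + 2)
        best[src] = 0.0
        for _ in range(m + n + 1):
            changed = False
            for u in range(m + n + 2):
                if best[u] == math.inf:
                    continue
                for k in graph[u]:
                    v, cap, cost = edges[k]
                    if cap > tol and best[u] + cost < best[v] - tol:
                        best[v], via[v], changed = best[u] + cost, k, True
            if not changed:
                break
        if best[sink] == math.inf:
            return total_cost                      # no mass left to move
        # Find the bottleneck along the path, then push that much flow.
        push, v = math.inf, sink
        while v != src:
            push = min(push, edges[via[v]][1])
            v = edges[via[v] ^ 1][0]
        v = sink
        while v != src:
            edges[via[v]][1] -= push
            edges[via[v] ^ 1][1] += push
            v = edges[via[v] ^ 1][0]
        total_cost += push * best[sink]

def document_distance(doc_a, doc_b, topics):
    """EMD between two topic-proportion vectors over shared topics."""
    dist = [[cosine_distance(s, t) for t in topics] for s in topics]
    return emd(doc_a, doc_b, dist)

# Toy data: topics 0 and 1 use nearly the same words, topic 2 does not.
topics = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.0, 0.1, 0.9]]
doc_a, doc_b, doc_c = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
near = document_distance(doc_a, doc_b, topics)
far = document_distance(doc_a, doc_c, topics)
```

In this toy setup the three documents share no topic, so a plain cosine similarity over their topic vectors would be 0 for every pair. The EMD-based distance still places doc_a closer to doc_b than to doc_c, because topics 0 and 1 are themselves close, which is the kind of semantic similarity the abstract refers to.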
Appears in Collections:
COLLEGE OF ENGINEERING[S] > COMPUTER SCIENCE AND ENGINEERING > Articles
Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.