
소셜 폭소노미를 활용한 의미기반 트윗 군집화 및 요약기법

Title
소셜 폭소노미를 활용한 의미기반 트윗 군집화 및 요약기법
Other Titles
Semantic Based Tweet Clustering and Summarization Exploiting Social Folksonomy
Author
허지욱
Alternative Author(s)
Jee-Uk Heu
Advisor(s)
이동호
Issue Date
2016-02
Publisher
Hanyang University
Degree
Doctor
Abstract
With the rapid growth of the Internet and smart multimedia devices, users can obtain general and common information from folksonomy systems and social network services (SNSs) such as Wikipedia, Flickr, Del.icio.us, Twitter, and Facebook. However, users must manually review all of the retrieved documents without any assistance from search engines, which requires too much time and effort. Therefore, it is necessary to analyze and refine this content and then cluster it according to the interests of the user. In this thesis, we present a novel semantic-based tweet clustering and summarization system for Twitter, one of the social network services. The system uses TagClusters, a form of collective intelligence from Flickr, to support the analysis and scoring of words and sentences. The proposed system consists of 1) a semantic-based tweet clustering algorithm and 2) semantic-based tweet summarization that exploits folksonomy and user influence analysis. For semantic-based tweet clustering, we propose a semantic-based K-means clustering algorithm for clustering a large volume of tweets. It measures not only the similarity between data represented in the vector space model but also the semantic similarity between the data, which it obtains by exploiting TagClusters. Tweets are often too short and informal to provide sufficient information, which poses a major challenge. Previous clustering algorithms handle media that provide a lot of data for analysis, such as documents, images, and videos, but they are not well suited to SNS data, which lacks sufficient contextual information. For semantic-based tweet summarization, we designed a novel document summarization system called FoDoSu that employs the TagClusters used by Flickr, a folksonomy system, to detect key sentences in multiple documents.
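As a rough illustration of the clustering idea described above, the following Python sketch blends a vector-space (cosine) similarity with a tag-cluster overlap score. It is not the thesis's algorithm: the `TAG_CLUSTERS` data, the Jaccard-based `semantic` score, the `alpha` weight, and all function names are assumptions made for this example, and the abstract does not say how the two similarities are actually combined.

```python
import math
from collections import Counter

# Hypothetical stand-in for Flickr TagClusters: each set groups related tags.
TAG_CLUSTERS = [
    {"iphone", "apple", "ios"},
    {"galaxy", "samsung", "android"},
]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_ids(tokens):
    """Indices of the tag clusters that share at least one word with the tweet."""
    words = set(tokens)
    return {i for i, cluster in enumerate(TAG_CLUSTERS) if cluster & words}


def semantic(a_tokens, b_tokens):
    """Jaccard overlap of the tag clusters touched by each tweet (an assumed measure)."""
    ca, cb = cluster_ids(a_tokens), cluster_ids(b_tokens)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


def blended_similarity(a_tokens, b_tokens, alpha=0.5):
    """Weighted mix of lexical and semantic similarity; alpha is an assumed parameter."""
    lexical = cosine(Counter(a_tokens), Counter(b_tokens))
    return alpha * lexical + (1 - alpha) * semantic(a_tokens, b_tokens)
```

With this sketch, two tweets that share no words but mention tags from the same cluster (for example "iphone" and "ios") still receive a non-zero similarity, which is the effect the abstract attributes to using TagClusters on short texts. A K-means variant could use `1 - blended_similarity(...)` as its distance, although that choice is also an assumption.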
When analyzing the semantics of words, we find that documents contain many proper nouns and newly coined words, such as the names of people and products. It is hard to analyze the semantics of these words using WordNet because it does not cover proper nouns and newly coined words. For this reason, we use Flickr TagClusters instead of WordNet when analyzing the semantics of such words. The proposed method consists of a word analysis step and a sentence analysis step. In the word analysis step, we create a word frequency table for analyzing the semantics and contributions of words using the LiteHITS algorithm, a modification of the HITS algorithm. Then, by exploiting TagClusters, we analyze the semantic relationships between the words in the word frequency table. In the sentence analysis step, we create a summary of multiple documents by analyzing the importance of each word and its semantic relatedness to other words. To extract the most meaningful tweets in each cluster, we also propose a new tweet summarization technique that analyzes Twitter user information to measure the influence of users and exploits our document summarization method. Finally, through experimental results, we demonstrate the effectiveness of the proposed tweet summarization technique.
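For readers unfamiliar with HITS, the sketch below runs the standard HITS iteration on a bipartite word-sentence graph to score words and sentences. It illustrates plain HITS only, not LiteHITS, whose modifications the abstract does not describe. Treating sentences as hubs and words as authorities, the whitespace tokenization, the iteration count, and the function name `hits_word_sentence` are all assumptions made for this example.

```python
import math


def hits_word_sentence(sentences, iterations=20):
    """Plain HITS on a bipartite graph (assumed setup): sentences are hubs, words are authorities."""
    sent_words = [set(s.lower().split()) for s in sentences]
    vocab = set().union(*sent_words)
    hub = [1.0] * len(sentences)
    auth = {w: 1.0 for w in vocab}
    for _ in range(iterations):
        # Authority update: a word is important if important sentences contain it.
        auth = {
            w: sum(hub[i] for i, ws in enumerate(sent_words) if w in ws)
            for w in vocab
        }
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {w: v / norm for w, v in auth.items()}
        # Hub update: a sentence is important if it contains important words.
        hub = [sum(auth[w] for w in ws) for ws in sent_words]
        norm = math.sqrt(sum(v * v for v in hub)) or 1.0
        hub = [v / norm for v in hub]
    return auth, hub
```

A simple extractive summary could then pick the sentences with the highest hub scores. The thesis additionally uses TagClusters and user influence, which this sketch leaves out.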
URI
https://repository.hanyang.ac.kr/handle/20.500.11754/126514
http://hanyang.dcollection.net/common/orgView/200000428564
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > COMPUTER SCIENCE & ENGINEERING(컴퓨터공학과) > Theses (Ph.D.)
Files in This Item:
There are no files associated with this item.
