Text mining using automatic classification and summarization algorithm for news reports
- Title
- Text mining using automatic classification and summarization algorithm for news reports
- Author
- 서강원
- Advisor(s)
- 배석주
- Issue Date
- 2016-02
- Publisher
- Hanyang University (한양대학교)
- Degree
- Master
- Abstract
- As news search services have become popular, interest in automatic classification and summarization has been growing. Such an automatic system would be especially useful for mobile and navigation users.
This research developed a methodology for quickly monitoring key intelligence areas and a method that consolidates information into concise, understandable groups of topics and sentences of interest. It evaluated and modified several existing analysis methods and developed an overall framework for classification and summarization.
Clustering analysis is commonly used for document classification, and among clustering methods the K-means algorithm is well known for effectively classifying large document collections. However, as computerized databases grow exponentially, clustering accuracy falls and running time increases sharply.
This research studied how to classify news reports in large datasets and extract key sentences on a given topic. Unlike existing algorithms, the proposed algorithm does not simply assign categories by word frequency and extract the sentences containing frequent words. It adopts association analysis to improve classification accuracy and a modified version of the K-means algorithm to reduce clustering time. It also extracts key sentences within specific areas according to the number of sentences in a report.
The proposed algorithm was applied to real news data comprising 21,974 articles dated from October 7 to October 20, 2014. The results showed that it performs better than many popular existing algorithms.
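The abstract names K-means as the usual baseline for clustering documents and notes that it slows down as collections grow, but it does not describe the author's modification. For reference only, here is a minimal sketch of the standard, unmodified algorithm: Lloyd's K-means over TF-IDF vectors. Whitespace tokenization, smoothed IDF, and deterministic farthest-first seeding are assumptions made to keep the example short and reproducible, and the names `tfidf_matrix`, `farthest_first_init`, and `kmeans` are hypothetical.

```python
# Hypothetical sketch of standard K-means for documents; the thesis's
# modified variant is NOT shown here.
import math
from collections import Counter


def tfidf_matrix(docs):
    """Turn whitespace-tokenized documents into dense, L2-normalized TF-IDF rows."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF: a term found in every document still keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return rows


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(rows, k):
    """Deterministic seeding (an assumption, not from the thesis): start at row 0,
    then repeatedly add the row farthest from its nearest chosen centroid."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(rows, k, max_iter=100):
    """Standard Lloyd's K-means; returns one cluster label per row."""
    centroids = farthest_first_init(rows, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c])) for r in rows]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "stock market shares fall",
    "shares rise on stock market",
    "football team wins match",
    "team loses football match",
]
print(kmeans(tfidf_matrix(docs), k=2))  # [0, 0, 1, 1]
```

Each iteration compares every document with every centroid across the whole vocabulary, so the cost per iteration grows roughly as (documents) × (clusters) × (vocabulary size). That scaling is the running-time concern the abstract raises for large news collections.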
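The abstract does not say how association analysis enters the classification step. As a generic illustration only, the sketch below treats each article as a transaction of words and mines one-to-one association rules using the standard support, confidence, and lift measures. The function name `pair_rules` and the thresholds `min_support` and `min_confidence` are hypothetical choices, not values taken from the thesis.

```python
# Hypothetical sketch: generic pairwise association rules over words,
# not the thesis's actual classification method.
from collections import Counter
from itertools import combinations


def pair_rules(docs, min_support=0.5, min_confidence=0.8):
    """Mine one-to-one association rules lhs -> rhs between words.

    Each document is one transaction: the set of its lowercase words.
      support(A, B)    = share of documents containing both A and B
      confidence(A->B) = support(A, B) / support(A)
      lift(A->B)       = confidence(A->B) / support(B)
    """
    transactions = [set(doc.lower().split()) for doc in docs]
    n = len(transactions)
    item_count = Counter(w for t in transactions for w in t)
    # Count each unordered word pair once per document.
    pair_count = Counter(p for t in transactions for p in combinations(sorted(t), 2))
    rules = []
    for (a, b), both in pair_count.items():
        support = both / n
        if support < min_support:
            continue
        # Check the rule in both directions; confidence is asymmetric.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / item_count[lhs]
            if confidence >= min_confidence:
                lift = confidence / (item_count[rhs] / n)
                rules.append((lhs, rhs, support, confidence, lift))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], -r[4], r[0], r[1]))


articles = [
    "election vote candidate",
    "election vote turnout",
    "election candidate debate",
    "football match goal",
]
for lhs, rhs, sup, conf, lift in pair_rules(articles):
    print(f"{lhs} -> {rhs}: support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
# candidate -> election: support=0.50 confidence=1.00 lift=1.33
# vote -> election: support=0.50 confidence=1.00 lift=1.33
```

Both surviving rules have lift above 1, meaning the word pair co-occurs more often than independence would predict. Co-occurrence signals of this kind carry information that raw word frequency alone does not.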
- URI
- https://repository.hanyang.ac.kr/handle/20.500.11754/127185
- http://hanyang.dcollection.net/common/orgView/200000428190
- Appears in Collections:
- GRADUATE SCHOOL[S] > INDUSTRIAL ENGINEERING > Theses (Master)
- Files in This Item:
There are no files associated with this item.