Repository at Hanyang University: A Tourist Attraction Classification Model Using Sentence Generation Model Training and Tourist Attraction Review Data

Title: A Tourist Attraction Classification Model Using Sentence Generation Model Training and Tourist Attraction Review Data

Other Titles: Tourist Attraction Classification Model using Sentence Generation Model and Review Data

Author: 문준형

Advisor(s): 조인휘

Issue Date: February 2024

Publisher: Graduate School, Hanyang University

Degree: Master

Abstract: The growth of the Internet and social networks has produced vast amounts of data, ushering in the era of big data, in which people try to extract useful information from that data. The subsequent rise of artificial intelligence has led to many methods that use AI and deep learning to recommend content suited to each person's tastes, based on information about that person. Effective recommendation, however, requires analyzing not only individual preferences but also the characteristics of the content itself. With the growth of streaming services, such recommendation models are currently used mainly for music and video. Because music and videos come with defined genres, their characteristics are relatively easy to analyze, which makes them well suited to recommendation models. Tourist attractions are harder to classify, since each visitor may perceive the same place differently. Preference-based recommendation requires every item of content to be labeled and classified, and content without predefined genres, such as tourist attractions, must be labeled by hand. If a single person does the labeling, the labels reflect that person's subjective views, and the resulting recommendations will not suit other users.
This thesis therefore proposes a method for automatically fine-tuning a BERT classification model using a generative model, together with a method that classifies tourist attractions with the trained classifier using review data crawled from Naver Maps. For evaluation, the accuracy of a BERT classifier trained on manually labeled data was compared with that of a BERT classifier trained on data produced by the generative model. The model trained on human labels reached a training accuracy of 0.93, while the model trained with the generative model reached 0.98, so the latter achieved higher training accuracy. When tested on real tourist attraction data, however, the two models showed no difference in classification accuracy. The study shows that handing labeling, the biggest drawback of supervised learning, to a generative model instead of human annotators makes training faster and simpler, and the proposed method is expected to help train other supervised natural language processing models that require labeled data.
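The abstract describes the pipeline only at a high level. The following is a minimal, hypothetical sketch of its two steps under stated assumptions: a `generate` callable stands in for the sentence generation model, a `predict` callable stands in for the fine-tuned BERT classifier, and an attraction's category is taken as the majority label over its reviews (the abstract does not say how review-level predictions are aggregated). The category names, the keyword stub, and all function names are illustrative, not taken from the thesis, and the BERT fine-tuning step itself is omitted.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical category set for illustration; the abstract does not list
# the categories used in the thesis.
CATEGORIES = ["nature", "history", "activity"]


def build_synthetic_dataset(
    generate: Callable[[str, int], List[str]],
    categories: Iterable[str],
    per_category: int,
) -> List[Tuple[str, str]]:
    """Collect (sentence, label) pairs from a sentence generator.

    `generate(category, n)` stands in for the generative model: it returns n
    review-like sentences written for `category`. Each sentence takes the
    category it was generated for as its label, so no human labeling is needed.
    """
    dataset: List[Tuple[str, str]] = []
    for category in categories:
        for sentence in generate(category, per_category):
            dataset.append((sentence, category))
    return dataset


def classify_attraction(
    reviews: Iterable[str],
    predict: Callable[[str], str],
) -> Tuple[str, Dict[str, float]]:
    """Assign one category to an attraction by majority vote over its reviews.

    `predict(review)` stands in for the fine-tuned classifier. Returns the most
    frequent label and the share of reviews per label. Ties go to the label
    predicted first, because Counter.most_common orders equal counts by
    first occurrence.
    """
    counts = Counter(predict(review) for review in reviews)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reviews to classify")
    shares = {label: n / total for label, n in counts.items()}
    top_label, _ = counts.most_common(1)[0]
    return top_label, shares


def keyword_predict(review: str) -> str:
    """Toy keyword rule standing in for a trained classifier (demo only)."""
    text = review.lower()
    if "forest" in text or "hike" in text:
        return "nature"
    if "temple" in text or "palace" in text:
        return "history"
    return "activity"


if __name__ == "__main__":
    # A lambda plays the role of the generative model here.
    data = build_synthetic_dataset(
        lambda c, n: [f"{c} sample {i}" for i in range(n)], CATEGORIES, 2
    )
    print(len(data), data[0])
    reviews = [
        "Great forest trail",
        "Nice hike up the ridge",
        "The old temple was quiet",
    ]
    print(classify_attraction(reviews, keyword_predict))
```

In a real setup, the synthetic pairs would be used to fine-tune the classifier, and `predict` would call that model; majority voting is only one plausible aggregation, and the per-label shares could instead be kept as a soft profile of the attraction.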

URI: http://hanyang.dcollection.net/common/orgView/200000720738 https://repository.hanyang.ac.kr/handle/20.500.11754/188381

Appears in Collections: GRADUATE SCHOOL[S] (Graduate School) > COMPUTER SCIENCE (Department of Computer Science and Software) > Theses (Master)