Repository at Hanyang University: 행렬 인수 분해를 이용한 추천 시스템 비교 분석



Title: 행렬 인수 분해를 이용한 추천 시스템 비교 분석

Other Titles: Comparative analysis of the recommendation system using matrix factorization

Author: 정지웅

Alternative Author(s): JeeWoong Jeong

Advisor(s): 최정순

Issue Date: 2022. 2

Publisher: Hanyang University

Degree: Master

Abstract: Most companies are keenly interested in identifying customer needs and recommending suitable items (or services). In the era of the Fourth Industrial Revolution, personalized recommendation systems built on big data make it easier for customers to find the items they want, which can raise customer satisfaction and increase sales. A recommendation system recommends items to a customer using that customer's preference data together with the preference data of other people: it first finds people whose preference patterns resemble the customer's, then recommends items that those people rated positively or showed interest in. This process is called collaborative filtering. The data for collaborative filtering are the ratings that customers assign to items, which can be arranged as a matrix whose rows, columns, and entries represent customers, items, and ratings, respectively. Because each customer typically rates only a small fraction of all items, the number of observed ratings is very small relative to the number of matrix entries, so the matrix is sparse. Matrix factorization is a type of collaborative filtering that assigns a latent factor vector to each user and each item, collects these vectors into user and item latent factor matrices, and treats each entry of the product of the two matrices as the corresponding rating. This study compares the performance of four matrix factorization models to examine whether adding constraints on the range of ratings and latent factors is effective in a recommendation system, and whether models that assume a distribution differ from models that do not. The four models are Funk Matrix Factorization (FMF), Gaussian Matrix Factorization (GMF), Regularized Single-Element-Based Nonnegative Matrix Factorization (RSNMF), and Sum Conditioned Poisson Factorization (SCPF). Applied to the MovieLens, Book Crossing, and Jester data sets, which contain ratings of movies, books, and jokes, respectively, the models with no constraints on the latent factors (FMF, GMF) performed better and more stably than the models constrained to the nonnegative range (RSNMF, SCPF). The models that assume a distribution (GMF, SCPF) and those that do not (FMF, RSNMF) performed differently depending on the data.
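
To make the factorization idea in the abstract concrete, here is a minimal sketch of an unconstrained, Funk-style matrix factorization fitted by stochastic gradient descent on the observed entries of a sparse rating matrix. It is an illustration under stated assumptions, not the thesis's implementation: the function names (`funk_mf`, `rmse`), the hyperparameters (`k`, `lr`, `reg`, `epochs`), and the toy ratings are all hypothetical, and the other three models (GMF, RSNMF, SCPF) are not covered.

```python
import numpy as np


def funk_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Fit user factors P and item factors Q so that P[u] @ Q[i] approximates r.

    `ratings` is a list of (user_index, item_index, rating) triples, i.e. only
    the observed entries of the sparse rating matrix. All hyperparameter
    defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))  # user latent factor matrix
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item latent factor matrix
    for _ in range(epochs):
        # Visit the observed ratings in a fresh random order each epoch.
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - P[u] @ Q[i]  # prediction error for this entry
            p_old = P[u].copy()    # keep the old user vector for the item update
            # Gradient step on the squared error with L2 regularization.
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error over the given (user, item, rating) triples."""
    errors = [r - P[u] @ Q[i] for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errors))))


# Hypothetical toy data: 3 users, 3 items, 6 observed ratings out of 9 entries.
toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = funk_mf(toy_ratings, n_users=3, n_items=3)
train_rmse = rmse(toy_ratings, P, Q)
predicted = P @ Q.T  # dense matrix of predicted ratings, including unobserved entries
```

The latent factors here are unconstrained real numbers, which corresponds to the FMF-style setting described above; a nonnegative variant would additionally have to keep every entry of `P` and `Q` at or above zero.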

URI: http://hanyang.dcollection.net/common/orgView/200000593035 https://repository.hanyang.ac.kr/handle/20.500.11754/167897

Appears in Collections: GRADUATE SCHOOL[S](대학원) > APPLIED STATISTICS(응용통계학과) > Theses (Master)
