
Design and Implementation of a Deep Learning-Based Profanity Detection System

Title
딥 러닝 기반의 비속어 감지 시스템 설계 및 구현
Other Titles
Design and implementation of abuse sentence detecting system based on Deep Learning
Author
최준한
Alternative Author(s)
Choi, Jun Han
Advisor(s)
조인휘
Issue Date
2020-08
Publisher
Hanyang University
Degree
Master
Abstract
According to the Korea Internet Self-governance Organization (KISO), as of 2018 the profanity lists maintained by major Korean portals and similar services had reached 110,000 entries, and given the nature of language and Internet media, this number keeps growing and such expressions spread quickly. Responding with limited moderation staff to cyberbullying on games, portal sites, social media, and similar services is becoming increasingly difficult. Deep learning-based neural network models have recently reached meaningful performance in natural language processing. To address cyberbullying, this study examines BERT, one such deep learning-based neural network model, and proposes profanity prediction with it and a profanity detection system built on that prediction; it also covers the system's design and implementation and evaluates its performance. The results confirm that a well-trained neural network model works effectively for profanity detection. In the final performance evaluation, the model achieved 95.12% accuracy, and the implemented system judged profanity at an average of 1.73 seconds per item without GPU hardware acceleration. This accuracy and processing speed are sufficient for practical use in real services that respond to cyberbullying.

The flow of information on the Internet will continue to increase, and cyber violence will increase with it. In this paper, we propose an artificial intelligence processing system to deal with such cyber violence. Artificial intelligence technology has developed significantly in the last few years, and in natural language processing it is gradually approaching human-level processing ability. To address the cyber violence problem, we review prior studies on abusive sentence detection and study BERT, a Transformer-based deep learning model, to implement an artificial neural network model for slang detection. Among the implemented models, the one with the highest evaluation score is selected, and an abusive sentence detection system using that model is proposed and implemented. The implication of this research is that an artificial neural network model well trained on slang sentences works effectively for predictions that require grasping context (the evaluation target is 3,138 samples that did not take part in training). In the final performance evaluation, 2,985 cases were predicted correctly, for an accuracy of 95.12% (2,985/3,138).
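The reported accuracy is simply the fraction of the 3,138 held-out samples (not used in training) that the model classified correctly. A minimal Python sketch of that calculation follows; the function name `accuracy`, the binary label encoding, and the toy prediction lists are illustrative assumptions, not taken from the thesis.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Return the fraction of predictions that match the gold labels.

    Labels are assumed to be binary (1 = abusive, 0 = clean); this encoding
    is an illustrative assumption, not the thesis's own data format.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    # Booleans sum as 0/1, so this counts the matching positions.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Reproduce the reported figure: 2,985 correct out of 3,138 held-out samples.
TOTAL = 3138
CORRECT = 2985
preds = [1] * CORRECT + [0] * (TOTAL - CORRECT)  # hypothetical predictions
gold = [1] * TOTAL                               # hypothetical gold labels
print(f"{accuracy(preds, gold):.2%}")  # 95.12%
```

Only the counts matter for accuracy, so the toy lists merely reproduce the 2,985/3,138 ratio; they say nothing about which samples the real model got wrong.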
In addition, performance measurements of a system built to simulate a real service showed an average processing time of 1.73 seconds per sentence of length 196, which is fast enough for use in actual services. Previous studies on abusive sentence detection, although they differ in training-data domain and research methodology, reported 90.4% for a study using CNN and 86.42% to 92.05% for studies using RNN, LSTM, and GRU. Taken together, these results suggest that performance is gradually increasing as artificial neural network technology develops, and it is expected to improve further in the near future. In this study, when creating the training data, the same sentence (e.g., one containing a homonym) could be read differently when judging whether it contained slang, which sometimes made the judgment criteria ambiguous. Such ambiguity may affect the training and performance of the neural network. Clear standards, issued by an authorized body such as the Broadcasting and Communications Commission, for the scope of sentences that constitute cyber violence would greatly help research on cyber violence. Future research should apply more advanced artificial neural network theory to abusive sentence detection and be carried out in big data and high-performance computing environments to improve the performance of the prediction model.
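An average per-sentence processing time like the 1.73 seconds above is typically obtained by timing each detection call and averaging. The sketch below shows one way to do this with Python's standard `time.perf_counter`; `detect` is a hypothetical placeholder for the CPU-only BERT inference call (its keyword check exists only so the sketch runs), and treating "length 196" as 196 characters is an assumption, since the abstract does not say whether length is counted in characters or tokens.

```python
import statistics
import time
from typing import Callable, Iterable


def detect(sentence: str) -> bool:
    """Hypothetical stand-in for the BERT classifier (NOT the thesis model).

    A real system would tokenize the sentence and run the trained model here.
    """
    return "badword" in sentence.lower()


def mean_latency(fn: Callable[[str], bool], sentences: Iterable[str]) -> float:
    """Call fn once per sentence and return the mean wall-clock time in seconds."""
    timings = []
    for s in sentences:
        start = time.perf_counter()
        fn(s)
        timings.append(time.perf_counter() - start)
    if not timings:
        raise ValueError("no sentences to time")
    return statistics.mean(timings)


# Two 196-character inputs (length unit assumed to be characters).
sample = ["x" * 196, "badword " + "y" * 188]
print(f"mean latency: {mean_latency(detect, sample):.6f} s")
```

With the dummy `detect` the measured time is microseconds, not seconds; swapping in a real model call is what would make the figure comparable to the thesis's measurement.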
URI
https://repository.hanyang.ac.kr/handle/20.500.11754/153289
http://hanyang.dcollection.net/common/orgView/200000438440
Appears in Collections:
GRADUATE SCHOOL OF ENGINEERING[S] > ELECTRICAL ENGINEERING AND COMPUTER SCIENCE > Theses (Master)
Files in This Item:
There are no files associated with this item.
