Effects of Outlier Dimensions on Contextual Information in BERT

Author
심유라
Advisor(s)
김영훈
Issue Date
February 2023
Publisher
한양대학교 (Hanyang University)
Degree
Master
Abstract
Recently, Transformer-based networks have shown excellent performance in natural language processing. BERT, a language model created by partially modifying the Transformer architecture, is used in many natural language processing tasks and has produced satisfactory results. BERT encodes contextual information into its embedding vectors through self-attention, which makes the embedding vectors of related words more similar; this can be observed through the cosine similarity between words. In the initial layers of BERT, the cosine similarity between words is low because little contextual information has been added, but it increases toward the final layers as self-attention injects more contextual information. However, it has been shown that a few outlier dimensions of the embedding vectors dominate these high cosine similarity values. In this study, we first visualized whether BERT actually encodes contextual information by clustering the word embedding vectors at each layer. Then, to examine how the outlier dimensions that govern cosine similarity affect BERT's encoding of contextual information, we clustered the word embedding vectors before and after zeroing out the outlier dimensions and compared the two results.
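
The layer-wise analysis described in the abstract can be approximated with the Hugging Face transformers library. The following is a minimal sketch, not the exact pipeline used in the thesis: the example sentence, the choice of bert-base-uncased, and the 3-standard-deviation criterion for flagging outlier dimensions are illustrative assumptions. It extracts hidden states at every layer, measures the average pairwise cosine similarity between token embeddings, and repeats the measurement after zeroing out the flagged outlier dimensions.

# Minimal sketch (not the thesis' exact pipeline): per-layer average pairwise
# cosine similarity of BERT token embeddings, before and after zeroing the
# dimensions whose mean absolute activation is unusually large ("outlier
# dimensions"). The 3-sigma criterion and the sentence are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

sentence = "The bank raised interest rates while the river bank flooded."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).hidden_states  # embedding layer + 12 layers

def mean_pairwise_cosine(embeddings):
    # embeddings: (num_tokens, hidden_size)
    normed = torch.nn.functional.normalize(embeddings, dim=-1)
    sims = normed @ normed.T
    n = sims.shape[0]
    off_diag = sims[~torch.eye(n, dtype=torch.bool)]  # drop self-similarities
    return off_diag.mean().item()

for layer_idx, layer in enumerate(hidden_states):
    tokens = layer[0]  # (num_tokens, hidden_size)

    # Flag outlier dimensions: mean |activation| more than 3 std above average.
    mean_abs = tokens.abs().mean(dim=0)
    threshold = mean_abs.mean() + 3 * mean_abs.std()
    outliers = (mean_abs > threshold).nonzero(as_tuple=True)[0]

    cleared = tokens.clone()
    cleared[:, outliers] = 0.0  # zero out the outlier dimensions

    print(f"layer {layer_idx:2d}: "
          f"cosine={mean_pairwise_cosine(tokens):.3f}  "
          f"without outliers={mean_pairwise_cosine(cleared):.3f}  "
          f"outlier dims={outliers.tolist()}")

In this sketch, the rise in average cosine similarity toward the final layers, and its change once the flagged dimensions are zeroed, mirrors the before/after comparison described in the abstract; the thesis additionally clusters the per-layer embeddings, which could be done on the same matrices with an off-the-shelf clustering library.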
URI
http://hanyang.dcollection.net/common/orgView/200000650804
https://repository.hanyang.ac.kr/handle/20.500.11754/179798
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > APPLIED ARTIFICIAL INTELLIGENCE(인공지능융합학과) > Theses(Master)