Effects of Outlier Dimensions on Contextual Information in BERT

Author
심유라
Advisor(s)
김영훈
Issue Date
February 2023
Publisher
한양대학교 (Hanyang University)
Degree
Master
Abstract
Recently, Transformer-based networks have shown excellent performance in natural language processing. BERT, a language model created by partially modifying the Transformer architecture, is used in many natural language processing tasks and has produced satisfactory results. BERT encodes contextual information into its embedding vectors through self-attention, which makes the embedding vectors of related words more similar; this can be observed through the cosine similarity between words. In the initial layers of BERT, the cosine similarity between words is low because little contextual information has been added, but it increases toward the final layers as self-attention injects more contextual information. However, it has been shown that a few outlier dimensions of the embedding vectors dominate these high cosine similarity values. In this study, we first visualized whether BERT actually encodes contextual information by clustering the word embedding vectors at each layer. Then, to examine how the outlier dimensions that govern cosine similarity affect BERT's encoding of contextual information, we clustered the word embedding vectors before and after zeroing out the outlier dimensions and compared the two results.
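
The layer-wise analysis described in the abstract can be approximated with the Hugging Face transformers library. The following is a minimal sketch, not the exact pipeline used in the thesis: the example sentence, the choice of bert-base-uncased, and the 3-standard-deviation criterion for flagging outlier dimensions are illustrative assumptions. It extracts hidden states at every layer, measures the average pairwise cosine similarity between token embeddings, and repeats the measurement after zeroing out the flagged outlier dimensions.

# Minimal sketch (not the thesis' exact pipeline): per-layer average pairwise
# cosine similarity of BERT token embeddings, before and after zeroing the
# dimensions whose mean absolute activation is unusually large ("outlier
# dimensions"). The 3-sigma criterion and the sentence are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

sentence = "The bank raised interest rates while the river bank flooded."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).hidden_states  # embedding layer + 12 layers

def mean_pairwise_cosine(embeddings):
    # embeddings: (num_tokens, hidden_size)
    normed = torch.nn.functional.normalize(embeddings, dim=-1)
    sims = normed @ normed.T
    n = sims.shape[0]
    off_diag = sims[~torch.eye(n, dtype=torch.bool)]  # drop self-similarities
    return off_diag.mean().item()

for layer_idx, layer in enumerate(hidden_states):
    tokens = layer[0]  # (num_tokens, hidden_size)

    # Flag outlier dimensions: mean |activation| more than 3 std above average.
    mean_abs = tokens.abs().mean(dim=0)
    threshold = mean_abs.mean() + 3 * mean_abs.std()
    outliers = (mean_abs > threshold).nonzero(as_tuple=True)[0]

    cleared = tokens.clone()
    cleared[:, outliers] = 0.0  # zero out the outlier dimensions

    print(f"layer {layer_idx:2d}: "
          f"cosine={mean_pairwise_cosine(tokens):.3f}  "
          f"without outliers={mean_pairwise_cosine(cleared):.3f}  "
          f"outlier dims={outliers.tolist()}")

In this sketch, the rise in average cosine similarity toward the final layers, and its change once the flagged dimensions are zeroed, mirrors the before/after comparison described in the abstract; the thesis additionally clusters the per-layer embeddings, which could be done on the same matrices with an off-the-shelf clustering library.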
URI
http://hanyang.dcollection.net/common/orgView/200000650804
https://repository.hanyang.ac.kr/handle/20.500.11754/179798
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > APPLIED ARTIFICIAL INTELLIGENCE(인공지능융합학과) > Theses(Master)