Transformer Encoder Attention Module

Title
Transformer Encoder Attention Module
Author
박해성
Alternative Author(s)
Hae Sung Park
Advisor(s)
최용석
Issue Date
2023-08
Publisher
Hanyang University
Degree
Master
Abstract
Recent work in video understanding relies on self-supervised video pre-training, which aims to help transformers capture the main context of a video. In particular, VideoMAE (Video Masked Autoencoder) pre-trains by tube masking at a high ratio followed by reconstruction. This pre-training lets the model learn the core characteristics of each video and yields strong performance across various video datasets. During the action recognition stage, however, VideoMAE takes full video frames as input; these inherently contain temporal redundancy, which makes the model spatially biased. To address this issue, we propose a novel module, the Transformer Encoder Attention Module (TEAM), which prevents the model from becoming spatially biased and enhances its context modeling ability. TEAM first identifies the core features among all features extracted from each video. It then determines which parts of the video contain those features and encourages the model to focus on these informative parts. In this way, the module guides VideoMAE toward what is important in each video and where it is located, which enables the model to capture more accurate spatio-temporal context. We conduct extensive experiments to find the structure that best integrates TEAM with VideoMAE. The integrated model (VideoMAE+TEAM) outperforms the original VideoMAE on Something-Something-V2 by 1.0 percentage point (71.3% vs. 70.3%). Finally, qualitative results show that the module encourages the model to disregard noise and focus on essential video features, capturing more precise context from each video.
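The "what, then where" ordering described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not specify the layers TEAM uses, so the function name `what_then_where`, the mean pooling, and the sigmoid gates are all assumptions made for illustration, and the learned projections a real module would presumably contain are omitted. Tokens stand in for the patch features a video transformer encoder would produce.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def what_then_where(tokens):
    """Hypothetical two-stage gating over token features.

    tokens: list of N token vectors, each a list of C floats.
    Returns (gated tokens, per-channel gates, per-token gates).
    """
    n, c = len(tokens), len(tokens[0])
    # "What": one gate per feature channel, from that channel's mean over all tokens.
    channel_gate = [sigmoid(sum(t[j] for t in tokens) / n) for j in range(c)]
    reweighted = [[t[j] * channel_gate[j] for j in range(c)] for t in tokens]
    # "Where": one gate per token, from the mean of its reweighted channels.
    token_gate = [sigmoid(sum(t) / c) for t in reweighted]
    out = [[v * g for v in t] for t, g in zip(reweighted, token_gate)]
    return out, channel_gate, token_gate

# Toy input: channel 0 is active in the first two tokens; the third token is empty.
tokens = [[2.0, 0.0], [2.0, 0.0], [0.0, 0.0]]
out, channel_gate, token_gate = what_then_where(tokens)
```

In this toy input the active channel receives a larger gate than the inactive one, and the tokens that carry it receive larger gates than the empty token, which mirrors the idea of first selecting features and then locating them.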
URI
http://hanyang.dcollection.net/common/orgView/200000684849
https://repository.hanyang.ac.kr/handle/20.500.11754/187089
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > ARTIFICIAL INTELLIGENCE(인공지능학과) > Theses(Master)
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.