
Self-Supervised Learning from Non-object Centric Images with a Geometric Transformation Sensitive Architecture

Author
김태호
Advisor(s)
이종민
Issue Date
August 2023
Publisher
Hanyang University
Degree
Master
Abstract
The prevailing paradigm in invariance-based self-supervised learning is to pretrain on object-centric datasets (ImageNet being a prominent example), with the goal of extracting features that are invariant to various transformations. This approach runs into difficulties when the images are not object-centric: the semantics of such images can change significantly under cropping, and as the model becomes increasingly invariant to geometric transformations, its ability to capture location-specific information is compromised.

To address these challenges, we introduce a model architecture we term the Geometric Transformation Sensitive Architecture (GTSA). Unlike previous approaches, the GTSA is deliberately designed to remain sensitive to geometric transformations, with particular emphasis on four-fold rotation, random cropping, and multi-cropping. We impart this sensitivity by training the student model to predict rotations and by employing targets that adjust in accordance with the applied transformations, obtained by pooling and rotating the teacher feature map. In addition, we incorporate a patch correspondence loss that encourages the model to establish correspondences between patches with similar features. This enables the model to capture long-range dependencies more effectively than the local-to-global correspondence commonly promoted by models trained to disregard multi-cropping.

To validate our methodology, we conduct a comprehensive ablation study examining the roles of rotation prediction and the patch correspondence loss in the performance of our model.
Our findings confirm that both components contribute significantly to the improved performance. When pretrained on non-object-centric images, the GTSA outperforms competing methods that train their models to ignore geometric transformations, surpassing the DINO baseline on image classification, semantic segmentation, detection, and instance segmentation, with gains of 4.9 points in Top-1 accuracy and of 3.3, 3.4, and 2.7 points on the latter three tasks, respectively.
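The two mechanisms summarized above can be illustrated with a short sketch: instead of forcing the target to be invariant, the teacher feature map is rotated by the same four-fold rotation applied to the student's view, keeping the regression target spatially aligned; and a patch correspondence loss matches each student patch to its most similar teacher patch. This is an illustrative NumPy sketch under assumed names and a simplified loss form, not the thesis implementation.

```python
import numpy as np

def equivariant_target(teacher_fmap, k):
    """Rotate the teacher feature map (H, W, C) by the same four-fold
    rotation (k * 90 degrees) that was applied to the student's input,
    so the target adjusts with the transformation rather than being
    made invariant to it. Hypothetical helper, not the thesis code."""
    return np.rot90(teacher_fmap, k=k, axes=(0, 1))

def patch_correspondence_loss(student_patches, teacher_patches):
    """Simplified patch correspondence loss: match each student patch
    (rows of an (N, D) array) to its most similar teacher patch by
    cosine similarity and penalize the distance to that match."""
    s = student_patches / np.linalg.norm(student_patches, axis=1, keepdims=True)
    t = teacher_patches / np.linalg.norm(teacher_patches, axis=1, keepdims=True)
    sim = s @ t.T                        # (Ns, Nt) cosine similarities
    best = sim.argmax(axis=1)            # best-matching teacher patch per student patch
    return float(np.mean(1.0 - sim[np.arange(len(s)), best]))
```

For example, if the student sees a view rotated by 90 degrees (`k=1`), `equivariant_target(fmap, 1)` produces a target aligned with that view, and the correspondence loss is zero when student and teacher patch features coincide.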
URI
http://hanyang.dcollection.net/common/orgView/200000683743
https://repository.hanyang.ac.kr/handle/20.500.11754/186721
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > ARTIFICIAL INTELLIGENCE(인공지능학과) > Theses(Master)
Files in This Item:
There are no files associated with this item.



