Repository at Hanyang University: Swin Transformer for Real World Super Resolution Using Locally-enhanced Position Encoding

Title: Swin Transformer for Real World Super Resolution Using Locally-enhanced Position Encoding

Other Titles: Transformer-Based Super-Resolution Model Using Local Position Encoding (지역적 위치 인코딩을 사용한 트랜스포머 기반 초해상도 모델)

Author: 유동균

Alternative Author(s): 유동균

Advisor(s): 정제창

Issue Date: 2022. 2

Publisher: Hanyang University (한양대학교)

Degree: Master

Abstract: With recent advances in image-related technologies, both the resolution and the information content of images have increased. To make use of such images, they are often compressed or converted to a lower resolution. However, when a low-resolution image is converted back to a high-resolution one, restoration is difficult because pixel information has been lost. Super-resolution (SR) methods, which convert a low-resolution image into a high-resolution image, address this problem. Deep learning models based on convolutional neural networks (CNNs) perform remarkably well on the super-resolution task. However, these models do not perform well in real applications, where images contain blur, noise, and compression artifacts. To solve this problem, we use a degradation model that synthesizes a paired dataset. This acts as data augmentation and is effective at reducing blur and noise. The Transformer, long used in natural language processing, has recently been adopted in computer vision, where it has shown high performance in tasks such as image classification and object detection. We apply the shifted-window Transformer (Swin Transformer) module for computational efficiency. The Swin Transformer uses relative position encoding; our model instead uses locally-enhanced position encoding based on depth-wise convolution, because the relations and information between adjacent pixels are important for super-resolution. In addition, we train our model with a generative adversarial network (GAN) to obtain natural textures and sharp edges. We present an ablation study that measures the effect of each component using peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). The GAN-based model gives better results in visual comparison. Finally, we compare our model with state-of-the-art SR models through visual comparison.

Abstract (Korean version, in English translation): Recent advances in image-related technologies have increased the resolution and information content of images. To make use of these images, they are compressed or reduced in resolution. However, when a low-resolution image is converted to a high-resolution image, restoration is difficult because pixel values have been lost. Super-resolution techniques, which convert low-resolution images into high-resolution images, are used to address this problem. Deep learning models based on CNNs can restore high-resolution images from low-resolution ones. However, existing models have not been very effective on real imaging devices, where blur, noise, and video compression are present. To address this, this thesis first uses a degradation model to synthesize training data with artificial blur and noise, which provides a data augmentation effect and is highly effective for removing noise and blur. Recently, the Transformer, previously used in natural language processing, has been adopted in computer vision and has shown high performance in areas such as image classification and object detection. Accordingly, this thesis applies the Transformer module and implements the model with the Swin Transformer (Shifted window Transformer), which partitions the computation into windows. The original Swin Transformer uses relative position encoding, but because the relations and information between adjacent pixels matter more for a super-resolution model, the model is designed with local position encoding based on depth-wise convolution. The model is also trained with an adversarial network to obtain natural textures and sharp images. To examine the effect of each component, the thesis presents ablation study results that show numerical improvements in peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). With the GAN model, the objective scores are lower, but visual comparison, a subjective measure, shows sharp images with natural texture. Finally, the result images of the proposed model are compared with those of state-of-the-art (SOTA) models to confirm the performance of the proposed model.
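The locally-enhanced position encoding mentioned in the abstract can be illustrated with a minimal sketch. It assumes the formulation commonly used for this technique, in which a depth-wise convolution of the value tokens is added to the self-attention output within one window. This record does not give the thesis's actual architecture, kernel size, or window size, so the 3x3 kernel, the single-head layout, and all function names below are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def depthwise_conv3x3(v, kernels, h, w):
    """Depth-wise 3x3 convolution with zero padding.

    v       : h*w tokens in row-major order, each a list of c channel values
    kernels : c separate 3x3 kernels, one per channel (assumed size)
    """
    c = len(v[0])
    out = [[0.0] * c for _ in range(h * w)]
    for y in range(h):
        for x in range(w):
            for ch in range(c):
                acc = 0.0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            acc += kernels[ch][dy + 1][dx + 1] * v[yy * w + xx][ch]
                out[y * w + x][ch] = acc
    return out


def window_attention_lepe(q, k, v, kernels, h, w):
    """Single-head attention inside one h x w window, plus a local term.

    Output = softmax(Q K^T / sqrt(d)) V + DWConv(V).
    The convolution term depends only on neighbouring tokens, which is
    what makes the position encoding "locally enhanced".
    """
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    local = depthwise_conv3x3(v, kernels, h, w)
    out = []
    for i in range(n):
        scores = [scale * sum(q[i][t] * k[j][t] for t in range(d)) for j in range(n)]
        attn = softmax(scores)
        out.append([
            sum(attn[j] * v[j][ch] for j in range(n)) + local[i][ch]
            for ch in range(d)
        ])
    return out
```

In this sketch the position information comes only from the convolution over the value tokens, so no relative position bias table is needed. Whether the thesis also keeps a bias term is not stated in this record.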

URI: http://hanyang.dcollection.net/common/orgView/200000590799 https://repository.hanyang.ac.kr/handle/20.500.11754/167840

Appears in Collections: GRADUATE SCHOOL[S](대학원) > DEPARTMENT OF ELECTRONIC ENGINEERING(융합전자공학과) > Theses (Master)
