
Deep Learning-Based Pedestrian Detection under Various Illumination Conditions

Title
Deep Learning-Based Pedestrian Detection under Various Illumination Conditions
Author
윤판
Alternative Author(s)
윤판
Advisor(s)
Hyunchul Shin
Issue Date
2021. 2
Publisher
한양대학교
Degree
Doctor
Abstract
The objective of this research is to develop deep learning-based pedestrian detection techniques that work under various illumination conditions, such as daytime, nighttime, low light, shadows, overexposure, and total darkness. Most existing pedestrian detectors rely on good lighting conditions and are therefore likely to fail under adverse illumination. In this thesis, novel and efficient pedestrian detection techniques are proposed that perform well across various illumination conditions. To detect pedestrians under such conditions, this thesis applies deep learning for the effective fusion of the visible (VI) and infrared (IR) information in multispectral images. This thesis first proposes a multilayer fused convolutional neural network (MLF-CNN), which consists of a proposal generation stage and a detection stage. In the first stage, an MLF region proposal network is designed, and a summation fusion method is proposed for integrating two convolutional layers. This combination can detect pedestrians at different scales, even in adverse illumination conditions. Furthermore, instead of extracting features from a single layer, the proposed method extracts features from three feature maps and matches their scales using fused ROI pooling layers. This multiple-layer fusion technique significantly reduces the detection miss rate. Extensive evaluations demonstrate that the proposed MLF-CNN achieves competitive performance. Although the MLF-CNN achieves desirable results, some open problems remain, such as poor performance on small-sized pedestrians and the high computational cost of multispectral information fusion. To overcome these problems, this thesis further proposes a multilayer fused deconvolutional single-shot detector (MFDSSD) that contains a two-stream convolutional module (TCM) and a multilayer fused deconvolutional module (MFDM).
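The summation fusion described above combines the VI and IR convolutional streams by element-wise addition of their feature maps. A minimal numpy sketch of that operation (the function name and toy shapes are illustrative, not from the thesis):

```python
import numpy as np

def summation_fuse(vi_feat, ir_feat):
    """Element-wise summation fusion of visible and infrared feature maps.

    Both inputs are (channels, height, width) arrays produced by two
    parallel convolutional streams; their shapes must match.
    """
    assert vi_feat.shape == ir_feat.shape
    return vi_feat + ir_feat

# Toy feature maps standing in for convolutional-layer outputs.
vi = np.ones((4, 8, 8))           # visible-stream features
ir = np.full((4, 8, 8), 2.0)      # infrared-stream features
fused = summation_fuse(vi, ir)    # every element is 1.0 + 2.0 = 3.0
```

In practice, such a summed map would be passed through further convolutions so the network can learn how much each modality contributes.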
The TCM extracts convolutional features from the multispectral input images. Fusion blocks are then incorporated into the MFDM to combine high-level features, which carry rich semantic information, with low-level features, which carry detailed information, generating features with strong representational power for small pedestrian instances. In addition, the multispectral information is fused at multiple deconvolutional layers in the MFDM via these fusion blocks. This multilayer fusion strategy adaptively makes the most of the VI and IR information, and using fusion blocks for multilayer fusion reduces the extra computational cost and redundant parameters. Empirical experiments show that the proposed approach achieves the best performance in both detection accuracy and detection speed. On the KAIST multispectral pedestrian dataset, for example, the proposed method achieves a 7.09% miss rate at a 20 fps detection speed, outperforming the best published method by 4.35% in miss rate while being three times faster. In a total-darkness environment, multispectral detectors may not work well, since the VI sensor produces useful information only when there is sufficient visible light in the environment. In addition, multispectral detectors require fully aligned VI and IR images as inputs. IR sensors do not require external light but instead rely mainly on the radiant temperature of the object. Therefore, an IR camera system that uses a novel attention-guided encoder-decoder convolutional neural network (AED-CNN) is proposed to identify pedestrians in total darkness. In the AED-CNN, encoder-decoder modules are introduced to generate multi-scale features, and new skip connection blocks are incorporated into the decoder to combine feature maps from the encoder and decoder modules. This architecture increases context information, which helps extract discriminative features from low-resolution and noisy IR images.
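Both the fusion blocks in the MFDM and the skip connection blocks above follow the same pattern: a coarse, semantically rich map is upsampled and combined with a finer, detailed map. A minimal numpy sketch of that pattern, using nearest-neighbour upsampling in place of a learned deconvolution and summation in place of the thesis's learned fusion blocks (names and shapes are illustrative):

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling, a stand-in for a learned
    deconvolution (transposed convolution) layer."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fusion_block(low_level, high_level):
    """Combine a detailed low-level map with an upsampled semantic
    high-level map by element-wise summation (one simple choice;
    the actual fusion blocks are learned modules)."""
    up = upsample2x(high_level)
    assert up.shape == low_level.shape
    return low_level + up

low = np.ones((8, 16, 16))        # fine-grained, detailed features
high = np.full((8, 8, 8), 0.5)    # coarse, semantically rich features
out = fusion_block(low, high)     # (8, 16, 16), every element 1.5
```

The benefit for small pedestrians comes from the upsampled semantic context being aligned, location by location, with the high-resolution detail map.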
Furthermore, an attention module is proposed to re-weight the multi-scale features generated by the encoder-decoder module. The attention mechanism effectively highlights pedestrians while suppressing background interference. Empirical experiments demonstrate that the proposed method achieves the best performance. For example, the proposed AED-CNN achieves an average precision of 87.68% on the Computer Vision Center (CVC)-09 pedestrian dataset, improving on the state-of-the-art method by 23.78%.
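The attention-based re-weighting described above can be sketched as multiplying a feature map by a per-location mask in (0, 1); in the AED-CNN the mask is learned, whereas here it is simply a sigmoid over given scores (function names and shapes are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_reweight(feat, att_logits):
    """Re-weight a (channels, H, W) feature map with a per-location
    attention mask. att_logits holds (H, W) scores; the sigmoid maps
    them to weights in (0, 1) that can suppress background regions
    and emphasise pedestrian regions."""
    mask = sigmoid(att_logits)       # (H, W) weights in (0, 1)
    return feat * mask[None, :, :]   # broadcast the mask over channels

feat = np.ones((4, 8, 8))            # multi-scale features (toy values)
logits = np.zeros((8, 8))            # sigmoid(0) = 0.5 everywhere
out = attention_reweight(feat, logits)
```

High logits would push weights toward 1 (keep the feature), strongly negative logits toward 0 (suppress it), which is how such a mask highlights pedestrians against background clutter.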
URI
https://repository.hanyang.ac.kr/handle/20.500.11754/159319
http://hanyang.dcollection.net/common/orgView/200000485427
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > DEPARTMENT OF ELECTRICAL AND ELECTRONIC ENGINEERING(전자공학과) > Theses (Ph.D.)
Files in This Item:
There are no files associated with this item.
