
An Explainable Deep Learning Algorithm with the Improved LRP for Video Classification

Title
An Explainable Deep Learning Algorithm with the Improved LRP for Video Classification
Author
김택위
Alternative Author(s)
JIN ZEWEI
Advisor(s)
조인휘
Issue Date
2024. 2
Publisher
The Graduate School of Hanyang University (한양대학교 대학원)
Degree
Master
Abstract
JIN ZEWEI, Department of Computer Science, The Graduate School of Hanyang University. Supervisor: Professor Inwhee Joe.
Deep learning models typically behave as "black boxes." Their large number of parameters leads deep network models to perform highly nonlinear operations, and as a result the models are difficult to explain. This thesis introduces two video classification models based on deep networks. In the first model, a convolutional neural network serves as the encoder that extracts features from the video frames, and it is combined with an LSTM model that processes the time-series data and, as the decoder, generates the corresponding description. In the second model, a CNN (VGG16) is combined with an LSTM and trained on time-series images. For the explainability analysis we use Layer-Wise Relevance Propagation (LRP), an explanation method for neural networks. Building on the traditional LRP model, we treat the network as a whole as an information flow network and bring the calculation of mutual information from information theory into the explainability analysis. Positive-negative entropy is used to better preserve the activation characteristics of network nodes. Finally, to analyze the performance of the two video classification models and draw explainable conclusions about them, performance validation was conducted on the common video dataset UCF11. The classification accuracies are 75.94% for ConvLSTM and 92.50% for VGG16+LSTM. The experimental results show that the VGG16+LSTM classification model tends to base its classification on frames in the latter half of the video and on the last frame. For comparison, interpretability analysis was also performed with multiple LRP variant models.
We improve the traditional LRP model using the concept of mutual information from information theory. The mutual information computed between the positive-negative correlation activation and the overall correlation activation serves as the basis for further strengthening the correlation activation. The positive and negative activation terms in the original LRP relevance propagation equation are strengthened through parameter adjustment so that the positive-negative activation characteristics are better preserved. The improved model yields better explainability results in the time-series analysis of video data.
Keywords: Deep Learning, Video Classification, Explainability, LSTM, ConvLSTM
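The positive and negative terms of an LRP propagation equation can be illustrated with the standard LRP-αβ rule for a single fully connected layer. The sketch below is only that standard rule, written under stated assumptions: the abstract does not give the thesis's improved equation or how mutual information sets its parameters, so the function name `lrp_alpha_beta`, the fixed values `alpha=2.0` and `beta=1.0`, and the toy weights are all hypothetical and are not the author's method.

```python
def lrp_alpha_beta(activations, weights, relevance_out,
                   alpha=2.0, beta=1.0, eps=1e-9):
    """Redistribute output relevance onto inputs with the LRP-alpha-beta rule.

    activations:   list of input activations a_i
    weights:       weights[i][j] connects input i to output j (bias omitted)
    relevance_out: list of relevance values R_j, one per output neuron
    alpha, beta:   illustrative constants (assumption); alpha - beta must be 1
    """
    if abs(alpha - beta - 1.0) > 1e-12:
        raise ValueError("alpha - beta must equal 1")
    n_in = len(activations)
    relevance_in = [0.0] * n_in
    for j, r_j in enumerate(relevance_out):
        # Contribution of each input i to output neuron j.
        z = [activations[i] * weights[i][j] for i in range(n_in)]
        z_pos = sum(v for v in z if v > 0)  # total positive contribution
        z_neg = sum(v for v in z if v < 0)  # total negative contribution
        for i in range(n_in):
            # Share of input i in the positive and negative totals.
            pos_share = max(z[i], 0.0) / z_pos if z_pos > eps else 0.0
            neg_share = min(z[i], 0.0) / z_neg if z_neg < -eps else 0.0
            relevance_in[i] += (alpha * pos_share - beta * neg_share) * r_j
    return relevance_in


if __name__ == "__main__":
    # Toy layer: 3 inputs, 2 outputs. Each output receives both positive
    # and negative contributions, so total relevance is conserved here.
    a = [1.0, 2.0, 3.0]
    w = [[0.5, -1.0], [-0.25, 0.5], [1.0, -0.5]]
    r_out = [1.0, 0.5]
    r_in = lrp_alpha_beta(a, w, r_out)
    print(r_in, sum(r_in), sum(r_out))
```

Raising `alpha` (and `beta` with it) weights the positive and negative contributions more strongly against each other, which is the kind of parameter adjustment the abstract alludes to. Total relevance is conserved in this sketch only when every output neuron has both positive and negative contributions.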
URI
http://hanyang.dcollection.net/common/orgView/200000720655
https://repository.hanyang.ac.kr/handle/20.500.11754/188373
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > COMPUTER SCIENCE(컴퓨터·소프트웨어학과) > Theses (Master)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
