Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 임종우 (Lim, Jongwoo) | - |
dc.date.accessioned | 2021-03-31T01:54:38Z | - |
dc.date.available | 2021-03-31T01:54:38Z | - |
dc.date.issued | 2020-01 | - |
dc.identifier.citation | INTERNATIONAL JOURNAL OF COMPUTER VISION, v. 128, no. 1, pp. 96-120 | en_US |
dc.identifier.issn | 0920-5691 | - |
dc.identifier.issn | 1573-1405 | - |
dc.identifier.uri | https://link.springer.com/article/10.1007%2Fs11263-019-01212-1 | - |
dc.identifier.uri | https://repository.hanyang.ac.kr/handle/20.500.11754/160988 | - |
dc.description.abstract | Multi-face tracking in unconstrained videos is a challenging problem, as the faces of one person can appear drastically different across shots due to significant variations in scale, pose, expression, illumination, and make-up. Existing multi-target tracking methods often use low-level features that are not sufficiently discriminative for identifying faces under such large appearance variations. In this paper, we tackle this problem by learning discriminative, video-specific face representations using convolutional neural networks (CNNs). Unlike existing CNN-based approaches, which are trained only on large-scale face image datasets offline, we automatically generate a large number of training samples using the contextual constraints of a given video, and further adapt the pre-trained face CNN to the characters in that video using the discovered training samples. The embedding feature space is fine-tuned so that the Euclidean distance in the space corresponds to semantic face similarity. To this end, we devise a symmetric triplet loss function that optimizes the network more effectively than the conventional triplet loss. With the learned discriminative features, we apply an EM clustering algorithm to link tracklets across multiple shots and generate the final trajectories. We extensively evaluate the proposed algorithm on two sets of TV sitcoms and YouTube music videos, analyze the contribution of each component, and demonstrate significant performance improvement over existing techniques. | en_US |
dc.description.sponsorship | The work is supported by National Basic Research Program of China (973 Program, 2015CB351705), National Key Research and Development Program of China (2017YFA0700805), NSFC (61703344), Office of Naval Research (N0014-16-1-2314), Ministry of Science and ICT of Korea (NRF-2017R1A2B4011928 and Next-Generation Information Computing Development program NRF-2017M3C4A7069369), NSF CRII (1755785), NSF CAREER (1149783) and gifts from Adobe, Panasonic, NEC, and NVIDIA. | en_US |
dc.language.iso | en | en_US |
dc.publisher | SPRINGER | en_US |
dc.subject | Face tracking | en_US |
dc.subject | Transfer learning | en_US |
dc.subject | Convolutional neural networks | en_US |
dc.subject | Triplet loss | en_US |
dc.title | Tracking Persons-of-Interest via Unsupervised Representation Adaptation | en_US |
dc.type | Article | en_US |
dc.relation.no | 1 | - |
dc.relation.volume | 128 | - |
dc.identifier.doi | 10.1007/s11263-019-01212-1 | - |
dc.relation.page | 96-120 | - |
dc.relation.journal | INTERNATIONAL JOURNAL OF COMPUTER VISION | - |
dc.contributor.googleauthor | Zhang, Shun | - |
dc.contributor.googleauthor | Huang, Jia-Bin | - |
dc.contributor.googleauthor | Lim, Jongwoo | - |
dc.contributor.googleauthor | Gong, Yihong | - |
dc.contributor.googleauthor | Wang, Jinjun | - |
dc.contributor.googleauthor | Ahuja, Narendra | - |
dc.contributor.googleauthor | Yang, Ming-Hsuan | - |
dc.relation.code | 2020053839 | - |
dc.sector.campus | S | - |
dc.sector.daehak | COLLEGE OF ENGINEERING[S] | - |
dc.sector.department | DEPARTMENT OF COMPUTER SCIENCE | - |
dc.identifier.pid | jlim | - |
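The abstract above contrasts the paper's symmetric triplet loss with the conventional triplet loss used to fine-tune an embedding so that Euclidean distance reflects face similarity. As background, the following is a minimal sketch of the *conventional* triplet loss only (the paper's symmetric variant is not reproduced here, since its exact form is not given in this record); the function name, vectors, and margin value are illustrative assumptions, not from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Conventional triplet loss (illustrative sketch).

    Pulls the anchor embedding toward the positive (same identity)
    and pushes it away from the negative (different identity) until
    the squared-distance gap exceeds `margin`.
    """
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to same-identity sample
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to different-identity sample
    return max(0.0, d_pos - d_neg + margin)   # zero once the gap is large enough

# Toy 2-D embeddings (hypothetical values, for illustration only)
a = np.array([0.0, 0.0])   # anchor
p = np.array([0.0, 1.0])   # positive: squared distance 1
n = np.array([3.0, 0.0])   # negative: squared distance 9

loss = triplet_loss(a, p, n, margin=1.0)  # 1 - 9 + 1 < 0, so the loss is 0
```

With the gap already larger than the margin, the triplet contributes no gradient; swapping the positive and negative roles yields a large loss, which is what drives the embedding apart. Per the abstract, the paper's symmetric variant modifies this formulation to optimize the network more effectively.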
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.