Two Stream Self-supervised Learning for Action Recognition


Abstract

We present a self-supervised approach that exploits spatiotemporal signals between video frames for action recognition. A two-stream architecture is leveraged to entangle spatial and temporal representation learning. Our task is formulated as both a sequence verification task and a spatiotemporal alignment task. The former requires an understanding of temporal motion structure, while the latter couples the learned motion with the spatial representation. The effectiveness of the self-supervised pre-trained weights is validated on the action recognition task. Quantitative evaluation demonstrates the competence of the self-supervised approach on three datasets: HMDB51, UCF101, and the Honda driving dataset (HDD). Further investigation is still required to boost performance and generalize validity.
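As a rough illustration of the sequence-verification pretext task described above, the sketch below generates self-supervised training pairs from a video's frame indices: tuples in correct temporal order are labeled positive, shuffled tuples negative. This is a minimal stdlib-only sketch; the function name, tuple length, and sampling scheme are assumptions, not the paper's exact formulation.

```python
import random

def make_verification_pairs(frame_ids, n_pairs, seed=0):
    """Generate (frame_tuple, label) pairs for a sequence-verification
    pretext task. label 1 = correct temporal order, label 0 = shuffled.
    NOTE: hypothetical helper for illustration, not the paper's code.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Sample 3 distinct positions and keep them in temporal order.
        idx = sorted(rng.sample(range(len(frame_ids)), 3))
        ordered = [frame_ids[i] for i in idx]
        if rng.random() < 0.5:
            # Positive sample: frames kept in their true order.
            pairs.append((tuple(ordered), 1))
        else:
            # Negative sample: permute until the order is broken.
            shuffled = ordered[:]
            while shuffled == ordered:
                rng.shuffle(shuffled)
            pairs.append((tuple(shuffled), 0))
    return pairs
```

A verification network would then be trained to predict the binary label from the sampled frames, forcing it to learn temporal structure without manual annotation.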

Details

PUBLISHED IN
Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018
PUBLICATION DATE
18 June 2018
AUTHORS
Ahmed Taha, Moustafa Meshry, Xitong Yang, Teruhisa Misu, Yi-Ting Chen, Larry Davis