An enhanced 3DCNN‐ConvLSTM for spatiotemporal multimedia data analysis

Abstract : Human action recognition remains a challenging and complex task in computer vision. Combining a CNN with an RNN is a common and effective network structure for this task. Here, we use a 3DCNN for the CNN part and a ConvLSTM for the RNN part. We divide each video into multiple temporal segments of equal length and compress each segment into a single feature map with a pooling layer. Our main contribution is the addition of pooling, dropout, and batch normalization layers to the ConvLSTM. We test our model on the KTH, UCF‐11, and HMDB51 datasets and achieve high action recognition accuracy.
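
The segment-and-pool step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `segment_and_pool`, the `num_segments` parameter, the array layout (T, H, W, C), and the choice of average pooling over time are all assumptions made for illustration, since the abstract only states that each segment is compressed by a pooling layer.

```python
import numpy as np

def segment_and_pool(video, num_segments):
    """Split a clip into near-equal temporal segments and pool each into one map.

    video: array of shape (T, H, W, C), assumed layout, with T >= num_segments.
    Returns an array of shape (num_segments, H, W, C).
    """
    # np.array_split also handles T that is not divisible by num_segments.
    segments = np.array_split(video, num_segments, axis=0)
    # Assumption: average pooling along the time axis; max pooling
    # (seg.max(axis=0)) would be an equally plausible reading.
    return np.stack([seg.mean(axis=0) for seg in segments])

# Toy clip: 8 frames of 2x2 pixels with one channel.
clip = np.arange(8 * 2 * 2 * 1, dtype=np.float32).reshape(8, 2, 2, 1)
pooled = segment_and_pool(clip, num_segments=4)
print(pooled.shape)
```

In a full model, the same operation would more likely be applied to intermediate feature maps than to raw frames; the sketch only shows the temporal bookkeeping.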
Document type :
Journal articles

https://hal-utt.archives-ouvertes.fr/hal-02297518
Contributor : Jean-Baptiste Vu Van
Submitted on : Thursday, September 26, 2019 - 11:18:38 AM
Last modification on : Friday, July 17, 2020 - 8:32:02 PM

Collections

ROSAS | CNRS | UTT

Citation

Tian Wang, Jiakun Li, Mengyi Zhang, Aichun Zhu, Hichem Snoussi, et al. An enhanced 3DCNN‐ConvLSTM for spatiotemporal multimedia data analysis. Concurrency and Computation: Practice and Experience, Wiley, 2019, pp.e5302. ⟨10.1002/cpe.5302⟩. ⟨hal-02297518⟩
