Si et al. proposed a deep architecture called AGC-LSTM that incorporates an attention mechanism. AGC-LSTM can extract discriminative features from spatio-temporal dynamics and explore the co-occurrence relationship between the spatial and temporal domains. This increases the architecture's capacity to learn high-level semantic representations by selecting discriminative spatial information at …, 2019.

S. Abu-El-Haija, YouTube-8M: A large-scale video classification benchmark, 2016.

J. K. Aggarwal and Q. Cai, Human motion analysis: A review, Computer Vision and Image Understanding, vol.73, pp.428-440, 1999.

J. K. Aggarwal and L. Xia, Human activity recognition from 3D data: A review, Pattern Recognition Letters, vol.48, pp.70-80, 2014.

U. Ahsan, C. Sun, and I. Essa, DiscrimNet: Semi-supervised action recognition from videos using generative adversarial networks, 2018.

A. Alfaro, D. Mery, and A. Soto, Action recognition in video using sparse coding and relative features, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2688-2697, 2016.

K. H. Ali and T. Wang, Learning features for action recognition and identity with deep belief networks, International Conference on Audio, Language and Image Processing (ICALIP), pp.129-132, 2014.

M. S. Aliakbarian, Encouraging LSTMs to anticipate actions very early, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.280-289, 2017.

N. S. Altman, An introduction to kernel and nearest-neighbor nonparametric regression, The American Statistician, vol.46, pp.175-185, 1992.

ASUS, ASUS Xtion Pro depth sensor, 2018.

M. Baccouche, Sequential deep learning for human action recognition, International Workshop on Human Behavior Understanding (HBU), pp.29-39, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01354493

S. Baek, Kinematic-layout-aware random forests for depth-based action recognition, 2016.

S. Bai, J. Z. Kolter, and V. Koltun, An empirical evaluation of generic convolutional and recurrent networks for sequence modeling, 2018.

L. Ballan, Effective codebooks for human action representation and classification in unconstrained videos, IEEE Transactions on Multimedia, vol.14, pp.1234-1245, 2012.

B. Zoph and Q. V. Le, Neural architecture search with reinforcement learning, 2017.

C. Beaudry, R. Péteri, and L. Mascarilla, An efficient and sparse approach for large scale human action recognition in videos, Machine Vision and Applications, vol.27, pp.529-543, 2016.

Y. Bengio, P. Simard, and P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, vol.5, no.2, pp.157-166, 1994.

H. Bilen, Dynamic image networks for action recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3034-3042, 2016.

D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research (JMLR), vol.3, pp.993-1022, 2003.

L. Bottou, Large-scale machine learning with stochastic gradient descent, International Conference on Computational Statistics (COMPSTAT), pp.177-186, 2010.

S. Brahnam and L. Nanni, High performance set of features for human action classification, International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV), pp.980-984, 2009.

C. Cao, Action recognition with joints-pooled 3D deep convolutional descriptors, International Joint Conference on Artificial Intelligence (IJCAI), vol.1, p.3, 2016.

Z. Cao, Realtime multi-person 2D pose estimation using part affinity fields, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

J. Carreira and A. Zisserman, Quo vadis, action recognition? a new model and the kinetics dataset, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4724-4733, 2017.

D. Castro, Predicting daily activities from egocentric images using deep learning, The ACM International Symposium on Wearable Computers (ISWC), pp.75-82, 2015.

A. A. Chaaraoui, J. R. Padilla-López, and F. Flórez-Revuelta, Fusion of skeletal and silhouette-based features for human action recognition with RGB-D devices, International Conference on Computer Vision (ICCV), pp.91-97, 2013.

K. Chatfield, Return of the devil in the details: Delving deep into convolutional nets, 2014.

R. Chaudhry, Bio-inspired dynamic 3D discriminative skeletal features for human action recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.471-478, 2013.

C. Chen, R. Jafari, and N. Kehtarnavaz, Action recognition from depth sequences using depth motion maps-based local binary patterns, IEEE Winter Conference on Applications of Computer Vision (WACV), pp.1092-1099, 2015.

C. Chen, K. Liu, and N. Kehtarnavaz, Real-time human action recognition based on depth motion maps, Journal of Real-Time Image Processing, vol.12, 2013.

M. Chen, Marginalized stacked denoising autoencoders, Proceedings of the Learning Workshop, vol.36, pp.7-15, 2012.

G. Cheng, Advances in human action recognition: a survey, 2015.

G. Chéron, I. Laptev, and C. Schmid, P-CNN: Pose-based CNN features for action recognition, IEEE International Conference on Computer Vision (ICCV), pp.3218-3226, 2015.

E. Cippitelli, Evaluation of a skeleton-based method for human activity recognition on a large-scale RGB-D dataset, IET International Conference on Technologies for Active and Assisted Living, pp.1-6, 2016.

E. Cippitelli, A human activity recognition system using skeleton data from RGB-D sensors, Computational Intelligence and Neuroscience, p.21, 2016.

D. Clevert, T. Unterthiner, and S. Hochreiter, Fast and accurate deep network learning by exponential linear units (ELUs), 2015.

C. Cortes and V. Vapnik, Support-vector networks, pp.273-297, 1995.

L. Cruz, D. Lucio, and L. Velho, Kinect and RGB-D images: Challenges and applications, SIBGRAPI Conference on Graphics, Patterns and Images Tutorials, pp.36-49, 2012.

N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol.1, pp.886-893, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00548512

M. Devanne, Space-time pose representation for 3D human action recognition, International Conference on Image Analysis and Processing (ICIAP), pp.456-464, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00839494

W. Ding, Profile HMMs for skeleton-based human action recognition, Signal Processing: Image Communication, vol.42, pp.109-119, 2016.

T. Dobhal, Human activity recognition using binary motion image and deep learning, Procedia Computer Science, vol.58, pp.178-185, 2015.

P. Dollár, Behavior recognition via sparse spatio-temporal features, Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS), pp.65-72, 2005.

J. Donahue, Long-term recurrent convolutional networks for visual recognition and description, IEEE conference on Computer Vision and Pattern Recognition (CVPR), pp.2625-2634, 2015.

W. Du, Y. Wang, and Y. Qiao, RPAN: An end-to-end recurrent pose-attention network for action recognition in videos, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3725-3734, 2017.

Y. Du, Marker-less 3D human motion capture with monocular image sequence and height-maps, European Conference on Computer Vision (ECCV), pp.20-36, 2016.

Y. Du, W. Wang, and L. Wang, Hierarchical recurrent neural network for skeleton based action recognition, Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (CVPR), pp.1110-1118, 2015.

A. Eitel, Multimodal deep learning for robust RGB-D object recognition, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp.681-687, 2015.

G. Evangelidis, G. Singh, and R. Horaud, Skeletal quads: Human action recognition using joint quadruples, International Conference on Pattern Recognition (ICPR), pp.4513-4518, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00989725

J. Fan, Human tracking using convolutional neural networks, IEEE Transactions on Neural Networks, vol.21, pp.1610-1623, 2010.

C. Feichtenhofer, A. Pinz, and R. Wildes, Spatiotemporal residual networks for video action recognition, Advances in Neural Information Processing Systems (NIPS), pp.3468-3476, 2016.

C. Feichtenhofer, A. Pinz, and R. P. Wildes, Spatiotemporal multiplier networks for video action recognition, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.7445-7454, 2017.

C. Feichtenhofer, A. Pinz, and A. Zisserman, Convolutional two-stream network fusion for video action recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1933-1941, 2016.

B. Fernando, Discriminative hierarchical rank pooling for activity recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1924-1932, 2016.

F. Yu and V. Koltun, Multi-scale context aggregation by dilated convolutions, 2015.

P. Foggia, Exploiting the deep learning paradigm for recognizing human actions, IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp.93-98, 2014.

P. Foggia, Recognizing human actions by a bag of visual words, IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp.2910-2915, 2013.

K. Fukushima, Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, pp.267-285, 1980.

S. Gaglio, G. Lo Re, and M. Morana, Human activity recognition process using 3-D posture data, IEEE Transactions on Human-Machine Systems, vol.45, no.5, pp.586-597, 2014.

C. Gan, Devnet: A deep event network for multimedia event detection and evidence recounting, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2568-2577, 2015.

Z. Gao, Human action recognition via multi-modality information, Journal of Electrical Engineering and Technology, vol.9, no.2, pp.739-748, 2014.

M. A. Giese and T. Poggio, Cognitive neuroscience: Neural mechanisms for the recognition of biological movements, Nature Reviews Neuroscience, vol.4, p.179, 2003.

G. Gkioxari, R-CNNs for pose estimation and action detection, 2014.

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), pp.249-256, 2010.

X. Glorot, A. Bordes, and Y. Bengio, Deep sparse rectifier neural networks, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS), pp.315-323, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00752497

D. Gong, G. G. Medioni, and X. Zhao, Structured time series analysis for human action segmentation and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.36, pp.1414-1427, 2014.

I. Goodfellow, Y. Bengio, and A. Courville, Deep learning, 2016.

I. Goodfellow, Generative adversarial nets, Advances in Neural Information Processing Systems (NIPS), pp.2672-2680, 2014.

A. Gorban, THUMOS Challenge: Action recognition with a large number of classes, 2015.

L. Gorelick, Actions as space-time shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.29, pp.2247-2253, 2007.

M. A. Gowayyed, Histogram of oriented displacements (HOD): Describing trajectories of human joints for action recognition, International Joint Conference on Artificial Intelligence (IJCAI), 2013.

A. Graves, Supervised sequence labelling with recurrent neural networks, Studies in Computational Intelligence, 2008.

A. Graves, A. Mohamed, and G. Hinton, Speech recognition with deep recurrent neural networks, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6645-6649, 2013.

F. Gu, Marginalised stacked denoising autoencoders for robust representation of real-time multi-view action recognition, Sensors, vol.15, pp.17209-17231, 2015.

T. Guha and R. Ward, Learning sparse representations for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.34, pp.1576-1588, 2012.

G. Guo and A. Lai, A survey on still image based human action recognition, Pattern Recognition, vol.47, pp.3343-3361, 2014.

J. Han, Enhanced computer vision with Microsoft Kinect sensor: A review, IEEE Transactions on Cybernetics, vol.43, pp.1318-1334, 2013.

M. Hasan and A. , Continuous learning of human activity models using deep nets, European Conference on Computer Vision (ECCV), pp.705-720, 2014.

K. He and J. Sun, Convolutional neural networks at constrained time cost, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.5353-5360, 2015.

K. He, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp.1026-1034, 2015.

P. Heckbert, Fourier transforms and the fast Fourier transform (FFT) algorithm, Computer Graphics 2 (15-463) course notes, 1995.

F. C. Heilbron, ActivityNet: A large-scale video benchmark for human activity understanding, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.961-970, 2015.

H. H. Pham, Skeletal movement to color map: A novel representation for 3D action recognition with Inception residual networks, IEEE International Conference on Image Processing (ICIP), pp.3483-3487, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02193711

G. Hinton, A practical guide to training restricted Boltzmann machines, Momentum, pp.599-619, 2010.

G. E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation, vol.14, pp.1771-1800, 2002.

G. E. Hinton, S. Osindero, and Y. Teh, A fast learning algorithm for deep belief nets, Neural Computation, vol.18, pp.1527-1554, 2006.

G. E. Hinton, T. J. Sejnowski, and D. Ackley, Boltzmann machines: Constraint satisfaction networks that learn, 1984.

G. E. Hinton, Improving neural networks by preventing coadaptation of feature detectors, 2012.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, vol.9, no.8, pp.1735-1780, 1997.

Y. Hou, Skeleton optical spectra based action recognition using convolutional neural networks, IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2017.

E. Hsu, K. Pulli, and J. Popović, Style translation for human motion, ACM Transactions on Graphics (TOG), vol.24, pp.1082-1089, 2005.

J. Hu, Jointly learning heterogeneous features for RGB-D activity recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.5344-5352, 2015.

F. J. Huang, Y.-L. Boureau, and Y. LeCun, Unsupervised learning of invariant feature hierarchies with applications to object recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-8, 2007.

G. Huang, Densely connected convolutional networks, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2261-2269, 2017.

D. H. Hubel and T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology, vol.160, pp.106-154, 1962.

P. J. Huber, Robust estimation of a location parameter, Breakthroughs in Statistics, pp.492-518, 1992.

M. E. Hussein, Human action recognition using a temporal hierarchy of covariance descriptors on 3D joint locations, Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI), pp.2466-2472, 2013.

M. S. Ibrahim, A hierarchical deep temporal model for group activity recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1971-1980, 2016.

N. Ikizler and P. Duygulu, Human action recognition using distribution of oriented rectangular patches, Human Motion-Understanding, Modeling, Capture and Animation, pp.271-284, 2007.

I. Loshchilov and F. Hutter, SGDR: Stochastic gradient descent with warm restarts, 2016.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, International Conference on Machine Learning (ICML), pp.448-456, 2015.

C. Ionescu, Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.36, no.7, pp.1325-1339, 2014.

P. Isola, Image-to-image translation with conditional adversarial networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1125-1134, 2017.

L. Ivan, Action recognition using rate-invariant analysis of skeletal shape trajectories, The INRIA Computer Vision and Machine Learning, 2012.

J. Sung, Unstructured human activity detection from RGB-D images, 2012 IEEE International Conference on Robotics and Automation, pp.842-849, 2012.

A. Jain, Learning human pose estimation features with convolutional networks, 2013.

A. Jain, MoDeep: A deep learning framework using motion features for human pose estimation, Asian Conference on Computer Vision (ACCV), pp.302-315, 2014.

H. Jhuang, A biologically inspired system for action recognition, 2007.

S. Ji, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.35, pp.221-231, 2013.

Y. Ji, G. Ye, and H. Cheng, Interactive body part contrast mining for human interaction recognition, IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pp.1-6, 2014.

Y. Jiang, THUMOS Challenge: Action recognition with a large number of classes, 2014.

K. Jin, Action recognition using vague division depth motion maps, The Journal of Engineering, vol.1, 2017.

K. He, Deep residual learning for image recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, 2016.

S. M. Kang and R. P. Wildes, Review of action recognition and detection methods, 2016.

A. Karpathy, Large-scale video classification with convolutional neural networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1725-1732, 2014.

I. Katircioglu, Learning latent representations of 3D human pose with deep neural networks, International Journal of Computer Vision (IJCV), vol.126, no.12, pp.1326-1341, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02509358

Q. Ke, A new representation of skeleton sequences for 3D action recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4570-4579, 2017.

S. Ke, A review on video-based human activity recognition, vol.2, pp.88-131, 2013.

H.-J. Kim, J. S. Lee, and H.-S. Yang, Human action recognition using a modified convolutional neural network, International Symposium on Neural Networks (ISNN), pp.715-723, 2007.

H.-J. Kim, J. Lee, and H.-S. Yang, A weighted FMM neural network and its application to face detection, International Conference on Neural Information Processing (ICONIP), pp.177-186, 2006.

T. S. Kim and A. Reiter, Interpretable 3D human action analysis with temporal convolutional networks, 2017.

D. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

G. Klambauer, Self-normalizing neural networks, Advances in Neural Information Processing Systems (NIPS), pp.971-980, 2017.

A. Kläser, M. Marszałek, and C. Schmid, A spatio-temporal descriptor based on 3D-gradients, British Machine Vision Conference (BMVC), pp.275-276, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00514853

J. F. Kooij, N. Schneider, and D. M. Gavrila, Analysis of pedestrian dynamics from a vehicle perspective, IEEE Intelligent Vehicles Symposium Proceedings (IVSP), pp.1445-1450, 2014.

H. S. Koppula, R. Gupta, and A. Saxena, Learning human activities and object affordances from RGB-D videos, The International Journal of Robotics Research, vol.32, pp.951-970, 2013.

H. S. Koppula and A. Saxena, Learning spatio-temporal structure from RGB-D videos for human activity detection and anticipation, International Conference on Machine Learning (ICML), pp.792-800, 2013.

A. Krizhevsky, Learning multiple layers of features from tiny images, 2009.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (NIPS), pp.1097-1105, 2012.

H. Kuehne, HMDB: A large video database for human motion recognition, International Conference on Computer Vision (ICCV), pp.2556-2563, 2011.

K. Kulkarni, Continuous action recognition based on sequence alignment, International Journal of Computer Vision, vol.112, pp.90-114, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01058732

I. Kviatkovsky, E. Rivlin, and I. Shimshoni, Online action recognition using covariance of shape and motion, Computer Vision and Image Understanding, vol.129, pp.15-26, 2014.

I. Laptev, Learning realistic human actions from movies, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-8, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00548659

I. Laptev, On space-time interest points, International Journal of Computer Vision, vol.64, pp.107-123, 2005.

C. Lea, Temporal convolutional networks for action segmentation and detection, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.156-165, 2017.

Y. LeCun, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, no.11, pp.2278-2324, 1998.

Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, p.436, 2015.

Y. LeCun, Backpropagation applied to handwritten zip code recognition, Neural Computation, vol.1, pp.541-551, 1989.

Y. LeCun, Efficient BackProp, Neural Networks: Tricks of the Trade, pp.9-50, 1998.

C. Ledig, Photo-realistic single image super-resolution using a generative adversarial network, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4681-4690, 2017.

H. Lee, Efficient sparse coding algorithms, Advances in Neural Information Processing Systems (NIPS), pp.801-808, 2006.

H. Lee, Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, International Conference on Machine Learning (ICML), pp.609-616, 2009.

I. Lee, Ensemble deep learning for skeleton-based action recognition using temporal sliding LSTM networks, 2017 IEEE International Conference on Computer Vision (ICCV), pp.1012-1020, 2017.

C. Li, Joint distance maps based action recognition with convolutional neural networks, IEEE Signal Processing Letters, vol.24, pp.624-628, 2017.

C. Li, Skeleton-based action recognition using LSTM and CNN, IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp.585-590, 2017.

D. Li, Unified spatio-temporal attention networks for action recognition in videos, IEEE Transactions on Multimedia, vol.21, no.2, pp.416-428, 2019.

Q. Li, Action recognition by learning deep multi-granular spatiotemporal video representation, Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, pp.159-166, 2016.

S. Li and A. B. Chan, 3D human pose estimation from monocular images with deep convolutional neural network, Asian Conference on Computer Vision (ACCV), pp.332-347, 2014.

W. Li, Z. Zhang, and Z. Liu, Action recognition based on a bag of 3D points, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.9-14, 2010.

W. Li, Category-blind human action recognition: A practical recognition system, IEEE International Conference on Computer Vision (ICCV), pp.4444-4452, 2015.

X. Li, Region-based activity recognition using conditional GAN, Proceedings of the 2017 ACM on Multimedia Conference, pp.1059-1067, 2017.

Y. Li, Online human action detection using joint classification-regression recurrent neural networks, European Conference on Computer Vision (ECCV), pp.203-220, 2016.

B. Liang and L. Zheng, Three dimensional motion trail model for gesture recognition, IEEE International Conference on Computer Vision (ICCV), pp.684-691, 2013.

J. Ling, L. Tian, and C. Li, 3D human activity recognition using skeletal data from RGB-D sensors, International Symposium on Visual Computing (ISVC), pp.133-142, 2016.

A. A. Liu, Benchmarking a multimodal and multiview and interactive dataset for human action recognition, IEEE Transactions on Cybernetics, vol.47, no.7, 2017.

A. Liu, Hierarchical clustering multi-task learning for joint human action grouping and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.39, pp.102-114, 2017.

C. Liu, Learning motion and content-dependent features with convolutions for action recognition, Multimedia Tools and Applications, vol.75, pp.13023-13039, 2015.

F. Liu, Simple to complex transfer learning for action recognition, IEEE Transactions on Image Processing, vol.25, pp.949-960, 2016.

J. Liu, J. Luo, and M. Shah, Recognizing realistic actions from videos "in the wild", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1996-2003, 2009.

J. Liu, Spatio-temporal LSTM with trust gates for 3D human action recognition, European Conference on Computer Vision (ECCV), pp.816-833, 2016.

J. Liu, Global context-aware attention LSTM networks for 3D action recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4674-4683, 2017.

J. Liu, Skeleton-based human action recognition with global context-aware attention LSTM networks, IEEE Transactions on Image Processing, vol.27, pp.1586-1599, 2018.

L. Liu, L. Shao, and P. Rockett, Genetic programming-evolved spatiotemporal descriptor for human action recognition, British Machine Vision Conference (BMVC), pp.1-12, 2012.

M. Liu, H. Liu, and C. Chen, Enhanced skeleton visualization for view invariant human action recognition, Pattern Recognition, pp.346-362, 2017.

D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, pp.91-110, 2004.

Z. Lu and Y. Peng, Latent semantic learning with structured sparse representation for human action recognition, Pattern Recognition, vol.46, pp.1799-1809, 2013.

J. Luo, W. Wang, and H. Qi, Group sparsity and geometry constrained dictionary learning for action recognition from depth maps, International Conference on Computer Vision (ICCV), pp.1809-1816, 2013.

Z. Luo, Unsupervised learning of long-term motion dynamics for videos, pp.2203-2212, 2017.

M. Luong, H. Pham, and C. Manning, Effective approaches to attention-based neural machine translation, 2015.

D. C. Luvizon, D. Picard, and H. Tabia, 2D/3D pose estimation and action recognition using multitask deep learning, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.5137-5146, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01815703

F. Lv and R. Nevatia, Recognition and segmentation of 3D human action using HMM and multi-class AdaBoost, Proceedings of the 9th European Conference on Computer Vision (ECCV), Part IV, pp.359-372, 2006.

B. Mahasseni and S. Todorovic, Regularizing long short term memory with 3D human-skeleton sequences for action recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3054-3062, 2016.

A. I. Maqueda, Human-action recognition module for the new generation of augmented reality applications, International Symposium on Consumer Electronics (ISCE), pp.262-264, 2015.

M. Marszalek, I. Laptev, and C. Schmid, Actions in context, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2929-2936, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00548645

J. Martinez, A simple yet effective baseline for 3D human pose estimation, IEEE International Conference on Computer Vision (ICCV), pp.2640-2649, 2017.

D. Mehta, Monocular 3D human pose estimation in the wild using improved CNN supervision, International Conference on 3D Vision (3DV), pp.506-516, 2017.

D. Mehta, VNect: Real-time 3D human pose estimation with a single RGB camera, ACM Transactions on Graphics (TOG), vol.36, p.44, 2017.

M. S. Ryoo and J. K. Aggarwal, Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities, IEEE International Conference on Computer Vision (ICCV), 2009.

Microsoft, Kinect for Windows: Human interface guidelines v2.0, 2014.

M. Mirza and S. Osindero, Conditional generative adversarial nets, 2014.

I. Misra, C. L. Zitnick, and M. Hebert, Shuffle and learn: Unsupervised learning using temporal order verification, European Conference on Computer Vision (ECCV), pp.527-544, 2016.

V. Mnih, N. Heess, and A. Graves, Recurrent models of visual attention, Advances in Neural Information Processing Systems (NIPS), pp.2204-2212, 2014.

L. Mo, Human physical activity recognition based on computer vision with deep learning model, IEEE International Instrumentation and Measurement Technology Conference (I2MTC), pp.1-6, 2016.

T. B. Moeslund and E. Granum, A survey of computer vision-based human motion capture, Computer Vision and Image Understanding, vol.81, pp.231-268, 2001.

T. B. Moeslund, A. Hilton, and V. Krüger, A survey of advances in vision-based human motion capture and analysis, Computer Vision and Image Understanding, vol.104, pp.90-126, 2006.

M. Baccouche, Spatio-temporal convolutional sparse Autoencoder for sequence classification, British Machine Vision Conference (BMVC), pp.1-12, 2012.

V. Nair and G. E. Hinton, 3D object recognition with deep belief nets, Proceedings of the 22nd International Conference on Neural Information Processing Systems. NIPS'09, pp.1339-1347, 2009.

V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, Proceedings of the 27th International Conference on Machine Learning (ICML), pp.807-814, 2010.

A. Newell, K. Yang, and J. Deng, Stacked hourglass networks for human pose estimation, European Conference on Computer Vision (ECCV), pp.483-499, 2016.

J. Y.-H. Ng, Beyond short snippets: Deep networks for video classification, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4694-4702, 2015.

B. X. Nie, C. Xiong, and S. Zhu, Joint action recognition and pose estimation from video, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1293-1301, 2015.

J. C. Niebles, C.-W. Chen, and L. Fei-Fei, Modeling temporal structure of decomposable motion segments for activity classification, European Conference on Computer Vision (ECCV), pp.392-405, 2010.

S. J. Nowlan and J. C. Platt, A convolutional neural network hand tracker, Proceedings of the 7th International Conference on Neural Information Processing Systems. NIPS'94, pp.901-903, 1994.

F. Ofli, Berkeley MHAD: A comprehensive multimodal human action database, IEEE Workshop on Applications of Computer Vision (WACV), pp.53-60, 2013.

S. Oh, AVSS 2011 demo session: A large-scale benchmark dataset for event recognition in surveillance video, IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp.527-528, 2011.

E. Ohn-Bar and M. Trivedi, Joint angles similarities and HOG2 for action recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.465-470, 2013.

B. Olshausen and D. Field, Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature, vol.381, p.607, 1996.

O. Oreifej and Z. Liu, HON4D: Histogram of oriented 4D normals for activity recognition from depth sequences, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.716-723, 2013.

S. Park, J. Hwang, and N. Kwak, 3D human pose estimation using convolutional neural networks with 2D pose information, European Conference on Computer Vision (ECCV), pp.156-169, 2016.

G. Pavlakos, Coarse-to-fine volumetric prediction for single-image 3D human pose, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.7025-7034, 2017.

D. Pavllo, 3D human pose estimation in video with temporal convolutions and semi-supervised training, 2018.

X. Peng, Bag of visual words and fusion methods for action recognition: comprehensive study and good practice, Computer Vision and Image Understanding (CVIU), vol.150, pp.109-125, 2016.

H. Pham, Efficient neural architecture search via parameters sharing, International Conference on Machine Learning (ICML), pp.4095-4104, 2018.

H. Pham, Exploiting deep residual networks for human action recognition from skeletal data, Computer Vision and Image Understanding, vol.170, pp.51-66, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02192228

S. L. Phung and A. Bouzerdoum, A pyramidal neural network for visual pattern recognition, IEEE Transactions on Neural Networks, vol.18, pp.329-343, 2007.

C. A. Pickering, K. J. Burnham, and M. J. Richardson, A research study of hand gesture recognition technologies and applications for human vehicle interaction, The 3rd Conference on Automotive Electronics - Institution of Engineering and Technology, pp.1-15, 2007.

S. M. Pizer, Adaptive histogram equalization and its variations, Computer Vision, Graphics, and Image Processing, vol.39, pp.355-368, 1987.

O. P. Popoola and K. Wang, Video-based abnormal human behavior recognition: A review, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol.42, pp.865-878, 2012.

R. Poppe, A survey on vision-based human action recognition, Image and Vision Computing, vol.28, pp.976-990, 2010.

L. Lo Presti and M. La Cascia, 3D skeleton-based human action classification: A survey, Pattern Recognition, vol.53, pp.130-147, 2016.

S. Qin, Y. Yang, and Y. Jiang, Gesture recognition from depth images using motion and shape features, International Symposium on Instrumentation and Measurement, Sensor Network and Automation (IMSNA), pp.172-175, 2013.

A. Radford, L. Metz, and S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, 2015.

H. Rahmani and M. Bennamoun, Learning action recognition model from depth and skeleton videos, IEEE International Conference on Computer Vision (ICCV), pp.5832-5841, 2017.

H. Rahmani and A. Mian, 3D action recognition from novel viewpoints, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1506-1515, 2016.

H. Rahmani, A. Mian, and M. Shah, Learning a deep model for human action recognition from novel viewpoints, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.40, pp.667-681, 2018.

H. Rahmani, Histogram of oriented principal components for cross-view action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.38, pp.2430-2443, 2016.

R. Raina, Self-taught learning: transfer learning from unlabeled data, International Conference on Machine Learning (ICML), pp.759-766, 2007.

V. Ramakrishna, T. Kanade, and Y. Sheikh, Reconstructing 3D human pose from 2D image landmarks, European Conference on Computer Vision (ECCV), pp.573-586, 2012.

S. Ranasinghe, F. A. Machot, and H. Mayr, A review on applications of activity recognition systems with regard to performance and evaluation, International Journal of Distributed Sensor Networks, vol.12, 2016.

K. K. Reddy and M. Shah, Recognizing 50 human action categories of web videos, Machine Vision and Applications, vol.24, pp.971-981, 2013.

S. Reed, Generative adversarial text to image synthesis, Proceedings of the 33rd International Conference on International Conference on Machine Learning, vol.48, pp.1060-1069, 2016.

M. D. Rodriguez, J. Ahmed, and M. Shah, Action MACH: A spatio-temporal maximum average correlation height filter for action recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-8, 2008.

N. Díaz Rodríguez, A survey on ontologies for human behavior recognition, ACM Computing Surveys, p.43, 2014.

D. W. Ruck, S. K. Rogers, and M. Kabrisky, Feature selection using a multilayer perceptron, Journal of Neural Network Computing, vol.2, pp.40-48, 1990.

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, vol.323, pp.533-536, 1986.

O. Russakovsky, ImageNet large scale visual recognition challenge, International Journal of Computer Vision (IJCV), vol.115, pp.211-252, 2015.

M. S. Ryoo and J. K. Aggarwal, Hierarchical recognition of human activities interacting with objects, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-8, 2007.

M. S. Ryoo and J. K. Aggarwal, Observe-and-explain: A new approach for multiple hypotheses tracking of humans and objects, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-8, 2008.

S. Sabour, N. Frosst, and G. E. Hinton, Dynamic routing between capsules, Advances in Neural Information Processing Systems (NIPS), pp.3859-3869, 2017.

R. Salakhutdinov and G. E. Hinton, Deep Boltzmann machines, Artificial Intelligence and Statistics Conference (AISTATS), pp.448-455, 2009.

A. B. Sargano, Human action recognition using transfer learning with deep representations, International Joint Conference on Neural Networks (IJCNN), pp.463-469, 2017.

A. Savitzky and M. J. E. Golay, Smoothing and differentiation of data by simplified least squares procedures, Analytical Chemistry, vol.36, issue.8, pp.1627-1639, 1964.

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, IEEE International Conference on Pattern Recognition (ICPR), vol.3, pp.32-36, 2004.

M. Schuster and K. K. Paliwal, Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing, vol.45, pp.2673-2681, 1997.

P. Sermanet and Y. Lecun, Traffic sign recognition with multi-scale convolutional networks, International Joint Conference on Neural Networks (IJCNN), pp.2809-2813, 2011.

P. Sermanet, Pedestrian detection with unsupervised multi-stage feature learning, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3626-3633, 2013.

A. Shahroudy, NTU RGB+D: A large scale dataset for 3D human activity analysis, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1010-1019, 2016.

A. Shahroudy, Deep multimodal feature analysis for action recognition in RGB+D videos, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.40, pp.1045-1058, 2017.

J. Shao, Deeply learned attributes for crowded scene understanding, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4657-4666, 2015.

S. Sharma, R. Kiros, and R. Salakhutdinov, Action recognition using visual attention, 2015.

Y. Shi, Learning long-term dependencies for action recognition with a biologically-inspired deep network, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.716-725, 2017.

Z. Shi and T. Kim, Learning and refining of privileged information-based RNNs for action recognition from depth sequences, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4684-4693, 2017.

J. Shotton, Real-time human pose recognition in parts from single depth images, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p.3, 2011.

L. Shuang, S. Xiao, and W. Yichen, Compositional human pose regression, Computer Vision and Image Understanding, pp.1-8, 2018.

C. Si, An attention enhanced graph convolutional LSTM network for skeleton-based action recognition, 2019.

R. Sigala, Learning features of intermediate complexity for the recognition of biological motion, International Conference on Artificial Neural Networks (ICANN), pp.241-246, 2005.

G. A. Sigurdsson, Hollywood in homes: Crowdsourcing data collection for activity understanding, European Conference on Computer Vision (ECCV), pp.510-526, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01418216

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, Advances in Neural Information Processing Systems (NIPS), 2014.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014.

A. Singh, D. Patil, and S. N. Omkar, Eye in the sky: Real-time drone surveillance system (DSS) for violent individuals identification using ScatterNet hybrid deep learning network, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.1629-1637, 2018.

B. Singh, A multi-stream bi-directional recurrent neural network for fine-grained action detection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1961-1970, 2016.

S. Singh, S. A. Velastin, and H. Ragheb, MuHAVi: A multicamera human action video dataset for the evaluation of action recognition methods, IEEE International Conference on Advanced Video and Signal Based Surveillance, pp.48-55, 2010.

S. Singh, C. Arora, and C. V. Jawahar, First person action recognition using deep learned descriptors, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2620-2628, 2016.

C. Sminchisescu, 3D human motion analysis in monocular video techniques and challenges, IEEE International Conference on Video and Signal Based Surveillance (ICVSBS), pp.76-76, 2006.

S. Song, An end-to-end spatio-temporal attention model for human action recognition from skeleton data, Thirty-first AAAI conference on Artificial Intelligence (AAAI), pp.4263-4270, 2017.

P. Sonwalkar, Hand gesture recognition for real time human machine interaction system, International Journal of Engineering Trends and Technology, 2015.

K. Soomro, A. R. Zamir, and M. Shah, UCF101: A dataset of 101 human actions classes from videos in the wild, 2012.

N. Srivastava, E. Mansimov, and R. Salakhutdinov, Unsupervised learning of video representations using LSTMs, International Conference on Machine Learning (ICML), pp.843-852, 2015.

N. Srivastava, Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.

T. Subetha and S. Chitrakala, A survey on human activity recognition from videos, International Conference on Information Communication and Embedded Systems (ICICES), pp.1-7, 2016.

L. Sun, Human action recognition using factorized spatio-temporal convolutional networks, IEEE International Conference on Computer Vision (ICCV), pp.4597-4605, 2015.

L. Sun, Lattice long short-term memory for human action recognition, IEEE International Conference on Computer Vision (ICCV), pp.2147-2156, 2017.

J. Sung, Human activity detection from RGB-D images, Plan, Activity, and Intent Recognition 64, 2011.

C. Szegedy, Going deeper with convolutions, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-9, 2015.

C. Szegedy, Rethinking the Inception architecture for computer vision, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2818-2826, 2016.

C. Szegedy, Inception-v4, Inception-ResNet and the impact of residual connections on learning, Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. AAAI'17, pp.4278-4284, 2017.

A. Ben Tanfous, H. Drira, and B. Ben Amor, Coding Kendall's shape trajectories for 3D action recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2840-2849, 2018.

H. Tang, A comparative evaluation of deep belief nets in semi-supervised learning, 2008.

Y. Tas and P. Koniusz, CNN-based action recognition and supervised domain adaptation on 3D body skeletons via kernel feature maps, British Machine Vision Conference, p.158, 2018.

G. W. Taylor, G. E. Hinton, and S. T. Roweis, Modeling human motion using binary latent variables, Advances in Neural Information Processing Systems 19, pp.1345-1352, 2007.

B. Tekin, Direct prediction of 3D body poses from motion compensated sequences, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.991-1000, 2016.

M. Telgarsky, Benefits of depth in neural networks, arXiv preprint, 2015.

I. Theodorakopoulos, Pose-based human action recognition via sparse representation in dissimilarity space, Journal of Visual Communication and Image Representation, pp.12-23, 2014.

Y. Tian, R. Sukthankar, and M. Shah, Spatiotemporal deformable part models for action detection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2642-2649, 2013.

J. Tompson, Real-time continuous pose recovery of human hands using convolutional networks, ACM Transactions on Graphics (TOG), vol.33, p.169, 2014.

A. Tran and L. Cheong, Two-stream flow-guided convolutional attention networks for action recognition, IEEE International Conference on Computer Vision (ICCV), pp.3110-3119, 2017.

D. Tran, Learning spatiotemporal features with 3D convolutional networks, IEEE International Conference on Computer Vision (ICCV), pp.4489-4497, 2015.

P. Turaga, Machine recognition of human activities: A survey, IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol.18, pp.1473-1488, 2008.

I. Ullah and A. Petrosino, A strict pyramidal deep neural network for action recognition, International Conference on Image Analysis and Processing (ICIAP), pp.236-245, 2015.

I. Ullah and A. Petrosino, Spatiotemporal features learning with 3DPyraNet, International Conference on Advanced Concepts for Intelligent Vision Systems, pp.638-647, 2016.

M. Valera and S. A. Velastin, Intelligent distributed surveillance systems: a review, IEE Proceedings - Vision, Image and Signal Processing, vol.152, pp.192-204, 2005.

M. F. Valstar, The first facial expression recognition and analysis challenge, IEEE International Conference on Automatic Face and Gesture Recognition (FG), pp.921-926, 2011.

G. Varol, I. Laptev, and C. Schmid, Long-term temporal convolutions for action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.40, pp.1510-1517, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01241518

A. Vedaldi and K. Lenc, MatConvNet: Convolutional neural networks for MATLAB, Proceedings of the 23rd ACM International Conference on Multimedia, pp.689-692, 2015.

V. Veeriah, N. Zhuang, and G. Qi, Differential recurrent neural networks for action recognition, IEEE International Conference on Computer Vision (ICCV), pp.4041-4049, 2015.

R. Vemulapalli, F. Arrate, and R. Chellappa, Human action recognition by representing 3D skeletons as points in a Lie group, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.588-595, 2014.

A. Vieira, On the improvement of human action recognition from depth map sequences using Space-Time Occupancy Patterns, Pattern Recognition Letters (PRL), vol.36, pp.221-227, 2014.

A. W. Vieira, STOP: Space-time occupancy patterns for 3D action recognition from depth map sequences, Iberoamerican Congress on Pattern Recognition (CIARP), pp.252-259, 2012.

P. Vincent, Extracting and composing robust features with denoising autoencoders, Proceedings of the 25th International Conference on Machine Learning (ICML). ICML'08, pp.1096-1103, 2008.

C. Vondrick, H. Pirsiavash, and A. Torralba, Generating videos with scene dynamics, Advances In Neural Information Processing Systems (NIPS), pp.613-621, 2016.

M. Vrigkas, C. Nikou, and I. A. Kakadiaris, A review of human activity recognition methods, Frontiers in Robotics and AI, vol.2, p.28, 2015.

H. Wang, Action recognition by dense trajectories, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3169-3176, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00583818

H. Wang and C. Schmid, Action recognition with improved trajectories, IEEE International Conference on Computer Vision (ICCV), pp.3551-3558, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00873267

H. Wang and L. Wang, Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.499-508, 2017.

J. Wang, Mining actionlet ensemble for action recognition with depth cameras, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1290-1297, 2012.

J. Wang, Cross-view action modeling, learning, and recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2649-2656, 2014.

J. Wang, Z. Liu, and Y. Wu, Human action recognition with depth cameras, 2014.

K. Wang, 3D human activity recognition with reconfigurable convolutional neural networks, Proceedings of the ACM International Conference on Multimedia, pp.97-106, 2014.

L. Wang, W. Hu, and T. Tan, Recent developments in human motion analysis, Pattern Recognition, vol.36, pp.585-601, 2003.

L. Wang, Visual tracking with fully convolutional networks, IEEE International Conference on Computer Vision (ICCV), pp.3119-3127, 2015.

L. Wang, Y. Qiao, and X. Tang, Action recognition with trajectory-pooled deep-convolutional descriptors, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4305-4314, 2015.

L. Wang, CUHK&SIAT submission for THUMOS'15 action recognition challenge, THUMOS'15 Action Recognition Challenge, pp.1-3, 2015.

L. Wang, Towards good practices for very deep two-stream convnets, 2015.

L. Wang, Temporal segment networks: towards good practices for deep action recognition, European Conference on Computer Vision (ECCV), pp.20-36, 2016.

L. Wang, Untrimmednets for weakly supervised action recognition and detection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4325-4334, 2017.

M. Wang, B. Ni, and X. Yang, Recurrent modeling of interaction context for collective activity recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3048-3056, 2017.

P. Wang, Graph based skeleton motion representation and similarity measurement for action recognition, European Conference on Computer Vision (ECCV), pp.370-385, 2016.

P. Wang, ConvNets-based action recognition from depth maps through virtual cameras and pseudocoloring, Proceedings of the ACM International Conference on Multimedia (ACM), pp.1119-1122, 2015.

P. Wang, Deep convolutional neural networks for action recognition using depth map sequences, 2015.

P. Wang, Action recognition based on joint trajectory maps using convolutional neural networks, ACM Multimedia, pp.102-106, 2016.

P. Wang, Scene flow to action map: A new representation for RGB-D based action recognition with convolutional neural networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.595-604, 2017.

X. Wang, A. Farhadi, and A. Gupta, Actions ~ Transformations, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2658-2667, 2016.

Y. Wang, Two-stream SR-CNNs for action recognition in videos, British Machine Vision Conference (BMVC), vol.108, pp.1-12, 2016.

Y. Wang, Hierarchical attention network for action recognition in videos, 2016.

W. Li, Z. Zhang, and Z. Liu, Action recognition based on a bag of 3D points, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.9-14, 2010.

W. Niu, Human activity detection and recognition for video surveillance, IEEE International Conference on Multimedia and Expo (ICME), vol.1, pp.719-722, 2004.

D. Weinland, R. Ronfard, and E. Boyer, Free viewpoint action recognition using motion history volumes, Computer Vision and Image Understanding, vol.104, pp.249-257, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00544629

D. Weinland, R. Ronfard, and E. Boyer, A survey of vision-based methods for action representation, segmentation and recognition, Computer Vision and Image Understanding, vol.115, pp.224-241, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00640088

W. Lin, Human activity recognition for video surveillance, IEEE International Symposium on Circuits and Systems (ISCAS), pp.2737-2740, 2008.

J. Weng, C. Weng, and J. Yuan, Spatio-temporal Naive-Bayes nearest-neighbor (ST-NBNN) for skeleton-based action recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4171-4180, 2017.

J. Weng, Discriminative spatio-temporal pattern discovery for 3D action recognition, IEEE Transactions on Circuits and Systems for Video Technology, pp.1-1, 2018.

C. Wolf, Evaluation of video activity localizations integrating quality and quantity measurements, Computer Vision and Image Understanding, vol.127, pp.14-30, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01283866

D. Wu, An adaptive stacked denoising auto-encoder architecture for human action recognition, Applied Mechanics & Materials, vol.631, pp.403-409, 2014.

J. Wu, Action recognition with joint attention on multi-level deep features, 2016.

L. Xia, C. Chen, and J. K. Aggarwal, View invariant human action recognition using histograms of 3D joints, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.20-27, 2012.

C. Xie, Memory attention networks for skeleton-based action recognition, 2018.

L. Xie, A pyramidal deep learning architecture for human action recognition, International Journal of Modelling, Identification and Control, vol.21, pp.139-146, 2014.

X. Zhou, Deep kinematic pose regression, European Conference on Computer Vision (ECCV), pp.186-201, 2016.

Y. Xiong, Recognize complex events from static images by fusing deep channels, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1600-1609, 2015.

Y. Xiong, CUHK & ETHZ & SIAT submission to ActivityNet challenge, 2016.

H. Xu, Spatio-temporal pyramid model based on depth maps for action recognition, IEEE International Workshop on Multimedia Signal Processing (MMSP), pp.1-6, 2015.

K. Xu, Show, attend and tell: Neural image caption generation with visual attention, International Conference on Machine Learning (ICML), pp.2048-2057, 2015.

T. Xu, Dual many-to-one-encoder-based transfer learning for cross-dataset human action recognition, Image and Vision Computing, vol.55, pp.127-137, 2016.

J. Yang, K. Yu, and T. Huang, Supervised translation-invariant sparse coding, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3517-3524, 2010.

J. Yang, Linear spatial pyramid matching using sparse coding for image classification, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p.6, 2009.

X. Yang and Y. Tian, Super normal vector for activity recognition using depth sequences, Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (CVPR), pp.804-811, 2014.

Y. Yang, I. Saleemi, and M. Shah, Discovering motion primitives for unsupervised grouping and one-shot learning of human actions, gestures, and expressions, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.35, pp.1635-1648, 2013.

B. Yao and L. Fei-fei, Modeling mutual context of object and human pose in human-object interaction activities, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.17-24, 2010.

M. Ye and R. Yang, Real-time simultaneous pose and shape estimation for articulated objects using a single depth camera, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2345-2352, 2014.

S. Yeung, End-to-end learning of action detection from frame glimpses in videos, IEEE Conference on Computer Vision and Pattern Recognition, pp.2678-2687, 2016.

K. Yu, Y. Lin, and J. Lafferty, Learning image representations from the pixel level via hierarchical sparse coding, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1713-1720, 2011.

K. Yun, Two-person interaction detection using body-pose features and multiple instance learning, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.28-35, 2012.

Y. Nesterov, A method for solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, pp.372-376, 1983.

J. Zang, Attention-based temporal weighted convolutional neural network for action recognition, International Conference on Artificial Intelligence Applications and Innovations (IFIP), pp.97-108, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01821048

M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, European Conference on Computer Vision (ECCV), pp.818-833, 2014.

H. Zhang, Real-time action recognition based on a modified deep belief network model, IEEE International Conference on Information and Automation (ICIA), pp.225-228, 2014.

J. Zhang, RGB-D-based action recognition datasets: A survey, Pattern Recognition, vol.60, pp.86-105, 2016.

P. Zhang, View adaptive neural networks for high performance skeleton-based human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), pp.1-1, 2019.

S. Zhang, X. Liu, and J. Xiao, On geometric features for skeleton-based action recognition using multilayer LSTM networks, IEEE Winter Conference on Applications of Computer Vision (WACV), pp.148-157, 2017.

Z. Zhang, Microsoft Kinect sensor and its effect, IEEE MultiMedia, vol.19, pp.4-10, 2012.

R. Zhao, H. Ali, and P. van der Smagt, Two-stream RNN/CNN for action recognition in 3D videos, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp.4260-4267, 2017.

X. Zhou, Sparseness meets deepness: 3D human pose estimation from monocular video, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4966-4975, 2016.

G. Zhu, An online continuous human action recognition algorithm based on the Kinect sensor, Sensors, vol.16, p.161, 2016.

H. Zhu, R. Vial, and S. Lu, TORNADO: A spatio-temporal convolutional regression network for video action proposal, IEEE International Conference on Computer Vision (ICCV), pp.5813-5821, 2017.

W. Zhu, Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI'16, pp.3697-3703, 2016.

Y. Zhu, Sparse coding on local spatial-temporal volumes for human action recognition, Asian Conference on Computer Vision (ACCV), pp.660-671, 2010.

N. Zouba, Assessing computer systems for monitoring elderly people living at home, Proceedings of the World Congress of Gerontology and Geriatrics (IAGG), pp.5-9, 2009.