J. K. Aggarwal and M. S. Ryoo, Human activity analysis: A review, ACM Computing Surveys, vol.43, issue.3, p.16, 2011.
DOI : 10.1145/1922649.1922653

S. Ali, A. Basharat, and M. Shah, Chaotic invariants for human action recognition, IEEE 11th International Conference on Computer Vision (ICCV), pp.1-8, 2007.

M. Andriluka, S. Roth, and B. Schiele, People-tracking-by-detection and people-detection-by-tracking, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-8, 2008.

V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, Log-Euclidean metrics for fast and simple calculus on diffusion tensors, Magnetic Resonance in Medicine, vol.56, issue.2, pp.411-421, 2006.
DOI : 10.1002/mrm.20965

S. Avila, N. Thome, M. Cord, E. Valle, and A. Araujo, BOSSA: Extended BoW formalism for image classification, 2011 18th IEEE International Conference on Image Processing, pp.2909-2912, 2011.
DOI : 10.1109/ICIP.2011.6116268
URL : https://hal.archives-ouvertes.fr/hal-00625533

S. Avila, N. Thome, M. Cord, E. Valle, and A. de A. Araújo, Pooling in image representation: The visual codeword point of view, Computer Vision and Image Understanding, vol.117, issue.5, pp.453-465, 2013.
DOI : 10.1016/j.cviu.2012.09.007
URL : https://hal.archives-ouvertes.fr/hal-01172709

S. Bak, G. Charpiat, E. Corvee, F. Bremond, and M. Thonnat, Learning to match appearances by correlations in a covariance metric space, Computer Vision - ECCV 2012, pp.806-820, 2012.

S. Bak, E. Corvee, F. Bremond, and M. Thonnat, Boosted human re-identification using Riemannian manifolds, Image and Vision Computing, vol.30, issue.6, pp.443-452, 2012.

S. Bak, R. Kumar, and F. Bremond, Brownian descriptor: a Rich Meta-Feature for Appearance Matching, WACV: Winter Conference on Applications of Computer Vision, 2013.

P. Banerjee and R. Nevatia, Learning neighborhood co-occurrence statistics of sparse features for human activity recognition, 2011 8th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp.212-217, 2011.
DOI : 10.1109/AVSS.2011.6027324

H. Bay, T. Tuytelaars, and L. Van Gool, SURF: Speeded up robust features, Computer Vision - ECCV 2006, pp.404-417, 2006.

P. R. Beaudet, Rotationally invariant image operators, Proceedings of the 4th International Joint Conference on Pattern Recognition, pp.579-583, 1978.

P. Bilinski and F. Bremond, Contextual Statistics of Space-Time Ordered Features for Human Action Recognition, 2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance, 2012.
DOI : 10.1109/AVSS.2012.29
URL : https://hal.archives-ouvertes.fr/hal-00718293

P. Bilinski and F. Bremond, Statistics of Pairwise Co-occurring Local Spatio-temporal Features for Human Action Recognition, Computer Vision - ECCV 2012. Workshops and Demonstrations, pp.311-320, 2012.
DOI : 10.1007/978-3-642-33863-2_31
URL : https://hal.archives-ouvertes.fr/hal-00760963

P. Bilinski, E. Corvee, S. Bak, and F. Bremond, Relative dense tracklets for human action recognition, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), pp.1-7, 2013.
DOI : 10.1109/FG.2013.6553699
URL : https://hal.archives-ouvertes.fr/hal-00806321

P. Bilinski, M. Koperski, S. Bak, and F. Bremond, Representing visual appearance by video Brownian covariance descriptor for human action recognition, 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014.
DOI : 10.1109/AVSS.2014.6918649
URL : https://hal.archives-ouvertes.fr/hal-01054943

M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, Actions as space-time shapes, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, pp.1395-1402, 2005.
DOI : 10.1109/ICCV.2005.28
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.8218

A. F. Bobick and J. W. Davis, The recognition of human movement using temporal templates, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, issue.3, pp.257-267, 2001.

C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, vol.2, issue.2, pp.121-167, 1998.

C. Chang and C. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, vol.2, issue.3, pp.1-27, 2011.
DOI : 10.1145/1961189.1961199

K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman, The devil is in the details: an evaluation of recent feature encoding methods, Proceedings of the British Machine Vision Conference 2011, pp.36-38, 2011.
DOI : 10.5244/C.25.76

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, pp.273-297, 1995.
DOI : 10.1007/BF00994018

T. Cover and P. Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory, vol.13, issue.1, pp.21-27, 1967.
DOI : 10.1109/TIT.1967.1053964

N. Cristianini and J. Shawe-Taylor, An introduction to support vector machines and other kernel-based learning methods, 2010.

F. C. Crow, Summed-area tables for texture mapping, SIGGRAPH '84: Proceedings of the 11th annual conference on Computer graphics and interactive techniques, pp.207-212, 1984.

G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints, Workshop on Statistical Learning in Computer Vision, ECCV, pp.1-22, 2004.

O. G. Cula and K. J. Dana, Compact representation of bidirectional texture functions, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, pp.1041-1047, 2001.
DOI : 10.1109/CVPR.2001.990645

S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman, Indexing by Latent Semantic Analysis, JASIS, vol.41, issue.6, pp.391-407, 1990.

P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie, Behavior Recognition via Sparse Spatio-Temporal Features, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp.65-72, 2005.
DOI : 10.1109/VSPETS.2005.1570899
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.77.5712

A. A. Efros, A. C. Berg, G. Mori, and J. Malik, Recognizing action at a distance, Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
DOI : 10.1109/ICCV.2003.1238420
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.589.7214

A. Fathi and G. Mori, Action recognition by learning mid-level motion features, 2008 IEEE Conference on Computer Vision and Pattern Recognition, p.42, 2008.
DOI : 10.1109/CVPR.2008.4587735
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.8599

V. Ferrari, M. Marin-Jimenez, and A. Zisserman, Progressive search space reduction for human pose estimation, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587468

W. Förstner and B. Moonen, A metric for covariance matrices, in Festschrift for Erik W. Grafarend on the occasion of his 60th birthday. Also appeared in: Geodesy - The Challenge of the 3rd Millennium, pp.978-981, 1999.

J. C. van Gemert, J.-M. Geusebroek, C. J. Veenman, and A. W. M. Smeulders, Kernel Codebooks for Scene Categorization, Proceedings of the 10th European Conference on Computer Vision: Part III, ECCV '08, pp.696-709, 2008.
DOI : 10.1007/978-3-540-88690-7_52

R. Genuer, J. Poggi, and C. Tuleau, Random Forests: some methodological insights. Research report RR-6729, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00340725

A. Gilbert, J. Illingworth, and R. Bowden, Fast realistic multi-action recognition using mined dense spatio-temporal features, 2009 IEEE 12th International Conference on Computer Vision, p.35, 2009.
DOI : 10.1109/ICCV.2009.5459335

A. Gilbert, J. Illingworth, and R. Bowden, Action Recognition using Mined Hierarchical Compound Features, TPAMI, 2011.

L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, Actions as Space-Time Shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.12, pp.2247-2253, 2007.
DOI : 10.1109/TPAMI.2007.70711

K. Guo, P. Ishwar, and J. Konrad, Action recognition in video by sparse representation on covariance manifolds of silhouette tunnels, Recognizing Patterns in Signals, Speech, Images and Videos, pp.294-305, 2010.

K. Guo, P. Ishwar, and J. Konrad, Action Recognition Using Sparse Representation on Covariance Manifolds of Optical Flow, 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance, pp.188-195, 2010.
DOI : 10.1109/AVSS.2010.71

K. Guo, P. Ishwar, and J. Konrad, Action recognition from video using feature covariance matrices, IEEE Transactions on Image Processing, vol.22, issue.6, pp.2479-2494, 2013.

G. Guo and A. Lai, A survey on still image based human action recognition, Pattern Recognition, vol.47, issue.10, pp.3343-3361, 2014.

P. Hart, The condensed nearest neighbor rule (Corresp.), IEEE Transactions on Information Theory, vol.14, issue.3, pp.515-516, 1968.

T. Hofmann, J. Puzicha, and M. I. Jordan, Learning from dyadic data, Advances in Neural Information Processing Systems, pp.466-472, 1999.

C. Hsu, C. Chang, and C. Lin, A practical guide to support vector classification. Technical report, p.59, 2003.

H. Izadinia and M. Shah, Recognizing Complex Events Using Large Margin Joint Low-Level Event Model, Computer Vision - ECCV 2012, pp.430-444, 2012.
DOI : 10.1007/978-3-642-33765-9_31

A. K. Jain, M. N. Murty, and P. J. Flynn, Data Clustering: A Review, ACM Computing Surveys, vol.31, issue.3, pp.264-323, 1999.

M. Jain, H. Jégou, and P. Bouthemy, Better exploiting motion for better action recognition, CVPR - International Conference on Computer Vision and Pattern Recognition, p.37, 2013.

H. Jegou, M. Douze, C. Schmid, and P. Perez, Aggregating local descriptors into a compact image representation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.3304-3311, 2010.
DOI : 10.1109/CVPR.2010.5540039
URL : https://hal.archives-ouvertes.fr/inria-00548637

H. Jegou, F. Perronnin, M. Douze, J. Sanchez, P. Perez et al., Aggregating local image descriptors into compact codes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.9, pp.1704-1716, 2012.
DOI : 10.1109/tpami.2011.235
URL : https://hal.archives-ouvertes.fr/inria-00633013

F. V. Jensen, Introduction to Bayesian networks, 1996.

Z. Jiang, Z. Lin, and L. S. Davis, Recognizing Human Actions by Learning and Matching Shape-Motion Prototype Trees, TPAMI, pp.181-202, 2011.

I. T. Jolliffe, Principal component analysis, 2002.
DOI : 10.1007/978-1-4757-1904-8

M. B. Kaaniche and F. Bremond, Tracking HoG Descriptors for Gesture Recognition, Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS '09), pp.140-145, 2009.
DOI : 10.1109/avss.2009.26
URL : https://hal.archives-ouvertes.fr/hal-00428697

Z. Kalal, J. Matas, and K. Mikolajczyk, P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints, CVPR, 2010.
DOI : 10.1109/cvpr.2010.5540231
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.231.4328

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar et al., Large-Scale Video Classification with Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.223

T. Kim, S. Wong, and R. Cipolla, Tensor Canonical Correlation Analysis for Action Classification, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383137

A. Klaser, M. Marszalek, and C. Schmid, A Spatio-Temporal Descriptor Based on 3D-Gradients, Proceedings of the British Machine Vision Conference 2008, 2008.
DOI : 10.5244/C.22.99
URL : https://hal.archives-ouvertes.fr/inria-00514853

M. Koperski, P. Bilinski, and F. Bremond, 3D trajectories for action recognition, 2014 IEEE International Conference on Image Processing (ICIP), 2014.
DOI : 10.1109/ICIP.2014.7025848
URL : https://hal.archives-ouvertes.fr/hal-01054949

S. B. Kotsiantis, Supervised Machine Learning: A Review of Classification Techniques, Informatica, vol.31, issue.3, pp.249-268, 2007.

A. Kovashka and K. Grauman, Learning a hierarchy of discriminative space-time neighborhood features for human action recognition, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p.34, 2010.
DOI : 10.1109/CVPR.2010.5539881

J. Krapac, J. Verbeek, and F. Jurie, Modeling spatial layout with Fisher vectors for image categorization, 2011 International Conference on Computer Vision, pp.1487-1494, 2011.
DOI : 10.1109/ICCV.2011.6126406
URL : https://hal.archives-ouvertes.fr/inria-00612277

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, HMDB: A large video database for human motion recognition, 2011 International Conference on Computer Vision, pp.14-74, 2011.
DOI : 10.1109/ICCV.2011.6126543

J. D. Lafferty, A. McCallum, and F. C. N. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, ICML, pp.282-289, 2001.

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-8, 2008.

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), pp.2169-2178, 2006.
DOI : 10.1109/CVPR.2006.68
URL : https://hal.archives-ouvertes.fr/inria-00548585

X. Li, W. Hu, Z. Zhang, X. Zhang, M. Zhu et al., Visual tracking via incremental Log-Euclidean Riemannian subspace learning, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-8, 2008.
URL : https://hal.archives-ouvertes.fr/hal-01485545

P. Li and Q. Wang, Local Log-Euclidean Covariance Matrix (L2ECM) for Image Representation and Its Applications, Computer Vision - ECCV 2012, pp.469-482, 2012.
DOI : 10.1007/978-3-642-33712-3_34

J. Liu and M. Shah, Learning human actions via information maximization, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-8, 2008.

J. Liu, J. Luo, and M. Shah, Recognizing Realistic Actions from Videos "in the Wild", CVPR, 2009.

D. G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94

B. D. Lucas and T. Kanade, An Iterative Image Registration Technique with an Application to Stereo Vision, Proceedings of the 1981 DARPA Image Understanding Workshop, pp.121-130, 1981.

J. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp.281-297, 1967.

R. Messing, C. Pal, and H. Kautz, Activity recognition using the velocity histories of tracked keypoints, 2009 IEEE 12th International Conference on Computer Vision, pp.14-27, 2009.
DOI : 10.1109/ICCV.2009.5459154

T. B. Moeslund, A. Hilton, and V. Krüger, A survey of advances in vision-based human motion capture and analysis, Computer Vision and Image Understanding, vol.104, issue.2-3, pp.90-126, 2006.
DOI : 10.1016/j.cviu.2006.08.002

S. K. Murthy, Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey, Data Mining and Knowledge Discovery, vol.2, issue.4, pp.345-389, 1998.

J. C. Niebles, H. Wang, and L. Fei-Fei, Unsupervised learning of human action categories using spatial-temporal words, BMVC, p.36, 2006.

S. Nowozin, G. Bakir, and K. Tsuda, Discriminative Subsequence Mining for Action Classification, 2007 IEEE 11th International Conference on Computer Vision, pp.1-8, 2007.
DOI : 10.1109/ICCV.2007.4409049

S. Oh, S. McCloskey, I. Kim, A. Vahdat, K. J. Cannons et al., Multimedia event detection with multimodal feature fusion and temporal concept localization, Machine Vision and Applications, pp.49-69, 2014.
DOI : 10.1007/s00138-013-0525-x

A. Oikonomopoulos, I. Patras, and M. Pantic, Spatiotemporal salient points for visual recognition of human actions, IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), vol.36, issue.3, pp.710-719, 2005.
DOI : 10.1109/TSMCB.2005.861864

T. Ojala, M. Pietikainen, and T. Maenpaa, Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns, TPAMI, 2002.

D. Oneata, J. Verbeek, and C. Schmid, Action and Event Recognition with Fisher Vectors on a Compact Feature Set, 2013 IEEE International Conference on Computer Vision, pp.37-56, 2013.
DOI : 10.1109/ICCV.2013.228
URL : https://hal.archives-ouvertes.fr/hal-00873662

O. Oreifej and Z. Liu, HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.716-723, 2013.
DOI : 10.1109/CVPR.2013.98

O. Oshin, A. Gilbert, and R. Bowden, Capturing the relative distribution of features for action recognition, Face and Gesture 2011, pp.111-116, 2011.
DOI : 10.1109/FG.2011.5771382

P. Over, G. Awad, M. Michel, J. Fiscus, G. Sanders et al., TRECVID 2014 - An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics, Proceedings of TRECVID 2014, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01230444

F. Perronnin, Y. Liu, J. Sánchez, and H. Poirier, Large-Scale Image Retrieval with Compressed Fisher Vectors, CVPR, 2010.

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher Kernel for Large-Scale Image Classification, ECCV, 2010.

P. Piro, R. Nock, F. Nielsen, and M. Barlaud, Boosting k-NN for categorization of natural scenes, CoRR, vol.abs, 1001.
URL : https://hal.archives-ouvertes.fr/hal-00481712

L. R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, in Readings in Speech Recognition, pp.267-296, 1990.

D. A. Reynolds and R. C. Rose, Robust text-independent speaker identification using Gaussian mixture speaker models, IEEE Transactions on Speech and Audio Processing, vol.3, issue.1, pp.72-83, 1995.

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation, pp.318-362, 1986.
DOI : 10.1016/B978-1-4832-1446-7.50035-2

M. S. Ryoo and J. K. Aggarwal, Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities, IEEE 12th International Conference on Computer Vision (ICCV), pp.1593-1600, 2009.

S. Sadanand and J. J. Corso, Action bank: A high-level representation of activity in video, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247806

G. Salton, Automatic information organization and retrieval, 1968.

S. Satkin and M. Hebert, Modeling the Temporal Extent of Actions, ECCV, 2010.
DOI : 10.1007/978-3-642-15549-9_39

L. K. Saul and F. Pereira, Aggregate and mixed-order Markov models for statistical language processing, CoRR, vol.9706007, 1997.

R. E. Schapire, A Brief Introduction to Boosting, Proceedings of the 16th International Joint Conference on Artificial Intelligence - IJCAI'99, pp.1401-1406, 1999.

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., pp.32-36, 2004.
DOI : 10.1109/ICPR.2004.1334462

P. Scovanner, S. Ali, and M. Shah, A 3-dimensional SIFT descriptor and its application to action recognition, Proceedings of the 15th International Conference on Multimedia, MULTIMEDIA '07, pp.357-360, 2007.
DOI : 10.1145/1291233.1291311

J. Shi and C. Tomasi, Good Features to Track, IEEE Conference on Computer Vision and Pattern Recognition (CVPR'94), pp.593-600, 1994.

F. Shi, E. Petriu, and R. Laganiere, Sampling Strategies for Real-Time Action Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, p.198, 2013.
DOI : 10.1109/CVPR.2013.335

J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook et al., Efficient Human Pose Estimation from Single Depth Images, Trans. PAMI, vol.21, pp.2012-2032, 2012.

K. Simonyan, O. M. Parkhi, A. Vedaldi, and A. Zisserman, Fisher Vector Faces in the Wild, Proceedings of the British Machine Vision Conference 2013, 2013.
DOI : 10.5244/C.27.8

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, pp.1470-1477, 2003.
DOI : 10.1109/ICCV.2003.1238663

J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman, Discovering Object Categories in Image Collections, Proceedings of the International Conference on Computer Vision, 2005.

A. F. Smeaton, P. Over, and W. Kraaij, Evaluation campaigns and TRECVid, Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, MIR '06, pp.321-330, 2006.
DOI : 10.1145/1178677.1178722
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.329.3415

B. Solmaz, S. Modiri Assari, and M. Shah, Classifying Web Videos using a Global Video Descriptor, Machine Vision and Applications, 2012.

K. Soomro, A. R. Zamir, and M. Shah, UCF101: A dataset of 101 human actions classes from videos in the wild, arXiv preprint, 2012.

V. Sreekanth, A. Vedaldi, C. V. Jawahar, and A. Zisserman, Generalized RBF feature maps for efficient detection, Proceedings of the British Machine Vision Conference (BMVC), p.61, 2010.

J. Stöttinger, B. T. Goras, T. Pönitz, A. Hanbury, N. Sebe et al., Systematic Evaluation of Spatio-Temporal Features on Comparative Video Challenges, International Workshop on Video Event Categorization, Tagging and Retrieval, in conjunction with ACCV, 2010.
DOI : 10.1007/978-3-642-22822-3_35

G. Strang, Introduction to linear algebra, Wellesley-Cambridge Press, 2009.

J. Sun, X. Wu, S. Yan, L. Cheong, T. Chua, and J. Li, Hierarchical Spatio-Temporal Context Modeling for Action Recognition, CVPR, p.26, 2009.

J. Sun, Y. Mu, S. Yan, and L. Cheong, Activity recognition using dense long-duration trajectories, 2010 IEEE International Conference on Multimedia and Expo, pp.322-327, 2010.
DOI : 10.1109/ICME.2010.5583046
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.660.4529

L. Sun, K. Jia, T. Chan, Y. Fang, G. Wang, and S. Yan, DL-SFA: Deeply-Learned Slow Feature Analysis for Action Recognition, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.336

N. Sundaram, T. Brox, and K. Keutzer, Dense point trajectories by GPU-accelerated large displacement optical flow, European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, 2010.

R. S. Sutton, Learning to Predict by the Methods of Temporal Differences, Machine Learning, vol.3, issue.1, pp.9-44, 1988.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, pp.1054-1054, 1998.

G. J. Székely and M. L. Rizzo, Brownian distance covariance, The Annals of Applied Statistics, vol.3, issue.4, pp.1236-1265, 2009.

A. P. Ta, C. Wolf, G. Lavoue, A. Baskurt, and J. Jolion, Pairwise Features for Human Action Recognition, 2010 20th International Conference on Pattern Recognition, p.31, 2010.
DOI : 10.1109/ICPR.2010.788
URL : https://hal.archives-ouvertes.fr/hal-01381471

C. Thurau and V. Hlavac, Pose primitive based human action recognition in videos or still images, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.
DOI : 10.1109/CVPR.2008.4587721

C. Tomasi and T. Kanade, Detection and tracking of point features. Shape and motion from image streams, 1991.

O. Tuzel, F. Porikli, and P. Meer, Pedestrian Detection via Classification on Riemannian Manifolds, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, issue.10, pp.1713-1727, 2008.
DOI : 10.1109/TPAMI.2008.75

V. N. Vapnik, The nature of statistical learning theory, p.41, 1995.

P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, pp.511-518, 2001.
DOI : 10.1109/CVPR.2001.990517

S. V. N. Vishwanathan, Z. Sun, N. Ampornpunt, and M. Varma, Multiple kernel learning and the SMO algorithm, Advances in Neural Information Processing Systems, pp.2361-2369, 2010.

X. Wang, G. Doretto, T. Sebastian, J. Rittscher, and P. Tu, Shape and Appearance Context Modeling, 2007 IEEE 11th International Conference on Computer Vision, pp.1-8, 2007.
DOI : 10.1109/ICCV.2007.4409019
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.360.7869

H. Wang, M. M. Ullah, A. Klaser, I. Laptev, and C. Schmid, Evaluation of local spatio-temporal features for action recognition, Proceedings of the British Machine Vision Conference 2009, pp.25-29, 2009.
DOI : 10.5244/C.23.124
URL : https://hal.archives-ouvertes.fr/inria-00439769

J. Wang, J. Yang, K. Yu, F. Lv, T. S. Huang, and Y. Gong, Locality-constrained Linear Coding for image classification, CVPR, pp.3360-3367, 2010.

H. Wang, A. Klaser, C. Schmid, and C. Liu, Action recognition by dense trajectories, CVPR 2011, pp.3169-3176, 2011.
DOI : 10.1109/CVPR.2011.5995407
URL : https://hal.archives-ouvertes.fr/inria-00583818

J. Wang, Z. Chen, and Y. Wu, Action recognition with multiscale spatio-temporal contexts, CVPR 2011, pp.3185-3192, 2011.
DOI : 10.1109/CVPR.2011.5995493

J. Wang, Z. Liu, Y. Wu, and J. Yuan, Mining actionlet ensemble for action recognition with depth cameras, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.1290-1297, 2012.
DOI : 10.1109/CVPR.2012.6247813

H. Wang, A. Kläser, C. Schmid, and C. Liu, Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, pp.60-79, 2013.
DOI : 10.1007/s11263-012-0594-8
URL : https://hal.archives-ouvertes.fr/hal-00725627

H. Wang and C. Schmid, Action recognition with improved trajectories, 2013 IEEE International Conference on Computer Vision (ICCV), pp.3551-3558, 2013.

C. J. C. H. Watkins and P. Dayan, Q-learning, Machine Learning, vol.8, issue.3, pp.279-292, 1992.

D. Weinland, R. Ronfard, and E. Boyer, A survey of vision-based methods for action representation, segmentation and recognition, Computer Vision and Image Understanding, vol.115, issue.2, pp.224-241, 2011.
DOI : 10.1016/j.cviu.2010.10.002
URL : https://hal.archives-ouvertes.fr/inria-00459653

X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang et al., Top 10 algorithms in data mining, Knowledge and Information Systems, vol.14, issue.1, pp.1-37, 2008.
DOI : 10.1007/s10115-007-0114-2

S. Wu, O. Oreifej, and M. Shah, Action recognition in videos acquired by a moving camera using motion decomposition of Lagrangian particle trajectories, 2011 International Conference on Computer Vision, pp.1419-1426, 2011.
DOI : 10.1109/ICCV.2011.6126397

X. Wu, D. Xu, L. Duan, and J. Luo, Action recognition using context and appearance distribution features, CVPR 2011, p.181, 2011.
DOI : 10.1109/CVPR.2011.5995624

B. Wu, C. Yuan, and W. Hu, Human Action Recognition Based on Context-Dependent Graph Kernels, 2014 IEEE Conference on Computer Vision and Pattern Recognition, p.181, 2014.
DOI : 10.1109/CVPR.2014.334

J. Yamato, J. Ohya, and K. Ishii, Recognizing human action in time-sequential images using hidden Markov model, 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '92), pp.379-385, 1992.

Y. Yang and M. Shah, Complex Events Detection Using Data-Driven Concepts, Computer Vision -ECCV 2012 -12th European Conference on Computer Vision Proceedings, Part III, pp.722-735, 2012.
DOI : 10.1007/978-3-642-33712-3_52
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.258.6996

A. Yilmaz and M. Shah, Recognizing human actions in videos acquired by uncalibrated moving cameras, Tenth IEEE International Conference on Computer Vision (ICCV 2005), pp.150-157, 2005.

C. Yuan, W. Hu, X. Li, S. J. Maybank, and G. Luo, Human Action Recognition under Log-Euclidean Riemannian Metric, Lecture Notes in Computer Science, vol.5994, issue.102, pp.343-353, 2009.
DOI : 10.1007/978-3-642-12307-8_32

J. Yuan, Z. Liu, and Y. Wu, Discriminative Subvolume Search for Efficient Action Detection, CVPR, 2009.

G. Yuan, C. Ho, and C. Lin, Recent Advances of Large-Scale Linear Classification, Proceedings of the IEEE, pp.2584-2603, 2012.
DOI : 10.1109/JPROC.2012.2188013

Y. Zhang, X. Liu, M. Chang, W. Ge, and T. Chen, Spatio-Temporal Phrases for Activity Recognition, Computer Vision - ECCV 2012, pp.707-721, 2012.
DOI : 10.1007/978-3-642-33712-3_51

X. Zhou, K. Yu, T. Zhang, and T. S. Huang, Image Classification Using Super-Vector Coding of Local Image Descriptors, Computer Vision - ECCV 2010, pp.141-154, 2010.
DOI : 10.1007/978-3-642-15555-0_11

Q. Zhou, G. Wang, K. Jia, and Q. Zhao, Learning to Share Latent Tasks for Action Recognition, 2013 IEEE International Conference on Computer Vision, pp.2264-2271, 2013.
DOI : 10.1109/ICCV.2013.281