J. K. Aggarwal and M. S. Ryoo, Human activity analysis: A review, ACM Computing Surveys (CSUR), vol.43, issue.3, p.16, 2011.

Z. Akata, F. Perronnin, Z. Harchaoui, and C. Schmid, Label-embedding for attribute-based classification, CVPR, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00815747

J. Andén and S. Mallat, Multiscale scattering for audio classification, ISMIR, 2011.

J. Andén and S. Mallat, Deep scattering spectrum, IEEE Transactions on Signal Processing, pp.4114-4128, 2013.

J. Andén and S. Mallat, ScatNet (v0.2), 2013.

S. Arlot, A. Celisse, and Z. Harchaoui, Kernel change-point detection, 2012.

M. Ayari, J. Delhumeau, M. Douze, H. Jégou, D. Potapov et al., Inria @ trecvid'2011: Copy detection & multimedia event detection, TRECVID, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00648016

Y. Bengio, Learning deep architectures for AI, Foundations and Trends® in

C. M. Bishop, Pattern recognition and machine learning, 2009.

P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid et al., Finding Actors and Actions in Movies, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.283
URL : https://hal.archives-ouvertes.fr/hal-00904991

G. Bradski, The OpenCV Library, Dr. Dobb's Journal of Software Tools, 2000.

L. Cao, Y. Mu, A. Natsev, S. Chang, G. Hua et al., Scene Aligned Pooling for Complex Video Recognition, ECCV, 2012.
DOI : 10.1007/978-3-642-33709-3_49

C. Chang and C. Lin, LIBSVM, ACM Transactions on Intelligent Systems and Technology, vol.2, issue.3
DOI : 10.1145/1961189.1961199

V. Chasanis, A. Kalogeratos, and A. Likas, Movie segmentation into scenes and chapters using locally weighted bag of visual words, Proceeding of the ACM International Conference on Image and Video Retrieval, CIVR '09, 2009.
DOI : 10.1145/1646396.1646439

K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman, The devil is in the details: an evaluation of recent feature encoding methods, Proceedings of the British Machine Vision Conference 2011, 2011.
DOI : 10.5244/C.25.76

H. Cheng, J. Liu, S. Ali, O. Javed, Q. Yu et al., SRI-Sarnoff AURORA system at TRECVID 2012: Multimedia event detection and recounting, TRECVID Workshop, 2012.

Y. Cong, J. Yuan, and J. Luo, Towards Scalable Summarization of Consumer Videos Via Sparse Dictionary Selection, Transactions on Multimedia, 2012.
DOI : 10.1109/TMM.2011.2166951

T. Cour, C. Jordan, E. Miltsakaki, and B. Taskar, Movie/Script: Alignment and Parsing of Video and Text Transcription, ECCV, 2008.
DOI : 10.1007/978-3-540-88693-8_12

F. C. Crow, Summed-area tables for texture mapping, ACM SIGGRAPH Computer Graphics, vol.18, pp.207-212, 1984.

S. E. F. de Avila, A. P. B. Lopes et al., VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method, Pattern Recognition Letters, vol.32, issue.1, pp.56-68, 2011.
DOI : 10.1016/j.patrec.2010.08.004

A. Divakaran, K. A. Peker, R. Radhakrishnan, Z. Xiong, and R. Cabasson, Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors, Video Mining, 2003.
DOI : 10.1007/978-1-4757-6928-9_4

J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang et al., DeCAF: A deep convolutional activation feature for generic visual recognition, arXiv preprint, 2013.

M. Douze, D. Oneata, M. Paulin, C. Leray, N. Chesneau et al., The INRIA-LIM-VocR and AXES submissions to Trecvid 2014 Multimedia Event Detection, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01089916

O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce, Automatic annotation of human actions in video, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459279

R. O. Duda, P. E. Hart, and D. G. Stork, Pattern classification, 2012.

M. Everingham, J. Sivic, and A. Zisserman, "Hello! My name is... Buffy" - Automatic Naming of Characters in TV Video, Proceedings of the British Machine Vision Conference 2006, 2006.
DOI : 10.5244/C.20.92

M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010.
DOI : 10.1007/s11263-009-0275-4

L. Fei-Fei and P. Perona, A Bayesian Hierarchical Model for Learning Natural Scene Categories, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.524-531, 2005.
DOI : 10.1109/CVPR.2005.16

D. A. Forsyth and J. Ponce, Computer Vision: A Modern Approach

A. Gaidon, Z. Harchaoui, and C. Schmid, Temporal Localization of Actions with Actoms, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.11, 2013.
DOI : 10.1109/TPAMI.2013.65
URL : https://hal.archives-ouvertes.fr/hal-00687312

A. Gaidon, Z. Harchaoui, and C. Schmid, Activity representation with motion hierarchies, International Journal of Computer Vision, vol.107, issue.3, pp.219-238, 2014.
DOI : 10.1007/s11263-013-0677-1
URL : https://hal.archives-ouvertes.fr/hal-00908581

G. Gkioxari and J. Malik, Finding action tubes, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298676

M. Grundmann, V. Kwatra, M. Han, and I. Essa, Efficient hierarchical graph-based video segmentation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539893

M. Gygli, H. Grabner, H. Riemenschneider, and L. Van Gool, Creating Summaries from User Videos, ECCV, 2014.
DOI : 10.1007/978-3-319-10584-0_33

Z. Harchaoui and O. Cappé, Retrospective Multiple Change-Point Estimation with Kernels, 2007 IEEE/SP 14th Workshop on Statistical Signal Processing, pp.768-772, 2007.
DOI : 10.1109/SSP.2007.4301363

Z. Harchaoui, F. Bach, and E. Moulines, Kernel change-point analysis, NIPS, 2008.

T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning, 2009.

M. Hoai and F. De la Torre, Max-margin early event detectors, CVPR, 2012.

M. Hoai, Z. Lan, and F. De la Torre, Joint segmentation and classification of human actions in video, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995470

D. Hoiem, A. A. Efros, and M. Hebert, Automatic photo pop-up, ACM Transactions on Graphics, vol.24, issue.3, pp.577-584, 2005.
DOI : 10.1145/1073204.1073232

Y. Hua, K. Alahari, and C. Schmid, Occlusion and Motion Reasoning for Long-Term Tracking, Proc. European Conference on Computer Vision, 2014.
DOI : 10.1007/978-3-319-10599-4_12
URL : https://hal.archives-ouvertes.fr/hal-01020149

A. Jain, A. Gupta, M. Rodriguez, and L. S. Davis, Representing Videos Using Mid-level Discriminative Patches, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.2571-2578, 2013.
DOI : 10.1109/CVPR.2013.332

M. Jain, H. Jégou, and P. Bouthemy, Better Exploiting Motion for Better Action Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.330
URL : https://hal.archives-ouvertes.fr/hal-00813014

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar et al., Large-Scale Video Classification with Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.223

S. M. Kay, Fundamentals of Statistical Signal Processing: Detection Theory, 1998.

A. Khosla, R. Hamid, C. Lin, and N. Sundaresan, Large-Scale Video Summarization Using Web-Image Priors, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.348

G. Kim, L. Sigal, and E. P. Xing, Joint Summarization of Large-Scale Collections of Web Images and Videos for Storyline Reconstruction, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.538

J. Krapac, J. Verbeek, and F. Jurie, Learning Tree-structured Descriptor Quantizers for Image Categorization, BMVC, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00613118

J. Krapac, J. Verbeek, and F. Jurie, Modeling spatial layout with fisher vectors for image categorization, 2011 International Conference on Computer Vision, pp.1487-1494, 2011.
DOI : 10.1109/ICCV.2011.6126406
URL : https://hal.archives-ouvertes.fr/inria-00612277

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS, 2012.

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, HMDB: A large video database for human motion recognition, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126543

J. Lafferty, A. McCallum, and F. C. N. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, ICML, 2001.

Z. Lan, L. Bao, S. Yu, W. Liu, and A. G. Hauptmann, Double fusion for multimedia event detection, Advances in Multimedia Modeling, pp.173-185, 2012.

Z. Lan, L. Jiang, S. Yu, S. Rawat, Y. Cai et al., CMU-Informedia at TRECVID 2013 multimedia event detection, TRECVID Workshop, p.5, 2013.

I. Laptev and P. Pérez, Retrieving actions in movies, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409105

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.
DOI : 10.1109/CVPR.2008.4587756
URL : https://hal.archives-ouvertes.fr/inria-00548659

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), pp.2169-2178, 2006.
DOI : 10.1109/CVPR.2006.68
URL : https://hal.archives-ouvertes.fr/inria-00548585

Y. Lee, J. Ghosh, and K. Grauman, Discovering important people and objects for egocentric video summarization, CVPR, 2012.

J. Lezama, K. Alahari, J. Sivic, and I. Laptev, Track to the future: Spatio-temporal video segmentation with long-range motion cues, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.6044588
URL : https://hal.archives-ouvertes.fr/hal-00817961

K. Li, S. Oh, A. A. Perera, and Y. Fu, A Videography Analysis Framework for Video Retrieval and Summarization, Proceedings of the British Machine Vision Conference 2012, pp.1-12, 2012.
DOI : 10.5244/C.26.126

L. Li, H. Su, L. Fei-Fei, and E. P. Xing, Object bank: A high-level image representation for scene classification & semantic feature sparsification, NIPS, pp.1378-1386, 2010.

Y. Li, S. Narayanan, and J. Kuo, Content-based movie analysis and indexing based on audiovisual cues, IEEE Transactions on Circuits and Systems for Video Technology, pp.1073-1085, 2004.

C. Lin, ROUGE: A package for automatic evaluation of summaries, Text Summarization Branches Out, ACL Workshop, pp.74-81, 2004.

Y. Liu, F. Zhou, W. Liu, F. De la Torre, and Y. Liu, Unsupervised summarization of rushes videos, Proceedings of the international conference on Multimedia, MM '10, 2010.
DOI : 10.1145/1873951.1874069

D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94

Z. Lu and K. Grauman, Story-Driven Summarization for Egocentric Video, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.350

Y. Ma, X. Hua, L. Lu, and H. Zhang, A generic framework of user attention model and its application in video summarization, Transactions on Multimedia, 2005.

S. Maji, L. Bourdev, and J. Malik, Action recognition from a distributed representation of pose and appearance, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995631

C. D. Manning, P. Raghavan, and H. Schütze, Introduction to information retrieval, 2008.

M. Marszalek, I. Laptev, and C. Schmid, Actions in context, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206557
URL : https://hal.archives-ouvertes.fr/inria-00548645

A. Massoudi, F. Lefebvre, C. Demarty, L. Oisel, and B. Chupeau, A Video Fingerprint Based on Visual Digest and Local Fingerprints, 2006 International Conference on Image Processing, 2006.
DOI : 10.1109/ICIP.2006.312834

P. Natarajan et al., BBN VISER TRECVID 2011 Multimedia Event Detection system, TRECVID Workshop, 2011.

P. Natarajan, S. Wu, S. Vitaladevuni, X. Zhuang, and S. Tsakalidis, Multimodal feature fusion for robust event detection in web videos, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247814

C. Ngo, Y. Ma, and H. Zhang, Video summarization and scene detection by graph modeling, IEEE Transactions on Circuits and Systems for Video Technology, 2005.

J. C. Niebles, C. Chen, and L. Fei-Fei, Modeling temporal structure of decomposable motion segments for activity classification, ECCV, 2010.

D. Oneata, M. Douze, J. Revaud, J. Schwenninger, D. Potapov et al., AXES at TRECVID 2012: KIS, INS, and MED, TRECVID Workshop, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00746874

D. Oneata, J. Verbeek, and C. Schmid, Action and Event Recognition with Fisher Vectors on a Compact Feature Set, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.228
URL : https://hal.archives-ouvertes.fr/hal-00873662

D. Oneata, J. Verbeek, and C. Schmid, The LEAR submission at Thumos 2014, ECCV 2014 Workshop on the THUMOS Challenge 2014, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01074442

P. Over, A. F. Smeaton, and G. Awad, The TRECVID 2008 BBC rushes summarization evaluation, Proceeding of the 2nd ACM workshop on Video summarization, TVS '08, 2008.
DOI : 10.1145/1463563.1463564

P. Over, G. Awad, J. Fiscus, B. Antonishek, M. Michel et al., An overview of the goals, tasks, data, evaluation mechanisms and metrics, Proceedings of TRECVID, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00953093

P. Over, G. Awad, J. Fiscus, G. Sanders, D. Joy et al., An overview of the goals, tasks, data, evaluation mechanisms and metrics, Proceedings of TRECVID, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953093

P. Over, J. Fiscus, G. Sanders, D. Joy, G. Michel et al., TRECVID 2014 - An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics, Proceedings of TRECVID 2014, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01230444

F. Perronnin and C. R. Dance, Fisher Kernels on Visual Vocabularies for Image Categorization, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383266

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher Kernel for Large-Scale Image Classification, ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_11
URL : https://hal.archives-ouvertes.fr/inria-00548630

R. Poppe, A survey on vision-based human action recognition, Image and Vision Computing, vol.28, issue.6, pp.976-990, 2010.
DOI : 10.1016/j.imavis.2009.11.014

D. Potapov, M. Douze, Z. Harchaoui, and C. Schmid, Category-specific video summarization, ECCV, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01022967

L. R. Rabiner and R. W. Schafer, Introduction to digital speech processing, Foundations and Trends in Signal Processing, pp.1-194, 2007.

M. Raptis and L. Sigal, Poselet Key-Framing: A Model for Human Activity Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.342

K. K. Reddy and M. Shah, Recognizing 50 human action categories of web videos, Machine Vision and Applications, pp.971-981, 2013.

M. D. Rodriguez, J. Ahmed, and M. Shah, Action MACH: a spatio-temporal Maximum Average Correlation Height filter for action recognition, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587727

H. A. Rowley, S. Baluja, and T. Kanade, Neural network-based face detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.20, issue.1, pp.23-38, 1998.

Y. Rui, T. S. Huang, and S. Mehrotra, Exploring video structure beyond the shots, Multimedia Computing and Systems, pp.237-240, 1998.

Y. Rui, A. Gupta, and A. Acero, Automatically extracting highlights for TV Baseball programs, Proceedings of the eighth ACM international conference on Multimedia, MULTIMEDIA '00, 2000.
DOI : 10.1145/354384.354443

J. Sanchez, F. Perronnin, T. Mensink, and J. Verbeek, Compressed fisher vectors for large-scale image classification, 2013.

M. Schmidt, UGM toolbox

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., pp.32-36, 2004.
DOI : 10.1109/ICPR.2004.1334462

G. Sharma, F. Jurie, and C. Schmid, Discriminative spatial saliency for image classification, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248093
URL : https://hal.archives-ouvertes.fr/hal-00714311

J. Shawe-taylor and N. Cristianini, Kernel methods for pattern analysis, 2004.
DOI : 10.1017/CBO9780511809682

K. Simonyan, O. M. Parkhi, A. Vedaldi, and A. Zisserman, Fisher Vector Faces in the Wild, Proceedings of the British Machine Vision Conference 2013, 2013.
DOI : 10.5244/C.27.8

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, pp.1470-1477, 2003.
DOI : 10.1109/ICCV.2003.1238663

B. Snyder, Save the cat! The last book on screenwriting you'll ever need, 2005.

Y. Song, J. Vallmitjana, A. Stent, and A. Jaimes, TVSum: Summarizing web videos using titles, CVPR, pp.5179-5187, 2015.

K. Soomro, A. R. Zamir, and M. Shah, UCF101: A dataset of 101 human actions classes from videos in the wild, 2012.

M. Sun, A. Farhadi, and S. Seitz, Ranking Domain-Specific Highlights by Analyzing Edited Videos, ECCV, pp.787-802, 2014.
DOI : 10.1007/978-3-319-10590-1_51

H. Sundaram, L. Xie, and S. Chang, A utility framework for the automatic generation of audio-visual skims, Proceedings of the tenth ACM international conference on Multimedia, MULTIMEDIA '02, 2002.
DOI : 10.1145/641007.641042

R. Szeliski, Computer vision: algorithms and applications, 2010.
DOI : 10.1007/978-1-84882-935-0

J. Tighe and S. Lazebnik, Superparsing, ECCV, 2010.
DOI : 10.1007/s11263-012-0574-z

T. Tommasi, N. Quadrianto, B. Caputo, and C. H. Lampert, Beyond Dataset Bias: Multi-task Unaligned Shared Knowledge Transfer, ACCV 2012, pp.1-15, 2013.
DOI : 10.1007/978-3-642-37331-2_1

L. Torresani, M. Szummer, and A. Fitzgibbon, Efficient Object Category Recognition Using Classemes, ECCV, pp.776-789, 2010.
DOI : 10.1007/978-3-642-15549-9_56

B. T. Truong and S. Venkatesh, Video abstraction: A systematic review and classification, ACM Transactions on Multimedia Computing, Communications, and Applications, vol.3, issue.1, p.3, 2007.

T. Tuytelaars and K. Mikolajczyk, Local invariant feature detectors: A survey, Foundations and Trends® in Computer Graphics and Vision, pp.177-280, 2007.

G. Tzanetakis and P. Cook, Musical genre classification of audio signals, IEEE Transactions on Speech and Audio Processing, vol.10, issue.5, pp.293-302, 2002.

J. Vermaak, P. Pérez, M. Gangnet, and A. Blake, Rapid Summarisation and Browsing of Video Sequences, Proceedings of the British Machine Vision Conference 2002, pp.1-10, 2002.
DOI : 10.5244/C.16.40

P. Viola and M. J. Jones, Robust real-time face detection, IJCV, 2004.

H. Wang and C. Schmid, Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.441
URL : https://hal.archives-ouvertes.fr/hal-00873267

H. Wang, A. Kläser, C. Schmid, and C. Liu, Action recognition by dense trajectories, CVPR 2011, pp.3169-3176, 2011.
DOI : 10.1109/CVPR.2011.5995407
URL : https://hal.archives-ouvertes.fr/inria-00583818

H. Wang, A. Kläser, C. Schmid, and C. Liu, Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, 2013.
DOI : 10.1007/s11263-012-0594-8
URL : https://hal.archives-ouvertes.fr/hal-00725627

H. Wang, D. Oneata, J. Verbeek, and C. Schmid, A Robust and Efficient Video Representation for Action Recognition, International Journal of Computer Vision, 2015.
DOI : 10.1007/s11263-015-0846-5
URL : https://hal.archives-ouvertes.fr/hal-01145834

M. Wang, R. Hong, G. Li, Z. Zha, S. Yan, and T. Chua, Event driven web video summarization by tag localization and key-shot identification, Transactions on Multimedia, vol.14, issue.4, pp.975-985, 2012.

X. Wang, Y. Jiang, Z. Chai, Z. Gu, X. Du et al., Realtime summarization of user-generated videos based on semantic recognition, ACM Multimedia, pp.849-852, 2014.

D. Weinland, R. Ronfard, and E. Boyer, A survey of vision-based methods for action representation, segmentation and recognition, Computer Vision and Image Understanding, vol.115, issue.2, pp.224-241, 2011.
DOI : 10.1016/j.cviu.2010.10.002
URL : https://hal.archives-ouvertes.fr/inria-00459653

L. Xie, P. Xu, S. Chang, A. Divakaran, and H. Sun, Structure analysis of soccer video with domain knowledge and hidden Markov models, Pattern Recognition Letters, vol.25, issue.7, p.25, 2004.
DOI : 10.1016/j.patrec.2004.01.005

J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, International Journal of Computer Vision, vol.73, issue.2, pp.213-238, 2007.
DOI : 10.1007/s11263-006-9794-4
URL : https://hal.archives-ouvertes.fr/inria-00548574

B. Zhao and E. P. Xing, Quasi Real-Time Summarization for Consumer Videos, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.322