Human activity analysis: A review, ACM Computing Surveys (CSUR), vol.43, issue.3, p.16, 2011. ,
Label-embedding for attribute-based classification, CVPR, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00815747
Multiscale scattering for audio classification, ISMIR, 2011. ,
Deep scattering spectrum. Transactions on signal processing, pp.4114-4128, 2013. ,
ScatNet (v0.2), 2013. ,
Kernel change-point detection, 2012. ,
Inria @ TRECVID'2011: Copy detection & multimedia event detection, TRECVID, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00648016
Learning deep architectures for AI. Foundations and Trends® in ,
Pattern recognition and machine learning, 2009. ,
Finding Actors and Actions in Movies, 2013 IEEE International Conference on Computer Vision, 2013. ,
DOI : 10.1109/ICCV.2013.283
URL : https://hal.archives-ouvertes.fr/hal-00904991
The OpenCV Library. Dr. Dobb's Journal of Software Tools, 2000. ,
Scene Aligned Pooling for Complex Video Recognition, ECCV, 2012. ,
DOI : 10.1007/978-3-642-33709-3_49
LIBSVM, ACM Transactions on Intelligent Systems and Technology, vol.2, issue.3 ,
DOI : 10.1145/1961189.1961199
Movie segmentation into scenes and chapters using locally weighted bag of visual words, Proceeding of the ACM International Conference on Image and Video Retrieval, CIVR '09, 2009. ,
DOI : 10.1145/1646396.1646439
The devil is in the details: an evaluation of recent feature encoding methods, Proceedings of the British Machine Vision Conference 2011, 2011. ,
DOI : 10.5244/C.25.76
SRI-Sarnoff AURORA system at TRECVID 2012 multimedia event detection and recounting, TRECVID Workshop, 2012. ,
Towards Scalable Summarization of Consumer Videos Via Sparse Dictionary Selection, Transactions on Multimedia, 2012. ,
DOI : 10.1109/TMM.2011.2166951
Movie/Script: Alignment and Parsing of Video and Text Transcription, ECCV, 2008. ,
DOI : 10.1007/978-3-540-88693-8_12
Summed-area tables for texture mapping, In ACM SIGGRAPH Computer Graphics, vol.18, pp.207-212, 1984. ,
VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method, Pattern Recognition Letters, vol.32, issue.1, pp.56-68, 2011. ,
DOI : 10.1016/j.patrec.2010.08.004
Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors, Video Mining, 2003. ,
DOI : 10.1007/978-1-4757-6928-9_4
DeCAF: A deep convolutional activation feature for generic visual recognition. arXiv preprint, 2013. ,
The INRIA-LIM-VocR and AXES submissions to Trecvid 2014 Multimedia Event Detection, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01089916
Automatic annotation of human actions in video, 2009 IEEE 12th International Conference on Computer Vision, 2009. ,
DOI : 10.1109/ICCV.2009.5459279
Pattern classification, 2012. ,
"Hello! My name is... Buffy" -- Automatic Naming of Characters in TV Video, Proceedings of the British Machine Vision Conference 2006, 2006. ,
DOI : 10.5244/C.20.92
The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010. ,
DOI : 10.1007/s11263-009-0275-4
A Bayesian Hierarchical Model for Learning Natural Scene Categories, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.524-531, 2005. ,
DOI : 10.1109/CVPR.2005.16
Computer Vision: A Modern Approach ,
Temporal Localization of Actions with Actoms, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.11, 2013. ,
DOI : 10.1109/TPAMI.2013.65
URL : https://hal.archives-ouvertes.fr/hal-00687312
Activity representation with motion hierarchies, International Journal of Computer Vision, vol.107, issue.3, pp.219-238, 2014. ,
DOI : 10.1007/s11263-013-0677-1
URL : https://hal.archives-ouvertes.fr/hal-00908581
Finding action tubes, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. ,
DOI : 10.1109/CVPR.2015.7298676
Efficient hierarchical graph-based video segmentation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010. ,
DOI : 10.1109/CVPR.2010.5539893
Creating Summaries from User Videos, ECCV, 2014. ,
DOI : 10.1007/978-3-319-10584-0_33
Retrospective Mutiple Change-Point Estimation with Kernels, 2007 IEEE/SP 14th Workshop on Statistical Signal Processing, pp.768-772, 2007. ,
DOI : 10.1109/SSP.2007.4301363
Kernel change-point analysis, NIPS, 2008. ,
The elements of statistical learning, 2009. ,
Max-margin early event detectors, CVPR, 2012. ,
Joint segmentation and classification of human actions in video, CVPR 2011, 2011. ,
DOI : 10.1109/CVPR.2011.5995470
Automatic photo pop-up, ACM Transactions on Graphics, vol.24, issue.3, pp.577-584, 2005. ,
DOI : 10.1145/1073204.1073232
Occlusion and Motion Reasoning for Long-Term Tracking, Proc. European Conference on Computer Vision, 2014. ,
DOI : 10.1007/978-3-319-10599-4_12
URL : https://hal.archives-ouvertes.fr/hal-01020149
Representing Videos Using Mid-level Discriminative Patches, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.2571-2578, 2013. ,
DOI : 10.1109/CVPR.2013.332
Better Exploiting Motion for Better Action Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013. ,
DOI : 10.1109/CVPR.2013.330
URL : https://hal.archives-ouvertes.fr/hal-00813014
Large-Scale Video Classification with Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014. ,
DOI : 10.1109/CVPR.2014.223
Fundamentals of Statistical signal processing Detection theory, 1998. ,
Large-Scale Video Summarization Using Web-Image Priors, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013. ,
DOI : 10.1109/CVPR.2013.348
Joint Summarization of Large-Scale Collections of Web Images and Videos for Storyline Reconstruction, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014. ,
DOI : 10.1109/CVPR.2014.538
Learning Tree-structured Descriptor Quantizers for Image Categorization, BMVC, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00613118
Modeling spatial layout with fisher vectors for image categorization, 2011 International Conference on Computer Vision, pp.1487-1494, 2011. ,
DOI : 10.1109/ICCV.2011.6126406
URL : https://hal.archives-ouvertes.fr/inria-00612277
ImageNet Classification with Deep Convolutional Neural Networks, NIPS, 2012. ,
HMDB: A large video database for human motion recognition, 2011 International Conference on Computer Vision, 2011. ,
DOI : 10.1109/ICCV.2011.6126543
Conditional random fields: Probabilistic models for segmenting and labeling sequence data, ICML, 2001. ,
Double fusion for multimedia event detection, Advances in Multimedia Modeling, pp.173-185, 2012. ,
CMU-Informedia at TRECVID 2013 multimedia event detection, TRECVID Workshop, p.5, 2013. ,
Retrieving actions in movies, 2007 IEEE 11th International Conference on Computer Vision, 2007. ,
DOI : 10.1109/ICCV.2007.4409105
Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008. ,
DOI : 10.1109/CVPR.2008.4587756
URL : https://hal.archives-ouvertes.fr/inria-00548659
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), pp.2169-2178, 2006. ,
DOI : 10.1109/CVPR.2006.68
URL : https://hal.archives-ouvertes.fr/inria-00548585
Discovering important people and objects for egocentric video summarization, CVPR, 2012. ,
Track to the future: Spatio-temporal video segmentation with long-range motion cues, CVPR 2011, 2011. ,
DOI : 10.1109/CVPR.2011.6044588
URL : https://hal.archives-ouvertes.fr/hal-00817961
A Videography Analysis Framework for Video Retrieval and Summarization, Proceedings of the British Machine Vision Conference 2012, pp.1-12, 2012. ,
DOI : 10.5244/C.26.126
Object bank: A high-level image representation for scene classification & semantic feature sparsification, NIPS, pp.1378-1386, 2010. ,
Content-based movie analysis and indexing based on audiovisual cues. Circuits and Systems for Video Technology, pp.1073-1085, 2004. ,
ROUGE: A package for automatic evaluation of summaries, Text Summarization Branches Out, ACL Workshop, pp.74-81, 2004. ,
Unsupervised summarization of rushes videos, Proceedings of the international conference on Multimedia, MM '10, 2010. ,
DOI : 10.1145/1873951.1874069
Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004. ,
DOI : 10.1023/B:VISI.0000029664.99615.94
Story-Driven Summarization for Egocentric Video, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013. ,
DOI : 10.1109/CVPR.2013.350
A generic framework of user attention model and its application in video summarization, Transactions on Multimedia, 2005. ,
Action recognition from a distributed representation of pose and appearance, CVPR 2011, 2011. ,
DOI : 10.1109/CVPR.2011.5995631
Introduction to information retrieval, 2008. ,
Actions in context, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009. ,
DOI : 10.1109/CVPR.2009.5206557
URL : https://hal.archives-ouvertes.fr/inria-00548645
A Video Fingerprint Based on Visual Digest and Local Fingerprints, 2006 International Conference on Image Processing, 2006. ,
DOI : 10.1109/ICIP.2006.312834
Multimedia Event Detection system, TRECVID Workshop, 2011. ,
Multimodal feature fusion for robust event detection in web videos, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012. ,
DOI : 10.1109/CVPR.2012.6247814
Video summarization and scene detection by graph modeling. Circuits and Systems for Video Technology, 2005. ,
Modeling temporal structure of decomposable motion segments for activity classification, ECCV, 2010. ,
AXES at TRECVID 2012: KIS, INS, and MED, TRECVID Workshop, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00746874
Action and Event Recognition with Fisher Vectors on a Compact Feature Set, 2013 IEEE International Conference on Computer Vision, 2013. ,
DOI : 10.1109/ICCV.2013.228
URL : https://hal.archives-ouvertes.fr/hal-00873662
The LEAR submission at Thumos 2014, ECCV 2014 Workshop on the THUMOS Challenge 2014, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01074442
The trecvid 2008 BBC rushes summarization evaluation, Proceeding of the 2nd ACM workshop on Video summarization, TVS '08, 2008. ,
DOI : 10.1145/1463563.1463564
An overview of the goals, tasks, data, evaluation mechanisms and metrics, Proceedings of TRECVID, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00953093
An overview of the goals, tasks, data, evaluation mechanisms and metrics, Proceedings of TRECVID, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00953093
TRECVID 2014 - An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics, Proceedings of TRECVID 2014, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01230444
Fisher Kernels on Visual Vocabularies for Image Categorization, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007. ,
DOI : 10.1109/CVPR.2007.383266
Improving the Fisher Kernel for Large-Scale Image Classification, ECCV, 2010. ,
DOI : 10.1007/978-3-642-15561-1_11
URL : https://hal.archives-ouvertes.fr/inria-00548630
A survey on vision-based human action recognition, Image and Vision Computing, vol.28, issue.6, pp.976-990, 2010. ,
DOI : 10.1016/j.imavis.2009.11.014
Category-specific video summarization, ECCV, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01022967
Introduction to digital speech processing. Foundations and trends in signal processing, pp.1-194, 2007. ,
Poselet Key-Framing: A Model for Human Activity Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013. ,
DOI : 10.1109/CVPR.2013.342
Recognizing 50 human action categories of web videos. Machine Vision and Applications, pp.971-981, 2013. ,
Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008. ,
DOI : 10.1109/CVPR.2008.4587727
Neural network-based face detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol.20, issue.1, pp.23-38, 1998. ,
Exploring video structure beyond the shots, Multimedia Computing and Systems, pp.237-240, 1998. ,
Automatically extracting highlights for TV Baseball programs, Proceedings of the eighth ACM international conference on Multimedia, MULTIMEDIA '00, 2000. ,
DOI : 10.1145/354384.354443
Compressed fisher vectors for large-scale image classification, 2013. ,
UGM toolbox ,
Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., pp.32-36, 2004. ,
DOI : 10.1109/ICPR.2004.1334462
Discriminative spatial saliency for image classification, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012. ,
DOI : 10.1109/CVPR.2012.6248093
URL : https://hal.archives-ouvertes.fr/hal-00714311
Kernel methods for pattern analysis, 2004. ,
DOI : 10.1017/CBO9780511809682
Fisher Vector Faces in the Wild, Proceedings of the British Machine Vision Conference 2013, 2013. ,
DOI : 10.5244/C.27.8
Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, pp.1470-1477, 2003. ,
DOI : 10.1109/ICCV.2003.1238663
Save the cat! The last book on screenwriting you'll ever need, 2005. ,
TVSum: Summarizing web videos using titles, CVPR, pp.5179-5187, 2015. ,
UCF101: A dataset of 101 human actions classes from videos in the wild, 2012. ,
Ranking Domain-Specific Highlights by Analyzing Edited Videos, ECCV, pp.787-802, 2014. ,
DOI : 10.1007/978-3-319-10590-1_51
A utility framework for the automatic generation of audio-visual skims, Proceedings of the tenth ACM international conference on Multimedia, MULTIMEDIA '02, 2002. ,
DOI : 10.1145/641007.641042
Computer vision: algorithms and applications, 2010. ,
DOI : 10.1007/978-1-84882-935-0
Superparsing, ECCV, 2010. ,
DOI : 10.1007/s11263-012-0574-z
Beyond Dataset Bias: Multi-task Unaligned Shared Knowledge Transfer, ACCV 2012, pp.1-15, 2013. ,
DOI : 10.1007/978-3-642-37331-2_1
Efficient Object Category Recognition Using Classemes, ECCV, pp.776-789, 2010. ,
DOI : 10.1007/978-3-642-15549-9_56
Video abstraction: A systematic review and classification, ACM Transactions on Multimedia Computing, Communications, and Applications, vol.3, issue.1, p.3, 2007. ,
Local invariant feature detectors: A survey. Foundations and Trends R in Computer Graphics and Vision, pp.177-280, 2007. ,
Musical genre classification of audio signals. Speech and Audio Processing, IEEE transactions on, vol.10, issue.5, pp.293-302, 2002. ,
Rapid Summarisation and Browsing of Video Sequences, Proceedings of the British Machine Vision Conference 2002, pp.1-10, 2002. ,
DOI : 10.5244/C.16.40
Robust real-time face detection. IJCV, 2004. ,
Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, 2013. ,
DOI : 10.1109/ICCV.2013.441
URL : https://hal.archives-ouvertes.fr/hal-00873267
Action recognition by dense trajectories, CVPR 2011, pp.3169-3176, 2011. ,
DOI : 10.1109/CVPR.2011.5995407
URL : https://hal.archives-ouvertes.fr/inria-00583818
Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, 2013. ,
DOI : 10.1007/s11263-012-0594-8
URL : https://hal.archives-ouvertes.fr/hal-00725627
A Robust and Efficient Video Representation for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, 2015. ,
DOI : 10.1007/s11263-015-0846-5
URL : https://hal.archives-ouvertes.fr/hal-01145834
Event driven web video summarization by tag localization and key-shot identification, Transactions on Multimedia, vol.14, issue.4, pp.975-985, 2012. ,
Realtime summarization of user-generated videos based on semantic recognition, ACM Multimedia, pp.849-852, 2014. ,
A survey of vision-based methods for action representation, segmentation and recognition, Computer Vision and Image Understanding, vol.115, issue.2, pp.224-241, 2011. ,
DOI : 10.1016/j.cviu.2010.10.002
URL : https://hal.archives-ouvertes.fr/inria-00459653
Structure analysis of soccer video with domain knowledge and hidden Markov models, Pattern Recognition Letters, vol.25, issue.7, p.25, 2004. ,
DOI : 10.1016/j.patrec.2004.01.005
Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, International Journal of Computer Vision, vol.73, issue.2, pp.213-238, 2007. ,
DOI : 10.1007/s11263-006-9794-4
URL : https://hal.archives-ouvertes.fr/inria-00548574
Quasi Real-Time Summarization for Consumer Videos, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014. ,
DOI : 10.1109/CVPR.2014.322