Human activity analysis, ACM Computing Surveys, vol.43, issue.3, pp.16-43, 2011. ,
DOI : 10.1145/1922649.1922653
Good Practice in Large-Scale Learning for Image Classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, issue.3, pp.59-82, 2013. ,
DOI : 10.1109/TPAMI.2013.146
URL : https://hal.archives-ouvertes.fr/hal-00690014
Multimodal fusion for multimedia analysis: a survey, Multimedia Systems, vol.24, issue.11, pp.1-35, 2010. ,
DOI : 10.1007/s00530-010-0182-0
Optimization with sparsity-inducing penalties. arXiv preprint, p.72, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00613125
Structured Sparsity through Convex Optimization, Statistical Science, vol.27, issue.4, pp.450-468, 2012. ,
DOI : 10.1214/12-STS394
URL : https://hal.archives-ouvertes.fr/hal-00621245
Multiple kernel learning, conic duality, and the SMO algorithm, Twenty-first international conference on Machine learning, ICML '04, pp.6-59, 2004. ,
DOI : 10.1145/1015330.1015424
Video event classification using bag of words and string kernels. Image Analysis and Processing – ICIAP, pp.170-178, 2009. ,
Event Detection and Recognition for Semantic Annotation of Video, Multimedia Tools and Applications, pp.1-24, 2010. ,
Trajectories based descriptor for dynamic events annotation, Proceedings of the 2011 joint ACM workshop on Modeling and representing events, J-MRE '11, pp.13-18, 2011. ,
DOI : 10.1145/2072508.2072512
A new point process model for trajectory-based events annotation, Image Processing: Machine Vision Applications V, pp.83000-138, 2012. ,
DOI : 10.1117/12.912088
Informedia@ trecvid 2011 multimedia event detection, semantic indexing, TREC Video Retrieval Evaluation Workshop, vol.1, pp.107-123, 2011. ,
Visual objects in context, Nature Reviews Neuroscience, vol.8, issue.8, pp.617-629, 2004. ,
DOI : 10.1016/0001-6918(66)90003-5
Interprétation temps de mouvement réel, RFIA, p.44, 2012. ,
Surf: Speeded up robust features, Computer Vision – ECCV, pp.404-417, 2006. ,
Shape matching and object recognition using shape contexts, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, issue.4, pp.509-522, 2002. ,
DOI : 10.1109/34.993558
Spatial interaction and the statistical analysis of lattice systems, Journal of the Royal Statistical Society. Series B (Methodological), pp.192-236, 1974. ,
Augmenting bag-of-words: Data-driven discovery of temporal and structural information for activity recognition ,
Scene perception: Detecting and judging objects undergoing relational violations, Cognitive Psychology, vol.14, issue.2, pp.143-177, 1982. ,
DOI : 10.1016/0010-0285(82)90007-X
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.405.408
Actions as spacetime shapes, Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, pp.1395-1402, 2005. ,
DOI : 10.1109/iccv.2005.28
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.8218
The recognition of human movement using temporal templates, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, issue.3, pp.44-46, 2001. ,
DOI : 10.1109/34.910878
State-of-the-art in visual attention modeling. Transactions on PAMI, 0198. ,
Ask the locals: Multi-way local pooling for image recognition, 2011 International Conference on Computer Vision, pp.2651-2658, 2011. ,
DOI : 10.1109/ICCV.2011.6126555
URL : https://hal.archives-ouvertes.fr/hal-00646816
Object Segmentation by Long Term Analysis of Point Trajectories, Computer Vision – ECCV 2010, pp.282-295, 2010. ,
DOI : 10.1007/978-3-642-15555-0_21
Multiple kernel learning for visual object recognition: A review, IEEE Transactions on Pattern Analysis and Machine Intelligence, p.59, 2013. ,
A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery, pp.121-167, 1998. ,
Scene Aligned Pooling for Complex Video Recognition, pp.79-130, 2012. ,
DOI : 10.1007/978-3-642-33709-3_49
Semantic Segmentation with Second-Order Pooling, Computer Vision – ECCV 2012, pp.430-443 ,
DOI : 10.1007/978-3-642-33786-4_32
Skeleton point trajectories for human daily activity recognition, VISAPP, 2013. 44, p.56 ,
Horizon 2020: The EU Framework Programme for Research and Innovation, p.33, 2011. ,
On the algorithmic implementation of multiclass kernel-based vector machines, The Journal of Machine Learning Research, vol.2, pp.265-292, 2002. ,
Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.886-893, 2005. ,
DOI : 10.1109/CVPR.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512
Human Detection Using Oriented Histograms of Flow and Appearance, Computer Vision – ECCV, vol.38, issue.1, pp.428-441, 2006. ,
DOI : 10.1023/A:1008162616689
URL : https://hal.archives-ouvertes.fr/inria-00548587
Multimedia movie segmentation using low-level and semantic features, p.62 ,
Object/background scene classification in photographs using linguistic statistics from the web, p.60, 2008. ,
Construction and Analysis of a Large Scale Image Ontology, p.186, 2009. ,
Marked point process in image analysis, IEEE Signal Processing Magazine, vol.19, issue.5, pp.77-84, 2002. ,
DOI : 10.1109/MSP.2002.1028354
Behavior Recognition via Sparse Spatio-Temporal Features, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp.65-72, 2006. ,
DOI : 10.1109/VSPETS.2005.1570899
The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.73, issue.2, p.48, 2010. ,
DOI : 10.1007/s11263-009-0275-4
Color appearance models, p.169, 2006. ,
DOI : 10.1002/9781118653128
Learning hierarchical features for scene labeling. Transactions on Pattern Analysis and Machine Intelligence, 0201. ,
URL : https://hal.archives-ouvertes.fr/hal-00742077
Two-Frame Motion Estimation Based on Polynomial Expansion, Image Analysis, vol.51, issue.168, p.170, 2003. ,
DOI : 10.1007/3-540-45103-X_50
Improving "bag-of-keypoints" image categorization: generative models and pdf-kernels, p.52, 2005. ,
Pose search: Retrieving people using their pose, 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2009. ,
DOI : 10.1109/CVPR.2009.5206495
Where computer vision needs help from computer science, ACM-SIAM Symposium on Discrete Algorithms. SIAM, p.34, 2011. ,
DOI : 10.1137/1.9781611973082.64
Color-based object recognition, Image Analysis and Processing, pp.319-326, 1997. ,
DOI : 10.1016/S0031-3203(98)00036-3
Simulation procedures and likelihood inference for spatial point processes, Scandinavian Journal of Statistics, vol.138, pp.359-373, 1994. ,
Fast realistic multi-action recognition using mined dense spatio-temporal features, 2009 IEEE 12th International Conference on Computer Vision, p.78, 2010. ,
DOI : 10.1109/ICCV.2009.5459335
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.158.3113
Action recognition using mined hierarchical compound features. Transactions on PAMI, pp.129-196, 2011. ,
DOI : 10.1109/tpami.2010.144
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.301.1835
Multiple kernel learning algorithms. The journal of machine learning, p.71, 2011. ,
Youtube online statistic, 2013. URL http, p.74 ,
Actions as spacetime shapes. Transactions on Pattern Analysis and Machine Intelligence, pp.45-63, 2007. ,
Biconvex sets and optimization with biconvex functions: a survey and extensions, Mathematical Methods of Operations Research, vol.21, issue.1, p.109, 2007. ,
DOI : 10.1007/s00186-007-0161-1
Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels, Advanced Video and Signal Based Surveillance, p.112, 2010. ,
DOI : 10.1007/978-3-642-17711-8_30
Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.10, pp.44-55, 2009. ,
DOI : 10.1109/TPAMI.2009.83
Discriminative spatial pyramid, CVPR 2011, pp.160-165, 2011. ,
DOI : 10.1109/CVPR.2011.5995691
A Combined Corner and Edge Detector, Proceedings of the Alvey Vision Conference 1988, pp.50-99, 1988. ,
DOI : 10.5244/C.2.23
Classification of video events using 4-dimensional time-compressed motion features, Proceedings of the 6th ACM international conference on Image and video retrieval, CIVR '07, pp.178-185, 2007. ,
DOI : 10.1145/1282280.1282311
How many high-level concepts will fill the semantic gap in news video retrieval?, Proceedings of the 6th ACM international conference on Image and video retrieval, CIVR '07, pp.44-201, 2007. ,
DOI : 10.1145/1282280.1282369
A Multi-Pronged Approach to Improving Semantic Extraction of News Video, Journal of Signal Processing Systems, vol.2, issue.2, pp.373-385, 2010. ,
DOI : 10.1007/s11265-009-0382-z
Description of interest regions with center-symmetric local binary patterns. Computer Vision, Graphics and Image Processing, pp.58-69, 2006. ,
Semantic analysis of soccer video using dynamic Bayesian network. Multimedia, IEEE Transactions on, vol.8, issue.4, pp.749-760, 2006. ,
Salient coding for image classification, CVPR 2011, pp.1753-1760, 2011. ,
DOI : 10.1109/CVPR.2011.5995682
Object, Scene and Actions: Combining Multiple Features for Human Action Recognition, Computer Vision – ECCV, pp.494-507, 2010. ,
DOI : 10.1007/978-3-642-15549-9_36
Tokyotech+ canon at trecvid 2011, Proceedings of NIST TRECVID Workshop, p.62, 2011. ,
Computational modelling of visual attention, Nature Reviews Neuroscience, vol.2, issue.3, pp.194-203, 2001. ,
DOI : 10.1038/35058500
A model of saliency-based visual attention for rapid scene analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.20, issue.11, p.168, 1998. ,
DOI : 10.1109/34.730558
Representing Videos Using Mid-level Discriminative Patches, 2013 IEEE Conference on Computer Vision and Pattern Recognition, p.77, 2013. ,
DOI : 10.1109/CVPR.2013.332
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.307.3329
Better Exploiting Motion for Better Action Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.40-51, 2013. ,
DOI : 10.1109/CVPR.2013.330
URL : https://hal.archives-ouvertes.fr/hal-00813014
The principles of psychology, p.158, 1980. ,
Aggregating local descriptors into a compact image representation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.3304-3311, 2010. ,
DOI : 10.1109/CVPR.2010.5540039
URL : https://hal.archives-ouvertes.fr/inria-00548637
Aggregating Local Image Descriptors into Compact Codes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.9, 0197. ,
DOI : 10.1109/TPAMI.2011.235
Beyond spatial pyramids: Receptive field learning for pooled image features, CVPR. IEEE, p.165, 2012. ,
Video anomaly detection in spatiotemporal context, 2010 IEEE International Conference on Image Processing, p.78, 2010. ,
DOI : 10.1109/ICIP.2010.5650993
Advanced Techniques for Semantic Concept Detection in General Videos, p.60, 2010. ,
Domain adaptive semantic diffusion for large scale context-based video annotation, pp.1420-1427, 2010. ,
Columbia-ucf trecvid2010 multimedia event detection: Combining multiple modalities, contextual concepts, and temporal matching, TRECVID, p.62, 2010. ,
Consumer video understanding, Proceedings of the 1st ACM International Conference on Multimedia Retrieval, ICMR '11, pp.29-62, 2011. ,
DOI : 10.1145/1991996.1992025
High-level event recognition in unconstrained videos, International Journal of Multimedia Information Retrieval, vol.73, issue.2, pp.1-29, 2012. ,
DOI : 10.1007/s13735-012-0024-2
Trajectory-Based Modeling of Human Actions with Motion Reference Points, pp.51-193, 2012. ,
DOI : 10.1007/978-3-642-33715-4_31
Visual perception of biological motion and a model for its analysis, Perception & Psychophysics, vol.4, issue.2, p.54, 1973. ,
DOI : 10.3758/BF03212378
L1-regularized logistic regression stacking and transductive crf smoothing for action recognition in video ,
Is early vision optimized for extracting higher-order dependencies?, Advances in Neural Information Processing Systems (NIPS), pp.99-101, 2006. ,
Efficient visual event detection using volumetric features, p.50, 2005. ,
Learning human actions in video, p.34, 2010. ,
A Spatio-Temporal Descriptor Based on 3D-Gradients, Proceedings of the British Machine Vision Conference 2008, pp.995-1004, 2008. ,
DOI : 10.5244/C.22.99
Motion Interchange Patterns for Action Recognition in Unconstrained Videos, p.69, 2012. ,
DOI : 10.1007/978-3-642-33783-3_19
Information Fusion in Multimedia Information Retrieval, Adaptive Multimedial Retrieval: Retrieval, User, and Semantics, pp.147-159, 2008. ,
DOI : 10.1007/978-3-540-79860-6_12
Comparison of mid-level feature coding approaches and pooling strategies in visual concept detection, Computer Vision and Image Understanding, vol.117, issue.5, p.163, 2012. ,
DOI : 10.1016/j.cviu.2012.10.010
Learning a hierarchy of discriminative spacetime neighborhood features for human action recognition, CVPR. IEEE, pp.66-130, 2010. ,
Modeling spatial layout with fisher vectors for image categorization, 2011 International Conference on Computer Vision, pp.1487-1494, 2011. ,
DOI : 10.1109/ICCV.2011.6126406
URL : https://hal.archives-ouvertes.fr/inria-00612277
Image classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (NIPS), 0201. ,
HMDB: A large video database for human motion recognition, 2011 International Conference on Computer Vision, pp.40-188, 2011. ,
DOI : 10.1109/ICCV.2011.6126543
Discriminative random fields: A discriminative framework for contextual interaction in classification, p.61, 2008. ,
Conditional random fields: Probabilistic models for segmenting and labeling sequence data, MACHINE LEARNING-INTERNATIONAL WORKSHOP THEN CONFERENCE, pp.282-289, 2001. ,
Resource Constrained Multimedia Event Detection, In ACM Multimedia Modeling. IEEE, vol.186, p.194, 2014. ,
DOI : 10.1007/978-3-319-04114-8_33
Multimedia classification and event detection using double fusion, Multimedia Tools and Applications, pp.1-15, 2013. ,
DOI : 10.1007/s11042-013-1391-2
On Space-Time Interest Points, International Journal of Computer Vision, vol.17, issue.8, pp.107-123, 2005. ,
DOI : 10.1007/s11263-005-1838-7
Retrieving actions in movies, 2007 IEEE 11th International Conference on Computer Vision, pp.1-8, 2007. ,
DOI : 10.1109/ICCV.2007.4409105
Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.130-134, 2008. ,
DOI : 10.1109/CVPR.2008.4587756
URL : https://hal.archives-ouvertes.fr/inria-00548659
Understanding video events: a survey of methods for automatic interpretation of semantic occurrences in video. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, vol.39, issue.5, pp.489-504, 2009. ,
Supervised learning of quantizer codebooks by information loss minimization. Transactions on Pattern Analysis and Machine Intelligence, p.52, 2009. ,
A sparse texture representation using local affine regions, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, issue.8, pp.1265-1278, 2005. ,
DOI : 10.1109/TPAMI.2005.151
URL : https://hal.archives-ouvertes.fr/inria-00548530
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), pp.130-134, 2006. ,
DOI : 10.1109/CVPR.2006.68
URL : https://hal.archives-ouvertes.fr/inria-00548585
Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis, CVPR 2011, p.201, 2011. ,
DOI : 10.1109/CVPR.2011.5995496
A tutorial on energy-based learning, Predicting Structured Data, vol.1, issue.81, p.82, 2006. ,
Multicategory Support Vector Machines, Journal of the American Statistical Association, vol.99, issue.465, pp.9967-81, 2004. ,
DOI : 10.1198/016214504000000098
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.1879
Track to the future: Spatio-temporal video segmentation with long-range motion cues, CVPR 2011, p.205, 2011. ,
DOI : 10.1109/CVPR.2011.6044588
URL : https://hal.archives-ouvertes.fr/hal-00817961
Object bank: A high-level image representation for scene classification & semantic feature sparsification, Advances in neural information processing systems, pp.44-56, 2010. ,
Feature detection with automatic scale selection, International Journal of Computer Vision, vol.30, issue.2, pp.79-116, 1998. ,
DOI : 10.1023/A:1008045108935
Recognizing realistic actions from videos "in the wild", pp.66-67 ,
In defense of soft-assignment coding, p.144, 2011. ,
Object recognition from local scale-invariant features, Proceedings of the Seventh IEEE International Conference on Computer Vision, pp.49-75, 1999. ,
DOI : 10.1109/ICCV.1999.790410
Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004. ,
DOI : 10.1023/B:VISI.0000029664.99615.94
Thinking of Images as What They Are: Compound Matrix Regression for Image Classification, International Joint Conferences on Artificial Intelligence (IJCAI), 2013. 99, p.110 ,
Some methods for classification and analysis of multivariate observations, Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp.14-96, 1967. ,
Ensemble of exemplar-SVMs for object detection and beyond, 2011 International Conference on Computer Vision, 0200. ,
DOI : 10.1109/ICCV.2011.6126229
Texture features for browsing and retrieval of image data. Transactions on Pattern Analysis and Machine Intelligence, p.45, 1996. ,
Big data: The next frontier for innovation, competition, and productivity, p.33, 2011. ,
Actions in context. In Computer Vision and Pattern Recognition, CVPR 2009. IEEE Conference on, pp.2929-2936, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00548645
Le langage cinématographique, Cerf, vol.75, p.131, 1985. ,
Robust wide-baseline stereo from maximally stable extremal regions, Image and Vision Computing, vol.22, issue.10, pp.761-767, 2004. ,
DOI : 10.1016/j.imavis.2004.02.006
Trajectons: Action recognition through the motion analysis of tracked features, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, p.95, 2009. ,
DOI : 10.1109/ICCVW.2009.5457659
Searching informative concept banks for video event detection, Proceedings of the 3rd ACM conference on International conference on multimedia retrieval, ICMR '13, pp.255-262 ,
DOI : 10.1145/2461466.2461507
Semantic model vectors for complex video event recognition. Multimedia, IEEE Transactions on, vol.14, issue.56, pp.88-101 ,
Activity recognition using the velocity histories of tracked keypoints, 2009 IEEE 12th International Conference on Computer Vision, pp.104-111, 2009. ,
DOI : 10.1109/ICCV.2009.5459154
Local Invariant Feature Tracks for High-Level Video Feature Extraction, Proceedings of the 11th International Workshop on Image Analysis for Multimedia Interactive Services, pp.44-51, 2010. ,
DOI : 10.1007/978-1-4614-3831-1_10
Microsoft kinect, 2013. URL http ,
Scale & affine invariant interest point detectors. IJCV, pp.163-171, 2004. ,
URL : https://hal.archives-ouvertes.fr/inria-00548554
Learning saliency maps for object categorization, p.163, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-00203726
Combined ordered and improved trajectories for large scale human action recognition ,
A probabilistic framework for semantic video indexing, filtering, and retrieval. Multimedia, IEEE Transactions on, vol.60, p.78, 2002. ,
Bbn viser trecvid 2011 multimedia event detection system, NIST TRECVID Workshop, p.62, 2011. ,
QBIC project: querying images by content, using color, texture, and shape, Storage and Retrieval for Image and Video Databases, p.45, 1993. ,
DOI : 10.1117/12.143648
Sampling Strategies for Bag-of-Features Image Classification, p.49, 2006. ,
DOI : 10.1007/11744085_38
URL : https://hal.archives-ouvertes.fr/hal-00203752
Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, issue.7, pp.971-987, 2002. ,
DOI : 10.1109/TPAMI.2002.1017623
Modeling the shape of the scene: A holistic representation of the spatial envelope, International Journal of Computer Vision, vol.42, issue.3, pp.145-175, 2001. ,
DOI : 10.1023/A:1011139631724
CRFs for Image Classification, p.61, 2003. ,
Determining Patch Saliency Using Low-Level Context, p.163, 2008. ,
DOI : 10.1007/978-3-540-88688-4_33
Structured Learning of Human Interactions in TV Shows, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.12, pp.14-40, 2012. ,
DOI : 10.1109/TPAMI.2012.24
Fisher Kernels on Visual Vocabularies for Image Categorization, 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 0197. ,
DOI : 10.1109/CVPR.2007.383266
Improving the fisher kernel for large-scale image classification, Computer Vision – ECCV 2010, pp.143-156, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00548630
Bilinear classifiers for visual recognition, Advances in Neural Information Processing Systems (NIPS), pp.1482-1490, 2009. ,
Cea list's participation at mediaeval 2012 placing task, p.77 ,
Vision-based human motion analysis: An overview. Computer vision and image understanding, pp.4-18, 2007. ,
A survey on vision-based human action recognition, Image and Vision Computing, vol.28, issue.6, p.43, 2010. ,
DOI : 10.1016/j.imavis.2009.11.014
Correlative multilabel video annotation, Proceedings of the 15th international conference on Multimedia, pp.17-26, 2007. ,
Segmenting Salient Objects from Images and Videos, pp.168-169, 2010. ,
DOI : 10.1007/978-3-642-15555-0_27
Joint pose estimation and action recognition in image graphs, 2011 18th IEEE International Conference on Image Processing, p.55, 2011. ,
DOI : 10.1109/ICIP.2011.6116197
URL : https://hal.archives-ouvertes.fr/hal-01063329
Learning to parse images of articulated bodies, Advances in neural information processing systems, p.54, 2006. ,
Real-time classification of dance gestures from skeleton animation, Proceedings of the 2011 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '11, pp.44-56, 2011. ,
DOI : 10.1145/2019406.2019426
Recognizing 50 human action categories of web videos. MVA, 2012, pp.67-201 ,
Gibbs point processes for studying the development of spatial-temporal stochastic processes, Computational Statistics & Data Analysis, vol.36, issue.1, pp.85-105, 2001. ,
DOI : 10.1016/S0167-9473(00)00028-1
What helps where – and why? Semantic relatedness for knowledge transfer, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.910-917, 2010. ,
DOI : 10.1109/CVPR.2010.5540121
An overview of contest on semantic description of human activities (sdha) 2010. Recognizing Patterns in Signals, Speech, Images and Videos, pp.270-285, 2010. ,
Action bank: A high-level representation of activity in video, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.69-193, 2012. ,
DOI : 10.1109/CVPR.2012.6247806
Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., pp.65-66 ,
DOI : 10.1109/ICPR.2004.1334462
Locality-constrained and spatially regularized coding for scene categorization, 2012 IEEE Conference on Computer Vision and Pattern Recognition, p.52, 2012. ,
DOI : 10.1109/CVPR.2012.6248107
Top-down color attention for object recognition, 2009 IEEE 12th International Conference on Computer Vision, p.163, 2009. ,
DOI : 10.1109/ICCV.2009.5459362
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes, Journal of Machine Learning Research, p.143, 2013. ,
Discriminative spatial saliency for image classification, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.162-163 ,
DOI : 10.1109/CVPR.2012.6248093
URL : https://hal.archives-ouvertes.fr/hal-00714311
Sampling Strategies for Real-Time Action Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.66-193 ,
DOI : 10.1109/CVPR.2013.335
Unsupervised Discovery of Mid-Level Discriminative Patches, p.77, 2012. ,
DOI : 10.1007/978-3-642-33709-3_6
Positive definite dictionary learning for region covariances, 2011 International Conference on Computer Vision, pp.99-101, 2011. ,
DOI : 10.1109/ICCV.2011.6126346
Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, pp.95-96, 2003. ,
DOI : 10.1109/ICCV.2003.1238663
Evaluation campaigns and TRECVid, Proceedings of the 8th ACM international workshop on Multimedia information retrieval, MIR '06, pp.321-330, 2006. ,
DOI : 10.1145/1178677.1178722
Content-based image retrieval at the end of the early years. Pattern Analysis and Machine Intelligence ,
Multimedia semantic indexing using model vectors, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698), 2003. ,
DOI : 10.1109/ICME.2003.1221649
Covariance, subspace, and intrinsic Cramér-Rao bounds, IEEE Transactions on Signal Processing, vol.53, issue.5, pp.1610-1630, 2005. ,
DOI : 10.1109/TSP.2005.845428
Concept-Based Video Retrieval, Foundations and Trends® in Information Retrieval, vol.2, issue.4, pp.215-322, 2008. ,
DOI : 10.1561/1500000014
Early versus late fusion in semantic video analysis, Proceedings of the 13th annual ACM international conference on Multimedia, MULTIMEDIA '05, pp.399-402, 2005. ,
DOI : 10.1145/1101149.1101236
Classifying web videos using a global video descriptor. MVA, 2012, p.69 ,
Ucf101: A dataset of 101 human actions classes from videos in the wild, pp.68-185 ,
Hierarchical spatiotemporal context modeling for action recognition, Computer Vision and Pattern Recognition (CVPR), pp.62-77, 2009. ,
Human activity detection from rgbd images, Plan, Activity, and Intent Recognition, p.64, 2011. ,
An Introduction to Conditional Random Fields for Relational Learning. Introduction to statistical relational learning, pp.93-61, 2007. ,
Separating Style and Content with Bilinear Models, Neural Computation, vol.13, issue.6, pp.1247-1283, 2000. ,
DOI : 10.1016/0167-6393(88)90018-0
The TUM Kitchen Data Set of everyday manipulation activities for motion tracking and action recognition, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, pp.1089-1096, 2009. ,
DOI : 10.1109/ICCVW.2009.5457583
Efficient Object Category Recognition Using Classemes, Computer Vision – ECCV 2010, pp.44-57, 2010. ,
DOI : 10.1007/978-3-642-15549-9_56
A feature-integration theory of attention, Cognitive psychology, vol.157, p.159, 1980. ,
Dense interest points, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.2281-2288, 2010. ,
DOI : 10.1109/CVPR.2010.5539911
Region covariance: A fast descriptor for detection and classification, Computer Vision – ECCV, vol.99, issue.100188, pp.101-187, 2006. ,
UCF. Thumos: The first international workshop on action recognition with a large number of classes ,
A statistical overview of recent literature in information fusion, Information Fusion Proceedings of the Third International Conference on, p.62, 2000. ,
URL : https://hal.archives-ouvertes.fr/hal-00514175
Evaluating color descriptors for object and scene recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol.32, issue.48, pp.1582-1596, 2010. ,
Visual word ambiguity. Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol.32, issue.7, pp.1271-1283, 2010. ,
Is early vision optimized for extracting higher-order dependencies? Theory of Probability & Its Application, p.107, 1971. ,
A generic framework for semantic sports video analysis using dynamic bayesian networks, 2005. ,
Video event detection using motion relativity and visual relatedness, Proceeding of the 16th ACM international conference on Multimedia, MM '08, pp.239-248, 2008. ,
DOI : 10.1145/1459359.1459392
Lear-inria submission for the thumos workshop, p.14 ,
Evaluation of local spatio-temporal features for action recognition, Proceedings of the British Machine Vision Conference 2009, pp.75-78, 2009. ,
DOI : 10.5244/C.23.124
URL : https://hal.archives-ouvertes.fr/inria-00439769
Action recognition by dense trajectories, CVPR 2011, pp.98-115, 0193. ,
DOI : 10.1109/CVPR.2011.5995407
URL : https://hal.archives-ouvertes.fr/inria-00583818
Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, vol.73, issue.2, pp.1-20 ,
DOI : 10.1007/s11263-012-0594-8
URL : https://hal.archives-ouvertes.fr/hal-00725627
Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, p.51, 2013. ,
DOI : 10.1109/ICCV.2013.441
URL : https://hal.archives-ouvertes.fr/hal-00873267
Locality-constrained Linear Coding for image classification, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.3360-3367, 2010. ,
DOI : 10.1109/CVPR.2010.5540018
Learning sparse covariance patterns for natural scenes, Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp.2767-2774 ,
Local intensity order pattern for feature description, ICCV. IEEE, p.163, 2011. ,
Multi-cue fusion for semantic video indexing, Proceeding of the 16th ACM international conference on Multimedia, MM '08, pp.71-80, 2008. ,
DOI : 10.1145/1459359.1459370
Support vector machines for multi-class pattern recognition, ESANN, pp.61-72, 1999. ,
An efficient dense and scale-invariant spatio-temporal interest point detector, Computer Vision – ECCV, vol.50, pp.650-663, 2008. ,
Modeling Appearances with Low-Rank SVM, 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-6, 2007. ,
DOI : 10.1109/CVPR.2007.383099
Action recognition in videos acquired by a moving camera using motion decomposition of Lagrangian particle trajectories, 2011 International Conference on Computer Vision, pp.1419-1426, 2011. ,
DOI : 10.1109/ICCV.2011.6126397
Semantic context modeling with maximal margin Conditional Random Fields for automatic image annotation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.3368-3375, 2010. ,
DOI : 10.1109/CVPR.2010.5540015
Mining Relationship Between Video Concepts using Probabilistic Graphical Models, 2006 IEEE International Conference on Multimedia and Expo, pp.301-304, 2006. ,
DOI : 10.1109/ICME.2006.262458
Linear spatial pyramid matching using sparse coding for image classification, CVPR. IEEE, pp.88-96, 2009. ,
Recognizing human actions from still images with latent poses, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.2030-2037, 2010. ,
DOI : 10.1109/CVPR.2010.5539879
Tag localization with spatial correlations and joint group sparsity, CVPR 2011, pp.881-888, 2011. ,
DOI : 10.1109/CVPR.2011.5995499
Does Human Action Recognition Benefit from Pose Estimation?, Proceedings of the British Machine Vision Conference 2011, pp.44-56, 2011. ,
DOI : 10.5244/C.25.67
Modeling mutual context of object and human pose in human-object interaction activities, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.17-24, 2010. ,
DOI : 10.1109/CVPR.2010.5540235
Eye movements and vision, p.159, 1967. ,
DOI : 10.1007/978-1-4899-5379-7
Color texture moments for content-based image retrieval, International Conference on Image Processing, p.45, 2002. ,
Nonlinear learning using local coordinate coding, NIPS, vol.96, p.97, 2009. ,
Learning image representations from the pixel level via hierarchical sparse coding, CVPR 2011, pp.99-101 ,
DOI : 10.1109/CVPR.2011.5995732