The LEAR submission at Thumos 2014, ECCV International Workshop and Competition on Action Recognition with a Large Number of Classes, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01074442
A robust and efficient video representation for action recognition, ArXiv e-prints, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01145834
SLIC superpixels compared to state-of-the-art superpixel methods, PAMI, vol.34, issue.11, pp.2274-2282, 2012.
Human motion analysis: A review, CVIU, vol.73, issue.3, pp.428-440, 1999.
Human activity analysis, ACM Computing Surveys, vol.43, issue.3, pp.1-43, 2011.
DOI : 10.1145/1922649.1922653
Measuring the Objectness of Image Windows, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.11, pp.2189-2202, 2012.
DOI : 10.1109/TPAMI.2012.28
Handwritten Word Spotting with Corrected Attributes, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.130
URL : https://hal.archives-ouvertes.fr/hal-00906787
The AXES submissions at TRECVID 2013, TRECVID Workshop, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00904404
Efficient algorithms for subwindow search in object detection and localization, CVPR, 2009.
Three things everyone should know to improve object retrieval, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248018
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.370.7498
All about VLAD, CVPR, 2013.
Contour Detection and Hierarchical Image Segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.5, pp.898-916, 2011.
DOI : 10.1109/TPAMI.2010.161
Exploring large feature spaces with hierarchical multiple kernel learning, NIPS, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00319660
Space-time robust video representation for action recognition, ICCV, 2013.
Video in sentences out, Proceedings of the Annual Conference on Uncertainty in Artificial Intelligence, 2012.
SURF: Speeded up robust features, CVIU, vol.110, issue.3, pp.346-359, 2008.
Rotationally invariant image operators, ICPR, 1978.
Dynamic Programming, 1957.
Object and Action Classification with Latent Window Parameters, International Journal of Computer Vision, vol.106, issue.3, pp.237-251, 2014.
DOI : 10.1007/s11263-013-0646-8
Actions as space-time shapes, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.28
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.8218
Learning to Localize Objects with Structured Output Regression, ECCV, 2008.
DOI : 10.1007/978-3-540-88682-2_2
Movement, activity and action: the role of knowledge in the perception of motion, Philosophical Transactions of the Royal Society B: Biological Sciences, vol.352, issue.1358, pp.1257-1265, 1997.
DOI : 10.1098/rstb.1997.0108
Representing shape with a spatial pyramid kernel, Proceedings of the 6th ACM international conference on Image and video retrieval, CIVR '07, 2007.
DOI : 10.1145/1282280.1282340
Learning mid-level features for recognition, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539963
A theoretical analysis of feature pooling in visual recognition, ICML, 2010.
Shadow puppetry, Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999.
DOI : 10.1109/ICCV.1999.790422
Learning spatio-temporal graphs of human activities, ICCV, 2011.
Object Segmentation by Long Term Analysis of Point Trajectories, ECCV, 2010.
DOI : 10.1007/978-3-642-15555-0_21
Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.3, pp.500-513, 2011.
DOI : 10.1109/TPAMI.2010.143
Multi-view Super Vector for Action Recognition, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.83
Invariant features for 3-D gesture recognition, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, 1996.
DOI : 10.1109/AFGR.1996.557258
Recognition of human body motion using phase space constraints, Proceedings of IEEE International Conference on Computer Vision, 1995.
DOI : 10.1109/ICCV.1995.466880
Cross-dataset action detection, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539875
Scene Aligned Pooling for Complex Video Recognition, ECCV, 2012.
DOI : 10.1007/978-3-642-33709-3_49
CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.7, pp.1312-1328, 2012.
DOI : 10.1109/TPAMI.2011.231
Semantic Segmentation with Second-Order Pooling, ECCV, 2012.
DOI : 10.1007/978-3-642-33786-4_32
LIBSVM, ACM Transactions on Intelligent Systems and Technology, vol.2, issue.3, pp.1-27, 2011.
DOI : 10.1145/1961189.1961199
The devil is in the details: an evaluation of recent feature encoding methods, Proceedings of the British Machine Vision Conference 2011, 2011.
DOI : 10.5244/C.25.76
Propagating multi-class pixel labels throughout video frames, 2010 Western New York Image Processing Workshop, 2010.
DOI : 10.1109/WNYIPW.2010.5649773
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.188.7421
Efficient Maximum Appearance Search for Large-Scale Object Detection, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.410
Advances in human action recognition: A survey, ArXiv e-prints, 2015.
BING: Binarized Normed Gradients for Objectness Estimation at 300fps, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.3286-3293, 2014.
DOI : 10.1109/CVPR.2014.414
Poisson mixtures, Natural Language Engineering, vol.1, issue.2, pp.163-190, 1995.
Describing Textures in the Wild, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.461
URL : https://hal.archives-ouvertes.fr/hal-01109284
Image categorization using Fisher kernels of non-iid image models, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247926
URL : https://hal.archives-ouvertes.fr/hal-00685943
Segmentation Driven Object Detection with Fisher Vectors, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.369
URL : https://hal.archives-ouvertes.fr/hal-00873134
Trans-Media Pseudo-Relevance Feedback Methods in Multimedia Retrieval, Advances in Multilingual and Multimodal Information Retrieval, 2008.
DOI : 10.1007/978-3-540-85760-0_71
Efficient Multilevel Brain Tumor Segmentation With Integrated Bayesian Model Classification, IEEE Transactions on Medical Imaging, vol.27, issue.5, pp.629-640, 2008.
DOI : 10.1109/TMI.2007.912817
Support-vector networks, Machine Learning, pp.273-297, 1995.
DOI : 10.1007/BF00994018
An Efficient Approach to Semantic Segmentation, International Journal of Computer Vision, vol.95, issue.2, pp.198-212, 2011.
DOI : 10.1007/s11263-010-0344-8
Visual categorization with bags of keypoints, ECCV Workshop on Statistical Learning in Computer Vision, 2004.
Compact representation of bidirectional texture functions, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001.
DOI : 10.1109/CVPR.2001.990645
Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512
Space-time gestures, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1993.
DOI : 10.1109/CVPR.1993.341109
A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.340
Structured Forests for Fast Edge Detection, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.231
Behavior Recognition via Sparse Spatio-Temporal Features, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005.
DOI : 10.1109/VSPETS.2005.1570899
Automatic annotation of human actions in video, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459279
Category Independent Object Proposals, ECCV, 2010.
DOI : 10.1007/978-3-642-15555-0_42
Hybrid Models for Human Motion Recognition, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.179
Describing objects by their attributes, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206772
Two-Frame Motion Estimation Based on Polynomial Expansion, Proceedings of the Scandinavian Conference on Image Analysis, 2003.
DOI : 10.1007/3-540-45103-X_50
Efficient Graph-Based Image Segmentation, International Journal of Computer Vision, vol.59, issue.2, pp.167-181, 2004.
DOI : 10.1023/B:VISI.0000022288.19776.77
Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1627-1645, 2010.
DOI : 10.1109/TPAMI.2009.167
Progressive search space reduction for human pose estimation, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587468
The Representation and Matching of Pictorial Structures, IEEE Transactions on Computers, vol.22, issue.1, pp.67-92, 1973.
DOI : 10.1109/T-C.1973.223602
Spectral grouping using the Nyström method, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.26, issue.2, pp.214-225, 2004.
DOI : 10.1109/TPAMI.2004.1262185
Learning to segment moving objects in videos, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7299035
Actom sequence models for efficient action detection, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995646
URL : https://hal.archives-ouvertes.fr/inria-00575217
Activity representation with motion hierarchies, International Journal of Computer Vision, vol.107, issue.3, pp.219-238, 2013.
DOI : 10.1007/s11263-013-0677-1
URL : https://hal.archives-ouvertes.fr/hal-00908581
A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.438
Decomposing Bag of Words Histograms, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.45
URL : https://hal.archives-ouvertes.fr/hal-00874895
The Visual Analysis of Human Movement: A Survey, Computer Vision and Image Understanding, vol.73, issue.1, pp.82-98, 1999.
DOI : 10.1006/cviu.1998.0716
Fine-grained categorization by alignments, ICCV, 2013.
DOI : 10.1109/iccv.2013.215
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.643.7151
On feature combination for multiclass object classification, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459169
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.81
Deformable part models are convolutional neural networks, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298641
Multiple kernel learning algorithms, JMLR, vol.12, pp.2211-2268, 2011.
Actions as Space-Time Shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.12, pp.2247-2253, 2007.
DOI : 10.1109/TPAMI.2007.70711
Efficient hierarchical graph-based video segmentation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539893
Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.10, pp.1775-1789, 2009.
DOI : 10.1109/TPAMI.2009.83
Recommendations for video event recognition using concept vocabularies, Proceedings of the 3rd ACM conference on International conference on multimedia retrieval, ICMR '13, 2013.
DOI : 10.1145/2461466.2461482
A Combined Corner and Edge Detector, Proceedings of the Alvey Vision Conference 1988, 1988.
DOI : 10.5244/C.2.23
Combining efficient object localization and image classification, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459257
URL : https://hal.archives-ouvertes.fr/inria-00439516
Model-based vision: a program to see a walking person, Image and Vision Computing, vol.1, issue.1, pp.5-20, 1983.
DOI : 10.1016/0262-8856(83)90003-3
Semantic kernel forests from multiple taxonomies, NIPS, pp.1718-1726, 2012.
Representation and visual recognition of complex, multi-agent actions using belief networks, 1998.
Recognizing Complex Events Using Large Margin Joint Low-Level Event Model, ECCV, 2012.
DOI : 10.1007/978-3-642-33765-9_31
Exploiting generative models in discriminative classifiers, NIPS, 1999.
Better Exploiting Motion for Better Action Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.330
URL : https://hal.archives-ouvertes.fr/hal-00813014
Action Localization with Tubelets from Motion, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.100
URL : https://hal.archives-ouvertes.fr/hal-00996844
On the burstiness of visual elements, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206609
Aggregating Local Image Descriptors into Compact Codes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.9, pp.1704-1716, 2012.
DOI : 10.1109/TPAMI.2011.235
3D Convolutional Neural Networks for Human Action Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.1, pp.221-231, 2013.
DOI : 10.1109/TPAMI.2012.59
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.169.4046
Caffe, Proceedings of the ACM International Conference on Multimedia, MM '14, 2014.
DOI : 10.1145/2647868.2654889
Trajectory-Based Modeling of Human Actions with Motion Reference Points, ECCV, 2012.
DOI : 10.1007/978-3-642-33715-4_31
High-level event recognition in unconstrained videos, International Journal of Multimedia Information Retrieval, vol.2, issue.2, pp.73-101, 2013.
DOI : 10.1007/s13735-012-0024-2
THUMOS challenge: Action recognition with a large number of classes.
THUMOS challenge: Action recognition with a large number of classes.
Visual perception of biological motion and a model for its analysis, Perception & Psychophysics, vol.14, issue.2, pp.201-211, 1973.
DOI : 10.3758/BF03212378
Textons, the elements of texture perception, and their interactions, Nature, vol.290, issue.5802, pp.91-97, 1981.
DOI : 10.1038/290091a0
L1-regularized logistic regression stacking and transductive CRF smoothing for action recognition in video, ICCV Workshop on Action Recognition with a Large Number of Classes, 2013.
Large-Scale Video Classification with Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.223
Distribution of content words and phrases in text and language modelling, Natural Language Engineering, vol.2, issue.1, pp.15-59, 1996.
DOI : 10.1017/S1351324996001246
Segmental multi-way local pooling for video recognition, Proceedings of the 21st ACM international conference on Multimedia, MM '13, 2013.
DOI : 10.1145/2502081.2502167
A Spatio-Temporal Descriptor Based on 3D-Gradients, Proceedings of the British Machine Vision Conference 2008, 2008.
DOI : 10.5244/C.22.99
Human Focused Action Localization in Video, ECCV Workshop on Sign, Gesture, and Activity, 2010.
DOI : 10.1007/978-3-642-35749-7_17
Fisher vectors derived from hybrid Gaussian-Laplacian mixture models for image annotation, ArXiv e-prints, 2014.
Natural language description of human activities from video images based on concept hierarchy of actions, International Journal of Computer Vision, vol.50, issue.2, pp.171-184, 2002.
DOI : 10.1023/A:1020346032608
Geodesic object proposals, ECCV, 2014.
Modeling spatial layout with Fisher vectors for image categorization, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126406
URL : https://hal.archives-ouvertes.fr/inria-00612277
ImageNet classification with deep convolutional neural networks, NIPS, 2012.
HMDB: A large video database for human motion recognition, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126543
Speech Processing for Audio Indexing, Advances in Natural Language Processing, 2008.
DOI : 10.1109/TSA.1996.481450
Efficient Subwindow Search: A Branch and Bound Framework for Object Localization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.12, pp.2129-2142, 2009.
DOI : 10.1109/TPAMI.2009.144
Learning to detect unseen object classes by between-class attribute transfer, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206594
Discriminative figure-centric models for joint action localization and recognition, ICCV, 2011.
On Space-Time Interest Points, International Journal of Computer Vision, vol.64, issue.2-3, pp.107-123, 2005.
DOI : 10.1007/s11263-005-1838-7
Retrieving actions in movies, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409105
Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587756
URL : https://hal.archives-ouvertes.fr/inria-00548659
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.68
URL : https://hal.archives-ouvertes.fr/inria-00548585
Learning hierarchical invariant spatiotemporal features for action recognition with independent subspace analysis, CVPR, 2011.
Representing and recognizing the visual appearance of materials using three-dimensional textons, International Journal of Computer Vision, vol.43, issue.1, pp.29-44, 2001.
DOI : 10.1023/A:1011126920638
Dynamic Pooling for Complex Event Recognition, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.339
Codemaps - Segment, Classify and Search Objects Locally, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.454
Recognizing realistic actions from videos, CVPR, 2009.
Recognizing human actions by attributes, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995353
Encoding high dimensional local features by sparse coding based Fisher vectors, NIPS, 2014.
Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94
Patch Match Filter: Efficient Edge-Aware Filtering Meets Randomized Search for Fast Correspondence Field Estimation, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.242
An iterative image registration technique with an application to stereo vision, IJCAI, pp.674-679, 1981.
Action Recognition and Localization by Hierarchical Space-Time Segments, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.341
Maximum weight cliques with mutex constraints for video object segmentation, CVPR, 2012.
Complex Event Detection via Multi-source Video Attributes, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.339
Textons, contours and regions: cue integration in image segmentation, Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999.
DOI : 10.1109/ICCV.1999.790346
Prime Object Proposals with Randomized Prim's Algorithm, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.315
Representation and Recognition of the Spatial Organization of Three-Dimensional Shapes, Proceedings of the Royal Society B: Biological Sciences, vol.200, issue.1140, pp.269-294, 1978.
DOI : 10.1098/rspb.1978.0020
Actions in context, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206557
Dynamic Eye Movement Datasets and Learnt Saliency Models for Visual Action Recognition, ECCV, 2012.
DOI : 10.1007/978-3-642-33709-3_60
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.423.5629
Representing Pairwise Spatial and Temporal Relations for Action Recognition, ECCV, 2010.
DOI : 10.1007/978-3-642-15549-9_37
Trajectons: Action recognition through the motion analysis of tracked features, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, 2009.
DOI : 10.1109/ICCVW.2009.5457659
Conceptlets: Selective Semantics for Classifying Video Events, IEEE Transactions on Multimedia, vol.16, issue.8, pp.2214-2228, 2014.
DOI : 10.1109/TMM.2014.2359771
Spatially Local Coding for Object Recognition, ACCV, 2012.
DOI : 10.1007/978-3-642-37331-2_16
Semantic Model Vectors for Complex Video Event Recognition, IEEE Transactions on Multimedia, vol.14, issue.1, pp.88-101, 2012.
DOI : 10.1109/TMM.2011.2168948
Activity recognition using the velocity histories of tracked keypoints, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459154
A Survey of Computer Vision-Based Human Motion Capture, Computer Vision and Image Understanding, vol.81, issue.3, pp.231-268, 2001.
DOI : 10.1006/cviu.2000.0897
A survey of advances in vision-based human motion capture and analysis, CVIU, vol.104, issue.2-3, pp.90-126, 2006.
Ordered Trajectories for Large Scale Human Action Recognition, 2013 IEEE International Conference on Computer Vision Workshops, 2013.
DOI : 10.1109/ICCVW.2013.61
Combined ordered and improved trajectories for large scale human action recognition, ICCV Workshop on Action Recognition with a Large Number of Classes, 2013.
Evaluating multimedia features and fusion for example-based event detection, Machine Vision and Applications, pp.17-32, 2014.
Multimodal feature fusion for robust event detection in web videos, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247814
Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification, ECCV, 2010.
DOI : 10.1007/978-3-642-15552-9_29
Analyzing and recognizing walking figures in XYT, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition CVPR-94, 1994.
DOI : 10.1109/CVPR.1994.323868
Modeling the shape of the scene: A holistic representation of the spatial envelope, International Journal of Computer Vision, vol.42, issue.3, pp.145-175, 2001.
DOI : 10.1023/A:1011139631724
Action and Event Recognition with Fisher Vectors on a Compact Feature Set, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.228
URL : https://hal.archives-ouvertes.fr/hal-00873662
Spatio-temporal Object Detection Proposals, ECCV, 2014.
DOI : 10.1007/978-3-319-10578-9_48
URL : https://hal.archives-ouvertes.fr/hal-01021902
Efficient Action Localization with Approximately Normalized Fisher Vectors, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.326
URL : https://hal.archives-ouvertes.fr/hal-00979594
TRECVID 2012 -- an overview of the goals, tasks, data, evaluation mechanisms and metrics, Proceedings of TRECVID, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00953826
Fast Object Segmentation in Unconstrained Video, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.223
A Topological Approach to Hierarchical Segmentation using Mean Shift, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383228
High Five: Recognising human interactions in TV shows, BMVC, 2010.
Fast and robust Earth Mover's Distances, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459199
Hybrid super vector with improved dense trajectories for action recognition, ICCV Workshop on Action Recognition with a Large Number of Classes, 2013.
Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics, ECCV, 2014.
DOI : 10.1007/978-3-319-10578-9_43
Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice, Computer Vision and Image Understanding, vol.150, 2016.
DOI : 10.1016/j.cviu.2016.03.013
Action Recognition with Stacked Fisher Vectors, ECCV, 2014.
DOI : 10.1007/978-3-319-10602-1_38
Fisher Kernels on Visual Vocabularies for Image Categorization, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383266
Improving the Fisher Kernel for Large-Scale Image Classification, ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_11
URL : https://hal.archives-ouvertes.fr/inria-00548630
Comparison of human and computer performance across face recognition experiments, Image and Vision Computing, vol.32, issue.1, pp.74-85, 2014.
DOI : 10.1016/j.imavis.2013.12.002
A survey on vision-based human action recognition, Image and Vision Computing, vol.28, issue.6, pp.976-990, 2010.
DOI : 10.1016/j.imavis.2009.11.014
Learning object class detectors from weakly annotated video, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248065
URL : https://hal.archives-ouvertes.fr/hal-00695940
Weakly Supervised Learning of Interactions between Humans and Objects, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.3, pp.601-614, 2012.
DOI : 10.1109/TPAMI.2011.158
URL : https://hal.archives-ouvertes.fr/inria-00516477
Explicit Modeling of Human-Object Interactions in Realistic Videos, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.4, pp.835-848, 2013.
DOI : 10.1109/TPAMI.2012.175
URL : https://hal.archives-ouvertes.fr/hal-00720847
Fundamentals of Speech Recognition, 1993.
A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, pp.257-286, 1989.
View-invariant representation and recognition of actions, International Journal of Computer Vision, vol.50, issue.2, pp.203-226, 2002.
DOI : 10.1023/A:1020350100748
Recognizing 50 human action categories of web videos, Machine Vision and Applications, pp.971-981, 2013.
Machine perception of three-dimensional solids, 1963.
Action MACH: a spatio-temporal Maximum Average Correlation Height filter for action recognition, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587727
Towards model-based recognition of human movements in image sequences, CVGIP: Image Understanding, vol.59, issue.1, pp.94-115, 1994.
Translating Video Content to Natural Language Descriptions, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.61
"GrabCut", ACM Transactions on Graphics, vol.23, issue.3, pp.309-314, 2004.
DOI : 10.1145/1015706.1015720
Action bank: A high-level representation of activity in video, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247806
Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.26, issue.1, pp.43-49, 1978.
DOI : 10.1109/TASSP.1978.1163055
Modeling the spatial layout of images beyond spatial pyramids, Pattern Recognition Letters, vol.33, issue.16, pp.2216-2223, 2012.
DOI : 10.1016/j.patrec.2012.07.019
Image Classification with the Fisher Vector: Theory and Practice, International Journal of Computer Vision, vol.105, issue.3, pp.222-245, 2013.
DOI : 10.1007/s11263-013-0636-x
Particle Video: Long-Range Motion Estimation Using Point Trajectories, International Journal of Computer Vision, vol.80, issue.1, pp.72-91, 2008.
DOI : 10.1007/s11263-008-0136-6
Learning discriminative space-time actions from weakly labelled videos, BMVC, 2012.
Modeling the Temporal Extent of Actions, ECCV, 2010.
DOI : 10.1007/978-3-642-15549-9_39
Constructing models for content-based image retrieval, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001.
DOI : 10.1109/CVPR.2001.990922
URL : https://hal.archives-ouvertes.fr/inria-00548274
Recognizing human actions: a local SVM approach, ICPR, 2004.
A 3-dimensional SIFT descriptor and its application to action recognition, Proceedings of the 15th international conference on Multimedia, MULTIMEDIA '07, 2007.
DOI : 10.1145/1291233.1291311
Object Recognition with Features Inspired by Visual Cortex, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.254
Matching Local Self-Similarities across Images and Videos, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383198
Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.371
Two-stream convolutional networks for action recognition in videos, NIPS, 2014.
Fisher Vector Faces in the Wild, Proceedings of the British Machine Vision Conference 2013, 2013.
DOI : 10.5244/C.27.8
Deep Fisher networks for large-scale image classification, NIPS, 2013.
Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
DOI : 10.1109/ICCV.2003.1238663
Unsupervised learning of human motion, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.25, issue.7, pp.814-827, 2003.
DOI : 10.1109/TPAMI.2003.1206511
UCF101: A dataset of 101 human actions classes from videos in the wild, 2012.
Visual recognition of American sign language using hidden Markov models, International Symposium on Computer Vision, 1995.
ACTIVE: Activity Concept Transitions in Video Event Classification, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.453
Hierarchical spatio-temporal context modeling for action recognition, CVPR, 2009.
Dense point trajectories by GPU-accelerated large displacement optical flow, ECCV, 2010.
Deep Fisher Kernels -- End to End Learning of the Fisher Kernel GMM Parameters, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.182
DeepFace: Closing the Gap to Human-Level Performance in Face Verification, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.220
Learning latent temporal structure for complex event detection, 2012 IEEE Conference on Computer Vision and Pattern Recognition, p.44 ,
DOI : 10.1109/CVPR.2012.6247808
Combining the Right Features for Complex Event Recognition, 2013 IEEE International Conference on Computer Vision, 2013. ,
DOI : 10.1109/ICCV.2013.335
SimpleFlow: A Non-iterative, Sublinear Optical Flow Algorithm, Computer Graphics Forum, 2012. ,
DOI : 10.1111/j.1467-8659.2012.03013.x
Motion Words for Videos, ECCV, 2014. ,
DOI : 10.1007/978-3-319-10590-1_47
Convolutional Learning of Spatio-temporal Features, ECCV, 2010. ,
DOI : 10.1007/978-3-642-15567-3_11
Spatio-temporal deformable part models for action detection, CVPR, 2013. ,
Optimal spatio-temporal path discovery for video event detection, CVPR 2011, 2011. ,
DOI : 10.1109/CVPR.2011.5995416
Max-margin structured output regression for spatiotemporal action localization, NIPS, pp.350-358, 2012. ,
C3D: Generic features for video analysis. ArXiv e-prints, 2014. ,
Feature Tracking and Motion Compensation for Action Recognition, Proceedings of the British Machine Vision Conference 2008, 2008. ,
DOI : 10.5244/C.22.30
Selective Search for Object Recognition, International Journal of Computer Vision, vol.104, issue.2, pp.154-171, 2013. ,
DOI : 10.1007/s11263-013-0620-5
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.361.3382
Handling Uncertain Tags in Visual Recognition, 2013 IEEE International Conference on Computer Vision, 2013. ,
DOI : 10.1109/ICCV.2013.462
Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach, 2013 IEEE International Conference on Computer Vision, 2013. ,
DOI : 10.1109/ICCV.2013.463
Evaluating Color Descriptors for Object and Scene Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1582-1596, 2010. ,
DOI : 10.1109/TPAMI.2009.154
Segmentation as selective search for object recognition, 2011 International Conference on Computer Vision, 2011. ,
DOI : 10.1109/ICCV.2011.6126456
Fisher and VLAD with FLAIR, CVPR, 2014. ,
Online video SEEDS for temporal window objectness, ICCV, 2013. ,
Learning discriminative Fisher kernels, ICML, 2011. ,
Classifying Images of Materials: Achieving Viewpoint and Illumination Independence, ECCV, 2002. ,
DOI : 10.1007/3-540-47977-5_17
Multiple kernels for object detection, 2009 IEEE 12th International Conference on Computer Vision, 2009. ,
DOI : 10.1109/ICCV.2009.5459183
Robust Real-Time Face Detection, International Journal of Computer Vision, vol.57, issue.2, pp.137-154, 2004. ,
DOI : 10.1023/B:VISI.0000013087.49260.fb
Semantic indexing and multimedia event detection: ECNU at TRECVID 2012, TRECVID Workshop, 2012. ,
Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, 2013. ,
DOI : 10.1109/ICCV.2013.441
URL : https://hal.archives-ouvertes.fr/hal-00873267
Evaluation of local spatio-temporal features for action recognition, Proceedings of the British Machine Vision Conference 2009, 2009. ,
DOI : 10.5244/C.23.124
URL : https://hal.archives-ouvertes.fr/inria-00439769
Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, pp.60-79, 2013. ,
DOI : 10.1007/s11263-012-0594-8
URL : https://hal.archives-ouvertes.fr/hal-00725627
Locality-constrained Linear Coding for image classification, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010. ,
DOI : 10.1109/CVPR.2010.5540018
Mining Motion Atoms and Phrases for Complex Action Recognition, 2013 IEEE International Conference on Computer Vision, pp.2680-2687, 2013. ,
DOI : 10.1109/ICCV.2013.333
Latent Hierarchical Model of Temporal Structure for Complex Activity Classification, IEEE Transactions on Image Processing, vol.23, issue.2, pp.810-822, 2014. ,
DOI : 10.1109/TIP.2013.2295753
Video Action Detection with Relational Dynamic-Poselets, ECCV, 2014. ,
DOI : 10.1007/978-3-319-10602-1_37
A Comparative Study of Encoding, Pooling and Normalization Methods for Action Recognition, ACCV, 2012. ,
DOI : 10.1007/978-3-642-37431-9_44
Regionlets for generic object detection, ICCV, 2013. ,
DOI : 10.1109/tpami.2015.2389830
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.407.4464
Parallel algorithms for approximation of distance maps on parametric surfaces, ACM Transactions on Graphics, vol.27, issue.4, 2008. ,
DOI : 10.1145/1409625.1409626
A survey of vision-based methods for action representation, segmentation and recognition, Computer Vision and Image Understanding, vol.115, issue.2, pp.224-241, 2011. ,
DOI : 10.1016/j.cviu.2010.10.002
URL : https://hal.archives-ouvertes.fr/inria-00459653
An efficient dense and scale-invariant spatio-temporal interest point detector, ECCV, 2008. ,
Exemplar-based Action Recognition in Video, Proceedings of the British Machine Vision Conference 2009, 2009. ,
DOI : 10.5244/C.23.90
Learning visual behavior for gesture analysis, Proceedings of International Symposium on Computer Vision, ISCV, 1995. ,
DOI : 10.1109/ISCV.1995.477006
Object categorization by learned universal visual dictionary, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005. ,
DOI : 10.1109/ICCV.2005.171
Towards Good Practices for Action Video Encoding, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014. ,
DOI : 10.1109/CVPR.2014.330
Action recognition in videos acquired by a moving camera using motion decomposition of Lagrangian particle trajectories, 2011 International Conference on Computer Vision, 2011. ,
DOI : 10.1109/ICCV.2011.6126397
Evaluation of super-voxel methods for early video processing, CVPR, 2012. ,
Streaming Hierarchical Video Segmentation, ECCV, 2012. ,
DOI : 10.1007/978-3-642-33783-3_45
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.298.7791
Flattening Supervoxel Hierarchies by the Uniform Entropy Slice, 2013 IEEE International Conference on Computer Vision, 2013. ,
DOI : 10.1109/ICCV.2013.279
Recognizing human action in time-sequential images using hidden Markov model, CVPR, 1992. ,
Linear spatial pyramid matching using sparse coding for image classification, CVPR, 2009. ,
Action Recognition Using Super Sparse Coding Vector with Spatio-temporal Awareness, ECCV, 2014. ,
DOI : 10.1007/978-3-319-10605-2_47
Local Trinary Patterns for human action recognition, 2009 IEEE 12th International Conference on Computer Vision, 2009. ,
DOI : 10.1109/ICCV.2009.5459201
Propagative Hough voting for human activity recognition, ECCV, 2012. ,
Discriminative subvolume search for efficient action detection, CVPR, 2009. ,
Beyond short snippets: Deep networks for video classification. ArXiv e-prints, 2015. ,
Visualizing and Understanding Convolutional Networks, ECCV, 2014. ,
DOI : 10.1007/978-3-319-10590-1_53
Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013. ,
DOI : 10.1109/CVPR.2013.87
Submodular attribute selection for action recognition in video, NIPS, 2014. ,
Object detectors emerge in deep scene CNNs. ArXiv e-prints, 2014. ,
Image Classification Using Super-Vector Coding of Local Image Descriptors, ECCV, 2010. ,
DOI : 10.1007/978-3-642-15555-0_11
Action Recognition with Actons, 2013 IEEE International Conference on Computer Vision, 2013. ,
DOI : 10.1109/ICCV.2013.442
Edge Boxes: Locating Object Proposals from Edges, ECCV, 2014. ,
DOI : 10.1007/978-3-319-10602-1_26