M. Jain, R. Benmokhtar, P. Gros, and H. Jégou, Hamming Embedding Similarity-based Image Classification, ACM International Conference on Multimedia Retrieval (ICMR), 2012.

M. Jain, H. Jégou, and P. Bouthemy, Better exploiting motion for better action recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013.

M. Jain, J. Van-gemert, P. Bouthemy, H. Jégou, and C. G. Snoek, Action localization with tubelets from motion, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.

Other publications

C. G. Snoek, K. E. Van-de-sande, D. Fontijne, A. Habibian, M. Jain et al., Mediamill at TRECVID 2013: Searching concepts, objects, instances and events in video, Proceedings of TRECVID, 2013.

Results on O[…]: Each row shows a query and the top three retrieved images from the dataset. True positives and false positives are shown with green and red borders, respectively.

Results on INRIA Holidays dataset: Each row shows a query and the top three retrieved images from the dataset. True positives and false positives are shown with green and red borders, respectively.

Image classification results using the PASCAL VOC 2007 dataset with a consistent setting of parameters. [FK: Fisher Kernel, FK*: Fisher Kernel with our SIFT variant, SV: super vector coding, BOW: bag of words, LLC: locality-constrained linear coding, LLC-F: LLC with original + left-right flipped training images, KCB: Kernel codebook, HE: Hamming Embedding similarity, HE*: HE with our SIFT variant]

Comparison with the state of the art on Hollywood2 and H[…]. Vig et al. [143] get 61.9% by using external eye-movement data. *Jiang et al. [66] used a one-vs-one multi-class SVM, while our method and the others use one-vs-rest SVMs. With a one-against-one multi-class SVM we obtain 45[…].

B. Alexe, T. Deselaers, and V. Ferrari, What is an object?, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540226

B. Alexe, T. Deselaers, and V. Ferrari, Measuring the Objectness of Image Windows, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.11, 2012.
DOI : 10.1109/TPAMI.2012.28

S. Ali, A. Basharat, and M. Shah, Chaotic Invariants for Human Action Recognition, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409046

S. Ali and M. Shah, Human Action Recognition in Videos Using Kinematic Features and Multiple Instance Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.2, pp.288-303, 2010.
DOI : 10.1109/TPAMI.2008.284

R. Aly, R. Arandjelovic, K. Chatfield, M. Douze, B. Fernando et al., The AXES submissions at TrecVid 2013, TRECVID Workshop, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00904404

R. Arandjelović and A. Zisserman, Three things everyone should know to improve object retrieval, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.

R. Arandjelović and A. Zisserman, All about VLAD, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013.

S. Avila, N. Thome, M. Cord, E. Valle, A. De-albuquerque-araújo et al., Pooling in image representation: The visual codeword point of view, Computer Vision and Image Understanding, vol.117, issue.5, pp.453-465, 2013.
DOI : 10.1016/j.cviu.2012.09.007

URL : https://hal.archives-ouvertes.fr/hal-01172709

S. Avila, N. Thome, M. Cord, E. Valle, A. De-albuquerque-araújo et al., BOSSA: Extended BoW formalism for image classification, 2011 18th IEEE International Conference on Image Processing, 2011.
DOI : 10.1109/ICIP.2011.6116268

URL : https://hal.archives-ouvertes.fr/hal-00625533

H. Bay, T. Tuytelaars, and L. Van-gool, SURF: Speeded up robust features, Proceedings of the European Conference on Computer Vision, 2006.

R. Behmo, P. Marcombes, A. Dalalyan, and V. Prinet, Towards Optimal Naive Bayes Nearest Neighbor, Proceedings of the European Conference on Computer Vision, 2010.
DOI : 10.1007/978-3-642-15561-1_13

URL : https://hal.archives-ouvertes.fr/hal-00654399

L. Bo and C. Sminchisescu, Efficient match kernels between sets of features for visual recognition, Advances in Neural Information Processing Systems, 2009.

O. Boiman, E. Shechtman, and M. Irani, In defense of Nearest-Neighbor based image classification, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587598

Y. Boureau, F. Bach, Y. Lecun, and J. Ponce, Learning mid-level features for recognition, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539963

W. Brendel and S. Todorovic, Video object segmentation by tracking regions, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459242

W. Brendel and S. Todorovic, Learning spatiotemporal graphs of human activities, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126316

T. Brox and J. Malik, Object Segmentation by Long Term Analysis of Point Trajectories, Proceedings of the European Conference on Computer Vision, 2010.
DOI : 10.1007/978-3-642-15555-0_21

L. Cao, Z. Liu, and T. S. Huang, Cross-dataset action detection, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539875

B. Caputo and L. Jie, A performance evaluation of exact and approximate match kernels for object recognition, Electronic Letters on Computer Vision and Image Analysis, pp.15-26, 2009.

K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman, The devil is in the details: an evaluation of recent feature encoding methods, Proceedings of the British Machine Vision Conference 2011, 2011.
DOI : 10.5244/C.25.76

O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman, Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4408891

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, pp.273-297, 1995.
DOI : 10.1007/BF00994018

G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints, ECCV Workshop Statistical Learning in Computer Vision, 2004.

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.177

URL : https://hal.archives-ouvertes.fr/inria-00548512

N. Dalal, B. Triggs, and C. Schmid, Human Detection Using Oriented Histograms of Flow and Appearance, Proceedings of the European Conference on Computer Vision, 2006.
DOI : 10.1023/A:1008162616689

URL : https://hal.archives-ouvertes.fr/inria-00548587

J. Delhumeau, P. H. Gosselin, H. Jégou, and P. Pérez, Revisiting the VLAD image representation, Proceedings of the 21st ACM international conference on Multimedia, MM '13, 2013.
DOI : 10.1145/2502081.2502171

URL : https://hal.archives-ouvertes.fr/hal-00840653

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A Large-Scale Hierarchical Image Database, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009.

T. Deselaers, B. Alexe, and V. Ferrari, Weakly Supervised Localization and Learning with Generic Knowledge, International Journal of Computer Vision, vol.100, issue.3, pp.275-293, 2012.
DOI : 10.1007/s11263-012-0538-3

P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie, Behavior recognition via sparse spatiotemporal features, VS-PETS, 2005.

W. Dong, M. Charikar, and K. Li, Asymmetric distance estimation with sketches for similarity search in high-dimensional spaces, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '08, 2008.
DOI : 10.1145/1390334.1390358

M. Douze, H. Jégou, H. Singh, L. Amsaleg, and C. Schmid, Evaluation of GIST descriptors for web-scale image search, Proceeding of the ACM International Conference on Image and Video Retrieval, CIVR '09, 2009.
DOI : 10.1145/1646396.1646421

URL : https://hal.archives-ouvertes.fr/inria-00394212

O. Duchenne, A. Joulin, and J. Ponce, A graph-matching kernel for object categorization, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126445

URL : https://hal.archives-ouvertes.fr/hal-00650345

O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce, Automatic annotation of human actions in video, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459279

T. Durand, N. Thome, M. Cord, and S. Avila, Image classification using object detectors, 2013 IEEE International Conference on Image Processing, 2013.
DOI : 10.1109/ICIP.2013.6738894

URL : https://hal.archives-ouvertes.fr/hal-01078079

I. Endres and D. Hoiem, Category Independent Object Proposals, Proceedings of the European Conference on Computer Vision, 2010.
DOI : 10.1007/978-3-642-15555-0_42

M. Everingham, L. Van-gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010.
DOI : 10.1007/s11263-009-0275-4

I. Everts, J. Van-gemert, and T. Gevers, Evaluation of Color STIPs for Human Action Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.367

G. Farnebäck, Two-Frame Motion Estimation Based on Polynomial Expansion, Scandinavian Conference on Image Analysis, 2003.
DOI : 10.1007/3-540-45103-X_50

P. F. Felzenszwalb, R. B. Girshick, D. A. Mcallester, and D. Ramanan, Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1627-1645, 2010.
DOI : 10.1109/TPAMI.2009.167

J. Feng, Y. Wei, L. Tao, C. Zhang, and J. Sun, Salient object detection by composition, Proceedings of the IEEE International Conference on Computer Vision, 2011.

A. Gaidon, Z. Harchaoui, and C. Schmid, Actom sequence models for efficient action detection, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995646

URL : https://hal.archives-ouvertes.fr/inria-00575217

A. Gaidon, Z. Harchaoui, and C. Schmid, Recognizing activities with cluster-trees of tracklets, Proceedings of the British Machine Vision Conference 2012, 2012.
DOI : 10.5244/C.26.30

URL : https://hal.archives-ouvertes.fr/hal-00722955

A. Gaidon, M. Marszalek, and C. Schmid, Mining visual actions from movies, Proceedings of the British Machine Vision Conference 2009, 2009.
DOI : 10.5244/C.23.125

URL : https://hal.archives-ouvertes.fr/inria-00440973

A. Gordo, J. A. Rodríguez-serrano, F. Perronnin, and E. Valveny, Leveraging category-level labels for instance-level image retrieval, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248035

P. H. Gosselin, M. Cord, and S. Philipp-foliguet, Kernels on bags for multi-object database retrieval, Proceedings of the 6th ACM international conference on Image and video retrieval, CIVR '07, 2007.
DOI : 10.1145/1282280.1282317

K. Grauman and T. Darrell, The pyramid match kernel: discriminative classification with sets of image features, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.239

G. Griffin, A. Holub, and P. Perona, Caltech-256 object category dataset, 2007.

H. Harzallah, F. Jurie, and C. Schmid, Combining efficient object localization and image classification, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459257

URL : https://hal.archives-ouvertes.fr/inria-00439516

A. Hervieu, P. Bouthemy, and J. Cadre, A Statistical Video Content Recognition Method Using Invariant Features on Object Trajectories, IEEE Transactions on Circuits and Systems for Video Technology, pp.1533-1543, 2008.
DOI : 10.1109/TCSVT.2008.2005609

P. Huber, Robust statistics, 1981.

T. Jaakkola and D. Haussler, Exploiting generative models in discriminative classifiers, Advances in Neural Information Processing Systems, 1998.

T. Jaakkola and D. Haussler, Exploiting generative models in discriminative classifiers, Advances in Neural Information Processing Systems, 1999.

M. Jain, R. Benmokhtar, P. Gros, and H. Jégou, Hamming embedding similarity-based image classification, Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR '12, 2012.
DOI : 10.1145/2324796.2324820

URL : https://hal.archives-ouvertes.fr/hal-00688169

M. Jain, H. Jégou, and P. Bouthemy, Better Exploiting Motion for Better Action Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.330

URL : https://hal.archives-ouvertes.fr/hal-00813014

M. Jain, H. Jégou, and P. Gros, Asymmetric hamming embedding, Proceedings of the 19th ACM international conference on Multimedia, MM '11, 2011.
DOI : 10.1145/2072298.2072035

URL : https://hal.archives-ouvertes.fr/inria-00607278

M. Jain, J. Van-gemert, P. Bouthemy, H. Jégou, and C. G. Snoek, Action Localization with Tubelets from Motion, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.100

URL : https://hal.archives-ouvertes.fr/hal-00996844

H. Jégou and O. Chum, Negative Evidences and Co-occurences in Image Retrieval: The Benefit of PCA and Whitening, Proceedings of the European Conference on Computer Vision, 2012.
DOI : 10.1007/978-3-642-33709-3_55

H. Jégou, M. Douze, and C. Schmid, Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search, Proceedings of the European Conference on Computer Vision, 2008.
DOI : 10.1007/978-3-540-88682-2_24

H. Jégou, M. Douze, and C. Schmid, On the burstiness of visual elements, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206609

H. Jégou, M. Douze, and C. Schmid, Improving Bag-of-Features for Large Scale Image Search, International Journal of Computer Vision, vol.87, issue.3, pp.316-336, 2010.
DOI : 10.1007/s11263-009-0285-2

H. Jégou, M. Douze, and C. Schmid, Product Quantization for Nearest Neighbor Search, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.1, pp.117-128, 2011.
DOI : 10.1109/TPAMI.2010.57

H. Jégou, M. Douze, C. Schmid, and P. Pérez, Aggregating local descriptors into a compact image representation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540039

H. Jégou, F. Perronnin, M. Douze, J. Sánchez, P. Pérez et al., Aggregating Local Image Descriptors into Compact Codes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.9, pp.1704-1716, 2012.
DOI : 10.1109/TPAMI.2011.235

Y. Jiang, Q. Dai, X. Xue, W. Liu, and C. Ngo, Trajectory-Based Modeling of Human Actions with Motion Reference Points, Proceedings of the European Conference on Computer Vision, 2012.
DOI : 10.1007/978-3-642-33715-4_31

I. N. Junejo, E. Dexter, I. Laptev, and P. Pérez, View-Independent Action Recognition from Temporal Self-Similarities, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.1, pp.172-185, 2010.
DOI : 10.1109/TPAMI.2010.68

URL : https://hal.archives-ouvertes.fr/hal-01064695

F. Jurie and B. Triggs, Creating efficient codebooks for visual recognition, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.66

URL : https://hal.archives-ouvertes.fr/inria-00548511

J. Kim and K. Grauman, Asymmetric region-to-image matching for comparing images with generic object categories, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539923

A. Kläser, M. Marszalek, and C. Schmid, A Spatio-Temporal Descriptor Based on 3D-Gradients, Proceedings of the British Machine Vision Conference 2008, 2008.
DOI : 10.5244/C.22.99

A. Kläser, M. Marszałek, C. Schmid, and A. Zisserman, Human Focused Action Localization in Video, Trends and Topics in Computer Vision, pp.219-233, 2012.
DOI : 10.1007/978-3-642-35749-7_17

O. Kliper-gross, Y. Gurovich, T. Hassner, and L. Wolf, Motion Interchange Patterns for Action Recognition in Unconstrained Videos, Proceedings of the European Conference on Computer Vision, 2012.
DOI : 10.1007/978-3-642-33783-3_19

J. Krapac, F. Jurie, and B. Triggs, Learning tree-structured descriptor quantizers for image categorization, Proceedings of the British Machine Vision Conference, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00613118

J. Krapac, J. J. Verbeek, and F. Jurie, Modeling spatial layout with Fisher vectors for image categorization, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126406

URL : https://hal.archives-ouvertes.fr/inria-00612277

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, HMDB: A large video database for human motion recognition, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126543

C. H. Lampert, M. B. Blaschko, and T. Hofmann, Beyond sliding windows: Object localization by efficient subwindow search, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587586

T. Lan, Y. Wang, and G. Mori, Discriminative figure-centric models for joint action localization and recognition, Proceedings of the IEEE International Conference on Computer Vision, 2011.

I. Laptev and T. Lindeberg, Space-time interest points, Proceedings of the IEEE International Conference on Computer Vision, 2003.

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587756

URL : https://hal.archives-ouvertes.fr/inria-00548659

J. Law-to, L. Chen, A. Joly, I. Laptev, O. Buisson et al., Video copy detection, Proceedings of the 6th ACM international conference on Image and video retrieval, CIVR '07, pp.371-378, 2007.
DOI : 10.1145/1282280.1282336

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.68

URL : https://hal.archives-ouvertes.fr/inria-00548585

J. Lezama, K. Alahari, J. Sivic, and I. Laptev, Track to the future: Spatio-temporal video segmentation with long-range motion cues, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.6044588

URL : https://hal.archives-ouvertes.fr/hal-00817961

F. Li and P. Perona, A Bayesian hierarchical model for learning natural scene categories, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005.

J. Liu, B. Kuipers, and S. Savarese, Recognizing human actions by attributes, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995353

D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94

W. Y. Ma and B. S. Manjunath, NeTra: A toolbox for navigating large image databases, Multimedia Systems, vol.7, issue.3, pp.184-198, 1999.
DOI : 10.1007/s005300050121

J. Mairal, M. Leordeanu, F. Bach, M. Hebert, and J. Ponce, Discriminative Sparse Image Models for Class-Specific Edge Detection and Image Interpretation, Proceedings of the European Conference on Computer Vision, 2008.
DOI : 10.1007/978-3-540-88690-7_4

S. Manen, M. Guillaumin, and L. Van-gool, Prime Object Proposals with Randomized Prim's Algorithm, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.315

M. Marszalek, I. Laptev, and C. Schmid, Actions in context, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206557

J. Matas, O. Chum, M. Urban, and T. Pajdla, Robust wide baseline stereo from maximally stable extremal regions, Proceedings of the British Machine Vision Conference, pp.384-393, 2002.

P. Matikainen, M. Hebert, and R. Sukthankar, Trajectons: Action recognition through the motion analysis of tracked features, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, 2009.
DOI : 10.1109/ICCVW.2009.5457659

R. Messing, C. J. Pal, and H. A. Kautz, Activity recognition using the velocity histories of tracked keypoints, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459154

K. Mikolajczyk and C. Schmid, Scale & Affine Invariant Interest Point Detectors, International Journal of Computer Vision, vol.60, issue.1, pp.63-86, 2004.
DOI : 10.1023/B:VISI.0000027790.02288.f2

URL : https://hal.archives-ouvertes.fr/inria-00548554

K. Mikolajczyk and C. Schmid, A performance evaluation of local descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, issue.10, pp.1615-1630, 2005.
DOI : 10.1109/TPAMI.2005.188

URL : https://hal.archives-ouvertes.fr/inria-00548227

K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas et al., A Comparison of Affine Region Detectors, International Journal of Computer Vision, vol.65, issue.1-2, pp.43-72, 2005.
DOI : 10.1007/s11263-005-3848-x

URL : https://hal.archives-ouvertes.fr/inria-00548528

F. Moosmann, B. Triggs, and F. Jurie, Randomized clustering forests for building fast and discriminative visual vocabularies, Advances in Neural Information Processing Systems, pp.985-992, 2007.

R. Negrel, D. Picard, and P. H. Gosselin, Using spatial pyramids with compacted VLAT for image categorization, ICPR, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00753158

J. C. Niebles, C. Chen, and F. Li, Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification, Proceedings of the European Conference on Computer Vision, 2010.
DOI : 10.1007/978-3-642-15552-9_29

D. Nistér and H. Stewénius, Scalable Recognition with a Vocabulary Tree, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), pp.2161-2168, 2006.
DOI : 10.1109/CVPR.2006.264

E. Nowak, F. Jurie, and B. Triggs, Sampling Strategies for Bag-of-Features Image Classification, Proceedings of the European Conference on Computer Vision, 2006.
DOI : 10.1007/11744085_38

URL : https://hal.archives-ouvertes.fr/hal-00203752

J. Odobez and P. Bouthemy, Robust Multiresolution Estimation of Parametric Motion Models, Journal of Visual Communication and Image Representation, vol.6, issue.4, pp.348-365, 1995.
DOI : 10.1006/jvci.1995.1029

A. Oliva and A. Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope, International Journal of Computer Vision, vol.42, issue.3, pp.145-175, 2001.
DOI : 10.1023/A:1011139631724

D. Oneata, J. Verbeek, and C. Schmid, Action and Event Recognition with Fisher Vectors on a Compact Feature Set, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.228

URL : https://hal.archives-ouvertes.fr/hal-00873662

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.222

URL : https://hal.archives-ouvertes.fr/hal-00911179

P. Over, G. Awad, M. Michel, J. Fiscus, G. Sanders et al., TRECVID 2013 - an overview of the goals, tasks, data, evaluation mechanisms and metrics, Proceedings of TRECVID 2013, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953093

F. Perronnin and C. R. Dance, Fisher Kernels on Visual Vocabularies for Image Categorization, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383266

F. Perronnin, C. R. Dance, G. Csurka, and M. Bressan, Adapted Vocabularies for Generic Visual Categorization, Proceedings of the European Conference on Computer Vision, 2006.
DOI : 10.1007/11744085_36

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher Kernel for Large-Scale Image Classification, Proceedings of the European Conference on Computer Vision, 2010.
DOI : 10.1007/978-3-642-15561-1_11

URL : https://hal.archives-ouvertes.fr/inria-00548630

F. Perronnin, Y. Liu, J. Sanchez, and H. Poirier, Large-scale image retrieval with compressed Fisher vectors, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540009

J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, Object retrieval with large vocabularies and fast spatial matching, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383172

J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, Lost in quantization: Improving particular object retrieval in large scale image databases, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587635

G. Piriou, P. Bouthemy, and J. Yao, Recognition of Dynamic Video Contents With Global Probabilistic Models of Visual Motion, IEEE Transactions on Image Processing, vol.15, issue.11, pp.3417-3430, 2006.
DOI : 10.1109/TIP.2006.881963

URL : https://hal.archives-ouvertes.fr/hal-00453197

E. Rahtu, J. Kannala, and M. Blaschko, Learning a category independent object detection cascade, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126351

URL : https://hal.archives-ouvertes.fr/hal-00855735

C. Rao, A. Yilmaz, and M. Shah, View-invariant representation and recognition of actions, International Journal of Computer Vision, vol.50, issue.2, pp.203-226, 2002.
DOI : 10.1023/A:1020350100748

M. Raptis, I. Kokkinos, and S. Soatto, Discovering discriminative action parts from midlevel video representations, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00918807

J. Revaud, M. Douze, C. Schmid, and H. Jégou, Event Retrieval in Large Video Collections with Circulant Temporal Encoding, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.318

URL : https://hal.archives-ouvertes.fr/hal-00801714

M. D. Rodriguez, J. Ahmed, and M. Shah, Action MACH: a spatio-temporal Maximum Average Correlation Height filter for action recognition, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587727

C. Rosenberg, M. Hebert, and H. Schneiderman, Semi-Supervised Self-Training of Object Detection Models, 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION'05), Volume 1, 2005.
DOI : 10.1109/ACVMOT.2005.107

S. Sadanand and J. J. Corso, Action bank: A high-level representation of activity in video, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247806

C. Schmid and R. Mohr, Local grayvalue invariants for image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.19, issue.5, pp.530-534, 1997.
DOI : 10.1109/34.589215

URL : https://hal.archives-ouvertes.fr/inria-00548358

P. Scovanner, S. Ali, and M. Shah, A 3-dimensional SIFT descriptor and its application to action recognition, Proceedings of the 15th international conference on Multimedia, MULTIMEDIA '07, 2007.
DOI : 10.1145/1291233.1291311

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, pp.1470-1477, 2003.
DOI : 10.1109/ICCV.2003.1238663

A. F. Smeaton, P. Over, and W. Kraaij, Evaluation campaigns and TRECVid, Proceedings of the 8th ACM international workshop on Multimedia information retrieval, MIR '06, 2006.
DOI : 10.1145/1178677.1178722

C. G. Snoek, K. E. Van-de-sande, D. Fontijne, A. Habibian, M. Jain et al., Mediamill at trecvid 2013: Searching concepts, objects, instances and events in video, Proceedings of TRECVID, 2013.

Z. Song, Q. Chen, Z. Huang, Y. Hua, and S. Yan, Contextualizing object detection and classification, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995330

C. Sun and R. Nevatia, ACTIVE: Activity Concept Transitions in Video Event Classification, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.453

J. Sun, X. Wu, S. Yan, L. F. Cheong, T. Chua et al., Hierarchical spatio-temporal context modeling for action recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009.

Y. Tian, R. Sukthankar, and M. Shah, Spatiotemporal Deformable Part Models for Action Detection, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.341

E. Tola, V. Lepetit, and P. Fua, A fast local descriptor for dense matching, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587673

D. Tran and J. Yuan, Optimal spatio-temporal path discovery for video event detection, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995416

D. Tran and J. Yuan, Max-margin structured output regression for spatio-temporal action localization, Advances in Neural Information Processing Systems, 2012.

D. Tran, J. Yuan, and D. Forsyth, Video event detection: From subvolume localization to spatio-temporal path search, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.

R. Trichet and R. Nevatia, Video segmentation with spatio-temporal tubes, 2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance, 2013.
DOI : 10.1109/AVSS.2013.6636661

T. Tuytelaars, M. Fritz, K. Saenko, and T. Darrell, The NBNN kernel, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126449

H. Uemura, S. Ishikawa, and K. Mikolajczyk, Feature Tracking and Motion Compensation for Action Recognition, Proceedings of the British Machine Vision Conference 2008, 2008.
DOI : 10.5244/C.22.30

J. R. Uijlings, K. E. Van-de-sande, T. Gevers, and A. W. Smeulders, Selective Search for Object Recognition, International Journal of Computer Vision, vol.104, issue.2, pp.154-171, 2013.
DOI : 10.1007/s11263-013-0620-5

M. M. Ullah, S. N. Parizi, and I. Laptev, Improving bag-of-features action recognition with non-local cues, Proceedings of the British Machine Vision Conference 2010, 2010.
DOI : 10.5244/C.24.95

J. Van-gemert, J. Geusebroek, C. J. Veenman, and A. W. Smeulders, Kernel Codebooks for Scene Categorization, Proceedings of the European Conference on Computer Vision, 2008.
DOI : 10.1007/978-3-540-88690-7_52

J. Van-gemert, C. G. Snoek, C. J. Veenman, A. W. Smeulders, and J. Geusebroek, Comparing compact codebooks for visual categorization, Computer Vision and Image Understanding, vol.114, issue.4, pp.450-462, 2010.
DOI : 10.1016/j.cviu.2009.08.004

J. Van-gemert, C. Veenman, A. Smeulders, and J. Geusebroek, Visual Word Ambiguity, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.7, pp.1271-1283, 2010.
DOI : 10.1109/TPAMI.2009.132

A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman, Multiple kernels for object detection, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459183

E. Vig, M. Dorr, and D. Cox, Saliency-based space-variant descriptor sampling for action recognition, Proceedings of the European Conference on Computer Vision, 2012.

P. A. Viola and M. J. Jones, Robust Real-Time Face Detection, International Journal of Computer Vision, vol.57, issue.2, pp.137-154, 2004.
DOI : 10.1023/B:VISI.0000013087.49260.fb

H. Wang, A. Kläser, C. Schmid, and C. Liu, Action recognition by dense trajectories, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995407
URL : https://hal.archives-ouvertes.fr/inria-00583818

H. Wang, A. Kläser, C. Schmid, and C. Liu, Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, pp.60-79, 2013.
DOI : 10.1007/s11263-012-0594-8
URL : https://hal.archives-ouvertes.fr/hal-00725627

H. Wang and C. Schmid, Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.441
URL : https://hal.archives-ouvertes.fr/hal-00873267

H. Wang, M. M. Ullah, A. Kläser, I. Laptev, and C. Schmid, Evaluation of local spatiotemporal features for action recognition, Proceedings of the British Machine Vision Conference, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00439769

J. Wang, J. Yang, K. Yu, F. Lv, T. Huang et al., Locality-constrained Linear Coding for image classification, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540018

T. Wang, S. Wang, and X. Ding, Detecting Human Action as the Spatio-Temporal Tube of Maximum Mutual Information, IEEE Transactions on Circuits and Systems for Video Technology, pp.277-290, 2014.
DOI : 10.1109/TCSVT.2013.2276856

G. Willems, T. Tuytelaars, and L. Van Gool, An efficient dense and scale-invariant spatiotemporal interest point detector, Proceedings of the European Conference on Computer Vision, 2008.

J. Winn, A. Criminisi, and T. Minka, Object categorization by learned universal visual dictionary, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.171

S. Wu, O. Oreifej, and M. Shah, Action recognition in videos acquired by a moving camera using motion decomposition of Lagrangian particle trajectories, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126397

Z. Wu, Q. Ke, M. Isard, and J. Sun, Bundling features for large scale partial-duplicate web image search, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.25-32, 2009.

C. Xu and J. Corso, Evaluation of super-voxel methods for early video processing, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.

C. Xu, C. Xiong, and J. Corso, Streaming Hierarchical Video Segmentation, Proceedings of the European Conference on Computer Vision, 2012.
DOI : 10.1007/978-3-642-33783-3_45

J. Yang, Y. Li, Y. Tian, L. Duan, and W. Gao, Group sensitive multiple kernel learning for object categorization, Proceedings of the IEEE International Conference on Computer Vision, 2009.

J. Yang, K. Yu, Y. Gong, and T. Huang, Linear spatial pyramid matching using sparse coding for image classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1794-1801, 2009.

Y. Yang, G. Shu, and M. Shah, Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.216

J. Yuan, Z. Liu, and Y. Wu, Discriminative Video Pattern Search for Efficient Action Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.9, pp.1728-1743, 2011.
DOI : 10.1109/TPAMI.2011.38

J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, International Journal of Computer Vision, vol.73, issue.2, pp.213-238, 2007.
DOI : 10.1007/s11263-006-9794-4
URL : https://hal.archives-ouvertes.fr/inria-00548574

W. Zhao, H. Jégou, and G. Gravier, Oriented pooling for dense and non-dense rotation-invariant features, Proceedings of the British Machine Vision Conference, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00841590

X. Zhou, K. Yu, T. Zhang, and T. S. Huang, Image Classification Using Super-Vector Coding of Local Image Descriptors, Proceedings of the European Conference on Computer Vision, 2010.
DOI : 10.1007/978-3-642-15555-0_11

J. Zhu, B. Wang, X. Yang, W. Zhang, and Z. Tu, Action Recognition with Actons, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.442