N. Derbas and G. Quénot, Joint Audio-Visual Words for Violent Scenes Detection in Movies, ACM International Conference on Multimedia Retrieval (ICMR), 2014.

N. Derbas, Production d'annotations par plan pour l'indexation des vidéos, Rencontres Jeunes Chercheurs (RJC)

N. Derbas, B. Safadi, and G. Quénot, LIG at MediaEval 2013 Affect Task: Use of a Generic Method and Joint Audio-Visual Words, Workshop, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953091

N. Derbas, F. Thollard, B. Safadi, and G. Quénot, LIG at MediaEval 2012 Affect Task: Use of a Generic Method, Workshop, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00770536

Z. Akata, F. Perronnin, Z. Harchaoui, and C. Schmid, Good Practice in Large-Scale Learning for Image Classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, issue.3, p.35, 2013.
DOI : 10.1109/TPAMI.2013.146

URL : https://hal.archives-ouvertes.fr/hal-00690014

A. Alahi, R. Ortiz, and P. Vandergheynst, FREAK: Fast Retina Keypoint, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.510-517, 2012.
DOI : 10.1109/CVPR.2012.6247715

URL : https://infoscience.epfl.ch/record/175537/files/2069.pdf

Y. Amit and A. Trouvé, POP: Patchwork of Parts Models for Object Recognition, International Journal of Computer Vision, vol.75, issue.2, pp.267-282, 2007.
DOI : 10.1007/978-1-4757-2440-0

P. K. Atrey, M. A. Hossain, A. El Saddik, and M. S. Kankanhalli, Multimodal fusion for multimedia analysis: a survey, Multimedia Systems, pp.345-379, 2010.
DOI : 10.1115/1.3662552

S. Ayache and G. Quénot, Indexation de documents multimédia par réseaux d'opérateurs, pp.385-400, 2007.

S. Ayache, G. Quénot, and J. Gensel, Classifier fusion for SVM-based multimedia semantic indexing, Advances in Information Retrieval, pp.494-504, 2007.

IRIM at TRECVID 2012: Semantic Indexing and Instance Search, Proceedings of the workshop on TREC Video Retrieval Evaluation (TRECVID).

IRIM at TRECVID 2013: Semantic Indexing and Instance Search, Proceedings of the workshop on TREC Video Retrieval Evaluation (TRECVID).

D. Batra, T. Chen, and R. Sukthankar, Space-time shapelets for action recognition, Motion and video Computing, pp.1-6, 2008.
DOI : 10.1109/wmvc.2008.4544051

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.1567

H. Bay, A. Ess et al., Speeded-up robust features (SURF), Computer Vision and Image Understanding, 2008.

M. Beal, N. Jojic, and H. Attias, A graphical model for audiovisual object tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.25, issue.7, pp.828-836, 2003.
DOI : 10.1109/tpami.2003.1206512

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.58.7327

E. Bermejo Nievas, O. Deniz Suarez, G. Bueno García, and R. Sukthankar, Violence Detection in Video Using Computer Vision Techniques, Computer Analysis of Images and Patterns, pp.332-339, 2011.
DOI : 10.1109/AVSS.2007.4425310

E. Bernstein and Y. Amit, Part-Based Statistical Models for Object Classification and Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.734-740, 2005.
DOI : 10.1109/CVPR.2005.270

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.297.3354

I. Biederman, Recognition-by-components: A theory of human image understanding., Psychological Review, vol.94, issue.2, pp.115-135, 1987.
DOI : 10.1037/0033-295X.94.2.115

M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, Actions as space-time shapes, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, pp.1395-1402, 2005.
DOI : 10.1109/ICCV.2005.28

A. Bobick and J. W. Davis, The recognition of human movement using temporal templates, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, issue.3, pp.257-267, 2001.

J. S. Boreczky and L. A. Rowe, Comparison of video shot boundary detection techniques, Journal of Electronic Imaging, vol.5, issue.2, pp.122-128, 1996.
DOI : 10.1117/12.238675

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.2179

A. Bosch, A. Zisserman, and X. Munoz, Representing shape with a spatial pyramid kernel, Proceedings of the 6th ACM international conference on Image and video retrieval, CIVR '07, pp.401-408, 2007.
DOI : 10.1145/1282280.1282340

A. Bosch, A. Zisserman, and X. Muñoz, Scene classification using a hybrid generative/discriminative approach, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, issue.4, pp.712-727, 2008.
DOI : 10.1109/tpami.2007.70716

L. Bottou and O. Bousquet, The Tradeoffs of Large Scale Learning, p.36, 2007.

C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, p.31, 2001.

C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, ACM Transactions on Intelligent Systems and Technology (TIST), vol.2, issue.3, pp.27-36, 2011.

N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, SMOTE: synthetic minority over-sampling technique, arXiv preprint arXiv:1106.1813.

S. Chopra, R. Hadsell, and Y. LeCun, Learning a similarity metric discriminatively, with application to face verification, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), pp.539-546, 2005.

T.-S. Chua, L. Chen, and J. Wang, Stratification approach to modeling video, Multimedia Tools and Applications, pp.79-97, 2002.

O. Chum and A. Zisserman, An exemplar model for learning object classes, IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07), pp.1-8, 2007.

T. Cover and P. Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory, vol.13, issue.1, pp.21-27, 1967.
DOI : 10.1109/tit.1967.1053964

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.68.2616

K. Crammer and Y. Singer, On the algorithmic implementation of multiclass kernel-based vector machines, The Journal of Machine Learning Research, vol.2, pp.265-292, 2002.

D. Crandall and D. P. Huttenlocher, Weakly Supervised Learning of Part-Based Spatial Models for Visual Object Recognition, Computer Vision - ECCV 2006, pp.16-29, 2006.
DOI : 10.1007/11744023_2

M. Cristani, M. Bicego, and V. Murino, Audio-Visual Event Recognition in Surveillance Video Sequences, IEEE Transactions on Multimedia, pp.257-267, 2007.
DOI : 10.1109/TMM.2006.886263

G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints, Workshop on statistical learning in computer vision, ECCV, pp.1-2, 2004.

N. Dalal, B. Triggs, and C. Schmid, Human detection using oriented histograms of flow and appearance, Computer Vision - ECCV 2006, pp.428-441, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00548587

S. Danafar and N. Gheissari, Action recognition for surveillance applications using optic flow and SVM, Computer Vision - ACCV 2007, pp.457-466, 2007.

A. Datta, M. Shah, and N. Da Vitoria Lobo, Person-on-person violence detection in video data, Object recognition supported by user interaction for service robots, pp.433-438, 2002.
DOI : 10.1109/ICPR.2002.1044748

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.5.8417

G. Davenport, T. A. Smith, and N. Pincever, Cinematic primitives for multimedia, IEEE Computer Graphics and Applications, vol.11, issue.4, pp.67-74, 1991.
DOI : 10.1109/38.126883

J. Delhumeau, P. Gosselin, H. Jégou, and P. Pérez, Revisiting the VLAD image representation, Proceedings of the 21st ACM international conference on Multimedia, MM '13, pp.653-656, 2013.
DOI : 10.1145/2502081.2502171

URL : https://hal.archives-ouvertes.fr/hal-00840653

C.-H. Demarty, C. Penet, M. Schedl, B. Ionescu, V. L. Quang, and Y.-G. Jiang, The MediaEval 2013 Affect Task: Violent Scenes Detection, p.54, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00932551

C.-H. Demarty, B. Ionescu, Y. Jiang, V. L. Quang, M. Schedl et al., Benchmarking Violent Scenes Detection in movies, 2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI), p.59, 2014.
DOI : 10.1109/CBMI.2014.6849827

URL : https://hal.archives-ouvertes.fr/hal-00767036

N. Derbas, F. Thollard, B. Safadi, and G. Quénot, LIG at MediaEval 2012 Affect Task: Use of a Generic Method, Workshop, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00770536

N. Derbas, B. Safadi, and G. Quénot, LIG at MediaEval 2013 Affect Task: Use of a Generic Method and Joint Audio-Visual Words, Workshop, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953091

T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez, Solving the multiple instance problem with axis-parallel rectangles, Artificial Intelligence, vol.89, issue.1-2, pp.31-71, 1997.
DOI : 10.1016/S0004-3702(96)00034-3

URL : http://doi.org/10.1016/s0004-3702(96)00034-3

P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, Behavior Recognition via Sparse Spatio-Temporal Features, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp.65-72, 2005.
DOI : 10.1109/VSPETS.2005.1570899

C. Drummond and R. C. Holte, C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling, Workshop on Learning from Imbalanced Datasets II, Citeseer, p.36, 2003.

E. Dumont and G. Quénot, A Local Temporal Context-Based Approach for TV News Story Segmentation, 2012 IEEE International Conference on Multimedia and Expo, pp.973-978, 2012.
DOI : 10.1109/ICME.2012.3

URL : https://hal.archives-ouvertes.fr/hal-00767396

A. A. Efros, A. C. Berg, G. Mori, and J. Malik, Recognizing action at a distance, Proceedings Ninth IEEE International Conference on Computer Vision, pp.726-733, 2003.
DOI : 10.1109/ICCV.2003.1238420

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.331.921

M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, The PASCAL Visual Object Classes (VOC) challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010.

C. Farabet, C. Couprie, L. Najman, and Y. LeCun, Learning hierarchical features for scene labeling, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1915-1929, 2013.

G. Farnebäck, Two-frame motion estimation based on polynomial expansion, Image Analysis, pp.363-370, 2003.

L. Fei-Fei, R. Fergus, and A. Torralba, Recognizing and learning object categories, CVPR Short Course.

P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, Object detection with discriminatively trained part-based models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1627-1645, 2010.

B. Feng, P. Ding, J. Chen, J. Bai, S. Xu et al., Multi-modal information fusion for news story segmentation in broadcast video, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.1417-1420, 2012.
DOI : 10.1109/ICASSP.2012.6288156

R. Fergus, P. Perona, and A. Zisserman, Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition, International Journal of Computer Vision, vol.20, issue.1, pp.273-303, 2007.
DOI : 10.1109/34.655647

V. Ferrari, L. Fevrier, F. Jurie, and C. Schmid, Groups of adjacent contour segments for object detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, issue.1, pp.36-51, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00203719

J. W. Fisher III, T. Darrell, W. T. Freeman, and P. A. Viola, Learning joint statistical models for audio-visual fusion and segregation, pp.772-778, 2000.

V. Franc and S. Sonnenburg, Optimized cutting plane algorithm for support vector machines, Proceedings of the 25th international conference on Machine learning, pp.320-327, 2008.

M. Fussenegger, A. Opelt, and A. Pinz, Object localization/segmentation using generic shape priors, 18th International Conference on Pattern Recognition (ICPR), pp.41-44, 2006.

C. Galleguillos, B. Babenko, A. Rabinovich, and S. Belongie, Weakly Supervised Object Localization with Stable Segmentations, Computer Vision - ECCV 2008, pp.193-207, 2008.
DOI : 10.1007/978-3-540-88682-2_16

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.210.4047

J. van Gemert, J. Geusebroek, C. J. Veenman, and A. W. Smeulders, Kernel Codebooks for Scene Categorization, Computer Vision - ECCV 2008, pp.696-709, 2008.
DOI : 10.1007/978-3-540-88690-7_52

X. Geng, D. Zhan, and Z. Zhou, Supervised Nonlinear Dimensionality Reduction for Visualization and Classification, Systems, Man, and Cybernetics, pp.1098-1107, 2005.
DOI : 10.1109/TSMCB.2005.850151

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.678.4764

T. Gevers and A. W. Smeulders, Color-based object recognition, Pattern Recognition, vol.32, issue.3, pp.453-464, 1999.

T. Giannakopoulos, D. I. Kosmopoulos, A. Aristidou, and S. Theodoridis, Violence Content Classification Using Audio Features, pp.502-507, 2006.
DOI : 10.1017/CBO9780511801389

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.74.3830

T. Giannakopoulos, A. Makris, D. Kosmopoulos, S. Perantonis, and S. Theodoridis, Audio-Visual Fusion for Detecting Violent Scenes in Videos, Artificial Intelligence: Theories, Models and Applications, pp.91-100, 2010.
DOI : 10.1007/978-3-642-12842-4_13

Y. Gong, W. Wang, S. Jiang, Q. Huang, and W. Gao, Detecting Violent Scenes in Movies by Auditory and Visual Cues, Advances in Multimedia Information Processing - PCM 2008, pp.317-326, 2008.

L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, Actions as space-time shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.12, pp.2247-2253, 2007.
DOI : 10.1109/tpami.2007.70711

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.8218

D. Gorisse, F. Precioso, P. Gosselin, L. Granjon, D. Pellerin et al., IRIM at TRECVID 2010 : High level feature extraction and instance search, p.107, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00953839

D. Grangier and S. Bengio, A discriminative kernel-based approach to rank images from text queries, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, issue.8, pp.1371-1384, 2008.
DOI : 10.1109/tpami.2007.70791

Z. Gu, T. Mei, X. Hua, J. Tang, and X. Wu, Multi-layer multi-instance learning for video concept detection, IEEE Transactions on Multimedia, vol.10, issue.8, pp.1605-1616, 2008.
DOI : 10.1109/tmm.2008.2007290

A. Habibian, T. Mensink, and C. G. Snoek, Composite Concept Discovery for Zero-Shot Video Event Detection, p.30, 2014.

A. Habibian and C. G. Snoek, Stop-Frame Removal Improves Web Video Classification, Proceedings of International Conference on Multimedia Retrieval, p.499, 2014.

A. Hamadi, G. Quénot, and P. Mulhem, Two-layers re-ranking approach based on contextual information for visual concepts detection in videos, 2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI), pp.1-6, 2012.
DOI : 10.1109/CBMI.2012.6269837

URL : https://hal.archives-ouvertes.fr/hal-00767172

A. Hamadi, B. Safadi, T. Vuong, D. Han, N. Derbas et al., Quaero at TRECVID 2013 : Semantic Indexing and Instance Search, Proc. TRECVID Workshop, p.74, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953086

L. Hansen, J. Larsen, and T. Kolenda, On independent component analysis for multimedia signals, Multimedia Image and Video Processing, pp.175-199, 2000.

Z. S. Harris, Distributional structure, p.23, 1954.

H. Harzallah, F. Jurie, and C. Schmid, Combining efficient object localization and image classification, 2009 IEEE 12th International Conference on Computer Vision, pp.237-244, 2009.
DOI : 10.1109/ICCV.2009.5459257

URL : https://hal.archives-ouvertes.fr/inria-00439516

J. Huang, S. R. Kumar, M. Mitra, W. Zhu, and R. Zabih, Image indexing using color correlograms, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.762-768, 1997.
DOI : 10.1109/CVPR.1997.609412

H. Jégou, M. Douze, and C. Schmid, Improving Bag-of-Features for Large Scale Image Search, International Journal of Computer Vision, vol.87, issue.3, pp.316-336, 2010.
DOI : 10.1007/s11263-009-0285-2

H. Jégou, M. Douze, C. Schmid, and P. Pérez, Aggregating local descriptors into a compact image representation, 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3304-3311, 2010.

H. Jégou and O. Chum, Negative evidences and co-occurences in image retrieval: The benefit of PCA and whitening, Computer Vision - ECCV 2012, pp.774-787, 2012.

H. Jégou, F. Perronnin, M. Douze et al., Aggregating local image descriptors into compact codes, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012.

Y. Jia, Caffe: An Open Source Convolutional Architecture for Fast Feature Embedding.

Y.-G. Jiang, J. Wang, S.-F. Chang, and C.-W. Ngo, Domain adaptive semantic diffusion for large scale context-based video annotation, 2009 IEEE 12th International Conference on Computer Vision, pp.1420-1427, 2009.

W. Jiang and A. C. Loui, Audio-visual grouplet, Proceedings of the 19th ACM international conference on Multimedia, MM '11, pp.123-132, 2011.
DOI : 10.1145/2072298.2072316

L. Jiang, W. Tong, D. Meng, and A. G. Hauptmann, Towards Efficient Learning of Optimal Spatial Bag-of-Words Representations, Proceedings of International Conference on Multimedia Retrieval, ICMR '14, p.25, 2014.
DOI : 10.1145/2578726.2578739

M. S. Kankanhalli and T. Chua, Video modeling using strata-based annotation, IEEE Multimedia, vol.7, issue.1, pp.68-74, 2000.
DOI : 10.1109/93.839313

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.8878

L. M. Kaplan, R. Murenzi, and K. R. Namuduri, Fast texture database retrieval using extended fractal features, Photonics West'98 Electronic Imaging International Society for Optics and Photonics, pp.162-173, 1997.
DOI : 10.1117/12.298440

V. Kellokumpu, G. Zhao, and M. Pietikäinen, Human activity recognition using a dynamic texture based method, pp.1-10, 2008.

E. Khoury, C. Sénac, and P. Joly, Audiovisual diarization of people in video content, Multimedia Tools and Applications, vol.13, issue.4, pp.747-775, 2014.
DOI : 10.1007/978-3-540-68585-2_49

T. Kolenda, L. K. Hansen, J. Larsen, and O. Winther, Independent component analysis for understanding multimedia content, Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing, pp.757-766, 2002.
DOI : 10.1109/NNSP.2002.1030096

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.650.5965

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25, pp.1097-1105, 2012.
DOI : 10.1162/neco.2009.10-08-881

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.299.205

M. Kubat and S. Matwin, Addressing the curse of imbalanced training sets : one-sided selection, pp.179-186, 1997.

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, HMDB: A large video database for human motion recognition, 2011 International Conference on Computer Vision, pp.2556-2563, 2011.
DOI : 10.1109/ICCV.2011.6126543

URL : http://cbcl.mit.edu/publications/ps/Kuehne_etal_iccv11.pdf

M. La Cascia, S. Sethi, and S. Sclaroff, Combining textual and visual cues for content-based image retrieval on the world wide web, Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries, pp.24-28, 1998.

C. H. Lampert, M. B. Blaschko, and T. Hofmann, Beyond sliding windows: Object localization by efficient subwindow search, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.
DOI : 10.1109/CVPR.2008.4587586

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.4517

Z.-Z. Lan, L. Bao, S.-I. Yu, W. Liu, and A. G. Hauptmann, Multimedia classification and event detection using double fusion, Multimedia Tools and Applications, pp.1-15, 2013.
DOI : 10.1109/TMM.2008.917359

I. Laptev, On Space-Time Interest Points, International Journal of Computer Vision, vol.17, issue.8, pp.107-123, 2005.
DOI : 10.1007/BFb0017862

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.4359

S. Lazebnik, C. Schmid, and J. Ponce, A sparse texture representation using local affine regions, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, issue.8, pp.1265-1278, 2005.
DOI : 10.1109/tpami.2005.151

URL : https://hal.archives-ouvertes.fr/inria-00548530

S. Lazebnik, C. Schmid, and J. Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.2169-2178, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00548585

Y. Lecun, K. Kavukcuoglu, and C. Farabet, Convolutional networks and applications in vision, Proceedings of 2010 IEEE International Symposium on Circuits and Systems, pp.253-256, 2010.
DOI : 10.1109/ISCAS.2010.5537907

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, pp.2278-2324, 1998.

Y. Lee, Y. Lin, and G. Wahba, Multicategory support vector machines: Theory and application to the classification of microarray data and satellite radiance data.

B. Leibe, A. Leonardis, and B. Schiele, An Implicit Shape Model for Combined Object Categorization and Segmentation, Workshop on Statistical Learning in Computer Vision, ECCV, pp.7-66, 2004.
DOI : 10.1007/11957959_26

F. F. Li, R. Vanrullen, C. Koch, and P. Perona, Rapid natural scene categorization in the near absence of attention, Proceedings of the National Academy of Sciences, pp.9596-9601, 2002.
DOI : 10.1126/science.287.5456.1273

L.-J. Li, H. Su, E. P. Xing, and L. Fei-Fei, Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification, pp.5-30, 2010.
DOI : 10.1007/s11263-013-0660-x

J. Lin and W. Wang, Weakly-Supervised Violence Detection in Movies with Audio and Video Based Co-training, Advances in Multimedia Information Processing - PCM 2009, pp.930-935, 2009.
DOI : 10.1007/978-3-642-10467-1_84

Y. Lin, F. Lv, S. Zhu, M. Yang, T. Cour et al., Large-scale image classification: Fast feature extraction and SVM training, CVPR 2011, pp.1689-1696, 2011.
DOI : 10.1109/CVPR.2011.5995477

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.225.3736

D. Liu, G. Hua, P. Viola, and T. Chen, Integrated feature selection and higher-order spatial feature extraction for object categorization, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.
DOI : 10.1109/CVPR.2008.4587403

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.319.2362

X.-Y. Liu, J. Wu, and Z.-H. Zhou, Exploratory undersampling for class-imbalance learning, IEEE Transactions on Systems, Man, and Cybernetics, pp.539-550, 2009.

D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.4931

D. Lowe, Object recognition from local scale-invariant features, Proceedings of the Seventh IEEE International Conference on Computer Vision, pp.1150-1157, 1999.
DOI : 10.1109/ICCV.1999.790410

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.4065

M. Lu, L. Xie, Z. Fu, D. Jiang, and Y. Zhang, Multimodal feature integration for story boundary detection in broadcast news, 7th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp.420-425, 2010.

B. D. Lucas and T. Kanade, An iterative image registration technique with an application to stereo vision, pp.674-679, 1981.

B. S. Manjunath and W. Ma, Texture features for browsing and retrieval of image data, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.18, issue.8, pp.837-842, 1996.

J. Mao and A. K. Jain, Texture classification and segmentation using multiresolution simultaneous autoregressive models, Pattern Recognition, vol.25, issue.2, pp.173-188, 1992.
DOI : 10.1016/0031-3203(92)90099-5

R. Maree, P. Geurts, J. Piater, and L. Wehenkel, Random Subwindows for Robust Image Classification, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.34-40, 2005.
DOI : 10.1109/CVPR.2005.287

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.69.8683

M. Marszalek, I. Laptev, and C. Schmid, Actions in context, 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp.2929-2936, 2009.
DOI : 10.1109/CVPR.2009.5206557

URL : https://hal.archives-ouvertes.fr/inria-00548645

M. Mazloom, E. Gavves, K. Van-de-sande, and C. Snoek, Searching informative concept banks for video event detection, Proceedings of the 3rd ACM conference on International conference on multimedia retrieval, ICMR '13, pp.255-262, 2013.
DOI : 10.1145/2461466.2461507

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.310.8951

M. Meeker and L. Wu, 2013 internet trends, 2013.

M. Merler, B. Huang, L. Xie, G. Hua, and A. Natsev, Semantic model vectors for complex video event recognition, IEEE Transactions on Multimedia, vol.14, issue.1, pp.88-101, 2012.

K. Mikolajczyk and C. Schmid, Scale & affine invariant interest point detectors, International Journal of Computer Vision, vol.60, issue.1, pp.63-86, 2004.
URL : https://hal.archives-ouvertes.fr/inria-00548554

D. Mitrović, M. Zeppelzauer, and C. Breiteneder, Features for content-based audio retrieval, Advances in Computers, pp.71-150, 2010.

M. Mühling, R. Ewerth, J. Zhou et al., Multimodal video concept detection via bag of auditory words and multiple kernel learning, Advances in Multimedia Modeling.

M. R. Naphade and J. R. Smith, On the detection of semantic concepts at TRECVID, Proceedings of the 12th annual ACM international conference on Multimedia, MULTIMEDIA '04, pp.660-667, 2004.
DOI : 10.1145/1027527.1027680

M. H. Nguyen, L. Torresani, F. De-la-torre, and C. Rother, Weakly supervised discriminative localization and classification: a joint learning process, 2009 IEEE 12th International Conference on Computer Vision, pp.1925-1932, 2009.
DOI : 10.1109/ICCV.2009.5459426

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.153.2127

E. Nowak, F. Jurie, and B. Triggs, Sampling strategies for bag-of-features image classification, Computer Vision - ECCV 2006, pp.490-503, 2006.
DOI : 10.1007/11744085_38

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.9956

T. Ojala, M. Pietikäinen, and T. Mäenpää, Multiresolution grayscale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, issue.7, pp.971-987, 2002.

T. Ojala, M. Pietikäinen, and D. Harwood, A comparative study of texture measures with classification based on featured distributions, Pattern Recognition, vol.29, issue.1, pp.51-59, 1996.

A. Oliva and A. Torralba, Modeling the shape of the scene: A holistic representation of the spatial envelope, International Journal of Computer Vision, vol.42, issue.3, pp.145-175, 2001.
DOI : 10.1023/A:1011139631724

A. Opelt and A. Pinz, Object Localization with Boosting and Weak Supervision for Generic Object Recognition, Image Analysis, pp.862-871, 2005.
DOI : 10.1007/11499145_87

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.211.7741

P. Over, G. Awad, J. Fiscus, B. Antonishek, M. Michel et al., An overview of the goals, tasks, data, evaluation mechanisms and metrics, TRECVID 2013-TREC Video Retrieval Evaluation Online, pp.45-74, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01230444

M. Pandey and S. Lazebnik, Scene recognition and weakly supervised object localization with deformable part-based models, 2011 International Conference on Computer Vision, pp.1307-1314, 2011.
DOI : 10.1109/ICCV.2011.6126383

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.300.7841

D. Parikh and C. L. Zitnick, The role of features, algorithms and data in visual recognition, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.2328-2335, 2010.
DOI : 10.1109/CVPR.2010.5539920

C. Penet, C. Demarty, G. Gravier, and P. Gros, Audio event detection in movies using multiple audio words and contextual Bayesian networks, 2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI), pp.17-22, 2013.
DOI : 10.1109/CBMI.2013.6576546

URL : https://hal.archives-ouvertes.fr/hal-00822022

F. Perronnin and C. Dance, Fisher Kernels on Visual Vocabularies for Image Categorization, 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2007.
DOI : 10.1109/CVPR.2007.383266

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.71.7388

F. Perronnin, J. Sánchez, and Y. Liu, Large-scale image categorization with explicit data embedding, 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2297-2304, 2010.

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher kernel for large-scale image classification, Computer Vision - ECCV 2010, pp.143-156, 2010.

J. C. Platt, Fast training of support vector machines using sequential minimal optimization, p.36, 1999.

A. Prest, C. Schmid, and V. Ferrari, Weakly supervised learning of interactions between humans and objects, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.3, pp.601-614, 2012.
DOI : 10.1109/tpami.2011.158

URL : https://hal.archives-ouvertes.fr/inria-00516477

T. Quack, V. Ferrari, B. Leibe, and L. Van Gool, Efficient mining of frequent and distinctive feature configurations, IEEE 11th International Conference on Computer Vision (ICCV 2007), pp.1-8, 2007.

G. Quénot and F. Thollard, Reclassement d'images par le contenu, CORIA 2012, p.96

D. Ramanan and C. Sminchisescu, Training Deformable Models for Localization, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.206-213, 2006.

M. Redi and B. Merialdo, Saliency moments for image categorization, Proceedings of the 1st ACM International Conference on Multimedia Retrieval, ICMR '11, pp.39-107, 2011.
DOI : 10.1145/1991996.1992035

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.673.8189

C. X. Ries and R. Lienhart, Deriving a discriminative color model for a given object class from weakly labeled training data, Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR '12, pp.44-68, 2012.
DOI : 10.1145/2324796.2324848

M. Rohrbach, M. Stark, and B. Schiele, Evaluating knowledge transfer and zero-shot learning in a large-scale setting, CVPR 2011, pp.1641-1648, 2011.
DOI : 10.1109/CVPR.2011.5995627

S. T. Roweis and L. K. Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science, vol.290, issue.5500, pp.2323-2326, 2000.
DOI : 10.1126/science.290.5500.2323

URL : http://astro.temple.edu/~msobel/courses_files/saulmds.pdf

H. A. Rowley, S. Baluja, and T. Kanade, Human face detection in visual scenes, p.66, 1995.

Y. Rui, T. S. Huang, and S. Mehrotra, Exploring Video Structure Beyond The Shots, Proceedings of the IEEE International Conference on Multimedia Computing and Systems, pp.237-249, 1998.

B. C. Russell, W. T. Freeman, A. A. Efros, J. Sivic, and A. Zisserman, Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), pp.1605-1614, 2006.
DOI : 10.1109/CVPR.2006.326

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.2856

S. Sadanand and J. J. Corso, Action bank: A high-level representation of activity in video, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.1234-1241, 2012.
DOI : 10.1109/CVPR.2012.6247806

B. Safadi and G. Quénot, Evaluations of multi-learner approaches for concept indexing in video documents, Adaptivity, Personalization and Fusion of Heterogeneous Information, pp.88-91, 2010.

B. Safadi, N. Derbas, A. Hamadi, F. Thollard, G. Quénot et al., Quaero at TRECVID 2011: Semantic Indexing and Multimedia Event Detection, p.32, 2011.

B. Safadi and G. Quénot, Re-ranking for multimedia indexing and retrieval, Advances in Information Retrieval, pp.708-711, 2011.

B. Safadi and G. Quénot, Active learning with multiple classifiers for multimedia indexing, Multimedia Tools and Applications, pp.403-417, 2012.
DOI : 10.1007/s11042-010-0599-7

URL : https://hal.archives-ouvertes.fr/hal-00953838

J. Sánchez and F. Perronnin, High-dimensional signature compression for large-scale image classification, CVPR 2011, pp.1665-1672, 2011.
DOI : 10.1109/CVPR.2011.5995504

J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, Image Classification with the Fisher Vector: Theory and Practice, International Journal of Computer Vision, vol.105, issue.3, pp.222-245, 2013.
DOI : 10.1007/s11263-013-0636-x

M. E. Sargin, E. Erzin, Y. Yemez, and A. M. Tekalp, Multimodal Speaker Identification Using Canonical Correlation Analysis, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, p.51, 2006.
DOI : 10.1109/ICASSP.2006.1660095

M. E. Sargin, Y. Yemez, E. Erzin, and A. M. Tekalp, Audiovisual Synchronization and Fusion Using Canonical Correlation Analysis, IEEE Transactions on Multimedia, vol.9, issue.7, pp.1396-1403, 2007.
DOI : 10.1109/TMM.2007.906583

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.118.2660

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., pp.32-36, 2004.
DOI : 10.1109/ICPR.2004.1334462

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.173.6790

S. Shalev-Shwartz and N. Srebro, SVM optimization: inverse dependence on training set size, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.928-935, 2008.
DOI : 10.1145/1390156.1390273

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, pp.1470-1477, 2003.
DOI : 10.1109/ICCV.2003.1238663

A. F. Smeaton, P. Over, and A. R. Doherty, Video shot boundary detection: Seven years of TRECVid activity, Computer Vision and Image Understanding, vol.114, issue.4, pp.411-418, 2010.
DOI : 10.1016/j.cviu.2009.03.011

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.148.2826

J. R. Smith, M. Naphade, and A. Natsev, Multimedia semantic indexing using model vectors, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698), p.445, 2003.
DOI : 10.1109/ICME.2003.1221649

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.454.4023

T. Smith and G. Davenport, The stratification system: a design environment for random access video, Network and Operating System Support for Digital Audio and Video, pp.250-261, 1993.
DOI : 10.1007/3-540-57183-3_22

C. G. Snoek, M. Worring, and A. W. Smeulders, Early versus late fusion in semantic video analysis, Proceedings of the 13th annual ACM international conference on Multimedia, MULTIMEDIA '05, pp.399-402, 2005.
DOI : 10.1145/1101149.1101236

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.78.5928

C. Snoek, K. E. A. van de Sande, D. Fontijne, A. Habibian, M. Jain et al., MediaMill at TRECVID 2013 : Searching concepts, objects, instances and events in video, p.79, 2013.
DOI : 10.1145/1873951.1874212

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.381.3359

F. D. M. de Souza, G. C. Chávez, E. A. do Valle Jr., and A. de A. Araújo, Violence Detection in Video Using Spatio-Temporal Features, 2010 23rd SIBGRAPI Conference on Graphics, Patterns and Images, pp.224-230, 2010.
DOI : 10.1109/SIBGRAPI.2010.38

S. T. Strat, A. Benoit, P. Lambert, and A. Caplier, Retina enhanced SURF descriptors for spatio-temporal concept detection, Multimedia tools and applications, pp.443-469, 2014.
DOI : 10.1145/1390334.1390437

URL : https://hal.archives-ouvertes.fr/hal-00760192

M. A. Stricker and M. Orengo, Similarity of color images, IS&T/SPIE's Symposium on Electronic Imaging: Science & Technology, International Society for Optics and Photonics, pp.381-392, 1995.
DOI : 10.1117/12.205308

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.2789

M. J. Swain and D. H. Ballard, Color indexing, International Journal of Computer Vision, vol.7, issue.1, pp.11-32, 1991.

M. Tahir, J. Kittler, K. Mikolajczyk, F. Yan, K. E. A. van de Sande et al., Visual category recognition using Spectral Regression and Kernel Discriminant Analysis, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, pp.178-185, 2009.
DOI : 10.1109/ICCVW.2009.5457703

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.175.926

C. Thurau and V. Hlaváč, Pose primitive based human action recognition in videos or still images, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.
DOI : 10.1109/CVPR.2008.4587721

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.324.989

K. M. Ting, An Empirical Study of MetaCost Using Boosting Algorithms, p.36, 2000.
DOI : 10.1007/3-540-45164-1_42

S. Todorovic and N. Ahuja, Extracting Subimages of an Unknown Category from a Set of Images, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 1 (CVPR'06), pp.927-934, 2006.
DOI : 10.1109/CVPR.2006.116

L. Torresani, M. Szummer, and A. Fitzgibbon, Efficient Object Category Recognition Using Classemes, Computer Vision – ECCV 2010, pp.776-789, 2010.
DOI : 10.1007/978-3-642-15549-9_56

M. R. Turner, Texture discrimination by Gabor functions, Biological Cybernetics, vol.55, issue.2-3, pp.71-82, 1986.

A. Ulges, C. Schulze, D. Keysers, and T. Breuel, Identifying relevant frames in weakly labeled videos for training concept detectors, Proceedings of the 2008 international conference on Content-based image and video retrieval, CIVR '08, pp.9-16, 2008.
DOI : 10.1145/1386352.1386358

J. van de Weijer, T. Gevers, and A. D. Bagdanov, Boosting color saliency in image feature detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.28, issue.1, pp.150-156, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00548615

K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek, Evaluating color descriptors for object and scene recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1582-1596, 2010.

A. Vinokourov, D. R. Hardoon, and J. Shawe-Taylor, Learning the semantics of multimedia content with application to web image retrieval and classification, p.32, 2003.

X. Wan and C. Kuo, A new approach to image retrieval with hierarchical color clustering, IEEE Transactions on Circuits and Systems for Video Technology, vol.8, issue.5, pp.628-643, 1998.

H. Wang, A. Kläser, C. Schmid, and C. Liu, Action recognition by dense trajectories, CVPR 2011, pp.3169-3176, 2011.
DOI : 10.1109/CVPR.2011.5995407

URL : https://hal.archives-ouvertes.fr/inria-00583818

D. Weinland, R. Ronfard, and E. Boyer, Free viewpoint action recognition using motion history volumes, Computer Vision and Image Understanding, vol.104, issue.2-3, pp.249-257, 2006.
DOI : 10.1016/j.cviu.2006.07.013

URL : https://hal.archives-ouvertes.fr/inria-00544629

K. Q. Weinberger and L. K. Saul, Distance metric learning for large margin nearest neighbor classification, The Journal of Machine Learning Research, vol.10, pp.207-244, 2009.

R. Weiss, A. Duda, and D. K. Gifford, Composition and search with a video algebra, IEEE Multimedia, vol.2, issue.1, pp.12-25, 1995.
DOI : 10.1109/93.368596

J. Weston and C. Watkins, Support vector machines for multi-class pattern recognition, pp.61-72, 1999.

J. Winn, A. Criminisi, and T. Minka, Object categorization by learned universal visual dictionary, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, pp.1800-1807, 2005.
DOI : 10.1109/ICCV.2005.171

J. Winn and N. Jojic, LOCUS: learning object classes with unsupervised segmentation, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, pp.756-763, 2005.
DOI : 10.1109/ICCV.2005.148

J. Yang, K. Yu, and T. Huang, Efficient Highly Over-Complete Sparse Coding Using a Mixture Model, Computer Vision – ECCV 2010, pp.113-126, 2010.
DOI : 10.1007/978-3-642-15555-0_9

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.175.1478

G. Ye, I. Jhuo, D. Liu, Y. Jiang, D. Lee et al., Joint audio-visual bi-modal codewords for video event detection, Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR '12, 2012.
DOI : 10.1145/2324796.2324843

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.394.1190

A. Yilmaz and M. Shah, A differential geometric approach to representing the human actions, Computer Vision and Image Understanding, vol.109, issue.3, pp.335-351, 2008.

Y. Zhang and T. Chen, Weakly Supervised Object Recognition and Localization with Invariant High Order Features, Proceedings of the British Machine Vision Conference 2010, pp.1-11, 2010.
DOI : 10.5244/C.24.47

Z.-H. Zhou and X.-Y. Liu, Training cost-sensitive neural networks with methods addressing the class imbalance problem, IEEE Transactions on Knowledge and Data Engineering, vol.18, issue.1, pp.63-77, 2006.
DOI : 10.1109/TKDE.2006.17