
M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele, 2D human pose estimation: New benchmark and state of the art analysis, CVPR, 2014.

M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, Sequential deep learning for human action recognition, International Workshop on Human Behavior Understanding, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01354493

L. D. Baker and A. K. McCallum, Distributional clustering of words for text classification, SIGIR, 1998.

H. Bay, T. Tuytelaars, and L. Van Gool, SURF: Speeded up robust features, ECCV, 2006.

A. Binder, W. Samek, K. Müller, and M. Kawanabe, Machine learning for visual concept recognition and ranking for images, Towards the Internet of Services: The THESEUS Research Program, 2014.

D. C. Blair and M. E. Maron, An evaluation of retrieval effectiveness for a full-text document-retrieval system, Communications of the ACM, 1985.

M. Brand, N. Oliver, and A. Pentland, Coupled hidden Markov models for complex action recognition, CVPR, 1997.

A. Bregman, Auditory Scene Analysis: The perceptual organization of sound, 1990.

T. Brox and J. Malik, Object segmentation by long term analysis of point trajectories, ECCV, 2010.

T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, High accuracy optical flow estimation based on a theory for warping, ECCV, 2004.

S. Buch, V. Escorcia, B. Ghanem, L. Fei-Fei, and J. C. Niebles, End-to-end, single-stream temporal action detection in untrimmed videos, BMVC, 2017.

L. Cao, Z. Liu, and T. Huang, Cross-dataset action detection, CVPR, 2010.

J. Carreira and A. Zisserman, Quo vadis, action recognition? A new model and the Kinetics dataset, CVPR, 2017.

C. Chang and C. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, 2011.

J. Chen, Y. Cui, G. Ye, D. Liu, and S. Chang, Event-driven semantic concept discovery by exploiting weakly tagged internet images, ICMR, 2014.

L. Chen, L. Duan, and D. Xu, Event recognition in videos by learning from heterogeneous web sources, CVPR, 2013.

X. Chen, R. Mottaghi, X. Liu, S. Fidler, R. Urtasun et al., Detect what you can: Detecting and representing objects using holistic models and body parts, CVPR, 2014.

N. Chesneau, K. Alahari, and C. Schmid, Learning from web events for event classification, TCSVT, 2017.

N. Chesneau, G. Rogez, K. Alahari, and C. Schmid, Detecting parts for action localization, BMVC, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01573629

C. E. Crangle, A. Zbyslaw, J. M. Cherry, and E. L. Hong, Concept extraction and synonymy management for biomedical information retrieval, TREC, 2004.

G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints, Workshop on Statistical Learning in Computer Vision, ECCV, 2004.

N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, CVPR, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00548512

N. Dalal, B. Triggs, and C. Schmid, Human detection using oriented histograms of flow and appearance, ECCV, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00548587

A. Dayanik, C. G. Nevill-Manning, and B. Oughtred, Partitioning a graph of sequences, structures and abstracts for information retrieval, TREC, 2003.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A Large-Scale Hierarchical Image Database, CVPR, 2009.

S. Divvala, A. Farhadi, and C. Guestrin, Learning everything about anything: Webly-supervised visual concept learning, CVPR, 2014.

J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-term recurrent convolutional networks for visual recognition and description, CVPR, 2015.

L. Duan, D. Xu, I. W. Tsang, and J. Luo, Visual event recognition in videos by learning from web data, 2012.

I. Endres, K. J. Shih, J. Jiaa, and D. Hoiem, Learning collections of part models for object recognition, CVPR, 2013.

D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov, Scalable object detection using deep neural networks, CVPR, 2014.

M. Everingham, L. Van Gool, C. Williams, J. Winn, and A. Zisserman, The PASCAL Visual Object Classes (VOC) Challenge, IJCV, 2010.

C. Feichtenhofer, A. Pinz, and A. Zisserman, Convolutional two-stream network fusion for video action recognition, CVPR, 2016.

P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, Object detection with discriminatively trained part based models, 2010.

A. Gaidon, Z. Harchaoui, and C. Schmid, Temporal Localization of Actions with Actoms, PAMI, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00687312

C. Gan, N. Wang, Y. Yang, D. Yeung, and A. G. Hauptmann, DevNet: A deep event network for multimedia event detection and evidence recounting, CVPR, 2015.

C. Gan, C. Sun, L. Duan, and B. Gong, Webly-supervised video recognition by mutually voting for relevant web images and web video frames, ECCV, 2016.

C. Gan, T. Yao, K. Yang, Y. Yang, and T. Mei, You lead, we exceed: Labor-free video concept learning by jointly exploiting web videos and images, CVPR, 2016.

J. Gao, Z. Yang, and R. Nevatia, RED: Reinforced encoder-decoder networks for action anticipation, 2017.

W. Gerbino and D. Salmaso, The effect of amodal completion on visual matching, Acta Psychologica, 1987.

S. Gidaris and N. Komodakis, Object detection via a multi-region and semantic segmentation-aware CNN model, ICCV, 2015.

R. Girshick, Fast R-CNN, ICCV, 2015.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR, 2014.

G. Gkioxari and J. Malik, Finding action tubes, CVPR, 2015.

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., Generative adversarial nets, NIPS, 2014.

A. Habibian, T. E. Mensink, and C. G. Snoek, Composite concept discovery for zero-shot video event detection, ICMR, 2014.

P. Hanks, 1986.

S. M. Harabagiu, G. A. Miller, and D. I. Moldovan, SIGLEX, 1999.

S. Hare, A. Saffari, and P. Torr, Struck: Structured output tracking with kernels, ICCV, 2011.

Z. S. Harris, Distributional structure, Word, 1954.

A. G. Hauptmann and M. A. Smith, Text, speech and vision for video segmentation: The Informedia project, AAAI Fall Symposium, 1995.

K. He, X. Zhang, S. Ren, and J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, PAMI, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, CVPR, 2016.

D. Hiemstra, A probabilistic justification for using tf-idf term weighting in information retrieval, International Journal on Digital Libraries, 2000.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, 1997.

R. Hou, R. Sukthankar, and M. Shah, Real-time temporal action localization in untrimmed videos by sub-action discovery, BMVC, 2017.

L. Huang, Y. Yang, Y. Deng, and Y. Yu, DenseBox: Unifying landmark localization with end to end object detection, 2015.

N. Hussein, E. Gavves, and A. W. Smeulders, Unified embedding and metric learning for zero-exemplar event detection, 2017.

N. Ikizler-Cinbis, R. Cinbis, and S. Sclaroff, Learning actions from the web, ICCV, 2009.

M. Jain, J. van Gemert, H. Jégou, P. Bouthemy, and C. Snoek, Action localization by tubelets from motion, CVPR, 2014.

H. Jégou, F. Perronnin, M. Douze, J. Sánchez, P. Pérez et al., Aggregating local image descriptors into compact codes, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00633013

H. Jhuang, J. Gall, S. Zuffi, C. Schmid, and M. J. Black, Towards understanding action recognition, ICCV, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00906902

S. Ji, W. Xu, M. Yang, and K. Yu, 3D convolutional neural networks for human action recognition, PAMI, 2013.

L. Jiang, T. Mitamura, S. Yu, and A. G. Hauptmann, Zero-example event search using multimodal pseudo relevance feedback, ICMR, 2014.

L. Jiang, S. Yu, D. Meng, T. Mitamura, and A. G. Hauptmann, Bridging the ultimate semantic gap: A semantic search engine for internet videos, ICMR, 2015.

L. Jiang, S. Yu, D. Meng, Y. Yang, T. Mitamura et al., Fast and accurate content-based semantic search in 100M internet videos, ACM MM, 2015.

T. Joachims, Text categorization with support vector machines: Learning with many relevant features, ECML, 1998.

S. Johnson and M. Everingham, Clustered pose and nonlinear appearance models for human pose estimation, BMVC, 2010.

C. A. Kaestner, Support vector machines and kernel functions for text processing, Revista de Informática Teórica e Aplicada, 2013.

V. Kalogeiton, P. Weinzaepfel, V. Ferrari, and C. Schmid, Action Tubelet Detector for Spatio-Temporal Action Localization, ICCV, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01519812

V. Kalogeiton, P. Weinzaepfel, V. Ferrari, and C. Schmid, Joint learning of object and action detectors, ICCV, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01575804

G. Kanizsa, Subjective contours, Scientific American, 1976.

G. Kanizsa and A. Chambolle, La grammaire du voir: essais sur la perception, Diderot Editeur arts et sciences, 1997.

A. Kar, S. Tulsiani, J. Carreira, and J. Malik, Amodal completion and size constancy in natural scenes, ICCV, 2015.

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar et al., Large-scale video classification with convolutional neural networks, CVPR, 2014.

Y. Ke, R. Sukthankar, and M. Hebert, Efficient visual event detection using volumetric features, ICCV, 2005.

A. Kläser, M. Marszałek, and C. Schmid, A spatio-temporal descriptor based on 3D-gradients, BMVC, 2008.

A. Kläser, M. Marszałek, C. Schmid, and A. Zisserman, Human Focused Action Localization in Video, International Workshop on Sign, Gesture, and Activity, 2010.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, NIPS, 2012.

W. Kuo, B. Hariharan, and J. Malik, DeepBox: Learning objectness with convolutional networks, ICCV, 2015.

C. H. Lampert, H. Nickisch, and S. Harmeling, Attribute-based classification for zero-shot visual object categorization, 2014.

T. Lan, Y. Wang, and G. Mori, Discriminative figure-centric models for joint action localization and recognition, ICCV, 2011.

I. Laptev, On space-time interest points, IJCV, 2005.

I. Laptev, Modeling and visual recognition of human actions and interactions, Habilitation à diriger des recherches (HDR), 2013.
URL : https://hal.archives-ouvertes.fr/tel-01064540

I. Laptev and P. Pérez, Retrieving actions in movies, ICCV, 2007.

I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, CVPR, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00548659

S. Lazebnik, C. Schmid, and J. Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, CVPR, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00548585

T. Leung, Y. Song, and J. Zhang, Handling label noise in video classification via multiple instance learning, ICCV, 2011.

D. D. Lewis, Reuters-21578 text categorization test collection, 1997.

J. Lezama, K. Alahari, J. Sivic, and I. Laptev, Track to the future: Spatiotemporal video segmentation with long-range motion cues, CVPR, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00817961

Y. Li, K. He, and J. Sun, R-FCN: Object detection via region-based fully convolutional networks, NIPS, 2016.

J. Liu, J. Luo, and M. Shah, Recognizing realistic actions from videos "in the wild", CVPR, 2009.

J. Liu, Q. Yu, O. Javed, S. Ali, A. Tamrakar et al., Video event recognition using concept attributes, WACV, 2013.

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed et al., SSD: Single shot multibox detector, ECCV, 2016.

H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins, Text classification using string kernels, JMLR, 2002.

E. Loper and S. Bird, NLTK: The natural language toolkit, ACL Workshop, 2002.

H. P. Luhn, A statistical approach to mechanized encoding and searching of literary information, IBM J. Res. Dev., 1957.

M. M. Puscas, E. Sangineto, D. Culibrk, and N. Sebe, Unsupervised tube extraction using transductive learning and dense trajectories, ICCV, 2015.

S. Masnou and J. Morel, Level lines based disocclusion, ICIP, 1998.

P. Matikainen, M. Hebert, and R. Sukthankar, Trajectons: Action recognition through the motion analysis of tracked features, ICCV Workshops, 2009.

A. McCallum and K. Nigam, A comparison of event models for naive Bayes text classification, AAAI, 1998.

C. T. Meadow, Text Information Retrieval Systems, 1992.

T. Mensink, E. Gavves, and C. Snoek, COSTA: Co-occurrence statistics for zero-shot classification, CVPR, 2014.
DOI : 10.1109/cvpr.2014.313
URL : https://pure.uva.nl/ws/files/17209170/MensinkCVPR2014.pdf

T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of word representations in vector space, ICLR, 2013.

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, NIPS, 2013.

G. A. Miller, WordNet: A lexical database for English, Communications of the ACM, 1995.
DOI : 10.1145/219717.219748

D. Modolo and V. Ferrari, Learning semantic part-based models from Google images, 2017.
DOI : 10.1109/tpami.2017.2724029
URL : http://arxiv.org/pdf/1609.03140

T. B. Moeslund, A. Hilton, and V. Krüger, A survey of advances in vision-based human motion capture and analysis, 2006.

M. M. Murray, D. M. Foxe, D. C. Javitt, and J. J. Foxe, Setting boundaries: brain dynamics of modal and amodal illusory shape completion in humans, Journal of Neuroscience, 2004.
DOI : 10.1523/jneurosci.1996-04.2004
URL : http://www.jneurosci.org/content/24/31/6898.full.pdf

J. R. Méndez, E. L. Iglesias, F. Fdez-Riverola, F. Díaz, J. M. Corchado et al., Stemming and Stopword Removal on Anti-spam Filtering Domain, AEPIA, 2005.

P. X. Nguyen, G. Rogez, C. Fowlkes, and D. Ramanan, The open world of micro-videos, 2016.

B. Ni, G. Wang, and P. Moulin, RGBD-HuDaAct: A color-depth video database for human daily activity recognition, Consumer Depth Cameras for Computer Vision, 2013.
DOI : 10.1109/iccvw.2011.6130379
URL : http://www.ntu.edu.sg/home/wanggang/NiWangMoulin2011.pdf

J. C. Niebles, H. Wang, and L. Fei-Fei, Unsupervised learning of human action categories using spatial-temporal words, IJCV, 2008.
DOI : 10.5244/c.20.127
URL : http://visionlab.ece.uiuc.edu/niebles/vpcvpr06.pdf

K. Nigam, A. K. McCallum, S. Thrun, and T. Mitchell, Text classification from labeled and unlabeled documents using EM, Machine Learning, 2000.

L. Niu, W. Li, and D. Xu, Visual recognition by learning from web data: A weakly supervised domain generalization approach, CVPR, 2015.
DOI : 10.1109/cvpr.2015.7298894

Y. Niwa and Y. Nitta, Co-occurrence vectors from corpora vs. distance vectors from dictionaries, COLING, 1994.
DOI : 10.3115/991886.991938
URL : http://dl.acm.org/ft_gateway.cfm?id=991938&type=pdf

D. Oneata, J. Verbeek, and C. Schmid, Action and event recognition with Fisher vectors on a compact feature set, ICCV, 2013.
DOI : 10.1109/iccv.2013.228
URL : https://hal.archives-ouvertes.fr/hal-00873662

D. Oneata, J. Revaud, J. Verbeek, and C. Schmid, Spatio-Temporal Object Detection Proposals, ECCV, 2014.
DOI : 10.1007/978-3-319-10578-9_48
URL : https://hal.archives-ouvertes.fr/hal-01021902

W. Ouyang, X. Wang, X. Zeng, S. Qiu, P. Luo et al., DeepID-Net: Deformable deep convolutional neural networks for object detection, CVPR, 2015.
DOI : 10.1109/cvpr.2015.7298854
URL : http://arxiv.org/pdf/1412.5661

P. Over, G. Awad, J. Fiscus, B. Antonishek, M. Michel et al., TRECVID 2010: An overview of the goals, tasks, data, evaluation mechanisms, and metrics, TRECVID, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00953843

P. Over, J. Fiscus, G. Sanders, M. Michel, G. Awad et al., TRECVID 2013: An overview of the goals, tasks, data, evaluation mechanisms and metrics, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953093

M. Palatucci, D. Pomerleau, G. E. Hinton, and T. M. Mitchell, Zero-shot learning with semantic output codes, NIPS, 2009.

G. Paltoglou and M. Thelwall, A study of information retrieval weighting schemes for sentiment analysis, ACL, 2010.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, JMLR, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

X. Peng and C. Schmid, Multi-region two-stream R-CNN for action detection, ECCV, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01349107

X. Peng, C. Zou, Y. Qiao, and Q. Peng, Action recognition with stacked Fisher vectors, ECCV, 2014.

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher kernel for large-scale image classification, ECCV, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00548630

A. Pirkola and E. Leppänen, TREC 2003 genomics track experiments at UTA: Query expansion with predefined high frequency terms, TREC, 2003.

J. Pomikálek and R. Řehůřek, The influence of preprocessing parameters on text categorization, International Journal of Applied Science, Engineering and Technology, 2007.

M. F. Porter, An algorithm for suffix stripping, Program, 1980.

A. Ratnaparkhi, A maximum entropy model for part-of-speech tagging, EMNLP, 1996.

R. Rauschenberger and S. Yantis, Masking unveils pre-amodal completion representation in visual search, Nature, 2001.

J. Redmon and A. Farhadi, YOLO9000: Better, faster, stronger, 2016.

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You only look once: Unified, real-time object detection, CVPR, 2016.

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, NIPS, 2015.

S. Ren, K. He, R. Girshick, X. Zhang, and J. Sun, Object detection networks on convolutional feature maps, 2016.

G. Rogez, P. Weinzaepfel, and C. Schmid, LCR-Net: Localization-Classification-Regression for Human Pose, CVPR, 2017.

S. Saha, G. Singh, M. Sapienza, P. Torr, and F. Cuzzolin, Deep learning for detecting multiple space-time action tubes in videos, BMVC, 2016.

G. Salton, The SMART Retrieval System: Experiments in Automatic Document Processing, 1971.

G. Salton, A. Wong, and C. S. Yang, A vector space model for automatic indexing, Communications of the ACM, 1975.

J. Sánchez, F. Perronnin, T. E. Mensink, and J. Verbeek, Image classification with the Fisher vector: Theory and practice, IJCV, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00779493

P. Sand and S. Teller, Particle video: Long-range motion estimation using point trajectories, IJCV, 2008.

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus et al., OverFeat: Integrated recognition, localization and detection using CNN, ICLR, 2014.

S. Shehata, F. Karray, and M. S. Kamel, An efficient concept-based retrieval model for enhancing text retrieval quality, Knowledge and Information Systems, 2013.

R. Sibson, SLINK: An optimally efficient algorithm for the single-link cluster method, The Computer Journal, 1973.

G. A. Sigurdsson, S. Divvala, A. Farhadi, and A. Gupta, Asynchronous temporal fields for action recognition, CVPR, 2017.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, ICLR, 2014.

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, NIPS, 2014.

B. Singh, X. Han, Z. Wu, V. I. Morariu, and L. S. Davis, Selecting relevant web trained concepts for automated event retrieval, ICCV, 2015.

J. Sivic and A. Zisserman, Video Google: A text retrieval approach to object matching in videos, ICCV, 2003.

M. Smith and T. Kanade, Video skimming and characterization through the combination of image and language understanding techniques, CVPR, 1997.

Y. Song, M. Zhao, J. Yagnik, and X. Wu, Taxonomic classification for web-based videos, CVPR, 2010.

S. Sonnenburg, S. Henschel, C. Widmer, J. Behr, A. Zien et al., The SHOGUN machine learning toolbox, JMLR, 2010.

C. Tan, Y. Wang, and C. Lee, The use of bigrams to enhance text categorization, Inf. Process. Manage., 2002.

E. H. Taralova, F. De la Torre, and M. Hebert, Motion words for videos, ECCV, 2014.

Y. Tian, R. Sukthankar, and M. Shah, Spatiotemporal deformable part models for action detection, CVPR, 2013.

S. Tomlinson, Robust, web and genomic retrieval with Hummingbird SearchServer at TREC, TREC, 2003.

G. Töpper, M. Knuth, and H. Sack, DBpedia ontology enrichment for inconsistency detection, ICSS, 2012.

K. Toutanova and C. Manning, Enriching the knowledge sources used in a maximum entropy part-of-speech tagger, EMNLP, 2000.

K. Toutanova, D. Klein, C. D. Manning, and Y. Singer, Feature-rich part-of-speech tagging with a cyclic dependency network, NAACL, 2003.

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, Learning spatiotemporal features with 3D convolutional networks, ICCV, 2015.

A. K. Uysal and S. Gunal, The impact of preprocessing on text classification, Information Processing and Management, 2014.

J. van Gemert, M. Jain, E. Gati, and C. Snoek, APT: Action localization proposals from dense trajectories, BMVC, 2015.

J. C. van Gemert, C. J. Veenman, A. W. Smeulders, and J. Geusebroek, Visual word ambiguity, PAMI, 2010.

V. N. Vapnik, Statistical learning theory, 1998.

C. Vondrick, H. Pirsiavash, and A. Torralba, Generating videos with scene dynamics, NIPS, 2016.

H. Wang and C. Schmid, Action recognition with improved trajectories, ICCV, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00873267

H. Wang, A. Kläser, C. Schmid, and C. Liu, Action Recognition by Dense Trajectories, CVPR, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00583818

H. Wang, A. Kläser, C. Schmid, and C. Liu, Dense trajectories and motion boundary descriptors for action recognition, IJCV, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00803241

Q. Wang and K. Chen, Zero-shot visual recognition via bidirectional latent embedding, 2016.

Z. Wang, M. Zhao, Y. Song, S. Kumar, and B. Li, YouTubeCat: Learning to categorize wild web videos, CVPR, 2010.

P. Weinzaepfel, Z. Harchaoui, and C. Schmid, Learning to track for spatio-temporal action localization, ICCV, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01159941

P. Weinzaepfel, X. Martin, and C. Schmid, Human action localization with sparse spatial supervision, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01317558

S. Wu, S. Bondugula, F. Luisier, X. Zhuang, and P. Natarajan, Zero-shot event detection using multi-modal fusion of weakly supervised concepts, CVPR, 2014.

L. Xia, C. Chen, and J. Aggarwal, View invariant human action recognition using histograms of 3D joints, CVPR, 2012.

Z. Xu, Y. Yang, and A. G. Hauptmann, A discriminative CNN video representation for event detection, CVPR, 2015.

J. Yamato, J. Ohya, and K. Ishii, Recognizing human action in time-sequential images using hidden Markov model, CVPR, 1992.

Z. Yang, J. Gao, and R. Nevatia, Spatio-temporal action detection with cascade proposal and location anticipation, BMVC, 2017.

G. Ye, Y. Li, H. Xu, D. Liu, and S. Chang, EventNet: A large scale structured concept library for complex event detection in video, ACM MM, 2015.

D. Yoo, S. Park, J. Lee, A. S. Paek, and I. S. Kweon, AttentionNet: Aggregating weak directions for accurate object detection, ICCV, 2015.

G. Yu and J. Yuan, Fast action proposals for human action detection and search, CVPR, 2015.

J. Yuan, Z. Liu, and Y. Wu, Discriminative subvolume search for efficient action detection, CVPR, 2009.

X. Zhou, K. Yu, T. Zhang, and T. S. Huang, Image classification using super-vector coding of local image descriptors, ECCV, 2010.