An overview of text classification ,
we describe our webly-supervised approach for action classification, p.17, 2007. ,
2D human pose estimation: New benchmark and state of the art analysis, CVPR, p.81, 2014. ,
Sequential deep learning for human action recognition, International Workshop on Human Behavior Understanding, p.22, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-01354493
Distributional clustering of words for text classification, SIGIR, p.29, 1998. ,
Surf: Speeded up robust features, ECCV, p.20, 2006. ,
Machine learning for visual concept recognition and ranking for images, Towards the Internet of Services: The THESEUS Research Program, p.42, 2014. ,
An evaluation of retrieval effectiveness for a full-text document-retrieval system, Com. ACM, p.17, 1985. ,
Coupled hidden Markov models for complex action recognition, CVPR, p.101, 1997. ,
Auditory Scene Analysis: The perceptual organization of sound, p.63, 1990. ,
Object segmentation by long term analysis of point trajectories, ECCV, p.20, 2010. ,
High accuracy optical flow estimation based on a theory for warping, ECCV, p.85, 2004. ,
End-to-end, single-stream temporal action detection in untrimmed videos, BMVC, vol.101, p.102, 2017. ,
Cross-dataset action detection, CVPR, vol.8, p.64, 2010. ,
Quo vadis, action recognition? A new model and the kinetics dataset, CVPR, 2017. ,
LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, vol.32, p.48, 2011. ,
Event-driven semantic concept discovery by exploiting weakly tagged internet images, ICMR, vol.55, p.57, 2014. ,
Event recognition in videos by learning from heterogeneous web sources, CVPR, vol.23, p.28, 2013. ,
Detect what you can: Detecting and representing objects using holistic models and body parts, CVPR, p.62, 2014. ,
Learning from web events for event classification. TCSVT, vol.26, p.100, 2017. ,
Detecting parts for action localization, BMVC, p.12, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01573629
Concept extraction and synonymy management for biomedical information retrieval, TREC, p.15, 2004. ,
Visual categorization with bags of keypoints, Workshop on statistical learning in computer vision, ECCV, 2004. ,
Histograms of oriented gradients for human detection, CVPR, p.19, 2005. ,
URL : https://hal.archives-ouvertes.fr/inria-00548512
Human detection using oriented histograms of flow and appearance, ECCV, vol.5, p.20, 2006. ,
URL : https://hal.archives-ouvertes.fr/inria-00548587
Partitioning a graph of sequences, structures and abstracts for information retrieval, TREC, p.15, 2003. ,
ImageNet: A Large-Scale Hierarchical Image Database, CVPR, p.83, 2009. ,
Learning everything about anything: Webly-supervised visual concept learning, CVPR, p.24, 2014. ,
Long-term recurrent convolutional networks for visual recognition and description, CVPR, vol.5, p.22, 2015. ,
Visual event recognition in videos by learning from web data, vol.27, p.28, 2012. ,
Learning collections of part models for object recognition, CVPR, p.62, 2013. ,
Scalable object detection using deep neural networks, CVPR, p.61, 2014. ,
The PASCAL Visual Object Classes (VOC) Challenge, IJCV, p.69, 2010. ,
Convolutional two-stream network fusion for video action recognition, CVPR, 2016. ,
Object detection with discriminatively trained part based models, vol.62, p.65, 2010. ,
Temporal Localization of Actions with Actoms, PAMI, vol.64, p.101, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00687312
Devnet: A deep event network for multimedia event detection and evidence recounting, CVPR, vol.23, p.27, 2015. ,
Webly-supervised video recognition by mutually voting for relevant web images and web video frames, ECCV, vol.55, p.57, 2016. ,
You lead, we exceed: Labor-free video concept learning by jointly exploiting web videos and images, CVPR, vol.55, p.57, 2016. ,
Red: Reinforced encoder-decoder networks for action anticipation, vol.101, p.102, 2017. ,
The effect of amodal completion on visual matching, Acta psychologica, p.63, 1987. ,
Object detection via a multi-region and semantic segmentation-aware CNN model, ICCV, p.61, 2015. ,
, , vol.61, p.68, 2015.
Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR, vol.61, p.79, 2014. ,
Finding action tubes, CVPR, vol.67, p.86, 2015. ,
Generative adversarial nets, NIPS, p.100, 2014. ,
Composite concept discovery for zero-shot video event detection, ICMR, vol.6, p.57, 2014. ,
, , p.17, 1986.
, SIGLEX, vol.16, p.30, 1999.
Struck: Structured output tracking with kernels, ICCV, p.72, 2011. ,
Distributional structure. Word, p.17, 1954. ,
Text, speech and vision for video segmentation: The informedia project, AAAI Fall Symposium, 1995. ,
Spatial pyramid pooling in deep convolutional networks for visual recognition, PAMI, p.61, 2015. ,
Deep residual learning for image recognition, CVPR, p.22, 2016. ,
A probabilistic justification for using tf-idf term weighting in information retrieval, International Journal on Digital Libraries, p.42, 2000. ,
Long short-term memory. Neural Computation, p.22, 1997. ,
Real-time temporal action localization in untrimmed videos by sub-action discovery, BMVC, p.101, 2017. ,
Densebox: Unifying landmark localization with end to end object detection, p.61, 2015. ,
Unified embedding and metric learning for zero-exemplar event detection, vol.25, p.99, 2017. ,
Learning actions from the web, ICCV, vol.27, p.28, 2009. ,
Action localization by tubelets from motion, CVPR, vol.8, p.66, 2014. ,
Aggregating local image descriptors into compact codes, PAMI, 2012. ,
URL : https://hal.archives-ouvertes.fr/inria-00633013
Towards understanding action recognition, ICCV, vol.12, p.80, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00906902
3D convolutional neural networks for human action recognition, PAMI, p.22, 2013. ,
Zero-example event search using multimodal pseudo relevance feedback, ICMR, vol.55, p.57, 2014. ,
Bridging the ultimate semantic gap: A semantic search engine for internet videos, ICMR, vol.55, p.57, 2015. ,
Fast and accurate content-based semantic search in 100M internet videos, ACMM, vol.55, p.57, 2015. ,
Text categorization with support vector machines: Learning with many relevant features, ECML, vol.29, p.35, 1998. ,
Clustered pose and nonlinear appearance models for human pose estimation, BMVC, p.81, 2010. ,
Support vector machines and kernel functions for text processing, Revista de Informática Teórica e Aplicada, vol.33, p.35, 2013. ,
Action Tubelet Detector for Spatio-Temporal Action Localization, ICCV, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01519812
Joint learning of object and action detectors, ICCV, p.103, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01575804
Subjective contours, Scientific American, p.63, 1976. ,
La grammaire du voir: essais sur la perception, Diderot Editeur arts et sciences, p.63, 1997. ,
Amodal completion and size constancy in natural scenes, ICCV, vol.63, p.67, 2009. ,
Large-scale video classification with convolutional neural networks, CVPR, vol.23, p.27, 2014. ,
Efficient visual event detection using volumetric features, ICCV, vol.23, p.27, 2005. ,
A spatio-temporal descriptor based on 3D-gradients, BMVC, vol.5, p.20, 2008. ,
Human Focused Action Localization in Video, International Workshop on Sign, Gesture, and Activity, vol.8, p.65, 2010. ,
ImageNet classification with deep convolutional neural networks, NIPS, vol.29, p.42, 2012. ,
Deepbox: Learning objectness with convolutional networks, ICCV, p.61, 2015. ,
Attribute-based classification for zero-shot visual object categorization, 2014. ,
Discriminative figure-centric models for joint action localization and recognition, ICCV, vol.8, p.65, 2011. ,
On space-time interest points. IJCV, p.19, 2005. ,
Modeling and visual recognition of human actions and interactions. Habilitation à diriger des recherches (HDR), 2013. ,
URL : https://hal.archives-ouvertes.fr/tel-01064540
Retrieving actions in movies, ICCV, vol.27, p.64, 2007. ,
Learning realistic human actions from movies, CVPR, vol.23, p.27, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00548659
Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, CVPR, vol.5, p.21, 2006. ,
URL : https://hal.archives-ouvertes.fr/inria-00548585
Handling label noise in video classification via multiple instance learning, ICCV, vol.6, p.23, 2011. ,
Reuters-21578 text categorization test collection, p.33, 1997. ,
Track to the future: Spatiotemporal video segmentation with long-range motion cues, CVPR, p.20, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00817961
R-fcn: Object detection via region-based fully convolutional networks, NIPS, p.61, 2016. ,
Recognizing realistic actions from videos "in the wild", CVPR, vol.23, p.27, 2009. ,
Video event recognition using concept attributes, WACV, p.24, 2013. ,
SSD: Single shot multibox detector, ECCV, p.61, 2016. ,
Text classification using string kernels, JMLR, vol.19, p.29, 2002. ,
NLTK: The natural language toolkit, ACL Workshop, vol.34, p.46, 2002. ,
A statistical approach to mechanized encoding and searching of literary information, IBM J. Res. Dev, p.17, 1957. ,
Unsupervised tube extraction using transductive learning and dense trajectories, ICCV, vol.9, p.66, 2015. ,
Level lines based disocclusion, ICIP, p.63, 1998. ,
Trajectons: Action recognition through the motion analysis of tracked features, ICCV Workshops, p.20, 2009. ,
A comparison of event models for naive bayes text classification, AAAI, p.29, 1998. ,
Text Information Retrieval Systems, p.17, 1992. ,
Costa: Co-occurrence statistics for zero-shot classification, CVPR, vol.6, p.24, 2014. ,
DOI : 10.1109/cvpr.2014.313
URL : https://pure.uva.nl/ws/files/17209170/MensinkCVPR2014.pdf
Efficient estimation of word representations in vector space, ICLR, p.19, 2013. ,
Distributed representations of words and phrases and their compositionality, NIPS, vol.34, p.39, 2013. ,
Wordnet: A lexical database for English, Com. ACM, vol.6, p.40, 1995. ,
DOI : 10.1145/219717.219748
Learning semantic part-based models from google images, p.62, 2017. ,
DOI : 10.1109/tpami.2017.2724029
URL : http://arxiv.org/pdf/1609.03140
A survey of advances in vision-based human motion capture and analysis, 2006. ,
Setting boundaries: brain dynamics of modal and amodal illusory shape completion in humans, Journal of Neuroscience, p.63, 2004. ,
DOI : 10.1523/jneurosci.1996-04.2004
URL : http://www.jneurosci.org/content/24/31/6898.full.pdf
Stemming and Stopword Removal on Anti-spam Filtering Domain, AEPIA, vol.15, p.30, 2005. ,
The open world of micro-videos, 2016. ,
RGBD-HuDaAct: A color-depth video database for human daily activity recognition, Consumer Depth Cameras for Computer Vision, p.102, 2013. ,
DOI : 10.1109/iccvw.2011.6130379
URL : http://www.ntu.edu.sg/home/wanggang/NiWangMoulin2011.pdf
Unsupervised learning of human action categories using spatial-temporal words. IJCV, vol.23, p.27, 2008. ,
DOI : 10.5244/c.20.127
URL : http://visionlab.ece.uiuc.edu/niebles/vpcvpr06.pdf
Text classification from labeled and unlabeled documents using EM. Machine learning, p.29, 2000. ,
Visual recognition by learning from web data: A weakly supervised domain generalization approach, CVPR, vol.23, p.28, 2015. ,
DOI : 10.1109/cvpr.2015.7298894
Co-occurrence vectors from corpora vs. distance vectors from dictionaries, COLING, p.18, 1994. ,
DOI : 10.3115/991886.991938
URL : http://dl.acm.org/ft_gateway.cfm?id=991938&type=pdf
Action and event recognition with fisher vectors on a compact feature set, ICCV, p.101, 2013. ,
DOI : 10.1109/iccv.2013.228
URL : https://hal.archives-ouvertes.fr/hal-00873662
Spatio-Temporal Object Detection Proposals, ECCV, vol.9, p.66, 2014. ,
DOI : 10.1007/978-3-319-10578-9_48
URL : https://hal.archives-ouvertes.fr/hal-01021902
Deepid-net: Deformable deep convolutional neural networks for object detection, CVPR, p.61, 2015. ,
DOI : 10.1109/cvpr.2015.7298854
URL : http://arxiv.org/pdf/1412.5661
TRECVID 2010-An overview of the goals, tasks, data, evaluation mechanisms, and metrics. TRECVID, vol.2, p.7, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00953843
TRECVID 2013-An overview of the goals, tasks, data, evaluation mechanisms and metrics, p.29, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00953093
Zero-shot learning with semantic output codes, NIPS, 2009. ,
A study of information retrieval weighting schemes for sentiment analysis, ACL, p.31, 2010. ,
Scikit-learn: Machine learning in python, JMLR, p.34, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00650905
Multi-region two-stream R-CNN for action detection, ECCV, vol.67, p.86, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01349107
Action recognition with stacked Fisher vectors, ECCV, 2014. ,
Improving the fisher kernel for large-scale image classification, ECCV, p.21, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00548630
TREC 2003 genomics track experiments at UTA: Query expansion with predefined high frequency terms, TREC, p.15, 2003. ,
The influence of preprocessing parameters on text categorization, International Journal of Applied Science, Engineering and Technology, vol.15, p.30, 2007. ,
An algorithm for suffix stripping. Program, vol.16, p.30, 1980. ,
A maximum entropy model for part-of-speech tagging, EMNLP, p.16, 1996. ,
Masking unveils pre-amodal completion representation in visual search, Nature, p.63, 2001. ,
YOLO9000: Better, faster, stronger, p.61, 2016. ,
You only look once: Unified, real-time object detection, CVPR, p.61, 2016. ,
Faster R-CNN: Towards real-time object detection with region proposal networks, NIPS, vol.61, p.82, 2015. ,
Object detection networks on convolutional feature maps, p.61, 2016. ,
LCR-Net: Localization-Classification-Regression for Human Pose, CVPR, p.81, 2017. ,
Deep learning for detecting multiple space-time action tubes in videos, BMVC, vol.67, p.102, 2016. ,
The SMART Retrieval System-Experiments in Automatic Document Processing, p.17, 1971. ,
A vector space model for automatic indexing, Com. ACM, p.17, 1975. ,
Image classification with the Fisher vector: Theory and practice. IJCV, p.79, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00779493
Particle video: Long-range motion estimation using point trajectories. IJCV, p.20, 2008. ,
Overfeat: Integrated recognition, localization and detection using CNN, ICLR, p.61, 2014. ,
An efficient concept-based retrieval model for enhancing text retrieval quality. Knowledge and Information Systems, 2013. ,
Slink: An optimally efficient algorithm for the single-link cluster method. The computer journal, p.66, 1973. ,
Asynchronous temporal fields for action recognition, CVPR, p.102, 2017. ,
Very deep convolutional networks for large-scale image recognition. ICLR, vol.44, p.82, 2014. ,
Two-stream convolutional networks for action recognition in videos, NIPS, vol.23, p.27, 2014. ,
Selecting relevant web trained concepts for automated event retrieval, ICCV, vol.55, p.57, 2015. ,
Video Google: A text retrieval approach to object matching in videos, ICCV, vol.5, p.20, 2003. ,
Video skimming and characterization through the combination of image and language understanding techniques, CVPR, 1997. ,
Taxonomic classification for web-based videos, CVPR, vol.6, p.23, 2010. ,
The shogun machine learning toolbox, JMLR, p.34, 2010. ,
The use of bigrams to enhance text categorization, Inf. Process. Manage, p.18, 2002. ,
Motion words for videos, ECCV, 2014. ,
Spatiotemporal deformable part models for action detection, CVPR, p.64, 2013. ,
Robust, web and genomic retrieval with hummingbird searchserver at TREC, TREC, p.15, 2003. ,
DBpedia ontology enrichment for inconsistency detection, ICSS, p.47, 2012. ,
Enriching the knowledge sources used in a maximum entropy part-of-speech tagger, EMNLP, vol.30, p.39, 2000. ,
Feature-rich part-of-speech tagging with a cyclic dependency network, NAACL, vol.16, p.30, 2003. ,
Learning spatiotemporal features with 3D convolutional networks, ICCV, vol.5, p.22, 2015. ,
The impact of preprocessing on text classification. Information Processing and Management, vol.15, p.30, 2014. ,
APT: Action localization proposals from dense trajectories, BMVC, vol.9, p.66, 2015. ,
Visual word ambiguity, PAMI, p.21, 2010. ,
Statistical learning theory, vol.5, p.14, 1998. ,
Generating videos with scene dynamics, NIPS, p.100, 2016. ,
Action recognition with improved trajectories, ICCV, p.20, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00873267
Action Recognition by Dense Trajectories, CVPR, vol.38, p.44, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00583818
Dense trajectories and motion boundary descriptors for action recognition. IJCV, vol.19, p.20, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00803241
Zero-shot visual recognition via bidirectional latent embedding, vol.5, p.6, 2016. ,
YouTubeCat: Learning to categorize wild web videos, CVPR, vol.23, p.24, 2010. ,
Learning to track for spatio-temporal action localization, ICCV, vol.67, p.86, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01159941
Human action localization with sparse spatial supervision, vol.84, p.101, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01317558
Zero-shot event detection using multi-modal fusion of weakly supervised concepts, CVPR, vol.24, p.57, 2014. ,
View invariant human action recognition using histograms of 3D joints, CVPR, p.102, 2012. ,
A discriminative CNN video representation for event detection, CVPR, vol.23, p.27, 2015. ,
Recognizing human action in time-sequential images using hidden Markov model, CVPR, p.101, 1992. ,
Spatio-temporal action detection with cascade proposal and location anticipation, BMVC, vol.101, p.102, 2017. ,
Eventnet: A large scale structured concept library for complex event detection in video, ACMM, vol.55, p.57, 2015. ,
Attentionnet: Aggregating weak directions for accurate object detection, ICCV, p.61, 2015. ,
Fast action proposals for human action detection and search, CVPR, p.66, 2015. ,
Discriminative subvolume search for efficient action detection, CVPR, vol.8, p.64, 2009. ,
Image classification using super-vector coding of local image descriptors, ECCV, p.21, 2010. ,