This seems to be a good prior to build on for estimating 3D object boxes, supporting planes, and occlusion boundaries in realistic images of cluttered indoor scenes, reasoning locally about objects in rooms before making any decision about the global interpretation. This has been pointed out by, 2004. ,
What is an object, CVPR, 2010. ,
Pictorial structures revisited: People detection and articulated pose estimation, CVPR, 2009. ,
DOI : 10.1109/cvprw.2009.5206754
Monocular 3D pose estimation and tracking by detection, CVPR, 2010. ,
DOI : 10.1109/cvpr.2010.5540156
Action comprehension: deriving spatial and functional relations, Journal of Experimental Psychology: Human Perception and Performance, vol.31, p.465, 2005. ,
DOI : 10.1037/0096-1523.31.3.465
URL : https://pearl.plymouth.ac.uk/bitstream/10026.1/1023/2/2005%20Spatial%20and%20Functional.pdf
Toward coherent object detection and scene layout understanding, Image and Vision Computing, vol.29, pp.569-579, 2011. ,
DOI : 10.1016/j.imavis.2011.08.001
Geometric image parsing in man-made environments, ECCV, 2010. ,
The recognition of human movement using temporal templates, 2001. ,
Representing shape with a spatial pyramid kernel, Proc. CIVR, 2007. ,
DOI : 10.1145/1282280.1282340
Detecting people using mutually consistent poselet activations, ECCV, 2010. ,
DOI : 10.1007/978-3-642-15567-3_13
Poselets: Body part detectors trained using 3D human pose annotations, ICCV, 2009. ,
DOI : 10.1109/iccv.2009.5459303
Object segmentation by alignment of poselet activations to image contours, CVPR, 2011. ,
Gestural knowledge evoked by objects as part of conceptual representations, Aphasiology, vol.20, pp.1112-1124, 2006. ,
DOI : 10.1080/02687030600741667
URL : http://web.uvic.ca/psyc/masson/BM06.pdf
Representation of manipulable man-made objects in the dorsal stream, Neuroimage, vol.12, pp.478-484, 2000. ,
Layout estimation of highly cluttered indoor scenes using geometric and semantic cues, ICIAP, 2013. ,
DOI : 10.1007/978-3-642-41184-7_50
URL : http://www.eecs.umich.edu/vision/IndoorHumanActivity/chao_iciap2013.pdf
Understanding indoor scenes using 3d geometric phrases, CVPR, 2013. ,
DOI : 10.1109/cvpr.2013.12
Approximating discrete probability distributions with dependence trees, IEEE Transactions on Information Theory, vol.14, pp.462-467, 1968. ,
Visual categorization with bags of keypoints, 2004. ,
Histograms of oriented gradients for human detection, CVPR, 2005. ,
DOI : 10.1109/cvpr.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512
Human pose estimation using body parts dependent joint regressors, CVPR, 2013. ,
DOI : 10.1109/cvpr.2013.391
URL : http://files.is.tue.mpg.de/jgall/download/jgall_humanpose2d_cvpr13.pdf
Fast, accurate detection of 100,000 object classes on a single machine, CVPR, 2013. ,
DOI : 10.1109/cvpr.2013.237
URL : http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40814.pdf
Bayesian geometric modeling of indoor scenes, CVPR, 2012. ,
Sampling bedrooms, CVPR, 2011. ,
Recognizing human actions in still images: a study of bag-of-features and part-based representations, Proc. BMVC, updated version, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-01060885
Willow actions database, 2010. ,
Learning person-object interactions for action recognition in still images, NIPS, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00648156
Detecting actions, poses, and objects with relational phraselets, ECCV, 2012. ,
DOI : 10.1007/978-3-642-33765-9_12
Discriminative models for static human-object interactions, CVPR, 2010. ,
DOI : 10.1109/cvprw.2010.5543176
Articulated body motion capture by annealed particle filtering, CVPR, 2000. ,
DOI : 10.1109/cvpr.2000.854758
URL : http://cs.gmu.edu/~zduric/it835/Papers/cvpr2000-deutscher.pdf
Tracking through singularities and discontinuities by random sampling, ICCV, 1999. ,
DOI : 10.1109/iccv.1999.790409
Mid-level visual element discovery as discriminative mode seeking, NIPS, 2013. ,
Behavior recognition via sparse spatio-temporal features, ICCV Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005. ,
Inferring 3d body pose from silhouettes using activity manifold learning, CVPR, 2004. ,
DOI : 10.1109/cvpr.2004.1315230
The PASCAL Visual Object Classes Challenge, 2007. ,
DOI : 10.1007/11736790_8
URL : https://hal.archives-ouvertes.fr/inria-00548597
, The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results, 2010.
DOI : 10.1007/s11263-014-0733-5
URL : https://www.pure.ed.ac.uk/ws/files/20017166/ijcv_voc14.pdf
, The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results, 2012.
DOI : 10.1007/s11263-014-0733-5
URL : https://www.pure.ed.ac.uk/ws/files/20017166/ijcv_voc14.pdf
Learning to recognize objects in egocentric activities, CVPR, 2011. ,
DOI : 10.1109/cvpr.2011.5995444
What, where and who? telling the story of an image by activity classification, scene recognition and object categorization, Computer Vision, pp.157-171, 2010. ,
A bayesian hierarchical model for learning natural scene categories, CVPR, 2005. ,
Learning models for object recognition, CVPR, 2001. ,
Object detection with discriminatively trained part based models, 2009. ,
Distance transforms of sampled functions, 2004. ,
Pictorial structures for object recognition, 2005. ,
Efficient matching of pictorial structures, CVPR, 2000. ,
, Discriminatively trained deformable part models, 2008.
A discriminatively trained, multiscale, deformable part model, CVPR, 2008. ,
Progressive search space reduction for human pose estimation, CVPR, 2008. ,
The representation and matching of pictorial structures, IEEE Transactions on Computers, vol.22, pp.67-92, 1973. ,
People watching: Human actions as a cue for single-view geometry, ECCV, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-01066257
Data-driven 3d primitives for single image understanding, ICCV, 2013. ,
Unfolding an indoor origami world, ECCV, 2014. ,
A decision theoretic generalisation of online learning, Journal of Computer and System Sciences, vol.55, pp.119-139, 1997. ,
Functional categorization of objects using real-time markerless motion capture, CVPR, 2011. ,
Action recognition in the premotor cortex, Brain, vol.119, pp.593-609, 1996. ,
Mirror neurons and the simulation theory of mind-reading, Trends in cognitive sciences, vol.2, pp.493-501, 1998. ,
Pedestrian detection from a moving vehicle, ECCV, 2000. ,
Joint 3d estimation of objects and scene layout, NIPS, 2011. ,
The ecological approach to visual perception, 1979. ,
Object detection with grammar models, NIPS, 2011. ,
Articulated pose estimation using discriminative armlet classifiers, CVPR, 2013. ,
Novel approach to nonlinear/non-gaussian bayesian state estimation, IEE Proceedings F (Radar and Signal Processing), 1993. ,
Actions as space-time shapes, 2007. ,
What makes a chair a chair, CVPR, 2011. ,
Style-based inverse kinematics, SIGGRAPH, 2004. ,
Context and observation driven latent variable model for human pose estimation, CVPR, 2008. ,
Blocks world revisited: Image understanding using qualitative geometry and mechanics, ECCV, 2010. ,
Observing human-object interactions: Using spatial and functional compatibility for recognition, 2009. ,
Constraint integration for efficient multiview pose estimation with self-occlusions, vol.30, pp.493-506, 2008. ,
From 3D scene geometry to human workspace, CVPR, 2011. ,
Object recognition using coloured receptive fields, ECCV, 2000. ,
Computationally efficient regression on a dependency graph for human pose estimation, CVPR, 2013. ,
Combining efficient object localization and image classification, ICCV, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00439516
, The Elements of Statistical Learning, 2003.
Recovering the spatial layout of cluttered rooms, ICCV, 2009. ,
Thinking inside the box: Using appearance models and context based on room geometry, ECCV, 2010. ,
Recovering free space of indoor scenes from a single image, CVPR, 2012. ,
The role of action representations in visual object recognition, Experimental Brain Research, vol.174, pp.221-228, 2006. ,
Geometric context from a single image, ICCV, 2005. ,
Real-time body tracking using a gaussian process latent variable model, ICCV, 2007. ,
DOI : 10.1109/iccv.2007.4408946
URL : http://www.cs.man.ac.uk/~agalata/publications/tracking_iccv07.pdf
Recognizing actions from still images, Proc. ICPR, 2008. ,
DOI : 10.1109/icpr.2008.4761663
URL : http://www.cs.bilkent.edu.tr/~duygulu/papers/ICPR2008-ActionStill.pdf
Learning actions from the Web, ICCV, 2009. ,
Probabilistic methods for finding people, 2001. ,
Condensation-conditional density propagation for visual tracking, IJCV, vol.29, pp.5-28, 1998. ,
Detection and location of people in video images using adaptive fusion of color and edge information, Proc. ICPR, 2000. ,
A biologically inspired system for action recognition, ICCV, 2007. ,
DOI : 10.1109/iccv.2007.4408988
URL : http://www.cs.tau.ac.il/~wolf/papers/action151.pdf
Leeds sports pose dataset, 2010. ,
Clustered pose and nonlinear appearance models for human pose estimation, Proc. BMVC, 2010. ,
DOI : 10.5244/c.24.12
Learning effective human pose estimation from inaccurate annotation, CVPR, 2011. ,
Actions or hand-object interactions? human inferior frontal cortex and action observation, Neuron, vol.39, pp.1053-1058, 2003. ,
DOI : 10.1016/s0896-6273(03)00524-5
URL : https://doi.org/10.1016/s0896-6273(03)00524-5
Spectral latent variable models for perceptual inference, ICCV, 2007. ,
DOI : 10.1109/iccv.2007.4408845
Geometry driven semantic labeling of indoor scenes, ECCV, 2014. ,
DOI : 10.1007/978-3-319-10590-1_44
Activity forecasting, ECCV, 2012. ,
Simultaneous visual recognition of manipulation actions and manipulated objects, ECCV, 2008. ,
Robust higher order potentials for enforcing label consistency, IJCV, vol.82, pp.302-324, 2009. ,
DOI : 10.1007/s11263-008-0202-0
URL : https://radar.brookes.ac.uk/radar/file/e70ce935-9726-a58f-650e-e4ffae4709d8/1/kohli2009robust.pdf
But still, it moves, Trends in cognitive sciences, vol.8, pp.47-49, 2004. ,
Activation in human mt/mst by static images with implied motion, Journal of cognitive neuroscience, vol.12, pp.48-55, 2000. ,
DOI : 10.1162/08989290051137594
URL : http://web.mit.edu/bcs/nklab/media/pdfs/KourtziKanwisherJOCN00.pdf
Bayesian autocalibration for surveillance, ICCV, 2005. ,
DOI : 10.1109/iccv.2005.44
HMDB: a large video database for human motion recognition, ICCV, 2011. ,
DOI : 10.1109/iccv.2011.6126543
Modeling and visual recognition of human actions and interactions. Habilitation à diriger des recherches en mathématiques et en informatique, 2013. ,
URL : https://hal.archives-ouvertes.fr/tel-01064540
Space-time interest points, ICCV, 2003. ,
Learning realistic human actions from movies, CVPR, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00548659
Retrieving actions in movies, ICCV, 2007. ,
Gaussian process latent variable models for visualisation of high dimensional data, NIPS, 2003. ,
Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, CVPR, 2006. ,
URL : https://hal.archives-ouvertes.fr/inria-00548585
Geometric reasoning for single image structure recovery, ICCV, 2009. ,
Object bank: A high-level image representation for scene classification and semantic feature sparsification, NIPS, 2010. ,
What, where and who? Classifying events by scene and object recognition, ICCV, 2007. ,
Object recognition from local scale-invariant features, ICCV, 1999. ,
Distinctive image features from scale-invariant keypoints, IJCV, vol.60, pp.91-110, 2004. ,
Action recognition from a distributed representation of pose and appearance, CVPR, 2011. ,
Actions in context, CVPR, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00548645
Human detection based on a probabilistic assembly of robust part detectors, ECCV, 2004. ,
URL : https://hal.archives-ouvertes.fr/inria-00548537
Example-based object detection in images by components, 2001. ,
Exploiting human actions and object context for recognition tasks, ICCV, 1999. ,
Observing others: multiple action representation in the frontal lobe, Science, vol.310, pp.332-336, 2005. ,
Unsupervised learning of human action categories using spatial-temporal words, 2008. ,
An analysis system for scenes containing objects with substructures, Proceedings of the Fourth International Joint Conference on Pattern Recognition, 1978. ,
Modeling the shape of the scene: A holistic representation of the spatial envelope, IJCV, vol.42, pp.145-175, 2001. ,
Learning and transferring mid-level image representations using convolutional neural networks, CVPR, 2014. ,
DOI : 10.1109/cvpr.2014.222
URL : https://hal.archives-ouvertes.fr/hal-00911179
Pedestrian detection using wavelet templates, CVPR, 1997. ,
DOI : 10.1109/cvpr.1997.609319
Vision science: photons to phenomenology, 1999. ,
Scene recognition and weakly supervised object localization with deformable part-based models, ICCV, 2011. ,
DOI : 10.1109/iccv.2011.6126383
URL : http://www.cs.unc.edu/~lazebnik/publications/megha_iccv2011.pdf
A trainable system for object detection, 2000. ,
Scene shape from texture of objects, CVPR, 2011. ,
DOI : 10.1109/cvpr.2011.5995326
URL : http://web.engr.oregonstate.edu/%7Esinisa/research/publications/cvpr11_sceneLayout.pdf
Combining image regions and human activity for indirect object recognition in indoor wide-angle views, ICCV, 2005. ,
DOI : 10.1109/iccv.2005.57
URL : http://dro.deakin.edu.au/eserv/DU:30044614/venkatesh-combiningimage-2005.pdf
Poselet conditioned pictorial structures, CVPR, 2013. ,
DOI : 10.1109/cvpr.2013.82
URL : http://www.cv-foundation.org/openaccess/content_cvpr_2013/papers/Pishchulin_Poselet_Conditioned_Pictorial_2013_CVPR_paper.pdf
Strong appearance and expressive spatial models for human pose estimation, ICCV, 2013. ,
DOI : 10.1109/iccv.2013.433
Weakly supervised learning of interactions between humans and objects, 2011. ,
DOI : 10.1109/tpami.2011.158
URL : https://hal.archives-ouvertes.fr/inria-00516477
Recognizing indoor scenes, CVPR, 2009. ,
DOI : 10.1109/cvprw.2009.5206537
URL : http://people.csail.mit.edu/torralba/publications/indoor.pdf
Learning to parse images of articulated bodies, NIPS, 2006. ,
Strike a pose: Tracking people by finding stylized poses, CVPR, 2005. ,
DOI : 10.1109/cvpr.2005.335
URL : http://nma.berkeley.edu/ark:/28722/bk0005s5c36
Density-aware person detection and tracking in crowds, ICCV, 2011. ,
DOI : 10.1109/iccv.2011.6126526
URL : https://hal.archives-ouvertes.fr/hal-00654266
What can casual walkers tell us about the 3D scene, ICCV, 2007. ,
DOI : 10.1109/iccv.2007.4409082
Nonlinear dimensionality reduction by locally linear embedding, Science, vol.290, pp.2323-2326, 2000. ,
DOI : 10.1126/science.290.5500.2323
Imagenet: Large scale visual recognition challenge, 2014. ,
DOI : 10.1007/s11263-015-0816-y
URL : http://arxiv.org/pdf/1409.0575
MODEC: Multimodal decomposable models for human pose estimation, CVPR, 2013. ,
DOI : 10.1109/cvpr.2013.471
Cascaded models for articulated pose estimation, ECCV, 2010. ,
DOI : 10.1007/978-3-642-15552-9_30
URL : https://repository.upenn.edu/cgi/viewcontent.cgi?article=1577&context=cis_papers
Agent-based moving object correspondence using differential discriminative diagnosis, CVPR, 2000. ,
Data-driven scene understanding from 3D models, Proc. BMVC, 2012. ,
DOI : 10.5244/c.26.128
URL : http://www.bmva.org/bmvc/2012/BMVC/paper128/paper128.pdf
Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, 2002. ,
Recognizing human actions: a local svm approach, Proc. ICPR, 2004. ,
DOI : 10.1109/icpr.2004.1334462
Efficient structured prediction for 3D indoor scene understanding, CVPR, 2012. ,
DOI : 10.1109/cvpr.2012.6248006
Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation, ECCV, 2006. ,
DOI : 10.1007/11744023_1
Stochastic tracking of 3d human figures using 2d image motion, ECCV, 2000. ,
DOI : 10.1007/3-540-45053-x_45
Tracking loose-limbed people, CVPR, 2004. ,
DOI : 10.1109/cvpr.2004.1315063
Instance segmentation of indoor scenes using a coverage loss, ECCV, 2014. ,
DOI : 10.1007/978-3-319-10590-1_40
Two-stream convolutional networks for action recognition in videos, NIPS, 2014. ,
Unsupervised discovery of mid-level discriminative patches, ECCV, 2012. ,
DOI : 10.1007/978-3-642-33709-3_6
URL : http://arxiv.org/pdf/1205.3137
Video Google: A text retrieval approach to object matching in videos, ICCV, 2003. ,
DOI : 10.1109/iccv.2003.1238663
UCF101: A dataset of 101 human actions classes from videos in the wild, CRCV-TR-12-01, 2012. ,
Adaptive background mixture models for real-time tracking, CVPR, 1998. ,
Learning discriminative part detectors for image classification and cosegmentation, ICCV, 2013. ,
DOI : 10.1109/iccv.2013.422
URL : https://hal.archives-ouvertes.fr/hal-00932380
A global geometric framework for nonlinear dimensionality reduction, Science, vol.290, pp.2319-2323, 2000. ,
DOI : 10.1126/science.290.5500.2319
Superparsing: scalable nonparametric image parsing with superpixels, ECCV, 2010. ,
DOI : 10.1007/978-3-642-15555-0_26
URL : http://www.cs.unc.edu/%7Elazebnik/publications/eccv10-jtighe.pdf
Joint training of a convolutional network and a graphical model for human pose estimation, CoRR, 2014. ,
Deeppose: Human pose estimation via deep neural networks, CVPR, 2014. ,
DOI : 10.1109/cvpr.2014.214
URL : http://arxiv.org/pdf/1312.4659
Unsupervised learning of functional categories in video scenes, ECCV, 2010. ,
DOI : 10.1007/978-3-642-15552-9_48
What is the spatial extent of an object, CVPR, 2009. ,
Mapping implied body actions in the human motor system, The Journal of Neuroscience, vol.26, pp.7942-7949, 2006. ,
3d people tracking with gaussian process dynamical models, CVPR, 2006. ,
Priors for people tracking from small training sets, ICCV, 2005. ,
Modeling human locomotion with topologically constrained latent variable models, ICCV Workshop on Human Motion: Understanding, Modeling, Capture and Animation, 2007. ,
Multiple kernels for object detection, ICCV, 2009. ,
Natural scene retrieval based on a semantic modeling step, pp.207-215, 2004. ,
Predicting actions from static scenes, ECCV, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01053935
Patch to the future: Unsupervised visual prediction, CVPR, 2014. ,
Beyond physical connections: Tree models in human pose estimation, CVPR, 2013. ,
Discriminative learning with latent variables for cluttered indoor scene understanding, ECCV, 2010. ,
Action recognition with improved trajectories, ICCV, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00873267
Learning semantic scene models by trajectory analysis, ECCV, 2006. ,
Unsupervised discovery of action classes, CVPR, 2006. ,
Learning motion categories using both semantic and structural information, CVPR, 2007. ,
Recognizing human actions from still images with latent poses, CVPR, 2010. ,
Articulated pose estimation using flexible mixtures of parts, CVPR, 2011. ,
Articulated human detection with flexible mixtures of parts, 2013. ,
Grouplet: A structured image representation for recognizing human and object interactions, CVPR, 2010. ,
Modeling mutual context of object and human pose in human-object interaction activities, CVPR, 2010. ,
Human action recognition by learning bases of action attributes and parts, ICCV, 2011. ,
Combining randomization and discrimination for fine-grained image categorization, CVPR, 2011. ,
Learning structural svms with latent variables, ICML, 2009. ,
DOI : 10.1145/1553374.1553523
URL : http://www.cs.cornell.edu/~cnyu/papers/siso_workshop.pdf
A data-driven approach for event prediction, ECCV, 2010. ,
DOI : 10.1007/978-3-642-15552-9_51
URL : http://people.csail.mit.edu/torralba/publications/evt_pred_eccv2010.pdf
Local features and kernels for classification of texture and object categories: a comprehensive study, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00548574
Image parsing with stochastic scene grammar, NIPS, 2011. ,