Fouhey et al., This seems to be a good prior to build on for estimating 3D object boxes, supporting planes, and occlusion boundaries in realistic images of cluttered indoor scenes, reasoning locally about objects in rooms before making any decision about the global interpretation. This has been pointed out by, 2004.

Alexe et al., What is an object, CVPR, 2010.

Andriluka et al., Pictorial structures revisited: People detection and articulated pose estimation, CVPR, 2009.
DOI : 10.1109/cvprw.2009.5206754

Andriluka et al., Monocular 3D pose estimation and tracking by detection, CVPR, 2010.
DOI : 10.1109/cvpr.2010.5540156

Bach et al., Action comprehension: deriving spatial and functional relations, Journal of Experimental Psychology: Human Perception and Performance, vol.31, p.465, 2005.
DOI : 10.1037/0096-1523.31.3.465

URL : https://pearl.plymouth.ac.uk/bitstream/10026.1/1023/2/2005%20Spatial%20and%20Functional.pdf

Bao et al., Toward coherent object detection and scene layout understanding, Image and Vision Computing, vol.29, pp.569-579, 2011.
DOI : 10.1016/j.imavis.2011.08.001

Barinova et al., Geometric image parsing in man-made environments, ECCV, 2010.

A. Bobick and J. Davis, The recognition of human movement using temporal templates, 2001.

Bosch et al., Representing shape with a spatial pyramid kernel, Proc. CIVR, 2007.
DOI : 10.1145/1282280.1282340

Bourdev et al., Detecting people using mutually consistent poselet activations, ECCV, 2010.
DOI : 10.1007/978-3-642-15567-3_13

L. Bourdev and J. Malik, Poselets: Body part detectors trained using 3D human pose annotations, ICCV, 2009.
DOI : 10.1109/iccv.2009.5459303

Brox et al., Object segmentation by alignment of poselet activations to image contours, CVPR, 2011.

D. Bub and M. Masson, Gestural knowledge evoked by objects as part of conceptual representations, Aphasiology, vol.20, pp.1112-1124, 2006.
DOI : 10.1080/02687030600741667

URL : http://web.uvic.ca/psyc/masson/BM06.pdf

L. L. Chao and A. Martin, Representation of manipulable man-made objects in the dorsal stream, NeuroImage, vol.12, pp.478-484, 2000.

Chao et al., Layout estimation of highly cluttered indoor scenes using geometric and semantic cues, ICIAP, 2013.
DOI : 10.1007/978-3-642-41184-7_50

URL : http://www.eecs.umich.edu/vision/IndoorHumanActivity/chao_iciap2013.pdf

Choi et al., Understanding indoor scenes using 3D geometric phrases, CVPR, 2013.
DOI : 10.1109/cvpr.2013.12

C. Chow and C. Liu, Approximating discrete probability distributions with dependence trees, IEEE Transactions on Information Theory, vol.14, pp.462-467, 1968.

Csurka et al., Visual categorization with bags of keypoints, 2004.

N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, CVPR, 2005.
DOI : 10.1109/cvpr.2005.177

URL : https://hal.archives-ouvertes.fr/inria-00548512

Dantone et al., Human pose estimation using body parts dependent joint regressors, CVPR, 2013.
DOI : 10.1109/cvpr.2013.391

URL : http://files.is.tue.mpg.de/jgall/download/jgall_humanpose2d_cvpr13.pdf

Dean et al., Fast, accurate detection of 100,000 object classes on a single machine, CVPR, 2013.
DOI : 10.1109/cvpr.2013.237

URL : http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40814.pdf

Del Pero et al., Bayesian geometric modeling of indoor scenes, CVPR, 2012.

Del Pero et al., Sampling bedrooms, CVPR, 2011.

Delaitre et al., Recognizing human actions in still images: a study of bag-of-features and part-based representations, Proc. BMVC, updated version, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01060885

Delaitre et al., Willow actions database, 2010.

Delaitre et al., Learning person-object interactions for action recognition in still images, NIPS, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00648156

C. Desai and D. Ramanan, Detecting actions, poses, and objects with relational phraselets, ECCV, 2012.
DOI : 10.1007/978-3-642-33765-9_12

Desai et al., Discriminative models for static human-object interactions, CVPR, 2010.
DOI : 10.1109/cvprw.2010.5543176

Deutscher et al., Articulated body motion capture by annealed particle filtering, CVPR, 2000.
DOI : 10.1109/cvpr.2000.854758

URL : http://cs.gmu.edu/~zduric/it835/Papers/cvpr2000-deutscher.pdf

Deutscher et al., Tracking through singularities and discontinuities by random sampling, ICCV, 1999.
DOI : 10.1109/iccv.1999.790409

Doersch et al., Mid-level visual element discovery as discriminative mode seeking, NIPS, 2013.

Doersch et al., 2012.

Dollár et al., Behavior recognition via sparse spatio-temporal features, ICCV Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005.

A. Elgammal and C. Lee, Inferring 3D body pose from silhouettes using activity manifold learning, CVPR, 2004.
DOI : 10.1109/cvpr.2004.1315230

Everingham et al., The PASCAL Visual Object Classes Challenge, 2007.
DOI : 10.1007/11736790_8

URL : https://hal.archives-ouvertes.fr/inria-00548597

Everingham et al., The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results, 2010.
DOI : 10.1007/s11263-014-0733-5

URL : https://www.pure.ed.ac.uk/ws/files/20017166/ijcv_voc14.pdf

Everingham et al., The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results, 2012.
DOI : 10.1007/s11263-014-0733-5

URL : https://www.pure.ed.ac.uk/ws/files/20017166/ijcv_voc14.pdf

Fathi et al., Learning to recognize objects in egocentric activities, CVPR, 2011.
DOI : 10.1109/cvpr.2011.5995444

L. Fei-Fei and L.-J. Li, What, where and who? Telling the story of an image by activity classification, scene recognition and object categorization, Computer Vision, pp.157-171, 2010.

L. Fei-Fei and P. Perona, A Bayesian hierarchical model for learning natural scene categories, CVPR, 2005.

P. Felzenszwalb, Learning models for object recognition, CVPR, 2001.

Felzenszwalb et al., Object detection with discriminatively trained part based models, 2009.

P. Felzenszwalb and D. Huttenlocher, Distance transforms of sampled functions, Technical Report TR2004-1963, Cornell Computing and Information Science, 2004.

P. Felzenszwalb and D. Huttenlocher, Pictorial structures for object recognition, 2005.

P. Felzenszwalb and D. P. Huttenlocher, Efficient matching of pictorial structures, CVPR, 2000.

Felzenszwalb et al., Discriminatively trained deformable part models, 2008.

Felzenszwalb et al., A discriminatively trained, multiscale, deformable part model, CVPR, 2008.

Ferrari et al., Progressive search space reduction for human pose estimation, CVPR, 2008.

Ferrari et al., Progressive search space reduction for human pose estimation, 2008.

M. A. Fischler and R. A. Elschlager, The representation and matching of pictorial structures, IEEE Transactions on Computers, vol.22, pp.67-92, 1973.

Fouhey et al., People watching: Human actions as a cue for single-view geometry, ECCV, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01066257

Fouhey et al., Data-driven 3D primitives for single image understanding, ICCV, 2013.

Fouhey et al., Unfolding an indoor origami world, ECCV, 2014.

Y. Freund and R. Schapire, A decision theoretic generalisation of online learning, Journal of Computer and System Sciences, vol.55, pp.119-139, 1997.

Gall et al., Functional categorization of objects using real-time markerless motion capture, CVPR, 2011.

Gallese et al., Action recognition in the premotor cortex, Brain, vol.119, pp.593-609, 1996.

V. Gallese and A. Goldman, Mirror neurons and the simulation theory of mind-reading, Trends in Cognitive Sciences, vol.2, pp.493-501, 1998.

D. M. Gavrila, Pedestrian detection from a moving vehicle, ECCV, 2000.

Geiger et al., Joint 3D estimation of objects and scene layout, NIPS, 2011.

J. Gibson, The ecological approach to visual perception, 1979.

Girshick et al., Object detection with grammar models, NIPS, 2011.

Gkioxari et al., Articulated pose estimation using discriminative armlet classifiers, CVPR, 2013.

Gordon et al., Novel approach to nonlinear/non-Gaussian Bayesian state estimation, IEE Proceedings F (Radar and Signal Processing), 1993.

Gorelick et al., Actions as space-time shapes, 2007.

Grabner et al., What makes a chair a chair, CVPR, 2011.

Grochow et al., Style-based inverse kinematics, SIGGRAPH, 2004.

Gupta et al., Context and observation driven latent variable model for human pose estimation, CVPR, 2008.

Gupta et al., Blocks world revisited: Image understanding using qualitative geometry and mechanics, ECCV, 2010.

Gupta et al., Observing human-object interactions: Using spatial and functional compatibility for recognition, 2009.

Gupta et al., Constraint integration for efficient multiview pose estimation with self-occlusions, vol.30, pp.493-506, 2008.

Gupta et al., From 3D scene geometry to human workspace, CVPR, 2011.

Hall et al., Object recognition using coloured receptive fields, ECCV, 2000.

K. Hara and R. Chellappa, Computationally efficient regression on a dependency graph for human pose estimation, CVPR, 2013.

Harzallah et al., Combining efficient object localization and image classification, ICCV, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00439516

Hastie et al., The Elements of Statistical Learning, 2003.

Hedau et al., Recovering the spatial layout of cluttered rooms, ICCV, 2009.

Hedau et al., Thinking inside the box: Using appearance models and context based on room geometry, ECCV, 2010.

Hedau et al., Recovering free space of indoor scenes from a single image, CVPR, 2012.

Helbig et al., The role of action representations in visual object recognition, Experimental Brain Research, vol.174, pp.221-228, 2006.

Hoiem et al., Geometric context from a single image, ICCV, 2005.

Hou et al., Real-time body tracking using a Gaussian process latent variable model, ICCV, 2007.
DOI : 10.1109/iccv.2007.4408946

URL : http://www.cs.man.ac.uk/~agalata/publications/tracking_iccv07.pdf

Ikizler et al., Recognizing actions from still images, Proc. ICPR, 2008.
DOI : 10.1109/icpr.2008.4761663

URL : http://www.cs.bilkent.edu.tr/~duygulu/papers/ICPR2008-ActionStill.pdf

Ikizler et al., Learning actions from the Web, ICCV, 2009.

S. Ioffe and D. A. Forsyth, Probabilistic methods for finding people, 2001.

M. Isard and A. Blake, Condensation-conditional density propagation for visual tracking, IJCV, vol.29, pp.5-28, 1998.

Jabri et al., Detection and location of people in video images using adaptive fusion of color and edge information, Proc. ICPR, 2000.

Jhuang et al., A biologically inspired system for action recognition, ICCV, 2007.
DOI : 10.1109/iccv.2007.4408988

URL : http://www.cs.tau.ac.il/~wolf/papers/action151.pdf

S. Johnson, Leeds sports pose dataset, 2010.

S. Johnson and M. Everingham, Clustered pose and nonlinear appearance models for human pose estimation, Proc. BMVC, 2010.
DOI : 10.5244/c.24.12

S. Johnson and M. Everingham, Learning effective human pose estimation from inaccurate annotation, CVPR, 2011.

Johnson-Frey et al., Actions or hand-object interactions? Human inferior frontal cortex and action observation, Neuron, vol.39, pp.1053-1058, 2003.
DOI : 10.1016/s0896-6273(03)00524-5

URL : https://doi.org/10.1016/s0896-6273(03)00524-5

Kanaujia et al., Spectral latent variable models for perceptual inference, ICCV, 2007.
DOI : 10.1109/iccv.2007.4408845

Khan et al., Geometry driven semantic labeling of indoor scenes, ECCV, 2014.
DOI : 10.1007/978-3-319-10590-1_44

Kitani et al., Activity forecasting, ECCV, 2012.

Kjellstrom et al., Simultaneous visual recognition of manipulation actions and manipulated objects, ECCV, 2008.

P. Kohli and P. Torr, Robust higher order potentials for enforcing label consistency, IJCV, vol.82, pp.302-324, 2009.
DOI : 10.1007/s11263-008-0202-0

URL : https://radar.brookes.ac.uk/radar/file/e70ce935-9726-a58f-650e-e4ffae4709d8/1/kohli2009robust.pdf

Z. Kourtzi, But still, it moves, Trends in Cognitive Sciences, vol.8, pp.47-49, 2004.

Z. Kourtzi and N. Kanwisher, Activation in human MT/MST by static images with implied motion, Journal of Cognitive Neuroscience, vol.12, pp.48-55, 2000.
DOI : 10.1162/08989290051137594

URL : http://web.mit.edu/bcs/nklab/media/pdfs/KourtziKanwisherJOCN00.pdf

N. Krahnstoever and P. R. Mendonca, Bayesian autocalibration for surveillance, ICCV, 2005.
DOI : 10.1109/iccv.2005.44

Kuehne et al., HMDB: a large video database for human motion recognition, ICCV, 2011.
DOI : 10.1109/iccv.2011.6126543

I. Laptev, Modeling and visual recognition of human actions and interactions. Habilitation à diriger des recherches en mathématiques et en informatique, 2013.
URL : https://hal.archives-ouvertes.fr/tel-01064540

I. Laptev and T. Lindeberg, Space-time interest points, ICCV, 2003.

Laptev et al., Learning realistic human actions from movies, CVPR, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00548659

I. Laptev and P. Pérez, Retrieving actions in movies, ICCV, 2007.

N. D. Lawrence, Gaussian process latent variable models for visualisation of high dimensional data, NIPS, 2003.

Lazebnik et al., Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, CVPR, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00548585

Lee et al., Geometric reasoning for single image structure recovery, ICCV, 2009.

Li et al., Object bank: A high-level image representation for scene classification and semantic feature sparsification, NIPS, 2010.

L.-J. Li and L. Fei-Fei, What, where and who? Classifying events by scene and object recognition, ICCV, 2007.

D. Lowe, Object recognition from local scale-invariant features, ICCV, 1999.

D. Lowe, Distinctive image features from scale-invariant keypoints, IJCV, vol.60, pp.91-110, 2004.

Maji et al., Action recognition from a distributed representation of pose and appearance, CVPR, 2011.

Marszalek et al., Actions in context, CVPR, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00548645

Mikolajczyk et al., Human detection based on a probabilistic assembly of robust part detectors, ECCV, 2004.
URL : https://hal.archives-ouvertes.fr/inria-00548537

Mohan et al., Example-based object detection in images by components, 2001.

Moore et al., Exploiting human actions and object context for recognition tasks, ICCV, 1999.

Nelissen et al., Observing others: multiple action representation in the frontal lobe, Science, vol.310, pp.332-336, 2005.

Niebles et al., Unsupervised learning of human action categories using spatial-temporal words, 2008.

Ohta et al., An analysis system for scenes containing objects with substructures, Proceedings of the Fourth International Joint Conference on Pattern Recognition, 1978.

A. Oliva and A. Torralba, Modeling the shape of the scene: A holistic representation of the spatial envelope, IJCV, vol.42, pp.145-175, 2001.

Oquab et al., Learning and transferring mid-level image representations using convolutional neural networks, CVPR, 2014.
DOI : 10.1109/cvpr.2014.222

URL : https://hal.archives-ouvertes.fr/hal-00911179

Oren et al., Pedestrian detection using wavelet templates, CVPR, 1997.
DOI : 10.1109/cvpr.1997.609319

S. E. Palmer, Vision science: photons to phenomenology, 1999.

M. Pandey and S. Lazebnik, Scene recognition and weakly supervised object localization with deformable part-based models, ICCV, 2011.
DOI : 10.1109/iccv.2011.6126383

URL : http://www.cs.unc.edu/~lazebnik/publications/megha_iccv2011.pdf

C. Papageorgiou and T. Poggio, A trainable system for object detection, 2000.

N. Payet and S. Todorovic, Scene shape from texture of objects, CVPR, 2011.
DOI : 10.1109/cvpr.2011.5995326

URL : http://web.engr.oregonstate.edu/%7Esinisa/research/publications/cvpr11_sceneLayout.pdf

Peursum et al., Combining image regions and human activity for indirect object recognition in indoor wide-angle views, ICCV, 2005.
DOI : 10.1109/iccv.2005.57

URL : http://dro.deakin.edu.au/eserv/DU:30044614/venkatesh-combiningimage-2005.pdf

Pishchulin et al., Poselet conditioned pictorial structures, CVPR, 2013.
DOI : 10.1109/cvpr.2013.82

URL : http://www.cv-foundation.org/openaccess/content_cvpr_2013/papers/Pishchulin_Poselet_Conditioned_Pictorial_2013_CVPR_paper.pdf

Pishchulin et al., Strong appearance and expressive spatial models for human pose estimation, ICCV, 2013.
DOI : 10.1109/iccv.2013.433

Prest et al., Weakly supervised learning of interactions between humans and objects, 2011.
DOI : 10.1109/tpami.2011.158

URL : https://hal.archives-ouvertes.fr/inria-00516477

A. Quattoni and A. Torralba, Recognizing indoor scenes, CVPR, 2009.
DOI : 10.1109/cvprw.2009.5206537

URL : http://people.csail.mit.edu/torralba/publications/indoor.pdf

D. Ramanan, Learning to parse images of articulated bodies, NIPS, 2006.

Ramanan et al., Strike a pose: Tracking people by finding stylized poses, CVPR, 2005.
DOI : 10.1109/cvpr.2005.335

URL : http://nma.berkeley.edu/ark:/28722/bk0005s5c36

Rodriguez et al., Density-aware person detection and tracking in crowds, ICCV, 2011.
DOI : 10.1109/iccv.2011.6126526

URL : https://hal.archives-ouvertes.fr/hal-00654266

Rother et al., What can casual walkers tell us about the 3D scene, ICCV, 2007.
DOI : 10.1109/iccv.2007.4409082

S. T. Roweis and L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science, vol.290, pp.2323-2326, 2000.
DOI : 10.1126/science.290.5500.2323

Russakovsky et al., ImageNet: Large scale visual recognition challenge, 2014.
DOI : 10.1007/s11263-015-0816-y

URL : http://arxiv.org/pdf/1409.0575

B. Sapp and B. Taskar, MODEC: Multimodal decomposable models for human pose estimation, CVPR, 2013.
DOI : 10.1109/cvpr.2013.471

Sapp et al., Cascaded models for articulated pose estimation, ECCV, 2010.
DOI : 10.1007/978-3-642-15552-9_30

URL : https://repository.upenn.edu/cgi/viewcontent.cgi?article=1577&context=cis_papers

Saptharishi et al., Agent-based moving object correspondence using differential discriminative diagnosis, CVPR, 2000.

Satkin et al., Data-driven scene understanding from 3D models, Proc. BMVC, 2012.
DOI : 10.5244/c.26.128

URL : http://www.bmva.org/bmvc/2012/BMVC/paper128/paper128.pdf

B. Schölkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, 2002.

Schuldt et al., Recognizing human actions: a local SVM approach, Proc. ICPR, 2004.
DOI : 10.1109/icpr.2004.1334462

Schwing et al., Efficient structured prediction for 3D indoor scene understanding, CVPR, 2012.
DOI : 10.1109/cvpr.2012.6248006

Shotton et al., TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation, ECCV, 2006.
DOI : 10.1007/11744023_1

Sidenbladh et al., Stochastic tracking of 3D human figures using 2D image motion, ECCV, 2000.
DOI : 10.1007/3-540-45053-x_45

Sigal et al., Tracking loose-limbed people, CVPR, 2004.
DOI : 10.1109/cvpr.2004.1315063

Silberman et al., Instance segmentation of indoor scenes using a coverage loss, ECCV, 2014.
DOI : 10.1007/978-3-319-10590-1_40

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, NIPS, 2014.

Singh et al., Unsupervised discovery of mid-level discriminative patches, ECCV, 2012.
DOI : 10.1007/978-3-642-33709-3_6

URL : http://arxiv.org/pdf/1205.3137

J. Sivic and A. Zisserman, Video Google: A text retrieval approach to object matching in videos, ICCV, 2003.
DOI : 10.1109/iccv.2003.1238663

Soomro et al., UCF101: A dataset of 101 human actions classes from videos in the wild, CRCV-TR-12-01, 2012.

C. Stauffer and W. Grimson, Adaptive background mixture models for real-time tracking, CVPR, 1999.

J. Sun and J. Ponce, Learning discriminative part detectors for image classification and cosegmentation, ICCV, 2013.
DOI : 10.1109/iccv.2013.422

URL : https://hal.archives-ouvertes.fr/hal-00932380

Tenenbaum et al., A global geometric framework for nonlinear dimensionality reduction, Science, vol.290, pp.2319-2323, 2000.
DOI : 10.1126/science.290.5500.2319

J. Tighe and S. Lazebnik, Superparsing: scalable nonparametric image parsing with superpixels, ECCV, 2010.
DOI : 10.1007/978-3-642-15555-0_26

URL : http://www.cs.unc.edu/%7Elazebnik/publications/eccv10-jtighe.pdf

Tompson et al., Joint training of a convolutional network and a graphical model for human pose estimation, CoRR, 2014.

A. Toshev and C. Szegedy, DeepPose: Human pose estimation via deep neural networks, CVPR, 2014.
DOI : 10.1109/cvpr.2014.214

URL : http://arxiv.org/pdf/1312.4659

Turek et al., Unsupervised learning of functional categories in video scenes, ECCV, 2010.
DOI : 10.1007/978-3-642-15552-9_48

Uijlings et al., What is the spatial extent of an object, CVPR, 2009.

Urgesi et al., Mapping implied body actions in the human motor system, The Journal of Neuroscience, vol.26, pp.7942-7949, 2006.

Urtasun et al., 3D people tracking with Gaussian process dynamical models, CVPR, 2006.

Urtasun et al., Priors for people tracking from small training sets, ICCV, 2005.

Urtasun et al., Modeling human locomotion with topologically constrained latent variable models, ICCV Workshop on Human Motion: Understanding, Modeling, Capture and Animation, 2007.

Vedaldi et al., Multiple kernels for object detection, ICCV, 2009.

J. Vogel and B. Schiele, Natural scene retrieval based on a semantic modeling step, pp.207-215, 2004.

Vu et al., Predicting actions from static scenes, ECCV, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01053935

Walker et al., Patch to the future: Unsupervised visual prediction, CVPR, 2014.

F. Wang and Y. Li, Beyond physical connections: Tree models in human pose estimation, CVPR, 2013.

Wang et al., Discriminative learning with latent variables for cluttered indoor scene understanding, ECCV, 2010.

H. Wang and C. Schmid, Action recognition with improved trajectories, ICCV, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00873267

Wang et al., Learning semantic scene models by trajectory analysis, ECCV, 2006.

Wang et al., Unsupervised discovery of action classes, CVPR, 2006.

Wong et al., Learning motion categories using both semantic and structural information, CVPR, 2007.

Yang et al., Recognizing human actions from still images with latent poses, CVPR, 2010.

Y. Yang and D. Ramanan, Articulated pose estimation using flexible mixtures of parts, CVPR, 2011.

Y. Yang and D. Ramanan, Articulated human detection with flexible mixtures of parts, 2013.

B. Yao and L. Fei-Fei, Grouplet: A structured image representation for recognizing human and object interactions, CVPR, 2010.

B. Yao and L. Fei-Fei, Modeling mutual context of object and human pose in human-object interaction activities, CVPR, 2010.

Yao et al., Human action recognition by learning bases of action attributes and parts, ICCV, 2011.

Yao et al., Combining randomization and discrimination for fine-grained image categorization, CVPR, 2011.

C. Yu and T. Joachims, Learning structural SVMs with latent variables, ICML, 2009.
DOI : 10.1145/1553374.1553523

URL : http://www.cs.cornell.edu/~cnyu/papers/siso_workshop.pdf

J. Yuen and A. Torralba, A data-driven approach for event prediction, ECCV, 2010.
DOI : 10.1007/978-3-642-15552-9_51

URL : http://people.csail.mit.edu/torralba/publications/evt_pred_eccv2010.pdf

Zhang et al., Local features and kernels for classification of texture and object categories: a comprehensive study, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00548574

Y. Zhao and S. Zhu, Image parsing with stochastic scene grammar, NIPS, 2011.