Incorporating segmentation. Another useful cue consists in segmenting the humans. In particular, this would make it possible to focus on features from trajectories that belong to the human, whereas boxes also contain background. Jhuang et al. [2013] have shown that human segmentation helps the action recognition task. Human segmentation can be obtained without additional annotation: recent works [Papandreou et al., 2015; Kolesnikov and Lampert, 2016] show that reasonable segmentation performance can be obtained in a weakly-supervised setting. CNNs are learned using an estimation of the ground-truth segmentation based on the current estimate and priors such as the image or video labels.

P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid, DeepFlow: Large displacement optical flow with DeepMatching, Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013.

J. Revaud, P. Weinzaepfel, Z. Harchaoui, and C. Schmid, EpicFlow: Edge-preserving interpolation of correspondences for optical flow, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI : 10.1109/CVPR.2015.7298720

URL : https://hal.archives-ouvertes.fr/hal-01142656

P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid, Learning to detect Motion Boundaries, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI : 10.1109/CVPR.2015.7298873

URL : https://hal.archives-ouvertes.fr/hal-01142653

P. Weinzaepfel, Z. Harchaoui, and C. Schmid, Learning to Track for Spatio-Temporal Action Localization, 2015 IEEE International Conference on Computer Vision (ICCV)
DOI : 10.1109/ICCV.2015.362

URL : https://hal.archives-ouvertes.fr/hal-01159941

G. Adiv, Determining three-dimensional motion and structure from optical flow generated by several moving objects, IEEE Trans. PAMI, p.25, 1985.
DOI : 10.1109/tpami.1985.4767678

J. Aggarwal and M. Ryoo, Human activity analysis, ACM Computing Surveys, vol.43, issue.3, p.118, 2011.
DOI : 10.1145/1922649.1922653

M. Aghaei, M. Dimiccoli, and P. Radeva, Multi-face tracking by extended bag-of-tracklets in egocentric videos, Computer Vision and Image Understanding, p.167, 2015.

P. Anandan, A computational framework and an algorithm for the measurement of visual motion, International Journal of Computer Vision, vol.27, issue.4, p.25, 1989.
DOI : 10.1007/BF00158167

P. Anandan and R. Weiss, Introducing a smoothness constraint in a matching approach for the computation of displacement fields, Image Understanding Workshop, p.25, 1985.

M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele, 2D Human Pose Estimation: New Benchmark and State of the Art Analysis, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.147-152, 2014.
DOI : 10.1109/CVPR.2014.471

P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, Contour Detection and Hierarchical Image Segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.5, pp.89-99, 2011.
DOI : 10.1109/TPAMI.2010.161

C. Bailer, B. Taetz, and D. Stricker, Flow Fields: Dense Correspondence Fields for Highly Accurate Large Displacement Optical Flow Estimation, 2015 IEEE International Conference on Computer Vision (ICCV)
DOI : 10.1109/ICCV.2015.457

URL : http://arxiv.org/abs/1508.05151

S. Baker and I. Matthews, Lucas-Kanade 20 Years On: A Unifying Framework, International Journal of Computer Vision, vol.56, issue.3, p.27, 2004.
DOI : 10.1023/B:VISI.0000011205.11775.fd

S. Baker, D. Scharstein, J. P. Lewis, S. Roth, M. J. Black et al., A database and evaluation methodology for optical flow, IJCV, vol.32, issue.8, p.77, 2011.

C. Barnes, E. Shechtman, D. B. Goldman, and A. Finkelstein, The Generalized PatchMatch Correspondence Algorithm, ECCV, pp.37-63, 2010.
DOI : 10.1007/978-3-642-15558-1_3

H. Bay, T. Tuytelaars, and L. Van Gool, SURF: Speeded up robust features, ECCV, p.120, 2006.
DOI : 10.1007/11744023_32

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.679.3046

P. R. Beaudet, Rotationally invariant image operators, International Joint Conference on Pattern Recognition, p.119, 1978.

Y. Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning, p.45, 2009.

S. T. Birchfield, Depth and motion discontinuities, p.97, 1999.

M. J. Black, Robust dynamic motion estimation over time, Proceedings. 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p.25, 1991.
DOI : 10.1109/CVPR.1991.139705

M. J. Black and P. Anandan, The robust estimation of multiple motions: parametric and piecewise-smooth flow fields, Computer Vision and Image Understanding, vol.25, p.27, 1996.

M. J. Black and D. J. Fleet, Probabilistic detection and tracking of motion boundaries, IJCV, vol.97, p.98, 2000.

A. F. Bobick and J. W. Davis, The recognition of human movement using temporal templates, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, issue.3, 2001.
DOI : 10.1109/34.910878

P. Bojanowski, R. Lajugie, F. Bach, I. Laptev, J. Ponce et al., Weakly Supervised Action Labeling in Videos under Ordering Constraints, ECCV, p.172, 2014.
DOI : 10.1007/978-3-319-10602-1_41

URL : https://hal.archives-ouvertes.fr/hal-01053967

H. Boyraz, S. Z. Masood, B. Liu, M. Tappen, and H. Foroosh, Action Recognition by Weakly-Supervised Discriminative Region Localization, Proceedings of the British Machine Vision Conference 2014, p.124, 2014.
DOI : 10.5244/C.28.111

M. Brand, Shadow puppetry, Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999.
DOI : 10.1109/ICCV.1999.790422

J. Braux-Zin, R. Dupont, and A. Bartoli, A General Dense Image Matching Framework Combining Direct and Feature-Based Costs, 2013 IEEE International Conference on Computer Vision, pp.74-94
DOI : 10.1109/ICCV.2013.30

T. Brox and J. Malik, Object Segmentation by Long Term Analysis of Point Trajectories, ECCV, p.119, 2010.
DOI : 10.1007/978-3-642-15555-0_21

T. Brox and J. Malik, Large displacement optical flow: descriptor matching in variational motion estimation, IEEE Trans. PAMI, vol.30, issue.110, pp.76-78, 2011.

T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, High accuracy optical flow estimation based on a theory for warping, ECCV, pp.56-85, 2004.

T. Brox, A. Bruhn, and J. Weickert, Variational Motion Segmentation with Level Sets, ECCV, p.99, 2006.
DOI : 10.1007/11744023_37

A. Bruhn, J. Weickert, C. Feddern, T. Kohlberger, and C. Schnörr, Variational optical flow computation in real time, IEEE Trans. on Image Processing, vol.8, p.55, 2005.

A. Bruhn, J. Weickert, and C. Schnörr, Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods, IJCV, p.26, 2005.
DOI : 10.1023/b:visi.0000045324.43199.43

A. Bruhn, J. Weickert, T. Kohlberger, and C. Schnörr, A Multigrid Platform for Real-Time Motion Computation with Discontinuity-Preserving Variational Methods, International Journal of Computer Vision, vol.44, issue.2, p.25, 2006.
DOI : 10.1007/s11263-006-6616-7

P. J. Burt, C. Yen, and X. Xu, Multiresolution flow-through motion analysis, CVPR, p.22, 1983.

D. J. Butler, J. Wulff, G. B. Stanley, and M. J. Black, A naturalistic open source movie for optical flow evaluation, ECCV, 2012. 3, pp.32-77

L. W. Campbell, D. A. Becker, A. Azarbayejani, A. F. Bobick, and A. Pentland, Invariant features for 3-D gesture recognition, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, 1996.
DOI : 10.1109/AFGR.1996.557258

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.9145

J. Canny, A computational approach to edge detection, IEEE Trans. PAMI, vol.89, p.91, 1986.

L. Cao, Z. Liu, and T. S. Huang, Cross-dataset action detection, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p.126, 2010.
DOI : 10.1109/CVPR.2010.5539875

L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Semantic image segmentation with deep convolutional nets and fully connected crfs, ICLR, p.121, 2015.

O. Demetz, M. Stoll, S. Volz, J. Weickert, and A. Bruhn, Learning brightness transfer functions for the joint recovery of illumination changes and optical flow, ECCV, pp.74-94, 2014.

P. Dollár and C. L. Zitnick, Structured Forests for Fast Edge Detection, 2013 IEEE International Conference on Computer Vision, pp.91-98, 2013.
DOI : 10.1109/ICCV.2013.231

P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, Behavior Recognition via Sparse Spatio-Temporal Features, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005.
DOI : 10.1109/VSPETS.2005.1570899

J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-term recurrent convolutional networks for visual recognition and description, CVPR, p.121, 2015.

A. Dosovitskiy, P. Fischer, E. Ilg, P. Hausser, C. Hazirbas et al., FlowNet: Learning optical flow with convolutional networks, ICCV, p.171, 2015.
DOI : 10.1109/iccv.2015.316

B. Drayer and T. Brox, Combinatorial regularization of descriptor matching for optical flow estimation, BMVC, p.171, 2015.

O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce, Automatic annotation of human actions in video, 2009 IEEE 12th International Conference on Computer Vision, p.173, 2009.
DOI : 10.1109/ICCV.2009.5459279

A. Ecker and S. Ullman, A hierarchical non-parametric method for capturing non-rigid deformations, Image and Vision Computing, p.37, 2009.
DOI : 10.1109/crv.2005.6

G. Farnebäck, Two-Frame Motion Estimation Based on Polynomial Expansion, Proceedings of the 13th Scandinavian conference on Image analysis, p.120, 2003.
DOI : 10.1007/3-540-45103-X_50

P. Felzenszwalb, R. Girshick, D. Mcallester, and D. Ramanan, Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, p.122, 2010.
DOI : 10.1109/TPAMI.2009.167

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.153.2745

C. L. Fennema and W. B. Thompson, Velocity determination in scenes containing several moving objects, Computer Graphics and Image Processing, vol.9, issue.4, 1979.
DOI : 10.1016/0146-664X(79)90097-2

B. Fernando, E. Gavves, J. M. Oramas, A. Ghodrati, and T. Tuytelaars, Modeling video evolution for action recognition, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7299176

D. J. Fleet, M. J. Black, Y. Yacoob, and A. D. Jepson, Design and use of linear models for image motion analysis, p.98, 2000.

L. Fletcher, L. Petersson, and A. Zelinsky, Driver assistance systems based on vision in and out of vehicles, IEEE IV2003 Intelligent Vehicles Symposium. Proceedings (Cat. No.03TH8683), 2003.
DOI : 10.1109/IVS.2003.1212930

D. Fortun, P. Bouthemy, and C. Kervrann, Aggregation of local parametric candidates with exemplar-based occlusion handling for optical flow, Computer Vision and Image Understanding, p.170, 2015.

Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski, Towards Internet-scale multi-view stereo, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p.29, 2010.
DOI : 10.1109/CVPR.2010.5539802

A. Gaidon, Z. Harchaoui, and C. Schmid, Temporal Localization of Actions with Actoms, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.11, p.173, 2013.
DOI : 10.1109/TPAMI.2013.65

URL : https://hal.archives-ouvertes.fr/hal-00687312

J. Gall, N. Razavi, and L. Van Gool, On-line adaption of class-specific codebooks for instance tracking, BMVC, p.135, 2010.

A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, Vision meets robotics: The KITTI dataset, The International Journal of Robotics Research, vol.32, issue.11, p.77, 2013.
DOI : 10.1177/0278364913491297

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.650.8155

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, p.132
DOI : 10.1109/CVPR.2014.81

A. Giusti, D. C. Ciresan, J. Masci, L. M. Gambardella, and J. Schmidhuber, Fast image scanning with deep max-pooling convolutional neural networks, 2013 IEEE International Conference on Image Processing, p.134, 2013.
DOI : 10.1109/ICIP.2013.6738831

G. Gkioxari and J. Malik, Finding action tubes, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.147-149, 2015.
DOI : 10.1109/CVPR.2015.7298676

P. Golland and A. M. Bruckstein, Motion from Color, Computer Vision and Image Understanding, vol.68, issue.3, p.22, 1997.
DOI : 10.1006/cviu.1997.0553

L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, Actions as Space-Time Shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.12, 2007.
DOI : 10.1109/TPAMI.2007.70711

Y. Hacohen, E. Shechtman, D. B. Goldman, and D. Lischinski, Non-rigid dense correspondence with applications for image enhancement, pp.30-53, 2011.

D. Hafner, O. Demetz, and J. Weickert, Why Is the Census Transform Good for Robust Optic Flow Computation?, Scale Space and Variational Methods in Computer Vision, p.22, 2013.

S. Hare, A. Saffari, and P. Torr, Struck: Structured output tracking with kernels, ICCV, pp.129-149, 2011.
DOI : 10.1109/iccv.2011.6126251

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.294.5858

C. Harris and M. Stephens, A Combined Corner and Edge Detector, Proceedings of the Alvey Vision Conference 1988, p.119, 1988.
DOI : 10.5244/C.2.23

R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, ISBN 0521540518, p.81, 2003.
DOI : 10.1017/CBO9780511811685

T. Hassner, V. Mayzels, and L. Zelnik-Manor, On SIFTs and their scales, CVPR, p.65, 2012.

K. He and J. Sun, Computing nearest-neighbor fields via propagation-assisted kd-trees, CVPR, 2012. 79, p.86

K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p.121, 2016.
DOI : 10.1109/CVPR.2016.90

URL : http://arxiv.org/abs/1512.03385

S. Herath, M. Harandi, and F. Porikli, Going deeper into action recognition: A survey, Image and Vision Computing, vol.60, p.118, 2016.
DOI : 10.1016/j.imavis.2017.01.010

M. Hoai, L. Torresani, F. De la Torre, and C. Rother, Learning discriminative localization from weakly labeled data, Pattern Recognition, vol.47, issue.3, p.122, 2014.
DOI : 10.1016/j.patcog.2013.09.028

D. Hoiem, A. Efros, and M. Hebert, Recovering Occlusion Boundaries from an Image, International Journal of Computer Vision, vol.14, issue.2, p.99, 2011.
DOI : 10.1007/s11263-010-0400-4

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.186.668

J. Hosang, R. Benenson, P. Dollár, and B. Schiele, What makes for effective detection proposals?, IEEE Trans. PAMI, p.130, 2015.

Y. Hua, K. Alahari, and C. Schmid, Occlusion and Motion Reasoning for Long-Term Tracking, ECCV, p.134, 2014.
DOI : 10.1007/978-3-319-10599-4_12

URL : https://hal.archives-ouvertes.fr/hal-01020149

Y. Hua, K. Alahari, and C. Schmid, Online Object Tracking with Proposal Selection, 2015 IEEE International Conference on Computer Vision (ICCV), p.168, 2015.
DOI : 10.1109/ICCV.2015.354

URL : https://hal.archives-ouvertes.fr/hal-01207196

A. Humayun, O. Mac Aodha, and G. J. Brostow, Learning to find occlusion regions, CVPR 2011, p.99, 2011.
DOI : 10.1109/CVPR.2011.5995517

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.473.7843

N. Ikizler-Cinbis, R. G. Cinbis, and S. Sclaroff, Learning actions from the Web, 2009 IEEE 12th International Conference on Computer Vision, p.119, 2009.
DOI : 10.1109/ICCV.2009.5459368

A. Jain, J. Tompson, Y. Lecun, and C. Bregler, MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation, ACCV, p.167, 2014.
DOI : 10.1007/978-3-319-16808-1_21

M. Jain, J. van Gemert, H. Jégou, P. Bouthemy, and C. Snoek, Action Localization with Tubelets from Motion, 2014 IEEE Conference on Computer Vision and Pattern Recognition, p.149
DOI : 10.1109/CVPR.2014.100

URL : https://hal.archives-ouvertes.fr/hal-00996844

R. Jain and H. Nagel, On the analysis of accumulative difference pictures from image sequences of real world scenes, IEEE Trans. PAMI, issue.7, 1979.

R. Jain, D. Militzer, and H. Nagel, Separating non-stationary from stationary scene components in a sequence of real world TV-images, 1977.

H. Jégou, F. Perronnin, M. Douze, J. Sanchez, P. Perez et al., Aggregating Local Image Descriptors into Compact Codes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.9, p.120, 2012.
DOI : 10.1109/TPAMI.2011.235

H. Jhuang, J. Gall, S. Zuffi, C. Schmid, and M. J. Black, Towards Understanding Action Recognition, 2013 IEEE International Conference on Computer Vision, pp.125-174
DOI : 10.1109/ICCV.2013.396

URL : https://hal.archives-ouvertes.fr/hal-00906902

S. Ji, W. Xu, M. Yang, and K. Yu, 3D Convolutional Neural Networks for Human Action Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.1, p.121, 2013.
DOI : 10.1109/TPAMI.2012.59

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.169.4046

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long et al., Caffe: Convolutional architecture for fast feature embedding, arXiv preprint arXiv:1408.5093, 2014.

Z. Kalal, K. Mikolajczyk, and J. Matas, Face-TLD: Tracking-Learning-Detection applied to faces, 2010 IEEE International Conference on Image Processing, p.135, 2010.
DOI : 10.1109/ICIP.2010.5653525

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.231.4326

Z. Kalal, K. Mikolajczyk, and J. Matas, Tracking-Learning-Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.7, pp.129-149, 2012.
DOI : 10.1109/TPAMI.2011.239

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, Large-scale video classification with convolutional neural networks, CVPR, 2014. 105, p.121

R. Kennedy and C. J. Taylor, Optical flow with geometric occlusion estimation and fusion of multiple frames, EMMCVPR, 2015. 74, pp.75-93

D. Keysers, T. Deselaers, C. Gollan, and H. Ney, Deformation Models for Image Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.8, p.37, 2007.
DOI : 10.1109/TPAMI.2007.1153

A. Khoreva, R. Benenson, F. Galasso, M. Hein, and B. Schiele, Improved image boundaries for better video segmentation, p.168, 2016.

J. Kim, C. Liu, F. Sha, and K. Grauman, Deformable Spatial Pyramid Matching for Fast Dense Correspondences, 2013 IEEE Conference on Computer Vision and Pattern Recognition, p.65, 2013.
DOI : 10.1109/CVPR.2013.299

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.362.8285

A. Kläser, M. Marszałek, and C. Schmid, A Spatio-Temporal Descriptor Based on 3D-Gradients, Proceedings of the British Machine Vision Conference 2008, pp.123-148, 2008.
DOI : 10.5244/C.22.99

A. Kläser, M. Marszałek, C. Schmid, and A. Zisserman, Human Focused Action Localization in Video, International Workshop on Sign, Gesture, and Activity (SGA), pp.123-129, 2010.
DOI : 10.1007/978-3-642-35749-7_17

A. Kolesnikov and C. H. Lampert, Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation, ECCV, p.174, 2016.
DOI : 10.1007/978-3-319-46493-0_42

S. Korman and S. Avidan, Coherency sensitive hashing, ICCV, p.37, 2011.
DOI : 10.1109/iccv.2011.6126421

P. Krähenbühl and V. Koltun, Geodesic object proposals, ECCV, p.82, 2014.

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25, p.121, 2012.

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, HMDB: A large video database for human motion recognition, 2011 International Conference on Computer Vision, p.126, 2011.
DOI : 10.1109/ICCV.2011.6126543

C. H. Lampert, M. B. Blaschko, and T. Hofmann, Efficient subwindow search: A branch and bound framework for object localization, IEEE Trans. PAMI, p.122, 2009.

T. Lan, Y. Wang, and G. Mori, Discriminative figure-centric models for joint action localization and recognition, ICCV, p.125, 2011.

T. Lan, Y. Zhu, A. Zamir, and S. Savarese, Action Recognition by Hierarchical Mid-Level Action Elements, 2015 IEEE International Conference on Computer Vision (ICCV), p.124, 2015.
DOI : 10.1109/ICCV.2015.517

I. Laptev, On space-time interest points, IJCV, p.149, 2005.
DOI : 10.1007/s11263-005-1838-7

I. Laptev and P. Pérez, Retrieving actions in movies, 2007 IEEE 11th International Conference on Computer Vision, p.147, 2007.
DOI : 10.1109/ICCV.2007.4409105

I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, p.121, 2008.
DOI : 10.1109/CVPR.2008.4587756

URL : https://hal.archives-ouvertes.fr/inria-00548659

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), p.121, 2006.
DOI : 10.1109/CVPR.2006.68

URL : https://hal.archives-ouvertes.fr/inria-00548585

K. Lebeda, S. Hadfield, and R. Bowden, Dense Rigid Reconstruction from Unstructured Discontinuous Video, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), p.167, 2015.
DOI : 10.1109/ICCVW.2015.110

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, pp.37-51, 1998.
DOI : 10.1109/5.726791

Y. LeCun, L. Bottou, G. Orr, and K. Muller, Efficient backprop, Neural Networks: Tricks of the trade, p.45, 1998.

V. S. Lempitsky, S. Roth, and C. Rother, FusionFlow: Discrete-continuous optimization for optical flow estimation, CVPR, p.30, 2008.

M. Leordeanu, A. Zanfir, and C. Sminchisescu, Locally affine sparse-to-dense matching for motion and occlusion estimation, ICCV, 2013. 72, pp.79-89
DOI : 10.1109/iccv.2013.216

J. Lezama, K. Alahari, J. Sivic, and I. Laptev, Track to the future: Spatio-temporal video segmentation with long-range motion cues, CVPR 2011, p.119, 2011.
DOI : 10.1109/CVPR.2011.6044588

URL : https://hal.archives-ouvertes.fr/hal-00817961

Y. Li, M. Paluri, J. M. Rehg, and P. Dollár, Unsupervised Learning of Edges, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p.172, 2016.
DOI : 10.1109/CVPR.2016.179

C. Liu, W. T. Freeman, and E. H. Adelson, Analysis of contour motions, Advances in Neural Information Processing Systems, p.99, 2006.

C. Liu, J. Yuen, and A. Torralba, SIFT flow: Dense correspondence across scenes and its applications, IEEE Trans. PAMI, p.65, 2011.

J. Liu, J. Luo, and M. Shah, Recognizing realistic actions from videos in the wild, CVPR, p.119, 2009.

D. G. Lowe, Distinctive image features from scale-invariant keypoints, IJCV, pp.41-62, 2004.
DOI : 10.1023/b:visi.0000029664.99615.94

J. Lu, H. Yang, D. Min, and M. Do, Patch match filter: Efficient edge-aware filtering meets randomized search for fast correspondence field estimation, CVPR, p.30, 2013.
DOI : 10.1109/cvpr.2013.242

B. D. Lucas and T. Kanade, An iterative image registration technique with an application to stereo vision, IJCAI, p.120, 1981.

S. Ma, J. Zhang, N. Ikizler-Cinbis, and S. Sclaroff, Action Recognition and Localization by Hierarchical Space-Time Segments, 2013 IEEE International Conference on Computer Vision, p.162, 2013.
DOI : 10.1109/ICCV.2013.341

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.663.1492

S. Ma, S. A. Bargal, J. Zhang, L. Sigal, and S. Sclaroff, Do less and achieve more: Training CNNs for action recognition utilizing action images from the Web, Pattern Recognition, p.172, 2015.
DOI : 10.1016/j.patcog.2017.01.027

Y. Mae, Y. Shirai, J. Miura, and Y. Kuno, Object tracking in cluttered background based on optical flow and edges, ICVPR, 1996.

J. Malik and P. Perona, Preattentive texture discrimination with early vision mechanisms, Journal of the Optical Society of America A, vol.7, issue.5, p.45, 1990.
DOI : 10.1364/JOSAA.7.000923

M. M. Puscas, E. Sangineto, D. Culibrk, and N. Sebe, Unsupervised Tube Extraction Using Transductive Learning and Dense Trajectories, 2015 IEEE International Conference on Computer Vision (ICCV), pp.123-149
DOI : 10.1109/ICCV.2015.193

V. Markandey and B. E. Flinchbaugh, Multispectral constraints for optical flow computation, ICCV, p.22, 1990.
DOI : 10.1109/iccv.1990.139488

M. Marszalek, I. Laptev, and C. Schmid, Actions in context, 2009 IEEE Conference on Computer Vision and Pattern Recognition, p.119, 2009.
DOI : 10.1109/CVPR.2009.5206557

URL : https://hal.archives-ouvertes.fr/inria-00548645

D. Martin, C. Fowlkes, D. Tal, and J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, p.106, 2001.
DOI : 10.1109/ICCV.2001.937655

W. N. Martin and J. Aggarwal, Dynamic scene analysis, Computer Graphics and Image Processing, vol.7, issue.3, 1977.
DOI : 10.1016/S0146-664X(78)80003-3

P. Matikainen, M. Hebert, and R. Sukthankar, Trajectons: Action recognition through the motion analysis of tracked features, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, p.120, 2009.
DOI : 10.1109/ICCVW.2009.5457659

M. Menze, C. Heipke, and A. Geiger, Discrete Optimization for Optical Flow, GCPR, 2015. 28, pp.75-93
DOI : 10.1007/978-3-319-24947-6_2

P. Mettes, J. C. van Gemert, and C. G. Snoek, Spot On: Action Localization from Pointly-Supervised Proposals, ECCV, p.149, 2016.
DOI : 10.1007/978-3-319-46454-1_27

M. Middendorf and H. Nagel, Estimation and interpretation of discontinuities in optical flow fields, ICCV, p.97, 2001.

K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas et al., A comparison of affine region detectors, IJCV, vol.29, issue.53, p.57, 2005.

T. B. Moeslund, A. Hilton, and V. Krüger, A survey of advances in vision-based human motion capture and analysis, Computer Vision and Image Understanding, issue.10, 2006.

E. A. Mosabbeb, R. Cabral, F. De la Torre, and M. Fathy, Multi-label Discriminative Weakly-Supervised Human Activity Recognition and Localization, ACCV, p.162, 2014.
DOI : 10.1007/978-3-319-16814-2_16

M. Muja and D. G. Lowe, Fast approximate nearest neighbors with automatic algorithm configuration, International Conference on Computer Vision Theory and Application VISAPP'09, p.62, 2009.

T. Müller, C. Rabe, J. Rannacher, U. Franke, and R. Mester, Illumination-robust dense optical flow using census signatures, Pattern Recognition, p.22, 2011.

H. H. Nagel and W. Enkelmann, An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences.

K. Nakayama and J. Loomis, Optical Velocity Patterns, Velocity-Sensitive Neurons, and Space Perception: A Hypothesis, Perception, vol.225, issue.1, 1974.
DOI : 10.1068/p030063

S. Negahdaripour, Revised definition of optical flow: Integration of radiometric and geometric cues for dynamic scene analysis, IEEE Trans. PAMI, p.23, 1998.

P. X. Nguyen, G. Rogez, C. Fowlkes, and D. Ramanan, The open world of micro-videos, p.172, 2016.

J. C. Niebles, C. Chen, and L. Fei-Fei, Modeling temporal structure of decomposable motion segments for activity classification, ECCV, p.173, 2010.

T. Nir, A. M. Bruckstein, and R. Kimmel, Over-parameterized variational optical flow, IJCV, vol.25, p.26, 2008.
DOI : 10.1007/s11263-007-0051-2

D. Oneata, J. Revaud, J. Verbeek, and C. Schmid, Spatio-temporal Object Detection Proposals, ECCV, 2014a. 10, p.149
DOI : 10.1007/978-3-319-10578-9_48

URL : https://hal.archives-ouvertes.fr/hal-01021902

D. Oneata, J. Verbeek, and C. Schmid, The LEAR submission at Thumos 2014, 2014b. URL https

D. Oneata, J. Verbeek, and C. Schmid, Efficient Action Localization with Approximately Normalized Fisher Vectors, CVPR, p.122, 2014.
DOI : 10.1109/cvpr.2014.326

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.634.9146

G. Papandreou, L. Chen, K. Murphy, and A. L. Yuille, Weakly- and semi-supervised learning of a DCNN for semantic image segmentation, ICCV, p.174, 2015.

A. Papazoglou and V. Ferrari, Fast Object Segmentation in Unconstrained Video, 2013 IEEE International Conference on Computer Vision, pp.98-168
DOI : 10.1109/ICCV.2013.223

A. Patron-Perez, M. Marszałek, A. Zisserman, and I. Reid, High five: Recognising human interactions in TV shows, BMVC, p.119, 2010.

X. Peng, C. Zou, Y. Qiao, and Q. Peng, Action recognition with stacked Fisher vectors, ECCV, p.121, 2014.
DOI : 10.1007/978-3-319-10602-1_38

T. Pfister, J. Charles, and A. Zisserman, Flowing convnets for human pose estimation in videos, ICCV, p.167, 2015.

J. Philbin, M. Isard, J. Sivic, and A. Zisserman, Descriptor learning for efficient retrieval, ECCV, p.29, 2010.

R. Poppe, A survey on vision-based human action recognition, Image and Vision Computing, vol.28, issue.6, p.118, 2010.
DOI : 10.1016/j.imavis.2009.11.014

A. Prest, V. Ferrari, and C. Schmid, Explicit Modeling of Human-Object Interactions in Realistic Videos, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.4, p.173, 2012.
DOI : 10.1109/TPAMI.2012.175

URL : https://hal.archives-ouvertes.fr/hal-00720847

A. Prest, C. Leistner, J. Civera, C. Schmid, and V. Ferrari, Learning object class detectors from weakly annotated video, 2012 IEEE Conference on Computer Vision and Pattern Recognition, p.106
DOI : 10.1109/CVPR.2012.6248065

URL : https://hal.archives-ouvertes.fr/hal-00695940

R. Ranftl, K. Bredies, and T. Pock, Non-local total generalized variation for optical flow estimation, ECCV, pp.75-93, 2014.
DOI : 10.1007/978-3-319-10590-1_29

D. Reddy, P. Singhal, V. Chari, and K. M. Krishna, Dynamic body VSLAM with semantic constraints, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), p.167, 2015.
DOI : 10.1109/IROS.2015.7353626

URL : http://arxiv.org/abs/1504.07269

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS, p.147, 2015.
DOI : 10.1109/TPAMI.2016.2577031

URL : http://arxiv.org/abs/1506.01497

X. Ren, Local grouping for optical flow, CVPR, p.79, 2008.

J. Revaud, P. Weinzaepfel, Z. Harchaoui, and C. Schmid, EpicFlow: Edge-preserving interpolation of correspondences for optical flow, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.72-73, 2015.
DOI : 10.1109/CVPR.2015.7298720

URL : https://hal.archives-ouvertes.fr/hal-01142656

J. Revaud, P. Weinzaepfel, Z. Harchaoui, and C. Schmid, DeepMatching: Hierarchical Deformable Dense Matching, IJCV, pp.76-77, 2016.
DOI : 10.1007/s11263-016-0908-3

URL : https://hal.archives-ouvertes.fr/hal-01148432

M. D. Rodriguez, J. Ahmed, and M. Shah, Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition, 2008 IEEE Conference on Computer Vision and Pattern Recognition, p.125, 2008.
DOI : 10.1109/CVPR.2008.4587727

K. Rohr, Towards model-based recognition of human movements in image sequences, CVGIP: Image Understanding, 1994.

S. Roth and M. J. Black, On the spatial statistics of optical flow, IJCV, vol.24, p.28, 2007.

M. Ruder, A. Dosovitskiy, and T. Brox, Artistic style transfer for videos, arXiv preprint, p.167, 2016.
DOI : 10.1007/978-3-319-45886-1_3

A. Salgado and J. Sánchez, Temporal constraints in large optical flow estimation, Computer Aided Systems Theory (EUROCAST), pp.709716-709741, 2007.

J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, Image classification with the Fisher vector: Theory and practice, IJCV, pp.121-158, 2013.

P. Sand and S. Teller, Particle video: Long-range motion estimation using point trajectories, IJCV, p.119, 2008.
DOI : 10.1007/s11263-008-0136-6

S. Satkin and M. Hebert, Modeling the Temporal Extent of Actions, ECCV, p.122, 2010.
DOI : 10.1007/978-3-642-15549-9_39

C. Schüldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., p.126, 2004.
DOI : 10.1109/ICPR.2004.1334462

S. M. Seitz and S. Baker, Filter flow, ICCV, p.24, 2009.

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, OverFeat: Integrated recognition, localization and detection using CNN, ICLR, p.134, 2014.

L. Sevilla-Lara, D. Sun, V. Jampani, and M. J. Black, Optical flow with semantic segmentation and localized layers, CVPR, p.171, 2016.
DOI : 10.1109/cvpr.2016.422

URL : http://arxiv.org/abs/1603.03911

N. Shapovalova, A. Vahdat, K. Cannons, T. Lan, and G. Mori, Similarity Constrained Latent Support Vector Machine: An Application to Weakly Supervised Action Classification, ECCV, p.124, 2012.
DOI : 10.1007/978-3-642-33786-4_5

J. Shin, S. Kim, S. Kang, S. Lee, J. Paik et al., Optical flow-based real-time object tracking using non-prior training active feature model, Real-Time Imaging, 2005.
DOI : 10.1016/j.rti.2005.03.006

Z. Shou, D. Wang, and S. Chang, Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p.173, 2016.
DOI : 10.1109/CVPR.2016.119

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, NIPS, p.132, 2014.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, ICLR, p.152, 2015.

P. Siva and T. Xiang, Weakly Supervised Action Detection, Proceedings of the British Machine Vision Conference 2011, p.149, 2011.
DOI : 10.5244/C.25.65

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, p.120, 2003.
DOI : 10.1109/ICCV.2003.1238663

K. Soomro, A. R. Zamir, and M. Shah, UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild, CRCV-TR-12-01, p.133, 2012.

A. Spoerri, The Early Detection of Motion Boundaries, p.98, 1991.

A. Stein and M. Hebert, Occlusion Boundaries from Motion: Low-Level Detection and Mid-Level Reasoning, International Journal of Computer Vision, vol.14, issue.7, p.99, 2009.
DOI : 10.1007/s11263-008-0203-z

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.193.4976

F. Stein, Efficient computation of optical flow using the census transform, Pattern recognition, p.170, 2004.

F. Stein, Efficient Computation of Optical Flow Using the Census Transform, Proceedings of the 26th DAGM Symposium, p.22, 2004.

M. Stoll, S. Volz, and A. Bruhn, Adaptive integration of feature matches into variational optical flow methods, ACCV, pp.54, 172, 2012.

D. Sun, S. Roth, J. P. Lewis, and M. J. Black, Learning optical flow, ECCV, p.28, 2008.

D. Sun, E. B. Sudderth, and M. J. Black, Layered image motion with explicit occlusions, temporal consistency, and depth ordering, NIPS, p.31, 2010.

D. Sun, J. Wulff, E. Sudderth, H. Pfister, and M. Black, A fully-connected layered model of foreground and background flow, CVPR, p.99, 2013.

D. Sun, C. Liu, and H. Pfister, Local Layering for Joint Motion Estimation and Occlusion Detection, 2014 IEEE Conference on Computer Vision and Pattern Recognition, p.74, 2014.
DOI : 10.1109/CVPR.2014.144

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.665.5541

D. Sun, S. Roth, and M. Black, A quantitative analysis of current practices in optical flow estimation and the principles behind them, IJCV, pp.93-110, 2014.

J. Sun, Computing nearest-neighbor fields via propagation-assisted kd-trees, CVPR, pp.29, 63, 2012.

Z. Sun, G. Bebis, and R. Miller, On-road vehicle detection using optical sensors: a review, Proceedings. The 7th International IEEE Conference on Intelligent Transportation Systems (IEEE Cat. No.04TH8749), 2004.
DOI : 10.1109/ITSC.2004.1398966

N. Sundaram, T. Brox, and K. Keutzer, Dense point trajectories by GPU-accelerated large displacement optical flow, ECCV, p.203, 2010.
DOI : 10.1007/978-3-642-15549-9_32

P. Sundberg, T. Brox, M. Maire, P. Arbelaez, and J. Malik, Occlusion boundary detection and figure/ground assignment from optical flow, CVPR, p.99, 2011.
DOI : 10.1109/cvpr.2011.5995364

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.221.202

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p.121, 2015.
DOI : 10.1109/CVPR.2015.7298594

URL : http://arxiv.org/abs/1409.4842

R. Szeliski, Computer Vision: Algorithms and Applications, p.53, 2010.
DOI : 10.1007/978-1-84882-935-0

E. H. Taralova, F. De la Torre, and M. Hebert, Motion Words for Videos, ECCV, p.121, 2014.
DOI : 10.1007/978-3-319-10590-1_47

D. Teney and M. Hebert, Learning to extract motion from videos in convolutional neural networks. arXiv preprint, p.171, 2016.

Y. Tian, R. Sukthankar, and M. Shah, Spatiotemporal Deformable Part Models for Action Detection, 2013 IEEE Conference on Computer Vision and Pattern Recognition, p.123, 2013.
DOI : 10.1109/CVPR.2013.341

R. Timofte and L. Van Gool, Sparse flow: Sparse matching for small to large displacement optical flow, Applications of Computer Vision (WACV), pp.72-167, 2015.
DOI : 10.1109/wacv.2015.151

E. Tola, V. Lepetit, and P. Fua, A fast local descriptor for dense matching, 2008 IEEE Conference on Computer Vision and Pattern Recognition, p.29, 2008.
DOI : 10.1109/CVPR.2008.4587673

E. Tola, V. Lepetit, and P. Fua, DAISY: An Efficient Dense Descriptor Applied to Wide-Baseline Stereo, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.5, p.65, 2010.
DOI : 10.1109/TPAMI.2009.77

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, Learning Spatiotemporal Features with 3D Convolutional Networks, 2015 IEEE International Conference on Computer Vision (ICCV), p.121, 2015.
DOI : 10.1109/ICCV.2015.510

URL : http://arxiv.org/abs/1412.0767

W. Trobin, T. Pock, D. Cremers, and H. Bischof, An Unbiased Second-Order Prior for High-Accuracy Motion Estimation, Pattern Recognition, vol.25, p.27, 2008.
DOI : 10.1007/978-3-540-69321-5_40

S. Uchida and H. Sakoe, A monotonic and continuous two-dimensional warping based on dynamic programming, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170), p.37, 1998.
DOI : 10.1109/ICPR.1998.711195

H. Uemura, S. Ishikawa, and K. Mikolajczyk, Feature Tracking and Motion Compensation for Action Recognition, Proceedings of the British Machine Vision Conference 2008, p.120, 2008.
DOI : 10.5244/C.22.30

J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, Selective Search for Object Recognition, International Journal of Computer Vision, vol.57, issue.1, p.139, 2013.
DOI : 10.1007/s11263-013-0620-5

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.361.3382

M. Unger, M. Werlberger, T. Pock, and H. Bischof, Joint motion estimation and segmentation of complex scenes with label costs and occlusion modeling, 2012 IEEE Conference on Computer Vision and Pattern Recognition, p.99, 2012.
DOI : 10.1109/CVPR.2012.6247887

J. C. van Gemert, C. J. Veenman, A. W. Smeulders, and J. Geusebroek, Visual Word Ambiguity, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.7, p.120, 2010.
DOI : 10.1109/TPAMI.2009.132

J. C. van Gemert, M. Jain, E. Gati, and C. G. Snoek, APT: Action localization proposals from dense trajectories, Proceedings of the British Machine Vision Conference 2015, pp.147-149
DOI : 10.5244/C.29.177

G. Varol, I. Laptev, and C. Schmid, Long-term Temporal Convolutions for Action Recognition, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01241518

C. Vogel, S. Roth, and K. Schindler, An evaluation of data costs for optical flow, GCPR, p.170, 2013.

C. Vogel, K. Schindler, and S. Roth, Piecewise rigid scene flow, ICCV, pp.75, 102, 2013.
DOI : 10.1109/iccv.2013.174

S. Volz, A. Bruhn, L. Valgaerts, and H. Zimmer, Modeling temporal coherence for optical flow, ICCV, p.25, 2011.
DOI : 10.1109/iccv.2011.6126359

URL : http://hdl.handle.net/11858/00-001M-0000-0024-513E-5

H. Wang, A. Kläser, C. Schmid, and C. Liu, Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, pp.131-136, 2013.
DOI : 10.1007/s11263-012-0594-8

URL : https://hal.archives-ouvertes.fr/hal-00725627

H. Wang, D. Oneata, J. Verbeek, and C. Schmid, A robust and efficient video representation for action recognition, IJCV, pp.131-136, 2015.
DOI : 10.1007/s11263-015-0846-5

URL : http://arxiv.org/abs/1504.05524

J. Wang and E. Adelson, Representing moving images with layers, IEEE Transactions on Image Processing, vol.3, issue.5, p.99, 1994.
DOI : 10.1109/83.334981

L. Wang, Y. Qiao, and X. Tang, Video Action Detection with Relational Dynamic-Poselets, ECCV, p.173, 2014.
DOI : 10.1007/978-3-319-10602-1_37

L. Wasserman, All of Statistics: A Concise Course in Statistical Inference, p.80, 2010.
DOI : 10.1007/978-0-387-21736-9

O. Weber, Y. S. Devir, A. M. Bronstein, M. M. Bronstein, and R. Kimmel, Parallel algorithms for approximation of distance maps on parametric surfaces, ACM Transactions on Graphics, vol.27, issue.4, p.82, 2008.
DOI : 10.1145/1409625.1409626

A. Wedel, D. Cremers, T. Pock, and H. Bischof, Structure- and motion-adaptive regularization for high accuracy optic flow, ICCV, p.85, 2009.
DOI : 10.1109/iccv.2009.5459375

A. Wedel, T. Pock, C. Zach, H. Bischof, and D. Cremers, An improved algorithm for TV-L1 optical flow, Statistical and Geometrical Approaches to Visual Motion Analysis, p.27, 2009.

D. Weinland, R. Ronfard, and E. Boyer, A survey of vision-based methods for action representation, segmentation and recognition, Computer Vision and Image Understanding, vol.115, issue.2, p.118, 2011.
DOI : 10.1016/j.cviu.2010.10.002

URL : https://hal.archives-ouvertes.fr/inria-00459653

P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid, DeepFlow: Large displacement optical flow with deep matching, ICCV, pp.79-86, 2013.
DOI : 10.1109/iccv.2013.175

P. Weinzaepfel, Z. Harchaoui, and C. Schmid, Learning to track for spatio-temporal action localization, ICCV, pp.14, 149-159, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01159941

P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid, Learning to detect Motion Boundaries, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p.13, 2015.
DOI : 10.1109/CVPR.2015.7298873

URL : https://hal.archives-ouvertes.fr/hal-01142653

P. Weinzaepfel, X. Martin, and C. Schmid, Towards Weakly-Supervised Action Localization. arXiv preprint, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01317558

G. Willems, T. Tuytelaars, and L. Van Gool, An efficient dense and scale-invariant spatio-temporal interest point detector, ECCV, p.119, 2008.
DOI : 10.1007/978-3-540-88688-4_48

J. Wills, S. Agarwal, and S. Belongie, A Feature-based Approach for Dense Segmentation and Estimation of Large Disparity Motion, International Journal of Computer Vision, vol.II, issue.12, p.37, 2006.
DOI : 10.1007/s11263-006-6660-3

J. Wulff and M. J. Black, Efficient sparse-to-dense optical flow estimation using a learned basis and layers, CVPR, p.28, 2015.

C. Xu, S. Hsieh, C. Xiong, and J. J. Corso, Can humans fly? Action understanding with multiple classes of actors, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p.122, 2015.
DOI : 10.1109/CVPR.2015.7298839

L. Xu, J. Jia, and Y. Matsushita, Motion detail preserving optical flow estimation, IEEE Trans. PAMI, vol.8, issue.93, pp.76-78, 2012.

H. Yang, W. Lin, and J. Lu, DAISY filter flow: A generalized discrete approach to dense correspondences, CVPR, pp.29, 65, 2014.
DOI : 10.1109/cvpr.2014.435

M. Young and W. Rheinboldt, Iterative solution of large linear systems, p.203, 1971.

G. Yu and J. Yuan, Fast action proposals for human action detection and search, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.147-149
DOI : 10.1109/CVPR.2015.7298735

J. Yuan, Z. Liu, and Y. Wu, Discriminative subvolume search for efficient action detection, CVPR, p.122, 2009.

R. Zabih and J. Woodfill, Non-parametric local transforms for computing visual correspondence, ECCV, p.22, 1994.
DOI : 10.1007/BFb0028345

C. Zach, T. Pock, and H. Bischof, A duality based approach for realtime TV-L1 optical flow, Pattern Recognition, p.110, 2007.

X. Zhou, K. Yu, T. Zhang, and T. S. Huang, Image classification using super-vector coding of local image descriptors, ECCV, p.120, 2010.

H. Zimmer, A. Bruhn, J. Weickert, L. Valgaerts, A. Salgado et al., Complementary optic flow, EMM-CVPR, p.26, 2009.
DOI : 10.1007/978-3-642-03641-5_16

H. Zimmer, A. Bruhn, and J. Weickert, Optic flow in harmony, IJCV, vol.54, issue.55, p.85, 2011.

C. L. Zitnick and P. Dollár, Edge Boxes: Locating Object Proposals from Edges, ECCV, pp.130-139, 2014.
DOI : 10.1007/978-3-319-10602-1_26

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.453.5208