J. Aach and G. M. Church, Aligning gene expression time series with time warping algorithms, Bioinformatics, vol.17, issue.6, pp.495-508, 2001.

S. Abu-el-haija, N. Kothari, J. Lee, P. Natsev, G. Toderici et al., Youtube-8m : A large-scale video classification benchmark, vol.87, 2016.

J. Assfalg, M. Bertini, C. Colombo, A. D. Bimbo, and W. Nunziati, Semantic annotation of soccer videos : automatic highlights identification, CVIU, vol.92, issue.2, p.17, 2003.

W. J. Baddar, G. Gu, S. Lee, and Y. M. Ro, Dynamics transfer gan : Generating video by transferring arbitrary temporal dynamics from a source video to a single target image, vol.9, 2017.

H. Ben-younes, R. Cadene, M. Cord, and N. Thome, Mutan : Multimodal tucker fusion for visual question answering, Proc. IEEE Int. Conf. Comp. Vis, vol.3, p.45, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02073637

J. A. Bengua, H. N. Phien, H. D. Tuan, and M. N. Do, Efficient tensor completion for color image and video recovery : Low-rank tensor train, IEEE Transactions on Image Processing, vol.26, issue.5, pp.2466-2479, 2017.

A. Bhattacharyya, M. Malinowski, B. Schiele, and M. Fritz, Long-term image boundary prediction, 2016.

H. Bilen, B. Fernando, E. Gavves, A. Vedaldi, and S. Gould, Dynamic image networks for action recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3034-3042, 2016.

J. Böer, Multiple alignment using hidden markov models. proteins, 4 :14, vol.80, 1995.

M. Borga, Canonical correlation : a tutorial, vol.4, p.53, 2001.

A. Bruderlin and L. Williams, Motion signal processing, Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, vol.80, pp.97-104, 1995.

W. Byeon, Q. Wang, R. K. Srivastava, and P. Koumoutsakos, Fully context-aware video prediction, 2017.

J. Carreira and A. Zisserman, Quo vadis, action recognition ? a new model and the kinetics dataset, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4724-4733, 2017.

A. Cichocki, D. Mandic, L. De-lathauwer, G. Zhou, Q. Zhao et al., Tensor decompositions for signal processing applications : From two-way to multiway component analysis, IEEE Signal Processing Magazine, vol.32, issue.2, pp.145-163, 2015.

A. Cichocki, R. Zdunek, A. H. Phan, and S. Amari, Nonnegative matrix and tensor factorizations : applications to exploratory multi-way data analysis and blind source separation, vol.36, 2009.

N. Cohen, O. Sharir, and A. Shashua, On the expressive power of deep learning : A tensor analysis, Conference on Learning Theory, pp.698-728, 2016.

Fédération Internationale de Football Association (FIFA), 2014 FIFA World Cup Brazil : matches description. Germany-vs

L. De-lathauwer, B. De-moor, and J. Vandewalle, A multilinear singular value decomposition, SIAM journal on Matrix Analysis and Applications, vol.21, issue.4, p.37, 2000.

M. Devanne, H. Wannous, S. Berretti, P. Pala, M. Daoudi et al., Reconnaissance d'actions humaines 3D par l'analyse de forme des trajectoires de mouvement, Compression et Représentation des Signaux Audiovisuels, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01207938

A. Diba, V. Sharma, and L. Van-gool, Deep temporal linear encoding networks, Computer Vision and Pattern Recognition, 2017.

J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-term recurrent convolutional networks for visual recognition and description, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.2625-2634, 2015.

L. Duan, M. Xu, T. Chua, Q. Tian, and C. Xu, A mid-level representation framework for semantic sports video analysis, Proceedings of the eleventh ACM international conference on Multimedia, p.16, 2003.

L. Duan, M. Xu, Q. Tian, C. Xu, and J. S. Jin, A unified framework for semantic shot classification in sports video, IEEE Transactions on Multimedia, vol.7, issue.6, pp.1066-1083, 2005.

S. Ebadollahi, L. Xie, S. Chang, and J. R. Smith, Visual event detection using multi-dimensional concept dynamics, IEEE International Conference on, pp.881-884, 2006.

A. Edison and C. Jiji, Optical acceleration for motion description in videos, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.39-47, 2017.

A. Ekin, Generic play-break event detection for summarization and hierarchical sports video analysis, Multimedia and Expo, 2003. ICME'03. Proceedings. 2003 International Conference on, vol.1, p.16, 2003.

G. Farnebäck, Two-frame motion estimation based on polynomial expansion, Image Analysis, p.18, 2003.

A. Fathi and G. Mori, Action recognition by learning mid-level motion features, Computer Vision and Pattern Recognition, pp.1-8, 2008.

C. Feichtenhofer, A. Pinz, and R. Wildes, Spatiotemporal residual networks for video action recognition, Advances in neural information processing systems, pp.3468-3476, 2016.

B. Fernando, E. Gavves, J. Oramas M., A. Ghodrati, and T. Tuytelaars, Modeling video evolution for action recognition, Proceedings CVPR 2015, pp.5378-5387, 2015.

T. Garipov, D. Podoprikhin, A. Novikov, and D. Vetrov, Ultimate tensorization : compressing convolutional and fc layers alike, p.43, 2016.

A. Georghiades, P. Belhumeur, and D. Kriegman, From few to many : Illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Mach. Intelligence, vol.23, issue.6, p.68, 2001.

G. Gkioxari and J. Malik, Finding action tubes, Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, pp.759-768, 2015.

D. Gong and G. Medioni, Dynamic manifold warping for view invariant action recognition, 2011 IEEE International Conference on, vol.81, pp.571-578, 2011.

Y. Gong, L. T. Sin, C. H. Chuan, H. Zhang, and M. Sakauchi, Automatic parsing of tv soccer programs, Proceedings of the International Conference on, vol.16, pp.167-174, 1995.

X. Guo, X. Huang, L. Zhang, L. Zhang, A. Plaza et al., Support tensor machines for classification of hyperspectral remote sensing imagery, IEEE Transactions on Geoscience and Remote Sensing, vol.54, issue.6, p.40, 2016.

A. Hanjalic, Generic approach to highlights extraction from a sport video, Proceedings. 2003 International Conference on, vol.1, p.16, 2003.

M. A. Hasan, On multi-set canonical correlation analysis, IJCNN 2009. International Joint Conference on, vol.85, pp.1128-1133, 2009.

E. Hsu, K. Pulli, and J. Popović, Style translation for human motion, ACM Transactions on Graphics, vol.24, pp.1082-1089, 2005.

M. Jaderberg, K. Simonyan, and A. Zisserman, Spatial transformer networks, NIPS, pp.2017-2025, 2015.

S. Ji, W. Xu, M. Yang, and K. Yu, 3d convolutional neural networks for human action recognition, IEEE transactions on pattern analysis and machine intelligence, vol.35, pp.221-231, 2013.

C. Jia, G. Zhong, and Y. R. Fu, Low-rank tensor learning with discriminant analysis for action classification and image recovery, AAAI, p.51, 2014.

H. Jiang, D. Sun, V. Jampani, M. Yang, E. Learned-miller et al., Super slomo : High quality estimation of multiple intermediate frames for video interpolation, 2017.

G. Johansson, Visual perception of biological motion and a model for its analysis, Perception & psychophysics, vol.14, issue.2, pp.201-211, 1973.

I. N. Junejo, E. Dexter, I. Laptev, and P. Perez, View-independent action recognition from temporal self-similarities, IEEE transactions on pattern analysis and machine intelligence, vol.33, pp.172-185, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01064695

S. E. Kahou, V. Michalski, and R. Memisevic, Ratm : recurrent attentive tracking model, 2015.

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar et al., Large-scale video classification with convolutional neural networks, Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, vol.92, pp.1725-1732, 2014.

C. Keimel, M. Rothbucher, H. Shen, and K. Diepold, Video is a cube, IEEE Signal Processing Magazine, vol.28, issue.6, pp.41-49, 2011.

O. Kihl, B. Tremblais, and B. Augereau, Multivariate orthogonal polynomials to extract singular points, IEEE International Conference on Image Processing, vol.17, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00340382

T. Kim, Hand gesture dataset website, vol.59, pp.2018-2024

T. Kim and R. Cipolla, Gesture recognition under small sample size, Asian conference on computer vision, p.59, 2007.

T. S. Kim and A. Reiter, Interpretable 3d human action analysis with temporal convolutional networks, Computer Vision and Pattern Recognition Workshops, pp.1623-1631, 2017.

A. Kläser, M. Marszałek, C. Schmid, and A. Zisserman, Human focused action localization in video, European Conference on Computer Vision, pp.219-233, 2010.

J. Kossaifi, A. Khanna, Z. Lipton, T. Furlanello, and A. Anandkumar, Tensor contraction layers for parsimonious deep nets, Computer Vision and Pattern Recognition Workshops, pp.1940-1946, IEEE, vol.44, 2017.

I. Kotsia, W. Guo, and I. Patras, Higher rank support tensor machines for visual recognition, Pattern Recognition, vol.45, issue.12, p.39, 2012.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in neural information processing systems, vol.16, pp.1097-1105, 2012.

I. Laptev, On space-time interest points, International journal of computer vision, vol.64, issue.2-3, pp.107-123, 2005.

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, Computer Vision and Pattern Recognition, pp.1-8, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00548659

V. Lebedev, Y. Ganin, M. Rakhuba, I. Oseledets, and V. Lempitsky, Speeding-up convolutional neural networks using fine-tuned cp-decomposition, vol.42, 2014.

C. Lee, Human action recognition using tensor dynamical system modeling, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, p.39, 2017.

R. Leonardi, P. Migliorati, and M. Prandini, Semantic indexing of soccer audio-visual sequences : a multimodal approach based on controlled markov chains, IEEE Transactions on Circuits and Systems for Video Technology, vol.14, issue.5, p.16, 2004.

Z. Li, K. Gavrilyuk, E. Gavves, M. Jain, and C. G. Snoek, Videolstm convolves, attends and flows for action recognition, Computer Vision and Image Understanding, vol.166, pp.41-50, 2018.

X. Liang, L. Lee, W. Dai, and E. P. Xing, Dual motion gan for future-flow embedded video prediction, 2017.

J. Liu, P. Musialski, P. Wonka, and J. Ye, Tensor completion for estimating missing values in visual data, IEEE transactions on pattern analysis and machine intelligence, vol.35, p.45, 2013.

Z. Liu, R. Yeh, X. Tang, Y. Liu, and A. Agarwala, Video frame synthesis using deep voxel flow, ICCV, 2017.

Z. Liu, L. Yuan, X. Tang, M. Uyttendaele, and J. Sun, Fast burst images denoising, ACM Transactions on Graphics (TOG), vol.33, issue.6, p.232, 2014.

E. F. Lock, K. A. Hoadley, J. S. Marron, and A. B. Nobel, Joint and individual variation explained (jive) for integrated analysis of multiple data types, The annals of applied statistics, vol.7, p.53, 2013.

A. P. Lopes, R. S. Oliveira, J. M. De-almeida, and A. D. Araújo, Comparing alternatives for capturing dynamic information in bag-of-visual-features approaches applied to human actions recognition, Multimedia Signal Processing, p.36, 2009.

D. Mahajan, F. Huang, W. Matusik, R. Ramamoorthi, and P. Belhumeur, Moving gradients : a path-based method for plausible image interpolation, ACM Transactions on Graphics, vol.28, p.42, 2009.

H. Mao, S. Han, J. Pool, W. Li, X. Liu et al., Exploring the granularity of sparsity in convolutional neural networks, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, vol.40, 2017.

P. K. Mital, T. J. Smith, R. L. Hill, and J. M. Henderson, Clustering of gaze during dynamic scene viewing is predicted by motion, Cognitive Computation, vol.3, issue.1, pp.5-24, 2011.

J. Y. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga et al., Beyond short snippets : Deep networks for video classification, Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, pp.4694-4702, 2015.

J. C. Niebles, H. Wang, and L. Fei-fei, Unsupervised learning of human action categories using spatial-temporal words, International journal of computer vision, vol.79, issue.3, pp.299-318, 2008.

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Learning and transferring mid-level image representations using convolutional neural networks, Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pp.1717-1724, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00911179

I. V. Oseledets, Tensor-train decomposition, SIAM Journal on Scientific Computing, vol.33, issue.5, p.42, 2011.

H. Pan, P. Van-beek, and M. I. Sezan, Detection of slow-motion replay segments in sports video for highlights generation, Acoustics, Speech, and Signal Processing, vol.3, p.28, 2001.

R. Pascanu, T. Mikolov, and Y. Bengio, On the difficulty of training recurrent neural networks, International Conference on Machine Learning, pp.1310-1318, 2013.

A. H. Phan and A. Cichocki, Tensor decompositions for feature extraction and classification of high dimensional datasets, Nonlinear theory and its applications, vol.1, pp.37-68, 2010.

X. Qian, Global Motion Estimation and Its Applications, 2012.

L. R. Rabiner and B. Juang, Fundamentals of speech recognition, vol.14, 1993.

A. Raventos, R. Quijada, L. Torres, and F. Tarres, Automatic summarization of soccer highlights using audio-visual descriptors, 2014.

C. Lea, M. D. Flynn, R. Vidal, A. Reiter, and G. D. Hager, Temporal convolutional networks for action segmentation and detection, IEEE International Conference on Computer Vision (ICCV), 2017.

Y. Runsheng, S. Zhenyu, and Q. Laiyun, Unsupervised learning aids prediction : Using future representation learning variantial autoencoder for human action prediction, 2017.

D. Sadlier and N. E. O'Connor, Event detection in field sports video using audio-visual features and a support vector machine, IEEE Transactions on Circuits and Systems for Video Technology, vol.15, issue.10, p.16, 2005.

S. M. Safdarnejad, X. Liu, and L. Udpa, Robust global motion compensation in presence of predominant foreground, BMVC, p.24, 2015.

M. Saito, E. Matsumoto, and S. Saito, Temporal generative adversarial nets with singular value clipping, IEEE International Conference on Computer Vision (ICCV), vol.9, pp.2830-2839, 2017.

S. C. Sajjan and C. Vijaya, Comparison of dtw and hmm for isolated word recognition, Pattern Recognition, Informatics and Medical Engineering (PRIME), 2012 International Conference on, vol.80, pp.466-470, 2012.

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, Advances in neural information processing systems, pp.568-576, 2014.

B. Singh, T. K. Marks, M. Jones, O. Tuzel, and M. Shao, A multi-stream bidirectional recurrent neural network for fine-grained action detection, Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, vol.99, pp.1961-1970, 2016.

N. Srivastava, E. Mansimov, and R. Salakhudinov, Unsupervised learning of video representations using lstms, International conference on machine learning, pp.843-852, 2015.

Y. Su, H. Wang, P. Jing, and C. Xu, A spatial-temporal iterative tensor decomposition technique for action and gesture recognition, Multimedia Tools and Applications, vol.76, issue.8, p.51, 2017.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, Proceedings of the IEEE conference on computer vision and pattern recognition, vol.87, pp.1-9, 2015.

R. Szeliski, Prediction error as a quality metric for motion and stereo, The Proceedings of the Seventh IEEE International Conference on, vol.2, pp.781-788, 1999.

Y. Tabii and R. O. Thami, A new method for soccer video summarizing based on shot detection, classification and finite state machine, Proceedings of The 5th international conference SETIT, 2009.

G. W. Taylor, R. Fergus, Y. Lecun, and C. Bregler, Convolutional learning of spatio-temporal features, European conference on computer vision, pp.140-153, 2010.

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, Learning spatiotemporal features with 3d convolutional networks, Proceedings of the IEEE International Conference on Computer Vision, pp.4489-4497, 2015.

G. Trigeorgis, M. A. Nicolaou, B. W. Schuller, and S. Zafeiriou, Deep canonical time warping for simultaneous alignment and representation learning of sequences, IEEE Trans. PAMI, pp.1128-1138, 2018.

G. Trigeorgis, M. A. Nicolaou, S. Zafeiriou, and B. W. Schuller, Deep canonical time warping, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, p.81, 2016.

S. Tulyakov, M. Liu, X. Yang, and J. Kautz, Mocogan : Decomposing motion and content for video generation, vol.9, 2017.

G. Varol, I. Laptev, and C. Schmid, Long-term temporal convolutions for action recognition, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01241518

M. A. Vasilescu, A multilinear (tensor) algebraic framework for computer graphics, computer vision, and machine learning, p.48, 2009.

M. A. Vasilescu and D. Terzopoulos, Multilinear analysis of image ensembles : Tensorfaces, European Conference on Computer Vision, pp.447-460, 2002.

C. Vondrick, H. Pirsiavash, and A. Torralba, Generating videos with scene dynamics, Advances In Neural Information Processing Systems, vol.9, pp.613-621, 2016.

H. T. Vu, C. Carey, and S. Mahadevan, Manifold warping : Manifold alignment over time, AAAI, vol.1, 2012.

J. Wan, Y. Zhao, S. Zhou, I. Guyon, S. Escalera et al., Chalearn looking at people rgb-d isolated and continuous datasets for gesture recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, p.89, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01381151

H. Wang, A. Kläser, C. Schmid, and C. Liu, Action recognition by dense trajectories, Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, vol.16, pp.3169-3176, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00583818

H. Wang, W. Yang, C. Yuan, H. Ling, and W. Hu, Human activity prediction using temporally-weighted generalized time warping, Neurocomputing, vol.225, pp.139-147, 2017.

L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin et al., Temporal segment networks for action recognition in videos, vol.6, 2018.

X. Wang, A. Farhadi, and A. Gupta, Actions ~ transformations, Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp.2658-2667, 2016.

R. Wilbur and A. C. Kak, Purdue rvl-slll american sign language database, vol.88, 2006.

C. Wu, Y. Ma, H. Zhan, and Y. Zhong, Events recognition by semantic inference for sports video, Multimedia and Expo, 2002. ICME'02, vol.1, p.16, 2002.

L. Xie, S. Chang, A. Divakaran, and H. Sun, Structure analysis of soccer video with hidden markov models, Acoustics, Speech, and Signal Processing, vol.4, p.16, 2002.

Z. Xiong, X. S. Zhou, Q. Tian, Y. Rui, and T. S. Huang, Semantic retrieval of video, IEEE Signal Processing Magazine, vol.23, issue.2, p.16, 2006.

H. Xu, A. Das, and K. Saenko, R-c3d : Region convolutional 3d network for temporal activity detection, The IEEE International Conference on Computer Vision (ICCV), vol.6, p.91, 2017.

P. Xu, L. Xie, S. Chang, A. Divakaran, A. Vetro et al., Algorithms and system for segmentation and structure analysis in soccer video, ICME, vol.1, p.16, 2001.

Y. Yang, D. Krompass, and V. Tresp, Tensor-train recurrent neural networks for video classification, p.43, 2017.

Q. Ye, Q. Huang, W. Gao, and S. Jiang, Exciting event detection in broadcast soccer video with mid-level description and incremental learning, Proceedings of the 13th annual ACM international conference on Multimedia, vol.24, pp.455-458, 2005.