M. A. Ahad, Motion History Image, Motion History Images for Action Recognition and Understanding, pp.31-76, 2013.

L. Bahl, P. Brown, P. D. Souza, and R. Mercer, Maximum mutual information estimation of hidden Markov model parameters for speech recognition, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.11, pp.49-52, 1986.

L. E. Baum and T. Petrie, Statistical Inference for Probabilistic Functions of Finite State Markov Chains, The Annals of Mathematical Statistics, vol.37, issue.6, pp.1554-1563, 1966.

Y. Bengio, A. Courville, and P. Vincent, Representation Learning: A Review and New Perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1798-1828, 2013.

Y. Bengio, J. Louradour, R. Collobert, and J. Weston, Curriculum Learning, Proceedings of the 26th Annual International Conference on Machine Learning, pp.41-48, 2009.

M. K. Bhuyan, D. Ghosh, and P. K. Bora, Finite state representation of hand gesture using key video object plane, IEEE Region 10 Conference TENCON, vol.A, pp.579-582, 2004.

C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.

T. Bluche, Deep Neural Networks for Large Vocabulary Handwritten Text Recognition. PhD thesis, 2015.
URL : https://hal.archives-ouvertes.fr/tel-01249405

H. Bourlard and N. Morgan, A continuous speech recognition system embedding MLP into HMM, Advances in neural information processing systems, pp.186-193, 1990.

G. Bradski, The OpenCV Library. Dr. Dobb's Journal of Software Tools, 2000.

A. Braffort, M. Benchiheub, and B. Berret, Aplus: A 3D corpus of French Sign Language, Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility, ASSETS '15, pp.381-382, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01634100

J. Bromley, I. Guyon, Y. Lecun, E. Säckinger, and R. Shah, Signature Verification Using a "Siamese" Time Delay Neural Network, Proceedings of the 6th International Conference on Neural Information Processing Systems, NIPS'93, pp.737-744, 1993.

Z. Cao, M. Long, J. Wang, and P. S. Yu, HashNet: Deep Learning to Hash by Continuation, 2017 IEEE International Conference on Computer Vision (ICCV), pp.5609-5618, 2017.

R. A. Caruana, Multitask Learning: A Knowledge-Based Source of Inductive Bias, Machine Learning Proceedings, pp.41-48, 1993.

X. Chai, Z. Liu, F. Yin, Z. Liu, and X. Chen, Two Streams Recurrent Neural Networks for Large-Scale Continuous Gesture Recognition, 2016 23rd International Conference on Pattern Recognition (ICPR), pp.31-36, 2016.

K. Cho, B. Van-merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1724-1734, 2014.

S. Chopra, R. Hadsell, and Y. Lecun, Learning a similarity metric discriminatively, with application to face verification, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol.1, pp.539-546, 2005.

G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems, vol.2, issue.4, pp.303-314, 1989.

S. Dieleman, J. Schlüter, C. Raffel, E. Olson, S. K. Sønderby et al., Lasagne: First release, 2015.

J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-term Recurrent Convolutional Networks for Visual Recognition and Description, 2014.

Y. Duan, M. Andrychowicz, B. Stadie, J. Ho, J. Schneider et al., One-shot imitation learning, Advances in Neural Information Processing Systems, pp.1087-1098, 2017.

T. D'Orazio, R. Marani, V. Renò et al., Recent trends in gesture recognition: how depth data has improved classical approaches, Image and Vision Computing, 2016.

S. Escalera, X. Baró, J. Gonzalez, M. A. Bautista, M. Madadi et al., Chalearn looking at people challenge 2014: Dataset and results, Computer Vision-ECCV 2014 Workshops, pp.459-473, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01381162

L. Fei-fei, R. Fergus, and P. Perona, One-shot learning of object categories, IEEE transactions on pattern analysis and machine intelligence, vol.28, pp.594-611, 2006.

H. Franco, M. Weintraub, and M. Cohen, Context modeling in a hybrid HMM-neural net speech recognition system, Proceedings of International Conference on Neural Networks (ICNN'97), vol.4, pp.2089-2092, 1997.

K. Fukushima, Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, vol.36, issue.4, pp.193-202, 1980.

F. A. Gers and E. Schmidhuber, LSTM recurrent networks learn simple context-free and context-sensitive languages, IEEE Transactions on Neural Networks, vol.12, issue.6, pp.1333-1340, 2001.

F. A. Gers, J. Schmidhuber, and F. Cummins, Learning to Forget: Continual Prediction with LSTM, Neural Computation, vol.12, issue.10, pp.2451-2471, 2000.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.580-587, 2014.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.

A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, Proceedings of the 23rd international conference on Machine learning, pp.369-376, 2006.

A. Graves and N. Jaitly, Towards End-to-end Speech Recognition with Recurrent Neural Networks, Proceedings of the 31st International Conference on International Conference on Machine Learning, vol.32, 2014.

A. Graves, A. Mohamed, and G. Hinton, Speech recognition with deep recurrent neural networks, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6645-6649, 2013.

A. Graves and J. Schmidhuber, Offline handwriting recognition with multidimensional recurrent neural networks, Advances in neural information processing systems, pp.545-552, 2009.

A. Graves, G. Wayne, and I. Danihelka, Neural turing machines, 2014.

R. H. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, and H. S. Seung, Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit, Nature, vol.405, p.947, 2000.

B. Hariharan and R. Girshick, Low-Shot Visual Recognition by Shrinking and Hallucinating Features, The IEEE International Conference on Computer Vision (ICCV), 2017.

K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, 2016.

G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed et al., Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012.

G. E. Hinton, S. Osindero, and Y. Teh, A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006.

G. E. Hinton and R. R. Salakhutdinov, Reducing the Dimensionality of Data with Neural Networks, Science, vol.313, issue.5786, pp.504-507, 2006.

G. E. Hinton and R. S. Zemel, Autoencoders, Minimum Description Length and Helmholtz Free Energy, Advances in Neural Information Processing Systems, vol.6, pp.3-10, 1994.

S. Hochreiter, The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions, Int. J. Uncertain. Fuzziness Knowl.-Based Syst, vol.6, issue.2, pp.107-116, 1998.

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.

S. Hochreiter, A. S. Younger, and P. R. Conwell, Learning to Learn Using Gradient Descent, Artificial Neural Networks - ICANN 2001, pp.87-94, 2001.

A. L. Hodgkin and A. F. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve, The Journal of Physiology, vol.117, issue.4, pp.500-544, 1952.

E. Hoffer and N. Ailon, Deep metric learning using Triplet network, 2014.

G. Huang, Z. Liu, L. V. Maaten, and K. Q. Weinberger, Densely Connected Convolutional Networks, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2261-2269, 2017.

R. Ibañez, A. Soria, A. Teyseyre, and M. Campo, Easy gesture recognition for Kinect, Advances in Engineering Software, vol.76, pp.171-180, 2014.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, 2015.

P. Jaccard, Lois de distribution florale dans la zone alpine, Bulletin de la Société Vaudoise des Sciences Naturelles, vol.38, pp.69-130, 1902.

M. G. Jacob and J. P. Wachs, Context-based hand gesture recognition for the operating room, Pattern Recognition Letters, vol.36, pp.196-203, 2014.

M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, Spatial Transformer Networks, Advances in Neural Information Processing Systems, vol.28, pp.2017-2025, 2015.

D. Kelly, J. McDonald, and C. Markham, Evaluation of threshold model HMMs and Conditional Random Fields for recognition of spatiotemporal gestures in sign language, Computer Vision Workshops (ICCV Workshops), pp.490-497, 2009.

J. H. Kim, N. D. Thang, and T. S. Kim, 3-D hand motion tracking and gesture recognition using a data glove, 2009 IEEE International Symposium on Industrial Electronics, pp.1013-1018, 2009.

D. P. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, Proceedings of the International Conference on Learning Representations (ICLR), 2014.

J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins et al., Overcoming catastrophic forgetting in neural networks, Proceedings of the National Academy of Sciences, vol.114, issue.13, pp.3521-3526, 2017.

A. Kläser, M. Marszalek, and C. Schmid, A Spatio-Temporal Descriptor Based on 3d-Gradients, Proceedings of the British Machine Vision Conference, pp.1-10, 2008.

G. Koch, R. Zemel, and R. Salakhutdinov, Siamese neural networks for one-shot image recognition, ICML Deep Learning Workshop, vol.2, 2015.

O. Koller, J. Forster, and H. Ney, Continuous sign language recognition: Towards large vocabulary statistical recognition systems handling multiple signers, Computer Vision and Image Understanding, vol.141, pp.108-125, 2015.

O. Koller, H. Ney, and R. Bowden, Deep Hand: How to Train a CNN on 1 Million Hand Images When Your Data is Continuous and Weakly Labelled, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3793-3802, 2016.

O. Koller, S. Zargaran, and H. Ney, Re-sign: Re-aligned end-to-end sequence modelling with deep recurrent cnn-hmms, IEEE Conference on Computer Vision and Pattern Recognition, pp.3416-3424, 2017.

O. Koller, S. Zargaran, H. Ney, and R. Bowden, Deep Sign: Hybrid CNN-HMM for Continuous Sign Language Recognition, Proceedings of the British Machine Vision Conference, 2016.

J. Konecny and M. Hagara, One-Shot-Learning Gesture Recognition using HOG-HOF Features, Journal of Machine Learning Research, vol.15, pp.2513-2532, 2014.

A. Krizhevsky, Learning multiple layers of features from tiny images, 2009.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in neural information processing systems, pp.1097-1105, 2012.

A. Kurakin, Z. Zhang, and Z. Liu, A real time system for dynamic hand gesture recognition with a depth sensor, Proceedings of the 20th European Signal Processing Conference (EUSIPCO 2012), pp.1975-1979, 2012.

B. M. Lake, C. Lee, J. R. Glass, and J. B. Tenenbaum, One-shot learning of generative speech concepts, Proceedings of the 36th Annual Meeting of the Cognitive Science Society, 2014.

B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum, Human-level concept learning through probabilistic program induction, Science, vol.350, issue.6266, pp.1332-1338, 2015.

P. Lamere, P. Kwok, R. Gouvêa, B. Raj, R. Singh et al., The CMU SPHINX-4 Speech Recognition System. Unpublished manuscript, 2003.

I. Laptev, On Space-Time Interest Points, International Journal of Computer Vision, vol.64, issue.2-3, pp.107-123, 2005.

Y. Lecun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.

H. Lee and J. H. Kim, An HMM-based threshold model approach for gesture recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.21, issue.10, pp.961-973, 1999.

J. Lee, H. Choi, D. Park, Y. Chung, H. Kim et al., Fault Detection and Diagnosis of Railway Point Machines by Sound Analysis, Sensors, vol.16, issue.4, 2016.

J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras et al., Noise2noise: Learning Image Restoration without Clean Data, Proceedings of the 35th International Conference on Machine Learning, vol.80, pp.2971-2980, 2018.

V. I. Levenshtein, Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady, vol.10, p.707, 1966.

H. Li and M. Greenspan, Model-based segmentation and recognition of dynamic gestures in continuous video streams, Pattern Recognition, vol.44, issue.8, pp.1614-1628, 2011.

A. L. Maas, A. Y. Hannun, and A. Y. Ng, Rectifier nonlinearities improve neural network acoustic models, Proceedings of the 30th International Conference on Machine Learning (workshop), vol.30, p.3, 2013.

L. V. Maaten and G. Hinton, Visualizing data using t-SNE, Journal of machine learning research, vol.9, pp.2579-2605, 2008.

A. Mccallum, D. Freitag, and F. C. Pereira, Maximum Entropy Markov Models for Information Extraction and Segmentation, Proceedings of the Seventeenth International Conference on Machine Learning, pp.591-598, 2000.

M. Mccloskey and N. J. Cohen, Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem, Psychology of Learning and Motivation, vol.24, pp.109-165, 1989.

L. Miranda, T. Vieira, D. Martínez, T. Lewiner, A. W. Vieira et al., Online gesture recognition from pose kernel learning and decision forests, Pattern Recognition Letters, vol.39, pp.65-73, 2014.

T. M. Mitchell, The need for biases in learning generalizations, 1980.

L. Morency, A. Quattoni, and T. Darrell, Latent-dynamic discriminative models for continuous gesture recognition, 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07), pp.1-8, 2007.

N. Morgan and H. Bourlard, An Introduction to the Hybrid HMM/Connectionist Approach, IEEE Signal Processing Magazine, vol.12, issue.3, pp.24-42, 1995.

V. Nair and G. E. Hinton, Rectified linear units improve restricted boltzmann machines, Proceedings of the 27th International Conference on International Conference on Machine Learning, ICML'10, pp.807-814, 2010.

N. Neverova, C. Wolf, G. Taylor, and F. Nebout, ModDrop: Adaptive Multi-Modal Gesture Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, pp.1692-1706, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01178733

N. Neverova, C. Wolf, G. W. Taylor, and F. Nebout, Multi-scale deep learning for gesture detection and localization, Computer Vision-ECCV 2014 Workshops, pp.474-490, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01419792

R. Pascanu, T. Mikolov, and Y. Bengio, On the difficulty of training recurrent neural networks, Proceedings of the 30th International Conference on Machine Learning, vol.28, pp.1310-1318, 2013.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

W. Pei, D. M. Tax, and L. Van-der-maaten, Modeling time series similarity with siamese recurrent networks, 2016.

L. Pigou, A. van den Oord, S. Dieleman, M. Van Herreweghe, and J. Dambre, Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video, International Journal of Computer Vision, pp.1-10, 2016.

D. B. Pisoni, Speech and Speaker Recognition, vol.84, pp.457-458, 1988.

S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz et al., Adaptive Histogram Equalization and Its Variations, Comput. Vision Graph. Image Process, vol.39, issue.3, pp.355-368, 1987.

M. Plappert, C. Mandery, and T. Asfour, Learning a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks, 2017.

L. Rabiner and B. Juang, An introduction to hidden Markov models, IEEE ASSP Magazine, vol.3, issue.1, pp.4-16, 1986.

L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, vol.77, issue.2, pp.257-286, 1989.

M. Ravanelli, P. Brakel, M. Omologo, and Y. Bengio, Improving Speech Recognition by Revising Gated Recurrent Units, Proc. Interspeech, pp.1308-1312, 2017.

D. Rezende, S. Mohamed, I. Danihelka, K. Gregor, and D. Wierstra, One-Shot Generalization in Deep Generative Models, Proceedings of The 33rd International Conference on Machine Learning, vol.48, pp.1521-1529, 2016.

O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation, Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015, pp.234-241, 2015.

S. Ruder, An overview of gradient descent optimization algorithms, 2016.

A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap, Meta-Learning with Memory-Augmented Neural Networks, Proceedings of The 33rd International Conference on Machine Learning, vol.48, pp.1842-1850, 2016.

J. Schmidhuber, Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-meta...-hook. Diploma thesis, 1987.

J. Schreiber, pomegranate: Fast and Flexible Probabilistic Modeling in Python, Journal of Machine Learning Research, vol.18, issue.164, pp.1-6, 2018.

B. Seddik, S. Gazzah, T. Chateau, and N. E. Ben Amara, Augmented skeletal joints for temporal segmentation of sign language actions, Image Processing, Applications and Systems Conference (IPAS), pp.1-6, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01295859

N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le et al., Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, ICLR, 2017.

Z. Shi and T. K. Kim, Learning and Refining of Privileged Information-Based RNNs for Action Recognition from Depth Sequences, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4684-4693, 2017.

J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio et al., Real-time human pose recognition in parts from single depth images, CVPR 2011, pp.1297-1304, 2011.

K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014.

N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

R. K. Srivastava, K. Greff, and J. Schmidhuber, Training Very Deep Networks, Advances in Neural Information Processing Systems, vol.28, pp.2377-2385, 2015.

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to Sequence Learning with Neural Networks, Advances in Neural Information Processing Systems, vol.27, pp.3104-3112, 2014.

C. Sutton, An Introduction to Conditional Random Fields. Foundations and Trends® in Machine Learning, vol.4, pp.267-373, 2012.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1-9, 2015.

Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, DeepFace: Closing the Gap to Human-Level Performance in Face Verification, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.1701-1708, 2014.

Theano Development Team, Theano: A Python framework for fast computation of mathematical expressions, 2016.

S. Thrun, Lifelong learning algorithms, Learning to Learn, pp.181-209, 1998.

A. Toshev and C. Szegedy, DeepPose: Human Pose Estimation via Deep Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.1653-1660, 2014.

K. Tripathi and N. B. Nandi, Continuous Indian Sign Language Gesture Recognition and Sentence Formation, Procedia Computer Science, vol.54, pp.523-531, 2015.
DOI : 10.1016/j.procs.2015.06.060

URL : https://doi.org/10.1016/j.procs.2015.06.060

P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol, Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion, J. Mach. Learn. Res, vol.11, pp.3371-3408, 2010.

T. K. Vintsyuk, Speech discrimination by dynamic programming, Cybernetics, vol.4, issue.1, pp.52-57, 1972.
DOI : 10.1007/bf01074755

O. Vinyals, S. Bengio, and M. Kudlur, Order Matters: Sequence to sequence for sets, Proceedings of the International Conference on Learning Representations (ICLR), 2016.

O. Vinyals, C. Blundell, T. Lillicrap, and D. Wierstra, Matching networks for one shot learning, Advances in Neural Information Processing Systems, pp.3630-3638, 2016.

H. Wang, X. Chai, X. Hong, G. Zhao, and X. Chen, Isolated Sign Language Recognition with Grassmann Covariance Matrices, ACM Trans. Access. Comput, vol.8, issue.4, 2016.
DOI : 10.1145/2897735

H. Wang, A. Kläser, C. Schmid, and C. Liu, Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, vol.103, issue.1, pp.60-79, 2013.
DOI : 10.1007/s11263-012-0594-8

URL : https://hal.archives-ouvertes.fr/hal-00725627

W. Li, Z. Zhang, and Z. Liu, Expandable Data-Driven Graphical Modeling of Human Actions Based on Salient Postures, IEEE Transactions on Circuits and Systems for Video Technology, vol.18, pp.1499-1510, 2008.

P. J. Werbos, Generalization of backpropagation with application to a recurrent gas market model, Neural Networks, vol.1, issue.4, pp.339-356, 1988.

B. Willmore, P. A. Watters, and D. J. Tolhurst, A Comparison of Natural-Image-Based Models of Simple-Cell Coding, Perception, vol.29, issue.9, pp.1017-1040, 2000.

M. Woodward and C. Finn, Active one-shot learning, 2017.

D. Wu, L. Pigou, P. J. Kindermans, N. D. Le, L. Shao et al., Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, issue.8, pp.1583-1597, 2016.
DOI : 10.1109/tpami.2016.2537340

URL : https://doi.org/10.1109/tpami.2016.2537340

D. Wu and L. Shao, Multimodal dynamic networks for gesture recognition, Proceedings of the 22nd ACM international conference on Multimedia, pp.945-948, 2014.
DOI : 10.1145/2647868.2654969

D. Wu, F. Zhu, and L. Shao, One shot learning gesture recognition from RGBD images, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp.7-12, 2012.
DOI : 10.1109/cvprw.2012.6239179

K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville et al., Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, CoRR, 2015.

H. Yang and S. Lee, Combination of manual and non-manual features for sign language recognition based on conditional random field and active appearance model, 2011 International Conference on Machine Learning and Cybernetics (ICMLC), vol.4, pp.1726-1731, 2011.

Y. Yin and R. Davis, Real-time continuous gesture recognition for natural human-computer interaction, Visual Languages and Human-Centric Computing (VL/HCC), pp.113-120, 2014.
DOI : 10.1109/vlhcc.2014.6883032

S. J. Young, The HTK Hidden Markov Model Toolkit: Design and Philosophy, vol.2, pp.2-44, 1994.

S. Zagoruyko and N. Komodakis, Wide Residual Networks, Proceedings of the British Machine Vision Conference, 2016.
DOI : 10.5244/c.30.87

URL : https://hal.archives-ouvertes.fr/hal-01832503

C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, Understanding deep learning requires rethinking generalization, 2016.

F. Zhu, L. Shao, J. Xie, and Y. Fang, From handcrafted to learned representations for human action recognition: A survey, Image and Vision Computing, 2016.

Experiments and Contributions

Given only one or a few examples of categories never encountered in the past, a model must be able to learn to classify new occurrences of these categories.

Indeed, most learning techniques, which require a large set of data statistically representative of reality, cannot adapt to this extreme situation. New models are therefore needed to make the best use of the few available training examples. In particular, some works try to exploit the robustness and flexibility of neural networks for this task, while avoiding overfitting.

One-shot learning offers very attractive prospects for gesture recognition, which is why we decided to study it: a user could, for instance, program an assistant robot to react to his or her own commands, much as today's voice-controlled systems do. This thesis studies this training and testing paradigm in the case of isolated gestures, i.e. the model must predict a class for pre-segmented sequences.

Compared to traditional training and classification paradigms, one-shot learning remains to this day a relatively little-studied area, albeit one with growing research activity. For example, to our knowledge there is only one contribution on the use of neural networks for sequential data (2016).

This situation is explained in particular by the difficulty of the task, which calls for non-conventional techniques. Recent publications (Santoro et al., 2016) have nevertheless defined a study framework and a more precise methodology for the subject, which makes it possible to set tangible objectives and to channel the research effort in this area.

Sequence recognition in the traditional setting, by contrast, has a very rich state of the art, with many applications already available to the general public; the field nevertheless remains very active and continues to improve and diversify.

One-shot learning. In its one-shot operating mode, recognition takes place in an episodic setting characterized by the following succession of steps:
1. drawing of a vocabulary V_ep of classes;
2. drawing of one (possibly a few more) training example(s) for each class of V_ep, yielding a training set X_ep;
3. training of a classifier on X_ep.
This episodic procedure also takes place during the test phase to evaluate a model's performance: the objective is therefore not to maximize performance on one particular episode, but on any combination of classes V_ep to be distinguished from one another (a minimal sketch of this protocol is given below).
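
The following Python sketch illustrates this episodic protocol. Everything in it is illustrative rather than taken from the thesis: the dataset is assumed to be a dictionary mapping each class label to a list of fixed-size feature vectors, and a 1-nearest-neighbour rule stands in for the actual classifier of step 3.

import random
import numpy as np

def run_episode(data, n_classes=5, n_test_per_class=3):
    """One one-shot episode: sample V_ep, build X_ep, classify held-out examples."""
    # 1. draw an episode vocabulary V_ep
    vocab = random.sample(list(data.keys()), n_classes)
    prototypes, test_pool = {}, []
    for c in vocab:
        examples = list(data[c])
        random.shuffle(examples)
        # 2. keep one training example per class (the support set X_ep)
        prototypes[c] = np.asarray(examples[0])
        # the remaining examples of this class are reserved for evaluation
        test_pool += [(c, np.asarray(x)) for x in examples[1:1 + n_test_per_class]]
    # 3. stand-in classifier: nearest support example in Euclidean distance
    correct = sum(
        c == min(prototypes, key=lambda k: np.linalg.norm(prototypes[k] - x))
        for c, x in test_pool
    )
    return correct / len(test_pool)

# One-shot performance is estimated by averaging accuracy over many random episodes:
# accuracies = [run_episode(dataset) for _ in range(1000)]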

It should be noted that this episodic test procedure is usually preceded by an upstream training phase that prepares the model and eases the work of step 3, in which the small number of examples limits the amount of available information (2000). For the majority of models, this preparation step is essential for successful recognition, while in practice the episodic data only serves to specialize the model. The literature offers several possible implementations of it, such as transferring knowledge from a model trained on a different task in a related domain. One example of such transfer is the adaptation of verification work from biometrics: for any database compatible with one-shot classification, a verification sub-problem can be built by drawing pairs of examples from randomly selected classes (see the sketch after this paragraph). When the training set is large enough (in number of examples and classes), the model can also be trained directly on episodes, always taking care to keep the categories used for training separate from those reserved for testing. We experiment with these two approaches, as well as with a third solution derived from verification, with the objective of learning gestures taken from sign language: a lexicon of signs recorded on video indeed provides us with a sufficient number of classes.
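
As an illustration of the verification sub-problem mentioned above, the sketch below draws labelled pairs from an arbitrary dataset. The function name, the 50/50 balance between positive and negative pairs, and the data layout are assumptions made for the example, not the exact setup used in our experiments.

import random

def sample_verification_pair(data, train_classes):
    """data: dict mapping class -> list of examples; train_classes: classes
    reserved for this preparation phase (disjoint from the test classes)."""
    if random.random() < 0.5:
        # positive pair: two distinct examples drawn from the same class
        c = random.choice(train_classes)
        x1, x2 = random.sample(data[c], 2)
        return (x1, x2), 1
    # negative pair: one example from each of two different classes
    c1, c2 = random.sample(train_classes, 2)
    return (random.choice(data[c1]), random.choice(data[c2])), 0

# A siamese network can be pre-trained on such pairs; at episode time it then
# compares a query sequence with the single stored example of each new class.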

Most publications on one-shot learning that rely on neural networks use a common strategy: a neural network is trained to generate a…