6.1.2 Evaluating the impact of speech style on quality

Evaluating the robustness of performance prediction systems

Impact of training corpus size on quality

Effect of the quality of the ASR system that generated the training data on the training of the performance prediction systems

Conclusion

Multi-task learning

Conclusion

Thus, the objective will be to evaluate the impact of this information at performance-prediction time, and to minimize the development cost of ASR systems trained on a dialect: generate specific representations and integrate them into our CNN prediction system trained on Arabic data.

M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen et al., TensorFlow : Large-scale machine learning on heterogeneous systems, 2015.

C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, and M. Mohri, Openfst : A general and efficient weighted finite-state transducer library, International Conference on Implementation and Application of Automata, pp.11-23, 2007.

T. Anastasakos, J. Mcdonough, R. Schwartz, and J. Makhoul, A compact model for speaker-adaptive training, Fourth International Conference on, vol.2, pp.1137-1140, 1996.

A. Asadi, R. Schwartz, and J. Makhoul, Automatic detection of new words in a large vocabulary continuous speech recognition system, Proc. of International Conference on Acoustics, Speech and Signal Processing, 1990.

D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate, 2014.

L. Baum, An inequality and associated maximization technique in statistical estimation of probabilistic functions of a markov process, Inequalities, vol.3, pp.1-8, 1972.

F. Béchet, Lia phon : un systeme complet de phonétisation de textes, vol.42, pp.47-67, 2001.

Y. Belinkov and J. Glass, Analyzing hidden representations in end-to-end automatic speech recognition systems, Advances in Neural Information Processing Systems, pp.2438-2448, 2017.

Y. Belinkov, L. Màrquez, H. Sajjad, N. Durrani, F. Dalvi et al., Evaluating layers of representation in neural machine translation on part-of-speech and semantic tagging tasks, Proceedings of the Eighth International Joint Conference on Natural Language Processing, vol.1, pp.1-10, 2017.

J. Bergstra and Y. Bengio, Random search for hyper-parameter optimization, Journal of Machine Learning Research, vol.13, pp.281-305, 2012.

J. Blatz, E. Fitzgerald, G. Foster, S. Gandrabur, C. Goutte et al., Confidence estimation for machine translation, Proceedings of the 20th international conference on Computational Linguistics, p.315, 2004.

P. Breheny, Classification and regression trees, 1984.

S. Brunessaux, P. Giroux, B. Grilheres, M. Manta, M. Bodin et al., The maurdor project : improving automatic processing of digital documents, 11th IAPR International Workshop on Document Analysis Systems (DAS), pp.349-354, 2014.

F. Chollet, , 2015.

R. Collobert and S. Bengio, Links between perceptrons, mlps and svms, Proceedings of the twenty-first international conference on Machine learning, p.23, 2004.

R. Collobert and J. Weston, A unified architecture for natural language processing : Deep neural networks with multitask learning, Proceedings of the 25th international conference on Machine learning, pp.160-167, 2008.

R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu et al., Natural language processing (almost) from scratch, Journal of Machine Learning Research, vol.12, issue.8, pp.2493-2537, 2011.

J. Crego, J. Kim, G. Klein, A. Rebollo, K. Yang et al., Systran's pure neural machine translation systems, 2016.

G. E. Dahl, D. Yu, L. Deng, and A. Acero, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Transactions on audio, speech, and language processing, vol.20, issue.1, pp.30-42, 2012.

W. Dai, C. Dai, S. Qu, J. Li, and S. Das, Very deep convolutional neural networks for raw waveforms, Acoustics, Speech and Signal Processing, pp.421-425, 2017.

S. B. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, Readings in speech recognition, pp.65-74, 1990.

M. De-calmès and G. Pérennou, Bdlex : a lexicon for spoken and written french, Proceedings of 1st International Conference on Langage Resources & Evaluation, pp.1129-1136, 1998.

J. G. de Souza, H. Zamani, M. Negri, M. Turchi, and D. Falavigna, Multitask learning for adaptive quality estimation of automatically transcribed utterances, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.714-724, 2015.

J. G. de Souza, C. Buck, M. Turchi, and M. Negri, FBK-UEdin participation to the WMT13 quality estimation shared task, Proceedings of the Eighth Workshop on Statistical Machine Translation, pp.352-358, 2013.

V. V. Digalakis and L. G. Neumeyer, Speaker adaptation using combined transformation and bayesian methods, IEEE transactions on speech and audio processing, vol.4, issue.4, pp.294-300, 1996.

W. A. Dreschler, H. Verschuure, C. Ludvigsen, and S. Westermann, ICRA noises: artificial noise signals with speech-like spectral and temporal properties for hearing instrument assessment, Audiology, vol.40, issue.3, pp.148-157, 2001.

J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, vol.12, pp.2121-2159, 2011.

F. Eyben, M. Wöllmer, and B. Schuller, Opensmile : The munich versatile and fast open-source audio feature extractor, Proceedings of the 18th ACM International Conference on Multimedia, MM '10, pp.1459-1462, 2010.

Y. Freund and R. E. Schapire, Experiments with a new boosting algorithm, Icml, vol.96, pp.148-156, 1996.

Y. Fu and L. Du, Combination of multiple predictors to improve confidence measure based on local posterior probabilities, Acoustics, Speech, and Signal Processing, vol.1, p.93, 2005.

K. Fukushima, Neocognitron : a self organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological cybernetics, vol.36, issue.4, pp.193-202, 1980.

O. Galibert, Methodologies for the evaluation of speaker diarization and automatic speech recognition in the presence of overlapping speech, INTERSPEECH, pp.1131-1134, 2013.

O. Galibert, S. Rosset, C. Grouin, P. Zweigenbaum, and L. Quintard, Structured and extended named entity evaluation in automatic speech transcriptions, Proceedings of 5th International Joint Conference on Natural Language Processing, pp.518-526, 2011.

S. Galliano, E. Geoffrois, D. Mostefa, K. Choukri, J. Bonastre et al., The ester phase ii evaluation campaign for the rich transcription of french broadcast news, pp.1149-1152, 2005.

J. Gauvain and C. Lee, Maximum a posteriori estimation for multivariate gaussian mixture observations of markov chains, IEEE transactions on speech and audio processing, vol.2, pp.291-298, 1994.

P. Geurts, D. Ernst, and L. Wehenkel, Extremely randomized trees, Machine learning, vol.63, issue.1, pp.3-42, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00341932

I. J. Good, The population frequencies of species and the estimation of population parameters, Biometrika, vol.40, issue.3-4, pp.237-264, 1953.

I. Goodfellow, Y. Bengio, and A. Courville, Deep learning, vol.1, 2016.

A. Graves and N. Jaitly, Towards end-to-end speech recognition with recurrent neural networks, International Conference on Machine Learning, pp.1764-1772, 2014.

A. Graves, A. Mohamed, and G. Hinton, Speech recognition with deep recurrent neural networks, Acoustics, speech and signal processing (icassp), 2013 ieee international conference on, pp.6645-6649, 2013.

G. Gravier, G. Adda, N. Paulson, M. Carré, A. Giraudel et al., The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, LREC - Eighth International Conference on Language Resources and Evaluation, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00712591

T. J. Hazen, S. Seneff, and J. Polifroni, Recognition confidence scoring and its use in speech understanding systems, Computer Speech & Language, vol.16, issue.1, pp.49-67, 2002.

H. Hermansky and L. A. Cox, Perceptual linear predictive (PLP) analysis-resynthesis technique, Second European Conference on Speech Communication and Technology, 1991.

H. Hermansky, E. Variani, and V. Peddinti, Mean temporal distance : Predicting asr error from temporal properties of speech signal, Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pp.7423-7426, 2013.

G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed et al., Deep neural networks for acoustic modeling in speech recognition : The shared views of four research groups, IEEE Signal processing magazine, vol.29, issue.6, pp.82-97, 2012.

D. H. Hubel and T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of physiology, vol.160, issue.1, pp.106-154, 1962.

S. Ioffe and C. Szegedy, Batch normalization : Accelerating deep network training by reducing internal covariate shift, 2015.

S. Jalalvand, D. Falavigna, M. Matassoni, P. Svaizer, and M. Omologo, Boosted acoustic model learning and hypotheses rescoring on the chime-3 task, Automatic Speech Recognition and Understanding (ASRU), pp.409-415, 2015.

S. Jalalvand, M. Negri, D. Falavigna, and M. Turchi, Driving ROVER with segment-based ASR quality estimation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol.1, pp.1095-1105, 2015.

S. Jalalvand, M. Negri, M. Turchi, J. G. Souza, D. Falavigna et al., Transcrater : a tool for automatic speech recognition quality estimation, Proceedings of ACL-2016 System Demonstrations, pp.43-48, 2016.

F. Jelinek, Continuous speech recognition by statistical methods, Proceedings of the IEEE, vol.64, issue.4, pp.532-556, 1976.

F. Jelinek, R. L. Mercer, L. R. Bahl, and J. K. Baker, Perplexity - a measure of the difficulty of speech recognition tasks, The Journal of the Acoustical Society of America, vol.62, issue.S1, p.63, 1977.

J. Kahn, O. Galibert, L. Quintard, M. Carré, A. Giraudel et al., A presentation of the repere challenge, Content-Based Multimedia Indexing (CBMI), 2012 10th International Workshop on, pp.1-6, 2012.

Y. Kim, Convolutional neural networks for sentence classification, 2014.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, CoRR, abs/1412.6980, 2014.

R. Kneser and H. Ney, Improved backing-off for m-gram language modeling, icassp, vol.1, pp.181-185, 1995.

G. Koch, R. Zemel, and R. Salakhutdinov, Siamese neural networks for one-shot image recognition, ICML Deep Learning Workshop, vol.2, 2015.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in neural information processing systems, pp.1097-1105, 2012.

J. Lafferty, A. Mccallum, and F. C. Pereira, Conditional random fields : Probabilistic models for segmenting and labeling sequence data, 2001.

S. Lai, L. Xu, K. Liu, and J. Zhao, Recurrent convolutional neural networks for text classification, AAAI, vol.333, pp.2267-2273, 2015.

B. Lecouteux, G. Linares, and B. Favre, Combined low level and high level features for out-of-vocabulary word detection, 2009.
URL : https://hal.archives-ouvertes.fr/hal-02088875

Y. Lecun, B. E. Boser, J. S. Denker, D. Henderson, R. E. Howard et al., Handwritten digit recognition with a back-propagation network, Advances in neural information processing systems, pp.396-404, 1990.

C. J. Leggetter and P. C. Woodland, Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, Computer Speech & Language, vol.9, pp.171-185, 1995.

L. V. Maaten and G. Hinton, Visualizing data using t-sne, Journal of machine learning research, vol.9, issue.11, pp.2579-2605, 2008.

J. D. Markel and A. J. Gray, Linear prediction of speech, vol.12, 2013.

A. F. Martin and C. S. Greenberg, Nist 2008 speaker recognition evaluation : Performance across telephone and room microphone channels, Tenth Annual Conference of the International Speech Communication Association, 2009.

J. Mauclair, Mesures de confiance en traitement automatique de la parole et applications, 2006.

W. S. Mcculloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, The bulletin of mathematical biophysics, vol.5, issue.4, pp.115-133, 1943.

B. Mcfee, C. Raffel, D. Liang, D. P. Ellis, M. Mcvicar et al., librosa : Audio and music signal analysis in python, 2015.

N. Meinshausen and P. Bühlmann, Stability selection, Journal of the Royal Statistical Society : Series B (Statistical Methodology), vol.72, issue.4, pp.417-473, 2010.

E. Mendoza, N. Valencia, J. Muñoz, and H. Trujillo, Differences in voice quality between men and women : use of the long-term average spectrum (ltas), Journal of Voice, vol.10, issue.1, pp.59-66, 1996.

B. T. Meyer, S. H. Mallidi, H. Kayser, and H. Hermansky, Predicting error rates for unknown data in automatic speech recognition, Acoustics, Speech and Signal Processing, pp.5330-5334, 2017.

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, Advances in neural information processing systems, pp.3111-3119, 2013.

A. Mohamed, G. Hinton, and G. Penn, Understanding how deep belief networks perform acoustic modelling, Acoustics, Speech and Signal Processing, pp.4273-4276, 2012.

P. J. Moreno, B. Logan, and B. Raj, A boosting approach for confidence scoring, Seventh European Conference on Speech Communication and Technology, 2001.

M. Negri, M. Turchi, J. G. De-souza, and D. Falavigna, Quality estimation for automatic speech recognition, COLING, pp.1813-1823, 2014.

D. Palaz, M. M. Doss, and R. Collobert, Convolutional neural networks-based continuous speech recognition using raw speech signal, Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, pp.4295-4299, 2015.

J. Pennington, R. Socher, and C. Manning, Glove : Global vectors for word representation, Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp.1532-1543, 2014.

G. Perennou and M. D. Calmes, Bdlex lexical data and knowledge base of spoken and written french, European conference on Speech Technology, 1987.

D. Povey, L. Burget, M. Agarwal, P. Akyazi, F. Kai et al., The subspace gaussian mixture model structured model for speech recognition, Computer Speech & Language, vol.25, issue.2, pp.404-439, 2011.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The kaldi speech recognition toolkit, IEEE 2011 workshop on automatic speech recognition and understanding, 2011.

C. Quirk, Training a sentence-level machine translation confidence measure, LREC. Citeseer, 2004.

L. R. Rabiner, A tutorial on hidden markov models and selected applications in speech recognition, Proceedings of the IEEE, vol.77, issue.2, pp.257-286, 1989.

F. Rosenblatt, The perceptron : a probabilistic model for information storage and organization in the brain, Psychological review, vol.65, issue.6, p.386, 1958.

R. San-segundo, B. Pellom, K. Hacioglu, W. Ward, and J. M. Pardo, Confidence measures for spoken dialogue systems, Acoustics, Speech, and Signal Processing, vol.1, pp.393-396, 2001.

R. Sarikaya, Y. Gao, M. Picheny, and H. Erdogan, Semantic confidence measurement for spoken dialog systems, IEEE Transactions on Speech and Audio Processing, vol.13, issue.4, pp.534-545, 2005.

H. Schmid, TreeTagger: a language independent part-of-speech tagger, vol.43, p.28, 1995.

S. Ferreira, J. Farinas, J. Pinquier, and S. Rabant, Prédiction a priori de la qualité de la transcription automatique de la parole bruitée, Proc. XXXIIe Journées d'Études sur la Parole, pp.249-257, 2018.

F. Seide, G. Li, and D. Yu, Conversational speech transcription using context-dependent deep neural networks, Twelfth Annual Conference of the International Speech Communication Association, 2011.

X. Shi, I. Padhi, and K. Knight, Does string-based neural mt learn source syntax ?, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp.1526-1534, 2016.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout : a simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

G. Stemmer, S. Steidl, E. Nöth, H. Niemann, and A. Batliner, Comparison and combination of confidence measures, International Conference on Text, Speech and Dialogue, pp.181-188, 2002.

A. Stolcke, SRILM - an extensible language modeling toolkit, Seventh International Conference on Spoken Language Processing, pp.901-904, 2002.

I. Sutskever, J. Martens, G. Dahl, and G. Hinton, On the importance of initialization and momentum in deep learning, International conference on machine learning, pp.1139-1147, 2013.

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, Advances in neural information processing systems, pp.3104-3112, 2014.

N. Ueffing and H. Ney, Word-level confidence estimation for machine translation, Computational Linguistics, vol.33, issue.1, pp.9-40, 2007.

V. Vapnik, Estimation of dependences based on empirical data, 2006.

S. Wang, Y. Qian, and K. Yu, What does the speaker embedding encode?, Interspeech, pp.1497-1501, 2017.

P. Wiggers and L. J. Rothkrantz, Using confidence measures and domain knowledge to improve speech recognition, Eighth European Conference on Speech Communication and Technology, 2003.

C. J. Willmott and K. Matsuura, Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance, Climate Research, vol.30, pp.79-82, 2005.

I. H. Witten and T. C. Bell, The zero-frequency problem : Estimating the probabilities of novel events in adaptive text compression, Ieee transactions on information theory, vol.37, issue.4, pp.1085-1094, 1991.

Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi et al., Google's neural machine translation system : Bridging the gap between human and machine translation, 2016.

Z. Wu and S. King, Investigating gated recurrent neural networks for speech synthesis, 2016.

S. R. Young, Recognition confidence measures : Detection of misrecognitions and out-of-vocabulary words, Proc. of International Conference on Acoustics, Speech and Signal Processing, vol.1, pp.21-24, 1994.

M. D. Zeiler, ADADELTA: an adaptive learning rate method, CoRR, abs/1212.5701, 2012.

R. Zhang and A. I. Rudnicky, Word level confidence annotation using combinations of features, Seventh European Conference on Speech Communication and Technology, 2001.

Annexes

Table 9 - Examples of WER predicted at the speech-turn level by the best prediction systems: TranscRater vs. CNN Softmax

Table 10 - Examples of WER predicted at the speech-turn level by the best prediction systems: TranscRater vs. CNN Softmax EMBED+RAW-SIG