D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, A Learning Algorithm for Boltzmann Machines, Cognitive Science, vol.9, issue.1, pp.147-169, 1985.
DOI : 10.1207/s15516709cog0901_7

J. Allen, Short term spectral analysis, synthesis, and modification by discrete Fourier transform, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.25, issue.3, pp.235-238, 1977.
DOI : 10.1109/TASSP.1977.1162950

X. Anguera, C. Wooters, and J. Hernando, Acoustic Beamforming for Speaker Diarization of Meetings, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.7, pp.2011-2022, 2007.
DOI : 10.1109/TASL.2007.902460

S. Araki, T. Hayashi, M. Delcroix, M. Fujimoto, K. Takeda et al., Exploring multi-channel features for denoising-autoencoder-based speech enhancement, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.116-120, 2015.
DOI : 10.1109/ICASSP.2015.7177943

S. Araki and T. Nakatani, Hybrid approach for multichannel source separation combining time-frequency mask with multi-channel Wiener filter, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.225-228, 2011.
DOI : 10.1109/ICASSP.2011.5946381

B. S. Atal, Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification, The Journal of the Acoustical Society of America, vol.55, issue.6, pp.1304-1312, 1974.
DOI : 10.1121/1.1914702

R. Badeau and T. Virtanen, Nonnegative matrix factorization, Audio Source Separation and Speech Enhancement chapter, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01170924

L. Bahl, P. Brown, P. de Souza, and R. Mercer, Maximum mutual information estimation of hidden Markov model parameters for speech recognition, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.49-52, 1986.
DOI : 10.1109/ICASSP.1986.1169179

J. Barker, Missing-Data Techniques: Recognition with Incomplete Spectrograms, Techniques for Noise Robustness in Automatic Speech Recognition chapter, 2012.
DOI : 10.1109/TPAMI.2007.52

J. Barker, R. Marxer, E. Vincent, and S. Watanabe, The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.504-511, 2015.
DOI : 10.1109/ASRU.2015.7404837

URL : https://hal.archives-ouvertes.fr/hal-01211376

J. Benesty, J. Chen, and Y. Huang, Microphone Array Signal Processing, 2008.

J. Benesty, S. Makino, and J. Chen, Speech Enhancement, 2005.

E. Benetos, S. Dixon, D. Giannoulis, H. Kirchhoff, and A. Klapuri, Automatic music transcription: challenges and future directions, Journal of Intelligent Information Systems, vol.41, issue.3, pp.407-434, 2013.
DOI : 10.1007/s10844-013-0258-3

URL : http://openaccess.city.ac.uk/2524/1/JIIS-MIRrors-AMT-postprint.pdf

Y. Bengio, Practical Recommendations for Gradient-Based Training of Deep Architectures, Neural Networks: Tricks of the Trade, pp.437-478, 2012.
DOI : 10.1007/978-3-642-35289-8_26

Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, Greedy layer-wise training of deep networks, Proceedings of the Conference on Neural Information Processing Systems (NIPS), pp.153-160, 2006.

J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu et al., Theano: a CPU and GPU math expression compiler, Proceedings of the Python for Scientific Computing Conference (SciPy), 2010.

N. Bertin, C. Févotte, and R. Badeau, A tempering approach for Itakura-Saito non-negative matrix factorization. With application to music transcription, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.1545-1548, 2009.
DOI : 10.1109/ICASSP.2009.4959891

URL : https://hal.archives-ouvertes.fr/hal-00945283

C. Blandin, A. Ozerov, and E. Vincent, Multi-source TDOA estimation in reverberant audio using angular spectra and clustering, Signal Processing, vol.92, issue.8, pp.1950-1960, 2012.
DOI : 10.1016/j.sigpro.2011.09.032

URL : https://hal.archives-ouvertes.fr/inria-00576297

P. Bofill and M. Zibulevsky, Underdetermined blind source separation using sparse representations, Signal Processing, vol.81, issue.11, pp.2353-2362, 2001.
DOI : 10.1016/S0165-1684(01)00120-7

URL : http://iew3.technion.ac.il/~mcib/undetermICA.pdf

S. Boll, Suppression of acoustic noise in speech using spectral subtraction, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.27, issue.2, pp.113-120, 1979.
DOI : 10.1109/TASSP.1979.1163209

H. Bourlard and Y. Kamp, Auto-association by multilayer perceptrons and singular value decomposition, Biological Cybernetics, vol.59, issue.4-5, pp.291-294, 1988.
DOI : 10.1007/BF00332918

URL : https://infoscience.epfl.ch/record/82601/files/rr00-16.pdf

M. Brandstein and D. Ward, Eds., Microphone Arrays: Signal Processing Techniques and Applications, 2001.

G. J. Brown and M. Cooke, Computational auditory scene analysis, Computer Speech & Language, vol.8, issue.4, pp.297-336, 1994.
DOI : 10.1006/csla.1994.1016

G. J. Brown and D. Wang, Separation of Speech by Computational Auditory Scene Analysis, Speech Enhancement chapter 16, pp.371-402, 2005.
DOI : 10.1007/3-540-27489-8_16

E. Cano, D. Fitzgerald, and K. Brandenburg, Evaluation of quality of sound source separation algorithms: Human perception vs quantitative metrics, 2016 24th European Signal Processing Conference (EUSIPCO), pp.1758-1762, 2016.
DOI : 10.1109/EUSIPCO.2016.7760550

M. Cartwright, B. Pardo, G. J. Mysore, and M. Hoffman, Fast and easy crowdsourced perceptual audio evaluation, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.619-623, 2016.
DOI : 10.1109/ICASSP.2016.7471749

S. F. Chen and J. Goodman, An empirical study of smoothing techniques for language modeling, Proceedings of the Annual Meeting of the Association for Computational Linguistics, pp.310-318, 1996.

F. Chollet, Keras. https, 2015.

J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, arXiv e-prints, 2014.

P. Comon, Independent component analysis, a new concept?, Signal Processing, pp.287-314, 1994.

B. Cornelis, M. Moonen, and J. Wouters, Performance Analysis of Multichannel Wiener Filter-Based Noise Reduction in Hearing Aids Under Second Order Statistics Estimation Errors, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.5, pp.1368-1381, 2011.
DOI : 10.1109/TASL.2010.2090519

R. Crochiere, A weighted overlap-add method of short-time Fourier analysis/Synthesis, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.28, issue.1, pp.99-102, 1980.
DOI : 10.1109/TASSP.1980.1163353

S. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.28, issue.4, pp.357-366, 1980.
DOI : 10.1109/TASSP.1980.1163420

M. Delfarah and D. Wang, Features for Masking-Based Monaural Speech Separation in Reverberant Conditions, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.5, pp.1085-1094, 2017.
DOI : 10.1109/TASLP.2017.2687829

L. Deng, Deep Learning: Methods and Applications, Foundations and Trends® in Signal Processing, vol.7, issue.3-4, pp.197-387, 2014.
DOI : 10.1561/2000000039

URL : http://research.microsoft.com/pubs/209355/DeepLearning-NowPublishing-Vol7-SIG-039.pdf

L. Deng, A. Acero, L. Jiang, J. Droppo, and X. Huang, High-performance robust speech recognition using stereo training data, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), pp.301-304, 2001.
DOI : 10.1109/ICASSP.2001.940827

L. Deng and D. O'Shaughnessy, Speech Processing: A Dynamic and Optimization-Oriented Approach, 2003.

S. Doclo, W. Kellermann, S. Makino, and S. E. Nordholm, Multichannel Signal Enhancement Algorithms for Assisted Listening Devices: Exploiting spatial diversity using multiple microphones, IEEE Signal Processing Magazine, vol.32, issue.2, pp.18-30, 2015.
DOI : 10.1109/MSP.2014.2366780

C. S. Doire, M. Brookes, P. A. Naylor, C. M. Hicks, D. Betts et al., Single-Channel Online Enhancement of Speech Corrupted by Reverberation and Noise, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.3, pp.572-587, 2017.
DOI : 10.1109/TASLP.2016.2641904

J. Droppo, A. Acero, and L. Deng, Uncertainty decoding with SPLICE for noise robust speech recognition, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.57-60, 2002.

J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, vol.12, pp.2121-2159, 2011.

C. Dugas, Y. Bengio, F. Bélisle, C. Nadeau, and R. Garcia, Incorporating second-order functional knowledge for better option pricing, Proceedings of the Conference on Neural Information Processing Systems (NIPS), pp.472-478, 2000.

N. Q. Duong, H. Tachibana, E. Vincent, N. Ono, R. Gribonval et al., Multichannel harmonic and percussive component separation by joint modeling of spatial and spectral continuity, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.205-208, 2011.
DOI : 10.1109/ICASSP.2011.5946376

URL : https://hal.archives-ouvertes.fr/inria-00557145

N. Q. Duong, E. Vincent, and R. Gribonval, Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.7, pp.1830-1840, 2010.
DOI : 10.1109/TASL.2010.2050716

URL : https://hal.archives-ouvertes.fr/inria-00435807

N. Q. Duong, E. Vincent, and R. Gribonval, Under-Determined Reverberant Audio Source Separation Using Local Observed Covariance and Auditory-Motivated Time-Frequency Representation, Proceedings of the International Conference on Latent Variable Analysis and Signal Separation, pp.73-80, 2010.
DOI : 10.1007/978-3-642-15995-4_10

URL : https://hal.archives-ouvertes.fr/inria-00541868

J. L. Elman, Finding Structure in Time, Cognitive Science, vol.14, issue.2, pp.179-211, 1990.
DOI : 10.1207/s15516709cog1402_1

V. Emiya, E. Vincent, N. Harlander, and V. Hohmann, Subjective and Objective Quality Assessment of Audio Source Separation, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.7, pp.2046-2057, 2011.
DOI : 10.1109/TASL.2011.2109381

URL : https://hal.archives-ouvertes.fr/inria-00485729

Y. Ephraim and D. Malah, Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.32, issue.6, pp.1109-1121, 1984.
DOI : 10.1109/TASSP.1984.1164453

H. Erdogan, J. R. Hershey, S. Watanabe, and J. Le Roux, Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.708-712, 2015.
DOI : 10.1109/ICASSP.2015.7178061

H. Erdogan, J. R. Hershey, S. Watanabe, M. I. Mandel, and J. Le Roux, Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks, Interspeech 2016, pp.1981-1985, 2016.
DOI : 10.21437/Interspeech.2016-552

S. Ewert, B. Pardo, M. Mueller, and M. D. Plumbley, Score-Informed Source Separation for Musical Audio Recordings: An overview, IEEE Signal Processing Magazine, vol.31, issue.3, pp.116-124, 2014.
DOI : 10.1109/MSP.2013.2296076

C. Févotte, N. Bertin, and J. Durrieu, Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis, Neural Computation, vol.21, issue.3, pp.793-830, 2009.
DOI : 10.1162/neco.2008.04-08-771

C. Févotte and J. Idier, Algorithms for Nonnegative Matrix Factorization with the β-Divergence, Neural Computation, vol.23, issue.9, pp.2421-2456, 2011.
DOI : 10.1162/NECO_a_00168

C. Févotte and A. Ozerov, Notes on Nonnegative Tensor Factorization of the Spectrogram for Audio Source Separation: Statistical Insights and Towards Self-Clustering of the Spatial Cues, Proceedings of International Symposium on Computer Music Modeling and Retrieval, pp.102-115, 2010.
DOI : 10.1109/TASL.2006.885253

J. G. Fiscus, A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER), 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings, pp.347-354, 1997.
DOI : 10.1109/ASRU.1997.659110

B. J. Frey, T. T. Kristjansson, L. Deng, and A. Acero, Algonquin - learning dynamic noise models from noisy speech for robust speech recognition, Proceedings of the Conference on Neural Information Processing Systems (NIPS), pp.1165-1171, 2001.

M. Fujimoto and T. Nakatani, Multi-pass feature enhancement based on generative-discriminative hybrid approach for noise robust speech recognition, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5750-5754, 2016.
DOI : 10.1109/ICASSP.2016.7472779

K. Fukushima, Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, vol.36, issue.4, pp.193-202, 1980.
DOI : 10.1007/BF00344251

M. Gales and S. Young, The Application of Hidden Markov Models in Speech Recognition, Foundations and Trends® in Signal Processing, vol.1, issue.3, pp.195-304, 2008.
DOI : 10.1561/2000000004

M. J. Gales, Maximum likelihood linear transformations for HMM-based speech recognition, Computer Speech & Language, vol.12, issue.2, pp.75-98, 1998.
DOI : 10.1006/csla.1998.0043

URL : http://svr-www.eng.cam.ac.uk/~mjfg/lintran_CSL.ps.gz

S. Gannot, E. Vincent, S. Markovich-Golan, and A. Ozerov, A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.4, 2017.
DOI : 10.1109/TASLP.2016.2647702

URL : https://hal.archives-ouvertes.fr/hal-01414179

T. Gerkmann and E. Vincent, Spectral masking and filtering, Audio Source Separation and Speech Enhancement chapter, 2017.

L. Gillick and S. J. Cox, Some statistical issues in the comparison of speech recognition algorithms, International Conference on Acoustics, Speech, and Signal Processing, pp.532-535, 1989.
DOI : 10.1109/ICASSP.1989.266481

B. R. Glasberg and B. C. Moore, Derivation of auditory filter shapes from notched-noise data, Hearing Research, vol.47, issue.1-2, pp.103-138, 1990.
DOI : 10.1016/0378-5955(90)90170-T

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS), pp.249-256, 2010.

X. Glorot, A. Bordes, and Y. Bengio, Deep sparse rectifier networks, Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS), pp.315-323, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00752497

V. Goel and W. J. Byrne, Minimum Bayes-risk automatic speech recognition, Computer Speech & Language, vol.14, issue.2, pp.115-135, 2000.
DOI : 10.1006/csla.2000.0138

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.

R. A. Gopinath, Maximum likelihood modeling with Gaussian distributions for classification, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), pp.661-664, 1998.
DOI : 10.1109/ICASSP.1998.675351

URL : http://www.research.ibm.com/people/r/rameshg/gopinath-icassp98.ps

E. M. Grais, G. Roma, A. J. Simpson, and M. D. Plumbley, Combining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural Networks, Interspeech 2016, 2016.
DOI : 10.21437/Interspeech.2016-216

E. M. Grais, G. Roma, A. J. Simpson, and M. D. Plumbley, Two-Stage Single-Channel Audio Source Separation Using Deep Neural Networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.9, pp.1469-1479, 2017.
DOI : 10.1109/TASLP.2017.2716443

URL : http://epubs.surrey.ac.uk/841432/1/Two%20Stage%20Single%20Channel%20Audio%20Source%20Separation%20using%20Deep%20Neural%20Networks.pdf

K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, LSTM: A Search Space Odyssey, IEEE Transactions on Neural Networks and Learning Systems, vol.28, issue.10, pp.1-11, 2017.
DOI : 10.1109/TNNLS.2016.2582924

D. Griffin and J. Lim, Signal estimation from modified short-time Fourier transform, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.32, issue.2, pp.236-243, 1984.
DOI : 10.1109/TASSP.1984.1164317

URL : http://hil.t.u-tokyo.ac.jp/~kameoka/SAP/papers/Griffin1984__Signal_Estimation_from_Modified_Short-Time_Fourier_Transform.pdf

C. Gulcehre, M. Moczulski, M. Denil, and Y. Bengio, Noisy activation functions, Proceedings of the International Conference on Machine Learning (ICML), pp.3059-3068, 2016.

M. B. Gur and C. Niezrecki, A source separation approach to enhancing marine mammal vocalizations, The Journal of the Acoustical Society of America, vol.126, issue.6, pp.3062-3070, 2009.
DOI : 10.1121/1.3257549

K. He, X. Zhang, S. Ren, and J. Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.123

URL : http://arxiv.org/pdf/1502.01852

T. Heittola, A. Mesaros, T. Virtanen, and M. Gabbouj, Supervised model training for overlapping sound events based on unsupervised source separation, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.8677-8681, 2013.
DOI : 10.1109/ICASSP.2013.6639360

URL : http://www.cs.tut.fi/~moncef/publications/supervised-model-ICASSP-2013.pdf

M. Herdin, N. Czink, H. Ozcelik, and E. Bonek, Correlation Matrix Distance, a Meaningful Measure for Evaluation of Non-Stationary MIMO Channels, 2005 IEEE 61st Vehicular Technology Conference, pp.136-140, 2005.
DOI : 10.1109/VETECS.2005.1543265

H. Hermansky, Perceptual linear predictive (PLP) analysis of speech, The Journal of the Acoustical Society of America, vol.87, issue.4, pp.1738-1752, 1990.
DOI : 10.1121/1.399423

J. Hershey, Z. Chen, J. Le Roux, and S. Watanabe, Deep clustering: Discriminative embeddings for segmentation and separation, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.31-35, 2016.
DOI : 10.1109/ICASSP.2016.7471631

URL : http://arxiv.org/pdf/1508.04306

J. Heymann, L. Drude, and R. Haeb-Umbach, Neural network based spectral mask estimation for acoustic beamforming, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.196-200, 2016.
DOI : 10.1109/ICASSP.2016.7471664

J. Heymann, L. Drude, and R. Haeb-Umbach, A generic neural acoustic beamforming architecture for robust multi-channel speech processing, Computer Speech & Language, vol.46, pp.374-385, 2017.
DOI : 10.1016/j.csl.2016.11.007

I. Himawan, P. Motlicek, D. Imseng, B. Potard, N. Kim et al., Learning feature mapping using deep neural network bottleneck features for distant large vocabulary speech recognition, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4540-4544, 2015.
DOI : 10.1109/ICASSP.2015.7178830

G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed et al., Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012.
DOI : 10.1109/MSP.2012.2205597

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
DOI : 10.1162/neco.1997.9.8.1735

T. Hori, Z. Chen, H. Erdogan, J. R. Hershey, J. Le Roux et al., The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.475-481, 2015.
DOI : 10.1109/ASRU.2015.7404833

P. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, Deep learning for monaural speech separation, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.1562-1566, 2014.
DOI : 10.1109/ICASSP.2014.6853860

URL : http://www.isle.illinois.edu/sst/pubs/2014/huang14icassp.pdf

P. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, Singing-voice separation from monaural recordings using deep recurrent neural networks, Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pp.477-482, 2014.

P. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.23, issue.12, pp.2136-2147, 2015.
DOI : 10.1109/TASLP.2015.2468583

C. Hummersone, T. Stokes, and T. Brookes, On the Ideal Ratio Mask as the Goal of Computational Auditory Scene Analysis, Blind Source Separation: Advances in Theory, Algorithms and Applications, pp.349-368, 2014.
DOI : 10.1007/978-3-642-55016-4_12

A. Hyvärinen and E. Oja, Independent component analysis: algorithms and applications, Neural Networks, vol.13, issue.4-5, pp.411-430, 2000.
DOI : 10.1016/S0893-6080(00)00026-5

T. Ishii, H. Komiyama, T. Shinozaki, Y. Horiuchi, and S. Kuroiwa, Reverberant speech recognition based on denoising autoencoder, Proceedings of INTERSPEECH, pp.3512-3516, 2013.

F. Itakura and S. Saito, Analysis synthesis telephony based on the maximum likelihood method, Proceedings of International Congress on Acoustics, pp.17-20, 1968.

A. J. Izenman, Linear Discriminant Analysis, Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning chapter 8, pp.237-280, 2008.
DOI : 10.1007/978-0-387-78189-1_8

X. Jaureguiberry, E. Vincent, and G. Richard, Fusion Methods for Speech Enhancement and Audio Source Separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.7, pp.1266-1279, 2016.
DOI : 10.1109/TASLP.2016.2553441

URL : https://hal.archives-ouvertes.fr/hal-01120685

J. Jensen and R. C. Hendriks, Spectral Magnitude Minimum Mean-Square Error Estimation Using Binary and Continuous Gain Functions, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.1, pp.92-102, 2012.
DOI : 10.1109/TASL.2011.2157685

URL : http://dmirlab.tudelft.nl/sites/default/files/05773481-1.pdf

Y. Jiang, D. Wang, R. Liu, and Z. Feng, Binaural Classification for Reverberant Speech Segregation Using Deep Neural Networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.12, pp.2112-2121, 2014.
DOI : 10.1109/TASLP.2014.2361023

A. Jourjine, S. Rickard, and O. Yilmaz, Blind separation of disjoint orthogonal signals: demixing N sources from 2 mixtures, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), pp.2985-2988, 2000.
DOI : 10.1109/ICASSP.2000.861162

R. Jozefowicz, W. Zaremba, and I. Sutskever, An empirical exploration of recurrent network architectures, Proceedings of International Conference on Machine Learning (ICML), pp.2342-2350, 2015.

B. H. Juang, Deep neural networks - a developmental perspective, APSIPA Transactions on Signal and Information Processing, vol.14, 2016.
DOI : 10.1109/TIT.1987.1057328

T. G. Kang, K. Kwon, J. W. Shin, and N. S. Kim, NMF-based Target Source Separation Using Deep Neural Network, IEEE Signal Processing Letters, vol.22, issue.2, pp.229-233, 2015.
DOI : 10.1109/LSP.2014.2354456

P. Karanasou, C. Wu, M. Gales, and P. C. Woodland, I-Vectors and Structured Neural Networks for Rapid Adaptation of Acoustic Models, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.4, pp.818-828, 2017.
DOI : 10.1109/TASLP.2017.2670141

T. Kim, H. T. Attias, S. Y. Lee, and T. W. Lee, Blind Source Separation Exploiting Higher-Order Frequency Dependencies, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.1, pp.70-79, 2007.
DOI : 10.1109/TASL.2006.872618

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv e-prints, 2014.

C. Knapp and G. Carter, The generalized correlation method for estimation of time delay, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.24, issue.4, pp.320-327, 1976.
DOI : 10.1109/TASSP.1976.1162830

R. Kneser and H. Ney, Improved backing-off for M-gram language modeling, 1995 International Conference on Acoustics, Speech, and Signal Processing, pp.181-184, 1995.
DOI : 10.1109/ICASSP.1995.479394

R. Koning, N. Madhu, and J. Wouters, Ideal Time–Frequency Masking Algorithms Lead to Different Speech Intelligibility and Quality in Normal-Hearing and Cochlear Implant Listeners, IEEE Transactions on Biomedical Engineering, vol.62, issue.1, pp.331-341, 2015.
DOI : 10.1109/TBME.2014.2351854

D. Kounades-Bastian, L. Girin, X. Alameda-Pineda, S. Gannot, and R. Horaud, A Variational EM Algorithm for the Separation of Time-Varying Convolutive Audio Mixtures, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.8, pp.1408-1423, 2016.
DOI : 10.1109/TASLP.2016.2554286

URL : https://hal.archives-ouvertes.fr/hal-01301762

S. Kullback and R. A. Leibler, On Information and Sufficiency, The Annals of Mathematical Statistics, vol.22, issue.1, pp.79-86, 1951.
DOI : 10.1214/aoms/1177729694

K. Kumatani, J. McDonough, and B. Raj, Microphone Array Processing for Distant Speech Recognition: From Close-Talking Microphones to Far-Field Sensors, IEEE Signal Processing Magazine, vol.29, issue.6, pp.127-140, 2012.
DOI : 10.1109/MSP.2012.2205285

URL : http://www.lsv.uni-saarland.de/personalPages/kkumatani/pubdata/apsipa2012b.pdf

H. Kuttruff, Room Acoustics, 2014.

J. Le Roux, J. R. Hershey, and F. Weninger, Deep NMF for speech separation, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.66-70, 2015.

Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.
DOI : 10.1038/nature14539

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, pp.2278-2324, 1998.
DOI : 10.1109/5.726791

D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol.401, issue.6755, pp.788-791, 1999.

D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, Proceedings of the Conference on Neural Information Processing Systems, pp.556-562, 2000.

I. Lee, T. Kim, and T. Lee, Fast fixed-point independent vector analysis algorithms for convolutive blind source separation, Signal Processing, vol.87, issue.8, pp.1859-1871, 2007.
DOI : 10.1016/j.sigpro.2007.01.010

A. Lefèvre, F. Bach, and C. Févotte, Online algorithms for nonnegative matrix factorization with the Itakura-Saito divergence, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp.313-316, 2011.
DOI : 10.1109/ASPAA.2011.6082314

V. I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals, Soviet Physics Doklady, vol.10, issue.8, pp.707-710, 1966.

H. Li, S. Nie, X. Zhang, and H. Zhang, Jointly Optimizing Activation Coefficients of Convolutive NMF Using DNN for Speech Separation, Interspeech 2016, pp.550-554, 2016.
DOI : 10.21437/Interspeech.2016-120

J. Li, L. Deng, Y. Gong, and R. Haeb-Umbach, An Overview of Noise-Robust Automatic Speech Recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.4, pp.745-777, 2014.
DOI : 10.1109/TASLP.2014.2304637

H. Liao and M. J. Gales, Joint uncertainty decoding for noise robust speech recognition, Proceedings of INTERSPEECH, pp.3129-3132, 2005.

J. S. Lim and A. V. Oppenheim, Enhancement and bandwidth compression of noisy speech, Proceedings of the IEEE, pp.1586-1604, 1979.

R. Lippmann, E. Martin, and D. Paul, Multi-style training for robust isolated-word speech recognition, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.705-708, 1987.
DOI : 10.1109/ICASSP.1987.1169544

D. Liu, P. Smaragdis, and M. Kim, Experiments on deep learning for speech denoising, Proceedings of INTERSPEECH, pp.2685-2688, 2014.

A. Liutkus and R. Badeau, Generalized Wiener filtering with fractional power spectrograms, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.266-270, 2015.
DOI : 10.1109/ICASSP.2015.7177973

URL : https://hal.archives-ouvertes.fr/hal-01110028

A. Liutkus, D. Fitzgerald, and R. Badeau, Cauchy nonnegative matrix factorization, 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp.1-5, 2015.
DOI : 10.1109/WASPAA.2015.7336900

URL : https://hal.archives-ouvertes.fr/hal-01170924

A. Liutkus, D. Fitzgerald, and Z. Rafii, Scalable audio separation with light Kernel Additive Modelling, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.76-80, 2015.
DOI : 10.1109/ICASSP.2015.7177935

URL : https://hal.archives-ouvertes.fr/hal-01114890

A. Liutkus, D. Fitzgerald, Z. Rafii, B. Pardo, and L. Daudet, Kernel Additive Models for Source Separation, IEEE Transactions on Signal Processing, vol.62, issue.16, pp.4298-4310, 2014.
DOI : 10.1109/TSP.2014.2332434

URL : https://hal.archives-ouvertes.fr/hal-01011044

A. Liutkus and P. Leveau, Separation of music+effects sound track from several international versions of the same movie, Proceedings of Audio Engineering Society (AES) Convention, pp.1-15, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00959108

A. Liutkus, F. Stöter, Z. Rafii, D. Kitamura, B. Rivet et al., The 2016 Signal Separation Evaluation Campaign, Proceedings of the International Conference on Latent Variable Analysis and Signal Separation, pp.323-332, 2017.
DOI : 10.1109/EUSIPCO.2016.7760551

URL : https://hal.archives-ouvertes.fr/hal-01472932

B. Loesch and B. Yang, Adaptive Segmentation and Separation of Determined Convolutive Mixtures under Dynamic Conditions, Proceedings of the International Conference on Latent Variable Analysis and Signal Separation, pp.41-48, 2010.
DOI : 10.1007/978-3-642-15995-4_6

P. C. Loizou, Speech Enhancement: Theory and Practice, 2007.

N. Madhu, A. Spriet, S. Jansen, R. Koning, and J. Wouters, The Potential for Speech Intelligibility Improvement Using the Ideal Binary Mask and the Ideal Wiener Filter in Single Channel Noise Reduction Systems: Application to Auditory Prostheses, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.1, pp.63-72, 2013.
DOI : 10.1109/TASL.2012.2213248

S. Makino, H. Sawada, and T.-W. Lee, Blind Speech Separation, 2007.
DOI : 10.1007/978-1-4020-6479-1

S. Markovich-Golan, A. Bertrand, M. Moonen, and S. Gannot, Optimal distributed minimum-variance beamforming approaches for speech enhancement in wireless acoustic sensor networks, Signal Processing, vol.107, pp.4-20, 2015.
DOI : 10.1016/j.sigpro.2014.07.014

S. Markovich-Golan, W. Kellermann, and S. Gannot, Spatial filtering, Audio Source Separation and Speech Enhancement chapter 10, 2017.

W. S. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, The Bulletin of Mathematical Biophysics, vol.5, issue.4, pp.115-133, 1943.
DOI : 10.1007/BF02478259

R. McGill, J. W. Tukey, and W. A. Larsen, Variations of box plots, The American Statistician, pp.12-16, 1978.

Y. Miao, H. Zhang, and F. Metze, Speaker Adaptive Training of Deep Neural Network Acoustic Models Using I-Vectors, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.23, issue.11, pp.1938-1949, 2015.
DOI : 10.1109/TASLP.2015.2457612

T. Mikolov, M. Karafiát, L. Burget, J. Cernocký, and S. Khudanpur, Recurrent neural network based language model, Proceedings of INTERSPEECH, pp.1045-1048, 2010.

D. J. Montana and L. Davis, Training feedforward neural networks using genetic algorithms, Proceedings of International Joint Conferences on Artificial Intelligence (IJCAI), pp.762-767, 1989.

G. J. Mysore and P. Smaragdis, A non-negative approach to semi-supervised separation of speech from noise with the use of temporal dynamics, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.17-20, 2011.
DOI : 10.1109/ICASSP.2011.5946317

URL : https://hal.archives-ouvertes.fr/hal-01084331

G. R. Naik and W. Wang, Blind Source Separation: Advances in Theory, Algorithms and Applications, 2014.
DOI : 10.1007/978-3-642-55016-4

V. Nair and G. E. Hinton, Rectified linear units improve restricted boltzmann machines, Proceedings of the International Conference on Machine Learning (ICML), pp.807-814, 2010.

T. Nakatani, S. Araki, T. Yoshioka, M. Delcroix, and M. Fujimoto, Dominance Based Integration of Spatial and Spectral Features for Speech Enhancement, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.12, pp.2516-2531, 2013.
DOI : 10.1109/TASL.2013.2277937

A. Narayanan and D. Wang, Ideal ratio mask estimation using deep neural networks for robust speech recognition, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.7092-7096, 2013.
DOI : 10.1109/ICASSP.2013.6639038

A. Narayanan and D. Wang, Improving robustness of deep neural network acoustic models via speech separation and joint adaptive training, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.23, issue.1, pp.92-101, 2015.
DOI : 10.1109/TASLP.2014.2372314

Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, vol.27, issue.2, pp.372-376, 1983.

A. A. Nugraha, A. Liutkus, and E. Vincent, Multichannel Audio Source Separation With Deep Neural Networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.9, pp.1652-1664, 2016.
DOI : 10.1109/TASLP.2016.2580946

URL : https://hal.archives-ouvertes.fr/hal-01163369

A. A. Nugraha, A. Liutkus, and E. Vincent, Multichannel music separation with deep neural networks, 2016 24th European Signal Processing Conference (EUSIPCO), pp.1748-1752, 2016.
DOI : 10.1109/EUSIPCO.2016.7760548

URL : https://hal.archives-ouvertes.fr/hal-01334614

A. A. Nugraha, K. Yamamoto, and S. Nakagawa, Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition, EURASIP Journal on Audio, Speech, and Music Processing, vol.2014, issue.13, 2014.
DOI : 10.1186/1687-4722-2014-13

URL : https://asmp-eurasipjournals.springeropen.com/track/pdf/10.1186/1687-4722-2014-13?site=asmp-eurasipjournals.springeropen.com

K. Osako, Y. Mitsufuji, R. Singh, and B. Raj, Supervised monaural source separation based on autoencoders, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.11-15, 2017.
DOI : 10.1109/ICASSP.2017.7951788

A. Ozerov, C. Févotte, and M. Charbit, Factorial Scaled Hidden Markov Model for polyphonic audio representation and source separation, 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp.121-124, 2009.
DOI : 10.1109/ASPAA.2009.5346527

URL : https://hal.archives-ouvertes.fr/inria-00553336

A. Ozerov, E. Vincent, and F. Bimbot, A General Flexible Framework for the Handling of Prior Information in Audio Source Separation, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.4, pp.1118-1133, 2012.
DOI : 10.1109/TASL.2011.2172425

URL : https://hal.archives-ouvertes.fr/inria-00536917

L. Parra and C. Spence, Convolutive blind separation of non-stationary sources, IEEE Transactions on Speech and Audio Processing, vol.8, issue.3, pp.320-327, 2000.
DOI : 10.1109/89.841214

R. Pascanu, T. Mikolov, and Y. Bengio, On the difficulty of training recurrent neural networks, Proceedings of the International Conference on Machine Learning (ICML), pp.1310-1318, 2013.

H. Poon and P. Domingos, Sum-product networks: A new deep architecture, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp.689-690, 2011.
DOI : 10.1109/ICCVW.2011.6130310

I. Potamitis, One-Channel Separation and Recognition of Mixtures of Environmental Sounds: The Case of Bird-Song Classification in Composite Soundscenes, New Directions in Intelligent Interactive Multimedia, pp.595-604, 2008.
DOI : 10.1007/978-3-540-68127-4_61

D. Povey, H. J. Kuo, and H. Soltau, Fast speaker adaptive training for speech recognition, Proceedings of INTERSPEECH, pp.1245-1248, 2008.

L. Prechelt, Early stopping - but when?, 2012.
DOI : 10.1007/3-540-49430-8_3

L. Rabiner and B. Juang, Fundamentals of Speech Recognition, 1993.

L. R. Rabiner and R. W. Schafer, Introduction to Digital Speech Processing, Foundations and Trends® in Signal Processing, vol.1, issue.1-2, pp.1-194, 2007.
DOI : 10.1561/2000000001

B. Raj and R. M. Stern, Missing-feature approaches in speech recognition, IEEE Signal Processing Magazine, vol.22, issue.5, pp.101-116, 2005.
DOI : 10.1109/MSP.2005.1511828

G. Richard, T. Virtanen, J. P. Bello, N. Ono, and H. Glotin, Introduction to the Special Section on Sound Scene and Event Analysis, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.6, p.1169, 2017.
DOI : 10.1109/TASLP.2017.2699334

B. Rivet, W. Wang, S. M. Naqvi, and J. A. Chambers, Audiovisual Speech Source Separation: An overview of key methodologies, IEEE Signal Processing Magazine, vol.31, issue.3, p.125, 2014.
DOI : 10.1109/MSP.2013.2296173

URL : https://hal.archives-ouvertes.fr/hal-00990000

R. Rojas, Neural Networks: A Systematic Introduction, 1996.
DOI : 10.1007/978-3-642-61068-4

N. Roman, D. Wang, and G. J. Brown, Speech segregation based on sound localization, The Journal of the Acoustical Society of America, vol.114, issue.4, pp.2236-2252, 2003.
DOI : 10.1121/1.1610463

URL : http://www.cis.ohio-state.edu/~dwang/papers/RWB.ijcnn01.pdf

F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain., Psychological Review, vol.65, issue.6, pp.386-408, 1958.
DOI : 10.1037/h0042519

R. Rosenfeld, Two decades of statistical language modeling: where do we go from here?, Proceedings of the IEEE, vol.88, issue.8, pp.1270-1278, 2000.
DOI : 10.1109/5.880083

S. T. Roweis, Factorial models and refiltering for speech separation and denoising, Proceedings of EUROSPEECH, pp.1009-1012, 2003.

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, vol.323, issue.6088, pp.533-536, 1986.
DOI : 10.1038/323533a0

T. N. Sainath, R. J. Weiss, K. W. Wilson, B. Li, A. Narayanan et al., Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing, pp.965-979, 2017.
DOI : 10.1109/TASLP.2017.2672401

Y. Salaün, E. Vincent, N. Bertin, N. Souviraà-labastie, X. Jaureguiberry et al., The Flexible Audio Source Separation Toolbox Version 2.0, Show & Tell of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014.

L. Samarakoon and K. C. Sim, Factorized Hidden Layer Adaptation for Deep Neural Network Based Acoustic Modeling, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.12, p.2241, 2016.
DOI : 10.1109/TASLP.2016.2601146

H. Sawada, H. Kameoka, S. Araki, and N. Ueda, Multichannel Extensions of Non-Negative Matrix Factorization With Complex-Valued Data, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.5, pp.971-982, 2013.
DOI : 10.1109/TASL.2013.2239990

A. M. Saxe, J. L. Mcclelland, and S. Ganguli, Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, arXiv e-prints, 2013.

J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks, vol.61, pp.85-117, 2015.
DOI : 10.1016/j.neunet.2014.09.003

M. Schuster and K. K. Paliwal, Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing, vol.45, issue.11, pp.2673-2681, 1997.
DOI : 10.1109/78.650093

URL : https://maxwell.ict.griffith.edu.au/spl/publications/papers/ieeesp97_schuster.pdf

M. Senior, Mixing secrets for the small studio, 2011.

R. Serizel, M. Moonen, B. V. Dijk, and J. Wouters, Low-rank Approximation Based Multichannel Wiener Filter Algorithms for Noise Reduction with Application in Cochlear Implants, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.4, pp.785-799, 2014.
DOI : 10.1109/TASLP.2014.2304240

URL : https://hal.archives-ouvertes.fr/hal-01390918

L. S. Simon and E. Vincent, A General Framework for Online Audio Source Separation, Proceedings of the International Conference on Latent Variable Analysis and Signal Separation, pp.397-404, 2012.
DOI : 10.1007/978-3-662-04619-7

URL : https://hal.archives-ouvertes.fr/hal-00655398

U. Şimşekli, A. Liutkus, and A. T. Cemgil, Alpha-Stable Matrix Factorization, IEEE Signal Processing Letters, vol.22, issue.12, pp.2289-2293, 2015.
DOI : 10.1109/LSP.2015.2477535

S. Sivasankaran, A. A. Nugraha, E. Vincent, J. A. Morales-cordovilla, S. Dalmia et al., Robust ASR using neural network based speech enhancement and feature simulation, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.482-489, 2015.
DOI : 10.1109/ASRU.2015.7404834

URL : https://hal.archives-ouvertes.fr/hal-01204553

S. Sivasankaran, E. Vincent, and I. Illina, Discriminative importance weighting of augmented training data for acoustic model training, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4885-4889, 2017.
DOI : 10.1109/ICASSP.2017.7953085

URL : https://hal.archives-ouvertes.fr/hal-01415759

P. Smaragdis, Blind separation of convolved mixtures in the frequency domain, Neurocomputing, vol.22, issue.1-3, pp.21-34, 1998.
DOI : 10.1016/S0925-2312(98)00047-2

P. Smaragdis, Convolutive Speech Bases and Their Application to Supervised Speech Separation, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.1, pp.1-12, 2007.
DOI : 10.1109/TASL.2006.876726

URL : https://www.merl.com/reports/docs/TR2007-002.pdf

P. Smaragdis and J. C. Brown, Non-negative matrix factorization for polyphonic music transcription, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684), pp.177-180, 2003.
DOI : 10.1109/ASPAA.2003.1285860

URL : http://www.merl.com/publications/docs/TR2003-139.pdf

P. Smaragdis, C. Fevotte, G. J. Mysore, N. Mohammadiha, and M. Hoffman, Static and Dynamic Source Separation Using Nonnegative Factorizations: A unified view, IEEE Signal Processing Magazine, vol.31, issue.3, p.66, 2014.
DOI : 10.1109/MSP.2013.2297715

J. O. Smith, Spectral Audio Signal Processing, 2011.

P. Smolensky, Information processing in dynamical systems: Foundations of harmony theory, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, pp.194-281, 1986.

P. Sprechmann, A. M. Bronstein, and G. Sapiro, Supervised non-negative matrix factorization for audio source separation, Excursions in Harmonic Analysis Applied and Numerical Harmonic Analysis, pp.407-420, 2015.
DOI : 10.1007/978-3-319-20188-7_16

R. M. Stern and N. Morgan, Features Based on Auditory Physiology and Perception, Techniques for Noise Robustness in Automatic Speech Recognition chapter, 2012.
DOI : 10.1121/1.385079

S. S. Stevens, J. Volkmann, and E. B. Newman, A Scale for the Measurement of the Psychological Magnitude Pitch, The Journal of the Acoustical Society of America, vol.8, issue.3, pp.185-190, 1937.
DOI : 10.1121/1.1915893

N. Sturmel, A. Liutkus, J. Pinel, L. Girin, S. Marchand et al., Linear mixing models for active listening of music productions in realistic studio conditions, Proceedings of Audio Engineering Society (AES) Convention, pp.1-10, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00790783

P. Swietojanski, J. Li, and S. Renals, Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.8, p.1450, 2016.
DOI : 10.1109/TASLP.2016.2560534

V. Tavakoli, J. Jensen, M. Christensen, and J. Benesty, A Framework for Speech Enhancement With Ad Hoc Microphone Arrays, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.6, p.1038, 2016.
DOI : 10.1109/TASLP.2016.2537202

Theano Development Team, Theano: A Python framework for fast computation of mathematical expressions, arXiv e-prints, 2016.

M. Togami, Online speech source separation based on maximum likelihood of local Gaussian modeling, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.213-216, 2011.
DOI : 10.1109/ICASSP.2011.5946378

M. Togami and Y. Kawaguchi, Simultaneous Optimization of Acoustic Echo Reduction, Speech Dereverberation, and Noise Reduction against Mutual Interference, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.11, p.1612, 2014.
DOI : 10.1109/TASLP.2014.2341918

H. W. Tseng, M. Hong, and Z.-Q. Luo, Combining sparse NMF with deep neural network: A new classification-based approach for speech enhancement, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.2145-2149, 2015.
DOI : 10.1109/ICASSP.2015.7178350

Y. Tu, J. Du, Y. Xu, L. Dai, and C. Lee, Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers, The 9th International Symposium on Chinese Spoken Language Processing, pp.250-254, 2014.
DOI : 10.1109/ISCSLP.2014.6936615

S. Uhlich, F. Giron, and Y. Mitsufuji, Deep neural network based instrument extraction from music, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.2135-2139, 2015.
DOI : 10.1109/ICASSP.2015.7178348

S. Uhlich, M. Porcu, F. Giron, M. Enenkl, T. Kemp et al., Improving music source separation based on deep neural networks through data augmentation and network blending, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.261-265, 2017.
DOI : 10.1109/ICASSP.2017.7952158

B. D. Van Veen and K. M. Buckley, Beamforming: a versatile approach to spatial filtering, IEEE ASSP Magazine, vol.5, issue.2, pp.4-24, 1988.
DOI : 10.1109/53.665

K. Veselý, A. Ghoshal, L. Burget, and D. Povey, Sequence-discriminative training of deep neural networks, Proceedings of INTERSPEECH, pp.2345-2349, 2013.

O. Viikki and K. Laurila, Noise robust HMM-based speech recognition using segmental cepstral feature vector normalization, Proceedings of the Tutorial and Research Workshop on Robust Speech Recognition for Unknown Communication Channels, pp.107-110, 1997.

E. Vincent, Musical source separation using time-frequency source priors, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.1, pp.91-98, 2006.
DOI : 10.1109/TSA.2005.860342

URL : https://hal.archives-ouvertes.fr/inria-00544269

E. Vincent, An Experimental Evaluation of Wiener Filter Smoothing Techniques Applied to Under-Determined Audio Source Separation, Proceedings of the International Conference on Latent Variable Analysis and Signal Separation, pp.157-164, 2010.
DOI : 10.1007/978-3-642-15995-4_20

URL : https://hal.archives-ouvertes.fr/inria-00544035

E. Vincent, N. Bertin, R. Gribonval, and F. Bimbot, From Blind to Guided Audio Source Separation: How models and side information can improve the separation of sound, IEEE Signal Processing Magazine, vol.31, issue.3, p.107, 2014.
DOI : 10.1109/MSP.2013.2297440

URL : https://hal.archives-ouvertes.fr/hal-00922378

E. Vincent, R. Gribonval, and C. Févotte, Performance measurement in blind audio source separation, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.4, pp.1462-1469, 2006.
DOI : 10.1109/TSA.2005.858005

URL : https://hal.archives-ouvertes.fr/inria-00544230

E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, and M. E. Davies, Probabilistic Modeling Paradigms for Audio Source Separation, Machine Audition: Principles, Algorithms and Systems chapter, pp.162-185, 2011.
DOI : 10.4018/978-1-61520-919-4.ch007

URL : https://hal.archives-ouvertes.fr/inria-00544016

E. Vincent, H. Sawada, P. Bofill, S. Makino, and J. P. Rosca, First Stereo Audio Source Separation Evaluation Campaign: Data, Algorithms and Results, Proceedings of the International Conference on Independent Component Analysis and Signal Separation, pp.552-559, 2007.
DOI : 10.1007/978-3-540-74494-8_69

URL : https://hal.archives-ouvertes.fr/inria-00544199

E. Vincent, T. Virtanen, and S. Gannot (Eds.), Audio Source Separation and Speech Enhancement, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01120685

E. Vincent, S. Watanabe, A. A. Nugraha, J. Barker, and R. Marxer, An analysis of environment, microphone and data simulation mismatches in robust speech recognition, Computer Speech & Language, vol.46, pp.535-557, 2017.
DOI : 10.1016/j.csl.2016.11.005

URL : https://hal.archives-ouvertes.fr/hal-01399180

P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol, Extracting and composing robust features with denoising autoencoders, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.1096-1103, 2008.
DOI : 10.1145/1390156.1390294

T. Virtanen, R. Singh, and B. Raj, Techniques for Noise Robustness in Automatic Speech Recognition, 2012.
DOI : 10.1002/9781118392683

T. Virtanen, E. Vincent, and S. Gannot, Time-frequency processing - spectral properties, Audio Source Separation and Speech Enhancement chapter, 2017.

L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus, Regularization of neural networks using dropconnect, Proceedings of the International Conference on Machine Learning (ICML), pp.1058-1066, 2013.

D. Wang, On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis, Speech Separation by Humans and Machines, pp.181-197, 2005.
DOI : 10.1007/0-387-22794-6_12

D. Wang, Time-Frequency Masking for Speech Separation and Its Potential for Hearing Aid Design, Trends in Amplification, vol.52, issue.20, pp.332-353, 2008.
DOI : 10.1109/TSP.2004.828896

D. Wang, Deep learning reinvents the hearing aid, IEEE Spectrum, vol.54, issue.3, pp.32-37, 2017.
DOI : 10.1109/MSPEC.2017.7864754

D. Wang and G. J. Brown (Eds.), Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, 2006.
DOI : 10.1109/9780470043387

D. L. Wang and G. J. Brown, Separation of speech from interfering sounds based on oscillatory correlation, IEEE Transactions on Neural Networks, vol.10, issue.3, pp.684-697, 1999.
DOI : 10.1109/72.761727

Y. Wang and D. Wang, Towards Scaling Up Classification-Based Speech Separation, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.7, pp.1381-1390, 2013.
DOI : 10.1109/TASL.2013.2250961

Y. Wang and D. Wang, A deep neural network for time-domain signal reconstruction, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4390-4394, 2015.
DOI : 10.1109/ICASSP.2015.7178800

Z. Wang, E. Vincent, R. Serizel, and Y. Yan, Rank-1 constrained Multichannel Wiener Filter for speech recognition in noisy environments, Computer Speech & Language, vol.49, 2017.
DOI : 10.1016/j.csl.2017.11.003

URL : https://hal.archives-ouvertes.fr/hal-01634449

E. Warsitz and R. Haeb-umbach, Blind Acoustic Beamforming Based on Generalized Eigenvalue Decomposition, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.5, pp.1529-1539, 2007.
DOI : 10.1109/TASL.2007.898454

F. Weninger, J. Du, E. Marchi, and T. Gao, Single-channel classification and clustering approaches, Audio Source Separation and Speech Enhancement chapter, 2017.

F. Weninger, H. Erdogan, S. Watanabe, E. Vincent, J. Le-roux et al., Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR, Proceedings of the International Conference on Latent Variable Analysis and Signal Separation, pp.91-99, 2015.
DOI : 10.1007/978-3-319-22482-4_11

URL : https://hal.archives-ouvertes.fr/hal-01163493

F. Weninger, J. Le-roux, J. R. Hershey, and B. Schuller, Discriminatively trained recurrent neural networks for single-channel speech separation, 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp.577-581, 2014.
DOI : 10.1109/GlobalSIP.2014.7032183

P. J. Werbos, Backpropagation: past and future, IEEE International Conference on Neural Networks, pp.343-353, 1988.
DOI : 10.1109/ICNN.1988.23866

P. J. Werbos, Backpropagation through time: what it does and how to do it, Proceedings of the IEEE, pp.1550-1560, 1990.
DOI : 10.1109/5.58337

B. M. Wilamowski and H. Yu, Neural Network Learning Without Backpropagation, IEEE Transactions on Neural Networks, vol.21, issue.11, pp.1793-1803, 2010.
DOI : 10.1109/TNN.2010.2073482

D. S. Williamson, Y. Wang, and D. Wang, Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality, The Journal of the Acoustical Society of America, vol.138, issue.3, pp.1399-1407, 2015.
DOI : 10.1121/1.4928612

D. S. Williamson, Y. Wang, and D. Wang, Complex Ratio Masking for Monaural Speech Separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.3, pp.483-492, 2016.
DOI : 10.1109/TASLP.2015.2512042

S. Wisdom, J. R. Hershey, J. Le-roux, and S. Watanabe, Deep unfolding for multichannel source separation, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.121-125, 2016.
DOI : 10.1109/ICASSP.2016.7471649

M. Wöllmer, Z. Zhang, F. Weninger, B. Schuller, and G. Rigoll, Feature enhancement by bidirectional LSTM networks for conversational speech recognition in highly non-stationary noise, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.6822-6826, 2013.
DOI : 10.1109/ICASSP.2013.6638983

X. Xiao, S. Watanabe, H. Erdogan, L. Lu, J. Hershey et al., Deep beamforming networks for multi-channel speech recognition, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5745-5749, 2016.
DOI : 10.1109/ICASSP.2016.7472778

Y. Xu, J. Du, L. Dai, and C. Lee, An Experimental Study on Speech Enhancement Based on Deep Neural Networks, IEEE Signal Processing Letters, vol.21, issue.1, pp.65-68, 2014.
DOI : 10.1109/LSP.2013.2291240

D. Yu and L. Deng, Deep Learning and Its Applications to Signal and Information Processing [Exploratory DSP], IEEE Signal Processing Magazine, vol.28, issue.1, pp.145-154, 2011.
DOI : 10.1109/MSP.2010.939038

D. Yu and L. Deng, Automatic Speech Recognition: A Deep Learning Approach, 2015.
DOI : 10.1007/978-1-4471-5779-3

H. Yu and B. M. Wilamowski, Levenberg-Marquardt Training, 2011.
DOI : 10.1201/b10604-15

W. Zaremba, I. Sutskever, and O. Vinyals, Recurrent neural network regularization, arXiv e-prints, 2015.

M. D. Zeiler, ADADELTA: An adaptive learning rate method, arXiv e-prints, 2012.

X. L. Zhang and D. Wang, A Deep Ensemble Learning Method for Monaural Speech Separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.5, pp.967-977, 2016.
DOI : 10.1109/TASLP.2016.2536478

Z. Zhang, N. Cummins, and B. Schuller, Advanced Data Exploitation in Speech Analysis: An overview, IEEE Signal Processing Magazine, vol.34, issue.4, pp.107-129, 2017.
DOI : 10.1109/MSP.2017.2699358

K. Žmolíková, M. Karafiát, K. Veselý, M. Delcroix, S. Watanabe et al., Data Selection by Sequence Summarizing Neural Network in Mismatch Condition Training, Interspeech 2016, pp.2354-2358, 2016.
DOI : 10.21437/Interspeech.2016-741

G. Zweig and P. Nguyen, SCARF: A segmental conditional random field toolkit for speech recognition, Proceedings of INTERSPEECH, pp.2858-2861, 2010.