, Face aging with conditional generative adversarial networks, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01617351
Generating sentences from a continuous space, 2015. ,
, Neural photo editing with introspective adversarial networks, 2016.
Infogan: Interpretable representation learning by information maximizing generative adversarial nets, Advances in Neural Information Processing Systems, pp.2172-2180, 2016. ,
, Censoring representations with an adversary, 2015.
Domain-adversarial training of neural networks, Journal of Machine Learning Research, vol.17, issue.59, pp.1-35, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01624607
Generative adversarial nets, Advances in neural information processing systems, pp.2672-2680, 2014. ,
Transforming auto-encoders, Artificial Neural Networks and Machine Learning-ICANN 2011, pp.44-51, 2011. ,
Image-to-image translation with conditional adversarial networks, 2016. ,
Adam: A method for stochastic optimization, 2014. ,
Deep convolutional inverse graphics network, Advances in Neural Information Processing Systems, pp.2539-2547, 2015. ,
Precomputed real-time texture synthesis with markovian generative adversarial networks, European Conference on Computer Vision, pp.702-716, 2016. ,
Deep learning face attributes in the wild, Proceedings of International Conference on Computer Vision (ICCV), 2015. ,
Learning to pivot with adversarial networks, 2016. ,
Disentangling factors of variation in deep representation using adversarial training, Advances in Neural Information Processing Systems, pp.5041-5049, 2016. ,
Automated flower classification over a large number of classes, Computer Vision, Graphics & Image Processing, pp.722-729, 2008. ,
, Invertible conditional gans for image editing, 2016.
Learning deep representations of fine-grained visual descriptions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.49-58, 2016. ,
Deep visual analogy-making, Advances in Neural Information Processing Systems, pp.1252-1260, 2015. ,
Learning factorial codes by predictability minimization, Neural Computation, vol.4, issue.6, pp.863-879, 1992. ,
Unsupervised cross-domain image generation, 2016. ,
Deep feature interpolation for image content changes, 2016. ,
Unsupervised creation of parameterized avatars, 2017. ,
Attribute2image: Conditional image generation from visual attributes, European Conference on Computer Vision, pp.776-791, 2016. ,
Weakly-supervised disentangling with recurrent transformations for 3d view synthesis, Advances in Neural Information Processing Systems, pp.1099-1107, 2015. ,
Unpaired image-to-image translation using cycle-consistent adversarial networks, 2017. ,
Amazon's mechanical turk: A new source of inexpensive, yet high-quality, data? Perspectives on psychological science, vol.6, pp.3-5, 2011. ,
Gradient conversion between time and frequency domains using wirtinger calculus, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01534863
An expert system for harmonizing four-part chorales, Computer Music Journal, vol.12, issue.3, pp.43-51, 1988. ,
Neural audio synthesis of musical notes with wavenet autoencoders, 2017. ,
Convolutional sequence to sequence learning, 2017. ,
Auditory nonlinearity, The Journal of the Acoustical Society of America, vol.41, issue.3, pp.676-699, 1967. ,
Signal estimation from modified short-time fourier transform, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.32, issue.2, pp.236-243, 1984. ,
Deepbach: a steerable model for bach chorales generation, 2016. ,
Conditional end-to-end audio transforms, 2018. ,
Morpheus: automatic music generation with recurrent pattern constraints and tension profiles, 2016. ,
Distilling the knowledge in a neural network, 2015. ,
Long short-term memory, Neural computation, vol.9, issue.8, pp.1735-1780, 1997. ,
Analysis synthesis telephony based on the maximum likelihood method, The 6th international congress on acoustics, pp.280-292, 1968. ,
Efficient neural audio synthesis, 2018. ,
Adam: A method for stochastic optimization, International Conference on Learning Representations, 2015. ,
Detection theory: A user's guide. Psychology press, 2004. ,
Samplernn: An unconditional end-to-end neural audio generation model, 2016. ,
Parallel wavenet: Fast high-fidelity speech synthesis, 2017. ,
Deep voice 3: Scaling text-to-speech with convolutional sequence learning, Proc. 6th International Conference on Learning Representations, 2018. ,
Crowdmos: An approach for crowdsourcing mean opinion score studies, Acoustics, Speech and Signal Processing, pp.2416-2419, 2011. ,
Char2wav: End-to-end speech synthesis, 2017. ,
End-to-end memory networks, Advances in neural information processing systems, pp.2440-2448, 2015. ,
Voice synthesis for in-the-wild speakers via a phonological loop, 2017. ,
Wavenet: A generative model for raw audio, 2016. ,
Towards end-to-end speech synthesis, 2017. ,
Gradient-based learning algorithms for recurrent networks and their computational complexity. Backpropagation: Theory, architectures, and applications, vol.1, pp.433-486, 1995. ,
, Gammatone-based spectrograms, using gammatone filterbanks or Fourier transform weightings
Convolutional Neural Networks for Speech Recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, pp.1533-1545, 2014. ,
Analysis/synthesis comparison of vocoders utilized in statistical parametric speech synthesis, 2012. ,
Comparative evaluation of feature normalization techniques for speaker verification, NOLISP, 2011. ,
Greg Diamos, and others. Deep speech 2: End-to-end speech recognition in english and mandarin, 2015. ,
Deep Scattering Spectrum, IEEE Transactions on Signal Processing, vol.62, pp.4114-4128, 2014. ,
Unsupervised neural machine translation, 2017. ,
Some developmental processes in speech perception. Child Phonology: Perception & Production, 1980. ,
Discovering discrete subword units with binarized autoencoders and hidden-markov-model encoders, INTERSPEECH, 2015. ,
Neural Machine Translation by Jointly Learning to Align and Translate, 2014. ,
The dragon system-an overview, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.23, issue.1, pp.24-29, 1975. ,
Spline filters for end-to-end deep learning, ICML, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01879266
The pascal chime speech separation and recognition challenge, Computer Speech & Language, vol.27, issue.3, pp.621-633, 2013. ,
URL : https://hal.archives-ouvertes.fr/inria-00584051
The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines, Automatic Speech Recognition and Understanding (ASRU), pp.504-511, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01211376
, Markus Kliegl, Atul Kumar, and others. Reducing Bias in Production Speech Models, 2017.
Computer vision and image understanding, vol.110, pp.346-359, 2008. ,
Comparison of neural and conventional classifiers on a speech recognition problem, First IEE International Conference on, pp.86-89, 1989. ,
Listen and translate: A proof of concept for end-to-end speech-to-text translation, 2016. ,
Automatic assessment of dysarthria severity level using audio descriptors, ICASSP, pp.5070-5074, 2017. ,
Experiments with time delay networks and dynamic time warping for speaker independent isolated digits recognition, First European Conference on Speech Communication and Technology, 1989. ,
, Triplet Loss for Speaker Turn Embedding. Acoustics, Speech and Signal Processing, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01830421
Signature verification using a "Siamese" time delay neural network, International Journal of Pattern Recognition and Artificial Intelligence, vol.7, issue.04, pp.669-688, 1993. ,
Australian Aboriginal Languages: Consonant Salient Phonologies and the 'place-of-articulation Imperative'. Australian Speech Science and Technology Association, 2003. ,
Multitask learning, Learning to learn, pp.95-133, 1998. ,
Deep recurrent neural networks for acoustic modelling, 2015. ,
Large scale online learning of image similarity through ranking, The Journal of Machine Learning Research, vol.11, pp.1109-1135, 2010. ,
Parallel inference of dirichlet process gaussian mixture models for unsupervised acoustic modeling: a feasibility study, INTERSPEECH, 2015. ,
State-of-the-art speech recognition with sequence-to-sequence models, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4774-4778, 2018. ,
Towards better decoding and language model integration in sequence to sequence models, 2016. ,
Attention-based models for speech recognition, Advances in neural information processing systems, pp.577-585, 2015. ,
Unsupervised crossmodal alignment of speech and text embedding spaces, 2018. ,
NKI-CCRT Corpus - Speech Intelligibility Before and After Advanced Head and Neck Cancer Treated with Concomitant Chemoradiotherapy, LREC, 2012. ,
Otitis media in aboriginal children: tackling a major health problem, The Medical Journal of Australia, vol.177, issue.4, pp.177-178, 2002. ,
Natural language processing (almost) from scratch, Journal of Machine Learning Research, vol.12, pp.2493-2537, 2011. ,
Wav2letter: an end-to-end convnet-based speech recognition system, 2016. ,
Word translation without parallel data, 2017. ,
Language Modeling with Gated Convolutional Networks, ICML, 2017. ,
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE transactions on acoustics, speech, and signal processing, vol.28, issue.4, pp.357-366, 1980. ,
Sing: Symbol-to-instrument neural generator, Advances in Neural Information Processing Systems, pp.9055-9065, 2018. ,
Frontend factor analysis for speaker verification, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.4, pp.788-798, 2011. ,
Imagenet: A large-scale hierarchical image database, Computer Vision and Pattern Recognition, pp.248-255, 2009. ,
Nlp on spoken documents without asr, EMNLP, 2010. ,
The zero resource speech challenge 2017, Automatic Speech Recognition and Understanding Workshop, IEEE, pp.323-330, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01687504
Real-time speech and music classification by large audio feature space extraction, 2015. ,
Opensmile: the munich versatile and fast open-source audio feature extractor, ACM Multimedia, 2010. ,
Analysis and synthesis of speech processes. Manual of phonetics, vol.2, pp.173-277, 1968. ,
Acoustic theory of speech production: with calculations based on X-ray studies of Russian articulations, 1970. ,
Learning hierarchical features for scene labeling, IEEE transactions on pattern analysis and machine intelligence, vol.35, pp.1915-1929, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00742077
Elements of psychophysics, 1966. ,
Word-level information influences phonetic learning in adults and infants, Cognition, vol.127, issue.3, pp.427-438, 2013. ,
Statistical methods for research workers, Statistical methods for research workers, 1925. ,
Parametric coding of speech spectra, The Journal of the Acoustical Society of America, vol.68, issue.2, pp.412-419, 1980. ,
Locally normalized filter banks applied to deep neural-network-based robust speech recognition, IEEE Signal Processing Letters, vol.24, pp.377-381, 2017. ,
Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition, Competition and cooperation in neural nets, pp.267-285, 1982. ,
TIMIT acoustic-phonetic continuous speech corpus, Linguistic data consortium, vol.10, issue.5, p.0, 1993. ,
A Convolutional Encoder Model for Neural Machine Translation, ACL, 2017. ,
Convolutional Sequence to Sequence Learning, ICML, 2017. ,
Acoustic Modelling from the Signal Domain Using CNNs, 2016. ,
Rich feature hierarchies for accurate object detection and semantic segmentation, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.580-587, 2014. ,
Towards end-to-end speech recognition with recurrent neural networks, International Conference on Machine Learning, pp.1764-1772, 2014. ,
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, Proceedings of the 23rd international conference on Machine learning, pp.369-376, 2006. ,
Speech recognition with deep recurrent neural networks, Acoustics, speech and signal processing (icassp), 2013 ieee international conference on, pp.6645-6649, 2013. ,
The mel scale's disqualifying bias and a consistency of pitch-difference equisections in 1956 with equal cochlear distances and equal frequency ratios, Hearing research, vol.103, pp.1-2, 1997. ,
End-to-end Speech Recognition Using Lattice-free MMI, 2018. ,
The CAPIO 2017 Conversational Speech Recognition System, 2017. ,
, Deep Speech: Scaling up end-to-end speech recognition, 2014.
Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, CVPR, 2015. ,
Deep residual learning for image recognition, Proceedings of CVPR, 2016. ,
Quantile based histogram equalization for noise robust large vocabulary speech recognition, IEEE Transactions on Audio, Speech, and Language Processing, vol.14, pp.845-854, 2006. ,
Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012. ,
Improving neural networks by preventing co-adaptation of feature detectors, 2012. ,
Distinguishing deceptive from non-deceptive speech, Ninth European Conference on Speech Communication and Technology, 2005. ,
Speech acoustic modeling from raw multichannel waveforms, Acoustics, Speech and Signal Processing, pp.4624-4628, 2015. ,
Effective Attention Mechanism in Dynamic Models for Speech Emotion Recognition, ICASSP, pp.2526-2530, 2018. ,
Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of physiology, vol.160, issue.1, pp.106-154, 1962. ,
Exploiting speech knowledge in neural nets for recognition, Speech Communication, vol.9, pp.1-13, 1990. ,
Learning a better representation of speech soundwaves using restricted boltzmann machines, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5884-5887, 2011. ,
Efficient spoken term discovery using randomized algorithms, Automatic Speech Recognition and Understanding (ASRU), pp.401-406, 2011. ,
A summary of the 2012 JH CLSP Workshop on zero resource speech technologies and models of early language acquisition, Proceedings of ICASSP 2013, 2013. ,
Continuous speech recognition by statistical methods, Proceedings of the IEEE, vol.64, issue.4, pp.532-556, 1976. ,
Leveraging weakly supervised data to improve end-to-end speech-to-text translation, 2018. ,
A systematic review of speech recognition technology in health care, BMC Med. Inf. & Decision Making, 2014. ,
Maximum likelihood estimation for multivariate mixture observations of markov chains (corresp.), IEEE Transactions on Information Theory, vol.32, issue.2, pp.307-309, 1986. ,
On learning to identify genders from raw speech signal using cnns, 2018. ,
Fully unsupervised small-vocabulary speech recognition using a segmental bayesian model, INTERSPEECH, 2015. ,
Deep convolutional acoustic word embeddings using word-pair side information, 2015. ,
DNN-based voice activity detection with local feature shift technique, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp.1-4, 2016. ,
Voiceprint identification, Nature, vol.196, pp.1253-1257, 1962. ,
Dysarthric speech database for universal access research, INTERSPEECH, 2008. ,
Automatic intelligibility classification of sentence-level pathological speech, Computer Speech & Language, vol.29, issue.1, pp.132-144, 2015. ,
Dysarthric speech recognition using dysarthria-severity-dependent and speaker-adaptive models, Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp.3622-3626, 2013. ,
Joint CTC-attention based end-to-end speech recognition using multi-task learning, Acoustics, Speech and Signal Processing, pp.4835-4839, 2017. ,
Testing the correlation of word error rate and perplexity, Speech Communication, vol.38, issue.1-2, pp.19-28, 2002. ,
A new frequency scale for acoustic measurements, Bell Lab Rec, pp.299-301, 1949. ,
Audio event classification using deep neural networks, INTERSPEECH, 2013. ,
Imagenet classification with deep convolutional neural networks, NIPS, 2012. ,
Feature normalisation for robust speech recognition, 2015. ,
Fader networks: Manipulating images by sliding attributes, NIPS, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-02275215
Phrase-based & neural unsupervised machine translation, 2018. ,
Backpropagation applied to handwritten zip code recognition, Neural computation, vol.1, issue.4, pp.541-551, 1989. ,
Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998. ,
Speaker-independent phone recognition using hidden markov models, IEEE Trans. Acoustics, Speech, and Signal Processing, vol.37, pp.1641-1648, 1988. ,
A novel scheme for speaker recognition using a phonetically-aware deep neural network, Acoustics, Speech and Signal Processing, pp.1695-1699, 2014. ,
An introduction to the application of the theory of probabilistic functions of a markov process to automatic speech recognition, Bell System Technical Journal, vol.62, issue.4, pp.1035-1074, 1983. ,
Audio event classification using deep neural networks, 2016. ,
, Network in network, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00737767
Human information processing: An introduction to psychology, 2013. ,
Letter-Based Speech Recognition with Gated ConvNets. CoRR, abs/1712.09444, 2017. ,
Suitability of Dysphonia Measurements for Telemonitoring of Parkinson's Disease, IEEE Transactions on Biomedical Engineering, vol.56, pp.1015-1022, 2009. ,
Topic identification for speech without asr, INTERSPEECH, 2017. ,
Gram-ctc: Automatic unit selection and target decomposition for sequence labelling, 2017. ,
Least squares quantization in pcm, IEEE transactions on information theory, vol.28, issue.2, pp.129-137, 1982. ,
Birdvox-full-night: A dataset and benchmark for avian flight call detection, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.266-270, 2018. ,
Per-channel energy normalization: Why and how, IEEE Signal Processing Letters, vol.26, pp.39-43, 2019. ,
Object recognition from local scale-invariant features. In Computer vision, The proceedings of the seventh IEEE international conference on, vol.2, pp.1150-1157, 1999. ,
Segmental recurrent neural networks for end-to-end speech recognition, 2016. ,
Effect of compressing the dynamic range of the power spectrum in modulation filtering based speech enhancement, INTERSPEECH, 2008. ,
An excitation model for hmm-based speech synthesis based on residual modeling, SSW, 2007. ,
Lpcw: An lpc vocoder with linear predictive spectral warping, IEEE International Conference on ICASSP'76, vol.1, pp.466-469, 1976. ,
Learning phonemes with a proto-lexicon, Cognitive science, vol.37, issue.1, pp.103-127, 2013. ,
librosa: Audio and music signal analysis in python, Proceedings of the 14th python in science conference, pp.18-25, 2015. ,
Adapting acoustic and lexical models to dysarthric speech, ICASSP, pp.4924-4927, 2011. ,
Comparing Humans and Automatic Speech Recognition Systems in Recognizing Dysarthric Speech, vol.6657, pp.291-300, 2011. ,
EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding, Automatic Speech Recognition and Understanding Workshop (ASRU), 2015. ,
Recurrent neural network based language model, INTERSPEECH, 2010. ,
Learning to detect dysarthria from raw speech. CoRR, abs/1811.11101, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-02274504
Improved phone recognition using bayesian triphone models, Proceedings of the 1998 IEEE International Conference on, vol.1, pp.409-412, 1998. ,
Deep belief networks for phone recognition, Nips workshop on deep learning for speech recognition and related applications, vol.1, p.39, 2009. ,
Weighted finite-state transducers in speech recognition, Computer Speech & Language, vol.16, issue.1, pp.69-88, 2002. ,
Automatic speech recognition: An auditory perspective, Speech processing in the auditory system, pp.309-338, 2004. ,
Towards directly modeling raw speech signal for speaker verification using cnns, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4884-4888, 2018. ,
Rectified linear units improve restricted boltzmann machines, Proceedings of the 27th international conference on machine learning (ICML-10), pp.807-814, 2010. ,
Emotion recognition in speech using neural networks, Neural computing & applications, vol.9, issue.4, pp.290-296, 2000. ,
Cepstrum pitch determination. The journal of the acoustical society of America, vol.41, pp.293-309, 1967. ,
Short-time "cepstrum" pitch detection, The Journal of the Acoustical Society of America, vol.36, issue.5, pp.1030-1030, 1964. ,
Multichannel end-to-end speech recognition, ICML, 2017. ,
Automated vocal analysis of naturalistic recordings from children with autism, language delay, and typical development. Proceedings of the, vol.107, pp.13354-13363, 2010. ,
A novel approach to detecting non-native speakers and their native language, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4398-4401, 2010. ,
Subunit definition and analysis for humpback whale call classification, Applied Acoustics, vol.71, issue.11, pp.1107-1112, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-02264967
End-to-end phoneme sequence recognition using convolutional neural networks, 2013. ,
Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks, INTERSPEECH, 2013. ,
Convolutional neural networks-based continuous speech recognition using raw speech signal, Acoustics, Speech and Signal Processing, pp.4295-4299, 2015. ,
Jointly Learning to Locate and Classify Words Using Convolutional Networks, 2016. ,
Librispeech: an ASR corpus based on public domain audio books, Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, pp.5206-5210, 2015. ,
Unsupervised pattern discovery in speech. Audio, Speech, and Language Processing, IEEE Transactions on, vol.16, issue.1, pp.186-197, 2008. ,
The design for the Wall Street Journal-based CSR corpus, Proceedings of the workshop on Speech and Natural Language, pp.357-362, 1992. ,
Deep scattering spectrum with deep neural networks, 2014 IEEE International Conference on, pp.210-214, 2014. ,
Classification of huntington disease using acoustic and lexical features, 2018. ,
Improving the fisher kernel for large-scale image classification, European conference on computer vision, pp.143-156, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00548630
Speaker-aware long short-term memory multi-task learning for speech recognition, Signal Processing Conference, pp.1911-1915, 2016. ,
, Buckeye Corpus of Conversational Speech, 2007.
Prosodic manifestations of confidence and uncertainty in spoken language, INTERSPEECH, 2008. ,
Semi-orthogonal low-rank matrix factorization for deep neural networks, INTERSPEECH, 2018. ,
A tutorial on hidden markov models and selected applications in speech recognition, Proceedings of the IEEE, vol.77, issue.2, pp.257-286, 1989. ,
Exploring architectures, data and units for streaming end-to-end speech recognition with rnn-transducer, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp.193-199, 2017. ,
, Speaker recognition from raw waveform with sincnet, 2018.
Connectionist probability estimators in hmm speech recognition, IEEE Trans. Speech and Audio Processing, vol.2, pp.161-174, 1994. ,
A comparison of neural network methods for unsupervised representation learning on the zero resource speech challenge, INTERSPEECH, 2015. ,
Towards a Comparative Database of Dysarthric Articulation, Proceedings of ISSP, 2008. ,
The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Language Resources and Evaluation, vol.46, pp.523-541, 2012. ,
, The IBM 2016 speaker recognition system, 2016.
Learning filter banks within a deep neural network framework, ASRU, pp.297-302, 2013. ,
Learning the speech front-end with raw waveform CLDNNs, Sixteenth Annual Conference of the International Speech Communication Association, 2015. ,
Speaker location and microphone spacing invariant acoustic modeling from raw multichannel waveforms, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.30-36, 2015. ,
Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech and Signal Processing, vol.26, issue.1, pp.43-49, 1978. ,
Fusing shallow and deep learning for bioacoustic bird species classification, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.141-145, 2017. ,
Weight normalization: A simple reparameterization to accelerate training of deep neural networks, Advances in Neural Information Processing Systems, pp.901-909, 2016. ,
, English conversational telephone speech recognition by humans and machines, 2017.
Formant centralization ratio: a proposal for a new acoustic measure of dysarthric speech, Journal of Speech, Language, and Hearing Research (JSLHR), vol.53, pp.114-139, 2010. ,
Emotion identification from raw speech signals using dnns, 2018. ,
Evaluating speech features with the minimal-pair abx task: Analysis of the classical mfc/plp pipeline, INTERSPEECH 2013: 14th Annual Conference of the International Speech Communication Association, pp.1-5, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00918599
ABX-discriminability measures and applications, 2016. ,
URL : https://hal.archives-ouvertes.fr/tel-01407461
Gammatone features and feature combination for large vocabulary speech recognition, IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07, vol.4, 2007. ,
Paralinguistics in speech and language: state-of-the-art and the challenge, Computer Speech & Language, vol.27, issue.1, pp.4-39, 2013. ,
The interspeech 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism, Proceedings INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, 2013. ,
Christian A. Müller, and Shrikanth Narayanan. The interspeech 2010 paralinguistic challenge, INTERSPEECH, 2010. ,
, George Trigeorgis, Panagiotis Tzirakis, and Stefanos P. Zafeiriou. The interspeech 2017 computational paralinguistics challenge: Addressee, cold & snoring, INTERSPEECH, 2017.
Three recent trends in paralinguistics on the way to omniscient machine intelligence, Journal on Multimodal User Interfaces, vol.12, pp.273-283, 2018. ,
The Interspeech 2009 Emotion Challenge, Proc. Interspeech, pp.312-315, 2009. ,
A purely end-to-end system for multi-speaker speech recognition, 2018. ,
, Neural machine translation of rare words with subword units, 2015.
Auditory toolbox. Interval Research Corporation, vol.10, 1998. ,
Efficient auditory coding, Nature, vol.439, issue.7079, pp.978-982, 2006. ,
On the psychophysical law, Psychological review, vol.64, pp.153-81, 1957. ,
The relation of pitch to frequency: A revised scale, The American Journal of Psychology, vol.53, issue.3, pp.329-353, 1940. ,
A scale for the measurement of the psychological magnitude pitch, The Journal of the Acoustical Society of America, vol.8, issue.3, pp.185-190, 1937. ,
On the importance of initialization and momentum in deep learning, International conference on machine learning, pp.1139-1147, 2013. ,
Sequence to sequence learning with neural networks, Advances in neural information processing systems, pp.3104-3112, 2014. ,
Contributions of infant word learning to language development, Philosophical transactions of the Royal Society of London. Series B, Biological sciences, vol.364, pp.3617-3649, 2009. ,
Weakly Supervised Multi-Embeddings Learning of Acoustic Models, ICLR, 2014. ,
Phonetics Embedding Learning with Side Information, IEEE Spoken Language Technology Workshop, 2014. ,
Multi-task recurrent model for speech and speaker recognition, 2016. ,
"Yeah right": Sarcasm recognition for spoken dialogue systems, Ninth International Conference on Spoken Language Processing, 2006. ,
A Hybrid Dynamic Time Warping-Deep Neural Network Architecture for Unsupervised Acoustic Modeling, Sixteenth Annual Conference of the International Speech Communication Association, 2015. ,
, Attention-based Wav2text with Feature Transfer Learning, 2017.
Sequence-to-Sequence ASR Optimization via Reinforcement Learning, 2017. ,
Multilingual speech recognition with a single end-to-end model, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4904-4908, 2018. ,
Experiment on voice identification, The Journal of the Acoustical Society of America, vol.51, issue.6B, pp.2030-2043, 1972. ,
Combining time-and frequency-domain convolution in convolutional neural network-based phone recognition, Acoustics, Speech and Signal Processing, pp.190-194, 2014. ,
Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network, ICASSP, pp.5200-5204, 2016. ,
Phone recognition with hierarchical convolutional deep maxout networks, EURASIP Journal on Audio, Speech, and Music Processing, vol.2015, issue.1, p.25, 2015. ,
Acoustic modeling with deep neural networks using raw time signal for LVCSR, 2014. ,
Instance Normalization: The Missing Ingredient for Fast Stylization, 2016. ,
Fitting the mel scale, Proceedings., 1999 IEEE International Conference on, vol.1, pp.217-220, 1999. ,
Deep content-based music recommendation, Advances in neural information processing systems, pp.2643-2651, 2013. ,
Stochastic triplet embedding, Machine Learning for Signal Processing (MLSP), pp.1-6, 2012. ,
Stochastic adaptive neural architecture search for keyword spotting, 2018. ,
The zero resource speech challenge, Proc. of Interspeech, 2015. ,
Cepstral domain segmental feature vector normalization for noise robust speech recognition, Speech Communication, vol.25, issue.1-3, pp.133-147, 1998. ,
A recursive feature vector normalization approach for robust speech recognition in noise, ICASSP, 1998. ,
Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE transactions on Information Theory, vol.13, issue.2, pp.260-269, 1967. ,
A smartphone-based ASR data collection tool for under-resourced languages, Speech Communication, vol.56, pp.119-131, 2014. ,
Phoneme recognition using time-delay neural networks, IEEE Trans. Acoustics, Speech, and Signal Processing, vol.37, pp.328-339, 1989. ,
Segmental audio word2vec: Representing utterances as sequences of vectors with applications in spoken term detection, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6269-6273, 2018. ,
Trainable frontend for robust and far-field keyword spotting, ICASSP, pp.5670-5674, 2017. ,
Voice attributes affecting likability perception, INTERSPEECH, 2010. ,
Classification of whale vocalizations using the weyl transform, Acoustics, Speech and Signal Processing, pp.773-777, 2015. ,
Short-time gaussianization for robust speaker verification, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.1, 2002. ,
Genetic cnn, 2017 IEEE International Conference on Computer Vision (ICCV), pp.1388-1397, 2017. ,
Achieving human parity in conversational speech recognition, 2016. ,
Empirical evaluation of rectified activations in convolutional network, 2015. ,
Mixed excitation for hmm-based speech synthesis, INTERSPEECH, 2001. ,
The htk book, Cambridge university engineering department, vol.3, p.175, 2002. ,
Joint learning of speaker and phonetic similarities with siamese networks, INTERSPEECH, 2016. ,
A Deep Scattering Spectrum-Deep Siamese Network Pipeline for Unsupervised Acoustic Modeling, ICASSP, 2016. ,
Learning Filterbanks from Raw Speech for Phone Recognition, 2017. ,
End-to-End Speech Recognition from the Raw Waveform, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01888739
Fully convolutional speech recognition, 2018. ,
ADADELTA: An adaptive learning rate method, 2012. ,
Visualizing and understanding convolutional networks, ECCV, 2014. ,
Details of the nitech hmm-based speech synthesis system for the blizzard challenge, IEICE Transactions, pp.90-325, 2005. ,
Improved training of end-to-end attention models for speech recognition, 2018. ,
Towards end-to-end speech recognition with deep convolutional neural networks, 2017. ,
Evolving learning for analysing mood-related infant vocalisation, 2018. ,
Improving End-to-End Speech Recognition with Policy Learning, International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018. ,
, Neural architecture search with reinforcement learning, 2016.