
Contribution of End-to-End Neural Models to Spoken Language Understanding in Smart Homes (Apport des modèles neuronaux de bout-en-bout pour la compréhension automatique de la parole dans l'habitat intelligent)

Abstract : Smart speakers make it possible to interact with smart home systems and to issue requests on a wide range of subjects. They are the first ambient voice interfaces to become widely available in home environments. Very often they can only infer voice commands with simple syntax from short utterances, in the context of smart homes that promote home care for senior adults. Such homes support their inhabitants in everyday situations by improving their quality of life and by providing assistance in situations of distress. The design of these smart homes mainly focuses on the safety and comfort of their inhabitants. As a result, research projects in this area frequently concentrate on human activity detection and pay little attention to the communicative aspects of smart home design. Consequently, there are too few speech corpora specific to the home automation field, in particular for languages other than English. However, the availability of such corpora is crucial for developing interactive communication systems between the smart home and its inhabitants. Such corpora could also contribute to the development of a generation of smart speakers capable of extracting more complex voice commands. Part of our work therefore consisted in developing a corpus generator that produces voice commands specific to the home automation domain, automatically annotated with intent and concept labels. A Spoken Language Understanding (SLU) system must extract the intents and concepts from these commands to provide the decision-making module with the information needed to execute them. To react to speech, the natural language understanding (NLU) module is typically preceded by an automatic speech recognition (ASR) module that converts speech into transcriptions.
As several studies have shown, the interaction between ASR and NLU in a sequential SLU approach accumulates errors. Therefore, one of the main motivations of our work is the development of an end-to-end SLU module that extracts concepts and intents directly from speech. To achieve this goal, we first develop a sequential SLU approach as our baseline, in which a classic ASR method generates transcriptions that are passed to the NLU module, and then develop an end-to-end SLU module. The two SLU systems were evaluated on a corpus recorded in the home automation domain. We investigate whether the prosodic information available to the end-to-end SLU system contributes to SLU performance. We also compare the robustness of the two approaches when facing speech with more semantic and syntactic variation. The context of this thesis is the ANR VocADom project.
Contributor : Abes Star
Submitted on : Wednesday, April 7, 2021 - 4:55:18 PM
Last modification on : Thursday, June 3, 2021 - 2:40:44 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03192050, version 1



Thierry Desot. Apport des modèles neuronaux de bout-en-bout pour la compréhension automatique de la parole dans l'habitat intelligent. Réseau de neurones [cs.NE]. Université Grenoble Alpes [2020-..], 2020. Français. ⟨NNT : 2020GRALM069⟩. ⟨tel-03192050⟩


