
3D tongue motion visualization based on the B-mode ultrasound tongue images

Abstract: A silent speech interface (SSI) is a system that enables speech communication using non-audible signals: sensors capture non-acoustic features for speech recognition and synthesis. Extracting robust articulatory features from such signals, however, remains a challenge. As the tongue is a major component of the vocal tract and the most important articulator during speech production, a realistic 3D simulation of tongue motion can provide a direct, effective visual representation of speech production. This representation could in turn improve the speech-recognition performance of an SSI, or serve as a tool for speech production research and the study of articulation disorders.

In this thesis, we explore a novel 3D tongue visualization framework that combines 2D ultrasound imaging with a 3D physics-based modeling technique. First, several approaches are employed to follow the motion of the tongue in ultrasound image sequences; they fall into two main families: speckle tracking and contour tracking. The speckle-tracking methods include deformable registration, optical flow, and a method based on local invariant features. Moreover, an image-based tracking re-initialization method is proposed to improve the robustness of speckle tracking. Compared with speckle tracking, extracting the contour of the tongue surface from ultrasound images exhibits superior performance and robustness. This thesis therefore presents a novel contour-tracking algorithm for ultrasound tongue image sequences, which can follow the motion of tongue contours over long durations with good robustness. To cope with missing segments, caused by noise or by the tongue midsagittal surface lying parallel to the direction of ultrasound wave propagation, active contours with a contour-similarity constraint are introduced, which provide “prior” shape information.
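The optical-flow variant of speckle tracking mentioned in the abstract can be illustrated with a minimal single-window Lucas-Kanade estimator. This is a sketch of the general technique, not the thesis's implementation; the function name, window size, and synthetic frames are illustrative assumptions.

```python
import numpy as np

def lucas_kanade_point(prev, curr, x, y, win=9):
    """Estimate the (dx, dy) displacement of the window centred on (x, y)
    between two grayscale frames via the classic Lucas-Kanade
    least-squares solution of Ix*dx + Iy*dy = -It."""
    h = win // 2
    Iy, Ix = np.gradient(prev.astype(float))      # spatial gradients
    It = curr.astype(float) - prev.astype(float)  # temporal gradient
    sl = (slice(y - h, y + h + 1), slice(x - h, x + h + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)     # 2-parameter fit
    return v  # (dx, dy)

# Smooth synthetic texture and a copy shifted one pixel to the right.
yy, xx = np.mgrid[0:64, 0:64]
frame0 = np.sin(xx / 5.0) * np.cos(yy / 7.0)
frame1 = np.roll(frame0, 1, axis=1)
dx, dy = lucas_kanade_point(frame0, frame1, 32, 32)
```

On real ultrasound data the speckle pattern decorrelates over time, which is why the abstract's tracking re-initialization step matters: a purely incremental estimator like this one accumulates drift.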
Experiments on synthetic data and on real 60-frames-per-second data from different subjects demonstrate that the proposed method gives good contour tracking for ultrasound image sequences even over durations of minutes, which is useful in applications such as speech recognition, where very long sequences must be analyzed in their entirety.
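The idea of active contours with a contour-similarity constraint can be sketched in a simplified 1-D setting: the tongue surface is a height profile y(x), missing segments are marked NaN, and the contour from a previous frame acts as the “prior” shape. All function names, energy weights, and synthetic data below are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def track_contour(observed, prior, alpha=0.5, beta=1.0, gamma=0.2,
                  step=0.1, n_iter=500):
    """Gradient descent on a snake-like energy:
    data fidelity (where observations exist) + alpha * smoothness
    + gamma * similarity to a prior contour.
    NaNs in `observed` mark missing segments."""
    mask = ~np.isnan(observed)
    y = np.where(mask, observed, prior).astype(float)  # initialise
    for _ in range(n_iter):
        g_data = np.where(mask, y - observed, 0.0)     # pull toward data
        lap = np.zeros_like(y)
        lap[1:-1] = y[:-2] - 2.0 * y[1:-1] + y[2:]     # smoothing force
        y = y + step * (alpha * lap - beta * g_data - gamma * (y - prior))
    return y

# Synthetic tongue-like profile with noise and a missing segment.
x = np.linspace(0.0, np.pi, 50)
y_true = 30.0 + 5.0 * np.sin(x)
rng = np.random.default_rng(1)
observed = y_true + rng.normal(0.0, 0.3, size=50)
observed[20:30] = np.nan      # segment lost to noise / beam angle
prior = y_true.copy()         # e.g. contour from the previous frame
y_fit = track_contour(observed, prior)
```

In the gap the similarity term takes over from the (absent) data term, so the recovered contour falls back on the prior shape instead of drifting, which is the role the abstract assigns to the contour-similarity constraint.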

Cited literature: 56 references
Contributor: ABES STAR
Submitted on: Wednesday, May 31, 2017 - 12:55:12 PM
Last modification on: Friday, May 29, 2020 - 4:00:46 PM
Document(s) archived on: Wednesday, September 6, 2017 - 4:18:52 PM


Version validated by the jury (STAR)


  • HAL Id: tel-01529771, version 1


Kele Xu. 3D tongue motion visualization based on the B-mode ultrasound tongue images. Computer Aided Engineering. Université Pierre et Marie Curie - Paris VI, 2016. English. ⟨NNT : 2016PA066498⟩. ⟨tel-01529771⟩


