
Apprentissage probabiliste de similarités d'édition (Probabilistic learning of edit similarities)

Abstract : Many applications in computer science rely on distances. For structured data such as strings or trees, the most common choice is the edit distance, defined as the minimum number of edit operations (insertion, deletion and substitution) needed to transform one object into another. Depending on the application, the edit distance can be tuned by assigning a weight to each edit operation. In this work, we use a supervised machine learning approach to learn the weights of the edit operations. The algorithm we use, Expectation-Maximisation, is a method for finding maximum likelihood estimates of the parameters of a model given a learning sample of pairs of similar examples. The first contribution extends earlier work on strings to trees. The model is represented by a transducer with a single state. We successfully apply our method to a handwritten character recognition task. In the last part, we introduce a new model on strings under constraints. The model consists of a finite set of states whose transitions are constrained. A constraint is a finite set of Boolean functions defined over an input string and one of its positions. We show the relevance of our approach on a molecular biology task: detecting Transcription Factor Binding Sites in DNA sequences.
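To make the notion of a weighted edit distance concrete, here is a minimal Python sketch of the classical dynamic-programming computation on strings. It is an illustration only, not the thesis's learning algorithm: the function name `weighted_edit_distance` and the cost functions `ins_cost`, `del_cost` and `sub_cost` are hypothetical, and the weights are fixed by hand rather than learned with Expectation-Maximisation.

```python
from typing import Callable


def weighted_edit_distance(
    x: str,
    y: str,
    ins_cost: Callable[[str], float],
    del_cost: Callable[[str], float],
    sub_cost: Callable[[str, str], float],
) -> float:
    """Minimum total cost of edit operations turning x into y.

    The cost functions are assumptions made for this sketch; in the
    setting of the abstract they would be replaced by learned weights.
    """
    n, m = len(x), len(y)
    # d[i][j] = cheapest way to turn x[:i] into y[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost(x[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost(y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost(x[i - 1]),                # delete x[i-1]
                d[i][j - 1] + ins_cost(y[j - 1]),                # insert y[j-1]
                d[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1]),  # substitute or match
            )
    return d[n][m]


def unit_cost(_symbol: str) -> float:
    """Insertion or deletion cost of 1 for every symbol."""
    return 1.0


def unit_sub(a: str, b: str) -> float:
    """Substitution cost: 0 for a match, 1 otherwise."""
    return 0.0 if a == b else 1.0


def cheap_sub(a: str, b: str) -> float:
    """Hand-picked example weighting: substitutions cost half as much."""
    return 0.0 if a == b else 0.5


unit_distance = weighted_edit_distance("kitten", "sitting", unit_cost, unit_cost, unit_sub)
tuned_distance = weighted_edit_distance("kitten", "sitting", unit_cost, unit_cost, cheap_sub)
```

With unit costs this reduces to the standard edit distance; changing the cost functions changes which alignments are preferred, which is the degree of freedom that weight learning exploits.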

Cited literature [73 references]
Contributor : Abes Star
Submitted on : Wednesday, July 18, 2012 - 12:12:10 PM
Last modification on : Monday, January 13, 2020 - 5:46:02 PM
Long-term archiving on : Friday, October 19, 2012 - 2:41:19 AM


Version validated by the jury (STAR)


  • HAL Id : tel-00718835, version 1


Laurent Boyer. Apprentissage probabiliste de similarités d'édition. Other [cs.OH]. Université Jean Monnet - Saint-Etienne, 2011. French. ⟨NNT : 2011STET4027⟩. ⟨tel-00718835⟩


