Probabilistic Learning of Edit Similarities (Apprentissage probabiliste de similarités d'édition)

Abstract : In computer science, many applications rely on distances. For structured data such as strings or trees, the main one used is the edit distance. It is defined as the minimum number of edit operations (insertions, deletions and substitutions) needed to transform one object into the other. Depending on the application, the edit distance can be tuned by assigning a weight to each edit operation. In this work, we use a supervised machine learning approach to learn the weights of the edit operations. The algorithm we use, Expectation-Maximisation (EM), finds maximum-likelihood estimates of the parameters of a model from a learning sample of pairs of similar examples.

The first contribution extends earlier work on strings to trees. The model is represented by a single-state transducer. We successfully apply our method to a handwritten character recognition task.

In the last part, we introduce a new model on strings under constraints. The model consists of a finite set of states whose transitions are constrained, where a constraint is a finite set of Boolean functions defined over an input string and one of its positions. We show the relevance of our approach on a molecular biology task: detecting Transcription Factor Binding Sites in DNA sequences.
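The abstract gives no algorithmic detail, so the following is only a sketch under stated assumptions of how EM can learn edit-operation weights from pairs of similar examples. It uses a joint, single-state (memoryless) stochastic edit model on strings, in the spirit of Ristad and Yianilos's stochastic edit distance. The thesis itself works on trees and with constrained multi-state models, and its exact parameterisation (for instance a conditional rather than a joint distribution) may differ. All names (`edit_ops`, `forward`, `backward`, `em_step`, `END`) and the toy pairs are hypothetical.

```python
import math
from collections import defaultdict

EPS = ""          # empty symbol: (a, EPS) is a deletion, (EPS, b) an insertion
END = ("#", "#")  # hypothetical key for the termination event


def edit_ops(alphabet):
    """All operations of a single-state (memoryless) edit transducer."""
    ops = [(a, b) for a in alphabet for b in alphabet]  # substitutions, incl. matches
    ops += [(a, EPS) for a in alphabet]                  # deletions
    ops += [(EPS, b) for b in alphabet]                  # insertions
    return ops + [END]


def forward(x, y, delta):
    """alpha[t][v] = probability of generating the prefixes x[:t] and y[:v]."""
    T, V = len(x), len(y)
    alpha = [[0.0] * (V + 1) for _ in range(T + 1)]
    alpha[0][0] = 1.0
    for t in range(T + 1):
        for v in range(V + 1):
            if t > 0:
                alpha[t][v] += delta[(x[t - 1], EPS)] * alpha[t - 1][v]
            if v > 0:
                alpha[t][v] += delta[(EPS, y[v - 1])] * alpha[t][v - 1]
            if t > 0 and v > 0:
                alpha[t][v] += delta[(x[t - 1], y[v - 1])] * alpha[t - 1][v - 1]
    return alpha


def backward(x, y, delta):
    """beta[t][v] = probability of generating the suffixes x[t:], y[v:], then END."""
    T, V = len(x), len(y)
    beta = [[0.0] * (V + 1) for _ in range(T + 1)]
    beta[T][V] = delta[END]
    for t in range(T, -1, -1):
        for v in range(V, -1, -1):
            if t < T:
                beta[t][v] += delta[(x[t], EPS)] * beta[t + 1][v]
            if v < V:
                beta[t][v] += delta[(EPS, y[v])] * beta[t][v + 1]
            if t < T and v < V:
                beta[t][v] += delta[(x[t], y[v])] * beta[t + 1][v + 1]
    return beta


def em_step(pairs, delta):
    """One EM iteration: returns the re-estimated delta and the
    log-likelihood of the pairs under the *input* delta."""
    counts = defaultdict(float)
    loglik = 0.0
    for x, y in pairs:
        alpha, beta = forward(x, y, delta), backward(x, y, delta)
        p = alpha[len(x)][len(y)] * delta[END]  # joint probability p(x, y)
        loglik += math.log(p)
        # E-step: expected number of times each operation is used.
        for t in range(len(x) + 1):
            for v in range(len(y) + 1):
                if t > 0:
                    op = (x[t - 1], EPS)
                    counts[op] += alpha[t - 1][v] * delta[op] * beta[t][v] / p
                if v > 0:
                    op = (EPS, y[v - 1])
                    counts[op] += alpha[t][v - 1] * delta[op] * beta[t][v] / p
                if t > 0 and v > 0:
                    op = (x[t - 1], y[v - 1])
                    counts[op] += alpha[t - 1][v - 1] * delta[op] * beta[t][v] / p
        counts[END] += 1.0  # every pair terminates exactly once
    # M-step: normalise the expected counts into a probability distribution.
    total = sum(counts.values())
    return {op: counts[op] / total for op in delta}, loglik


# Toy learning sample of "similar" pairs (made up for illustration).
pairs = [("abc", "abd"), ("ab", "ab"), ("ba", "b")]
ops = edit_ops("abcd")
delta = {op: 1.0 / len(ops) for op in ops}  # uniform initialisation
history = []
for _ in range(20):
    delta, ll = em_step(pairs, delta)
    history.append(ll)
```

After training, `-math.log(forward(x, y, delta)[len(x)][len(y)] * delta[END])` can serve as a learned edit dissimilarity. Operations that the training pairs never require (such as `("c", "c")` here) end up with probability zero, so a practical implementation would likely smooth the expected counts.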
Document type : Thesis

Cited literature : 73 references

https://tel.archives-ouvertes.fr/tel-00718835
Contributor : Abes Star
Submitted on : Wednesday, July 18, 2012 - 12:12:10 PM
Last modification on : Monday, January 13, 2020 - 5:46:02 PM
Long-term archiving on : Friday, October 19, 2012 - 2:41:19 AM

File

Manuscrit_LaurentBoyer1.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-00718835, version 1

Citation

Laurent Boyer. Apprentissage probabiliste de similarités d'édition. Other [cs.OH]. Université Jean Monnet - Saint-Etienne, 2011. French. ⟨NNT : 2011STET4027⟩. ⟨tel-00718835⟩

Metrics

Record views : 559
File downloads : 1064