Von Mises-Fisher mixture models (Modèles de mélange de von Mises-Fisher)

Abstract : Directional data now arise in most fields, in many forms and often in large sizes and high dimensions, hence the need for effective methods to analyse them. For clustering, the probabilistic approach has become classic. It rests on a simple idea: since the g classes differ from one another, each class is assumed to follow a probability distribution whose parameters generally differ from class to class. This is the mixture modelling framework: the data are treated as a sample of a d-dimensional random variable whose density is a mixture of g probability distributions, one per class. This thesis addresses the clustering of directional data with the classification methods best suited to this setting, considering both the geometric and the probabilistic approach. For the geometric approach, several k-means-like algorithms are explored. In the probabilistic approach, the mixture parameters are estimated directly by maximizing the log-likelihood and the partition is deduced from them; this approach is represented by the EM algorithm. Here, mixtures of von Mises-Fisher (vMF) distributions are used and variants of the EM algorithm are proposed: EMvMF, CEMvMF, SEMvMF and SAEMvMF. In the same context, the choice of the number of mixture components and of the model is discussed using several information criteria {Bic, Aic, Aic3, Aic4, AICC, AICU, CAIC, Clc, Icl-Bic, LI, Icl, Awe}. The study concludes with a comparison of the vMF model with a simpler exponential model, in which all data are assumed to lie on a hypersphere of a predetermined radius greater than one, rather than on the unit hypersphere as in the vMF model.
An improvement of this method, based on a radius-estimation step in the NEMρ algorithm, is proposed; in most of our applications it yielded the best partitions. The NCEMρ and NSEMρ algorithms were also developed. The proposed algorithms were run on a variety of textual data, genetic data and data simulated from the vMF model; these applications gave a better understanding of the approaches studied throughout the thesis.
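
To make the probabilistic approach in the abstract concrete, here is a minimal sketch of EM for a mixture of g von Mises-Fisher components. It is not the thesis's EMvMF code. It rests on several assumptions of ours: the data lie on the unit sphere in R^3 (so the vMF normalising constant has a closed form), the concentration is updated with a commonly used closed-form approximation rather than an exact solve, the initial centres come from a farthest-point heuristic, and BIC is written in the "lower is better" convention. The names em_vmf_3d, log_joint and bic are hypothetical.

```python
import numpy as np


def log_joint(X, weights, mu, kappa):
    """(n, g) matrix of log(weight_k) + log f(x_i; mu_k, kappa_k), d = 3 only."""
    # For d = 3 the vMF normaliser is kappa / (4*pi*sinh(kappa)); the line
    # below is the same quantity, rearranged to stay finite for large kappa.
    log_c = np.log(kappa) - np.log(2 * np.pi) - kappa - np.log1p(-np.exp(-2 * kappa))
    return np.log(weights) + log_c + (X @ mu.T) * kappa


def em_vmf_3d(X, g, n_iter=50, seed=0):
    """EM for a g-component vMF mixture on unit vectors X of shape (n, 3)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumed initialisation: one random point, then repeatedly the point
    # least similar (largest angle) to the centres chosen so far.
    centres = [X[rng.integers(n)]]
    for _ in range(1, g):
        sim = np.max(X @ np.array(centres).T, axis=1)
        centres.append(X[np.argmin(sim)])
    mu = np.array(centres)
    kappa = np.ones(g)
    weights = np.full(g, 1.0 / g)

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        lj = log_joint(X, weights, mu, kappa)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        # M-step: proportions, mean directions, concentrations.
        nk = resp.sum(axis=0)
        weights = nk / n
        r = resp.T @ X                      # weighted resultant vectors, (g, d)
        r_norm = np.linalg.norm(r, axis=1)
        mu = r / r_norm[:, None]
        r_bar = np.minimum(r_norm / nk, 1 - 1e-8)   # mean resultant length
        # Assumed approximation: kappa ~ r_bar * (d - r_bar^2) / (1 - r_bar^2).
        kappa = r_bar * (d - r_bar**2) / (1 - r_bar**2)

    lj = log_joint(X, weights, mu, kappa)
    loglik = np.logaddexp.reduce(lj, axis=1).sum()
    labels = lj.argmax(axis=1)              # partition deduced from the fit
    return weights, mu, kappa, labels, loglik


def bic(loglik, n, g, d=3):
    """BIC with (g-1) proportions, g*(d-1) direction and g concentration parameters."""
    n_params = (g - 1) + g * (d - 1) + g
    return -2.0 * loglik + n_params * np.log(n)
```

A typical use would be to fit em_vmf_3d for several values of g on the same unit-normalised data and keep the g with the smallest bic value. The sketch has no safeguard against empty components, and other dimensions would need a Bessel-function normaliser in place of the closed form used here.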
Document type : Thesis

https://tel.archives-ouvertes.fr/tel-00987196
Contributor : Abes Star
Submitted on : Monday, May 5, 2014 - 4:07:26 PM
Last modification on : Thursday, April 11, 2019 - 4:02:18 PM
Long-term archiving on : Tuesday, August 5, 2014 - 12:50:11 PM

File

va_Bouberima_Wafia.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-00987196, version 1

Citation

Wafia Parr Bouberima. Modèles de mélange de von Mises-Fisher. General Mathematics [math.GM]. Université René Descartes - Paris V, 2013. In French. ⟨NNT : 2013PA05S028⟩. ⟨tel-00987196⟩
