Mixture models for clustering and dimension reduction

Abstract: In Chapter 1 we give a general introduction and motivate the need for clustering and dimension reduction methods. We continue in Chapter 2 with a review of different types of existing clustering and dimension reduction methods.

In Chapter 3 we introduce mixture densities and the expectation-maximization (EM) algorithm to estimate their parameters. Although the EM algorithm has many attractive properties, it is not guaranteed to return optimal parameter estimates. We present greedy EM parameter estimation algorithms which start with a one-component mixture and then iteratively add a component to the mixture and re-estimate the parameters of the current mixture. Experimentally, we demonstrate that our algorithms avoid many of the sub-optimal estimates returned by the EM algorithm. Finally, we present an approach to accelerate mixture density estimation from large data sets. We apply this approach to both the standard EM algorithm and our greedy EM algorithm.
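
As a minimal sketch of the greedy idea described above (not the thesis's actual algorithm), the following pure-Python example fits a one-dimensional Gaussian mixture with EM and grows it one component at a time. The function names (`em`, `greedy_em`), the variance floor `min_var`, and the insertion heuristic (placing each new component at the data point with the lowest density under the current mixture) are illustrative assumptions; Chapter 3 may select and initialize new components differently.

```python
import math


def gauss(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def log_likelihood(data, comps):
    """Log-likelihood of data under a mixture given as (weight, mean, var) tuples."""
    return sum(math.log(sum(w * gauss(x, mu, v) for w, mu, v in comps))
               for x in data)


def em(data, comps, iters=100, min_var=1e-3):
    """Run a fixed number of EM iterations starting from the mixture comps."""
    n = len(data)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * gauss(x, mu, v) for w, mu, v in comps]
            s = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weight, mean and variance of each component.
        new_comps = []
        for k in range(len(comps)):
            nk = max(sum(r[k] for r in resp), 1e-12)
            mu = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, data)) / nk
            # min_var is an assumed floor that prevents collapsing components.
            new_comps.append((nk / n, mu, max(var, min_var)))
        comps = new_comps
    return comps


def greedy_em(data, max_k):
    """Start with one component, then repeatedly insert one and re-run EM."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    comps = [(1.0, mean, var)]  # maximum-likelihood one-component mixture
    while len(comps) < max_k:
        # Assumed heuristic: seed the new component at the worst-explained point.
        seed = min(data, key=lambda x: sum(w * gauss(x, mu, v)
                                           for w, mu, v in comps))
        alpha = 1.0 / (len(comps) + 1)  # mixing weight given to the newcomer
        comps = [(w * (1 - alpha), mu, v) for w, mu, v in comps]
        comps.append((alpha, seed, var / len(comps)))
        comps = em(data, comps)  # re-estimate all parameters jointly
    return comps


if __name__ == "__main__":
    # Two well-separated synthetic clusters around -5 and +5.
    points = [c + 0.1 * i for c in (-5.0, 5.0) for i in range(-10, 11)]
    for weight, mu, v in greedy_em(points, 2):
        print(f"weight={weight:.3f} mean={mu:.3f} var={v:.3f}")
```

Because each stage starts from the fitted mixture of the previous stage, the search is less dependent on a single random initialization than running plain EM once with a fixed number of components, which is the motivation stated in the abstract.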

In Chapter 4 we present a non-linear dimension reduction method that uses a constrained EM algorithm for parameter estimation. Our approach is similar to Kohonen's self-organizing map, but in contrast to the self-organizing map, our parameter estimation algorithm is guaranteed to converge and optimizes a well-defined objective function. In addition, our method allows data with missing values to be used for parameter estimation, and it is readily applied to data that are not real-valued, for example data described by discrete variables. We present the results of several experiments to demonstrate our method and to compare it with Kohonen's self-organizing map.

In Chapter 5 we consider an approach for non-linear dimension reduction which is based on a combination of clustering and linear dimension reduction. This approach forms one global non-linear low dimensional data representation by combining multiple, locally valid, linear low dimensional representations. We derive an improvement of the original parameter estimation algorithm, which requires less computation and leads to better parameter estimates. We experimentally compare this approach to several other dimension reduction methods. We also apply this approach to a setting where high dimensional 'outputs' have to be predicted from high dimensional 'inputs'. Experimentally, we show that the considered non-linear approach leads to better predictions than a similar approach which also combines several local linear representations, but does not combine them into one global non-linear representation.

In Chapter 6 we summarize our conclusions and discuss directions for further research.
Document type:
Thesis
Computer Science [cs]. Universiteit van Amsterdam, 2004. English

Cited literature [175 references]


https://tel.archives-ouvertes.fr/tel-00321484
Contributor: Jakob Verbeek
Submitted on: Tuesday, April 5, 2011 - 14:51:29
Last modified on: Monday, September 25, 2017 - 10:08:04
Document(s) archived on: Wednesday, July 6, 2011 - 02:56:43

Identifiers

  • HAL Id: tel-00321484, version 2

Citation

Jakob Verbeek. Mixture models for clustering and dimension reduction. Computer Science [cs]. Universiteit van Amsterdam, 2004. English. 〈tel-00321484v2〉

Metrics

Record views

736

File downloads

2419