Mixture models for clustering and dimension reduction

Abstract : In Chapter 1 we give a general introduction and motivate the need for clustering and dimension reduction methods. We continue in Chapther 2 with a review of different types of existing clustering and dimension reduction methods.

In Chapter 3 we introduce mixture densities and the expectation-maximization (EM) algorithm to estimate their parameters. Although the EM algorithm has many attractive properties, it is not guaranteed to return optimal parameter estimates. We present greedy EM parameter estimation algorithms which start with a one-component mixture and then iteratively add a component to the mixture and re-estimate the parameters of the current mixture. Experimentally, we demonstrate that our algorithms avoid many of the sub-optimal estimates returned by the EM algorithm. Finally, we present an approach to accelerate mixture densities estimation from many data points. We apply this approach to both the standard EM algorithm and our greedy EM algorithm.

In Chapter 4 we present a non-linear dimension reduction method that uses a constrained EM algorithm for parameter estimation. Our approach is similar to Kohonen's self-organizing map, but in contrast to the self-organizing map, our parameter estimation algorithm is guaranteed to converge and optimizes a well-defined objective function. In addition, our method allows data with missing values to be used for parameter estimation and it is readily applied to data that is not specified by real numbers but for example by discrete variables. We present the results of several experiments to demonstrate our method and to compare it with Kohonen's self-organizing map.

In Chapter 5 we consider an approach for non-linear dimension reduction which is based on a combination of clustering and linear dimension reduction. This approach forms one global non-linear low dimensional data representation by combining multiple, locally valid, linear low dimensional representations. We derive an improvement of the original parameter estimation algorithm, which requires less computation and leads to better parameter estimates. We experimentally compare this approach to several other dimension reduction methods. We also apply this approach to a setting where high dimensional 'outputs' have to be predicted from high dimensional 'inputs'. Experimentally, we show that the considered non-linear approach leads to better predictions than a similar approach which also combines several local linear representations, but does not combine them into one global non-linear representation.

In Chapter 6 we summarize our conclusions and discuss directions for further research.
Document type :
Theses
Complete list of metadatas

Cited literature [175 references]  Display  Hide  Download


https://tel.archives-ouvertes.fr/tel-00321484
Contributor : Jakob Verbeek <>
Submitted on : Tuesday, April 5, 2011 - 2:51:29 PM
Last modification on : Monday, September 25, 2017 - 10:08:04 AM
Long-term archiving on : Wednesday, July 6, 2011 - 2:56:43 AM

Identifiers

  • HAL Id : tel-00321484, version 2

Citation

Jakob Verbeek. Mixture models for clustering and dimension reduction. Computer Science [cs]. Universiteit van Amsterdam, 2004. English. ⟨tel-00321484v2⟩

Share

Metrics

Record views

745

Files downloads

3072