
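Evaluation d'une mesure de similitude en classification supervisée : application à la préparation de données séquentielles
(Evaluating a similarity measure in supervised classification: application to the preparation of sequential data)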

Sylvain Ferrandiz¹
¹ CODAG team, GREYC laboratory, UMR 6072
GREYC: Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen
Abstract: In the data mining process, most of the data preparation step is devoted to
feature construction and selection. The commonly adopted filter approach requires methods
for evaluating any kind of feature. We address the problem of the supervised evaluation of
a sequential feature. We show that this problem can be solved by tackling a more general
one: the supervised evaluation of a similarity measure.

We propose such an evaluation method. We first recast the problem as the search for
a discriminating Voronoi partition. We then define a new supervised criterion for evaluating
such partitions and design a new optimised algorithm to search for them. The criterion
automatically prevents overfitting, and the algorithm quickly finds a good solution.
Overall, the method can be interpreted as a robust non-parametric method for estimating
the conditional density of a categorical target feature given a similarity measure defined
on a descriptive feature.
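As a rough illustration only, the following sketch shows what "a Voronoi partition induced by a similarity measure" and "a per-cell conditional estimate of the target" can look like. It is not the thesis's criterion or algorithm: it assumes a fixed set of prototype objects and an arbitrary dissimilarity function, assigns each object to its nearest prototype, and estimates P(class | cell) with Laplace smoothing. The function names, the Euclidean dissimilarity, the smoothing parameter `alpha` and the toy data are all hypothetical choices made for the example.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def voronoi_cells(objects: Sequence[T], prototypes: Sequence[T],
                  dissim: Callable[[T, T], float]) -> List[int]:
    """Index of the nearest prototype for each object (ties go to the lowest index)."""
    cells = []
    for x in objects:
        dists = [dissim(x, p) for p in prototypes]
        cells.append(min(range(len(prototypes)), key=dists.__getitem__))
    return cells


def conditional_estimates(cells: Sequence[int], labels: Sequence[Hashable],
                          classes: Sequence[Hashable], n_cells: int,
                          alpha: float = 1.0) -> List[Dict[Hashable, float]]:
    """Laplace-smoothed estimate of P(class | cell) for every Voronoi cell (alpha > 0)."""
    counts = [Counter() for _ in range(n_cells)]
    for c, y in zip(cells, labels):
        counts[c][y] += 1
    estimates = []
    for cnt in counts:
        denom = sum(cnt.values()) + alpha * len(classes)
        estimates.append({k: (cnt[k] + alpha) / denom for k in classes})
    return estimates


def log_likelihood(cells: Sequence[int], labels: Sequence[Hashable],
                   estimates: List[Dict[Hashable, float]]) -> float:
    """Log-likelihood of the labels under the per-cell estimates."""
    return sum(math.log(estimates[c][y]) for c, y in zip(cells, labels))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical dissimilarity between two equal-length numeric series."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


if __name__ == "__main__":
    # Toy series (e.g. activity over three time segments) with a binary target.
    series = [(0, 1, 0), (0, 2, 1), (5, 5, 4), (6, 4, 5)]
    labels = ["low", "low", "high", "high"]
    prototypes = [series[0], series[2]]
    cells = voronoi_cells(series, prototypes, euclidean)
    est = conditional_estimates(cells, labels, ["low", "high"], len(prototypes))
    print(cells)          # [0, 0, 1, 1]
    print(est[0]["low"])  # 0.75
```

Comparing partitions by this smoothed log-likelihood alone tends to reward adding more prototypes, which is precisely the overfitting risk that the supervised criterion described above is designed to control automatically.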

The method is evaluated experimentally on many datasets. It helps answer questions such as:
which day of the week, or which hourly time segment, is most relevant for discriminating
customers based on their call detail records? Which series gives the best estimate of
customer need for a new service?
Document type: Thesis

Cited literature: 57 references

https://tel.archives-ouvertes.fr/tel-00123406
Contributor: Hal System
Submitted on: Tuesday, January 9, 2007 - 3:26:23 PM
Last modification on: Tuesday, February 5, 2019 - 12:12:41 PM
Long-term archiving on: Tuesday, April 6, 2010 - 9:50:23 PM

Identifiers

  • HAL Id: tel-00123406, version 1

Citation

Sylvain Ferrandiz. Evaluation d'une mesure de similitude en classification supervisée : application à la préparation de données séquentielles. Computer Science [cs]. Université de Caen, 2006. French. ⟨tel-00123406⟩


Metrics

Record views: 264
File downloads: 614