Calcul de motifs sous contraintes pour la classification supervisée [Constraint-based pattern mining for supervised classification]

Abstract : Recent advances in local pattern mining (e.g., frequent itemsets or association rules) have proven very useful for classification tasks. This thesis deals with local constraint-based pattern mining and its use in classification problems. We offer methodological contributions for two difficult classification tasks.

The first task concerns attribute noise. When classifiers are trained, the presence of attribute noise can severely harm their performance. Existing methods try to correct noisy attribute values or delete noisy objects, which leads to some information loss. In this thesis, we propose an application-independent method for noise-tolerant feature construction that neither modifies attribute values nor deletes any objects. Our approach has two steps. In the first step, we mine a set of delta-strong characterization rules. These rules have desirable properties, such as a minimal body and redundancy-awareness, and are based on delta-freeness and delta-closedness, both of which have already served as a basis for fault-tolerant patterns and for cluster characterization in noisy data sets. In the second step, we build a new robust numeric descriptor from each extracted rule. The experiments we conducted in noisy environments show that classical classifiers are more accurate on data sets with the new robust features than on the original data, which validates our approach.

The second task concerns class imbalance. When the class distribution is imbalanced, existing pattern-based classification methods show a bias towards the majority class. In this case, accuracy for the majority class is abnormally high, at the expense of poor accuracy for the minority class(es). In this thesis, we explain why and when this bias occurs: existing methods do not take into account the class distribution or the distribution of the errors of mined patterns across the different classes. To overcome this problem, we suggest a new framework and a new pattern type to be mined: One-Versus-Each (OVE) characterization rules. However, in this new framework, several frequency and infrequency thresholds have to be tuned. Therefore, in addition to an extraction algorithm for OVE-characterization rule mining, we suggest fitcare, an optimization algorithm for automatic parameter tuning. Experiments on imbalanced multi-class data sets show that fitcare is significantly more accurate on minority class prediction than existing approaches. Applying our OVE framework to a soil erosion data analysis scenario shows the added value of our proposal: it provides a soil erosion characterization validated by domain experts.
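The majority-class bias described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the toy labels, the class names `maj` and `min`, and the helper `per_class_accuracy` are hypothetical, chosen only to show how a high overall accuracy can hide poor accuracy on a minority class.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Return {class: fraction of that class's objects predicted correctly}."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical toy data: 90 majority-class objects and 10 minority-class objects.
y_true = ["maj"] * 90 + ["min"] * 10

# A classifier biased towards the majority class: every majority object is
# predicted correctly, but only 2 of the 10 minority objects are.
y_pred = ["maj"] * 90 + ["min"] * 2 + ["maj"] * 8

# Overall accuracy pools all objects, so the majority class dominates it.
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
by_class = per_class_accuracy(y_true, y_pred)

print(f"overall accuracy: {overall:.2f}")
for label, acc in by_class.items():
    print(f"accuracy on class {label}: {acc:.2f}")
```

Reporting accuracy per class, rather than one pooled figure, is what makes the imbalance visible in this sketch: the pooled figure is high even though most minority objects are misclassified.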
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-00516706
Contributor : Dominique Gay
Submitted on : Friday, September 10, 2010 - 6:23:25 PM
Last modification on : Wednesday, November 20, 2019 - 7:10:35 AM
Long-term archiving on : Saturday, December 11, 2010 - 2:56:16 AM

Identifiers

  • HAL Id : tel-00516706, version 1

Citation

Dominique Gay. Calcul de motifs sous contraintes pour la classification supervisée. Human-Computer Interaction [cs.HC]. Université de Nouvelle Calédonie; INSA de Lyon, 2009. French. ⟨tel-00516706⟩
