Ensemble Learning, Comparative Analysis and Further Improvements with Dynamic Ensemble Selection

Anil Narassiguin
Abstract : Ensemble methods have been a very popular research topic during the last decade. Their success arises largely from the fact that they offer an appealing solution to several interesting learning problems, such as improving prediction accuracy, feature selection, metric learning, scaling inductive algorithms to large databases, learning from multiple physically distributed data sets, and learning from concept-drifting data streams. In this thesis, we first present an extensive empirical comparison of nineteen prototypical supervised ensemble learning algorithms that have been proposed in the literature, on various benchmark data sets. We not only compare their performance in terms of standard performance metrics (Accuracy, AUC, RMS) but also analyze their kappa-error diagrams, calibration and bias-variance properties. We then address the problem of improving the performance of ensemble learning approaches with dynamic ensemble selection (DES). Dynamic pruning is the problem of finding, given an input x, a subset of models in the ensemble that achieves the best possible prediction accuracy. The idea behind DES approaches is that different models have different areas of expertise in the instance space. Most methods proposed for this purpose estimate the individual relevance of the base classifiers within a local region of competence, usually given by the nearest neighbours in Euclidean space. We propose and discuss two novel DES approaches. The first, called ST-DES, is designed for decision-tree-based ensemble models. This method prunes the trees using an internal supervised tree-based metric; it is motivated by the fact that in high-dimensional data sets, usual metrics such as the Euclidean distance suffer from the curse of dimensionality. The second approach, called PCC-DES, formulates the DES problem as a multi-label learning task with a specific loss function.
Labels correspond to the base classifiers, and multi-label training examples are formed based on the ability of each classifier to correctly classify each original training example. This allows us to take advantage of recent advances in the area of multi-label learning. PCC-DES works on both homogeneous and heterogeneous ensembles. Its advantage is that it explicitly captures the dependencies between the classifiers' predictions. These algorithms are tested on a variety of benchmark data sets, and the results demonstrate their effectiveness against competitive state-of-the-art alternatives.
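As an illustration of two ideas mentioned in the abstract, the sketch below builds the multi-label targets (one binary label per base classifier, set to 1 when that classifier predicts an example correctly) and then performs a generic nearest-neighbour selection within a local region of competence. It is not ST-DES or PCC-DES as defined in the thesis: the function names, the `k` and `threshold` parameters, the fallback to the full ensemble, and the toy data are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def correctness_matrix(predictions, y_true):
    """Build multi-label targets (hypothetical helper).

    predictions[j][i] is the prediction of base classifier j on example i.
    Returns one row per example; entry j is 1 if classifier j was correct.
    """
    return [
        [int(preds[i] == y) for preds in predictions]
        for i, y in enumerate(y_true)
    ]


def knn_select(x, X_val, labels, k=3, threshold=0.5):
    """Generic KNN-based dynamic selection (assumed baseline, not the thesis method).

    Keeps the classifiers whose accuracy on the k validation points
    nearest to x is at least `threshold`.
    """
    nearest = sorted(range(len(X_val)), key=lambda i: dist(x, X_val[i]))[:k]
    n_clf = len(labels[0])
    competence = [
        sum(labels[i][j] for i in nearest) / len(nearest) for j in range(n_clf)
    ]
    selected = [j for j, c in enumerate(competence) if c >= threshold]
    # Assumption: fall back to the whole ensemble if nothing passes the threshold.
    return selected or list(range(n_clf))


# Toy validation set with three illustrative base classifiers.
X_val = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
y_val = [0, 0, 1, 1]
predictions = [
    [0, 0, 0, 0],  # classifier 0 always predicts class 0
    [1, 1, 1, 1],  # classifier 1 always predicts class 1
    [0, 0, 1, 1],  # classifier 2 is correct everywhere
]
labels = correctness_matrix(predictions, y_val)
near_origin = knn_select((0.05, 0.0), X_val, labels, k=2)
near_corner = knn_select((1.0, 0.95), X_val, labels, k=2)
```

In this toy setting a different subset of classifiers is selected near each cluster, which reflects the idea that different models have different areas of expertise in the instance space. A multi-label learner trained on `labels` would replace the nearest-neighbour step in a formulation along the lines the abstract describes for PCC-DES.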
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-02146962
Contributor : Abes Star
Submitted on : Tuesday, June 4, 2019 - 12:07:10 PM
Last modification on : Wednesday, November 20, 2019 - 3:18:22 AM

File

TH2018NARASSIGUINANIL.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02146962, version 1

Citation

Anil Narassiguin. Ensemble Learning, Comparative Analysis and Further Improvements with Dynamic Ensemble Selection. Artificial Intelligence [cs.AI]. Université de Lyon, 2018. English. ⟨NNT : 2018LYSE1075⟩. ⟨tel-02146962⟩
