
Ensemble Learning, Comparative Analysis and Further Improvements with Dynamic Ensemble Selection

Anil Narassiguin
Abstract : Ensemble methods have been a very popular research topic during the last decade. Their success arises largely from the fact that they offer an appealing solution to several learning problems, such as improving prediction accuracy, feature selection, metric learning, scaling inductive algorithms to large databases, learning from multiple physically distributed data sets, and learning from concept-drifting data streams. In this thesis, we first present an extensive empirical comparison of nineteen prototypical supervised ensemble learning algorithms proposed in the literature, on various benchmark data sets. We not only compare their performance in terms of standard metrics (accuracy, AUC, RMS) but also analyze their kappa-error diagrams, calibration and bias-variance properties.

We then address the problem of improving the performance of ensemble learning approaches with dynamic ensemble selection (DES). Dynamic pruning is the problem of finding, for a given input x, a subset of models in the ensemble that achieves the best possible prediction accuracy. The idea behind DES approaches is that different models have different areas of expertise in the instance space. Most methods proposed for this purpose estimate the individual relevance of the base classifiers within a local region of competence, usually given by the nearest neighbours of x in Euclidean space.

We propose and discuss two novel DES approaches. The first, called ST-DES, is designed for ensembles of decision trees. It prunes the trees using an internal supervised tree-based metric, motivated by the fact that in high-dimensional data sets, usual metrics such as the Euclidean distance suffer from the curse of dimensionality. The second approach, called PCC-DES, formulates the DES problem as a multi-label learning task with a specific loss function. Labels correspond to the base classifiers, and multi-label training examples are formed based on the ability of each classifier to correctly classify each original training example. This allows us to take advantage of recent advances in multi-label learning. PCC-DES works on both homogeneous and heterogeneous ensembles, and its advantage is that it explicitly captures the dependencies between the classifiers' predictions. These algorithms are tested on a variety of benchmark data sets, and the results demonstrate their effectiveness against competitive state-of-the-art alternatives.
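The kappa-error diagrams mentioned in the abstract plot, for every pair of base classifiers, their agreement (Cohen's kappa) against their average error. The following is a minimal sketch in plain Python of how those points could be computed. It assumes predictions are given as lists of class labels; the function names are hypothetical and not taken from the thesis.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(pred_a, pred_b):
    """Cohen's kappa: agreement between two prediction lists, corrected for chance."""
    n = len(pred_a)
    observed = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    count_a, count_b = Counter(pred_a), Counter(pred_b)
    # Agreement expected if both classifiers labelled independently
    # according to their own marginal class frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both always predict the same single class
    return (observed - expected) / (1.0 - expected)


def error_rate(pred, y):
    return sum(p != t for p, t in zip(pred, y)) / len(y)


def kappa_error_points(predictions, y):
    """One (i, j, kappa, mean error) point per pair of base classifiers."""
    points = []
    for i, j in combinations(range(len(predictions)), 2):
        kappa = cohen_kappa(predictions[i], predictions[j])
        mean_err = (error_rate(predictions[i], y) + error_rate(predictions[j], y)) / 2
        points.append((i, j, kappa, mean_err))
    return points


y = [0, 1, 0, 1]
predictions = [
    [0, 1, 0, 1],  # perfect classifier
    [0, 1, 1, 1],  # one mistake
    [1, 0, 1, 0],  # always wrong
]
for point in kappa_error_points(predictions, y):
    print(point)
```

Pairs that land in the lower-left region of the diagram (low kappa, low mean error) are both diverse and accurate, which is the combination ensemble methods try to obtain.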
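The generic DES scheme described in the abstract (a local region of competence given by Euclidean nearest neighbours) can be sketched as follows. This is neither ST-DES nor PCC-DES; it is a simple baseline in the spirit of local-accuracy methods, assuming base classifiers are plain Python callables and that a held-out validation set (`X_val`, `y_val`, hypothetical names) is available. The selection rule used here (keep every classifier tied for the best local accuracy, then take a majority vote) is one illustrative choice among several.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def des_predict(x, classifiers, X_val, y_val, k=3):
    """Predict x using only the classifiers that are most accurate near x."""
    # Region of competence: the k validation points closest to x.
    region = sorted(range(len(X_val)), key=lambda i: euclidean(x, X_val[i]))[:k]
    # Local accuracy of each base classifier inside that region.
    local_acc = [
        sum(clf(X_val[i]) == y_val[i] for i in region) / len(region)
        for clf in classifiers
    ]
    best = max(local_acc)
    # Keep every classifier tied for the best local accuracy, then vote.
    selected = [clf for clf, acc in zip(classifiers, local_acc) if acc == best]
    return Counter(clf(x) for clf in selected).most_common(1)[0][0]


# Toy 1-D problem: the true label is 1 at both ends of the range, 0 in the middle.
experts = [
    lambda p: 1 if p[0] < 2 else 0,  # competent on the left
    lambda p: 1 if p[0] > 8 else 0,  # competent on the right
    lambda p: 0,                     # constant classifier
]
X_val = [[0], [1], [3], [4], [6], [7], [9], [10]]
y_val = [1, 1, 0, 0, 0, 0, 1, 1]

print(des_predict([0.5], experts, X_val, y_val))  # only the left expert is selected
print(des_predict([9.5], experts, X_val, y_val))  # only the right expert is selected
```

Both calls return 1, whereas a static majority vote of all three classifiers would predict 0 at x = 0.5: selecting the locally competent model is what lets the ensemble exploit each model's area of expertise.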
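ST-DES replaces the Euclidean neighbourhood with a supervised metric derived from the trees themselves. The abstract does not spell that metric out, so the sketch below uses a well-known stand-in, Breiman's random-forest proximity (the fraction of trees in which two instances land in the same leaf). Treat it as an illustration of the idea, not as the ST-DES metric. Each tree is assumed to be exposed as a hypothetical function mapping an instance to a leaf identifier.

```python
def tree_proximity(a, b, leaf_fns):
    """Fraction of trees in which a and b fall into the same leaf."""
    return sum(leaf(a) == leaf(b) for leaf in leaf_fns) / len(leaf_fns)


def tree_neighbours(x, X_val, leaf_fns, k):
    """Indices of the k validation points with the highest proximity to x."""
    return sorted(range(len(X_val)),
                  key=lambda i: -tree_proximity(x, X_val[i], leaf_fns))[:k]


# Three decision stumps that only ever split on feature 0; feature 1 is noise.
# Each stump's two leaves are identified by True / False.
leaf_fns = [
    lambda p: p[0] < 0.3,
    lambda p: p[0] < 0.5,
    lambda p: p[0] < 0.7,
]
X_val = [[0.1, 100.0], [0.9, 0.0]]
x = [0.2, 0.0]

# The Euclidean nearest neighbour of x would be index 1 (distance 0.7),
# but the trees, which ignore the noisy feature, pick index 0.
print(tree_neighbours(x, X_val, leaf_fns, k=1))
```

Because the proximity only reflects splits the trees actually learned from labelled data, features the trees ignore cannot distort the neighbourhood, which is the kind of robustness to high dimensionality the abstract motivates.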
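The multi-label reformulation behind PCC-DES starts from a competence matrix: one row per training example and one binary label per base classifier, set to 1 when that classifier predicts the example correctly. The sketch below builds that matrix and shows how a predicted label vector could then be used to select and combine classifiers. The multi-label model itself (presumably a probabilistic classifier chain, given the PCC name, trained with the thesis's specific loss) is left out; the function names and the fallback to the full ensemble when no classifier is selected are assumptions for illustration.

```python
from collections import Counter


def competence_labels(classifiers, X, y):
    """Row i, column m is 1 if base classifier m predicts y[i] correctly, else 0."""
    return [[int(clf(x) == t) for clf in classifiers] for x, t in zip(X, y)]


def vote_selected(x, classifiers, label_vector):
    """Majority vote among the classifiers whose predicted competence label is 1."""
    selected = [clf for clf, keep in zip(classifiers, label_vector) if keep]
    if not selected:
        selected = classifiers  # assumption: fall back to the full ensemble
    return Counter(clf(x) for clf in selected).most_common(1)[0][0]


classifiers = [
    lambda p: int(p[0] > 0),  # threshold at zero
    lambda p: 1,              # always predicts 1
    lambda p: 0,              # always predicts 0
]
X = [[-1.0], [2.0], [3.0]]
y = [0, 1, 1]

# Multi-label training targets, one column per base classifier.
Y = competence_labels(classifiers, X, y)
print(Y)

# At prediction time a multi-label model (not shown) would output a label
# vector for x; here one is supplied by hand.
print(vote_selected([5.0], classifiers, [0, 0, 1]))
```

Each row is a joint label vector, so a multi-label learner that models label dependencies can exploit structure such as the fact that, in this toy matrix, the second and third classifiers are never both correct on the same example.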
Contributor : ABES STAR
Submitted on : Tuesday, June 4, 2019 - 12:07:10 PM
Last modification on : Tuesday, April 19, 2022 - 10:10:41 AM


Version validated by the jury (STAR)


  • HAL Id : tel-02146962, version 1


Anil Narassiguin. Ensemble Learning, Comparative Analysis and Further Improvements with Dynamic Ensemble Selection. Artificial Intelligence [cs.AI]. Université de Lyon, 2018. English. ⟨NNT : 2018LYSE1075⟩. ⟨tel-02146962⟩


