Investigation of training data issues in ensemble classification based on margin concept : application to land cover mapping

Abstract : Classification has been widely studied in machine learning. Ensemble methods, which build a classification model by combining multiple component learners, achieve higher performance than a single classifier. The classification accuracy of an ensemble is directly influenced by the quality of its training data, yet real-world data often suffer from class noise and class imbalance. The ensemble margin is a key concept in ensemble learning. It has been applied to both the theoretical analysis and the design of machine learning algorithms, and several studies have shown that the generalization performance of an ensemble classifier is related to the distribution of its margins on the training examples. This work exploits the margin concept to improve the quality of the training set, and thereby the classification accuracy of noise-sensitive classifiers, and to design effective ensemble classifiers that can handle imbalanced datasets. A novel ensemble margin definition is proposed: an unsupervised version of a popular ensemble margin that does not involve the class labels. Mislabeled training data is a challenge for building a robust classifier, whether or not it is an ensemble. To handle the mislabeling problem, we propose an ensemble margin-based method for identifying and eliminating class noise, built on an existing margin-based class noise ordering. This method can achieve a high mislabeled instance detection rate while keeping the false detection rate as low as possible. It relies on the margin values of misclassified data and considers four different ensemble margins, including the novel proposed margin. The method is then extended to class noise correction, which is a more challenging issue. For building a reliable classifier, instances with low margins are more important than safe samples, which have high margins.
A novel bagging algorithm, based on a data importance evaluation function that again relies on the ensemble margin, is proposed to deal with the class imbalance problem. In this algorithm, the emphasis is placed on the lowest margin samples. The method is again evaluated with four different ensemble margins, especially on multi-class imbalanced data. In remote sensing, where training data are typically ground-based, mislabeled training data are inevitable. Imbalanced training data is another problem frequently encountered in remote sensing. Both proposed ensemble methods, with the best margin definition for handling these two major training data issues, are applied to land cover mapping.
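The margin idea in the abstract can be illustrated with a small sketch. The abstract does not give the thesis's formulas, so the two definitions below are assumptions: a commonly used supervised vote margin (votes for the true class minus the largest vote count among the other classes, divided by the number of votes) and a label-free variant (top vote count minus second vote count, divided by the number of votes). The function names and the example votes are hypothetical.

```python
from collections import Counter


def supervised_margin(votes, true_label):
    """Assumed supervised margin in [-1, 1].

    (votes for the true class - most votes for any other class) / total votes.
    A negative value means the ensemble misclassifies the instance.
    """
    counts = Counter(votes)
    v_true = counts.get(true_label, 0)
    v_other = max((n for c, n in counts.items() if c != true_label), default=0)
    return (v_true - v_other) / len(votes)


def unsupervised_margin(votes):
    """Assumed label-free margin in [0, 1].

    (most voted class count - second most voted class count) / total votes.
    The class label of the instance is never used.
    """
    ranked = sorted(Counter(votes).values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0
    return (ranked[0] - second) / len(votes)


# Hypothetical votes from five base classifiers for one training instance.
votes = ["forest", "forest", "water", "forest", "crop"]

print(supervised_margin(votes, "forest"))  # 0.4
print(supervised_margin(votes, "water"))   # -0.4
print(unsupervised_margin(votes))          # 0.4
```

Under these assumed definitions, a misclassified instance has a negative supervised margin, while the label-free margin only measures how decisively the ensemble votes, whatever the label says.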
Document type : Theses
Cited literature [202 references]

https://tel.archives-ouvertes.fr/tel-01662444
Contributor : Abes Star
Submitted on : Wednesday, December 13, 2017 - 10:46:57 AM
Last modification on : Tuesday, July 9, 2019 - 10:13:34 AM

File

These_Wei_FENG.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01662444, version 1

Citation

Wei Feng. Investigation of training data issues in ensemble classification based on margin concept : application to land cover mapping. Earth Sciences. Université Michel de Montaigne - Bordeaux III, 2017. English. ⟨NNT : 2017BOR30016⟩. ⟨tel-01662444⟩

Metrics

Record views : 249
File downloads : 735