Handling imbalanced datasets by reconstruction rules in decomposition schemes

Abstract : Disproportion among class priors is encountered in a large number of domains and makes conventional learning algorithms less effective at predicting samples that belong to the minority classes. We aim to develop a reconstruction rule suited to multiclass skewed data. To this end we use classification reliability, which conveys useful information on the quality of each classification act. Within the One-per-Class decomposition scheme we design a novel reconstruction rule, Reconstruction Rule by Selection, which uses classifier reliabilities, crisp labels and a priori distributions to compute the final decision. Tests show that system performance improves with this rule compared with well-established reconstruction rules. We also investigate reconstruction rules in the Error Correcting Output Code (ECOC) decomposition framework. Inspired by a statistical reconstruction rule designed for the One-per-Class and Pair-Wise Coupling decomposition approaches, we develop a rule that applies softmax regression to the reliability outputs in order to estimate the final classification. Results show that this choice improves performance with respect to both the existing statistical rule and well-established reconstruction rules. On the topic of reliability estimation, we observe that little attention has been given to efficient estimation of posteriors in the boosting framework. For this reason we develop an efficient posterior estimator by boosting Nearest Neighbors. Using the Universal Nearest Neighbours classifier, we prove that there exists a sub-class of surrogate losses whose minimization yields simple and statistically efficient estimators of the Bayes posteriors.
Document type :
Theses

Cited literature : 155 references

https://tel.archives-ouvertes.fr/tel-00995021
Contributor : Abes Star
Submitted on : Thursday, May 22, 2014 - 3:18:44 PM
Last modification on : Monday, October 12, 2020 - 10:30:35 AM
Long-term archiving on : Friday, August 22, 2014 - 12:45:36 PM

File

2014NICE4007.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-00995021, version 1

Citation

Roberto d'Ambrosio. Handling imbalanced datasets by reconstruction rules in decomposition schemes. Other [cs.OH]. Université Nice Sophia Antipolis, 2014. English. ⟨NNT : 2014NICE4007⟩. ⟨tel-00995021⟩

Metrics

Record views : 761
File downloads : 949