
Ensemble Learning for Extremely Imbalanced Data Flows

Abstract : Machine learning is the study of designing algorithms that learn from training data to achieve a specific task. The resulting model is then used to make predictions on new (unseen) data points without any outside help. This data can take many forms, such as images (matrices of pixels), signals (sounds, ...), transactions (age, amount, merchant, ...), or logs (time, alerts, ...). Datasets may be defined to address a specific task such as object recognition, voice identification, or anomaly detection. In these tasks, knowledge of the expected outputs encourages a supervised learning approach, where every observation is assigned a label that defines what the model's prediction should be. For example, in object recognition, an image could be associated with the label "car", meaning that the learning algorithm has to learn that a car appears somewhere in this picture. This is in contrast with unsupervised learning, where the task at hand has no explicit labels. For example, one popular topic in unsupervised learning is discovering the underlying structures contained in visual data (images), such as the geometric forms of objects, lines, or depth, before learning a specific task. This kind of learning is much harder, as there may be an infinite number of concepts to grasp in the data. In this thesis, we focus on a specific scenario of the supervised learning setting: 1) the label of interest is underrepresented (e.g. anomalies), and 2) the dataset grows over time as we receive data from real-life events (e.g. credit card transactions). These settings are very common in the industrial domain in which this thesis takes place.
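To make the two properties of the studied setting concrete, here is a minimal Python sketch. It is not the thesis's method and trains no model: it only simulates a hypothetical, synthetic stream of credit-card-like transactions in which labeled data arrives day by day and the label of interest stays rare. The `Transaction` fields, the merchant names, the 0.2% positive rate, and the daily batch size are all illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical features echoing the abstract's example (age, amount, merchant)
    age: int
    amount: float
    merchant: str
    label: int  # 1 = label of interest (e.g. fraud), 0 = genuine


def daily_batch(n, positive_rate, rng):
    """Simulate one day of labeled transactions (synthetic, illustration only)."""
    merchants = ["grocery", "fuel", "online"]  # hypothetical merchant categories
    return [
        Transaction(
            age=rng.randint(18, 90),
            amount=round(rng.uniform(1.0, 500.0), 2),
            merchant=rng.choice(merchants),
            # The rare class is drawn with a small, assumed probability
            label=1 if rng.random() < positive_rate else 0,
        )
        for _ in range(n)
    ]


def simulate_stream(days, per_day, positive_rate, seed=0):
    """Grow the training set day by day and record its class balance."""
    rng = random.Random(seed)
    dataset = []
    history = []  # (day, cumulative size, share of positive labels)
    for day in range(1, days + 1):
        dataset.extend(daily_batch(per_day, positive_rate, rng))
        positives = sum(t.label for t in dataset)
        history.append((day, len(dataset), positives / len(dataset)))
    return dataset, history


if __name__ == "__main__":
    _, history = simulate_stream(days=5, per_day=10_000, positive_rate=0.002)
    for day, size, ratio in history:
        print(f"day {day}: {size} labeled transactions, {ratio:.3%} positive")
```

Each day the cumulative dataset grows by `per_day` labeled observations, while the share of positive labels should stay close to the assumed rate. This combination of an ever-growing dataset and an extremely rare label of interest is the scenario the abstract describes.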

Cited literature [177 references]
Contributor : Abes Star
Submitted on : Wednesday, July 15, 2020 - 4:43:31 PM
Last modification on : Monday, August 3, 2020 - 8:52:26 AM


Version validated by the jury (STAR)


  • HAL Id : tel-02899943, version 1


Jordan Frery. Ensemble Learning for Extremely Imbalanced Data Flows. Artificial Intelligence [cs.AI]. Université de Lyon, 2019. English. ⟨NNT : 2019LYSES034⟩. ⟨tel-02899943⟩


