Unsupervised anomaly detection : methods and applications

Abstract : An anomaly (also known as an outlier) is an instance that deviates significantly from the rest of the input data; Hawkins defined it as 'an observation, which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism'. Anomaly detection (also known as outlier or novelty detection) is thus the field of machine learning and data mining concerned with identifying instances whose features appear inconsistent with the remainder of the dataset. In many applications, correctly distinguishing the set of anomalous data points (outliers) from the set of normal ones (inliers) is very important. A first application is data cleaning, i.e., identifying noisy and erroneous measurements in a dataset before applying learning algorithms. However, with the explosive growth in the volume of data collectable from various sources (e.g., card transactions, internet connections, temperature measurements), anomaly detection has become a crucial stand-alone task for the continuous monitoring of systems. In this context, anomaly detection can be used to detect ongoing intrusion attacks, faulty sensor networks or cancerous masses.

The thesis first proposes a batch tree-based approach for unsupervised anomaly detection, called 'Random Histogram Forest (RHF)'. The algorithm addresses the curse of dimensionality by using the fourth central moment (aka kurtosis) in the model construction, while running in linear time. A stream-based anomaly detection engine called 'ODS', which leverages DenStream, an unsupervised clustering technique, is presented next. Finally, an Automated Anomaly Detection engine, which reduces the human effort required when dealing with several algorithms and hyper-parameters, is presented as the last contribution.
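The abstract mentions kurtosis as an ingredient of RHF without saying how it enters the model. As a minimal illustration of the statistic only, the sketch below computes per-feature sample kurtosis, taken here in its conventional form as the fourth central moment divided by the squared variance. The function names and the toy data are assumptions made for this example; this is not the RHF algorithm from the thesis.

```python
def kurtosis(values):
    """Sample kurtosis: fourth central moment divided by squared variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        return 0.0  # constant feature: no spread to measure
    return m4 / (m2 ** 2)


def feature_kurtosis(rows):
    """Return one kurtosis value per column of a row-major dataset."""
    columns = zip(*rows)
    return [kurtosis(list(col)) for col in columns]


# Hypothetical toy data: the second feature contains one extreme value.
rows = [
    (1.0, 10.0),
    (2.0, 10.5),
    (3.0, 9.5),
    (4.0, 10.0),
    (5.0, 50.0),
]
scores = feature_kurtosis(rows)
```

In this toy dataset the feature with the extreme value gets the higher kurtosis, which suggests why a heavy-tail statistic could help point a detector at features likely to contain outliers.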
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-03651493
Contributor : ABES STAR
Submitted on : Monday, April 25, 2022 - 5:31:10 PM
Last modification on : Tuesday, April 26, 2022 - 11:01:10 AM
Long-term archiving on : Tuesday, July 26, 2022 - 7:38:24 PM

File

99608_PUTINA_2022_archivage.pd...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03651493, version 1

Citation

Andrian Putina. Unsupervised anomaly detection : methods and applications. Machine Learning [cs.LG]. Institut Polytechnique de Paris, 2022. English. ⟨NNT : 2022IPPAT012⟩. ⟨tel-03651493⟩
