Discriminant chronicle mining

Abstract : Data are recorded for a wide range of application and their analysis is a great challenge addressed by many studies. Among these applications, this thesis was motivated by analyzing care pathway data to conduct pharmaco-epidemiological studies. Pharmaco-epidemiology is the study of the uses and effects of healthcare products in well defined populations. The goal is then to automate this study by analyzing data. Within the data analysis approaches, pattern mining approaches extract behavior descriptions, called patterns, characterizing the data. Patterns are often easily interpretable and give insights about hidden behaviors described by the data. In this thesis, we are interested in mining discriminant temporal patterns from temporal sequences, i.e. a list of timestamped events. Temporal patterns represent expressively behaviors through their temporal dimension. Discriminant patterns are suitable adapted for representing behaviors occurring specifically in small subsets of a whole population. Surprisingly, if temporal patterns are essential to describe timestamped data and discriminant patterns are crucial to identify alternative behaviors that differ from mainstream, discriminant temporal patterns received little attention up to now. In this thesis, the model of discriminant chronicles is proposed to address the lack of interest in discriminant temporal pattern mining approaches. A chronicle is a temporal pattern representable as a graph whose nodes are events and vertices are numerical temporal constraints. The chronicle model was choosen because of its high expressiveness when dealing with temporal sequences and also by its unique ability to describe numerically the temporal dimension among other discriminant pattern models. The contribution of this thesis, centered on the discriminant chronicle model, is threefold: (i) a discriminant chronicle model mining algorithm (DCM), (ii) the study of the discriminant chronicle model interpretability through its generalization and (iii) the DCM application on a pharmaco-epidemiology case study. The DCM algorithm is an efficient algorithm dedicated to extract discriminant chronicles and based on the Ripperk numerical rule learning algorithm. Using Ripperk allows to take advantage to its efficiency and its incomplete heuristic dedicated to avoid redundant patterns. The DCM generalization allows to swap Ripperk with alternative machine learning algorithms. The extracted patterns are not chronicles but a generalized form of chronicles. More expressive machine learning algorithms extract more expressive generalized chronicles but impact negatively their interpretability. The trade-off between this expressiveness gain, evaluated by classification accuracy, and this interpretability loss, is compared for several types of generalized chronicles. The interest of the discriminant chronicle model and the DCM efficiency is validated on synthetic and real datasets in pattern-based classification context. Finally, chronicles are extracted from a pharmaco-epidemiology dataset and presented to clinicians who validated them to be interesting to describe epidemiological behaviors.
Document type :
Theses
Complete list of metadatas

Cited literature [40 references]  Display  Hide  Download

https://tel.archives-ouvertes.fr/tel-02044269
Contributor : Abes Star <>
Submitted on : Thursday, February 21, 2019 - 2:04:07 PM
Last modification on : Saturday, February 23, 2019 - 1:23:53 AM
Long-term archiving on : Wednesday, May 22, 2019 - 3:51:15 PM

File

DAUXAIS_Yann.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02044269, version 1

Citation

Yann Dauxais. Discriminant chronicle mining. Databases [cs.DB]. Université Rennes 1, 2018. English. ⟨NNT : 2018REN1S052⟩. ⟨tel-02044269⟩

Share

Metrics

Record views

86

Files downloads

115