Etude comportementale des mesures d'intérêt d'extraction de connaissances [A behavioral study of interestingness measures for knowledge extraction]

Abstract : The search for interesting association rules is an important and active field in data mining. Since the algorithms used for knowledge discovery in databases (KDD) tend to generate a large number of rules, it is difficult for users to select the really interesting knowledge by themselves. To address this problem, automatic post-filtering of the rules is essential to reduce their number significantly. Hence, many interestingness measures have been proposed in the literature to filter and/or sort the discovered rules. As interestingness depends on both user preferences and data, interestingness measures have been classified into two categories: subjective measures (user-driven) and objective measures (data-driven). We focus on the study of objective measures. However, the literature contains a plethora of objective measures, which makes it harder for the user to choose the appropriate one. Our goal is therefore to ease this choice by proposing groups of similar measures obtained through categorization approaches. The thesis presents two approaches to assist the user in choosing objective measures: (1) a formal study, based on defining a set of measure properties that support a sound evaluation of measures; (2) an experimental study of the behavior of various interestingness measures from a data analysis point of view. Regarding the first approach, we perform a thorough theoretical study of a large number of measures with respect to several formal properties. To do this, we first formalize these properties in order to remove any ambiguity about them. We then determine, for various objective interestingness measures, the presence or absence of the relevant characteristic properties. This evaluation of interestingness measures is therefore a starting point for categorizing them.
Different clustering methods have been applied: (i) non-overlapping methods (hierarchical agglomerative clustering, abbreviated CAH in French, and k-means), which yield disjoint groups of measures; (ii) an overlapping method (Boolean factor analysis), which provides overlapping groups of measures. Regarding the second approach, we propose an empirical study of the behavior of about sixty measures on datasets of different natures. To this end, we propose an experimental methodology through which we seek to identify groups of measures that empirically behave similarly. We then compare the two classification results, formal and empirical, in order to validate and enhance our first approach. The two approaches are complementary and help the user choose the interestingness measure appropriate to the application.
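As a rough illustration of the empirical approach described in the abstract, the sketch below scores a handful of rules with four well-known objective measures (support, confidence, lift, leverage) and groups measures whose rankings of the rules are strongly correlated. It is not the thesis's methodology: the rule counts are invented, and Spearman rank correlation with a simple threshold-based grouping is an assumption standing in for the clustering methods named above. The names `RULES`, `MEASURES`, and `group_measures` are hypothetical.

```python
from itertools import combinations

# Hypothetical rules A -> B, each summarized by counts (n, n_a, n_b, n_ab):
# n transactions in total, n_a containing A, n_b containing B, n_ab containing both.
RULES = [
    (1000, 200, 300, 150),
    (1000, 500, 600, 320),
    (1000, 100, 400, 90),
    (1000, 300, 200, 50),
    (1000, 400, 700, 300),
]

# Standard definitions of four objective interestingness measures.
MEASURES = {
    "support": lambda n, n_a, n_b, n_ab: n_ab / n,
    "confidence": lambda n, n_a, n_b, n_ab: n_ab / n_a,
    "lift": lambda n, n_a, n_b, n_ab: (n_ab * n) / (n_a * n_b),
    "leverage": lambda n, n_a, n_b, n_ab: n_ab / n - (n_a / n) * (n_b / n),
}


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def group_measures(rules, measures, threshold=0.8):
    """Link measures whose rank correlation reaches the (assumed) threshold."""
    scores = {name: [f(*rule) for rule in rules] for name, f in measures.items()}
    parent = {name: name for name in measures}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in combinations(measures, 2):
        if spearman(scores[a], scores[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in measures:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    for group in group_measures(RULES, MEASURES):
        print(group)
```

Because linked measures are merged transitively, the resulting groups are disjoint, in the spirit of the non-overlapping methods mentioned above; an overlapping grouping would need a different technique.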
Document type :
Theses

Cited literature [191 references]

https://tel.archives-ouvertes.fr/tel-01023975
Contributor : Abes Star
Submitted on : Tuesday, July 15, 2014 - 3:08:49 PM
Last modification on : Thursday, March 14, 2019 - 12:20:16 PM
Long-term archiving on : Friday, November 21, 2014 - 6:09:02 PM

File

GRISSA_2013CLF22401.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01023975, version 1

Citation

Dhouha Grissa. Etude comportementale des mesures d'intérêt d'extraction de connaissances. Autre [cs.OH]. Université Blaise Pascal - Clermont-Ferrand II, 2013. Français. ⟨NNT : 2013CLF22401⟩. ⟨tel-01023975⟩
