Fouille de motifs : formalisation et unification (Pattern mining: formalization and unification)

Abstract: Over the last two decades, a great deal of work has been devoted to the algorithmic aspects of the Frequent Pattern (FP) mining problem, leading to a very large number of algorithms and associated implementations, each of which claims supremacy. Meanwhile, it is widely agreed that developing a unifying theory is one of the most important issues in data mining research. Our primary motivation in this work is therefore to introduce a high-level formalism for this basic problem, one that induces a unified view of the algorithmic approaches presented so far. The key distinctive feature of the model is that it combines, in a single framework, the qualitative and quantitative aspects of the problem, which were previously handled separately. In this thesis, we propose a new model for the FP-mining task based on formal series. We encode the patterns as words over a sorted alphabet and express the problem as a formal series over the counting semiring $(\mathbb{N},+,\times,0,1)$, whose range represents the patterns and whose coefficients are their supports. The aim is threefold: first, to define a clear, unified and extensible theoretical framework in which the main FP approaches can be stated; second, to prove a convenient connection between the determinization of the acyclic weighted automaton that represents a transaction dataset and the computation of the associated collection of frequent patterns; and finally, to devise a first implementation of our model by means of weighted automata, named WAFI (Weighted Automata Frequent Itemset mining algorithm), which we evaluate against representative leading algorithms. The results obtained show the suitability of our formalism.
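As a rough illustration of the encoding described in the abstract, the sketch below represents each pattern as a word (a tuple of items in sorted order) and builds the map from words to supports by plain enumeration of the sub-patterns of each transaction. It is not the WAFI algorithm and uses no weighted automaton. The function names, the toy dataset, the exclusion of the empty pattern, and the `min_support` threshold are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def support_series(transactions):
    """Map each non-empty pattern (a sorted tuple of items) to its support.

    Naive enumeration, exponential in the transaction length; it only
    illustrates the "word -> coefficient" view, not an efficient miner.
    """
    series = Counter()
    for transaction in transactions:
        # Encode the transaction as a word over the sorted alphabet.
        word = tuple(sorted(set(transaction)))
        for length in range(1, len(word) + 1):
            for pattern in combinations(word, length):
                # Coefficients live in the counting semiring (N, +, x, 0, 1):
                # each transaction containing the pattern contributes 1.
                series[pattern] += 1
    return series


def frequent_patterns(series, min_support):
    """Keep only the patterns whose coefficient reaches the threshold."""
    return {p: s for p, s in series.items() if s >= min_support}


# Hypothetical toy dataset, not taken from the thesis.
transactions = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
series = support_series(transactions)
frequent = frequent_patterns(series, min_support=3)
```

With this toy dataset, the pattern `("a", "c")` has support 3 and `("a", "b", "c")` has support 2, so only the former survives a threshold of 3.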
Document type :
Theses

Cited literature: 187 references

https://tel.archives-ouvertes.fr/tel-01760242
Contributor : Slimane Oulad-Naoui
Submitted on : Monday, June 4, 2018 - 10:29:23 PM
Last modification on : Friday, March 15, 2019 - 3:36:19 PM

File

ma these finale.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-01760242, version 2

Citation

Slimane Oulad-Naoui. Fouille de motifs : formalisation et unification. Informatique [cs]. UATL (Algeria), 2018. Français. ⟨tel-01760242v2⟩
