Frequent itemset sampling of high throughput streams on FPGA accelerators

Maël Gueguen 1, 2, 3
2 CAIRN - Energy Efficient Computing ArchItectures with Embedded Reconfigurable Resources
Inria Rennes – Bretagne Atlantique , IRISA-D3 - ARCHITECTURE
3 LACODAM - Large Scale Collaborative Data Mining
Inria Rennes – Bretagne Atlantique , IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
Abstract : The field of frequent pattern mining aims to discover recurring patterns in a given database. Many pattern mining approaches are available in the scientific literature, yet most of them suffer from the same drawback: they can produce a large number of results that contain highly redundant information, which makes the results hard to analyze. A technique called output space sampling has recently been used alongside frequent pattern mining for this very reason. Output space sampling consists of returning a bounded sample of the results, with statistical guarantees that ensure it is representative of the complete output. In a field where fast adaptation to trends is prevalent, an imperfect real-time analysis can be preferable to an exhaustive offline analysis. To this end, the thesis focuses on dedicated hardware architectures that are more energy- and time-efficient than commonly used servers. The first contribution of the thesis is a frequent pattern mining accelerator for FPGA architectures. The proposed solution allows for greater architectural flexibility while reducing the cost of on-chip memory, a scarce resource on this architecture. This first contribution proposes algorithmic improvements that regularise the explored search space, making it suited to efficient computing on FPGA. Furthermore, we propose an FPGA accelerator able to manage the heavy load of communication with its external memory. The second contribution extends the first one, which is restricted to static databases, to streaming databases. This requires reconsidering the theoretical basis of the sampling technique, as the sample must be representative not only of the most recent snapshot of the stream but also of the important trends in its recent past.
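
To make the idea of output space sampling concrete, here is a minimal Python sketch of one well-known scheme from the pattern sampling literature: two-step sampling, which draws an itemset with probability proportional to its support without enumerating all itemsets. It is an illustration only and is not claimed to be the method used in the thesis; the function name `sample_itemset` and the toy database are assumptions made for this example.

```python
import random
from collections import Counter


def sample_itemset(transactions, rng):
    """Draw one itemset with probability proportional to its support.

    Hypothetical helper for illustration. The empty itemset is part of the
    sample space (its support is the number of transactions).
    """
    # Step 1: pick a transaction with weight 2^|t|, the number of its subsets.
    weights = [2 ** len(t) for t in transactions]
    chosen = rng.choices(transactions, weights=weights, k=1)[0]
    # Step 2: pick a uniformly random subset by keeping each item with
    # probability 1/2. The 2^|t| factors cancel, so an itemset X is drawn
    # with probability support(X) / sum_t 2^|t|.
    return frozenset(item for item in sorted(chosen) if rng.random() < 0.5)


# Toy database (assumed for the example): two transactions over items a, b.
toy_db = [frozenset({"a", "b"}), frozenset({"a"})]
rng = random.Random(0)
counts = Counter(sample_itemset(toy_db, rng) for _ in range(20000))
for itemset, n in counts.most_common():
    print(sorted(itemset), n / 20000)
```

In the toy database, {a} has support 2 while {b} and {a, b} have support 1, so {a} should appear roughly twice as often as either of them. The sample size is fixed by the caller, which is what keeps the output bounded regardless of how many frequent itemsets exist.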

https://tel.archives-ouvertes.fr/tel-03120148
Contributor : Abes Star
Submitted on : Wednesday, April 7, 2021 - 9:04:37 AM
Last modification on : Thursday, April 8, 2021 - 9:05:26 AM

File

GUEGUEN_Mael.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03120148, version 2

Citation

Maël Gueguen. Frequent itemset sampling of high throughput streams on FPGA accelerators. Embedded Systems. Université Rennes 1, 2020. English. ⟨NNT : 2020REN1S053⟩. ⟨tel-03120148v2⟩
