Revisiting Data Partitioning for Scalable RDF Graph Processing

Abstract : The Resource Description Framework (RDF) and SPARQL are popular graph-based standards initially designed to represent and query information on the Web. The flexibility offered by RDF motivated its use in other domains, and today RDF datasets are rich information sources. They gather billions of triples in Knowledge Graphs that must be stored and efficiently exploited. The first generation of RDF systems was built on top of traditional relational databases. Unfortunately, performance in these systems degrades rapidly because the relational model is not suited to handling RDF data, which is inherently represented as a graph. Native and distributed RDF systems seek to overcome this limitation. The former mainly use indexing as an optimization strategy to speed up queries, while distributed and parallel RDF systems resort to data partitioning. In the relational model, the logical representation of the database is crucial for designing data partitions. The logical layer, which defines the explicit schema of the database, provides a degree of comfort to database designers. It lets them choose, manually or automatically (through advisors), the tables and attributes to be partitioned. It also allows the core partitioning concepts to remain constant regardless of the database management system. This design scheme is no longer valid for RDF databases, essentially because the RDF model does not explicitly enforce a schema: RDF data is mostly implicitly structured. Thus, the logical layer is nonexistent and data partitioning depends strongly on the physical implementation of the triples on disk. As a result, the partitioning logic differs depending on the target system, in contrast to the relational model, where it stays constant. In this thesis, we promote the novel idea of performing data partitioning at the logical level in RDF databases. To this end, we first process the RDF data graph to support logical entity-based partitioning.
After this preparation, we present a partitioning framework built upon these logical structures. The framework is accompanied by data fragmentation, allocation, and distribution procedures. It was incorporated into a centralized (RDF_QDAG) and a distributed (gStoreD) triple store. We conducted several experiments that confirmed the feasibility of integrating our framework into existing systems, improving their performance for certain queries. Finally, we design a set of RDF data partitioning management tools, including a data definition language (DDL) and an automatic partitioning wizard.
Document type : Theses

https://tel.archives-ouvertes.fr/tel-03167657
Contributor : Abes Star
Submitted on : Friday, March 12, 2021 - 11:48:24 AM
Last modification on : Friday, March 26, 2021 - 10:27:47 AM
Long-term archiving on : Sunday, June 13, 2021 - 6:44:28 PM

File

2020ESMA0020_galicia-auyon.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03167657, version 1

Citation

Jorge Armando Galicia Auyón. Revisiting Data Partitioning for Scalable RDF Graph Processing. Other [cs.OH]. ISAE-ENSMA Ecole Nationale Supérieure de Mécanique et d'Aérotechnique - Poitiers, 2021. English. ⟨NNT : 2021ESMA0001⟩. ⟨tel-03167657⟩

Metrics

Record views : 170
Files downloads : 413