MTCopula: Synthetic Complex Data Generation Using Copula

Fodil Benali (1, 2), Damien Bodénès (2), Nicolas Labroche (1), Cyril de Runz (1)
1 BDTLN - Bases de données et traitement des langues naturelles
LIFAT - Laboratoire d'Informatique Fondamentale et Appliquée de Tours
Abstract : Marketing strategies are now data-driven, and their quality depends heavily on the quality and quantity of the available data. Because such data is not always accessible, synthetic data generation is needed. Most existing techniques work well for low-dimensional data but may fail to capture complex dependencies between data dimensions. Moreover, identifying the right combination of models and their respective parameters remains a tedious, open problem. In this paper, we present MTCopula, a novel approach for synthetic complex data generation based on Copula functions. MTCopula is a flexible and extensible solution that automatically chooses the best Copula model, between the Gaussian Copula and the T-Copula, and the best-fitting marginals to capture the complexity of the data. It relies on Maximum Likelihood Estimation to fit the candidate marginal distribution models and uses the Akaike Information Criterion to choose both the best marginals and the best Copula model, thus removing the need for manual exploration of their possible combinations. Comparisons with state-of-the-art synthetic data generators on a private real-world dataset, called AdWanted, and on datasets from the literature show that our approach better preserves the behavior of individual variables and the dependencies between variables in the generated synthetic datasets.
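The selection step described in the abstract (fit each candidate marginal by Maximum Likelihood Estimation, then keep the one with the lowest Akaike Information Criterion) can be illustrated with a minimal sketch. This is not the authors' implementation: the two candidate families (normal and exponential), the function names, and the toy data are assumptions chosen because their maximum-likelihood estimates have closed forms, and the abstract does not list the families MTCopula actually considers.

```python
import math
import random


def fit_normal(xs):
    """Closed-form MLE for a normal distribution; returns (params, log-likelihood)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # the MLE divides by n, not n - 1
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return {"mu": mu, "sigma": math.sqrt(var)}, loglik


def fit_exponential(xs):
    """Closed-form MLE for an exponential distribution (support is x >= 0)."""
    if min(xs) < 0:
        # Negative observations have zero density under this family.
        return {"rate": float("nan")}, float("-inf")
    n = len(xs)
    total = sum(xs)
    rate = n / total
    loglik = n * math.log(rate) - rate * total
    return {"rate": rate}, loglik


def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2 ln(L), lower is better."""
    return 2 * k - 2 * loglik


def select_marginal(xs):
    """Fit every candidate family (an assumed, illustrative set) and keep the lowest AIC."""
    candidates = {"normal": fit_normal, "exponential": fit_exponential}
    scored = []
    for name, fit in candidates.items():
        params, loglik = fit(xs)
        scored.append((aic(loglik, len(params)), name, params))
    return min(scored, key=lambda item: item[0])


if __name__ == "__main__":
    random.seed(0)
    sample = [random.expovariate(2.0) for _ in range(2000)]  # toy data, not AdWanted
    score, family, params = select_marginal(sample)
    print(family, params, round(score, 1))
```

The same criterion extends to the choice of copula: each candidate copula would be fitted on the transformed data, scored with `aic`, and the lowest score kept. That step is omitted here because it needs a multivariate likelihood.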
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-03188317
Contributor : Cyril de Runz
Submitted on : Thursday, April 1, 2021 - 8:05:23 PM
Last modification on : Friday, April 23, 2021 - 9:01:19 PM

File

paper8.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03188317, version 1

Citation

Fodil Benali, Damien Bodénès, Nicolas Labroche, Cyril de Runz. MTCopula: Synthetic Complex Data Generation Using Copula. 23rd International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP), 2021, Nicosia, Cyprus. pp.51-60. ⟨hal-03188317⟩
