Classification croisée pour l'analyse de bases de données de grandes dimensions de pharmacovigilance (Coclustering for the analysis of high-dimensional pharmacovigilance databases)

Abstract : This thesis gathers methodological contributions to the statistical analysis of large pharmacovigilance datasets. These datasets produce large, sparse matrices, and these two characteristics are the main statistical challenges in modelling them. The first part of the thesis is dedicated to the coclustering of the pharmacovigilance contingency table using the normalized Poisson latent block model. The objective is, on the one hand, to provide pharmacologists with a reduced set of interesting areas to explore in more detail. On the other hand, this coclustering provides useful background information for handling the individual-level database. Within this framework, a parameter estimation procedure for the model is detailed, and objective model selection criteria are developed to choose the best-fitting model. Because the datasets are so large, we propose a procedure that explores the coclustering model space in a non-exhaustive but relevant way. Additionally, to assess the performance of the methods, a convenient coclustering index is developed to compare partitions with large numbers of clusters. These statistical tools are not specific to pharmacovigilance and can be used for any coclustering problem.
The second part of the thesis is devoted to the statistical analysis of the large individual-level data, which are more numerous but also provide even more valuable information. The aim is to produce clusters of individuals according to their drug profiles, together with subgroups of drugs and of adverse effects with possible links between them. This overcomes the coprescription and masking phenomena, which are common issues with contingency tables in pharmacovigilance. Moreover, the interaction between several adverse effects is taken into account. For this purpose, we propose a new model, the multiple latent block model, which makes it possible to cocluster two binary tables by imposing the same row clustering on both. The assumptions inherent to the model are discussed, and sufficient identifiability conditions for the model are presented. A parameter estimation algorithm is then studied, and objective model selection criteria are developed. Moreover, a numerical simulation model of the individual data is proposed to compare existing methods and study their limits. Finally, the proposed methodology for dealing with individual pharmacovigilance data is presented and applied to a sample of the French pharmacovigilance database covering 2002 to 2010.

Cited literature [49 references]

https://tel.archives-ouvertes.fr/tel-01806330
Contributor : Abes Star
Submitted on : Saturday, June 2, 2018 - 1:05:57 AM
Last modification on : Friday, May 17, 2019 - 11:02:55 AM
Long-term archiving on : Monday, September 3, 2018 - 3:27:09 PM

File

72034_ROBERT_2017_diffusion.pd...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01806330, version 1

Citation

Valérie Robert. Classification croisée pour l'analyse de bases de données de grandes dimensions de pharmacovigilance. Applications [stat.AP]. Université Paris-Saclay, 2017. Français. ⟨NNT : 2017SACLS111⟩. ⟨tel-01806330⟩
