Modeling of oncogenetic pedigrees and application to the determination of specific cancer phenotypes favoring targeted genotypic exploration

Fabrice Kwiatkowski

Abstract

In oncogenetics, studying the cancer cases in a family pedigree makes it possible to orient the diagnosis towards particular mutations or combinations of mutations, or to reject the hypothesis of a genetic susceptibility to cancer in the family. Although this diagnosis usually relies on the oncogeneticist, an algorithmic approach can be proposed to process the information contained in these pedigrees. Three methods have been developed for this purpose:

• Use of the whole family pedigree as a model, then calculation of the mutation risk under various hypotheses, retaining the most probable one, i.e. the one that best fits the model.
• Generation of subtrees (skeletons containing, for example, all father-mother-son-daughter occurrences found in a tree) that summarize the oncogenetic information, and construction of family profiles by aggregation. The mutational risk is determined by calculating the distance between subtrees and profiles.
• Use of a statistical summary counting cases by type of cancer and by age at diagnosis, together with other synthetic demographic data (celibacy rate, fertility indices, early procreation, etc.). These summaries are processed using principal component analysis (PCA) and hierarchical clustering in order to highlight groups of families with similar phenotypes, likely to correspond to specific genotypes.

These approaches were tested either on randomly generated trees, using the known risks of breast/ovarian cancer induced by mutations of the BRCA genes, or on the oncogenetic database of the Jean Perrin Comprehensive Cancer Center, which contains several thousand pedigrees from families predisposed to cancer. This allowed us to determine, in particular, an optimal size for oncogenetic pedigrees. The generation of subtrees did not prove superior to the use of statistical summaries.
Using the latter, we developed a doubly hierarchical clustering (H²C), in which the first level corresponds to the families themselves and the second to the members of the families. This H²C still requires some validation. Finally, the PCAs on the summaries allowed us to group families efficiently: among the families predisposed to breast/ovarian cancer, they clearly separated the families with highly penetrant mutations (BRCA genes) from the other families, in which the deleterious mutations are expected to lie on one or more other genes not yet recognized as such.
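
The third approach (statistical summaries processed by PCA and hierarchical clustering) can be illustrated with a minimal sketch. It is not the implementation used in this work: the family identifiers, the three summary variables and all numeric values below are invented for illustration, the choice of average linkage and of two clusters is an assumption, and the helper names (`standardize`, `first_component`, `average_linkage`) are hypothetical. The sketch standardizes the summaries, extracts the first principal axis by power iteration, and groups the families by agglomerative clustering.

```python
import math

# Hypothetical family summaries (invented values, for illustration only):
# (breast cancer cases, ovarian cancer cases, mean age at diagnosis).
FAMILIES = {
    "F1": (4, 2, 41.0),
    "F2": (5, 1, 43.0),
    "F3": (1, 0, 62.0),
    "F4": (2, 0, 65.0),
    "F5": (4, 1, 39.0),
    "F6": (1, 0, 58.0),
}

def standardize(rows):
    """Center each column and scale it to unit variance (population std)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def first_component(rows, iterations=200):
    """Leading principal axis of standardized data, found by power iteration."""
    n, d = len(rows), len(rows[0])
    # Covariance matrix of the standardized data (i.e. the correlation matrix).
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def average_linkage(points, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(dist(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

names = list(FAMILIES)
z = standardize([FAMILIES[name] for name in names])

# Score of each family on the first principal axis.
axis = first_component(z)
scores = {name: sum(a * x for a, x in zip(axis, row))
          for name, row in zip(names, z)}

# Two groups of families with similar summaries.
groups = [sorted(names[i] for i in cluster) for cluster in average_linkage(z, k=2)]

for name in names:
    print(name, round(scores[name], 2))
print(groups)
```

On this toy data, the families with many early-onset cases end up on one side of the first axis and in one cluster, and the families with few, late-onset cases in the other. On real pedigrees the summaries would contain many more variables, and the number of clusters would have to be chosen from the data.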


