Machine Learning approaches in Predictive Medicine using Electronic Health Records data - TEL - Thèses en ligne
Thesis, Year: 2021

Abstract

Traditional approaches to disease management in medicine can be briefly reduced to the “one-size-fits-all” concept (i.e., the effect of a treatment reflects the whole sample). Precision medicine, on the contrary, may represent the extension and evolution of traditional medicine, because it is mainly preventive and proactive rather than reactive. This evolution may lead to predictive, personalized, preventive, participatory, and psycho-cognitive healthcare. Among these characteristics, this thesis emphasizes predictive medicine (PM), which is used to forecast disease onset, diagnosis, and prognosis. It is therefore possible to introduce a new, emerging healthcare area, predictive precision medicine (PPM), which may benefit both from the large amount of medical information stored in Electronic Health Records (EHRs) and from Machine Learning (ML) techniques. The thesis ecosystem, made of these three interconnected key points (PPM, EHR, ML), contributes to biomedical and health informatics by proposing ML methodologies that face and overcome the state-of-the-art challenges emerging from real-world EHR datasets: high-dimensional and heterogeneous data, an unbalanced (class-imbalanced) setting, sparse labeling, temporal ambiguity, interpretability/explainability, and generalization capability.
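One common way to counter class imbalance (the unbalanced setting listed among the challenges) is to weight each class inversely to its frequency, w_c = n / (k · n_c), where n is the number of samples, k the number of classes, and n_c the size of class c. This is the heuristic scikit-learn applies for `class_weight="balanced"`. The sketch below is illustrative only: the function name `balanced_class_weights` and the toy cohort are hypothetical, and this is not claimed to be the weighting used by the thesis's SB-SVM.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class inversely to its frequency: w_c = n / (k * n_c).

    Hypothetical helper for illustration; same formula as scikit-learn's
    class_weight="balanced", not necessarily the thesis's own scheme.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)          # n_c: number of samples in each class c
    n, k = len(labels), len(counts)   # n: total samples, k: number of classes
    return {c: n / (k * n_c) for c, n_c in counts.items()}


# Hypothetical cohort: 90 patients without T2D (label 0) and 10 with T2D (label 1).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

Here the minority class receives weight 5.0 and the majority class about 0.56, so both classes contribute the same weighted total (50) to a loss, which keeps a classifier from simply predicting the majority class.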
The following ML methodologies, each designed around a specific clinical objective in the PM scenario, are suitable to form the core of novel clinical Decision Support Systems that physicians can use for prevention, screening, diagnosis, and treatment: i) a sparse-balanced Support Vector Machine (SB-SVM) approach aimed at detecting type 2 diabetes (T2D) using features extracted from a novel EHR dataset of a general practitioner (GP); ii) a highly interpretable ensemble Regression Forest approach (TyG-er) aimed at identifying non-trivial clinical factors in EHR data, to determine where the insulin-resistance condition is encoded; iii) a Multiple Instance Learning boosting (MIL-Boost) approach applied to EHR data, aimed at the early prediction of insulin-resistance worsening (low vs. high T2D risk) in terms of the triglyceride-glucose (TyG) index; iv) a novel Semi-Supervised Multi-Task Learning (SS-MTL) approach aimed at predicting short-term kidney disease evolution (i.e., the patient’s risk profile) from the EHR data of multiple GPs; v) an eXtreme Gradient Boosting (XGBoost) approach aimed at predicting the Sequential Organ Failure Assessment (SOFA) score at day 5, using only EHR data from the day of admission to the Intensive Care Unit (ICU). The SOFA score describes the complications of COVID-19 patients in the ICU and helps clinicians build risk profiles for these patients. The thesis also contributed to the publication of new publicly available EHR datasets (FIMMG, FIMMG_obs, FIMMG_pred, and mFIMMG).
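The TyG-er and MIL-Boost approaches are framed around the triglyceride-glucose (TyG) index, a surrogate marker of insulin resistance. Its standard definition is TyG = ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2). The sketch below computes it; the function name `tyg_index` and the example values are hypothetical, and the thesis may preprocess, threshold, or discretize the index (e.g., into low vs. high T2D risk) in ways not shown here.

```python
import math


def tyg_index(triglycerides_mg_dl: float, glucose_mg_dl: float) -> float:
    """Triglyceride-glucose (TyG) index: ln(TG [mg/dL] * FPG [mg/dL] / 2).

    Hypothetical helper; both inputs are fasting values in mg/dL
    (values reported in mmol/L must be converted first).
    """
    if triglycerides_mg_dl <= 0 or glucose_mg_dl <= 0:
        raise ValueError("fasting triglycerides and glucose must be positive")
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2)


# Hypothetical patient: fasting TG = 150 mg/dL, fasting glucose = 100 mg/dL.
print(round(tyg_index(150, 100), 2))  # prints 8.92
```

Because the index is a logarithm of a product, a proportional rise in either triglycerides or glucose shifts it by the same additive amount, which makes it convenient to track over successive visits.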
Main file
PhD_thesis_bernardini.pdf (2.71 MB)
Origin: Files produced by the author(s)

Dates and versions

tel-03269623, version 1 (30-06-2021)

Identifiers

  • HAL Id: tel-03269623, version 1

Cite

Michele Bernardini. Machine Learning approaches in Predictive Medicine using Electronic Health Records data. Artificial Intelligence [cs.AI]. Università Politecnica delle Marche, 2021. English. ⟨NNT : ⟩. ⟨tel-03269623⟩