Table A.3 - Medical test values considered and discretized according to reference ranges. Examples are given between square brackets.

Category        Medical test (unit) [example ranges]
Blood cells     lymphocyte percentage (%) […]; … [… 80%]; … [… 11%]; … [… 350k]
Hemoglobin      mean corpuscular hemoglobin concentration (%) [… ; < 30%; 30-35%; … 35%]
Coagulation     sedimentation rates (mm) […]
Lipidemia       HDL cholesterol (mmol/l) [… > 2]
Chemistry       albuminemia (µmol/l) [… > 80]; chloremia (mmol/l) […]
Biology         serum glutamo-oxaloacetate transferase (IU/l) […]
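As an illustration of the discretization summarized in the table, the following minimal sketch maps a numeric test value to a range category. It uses the example cut points given for mean corpuscular hemoglobin concentration (30% and 35%). The names `MCHC_CUTS`, `MCHC_LABELS`, and `discretize`, the label strings, and the handling of values that fall exactly on a boundary are assumptions made for this example, not details stated in the table.

```python
from bisect import bisect_right

# Cut points (in %) from the example ranges for mean corpuscular
# hemoglobin concentration. The labels are hypothetical names.
MCHC_CUTS = [30.0, 35.0]
MCHC_LABELS = ["low", "normal", "high"]


def discretize(value, cuts, labels, missing="missing"):
    """Map a numeric test value to a category label.

    A value of None (test not performed) maps to `missing`.
    Assumption: a value equal to a cut point goes to the upper range.
    """
    if value is None:
        return missing
    # bisect_right counts how many cut points are <= value,
    # which is the index of the range the value falls into.
    return labels[bisect_right(cuts, value)]


print(discretize(32.5, MCHC_CUTS, MCHC_LABELS))  # normal
```

Keeping the cut points in a sorted list per test means that each of the other tests in the table could be handled by the same function with its own list of reference boundaries.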

A. Agibetov, K. Blagec, H. Xu, and M. Samwald, «Fast and scalable neural embedding models for biomedical sentence classification», BMC bioinformatics, vol.19, issue.1, p.12, 2018.

R. Artstein and M. Poesio, «Inter-coder agreement for computational linguistics», Computational Linguistics, vol.34, pp.555-596, 2008.

J. S. Ash, M. Berg, and E. Coiera, «Some unintended consequences of information technology in health care: the nature of patient care information system-related errors», Journal of the American Medical Informatics Association, vol.11, issue.2, pp.104-112, 2004.

J. Bergstra and Y. Bengio, «Random search for hyper-parameter optimization», Journal of Machine Learning Research, vol.13, p.57, 2012.

G. S. Birkhead, M. Klompas, and N. R. Shah, «Uses of electronic health records for public health surveillance to advance public health», Annual review of public health, vol.36, p.7, 2015.

P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, «Enriching word vectors with subword information», 2016.

A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, «Translating embeddings for modeling multi-relational data», in Advances in neural information processing systems, vol.33, pp.2787-2795, 2013.

C. M. Boyd, J. Darer, C. Boult, L. P. Fried, L. Boult et al., «Clinical practice guidelines and quality of care for older patients with multiple comorbid diseases: implications for pay for performance», JAMA, vol.294, issue.6, pp.716-724, 2005.

L. Breiman, «Random forests», Machine learning, vol.45, issue.1, p.57, 2001.

P. F. Brown, P. V. deSouza, R. L. Mercer, V. J. Della Pietra, and J. C. Lai, «Class-based n-gram models of natural language», Computational linguistics, vol.18, pp.467-479, 1992.


G. C. Cawley and N. L. Talbot, «On over-fitting in model selection and subsequent selection bias in performance evaluation», Journal of Machine Learning Research, vol.11, p.57, 2010.

D. Ceccarelli, C. Lucchese, S. Orlando, R. Perego, and S. Trani, «Dexter: an open source framework for entity linking», in Proceedings of the sixth international workshop on Exploiting semantic annotations in information retrieval, pp.17-20, 2013.

C. Chang and C. Lin, «LIBSVM: a library for support vector machines», ACM transactions on intelligent systems and technology (TIST), vol.2, p.57, 2011.

E. Choi et al., «GRAM: graph-based attention model for healthcare representation learning», in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol.32, pp.787-795, 2017.

R. Conroy, K. Pyörälä, A. E. Fitzgerald, S. Sans, A. Menotti et al., «Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project», European heart journal, vol.24, issue.11, pp.987-1003, 2003.

O. Corby and C. F. Zucker, «The KGRAM abstract machine for knowledge graph querying», in Web Intelligence and Intelligent Agent Technology (WI-IAT), vol.1, pp.338-341, 2010.

H. Cunningham, «GATE: A framework and graphical development environment for robust NLP tools and applications», in Proc. 40th annual meeting of the association for computational linguistics, vol.35, pp.168-175, 2002.

J. Daiber, M. Jakob, C. Hokamp, and P. N. Mendes, «Improving efficiency and accuracy in multilingual entity extraction», in Proceedings of the 9th International Conference on Semantic Systems (I-Semantics), vol.34, p.37, 2013.

S. De Rosis and C. Seghieri, «Basic ICT adoption and use by general practitioners: an analysis of primary care systems in 31 European countries», BMC medical informatics and decision making, vol.15, issue.1, p.70, 2015.

J. Demšar, «Statistical comparisons of classifiers over multiple data sets», Journal of Machine learning research, vol.7, pp.1-30, 2006.

J. Devlin, M. Chang, K. Lee, and K. Toutanova, «BERT: Pre-training of deep bidirectional transformers for language understanding», vol.11, p.29, 2018.

R. B. D'Agostino, R. S. Vasan, M. J. Pencina, P. A. Wolf et al., «General cardiovascular risk profile for use in primary care», Circulation, vol.117, issue.6, pp.743-753, 2008.

J. Eisenschlos, S. Ruder, P. Czapla, M. Kardas, S. Gugger et al., «MultiFiT: Efficient multi-lingual language model fine-tuning», 2019.

J. Escudié, B. Rance, G. Malamut, S. Khater, A. Burgun et al., «A novel data-driven workflow combining literature and electronic health records to estimate comorbidities burden for a specific disease: a case study on autoimmune comorbidities in patients with celiac disease», BMC medical informatics and decision making, vol.17, issue.1, pp.140-149, 2017.

J. M. Flach, P. Schanely, L. Kuenneke, B. Chidoro, J. Mubaslat et al., «Electronic health records and evidence-based practice: Solving the little-data problem», in Proceedings of the International Symposium on Human Factors and Ergonomics in Health Care, vol.7, p.74, 2018.

A. Foncubierta-Rodríguez, «Description and retrieval of medical visual information based on language modelling», p.10, 2014.

G. Forman and M. Scholz, «Apples-to-apples in cross-validation studies: pitfalls in classifier performance measurement», ACM SIGKDD Explorations Newsletter, vol.12, issue.1, p.60, 2010.

O. Frunza, D. Inkpen, and T. Tran, «A machine learning approach for identifying disease-treatment relations in short texts», IEEE transactions on knowledge and data engineering, vol.23, issue.6, pp.801-814, 2011.

R. Gazzotti, C. Zucker, F. Gandon, V. Lacroix-Hugues, and D. Darmon, «Évaluation des améliorations de prédiction d'hospitalisation par l'ajout de connaissances métier aux dossiers médicaux», Revue des Nouvelles Technologies de l'Information (RNTI), vol.35, 2019.

R. Gazzotti, C. Zucker, F. Gandon, V. Lacroix-Hugues, and D. Darmon, «Injection of Automatically Selected DBpedia Subjects in Electronic Medical Records to boost Hospitalization Prediction», in SAC 2020 - The 35th ACM/SIGAPP Symposium On Applied Computing, 2020.

R. Gazzotti, E. Noual, C. Zucker, F. Gandon, A. Giboin et al., «Designing the Interaction with a prediction system to prevent hospitalization», in RJCIA 2019 - Rencontres des Jeunes Chercheurs en Intelligence Artificielle PFIA, pp.54-58, 2019.

R. Gazzotti, C. F. Zucker, F. Gandon, V. Lacroix-Hugues, and D. Darmon, «Injecting domain knowledge in electronic medical records to improve hospitalization prediction», in The 16th Extended Semantic Web Conference (ESWC 2019), vol.11503, pp.116-130, 2019.

B. A. Goldstein, A. M. Navar, M. J. Pencina, and J. Ioannidis, «Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review», Journal of the American Medical Informatics Association, vol.24, issue.1, pp.198-208, 2017.

X. Han and L. Sun, «A generative entity-mention model for linking entities with knowledge base», in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.945-954, 2011.

Z. S. Harris, «Distributional structure», Word, vol.10, p.13, 1954.

A. Henriksson, J. Zhao, H. Boström, and H. Dalianis, «Modeling heterogeneous clinical sequence data in semantic space for adverse drug event detection», in Data Science and Advanced Analytics (DSAA), pp.1-8, 2015.

W. R. Hersh, «Adding value to the electronic health record through secondary use of data for quality assurance, research, and surveillance», Clin Pharmacol Ther, vol.81, issue.2, pp.126-134, 2007.

R. Hillestad, J. Bigelow, A. Bower, F. Girosi, R. Meili et al., «Can electronic medical record systems transform health care? Potential health benefits, savings, and costs», Health affairs, vol.24, issue.5, pp.1103-1117, 2005.

J. Howard and S. Ruder, «Universal language model fine-tuning for text classification», 2018.

K. Jha, M. Röder, and A. N. Ngomo, «Eaglet - a named entity recognition and entity linking gold standard checking tool», in European Semantic Web Conference, vol.34, pp.149-154, 2017.

B. Jin, C. Che, Z. Liu, S. Zhang, X. Yin et al., «Predicting the risk of heart failure with EHR sequential data modeling», IEEE Access, vol.6, pp.9256-9261, 2018.

S. Khera, D. Kolte, S. Deo, A. Kalra, T. Gupta et al., «Derivation and external validation of a simple risk tool to predict 30-day hospital readmissions after transcatheter aortic valve replacement», EuroIntervention: journal of EuroPCR in collaboration with the Working Group on Interventional Cardiology of the European Society of Cardiology, vol.15, issue.2, p.75, 2019.

K. Krippendorff, «Estimating the reliability, systematic error and random error of interval data», Educational and Psychological Measurement, vol.30, issue.1, pp.61-70, 1970.

V. Lacroix-Hugues, «Utilisation des enregistrements médicaux électroniques, exemple d'utilisation dans le cadre du projet PRIMEGE PACA ; quels sont les principaux motifs de recours, diagnostics et prescriptions en soins primaires», doctoral thesis, vol.47, 2016.

V. Lacroix-Hugues, D. Darmon, C. Pradier, and P. Staccini, «Creation of the first French database in primary care using the ICPC2: Feasibility study», Studies in health technology and informatics, vol.245, pp.462-466, 2017.


Q. Le and T. Mikolov, «Distributed representations of sentences and documents», pp.1188-1196, 2014.

J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim et al., «BioBERT: a pre-trained biomedical language representation model for biomedical text mining», Bioinformatics, vol.36, issue.4, pp.1234-1240, 2020.

C. Lin, H. Canhao, T. Miller, D. Dligach, R. M. Plenge, E. W. Karlson, and G. Savova, «Feature engineering and selection for rheumatoid arthritis disease activity classification using electronic medical records», in ICML Workshop on Machine Learning for Clinical Data Analysis, p.16, 2012.

J. Liu, Z. Zhang, and N. Razavian, «Deep EHR: Chronic disease prediction using medical notes», 2018.

L. Liu, J. Shen, M. Zhang, Z. Wang, and J. Tang, «Learning the joint representation of heterogeneous temporal events for clinical endpoint prediction», 2018.

E. Loper and S. Bird, «NLTK: the natural language toolkit», 2002.

S. de Lusignan and C. van Weel, «The use of routinely collected computer data for research in primary care: opportunities and challenges», Family practice, vol.23, issue.2, pp.253-263, 2005.

L. Màrquez and H. Rodríguez, «Part-of-speech tagging using decision trees», in European Conference on Machine Learning, pp.25-36, 1998.

P. McCullagh and J. A. Nelder, «Generalized linear models», vol.37, p.57, 1989.

T. Mikolov, K. Chen, G. S. Corrado, and J. A. Dean, «Computing numeric representations of words in a high-dimensional space», vol.037, p.14, 2015.

H. Min, H. Mobahi, K. Irvin, S. Avramovic, and J. Wojtusiak, «Predicting activities of daily living for cancer patients using an ontology-guided machine learning methodology», Journal of biomedical semantics, vol.8, issue.1, p.31, 2017.

D. Moussallem, R. Usbeck, M. Röder, and A. N. Ngomo, «MAG: A multilingual, knowledge-base agnostic and deterministic entity linking approach», in Proceedings of the Knowledge Capture Conference, p.35, 2017.

L. Na, C. Yang, C. Lo, F. Zhao, Y. Fukuoka et al., «Feasibility of reidentifying individuals in large national physical activity data sets from which protected health information has been removed with use of machine learning», JAMA network open, vol.1, issue.8, 2018.

C. Nadeau and Y. Bengio, «Inference for the generalization error», Mach. Learn, vol.52, pp.239-281, 2003.

F. J. Ordóñez, P. Toledo, and A. Sanchis, «Activity recognition using hybrid generative/discriminative models on home environments using binary sensors», Sensors, vol.13, issue.5, pp.5460-5477, 2013.

K. Pearson, «LIII. On lines and planes of closest fit to systems of points in space», The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol.2, issue.11, pp.559-572, 1901.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., «Scikit-learn: Machine learning in Python», Journal of Machine Learning Research, vol.12, p.57, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

J. Pennington, R. Socher, and C. Manning, «GloVe: Global vectors for word representation», in Proceedings of the 2014 conference on empirical methods in natural language processing, vol.10, p.32, 2014.

A. Piktus, N. B. Edizel, P. Bojanowski, E. Grave, R. Ferreira et al., «Misspelling oblivious word embeddings», in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.3226-3234, 2019.

M. T. Ribeiro, S. Singh, and C. Guestrin, «Why should I trust you?: Explaining the predictions of any classifier», in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, vol.22, pp.1135-1144, 2016.

G. Rizzo and R. Troncy, «NERD: A framework for evaluating named entity recognition tools in the web of data», in 10th International Semantic Web Conference (ISWC'11), vol.34, pp.1-4, 2011.

A. G. Salguero, M. Espinilla, P. Delatorre, and J. Medina, «Using ontologies for the online recognition of activities of daily living», Sensors, vol.18, p.32, 2018.

J. Shang, T. Ma, C. Xiao, and J. Sun, «Pre-training of graph augmented transformers for medication recommendation», 2019.

A. Singh, G. Nadkarni, O. Gottesman, S. B. Ellis, E. P. Bottinger et al., «Incorporating temporal EHR data in predictive models for risk stratification of renal function deterioration», Journal of biomedical informatics, vol.53, p.28, 2015.

J. Snoek, H. Larochelle, and R. P. Adams, «Practical Bayesian optimization of machine learning algorithms», in Advances in neural information processing systems, vol.25, pp.2951-2959, 2012.

K. A. Stroetmann, J. Artmann, V. N. Stroetmann, D. Protti, J. Dumortier et al., «European countries on their journey towards national eHealth infrastructures», 2011.

C. Sutton and A. McCallum, «An introduction to conditional random fields», Foundations and Trends® in Machine Learning, vol.4, p.27, 2012.

B. Tang, H. Cao, X. Wang, Q. Chen, and H. Xu, «Evaluating word representation features in biomedical named entity recognition tasks», BioMed research international, p.11, 2014.

A. Tchechmedjiev, A. Abdaoui, V. Emonet, S. Zevio, and C. Jonquet, «SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes», BMC bioinformatics, vol.19, issue.1, p.52, 2018.
URL : https://hal.archives-ouvertes.fr/lirmm-01934127

R. Tibshirani, «Regression shrinkage and selection via the lasso», Journal of the Royal Statistical Society: Series B (Methodological), vol.58, issue.1, p.59, 1996.

M. E. Tinetti, S. T. Bogardus Jr., and J. V. Agostini, «Potential pitfalls of disease-specific guidelines for patients with multiple conditions», N Engl J Med, vol.351, issue.1, pp.2870-2874, 2004.

R. Usbeck, M. Röder, A. Ngomo, C. Baron, A. Both et al., «GERBIL: general entity annotator benchmarking framework», in Proceedings of the 24th international conference on World Wide Web, International World Wide Web Conferences Steering Committee, vol.35, pp.1133-1143, 2015.

A. Wang, A. Singh, J. Michael, F. Hill, O. Levy et al., «GLUE: A multi-task benchmark and analysis platform for natural language understanding», 2018.

P. L. Whetzel, N. F. Noy, N. H. Shah, P. R. Alexander, C. Nyulas et al., «BioPortal: enhanced functionality via new web services from the National Center for Biomedical Ontology to access and use ontologies in software applications», Nucleic acids research, vol.39, pp.541-545, 2011.

Z. Zhang, X. Han, Z. Liu, X. Jiang, M. Sun et al., «ERNIE: Enhanced language representation with informative entities», 2019.

Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang, «Random erasing data augmentation», 2017.