Assisted authoring for avoiding inadequate claims in scientific reporting

Abstract : In this thesis, we report on our work on developing Natural Language Processing (NLP) algorithms to aid readers and authors of scientific (biomedical) articles in detecting spin (distorted presentation of research results). Our algorithm focuses on spin in abstracts of articles reporting Randomized Controlled Trials (RCTs). We studied the phenomenon of spin from the linguistic point of view to create a description of its textual features. We annotated a set of corpora for the key tasks of our spin detection pipeline: extraction of declared (primary) and reported outcomes, assessment of semantic similarity of pairs of trial outcomes, and extraction of relations between reported outcomes and their statistical significance levels. In addition, we annotated two smaller corpora for identification of statements of similarity of treatments and of within-group comparisons. We developed and tested a number of rule-based and machine learning algorithms for the key tasks of spin detection (outcome extraction, outcome similarity assessment, and outcome-significance relation extraction). The best performance was shown by a deep learning approach that consists in fine-tuning deep pre-trained domain-specific language representations (BioBERT and SciBERT models) for our downstream tasks. This approach was implemented in our spin detection prototype system, called De-Spin, released as open source code. Our prototype includes some other important algorithms, such as text structure analysis (identification of the abstract of an article, identification of sections within the abstract), detection of statements of similarity of treatments and of within-group comparisons, and extraction of data from trial registries. Identification of abstract sections is performed with a deep learning approach using the fine-tuned BioBERT model, while the other tasks are performed using a rule-based approach. Our prototype system includes a simple annotation and visualization interface.

Cited literature [315 references]
Contributor : Abes Star
Submitted on : Tuesday, September 15, 2020 - 11:01:23 AM
Last modification on : Saturday, October 10, 2020 - 3:25:56 AM


Version validated by the jury (STAR)


  • HAL Id : tel-02938856, version 1



Anna Koroleva. Assisted authoring for avoiding inadequate claims in scientific reporting. Bioinformatics [q-bio.QM]. Université Paris-Saclay; Universiteit van Amsterdam, 2020. English. ⟨NNT : 2020UPASS021⟩. ⟨tel-02938856⟩


