
Fine grained classification of polarized and propagandist text in news articles and political debates

Vorakit Vorakitphan 1
1 WIMMICS - Web-Instrumented Man-Machine Interactions, Communities and Semantics
CRISAM - Inria Sophia Antipolis - Méditerranée, Laboratoire I3S - SPARKS - Scalable and Pervasive softwARe and Knowledge Systems
Abstract : In recent years, disinformation has become more viral, mainly because it spreads online through social media, with potentially threatening consequences for society. Online disinformation takes heterogeneous forms, i.e., deliberately manipulated or fabricated content created with the intent of promoting conspiracy theories, rumours, or misguided stances and judgements, for instance in news articles and in political discourse and debates. One of the many instances of online disinformation, and certainly one of the most dangerous, is propaganda. It represents an effective but often misleading communication strategy employed to promote a certain viewpoint to an audience, for instance in a political context. Effectively and automatically identifying, classifying and understanding this phenomenon is becoming an urgent need. In this thesis, I tackle this issue and propose a fine-grained classification approach for polarized and propagandist text in news articles and political debates. More precisely, because an audience's perception of a message depends on the context, the source of information, and the audience's background and preferences, a discussed topic can polarize the audience along partisan lines. This thesis first investigates such polarization through a use case in a political scenario, using Aspect-Based Sentiment Analysis to verify how extensively these methods can be employed to gain insights from political posts on social media. The thesis then discusses the design and evaluation of a number of Natural Language Processing (NLP) techniques for extracting the main features of propagandist text, where sentiment analysis, persuasion techniques, message simplicity, and ultimately argumentation are proposed and thoroughly investigated. The findings show that such features can capture particular characteristics of propaganda in texts.
Furthermore, these features are employed to tackle the NLP tasks of propaganda detection and classification through the design and implementation of a neural architecture that classifies fine-grained propaganda techniques. The work in this thesis goes beyond the state of the art of current systems for fine-grained propaganda detection and classification. Various Machine Learning approaches, ranging from feature-based logistic regression to recent neural architectures, have been evaluated on standard benchmarks in propaganda detection. As a result, a full pipeline for propaganda detection and classification is presented, in which the task of detecting propagandist text snippets obtained an F1-score of .71, and the transformer-based architecture obtained an average F1-score of .67 on the task of propaganda technique classification, outperforming the state-of-the-art systems. This pipeline is demonstrated with a proof-of-concept tool called PROTECT. Finally, as the last contribution of this thesis, I created a new annotated linguistic resource. This resource covers the political debates of the US presidential campaigns from 1960 to 2016 and is annotated with 6 types of propaganda techniques, which break down into 14 sub-categories of propaganda. The data set I built contains 1666 instances of propagandist text.
Contributor : Abes Star
Submitted on : Friday, March 18, 2022 - 9:31:17 AM
Last modification on : Sunday, May 1, 2022 - 3:17:59 AM


Version validated by the jury (STAR)


  • HAL Id : tel-03612796, version 2



Vorakit Vorakitphan. Fine grained classification of polarized and propagandist text in news articles and political debates. Artificial Intelligence [cs.AI]. Université Côte d'Azur, 2021. English. ⟨NNT : 2021COAZ4109⟩. ⟨tel-03612796v2⟩


