Smog in China. Differential semantic analysis of institutional, media and microblog corpora

Qinran Dang

Abstract

Air pollution has become an increasingly serious problem in China. A growing number of news articles and microblog posts (weibo in Chinese, the equivalent of tweets), published on government and media websites, social networks, blogs, and forums, discuss the issue of «雾霾» (wumai in Chinese, meaning smog) from several angles: political, ecological, economic, sociological, health-related, etc. The semantics of the themes addressed in these texts differ significantly according to textual genre. Our research has a twofold objective: on the one hand, to identify the themes of a purpose-built digital corpus on wumai; on the other hand, to interpret the semantics of these themes differentially. First, we collect textual data written in Chinese and related to wumai. These news articles and weibo posts, drawn from three traditional Chinese websites and a social network, are divided into four sub-corpora by genre. Second, we build our corpus through a series of processing steps: data cleaning, word segmentation, normalization, POS tagging, markup, and data organization. We then study the characteristics of the four genres of sub-corpus through a series of discriminating variables (hyperstructural, lexical, semiotic, rhetorical, modal, and syntactic) distributed at the infratextual and intratextual levels. Next, based on the characteristics of each textual genre, we identify the main themes presented in each genre of sub-corpus and analyze the semantics of these themes contrastively. Our results are interpreted from two angles: quantitative and qualitative. All statistical analyses are carried out with textometric tools, and the semantic interpretations draw on several fundamental concepts of Interpretative Semantics (SI, Sémantique interprétative) proposed by Rastier (1987).

As air quality in China deteriorates, more and more news articles and microblog posts (weibo in Chinese, the equivalent of tweets) from government and media websites, social networks, forums, and blogs address the problem of « 雾霾 » (wumai in Chinese, denoting smog) in China from several angles: political, ecological, economic, sociological, health-related, etc. The semantics of the themes addressed in these texts differ markedly according to their textual genre. In this thesis, we aim, on the one hand, to identify the themes of a digital corpus on wumai built specifically for this purpose and, on the other hand, to interpret the semantics of these themes differentially. We first collect Chinese-language textual data relating to wumai. These texts, drawn from three traditional Chinese websites and the social network, are divided into four textual genres. After a series of preparatory steps (cleaning, segmentation, normalization, annotation, markup, and organization), we study the characteristics of the four textual genres of the corpus using a series of discriminating variables (hyperstructural, lexical, semiotic, rhetorical, modal, and syntactic) distributed at the infratextual and intratextual levels. Then, on the basis of the characteristics of each textual genre, we identify the main themes presented in each genre of sub-corpus and analyze contrastively the semantics of the themes retrieved. The results are interpreted both quantitatively and qualitatively. The quantitative analyses are carried out with textometric tools; the semantic interpretations fall within the theoretical framework of Interpretative Semantics (SI) proposed by Rastier (1987).

Air Pollution in China. Differential semantic analysis of institutional, media and microblogging corpora
