Just published: La détection automatique multilingue d’énoncés biaisés dans Wikipédia (Master's thesis)

Aleksandrova, Desislava (2020). La détection automatique multilingue d’énoncés biaisés dans Wikipédia.
Master's thesis, Département de linguistique et de traduction, Université de Montréal. [PDF (1 MB)].

We propose a multilingual method for extracting biased sentences from Wikipedia and use it to create corpora in Bulgarian, French and English. Sifting through the revision history of articles that were at some point considered biased and later corrected, we retrieve the last tagged and the first untagged revisions as before/after snapshots of what was deemed a violation of Wikipedia’s neutral point of view policy. We then extract the sentences that were removed or rewritten in that edit. The approach yields sufficient data even for relatively small Wikipedias such as the Bulgarian one, where 62,000 articles produced 5,000 biased sentences. We evaluate the method by manually annotating 520 sentences for Bulgarian and French, and 744 for English, and we assess the level of noise and analyze its sources. Finally, we use the data with well-known classification methods to detect biased sentences.
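The extraction step described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the tag pattern `NPOV_TAG`, the naive sentence splitter, and the function names are all assumptions made for illustration, and revisions are assumed to arrive as a list of wikitext strings ordered oldest first. It finds the first point where a tagged revision is followed by an untagged one, then uses `difflib.SequenceMatcher` to list the sentences of the tagged revision that were deleted or replaced.

```python
import re
from difflib import SequenceMatcher

# Assumption: bias is marked with a {{POV}} or {{NPOV}} template.
# Real Wikipedias use many language-specific template names.
NPOV_TAG = re.compile(r"\{\{\s*(?:POV|NPOV)\b[^}]*\}\}", re.IGNORECASE)


def is_tagged(text):
    """True if the revision text carries the (assumed) bias tag."""
    return NPOV_TAG.search(text) is not None


def find_snapshot_pair(revisions):
    """Return the (last tagged, first untagged) pair, or None.

    `revisions` is a list of revision texts ordered oldest first.
    Only the first tagged-to-untagged transition is returned.
    """
    for before, after in zip(revisions, revisions[1:]):
        if is_tagged(before) and not is_tagged(after):
            return before, after
    return None


def split_sentences(text):
    """Naive splitter: drop the tag, break after '.', '!' or '?'."""
    text = NPOV_TAG.sub("", text)
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def removed_or_rewritten(before, after):
    """Sentences of `before` that were deleted or replaced in `after`."""
    old, new = split_sentences(before), split_sentences(after)
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    changed = []
    for op, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if op in ("delete", "replace"):
            changed.extend(old[i1:i2])
    return changed


# Toy revision history (invented for illustration).
history = [
    "The city has a port.",
    "{{POV}} The city has a port. It is undoubtedly the best place on Earth.",
    "The city has a port. It is often ranked among the most livable places.",
]
pair = find_snapshot_pair(history)
candidates = removed_or_rewritten(*pair) if pair else []
```

A sentence-level diff like this also picks up edits unrelated to bias (typo fixes, vandalism reverts, factual updates) that happen to fall in the same revision, which is one reason the noise level has to be assessed by manual annotation.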

Keywords: bias, neutrality, classification, multilingual, corpora, Wikipedia.
