In the current post-truth era, online information is consistently under scrutiny with respect to its credibility (its quality and veracity). Computer science has been prolific in developing automated solutions for relevant tasks such as claim verification or bias estimation. However, the validity of such solutions relies heavily on their training and evaluation datasets. Inevitably, systematic and methodological errors (known as data biases) might appear during their compilation. We survey 12 published and freely available datasets and annotate them for data biases using an established theoretical framework. We employ three expert annotators from the disciplines of computer science, philosophy, and communication science, and show that all annotated datasets do indeed suffer from biases.
Dimitrios Bountouridis et al.