Jacco van Ossenbruggen, Laura Hollink, Myriam C. Traub
Of course, I only use unbiased tools on unbiased data for my research!
Really? Do you know the limitations of your tool and their impact on your research?
Today:
Biases in the source data are unknown because it was not created for scientific purposes.
Biases in the tool chain are unknown because we use black box computational workflows and tools.
We need a systematic fit-for-purpose assessment for digital tools and data, starting with the ability to measure and quantify technology-induced bias.
Do you know THAT…
… even simple word count tools return different counts for the same text?
… most search engines are biased by document length?
… the version of the Google Ngram Viewer corpus may impact your trend analysis?
… OCR performs better on expensive newspapers targeting the social elite than on titles printed on cheaper paper?
… the Twitter Streaming API is not serving a random sample of the complete “firehose”?
… the performance of predictive models on new data cannot be predicted, not even by the developers of the models?
…
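The word count example can be made concrete with a minimal sketch. The three counting rules below are illustrative assumptions, not the behavior of any particular tool, and the sample sentence is made up for the demonstration. Each rule is a defensible definition of a "word", yet they disagree on the same text.

```python
import re

# Made-up sample sentence with contractions, hyphens, and an abbreviation.
text = "Don't over-rely on state-of-the-art tools: e.g. OCR isn't perfect."

def count_whitespace(s):
    """Count tokens separated by whitespace (punctuation stays attached)."""
    return len(s.split())

def count_word_chars(s):
    """Count runs of \\w characters (splits on apostrophes, hyphens, periods)."""
    return len(re.findall(r"\w+", s))

def count_letters_apostrophes(s):
    """Count runs of ASCII letters, keeping contractions such as "isn't" whole."""
    return len(re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", s))

for rule in (count_whitespace, count_word_chars, count_letters_apostrophes):
    print(f"{rule.__name__}: {rule(text)}")
```

On this sentence the three rules report 9, 16, and 14 words respectively. A measure that depends on word counts, such as a relative frequency, would therefore change with the choice of tool even though the text is identical.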
Traub, Myriam C., and Jacco van Ossenbruggen. Workshop on Tool Criticism in the Digital Humanities, Amsterdam, 22 May 2015
If you know OF OTHER examples of technology-induced bias: please tweet! #toolcrit
The Nature Of Digitally-Produced Data: Towards Social-Scientific Tool Criticism
1980: Bick and Müller observe that years of experience in scientific data collection methods have informed us about each method’s limitations with respect to validity and representativeness. They claim, however, that this is still largely missing in “new” research methods based on data that has been produced for purposes other than scientific inquiry.
Tool criticism
Scholars need to assume a critical attitude towards the use of tools and perform a systematic fit-for-purpose assessment.
Tool makers need to publish the source code of the tools and document their requirements and functionalities.
Data providers need to make provenance information available.
Data scientists need to develop quality metrics that measure technology-induced bias and take the context and requirements of research tasks into account.
Bick, Wolfgang, and Paul J. Müller. "The Nature of Process-Produced Data: Towards a Social-Scientific Source Criticism." Historical Social Research: The Use of Historical and Process-Produced Data (1980).
Amsterdam Data Science