Methods for User-Centered Design and Evaluation of Text ... ?· Methods for User-Centered Design and…

  • Published on

  • View

  • Download


<ul><li><p>Methods for User-Centered Design and Evaluation of Text Analysis Tools in a Digital History Project </p><p>Ben Heuwing, Thomas Mandl, Christa Womser-Hacker Department of Information Science and Natural Language Processing, University of Hildesheim </p><p>Universittsplatz 1, 31141 Hildesheim, Germany [heuwing|mandl|womser] </p><p> ABSTRACT </p><p>This paper reports on the user centered, formative </p><p>evaluation of tools and the validation of models for the </p><p>analysis of historical textbooks in the context of the digital </p><p>history project Children and their World. The goal of the </p><p>project is to create methods for computer-supported, </p><p>interactive analysis that can be applied to a large corpus of </p><p>historical textbooks on history and geography (~5000 </p><p>volumes). A first version of a tool for text analysis has been </p><p>created based on a user centered design process, including a </p><p>contextual study on current work practices of historians, </p><p>participative design workshops, and the prioritization of </p><p>requirements with the project stakeholders. In addition, </p><p>several generations of text models used in these tools have </p><p>been iteratively evaluated by the historians in the project. In </p><p>the context of a cooperative validation study, researchers </p><p>have used these tools to examine existing hypotheses from </p><p>the field. The method enables the validation of text models </p><p>regarding established knowledge, provides additional </p><p>insights into the requirements for tools and visualizations, </p><p>and helps to strengthen the expert users trust in the tools. </p><p>Based on the findings, this paper proposes four principles </p><p>for the analysis of a corpus of historical texts. Moreover, </p><p>the methods presented are discussed regarding the </p><p>application of user centered design in the context of digital </p><p>humanities projects. </p><p>Keywords </p><p>Digital Humanities, Digital History, User Centered Design, </p><p>Interactive Text Analysis </p><p>INTRODUCTION </p><p>The last years have seen a rising interest in the area of </p><p>humanities computing, or digital humanities. The analysis </p><p>and interpretation of texts is certainly the most important </p><p>research activity in the humanities which needs to be </p><p>supported by digital humanities tools (Toms &amp; OBrien, </p><p>2008). In general, humanists expect tools that respect and </p><p>gradually enhance their existing research practices (Gibbs </p><p>&amp; Owens, 2012). However, user groups in the humanities </p><p>are highly diverse, ranging from sporadic humanist users, </p><p>which use tools that are optimized for a specific purpose </p><p>and a specific set of data, through humanist expert-users of </p><p>highly configurable tools, to active humanist programmers </p><p>making use of available programming libraries. In addition, </p><p>scholars from different disciplines in the humanities have </p><p>distinct goals when analyzing and interpreting texts, which </p><p>results in differences in their information behavior (Palmer </p><p>&amp; Cragin, 2008). Because of this diversity of processes and </p><p>motivations, the specific research interests of the intended </p><p>users have to be carefully studied before the findings are </p><p>translated into optimized designs. In the context of research </p><p>tools for the humanities, user centered design will not only </p><p>have to inform the appearance and the utility of an </p><p>interface, but also the transparency of automated processing </p><p>steps of text mining and the utility of the models created. </p><p>General advice on user centered development in the context </p><p>of digital humanities projects is presented by Warwick </p><p>(2012). However, the application of these methods in digital </p><p>humanities projects has rarely been reported on. </p><p>The project Children and their World presented in this </p><p>paper1 confronts the problem of how the world was </p><p>represented in textbooks in the period of the German </p><p>Empire between 1871 and 1914. For most children in this </p><p>period, the textbooks that they used in school were their </p><p>most important or even their only source of knowledge </p><p>about the world outside of their direct experience. These </p><p>popular media therefore are of high value for the research </p><p>on the history of knowledge. The books are expected to </p><p>contribute dominantly to the childrens perception of other </p><p>countries and continents and to their understanding of the </p><p>role of Germany in the world. For example, the exploding </p><p>economic globalization during the 19th century and the </p><p>simultaneously increasing role of nation states can be </p><p>expected to be represented in the textbooks. Consequently, </p><p>research questions of the project include the investigation of </p><p>assumptions about the increase of contents regarding the </p><p>national history of Germany in textbooks during this time </p><p>1 </p><p> ASIST 2016, October 14-18, 2016, Copenhagen, Denmark. </p><p>Author Retains Copyright. </p></li><li><p>and changes in the descriptions of colonies and emigration. </p><p>In addition to changes over time, researchers want to </p><p>compare textbooks for different school types, regions in </p><p>Germany, and religious denominations. </p><p>The version of the corpus used in our experiments currently </p><p>consists of 2935 textbooks with 605830 pages, and mostly </p><p>covers textbooks on history and geography. The corpus </p><p>(example in Figure 1) has been digitized and is publicly </p><p>accessible.2 In this context, the primary user group consists </p><p>of the researchers in the project, i.e. expert users which </p><p>have general knowledge about digital humanities methods </p><p>and can configure and use the tools developed in the project </p><p>as well as existing digital humanities tools. However, a core </p><p>set of the functionality will be made available publicly, so a </p><p>secondary user group of more sporadic users of the tools is </p><p>targeted as well, mostly from the specialization of textbook </p><p>research. </p><p>In this context, it is of importance to understand existing </p><p>processes, but to also provide a creative environment to </p><p>include the participating historians as expert users into the </p><p>processes of innovation and development. The project </p><p>therefore adopted strategies from user centered design, </p><p>including contextual user studies and participative </p><p>workshops. In addition, during development a methodology </p><p>is needed which helps to estimate the utility of the resulting </p><p>tools from a domain perspective, because the corpus is </p><p>prohibitively large for manual inspection and comparison. </p><p>Therefore, an evaluation study of a first generation of tools </p><p>has been conducted which systematically validates the text </p><p>models for interactive analysis relative to established </p><p>knowledge in the domain. In this study, analysis of existing </p><p>hypotheses is conducted interactively and jointly by a </p><p>domain expert and an expert for the tools provided. The </p><p>2 - Strtgen (2014) provides further </p><p>information about the corpus </p><p>idea for the study has been derived from existing research </p><p>in the digital humanities (DiMaggio, Nag, &amp; Blei, 2013; </p><p>Evans, 2014) and has been formalized as a method which, </p><p>in addition to the primary goal of validating modeling </p><p>techniques, also helps to establish trust in the technologies </p><p>provided. The method is also well suited for formative </p><p>evaluation, because during the cooperation of the experts </p><p>new approaches to analysis can be developed. In the </p><p>process, existing problems and new ideas can be derived </p><p>from the conversation of the experts, similar to the </p><p>approach of co-discovery in user testing (e.g. Rieh, Yang, </p><p>Yakel, &amp; Markey, 2010). The study focuses on the </p><p>validation of text models and the formative evaluation of </p><p>the tools. In addition, the user centered process used to </p><p>create the tools in the project is presented and discussed as </p><p>an example of how to incorporate user centered design into </p><p>projects in the digital humanities. </p><p>RELATED WORK </p><p>We will give an overview on existing approaches to user </p><p>research in the digital humanities and on methods for text </p><p>analysis in the area of digital history, focusing on topic </p><p>models and methods for their evaluation. </p><p>User Studies in the Digital Humanities </p><p>While a technical focus still appears to be predominant in </p><p>many digital humanities projects, the application of user </p><p>centered design methods has been promoted for the creation </p><p>of new tools (Gibbs &amp; Owens, 2012; Warwick, 2012). The </p><p>variety of users of digital tools in the humanities is made </p><p>visible through the broad conception of such tools, ranging </p><p>from software libraries to be used in scripting </p><p>environments,3 to configurable, general-purpose test-mining </p><p>software,4 and to tools and interfaces created and offered </p><p>specifically to explore an existing data-set.5 In the field of </p><p>historical research, researchers have been found to expect </p><p>digital tools to improve their existing research processes </p><p>and to support the methods they already employ (Gibbs &amp; </p><p>Owens, 2012). If new tools require the introduction of new </p><p>methods and procedures, these should therefore be </p><p>consistent with established approaches. The information </p><p>behavior of historians emphasizes the collection of </p><p>contextual information about sources and their assessment </p><p>regarding a current research interest (Rhee, 2012). A </p><p>tendency to distrust results of text-mining has been noted in </p><p>the discipline (Gibbs &amp; Owens, 2012). </p><p>Regarding the analysis of sources, researchers in the </p><p>humanities may choose to select, to interlink and to </p><p>3 For an example cf. a collection of programming tutorials </p><p>for historians - </p><p>4 E.g. Voyant, a collection of tools for text-analysis -</p><p> </p><p>5 E.g. the interactive maps of the American Panorama </p><p>project - </p><p>Figure 1: Cover and Page of a history textbook for Girls Schools published in 1907, results of text recognition </p><p><br /> I. berblick ber die Srandenburgisch-Preuische Geschichte <br /> bis zum Regierungsantritte des Groen Kurfiirsteu. <br /> 1. Die Marken an der Ostgrenze des Deutschen Reiches. <br /> Karl der Groe und die Slawen. Der deutsche Bestandteil des <br /> Reiches Karls des Groen reichte nach Unterwerfung der Sachsen bis <br /> zur Elbe. Auf dem rechten Ufer dieses Flusses wohnte der slawische <br /> Volksstamm der Wenden. Diese hatten in den Strmen der Vlker- <br /> wandrnng die frher dort ansssige germanische Bevlkerung ber die <br /> Elbe getrieben; ihr Sinn stand auf Raubzge, die sie ber den Grenz- <br /> strm [] </p></li><li><p>combine contents for further analysis (Collins &amp; Jubb, </p><p>2012). In a survey on the use of tools, researchers in the </p><p>area of digital humanities rate those tools that enable </p><p>analysis by comparing texts, creating concordances (textual </p><p>context), and analysing distributions of words or phrases as </p><p>most useful to their research practices (Toms &amp; OBrien, </p><p>2008). However, basic tools for text manipulation to be </p><p>used from the command line (UNIX tools) are also </p><p>popular among the respondents. This may indicate a </p><p>perceived inflexibility of available, more specialised tools. </p><p>For projects in the digital humanities, productive </p><p>interdisciplinary communication constitutes an important </p><p>factor of success (Collins &amp; Jubb, 2012; Lin, 2012). In a </p><p>project, a high level of collaboration between humanists and </p><p>technologists is necessary to create tools which can enhance </p><p>existing research interests, e.g. when existing resources are </p><p>visualised or interlinked in new ways. This communication </p><p>across disciplines can benefit from methods of participatory </p><p>design. For example, participative design workshops with </p><p>different user groups have been conducted define </p><p>requirements for the collection and analysis of archival </p><p>materials (Boukhelifa, Giannisakis, Dimara, Willett, &amp; </p><p>Fekete, 2015), emphasizing the need for personal </p><p>collections of materials, and the possibility of annotating </p><p>texts with private notes. </p><p>Support for Text Analysis in Historical Research </p><p>An impressive collection of tools has been developed to </p><p>support text analysis in the humanities, based on </p><p>technologies from the areas of information retrieval, natural </p><p>language processing, corpus linguistics, and text-mining. </p><p>For the computer-supported study of documents in the </p><p>digital humanities, the extraction of facts from the texts </p><p>(Facts from Texts - Thaller, 2012) can be distinguished </p><p>from the analysis of the ways these text represent </p><p>knowledge and sentiments about the world (Text as Text </p><p>- Thaller, 2012), the latter including studies of historical </p><p>semantics, historical literary studies, and the study of the </p><p>history of science, or, in the case of our project, more </p><p>generally the history of the representation of knowledge. </p><p>However, text analysis of historical documents presents a </p><p>number of challenges, primarily the quality of optical </p><p>character recognition (OCR) and the general absence of </p><p>conventions for spelling (Piotrowski, 2012). In many cases, </p><p>no resources and tools exist which can be used with </p><p>historical varieties of a language. </p><p>Term-Based Analysis </p><p>A basic analysis of term frequencies can provide important </p><p>insights. For example, the variation of frequencies of </p><p>relevant terms in titles of Victorian books can help to </p><p>analyze the change of religious orientations in a society </p><p>over time (Gibbs &amp; Cohen, 2011). In this case, bigrams of </p><p>words are extracted from the texts to examine the </p><p>characteristic usage of specific terms over time. With the </p><p>goal of applying tools from corpus linguistics to historical </p><p>research questions, a methodology has been proposed </p><p>which involves the creation of term groups by domain </p><p>experts, the analysis of these terms in context to find </p><p>additional relevant terms for a concept, followed by the </p><p>time-based and comparative analysis of their frequencies </p><p>and of their co-occurrences to analyze their usage and </p><p>accompanying opinions (Baker et al., 2008). In this </p><p>approach, the repeated analysis of examples displayed as </p><p>concordances is intended to integrate quantitative and </p><p>qualitative approaches. </p><p>Identification of Semantic Entities and Relations </p><p>In addition to the analysis of term frequencies, the </p><p>identification and disambiguation of named entities such as </p><p>person names, place names, or events, can be helpful to </p><p>analyze the contents of texts. For example, Cybulska &amp; </p><p>Vossen (2011) report a recall rate of 49% and a precision of </p><p>57% for extracted historical events. A prominent example </p><p>in historical research presents an analysis of the spatial </p><p>focus of a local American newspaper over time, based on </p><p>gazetteers and lexical resources (Blevins, 2014). Here, the </p><p>identification of mentions of locations and places clearly </p><p>does not try to extract chunks of information (Thaller, </p><p>2012), but to analyze the development of the discourse </p><p>about these places. Additional ideas exist which can help to </p><p>model discourses on historical events (Nuessli &amp; Kaplan, </p><p>2014), and, on a more general level, on historical trends </p><p>(Allen &amp; Chu, 2014). </p><p>Statistical Models of Contents with Topic Modeling </p><p>A number of statistical approaches from the area of text </p><p>mining can be applied to analyze text collections on a more </p><p>abstract level, including unsupervised document clustering </p><p>and topic modeling as well as supervised approaches such </p><p>as document classification. Topic modeling is a prominent </p><p>approach to unsupervised text-mining in the digital </p><p>humanities. In most studies, topic modeling is implemented </p><p>using LDA (latent dirichlet allocation - Blei, Ng, &amp; Jordan, </p><p>2003), which analyzes the contents of documents based on </p><p>the assumption that those terms which co-occur frequently </p><p>with each other constitute a semantically meaningful topic. </p><p>The modeling process results in a list of topics, each one </p><p>represented as a distribution over the vocabulary of a </p><p>corpus. In practice, however, only the most important terms </p><p>attributed to the topic are taken into consideration when </p><p>interpreting a topic. The distribution of terms for a topic can </p><p>be used to infer its relevance for a document, resulting in a </p><p>distribution over all topics for each document in the corpus. </p><p>In order to generate topic models which are suitable for a </p><p>specific purpose, in most cases several generations of </p><p>models have to be created to optimize the parameters for </p><p>modeling (DiMaggio et al., 2013). The most important </p><p>parameter is the number of topics to be created, which has </p><p>to be specified in advance. In addition, the removal of </p><p>stopwords, the selection of specific word forms, the </p><p>documents or relevant text snippets used, and the definition </p><p>of the delimitation of a document (e.g. sentences, </p><p>paragraphs, pages or complete books) may influence the </p></li><li><p>result of the modeling process (cf. e.g. Newman &amp; Block, </p><p>2006). </p><p>The interpretation of the topic models based on the terms </p><p>and their weights in the topics often constitutes the first step </p><p>of an analysis for historical research. If possible, topics are </p><p>then labeled manually (e.g. Newman &amp; Block, 2006; Evans, </p><p>2014). A possible next step is the aggregation of the results </p><p>of topic-inference for all documents of a period to analyze </p><p>topics over time, e.g. to explore trends in historical </p><p>newspapers (Newman &amp; Block, 2006; Yang, Torget, &amp; </p><p>Mihalcea, 2011; Eisenstein, Sun, &amp; Klein, 2014). </p><p>Because the goal of our project is not the deduction and </p><p>verification of factual information but the analysis of broad </p><p>topical trends and opinions, topic modeling has been </p><p>introduced as the primary method to analyze the text corpus </p><p>in our project. As an unsupervised approach to text </p><p>modeling, it is also expected to be less susceptible to the </p><p>influence of preconceived opinions of researchers. </p><p>Evaluation of topic models </p><p>Evaluation of unsupervised methods for text mining, such </p><p>as clustering and topic-modeling, constitutes a problem as </p><p>there is no training data available for automatic comparison </p><p>while at the same time most text corpora are too large to be </p><p>inspected manually. Existing approaches to evaluate topic </p><p>modeling mainly focus on the automatic evaluation of </p><p>models. The performance of models on held-out parts of a </p><p>text-corpus can be assessed automatically (Mimno, </p><p>Wallach, Talley, Leenders, &amp; McCallum, 2011),