21
Prof. Dr. Thomas Mandl Informationswissenschaft Universität Hildesheim [email protected] Text Mining in historical text books A new project on information behavior Mandl: Text Mining in historical text books Hildesheim, 6. Oct. 2013 A new project on information behavior in the humanities

Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Prof. Dr. Thomas Mandl

Informationswissenschaft

Universität Hildesheim

[email protected]

Text Mining in historical text books

A new project on information behavior

Mandl: Text Mining in historical text booksHildesheim, 6. Oct. 2013

A new project on information behavior in the humanities

Page 2: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Partner

�Georg Eckert Institute for International Textbook Research. Member of the Leibniz Associationin Braunschweig

Mandl: Text Mining in historical text books

�German Institute for International Educational Research Member of the Leibniz Associationin Frankfurt a.M.

Page 3: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Text Mining

�“Discover useful and previously unknown “gems” of information in large text collections” (Avaquest 2002)

�„Text Mining is the extraction of knowledge out of many texts. .. The newly created knowledge could not be read from one single text but only by considering patterns over many texts … „

Mandl: Text Mining in historical text books

by considering patterns over many texts … „ (Mandl 2013 @ KSS)

�„Process of deriving high quality information from text“ (Choi 2013 @ LWA)

Page 4: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Text Mining

Representation

Surrogat

Result

New Knowledge

Mandl: Text Mining in historical text books

SurrogatAuthors

Extraction

Depending on goal

Algorithm

Machine LearningVisualisationTemporal Analysis ….

New KnowledgeText

Creation

Page 5: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Mandl: Text Mining in historical text books

Page 6: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

The World of the Children

Mandl: Text Mining in historical text books6

The World of the Children

Knowledge about the world and meaning of world in childerns text books and

children novels between 1850 und 1918

Page 7: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Corpora

Childrens‘ and teens‘ booksText books (History and Geography)

Mandl: Text Mining in historical text books7

Knowledge from school books Popular Knowledge

Knowledge

about the world

Page 8: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Corpora

�GEI-Digital- All German history text books of the „Kaiserreich“

- Some 1.800 books (350.000 pages) - OCR processed - Metadaten in METS/MODS-Structure

Mandl: Text Mining in historical text books

-

- Metadaten in METS/MODS-Structure - Digitalisation of Geograpy books ongoing- Until 2014 some 3.000 books more will be available (500.000 pages)

- TEI-Format

Page 9: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Corpora

�Youth literature from UB Braunschweig- 2500 books, - Digitalisation ongoing- Genres: Adventure, Robinsonades, songbooks, moral and religious tales

Mandl: Text Mining in historical text books

Page 10: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Text Books

�Text book in 19th century as a primary tool to access world knowledge

�Children were obliged to visit school- Between 1871 and 1918: 836 new history text books were published

- Between 1890 and 1918: as many books were

Mandl: Text Mining in historical text books

- Between 1890 and 1918: as many books were published as between 1700 and 1850

Page 11: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Areas of Transmitted Knowledge

„Foreign“ Continents,

Countires and Peoples,

Colonies, Discoveries,

Expeditions,

War and Peace,

political treaties,

Regents and states

men

Mandl: Text Mining in historical text books11

Science and Culture,

Urbanisation and

Industrialisation,

Pauperism, Social

politics and family

Transport,

Communication, Trade,

Industry, Technology,

environment and nature

Page 12: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Topics

Mandl: Text Mining in historical text books12

Page 13: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Goals

�Semantic, categorial and symbolic knowledge about the world

�How did the authorities interpret the world and transfer this knowledge to the children

�diachronic perspective�Differences between parts of the corpora

Mandl: Text Mining in historical text books

�Differences between parts of the corpora

Page 14: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Topics

Krieg und Frieden,

politische Bündnisse,

Regenten und

Staatsmänner

Mandl: Text Mining in historical text books14

Wissenschaft und Kultur,

Urbanisierung und

Industrialisierung,

Pauperismus, Sozialpolitik

und Familie

Transport,

Kommunikation,

Handel, Industrie,

Technik, Umwelt

und Natur

Page 15: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Areas

Krieg und Frieden,

politische Bündnisse,

Regenten und

Staatsmänner

Mandl: Text Mining in historical text books15

Wissenschaft und Kultur,

Urbanisierung und

Industrialisierung,

Pauperismus, Sozialpolitik

und Familie

Transport,

Kommunikation,

Handel, Industrie,

Technik, Umwelt

und Natur

Page 16: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Untersuchungsfragen

Weltwissen, Weltbilder, Weltdeutung

Mandl: Text Mining in historical text books16

Perspektiven und Vergleiche:

diachronsynchron intermedial transmedial

Selbst- und Fremdbilder, Fortschritts-, Krisen- und Bewältigungsdiskurse, …

Page 17: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Mandl: Text Mining in historical text books

Page 18: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Methods, Tools, Technologies

Tools for digital Analysis

•Opinion Mining and Topic Extraction from historic text corpora

Hermeneutics (intellectual text analysis) and qualitative Content Analysis

and

Mandl: Text Mining in historical text books18

•Opinion Mining and Topic Extraction from historic text corpora

•Analysis and extraction of opinionated text parts•Temporal Development of Topics and Opinions•Adaptation of Tools to the specific demands of the historians and the structure of the Texts („domain specific text mining“)

•User oriented creation, evaluation and refinement of Software tools

Page 19: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Tasks and Work Packages

�GEI- Historical Research

�DIPF - Software development

�U Hildesheim- Information Behavior Analysis

Mandl: Text Mining in historical text books

- Information Behavior Analysis- Requirements Specification- Evaluation

Page 20: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

Mandl: Text Mining in historical text books

Wilson's process model of Ellis' behavioral framewo rk of information seeking (adapted from Wilson 1999)

Page 21: Text Mining in historical text books · Krieg und Frieden, politische Bündnisse, Regenten und Staatsmänner Mandl: Text Mining in historical text books14 Wissenschaft und Kultur,

KONVENS 2014:

Conference on Natural Language Processing

Uni HildesheimOktober 2014

Mandl: Text Mining in historical text books

Thank you foryour attention