

Tutorial: Python for Digital Humanities

Dafydd Gibbon

Universität Bielefeld, Germany

2nd South African Workshop on Digital Humanities, Potchefstroom, April 2016

Course Plan V01-2016-02-05

Objectives

Prerequisites
• Programming experience with other languages, especially object-oriented and functional styles
• Interest in Digital Humanities applications, especially for art, literature, language and speech

Learning objectives
• Intermediate to advanced DH-relevant skills in Python
• Natural Language Processing
• Machine Learning

Practicalities – prepare before the course

Working environment
• Xubuntu 15.10 or later (xfce desktop)
• Python 2.7

Why Linux?
• Main reason:
  ◦ greatest flexibility for scientific computing
• Python is pre-installed in the main distributions
• Libraries are easily installed
  ◦ with pip, easy_install or apt-get
• Several SD environment options:
  ◦ Python editor, IPython, CLI editor & runtime
• Easy web servers
  ◦ Lighty (lighttpd), which I use, Apache, etc. for web apps
• Linux can be installed as:
  ◦ dual boot / VirtualBox or VMware / Live Linux / Persistent Live Linux / hot switch (CB)

Why Python?
• Many libraries catering for Digital Humanities, e.g. NLTK, SciKit Learn
• Easy learning curve
• Why Python 2.7?
  ◦ Not all libraries have been ported to Python 3.n

I will be using hot-switched Xubuntu with xfce4, installed with Dave Schneider’s ‘crouton’ script on an Intel-powered Chromebook in developer mode. Check the web for this if you are interested.

A Live Linux CD with pre-installed libraries will be provided if time allows.

Dafydd Gibbon Python for Digital Humanities, V01-2016-02-05 1/3

Day 1. Python Basics

Python ecology
• Versions, Distributions, SDKs, Webtools, Hosting

Environments
• Linux and Windows environments

Modules and libraries
• NumPy, SciPy, MatPlotLib, SciKitLearn, Pandas, NLTK, os, sys, time, re, cgi, …
• Modules

Data structures and algorithms
• numbers, chars (encoding), strings, lists, structures, dictionaries, graphics
• object-oriented and functional programming
• loops, list comprehensions; lambda; recursion
• error-trapping and debugging
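The constructs listed above can be illustrated with a short sketch. This is not part of the course material: the word list, the `reverse` function and the tiny `lexicon` dictionary are invented for the example, and the code is written so that it should run under both Python 2.7 and Python 3.

```python
# Hypothetical sample data, invented for this illustration.
words = ["corpus", "lexicon", "token", "type"]

# Loop: collect the word lengths step by step.
lengths_loop = []
for w in words:
    lengths_loop.append(len(w))

# List comprehension: the same result in a single expression.
lengths_comp = [len(w) for w in words]

# Lambda: an anonymous function used as a sort key (sort by last letter).
by_last_letter = sorted(words, key=lambda w: w[-1])

# Recursion: reverse a string by moving its first character to the end
# of the reversed remainder.
def reverse(s):
    if len(s) <= 1:
        return s
    return reverse(s[1:]) + s[0]

# Error trapping: looking up a missing dictionary key raises KeyError,
# which is caught here and replaced by a default value.
lexicon = {"corpus": "noun"}
try:
    pos = lexicon["token"]
except KeyError:
    pos = "unknown"
```

The loop and the comprehension produce the same list; the comprehension is usually preferred in Python when the body is a single expression.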

Language
• text corpus analysis and lexicon construction
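A minimal sketch of what lexicon construction from a text can look like, using only the standard library. The function name `build_lexicon`, the sample sentence and the simple tokenisation rule (runs of ASCII letters and apostrophes) are assumptions made for this illustration; real corpora, especially non-English ones, need Unicode-aware tokenisation.

```python
import re
from collections import Counter


def build_lexicon(text):
    """Return a Counter mapping each lower-cased word form to its frequency."""
    # Assumption: a token is a run of ASCII letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample text, invented for this illustration.
sample = "The cat sat on the mat. The mat was flat."
lexicon = build_lexicon(sample)

# The two most frequent word forms with their counts.
top = lexicon.most_common(2)
```

`Counter` is available from Python 2.7 onwards, so the sketch fits the working environment described in this plan.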

Day 2. NLTK – Natural Language Toolkit

Overview of NLTK

Text analysis
• corpus analysis and lexicon construction
• text data, web corpora

Parsing
• finite state; context-free; statistical

Machine Translation
• rule-based; statistical

Speech
• signal libraries
• annotation, TextGridTools

Day 3. Pattern Matching and Machine Learning

Overview
• Distance measures
• Pattern matching
• Decision Tree Induction
• Clustering
• SVMs
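As one example of a distance measure, the sketch below computes the Levenshtein (edit) distance between two strings in plain Python. The plan does not say which measures the course covers, so the choice of Levenshtein distance and the function name `levenshtein` are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the prefix of a processed
    # so far and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
```

Only two rows of the dynamic-programming table are kept at a time, which keeps memory use proportional to the length of the second string.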

SciKit Learn
• library (sklearn)
• practical examples


Selected literature

Python information
• https://www.python.org/
• Check the O’Reilly handbooks on Python

Specialised literature
• Bird, Steven, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit. O’Reilly Media. (Cf. also http://www.nltk.org/book_1ed/)
• Garreta, Raúl and Guillermo Moncecchi. 2013. Learning scikit-learn: Machine Learning in Python. Packt Publishing.

Two web tool examples
• http://wwwhomes.uni-bielefeld.de/DistGraph/
• http://wwwhomes.uni-bielefeld.de/TGA

I usually use Amazon Kindle versions (less expensive).
