Tutorial: Python for Digital Humanities
Dafydd Gibbon
Universität Bielefeld, Germany
2nd South African Workshop on Digital Humanities, Potchefstroom, April 2016
Course Plan V01-2016-02-05
Objectives
Prerequisites
• Programming experience with other languages, especially object-oriented and functional styles
• Interest in Digital Humanities applications, especially for art, literature, language and speech
Learning objectives
• Intermediate to advanced DH-relevant skills in Python
• Natural Language Processing
• Machine Learning
Practicalities – prepare before the course
Working environment
• Xubuntu 15.10 or later (xfce desktop)
• Python 2.7
Why Linux?
• Main reason:
  ◦ greatest flexibility for scientific computing
• Python is pre-installed in the main distributions
• Libraries are easily installed
  ◦ with pip, easy_install or apt-get
• Several software development environment options:
  ◦ Python editor, IPython, CLI editor & runtime
• Easy web servers
  ◦ Lighty (lighttpd), which I use, Apache, etc. for web apps
• Linux can be installed as:
  ◦ dual boot / VirtualBox or VMware / Live Linux / Persistent Live Linux / hot switch (Chromebook)
Why Python?
• Many libraries catering for Digital Humanities, e.g. NLTK, scikit-learn
• Easy learning curve
• Why Python 2.7?
  ◦ Not all libraries have been ported to Python 3.n
I will be using hot-switched Xubuntu with xfce4, installed with Dave Schneider’s ‘crouton’ script on an Intel-powered Chromebook in developer mode. Check the web for this if you are interested.
A Live Linux CD with pre-installed libraries will be provided if time allows.
Dafydd Gibbon Python for Digital Humanities, V01-2016-02-05 1/3
Day 1. Python Basics
Python ecology
• Versions, Distributions, SDKs, Webtools, Hosting
Environments
• Linux and Windows environments
Modules and libraries
• NumPy, SciPy, Matplotlib, scikit-learn, pandas, NLTK, os, sys, time, re, cgi, …
• Modules
Data structures and algorithms
• numbers, chars (encoding), strings, lists, structures, dictionaries, graphics
• object-oriented and functional programming
• loops, list comprehensions; lambda; recursion
• error-trapping and debugging
Language
• text corpus analysis and lexicon construction
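As a rough preview of the Day 1 topics, the sketch below combines a list comprehension, a lambda sort key, a recursive function and error trapping in a tiny frequency-lexicon builder. It is only an illustration, not course material: the sample text and all function names are assumptions invented here, and only the standard library is used. The `__future__` import is there so that the same code should run on Python 2.7 as well as Python 3.

```python
from __future__ import print_function  # print() on Python 2.7 and 3

import re
from collections import Counter

# Hypothetical mini-corpus, invented for this illustration.
SAMPLE_TEXT = "The cat sat on the mat. The dog sat on the log."


def tokenize(text):
    """Return the lower-cased alphabetic word tokens of the text."""
    # List comprehension over all regex matches.
    return [token.lower() for token in re.findall(r"[A-Za-z]+", text)]


def build_lexicon(tokens):
    """Map each word type to its frequency in the token list."""
    return Counter(tokens)


def relative_frequency(lexicon, word):
    """Share of all tokens taken by word; 0.0 for an empty lexicon."""
    total = sum(lexicon.values())
    try:
        return lexicon[word] / float(total)
    except ZeroDivisionError:
        # Error trapping: an empty corpus has no meaningful frequencies.
        return 0.0


def count_nested(items):
    """Recursively count the leaves of an arbitrarily nested list."""
    if not isinstance(items, list):
        return 1
    return sum(count_nested(item) for item in items)


tokens = tokenize(SAMPLE_TEXT)
lexicon = build_lexicon(tokens)
# Lambda as sort key: descending frequency, then alphabetical order.
ranked = sorted(lexicon.items(), key=lambda pair: (-pair[1], pair[0]))
print(ranked[:3])
```

A `Counter` is used here because it behaves like a dictionary that returns 0 for unseen words, which keeps the lexicon lookup free of special cases.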
Day 2. NLTK – Natural Language Toolkit
Overview of NLTK
Text analysis
• corpus analysis and lexicon construction
• text data, web corpora
Parsing
• finite state; context-free; statistical
Machine Translation
• rule-based; statistical
Speech
• signal libraries
• annotation, TextGridTools
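To give an idea of what the finite-state item under Parsing refers to, here is a minimal sketch of a deterministic finite-state acceptor in plain Python. It deliberately does not use NLTK. The toy noun-phrase grammar (optional determiner, any number of adjectives, one noun), the tag names and the function name are all assumptions made up for this illustration.

```python
# Transition table: (state, part-of-speech tag) -> next state.
# Hypothetical toy grammar: optional DET, any number of ADJ, then one NOUN.
TRANSITIONS = {
    (0, "DET"): 1,
    (0, "ADJ"): 2,
    (0, "NOUN"): 3,
    (1, "ADJ"): 2,
    (1, "NOUN"): 3,
    (2, "ADJ"): 2,
    (2, "NOUN"): 3,
}
START_STATE = 0
ACCEPTING_STATES = {3}


def accepts(tags):
    """Return True if the tag sequence is a noun phrase in the toy grammar."""
    state = START_STATE
    for tag in tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:
            # No transition defined for this tag: reject the sequence.
            return False
    return state in ACCEPTING_STATES
```

For example, `accepts(["DET", "ADJ", "NOUN"])` follows states 0, 1, 2, 3 and ends in the accepting state, while `accepts(["DET", "DET", "NOUN"])` is rejected at the second determiner.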
Day 3. Pattern Matching and Machine Learning
Overview
• Distance measures
• Pattern matching
• Decision Tree Induction
• Clustering
• SVMs
scikit-learn
• library (sklearn)
• practical examples
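One possible example of a distance measure is the Levenshtein (edit) distance between two strings, sketched below in plain Python with the usual dynamic-programming recurrence. Whether the course covers this particular measure is an assumption; the function names and the word list in the usage note are hypothetical, and no external library is required.

```python
def edit_distance(source, target):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning source into target."""
    # previous[j] holds the distance between the processed prefix of
    # source and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]
        for j, target_char in enumerate(target, start=1):
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            substitution = previous[j - 1] + (source_char != target_char)
            current.append(min(deletion, insertion, substitution))
        previous = current
    return previous[-1]


def nearest(word, candidates):
    """Return the candidate with the smallest edit distance to word."""
    return min(candidates, key=lambda candidate: edit_distance(word, candidate))
```

A simple pattern-matching use is spelling normalisation: `nearest("colour", ["color", "cooler", "collar"])` picks the lexicon entry closest to the input form. Only two rows of the distance table are kept at a time, so memory use grows with the length of the target string only.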
Selected literature
Python information
• https://www.python.org/
• Check the O’Reilly handbooks on Python
Specialised literature
• Bird, Steven, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit. O’Reilly Media. (Cf. also http://www.nltk.org/book_1ed/)
• Garreta, Raúl and Guillermo Moncecchi. 2013. Learning scikit-learn: Machine Learning in Python. Packt Publishing.
Two web tool examples
• http://wwwhomes.uni-bielefeld.de/DistGraph/
• http://wwwhomes.uni-bielefeld.de/TGA
I usually use Amazon Kindle versions (less expensive).