
HG2051 Language and the Computer Computational Linguistics ...compling.hss.ntu.edu.sg/courses/hg2051/HG2051-intro.pdf · Week Date Content Projects 1 13 Aug Course Intro 2 20 Aug



HG2051: Language and the Computer

Computational Linguistics with Python
Introduction, Organization, Overview of NLP, Main Issues

Michael Wayne Goodman
Postdoctoral Research Fellow, School of Humanities
https://goodmami.org

Lecture 1
http://compling.hss.ntu.edu.sg/courses/hg2051/

August 13, 2019


Presentation agenda

Introduction

Administrivia

Course Overview

Getting Started


Today's Session

- Personal Introductions
- Administrivia
- Course Overview
  - Why use computers in linguistics
  - What this course is (and isn't)
- Getting Started
  - Algorithmic thinking
  - The Python language
  - Introduction to the Natural Language Tool Kit (NLTK; external slides)


Self Introduction

- BS in Computer Science, Minor in Japanese
- MA and PhD in Computational Linguistics
- Dissertation: Semantic Operations for Transfer-based Machine Translation
- Interned at the National Institute for Information and Communications Technology (NICT), Japan
- Contracted at Microsoft Research (MSR) on the Machine Translation team
- Research Associate at NTU
- Postdoctoral Research Fellow at NTU


Student Introductions

- Your name
- Programming experience
  - e.g., JavaScript, C/C++, Excel spreadsheets
  - "none" is ok!
- Any language you speak other than English
  - e.g., Mandarin, Bahasa Malay, Tamil
  - "none" is ok (for this course)!




Schedule

Week  Date    Content                     Projects
1     13 Aug  Course Intro                -
2     20 Aug  Basics: Lists, Sets         -
3     27 Aug  Basics: Strings, Control    Assignment 1
4     3 Sep   NLTK Text Corpora           Assignment 2
-     10 Sep  Students' Union Day         -
5     17 Sep  Lexical Resources           Assignment 3
6     24 Sep  Processing Raw Text         Assignment 4
-     1 Oct   Recess                      -
7     8 Oct   Regular Expressions         Assignment 5
8     15 Oct  Structured Programs         Assignment 6
9     22 Oct  N-grams and Collocations    Assignment 7
10    29 Oct  Part-of-Speech Tagging      Group Project
11    5 Nov   Classification              Assignment 8
12    12 Nov  Linguistic Data Management  Assignment 9
-     19 Nov  Examination

- Schedule will be updated online:
  - http://compling.hss.ntu.edu.sg/courses/hg2051/


Assessment

- Continuous Assessment (100%)
  - 9 Assignments (40%)
    - 5% each
    - 1 week to complete
    - lowest score dropped
  - 1 Group Project (20%)
    - ~1 month to complete
  - Final (in-class, online, open-book) programming challenge (30%)
    - individual work (one program each)
    - in-class exam (5–6 hours)
  - Participation (10%)
    - Every week there will be short problems to do in class
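The weights above can be checked with a little arithmetic: nine assignments at 5% each would be 45%, and dropping the lowest leaves eight, or 40%. The sketch below is only an illustration of that calculation. The function name `final_grade` and the sample scores are hypothetical, scores are assumed to be fractions between 0 and 1, and extra credit is ignored.

```python
# Weights are taken from the Assessment slide; everything else is made up.
ASSIGNMENT_WEIGHT = 5       # percent per counted assignment
PROJECT_WEIGHT = 20         # percent
FINAL_WEIGHT = 30           # percent
PARTICIPATION_WEIGHT = 10   # percent


def final_grade(assignments, project, final, participation):
    """Return the course grade in percent.

    Every score is assumed to be a fraction between 0.0 and 1.0.
    """
    # Drop the lowest assignment score and keep the rest.
    counted = sorted(assignments)[1:]
    assignment_part = sum(score * ASSIGNMENT_WEIGHT for score in counted)
    return (assignment_part
            + project * PROJECT_WEIGHT
            + final * FINAL_WEIGHT
            + participation * PARTICIPATION_WEIGHT)


# Hypothetical student: one missed assignment, full marks elsewhere.
scores = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(final_grade(scores, project=1.0, final=1.0, participation=1.0))  # prints 100.0
```

Because the lowest score is dropped, the missed assignment in this example does not lower the grade.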


Extra Credit

- If you submit a patch that gets accepted to the NLTK or another tool we use:
  - you can get 1-5% extra credit (depending on the size/difficulty)
  - you can't go over 100%
- A patch can involve
  - fixing a bug in code
  - extending the code with new capabilities
  - fixing a bug in or extending documentation
    - spelling errors
    - rewording
    - translating




Why use Computers in Linguistics?

- Linguistics without computers is like taking a walk (or a long hard hike)
  - It can be very pleasant
  - You can see lots of details
  - There is only so much ground you can cover
- Using a software tool is like catching the MRT
  - Very efficient for set routes
  - You have to adapt to it
  - Hard to customize
- Programming is like driving a car
  - It is expensive to start off (you have to learn!)
  - You are free to go where you want to


The goal of this course

To learn enough about programming to flexibly analyze data and then do something with it

- The language will be Python
- We will use the NLTK
- You will be able to write your own programs by the end


HG2051 Prerequisites

- A little linguistic knowledge
  - You know what a word is
  - You know what a part of speech is
  - You know what a parse tree is

  If you don't know these, you will have to do a little background reading

- No computational knowledge
  - You have to be ready to learn
  - If you are a very experienced Python programmer, then you will not learn so much
  - If you can program, but in a different framework, then you will learn something new, and I will expect more from your code


What HG2051 isn't

- We won't be learning how to build cars
  - this is the prerequisite for further NLP courses
  - ... but we won't be writing taggers and parsers yet
- Just an introduction to Python
  - We will be motivated by NLP
- Very easy (this is a feature, not a bug)
  - but it is very fun


The Three Virtues of a Programmer

- Laziness: The quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful, and document what you wrote so you don't have to answer so many questions about it.
- Impatience: The anger you feel when the computer is being lazy. This makes you write programs that don't just react to your needs, but actually anticipate them. Or at least pretend to.
- Hubris: The quality that makes you write (and maintain) programs that other people won't want to say bad things about.

Larry Wall, Tom Christiansen, Randal L. Schwartz, and Stephen Potter (1996) Programming Perl 2nd Ed, O'Reilly.


Readings

- Core readings are from the NLTK book
- Any supplementary readings will be available online
- You must read the material before class
  - I will assume that you have done so
  - You get good at programming by programming; that is how we should spend our time


Acknowledgments

- Thanks to Graham Wilcock for the inspiration for this course, and permission to adapt his course notes.
- Likewise to Francis Bond
- Thanks to Steven Bird, Ewan Klein, and Edward Loper for releasing the NLTK
- Thanks to Guido van Rossum, Python Benevolent Dictator for Life (BDFL)
  - And to the Python community




Algorithmic Thinking

- Exercise: How to make kaya toast
- Also see: http://www.cookingforengineers.com/
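One way to picture the exercise is to write a recipe as an ordered list of steps, with repetition and a decision point. The sketch below is only an illustration. The function name `kaya_toast_steps`, its parameters, and the steps themselves are hypothetical; this is not the worked answer from class.

```python
# Hypothetical recipe steps, written only to illustrate what an algorithm
# looks like: a sequence, a loop, and a condition.
def kaya_toast_steps(slices=2, cut_in_half=True):
    """Return the ordered list of steps for making one kaya toast."""
    steps = []
    # Repetition: the same action is applied to every slice of bread.
    for number in range(1, slices + 1):
        steps.append(f"toast slice {number}")
    # Sequence: these steps only make sense in this order.
    steps.append("spread kaya on one slice")
    steps.append("lay cold butter on the kaya")
    steps.append("close the sandwich with the other slice")
    # Decision: an optional step that depends on a condition.
    if cut_in_half:
        steps.append("cut the sandwich in half")
    return steps


for position, step in enumerate(kaya_toast_steps(), start=1):
    print(position, step)
```

Changing the order of the steps or leaving out the loop gives a different (and probably worse) breakfast, which is the point of the exercise: a computer follows the steps exactly as written.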


The Python Language

- Very easy to learn
- Very powerful
- Very popular in academia, data science and machine learning, web applications, etc.
- But it's not very fast
  - but not a big issue in practice
- Python is a language, not a program
- I suggest you install the Anaconda distribution:
  - https://www.anaconda.com/distribution/
  - We will be using Python 3; make sure you're using the right version
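To confirm which version is installed, the interpreter can report it itself. This is a minimal sketch using the standard `sys` module; the helper name `is_python3` is hypothetical and not part of the course materials.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version is Python 3 or newer."""
    # The first element of the version tuple is the major version number.
    return version_info[0] >= 3


print(sys.version)
print("Python 3 or newer:", is_python3())
```

If the second line printed says `False`, the interpreter being run is an older Python 2 installation rather than the one the course expects.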


Using Python

- Python can be used interactively, e.g., using IDLE:

    >>> x = 1
    >>> y = x + 2
    >>> print(y)
    3

- Or you can execute static Python code:

    python myprogram.py

- I recommend using a Jupyter Notebook (demo)
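For comparison with the interactive session, here is a sketch of what a file such as myprogram.py could contain. The slides do not show the file, so its contents and the helper name `add_two` are assumptions made for illustration.

```python
# Hypothetical contents of myprogram.py: the same calculation as the
# interactive session, stored in a file so it can be run again later.
def add_two(x):
    """Return x plus 2, as in the interactive example."""
    return x + 2


if __name__ == "__main__":
    # This part runs when the file is executed with: python myprogram.py
    x = 1
    y = add_two(x)
    print(y)  # prints 3
```

Unlike the interactive prompt, a script shows nothing unless it calls `print`, so the result has to be printed explicitly.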