Page 1: WORDS Lab

CSC 9010: Special Topics, Natural Language Processing. Spring, 2005. Matuszek & Papalaskari 1

WORDS Lab

CSC 9010: Special Topics. Natural Language Processing.

Paula Matuszek, Mary-Angela Papalaskari

Spring, 2005

Examples taken from the Bird, Klein and Loper: NLTK Tutorial, Tagging, nltk.sourceforge.net/tutorial/tagging/index.html

Page 2: WORDS Lab

Words, Words, Words

• So far we have covered methods that largely operate on tokens.
  – Tokenizing text
  – Stemming words and determining lemmas
  – POS-tagging
  – Language models based on n-gram frequencies
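
As a reminder of what token-level methods look like in practice, here is a minimal sketch of tokenizing followed by a bigram frequency model. It uses only the Python standard library rather than NLTK; the function names (tokenize, bigram_counts, bigram_probability) and the example sentence are made up for illustration, and the tokenizing rule is deliberately crude.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and pull out word tokens (a crude, illustrative rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def bigram_counts(tokens):
    """Count each pair of adjacent tokens."""
    return Counter(zip(tokens, tokens[1:]))

def bigram_probability(w1, w2, tokens):
    """Maximum-likelihood estimate of P(w2 | w1) with no smoothing."""
    bigrams = bigram_counts(tokens)
    # Number of bigrams that start with w1.
    starts_with_w1 = sum(c for (a, _), c in bigrams.items() if a == w1)
    return bigrams[(w1, w2)] / starts_with_w1 if starts_with_w1 else 0.0

# Hypothetical example text, not taken from the course materials.
tokens = tokenize("The cat sat on the mat. The cat slept.")
print(tokens)
print(bigram_counts(tokens).most_common(3))
print(bigram_probability("the", "cat", tokens))
```

Nothing here knows anything about syntax or meaning: the model only counts which tokens follow which.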

Page 3: WORDS Lab

Every time I fire a linguist, my performance goes up¹

• None of this involves much of what could be considered "linguistic" knowledge or "understanding".
  – No parsing
  – Not much domain knowledge or "meaning"

• For the next two sections of the course we will talk extensively about syntax and semantics.

1. Hirschberg, Julia. 1998. "Every time I fire a linguist, my performance goes up," and other myths of the statistical natural language processing revolution. Invited talk, Fifteenth National Conference on Artificial Intelligence (AAAI-98).

Page 4: WORDS Lab

What's In a Word?

• For this lab, we will focus on some of the things that can be done by applying the techniques we have already studied.

• Format will be
  – Try a demo
  – Discuss what techniques were needed to implement it
  – Discuss some of what would be needed to improve it

Page 6: WORDS Lab

Pearson Knowledge Technologies Text Classification Demo

• www.k-a-t.com:8080/classify/

• Techniques:

• How good is it? What might improve it?

• Reference: www.k-a-t.com/publications.shtml
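
As a starting point for the "Techniques" discussion, the sketch below shows how far token counts alone can go toward text classification. It is a toy naive Bayes classifier with add-one smoothing, written with the standard library only. It is an illustration under stated assumptions and makes no claim about how the Pearson demo is implemented; the labels, training sentences, and function names are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and pull out alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train(labeled_docs):
    """Collect per-label word counts, per-label document counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for label, text in labeled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability under naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        # Log prior for the label.
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data, invented for this example.
model = train([
    ("sports", "the team won the game"),
    ("sports", "a late goal won the match"),
    ("finance", "the bank raised interest rates"),
    ("finance", "stock prices fell as rates rose"),
])
print(classify("the team scored a goal", model))
print(classify("interest rates rose", model))
```

When trying the demo, it is worth asking which of its mistakes a pure word-count approach like this one would also make, and which would need stemming, tagging, or deeper linguistic knowledge to fix.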

Page 7: WORDS Lab

Google Sets

• labs.google.com/sets

• Techniques:

• How good is it? What might improve it?

• Reference: if you find one, let me know. Possibly something like this: www.arxiv.org/pdf/cs.CL/0412098

Page 8: WORDS Lab

AT&T Text to Speech

• www.research.att.com/projects/tts/demo.html

• Techniques:

• How good is it? What might improve it?

• Reference: www.research.att.com/projects/tts/pubs.html