CSC 9010: Special Topics, Natural Language Processing. Spring, 2005. Matuszek & Papalaskari 1
WORDS Lab
CSC 9010: Special Topics. Natural Language Processing.
Paula Matuszek, Mary-Angela Papalaskari
Spring, 2005
Examples taken from the Bird, Klein and Loper: NLTK Tutorial, Tagging, nltk.sourceforge.net/tutorial/tagging/index.html
Words, Words, Words
• So far we have covered methods that largely operate on tokens.
  – Tokenizing text
  – Stemming words and determining lemmas
  – POS-tagging
  – Language models based on n-gram frequencies
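As a reminder of how token-level these methods are, here is a minimal sketch of two of them, tokenizing and n-gram frequency counting, using only the Python standard library. The function names, the regular-expression tokenizer, and the sample sentence are assumptions made for illustration; they are not taken from the NLTK tutorial, and stemming and POS-tagging are left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n=2):
    """Count every run of n consecutive tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def bigram_probability(tokens, first, second):
    """Maximum-likelihood estimate of P(second | first) from raw counts."""
    # Count `first` only where it starts a bigram, so the estimates for a
    # given `first` sum to 1.
    starts = Counter(tokens[:-1])
    if starts[first] == 0:
        return 0.0
    return ngram_counts(tokens, 2)[(first, second)] / starts[first]


# Illustrative sample text (an assumption, not course data).
sample = "Words, words, words. So far we have covered words and tokens."
tokens = tokenize(sample)
print(tokens[:4])
print(ngram_counts(tokens).most_common(1))
print(bigram_probability(tokens, "words", "words"))
```

Nothing here knows what a word means or how a sentence is structured; it only counts strings, which is the point the next slide makes.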
Every time I fire a linguist, my performance goes up¹
• None of this has much of what could be considered "linguistic" knowledge or "understanding".
  – No parsing
  – Not much domain knowledge or "meaning"
• For the next two sections of the course we will talk extensively about syntax and semantics.
1. Hirschberg, Julia. 1998. "Every time I fire a linguist, my performance goes up," and other myths of the statistical natural language processing revolution. Invited talk, Fifteenth National Conference on Artificial Intelligence (AAAI-98).
What's In a Word?
• For this lab, we will focus on some of the things that can be done by applying the techniques we have already studied.
• Format will be
  – Try a demo
  – Discuss what techniques were needed to implement it
  – Discuss some of what would be needed to improve it
Gender Genie
• www.bookblog.net/gender/genie.html
• Techniques:
• How good is it? What might improve it?
• Reference: www.cs.biu.ac.il/~koppel/papers/male-female-text-final.pdf
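As a starting point for the "Techniques" discussion, one way to build a demo of this kind from what we have covered so far is to tokenize the text and sum a weight for each token. The sketch below assumes that design; it is not taken from the Gender Genie site or from the referenced paper, and the word list, the weights, and the class labels are all invented for illustration.

```python
import re

# HYPOTHETICAL weights, invented for illustration only. Positive values
# lean toward the first label and negative values toward the second.
HYPOTHETICAL_WEIGHTS = {
    "the": 2.0, "a": 1.5, "of": 1.0,
    "she": -2.0, "with": -1.5, "not": -1.0,
}


def score_text(text, weights=HYPOTHETICAL_WEIGHTS):
    """Tokenize the text and sum the weight of every token (0 if unlisted)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(weights.get(token, 0.0) for token in tokens)


def classify(text, weights=HYPOTHETICAL_WEIGHTS, labels=("class A", "class B")):
    """Return the first label for a positive score, otherwise the second."""
    return labels[0] if score_text(text, weights) > 0 else labels[1]


print(score_text("The cat of a house"))
print(classify("She is not with us"))
```

A classifier like this sees only word counts, so questions about how good it is and what might improve it come down to which features are used and how the weights are learned.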
Pearson Knowledge Technologies Text Classification Demo
• www.k-a-t.com:8080/classify/
• Techniques:
• How good is it? What might improve it?
• Reference: www.k-a-t.com/publications.shtml
Google Sets
• labs.google.com/sets
• Techniques:
• How good is it? What might improve it?
• Reference: if you find one, let me know. Possibly something like this: www.arxiv.org/pdf/cs.CL/0412098
AT&T Text to Speech
• www.research.att.com/projects/tts/demo.html
• Techniques:
• How good is it? What might improve it?
• Reference: www.research.att.com/projects/tts/pubs.html