10
CS 4705 CS4705 Natural Language Processing

CS 4705 Natural Language Processing What is Natural Language Processing? The study of human languages and how they can be represented computationally

  • View
    223

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CS 4705 Natural Language Processing What is Natural Language Processing? The study of human languages and how they can be represented computationally

CS 4705

CS4705

Natural Language Processing

Page 2: CS 4705 Natural Language Processing What is Natural Language Processing? The study of human languages and how they can be represented computationally

What is Natural Language Processing?

• The study of human languages and how they can be represented computationally and analyzed, recognized, and generated algorithmically– The cat is on the mat. --> on (mat, cat)

– on (mat, cat) --> The cat is on the mat

• Studying NLP involves studying natural language, formal representations, and algorithms for their manipulation

Page 3: CS 4705 Natural Language Processing What is Natural Language Processing? The study of human languages and how they can be represented computationally

What must we know about human language to do NLP?

• Morphology: words and their composition– cat, cats, dogs– child, children– undo, union

• Phonetics and Phonology: speech sounds, their production, and the rule systems that govern their use– tap, butter – nice white rice; height/hot; kite/cot; night/not...– city hall, parking lot, city hall parking lot– The cat is on the mat. The cat is on the mat?

Page 4: CS 4705 Natural Language Processing What is Natural Language Processing? The study of human languages and how they can be represented computationally

• Syntax: the structuring of words into larger phrases– John hit Bill

– Bill was hit by John (passive)

– Bill, John hit (preposing)

– Who John hit was Bill (wh-cleft)

• Semantics: the (truth-functional) meaning of words and phrases– gun(x) & holster(y) & in(x,y)

– fake (gun (x)) (compositional semantics)

– The king of France is bald (presupposition violation)

– bass fishing, bass playing (word sense disambiguation)

Page 5: CS 4705 Natural Language Processing What is Natural Language Processing? The study of human languages and how they can be represented computationally

• Pragmatics and Discourse: the meaning of words and phrases in context– George got married and had a baby.

– George had a baby and got married.

– Some people left early.

– Prosodic Variation

• German teachers

• Bill doesn’t drink because he’s unhappy.

• John only introduced Mary to Sue.

• John called Bill a Republican and then he insulted him.

• John likes his mother, and so does Bill.

Page 6: CS 4705 Natural Language Processing What is Natural Language Processing? The study of human languages and how they can be represented computationally

Bureaucracy

• Instructor: Julia Hirschberg – ([email protected])

– Office and hours: CEPSR 705, TBA

• Teaching Assistant: Sameer Maskey– ([email protected])

– Office and hours: CEPSR 720, TBA

• Syllabus available at http://www1.cs.columbia.edu/~julia/cs4705/syllabus.html

Page 7: CS 4705 Natural Language Processing What is Natural Language Processing? The study of human languages and how they can be represented computationally

• Text: Daniel Jurafsky and James H. Martin, Speech and Language Processing, Prentice-Hall, 2000 (available at Morningside Bookstore) Note errata available on website; check before reading

each chapter please

• Assignments: 3 homework assignments, midterm, final (1 extra homework for graduate students)– Evaluation: 40% homework + 40% exams + 20% class

participation

Page 8: CS 4705 Natural Language Processing What is Natural Language Processing? The study of human languages and how they can be represented computationally

Academic Integrity

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is forbidden, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you are going to have trouble completing an assignment, talk to the instructor or TA in advance of the due date please. Everyone: Read/write protect your homework files at all times.

Page 9: CS 4705 Natural Language Processing What is Natural Language Processing? The study of human languages and how they can be represented computationally

NLP Applications

• Speech Synthesis, Speech Recognition, IVR Systems (TOOT: more or less succeeds)

• Information Retrieval (SCANMail demo)• Information Extraction

– Question Answering (AQUA)

• Machine Translation (SYSTRAN)• Summarization (NewsBlaster)• Automated Psychotherapy (Eliza)

Page 10: CS 4705 Natural Language Processing What is Natural Language Processing? The study of human languages and how they can be represented computationally

For Next Class

• Read Chapters 1-2• For fun: Experiment with Eliza:

– Does she pass the Turing Test?

– What kind of input defeats her?

– How could you improve her ability to fool people into thinking she is human? Using FSAs alone?