Introduction to Natural Language Processing and Speech

Welcome Introduction and Overview

Computer Science Research Practicum
Fall 2012
Andrew Rosenberg

Artificial Intelligence
AI is no longer a single subdiscipline in computer science:
  Natural Language Processing
  Speech/Spoken Language Processing
  Robotics
  Logic/Planning
  Cognitive Radio
  Machine Learning

Artificial Intelligence
What is intelligence?
How does computer science make intelligent tools, systems, algorithms?
Does computer science theory contribute to the definition of intelligence?

Language and Speech
What is the relationship between language and intelligence/thought/cognition?
  Whorf-Sapir hypothesis
  Pinker: universalist
  Fodor: holism

Language and Speech
Most people consider language to be the most direct access to cognition and thought.
Language is core to Artificial Intelligence.

Natural Language Processing
  Information Retrieval (search)
  Information Extraction
  Knowledge Base Population
  Summarization
  Question Answering
  Named Entity Recognition
  Named Entity Linking, Co-reference resolution
  Parsing
  Sentiment Analysis

Information Retrieval
Input: Query
Output: Relevant Documents

Simplest approach: Identify every document that contains the word or words in the query.
What about related words?
  "run" is related to "running", "runs" and "marathon".
How do you rank for relevance?
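
The simplest approach above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy documents, the names tokenize, build_index and search, and the choice to rank by how often the query words occur are all invented here, not taken from the course. Related words are deliberately not handled.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very crude tokenizer)."""
    return text.lower().split()

def build_index(docs):
    """Build an inverted index: word -> Counter of {doc_id: occurrences}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word][doc_id] += 1
    return index

def search(index, query):
    """Return ids of documents containing every query word, ranked by the
    total number of query-word occurrences (an assumed relevance score)."""
    words = tokenize(query)
    if not words:
        return []
    # Boolean AND: keep only documents that contain all query words.
    matches = set(index.get(words[0], {}))
    for word in words[1:]:
        matches &= set(index.get(word, {}))
    scores = {d: sum(index[w][d] for w in words) for d in matches}
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical toy collection, invented for illustration.
docs = {
    "d1": "she runs every morning",
    "d2": "the marathon run was a long run",
    "d3": "a run in the park",
}
index = build_index(docs)
```

With this collection, search(index, "run") finds d2 and d3 but misses d1, which only contains "runs". That is the related-words problem raised above; stemming or query expansion would be one way to address it.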

Information Extraction
Identify specific information from a single document or set of documents.
  Who works for what organization?
  Who was born when? Died when?
  Who did what to whom?
This is *very* complex.
Domain-specific systems are developed.
How many different ways are there to say the same thing?
  Obama is the President of the USA.

Named Entity Recognition and Linking
Bo Obama is Fat. POTUS says so.
The President called his dog fat. Mr. Obama, speaking to an interviewer, said that The White House dog needs to go on a diet.
How do you recognize that Bo Obama, POTUS, The President, Mr. Obama, The White House are all ENTITIES?
How do you recognize that POTUS, The President, Mr. Obama, his all refer to the same person?

Parsing
Understanding grammatical structure from text.
Important step in some relation extraction, question answering, etc.

Sentiment Analysis
Can you tell the difference between a positive review and a negative one?
Some reviews come with labels.
Some labels have no reviews.
Some reviews have no stars.
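
One way to explore the question above is to train a classifier on the reviews that do come with labels. The sketch below uses a Naive Bayes model with add-one smoothing. That technique is a choice made here for illustration, not something the slides prescribe, and the four reviews and the names train and classify are hypothetical.

```python
import math
from collections import Counter

def train(labeled_reviews):
    """Collect word counts per label from (text, label) pairs."""
    word_counts = {}          # label -> Counter of word occurrences
    label_counts = Counter()  # label -> number of reviews
    for text, label in labeled_reviews:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_reviews = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        counts = word_counts[label]
        total_words = sum(counts.values())
        # Log prior, plus add-one smoothed log likelihood of each known word.
        score = math.log(label_counts[label] / total_reviews)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled reviews, invented for illustration.
reviews = [
    ("great food and friendly staff", "positive"),
    ("loved it great value", "positive"),
    ("terrible service and cold food", "negative"),
    ("awful experience never again", "negative"),
]
model = train(reviews)
```

Calling classify(model, "great staff") picks the positive label here, but a model trained on four reviews says nothing about real accuracy; that would need a held-out test set or cross-validation.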

Spoken Language Processing
  Automatic Speech Recognition
  Rich Transcription
  Speaker Recognition
  Speech Synthesis
  Text Normalization
  Discourse and Dialog
  Turn taking
  Emotion Recognition

Speech Recognition
Converting speech to text.
  Acoustic Modeling: Speech to Phoneme
  Pronunciation Modeling: How are words pronounced?
  Language Modeling: What sequences of words are most common?

Rich Transcription
ALSO FROM NORTH STATION I THINK THE ORANGE LINE RUNS BY THERE TOO SO YOU CAN ALSO CATCH THE ORANGE LINE AND THEN INSTEAD OF TRANSFERRING UM I YOU KNOW THE MAP IS REALLY OBVIOUS ABOUT THIS BUT INSTEAD OF TRANSFERRING AT PARK STREET YOU CAN TRANSFER AT UH WHATS THE STATION NAME DOWNTOWN CROSSING UM AND THATLL GET YOU BACK TO THE RED LINE JUST AS EASILY

This is how much of spoken language processing and NLP treats speech.

There are transcription errors. There is no punctuation. There is no segmentation.

There are grammatical issues. Word choice is different, structure is different.

Rich Transcription
Also, from the North Station...

(I think the Orange Line runs by there too so you can also catch the Orange Line... )

And then instead of transferring

(um I- you know, the map is really obvious about this but)

Instead of transferring at Park Street, you can transfer at (uh what's the station name) Downtown Crossing and (um) that'll get you back to the Red Line just as easily.

Speaker/Author Recognition
What makes one speaker or author distinguishable from another?
  Email hacks, Chat transcripts, Anonymous authors.
What are the acoustics which distinguish across two speakers?
  Spectral Qualities
  Prosodic Qualities
  Lexical, syntactic and content usage

Speech Synthesis
Generating Speech from Text
There are tools like Festival, HTS and Mary TTS that make this relatively easy.
Unit Selection
  Use a corpus of a single speaker and paste together small slices of speech to make new words.
  Watson: http://www.youtube.com/watch?v=WFR3lOm_xhE
Parametric Synthesis
  Learn the spectral shape of different speech sounds, and synthesize them from oscillators and additive noise.
  Mary TTS Web client: http://mary.dfki.de:59125/

Discourse and Dialog
How do you accomplish some task through discourse?
  Understanding the semantics of a user turn
  Generating an appropriate prompt
  Dialog/Task planning.
  Semantic Frame filling.

Emotion Recognition
What are the acoustic properties of emotion expression?
  Loudness, speaking rate, pitch, hesitation, etc.
This type of analysis can extend to other speaker states:
  Intoxication
  Sleepiness
  Age
  Gender
  Personality Factors
  Deception

Three Hundred Twelve.

Three Thousand Twelve.

Corpus Analysis
A corpus is a body of linguistic material.
Corpora (plural of corpus) are generally shared across research groups.
  Allow for reproducible findings
  Division of Labor
Describing phenomena is an important first step in most research.
  What is the distribution of ratings?
  What are the correlations between features and labels?
  Are there errors in the annotation?

Some famous corpora
  Penn Treebank: Parse trees and part of speech
  ACE and KBP: Information Extraction
  Switchboard: Conversational telephone speech
  TIMIT: Phonetic Transcription
  Boston Radio News Corpus: Prosodic Annotation

The standard approach
Identify labeled training data
  Decide what to label. What is a data point?
Extract features based on the entity
Train a supervised classifier
  Machine Learning
Evaluate
  Cross-validation or a held-out test set.

How does machine learning fit in?
Automatically identifying patterns in data
Automatically making decisions based on data
Hypothesis:
  Data, Learning Algorithm, Behavior
  Data, Programmer or Expert, Behavior

Challenges
  Conversational text
  Social Media: Facebook, Twitter, reddit
  Email
  Chat/IM
  Spoken Dialog Systems
  Text Dialog Systems
  Sentiment Analysis
  Reviews
  Collaborative Filtering
  Natural Language Generation

Publicly available web-data
Social Media: Twitter, Google Plus, forums, etc.
Reviews: Amazon, TripAdvisor, etc.
Wikipedia.
  Find missing links in Wikipedia
  Find potentially incorrect information in Wikipedia
YouTube videos, SoundCloud songs.
  Can you classify topics? Music genres?

Use of web technologies
The feedback loop.
The use of the tool provides information that can be used to improve the tool.

The use of the product provides training data.
  Which search results are best.
  Which ads are useful.
  Which recommendations are correct.

Feedback in Google
Rank the top hits in response to a query.
When someone clicks on a link, boost its ranking/relevance.
Same for ads.
UI/UX experiments.
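
The click-feedback idea can be sketched as a toy model. Everything here is assumed for illustration: the class name ClickRanker, the fixed per-click boost, and the example URLs and scores. It is not a description of how any real search engine ranks results.

```python
from collections import Counter

class ClickRanker:
    """Rank results by a base relevance score plus a boost learned from clicks."""

    def __init__(self, boost=0.1):
        self.boost = boost       # assumed score increment per recorded click
        self.clicks = Counter()  # (query, url) -> number of clicks

    def record_click(self, query, url):
        """Feedback step: a user clicked `url` after issuing `query`."""
        self.clicks[(query, url)] += 1

    def rank(self, query, base_scores):
        """base_scores maps url -> initial relevance; returns urls, best first."""
        def score(url):
            return base_scores[url] + self.boost * self.clicks[(query, url)]
        # Highest score first; ties broken alphabetically by url.
        return sorted(base_scores, key=lambda u: (-score(u), u))

# Hypothetical usage with made-up URLs and scores.
ranker = ClickRanker(boost=0.1)
base = {"a.example": 0.5, "b.example": 0.4}
before = ranker.rank("nlp", base)
ranker.record_click("nlp", "b.example")
ranker.record_click("nlp", "b.example")
after = ranker.rank("nlp", base)
```

In this run, two clicks are enough to move b.example above a.example for the query "nlp", while rankings for other queries are unchanged. A fixed additive boost is the simplest possible choice; it ignores issues such as users clicking the top result simply because it is on top.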

Feedback in Amazon
Try to give users an offer.
If they take it, increase its value.

Feedback in Netflix
Suggestions for people like you.
How do you group people?
How do you group movies?
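
One possible answer to "how do you group people" is to compare users by the ratings they have given. The sketch below uses cosine similarity over rating vectors and suggests movies from the single most similar user. This is an assumed, minimal form of collaborative filtering, not Netflix's method, and the users, movies, ratings and function names are all made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating vectors (movie -> rating)."""
    shared = set(a) & set(b)
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar_user(ratings, user):
    """Return the other user whose ratings are closest to `user`'s."""
    others = [u for u in ratings if u != user]
    return max(others, key=lambda u: cosine_similarity(ratings[user], ratings[u]))

def suggest(ratings, user):
    """Suggest movies the most similar user rated that `user` has not rated."""
    neighbor = most_similar_user(ratings, user)
    unseen = {m: r for m, r in ratings[neighbor].items() if m not in ratings[user]}
    # Highest neighbor rating first; ties broken by title.
    return sorted(unseen, key=lambda m: (-unseen[m], m))

# Hypothetical ratings on a 1-5 scale, invented for illustration.
ratings = {
    "ann": {"Alien": 5, "Blade Runner": 4},
    "bob": {"Alien": 5, "Blade Runner": 5, "Solaris": 4},
    "cat": {"Amelie": 5, "Alien": 1},
}
```

Here ann's ratings are closest to bob's, so suggest(ratings, "ann") offers the one movie bob rated that ann has not. Grouping movies could be approached the same way by swapping the roles of users and movies.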

Project ideas
Look at the most recent conferences in NLP and Speech
  ICASSP, Interspeech, ASRU
  ACL, EMNLP, NAACL-HLT, COLING
Also, Journals
  Computational Linguistics
  Computer Speech and Language
  IEEE Transactions on Audio, Speech, and Language Processing
Consider real-world problems and applications