Natural Language Processing and its Applications
Dr. Samir Rustamov, Assistant Professor, School of IT & Engineering, ADA University
What is Natural Language Processing (NLP)?
• Natural Language Processing (NLP) is a field of artificial intelligence that enables computers to interact with humans in natural language.
Ultimate goal: natural human-to-computer communication
BİL711 Natural Language Processing 3
[Figure: Turing Test setup with a Computer, a Human, and a Human Judge]
• The human judge asks teletyped questions to both the computer and the human.
• The computer's job is to act like a human.
• The human's job is to convince the judge that he or she is not a machine.
• The computer is judged “intelligent” if it can fool the judge.
• The judgment of intelligence is linked to how appropriate the system's answers to the questions are.
The Turing Test (“Can machines think?” A. M. Turing, 1950)
Natural Language Processing Market to Reach $22.3 Billion by 2025
Forms of Natural Language
• The input/output of an NLP system can be:
  • written text
  • speech
• To process written text, we need:
  • lexical, syntactic, and semantic knowledge about the language
  • discourse information and real-world knowledge
• To process spoken language, we need everything required to process written text, plus the ability to handle speech recognition and speech synthesis.
Components of NLP
Natural Language Understanding (NLU): mapping the given natural-language input into a useful representation.
Natural Language Generation (NLG): producing natural-language output from some internal representation.
NL understanding is much harder than NL generation, but both are still hard.
Why is NL understanding hard?
• Natural language is extremely rich in form and structure, and very ambiguous.
• One input can mean many different things. Ambiguity can occur at different levels:
  • Lexical (word-level) ambiguity: different meanings of words
  • Syntactic ambiguity: different ways to parse the sentence
  • Interpreting partial information: how to interpret pronouns
• Many inputs can mean the same thing.
• The interaction among components of the input is not clear.
Example of ambiguity
• Some interpretations of “Adamı gördüm.”:
  1. I saw the man.
  2. I saw Adam.
  3. I saw my island.
  4. I visited my island.
  5. I saw my ADA.
  6. I visited my ADA.
  7. I bribed the man.
• Semantic ambiguity of gör:
  • gör: to see
  • gör: to visit
  • gör: to bribe
Resolving Ambiguities
• Lexical disambiguation: part-of-speech disambiguation and word-sense disambiguation are two important kinds of lexical disambiguation.
• Syntactic ambiguity: can be addressed by probabilistic parsing.
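As a rough illustration of probabilistic parsing, the sketch below runs a Viterbi version of the CKY algorithm over a toy probabilistic context-free grammar (PCFG) for the classic English example “I saw the man with the telescope”. The grammar, its rule probabilities, and the names `BINARY`, `LEXICAL`, and `viterbi_cky` are hypothetical and chosen only for illustration. The grammar is in Chomsky normal form, so every rule has either two nonterminals or a single word on its right-hand side.

```python
# Hypothetical toy PCFG in Chomsky normal form; all probabilities are illustrative.
# Binary rules: (parent, left child, right child, probability)
BINARY = [
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.6),
    ("VP", "VP", "PP", 0.4),
    ("NP", "NP", "PP", 0.2),
    ("NP", "Det", "N", 0.5),
    ("PP", "P", "NP", 1.0),
]
# Lexical rules: word -> [(preterminal, probability)]
LEXICAL = {
    "I": [("NP", 0.3)],
    "saw": [("V", 1.0)],
    "the": [("Det", 1.0)],
    "man": [("N", 0.5)],
    "telescope": [("N", 0.5)],
    "with": [("P", 1.0)],
}


def viterbi_cky(words):
    """Return (probability, tree) of the most probable S parse, or None."""
    n = len(words)
    # best[i][j] maps a nonterminal to (probability, tree) for the span words[i:j]
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag, p in LEXICAL.get(word, []):
            best[i][i + 1][tag] = (p, (tag, word))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = best[i][j]
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right, p in BINARY:
                    if left in best[i][k] and right in best[k][j]:
                        left_p, left_tree = best[i][k][left]
                        right_p, right_tree = best[k][j][right]
                        prob = p * left_p * right_p
                        # keep only the highest-probability analysis per nonterminal
                        if prob > cell.get(parent, (0.0, None))[0]:
                            cell[parent] = (prob, (parent, left_tree, right_tree))
    return best[0][n].get("S")


prob, tree = viterbi_cky("I saw the man with the telescope".split())
print(prob)
print(tree)
```

With these made-up probabilities the parser prefers attaching “with the telescope” to the verb phrase (probability about 0.0045) over attaching it to “the man” (about 0.00225). Different rule probabilities could flip that choice, which is how a PCFG turns a syntactic ambiguity into a ranking of parses.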
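What is POS tagging? Why is it important?
“gül”: noun (“rose”) or verb (“laugh!”)?
Each word has a part-of-speech (POS) tag that describes its grammatical category. POS taggers try to find the correct POS tag for each word in its context.
Why does it matter? Applications:
• Machine translation: “Daşı daşı” (“Carry the stone”), where the first “daşı” is a noun and the second is a verb.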
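One common way to build a POS tagger is a hidden Markov model (HMM) decoded with the Viterbi algorithm. The minimal sketch below assumes just two tags and invents all start, transition, and emission probabilities (`START`, `TRANS`, and `EMIT` are hypothetical), so it only shows the mechanics: both occurrences of “daşı” have the same emission probability, so the tag sequence is decided by the context that the transition probabilities encode.

```python
# Hypothetical two-tag HMM; every probability below is invented for illustration.
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.6, "VERB": 0.4}                      # P(first tag)
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},            # P(next tag | previous tag)
         "VERB": {"NOUN": 0.5, "VERB": 0.5}}
EMIT = {"NOUN": {"daşı": 0.5}, "VERB": {"daşı": 0.5}}   # P(word | tag)


def viterbi(words):
    """Return the most probable tag sequence for words under the toy HMM."""
    # Each column maps tag -> (best path probability ending in tag, previous tag)
    columns = [{tag: (START[tag] * EMIT[tag].get(words[0], 0.0), None) for tag in TAGS}]
    for word in words[1:]:
        prev = columns[-1]
        column = {}
        for tag in TAGS:
            # choose the predecessor tag that maximizes the path probability
            best_prev = max(TAGS, key=lambda p: prev[p][0] * TRANS[p][tag])
            prob = prev[best_prev][0] * TRANS[best_prev][tag] * EMIT[tag].get(word, 0.0)
            column[tag] = (prob, best_prev)
        columns.append(column)
    # backtrack from the best final tag through the stored predecessors
    tag = max(TAGS, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["daşı", "daşı"]))
```

Under these toy numbers, `viterbi(["daşı", "daşı"])` returns `["NOUN", "VERB"]`. A real tagger would estimate the probabilities from an annotated corpus and use far more tags.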
Language Processing
• Phonetics & Phonology (speech sounds)
• Morphology, Lexicon (words & their forms)
• Syntax, Parsing (structure of sentences)
• Semantics (meaning of sentences)
• Pragmatics (meaning in context & for a purpose)
• Discourse (connected sentence processing in a larger body of text)
Spelling correction
Applications for spelling correction:
• Web search
• Phones
• Word processing
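f( - Cin seddi dunyanın yedi mocusəsindən birdir. - Niye? - Cunki duzəldikleti en uzun omurlu sheydi. ) =
- Çin səddi dünyanın yeddi möcüzəsindən biridir. - Niyə? - Çünki düzəltdikləri ən uzun ömürlü şeydir.
(English: “The Great Wall of China is one of the seven wonders of the world.” “Why?” “Because it is the longest-lasting thing they built.”)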
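A very simple spelling corrector can replace each unknown word with the closest dictionary word by edit (Levenshtein) distance, breaking ties by word frequency. The sketch below assumes a tiny hypothetical vocabulary `VOCAB` with made-up frequencies and a hypothetical `max_distance` cutoff; real correctors also use context (a language model), which this sketch ignores.

```python
# Hypothetical mini-dictionary with made-up word frequencies.
VOCAB = {
    "çin": 50, "səddi": 5, "dünyanın": 40, "yeddi": 30, "möcüzəsindən": 3,
    "biridir": 20, "niyə": 25, "çünki": 60, "ən": 80, "uzun": 35,
    "ömürlü": 4, "şeydir": 10,
}


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def correct(word: str, vocab=VOCAB, max_distance: int = 2) -> str:
    """Return the closest vocabulary word, preferring frequent words on ties."""
    w = word.lower()
    if w in vocab:
        return w
    distance, _, best = min((levenshtein(w, v), -freq, v) for v, freq in vocab.items())
    # words too far from every dictionary entry are left unchanged
    return best if distance <= max_distance else w


print([correct(w) for w in ["Cunki", "dunyanın", "yedi", "niye", "omurlu"]])
```

With the default `max_distance=2`, “omurlu” (three substitutions away from “ömürlü”) is left unchanged, which is one reason practical systems treat diacritic-only differences as cheaper edits.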
Machine translation
Original text:
When the space organization NASA first started sending up astronaunts they discovered ballpoint pens would not work in zero gravity. To solve the problem, NASA scientists spent ten years and 12 billion to develop a pen that would write in zero gravity, upside down, underwater, on all types of surface, and at temperatures ranging from below freezing to 300C. Russians used a pencil.
Google:
NASA kosmik təşkilatı ilk dəfə astronavtların göndərilməsinə başladıqları zaman, ballpoint qələmləri sıfır çəkisi ilə işləməyəcəkdi. Problemi həll etmək üçün, NASA alimləri sıfır ağırlıq, baş aşağı, su altında, bütün səthlərdə və aşağıda dondurmadan 300 dərəcə qədər dəyişən temperaturda yazacaq bir qələm hazırlamaq üçün on il və 12 milyard dollar sərf etmişdir. Ruslar bir qələm istifadə edirdi.
Dilmanc:
Kosmos təşkilatı NASA ilk astronaunts qaldıraraq başlayanda onlar kəşf diyircəkli qələmlər çəkisizlik şəraitində işləməyəcək. Problemin həlli, NASA alimləri çəkisizlik yazacaq qələm inkişaf etdirmək, on il və 12 milyard xərcləyib, tərsinə, su altında, bütün səthinin növləri, 300c-a şaxta tutmuş və temperaturda . karandaş istifadə olunan Ruslar.
Why is machine translation hard?
• It requires both understanding the “from” language and generating the “to” language.
• How can we teach a computer a “second language” when it doesn't even really have a first language?
• Can we do machine translation without first solving natural language understanding and natural language generation?
Text Classification
• Assigning subject categories, topics, or genres
• Spam detection
• Authorship identification
• Age/gender identification
• Language identification
• Sentiment analysis
• …
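For tasks such as spam detection, a standard baseline is a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. The sketch below assumes whitespace tokenization and a four-document training set `TRAIN` that is purely hypothetical; `classify` picks the label that maximizes log P(label) + Σ log P(word | label), skipping words never seen in training.

```python
import math
from collections import Counter

# Hypothetical toy training set: (text, label)
TRAIN = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting schedule for monday", "ham"),
    ("lecture notes for monday", "ham"),
]


def train(docs):
    """Estimate class priors and per-class word counts from labeled texts."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    priors = {label: count / len(docs) for label, count in label_counts.items()}
    return priors, word_counts, vocab


def classify(text, priors, word_counts, vocab):
    """Return the label with the highest log posterior (up to a constant)."""
    scores = {}
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # add-one (Laplace) smoothed estimate of P(word | label)
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(classify("money offer now", *model))
print(classify("notes for monday", *model))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied; the same structure applies to topic, language, or sentiment labels.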
Information Retrieval
• Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources.
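A classic way to rank documents against a query is the vector space model: represent each document and the query as TF-IDF weighted term vectors and sort documents by cosine similarity. The sketch below is a minimal version of that idea under simplifying assumptions (whitespace tokenization, raw term counts, idf = log(N / df)); the example collection `DOCS` and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical document collection
DOCS = [
    "spelling correction for web search",
    "machine translation between languages",
    "web search engines rank documents",
]


def tf_idf_vectors(docs):
    """Return one {term: tf * idf} dict per document, plus the idf table."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs):
    """Return indices of documents with nonzero similarity, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    # query terms absent from the collection get weight 0
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [i for score, i in scored if score > 0]


print(search("rank web documents", DOCS))
```

Here `search("rank web documents", DOCS)` returns `[2, 0]`: the third document shares all three query terms, the first shares only “web”, and the second shares none, so it is dropped.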
Text Summarization
• Goal: produce an abridged version of a text that contains the information that is important or relevant to a user.
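One simple approach, extractive summarization, selects the most representative sentences instead of writing new text. The sketch below swaps in a basic frequency heuristic (each sentence is scored by the average whole-text frequency of its non-stopword words); the stopword list `STOPWORDS` and the example `TEXT` are hypothetical, and practical systems use much stronger models.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it", "that", "for", "on"}


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]


def summarize(text, num_sentences=1):
    """Pick the sentences whose content words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    # emit the chosen sentences in their original order
    return " ".join(s for s in sentences if s in top)


TEXT = ("NLP systems process text. Text classification assigns labels to text. "
        "The weather was nice today.")
print(summarize(TEXT))
```

For this example the one-sentence summary is “Text classification assigns labels to text.”, because “text” is the most frequent content word in the passage.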
Dialogue Systems
• A dialogue system, or conversational agent (CA), is a computer system intended to converse with a human using a coherent structure.
Question Answering
Deep Learning Algorithms and Their NLP Usage
Feedforward Neural Network (NN)
• Part-of-speech tagging
• Tokenization
• Named entity recognition
• Intent extraction
Recurrent Neural Networks (RNN)
• Machine translation
• Question answering systems
• Image captioning
Recursive Neural Networks
• Parsing sentences
• Sentiment analysis
• Paraphrase detection
• Relation classification
• Object detection
Convolutional Neural Network (CNN)
• Sentence/text classification
• Relation extraction and classification
• Spam detection
• Categorization of search queries
• Semantic relation extraction
References:
1. Navdeep Singh Gill. Overview of Artificial Intelligence and Natural Language Processing. https://www.upwork.com/hiring/for-clients/artificial-intelligence-and-natural-language-processing-in-big-data/
2. Natural Language Processing Market to Reach $22.3 Billion by 2025. https://www.tractica.com/newsroom/press-releases/natural-language-processing-market-to-reach-22-3-billion-by-2025/
3. Dan Jurafsky. Natural Language Processing Lectures.
4. BİL711 Natural Language Processing. Prof. Dr. İlyas Çiçekli.