Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Advanced Artificial Intelligence
Natural Language Processing
Chung-Ang University, Van Dat Tuong
Good afternoon everyone. Today I am delighted to be here to talk about Natural Language Processing (NLP), a
subfield of Artificial Intelligence.
Contents
• Introduction
• History
• Rule-based vs Statistical NLP
• Major Evaluations and tasks
• How NLP works
2Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
My talk is divided into 5 main parts: Introduction, History, Rule-based vs Statistical NLP, Major Evaluations and
tasks, and How NLP works
Introduction
People communicate in many different ways: through speaking and listening, making gestures, using specialized
hand signals (such as when driving or directing traffic), using sign languages for the deaf, or through various forms
of text.
3Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Introduction
By text we mean words that are written or printed on a flat surface (paper, card, street signs and so on) or displayed
on a screen or electronic devices in order to be read by their intended recipients (or by whoever happens to be
passing by).
4Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Introduction
This presentation will focus on concerning with the method in which computer systems can analyze and interpret
texts, and I will assume for convenience that these texts are presented in an electronic format (documents, images,
etc). This leads to the term, Natural Language Processing.
5Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Introduction
NLP is defined as a subfield of computer science, information engineering, and artificial intelligence concerned
with the interactions between computers and human (natural) languages, in particular how to program computers
to process and analyze large amounts of natural language data.
6Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Introduction
Let's use an example to show just how powerful NLP is used in a practical situation: When you're typing on google
search, like many of us do every day, you'll see word suggestions based on what you typed and what you're
currently typing. That's NLP in action.
7Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Introduction
As a human, you may speak and write in English, Korean, or Chinese. But a computer’s native language – known as
machine code or machine language, is largely incomprehensible to most people. At the device’s lowest levels,
communication occurs not with words but through millions of '0’ and '1’ (binary form) that produce logical actions.
8Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
The real reason why NLP is hard
The process of reading and understanding language is complex than it seems at the first glance. There are many
things that go into truly understanding what a piece of text means in the real-world. For example, what do you
think the above piece of text means?
9Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
The real reason why NLP is hard
To a human it’s probably quite obvious what this sentence means. We know Steph Curry is a basketball player; or
even if not, we know that he plays on some kind of team, probably a sport team. When we see “on fire” and
“destroyed” we know that it means Steph Curry played really well last night and beat the other team.
10Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
The real reason why NLP is hard
On the other way, computer tends to take things a bit too literally. Viewing things literally like a computer, we would
see “Steph Curry” based on the capitalization, assume that’s a person, place, or otherwise important thing which is
great! But then we see that Steph Curry “was on fire”. A computer might tell you that someone literally lit Steph
Curry onfire yesterday! … yikes. After that, the computer might say that he has physically destroyed the other
team…. and they no longer exist according to this computer… great…
11Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Fortunately, not all is grim! Thanks to Machine Learning computer systems can actually do some really clever
things to quickly extract and understand information from natural language!
12Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Sol.
History
The history of NLP generally started in 1950s, although work can be found from earlier periods. In 1950, Alan
Turing published an article titled "Intelligence" which proposed what is now called the Turing test as a
criterion of intelligence.
13Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
History
In 1954, Georgetown experiment involved fully automatic translation of more than sixty Russian sentences
into English. The authors claimed that within three or five years then, machine translation would be a solved
problem, however, real progress was much slower.
14Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
History
In 1960s, some notably successful NLP systems were developed, such as SHRDLU - a natural language
system working in restricted "blocks worlds" with restricted vocabularies.
15Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
History
And ELIZA - a simulation of a Rogerian psychotherapist, sometimes can provide a startlingly human-like
interaction. ELIZA might provide a generic response like "Why do you want to be happy?" responding for
statement “I want to be happy“.
16Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
History
During 1970s, many programmers began to write "conceptual ontologies", which structured real-world
information into computer-understandable data. By the time, many chatterbots were written including PARRY,
Racter, and Jabberwacky, conceptually similar to ELIZA in previous stage.
17Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
History
Starting from late 1980s, there was a revolution in NLP with the introduction of machine learning algorithms.
Some of the earliest-used machine learning algorithms, such as decision trees, produced systems of hard if-
then rules, similar to existed hand-written rules.
18Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
History
19Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
However, part-of-speech tagging introduced the use of hidden Markov models (HMM) to NLP, and increasingly,
research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued
weights to the features, making up the input data.
Such models are generally more robust when given unfamiliar input, especially input that contains errors (as is very
common for the real-world data), and produce more reliable results when integrated into a larger system comprising
multiple subtasks.
History
20Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Recent researches have increasingly focused on unsupervised and semi-supervised learning algorithms which are
able to learn from non-annotated data or combinations of annotated and non-annotated data. Typically, comparing
with supervised learning, results are less accurate for a given amount of input data, however, there is an enormous
amount of non-annotated data available, which can often make up the inferior results.
History
21Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
In 2010s, feature learning and deep neural network-style machine learning methods became widespread in NLP, due
in part to a flurry of results showing that such techniques can achieve state-of-the-art results in many NLP tasks.
History
22Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Popular techniques include the use of word embeddings to capture semantic properties of words, and an increase in
end-to-end learning of a higher-level task (e.g. question answering) instead of relying on a pipeline of separate
intermediate tasks (e.g. part-of-speech tagging and dependency parsing).
History
23Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
In some areas, this shift has entailed substantial changes in how NLP systems are designed, such that deep neural
network-based approaches may be viewed as a new paradigm distinct from statistical NLP. For instance, the term
neural machine translation (NMT) emphasizes the fact that deep learning-based approaches to machine translation
directly learn sequence-to-sequence transformations, obviating the need for intermediate steps such as word
alignment and language modeling that were used in statistical machine translation (SMT).
Rule-based vs Statistical NLP
24Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
In the early days, many language-processing systems were designed by hand-coding a set of rules, e.g. by writing
grammars or devising heuristic rules for stemming. However, this is rarely robust to natural language variation.
Since the so-called "statistical revolution" in the late 1980s and mid 1990s, the machine-learning paradigm calls
instead for using statistical inference to automatically learn such rules through the analysis of large corpora of
typical real-world examples (a set of documents, possibly with human or computer annotations).
Rule-based vs Statistical NLP
25Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Systems based on machine-learning algorithms have many advantages over hand-produced rules:
• The learning procedures used during machine learning automatically focus on the most common cases, whereas
when writing rules by hand it is often not at all obvious where the effort should be directed.
• Automatically learning procedures can make use of statistical-inference algorithms to produce models that are
robust to unfamiliar input and to erroneous input.
Rule-based vs Statistical NLP
26Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
• Based on automatically learning the rules, systems are more accurate simply by supplying more input data which
requires only a corresponding increase in the number of man-hours worked, generally without significant
increases in the complexity of the annotation process.
Major evaluations and tasks
The next slides consist some of the most commonly researched tasks in NLP. Note that some of these tasks
have direct real-world applications, while others more commonly serve as subtasks that are used to aid in
solving larger tasks.
Though NLP tasks are closely intertwined, they are frequently subdivided into categories for convenience.
27Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
1.Syntax
2.Semantics
3.Discourse
4.Speech
Syntax
Grammar induction, also known as grammar inference, is the process in machine learning which learns a “formal
grammar” which describes a language’s syntax from a set of observations, thus constructing a model which
accounts for the characteristics of the observed objects.
1. Grammar induction
28Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Syntax
More generally, grammar inference is a branch of machine learning where the instance space consists of discrete
combinatorial objects such as strings, trees and graphs.
1. Grammar induction
29Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Syntax
In computational linguistics, lemmatization is the algorithmic process of determining the lemma of a word based
on its intended meaning. It depends on correctly identifying the intended part of speech and meaning of a word in
a sentence, as well as within the larger context surrounding that sentence, such as neighboring sentences or even an
entire document.
2. Lemmatization
30Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Syntax
In linguistic morphology and information retrieval, stemming is the process of reducing inflected words to their
word stem, base or root form - generally a written word form. It is usually sufficient that related words map to the
same stem, even if this stem is not in itself a valid root.
3. Stemming
31Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Syntax
Lemmatization is closely related to stemming. The difference is that a stemmer operates on a single word without
knowledge of the context, and therefore cannot discriminate between words which have different meanings
depending on part of speech. However, stemmers are typically easier to implement and run faster. The reduced
accuracy may not matter for some applications.
3. Stemming
32Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Syntax
Morphological segmentation separates words into individual morphemes and identify the class of the morphemes.
The difficulty of this task depends greatly on the complexity of the morphology (i.e. the structure of words) of the
language being considered.
4. Morphological segmentation
33Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Syntax
English has fairly simple morphology, especially inflectional morphology, and thus it is often possible to ignore this
task entirely and simply model all possible forms of a word (e.g. "open, opens, opened, opening") as separate words.
Some other languages like Turkish, Meitei can not apply this approach as each dictionary entry has thousands of
possible word forms.
4. Morphological segmentation
34Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Syntax
Tagging is the task of labelling (or tagging) each word in a sentence with the appropriate part of speech. Given a
sentence, determine the part of speech for each word. Inflectional morphology languages, like English is prone to
ambiguity. Chinese is also prone to ambiguity because it is a tonal language during verbalization.
35Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
5. Part-of-Speech tagging
Syntax
Part-of-speech tagging is harder than just having a list of words and their parts of speech, because some words can
represent more than one part of speech at different times, and because some parts of speech are complex or
unspoken.
Many words, especially common ones, can serve as multiple parts of speech.
"book" :
36Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
noun ("the book on the table")
or verb ("to book a flight");
"set" can be a noun, verb or adjective; and
"out" can be any of at least five different parts of speech.
5. Part-of-Speech tagging
Syntax
Parsing determines the parse tree (grammatical analysis) of a given sentence. The grammar for natural languages is
ambiguous and typical sentences have multiple possible analyses. In fact, perhaps surprisingly, for a typical
sentence there may be thousands of potential parses (most of which will seem completely nonsensical to a human).
6. Parsing
37Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Syntax
There are two primary types of parsing: Dependency Parsing and Constituency Parsing. Dependency Parsing
focuses on the relationships between words in a sentence (marking things like Primary Objects and predicates),
whereas Constituency Parsing focuses on building out the Parse Tree using a Probabilistic Context-Free Grammar.
6. Parsing
38Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Syntax
Accurately sentence breaking requires analysis of the local context around periods and the punctuations. Given a
chunk of text, find the sentence boundaries. Sentence boundaries are often marked by periods or other punctuation
marks, but these same characters can serve other purposes (e.g. marking abbreviations).
7. Sentence breaking
39Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Syntax
Word segmentation separates a chunk of continuous text into separate words. For a language like English, this is
fairly trivial, since words are usually separated by spaces. However, some languages like Chinese and Japanese
require knowledge of the vocabulary and morphology of words in the language.
8. Word Segmentation
Sentence Segmentation
Image segmentation
Topic segmentation
40Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Syntax
Terminology extraction (also known as term extraction, glossary extraction, term recognition, or terminology
mining) is a subtask of information extraction. The goal of terminology extraction is to automatically extract
relevant terms from a given corpus.
9. Terminology extraction
41Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
Lexical semantics is a subfield of linguistic semantics. The units of analysis in lexical semantics are lexical units
which include not only words but also sub-words or sub-units such as affixes and even compound words and phrases.
It looks at how the meaning of the lexical units correlates with the structure of the language or syntax.
1. Lexical Semantics
• He plays bass guitar.
• That bass was delicious!
42Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
The study of lexical semantics looks at the classification and decomposition of lexical items, the differences and
similarities in lexical semantic structure cross-linguistically, and the relationship of lexical meaning to sentence
meaning and syntax.
1. Lexical Semantics
43Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
Machine translation, sometimes referred to by the abbreviation MT is a sub-field of computational linguistics that
investigates the use of software to automatically translate text or speech from one human language to another.
2. Machine translation
44Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
This is one of the most difficult problems, and is a member of a class of problems requiring all of the different
types of knowledge that humans possess (grammar, semantics, facts about the real world, etc.) in order to solve
properly.
2. Machine translation
45Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
Translation process includes:
• Decoding the meaning of the source text, and
• Re-encoding this meaning in the target language
2. Machine translation
46Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
Given a stream of text, determine which items in the text map to proper names, such as people or places, and what
the type of each such name is (e.g. person, location). Note that, although capitalization can aid in recognizing
named entities in languages such as English, this information cannot aid in determining the type of entity,
3. Natural entity recognition (NER)
47Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
and in any case is often inaccurate or insufficient. For example, the first word of a sentence is also capitalized, and
named entities often span several words, only some of which are capitalized. Furthermore, many other languages in
non-Western scripts like Chinese do not have any capitalization at all.
3. Natural entity recognition (NER)
48Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
And even the languages with capitalization, for example, German capitalizes all nouns, regardless of whether they
are names, French and Spanish do not capitalize names that serve as adjectives.
3. Natural entity recognition (NER)
49Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
Natural language generation is one task of NLP that focuses on generating natural language from structured data
such as a knowledge base or a logical form (linguistics). It can be used to produce long documents that
summarize or explain the contents of computer databases, for example making news reports.
4. Natural language generation
50Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
It can also be used to generate short blurbs of text in interactive conversations (a chatterbot) which might even be
read out loud by a text-to-speech system.
4. Natural language generation
51Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
Natural language understanding converts chunks of text into more formal representations such as first-order logic
structures that are easier for computer programs to manipulate. It involves the identification of the intended
semantic from the multiple possible semantics which can be derived from a natural language expression
5. Natural language understanding
52Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
which usually takes the form of organized notations of natural language concepts. Introduction and creation of
language metamodel and ontology are efficient, however, empirical. An explicit formalization of natural language
semanticswithout confusions with implicit assumptions such as closed-world assumption (CWA) vs.
5. Natural language understanding
53Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
open-world assumption (OWA), or subjective Yes/No vs. objective True/False is expected for the construction of a
basis of semantics formalization.
5. Natural language understanding
54Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
Given an image representing printed text, determine the corresponding text.
6. Optical character recognition(OCR)
55Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
Question answering is a computer science discipline within the fields of information retrieval and NLP, which is
concerned with building systems that automatically answer, or even ask questions in a natural human language.
7. Question answering
56Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
An important feature in V.A:
❑ Apple Siri
❑ Cortana (MS)
❑ Google Assistant
❑ Samsung Bixby
❑ Jarvis (FB)
Semantics
Given a human-language question, determine its answer. Typical questions have a specific right answer (such as
"What is the capital of Canada?"), but sometimes open-ended questions are also considered (such as "What is the
meaning of life?"). Recent works have looked at even more complex questions.
7. Question answering
57Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
Given two text fragments, determine if one being true entails the other, entails the other's negation, or allows the
other to be either true or false.
8. Recognizing Textual entailment
58Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
Given a chunk of text, identify the relationships among named entities (e.g. Roman Empire has the capital Rome;
Roman Roads has length of 400.000km).
9. Relationship extraction
59Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
Semantic analysis extracts subjective information usually from a set of documents, often using online reviews to
determine polarity about specific objects. It is especially useful for identifying trends of public opinion in the
social media, for the purpose of marketing.
10. Semantic analysis
60Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
Given a chunk of text, separate it into segments that each of which is devoted to a topic, and identify the topic of
the segment.
11. Topic segmentation and recognition
61Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Semantics
Many words have more than one meaning; we have to select the meaning which makes the most sense in the
context. For this problem, we are typically given a list of words and associated word senses, e.g. from a dictionary
or from an online resource such as WordNet.
12. Word sense disambiguation
62Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Discourse
Automatic summarization is a part of machine learning and data mining. The main idea of summarization is to find a
subset of data which contains the information of the entire data set. Such techniques are widely used in industry
today.
1. Automatic Summarization
63Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Discourse
In details, it is the process of shortening a text document with software, in order to create a summary with the major
points of the original document. It produces a readable summary of a chunk of text and often be used to provide
summaries of text of a known type, such as articles in the financial section of a newspaper.
1. Automatic Summarization
64Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Discourse
Given a sentence or larger chunk of text, determine which words refer to the same objects. Anaphora
resolution is a specific example of this task, and is specifically concerned with matching up pronouns with the
nouns or names to which they refer.
2. Coreference resolution
65Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Discourse
More general idea comes to "bridging relationships" which involves referring expressions. For example, in a
sentence "He entered John's house through the front door", "the front door" is a referring expression and the
bridging relationship to be identified is the fact that the door being referred to is the front door of John's house.
2. Coreference resolution
66Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Discourse
Discourse analysis is a general term for a number of approaches to analyze written, vocal, or sign language use, or
any significant semiotic event. It includes a number of related tasks. One task is identifying the discourse structure
of connected text, i.e. the nature of the discourse relationships between sentences (e.g. elaboration, explanation,
3. Discourse analysis
67Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Discourse
contrast). Another possible task is recognizing and classifying the speech acting in a chunk of text (e.g. yes-no
question, content question, statement, assertion, etc.). In other words, discourse analysis study the way sentences
and speech go together to make texts and interactions and how those texts and interactions fit into our social world.
3. Discourse analysis
68Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Speech
Given a sound clip of a person or people speaking, determine the textual representation of the speech. This is the
opposite of text to speech and is one of the extremely difficult problems. In natural speech there are hardly any
pauses between successive words, and thus "speech segmentation" is a necessary subtask of speech recognition.
1. Speech recognition
69Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Speech
Note also that in most spoken languages, the sounds representing successive letters blend into each other in a
process termed co-articulation, so the conversion of the analog signal to discrete characters can be a very difficult
process.
1. Speech recognition
70Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Speech
The quality of a speech recognition systems are assessed according to two factors: Accuracy (error rate in
converting spoken words to digital data) and speed (how well the software can keep up with a human speaker).
1. Speech recognition
71Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Speech
Speech segmentation is the process of identifying the boundaries between words, syllables, or phonemes in spoken
natural languages. It’s a subtask of speech recognition as mentioned above andtypically grouped with it.
2. Speech segmentation
72Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Speech
Given a text, transform those units and produce a spoken representation. Text-to-speech can be used to aid the
visually impaired. A text-to-speech (TTS) system converts normal language text intospeech.
3. Text-to-speech
73Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Speech
A text-to-speech system (or "engine") is composed of two parts: front-end and back-end. The front-end has two
major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of
written-out words. Then, it assigns phonetic transcriptions to each word, and divides and marks the text into
prosodic units, like phrases, clauses, and sentences.
3. Text-to-speech
74Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Speech
The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound.
In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations),
which is then imposed on the output speech.
3. Text-to-speech
75Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
So, How does NLP work?
As mentioned above, NLP is a form of artificial intelligence that analyzes the human language. It takes many
forms, but at its core, the technology helps machine understanding, and even communicating with humans.
76Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
So, How does NLP work?
Understanding NLP isn't the easiest thing. It's a highly advanced form of AI that's only recently become
viable. It means that not only we are still learning about NLP but also it's very difficult to grasp.
77Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
So, How does NLP work?
78Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
1
2
3
4
5
6
7
So, How does NLP work?
What following below is the easiest way to understand how NLP works. The first step in NLP depends on the
specific application of the system.
79Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
So, How does NLP work?
Voice-based systems like Alexa or Google Assistant need to translate your words into text. That's done (usually)
using the Hidden Markov Models system (HMM). The HMM uses math models to determine what you've said and
translate that into text usable by the NLP system.
80Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
So, How does NLP work?
Put it in the simplest way, the HMM listens to 10- to 20-millisecond clips of your speech and looks for phonemes
(the smallest unit of speech) to compare with pre-recorded speech. Next is the actual understanding of the language
and context.
81Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
So, How does NLP work?
Each NLP system uses slightly different techniques, but on the whole, they're fairly similar. The systems try to break
each word down into its part of speech (noun, verb, etc.). This happens through a series of coded grammar rules that
rely on algorithms that incorporate statistical machine learning to help to determine the context of what you said.
82Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
So, How does NLP work?
If we're not talking about speech-to-text NLP, the system just skips the first step and moves directly into analyzing
the words using the algorithms and grammar rules. The end result is the ability to categorize what is said in many
different ways. Depending on the underlying focus of the NLP software, the results get used in different ways.
83Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
Future vision
Virtual assistant is one among important future visions. Virtual assistants will become better at understanding and
responding to complex and long-form natural language requests, which use conversational language, in real time.
These assistants will be able to converse more like humans, take notes during dictation, analyze complex requests
and execute tasks in a single context, suggest important improvements to business documents, and more.
Human-like virtual assistants
84Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong
THANK YOU FOR
LISTENING!!
85Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong