Advanced Artificial Intelligence Natural Language Processingmi.cau.ac.kr/teaching/lecture_aai/NLP.pdf · 2019-05-06 · Rule-based vs Statistical NLP Advanced Artificial Intelligence

Advanced Artificial Intelligence

Natural Language Processing

Chung-Ang University, Van Dat Tuong

Good afternoon everyone. Today I am delighted to be here to talk about Natural Language Processing (NLP), a

subfield of Artificial Intelligence.

Contents

• Introduction

• History

• Rule-based vs Statistical NLP

• Major Evaluations and tasks

• How NLP works

2Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong

My talk is divided into 5 main parts: Introduction, History, Rule-based vs Statistical NLP, Major Evaluations and

tasks, and How NLP works

Introduction

People communicate in many different ways: through speaking and listening, making gestures, using specialized

hand signals (such as when driving or directing traffic), using sign languages for the deaf, or through various forms

of text.


Introduction

By text we mean words that are written or printed on a flat surface (paper, card, street signs and so on) or displayed

on a screen or electronic devices in order to be read by their intended recipients (or by whoever happens to be

passing by).


Introduction

This presentation will focus on concerning with the method in which computer systems can analyze and interpret

texts, and I will assume for convenience that these texts are presented in an electronic format (documents, images,

etc). This leads to the term, Natural Language Processing.


Introduction

NLP is defined as a subfield of computer science, information engineering, and artificial intelligence concerned

with the interactions between computers and human (natural) languages, in particular how to program computers

to process and analyze large amounts of natural language data.


Introduction

Let's use an example to show just how powerful NLP is used in a practical situation: When you're typing on google

search, like many of us do every day, you'll see word suggestions based on what you typed and what you're

currently typing. That's NLP in action.


Introduction

As a human, you may speak and write in English, Korean, or Chinese. But a computer’s native language – known as

machine code or machine language, is largely incomprehensible to most people. At the device’s lowest levels,

communication occurs not with words but through millions of '0’ and '1’ (binary form) that produce logical actions.


The real reason why NLP is hard

The process of reading and understanding language is complex than it seems at the first glance. There are many

things that go into truly understanding what a piece of text means in the real-world. For example, what do you

think the above piece of text means?



To a human it’s probably quite obvious what this sentence means. We know Steph Curry is a basketball player; or

even if not, we know that he plays on some kind of team, probably a sport team. When we see “on fire” and

“destroyed” we know that it means Steph Curry played really well last night and beat the other team.



On the other way, computer tends to take things a bit too literally. Viewing things literally like a computer, we would

see “Steph Curry” based on the capitalization, assume that’s a person, place, or otherwise important thing which is

great! But then we see that Steph Curry “was on fire”. A computer might tell you that someone literally lit Steph

Curry onfire yesterday! … yikes. After that, the computer might say that he has physically destroyed the other

team…. and they no longer exist according to this computer… great…


Fortunately, not all is grim! Thanks to Machine Learning computer systems can actually do some really clever

things to quickly extract and understand information from natural language!


Sol.

History

The history of NLP generally started in 1950s, although work can be found from earlier periods. In 1950, Alan

Turing published an article titled "Intelligence" which proposed what is now called the Turing test as a

criterion of intelligence.


History

In 1954, Georgetown experiment involved fully automatic translation of more than sixty Russian sentences

into English. The authors claimed that within three or five years then, machine translation would be a solved

problem, however, real progress was much slower.


History

In 1960s, some notably successful NLP systems were developed, such as SHRDLU - a natural language

system working in restricted "blocks worlds" with restricted vocabularies.


History

And ELIZA - a simulation of a Rogerian psychotherapist, sometimes can provide a startlingly human-like

interaction. ELIZA might provide a generic response like "Why do you want to be happy?" responding for

statement “I want to be happy“.


History

During 1970s, many programmers began to write "conceptual ontologies", which structured real-world

information into computer-understandable data. By the time, many chatterbots were written including PARRY,

Racter, and Jabberwacky, conceptually similar to ELIZA in previous stage.


History

Starting from late 1980s, there was a revolution in NLP with the introduction of machine learning algorithms.

Some of the earliest-used machine learning algorithms, such as decision trees, produced systems of hard if-

then rules, similar to existed hand-written rules.


History


However, part-of-speech tagging introduced the use of hidden Markov models (HMM) to NLP, and increasingly,

research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued

weights to the features, making up the input data.

Such models are generally more robust when given unfamiliar input, especially input that contains errors (as is very

common for the real-world data), and produce more reliable results when integrated into a larger system comprising

multiple subtasks.

History


Recent researches have increasingly focused on unsupervised and semi-supervised learning algorithms which are

able to learn from non-annotated data or combinations of annotated and non-annotated data. Typically, comparing

with supervised learning, results are less accurate for a given amount of input data, however, there is an enormous

amount of non-annotated data available, which can often make up the inferior results.

History


In 2010s, feature learning and deep neural network-style machine learning methods became widespread in NLP, due

in part to a flurry of results showing that such techniques can achieve state-of-the-art results in many NLP tasks.

History


Popular techniques include the use of word embeddings to capture semantic properties of words, and an increase in

end-to-end learning of a higher-level task (e.g. question answering) instead of relying on a pipeline of separate

intermediate tasks (e.g. part-of-speech tagging and dependency parsing).

History


In some areas, this shift has entailed substantial changes in how NLP systems are designed, such that deep neural

network-based approaches may be viewed as a new paradigm distinct from statistical NLP. For instance, the term

neural machine translation (NMT) emphasizes the fact that deep learning-based approaches to machine translation

directly learn sequence-to-sequence transformations, obviating the need for intermediate steps such as word

alignment and language modeling that were used in statistical machine translation (SMT).

Rule-based vs Statistical NLP


In the early days, many language-processing systems were designed by hand-coding a set of rules, e.g. by writing

grammars or devising heuristic rules for stemming. However, this is rarely robust to natural language variation.

Since the so-called "statistical revolution" in the late 1980s and mid 1990s, the machine-learning paradigm calls

instead for using statistical inference to automatically learn such rules through the analysis of large corpora of

typical real-world examples (a set of documents, possibly with human or computer annotations).



Systems based on machine-learning algorithms have many advantages over hand-produced rules:

• The learning procedures used during machine learning automatically focus on the most common cases, whereas

when writing rules by hand it is often not at all obvious where the effort should be directed.

• Automatically learning procedures can make use of statistical-inference algorithms to produce models that are

robust to unfamiliar input and to erroneous input.



• Based on automatically learning the rules, systems are more accurate simply by supplying more input data which

requires only a corresponding increase in the number of man-hours worked, generally without significant

increases in the complexity of the annotation process.

Major evaluations and tasks

The next slides consist some of the most commonly researched tasks in NLP. Note that some of these tasks

have direct real-world applications, while others more commonly serve as subtasks that are used to aid in

solving larger tasks.

Though NLP tasks are closely intertwined, they are frequently subdivided into categories for convenience.


1.Syntax

2.Semantics

3.Discourse

4.Speech

Syntax

Grammar induction, also known as grammar inference, is the process in machine learning which learns a “formal

grammar” which describes a language’s syntax from a set of observations, thus constructing a model which

accounts for the characteristics of the observed objects.

1. Grammar induction


Syntax

More generally, grammar inference is a branch of machine learning where the instance space consists of discrete

combinatorial objects such as strings, trees and graphs.

1. Grammar induction


Syntax

In computational linguistics, lemmatization is the algorithmic process of determining the lemma of a word based

on its intended meaning. It depends on correctly identifying the intended part of speech and meaning of a word in

a sentence, as well as within the larger context surrounding that sentence, such as neighboring sentences or even an

entire document.

2. Lemmatization


Syntax

In linguistic morphology and information retrieval, stemming is the process of reducing inflected words to their

word stem, base or root form - generally a written word form. It is usually sufficient that related words map to the

same stem, even if this stem is not in itself a valid root.

3. Stemming


Syntax

Lemmatization is closely related to stemming. The difference is that a stemmer operates on a single word without

knowledge of the context, and therefore cannot discriminate between words which have different meanings

depending on part of speech. However, stemmers are typically easier to implement and run faster. The reduced

accuracy may not matter for some applications.

3. Stemming


Syntax

Morphological segmentation separates words into individual morphemes and identify the class of the morphemes.

The difficulty of this task depends greatly on the complexity of the morphology (i.e. the structure of words) of the

language being considered.

4. Morphological segmentation


Syntax

English has fairly simple morphology, especially inflectional morphology, and thus it is often possible to ignore this

task entirely and simply model all possible forms of a word (e.g. "open, opens, opened, opening") as separate words.

Some other languages like Turkish, Meitei can not apply this approach as each dictionary entry has thousands of

possible word forms.

4. Morphological segmentation


Syntax

Tagging is the task of labelling (or tagging) each word in a sentence with the appropriate part of speech. Given a

sentence, determine the part of speech for each word. Inflectional morphology languages, like English is prone to

ambiguity. Chinese is also prone to ambiguity because it is a tonal language during verbalization.


5. Part-of-Speech tagging

Syntax

Part-of-speech tagging is harder than just having a list of words and their parts of speech, because some words can

represent more than one part of speech at different times, and because some parts of speech are complex or

unspoken.

Many words, especially common ones, can serve as multiple parts of speech.

"book" :


noun ("the book on the table")

or verb ("to book a flight");

"set" can be a noun, verb or adjective; and

"out" can be any of at least five different parts of speech.

5. Part-of-Speech tagging

https://en.wikipedia.org/wiki/Parts_of_speech

Syntax

Parsing determines the parse tree (grammatical analysis) of a given sentence. The grammar for natural languages is

ambiguous and typical sentences have multiple possible analyses. In fact, perhaps surprisingly, for a typical

sentence there may be thousands of potential parses (most of which will seem completely nonsensical to a human).

6. Parsing


Syntax

There are two primary types of parsing: Dependency Parsing and Constituency Parsing. Dependency Parsing

focuses on the relationships between words in a sentence (marking things like Primary Objects and predicates),

whereas Constituency Parsing focuses on building out the Parse Tree using a Probabilistic Context-Free Grammar.

6. Parsing


Syntax

Accurately sentence breaking requires analysis of the local context around periods and the punctuations. Given a

chunk of text, find the sentence boundaries. Sentence boundaries are often marked by periods or other punctuation

marks, but these same characters can serve other purposes (e.g. marking abbreviations).

7. Sentence breaking


Syntax

Word segmentation separates a chunk of continuous text into separate words. For a language like English, this is

fairly trivial, since words are usually separated by spaces. However, some languages like Chinese and Japanese

require knowledge of the vocabulary and morphology of words in the language.

8. Word Segmentation

Sentence Segmentation

Image segmentation

Topic segmentation


Syntax

Terminology extraction (also known as term extraction, glossary extraction, term recognition, or terminology

mining) is a subtask of information extraction. The goal of terminology extraction is to automatically extract

relevant terms from a given corpus.

9. Terminology extraction


Semantics

Lexical semantics is a subfield of linguistic semantics. The units of analysis in lexical semantics are lexical units

which include not only words but also sub-words or sub-units such as affixes and even compound words and phrases.

It looks at how the meaning of the lexical units correlates with the structure of the language or syntax.

1. Lexical Semantics

• He plays bass guitar.

• That bass was delicious!


Semantics

The study of lexical semantics looks at the classification and decomposition of lexical items, the differences and

similarities in lexical semantic structure cross-linguistically, and the relationship of lexical meaning to sentence

meaning and syntax.

1. Lexical Semantics


Semantics

Machine translation, sometimes referred to by the abbreviation MT is a sub-field of computational linguistics that

investigates the use of software to automatically translate text or speech from one human language to another.

2. Machine translation


Semantics

This is one of the most difficult problems, and is a member of a class of problems requiring all of the different

types of knowledge that humans possess (grammar, semantics, facts about the real world, etc.) in order to solve

properly.



Semantics

Translation process includes:

• Decoding the meaning of the source text, and

• Re-encoding this meaning in the target language



Semantics

Given a stream of text, determine which items in the text map to proper names, such as people or places, and what

the type of each such name is (e.g. person, location). Note that, although capitalization can aid in recognizing

named entities in languages such as English, this information cannot aid in determining the type of entity,

3. Natural entity recognition (NER)


Semantics

and in any case is often inaccurate or insufficient. For example, the first word of a sentence is also capitalized, and

named entities often span several words, only some of which are capitalized. Furthermore, many other languages in

non-Western scripts like Chinese do not have any capitalization at all.



Semantics

And even the languages with capitalization, for example, German capitalizes all nouns, regardless of whether they

are names, French and Spanish do not capitalize names that serve as adjectives.



Semantics

Natural language generation is one task of NLP that focuses on generating natural language from structured data

such as a knowledge base or a logical form (linguistics). It can be used to produce long documents that

summarize or explain the contents of computer databases, for example making news reports.

4. Natural language generation


Semantics

It can also be used to generate short blurbs of text in interactive conversations (a chatterbot) which might even be

read out loud by a text-to-speech system.

4. Natural language generation


Semantics

Natural language understanding converts chunks of text into more formal representations such as first-order logic

structures that are easier for computer programs to manipulate. It involves the identification of the intended

semantic from the multiple possible semantics which can be derived from a natural language expression

5. Natural language understanding


Semantics

which usually takes the form of organized notations of natural language concepts. Introduction and creation of

language metamodel and ontology are efficient, however, empirical. An explicit formalization of natural language

semanticswithout confusions with implicit assumptions such as closed-world assumption (CWA) vs.



Semantics

open-world assumption (OWA), or subjective Yes/No vs. objective True/False is expected for the construction of a

basis of semantics formalization.



Semantics

Given an image representing printed text, determine the corresponding text.

6. Optical character recognition(OCR)


Semantics

Question answering is a computer science discipline within the fields of information retrieval and NLP, which is

concerned with building systems that automatically answer, or even ask questions in a natural human language.

7. Question answering


An important feature in V.A:

❑ Apple Siri

❑ Cortana (MS)

❑ Google Assistant

❑ Samsung Bixby

❑ Jarvis (FB)

Semantics

Given a human-language question, determine its answer. Typical questions have a specific right answer (such as

"What is the capital of Canada?"), but sometimes open-ended questions are also considered (such as "What is the

meaning of life?"). Recent works have looked at even more complex questions.

7. Question answering


Semantics

Given two text fragments, determine if one being true entails the other, entails the other's negation, or allows the

other to be either true or false.

8. Recognizing Textual entailment


Semantics

Given a chunk of text, identify the relationships among named entities (e.g. Roman Empire has the capital Rome;

Roman Roads has length of 400.000km).

9. Relationship extraction


Semantics

Semantic analysis extracts subjective information usually from a set of documents, often using online reviews to

determine polarity about specific objects. It is especially useful for identifying trends of public opinion in the

social media, for the purpose of marketing.

10. Semantic analysis


Semantics

Given a chunk of text, separate it into segments that each of which is devoted to a topic, and identify the topic of

the segment.

11. Topic segmentation and recognition


Semantics

Many words have more than one meaning; we have to select the meaning which makes the most sense in the

context. For this problem, we are typically given a list of words and associated word senses, e.g. from a dictionary

or from an online resource such as WordNet.

12. Word sense disambiguation


Discourse

Automatic summarization is a part of machine learning and data mining. The main idea of summarization is to find a

subset of data which contains the information of the entire data set. Such techniques are widely used in industry

today.

1. Automatic Summarization


Discourse

In details, it is the process of shortening a text document with software, in order to create a summary with the major

points of the original document. It produces a readable summary of a chunk of text and often be used to provide

summaries of text of a known type, such as articles in the financial section of a newspaper.

1. Automatic Summarization


Discourse

Given a sentence or larger chunk of text, determine which words refer to the same objects. Anaphora

resolution is a specific example of this task, and is specifically concerned with matching up pronouns with the

nouns or names to which they refer.

2. Coreference resolution


Discourse

More general idea comes to "bridging relationships" which involves referring expressions. For example, in a

sentence "He entered John's house through the front door", "the front door" is a referring expression and the

bridging relationship to be identified is the fact that the door being referred to is the front door of John's house.

2. Coreference resolution


Discourse

Discourse analysis is a general term for a number of approaches to analyze written, vocal, or sign language use, or

any significant semiotic event. It includes a number of related tasks. One task is identifying the discourse structure

of connected text, i.e. the nature of the discourse relationships between sentences (e.g. elaboration, explanation,

3. Discourse analysis


Discourse

contrast). Another possible task is recognizing and classifying the speech acting in a chunk of text (e.g. yes-no

question, content question, statement, assertion, etc.). In other words, discourse analysis study the way sentences

and speech go together to make texts and interactions and how those texts and interactions fit into our social world.

3. Discourse analysis


Speech

Given a sound clip of a person or people speaking, determine the textual representation of the speech. This is the

opposite of text to speech and is one of the extremely difficult problems. In natural speech there are hardly any

pauses between successive words, and thus "speech segmentation" is a necessary subtask of speech recognition.

1. Speech recognition


Speech

Note also that in most spoken languages, the sounds representing successive letters blend into each other in a

process termed co-articulation, so the conversion of the analog signal to discrete characters can be a very difficult

process.



Speech

The quality of a speech recognition systems are assessed according to two factors: Accuracy (error rate in

converting spoken words to digital data) and speed (how well the software can keep up with a human speaker).



Speech

Speech segmentation is the process of identifying the boundaries between words, syllables, or phonemes in spoken

natural languages. It’s a subtask of speech recognition as mentioned above andtypically grouped with it.

2. Speech segmentation


Speech

Given a text, transform those units and produce a spoken representation. Text-to-speech can be used to aid the

visually impaired. A text-to-speech (TTS) system converts normal language text intospeech.

3. Text-to-speech


Speech

A text-to-speech system (or "engine") is composed of two parts: front-end and back-end. The front-end has two

major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of

written-out words. Then, it assigns phonetic transcriptions to each word, and divides and marks the text into

prosodic units, like phrases, clauses, and sentences.

3. Text-to-speech


Speech

The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound.

In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations),

which is then imposed on the output speech.

3. Text-to-speech


So, How does NLP work?

As mentioned above, NLP is a form of artificial intelligence that analyzes the human language. It takes many

forms, but at its core, the technology helps machine understanding, and even communicating with humans.



Understanding NLP isn't the easiest thing. It's a highly advanced form of AI that's only recently become

viable. It means that not only we are still learning about NLP but also it's very difficult to grasp.




1

2

3

4

5

6

7


What following below is the easiest way to understand how NLP works. The first step in NLP depends on the

specific application of the system.



Voice-based systems like Alexa or Google Assistant need to translate your words into text. That's done (usually)

using the Hidden Markov Models system (HMM). The HMM uses math models to determine what you've said and

translate that into text usable by the NLP system.



Put it in the simplest way, the HMM listens to 10- to 20-millisecond clips of your speech and looks for phonemes

(the smallest unit of speech) to compare with pre-recorded speech. Next is the actual understanding of the language

and context.



Each NLP system uses slightly different techniques, but on the whole, they're fairly similar. The systems try to break

each word down into its part of speech (noun, verb, etc.). This happens through a series of coded grammar rules that

rely on algorithms that incorporate statistical machine learning to help to determine the context of what you said.



If we're not talking about speech-to-text NLP, the system just skips the first step and moves directly into analyzing

the words using the algorithms and grammar rules. The end result is the ability to categorize what is said in many

different ways. Depending on the underlying focus of the NLP software, the results get used in different ways.


Future vision

Virtual assistant is one among important future visions. Virtual assistants will become better at understanding and

responding to complex and long-form natural language requests, which use conversational language, in real time.

These assistants will be able to converse more like humans, take notes during dictation, analyze complex requests

and execute tasks in a single context, suggest important improvements to business documents, and more.

Human-like virtual assistants


THANK YOU FOR

LISTENING!!


Documents

Advanced Artificial Intelligence Natural Language Processingmi.cau.ac.kr/teaching/lecture_aai/NLP.pdf · 2019-05-06 · Rule-based vs Statistical NLP Advanced Artificial Intelligence