The Semantics and Pragmatics of Natural Language Daniela GÎFU http://profs.info.uaic.ro/~daniela.gifu/ “ALEXANDRU IOAN CUZA” UNIVERSITY OF IAŞI FACULTY OF COMPUTER SCIENCE


Page 1: The Semantics and Pragmatics of Natural Language

The Semantics and Pragmatics

of Natural Language

Daniela GÎFU

http://profs.info.uaic.ro/~daniela.gifu/

“ALEXANDRU IOAN CUZA” UNIVERSITY OF IAŞI

FACULTY OF COMPUTER SCIENCE

Page 2: The Semantics and Pragmatics of Natural Language

Course 1

SPNL OVERVIEW

2

Page 3: The Semantics and Pragmatics of Natural Language

https://profs.info.uaic.ro/~daniela.gifu/

Who am I?

Page 4: The Semantics and Pragmatics of Natural Language

“Alexandru Ioan Cuza” University of Iași

THE HALL OF THE LOST STEPS

Page 5: The Semantics and Pragmatics of Natural Language

Faculty of Computer Science

BE AMONG THE FIRST…..

Page 6: The Semantics and Pragmatics of Natural Language

Romanian Academy

Page 7: The Semantics and Pragmatics of Natural Language
Page 8: The Semantics and Pragmatics of Natural Language

What is this course about?

➢ Meaning and Natural Language Processing (NLP)

➢ Computational Semantics

➢ Computational Pragmatics

8

Page 9: The Semantics and Pragmatics of Natural Language

Familiarization

with relevant terminology

• Semantics

• Pragmatics

• Natural language

• Computational Linguistics

• Natural Language Processing

…

Page 10: The Semantics and Pragmatics of Natural Language

Simulation of human (natural)

intelligence by machines

Interdisciplinary field ~

Scientific study of

language from a

computational

perspective

A discipline that spans

theory and practice to

understand

computer systems and

networks at a deep level.

Page 11: The Semantics and Pragmatics of Natural Language

Computational Linguistics (CL)

vs.

Natural Language Processing (NLP)

11

Page 12: The Semantics and Pragmatics of Natural Language

CL = provides the theoretical background (computational

theories of language) and linguistic models.

NLP = applied CL, including:

- natural language technology (NLT)

- human language technology (HLT)

12

Research

Engineering techniques have to be underpinned by scientific

understanding…

Good performance on some

tasks when large amounts of

annotated data are available

Page 13: The Semantics and Pragmatics of Natural Language

Spoken language

- speech processing (from speech to text to syntax and

semantics to speech) - https://speechlogger.appspot.com/ro/

Ex: mobile

Written language – my area of interest

Language in correlation with other modalities

(multimodality)

- speech

- intonation

- image

Ex: GPS (Global Positioning System)

Natural Language Technology

Page 14: The Semantics and Pragmatics of Natural Language

Document segmentation and interpretation

– cleaning (elimination of dots, enhancing contrast,

etc.)

– separation of text from image, curved lines...

– recognizing printed, semi-uncial characters, etc.

• Optical Character Recognition (OCR)

~ 100% accuracy in scanning printed Latin script

based material

Challenge in OCR

14

Written Language Technologies

Students?

Page 15: The Semantics and Pragmatics of Natural Language

15

OCR Handwriting – Why?

= presents some unique particularities

= many varieties of cursive writing

see: https://pdf.iskysoft.com/ocr-pdf/handwriting-ocr.html

Page 16: The Semantics and Pragmatics of Natural Language

16

OCR Handwriting very challenging

= the interpretation of physician handwriting (Rasmussen,

L.V. et al., 2012; Broda, B. & Piasecki, M., 2007)

= analysis of old handwritten documents (useful for linguists,

musicians, historians, etc.)

Document Image

Analysis

PR (pattern recognition) = a sub-topic of machine learning:

description or classification (recognition) of

measurements.

Page 17: The Semantics and Pragmatics of Natural Language

17

Differences between CL Approaches

Page 18: The Semantics and Pragmatics of Natural Language

•Analysis and understanding of written language

– sub-syntactic processing

• lexical units

• sentence splitting

• clause borders

• part of speech and morphological information

• lemmas

• entity names

• groups (nominal, verbal, prepositional, etc.)

and lexical attractions (collocations)

18

Written Language Technologies
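
The first sub-syntactic steps listed above (sentence splitting and lexical units) can be sketched in a few lines. This is only a minimal illustration using regular expressions, not a production tokenizer; the function names `split_sentences` and `tokenize` are hypothetical, and the rules ignore abbreviations, numbers and other hard cases.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: cut after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Naive tokenizer: a word (optionally with inner hyphens or apostrophes)
    or a single punctuation mark counts as one lexical unit."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)


text = "I saw the gentleman with the hat. He smiled!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

for sentence, toks in zip(sentences, tokens):
    print(sentence, "->", toks)
```

Later steps in the list (part of speech, lemmas, entity names, groups) need linguistic resources or trained models and are not covered by this sketch.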

Page 19: The Semantics and Pragmatics of Natural Language

• Language analysis and understanding

– semantic and discourse processing

• semantic disambiguation → word senses

• semantic role labeling → NLTK

• rhetorical structure of discourse and dialogue →

RST (Rhetorical Structure Theory)

• anaphora resolution → Stanford CoreNLP

• text summarization → Machine Learning

19

Written Language Technologies

Page 20: The Semantics and Pragmatics of Natural Language

20

the study of mathematical structures and methods that are

of importance to linguistics.

→ Phonetics → Phonology → Morphology →

Syntax → Semantics → and…

Sociolinguistics → Language Acquisition.

20

Mathematical Linguistics

Mathematical Linguistics before Computational Linguistics….

ML ⇔ CL?

Page 21: The Semantics and Pragmatics of Natural Language

= art of solving problems that need to analyze

(or generate) natural language text.

Find the metrics that define a good solution to the

engineering problem…

NLP

Google Translate – Don’t blame!!!!

Romanian = Luceafărul de dimineață

English = The morning gentleman (bad answer)

= Morning star (good answer)

Why????

explains how human translators do their job...

21

Let’s try!

Page 22: The Semantics and Pragmatics of Natural Language

22

NLP – a subdomain of

Artificial Intelligence & Linguistics

Thematic Areas

- Linguistics - mathematical linguistics - computational

linguistics

- Formal Language

- Linguistic and Language Processing

- The grammatical structure of utterances: the sentence,

constituents, phrase, classifications and structural rules,

syntactic processing ...

- Parser or Syntax Analyzer

- Semantics & Pragmatics

Page 23: The Semantics and Pragmatics of Natural Language

= an area of Artificial Intelligence (AI) devoted to

creating computers that use NL as input and/or

output.

NLP

23

AI-hard problem

= machine reading

comprehension

= produces language

as output on the basis

of data input

Page 24: The Semantics and Pragmatics of Natural Language

= developing computational methods/models of human

linguistic behavior.

CL

▪ INFORMATION RETRIEVAL

▪ INFORMATION EXTRACTION

▪ MACHINE TRANSLATION

▪ QUESTION – ANSWERING

▪ SUMMARIZATION

▪ MACHINE READABLE DICTIONARIES

▪ SPELLING & GRAMMAR CHECKERS

24

Let’s describe and exemplify

Page 25: The Semantics and Pragmatics of Natural Language

25

A discipline concerned with understanding written and spoken

language from a computational perspective.

- detecting synonymy (Grigonytė et al., 2010);

- developing WordNet (including Romanian - Gala and Mititelu,

2013), (Iftene and Balahur, 2007)...;

- WSD (Yang, H. et al. 2010), (Lefever and Hoste, 2010), (Tufiș,

2002)...;

- semantic annotation (Garcia et al., 2012)...;

- reconstructing a diachronic morphology (Cristea et al.,

2007/2012)

- diachronic text classification (Mihalcea and Năstase, 2012;

Popescu and Strapparava, 2015), etc.

- epoch detection (Gifu, 2015/2016/2017)...;

CL – Applications

Tools developed

by students…

Page 26: The Semantics and Pragmatics of Natural Language

26

Linguistic & Language Processing

1. Linguistics

- Science of language. Includes:

✓ Sounds (phonology)

✓ Word formation (morphology)

✓ Sentence structure (syntax)

✓ Meaning (semantics) and understanding

(pragmatics)…

2. Levels of linguistic analysis

- Higher level → Speech Recognition (SR)

- Lower levels → Natural Language Processing (NLP)

Page 27: The Semantics and Pragmatics of Natural Language

27

Levels of Linguistic Analysis

NLP: Letters (strings), Morphemes, Words, Phrases & sentences, Meaning out of context, Meaning in context

Speech Recognition: Phonemes, Acoustic signal

Phonetics – production and perception of speech

Phonology – Sound patterns of language

Lexicon – Dictionary of words in a language

Morphology – Word formation and structure

Syntax – Sentence structure

Semantics – Intended meaning

Pragmatics – Understanding from external info

Page 28: The Semantics and Pragmatics of Natural Language

NLP Pipeline

Course purpose

28

Page 29: The Semantics and Pragmatics of Natural Language

29

MAIN CONCEPTS

1. Natural Language

- used by human beings for communication...

- sign, system, symbols, rule-set (or grammar)

2. Semantics

- literal meaning determined from a word, phrase,

sentence.

3. Pragmatics

- contextual meaning {situation, speaker, etc.}

Page 30: The Semantics and Pragmatics of Natural Language

30

Natural or ordinary language

• A system of speech symbols → (form criterion)

Types:

a) speech (spoken language)

b) writing (written language) - the representation of a spoken or

gestural language.

• The most important means of human communication →

(function criterion)

Page 31: The Semantics and Pragmatics of Natural Language

31

Natural Language…

• Multiplicity of languages

Page 32: The Semantics and Pragmatics of Natural Language

32

Formal Language_I

1. Symbol

- a character, an abstract entity that has no meaning by

itself

Ex: letters, digits and special characters

2. Alphabet

- finite set of symbols

- often denoted by Σ

Ex:

B = {0, 1} says B is an alphabet of two symbols, 0 and 1

C = {a, b, c} – C an alphabet of 3 symbols, a, b and c

* More about formal language:

http://www.its.caltech.edu/~matilde/FormalLanguageTheory.pdf

Page 33: The Semantics and Pragmatics of Natural Language

33

Formal Language_II

3. String or word

- a finite sequence of symbols from an alphabet

Ex: 01110 and 111 are strings from the alphabet B above

aaabccc and b are strings from the alphabet C above

4. Sentence

- a string of words.

Ex: I saw the gentleman with the hat.

String = a b c d e c f
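
The definitions of alphabet, string and sentence can be checked mechanically. The sketch below is illustrative only: `is_string_over` and `encode_words` are hypothetical helper names, and the encoding assumes one lowercase letter per distinct word (so it only handles up to 26 distinct words), which means a repeated word reuses its symbol.

```python
def is_string_over(s, alphabet):
    """True if every symbol of s belongs to the given alphabet."""
    return all(symbol in alphabet for symbol in s)


B = {"0", "1"}
C = {"a", "b", "c"}

print(is_string_over("01110", B))    # True
print(is_string_over("aaabccc", C))  # True
print(is_string_over("012", B))      # False: '2' is not in B


def encode_words(sentence):
    """Replace each distinct word by a letter, in order of first appearance."""
    codes = {}
    encoded = []
    for word in sentence.lower().rstrip(".").split():
        if word not in codes:
            codes[word] = chr(ord("a") + len(codes))
        encoded.append(codes[word])
    return " ".join(encoded)


print(encode_words("I saw the gentleman with the hat."))  # a b c d e c f
```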

Page 34: The Semantics and Pragmatics of Natural Language

34

Formal Language_III

How can the parts of a string be related to each other?

A.

[I] saw the gentleman [with the binoculars] = [a] b c d [e c f]

B.

I saw [the gentleman with the binoculars] = a b [c d e c f]

We can represent structures with trees…

[Figure: two parse trees for "I saw the gentleman with the binoculars.", one for each reading.]
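
The two readings can also be written as nested structures. The sketch below encodes each tree as a nested tuple `(label, child, ...)`; the category labels (S, NP, VP, PP, V) and the helper names `leaves` and `bracketed` are assumptions made for illustration and do not come from the slides.

```python
# Reading A: the prepositional phrase attaches to the verb
# (the seeing was done with the binoculars).
tree_a = ("S", ("NP", "I"),
               ("VP", ("V", "saw"),
                      ("NP", "the", "gentleman"),
                      ("PP", "with", "the", "binoculars")))

# Reading B: the prepositional phrase attaches to the noun
# (the gentleman has the binoculars).
tree_b = ("S", ("NP", "I"),
               ("VP", ("V", "saw"),
                      ("NP", "the", "gentleman",
                             ("PP", "with", "the", "binoculars"))))


def leaves(tree):
    """Return the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    words = []
    for child in children:
        words.extend(leaves(child))
    return words


def bracketed(tree):
    """Render a tree in labelled bracket notation."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "[" + label + " " + " ".join(bracketed(c) for c in children) + "]"


print(bracketed(tree_a))
print(bracketed(tree_b))
print(leaves(tree_a) == leaves(tree_b))  # True: same string, different structure
```

Both trees yield the same string of words, which is why the sentence is structurally ambiguous.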

Page 35: The Semantics and Pragmatics of Natural Language

35

Formal Language_IV

5. Language

- a set of strings of symbols from an alphabet.

6. Natural Language or ordinary language

- open-ended = built on 3 different knowledge components: the

sound of words - phonology; the meaning of words -

semantics; the grammatical rules according to which words are

put together - syntax.

7. Formal language

- a set L of sequences/strings over some finite alphabet Σ

- described using a formal grammar (a set of rules that specifies

which strings belong to it).

- many applications (e.g., Prognosis wearable system)

Page 36: The Semantics and Pragmatics of Natural Language

36

Formal Language_V

Context-Free Grammars (CFG) - a finite set of grammar rules

https://www.tutorialspoint.com/automata_theory/context_free_grammar_introduction.htm

= a quadruple (N, T, P, S), where:

N = a finite set of non-terminal symbols (characters or variables).

Note! Each n ∈ N = a type of phrase/clause in the sentence.

T = a finite set of terminals (an alphabet, defined by the grammar), disjoint from N: N ∩ T = ∅.

P = a finite set of (rewrite) rules or productions of the grammar, mapping N to (N ∪ T)*:

P: N → (N ∪ T)*

Note! The left-hand side of a production rule in P does not have any right context or left

context. * = Kleene star operation = a unary operation on sets of strings or sets of symbols or

characters → applied to a set V, it is written V* (also used in regular expressions).

Ex: {"a", "b", "c"}* = {ε, "a", "b", "c", "aa", "ab",

"ac", "ba", "bb", "bc", "ca", "cb", "cc", "aaa", "aab",

...}

ε = the empty string; {ε} = the language consisting only of the empty string.

S = start symbol (S ∈ N), used to represent the whole sentence.
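
A small sketch can make the quadruple (N, T, P, S) and the Kleene star concrete. Everything below is an illustrative assumption rather than part of the course material: the toy grammar, its category names, and the helpers `kleene_star`, `count_parses` and `count_splits`. The parse counter is a naive recursive procedure that only works for grammars without ε-rules and without cycles of unit rules.

```python
from itertools import product


def kleene_star(symbols, max_len):
    """Strings of V* up to max_len, shortest first; '' plays the role of ε."""
    result = []
    for n in range(max_len + 1):
        for combo in product(sorted(symbols), repeat=n):
            result.append("".join(combo))
    return result


print(kleene_star({"a", "b", "c"}, 2))

# Toy grammar G = (N, T, P, S); the rules are a hypothetical example.
N = {"S", "NP", "VP", "PP", "Det", "Noun", "Verb", "Prep"}
T = {"I", "saw", "the", "gentleman", "with", "binoculars"}
P = {
    "S": [["NP", "VP"]],
    "NP": [["I"], ["Det", "Noun"], ["Det", "Noun", "PP"]],
    "VP": [["Verb", "NP"], ["Verb", "NP", "PP"]],
    "PP": [["Prep", "NP"]],
    "Det": [["the"]],
    "Noun": [["gentleman"], ["binoculars"]],
    "Verb": [["saw"]],
    "Prep": [["with"]],
}
S = "S"

# Well-formedness: N and T are disjoint, every rule rewrites a
# non-terminal into a string over N ∪ T.
assert N.isdisjoint(T) and S in N
assert all(lhs in N and all(x in N | T for x in rhs)
           for lhs, alternatives in P.items() for rhs in alternatives)


def count_parses(symbol, tokens):
    """Number of distinct derivations of the token tuple from symbol."""
    if symbol in T:
        return 1 if tokens == (symbol,) else 0
    return sum(count_splits(rhs, tokens) for rhs in P[symbol])


def count_splits(rhs, tokens):
    """Number of ways the symbol sequence rhs can derive the token tuple."""
    if not rhs:
        return 1 if not tokens else 0
    first, rest = rhs[0], rhs[1:]
    total = 0
    # Every symbol must cover at least one token (no ε-rules).
    for i in range(1, len(tokens) - len(rest) + 1):
        left = count_parses(first, tokens[:i])
        if left:
            total += left * count_splits(rest, tokens[i:])
    return total


sentence = tuple("I saw the gentleman with the binoculars".split())
print(count_parses(S, sentence))  # 2: one derivation per attachment of the PP
```

The two derivations found for the sentence correspond to attaching "with the binoculars" either to the verb phrase or to the noun phrase.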

Page 37: The Semantics and Pragmatics of Natural Language

37

Main Concepts - II

CONCLUSIONS

Computational semantics and pragmatics:

➢ automatic construction of semantic representations for NL

expressions (in context).

➢ automatic inferences over the representations.

Major Issues:

➢ Ambiguity at various levels:

lexical, syntactic, semantic, pragmatic

➢ Interface between the logical form (LF) derived from linguistic form and the context of use

(essential for modelling anaphora).

Tools used include:

➢ Information: syntax, world knowledge, lexical semantics,

corpora…

➢ Inference: logic (model checkers and theorem proving), machine

learning, statistics…
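
As a rough illustration of "inference over representations", the sketch below evaluates simple first-order formulas against a small finite model, which is the basic job of a model checker. The model, the tuple encoding of formulas and the function name `holds` are all assumptions made for this example; real systems use dedicated model builders and theorem provers.

```python
# A finite model: a domain of entities and the extension of each predicate.
domain = {"john", "mary", "hat1"}
interp = {
    "gentleman": {("john",)},
    "hat": {("hat1",)},
    "wears": {("john", "hat1")},
}


def holds(formula, assignment=None):
    """Evaluate a formula, given as nested tuples, in the model above.

    Atomic formulas look like ("wears", "x", "hat1"); terms that are bound
    variables are looked up in the assignment, all others are constants.
    """
    g = assignment or {}
    op = formula[0]
    if op == "not":
        return not holds(formula[1], g)
    if op == "and":
        return holds(formula[1], g) and holds(formula[2], g)
    if op == "implies":
        return (not holds(formula[1], g)) or holds(formula[2], g)
    if op == "exists":
        _, var, body = formula
        return any(holds(body, {**g, var: d}) for d in domain)
    if op == "forall":
        _, var, body = formula
        return all(holds(body, {**g, var: d}) for d in domain)
    args = tuple(g.get(term, term) for term in formula[1:])
    return args in interp[op]


def wears_a_hat(term):
    """Representation of '<term> wears a hat'."""
    return ("exists", "y", ("and", ("hat", "y"), ("wears", term, "y")))


# "Every gentleman wears a hat."
every_gentleman = ("forall", "x", ("implies", ("gentleman", "x"), wears_a_hat("x")))

print(holds(every_gentleman))      # True in this model
print(holds(wears_a_hat("mary")))  # False in this model
```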

Page 38: The Semantics and Pragmatics of Natural Language

38

Semester Homework:

1. Each student has to present a paper, published

between 2018 and 2021, related to the SEMEVAL task

that guides his/her final project

- https://aclweb.org/anthology/

EMNLP (Empirical Methods in Natural Language

Processing)

ACL (Association for Computational Linguistics)

EACL (European Chapter of the Association for

Computational Linguistics)

COLING (International Conference on

Computational Linguistics) …

Page 39: The Semantics and Pragmatics of Natural Language

39

Final project: SEMEVAL 2022

Groups of 2-3 students:

- 1-2 humanists & 1 computer scientist prepare a paper

for SEMEVAL-2022 based on their research,

under constant supervision -

https://semeval.github.io/SemEval2022/tasks

Page 40: The Semantics and Pragmatics of Natural Language

40

Project steps – next time

1. Form a team...

2. Choose a task

3. Define the teamwork

4. Establish the modular structure

5. Edit the paper – a possible structure

Page 41: The Semantics and Pragmatics of Natural Language

41

5. Edit the paper – making an outline

* Choosing a Title

* Abstract (executive summary) & Keywords

* Introduction (the new approach; background

information; research problem/question; theoretical

framework)

* SOTA (citation tracking; content alert services;

evaluating sources; primary sources; secondary sources…)

* Methodology (qualitative methods; quantitative

methods)

* Results

* Discussion

* Conclusions and future work

* References

Page 42: The Semantics and Pragmatics of Natural Language

Thank you!

42