Dr. Muhammad S Khan ([email protected]) Ashton Building, Room G22 http://www.csc.liv.ac.uk/~khan/comp218 Decision, Computation and Language Introduction

Module Specifications

Timetable: 3 lectures per week, for 10 weeks.

Module website: http://cgi.csc.liv.ac.uk/~khan/comp218

Assessment: Written exam: 80%.

Two class tests: 10% each

Unassessed exercises from time to time; bring paper

Some handout notes, available later

M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 2

Aims

This module aims to:

Introduce formal concepts of automata, grammars and languages.

Introduce ideas of computability and decidability.

Illustrate the importance of automata, formal language theory and general models of computation in Computer Science and Artificial Intelligence.

Languages

Spoken languages and programming languages: in both cases we like to know whether a sequence of symbols belongs to the language.

Formal Languages have the property that there is a precise rule that governs what strings belong to the language.

Formal languages include programming languages, database query languages and various file formats (so, in the world of computers, they are everywhere).

By contrast, English, French etc. are not formal languages, although you can still try to write down rules that work most of the time.

“valid” English sentences?

letters: a, b, c, ..., z (no problem)

words: dog, house, therefore, fine, ... Mostly we can agree on what words are real, but there are too many to write down in a list.

sentences? (e.g. “Colourless green ideas sleep furiously.”, composed by Noam Chomsky as an example of a grammatical but nonsensical sentence)

But: “Furiously sleep ideas green colorless.” is not a sentence! (see Wikipedia)

Can we write down a specification of a large collection of valid sentences...?

How we usually describe or teach syntax

Informal descriptions:

“An arithmetic expression is constructed from variables and numbers, and infix operators +, -, *, /, and sub-expressions may possibly be enclosed in parentheses, in which case every ( must have a corresponding ) ...”

“A comment begins with /* and ends with */”

“your password should have 4-8 characters and contain a non-alphabetic character”

How we usually describe or teach syntax

By examples:

Let E denote the set of arithmetic expressions

E = {x, 1 + (2 - x), x * y + ((z)), …}

a42 is a valid variable name; 42a is not, because a variable name can't start with a number.

(The variable-name description combines an example with an informal rule.) We need some notation to express these descriptions more precisely!
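One way to make the variable-name rule precise is a regular expression. The following is a minimal sketch in Python, not the module's own notation. It assumes that a name is a letter followed by letters or digits; the slide only says that a name cannot start with a number, so the rest of the rule and the helper name `is_variable_name` are assumptions for illustration.

```python
import re

# Assumed rule: one letter, then any number of letters or digits.
VARIABLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_variable_name(s: str) -> bool:
    """Return True if the whole string matches the assumed rule."""
    return VARIABLE_NAME.fullmatch(s) is not None

print(is_variable_name("a42"))  # True
print(is_variable_name("42a"))  # False
```

Unlike the informal description, the pattern leaves no doubt about edge cases: under this rule the empty string is not a valid name.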

An analogy

COMP109: propositional logic. Express facts about the world in formal logic. Why? So we can follow computational procedures to draw inferences from them.

COMP218: notations for representing formal languages.

These give us:

ways to define languages precisely

ways to build compilers that recognise the languages

ways to check whether a string of symbols belongs to a language, and whether two alternative descriptions of languages are actually the same language

What COMP218 is about

Tools to analyse languages

These tools originate in the analysis of natural (spoken) language as well as programming languages.

Natural language: want to recognise valid sentences

Programming language: want to recognise valid programs

There is no length limit on sentences, so you can't list them. A list is infeasible even if you limit the length to some “reasonable” amount (e.g. 50 words), and it would not be very enlightening either.

What COMP218 is about (continued)

Some notations/methods for describing a language (other than explicitly listing it) include

finite automaton (example on board)

regular expressions

Backus-Naur form

context-free grammar (example on next slide)

We will see that some of the above can describe more languages than others.

Grammars

Sets of rules for generating syntactically correct programs/sentences.

A grammar for generating some English sentences:

Observations

The grammar can generate sentences like “the dog sees the cat and the mouse leaves the house”, but also sentences that are nonsense, e.g. “the house sees me”.

Arbitrarily long sentences can be generated (which you can't do by enumeration!)

Semantics can be given with reference to the grammar, e.g. logical conjunction of sub-sentences joined by the word “and”

The grammar could be extended to handle subordinate clauses, adverbs etc.

You can't define natural language sentences completely this way, but you can for programming languages

Parse Tree

Extending the grammar

Suppose we add these rules:

<S> → either <S> or <S>

<S> → if <S> then <S>

Do we run into problems?

Back to programming languages

Stages of compilation:

lexical analysis: divide sequence of characters into tokens, such as variable names, operators, labels. In a natural language tokens are strings of consecutive letters (easy to recognise!)

parsing: identify relationships between tokens

code generation: generate object code

code optimisation

Lexical Analysis

pay = salary + (overtimerate*overtime);

Break into tokens as follows: pay

=

salary

+

(

overtimerate

*

overtime

)

;
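The token split above can be reproduced with a small regular-expression lexer. This is only a sketch in Python: the token classes (`NAME`, `OP`, `PUNCT`, `SKIP`) and the function name `tokenize` are hypothetical choices made for this one statement, and a real compiler's lexical analyser recognises many more token kinds.

```python
import re

# Hypothetical token classes for this one example; a real lexer has many more.
TOKEN_SPEC = [
    ("NAME", r"[A-Za-z][A-Za-z0-9]*"),
    ("OP", r"[=+\-*/]"),
    ("PUNCT", r"[();]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text: str) -> list[str]:
    """Split text into tokens, dropping whitespace; reject unknown characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append(match.group())
        pos = match.end()
    return tokens

print(tokenize("pay = salary + (overtimerate*overtime);"))
# ['pay', '=', 'salary', '+', '(', 'overtimerate', '*', 'overtime', ')', ';']
```

Each token class is itself described by a regular expression, which is why lexical analysis is a standard application of the finite-automaton material in this module.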

Definitions and notation

An alphabet is a finite set of symbols.

A word over alphabet A is a string of symbols belonging to A.

The empty word will be denoted 𝜖

A* denotes the set of all words over A

A+ denotes the set of all non-empty words over A

Definitions and notation

The concatenation (a.k.a. "product") of two words is obtained by appending them together to form one long word.

Concatenation of words $w_1$ and $w_2$ can be written as $w_1 w_2$.

For any word $w$, note that $w\epsilon = \epsilon w = w$.

Concatenation is associative. $w^n$ denotes the concatenation of $n$ copies of $w$.

$|w|$ denotes the length (number of letters) of $w$.

$|w_1 w_2| = |w_1| + |w_2|$

Definitions and notation

For $w \in A^*$, the reverse of $w$ is denoted $w^R$ and consists of $w$'s letters in reverse order.

A palindrome is a word $w$ satisfying $w = w^R$.

If $u$, $v$, $w$ are words and $w = uv$, then $u$ is a prefix of $w$ and $v$ is a suffix of $w$. A proper prefix of $w$ is a prefix that is not equal to $\epsilon$ or $w$. (Similarly for proper suffix.)
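The definitions on the last few slides can be tried out directly. The sketch below is in Python and assumes that words are modelled as `str` values and the empty word as `""`. The helper names are hypothetical, and `proper_prefixes` follows the definition given here, which excludes both the empty word and the word itself.

```python
def reverse(w: str) -> str:
    """Return the reverse of w: its letters in reverse order."""
    return w[::-1]

def is_palindrome(w: str) -> bool:
    """A palindrome is a word equal to its own reverse."""
    return w == reverse(w)

def prefixes(w: str) -> list[str]:
    """All prefixes of w, from the empty word up to w itself."""
    return [w[:i] for i in range(len(w) + 1)]

def proper_prefixes(w: str) -> list[str]:
    """Prefixes other than the empty word and w itself."""
    return [u for u in prefixes(w) if u not in ("", w)]

w1, w2 = "ab", "cab"
assert len(w1 + w2) == len(w1) + len(w2)  # length of a concatenation
assert "" + w1 == w1 + "" == w1           # the empty word is an identity
assert w1 * 3 == "ababab"                 # three copies of w1

print(is_palindrome("abba"), is_palindrome("abc"))  # True False
print(proper_prefixes("abc"))                       # ['a', 'ab']
```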

Languages

A language (or formal language) over alphabet $A$ is a subset of $A^*$. We can express new languages in terms of other languages using concatenation and closure.

$L_1 L_2 = \{ w_1 w_2 : w_1 \in L_1 \text{ and } w_2 \in L_2 \}$

$L^* = \{ w_1 w_2 \dots w_n : n \ge 0 \text{ and } w_1, w_2, \dots, w_n \in L \}$
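For finite languages, both operations can be computed directly. The sketch below is in Python and represents a language as a `set` of strings. The closure of a language is infinite as soon as the language contains a non-empty word, so the hypothetical helper `closure_up_to` builds only the words made of at most `max_pieces` pieces; that bound is an assumption added for illustration and is not part of the definition.

```python
from itertools import product

def concat(l1: set[str], l2: set[str]) -> set[str]:
    """All words w1w2 with w1 in l1 and w2 in l2."""
    return {w1 + w2 for w1, w2 in product(l1, l2)}

def closure_up_to(lang: set[str], max_pieces: int) -> set[str]:
    """Words w1 w2 ... wn with 0 <= n <= max_pieces and every wi in lang.

    This is a finite approximation of the closure, which is usually infinite.
    """
    result = {""}   # n = 0 contributes the empty word
    current = {""}
    for _ in range(max_pieces):
        current = concat(current, lang)
        result |= current
    return result

print(sorted(concat({"a", "ab"}, {"b", ""})))      # ['a', 'ab', 'abb']
print(sorted(closure_up_to({"ab"}, 3), key=len))   # ['', 'ab', 'abab', 'ababab']
```

Note that the concatenation of two languages with two words each can have fewer than four words, because different pairs may produce the same word (here "a" + "b" and "ab" + "" both give "ab").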

What is Automata Theory?

Study of abstract computing devices, or “machines”

Automaton = an abstract computing device

Note: A “device” need not even be physical hardware!

A fundamental question in computer science:

Find out what different models of machines can do and cannot do

The theory of computation

Computability vs. Complexity

Alan Turing (1912-1954)

(A pioneer of automata theory)

Father of Modern Computer Science

English mathematician

Studied abstract machines called Turing machines even before computers existed

Heard of the Turing test?

Automata Theory

Automata theory: the study of abstract computing devices, or “machines”.

Before computers existed (1930s), Alan Turing studied an abstract machine (the Turing machine) that had all the capabilities of today’s computers, as far as what they could compute. His goal was to describe precisely the boundary between what a computing machine could do and what it could not do.

Simpler kinds of machines (finite automata) were studied by a number of researchers and proved useful for a variety of purposes.

Theoretical developments bear directly on what computer scientists do today:

Finite automata, formal grammars: design and construction of software

Turing machines: help us understand what we can expect from software

Theory of intractable problems: are we likely to be able to write a program to solve a given problem? Or should we try an approximation, a heuristic...?

Why Study Automata Theory?

Finite automata are a useful model for many important kinds of software and hardware:

1. Software for designing and checking the behaviour of digital circuits

2. The lexical analyser of a typical compiler, that is, the compiler component that breaks the input text into logical units

3. Software for scanning large bodies of text, such as collections of Web pages, to find occurrences of words, phrases or other patterns

4. Software for verifying systems of all types that have a finite number of distinct states, such as communications protocols or protocols for the secure exchange of information

The Central Concepts of Automata Theory

Theory of Computation: A Historical Perspective

1930s

• Alan Turing studies Turing machines

• Decidability

• Halting problem

1940s-1950s

• “Finite automata” machines studied

• Noam Chomsky proposes the “Chomsky Hierarchy” for formal languages

1969

• Cook introduces “intractable” problems or “NP-Hard” problems

1970-

• Modern computer science: compilers, computational & complexity theory evolve

Languages & Grammars

Languages: “A language is a collection of sentences of finite length all constructed from a finite alphabet of symbols” (“sentences” may also be read as “words”)

Grammars: “A grammar can be regarded as a device that enumerates the sentences of a language” - nothing more, nothing less

N. Chomsky, Information and Control, Vol 2, 1959

Image source: Nowak et al. Nature, vol 417, 2002

The Chomsky Hierarchy

A containment hierarchy of classes of formal languages

Regular (DFA) ⊂ Context-free (PDA) ⊂ Context-sensitive (LBA) ⊂ Recursively enumerable (TM)

The Central Concepts of Automata Theory

Alphabet

An alphabet is a finite, non-empty set of symbols

We use the symbol ∑ (sigma) to denote an alphabet

Examples:

Binary: ∑ = {0,1}

All lower case letters: ∑ = {a, b, c, ..., z}

Alphanumeric: ∑ = {a-z, A-Z, 0-9}

DNA molecule letters: ∑ = {a,c,g,t}

Strings

A string (or sometimes a word) is a finite sequence of symbols chosen from some alphabet.

Example: 01101 and 111 are strings from the binary alphabet ∑ = {0,1}

Empty string: the string with zero occurrences of symbols. This string is denoted by ε and may be chosen from any alphabet whatsoever.

Length of a string: the number of positions for symbols in the string

Example: 01101 has length 5. There are only two symbols (0 and 1) in the string 01101, but 5 positions for symbols.

Notation for the length of w: |w|

Example: |011| = 3 and |ε| = 0

Strings

A string or word is a finite sequence of symbols chosen from ∑

Empty string is ε (or “epsilon”)

Length of a string w, denoted by “|w|”, is equal to the number of (non-ε) characters in the string

E.g., x = 010100, |x| = 6

x = 01ε0ε1ε00, |x| = ?

xy = concatenation of two strings x and y

Powers of an alphabet

If ∑ is an alphabet, we can express the set of all strings of a certain length from that alphabet by using exponential notation:

$\Sigma^k$: the set of strings of length $k$, each of whose symbols is in $\Sigma$

Examples:

$\Sigma^0 = \{\epsilon\}$, regardless of what alphabet $\Sigma$ is. That is, $\epsilon$ is the only string of length 0.

If $\Sigma = \{0, 1\}$, then:

1. $\Sigma^1 = \{0, 1\}$

2. $\Sigma^2 = \{00, 01, 10, 11\}$

3. $\Sigma^3 = \{000, 001, 010, 011, 100, 101, 110, 111\}$

Note: do not confuse $\Sigma$ with $\Sigma^1$:

1. $\Sigma$ is an alphabet; its members 0 and 1 are symbols

2. $\Sigma^1$ is a set of strings; its members are strings (each one of length 1)
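The powers of an alphabet can be generated mechanically. The sketch below is in Python and assumes that the alphabet is passed as a string of single-character symbols; the function name `power` is a hypothetical choice. It uses `itertools.product`, which enumerates all tuples of a given length.

```python
from itertools import product

def power(alphabet: str, k: int) -> list[str]:
    """Return all strings of length k over the given alphabet."""
    return ["".join(symbols) for symbols in product(alphabet, repeat=k)]

sigma = "01"
print(power(sigma, 0))       # ['']
print(power(sigma, 2))       # ['00', '01', '10', '11']
print(len(power(sigma, 3)))  # 8
```

For an alphabet with m symbols, the k-th power has m^k strings; the case k = 0 gives exactly one string, the empty string.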

Kleene star

Let ∑ be an alphabet.

$\Sigma^*$: the set of all strings over the alphabet $\Sigma$

$\{0, 1\}^* = \{\epsilon, 0, 1, 00, 01, 10, 11, 000, \dots\}$

$\Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \cdots$

The symbol $*$ is called the Kleene star and is named after the mathematician and logician Stephen Cole Kleene.

$\Sigma^+ = \Sigma^1 \cup \Sigma^2 \cup \Sigma^3 \cup \cdots$. Thus $\Sigma^* = \Sigma^+ \cup \{\epsilon\}$.

Here $\Sigma^k$ is the set of all strings of length $k$ over $\Sigma$.
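The set of all strings over an alphabet is infinite, so it cannot be stored, but it can be enumerated one string at a time in order of increasing length. The sketch below is in Python and assumes that the alphabet is given as a string of single-character symbols; the generator name `kleene_star` is a hypothetical choice.

```python
from itertools import count, islice, product

def kleene_star(alphabet: str):
    """Yield every string over the alphabet in order of increasing length."""
    for k in count(0):
        for symbols in product(alphabet, repeat=k):
            yield "".join(symbols)

print(list(islice(kleene_star("01"), 8)))
# ['', '0', '1', '00', '01', '10', '11', '000']
```

Skipping the first string produced (the empty string) gives an enumeration of the non-empty strings instead.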

Concatenation

Languages

Other language examples

operators on languages: Union

operators on languages: Concatenation

Important operators on languages: Closure

Languages

L is said to be a language over alphabet ∑ only if L ⊆ ∑*

this is because ∑* is the set of all strings (of all possible length including 0) over the given alphabet ∑

Examples:

1. Let L be the language of all strings consisting of n 0’s followed by n 1’s: L = {ε, 01, 0011, 000111, …}

2. Let L be the language of all strings with an equal number of 0’s and 1’s:

L = {ε, 01, 10, 0011, 1100, 0101, 1010, 1001, …}

Definition: Ø denotes the empty language

Let L = {ε}; is L = Ø?

No: {ε} contains one string (the empty string), whereas Ø contains no strings at all.

The Membership Problem

Given a string w ∈ ∑* and a language L over ∑, decide whether or not w ∈ L.

Example:

Let w = 100011

Q) Is w in the language of strings with an equal number of 0s and 1s?
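For the two example languages from the earlier Languages slide, the membership problem can be decided by a short program. The sketch below is in Python; the function names are hypothetical, and the code is one straightforward way to decide membership, not a method taken from the slides.

```python
def has_equal_zeros_and_ones(w: str) -> bool:
    """Membership test for the strings over {0, 1} with as many 0s as 1s."""
    if set(w) - {"0", "1"}:
        raise ValueError("w must be a string over the alphabet {0, 1}")
    return w.count("0") == w.count("1")

def is_zeros_then_ones(w: str) -> bool:
    """Membership test for the strings made of n 0s followed by n 1s, n >= 0."""
    n, remainder = divmod(len(w), 2)
    return remainder == 0 and w == "0" * n + "1" * n

print(has_equal_zeros_and_ones("100011"))  # True
print(is_zeros_then_ones("100011"))        # False
print(is_zeros_then_ones("000111"))        # True
```

So w = 100011 belongs to the second example language (three 0s and three 1s) but not to the first, because its 0s do not all come before its 1s.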

Formal Proofs

Deductive Proofs

From the given statement(s) to a conclusion statement (what we want to prove)

Logical progression by direct implications

Example for parsing a statement:

“If $y \ge 4$, then $2^y \ge y^2$.”

(there are other ways of writing this).

Here “$y \ge 4$” is the given and “$2^y \ge y^2$” is the conclusion.

Example: Deductive proof

Claim 1: If $y \ge 4$, then $2^y \ge y^2$.

Let x be any number which is obtained by adding the squares of 4 positive integers.

Given x and assuming that Claim 1 is true, prove that $2^x \ge x^2$.

Proof:

1) Given: $x = a^2 + b^2 + c^2 + d^2$

2) Given: $a \ge 1$, $b \ge 1$, $c \ge 1$, $d \ge 1$

3) $a^2 \ge 1$, $b^2 \ge 1$, $c^2 \ge 1$, $d^2 \ge 1$ (by 2)

4) $x \ge 4$ (by 1 & 3)

5) $2^x \ge x^2$ (by 4 and Claim 1)

“implies” or “follows”

Quantifiers

“For all” or “For every”: used in universal proofs

Notation: ∀

“There exists”: used in existential proofs

Notation: ∃

Implication is denoted by =>. E.g., “IF A THEN B” can also be written as “A => B”

Proving techniques

By contradiction

Start with the statement contradictory to the given statement

E.g., to prove (A => B), we start with (A and ~B) … and then show that this could never happen

What if you want to prove that “(A and B => C or D)”?

By induction (3 steps)

Basis, inductive hypothesis, inductive step

By contrapositive statement

If A then B ≡ If ~B then ~A
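Two of these techniques rest on propositional equivalences that can be checked exhaustively, since A and B each take only two truth values. The sketch below is in Python; the helper name `implies` is a hypothetical choice. Because it covers every assignment, the check is a complete verification of the two equivalences, not merely a set of examples.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """'If a then b' is false only when a is true and b is false."""
    return (not a) or b

for a, b in product([False, True], repeat=2):
    # Contrapositive: (A => B) has the same truth value as (~B => ~A).
    assert implies(a, b) == implies(not b, not a)
    # Contradiction: (A => B) fails exactly when (A and ~B) holds.
    assert implies(a, b) == (not (a and not b))

print("both equivalences hold for all four assignments")
```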

Proving techniques…

By counter-example

Show an example that disproves the claim

Note: There is no such thing as a “proof by example”!

So when asked to prove a claim, an example that satisfies the claim is not a proof

Different ways of saying the same thing

“If H then C”:

i. H implies C

ii. H => C

iii. C if H

iv. H only if C

v. Whenever H holds, C follows

“If-and-Only-If” statements

“A if and only if B” (A <==> B)

(if part) if B then A ( <= )

(only if part) A only if B ( => ) (same as “if A then B”)

“If and only if” is abbreviated as “iff” i.e., “A iff B”

Example: Theorem: Let x be a real number. Then floor of x = ceiling of x if and only if x is an integer.

Proofs for iff have two parts: one for the “if part” and another for the “only if part”
