Dr. Muhammad S Khan
Ashton Building, Room G22
http://www.csc.liv.ac.uk/~khan/comp218
Decision, Computation and Language
Introduction
Module Specifications
Timetable: 3 lectures per week, for 10 weeks.
Module website: http://cgi.csc.liv.ac.uk/~khan/comp218
Assessment: Written exam: 80%.
Two class tests: 10% each
Unassessed exercises from time to time; bring paper
Some handout notes, available later
M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 2
Aims
This module aims to:
Introduce formal concepts of automata, grammars and languages.
Introduce ideas of computability and decidability.
Illustrate the importance of automata, formal language theory and general models of computation in Computer Science and Artificial Intelligence.
Languages
Spoken languages and programming languages: in both cases we like to know whether a sequence of symbols belongs to the language.
Formal Languages have the property that there is a precise rule that governs what strings belong to the language.
Formal languages include programming languages, database query languages and various file formats (so, in the world of computers, they are everywhere...).
By contrast, English, French etc. are not formal languages, although you can still try to write down rules that work most of the time.
“valid” English sentences?
letters: a, b, c, ..., z (no problem)
words: dog, house, therefore, fine, ... (mostly we can agree on which words are real, but there are too many to write down in a list)
sentences? (e.g. “Colourless green ideas sleep furiously.”, composed by Noam Chomsky as an example of a grammatical but nonsensical sentence)
But “Furiously sleep ideas green colourless.” is not a sentence! (see Wikipedia)
Can we write down a specification of a large collection of valid sentences...?
How we usually describe or teach syntax
Informal descriptions:
“An arithmetic expression is constructed from variables and numbers, and infix operators +, -, *, /, and sub-expressions may possibly be enclosed in parentheses, in which case every ( must have a corresponding ) ...”
“A comment begins with /* and ends with */”
“your password should have 4-8 characters and contain a non-alphabetic character”
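To see why informal descriptions need pinning down, here is a minimal Python sketch of one possible reading of the password rule. The function name `valid_password` is hypothetical, and the sketch assumes that "4-8 characters" means inclusive bounds and that "non-alphabetic" means any character for which `str.isalpha()` is false (so accented letters count as alphabetic). The English sentence alone does not settle either choice.

```python
def valid_password(pw: str) -> bool:
    """One possible precise reading of the informal password rule (an assumption):
    between 4 and 8 characters inclusive, with at least one non-alphabetic character."""
    right_length = 4 <= len(pw) <= 8
    has_non_alpha = any(not ch.isalpha() for ch in pw)
    return right_length and has_non_alpha

print(valid_password("abc1"))       # True
print(valid_password("abcdefgh"))   # False: every character is alphabetic
print(valid_password("a1"))         # False: too short
```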
How we usually describe or teach syntax
By examples:
Let E denote the set of arithmetic expressions
E = {x, 1 + (2 - x), x * y + ((z)), …}
a42 is a valid variable name; 42a is not, because a variable name can't start with a number.
(The variable-name description combines an example with an informal rule.) We need some notation to express these descriptions more precisely!
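One way to state the variable-name rule precisely is a regular expression. The sketch below assumes a variable name is one ASCII letter followed by any number of ASCII letters or digits; the slide only says a name cannot start with a number, so the exact character set (underscores, Unicode letters) is an assumption, as is the helper name `is_variable_name`.

```python
import re

# Assumed rule: one ASCII letter, then any number of ASCII letters or digits.
VARIABLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_variable_name(s: str) -> bool:
    # fullmatch succeeds only if the whole string matches, not just a prefix
    return VARIABLE_NAME.fullmatch(s) is not None

print(is_variable_name("a42"))  # True
print(is_variable_name("42a"))  # False: starts with a digit
```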
An analogy
COMP109: propositional logic. Express facts about the world in formal logic. Why? So we can follow computational procedures to draw inferences from them.
COMP218: notations for representing formal languages.
These give us:
ways to define them precisely
ways to build compilers that recognise the languages
ways to check whether a string of symbols belongs to a language
ways to check whether two alternative descriptions of languages actually describe the same language
What COMP218 is about
Tools to analyse languages
These tools originate in the analysis of natural (spoken) languages as well as programming languages.
Natural language: want to recognise valid sentences
Programming language: want to recognise valid programs
There is no length limit on sentences, so you can't list them. A list is infeasible even if you limit the length to some “reasonable” amount (e.g. 50 words), and it would not be very enlightening either.
What COMP218 is about (continued)
Some notations/methods for describing a language (other than explicitly listing it) include
finite automaton (example on board)
regular expressions
Backus-Naur form
context-free grammar (example on next slide)
We will see that some of the above can describe more languages than others.
Grammars
Sets of rules for generating syntactically correct programs/sentences.
A grammar for generating some English sentences:
Observations
The grammar can generate sentences like “the dog sees the cat and the mouse leaves the house”, as well as grammatical nonsense, e.g. “the house sees me”.
Arbitrarily long sentences can be generated (which you can't do by enumeration!)
Semantics can be given with reference to the grammar, e.g. sub-sentences joined by the word “and” denote their logical conjunction
The grammar could be extended to handle subordinate clauses, adverbs etc.
You can't define natural language sentences completely this way, but you can for programming languages
Extending the grammar
Suppose we add these rules:
<S> → either <S> or <S>
<S> → if <S> then <S>
Do we run into problems?
Back to programming languages
Stages of compilation:
lexical analysis: divide the sequence of characters into tokens, such as variable names, operators, labels. In a natural language, tokens are strings of consecutive letters (easy to recognise!)
parsing: identify relationships between tokens
code generation: generate object code
code optimisation
Lexical Analysis
pay = salary + (overtimerate*overtime);
Break into tokens as follows:
pay
=
salary
+
(
overtimerate
*
overtime
)
;
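A lexical analyser for this statement can be sketched with Python's `re` module. The token classes below (identifiers, integer literals, and the single-character symbols `= + - * / ( ) ;`) are assumptions chosen to cover this example, not the rules of any particular language, and `tokenize` is a hypothetical name.

```python
import re

# Assumed token classes: identifiers, integer literals, single-character symbols.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*|[0-9]+|[=+\-*/();]")

def tokenize(text: str) -> list[str]:
    """Split text into tokens, skipping whitespace between them."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1                 # whitespace separates tokens but is not a token
            continue
        m = TOKEN.match(text, pos)   # try to match a token starting exactly at pos
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens

print(tokenize("pay = salary + (overtimerate*overtime);"))
# ['pay', '=', 'salary', '+', '(', 'overtimerate', '*', 'overtime', ')', ';']
```

The identifier alternative is greedy, so `overtimerate` comes out as one token rather than `overtime` followed by `rate`.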
Definitions and notation
An alphabet is a finite set of symbols.
A word over alphabet A is a string of symbols belonging to A.
The empty word will be denoted 𝜖
A* denotes the set of all words over A
A+ denotes the set of all non-empty words over A
Definitions and notation
The concatenation (a.k.a. "product") of two words is obtained by appending them together to form one long word.
Concatenation of words w1 and w2 can be written as w1w2.
For any word w, note that 𝑤𝜖 = 𝜖𝑤 = 𝑤.
Concatenation is associative. $w^n$ denotes the concatenation of n copies of w.
|w| denotes the length (number of letters) of w.
| w1w2 | = | w1| + | w2 |
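In Python, words can be modelled as `str` values, with the empty word 𝜖 as `""` and concatenation as `+`. The sketch below checks the identities above on sample words; the helper `power` is a hypothetical name (Python's built-in `w * n` computes the same thing).

```python
EPSILON = ""  # the empty word

def power(w: str, n: int) -> str:
    """w^n: the concatenation of n copies of w, with w^0 = the empty word."""
    result = EPSILON
    for _ in range(n):
        result = result + w  # concatenate one more copy of w
    return result

w1, w2, w3 = "ab", "bca", "c"
assert w1 + EPSILON == EPSILON + w1 == w1        # w·eps = eps·w = w
assert (w1 + w2) + w3 == w1 + (w2 + w3)          # concatenation is associative
assert len(w1 + w2) == len(w1) + len(w2)         # |w1w2| = |w1| + |w2|
print(power("ab", 3))                            # ababab
```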
Definitions and notation
For $w \in A^*$, the reverse of w is denoted $w^R$ and consists of w's letters in reverse order.
A palindrome is a word w satisfying $w = w^R$.
If u, v, w are words and w = uv, then u is a prefix of w and v is a suffix of w. A proper prefix of w is a prefix that is not equal to $\epsilon$ or w. (Similarly for proper suffix.)
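These definitions translate almost directly into Python string operations. The sketch below uses hypothetical helper names; `proper_prefixes` follows the slide's definition, excluding both the empty word and w itself.

```python
def reverse(w: str) -> str:
    """w^R: the letters of w in reverse order."""
    return w[::-1]

def is_palindrome(w: str) -> bool:
    return w == reverse(w)

def prefixes(w: str) -> list[str]:
    """Every u with w = uv for some v, shortest first (includes the empty word and w)."""
    return [w[:i] for i in range(len(w) + 1)]

def suffixes(w: str) -> list[str]:
    """Every v with w = uv for some u, longest first (includes w and the empty word)."""
    return [w[i:] for i in range(len(w) + 1)]

def proper_prefixes(w: str) -> list[str]:
    """Prefixes equal to neither the empty word nor w."""
    return [u for u in prefixes(w) if u not in ("", w)]

print(reverse("abc"), is_palindrome("abba"))  # cba True
print(prefixes("abc"))          # ['', 'a', 'ab', 'abc']
print(suffixes("abc"))          # ['abc', 'bc', 'c', '']
print(proper_prefixes("abc"))   # ['a', 'ab']
```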
Languages
A language (or formal language) over alphabet A is a subset of 𝐴∗. We can express new languages in terms of other languages using concatenation and closure.
$L_1 L_2 = \{ w_1 w_2 : w_1 \in L_1 \text{ and } w_2 \in L_2 \}$
$L^* = \{ w_1 w_2 \cdots w_n : n \ge 0 \text{ and } w_1, w_2, \ldots, w_n \in L \}$
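Finite languages can be modelled as Python sets of strings. `concat` follows the definition of L1L2 directly. Since L* is usually infinite, `star_up_to` (a hypothetical helper) only collects the words of L* up to a chosen length, by repeatedly appending words of L, starting from the empty word (the case n = 0).

```python
def concat(L1: set[str], L2: set[str]) -> set[str]:
    """L1L2 = {w1w2 : w1 in L1 and w2 in L2}."""
    return {w1 + w2 for w1 in L1 for w2 in L2}

def star_up_to(L: set[str], max_len: int) -> set[str]:
    """The words of L* whose length is at most max_len.
    L* itself is usually infinite, so only a bounded part is computed."""
    result = {""}    # n = 0: the empty word is always in L*
    frontier = {""}
    while frontier:
        # append one more word of L to each word found in the previous round
        frontier = {u + w for u in frontier for w in L
                    if len(u) + len(w) <= max_len} - result
        result |= frontier
    return result

print(sorted(concat({"a", "ab"}, {"b", ""})))   # ['a', 'ab', 'abb']
print(sorted(star_up_to({"ab"}, 6), key=len))   # ['', 'ab', 'abab', 'ababab']
```

Note that `star_up_to(set(), n)` returns `{""}`: the closure of the empty language still contains the empty word.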
What is Automata Theory?
Study of abstract computing devices, or “machines”
Automaton = an abstract computing device
Note: A “device” need not even be physical hardware!
A fundamental question in computer science:
Find out what different models of machines can do and cannot do
The theory of computation
Computability vs. Complexity
Alan Turing (1912-1954)
(A pioneer of automata theory)
Father of Modern Computer Science
English mathematician
Studied abstract machines called Turing machines even before computers existed
Heard of the Turing test?
Automata Theory
Automata theory: the study of abstract computing devices, or “machines”.
Before computers existed, in the 1930s, Alan Turing studied an abstract machine (the Turing machine) that had all the capabilities of today's computers (concerning what they could compute). His goal was to describe precisely the boundary between what a computing machine could do and what it could not do.
Simpler kinds of machines (finite automata) were studied by a number of researchers and found useful for a variety of purposes.
Theoretical developments bear directly on what computer scientists do today:
Finite automata, formal grammars: design/construction of software
Turing machines: help us understand what we can expect from software
Theory of intractable problems: are we likely to be able to write a program to solve a given problem? Or should we try an approximation, a heuristic...?
Why Study Automata Theory?
Finite automata are a useful model for many important kinds of software and hardware:
1. Software for designing and checking the behaviour of digital circuits
2. The lexical analyser of a typical compiler, that is, the compiler component that breaks the input text into logical units
3. Software for scanning large bodies of text, such as collections of Web pages, to find occurrences of words, phrases or other patterns
4. Software for verifying systems of all types that have a finite number of distinct states, such as communication protocols or protocols for the secure exchange of information
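As a small illustration of a system with a finite number of distinct states, here is a sketch of a deterministic finite automaton in Python. The machine itself is a hypothetical example, not one from the module: it accepts binary strings containing an even number of 1s (a parity check, the kind of logic found in digital circuits), and it needs only two states however long the input is.

```python
# Hypothetical DFA over {0, 1} accepting strings with an even number of 1s.
TRANSITIONS = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
START = "even"
ACCEPTING = {"even"}

def accepts(w: str) -> bool:
    state = START
    for symbol in w:
        if (state, symbol) not in TRANSITIONS:
            raise ValueError(f"{symbol!r} is not in the alphabet {{0, 1}}")
        state = TRANSITIONS[(state, symbol)]  # exactly one next state: deterministic
    return state in ACCEPTING

print(accepts("1001"))  # True: two 1s
print(accepts("111"))   # False: three 1s
```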
The Central Concepts of Automata Theory
Theory of Computation: A Historical Perspective
1930s: Alan Turing studies Turing machines
• Decidability
• Halting problem
1940s-1950s: “Finite automata” machines studied
• Noam Chomsky proposes the “Chomsky Hierarchy” for formal languages
1971: Cook introduces “intractable” problems, or “NP-hard” problems
1970s onwards: Modern computer science: compilers, computational & complexity theory evolve
Languages & Grammars
Languages: “A language is a collection of sentences of finite length all constructed from a finite alphabet of symbols.” (The “sentences” are also called “words”.)
Grammars: “A grammar can be regarded as a device that enumerates the sentences of a language” (nothing more, nothing less).
N. Chomsky, Information and Control, Vol 2, 1959
Image source: Nowak et al. Nature, vol 417, 2002
The Chomsky Hierarchy
A containment hierarchy of classes of formal languages
Figure (nested classes, innermost first, each with its machine model): Regular (DFA) ⊂ Context-free (PDA) ⊂ Context-sensitive (LBA) ⊂ Recursively enumerable (TM)
The Central Concepts of Automata Theory
Alphabet
An alphabet is a finite, non-empty set of symbols
We use the symbol ∑ (sigma) to denote an alphabet
Examples:
Binary: ∑ = {0,1}
All lower case letters: ∑ = {a, b, c, ..., z}
Alphanumeric: ∑ = {a-z, A-Z, 0-9}
DNA molecule letters: ∑ = {a,c,g,t}
…
Strings
A string (or sometimes a word) is a finite sequence of symbols chosen from some alphabet.
Example: 01101 and 111 are strings from the binary alphabet ∑ = {0, 1}.
Empty string: the string with zero occurrences of symbols. This string is denoted by 𝜖 and may be chosen from any alphabet whatsoever.
Length of a string: the number of positions for symbols in the string.
Example: 01101 has length 5.
There are only two symbols (0 and 1) in the string 01101, but 5 positions for symbols.
Notation for the length of w: |w|. Example: |011| = 3 and |𝜖| = 0.
Strings
A string or word is a finite sequence of symbols chosen from ∑
Empty string is 𝜖 (or “epsilon”)
Length of a string w, denoted by “|w|”, is equal to the number of (non-𝜖) characters in the string
E.g., x = 010100, |x| = 6
x = 01𝜖0𝜖1𝜖00, |x| = ?
xy = concatenation of two strings x and y
Powers of an alphabet
If ∑ is an alphabet, we can express the set of all strings of a certain length from that alphabet by using exponential notation:
$\Sigma^k$: the set of strings of length k, each of whose symbols is in $\Sigma$.
Examples:
$\Sigma^0 = \{\epsilon\}$, regardless of what alphabet $\Sigma$ is. That is, $\epsilon$ is the only string of length 0.
If $\Sigma = \{0, 1\}$, then:
1. $\Sigma^1 = \{0, 1\}$
2. $\Sigma^2 = \{00, 01, 10, 11\}$
3. $\Sigma^3 = \{000, 001, 010, 011, 100, 101, 110, 111\}$
Note: do not confuse $\Sigma$ with $\Sigma^1$:
1. $\Sigma$ is an alphabet; its members 0 and 1 are symbols.
2. $\Sigma^1$ is a set of strings; its members are strings (each one of length 1).
Kleene star
$\Sigma^*$: the set of all strings over an alphabet $\Sigma$. For example, $\{0, 1\}^* = \{\epsilon, 0, 1, 00, 01, 10, 11, 000, \ldots\}$.
The symbol * is called the Kleene star and is named after the mathematician and logician Stephen Cole Kleene.
Let $\Sigma$ be an alphabet, and let $\Sigma^k$ be the set of all strings of length k. Then:
$\Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \cdots$
$\Sigma^+ = \Sigma^1 \cup \Sigma^2 \cup \Sigma^3 \cup \cdots$
Thus: $\Sigma^* = \Sigma^+ \cup \{\epsilon\}$.
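Because ∑* and ∑+ are infinite, a program can only list them up to some length. The sketch below, with hypothetical helper names, builds ∑^k with `itertools.product` and takes unions of ∑^k for k up to a bound; it assumes the alphabet is given as a list of one-character strings.

```python
from itertools import product

def sigma_k(sigma: list[str], k: int) -> list[str]:
    """All strings of length k over the alphabet sigma."""
    # product(..., repeat=0) yields a single empty tuple, so k = 0 gives [""]
    return ["".join(p) for p in product(sigma, repeat=k)]

def sigma_star_up_to(sigma: list[str], n: int) -> list[str]:
    """Union of sigma^0 .. sigma^n: the strings of sigma* of length at most n."""
    return [w for k in range(n + 1) for w in sigma_k(sigma, k)]

def sigma_plus_up_to(sigma: list[str], n: int) -> list[str]:
    """Union of sigma^1 .. sigma^n: the same, but without the empty string."""
    return [w for k in range(1, n + 1) for w in sigma_k(sigma, k)]

binary = ["0", "1"]
print(sigma_k(binary, 2))             # ['00', '01', '10', '11']
print(sigma_star_up_to(binary, 2))    # ['', '0', '1', '00', '01', '10', '11']
print(sigma_plus_up_to(binary, 2))    # ['0', '1', '00', '01', '10', '11']
```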
operators on languages: Union
operators on languages: Concatenation
Important operators on languages: Closure
Languages
L is said to be a language over alphabet ∑ only if $L \subseteq \Sigma^*$.
This is because ∑* is the set of all strings (of all possible lengths, including 0) over the given alphabet ∑.
Examples:
1. Let L be the language of all strings consisting of n 0's followed by n 1's: $L = \{\epsilon, 01, 0011, 000111, \ldots\}$
2. Let L be the language of all strings with an equal number of 0's and 1's:
$L = \{\epsilon, 01, 10, 0011, 1100, 0101, 1010, 1001, \ldots\}$
Definition: Ø denotes the empty language.
Let $L = \{\epsilon\}$; is L = Ø?
No: $\{\epsilon\}$ contains one string (the empty string), whereas Ø contains no strings at all.
The Membership Problem
Given a string $w \in \Sigma^*$ and a language L over ∑, decide whether or not $w \in L$.
Example:
Let w = 100011
Q) Is w in the language of strings with an equal number of 0s and 1s?
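For the two example languages, the membership problem can be decided by a short program. The sketch below uses hypothetical function names; each one returns True exactly when its argument belongs to the corresponding language.

```python
def in_equal_01(w: str) -> bool:
    """Membership in L = {strings over {0,1} with equally many 0s and 1s}."""
    return set(w) <= {"0", "1"} and w.count("0") == w.count("1")

def in_0n1n(w: str) -> bool:
    """Membership in L = {n 0s followed by n 1s : n >= 0}."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "0" * n + "1" * n

w = "100011"
print(in_equal_01(w))  # True: three 0s and three 1s
print(in_0n1n(w))      # False: not of the form 0...01...1
```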
Deductive Proofs
From the given statement(s) to a conclusion statement (what we want to prove)
Logical progression by direct implications
Example for parsing a statement:
“If $y \ge 4$, then $2^y \ge y^2$.”
(There are other ways of writing this.)
Here “$y \ge 4$” is the given statement (hypothesis) and “$2^y \ge y^2$” is the conclusion.
Example: Deductive proof
Claim 1: If $y \ge 4$, then $2^y \ge y^2$.
Let x be any number obtained by adding the squares of 4 positive integers.
Given x, and assuming that Claim 1 is true, prove that $2^x \ge x^2$.
Proof:
1) Given: $x = a^2 + b^2 + c^2 + d^2$
2) Given: $a \ge 1, b \ge 1, c \ge 1, d \ge 1$
3) $a^2 \ge 1, b^2 \ge 1, c^2 \ge 1, d^2 \ge 1$ (by 2)
4) $x \ge 4$ (by 1 & 3)
5) $2^x \ge x^2$ (by 4 and Claim 1)
Each step is implied by (follows from) the steps cited in parentheses.
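A quick numerical spot-check can catch a mistake in a claim before you try to prove it, but it is evidence, not a proof: the deduction above covers every choice of positive integers, while the sketch below only tests finitely many. The function name `claim1` is hypothetical, and the ranges are arbitrary choices.

```python
from itertools import product

def claim1(y: int) -> bool:
    """Claim 1 for one value of y: if y >= 4 then 2^y >= y^2."""
    return y < 4 or 2 ** y >= y ** 2   # vacuously true when y < 4

# Spot-check Claim 1 for y = 0..199 (evidence only, not a proof).
assert all(claim1(y) for y in range(200))

# Spot-check steps 4 and 5 for x = a^2 + b^2 + c^2 + d^2 with 1 <= a, b, c, d <= 5.
for a, b, c, d in product(range(1, 6), repeat=4):
    x = a * a + b * b + c * c + d * d
    assert x >= 4             # step 4
    assert 2 ** x >= x ** 2   # step 5
print("no counterexample in the tested ranges")
```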
Quantifiers
“For all” or “For every”: used in universal proofs
Notation: ∀
“There exists”: used in existential proofs
Notation: ∃
Implication is denoted by =>. E.g., “IF A THEN B” can also be written as “A => B”.
Proving techniques
By contradiction: start with the statement contradictory to the given statement
E.g., to prove (A => B), we start with (A and ~B)
… and then show that this could never happen. What if you want to prove “(A and B => C or D)”?
By induction (3 steps): basis, inductive hypothesis, inductive step
By contrapositive statement: If A then B ≡ If ~B then ~A
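For propositional statements with a few variables, the equivalences behind these techniques can be checked exhaustively with a truth table. The sketch below (the helper `implies` is a hypothetical name) confirms that A => B has the same truth value as its contrapositive, and that it fails exactly when the contradiction hypothesis holds, including for the four-variable statement.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication p => q, defined as (not p) or q."""
    return (not p) or q

for A, B in product([False, True], repeat=2):
    # Contrapositive: (A => B) has the same truth value as (~B => ~A)
    assert implies(A, B) == implies(not B, not A)
    # Contradiction: (A => B) is false exactly when (A and ~B) is true
    assert implies(A, B) == (not (A and not B))

for A, B, C, D in product([False, True], repeat=4):
    # For (A and B => C or D), the contradiction hypothesis is (A and B and ~C and ~D)
    assert implies(A and B, C or D) == (not (A and B and not C and not D))

print("all equivalences hold in every row")
```

Unlike the numerical spot-check for Claim 1, this enumeration covers every possible row, so for these propositional statements it is a complete check.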
Proving techniques…
By counter-example
Show an example that disproves the claim
Note: There is no such thing as a “proof by example”!
So when asked to prove a claim, an example that satisfies the claim is not a proof.
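A search for a counter-example is easy to sketch. The example below is a hypothetical variant of Claim 1 with its hypothesis $y \ge 4$ dropped, i.e. “for every positive integer y, $2^y \ge y^2$”; a single failing value disproves it. Finding no counter-example in a finite search, on the other hand, would prove nothing.

```python
def first_counterexample(claim, candidates):
    """Return the first candidate for which claim is False, or None if none fails."""
    for y in candidates:
        if not claim(y):
            return y
    return None

# False variant of Claim 1 (hypothesis y >= 4 dropped): "2^y >= y^2 for all y >= 1".
print(first_counterexample(lambda y: 2 ** y >= y ** 2, range(1, 100)))  # 3, since 2^3 = 8 < 9 = 3^2
```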
Different ways of saying the same thing
“If H then C”: i. H implies C
ii. H => C
iii. C if H
iv. H only if C
v. Whenever H holds, C follows
“If-and-Only-If” statements
“A if and only if B” (A <==> B)
(if part) if B then A ( <= )
(only if part) A only if B ( => ) (same as “if A then B”)
“If and only if” is abbreviated as “iff”, i.e., “A iff B”.
Example: Theorem: Let x be a real number. Then floor of x = ceiling of x if and only if x is an integer.
Proofs for iff have two parts: one for the “if part” and another for the “only if part”.
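The two directions of the floor/ceiling theorem can be spot-checked at once by comparing two truth values. The sketch below uses exact rationals from Python's `fractions` module to avoid floating-point rounding; the helper names are hypothetical, and, as with any finite check, it supports the theorem without proving it.

```python
import math
from fractions import Fraction

def floor_equals_ceil(x) -> bool:
    return math.floor(x) == math.ceil(x)

def is_integer(x) -> bool:
    return x == math.floor(x)

samples = [Fraction(n, d) for n in range(-20, 21) for d in range(1, 6)]
for x in samples:
    # Equal truth values cover both parts at once:
    # "if" part (integer => floor = ceil) and "only if" part (floor = ceil => integer).
    assert floor_equals_ceil(x) == is_integer(x)

print(floor_equals_ceil(Fraction(7, 2)), floor_equals_ceil(Fraction(8, 2)))  # False True
```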