Automata Theory

From Wikipedia, the free encyclopedia

In computer science, in the area of formal language theory, frequent use is made of a variety of string functions; however, the notation used differs from that used in computer programming, and some functions commonly used in the theoretical realm are rarely used when programming. This article defines some of these basic terms.

Strings and languages

A string is a finite sequence of characters. The empty string is denoted by $\varepsilon$. The concatenation of two strings $s$ and $t$ is denoted by $s \cdot t$, or shorter by $st$. Concatenating with the empty string makes no difference: $s \cdot \varepsilon = s = \varepsilon \cdot s$. Concatenation of strings is associative: $s \cdot (t \cdot u) = (s \cdot t) \cdot u$.

For example, $(\text{b} \cdot \text{l}) \cdot (\varepsilon \cdot \text{ah}) = \text{bl} \cdot \text{ah} = \text{blah}$.

A language is a finite or infinite set of strings. Besides the usual set operations like union, intersection etc., concatenation can be applied to languages: if both $S$ and $T$ are languages, their concatenation $S \cdot T$ is defined as the set of concatenations of any string from $S$ and any string from $T$, formally $S \cdot T = \{ s \cdot t \mid s \in S \land t \in T \}$. Again, the concatenation dot $\cdot$ is often omitted for brevity.

The language $\{\varepsilon\}$ consisting of just the empty string is to be distinguished from the empty language $\{\}$. Concatenating any language with the former changes nothing: $S \cdot \{\varepsilon\} = S = \{\varepsilon\} \cdot S$, while concatenating with the latter always yields the empty language: $S \cdot \{\} = \{\} = \{\} \cdot S$. Concatenation of languages is associative: $S \cdot (T \cdot U) = (S \cdot T) \cdot U$.

For example, abbreviating $D = \{ \text{0}, \text{1}, \ldots, \text{9} \}$, the set of all three-digit decimal numbers is obtained as $D \cdot D \cdot D$. The set of all decimal numbers of arbitrary length is an example of an infinite language.

Alphabet of a string

The alphabet of a string is the set of all of the characters that occur in a particular string. If $s$ is a string, its alphabet is denoted by $\mathrm{Alph}(s)$.
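
The definitions so far can be tried out on finite languages. The following is a minimal sketch, assuming strings are modelled as Python `str` values and finite languages as Python sets; the names `concat`, `alph` and `EPSILON` are illustrative and do not come from the article.

```python
from itertools import product

EPSILON = ""  # the empty string plays the role of epsilon


def concat(S, T):
    """Language concatenation: each string of S followed by each string of T."""
    return {s + t for s, t in product(S, T)}


def alph(s):
    """Alphabet of a string: the set of characters that occur in it."""
    return set(s)


D = set("0123456789")                  # the ten one-character digit strings
three_digit = concat(concat(D, D), D)  # D . D . D

print(len(three_digit))                # 1000
print(concat(D, {EPSILON}) == D)       # True: {epsilon} is neutral
print(concat(D, set()) == set())       # True: the empty language absorbs
print(sorted(alph("cacao")))           # ['a', 'c', 'o']
```

Only finite languages can be represented this way; an infinite language such as the set of all decimal numbers would need a different representation, for instance a generator or a membership predicate.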

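The alphabet of a language $S$ is the set of all characters that occur in any string of $S$, formally: $\mathrm{Alph}(S) = \bigcup_{s \in S} \mathrm{Alph}(s)$.

For example, the set $\{ \text{a}, \text{c}, \text{o} \}$ is the alphabet of the string $\text{cacao}$, and the set of decimal digits $D = \{ \text{0}, \text{1}, \ldots, \text{9} \}$ from the example above is the alphabet of the language $D \cdot D \cdot D$ of three-digit decimal numbers as well as of the language of all decimal numbers.

String substitution

Let $L$ be a language, and let $\Sigma$ be its alphabet. A string substitution, or simply a substitution, is a mapping $f$ that maps letters in $\Sigma$ to languages (possibly over a different alphabet). Thus, for example, given a letter $a \in \Sigma$, one has $f(a) = L_a$, where $L_a \subseteq \Delta^*$ is some language whose alphabet is $\Delta$. This mapping may be extended to strings as $f(\varepsilon) = \{\varepsilon\}$ for the empty string $\varepsilon$, and $f(sa) = f(s) f(a)$ for a string $s$ and a letter $a$. String substitutions may be extended to entire languages as [1]

$f(L) = \bigcup_{s \in L} f(s)$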

Regular languages are closed under string substitution. That is, if each letter of a regular language is substituted by another regular language, the result is still a regular language.[2] Similarly, context-free languages are closed under string substitution.[3][note 1]

A simple example is the conversion $f_{uc}(\cdot)$ to upper case, which may be defined e.g. as follows:

letter x    mapped to language f_uc(x)    remark
a           { A }                         map lower-case char to corresponding upper-case char
A           { A }                         map upper-case char to itself
ß           { SS }                        no upper-case char available, map to two-char string
0           { $\varepsilon$ }             map digit to empty string
!           { }                           forbid punctuation, map to empty language
...         ...                           similar for other chars
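
The table and the two extension rules (to strings and to languages) can be sketched in a few lines. This is an illustration under assumptions, not a complete definition of the upper-case substitution: `f_uc_letter`, `substitute_string` and `substitute_language` are hypothetical names, and the letter rule only covers the cases listed in the table.

```python
from itertools import product


def f_uc_letter(ch):
    """Assumed letter rule: maps one character to a (finite) language."""
    if ch == "ß":
        return {"SS"}        # no single upper-case char, use a two-char string
    if ch.isalpha():
        return {ch.upper()}  # lower-case and upper-case letters
    if ch.isdigit():
        return {""}          # digit -> language containing only the empty string
    return set()             # punctuation etc. -> empty language


def substitute_string(f, s):
    """Extend f to a string: f(epsilon) = {epsilon}, f(sa) = f(s) f(a)."""
    result = {""}
    for ch in s:
        result = {u + v for u, v in product(result, f(ch))}
    return result


def substitute_language(f, L):
    """Extend f to a language: the union of f(s) over all s in L."""
    out = set()
    for s in L:
        out |= substitute_string(f, s)
    return out


print(substitute_string(f_uc_letter, "Straße"))  # {'STRASSE'}
print(substitute_string(f_uc_letter, "u2"))      # {'U'}
print(substitute_string(f_uc_letter, "Go!"))     # set()
print(sorted(substitute_language(f_uc_letter, {"Straße", "u2", "Go!"})))
# ['STRASSE', 'U']
```

A single letter mapped to the empty language wipes out the whole string, because concatenating with the empty language always yields the empty language.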

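For the extension of $f_{uc}$ to strings, we have e.g. $f_{uc}(\text{Straße}) = \{\text{S}\} \cdot \{\text{T}\} \cdot \{\text{R}\} \cdot \{\text{A}\} \cdot \{\text{SS}\} \cdot \{\text{E}\} = \{\text{STRASSE}\}$, $f_{uc}(\text{u2}) = \{\text{U}\} \cdot \{\varepsilon\} = \{\text{U}\}$, and $f_{uc}(\text{Go!}) = \{\text{G}\} \cdot \{\text{O}\} \cdot \{\} = \{\}$.

For the extension of $f_{uc}$ to languages, we have e.g. $f_{uc}(\{ \text{Straße}, \text{u2}, \text{Go!} \}) = \{ \text{STRASSE} \} \cup \{ \text{U} \} \cup \{\} = \{ \text{STRASSE}, \text{U} \}$.

Another example is the conversion of an EBCDIC-encoded string to ASCII.

String homomorphism

A string homomorphism (often referred to simply as a homomorphism in formal language theory) is a string substitution such that each letter is replaced by a single string. That is, $f(a) = s$, where $s$ is a string, for each letter $a$.[note 2][4]

String homomorphisms are monoid morphisms on the free monoid, preserving the binary operation of string concatenation. Given a language $L$, the set $f(L)$ is called the homomorphic image of $L$. The inverse homomorphic image of a string $s$ is defined as

$f^{-1}(s) = \{ w \mid f(w) = s \}$

while the inverse homomorphic image of a language $L$ is defined as

$f^{-1}(L) = \{ s \mid f(s) \in L \}$

In general, $f(f^{-1}(L)) \neq L$, while one does have $f(f^{-1}(L)) \subseteq L$ and $L \subseteq f^{-1}(f(L))$ for any language $L$.

The class of regular languages is closed under homomorphisms and inverse homomorphisms.[5] Similarly, the context-free languages are closed under homomorphisms[note 3] and inverse homomorphisms.[6]

A string homomorphism is said to be $\varepsilon$-free (or e-free) if $f(a) \neq \varepsilon$ for all $a$ in the alphabet $\Sigma$. Simple single-letter substitution ciphers are examples of ($\varepsilon$-free) string homomorphisms.

An example string homomorphism $g_{uc}$ can also be obtained by defining it similarly to the above substitution: $g_{uc}(\text{a}) = \text{A}$, ..., $g_{uc}(\text{0}) = \varepsilon$, but leaving $g_{uc}$ undefined on punctuation chars. Examples for inverse homomorphic images, listing only the preimages that consist of lower-case letters (upper-case letters and digits contribute further preimages), are $g_{uc}^{-1}(\{ \text{SSS} \}) = \{ \text{sss}, \text{sß}, \text{ßs} \}$, since $g_{uc}(\text{sss}) = g_{uc}(\text{sß}) = g_{uc}(\text{ßs}) = \text{SSS}$, and $g_{uc}^{-1}(\{ \text{A}, \text{bb} \}) = \{ \text{a} \}$, since $g_{uc}(\text{a}) = \text{A}$, while $\text{bb}$ cannot be reached by $g_{uc}$. For the latter language, $g_{uc}(g_{uc}^{-1}(\{ \text{A}, \text{bb} \})) = g_{uc}(\{ \text{a} \}) = \{ \text{A} \} \neq \{ \text{A}, \text{bb} \}$. The homomorphism $g_{uc}$ is not $\varepsilon$-free, since it maps e.g. $\text{0}$ to $\varepsilon$.

String projection

If $s$ is a string, and $\Sigma$ is an alphabet, the string projection of $s$ is the string that results by removing all letters which are not in $\Sigma$. It is written as $\pi_\Sigma(s)$. It is formally defined by removal of letters from the right hand side:

$\pi_\Sigma(s) = \begin{cases} \varepsilon & \text{if } s = \varepsilon \\ \pi_\Sigma(t) & \text{if } s = ta \text{ and } a \notin \Sigma \\ \pi_\Sigma(t)\,a & \text{if } s = ta \text{ and } a \in \Sigma \end{cases}$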

Here $\varepsilon$ denotes the empty string. The projection of a string is essentially the same as a projection in relational algebra.

String projection may be promoted to the projection of a language. Given a formal language $L$, its projection is given by

$\pi_\Sigma(L) = \{ \pi_\Sigma(s) \mid s \in L \}$
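
String projection and its lifting to languages can be sketched directly from the recursive definition. The function names `project` and `project_language` are assumptions made for this illustration; the recursion peels letters off the right hand side, as in the formal definition.

```python
def project(s, sigma):
    """pi_sigma(s): remove from s every letter that is not in sigma."""
    if s == "":
        return ""                    # projection of the empty string
    t, a = s[:-1], s[-1]             # split s = t a
    if a in sigma:
        return project(t, sigma) + a
    return project(t, sigma)


def project_language(L, sigma):
    """pi_sigma(L): project every string of a finite language L."""
    return {project(s, sigma) for s in L}


print(project("abcabc", {"a", "c"}))                         # acac
print(sorted(project_language({"ab", "b", "ba"}, {"a"})))    # ['', 'a']
```

Distinct strings may have the same projection, so the projected language can be smaller than the original one.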

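Right quotient

The right quotient of a letter $a$ from a string $s$ is the truncation of the letter $a$ in the string $s$, from the right hand side. It is denoted as $s/a$. If the string does not have $a$ on the right hand side, the result is the empty string. Thus:

$(sa)/b = \begin{cases} s & \text{if } a = b \\ \varepsilon & \text{if } a \neq b \end{cases}$

The quotient of the empty string may be taken:

$\varepsilon / a = \varepsilon$

Similarly, given a subset $S \subset M$ of a monoid $M$, one may define the quotient subset as

$S/a = \{ s \in M \mid sa \in S \}$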
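
The two quotients behave differently on strings that do not end in the given letter, which a small sketch makes visible. The names `right_quotient` and `quotient_language` are hypothetical, and the language version assumes a finite language represented as a Python set.

```python
def right_quotient(s, a):
    """s/a for a string s and a single letter a.

    A trailing a is stripped; in every other case (including the empty
    string) the result is the empty string, as in the definition above.
    """
    if len(a) != 1:
        raise ValueError("a must be a single letter")
    return s[:-1] if s.endswith(a) else ""


def quotient_language(S, a):
    """S/a = { s | sa in S } for a finite language S and a letter a."""
    if len(a) != 1:
        raise ValueError("a must be a single letter")
    return {w[:-1] for w in S if w.endswith(a)}


print(right_quotient("abc", "c"))                      # ab
print(repr(right_quotient("abc", "b")))                # ''
print(sorted(quotient_language({"ab", "cb", "a"}, "b")))  # ['a', 'c']
```

For a string, a mismatch yields the empty string; for a language, strings that do not end in the letter simply contribute nothing to the quotient.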

Left quotients may be defined similarly, with operations taking place on the left of a string.

Syntactic relation

The right quotient of a subset $S \subset M$ of a monoid $M$ defines an equivalence relation, called the right syntactic relation of $S$. It is given by

$\sim_S \;=\; \{ (s, t) \in M \times M \mid S/s = S/t \}$

The relation is of finite index (has a finite number of equivalence classes) if and only if the family of right quotients is finite; that is, if

$\{ S/m \mid m \in M \}$

is finite. In this case, $S$ is a recognizable language, that is, a language that can be recognized by a finite state automaton. This is discussed in greater detail in the article on syntactic monoids.

Right cancellation

The right cancellation of a letter $a$ from a string $s$ is the removal of the first occurrence of the letter $a$ in the string $s$, starting from the right hand side. It is denoted as $s \div a$ and is recursively defined as

$(sa) \div b = \begin{cases} s & \text{if } a = b \\ (s \div b)\,a & \text{if } a \neq b \end{cases}$

The empty string is always cancellable:

$\varepsilon \div a = \varepsilon$

Clearly, right cancellation and projection commute:

$\pi_\Sigma(s) \div a = \pi_\Sigma(s \div a)$
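
Right cancellation, and the claim that it commutes with projection, can be checked mechanically on small inputs. The sketch below assumes the hypothetical names `right_cancel` and `project`, and tests the identity exhaustively for all strings of length at most 4 over the letters a, b, c; this is a sanity check on small cases, not a proof.

```python
from itertools import product


def right_cancel(s, a):
    """s ÷ a: remove the rightmost occurrence of the letter a from s, if any."""
    if s == "":
        return ""                        # the empty string is always cancellable
    head, last = s[:-1], s[-1]
    if last == a:
        return head                      # found the rightmost a
    return right_cancel(head, a) + last  # keep looking further left


def project(s, sigma):
    """pi_sigma(s): keep only the letters of s that are in sigma."""
    return "".join(ch for ch in s if ch in sigma)


sigma = {"a", "b"}
for n in range(5):
    for letters in product("abc", repeat=n):
        s = "".join(letters)
        for a in "abc":
            lhs = right_cancel(project(s, sigma), a)
            rhs = project(right_cancel(s, a), sigma)
            assert lhs == rhs, (s, a)

print(right_cancel("abcab", "a"))   # abcb
```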

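Prefixes

The prefixes of a string is the set of all prefixes of that string, with respect to a given language:

$\mathrm{Pref}_L(s) = \{ t \mid s = tu \text{ for } t, u \in \mathrm{Alph}(L)^* \}$

where $s \in L$.

The prefix closure of a language is

$\mathrm{Pref}(L) = \bigcup_{s \in L} \mathrm{Pref}_L(s) = \{ t \mid s = tu;\ s \in L;\ t, u \in \mathrm{Alph}(L)^* \}$

Example: for $L = \{ \text{abc} \}$, $\mathrm{Pref}(L) = \{ \varepsilon, \text{a}, \text{ab}, \text{abc} \}$.

A language is called prefix closed if $\mathrm{Pref}(L) = L$.

The prefix closure operator is idempotent:

$\mathrm{Pref}(\mathrm{Pref}(L)) = \mathrm{Pref}(L)$

The prefix relation is a binary relation $\sqsubseteq$ such that $s \sqsubseteq t$ if and only if $s \in \mathrm{Pref}_L(t)$. This relation is a particular example of a prefix order.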
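
The prefix operations are easy to sketch for finite languages. The names `prefixes`, `prefix_closure`, `is_prefix_closed` and `is_prefix` are illustrative assumptions, with strings as Python `str` values and languages as sets.

```python
def prefixes(s):
    """All prefixes of s, from the empty string up to s itself."""
    return {s[:i] for i in range(len(s) + 1)}


def prefix_closure(L):
    """Pref(L): the union of the prefix sets of all strings in L."""
    closure = set()
    for s in L:
        closure |= prefixes(s)
    return closure


def is_prefix_closed(L):
    """A language is prefix closed if it equals its own prefix closure."""
    return prefix_closure(L) == set(L)


def is_prefix(s, t):
    """The prefix relation: s is a prefix of t."""
    return t.startswith(s)


L = {"abc"}
closed = prefix_closure(L)
print(sorted(closed))                    # ['', 'a', 'ab', 'abc']
print(prefix_closure(closed) == closed)  # True: the operator is idempotent
print(is_prefix_closed(L))               # False
```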

CmSc 365 Theory of Computation

Alphabets and Languages

1. Some definitions and properties

Alphabet: a finite set of symbols.
E.g.: {a, b, c, x, y, z}, {0, 1}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

String over an alphabet: a finite sequence of symbols from the alphabet.
E.g.: thisisastring - over {a, b, c, ..., z}
      01011 - over {0, 1}
      3786 - over {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

A string with one symbol only = the symbol itself.

Empty string: no symbols; notation: e

Note: we use the letters a, b, c, ..., w, x, y, z both for naming strings and for writing instances of strings. Usually for names of strings we use the last letters: w, x, y, z. Thus x = abc means that abc is a string and we call it x.

Length of a string: its length as a sequence (the number of symbols).
If w = abcd, |w| = 4.
If w = classroom, |w| = 9.

We can match a position in a string with the symbol there: if w = classroom, w(3) = a, w(4) = s, and w(5) = s.
To be able to distinguish between same symbols, we refer to them as different occurrences of the same symbol.

2. Operations on strings

Concatenation: combines two strings by putting them one after the other.
E.g.: x = abc, y = mnop, then $x \circ y$ = abcmnop, or simply xy = abcmnop.
The concatenation of the empty string with any other string gives the string itself: xe = ex = x.

Substring: if w is a string, then v is a substring of w if there exist strings x and y such that w = xvy. Here x is called a prefix and y is called a suffix of w.

The i-th concatenation of a string with itself is defined in the following way:
$w^0 = e$
$w^{i+1} = w^i \circ w$ for each $i \geq 0$.
So $w^1 = w$, and $\text{bang}^2$ = bangbang.

Kleene star operation on strings: let w be a string. $w^*$ is the set of strings obtained by applying any number of concatenations of w with itself, including the empty string.
Example: $a^*$ = { e, a, aa, aaa, aaaa, aaaaa, ... }

Reversal of a string w, denoted $w^R$, is the string spelled backwards.
Formal definition:
If w is a string of length 0, then $w^R = w = e$.
If w is a string of length n+1 > 0, then w = ua for some symbol $a \in \Sigma$, and $w^R = a u^R$.

3. Languages

If $\Sigma$ is an alphabet, then $\Sigma^*$ is the set of all strings over $\Sigma$.

Language: any set of strings over an alphabet $\Sigma$, i.e. any subset of $\Sigma^*$.

$\Sigma^*$ is a countably infinite set.
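
The string operations of Section 2 translate almost directly into code. The following is a minimal sketch, assuming strings are Python `str` values and the empty string e is `""`; the names `power`, `reverse` and `star_up_to` are chosen for this illustration. Note that the notes count positions from 1, whereas Python indexes from 0.

```python
def power(w, i):
    """w^i via the recursive definition: w^0 = e, w^(i+1) = w^i w."""
    if i == 0:
        return ""
    return power(w, i - 1) + w


def reverse(w):
    """w^R via the recursive definition: e^R = e, (ua)^R = a u^R."""
    if w == "":
        return ""
    u, a = w[:-1], w[-1]   # split w = u a with a the last symbol
    return a + reverse(u)


def star_up_to(w, n):
    """The first n + 1 elements of w*: w^0, w^1, ..., w^n."""
    return [power(w, i) for i in range(n + 1)]


w = "classroom"
print(len(w))               # 9
print(w[3 - 1], w[4 - 1])   # a s   (w(3) and w(4) in the 1-based notation)
print(power("bang", 2))     # bangbang
print(reverse("abc"))       # cba
print(star_up_to("a", 3))   # ['', 'a', 'aa', 'aaa']
```

Since w* is infinite for every non-empty w, only a finite initial part of it can be listed.
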
The elements of $\Sigma^*$ can be ordered in the following way:

a. The alphabet is a finite set, so we can order the symbols in some way.
b. The set $\Sigma^*$ can be partitioned into disjoint sets with respect to the length of the strings (there is an infinite number of strings, however each string has a finite length).
c. For each $k \geq 0$, we enumerate all strings of length k before all strings of length k+1. This means that we first order the strings of length 0 (this is the empty string), then strings of length 1, then of length 2, etc.
d. The strings of length k (there are $n^k$ of them, where n is the number of symbols in $\Sigma$) are enumerated lexicographically: $a_{i_1} a_{i_2} \ldots a_{i_k}$ precedes $a_{j_1} a_{j_2} \ldots a_{j_k}$ if for some m, $0 \leq m \leq k-1$, we have $i_h = j_h$ for $h = 1, \ldots, m$ and $i_{m+1} < j_{m+1}$. Note that $i_h = j_h$ means that $a_{i_h}$ is the same symbol as $a_{j_h}$.

4. Operations on languages

Languages are sets, so the operations union, intersection and difference are applicable. There are two operations specific to languages:

Concatenation of languages: if $L_1$ and $L_2$ are languages, then $L = L_1 \circ L_2$ (or simply $L_1 L_2$) is the set
$L = \{ w \in \Sigma^* : w = x \circ y,\ x \in L_1,\ y \in L_2 \}$
i.e. L consists of all possible concatenations between strings in $L_1$ and strings in $L_2$. Concatenation of languages corresponds to the Cartesian product of sets: each pair (x, y) in $L_1 \times L_2$ contributes the string xy, although different pairs may yield the same string.

Kleene star of a language L: the set of all strings obtained by concatenating zero or more strings from L. It is denoted by $L^*$.
If we consider $\Sigma$ as a language, then $\Sigma^*$ would be the Kleene star of that language.

5. Problems

a. Is the set of all possible meaningful English sentences countable?
b. Is the set of all possible meaningless English sentences countable?
c. Define the relation
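
As a supplement to Sections 3 and 4, the enumeration of all strings over an alphabet and the two language operations can be sketched in code. This is a hedged illustration: the names `enumerate_sigma_star`, `concat` and `kleene_star_up_to` are assumptions, the symbol order is taken to be Python's sorted order, and the Kleene star is cut off at a maximum length because the full set is infinite.

```python
from itertools import count, islice, product


def enumerate_sigma_star(alphabet):
    """Yield every string over `alphabet`: shorter strings first,
    strings of equal length in lexicographic order."""
    symbols = sorted(alphabet)                  # step a: fix an order on the symbols
    for k in count(0):                          # step c: length 0, 1, 2, ...
        for letters in product(symbols, repeat=k):  # step d: lexicographic
            yield "".join(letters)


def concat(L1, L2):
    """L1 L2: all concatenations xy with x in L1 and y in L2."""
    return {x + y for x in L1 for y in L2}


def kleene_star_up_to(L, max_len):
    """The strings of L* whose length is at most max_len."""
    result = {""}                               # zero strings concatenated
    frontier = {""}
    while frontier:
        frontier = {x + y for x in frontier for y in L
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result


print(list(islice(enumerate_sigma_star("ab"), 7)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(sorted(concat({"a", "ab"}, {"b", ""})))
# ['a', 'ab', 'abb']
print(sorted(kleene_star_up_to({"ab", "c"}, 3)))
# ['', 'ab', 'abc', 'c', 'cab', 'cc', 'ccc']
```

The enumeration assigns each string a finite position in the list, which is exactly what makes the set of all strings countable. The concatenation example has three elements rather than four because two different pairs produce the same string ab.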
