Contents
• Origin and importance of language
• Linguistic tree
• Formal definition of a language
• Is English a finite or infinite language?
• How do we learn a language?
Origin of Language?
• We know that all human beings share common ancestors
• We have evolved into the most intelligent beings
• But we need to remember that languages have evolved too
• So can we say that, like human beings, all languages have a common ancestor?
• We know that Telugu, Tamil, Kannada and Malayalam share the same origin: the Dravidian family of languages
• Do all languages descend from a common one?
Atkinson QD, Meade A, Venditti C, Greenhill SJ, Pagel M (2008) Languages evolve in punctuational bursts. Science 319: 588.
Importance of Languages
• The emergence of the human language faculty represented one of the major transitions in the evolution of life
• For the first time, the exchange of highly complex information between individuals became possible
• Parallels between genetic and linguistic evolution were noticed by Charles Darwin, but the issue is debatable
• It is generally accepted that languages have evolved and diversified through mechanisms similar to those of biological evolution
• The nearly 7000 extant languages are divided into 19 linguistic families
Atkinson QD, Meade A, Venditti C, Greenhill SJ, Pagel M (2008) Languages evolve in punctuational bursts. Science 319: 588.
Linguistic Tree
• The numbers in red show the estimated ages of the languages
• From the diagram, the common ancestor of these languages appears to have originated roughly 8,700 years ago
• The tree also suggests that our Indian languages are no older than about 3000 years
• Our Indian languages are still the oldest languages in the tree
• The languages in the tree are broadly classified into these branches: Celtic, Italic, French/Iberian, West Germanic, North Germanic, Baltic, Slavic, Indic, Iranian, Albanian, Greek, Armenian, Tocharian, Anatolian
Controversies in the Origin of Sanskrit
• Sanskrit is the main key to the Indo-Iranian languages
• It is believed to have originated around 1500 BCE, i.e., nearly 3500 years ago (radiocarbon dating of the scripts is cited as proof)
• But according to the Ramayana, Sita resided in the ashram of Valmiki, who wrote the Ramayana in Sanskrit
• Yet some estimates place the events of the Ramayana nearly 1.6 lakh (160,000) years ago, which contradicts the 1500 BCE date
Formal Definition of Language
• Alphabet: An alphabet is a finite set of symbols. Without loss of generality, we can consider the binary alphabet, {0,1}, by encoding each symbol of the actual alphabet in binary code
• Sentence: A sentence is defined as a string of symbols. The set of all sentences over the binary alphabet is {0, 1, 00, 01, 10, 11, 000, ...}. There are infinitely many sentences, as many as there are integers; the set of all sentences is ‘countable’.
Computational and evolutionary aspects of language (2011), Martin A Nowak, Natalia L Komarova
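The claim that the set of all sentences is ‘countable’ can be made concrete by pairing each sentence with a positive integer. The sketch below assumes, as the slide's set does, that the empty string is excluded; the function names are illustrative only. It lists sentences shortest first and uses the trick of prefixing a 1 and reading the result in base 2, which gives each sentence its own integer and back.

```python
from itertools import count, islice, product
from typing import Iterator


def binary_sentences() -> Iterator[str]:
    """Yield every non-empty sentence over the alphabet {0, 1}, shortest first,
    in lexicographic order within each length."""
    for length in count(1):
        for symbols in product("01", repeat=length):
            yield "".join(symbols)


def sentence_to_index(sentence: str) -> int:
    """Map a sentence to its 1-based position in binary_sentences().

    Prefixing a '1' and reading the result in base 2 sends the non-empty
    binary strings one-to-one onto 2, 3, 4, ...; subtracting 1 starts at 1.
    """
    return int("1" + sentence, 2) - 1


def index_to_sentence(index: int) -> str:
    """Inverse of sentence_to_index: drop the '0b1' prefix of bin(index + 1)."""
    return bin(index + 1)[3:]


print(list(islice(binary_sentences(), 7)))
# ['0', '1', '00', '01', '10', '11', '000']
print(sentence_to_index("101"), index_to_sentence(12))
# 12 101
```

Because every sentence receives exactly one integer and every positive integer names exactly one sentence, there are "as many sentences as integers".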
Formal Definition of a Language
• Language: A language is a set of sentences. Among all possible sentences some are part of the language and some are not. A finite language contains a finite number of sentences. An infinite language contains an infinite number of sentences.
• Grammar: A grammar is a finite list of rules specifying a language. A grammar is expressed in terms of ‘rewrite rules’: a certain string can be rewritten as another string. Strings contain elements of the alphabet together with ‘non-terminals’, which are placeholders. After iterated application of the rewrite rules, the final string will only contain symbols of the alphabet.
Computational and evolutionary aspects of language (2011), Martin A Nowak, Natalia L Komarova
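To illustrate how a finite list of rewrite rules can specify an infinite language, here is a minimal sketch using a hypothetical grammar that does not come from the source: one non-terminal S and the two rules S -> 0S1 and S -> 01. Since S -> 0S1 can be applied any number of times, the grammar specifies the infinite language {0^n 1^n : n >= 1}. The helper names (RULES, generate) are illustrative. The code rewrites the leftmost non-terminal repeatedly and keeps a string once it contains only alphabet symbols; it lists only sentences up to a length bound, since the full language is infinite.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical grammar, chosen only for illustration: alphabet {0, 1},
# one non-terminal S (the start symbol), and two rewrite rules
#   S -> 0S1   and   S -> 01
RULES: Dict[str, List[str]] = {"S": ["0S1", "01"]}
START = "S"


def is_sentence(form: str) -> bool:
    """True once a string contains only alphabet symbols (no non-terminals)."""
    return not any(ch in RULES for ch in form)


def generate(max_length: int) -> Set[str]:
    """All sentences of length <= max_length that the grammar can derive."""
    sentences: Set[str] = set()
    queue = deque([START])
    while queue:
        form = queue.popleft()
        if is_sentence(form):
            sentences.add(form)
            continue
        # Rewrite the leftmost non-terminal using each applicable rule.
        pos = next(i for i, ch in enumerate(form) if ch in RULES)
        for replacement in RULES[form[pos]]:
            new_form = form[:pos] + replacement + form[pos + 1:]
            # Rewriting never deletes alphabet symbols, so a form that already
            # holds more than max_length of them cannot lead to a short sentence.
            if sum(ch not in RULES for ch in new_form) <= max_length:
                queue.append(new_form)
    return sentences


print(sorted(generate(8), key=len))
# ['01', '0011', '000111', '00001111']
```

Raising max_length always yields more sentences, which reflects the fact that the two rules specify an infinite language.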
Is English a Finite or Infinite Language?
• The number of words in English is nearly 500,000, as per the Oxford dictionary
• We also know that every letter is one of 26 characters
• Practically speaking, a spoken English sentence is at most about 100 words long in the worst case (an exaggerated figure, used only as a bound), which would make spoken English a finite, if enormous, language
• But in written English the length of a sentence is unbounded (e.g., given any sentence, a longer one can be made by joining it with another)
• So in written English the set of all sentences has infinite cardinality, which makes English an infinite language
Computational and evolutionary aspects of language (2011), Martin A Nowak, Natalia L Komarova
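The two halves of this argument can be checked with a short calculation. The sketch below takes the slide's figures (a vocabulary of about 500,000 words and a cap of 100 words per spoken sentence) and bounds the number of spoken sentences by counting every word sequence of length 1 to 100. This over-counts, since most sequences are ungrammatical, but it is still finite. The extend() helper is a hypothetical illustration of the "joining" argument: it shows that no written sentence can be the longest one.

```python
# Figures taken from the slide: a vocabulary of about 500,000 words and a
# (deliberately generous) cap of 100 words per spoken sentence.
VOCABULARY_SIZE = 500_000
MAX_SPOKEN_LENGTH = 100


def max_spoken_sentences(vocab: int, max_len: int) -> int:
    """Upper bound on the number of word sequences of length 1..max_len.

    Geometric series: vocab + vocab**2 + ... + vocab**max_len
    = (vocab**(max_len + 1) - vocab) / (vocab - 1), which divides exactly.
    """
    return (vocab ** (max_len + 1) - vocab) // (vocab - 1)


def extend(sentence: str, other: str = "it rained") -> str:
    """Hypothetical helper: build a strictly longer sentence by conjoining a clause."""
    return f"{sentence} and {other}"


bound = max_spoken_sentences(VOCABULARY_SIZE, MAX_SPOKEN_LENGTH)
print(f"{len(str(bound))} digits")  # 570 digits: astronomically large, yet finite

s = "the child spoke"
for _ in range(3):
    longer = extend(s)
    assert len(longer.split()) > len(s.split())  # always strictly longer
    s = longer
print(s)  # the child spoke and it rained and it rained and it rained
```

Since extend() can always be applied once more, the set of written sentences has no longest element and therefore cannot be finite.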
Learning Theory of a Language
• The learner is presented with data and has to infer the rules that generate these data
• The difference between ‘learning’ and ‘memorization’ is the ability to generalize beyond one’s own experience to novel circumstances
• In the case of language, the child generalizes to novel sentences never heard before
• Learning theory describes the mathematics of learning, with the aim of outlining conditions for successful generalization
Computational and evolutionary aspects of language (2011), Martin A Nowak, Natalia L Komarova
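To show the difference between memorization and learning, the sketch below compares two hypothetical learners that hear the same sentences. The memorizer accepts only what it has heard. The generalizer chooses, from a small hand-written hypothesis space (an assumption made for this illustration), the most specific rule that fits every heard sentence. Only the generalizer accepts a novel sentence.

```python
import re
from typing import Callable, List

Recognizer = Callable[[str], bool]


def equal_zeros_then_ones(s: str) -> bool:
    """Rule 1: n zeros followed by n ones, n >= 1."""
    n = len(s) // 2
    return n > 0 and s == "0" * n + "1" * n


def zeros_then_ones(s: str) -> bool:
    """Rule 2: some zeros followed by some ones (counts may differ)."""
    return re.fullmatch(r"0+1+", s) is not None


def any_binary(s: str) -> bool:
    """Rule 3: any non-empty string over {0, 1}."""
    return re.fullmatch(r"[01]+", s) is not None


# Hypothetical hypothesis space, ordered from most to least specific.
CANDIDATE_RULES: List[Recognizer] = [equal_zeros_then_ones, zeros_then_ones, any_binary]


def memorize(heard: List[str]) -> Recognizer:
    """Accept exactly the sentences already heard."""
    seen = frozenset(heard)
    return lambda s: s in seen


def generalize(heard: List[str]) -> Recognizer:
    """Pick the most specific candidate rule consistent with every heard sentence."""
    for rule in CANDIDATE_RULES:
        if all(rule(s) for s in heard):
            return rule
    raise ValueError("no candidate rule fits the data")


heard = ["01", "0011", "000111"]
novel = "00001111"  # never heard
print(memorize(heard)(novel), generalize(heard)(novel))  # False True
```

The generalizer succeeds only because the correct rule is in its hypothesis space. Learning theory asks which such spaces make successful generalization possible.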
Learning Theory of a Language
• Children learn their native language by hearing grammatical sentences from their parents or others
• From this ‘environmental input’, children construct an internal representation of the underlying grammar
• Children are not told the grammatical rules
• Neither children nor adults are ever aware of the grammatical rules that specify their own language
Computational and evolutionary aspects of language (2011), Martin A Nowak, Natalia L Komarova
Learnability
• Imagine a speaker-hearer pair. The speaker uses grammar G to construct sentences of language L
• The hearer receives sentences and should, after some time, be able to use grammar G to construct other sentences of L
• Mathematically speaking, the hearer is described by an algorithm A, which takes a list of sentences as input and generates a language as output
• Furthermore, a set of languages is learnable by an algorithm if each language of this set is learnable by that algorithm
Computational and evolutionary aspects of language (2011), Martin A Nowak, Natalia L Komarova
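The hearer-as-algorithm picture can be sketched in Gold's setting, identification in the limit. Here the learner receives a text, a sequence in which every sentence of the target language eventually appears. After each sentence it outputs a guess, and it learns the target if its guesses eventually settle on the target and never change again. The language set below is a hypothetical example: L_k = {0^n : n >= k} for k = 1, 2, 3, … Algorithm A guesses L_m, where m is the length of the shortest sentence heard so far, which is the smallest language in the set consistent with the data. All names here are illustrative.

```python
from itertools import count, islice
from typing import Iterator, List


def text_for(k: int) -> Iterator[str]:
    """Hypothetical presentation of L_k = {0^n : n >= k} in which the shortest
    sentence 0^k arrives only fourth; every sentence of L_k eventually appears."""
    for i in count():
        yield "0" * (k + i + 1)
        if i == 2:
            yield "0" * k


def learner(heard: List[str]) -> int:
    """Algorithm A: map the (non-empty) list of sentences heard so far to a
    hypothesis index m, meaning "the target language is L_m"."""
    return min(len(s) for s in heard)


def guesses(k: int, steps: int) -> List[int]:
    """The learner's guess after each of the first `steps` sentences of text_for(k)."""
    heard: List[str] = []
    out: List[int] = []
    for sentence in islice(text_for(k), steps):
        heard.append(sentence)
        out.append(learner(heard))
    return out


print(guesses(3, 8))  # [4, 4, 4, 3, 3, 3, 3, 3]
```

Once 0^k has been heard, the guess is L_k and never changes, so this algorithm learns every language in the set. The set contains no finite languages, so it is not ‘super-finite’ and Gold's theorem does not rule it out.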
Learnability
• We are interested in which sets of languages, L = {L1, L2, …}, can be learned by a given algorithm
• Gold’s theorem implies that there exists no algorithm that can learn the set of regular languages
• Consequently, no algorithm can learn the set of context-free languages, context-sensitive languages or computable languages, since each of these sets contains all regular languages
• Gold’s theorem formally states that there exists no algorithm that can learn a set of ‘super-finite’ languages
• Such a set includes all finite languages and at least one infinite language
Intuitively, if the learner infers that the target language is an infinite language, whereas the actual target language is a finite language that is contained in the infinite language, then the learner will not encounter any contradicting evidence, and will never converge onto the correct language.
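The intuition can be replayed on a small hypothetical super-finite set: all finite languages plus L_inf = {0^n : n >= 1}. This is a sketch, not a proof. When the target is L_inf and the learner hears 0, 00, 000, …, the first t sentences form a complete presentation of the finite language F_t = {0, …, 0^t}, so no evidence ever separates F_t from L_inf. A memorizing learner (hypothetical name memorizer) learns every finite language, but on L_inf its guess changes after every sentence and never settles. A learner that instead jumped to L_inf at some point would be wrong whenever the target was the corresponding F_t.

```python
from itertools import count, islice
from typing import FrozenSet, Iterator, List, Optional


def infinite_text() -> Iterator[str]:
    """Text for the infinite language L_inf = {0^n : n >= 1}: 0, 00, 000, ..."""
    for n in count(1):
        yield "0" * n


def finite_language(t: int) -> FrozenSet[str]:
    """The finite language F_t = {0^n : 1 <= n <= t}, a subset of L_inf."""
    return frozenset("0" * n for n in range(1, t + 1))


def memorizer(heard: List[str]) -> FrozenSet[str]:
    """Guess exactly the set of sentences heard so far (learns every finite language)."""
    return frozenset(heard)


heard: List[str] = []
previous: Optional[FrozenSet[str]] = None
changes = 0
for t, sentence in enumerate(islice(infinite_text(), 50), start=1):
    heard.append(sentence)
    # The data so far is exactly a complete presentation of F_t, so nothing
    # in it contradicts either F_t or L_inf.
    assert set(heard) == finite_language(t)
    guess = memorizer(heard)
    if guess != previous:
        changes += 1
    previous = guess

print(changes)  # 50: the guess changed after every one of the 50 sentences
```

Running the loop for more steps only produces more changes. The memorizer never converges on L_inf.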
Learning a Language
• We might think of this only in the context of infinite languages, so let us look at finite languages
• In the context of statistical learning theory, the set of all finite languages cannot be learned
• In the Gold framework, the set of all finite languages can be learned, but only by memorization: the learner will identify the correct language only after having heard all sentences of this language
Computational and evolutionary aspects of language (2011), Martin A Nowak, Natalia L Komarova
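Learning by memorization in the Gold framework can be sketched directly. The learner's guess is the set of sentences heard so far, so it equals the finite target language exactly when the last not-yet-heard sentence arrives, and it stays correct after that. The target language, the text with repetitions, and the function names below are all hypothetical.

```python
from typing import FrozenSet, List, Optional


def memorizing_learner(heard: List[str]) -> FrozenSet[str]:
    """Gold-style learner for finite languages: guess exactly what was heard."""
    return frozenset(heard)


def convergence_step(target: FrozenSet[str], text: List[str]) -> Optional[int]:
    """First step (1-based) at which the learner's guess equals the target.

    Once it matches, the guess never changes again, because a text for the
    target contains only sentences that the learner has then already heard.
    """
    heard: List[str] = []
    for step, sentence in enumerate(text, start=1):
        if sentence not in target:
            raise ValueError(f"{sentence!r} is not a sentence of the target language")
        heard.append(sentence)
        if memorizing_learner(heard) == target:
            return step
    return None  # not every sentence has been heard yet


# Hypothetical finite target language and a presentation with repetitions.
target = frozenset({"0", "01", "110"})
text = ["01", "0", "01", "0", "110", "0"]
print(convergence_step(target, text))      # 5: the step at which "110" is first heard
print(convergence_step(target, text[:4]))  # None: "110" has not been heard
```

The learner never identifies a sentence it has not heard. This is why the slide says finite languages are learnable in this framework only by memorization.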
References
• Atkinson QD, Meade A, Venditti C, Greenhill SJ, Pagel M (2008) Languages evolve in punctuational bursts. Science 319: 588.
• Montemurro MA, Zanette DH (2011) Universal Entropy of Word Ordering Across Linguistic Families. PLoS ONE: e19875. doi:10.1371/journal.pone.0019875
• Nowak MA, Komarova NL (2011) Computational and evolutionary aspects of language.