
Introduction to AI



Chapter 1

Introduction to Artificial Intelligence

In 1990 I was relatively new to the field of Artificial Intelligence (AI). At the 6th Conference on Uncertainty in Artificial Intelligence, which was held at MIT, I met Eugene Charniak, who at the time was a well-known researcher in AI. During a conversation while strolling along the campus, I said, “I heard that the Japanese are applying fuzzy logic to AI.” Gene responded, “I don’t believe the Japanese have been any more successful at AI than us.” This comment substantiated what I had already begun to realize, namely that AI seemed to be a dismal failure.

Dr. Charniak’s comment was correct in 1990, and it is still correct today. If

we consider AI to be the development of an artificial entity such as the Terminator in the movie of the same name or HAL in the classic sci-fi movie 2001: A Space Odyssey, then we have not developed anything close. The Terminator and HAL are artificial entities that can learn and make decisions in a complex, changing environment, affect that environment, and communicate their knowledge and choices to humans. We have no such entities.

So why does the field of AI persist, and why was this book written? In

their efforts to develop artificial intelligence, researchers looked at the behavior/reasoning of intelligent entities such as humans, and developed algorithms based on that behavior. These algorithms have been used to solve many interesting problems, including the development of systems that behave intelligently in limited domains. Such systems include ones that can perform medical diagnosis, diagnose problems with software, make financial decisions, navigate a difficult terrain, monitor the possible failure of a space shuttle, recognize speech, understand text, plan a trip, track a target, learn an individual’s preferences from the preferences of similar individuals, learn the causal relationships among genes, and learn which genes affect a phenotype. This book concerns these algorithms and applications. Before discussing the content of this book further, we provide a brief history of AI.

1.1 History of Artificial Intelligence

We start with the well-known Turing test.

1.1.1 Turing test

Abandoning the philosophical question of what it means for an artificial entity to think or have intelligence, Alan Turing [1950] developed an empirical test of artificial intelligence, which is more appropriate to the computer scientist endeavoring to implement artificial intelligence on a computer. The Turing test is an operational test; that is, it provides a concrete way to determine whether the entity is intelligent. The test involves a human interrogator who is in one room, another human being in a second room, and an artificial entity in a third room. The interrogator is allowed to communicate with both the other human and the artificial entity only with a textual device such as a terminal. The interrogator is asked to distinguish the other human from the artificial entity based on answers to questions posed by the interrogator. If the interrogator cannot do this, the Turing test is passed and we say that the artificial entity is intelligent.

Note that the Turing test avoids physical interaction between the interrogator and the artificial entity; the assumption is that physical interaction is not necessary to intelligence. For example, HAL in the movie 2001: A Space Odyssey is simply an entity with which the crew communicates, and HAL would pass the Turing test. If the interrogator is provided with visual information about the artificial entity so that the interrogator can test the entity’s ability to perceive and navigate in the world, we call the test the total Turing test. The Terminator in the movie of the same name would pass this test.


Figure 1.1: The Chinese room experiment.

Searle [1980] took exception to the Turing test with his Chinese room thought experiment. The experiment proceeds as follows. Suppose that we have successfully developed a computer program that appears to understand Chinese. That is, the program takes sentences written with Chinese characters as input, processes the characters, and outputs sentences written using Chinese characters. If it is able to convince a Chinese interrogator that it is a human, then the Turing test would be passed.

Searle asks “does the program literally understand Chinese or is it only

simulating the ability to understand Chinese?” To address this question, Searle proposes that he could sit in a closed room holding a book with an English version of the program, and adequate paper and pencils to carry out the instructions of the program by hand. The Chinese interrogator could then provide Chinese sentences through a slot in the door, Searle could process them using the program’s instructions, and send Chinese sentences back through the same slot. See Figure 1.1. Searle says that he has performed the exact same task as the computer that passed the Turing test. That is, each is following a program that simulates intelligent behavior. However, Searle notes that he does not speak Chinese. Therefore, since he does not understand Chinese, the reasonable conclusion is that the computer does not understand Chinese either. Searle argues that if the computer is not understanding the conversation, then it is not thinking, and therefore it does not have an intelligent mind.

Searle formulated the philosophical position known as strong AI, which is

as follows:

The appropriately programmed computer really is a mind, in the sense that computers given the right programs can be literally said to understand and have other cognitive states.

- Searle 1980


Based on his Chinese room experiment, Searle concludes that strong AI is not possible. He states that “I can have any formal program you like, but I still understand nothing.” Searle’s paper resulted in a great deal of controversy and discussion for some time to come (see, for example, [Harnad, 2001]).

The position that computers could appear and behave intelligently, but not

necessarily understand, is called weak AI. The essence of the matter is whether a computer could actually have a mind (strong AI) or could only simulate a mind (weak AI). This distinction is perhaps of greater concern to the philosopher who is discussing the notion of consciousness [Chalmers, 1996]. Perhaps facetiously, a philosopher could even argue that emergentism might take place in the Chinese room experiment, and a mind might arise from Searle performing all his manipulations. Practically speaking, none of this is of concern to the computer scientist. If the program for all purposes behaves as if it is intelligent, computer scientists have achieved their goal.

1.1.2 Emergence of AI

Initial efforts at AI involved modeling the neurons in the brain. An artificial neuron is treated as a binary variable that is switched to either on or off. This notion was first proposed in [McCulloch and Pitts, 1943], and was furthered by Donald Hebb [1949] when he developed Hebbian learning for neural networks. In 1951 Marvin Minsky and Dean Edmonds built SNARC, the first neural network computer.

Following this accomplishment and Turing’s development of the Turing test,

researchers became increasingly interested in the study of neural networks and intelligent systems, resulting in John McCarthy1 organizing a two-month workshop involving interested researchers at Dartmouth College in 1956. He coined the term Artificial Intelligence at that workshop. Attendees included Minsky, Claude Shannon (the developer of information theory), and many others. AI emerged as a new discipline whose goal was to create computer systems that could learn, react, and make decisions in a complex, changing environment.
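The binary on/off neuron described above can be sketched in a few lines of code. This is only an illustrative sketch, not the original McCulloch-Pitts formalism or the SNARC hardware; the threshold, learning rate, and example inputs are all invented for illustration.

```python
# A sketch of a binary (McCulloch-Pitts style) artificial neuron,
# together with a simple Hebbian weight update. Threshold and
# learning rate are illustrative assumptions.

def mcp_neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted input sum reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def hebbian_update(inputs, weights, output, rate=0.1):
    """Hebb's rule: strengthen a weight when its input and the output are both active."""
    return [w + rate * x * output for x, w in zip(inputs, weights)]

# An AND-like unit: both inputs must be on for the neuron to fire.
weights = [1.0, 1.0]
print(mcp_neuron([1, 1], weights, threshold=2.0))  # 1
print(mcp_neuron([1, 0], weights, threshold=2.0))  # 0

# When the neuron fires, Hebbian learning strengthens the active connections.
weights = hebbian_update([1, 1], weights, output=1)
print(weights)  # [1.1, 1.1]
```

Treating the neuron as a pure threshold unit is what makes it a binary variable in the sense above: its state is fully determined by whether the weighted sum crosses the threshold.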

1.1.3 Cognitive Science and AI

Cognitive science is the discipline that studies the mind and its processes. It concerns how information is represented and processed by the mind. It is an interdisciplinary field spanning philosophy, psychology, artificial intelligence, neuroscience, linguistics, and anthropology, and emerged as its own discipline somewhat concurrently with AI. Cognitive science involves empirical studies of the mind, whereas AI concerns the development of an artificial mind. However, owing to their related endeavors, each field is able to borrow from the other.

1 John McCarthy developed the LISP programming language for AI applications and is considered by many to be the father of AI.


1.1.4 Logical Approach to AI

Most of the early successes of AI were based on modeling human logic. In 1955-56 Allen Newell and Herbert Simon developed a program called the Logic Theorist that was intended to mimic the problem-solving skills of a human being and is considered the first artificial intelligence program. It was able to prove 38 of the first 52 theorems in Whitehead and Russell’s Principia Mathematica, and find shorter proofs for some of them [McCorduck, 2004]. In 1961 Newell and Simon forwarded the General Problem Solver (GPS), which was a program intended to work as a universal problem-solver machine. Its reasoning was based on means-end analysis and the way humans handled goals and sub-goals while solving a problem. GPS was able to solve simple problems like the Towers of Hanoi, but did not scale up owing to combinatorial explosion. In 1959 Gelernter developed the Geometry Theorem Prover, which was able to prove theorems in geometry.

McCarthy [1958] describes a hypothetical program called the Advice Taker.

This program was unlike previous efforts in that it was designed to accept new axioms about the environment, and reason with them without being reprogrammed.

The main advantages we expect the advice taker to have is that its behaviour will be improvable merely by making statements to it, telling it about its symbolic environment and what is wanted from it. To make these statements will require little if any knowledge of the program or the previous knowledge of the advice taker. One will be able to assume that the advice taker will have available to it a fairly wide class of immediate logical consequences of anything it is told and its previous knowledge. This property is expected to have much in common with what makes us describe certain humans as having common sense. We shall therefore say that a program has common sense if it automatically deduces for itself a sufficiently wide class of immediate consequences of anything it is told and what it already knows.

- McCarthy 1958

The Advice Taker advanced an important notion in AI, namely the notion of separating the representation of the world (the knowledge) from the manipulation of the representation (the reasoning).

In order to obtain a manageable grasp on developing an entity that could

reason intelligently relative to all aspects of its world, researchers developed microworlds. The most well-known of these is the blocks world [Winograd, 1972], [Winston, 1973], which is discussed in Section 3.2.2. This world consists of a set of blocks placed on a table. A robot then has the task of manipulating the blocks in various ways.

These early successes in AI led researchers to be very optimistic about its

future. The following is a well-known quote:

It is not my aim to surprise or shock you — but the simplest way I can summarize is to say that there are now in the world machines


that think, that learn and that create. Moreover, their ability to do these things is going to increase rapidly until — in a visible future — the range of problems they can handle will be coextensive with the range to which the human mind has been applied.

- Simon 1957

However, systems that could prove theorems containing a limited number

of facts and systems that behaved well in a microworld failed to scale up to systems that could prove theorems involving many facts and ones that interact with complex worlds. One reason for this is combinatorial explosion. There are relatively few objects in a microworld and therefore there are not many possible actions. As the number of objects increases, the complexity of the search can increase exponentially. Another difficulty is in representing a complex world rather than a simple microworld.
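The explosion described above can be made concrete with a bit of arithmetic: if each state offers b applicable actions (the branching factor), examining all action sequences of length d costs on the order of b to the power d. The particular numbers below are invented purely to illustrate the growth.

```python
# Illustrative arithmetic (invented numbers) for combinatorial
# explosion: with branching factor b, a search to depth d examines
# on the order of b**d action sequences.

def sequence_count(branching_factor, depth):
    return branching_factor ** depth

# A microworld: few objects, hence few actions, stays tractable.
print(sequence_count(3, 5))    # 243
# A modestly richer world already overwhelms exhaustive search.
print(sequence_count(20, 10))  # 10240000000000
```

The exponential dependence on depth, not the constants, is what defeats the weak methods outside microworlds.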

1.1.5 Knowledge-based Systems

The initial AI efforts just described concerned the development of all-purpose intelligent programs, which worked in limited domains and solved relatively simple problems. However, these programs failed to scale up to handling difficult problems. Such methods are called weak methods because of their failure to scale up (not to be confused with weak AI discussed earlier). With HAL and the Terminator nowhere in sight, many researchers turned their efforts to developing useful systems that solved difficult problems in limited domains. These systems used powerful, domain-specific knowledge and are called knowledge-based systems. Since they often perform the task of an expert, another term for many such systems is expert systems. Ordinarily, they follow the approach McCarthy specified for the Advice Taker. That is, the knowledge is represented by rules about the particular domain, and the reasoning consists of general-purpose algorithms that manipulate the rules. The details of how this is done appear in Section 2.3.1.

Successful knowledge-based systems include DENDRAL [Lindsay et al., 1980],

a system for analyzing mass spectrograms in chemistry, XCON [McDermott, 1982], a system for configuring VAX computers, and ACRONYM [Brooks, 1981], a vision support system.

Initially, knowledge-based systems were based on logic, performed exact

inference, and arrived at categorical conclusions. However, in many domains, in particular medicine, we cannot be certain of our conclusions.

Why are categorical decisions not sufficient for all of medicine? Because the world is too complex! Although many decisions may be made straightforwardly, many are too difficult to be prescribed in any simple manner. While many factors may enter into a decision, when these factors may themselves be uncertain, when some factors may become unimportant depending on other factors, and when there is a significant cost associated with gathering information that may not actually be required for the decision, then the rigidity of the flowchart makes it an inappropriate decision-making instrument.


- Szolovits and Pauker 1978

Researchers searched for ways to incorporate uncertainty in the rules in their

knowledge-based systems. The most notable such effort was the incorporation of certainty factors in the MYCIN system [Buchanan and Shortliffe, 1984]. MYCIN is a medical expert system for diagnosing bacterial infections and prescribing treatments for them. Certainty factors are described in the introduction to Chapter 5.
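As a rough preview of the Chapter 5 material, one widely quoted MYCIN-style rule combines two certainty factors that both support the same hypothesis as CF = CF1 + CF2(1 - CF1). The sketch below covers only this positive-evidence case, and its numeric values are invented for illustration.

```python
# A sketch of combining MYCIN-style certainty factors when two rules
# both lend positive support to the same hypothesis; the values are
# invented for illustration (full treatment appears in Chapter 5).

def combine_positive(cf1, cf2):
    """CF = CF1 + CF2 * (1 - CF1), for 0 <= CF1, CF2 <= 1."""
    return cf1 + cf2 * (1 - cf1)

# Two rules each lending moderate support reinforce each other,
# and the combined factor can never exceed 1.
print(combine_positive(0.5, 0.5))  # 0.75
```

Note how this differs from simply adding the factors: the second rule only gets to close part of the remaining gap to certainty.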

1.1.6 Probabilistic Approach to AI

Neapolitan [1990] shows that the rule-based representation of uncertain knowledge and reasoning is not only cumbersome and complex, but also does not model how humans reason very well. Pearl [1986] made the more reasonable conjecture that humans identify local probabilistic causal relationships between individual propositions and reason with these relationships. At this same time, researchers in decision analysis [Shachter, 1986] were developing influence diagrams, which provide us with a normative decision in the face of uncertainty. In the 1980s researchers from cognitive science (e.g., Judea Pearl), computer science (e.g., Peter Cheeseman and Lotfi Zadeh), medicine (e.g., David Heckerman and Gregory Cooper), mathematics (e.g., Richard Neapolitan), and philosophy (e.g., Henry Kyburg) met at the newly formed Workshop on Uncertainty in Artificial Intelligence (now a conference) to discuss how best to perform uncertain inference in artificial intelligence. The texts Probabilistic Reasoning in Intelligent Systems [Pearl, 1988] and Probabilistic Reasoning in Expert Systems [Neapolitan, 1990] integrated many of the results of these discussions into the field we now call Bayesian networks. Bayesian networks have arguably become the standard for handling uncertain inference in AI, and many AI applications have been developed using them. Section 8.8 lists some of them.
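A single local causal relationship of the kind Pearl proposed can be sketched with Bayes' theorem. The two-node cause-effect structure and all of the probabilities below are invented for illustration; they are not drawn from the cited texts.

```python
# A minimal sketch (invented numbers) of reasoning over one local
# probabilistic causal relationship, cause -> effect, inverted
# with Bayes' theorem when the effect is observed.

def posterior(prior, p_effect_given_cause, p_effect_given_not_cause):
    """P(cause | effect observed) for a two-node network."""
    p_effect = (p_effect_given_cause * prior
                + p_effect_given_not_cause * (1 - prior))
    return p_effect_given_cause * prior / p_effect

# A rare disease (prior 0.01) with a sensitive but nonspecific
# symptom: observing the symptom raises the probability of the
# disease from 1% to about 8.3%.
print(round(posterior(0.01, 0.9, 0.1), 3))  # 0.083
```

A Bayesian network is, roughly, many such local links chained into a directed graph, with inference propagating evidence through them.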

1.1.7 Evolutionary Computation

A separate area of artificial intelligence called evolutionary computation [Frazer, 1958], [Holland, 1975], [Koza, 1992], [Fogel, 1994] emerged simultaneously with the efforts discussed so far. Evolutionary computation endeavors to obtain approximate solutions to problems such as optimization problems using the evolutionary mechanisms involved in natural selection as its paradigm.
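The paradigm can be sketched as a loop of random variation plus survival of the fitter candidate. This is a deliberately minimal sketch with an invented objective function and parameters, not any of the cited authors' algorithms.

```python
import random

# A minimal sketch of the evolutionary paradigm: a candidate solution
# is repeatedly mutated, and the better of parent and mutant survives.
# Objective, step size, and generation count are invented.

def evolve(fitness, candidate, generations=200, step=0.1, seed=0):
    rng = random.Random(seed)
    for _ in range(generations):
        mutant = candidate + rng.uniform(-step, step)   # random variation
        if fitness(mutant) >= fitness(candidate):       # selection
            candidate = mutant
    return candidate

# Maximize f(x) = -(x - 2)^2, whose optimum is x = 2.
best = evolve(lambda x: -(x - 2) ** 2, candidate=0.0)
print(best)  # close to 2
```

Population-based methods such as genetic algorithms elaborate this loop with many candidates, recombination, and probabilistic selection, but the variation-plus-selection core is the same.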

1.1.8 A Return to Creating HAL

The knowledge-based approach, the probabilistic approach, and evolutionary computation have resulted in many useful systems that behave intelligently or solve problems in limited domains. Examples were provided at the beginning of this introduction. Nevertheless, some of the early researchers in AI, including John McCarthy [2007] and Marvin Minsky [2007], felt that AI should quit focusing on developing systems that perform specialized tasks well and return to developing systems that think. In 2004 they held their first symposium on human-level AI [Minsky et al., 2004].


In 2007 the related field called Artificial General Intelligence (AGI) [Goertzel and Pennachin, 2007] appeared, and it now has its own journal titled the Journal of Artificial General Intelligence. Researchers in AGI are searching for a program that can learn and make decisions in any arbitrary environment.

Another current effort at developing a thinking entity is the work of Gerry

Edelman. Edelman [2006] explains the development and organization of higher brain functions in terms of a process known as neuronal group selection. He calls this model neural Darwinism. Based on this model, he has developed a number of robot-like brain-based devices (BBDs) that interact with real-world environments [Edelman, 2007]. However, they are able to navigate only in limited domains.

1.2 Contemporary Artificial Intelligence

The efforts of the human-level AI community, the AGI community, and Gerry Edelman are all vitally important if we hope to someday have an intelligent entity that reasons in a changing, complex environment. However, the approach taken in this text is to focus on the strong AI methods, which have resulted in developing systems that successfully solve interesting and important problems in limited domains. We call this a contemporary approach for an artificial intelligence text.

The early successes in AI were based on modeling logical reasoning. For

example, suppose Mary knows that if someone completes 120 credit hours and passes the comprehensive exam, that person will graduate. Suppose she then learns that Joe completed 120 credit hours and that he did not graduate. She reasons logically that for sure he did not pass the comprehensive exam. AI models based on such logical reasoning are the focus of Part I of this text.

As discussed above, in the 1970s it became increasingly apparent that many

judgments made by humans involve uncertain or probabilistic inference. For example, suppose Mary knows that studying hard will greatly increase the chances of getting an A on the exam; she also realizes that a smart person is more likely to get an A on the exam. She learns that Joe got an A. When Joe tells her that he did not study very hard, she reasons that Joe is probably smart. By 1990 the modeling of such probabilistic inference became commonplace in AI; such inference is the focus of Part II.

Intelligent behavior is not limited to human reasoning. Evolution itself

seems pretty smart in that creatures better able to adapt to their environment tend to survive, with the result being the existence of humans themselves. Researchers in evolutionary computation have solved interesting problems based on a model of natural selection. Such algorithms are discussed in Part III. In Parts I and II we assume the system is created from knowledge extracted from humans, who are often experts in some domain. However, an important aspect of intelligence involves acquiring new knowledge from experience or data. Part IV concerns learning both logical and probabilistic models from data. Finally, Part V discusses an important endeavor in AI, namely language understanding.
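Mary's credit-hours inference above can even be checked mechanically by enumerating truth assignments, which is the flavor of logical reasoning Part I develops; the variable names below are our own.

```python
from itertools import product

# Mechanically checking Mary's inference about Joe: given the rule
# (credits and exam) -> graduate, together with credits being true
# and graduate false, every consistent assignment has exam False.

def exam_must_be_false():
    for credits, exam, graduate in product([False, True], repeat=3):
        rule_holds = (not (credits and exam)) or graduate
        if rule_holds and credits and not graduate and exam:
            return False  # a counterexample: Joe passed after all
    return True

print(exam_must_be_false())  # True
```

Exhaustive enumeration works here because there are only three propositions; the combinatorial explosion discussed in Section 1.1.4 is exactly why it stops working as the world grows.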