
  • MACHINE LEARNING SYSTEMS

    In this section we will examine machine learning and its related terms. Unlike other AI

    systems, machine learning has had limited successes but useful demonstrations, and much of

    the work is still at the research stage. Learning will be considered with agents in mind.

    Terminology in machine learning

    Learning

    Learning is the process by which an agent uses percepts to improve its ability to act in the future.

    As a process it takes place as the agent interacts with the world, and as the agent assesses

    its own decision-making processes.

    Learning element

    Learning element is the part of the agent that is responsible for making improvements.

    Performance element

    Performance element is the part of an agent that selects external actions. Knowledge from the

    learning element, together with feedback on how the agent is doing, is used to determine how the

    performance element should be modified to do better in the future.

    Critic

    This is the part of the learning agent that tells the learning element how well the agent is doing.

    A fixed standard of performance may be used; this standard should be conceptually

    outside the agent.

    Problem generator

    This is the part of the agent that suggests actions that may lead to new informative

    experiences. Exploratory actions are suggested.

    Model of a learning agent

    [Figure: model of a learning agent, adapted from Russell & Norvig, p. 526. Inside the agent, the critic gives feedback to the learning element; the learning element makes changes to the performance element, drawing on knowledge and setting learning goals for the problem generator. Sensors bring percepts in from the environment, and effectors act on it.]

    Supervised learning

    Supervised learning is the learning situation in which both the inputs and outputs can be

    perceived. Sometimes a friendly teacher can supply the outputs.

  • Reinforcement learning

    Reinforcement learning is a type of learning situation in which the agent does not know the

    outcomes but is given some form of evaluative feedback on its actions. It is, however, not told

    what the correct action would have been.

    Unsupervised learning

    Unsupervised learning is a type of learning in which no hint at all is given about the

    correct outputs.

    Example

    An example is a pair (x, f(x)), where x is the input and f(x) is the output of the function f applied

    to x.

    Hypothesis

    Suppose (x, f(x)) is an example; then a hypothesis, h, is an approximation of the function f.

    APPLICATIONS OF MACHINE LEARNING

    The main aim of machine learning is to make computer systems that can learn. If machines

    learn, their ability to solve problems will be enhanced considerably. In research, learning

    has found applications related to knowledge acquisition, planning, and problem

    solving. Some areas that grew out of machine learning research, such as data mining, have

    seen intensive research in recent times. Specifically, some of these

    applications include:

    Where there are very many examples and we have no function to generate the outputs,

    machine learning techniques can be used to allow the system to search for suitable functions

    (hypotheses).

    Where we have massive amounts of data with hidden relationships, we can use machine

    learning techniques to discover the relationships (data mining).

    Sometimes machines cannot be built to do what is required because of some limitation; if

    machines can learn, they can improve their performance over time.

    Where too much knowledge is available for people to cope with,

    machines can be used to learn as much of it as possible.

    Environments change over time, so machines that can adapt avoid the need to design new ones.

    New knowledge is constantly being discovered by humans, new vocabulary arises, and new world events

    stream in, so AI systems would otherwise have to be re-designed continually. Instead, learning

    systems may be built.

    (These reasons come from: Nils J. Nilsson (1996). Introduction to Machine Learning.

    Internet.)

    TECHNIQUES USED IN MACHINE LEARNING

    Machine learning depends on several methods, including induction, learning from examples, observation,

    and neural networks.

    Induction

  • The pure inductive inference problem seeks to find a hypothesis, h, that approximates the

    function, f, given examples (x, f(x)). Consider a plot of points: the possible curves that

    can be drawn through them suggest various functions (hypotheses, h) that can approximate the original

    function. Where there is a preference for one hypothesis over another beyond mere consistency with the examples, we

    say there is a bias.

    Consider an agent that has a reflex learning element that updates a global variable, examples,

    holding a list of (percept, action) pairs. When the agent is confronted with a percept and it

    is looking for an action, it first checks the list. If the percept is there, it applies the corresponding action;

    otherwise it must formulate a hypothesis, h, that is used for selecting the action. If the agent,

    instead of forming a new hypothesis each time, adjusts the old hypothesis, then we say incremental

    learning occurs. The skeleton algorithms for a reflex learning agent are given below.

    Global examples ← {}

    Function reflex-performance-element(percept) returns an action
        If (percept, a) is in examples then return a
        Else
            H ← induce(examples)    /* i.e. find a hypothesis based on the examples */
            Return H(percept)

    Procedure reflex-learning-element(percept, action)
        Inputs: percept, feedback percept
                action, feedback action
        Examples ← Examples ∪ {(percept, action)}
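    The skeleton above can be sketched in Python. The induce routine here is a deliberately trivial stand-in (it returns a constant "majority action" hypothesis, an assumption made only for illustration); any real induction method could be plugged in:

```python
# Sketch of the reflex learning agent skeleton above.
from collections import Counter

examples = {}  # maps percept -> action, the agent's stored examples

def induce(examples):
    """Stand-in induction: return a hypothesis (a function percept -> action).
    Here the hypothesis simply predicts the most common stored action."""
    if not examples:
        return lambda p: None
    most_common = Counter(examples.values()).most_common(1)[0][0]
    return lambda p: most_common

def reflex_performance_element(percept):
    if percept in examples:            # known percept: apply stored action
        return examples[percept]
    h = induce(examples)               # otherwise induce a hypothesis
    return h(percept)

def reflex_learning_element(percept, action):
    examples[percept] = action         # store the (percept, action) pair
```

A real agent would replace induce with a genuine learner, such as the decision-tree method discussed next.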

    We consider two inductive learning methods, namely decision trees and version spaces.

  • Decision trees

    In a decision tree, the inputs are objects or situations described by a set of properties, while the

    outputs are yes or no decisions. Each node holds a test on the value of one of the

    properties, and the branches from the node are labelled with the possible values of the test result.

    Each leaf specifies the Boolean value returned if that leaf is reached. An example is given below:

    [Figure: a decision tree whose root tests the Patrons attribute, with branches labelled None, Some and Full.]

  • The table is processed attribute by attribute, selecting the attribute that minimizes noise

    or maximizes information. A typical example here is the ID3 algorithm.

    Applicant | Annual income | Assets  | Age | Dependants | Decision
    Okello    | 50,000        | 100,000 | 30  | 3          | Yes
    Kamoro    | 70,000        | None    | 35  | 1          | Yes
    Mulei     | 40,000        | None    | 33  | 2          | No
    Wanjiru   | 30,000        | 250,000 | 42  | 0          | Yes

    (Turban & Aronson, p. 507)

    [Figure: a decision tree for the loan data. The root tests "Assets available?"; the Yes branch leads to the decision Yes, while the No branch tests "Annual income > 40,000", whose Yes branch leads to Yes and whose No branch leads to No.]

    Logically: ∀A has_assets(A) ∨ annual_income(A, >40,000) ⇒ approve_loan_for(A).

    A decision tree learning algorithm (Russell & Norvig, p. 537)

    Function decision-tree-learning(examples, attributes, default) returns a decision tree
        Inputs: examples, set of examples
                attributes, set of attributes
                default, default value for the goal predicate
        If examples is empty then return default
        Else if all examples have the same classification then return the classification
        Else if attributes is empty then return majority-value(examples)
        Else
            best ← choose-attribute(attributes, examples)
            tree ← a new decision tree with root test best
            For each value vi of best do
                examplesi ← {elements of examples with best = vi}
                subtree ← decision-tree-learning(examplesi, attributes − best, majority-value(examples))
                add a branch to tree with label vi and subtree subtree
            End
            Return tree
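    The algorithm above can be sketched in Python. The split-quality measure below (fewest misclassified examples per split) is a crude, assumed stand-in for ID3's information-gain heuristic, and the "Decision" field name is an illustrative convention:

```python
# Minimal sketch of decision-tree-learning over examples stored as dicts.
from collections import Counter

def majority_value(examples):
    return Counter(e["Decision"] for e in examples).most_common(1)[0][0]

def choose_attribute(attributes, examples):
    # Crude stand-in for information gain: pick the attribute whose
    # split leaves the fewest majority-rule misclassifications.
    def errors(attr):
        total = 0
        for v in set(e[attr] for e in examples):
            subset = [e for e in examples if e[attr] == v]
            maj = majority_value(subset)
            total += sum(1 for e in subset if e["Decision"] != maj)
        return total
    return min(attributes, key=errors)

def decision_tree_learning(examples, attributes, default):
    if not examples:
        return default
    classes = set(e["Decision"] for e in examples)
    if len(classes) == 1:                      # all examples agree
        return classes.pop()
    if not attributes:
        return majority_value(examples)
    best = choose_attribute(attributes, examples)
    tree = {best: {}}
    for v in set(e[best] for e in examples):   # one branch per value
        subset = [e for e in examples if e[best] == v]
        rest = [a for a in attributes if a != best]
        tree[best][v] = decision_tree_learning(subset, rest,
                                               majority_value(examples))
    return tree
```

The learned tree comes back as nested dicts, e.g. {"Assets": {"yes": "Yes", "no": {...}}}, mirroring the branch-per-value structure in the pseudocode.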

    Two success reports of decision tree learning

    BP deployed expert system GASOIL in 1986, for gas-oil separation for offshore platforms

    that had about 2500 rules. The attributes included relative proportions of gas, oil, and water

    and the flow rate, pressure, density, viscosity, temperature and susceptibility to waxing. The

    decision tree learning methods were applied to a database of existing designs and the system

    was developed in less time with the performance better than human experts, saving BP

    millions of dollars (Russel and Novig, P539).


  • A program was written to fly a flight simulator by observing real flights about 30 times.

    The embedded flight simulator could then do better than human pilots, in that it made fewer

    mistakes.

    Versioning

    Versioning is another inductive technique that we will outline. This technique depends on

    Hypotheses which are candidate functions that may be used to estimate the actual functions.

    For instance the example above where a decision tree was used for the determining whether

    a patron will wait may have the following hypotheses:

    ∀P WillWait(P) ⇔ Patrons(P, Some)                     H1
    ∀P WillWait(P) ⇔ Patrons(P, Full) ∧ Hungry(P)         H2
    ∀P WillWait(P) ⇔ WaitEstimate(P, 0-10)                H3
    ∀P WillWait(P) ⇔ Hungry(P) ∧ Alternative(P)           H4
    ⋮
    Hn

    Consider the hypothesis space {H1, H2, …, Hn}. The learning algorithm assumes that one of

    the hypotheses is correct, that is, that the disjunction H1 ∨ H2 ∨ … ∨ Hn holds.

    Each hypothesis predicts a set of examples; this set is called the extension of the

    predicate.

    False negative examples. These are examples that, according to the hypothesis, should be

    negative but are actually positive.

    False positive examples. These are examples that, according to the hypothesis, should be

    positive but are actually negative.

    The idea is to readjust the hypotheses so that the classifications are correct, with no false

    positives or false negatives. Two approaches are used to maintain logical consistency of the

    hypotheses.

    Current-best hypothesis search

    A single hypothesis is maintained and is adjusted as new examples are encountered. Where a

    hypothesis has been working well and a false negative occurs, it must be extended to

    include the example; this is called generalization. Conversely, when the hypothesis has been

    working and a false positive occurs, it must be cut down to exclude the

    example; this is called specialization. An algorithm describing the

    process is given below:

    Function current-best-learning(examples) returns a hypothesis
        H ← any hypothesis consistent with the first example
        For each remaining example e in examples do
            If e is a false positive for H then
                H ← choose a specialization of H consistent with the examples
            Else if e is a false negative for H then
                H ← choose a generalization of H consistent with the examples
            If no consistent specialization/generalization can be found then fail
        End
        Return H
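    A minimal sketch of this search, under the illustrative assumption that hypotheses are numeric thresholds, h(x) true iff x ≥ t: a false negative generalizes by lowering t to admit the example, a false positive specializes by raising t just past it, and the search fails when no threshold remains consistent:

```python
# Sketch of current-best-learning for threshold hypotheses h(x): x >= t.
def current_best_learning(examples):
    """examples: list of (x, label) pairs with label True/False."""
    seen = [examples[0]]
    x0, y0 = examples[0]
    t = x0 if y0 else x0 + 1           # consistent with the first example
    for x, y in examples[1:]:
        seen.append((x, y))
        if (x >= t) and not y:         # false positive: specialize (raise t)
            t = x + 1
        elif (x < t) and y:            # false negative: generalize (lower t)
            t = x
        # the minimal adjustment above is the only candidate for this
        # hypothesis class, so inconsistency here means failure
        if any((xi >= t) != yi for xi, yi in seen):
            raise ValueError("no consistent specialization/generalization")
    return t
```

With richer hypothesis spaces, choosing a specialization or generalization may require backtracking over several candidates; the threshold case needs only one.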

    Least-commitment search

    Another technique for finding a consistent hypothesis is to start with the original disjunction of all

    the hypotheses, H1 ∨ H2 ∨ … ∨ Hn. This original set is then reduced as hypotheses that are inconsistent with the examples are dropped. The final set that remains is called a

    version space. The version space learning algorithm is given below:

    Function version-space-learning(examples) returns a version space
        Local variables: V, the version space (the set of all possible hypotheses)
        V ← the set of all hypotheses
        For each example e in examples do
            If V is not empty then V ← version-space-update(V, e)
        End
        Return V

    Function version-space-update(V, e) returns an updated version space
        V ← {h ∈ V : h is consistent with e}
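    Under the assumption of an explicit, finite hypothesis space, the two functions above reduce to straightforward filtering. The numeric threshold hypotheses in the sketch below are an illustrative choice, not part of the original algorithm:

```python
# Sketch of version-space-learning over an explicit, finite hypothesis
# space: each hypothesis is a predicate, and the update step filters out
# hypotheses inconsistent with the new example.

def version_space_update(V, e):
    x, label = e
    return [h for h in V if h(x) == label]

def version_space_learning(V, examples):
    for e in examples:
        if V:                          # stop filtering once V collapses
            V = version_space_update(V, e)
    return V

# Illustrative hypothesis space: thresholds h(x) = (x >= t), t = 0..9.
hypotheses = [lambda x, t=t: x >= t for t in range(10)]
survivors = version_space_learning(hypotheses, [(5, True), (2, False)])
```

Every surviving hypothesis classifies all the examples correctly; the version space shrinks monotonically as examples arrive.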

    EXERCISES

    1. What is learning?
    2. What is machine learning?
    3. Define the terms performance element, critic, problem generator, supervised learning, reinforcement learning, unsupervised learning, example, hypothesis.
    4. Describe a model of a learning agent.
    5. Discuss applications of machine learning.
    6. Describe the techniques used in inductive learning.
    7. Show how decision trees are used in learning.
    8. Describe learning by versioning.
    9. Investigate other areas of machine learning.

  • GENETIC ALGORITHMS AND EVOLUTIONARY ALGORITHMS.

    An evolutionary algorithm (EA) is a heuristic optimization algorithm using techniques

    inspired by mechanisms from organic evolution such as mutation, recombination, and natural

    selection to find an optimal configuration for a specific system within specific constraints.

    A genetic or evolutionary algorithm applies the principles of evolution found in nature to

    the problem of finding an optimal solution to a Solver problem. In a "genetic algorithm," the

    problem is encoded in a series of bit strings that are manipulated by the algorithm; in an

    "evolutionary algorithm," the decision variables and problem functions are used directly.

    Most commercial Solver products are based on evolutionary algorithms. An evolutionary

    algorithm for optimization is different from "classical" optimization methods in several ways:

    Random Versus Deterministic Operation

    Population Versus Single Best Solution

    Creating New Solutions Through Mutation

    Combining Solutions Through Crossover

    Selecting Solutions Via "Survival of the Fittest"

    Each of these characteristics, and then the drawbacks of evolutionary algorithms, is discussed below.

    Randomness. First, it relies in part on random sampling. This makes it a nondeterministic

    method, which may yield somewhat different solutions on different runs -- even if you

    haven't changed your model. In contrast, the linear, nonlinear and integer Solvers also

    included in the Premium Solver are deterministic methods -- they always yield the same

    solution if you start with the same values in the decision variable cells.

    Population. Second, where most classical optimization methods maintain a single best

    solution found so far, an evolutionary algorithm maintains a population of candidate

    solutions. Only one (or a few, with equivalent objectives) of these is "best," but the other

    members of the population are "sample points" in other regions of the search space, where a

    better solution may later be found.

    The use of a population of solutions helps the evolutionary algorithm avoid becoming

    "trapped" at a local optimum, when an even better optimum may be found outside the

    vicinity of the current solution.

    Mutation. Third -- inspired by the role of mutation of an organism's DNA in natural

    evolution -- an evolutionary algorithm periodically makes random changes or mutations in

    one or more members of the current population, yielding a new candidate solution (which

    may be better or worse than existing population members).

    There are many possible ways to perform a "mutation," and the Evolutionary Solver actually

    employs three different mutation strategies. The result of a mutation may be an infeasible

    solution, and the Evolutionary Solver attempts to "repair" such a solution to make it feasible;

    this is sometimes, but not always, successful.

    Crossover. Fourth -- inspired by the role of sexual reproduction in the evolution of living

    things -- an evolutionary algorithm attempts to combine elements of existing solutions in

    order to create a new solution, with some of the features of each "parent." The elements (e.g.

    decision variable values) of existing solutions are combined in a "crossover" operation,

    inspired by the crossover of DNA strands that occurs in reproduction of biological organisms.

    As with mutation, there are many possible ways to perform a crossover operation -- some

    much better than others -- and the Evolutionary Solver actually employs multiple variations

    of two different crossover strategies.

    Selection. Fifth -- inspired by the role of natural selection in evolution -- an evolutionary

    algorithm performs a selection process in which the "most fit" members of the population

    survive, and the "least fit" members are eliminated. In a constrained optimization problem,

    the notion of "fitness" depends partly on whether a solution is feasible (i.e. whether it

    satisfies all of the constraints), and partly on its objective function value. The selection

    process is the step that guides the evolutionary algorithm towards ever-better solutions.
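    Taken together, the mutation, crossover and selection steps above can be sketched in a toy genetic algorithm. The bit-string "one-max" problem (maximize the number of 1-bits), the parameter values, and the keep-the-fitter-half selection scheme below are all illustrative assumptions, not any particular Solver's implementation:

```python
# Toy genetic algorithm on bit strings: fitness = number of 1-bits.
import random

def genetic_algorithm(length=20, pop_size=30, generations=60,
                      mutation_rate=0.01, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    fitness = sum                          # counts the 1-bits

    for _ in range(generations):
        # Selection: the fitter half survives ("survival of the fittest").
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            # Crossover: splice two parents at a random cut point.
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]
            # Mutation: occasionally flip a bit in the child.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = genetic_algorithm()
```

Because the fitter half of each population survives unchanged, the best fitness never decreases from one generation to the next.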

    Drawbacks. A drawback of any evolutionary algorithm is that a solution is "better" only in

    comparison to other, presently known solutions; such an algorithm actually has no concept

    of an "optimal solution," or any way to test whether a solution is optimal. (For this reason,

    evolutionary algorithms are best employed on problems where it is difficult or impossible to

    test for optimality.) This also means that an evolutionary algorithm never knows for certain

    when to stop, aside from the length of time, or the number of iterations or candidate

    solutions, that you wish to allow it to explore.

    APPLICATIONS

    Evolutionary algorithms often perform well approximating solutions to all types of problems

    because they ideally do not make any assumption about the underlying fitness landscape; this

    generality is shown by successes in fields as diverse as engineering, art, biology, economics,

    marketing, genetics, operations research, robotics, social sciences, physics, politics and

    chemistry.

    When are Evolutionary Algorithms Useful? (You can also research other application areas.)

    Evolutionary algorithms are typically used to provide good approximate solutions to problems

    that cannot be solved easily using other techniques. Many optimisation problems fall into this

    category. It may be too computationally-intensive to find an exact solution but sometimes a

    near-optimal solution is sufficient. In these situations evolutionary techniques can be effective.

    Due to their random nature, evolutionary algorithms are never guaranteed to find an optimal

    solution for any problem, but they will often find a good solution if one exists.

    One example of this kind of optimisation problem is the challenge of timetabling. Schools

    and universities must arrange room and staff allocations to suit the needs of their curriculum.

    There are several constraints that must be satisfied. A member of staff can only be in one place

    at a time, they can only teach classes that are in their area of expertise, rooms cannot host

    lessons if they are already occupied, and classes must not clash with other classes taken by the

    same students. This is a combinatorial problem and known to be NP-Hard. It is not feasible to

    exhaustively search for the optimal timetable due to the huge amount of computation involved.

    Instead, heuristics must be used. Genetic algorithms have proven to be a successful way of

    generating satisfactory solutions to many scheduling problems.
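    For a genetic algorithm to tackle timetabling, it only needs a fitness function that scores candidate timetables. A minimal sketch counting violations of the constraints listed above (the data structures, a list of (class, room, slot, teacher) tuples and a teacher-expertise map, are illustrative assumptions; student clashes are omitted for brevity):

```python
# Sketch of a timetabling fitness measure: fewer violations is fitter.
def timetable_violations(timetable, expertise):
    """timetable: list of (class_id, room, slot, teacher) tuples.
    expertise: dict mapping teacher -> set of classes they can teach."""
    violations = 0
    booked_rooms, booked_teachers = set(), set()
    for class_id, room, slot, teacher in timetable:
        if (room, slot) in booked_rooms:          # room double-booked
            violations += 1
        if (teacher, slot) in booked_teachers:    # teacher in two places
            violations += 1
        if class_id not in expertise.get(teacher, set()):
            violations += 1                       # outside expertise
        booked_rooms.add((room, slot))
        booked_teachers.add((teacher, slot))
    return violations
```

A genetic algorithm would then evolve assignments of rooms, slots and teachers, selecting timetables with fewer violations until a zero-violation (feasible) timetable emerges.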

  • Evolutionary algorithms can also be used to tackle problems that humans don't really know

    how to solve. An EA, free of any human preconceptions or biases, can generate surprising

    solutions that are comparable to, or better than, the best human-generated efforts. It is merely

    necessary that we can recognise a good solution if it were presented to us, even if we don't

    know how to create a good solution. In other words, we need to be able to formulate an effective

    fitness function.

    Engineers working for NASA know a lot about physics. They know exactly which

    characteristics make for a good communications antenna. But the process of designing an

    antenna so that it has the necessary properties is hard. Even though the engineers know what is

    required from the final antenna, they may not know how to design the antenna so that it satisfies

    those requirements.

    NASA's Evolvable Systems Group has used evolutionary algorithms to successfully evolve

    antennas for use on satellites. These evolved antennas have irregular shapes with no obvious

    symmetry. It is unlikely that a human expert would

    have arrived at such an unconventional design. Despite this, when tested, these antennas proved

    to be extremely well adapted to their purpose.