
  • MACHINE LEARNING SYSTEMS

    In this section we will examine machine learning and its related terms. Unlike other AI

    systems, machine learning has had limited successes but useful demonstrations, and much of

    the work is still at the research stage. Learning will be considered with agents in mind.

    Terminology in machine learning

    Learning

    Learning is the process by which an agent uses percepts to improve its ability to act in the future.

    As a process it takes place as the agent interacts with the world, and as the agent assesses

    its own decision-making processes.

    Learning element

    Learning element is the part of the agent that is responsible for making improvements.

    Performance element

    Performance element is the part of an agent that selects external actions. Knowledge from the

    learning element, together with feedback on how the agent is doing, is used to determine how the

    performance element should be modified to do better in the future.

    Critic

    This is the part of the learning agent that tells the learning element how well the agent is doing.

    A fixed standard of performance may be used; this standard should be conceptually

    outside the agent.

    Problem generator

    This is the part of the agent that suggests actions that may lead to new informative

    experiences. Exploratory actions are suggested.

    Model of a learning agent

    [Figure: model of a learning agent, adapted from Russell & Norvig, p. 526. Inside the agent, the critic gives feedback to the learning element; the learning element makes changes to the performance element, drawing on knowledge and setting learning goals for the problem generator. Sensors bring percepts in from the environment, and effectors act on it.]

    Supervised learning

    Supervised learning is the learning situation in which both the inputs and outputs can be

    perceived. Sometimes a friendly teacher can supply the outputs.

  • Reinforcement learning

    Reinforcement learning is a type of learning situation in which the agent does not know the

    outcomes but is given some form of evaluative feedback on its actions. It is, however, not told

    what the correct action would have been.

    Unsupervised learning

    Unsupervised learning is a type of learning in which no hint at all is given about the

    correct outputs.

    Example

    An example is a pair (x, f(x)), where x is the input and f(x) is the output of the function f applied

    to x.

    Hypothesis

    Suppose (x, f(x)) is an example; then a hypothesis, h, is an approximation of the function f.

    APPLICATIONS OF MACHINE LEARNING

    The main aim of machine learning is to make computer systems that can learn. If machines

    learn, their ability to solve problems will be enhanced considerably. In research, learning

    has found applications related to knowledge acquisition, planning, and problem

    solving. Some areas that grew out of machine learning research, such as data mining, have

    seen intensive research in recent times. Specifically, some of these

    applications include:

    Where there are very many examples and we have no function to generate the outputs,

    machine learning techniques can be used to allow the system to search for suitable functions

    (hypotheses).

    Where we have massive amounts of data with hidden relationships, we can use machine

    learning techniques to discover the relationships (data mining).

    Sometimes machines cannot be built to do what is required because of some limitation; if

    machines can learn, they can improve their performance over time.

    Where too much knowledge is available for people to cope with,

    machines can be used to learn as much of it as possible.

    Environments change over time, so machines that can adapt avoid the need to design new ones.

    New knowledge is constantly being discovered by humans, new vocabulary arises, and new world events

    stream in, so AI systems would otherwise have to be re-designed continually. Instead, learning

    systems may be built.

    (These reasons come from: Nils J. Nilsson (1996). Introduction to Machine Learning.

    Internet.)

    TECHNIQUES USED IN MACHINE LEARNING

    Machine learning depends on several methods, including induction, learning from examples, observation,

    and neural networks.

    Induction

  • The pure inductive inference problem seeks to find a hypothesis, h, that approximates the

    function, f, given examples (x, f(x)). Consider a plot of points: the possible curves that

    can be drawn through them suggest various functions (hypotheses, h) that can approximate the original

    function. Where there is a preference for one hypothesis over another beyond mere consistency with the examples, we

    say there is a bias.

    Consider an agent that has a reflex learning element that updates a global variable, examples,

    holding a list of (percept, action) pairs. When the agent is confronted with a percept and it

    is looking for an action, it first checks the list. If the percept is there, it applies the corresponding action;

    otherwise it must formulate a hypothesis, h, that is used for selecting the action. If the agent,

    instead of forming a new hypothesis each time, adjusts the old hypothesis, then we say incremental

    learning occurs. The skeleton algorithms for a reflex learning agent are given below.

    Global examples ← {}

    Function reflex-performance-element(percept) returns an action
        If (percept, a) is in examples then return a
        Else
            H ← induce(examples)    /* i.e. find a hypothesis based on the examples */
            Return H(percept)

    Procedure reflex-learning-element(percept, action)
        Inputs: percept, feedback percept
                action, feedback action
        Examples ← Examples ∪ {(percept, action)}
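    The skeleton above can be sketched in Python. The induce routine here is a deliberately trivial stand-in (it returns a constant "majority action" hypothesis, an assumption made only for illustration); any real induction method could be plugged in:

```python
# Sketch of the reflex learning agent skeleton above.
from collections import Counter

examples = {}  # maps percept -> action, the agent's stored examples

def induce(examples):
    """Stand-in induction: return a hypothesis (a function percept -> action).
    Here the hypothesis simply predicts the most common stored action."""
    if not examples:
        return lambda p: None
    most_common = Counter(examples.values()).most_common(1)[0][0]
    return lambda p: most_common

def reflex_performance_element(percept):
    if percept in examples:            # known percept: apply stored action
        return examples[percept]
    h = induce(examples)               # otherwise induce a hypothesis
    return h(percept)

def reflex_learning_element(percept, action):
    examples[percept] = action         # store the (percept, action) pair
```

A real agent would replace induce with a genuine learner, such as the decision-tree method discussed next.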

    We consider two inductive learning methods, namely decision trees and version spaces.

  • Decision trees

    In a decision tree, the inputs are objects or situations described by a set of properties, while the

    outputs are yes or no decisions. Each node holds a test on the value of one of the

    properties, and the branches from the node are labelled with the possible values of the test result.

    Each leaf specifies the Boolean value returned if that leaf is reached. An example is given below:

    [Figure: a decision tree whose root tests the Patrons attribute, with branches labelled None, Some and Full.]

  • The table is processed attribute by attribute, selecting the attribute that minimizes noise

    or maximizes information. A typical example here is the ID3 algorithm.

    Applicant | Annual income | Assets  | Age | Dependants | Decision
    Okello    | 50,000        | 100,000 | 30  | 3          | Yes
    Kamoro    | 70,000        | None    | 35  | 1          | Yes
    Mulei     | 40,000        | None    | 33  | 2          | No
    Wanjiru   | 30,000        | 250,000 | 42  | 0          | Yes

    (Turban & Aronson, p. 507)

    [Figure: a decision tree for the loan data. The root tests "Assets available?"; the Yes branch leads to the decision Yes, while the No branch tests "Annual income > 40,000", whose Yes branch leads to Yes and whose No branch leads to No.]

    Logically: ∀A has_assets(A) ∨ annual_income(A, >40,000) ⇒ approve_loan_for(A).

    A decision tree learning algorithm (Russell & Norvig, p. 537)

    Function decision-tree-learning(examples, attributes, default) returns a decision tree
        Inputs: examples, set of examples
                attributes, set of attributes
                default, default value for the goal predicate
        If examples is empty then return default
        Else if all examples have the same classification then return the classification
        Else if attributes is empty then return majority-value(examples)
        Else
            best ← choose-attribute(attributes, examples)
            tree ← a new decision tree with root test best
            For each value vi of best do
                examplesi ← {elements of examples with best = vi}
                subtree ← decision-tree-learning(examplesi, attributes − best, majority-value(examples))
                add a branch to tree with label vi and subtree subtree
            End
            Return tree
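    The algorithm above can be sketched in Python. The split-quality measure below (fewest misclassified examples per split) is a crude, assumed stand-in for ID3's information-gain heuristic, and the "Decision" field name is an illustrative convention:

```python
# Minimal sketch of decision-tree-learning over examples stored as dicts.
from collections import Counter

def majority_value(examples):
    return Counter(e["Decision"] for e in examples).most_common(1)[0][0]

def choose_attribute(attributes, examples):
    # Crude stand-in for information gain: pick the attribute whose
    # split leaves the fewest majority-rule misclassifications.
    def errors(attr):
        total = 0
        for v in set(e[attr] for e in examples):
            subset = [e for e in examples if e[attr] == v]
            maj = majority_value(subset)
            total += sum(1 for e in subset if e["Decision"] != maj)
        return total
    return min(attributes, key=errors)

def decision_tree_learning(examples, attributes, default):
    if not examples:
        return default
    classes = set(e["Decision"] for e in examples)
    if len(classes) == 1:                      # all examples agree
        return classes.pop()
    if not attributes:
        return majority_value(examples)
    best = choose_attribute(attributes, examples)
    tree = {best: {}}
    for v in set(e[best] for e in examples):   # one branch per value
        subset = [e for e in examples if e[best] == v]
        rest = [a for a in attributes if a != best]
        tree[best][v] = decision_tree_learning(subset, rest,
                                               majority_value(examples))
    return tree
```

The learned tree comes back as nested dicts, e.g. {"Assets": {"yes": "Yes", "no": {...}}}, mirroring the branch-per-value structure in the pseudocode.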

    Two success reports of decision tree learning

    BP deployed expert system GASOIL in 1986, for gas-oil separation for offshore platforms

    that had about 2500 rules. The attributes included relative proportions of gas, oil, and water

    and the flow rate, pressure, density, viscosity, temperature and susceptibility to waxing. The

    decision tree learning methods were applied to a database of existing designs and the system

    was developed in less time with the performance better than human experts, saving BP

    millions of dollars (Russel and Novig, P539).


  • A program was written to fly a flight simulator by observing real flights about 30 times.

    The embedded flight simulator could then do better than human pilots, in that it made fewer

    mistakes.

    Versioning

    Versioning is another inductive technique that we will outline. This technique depends on

    Hypotheses which are candidate functions that may be used to estimate the actual functions.

    For instance the example above where a decision tree was used for the determining whether

    a patron will wait may have the following hypotheses:

    ∀P WillWait(P) ⇔ Patrons(P, Some)                     H1
    ∀P WillWait(P) ⇔ Patrons(P, Full) ∧ Hungry(P)         H2
    ∀P WillWait(P) ⇔ WaitEstimate(P, 0-10)                H3
    ∀P WillWait(P) ⇔ Hungry(P) ∧ Alternative(P)           H4
    ⋮
    Hn

    Consider the hypothesis space {H1, H2, …, Hn}. The learning algorithm assumes that one of

    the hypotheses is correct, that is, that the disjunction H1 ∨ H2 ∨ … ∨ Hn holds.

    Each hypothesis predicts a set of examples; this set is called the extension of the

    predicate.

    False negative examples. These are examples that, according to the hypothesis, should be

    negative but are actually positive.

    False positive examples. These are examples that, according to the hypothesis, should be

    positive but are actually negative.

    The idea is to readjust the hypotheses so that the classifications are correct, with no false

    positives or false negatives. Two approaches are used to maintain logical consistency of the

    hypotheses.

    Current-best hypothesis search

    A single hypothesis is maintained and is adjusted as new examples are encountered. Where a

    hypothesis has been working well and a false negative occurs, it must be extended to

    include the example; this is called generalization. Conversely, when the hypothesis has been

    working and a false positive occurs, it must be cut down to exclude the

    example; this is called specialization. An algorithm describing the

    process is given below:

    Function current-best-learning(examples) returns a hypothesis
        H ← any hypothesis consistent with the first example
        For each remaining example e in examples do
            If e is a false positive for H then
                H ← choose a specialization of H consistent with the examples
            Else if e is a false negative for H then
                H ← choose a generalization of H consistent with the examples
            If no consistent specialization/generalization can be found then fail
        End
        Return H
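    A minimal sketch of this search, under the illustrative assumption that hypotheses are numeric thresholds, h(x) true iff x ≥ t: a false negative generalizes by lowering t to admit the example, a false positive specializes by raising t just past it, and the search fails when no threshold remains consistent:

```python
# Sketch of current-best-learning for threshold hypotheses h(x): x >= t.
def current_best_learning(examples):
    """examples: list of (x, label) pairs with label True/False."""
    seen = [examples[0]]
    x0, y0 = examples[0]
    t = x0 if y0 else x0 + 1           # consistent with the first example
    for x, y in examples[1:]:
        seen.append((x, y))
        if (x >= t) and not y:         # false positive: specialize (raise t)
            t = x + 1
        elif (x < t) and y:            # false negative: generalize (lower t)
            t = x
        # the minimal adjustment above is the only candidate for this
        # hypothesis class, so inconsistency here means failure
        if any((xi >= t) != yi for xi, yi in seen):
            raise ValueError("no consistent specialization/generalization")
    return t
```

With richer hypothesis spaces, choosing a specialization or generalization may require backtracking over several candidates; the threshold case needs only one.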

    Least-commitment search

    Another technique for finding a consistent hypothesis is to start with the original disjunction of all

    the hypotheses, H1 ∨ H2 ∨ … ∨ Hn. This original set is then reduced as hypotheses that are inconsistent with the examples are dropped. The final set that remains is called a

    version space. The version space learning algorithm is given below:

    Function version-space-learning(examples) returns a version space
        Local variables: V, the version space (the set of all possible hypotheses)
        V ← the set of all hypotheses
        For each example e in examples do
            If V is not empty then V ← version-space-update(V, e)
        End
        Return V

    Function version-space-update(V, e) returns an updated version space
        V ← {h ∈ V : h is consistent with e}
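    Under the assumption of an explicit, finite hypothesis space, the two functions above reduce to straightforward filtering. The numeric threshold hypotheses in the sketch below are an illustrative choice, not part of the original algorithm:

```python
# Sketch of version-space-learning over an explicit, finite hypothesis
# space: each hypothesis is a predicate, and the update step filters out
# hypotheses inconsistent with the new example.

def version_space_update(V, e):
    x, label = e
    return [h for h in V if h(x) == label]

def version_space_learning(V, examples):
    for e in examples:
        if V:                          # stop filtering once V collapses
            V = version_space_update(V, e)
    return V

# Illustrative hypothesis space: thresholds h(x) = (x >= t), t = 0..9.
hypotheses = [lambda x, t=t: x >= t for t in range(10)]
survivors = version_space_learning(hypotheses, [(5, True), (2, False)])
```

Every surviving hypothesis classifies all the examples correctly; the version space shrinks monotonically as examples arrive.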

    EXERCISES

    1. What is learning?
    2. What is machine learning?
    3. Define the terms performance element, critic, problem generator, supervised learning, reinforcement learning, unsupervised learning, example, hypothesis.
    4. Describe a model of a learning agent.
    5. Discuss applications of machine learning.
    6. Describe the techniques used in inductive learning.
    7. Show how decision trees are used in learning.
    8. Describe learning by versioning.
    9. Investigate other areas of machine learning.

  • GENETIC ALGORITHMS AND EVOLUTIONARY ALGORITHMS.

    An evolutionary algorithm (EA) is a heuristic optimization algorithm using techniques

    inspired by mechanisms from organic evolution such as mutation, recombination, and natural

    selection to find an optimal configuration for a specific system within specific constraints.

    A genetic or evolutionary algorithm applies the principles of evolution found in nature to

    the problem of finding an optimal solution to a Solver problem. In a "genetic algorithm," the

    problem is encoded in a series of bit strings that are manipulated by the algorithm; in an

    "evolutionary algorithm," the decision variables and problem functions are used directly.

    Most commercial Solver products are based on evolutionary algorithms. An evolutionary

    algorithm for optimization is different from "classical" optimization methods in several ways:

    Random Versus Deterministic Operation

    Population Versus Single Best Solution

    Creating New Solutions Through Mutation

    Combining Solutions Through Crossover

    Selecting Solutions Via "Survival of the Fittest"

    Each of these characteristics, and then the drawbacks of evolutionary algorithms, is discussed below.

    Randomness. First, it relies in part on random sampling. This makes it a nondeterministic

    method, which may yield somewhat different solutions on different runs -- even if you

    haven't changed your model. In contrast, the linear, nonlinear and integer Solvers also

    included in the Premium Solver are deterministic methods -- they always yield the same

    solution if you start with the same values in the decision variable cells.

    Population. Second, where most classical optimization methods maintain a single best

    solution found so far, an evolutionary algorithm maintains a population of candidate

    solutions. Only one (or a few, with equivalent objectives) of these is "best," but the other

    members of the population are "sample points" in other regions of the search space, where a

    better solution may later be found.

    The use of a population of solutions helps the evolutionary algorithm avoid becoming

    "trapped" at a local optimum, when an even better optimum may be found outside the

    vicinity of the current solution.

    Mutation. Third -- inspired by the role of mutation of an organism's DNA in natural

    evolution -- an evolutionary algorithm periodically makes random changes or mutations in

    one or more members of the current population, yielding a new candidate solution (which

    may be better or worse than existing population members).

    There are many possible ways to perform a "mutation," and the Evolutionary Solver actually

    employs three different mutation strategies. The result of a mutation may be an infeasible

    solution, and the Evolutionary Solver attempts to "repair" such a solution to make it feasible;

    this is sometimes, but not always, successful.

    Crossover. Fourth -- inspired by the role of sexual reproduction in the evolution of living

    things -- an evolutionary algorithm attempts to combine elements of existing solutions in

    order to create a new solution, with some of the features of each "parent." The elements (e.g.

    decision variable values) of existing solutions are combined in a "crossover" operation,

    inspired by the crossover of DNA strands that occurs in reproduction of biological organisms.

    As with mutation, there are many possible ways to perform a crossover operation -- some

    much better than others -- and the Evolutionary Solver actually employs multiple variations

    of two different crossover strategies.

    Selection. Fifth -- inspired by the role of natural selection in evolution -- an evolutionary

    algorithm performs a selection process in which the "most fit" members of the population

    survive, and the "least fit" members are eliminated. In a constrained optimization problem,

    the notion of "fitness" depends partly on whether a solution is feasible (i.e. whether it

    satisfies all of the constraints), and partly on its objective function value. The selection

    process is the step that guides the evolutionary algorithm towards ever-better solutions.
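    Taken together, the mutation, crossover and selection steps above can be sketched in a toy genetic algorithm. The bit-string "one-max" problem (maximize the number of 1-bits), the parameter values, and the keep-the-fitter-half selection scheme below are all illustrative assumptions, not any particular Solver's implementation:

```python
# Toy genetic algorithm on bit strings: fitness = number of 1-bits.
import random

def genetic_algorithm(length=20, pop_size=30, generations=60,
                      mutation_rate=0.01, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    fitness = sum                          # counts the 1-bits

    for _ in range(generations):
        # Selection: the fitter half survives ("survival of the fittest").
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            # Crossover: splice two parents at a random cut point.
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]
            # Mutation: occasionally flip a bit in the child.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = genetic_algorithm()
```

Because the fitter half of each population survives unchanged, the best fitness never decreases from one generation to the next.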

    Drawbacks. A drawback of any evolutionary algorithm is that a solution is "better" only in

    comparison to other, presently known solutions; such an algorithm actually has no concept

    of an "optimal solution," or any way to test whether a solution is optimal. (For this reason,

    evolutionary algorithms are best employed on problems where it is difficult or impossible to

    test for optimality.) This also means that an evolutionary algorithm never knows for certain

    when to stop, aside from the length of time, or the number of iterations or candidate

    solutions, that you wish to allow it to explore.

    APPLICATIONS

    Evolutionary algorithms often perform well approximating solutions to all types of problems

    because they ideally do not make any assumption about the underlying fitness landscape; this

    generality is shown by successes in fields as diverse as engineering, art, biology, economics,

    marketing, genetics, operations research, robotics, social sciences, physics, politics and

    chemistry.

    When are Evolutionary Algorithms Useful? (You can also research other application areas.)

    Evolutionary algorithms are typically used to provide good approximate solutions to problems

    that cannot be solved easily using other techniques. Many optimisation problems fall into this

    category. It may be too computationally-intensive to find an exact solution but sometimes a

    near-optimal solution is sufficient. In these situations evolutionary techniques can be effective.

    Due to their random nature, evolutionary algorithms are never guaranteed to find an optimal

    solution for any problem, but they will often find a good solution if one exists.

    One example of this kind of optimisation problem is the challenge of timetabling. Schools

    and universities must arrange room and staff allocations to suit the needs of their curriculum.

    There are several constraints that must be satisfied. A member of staff can only be in one place

    at a time, they can only teach classes that are in their area of expertise, rooms cannot host

    lessons if they are already occupied, and classes must not clash with other classes taken by the

    same students. This is a combinatorial problem and known to be NP-Hard. It is not feasible to

    exhaustively search for the optimal timetable due to the huge amount of computation involved.

    Instead, heuristics must be used. Genetic algorithms have proven to be a successful way of

    generating satisfactory solutions to many scheduling problems.
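    For a genetic algorithm to tackle timetabling, it only needs a fitness function that scores candidate timetables. A minimal sketch counting violations of the constraints listed above (the data structures, a list of (class, room, slot, teacher) tuples and a teacher-expertise map, are illustrative assumptions; student clashes are omitted for brevity):

```python
# Sketch of a timetabling fitness measure: fewer violations is fitter.
def timetable_violations(timetable, expertise):
    """timetable: list of (class_id, room, slot, teacher) tuples.
    expertise: dict mapping teacher -> set of classes they can teach."""
    violations = 0
    booked_rooms, booked_teachers = set(), set()
    for class_id, room, slot, teacher in timetable:
        if (room, slot) in booked_rooms:          # room double-booked
            violations += 1
        if (teacher, slot) in booked_teachers:    # teacher in two places
            violations += 1
        if class_id not in expertise.get(teacher, set()):
            violations += 1                       # outside expertise
        booked_rooms.add((room, slot))
        booked_teachers.add((teacher, slot))
    return violations
```

A genetic algorithm would then evolve assignments of rooms, slots and teachers, selecting timetables with fewer violations until a zero-violation (feasible) timetable emerges.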

  • Evolutionary algorithms can also be used to tackle problems that humans don't really know

    how to solve. An EA, free of any human preconceptions or biases, can generate surprising

    solutions that are comparable to, or better than, the best human-generated efforts. It is merely

    necessary that we can recognise a good solution if it were presented to us, even if we don't

    know how to create a good solution. In other words, we need to be able to formulate an effective

    fitness function.

    Engineers working for NASA know a lot about physics. They know exactly which

    characteristics make for a good communications antenna. But the process of designing an

    antenna so that it has the necessary properties is hard. Even though the engineers know what is

    required from the final antenna, they may not know how to design the antenna so that it satisfies

    those requirements.

    NASA's Evolvable Systems Group has used evolutionary algorithms to successfully evolve

    antennas for use on satellites. These evolved antennas have irregular shapes with no obvious

    symmetry. It is unlikely that a human expert would

    have arrived at such an unconventional design. Despite this, when tested, these antennas proved

    to be extremely well adapted to their purpose.