Abdul Rahim Ahmad
MITM 613 Intelligent Systems
Chapter 3: Dealing with Uncertainty
Contents
Sources of Uncertainty
Bayesian Updating
Certainty theory
Note: Possibility theory will be covered in a separate set of slides.
Sources of Uncertainty
So far we have assumed a clear-cut, true/false world.
Many systems use the closed-world assumption: any hypothesis not known to be true is assumed false, so everything is either true or false.
In the real world, however, many things are uncertain, and assumptions have to be made, e.g., assume false if unknown.
Sources of Uncertainty
Three distinct forms of uncertainty:
Uncertainty in the rule
Uncertainty in the evidence
Use of vague language
Source 1 - Uncertainty in the rule
IF transducer output is low THEN water level is low
The rule above is uncertain. Why? A low level of water in the drum is not the only possible explanation for a low transducer output.
Another cause could be that the float attached to the transducer is stuck.
What we really mean by this rule is that if the transducer output is low, then the water level is probably low.
Source 2 - Uncertainty in the evidence
IF transducer output is low THEN water level is low
The evidence on which the rule is based may be uncertain, for two reasons:
The evidence may come from a source that is not totally reliable (e.g., the transducer output relies upon a voltage measurement).
The evidence itself may have been derived by a rule whose conclusion was probable rather than certain.
Source 3 – Use of Vague language
IF transducer output is low THEN water level is low
The above rule is based around the notion of a “low” transducer output.
How low is “low”? If the output is a voltage, then we must consider whether “low” corresponds to 1 mV, 1 V, or 1 kV.
Handling Uncertainty
Uncertainty in the rule and uncertainty in the evidence are handled by:
Bayesian updating: based on probability theory; assumes statistical independence.
Certainty theory: no rigorous mathematical basis, but practical in overcoming some of the limitations of Bayesian updating.
Use of vague language is handled by:
Possibility theory, or fuzzy logic: allows vague language to be used in a precise manner.
Bayesian Updating
Ascribe a probability to every hypothesis or assertion
Probabilities are updated in the light of evidence for or against a hypothesis or assertion.
Probabilities can be updated either:
by using Bayes’ theorem directly, or
by calculating likelihood ratios.
Bayesian Updating in the Boiler Control Example
Consider the hypothesis steam outlet blocked. Under the closed-world assumption, when there is no evidence the hypothesis is false.
In the Bayesian approach:
When there is no evidence, a prior probability is first given to the hypothesis that the steam outlet is blocked.
When evidence 1 (release valve stuck) is encountered, the probability is updated.
When evidence 2 (steam escaping) is encountered, the probability is again updated, cumulatively.
/* Rule 2.4 */
IF release valve stuck THEN steam outlet blocked

/* Rule 2.6 */
IF steam escaping THEN steam outlet blocked
Direct Application of Bayes Theorem
For Bayes’ theorem to be applied, the rules need to be modified slightly, for example:
Suppose that the prior probability of a steam outlet blockage is 0.01 (it rarely occurs).
As the evidence steam escaping is observed, P(steam outlet blocked) is updated.
steam outlet blockage is a hypothesis, steam escaping is supporting evidence
/* Rule 2.6 */
IF steam escaping THEN steam outlet blocked

/* Rule 2.6a */
IF steam escaping THEN update P(steam outlet blockage)
Bayesian Updating
Update probability of a hypothesis P(H) in the presence of evidence E.
This is based upon Bayes’ theorem:

P(H|E) = P(E|H) x P(H) / P(E)

That is, the conditional probability P(H|E) of a hypothesis H given some evidence E is expressed in terms of the conditional probability P(E|H) of the evidence E given H.
Proof of Bayes Theorem
P(H|E) is the fraction of an expected population of events in which E is observed where H is also observed:

P(H|E) = P(H and E) / P(E)

Similarly, P(E|H) = P(H and E) / P(H), thus P(H and E) = P(E|H) x P(H).

Substituting for P(H and E), we get Bayes’ theorem:

P(H|E) = P(E|H) x P(H) / P(E)

For calculation, we use the equivalent form:

P(H|E) = P(E|H) x P(H) / [P(E|H) x P(H) + P(E|~H) x P(~H)]

where P(~H) = 1 - P(H).
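The calculation form of Bayes’ theorem above can be sketched in a few lines of Python. The prior of 0.01 comes from the boiler example; the conditional probabilities P(E|H) = 0.9 and P(E|~H) = 0.1 are illustrative assumptions, not values from the text.

```python
def bayes_update(p_h, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|~H)P(~H)]."""
    numerator = p_e_given_h * p_h
    denominator = numerator + p_e_given_not_h * (1.0 - p_h)
    return numerator / denominator

# Boiler example: prior P(steam outlet blocked) = 0.01.
p = bayes_update(0.01, p_e_given_h=0.9, p_e_given_not_h=0.1)  # ~0.083
```

Even strongly supporting evidence only raises a rare hypothesis to about 8%, which is why evidence is accumulated over several rule firings.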
Bayesian updating
Using this equation we can update the probability of a hypothesis H in the light of new evidence E, using knowledge of:
P(H) - the current probability of the hypothesis. If this is the first update for this hypothesis, then P(H) is the prior probability.
P(E|H) - the conditional probability that the evidence is present, given that the hypothesis is TRUE.
P(E|~H) - the conditional probability that the evidence is present, given that the hypothesis is NOT TRUE
Bayesian updating
Thus, to build a system that makes direct use of Bayes’ theorem, the values of P(H), P(E|H), and P(E|~H) for all the different hypotheses and evidence are needed in advance.
P(E|H) and P(E|~H) can be estimated informally.
In Rule 2.6a:
We have some idea of how often steam is observed escaping when there is an outlet blockage, P(E|H), but we are less likely to know how often a steam escape is due to an outlet blockage, P(H|E).
/* Rule 2.6a */
IF steam escaping THEN update P(steam outlet blockage)
Bayesian Updating
Bayes’ theorem, in effect, performs abduction (i.e., determining causes) using deductive information (i.e., the likelihood of symptoms, effects, or evidence).
The premise that deductive information is more readily available than abductive information is one of the justifications for using Bayesian updating.
Likelihood ratios
Likelihood ratios provide an alternative means of representing Bayesian updating.
The rule should be written as:
If the evidence steam escaping is observed, we can update the probability of steam outlet blockage provided we have X expressed as odds rather than a probability.
The odds O(H) are given by:

O(H) = P(H) / P(~H) = P(H) / (1 - P(H))

where P(~H) = 1 - P(H). Here, ~H means “not H”.

P(H) can also be expressed in terms of O(H) as:

P(H) = O(H) / (1 + O(H))
IF steam escaping
THEN steam outlet blockage IS X times more likely
Likelihood ratios
Examples:
If P(H) = 0.2, O(H) = 0.2/(1 – 0.2) = 0.2 / 0.8 = 0.25 (ie: “4 to 1 against”).
If P(H) = 0.8, O(H) = 0.8/(1 – 0.8) = 0.8 / 0.2 = 4 (ie: “4 to 1 on”).
If P(H) = 1, O(H) = 1/(1 – 1) = 1/0 = infinity.
As P(H) -> 1, O(H) -> infinity.
Normally, limits are set on odds values:
if O(H) > 10^6, H is treated as true;
if O(H) < 10^-6, H is treated as false.
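The odds examples above amount to two one-line conversions, sketched here in Python:

```python
def odds(p):
    """Convert a probability to odds: O(H) = P(H) / (1 - P(H))."""
    return p / (1.0 - p)

def prob(o):
    """Convert odds back to a probability: P(H) = O(H) / (1 + O(H))."""
    return o / (1.0 + o)

o1 = odds(0.2)   # 0.25, i.e. "4 to 1 against"
o2 = odds(0.8)   # ~4.0, i.e. "4 to 1 on"
p1 = prob(4.0)   # 0.8
```

Note that `odds(1.0)` would divide by zero, matching the observation that O(H) tends to infinity as P(H) tends to 1.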
Updating Likelihoods
Using Bayes’ theorem for H:

P(H|E) = P(E|H) x P(H) / P(E)

Using Bayes’ theorem for ~H:

P(~H|E) = P(E|~H) x P(~H) / P(E)

Dividing the first by the second:

P(H|E) / P(~H|E) = [P(E|H) / P(E|~H)] x [P(H) / P(~H)]

Using the definition of odds and substituting from the first two equations, we get:

O(H|E) = A x O(H)

where

A = P(E|H) / P(E|~H)
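The resulting update O(H|E) = A x O(H) is a single multiplication. A minimal sketch, where the prior P(H) = 0.2 and the affirms weight of 18 are illustrative numbers rather than values from the text:

```python
def update_odds(o_h, a):
    """Apply O(H|E) = A x O(H): scale the current odds of H by the
    affirms weight A = P(E|H) / P(E|~H) of the observed evidence E."""
    return a * o_h

# Illustrative numbers (not from the text): prior P(H) = 0.2, so O(H) = 0.25.
o = 0.2 / (1 - 0.2)          # prior odds
o = update_odds(o, 18.0)     # evidence with affirms weight 18 observed
p = o / (1 + o)              # back to a probability, ~0.818
```

Working in odds keeps each update a multiplication; the conversion back to a probability is only needed when a result is reported.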
Updating Likelihoods
O(H|E) is the updated odds of H, given the presence of evidence E.
A is the affirms weight of evidence E.
We can also use another likelihood ratio, the denies weight D of evidence E.
The denies weight is obtained by considering the absence of evidence, i.e., ~E:

O(H|~E) = D x O(H)

where

D = P(~E|H) / P(~E|~H) = (1 - P(E|H)) / (1 - P(E|~H))
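Both weights follow directly from the same pair of conditional probabilities. In this sketch the values 0.9 and 0.1 are assumed for illustration:

```python
def affirms(p_e_given_h, p_e_given_not_h):
    """A = P(E|H) / P(E|~H)."""
    return p_e_given_h / p_e_given_not_h

def denies(p_e_given_h, p_e_given_not_h):
    """D = P(~E|H) / P(~E|~H) = (1 - P(E|H)) / (1 - P(E|~H))."""
    return (1.0 - p_e_given_h) / (1.0 - p_e_given_not_h)

# Illustrative conditional probabilities (assumed, not from the text):
a = affirms(0.9, 0.1)   # ~9: observing E strongly supports H
d = denies(0.9, 0.1)    # ~0.11: the known absence of E opposes H
```

This also illustrates the later point that A > 1 forces D < 1: if evidence is more common under H, its absence must be less common under H.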
Using the likelihood ratios
The affirms and denies functions are as shown below. Rather than displaying odds values, which have an infinite range, the corresponding probabilities are shown. The weight (A or D) is shown on a logarithmic scale over the range 0.01 to 100.
Using the likelihood ratios
The equation O(H|E) = A x O(H) is used to update our confidence in hypothesis H in the light of evidence E, given A and O(H) (the current odds of H).
O(H) will be at its a priori value if it has not previously been updated by other pieces of evidence.
In the case of Rule 2.6, H refers to the hypothesis steam outlet blockage and E refers to the evidence steam escaping.
The absence of evidence may reduce the likelihood of a hypothesis (equivalent to the presence of opposing evidence).
The known absence of evidence is not the same as not knowing whether the evidence is present. Known absence can be used to reduce the probability (or odds) of the hypothesis using the denies weight, D.
/* Rule 2.6 */
IF steam escaping THEN steam outlet blocked

IF steam escaping
THEN steam outlet blockage IS X times more likely
Using the likelihood ratios
If evidence E has an affirms weight A > 1, then its denies weight must be less than 1, and vice versa:
A > 1 implies D < 1;
A < 1 implies D > 1.
If A < 1 and D > 1, then the absence of the evidence supports the hypothesis.
Rule 2.7 provides an example of this, where NOT(water level low) supports the hypothesis pressure high and water level low opposes it.
The Bayesian version of Rule 2.7 is Rule 3.1:
/* Rule 3.1 */
IF temperature high (AFFIRMS 18.0; DENIES 0.11)
AND water level low (AFFIRMS 0.10; DENIES 1.90)
THEN pressure high
/* Rule 2.7 */
IF temperature high AND NOT(water level low) THEN pressure high
Using the likelihood ratios
As with the direct application of Bayes’ theorem, likelihood ratios have the advantage that A and D are defined in terms of the conditional probability of evidence given a hypothesis, P(E|H), which is more readily available than the conditional probability of a hypothesis given the evidence, P(H|E).
Even if accurate P(E|H) is not available, Bayesian updating using likelihood ratios is still useful if A and D can be found heuristically.
Dealing with uncertain evidence
So far we have assumed that evidence is either definitely present (i.e., has a probability of 1) or definitely absent (i.e., has a probability of 0).
If the probability of the evidence lies between these extremes, then the confidence in the conclusion must be scaled appropriately.
Dealing with uncertain evidence
Two reasons why the evidence may be uncertain:
the evidence could be generated by an uncertain rule (and therefore has a probability associated with it);
the evidence may be in the form of data that are not totally reliable (such as the output from a sensor).
Uncertain Evidence
In terms of probabilities, we wish to calculate P(H|E), where E is uncertain. We can assume E was asserted by another rule whose own evidence B is certain (has probability 1).
Given the evidence B, the probability of E is P(E|B). We thus need to calculate P(H|B).
An expression for P(H|B) is:

P(H|B) = P(H|E) x P(E|B) + P(H|~E) x P(~E|B)
Uncertain Evidence
The expression above is:
suitable if Bayes’ theorem is used directly;
not suitable if likelihood ratios are used.
Alternatively, we can modify the A and D weights to reflect the uncertainty in E by interpolating the weights linearly for 0 < P(E) < 1, as shown on the next slide.
Uncertain Evidence
In this scaling process, the interpolated affirms and denies weights are given the symbols A' and D', respectively.
When P(E) > 0.5, the affirms weight is used;
when P(E) < 0.5, the denies weight is used.
Over the range of values of P(E), A' and D' vary between 1 (a neutral weighting) and A and D, respectively.
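The scaling can be sketched as below. The exact interpolation formula is an assumption consistent with the description: a neutral weight of 1 at P(E) = 0.5, and the full A or D weight at the extremes.

```python
def interpolated_weights(a, d, p_e):
    """Linearly interpolate the affirms/denies weights for uncertain
    evidence with probability p_e:
      p_e = 1.0 -> (A, 1): evidence present, full affirms weight
      p_e = 0.5 -> (1, 1): neutral, evidence carries no information
      p_e = 0.0 -> (1, D): evidence absent, full denies weight
    """
    if p_e >= 0.5:
        return 1.0 + (a - 1.0) * (2.0 * p_e - 1.0), 1.0
    return 1.0, 1.0 + (d - 1.0) * (1.0 - 2.0 * p_e)

a_prime, d_prime = interpolated_weights(18.0, 0.11, 0.75)  # halfway: A' = 9.5
```

At P(E) = 0.75 the affirms weight sits halfway between neutral (1) and its full value (18), so A' = 9.5 and D' = 1.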
Combining Evidence
How can several pieces of evidence supporting the same hypothesis be combined?
If n pieces of evidence support a hypothesis H, then:

O(H | E1 & E2 & … & En) = A x O(H)

where

A = P(E1 & E2 & … & En | H) / P(E1 & E2 & … & En | ~H)
Note: Since we do not know which evidence will be available to support the hypothesis H, we need to write expressions for A covering
All possible pieces of evidence Ei.
All combinations of the pairs Ei&Ej.
All the triples Ei&Ej&Ek.
All quadruples Ei&Ej&Ek&Em
and so on.
Combining Evidence
This is unrealistic when there are many pieces of evidence. Thus we normally assume that all evidence is statistically independent (even though this is not strictly accurate).
If two pieces of evidence E1 and E2 are statistically independent, the probability of E1 given E2 is identical to the probability of E1 alone (E2 provides no information about E1), i.e.:

P(E1 | E2) = P(E1) and P(E1 | ~E2) = P(E1)

Thus:

P(E1 & E2 | H) = P(E1 | H) x P(E2 | H)

and, for each piece of evidence Ei:

Ai = P(Ei | H) / P(Ei | ~H)
Combining Evidence
If, in a given run of the system, n pieces of evidence are found that support or oppose H, then the updating equations are simply:

O(H | E1 & … & En) = A1 x A2 x … x An x O(H)

and

O(H | ~E1 & … & ~En) = D1 x D2 x … x Dn x O(H)
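Under the independence assumption, combining evidence is just a running product of weights. A sketch, with illustrative weights that are not from the text:

```python
from functools import reduce

def combine_evidence(prior_odds, weights):
    """Multiply the prior odds by the affirms (or denies) weight of each
    independent piece of evidence:
    O(H | E1 & ... & En) = A1 * A2 * ... * An * O(H)."""
    return reduce(lambda o, w: o * w, weights, prior_odds)

# Illustrative weights (assumed): two affirming pieces and one denying piece.
o = combine_evidence(0.25, [18.0, 2.0, 0.11])   # ~0.99
```

Because multiplication is commutative, the order in which the evidence arrives does not affect the final odds.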
Combining Evidence
Interdependence of evidence is acceptable if the rule base is properly structured.
If pieces of evidence are dependent on each other, they should not be combined in a single rule.
Instead, assertions, and the rules that generate them, should be arranged in a hierarchy: from low-level input data to high-level conclusions, with many levels of hypotheses between.
The amount of evidence considered in reaching a conclusion is not limited, but interactions between pieces of evidence are controlled.
Inference Network
Inference networks are used to represent levels of assertions, from input data through intermediate deductions to final conclusions.
Each node represents either a hypothesis or a piece of evidence, and has an associated probability (not shown).
If all evidence relevant to a particular conclusion is drawn together in a single rule for that conclusion, the result is a shallow network (no intermediate levels between input data and conclusions). Such a network is only reliable if there is little or no dependence between the input data.
Inference Network
Inference network that includes several intermediate steps
Note: The probabilities at each node are modified as the reasoning process proceeds, until they reach their final values.
Combining Bayesian Rules with Production Rules
In a practical rule-based system, we may wish to mix uncertain rules with production rules.
For instance, we may wish to make use of the production rule:
even though the assertion release valve is stuck may have been established with a probability less than 1.
In this case the hypothesis release valve needs cleaning can be asserted with the same probability as the evidence.
This avoids the issue of providing a prior probability for the hypothesis or a weighting for the evidence.
IF release valve is stuck THEN release valve needs cleaning
Bayesian Rules + Production Rules
If a production rule contains multiple pieces of evidence that are independent from each other, their combined probability can be derived from standard probability theory.
Consider, for example, a rule in which two pieces of independent evidence are conjoined (i.e: they are joined by AND):
The probability of hypothesis H3 is given by:

P(H3) = P(E1) x P(E2)
IF evidence E1 AND evidence E2 THEN hypothesis H3
Bayesian Rules + Production Rules
Production rules containing independent evidence that is disjoined (i.e., joined by OR) can be treated in a similar way.
So, given the rule below, the probability of hypothesis H3 is given by:

P(H3) = P(E1) + P(E2) - P(E1) x P(E2)
IF evidence E1 OR evidence E2 THEN hypothesis H3
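The two combination formulas for independent evidence in production rules can be sketched directly; the probabilities 0.5 and 0.4 below are hypothetical values for illustration:

```python
def p_and(p_e1, p_e2):
    """P(E1 and E2) = P(E1) * P(E2) for independent evidence."""
    return p_e1 * p_e2

def p_or(p_e1, p_e2):
    """P(E1 or E2) = P(E1) + P(E2) - P(E1)P(E2) for independent evidence."""
    return p_e1 + p_e2 - p_e1 * p_e2

# Hypothetical evidence probabilities:
h3_and = p_and(0.5, 0.4)   # 0.2
h3_or = p_or(0.5, 0.4)     # ~0.7
```

As expected, conjunction can only lower the probability and disjunction can only raise it, relative to the stronger single piece of evidence.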
Working example of Bayesian Updating
See text, pages 74–78.
Advantages and disadvantages
Advantages:
• Based on a proven statistical theorem.
• Has a clearly defined and familiar meaning.
• Uses deductive probabilities, which are easier to estimate.
• Can make sensible guesses.
• Can combine pieces of evidence.
• Uncertainty in the evidence can be accommodated by linear interpolation of the likelihood ratios.
• The probability of a hypothesis can be updated in response to more than one piece of evidence.

Disadvantages:
• Prior probabilities must be known.
• Conditional probabilities must be measured or estimated.
• Estimates of likelihood are often subjective.
• An assertion is reduced to a single probability value.
• No record is kept of the evidence.
• Adding a new rule requires alterations to existing ones.
• Statistical independence is assumed.
• The linear interpolation is not mathematically justified.
• Representations are based on odds.
Certainty theory
An adaptation of Bayesian updating.
Overcomes some of the shortcomings of Bayesian updating.
Has less mathematical rigor than Bayesian updating.
Making Uncertain Hypotheses
Instead of using probabilities, each assertion has a certainty value associated with it (between –1 and +1).
For a given hypothesis H, its certainty value C(H) is given by:
C(H) = 1.0 if H is known to be true;
C(H) = 0.0 if H is unknown;
C(H) = –1.0 if H is known to be false.
There is a similarity between certainty values and probabilities, such that: C(H) = 1.0 corresponds to P(H)=1.0;
C(H) = 0.0 corresponds to P(H) being at its a priori value;
C(H) = –1.0 corresponds to P(H)=0.0.
Each rule also has a certainty associated with it, certainty factor CF.
Making Uncertain Hypotheses
Certainty factors serve a similar role to the affirms and denies weightings in Bayesian systems:
Identical measures of certainty are attached to rules and hypotheses.
The certainty factor of a rule is modified to reflect the level of certainty of the evidence, such that the modified certainty factor CF’ is given by:
CF’ = CF x C(E)
If the evidence is known to be present, i.e., C(E) = 1, then the Equation yields CF’ = CF.
IF <evidence> THEN <hypothesis> WITH certainty factor CF
Updating Certainty
The technique for updating the certainty of hypothesis H in the light of evidence E involves the application of the following composite function:

C(H|E) = C(H) + CF’ x (1 - C(H))   if C(H) >= 0 and CF’ >= 0
C(H|E) = C(H) + CF’ x (1 + C(H))   if C(H) < 0 and CF’ < 0
C(H|E) = (C(H) + CF’) / (1 - min(|C(H)|, |CF’|))   otherwise

where:
C(H|E) is the certainty of H updated in the light of evidence E;
C(H) is the initial certainty of H, i.e., 0 unless it has been updated by the previous application of a rule;
|x| is the magnitude of x, ignoring its sign.

The updating procedure consists of adding a positive or negative value to the current certainty of a hypothesis.
This contrasts with Bayesian updating, where the odds of a hypothesis are multiplied by the appropriate likelihood ratio.
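The composite updating function can be sketched as below. The piecewise form used here is the standard EMYCIN-style combining function, an assumption consistent with the properties listed on the properties slide, with CF’ = CF x C(E) folded in:

```python
def certainty_update(c_h, cf, c_e=1.0):
    """Update the certainty of hypothesis H given a rule with certainty
    factor cf, fired on evidence with certainty c_e (CF' = CF * C(E))."""
    cf_prime = cf * c_e
    if c_h >= 0.0 and cf_prime >= 0.0:
        # both supportive: move part of the way toward +1
        return c_h + cf_prime * (1.0 - c_h)
    if c_h < 0.0 and cf_prime < 0.0:
        # both opposing: move part of the way toward -1
        return c_h + cf_prime * (1.0 + c_h)
    # contradictory: the values partially cancel
    return (c_h + cf_prime) / (1.0 - min(abs(c_h), abs(cf_prime)))

c = certainty_update(0.0, 0.8)    # first rule: C(H|E) = CF' = 0.8
c = certainty_update(c, 0.5)      # second rule: 0.8 + 0.5 * 0.2 ~ 0.9
```

Repeated application stays within [-1, +1], and the result is independent of the order in which the rules fire.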
Updating Certainty
The certainty-updating function behaves similarly to the Bayesian updating equation shown earlier.
Updating certainty
In standard certainty theory, a rule can only be applied if C(E) > 0.
Some systems restrict rule firing further by requiring C(E) > 0.2, to save computational effort and make explanations clearer.
It is also possible to allow rules to fire regardless. The absence of supporting evidence, indicated by C(E) < 0, would then be taken into account, since CF’ would have the opposite sign to CF.
Properties of Updating Function
Continuous and has no singularities or steps;
The updated certainty C(H|E) always lies within the bounds –1 and +1;
If either C(H) or CF’ is +1 (i.e., definitely true) then C(H|E) is also +1;
If either C(H) or CF’ is –1 (i.e., definitely false) then C(H|E) is also –1;
When contradictory conclusions are combined, they tend to cancel each other out, i.e., if C(H) = – CF’ then C(H|E) = 0;
Several pieces of independent evidence can be combined by repeated application of the function, and the outcome is independent of the order in which the pieces of evidence are applied;
If C(H) = 0, i.e., the certainty of H is at its a priori value, then C(H|E) = CF’
If the evidence is certain (i.e., C(E) = 1) then CF’ = CF.
Although not part of the standard implementation, the absence of evidence can be taken into account by allowing rules to fire when C(E) < 0.
Logical combinations of evidence
In Bayesian updating systems, each piece of evidence that contributes toward a hypothesis is assumed to be independent and is given its own affirms and denies weights.
In systems based upon certainty theory, the certainty factor is associated with the rule as a whole.
A simple algorithm determines the certainty factor to apply when more than one item of evidence is included in a single rule.
The relationship between pieces of evidence is made explicit by the use of AND and OR.
If separate pieces of evidence are intended to contribute toward a single hypothesis independently of each other, they must be placed in separate rules.
The algorithm for combining items of evidence in a single rule is borrowed from possibility theory (Lotfi Zadeh).
Logical combinations of evidence
The combination rules taken from possibility theory are:

C(E1 AND E2) = min[C(E1), C(E2)]
C(E1 OR E2) = max[C(E1), C(E2)]
C(NOT E) = -C(E)
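The combination rules borrowed from possibility theory (min for AND, max for OR, as attributed to Zadeh above) can be sketched as follows; the NOT rule is the usual companion convention:

```python
def c_and(c_e1, c_e2):
    """Certainty of conjoined evidence: the minimum of the two."""
    return min(c_e1, c_e2)

def c_or(c_e1, c_e2):
    """Certainty of disjoined evidence: the maximum of the two."""
    return max(c_e1, c_e2)

def c_not(c_e):
    """Certainty of negated evidence: the sign is flipped."""
    return -c_e

# Example: a rule's condition E1 AND E2, with C(E1) = 0.8 and C(E2) = 0.3,
# fires with combined certainty 0.3, so CF' = CF * 0.3.
combined = c_and(0.8, 0.3)
```

A conjunction is thus only as certain as its weakest conjunct, which is why dependent evidence combined in one rule cannot inflate the conclusion.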
Working Example of Certainty Theory
See text, pages 83–86.
Relating certainty factors to probabilities
There is a similarity between the certainty values of hypotheses and their probabilities, such that:
C(H) = 1.0 corresponds to P(H) = 1.0;
C(H) = 0.0 corresponds to P(H) being at its a priori value;
C(H) = –1.0 corresponds to P(H) = 0.0.
The relationship between the certainty factor and P(H|E) can be expressed as:

CF = [P(H|E) - P(H)] / [1 - P(H)]   if P(H|E) >= P(H)
CF = [P(H|E) - P(H)] / P(H)         if P(H|E) < P(H)
END