Abdul Rahim Ahmad
MITM 613 Intelligent Systems
Chapter 3: Dealing with Uncertainty
Contents
Sources of Uncertainty
Bayesian Updating
Certainty theory
Note: Possibility theory will be covered in a separate set of slides.
Sources of Uncertainty
So far we have assumed a clear-cut, true/false world.
Many systems use the closed-world assumption: any hypothesis not known to be true is assumed false, so everything is either true or false.
In the real world, however, many things are uncertain, and assumptions have to be made, e.g., assume false if unknown.
Sources of Uncertainty
Three distinct forms of uncertainty:
Uncertainty in the rule
Uncertainty in the evidence
Use of vague language
Source 1 - Uncertainty in the rule
IF transducer output is low THEN water level is low
The rule above is uncertain. Why? A low level of water in the drum is not the only possible explanation for a low transducer output.
Another cause could be that the float attached to the transducer is stuck.
What we really mean by this rule is that if the transducer output is low, then the water level is probably low.
Source 2 - Uncertainty in the evidence
IF transducer output is low THEN water level is low
The evidence on which the rule is based may be uncertain, for two reasons:
The evidence may come from a source that is not totally reliable (e.g., the transducer output relies upon a voltage measurement).
The evidence itself may have been derived by a rule whose conclusion was probable rather than certain.
Source 3 – Use of Vague language
IF transducer output is low THEN water level is low
The above rule is based around the notion of a “low” transducer output.
How low is “low”? If the output is a voltage, then we must consider whether “low” corresponds to 1 mV, 1 V, or 1 kV.
Handling Uncertainty
Uncertainty in the rule and uncertainty in the evidence are handled by:
Bayesian updating: based on probability theory; assumes statistical independence.
Certainty theory: no rigorous mathematical basis, but practical in overcoming some of the limitations of Bayesian updating.
Use of vague language is handled by:
Possibility theory, or fuzzy logic: allows vague language to be used in a precise manner.
Bayesian Updating
Ascribe a probability to every hypothesis or assertion
Probabilities are updated in the light of evidence for or against a hypothesis or assertion.
Probabilities can be updated either:
by using Bayes’ theorem directly, or
by calculating likelihood ratios.
Bayesian Updating in the Boiler Control Example
Consider the hypothesis steam outlet blocked. Under the closed-world assumption, when there is no evidence the hypothesis is false.
In the Bayesian approach:
When there is no evidence, a prior probability is first given to the hypothesis that the steam outlet is blocked.
When evidence 1 (release valve stuck) is encountered, the probability is updated.
When evidence 2 (steam escaping) is encountered, the probability is again updated, cumulatively.
/* Rule 2.4 */
IF release valve stuck THEN steam outlet blocked

/* Rule 2.6 */
IF steam escaping THEN steam outlet blocked
Direct Application of Bayes Theorem
For Bayes’ theorem to be applied, the rules need to be modified slightly, for example:
Suppose that the prior probability of a steam outlet blockage is 0.01 (it rarely occurs).
As the evidence steam escaping is observed, P(steam outlet blocked) is updated.
steam outlet blockage is a hypothesis, steam escaping is supporting evidence
/* Rule 2.6 */
IF steam escaping THEN steam outlet blocked

/* Rule 2.6a */
IF steam escaping THEN update P(steam outlet blockage)
Bayesian Updating
Update probability of a hypothesis P(H) in the presence of evidence E.
This is based upon Bayes’ theorem:

P(H|E) = P(E|H) x P(H) / P(E)

That is, the conditional probability P(H|E) of a hypothesis H given some evidence E is expressed in terms of the conditional probability P(E|H) of the evidence E given H.
Proof of Bayes Theorem
P(H|E) is the fraction of an expected population of events in which E is observed where H is also observed:

P(H|E) = P(H and E) / P(E)

Similarly, P(E|H) = P(H and E) / P(H), thus P(H and E) = P(E|H) x P(H).

Substituting for P(H and E), we get Bayes’ theorem:

P(H|E) = P(E|H) x P(H) / P(E)

For calculation, we use the equivalent form:

P(H|E) = P(E|H) x P(H) / [P(E|H) x P(H) + P(E|~H) x P(~H)]

where P(~H) = 1 - P(H).
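The calculation form of Bayes’ theorem above can be sketched in a few lines of Python. The prior of 0.01 comes from the boiler example; the conditional probabilities P(E|H) = 0.9 and P(E|~H) = 0.1 are illustrative assumptions, not values from the text.

```python
def bayes_update(p_h, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|~H)P(~H)]."""
    numerator = p_e_given_h * p_h
    denominator = numerator + p_e_given_not_h * (1.0 - p_h)
    return numerator / denominator

# Boiler example: prior P(steam outlet blocked) = 0.01.
p = bayes_update(0.01, p_e_given_h=0.9, p_e_given_not_h=0.1)  # ~0.083
```

Even strongly supporting evidence only raises a rare hypothesis to about 8%, which is why evidence is accumulated over several rule firings.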
Bayesian updating
Using this equation we can update the probability of a hypothesis H in the light of new evidence E, using knowledge of:
P(H) - the current probability of the hypothesis. If this is the first update for this hypothesis, then P(H) is the prior probability.
P(E|H) - the conditional probability that the evidence is present, given that the hypothesis is TRUE.
P(E|~H) - the conditional probability that the evidence is present, given that the hypothesis is NOT TRUE
Bayesian updating
Thus, to build a system that makes direct use of Bayes’ theorem, the values of P(H), P(E|H), and P(E|~H) for all the different hypotheses and evidence are needed in advance.
P(E|H) and P(E|~H) can be estimated informally.
In Rule 2.6a:
We have some idea of how often steam is observed escaping when there is an outlet blockage, P(E|H), but we are less likely to know how often a steam escape is due to an outlet blockage, P(H|E).
/* Rule 2.6a */
IF steam escaping THEN update P(steam outlet blockage)
Bayesian Updating
Bayes’ theorem, in effect, performs abduction (i.e., determining causes) using deductive information (i.e., the likelihood of symptoms, effects, or evidence).
The premise that deductive information is more readily available than abductive information is one of the justifications for using Bayesian updating.
Likelihood ratios
Likelihood ratios provide an alternative means of representing Bayesian updating.
The rule should be written as:
If the evidence steam escaping is observed, we can update the probability of steam outlet blockage provided we have X expressed as odds rather than a probability.
The odds O(H) are given by:

O(H) = P(H) / P(~H) = P(H) / (1 - P(H))

where P(~H) = 1 - P(H). Here, ~H means “not H”.

P(H) can also be expressed in terms of O(H) as:

P(H) = O(H) / (1 + O(H))
IF steam escaping
THEN steam outlet blockage IS X times more likely
Likelihood ratios
Examples:
If P(H) = 0.2, O(H) = 0.2/(1 – 0.2) = 0.2 / 0.8 = 0.25 (ie: “4 to 1 against”).
If P(H) = 0.8, O(H) = 0.8/(1 – 0.8) = 0.8 / 0.2 = 4 (ie: “4 to 1 on”).
If P(H) = 1, O(H) = 1/(1 – 1) = 1/0 = infinity.
As P(H) -> 1, O(H) -> infinity.
Normally, limits are set on odds values:
if O(H) > 10^6, H is treated as true;
if O(H) < 10^-6, H is treated as false.
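The odds examples above amount to two one-line conversions, sketched here in Python:

```python
def odds(p):
    """Convert a probability to odds: O(H) = P(H) / (1 - P(H))."""
    return p / (1.0 - p)

def prob(o):
    """Convert odds back to a probability: P(H) = O(H) / (1 + O(H))."""
    return o / (1.0 + o)

o1 = odds(0.2)   # 0.25, i.e. "4 to 1 against"
o2 = odds(0.8)   # ~4.0, i.e. "4 to 1 on"
p1 = prob(4.0)   # 0.8
```

Note that `odds(1.0)` would divide by zero, matching the observation that O(H) tends to infinity as P(H) tends to 1.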
Updating Likelihoods
Using Bayes’ theorem for H:

P(H|E) = P(E|H) x P(H) / P(E)

Using Bayes’ theorem for ~H:

P(~H|E) = P(E|~H) x P(~H) / P(E)

Dividing the first by the second:

P(H|E) / P(~H|E) = [P(E|H) / P(E|~H)] x [P(H) / P(~H)]

Using the definition of odds and substituting from the first two equations, we get:

O(H|E) = A x O(H)

where

A = P(E|H) / P(E|~H)
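The resulting update O(H|E) = A x O(H) is a single multiplication. A minimal sketch, where the prior P(H) = 0.2 and the affirms weight of 18 are illustrative numbers rather than values from the text:

```python
def update_odds(o_h, a):
    """Apply O(H|E) = A x O(H): scale the current odds of H by the
    affirms weight A = P(E|H) / P(E|~H) of the observed evidence E."""
    return a * o_h

# Illustrative numbers (not from the text): prior P(H) = 0.2, so O(H) = 0.25.
o = 0.2 / (1 - 0.2)          # prior odds
o = update_odds(o, 18.0)     # evidence with affirms weight 18 observed
p = o / (1 + o)              # back to a probability, ~0.818
```

Working in odds keeps each update a multiplication; the conversion back to a probability is only needed when a result is reported.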
Updating Likelihoods
O(H|E) is the updated odds of H, given the presence of evidence E.
A is the affirms weight of evidence E.
We can also use another likelihood ratio, the denies weight D of evidence E.
The denies weight is obtained by considering the absence of evidence, i.e., ~E:

O(H|~E) = D x O(H)

where

D = P(~E|H) / P(~E|~H) = (1 - P(E|H)) / (1 - P(E|~H))
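Both weights follow directly from the same pair of conditional probabilities. In this sketch the values 0.9 and 0.1 are assumed for illustration:

```python
def affirms(p_e_given_h, p_e_given_not_h):
    """A = P(E|H) / P(E|~H)."""
    return p_e_given_h / p_e_given_not_h

def denies(p_e_given_h, p_e_given_not_h):
    """D = P(~E|H) / P(~E|~H) = (1 - P(E|H)) / (1 - P(E|~H))."""
    return (1.0 - p_e_given_h) / (1.0 - p_e_given_not_h)

# Illustrative conditional probabilities (assumed, not from the text):
a = affirms(0.9, 0.1)   # ~9: observing E strongly supports H
d = denies(0.9, 0.1)    # ~0.11: the known absence of E opposes H
```

This also illustrates the later point that A > 1 forces D < 1: if evidence is more common under H, its absence must be less common under H.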
Using the likelihood ratios
The affirms and denies functions are as shown below. Rather than displaying odds values, which have an infinite range, the corresponding probabilities are shown. The weight (A or D) is shown on a logarithmic scale over the range 0.01 to 100.
Using the likelihood ratios
The equation O(H|E) = A x O(H) is used to update our confidence in hypothesis H in the light of evidence E, given A and O(H) (the current odds of H).
O(H) will be at its a priori value if it has not previously been updated by other pieces of evidence.
In the case of Rule 2.6, H refers to the hypothesis steam outlet blockage and E refers to the evidence steam escaping.
The absence of evidence may reduce the likelihood of a hypothesis (equivalent to the presence of opposing evidence).
The known absence of evidence is not the same as not knowing whether the evidence is present. Known absence can be used to reduce the probability (or odds) of the hypothesis using the denies weight, D.
/* Rule 2.6 */
IF steam escaping THEN steam outlet blocked

IF steam escaping
THEN steam outlet blockage IS X times more likely
Using the likelihood ratios
If evidence E has an affirms weight A > 1, then its denies weight must be less than 1, and vice versa:
A > 1 implies D < 1;
A < 1 implies D > 1.
If A < 1 and D > 1, then the absence of the evidence supports the hypothesis.
Rule 2.7 provides an example of this, where NOT(water level low) supports the hypothesis pressure high and water level low opposes it.
The Bayesian version of Rule 2.7 is Rule 3.1:
/* Rule 3.1 */
IF temperature high (AFFIRMS 18.0; DENIES 0.11)
AND water level low (AFFIRMS 0.10; DENIES 1.90)
THEN pressure high
/* Rule 2.7 */
IF temperature high AND NOT(water level low) THEN pressure high
Using the likelihood ratios
As with the direct application of Bayes’ theorem, likelihood ratios have the advantage that A and D are defined in terms of the conditional probability of evidence given a hypothesis, P(E|H), which is more readily available than the conditional probability of a hypothesis given the evidence, P(H|E).
Even if accurate P(E|H) is not available, Bayesian updating using likelihood ratios is still useful if A and D can be found heuristically.
Dealing with uncertain evidence
So far we have assumed that evidence is either definitely present (i.e., has a probability of 1) or definitely absent (i.e., has a probability of 0).
If the probability of the evidence lies between these extremes, then the confidence in the conclusion must be scaled appropriately.
Dealing with uncertain evidence
Two reasons why the evidence may be uncertain:
the evidence could be generated by an uncertain rule (and therefore has a probability associated with it);
the evidence may be in the form of data that are not totally reliable (such as the output from a sensor).
Uncertain Evidence
In terms of probabilities, we wish to calculate P(H|E), where E is uncertain. We can assume E was asserted by another rule whose own evidence B is certain (has probability 1).
Given the evidence B, the probability of E is P(E|B). We thus need to calculate P(H|B).
An expression for P(H|B) is:

P(H|B) = P(H|E) x P(E|B) + P(H|~E) x P(~E|B)
Uncertain Evidence
The expression above is:
suitable if Bayes’ theorem is used directly;
not suitable if likelihood ratios are used.
Alternatively, we can modify the A and D weights to reflect the uncertainty in E by interpolating the weights linearly for 0 < P(E) < 1, as shown on the next slide.
Uncertain Evidence
In this scaling process, the interpolated affirms and denies weights are given the symbols A' and D', respectively.
When P(E) > 0.5, the affirms weight is used;
when P(E) < 0.5, the denies weight is used.
Over the range of values of P(E), A' and D' vary between 1 (a neutral weighting) and A and D, respectively.
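The scaling can be sketched as below. The exact interpolation formula is an assumption consistent with the description: a neutral weight of 1 at P(E) = 0.5, and the full A or D weight at the extremes.

```python
def interpolated_weights(a, d, p_e):
    """Linearly interpolate the affirms/denies weights for uncertain
    evidence with probability p_e:
      p_e = 1.0 -> (A, 1): evidence present, full affirms weight
      p_e = 0.5 -> (1, 1): neutral, evidence carries no information
      p_e = 0.0 -> (1, D): evidence absent, full denies weight
    """
    if p_e >= 0.5:
        return 1.0 + (a - 1.0) * (2.0 * p_e - 1.0), 1.0
    return 1.0, 1.0 + (d - 1.0) * (1.0 - 2.0 * p_e)

a_prime, d_prime = interpolated_weights(18.0, 0.11, 0.75)  # halfway: A' = 9.5
```

At P(E) = 0.75 the affirms weight sits halfway between neutral (1) and its full value (18), so A' = 9.5 and D' = 1.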
Combining Evidence
How can several pieces of evidence supporting the same hypothesis be combined?
If n pieces of evidence support a hypothesis H, then:

O(H | E1 & E2 & … & En) = A x O(H)

where

A = P(E1 & E2 & … & En | H) / P(E1 & E2 & … & En | ~H)
Note: Since we do not know which evidence will be available to support the hypothesis H, we need to write expressions for A covering
All possible pieces of evidence Ei.
All combinations of the pairs Ei&Ej.
All the triples Ei&Ej&Ek.
All quadruples Ei&Ej&Ek&Em
and so on.
Combining Evidence
This is unrealistic when there are many pieces of evidence. Thus we normally assume that all evidence is statistically independent (even though this is not strictly accurate).
If two pieces of evidence E1 and E2 are statistically independent, the probability of E1 given E2 is identical to the probability of E1 alone (E2 provides no information about E1), i.e.:

P(E1 | E2) = P(E1) and P(E1 | ~E2) = P(E1)

Thus:

P(E1 & E2 | H) = P(E1 | H) x P(E2 | H)

and, for each piece of evidence Ei:

Ai = P(Ei | H) / P(Ei | ~H)
Combining Evidence
If, in a given run of the system, n pieces of evidence are found that support or oppose H, then the updating equations are simply:

O(H | E1 & … & En) = A1 x A2 x … x An x O(H)

and

O(H | ~E1 & … & ~En) = D1 x D2 x … x Dn x O(H)
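Under the independence assumption, combining evidence is just a running product of weights. A sketch, with illustrative weights that are not from the text:

```python
from functools import reduce

def combine_evidence(prior_odds, weights):
    """Multiply the prior odds by the affirms (or denies) weight of each
    independent piece of evidence:
    O(H | E1 & ... & En) = A1 * A2 * ... * An * O(H)."""
    return reduce(lambda o, w: o * w, weights, prior_odds)

# Illustrative weights (assumed): two affirming pieces and one denying piece.
o = combine_evidence(0.25, [18.0, 2.0, 0.11])   # ~0.99
```

Because multiplication is commutative, the order in which the evidence arrives does not affect the final odds.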
Combining Evidence
Interdependence of evidence is acceptable if the rule base is properly structured.
If pieces of evidence are dependent on each other, they should not be combined in a single rule.
Instead, assertions, and the rules that generate them, should be arranged in a hierarchy: from low-level input data to high-level conclusions, with many levels of hypotheses between.
The amount of evidence considered in reaching a conclusion is not limited, but interactions between pieces of evidence are controlled.
Inference Network
Inference networks are used to represent levels of assertions, from input data through intermediate deductions to final conclusions.
Each node represents either a hypothesis or a piece of evidence, and has an associated probability (not shown).
If all evidence relevant to a particular conclusion is drawn together in a single rule for that conclusion, the result is a shallow network (no intermediate levels between input data and conclusions). Such a network is only reliable if there is little or no dependence between the input data.
Inference Network
Inference network that includes several intermediate steps
Note: The probabilities at each node are modified as the reasoning process proceeds, until they reach their final values.
Combining Bayesian Rules with Production Rules
In a practical rule-based system, we may wish to mix uncertain rules with production rules.
For instance, we may wish to make use of the production rule:
even though the assertion release valve is stuck may have been established with a probability less than 1.
In this case the hypothesis release valve needs cleaning can be asserted with the same probability as the evidence.
This avoids the issue of providing a prior probability for the hypothesis or a weighting for the evidence.
IF release valve is stuck THEN release valve needs cleaning
Bayesian Rules + Production Rules
If a production rule contains multiple pieces of evidence that are independent from each other, their combined probability can be derived from standard probability theory.
Consider, for example, a rule in which two pieces of independent evidence are conjoined (i.e: they are joined by AND):
The probability of hypothesis H3 is given by:

P(H3) = P(E1) x P(E2)
IF evidence E1 AND evidence E2 THEN hypothesis H3
Bayesian Rules + Production Rules
Production rules containing independent evidence that is disjoined (i.e., joined by OR) can be treated in a similar way.
So, given the rule below, the probability of hypothesis H3 is given by:

P(H3) = P(E1) + P(E2) - P(E1) x P(E2)
IF evidence E1 OR evidence E2 THEN hypothesis H3
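The two combination formulas for independent evidence in production rules can be sketched directly; the probabilities 0.5 and 0.4 below are hypothetical values for illustration:

```python
def p_and(p_e1, p_e2):
    """P(E1 and E2) = P(E1) * P(E2) for independent evidence."""
    return p_e1 * p_e2

def p_or(p_e1, p_e2):
    """P(E1 or E2) = P(E1) + P(E2) - P(E1)P(E2) for independent evidence."""
    return p_e1 + p_e2 - p_e1 * p_e2

# Hypothetical evidence probabilities:
h3_and = p_and(0.5, 0.4)   # 0.2
h3_or = p_or(0.5, 0.4)     # ~0.7
```

As expected, conjunction can only lower the probability and disjunction can only raise it, relative to the stronger single piece of evidence.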
Working example of Bayesian Updating
See text, pages 74–78.
Advantages and disadvantages
Advantages:
• Based on a proven statistical theorem.
• Has a clearly defined and familiar meaning.
• Uses deductive probabilities, which are easier to estimate.
• Can make sensible guesses.
• Can combine pieces of evidence.
• Uncertainty in the evidence can be accommodated by linear interpolation of the likelihood ratios.
• The probability of a hypothesis can be updated in response to more than one piece of evidence.

Disadvantages:
• Prior probabilities must be known.
• Conditional probabilities must be measured or estimated.
• Estimates of likelihood are often subjective.
• An assertion is reduced to a single probability value.
• No record is kept of the evidence.
• Adding a new rule requires alterations to existing ones.
• Statistical independence is assumed.
• The linear interpolation is not mathematically justified.
• Representations are based on odds.
Certainty theory
An adaptation of Bayesian updating.
Overcomes some of the shortcomings of Bayesian updating.
Has less mathematical rigor than Bayesian updating.
Making Uncertain Hypotheses
Instead of using probabilities, each assertion has a certainty value associated with it (between –1 and +1).
For a given hypothesis H, its certainty value C(H) is given by:
C(H) = 1.0 if H is known to be true;
C(H) = 0.0 if H is unknown;
C(H) = –1.0 if H is known to be false.
There is a similarity between certainty values and probabilities, such that: C(H) = 1.0 corresponds to P(H)=1.0;
C(H) = 0.0 corresponds to P(H) being at its a priori value;
C(H) = –1.0 corresponds to P(H)=0.0.
Each rule also has a certainty associated with it, certainty factor CF.
Making Uncertain Hypotheses
Certainty factors serve a similar role to the affirms and denies weightings in Bayesian systems:
Identical measures of certainty are attached to rules and hypotheses.
The certainty factor of a rule is modified to reflect the level of certainty of the evidence, such that the modified certainty factor CF’ is given by:
CF’ = CF x C(E)
If the evidence is known to be present, i.e., C(E) = 1, then the Equation yields CF’ = CF.
IF <evidence> THEN <hypothesis> WITH certainty factor CF
Updating Certainty
The technique for updating the certainty of hypothesis H in the light of evidence E involves the application of the following composite function:

C(H|E) = C(H) + CF’ x (1 - C(H))   if C(H) >= 0 and CF’ >= 0
C(H|E) = C(H) + CF’ x (1 + C(H))   if C(H) < 0 and CF’ < 0
C(H|E) = (C(H) + CF’) / (1 - min(|C(H)|, |CF’|))   otherwise

where:
C(H|E) is the certainty of H updated in the light of evidence E;
C(H) is the initial certainty of H, i.e., 0 unless it has been updated by the previous application of a rule;
|x| is the magnitude of x, ignoring its sign.

The updating procedure consists of adding a positive or negative value to the current certainty of a hypothesis.
This contrasts with Bayesian updating, where the odds of a hypothesis are multiplied by the appropriate likelihood ratio.
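The composite updating function can be sketched as below. The piecewise form used here is the standard EMYCIN-style combining function, an assumption consistent with the properties listed on the properties slide, with CF’ = CF x C(E) folded in:

```python
def certainty_update(c_h, cf, c_e=1.0):
    """Update the certainty of hypothesis H given a rule with certainty
    factor cf, fired on evidence with certainty c_e (CF' = CF * C(E))."""
    cf_prime = cf * c_e
    if c_h >= 0.0 and cf_prime >= 0.0:
        # both supportive: move part of the way toward +1
        return c_h + cf_prime * (1.0 - c_h)
    if c_h < 0.0 and cf_prime < 0.0:
        # both opposing: move part of the way toward -1
        return c_h + cf_prime * (1.0 + c_h)
    # contradictory: the values partially cancel
    return (c_h + cf_prime) / (1.0 - min(abs(c_h), abs(cf_prime)))

c = certainty_update(0.0, 0.8)    # first rule: C(H|E) = CF' = 0.8
c = certainty_update(c, 0.5)      # second rule: 0.8 + 0.5 * 0.2 ~ 0.9
```

Repeated application stays within [-1, +1], and the result is independent of the order in which the rules fire.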
Updating Certainty
The certainty-updating function behaves similarly to the Bayesian updating equation shown earlier.
Updating certainty
In standard certainty theory, a rule can only be applied if C(E) > 0.
Some systems restrict rule firing further by requiring C(E) > 0.2, to save computational effort and make explanations clearer.
It is also possible to allow rules to fire regardless. The absence of supporting evidence, indicated by C(E) < 0, would then be taken into account, since CF’ would have the opposite sign to CF.
Properties of Updating Function
Continuous and has no singularities or steps;
The updated certainty C(H|E) always lies within the bounds –1 and +1;
If either C(H) or CF’ is +1 (i.e., definitely true) then C(H|E) is also +1;
If either C(H) or CF’ is –1 (i.e., definitely false) then C(H|E) is also –1;
When contradictory conclusions are combined, they tend to cancel each other out, i.e., if C(H) = – CF’ then C(H|E) = 0;
Several pieces of independent evidence can be combined by repeated application of the function, and the outcome is independent of the order in which the pieces of evidence are applied;
If C(H) = 0, i.e., the certainty of H is at its a priori value, then C(H|E) = CF’
If the evidence is certain (i.e., C(E) = 1) then CF’ = CF.
Although not part of the standard implementation, the absence of evidence can be taken into account by allowing rules to fire when C(E) < 0.
Logical combinations of evidence
In Bayesian updating systems, each piece of evidence that contributes toward a hypothesis is assumed to be independent and is given its own affirms and denies weights.
In systems based upon certainty theory, the certainty factor is associated with the rule as a whole.
A simple algorithm determines the certainty factor to apply when more than one item of evidence is included in a single rule.
The relationship between pieces of evidence is made explicit by the use of AND and OR.
If separate pieces of evidence are intended to contribute toward a single hypothesis independently of each other, they must be placed in separate rules.
The algorithm for combining items of evidence in a single rule is borrowed from possibility theory (Lotfi Zadeh).
Logical combinations of evidence
The combination rules taken from possibility theory are:

C(E1 AND E2) = min[C(E1), C(E2)]
C(E1 OR E2) = max[C(E1), C(E2)]
C(NOT E) = -C(E)
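The combination rules borrowed from possibility theory (min for AND, max for OR, as attributed to Zadeh above) can be sketched as follows; the NOT rule is the usual companion convention:

```python
def c_and(c_e1, c_e2):
    """Certainty of conjoined evidence: the minimum of the two."""
    return min(c_e1, c_e2)

def c_or(c_e1, c_e2):
    """Certainty of disjoined evidence: the maximum of the two."""
    return max(c_e1, c_e2)

def c_not(c_e):
    """Certainty of negated evidence: the sign is flipped."""
    return -c_e

# Example: a rule's condition E1 AND E2, with C(E1) = 0.8 and C(E2) = 0.3,
# fires with combined certainty 0.3, so CF' = CF * 0.3.
combined = c_and(0.8, 0.3)
```

A conjunction is thus only as certain as its weakest conjunct, which is why dependent evidence combined in one rule cannot inflate the conclusion.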
Working Example of Certainty Theory
See text, pages 83–86.
Relating certainty factors to probabilities
There is a similarity between the certainty values of hypotheses and their probabilities, such that:
C(H) = 1.0 corresponds to P(H) = 1.0;
C(H) = 0.0 corresponds to P(H) being at its a priori value;
C(H) = –1.0 corresponds to P(H) = 0.0.
The relationship between the certainty factor and P(H|E) can be expressed as:

CF = [P(H|E) - P(H)] / [1 - P(H)]   if P(H|E) >= P(H)
CF = [P(H|E) - P(H)] / P(H)         if P(H|E) < P(H)
END