Chapter 14-15




Reasoning Under Uncertainty
Part (1): Certainty Factors

    Kostas Kontogiannis

    E&CE 457

    Objectives

This unit aims to investigate techniques that allow an algorithmic process to deduce new facts from a knowledge base with a level of confidence or a measure of belief.

These techniques are of particular importance when:
1. The rules in the knowledge base do not produce a conclusion that is certain, even though the rule premises are known to be certain, and/or
2. The premises of the rules are not known to be certain.

The three parts in this unit deal with:
1. Techniques related to certainty factors and their application in rule-based systems
2. Techniques related to the Measures of Belief, their relationship to probabilistic reasoning, and their application in rule-based systems
3. The Dempster-Shafer model for reasoning under uncertainty in rule-based systems


    Uncertainty and Evidential Support

In its simplest case, a Knowledge Base contains rules of the form:

A & B & C => D

where facts A, B, C are considered to be True (that is, these facts hold with probability 1), and D is asserted in the Knowledge Base as being True (also with probability 1).

However, for realistic cases, domain knowledge has to be modeled in a way that accommodates uncertainty. In other words, we would like to encode domain knowledge using rules of the form:

A & B & C => D (CF: x1)

where A, B, C are not necessarily certain (i.e., their CFs may be less than 1).

Issues in Rule-Based Reasoning Under Uncertainty

Many rules may support the same conclusion with various degrees of certainty:

A1 & A2 & A3 => H (CF=0.5)
B1 & B2 & B3 => H (CF=0.6)

(If we assume all of A1, A2, A3, B1, B2, B3 hold, then H is supported with CF(H) = CFcombine(0.5, 0.6).)

The premises of a rule to be applied may not hold with absolute certainty (the CF or probability associated with a premise is not equal to 1):

Rule: A1 => H (CF=0.5)

If during a consultation A1 holds with CF(A1) = 0.3, then H holds with CF(H) = 0.5 * 0.3 = 0.15.
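Below is a minimal sketch (in Python, not from the original slides) of these two situations: a rule's CF is attenuated by the CF of its premise, and two rules supporting the same hypothesis are merged with the simple positive-CF combining function used in this part. The helper names are illustrative.

```python
# Illustrative sketch of rule application under uncertainty; the
# positive-CF combining rule X + Y*(1 - X) follows the slides.

def apply_rule(rule_cf, premise_cf):
    # CF contributed by one rule: rule CF attenuated by the premise CF
    return rule_cf * premise_cf

def cf_combine_pos(x, y):
    # parallel combination of two positive CFs supporting the same hypothesis
    return x + y * (1 - x)

# Rule A1 => H (CF = 0.5), fired during a consultation with CF(A1) = 0.3
print(apply_rule(0.5, 0.3))        # 0.15

# Two rules supporting H (CF 0.5 and 0.6) with fully certain premises
print(cf_combine_pos(0.5, 0.6))    # 0.8
```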


    The Certainty Factor Model

A single piece of negative evidence should not overwhelm several pieces of positive evidence, and vice versa.

The computational expense of storing MBs and MDs should be avoided; instead, a cumulative CF value should be maintained.

Simple model:

CF = MB - MD
CFcombine(X, Y) = X + Y * (1 - X)

The problem is that a single piece of negative evidence overwhelms several pieces of positive evidence.

The Revised CF Model

CF = (MB - MD) / (1 - min(MB, MD))

CFcombine(X, Y) =
  X + Y(1 - X)                      if X, Y > 0
  (X + Y) / (1 - min(|X|, |Y|))     if one of X, Y < 0
  -CFcombine(-X, -Y)                if X, Y < 0
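A minimal sketch of the revised combining function, assuming CF values in [-1, 1]; the function name and the handling of a zero argument are illustrative choices, not part of the slides.

```python
def cf_combine(x, y):
    """Revised CFcombine from the slide above; CFs assumed to lie in [-1, 1]."""
    if x > 0 and y > 0:
        return x + y * (1 - x)
    if x < 0 and y < 0:
        return -cf_combine(-x, -y)
    # mixed signs (or a zero value); undefined when min(|x|, |y|) == 1
    return (x + y) / (1 - min(abs(x), abs(y)))

print(cf_combine(0.5, 0.6))    # 0.8
print(cf_combine(0.8, -0.3))   # 0.5 / 0.7, roughly 0.714
print(cf_combine(-0.4, -0.5))  # -0.7
```

With the revised rule, a single piece of negative evidence (e.g. -0.3) only moderates, rather than cancels, the accumulated positive support (0.8).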


    Additional Use of CFs

    Provide methods for search termination

[Figure: inference chain A -> B -> C -> D -> E via rules R1 (CF 0.8), R2 (CF 0.4), R3 (CF 0.7), R4 (CF 0.7)]

In the case of branching in the inference sequencing, paths should be kept distinct.

Cutoff in Complex Inferences

[Figure: inference chain A -> B (R1, 0.8), B -> C (R2, 0.4), branching at C into C -> D (R3) -> E (R4) with CFs 0.7, 0.7, and C -> F (R5, 0.9)]

We should maintain two paths for the cutoff (0.2): one being (E, D, C, B, A) and the other (F, C, B, A). If we kept only a single path, then E, D, C would drop to 0.19 and make C unusable later in the path (F, C, B, A).
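A sketch of path-wise propagation with a cutoff, assuming (as the figure suggests) that CFs are multiplied along a chain and that a path is pruned once its cumulative CF falls below 0.2; the graph, propagation rule, and names below are assumptions for illustration.

```python
# Keep a separate cumulative CF per inference path and prune a path only when
# it falls below the cutoff; a low value on one path does not disable a node
# (such as C) for other paths. Graph and numbers follow the figure above.

CUTOFF = 0.2

edges = {                      # node -> list of (next node, rule CF)
    "A": [("B", 0.8)],
    "B": [("C", 0.4)],
    "C": [("D", 0.7), ("F", 0.9)],
    "D": [("E", 0.7)],
}

def propagate(node, cf, path):
    if cf < CUTOFF:
        print("pruned:", " -> ".join(path), round(cf, 3))
        return
    print("active:", " -> ".join(path), round(cf, 3))
    for nxt, rule_cf in edges.get(node, []):
        propagate(nxt, cf * rule_cf, path + [nxt])

propagate("A", 1.0, ["A"])
# The C -> F branch keeps its own cumulative CF (0.32 * 0.9 = 0.288), even
# though the C -> D -> E branch drops below the cutoff.
```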


Reasoning Under Uncertainty
Part (2): Measures of Belief

    Kostas Kontogiannis

    E&CE 457

    Terminology

The units of belief follow the same scale as in probability theory.

If the sum of all evidence is represented by e and d is the diagnosis (hypothesis) under consideration, then the probability

P(d|e)

is interpreted as the probabilistic measure of belief, or strength, that the hypothesis d holds given the evidence e.

In this context:
P(d): the a-priori probability (the probability that hypothesis d occurs)
P(e|d): the probability that the evidence represented by e is present, given that the hypothesis (i.e., disease) d holds


Analyzing and Using Sequential Evidence

Let e1 be the set of observations to date, and s1 be some new piece of data. Furthermore, let e be the new set of observations once s1 has been added to e1. Then

P(di | e) = P(s1 | di & e1) * P(di | e1) / Sum_j [ P(s1 | dj & e1) * P(dj | e1) ]

P(d|e) = x is interpreted as:

    IF you observe symptom e

    THEN conclude hypothesis d with probability x
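A small sketch of the sequential update above: the new datum s1 is folded into the current posteriors over the competing diagnoses and the result is renormalized over all dj. The priors and likelihoods below are made-up illustrative numbers.

```python
def update(posteriors, likelihoods):
    """posteriors: {d_i: P(d_i | e1)}, likelihoods: {d_i: P(s1 | d_i & e1)}."""
    joint = {d: likelihoods[d] * posteriors[d] for d in posteriors}
    norm = sum(joint.values())          # denominator summed over all d_j
    return {d: v / norm for d, v in joint.items()}

p = {"d1": 0.6, "d2": 0.4}      # P(d_i | e1): beliefs given the evidence so far
lik = {"d1": 0.2, "d2": 0.7}    # P(s1 | d_i & e1): fit of the new datum
print(update(p, lik))           # P(d_i | e) -> {'d1': 0.3, 'd2': 0.7}
```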

Requirements

It is practically impossible to obtain measurements for P(sk | dj) for each of the pieces of data sk in e, and for the inter-relationships of the sk within each possible hypothesis dj.

Instead, we would like to obtain a measurement of P(di | e) in terms of P(di | sk), where e is the composite of all the observed sk.


Advantages of Using Rules in Uncertainty Reasoning

The use of general knowledge and abstractions in the problem domain

The use of judgmental knowledge

Ease of modification and fine-tuning

Facilitated search for potential inconsistencies and contradictions in the knowledge base

Straightforward mechanisms for explaining decisions

An augmented instructional capability

Measuring Uncertainty

Probability theory

Confirmation
Classificatory: the evidence e confirms the hypothesis h
Comparative: e1 confirms h more strongly than e2 confirms h, or e confirms h1 more strongly than e confirms h2
Quantitative: e confirms h with strength x, usually denoted as C[h,e]. In this context C[h,e] is not equal to 1 - C[~h,e]

Fuzzy sets


Model of Evidential Strength

A quantification scheme for modeling inexact reasoning

The concepts of belief and disbelief as units of measurement

The terminology is based on:
MB[h,e] = x : the measure of increased belief in the hypothesis h, based on the evidence e, is x
MD[h,e] = y : the measure of increased disbelief in the hypothesis h, based on the evidence e, is y

The evidence e need not be an observed event, but may be a hypothesis subject to confirmation.

For example, MB[h,e] = 0.7 reflects the extent to which the expert's belief that h is true is increased by the knowledge that e is true.

In this sense, MB[h,e] = 0 means that the expert has no reason to increase his/her belief in h on the basis of e.

Probability and the Evidential Model

In accordance with subjective probability theory, P(h) reflects the expert's belief in h at any given time. Thus 1 - P(h) reflects the expert's disbelief regarding the truth of h.

If P(h|e) > P(h), then the observation of e increases the expert's belief in h, while decreasing his/her disbelief regarding the truth of h.

In fact, the proportionate decrease in disbelief is given by the following ratio:

(P(h|e) - P(h)) / (1 - P(h))

This ratio is called the measure of increased belief in h resulting from the observation of e (i.e., MB[h,e]).
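A minimal sketch of this ratio, together with the analogous measure of increased disbelief for the case P(h|e) < P(h); the MD formula and the clamping at 0 are assumptions in line with the certainty-factor literature rather than something stated on this slide.

```python
def mb(p_h, p_h_given_e):
    # proportionate decrease in disbelief: (P(h|e) - P(h)) / (1 - P(h))
    return max(0.0, (p_h_given_e - p_h) / (1 - p_h))

def md(p_h, p_h_given_e):
    # assumed symmetric form: proportionate decrease in belief
    return max(0.0, (p_h - p_h_given_e) / p_h)

print(mb(0.3, 0.8))   # evidence raises P(h) from 0.3 to 0.8 -> MB is about 0.71
print(md(0.3, 0.1))   # evidence lowers P(h) from 0.3 to 0.1 -> MD is about 0.67
```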


Characteristics of Belief Measures

Range of degrees:
0 <= MB[h,e] <= 1
0 <= MD[h,e] <= 1
-1 <= CF[h,e] <= +1


    More Characteristics of Belief Measures

    CF[h,e] + CF[~h,e] =/= 1

    MB[h,e] = MD[~h,e]

The Belief Measure Model as an Approximation

Suppose e = s1 & s2 and that evidence e confirms d. Then

CF[d, e] = MB[d, e] - 0 = (P(d|e) - P(d)) / (1 - P(d)) = (P(d | s1 & s2) - P(d)) / (1 - P(d))

which means we would still need to keep probability measurements and, moreover, we would need to keep MBs and MDs.


Defining Criteria for Approximation

MB[h, e+] increases toward 1 as confirming evidence is found, equaling 1 if and only if a piece of evidence logically implies h with certainty.

MD[h, e-] increases toward 1 as disconfirming evidence is found, equaling 1 if and only if a piece of evidence logically implies ~h with certainty.

    CF[h, e-]


    Combining Functions

MB[h, s1 & s2] = 0                                          if MD[h, s1 & s2] = 1
                 MB[h, s1] + MB[h, s2] * (1 - MB[h, s1])    otherwise

MD[h, s1 & s2] = 0                                          if MB[h, s1 & s2] = 1
                 MD[h, s1] + MD[h, s2] * (1 - MD[h, s1])    otherwise

MB[h1 or h2, e] = max(MB[h1, e], MB[h2, e])
MD[h1 or h2, e] = min(MD[h1, e], MD[h2, e])

For uncertain evidence s1 (established from prior evidence e with CF[s1, e]):
MB[h, s1] = MB'[h, s1] * max(0, CF[s1, e])
MD[h, s1] = MD'[h, s1] * max(0, CF[s1, e])

where MB'[h, s1] and MD'[h, s1] are the measures that would apply if s1 were known with certainty.
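A sketch of accumulating belief in a hypothesis from several rules, using the incremental MB combination and the attenuation by the evidence CF shown above; the rule set and numbers are illustrative.

```python
def combine_mb(mb1, mb2):
    # MB[h, s1 & s2] = MB[h, s1] + MB[h, s2] * (1 - MB[h, s1])
    return mb1 + mb2 * (1 - mb1)

def attenuate(mb_prime, cf_evidence):
    # scale MB'[h, s] by the certainty of the evidence s itself
    return mb_prime * max(0.0, cf_evidence)

rules = [
    {"mb_prime": 0.6, "cf_evidence": 1.0},   # s1 established with certainty
    {"mb_prime": 0.4, "cf_evidence": 0.5},   # s2 only partially established
]

mb_h = 0.0
for r in rules:
    mb_h = combine_mb(mb_h, attenuate(r["mb_prime"], r["cf_evidence"]))
print(mb_h)   # 0.6 + 0.2 * (1 - 0.6) = 0.68
```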

Probabilistic Reasoning and Certainty Factors (Revisited)

Of the methods for utilizing evidence to select diagnoses or decisions, probability theory has the firmest appeal.

The usefulness of Bayes' theorem is limited by practical difficulties related to the volume of data required to compute the a-priori probabilities used in the theorem.

On the other hand, CFs and MBs, MDs offer an intuitive, yet informal, way of dealing with reasoning under uncertainty.

The MYCIN model tries to combine these two areas (probabilistic, CFs) by providing a semi-formal bridge (theory) between the two areas.


A Simple Probability Model
(The MYCIN Model Prelude)

Consider a finite population of n members. Members of the population may possess one or more of several properties that define subpopulations, or sets.

Properties of interest might be e1 or e2, which may be evidence for or against a diagnosis h.

The number of individuals with a certain property, say e, will be denoted as n(e), and the number with the two properties e1 and e2 will be denoted as n(e1 & e2).

Probabilities can be computed as ratios of such counts.

A Simple Probability Model (Cont.)

From the above we observe that:

(n(e & h) / n(e)) * (n / n(h)) = (n(e & h) / n(h)) * (n / n(e))

So a convenient form of Bayes' theorem is:

P(h|e) / P(h) = P(e|h) / P(e)

If we consider that two pieces of evidence e1 and e2 bear on a hypothesis h, and if we assume e1 and e2 are independent, then the following ratios hold:

n(e1 & e2) / n = (n(e1) / n) * (n(e2) / n)

and

n(e1 & e2 & h) / n(h) = (n(e1 & h) / n(h)) * (n(e2 & h) / n(h))


Simple Probability Model

With the above, the right-hand side of Bayes' theorem becomes:

P(e1 & e2 | h) / P(e1 & e2) = (P(e1 | h) / P(e1)) * (P(e2 | h) / P(e2))

The idea is to ask the experts to estimate the ratios P(ei|h)/P(ei) and P(h), and from these compute P(h | e1 & e2 & ... & en).

The ratios P(ei|h)/P(ei) should be in the range [0, 1/P(h)].

In this context, MB[h,e] = 1 when all individuals with e have disease h, and MD[h,e] = 1 when no individual with e has h.
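A small sketch of this estimation scheme under the stated independence assumptions: the expert-supplied ratios P(ei|h)/P(ei) are multiplied into the prior P(h); all numbers are illustrative.

```python
def posterior(prior_h, ratios):
    # P(h | e1 & ... & en) = P(h) * product of P(e_i | h) / P(e_i)
    p = prior_h
    for r in ratios:
        p *= r
    return p

p_h = 0.1
ratios = [3.0, 2.0]              # each ratio must lie in [0, 1/P(h)] = [0, 10]
print(posterior(p_h, ratios))    # 0.1 * 3.0 * 2.0 = 0.6
```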

Adding New Evidence

Serially adjusting the probability of a hypothesis with new evidence against the hypothesis:

P(h | e) = (P(ei | h) / P(ei)) * P(h | e)

or with new evidence favoring the hypothesis:

P(h | e) = 1 - (P(ei | ~h) / P(ei)) * [1 - P(h | e)]

where the right-hand side uses the value of P(h | e) held before the new piece of evidence ei is incorporated.


Measures of Belief and Probabilities

We can then define MB and MD as:

MB[h,e] = 1 - P(e | ~h) / P(e)

and

MD[h,e] = 1 - P(e | h) / P(e)

The MYCIN Model

MB[h1 & h2, e] = min(MB[h1,e], MB[h2,e])
MD[h1 & h2, e] = max(MD[h1,e], MD[h2,e])
MB[h1 or h2, e] = max(MB[h1,e], MB[h2,e])
MD[h1 or h2, e] = min(MD[h1,e], MD[h2,e])

1 - MD[h, e1 & e2] = (1 - MD[h,e1]) * (1 - MD[h,e2])
1 - MB[h, e1 & e2] = (1 - MB[h,e1]) * (1 - MB[h,e2])

CF[h, ef & ea] = MB[h, ef] - MD[h, ea]


Reasoning Under Uncertainty
Part (3): Dempster-Shafer Model

    Kostas Kontogiannis

    E&CE 457

The Dempster-Shafer Model

So far we have described techniques, all of which consider an individual hypothesis (proposition) and assign to it a point estimate in terms of a CF.

An alternative technique is to consider sets of propositions and assign to them an interval of the form

[Belief, Plausibility], that is

[Bel(p), 1 - Bel(~p)]


Belief and Plausibility

Belief (denoted as Bel) measures the strength of the evidence in favor of a set of hypotheses. It ranges from 0 (indicating no support) to 1 (indicating certainty).

Plausibility (denoted as Pl) is defined as

Pl(s) = 1 - Bel(~s)

Plausibility also ranges from 0 to 1, and measures the extent to which evidence in favor of ~s leaves room for belief in s. In particular, if we have certain evidence in favor of ~s, then Bel(~s) = 1 and Pl(s) = 0. This tells us that the only possible value for Bel(s) is 0.

Objectives for Belief and Plausibility

To define Belief and Plausibility more formally, we need to start with an exhaustive universe of mutually exclusive hypotheses in our diagnostic domain. We call this set the frame of discernment and denote it as Theta.

Our goal is to attach some measure of belief to elements of Theta. In addition, since the elements of Theta are mutually exclusive, evidence in favor of some may have an effect on our belief in the others.

The key function we use to measure the belief in elements of Theta is a probability density function, which we denote as m.


The Probability Density Function in the Dempster-Shafer Model

The probability density function m used in the Dempster-Shafer model is defined not just for the elements of Theta but for all subsets of it.

The quantity m(p) measures the amount of belief that is currently assigned to exactly the set p of hypotheses.

If Theta contains n elements, there are 2^n subsets of Theta.

We must assign m so that the sum of all the m values assigned to subsets of Theta is equal to 1.

Although dealing with 2^n subsets may appear intractable, it usually turns out that many of the subsets will never need to be considered, because they have no significance in a particular consultation, and so their m value is 0.

Defining Belief in Terms of the Function m

Having defined m, we can now define Bel(p) for a set p as the sum of the values of m for p and for all of its subsets. Thus Bel(p) is our overall belief that the correct answer lies somewhere in the set p (see the sketch below).
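A minimal sketch of Bel and Pl computed from a mass function m over subsets of Theta; the frame, the subsets carrying mass, and the numbers are illustrative.

```python
THETA = frozenset({"flu", "cold", "allergy"})

# mass function m: only subsets with non-zero mass need to be stored
m = {
    frozenset({"flu"}): 0.4,
    frozenset({"flu", "cold"}): 0.3,
    THETA: 0.3,                      # mass left on the whole frame (ignorance)
}

def bel(p):
    # Bel(p): sum of the m values for p and for all of its subsets
    return sum(v for s, v in m.items() if s <= p)

def pl(p):
    # Pl(p) = 1 - Bel(~p)
    return 1 - bel(THETA - p)

q = frozenset({"flu", "cold"})
print(bel(q), pl(q))    # 0.7 1.0
```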

In order to be able to use m, and thus Bel and Pl, in reasoning programs, we need to define functions that enable us to combine m's that arise from multiple sources of evidence.

The combination of belief functions m1 and m2 is supported by the Dempster-Shafer model and results in a new belief function m3.


Combining Belief Functions

To combine the belief functions m1 and m2, defined over subsets X and Y respectively, we use the following formula:

m3(Z) = [ Sum over all X, Y with X intersect Y = Z of m1(X) * m2(Y) ] / [ 1 - Sum over all X, Y with X intersect Y = empty of m1(X) * m2(Y) ]

If none of the intersections of X and Y are empty, then m3 is computed using only the numerator of the fraction above (i.e., we normalize by dividing by 1).

If there are intersections of X and Y that are empty, the numerator is normalized by 1 - k, where k is the sum of m1(X) * m2(Y) over the X, Y pairs whose intersection is empty.
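A sketch of this combination rule: masses on intersecting subsets are multiplied and accumulated, mass falling on empty intersections (k) is discarded, and the result is renormalized by 1 - k. The mass functions below are illustrative.

```python
from collections import defaultdict

def combine(m1, m2):
    raw = defaultdict(float)
    k = 0.0                          # mass assigned to empty intersections
    for x, v1 in m1.items():
        for y, v2 in m2.items():
            z = x & y
            if z:
                raw[z] += v1 * v2
            else:
                k += v1 * v2
    return {z: v / (1 - k) for z, v in raw.items()}

m1 = {frozenset({"flu"}): 0.6, frozenset({"flu", "cold"}): 0.4}
m2 = {frozenset({"cold"}): 0.5, frozenset({"flu", "cold"}): 0.5}
print(combine(m1, m2))
# masses 0.30, 0.20, 0.20 on {flu}, {cold}, {flu, cold}, each divided by 1 - 0.3
```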