Linguistic Summarization Using IF-THEN Rules
Authors: Dongrui Wu, Jerry M. Mendel and Jhiin Joo
Introduction
• Type-1 and Type-2 Fuzzy Systems
• Dataset Description
• Linguistic Descriptions
• Implementation
Type-1 Fuzzy Sets
• Crisp sets: either $x \in A$ or $x \notin A$
• Type-1 fuzzy sets: membership is a continuous grade in [0,1]
• Membership is a single value
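[Figure: type-1 membership function for "Tall". x-axis: Height (m); y-axis: Degree of "Tall-ness", from 0 to 1. Marked point: a height of 1.77 m has grade 0.6.]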
Interval Type-2 Fuzzy Sets
• Interval type-2 fuzzy sets - interval membership grades
• $X$ is the primary domain
• $J_x$ is the secondary domain
• All secondary grades $\mu_{\tilde{A}}(x,u)$ equal 1
• $\mu_{\tilde{A}}(x)$ is the secondary membership function at $x$ (vertical slice representation)
$\tilde{A} = \{((x,u), 1) \mid x \in X,\ u \in J_x,\ J_x \subseteq [0,1]\}$
Interval Type-2 Fuzzy Sets
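[Figure: interval type-2 fuzzy set $\widetilde{Tall}$. x-axis: Height (m); y-axis: 0 to 1. Labels: upper membership function, lower MF, a type-1 MF, and the FOU (footprint of uncertainty, explained in a later slide).]
• The membership grade is no longer a crisp number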
Interval Type-2 Fuzzy Sets
• Fuzzification:
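[Figure: interval type-2 fuzzy set $\widetilde{Tall}$. x-axis: Height (m); y-axis: 0 to 1. A height of 1.8 m meets the lower MF at 0.42 and the upper MF at 0.78.]
$\mu_{\widetilde{Tall}}(1.8) = [0.42, 0.78]$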
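The fuzzification step above can be sketched in a few lines of Python. This is only an illustration: the slide does not give the shape of the membership functions, so the `Ramp` class and the breakpoints below are hypothetical, chosen solely so that 1.8 m maps to roughly [0.42, 0.78].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ramp:
    """Right-shoulder membership function: 0 below `start`, 1 above `end`."""
    start: float
    end: float

    def grade(self, x: float) -> float:
        if x <= self.start:
            return 0.0
        if x >= self.end:
            return 1.0
        return (x - self.start) / (self.end - self.start)


@dataclass(frozen=True)
class IntervalType2Set:
    """Interval type-2 set described by its lower and upper membership functions."""
    lower: Ramp
    upper: Ramp

    def fuzzify(self, x: float) -> tuple[float, float]:
        # The membership of x is the interval [lower MF(x), upper MF(x)].
        return (self.lower.grade(x), self.upper.grade(x))


# Hypothetical breakpoints (in meters), not taken from the slides.
TALL = IntervalType2Set(lower=Ramp(1.59, 2.09), upper=Ramp(1.41, 1.91))

low, high = TALL.fuzzify(1.8)
print(f"Tall(1.8) = [{low:.2f}, {high:.2f}]")
```

A type-1 set would return a single number here; the interval is what carries the uncertainty about where the membership function lies.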
Interval Type-2 Fuzzy Sets
• FOU (footprint of uncertainty)
• Vertical slice of a type-2 membership function
– Indicates the 3D structure of a type-2 set
Jerry M. Mendel and Robert I. (Bob) John, "Type-2 Fuzzy Sets Made Simple," IEEE Transactions on Fuzzy Systems, vol. 10, no. 2, April 2002.
Haberman's Survival Data Set (UCI Machine Learning Repository)
• From a study conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer.
• Attribute Information:
1. Age of patient at time of operation (numerical)
2. Patient's year of operation (year - 1900, numerical)
3. Number of positive axillary nodes detected (numerical)
4. Survival status (class attribute)
-- 1 = the patient survived 5 years or longer
-- 2 = the patient died within 5 years
Linguistic Summarizations: IF - THEN
• Type 1: IF AGE is 35 AND YEAR is 1962, THEN SURVIVAL is YES
• Type 2: IF AGE is around 35 AND YEAR is around 1962, THEN SURVIVAL is YES
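The difference between the two rule forms can be illustrated with a small Python sketch. It assumes a triangular membership function for "around" and the minimum operator for AND; the half-widths (5 years of age, 3 calendar years) and the sample patient are hypothetical. The year is written as 62 because the dataset stores year - 1900.

```python
def around(center: float, half_width: float):
    """Return a triangular membership function that peaks at `center`."""
    def grade(x: float) -> float:
        return max(0.0, 1.0 - abs(x - center) / half_width)
    return grade


def crisp_match(age: int, year: int) -> float:
    """IF AGE is 35 AND YEAR is 1962: either satisfied (1.0) or not (0.0)."""
    return 1.0 if (age == 35 and year == 62) else 0.0


# Hypothetical widths for the fuzzy words "around 35" and "around 1962".
age_around_35 = around(35, 5.0)
year_around_62 = around(62, 3.0)


def fuzzy_match(age: int, year: int) -> float:
    """IF AGE is around 35 AND YEAR is around 1962, with AND taken as min."""
    return min(age_around_35(age), year_around_62(year))


patient = (36, 61)  # hypothetical case: age 36, operated in 1961
print(crisp_match(*patient), round(fuzzy_match(*patient), 2))
```

The crisp rule ignores this patient entirely, while the fuzzy rule still counts the case as a partial match.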
Some parameters
• T – Degree of Truth; an assessment of validity
– T increases as more of the data satisfying the antecedent also satisfy the consequent
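One way to get this behavior is a ratio of "antecedent and consequent both satisfied" to "antecedent satisfied". The slide does not state a formula, so the sketch below is an assumption: it uses type-1 membership grades, takes AND as the minimum, and the grades in the example are hypothetical.

```python
def degree_of_truth(antecedent: list[float], consequent: list[float]) -> float:
    """Share of the antecedent's total membership that also satisfies the consequent.

    `antecedent[m]` and `consequent[m]` are the membership grades of case m.
    """
    total = sum(antecedent)
    if total == 0:
        return 0.0  # no case fires the rule, so there is nothing to support it
    joint = sum(min(a, c) for a, c in zip(antecedent, consequent))
    return joint / total


# Hypothetical grades for four cases.
antecedent = [1.0, 0.5, 0.0, 0.8]
consequent = [1.0, 0.0, 1.0, 0.8]
print(round(degree_of_truth(antecedent, consequent), 2))
```

Raising the second case's consequent grade from 0.0 to 1.0 lifts T to 1, which matches the bullet: more of the data that satisfy the antecedent now also satisfy the consequent.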
Some parameters
• C – Degree of Sufficient Coverage
– Determines whether enough data satisfy a rule
• (trigger)
– $C = f(r_c)$
• U – Degree of Usefulness
– Indicates how useful a rule is
– A rule is useful iff
• it has a high degree of truth: most of the data that satisfy the rule's antecedents also satisfy its consequent
• it has sufficient coverage: enough data are described by it
– $U = \min(T, C)$
• It depends on the parameters described earlier
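A minimal sketch of C and U follows. The slide gives only C = f(r_c) and U = min(T, C), so two pieces are assumptions: r_c is taken as the fraction of cases with nonzero grades in both antecedent and consequent, and f is a standard S-shaped function with hypothetical thresholds `R1` and `R2`.

```python
R1, R2 = 0.05, 0.25  # hypothetical thresholds: below R1 no coverage, above R2 full


def coverage_ratio(antecedent: list[float], consequent: list[float]) -> float:
    """Assumed r_c: fraction of cases that fire both antecedent and consequent."""
    if not antecedent:
        return 0.0
    hits = sum(1 for a, c in zip(antecedent, consequent) if a > 0 and c > 0)
    return hits / len(antecedent)


def s_shaped(r: float, r1: float = R1, r2: float = R2) -> float:
    """Smooth step from 0 (at r1) to 1 (at r2), used here as f."""
    if r <= r1:
        return 0.0
    if r >= r2:
        return 1.0
    mid, span = (r1 + r2) / 2, r2 - r1
    if r < mid:
        return 2 * ((r - r1) / span) ** 2
    return 1 - 2 * ((r2 - r) / span) ** 2


def usefulness(truth: float, coverage: float) -> float:
    """U = min(T, C): a rule is useful only if it is both true and well covered."""
    return min(truth, coverage)


antecedent = [1.0, 0.5, 0.0, 0.8]   # hypothetical grades
consequent = [1.0, 0.0, 1.0, 0.8]
C = s_shaped(coverage_ratio(antecedent, consequent))
print(C, usefulness(0.78, C))
```

Taking the minimum means a rule with perfect truth but almost no coverage still scores low on usefulness.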
Some parameters
• O – Degree of Outlier
– Indicates whether a rule describes the outliers instead of most of the data
– If T=0, O=0 since no data is described by the rule
– Described by the complements of T and C, since T and C both depend on the data that are not outliers
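The bullets above can be turned into a small function. The exact formula is not on the slide, so the one below is an assumption that is merely consistent with the bullets: O is zero when T is zero, and otherwise it is high only when coverage is low and T is close to 0 or 1.

```python
def degree_of_outlier(truth: float, coverage: float) -> float:
    """Assumed form: O = min(max(T, 1 - T), 1 - C) for T > 0, and O = 0 for T = 0."""
    if truth == 0:
        return 0.0  # the rule describes no data at all, not even outliers
    return min(max(truth, 1.0 - truth), 1.0 - coverage)


# Hypothetical (T, C) pairs.
print(degree_of_outlier(1.0, 0.1))   # true but covers very few cases
print(degree_of_outlier(0.9, 1.0))   # true and well covered
```

With this form, a well-covered rule can never be an outlier rule, because the 1 - C term pulls O to zero.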
Some parameters
• S – Degree of Simplicity
• Determined by the length of the summary
• L = number of antecedents
• Simplest rule: S=1 (one antecedent and one consequent)
MAMC Rules
• Multi-Antecedent Multi-Consequent
Implementation
• Each case is represented as a piecewise linear curve
• Blue: strength of supporting rule
• Red: cases violating the given rule
• Black: irrelevant cases
• The figure shows that if C is used for ranking, T may or may not be high
Implementation
• The figure shows that if U is used for ranking, a high U indicates high T and C: a useful rule
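The contrast between ranking by C and ranking by U can be seen in a short sketch. The rule names and scores are hypothetical; only U = min(T, C) comes from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleScore:
    name: str
    truth: float      # T
    coverage: float   # C

    @property
    def usefulness(self) -> float:
        return min(self.truth, self.coverage)  # U = min(T, C)


# Hypothetical scores for three candidate rules.
rules = [
    RuleScore("rule A", truth=0.9, coverage=0.1),
    RuleScore("rule B", truth=0.7, coverage=0.8),
    RuleScore("rule C", truth=0.3, coverage=1.0),
]

by_coverage = sorted(rules, key=lambda r: r.coverage, reverse=True)
by_usefulness = sorted(rules, key=lambda r: r.usefulness, reverse=True)

print([r.name for r in by_coverage])
print([r.name for r in by_usefulness])
```

Ranking by C puts rule C first even though its T is only 0.3, while ranking by U promotes rule B, the only rule where both T and C are high.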
Conclusions
• An important method of ranking rules using the parameters:
– Degree of Truth
– Degree of Sufficient Coverage
– Degree of Usefulness
– Degree of Outlier
– Degree of Simplicity