Page 1: Section 11 – Ec1818 Jeremy Barofsky jbarofsk@hsph.harvard.edu April 21 and 22, 2010

Section 11 – Ec1818

Jeremy Barofsky, jbarofsk@hsph.harvard.edu

April 21 and 22, 2010

Page 2:

Outline (Lectures 21-23)

• Fuzzy Logic
  – Motivation
  – Difference between probability and fuzziness
  – Crisp versus fuzzy sets
  – Fuzzy arithmetic
  – Making decisions with fuzzy sets

• Boolean Truth Tables / QCA

• Generalization in statistical learning theory

• FINAL REVIEW SESSION: 4/22, 5:30-6:30, Sever 102.
• FINAL OFFICE HOURS: 4/22 10-11am, 4/26 10-11am outside CGIS North room 320.
• FINAL LOCATION: Emerson Hall 210, 1:00 Tuesday, 4/27.

Page 3:

Motivation for Fuzzy Logic

• Since Alan Greenspan won't live forever, can we preserve his head full of knowledge to answer all our pressing economic questions? Would we want to? Instead, let's make an expert system!

• Experts make decisions using fuzzy concepts: a scout for a basketball team must decide whether a player is tall or not, but the definition of tall is not clearly defined or agreed upon.

• Experts often produce accurate predictions from non-numeric statements like "tall and quick players make good point guards" (though not always).

• Fuzzy logic allows us to create an expert system that classifies qualitative data and reproduces expert judgments.

Page 4:

Difference between Randomness (probability distributions) and Inherent Uncertainty (fuzzy sets)

• Probability: the events are clearly defined, but we are uncertain about which ones will occur. An individual will be tall with probability p, but the definition of tall is clear: it follows a crisp set, Tall = {heights over 5'11''}. The uncertainty is resolved once we measure that person's height.

• Fuzzy logic: the uncertainty is not in the probability of being tall (which is eliminated when we measure the person), but in what tall means. We are uncertain how to classify an individual even when we know their height.

• A membership function µ(x) maps each height x to the degree to which that height conforms to the tall versus the medium set.

Page 5:

Crisp versus Fuzzy Set Functions

• A set is a group of elements in a domain.

• Say our domain is height and our sets are short, medium, and tall. Elements are the actual height values, e.g. 5'1'', 6'11''.

• Crisp set functions provide well-defined sets within a domain: tall = {all heights > 5'10''} and medium = {heights > 5'4'' and < 5'10''}.

• For a crisp set, the membership function µ(x) is either 0 or 1 only.

• Fuzzy set functions provide the degree to which elements (a specific height) in the domain (height) conform to a given set (tall or medium).

• Draw a diagram of membership functions for crisp versus fuzzy sets; the membership function need not be linear (it could be triangular or bell-shaped).
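The crisp/fuzzy contrast can be sketched in a few lines of code. The 5'10'' crisp cutoff is from the slide; the fuzzy ramp's endpoints (68 to 73 inches) are illustrative assumptions chosen so that 6 feet is tall to degree 0.8, as in the example below.

```python
# Minimal sketch of crisp vs. fuzzy membership functions for the "tall" set.
# Crisp cutoff (5'10'') is from the slide; the ramp endpoints are assumptions.

def crisp_tall(height_in):
    """Crisp set: mu(x) is 0 or 1, with a hard cutoff at 5'10'' (70 inches)."""
    return 1.0 if height_in > 70 else 0.0

def fuzzy_tall(height_in):
    """Fuzzy set: mu(x) ramps linearly from 0 at 68 in. to 1 at 73 in."""
    if height_in <= 68:
        return 0.0
    if height_in >= 73:
        return 1.0
    return (height_in - 68) / 5.0

print(crisp_tall(69.5))  # 0.0 -- just under the cutoff, so "not tall" at all
print(fuzzy_tall(72))    # 0.8 -- six feet is tall to degree 0.8
```

A triangular or bell-shaped function would simply replace the linear ramp.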

Page 6:

Can I be in two sets at once?

• Fuzzy logic was created precisely to deal with situations where we are uncertain about how to classify a person's height (which set an element conforms to).

• Transition phase: the region of the membership diagram where two membership functions overlap. A person who is 6 feet tall conforms to the medium-height set to degree 0.2 and the tall set to degree 0.8.

Page 7:

Fuzzy Arithmetic – Operations on Fuzzy Sets

• Define x = the "sort of friendly" set and y = the "a bit tall" set, with memberships µ(x1) = 0.6 and µ(y1) = 0.3.

• AND operation (intersection of sets): µ(x1) AND µ(y1) = min[µ(x1), µ(y1)] = 0.3.

• OR operation (union of sets): µ(x1) OR µ(y1) = max[µ(x1), µ(y1)] = 0.6.

• NOT operation (complement): 1 - µ(x1) = 0.4 = membership in the "not sort of friendly" set.

• Write out an example of combined AND and NOT operations for fuzzy and crisp sets.
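These three operations translate directly into code. A minimal sketch using the slide's values (function names are mine):

```python
# Fuzzy set operations: AND = min, OR = max, NOT = complement.

def f_and(a, b):
    return min(a, b)   # intersection of fuzzy sets

def f_or(a, b):
    return max(a, b)   # union of fuzzy sets

def f_not(a):
    return 1.0 - a     # complement

mu_x1, mu_y1 = 0.6, 0.3  # "sort of friendly" and "a bit tall" memberships
print(f_and(mu_x1, mu_y1))  # 0.3
print(f_or(mu_x1, mu_y1))   # 0.6
print(f_not(mu_x1))         # 0.4 -> membership in "not sort of friendly"
```

Note that for a crisp set (memberships restricted to 0 and 1), min/max/complement reduce to ordinary Boolean AND/OR/NOT.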

Page 8:

Bring it on home – How does fuzzy logic help us make decisions?

• Fuzzy systems use if-then statements. In a crisp world, a person is either big or not, so rules are simple: if big, then run.

• In a fuzzy world, a person is big or somewhat big by degrees.

• We see a person who is x = 6 feet tall and y = has a "Mom" tattoo, and we must decide which rule to use. (Define x = "bigness" and y = "meanness".)

• Rules example: 3 bigness categories {not big, somewhat big, meat head} and 2 meanness categories {no "Mom" tattoo, tattoo}.

• DATA: A person we meet has µ1(x) = 0.2 (x conforms to the "not big" set with degree 0.2), µ2(x) = 0.7 (somewhat big), µ3(x) = 0.1 (meat head); µ1(y) = 0.4 (no tattoo), µ2(y) = 0.6 (tattoo).

1) If not big and not mean – blow off person.
2) If somewhat big and not mean – give some cash.
3) If big and mean – give cash and apologize for staring.

The fuzzy logic system tells us how to make decisions based on the truth value of each rule (taking a min, max, or average):
Rule 1 – µ1(x) AND µ1(y) = min(0.2, 0.4) = 0.2
Rule 2 – µ2(x) AND µ1(y) = min(0.7, 0.4) = 0.4
Rule 3 – µ3(x) AND µ2(y) = min(0.1, 0.6) = 0.1
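A minimal sketch of this rule evaluation in code, applying AND = min to each rule's stated condition and then acting on the rule with the highest truth value (the dictionary keys are hypothetical names, and "highest truth value wins" is one simple decision scheme among those mentioned above):

```python
# Membership data for the person we meet (values from the slide).
mu_big  = {"not_big": 0.2, "somewhat_big": 0.7, "meat_head": 0.1}
mu_mean = {"no_tattoo": 0.4, "tattoo": 0.6}

# Truth value of each if-then rule, with fuzzy AND = min.
rules = {
    "blow off person":      min(mu_big["not_big"],      mu_mean["no_tattoo"]),
    "give some cash":       min(mu_big["somewhat_big"], mu_mean["no_tattoo"]),
    "give cash, apologize": min(mu_big["meat_head"],    mu_mean["tattoo"]),
}

# Act on the rule with the highest truth value.
action = max(rules, key=rules.get)
print(rules)   # truth values 0.2, 0.4, 0.1
print(action)  # give some cash
```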

Page 9:

Creating the expert system

• Knowledge base: a key part of building a fuzzy logic expert system is producing the knowledge base the system runs on.

• 3 ways to extract it:
  1) Have experts run through scenarios and make decisions, then extract rules.
  2) Interview experts about their most recent or most important decisions.
  3) Have experts answer basic questions to create simple rules for decision making.

Page 10:

Qualitative Comparative Analysis (QCA)

• Allows us to group data (often case studies) when we have few observations and nonlinearities exist.

• QCA generalizes case studies: at what point can we take case studies and produce a result that holds in many contexts?

• Nonlinearities or configurations determine the outcome, meaning combinations of factors affect the result (interactions of order K > 0).

• Linear regression doesn't allow this because each additional interaction term requires many observations, and the number of terms scales quickly.

• E.g.: with binary variables and 3 characteristics that together determine the outcome, you need a minimum of 2^3 = 8 observations (2^5 = 32 with 5 characteristics) to estimate all the interaction terms.

• Fuzzy QCA: the simplest version of QCA takes outcomes as correct, not as draws from a probability distribution; when we add uncertainty in outcomes, we get fuzzy QCA.

Page 11:

4 Steps of QCA

• 1) Describe observations as Boolean truth table.

• 2) Combine statements via Boolean minimization.

• 3) Develop assumptions about missing observations.

• 4) Add details to model for aberrant cases.

Page 12:

Boolean Notation and Truth Table

• Note on Freeman's Boolean-algebra notation compared to statistics: A (capitalized) means A occurred, a (lower case) means it didn't, the + sign means OR (union), and letters written together like AB mean the AND (intersection) of A and B.

• EXAMPLE:
  – ABC means A and B and C occurred.
  – A + ABC means that A occurred, or A and B and C occurred.
  – Abc means A occurred and B and C did not.

• 1) Boolean truth table: start with a full description of the data using the conditions we assume cause the outcome. The table contains as many rows as there are logical combinations of factors; e.g. 4 binary independent variables give 2^4 = 16 rows.

Page 13:

Boolean Truth Table Example

OBS # A B C D Y

Ob1 0 0 0 0 0

Ob2 0 0 0 1 0

Ob3 0 0 1 0 1

Ob4 0 0 1 1 0

Ob5 0 1 0 0 1

Ob6 0 1 0 1 0

Ob7 0 1 1 0 1

Ob8 0 1 1 1 1

Ob9 1 0 0 0 1

Ob10 1 0 0 1 1

Ob11 1 0 1 0 0
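Step 1 can be sketched in code: store the rows above and read off the Boolean sum for Y (variable and function names are mine; the data are the table's 11 observed rows):

```python
# Truth-table rows (A, B, C, D, Y) from the table above.
rows = [
    (0,0,0,0,0), (0,0,0,1,0), (0,0,1,0,1), (0,0,1,1,0),
    (0,1,0,0,1), (0,1,0,1,0), (0,1,1,0,1), (0,1,1,1,1),
    (1,0,0,0,1), (1,0,0,1,1), (1,0,1,0,0),
]

def term(bits):
    """Boolean notation: uppercase = factor present, lowercase = absent."""
    return "".join(c.upper() if b else c for c, b in zip("abcd", bits))

# Boolean sum of the observations where Y = 1.
Y = " + ".join(term(r[:4]) for r in rows if r[4] == 1)
print(Y)  # abCd + aBcd + aBCd + aBCD + Abcd + AbcD
```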

Page 14:

2) Boolean Minimization – simplify the truth table

• Write out the table as a Boolean sum of the observations:

• Y = abCd + aBcd + aBCd + aBCD + Abcd + AbcD
• y = Y' = abcd + abcD + abCD + aBcD + AbCd

• Simplify: look for sets of characteristics that irreducibly cause the outcome.

• Since abcd and abcD both imply y, we can reduce these implicants: abcd + abcD = abc. Read this as: when A, B, and C are all absent, the outcome Y does not occur, regardless of D.

• Cycle through all input variables and determine whether the strings can be simplified further.

• Simplifies to: Y = aCd + aBd + aBC + Abc and y = abc + abD + acD + AbCd.
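The pairwise reduction step can be sketched as a small function (a simplified, Quine-McCluskey-style combine; the function name is mine): two terms that differ only in the case of one letter merge by dropping that letter.

```python
def combine(t1, t2):
    """If two terms differ only in one letter's case, that factor is
    irrelevant and can be dropped: e.g. abcd and abcD combine to abc."""
    if len(t1) != len(t2):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(t1, t2)) if a != b]
    if len(diffs) == 1 and t1[diffs[0]].lower() == t2[diffs[0]].lower():
        i = diffs[0]
        return t1[:i] + t1[i + 1:]
    return None  # terms cannot be combined

print(combine("abcd", "abcD"))  # abc
print(combine("abCd", "aBCd"))  # aCd
print(combine("abCd", "aBcD"))  # None (differ in more than one factor)
```

Repeating this over all pairs until nothing more combines yields the reduced expressions for Y and y.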

Page 15:

QCA continued

• Using the reduced-form strings lets us shrink the table to the minimal description of the outcome data.

• 3) We won't observe all configurations because:
  – some are impossible (e.g. selective admissions and bad faculty);
  – data are missing; then postulate their outcomes. We can usually make the conservative assumption of all 0s.

• 4) Correct errors: cases of PY' or P'Y. In the manufacturing dataset from class, we make the arbitrary choice that predicting 65% of cases correctly counts as success.

Page 16:

Generalization in Statistical Learning Theory

• The amount of data that we collect from our world is increasing rapidly (genetics, financial information, the Internet).

• Supervised (statistical) learning takes example data and trains a model to perform a specific task. Neural nets and tree diagrams are examples; the task might be predicting mortgage default.

• In psychological terms, this is analogous to concept learning: we want to understand the concept of sports and classify inputs accordingly, but not too generally.

Page 17:

Concentration of Measure

• Functions that depend on many parameters are almost constant.

• As the dimensionality of the data increases (we have more information about Yao Ming), our outcome estimate improves.

• This allows us to reduce error on test data by increasing the number of measurements, even on a non-representative sample.

• It solves our problem when we have small n (as in a case study) but a lot of information about those cases (D -> infinity).

Page 18:

Using concentration of measure