© Young H. Chun

Session 1. Introduction to Data Mining

"Data, data everywhere, but not a thought to think." from Jesse Shera's paraphrase of Coleridge

* Data Mining

! Progress in digital data acquisition and storage technology has resulted in the growth of huge data bases.

- Supermarket transaction data
- Credit card usage records
- Telephone call details
- Government statistics
- Medical records

Ex] Wal-Mart makes over 20 million transactions daily. AT&T has 100 million customers and carries on the order of 300 million calls a day on its long distance network.

! Interest has grown in the possibility of tapping these data, of extracting from them information that might be of value to the owner of the database. The discipline concerned with this task has become known as data mining.

* Definition of Data Mining

! Simply stated, “data mining refers to extracting or ‘mining’ knowledge from large amounts of data.”

! The mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining. Thus, “knowledge mining from data” is more appropriate…

* Knowledge Discovery in Databases

! Data mining is often set in the broader context of knowledge discovery in databases (KDD). This term originated in the artificial intelligence (AI) research field.

! Stages of KDD

- Selecting the target data
- Preprocessing the data (cleaning and integration)
- Transforming them if necessary
- Performing data mining to extract patterns and relationships
- Interpreting and assessing the discovered structures

! In ISDS 4141, we will focus primarily on data mining algorithms, rather than the overall process.

! Data mining is an interdisciplinary exercise.

Statistics, database technology, machine learning, pattern recognition, artificial intelligence, and visualization, all play a role.

* Afraid of Mathematics?

No mathematical background beyond high school algebra is required for an understanding of data mining. Mathematical derivations are generally not included in these lecture notes.

The emphasis here will be (1) on providing students with solid and effective evidence concerning the power and applicability of modern data mining methods, (2) on making sure that they can use these techniques, and (3) on indicating the assumptions underlying these techniques, as well as their limitations.

To accomplish these purposes, it is neither necessary nor appropriate to deluge students with mathematics.

1.1. Variable and Data

* Variable of Interest

! As a decision-maker, what information do you need?

Ex] IQ scores of 30,000 LSU students…

Ex] Do you wash your hands after using the restroom?

! Collect data for analysis!

- IQ scores, X = {123, 105, 136, … }

- Wash hands? X = {Yes, No, Yes, Yes, No, …}

! Other examples

- Annual household income?

- Mileage on your BMW?

- Proportion of US voters who support abortion?

- Are you left-handed?

"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." by H. G. Wells

* Type of Data

! Level of measurement and types of measurement scales

How much information does a set of numbers really convey?

| Level | Implication / Meaning | Operations |
|---|---|---|
| Nominal | Labeling only | Counting |
| Ordinal | Ranking | Ordering |
| Interval | Meaningful distances between values | Addition, subtraction, multiplication by constant |
| Ratio | Meaningful zero-point | Multiplication & division, etc. |

Ex]

! Gender, undergraduate major, political party.

! Grades, BCS rankings, movie classification

! Temperature, survey questions

! Stock price, height, starting salary.

* Another Classification…

! Quantitative (cardinal) data: IQ score?

! Qualitative (attribute) data: Wash your hands?

[Figure: the four measurement levels (ratio, interval, ordinal, nominal) grouped into cardinal and attribute data]

* Nature of Modeling

It is concerned with optimal decision making in, and modeling of, deterministic and probabilistic systems that originated from real life.

[Figure: real world system → (assumptions) → assumed real world system → model → prediction]

* Phases of Modeling

1. Problem Definition

2. Model formulation

3. Model solution

4. Model validity

5. Implementation

* Types of Model

Linear and logistic regression model, neural network, Bayesian model, decision tree, and so on.

1.2. Data Mining Tasks

1. Exploratory data analysis (EDA)

The goal is simply to explore the data without any clear ideas of what we are looking for. Typically, EDA techniques are interactive and visual.

2. Descriptive modeling

The goal is to describe all the data (or the process generating the data).

! Models for the overall probability distribution of the data (density estimation)

! Partitioning of the p-dimensional space into groups (cluster analysis and segmentation)

! Models describing the relationship between variables (dependency modeling)

3. Predictive modeling: Classification and regression

The aim is to build a model that will permit the value of one variable to be predicted from the known values of other variables. In classification, the variable being predicted is categorical, while in regression the variable is quantitative.

4. Discovering patterns and rules

The goal is to detect unusual behavior or association rules.

5. Retrieval by content

The user has a pattern of interest and wishes to find similar patterns in the text or image data set.

A. Prediction

Ex] Police Officers: In a study of differences in levels of community demand for police officers, the following regression equation was fitted to data from 39 towns in Delaware County, Pennsylvania. (E. J. Mathias and C. E. Zech, “The community demand for police officers,” American Journal of Economics and Sociology, 44, 1985, 401-410)

Y = Number of full-time police officers per capita
X1 = Maximum base salary of police officers
X2 = Percentage of population that is black
X3 = Estimated per capita income
X4 = Population density
X5 = Amount of intergovernmental grants per capita
X6 = Number of miles from center city Philadelphia
X7 = Percentage of population that is male and between 12 and 21 years of age

Ex] Housing Price: To explain the selling prices of houses, the following model was fitted to a sample of 815 sales. (B. A. Newsome, “Adjusting comparable sales for vinyl siding,” The Appraisal Journal, 59, 1991, 92-95)

Y = Selling price of house, in dollars
X1 = Square footage of living area
X2 = Size of garage, in number of cars
X3 = Age of house, in years
X4 = Dummy variable taking the value 1 if the house has a fireplace, and 0 otherwise
X5 = Dummy variable taking the value 1 if the house has brick siding and 0 if it has vinyl siding

! Multiple linear regression analysis

! Neural network

Ex] Box Office Success: Predict whether a film will be a hit or a miss at the box office long before it is even made. (R. Sharda and D. Delen, "Predicting box-office success of motion pictures with neural networks." Expert Systems with Applications 30, 2006, 243-254)

Y = Gross revenue
X1 = Rating by censors
X2 = Competition from other films at the time of release
X3 = Strength of the cast
X4 = Genre
X5 = Special effects
X6 = Whether it is a sequel
X7 = Number of theatres it opens in

Ex] Netflix Prize: Substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences. Improve it enough and you win a million dollars. (http://www.netflixprize.com/)

The training data set consists of more than 100 million ratings from over 480 thousand randomly-chosen, anonymous customers on nearly 18 thousand movie titles.

The test set contains over 2.8 million customer/movie id pairs, but with the ratings withheld. You must provide predictions for all the withheld ratings for each customer/movie id pair in the test set.

Netflix will score your predictions by computing the square root of the averaged squared difference between each prediction and the actual rating (the root mean squared error or "RMSE") in the test set.
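The RMSE scoring rule described above can be sketched in a few lines of Python. The function name `rmse` and the ratings below are illustrative assumptions, not Netflix's own code or data.

```python
import math

def rmse(predicted, actual):
    """Square root of the average squared difference between predictions and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings on a 1-5 scale (made up for illustration).
predictions = [3.5, 4.0, 2.0, 5.0]
actual_ratings = [4, 4, 1, 5]
print(round(rmse(predictions, actual_ratings), 4))
```

A lower RMSE means the predictions are closer to the withheld ratings; a perfect set of predictions would score 0.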

B. Classification

Ex] Consider two groups in Baton Rouge: riding-mower owners and those without riding mowers. In order to identify the best sales prospects for an intensive sales campaign, a riding-mower manufacturer is interested in classifying families as prospective owners or non-owners on the basis of income and lot size.

Ex] The Titanic dataset gives the values of four categorical attributes for each of the 2201 people on board the Titanic when it struck an iceberg and sank. The attributes are age (adult or child), gender (male or female), social class (first class, second class, third class, or crewmember), and whether or not the person survived. The question of interest is how survival relates to the other attributes.

! Logistic regression
- Logit model
- Probit model
- Complementary log-log model

! Bayesian classification
- Prior information
- Misclassification costs
- Likelihood function

! Density estimation
- Kernel method
- Multivariate binary kernel method

! Decision tree

C. Cluster Analysis

Ex] J.C. Penney creates special catalogs targeted to various demographic groups based on attributes such as income, location, and physical characteristics of potential customers. To determine the target mailings of the various catalogs and to assist in the creation of new, more specific catalogs, the company performs a clustering of potential customers based on the determined attribute values. The results of the clustering exercise are then used by management to create special catalogs and distribute them to the correct target population based on the cluster for that catalog.

! Classification pertains to a known number of groups and the operational objective is to assign new observations to one of these groups.

! Cluster analysis is a more primitive technique in that no assumptions are made concerning the number of groups or the group structure. Grouping is done on the basis of similarities or distances.

! Distance measure for variables and items

! Distance measure for categorical variables

! Linkage methods

! k-means method

! Nearest mean method

D. Association Rules and Sequence Discovery

Ex] A grocery store chain keeps a record of weekly transactions where each transaction represents the items bought during one cash register transaction. The executives of the chain receive a summarized report of the transactions indicating which types of items have sold and in what quantities. In addition, they find that 100% of the time that PeanutButter is purchased, so is Bread. In addition, 33% of the time PeanutButter is purchased, Jelly is also purchased.

! Apriori algorithm

! Frequent pattern growth (FP-growth) algorithm

Ex] The Webmaster at LSU periodically analyzes the Web log data to determine how visitors of the Web pages access them. He is interested in determining what sequences of pages are frequently accessed. He determines that 70 percent of the visitors of page A follow one of the following patterns of behavior: (A, B, C) or (A, D, B, C) or (A, E, B, C). He then decides to add a link directly from page A to C.

! Hidden Markov chain

E. Time Series Data Mining

! Use distance measures to determine the similarity between different time series.

! Examine the structure of the line to determine (and perhaps classify) its behavior.

! Use the historical time series plot to predict future values.

1.3. Data Mining Software

* Types of Data Mining Software

1. Application-specific software

Aimed at providing solutions to end-users for common tasks.

- Unica for customer relationship management
- Urban Science for location and distribution

2. Technique-specific software

Focused on a few data mining methods.

- Decision trees: CART (Salford Associates)

- Artificial neural network: NeuralWorks (Neuralware)

- Rule induction: WizWhy (Wizsoft), See5 (Rulequest)

3. General purpose (horizontal) data mining tools

Designed for data mining analysts who may be statisticians, business analysts or experts in a particular business domain.

Enterprise Miner (SAS), Clementine (SPSS), Intelligent Miner (IBM), Teraminer (Retrograde Data Systems), Insightful Miner, Darwin (Oracle)

- They are powerful, comprehensive, and easy to use.
- They require substantial learning effort and are very expensive.

A. SAS Enterprise Miner

! Streamline the entire data mining process from data access to model deployment by supporting all necessary tasks within a single, integrated solution, all while providing the flexibility for efficient workgroup collaborations.

! Provide advanced predictive and descriptive modeling tools and algorithms, including decision trees, neural networks, auto-neural networks, memory-based reasoning, linear and logistic regression, clustering, associations, time series and more.

! SEMMA data mining approach combines a structured process with the logical organization of the tools needed to support each of the five steps:

Sample your data by extracting a portion of a large data set big enough to contain the significant information, yet small enough to manipulate quickly.

Explore your data by searching for unanticipated trends and anomalies in order to gain understanding and ideas.

Modify your data by creating, selecting, and transforming the variables to focus the model selection process.

Model your data by allowing the software to search automatically for a combination of data that reliably predicts a desired outcome.

Assess your data by evaluating the usefulness and reliability of the findings from the data mining process and estimating how well it performs.

! Once you have developed the champion model using the SEMMA-based mining approach, it then needs to be deployed to score new customer cases. Model deployment is the end result of data mining - the final phase in which the ROI from the mining process is realized.

B. IBM DB2 Intelligent Miner (IM)

! Intelligent Miner is a family of data analysis software available from IBM.

! IBM produces a vast array of software for enterprise customers.

! Its products are often available on a larger number of platforms, and work well with many other products.

! The Intelligent Miner (IM) family

1. Intelligent Miner for Data
2. Intelligent Miner for Text
3. Intelligent Miner
- Modeling
- Scoring
- Visualization

C. Weka

Weka is open-source data mining software: a collection of machine learning algorithms for solving real-world data mining problems. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.

D. XL Miner

XL Miner is a data mining add-in for Microsoft Excel. It offers a full repertoire of techniques for classification, prediction, affinity analysis, data exploration, and data reduction.

The Wizard of Odds

* Background

Ms. Marilyn vos Savant, listed in the “Guinness Book of World Records Hall of Fame” for “Highest IQ” in the world, replied that, "Yes; you should switch. The first door has a one-third chance of winning, but the second door has a two-thirds chance."

When she innocently printed the reply in the magazine supplement to many Sunday newspapers, she had no idea that it would provoke a national controversy. She received thousands of letters, nearly all insisting that, because two options remained, the chances were even.

The most vehement criticism has come from statisticians and scientists, who have alternated between gloating at her and lamenting the nation's innumeracy.

Whose side are you on?

Dear Marilyn:

Suppose you're on a game show, and you're given the choice of three doors. Behind one door is a car, the others, goats. You pick a door, say #1, and the host, who knows what's behind the doors, opens another door, say #3, which has a goat. He says to you, "Do you want to pick door #2?" Is it to your advantage to switch your choice of doors?

Craig F. Whitaker, Columbia, Maryland.
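One way to check the game-show question is to enumerate every case exactly. The sketch below assumes the car is equally likely to be behind each door, that the host always opens a goat door other than your pick, and that he chooses at random when two goat doors are available; the function name `win_probability` is illustrative.

```python
from fractions import Fraction

DOORS = (1, 2, 3)

def win_probability(switch, pick=1):
    """Exact chance of winning the car for a player who always stays or always switches."""
    total = Fraction(0)
    for car in DOORS:
        # The host knows where the car is and opens a goat door other than the pick.
        host_options = [d for d in DOORS if d != pick and d != car]
        for opened in host_options:
            # Assumption: with two goat doors available, the host picks one at random.
            weight = Fraction(1, len(DOORS)) * Fraction(1, len(host_options))
            final = pick
            if switch:
                final = next(d for d in DOORS if d not in (pick, opened))
            if final == car:
                total += weight
    return total

print(win_probability(switch=False))  # 1/3
print(win_probability(switch=True))   # 2/3
```

Under these assumptions, switching wins exactly when the first pick was a goat, which happens two times out of three.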

A. Letters from Academia

! Since you seem to enjoy coming straight to the point, I'll do the same. In the following question and answer, you blew it! As a professional mathematician, I'm very concerned with the general public's lack of mathematical skills. Please help by confessing your error and in the future being more careful.

Robert Sachs, Ph. D. George Mason Univ.

! You blew it, and you blew it big! ... There is enough mathematical illiteracy in this country, and we don't need the holder of the world's highest I.Q. propagating more. Shame!

S. S., Ph.D. University of Florida

! Your answer to the question is in error. But if it is any consolation, many of my academic colleagues have also been stumped by this problem.

Barry Pasternack, Ph. D. California Faculty Association

! You're in error, but Albert Einstein earned a dearer place in the hearts of people after he admitted his errors. Frank Rose, Ph.D. University of Michigan

! I have been a faithful reader of your column, and I have not, until now, had any reason to doubt you. However, in this matter ( for which I do have expertise ), your answer is clearly at odds with the truth. James Rauff, Ph. D. Millikin University

! May I suggest that you obtain and refer to a standard textbook on probability before you try to answer a question of this type again?

Charles Reid, Ph.D. University of Florida

! Your logic is in error, and I am sure you will receive many letters on this topic from high school and college students. Perhaps you should keep a few addresses for help with future columns. W. Robert Smith, Ph.D. Georgia State University

! You are utterly incorrect about the game show question, and I hope this controversy will call some public attention to the serious national crisis in mathematical education. If you can admit your error, you will have contributed constructively towards the solution of a deplorable situation. How many irate mathematicians are needed to get you to change your mind? E. Ray Bobo, Ph.D. Georgetown University

! I am in shock that after being corrected by at least three mathematicians, you still do not see your mistake. Kent Ford, Dickinson State University

! You are the goat! Glenn Calkins, Ph.D. Western State College

! You are wrong, but look at the positive side. If all those Ph.D.'s were wrong, the country would be in some very serious trouble.

Everett Harman, Ph.D. U.S. Army Research Institute

B. Tom and Ray's Car Talk Radio Show

! What could be more fun than proving a bunch of pompous academics wrong!!!

Session 2. Review of Basic Tools in Data Mining I

"Data, Data, Data! He cried impatiently. I can't make bricks without clay."

by Sherlock Holmes

* Type of Data

• Univariate data, X

- What is a typical ( summary ) value? Central tendency…

- How diverse are these items? Dispersion (variation)…

- Are there any individuals that require special attention? Outliers…

• Bivariate data, ( X1 and X2 ) or ( X and Y )

- Is there a simple relationship between the two?

- How strongly are they related?

- Can you predict one from the other?

- Are there any individuals that require special attention?

• Multivariate data, (Y, X1, X2,…)

- Is there a relationship among them?

- How strongly are they related?

- Can you predict one from the others?

- Are there any individuals that require special attention?

2.1. Population and Sample

* What is Statistics?

Statistics is the art and science of

(A) collecting, organizing, summarizing, presenting and analyzing data, and

(B) drawing valid conclusions and making reasonable decisions on the basis of such analysis.

* Type of Statistics

A. Descriptive statistics ( deductive statistics )

B. Statistical inferences (inductive, or inferential statistics)

| | Descriptive statistics | Inferential statistics |
|---|---|---|
| Small population | X | |
| Large population: census | X | |
| Large population: sampling | X | X |

"Statistical thinking will one day be as necessary for efficient citizenship

as the ability to read and write." by H. G. Wells

[Figure: a sample of size n drawn from a population of size N; population parameters correspond to sample statistics]

* Population Parameters

! Population (Universe)

Total set of elements of interest for a given problem.

! Population parameters: $\mu$, $\sigma^2$, $\pi$

- Numerical characteristic of the population.

- Unknown constants.

Type of data and population parameters:

A. Quantitative data

1. Population mean $\mu$ for the central tendency: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

2. Population variance $\sigma^2$ for dispersion: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$

B. Qualitative data with two categories

3. Population proportion $\pi$: $\pi = \frac{x}{N}$

Ex] Final grades this semester = { 4, 3, 3, 2}

! Population mean

$\mu$ =

! Population variance

$\sigma^2$ =

! Population standard deviation

$\sigma$ =

Ex] 20 females in a class of 50 students

! Population proportion, $\pi$ =
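The two examples above can be worked with Python's standard `statistics` module, which provides population versions of the variance and standard deviation (dividing by N). This is a minimal sketch; the variable names are illustrative.

```python
import statistics

grades = [4, 3, 3, 2]                    # the whole population, N = 4

mu = statistics.mean(grades)             # population mean
sigma_sq = statistics.pvariance(grades)  # population variance (divides by N)
sigma = statistics.pstdev(grades)        # population standard deviation

females, class_size = 20, 50
proportion = females / class_size        # population proportion

print(mu, sigma_sq, round(sigma, 4), proportion)
```

Here the deviations from the mean are 1, 0, 0, and -1, so the squared deviations sum to 2 and the population variance is 2/4 = 0.5.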

* Sample Statistics

- Sample: A small part (or subset) of the population used to gain information about the whole.

- Sample statistics: $\bar{x}$, $s^2$, p

- Numerical characteristic of the sample.

- They are known when we have taken a sample, but they are random variables, changing from sample to sample.

Type of data and sample statistics:

A. Quantitative data

1. Sample mean $\bar{x}$ for the central tendency: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

2. Sample variance $s^2$ for dispersion: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

B. Qualitative data with two categories

3. Sample proportion p: $p = \frac{x}{n}$

Ex] IQ scores of 3 students in the sample = {106, 124, 130}

! Sample mean $\bar{x}$ =

! Sample variance $s^2$ =

! Sample standard deviation s =

Ex] 6 students are left-handed in a sample of 50 students

! Sample proportion, p =
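The sample examples above can be checked the same way. Note that the sample variance divides by n - 1 rather than n, which is what `statistics.variance` and `statistics.stdev` do. A minimal sketch with illustrative variable names:

```python
import statistics

iq_scores = [106, 124, 130]              # a sample of n = 3 students

x_bar = statistics.mean(iq_scores)       # sample mean
s_sq = statistics.variance(iq_scores)    # sample variance (divides by n - 1)
s = statistics.stdev(iq_scores)          # sample standard deviation

left_handed, sample_size = 6, 50
p = left_handed / sample_size            # sample proportion

print(x_bar, s_sq, round(s, 2), p)
```

The deviations from the sample mean of 120 are -14, 4, and 10; their squares sum to 312, so the sample variance is 312/2 = 156.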

2.2. Introduction to Probability

* Definition

P[A] = the probability that the outcome of the random experiment is an element of the set A.

1. Relative frequencies

If a trial is performed a large number of times in an independent manner, the fraction of times that event A occurs will approach, as a limit, the value P[A].

Ex] Monte Carlo Simulation: Toss a fair coin

[Figure: running proportion of heads (vertical axis, 0 to 1) against the number of tosses (horizontal axis, 0 to 200)]
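The relative-frequency idea can be illustrated with a small Monte Carlo simulation of a fair coin. This is a sketch only: the function name, the seed, and the use of Python's `random` module are assumptions made for illustration, and a different seed gives a different path.

```python
import random

def running_proportion_of_heads(n_tosses, seed=0):
    """Simulate fair coin tosses and return the proportion of heads after each toss."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < 0.5    # True counts as 1 head
        proportions.append(heads / toss)
    return proportions

proportions = running_proportion_of_heads(200)
for n in (10, 50, 100, 200):
    print(n, round(proportions[n - 1], 3))
```

Early proportions can wander far from 0.5; as the number of tosses grows, the running proportion tends to settle near P[Head] = 0.5.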

“I returned, and saw under the sun, that the race [is] not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all.” --- Ecclesiastes 9.11

2. Subjective probability

- The probability P[A] is the degree of certainty one feels that event A will occur.

- One-time trials: unique, non-repeatable trials.

[Figure: choice between 1. a risky option (win $1,000 with probability p, $0 with probability 1-p) and 2. a certainty option (a sure amount $)]

* Simple Probability

If there are n possible outcomes, all equally likely, and an event X occurs in k of these outcomes, we say that the probability of X is k/n and is denoted as

$$P[X] = \frac{k}{n}$$

Ex] Indecent Proposal: In the dice game “craps”, the shooter (usually one of the players with the largest bet) throws two dice and the sum of the two numbers that appear is observed.

! Possible outcomes, S

1, 1 1, 2 1, 3 1, 4 1, 5 1, 6

2, 1 2, 2 2, 3 2, 4 2, 5 2, 6

3, 1 3, 2 3, 3 3, 4 3, 5 3, 6

4, 1 4, 2 4, 3 4, 4 4, 5 4, 6

5, 1 5, 2 5, 3 5, 4 5, 5 5, 6

6, 1 6, 2 6, 3 6, 4 6, 5 6, 6

! Probability of X (Sum of the two numbers)

[Figure: bar chart of P[X] (vertical axis, 0.00 to 0.20) for each sum X = 2, 3, …, 12]

P[Sum=7] =

P[Snake eyes] =
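Because the 36 outcomes are equally likely, the two probabilities above can be found by counting. The sketch below enumerates the sample space, assuming two fair dice; the helper name `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """P[event] = (number of favorable outcomes) / (number of possible outcomes)."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

p_seven = prob(lambda o: sum(o) == 7)
p_snake_eyes = prob(lambda o: o == (1, 1))

print(p_seven)       # 1/6
print(p_snake_eyes)  # 1/36
```

Six of the 36 outcomes sum to seven, while only (1, 1) gives snake eyes.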

* Random Experiment, Outcomes, and Events

[Figure: an experiment yields an outcome in the sample space (Tail, Head); a random variable maps each outcome to the real-valued space (0, 1), which is then mapped to the probability space]

* Sample Space:

The set of all possible experimental outcomes.

• Finite

• Infinite
- Countably infinite
- Uncountably infinite

• Discrete vs. Continuous random variables

Ex] 1. Final grade in this course S =

2. Number of marriage proposals until accepted S =

3. Length of your honeymoon S =

Case 1: Endless honeymoon Case 2: Love is short; Marriage is long…

Ex] A coin is tossed three times and observed to be either a head or a tail each time.

• Sample space

Finite sample space.

S =

• Random variable:

Let X be the number of heads.

Discrete random variable

• Probability distribution

X

P[X]

• Bar graph (Histogram)
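The sample space and the distribution of X for three tosses can be listed by enumeration. The sketch below assumes a fair coin, so that all eight sequences are equally likely; the example itself does not state this, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: every sequence of three tosses, e.g. "HHT".
sample_space = ["".join(seq) for seq in product("HT", repeat=3)]

# X = number of heads in each sequence; count how often each value occurs.
counts = Counter(seq.count("H") for seq in sample_space)
distribution = {x: Fraction(counts[x], len(sample_space)) for x in sorted(counts)}

print(sample_space)
for x, p in distribution.items():
    print(x, p)
```

With a fair coin, X takes the values 0, 1, 2, 3 with probabilities 1/8, 3/8, 3/8, 1/8.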

Ex] A fair coin is flipped successively at random until the first head is observed. Let X denote the number of flips of the coin that are required.

• Sample space

S =

Countably infinite sample space.

• Random variable:

Let X be the number of flips.

Discrete random variable

• Probability distribution

X

P[X]

• Bar graph (Histogram)

Ex] Starting salary

* Probability and Statistics

Probability is the inverse of statistics.

- Statistics helps you go from observed data to generalizations about how the world works.

- Probability goes the other direction: If you assume you know how the world works, then you can figure out what kinds of data you are likely to see and the likelihood for each.

• Probability: How the world works → What will happen

• Statistics: What happened → How the world works

"A career is nothing to leave to chance." by American Statistical Association

Ex] "Ask Marilyn," Parade Magazine, (August 3, 1997), p. 15

It is a well-established fact that in any randomly chosen group of 50 people, it is virtually certain that two will have birthdays on the same date. Since there are 365 days in a year, I can't understand why this is the case. Can you explain this phenomenon?

! P[None] =

! P[At least one] =

If n = 23, then P[At least one] =
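The birthday question can be worked through the complement: first find the probability that no two people share a birthday, then subtract it from 1. The sketch below assumes 365 equally likely birthdays and independent people (no leap years, no twins); the function names are illustrative.

```python
def p_no_shared_birthday(n, days=365):
    """P[None]: all n people have different birthdays."""
    p = 1.0
    for i in range(n):
        # Person i+1 must avoid the i birthdays already taken.
        p *= (days - i) / days
    return p

def p_shared_birthday(n, days=365):
    """P[At least one]: at least two of the n people share a birthday."""
    return 1.0 - p_no_shared_birthday(n, days)

for n in (23, 50):
    print(n, round(p_shared_birthday(n), 4))
```

Under these assumptions the probability passes one half at n = 23 and is about 0.97 for a group of 50, which is why a shared birthday is "virtually certain" there.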

* Probability and Odds

! The odds favoring event A over event B are

$$O(A:B) = \frac{P[A]}{P[B]}.$$

! The odds favoring event A (over $A^c$) are

$$O(A:A^c) = \frac{P[A]}{P[A^c]}.$$

! Relationship:

$$O(A) = \frac{P[A]}{1 - P[A]}, \qquad P[A] = \frac{O(A)}{1 + O(A)}$$

! The odds of dying from an injury in 2000 were 1 in 1,820.

! The lifetime odds of dying from an injury for a person born in 2000 were 1 in 24.

Ex] If the odds are 1 in 10 favoring event A, then the probability of event A is 1/11. In a fair game, the payoff ratio should be 10 to 1 should event A occur. In other words, the payoff odds are 10 to 1.

[Figure: win $10 with probability p = 1/11; lose $1 with probability 10/11]

Ex] e-mail from a former student

I know that the correct odds for rolling a seven in craps are 1 in 5. There are 6 chances to roll a seven out of 36 possibilities. This is 1/6, or, expressed in odds, 5 unfavorable numbers to 1 favorable.

If the casino's payout in odds is 4 to 1, I am told that the casino's edge is 16.67%. I understand the edge to be the difference between the player's chance of winning the bet (correct odds) and the casino's payout. I would like to know how to calculate the 16.67%.

I am off by a factor of 2 when I work out the problem. Thanks for the help.

E(X) =
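One way to approach the student's question is to convert the odds to a probability with P = O / (1 + O) and then compute the expected net winnings on a $1 bet. The sketch below assumes a payout of "4 to 1" means winning $4 on a $1 stake and losing the $1 otherwise; the function names are illustrative.

```python
from fractions import Fraction

def probability_from_odds(odds):
    """Convert odds in favor (e.g. 1 to 5 written as 1/5) into a probability."""
    return odds / (1 + odds)

def odds_from_probability(p):
    """Convert a probability into odds in favor."""
    return p / (1 - p)

# Correct odds for a seven: 1 favorable to 5 unfavorable.
p_seven = probability_from_odds(Fraction(1, 5))

# Expected net winnings per $1 bet: win $4 with p_seven, lose $1 otherwise.
expected_value = 4 * p_seven - 1 * (1 - p_seven)
house_edge = -expected_value

print(p_seven, expected_value, float(house_edge))
```

Under this reading, E(X) = 4(1/6) - 1(5/6) = -1/6, so the player loses about 16.67 cents per dollar bet on average, which matches the quoted edge.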

* Axioms of Probability:

Let S be a sample space. Then,

A1: For every event A, 0 ≤ P[A] ≤ 1

A2: P[S] = 1

A3: P[A ∪ B] = P[A] + P[B] if A and B are mutually exclusive.

* Properties of Probability

P[∅] = 0

$P[A^c] = 1 - P[A]$

If A ⊂ B, then P[A] ≤ P[B]

P[A ∪ B] = P[A] + P[B] - P[A ∩ B]

Ex] Marilyn vos Savant, "Ask Marilyn", Parade Magazine, (Aug. 4, 1996), p. 6.

Ex] Exactly three runners, Andy, Bob, and Cade, are in a race: Andy is twice as likely to win as Bob and Bob is twice as likely to win as Cade.

• P[Andy wins the race]?

• P[Andy or Bob wins the race]?

Andy Bob Cade P[win]
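The runner example follows from the axioms: the three winning probabilities must sum to 1 and stand in the ratio 4 : 2 : 1. The sketch below assumes exactly one runner wins (no ties); the variable names are illustrative.

```python
from fractions import Fraction

# Relative chances: Andy is twice as likely as Bob, Bob twice as likely as Cade.
weights = {"Andy": 4, "Bob": 2, "Cade": 1}
total = sum(weights.values())

# Normalize so the probabilities sum to 1 (axiom A2).
p_win = {name: Fraction(w, total) for name, w in weights.items()}

# "Andy wins" and "Bob wins" are mutually exclusive, so axiom A3 applies.
p_andy_or_bob = p_win["Andy"] + p_win["Bob"]

print(p_win["Andy"])   # 4/7
print(p_andy_or_bob)   # 6/7
```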

Say that Tom studied a lot of mathematics in college and was campus chess champion too. If that's the case, which of the following statements is more likely to be true? A. “Tom is now a mathematician.” B. “Tom is now a mathematician and plays chess as a hobby.”

2.3. Conditional Probability

* Conditional Probability

! We have two events, A and B, whose occurrences are in some way connected.

! The event A is unknown, while the event B is known.

! The conditional probability P[A|B] is the probability of the event A given that the event B has occurred:

$$P[A|B] = \frac{P[A \cap B]}{P[B]}$$

Ex] Throw two dice and observe the two numbers that appear.

! 36 possible outcomes (X, Y)

1, 1 1, 2 1, 3 1, 4 1, 5 1, 6

2, 1 2, 2 2, 3 2, 4 2, 5 2, 6

3, 1 3, 2 3, 3 3, 4 3, 5 3, 6

4, 1 4, 2 4, 3 4, 4 4, 5 4, 6

5, 1 5, 2 5, 3 5, 4 5, 5 5, 6

6, 1 6, 2 6, 3 6, 4 6, 5 6, 6

! P[X=6] =

! P[X=6 | Y ≥ 5] =

! P[X=6 | X+Y ≥ 10] =

! P[X=6 | X ≥ Y] =

! P[X=6 | X ≥ Y and X+Y ≥ 10] =
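These conditional probabilities can be counted directly from the 36 outcomes: restrict attention to the outcomes where the condition holds, then count how many of them also have X = 6. The sketch below assumes two fair dice and reads every comparison in the conditions as "greater than or equal to", which is an assumption about the original notation; the helper name `conditional` is illustrative.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # (X, Y) pairs

def conditional(event, given):
    """P[event | given] by counting within the outcomes where `given` holds."""
    conditioning = [o for o in outcomes if given(o)]
    both = [o for o in conditioning if event(o)]
    return Fraction(len(both), len(conditioning))

def x_is_six(o):
    return o[0] == 6

print(conditional(x_is_six, lambda o: True))                                  # 1/6
print(conditional(x_is_six, lambda o: o[1] >= 5))                             # 1/6
print(conditional(x_is_six, lambda o: o[0] + o[1] >= 10))                     # 1/2
print(conditional(x_is_six, lambda o: o[0] >= o[1]))                          # 2/7
print(conditional(x_is_six, lambda o: o[0] >= o[1] and o[0] + o[1] >= 10))    # 3/4
```

Knowing only Y leaves the probability at 1/6, since the two dice are independent; conditions that involve X itself change it.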

Ex] Fighting Tigers: The LSU Tigers play 60% of their games at home and 40% away. Given that the team has a home game, there is a 0.8 probability of winning. Given that the team has an away game, there is a 0.4 probability of winning.

1. Tree diagram

[Figure: tree diagram branching first on the game (Home, Away) and then on the result (Win, Lose)]

2. Probability Table

| Game | Win | Lose | Total |
|---|---|---|---|
| Home | | | |
| Away | | | |
| Total | | | 1.00 |

(a) Given that the team has a home game, what is the probability of winning?

P[ Win | Home ] =

(b) If the team wins on a particular Saturday, what is the probability that the game was played at home?

P[ Home | Win ] =
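Part (b) reverses the direction of the conditioning, which is where the joint probabilities in the table come in. A minimal sketch using exact fractions, with illustrative variable names:

```python
from fractions import Fraction

p_home = Fraction(6, 10)
p_away = Fraction(4, 10)
p_win_given_home = Fraction(8, 10)
p_win_given_away = Fraction(4, 10)

# Joint probabilities (the cells of the probability table).
p_home_and_win = p_home * p_win_given_home
p_away_and_win = p_away * p_win_given_away

# Total probability of winning, then the reversed conditional.
p_win = p_home_and_win + p_away_and_win
p_home_given_win = p_home_and_win / p_win

print(p_win)             # 16/25
print(p_home_given_win)  # 3/4
```

So P[Win | Home] = 0.8 is given directly, while P[Home | Win] = 0.48 / 0.64 = 0.75 has to be computed.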

Ex] Market Basket Analysis: 100 transactions with four items.

Wine Cheese Beer Chip -

1 x x x x . 2 x x x . 3 x x . 4 x x x . 5 x . 6 x x x . 7 x x . 8 x . 9 x x . 10 x x . . . . . . .

100 . . . . .

Total 4 6 5 8 .

Find the conditional probability (aka, confidence) of the following association rules in data mining.

(a) “Wine => Cheese”

(b) “Cheese => Wine”

| | Cheese: Yes | Cheese: No | Total |
|---|---|---|---|
| Wine: Yes | | | 4 |
| Wine: No | | | 96 |
| Total | 6 | 94 | 100 |

(c) “Beer => Chip”

(d) “Chip => Beer”

(e) “Beer => Wine”

(f) “Wine and Cheese => Chip”
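The confidence of a rule "A => B" is the conditional probability P[B | A]: among the transactions that contain A, the fraction that also contain B. The sketch below shows the computation on a handful of hypothetical baskets; they are made up for illustration and are not the 100 transactions in the exercise, and the function name `confidence` is illustrative.

```python
from fractions import Fraction

def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent => consequent, i.e. P[consequent | antecedent]."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        raise ValueError("no transaction contains the antecedent")
    with_both = sum(1 for t in with_antecedent if consequent <= t)
    return Fraction(with_both, len(with_antecedent))

# Hypothetical baskets (NOT the data from the exercise).
baskets = [
    {"Wine", "Cheese", "Chip"},
    {"Wine", "Cheese"},
    {"Cheese", "Beer"},
    {"Beer", "Chip"},
    {"Chip"},
]

print(confidence(baskets, {"Wine"}, {"Cheese"}))          # 1
print(confidence(baskets, {"Cheese"}, {"Wine"}))          # 2/3
print(confidence(baskets, {"Wine", "Cheese"}, {"Chip"}))  # 1/2
```

Note that confidence is not symmetric: "Wine => Cheese" and "Cheese => Wine" share the same numerator but divide by different counts.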

* Independent Events

If P[A] = P[A|B], then A and B are independent.

Ex] Your final grade in this course.

P[A] =

P[A | ACT score > 25] =

P[A | Age > 25] =

P[A | Checking account balance > $5,000] =

Ex] Fighting Tigers

P[ Win ] =

P[ Win | Home game ] =

• If independent, then P[A ∩ B] = P[A] × P[B]

* Two Laws of Probability:

Addition law (A or B)

• P[A ∪ B] = P[A] + P[B] - P[A ∩ B]

• If mutually exclusive, P[A ∩ B] = 0. Thus, P[A ∪ B] = P[A] + P[B]

Multiplication law (A and B)

• P[A ∩ B] = P[A] P[B|A] = P[B] P[A|B]

• If independent, P[B|A] = P[B] or P[A|B] = P[A]. Thus, P[A ∩ B] = P[A] P[B]


2.4. Bayesian Analysis

* Two Approaches in Statistics:

1. Classical approach (Frequentist) - Use only empirical evidence; i.e., the evidence contained in samples from the population or process of interest. - Frequency interpretation of probability

2. Bayesian approach - Use any and all available information, whether it be sample information or information of some other nature. - Subjective interpretation of probability: Degree of belief

Ex] Predict your batting average at the end of the baseball season.

Sample information: 3 hits in 4 at-bats during the first game of the season.

! Classical approach: Use only the sample information:

! Bayesian approach: Use not only the sample information, but also (1) prior information and (2) loss function

[Diagram: Prior probabilities P[θ] + New information x from research or experimentation → Bayesian analysis → Posterior (revised) probabilities P[θ|x]]


Ex] Want to be a lawyer?

On the night of August 21, 2004, a man was struck by a speeding taxi as he crossed the street. The city where the accident occurred has only two taxi companies, Blue Cab and Green Cab. Blue Cab has only 15% of the taxis in the city. An eyewitness has testified that she thought the hit-and-run taxi was blue. The man sued the Blue Cab Company for his medical expenses.

At the trial, the man’s lawyer shows that the eyewitness is 80% reliable in identifying the color of taxis. That is, she was able to identify correctly the color of taxis 80% of the time, under conditions like those of the night of the accident.

The lawyer claims that it is extremely likely that the man was actually hit by a blue cab. Do you agree? Why, or why not?

• P[Speeding cab was blue | Testified it as blue] =

• P[Speeding cab was green | Testified it as blue] =

[Tree diagram: first branch "The speeding cab was" Blueacc / Greenacc; second branch "Witness testifies it as" Bluewit / Greenwit]


# Another eyewitness has testified that she thought the taxi was blue. Suppose that her reliability is also 80%.

• Prior probabilities

P[Blueacc] =

P[Greenacc] =

• Joint probabilities

P[Blueacc, Bluewit, Bluewit] =

P[Greenacc, Bluewit, Bluewit] =

• Posterior probabilities

P[Blueacc | Bluewit, Bluewit] =

P[Greenacc | Bluewit, Bluewit] =

Y. H. Chun, "Bayesian Analysis of the Sequential Inspection Plan via the Gibbs Sampler," Operations Research, Forthcoming.

Y. H. Chun and R. T. Sumichrast, "Bayesian Inspection Model with the Negative Binomial Prior in the Presence of Inspection Errors," European Journal of Operational Research, Forthcoming.

[Tree diagram: first branch "The speeding cab was" Blueacc / Greenacc; second branch "Witness #1 testifies it as" Bluewit / Greenwit; third branch "Witness #2 testifies it as" Bluewit]
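The taxi example can be worked through with a small Bayes update in Python. This is a sketch under one added assumption that the handout does not state explicitly: the witnesses err independently of each other given the true color. The function name `posterior_blue` is illustrative.

```python
from fractions import Fraction

def posterior_blue(prior_blue, reliability, n_blue_reports):
    """P[cab was blue | n witnesses all report blue].

    Assumes witnesses are conditionally independent given the true color
    (an assumption made for this sketch).
    """
    like_if_blue = reliability ** n_blue_reports         # every witness is right
    like_if_green = (1 - reliability) ** n_blue_reports  # every witness is wrong
    numerator = prior_blue * like_if_blue
    return numerator / (numerator + (1 - prior_blue) * like_if_green)

prior = Fraction(15, 100)        # Blue Cab's share of taxis
reliability = Fraction(80, 100)  # witness accuracy

one_witness = posterior_blue(prior, reliability, 1)
two_witnesses = posterior_blue(prior, reliability, 2)
print(float(one_witness), float(two_witnesses))
```

With one witness the posterior stays below one half because the 15% prior outweighs a single 80%-reliable report; a second agreeing witness moves it well above one half.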


Ex] Want to be a doctor?

Suppose that a laboratory blood test is 95% effective in detecting a certain disease when it is, in fact, present. However, the test also yields a "false positive" result for 2% of the healthy persons tested (i.e., if a healthy person is tested, then, with probability 0.02, the test result will imply he/she has the disease). If 0.1% of the population actually has the disease, what is the probability a person has the disease given that his/her test result is positive?

(A) P[ Virus | Positive ] vs. (B) P[ No virus | Positive ] ?

• P[Virus|Positive]

• P[Virus|Positive, Positive]

[Tree diagram: first branch Virus / No virus; second branch Positive / Negative]


Ex] Jailer's Dilemma: "Ask Marilyn," Parade Magazine, ( July 5, 1992 ), p. 23

"Three prisoners on death row are told that one of them has been chosen at random for execution the next day, but the other two are to be freed. One privately begs the warden to at least tell him the name of one other prisoner who will be freed. The warden relents: 'Chad will go free.' Horrified, the first prisoner says that because he is now one of only two remaining prisoners at risk, his chances of execution have risen from one-third to one half! Should the warden have kept his mouth shut?
– Marvin M. Kilgo III, Camden, S. C."

Ex] Game Show Problem: "Ask Marilyn," Parade Magazine, ( Sep. 9, 1990 )

Suppose you're on a game show, and you're given a choice of three doors. Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

• , "On the Information Economics Approach to the Generalized Game Show Problem," The American Statistician, Vol. 53, (February, 1999), pp. 43-51.
• , "Game Show Problem," OR/MS Today ( June, 1991 ), p. 9.
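The game show question can be settled by enumerating every equally likely case. The sketch below makes the usual assumptions, which are not all spelled out in the letter: the car is placed uniformly at random, the host always opens a goat door other than the contestant's pick, and he chooses at random when two such doors are available. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def win_probabilities(n_doors=3):
    """Return (P[win if stay], P[win if switch]) by exact enumeration."""
    doors = range(n_doors)
    stay = Fraction(0)
    switch = Fraction(0)
    for car, pick in product(doors, doors):
        weight = Fraction(1, n_doors * n_doors)   # car and pick are uniform
        # The host may open any door that is neither the pick nor the car.
        openable = [d for d in doors if d != pick and d != car]
        for opened in openable:
            w = weight / len(openable)
            if pick == car:
                stay += w
            # A switcher moves to one of the other closed doors at random.
            remaining = [d for d in doors if d not in (pick, opened)]
            if car in remaining:
                switch += w / len(remaining)
    return stay, switch

stay, switch = win_probabilities(3)
print(stay, switch)
```

For three doors the enumeration shows that switching doubles the chance of winning relative to staying.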


2.5. Random Variables

* Random Variable

Random variable X is a numerical description of the outcome of an experiment.

\[
X = \begin{cases} 1, & \text{if head} \\ 0, & \text{if tail} \end{cases}
\]

* Probability Distribution, P[X]

Distribution of a random variable X

Ex] Coin-toss experiment, P[X]

X       0     1
P[X]    0.4   0.6

[Figure: mapping from the sample space S = {H, T} to the random variable X (Head → 1, Tail → 0) and on to the probability P[X] (0.6 and 0.4), with a bar chart of P[X] against x]

Ex] Starting salary f(x)

[Figure: density f(x) against x]


* Discrete random variable

1. Probability mass function ( p.m.f. ):

• The probability that the random variable X will take a value x

• P[X=x], P[x], or Px

2. Cumulative distribution function ( c.d.f. )

• The probability that the random variable X will take a value less than or equal to a.

• P[X ≤ a]

Ex] Toss a die

x           1    2    3    4    5    6
P[X=x]
P[X ≤ x]

* Continuous random variable

1. Probability density function ( p.d.f. ):

• f(x)

2. Cumulative distribution function ( c.d.f. )

• F(x) = P[X ≤ x]

Ex] Starting salary is 40 ≤ X ≤ 60.

• f(x) =            for 40 ≤ x ≤ 60

• P[X<45] =

[Figure: density f(x) against x, with ticks at 40, 50, and 60]


A. Expected Value, E[X] = μ

• Definition

\[
E[X] =
\begin{cases}
\sum_{x} x \, p(x), & \text{discrete random variable} \\
\int_{-\infty}^{\infty} x \, f(x) \, dx, & \text{continuous random variable}
\end{cases}
\]

• Properties

Let a and b be constants.

1. E[a] = a

2. E[aX+b] = a E[X] + b

3. E[X1+X2+...+Xn] = E[X1] + E[X2] +...+ E[Xn]

B. Variance, Var[X] = σ²

• Definition

Var[X] = E[(X-μ)²] = E[X²] - μ²

• Properties

1. Var[a] = 0

2. Var[aX+b] = a² Var[X]

3. Var[X1+X2+...+Xn] = Var[X1] + Var[X2] +...+ Var[Xn], if they are independent.

4. Var[X±Y] = Var[X] + Var[Y] ± 2 Cov[X, Y]


Ex] Distribution of a random variable X

X 0 1

P[x] 0.5 0.5

E[X] =          E[X²] =          Var[X] =

• Distribution of a new random variable Y = 2X + 1

X       0     1
Y
P[y]    0.5   0.5

E[Y] =          E[Y²] =          Var[Y] =

[Figure: probability mass functions of X, X², Y, and Y², each with two bars of height 0.5]


Ex] Discrete Case: Suppose that a random variable X can take only the values 0, 2, and 4 and that the probabilities of these values are as follows:

X 0 2 4

P[x] 0.3 0.5 0.2

(a) Draw the probability mass function P[x]

[Figure: blank axes for P[x], with x ticks at 0, 2, and 4]

(b) Find the expected value of X.

• E[X] =

(c) Find the variance of X.

• E[X²] =

• Var[X] =

(d) Find the expected value and the variance of Y = 2X+1.

X       0     2     4
Y
P[Y]    0.3   0.5   0.2

• E[Y] =

• Var[Y] =
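The discrete case can be verified with a few lines of Python. The sketch applies the definitions E[X] = Σ x p(x) and Var[X] = E[X²] - μ² directly to the probabilities given in the example; the helper name `mean_var` is illustrative.

```python
from fractions import Fraction

def mean_var(pmf):
    """Return (E[X], Var[X]) for a pmf given as {value: probability}."""
    ex = sum(x * p for x, p in pmf.items())
    ex2 = sum(x * x * p for x, p in pmf.items())
    return ex, ex2 - ex ** 2

# Distribution of X from the example.
pmf_x = {0: Fraction(3, 10), 2: Fraction(5, 10), 4: Fraction(2, 10)}
ex, var_x = mean_var(pmf_x)

# Y = 2X + 1 keeps the same probabilities on transformed values.
pmf_y = {2 * x + 1: p for x, p in pmf_x.items()}
ey, var_y = mean_var(pmf_y)

print(float(ex), float(var_x), float(ey), float(var_y))
```

The results agree with the properties E[aX+b] = a E[X] + b and Var[aX+b] = a² Var[X].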


Ex] Continuous Case: Suppose the density of X is given by

\[
f(x) = \begin{cases} 2x, & \text{for } 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}
\]

(a) Sketch the probability density function f(x).

[Figure: blank axes with x ticks at 0, 0.5, and 1.0]

(b) Find the cumulative distribution function F(x).

• F(x) =

(c) Find the expected value of X.

• E[X] =

(d) Find the median and mode of X.

• Median =

• Mode =

(e) Find the variance of X.

• E[X²] =

• Var[X] =

(f) Find the expected value and the variance of Y = 3X+2.

• E[Y] =

• Var[Y] =
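One way to work through the continuous case is sketched below, applying the definitions of the c.d.f., expected value, and variance to the density f(x) = 2x on 0 ≤ x ≤ 1. The steps are a suggested derivation, not the handout's own solution.

```latex
\begin{align*}
F(x) &= \int_0^x 2t \, dt = x^2, \quad 0 \le x \le 1 \\
E[X] &= \int_0^1 x \cdot 2x \, dx = \tfrac{2}{3} \\
\text{Median: } & F(m) = m^2 = \tfrac{1}{2} \;\Rightarrow\; m = \tfrac{1}{\sqrt{2}} \approx 0.707 \\
\text{Mode: } & f(x) = 2x \text{ is largest at } x = 1 \\
E[X^2] &= \int_0^1 x^2 \cdot 2x \, dx = \tfrac{1}{2} \\
\mathrm{Var}[X] &= E[X^2] - (E[X])^2 = \tfrac{1}{2} - \tfrac{4}{9} = \tfrac{1}{18} \\
E[3X+2] &= 3 \cdot \tfrac{2}{3} + 2 = 4, \qquad
\mathrm{Var}[3X+2] = 9 \cdot \tfrac{1}{18} = \tfrac{1}{2}
\end{align*}
```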


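2.6. Standardization

Let \( Z = \dfrac{X - \mu}{\sigma} \), where E[X] = μ and Var[X] = σ² denote the mean and variance of the random variable X.

• Expected value of Z:

\[
E[Z] = E\left[\frac{X-\mu}{\sigma}\right] = \frac{1}{\sigma} E[X-\mu] = \underline{\hspace{2em}}
\]

• Variance of Z:

\[
\mathrm{Var}[Z] = \mathrm{Var}\left[\frac{X-\mu}{\sigma}\right] = \frac{1}{\sigma^2} \mathrm{Var}[X-\mu] = \underline{\hspace{2em}}
\]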

Ex] A. Distribution of a random variable X.

[Figure: probability mass function of X, with bars of height 0.5 at X = 1 and X = 5]

• E[X] =

• Var[X] =

B. Distribution of a random variable Z = (X-3)/2

[Figure: probability mass function of Z, with two bars of height 0.5]

• E[Z] =

• E[Z²] =

• Var[Z] =


Ex] Consider the following discrete random variable:

X -2 0 8

P[X] 0.2 0.5 0.3

(a) Find the expected value of X.

• E[X] =

(b) Find E[X²].

• E[X²] =

(c) Find the standard deviation of X.

• Var[X] = σ² =

• Std[X] = σ =

(d) Find the expected value and the variance of a new random variable, Z = (X-2)/4.

X       -2    0     8
Z
P[Z]    0.2   0.5   0.3

• E[Z] =

• Var[Z] =
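The standardization in part (d) can be checked numerically. The sketch below computes μ and σ for the given distribution and then standardizes; if μ and σ come out as 2 and 4, then Z = (X-2)/4 is exactly (X-μ)/σ. Variable names are illustrative.

```python
import math

values = [-2, 0, 8]
probs = [0.2, 0.5, 0.3]

# Mean, variance, and standard deviation of X from the definitions.
mu = sum(x * p for x, p in zip(values, probs))
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
sigma = math.sqrt(var)

# Standardize each value; the probabilities are unchanged.
z_values = [(x - mu) / sigma for x in values]
mean_z = sum(z * p for z, p in zip(z_values, probs))
var_z = sum((z - mean_z) ** 2 * p for z, p in zip(z_values, probs))

print(mu, sigma, z_values, mean_z, var_z)
```

A standardized variable always has mean 0 and variance 1, whatever the original distribution.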


2.7. Bivariate Random Variable

* Covariance

• Cov(X, Y) = σxy = E[(X-μx)(Y-μy)] = E[XY] - μx μy = E[XY] - E[X] E[Y]

• Cov(aX+b, cY+d) = a c Cov(X, Y)

• Cov(X, Y+Z) = Cov(X, Y) + Cov(X, Z)

• Var(X±Y) = Var(X) + Var(Y) ± 2 Cov(X, Y)

* Correlation coefficient

\[
\rho_{xy} = \frac{\mathrm{Cov}[X, Y]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}, \quad \text{where } -1 \le \rho_{xy} \le +1.
\]

• σxy = the covariance between X and Y = E[(X-μx)(Y-μy)] = E[XY] - μx μy

• σx = the standard deviation of X: \( \sigma_x = \sqrt{E(X-\mu_x)^2} = \sqrt{E[X^2]-\mu_x^2} \)

• σy = the standard deviation of Y: \( \sigma_y = \sqrt{E(Y-\mu_y)^2} = \sqrt{E[Y^2]-\mu_y^2} \)


Ex] Discrete Case: Final grades in Marketing and ISDS

                          ISDS (Y)
                          4: A2    3: B2    Total
Marketing (X)   4: A1     0.2      0.3      0.5
                3: B1     0.4      0.1      0.5
                Total     0.6      0.4      1.0

E[X] =                  E[Y] =

E[X²] =                 E[Y²] =

Var[X] = σx² =          Var[Y] = σy² =

E[XY] =

Cov[X, Y] = σxy =

ρxy =

# Prove that

\[
\rho_{xy} = \frac{0.2 \times 0.1 - 0.4 \times 0.3}{\sqrt{0.6 \times 0.4 \times 0.5 \times 0.5}} = -0.408
\]
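The correlation for the grades example can be computed from the joint table with the covariance and correlation definitions of this section. The Python sketch below encodes the four joint probabilities as read from the table (grade points 4 = A, 3 = B); the helper `expect` is illustrative.

```python
import math

# Joint distribution P[X = x, Y = y]; X = Marketing, Y = ISDS.
joint = {(4, 4): 0.2, (4, 3): 0.3, (3, 4): 0.4, (3, 3): 0.1}

def expect(g):
    """E[g(X, Y)] under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex ** 2
var_y = expect(lambda x, y: y * y) - ey ** 2
cov = expect(lambda x, y: x * y) - ex * ey
rho = cov / math.sqrt(var_x * var_y)

print(round(cov, 3), round(rho, 3))
```

The covariance is negative: students in this table who earn an A in one course are more likely to earn a B in the other.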


2.8. Probability Distributions

I. Discrete Distributions, P[x], x=0, 1, 2,...,

• Binomial distribution ( n, p )

• Hypergeometric distribution

• Poisson distribution ( λ )

• Negative binomial distribution

• Geometric distribution

II. Continuous Distribution, f(x), -∞ < x < +∞

• Uniform distribution ( a, b )

• Normal ( z ) distribution ( μ, σ² )

• Exponential distribution ( λ )

• t, χ² and F distributions

III. Multivariate Distribution f(x, y)

• Multinomial distribution

• Bivariate normal distribution

H. Moskowitz and , "Two-dimensional Free-replacement Warranties," Product Warranty Handbook, eds. by Murthy and Blischke, Marcel Dekker, 1996, 341-363

* Classifications:

• Probability Density (or mass) Function ( p.d.f. ):

P[x] or f(x)

• Cumulative Distribution Function ( c.d.f. ):

F(x)


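A. Univariate Normal Distribution, N(μ, σ²)

I. Probability Density Function ( pdf ): Bell-curve

\[
f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \quad -\infty < x < +\infty
\]

• E[X] = μ

• Var[X] = σ²

[Figure: bell-shaped density curve, horizontal axis from -3.0 to 3.0, vertical axis from 0.00 to 0.40]

II. Cumulative Distribution Function ( cdf )

\[
F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2\right] dy, \quad -\infty < x < +\infty
\]

[Figure: S-shaped cumulative distribution curve, horizontal axis from -3.0 to 3.0, vertical axis from 0.00 to 1.00]

Ex] If the starting salary is X ~ N(μ=50, σ=10), find P[X<40].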


* How to find F(x) for a given x, μ and σ?

• Method 1. Integral

\[
F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2\right] dy
\]

• Method 2. Tables

- Tables for each μ, σ, and x. How many tables?

• Method 3. Single table

- Have a single table for the standard normal distribution with μ = 0 and σ = 1.

- Transform all others into the standard normal z.

- Look up the z table in the book.

• Method 4. Use the Excel function:

- Microsoft Excel =NORMDIST(x, μ, σ, true)

* Standardization of a Normal Random Variable X

If X ~ N(μ, σ²), then \( Z = \dfrac{X - \mu}{\sigma} \) ~ N(0,1)

- Standard normal distribution (or z distribution)

• E[Z] = 0

• Var[Z] = 1

- Microsoft Excel =STANDARDIZE(x, μ, σ)


Ex] Suppose that the duration of a flight between New Orleans and New York is a normal random variable X with mean 3.6 hours and standard deviation 0.2 hour. Find the following probabilities:

(a) P[X ≤ 4]

(b) P[X ≤ 3.41]

(c) P[3.41 ≤ X ≤ 4.0]

[Figure: density of X over 2.0 to 4.4 hours, and the standard normal density over -4.0 to 4.0]
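As an alternative to the z table or the Excel function, the flight probabilities can be evaluated with Python's standard library. This is a sketch using `statistics.NormalDist` (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

flight = NormalDist(mu=3.6, sigma=0.2)   # flight duration in hours
standard = NormalDist()                  # standard normal, N(0, 1)

# Standardize first, as with the single z table.
z_a = (4.0 - 3.6) / 0.2    # z for 4 hours
z_b = (3.41 - 3.6) / 0.2   # z for 3.41 hours

p_a = standard.cdf(z_a)    # (a) P[X <= 4]
p_b = standard.cdf(z_b)    # (b) P[X <= 3.41]
p_c = p_a - p_b            # (c) P[3.41 <= X <= 4.0]

# The same answers come directly from the unstandardized distribution.
assert abs(flight.cdf(4.0) - p_a) < 1e-12

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

Part (c) is the difference of two cumulative probabilities, which is why only F(x) needs to be tabulated.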


* Properties

- If X has a normal distribution with mean μ and variance σ² and if Y = aX ± b, where a and b are given constants and a ≠ 0, then Y has a normal distribution with mean aμ ± b and variance a²σ².

- If Xi, i=1,...,n, are independent and Xi has a normal distribution with mean μi and variance σi², then the sum X1+X2+...+Xn has a normal distribution with mean μ1+μ2+...+μn and variance σ1²+σ2²+...+σn².

- If Xi, i=1,...,n, are i.i.d. random variables from a normal distribution with mean μ and variance σ², then the sample mean X̄ has a normal distribution with mean μ and variance σ²/n.

* The Central Limit Theorem

"When you are listening to corn pop, are you hearing the Central Limit Theorem?"

William A. Massey

Let X1, X2,…, Xi,… be a sequence of independent random variables with E[Xi] = μ and Var[Xi] = σ².

Then \( \dfrac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \) has a limiting distribution that is normal with mean zero and variance 1.

[Figure: popcorn popping against time X, centered at μ = 2 min. 15 sec.]


B. Bivariate Normal Distribution

• Probability density function

\[
f(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho_{12}^2}}
\exp\left\{ -\frac{1}{2(1-\rho_{12}^2)} \left[ \left(\frac{x_1-\mu_1}{\sigma_1}\right)^2
- 2\rho_{12} \left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right)
+ \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 \right] \right\}
\]

E[Xi] = μi,  Var[Xi] = σi²,  Corr[X1, X2] = ρ12

• Marginal distributions: Normal(μi, σi²)

• Conditional distribution:

• Surface plot of a bivariate normal density

[Figure: surface plot of a bivariate normal density]


The Wizard of Odds

From Nico M. van Dijk, “To wait or not to wait: that is the question.” Chance Magazine, Vol. 10, Winter 1997 p. 38.

The famous bus paradox is explained as simply the “curse of variation.”

! Case 1: Suppose that buses arrive exactly every 20 minutes at your corner (i.e., the times between arrivals are 20, 20, and 20 in an hour).

(a) What is the average time between arrivals?

(b) If you arrive at a random time, what is the average waiting time?

! Case 2: The slightest variation in the arrival times, keeping the same average of 20 minutes between buses, can increase your average waiting time significantly. For example, suppose that the times between arrivals are 2, 2, and 56 in an hour.

(a) What is the average time between arrivals?

(b) If you come at a random time in this hour, what is the average waiting time?
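Both cases of the bus paradox can be computed with the same short Python sketch. It assumes that your arrival time is uniformly distributed over the hour, so you land in a gap with probability proportional to its length and then wait half that gap on average. The function name is illustrative.

```python
from fractions import Fraction

def average_wait(gaps):
    """Average waiting time for a rider arriving uniformly at random.

    gaps: times between consecutive buses (minutes) covering the period.
    """
    total = sum(gaps)
    # P[land in a gap] = gap / total; expected wait inside it = gap / 2.
    return sum(Fraction(g, total) * Fraction(g, 2) for g in gaps)

case1 = average_wait([20, 20, 20])
case2 = average_wait([2, 2, 56])
print(float(case1), float(case2))
```

The average time between arrivals is 20 minutes in both cases, yet the average wait differs because long gaps are more likely to be the one you arrive in.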


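Appendix: Review of Basic Calculus

In the following formulas, f and g represent functions of x, while a and n represent fixed real numbers.

A. Derivatives

1. \( \dfrac{d}{dx} a = 0 \)

2. \( \dfrac{d}{dx} x^n = n x^{n-1} \)

3. \( \dfrac{d}{dx} \ln f(x) = \dfrac{1}{f(x)} \dfrac{d}{dx} f(x) \)

4. \( \dfrac{d}{dx} e^{f(x)} = e^{f(x)} \dfrac{d}{dx} f(x) \)

5. \( \dfrac{d}{dx} [f(x)]^n = n [f(x)]^{n-1} \dfrac{d}{dx} f(x) \)

6. \( \dfrac{d}{dx} [f(x) g(x)] = f(x) \dfrac{d}{dx} g(x) + g(x) \dfrac{d}{dx} f(x) \)

B. Integrals

7. \( \int a \, dx = a x \)

8. \( \int x^n \, dx = \dfrac{1}{n+1} x^{n+1} \), except n = -1

9. \( \int x^{-1} \, dx = \ln x \)

10. \( \int [f(x) + g(x)] \, dx = \int f(x) \, dx + \int g(x) \, dx \)

11. \( \int e^{ax} \, dx = \dfrac{1}{a} e^{ax} \)

12. \( \int a f(x) \, dx = a \int f(x) \, dx \)

13. \( \int_a^b f(x) \, dx = F(x) \Big|_a^b = F(b) - F(a) \), where \( F(x) = \int f(x) \, dx \)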

Session 3. Review of Basic Tools in Data Mining II

3.1. Linear Algebra

* Vectors

• Definition: A vector x of order (m×1) is an ordered set of real numbers (x1, x2, …, xm), and is denoted by a boldfaced, lower-case letter.

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}
\]

The scalars xi are referred to as components or elements of the vector. This particular representation is called a column vector and consists of m rows and 1 column. A vector of order (1×m) is called a row vector and is written as follows:

x' = [x1, x2, …, xm]

where the prime symbol (') denotes the transpose (i.e., the operation that writes a column as a row and vice versa).

• The length of a vector x of m elements emanating from the origin is given by the Pythagorean formula:

\[
L_x = \lVert \mathbf{x} \rVert = \sqrt{x_1^2 + x_2^2 + \cdots + x_m^2}
\]

• Scalar multiplication: Scaling

Let c be an arbitrary scalar. Then the product cx is a vector with ith entry cxi.

If c = 1/Lx, the vector is normalized; i.e., its length is 1.

• Vector addition

The sum of two vectors x and y, each having the same number of entries, is that vector

z = x + y with ith entry zi = xi + yi.

• Vector multiplication: Projection

The inner (or dot) product of two vectors x and y with the same number of entries is defined as the sum of component products

x'y = x1y1 + x2y2 + … + xmym.

Ex] Let x' = [5, 1] and y' = [1, 4].

(a) x + y =

(b) Lx =

(c) x − y =

(d) Distance between x and y: ||x − y|| =

Ex] Cluster Analysis: Suppose that five students in the class possess the following characteristics:

         Variables
Items    Height   Weight   Gender   Eye color   Hair color
Amber    64       135      F        green       blond
Brad     67       120      M        brown       brown
Chad     72       200      M        blue        blond
Doug     76       210      M        brown       brown
Emily    78       160      F        brown       brown

Using only the heights and weights in the table, calculate the distances between pairs of students.

• Euclidean distances

         Amber    Brad     Chad     Doug     Emily
Amber    0        15.30    65.49    75.95    28.65
Brad     15.30    0        80.16    90.45    41.48
Chad     65.49    80.16    0        10.77    40.45
Doug     75.95    90.45    10.77    0        50.04
Emily    28.65    41.48    40.45    50.04    0

[Figure: scatter plot of the five students, labeled A to E, with height (60 to 80) on the horizontal axis and weight (100 to 220) on the vertical axis]
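The pairwise Euclidean distances in the cluster analysis example can be reproduced with a short Python sketch. The heights and weights below are the values read from the student table; `math.dist` requires Python 3.8 or later, and the dictionary layout is an illustrative choice.

```python
import math
from itertools import combinations

# (height, weight) for each student, as read from the table.
students = {
    "Amber": (64, 135),
    "Brad": (67, 120),
    "Chad": (72, 200),
    "Doug": (76, 210),
    "Emily": (78, 160),
}

# Euclidean distance: square root of the summed squared differences.
distances = {
    (a, b): math.dist(students[a], students[b])
    for a, b in combinations(students, 2)
}

for (a, b), d in distances.items():
    print(f"{a:5s} {b:5s} {d:6.2f}")

closest = min(distances, key=distances.get)
print("closest pair:", closest)
```

Because weight spans a much wider numeric range than height, it dominates these distances; rescaling the variables before clustering is a common remedy.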

* Matrices

• Definition

A matrix A of order m×n, generally denoted by a boldface upper-case letter, is a rectangular array of elements arranged into m rows and n columns:

\[
\mathbf{A}_{(m \times n)} = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\]

- If m = n, then the matrix is called a square matrix.

- If aij = aji for all elements of a square matrix, then the matrix is called a symmetric matrix.

- If aij = 0 for all off-diagonal elements of a square matrix, then the matrix is called a diagonal matrix.

- If aij = 1 for all on-diagonal elements of a diagonal matrix, then the matrix is called an identity matrix and is usually denoted I.

• Matrix multiplication

The product AB of an m×n matrix A = [aij] and an n×k matrix B = [bjk] is the m×k matrix C whose element cij is given by

\[
c_{ij} = \sum_{s=1}^{n} a_{is} b_{sj}, \quad i = 1, 2, \ldots, m \text{ and } j = 1, 2, \ldots, k.
\]

• Determinant of a square matrix

The determinant of the square m×m matrix A = [aij], denoted by |A|, is the scalar

\[
|\mathbf{A}| = \sum_{j=1}^{m} a_{1j} \, |\mathbf{A}_{1j}| \, (-1)^{1+j},
\]

where A1j is the (m−1)×(m−1) matrix obtained by deleting the first row and the jth column of A.

• Inverse of a square matrix

Let I be the m×m identity matrix. The square m×m matrix B such that AB = BA = I is called the inverse of the square m×m matrix A and is denoted by A⁻¹.

• Trace of a square matrix

Let A = [aij] be an m×m square matrix. The trace of the matrix A, written tr(A), is the sum of the diagonal elements; i.e.,

\[
\mathrm{tr}(\mathbf{A}) = \sum_{i=1}^{m} a_{ii}
\]

Ex] Let \( \mathbf{A}_{(2 \times 2)} = \begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix} \) and \( \mathbf{B}_{(2 \times 2)} = \begin{bmatrix} 2 & -1 \\ -5 & 3 \end{bmatrix} \).

(a) |A| =

(b) tr(A) =

(c) AB =

* Matrices and Systems of Linear Equations

• Consider the system of linear equations given by

a11 x1 + a12 x2 + … + a1n xn = b1
a21 x1 + a22 x2 + … + a2n xn = b2
…
am1 x1 + am2 x2 + … + amn xn = bm

• Using matrices can greatly simplify the statement and solution of a system of linear equations.

\[
\mathbf{A} = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad \text{and} \quad
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}
\]

Then, it may be written as Ax = b. Thus, when A is square and invertible, x = A⁻¹ b.

• A solution to a linear system of m equations in n unknown variables is a set of values for the unknown variables that satisfies each of the system's m equations.

Ex] Find a solution to the linear system. (Use MINVERSE and MMULT in Excel)

[System of two linear equations in x1 and x2, followed by its matrices A, x, and b; the coefficients are not legible in the source]

x = A⁻¹ b =

* Matrix Operations with Microsoft Excel

• Adding or Subtracting Matrices with Excel

Excel has no dedicated function for adding or subtracting matrices. To add or subtract matrices, both matrices must have the same number of rows and columns. In the cell that will be the first row and first column of the matrix that will contain the sum, enter a formula to add the cells for the first row and first column of the matrices to be added. Once this formula is entered, click and drag the cell across for the number of columns. Next click and drag that row down for the number of rows. This will yield the sum of two matrices.

• Finding the Transpose of a Matrix with Excel

The transpose of a matrix has the rows and columns reversed. If the matrix A has m rows and n columns then the transpose of A has n rows and m columns. First, highlight the original matrix, then select copy. Go to the cell where you want the first row and the first column of the transpose matrix. Select Edit, then select Paste Special. Click on the box beside Transpose that is found in the lower right corner of the Paste Special menu. Next click on OK or hit the Enter key. The transpose of the selected matrix will appear.

• Multiplying Matrices with Excel

To multiply matrices, the first step is to make sure that the dimensions of the matrices are such that the matrices can be multiplied. The number of columns of the first matrix must equal the number of rows of the second matrix. The product matrix will have the same number of rows as the first matrix and the same number of columns as the second matrix. In Excel, highlight an area for the product matrix with the appropriate number of rows and columns. Select the function MMULT. Enter the cells for the first matrix in Array 1 and the cells for the second matrix in Array 2. Rather than merely hitting Enter or clicking on the OK button, you must hold down both the control [Ctrl] and [Shift] keys while hitting Enter or clicking on the OK button. The product matrix will appear in the highlighted area if everything has been done correctly.

• Finding a Determinant with Excel

Select a cell where you want to put the determinant then click on the Paste Function button. Now select MDETERM from the functions under the menu for Math & Trig. The determinant is a single value and not an array of values. In the window beside the word ARRAY specify the array of values for the square matrix for which you want the determinant. Click OK and the value of the determinant will appear in the specified cell. If this value is zero then you cannot find an inverse of the matrix and the system of equations associated with the coefficients for the matrix has no unique solution. If the determinant is not zero then an inverse can be found and there is a unique solution.

• Finding an Inverse of a Square Matrix with Excel

If the inverse of a square matrix (#rows = #columns) exists then the determinant of the matrix must not equal 0. To find the inverse, first highlight the area corresponding with the order of the matrix. Click on the Paste Function button and select MINVERSE from the functions under the menu for Math & Trig. In the window beside the word ARRAY specify the array of values for the square matrix for which you want the inverse. Rather than merely hitting Enter or clicking on the OK button, you must hold down both the control [Ctrl] and [Shift] keys while hitting Enter or clicking on the OK button. The inverse matrix will appear in the highlighted area.

3.2. Optimization

* Optimization of a Function of a Single Variable

• A necessary condition for a particular solution x = x0 to be either a minimum or a maximum is that

\[
\frac{d f(x)}{dx} = 0 \quad \text{at } x = x_0.
\]

• There are five possible solutions satisfying these conditions.

[Figure: curve f(x) against x marking a global minimum, a global maximum, an inflection point, a local maximum, and a local minimum]

• To obtain more information about these five critical points, it is necessary to examine the second derivative. Thus, if

\[
\frac{d^2 f(x)}{dx^2} > 0 \quad \text{at } x = x_0,
\]

then x0 must be at least a local minimum. If f(x) is a convex function, it is a global minimum. Similarly, if

\[
\frac{d^2 f(x)}{dx^2} < 0 \quad \text{at } x = x_0,
\]

then x0 must be at least a local maximum. If f(x) is a concave function, it is a global maximum.
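The first- and second-derivative conditions can be tried out numerically. The Python sketch below approximates both derivatives with central differences and classifies a candidate point; the step size, tolerance, function name, and test functions are all illustrative assumptions rather than part of the handout.

```python
def classify_critical_point(f, x0, h=1e-4, tol=1e-6):
    """Classify x0 using numerical first and second derivatives of f."""
    d1 = (f(x0 + h) - f(x0 - h)) / (2 * h)            # approximates f'(x0)
    d2 = (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h**2   # approximates f''(x0)
    if abs(d1) > tol:
        return "not a critical point"   # necessary condition fails
    if d2 > tol:
        return "local minimum"
    if d2 < -tol:
        return "local maximum"
    return "inconclusive"               # e.g. an inflection point

# Illustrative functions chosen for this sketch.
print(classify_critical_point(lambda x: x**2 - 4*x + 5, 2.0))
print(classify_critical_point(lambda x: -(x**2), 0.0))
print(classify_critical_point(lambda x: x**3, 0.0))
```

A zero second derivative leaves the test inconclusive, which is the situation at an inflection point such as x = 0 for x³.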

* Optimization with Microsoft Excel

To use Solver, first make sure that it is installed. To do so, select Tools from the main menu. If the option Solver appears, then Solver is already installed and you are ready to proceed. If the option Solver does not appear, then you must install Solver.

To install Solver, select Add-Ins from the Tools menu, then scroll through the Add-Ins dialog box until you find the Solver option. Click on this option (a check-mark should appear in the corresponding box) and then click on OK. After a brief period the installation will be complete. (You may be asked to insert the original CD-ROM.)

You should now be able to use the Solver by clicking on the Tools heading on the menu bar and selecting the Solver… item. The Solver Parameters dialog box will appear.

The Set Target Cell box should contain the cell location of the objective function for the problem under consideration. Max or Min may be selected for finding the maximum or minimum of the set target cell. If Value of is selected, the Solver will attempt to find a value of the Target Cell equal to whatever value is placed in the box just to the right of this selection. The By Changing Cells box should contain the location of the decision variables for the problem.

Finally, the constraints must be specified in the Subject to the Constraints box by clicking on Add. Change allows you to modify a constraint already entered and Delete allows you to delete a previously entered constraint. Reset All clears the current problem and resets all parameters to their default values. Options invokes the Solver options dialog box. We really shouldn't ever have to worry about the Solver Options dialog box. As you can see, a series of default choices are included that direct Solver's search for the optimum solution and for how long it will search. The Guess selection is not particularly useful for our purposes and will not be discussed here.

When the Add button is clicked, the Add Constraint dialog box appears. Clicking on the Cell Reference box allows you to specify a cell location (usually a cell with a formula). The Constraint type may be set by selecting the down arrow (<=, >=, =, int, where int refers to integer, or bin, where bin refers to binary). The Constraint box may contain a formula of cells, a simple cell reference, or a numerical value. The Add button adds the currently specified constraint to the existing model and returns to the Add Constraint dialog box. The OK button adds the current constraint to the model and returns you to the Solver dialog box.

When you run Excel's Solver, it executes a series of "add-in" files and routines. Upon completion of the various programs, Excel presents the user with the Solver Results dialog box.

Ex] Minimize f(x) = x² − 4x + 5

Ex] Maximize f(x) = […]

Ex] Dear Marilyn: "I solved this problem a few years ago by the 'trial and error' method, and since then, I've been trying to solve it mathematically but have had no success. Can you help me? I want to take 100 people to the circus, and I have $95.5 to spend. Admission for men is $10, for women $2, and for children, $.50. How many of each can I take?"
- Fred M. Bullock, Cerro Gordo, N.C.

• Variables (non-negative, integer value)

• Constraints

3.3. Entropy

* Definition

The entropy H(X) of a discrete random variable X is

\[
H(X) = -\sum_{j=1}^{c} p_j \log p_j
\]

The log is to the base 2 and entropy is expressed in bits. Then, H(X) is the lower bound on the number of binary questions required to determine the value of X.

H(X) ≤ Expected number of Questions (EQ) < H(X) + 1

- If the base of the logarithm is b, we will denote the entropy as Hb(X).

Hb(X) = (log_b a) Ha(X) = Ha(X) / (log_a b)

Thus, H(X) = He(X) / ln 2 or H(X) = H10(X) / log10 2

Ex] Toss a fair coin once.

He(X) =

H(X) =

It takes only one binary question to find the result: EQ = 1

Is it a head?   If the answer is yes, then it is a head.
                If the answer is no, then it is a tail.
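The entropy definition and the change-of-base rule can be tried out with a short Python sketch. The function name `entropy` and its `base` parameter are illustrative; outcomes with zero probability are skipped, following the usual convention that 0 log 0 = 0.

```python
import math

def entropy(probs, base=2):
    """H(X) = -sum p_j log p_j, in bits by default."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
h_bits = entropy(fair_coin)               # H(X), base 2
h_nats = entropy(fair_coin, base=math.e)  # He(X), natural log

# Change of base: H(X) = He(X) / ln 2.
print(h_bits, h_nats, h_nats / math.log(2))
```

For the fair coin the entropy in bits equals the single yes/no question needed to learn the outcome.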

Ex] Where are you from?

X       Louisiana   Mississippi   Texas   Others
P(X)    4/8         2/8           1/8     1/8

• Entropy   He(X) =          and     H(X) =

• EQ =

• The sequence of questions matters in data mining?

  H(X) = 1.75 bits, but EQ =

• If it is a multiple choice question with 4 answers

  He(X) =

  H4(X) =

[Figure: two yes/no question trees. One asks "Texas?", then "Mississippi?", then "Louisiana?"; the other asks "Louisiana?", then "Mississippi?", then "Texas?". Each leaf is labeled with the number of questions needed (1, 2, or 3).]

Ex] What was your grade in ISDS 2001?

X       A     B     C     D
P(X)    1/4   1/4   1/4   1/4

He(X) =          and     H(X) =

A. One possible sequence: EQ =

[Figure: yes/no question tree asking "A?", then "B?", then "C?"; leaves labeled with the number of questions needed (A: 1, B: 2, C or D: 3)]

B. Optimal sequence: EQ =

[Figure: yes/no question tree asking "A or B?" first, then "A?" or "C?"]

• Not a binary question, but check one of the four boxes.

He(X) =          and

H4(X) =

* Huffman Codes

Find the most efficient series of yes-no questions to determine an object from a class of objects.

Ex 1]

LA       4/8   4/8   4/8   8/8
MS       2/8   2/8   4/8
TX       1/8   2/8
Others   1/8

• He(X) =          and     H(X) =

• EQ =

Ex 2]

A   1/4   2/4   2/4   4/4
B   1/4   1/4   2/4
C   1/4   1/4
D   1/4

• He(X) =          and     H(X) =

• EQ =

Ex 3]

A   0.25
B   0.25
C   0.20
D   0.15
E   0.15

• He(X) =          and     H(X) =

• EQ =
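The merging procedure behind Huffman codes can be sketched in Python with a priority queue: repeatedly combine the two least likely groups, and note that each merge adds one yes/no question for every object in the merged group. The function name is illustrative, and the probability lists are the ones tabulated in the three examples.

```python
import heapq
from fractions import Fraction

def huffman_expected_questions(probs):
    """Expected number of yes/no questions (EQ) under a Huffman code."""
    heap = list(probs)
    heapq.heapify(heap)
    expected = 0
    while len(heap) > 1:
        # Merge the two least likely groups into one.
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        expected += merged   # one more question for everything in the group
        heapq.heappush(heap, merged)
    return expected

eighths = [Fraction(4, 8), Fraction(2, 8), Fraction(1, 8), Fraction(1, 8)]
quarters = [Fraction(1, 4)] * 4
five = [Fraction(n, 100) for n in (25, 25, 20, 15, 15)]

for probs in (eighths, quarters, five):
    print(float(huffman_expected_questions(probs)))
```

When every probability is a power of 1/2 the expected number of questions equals the entropy in bits; otherwise it lies between H(X) and H(X) + 1.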

3.4. Entropy of Two Variables

• The joint entropy H(X,Y) of a pair of discrete random variables X and Y with a joint distribution p(x, y) is

\[
H(X, Y) = -\sum_{x} \sum_{y} p(x, y) \log p(x, y)
\]

• The conditional (mean) entropy H(X|Y) is

\[
H(X \mid Y) = -\sum_{y} \left[ \sum_{x} p(x \mid y) \log p(x \mid y) \right] p(y), \quad \text{or}
\]

\[
H(X \mid Y) = -\sum_{y} \sum_{x} p(x, y) \log \frac{p(x, y)}{p(y)}
\]

• The mutual information (cross entropy) I(X;Y) is

I(X; Y) = H(X) − H(X|Y)

which is the reduction in the uncertainty of X due to the knowledge of Y.

• Relationships

- H(X|Y) = H(X, Y) − H(Y)

- H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

- I(Y; X) = H(Y) − H(Y|X)

- I(X; Y) = I(Y; X)

- I(X; Y) = H(X) + H(Y) − H(X, Y)

!

© Young H.

#"$!'9FB1*-#5*:43-03=DpQ}!#15#9#36}!0$*#9*#=1:*4#3!

!!~043*!

!

!!_03=!

Hh

!

!)+D9(!

0!

EAH

Chun

'(0**1B1<0

*0B9(!*+#3!=49#:9#D-*!:!-0B#$$#-*4,#3*<!!U0/$!*+#!+#:(!*+#3(!/3*!+:9!*+#

*!150G:G

Test!

=4*403:;!

!

H094*4,# xh#@:*4,# x

*+#!#3*5

8c]<Y]](!]

He&0>8]He&0Cx>E8]<

M&0T!x!>8]

0+1:/!>-

#!15405!1#!49!Y]}B15#+#39#!43!=#*#-/#,#5(!*+#;*+F!1#59/4*+!150G#!=49#:9#

G4;4*4#9(!H

H094*4,#!

h#@:*4,#

!

150G:G4;

N

'!x6!

01F!@:43

)#]<S]]d!

<Y\J!<'Q]!

]<Q6J!

-,,I!LD11150G:G4;4}<!!)+#!=94,#!G;00-*43@!*+##!*#9*!:;99039!*#9*G:G4;4*F!]#<>!

HAxi(!0jE!

N

x'!

#!x6!

;4*4#9(!HA

N45D910'!

]<]\'!

3!$50B!*+

#9*!

HAx'E8!]<

HAx6E8!]<

109#!*+:*F!*+:*!*+=0-*05!/:0=!*#9*<!!#!=49#:9#90!F4#;=9*#=!&4<#<(!]<]6(!*+#

Di

N45D9!0'!!

]<]J]!

]<Y]]!

A0j!C!xiE!

h0!

]

+#!G;00=

H

h

Q\`

<S66

:*(!G:9#=!+#!1:*4#3:3*9!*0!$D)+#!G;00#9!/+#3!49!t$:;9#!14$!:!+#:#!*#9*!5#9

isease

h0!,45D

]<Jp

]<S]

,45D9106!

]<p6p!

=!*#9*!49!

H094*4,#!x'!

h#@:*4,#!x6

03!*+#!13*!+:9!:!$D5*+#5!0=!*#9*!494*!49(!43!$1094*4,#u;*+F!1#59D;*!/4;;!4

D9!06!

p6!

]]!

HA

]<Q]<S

!

0 8c]<p`Y

He&0Cx'>!8

0!8c]<]\

He&0Cx6>!8

1:*4#3*g9

9!$:-*(!u!5#9D;*!$903!49!4B1;F!*+

!

]<S66!

'<]]]!

AxiE!

Q\`!S66!

Y(!]<]'Sd

8]<]\JY!

\'(!]<p6pd

8]<6QY6!

JK'Y!

9!

$05!

+#!

!

!

© Young H.

!!X3*5!!

! H!!

! H!!

! H

!!!!_03=!!

! H!!

! H

!!

!!^D*D!!

! I!!

! I!!

! I

!!!HI!JI!'

Chun

01F!

He&0>!!8!!

He&X>!!8!

He&X(!0>!

=4*403:;!

He&XC!0>!8

He&0CX>!8

D:;!43$05

Ie&XT!0>!8

Ie&XT!0>!8

Ie&XT!0>!8

'=)/!&6]]^:b4B

Q

!

!!

!8!!!

#3*501F!

8!He&X(1

8!He&X(10

5B:*403!

8!He&0>!K

8!He&X>!K

8!He&X>!O

]Q>(!!�L#54:BDB!;4f#;4

Quality Eng

0>!K!He&0

0>!K!He&X

&M3$05B:

K!He&0CX>

K!He&XC0>

O!He&0>!K

:;!M391#-*404+00=!:3=!^gineering(!

0>!!8!!!!

X>!!8!!!

:*403!@:4

>!!8!!!

>!!8!!!

K!He&X,!0

03!H;:3!43!^:b4BDBN0;<!'\(!h

43>(!I&X(1

0>!!8!!!

*+#!H5#9#3B!X3*501F!%h0<!S(!11<!Y

10>!

3-#!0$!M391%1150:-+#Y6\�YJ6<!

#-*403!X5509(�!

JK'\!

059I!

!

* Quantity of Information

The quantity of information about a random variable X that can be obtained by observing another random variable Y:

    I(X;Y) = H(X) - H(X|Y)
           = H(Y) - H(Y|X)
           = H(X) + H(Y) - H(X, Y)
           = \sum_x \sum_y p(x, y) \log \frac{p(x, y)}{p(x) p(y)}

1. If, after receiving a message about Y, the value of X is completely defined, then
       H(X|Y) = 0 and I(X;Y) = H(X).

2. If X and Y are independent, then
       H(X|Y) = H(X) and I(X;Y) = 0.

[Venn diagram: H(X), H(Y), H(X|Y), I(X;Y), H(Y|X), H(X, Y)]

!

Ex] Two variables, X and Y

Joint probabilities

             Y = 1    Y = 0    Total
    X = 1    0.25     0.25     0.50
    X = 0    0.10     0.40     0.50
    Total    0.35     0.65     1.0

Entropy
    H_e(X) =          and   H(X) =
    H_e(Y) =          and   H(Y) =
    H_e(X, Y) =       and   H(X, Y) =

Conditional entropy
    H(X|Y) = H(X, Y) - H(Y) =
    H(Y|X) = H(X, Y) - H(X) =

Mutual information, I(X;Y)
    I(X;Y) = H(Y) - H(Y|X) =
    I(X;Y) = H(X) - H(X|Y) =
    I(X;Y) = H(X) + H(Y) - H(X, Y) =

Venn diagram
    H(X) = 1.000    H(Y) = 0.934    H(X, Y) = 1.861
    H(X|Y) = 0.927  I(X;Y) = 0.073  H(Y|X) = 0.861
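The quantities in this example can be computed directly from the joint probability table. The following is a minimal sketch in bits; the variable names are illustrative.

```python
import math

# Joint probabilities p(x, y) from the example; keys are (x, y)
joint = {(1, 1): 0.25, (1, 0): 0.25, (0, 1): 0.10, (0, 0): 0.40}

def H(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Marginal distributions obtained by summing the joint table
px = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
py = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

h_x, h_y, h_xy = H(px.values()), H(py.values()), H(joint.values())
h_x_given_y = h_xy - h_y          # H(X|Y) = H(X, Y) - H(Y)
h_y_given_x = h_xy - h_x          # H(Y|X) = H(X, Y) - H(X)
mutual_info = h_x + h_y - h_xy    # I(X;Y) = H(X) + H(Y) - H(X, Y)

print(round(h_x, 3), round(h_y, 3), round(h_xy, 3))
print(round(h_x_given_y, 3), round(h_y_given_x, 3), round(mutual_info, 3))
```

The three expressions for I(X;Y) listed in the example give the same value, so computing any one of them is enough as a check.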

Ex] Suppose that during the 1985 season, LSU played first 10 football games as follows. Did we win the eighth game?

                   When? Y2
    Where? Y1      Day       Night     Total
    Home           2W 0L     1W 0L     3W 0L
    Away           0W 3L     3W 1L     3W 4L
    Total          2W 3L     4W 1L     6W 4L

A. Case 1. No questions allowed.

   The probability distribution is {6/10, 4/10} and its entropy is H_e(X) =
   The reasonable answer is "Won". In that case, the error rate = 40%.

B. Case 2. Only one question allowed!

 - Option 1. Ask where? Y1
     If the response is Home, what is your answer?
     If the response is Away, what is your answer?

 - Option 2. Ask when? Y2
     If the response is Day, what is your answer?
     If the response is Night, what is your answer?

                 Where? Y1           When? Y2
    Result X     Home    Away        Day    Night      Total
    Win          3       3           2      4          6
    Lose         0       4           3      1          4
    Total        3       7           5      5          10

    Entropy      H_e(Y1) =           H_e(Y2) =

Which option is more effective?

 - Option 1. Ask where? Y1

   The conditional (mean) entropy H(X|Y1)

       H_e(X|Y1) =

   Mutual information

       I(X;Y1) = H_e(X) - H_e(X|Y1)
               =

   Joint probability distribution

                    Where? Y1
       Result X     Home    Away    Total
       Win          0.3     0.3     0.6
       Lose         0.0     0.4     0.4
       Total        0.3     0.7     1.0

   The joint entropy H_e(X, Y1) =

   Mutual information, I(X;Y1)

       I(X;Y1) = H_e(X) + H_e(Y1) - H_e(X, Y1)
               =

   Error rate =

   [Tree diagram: {6/10, 4/10} splits into Home (3/10): {3/3, 0/3} and Away (7/10): {3/7, 4/7}]

 - Option 2. Ask when? Y2

   The conditional (mean) entropy H(X|Y2)

       H_e(X|Y2) =

   Mutual information

       I(X;Y2) = H_e(X) - H_e(X|Y2)
               =

   Joint probability distribution

                    When? Y2
       Result X     Day     Night   Total
       Win          0.2     0.4     0.6
       Lose         0.3     0.1     0.4
       Total        0.5     0.5     1.0

   The joint entropy H_e(X, Y2) =

   Mutual information, I(X;Y2)

       I(X;Y2) = H_e(X) + H_e(Y2) - H_e(X, Y2)
               =

   Error rate =

   [Tree diagram: {6/10, 4/10} splits into Day (5/10): {2/5, 3/5} and Night (5/10): {4/5, 1/5}]
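The two single-question options can be compared numerically. The sketch below works in nats and uses the joint probabilities from the two tables; the function `analyze` and the decision rule (guess the more likely result for each answer) are illustrative assumptions rather than the notes' own procedure.

```python
import math

def H_nats(probs):
    """Entropy in nats (natural logarithm)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def analyze(joint):
    """joint[answer] = (P(win, answer), P(lose, answer)).

    Returns (mutual information in nats, error rate of the best guess).
    """
    p_win = sum(w for w, _ in joint.values())
    p_lose = sum(l for _, l in joint.values())
    h_x = H_nats([p_win, p_lose])
    # Conditional entropy: weight the entropy within each answer by P(answer)
    h_x_given_y = sum((w + l) * H_nats([w / (w + l), l / (w + l)])
                      for w, l in joint.values())
    # Guess the more likely result for each answer; the smaller cell is the error
    error_rate = sum(min(w, l) for w, l in joint.values())
    return h_x - h_x_given_y, error_rate

where = {"Home": (0.3, 0.0), "Away": (0.3, 0.4)}
when = {"Day": (0.2, 0.3), "Night": (0.4, 0.1)}

info_where, err_where = analyze(where)
info_when, err_when = analyze(when)
print(info_where, err_where)
print(info_when, err_when)
```

Under these assumptions the two options give the same error rate, while asking where carries more information about the result than asking when.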


C. Case 3. Two questions are allowed!

 - Option 1
       Error rate = 1/10      E = 2.0      I(X;Y) = 0.448

 - Option 2
       Error rate = 1/10      E = 1.7      I(X;Y) = 0.448

 - Option 3: Best!
       Error rate = 1/10      E =          I(X;Y) = 0.4228

[Tree diagrams for the three options: the Home/Away and Day/Night questions asked in different orders, with leaves such as Won: {1, 0}, Lost: {0, 1}, and Won: {0.75, 0.25}]

The Wizard of Odds

The Deer Hunter

“Let's play a game of Russian roulette. You are tied to your chair and can't get up. Here's a gun. Here's the barrel of the gun, six chambers, all empty.

Now watch me as I put two bullets in the gun. See how I put them in two adjacent chambers? I close the barrel and spin it.

I put the gun to your head and pull the trigger. Click. You're still alive. Lucky you! Now, I'm going to pull the trigger one more time.

Which would you prefer, that I spin the barrel first, or that I just pull the trigger?”

Exercise Problems

Suppose that, based on the patient's symptoms, the prior probability that the patient has a certain disease is 20%. The doctor wants to conduct a further comprehensive blood test. The blood test is 50% effective in detecting the disease when it is, in fact, present. However, the test also yields a “false positive” result for 50% of the healthy persons tested.

                     X: Test results
    Y: Virus         Positive, X1    Negative, X2    Total
    Yes, Y1          0.10            0.10            0.20
    No, Y2           0.40            0.40            0.80
    Total            0.50            0.50            1.00

(a) Find the entropy H(Y) and express it in bits.

(b) Find the joint entropy H(X, Y) and express it in bits.

(c) Find the conditional entropy H(Y|X) and express it in bits.

(d) What is the reduction in the uncertainty of Y due to the knowledge of X? Express it in bits.

!

Session 4. Descriptive Modeling

“A good display has many purposes, but it reaches its highest value when it forces you to see something you weren't expecting.” – by Laurie Snell

* Organizing Numerical Data

  Ordered array:
     Ordered sequence of the raw data in rank order
     Microsoft Excel:   Data => Sort…

  Stem-and-leaf display
     Separates data entries into leading digits (stems) and trailing digits (leaves)

        7 | 1 3 5 7 8
        8 | 0 0 1 1 2 3 3 5 5 7 7 8 9
        9 | 0 1 1 2 2 3 5 7

* Summarizing Data

  I. Tabular and Graphical Methods:
     Presenting data in tables and charts
     Excel:   Insert => Chart…

  II. Numerical Methods:
     - Landmark summaries (Central tendency): Interpreting typical values
     - Dispersion (Variation): Dealing with diversity
     Excel:   Tools => Data Analysis…

4.1. Tabular and Graphical Methods

“One picture is worth more than ten thousand words.”

A. Qualitative (or Categorical) Data:

  1. Tabular methods
     Frequency or percent frequency distribution

     Final Grade    A      B      C      Total
     Number         24     40     16     80
     Frequency      0.3    0.5    0.2    1.0

  2. Graphical methods:
     Bar chart
        Excel:   Insert => Charts… => Column
     Pie chart
        Excel:   Insert => Charts… => Pie

[Figure: column chart of grades A, B, C (vertical axis 0 to 50), with series Male and Female]

B. Quantitative (or Numerical) Data

  1. Tabular methods
     - Frequency distribution
     - Relative frequency distribution (percentage distribution)
     - Cumulative frequency distribution

     Final Exam    70~79.99    80~89.99    90~99.99    Total
     Number        24          40          16          80
     Frequency     0.3         0.5         0.2         1.0

     Excel:   Tools => Data Analysis… => Histogram

  2. Graphical methods
     - Dot plot
     - Histograms
     - Cumulative distribution: Ogive

     Excel:   Insert => Charts… => Column

[Figure: histogram with cumulative percentage curve; horizontal axis Bin (50, 60, 70, 80, 90, 100, More), vertical axes Frequency (0 to 25) and cumulative percentage (0% to 120%)]

C. Bivariate Data

  1. Qualitative data: Contingency tables

     Contingency table

                            Blood test
                            Positive    Negative
        Patient  Virus      9           1           10
                 No virus   6           34          40
                            15          35          50

     Probability table

                            Blood test
                            Positive    Negative
        Patient  Virus      0.18        0.02        0.20
                 No virus   0.12        0.68        0.80
                            0.30        0.70        1.00

     Excel:   Data => Pivot table and pivot chart report…

  2. Quantitative data:

     Scatter plot

     Excel:   Insert => Charts… => XY(Scatter)

[Figure: scatter plot of Mileage (vertical axis, 0 to 25) against Age (horizontal axis, 0 to 20)]
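The probability table is obtained from the contingency table by dividing every count by the grand total. A minimal sketch of that step, with illustrative variable names:

```python
# Counts from the contingency table: rows = patient status, columns = blood test result
counts = {
    "Virus":    {"Positive": 9, "Negative": 1},
    "No virus": {"Positive": 6, "Negative": 34},
}

total = sum(sum(row.values()) for row in counts.values())

# Joint probabilities: each cell divided by the grand total
joint = {r: {c: n / total for c, n in row.items()} for r, row in counts.items()}

# Marginal probabilities: row sums and column sums of the joint table
row_marginal = {r: sum(row.values()) for r, row in joint.items()}
col_marginal = {c: sum(joint[r][c] for r in joint) for c in ("Positive", "Negative")}

print(joint)
print(row_marginal, col_marginal)
```

The row and column sums reproduce the margins of the probability table (0.20 and 0.80 for the patient status, 0.30 and 0.70 for the test result).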

D. Chart Wizards in Microsoft Excel

Charts are used to analyze data graphically. Well-created and formatted charts can help people and businesses make decisions based on the impact that their images provide to the users. Microsoft Excel is equipped with the Chart Wizard that allows you to create and format a chart to suit almost any scenario or need.

1. Launch Microsoft Excel (You don't know how?)

2. Enter the data to be graphed.

3. Select the data you want to graph.

   Select both the numeric data and adjacent row and column labels; Excel uses the labels for legend and axis information. If the data is not contiguous, use Excel's multiple selection technique: select the first range of data, hold down the Control key, and then select the second range of data.

!

!

© Young H.

4. Select !

!!!U4(;117634(!3134)3!@!

!!!>3(04)&3!!

!!!E/!>0;607601/3&!

!!!>3(J(A(/12!34(12!34(!

!!!E/!>*1.!234(!04')3)!1Finis

*1.&!!

5. Manip!

!!!^/!J67(!)&1./

!4'!6:!

! !!

! !!

! !!

! !!

! !!

!

Chun!

the Chart W

(!K4)&3!`6_)6/A!04)&3F!!^11;H)&F!!U4(!@&1?@3!*1.!2

@!$!12!34(!K46/!34(!;(23!0

>3(@!Pf!,@(066/A!34(!K4)&&1;,!54604!;)

@!Q!12!34(!K4/',!)/'!W&6'(,(!1@361/,f!(!>3(@!Q!'6);

>3(@!"f!34(!K21&!)!04)&3!;14)&3!1/!34(!,1&!1/!)/134(sh!)23(&!*1.!04)&3F!

pulating the

9:0(;!04)&3!134(&!51&7,/'!63F!

:-2"'*6

K&('63!,0

Y6'6/A!;)

81D6(,!

8GW!

U63)/60!

WizardF!

)&'!A.6'(,!*23(&!*1.!,(;(K4)&3!`6_)&21&!6/21&?)36

4)&3!`6_)&'!1;.?/f!)/'!3

62*!34(!;10)36&3!`6_)&'f!34)H(;,!)@@()&!

4)&3!`6_)&'!';6/(,F!!U1!00;607!34(!3)H;1A!H1:F!

K4)&3!`6_)&'10)361/F!!a1.,)?(!51&7,4(&!51&7,4((3,(;(03!34(!;1

e Chart

6,!)!51&7,4,4((3!1Hd(03,

6."'!".'

01&(!

)5/!?15

!

*1.!34&1.A4!((03!34(!')3)!*&'!'6,@;)*,!)61/!)H1.3!34(

;(3,!*1.!04134(/!0411,(!

61/!)/'!1&6(/41,(!0(;;,!)@@1/!34(!g#):

@&(,(/3,!/.?04)/A(!1/(!H!)3!34(!31@!

'!@&1?@3,!.!0)/!0&()3(!4((3!),!34(!3F!!K;607!0)361/!21&!

((3!1Hd(03F!!f!*1.!0)/!?1

5(&!

()04!12!34(!,*1.!5)/3!31!0)!,(&6(,!12!,3((!04)&3F!

11,(!34(!3*@(34(!,.H#3*@(

/3)361/!12!*1@()&!6/!34(!D:6,F!

?(&1.,!1@361

1D(!63f!04)/A

,3(@,!/(0(,,)04)&3f!0;607!3(@#H*#,3(@!-

(!12!04)&3!*1(!6/!34(!&6A43

1.&!')3)F!!E2!*Data Range

1/,!&)/A6/A!

A(!63,!,6_(f!'

)&*!31!0&()3(34(!K4)&3!`-.(,361/,!)/'

1.!5)/3F!>(;(3!01;.?/F!

*1.!,(;(03('H1:F!U4(!Se

2&1?!K4)&3!

(;(3(!63!)/'!@

!)!@&12(,,61/`6_)&'!H.331/'!'6);1A!H1:

(03!34(!3*@(!1

'!')3)!H(21&(eries!,(0361/

U63;(,!31!

@;)0(!H1&'(&

"#]!

/);#/!1/!:(,!

12!

!/!

&,!

4.2. Numerical Methods

“79.48% of all statistics are made up on the spot.” by John Paulos

A. Measures of Central Tendency (or Location)

  1. Arithmetic mean: a typical value for quantitative data

     - Population mean:   \mu = \frac{\sum_{i=1}^{N} x_i}{N},   where N is the population size

     - Sample mean:   \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n},   where n is the sample size

     “When the Okies migrated from Oklahoma to California, they raised the average IQ's of both states.” --- Will Rogers

  2. Weighted average: Adjusting for importance

  3. α% trimmed mean: Removing outliers

     Remove the smallest α% and the largest α% of the data values and then compute the mean of the middle (100 − 2α)% of the data.

  - Average Difficulty

    At a recent fund-raising dinner, a group of six MBA alumni sitting around a table were, oddly enough, discussing compensation issues. Although reluctant to divulge their own annual compensation, they agreed that it would be useful to know the average salary of the group. Can you devise a strategy that would let them learn the group average without anybody learning the salary of anybody else?

Ex] Mean is meaningless?

   “The great majority of people have more than the average number of legs! Among the 57 million people in Britain there are probably 5,000 people who have only one leg. Therefore, the average number of legs is

       (5,000 × 1 + 56,995,000 × 2) / 57,000,000 = 1.9999123.

   Most people have two legs...”

  4. Median:
     - A typical value for quantitative and ordinal data.
       If n is odd, the value of the middle item.
       If n is even, the average of the two middle items.

  5. Mode:
     - A typical value for nominal data.
     - Data value that occurs with greatest frequency.

  6. pth Percentile
     - p percent of the total observations are below that value.

  7. Quartile:
     - First, second, and third quartiles.

* Box-and-Whisker Plot: Five-Number Summary

     Min, Q1, Q2, Q3, Max

Ex] Class-size data for a sample of five classes:

     46   54   42   46   32

  (a) Ordered array: Arrange the data in ascending order.

     32   42   46   46   54

  (b) Find the sample average:

     [Dot plot axis: 30  35  40  45  50  55  60]

     x̄ =

  (c) Find the sample median:

  (d) Find the sample mode:

Ex] Warm up your calculator!

   Consider the ages and mileages of the four automobiles in the sample.

     Car      Age    Mileage
     1        1      16
     2        3      40
     3        5      54
     4        8      108
     Mean
     Median
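The class-size example can be checked with Python's standard `statistics` module. This is only a convenience sketch; the notes themselves work with a calculator and Excel.

```python
import statistics

class_sizes = [46, 54, 42, 46, 32]

ordered = sorted(class_sizes)             # ordered array
mean = statistics.mean(class_sizes)       # sample average
median = statistics.median(class_sizes)   # middle item of the ordered array (n is odd)
mode = statistics.mode(class_sizes)       # value with the greatest frequency

print(ordered, mean, median, mode)
```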

B. Measures of Dispersion (or Variation)

“All measurements are subject to variation.”

  1. Range
     = Xmax − Xmin

  2. Interquartile range (IQR)
     = Q3 − Q1

  4. Variance:

     Population variance:   \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}

     Sample variance:   s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}   (A)

                  or    s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}{n - 1}   (B)

  5. Standard deviation:

     Population standard deviation:   \sigma = \sqrt{\sigma^2}

     Sample standard deviation:   s = \sqrt{s^2}

  6. Coefficient of variation: Relative variation

     100 \times \frac{\sigma}{\mu} %   or   100 \times \frac{s}{\bar{x}} %

Ex] Class-size data for a sample of five classes:

     Class i    x_i    (x_i − x̄)    (x_i − x̄)²    x_i²
     1          46
     2          54
     3          42     −2           4             1764
     4          46     2            4             2116
     5          32     −12          144           1024
                x̄ =                 Sum =

  (a) Find the sample variance:

      Use the formula (A):   s² =

      Use the formula (B):   s² =

  (b) Find the sample standard deviation:   s =

  (c) Find the coefficient of variation:   cv =

Ex] TI-30Xa: Consider the following sample data.

     Car      Age    Mileage
     1        1      16
     2        3      40
     3        5      54
     4        7      108
     5        14     137
     Average
     Variance
     Standard deviation
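Both variance formulas can be applied to the class-size data to confirm that they agree. A minimal sketch, with illustrative variable names:

```python
import math

x = [46, 54, 42, 46, 32]
n = len(x)
mean = sum(x) / n

# Formula (A): sum of squared deviations from the sample mean
var_a = sum((xi - mean) ** 2 for xi in x) / (n - 1)

# Formula (B): computational shortcut using the sum of squares
var_b = (sum(xi ** 2 for xi in x) - n * mean ** 2) / (n - 1)

std = math.sqrt(var_a)
cv = 100 * std / mean   # coefficient of variation, in percent

print(var_a, var_b, std, cv)
```

Formula (B) avoids computing each deviation separately, which is why it is convenient on a hand calculator; with very large values it can lose precision in floating-point arithmetic, so formula (A) is usually the safer choice in code.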

!

C. Shape of the Distribution

  - Unimodal or bimodal distribution

  - Symmetrical or skewed distribution

      Left-skewed: longer lower tail:     Mean < Median
      Right-skewed: longer right tail:    Mean > Median

“You know how dumb the average guy is? Well, by definition, half of them are even dumber than that.”   J. R. "Bob" Dobbs

Ex] Suppose that the mean price of a house in Lafayette is $300,000 and the median price is $80,000.

  (a) What does this say about the distribution of house prices here?

  (b) If the founder of a high-tech company builds a $10 million mansion in the community, which goes up more, the mean or the median value of the houses?

© Young H.

*)'D$/

!! !!

! !

!Ex]!U&(./61*()&f!3'(0&()!!Ex]!U34(!@)5),!S?()/!(:)?c!

! !!

Chun!

/6,$6."'

!K1D)&6)

!K1&&(;)

4(!?()/1/!5),!S34(!?()/),(!54(/

46&3*#,6:),,6/A!,0Nf!34(!?12!);;!,0c!B!2&1?!M

*6.6O'

)/0(+!K1

)361/!01(

/!)A(!12!?SPFSR!*()/!)A(!5)/!);;!34(!

:!,3.'(/31&(!5),!

?()/!,01&01&(,!5)MathCount!

!

1DBxf!yC!<

(22606(/3+

?(?H(&)&,F!!^3!3),!S$F]S!0;),,!?(

3,!3117!34SOF!!U4(&(!12!341,!S$F!Z121&!?6'';(

<!PQSFRO

!K1&&Bxf

&,!12!34(!34(6&!2623**()&,F!!Z(?H(&,!)

4(!26/);!(!?()/!,,(!541!215!?)/*(!,0411;!,3.

O!

!yC!!<!!rx

0;),,!12!*#26&,3!&(Z15!0)/)&(!)!*()

(:)?!;),01&(!12!2)6;('!5*!,3.'(/3.'(/3,C!

xy!<!kOFT

2 sR"!)3!34(./61/!34/!34(!?()&!1;'(&c

),3!*()&f!341,(!54),!]Of!)/3,!'6'!/1

TRO!

4(6&!26236(4(!/(:3!)/!)A(!c!

1/!5460441!@),,(/'!34(!13!@),,!34

"#$Q!

!

(34!

4!('!

4(!

4.3. Descriptive Modeling with Excel

A. Useful functions

   Microsoft Excel provides a number of functions for common statistical operations. To use these functions, choose Insert and Function, or click the Paste Function tool fx located on the Standard toolbar.

   Typically, the syntax of a function has three major parts: an equal sign, the function name, and a range of cells.

   All functions begin with the = sign. The function name usually indicates what the function does. For example, =STDEV(A10:A25) calculates a standard deviation for the cells in A10 through A25. Within parentheses, specify the input information called the arguments or parameters. In the case of STDEV( ), the arguments are the series of numbers for which a standard deviation is calculated. In most cases, function arguments are cell references.

   Use Excel's Help System or the Paste Function dialog box to get help with functions.

   Some of the useful functions in descriptive statistics are
      average, max, min, var, stdev, count, mode, median, quartile, percentile, trimmean, skew, kurt, covar, correl

!

© Young H.

D)'@&!

!!!860&1,*1.! 0)/!)/);*,(,F)@@&1@&6))/[email protected]!

!!!U1!D6(5AnalysisAdd-Ins!!

!

!!!E/!34(!D'6);1A!H1Summar,3)36,360,!!

'

Chun!

&62G#$#'

,123!9:0(;!@&.,(! 31! ,)DF! !a1.!@&1D)3(!,3)36,360)3!3)H;(F!>1?

5!)!;6,3!12!)Ds!01??)/'!6)/'!04(07!34

Data Analys1:f!,@(062*!34ry Statistics1/!)!/(5!5

<%%2S6

&1D6'(,!)!,(3D(! ,3(@,! 546'(! 34(!')3));!1&!(/A6/(((!311;,!A(/(

D)6;)H;(!)/);6,!/13!1/!34(4(!H1:f!Ana

sis!'6);1A!H14(!0(;;,!34)3!s!04(07H1:!61&7,4((3F!!

6T'

3!12!')3)!)/)(/! *1.! '(D)!)/'!@)&)?(&6/A!?)0&1!&)3(!04)&3,!6/

;*,6,!311;,f!0(!Tools!?(/.alysis ToolP

1:f!,(;(03!De01/3)6/!*1.6/!34(!;15(&!

!

);*,6,!311;,!tD(;1@! 01?@;(3(&,! 21&!()02./0361/,!)//!)''6361/!31

;607!Data A.f!*1.!/(('!

PakFC!

escriptive S.&!')3)!6/!34(;(23!01&/(&F!!

t!0);;('!34(;(:! ,3)36,360)04!)/);*,6,e!/'!34(/!'6,@[email protected]!3)H;

Analysis!1/!331!6/,3);;!34(

StatisticsF!!E/(!Input RangX*!'(2).;3f!

(!Analysis );! 1&! (/A6/(34(! 311;!.,(;)*,!34(!&(,.(,F!

34(!Tools!?(!Analysis T

!

/!34(!Descrge!H1:F!!K;609:0(;!A(/(&

!

ToolPak!t((&6/A!(,! 34(!.;3,!6/!

?(/.F!!E2!34(!ToolPakF!!BK

iptive Statis07!34(!&)3(,!34(!

"#$R!

t!34)3!

Data

K;607!

stics!

The Wizard of Odds

Figures don't lie, but liars figure.

Case 1. The Sin of the Missing Zero

   The lead paragraph of the article called it “the biggest land boom in American History”. To show the dramatic growth in land values, the article included the graph shown below. At first glance, it looks as though there has been an increase of about twenty-five-fold during this period, since the height of the line plotted for 2003 is about 25 times the height plotted for 1986.

     Year     1986     1987     1988     1989     1990     …    2001      2002      2003
     Price    60000    66000    72600    79860    87846    …    250635    275698    303268

   [Figure: two line charts of Price against Year (1986 to 2002); one with a vertical axis from 50000 to 350000, the other with a vertical axis from 0 to 350000]

Case 2. Size matters? Yes! size matters…

   Number of female college graduates

   [Figure: pictogram for 1980, 1990, and 2000 with values 10, 20, and 30]

Exercise Problems

Problem 1. The 28 test scores of a data mining class were arranged in a stem and leaf plot as illustrated.

     6 | 0 1 2 3 3 6 7
     7 | 0 2 3 3 5 7 8 9
     8 | 1 1 4 6 8 9 9
     9 | 0 2 4 6 8 9

   What is the median of the given data? (5 points)

Problem 2. Mr. Jason Tate, Manager at Red Stick Taxi, selected a random sample of 4 taxicabs and recorded the number of round trips made to the local airport on October 2. The resulting data are as follows: (15 points)

     Cab, i            1    2    3    4
     Round trips, x    4    0    6    3

  (a) Compute the mean number of trips for this sample of cabs.

  (b) Compute the median number of trips for this sample of cabs.

  (c) Compute the sample standard deviation for this sample of cabs.

!

!

!

© Young H.

!

SSeess!

* Data!

! !!"!! ! #!

! !!$! ! %!

!

! !!n !

-

!

&!

!

&!

!

!!'#()*+(,

!

-!.)d

Chun

ssssiioonn 5

a Set

/010!2#1#)3(*4),

$#!503#!0%#!503#!

ID A

142 256

.. 452

!!k!/010

n!*4%2!#)1(1(#

k!6478,011*(98

Observa

:4*!0!;

#<1!4*!(,):4*,01(#1*(#3()=#>=>?!@4,0=#2>!+

)!ISDS 4

data>!

55.. DDaattaa

1!(2!0!2#1!,#)1!4*!;

0!6477#610!2#1!4:!1

Age S

36 23 ..

33

0!,01*(<!

,0A!9#!*2?!602#2?

,)2!,0A!81#2?!4*!:(

ationB!15;0*1(6870*

,0=#!/01(4)?!0)/!=!82#:87!4=7#C!4*!+#>=>?!,0

4141?!%#

aa CCoollllee

4:!,#02;*46#22>

1(4)!4:!n15#!20,#

Sex

M HF C.. M A

*#:#**#/!!49D#612?

9#!*#:#*:(#7/2>!

5#!2#1!4:!*!#7#,#)

0!902#2!0/010!,(1#<1!:*4,+((C!*#640,,4=*0

#!%(77!*#2

eeccttiioonn

28*#,#)1

49D#612?#!k!,#02

Educa

High SchCollege

.Advance

14!02!ele

?!4*!*#64

**#/!14!02

,#028*#)1>!!

0*#!0724!)()=!,#,!70*=#!64=)(E()=0,C>!

21*(61!48*!

aanndd PPrr

12!10F#)!

?!0)/!:4*28*#,#)1

ation

hool

.. ed

ements?!4*/2>!

2!variab

#,#)12!4

(,;4*10#154/2!606477#61(4=!2;#6(:(6

*!/(26822

rreepprroocc

:*4,!24,

*!#065!4912>!

Inco

56,45,

.85,

()/(3(/8

les?!:#018

4910()#/!

)1!248*60)!5#7;!(4)2!4:!/46!;011#*)

2(4)!14!n

cceessssiinngg

,#!!

9D#61!

ome

000 000 .. 000

8072?!(1#,

8*#2?!

#2!4:!()!+(C!468,#)1)2!()!

umerica

G&H!

gg!

,2?!

12!

al

5.1. Data Collection

* Gathering Data

  1. Obtain data already published by other sources.
     (Secondary data: Internal or external sources)

     - Netflix Prize data set
     - Fates of 2201 Titanic passengers: Their booking classes, ages, and genders.
     - Sports almanac, FBI crime statistics, US Census, etc.

  2. Design an experiment to obtain the necessary data.

     - Variables are identified and controlled.
     - “Take the Pepsi challenge”
     - Microwave popcorn: Power level and time

  3. Make observations through an observational study.

     - Variables of interest are not controlled.
     - Drive-thru service time per vehicle at Taco Bell
     - Price difference between Lowe's and Home Depot

  4. Conduct a survey.

     - Mail survey, personal interview, telephone interview
     - USA Today/CNN/Gallup Presidential election poll
     - President's job approval ratings
     - Consumer Reports: Owner satisfaction survey

* Errors in Survey Research

  1. Coverage error

     - Selection bias => Non-representative sample

     Ex] Unlisted telephone number

     Ex] How big is the fish in the pond? You take your wide-meshed fishing net and catch one hundred fish, every one of which is greater than six inches long. Does this evidence support the hypothesis that no fish in the pond is less than six inches long?

  2. Non-response error

     - Silent majority; Callers to Rush Limbaugh or NPR

  3. Multiple response error

     - Internet poll

  4. Loaded questions

     Ex] “Do you favor banning private ownership of handguns in order to reduce the rate of violent crime?”

  5. Unclear definition

     “I did not have sexual relations with that woman, Miss Lewinsky.”

     “It depends on what the meaning of the word ‘is’ is.”

!

© Young H.

!!C#!A9!

! !

! !!

+,D

2865Ass

Dys

26(#:70%!!!J()06abo

stra

!!!.)*#;4=*#0:#,

!!E#!F/!

!SZ#)1#

!

! ]!

!!G#!5!

! !

!

!!H#!I)!

! !

Chun

92J%/'28

!g<0==#0)2%

D!K#<!28*5!(/7#!,

sociation

sfunction

#)1(:(6!60%#/>!J)#!,0D4668*06A!out their

angers>!)!,421!284*1#/!9A01#*!150),07#2>!

/)33(%3

Z03#!A48#*10(),#

]^!SZ03#

5')26('K

!Z0)=()

)K379/*

!V0*=()

8L!

#*01#!15#(%#*2!1501

*3#A2!0*#,82()=>!'n!;897(25n in the U

0*#?!.!282

4*!*#024)4:!15#2#sexual p

8*3#A2?!1A!5#1#*42)!15#!03#

(9)8'!M%

8!;01*4)(E#)1!%(15(

#!A48!=4

K'/8!+(

)=?!;*#=)

*!+((%(!

)!4:!#**4*

(*!()64,1!15#A!15

#!4:1#)!7'5#!Jour

5#/!0!218U.S.U!!'52;#61!150

)!:4*!9415!28*3#A2practices

15#!03#*0#<807!,#*0=#!)8,

%@)N67)

E#/!0!64()!15#!;0

4)#!14!0!,

((%(!

)0)1?!0)/

*?!%5(65!

,#?!8)/#*5()F!0*#!S

#22!()2(=rnal of th

8/A!7021!,548=5!(101!#3#)!15

5!15#!9702!(2!1501!ps!+28*;*(

0=#!)8,,07#2?!:4*,9#*!*#;

(L!;'&'

,,#*6(0021!,4)15

,43(#!%

/!/(,;7#/

(2!8)034

*2101#!15#S066#;10

=51:87!150he Ameri

,4)15!1(11!%02!64)5#2#!*#28

0)/)#22!people d

2#hC?!#2;

,9#*!4:!2#*!#<0,;7;4*1#/!9A

'7!

07!248*6#5aU!

%(15()!15#

/!650/2>

4(/097#

#(*!0=#?!4097#>U!

0)!0!=44ican Med

17#/!SSex

)/861#/!8712!0*#!

0)/!15#!don’t tell

;#6(077A!1

#<!;0*1)7#?!(2!2(=)A!5#1#*4

#!4:!6()#

#!;021!,

!

4*!;*43(/

4/!)43#7!dical

xual

%(15!2#*(4827A

7(F#7A!l the trut

14!

)#*2!)(:(60)1742#<807!

#,01(6!

,4)15aU

G&[!

/#!!

4*!

A!

th

7A!

!

© Young H.

O$)2'")0*65

'5#341#!:4()678/0!2;41Z4%

%*(1#!(")=*A341#2?

!

+0C!.1!515#!.)115#!.)10//*#2;4()12

!

+9C!$!

O$)2'

4:!XKO4:!15#!5#*!141!!J:!150//*#2!!'5#!d15#!*#2resoun

$150,0

!!'5(2!4:!ijG!!

Chun

'!"D!V(655A>U!The P

#!#/(14*2!:4*!Most

/#/!,0)A1!14!%*(1#%0*/!K1#()!4)#!4A!Q*8)F#!9A!:0*!15

502!9##)1#*)#1>!Q1#*)#1>!g22#2!()!152C!

$54!%02!Z

'!-D!"2!;O!,0(7#/k8#21(4)107!()64,5#![?lIM!22#2!:4*!d#8)(4)2;4)2#2!nding su

000!”

*#2871!#<:4*!,0)

50#7!m>!Z(,

Philadelp

4:!Peop

Beautifu

A!%#77&F#!()!0!)0,#*)?!0!*0/:!5(2!645#)!Q%0*5#!,421!

)!20(/!150Q#26*(9#!g3#)!(:!%5#!648)1

Z0)F?!15

;0*1!4:!(12/!0!k8#2)2!02F#/,#!7021!A,#,9#GlW>!!J:)!P4,,(0)/!0))

uccess. T

<0==#*01)A!*#024)

!

,4%(1E?!S

phia Inqu

ple!,0=0ful Perso

F)4%)!6#,#>!/(4!;#*2454*12!4)!*:>U!K1#*)341#2!=0

01!()!15#!54%!A48%#!0228,1*A?!%501

5#!")=*A

2!IG15!*#1(4))0(*#/!15#!*#2;A#0*>!*2!4:!15#:!1542#?!I(11##!64,)48)6#/?The avera

1#2!15#!())2>!!$50

SJ)7()#!R

irer?!+V0

0E()#!02Fn. '5#!4#7#9*(1(#2

4)07(1A?!15#!*0/(4)i2!:0)2!=0*)#*#/!9

:818*#?!#8!,(=51!,#!1501!%1!;*497#,

A!Q*8)F#

#8)(4)!6##!14!(12!,;4)/#)1!

#!67022!ijIIM!*#18*,;81#/!1 “The m

age inco

)64,#!401!0*#!15#

R41#*2!@#

0A!Il?!Hnn

F#/!15#(*4::(6(07!92!0)/!072

28==#21#4?!)0,#7=03#!Z0)9A!0)A!4

#3#*A15()1*A!14!64%#!503#!0,2!%487

#)!Q%0*

#7#9*01(4,#,9#*214!=(3#!5

jG?!15#!"*)#/!15#!15#!,#0)

member o

ome of cl

4:!15#!,##2#!*#024

#1!Z0)F#*

nlC?!Lo!

*!4)7()#!*907741!24!()678/

#/!1501!57A!Z0)F!)F!43#*!4)#!;#*24

)=!%(77!94)/861!0066#22!147/!%#!:06

*:a!+nG!;

4)?!15#!672>!!J)#!5(2!4*!

"78,)(!Jk8#21(4))!()64,#of '75 ha

lass mem

#,9#*2!44)2a!

()Y!:4*!

*#0/#*2!1

/#/!

5(2!:0)2!S15#!IWM?MMM4)>!

9#!/4)#!0!28*3#A!4!077!#&,6#a!+G!

4()12C!

7022!4:!ij

J::(6#!50))0(*#>!#!=(3#)!(ve enjoy

mbers is

4:!15#!P7

G&G!

!

14!

M!

4)!4)!,0(7!

jG!

0/!

()!yed

022!

!

5.2. Sampling Methods

1. Simple Random Sample

   Probability of selecting each element in the population is the same for each and every element, and the chance of selecting one element is independent of whether some other element is chosen.

2. Systematic Sample

   Obtained by taking every kth element on a list of all elements in the population.

3. Stratified Random Sample

   The population is divided into strata (groups) and a random sample is taken from the elements in each stratum (group).

4. Cluster Sample (two-stage cluster sampling)

   The elements in the population are divided into a number of clusters or groups. Choose at random a sample of these clusters, after which a simple random sample of the elements in each chosen cluster is selected.
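The four sampling methods can be sketched with Python's standard `random` module. The function names, the toy population of 100 numbered elements, and the way it is split into strata and clusters are all assumptions made for illustration.

```python
import random

def simple_random_sample(population, n, rng):
    # Every element has the same chance of selection
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    start = rng.randrange(k)         # random starting point within the first k elements
    return population[start::k]      # then every k-th element on the list

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict mapping stratum name -> list of elements
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    # Stage 1: choose clusters at random; stage 2: simple random sample within each
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

rng = random.Random(0)               # fixed seed so the sketch is reproducible
population = list(range(100))
strata = {"even": population[0::2], "odd": population[1::2]}
clusters = {f"c{i}": population[10 * i:10 * (i + 1)] for i in range(10)}

srs = simple_random_sample(population, 10, rng)
sys_s = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 3, 4, rng)
print(srs, sys_s, strat, clus, sep="\n")
```

Note the difference between the last two: stratified sampling draws from every group, whereas cluster sampling draws only from the groups that were chosen in the first stage.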

! ! G&j!

© Young H. Chun

Ex] Suppose that LSU employs 2,000 male and 500 female faculty members. The equal employment opportunity officer polls a stratified random sample of 200 male and 200 female faculty members.

Each member of the sample is asked, "In your opinion, are female faculty members in general paid less than males with similar positions and qualifications?"

180 of the 200 females and 60 of the 200 males say "Yes." So 240 of the sample of 400 (60%) answered "Yes," and the officer therefore reports that "based on a sample, we can conclude that 60% of the total faculty feel that female members are underpaid relative to males."

(a) Explain why this conclusion is wrong.

(b) Give an unbiased estimate of the proportion of the total faculty who feel that females are underpaid.

            N        n      x     x/n
Male      2,000     200     60    30%
Female
Total     2,500     400    240    60%
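Part (b) of the LSU example can be checked numerically. The sketch below assumes the usual stratified estimator, which weights each stratum's sample proportion by that stratum's share of the population. The function name `stratified_proportion` and the tuple layout are illustrative choices, not part of the original notes.

```python
def stratified_proportion(strata):
    """Weight each stratum's sample proportion by its population share.

    strata: list of (population size N, sample size n, number of "Yes" x).
    """
    total_population = sum(size for size, _, _ in strata)
    return sum((size / total_population) * (yes / sampled)
               for size, sampled, yes in strata)


# Figures taken from the example: 2,000 males and 500 females,
# 200 sampled from each group, 60 males and 180 females say "Yes".
faculty = [(2000, 200, 60),    # male
           (500, 200, 180)]    # female

estimate = stratified_proportion(faculty)
print(f"Estimated proportion: {estimate:.2f}")
```

Under these assumptions the estimate comes out near 0.42 rather than 0.60, because females are half of the sample but only one fifth of the faculty.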

5.3. Data Pre-Processing

* Why?

- Real-world databases are highly susceptible to noisy, missing, and inconsistent data due to their typically huge size, faulty data collection instruments, data entry problems, and discrepancy in naming conventions.
  - Incomplete: Lacking attribute values or certain attributes of interest, or containing only aggregate data.
  - Noisy: Containing errors, or outlier values.
  - Inconsistent: Containing discrepancies in the codes or names.

- Data quality is a key issue: GIGO!

- Thus, "data preprocessing represents up to 80 percent of the overall mining project," which is an extraordinary amount of time spent just preparing the right data.

* How?

- How can we preprocess the data so as to improve the quality of data?
  - Data cleaning
  - Data integration (re-consolidation)
  - Data transformation
  - Data reduction

5.4. Data Cleaning

- How to clean the dirty data by filling in missing values, smoothing out noisy data, identifying and removing outliers, and resolving inconsistencies in the data?

(1) Missing Values

- Find the value and fill it in manually.
- Delete the variables or items with missing values.
- Use the attribute mean to fill in the missing value.
- Use the attribute mean for all samples belonging to the same class.
- Use the most probable value to fill in the missing value.
  - Regression, decision tree, or Bayesian method

Ex] To play or not to play golf, that is the question!

i    Temp, X1    Wind, X2    Play?, Y
1       84          No          Yes
2       ??          No          Yes
3       86          Yes         Yes
4       88          Yes         No
5       94          No          No

Method 1: Fill with the attribute mean.
  - Average temperature:

Method 2: Fill with the attribute mean of the same class.
  - Average temperature when Y = Yes:

Method 3: Most likely prediction.
  - If Y = Yes and X2 = No, then X1 =
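Methods 1 and 2 of the golf example can be sketched in a few lines. The rows below are transcribed from the table (row 2 has the missing temperature); the helper names `attribute_mean` and `class_mean` are hypothetical and only illustrate the two fill-in rules.

```python
from statistics import mean

# (temperature X1, wind X2, play Y); None marks the missing value in row 2.
rows = [
    (84, "No", "Yes"),
    (None, "No", "Yes"),
    (86, "Yes", "Yes"),
    (88, "Yes", "No"),
    (94, "No", "No"),
]


def attribute_mean(rows):
    """Method 1: mean of all known temperatures."""
    return mean(t for t, _, _ in rows if t is not None)


def class_mean(rows, label):
    """Method 2: mean of known temperatures within one class of Y."""
    return mean(t for t, _, y in rows if t is not None and y == label)


method1 = attribute_mean(rows)       # uses 84, 86, 88, 94
method2 = class_mean(rows, "Yes")    # uses 84, 86 (row 2 has Y = Yes)
print(method1, method2)
```

With these five rows, Method 1 fills in 88 and Method 2 fills in 85. Method 3 would need a fitted model (regression, decision tree, or Bayesian), which is not sketched here.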

(2) Noisy Data

- Noise is a random error or variance in a measured variable.

  Ex] Valid (CEO's salary), Error (Age = 200)

(a) How to detect outliers?

- Clustering: outliers fall outside of the clusters.
- Regression: outliers fall far from the linear line.
- Histogram:
  1. Quartile:
     - Suspected outliers are outside the inner fences: Q1 - 1.5 × (Q3 - Q1) and Q3 + 1.5 × (Q3 - Q1).
     - Outliers are outside the outer fences: Q1 - 3.0 × (Q3 - Q1) and Q3 + 3.0 × (Q3 - Q1).
  2. Three-sigma limit: $\bar{x} \pm 3 s_x$

(b) How to smooth out? Binning

- The sorted values are distributed into a number of bins.
- The larger the width, the greater the effect of the smoothing.
- Steps
  1. Binning by equi-depth or equi-width.
  2. Use one of the following methods:
     - Smoothing by bin means
     - Smoothing by bin medians
     - Smoothing by the closest boundary values

(3) Inconsistent Data

- Same concept but different attribute names.

  Ex] {Gender, Sex, ..}, {Quiz-1, Q1, ..}, {ssn, student_ssn, ..}

- Same value expressed differently.

  Ex] {undergraduate, UG}
      E notation: 1.5326E04 and 15326; 2.6E-02 and 0.026

- Should be corrected manually using external references.

  Ex] One bank's database shows that 5% of the customers were born on 11/11/11.

Ex] Outliers:

Normal (100, 10)

Mean                   97.97
Standard Deviation     10.48
Minimum                71.56
First Quartile         90.98
Second Quartile        98.98
Third Quartile        104.77
Maximum               130.58

[Figure: histogram of the data; horizontal axis from 20 to 180]

- Inner fences:
- Outer fences:
- 3-sigma limits:
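The fence and three-sigma rules can be applied to the descriptive statistics of the Normal (100, 10) sample. This is a minimal sketch: the function names are hypothetical, and the quartiles, mean, and standard deviation are the values read from the table (Q1 = 90.98, Q3 = 104.77, mean = 97.97, standard deviation = 10.48).

```python
def fences(q1, q3, k):
    """Return (lower, upper) fences at k times the interquartile range."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def three_sigma(mean, sd):
    """Return (lower, upper) three-sigma limits."""
    return mean - 3 * sd, mean + 3 * sd


inner = fences(90.98, 104.77, 1.5)   # suspected outliers lie outside these
outer = fences(90.98, 104.77, 3.0)   # outliers lie outside these
sigma = three_sigma(97.97, 10.48)

for name, (lo, hi) in [("inner", inner), ("outer", outer), ("3-sigma", sigma)]:
    print(f"{name}: {lo:.2f} to {hi:.2f}")
```

With these inputs the maximum of 130.58 falls just above the upper inner fence and the upper three-sigma limit, but stays inside the outer fences, so it would be flagged only as a suspected outlier.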

Ex] Consider the price data: {4, 8, 15, 21, 21, 24, 26, 28, 36}

Method 1: Binning by equi-depth

- Step 1: Partition into equi-depth bins of 3 items.

    {4, 8, 15}, {21, 21, 24}, {26, 28, 36}

- Step 2: Smoothing
  - By bin means:       {9, 9, 9}, {22, 22, 22},
  - By bin medians:     {8, 8, 8}, {21, 21, 21},
  - By bin boundaries:  {4, 4, 15}, {21, 21, 24},

Method 2: Binning by equi-interval

- Step 1: Partition into equi-interval bins of 10:

    Bin range:

- Step 2: Smoothing
  - By bin means:
  - By bin medians:
  - By bin boundaries:
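The equi-depth steps of Method 1 can be sketched as follows. The function names are illustrative, and the tie rule in `smooth_by_boundaries` (a value equally close to both boundaries goes to the lower one) is an assumption, since the notes do not state one.

```python
from statistics import mean, median

prices = [4, 8, 15, 21, 21, 24, 26, 28, 36]


def equi_depth_bins(values, depth):
    """Step 1: sort the values and cut them into bins of `depth` items."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth_by_means(bins):
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_medians(bins):
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace each value by the closer of its bin's minimum and maximum."""
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b]
            for b in bins]


bins = equi_depth_bins(prices, 3)
print(smooth_by_means(bins))
print(smooth_by_medians(bins))
print(smooth_by_boundaries(bins))
```

The first two bins reproduce the values shown in the example, and the same functions fill in the third bin.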

5.5. Data Integration and Transformation

* Data Integration

- Data from multiple sources are combined into a coherent data store.
  - Schema integration => entity identification problem
    Ex] customer-ID / cust-number
  - Redundant variables
    Can be detected by correlation analysis
  - Duplicated items
  - Data value conflict because of different units, scaling, or encoding

* Data Transformation

- Data are transformed or consolidated into forms appropriate for data mining.
  - Smooth the data to remove the noise.
  - Aggregation (e.g., daily sales data => monthly total amount)
  - Generalization (e.g., street => city or county)
  - Normalization: linear transformation of the raw data
  - Creation of new variables (attributes)

* Normalization

- Especially important for regression and classification methods (e.g., salary and age)

1. Min-max normalization

   $X_i' = \dfrac{X_i - \min_i X_i}{\max_i X_i - \min_i X_i}$

   - Scaling data values in a range such as [0, 1] or [-1, 1] prevents features with a large range, like 'salary', from outweighing features with a smaller range, like 'age'.

2. Z-score normalization (zero-mean normalization)

   $X_i' = \dfrac{X_i - \bar{x}}{s_x}$

   where $\bar{x}$ is the sample mean and $s_x$ is the sample standard deviation.

   Excel: =STANDARDIZE(x, mean, standard_dev)

3. Normalization by decimal scaling

   - Move the decimal point.

   $X_i' = \dfrac{X_i}{10^j}$, where $j$ is the smallest integer such that $|X'_{\max}| < 1$.

Ex] Normalization: Based on the following descriptive statistics of a variable X, normalize the value x = 80.0.

Normal (100, 10)

Mean                   97.97
Standard Deviation     10.48
Minimum                71.56
First Quartile         90.98
Second Quartile        98.98
Third Quartile        104.77
Maximum               130.58

(a) Min-max normalization into the range (0, 1)

(b) Z-score normalization (zero-mean normalization)

(c) Normalization by decimal scaling

    - The range of X is from 71.56 to 130.58.
    - The maximum absolute value is 130.58.
    - We divide each X by 1,000 (i.e., j = 3) so that 130.58 is scaled to 0.13058.
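The three normalizations can be tried on x = 80.0 with the statistics from the table. This is a sketch under stated assumptions: the function names are hypothetical, min-max is taken to map onto [0, 1], and the decimal-scaling loop simply searches for the smallest j that brings the largest absolute value below 1.

```python
def min_max(x, lo, hi):
    """Map x from [lo, hi] onto [0, 1]."""
    return (x - lo) / (hi - lo)


def z_score(x, mean, sd):
    """Zero-mean normalization."""
    return (x - mean) / sd


def decimal_scaling(x, max_abs):
    """Divide by 10**j, with j the smallest integer giving max_abs / 10**j < 1."""
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return x / 10 ** j, j


x = 80.0
a = min_max(x, 71.56, 130.58)        # minimum and maximum from the table
b = z_score(x, 97.97, 10.48)         # mean and standard deviation
c, j = decimal_scaling(x, 130.58)    # maximum absolute value

print(f"(a) {a:.4f}  (b) {b:.4f}  (c) {c:.3f} with j = {j}")
```

With these inputs the results are roughly 0.143 for (a), -1.715 for (b), and 0.080 for (c) with j = 3, which agrees with the division by 1,000 described in part (c).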

5.6. Data Reduction

- Obtain a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data.

* Data Cube Aggregation

- Aggregate data cubes (items) to reduce the data volume.

  Ex] quarterly sales data => yearly total amount
      branch offices => regional

Year = 2001
Quarter    Sales
Q1         $408,000
Q2         $350,000
Q3         $586,000
Q4         $224,000

Year = 2002
Quarter    Sales
Q1         $408,000
Q2         $350,000
Q3         $586,000
Q4         $224,000

Year = 2003
Quarter    Sales
Q1         $408,000
Q2         $350,000
Q3         $586,000
Q4         $224,000

Annual Sales
Year       Sales
2001       $1,568,000
2002       $2,356,000
2003       $3,594,000

* Dimensionality Reduction

- Eliminate irrelevant or redundant attributes (variables).
- Find a good subset of the original attributes (variables).
  - Forward selection method
  - Backward elimination method
  - Stepwise method
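Aggregating quarterly figures into a yearly total is a one-line roll-up. The sketch below uses the 2001 quarterly sales from the example; the dictionary layout and the function name `yearly_total` are illustrative assumptions.

```python
# Quarterly sales for 2001, in dollars, as listed in the example.
quarterly_2001 = {"Q1": 408_000, "Q2": 350_000, "Q3": 586_000, "Q4": 224_000}


def yearly_total(quarterly_sales):
    """Collapse four quarterly values into one annual value."""
    return sum(quarterly_sales.values())


annual_2001 = yearly_total(quarterly_2001)
print(f"2001: ${annual_2001:,}")
```

The result matches the 2001 row of the Annual Sales table, and four stored values are replaced by one.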

Ex] Consider the classification problem with 6 binary variables:

- Original data set

  Variables                       Class
  X1   X2   X3   X4   X5   X6     Y
  0    0    0    0    0    0      1
  0    1    1    0    1    1      0
  0    0    0    1    0    0      1
  1    1    0    1    1    0      0
  0    0    0    1    0    1      1
  1    1    1    1    1    1      0
  1    0    0    0    0    0      1
  1    1    1    0    1    1      0

- Do we really need all the six variables for the classification problem? Nope!

- Reduced data set

  Variables        Class
  X1   X4   X6     Y
  0    0    0      1
  0    0    1      0
  0    1    0      1
  1    1    0      0
  0    1    1      1
  1    1    1      0
  1    0    0      1
  1    0    1      0

- Decision tree

  X4 = 1?
    Yes: X1 = 1?
           Yes: Y = 0!
           No:  Y = 1!
    No:  X6 = 1?
           Yes: Y = 0!
           No:  Y = 1!
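The claim that three of the six variables are enough can be checked by applying the decision tree to every row of the original data set. The sketch assumes the class coding 1 and 0 for the two class symbols and the tree "X4 = 1? then test X1, otherwise test X6"; the function name `predict` is hypothetical.

```python
# (X1, X2, X3, X4, X5, X6, Y) for the eight rows of the original data set.
data = [
    (0, 0, 0, 0, 0, 0, 1),
    (0, 1, 1, 0, 1, 1, 0),
    (0, 0, 0, 1, 0, 0, 1),
    (1, 1, 0, 1, 1, 0, 0),
    (0, 0, 0, 1, 0, 1, 1),
    (1, 1, 1, 1, 1, 1, 0),
    (1, 0, 0, 0, 0, 0, 1),
    (1, 1, 1, 0, 1, 1, 0),
]


def predict(x1, x4, x6):
    """Decision tree that uses only X1, X4, and X6."""
    if x4 == 1:
        return 0 if x1 == 1 else 1
    return 0 if x6 == 1 else 1


correct = sum(predict(row[0], row[3], row[5]) == row[6] for row in data)
print(f"{correct} of {len(data)} rows classified correctly")
```

Under this coding all eight rows are classified correctly, so X2, X3, and X5 add nothing for this data set.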

* Data Compression

- Obtain a reduced or "compressed" representation of the original data.

(1) Principal Components Analysis (PCA)

- Procedures
  1. Normalize the input data.
  2. Compute c orthonormal vectors that provide a basis for the normalized input data.
  3. Sort those vectors in order of decreasing significance or strength.
  4. Choose a few principal components (vectors) and reconstruct a good approximation of the original data.

  [Figure: original axes X1 and X2 with principal component axes Y1 and Y2]

(2) Discrete Wavelet Transformation (DWT)
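The PCA procedure can be illustrated in two dimensions without any library, because a 2 x 2 covariance matrix has a closed-form eigendecomposition. This is only a sketch under stated assumptions: normalization is reduced to mean-centering, the function name `pca_2d` is hypothetical, and the sample points are made up for illustration.

```python
import math


def pca_2d(points):
    """Return (eigenvalues, (v1, v2), scores) for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]       # step 1 (centering only)

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Steps 2 and 3: eigenvalues in decreasing order, orthonormal eigenvectors.
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    eigenvalues = (half_trace + spread, half_trace - spread)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)           # angle of the first axis
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))

    # Step 4: coordinates of each point on the principal axes.
    scores = [(x * v1[0] + y * v1[1], x * v2[0] + y * v2[1])
              for x, y in centered]
    return eigenvalues, (v1, v2), scores


# Hypothetical data lying close to a straight line.
sample = [(2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1), (6.0, 6.0)]
eigenvalues, (v1, v2), scores = pca_2d(sample)
share = eigenvalues[0] / sum(eigenvalues)
print(f"first component explains {share:.1%} of the variance")
```

Keeping only the first coordinate of each score pair gives the compressed, one-dimensional representation; multiplying it back by `v1` and adding the means gives the approximation of the original data.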

Ex] Do you own a riding lawn mower?

X1 (Income)   X2 (Yard)   Y        X'    Y
     4            7       1        11    1
     6            5       1        11    1
     7            6       1        13    1
     6            8       1        14    1
     8            4       1        12    1
     3            6       0         9    0
     4            2       0         6    0
     4            4       0         8    0
     5            3       0         8    0
     2            4       0         6    0

- Scatter plot

  [Figure: scatter plot of X1 and X2; both axes run from 0 to 10]

- Transformation of the variables, X1 and X2

    X' = X1 + X2

- Perfect classification with a transformed variable, X'

    If X' > 10, then Y = 1
    If X' < 10, then Y = 0
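The lawn-mower rule can be checked against all ten households. The sketch assumes that the transformed variable is the sum X' = X1 + X2, which is what the X' column of the table is consistent with; the function name `classify` is hypothetical.

```python
# (X1 income, X2 yard, Y owns a riding mower), transcribed from the table.
households = [
    (4, 7, 1), (6, 5, 1), (7, 6, 1), (6, 8, 1), (8, 4, 1),
    (3, 6, 0), (4, 2, 0), (4, 4, 0), (5, 3, 0), (2, 4, 0),
]


def classify(x1, x2, threshold=10):
    """Predict ownership from the single transformed variable X' = X1 + X2."""
    return 1 if x1 + x2 > threshold else 0


hits = sum(classify(x1, x2) == y for x1, x2, y in households)
print(f"{hits} of {len(households)} households classified correctly")
```

One derived variable separates the two classes here, which is the point of creating new variables during data transformation. No household in the table has X' exactly equal to 10, so the rule's behavior at the threshold is not tested by these data.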

5.7. Discretization

- Data discretization (a.k.a. data categorization) is a procedure that takes a data set and converts all continuous attributes to categorical.
  - Reduce the data size.
  - Some classification algorithms (e.g., decision tree) accept only categorical data.

  Ex] Use the interval label (hot / warm) instead of the actual value (temperature).

- Before: Fahrenheit 911

  i    Temp, X1    Wind, X2    Play?, Y
  1       84          N           Yes
  2       85          N           Yes
  3       86          Y           Yes
  4       88          Y           No
  5       94          N           No

- After: Warm = {Temp < 87} and Hot = {Temp > 87}

  i    Temp, X1    Wind, X2    Play?, Y
  1     Warm          N           Yes
  2     Warm          N           Yes
  3     Warm          Y           Yes
  4     Hot           Y           No
  5     Hot           N           No

Numerical Data => Discretizer => Categorical Data
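The Warm/Hot conversion is a single cut point. The sketch below applies the cut at 87 to the five temperatures in the "Before" table. The function name `discretize` is hypothetical, and sending a temperature of exactly 87 to "Hot" is an assumption, since the notes define only Temp < 87 and Temp > 87.

```python
temperatures = [84, 85, 86, 88, 94]


def discretize(temp, cut=87):
    """Replace a numeric temperature by its interval label."""
    return "Warm" if temp < cut else "Hot"


labels = [discretize(t) for t in temperatures]
print(labels)
```

The output reproduces the Temp column of the "After" table.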

* How?

- Divide the range of the attribute into several intervals.
- Process
  1. The user determines the number of discrete intervals.
     Ex] Binary discretization
  2. The algorithm determines the width of each interval.

* Number of Intervals

- The more intervals are used, the more of the original information is retained.
- The large number of intervals affects the efficiency of the data mining algorithm and increases the chance of over-fitting.
  - Method 1. The number of intervals should not be smaller than the number of classes we want to describe.
  - Method 2. The number of intervals is sqrt(n), where n is the number of items in the training set.
  - Method 3. The number of intervals is n/(3c), where c is the number of classes.

* Discretization Algorithms

- Supervised vs. unsupervised algorithm
  The distinction is based on the use of the decision attribute (class label). The unsupervised (class-blind) algorithms do not use this information.

(1) Binning Method (unsupervised algorithm)

- Equal interval width discretization
  Find the minimum and the maximum values for each variable and divide this range into a number of user-specified equal width discrete intervals. Sensitive to outliers.

  Ex] Final grade = {A, B, C, D, F}

- Equal frequency discretization
  Sort the values of each variable in an ascending order and then divide them into the user-specified number of intervals in such a way that each interval contains the same number of items.

Ex] Temperature = {Warm, Hot, Very Hot}

X:   82   84   87   88   91   93   95   96   98

- Equal interval width discretization: number of intervals = 3
    (max - min)/3 =
    Interval:
    Discretization:

- Equal frequency discretization: number of intervals = 3
    n items / 3 =
    Interval:
    Discretization:
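Both binning methods can be sketched on the nine temperatures with three intervals. The function names are illustrative. Two details are assumptions the notes leave open: a value on an interior boundary goes to the upper interval, and the equal-frequency version expects the number of items to divide evenly by the number of intervals.

```python
temps = [82, 84, 87, 88, 91, 93, 95, 96, 98]
labels = ["Warm", "Hot", "Very Hot"]


def equal_width(values, labels):
    """Cut [min, max] into len(labels) intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(labels)
    result = []
    for v in values:
        # The maximum itself is placed in the last interval.
        index = min(int((v - lo) / width), len(labels) - 1)
        result.append((v, labels[index]))
    return result


def equal_frequency(values, labels):
    """Sort the values and give each interval the same number of items."""
    ordered = sorted(values)
    size = len(ordered) // len(labels)   # assumes an even split
    return [(v, labels[min(i // size, len(labels) - 1)])
            for i, v in enumerate(ordered)]


by_width = equal_width(temps, labels)
by_frequency = equal_frequency(temps, labels)
print(by_width)
print(by_frequency)
```

Here the equal-width intervals are about 5.33 degrees wide and hold 3, 2, and 4 items, while the equal-frequency intervals hold 3 items each.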

(2) k-Means Clustering Discretization

- Minimize the sum of the squared distances of all vectors in a cluster domain to the cluster center (centroid), v_i.

  [Figure: clusters labeled Cold, Warm, and Hot with centroids v1, v2, and v3]

(3) Holte's One Rule Discretizer (1RD)

- Sort the observed values of a continuous variable and attempt to greedily divide the domain of the variable into bins that each contains only instances of one particular class.
- To avoid one bin for each observed value, the algorithm is constrained to form bins of at least a pre-specified number of observations (except the rightmost bin).
- A minimum bin size of 6 is suggested based on an empirical study.

(4) Recursive Minimal Entropy Partitioning

(5) Chi-Square Test

5.8. Data Mining Software

- Preprocessing in Weka is done both manually (e.g., by ticking relevant attributes) and automatically. Automatic preprocessing is done by filters. These have to be selected from the Filters menu. Filters are added, and resulting new datasets can replace the input dataset in an analysis.

- Selection of relevant variables

  weka.filters.unsupervised.attribute.Remove

  An instance filter that deletes a range of attributes from the dataset.

- Discretization

  weka.filters.unsupervised.attribute.Discretize

  An instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes. Discretization is by simple binning. Skips the class attribute if set.

- Normalization

  weka.filters.unsupervised.attribute.Normalize

  Normalizes all numeric values in the given dataset. The resulting values are in [0,1] for the data used to compute the normalization intervals.

- Missing value

  weka.filters.unsupervised.attribute.ReplaceMissingValues

  Replaces all missing values for nominal and numeric attributes in a dataset with the modes and means from the training data.

The Wizard of Odds

Simpson's Paradox and Data Aggregation

Large corporations and companies are divided into several divisions, subdivisions, departments, and so on. Except for high-level managerial positions, employment decisions usually take place at the departmental or divisional level. Analyzing aggregate employment data in such companies can give rise to a curious phenomenon known as Simpson's paradox.

An instructive and surprising example of Simpson's paradox occurred at the University of California at Berkeley in the 1970s. Examination of applicant data for a 1973 quarter revealed that the overall rate of admission for female applicants to the graduate school was substantially less than the rate of admission for male applicants.

Which departments at Berkeley were responsible for this imbalance?

Two Populations                       Number     Number      Percent
                                      applied    admitted    admitted
Department of Mathematics   Male
                            Female
Department of English       Male
                            Female
Combined                    Male        110         91        82.7%
                            Female      110         19        17.3%

* From Marilyn vos Savant, "Ask Marilyn," Parade Magazine, 4/28/1996, p. 6.

"A company decided to expand, so it opened a factory generating 455 jobs. For the 70 white-collar positions, 200 males and 200 females applied. Of the females who applied, 20% were hired, while only 15% of the males were hired. Of the 400 males applying for the blue-collar positions, 75% were hired, while 85% of the 100 female applicants were hired.

A federal Equal Employment Opportunity enforcement official reviewing the hiring practices noted that many more males were hired at the factory than females, so he decided to investigate. Responding to charges of irregularities in hiring, the company president denied any discrimination, pointing out that in both the white-collar and blue-collar fields, the percentage of female applicants hired was greater than it was for males. He showed that if more females had applied (a circumstance beyond his control), his workforce composition could very well have reflected more female than male employees.

But the government official produced his own statistics, which showed that a female applying for a job at the factory had a better than 58% chance of being denied employment, while male applicants had only a 45% denial rate. As the current law is written, this evidenced a violation.... After a lengthy and costly court battle, the company was fined, the factory was seized and shut down, and the company's president was left wondering what he did wrong. When he asked his lawyer to explain it, the lawyer replied that he never would understand how the government worked.

Can you explain how two opposing statistical outcomes are reached from the same raw data?"

Christopher McLaughlin, Orange Park, Fla.

                                  Number     Number      Percent
                                  applied    admitted    admitted
White-collar position   Male
                        Female
Blue-collar position    Male
                        Female
Combined                Male
                        Female
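The two opposing statistics in the factory story can be reproduced from the figures in the letter. This is a sketch with hypothetical variable names. The number of female blue-collar applicants is not taken as given but derived from the 455 total jobs and the stated hiring rates.

```python
TOTAL_JOBS = 455
WHITE_COLLAR_JOBS = 70


def hired(applied, percent):
    """Number hired, using integer arithmetic on whole percentages."""
    return applied * percent // 100


# White-collar: 200 males (15% hired) and 200 females (20% hired).
male_white, female_white = hired(200, 15), hired(200, 20)

# Blue-collar: 400 males (75% hired); females fill the remaining jobs at 85%.
male_blue = hired(400, 75)
female_blue = TOTAL_JOBS - WHITE_COLLAR_JOBS - male_blue
female_blue_applied = female_blue * 100 // 85

male_applied, male_hired = 200 + 400, male_white + male_blue
female_applied = 200 + female_blue_applied
female_hired = female_white + female_blue

male_denial = 1 - male_hired / male_applied
female_denial = 1 - female_hired / female_applied
print(f"denial rate: male {male_denial:.1%}, female {female_denial:.1%}")
```

Within each job type females are hired at the higher rate, yet the combined denial rate is higher for females (about 58% against 45%). Most female applicants applied for the white-collar positions, where hiring rates were low for everyone.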

Exercise Problems

Problem 1. Suppose that there are 40 male and 20 female students in Dr. Chun's ISDS 4141 class. Dr. Chun polled a stratified random sample of 5 males and 5 females from his class. Each member of the sample was asked, "Did you receive unauthorized help on your last take-home examination?" Two of the 5 males and four of the 5 females in the sample honestly answered "Yes."

Find an unbiased estimate of the proportion of the students who received unauthorized help on the exam.

Problem 2. Rachel Favaloro surveyed 56 riding-mower owners in New Orleans, and found the following descriptive statistics for the household income (× $1,000):

Descriptive Statistics
Mean                 72.5
Variance            100.0
Minimum              50.0
First Quartile       58.4
Second Quartile      69.5
Third Quartile       75.4
Maximum             250.0

A. Rachel is interested in detecting any outliers in the data.

(a) Find the three-sigma limits for the data.

(b) Find the outer-fences for the data.

B. Rachel is also interested in normalizing the original data set.

(c) What is the normalized z score for an observation X = 70?

(d) Use the min-max normalization to transform the value X = 70 onto the range [-1.0, +1.0].