© Young H. Chun
Session 1. Introduction to Data Mining
"Data, data everywhere, but not a thought to think." from Jesse Shera's paraphrase of Coleridge
* Data Mining
! Progress in digital data acquisition and storage technology has resulted in the growth of huge databases.
- Supermarket transaction data
- Credit card usage records
- Telephone call details
- Government statistics
- Medical records
Ex] Wal-Mart makes over 20 million transactions daily. AT&T has 100 million customers and carries on the order of 300 million calls a day on its long distance network.
! Interest has grown in the possibility of tapping these data, of extracting from them information that might be of value to the owner of the database. The discipline concerned with this task has become known as data mining.
* Definition of Data Mining
! Simply stated, “data mining refers to extracting or ‘mining’ knowledge from large amounts of data.”
! The mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining. Thus, “knowledge mining from data” is more appropriate…
* Knowledge Discovery in Databases
! Data mining is often set in the broader context of knowledge discovery in databases (KDD). This term originated in the artificial intelligence (AI) research field.
! Stages of KDD
- Selecting the target data
- Preprocessing the data (cleaning and integration)
- Transforming them if necessary
- Performing data mining to extract patterns and relationships
- Interpreting and assessing the discovered structures
! In ISDS 4141, we will focus primarily on data mining algorithms, rather than the overall process.
! Data mining is an interdisciplinary exercise.
Statistics, database technology, machine learning, pattern recognition, artificial intelligence, and visualization, all play a role.
* Afraid of Mathematics?
No mathematical background beyond high school algebra is required for an understanding of data mining. Mathematical derivations are generally not included in these lecture notes.
The emphasis here will be (1) on providing students with solid and effective evidence concerning the power and applicability of modern data mining methods, (2) on making sure that they can use these techniques, and (3) on indicating the assumptions underlying these techniques, as well as their limitations.
To accomplish these purposes, it is neither necessary nor appropriate to deluge students with mathematics.
1.1. Variable and Data
* Variable of Interest
! As a decision-maker, what information do you need?
Ex] IQ scores of 30,000 LSU students…
Ex] Do you wash your hands after using the restroom?
! Collect data for analysis!
- IQ scores, X = {123, 105, 136, … }
- Wash hands? X = {Yes, No, Yes, Yes, No, …}
! Other examples
- Annual household income?
- Mileage on your BMW?
- Proportion of US voters who support abortion?
- Are you left-handed?
"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." by H. G. Wells
* Type of Data
! Level of measurement and types of measurement scales
How much information does a set of numbers really convey?
Level     Implication / Meaning                 Operations
Nominal   Labeling only                         Counting
Ordinal   Ranking                               Ordering
Interval  Meaningful distances between values   Addition, subtraction, multiplication by constant
Ratio     Meaningful zero-point                 Multiplication & division, etc.
Ex]
! Nominal: gender, undergraduate major, political party.
! Ordinal: grades, BCS rankings, movie classification.
! Interval: temperature, survey questions.
! Ratio: stock price, height, starting salary.
* Another Classification…
! Quantitative (cardinal) data: IQ score?
! Qualitative (attribute) data: Wash your hands?
[Diagram: ratio and interval scales correspond to cardinal data; ordinal and nominal scales correspond to attribute data.]
* Nature of Modeling
It is concerned with optimal decision making in, and modeling of, deterministic and probabilistic systems that originated from real life.
[Diagram: real world system → (assumptions) → assumed real world system → model → prediction.]
* Phases of Modeling
1. Problem Definition
2. Model formulation
3. Model solution
4. Model validity
5. Implementation
* Types of Model
Linear and logistic regression model, neural network, Bayesian model, decision tree, and so on.
1.2. Data Mining Tasks
1. Exploratory data analysis (EDA)
The goal is simply to explore the data without any clear ideas of what we are looking for. Typically, EDA techniques are interactive and visual.
2. Descriptive modeling
The goal is to describe all the data (or the process generating the data).
! Models for the overall probability distribution of the data (density estimation)
! Partitioning of the p-dimensional space into groups (cluster analysis and segmentation)
! Models describing the relationship between variables (dependency modeling)
3. Predictive modeling: Classification and regression
The aim is to build a model that will permit the value of one variable to be predicted from the known values of other variables. In classification, the variable being predicted is categorical, while in regression the variable is quantitative.
4. Discovering patterns and rules
The goal is to detect unusual behavior or association rules.
5. Retrieval by content
The user has a pattern of interest and wishes to find similar patterns in the text or image data set.
A. Prediction
Ex] Police Officers: In a study of differences in levels of community demand for police officers, the following regression equation was fitted to data from 39 towns in Delaware County, Pennsylvania. (E. J. Mathias and C. E. Zech, “The community demand for police officers,” American Journal of Economics and Sociology, 44, 1985, 401-410)
Y = Number of full-time police officers per capita
X1 = Maximum base salary of police officers
X2 = Percentage of population that is black
X3 = Estimated per capita income
X4 = Population density
X5 = Amount of intergovernmental grants per capita
X6 = Number of miles from center city Philadelphia
X7 = Percentage of population that is male and between 12 and 21 years of age
Ex] Housing Price: To explain the selling prices of houses, the following model was fitted to a sample of 815 sales. (B. A. Newsome, “Adjusting comparable sales for vinyl siding,” The Appraisal Journal, 59, 1991, 92-95)
Y = Selling price of house, in dollars
X1 = Square footage of living area
X2 = Size of garage, in number of cars
X3 = Age of house, in years
X4 = Dummy variable taking the value 1 if the house has a fireplace, and 0 otherwise
X5 = Dummy variable taking the value 1 if the house has brick siding and 0 if it has vinyl siding
! Multiple linear regression analysis
! Neural network
Ex] Box Office Success: Predict whether a film will be a hit or a miss at the box office long before it is even made. (R. Sharda and D. Delen, "Predicting box-office success of motion pictures with neural networks," Expert Systems with Applications, 30, 2006, 243-254)
Y = Gross revenue
X1 = Rating by censors
X2 = Competition from other films at the time of release
X3 = Strength of the cast
X4 = Genre
X5 = Special effects
X6 = Whether it is a sequel
X7 = Number of theatres it opens in
Ex] Netflix Prize: Substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences. Improve it enough and you win a million dollars. (http://www.netflixprize.com/)
The training data set consists of more than 100 million ratings from over 480 thousand randomly-chosen, anonymous customers on nearly 18 thousand movie titles.
The test set contains over 2.8 million customer/movie id pairs, but with the ratings withheld. You must provide predictions for all the withheld ratings for each customer/movie id pair in the test set.
Netflix will score your predictions by computing the square root of the averaged squared difference between each prediction and the actual rating (the root mean squared error or "RMSE") in the test set.
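The RMSE scoring rule described above can be sketched in a few lines of Python. The function name `rmse` and the four ratings below are made up for illustration; they are not Netflix data.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    # Average the squared differences, then take the square root.
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical predictions and withheld ratings on a 1-5 scale (illustrative only).
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4, 4, 1, 5]
score = rmse(predicted, actual)
print(f"RMSE = {score:.4f}")
```

A lower RMSE means the predictions are closer, on average, to the withheld ratings.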
B. Classification
Ex] Consider two groups in Baton Rouge: riding-mower owners and those without riding mowers. In order to identify the best sales prospects for an intensive sales campaign, a riding-mower manufacturer is interested in classifying families as prospective owners or non-owners on the basis of income and lot size.
Ex] The Titanic dataset gives the values of four categorical attributes for each of the 2201 people on board the Titanic when it struck an iceberg and sank. The attributes are age (adult or child), gender (male or female), social class (first class, second class, third class, or crewmember), and whether or not the person survived. The question of interest is how survival relates to the other attributes.
! Logistic regression
- Logit model
- Probit model
- Complementary log-log model
! Bayesian classification
- Prior information
- Misclassification costs
- Likelihood function
! Density estimation
- Kernel method
- Multivariate binary kernel method
! Decision tree
C. Cluster Analysis
Ex] J.C. Penney creates special catalogs targeted to various demographic groups based on attributes such as income, location, and physical characteristics of potential customers. To determine the target mailings of the various catalogs and to assist in the creation of new, more specific catalogs, the company performs a clustering of potential customers based on the determined attribute values. The results of the clustering exercise are then used by management to create special catalogs and distribute them to the correct target population based on the cluster for that catalog.
! Classification pertains to a known number of groups, and the operational objective is to assign new observations to one of these groups.
! Cluster analysis is a more primitive technique in that no assumptions are made concerning the number of groups or the group structure. Grouping is done on the basis of similarities or distances.
! Distance measure for variables and items
! Distance measure for categorical variables
! Linkage methods
! k-means method
! Nearest mean method
D. Association Rules and Sequence Discovery
Ex] A grocery store chain keeps a record of weekly transactions where each transaction represents the items bought during one cash register transaction. The executives of the chain receive a summarized report of the transactions indicating what types of items have sold at what quantity. In addition, they find that 100% of the time that PeanutButter is purchased, so is Bread. In addition, 33% of the time PeanutButter is purchased, Jelly is also purchased.
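The percentages in this example are rule confidences, i.e., conditional relative frequencies. A minimal sketch follows; the five transactions are hypothetical and were chosen only so that they reproduce the 100% and 33% figures, and the helper name `confidence` is an assumption rather than part of any data mining package.

```python
def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent => consequent, i.e. P[consequent | antecedent]."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        raise ValueError("antecedent never occurs in the transactions")
    with_both = [t for t in with_antecedent if consequent <= set(t)]
    return len(with_both) / len(with_antecedent)

# Hypothetical cash register transactions (illustrative only).
transactions = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]

pb_bread = confidence(transactions, {"PeanutButter"}, {"Bread"})
pb_jelly = confidence(transactions, {"PeanutButter"}, {"Jelly"})
print(f"PeanutButter => Bread: {pb_bread:.0%}")
print(f"PeanutButter => Jelly: {pb_jelly:.0%}")
```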
! Apriori algorithm
! Frequent pattern growth (FP-growth) algorithm
Ex] The Webmaster at LSU periodically analyzes the Web log data to determine how visitors access the Web pages. He is interested in determining what sequences of pages are frequently accessed. He determines that 70 percent of the visitors to page A follow one of the following patterns of behavior: (A, B, C) or (A, D, B, C) or (A, E, B, C). He then decides to add a link directly from page A to page C.
! Hidden Markov chain
E. Time Series Data Mining
! Use distance measures to determine the similarity between different time series.
! Examine the structure of the line to determine (and perhaps classify) its behavior.
! Use the historical time series plot to predict future values.
1.3. Data Mining Software
* Types of Data Mining Software
1. Application-specific software
Aimed at providing solutions to end-users for common tasks.
- Unica for customer relationship management
- Urban Science for location and distribution
2. Technique-specific software
Focused on a few data mining methods.
- Decision trees: CART (Salford Associates)
- Artificial neural network: NeuralWorks (Neuralware)
- Rule induction: WizWhy (Wizsoft), See5 (Rulequest)
3. General purpose (horizontal) data mining tools
Designed for data mining analysts who may be statisticians, business analysts or experts in a particular business domain.
Enterprise Miner (SAS), Clementine (SPSS), Intelligent Miner (IBM), Teraminer (Retrograde Data Systems), Insightful Miner, Darwin (Oracle)
- These tools are powerful, comprehensive, and easy to use.
- They require substantial learning effort and are very expensive.
A. SAS Enterprise Miner
! Streamline the entire data mining process from data access to model deployment by supporting all necessary tasks within a single, integrated solution, all while providing the flexibility for efficient workgroup collaborations.
! Provide advanced predictive and descriptive modeling tools and algorithms, including decision trees, neural networks, auto-neural networks, memory-based reasoning, linear and logistic regression, clustering, associations, time series and more.
! SEMMA data mining approach combines a structured process with the logical organization of the tools needed to support each of the five steps:
Sample your data by extracting a portion of a large data set big enough to contain the significant information, yet small enough to manipulate quickly.
Explore your data by searching for unanticipated trends and anomalies in order to gain understanding and ideas.
Modify your data by creating, selecting, and transforming the variables to focus the model selection process.
Model your data by allowing the software to search automatically for a combination of data that reliably predicts a desired outcome.
Assess your data by evaluating the usefulness and reliability of the findings from the data mining process and estimating how well the model performs.
! Once you have developed the champion model using the SEMMA-based mining approach, it then needs to be deployed to score new customer cases. Model deployment is the end result of data mining - the final phase in which the ROI from the mining process is realized.
B. IBM DB2 Intelligent Miner (IM)
! Intelligent Miner is a family of data analysis software available from IBM.
! IBM produces a vast array of software for enterprise customers.
! Its products are often available on a larger number of platforms, and work well with many other products.
! The Intelligent Miner (IM) family
1. Intelligent Miner for Data
2. Intelligent Miner for Text
3. Intelligent Miner
- Modeling
- Scoring
- Visualization
C. Weka
Weka is open source data mining software: a collection of machine learning algorithms for solving real-world data mining problems. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
D. XL Miner
XL Miner is a data mining add-in for Microsoft Excel. It offers a full repertoire of techniques for classification, prediction, affinity analysis, data exploration, and data reduction.
The Wizard of Odds
* Background
Ms. Marilyn vos Savant, listed in the “Guinness Book of World Records Hall of Fame” for “Highest IQ” in the world, replied that, "Yes; you should switch. The first door has a one-third chance of winning, but the second door has a two-thirds chance."
When she innocently printed the reply in the magazine supplement to many Sunday newspapers, she had no idea that it would provoke a national controversy. She received thousands of letters, nearly all insisting that, because two options remained, the chances were even.
The most vehement criticism has come from statisticians and scientists, who have alternated between gloating at her and lamenting the nation's innumeracy.
Whose side are you on?
Dear Marilyn:
Suppose you're on a game show, and you're given the choice of three doors. Behind one door is a car, the others, goats. You pick a door, say #1, and the host, who knows what's behind the doors, opens another door, say #3, which has a goat. He says to you, "Do you want to pick door #2?" Is it to your advantage to switch your choice of doors?
Craig F. Whitaker, Columbia, Maryland.
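One way to take sides is to simulate the game. The sketch below is a Monte Carlo experiment under stated assumptions: the car is placed uniformly at random, the host always opens a goat door other than the contestant's pick, and the host chooses at random when two goat doors are available. The function names and the seed are arbitrary.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host knows where the car is and opens a goat door other than the pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=1):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}  switch: {switch_rate:.3f}")
```

Under these assumptions the estimated winning rates settle near one-third for staying and two-thirds for switching.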
A. Letters from Academia
! Since you seem to enjoy coming straight to the point, I'll do the same. In the following question and answer, you blew it! As a professional mathematician, I'm very concerned with the general public's lack of mathematical skills. Please help by confessing your error and in the future being more careful. Robert Sachs, Ph. D. George Mason Univ.
! You blew it, and you blew it big! ... There is enough mathematical illiteracy in this country, and we don't need the holder of the world's highest I.Q. propagating more. Shame!
S. S., Ph.D. University of Florida
! Your answer to the question is in error. But if it is any consolation, many of my academic colleagues have also been stumped by this problem.
Barry Pasternack, Ph. D. California Faculty Association
! You're in error, but Albert Einstein earned a dearer place in the hearts of people after he admitted his errors. Frank Rose, Ph.D. University of Michigan
! I have been a faithful reader of your column, and I have not, until now, had any reason to doubt you. However, in this matter ( for which I do have expertise ), your answer is clearly at odds with the truth. James Rauff, Ph. D. Millikin University
! May I suggest that you obtain and refer to a standard textbook on probability before you try to answer a question of this type again?
Charles Reid, Ph.D. University of Florida
! Your logic is in error, and I am sure you will receive many letters on this topic from high school and college students. Perhaps you should keep a few addresses for help with future columns. W. Robert Smith, Ph.D. Georgia State University
! You are utterly incorrect about the game show question, and I hope this controversy will call some public attention to the serious national crisis in mathematical education. If you can admit your error, you will have contributed constructively towards the solution of a deplorable situation. How many irate mathematicians are needed to get you to change your mind? E. Ray Bobo, Ph.D. Georgetown University
! I am in shock that after being corrected by at least three mathematicians, you still do not see your mistake. Kent Ford, Dickinson State University
! You are the goat! Glenn Calkins, Ph.D. Western State College
! You are wrong, but look at the positive side. If all those Ph.D.'s were wrong, the country would be in some very serious trouble.
Everett Harman, Ph.D. U.S. Army Research Institute
B. Tom and Ray's Car Talk Radio Show
! What could be more fun than proving a bunch of pompous academics wrong!!!
Session 2. Review of Basic Tools in Data Mining I
"Data! Data! Data!" he cried impatiently. "I can't make bricks without clay."
by Sherlock Holmes
* Type of Data
• Univariate data, X
- What is a typical ( summary ) value? Central tendency…
- How diverse are these items? Dispersion (variation)…
- Are there any individuals that require special attention? Outliers…
• Bivariate data, ( X1 and X2 ) or ( X and Y )
- Is there a simple relationship between the two?
- How strongly are they related?
- Can you predict one from the other?
- Are there any individuals that require special attention?
• Multivariate data, (Y, X1, X2,…)
- Is there a relationship among them?
- How strongly are they related?
- Can you predict one from the others?
- Are there any individuals that require special attention?
2.1. Population and Sample
* What is Statistics?
Statistics is the art and science of
(A) collecting, organizing, summarizing, presenting and analyzing data, and
(B) drawing valid conclusions and making reasonable decisions on the basis of such analysis.
* Type of Statistics
A. Descriptive statistics ( deductive statistics )
B. Statistical inferences (inductive, or inferential statistics)
                      Descriptive statistics   Inferential statistics
! Small population             X
! Large population
  - Census                     X
  - Sampling                   X                        X
"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." by H. G. Wells
[Diagram: a population of size N (population parameters) and a sample of size n (sample statistics).]
* Population Parameters
! Population (Universe)
Total set of elements of interest for a given problem.
! Population parameters: $\mu$, $\sigma^2$, $\pi$
- Numerical characteristic of the population.
- Unknown constants.
Type of data and population parameters
A. Quantitative data
1. Population mean $\mu$ for the central tendency: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
2. Population variance $\sigma^2$ for dispersion: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
B. Qualitative data with two categories
3. Population proportion $\pi$: $\pi = \frac{x}{N}$
Ex] Final grades this semester = {4, 3, 3, 2}
! Population mean $\mu$ =
! Population variance $\sigma^2$ =
! Population standard deviation $\sigma$ =
Ex] 20 females in a class of 50 students
! Population proportion, $\pi$ =
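The two exercises above can be checked with a short script. This is a sketch that treats the four grades as the entire population, so the variance divides by N; the variable names are arbitrary.

```python
import math

grades = [4, 3, 3, 2]            # the whole population of final grades
N = len(grades)

mu = sum(grades) / N                                 # population mean
sigma2 = sum((x - mu) ** 2 for x in grades) / N      # population variance (divide by N)
sigma = math.sqrt(sigma2)                            # population standard deviation

proportion = 20 / 50             # 20 females in a class of 50 students

print(f"mean = {mu}, variance = {sigma2}, std dev = {sigma:.4f}, proportion = {proportion}")
```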
* Sample Statistics
- Sample: a small part (or subset) of the population used to gain information about the whole.
- Sample statistics: $\bar{x}$, $s^2$, $p$
- Numerical characteristics of the sample.
- They are known once we have taken a sample, but they are random variables, changing from sample to sample.
Type of data and sample statistics
A. Quantitative data
1. Sample mean $\bar{x}$ for the central tendency: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
2. Sample variance $s^2$ for dispersion: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
B. Qualitative data with two categories
3. Sample proportion $p$: $p = \frac{x}{n}$
Ex] IQ scores of 3 students in the sample = {106, 124, 130}
! Sample mean $\bar{x}$ =
! Sample variance $s^2$ =
! Sample standard deviation $s$ =
Ex] 6 students are left-handed in a sample of 50 students
! Sample proportion, $p$ =
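For the sample exercises, Python's standard `statistics` module already uses the n - 1 divisor in `variance` and `stdev`, so a sketch of the calculation is short. Variable names are arbitrary.

```python
import statistics

iq_scores = [106, 124, 130]      # a sample of 3 students

x_bar = statistics.mean(iq_scores)       # sample mean
s2 = statistics.variance(iq_scores)      # sample variance, divides by n - 1
s = statistics.stdev(iq_scores)          # sample standard deviation

p = 6 / 50                       # 6 left-handed students in a sample of 50

print(f"mean = {x_bar}, variance = {s2}, std dev = {s:.2f}, proportion = {p}")
```

By contrast, `statistics.pvariance` divides by N and is the appropriate call when the data are the whole population.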
2.2. Introduction to Probability
* Definition
P[A] = the probability that the outcome of the random experiment is an element of the set A.
1. Relative frequencies
If a trial is performed a large number of times in an independent manner, the fraction of times that event A occurs will approach, as a limit, the value P[A].
Ex] Monte Carlo Simulation: Toss a fair coin
[Plot: relative frequency of heads (vertical axis, 0 to 1) against the number of tosses (horizontal axis, 0 to 200).]
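The relative-frequency idea can be illustrated with a small simulation. This sketch assumes a fair coin and uses Python's pseudo-random generator; the function name, the seed, and the number of tosses (larger than the 200 on the plot so that the limit is easier to see) are arbitrary choices.

```python
import random

def running_proportion_of_heads(n_tosses, seed=0):
    """Return the proportion of heads after each of n_tosses fair coin tosses."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < 0.5      # True counts as 1 head
        proportions.append(heads / toss)
    return proportions

props = running_proportion_of_heads(100_000)
for n in (10, 200, 100_000):
    print(f"after {n:>6} tosses: {props[n - 1]:.4f}")
```

The early proportions wander, while the later ones stay close to 0.5.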
“I returned, and saw under the sun, that the race [is] not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all.” --- Ecclesiastes 9.11
2. Subjective probability
- The probability P[A] is the degree of certainty one feels that event A will occur.
- One-time trials: unique, non-repeatable trials.
[Diagram: 1. Risky option: win $1,000 with probability p, or lose and receive $0 with probability 1-p. 2. Certainty option: a sure amount of $.]
* Simple Probability
If there are n possible outcomes, all equally likely, and an event X occurs in k of these outcomes, we say that the probability of X is k/n, denoted as $P[X] = \frac{k}{n}$.
Ex] Indecent Proposal: In the dice game “craps”, the shooter (usually one of the players with the largest bet) throws two dice and the sum of the two numbers that appear is observed.
! Possible outcomes, S
1, 1 1, 2 1, 3 1, 4 1, 5 1, 6
2, 1 2, 2 2, 3 2, 4 2, 5 2, 6
3, 1 3, 2 3, 3 3, 4 3, 5 3, 6
4, 1 4, 2 4, 3 4, 4 4, 5 4, 6
5, 1 5, 2 5, 3 5, 4 5, 5 5, 6
6, 1 6, 2 6, 3 6, 4 6, 5 6, 6
! Probability of X (Sum of the two numbers)
[Bar chart: P[X] (vertical axis, 0.00 to 0.20) for each sum X = 2, 3, ..., 12.]
P[Sum=7] =
P[Snake eyes] =
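Because the 36 outcomes are equally likely, the distribution of the sum can be obtained by counting. The following sketch enumerates the outcomes with exact fractions; the variable names are arbitrary.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))        # 36 equally likely pairs
counts = Counter(a + b for a, b in outcomes)
dist = {total: Fraction(c, len(outcomes)) for total, c in sorted(counts.items())}

p_seven = dist[7]
p_snake_eyes = Fraction(sum(1 for o in outcomes if o == (1, 1)), len(outcomes))

for total, prob in dist.items():
    print(f"P[Sum={total:>2}] = {prob}")
print("P[Snake eyes] =", p_snake_eyes)
```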
* Random Experiment, Outcomes, and Events
[Diagram: experiment → outcome in the sample space (Tail, Head) → random variable → real-valued space (0, 1) → probability space.]
* Sample Space:
The set of all possible experimental outcomes.
• Finite
• Infinite
- Countably infinite
- Uncountably infinite
• Discrete vs. Continuous random variables
Ex]
1. Final grade in this course: S =
2. Number of marriage proposals until accepted: S =
3. Length of your honeymoon: S =
Case 1: Endless honeymoon
Case 2: Love is short; marriage is long…
Ex] A coin is tossed three times and observed to be either a head or a tail each time.
• Sample space
Finite sample space.
S =
• Random variable:
Let X be the number of heads.
Discrete random variable
• Probability distribution
X
P[X]
• Bar graph (Histogram)
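The sample space and the distribution of X for this example can be enumerated directly. The sketch assumes a fair coin, so that the eight outcomes are equally likely; names are arbitrary.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

sample_space = list(product("HT", repeat=3))            # 8 equally likely outcomes
heads_counts = Counter(outcome.count("H") for outcome in sample_space)
pmf = {x: Fraction(heads_counts[x], len(sample_space)) for x in range(4)}

print("S =", ["".join(o) for o in sample_space])
for x, prob in pmf.items():
    print(f"P[X={x}] = {prob}")
```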
Ex] A fair coin is flipped successively at random until the first head is observed. Let X denote the number of flips of the coin that are required.
• Sample space
S =
Countably infinite sample space.
• Random variable:
Let X be the number of flips.
Discrete random variable
• Probability distribution
X
P[X]
• Bar graph (Histogram)
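For this example the first head arrives on flip k only if the first k - 1 flips are tails, so P[X = k] = (1/2)^k for a fair coin with independent flips. A short sketch with an arbitrary function name:

```python
from fractions import Fraction

def p_first_head_on(k):
    """P[X = k]: k - 1 tails followed by one head, for a fair coin."""
    return Fraction(1, 2) ** k

for k in range(1, 6):
    print(f"P[X={k}] = {p_first_head_on(k)}")

# The probabilities over the countably infinite sample space sum to 1;
# the first 20 terms already leave only 1/2**20 unaccounted for.
partial_sum = sum(p_first_head_on(k) for k in range(1, 21))
print("sum of first 20 terms =", float(partial_sum))
```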
Ex] Starting salary
* Probability and Statistics
Probability is the inverse of statistics.
- Statistics helps you go from observed data to generalizations about how the world works.
- Probability goes the other direction: If you assume you know how the world works, then you can figure out what kinds of data you are likely to see and the likelihood of each.
• Probability: How the world works → What will happen
• Statistics: What happened → How the world works
"A career is nothing to leave to chance." by American Statistical Association
Ex] "Ask Marilyn," Parade Magazine, (August 3, 1997), p. 15
It is a well-established fact that in any randomly
chosen group of 50 people, it is virtually certain that
two will have birthdays on the same date. Since there
are 365 days in a year, I can't understand why this is
the case. Can you explain this phenomenon?
! P[None] =
! P[At least one] =
If n = 23, then P[At least one] =
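The birthday question can be answered through the complement: first compute the probability that no two people share a birthday, then subtract from 1. The sketch assumes 365 equally likely birthdays and independent people (no leap years, no twins); the function name is arbitrary.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_none = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_none *= (days - i) / days
    return 1.0 - p_none

p50 = p_shared_birthday(50)
p23 = p_shared_birthday(23)
print(f"n = 50: P[At least one] = {p50:.4f}")
print(f"n = 23: P[At least one] = {p23:.4f}")
```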
* Probability and Odds
! The odds favoring event A over event B (stated as “1 in k” when the ratio equals 1/k) are
$O(A:B) = \frac{P[A]}{P[B]}$.
! The odds favoring event A (over $A^c$) are
$O(A:A^c) = \frac{P[A]}{P[A^c]}$.
! Relationship:
$O(A) = \frac{P[A]}{1 - P[A]}$
$P[A] = \frac{O(A)}{1 + O(A)}$
! The odds of dying from an injury in 2000 were 1 in 1,820.
! The lifetime odds of dying from an injury for a person born in 2000 were 1 in 24.
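The relationship between odds and probability is a pair of inverse conversions. The sketch below uses exact fractions and hypothetical function names; the value 1/10 is only an illustrative input.

```python
from fractions import Fraction

def odds_from_probability(p):
    """O(A) = P[A] / (1 - P[A])."""
    return p / (1 - p)

def probability_from_odds(o):
    """P[A] = O(A) / (1 + O(A))."""
    return o / (1 + o)

o = Fraction(1, 10)                      # illustrative odds of 1/10
p = probability_from_odds(o)
print("P[A] =", p)                       # 1/11
print("O(A) =", odds_from_probability(p))
```

Converting in one direction and then back returns the starting value, which is a quick check that the two formulas are consistent.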
Ex] If the odds are 1 in 10 favoring event A, then the probability of event A is 1/11. In a fair game, the payoff ratio should be 10 to 1 should event A occur. In other words, the payoff odds are 10 to 1.
[Diagram: win $10 with probability p = 1/11; lose $1 with probability 10/11.]
Ex] e-mail from a former student
I know that the correct odds for rolling a seven in craps are 1 in 5. There are 6 chances to roll a seven out of 36 possibilities. This is 1/6, or, expressed in odds, 5 unfavorable numbers to 1 favorable.
If the casino's payout in odds is 4 to 1, I am told that the casino's edge is 16.67%. I understand the edge to be the difference between the player's chance of winning the bet (correct odds) and the casino's payout. I would like to know how to calculate the 16.67%.
I am off by a factor of 2 when I work out the problem. Thanks for the help.
E(X) =
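One way to arrive at the 16.67% is through the expected value of a $1 bet. The sketch assumes that "4 to 1" means a winning bet earns $4 in addition to the returned stake and that a losing bet forfeits the $1; the variable names are arbitrary.

```python
from fractions import Fraction

p_win = Fraction(6, 36)          # six ways to roll a seven out of 36 outcomes
payout_per_dollar = 4            # assumed meaning of "4 to 1": win $4 per $1 bet

# Expected gain of the player per $1 bet: win $4 with 1/6, lose $1 with 5/6.
expected_value = p_win * payout_per_dollar + (1 - p_win) * (-1)
house_edge = -expected_value

print("E(X) =", expected_value)                     # -1/6
print(f"casino edge = {float(house_edge):.2%}")     # 16.67%
```

With the fair payout of 5 to 1 the same calculation gives an expected value of zero.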
* Axioms of Probability:
Let S be a sample space. Then,
A1: For every event A, $0 \le P[A] \le 1$
A2: $P[S] = 1$
A3: $P[A \cup B] = P[A] + P[B]$ if A and B are mutually exclusive.
* Properties of Probability
$P[\emptyset] = 0$
$P[A^c] = 1 - P[A]$
If $A \subset B$, then $P[A] \le P[B]$
$P[A \cup B] = P[A] + P[B] - P[A \cap B]$
Ex] Exactly three runners, Andy, Bob, and Cade, are in a race: Andy is twice as likely to win as Bob and Bob is twice as likely to win as Cade.
• P[Andy wins the race]?
• P[Andy or Bob wins the race]?
Runner: Andy, Bob, Cade
P[win]:
Ex] Marilyn vos Savant, "Ask Marilyn", Parade Magazine, (Aug. 4, 1996), p. 6.
Say that Tom studied a lot of mathematics in college and was campus chess champion too. If that's the case, which of the following statements is more likely to be true?
A. “Tom is now a mathematician.”
B. “Tom is now a mathematician and plays chess as a hobby.”
2.3. Conditional Probability
* Conditional Probability
! We have two events, A and B, whose occurrences are in some way connected.
! The event A is unknown, while the event B is known.
! The conditional probability P[A|B] of the event A, given that the event B has occurred, is
$P[A|B] = \frac{P[A \cap B]}{P[B]}$
Ex] Throw two dice and observe the two numbers that appear.
! 36 possible outcomes (X, Y)
1, 1 1, 2 1, 3 1, 4 1, 5 1, 6
2, 1 2, 2 2, 3 2, 4 2, 5 2, 6
3, 1 3, 2 3, 3 3, 4 3, 5 3, 6
4, 1 4, 2 4, 3 4, 4 4, 5 4, 6
5, 1 5, 2 5, 3 5, 4 5, 5 5, 6
6, 1 6, 2 6, 3 6, 4 6, 5 6, 6
! P[X=6] =
! P[X=6 | Y ≥ 5] =
! P[X=6 | X+Y ≥ 10] =
! P[X=6 | X ≥ Y] =
! P[X=6 | X ≥ Y and X+Y ≥ 10] =
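These conditional probabilities can be found by restricting the 36 outcomes to those satisfying the condition and counting. The sketch below reads the comparison symbol in each condition as "greater than or equal to", which is an assumption about the original notation; the helper name `conditional` is hypothetical.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))     # (x, y) pairs, equally likely

def conditional(event, given=lambda x, y: True):
    """P[event | given] by counting equally likely outcomes."""
    conditioning = [(x, y) for x, y in outcomes if given(x, y)]
    favorable = [(x, y) for x, y in conditioning if event(x, y)]
    return Fraction(len(favorable), len(conditioning))

def x_is_6(x, y):
    return x == 6

p1 = conditional(x_is_6)
p2 = conditional(x_is_6, lambda x, y: y >= 5)
p3 = conditional(x_is_6, lambda x, y: x + y >= 10)
p4 = conditional(x_is_6, lambda x, y: x >= y)
p5 = conditional(x_is_6, lambda x, y: x >= y and x + y >= 10)
print(p1, p2, p3, p4, p5)
```

Note that conditioning on Y alone leaves the probability unchanged, which is what independence of the two dice would suggest.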
Ex] Fighting Tigers: LSU Tigers plays 60% of its games at home and 40% of its games away. Given that the team has a home game, there is a 0.8 probability of winning. Given that the team has an away game, there is a 0.4 probability of winning.
1. Tree diagram
[Tree diagram: the first branch is Home or Away; each then branches into Win or Lose.]
2. Probability Table
Game      Result: Win    Lose    Total
Home
Away
Total                            1.00
(a) Given that the team has a home game, what is the probability of winning? P[ Win | Home ] =
(b) If the team wins on a particular Saturday, what is the probability that the game was played at home? P[ Home | Win ] =
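The probability table for the Fighting Tigers example can be filled in by multiplying along each branch of the tree, and part (b) then follows from the definition of conditional probability. A minimal sketch with arbitrary variable names:

```python
p_home, p_away = 0.6, 0.4
p_win_given_home, p_win_given_away = 0.8, 0.4

# Joint probabilities: multiply along each branch of the tree.
joint = {
    ("Home", "Win"): p_home * p_win_given_home,
    ("Home", "Lose"): p_home * (1 - p_win_given_home),
    ("Away", "Win"): p_away * p_win_given_away,
    ("Away", "Lose"): p_away * (1 - p_win_given_away),
}

p_win = joint[("Home", "Win")] + joint[("Away", "Win")]
p_home_given_win = joint[("Home", "Win")] / p_win

for cell, prob in joint.items():
    print(cell, round(prob, 2))
print(f"P[Win] = {p_win:.2f}, P[Home | Win] = {p_home_given_win:.2f}")
```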
Ex] Market Basket Analysis: 100 transactions with four items.
Transaction: Wine, Cheese, Beer, Chip
1: x x x x
2: x x x
3: x x
4: x x x
5: x
6: x x x
7: x x
8: x
9: x x
10: x x
...
100: ...
Total: 4 6 5 8
Find the conditional probability (aka, confidence) of the following association rules in data mining.
(a) “Wine => Cheese”
(b) “Cheese => Wine”
              Cheese
              Yes    No    Total
Wine   Yes                 4
       No                  96
Total         6      94    100
(c) “Beer => Chip”
(d) “Chip => Beer”
(e) “Beer => Wine”
(f) “Wine and Cheese => Chip”
* Independent Events
If P[A] = P[A|B], then A and B are independent.
Ex] Your final grade in this course.
P[A] =
P[A | ACT score > 25] =
P[A | Age > 25] =
P[A | Checking account balance > $5,000] =
Ex] Fighting Tigers
P[ Win ] =
P[ Win | Home game ] =
• If independent, then $P[A \cap B] = P[A] \cdot P[B]$
* Two Laws of Probability:
Addition law (A or B)
• $P[A \cup B] = P[A] + P[B] - P[A \cap B]$
• If mutually exclusive, $P[A \cap B] = 0$. Thus, $P[A \cup B] = P[A] + P[B]$
Multiplication law (A and B)
• $P[A \cap B] = P[A] P[B|A] = P[B] P[A|B]$
• If independent, $P[B|A] = P[B]$ or $P[A|B] = P[A]$. Thus, $P[A \cap B] = P[A] P[B]$
2.4. Bayesian Analysis
* Two Approaches in Statistics:
1. Classical approach (Frequentist)
- Use only empirical evidence; i.e., the evidence contained in samples from the population or process of interest.
- Frequency interpretation of probability
2. Bayesian approach
- Use any and all available information, whether it be sample information or information of some other nature.
- Subjective interpretation of probability: degree of belief
Ex] Predict your batting average at the end of the baseball season.
Sample information: 3 hits in 4 at-bats during the first game of the season.
! Classical approach: Use only the sample information:
! Bayesian approach: Use not only the sample information, but also (1) prior information and (2) loss function
[Diagram: prior probabilities $P[\theta]$ and new information $x$ from research or experimentation feed into Bayesian analysis, which yields posterior (revised) probabilities $P[\theta|x]$.]
Ex] Want to be a lawyer?
On the night of August 21, 2004, a man was struck by a speeding taxi as he crossed the street. The city where the accident occurred has only two taxi companies, Blue Cab and Green Cab. Blue Cab has only 15% of the taxis in the city. An eyewitness has testified that she thought the hit-and-run taxi was blue. The man sued the Blue Cab Company for his medical expenses.
At the trial, the man’s lawyer shows that the eyewitness is 80% reliable in identifying the color of taxis. That is, she was able to identify correctly the color of taxis 80% of the time, under conditions like those of the night of the accident.
The lawyer claims that it is extremely likely that the man was actually hit by a blue cab. Do you agree? Why, or why not?
! P[Speeding cab was blue | Testified it as blue] =
! P[Speeding cab was green|Testified it as blue] =
[Probability tree: the speeding cab was Blueacc or Greenacc; in each case the witness testifies it as Bluewit or Greenwit]
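The taxi question is a direct application of Bayes' rule, so a small numerical sketch may help. It assumes the 15% share of blue cabs is the prior and that the witness's 80% reliability applies equally to both colors; the function name `posterior_blue` is only illustrative.

```python
def posterior_blue(prior_blue: float, reliability: float) -> float:
    """P[cab was blue | witness says blue] by Bayes' rule."""
    joint_blue = prior_blue * reliability               # blue cab, correctly called blue
    joint_green = (1 - prior_blue) * (1 - reliability)  # green cab, wrongly called blue
    return joint_blue / (joint_blue + joint_green)


p_blue = posterior_blue(prior_blue=0.15, reliability=0.80)
print(f"P[Blue  | testified blue] = {p_blue:.4f}")      # 0.4138
print(f"P[Green | testified blue] = {1 - p_blue:.4f}")  # 0.5862
```

Under these assumptions a green cab remains the more likely culprit, because the low prior for blue cabs outweighs a single 80%-reliable testimony.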
# Another eyewitness has testified that she thought the taxi was blue. Suppose that her reliability is also 80%.
! Prior probabilities
P[Blueacc] =
P[Greenacc] =
! Joint probabilities
P[Blueacc, Bluewit, Bluewit] =
P[Greenacc, Bluewit, Bluewit] =
! Posterior probabilities
P[Blueacc | Bluewit, Bluewit] =
P[Greenacc | Bluewit, Bluewit] =
Y. H. Chun, "Bayesian Analysis of the Sequential Inspection Plan via the Gibbs Sampler," Operations Research, Forthcoming.
Y. H. Chun and R. T. Sumichrast, "Bayesian Inspection Model with the Negative Binomial Prior in the Presence of Inspection Errors," European Journal of Operational Research, Forthcoming.
[Probability tree: the speeding cab was Blueacc or Greenacc; witness #1 testifies it as Bluewit or Greenwit; witness #2 testifies it as Bluewit]
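With a second witness, one way to organize the arithmetic is sequential updating: the posterior after the first testimony becomes the prior for the second. The sketch below assumes the two witnesses err independently of each other given the true color, which the example does not state explicitly.

```python
def update(prior_blue: float, reliability: float) -> float:
    """One Bayesian update after a witness testifies 'blue'."""
    numerator = prior_blue * reliability
    return numerator / (numerator + (1 - prior_blue) * (1 - reliability))


belief = 0.15  # prior share of blue cabs
history = []
for witness in (1, 2):
    belief = update(belief, reliability=0.80)
    history.append(belief)
    print(f"after witness #{witness}: P[Blue] = {belief:.4f}")
# after witness #1: P[Blue] = 0.4138
# after witness #2: P[Blue] = 0.7385
```

The same value follows from the joint probabilities directly: 0.15 × 0.8² / (0.15 × 0.8² + 0.85 × 0.2²).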
Ex] Want to be a doctor?
Suppose that a laboratory blood test is 95% effective in detecting a certain disease when it is, in fact, present. However, the test also yields a "false positive" result for 2% of the healthy persons tested (i.e., if a healthy person is tested, then, with probability 0.02, the test result will imply he/she has the disease). If 0.1% of the population actually has the disease, what is the probability a person has the disease given that his/her test result is positive?
(A) P[ Virus | Positive ] vs. (B) P[ No virus | Positive ] ?
! P[Virus|Positive]
! P[Virus|Positive, Positive]
[Probability tree: Virus or No virus; in each case the test result is Positive or Negative]
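The blood-test question can be checked numerically with the same rule. This sketch assumes repeated tests are conditionally independent given the person's true status, which is what makes the two-positive case a simple product; the function and parameter names are illustrative only.

```python
def posterior_disease(prevalence: float, sensitivity: float,
                      false_positive_rate: float, n_positive: int = 1) -> float:
    """P[disease | n_positive positive results in a row]."""
    with_disease = prevalence * sensitivity ** n_positive
    without_disease = (1 - prevalence) * false_positive_rate ** n_positive
    return with_disease / (with_disease + without_disease)


one = posterior_disease(0.001, 0.95, 0.02)
two = posterior_disease(0.001, 0.95, 0.02, n_positive=2)
print(f"P[Virus | Positive]           = {one:.4f}")  # 0.0454
print(f"P[Virus | Positive, Positive] = {two:.4f}")  # 0.6931
```

Even with an accurate test, one positive result leaves the probability of disease below 5% because the disease is so rare.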
Ex] Jailer’s Dilemma: "Ask Marilyn," Parade Magazine (July 5, 1992), p. 23
"Three prisoners on death row are told that one of them has been chosen at random for execution the next day, but the other two are to be freed. One privately begs the warden to at least tell him the name of one other prisoner who will be freed. The warden relents: 'Chad will go free.' Horrified, the first prisoner says that because he is now one of only two remaining prisoners at risk, his chances of execution have risen from one-third to one half! Should the warden have kept his mouth shut?
– Marvin M. Kilgo III, Camden, S. C."
Ex] Game Show Problem: "Ask Marilyn," Parade Magazine, ( Sep. 9, 1990 )
Suppose you’re on a game show, and you’re given a choice
of three doors. Behind one door is a car; behind the others,
goats. You pick a door, say No. 1, and the host, who knows
what’s behind the doors, opens another door, say No. 3, which
has a goat. He then says to you, "Do you want to pick door No.
2?" Is it to your advantage to switch your choice?
• , "On the Information Economics Approach to the Generalized Game Show Problem," The American Statistician, Vol. 53, (February, 1999), pp. 43-51. • , "Game Show Problem," OR/MS Today ( June, 1991 ), p. 9.
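A short simulation is one way to build intuition for the game show problem. The sketch assumes the standard reading: the host always opens a goat door other than the contestant's pick, choosing at random when two such doors are available.

```python
import random


def play(switch: bool, rng: random.Random) -> bool:
    """Play one game; return True if the contestant wins the car."""
    car = rng.randrange(3)
    pick = rng.randrange(3)
    # The host opens a door that is neither the pick nor the car.
    opened = rng.choice([d for d in range(3) if d != pick and d != car])
    if switch:
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == car


def win_rate(switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print("stay:  ", win_rate(switch=False))  # close to 1/3
print("switch:", win_rate(switch=True))   # close to 2/3
```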
2.5. Random Variables
* Random Variable
Random variable X is a numerical description of the outcome of an experiment.
$$X = \begin{cases} 1, & \text{if head} \\ 0, & \text{if tail} \end{cases}$$
* Probability Distribution, P[X]
Distribution of a random variable X
Ex] Coin-toss experiment, P[X]
X      0    1
P[X]   0.4  0.6
Ex] Starting salary f(x)
[Figure: plots of P[X] and f(x) against x; mapping from the sample space S = {H, T} to the random variable X (Head → 1, Tail → 0) and to the probabilities P[X]]
* Discrete random variable
1. Probability mass function ( p.m.f. ):
! The probability that the random variable X will take a value x
! P[X=x], P[x], or Px
2. Cumulative distribution function ( c.d.f. )
! The probability that the random variable X will take a value less than or equal to a.
! P[X ≤ a]
Ex] Toss a die
x        1  2  3  4  5  6
P[X=x]
P[X≤x]
* Continuous random variable
1. Probability density function ( p.d.f. ):
! f(x)
2. Cumulative distribution function ( c.d.f. )
! F(x) = P[X ≤ x]
Ex] Starting salary is 40 ≤ X ≤ 60.
! f(x) =        for 40 ≤ x ≤ 60
! P[X<45] =
[Figure: density f(x) of the starting salary x, with axis marks at 40, 50, and 60]
A. Expected Value, E[X] = μ
! Definition
$$E[X] = \begin{cases} \sum_x x\,P[x], & \text{discrete random variable} \\ \int_{-\infty}^{\infty} x\,f(x)\,dx, & \text{continuous random variable} \end{cases}$$
! Properties
Let a and b be constants.
1. E[a] = a
2. E[aX+b] = a E[X] + b
3. E[ X1+X2+...+Xn] = E[X1] + E[X2] +...+E[Xn]
B. Variance, Var[X] = σ²
! Definition
Var[X] = E[(X−μ)²] = E[X²] − μ²
! Properties
1. Var[a] = 0
2. Var[aX+b] = a² Var[X]
3. Var[X1+X2+...+Xn] = Var[X1] + Var[X2] +...+ Var[Xn] if they are independent.
4. Var[X±Y] = Var[X] + Var[Y] ± 2 Cov[X, Y]
Ex] Distribution of a random variable X
X 0 1
P[x] 0.5 0.5
E[X] =        E[X²] =        Var[X] =
! Distribution of a new random variable Y = 2X + 1
X 0 1
Y
P[y] 0.5 0.5
E[Y] =        E[Y²] =        Var[Y] =
[Figure: probability mass functions P[X] and P[Y], with the corresponding values of X² and Y²]
Ex] Discrete Case: Suppose that a random variable X can take only the values 0, 2, and 4 and that the probabilities of these values are as follows:
X 0 2 4
P[x] 0.3 0.5 0.2
(a) Draw the probability mass function P[x].
[Figure: blank axes for P[x] at x = 0, 2, 4]
(b) Find the expected value of X.
! E[X] =
(c) Find the variance of X.
! E[X²] =
! Var[X] =
(d) Find the expected value and the variance of Y = 2X+1.
X 0 2 4
Y
P[Y] 0.3 0.5 0.2
! E[Y] =
! Var[Y] =
Ex] Continuous Case: Suppose the density of X is given by
$$f(x) = \begin{cases} 2x, & \text{for } 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
(a) Sketch the probability density function f(x).
[Figure: blank axes for f(x), with x marked at 0, 0.5, and 1.0]
(b) Find the cumulative distribution function F(x).
! F(x) =
(c) Find the expected value of X.
! E[X] =
(d) Find the median and mode of X.
! Median =
! Mode =
(e) Find the variance of X.
! E[X²] =
! Var[X] =
(f) Find the expected value and the variance of Y = 3X+2.
! E[Y] =
! Var[Y] =
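The continuous case can be checked numerically. The sketch below integrates the density f(x) = 2x on [0, 1] with a midpoint rule (a generic technique, not something prescribed by these notes) and then applies the properties E[aX+b] = aE[X] + b and Var[aX+b] = a²Var[X].

```python
def integrate(g, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


def f(x: float) -> float:
    return 2 * x  # density on 0 <= x <= 1


mean = integrate(lambda x: x * f(x), 0, 1)               # E[X]   = 2/3
second_moment = integrate(lambda x: x * x * f(x), 0, 1)  # E[X^2] = 1/2
variance = second_moment - mean ** 2                     # Var[X] = 1/18
median = 0.5 ** 0.5                                      # solves F(m) = m^2 = 0.5

mean_y = 3 * mean + 2       # E[3X + 2]
variance_y = 9 * variance   # Var[3X + 2]
print(mean, variance, median, mean_y, variance_y)
```

The mode is x = 1, where the density is largest.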
2.6. Standardization
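Let $Z = \dfrac{X-\mu}{\sigma}$, where E[X] = μ and Var[X] = σ² denote the mean and variance of the random variable X.
! Expected value of Z:
$$E[Z] = E\left[\frac{X-\mu}{\sigma}\right] = \frac{1}{\sigma}\,E[X-\mu] =$$
! Variance of Z:
$$\operatorname{Var}[Z] = \operatorname{Var}\left[\frac{X-\mu}{\sigma}\right] = \frac{1}{\sigma^2}\,\operatorname{Var}[X-\mu] =$$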
Ex] A. Distribution of a random variable X.
! E[X] = ! Var[X] =
B. Distribution of a random variable Z = (X-3)/2
! E[Z] = ! E[Z2] = ! Var[Z] =
[Figure: probability mass functions P[X] (X = 1 or 5, each with probability 0.5) and P[Z], drawn on axes from −2 to 5]
Ex] Consider the following discrete random variable:
X -2 0 8
P[X] 0.2 0.5 0.3
(a) Find the expected value of X.
! E[X] =
(b) Find E[X²].
! E[X²] =
(c) Find the standard deviation of X.
! Var[X] = σ² =
! Std[X] = σ =
(d) Find the expected value and the variance of a new random variable, Z = (X−2)/4.
X -2 0 8
Z
P[Z] 0.2 0.5 0.3
! E[Z] =
! Var[Z] =
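A sketch of the standardization exercise follows, using a plain dictionary as the probability mass function. The helper name `expect` is illustrative. The point is that subtracting the mean and dividing by the standard deviation always yields mean 0 and variance 1.

```python
pmf_x = {-2: 0.2, 0: 0.5, 8: 0.3}


def expect(pmf: dict, g=lambda v: v) -> float:
    """E[g(X)] for a discrete random variable given as {value: probability}."""
    return sum(g(v) * p for v, p in pmf.items())


mu = expect(pmf_x)                              # E[X]
var = expect(pmf_x, lambda v: v * v) - mu ** 2  # E[X^2] - mu^2
sigma = var ** 0.5

# Standardize every value; the probabilities stay the same.
pmf_z = {(v - mu) / sigma: p for v, p in pmf_x.items()}
mean_z = expect(pmf_z)
var_z = expect(pmf_z, lambda v: v * v) - mean_z ** 2
print(mu, var, sigma, mean_z, var_z)
```

Here μ = 2 and σ = 4, so the standardized variable coincides with the Z = (X−2)/4 of part (d).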
2.7. Bivariate Random Variable
* Covariance
! Cov(X, Y) = σxy
= E[(X−μx)(Y−μy)] = E[XY] − μx μy = E[XY] − E[X] E[Y]
! Cov(aX+b, cY+d) = a c Cov(X, Y)
! Cov(X, Y+Z) = Cov(X, Y) + Cov(X, Z)
! Var(X±Y) = Var(X) + Var(Y) ± 2 Cov(X, Y)
* Correlation coefficient
$$\rho_{xy} = \frac{\operatorname{Cov}[X, Y]}{\sqrt{\operatorname{Var}[X]\,\operatorname{Var}[Y]}} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}, \quad \text{where } -1 \le \rho_{xy} \le +1.$$
! σxy = the covariance between X and Y
= E[(X−μx)(Y−μy)]
= E[XY] − μx μy
! σx = the standard deviation of X
$= \sqrt{E[(X-\mu_x)^2]} = \sqrt{E[X^2] - \mu_x^2}$
! σy = the standard deviation of Y
$= \sqrt{E[(Y-\mu_y)^2]} = \sqrt{E[Y^2] - \mu_y^2}$
Ex] Discrete Case: Final grades in Marketing and ISDS
                        ISDS (Y)
                        4: A2   3: B2   Total
Marketing (X)   4: A1   0.2     0.3     0.5
                3: B1   0.4     0.1     0.5
                Total   0.6     0.4     1.0
E[X] =        E[Y] =
E[X²] =       E[Y²] =
Var[X] = σx² =        Var[Y] = σy² =
E[XY] =
Cov[X, Y] = σxy =
ρxy =
# Prove that
$$\rho_{xy} = \frac{0.2 \times 0.1 - 0.4 \times 0.3}{\sqrt{0.6 \times 0.4 \times 0.5 \times 0.5}} = -0.408$$
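The grade example can be verified by computing the moments directly from the joint table. The sketch below codes the grades as A = 4 and B = 3, as in the table; the dictionary layout is just one convenient representation.

```python
import math

# Joint probabilities P[X = x, Y = y] for Marketing (X) and ISDS (Y).
joint = {(4, 4): 0.2, (4, 3): 0.3, (3, 4): 0.4, (3, 3): 0.1}


def expect(g) -> float:
    """E[g(X, Y)] under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex ** 2
var_y = expect(lambda x, y: y * y) - ey ** 2
cov_xy = expect(lambda x, y: x * y) - ex * ey
rho = cov_xy / math.sqrt(var_x * var_y)
print(f"Cov = {cov_xy:.3f}, rho = {rho:.3f}")  # Cov = -0.100, rho = -0.408
```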
2.8. Probability Distributions
I. Discrete Distributions, P[x], x=0, 1, 2,...,
• Binomial distribution ( n, p )
• Hypergeometric distribution
• Poisson distribution ( λ )
• Negative binomial distribution
• Geometric distribution
II. Continuous Distribution, f(x), −∞ < x < +∞
• Uniform distribution ( a, b )
• Normal ( z ) distribution ( μ, σ² )
• Exponential distribution ( λ )
• t, χ² and F distributions
III. Multivariate Distribution f(x, y)
• Multinomial distribution
• Bivariate normal distribution
H. Moskowitz and , "Two-dimensional Free-replacement Warranties," Product Warranty Handbook, eds. by Murthy and Blischke, Marcel Dekker, 1996, 341-363
* Classifications:
• Probability Density (or mass) Function ( p.d.f. ):
P[x] or f(x)
• Cumulative Distribution Function ( c.d.f. ):
F(x)
A. Univariate Normal Distribution, N(μ, σ²)
I. Probability Density Function ( pdf ): Bell-curve
$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad \text{where } -\infty < x < +\infty$$
! E[X] = μ
! Var[X] = σ²
[Figure: bell-shaped normal density, plotted from −3.0 to 3.0]
II. Cumulative Distribution Function ( cdf )
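[Figure: S-shaped normal cumulative distribution function, plotted from −3.0 to 3.0]
$$F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2} dy, \quad \text{where } -\infty < x < +\infty$$
Ex] If the starting salary is X ~ N(μ=50, σ=10), find P[X<40].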
* How to find F(x) for a given x, μ and σ?
! Method 1. Integral
$$F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2} dy$$
! Method 2. Tables
- Tables for each μ, σ, and x. How many tables?
! Method 3. Single table
- Have a single table for the standard normal distribution with μ = 0 and σ = 1.
- Transform all others into the standard normal z.
- Look up the z table in the book.
! Method 4. Use the Excel function:
- Microsoft Excel =NORMDIST(x, μ, σ, true)
* Standardization of a Normal Random Variable X
If X ~ N(μ, σ²), then $Z = \dfrac{X-\mu}{\sigma}$ ~ N(0,1)
- Standard normal distribution (or z distribution)
• E[Z] = 0 • Var[Z] = 1
- Microsoft Excel =STANDARDIZE(x, μ, σ)
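Outside Excel, the "standardize, then look up" method can be sketched with the error function from Python's standard library, using the identity Φ(z) = ½[1 + erf(z/√2)]. The function name `normal_cdf` is illustrative; the example values are those of the starting-salary exercise, X ~ N(μ=50, σ=10).

```python
import math


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """F(x) for a normal random variable, via standardization."""
    z = (x - mu) / sigma                         # transform to the standard normal
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p = normal_cdf(40, mu=50, sigma=10)              # z = -1
print(f"P[X < 40] = {p:.4f}")                    # 0.1587
```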
Ex] Suppose that the duration of a flight between New Orleans and New York is a normal random variable X with mean 3.6 hours and standard deviation 0.2 hour. Find the following probabilities:
(a) P[X ≤ 4]
(b) P[X ≤ 3.41]
(c) P[3.41 ≤ X ≤ 4.0]
[Figures: normal density of X over 2.0 to 4.4 hours, and standard normal density of Z over −4.0 to 4.0]
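For readers working in Python rather than Excel, the flight-time probabilities can be obtained from `statistics.NormalDist` in the standard library (Python 3.8 or later). This is only a sketch of one possible tool, not the method used in these notes.

```python
from statistics import NormalDist

flight = NormalDist(mu=3.6, sigma=0.2)

p_a = flight.cdf(4.0)    # (a) P[X <= 4],    z = 2.00
p_b = flight.cdf(3.41)   # (b) P[X <= 3.41], z = -0.95
p_c = p_a - p_b          # (c) P[3.41 <= X <= 4.0]
print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  (c) {p_c:.4f}")
```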
* Properties
- If X has a normal distribution with mean μ and variance σ² and if Y = aX ± b, where a and b are given constants and a ≠ 0, then Y has a normal distribution with mean aμ ± b and variance a²σ².
- If Xi, i=1,...,n, are independent and Xi has a normal distribution with mean μi and variance σi², then the sum X1+X2+...+Xn has a normal distribution with mean μ1+μ2+...+μn and variance σ1²+σ2²+...+σn².
- If Xi, i=1,...,n, are i.i.d. random variables from a normal distribution with mean μ and variance σ², then the sample mean X̄ has a normal distribution with mean μ and variance σ²/n.
* The Central Limit Theorem
"When you are listening to corn pop, are you hearing the Central Limit Theorem?"
William A. Massey
Let X1, X2, …, Xi, … be a sequence of independent and identically distributed random variables with E[Xi] = μ and Var[Xi] = σ².
Then $\dfrac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$ has a limiting distribution that is normal with mean zero and variance 1.
[Figure: distribution of the popping time X of popcorn, with μ = 2 min. 15 sec.]
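A small simulation can make the theorem tangible. The sketch below draws samples from a Uniform(0, 1) distribution (an arbitrary choice of a non-normal population, with μ = 0.5 and σ² = 1/12), standardizes each sample mean, and checks that the standardized means have mean near 0 and variance near 1.

```python
import math
import random
import statistics


def standardized_means(n: int, reps: int, seed: int = 42) -> list:
    """Standardized sample means of n Uniform(0, 1) draws, repeated reps times."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1 / 12)
    results = []
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        results.append((xbar - mu) / (sigma / math.sqrt(n)))
    return results


z = standardized_means(n=30, reps=5000)
print("mean:", statistics.mean(z))           # near 0
print("variance:", statistics.pvariance(z))  # near 1
```

A histogram of `z` would look approximately bell-shaped even though each individual draw is uniform.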
B. Bivariate Normal Distribution
! Probability density function
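$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho_{12}^2}} \exp\left\{ -\frac{1}{2(1-\rho_{12}^2)} \left[ \left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho_{12}\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 \right] \right\}$$
E[Xi] = μi,  Var[Xi] = σi²,  Corr[X1, X2] = ρ12
! Marginal distributions: Normal(μi, σi²)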
! Conditional distribution:
! Surface plot of a bivariate normal density
The Wizard of Odds
From Nico M. van Dijk, “To wait or not to wait: that is the question.” Chance Magazine, Vol. 10, Winter 1997 p. 38.
The famous bus paradox is explained as simply the “curse of variation.”
! Case 1: Suppose that buses arrive exactly every 20 minutes at your corner (i.e., the times between arrivals are 20, 20, and 20 in an hour).
(a) What is the average time between arrivals?
(b) If you arrive at a random time, what is the average waiting time?
! Case 2: The slightest variation in the arrival times, keeping the same average of 20 minutes between buses, can increase your average waiting time significantly. For example, suppose that the times between arrivals are 2, 2, and 56 in an hour.
(a) What is the average time between arrivals?
(b) If you come at a random time in this hour, what is the average waiting time?
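The two cases can be compared with a short calculation. If you arrive at a uniformly random moment, the chance of landing in a gap is proportional to its length g, and the average wait within that gap is g/2, so the overall average wait is Σg² / (2Σg). The function names below are illustrative.

```python
def average_gap(gaps: list) -> float:
    """Average time between consecutive arrivals."""
    return sum(gaps) / len(gaps)


def average_wait(gaps: list) -> float:
    """Average wait for a passenger arriving at a uniformly random time."""
    return sum(g * g for g in gaps) / (2 * sum(gaps))


for gaps in ([20, 20, 20], [2, 2, 56]):
    print(gaps, "average gap:", average_gap(gaps), "average wait:", average_wait(gaps))
# [20, 20, 20] average gap: 20.0 average wait: 10.0
# [2, 2, 56] average gap: 20.0 average wait: 26.2
```

Both schedules have the same average gap, but the uneven one more than doubles the average wait, because most random arrival times fall inside the long 56-minute gap.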
Appendix: Review of Basic Calculus
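In the following formulas, f and g represent functions of x, while a and n represent fixed real numbers.
A. Derivatives
1. $\dfrac{d}{dx}\,a = 0$
2. $\dfrac{d}{dx}\,x^n = n x^{n-1}$
3. $\dfrac{d}{dx}\ln f(x) = \dfrac{1}{f(x)}\,\dfrac{d}{dx} f(x)$
4. $\dfrac{d}{dx}\,e^{f(x)} = e^{f(x)}\,\dfrac{d}{dx} f(x)$
5. $\dfrac{d}{dx}[f(x)]^n = n[f(x)]^{n-1}\,\dfrac{d}{dx} f(x)$
6. $\dfrac{d}{dx}[f(x)g(x)] = f(x)\,\dfrac{d}{dx} g(x) + g(x)\,\dfrac{d}{dx} f(x)$
B. Integrals
7. $\int a\,dx = ax$
8. $\int x^n\,dx = \dfrac{1}{n+1}\,x^{n+1}$, except n = −1
9. $\int x^{-1}\,dx = \ln x$
10. $\int [f(x) + g(x)]\,dx = \int f(x)\,dx + \int g(x)\,dx$
11. $\int e^{ax}\,dx = \dfrac{1}{a}\,e^{ax}$
12. $\int a f(x)\,dx = a \int f(x)\,dx$
13. $\int_a^b f(x)\,dx = F(x)\Big|_a^b = F(b) - F(a)$, where $F(x) = \int f(x)\,dx$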
Session 3. Review of Basic Tools in Data Mining II
3.1. Linear Algebra
* Vectors
! Definition
A vector x of order (m×1) is an ordered set of real numbers (x1, x2, …, xm), and is defined by a boldfaced, lowercase letter.
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}$$
The scalars xi are referred to as components or elements of the vector. This particular representation is called a column vector and consists of m rows and 1 column. A vector of order (1×m) is called a row vector and is written as follows:
x′ = [x1, x2, …, xm]
where the prime symbol (′) denotes the transpose (i.e., the operation that writes a column as a row and vice versa).
! Length of a vector
The length of a vector x of m elements emanating from the origin is given by the Pythagorean formula:
$$L_x = \|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_m^2}$$
! Scalar multiplication: Scaling
Let c be an arbitrary scalar. Then the product cx is a vector with ith entry cxi.
If c = 1/Lx, the vector is normalized; i.e., its length is 1.
! Vector addition
The sum of two vectors x and y, each having the same number of entries, is that vector z = x + y with ith entry zi = xi + yi.
! Vector multiplication: Projection
The inner (or dot) product of two vectors x and y with the same number of entries is defined as the sum of component products
x′y = x1y1 + x2y2 + … + xmym.
Ex] Let x′ = [5, 1] and y′ = [1, 4].
(a) x + y =
(b) Lx =
(c) x − y =
(d) Distance between x and y: ||x − y|| =
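The basic vector operations (scaling, addition, inner product, length, and distance) can be sketched with plain Python lists. The function names are illustrative, and the example vectors x′ = [5, 1] and y′ = [1, 4] are taken from the exercise on this page.

```python
import math


def scale(c: float, x: list) -> list:
    return [c * xi for xi in x]


def add(x: list, y: list) -> list:
    return [xi + yi for xi, yi in zip(x, y)]


def dot(x: list, y: list) -> float:
    return sum(xi * yi for xi, yi in zip(x, y))


def length(x: list) -> float:
    return math.sqrt(dot(x, x))  # Pythagorean formula


x, y = [5, 1], [1, 4]
difference = add(x, scale(-1, y))
print("x + y     =", add(x, y))            # [6, 5]
print("L_x       =", length(x))            # sqrt(26), about 5.099
print("x - y     =", difference)           # [4, -3]
print("||x - y|| =", length(difference))   # 5.0
print("x'y       =", dot(x, y))            # 9
print("normalized length:", length(scale(1 / length(x), x)))
```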
Ex] Cluster Analysis: Suppose that five students in the class possess the following characteristics:

        Variables
Items   Height  Weight  Gender  Eye color  Hair color
Amber   64      135     F       green      blond
Brad    67      120     M       brown      brown
Chad    72      200     M       blue       blond
Doug    76      210     M       brown      brown
Emily   78      160     F       brown      brown

Using only the heights and weights in the table, calculate the distances between pairs of students.

Euclidean distances
        Amber   Brad    Chad    Doug    Emily
Amber   0       15.30   65.49   75.95   28.65
Brad    15.30   0       80.16   90.45   41.48
Chad    65.49   80.16   0       10.77   40.45
Doug    75.95   90.45   10.77   0       50.04
Emily   28.65   41.48   40.45   50.04   0

[Figure: scatter plot of weight (100 to 220) against height (60 to 80) for the five students, labeled A to E]
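The Euclidean distances between students can be computed in a few lines. The sketch uses `math.dist` from the standard library (Python 3.8 or later) on the (height, weight) pairs of the five students; no rescaling of the two variables is applied, which matches the raw distances in the table.

```python
import math
from itertools import combinations

# (height, weight) for each student
students = {
    "Amber": (64, 135),
    "Brad": (67, 120),
    "Chad": (72, 200),
    "Doug": (76, 210),
    "Emily": (78, 160),
}

distances = {
    (a, b): math.dist(students[a], students[b])
    for a, b in combinations(students, 2)
}
for (a, b), d in distances.items():
    print(f"{a:>5} - {b:<5} {d:6.2f}")
```

Because weight varies over a much wider range than height, it dominates these raw distances; standardizing each variable first is a common remedy.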
© Young H.
* Mat!
! !!"!
%
!
!
!
K!!
K!
!
K!
!
K!
!! !!^
!
)
!
!
Chun
trices
#$434*403
%!B:*54bD11#5-:55:3@
!)nm&.
M$!m!8!n
M$!aij!8!aB:*54b
M$!aij!8!]*+#3!*+
M$!aij!8!*+#3!*+=#30*#
^:*54b!BD
)+#!150=B:*54b@4,#3!G
! 'ijc
3!
b!.!0$!05-:9#!;#**#@#=!43*0!m
!!!!
"
#
'
m
n
a
a
a
<
'
6'
''
>
n(!*+#3!*+
aji!$05!:;;b!49!-:;;#=
]!$05!:;;!+#!B:*54b
'!$05!:;;!+#!B:*54b#=!2<!
D;*41;4-:*
=D-*!.3!b!3!8!cbj
GF!
*'
'n
sisba
'
5=#5!m .!#5(!49!:!5#m!50/9!:
ma
a
a
<
6'
66'
'6'
+#!B:*54b
;!#;#B#3=!:!symm
0$$K=4:@b!49!-:;;#
03K=4:@b!49!-:;;#
*403!
0$!:3!m.
jkd!49!*+#
sjb (!!!!!i!8
n(!@#3#5#-*:3@D;:3=!n!-0
mn
n
n
a
a
a
<
<<
<
<
6
'
b!49!-:;;#
3*9!0$!:!9metric!B
@03:;!#;##=!:!diag
03:;!#;#B#=!:!iden
.n!B:*54!m.k!B:
8!'(!6(!7
5:;;F!=#3;:5!:55:F!0;DB39I
$$$$
%
&
!
#=!:!squ
9eD:5#!BB:*54b<!
#B#3*9!0gonal!B:
B#3*9!0$ntify!B:*
4b!.!8!ca:*54b!'!/
7(!m!:3=
30*#=!GF0$!#;#B
are!B:*5
B:*54b(!*+
0$!:!9eD::*54b<!
$!:!=4:@0*54b!:3=!4
aijd!:3=!/+09#!#;
=!j!8!'(!6
!:!G0;=$:B#3*9!
54b<!
+#3!*+#!
:5#!B:*54b
03:;!B:*49!D9D:;;
:3!n.k!;#B#3*!c
(!7(!k.
JKS!
:-#!
b(!
*54b(!;F!
cij!49!
!
© Young H.
!!D!
)
!
!
!
! !!In!
2
!
! !!Tr!
2
!
!
!
#"$!2
!
&:>!C.!!&G>!tr&!!&->!.3
Chun
etermina
)+#!deter
=#30*#
! C.C!8
nverse!0$
2#*!2!G#!*3!9D-+9eD:5#
race!0$!:
2#*!.!8!cB:*54b#;#B#3
! tr&.
#*! ') >66&.
C!8!!!
&.>!8!!!
3!8!!!!
ant 0$!:!
rminant!#=!GF!C%C
8!k
jja
'*''
$!:!9eD:5
*+#!m.m
+!*+:*!.3#!m.m!!B
:!9eD:5#!
caijd(G#!:b!%(!/54**3*T!4<#<(!
.>!8!*'
m
j
a'
$%
&!"
#'
J'
Q6
9eD:5#!B
0$!*+#!9eC(!49!*+#!9
jA +' &CC
5#!B:*54b
m!4=#3*4*F3!8!3.!8B:*54b!.
B:*54b
:3!m.m
*#3!*5&%>
iia !
%
&!:3=!
)6&3
B:*54b!
eD:5#!m9-:;:5!
j(+ '>' !
b!
F!B:*54b8!2!49!-:.!:3=!49!=
9eD:5#!B>(!49!*+#!
!"
#
+'
)
J
>63
.m!B:*5
<!!)+#!9e:;;#=!*+#!=#30*#=!G
B:*54b<!!9DB!0$!*
$%
&+
6'
QJ
54b!.!8!c
eD:5#!m.inverse!GF!.K'<!
)+#!trac
*+#!=4:@0
!
caijd(!
.m!!B:*50$!*+#!
ce!0$!*+#03:;!
JKQ!
54b!
#!
* Matrices and Systems of Linear Equations
! Consider the system of linear equations given by
a11 x1 + a12 x2 + … + a1n xn = b1
a21 x1 + a22 x2 + … + a2n xn = b2
…
am1 x1 + am2 x2 + … + amn xn = bm
! Using matrices can greatly simplify the statement and solution of a system of linear equations.
$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$
Then, it may be written as Ax = b. Thus, x = A⁻¹ b
! A solution to a linear system of m equations in n unknown variables is a set of values for the unknown variables that satisfies each of the system's m equations.
Ex] Find a solution to the linear system. (Use MINVERSE and MMULT in Excel)
! !]'6
Q6'
6'
6'
'+
'(
xx
xx! ! . $
%
&!"
#
+'
'6
6'(!" $
%
&!"
#'
6
'
x
x(!:3=!4 $
%
&!"
#']
Q!
!! ! "!8!.K'!4!8!
!
© Young H.
* Mat
!!.551/6
Xb-#;!+:9B:*54-#9(-#;;!*+:*!/*0!:==!*+49!#3*#5#=50/!=0/!
!!71/51/6
)+#!*5:39-0;DB39!*+#3!9#;#B:*54b<!L*+#!;0/#5*5:39109#!
!!8)(+19)0!BD;*41*+:*!*+#!B3DBG#5!0*+#!$459*!B$05!*+#!15^^a2)%55:F!6<!G0*+!*+#!150=D-*!B!
!!71/51/6L#;#-*!:!-9#;#-*!^"943@;#!,:0$!,:;D#9*+#!=#*#5B43,#59#!0+:9!30!D3D34eD#!90!
!!71/51/6M$!*+#!43,BD9*!30*!B:*54b<!_B#3D!$0591#-4$F!*+B#5#;F!+:3=!AL+4$43!*+#!+4@
Chun
trix Ope
6!:-!;)4+-0<
9!30!$D3-*403(!G0*+!B:*54-/4;;!G#!$459*!#!-#;;9!$05!*+=(!-;4-f!:3=!=3!$05!*+#!3DB
6!+=,!>-0/*
9109#!0$!:!B*+#3!*+#!*5:3-*!-01F<!W0!L#;#-*!X=4*(!*+5!54@+*!-053##!0$!*+#!9#;#-
(%1/6!80+-1
1;F!B:*54-#9B:*54-#9!-:3!0$!50/9!0$!*+B:*54b!:3=!*+50=D-*!B:*54)<!X3*#5!*+#!-i:*+#5!*+:3!-03*50;!A_*5;B:*54b!/4;;!:
6!0!?,+,-@1
-#;;!/+#5#!F"X)Xi^!$5:;D#!:3=!30*!9!$05!*+#!9eD:B43:3*!/4;;!0$!*+#!B:*54b34eD#!90;D*400;D*403<!
6!0/!2/A,-*,
,#59#!0$!:!9eD#eD:;!]<!)0!_;4-f!03!*+#!H5!^:*+!j!)54+#!:55:F!0$!,4**43@!X3*#5!$*E!f#F9!/+4;@+;4@+*#=!:5#
erations
<+1/6!80+-1<
3!*0!:==!05!B-#9!BD9*!+:,50/!:3=!$459+#!$459*!50/!:=5:@!*+#!-#;;BG#5!0$!50/
9:*,!:B!0!8
B:*54b!+:9!*+#39109#!0$!%!*0!*+#!-#;;!/+#3!9#;#-*!H#5!0$!*+#!H:9*-*#=!B:*54b!/
1<,*!C1+=!#"
9(!*+#!$459*!9*#G#!BD;*41;4#+#!9#-03=!B:+#!9:B#!3DBb!/4*+!*+#!:1-#;;9!$05!*+#!B#5#;F!+4**4;E!:3=!AL+4$*:11#:5!43!*+#
1/0/+!C1+=!#
0D!/:3*!*0!150B!*+#!$D3-:3!:55:F!0$!,:5#!B:*54b!$0:11#:5!43!*+!:3=!*+#!9F9*03<!M$!*+#!=#*
,!:B!0!;D)0-
D:5#!B:*54b!$43=!*+#!43,H:9*#!ZD3-*44@<!M3!*+#!/4,:;D#9!$05!*+#05!-;4-f43@!0;#!+4**43@!X3#:<!
s with M!
<,*!C1+=!#"
BD;*41;F!$D3-,#!*+#!3DBG#9*!-0;DB3!0$!:3=!$459*!-0;D;!:-5099!*0!$0/9<!)+49!/4;;!
80+-1"!C1+=!#
#!50/9!:3=!-+:9!n!50/9!:/+#5#!F0D!/::9*#!L1#-4:;*#!L1#-4:;!B/4;;!:11#:5<
"<,(!#1!49!*0!B:f##=<!)+#!3DB:*54b<!)+#!15BG#5!0$!-0;DB1150154:*#!3$459*!B:*54b!43@!X3*#5!05!E!f#F9!/+4;##!+4@+;4@+*#=
#"<,(!
1D*!*+#!=#*#5B-*4039!D3=#5,:;D#9<!M3!*+#05!/+4-+!F0D#!91#-4$4#=!-*#B!0$!#eD:**#5B43:3*!49!
-,!80+-1"!C
&k50/9!8!k-,#59#(!$459*!+4403!GD**03!:33=0/!G#94=##!9eD:5#!B:*03!*+#!lm!GD3*#5!05!-;4-f43
Microso
"<,(!-*4039<!)0!:=#5!0$!50/9!:3$ *+#!B:*54b!*DB3!0$!*+#!B05!*+#!3DBG#F4#;=!*+#!9DB
#"<,(!-0;DB39!5#,#:3=!m!-0;DB:3*!*+#!$459*!5<!_;4-f!03!*+B#3D<!h#b*!-;
#!9D5#!*+:*!*+BG#5!0$!-0;DB50=D-*!B:*54bB39!:9!*+#!93DBG#5!0$!50/43!%55:F!'!:-;4-f43@!03!*#!+4**43@!X3*#=!:5#:!4$!#,#5
B43:3*!*+#3!*+#!B#3D!$0#!/43=0/!G#D!/:3*!*+#!=#-#;;<!M$!*+49!,*4039!:990-4:30*!R#50!*+#3
C1+=!#"<,(!
0;DB39>!#b44@+;4@+*!*+#!:3=!9#;#-*!^M#!*+#!/05=!%*54b!$05!/+4-D**03(!F0D!B3@!03!*+#!lm
oft Exc
==!05!9DG*5:-3=!-0;DB39<!*+:*!/4;;!-03*B:*54-#9!*0!G#5!0$!-0;DB3B!0$!*/0!B:
#59#=<!M$!*+#!BB39<!Z459*(!+450/!:3=!*+#!+#!G0b!G#94=#;4-f!03!lm!0
+#!=4B#39403B39!0$!*+#!$45b!/4;;!+:,#!*9#-03=!B:*54/9!:3=!-0;D:3=!*+#!-#;;9!*+#!lm!GD**0#5!05!-;4-f43@5F*+43@!+:9!G
-;4-f!03!*+#05!^:*+!j!)5#94=#!*+#!/0#*#5B43:3*<!_,:;D#!49!R#50!:*#=!/4*+!*+#3!:3!43,#59#
9*9!*+#3!*+#!=:5#:!-055#910MhNXiLX!$5%ii%n!G#94-+!F0D!/:3*!BD9*!+0;=!=0/m!GD**03<!)+
cel
-*!M3!:!*:43!*+#!9DB#!:==#=<!l339<!h#b*!-;4-f:*54-#9<!
B:*54b!%!+:@+;4@+*!*+#!0$459*!-0;DB3#!)5:39109#!05!+4*!*+#!X3*
39!0$!*+#!B:59*!B:*54b!B*+#!9:B#!3DBb<!M3!Xb-#;(!DB39<!L#;#-*!$05!*+#!9#-0303(!F0D!BD9@!03!*+#!lmG##3!=03#!-0
#!H:9*#!ZD3-*54@<!)+#!=#*#05=!%ii%n!_;4-f!lm!:3*+#3!F0D!-:3#!-0#$$4-4#3*9#!-:3!G#!$0D3
=#*#5B43:3*!03=43@!/4*+!50B!*+#!$D3-4=#!*+#!/05=*+#!43,#59#<!/3!G0*+!*+#!+#!43,#59#!B
!
B(!#3*#5!:!$05B3-#!*+49!$05Bf!:3=!=5:@!*+
9!m!50/9!:3054@43:;!B:*53!0$!*+#!*5:39*+:*!49!$0D3=*#5!f#F<!)+#!
:*54-#9!:5#!9DBD9*!#eD:;!*+#BG#5!0$!50/+4@+;4@+*!:3*+#!$D3-*4033=!B:*54b!439*!+0;=!=0/3m!GD**03<!)+#055#-*;F<!
*403!GD**03<!h#5B43:3*!49!:91#-4$F!*+#!3=!*+#!,:;D#!330*!$43=!:3!9!$05!*+#!B:*3=!:3=!*+#5#!
0$!*+#!B:*54*+#!05=#5!0$-*4039!D3=#5!=!%ii%n!i:*+#5!*+:3-03*50;!A_*5
B:*54b!/4;;!:1
JK\!
BD;:!BD;:!+:*!
=!n!54b(!9109#!=!43!
D-+!#!/9!:9!3!:5#:!3!3!3!#!
h0/!:!:55:F!0$!
*54b!49!:!
4b!$!*+#!*+#!
3!5;E!11#:5!
3.2. Optimization
* Optimization of a Function of a Single Variable
! A necessary condition for a particular solution x = xo to be either a minimum or a maximum is that
$$\frac{df(x)}{dx} = 0 \quad \text{at } x = x_o.$$
! There are five possible solutions satisfying these conditions.
! To obtain more information about these five critical points, it is necessary to examine the second derivative. Thus, if
$$\frac{d^2 f(x)}{dx^2} > 0 \quad \text{at } x = x_o,$$
then xo must be at least a local minimum. If f(x) is a convex function, it is a global minimum. Similarly, if
$$\frac{d^2 f(x)}{dx^2} < 0 \quad \text{at } x = x_o,$$
then xo must be at least a local maximum. If f(x) is a concave function, it is a global maximum.
[Figure: graph of f(x) against x marking a global minimum, a global maximum, an inflection point, a local maximum, and a local minimum]
* Optimization with Microsoft Excel
To use Solver, first make sure that it is installed. To do so, select Tools from the main menu. If the option Solver appears, then Solver is already installed and you are ready to proceed. If the option Solver does not appear, then you must install Solver.
To install Solver, select Add-Ins from the Tools menu, then scroll through the Add-Ins dialog box until you find the Solver option. Click on this option (a check-mark should appear in the corresponding box) and then click on OK. After a brief period the installation will be complete. (You may be asked to insert the original CD-ROM.)
You should now be able to use the Solver by clicking on the Tools heading on the menu bar and selecting the Solver… item. The Solver Parameters dialog box will appear.
The Set Target Cell box should contain the cell location of the objective function for the problem under consideration. Max or Min may be selected for finding the maximum or minimum of the set target cell. If Value of is selected, the Solver will attempt to find a value of the Target Cell equal to whatever value is placed in the box just to the right of this selection. The By Changing Cells box should contain the location of the decision variables for the problem.
Finally, the constraints must be specified in the Subject to the Constraints box by clicking on Add. Change allows you to modify a constraint already entered and Delete allows you to delete a previously entered constraint. Reset All clears the current problem and resets all parameters to their default values. Options invokes the Solver options dialog box. We really shouldn't ever have to worry about the Solver Options dialog box. As you can see, a series of default choices are included that direct Solver's search for the optimum solution and for how long it will search. The Guess selection is not particularly useful for our purposes and will not be discussed here.
When the Add button is clicked, the Add Constraint dialog box appears. Clicking on the Cell Reference box allows you to specify a cell location (usually a cell with a formula). The Constraint type may be set by selecting the down arrow (<=, >=, =, int, where int refers to integer, or bin, where bin refers to binary). The Constraint box may contain a formula of cells, a simple cell reference, or a numerical value. The Add button adds the currently specified constraint to the existing model and returns to the Add Constraint dialog box. The OK button adds the current constraint to the model and returns you to the Solver dialog box.
When you run Excel's Solver, it executes a series of “add-in” files and routines. Upon completion of the various programs, Excel presents the user with the Solver Results dialog box.
!
© Young H.
#"$!^!
! ! !!!
#"$!^
!
! ! !!!!#"$!"t*54:;!90;,#!B#v!wM91#3=-+4;=5
!
! !!
! !
! !
! !!
! !!
Chun
^434B4R#
^:b4B4R#
"#:5!^:5:3=! #554*!B:*+#M!/:3*!*<! %=B49#3(!x<Q]
!N:54:G;
!!
!!
!!
!_039*5:!!!!
!f&x>!8!x6
#!f&.>!8!.
54;F3I!tM!505u! B#*#B:*4-:;0!*:f#!'99403! $0<!!U0/!B
;#9!&303K
:43*9!
6!K!Sx O!Q
yJ
J .+. e!
90;,#=!**+0=(! :3;;F!GD*!+']]!1#0105! B#3! 4B:3F!0$!
K3#@:*4,#
Q!
*+49!150G3=! 943-#!+:,#!+:=1;#!*0!*+49! x'](!#:-+!-:3
K!Z5#
#(!43*#@#
!
G;#B!:!$#*+#3(! M
=!30!9D-#!-45-D9$05! /0B3!M!*:f#v
#=!^<![D;;0
#5!,:;D#>
#/!F#:59g,#! G##3-#99<!_:(!:3=!M!+B#3! x6vu!
0-f(!_#550!
!
9!:@0!GF!3! *5F43@:3!F0D!++:,#!xpQ<Q(! :3=!
W05=0(!h<
JK']!
*+#!@! *0!+#;1!Q!*0!$05!
<_<!z!
3.3. Entropy
* Definition
! The entropy H(X) of a discrete random variable X is
$$H(X) = -\sum_{j=1}^{c} p_j \log p_j$$
! The log is to the base 2 and entropy is expressed in bits.
! Then, H(X) is the lower bound on the number of binary questions required to determine the value of X.
H(X) ≤ Expected number of Questions (EQ) < H(X) + 1
- If the base of the logarithm is b, we will denote the entropy as Hb(X).
Hb(X) = (logb a) Ha(X) = Ha(X) / (loga b)
Thus, H(X) = He(X) / ln 2 or H(X) = H10(X) / log10 2
Ex] Toss a fair coin once.
He(X) =
H(X) =
It takes only one binary question to find the result: EQ = 1
Is it a head?
If the answer is yes, then it is a head.
If the answer is no, then it is a tail.
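The entropy definition translates into a one-line function. The sketch below takes the logarithm base as a parameter, so that the relation H(X) = He(X) / ln 2 can be checked on the fair-coin example; terms with zero probability are skipped, following the usual convention 0·log 0 = 0.

```python
import math


def entropy(probs: list, base: float = 2) -> float:
    """H_b(X) = -sum_j p_j log_b(p_j)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


coin = [0.5, 0.5]
h_bits = entropy(coin)               # base 2: measured in bits
h_nats = entropy(coin, base=math.e)  # base e
print(f"H(X)  = {h_bits:.4f} bits")  # 1.0000
print(f"He(X) = {h_nats:.4f}")       # 0.6931
print(f"He(X) / ln 2 = {h_nats / math.log(2):.4f}")
```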
Ex] Where are you from?
X      Louisiana  Mississippi  Texas  Others
P(X)   4/8        2/8          1/8    1/8
! Entropy He(X) =        and H(X) =
! EQ =
! The sequence of questions matters in data mining?
H(X) = 1.75 bits, but EQ =
! If it is a multiple choice question with 4 answers
He(X) =
H4(X) =
[Figure: two decision trees of yes/no questions with branch probabilities. In one, TX is identified after 1 question, MS after 2, and LA or others after 3; in the other, LA after 1, MS after 2, and TX or others after 3.]
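The effect of question order can be sketched as follows. The code assumes questions of the form "Is it this state?" asked one at a time in a fixed order, so that the last two categories are separated by the final question; the function name is illustrative.

```python
def expected_questions(probs_in_order: list) -> float:
    """Expected number of yes/no questions when categories are asked about one by one."""
    n = len(probs_in_order)
    # Category i (0-based) is identified after i + 1 questions, except the
    # last one, which is known as soon as the (n - 1)-th answer is "no".
    return sum(p * min(i + 1, n - 1) for i, p in enumerate(probs_in_order))


la, ms, tx, others = 4 / 8, 2 / 8, 1 / 8, 1 / 8
print("Texas first:    ", expected_questions([tx, ms, la, others]))  # 2.5
print("Louisiana first:", expected_questions([la, ms, tx, others]))  # 1.75
```

Asking about the most probable category first reaches the entropy bound of 1.75 bits in this example, while a poor ordering needs 2.5 questions on average.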
Ex] What was your grade in ISDS 2001?
X      A    B    C    D
P(X)   1/4  1/4  1/4  1/4
! He(X) =        and H(X) =
A. One possible sequence: EQ =
B. Optimal sequence: EQ =
! Not a binary question, but check one of the four boxes.
He(X) =        and
H4(X) =
[Figure: two decision trees of yes/no questions for the grades, one asking "A?", "B?", "C?" in turn and one starting with "A or B?", plus a checklist with the four boxes A, B, C, D]
* Huffman Codes
! Find the most efficient series of yes-no questions to determine an object from a class of objects.
Ex 1]
! He(X) =        and H(X) =
! EQ =
Ex 2]
! He(X) =        and H(X) =
! EQ =
Ex 3]
! He(X) =        and H(X) =
! EQ =

A   0.25
B   0.25
C   0.20
D   0.15
E   0.15

A   1/4   2/4   2/4   4/4
B   1/4   1/4   2/4
C   1/4   1/4
D   1/4

LA       4/8   4/8   4/8   8/8
MS       2/8   2/8   4/8
TX       1/8   2/8
Others   1/8
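The repeated merging of the two smallest probabilities, as in the tables above, can be sketched with a heap. In the standard Huffman construction, the expected number of yes-no questions equals the sum of the merged probabilities, which is what `huffman_expected_length` (an illustrative name) accumulates.

```python
import heapq
import math


def huffman_expected_length(probs: list) -> float:
    """Expected number of yes/no questions under a Huffman code."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        smallest = heapq.heappop(heap)
        second = heapq.heappop(heap)
        merged = smallest + second
        total += merged  # every merge adds one question for all items below it
        heapq.heappush(heap, merged)
    return total


def entropy_bits(probs: list) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)


for probs in ([0.25, 0.25, 0.20, 0.15, 0.15], [1 / 4] * 4, [4 / 8, 2 / 8, 1 / 8, 1 / 8]):
    print(probs, "EQ =", round(huffman_expected_length(probs), 4),
          "H =", round(entropy_bits(probs), 4))
```

In each case the result satisfies H(X) ≤ EQ < H(X) + 1, with equality when all probabilities are powers of 1/2.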
3.4. Entropy of Two Variables
! The joint entropy H(X,Y) of a pair of discrete random variables X and Y with a joint distribution p(x, y) is
$$H(X,Y) = -\sum_x \sum_y p(x,y) \log p(x,y)$$
! The conditional (mean) entropy H(X|Y) is
$$H(X|Y) = -\sum_y \sum_x p(x|y) \log p(x|y)\; p(y), \text{ or}$$
$$H(X|Y) = -\sum_y \sum_x p(x,y) \log \frac{p(x,y)}{p(y)}$$
! The mutual information (cross entropy) I(X;Y) is
I(X;Y) = H(X) − H(X|Y)
which is the reduction in the uncertainty of X due to the knowledge of Y.
! Relationships
H(X|Y) = H(X, Y) − H(Y)
H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
I(Y; X) = H(Y) − H(Y|X)
I(X; Y) = I(Y; X)
I(X; Y) = H(X) + H(Y) − H(X, Y)
Ex] Classification Tree: Suppose that, based on the patient's symptoms, the prior probability that the patient has a certain disease is 60%. The doctor wants to further conduct a comprehensive blood test. The blood test is 95% effective in detecting the disease when it is, in fact, present. However, the test also yields “false positive” result for 2% of the healthy persons tested (i.e., if a healthy person is tested, then, with probability 0.02, the test result will imply the patient has the disease.)

! Joint probabilities, P[xi, θj]
                    Disease
Test                Virus θ1    No virus θ2    P[xi]
Positive x1
Negative x2         0.030       0.392          0.422
                    0.600       0.400          1.000

! Conditional probabilities, P[θj | xi]
                    Disease
Test                Virus θ1    No virus θ2
Positive x1
Negative x2         0.071       0.929

! Thus, the entropy gain from the blood test is
θ = [0.600, 0.400], He(θ) = 0.673
Positive x1, P[x1] = 0.578: θ = [0.986, 0.014], He(θ|x1) = 0.0736
Negative x2, P[x2] = 0.422: θ = [0.071, 0.929], He(θ|x2) = 0.2562
He(θ|x) = 0.150
I(θ; x) = 0.523
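The entropy gain from a diagnostic test can be reproduced numerically. The sketch below assumes a 60% prior, a 95% detection rate, and a 2% false-positive rate, and works in natural-log units (He). The variable names are illustrative.

```python
import math


def entropy_e(probs: list) -> float:
    """Entropy with natural logarithms, He."""
    return -sum(p * math.log(p) for p in probs if p > 0)


prior, sensitivity, false_positive = 0.60, 0.95, 0.02

p_pos = prior * sensitivity + (1 - prior) * false_positive  # P[x1]
p_neg = 1 - p_pos                                           # P[x2]
post_pos = prior * sensitivity / p_pos                      # P[theta1 | x1]
post_neg = prior * (1 - sensitivity) / p_neg                # P[theta1 | x2]

h_prior = entropy_e([prior, 1 - prior])
h_given_test = (p_pos * entropy_e([post_pos, 1 - post_pos])
                + p_neg * entropy_e([post_neg, 1 - post_neg]))
gain = h_prior - h_given_test
print(f"He(theta) = {h_prior:.3f}, He(theta|x) = {h_given_test:.3f}, I = {gain:.3f}")
# He(theta) = 0.673, He(theta|x) = 0.150, I = 0.523
```

The gain is the mutual information between the disease status and the test result: the average reduction in uncertainty about the disease once the result is known.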
! Entropy
He(θ) =
He(X) =
He(X, θ) =
! Conditional entropy
He(X|θ) = He(X, θ) − He(θ) =
He(θ|X) = He(X, θ) − He(X) =
! Mutual information (Information gain), I(X; θ)
Ie(X; θ) = He(θ) − He(θ|X) =
Ie(X; θ) = He(X) − He(X|θ) =
Ie(X; θ) = He(X) + He(θ) − He(X, θ) =
Y. H. Chun (2005), “Serial Inspection Plan in the Presence of Inspection Errors: Maximum Likelihood and Maximum Entropy Approaches,” Quality Engineering, Vol. 17, No. 4, pp. 627–632.
* Quantity of Information
! The quantity of information about a random variable X that can be obtained by observing another random variable Y:
I(X;Y) = H(X) − H(X|Y)
       = H(Y) − H(Y|X)
       = H(X) + H(Y) − H(X,Y)
$$I(X;Y) = \sum_x \sum_y p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$
1. If, after receiving a message about Y, the value of X is completely defined, then H(X|Y) = 0 and I(X;Y) = H(X).
2. If X and Y are independent, then H(X|Y) = H(X) and I(X;Y) = 0.
! Venn diagram
[Figure: Venn diagram of two overlapping circles H(X) and H(Y); the regions are H(X|Y), I(X;Y), and H(Y|X), and the union is H(X,Y)]
Ex] Two variables, X and Y

Joint probabilities
             Y = 1    Y = 0
X = 1        0.25     0.25     0.50
X = 0        0.10     0.40     0.50
             0.35     0.65     1.0

! Entropy
He(X) =        and H(X) =
He(Y) =        and H(Y) =
He(X,Y) =      and H(X,Y) =
! Conditional entropy
H(X|Y) = H(X,Y) − H(Y) =
H(Y|X) = H(X,Y) − H(X) =
! Mutual information, I(X;Y)
I(X;Y) = H(Y) − H(Y|X) =
I(X;Y) = H(X) − H(X|Y) =
I(X;Y) = H(X) + H(Y) − H(X,Y) =
! Venn diagram
[Figure: Venn diagram with H(X) = 1.000, H(Y) = 0.934, H(X|Y) = 0.927, I(X;Y) = 0.073, H(Y|X) = 0.861, and H(X,Y) = 1.861]
!
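The entropy, conditional entropy, and mutual information of the two-variable example can be checked numerically. The sketch below is only an illustration: the helper names `entropy` and `marginal` are assumptions, logarithms are taken to base 2 (bits), and the joint table is the one given for X and Y in this example.

```python
from math import log2

# Joint probabilities p(x, y) from the two-variable example in these notes.
joint = {
    (1, 1): 0.25, (1, 0): 0.25,
    (0, 1): 0.10, (0, 0): 0.40,
}

def entropy(probs):
    """Shannon entropy in bits; zero-probability cells contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

def marginal(table, axis):
    """Sum the joint table over the other variable (axis 0 = X, axis 1 = Y)."""
    totals = {}
    for key, p in table.items():
        totals[key[axis]] = totals.get(key[axis], 0.0) + p
    return totals

h_x = entropy(marginal(joint, 0).values())
h_y = entropy(marginal(joint, 1).values())
h_xy = entropy(joint.values())
h_x_given_y = h_xy - h_y          # H(X|Y) = H(X,Y) - H(Y)
h_y_given_x = h_xy - h_x          # H(Y|X) = H(X,Y) - H(X)
mutual_info = h_x + h_y - h_xy    # I(X;Y) = H(X) + H(Y) - H(X,Y)

for name, value in [("H(X)", h_x), ("H(Y)", h_y), ("H(X,Y)", h_xy),
                    ("H(X|Y)", h_x_given_y), ("H(Y|X)", h_y_given_x),
                    ("I(X;Y)", mutual_info)]:
    print(f"{name} = {value:.3f} bits")
```

Rounded to three decimals, these agree with the values shown in the Venn diagram for this example.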
Ex] Suppose that during the 1985 season, LSU played the first 10 football games as follows. Did we win the eighth game?

                       When? Y2
    Where? Y1      Day       Night     Total
    Home           2W 0L     1W 0L     3W 0L
    Away           0W 3L     3W 1L     3W 4L
    Total          2W 3L     4W 1L     6W 4L

A. Case 1. No questions allowed.
   The probability distribution is {6/10, 4/10} and its entropy is $H_e(X) =$
   The reasonable answer is Won! In that case, the error rate = 40%.

B. Case 2. Only one question allowed:
   - Option 1. Ask where? Y1
       If the response is Home, what is your answer?
       If the response is Away, what is your answer?
   - Option 2. Ask when? Y2
       If the response is Day, what is your answer?
       If the response is Night, what is your answer?

                          Result X
                        Win    Lose    Total    Entropy
    Where? Y1   Home    3      0       3
                Away    3      4       7
    When? Y2    Day     2      3       5
                Night   4      1       5
    Total               6      4       10

   Which option is more effective?  $H_e(Y_1) =$      $H_e(Y_2) =$
- Option 1. Ask where? Y1

  The conditional (mean) entropy $H_e(X \mid Y_1)$:
      $H_e(X \mid Y_1) =$
  Mutual information:
      $I(X; Y_1) = H_e(X) - H_e(X \mid Y_1) =$

  Joint probability distribution:

                     Where? Y1
      Result X     Home    Away    Total
      Win          0.3     0.3     0.6
      Lose         0.0     0.4     0.4
      Total        0.3     0.7     1.0

  The joint entropy $H_e(X, Y_1) =$
  Mutual information:
      $I(X; Y_1) = H_e(X) + H_e(Y_1) - H_e(X, Y_1) =$
  Error rate =

  [Tree: {6/10, 4/10}; Home (3/10): {3/3, 0/3}; Away (7/10): {3/7, 4/7}]
- Option 2. Ask when? Y2

  The conditional (mean) entropy $H_e(X \mid Y_2)$:
      $H_e(X \mid Y_2) =$
  Mutual information:
      $I(X; Y_2) = H_e(X) - H_e(X \mid Y_2) =$

  Joint probability distribution:

                     When? Y2
      Result X     Day     Night   Total
      Win          0.2     0.4     0.6
      Lose         0.3     0.1     0.4
      Total        0.5     0.5     1.0

  The joint entropy $H_e(X, Y_2) =$
  Mutual information:
      $I(X; Y_2) = H_e(X) + H_e(Y_2) - H_e(X, Y_2) =$
  Error rate =

  [Tree: {6/10, 4/10}; Day (5/10): {2/5, 3/5}; Night (5/10): {4/5, 1/5}]
C. Case 3. Two questions are allowed:

   - Option 1:        Error rate = 1/10    Expected number of questions = 2.0    I(X;Y) = 0.448
   - Option 2:        Error rate = 1/10    Expected number of questions = 1.7    I(X;Y) = 0.448
   - Option 3: Best!  Error rate = 1/10    Expected number of questions =        I(X;Y) = 0.4228

   [Figure: decision trees for the three options, branching on Where? (Home/Away) and When? (Day/Night). Leaf labels: Won: {1, 0}; Lost: {0, 1}; Won: {0.75, 0.25}; Won: {4/5, 1/5}.]
The Wizard of Odds

The Deer Hunter

“Let's play a game of Russian roulette. You are tied to your chair and can't get up. Here's a gun. Here's the barrel of the gun, six chambers, all empty.

Now watch me as I put two bullets in the gun. See how I put them in two adjacent chambers? I close the barrel and spin it.

I put the gun to your head and pull the trigger. Click. You're still alive. Lucky you! Now, I'm going to pull the trigger one more time.

Which would you prefer, that I spin the barrel first, or that I just pull the trigger?”
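One way to reason about the Russian roulette question is to enumerate the chambers. The sketch below is only an illustration under stated assumptions: the six chambers are numbered 0 to 5, the two bullets sit in chambers 0 and 1, every chamber is equally likely after a spin, and the cylinder advances by exactly one chamber per trigger pull. The function names are hypothetical.

```python
from fractions import Fraction

CHAMBERS = 6
BULLETS = {0, 1}  # assumption: the two adjacent loaded chambers

def survive_if_spin():
    """After a fresh spin, every chamber is equally likely to be fired."""
    empty = CHAMBERS - len(BULLETS)
    return Fraction(empty, CHAMBERS)

def survive_if_no_spin():
    """The first pull was a click, so it hit one of the empty chambers.
    Without a spin, the next pull fires the following chamber."""
    first_pull_empty = [c for c in range(CHAMBERS) if c not in BULLETS]
    safe_next = [c for c in first_pull_empty
                 if (c + 1) % CHAMBERS not in BULLETS]
    return Fraction(len(safe_next), len(first_pull_empty))

print("Spin first:     ", survive_if_spin())
print("Just pull again:", survive_if_no_spin())
```

Under these assumptions the conditional probability of surviving without a spin is higher than after a fresh spin, because only one of the four empty chambers is followed by a bullet.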
Exercise Problems

Suppose that, based on the patient's symptoms, the prior probability that the patient has a certain disease is 20%. The doctor wants to further conduct a comprehensive blood test. The blood test is 50% effective in detecting the disease when it is, in fact, present. However, the test also yields a “false positive” result for 50% of the healthy persons tested.

                       X: Test results?
    Y: Virus?      Positive, X1    Negative, X2    Total
    Yes, Y1        0.10            0.10            0.20
    No, Y2         0.40            0.40            0.80
    Total          0.50            0.50            1.00

(a) Find the entropy H(Y) and express it in bits.

(b) Find the joint entropy H(X, Y) and express it in bits.

(c) Find the conditional entropy H(Y|X) and express it in bits.

(d) What is the reduction in the uncertainty of Y due to the knowledge of X? Express it in bits.
Session 4. Descriptive Modeling

“A good display has many purposes, but it reaches its highest value when it forces you to see something you weren't expecting.” – by Laurie Snell

* Organizing Numerical Data

! Ordered array: Ordered sequence of the raw data in rank order
    Microsoft Excel: Data => Sort…
! Stem-and-leaf display: Separates data entries into leading digits (stems) and trailing digits (leaves)

    7 | 1 3 5 7 8
    8 | 0 0 1 1 2 3 3 5 5 7 7 8 9
    9 | 0 1 1 2 2 3 5 7

* Summarizing Data

   I. Tabular and Graphical Methods: Presenting data in tables and charts
        Excel: Insert => Chart…
   II. Numerical Methods:
        - Landmark summaries (Central tendency): Interpreting typical values
        - Dispersion (Variation): Dealing with diversity
        Excel: Tools => Data Analysis…
4.1. Tabular and Graphical Methods

“One picture is worth more than ten thousand words.”

A. Qualitative (or Categorical) Data:

   1. Tabular methods
      Frequency or percent frequency distribution

      Final Grade    A      B      C      Total
      Number         24     40     16     80
      Frequency      0.3    0.5    0.2    1.0

   2. Graphical methods:
      ! Bar chart
          Excel: Insert => Charts… => Column
      ! Pie chart
          Excel: Insert => Charts… => Pie

      [Bar chart: grades A, B, and C on the horizontal axis, counts from 0 to 50 on the vertical axis; series: Male, Female]
B. Quantitative (or Numerical) Data

   1. Tabular methods
      - Frequency distribution
      - Relative frequency distribution (percentage distribution)
      - Cumulative frequency distribution

      Final Exam    70~79.99    80~89.99    90~99.99    Total
      Number        24          40          16          80
      Frequency     0.3         0.5         0.2         1.0

      Excel: Tools => Data Analysis… => Histogram

   2. Graphical methods
      - Dot plot
      - Histograms:
      - Cumulative distribution: Ogive
      Excel: Insert => Charts… => Column

      [Histogram with cumulative percentage curve: bins 50, 60, 70, 80, 90, 100, More on the horizontal axis (Bin); Frequency from 0 to 25; cumulative percentage from 0% to 120%]
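The frequency, relative frequency, and cumulative frequency distributions listed above can be built from the class counts in a few lines. This is a minimal sketch using the Final Exam counts from the table; the function name `frequency_table` is an assumption made for illustration.

```python
from itertools import accumulate

# Class counts from the Final Exam table in these notes.
counts = {"70~79.99": 24, "80~89.99": 40, "90~99.99": 16}

def frequency_table(class_counts):
    """Return (class, count, relative frequency, cumulative frequency) rows."""
    total = sum(class_counts.values())
    relative = [c / total for c in class_counts.values()]
    cumulative = list(accumulate(relative))
    return [(label, count, rel, cum)
            for (label, count), rel, cum
            in zip(class_counts.items(), relative, cumulative)]

rows = frequency_table(counts)
for label, count, rel, cum in rows:
    print(f"{label:>10}  {count:3d}  {rel:.2f}  {cum:.2f}")
```

The last cumulative value reaches 1.0, which is a quick sanity check that no class was dropped.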
C. Bivariate Data

   1. Qualitative data: Contingency tables

      ! Contingency table

                          Blood test
        Patient        Positive    Negative    Total
        Virus          9           1           10
        No virus       6           34          40
        Total          15          35          50

      ! Probability table

                          Blood test
        Patient        Positive    Negative    Total
        Virus          0.18        0.02        0.20
        No virus       0.12        0.68        0.80
        Total          0.30        0.70        1.00

      Excel: Data => Pivot table and pivot chart report…

   2. Quantitative data:

      ! Scatter plot
          Excel: Insert => Charts… => XY(Scatter)

      [Scatter plot of Mileage and Age]
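Turning a contingency table into a probability table only requires dividing each cell by the grand total. The sketch below uses the blood-test counts from these notes; the function name `probability_table` and the row/column ordering are assumptions made for illustration.

```python
# Rows: virus, no virus. Columns: positive, negative (counts from the notes).
counts = [[9, 1],
          [6, 34]]

def probability_table(table):
    """Divide every cell by the grand total and add marginal totals."""
    total = sum(sum(row) for row in table)
    joint = [[cell / total for cell in row] for row in table]
    row_totals = [sum(row) for row in joint]
    col_totals = [sum(col) for col in zip(*joint)]
    return joint, row_totals, col_totals

joint, row_totals, col_totals = probability_table(counts)
for row, row_total in zip(joint, row_totals):
    print("  ".join(f"{p:.2f}" for p in row), f" | {row_total:.2f}")
print("  ".join(f"{p:.2f}" for p in col_totals))
```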
D. Chart Wizards in Microsoft Excel

Charts are used to analyze data graphically. Well-created and formatted charts can help people and businesses make decisions based on the impact that their images provide to the users. Microsoft Excel is equipped with the Chart Wizard that allows you to create and format a chart to suit almost any scenario or need.

1. Launch Microsoft Excel (You don't know how?)

2. Enter the data to be graphed.

3. Select the data you want to graph.
   Select both the numeric data and adjacent row and column labels; Excel uses the labels for legend and axis information. If the data is not contiguous, use Excel's multiple selection technique; select the first range of data, hold down the Control key, and then select the second range of data.
4. Select the Chart Wizard.
   The Chart Wizard guides you through each of the steps necessary to create a professional-looking chart. After you select the data you want to chart, click the Chart Wizard button on the toolbar. The Chart Wizard displays a series of step-by-step questions and dialog boxes that prompt you for information about the chart.

   Step 1 of the Chart Wizard lets you choose the type of chart you want. Select the type of chart in the left column, and then choose the sub-type in the right column.

   In Step 2, specify the location and orientation of your data. If you selected data before clicking the Chart Wizard, those cells appear in the Data Range box. The Series section controls which labels appear on the X-axis.

   Step 3 of the Chart Wizard presents numerous options ranging from Chart Titles to Legends and Gridlines. To change one of these options, click the tab at the top of the Step 3 dialog box.

   In Step 4, the Chart Wizard prompts you for a chart location. You can create the chart on the same worksheet as the data or on another worksheet. Click Finish after you select the location for your chart.

5. Manipulating the Chart
   An Excel chart is a worksheet object. Like other worksheet objects, you can move it, change its size, delete it and place borders around it.

* Sample Data Set
   - Credit score
   - Riding lawn mower
   - Movies
   - MPG
   - Titanic
4.2. Numerical Methods

“79.48% of all statistics are made up on the spot.” by John Paulos

A. Measures of Central Tendency (or Location)

   1. Arithmetic mean: a typical value for quantitative data
      - Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$, where N is the population size
      - Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where n is the sample size

      “When the Okies migrated from Oklahoma to California, they raised the average IQ's of both states.” --- Will Rogers

   2. Weighted average: Adjusting for importance

   3. α% trimmed mean: Removing outliers
      Remove the smallest α% and the largest α% of the data values and then compute the mean of the middle (100 − 2α)% of the data.

   - Average Difficulty
     At a recent fund-raising dinner, a group of six MBA alumni, sitting around a table, were, oddly enough, discussing compensation issues. Although reticent to divulge their own individual annual compensations, they agreed that it would be useful if they knew the average salary of the group. Can you devise a strategy that would enable them to know the group average, without anybody knowing the salaries of anybody else?
Ex] Mean is meaningless?

“The great majority of people have more than the average number of legs! Among the 57 million people in Britain there are probably 5,000 people who have only one leg. Therefore, the average number of legs is

    (5,000 × 1 + 56,995,000 × 2) / 57,000,000 = 1.9999123.

Most people have two legs...”

   4. Median:
      - A typical value for quantitative and ordinal data.
          If n is odd, the value of the middle item.
          If n is even, the average of the two middle items.

   5. Mode:
      - A typical value for nominal data.
      - Data value that occurs with greatest frequency.

   6. pth Percentile
      - p percent of the total observations are below that value.

   7. Quartile:
      - First, second, and third quartiles.

* Box-and-Whisker Plot: Five-Number Summary
    Min, Q1, Q2, Q3, Max
Ex] Class-size data for a sample of five classes:

    46    54    42    46    32

(a) Ordered array: Arrange the data in ascending order.

    32    42    46    46    54

(b) Find the sample average:

    [Number line from 30 to 60 in steps of 5]

    $\bar{x} =$

(c) Find the sample median:

(d) Find the sample mode:
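The measures of location for the class-size data can be checked with Python's standard `statistics` module. This is a sketch, not the method used in class: the helper `trimmed_mean` is a hypothetical name, and it assumes the trimming percentage is applied to each tail and rounded down to a whole number of observations.

```python
from statistics import mean, median, mode

class_sizes = [46, 54, 42, 46, 32]  # class-size data from the notes

def trimmed_mean(values, percent):
    """Drop the smallest and largest `percent`% of values, then average the rest.
    Assumption: the number dropped per tail is rounded down."""
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)

print("Ordered array:", sorted(class_sizes))
print("Mean:  ", mean(class_sizes))
print("Median:", median(class_sizes))
print("Mode:  ", mode(class_sizes))
print("20% trimmed mean:", trimmed_mean(class_sizes, 20))
```

With five observations, a 20% trim removes one value from each end, so the trimmed mean is the average of 42, 46, and 46.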
Ex] Warm up your calculator!

Consider the ages and mileages of the four automobiles in the sample.

    Car       Age    Mileage
    1         1      16
    2         3      40
    3         5      54
    4         8      108
    Mean
    Median
B. Measures of Dispersion (or Variation)

“All measurements are subject to variation.”

   1. Range
        $= X_{\max} - X_{\min}$

   2. Interquartile range (IQR)
        $= Q_3 - Q_1$

   4. Variance:
      - Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
      - Sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ (A)   or   $s^2 = \dfrac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$ (B)

   5. Standard deviation:
      - Population standard deviation: $\sigma = \sqrt{\sigma^2}$
      - Sample standard deviation: $s = \sqrt{s^2}$

   6. Coefficient of variation: Relative variation
        $\dfrac{\sigma}{\mu} \times 100\%$   or   $\dfrac{s}{\bar{x}} \times 100\%$
Ex] Class-size data for a sample of five classes:

    Class i    x_i    (x_i − x̄)    (x_i − x̄)²    x_i²
    1          46
    2          54
    3          42     −2           4             1764
    4          46     2            4             2116
    5          32     −12          144           1024

    $\bar{x} =$

(a) Find the sample variance:
    - Use the formula (A):  $s^2 =$
    - Use the formula (B):  $s^2 =$

(b) Find the sample standard deviation:  $s =$

(c) Find the coefficient of variation:  cv =

Ex] TI-30Xa: Consider the following sample data.

    Car                   Age    Mileage
    1                     1      16
    2                     3      40
    3                     5      54
    4                     7      108
    5                     14     137
    Average
    Variance
    Standard deviation
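The two sample-variance formulas, (A) from squared deviations and (B) from the sum of squares, should give the same number. The sketch below checks this on the class-size data; the function names are assumptions made for illustration.

```python
from math import sqrt

class_sizes = [46, 54, 42, 46, 32]  # class-size data from the notes

def sample_variance_a(xs):
    """Formula (A): sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_b(xs):
    """Formula (B): (sum of x_i^2 - n * xbar^2) divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)

var_a = sample_variance_a(class_sizes)
var_b = sample_variance_b(class_sizes)
std_dev = sqrt(var_a)
cv_percent = std_dev / (sum(class_sizes) / len(class_sizes)) * 100

print("s^2 (A):", var_a)
print("s^2 (B):", var_b)
print("s:", std_dev)
print(f"cv: {cv_percent:.2f}%")
```

Formula (B) is convenient on a calculator because it needs only the running sums of x and x², but with large values it is more exposed to rounding error than (A).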
C. Shape of the Distribution

   ! Unimodal or bimodal distribution
   ! Symmetrical or skewed distribution
      - Left-skewed: longer lower tail:    Mean < Median
      - Right-skewed: longer right tail:   Mean > Median

“You know how dumb the average guy is? Well, by definition, half of them are even dumber than that.”   J. R. “Bob” Dobbs

Ex] Suppose that the mean price of a house in Lafayette is $300,000 and the median price is $80,000.

(a) What does this say about the distribution of house prices here?

(b) If the founder of a hi-tech company builds a $10 million dollar mansion in the community, which goes up more, the mean or the median value of the houses?
D. Bivariate Data

   ! Covariance: Cov(x, y) = 237.50
   ! Correlation coefficient: Corr(x, y) = r_xy = +0.950

Ex] The mean age of members of the class of '54 at their fiftieth reunion was 72.75 years. At their fifty-first reunion the next year, the mean age was 71.67 years. How can the mean age decrease when all the class members are a year older?

Ex] Thirty-six students took the final exam last year, on which the passing score was 70. The mean score of those who passed was 78, the mean score of those who failed was 60, and the mean score of all scores was 71. How many students did not pass the exam? (from MathCounts for middle school students)
4.3. Descriptive Modeling with Excel

A. Useful functions

Microsoft Excel provides a number of functions for common statistical operations. To use these functions, choose Insert and Function, or click the Paste Function tool fx located on the Standard toolbar.

Typically, the syntax of a function has three major parts: an equal sign, the function name, and a range of cells.

All functions begin with the = sign. The function name usually indicates what the function does. For example, =STDEV(A10:A25) calculates a standard deviation for the cells in A10 through A25. Within parentheses, specify the input information called the arguments or parameters. In the case of STDEV( ), the arguments are the series of numbers for which a standard deviation is calculated. In most cases, function arguments are cell references.

Use Excel's Help System or the Paste Function dialog box to get help with functions.

Some of the useful functions in descriptive statistics are
    average, max, min, var, stdev, count, mode, median, quartile, percentile, trimmean, skew, kurt, covar, correl
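For readers working outside Excel, several of these descriptive statistics have direct counterparts in Python's standard `statistics` module. The sketch below applies them to the five-car Age and Mileage sample from the TI-30Xa exercise in these notes. The helpers `covariance` and `correlation` are written out by hand as an assumption-free illustration of the sample (n − 1) versions; they are not a claim about how any spreadsheet function is implemented.

```python
from statistics import mean, median, stdev, variance

# Five-car sample (Age, Mileage) from the TI-30Xa exercise in these notes.
age = [1, 3, 5, 7, 14]
mileage = [16, 40, 54, 108, 137]

def covariance(xs, ys):
    """Sample covariance: cross-products of deviations divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Correlation coefficient: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))

print("Average:", mean(age), mean(mileage))
print("Median: ", median(age), median(mileage))
print("Variance:", variance(age), variance(mileage))
print("Standard deviation:", stdev(age), stdev(mileage))
print("Covariance:", covariance(age, mileage))
print("Correlation:", correlation(age, mileage))
```

The covariance and correlation printed here match the values quoted for bivariate data in Section 4.2 (237.50 and +0.950).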
B. Analysis ToolPak

Microsoft Excel provides a set of data analysis tools, called the Analysis ToolPak, that you can use to save steps when you develop complex statistical or engineering analyses. You provide the data and parameters for each analysis; the tool uses the appropriate statistical or engineering macro functions and then displays the results in an output table. Some tools generate charts in addition to output tables.

To view a list of available analysis tools, click Data Analysis on the Tools menu. If the Data Analysis command is not on the Tools menu, you need to install the Analysis ToolPak. (Click Add-Ins and check the box, Analysis ToolPak.)

In the Data Analysis dialog box, select Descriptive Statistics. In the Descriptive Statistics dialog box, specify the cells that contain your data in the Input Range box. Click the Summary Statistics checkbox in the lower left corner. By default, Excel generates the statistics on a new worksheet.
The Wizard of Odds

Figures don't lie, but liars figure.

! Case 1. The Sin of the Missing Zero

The lead paragraph of the article called it “the biggest land boom in American History”. To show the dramatic growth in land values, the article included the graph shown below. At first glance, it looks as though there has been an increase of about twenty-five-fold during this period, since the height of the line plotted for 2003 is about 25 times the height plotted for 1986.

    Year     1986     1987     1988     1989     1990     …    2001      2002      2003
    Price    60000    66000    72600    79860    87846    …    250635    275698    303268

[Figure: the price series plotted twice against Year, once with the vertical axis running from 50000 to 350000 and once with the vertical axis running from 0 to 350000]

! Case 2. Size matters?

Yes! size matters…

Number of female college graduates

[Figure: pictograph for 1980, 1990, and 2000 with values 10, 20, and 30]
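The effect of the missing zero can be quantified by comparing the ratio of plotted heights with the ratio of the actual values. This sketch assumes the distorted chart starts its vertical axis at 50000, which is the lowest tick visible in the figure; the function name `apparent_ratio` is hypothetical.

```python
def apparent_ratio(first, last, baseline):
    """Ratio of plotted heights when the vertical axis starts at `baseline`."""
    return (last - baseline) / (first - baseline)

price_1986 = 60000   # from the table in the notes
price_2003 = 303268

truncated = apparent_ratio(price_1986, price_2003, baseline=50000)
honest = apparent_ratio(price_1986, price_2003, baseline=0)

print(f"Axis starting at 50000: heights differ by a factor of {truncated:.1f}")
print(f"Axis starting at 0:     heights differ by a factor of {honest:.1f}")
```

The prices themselves grew roughly five-fold, so the twenty-five-fold impression comes entirely from where the axis starts.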
Exercise Problems

Problem 1. The 28 test scores of a data mining class were arranged in a stem and leaf plot as illustrated.

    W | 0 1 2 3 3 6 7
    X | 0 2 3 3 5 7 8 9
    Y | 1 1 4 6 8 9 9
    Z | 0 2 4 6 8 9

What is the median of the given data? (5 points)

Problem 2. Mr. Jason Tate, Manager at Red Stick Taxi, selected a random sample of 4 taxicabs and recorded the number of round trips made to the local airport on October 2. The resulting data are as follows: (15 points)

    Cab, i            1    2    3    4
    Round trips, x    4    0    6    3

(a) Compute the mean number of trips for this sample of cabs.

(b) Compute the median number of trips for this sample of cabs.

(c) Compute the sample standard deviation for this sample of cabs.
Session 5. Data Collection and Preprocessing

* Data Set

! A data set is a set of measurements taken from some environment or process.
! We have a collection of n objects, and for each object we have a set of the same k measurements.

    ID      Age    Sex    Education      Income
    142     36     M      High School    56,000
    256     23     F      College        45,000
    ..      ..     ..     ..             ..
    452     33     M      Advanced       85,000

    n × k data matrix

   - n rows may be referred to as elements, individuals, items, entities, objects, cases, or records.
   - k columns may be referred to as variables, features, attributes, or fields.
   - Observation: the set of measurements obtained for a particular element.
   - Text or image data bases are also important sources of information, and data mining methods can help in (i) retrieving useful text from large collections of documents (e.g., Google) or (ii) recognizing specific patterns in images (e.g., mammogram).
   - In ISDS 4141, we will restrict our discussion to numerical data.
5.1. Data Collection

* Gathering Data

   1. Obtain data already published by other sources.
      (Secondary data: Internal or external sources)
      - Netflix Prize data set
      - Fates of 2201 Titanic passengers: Their booking classes, ages, and genders.
      - Sports almanac, FBI crime statistics, US Census, etc.

   2. Design an experiment to obtain the necessary data.
      - Variables are identified and controlled.
      - “Take the Pepsi challenge”
      - Microwave popcorn: Power level and time

   3. Make observations through an observational study.
      - Variables of interest are not controlled.
      - Drive-thru service time per vehicle at Taco Bell
      - Price difference between Lowe's and Home Depot

   4. Conduct a survey.
      - Mail survey, personal interview, telephone interview
      - USA Today/CNN/Gallup Presidential election poll
      - President's job approval ratings
      - Consumer Reports: Owner satisfaction survey
* Errors in Survey Research

   1. Coverage Error
      - Selection bias => Non-representative sample
        Ex) Unlisted telephone number
        Ex) How big is the fish in the pond? You take your wide-meshed fishing net and catch one hundred fishes, every one of which is greater than six inches long. Does this evidence support the hypothesis that no fish in the pond is less than six inches long?

   2. Non-Response Error
      - Silent majority; Callers to Rush Limbaugh or NPR

   3. Multiple Response Error
      - Internet poll

   4. Loaded questions
        Ex) “Do you favor banning private ownership of handguns in order to reduce the rate of violent crime?”

   5. Unclear Definition
        “I did not have sexual relations with that woman, Miss Lewinsky.”
        “It depends on what the meaning of the word ‘is’ is.”
   6. Dishonesty
      - Exaggerate their income, understate their age, or provide answers that they think are “acceptable.”
        Ex) Sex surveys are often less insightful than a good novel […] such idle musing. The Journal of the American Medical Association published a study last month titled “Sexual Dysfunction in the U.S.” Though it was conducted with scientific care, I suspect that even these results are seriously flawed. One major reason for both the blandness and the likely inaccuracy of these surveys is that people don’t tell the truth about their sexual practices (surprise!), especially to strangers.
        In most surveys, the average number of sex partners reported by heterosexual males, for example, is significantly greater than the average number reported by heterosexual females.

   7. Inappropriate Vocabulary Level
        “Have you patronized a commercial source of cinematic entertainment within the past month?”
        => “Have you gone to a movie within the past month?”

   8. Measurement Error
      - Hanging, pregnant, and dimpled chads.

   9. Sampling Error
      - Margin of error, which is unavoidable
* Case 1: Michael J. Himowitz, “Online Voters Get Hankerin' for Anarchy.” The Philadelphia Inquirer, (May 28, 1998), F6.

The editors of People magazine asked their online readers to vote for Most Beautiful Person. The official ballot included many well-known celebrities and also included a spot to write in a name.

Howard Stern, a radio personality, suggested that his fans write in one of his cohorts on his radio, namely Hank “the Angry Drunken Dwarf.” Stern's fans gave Hank over 230,000 votes, by far the most garnered by any one person.

(a) It has been said that in the future, everything will be done on the Internet. Describe how you might try to conduct a survey on the Internet. Even if we assume that we have access to all e-mail addresses in the country, what problems would we face? (5 points)

(b) Who was Hank, the Angry Drunken Dwarf? (95 points)

* Case 2: As part of its 25th reunion celebration, the class of '75 of LSU mailed a questionnaire to its members. One of the questions asked the respondent to give his or her total income last year.

Of the 4,820 members of the class '75, the Alumni Office had addresses for 583. Of those, 220 returned the questionnaire.

The Reunion Committee computed the mean income given in the responses and announced, “The members of '75 have enjoyed resounding success. The average income of class members is $150,000!”

This result exaggerates the income of the members of the Class of '75 for many reasons. What are these reasons?
5.2. Sampling Methods

1. Simple Random Sample
   Probability of selecting each element in the population is the same for each and every element, and the chance of selecting one element is independent of whether some other element is chosen.

2. Systematic Sample
   Obtained by taking every kth element on a list of all elements in the population.

3. Stratified Random Sample
   The population is divided into strata (groups) and a random sample is taken from the elements in each stratum (group).

4. Cluster Sample (two-stage cluster sampling)
   The elements in the population are divided into a number of clusters or groups. Choose at random a sample of these clusters, after which a simple random sample of the elements in each chosen cluster is selected.
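The four sampling methods described above can be sketched with Python's standard `random` module. Everything here is illustrative: the function names, the population of 100 numbered elements, the two strata, the five clusters, and the fixed seed are all assumptions chosen so the example is repeatable.

```python
import random

def simple_random_sample(population, n, rng):
    """Every element has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Take every k-th element, starting at a random position among the first k."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, sizes, rng):
    """Draw a separate simple random sample inside each stratum."""
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Two-stage: pick clusters at random, then sample inside each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

rng = random.Random(4141)              # fixed seed so the sketch is repeatable
population = list(range(1, 101))       # hypothetical list of 100 elements

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
strata = {"low": population[:50], "high": population[50:]}
stratified = stratified_sample(strata, {"low": 5, "high": 5}, rng)
clusters = {f"c{i}": population[i * 20:(i + 1) * 20] for i in range(5)}
clustered = cluster_sample(clusters, 2, 4, rng)

print(srs, systematic, stratified, clustered, sep="\n")
```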
Ex) Suppose that LSU employs 2,000 male and 500 female faculty members. The equal employment opportunity officer polls a stratified random sample of 200 male and 200 female faculty members.

Each member of the sample is asked, “In your opinion, are female faculty members in general paid less than males with similar positions and qualifications?”

180 of the 200 females and 60 of the 200 males say "Yes." So 240 of the sample of 400 (60%) answered "Yes," and the officer therefore reports that “based on a sample, we can conclude that 60% of the total faculty feel that female members are underpaid relative to males.”

(a) Explain why this conclusion is wrong.

(b) Give an unbiased estimate of the proportion of the total faculty who feel that females are underpaid.

              N        n      x      x/n
    Male      2,000    200    60     30%
    Female
    Total     2500     400    240    60%
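The difference between pooling the sample and weighting each stratum by its population share can be shown numerically. The sketch below uses the faculty counts stated in the example; the function names are assumptions, and the weighted estimator shown is the usual stratified estimate (each stratum's sample proportion weighted by N_h / N), offered as one reasonable way to approach part (b).

```python
# Population size N, sample size n, and "Yes" count per stratum (from the example).
strata = {
    "male":   {"N": 2000, "n": 200, "yes": 60},
    "female": {"N": 500,  "n": 200, "yes": 180},
}

def pooled_proportion(groups):
    """Naive estimate: ignores that the strata were sampled at different rates."""
    return sum(g["yes"] for g in groups.values()) / sum(g["n"] for g in groups.values())

def stratified_proportion(groups):
    """Weight each stratum's sample proportion by its share of the population."""
    total = sum(g["N"] for g in groups.values())
    return sum(g["N"] / total * g["yes"] / g["n"] for g in groups.values())

print(f"Pooled sample proportion:   {pooled_proportion(strata):.2f}")
print(f"Stratified (weighted) estimate: {stratified_proportion(strata):.2f}")
```

Females make up half of the sample but only one fifth of the faculty, which is why the pooled figure overstates the overall proportion.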
5.3. Data Pre-Processing

* Why?

! Real-world databases are highly susceptible to noisy, missing, and inconsistent data due to their typically huge size, faulty data collection instruments, data entry problems, and discrepancy in naming conventions.
   - Incomplete: Lacking attribute values or certain attributes of interest, or containing only aggregate data.
   - Noisy: Containing errors, or outlier values.
   - Inconsistent: Containing discrepancies in the codes or names.

! Data quality is a key issue: GIGO!

! Thus, “data preprocessing represents up to 80 percent of the overall mining project,” which is an extraordinary amount of time spent just preparing the right data.

* How?

! How can we preprocess the data so as to improve the quality of data?
   - Data cleaning
   - Data integration (re-consolidation)
   - Data transformation
   - Data reduction
5.4. Data Cleaning

! How to clean the dirty data by filling in missing values, smoothing out noisy data, identifying and removing outliers, and resolving inconsistencies in the data?

(1) Missing Values
   - Find the value and fill it in manually.
   - Delete the variables or items with missing values.
   - Use the attribute mean to fill in the missing value.
   - Use the attribute mean for all samples belonging to the same class.
   - Use the most probable value to fill in the missing value: Regression, decision tree, or Bayesian method.

Ex) To play or not to play golf, that is the question!

    i    Temp, X1    Wind, X2    Play?, Y
    1    84          No          Yes
    2    ??          No          Yes
    3    86          Yes         Yes
    4    88          Yes         No
    5    94          No          No

   - Method 1: Fill with the attribute mean.
       Average temperature:
   - Method 2: Fill with the attribute mean of the same class.
       Average temperature when Y = Yes:
   - Method 3: Most likely prediction
       If Y = Yes and X2 = No, then X1 =
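The three ways of filling in the missing temperature can be compared side by side. This sketch uses the five golf records from the example; the dictionary keys and the function name `impute` are hypothetical, and Method 3 is approximated here by averaging the records that match on both Play and Wind, which is only a stand-in for a proper regression, decision tree, or Bayesian prediction.

```python
from statistics import mean

# Golf records from the example; None marks the missing temperature.
rows = [
    {"temp": 84,   "wind": "No",  "play": "Yes"},
    {"temp": None, "wind": "No",  "play": "Yes"},
    {"temp": 86,   "wind": "Yes", "play": "Yes"},
    {"temp": 88,   "wind": "Yes", "play": "No"},
    {"temp": 94,   "wind": "No",  "play": "No"},
]

def impute(records, target, match_on=()):
    """Mean temperature over complete records that agree with `target`
    on every attribute named in `match_on`."""
    candidates = [r["temp"] for r in records
                  if r["temp"] is not None
                  and all(r[key] == target[key] for key in match_on)]
    return mean(candidates)

missing = rows[1]
method_1 = impute(rows, missing)                              # attribute mean
method_2 = impute(rows, missing, match_on=("play",))          # mean within the same class
method_3 = impute(rows, missing, match_on=("play", "wind"))   # closest matching records

print(method_1, method_2, method_3)
```

The more attributes the fill-in conditions on, the fewer records it can draw from; with this tiny table the third method rests on a single record.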
(2) Noisy Data

! Noise is a random error or variance in a measured variable.
    Ex) Valid (CEO's salary), Error (Age=200)

(a) How to detect outliers?
   - Clustering: outliers fall outside of the clusters.
   - Regression: outliers fall far from the linear line.
   - Histogram:
      1. Quartile:
         - Suspected outliers: if they are outside the inner fences:
             $Q_1 - 1.5 \times (Q_3 - Q_1)$  and  $Q_3 + 1.5 \times (Q_3 - Q_1)$
         - Outliers: if they are outside the outer fences:
             $Q_1 - 3.0 \times (Q_3 - Q_1)$  and  $Q_3 + 3.0 \times (Q_3 - Q_1)$
      2. Three-sigma limit:  $\bar{x} \pm 3 s_x$

(b) How to smooth?  Binning
   - The sorted values are distributed into a number of bins.
   - The larger the width, the greater the effect of the smoothing.
   - Steps
      1. Binning by equi-depth or equi-width.
      2. Use one of the following methods:
         - Smoothing by bin means
         - Smoothing by bin medians
         - Smoothing by the closest boundary values
(3) Inconsistent Data

   - Same concept but different attribute name
       Ex) {Gender, Sex, ..}, {Quiz-1, Q1, ..}, {ssn, student_ssn, ..}
   - Same value expressed differently
       Ex) {undergraduate; UG, ..}
           E notation: 1.536E04 and 15360; 2.6E-02 and 0.026
   - Should be corrected manually using external references
       Ex) One bank's database shows that 5% of the customers were born on 11/11/11

Ex) Outliers: Normal (100, 10)

    Mean                  97.97
    Standard Deviation    10.48
    Minimum               71.56
    First Quartile        90.98
    Second Quartile       98.98
    Third Quartile        104.77
    Maximum               130.58

    Inner fences:
    Outer fences:
    3-sigma limits:

    [Histogram: horizontal axis from 20 to 180]
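The inner fences, outer fences, and three-sigma limits can be computed directly from the summary statistics in the outlier example. The sketch below plugs in the quartiles, mean, and standard deviation from the table; the function names are assumptions made for illustration.

```python
# Summary statistics from the outlier example in these notes.
mean_x, std_x = 97.97, 10.48
q1, q3 = 90.98, 104.77
minimum, maximum = 71.56, 130.58

def fences(first_quartile, third_quartile, multiplier):
    """Lower and upper fence: quartiles pushed out by multiplier * IQR."""
    iqr = third_quartile - first_quartile
    return first_quartile - multiplier * iqr, third_quartile + multiplier * iqr

def sigma_limits(center, std, k=3):
    """Lower and upper k-sigma limits around the mean."""
    return center - k * std, center + k * std

inner = fences(q1, q3, 1.5)
outer = fences(q1, q3, 3.0)
three_sigma = sigma_limits(mean_x, std_x)

def outside(value, limits):
    return value < limits[0] or value > limits[1]

print("Inner fences:  ", inner)
print("Outer fences:  ", outer)
print("3-sigma limits:", three_sigma)
print("Maximum is a suspected outlier:", outside(maximum, inner))
print("Maximum is beyond the outer fences:", outside(maximum, outer))
```

With these figures the maximum falls outside the inner fences and the three-sigma limits but inside the outer fences, while the minimum is inside all three.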
5-12
Ex] Consider the price data: {4, 8, 15, 21, 21, 24, 26, 28, 36}

Method 1: Binning by equi-depth
Step 1: Partition into equi-depth bins of 3 items.
    {4, 8, 15}, {21, 21, 24}, {26, 28, 36}
Step 2: Smoothing
    - By bin means:      {9, 9, 9}, {22, 22, 22}, {30, 30, 30}
    - By bin medians:    {8, 8, 8}, {21, 21, 21}, {28, 28, 28}
    - By bin boundaries: {4, 4, 15}, {21, 21, 24}, {26, 26, 36}

Method 2: Binning by equi-interval
Step 1: Partition into equi-interval bins of 10.
    Bin range:
Step 2: Smoothing
    - By bin means:
    - By bin medians:
    - By bin boundaries:
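The equi-depth example can be checked with a short Python sketch. The function names are hypothetical, and one assumption is made that the notes do not state: when a value is equally close to both bin boundaries, it is moved to the lower one.

```python
from statistics import mean, median


def equi_depth_bins(values, depth):
    """Sort the values and cut them into consecutive bins of `depth` items."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_medians(bins):
    """Replace every value in a bin by the bin median."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value by the closer of the bin's min and max.

    Assumption: ties go to the lower boundary.
    """
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 26, 28, 36]
bins = equi_depth_bins(prices, 3)
print(bins)
print(smooth_by_means(bins))
print(smooth_by_medians(bins))
print(smooth_by_boundaries(bins))
```

Smoothing by boundaries keeps the extreme values of each bin unchanged, so it distorts the data least at the bin edges.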
5-13
5.5. Data Integration and Transformation

* Data Integration
! Data from multiple sources are combined into a coherent data store.
- Schema integration => entity identification problem
  Ex] customer-ID vs. cust-number
- Redundant variables: can be detected by correlation analysis
- Duplicated items
- Data value conflict because of different units, scaling, or encoding

* Data Transformation
! Data are transformed or consolidated into forms appropriate for data mining.
- Smooth the data to remove the noise.
- Aggregation (e.g., daily sales data => monthly total amount)
- Generalization (e.g., street => city or county)
- Normalization: linear transformation of the raw data
- Creation of new variables (attributes)
5-14
* Normalization
! Especially important for regression and classification methods (e.g., salary and age)

1. Min-max normalization
    X' = (X - X_min) / (X_max - X_min) * (X'_max - X'_min) + X'_min
   Scaling data values into a range such as [0, 1] or [-1, 1] prevents features with a large range, like 'salary', from outweighing features with a smaller range, like 'age'.

2. Z-score normalization (zero-mean normalization)
    X' = (X - x̄) / s_x
   where x̄ is the sample mean and s_x is the sample standard deviation.
   Excel: =STANDARDIZE(X, x̄, s_x)

3. Normalization by decimal scaling
   Move the decimal point.
    X' = X / 10^j
   where j is the smallest integer such that |X'_max| < 1.
5-15
Ex] Based on the following descriptive statistics of a variable X, normalize the value x = 80.0.

Normal (100, 10)
Mean                 97.97
Standard Deviation   10.48
Minimum              71.56
First Quartile       90.98
Second Quartile      98.98
Third Quartile      104.77
Maximum             130.58

(a) Min-max normalization into the range (0, 1)

(b) Z-score normalization (zero-mean normalization)

(c) Normalization by decimal scaling
    The range of X is from 71.56 to 130.58.
    The maximum absolute value is 130.58.
    We divide each X by 1000 (i.e., j = 3) so that 130.58 is scaled to 0.13058.
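The three normalizations of x = 80.0 can be worked through in a small Python sketch using the descriptive statistics of the example (minimum 71.56, maximum 130.58, mean 97.97, standard deviation 10.48). The function names are hypothetical; the min-max function takes an optional target range and defaults to [0, 1].

```python
def min_max(x, x_min, x_max, new_min=0.0, new_max=1.0):
    """Linearly map x from [x_min, x_max] onto [new_min, new_max]."""
    return (x - x_min) / (x_max - x_min) * (new_max - new_min) + new_min


def z_score(x, mean, std):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mean) / std


def decimal_scaling(x, max_abs):
    """Divide by 10**j, with j the smallest integer giving max_abs / 10**j < 1."""
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return x / 10 ** j, j


x = 80.0
print(round(min_max(x, 71.56, 130.58), 4))     # (a) range (0, 1)
print(round(z_score(x, 97.97, 10.48), 4))      # (b) z-score
print(decimal_scaling(x, 130.58))              # (c) scaled value and j
```

With these statistics the value 80.0 sits in the lower part of the range, so the min-max result is small and the z-score is negative.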

5-16
5.6. Data Reduction
! Obtain a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data.

* Data Cube Aggregation
! Aggregate data cubes (items) to reduce the data volume.
Ex] quarterly sales data => yearly total amount
    branch offices => regional

* Dimensionality Reduction
! Eliminate irrelevant or redundant attributes (variables).
! Find a good subset of the original attributes (variables).
- Forward selection method
- Backward elimination method
- Stepwise method

Ex] Data cube aggregation: quarterly sales => annual sales

Year = 2001
Quarter   Sales
Q1        $408,000
Q2        $350,000
Q3        $586,000
Q4        $224,000

Year = 2002
Quarter   Sales
Q1        $408,000
Q2        $350,000
Q3        $586,000
Q4        $224,000

Year = 2003
Quarter   Sales
Q1        $408,000
Q2        $350,000
Q3        $586,000
Q4        $224,000

Annual Sales
Year      Sales
2001      $1,568,000
2002      $2,356,000
2003      $3,594,000
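Aggregating quarterly records into one annual figure can be sketched as follows. Only the 2001 quarterly values from the table are used; the record layout and the function name are illustrative assumptions.

```python
from collections import defaultdict


def aggregate_by_year(records):
    """Sum (year, quarter, sales) records into a {year: total} dictionary."""
    totals = defaultdict(int)
    for year, _quarter, sales in records:
        totals[year] += sales
    return dict(totals)


# Quarterly sales for 2001, as listed in the table.
quarterly = [
    (2001, "Q1", 408_000),
    (2001, "Q2", 350_000),
    (2001, "Q3", 586_000),
    (2001, "Q4", 224_000),
]

annual = aggregate_by_year(quarterly)
print(annual)
```

Four quarterly rows collapse into a single annual row, which is the reduction in data volume that aggregation is meant to achieve.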
5-17
Ex] Consider the classification problem with 6 binary variables:

! Original data set

Variables                        Class
X1   X2   X3   X4   X5   X6      Y
0    0    0    0    0    0       1
0    1    1    0    1    1       0
0    0    0    1    0    0       1
1    1    0    1    1    0       0
0    0    0    1    0    1       1
1    1    1    1    1    1       0
1    0    0    0    0    0       1
1    1    1    0    1    1       0

! Do we really need all six variables for the classification problem? Nope!

! Reduced data set

Variables          Class
X1   X4   X6       Y
0    0    0        1
0    0    1        0
0    1    0        1
1    1    0        0
0    1    1        1
1    1    1        0
1    0    0        1
1    0    1        0

[Figure: decision tree]
X4 = 1?
    Yes: X1 = 1?
        Yes: Y = 0
        No:  Y = 1
    No:  X6 = 1?
        Yes: Y = 0
        No:  Y = 1
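A quick way to confirm that three variables are enough is to run the decision tree over every row of the data set. The sketch below assumes the class coding 1 and 0 for the two class symbols in the table and assumes the tree splits on X4 first, then on X1 or X6; both are readings of the figure, not statements from the notes.

```python
# Rows are (X1, X2, X3, X4, X5, X6, Y); class coding 1/0 is an assumption.
ROWS = [
    (0, 0, 0, 0, 0, 0, 1),
    (0, 1, 1, 0, 1, 1, 0),
    (0, 0, 0, 1, 0, 0, 1),
    (1, 1, 0, 1, 1, 0, 0),
    (0, 0, 0, 1, 0, 1, 1),
    (1, 1, 1, 1, 1, 1, 0),
    (1, 0, 0, 0, 0, 0, 1),
    (1, 1, 1, 0, 1, 1, 0),
]


def tree_predict(x1, x4, x6):
    """Decision tree on the reduced variable set {X1, X4, X6}."""
    if x4 == 1:
        return 0 if x1 == 1 else 1
    return 0 if x6 == 1 else 1


def reduced(row):
    """Project a full row onto (X1, X4, X6)."""
    return row[0], row[3], row[5]


errors = [row for row in ROWS if tree_predict(*reduced(row)) != row[6]]
print(len(errors))  # number of misclassified rows
```

Because the tree reproduces the class of every row, dropping X2, X3 and X5 loses no information for this classification problem.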
5-18
* Data Compression
! Obtain a reduced or "compressed" representation of the original data.

(1) Principal Components Analysis (PCA)

! Procedures
1. Normalize the input data.
2. Compute c orthonormal vectors that provide a basis for the normalized input data.
3. Sort those vectors in order of decreasing significance or strength.
4. Choose a few principal components (vectors) and reconstruct a good approximation of the original data.
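The four steps can be illustrated for two-dimensional data without any external library. This is a minimal sketch under stated assumptions: "normalize" is taken to mean centering each variable on its mean, the orthonormal vectors are the eigenvectors of the 2 x 2 sample covariance matrix (computed in closed form), and all function names and the sample points are hypothetical.

```python
import math


def pca_2d(points):
    """Return (mean, [(variance, unit_vector), ...]) sorted by decreasing variance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]   # step 1: center the data
    # Sample covariance matrix [[a, b], [b, d]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    d = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    if abs(b) < 1e-12:
        # Already uncorrelated: the coordinate axes are the components.
        pairs = [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    else:
        half_trace = (a + d) / 2
        root = math.sqrt(((a - d) / 2) ** 2 + b * b)
        pairs = []
        for lam in (half_trace + root, half_trace - root):  # step 2: eigenpairs
            vx, vy = b, lam - a
            norm = math.hypot(vx, vy)
            pairs.append((lam, (vx / norm, vy / norm)))
    pairs.sort(key=lambda p: p[0], reverse=True)            # step 3: sort
    return (mx, my), pairs


def reconstruct(points, mean, component):
    """Step 4: approximate each point using a single principal component."""
    mx, my = mean
    cx, cy = component
    approx = []
    for x, y in points:
        score = (x - mx) * cx + (y - my) * cy
        approx.append((mx + score * cx, my + score * cy))
    return approx


# Hypothetical points lying exactly on the line y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
mean, pairs = pca_2d(data)
print(pairs[0])                            # dominant component and its variance
print(reconstruct(data, mean, pairs[0][1]))
```

For points that lie on a straight line, the second component carries no variance, so one component reconstructs the data exactly; for real data the reconstruction is only an approximation.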
[Figure: principal components Y1 and Y2 drawn over the original axes X1 and X2]

(2) Discrete Wavelet Transformation (DWT)
5-19
Ex] Do you own a riding lawn mower?

X1 (Income)   X2 (Yard)   Y     X'    Y
4             7           1     11    1
6             5           1     11    1
7             6           1     13    1
6             8           1     14    1
8             4           1     12    1
3             6           0     9     0
4             2           0     6     0
4             4           0     8     0
5             3           0     8     0
2             4           0     6     0

! Scatter plot
[Figure: scatter plot of X1 (Income) and X2 (Yard), both axes from 0 to 10]

! Transformation of the variables, X1 and X2
    X' = X1 + X2

! Perfect classification with a transformed variable, X'
    If X' > 10, then Y = 1
    If X' < 10, then Y = 0
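The lawn-mower example can be verified directly. In the sketch below the transformed variable is taken to be X' = X1 + X2, which is what the X' column of the table implies; the function names are hypothetical.

```python
# (X1 income, X2 yard, Y owner) rows from the table.
ROWS = [
    (4, 7, 1), (6, 5, 1), (7, 6, 1), (6, 8, 1), (8, 4, 1),
    (3, 6, 0), (4, 2, 0), (4, 4, 0), (5, 3, 0), (2, 4, 0),
]


def transform(x1, x2):
    """Single derived variable replacing the pair (X1, X2); assumed X' = X1 + X2."""
    return x1 + x2


def predict(x_prime, threshold=10):
    """Owner (1) if X' is above the threshold, non-owner (0) otherwise."""
    return 1 if x_prime > threshold else 0


x_primes = [transform(x1, x2) for x1, x2, _ in ROWS]
predictions = [predict(xp) for xp in x_primes]
print(x_primes)
print(predictions)
```

No row has X' exactly equal to 10, so the two rules in the notes never conflict on this data set.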
5-20
5.7. Discretization
! Data discretization (a.k.a. data categorization) is a procedure that takes a data set and converts all continuous attributes to categorical.
- Reduce the data size
- Some classification algorithms (e.g., decision tree) accept only categorical data.
Ex] Use the interval label (hot or warm) instead of the actual value (temperature).

! Before: Fahrenheit 911

i    Temp, X1   Wind, X2   Play?, Y
1    84         N          Yes
2    85         N          Yes
3    86         Y          Yes
4    88         Y          No
5    94         N          No

! After: Warm = {Temp < 87} and Hot = {Temp > 87}

i    Temp, X1   Wind, X2   Play?, Y
1    Warm       N          Yes
2    Warm       N          Yes
3    Warm       Y          Yes
4    Hot        Y          No
5    Hot        N          No

[Figure: Numerical Data -> Discretizer -> Categorical Data]

5-21
* How?
! Divide the range of the attribute into several intervals.
! Process
  1. The user determines the number of discrete intervals.
     Ex] Binary discretization
  2. The algorithm determines the width of each interval.

* Number of Intervals
! The more intervals are used, the more of the original information is retained.
! A large number of intervals affects the efficiency of the data mining algorithm and increases the chance of over-fitting.
- Method 1. The number of intervals should not be smaller than the number of classes we want to describe.
- Method 2. The number of intervals is sqrt(n), where n is the number of items in the training set.
- Method 3. The number of intervals is n/(3c), where c is the number of classes.

* Discretization Algorithms
! Supervised vs. unsupervised algorithm
  The distinction is based on the use of the decision attribute (class label). The unsupervised (class-blind) algorithms do not use this information.
5-22
(1) Binning Method (unsupervised algorithm)
- Equal interval width discretization
  Find the minimum and the maximum values for each variable and divide this range into a number of user-specified equal width discrete intervals. Sensitive to outliers.
  Ex] Final grade = {A, B, C, D, F}
- Equal frequency discretization
  Sort the values of each variable in ascending order and then divide them into the user-specified number of intervals in such a way that each interval contains the same number of items.

Ex] Temperature = {Warm, Hot, Very Hot}

X:  82  84  87  88  91  93  95  96  98

- Equal interval width discretization: number of intervals = 3
    (max - min)/3 =
    Interval:
    Discretization:
- Equal frequency discretization: number of intervals = 3
    n items / 3 =
    Interval:
    Discretization:
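Both unsupervised binning methods can be tried on the nine temperature values with three intervals. The sketch below is illustrative only: the function names are hypothetical, equal-width intervals are treated as closed on the left with the maximum placed in the last interval, and the equal-frequency version assumes the number of items divides evenly and ignores tied values.

```python
LABELS = ["Warm", "Hot", "Very Hot"]


def equal_width(values, k):
    """Assign each value the index of its interval of width (max - min) / k."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum would land in interval k, so clamp it into the last one.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency(values, k):
    """Assign interval indices so that each interval holds the same number of items."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


temps = [82, 84, 87, 88, 91, 93, 95, 96, 98]
print([LABELS[b] for b in equal_width(temps, 3)])
print([LABELS[b] for b in equal_frequency(temps, 3)])
```

The two methods disagree on the value 93: the equal-width cut places it in the top interval, while the equal-frequency cut keeps three items in every interval and puts it in the middle one.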
5-23
(2) k-Means Clustering Discretization
! Minimize the sum of the squared distances of all vectors in a cluster domain to the cluster center (centroid), v_i.

[Figure: temperature values grouped into the clusters Cold, Warm and Hot with centroids v1, v2 and v3]

(3) Holte's One Rule Discretizer (1RD)
! Sort the observed values of a continuous variable and attempt to greedily divide the domain of the variable into bins that each contain only instances of one particular class.
! To avoid one bin for each observed value, the algorithm is constrained to form bins of at least a pre-specified number of observations (except the rightmost bin).
! A minimum bin size of 6 is suggested based on an empirical study.

(4) Recursive Minimal Entropy Partitioning

(5) Chi-Square Test
5-24
5.8. Data Mining Software
! Preprocessing in Weka is done both manually (e.g., by ticking relevant attributes) and automatically. Automatic preprocessing is done by filters. These have to be selected from the Filters menu. Filters are added, and resulting new datasets can replace the input dataset in an analysis.

! Selection of relevant variables
weka.filters.unsupervised.attribute.Remove
An instance filter that deletes a range of attributes from the dataset.

! Discretization
weka.filters.unsupervised.attribute.Discretize
An instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes. Discretization is by simple binning. Skips the class attribute if set.

! Normalization
weka.filters.unsupervised.attribute.Normalize
Normalizes all numeric values in the given dataset. The resulting values are in [0,1] for the data used to compute the normalization intervals.

! Missing value
weka.filters.unsupervised.attribute.ReplaceMissingValues
Replaces all missing values for nominal and numeric attributes in a dataset with the modes and means from the training data.

5-25

The Wizard of Odds

Simpson's Paradox and Data Aggregation
Large corporations and companies are divided into several divisions, subdivisions, departments, and so on. Except for high-level managerial positions, employment decisions usually take place at the departmental or divisional level. Analyzing aggregate employment data in such companies can give rise to a curious phenomenon known as Simpson's paradox.
An instructive and surprising example of Simpson's paradox occurred at the University of California at Berkeley in the 1970s. Examination of applicant data for a 1973 quarter revealed that the overall rate of admission for female applicants to the graduate school was substantially less than the rate of admission for male applicants.
Which departments at Berkeley were responsible for this imbalance?

Two Populations                       Number    Number     Percent
                                      applied   admitted   admitted
Department of Mathematics   Male
                            Female
Department of English       Male
                            Female
Combined                    Male      110       91         82.7%
                            Female    110       19         17.3%

5-26
"A company decided to expand, so it opened a factory generating 455 jobs. For the 70 white-collar positions, 200 males and 200 females applied. Of the females who applied, 20% were hired, while only 15% of the males were hired. Of the 400 males applying for the blue-collar positions, 75% were hired, while 85% of the female applicants were hired.
A federal Equal Employment Opportunity enforcement official reviewing the hiring practices noted that many more males were hired at the factory than females, so he decided to investigate. Responding to charges of irregularities in hiring, the company president denied any discrimination, pointing out that in both the white-collar and blue-collar fields, the percentage of female applicants hired was greater than it was for males. He showed that if more females had applied - a circumstance beyond his control - his workforce composition could very well have reflected more female than male employees.
But the government official produced his own statistics, which showed that a female applying for a job at the factory had a better than 58% chance of being denied employment, while male applicants had only a 45% denial rate. As the current law is written, this evidenced a violation... After a lengthy and costly court battle, the company was fined, the factory was seized and shut down, and the company's president was left wondering what he did wrong. When he asked his lawyer to explain it, the lawyer replied that he never would understand how the government worked.
Can you explain how two opposing statistical outcomes are reached from the same raw data?"
- Christopher McLaughlin, Orange Park, Fla.

                                  Number    Number     Percent
                                  applied   admitted   admitted
White-collar position   Male
                        Female
Blue-collar position    Male
                        Female
Combined                Male
                        Female

* From Marilyn vos Savant, "Ask Marilyn," Parade Magazine, 4/28/1996, p. 6.
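The two opposing conclusions in the letter can be reproduced by counting hires per position and then pooling them. The sketch below uses the figures quoted in the letter; the number of female blue-collar applicants (100) is not stated there and is an inference from the 455 total jobs (455 - 30 - 40 - 300 = 85 hires, which is 85% of 100). The dictionary layout and function names are hypothetical.

```python
# (position, sex) -> (number applied, number hired)
APPLICATIONS = {
    ("white-collar", "male"): (200, 30),     # 15% of 200
    ("white-collar", "female"): (200, 40),   # 20% of 200
    ("blue-collar", "male"): (400, 300),     # 75% of 400
    ("blue-collar", "female"): (100, 85),    # 85% of an inferred 100 applicants
}


def hire_rate(position, sex):
    """Fraction of applicants hired within one position."""
    applied, hired = APPLICATIONS[(position, sex)]
    return hired / applied


def combined_rate(sex):
    """Fraction of applicants hired when both positions are pooled."""
    applied = sum(a for (_, s), (a, _) in APPLICATIONS.items() if s == sex)
    hired = sum(h for (_, s), (_, h) in APPLICATIONS.items() if s == sex)
    return hired / applied


for position in ("white-collar", "blue-collar"):
    print(position, hire_rate(position, "male"), hire_rate(position, "female"))
print("combined", combined_rate("male"), combined_rate("female"))
print("denial", 1 - combined_rate("male"), 1 - combined_rate("female"))
```

Females are hired at a higher rate within each position, yet at a lower rate overall, because most female applicants applied for the white-collar positions, where hiring rates were low for everyone.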
5-27
Exercise Problems

Problem 1. Suppose that there are 40 male and 20 female students in Dr. Chun's ISDS 4141 class. Dr. Chun polled a stratified random sample of 5 males and 5 females from his class. Each member of the sample was asked, "Did you receive unauthorized help on your last take-home examination?" Two of the 5 males and four of the 5 females in the sample honestly answered "Yes."
    Find an unbiased estimate of the proportion of the students who received unauthorized help on the exam.

Problem 2. Rachel Favaloro surveyed 56 riding-mower owners in New Orleans and found the following descriptive statistics for the household income (× $1,000):

Descriptive Statistics
Mean               72.5
Variance          100.0
Minimum            50.0
First Quartile     58.4
Second Quartile    69.5
Third Quartile     75.4
Maximum           250.0

A. She is interested in detecting any outliers in the data.
(a) Find the three-sigma limits for the data.
(b) Find the outer fences for the data.

B. She is also interested in normalizing the original data set.
(c) What is the normalized z-score for an observation X = 70?
(d) Use the min-max normalization to transform the value X = 70 onto the range [-1.0, +1.0].