Essential Tools for Understanding Statistics & Probability - Rules, Concepts, Variables, Equations, Problems, Helpful Hints & Common Pitfalls

DESCRIPTIVE STATISTICS
Methods used to simply describe a data set that has been observed.

KEY TERMS & SYMBOLS

quantitative data: data variables that represent some numeric quantity (a numeric measurement).
categorical (qualitative) data: data variables with values that reflect some quality of the element; one of several categories, not a numeric measurement.
population: "the whole"; the entire group of which we wish to speak or that we intend to measure.
sample: "the part"; a representative subset of the population.
simple random sampling: the most commonly assumed method for selecting a sample; samples are chosen so that every possible sample of the same size is equally likely to be the one that is selected.
N: size of a population.
n: size of a sample.
x: the value of an observation.
f: the frequency of an observation (i.e., the number of times it occurs).
frequency table: a table that lists the values observed in a data set along with the frequency with which each occurs.
(population) parameter: some numeric measurement that describes a population; generally not known, but estimated from sample statistics.
  EX: population mean: $\mu$; population standard deviation: $\sigma$; population proportion: $p$ (sometimes denoted $\pi$)
(sample) statistic: some numeric measurement used to describe data in a sample, used to estimate or make inferences about population parameters.
  EX: sample mean: $\bar{x}$; sample standard deviation: $s$; sample proportion: $\hat{p}$

measures of center (measures of central tendency): indicate which value is typical for the data set

mean: $\bar{x} = \frac{\sum x}{n}$; from a frequency table: $\bar{x} = \frac{\sum xf}{n}$
  sensitive to extreme values; any outlier will influence the mean; more useful for symmetric data
median: the middle element in order of rank
  n odd: median has rank $\frac{n+1}{2}$
  n even: median is the average of values with ranks $\frac{n}{2}$ and $\frac{n}{2} + 1$
  not sensitive to extreme values; more useful when data are skewed
mode: the observation with the highest frequency
  only measure of center appropriate for categorical data
mid-range: $\frac{\text{maximum} + \text{minimum}}{2}$
  not often used; highly sensitive to unusual values; easy to compute

measures of variation (measures of dispersion): reflect the variability of the data (i.e., how different the values are from each other)

sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
  not often used; units are the squares of those for the data
sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
  square root of variance; sensitive to extreme values; commonly used
interquartile range (IQR): IQR = Q3 - Q1 (see quartile, below)
  less sensitive to extreme values
range: maximum - minimum
  not often used; highly sensitive to unusual values; easy to compute

measures of relative standing (measures of relative position): indicate how a particular value compares to the others in the same data set

percentile: data divided into 100 equal parts by rank (i.e., the kth percentile is that value greater than k% of the others)
  important to apply to normal distributions (see probability distributions)
quartile: data divided into 4 equal parts by rank: Q3 (third quartile) is the value greater than 3/4 of the others; Q1 (first quartile) is greater than 1/4; Q2 is identical to the median
  used to compute IQR (see IQR, above); Q3 is often viewed as the "median" of the upper half, and Q1 as the "median" of the lower half; Q2 is the median of the data set
z score: $z = \frac{x - \bar{x}}{s}$; measures the distance from the mean in terms of standard deviation
  to find the value of some observation, x, when the z score is known: $x = \bar{x} + zs$

Sample Problems

1. A student receives the following exam grades in a course: 67, 88, 75, 82, 78
a. Compute the mean: $\bar{x} = \frac{\sum x}{n} = \frac{67 + 88 + 75 + 82 + 78}{5} = \frac{390}{5} = 78$
b. What is the median exam score? In order, the scores are: 67, 75, 78, 82, 88; middle element = 78
c. What is the range? range = maximum - minimum = 88 - 67 = 21
d. Compute the standard deviation:
   $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{(67-78)^2 + (88-78)^2 + (75-78)^2 + (82-78)^2 + (78-78)^2}{4}} = \sqrt{\frac{246}{4}} = \sqrt{61.5} = 7.84$
e. What is the z score for the exam grade of 88? $z = \frac{x - \bar{x}}{s} = \frac{88 - 78}{7.84} = \frac{10}{7.84} = 1.28$

2. The residents of a retirement community are surveyed as to how many times they've been married; the results are given in the following frequency table:

   x = # of marriages      0    1    2    3    4    Sums
   f = # of observations   13   42   37   12   6    110 = n
   xf                      0    42   74   36   24   176 = sum of xf

a. Compute the mean: $\bar{x} = \frac{\sum xf}{n} = \frac{176}{110} = 1.6$
b. Compute the median: Since $n = \sum f = 110$, an even number, the median is the average of the observations with ranks $\frac{n}{2}$ and $\frac{n}{2} + 1$ (i.e., the 55th and 56th observations).
   Note: While we could count from either side of the distribution (from 0 or from 4), it is easier here to count from the bottom: The first 13 observations in rank order are all 0; the next 42 (the 14th through the 55th) are all 1; the 56th through the 92nd are all 2; since the 55th is a 1 and the 56th is a 2, the median is the average: (1 + 2) / 2 = 1.5
c. Compute the IQR: To find the IQR, we must first compute Q1 and Q3; if we divide n in half, we have a lower 55 and an upper 55 observations; the "median" of each would have rank $\frac{55 + 1}{2} = 28$; the 28th observation in the lower half is a 1, so Q1 = 1, and the 28th observation in the upper half is a 2, so Q3 = 2; therefore, IQR = Q3 - Q1 = 2 - 1 = 1
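The two descriptive-statistics sample problems can be checked numerically. The sketch below uses only Python's standard library; the variable names are illustrative, and the quartiles are computed with the "median of each half" method described in the text (other quartile conventions, such as the default of `statistics.quantiles`, may give slightly different values).

```python
import statistics

# Problem 1: exam grades
grades = [67, 88, 75, 82, 78]
mean = statistics.mean(grades)            # sum(x) / n
median = statistics.median(grades)        # middle element after sorting
data_range = max(grades) - min(grades)    # maximum - minimum
s = statistics.stdev(grades)              # sample standard deviation (divides by n - 1)
z_88 = (88 - mean) / s                    # z score of the grade 88

# Problem 2: frequency table (x = number of marriages, f = frequency)
freq = {0: 13, 1: 42, 2: 37, 3: 12, 4: 6}
n = sum(freq.values())
freq_mean = sum(x * f for x, f in freq.items()) / n

# Expand the table into the full ranked data set to find the median and quartiles.
ranked = sorted(x for x, f in freq.items() for _ in range(f))
freq_median = statistics.median(ranked)

half = n // 2
q1 = statistics.median(ranked[:half])       # "median" of the lower half
q3 = statistics.median(ranked[n - half:])   # "median" of the upper half
iqr = q3 - q1

print(mean, median, data_range, round(s, 2), round(z_88, 2))
print(n, freq_mean, freq_median, q1, q3, iqr)
```

Expanding the frequency table is wasteful for very large counts, but it keeps the ranking logic identical to the hand calculation.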



---

KEY TERMS & SYMBOLS

probability experiment: any process with an outcome regarded as random.
sample space (S): the set of all possible outcomes from a probability experiment.
events (A, B, C, etc.): subsets of the sample space; many problems are best solved by a careful consideration of the defined events.
P(A): the probability of event A; for any event A, 0 ≤ P(A) ≤ 1, and for the entire sample space S, P(S) = 1.
equally likely outcomes: a very common assumption in solving problems in probability; if all outcomes in the sample space S are equally likely, then the probability of some event A can be calculated as

  $P(A) = \frac{\text{number of simple outcomes in } A}{\text{total number of simple outcomes}}$

Examples of sample spaces:

toss a fair coin twice: {HH, HT, TH, TT}; there are two ways to get heads just once
roll a fair die: {1, 2, 3, 4, 5, 6}
roll two fair dice: {(1,1), (1,2), (1,3), ..., (2,1), (2,2), (2,3), ..., (6,4), (6,5), (6,6)}; a total of 36 outcomes: six for the first die times another six for the second die
have a baby: {B, G} (boy or girl)
pick an orange from one of the trees in a grove and weigh it: {some positive real number, in some unit of weight}; this would be a continuous sample space
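The equally-likely-outcomes formula amounts to counting. As a minimal sketch, the sample spaces for two dice and two coin tosses can be enumerated directly; the helper `probability` and the example event "the dice sum to 7" are illustrative choices, not part of the original text.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 6 x 6 = 36 ordered outcomes.
dice_space = list(product(range(1, 7), repeat=2))

def probability(event, sample_space):
    """P(A) = (number of simple outcomes in A) / (total number of simple outcomes)."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

# Illustrative event: the two dice sum to 7.
p_sum_7 = probability(lambda o: o[0] + o[1] == 7, dice_space)

# Sample space for tossing a fair coin twice: HH, HT, TH, TT.
coin_space = list(product("HT", repeat=2))
p_one_head = probability(lambda o: o.count("H") == 1, coin_space)

print(len(dice_space), p_sum_7, p_one_head)
```

This only works when every simple outcome really is equally likely, which is why the outcomes are kept as ordered pairs rather than as sums.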

Probability Rules

addition rule ("or"):
  P(A or B) = P(A) + P(B) - P(A and B)
  if A and B are disjoint: P(A or B) = P(A) + P(B)
  Note: Subtract P(A and B) so as not to count twice the elements of both A and B.
  Note: Knowing that events are disjoint can make things much easier, since otherwise P(A and B) can be difficult to find.

complementary events: the complement of event A (denoted $A^C$ or $A'$) means "not A"; it consists of all simple outcomes in S that are not in A
  $P(A) + P(A^C) = 1$ (any event will either happen or not); thus $P(A) = 1 - P(A^C)$ and $P(A^C) = 1 - P(A)$
  Note: The law of complements is a useful tool, since it's often easier to find the probability that an event does NOT occur.

multiplication rule ("and"):
  P(A and B) = P(A)P(B|A); equivalently, P(A and B) = P(B)P(A|B)
  if A and B are independent: P(A and B) = P(A)P(B)
  Note: While it doesn't matter whether we condition on A (first) or condition on B (second), generally the information available will require one or the other.

independent events: the occurrence of one event does not affect the probability of the other, and vice versa
  P(A|B) = P(A) and P(B|A) = P(B), so P(A and B) = P(A)P(B)
  Note: Events are often assumed to be independent, particularly repeated trials.

conditional probability rule ("given that"):
  $P(A|B) = \frac{P(A \text{ and } B)}{P(B)}$; $P(B|A) = \frac{P(A \text{ and } B)}{P(A)}$
  Note: By multiplying both sides by P(B) or P(A), we see this is a rephrasing of the multiplication rule.
  Note: Conditional probabilities are often difficult to assess; an alternative way of thinking about P(A|B) is that it is the proportion of elements in B that are ALSO in A.

total probability rule: To find the probability of an event A, if the sample space is partitioned into several disjoint and exhaustive events $D_1, D_2, D_3, \ldots, D_k$, then, since A must occur along with one and only one of the D's:
  $P(A) = P(A \text{ and } D_1) + P(A \text{ and } D_2) + \cdots + P(A \text{ and } D_k) = P(D_1)P(A|D_1) + P(D_2)P(A|D_2) + \cdots + P(D_k)P(A|D_k)$
  Note: The total probability rule may look complicated, but it isn't (see sample problem 3a).

Bayes' Theorem: With two events A and B, using the total probability rule:
  $P(B|A) = \frac{P(A \text{ and } B)}{P(A)} = \frac{P(A \text{ and } B)}{P(A \text{ and } B) + P(A \text{ and } B^C)} = \frac{P(B)P(A|B)}{P(B)P(A|B) + P(B^C)P(A|B^C)}$
  Note: Bayes' Theorem allows us to reverse the order of a conditional probability statement, and is the only generally valid method.

Probability Distributions

When some number is derived from a probability experiment, it is called a random variable.

Every random variable has a probability distribution that determines the probabilities of particular values.

For instance, when you roll a fair six-sided die, the resulting number (X) is a random variable with the following uniform probability distribution:

   X      1     2     3     4     5     6
   P(X)   1/6   1/6   1/6   1/6   1/6   1/6

In the table, P(X) is called the probability distribution function (pdf).

Since each value of P(X) represents a probability, pdf's must follow the basic probability rules: P(X) must always be between 0 and 1, and all of the values P(X) sum to 1.

Other probability distributions are continuous: They do not assign specific probabilities to specific values, as above in the discrete case; instead, we can measure probabilities only over a range of values, using the area under the curve of a probability density function.

Much like data variables, we often measure the mean ("expectation") and standard deviation of random variables; if we can characterize a random variable as belonging to some major family (see table below), we can find the mean and standard deviation easily; in general, we have:

discrete (X takes some countable number of specific values):
  mean: $\mu = E(X) = \sum X P(X)$
  standard deviation: $\sigma = SD(X) = \sqrt{\sum X^2 P(X) - \mu^2}$
continuous (X has uncountable possible values, and P(X) can be measured only over intervals):
  mean: $\mu = E(X) = \int X P(X)\,dX$
  standard deviation: $\sigma = SD(X) = \sqrt{\int X^2 P(X)\,dX - \mu^2}$
  Note: Fortunately, most useful continuous probability distributions do not require integration in practice; other formulas and tables are used.

Sample Problems

1. Discrete random variable X follows the following probability distribution:

   X          0      1      2     3     sums
   P(X)       0.15   0.25   0.4   0.2   1 (always)
   XP(X)      0      0.25   0.8   0.6   1.65 = E(X)
   X^2 P(X)   0      0.25   1.6   1.8   3.65

a. What is the expected value of X? $\mu = E(X) = \sum X P(X) = 1.65$
b. What is the standard deviation of X? $\sigma = SD(X) = \sqrt{\sum X^2 P(X) - \mu^2} = \sqrt{3.65 - 1.65^2} = \sqrt{0.9275} = 0.963$
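The general formulas for a discrete random variable translate into two weighted sums. The following is a minimal sketch using the distribution from sample problem 1; the variable names are illustrative.

```python
import math

# Distribution from sample problem 1.
values = [0, 1, 2, 3]
probs = [0.15, 0.25, 0.40, 0.20]

# A valid pdf: every probability lies in [0, 1] and they sum to 1.
assert all(0 <= p <= 1 for p in probs)
assert math.isclose(sum(probs), 1.0)

# mu = E(X) = sum of X * P(X)
mu = sum(x * p for x, p in zip(values, probs))

# sigma = sqrt( sum of X^2 * P(X)  -  mu^2 )
sum_x2_p = sum(x**2 * p for x, p in zip(values, probs))
sigma = math.sqrt(sum_x2_p - mu**2)

print(round(mu, 2), round(sum_x2_p, 2), round(sigma, 3))
```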


discrete uniform: all outcomes are consecutive integers, and all are equally likely

binomial: some fixed number of independent trials with the same probability of a given event each time; X = total number of times the event occurs
  n = fixed number of trials; p = probability that the designated event occurs on a given trial
  mean: $np$; standard deviation: $\sqrt{np(1-p)}$
  commonly used distribution; symmetric if p = 0.5; only valid values for X are 0 ≤ X ≤ n

Poisson: events occur independently, at some average rate per interval of time/space; X = total number of times the event occurs
  $\lambda$ = mean number of events per interval
  Note: There is no upper limit on X for the Poisson distribution.

geometric: a series of independent trials with the same probability of a given event; X = # of trials until the event occurs
  p = probability that the event occurs on a given trial
  $P(X = x) = (1 - p)^{x-1}\,p$
  Note: Since we only count trials until the event occurs the first time, there is no need to count the ${}_nC_x$ arrangements, as in the binomial.

hypergeometric: drawing samples from a finite population with a categorical outcome; X = # of elements in the sample that fall in the category of interest
  N = population size; n = sample size; K = number in category in population
  $P(x) = \frac{{}_KC_x \cdot {}_{N-K}C_{n-x}}{{}_NC_n}$
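As a rough sketch, the geometric and hypergeometric formulas above can be written as short functions with `math.comb`. The binomial probability function is included for comparison; it is the standard textbook formula and does not appear in the table above, so treat it as an added assumption. The function names and example numbers are illustrative.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p (standard formula)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def geometric_pmf(x, p):
    """P(first occurrence on trial x) = (1 - p)^(x - 1) * p."""
    return (1 - p) ** (x - 1) * p

def hypergeometric_pmf(x, N, n, K):
    """P(x of the n sampled elements fall in a category with K members out of N)."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# Binomial with n = 10, p = 0.5: mean np and standard deviation sqrt(np(1 - p)).
n, p = 10, 0.5
binomial_mean = n * p
binomial_sd = sqrt(n * p * (1 - p))
binomial_total = sum(binomial_pmf(x, n, p) for x in range(n + 1))

# Geometric: first success on the third trial when p = 0.5.
p_third_trial = geometric_pmf(3, 0.5)

# Hypergeometric: 2 socks drawn from 20, of which 9 are black; both are black.
p_two_black = hypergeometric_pmf(2, N=20, n=2, K=9)

print(binomial_mean, round(binomial_sd, 3), p_third_trial, round(p_two_black, 3))
```

The last value matches the sock-drawer sample problem, which reaches the same answer with the multiplication rule.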

Sample Problems

1. A sock drawer contains nine black socks, six blue socks, and five white socks, none paired up; reach in and take two socks at random, without replacement; find the probability that:

Note: There are 20 socks total in the drawer (9 + 6 + 5 = 20) before any are taken out; in situations like this, without any other information, we should assume that each sock is equally likely to be chosen.

a. both socks are black

P(both are black) = P(first is black AND second is black) = P(first is black) P(second is black | first is black)

$= \frac{9}{20} \times \frac{8}{19} = \frac{9 \times 8}{20 \times 19} = \frac{72}{380} = 0.189$

b. both socks are white

[Expect a smaller probability than in the preceding problem, as there are fewer white socks from which to choose.]

As above, we lose both one of the socks in the category and one of the socks in total after selecting the first:

$\frac{5}{20} \times \frac{4}{19} = \frac{5 \times 4}{20 \times 19} = \frac{20}{380} = 0.053$

c. the two socks match (i.e., they are of the same color)

There are only three colors of sock in the drawer: P(match) = P(both black) + P(both blue) + P(both white)

$= \frac{9}{20} \times \frac{8}{19} + \frac{6}{20} \times \frac{5}{19} + \frac{5}{20} \times \frac{4}{19} = \frac{122}{380} = 0.321$

d. the socks DO NOT match

Note: For the socks not to match, we could have the first black and the second blue, or the first blue and the second white, or a bunch of other possibilities, too; it is much safer, as well as easier, to use the rule for complements: common sense dictates that the socks will either match or not match, so P(socks DO NOT match) = 1 - P(socks do match) = 1 - 0.321 = 0.679
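One way to double-check the sock-drawer answers is brute force: list every ordered pair of distinct socks (20 × 19 = 380 equally likely draws) and count. This is only a verification sketch; the helper name `prob` is illustrative.

```python
from fractions import Fraction
from itertools import permutations

drawer = ["black"] * 9 + ["blue"] * 6 + ["white"] * 5

# Every ordered way to draw two different socks without replacement.
draws = list(permutations(range(len(drawer)), 2))

def prob(event):
    """Fraction of equally likely draws (first, second) that satisfy the event."""
    hits = sum(1 for i, j in draws if event(drawer[i], drawer[j]))
    return Fraction(hits, len(draws))

p_both_black = prob(lambda a, b: a == b == "black")
p_both_white = prob(lambda a, b: a == b == "white")
p_match = prob(lambda a, b: a == b)
p_no_match = 1 - p_match   # rule for complements

print(len(draws), p_both_black, p_both_white, p_match, p_no_match)
print(round(float(p_no_match), 3))
```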

2. In a particular county, 88% of homes have air conditioning, 27% have a swimming pool, and 23% have both; what is the probability that one of these homes, chosen at random, has:

Note: The given percentages can be taken as probabilities for these events, so we have P(AC) = 0.88, P(pool) = 0.27, and P(AC and pool) = 0.23.

a. air conditioning OR a pool

By the addition rule: P(AC or pool) = P(AC) + P(pool) - P(AC and pool) = 0.88 + 0.27 - 0.23 = 0.92

b. NEITHER air conditioning NOR a pool

Upon examination of the event, this is the complement of the above event: P(neither AC nor pool) = P(no AC AND no pool) = 1 - P(AC or pool) = 1 - 0.92 = 0.08

c. a pool, given that it has air conditioning

Note: This is the same as asking, "What proportion of the homes with air conditioning also have pools?" Whenever we use the phrase "given that," a conditional probability is indicated.

$P(\text{pool} \mid \text{AC}) = \frac{P(\text{pool and AC})}{P(\text{AC})} = \frac{0.23}{0.88} = 0.261$

d. air conditioning, given that it has a pool

[CAUTION: This is NOT the same as the preceding problem; now we're asked what proportion of homes that have pools ALSO have air conditioning.]

The event in the numerator is the same; what has changed is the condition:

$P(\text{AC} \mid \text{pool}) = \frac{P(\text{pool and AC})}{P(\text{pool})} = \frac{0.23}{0.27} = 0.852$

Note: This probability is much greater, since more homes have air conditioning than pools.

3. The TIC Corporation manufactures ceiling fans; each fan contains an electric motor, which TIC buys from one of three suppliers: 50% of their motors come from supplier A, 40% from supplier B, and 10% from supplier C; of course, some of the motors they buy are defective: the defective rate is 6% for supplier A, 5% for supplier B, and 30% for supplier C; one of these motors is chosen at random; find the probability that:

Note: We have here a bunch of statements of probability, and it's useful to list them explicitly; let events A, B, and C denote the supplier for a fan motor, and D denote that the motor is defective; then P(A) = 0.5, P(B) = 0.4, and P(C) = 0.1

The information about defective rates provides conditional probabilities:

P(D|A) = 0.06, P(D|B) = 0.05, and P(D|C) = 0.3

We can also note the complementary probabilities of a motor not being defective: $P(D^C|A) = 0.94$, $P(D^C|B) = 0.95$, and $P(D^C|C) = 0.7$

a. the motor is defective

Note: To find the overall defective rate, we use the total probability rule, as a defective motor still had to come from supplier A, B, or C:
P(D) = P(A and D) + P(B and D) + P(C and D) = P(A)P(D|A) + P(B)P(D|B) + P(C)P(D|C) = (0.5)(0.06) + (0.4)(0.05) + (0.1)(0.3) = 0.03 + 0.02 + 0.03 = 0.08

If 8% overall are defective, then 92% are not; that is, we can also conclude that $P(D^C) = 1 - P(D) = 1 - 0.08 = 0.92$

b. the motor came from supplier C, given that it is defective

This is like asking, "What proportion of the defectives come from supplier C?" Denote this probability as P(C|D); we began with P(D|C) (among other probabilities), so we are effectively using Bayes' Theorem to reverse the order; however, we already have P(D), so:

$P(C|D) = \frac{P(C \text{ and } D)}{P(D)} = \frac{0.03}{0.08} = 0.375$
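The total probability rule and Bayes' Theorem in this problem reduce to a weighted sum followed by a division. A small sketch, with illustrative dictionary names, using the supplier shares and defective rates given above:

```python
import math

# P(supplier) and P(defective | supplier), as stated in the problem.
p_supplier = {"A": 0.50, "B": 0.40, "C": 0.10}
p_defective_given = {"A": 0.06, "B": 0.05, "C": 0.30}

# Total probability rule: P(D) = sum over suppliers of P(supplier) * P(D | supplier).
p_defective = sum(p_supplier[s] * p_defective_given[s] for s in p_supplier)

# Bayes' Theorem: P(supplier | D) = P(supplier) * P(D | supplier) / P(D).
p_supplier_given_defective = {
    s: p_supplier[s] * p_defective_given[s] / p_defective for s in p_supplier
}

print(round(p_defective, 2))
print({s: round(v, 3) for s, v in p_supplier_given_defective.items()})
```

Although supplier C provides only 10% of the motors, it accounts for 37.5% of the defectives, the same share as supplier A.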


normal: X (or some other letter); $\mu$ = mean, $\sigma$ = standard deviation
  symmetric, unbounded, bell-shaped; arises commonly in nature and in statistics as a result of the central limit theorem
  Note: Many other distributions approach the normal as n (or some other parameter, such as $\lambda$ or df) increases.

standard normal: Z; $\mu$ = mean = 0; $\sigma$ = standard deviation = 1
  a special variant of normal, with $\mu = 0$ and $\sigma = 1$; represented in Z tables
  Note: Used for inference about proportions; the cumulative probability is provided in Z tables. For a particular value z, the cumulative probability is $\Phi(z) = P(Z < z)$, i.e., the area under the density curve to the left of z.

Student's t: t; df = degrees of freedom
  similar in shape to normal; $\mu = 0$ (always)
  Note: Used for inference about means.

chi-square: $\chi^2$; df = degrees of freedom
  not symmetric (skewed right)
  Note: Used for inferences about categorical distributions.

Sample Problems

1. For a standard normal random variable Z, find P(Z < 1.5).

Note: Since, by definition, the values from the standard normal table are $\Phi(z) = P(Z < z)$: $P(Z < 1.5) = \Phi(1.5) = 0.9332$

2. For a t distribution with df = 20, which critical value of t has an area of 0.05 in the right tail?

Note: A t table generally provides the tail area, rather than the cumulative probability as given in standard normal tables; with the row = df = 20 and the column = tail area = 0.05, a t table produces the value of 1.725.

3. The heights of military recruits follow a normal distribution with a mean of 70 inches and a standard deviation of 4 inches; find the probability that a randomly chosen recruit is:

a. shorter than 60 inches

First, we must transform values of the variable (height) to the standard normal distribution by taking z scores; here:

$z = \frac{x - \mu}{\sigma} = \frac{60 - 70}{4} = \frac{-10}{4} = -2.5$

Note: Since we want the "less than" probability, the solution comes directly from the standard normal (z) table:

$P(X < 60) = P(Z < -2.5) = \Phi(-2.5) = 0.0062$

b. taller than 72 inches

First, the z score: $z = \frac{x - \mu}{\sigma} = \frac{72 - 70}{4} = \frac{2}{4} = 0.5$

Note: Since this is a "greater than" probability, subtract the cumulative probability from 1:

$P(X > 72) = P(Z > 0.5) = 1 - \Phi(0.5) = 1 - 0.6915 = 0.3085$

c. between 64 and 76 inches tall

In this case, there are two boundaries. The only way to find the area under the curve between them is to find the cumulative probabilities for each, and then to subtract; this entails finding z scores for both X = 64 and X = 76:

$z = \frac{x - \mu}{\sigma} = \frac{64 - 70}{4} = \frac{-6}{4} = -1.5$ and $z = \frac{x - \mu}{\sigma} = \frac{76 - 70}{4} = \frac{6}{4} = 1.5$

Now $P(64 < X < 76) = P(-1.5 < Z < 1.5) = \Phi(1.5) - \Phi(-1.5) = 0.9332 - 0.0668 = 0.8664$
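The table lookups in problems 1 and 3 can be reproduced with `statistics.NormalDist` from Python's standard library (Python 3.8 or later), whose `cdf` method plays the role of $\Phi$. This is a sketch for checking the arithmetic; the t critical value in problem 2 is not covered, because the standard library has no t distribution.

```python
from statistics import NormalDist

# Problem 1: standard normal, P(Z < 1.5) = Phi(1.5)
Z = NormalDist()            # mean 0, standard deviation 1
p_z_below_1_5 = Z.cdf(1.5)

# Problem 3: recruit heights, normal with mean 70 and standard deviation 4 (inches).
heights = NormalDist(mu=70, sigma=4)

p_shorter_60 = heights.cdf(60)                   # "less than": cumulative probability
p_taller_72 = 1 - heights.cdf(72)                # "greater than": complement
p_between = heights.cdf(76) - heights.cdf(64)    # two boundaries: subtract

# The same answer via an explicit z score, as in the worked solution.
z_60 = (60 - 70) / 4
p_shorter_60_via_z = Z.cdf(z_60)

print(round(p_z_below_1_5, 4), round(p_shorter_60, 4),
      round(p_taller_72, 4), round(p_between, 4))
```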

[Figure: standard normal density curve; the area to the left of z is $\Phi(z) = P(Z < z)$.]

Because sample statistics are derived from random samples, they are random. The probability distribution of a statistic is called its sampling distribution.

Due to the central limit theorem, some important statistics have sampling distributions that approach a normal distribution as the sample size increases:

sample mean, $\bar{x}$: expected value $E(\bar{x}) = \mu$; standard error $SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$
  Note: if n ≥ 30, or if the population distribution is normal
sample proportion, $\hat{p}$: expected value $E(\hat{p}) = p$; standard error $SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$

Knowing the expected value and standard error allows us to find probabilities; then, in turn, we can use the properties of these sampling distributions to make inferences about the parameter values when we do not know them, as in real-world applications.

Sample Problems

1. 60% of the registered voters in a large district plan to vote in favor of a referendum; a random sample of 340 of these voters is selected.

a. What is the expected value of the sample proportion?

$E(\hat{p}) = p = 0.6$

b. What is the standard error of the sample proportion?

$SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.6(1 - 0.6)}{340}} = 0.0266$

c. What is the probability that the sample proportion is between 55% and 65%?

First, find the z scores for those proportions:

$z = \frac{\hat{p} - E(\hat{p})}{SE(\hat{p})} = \frac{0.55 - 0.6}{0.0266} = \frac{-0.05}{0.0266} = -1.88$ and $z = \frac{\hat{p} - E(\hat{p})}{SE(\hat{p})} = \frac{0.65 - 0.6}{0.0266} = \frac{0.05}{0.0266} = 1.88$

Now $P(0.55 < \hat{p} < 0.65) = P(-1.88 < Z < 1.88) = \Phi(1.88) - \Phi(-1.88) = 0.9699 - 0.0301 = 0.9398$
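The referendum problem can be sketched in a few lines, treating the sampling distribution of the sample proportion as normal with mean p and standard deviation SE. Variable names are illustrative. Because the code does not round z to two decimals the way a Z table requires, its probability differs from 0.9398 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

p = 0.60   # population proportion in favor
n = 340    # sample size

expected_p_hat = p                  # E(p-hat) = p
se_p_hat = sqrt(p * (1 - p) / n)    # SE(p-hat) = sqrt(p(1 - p) / n)

# Approximate sampling distribution of the sample proportion.
sampling_dist = NormalDist(mu=expected_p_hat, sigma=se_p_hat)
prob_between = sampling_dist.cdf(0.65) - sampling_dist.cdf(0.55)

z_upper = (0.65 - expected_p_hat) / se_p_hat

print(round(se_p_hat, 4), round(z_upper, 2), round(prob_between, 4))
```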

2. The standard deviation of the weight of cattle in a certain herd is 160 pounds, but the mean is unknown; a random sample of size 100 is chosen.

a. Compute the standard error of the sample mean.

$SE(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{160}{\sqrt{100}} = 16$ lbs

b. For an individual animal in this herd, what is the probability of a weight within 40 lbs of the population mean?

Since this problem refers to a single observation, not the sample mean, we use the standard deviation, not the standard error.

Note: Not knowing the value of $\mu$, we can only express the boundaries for "within 40 lbs of the mean" as $X = \mu + 40$ and $X = \mu - 40$.

We can still compute z scores:

$z = \frac{x - \mu}{\sigma} = \frac{\mu + 40 - \mu}{160} = \frac{40}{160} = 0.25$ and $z = \frac{x - \mu}{\sigma} = \frac{\mu - 40 - \mu}{160} = \frac{-40}{160} = -0.25$

That is, "within 40 lbs of the mean" is the same as "within 0.25 standard deviation."

We find the probability: $P(-0.25 < Z < 0.25) = \Phi(0.25) - \Phi(-0.25) = 0.5987 - 0.4013 = 0.1974$

c. What is the probability that the sample mean falls within 40 lbs of the population mean?

Even though we don't know the population mean, the z score formula will allow us to find this probability.

Note: Since this is the sample mean, we must use the standard error of 16 lbs, rather than the standard deviation, in computing the z scores:

$z = \frac{\bar{x} - \mu}{SE(\bar{x})} = \frac{\mu + 40 - \mu}{16} = \frac{40}{16} = 2.5$ and $z = \frac{\bar{x} - \mu}{SE(\bar{x})} = \frac{\mu - 40 - \mu}{16} = \frac{-40}{16} = -2.5$

Now $P(-2.5 < Z < 2.5) = \Phi(2.5) - \Phi(-2.5) = 0.9938 - 0.0062 = 0.9876$

Note: This probability is dramatically higher than the probability for an individual head of cattle.
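The contrast between parts b and c comes entirely from which measure of spread divides the 40 lb margin. A short sketch (the helper `prob_within` is an illustrative name) makes that explicit; the unknown mean cancels out, so it never appears in the code.

```python
from math import sqrt
from statistics import NormalDist

sigma = 160    # standard deviation of individual weights (lbs)
n = 100        # sample size
margin = 40    # "within 40 lbs of the population mean"

se_mean = sigma / sqrt(n)    # standard error of the sample mean

Z = NormalDist()

def prob_within(margin, spread):
    """P(-z < Z < z), where z = margin / spread."""
    z = margin / spread
    return Z.cdf(z) - Z.cdf(-z)

p_individual = prob_within(margin, sigma)      # single animal: use the standard deviation
p_sample_mean = prob_within(margin, se_mean)   # sample mean: use the standard error

print(se_mean, round(p_individual, 4), round(p_sample_mean, 4))
```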


When we want to draw conclusions about a population using data from a sample, we use some method of statistical inference.

A hypothesis test is a procedure by which claims about populations (hypotheses) are evaluated on the basis of sample statistics.

The procedure begins with a null hypothesis ($H_0$) and an alternative (or research) hypothesis ($H_1$); if the sample data are too unusual, assuming $H_0$ to be true, then $H_0$ is rejected in favor of $H_1$; otherwise, we fail to reject the null hypothesis, and thereby fail to support the alternative.

the null hypothesis ($H_0$):
  is assumed true for the purpose of carrying out the hypothesis test
  ALWAYS provides a specific value for the parameter, its null value; always contains "="
  the null value implies a specific sampling distribution for the test statistic
  can be rejected, or not rejected, but NEVER supported

the alternative hypothesis ($H_1$ or $H_a$):
  is supported only by carrying out the test, if the null hypothesis can be rejected
  NEVER provides a specific value for the parameter; instead contains ">" (right-tailed), "<" (left-tailed), or "≠" (two-tailed)
  without any specific value for the parameter of interest, the sampling distribution is unknown
  can be supported (by rejecting the null), or not supported (by failing to reject the null), but NEVER rejected

Note: The tail(s) of the hypothesis test are determined by the alternative hypothesis ($H_1$); this is one of the most important attributes of the test, regardless of which method is used.

Sample Problems

In each of the following cases, formulate hypotheses to test the claim; indicate which hypothesis represents the claim.

1. The manager of a bank claims that the average waiting time for customers is less than two minutes.

Since the claim refers to the average, this is a test for $\mu$.

As a "less than" claim, it is represented by $H_1$, and the hypothesis test is $H_0: \mu = 2$ vs. $H_1: \mu < 2$ (left-tailed)

There are two major methods for carrying out a hypothesis test: the traditional approach (or "fixed significance") and the p-value approach ("observed significance"); the following lists give the steps for each approach.

p-value approach:
  1. formulate null and alternative hypotheses
  2. observe sample data
  3. compute a test statistic from sample data
  4. compute the p-value from the test statistic
  5. reject the null hypothesis (supporting the alternative) at a significance level $\alpha$ if the p-value ≤ $\alpha$; otherwise, fail to reject the null hypothesis

traditional approach:
  1. formulate a null and an alternative hypothesis
  2. determine rejection region(s) based on the level of significance and the tail(s) of the test
  3. observe sample data
  4. compute the test statistic from sample data
  5. reject the null hypothesis (supporting the alternative) at the significance level $\alpha$ if the test statistic falls in the rejection region; otherwise, fail to reject the null hypothesis

Note: With the p-value approach, the final decision is made by comparing probabilities, whereas with the traditional approach, the decision is made by comparing values of random variables; because there is a one-to-one correspondence between the values of the random variables and the probabilities, the two methods will always yield consistent results; we can convert between the two using the following simple (but important) rule:

reject the null hypothesis ($H_0$) at significance level $\alpha$ ⇔ p-value ≤ $\alpha$

at significance level 0 f- shy

Test Statistics

population proportion:
  test statistic: $z = \frac{\hat{p} - p_0}{SE(\hat{p})}$
  distribution under $H_0$: standard normal Z
  assumptions: np ≥ 15 and n(1 - p) ≥ 15

population mean:
  test statistic: $t = \frac{\bar{x} - \mu_0}{SE(\bar{x})}$
  distribution under $H_0$: t distribution with df = n - 1
  assumptions: n ≥ 30, or the population distribution is normal
  Note: Since the t distribution approaches the standard normal Z, many teachers and texts advise that it's OK to use Z if n is sufficiently large.

difference of proportions (independent samples):
  assumptions: np ≥ 15 and n(1 - p) ≥ 15

test for independence (categorical data):
  distribution under $H_0$: $\chi^2$ distribution with df = (r - 1)(c - 1); r = # of rows, c = # of columns

multinomial goodness-of-fit (categorical data):
  distribution under $H_0$: $\chi^2$ distribution with df = k - 1; k = # of categories

$\chi^2$ tests for categorical data assume that the expected counts (E) in each cell are at least 5 under the null hypothesis.

Note: $\chi^2$ tests for categorical data do not have directional alternative hypotheses; rejection regions are always in the right tail.
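To make the link between the two approaches concrete, here is a minimal sketch of a two-tailed test for a population proportion that applies both decision rules to the same data. The data (60 heads in 100 coin tosses), the function name, and the choice to compute the standard error from the null value $p_0$ are illustrative assumptions, not taken from the text above.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_proportion_test(successes, n, p0, alpha=0.05):
    """Test H0: p = p0 vs. H1: p != p0 using the standard normal distribution."""
    Z = NormalDist()
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)      # standard error assuming H0 is true
    z = (p_hat - p0) / se             # test statistic

    # p-value approach: probability, under H0, of a result at least this extreme.
    p_value = 2 * (1 - Z.cdf(abs(z)))

    # Traditional approach: rejection region |z| >= critical value.
    critical = Z.inv_cdf(1 - alpha / 2)

    return {
        "z": z,
        "p_value": p_value,
        "critical": critical,
        "reject_by_p_value": p_value <= alpha,
        "reject_by_region": abs(z) >= critical,
    }

# Hypothetical data: a coin shows heads 60 times in 100 tosses; is it fair?
result = two_tailed_proportion_test(successes=60, n=100, p0=0.5, alpha=0.05)
print(round(result["z"], 2), round(result["p_value"], 4), round(result["critical"], 2))
print(result["reject_by_p_value"], result["reject_by_region"])
```

Both rules reject the null hypothesis here, as the conversion rule says they must; with np = n(1 - p) = 50 under the null value, the sample-size assumption listed for proportions is also met.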

2. Your friend says that a coin you are tossing is not fair.

A fair coin is one that shows heads 50% of the time; the friend states that the coin is NOT fair.

This is an $H_1$ claim: $H_0: p = 0.5$ vs. $H_1: p \neq 0.5$ (two-tailed)

3. A highway patrolman claims that the average speed of cars on a highway is at most 70 mph.

The claim directly refers to the average; since this is an "at most" claim, it is represented by $H_0$.

The hypothesis test is $H_0: \mu = 70$ vs. $H_1: \mu > 70$ (right-tailed)

4. A motorist claims that more than 80% of the cars on a highway travel at a speed exceeding 70 mph.

Since the claim is really about a proportion (don't be fooled by the 70 mph), the hypotheses refer to p. As the motorist makes a "more than" claim, it is the alternative hypothesis, $H_1$: $H_0: p = 0.8$ vs. $H_1: p > 0.8$ (right-tailed)

5. The manager of a snack-food factory states that the average weight of a bag of their potato chips is exactly 5 oz. (no more, no less).

This is an "is exactly" claim that refers to the average; thus, the claim is $H_0$.

The test is $H_0: \mu = 5$ vs. $H_1: \mu \neq 5$ (two-tailed)

and the hypothesis test is two-tailed

is less than alternative hypothesis (H)

and the hypothesis test is left-tailed lt

is greater than alternative hypothesis (H)

and the hypothesis test is right-tailed gt

is equal t~ null hypothesis (Hal bull IS exactly

and the hypothesis test is two-tailed

is at least null hypothesis (Hal

and the hypothesis test is left-tailed lt

is at most null hypothesis (Hal

and the hypothesis test is right-tailed gt

Statistical Inference (continued)

| Decision | Reality: $H_0$ true | Reality: $H_0$ false |
| --- | --- | --- |
| reject $H_0$ (supporting $H_1$) | type I error; $P(\text{reject } H_0 \mid H_0 \text{ true}) = \alpha$ = level of significance | correct inference; $P(\text{reject } H_0 \mid H_0 \text{ false}) = 1 - \beta$ = power |
| fail to reject $H_0$ (failing to support $H_1$) | correct inference; $P(\text{fail to reject } H_0 \mid H_0 \text{ true}) = 1 - \alpha$ = level of confidence | type II error; $P(\text{fail to reject } H_0 \mid H_0 \text{ false}) = \beta$ |

When the null hypothesis ($H_0$) is rejected, we can support the alternative hypothesis ($H_1$). This is a substantive finding: We have sufficient evidence that $H_0$ is not correct.

If $H_0$ is not rejected, then we cannot support $H_1$ either; this is NOT a substantive finding. We have failed to find evidence against $H_0$ but have not confirmed or proved it to be true.

Notes: Under the null hypothesis, we have a specific value for the parameter. This determines a specific sampling distribution, so that $\alpha$ and $1 - \alpha$ can be precisely determined. If the null hypothesis is false, there is no specific value for the parameter. Thus, we can only estimate $\beta$ and $1 - \beta$ by making some alternative assumption about the parameter.

It is important to note that these probabilities are conditioned on reality rather than the decision. That is, given that $H_0$ is true, $\alpha$ is the probability of rejecting $H_0$; it is NOT the probability that $H_0$ is true given that it has been rejected.

Sample Problems

1. In some hypothesis test, the null hypothesis is rejected; if an error has been made, which kind of error is it?

The only error of inference in which the null hypothesis is rejected is a type I error.

2. A researcher conducts a hypothesis test at a significance level of 0.05, and computer software produces a p-value of 0.0912; unknown to the researcher, the null hypothesis is really false. What is her decision? Is it some type of error?

First, consider her decision: She will reject or fail to reject the null hypothesis; we have no test statistic, only a p-value. Since the p-value is greater than the significance level $\alpha$ (0.0912 > 0.05), $H_0$ is not rejected; but since $H_0$ is false, this is a type II error.
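The decision/reality table can be sketched as a small lookup. The function below is a hypothetical helper, not part of the guide; it applies the rule "reject when the p-value is at most the significance level" and then labels the outcome according to whether the null hypothesis is really true, which in practice is unknown.

```python
def classify_outcome(p_value: float, alpha: float, h0_is_true: bool) -> tuple:
    """Return (decision, outcome) for one cell of the decision/reality table."""
    reject = p_value <= alpha
    decision = "reject H0" if reject else "fail to reject H0"

    if reject and h0_is_true:
        outcome = "type I error"       # rejected a true null hypothesis
    elif not reject and not h0_is_true:
        outcome = "type II error"      # failed to reject a false null hypothesis
    else:
        outcome = "correct inference"
    return decision, outcome


# Sample problem 2: alpha = 0.05, p-value = 0.0912, null hypothesis really false.
print(classify_outcome(p_value=0.0912, alpha=0.05, h0_is_true=False))
```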

Finding Rejection Regions & P-Values

| Tail(s) of Hypothesis Test | Rejection Region | P-Value |
| --- | --- | --- |
| $<$ left-tailed | values of the test statistic less than some critical value with area $\alpha$ in the left tail | area under the density curve to the left of the test statistic |
| $>$ right-tailed | values of the test statistic greater than some critical value with area $\alpha$ in the right tail | area under the density curve to the right of the test statistic |
| $\ne$ two-tailed | values of the test statistic less than some critical value with area $\alpha/2$ in the left tail, or greater than some critical value with area $\alpha/2$ in the right tail | double the tail area under the curve away from the test statistic |

[Figure: Percentage cumulative distribution for selected z values under a normal curve; z-value axis from -3 to +3]

Sample Problems

1. At an aquaculture facility, a large number of eels are kept in a tank; they die independently of each other at an average rate of 2.5 eels per day.

a. Which distribution is appropriate?

Since the events are independent and we're given an average rate per fixed interval, a Poisson distribution can be used, with parameter $\lambda = 2.5$.

b. Find the probability that exactly two eels die in a given day.

Find $P(X)$ for $X = 2$: $P(2) = \dfrac{e^{-2.5}(2.5)^2}{2!} = 0.2565$

c. What is the probability that at least one eel dies in the span of one day?

Since the Poisson distribution has no maximum, there is no alternative but to use the law of complements: $P(\text{at least one dies}) = 1 - P(\text{none at all die}) = 1 - P(0) = 1 - \dfrac{e^{-2.5}(2.5)^0}{0!} = 1 - e^{-2.5} = 1 - 0.0821 = 0.9179$

d. Compute the probability that at least one eel dies in the span of 12 hours.

This is harder, since the duration of the interval has changed, but we can scale the Poisson parameter $\lambda$ proportionally. If the average rate is 2.5 eels per day, then the rate is 1.25 (half as many) per half-day; thus, $1 - P(0) = 1 - \dfrac{e^{-1.25}(1.25)^0}{0!} = 1 - e^{-1.25} = 1 - 0.2865 = 0.7135$

2. A cat is hunting some mice; every time she pounces at a mouse, she has a 20% chance of catching the mouse, but will stop hunting as soon as she catches one.

a. Which distribution is appropriate?

As there is a fixed probability of the event, but the experiment will be repeated until the event occurs, a geometric distribution can be used, with parameter $p = 0.2$.

b. What is the probability that she'll catch a mouse on her first attempt?

With a 20% chance of success each time, the probability of succeeding the first time is simply 0.2. We can also use the geometric pdf with $x = 1$: $P(1) = (1 - 0.2)^{0}(0.2) = 0.2$

c. What is the probability that she'll catch a mouse on her third attempt?

The first success occurring on the third trial means $x = 3$: $P(3) = (1 - 0.2)^{3-1}(0.2) = (0.8)^2(0.2) = 0.128$

d. How many times is she expected to pounce until she succeeds?

$E(X) = \dfrac{1}{p} = \dfrac{1}{0.2} = 5$

3. John is playing darts; each time he throws a dart, he has an 8% chance of hitting a bull's-eye, independently of the result for any other dart thrown; he throws a total of five darts.

a. Which distribution is appropriate?

With a constant probability of success and a fixed number of independent events, the total number of successes follows a binomial distribution with parameters $n = 5$, $p = 0.08$.

b. How many bull's-eyes is John expected to hit?

$E(X) = np = 5(0.08) = 0.4$

c. What is the probability that he hits exactly two bull's-eyes?

$x = 2$: $P(2) = {}_5C_2\,(0.08)^2(1 - 0.08)^{5-2} = (10)(0.0064)(0.92)^3 = 0.0498$

d. What is the probability that he hits at least one bull's-eye?

As always, $P(\text{at least one}) = 1 - P(\text{none at all}) = 1 - P(0) = 1 - {}_5C_0\,(0.08)^0(1 - 0.08)^{5-0} = 1 - (0.92)^5 = 1 - 0.6591 = 0.3409$
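The three sample problems above can be reproduced with a few lines of code. This is a minimal sketch using the standard textbook pdfs for the Poisson, geometric, and binomial distributions; the function names are illustrative only.

```python
from math import comb, exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) when events occur at average rate lam per interval."""
    return exp(-lam) * lam**x / factorial(x)


def geometric_pmf(x: int, p: float) -> float:
    """P(first success occurs on trial x), with success probability p."""
    return (1 - p) ** (x - 1) * p


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(exactly x successes in n independent trials)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Eels: rate 2.5 per day; a half-day interval scales the rate to 1.25.
print(poisson_pmf(2, 2.5), 1 - poisson_pmf(0, 2.5), 1 - poisson_pmf(0, 1.25))
# Cat: first success on the third pounce, and expected number of pounces 1/p.
print(geometric_pmf(3, 0.2), 1 / 0.2)
# Darts: exactly two bull's-eyes, and at least one, in five throws.
print(binomial_pmf(2, 5, 0.08), 1 - binomial_pmf(0, 5, 0.08))
```

The "at least one" answers all use the law of complements, subtracting the probability of zero events from 1.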

NOTE TO STUDENT: This guide is intended for informational purposes only. Due to its condensed format, this guide cannot cover every aspect of the subject; rather, it is intended for use in conjunction with course work and assigned texts. Neither BarCharts, Inc., its writers, editors, nor design staff are in any way responsible or liable for the use or misuse of the information contained in this guide.

All rights reserved. No part of this publication may be reproduced or transmitted in any form, or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without written permission from the publisher.

AUTHOR: Stephen V. Kizlik, PhD. © 2009 BarCharts, Inc. 0409

ISBN-13: 978-142320857-0; ISBN-10: 142320857-9

---

KEY TERMS & SYMBOLS

probability experiment: any process with an outcome regarded as random.

sample space (S): the set of all possible outcomes from a probability experiment.

events (A, B, C, etc.): subsets of the sample space; many problems are best solved by a careful consideration of the defined events.

P(A): the probability of event A; for any event A, $0 \le P(A) \le 1$, and for the entire sample space S, $P(S) = 1$.

equally likely outcomes: a very common assumption in solving problems in probability; if all outcomes in the sample space S are equally likely, then the probability of some event A can be calculated as

$P(A) = \dfrac{\text{number of simple outcomes in } A}{\text{total number of simple outcomes}}$

Examples of probability experiments and their sample spaces:

toss a fair coin twice: {HH, HT, TH, TT}; there are two ways to get heads just once.

roll a fair die: {1, 2, 3, 4, 5, 6}.

roll two fair dice: {(1,1), (1,2), (1,3), ..., (2,1), (2,2), (2,3), ..., (6,4), (6,5), (6,6)}; a total of 36 outcomes: six for the first die, times another six for the second die.

have a baby: {boy, girl} or {B, G}.

pick an orange from one of the trees in a grove and weigh it: some positive real number, in some unit of weight; this would be a continuous sample space.

Probability Rules

addition rule (or): $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$; if A and B are disjoint, $P(A \text{ or } B) = P(A) + P(B)$. Subtract $P(A \text{ and } B)$ so as not to count twice the elements of both A and B. Knowing that events are disjoint can make things much easier, since otherwise $P(A \text{ and } B)$ can be difficult to find.

complementary: the complement of event A (denoted $A^C$ or $A'$) means "not A"; it consists of all simple outcomes in S that are not in A. $P(A) + P(A^C) = 1$ (any event will either happen or not); thus, $P(A) = 1 - P(A^C)$ and $P(A^C) = 1 - P(A)$. The law of complements is a useful tool, since it's often easier to find the probability that an event does NOT occur.

multiplication rule (and): $P(A \text{ and } B) = P(A)P(B|A)$; equivalently, $P(A \text{ and } B) = P(B)P(A|B)$; if A and B are independent, $P(A \text{ and } B) = P(A)P(B)$. While it doesn't matter whether we condition on A (first) or condition on B (second), generally the information available will require one or the other.

independent: the occurrence of one event does not affect the probability of the other, and vice versa. $P(A|B) = P(A)$ and $P(B|A) = P(B)$, so $P(A \text{ and } B) = P(A)P(B)$. Events are often assumed to be independent, particularly repeated trials.

conditional probability rule (given that): $P(A|B) = \dfrac{P(A \text{ and } B)}{P(B)}$ and $P(B|A) = \dfrac{P(A \text{ and } B)}{P(A)}$. By multiplying both sides by $P(B)$ or $P(A)$, we see this is a rephrasing of the multiplication rule. Conditional probabilities are often difficult to assess; an alternative way of thinking about $P(A|B)$ is that it is the proportion of elements in B that are ALSO in A.

total probability rule: To find the probability of an event A, if the sample space is partitioned into several disjoint and exhaustive events $D_1, D_2, D_3, \ldots, D_k$, then, since A must occur along with one and only one of the D's:

$P(A) = P(A \text{ and } D_1) + P(A \text{ and } D_2) + \cdots + P(A \text{ and } D_k) = P(D_1)P(A|D_1) + P(D_2)P(A|D_2) + \cdots + P(D_k)P(A|D_k)$

The total probability rule may look complicated, but it isn't (see sample problem 3a, next page).

Bayes' Theorem: With two events A and B, using the total probability rule:

$P(B|A) = \dfrac{P(A \text{ and } B)}{P(A)} = \dfrac{P(A \text{ and } B)}{P(A \text{ and } B) + P(A \text{ and } B^C)} = \dfrac{P(B)P(A|B)}{P(B)P(A|B) + P(B^C)P(A|B^C)}$

Bayes' Theorem allows us to reverse the order of a conditional probability statement, and is the only generally valid method.

Probability Distributions

When some number is derived from a probability experiment, it is called a random variable.

Every random variable has a probability distribution that determines the probabilities of particular values.

For instance, when you roll a fair six-sided die, the resulting number (X) is a random variable with the following probability distribution:

| X | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| P(X) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |

In the table, P(X) is called the probability distribution function (pdf).

Since each value of P(X) represents a probability, pdfs must follow the basic probability rules: P(X) must always be between 0 and 1, and all of the values P(X) sum to 1.

Other probability distributions are continuous: They do not assign specific probabilities to specific values, as above in the discrete case; instead, we can measure probabilities only over a range of values, using the area under the curve of a probability density function. Much like data variables, we often measure the mean (expectation) and standard deviation of random variables; if we can characterize a random variable as belonging to some major family (see table below), we can find the mean and standard deviation easily; in general, we have:

| Type of Random Variable | General Formula for Mean | General Formula for Standard Deviation |
| --- | --- | --- |
| discrete (X takes some countable number of specific values) | $\mu = E(X) = \sum X\,P(X)$ | $\sigma = SD(X) = \sqrt{\sum X^2 P(X) - \mu^2}$ |
| continuous (X has uncountable possible values, and P(X) can be measured only over intervals) | $\mu = E(X) = \int X\,P(X)\,dX$ | $\sigma = SD(X) = \sqrt{\int X^2 P(X)\,dX - \mu^2}$ |

Fortunately, most useful continuous probability distributions do not require integration in practice; other formulas and tables are used.

Sample Problems

1. Discrete random variable X follows the following probability distribution:

| X | 0 | 1 | 2 | 3 | sums |
| --- | --- | --- | --- | --- | --- |
| P(X) | 0.15 | 0.25 | 0.4 | 0.2 | 1 (always) |
| X P(X) | 0 | 0.25 | 0.8 | 0.6 | 1.65 = E(X) |
| X² P(X) | 0 | 0.25 | 1.6 | 1.8 | 3.65 |

a. What is the expected value of X?

$\mu = E(X) = \sum X\,P(X) = 1.65$

b. What is the standard deviation of X?

$\sigma = SD(X) = \sqrt{\sum X^2 P(X) - \mu^2} = \sqrt{3.65 - 1.65^2} = \sqrt{0.9275} = 0.963$
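The general formulas for a discrete random variable translate directly into code. The sketch below is a hypothetical helper that applies them to the distribution in sample problem 1; it also checks that the probabilities sum to 1, as any pdf must.

```python
from math import isclose, sqrt


def mean_and_sd(values, probs):
    """Mean and standard deviation of a discrete random variable."""
    if not isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in zip(values, probs))             # sum of X P(X)
    second_moment = sum(x**2 * p for x, p in zip(values, probs))  # sum of X^2 P(X)
    return mean, sqrt(second_moment - mean**2)


mu, sigma = mean_and_sd([0, 1, 2, 3], [0.15, 0.25, 0.4, 0.2])
print(round(mu, 2), round(sigma, 3))
```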

uniform: all outcomes are consecutive integers, and all are equally likely.

binomial: some fixed number of independent trials with the same probability of a given event each time; X = total number of times the event occurs. Parameters: n = fixed number of trials; p = probability that the designated event occurs on a given trial. Mean: $np$; standard deviation: $\sqrt{np(1 - p)}$. Commonly used distribution; symmetric if p = 0.5; only valid values for X are $0 \le X \le n$.

Poisson: events occur independently, at some average rate per interval of time/space; X = total number of times the event occurs. Parameter: $\lambda$ = mean number of events per interval. There is no upper limit on X for the Poisson distribution.

geometric: a series of independent trials with the same probability of a given event; X = # of trials until the event occurs. Parameter: p = probability that the event occurs on a given trial. $P(X) = (1 - p)^{x-1}p$. Since we only count trials until the event occurs the first time, there is no need to count the ${}_nC_x$ arrangements, as in the binomial.

hypergeometric: drawing samples from a finite population with a categorical outcome; X = # of elements in the sample that fall in the category of interest. Parameters: N = population size; n = sample size; K = number in category in population.

Sample Problems

1. A sock drawer contains nine black socks, six blue socks, and five white socks, none paired up; reach in and take two socks at random, without replacement; find the probability that...

There are 20 socks, total, in the drawer (9 + 6 + 5 = 20) before any are taken out; in situations like this, without any other information, we should assume that each sock is equally likely to be chosen.

a. both socks are black

P(both are black) = P(first is black AND second is black) = P(first is black) P(second is black | first is black)

$= \dfrac{9}{20} \times \dfrac{8}{19} = \dfrac{9 \times 8}{20 \times 19} = \dfrac{72}{380} = 0.189$

b. both socks are white

[Expect a smaller probability than in the preceding problem, as there are fewer white socks from which to choose.]

As above, we lose both one of the socks in the category, as well as one of the socks total, after selecting the first:

$\dfrac{5}{20} \times \dfrac{4}{19} = \dfrac{5 \times 4}{20 \times 19} = \dfrac{20}{380} = 0.053$

c. the two socks match (i.e., that they are of the same color)

There are only three colors of sock in the drawer: P(match) = P(both black) + P(both blue) + P(both white)

$= \dfrac{9}{20} \times \dfrac{8}{19} + \dfrac{6}{20} \times \dfrac{5}{19} + \dfrac{5}{20} \times \dfrac{4}{19} = \dfrac{122}{380} = 0.321$

d. the socks DO NOT match

For the socks not to match, we could have the first black and the second blue, or the first blue and the second white, or a bunch of other possibilities, too; it is much safer, as well as easier, to use the rule for complements. Common sense dictates that the socks will either match or not match, so P(socks DO NOT match) = 1 - P(socks do match) = 1 - 0.321 = 0.679
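The sock-drawer calculations can be verified with exact fractions. This is a small sketch, with a made-up helper name, that applies the multiplication rule for sampling without replacement and then the rule for complements.

```python
from fractions import Fraction


def prob_both_same_color(count: int, total: int) -> Fraction:
    """P(first and second sock are both this color), without replacement."""
    return Fraction(count, total) * Fraction(count - 1, total - 1)


socks = {"black": 9, "blue": 6, "white": 5}
total = sum(socks.values())  # 20 socks before any are taken out

# The colors are disjoint ways to match, so their probabilities add.
p_match = sum(prob_both_same_color(c, total) for c in socks.values())
p_no_match = 1 - p_match  # rule for complements

print(p_match, float(p_match))
print(p_no_match, float(p_no_match))
```

Working in exact fractions avoids carrying rounded intermediate values into the complement.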

2. In a particular county, 88% of homes have air conditioning, 27% have a swimming pool, and 23% have both; what is the probability that one of these homes, chosen at random, has...

The given percentages can be taken as probabilities for these events, so we have P(AC) = 0.88, P(pool) = 0.27, and P(AC and pool) = 0.23.

a. air conditioning OR a pool

By the addition rule: P(AC or pool) = P(AC) + P(pool) - P(AC and pool) = 0.88 + 0.27 - 0.23 = 0.92

b. NEITHER air conditioning NOR a pool

Upon examination of the event, this is the complement of the above event: P(neither AC nor pool) = P(no AC AND no pool) = 1 - P(AC or pool) = 1 - 0.92 = 0.08

Hypergeometric pdf: $P(x) = \dfrac{{}_{K}C_{x}\;{}_{N-K}C_{n-x}}{{}_{N}C_{n}}$, where N = population size, n = sample size, and K = number in category in population.

c. has a pool, given that it has air conditioning

This is the same as asking, "What proportion of the homes with air conditioning also have pools?" Whenever we use the phrase "given that," a conditional probability is indicated:

$P(\text{pool} \mid \text{AC}) = \dfrac{P(\text{pool and AC})}{P(\text{AC})} = \dfrac{0.23}{0.88} = 0.261$

d. has air conditioning, given that it has a pool

[CAUTION: This is NOT the same as the preceding problem; now we're asked what proportion of homes that have pools ALSO have air conditioning.]

The event in the numerator is the same; what has changed is the condition:

$P(\text{AC} \mid \text{pool}) = \dfrac{P(\text{pool and AC})}{P(\text{pool})} = \dfrac{0.23}{0.27} = 0.852$

This probability is much greater, since more homes have air conditioning than pools.
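All four parts of the air-conditioning problem follow from three given probabilities. The sketch below uses hypothetical helper names for the addition rule and the conditional probability rule; note how swapping the condition changes only the denominator.

```python
def p_or(p_a: float, p_b: float, p_a_and_b: float) -> float:
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


def p_given(p_a_and_b: float, p_condition: float) -> float:
    """Conditional probability rule: P(A | B) = P(A and B) / P(B)."""
    return p_a_and_b / p_condition


p_ac, p_pool, p_both = 0.88, 0.27, 0.23

p_ac_or_pool = p_or(p_ac, p_pool, p_both)
p_neither = 1 - p_ac_or_pool             # complement of "AC or pool"
p_pool_given_ac = p_given(p_both, p_ac)  # condition on air conditioning
p_ac_given_pool = p_given(p_both, p_pool)  # condition on having a pool

print(round(p_ac_or_pool, 2), round(p_neither, 2))
print(round(p_pool_given_ac, 3), round(p_ac_given_pool, 3))
```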

3. The TIC Corporation manufactures ceiling fans; each fan contains an electric motor, which TIC buys from one of three suppliers: 50% of their motors from supplier A, 40% from supplier B, and 10% from supplier C. Of course, some of the motors they buy are defective; the defective rate is 6% for supplier A, 5% for supplier B, and 30% for supplier C. One of these motors is chosen at random; find the probability that...

We have here a bunch of statements of probability, and it's useful to list them explicitly: Let events A, B, and C denote the supplier for a fan motor, and D denote that the motor is defective; then, P(A) = 0.5, P(B) = 0.4, and P(C) = 0.1.

The information about defective rates provides conditional probabilities: P(D|A) = 0.06, P(D|B) = 0.05, and P(D|C) = 0.3.

We can also note the complementary probabilities of a motor not being defective: $P(D^C|A) = 0.94$, $P(D^C|B) = 0.95$, and $P(D^C|C) = 0.7$.

a. the motor is defective

To find the overall defective rate, we use the total probability rule, as a defective motor still had to come from supplier A, B, or C:

P(D) = P(A and D) + P(B and D) + P(C and D) = P(A)P(D|A) + P(B)P(D|B) + P(C)P(D|C) = (0.5)(0.06) + (0.4)(0.05) + (0.1)(0.3) = 0.03 + 0.02 + 0.03 = 0.08

If 8% overall are defective, then 92% are not; that is, we can also conclude that $P(D^C) = 1 - P(D) = 1 - 0.08 = 0.92$.

b. the motor came from supplier C, given that it is defective

This is like asking, "What proportion of the defectives come from supplier C?" Denote this probability as P(C|D); we began with P(D|C) (among other probabilities), so we are effectively using Bayes' Theorem to reverse the order; however, we already have P(D), so

$P(C|D) = \dfrac{P(C \text{ and } D)}{P(D)} = \dfrac{0.03}{0.08} = 0.375$
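The supplier problem is a direct application of the total probability rule followed by Bayes' Theorem. Here is a minimal sketch; the dictionary layout and function names are illustrative choices, not part of the guide.

```python
def total_probability(priors: dict, likelihoods: dict) -> float:
    """P(D) = sum over the partition of P(supplier) * P(D | supplier)."""
    return sum(priors[k] * likelihoods[k] for k in priors)


def posterior(priors: dict, likelihoods: dict, k: str) -> float:
    """Bayes' Theorem: P(supplier k | D) = P(k) P(D | k) / P(D)."""
    return priors[k] * likelihoods[k] / total_probability(priors, likelihoods)


priors = {"A": 0.5, "B": 0.4, "C": 0.1}           # share of motors per supplier
defect_rates = {"A": 0.06, "B": 0.05, "C": 0.30}  # P(defective | supplier)

p_defective = total_probability(priors, defect_rates)
p_c_given_defective = posterior(priors, defect_rates, "C")
print(round(p_defective, 2), round(p_c_given_defective, 3))
```

The posteriors over all three suppliers sum to 1, which is a quick sanity check on the partition.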

normal (X, or some other letter): symmetric, unbounded, bell-shaped; arises commonly in nature and in statistics as a result of the central limit theorem.

Many other distributions approach the normal as n (or some other parameter, such as $\lambda$ or df) increases.

standard normal (Z): $\mu$ = mean = 0; $\sigma$ = standard deviation = 1; a special variant of normal, with $\mu = 0$ and $\sigma = 1$; represented in Z tables.

Used for inference about proportions; the cumulative probability is provided in Z tables. For a particular value z, the cumulative probability is $\Phi(z) = P(Z < z)$, i.e., the area under the density curve to the left of z.

Student's t (t): df = degrees of freedom; similar in shape to normal; $\mu = 0$ (always).

Used for inference about means.

chi-square ($\chi^2$): df = degrees of freedom; not symmetric (skewed right).

Used for inferences about categorical distributions.

Sample Problems

1. For a standard normal random variable Z, find P(Z < 1.5).

Since, by definition, the values from the standard normal table are $\Phi(z) = P(Z < z)$: $P(Z < 1.5) = \Phi(1.5) = 0.9332$

2. For a t distribution with df = 20, which critical value of t has an area of 0.05 in the right tail?

A t table generally provides the tail area, rather than the cumulative probability as given in standard normal tables; with the row = df = 20 and the column = tail area = 0.05, a t table produces the value of 1.725.

3. The heights of military recruits follow a normal distribution with a mean of 70 inches and a standard deviation of 4 inches; find the probability that a randomly chosen recruit is...

a. shorter than 60 inches

First, we must transform values of the variable (height) to the standard normal distribution by taking z scores; here,

$z = \dfrac{x - \mu}{\sigma} = \dfrac{60 - 70}{4} = \dfrac{-10}{4} = -2.5$

Since we want the "less than" probability, the solution comes directly from the standard normal z table:

$P(X < 60) = P(Z < -2.5) = \Phi(-2.5) = 0.0062$

b. taller than 72 inches

First, the z score: $z = \dfrac{x - \mu}{\sigma} = \dfrac{72 - 70}{4} = \dfrac{2}{4} = 0.5$

Since this is a "greater than" probability, subtract the cumulative probability from 1:

$P(X > 72) = P(Z > 0.5) = 1 - \Phi(0.5) = 1 - 0.6915 = 0.3085$

c. between 64 and 76 inches tall

In this case, there are two boundaries. The only way to find the area under the curve between them is to find the cumulative probabilities for each, and then to subtract; this entails finding z scores for both X = 64 and X = 76:

$z = \dfrac{x - \mu}{\sigma} = \dfrac{64 - 70}{4} = \dfrac{-6}{4} = -1.5$ and $z = \dfrac{x - \mu}{\sigma} = \dfrac{76 - 70}{4} = \dfrac{6}{4} = 1.5$

Now, $P(64 < X < 76) = P(-1.5 < Z < 1.5) = \Phi(1.5) - \Phi(-1.5) = 0.9332 - 0.0668 = 0.8664$
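Where a z table is not at hand, the same three probabilities for the recruit heights can be obtained from the cumulative distribution function directly. This sketch uses `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later); the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=4)  # recruit heights, in inches

# "Less than": the cumulative probability itself.
p_shorter_60 = heights.cdf(60)
# "Greater than": subtract the cumulative probability from 1.
p_taller_72 = 1 - heights.cdf(72)
# "Between": subtract the two cumulative probabilities.
p_between_64_76 = heights.cdf(76) - heights.cdf(64)

print(round(p_shorter_60, 4), round(p_taller_72, 4), round(p_between_64_76, 4))
```

Passing the mean and standard deviation to `NormalDist` performs the z-score transformation implicitly, so no separate standardization step is needed.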

[Figure: standard normal density curve illustrating $\Phi(z) = P(Z < z)$, the area to the left of z]

Because sample statistics are derived from random samples, they are random. The probability distribution of a statistic is called its sampling distribution.

Due to the central limit theorem, some important statistics have sampling distributions that approach a normal distribution as the sample size increases:

sample mean $\bar{x}$: expected value $E(\bar{x}) = \mu$; standard error $SE(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$; approximately normal if $n \ge 30$ or if the population distribution is normal.

sample proportion $\hat{p}$: expected value $E(\hat{p}) = p$; standard error $SE(\hat{p}) = \sqrt{\dfrac{p(1 - p)}{n}}$.

Knowing the expected value and standard error allows us to find probabilities; then, in turn, we can use the properties of these sampling distributions to make inferences about the parameter values when we do not know them, as in real-world applications.

Sample Problems

1. 60% of the registered voters in a large district plan to vote in favor of a referendum; a random sample of 340 of these voters is selected.

a. What is the expected value of the sample proportion?

$E(\hat{p}) = p = 0.6$

b. What is the standard error of the sample proportion?

$SE(\hat{p}) = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{0.6(1 - 0.6)}{340}} = 0.0266$

c. What is the probability that the sample proportion is between 55% and 65%?

First, find the z scores for those proportions:

$z = \dfrac{\hat{p} - E(\hat{p})}{SE(\hat{p})} = \dfrac{0.55 - 0.6}{0.0266} = \dfrac{-0.05}{0.0266} = -1.88$ and $z = \dfrac{\hat{p} - E(\hat{p})}{SE(\hat{p})} = \dfrac{0.65 - 0.6}{0.0266} = \dfrac{0.05}{0.0266} = 1.88$

Now, $P(0.55 < \hat{p} < 0.65) = P(-1.88 < Z < 1.88) = \Phi(1.88) - \Phi(-1.88) = 0.9699 - 0.0301 = 0.9398$

2. The standard deviation of the weight of cattle in a certain herd is 160 pounds, but the mean is unknown; a random sample of size 100 is chosen.

a. Compute the standard error of the sample mean.

$SE(\bar{x}) = \dfrac{\sigma}{\sqrt{n}} = \dfrac{160}{\sqrt{100}} = 16$ lbs.

b. For an individual animal in this herd, what is the probability of a weight within 40 lbs. of the population mean?

Since this problem refers to a single observation, not the sample mean, we use the standard deviation, not the standard error.

Not knowing the value of $\mu$, we can only express the boundaries for "within 40 lbs. of the mean" as $X = \mu + 40$ and $X = \mu - 40$.

We can still compute z scores:

$z = \dfrac{x - \mu}{\sigma} = \dfrac{\mu + 40 - \mu}{160} = \dfrac{40}{160} = 0.25$ and $z = \dfrac{x - \mu}{\sigma} = \dfrac{\mu - 40 - \mu}{160} = \dfrac{-40}{160} = -0.25$

That is, "within 40 lbs. of the mean" is the same as "within 0.25 standard deviation."

We find the probability: $P(-0.25 < Z < 0.25) = \Phi(0.25) - \Phi(-0.25) = 0.5987 - 0.4013 = 0.1974$

c. What is the probability that the sample mean falls within 40 lbs. of the population mean?

Even though we don't know the population mean, the z score formula will allow us to find this probability.

Since this is the sample mean, we must use the standard error of 16 lbs., rather than the standard deviation, in computing the z scores:

$z = \dfrac{\bar{x} - \mu}{SE(\bar{x})} = \dfrac{\mu + 40 - \mu}{16} = \dfrac{40}{16} = 2.5$ and $z = \dfrac{\bar{x} - \mu}{SE(\bar{x})} = \dfrac{\mu - 40 - \mu}{16} = \dfrac{-40}{16} = -2.5$

Now, $P(-2.5 < Z < 2.5) = \Phi(2.5) - \Phi(-2.5) = 0.9938 - 0.0062 = 0.9876$

This probability is dramatically higher than the probability for an individual head of cattle.
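The contrast between a single observation and a sample mean comes down to which spread is used in the z score. The sketch below, with hypothetical function names, computes both standard errors and the "within a margin" probabilities. Because it does not round z to two decimals the way a table lookup does, the voter result can differ slightly in the fourth decimal place from the table-based answer.

```python
from math import sqrt
from statistics import NormalDist


def se_proportion(p: float, n: int) -> float:
    """Standard error of the sample proportion."""
    return sqrt(p * (1 - p) / n)


def se_mean(sigma: float, n: int) -> float:
    """Standard error of the sample mean."""
    return sigma / sqrt(n)


def prob_within(margin: float, spread: float) -> float:
    """P(-z < Z < z) with z = margin / spread, for a normal distribution."""
    z = margin / spread
    std_normal = NormalDist()
    return std_normal.cdf(z) - std_normal.cdf(-z)


# Voters: sample proportion within 0.05 of p = 0.6, with n = 340.
print(prob_within(0.05, se_proportion(0.6, 340)))
# Cattle: one animal (standard deviation) vs. the mean of 100 (standard error).
print(prob_within(40, 160), prob_within(40, se_mean(160, 100)))
```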


When we want to draw conclusions about a population using data from a sample, we use some method of statistical inference.

A hypothesis test is a procedure by which claims about populations (hypotheses) are evaluated on the basis of sample statistics.

The procedure begins with a null hypothesis ($H_0$) and an alternative (or research) hypothesis ($H_1$); if the sample data are too unusual, assuming $H_0$ to be true, then $H_0$ is rejected in favor of $H_1$; otherwise, we fail to reject the null hypothesis, and thereby fail to support the alternative.

| the null hypothesis ($H_0$) | the alternative hypothesis ($H_1$ or $H_a$) |
| --- | --- |
| is assumed true for the purpose of carrying out the hypothesis test | is supported only by carrying out the test, if the null hypothesis can be rejected |
| ALWAYS provides a specific value for the parameter, its null value; always contains $=$ | NEVER provides a specific value for the parameter; instead contains $>$ (right-tailed), $<$ (left-tailed), or $\ne$ (two-tailed) |
| the null value implies a specific sampling distribution for the test statistic | without any specific value for the parameter of interest, the sampling distribution is unknown |
| can be rejected, or not rejected, but NEVER supported | can be supported (by rejecting the null), or not supported (by failing to reject the null), but NEVER rejected |

The tail(s) of the hypothesis test are determined by the alternative hypothesis ($H_1$); this is one of the most important attributes of the test, regardless of which method is used.

Sample Problems

In each of the following cases, formulate hypotheses to test the claim; indicate which hypothesis represents the claim.

1. The manager of a bank claims that the average waiting time for customers is less than two minutes.

Since the claim refers to the average, this is a test for $\mu$. As a "less than" claim, it is represented by $H_1$, and the hypothesis test is $H_0: \mu = 2$ vs. $H_1: \mu < 2$ (left-tailed)

There are two major methods for carrying out a hypothesis test: the traditional approach (or fixed significance) and the p-value approach (observed significance); the following table lists the steps for each approach.

| p-value approach | traditional approach |
| --- | --- |
| formulate null and alternative hypotheses | formulate a null and an alternative hypothesis |
| observe sample data | determine rejection region(s) based on the level of significance and the tail(s) of the test |
| compute a test statistic from sample data | observe sample data |
| compute the p-value from the test statistic | compute the test statistic from sample data |
| reject the null hypothesis (supporting the alternative) at significance level $\alpha$ if the p-value $\le \alpha$; otherwise, fail to reject the null hypothesis | reject the null hypothesis (supporting the alternative) at significance level $\alpha$ if the test statistic falls in the rejection region; otherwise, fail to reject the null hypothesis |

6 With the p-value approach the final decision is made by comparing probabilities whereas with the traditional approach the decision is made by comparing values of random variables because there is a one-to-one correspondence between the values of the random variables and the probabilities the two methods will always yield consistent results we can convert between the two using the following simple (but important) rule

reject the null hypothesis (Ho) ~ p-value 0

at significance level 0 f- shy

Test Statistics Parameter Test Statistic Distribution Under Ho Assumptions

popUlation proportion standard normal Z np 15 andz P- Po n(1 - p) 2 15= SE(p)

population mean t distribution n 2 30 or theX-J1 with df = n-1 population distribution t =SE(X) is normal

6 Since the t distribution approaches the standard normal Z many teachers and texts advise that bull its OK to use Z if n is sufficiently large

difference of proportions np 2 15 ltmd (independent samples) n(1-p) 2 15

test for independence X2 distribution X2 tests for categorical (categorical data) with df = (r - 1)(c - 1) data assume that the

r = of rows expected counts (E) in multinomial goodnessshy c = of columns each cell are at least of-fit (categorical data) f--------------j 5 under the null

x2 distribution with df = k - 1 hypothesis and k = of categories

x2 tests for categorical data do not have directional alternative hypotheses rejection ~ regions are always in the right tail

5

j) (left-tailed)

2 Your friend says that a coin you are tossing is not fair

A fair coin is one that shows heads 50 of ill the time the friend states that the coin is

NOT fair

This is an H claim Ho P - 05 vs H p 05 j) (two-tailed)

3 A highway patrolman claims that the average speed of cars on a highway is at most 70 mph

it The claim directly refers to the average bull since this is an at most claim it is

represented by Ho

The hypothesis test is Ho JI = 70 vs H JI gt 70

j) (right-tailed)

4 A motorist claims that more than 80 of the cars on a highway travel at a speed exceeding 70 mph

Since the claim is really about a proportionshyill dont be fooled by the 70 mph-the

hypotheses refer to p As the motorist makes a more than claim it is the null hypothesis Ho Ho P =OS vs H pgt OS

P (right-tailed)

5 The manager of a snack-food factory states that the average weight of a bag of their potato chips is exactly 5 oz (no more no less)

This is an is exactly claim that refers ill to the average thus the claim is Ho

The test is Ho JI 5 vs H JI 5

j) (two-tailed)

and the hypothesis test is two-tailed

is less than alternative hypothesis (H)

and the hypothesis test is left-tailed lt

is greater than alternative hypothesis (H)

and the hypothesis test is right-tailed gt

is equal t~ null hypothesis (Hal bull IS exactly

and the hypothesis test is two-tailed

is at least null hypothesis (Hal

and the hypothesis test is left-tailed lt

is at most null hypothesis (Hal

and the hypothesis test is right-tailed gt

Statistical Inference (continued)

Decision vs. Reality

reject $H_0$ (supporting $H_a$)
  if $H_0$ is true: type I error; $P(\text{reject } H_0 \mid H_0 \text{ true}) = \alpha$ = level of significance
  if $H_0$ is false: correct inference; $P(\text{reject } H_0 \mid H_0 \text{ false}) = 1 - \beta$ = power

fail to reject $H_0$ (failing to support $H_a$)
  if $H_0$ is true: correct inference; $P(\text{fail to reject } H_0 \mid H_0 \text{ true}) = 1 - \alpha$ = level of confidence
  if $H_0$ is false: type II error; $P(\text{fail to reject } H_0 \mid H_0 \text{ false}) = \beta$

When the null hypothesis ($H_0$) is rejected, we can support the alternative hypothesis ($H_a$). This is a substantive finding: we have sufficient evidence that $H_0$ is not correct.

If $H_0$ is not rejected, then we cannot support $H_a$ either; this is NOT a substantive finding. We have failed to find evidence against $H_0$ but have not confirmed or proved it to be true.

Under the null hypothesis, we have a specific value for the parameter. This determines a specific sampling distribution, so that $\alpha$ and $1 - \alpha$ can be precisely determined.

If the null hypothesis is false, there is no specific value for the parameter. Thus, we can only estimate $\beta$ and $1 - \beta$ by making some alternative assumption about the parameter.

It is important to note that these probabilities are conditioned on reality rather than the decision. That is, given that $H_0$ is true, $\alpha$ is the probability of rejecting $H_0$; it is NOT the probability that $H_0$ is true given that it has been rejected.

Sample Problems

1. In some hypothesis test, the null hypothesis is rejected; if an error has been made, which kind of error is it?

The only error of inference in which the null hypothesis is rejected is a type I error.

2. A researcher conducts a hypothesis test at a significance level of 0.05, and computer software produces a p-value of 0.0912; unknown to the researcher, the null hypothesis is really false. What is her decision? Is it some type of error?

First consider her decision: she will reject or fail to reject the null hypothesis. We have no test statistic, only a p-value. Since the p-value (0.0912) is not less than the significance level $\alpha$ (0.05), $H_0$ is not rejected; but also, since $H_0$ is false, this is a type II error.
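The decision table above can be sketched as a small function. This is only an illustration: the function name `classify_decision` and the convention "reject when the p-value is less than alpha" are assumptions made for the sketch, and in practice the truth of the null hypothesis is not known.

```python
def classify_decision(p_value, alpha, h0_is_true):
    """Return (decision, outcome) for one hypothesis test.

    h0_is_true is hypothetical knowledge of reality, used here only
    to label the outcome; a real researcher does not have it.
    """
    reject = p_value < alpha  # assumed convention: reject when p-value < alpha
    decision = "reject H0" if reject else "fail to reject H0"
    if reject and h0_is_true:
        outcome = "type I error"
    elif not reject and not h0_is_true:
        outcome = "type II error"
    else:
        outcome = "correct inference"
    return decision, outcome


# Sample problem 2: alpha = 0.05, p-value = 0.0912, H0 really false
decision, outcome = classify_decision(0.0912, 0.05, h0_is_true=False)
print(decision, "->", outcome)
```

For the researcher in sample problem 2, the sketch reports a failure to reject combined with a false null hypothesis, which is the type II case.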

Finding Rejection Regions & P-Values

< (left-tailed test)
  Rejection region: values of the test statistic less than some critical value with area $\alpha$ in the left tail
  P-value: area under the density curve to the left of the test statistic

> (right-tailed test)
  Rejection region: values of the test statistic greater than some critical value with area $\alpha$ in the right tail
  P-value: area under the density curve to the right of the test statistic

$\neq$ (two-tailed test)
  Rejection region: values of the test statistic less than some critical value with area $\alpha/2$ in the left tail, or greater than some critical value with area $\alpha/2$ in the right tail
  P-value: double the tail area under the curve away from the test statistic

[Figure: Percentage Cumulative Distribution for selected z values under a normal curve; z-value axis marked at -3, -2, -1, 0, +1, +2, +3]
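For a test statistic that follows the standard normal distribution, the p-value and rejection-region rules above can be sketched with Python's standard library. The function names and the `"left"`/`"right"`/`"two"` labels are assumptions chosen for this illustration, and the two-tailed case splits the significance level evenly between the tails.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_p_value(z, tail):
    """P-value for a z test statistic; tail is 'left', 'right' or 'two'."""
    if tail == "left":
        return STANDARD_NORMAL.cdf(z)          # area to the left of z
    if tail == "right":
        return 1 - STANDARD_NORMAL.cdf(z)      # area to the right of z
    if tail == "two":
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))  # double the outer tail
    raise ValueError("tail must be 'left', 'right' or 'two'")


def z_critical_values(alpha, tail):
    """Critical value(s) bounding the rejection region at level alpha."""
    if tail == "left":
        return (STANDARD_NORMAL.inv_cdf(alpha),)
    if tail == "right":
        return (STANDARD_NORMAL.inv_cdf(1 - alpha),)
    if tail == "two":
        return (STANDARD_NORMAL.inv_cdf(alpha / 2),
                STANDARD_NORMAL.inv_cdf(1 - alpha / 2))
    raise ValueError("tail must be 'left', 'right' or 'two'")


print(round(z_p_value(1.96, "two"), 4))
print([round(c, 2) for c in z_critical_values(0.05, "two")])
```

Comparing the p-value with the significance level, or the statistic with the critical values, leads to the same decision.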

Sample Problems

1. At an aquaculture facility, a large number of eels are kept in a tank; they die independently of each other at an average rate of 2.5 eels per day.

a. Which distribution is appropriate?

Since the events are independent and we're given an average rate per fixed interval, a Poisson distribution can be used, with parameter $\lambda = 2.5$.

b. Find the probability that exactly two eels die in a given day.

Find P(X) for X = 2:

$P(2) = \frac{e^{-2.5}\,(2.5)^{2}}{2!} = 0.2565$

c. What is the probability that at least one eel dies in the span of one day?

Since the Poisson distribution has no maximum, there is no alternative but to use the law of complements:

$P(\text{at least one dies}) = 1 - P(\text{none at all die}) = 1 - P(0) = 1 - \frac{e^{-2.5}\,(2.5)^{0}}{0!} = 1 - e^{-2.5} = 1 - 0.0821 = 0.9179$

d. Compute the probability that at least one eel dies in the span of 12 hours.

This is harder, since the duration of the interval has changed, but we can scale the Poisson parameter $\lambda$ proportionally. If the average rate is 2.5 eels per day, then the rate is 1.25 (half as many) per half-day; thus:

$1 - P(0) = 1 - \frac{e^{-1.25}\,(1.25)^{0}}{0!} = 1 - e^{-1.25} = 1 - 0.2865 = 0.7135$

2. A cat is hunting some mice; every time she pounces at a mouse, she has a 20% chance of catching the mouse, but will stop hunting as soon as she catches one.

a. Which distribution is appropriate?

As there is a fixed probability of the event, but the experiment will be repeated until the event occurs, a geometric distribution can be used, with parameter p = 0.2.

b. What is the probability that she'll catch a mouse on her first attempt?

With a 20% chance of success each time, the probability of succeeding the first time is simply 0.2.

We can also use the geometric pdf with x = 1: $P(1) = (1 - 0.2)^{0}(0.2) = 0.2$

c. What is the probability that she'll catch a mouse on her third attempt?

The first success occurring on the third trial means x = 3:

$P(3) = (1 - 0.2)^{3-1}(0.2) = (0.8)^{2}(0.2) = 0.128$

d. How many times is she expected to pounce until she succeeds?

$E(X) = \frac{1}{p} = \frac{1}{0.2} = 5$

3. John is playing darts; each time he throws a dart, he has an 8% chance of hitting a bull's-eye, independently of the result for any other dart thrown; he throws a total of five darts.

a. Which distribution is appropriate?

With a constant probability of success and a fixed number of independent events, the total number of successes follows a binomial distribution with parameters n = 5, p = 0.08.

b. How many bull's-eyes is John expected to hit?

$E(X) = np = 5(0.08) = 0.4$

c. What is the probability that he hits exactly two bull's-eyes?

x = 2: $P(2) = {}_5C_2\,(0.08)^{2}(1 - 0.08)^{5-2} = (10)(0.0064)(0.92)^{3} = 0.0498$

d. What is the probability that he hits at least one bull's-eye?

As always, $P(\text{at least one}) = 1 - P(\text{none at all}) = 1 - P(0) = 1 - {}_5C_0\,(0.08)^{0}(1 - 0.08)^{5-0} = 1 - (0.92)^{5} = 1 - 0.6591 = 0.3409$
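The three sample problems above can be checked numerically. The sketch below uses only the standard library; the helper names `poisson_pmf`, `geometric_pmf` and `binomial_pmf` are chosen for this illustration and implement the usual textbook probability functions.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) when events occur at an average rate lam per interval."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def geometric_pmf(x, p):
    """P(first success occurs on trial x), with success probability p."""
    return (1 - p) ** (x - 1) * p


def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


# 1. Eels: Poisson with lambda = 2.5 per day (1.25 per half-day)
print(round(poisson_pmf(2, 2.5), 4))       # exactly two in a day
print(round(1 - poisson_pmf(0, 2.5), 4))   # at least one in a day
print(round(1 - poisson_pmf(0, 1.25), 4))  # at least one in 12 hours

# 2. Cat: geometric with p = 0.2
print(round(geometric_pmf(3, 0.2), 3))     # first catch on third pounce
print(1 / 0.2)                             # expected number of pounces

# 3. Darts: binomial with n = 5, p = 0.08
print(round(binomial_pmf(2, 5, 0.08), 4))      # exactly two bull's-eyes
print(round(1 - binomial_pmf(0, 5, 0.08), 4))  # at least one bull's-eye
```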

NOTE TO STUDENT: This guide is intended for informational purposes only. Due to its condensed format, this guide cannot cover every aspect of the subject; rather, it is intended for use in conjunction with course work and assigned texts. Neither BarCharts, Inc., its writers, editors, nor design staff are in any way responsible or liable for the use or misuse of the information contained in this guide.

All rights reserved. No part of this publication may be reproduced or transmitted in any form, or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without written permission from the publisher.

AUTHOR: Stephen V. Kizlik, PhD
© 2009 BarCharts, Inc. 0409
ISBN-13: 978-142320857-0
ISBN-10: 142320857-9

uniform: all outcomes are consecutive integers and all are equally likely

binomial: some fixed number of independent trials with the same probability of a given event each time
  X = total number of times the event occurs
  n = fixed number of trials
  p = probability that the designated event occurs on a given trial
  mean: $np$; standard deviation: $\sqrt{np(1-p)}$
  Commonly used distribution; symmetric if p = 0.5; the only valid values for X are $0 \le X \le n$.

Poisson: events occur independently at some average rate per interval of time/space
  X = total number of times the event occurs
  $\lambda$ = mean number of events per interval
  There is no upper limit on X for the Poisson distribution.

geometric: a series of independent trials with the same probability of a given event
  X = # of trials until the event occurs
  p = probability that the event occurs on a given trial
  $P(x) = (1 - p)^{x-1}\,p$
  Since we only count trials until the event occurs the first time, there is no need to count the ${}_nC_x$ arrangements, as in the binomial.

hypergeometric: drawing samples from a finite population with a categorical outcome
  X = # of elements in the sample that fall in the category of interest
  parameters: N, n, and K

Sample Problems

1. A sock drawer contains nine black socks, six blue socks, and five white socks, none paired up; reach in and take two socks at random, without replacement; find the probability that:

There are 20 socks total in the drawer (9 + 6 + 5 = 20) before any are taken out; in situations like this, without any other information, we should assume that each sock is equally likely to be chosen.

a. both socks are black

$P(\text{both are black}) = P(\text{first is black AND second is black}) = P(\text{first is black})\,P(\text{second is black} \mid \text{first is black})$

$= \frac{9}{20} \times \frac{8}{19} = \frac{9 \times 8}{20 \times 19} = \frac{72}{380} = 0.189$

b. both socks are white

[Expect a smaller probability than in the preceding problem, as there are fewer white socks from which to choose.]

As above, after selecting the first sock we lose one of the socks in the category as well as one of the socks in total:

$\frac{5}{20} \times \frac{4}{19} = \frac{5 \times 4}{20 \times 19} = \frac{20}{380} = 0.053$

c. the two socks match (i.e., that they are of the same color)

There are only three colors of sock in the drawer: $P(\text{match}) = P(\text{both black}) + P(\text{both blue}) + P(\text{both white})$

$= \frac{9}{20} \times \frac{8}{19} + \frac{6}{20} \times \frac{5}{19} + \frac{5}{20} \times \frac{4}{19} = \frac{122}{380} = 0.321$

d. the socks DO NOT match

For the socks not to match, we could have the first black and the second blue, or the first blue and the second white, or a bunch of other possibilities, too; it is much safer, as well as easier, to use the rule for complements. Common sense dictates that the socks will either match or not match, so $P(\text{socks DO NOT match}) = 1 - P(\text{socks do match}) = 1 - 0.321 = 0.679$
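The sock-drawer results can be verified with exact fractions. The sketch below is one way to do it; the names `counts`, `p_both` and `hypergeom_pmf` are chosen for illustration, and the last function is the standard hypergeometric probability function used as a cross-check on part (a).

```python
from fractions import Fraction
from math import comb

counts = {"black": 9, "blue": 6, "white": 5}
total = sum(counts.values())  # 20 socks


def p_both(color):
    """P(both socks drawn without replacement have the given color)."""
    k = counts[color]
    return Fraction(k, total) * Fraction(k - 1, total - 1)


def hypergeom_pmf(x, N, K, n):
    """P(x of the n sampled items are in a category with K of N members)."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))


p_match = sum(p_both(color) for color in counts)
p_no_match = 1 - p_match  # rule for complements

print(p_both("black"), round(float(p_both("black")), 3))
print(p_both("white"), round(float(p_both("white")), 3))
print(p_match, round(float(p_match), 3))
print(p_no_match, round(float(p_no_match), 3))
print(hypergeom_pmf(2, N=20, K=9, n=2) == p_both("black"))
```

Working in fractions avoids compounding rounding error: the complement is computed from 122/380 rather than from the rounded 0.321.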

2. In a particular county, 88% of homes have air conditioning, 27% have a swimming pool, and 23% have both; what is the probability that one of these homes, chosen at random, has:

The given percentages can be taken as probabilities for these events, so we have P(AC) = 0.88, P(pool) = 0.27, and P(AC and pool) = 0.23.

a. air conditioning OR a pool

By the addition rule: $P(\text{AC or pool}) = P(\text{AC}) + P(\text{pool}) - P(\text{AC and pool}) = 0.88 + 0.27 - 0.23 = 0.92$

b. NEITHER air conditioning NOR a pool

Upon examination of the event, this is the complement of the above event: $P(\text{neither AC nor pool}) = P(\text{no AC AND no pool}) = 1 - P(\text{AC or pool}) = 1 - 0.92 = 0.08$

hypergeometric probability function, with N = population size, n = sample size, and K = number in category in population: $P(x) = \frac{{}_KC_x \cdot {}_{N-K}C_{n-x}}{{}_NC_n}$

c. has a pool, given that it has air conditioning

This is the same as asking: "What proportion of the homes with air conditioning also have pools?" Whenever we use the phrase "given that," a conditional probability is indicated.

$P(\text{pool} \mid \text{AC}) = \frac{P(\text{pool and AC})}{P(\text{AC})} = \frac{0.23}{0.88} = 0.261$

d. has air conditioning, given that it has a pool

[CAUTION: This is NOT the same as the preceding problem; now we're asked what proportion of homes that have pools ALSO have air conditioning.]

The event in the numerator is the same; what has changed is the condition:

$P(\text{AC} \mid \text{pool}) = \frac{P(\text{pool and AC})}{P(\text{pool})} = \frac{0.23}{0.27} = 0.852$

This probability is much greater, since more homes have air conditioning than pools.
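The addition rule, the complement rule and the two conditional probabilities in this problem take only a few lines to check. The variable names below are chosen for illustration; the three input probabilities come straight from the problem statement.

```python
p_ac = 0.88    # P(AC)
p_pool = 0.27  # P(pool)
p_both = 0.23  # P(AC and pool)

# a. addition rule
p_ac_or_pool = p_ac + p_pool - p_both

# b. complement of "AC or pool"
p_neither = 1 - p_ac_or_pool

# c. and d. same numerator, different condition
p_pool_given_ac = p_both / p_ac
p_ac_given_pool = p_both / p_pool

print(round(p_ac_or_pool, 2), round(p_neither, 2))
print(round(p_pool_given_ac, 3), round(p_ac_given_pool, 3))
```

Swapping the condition changes only the denominator, which is why the two conditional probabilities differ so much.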

3. The TIC Corporation manufactures ceiling fans; each fan contains an electric motor, which TIC buys from one of three suppliers: 50% of their motors from supplier A, 40% from supplier B, and 10% from supplier C. Of course, some of the motors they buy are defective; the defective rate is 6% for supplier A, 5% for supplier B, and 30% for supplier C. One of these motors is chosen at random; find the probability that:

We have here a bunch of statements of probability, and it's useful to list them explicitly. Let events A, B, and C denote the supplier for a fan motor, and D denote that the motor is defective; then P(A) = 0.5, P(B) = 0.4, and P(C) = 0.1.

The information about defective rates provides conditional probabilities:

$P(D \mid A) = 0.06$, $P(D \mid B) = 0.05$, and $P(D \mid C) = 0.3$

We can also note the complementary probabilities of a motor not being defective: $P(D^C \mid A) = 0.94$, $P(D^C \mid B) = 0.95$, and $P(D^C \mid C) = 0.7$

a. the motor is defective

To find the overall defective rate, we use the total probability rule, as a defective motor still had to come from supplier A, B, or C:

$P(D) = P(A \text{ and } D) + P(B \text{ and } D) + P(C \text{ and } D) = P(A)P(D \mid A) + P(B)P(D \mid B) + P(C)P(D \mid C) = (0.5)(0.06) + (0.4)(0.05) + (0.1)(0.3) = 0.03 + 0.02 + 0.03 = 0.08$

If 8% overall are defective, then 92% are not; that is, we can also conclude that $P(D^C) = 1 - P(D) = 1 - 0.08 = 0.92$

b. the motor came from supplier C, given that it is defective

This is like asking: "What proportion of the defectives come from supplier C?" Denote this probability as $P(C \mid D)$. We began with $P(D \mid C)$ (among other probabilities); we are effectively using Bayes' Theorem to reverse the order. However, we already have P(D), so:

$P(C \mid D) = \frac{P(C \text{ and } D)}{P(D)} = \frac{0.03}{0.08} = 0.375$
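The total probability rule and the Bayes' Theorem reversal in this problem can be sketched as follows. The dictionary names `prior`, `defect_rate` and `posterior` are chosen for illustration; the numbers are the supplier shares and defective rates given in the problem.

```python
prior = {"A": 0.5, "B": 0.4, "C": 0.1}            # P(supplier)
defect_rate = {"A": 0.06, "B": 0.05, "C": 0.30}   # P(D | supplier)

# Total probability rule: P(D) = sum of P(supplier) * P(D | supplier)
p_defective = sum(prior[s] * defect_rate[s] for s in prior)

# Bayes' Theorem: P(supplier | D) = P(supplier and D) / P(D)
posterior = {s: prior[s] * defect_rate[s] / p_defective for s in prior}

print(round(p_defective, 2))
for supplier, prob in posterior.items():
    print(supplier, round(prob, 3))
```

Although supplier C provides only 10% of the motors, it accounts for the same share of defectives as supplier A, because its defective rate is five times as high.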


(or some other letter): symmetric, unbounded, bell-shaped; arises commonly in nature and in statistics as a result of the central limit theorem
  Many other distributions approach the normal as n (or some other parameter, such as $\lambda$ or df) increases.

standard normal; symbol Z
  $\mu$ = mean = 0; $\sigma$ = standard deviation = 1
  a special variant of normal, with $\mu = 0$ and $\sigma = 1$; represented in Z tables
  Used for inference about proportions; the cumulative probability is provided in Z tables. For a particular value z, the cumulative probability is $\Phi(z) = P(Z < z)$, i.e., the area under the density curve to the left of z.

Student's t; symbol t
  df = degrees of freedom
  similar in shape to normal; $\mu = 0$ (always)
  Used for inference about means.

chi-square; symbol $\chi^2$
  df = degrees of freedom
  not symmetric (skewed right)
  Used for inferences about categorical distributions.

Sample Problems

1. For a standard normal random variable Z, find P(Z < 1.5).

Since, by definition, the values from the standard normal table are $\Phi(z) = P(Z < z)$: $P(Z < 1.5) = \Phi(1.5) = 0.9332$

2. For a t distribution with df = 20, which critical value of t has an area of 0.05 in the right tail?

A t table generally provides the tail area, rather than the cumulative probability as given in standard normal tables; with the row = df = 20 and the column = tail area = 0.05, a t table produces the value of 1.725.

3. The heights of military recruits follow a normal distribution with a mean of 70 inches and a standard deviation of 4 inches; find the probability that a randomly chosen recruit is:

a. shorter than 60 inches

First, we must transform values of the variable (height) to the standard normal distribution by taking z scores; here:

$z = \frac{x - \mu}{\sigma} = \frac{60 - 70}{4} = \frac{-10}{4} = -2.5$

Since we want the "less than" probability, the solution comes directly from the standard normal z table:

$P(X < 60) = P(Z < -2.5) = \Phi(-2.5) = 0.0062$

b. taller than 72 inches

First, the z score: $z = \frac{x - \mu}{\sigma} = \frac{72 - 70}{4} = \frac{2}{4} = 0.5$

Since this is a "greater than" probability, subtract the cumulative probability from 1:

$P(X > 72) = P(Z > 0.5) = 1 - \Phi(0.5) = 1 - 0.6915 = 0.3085$

c. between 64 and 76 inches tall

In this case, there are two boundaries. The only way to find the area under the curve between them is to find the cumulative probabilities for each, and then to subtract; this entails finding z scores for both X = 64 and X = 76:

$z = \frac{x - \mu}{\sigma} = \frac{64 - 70}{4} = \frac{-6}{4} = -1.5$ and $z = \frac{x - \mu}{\sigma} = \frac{76 - 70}{4} = \frac{6}{4} = 1.5$

Now, $P(64 < X < 76) = P(-1.5 < Z < 1.5) = \Phi(1.5) - \Phi(-1.5) = 0.9332 - 0.0668 = 0.8664$

[Figure: normal density curve illustrating $\Phi(z) = P(Z < z)$, with 0 and z marked on the horizontal axis]
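The table lookups in the normal-distribution problems above can be reproduced with `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later). This is only a sketch; the variable names are chosen for illustration, and the t critical value in problem 2 is not covered because the standard library has no t distribution.

```python
from statistics import NormalDist

# Problem 1: standard normal cumulative probability
standard_normal = NormalDist()          # mu = 0, sigma = 1
p_z_below_1_5 = standard_normal.cdf(1.5)

# Problem 3: heights with mean 70 inches and standard deviation 4 inches
heights = NormalDist(mu=70, sigma=4)
p_shorter_60 = heights.cdf(60)                     # "less than": cdf directly
p_taller_72 = 1 - heights.cdf(72)                  # "greater than": 1 - cdf
p_between = heights.cdf(76) - heights.cdf(64)      # difference of two cdfs

print(round(p_z_below_1_5, 4))
print(round(p_shorter_60, 4), round(p_taller_72, 4), round(p_between, 4))
```

`NormalDist(mu, sigma).cdf(x)` performs the z-score transformation internally, so it agrees with looking up $(x - \mu)/\sigma$ in a Z table.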

Because sample statistics are derived from random samples, they are random. The probability distribution of a statistic is called its sampling distribution.

Due to the central limit theorem, some important statistics have sampling distributions that approach a normal distribution as the sample size increases (these are listed in the table at right).

(table) condition for the normal approximation: if $n \ge 30$ or if the population distribution is normal; statistic listed: sample proportion $\hat{p}$

Knowing the expected value and standard error allows us to find probabilities; then, in turn, we can use the properties of these sampling distributions to make inferences about the parameter values when we do not know them, as in real-world applications.

Sample Problems

1. 60% of the registered voters in a large district plan to vote in favor of a referendum; a random sample of 340 of these voters is selected.

a. What is the expected value of the sample proportion?

$E(\hat{p}) = p = 0.6$

b. What is the standard error of the sample proportion?

$SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.6(1 - 0.6)}{340}} = 0.0266$

c. What is the probability that the sample proportion is between 55% and 65%?

First, find the z scores for those proportions:

$z = \frac{\hat{p} - p}{SE(\hat{p})} = \frac{0.55 - 0.6}{0.0266} = \frac{-0.05}{0.0266} = -1.88$ and $z = \frac{\hat{p} - p}{SE(\hat{p})} = \frac{0.65 - 0.6}{0.0266} = \frac{0.05}{0.0266} = 1.88$

Now, $P(0.55 < \hat{p} < 0.65) = P(-1.88 < Z < 1.88) = \Phi(1.88) - \Phi(-1.88) = 0.9699 - 0.0301 = 0.9398$
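The sampling distribution of the sample proportion in this problem can be sketched with the standard library. The variable names are illustrative, and the sketch assumes the normal approximation applies; it does not round the z scores, so its probability differs slightly from a table-based answer.

```python
from math import sqrt
from statistics import NormalDist

p = 0.6   # population proportion
n = 340   # sample size

expected_value = p
standard_error = sqrt(p * (1 - p) / n)

# Approximate sampling distribution of the sample proportion
sampling_dist = NormalDist(mu=expected_value, sigma=standard_error)
prob_between = sampling_dist.cdf(0.65) - sampling_dist.cdf(0.55)

print(round(standard_error, 4))
print(round(prob_between, 3))
```

A Z table with z rounded to 1.88 gives 0.9398; the unrounded computation gives a value within about 0.001 of that.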

2. The standard deviation of the weight of cattle in a certain herd is 160 pounds, but the mean is unknown; a random sample of size 100 is chosen.

a. Compute the standard error of the sample mean.

$SE(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{160}{\sqrt{100}} = 16$ lbs.

b. For an individual animal in this herd, what is the probability of a weight within 40 lbs. of the population mean?

Since this problem refers to a single observation, not the sample mean, we use the standard deviation, not the standard error.

Not knowing the value of $\mu$, we can only express the boundaries for "within 40 lbs. of the mean" as $X = \mu + 40$ and $X = \mu - 40$.

We can still compute z scores:

$z = \frac{x - \mu}{\sigma} = \frac{\mu + 40 - \mu}{160} = \frac{40}{160} = 0.25$ and $z = \frac{x - \mu}{\sigma} = \frac{\mu - 40 - \mu}{160} = \frac{-40}{160} = -0.25$

That is, "within 40 lbs. of the mean" is the same as "within 0.25 standard deviation."

We find the probability: $P(-0.25 < Z < 0.25) = \Phi(0.25) - \Phi(-0.25) = 0.5987 - 0.4013 = 0.1974$

c. What is the probability that the sample mean falls within 40 lbs. of the population mean?

Even though we don't know the population mean, the z score formula will allow us to find this probability.

Since this is the sample mean, we must use the standard error of 16 lbs., rather than the standard deviation, in computing the z scores:

$z = \frac{\bar{x} - \mu}{SE(\bar{x})} = \frac{\mu + 40 - \mu}{16} = \frac{40}{16} = 2.5$ and $z = \frac{\bar{x} - \mu}{SE(\bar{x})} = \frac{\mu - 40 - \mu}{16} = \frac{-40}{16} = -2.5$

Now, $P(-2.5 < Z < 2.5) = \Phi(2.5) - \Phi(-2.5) = 0.9938 - 0.0062 = 0.9876$

This probability is dramatically higher than the probability for an individual head of cattle.
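The contrast between a single observation and a sample mean in the cattle problem can be sketched as below. The names are illustrative; the only difference between the two calculations is whether the 40 lb. margin is divided by the standard deviation or by the standard error.

```python
from math import sqrt
from statistics import NormalDist

sigma = 160   # population standard deviation, lbs.
n = 100       # sample size
margin = 40   # "within 40 lbs. of the population mean"

standard_error = sigma / sqrt(n)
z_dist = NormalDist()  # standard normal


def prob_within(z):
    """P(-z < Z < z) for a standard normal Z."""
    return z_dist.cdf(z) - z_dist.cdf(-z)


p_individual = prob_within(margin / sigma)            # z = 0.25
p_sample_mean = prob_within(margin / standard_error)  # z = 2.5

print(standard_error)
print(round(p_individual, 4), round(p_sample_mean, 4))
```

The unknown population mean cancels out of both z scores, which is why it never appears in the code.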


When we want to draw conclusions about a population using data from a sample, we use some method of statistical inference.

A hypothesis test is a procedure by which claims about populations (hypotheses) are evaluated on the basis of sample statistics.

The procedure begins with a null hypothesis ($H_0$) and an alternative (or research) hypothesis ($H_a$); if the sample data are too unusual, assuming $H_0$ to be true, then $H_0$ is rejected in favor of $H_a$; otherwise, we fail to reject the null hypothesis and thereby fail to support the alternative.

the null hypothesis ($H_0$):
  is assumed true for the purpose of carrying out the hypothesis test
  ALWAYS provides a specific value for the parameter, its null value; always contains =
  the null value implies a specific sampling distribution for the test statistic
  can be rejected, or not rejected, but NEVER supported

the alternative hypothesis ($H_1$ or $H_a$):
  is supported only by carrying out the test, if the null hypothesis can be rejected
  NEVER provides a specific value for the parameter; instead contains > (right-tailed), < (left-tailed), or $\neq$ (two-tailed)
  without any specific value for the parameter of interest, the sampling distribution is unknown
  can be supported (by rejecting the null), or not supported (by failing to reject the null), but NEVER rejected

The tail(s) of the hypothesis test are determined by the alternative hypothesis ($H_a$); this is one of the most important attributes of the test, regardless of which method is used.

Sample Problems

In each of the following cases, formulate hypotheses to test the claim; indicate which hypothesis represents the claim.

1. The manager of a bank claims that the average waiting time for customers is less than two minutes.

Since the claim refers to the average, this is a test for $\mu$.

As a "less than" claim, it is represented by $H_a$, and the hypothesis test is $H_0\!: \mu = 2$ vs. $H_a\!: \mu < 2$

There are two major methods for carrying out a hypothesis test: the traditional approach (or fixed significance) and the p-value approach (observed significance); the following table lists the steps for each approach.

p-value approach:
  1. formulate null and alternative hypotheses
  2. observe sample data
  3. compute a test statistic from sample data
  4. compute the p-value from the test statistic
  5. reject the null hypothesis (supporting the alternative) at a significance level $\alpha$ if the p-value $< \alpha$; otherwise, fail to reject the null hypothesis

traditional approach:
  1. formulate a null and an alternative hypothesis
  2. determine rejection region(s) based on the level of significance and the tail(s) of the test
  3. observe sample data
  4. compute the test statistic from sample data
  5. reject the null hypothesis (supporting the alternative) at the significance level $\alpha$ if the test statistic falls in the rejection region; otherwise, fail to reject the null hypothesis

With the p-value approach, the final decision is made by comparing probabilities, whereas with the traditional approach, the decision is made by comparing values of random variables. Because there is a one-to-one correspondence between the values of the random variables and the probabilities, the two methods will always yield consistent results; we can convert between the two using the following simple (but important) rule:

reject the null hypothesis ($H_0$) at significance level $\alpha$ $\iff$ p-value $< \alpha$

Test Statistics

population proportion
  Test statistic: $z = \frac{\hat{p} - p_0}{SE(\hat{p})}$
  Distribution under $H_0$: standard normal Z
  Assumptions: $np \ge 15$ and $n(1 - p) \ge 15$

population mean
  Test statistic: $t = \frac{\bar{x} - \mu}{SE(\bar{x})}$
  Distribution under $H_0$: t distribution with df = n - 1
  Assumptions: $n \ge 30$ or the population distribution is normal
  Since the t distribution approaches the standard normal Z, many teachers and texts advise that it's OK to use Z if n is sufficiently large.

difference of proportions (independent samples)
  Assumptions: $np \ge 15$ and $n(1 - p) \ge 15$

test for independence (categorical data)
  Distribution under $H_0$: $\chi^2$ distribution with df = (r - 1)(c - 1); r = # of rows, c = # of columns

multinomial goodness-of-fit (categorical data)
  Distribution under $H_0$: $\chi^2$ distribution with df = k - 1; k = # of categories

$\chi^2$ tests for categorical data assume that the expected counts (E) in each cell are at least 5 under the null hypothesis.

$\chi^2$ tests for categorical data do not have directional alternative hypotheses; rejection regions are always in the right tail.
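As an illustration of the first row of the Test Statistics table, the sketch below runs a two-tailed z test for a population proportion using the p-value approach. The data are hypothetical (60 heads in 100 tosses of a coin, testing $H_0\!: p = 0.5$ vs. $H_a\!: p \neq 0.5$), the function name is invented for this example, and the standard error is computed from the null value $p_0$, which is a common convention assumed here.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-tailed z test for a proportion; returns (z, p_value).

    Assumes the standard error is evaluated at the null value p0 and
    that n*p0 and n*(1 - p0) are both at least 15.
    """
    if n * p0 < 15 or n * (1 - p0) < 15:
        raise ValueError("sample too small for the normal approximation")
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # double the outer tail
    return z, p_value


# Hypothetical data: 60 heads in 100 tosses, alpha = 0.05
alpha = 0.05
z, p_value = one_proportion_z_test(successes=60, n=100, p0=0.5)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(z, 2), round(p_value, 4), decision)
```

With these made-up numbers the test statistic is about 2 and the p-value falls below 0.05, so the null hypothesis would be rejected at that level; a right- or left-tailed test would use a single tail area instead of doubling it.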


Sample Problems (continued)

1. (continued) The bank manager's test is left-tailed.

2. Your friend says that a coin you are tossing is not fair.

A fair coin is one that shows heads 50% of the time; the friend states that the coin is NOT fair.

This is an $H_a$ claim: $H_0\!: p = 0.5$ vs. $H_a\!: p \neq 0.5$ (two-tailed)

3. A highway patrolman claims that the average speed of cars on a highway is at most 70 mph.

The claim directly refers to the average; since this is an "at most" claim, it is represented by $H_0$.

The hypothesis test is $H_0\!: \mu = 70$ vs. $H_a\!: \mu > 70$ (right-tailed)

4. A motorist claims that more than 80% of the cars on a highway travel at a speed exceeding 70 mph.

Since the claim is really about a proportion (don't be fooled by the 70 mph), the hypotheses refer to p. As the motorist makes a "more than" claim, it is the alternative hypothesis $H_a$: $H_0\!: p = 0.8$ vs. $H_a\!: p > 0.8$ (right-tailed)

5. The manager of a snack-food factory states that the average weight of a bag of their potato chips is exactly 5 oz. (no more, no less).

This is an "is exactly" claim that refers to the average; thus, the claim is $H_0$.

The test is $H_0\!: \mu = 5$ vs. $H_a\!: \mu \neq 5$ (two-tailed)

If the claim says "is not equal to," then the claim is the alternative hypothesis ($H_a$), and the hypothesis test is two-tailed ($\neq$).

If the claim says "is less than," then the claim is the alternative hypothesis ($H_a$), and the hypothesis test is left-tailed (<).

If the claim says "is greater than," then the claim is the alternative hypothesis ($H_a$), and the hypothesis test is right-tailed (>).

If the claim says "is equal to" or "is exactly," then the claim is the null hypothesis ($H_0$), and the hypothesis test is two-tailed.

If the claim says "is at least," then the claim is the null hypothesis ($H_0$), and the hypothesis test is left-tailed (<).

If the claim says "is at most," then the claim is the null hypothesis ($H_0$), and the hypothesis test is right-tailed (>).


symmetric unbounded bellshy(or some shaped arises commonly in other nature and in statistics as a reshyletter) sult of the central limit theorem

n Many other distributions approach the normal as n ~ (or some other parameter such as A or df) increases

standard Z I II = mean = 0 a special variant of normal normal a = standard with II = 0 and (J = 1

deviation = 1 represented in Z tables

jJ Used for inference about proportions the cumulative probability is provided in Z tables For a particular value z the cumulative probability is ltl(z) = P(Z lt z) ie the area under the density curve to the left of z

students t t df = degrees similar in shape to normal of freedom II = 0 (always)

jJ Used for inference about means

chi-square X2 df = degrees not symmetric (skewed right) of freedom

jJ Used for inferences about categorical distributions

Sample Problems amp

1 For a standard normal random variable Z find p(Z lt 15) n Since by definition the values from the standard normal table are ~ ltIgt (z) - p(Z lt z) P(Z lt 15) = ltIgt(15) = 09332

2 For a t distribution with df = 20 which critical value of t has an area of 0 05 in the right tail

n At tabJe generally provides the tail area rather than the cumulative ~probability as g iven in standard normal tables with the row = df

20 and the column =tail area = 005 a t table produces the value of 1725

3 The heights of military recruits follow a normal distribution with a mean of 70 inches and a standard deviation of 4 inches find the probashybility that a randomly chosen recruit is

a shorter than 60 inches jJ First we must transform values of the variable (height) to the

standard normal distribution by taking z scores here

z= x - p = 60 - 70 = -10 = -25 a 4 4

6 Since we want the less than probability the solution comes directly from the standard normal z table

P(X lt 60) = p(Z lt -25) = lt1gt(-25) = 00062

b taller than 72 inches n x-J 72-70 2 ~ First the z score Z = a = --4- =4 =05

11- Since this is a greater than probability subtract the cumulative ~ probability from 1

P(X gt 72) = P(Z gt 05) = 1 - lt1gt(05) = 1 - 06915 = 03085

c between 64 and 76 inches tall

fJ In this case there are two boundaries The only way to find the area under the curve between them is to find the cumulative probabilishyties for each and then to subtract this entails finding z scores for both X = 64 md X = 76

z= x- p = 64-70 = -6 =-15 andz = x- J = 76-70 =sect = 15 a 44 - a 44

Now p(64 lt x lt 76) = P (-1 5 lt Z lt 15) = lt1gt(15) - lt1gt(-15) = 09332 - 00668 = 08664

ltP(z) = P(ZltZ)

o z

Because sample statistics are derived from random samples they are random The probability distribution of a statistic is called its samshypling distribution

Due to the central limit theoshy jJ if n 2 30 or if the populationrem some important statistics distribution is normalhave sampling distributions

that approach a normal sample p distribution as the sample size proportion increases (these are listed in the table at right)

Knowing the expected value and standard error allows us to find probabilities then in tum we can use the properties of these samshypling distributions to make inferences about the parameter values when we do not know them as in real-world applications

Sample Problems amp

1 60 of the registered voters in a large district plan to vote in favor of a referendum a random sample of 340 of these voters is selected

a What is the expected value of the sample proportion

E(p)= p=O6 b What is the standard error of the sample proportion

SE(P)=~P(IP) =~06~~0 6) =00266

c What is the probability that the sample proportion is between 55 and 65

jJ First find the z scores for those proportions p(p) 055-06 -005

z= SE(p) = 00266 00266 =-188 and

p(p) 065-06 005 z= SE(p) = 00266 00266 = 188

Now P (055) P (065) = P - (188) Z (1 88) = ltIgt(188) - ltIgt(-1 88) = 0 9699 - 00301 09398

2. The standard deviation of the weight of cattle in a certain herd is 160 pounds, but the mean is unknown; a random sample of size 100 is chosen.

a. Compute the standard error of the sample mean.

$SE(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{160}{\sqrt{100}} = 16$ lbs

b. For an individual animal in this herd, what is the probability of a weight within 40 lbs of the population mean?

Since this problem refers to a single observation, not the sample mean, we use the standard deviation, not the standard error.

Not knowing the value of $\mu$, we can only express the boundaries for "within 40 lbs of the mean" as $x = \mu + 40$ and $x = \mu - 40$.

We can still compute z scores:

$z = \frac{x - \mu}{\sigma} = \frac{\mu + 40 - \mu}{160} = \frac{40}{160} = 0.25$ and $z = \frac{x - \mu}{\sigma} = \frac{\mu - 40 - \mu}{160} = \frac{-40}{160} = -0.25$

That is, "within 40 lbs of the mean" is the same as "within 0.25 standard deviation."

We find the probability: $P(-0.25 < Z < 0.25) = \Phi(0.25) - \Phi(-0.25) = 0.5987 - 0.4013 = 0.1974$

c. What is the probability that the sample mean falls within 40 lbs of the population mean?

Even though we don't know the population mean, the z score formula will allow us to find this probability.

Since this is the sample mean, we must use the standard error of 16 lbs, rather than the standard deviation, in computing the z scores:

$z = \frac{\bar{x} - \mu}{SE(\bar{x})} = \frac{\mu + 40 - \mu}{16} = \frac{40}{16} = 2.5$ and $z = \frac{\bar{x} - \mu}{SE(\bar{x})} = \frac{\mu - 40 - \mu}{16} = \frac{-40}{16} = -2.5$

Now $P(-2.5 < Z < 2.5) = \Phi(2.5) - \Phi(-2.5) = 0.9938 - 0.0062 = 0.9876$

This probability is dramatically higher than the probability for an individual head of cattle.
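The contrast between parts b and c comes down to which spread goes in the denominator of the z score. A minimal sketch, assuming Python 3.8+; the helper `prob_within` is a hypothetical name, not something from the guide.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, margin = 160, 100, 40    # values from the problem; the mean is unknown
Z = NormalDist()                   # standard normal

def prob_within(margin, spread):
    """P(|value - mu| < margin) for a normal value with the given spread."""
    z = margin / spread            # mu cancels out of the z score
    return Z.cdf(z) - Z.cdf(-z)

se = sigma / sqrt(n)                         # standard error of the sample mean: 16 lbs
p_individual = prob_within(margin, sigma)    # single animal: use the standard deviation
p_sample_mean = prob_within(margin, se)      # sample mean: use the standard error

print(f"individual: {p_individual:.4f}, sample mean: {p_sample_mean:.4f}")
```

The unknown mean never appears in the code, which mirrors the algebra above: only the distance from the mean and the relevant spread matter.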


STATISTICAL INFERENCE

When we want to draw conclusions about a population using data from a sample, we use some method of statistical inference.

A hypothesis test is a procedure by which claims about populations (hypotheses) are evaluated on the basis of sample statistics.

The procedure begins with a null hypothesis ($H_0$) and an alternative (or research) hypothesis ($H_a$); if the sample data are too unusual, assuming $H_0$ to be true, then $H_0$ is rejected in favor of $H_a$; otherwise, we fail to reject the null hypothesis and thereby fail to support the alternative.

| the null hypothesis ($H_0$) | the alternative hypothesis ($H_a$ or $H_1$) |
|---|---|
| is assumed true for the purpose of carrying out the hypothesis test | is supported only by carrying out the test, if the null hypothesis can be rejected |
| ALWAYS provides a specific value for the parameter, its null value; always contains = | NEVER provides a specific value for the parameter; instead contains > (right-tailed), < (left-tailed), or ≠ (two-tailed) |
| the null value implies a specific sampling distribution for the test statistic | without any specific value for the parameter of interest, the sampling distribution is unknown |
| can be rejected, or not rejected, but NEVER supported | can be supported (by rejecting the null), or not supported (by failing to reject the null), but NEVER rejected |

Sample Problems

In each of the following cases, formulate hypotheses to test the claim; indicate which hypothesis represents the claim.

1. The manager of a bank claims that the average waiting time for customers is less than two minutes.

Since the claim refers to the average, this is a test for $\mu$. As a "less than" claim, it is represented by $H_a$, and the hypothesis test is $H_0: \mu = 2$ vs. $H_a: \mu < 2$ (left-tailed).

The tail(s) of the hypothesis test are determined by the alternative hypothesis ($H_a$); this is one of the most important attributes of the test, regardless of which method is used.

There are two major methods for carrying out a hypothesis test: the traditional approach (or fixed significance) and the p-value approach (observed significance); the following table lists the steps for each approach.

| step | p-value approach | traditional approach |
|---|---|---|
| 1 | formulate null and alternative hypotheses | formulate a null and an alternative hypothesis |
| 2 | observe sample data | determine rejection region(s) based on the level of significance and the tail(s) of the test |
| 3 | compute a test statistic from sample data | observe sample data |
| 4 | compute the p-value from the test statistic | compute the test statistic from sample data |
| 5 | reject the null hypothesis (supporting the alternative) at a significance level α if the p-value ≤ α; otherwise, fail to reject the null hypothesis | reject the null hypothesis (supporting the alternative) at the significance level α if the test statistic falls in the rejection region; otherwise, fail to reject the null hypothesis |

With the p-value approach, the final decision is made by comparing probabilities, whereas with the traditional approach, the decision is made by comparing values of random variables. Because there is a one-to-one correspondence between the values of the random variables and the probabilities, the two methods will always yield consistent results. We can convert between the two using the following simple (but important) rule:

reject the null hypothesis ($H_0$) at significance level α ⟺ p-value ≤ α
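The agreement between the two approaches can be illustrated for a right-tailed z test. This is a sketch only: it assumes the test statistic is standard normal under the null hypothesis, and the z values in the loop are hypothetical, chosen to fall on both sides of the critical value.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution of the test statistic under H0

def traditional_decision(z, alpha):
    """Reject H0 if z falls in the right-tail rejection region."""
    critical_value = Z.inv_cdf(1 - alpha)   # area alpha lies to its right
    return z >= critical_value

def p_value_decision(z, alpha):
    """Reject H0 if the right-tail area beyond z is at most alpha."""
    p_value = 1 - Z.cdf(z)
    return p_value <= alpha

alpha = 0.05
for z in (0.50, 1.20, 1.64, 1.65, 2.33, 3.00):   # hypothetical test statistics
    same = traditional_decision(z, alpha) == p_value_decision(z, alpha)
    print(f"z = {z:4.2f}: reject = {p_value_decision(z, alpha)}, methods agree = {same}")
```

The first function compares values of the random variable and the second compares probabilities, yet they reach the same decision for every z tried here.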

Test Statistics

| Parameter | Test Statistic | Distribution Under $H_0$ | Assumptions |
|---|---|---|---|
| population proportion | $z = \frac{\hat{p} - p_0}{SE(\hat{p})}$ | standard normal Z | np ≥ 15 and n(1 - p) ≥ 15 |
| population mean | $t = \frac{\bar{x} - \mu_0}{SE(\bar{x})}$ | t distribution with df = n - 1 | n ≥ 30 or the population distribution is normal |
| difference of proportions (independent samples) | | | np ≥ 15 and n(1 - p) ≥ 15 |
| test for independence (categorical data) | | $\chi^2$ distribution with df = (r - 1)(c - 1); r = # of rows, c = # of columns | expected counts (E) in each cell are at least 5 under the null hypothesis |
| multinomial goodness-of-fit (categorical data) | | $\chi^2$ distribution with df = k - 1; k = # of categories | expected counts (E) in each cell are at least 5 under the null hypothesis |

Since the t distribution approaches the standard normal Z, many teachers and texts advise that it's OK to use Z if n is sufficiently large.

$\chi^2$ tests for categorical data do not have directional alternative hypotheses; rejection regions are always in the right tail.
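The first two rows of the table can be turned into small functions. This is a sketch under stated assumptions: the standard error of the proportion is computed from the null value $p_0$, the standard error of the mean uses the sample standard deviation s in place of σ, and all data below are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

def proportion_z(successes, n, p0):
    """z = (p_hat - p0) / SE(p_hat), with SE computed from the null value p0."""
    if n * p0 < 15 or n * (1 - p0) < 15:
        raise ValueError("sample too small for the normal approximation")
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

def mean_t(data, mu0):
    """t = (x_bar - mu0) / SE(x_bar) with SE = s / sqrt(n); returns (t, df)."""
    n = len(data)
    se = stdev(data) / sqrt(n)
    return (fmean(data) - mu0) / se, n - 1

# Hypothetical data, for illustration only.
z = proportion_z(successes=58, n=100, p0=0.5)       # 58 heads in 100 tosses
waits = [1.8, 2.4, 1.6, 2.1, 1.9, 1.5, 2.2, 1.7]    # waiting times in minutes
t, df = mean_t(waits, mu0=2.0)                      # n < 30, so normality is assumed

print(f"z = {z:.2f}; t = {t:.2f} with df = {df}")
```

The assumption check in `proportion_z` follows the table's rule of at least 15 expected successes and failures; `mean_t` leaves the normality judgment to the user because it cannot be verified from eight observations.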

Sample Problems (continued)

2. Your friend says that a coin you are tossing is not fair.

A fair coin is one that shows heads 50% of the time; the friend states that the coin is NOT fair. This is an $H_a$ claim: $H_0: p = 0.5$ vs. $H_a: p \ne 0.5$ (two-tailed).

3. A highway patrolman claims that the average speed of cars on a highway is at most 70 mph.

The claim directly refers to the average; since this is an "at most" claim, it is represented by $H_0$. The hypothesis test is $H_0: \mu = 70$ vs. $H_a: \mu > 70$ (right-tailed).

4. A motorist claims that more than 80% of the cars on a highway travel at a speed exceeding 70 mph.

Since the claim is really about a proportion (don't be fooled by the 70 mph), the hypotheses refer to p. As the motorist makes a "more than" claim, it is the alternative hypothesis $H_a$: $H_0: p = 0.8$ vs. $H_a: p > 0.8$ (right-tailed).

5. The manager of a snack-food factory states that the average weight of a bag of their potato chips is exactly 5 oz. (no more, no less).

This is an "is exactly" claim that refers to the average; thus, the claim is $H_0$. The test is $H_0: \mu = 5$ vs. $H_a: \mu \ne 5$ (two-tailed).

| If the claim says | it is represented by the | and the hypothesis test is |
|---|---|---|
| is not equal to | alternative hypothesis ($H_a$) | two-tailed (≠) |
| is less than | alternative hypothesis ($H_a$) | left-tailed (<) |
| is greater than | alternative hypothesis ($H_a$) | right-tailed (>) |
| is equal to; is exactly | null hypothesis ($H_0$) | two-tailed (≠) |
| is at least | null hypothesis ($H_0$) | left-tailed (<) |
| is at most | null hypothesis ($H_0$) | right-tailed (>) |
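The wording rules for claims can be written as a lookup. This is only an illustrative sketch: the dictionary keys are simplified phrasings, the function name `formulate` is hypothetical, and a real claim still has to be read in context before it is matched to a row.

```python
# Claim wording -> (hypothesis that represents the claim, tail of the test).
CLAIM_RULES = {
    "is not equal to": ("Ha", "two-tailed"),
    "is less than":    ("Ha", "left-tailed"),
    "is greater than": ("Ha", "right-tailed"),
    "is equal to":     ("H0", "two-tailed"),
    "is at least":     ("H0", "left-tailed"),
    "is at most":      ("H0", "right-tailed"),
}

def formulate(parameter, wording, null_value):
    """Return (H0, Ha, tail, hypothesis representing the claim) as strings."""
    claim_side, tail = CLAIM_RULES[wording]
    symbol = {"left-tailed": "<", "right-tailed": ">", "two-tailed": "!="}[tail]
    h0 = f"H0: {parameter} = {null_value}"            # the null always contains =
    ha = f"Ha: {parameter} {symbol} {null_value}"     # the alternative sets the tail
    return h0, ha, tail, claim_side

# The five sample problems from this section.
print(formulate("mu", "is less than", 2))        # bank waiting time
print(formulate("p", "is not equal to", 0.5))    # coin that is not fair
print(formulate("mu", "is at most", 70))         # highway speed
print(formulate("p", "is greater than", 0.8))    # share of speeding cars
print(formulate("mu", "is equal to", 5))         # potato chip bags
```

Note that the tail always comes from the alternative hypothesis, even when the claim itself is the null hypothesis, as in the "at most" and "at least" rows.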

STATISTICAL INFERENCE (continued)

| Decision | Reality: $H_0$ true | Reality: $H_0$ false |
|---|---|---|
| reject $H_0$ | type I error; $P(\text{reject } H_0 \mid H_0 \text{ true}) = \alpha$ = level of significance | correct inference (supporting $H_a$); $P(\text{reject } H_0 \mid H_0 \text{ false}) = 1 - \beta$ = power |
| fail to reject $H_0$ | correct inference (failing to support $H_a$); $P(\text{fail to reject } H_0 \mid H_0 \text{ true}) = 1 - \alpha$ = level of confidence | type II error; $P(\text{fail to reject } H_0 \mid H_0 \text{ false}) = \beta$ |
| notes | Under the null hypothesis, we have a specific value for the parameter. This determines a specific sampling distribution, so that $\alpha$ and $1 - \alpha$ can be precisely determined. | If the null hypothesis is false, there is no specific value for the parameter. Thus, we can only estimate $\beta$ and $1 - \beta$ by making some alternative assumption about the parameter. |

When the null hypothesis ($H_0$) is rejected, we can support the alternative hypothesis ($H_a$). This is a substantive finding: we have sufficient evidence that $H_0$ is not correct.

If $H_0$ is not rejected, then we cannot support $H_a$ either; this is NOT a substantive finding. We have failed to find evidence against $H_0$, but have not confirmed or proved it to be true.

It is important to note that these probabilities are conditioned on reality rather than the decision. That is, given that $H_0$ is true, $\alpha$ is the probability of rejecting $H_0$; it is NOT the probability that $H_0$ is true given that it has been rejected.

Sample Problems

1. In some hypothesis tests, the null hypothesis is rejected; if an error has been made, which kind of error is it?

The only error of inference in which the null hypothesis is rejected is a type I error.

2. A researcher conducts a hypothesis test at a significance level of 0.05, and computer software produces a p-value of 0.0912; unknown to the researcher, the null hypothesis is really false. What is her decision? Is it some type of error?

First, consider her decision: she will reject or fail to reject the null hypothesis; we have no test statistic, only a p-value. Since the p-value (0.0912) is greater than the significance level $\alpha$ (0.05), $H_0$ is not rejected; but also, since $H_0$ is false, this is a type II error.
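The decision-versus-reality grid can be expressed as two small functions. This is a sketch with hypothetical function names; in practice the truth of the null hypothesis is unknown, so `classify` is useful only for reasoning about scenarios such as sample problem 2.

```python
def decide(p_value, alpha):
    """p-value approach: reject H0 exactly when p-value <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

def classify(decision, h0_is_true):
    """Compare a decision with reality (which is normally unknown)."""
    if decision == "reject H0":
        return "type I error" if h0_is_true else "correct inference"
    return "correct inference" if h0_is_true else "type II error"

decision = decide(p_value=0.0912, alpha=0.05)     # values from sample problem 2
outcome = classify(decision, h0_is_true=False)    # the null is really false

print(decision, "->", outcome)   # fail to reject H0 -> type II error
```

Keeping the decision and the classification separate reflects the point made above: the decision depends only on the data, while the error type depends on a reality the researcher cannot observe.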

Finding Rejection Regions & P-Values

| Tail(s) of Hypothesis Test | Rejection Region | P-Value |
|---|---|---|
| < left-tailed | values of the test statistic less than some critical value with area α in the left tail | area under the density curve to the left of the test statistic |
| > right-tailed | values of the test statistic greater than some critical value with area α in the right tail | area under the density curve to the right of the test statistic |
| ≠ two-tailed | values of the test statistic less than some critical value with area α/2 in the left tail, or greater than some critical value with area α/2 in the right tail | double the tail area under the curve away from the test statistic |

[Figure: Percentage Cumulative Distribution for selected z values under a normal curve; horizontal axis: z-value from -3 to +3]
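For a z test, each row of the table translates into a line or two of code. This is a sketch, assuming Python 3.8+, a standard normal test statistic, and a two-tailed region that splits α equally between the tails; the function names and the tail labels `"left"`, `"right"`, and `"two"` are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # assumes the test statistic is standard normal under H0

def rejection_region(alpha, tail):
    """Critical value(s) that bound the rejection region for a z test."""
    if tail == "left":
        return (Z.inv_cdf(alpha),)         # reject if z is below this value
    if tail == "right":
        return (Z.inv_cdf(1 - alpha),)     # reject if z is above this value
    if tail == "two":
        c = Z.inv_cdf(1 - alpha / 2)       # area alpha/2 in each tail
        return (-c, c)                     # reject if z < -c or z > c
    raise ValueError("tail must be 'left', 'right', or 'two'")

def p_value(z, tail):
    """Tail area associated with an observed z statistic."""
    if tail == "left":
        return Z.cdf(z)                    # area to the left of z
    if tail == "right":
        return 1 - Z.cdf(z)                # area to the right of z
    if tail == "two":
        return 2 * (1 - Z.cdf(abs(z)))     # double the area away from z
    raise ValueError("tail must be 'left', 'right', or 'two'")

z = -1.88   # hypothetical observed test statistic
for tail in ("left", "right", "two"):
    region = ", ".join(f"{c:.3f}" for c in rejection_region(0.05, tail))
    print(f"{tail:>5}: critical value(s) {region}; p-value {p_value(z, tail):.4f}")
```

The same observed statistic produces very different p-values depending on the tail, which is why the alternative hypothesis has to be fixed before the data are examined.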

Sample Problems

1. At an aquaculture facility, a large number of eels are kept in a tank; they die independently of each other at an average rate of 2.5 eels per day.

a. Which distribution is appropriate?

Since the events are independent and we're given an average rate per fixed interval, a Poisson distribution can be used with parameter $\lambda = 2.5$.

b. Find the probability that exactly two eels die in a given day.

Find P(X) for X = 2: $P(2) = \frac{e^{-2.5}\,2.5^2}{2!} = 0.2565$

c. What is the probability that at least one eel dies in the span of one day?

Since the Poisson distribution has no maximum, there is no alternative but to use the law of complements:

$P(\text{at least one dies}) = 1 - P(\text{none at all die}) = 1 - P(0) = 1 - \frac{e^{-2.5}\,2.5^0}{0!} = 1 - e^{-2.5} = 1 - 0.0821 = 0.9179$

d. Compute the probability that at least one eel dies in the span of 12 hours.

This is harder, since the duration of the interval has changed, but we can scale the Poisson parameter $\lambda$ proportionally. If the average rate is 2.5 eels per day, then the rate is 1.25 (half as many) per half-day; thus:

$1 - P(0) = 1 - \frac{e^{-1.25}\,1.25^0}{0!} = 1 - e^{-1.25} = 1 - 0.2865 = 0.7135$

2. A cat is hunting some mice; every time she pounces at a mouse, she has a 20% chance of catching the mouse, but will stop hunting as soon as she catches one.

a. Which distribution is appropriate?

As there is a fixed probability of the event, but the experiment will be repeated until the event occurs, a geometric distribution can be used with parameter p = 0.2.

b. What is the probability that she'll catch a mouse on her first attempt?

With a 20% chance of success each time, the probability of succeeding the first time is simply 0.2. We can also use the geometric pdf with x = 1: $P(1) = (1 - 0.2)^{0}(0.2) = 0.2$

c. What is the probability that she'll catch a mouse on her third attempt?

The first success occurring on the third trial means x = 3: $P(3) = (1 - 0.2)^{3-1}(0.2) = (0.8)^2(0.2) = 0.128$

d. How many times is she expected to pounce until she succeeds?

$E(X) = \frac{1}{p} = \frac{1}{0.2} = 5$

3. John is playing darts; each time he throws a dart, he has an 8% chance of hitting a bull's-eye, independently of the result for any other dart thrown; he throws a total of five darts.

a. Which distribution is appropriate?

With a constant probability of success and a fixed number of independent events, the total number of successes follows a binomial distribution with parameters n = 5, p = 0.08.

b. How many bull's-eyes is John expected to hit?

$E(X) = np = 5(0.08) = 0.4$

c. What is the probability that he hits exactly two bull's-eyes?

x = 2: $P(2) = {}_5C_2\,(0.08)^2(1 - 0.08)^{5-2} = (10)(0.0064)(0.92)^3 = 0.0498$

d. What is the probability that he hits at least one bull's-eye?

As always, $P(\text{at least one}) = 1 - P(\text{none at all}) = 1 - P(0) = 1 - {}_5C_0\,(0.08)^0(1 - 0.08)^{5-0} = 1 - 0.92^5 = 1 - 0.6591 = 0.3409$
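The three probability functions used in these problems are short enough to write out directly. This is a minimal sketch using only Python's `math` module (Python 3.8+ for `math.comb`); the function and variable names are illustrative.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson count with average rate lam per interval."""
    return exp(-lam) * lam ** x / factorial(x)

def geometric_pmf(x, p):
    """P(first success occurs on trial x)."""
    return (1 - p) ** (x - 1) * p

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# 1. Eels: Poisson with 2.5 deaths per day (1.25 per half-day).
eels_two = poisson_pmf(2, 2.5)
eels_day = 1 - poisson_pmf(0, 2.5)       # at least one in a day
eels_half = 1 - poisson_pmf(0, 1.25)     # at least one in 12 hours

# 2. Cat: geometric with p = 0.2.
cat_third = geometric_pmf(3, 0.2)
cat_expected = 1 / 0.2                   # expected number of pounces

# 3. Darts: binomial with n = 5, p = 0.08.
darts_two = binomial_pmf(2, 5, 0.08)
darts_any = 1 - binomial_pmf(0, 5, 0.08)

print(f"eels:  {eels_two:.4f}, {eels_day:.4f}, {eels_half:.4f}")
print(f"cat:   {cat_third:.4f}, expected pounces {cat_expected:.0f}")
print(f"darts: {darts_two:.4f}, {darts_any:.4f}")
```

Each "at least one" answer is computed through the complement, exactly as in the worked solutions, because summing the Poisson probabilities directly would never terminate.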

US $5.95
Free downloads & hundreds of titles at quickstudy.com
Customer Hotline: 1.800.230.9522
ISBN-13: 978-142320857-0
ISBN-10: 142320857-9

NOTE TO STUDENT: This guide is intended for informational purposes only. Due to its condensed format, this guide cannot cover every aspect of the subject; rather, it is intended for use in conjunction with course work and assigned texts. Neither BarCharts, Inc., its writers, editors, nor design staff are in any way responsible or liable for the use or misuse of the information contained in this guide.

All rights reserved. No part of this publication may be reproduced or transmitted in any form, or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without written permission from the publisher.

AUTHOR: Stephen V. Kizlik, PhD
© 2009 BarCharts, Inc. 0409
