
Inferential statistics


Page 1: Inferential  statistics

doc.Ing. Zlata Sojková, CSc. 1

Inferential statistics

Suppose we have a bag of nuts. I pick one of the nuts, crack it, and find it empty. What can I conclude? The optimist says: "Never mind! Only one nut is bad and I happened to pull it out. At least we got rid of it." The pessimist says: "This is what I was afraid of, the bag is full of bad nuts." What will the statistician say? That both the pessimist and the optimist may be right.

To determine whether the nuts in the bag are bad, it is enough to select a few nuts from different places in the bag and crack them …

Page 2: Inferential  statistics

Statistical inference is based on sample investigation

Statistical inference is the process of using sample results to draw conclusions about the parameters of a population.

The sample should be a representative sample of the population. In the picture this is not the case ...

Page 3: Inferential  statistics

Examples of inferential statistics

• Household accounts
• Marketing research of consumer behavior (patterns?)
• Sample investigation of agricultural enterprises
• Survey of public opinions
• Quality control

Page 4: Inferential  statistics

Inferential statistics (or Statistical inference)

Assume that we are working with a sample and we calculate sample statistics such as the sample average, sample variance, and sample standard deviation.

Based on the sample, we infer the properties of the population.

That is, the values of the sample statistics are used to estimate the unknown values of the population parameters.

Usually we estimate population parameters such as the population mean, population variance, and population standard deviation.

Page 5: Inferential  statistics

Graphically

[Figure: a sample of size n drawn from a population of size N (or infinite)]

Symbols:

population parameters: $\mu$, $\sigma^2$, $\sigma$, generally $Q$

sample characteristics: $\bar{x}$, $s^2$, $s$, generally $u_n$

Page 6: Inferential  statistics

Statistical inference (SI) has two basic tasks

• Statistical estimation: unknown population parameters are estimated by sample characteristics.
• Statistical hypothesis testing: we express assumptions about the unknown parameters of the population. If we can formulate these assumptions as statistical hypotheses and verify their validity by statistical procedures, then this process is statistical hypothesis testing.

Page 7: Inferential  statistics

Some other tasks of SI

• To determine the sample size (n) that is sufficient for a reliable estimate of the parameters
• To determine the method of sampling statistical units from the population

Explanation: the sample characteristics are deterministic with respect to the sample, but they are random variables with respect to the population, so they have a probability distribution.

This means it is important to choose the right model for the distribution of the sample characteristic used in statistical inference (statisticians have worked this out for us). The arithmetic average, standardized with the sample standard deviation, usually has Student's distribution, but for large samples (n > 30) Student's distribution can be approximated by the normal distribution.

Page 8: Inferential  statistics

Random sampling

There are many methods that can be used to select a sample from a population.

From the point of view of repetition:

• selection with replacement
• selection without replacement

Classification by subdivision of the population: simple random sample (from a finite or infinite population) or composite sample, which can be:

• based on choosing groups
• quota sampling, etc.
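The difference between selection with and without replacement can be illustrated with a minimal Python sketch. The population of ten numbered units and the fixed seed are assumptions made only for this illustration; they do not come from the slides.

```python
import random

# Hypothetical population of 10 numbered statistical units (illustration only).
population = list(range(1, 11))
rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Selection with replacement: the same unit may be drawn more than once.
with_replacement = rng.choices(population, k=5)

# Selection without replacement: each unit can appear at most once.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

With replacement, repeated units are possible; without replacement, the five selected units are always distinct.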

Page 9: Inferential  statistics

Theory of Estimation (TO)

Repetition: the main goal of the theory of estimation is to estimate population parameters such as $\mu$ and $\sigma$ by using sample characteristics.

There are two types of estimates:

• Point estimate
• Interval estimate

Page 10: Inferential  statistics

Point estimation of population parameter Q (generally)

A point estimate is a single numerical value used as an estimate of the population parameter Q; geometrically it is one point.

The estimate (estimator) is abbreviated est, written: $\operatorname{est} Q = u_n$

Mostly we estimate:

• the population mean $\mu$
• the population variance $\sigma^2$ and the population standard deviation $\sigma$

Page 11: Inferential  statistics

Attributes of point estimates

The best estimator satisfies the following conditions:

• Unbiasedness
• Consistency
• Efficiency
• Sufficiency

We explain the first two conditions.

Page 12: Inferential  statistics

Unbiasedness

$$E(u_n - Q) = 0 \quad\Longleftrightarrow\quad E(u_n) = Q$$

If we repeat the sampling many times, each time we get a different error, because each time we get a different sample average $\bar{x}$.

Unbiasedness requires that the expected value of all these errors equals zero. We require that the errors are only random, so that we neither systematically underestimate nor overestimate the population mean.

[Figure: sample averages $\bar{x}$ obtained from repeated samples]

An asymptotically unbiased estimator of $Q$ is a sample characteristic $u_n$ that satisfies the condition:

$$\lim_{n \to \infty} E(u_n) = Q$$

Page 13: Inferential  statistics

Consistency

$$\lim_{n \to \infty} P(|u_n - Q| < \varepsilon) = 1 \quad \text{for every } \varepsilon > 0$$

The principle of consistency rests on the law of large numbers. In statistical practice, consistency ensures that the estimation error decreases as the sample size increases. For large samples the estimation error is very small.

A sufficient condition for consistency is that $u_n$ is an asymptotically unbiased estimator and that the following condition is met:

$$\lim_{n \to \infty} D(u_n) = 0$$
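Consistency can be illustrated by simulation. The sketch below draws repeated samples from a hypothetical normal population (the values of `MU` and `SIGMA`, the seed, and the number of repetitions are assumptions for illustration only) and measures how far the sample average falls from the population mean on average.

```python
import random
import statistics

rng = random.Random(0)
MU, SIGMA = 50.0, 10.0  # hypothetical population parameters (assumed)

def mean_abs_error(n, repeats=500):
    """Average |sample mean - MU| over many repeated samples of size n."""
    errors = []
    for _ in range(repeats):
        sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
        errors.append(abs(statistics.fmean(sample) - MU))
    return statistics.fmean(errors)

errors_by_n = {n: mean_abs_error(n) for n in (10, 100, 1000)}
for n, err in errors_by_n.items():
    print(f"n = {n:5d}  mean absolute error = {err:.3f}")
```

The average error shrinks as n grows, which is what the consistency condition describes.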

Page 14: Inferential  statistics

Efficiency of a point estimator (PE)

Any sample characteristic is a random variable with some variance.

If we have two unbiased point estimators of the same population parameter, the estimator with the smaller variance is said to have greater efficiency.

$$D(u_n) = \min$$

Page 15: Inferential  statistics

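Point estimator of population mean

$$E(\bar{x}) = \dots = \mu, \qquad D(\bar{x}) = \dots = \frac{\sigma^2}{n}$$

Thus $\bar{x}$ is an unbiased estimator of $\mu$, and:

$$\lim_{n \to \infty} D(\bar{x}) = \lim_{n \to \infty} \frac{\sigma^2}{n} = 0$$

The sufficient condition for consistency is satisfied, so $\bar{x}$ is an unbiased and consistent estimator of the population mean:

$$\operatorname{est} \mu = \bar{x}$$

Important: the standard deviation of the average is the standard error of the mean (the mean error of estimation):

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$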

Page 16: Inferential  statistics

Point estimator of variance $\sigma^2$

$$E(s^2) = \dots = \frac{n-1}{n}\,\sigma^2$$

The sample variance $s^2$ is not an unbiased estimator of the population variance $\sigma^2$; it gives a negatively biased estimate. The bias is equal to $-\frac{1}{n}\sigma^2$.

The sample variance is an asymptotically unbiased estimator of $\sigma^2$, since

$$\lim_{n \to \infty} E(s^2) = \lim_{n \to \infty} \frac{n-1}{n}\,\sigma^2 = \sigma^2$$

Page 17: Inferential  statistics

So the unbiased point estimator of the population variance $\sigma^2$ is the sample variance $s_1^2$, which is computed as:

$$s_1^2 = \frac{n}{n-1}\, s^2 = \frac{1}{n-1} \sum_{j=1}^{n} (x_j - \bar{x})^2$$

The factor $\frac{n}{n-1}$ is Bessel's correction.

The difference between $s_1^2$ and $s^2$ decreases with increasing sample size $n$. For sample sizes greater than 50 ($n > 50$) the difference is negligible.

Conclusion:

$$\operatorname{est} \mu = \bar{x}, \qquad \operatorname{est} \sigma^2 = s_1^2$$
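The effect of Bessel's correction can be checked by simulation. This is a minimal sketch under assumed values: the population variance `SIGMA2`, the sample size, the seed, and the number of repetitions are all chosen for illustration. `statistics.pvariance` divides by n (the biased sample variance), while `statistics.variance` divides by n - 1 (the corrected one).

```python
import random
import statistics

rng = random.Random(1)
SIGMA2 = 4.0     # hypothetical true population variance (assumed)
n = 5            # small sample, where the bias is most visible
repeats = 20000

biased, corrected = [], []
for _ in range(repeats):
    sample = [rng.gauss(0.0, SIGMA2 ** 0.5) for _ in range(n)]
    biased.append(statistics.pvariance(sample))    # divides by n     (s^2)
    corrected.append(statistics.variance(sample))  # divides by n - 1 (s_1^2)

mean_biased = statistics.fmean(biased)
mean_corrected = statistics.fmean(corrected)
print(f"average s^2   = {mean_biased:.3f}  (theory: {(n - 1) / n * SIGMA2:.3f})")
print(f"average s_1^2 = {mean_corrected:.3f}  (theory: {SIGMA2:.3f})")
```

Averaged over many samples, the uncorrected variance settles near (n - 1)/n times the population variance, while the corrected variance settles near the population variance itself.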

Page 18: Inferential  statistics

Example: In 400 randomly selected households in one region of the Slovak Republic (SR), expenditures on alcoholic drinks and cigarettes were investigated. We make a point estimate of the mean and of its standard error.

$$\operatorname{est} \mu = \bar{x} = 973 \text{ Sk}, \qquad \operatorname{est} \sigma = s_1 = 286 \text{ Sk}$$

$$s_{\bar{x}} = \frac{s_1}{\sqrt{n}} = \frac{286}{20} = 14.3 \text{ Sk}$$

The estimated standard error of the mean is relatively small: only about 1.5% of the mean. We can expect that the error in estimating the average expenditure on alcoholic drinks and cigarettes is not too large.
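The arithmetic of this example can be reproduced in a few lines of Python. The variable names are chosen here for illustration; the numbers are those given in the example.

```python
import math

n = 400        # number of households in the sample
x_bar = 973.0  # point estimate of the mean expenditure (Sk)
s1 = 286.0     # point estimate of the standard deviation (Sk)

standard_error = s1 / math.sqrt(n)       # s1 / sqrt(n)
relative_error = standard_error / x_bar  # standard error as a share of the mean

print(f"standard error = {standard_error:.1f} Sk")
print(f"relative error = {relative_error:.1%}")
```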

Page 19: Inferential  statistics

Comparison of the distribution of the attribute X in the population with the distribution of the sample average $\bar{x}$:

[Figure: density $f(x)$ of the attribute and density $f(\bar{x})$ of the sample average, with $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$]

Page 20: Inferential  statistics

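Interval estimate of parameter Q

$$P(q_1 \le Q \le q_2) = 1 - \alpha$$

$1 - \alpha$: confidence level

$\alpha$: risk of estimation

$q_1, q_2$: lower and upper limits of the interval (random)

[Figure: density $f(g)$ with the interval from $q_1$ to $q_2$ covering probability $1 - \alpha$ and $\alpha/2$ in each tail]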

Page 21: Inferential  statistics

Interval estimation of population mean $\mu$

Suppose that the statistical attribute has a normal distribution, $X \sim N(\mu, \sigma^2)$. If we choose a sample of size $n$, then the arithmetic average also has a normal distribution, $\bar{x} \sim N(\mu, \sigma^2/n)$.

The confidence interval for $\mu$ depends on the available information and on the sample size:

a) If the population variance $\sigma^2$ is known (a theoretical assumption), we can create the standardized normal variable:

$$u = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

$u$ has the distribution $N(0, 1)$, independent of the estimated value.

Page 22: Inferential  statistics

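$$P\left(-u_{1-\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le u_{1-\alpha/2}\right) = 1 - \alpha$$

[Figure: density $f(u)$ with central area $1 - \alpha$ between $-u_{1-\alpha/2}$ and $u_{1-\alpha/2}$]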

Page 23: Inferential  statistics

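After transformation we get:

$$P\left(\bar{x} - u_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + u_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

$\Delta = u_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$ is the sampling error: half the width of the interval, which determines the accuracy of the estimate. The interval estimate is actually the point estimate $\pm \Delta$, i.e.

$$\bar{x} - \Delta \le \mu \le \bar{x} + \Delta$$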

Page 24: Inferential  statistics

b) If the population variance is unknown, $\operatorname{est} \sigma^2 = s_1^2$, and the sample size is large, $n > 30$, we can use $N(0, 1)$:

$$\bar{x} \pm u_{1-\alpha/2}\,\frac{s_1}{\sqrt{n}}$$

c) If the population variance is unknown, $\operatorname{est} \sigma^2 = s_1^2$, and the sample size is small, $n \le 30$:

$$\bar{x} \pm t_{\alpha}(n-1)\,\frac{s_1}{\sqrt{n}}$$

$t_{\alpha}(n-1)$ is the critical value of Student's distribution at level $\alpha$ with $n - 1$ degrees of freedom.

Page 25: Inferential  statistics

Example: Based on the point estimates of household expenditure on cigarettes and alcohol, we construct an interval estimate with 95% probability.

$$n = 400, \qquad \bar{x} = 973, \qquad s_{\bar{x}} = \frac{s_1}{\sqrt{n}} = \frac{286}{\sqrt{400}} = 14.3$$

$$u_{1-\alpha/2} = u_{1-0.025} = u_{0.975} = 1.96 \qquad \text{(Excel: NORMSINV(0.975))}$$

$$\Delta = u_{1-\alpha/2}\,\frac{s_1}{\sqrt{n}} = 1.96 \cdot 14.3 = 28.03$$

$$973 - 28.03 < \mu < 973 + 28.03, \quad \text{i.e.} \quad 944.97 < \mu < 1001.03$$

With 95% probability we estimate the average expenditure to be between 945 Sk and 1001 Sk.
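The interval for the household example can be recomputed with Python's standard library. This is a sketch, not the author's procedure: `statistics.NormalDist().inv_cdf` is used here as a counterpart of Excel's NORMSINV, and the variable names are chosen for illustration.

```python
import math
from statistics import NormalDist

n, x_bar, s1 = 400, 973.0, 286.0  # values from the household example
confidence = 0.95
alpha = 1 - confidence

u = NormalDist().inv_cdf(1 - alpha / 2)  # quantile u_{1 - alpha/2}
delta = u * s1 / math.sqrt(n)            # sampling error (half-width)
lower, upper = x_bar - delta, x_bar + delta

print(f"u = {u:.2f}, delta = {delta:.2f}")
print(f"{lower:.2f} < mu < {upper:.2f}")
```

Raising `confidence` gives a larger quantile and therefore a wider interval for the same data.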

Page 26: Inferential  statistics

Example: A study investigated the weight loss of carrots after one week of storage. Twenty samples, each weighing 1 kg at the beginning of storage, were analyzed and their weight loss was measured. The average weight loss was 49 g with a sample standard deviation of 4 g. We assume that the weight loss has a normal distribution. We estimate the average weight loss with 95% confidence. Because $n < 30$, we use Student's distribution:

$$\bar{x} \pm t_{\alpha}(n-1)\,\frac{s_1}{\sqrt{n}} = 49 \pm 2.09 \cdot \frac{4}{\sqrt{20}} = 49 \pm 1.9, \qquad 47.1 \le \mu \le 50.9$$

$t_{\alpha}(n-1)$ is the quantile of Student's distribution: $t_{0.05}(19) = 2.09$ (Excel: TINV(0.05;19)).

With 95% confidence, the average weight loss of a 1 kg carrot sample lies in the interval from 47.1 g to 50.9 g.
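The carrot interval can be verified numerically. Python's standard library has no Student quantile function, so this sketch simply hard-codes the critical value 2.09 quoted in the example; the variable names are illustrative.

```python
import math

n = 20         # number of 1 kg carrot samples
x_bar = 49.0   # average weight loss (g)
s1 = 4.0       # sample standard deviation (g)
t_crit = 2.09  # t_{0.05}(19), copied from the example (Excel: TINV(0.05;19))

delta = t_crit * s1 / math.sqrt(n)  # half-width of the interval
lower, upper = x_bar - delta, x_bar + delta

print(f"delta = {delta:.2f} g")
print(f"{lower:.1f} g <= mu <= {upper:.1f} g")
```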

Page 27: Inferential  statistics

The size of the sampling error $\Delta$ (half the width of the confidence interval) depends on:

• the confidence probability $(1 - \alpha)$
• the standard error of the average, which depends on:
    • the variability of the attribute, which we cannot change
    • the sample size, which we can change!

The sample size needed to achieve the required reliability and accuracy can be determined with the following formula:

$$n = \frac{u_{1-\alpha/2}^2\, s_1^2}{\Delta^2}$$
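The sample size formula $n = u_{1-\alpha/2}^2 s_1^2 / \Delta^2$ can be sketched as a small helper. The function name `required_sample_size` and the target accuracy of 20 Sk are hypothetical, chosen only for illustration; the standard deviation of 286 Sk is reused from the household example, and the result is rounded up to a whole number of units.

```python
import math
from statistics import NormalDist

def required_sample_size(s1, delta, confidence=0.95):
    """Smallest n for which u_{1-alpha/2} * s1 / sqrt(n) <= delta."""
    u = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((u * s1 / delta) ** 2)

# Hypothetical target: estimate the mean expenditure to within 20 Sk.
n_needed = required_sample_size(s1=286.0, delta=20.0)
print(n_needed)
```

Halving the required sampling error roughly quadruples the required sample size, because n grows with the inverse square of the sampling error.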

Page 28: Inferential  statistics

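Confidence interval for variance $\sigma^2$

$$\chi^2 = \frac{(n-1)\,s_1^2}{\sigma^2}$$

$$P\left(\chi^2_{\alpha/2} \le \chi^2 \le \chi^2_{1-\alpha/2}\right) = 1 - \alpha$$

[Figure: density $f(\chi^2)$ with central area $1 - \alpha$ between the critical values of the chi-square distribution and $\alpha/2$ in each tail]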

Page 29: Inferential  statistics

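After transformation we obtain:

$$P\left(\frac{(n-1)\,s_1^2}{\chi^2_{1-\alpha/2}} \le \sigma^2 \le \frac{(n-1)\,s_1^2}{\chi^2_{\alpha/2}}\right) = 1 - \alpha$$

Correspondingly, the confidence interval for the standard deviation is:

$$P\left(\sqrt{\frac{(n-1)\,s_1^2}{\chi^2_{1-\alpha/2}}} \le \sigma \le \sqrt{\frac{(n-1)\,s_1^2}{\chi^2_{\alpha/2}}}\right) = 1 - \alpha$$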
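As an illustration that is not part of the original slides, the interval for the variance can be applied to the carrot data (n = 20, sample standard deviation 4 g) at 95% confidence. The sketch assumes that the subscripts of the chi-square critical values denote quantiles (lower-tail probabilities), and it hard-codes the two critical values for 19 degrees of freedom as read from a standard chi-square table, since Python's standard library has no chi-square quantile function.

```python
import math

n, s1 = 20, 4.0      # carrot example: sample size and standard deviation (g)
chi2_lower = 8.907   # 0.025 quantile of chi-square, 19 d.f. (table value, assumed)
chi2_upper = 32.852  # 0.975 quantile of chi-square, 19 d.f. (table value, assumed)

scaled = (n - 1) * s1 ** 2  # (n - 1) * s_1^2

# Dividing by the larger critical value gives the lower limit, and vice versa.
var_low, var_high = scaled / chi2_upper, scaled / chi2_lower
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)

print(f"variance:           {var_low:.2f} to {var_high:.2f} g^2")
print(f"standard deviation: {sd_low:.2f} to {sd_high:.2f} g")
```

The interval is not symmetric around the point estimate, because the chi-square distribution is skewed.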

Page 30: Inferential  statistics

Questions

• What is the essential difference between point and interval estimation?
• How do the interval limits depend on the confidence level?
• How does the confidence level influence the accuracy of the confidence interval?
• How can we ensure an interval estimate of the mean with a chosen confidence and accuracy?