Mathematical Statistics Instructor: Dr. Deshi Ye Course homepage: http://www.cs.zju.edu.cn/people/yedeshi/


Page 1: Mathematical Statistics

Mathematical Statistics

Instructor:

Dr. Deshi Ye

Course homepage: http://www.cs.zju.edu.cn/people/yedeshi/

Page 2: Mathematical Statistics

Course information

• What is it for?

– This course provides an elementary introduction to mathematical statistics with applications.

– Topics include: statistical estimation; hypothesis testing; confidence intervals; calculation of a P-value; nonparametric testing; curve fitting; analysis of variance; and factorial experimental design.

Page 3: Mathematical Statistics

Grading

• Grades for the course will be based on the following weighting:

1) Class attendance: 10%

2) Homework assignment: 26%

3) Unit quiz: 24% (12%, 12%)

4) Final exam: 40%

Page 4: Mathematical Statistics

Introduction

• Probability theory is devoted to the study of uncertainty and variability

• Statistics can be described as the study of how to make inferences and decisions in the face of uncertainty and variability

Page 5: Mathematical Statistics

Brief History

• Blaise Pascal and Pierre de Fermat: the origins of probability are found in their work.

– It concerned a popular dice game.

– It led to the fundamental principles of probability theory.

• Pierre-Simon de Laplace:

– Before him, the concern was mainly the analysis of games of chance.

– Laplace applied probabilistic ideas to many scientific and practical problems.

Page 6: Mathematical Statistics

A case study

• Visually inspecting data to improve product quality

Page 7: Mathematical Statistics

Population and Sample

• Investigations of a physical phenomenon, a production process, or a manufactured unit share some common characteristics.

• Relevant data must be collected.

• Unit: the source of each measurement.

– A single entity, usually an object or person

• Population: entire collection of units.

Page 8: Mathematical Statistics

Examples

Population                                   Unit       Variables
All students currently enrolled in school    student    GPA, number of credits
All books in library                         book       Replacement cost

Page 9: Mathematical Statistics

Sample

• Statistical population: the set of all measurements corresponding to each unit in the entire population of units about which information is sought.

• Sample: A sample from a statistical population is the subset of measurements that are actually collected in the course of investigation.

Page 10: Mathematical Statistics

Ch2: Treatment of data

• Outline

– Pareto diagrams, dot diagrams

– Histograms (frequency distributions)

– Stem-and-leaf display

– Boxplot (quartiles and percentiles)

– The calculation of the mean $\bar{x}$ and standard deviation $s$

Page 11: Mathematical Statistics

Pareto Diagram

• For a computer-controlled lathe whose performance was below par, workers recorded the following causes and their frequencies:

Cause                     Frequency
power fluctuations        6
controller not stable     22
operator error            13
worn tool not replaced    2
other                     5

Page 12: Mathematical Statistics

Minitab 14

• 1. Stat->Quality tools->Pareto chart

• 2. Choose chart defects table as follows

Page 13: Mathematical Statistics

Output

Page 14: Mathematical Statistics

Pareto diagram

• Pareto diagram: depicts Pareto’s empirical law that any assortment of events consists of a few major and many minor elements.

• Typically, two or three elements will account for more than half of the total frequency.
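
As a rough illustration of the Pareto idea, the sketch below sorts the lathe causes from the earlier slide by frequency and accumulates their share of the total. The function name `pareto_table` is made up for this example, and the plain sort by count is an assumption (Pareto charts often place "other" last regardless of its count).

```python
# Frequencies of lathe failure causes, copied from the Pareto Diagram slide.
causes = {
    "power fluctuations": 6,
    "controller not stable": 22,
    "operator error": 13,
    "worn tool not replaced": 2,
    "other": 5,
}


def pareto_table(freqs):
    """Return (cause, count, cumulative percent) rows, largest count first."""
    total = sum(freqs.values())
    rows = []
    running = 0
    # Sort purely by count; a real Pareto chart may pin "other" to the end.
    for cause, count in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((cause, count, 100 * running / total))
    return rows


for cause, count, cum_pct in pareto_table(causes):
    print(f"{cause:<24}{count:>4}{cum_pct:>8.1f}%")
```

In this data the two largest causes (controller not stable, operator error) together cover 35 of the 48 recorded events, which is in line with the "two or three elements" remark.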

Page 15: Mathematical Statistics

Dot diagram

• Observations of the deviations of the cutting speed from the target value set by the controller.

• Example: deviation = cutting speed – target speed

• 3 6 –2 4 7 4

• In Minitab: stat->dotplots->simple

Page 16: Mathematical Statistics

Dot diagram

• This diagram visually summarizes the information that the lathe is generally running fast.
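
A text-only stand-in for the dot diagram can be sketched as follows, using the six deviations listed on the previous slide. The helper `dot_diagram` is a hypothetical name for this example, and it assumes integer-valued observations.

```python
from collections import Counter

# Deviations (cutting speed minus target speed) from the dot diagram slide.
deviations = [3, 6, -2, 4, 7, 4]


def dot_diagram(values):
    """Build one text row per integer value, with one dot per observation."""
    counts = Counter(values)
    lines = []
    for v in range(min(values), max(values) + 1):
        lines.append(f"{v:>3} | " + "." * counts[v])
    return "\n".join(lines)


print(dot_diagram(deviations))

# Positive deviations mean the lathe ran faster than the target speed.
n_fast = sum(1 for d in deviations if d > 0)
print(f"{n_fast} of {len(deviations)} observations are above target")
```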

Page 17: Mathematical Statistics

Data001. 80 measurements of emission (in tons) of sulfur oxides from an industrial plant

• 15.8 26.4 17.3 11.2 23.9 24.8 18.7 13.9 9.0 13.2 22.7 9.8 6.2 14.7 17.5 26.1 12.8 28.6 17.6 23.7 26.8

• 22.7 18.0 20.5 11.0 20.9 15.5 19.4 16.7 10.7 19.1 15.2 22.9 26.6 20.4 21.4 19.2 21.6 16.9 19.0 18.5 23.0

• 24.6 20.1 16.2 18.0 7.7 13.5 23.5 14.5 14.4 29.6 19.4 17.0 20.8 24.3 22.5 24.6 18.4 18.1 8.3 21.9 12.3

• 22.3 13.3 11.8 19.3 20.0 25.7 31.8 25.9 10.5 15.9 27.5 18.1 17.9 9.4 24.1 20.1 28.5

Page 18: Mathematical Statistics

Frequency distributions

• A frequency distribution is a tabular arrangement of data whereby the data is grouped into different intervals, and then the number of observations that belong to each interval is determined.

• Data presented in this manner are known as grouped data.

Page 19: Mathematical Statistics

Class limits & frequency

Class limits    Frequency
5.0 – 8.9       3
9.0 – 12.9      10
13.0 – 16.9     14
17.0 – 20.9     25
21.0 – 24.9     17
25.0 – 28.9     9
29.0 – 32.9     2
Total           80

Page 20: Mathematical Statistics

Class limit and width

• Lower class limit: the smallest value that can belong to a given interval.

• Upper class limit: the largest value that can belong to the interval.

• Class width: the difference between the upper class limit and the lower class limit.

• When designing the intervals to be used in a frequency distribution, it is preferable that the class widths of all intervals be the same.

Page 21: Mathematical Statistics

Class limits & frequency

Class limits    Frequency
[5.0, 9.0)      3
[9.0, 13.0)     10
[13.0, 17.0)    14
[17.0, 21.0)    25
[21.0, 25.0)    17
[25.0, 29.0)    9
[29.0, 33.0)    2
Total           80
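
The grouping step can be sketched in a few lines. The example below copies the 80 Data001 values and counts them into the half-open classes [5.0, 9.0), ..., [29.0, 33.0); the function name `frequency_distribution` and the use of `bisect` are choices made for this illustration, not part of the course material.

```python
from bisect import bisect_right

# The 80 sulfur oxide emission values (Data001), copied from the slides.
data001 = [
    15.8, 26.4, 17.3, 11.2, 23.9, 24.8, 18.7, 13.9, 9.0, 13.2, 22.7,
    9.8, 6.2, 14.7, 17.5, 26.1, 12.8, 28.6, 17.6, 23.7, 26.8,
    22.7, 18.0, 20.5, 11.0, 20.9, 15.5, 19.4, 16.7, 10.7, 19.1, 15.2,
    22.9, 26.6, 20.4, 21.4, 19.2, 21.6, 16.9, 19.0, 18.5, 23.0,
    24.6, 20.1, 16.2, 18.0, 7.7, 13.5, 23.5, 14.5, 14.4, 29.6, 19.4,
    17.0, 20.8, 24.3, 22.5, 24.6, 18.4, 18.1, 8.3, 21.9, 12.3,
    22.3, 13.3, 11.8, 19.3, 20.0, 25.7, 31.8, 25.9, 10.5, 15.9, 27.5,
    18.1, 17.9, 9.4, 24.1, 20.1, 28.5,
]

# Class boundaries: class k is the half-open interval [edges[k], edges[k+1]).
edges = [5.0, 9.0, 13.0, 17.0, 21.0, 25.0, 29.0, 33.0]


def frequency_distribution(data, edges):
    """Count how many observations fall into each half-open class."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        if not edges[0] <= x < edges[-1]:
            raise ValueError(f"{x} lies outside the class intervals")
        # bisect_right finds the first edge strictly greater than x.
        counts[bisect_right(edges, x) - 1] += 1
    return counts


counts = frequency_distribution(data001, edges)
for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"[{lo:4.1f}, {hi:4.1f})  {c:3d}")
print(f"Total          {sum(counts):3d}")
```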

Page 22: Mathematical Statistics

Variants of frequency distribution

• The cumulative frequency distribution is obtained by computing the cumulative frequency, defined as the total frequency of all values less than the upper class limit of a particular interval, for all intervals.

• Relative frequency: the ratio of the number of observations in the interval to the total number of observations

• The percentage frequency distribution is arrived at by multiplying the relative frequencies of each interval by 100%.
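
A minimal sketch of these three variants, starting from the class frequencies in the table above; the variable names are illustrative only.

```python
from itertools import accumulate

# Class frequencies for [5, 9), [9, 13), ..., [29, 33), taken from the table.
upper_limits = [9, 13, 17, 21, 25, 29, 33]
frequencies = [3, 10, 14, 25, 17, 9, 2]
total = sum(frequencies)

# Cumulative frequency: number of values below each upper class limit.
cumulative = list(accumulate(frequencies))

# Relative frequency: share of the observations in each class.
relative = [f / total for f in frequencies]

# Percentage frequency: relative frequency expressed in percent.
percentage = [100 * f / total for f in frequencies]

for limit, cum, pct in zip(upper_limits, cumulative, percentage):
    print(f"less than {limit:2d}: {cum:2d} values; class share {pct:5.2f}%")
```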

Page 23: Mathematical Statistics

Cumulative frequency

Class limits    Cumulative frequency
Less than 5     0
Less than 9     3
Less than 13    13
Less than 17    27
Less than 21    52
Less than 25    69
Less than 29    78
Less than 33    80

Page 24: Mathematical Statistics

Percentage distribution

Class limits    Percentage    Frequency
[5.0, 9.0)      3.75%         3
[9.0, 13.0)     12.5%         10
[13.0, 17.0)    17.5%         14
[17.0, 21.0)    31.25%        25
[21.0, 25.0)    21.25%        17
[25.0, 29.0)    11.25%        9
[29.0, 33.0)    2.5%          2
Total           100%          80

Page 25: Mathematical Statistics

Histogram

• The most common form of graphical presentation of a frequency distribution is the histogram.

• Histogram: constructed of adjacent rectangles; the heights of the rectangles are the class frequencies, and the bases of the rectangles extend between successive class boundaries.

Page 26: Mathematical Statistics

Histogram in Minitab

Page 27: Mathematical Statistics

1. Graph->Histogram->Simple

2. Graph variables: c4

3. Edit bars: click the bars in the output figure; in Binning, select midpoint for Interval type and midpoint/cutpoint for Interval definition, then enter 7 11 15 19 23 27 31.

Page 28: Mathematical Statistics

Density histogram

• When a histogram is constructed from a frequency table having classes of unequal lengths, the height of each rectangle must be changed to: height = relative frequency / width.

• The area of the rectangle then represents the relative frequency for the class and the total area of the histogram is 1.
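
To see why the total area comes out as 1, here is a small sketch. The unequal classes are an assumption made for this example: they merge the first two and the last two classes of the Data001 table, so the counts are sums of the tabulated frequencies. `density_heights` is a hypothetical helper name.

```python
from fractions import Fraction

# Assumed unequal classes built by merging Data001 classes:
# [5, 13) = 3 + 10, [13, 17), [17, 21), [21, 25), [25, 33) = 9 + 2.
edges = [5, 13, 17, 21, 25, 33]
counts = [13, 14, 25, 17, 11]


def density_heights(edges, counts):
    """Height of each rectangle = relative frequency / class width."""
    total = sum(counts)
    heights = []
    for lo, hi, c in zip(edges, edges[1:], counts):
        heights.append(Fraction(c, total) / (hi - lo))
    return heights


heights = density_heights(edges, counts)

# Area of each rectangle is height * width, i.e. the relative frequency.
areas = [h * (hi - lo) for h, lo, hi in zip(heights, edges, edges[1:])]
print([float(h) for h in heights])
print("total area:", sum(areas))
```

Exact fractions are used here only so that the total area is exactly 1 rather than subject to floating-point rounding.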

Page 29: Mathematical Statistics

Density histogram

Page 30: Mathematical Statistics

Cumulative histogram

• 1) Graph->histogram->simple

• 2) Data View:

– Data Display: check “Symbols” only

– Smoother: check “Lowess”, and enter “0” for degree of smoothing and “1” for number of steps.

Page 31: Mathematical Statistics

Stem-and-leaf Display

• A table of class limits and frequencies shows how many observations fall in each class, but the original data points are lost.

• Stem-and-leaf: serves the same function as a histogram but preserves the original data points.

• Example with 10 numbers: 12, 13, 21, 27, 33, 34, 35, 37, 40, 40

Page 32: Mathematical Statistics

• Frequency table

Class limits    Frequency
10 – 19         2
20 – 29         2
30 – 39         4
40 – 49         2

Page 33: Mathematical Statistics

Stem-and-leaf

Stem-and-leaf: each row has a stem, and each digit to the right of the vertical line is a leaf.

The "stem" is the left-hand column which contains the tens digits.

The "leaves" are the lists in the right-hand column, showing all the ones digits for each of the tens, twenties, thirties, and forties.

Key: “4|0” means 40
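
The construction described above can be sketched for the ten example numbers. This is a minimal version that assumes non-negative two-digit integers, with the tens digit as the stem and the ones digit as the leaf; `stem_and_leaf` is a name chosen for this illustration.

```python
def stem_and_leaf(values):
    """Group sorted values by tens digit (stem); leaves are the ones digits."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    return stems


numbers = [12, 13, 21, 27, 33, 34, 35, 37, 40, 40]
display = stem_and_leaf(numbers)

for stem, leaves in display.items():
    print(f"{stem} | {''.join(str(d) for d in leaves)}")
```

Unlike the frequency table, every original value can be read back from the display by joining a stem with each of its leaves.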

Page 34: Mathematical Statistics

Stem-and-leaf in Minitab

• The display has three columns:

– The leaves (right): each value in the leaf column represents a digit from one observation.

– The stem (middle): the stem value represents the digit immediately to the left of the leaf digit.

– Counts (left): if the median value for the sample is included in a row, the count for that row is enclosed in parentheses. The values for rows above and below the median are cumulative.

Page 35: Mathematical Statistics

Stem-and-leaf for DATA001

Stem-and-leaf of frequencies  N = 80
Leaf Unit = 1.0

   2   0  67
   6   0  8999
  11   1  00111
  17   1  223333
  24   1  4445555
  32   1  66677777
 (13)  1  8888888999999
  35   2  0000000111
  25   2  222223333
  16   2  4444455
   9   2  66667
   4   2  889
   1   3  1

Page 36: Mathematical Statistics

Ch2.5: Descriptive measures

• Mean: the sum of the observations divided by the sample size:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

• Median: the center, or location, of a set of data. If the observations are arranged in ascending or descending order:

– If the number of observations is odd, the median is the middle value.

– If the number of observations is even, the median is the average of the two middle values.

Page 37: Mathematical Statistics

Example

• Data: 15 14 2 27 13

• Mean: $\bar{x} = \frac{15 + 14 + 2 + 27 + 13}{5} = 14.2$

• Ordering the data from smallest to largest:

• 2 13 14 15 27

• The median is the third value in the ordered list, 14.
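
The two definitions can be checked with a short sketch on the five example values. The functions are written out by hand for illustration (Python's `statistics` module offers equivalents); the four-value list at the end is made up here to show the even case.

```python
def mean(xs):
    """Sum of the observations divided by the sample size."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle value of the ordered data, or average of the two middle values."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


data = [15, 14, 2, 27, 13]
print(mean(data))    # 14.2
print(median(data))  # 14

# Even number of observations (made-up list): average of the two middle values.
print(median([2, 13, 14, 15]))  # 13.5
```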

Page 38: Mathematical Statistics

Sample variance

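• Deviations from the mean: $x_i - \bar{x}$

• Sample variance:

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

• Standard deviation s:

$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

• Equivalent computational form:

$s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n - 1)}$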
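
As a quick check that the definition and the computational form agree, the sketch below evaluates both on the five values 15, 14, 2, 27, 13 used in the mean and median example. The function names are illustrative, and the choice of data is an assumption for this example; the slide itself gives no worked numbers.

```python
import math


def variance_definition(xs):
    """s^2 as the sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """s^2 = (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))


data = [15, 14, 2, 27, 13]
s2 = variance_definition(data)
s = math.sqrt(s2)

print(s2, variance_shortcut(data))  # both are 78.7 up to rounding
print(s)
```

The computational form avoids computing the mean first, but with floating-point data it can lose precision when the values are large relative to their spread.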

Page 39: Mathematical Statistics

Quartiles and Percentiles

• Quartiles: values in a given set of observations that divide the data into four equal parts.

• The first quartile, $Q_1$, is a value that has one fourth, or 25%, of the observations below its value.

• The sample 100p-th percentile is a value such that at least 100p% of the observations are at or below this value, and at least 100(1-p)% are at or above this value.

Page 40: Mathematical Statistics

Example

• Example in P34:

$Q_1 = \frac{14.7 + 15.2}{2} = 14.95$

$Q_2 = \frac{19.0 + 19.1}{2} = 19.05$

$Q_3 = \frac{22.9 + 23.0}{2} = 22.95$
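
These quartiles can be reproduced from the 80 Data001 values. The slides do not spell out the calculation rule, so the sketch assumes one common textbook rule that satisfies the percentile definition: order the data and compute np; if np is an integer k, average the k-th and (k+1)-th ordered values, otherwise round np up and take that ordered value. The name `percentile` is a hypothetical helper.

```python
import math
from fractions import Fraction

# The 80 sulfur oxide emission values (Data001), copied from the slides.
data001 = [
    15.8, 26.4, 17.3, 11.2, 23.9, 24.8, 18.7, 13.9, 9.0, 13.2, 22.7,
    9.8, 6.2, 14.7, 17.5, 26.1, 12.8, 28.6, 17.6, 23.7, 26.8,
    22.7, 18.0, 20.5, 11.0, 20.9, 15.5, 19.4, 16.7, 10.7, 19.1, 15.2,
    22.9, 26.6, 20.4, 21.4, 19.2, 21.6, 16.9, 19.0, 18.5, 23.0,
    24.6, 20.1, 16.2, 18.0, 7.7, 13.5, 23.5, 14.5, 14.4, 29.6, 19.4,
    17.0, 20.8, 24.3, 22.5, 24.6, 18.4, 18.1, 8.3, 21.9, 12.3,
    22.3, 13.3, 11.8, 19.3, 20.0, 25.7, 31.8, 25.9, 10.5, 15.9, 27.5,
    18.1, 17.9, 9.4, 24.1, 20.1, 28.5,
]


def percentile(data, p):
    """Sample 100p-th percentile under the assumed np rule (0 < p < 1)."""
    p = Fraction(p)
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    xs = sorted(data)
    k = len(xs) * p
    if k.denominator == 1:
        # np is an integer: average the k-th and (k+1)-th ordered values.
        i = int(k)
        return (xs[i - 1] + xs[i]) / 2
    # Otherwise round np up and take that ordered value (1-based position).
    return xs[math.ceil(k) - 1]


q1 = percentile(data001, Fraction(1, 4))
q2 = percentile(data001, Fraction(1, 2))
q3 = percentile(data001, Fraction(3, 4))
print(q1, q2, q3)  # approximately 14.95, 19.05, 22.95
```

Statistical packages use several different interpolation rules for percentiles, so results from other software may differ slightly from these values.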

Page 41: Mathematical Statistics

Boxplots

• A boxplot is a way of summarizing information contained in the quartiles (or on an interval)

• Box length = interquartile range = $Q_3 - Q_1$

Page 42: Mathematical Statistics

Modified boxplot

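• Outlier: an observation that lies too far from the third quartile.

• "Too far" means more than 1.5 × (interquartile range) beyond the third quartile; the same rule is commonly applied below the first quartile.

• Modified boxplot: identifies outliers and reduces their effect on the shape of the boxplot.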
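
The outlier rule can be sketched with the quartiles from the earlier example (14.95 and 22.95). The function name `outliers` is hypothetical, the symmetric lower fence is the common convention rather than something stated on the slide, and the value 36.0 is invented purely to trigger the rule.

```python
def outliers(data, q1, q3, factor=1.5):
    """Return values more than factor * IQR below q1 or above q3."""
    iqr = q3 - q1
    lower_fence = q1 - factor * iqr
    upper_fence = q3 + factor * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


# Quartiles from the Data001 example: the IQR is about 8.0, so the fences
# sit near 2.95 and 34.95.
q1, q3 = 14.95, 22.95

# 6.2 and 31.8 are the smallest and largest Data001 values; 36.0 is made up.
print(outliers([6.2, 31.8, 36.0], q1, q3))  # [36.0]
```

Under this rule none of the actual Data001 values is flagged, since the data range from 6.2 to 31.8.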