
Introductory Statistics Notes, Chapter 2


QQS1013 Elementary Statistics

DESCRIPTIVE STATISTICS

2.1 INTRODUCTION

Raw data - Data recorded in the sequence in which they are collected, before they are processed or ranked.

Array data - Raw data arranged in ascending or descending order.

Here is a list of questions asked in a large statistics class and the “raw data” given by one of the students:

1. What is your sex (m = male, f = female)? Answer: m

2. How many hours did you sleep last night? Answer: 5 hours

3. Randomly pick a letter – S or Q. Answer: S

4. What is your height in inches? Answer: 67 inches

5. What’s the fastest you’ve ever driven a car (mph)? Answer: 110 mph

Chapter 2: Descriptive Statistics

[Figure: samples of quantitative raw data and qualitative raw data]

These data are also called ungrouped data.

2.2 ORGANIZING AND GRAPHING QUALITATIVE DATA

2.2.1 Frequency Distributions Table

A frequency distribution for qualitative data lists all categories and the number of elements that belong to each category.

It shows how the frequencies are distributed over the various categories.

It is also called a frequency distribution table or simply a frequency table.

e.g.: The number of students who belong to a certain category is called the frequency of that category.

2.2.2 Relative Frequency and Percentage Distribution


A relative frequency distribution is a listing of all categories along with their

relative frequencies (given as proportions or percentages).

It is commonplace to give the frequency and relative frequency distribution

together.

Calculating the relative frequency and percentage of a category:

$\text{Relative frequency of a category} = \dfrac{\text{Frequency of that category}}{\text{Sum of all frequencies}}$

$\text{Percentage (\%)} = \text{Relative frequency} \times 100$

Example 3

A sample of UUM staff-owned vehicles produced by Proton was identified and the make of each noted. The resulting sample follows (W = Wira, Is = Iswara, Wj = Waja, St = Satria, P = Perdana, Sv = Savvy):

W   W   P   Is  Is  P   Is  W   St  Wj
Is  W   W   Wj  Is  W   W   Is  W   Wj
Wj  Is  Wj  Sv  W   W   W   Wj  St  W
Wj  Sv  W   Is  P   Sv  Wj  Wj  W   W
St  W   W   W   W   St  St  P   Wj  Sv

Construct a frequency distribution table for these data with their relative frequencies and percentages.

Solution:

Category   Frequency   Relative Frequency   Percentage (%)
Wira       19          19/50 = 0.38         0.38*100 = 38
Iswara      8          0.16                 16
Perdana     4          0.08                  8
Waja       10          0.20                 20
Satria      5          0.10                 10
Savvy       4          0.08                  8
Total      50          1.00                 100
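The table for the Proton sample can be reproduced with a short script. This is only a sketch: the helper name `frequency_table` is an illustrative choice, not part of the notes.

```python
from collections import Counter

# Proton sample (W = Wira, Is = Iswara, Wj = Waja, St = Satria,
# P = Perdana, Sv = Savvy), typed in row by row from the notes.
sample = """
W W P Is Is P Is W St Wj
Is W W Wj Is W W Is W Wj
Wj Is Wj Sv W W W Wj St W
Wj Sv W Is P Sv Wj Wj W W
St W W W W St St P Wj Sv
""".split()


def frequency_table(values):
    """Return {category: (frequency, relative frequency, percentage)}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (f, f / total, 100 * f / total) for cat, f in counts.items()}


table = frequency_table(sample)

# Print the categories from most to least frequent.
for cat, (f, rel, pct) in sorted(table.items(), key=lambda kv: -kv[1][0]):
    print(f"{cat:<3} {f:>3} {rel:>6.2f} {pct:>6.1f}")
```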

2.2.3 Graphical Presentation of Qualitative Data

a) Bar Graphs

A graph made of bars whose heights represent the frequencies of

respective categories.

Such a graph is most helpful when you have many categories to

represent.

Notice that a gap is inserted between each of the bars.

There are four types of bar graph:

o simple/vertical bar chart

o horizontal bar chart

o component bar chart

o multiple bar chart

Simple/Vertical Bar Chart

To construct a vertical bar chart, mark the various categories on the horizontal axis and mark the frequencies on the vertical axis.


Horizontal Bar Chart

To construct a horizontal bar chart, mark the various categories on the vertical

axis and mark the frequencies on the horizontal axis.

Component Bar Chart

To construct a component bar chart, each group is drawn as a single bar and every bar is divided into components, one per category.

The height of each component should tally with the frequency it represents.

Example 4

Suppose we want to illustrate the information below, representing the number of people participating in the activities offered by an outdoor pursuits centre during June of three consecutive years.

            2004   2005   2006
Climbing      21     34     36
Caving        10     12     21
Walking       75     85    100
Sailing       36     36     40
Total        142    167    197

Solution:

Multiple Bar Chart

To construct a multiple bar chart, the bars that represent the categories are gathered in groups.

The height of each bar represents the frequency of its category.

Useful for making comparisons (two or more values).

The bar graphs for relative frequency and percentage distributions can be

drawn simply by marking the relative frequencies or percentages, instead of

the class frequencies.

b) Pie Chart

A circle divided into portions that represent the relative frequencies or

percentages of a population or a sample belonging to different

categories.

An alternative to the bar chart and useful for summarizing a single

categorical variable if there are not too many categories.

The chart makes it easy to compare relative sizes of each

class/category.


The whole pie represents the total sample or population. The pie is

divided into different portions that represent the different categories.

To construct a pie chart, we multiply 360° by the relative frequency of each category to obtain the degree measure, or size of the angle, for the corresponding category.

Example 5

Movie Genres      Frequency   Relative Frequency   Angle Size
Comedy             54         0.27                 360*0.27 = 97.2°
Action             36         0.18                 360*0.18 = 64.8°
Romance            28         0.14                 360*0.14 = 50.4°
Drama              28         0.14                 360*0.14 = 50.4°
Horror             22         0.11                 360*0.11 = 39.6°
Foreign            16         0.08                 360*0.08 = 28.8°
Science Fiction    16         0.08                 360*0.08 = 28.8°
Total             200         1.00                 360°
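The angle column of the movie-genre table can be checked with a few lines of code. This is a minimal sketch; the function name `pie_angles` is illustrative only.

```python
# Frequencies from the movie-genre table.
genres = {
    "Comedy": 54, "Action": 36, "Romance": 28, "Drama": 28,
    "Horror": 22, "Foreign": 16, "Science Fiction": 16,
}


def pie_angles(freqs):
    """Angle in degrees for each category: 360 * relative frequency."""
    total = sum(freqs.values())
    return {cat: 360 * f / total for cat, f in freqs.items()}


angles = pie_angles(genres)
for cat, angle in angles.items():
    print(f"{cat:<16} {angle:6.1f}")
```

The angles always add up to 360°, which is a quick check that no category was left out.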


c) Line Graph/Time Series Graph

A graph that represents data occurring over a specific period of time.

Line graphs are very popular because their visual characteristics reveal data trends clearly and they are easy to create.

When analyzing the graph, look for a trend or pattern that occurs over the time period.

For example, is the line ascending (indicating an increase over time) or descending (indicating a decrease over time)?

Another thing to look for is the slope, or steepness, of the line. A line

that is steep over a specific time period indicates a rapid increase or

decrease over that period.

Two data sets can be compared on the same graph (called a

compound time series graph) if two lines are used.

Data collected on the same element for the same variable at different

points in time or for different periods of time are called time series data.

A line graph is a visual comparison of how two variables—shown on the

x- and y-axes—are related or vary with each other. It shows related

information by drawing a continuous line between all the points on a

grid.

Line graphs compare two variables: one is plotted along the x-axis

(horizontal) and the other along the y-axis (vertical).

The y-axis in a line graph usually indicates quantity (e.g., RM, number of sales, litres) or percentage, while the horizontal x-axis often measures units of time. As a result, the line graph is often viewed as a time series graph.

Example 6

A transit manager wishes to use the following data for a presentation showing how Port Authority Transit ridership has changed over the years. Draw a time series graph for the data and summarize the findings.

Year   Ridership (in millions)
1990   88.0
1991   85.0
1992   75.7
1993   76.6
1994   75.4

Solution:

The graph shows a decline in ridership through 1992, followed by a leveling off in 1993 and 1994.

EXERCISE 1


1. The following data show the method of payment used by 16 customers in a supermarket checkout line (C = cash, CK = check, CC = credit card, D = debit and O = other).

C   CK  CK  C   CC  D   O   C
CK  CC  D   CC  C   CK  CK  CC

a. Construct a frequency distribution table.
b. Calculate the relative frequencies and percentages for all categories.
c. Draw a pie chart for the percentage distribution.

2. The frequency distribution table represents the sales of certain products at ZeeZee Company. Each product is given with the frequency of its sales in a certain period. Find the relative frequency and the percentage of each product. Then, construct a pie chart using the information obtained.

Type of Product   Frequency   Relative Frequency   Percentage   Angle Size
A                 13
B                 12
C                  5
D                  9
E                 11

3. Draw a time series graph to represent the data for the number of worldwide airline fatalities for the given years.

Year                1990   1991   1992   1993   1994   1995   1996
No. of fatalities    440    510    990    801    732    557   1132

4. A questionnaire about how people get news resulted in the following information from 25 respondents (N = newspaper, T = television, R = radio, M = magazine).

N   N   R   T   T
R   N   T   M   R
M   M   N   R   N
T   R   M   N   M
T   R   R   N   N

a. Construct a frequency distribution for the data.
b. Construct a bar graph for the data.

5. The given information shows the export and import trade (in million RM) for four months of sales in a certain year. Using the information provided, present the data in a component bar graph.

Month       Export   Import
September   28       20
October     30       28
November    32       17
December    24       14

6. The following information represents the maximum rainfall in millimeters (mm) in each state in Malaysia. You are supposed to help a meteorologist in your area to make an analysis. Based on your knowledge, present this information using the most appropriate chart and give your comments.

State                                 Quantity (mm)
Perlis                                 435
Kedah                                  512
Pulau Pinang                           163
Perak                                  721
Selangor                               664
Wilayah Persekutuan Kuala Lumpur      1003
Negeri Sembilan                        390
Melaka                                 223
Johor                                  876
Pahang                                1050
Terengganu                            1255
Kelantan                               986
Sarawak                                878
Sabah                                  456

2.3 ORGANIZING AND GRAPHING QUANTITATIVE DATA

2.3.1 Stem-and-Leaf Display

In a stem-and-leaf display of quantitative data, each value is divided into two portions – a stem and a leaf. The leaves for each stem are then shown separately in a display.

It shows the pattern of the data.

It can reveal which values are repeated frequently.

Example 7

25  12   9  10   5  12  23   7
36  13  11  12  31  28  37   6
14  41  38  44  13  22  18  19

Solution:

0 | 9 5 7 6
1 | 2 0 2 3 1 2 4 3 8 9
2 | 5 3 8 2
3 | 6 1 7 8
4 | 1 4
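The display above can be generated mechanically by splitting each value into its tens digit (stem) and units digit (leaf). The sketch below assumes non-negative two-digit integers, as in this data set; `stem_and_leaf` is an illustrative name.

```python
from collections import defaultdict

data = [25, 12, 9, 10, 5, 12, 23, 7,
        36, 13, 11, 12, 31, 28, 37, 6,
        14, 41, 38, 44, 13, 22, 18, 19]


def stem_and_leaf(values):
    """Group leaves (units digits) by stem (tens digit), keeping data order."""
    display = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return dict(sorted(display.items()))


display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(stem, "|", " ".join(map(str, leaves)))
```

Sorting the leaves within each stem (for example with `sorted(leaves)`) gives the ranked version of the display.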

2.3.2 Frequency Distributions


A frequency distribution for quantitative data lists all the classes and the

number of values that belong to each class.

Data presented in the form of a frequency distribution are called grouped data.

Class Boundary

The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class. It is also called the real class limit.

To find the midpoint of the upper limit of the first class and the lower limit of the second class, we divide the sum of these two limits by 2.

e.g.: if the first class ends at 600 and the second class begins at 601, the boundary between them is (600 + 601)/2 = 600.5.

Class Width (class size)

Class width = Upper boundary – Lower boundary

e.g.: Width of the first class = 600.5 – 400.5 = 200

Class Midpoint or Mark

$\text{Class midpoint} = \dfrac{\text{Lower limit} + \text{Upper limit}}{2}$

Constructing Frequency Distribution Tables

1. To decide the number of classes, we use Sturges' formula:

$c = 1 + 3.3 \log_{10} n$

where c is the number of classes and n is the number of observations in the data set.

2. Class width:

$i = \dfrac{\text{Largest value} - \text{Smallest value}}{c}$

This class width is rounded up to a convenient number.

3. Lower limit of the first class (the starting point): use the smallest value in the data set.

Example 8

The following data give the total home runs hit by all players of each of the 30 Major League Baseball teams during the 2004 season.

i) Number of classes, c = 1 + 3.3 log 30 = 1 + 3.3(1.48) = 5.88 ≈ 6 classes

ii) Class width, i = (242 – 135)/6 = 17.8 ≈ 18

iii) Starting point = 135

Table 2.10: Frequency Distribution for Data of Table 2.9

Total Home Runs   Tally          f
135 – 152         ||||| |||||   10
153 – 170         ||             2
171 – 188         |||||          5
189 – 206         ||||| |        6
207 – 224         |||            3
225 – 242         ||||           4
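The three construction steps can be sketched in code. The sketch assumes integer data, that the number of classes is rounded up, and that the largest value is 242 (the upper limit of the last class); the function names are illustrative only.

```python
import math


def sturges_classes(n):
    """Number of classes from c = 1 + 3.3 log10(n), rounded up (assumption)."""
    return math.ceil(1 + 3.3 * math.log10(n))


def class_limits(smallest, largest, n):
    """Return the class width and the (lower, upper) limits of each class."""
    c = sturges_classes(n)
    width = math.ceil((largest - smallest) / c)  # rounded up to a whole number
    limits = []
    lower = smallest
    for _ in range(c):
        limits.append((lower, lower + width - 1))
        lower += width
    return width, limits


# 30 teams, starting point 135, assumed largest value 242.
width, limits = class_limits(135, 242, 30)
print(width)
for lower, upper in limits:
    print(f"{lower} – {upper}")
```

With these inputs the classes run from 135 – 152 up to 225 – 242, which agrees with the class boundaries 134.5, 152.5, ..., 242.5 used for this data set.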

2.3.3 Relative Frequency and Percentage Distributions


(Refer to Example 8)

Table 2.11: Relative Frequency and Percentage Distributions

Total Home Runs   Class Boundaries           Relative Frequency   %
135 – 152         134.5 to less than 152.5   0.3333               33.33
153 – 170         152.5 to less than 170.5   0.0667                6.67
171 – 188         170.5 to less than 188.5   0.1667               16.67
189 – 206         188.5 to less than 206.5   0.2000               20.00
207 – 224         206.5 to less than 224.5   0.1000               10.00
225 – 242         224.5 to less than 242.5   0.1333               13.33
Total                                        1.0                  100%

2.3.4 Graphing Grouped Data

a) Histograms

A histogram is a graph in which the class boundaries are marked on the

horizontal axis and either the frequencies, relative frequencies, or percentages

are marked on the vertical axis. The frequencies, relative frequencies or

percentages are represented by the heights of the bars.

In a histogram, the bars are drawn adjacent to each other, and there is a space between the y-axis and the first bar.

(Refer to Example 8)

Frequency histogram for Table 2.9

b) Polygon

A graph formed by joining the midpoints of the tops of successive bars in a

histogram with straight lines is called a polygon.

[Figure: frequency histogram of total home runs; horizontal axis: class boundaries 134.5, 152.5, 170.5, 188.5, 206.5, 224.5, 242.5; vertical axis: frequency, 0 to 12]

Frequency polygon for Table 2.11

For a very large data set, as the number of classes is increased (and the width of

classes is decreased), the frequency polygon eventually becomes a smooth

curve called a frequency distribution curve or simply a frequency curve.

Frequency distribution curve

c) Shape of Histogram

The shape of a histogram is described in the same way as that of a polygon.

The most common shapes are:

(i) Symmetric

(ii) Right skewed

(iii) Left skewed


[Figure: symmetric histograms]

[Figure: right-skewed and left-skewed histograms]

Describing data using graphs gives us insight into the main characteristics of the data.

When interpreting a graph, we should be very cautious. We should observe

carefully whether the frequency axis has been truncated or whether any axis has

been unnecessarily shortened or stretched.

2.3.5 Cumulative Frequency Distributions

A cumulative frequency distribution gives the total number of values that

fall below the upper boundary of each class.

Example 12

Using the frequency distribution of Table 2.11,

Total Home Runs   Class Boundaries            f   Cumulative Frequency
135 – 152         134.5 to less than 152.5   10   10
153 – 170         152.5 to less than 170.5    2   10 + 2 = 12
171 – 188         170.5 to less than 188.5    5   10 + 2 + 5 = 17
189 – 206         188.5 to less than 206.5    6   10 + 2 + 5 + 6 = 23
207 – 224         206.5 to less than 224.5    3   10 + 2 + 5 + 6 + 3 = 26
225 – 242         224.5 to less than 242.5    4   10 + 2 + 5 + 6 + 3 + 4 = 30
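The running totals in the cumulative frequency column are simply partial sums of the class frequencies. A minimal sketch using the standard library:

```python
from itertools import accumulate

# Class frequencies for the home-run data, in class order.
freqs = [10, 2, 5, 6, 3, 4]

# Cumulative frequency: number of values below each upper class boundary.
cumulative = list(accumulate(freqs))
print(cumulative)
```

The last cumulative frequency always equals the total number of observations, here 30.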

Ogive

An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of the respective classes.

Two types of ogive:

(i) ogive less than
(ii) ogive more than

First, build a table of cumulative frequencies.

Example 13

(Ogive Less Than)

Earnings (RM)   Number of students (f)   Earnings (RM)    Cumulative Frequency (F)
30 – 39         5                        Less than 29.5    0
40 – 49         6                        Less than 39.5    5
50 – 59         6                        Less than 49.5   11
60 – 69         3                        Less than 59.5   17
70 – 79         3                        Less than 69.5   20
80 – 89         7                        Less than 79.5   23
                                         Less than 89.5   30
Total           30

Graph Ogive Less Than

[Figure: less-than ogive; horizontal axis: earnings, 29.5 to 89.5; vertical axis: cumulative frequency, 0 to 35]

(Ogive More Than)

Earnings (RM)   Number of students (f)   Earnings (RM)    Cumulative Frequency (F)
30 – 39         5                        More than 29.5   30
40 – 49         6                        More than 39.5   25
50 – 59         6                        More than 49.5   19
60 – 69         3                        More than 59.5   13
70 – 79         3                        More than 69.5   10
80 – 89         7                        More than 79.5    7
                                         More than 89.5    0
Total           30

Graph Ogive More Than

2.3.6 Box-Plot

A box plot describes the data graphically using five measurements: smallest value, first quartile (K1), second quartile (median or K2), third quartile (K3) and largest value.

[Figure: box plots for symmetric, left-skewed and right-skewed data, each marking the smallest value, K1, the median, K3 and the largest value]

2.4 MEASURES OF CENTRAL TENDENCY

2.4.1 Ungrouped Data Measurement

Mean

Mean for population data: $\mu = \dfrac{\sum x}{N}$

Mean for sample data: $\bar{x} = \dfrac{\sum x}{n}$

where:
$\sum x$ = the sum of all values
N = the population size
n = the sample size
µ = the population mean
$\bar{x}$ = the sample mean

Example 15

The following data give the prices (rounded to thousand RM) of five homes sold recently in Sekayang.

158 189 265 127 191


Find the mean sale price for these homes.

Solution:

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{158 + 189 + 265 + 127 + 191}{5} = \dfrac{930}{5} = 186$

Thus, these five homes were sold for an average price of RM186 thousand, i.e. RM186 000.

The mean has the advantage that its calculation includes each value of the data set.

Weighted Mean

Used when the values do not all carry the same weight (importance).

Weighted mean: $\bar{x}_w = \dfrac{\sum wx}{\sum w}$

where w is a weight.

Example 16

Consider the data on electrical components purchased from a factory in the table below:

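Type    Number of components (w)   Cost/unit (x)
1       1200                       RM3.00
2        500                       RM3.40
3       2500                       RM2.80
4       1000                       RM2.90
5        800                       RM3.25
Total   6000

Solution:

$\bar{x}_w = \dfrac{\sum wx}{\sum w} = \dfrac{1200(3.00) + 500(3.40) + 2500(2.80) + 1000(2.90) + 800(3.25)}{6000} = \dfrac{17\,800}{6000} \approx 2.97$

The mean cost of a unit of the component is RM2.97.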
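The weighted mean for the component data can be verified with a short sketch, taking the number of components as the weight w and the unit cost as x. The function name is illustrative.

```python
weights = [1200, 500, 2500, 1000, 800]   # number of components (w)
costs = [3.00, 3.40, 2.80, 2.90, 3.25]   # cost per unit in RM (x)


def weighted_mean(values, weights):
    """Sum of w*x divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


mean_cost = weighted_mean(costs, weights)
print(round(mean_cost, 2))
```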

Median

Median is the value of the middle term in a data set that has been

ranked in increasing order.

Procedure for finding the median

Step 1: Rank the data set in increasing order.

Step 2: Determine the depth (position or location) of the median: $\text{depth of median} = \dfrac{n + 1}{2}$

Step 3: Determine the value of the median.

Example 17

Find the median for the following data: 10 5 19 8 3

Solution:

(1) Rank the data in increasing order

3 5 8 10 19

(2) Determine the depth of the median

$\text{depth} = \dfrac{n + 1}{2} = \dfrac{5 + 1}{2} = 3$

(3) Determine the value of the median

The median is therefore located in the third position of the data set.

Hence, the median for the above data = 8

Example 18

Find the median for the following data: 10 5 19 8 3 15

Solution:

(1) Rank the data in increasing order

3 5 8 10 15 19

(2) Determine the depth of the median

$\text{depth} = \dfrac{n + 1}{2} = \dfrac{6 + 1}{2} = 3.5$

(3) Determine the value of the median

The median is therefore located midway between the 3rd and 4th positions of the data set.

Hence, the median for the above data = (8 + 10)/2 = 9

The median gives the center of a histogram, with half of the data values to the left of (or less than) the median and half to the right of (or more than) the median.

The advantage of using the median is that it is not influenced by outliers.


Mode

Mode is the value that occurs with the highest frequency in a data set.

Example 19

1. What is the mode for the given data? 77 69 74 81 71 68 74 73

2. What is the mode for the given data? 77 69 68 74 81 71 68 74 73

Solution:

1. Mode = 74 (this number occurs twice): unimodal

2. Mode = 68 and 74: bimodal

A major shortcoming of the mode is that a data set may have none or

may have more than one mode.

One advantage of the mode is that it can be calculated for both kinds of

data, quantitative and qualitative.
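The ungrouped measures of central tendency can be checked with Python's standard `statistics` module. The sketch below reuses the home prices, the two median data sets and the two mode data sets from this section; `multimode` requires Python 3.8 or later.

```python
import statistics

# Mean of the five home prices (thousand RM).
prices = [158, 189, 265, 127, 191]
mean_price = statistics.mean(prices)

# Median for an odd and an even number of values.
median_odd = statistics.median([10, 5, 19, 8, 3])
median_even = statistics.median([10, 5, 19, 8, 3, 15])

# multimode returns every value that shares the highest frequency.
modes_1 = statistics.multimode([77, 69, 74, 81, 71, 68, 74, 73])
modes_2 = statistics.multimode([77, 69, 68, 74, 81, 71, 68, 74, 73])

print(mean_price, median_odd, median_even, modes_1, modes_2)
```

A list with one entry indicates a unimodal data set and a list with two entries a bimodal one.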

2.4.2 Grouped Data Measurement

Mean

Mean for population data: $\mu = \dfrac{\sum fx}{N}$

Mean for sample data: $\bar{x} = \dfrac{\sum fx}{n}$

where x is the midpoint and f is the frequency of a class.

Example 20

The following table gives the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail-order company. Calculate the mean.

Number of orders   f
10 – 12             4
13 – 15            12
16 – 18            20
19 – 21            14
                   n = 50

Solution:

Because the data set includes only 50 days, it represents a sample. The value of $\sum fx$ is calculated in the following table:

Number of orders   f    x    fx
10 – 12             4   11    44
13 – 15            12   14   168
16 – 18            20   17   340
19 – 21            14   20   280
                   n = 50    $\sum fx$ = 832

The value of the sample mean is:

$\bar{x} = \dfrac{\sum fx}{n} = \dfrac{832}{50} = 16.64$

Thus, this mail-order company received an average of 16.64 orders per day during these 50 days.
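The grouped mean can be reproduced by taking each class midpoint as x. This is a sketch with an illustrative function name; the classes are given as (lower limit, upper limit) pairs.

```python
classes = [(10, 12), (13, 15), (16, 18), (19, 21)]
freqs = [4, 12, 20, 14]


def grouped_mean(classes, freqs):
    """Mean of grouped data: sum of f*x over n, with x the class midpoint."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    return sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)


mean_orders = grouped_mean(classes, freqs)
print(mean_orders)
```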

Median

Step 1: Construct the cumulative frequency distribution.

Step 2: Decide the class that contains the median. The median class is the first class whose cumulative frequency is at least n/2.

Step 3: Find the median by using the following formula:

$\text{Median} = L_m + \left( \dfrac{\frac{n}{2} - F}{f_m} \right) i$

where:
n = the total frequency
F = the cumulative frequency before the median class
i = the class width
$L_m$ = the lower boundary of the median class
$f_m$ = the frequency of the median class

Example 21

Based on the grouped data below, find the median:

Time to travel to work   Frequency
1 – 10                    8
11 – 20                  14
21 – 30                  12
31 – 40                   9
41 – 50                   7

Solution:

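1st Step: Construct the cumulative frequency distribution

Time to travel to work   Frequency   Cumulative Frequency
1 – 10                    8           8
11 – 20                  14          22
21 – 30                  12          34
31 – 40                   9          43
41 – 50                   7          50

2nd Step: n/2 = 50/2 = 25, so the median class is the 3rd class (21 – 30), the first class whose cumulative frequency is at least 25.

So, F = 22, $f_m$ = 12, $L_m$ = 20.5 and i = 10

Therefore,

$\text{Median} = 20.5 + \left( \dfrac{25 - 22}{12} \right) 10 = 23$

Thus, 25 persons take less than 23 minutes to travel to work and another 25 persons take more than 23 minutes to travel to work.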

Mode

Mode is the value that has the highest frequency in a data set.


For grouped data, the class mode (or modal class) is the class with the highest frequency.

Formula of mode for grouped data:

$\text{Mode} = L_{mo} + \left( \dfrac{\Delta_1}{\Delta_1 + \Delta_2} \right) i$

where:

$L_{mo}$ is the lower boundary of the class mode

$\Delta_1$ is the difference between the frequency of the class mode and the frequency of the class before the class mode

$\Delta_2$ is the difference between the frequency of the class mode and the frequency of the class after the class mode

i is the class width

Example 22

Based on the grouped data below, find the mode.

Time to travel to work   Frequency
1 – 10                    8
11 – 20                  14
21 – 30                  12
31 – 40                   9
41 – 50                   7

Solution:

Based on the table, the class mode is 11 – 20 (highest frequency, 14), so

$L_{mo}$ = 10.5, $\Delta_1$ = (14 – 8) = 6, $\Delta_2$ = (14 – 12) = 2 and i = 10

$\text{Mode} = 10.5 + \left( \dfrac{6}{6 + 2} \right) 10 = 18$
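With a lower boundary of 10.5, differences of 6 and 2, and a class width of 10, the grouped-mode formula can be evaluated as in the sketch below. It assumes integer class limits (so each boundary lies 0.5 below the lower limit) and treats a missing neighbouring class as having frequency 0; both are assumptions of this sketch, not statements from the notes.

```python
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
freqs = [8, 14, 12, 9, 7]


def grouped_mode(classes, freqs):
    """Mode of grouped data from the class with the highest frequency."""
    k = max(range(len(freqs)), key=freqs.__getitem__)  # index of class mode
    lower_limit, upper_limit = classes[k]
    lower_boundary = lower_limit - 0.5        # assumes integer class limits
    width = upper_limit - lower_limit + 1
    before = freqs[k - 1] if k > 0 else 0     # assumption: 0 if no class before
    after = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = freqs[k] - before
    d2 = freqs[k] - after
    return lower_boundary + d1 / (d1 + d2) * width


mode_time = grouped_mode(classes, freqs)
print(mode_time)
```

For the travel-time data this evaluates 10.5 + (6 / 8) × 10.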

We can also obtain the mode by using the histogram;


2.4.3 Relationship among Mean, Median & Mode

As discussed in the previous topic, a histogram or a frequency distribution curve can assume either a skewed shape or a symmetrical shape.

Knowing the values of the mean, median and mode can give us some idea about the shape of the frequency curve.

(1) For a symmetrical histogram and frequency curve with one peak, the values of the mean, median and mode are identical and they lie at the center of the distribution.

Mean, median, and mode for a symmetric histogram and frequency distribution curve

(2) For a histogram and a frequency curve skewed to the right, the value of the mean is the largest, that of the mode is the smallest, and the value of the median lies between these two.


Mean, median, and mode for a histogram and frequency distribution curve skewed to the right

(3) For a histogram and a frequency curve skewed to the left, the value of

the mean is the smallest and that of the mode is the largest and the

value of the median lies between these two.

Mean, median, and mode for a histogram and frequency distribution curve skewed to the left

2.5 DISPERSION MEASUREMENT

The measures of central tendency such as mean, median and mode do not

reveal the whole picture of the distribution of a data set.

Two data sets with the same mean may have completely different spreads.

The variation among the values of observations for one data set may be

much larger or smaller than for the other data set.

2.5.1 Ungrouped Data Measurement

Range

Range = Largest value – Smallest value

Example 23

Find the range of production for this data set,

Solution:

Range = Largest value – Smallest value = 267 277 – 49 651 = 217 626

Disadvantages:

o It is influenced by outliers.

o It is based on two values only. All other values in the data set are ignored.

Variance and Standard Deviation

Standard deviation is the most commonly used measure of dispersion.

The value of the standard deviation tells how closely the values of a data set are clustered around the mean.

A lower value of standard deviation indicates that the values of the data set are spread over a relatively smaller range around the mean.

A larger value of standard deviation indicates that the values of the data set are spread over a relatively larger range around the mean (far from the mean).

The standard deviation is obtained by taking the positive square root of the variance.

Variance for population: $\sigma^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$

Variance for sample: $s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$

Standard deviation for population: $\sigma = \sqrt{\sigma^2}$

Standard deviation for sample: $s = \sqrt{s^2}$

Example 24

Let x denote the total production (in units) of a company.

Company   Production
A          62
B          93
C         126
D          75
E          34

Find the variance and standard deviation.

Solution:

Company   Production (x)    x²
A          62                3 844
B          93                8 649
C         126               15 876
D          75                5 625
E          34                1 156
Total     $\sum x$ = 390    $\sum x^2$ = 35 150

$s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \dfrac{35\,150 - \frac{390^2}{5}}{5 - 1} = \dfrac{35\,150 - 30\,420}{4} = 1182.50$

Since $s^2$ = 1182.50,

$s = \sqrt{1182.50} \approx 34.39$

The properties of variance and standard deviation:

o The standard deviation is a measure of variation of all values from the

mean.

o The value of the variance and the standard deviation are never

negative. Also, larger values of variance or standard deviation indicate

greater amounts of variation.

o The value of s can increase dramatically with the inclusion of one or

more outliers.

o The measurement units of variance are always the square of the

measurement units of the original data while the units of standard

deviation are the same as the units of the original data values.
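The sample variance of the production data (62, 93, 126, 75, 34) can be checked with the shortcut form that uses the sums of x and x². This is a sketch; it also cross-checks the result against the standard library's `statistics.variance`, which uses the same n – 1 divisor.

```python
import math
import statistics

production = [62, 93, 126, 75, 34]


def sample_variance(values):
    """Shortcut formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


s2 = sample_variance(production)
s = math.sqrt(s2)
print(s2, round(s, 2))

# Cross-check against the standard library implementation.
library_s2 = statistics.variance(production)
```

The variance is in squared units of production, while the standard deviation is back in the original units.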

2.5.2 Grouped Data Measurement

Range

Range = Upper bound of last class – Lower bound of first class

Class       Frequency
41 – 50      1
51 – 60      3
61 – 70      7
71 – 80     13
81 – 90     10
91 – 100     6
Total       40

Upper bound of last class = 100.5
Lower bound of first class = 40.5
Range = 100.5 – 40.5 = 60

Variance and Standard Deviation

Variance for population: $\sigma^2 = \dfrac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N}$

Variance for sample: $s^2 = \dfrac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}$

Standard deviation:

Population: $\sigma = \sqrt{\sigma^2}$

Sample: $s = \sqrt{s^2}$

Example 25

Find the variance and standard deviation for the following data:

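No. of orders   f
10 – 12          4
13 – 15         12
16 – 18         20
19 – 21         14
Total           n = 50

Solution:

No. of orders   f    x    fx    fx²
10 – 12          4   11    44    484
13 – 15         12   14   168   2352
16 – 18         20   17   340   5780
19 – 21         14   20   280   5600
Total       n = 50        832   14216

Variance,

$s^2 = \dfrac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \dfrac{14216 - \frac{832^2}{50}}{50 - 1} = \dfrac{14216 - 13844.48}{49} = 7.5820$

Standard deviation,

$s = \sqrt{s^2} = \sqrt{7.5820} = 2.75$

Thus, the standard deviation of the number of orders received at the office of this mail-order company during the past 50 days is 2.75.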
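The grouped variance and standard deviation for the mail-order data can be reproduced from the class midpoints. This is a sketch with an illustrative function name; it uses the sample (n – 1) form.

```python
import math

classes = [(10, 12), (13, 15), (16, 18), (19, 21)]
freqs = [4, 12, 20, 14]


def grouped_sample_variance(classes, freqs):
    """(sum f*x^2 - (sum f*x)^2 / n) / (n - 1), with x the class midpoint."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


s2_orders = grouped_sample_variance(classes, freqs)
s_orders = math.sqrt(s2_orders)
print(round(s2_orders, 4), round(s_orders, 2))
```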

2.5.3 Relative Dispersion Measurement

Used to compare the dispersion of two or more distributions that have different units, OR

to compare two or more distributions that have the same unit but a big difference in their mean values.

Also called the modified coefficient or coefficient of variation, CV.

$CV = \dfrac{s}{\bar{x}} \times 100\%$ (sample)

$CV = \dfrac{\sigma}{\mu} \times 100\%$ (population)

Example 26

Given the mean and standard deviation of the monthly salary for two groups of workers at ABC company: Group 1: 700 and 20; Group 2: 1070 and 20. Find the CV for each group and determine which group is more dispersed.

Solution:

$CV_1 = \dfrac{20}{700} \times 100\% = 2.86\%$

$CV_2 = \dfrac{20}{1070} \times 100\% = 1.87\%$

The monthly salary of the Group 1 workers is more dispersed than that of Group 2.
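The comparison of the two salary groups can be sketched as follows, using the stated means (700 and 1070) and the common standard deviation of 20. The function name is illustrative.

```python
def coefficient_of_variation(std, mean):
    """Standard deviation as a percentage of the mean."""
    return std / mean * 100


cv_group1 = coefficient_of_variation(20, 700)
cv_group2 = coefficient_of_variation(20, 1070)
print(round(cv_group1, 2), round(cv_group2, 2))

# The group with the larger CV is relatively more dispersed.
more_dispersed = "Group 1" if cv_group1 > cv_group2 else "Group 2"
```

Although both groups have the same standard deviation, the smaller mean of Group 1 makes its relative dispersion larger.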

2.6 MEASURE OF POSITION

Determines the position of a single value in relation to other values in a

sample or a population data set.

Quartiles

Quartiles are three summary measures that divide ranked data set into four equal parts.


o The 1st quartile – denoted as Q1

o The 2nd quartile – the median of the data set, or Q2

o The 3rd quartile – denoted as Q3

Example 27

The table below lists the total revenue for the 11 top tourism companies in Malaysia.

109.7 79.9 121.2 76.4 80.2 82.1 79.4 89.3 98.0 103.5 86.8

Solution:

Step 1: Arrange the data in increasing order

76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

Step 2: Determine the depth for Q1 and Q3

$\text{depth of } Q_1 = \dfrac{n + 1}{4} = \dfrac{11 + 1}{4} = 3$

$\text{depth of } Q_3 = \dfrac{3(n + 1)}{4} = \dfrac{3(11 + 1)}{4} = 9$

Step 3: Determine Q1 and Q3

Q1 is the 3rd value and Q3 is the 9th value of the ranked data:

Q1 = 79.9 ; Q3 = 103.5

Example 28

The table below lists the total revenue for the 12 top tourism companies in Malaysia.

109.7 79.9 74.1 121.2 76.4 80.2 82.1 79.4 89.3 98.0 103.5 86.8

Solution:

Step 1: Arrange the data in increasing order

74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

Step 2: Determine the depth for Q1 and Q3

$\text{depth of } Q_1 = \dfrac{n + 1}{4} = \dfrac{12 + 1}{4} = 3.25$

$\text{depth of } Q_3 = \dfrac{3(n + 1)}{4} = \dfrac{3(12 + 1)}{4} = 9.75$

Step 3: Determine Q1 and Q3

Q1 lies a quarter of the way from the 3rd value to the 4th value, and Q3 lies three quarters of the way from the 9th value to the 10th value:

Q1 = 79.4 + 0.25 (79.9 – 79.4) = 79.525

Q3 = 98.0 + 0.75 (103.5 – 98.0) = 102.125
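The interpolation above can be sketched in code, assuming the depth of the k-th quartile is k(n + 1)/4, which reproduces the values shown for both the 11-company and the 12-company data. The function name is illustrative, and the sketch assumes at least three observations.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using depth k(n + 1)/4 on ranked data."""
    ranked = sorted(values)
    depth = k * (len(ranked) + 1) / 4      # 1-based position, may be fractional
    position = int(depth)                  # whole part of the depth
    fraction = depth - position
    if fraction == 0:
        return ranked[position - 1]
    # Interpolate between the two neighbouring ranked values.
    below, above = ranked[position - 1], ranked[position]
    return below + fraction * (above - below)


revenue_12 = [109.7, 79.9, 74.1, 121.2, 76.4, 80.2,
              82.1, 79.4, 89.3, 98.0, 103.5, 86.8]
q1 = quartile(revenue_12, 1)
q3 = quartile(revenue_12, 3)
iqr = q3 - q1
print(round(q1, 3), round(q3, 3), round(iqr, 1))
```

Other textbooks and software packages use slightly different depth rules, so their quartiles may differ a little from these.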

Interquartile Range

The difference between the third quartile and the first quartile of a data set.

IQR = Q3 – Q1

Example 29

By referring to Example 28, calculate the IQR.

Solution:

IQR = Q3 – Q1 = 102.125 – 79.525 = 22.6

2.6.2 Grouped Data Measurement

Quartiles

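Following the same approach as the median formula for grouped data, Q1 and Q3 are obtained as follows:

Q1 = L + ((n/4 – F) / f) × i

Q3 = L + ((3n/4 – F) / f) × i

Where: L = the lower boundary of the quartile class, n = the total frequency, F = the cumulative frequency before the quartile class, f = the frequency of the quartile class, i = the class width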


Example 30

Refer to Example 22 and find Q1 and Q3.

Solution:

1st Step: Construct the cumulative frequency distribution

Time to travel to work    Frequency    Cumulative Frequency
1 – 10                    8            8
11 – 20                   14           22
21 – 30                   12           34
31 – 40                   9            43
41 – 50                   7            50

2nd Step: Determine Q1 and Q3

Position of Q1 = n/4 = 50/4 = 12.5, so the Q1 class is the 2nd class (11 – 20).

Therefore,

Q1 = 10.5 + ((12.5 – 8) / 14) × 10 = 13.7143


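Position of Q3 = 3n/4 = 3(50)/4 = 37.5, so the Q3 class is the 4th class (31 – 40).

Therefore,

Q3 = 30.5 + ((37.5 – 34) / 9) × 10 = 34.3889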

Interquartile Range

IQR = Q3 – Q1

Example 31

Refer to Example 30 and calculate the IQR.

Solution:

IQR = Q3 – Q1 = 34.3889 – 13.7143 = 20.6746
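The grouped-data quartiles and IQR above can be reproduced with the following Python sketch. It assumes the interpolation formula L + ((qn/4 – F) / f) × i, where L is the lower class boundary, F the cumulative frequency before the quartile class, f its frequency and i the class width. The function name `grouped_quartile` and the tuple layout are illustrative choices, not part of the notes.

```python
def grouped_quartile(classes, q):
    """Estimate quartile q (1, 2 or 3) from a grouped frequency table.

    classes: list of (lower_limit, upper_limit, frequency) in ascending
    order, with at least two classes of equal width.
    """
    n = sum(freq for _, _, freq in classes)
    target = q * n / 4                          # position of the quartile
    gap = classes[1][0] - classes[0][1]         # gap between class limits
    cumulative = 0                              # F: cumulative frequency so far
    for lower, upper, freq in classes:
        if cumulative + freq >= target:         # quartile class found
            boundary = lower - gap / 2          # L: lower class boundary
            width = upper - lower + gap         # i: class width
            return boundary + (target - cumulative) / freq * width
        cumulative += freq
    raise ValueError("quartile class not found")


# Time to travel to work: (lower limit, upper limit, frequency)
travel_time = [(1, 10, 8), (11, 20, 14), (21, 30, 12), (31, 40, 9), (41, 50, 7)]

q1 = grouped_quartile(travel_time, 1)
q3 = grouped_quartile(travel_time, 3)
print(round(q1, 4), round(q3, 4), round(q3 - q1, 4))
```

For this table the sketch gives Q1 = 13.7143, Q3 = 34.3889 and IQR = 20.6746, the same values as in the worked solution.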

2.7 MEASURE OF SKEWNESS

A measure of skewness determines whether a data set is symmetric, left skewed or right skewed.

It is also called the skewness coefficient or Pearson's coefficient of skewness.


Sk = (Mean – Mode) / s   or   Sk = 3(Mean – Median) / s

If Sk is positive, the distribution is right skewed.

If Sk is negative, the distribution is left skewed.

If Sk = 0, the distribution is symmetric.

If Sk takes a value in (-0.9999, -0.0001) or (0.0001, 0.9999), the distribution is approximately symmetric.

Example 32

The duration of stay of cancer patients warded in Hospital Seberang Jaya is recorded in a frequency distribution. From the record, the mean is 28 days, the median is 25 days and the mode is 23 days. The standard deviation is 4.2 days.

a. What is the type of distribution?

b. Find the skewness coefficient.

Solution:

a. This distribution is right skewed because the mean is the largest of the three values (mode < median < mean).

b. Sk = (Mean – Mode) / s = (28 – 23) / 4.2 = 1.1905, or Sk = 3(Mean – Median) / s = 3(28 – 25) / 4.2 = 2.1429

So, from the Sk value, this distribution is right skewed.
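As a quick check of the hospital example, the sketch below computes both forms of Pearson's coefficient of skewness, (Mean – Mode) / s and 3(Mean – Median) / s. The function names are hypothetical, and only the sign rule from this section is applied when describing the shape.

```python
def skewness_from_mode(mean, mode, std_dev):
    """Pearson's coefficient of skewness based on the mode."""
    return (mean - mode) / std_dev


def skewness_from_median(mean, median, std_dev):
    """Pearson's coefficient of skewness based on the median."""
    return 3 * (mean - median) / std_dev


def describe(sk):
    """Classify the shape from the sign of the coefficient."""
    if sk > 0:
        return "right skewed"
    if sk < 0:
        return "left skewed"
    return "symmetric"


# Mean 28 days, median 25 days, mode 23 days, standard deviation 4.2 days.
sk_mode = skewness_from_mode(28, 23, 4.2)
sk_median = skewness_from_median(28, 25, 4.2)

print(round(sk_mode, 4), describe(sk_mode))
print(round(sk_median, 4), describe(sk_median))
```

The two forms give different magnitudes, but both are positive here, so they agree on the direction of the skew.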

ADDITIONAL INFORMATION

Use of Standard Deviation

1. Chebyshev’s Theorem


According to Chebyshev’s Theorem, for any number k greater than 1, at least (1 – 1/k²) of the data values lie within k standard deviations of the mean.

Thus, for example, if k = 2, then

1 – 1/k² = 1 – 1/2² = 1 – 1/4 = 0.75 = 75%

Therefore, according to Chebyshev’s Theorem, at least 75% of the values of a data set lie within two standard deviations of the mean.
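The bound 1 – 1/k² is easy to tabulate for other values of k. The sketch below is only an illustration; the function name `chebyshev_lower_bound` is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k greater than 1")
    return 1 - 1 / k**2


for k in (1.5, 2, 3):
    bound = chebyshev_lower_bound(k)
    print(f"k = {k}: at least {bound:.1%} of the values")
```

The bound holds for any data set, whatever its shape, which is why it is weaker than the figures quoted for bell-shaped distributions.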

2. Empirical Rule

For a bell-shaped distribution, approximately

1. 68% of the observations lie within one standard deviation of the mean.

2. 95% of the observations lie within two standard deviations of the mean.

3. 99.7% of the observations lie within three standard deviations of the mean.
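These three percentages can be checked numerically. The sketch below assumes that "bell-shaped" means an exactly normal distribution and uses `NormalDist` from Python's standard `statistics` module; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

# Assumption: the bell-shaped distribution is modelled as a standard normal.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

The exact normal values are about 68.27%, 95.45% and 99.73%, which the rule rounds to 68%, 95% and 99.7%.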

Measure of Position

1. Ungrouped Data - Quartile Deviation

QD is half of the interquartile range.

It is used to compare the dispersion of two data sets.

If the QD value is high, it means that the data are more dispersed.

Quartile Deviation = Interquartile Range / 2 = (Q3 - Q1) / 2

2. Ungrouped Data – Percentile

Pk = value of the (kn / 100)th term in a ranked data set

Where: k = the number of the percentile, n = the sample size

Percentile rank of xi = (Number of values less than xi / Total number of values in the data set) × 100
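The quartile deviation, percentile and percentile rank formulas above can be sketched in Python using the 12 revenue values from the quartile example. Two points are assumptions rather than statements from the notes: a fractional position kn/100 is rounded up to the next whole term, and the percentile rank counts values strictly less than xi. All function names are hypothetical.

```python
import math


def quartile_deviation(q1, q3):
    """Half of the interquartile range."""
    return (q3 - q1) / 2


def percentile(data, k):
    """Value of the (k * n / 100)th term in the ranked data.

    Assumption: a fractional position is rounded up to the next term.
    """
    ranked = sorted(data)
    position = max(math.ceil(k * len(ranked) / 100), 1)
    return ranked[position - 1]


def percentile_rank(data, x):
    """Percentage of values in the data set that are less than x."""
    below = sum(1 for value in data if value < x)
    return below / len(data) * 100


revenue = [109.7, 79.9, 74.1, 121.2, 76.4, 80.2, 82.1, 79.4, 89.3, 98.0, 103.5, 86.8]

print(round(quartile_deviation(79.525, 102.125), 3))   # Q1 and Q3 from the 12-company example
print(percentile(revenue, 25), percentile(revenue, 50))
print(round(percentile_rank(revenue, 98.0), 2))
```

Because this percentile rule reads off a single ranked term, its 25th percentile (79.4) differs slightly from the interpolated Q1 (79.525) obtained with the depth method.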


EXERCISE 2

1. A survey research company asks 100 people how many times they have been to the dentist in the last five years. Their grouped responses appear below.

Number of Visits    Number of Responses
0 – 4               16
5 – 9               25
10 – 14             48
15 – 19             11

What are the mean and variance of the data?

2. A researcher asked 25 consumers: “How much would you pay for a television adapter that provides Internet access?” Their grouped responses are as follows:

Amount ($)    Number of Responses
0 – 99        2
100 – 199     2
200 – 249     3
250 – 299     3
300 – 349     6
350 – 399     3
400 – 499     4
500 – 999     2

Calculate the mean, variance, and standard deviation.

3. The following data give the pairs of shoes sold per day by a particular shoe store in the last 20 days.

85 90 89 70 79 80 83 83 75 76
89 86 71 76 77 89 70 65 90 86

Calculate the

a. mean and interpret the value.

b. median and interpret the value.

c. mode and interpret the value.

d. standard deviation.

4. The following data show the serving time (in minutes) for 40 customers at a post office:

2.0 4.5 2.5 2.9 4.2 2.9 3.5 2.8
3.2 2.9 4.0 3.0 3.8 2.5 2.3 3.5
2.1 3.1 3.6 4.3 4.7 2.6 4.1 3.1
4.6 2.8 5.1 2.7 2.6 4.4 3.5 3.0
2.7 3.9 2.9 2.9 2.5 3.7 3.3 2.4


a. Construct a frequency distribution table with a class width of 0.5.

b. Construct a histogram.

c. Calculate the mode and median of the data.

d. Find the mean of the serving time.

e. Determine the skewness of the data.

f. Find the first and third quartile values of the data.

g. Determine the value of the interquartile range.

5. In a survey of a class of final-semester students, the following data were obtained on the number of text books owned.

Number of students    Number of text books owned
12                    5
9                     5
11                    3
15                    2
10                    1
8                     0

Find the average number of text books owned for the class. Use the weighted mean.

6. The following data represent the ages of 15 people buying lift tickets at a ski area.

15 25 26 17 38 16 60 21
30 53 28 40 20 35 31

Calculate the quartiles and the interquartile range.

7. A student scores 60 on a mathematics test that has a mean of 54 and a standard deviation of 3, and she scores 80 on a history test with a mean of 75 and a standard deviation of 2. On which test did she perform better?

8. The following table gives the distribution of the share price of ABC Company, which was listed on the BSKL in 2005.

Price (RM): 12 – 14, 15 – 17, 18 – 20, 21 – 23, 24 – 26, 27 – 29

Frequency:

51425763

Find the mean, median and mode for these data.

ANSWER EXERCISE 1


1. a) Frequency distribution table, relative frequencies, percentages and angle sizes of all categories.

Method of payment    Frequency, f    Relative frequency    Percentage (%)    Angle Size (°)
Cash                 4               0.2500                25                90
Check                5               0.3125                31.25             112.5
Credit Card          4               0.2500                25                90
Debit                2               0.1250                12.50             45
Other                1               0.0625                6.25              22.5
Total                16              1.0                   100               360

b) Pie chart [figure: Cash 25%, Check 31%, Credit Card 25%, Debit 13%, Other 6%]

2. a) Frequency distribution table, relative frequencies, percentages and angle sizes of all categories.

Type of product    Frequency    Relative Frequency    Percentage (%)    Angle Size (°)
A                  13           0.26                  26                93.6
B                  12           0.24                  24                86.4
C                  5            0.1                   10                36
D                  9            0.18                  18                64.8
E                  11           0.22                  22                79.2
Total              50           1                     100               360

b) Pie chart [figure: A 13, B 12, C 5, D 9, E 11]

3. Time series graph [figure: number of fatalities (vertical axis, 0 to 1200) against time (horizontal axis, 1 to 7)]

4. a) Frequency Distribution Table

Source of news    Frequency, f
Newspaper         8
Television        5
Radio             7
Magazine          5
Total             25

b) Bar graph [figure: frequency (vertical axis, 0 to 9) by source of news: Newspaper, Television, Radio, Magazine]

5. Component bar graph [figure: frequency (vertical axis, 0 to 70) of Import and Export by month: September, October, November, December]

6. Bar graph [figure: quantity of rainfall in mm (vertical axis, 0 to 1400)]

The highest rainfall is recorded in Terengganu, followed by Pahang and then Kuala Lumpur. The lowest rainfall is recorded in Pulau Pinang. The rainfall is not equally distributed.

ANSWER EXERCISE 2

1.
Class      f      x     fx     fx^2
0 - 4      16     2     32     64
5 - 9      25     7     175    1225
10 - 14    48     12    576    6912
15 - 19    11     17    187    3179
Total      100          970    11380

Mean = 970/100 = 9.7

Variance = 19.90909

Standard Deviation = 4.46196

2.
Class        f     x        fx        fx^2
0 - 99       2     49.5     99        4900.5
100 - 199    2     149.5    299       44700.5
200 - 249    3     224.5    673.5     151200.75
250 - 299    3     274.5    823.5     226050.75
300 - 349    6     324.5    1947      631801.5
350 - 399    3     374.5    1123.5    420750.75
400 - 499    4     449.5    1798      808201
500 - 999    2     749.5    1499      1123500.5
Total        25             8262.5    3411106.25

Mean = 330.5

Variance = 28347.9167

Standard Deviation = 168.368396
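The totals and summary values for answers 1 and 2 can be checked with a short Python sketch. It assumes the sample variance formula (Σfx² – (Σfx)²/n) / (n – 1), which reproduces the values quoted in these answers; the function name `grouped_summary` is hypothetical.

```python
import math


def grouped_summary(table):
    """Return (mean, variance, standard deviation) for grouped data.

    table: list of (class midpoint x, frequency f).
    """
    n = sum(f for _, f in table)
    sum_fx = sum(f * x for x, f in table)
    sum_fx2 = sum(f * x**2 for x, f in table)
    mean = sum_fx / n
    variance = (sum_fx2 - sum_fx**2 / n) / (n - 1)   # sample variance
    return mean, variance, math.sqrt(variance)


# (midpoint, frequency) pairs taken from the answer tables for questions 1 and 2.
dentist_visits = [(2, 16), (7, 25), (12, 48), (17, 11)]
adapter_price = [(49.5, 2), (149.5, 2), (224.5, 3), (274.5, 3),
                 (324.5, 6), (374.5, 3), (449.5, 4), (749.5, 2)]

for name, table in (("Question 1", dentist_visits), ("Question 2", adapter_price)):
    mean, variance, std_dev = grouped_summary(table)
    print(f"{name}: mean = {mean:.4f}, variance = {variance:.4f}, s = {std_dev:.4f}")
```

Using the class midpoints means the results are estimates of the raw-data values, since the individual responses are not available.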

3.
x        x^2
65       4225
70       4900
70       4900
71       5041
75       5625
76       5776
76       5776
77       5929
79       6241
80       6400
83       6889
83       6889
85       7225
86       7396
86       7396
89       7921
89       7921
89       7921
90       8100
90       8100
Total    1609     130571

Mean = 80.45

Position of median = 10.5

Median = (80+83)/2 = 81.5

Mode = 89

Variance = 59.31316

s = 7.701504

4. Sturges' Formula

Number of classes, c = 1 + 3.3 log 40 = 6.2868 ≈ 6

Class width, i > (5.1 - 2)/6 = 0.5167 ≈ 0.6

Starting point = 2.0

Frequency Distribution Table

Class        f     CF    x      fx      fx^2
2.0 - 2.5    7     7     2.2    15.4    33.88
2.6 - 3.1    15    22    2.7    40.5    109.35
3.2 - 3.7    7     29    3.2    22.4    71.68
3.8 - 4.3    6     35    3.7    22.2    82.14
4.4 - 4.9    4     39    4.2    16.8    70.56
5.0 - 5.5    1     40    4.7    4.7     22.09
Total        40                 122     389.7

Modal class = second class

Mode = 2.85

Median class: 40/2 = 20, so the second class

Median = 3.07

Mean = 3.05

Skewness = mode - median - mean = Right Skew

Position of Q1 = 40/4 = 10

Q1 = 2.67

Q3 = 3.85

IQR = 1.18

5.
x        w     xw
12       5     60
9        5     45
11       3     33
15       2     30
10       1     10
8        0     0
Total    16    178

Mean = 11.125

6. Ranked data: 15 16 17 20 21 25 26 28 30 31 35 38 40 53 60

Position of Q1 = 4

Q1 = 20

Position of Q3 = 12

Q3 = 38

IQR = 18

7. CV (Mathematics) = 3/54 * 100% = 5.5556%

CV (History) = 2/75 * 100% = 2.6667%

Since the coefficient of variation for History is smaller than that for Mathematics, the student performed better in History.