CHAPTER 2 : DESCRIPTIVE STATISTICS: TABULAR & GRAPHICAL PRESENTATION


2.1 Summarizing Qualitative Data

A graphic display can reveal at a glance the main characteristics of a data set. Three types of graphs are used to display qualitative data:
- bar chart
- pie chart
- line chart

The choice of presentation depends on the nature of the data: whether it is quantitative (e.g. income and CGPA) or qualitative (e.g. gender and ethnic group).


Types of Graph: Qualitative Data


Example 2.1: Table 2.1 shows the background data of 25 UNIMAP students.

Codes used:
• For gender: 1 is male and 2 is female
• For ethnic group: 1 is Malay, 2 is Chinese, 3 is Indian and 4 is others
• Not much information can be obtained from the data in raw form. It has to be summarized before more information can be extracted.

Table 2.1 : Data of 25 UNIMAP students


If the data from Table 2.1 are summarized by gender and ethnic group, the following frequency tables are obtained:

Observation   Frequency
Male          28
Female        22
Total         50

Table 2.2: Frequency Table for the Gender

Observation   Frequency
Malay         33
Chinese       9
Indian        6
Others        2
Total         50

Table 2.3: Frequency Table for the Ethnic Group
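As a rough illustration of how coded raw data can be tallied into a frequency table, here is a minimal Python sketch. The list `sample_codes` is hypothetical (it is not the Table 2.1 data, which is not reproduced here), and the helper name `frequency_table` is an assumption made for this example.

```python
from collections import Counter

# Hypothetical coded responses (NOT the actual Table 2.1 data):
# 1 = Malay, 2 = Chinese, 3 = Indian, 4 = others
ETHNIC_LABELS = {1: "Malay", 2: "Chinese", 3: "Indian", 4: "Others"}
sample_codes = [1, 1, 2, 1, 3, 1, 2, 4, 1, 1]


def frequency_table(codes, labels):
    """Return (label, frequency) rows in code order, followed by a total row."""
    counts = Counter(codes)
    rows = [(labels[code], counts.get(code, 0)) for code in sorted(labels)]
    rows.append(("Total", sum(counts.values())))
    return rows


table = frequency_table(sample_codes, ETHNIC_LABELS)
for label, freq in table:
    print(f"{label:<10}{freq:>5}")
```

The same function could be applied to the gender codes by passing a label mapping such as `{1: "Male", 2: "Female"}`.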


2.1.1 Bar Chart

A bar chart is used to display a frequency distribution in graphical form. It consists of two orthogonal axes: one axis represents the observations, while the other represents the frequency of the observations. The frequency of each observation is represented by a bar.

* The bar chart is for the data from Table 2.3.

Figure 1: Bar Chart of the Ethnic Group
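The idea that each bar's length encodes a frequency can be sketched in a few lines of Python. This is only a text-based approximation of a bar chart, using the frequencies from Table 2.3; the function name `text_bar_chart` and the `#` symbol are choices made for this example.

```python
# Frequencies taken from Table 2.3
ethnic_freq = {"Malay": 33, "Chinese": 9, "Indian": 6, "Others": 2}


def text_bar_chart(freq, symbol="#"):
    """Return one line per observation; the bar length equals the frequency."""
    width = max(len(name) for name in freq)
    return [f"{name:<{width}} | {symbol * count} {count}" for name, count in freq.items()]


chart_lines = text_bar_chart(ethnic_freq)
print("\n".join(chart_lines))
```

Here the observations run down the page and the frequencies run across it, which is a horizontal variant of the layout described above.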


2.1.2 Pie Chart

A pie chart is used to display a frequency distribution as proportions of the whole. It is a circle divided into sectors. Each sector represents an observation, and the area of the sector represents the proportion of the total frequency accounted for by that observation.

* The pie chart is for the data from Table 2.2.

Figure 2: The Pie Chart for the Gender
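To make the link between frequency and sector size concrete, the following Python sketch converts the Table 2.2 frequencies into proportions and sector angles. It assumes the usual convention that a full circle is 360 degrees; the function name `pie_sectors` is made up for this example.

```python
# Frequencies taken from Table 2.2
gender_freq = {"Male": 28, "Female": 22}


def pie_sectors(freq):
    """Map each observation to (proportion, sector angle in degrees)."""
    total = sum(freq.values())
    return {name: (count / total, 360 * count / total) for name, count in freq.items()}


sectors = pie_sectors(gender_freq)
for name, (share, angle) in sectors.items():
    print(f"{name}: {share:.0%} of the circle, {angle:.1f} degrees")
```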


2.1.3 Line Chart

A line chart is used to display the trend of observations. It consists of two orthogonal axes: one axis represents the observations, while the other represents the frequency of the observations. The frequencies of the observations are joined by lines.

Example: Table 2.4 below shows the number of sandpipers recorded from January 1989 to December 1989.

Month    Jan  Feb  Mar  Apr  May  June  July  Aug  Sept  Oct  Nov  Dec
Number   10   7    5    10   39   7     260   316  142   11   4    9

Figure 3: The Line Chart for the Number of Common Sandpipers

Table 2.4 : The number of sandpipers


2.2 Summarizing Quantitative Data

2.2.1 Frequency Distribution

When summarizing large quantities of raw data, it is often useful to distribute the data into classes. There are no specific rules for determining the classes, but statisticians suggest using between 5 and 20 classes. Table 2.5 shows the classes used for the students' CGPA.

A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.

Data presented in the form of a frequency distribution are called grouped data.

CGPA (Class)   Frequency
2.50 - 2.75    2
2.75 - 3.00    10
3.00 - 3.25    15
3.25 - 3.50    13
3.50 - 3.75    7
3.75 - 4.00    3
Total          50

Table 2.5: The Frequency Distribution of the Students’ CGPA


For quantitative data, a class is an interval that includes all the values that fall between two numbers, the lower and upper class limits. Classes are listed in the first column of a frequency distribution table.

* Classes always represent a variable and are non-overlapping; each value belongs to one and only one class.

The numbers listed in the second column are called frequencies; they give the number of values that belong to each class. Frequencies are denoted by f.

Weekly Earnings (dollars)   Number of Employees, f
801-1000                    9
1001-1200                   22
1201-1400                   39
1401-1600                   15
1601-1800                   9
1801-2000                   6

Table 2.6 : Weekly Earnings of 100 Employees of a Company

In Table 2.6:
- Variable: Weekly Earnings (dollars)
- Frequency column: Number of Employees, f
- Third class (interval class): 1201-1400
- Frequency of the third class: 39
- Lower limit of the sixth class: 1801
- Upper limit of the sixth class: 2000


The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class.

The difference between the two boundaries of a class gives the class width; also called class size.

Formula:
- Class Midpoint or Mark
  Class midpoint or mark = (Lower limit + Upper limit)/2
- Finding the Number of Classes
  Number of classes = 1 + 3.3 log n, where n is the number of observations.
- Finding Class Width Between Two Boundaries
  c = Upper boundary – Lower boundary
- Finding Class Width for Interval Class
  Approximate class width = (Largest value – Smallest value)/Number of classes

* Any convenient number that is equal to or less than the smallest value in the data set can be used as the lower limit of the first class.
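The formulas above can be tried out with a short Python sketch. It assumes that "log" means the base-10 logarithm, and the inputs (100 observations ranging from 801 to 2000) are illustrative values chosen for this example rather than raw data from the source. Rounding the suggested number of classes up to a whole number is also a choice made here.

```python
import math


def number_of_classes(n):
    """Suggested number of classes, 1 + 3.3 log n (log assumed to be base 10)."""
    return 1 + 3.3 * math.log10(n)


def approximate_class_width(smallest, largest, classes):
    """(Largest value - smallest value) / number of classes."""
    return (largest - smallest) / classes


def class_midpoint(lower_limit, upper_limit):
    """Class midpoint (mark) = (lower limit + upper limit) / 2."""
    return (lower_limit + upper_limit) / 2


# Illustrative inputs (assumed): 100 observations ranging from 801 to 2000
k = number_of_classes(100)
print(round(k, 1))
print(approximate_class_width(801, 2000, math.ceil(k)))
print(class_midpoint(1201, 1400))
```

The result of the rule is only a suggestion; in practice the class width is usually rounded to a convenient number.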


2.2.2 Cumulative Frequency Distributions

A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class. In a cumulative frequency distribution table, each class has the same lower limit but a different upper limit.

Table 2.7 : Class Limit, Class Boundaries, Class Width, Cumulative Frequency

Weekly Earnings (dollars)   Number of        Class Boundaries    Class Width   Cumulative Frequency
(Class Limit)               Employees, f
801-1000                    9                800.5 – 1000.5      200           9
1001-1200                   22               1000.5 – 1200.5     200           9 + 22 = 31
1201-1400                   39               1200.5 – 1400.5     200           31 + 39 = 70
1401-1600                   15               1400.5 – 1600.5     200           70 + 15 = 85
1601-1800                   9                1600.5 – 1800.5     200           85 + 9 = 94
1801-2000                   6                1800.5 – 2000.5     200           94 + 6 = 100
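The derived columns of Table 2.7 can be reproduced from the class limits and frequencies alone. The Python sketch below is one way to do it, assuming (as in this table) that the gap between consecutive classes is constant; the variable names are made up for this example.

```python
from itertools import accumulate

# Class limits and frequencies taken from Table 2.6
class_limits = [(801, 1000), (1001, 1200), (1201, 1400),
                (1401, 1600), (1601, 1800), (1801, 2000)]
frequencies = [9, 22, 39, 15, 9, 6]

# Gap between the upper limit of one class and the lower limit of the next
gap = class_limits[1][0] - class_limits[0][1]

# Each boundary lies halfway across that gap
boundaries = [(low - gap / 2, high + gap / 2) for low, high in class_limits]

# Class width = upper boundary - lower boundary
widths = [upper - lower for lower, upper in boundaries]

# Running total of the frequencies
cumulative = list(accumulate(frequencies))

for limits, f, b, w, cf in zip(class_limits, frequencies, boundaries, widths, cumulative):
    print(limits, f, b, w, cf)
```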


Tabular presentation of quantitative data is usually in the form of a frequency distribution, that is, a table showing the frequency of the observations that fall inside specific classes (intervals).

There are a few graphs available for the graphical presentation of quantitative data.

Most popular graphs:
- Histogram
- Frequency Polygon
- Ogive

Figures: Frequency Polygon, Ogive, Histogram

Histogram
The histogram looks like a bar chart, except that the horizontal axis represents data that are quantitative in nature. There are no gaps between the bars.

Frequency Polygon
The frequency polygon looks like a line chart, except that the horizontal axis represents the class marks of data that are quantitative in nature.

Ogive
The ogive is a line graph in which the horizontal axis represents the upper limits of the class intervals and the vertical axis represents the cumulative frequencies.
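The points that would be plotted for a frequency polygon and an ogive can be computed directly from a frequency distribution. The Python sketch below does this for the weekly earnings data of Table 2.6; it only prepares the coordinates and leaves the drawing to whatever plotting tool is at hand. The variable names are made up for this example.

```python
from itertools import accumulate

# Class limits and frequencies taken from Table 2.6
class_limits = [(801, 1000), (1001, 1200), (1201, 1400),
                (1401, 1600), (1601, 1800), (1801, 2000)]
frequencies = [9, 22, 39, 15, 9, 6]

# Frequency polygon: (class mark, frequency) pairs joined by lines
polygon_points = [((low + high) / 2, f)
                  for (low, high), f in zip(class_limits, frequencies)]

# Ogive: (upper limit, cumulative frequency) pairs joined by lines
ogive_points = [(high, cf)
                for (_, high), cf in zip(class_limits, accumulate(frequencies))]

print(polygon_points)
print(ogive_points)
```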


2.3 Exploratory Data Analysis

Exploratory data analysis (EDA) is an approach to analyzing data for the purpose of formulating hypotheses worth testing, complementing the tools of conventional statistics for testing hypotheses.

The goal of EDA is to discover the patterns in data.

EDA is an approach to data analysis that employs a variety of techniques (mostly graphical) to:

i. Maximize insight into a data set.

ii. Uncover underlying structure.

iii. Extract important variables.

iv. Detect outliers and anomalies.

v. Test underlying assumptions.

vi. Develop parsimonious models.

vii. Determine optimal factor settings.


STEM-AND-LEAF DISPLAYS

Another technique that is used to present quantitative data is the stem-and-leaf display.

An advantage of a stem-and-leaf display over a frequency distribution is that, by preparing a stem-and-leaf display, we do not lose information on individual observations.

A stem-and-leaf display is used only for quantitative data. In a stem-and-leaf display of quantitative data, each value is divided into two portions: a stem and a leaf. The leaves for each stem are shown separately in the display.


Steps to construct a stem-and-leaf plot:
i. Split each data value into two parts: the first part contains the leading digit (stem) and the second part contains the trailing digit (leaf).
ii. Draw a vertical line and write the stems on the left side, arranged in increasing or decreasing order.
iii. Read the leaves for all data and record them next to the corresponding stems on the right side of the vertical line.
iv. Rank the leaves for each stem in increasing order.

Example: The following are the scores of 30 college students on a statistics test.

75 52 80 96 65 79 71 87 93 95
69 72 81 61 76 86 79 68 50 92
83 84 77 64 71 87 72 92 57 98


For the score of the first student, which is 75, 7 is the stem and 5 is the leaf. For the score of the second student, which is 52, 5 is the stem and 2 is the leaf. From the data, the stems for all scores are 5, 6, 7, 8 and 9 because all scores lie in the range 50 to 98.

Stems | Leaves
5 | 2    (leaf for 52)
6 |
7 | 5    (leaf for 75)
8 |
9 |

After we have listed the stems, we read the leaves for all scores and record them next to the corresponding stems on the right side of the vertical line.

5 | 2 0 7
6 | 5 9 1 8 4
7 | 5 9 1 2 6 9 7 1 2
8 | 0 7 1 6 3 4 7
9 | 6 3 5 2 2 8

Stem-and-leaf display of test scores.


Now we read all the scores and write the leaves on the right side of the vertical line in the rows of the corresponding stems. By looking at the stem-and-leaf display of test scores, we can observe how the data values are distributed. For example, stem 7 has the highest frequency, followed by stems 8, 9, 6 and 5. The leaves for each stem are then ranked in increasing order and presented below:

5 | 0 2 7
6 | 1 4 5 8 9
7 | 1 1 2 2 5 6 7 9 9
8 | 0 1 3 4 6 7 7
9 | 2 2 3 5 6 8

Ranked stem-and-leaf display of test scores.

* Analysis: 9 out of 30 college students scored between 71 and 79.
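The construction steps can also be expressed as a short Python sketch. It assumes two-digit values, so that the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is made up for this example, and the scores are those of the 30 college students in the example.

```python
from collections import defaultdict

# Test scores of the 30 college students from the example
scores = [75, 52, 80, 96, 65, 79, 71, 87, 93, 95,
          69, 72, 81, 61, 76, 86, 79, 68, 50, 92,
          83, 84, 77, 64, 71, 87, 72, 92, 57, 98]


def stem_and_leaf(values):
    """Group values by tens digit (stem), with the units digits (leaves) ranked."""
    display = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)  # e.g. 75 -> stem 7, leaf 5
        display[stem].append(leaf)
    # Stems in increasing order, leaves in increasing order within each stem
    return {stem: sorted(leaves) for stem, leaves in sorted(display.items())}


ranked = stem_and_leaf(scores)
for stem, leaves in ranked.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because every leaf is kept, the original scores can be read back from the display, which is the advantage over a frequency distribution noted earlier.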


Exercise

At an outpatient testing center, the number of cardiograms performed each day for 20 days is shown. Construct a stem-and-leaf plot for the data.

25 31 20 32 13
14 43 02 57 23
36 32 33 32 44
32 52 44 51 45