CHAPTER 2 DESCRIPTIVE STATISTICS: L1 – Numerical Summary of Data, L2 – Data Display and Summary



  • Learning Objectives: At the end of the lesson, students should be able to:

    Explain the concepts of sample mean, population mean, sample variance, population variance, sample standard deviation, and population standard deviation.

    Compute and interpret the sample mean, sample variance, sample standard deviation, sample median, and sample range.

  • Descriptive Statistics

    Methods of organizing, displaying, and describing important features of data by means of tables, graphs, and summary (numerical) measures.

  • Population and Sample (Definitions)

    Population: A collection, or set, of individuals, objects, or events whose properties are to be analyzed (e.g., all UTP students).

    Sample: A subset of the population. The number of individuals in a sample is called the sample size (e.g., the engineering students at UTP).

  • [Figure: Illustration of the selection of a sample from a population]

  • Definition

    Variable: A characteristic of the objects in a population. Examples: the CGPA of UTP students (numerical) and the gender of an engineering graduate (categorical: male or female).

    Its value may change from one object to another in the population.

    A data set consists of one or more variables.

  • Numerical Descriptive Measures

    Used to identify important features of a distribution, such as its center and its spread.

    Measures of central tendency give the center of a histogram or frequency distribution curve. Common measures of central tendency: mean and median.

    Measures of dispersion give the spread of the data. Common measures of dispersion: range, variance, and standard deviation.

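  • Measures of central tendency: (i) Mean

    Population mean: $\mu = \frac{\sum x}{N}$, where $\sum x$ is the sum of all values in the population and $N$ is the population size.

    Sample mean: $\bar{x} = \frac{\sum x}{n}$, where $\sum x$ is the sum of all values in the sample and $n$ is the sample size.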
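    A minimal Python sketch of the mean formula, assuming the data arrive as a list of numbers; the function name `mean` is illustrative only. The same arithmetic gives μ when the list is a whole population (divide by N) and x̄ when it is a sample (divide by n).

```python
from math import fsum

def mean(values):
    """Arithmetic mean: sum of all values divided by the number of values.

    For a whole population this is mu = (sum x) / N;
    for a sample it is x-bar = (sum x) / n.
    """
    if not values:
        raise ValueError("mean requires at least one value")
    return fsum(values) / len(values)  # fsum reduces floating-point rounding

# The observations of Example 1 below, treated as a sample (n = 6).
print(mean([55, 68, 90, 42, 89, 70]))  # 69.0
```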

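  • Measures of central tendency: (ii) Median

    Median: the value of the middle term in a data set that has been ranked in increasing order.

    Calculation of the median: (1) rank the data set in increasing order; (2) find the middle term. If the number of observations is even, there are two middle terms, and the median is their average.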
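    The two-step calculation of the median can be sketched in Python as follows, assuming a non-empty list of numbers; `median` here is a hypothetical helper written out by hand (Python's standard `statistics.median` follows the same rule).

```python
def median(values):
    """Middle term of the data ranked in increasing order.

    With an even number of observations there are two middle terms,
    and the median is their average.
    """
    if not values:
        raise ValueError("median requires at least one value")
    ranked = sorted(values)            # step 1: rank in increasing order
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:                     # odd n: a single middle term
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2  # even n: average the two

print(median([55, 68, 90, 42, 89]))      # 68
print(median([55, 68, 90, 42, 89, 70]))  # 69.0
```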

  • Consider the following data sets (ages of workers):

    Company 1: 47 38 35 40 36 45 49

    Company 2: 70 33 18 52 27

    The mean or median alone is usually not sufficient to reveal the shape of a distribution. Measures of dispersion provide information about the variation in a data set.
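    To make the point concrete, the sketch below summarizes both companies with Python's standard `statistics` module. It assumes the listed digits are two-digit ages (Company 1: 47, 38, 35, 40, 36, 45, 49; Company 2: 70, 33, 18, 52, 27); `summarize` is a hypothetical helper.

```python
from statistics import mean, median

# Assumed reading of the listed digits as two-digit ages (years).
company_1 = [47, 38, 35, 40, 36, 45, 49]
company_2 = [70, 33, 18, 52, 27]

def summarize(ages):
    """Return (mean, median, range) for a list of ages."""
    return mean(ages), median(ages), max(ages) - min(ages)  # range = largest - smallest

for name, ages in (("Company 1", company_1), ("Company 2", company_2)):
    m, md, r = summarize(ages)
    print(f"{name}: mean = {m:.2f}, median = {md}, range = {r}")
```

    The means are close (about 41.4 and 40), but the ranges (14 versus 52) show that Company 2's ages are far more spread out, which the center alone does not reveal.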

  • Measures of dispersion:

    Range = Largest value - Smallest value

    Standard deviation: the most widely used measure of dispersion. It tells how closely the values of the data set are clustered around the mean.

    Variance: the square of the standard deviation.

    Values of the variance and standard deviation are never negative.

    The population variance and mean are parameters; the sample variance and mean are statistics.

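  • Measures of dispersion: Population variance and standard deviation

    Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

    Population standard deviation: $\sigma = \sqrt{\sigma^2}$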

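  • Measures of dispersion: Sample variance and standard deviation

    Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

    Sample standard deviation: $s = \sqrt{s^2}$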
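    A minimal sketch of the population and sample formulas in Python, written out by hand so the different divisors (N versus n - 1) are visible. The function names are hypothetical, and the data set `toy` is made up purely for illustration.

```python
from math import fsum, sqrt

def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / N, dividing by the population size N."""
    N = len(values)
    mu = fsum(values) / N
    return fsum((x - mu) ** 2 for x in values) / N

def sample_variance(values):
    """s^2 = sum((x - x_bar)^2) / (n - 1), dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = fsum(values) / n
    return fsum((x - x_bar) ** 2 for x in values) / (n - 1)

def population_sd(values):
    return sqrt(population_variance(values))  # sigma = sqrt(sigma^2)

def sample_sd(values):
    return sqrt(sample_variance(values))      # s = sqrt(s^2)

# A small made-up data set, used only to show the N vs n - 1 divisors.
toy = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(toy), population_sd(toy))  # 4.0 2.0
print(sample_variance(toy))                           # 32/7, about 4.571
```

    Python's standard `statistics.pvariance` and `statistics.variance` compute the same two quantities, if a library version is preferred.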

  • Example 1:

    Find the mean, variance, and standard deviation of the following observations: 55 68 90 42 89 70
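    A worked solution, assuming the six observations are a sample (so the variance divides by n - 1 = 5):

```latex
\bar{x} = \frac{55 + 68 + 90 + 42 + 89 + 70}{6} = \frac{414}{6} = 69

\sum (x - \bar{x})^2 = (-14)^2 + (-1)^2 + 21^2 + (-27)^2 + 20^2 + 1^2
                     = 196 + 1 + 441 + 729 + 400 + 1 = 1768

s^2 = \frac{1768}{6 - 1} = 353.6, \qquad s = \sqrt{353.6} \approx 18.80
```

    If the six values were instead the whole population, the variance would be $\sigma^2 = 1768/6 \approx 294.67$ and $\sigma \approx 17.17$.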

  • Pictorial & Tabular Methods

    1. Stem-and-Leaf Displays

    How to construct a stem-and-leaf display:
    1. Divide each numerical observation into two parts: the leading digit(s) become the stem, and the remaining digit(s) become the leaf.
    2. List the stem values in a vertical column.
    3. Record the leaf for each observation beside its stem.
    4. Write the units for stems and leaves on the display.

  • Example: Number of hours that 30 students spent working on computers:

    75 52 80 96 65 79 71 87 93 95 69 72 81 61 76 86 79 68 50 92 83 84 77 64 71 87 72 92 57 98

    Stem-and-leaf display (stem: tens digit; leaf: ones digit)
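    The construction steps can be sketched in Python for the 30 values above, assuming non-negative two-digit integers (stem = tens digit, leaf = ones digit). `stem_and_leaf` is a hypothetical helper; it sorts the leaves on each stem (an ordered display), which is a presentation choice rather than a requirement of step 3.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return stem-and-leaf rows for non-negative two-digit integers."""
    leaves = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)      # e.g. 75 -> stem 7, leaf 5
        leaves[stem].append(leaf)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):  # keep empty stems too
        row_leaves = "".join(str(leaf) for leaf in sorted(leaves[stem]))
        rows.append(f"{stem} | {row_leaves}")
    return rows

hours = [75, 52, 80, 96, 65, 79, 71, 87, 93, 95, 69, 72, 81, 61, 76,
         86, 79, 68, 50, 92, 83, 84, 77, 64, 71, 87, 72, 92, 57, 98]
rows = stem_and_leaf(hours)
print("Stem: tens of hours | Leaf: hours")  # step 4: state the units
for row in rows:
    print(row)
```

    This prints the rows `5 | 027`, `6 | 14589`, `7 | 112256799`, `8 | 0134677` and `9 | 223568`; the longest row shows that most students spent 70 to 79 hours.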

  • How to construct a box plot

    Step 1: Arrange the numbers from smallest to largest.

    Step 2: Find the median Q2, the lower quartile Q1, and the upper quartile Q3 of the data set.

    Step 3: Find the interquartile range, IQR = Q3 - Q1, the difference between the upper quartile and the lower quartile.

    Step 4: Draw the box plot, either horizontally or vertically: the box runs from Q1 to Q3, with a line at the median.

    Step 5: Calculate 1.5 IQR and mark the limits Q1 - 1.5 IQR and Q3 + 1.5 IQR. Values outside these limits are called outliers. Values outside Q1 - 3 IQR and Q3 + 3 IQR are called extreme outliers. The whiskers extend from the box to the most extreme observations that are not outliers.
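    Steps 1 to 5 can be sketched in Python as below. The key assumption is the quartile convention: Q1 and Q3 are taken as the medians of the lower and upper halves of the ranked data (leaving out the middle value when n is odd). Textbooks and software use several conventions, so other methods may give slightly different quartiles. `box_plot_summary` is a hypothetical helper.

```python
from statistics import median

def box_plot_summary(data):
    """Quantities needed to draw a box plot (Steps 1-5)."""
    if len(data) < 2:
        raise ValueError("need at least two observations")
    x = sorted(data)                             # Step 1: rank the data
    n = len(x)
    half = n // 2
    q1 = median(x[:half])                        # Step 2: lower quartile
    q2 = median(x)                               #         median
    q3 = median(x[n - half:])                    #         upper quartile
    iqr = q3 - q1                                # Step 3: IQR = Q3 - Q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)     # Step 5: 1.5 IQR limits
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)     #         3 IQR limits
    outliers = [v for v in x if v < inner[0] or v > inner[1]]
    extreme = [v for v in x if v < outer[0] or v > outer[1]]
    inside = [v for v in x if inner[0] <= v <= inner[1]]
    return {
        "Q1": q1, "Q2": q2, "Q3": q3, "IQR": iqr,
        "outliers": outliers,                    # includes extreme outliers
        "extreme_outliers": extreme,
        "whiskers": (min(inside), max(inside)),  # last non-outlier values
    }

# Cold start ignition times from Example 2 below.
summary = box_plot_summary([1.75, 1.91, 1.92, 2.35, 2.53, 2.62, 3.09, 5.15])
print(summary["outliers"], summary["extreme_outliers"])  # [5.15] []
```

    Here 5.15 exceeds Q3 + 1.5 IQR (about 4.265) but not Q3 + 3 IQR (about 5.675), so it is plotted as an outlier that is not extreme, and the upper whisker stops at 3.09.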

  • WHAT IS IMPORTANT IN A GRAPHICAL DISPLAY:

    Shape of the distribution (symmetric or non-symmetric)

    Presence of outliers

  • Example 1:

    The cold start ignition times of an automobile engine obtained for a test vehicle are as follows:

    1.75 1.91 1.92 2.35 2.53 2.62 3.09 3.15

    a) Calculate the sample median, the quartiles, and the IQR.

    b) Construct a box plot of the data. Comment on your plot.
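    A worked sketch for part (a), assuming the quartiles are the medians of the lower and upper halves of the ranked data (other quartile conventions give slightly different values). With n = 8 the data are already ranked, so each half has four values:

```latex
Q_2 = \frac{2.35 + 2.53}{2} = 2.44

Q_1 = \frac{1.91 + 1.92}{2} = 1.915, \qquad Q_3 = \frac{2.62 + 3.09}{2} = 2.855

\mathrm{IQR} = Q_3 - Q_1 = 2.855 - 1.915 = 0.94, \qquad 1.5\,\mathrm{IQR} = 1.41

Q_1 - 1.5\,\mathrm{IQR} = 0.505, \qquad Q_3 + 1.5\,\mathrm{IQR} = 4.265
```

    For part (b), all eight times lie between 0.505 and 4.265, so under this convention the box plot has no outliers and its whiskers run from 1.75 to 3.15.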

  • Example 2:

    The cold start ignition times of an automobile engine obtained for a test vehicle are as follows:

    1.75 1.91 1.92 2.35 2.53 2.62 3.09 5.15

    Construct a box plot of the data.

  • Example 3:

    Given a list of marks on a recent quiz for 10 students:

    51, 47, 55, 49, 55, 46, 55, 89, 51, 54

    i) Determine the mode, the median, and the mean of the data.

    ii) Construct a box plot of the data and comment on the plot.
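    A sketch of Example 3 using Python's standard `statistics` module (`multimode` needs Python 3.8 or later). The mode is taken as the most frequent value, and the quartiles are assumed to be the medians of the lower and upper halves of the ranked marks; other quartile conventions may shift Q1 and Q3 slightly.

```python
from statistics import mean, median, multimode

marks = [51, 47, 55, 49, 55, 46, 55, 89, 51, 54]

# (i) Centre: mode (most frequent value), median and mean.
modes = multimode(marks)
print("mode:", modes, "median:", median(marks), "mean:", mean(marks))

# (ii) Box-plot quantities, assuming Q1/Q3 = medians of the lower/upper halves.
ranked = sorted(marks)
half = len(ranked) // 2
q1 = median(ranked[:half])
q3 = median(ranked[len(ranked) - half:])
iqr = q3 - q1
inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # outlier limits
outer = (q1 - 3 * iqr, q3 + 3 * iqr)       # extreme-outlier limits
outliers = [m for m in ranked if m < inner[0] or m > inner[1]]
extreme = [m for m in ranked if m < outer[0] or m > outer[1]]
print("Q1:", q1, "Q3:", q3, "IQR:", iqr, "outliers:", outliers, "extreme:", extreme)
```

    Under these assumptions the mode is 55, the median 52.5 and the mean 55.2. With Q1 = 49, Q3 = 55 and IQR = 6, the 1.5 IQR limits are 40 and 64 and the 3 IQR limits are 31 and 73, so the mark of 89 is an extreme outlier; it also pulls the mean above the median.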
