correlation - Govt.college for girls sector 11...


Copyright © 2000 by Monica Yuskaitis

Vocabulary Review

Sum – the answer to an addition problem.

Addend – the numbers you added together to get the sum.

6 + 9 = 15


Definition: Mean Means Average


Definition: Mean – the average of a group of numbers.

2, 5, 2, 1, 5

Mean = 3


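Mean is found by evening out the numbers.

2, 5, 2, 1, 5

Evening out means moving amounts from the larger numbers to the smaller ones until all five are equal: 3, 3, 3, 3, 3.

Mean = 3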


How to Find the Mean of a Group of Numbers

Step 1 – Add all the numbers.

8, 10, 12, 18, 22, 26

8+10+12+18+22+26 = 96


Step 2 – Divide the sum by the number of addends.

8, 10, 12, 18, 22, 26

8+10+12+18+22+26 = 96

How many addends are there? There are 6.


sum ÷ number of addends

96 ÷ 6 = 16


The mean or average of these numbers is 16.

8, 10, 12, 18, 22, 26
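The two steps above can be sketched in Python. This is a minimal illustration, not part of the original lesson; the function name `mean` is our own choice, and it assumes a non-empty list of numbers.

```python
def mean(numbers):
    """Mean: add all the numbers, then divide the sum by the number of addends."""
    if not numbers:
        raise ValueError("mean() needs at least one number")
    total = sum(numbers)   # Step 1: add all the numbers
    count = len(numbers)   # how many addends there are
    return total / count   # Step 2: divide the sum by the number of addends


print(mean([8, 10, 12, 18, 22, 26]))  # 96 / 6 = 16.0
```

Python's standard `statistics.mean` gives the same result.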


What is the mean of these numbers?

7, 10, 16

Mean = 11


What is the mean of these numbers?

2, 9, 14, 27

Mean = 13


What is the mean of these numbers?

1, 2, 7, 11, 19

Mean = 8


What is the mean of these numbers?

26, 33, 41, 52

Mean = 38


Definition: Median is in the Middle


Definition: Median – the middle number in a set of ordered numbers.

1, 3, 7, 10, 13

Median = 7


How to Find the Median in a Group of Numbers

Step 1 – Arrange the numbers in order from least to greatest.

21, 18, 24, 19, 27

18, 19, 21, 24, 27


Step 2 – Find the middle number.

18, 19, 21, 24, 27

The middle number, 21, is the median.


Step 3 – If there are two middle numbers, find the mean of these two numbers.

18, 19, 21, 25, 27, 28


21 + 25 = 46

46 ÷ 2 = 23, so the median is 23.
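The three median steps can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the function name `median` is hypothetical, and the input is a non-empty list of numbers.

```python
def median(numbers):
    """Median: sort, then take the middle number (or the mean of the two middle numbers)."""
    if not numbers:
        raise ValueError("median() needs at least one number")
    ordered = sorted(numbers)   # Step 1: arrange from least to greatest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # Step 2: a single middle number
    # Step 3: two middle numbers, so take their mean
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([21, 18, 24, 19, 27]))      # 21
print(median([18, 19, 21, 25, 27, 28]))  # (21 + 25) / 2 = 23.0
```

The standard library's `statistics.median` follows the same rule when the count is even.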


What is the median of these numbers?

16, 10, 7

7, 10, 16

Median = 10


What is the median of these numbers?

29, 8, 4, 11, 19

4, 8, 11, 19, 29

Median = 11


What is the median of these numbers?

31, 7, 2, 12, 14, 19

2, 7, 12, 14, 19, 31

12 + 14 = 26

26 ÷ 2 = 13

Median = 13


What is the median of these numbers?

53, 5, 81, 67, 25, 78

5, 25, 53, 67, 78, 81

53 + 67 = 120

120 ÷ 2 = 60

Median = 60


Definition: Mode is the most Popular


Definition: A la mode – the most popular, or that which is in fashion.

Baseball caps are a la mode today.


Definition: Mode – the number that appears most frequently in a set of numbers.

1, 1, 3, 7, 10, 13

Mode = 1


How to Find the Mode in a Group of Numbers

Step 1 – Arrange the numbers in order from least to greatest.

21, 18, 24, 19, 18

18, 18, 19, 21, 24


Step 2 – Find the number that is repeated the most.

18, 18, 19, 21, 24

18 appears twice, more often than any other number, so the mode is 18.
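The two mode steps can be sketched in Python by counting how often each number appears. This is our own minimal illustration: the function name `modes` is hypothetical, and returning a list is an assumption so that ties (more than one mode) are not silently dropped.

```python
from collections import Counter


def modes(numbers):
    """Return every number that appears most often (a set can have more than one mode)."""
    if not numbers:
        raise ValueError("modes() needs at least one number")
    counts = Counter(numbers)        # how many times each number appears
    highest = max(counts.values())   # the largest repeat count
    return sorted(n for n, c in counts.items() if c == highest)


print(modes([21, 18, 24, 19, 18]))  # [18]
```

Counting replaces the sorting step, but gives the same answer. If every number appears only once, this sketch returns all of them, whereas some textbooks say such a set has no mode. Python's `statistics.multimode` (3.8+) also returns all tied values, in first-seen rather than sorted order.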


Which number is the mode?

29, 8, 4, 8, 19

4, 8, 8, 19, 29

Mode = 8


Which number is the mode?

1, 2, 2, 9, 9, 4, 9, 10

1, 2, 2, 4, 9, 9, 9, 10

Mode = 9


Which number is the mode?

22, 21, 27, 31, 21, 32

21, 21, 22, 27, 31, 32

Mode = 21


Definition: Range is the distance Between


Definition: Range – the difference between the greatest and the least value in a set of numbers.

1, 1, 3, 7, 10, 13

Range = 12


How to Find the Range in a Group of Numbers

Step 1 – Arrange the numbers in order from least to greatest.

21, 18, 24, 19, 27

18, 19, 21, 24, 27


Step 2 – Find the lowest and highest numbers.

18, 19, 21, 24, 27

The lowest number is 18 and the highest is 27.


Step 3 – Find the difference between these 2 numbers.

18, 19, 21, 24, 27

27 – 18 = 9

The range is 9.
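The three range steps can be sketched in Python. This is a minimal illustration with a hypothetical function name, `value_range`, assuming a non-empty list of numbers.

```python
def value_range(numbers):
    """Range: the greatest value minus the least value."""
    if not numbers:
        raise ValueError("value_range() needs at least one number")
    ordered = sorted(numbers)                   # Step 1: least to greatest
    lowest, highest = ordered[0], ordered[-1]   # Step 2: lowest and highest
    return highest - lowest                     # Step 3: their difference


print(value_range([21, 18, 24, 19, 27]))  # 27 - 18 = 9
```

Sorting mirrors the slide's steps; `max(numbers) - min(numbers)` gives the same answer without sorting. The name `value_range` avoids shadowing Python's built-in `range`.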


What is the range?

29, 8, 4, 8, 19

4, 8, 8, 19, 29

29 – 4 = 25


What is the range?

22, 21, 27, 31, 21, 32

21, 21, 22, 27, 31, 32

32 – 21 = 11


What is the range?

31, 8, 3, 11, 19

3, 8, 11, 19, 31

31 – 3 = 28


What is the range?

23, 7, 9, 41, 19

7, 9, 19, 23, 41

41 – 7 = 34

This PowerPoint was kindly donated to www.worldofteaching.com

http://www.worldofteaching.com is home to over a thousand PowerPoints submitted by teachers. This is a completely free site and requires no registration. Please visit and I hope it will help in your teaching.

Median

The median is the value separating the higher half of a data sample, a population, or a probability distribution from the lower half. In simple terms, it may be thought of as the "middle" value of a data set. For example, in the data set {1, 3, 3, 6, 7, 8, 9}, the median is 6, the fourth number in the sample. The median is a commonly used measure of the properties of a data set in statistics and probability theory.

Median in Case of Individual Series

A. Individual Series:

To find the median in this case, first arrange the terms in ascending or descending order; the middle term is then the median.

Two cases arise in an individual series:

(a) When the number of terms is odd:

The terms are arranged in ascending or descending order, and the middle term, the $\left(\frac{N+1}{2}\right)$th term, is taken as the median.

N = total number of terms = 9

$\frac{N+1}{2} = \frac{9+1}{2} = 5$

Median = 5th term = 19.

(b) When the number of terms is even:

In this case also, the terms are arranged in order, and then the mean of the two middle terms, the $\left(\frac{N}{2}\right)$th and $\left(\frac{N}{2}+1\right)$th terms, is taken as the median.

Example 2. From the following figures of ages of some students, calculate the median age:

Calculation of Median in Discrete Series:

After arranging the terms and computing cumulative frequencies, we locate the $\left(\frac{N+1}{2}\right)$th item, where N is the total frequency.

Steps to Calculate:

(1) Arrange the data in ascending or descending order.

(2) Find cumulative frequencies.

(3) Find the position of the middle item by using the formula

Median = size of the $\left(\frac{N+1}{2}\right)$th item

(4) Find the first total in the cumulative frequency column that is equal to or greater than $\frac{N+1}{2}$.

(5) Locate the value of the variable corresponding to that cumulative frequency. This is the value of the median.
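Steps (1) to (5) can be sketched in Python for a frequency table given as (value, frequency) pairs. The function name, the input format, and the sample table are our own hypothetical choices for illustration.

```python
def discrete_median(table):
    """Median of a discrete series given as (value, frequency) pairs."""
    rows = sorted(table)                        # (1) ascending order of value
    n = sum(freq for _, freq in rows)           # N = total frequency
    if n <= 0:
        raise ValueError("table has no observations")
    position = (n + 1) / 2                      # (3) the (N + 1)/2 th item
    cumulative = 0
    for value, freq in rows:
        cumulative += freq                      # (2) running cumulative frequency
        if cumulative >= position:              # (4) first c.f. >= (N + 1)/2
            return value                        # (5) the corresponding value
    raise RuntimeError("unreachable: the cumulative total always reaches N")


# Hypothetical table: value 10 occurs twice, 20 three times, 30 four times (N = 9)
print(discrete_median([(10, 2), (20, 3), (30, 4)]))  # 5th item -> 20
```

When N is even and the two middle items have different values, this textbook rule returns the upper of the two rather than averaging them.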

Continuous Series: In this case cumulative frequencies are computed, and the median class is the class interval in which the $\left(\frac{N}{2}\right)$th item lies. The median is then interpolated using the formula

$M = L + \frac{N_1 - cf}{f} \times i$

where $N_1 = \frac{N}{2}$, L is the lower limit of the median class, cf is the cumulative frequency of the class preceding the median class, f is the frequency of the median class, and i is the width of the class interval.

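The median can also be calculated from the upper limit of the median class:

$M = U - \frac{cf' - N_1}{f} \times i$

where U is the upper limit of the median class and cf' is the cumulative frequency up to and including the median class.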
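A sketch of the interpolation in Python, returning both the lower-limit and the upper-limit forms so you can see that they agree. The (lower, upper, frequency) input format, the function name, and the sample classes are hypothetical; classes are assumed to be listed in ascending order.

```python
def grouped_median(classes):
    """Interpolated median of a continuous series.

    `classes` is a list of (lower, upper, frequency) tuples in ascending order.
    Returns (M from the lower-limit formula, M from the upper-limit formula).
    """
    n = sum(f for _, _, f in classes)
    if n <= 0:
        raise ValueError("no observations")
    half = n / 2                        # N1 = N / 2
    cf_before = 0                       # c.f. of the classes preceding the median class
    for lower, upper, f in classes:
        if cf_before + f >= half:       # the median class: c.f. first reaches N / 2
            width = upper - lower       # i
            from_lower = lower + (half - cf_before) / f * width
            cf_through = cf_before + f  # c.f. up to and including the median class
            from_upper = upper - (cf_through - half) / f * width
            return from_lower, from_upper
        cf_before += f
    raise RuntimeError("unreachable: the cumulative total always reaches N / 2")


# Hypothetical classes 0-10 (f = 5), 10-20 (f = 8), 20-30 (f = 7); N = 20
print(grouped_median([(0, 10, 5), (10, 20, 8), (20, 30, 7)]))  # (16.25, 16.25)
```

Both forms give 10 + (10 - 5)/8 × 10 = 20 - (13 - 10)/8 × 10 = 16.25 here.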

Merits of median

1) It is easy to compute and understand.

2) It is well defined, as an ideal average should be.

3) It can also be computed for a frequency distribution with open-ended classes.

4) It is not affected by extreme values and is independent of the range or dispersion of the data.

5) It can be determined graphically.

6) It is a proper average for qualitative data, where items are scored rather than measured.

Demerits

1) To compute the median, the data must be arranged in ascending or descending order.

2) It is not based on all the observations of the data.

3) It cannot be given further algebraic treatment.

4) It is affected by fluctuations of sampling.

5) It is not accurate when the data set is small.

Regression

Regression is the attempt to explain the variation in a dependent variable using the variation in independent variables.

Regression is thus an explanation of association; by itself it does not prove causation.

If the independent variable(s) sufficiently explain the variation in the dependent variable, the model can be used for prediction.

[Figure: dependent variable plotted against independent variable (x)]

Simple Linear Regression

[Figure: dependent variable (y) plotted against independent variable (x)]

The output of a regression is a function that predicts the dependent variable based upon values of the independent variables.

Simple regression fits a straight line to the data.

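$y = b_0 + b_1 x + \varepsilon$

where $b_0$ is the y-intercept, $b_1$ is the slope ($b_1 = \frac{\Delta y}{\Delta x}$), and $\varepsilon$ is the error. The fitted line itself gives the prediction $\hat{y} = b_0 + b_1 x$.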


The function will make a prediction for each observed data point. The observation is denoted by $y$ and the prediction is denoted by $\hat{y}$.

[Figure: an observation $y$ and its prediction $\hat{y}$ on the fitted line]


For each observation, the variation can be described as:

$y = \hat{y} + \varepsilon$

Actual = Explained + Error

[Figure: prediction error $\varepsilon$ between the observation $y$ and the prediction $\hat{y}$]


A least squares regression selects the line with the lowest total sum of squared prediction errors. This value is called the Sum of Squares of Error, or SSE.
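For a single independent variable the least squares line has a closed form: $b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$. A minimal Python sketch of that fit, also reporting the SSE it minimises; the function name and the sample data are our own hypothetical choices.

```python
def least_squares_line(xs, ys):
    """Fit y = b0 + b1*x by ordinary least squares; return (b0, b1, SSE)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx              # slope that minimises the squared errors
    b0 = y_bar - b1 * x_bar     # the line passes through (x_bar, y_bar)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, sse


# Hypothetical data; mathematically b0 = 2.2, b1 = 0.6, SSE = 2.4
b0, b1, sse = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

Any other straight line through these points has a larger sum of squared errors than 2.4.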

Calculating SSR


The Sum of Squares Regression (SSR) is the sum of the squared differences between the prediction for each observation, $\hat{y}$, and the mean of the observed values, $\bar{y}$.

Regression Formulas

The Total Sum of Squares (SST) is equal to SSR + SSE.

Mathematically,

$SSR = \sum (\hat{y} - \bar{y})^2$ (measure of explained variation)

$SSE = \sum (y - \hat{y})^2$ (measure of unexplained variation)

$SST = SSR + SSE = \sum (y - \bar{y})^2$ (measure of total variation in y)
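A minimal Python sketch that fits a least squares line and computes the three sums of squares, so the identity SST = SSR + SSE can be checked numerically. The function name and data are hypothetical; note that the identity relies on the line being a least squares fit with an intercept.

```python
def sums_of_squares(xs, ys):
    """Return (SSR, SSE, SST) for the least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * x for x in xs]                   # predictions
    ssr = sum((p - y_bar) ** 2 for p in y_hat)          # explained variation
    sse = sum((y - p) ** 2 for y, p in zip(ys, y_hat))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)             # total variation
    return ssr, sse, sst


# Hypothetical data; mathematically SSR = 3.6, SSE = 2.4, SST = 6
ssr, sse, sst = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```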

The Coefficient of Determination

The proportion of total variation (SST) that is explained by the regression (SSR) is known as the Coefficient of Determination, and is often referred to as $R^2$.

$R^2 = \frac{SSR}{SST} = \frac{SSR}{SSR + SSE}$

The value of $R^2$ can range between 0 and 1, and the higher its value, the larger the share of the variation in y that the regression model explains. It is often expressed as a percentage.
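As a quick check of the formula, a minimal Python sketch can recompute $R^2$ from the SSR and SSE values in the ANOVA table of the Excel summary output later in this section. The function name `r_squared` is our own.

```python
def r_squared(ssr, sse):
    """Coefficient of determination: SSR / SST, with SST = SSR + SSE."""
    sst = ssr + sse
    if sst == 0:
        raise ValueError("no variation in y")
    return ssr / sst


# Regression SS and Residual SS from the Excel ANOVA table
r2 = r_squared(228014.6, 8120.603)
print(round(r2, 5))  # close to the reported R Square of 0.96561
```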

Standard Error of Regression

The Standard Error of a regression is a measure of the variability of the observations around the regression line. It can be used in a similar manner to a standard deviation, allowing for prediction intervals.

$\hat{y} \pm 2$ standard errors gives an approximate 95% prediction interval, and $\hat{y} \pm 3$ standard errors gives an interval covering more than 99% (about 99.7% when the errors are normally distributed).

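Standard Error is calculated by taking the square root of the average squared prediction error, where SSE is averaged over the degrees of freedom n − k rather than over n:

$\text{Standard Error} = \sqrt{\frac{SSE}{n-k}}$

where n is the number of observations in the sample and k is the number of estimated coefficients in the model, including the intercept.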
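A minimal Python sketch of the formula, applied to the Excel summary output later in this section: SSE = 8120.603 from n = 15 observations with k = 3 estimated coefficients (intercept, Temperature, Insulation), which matches the 12 residual degrees of freedom reported there. The function name is our own.

```python
import math


def standard_error(sse, n, k):
    """sqrt(SSE / (n - k)); k counts every estimated coefficient, intercept included."""
    if n <= k:
        raise ValueError("need more observations than coefficients")
    return math.sqrt(sse / (n - k))


se = standard_error(8120.603, 15, 3)
print(round(se, 5))  # close to the reported Standard Error of 26.01378
```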

The output of a simple regression is the coefficient β and the constant A. The equation is then:

$y = A + \beta x + \varepsilon$

where ε is the residual error.

β is the change in the dependent variable for each one-unit change in the independent variable. Mathematically:

$\beta = \frac{\Delta y}{\Delta x}$

Multiple Linear Regression

More than one independent variable can be used to explain variance in the dependent variable, as long as they are not linearly related to one another.

A multiple regression takes the form:

$y = A + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$

where k is the number of independent variables.
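A multiple regression can be fitted by least squares through the normal equations $(X^T X)\beta = X^T y$, where each row of X is one observation with a leading 1 for the constant A. The pure-Python sketch below solves them by Gauss-Jordan elimination; the function name, the 1e-12 singularity threshold, and the test data are our own assumptions, and a real analysis would normally use a statistics package instead.

```python
def fit_multiple_regression(rows, ys):
    """Least squares fit of y = A + b1*X1 + ... + bk*Xk; returns [A, b1, ..., bk]."""
    design = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 for A
    p = len(design[0])
    # Augmented matrix [X^T X | X^T y] of the normal equations
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(design, ys))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: use the row with the largest entry in this column
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # Happens when independent variables are linearly related
            raise ValueError("singular system: independent variables are linearly related")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


# Hypothetical data generated exactly by y = 1 + 2*X1 + 3*X2
coef = fit_multiple_regression([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], [1, 3, 4, 6, 8])
```

If one X column is an exact multiple of another, the normal equations have no unique solution and this sketch raises an error instead of returning coefficients.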

Multicollinearity

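Multicollinearity is a condition in which at least 2 independent variables are highly linearly correlated. It makes the coefficient estimates unstable and inflates their standard errors; when the correlation is perfect, the coefficients cannot be estimated at all, and regression software will report an error or drop one of the variables.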

Example table of correlations:

Y X1 X2

Y 1.000

X1 0.802 1.000

X2 0.848 0.578 1.000

A correlations table can suggest which independent variables may be significant. Generally, an independent variable that has a correlation of more than 0.3 with the dependent variable and less than 0.7 with every other independent variable can be included as a possible predictor.
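The rule of thumb can be sketched in Python and applied to the example correlations table above. The function name and the dictionary layout are our own, and comparing absolute values (so that strong negative correlations count too) is an assumption this sketch adds.

```python
def screen_predictors(corr_with_y, corr_between, low=0.3, high=0.7):
    """Keep X if |r(X, Y)| > low and |r(X, other X)| < high for every other X."""
    keep = []
    for x, r_y in corr_with_y.items():
        if abs(r_y) <= low:
            continue                       # too weakly related to Y
        others = [abs(r) for pair, r in corr_between.items() if x in pair]
        if all(r < high for r in others):  # not too close to another predictor
            keep.append(x)
    return keep


# Values from the example correlations table
corr_with_y = {"X1": 0.802, "X2": 0.848}
corr_between = {("X1", "X2"): 0.578}
print(screen_predictors(corr_with_y, corr_between))  # ['X1', 'X2']
```

Both X1 and X2 pass: each correlates above 0.3 with Y, and their mutual correlation of 0.578 is below 0.7.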

Nonlinear Regression

Nonlinear functions can also be fit as regressions. Common choices include Power, Logarithmic, Exponential, and Logistic, but any continuous function can be used.

Regression Output in Excel

SUMMARY OUTPUT

Regression Statistics

Multiple R 0.982655

R Square 0.96561

Adjusted R Square 0.959879

Standard Error 26.01378

Observations 15

ANOVA

Source | df | SS | MS | F | Significance F
Regression | 2 | 228014.6 | 114007.3 | 168.4712 | 1.65E-09
Residual | 12 | 8120.603 | 676.7169 | |
Total | 14 | 236135.2 | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 562.151 | 21.0931 | 26.65094 | 4.78E-12 | 516.1931 | 608.1089
Temperature | -5.436581 | 0.336216 | -16.1699 | 1.64E-09 | -6.169133 | -4.704029
Insulation | -20.01232 | 2.342505 | -8.543127 | 1.91E-06 | -25.1162 | -14.90844

Estimated Heating Oil = 562.15 - 5.437 (Temperature) - 20.012 (Insulation)

$Y = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + \dots + \varepsilon$

Actual = Estimated/Predicted + Error
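The estimated heating oil equation can be used directly for prediction. The Python sketch below plugs in the coefficients and standard error from the Excel summary output; the function names and the inputs (temperature 30, insulation 6) are hypothetical, since the original data set and its units are not given here. The ±2 standard error band follows the rough 95% rule stated earlier and ignores uncertainty in the coefficients themselves.

```python
# Coefficients and Standard Error copied from the Excel SUMMARY OUTPUT
INTERCEPT = 562.151
B_TEMPERATURE = -5.436581
B_INSULATION = -20.01232
STANDARD_ERROR = 26.01378


def estimated_heating_oil(temperature, insulation):
    """Predicted heating oil from the fitted equation (the error term is unknown)."""
    return INTERCEPT + B_TEMPERATURE * temperature + B_INSULATION * insulation


def rough_interval(prediction, se=STANDARD_ERROR, width=2):
    """prediction +/- width standard errors (width=2 gives roughly 95%)."""
    return prediction - width * se, prediction + width * se


def error(actual, predicted):
    """Actual = Predicted + Error, so Error = Actual - Predicted."""
    return actual - predicted


# Hypothetical inputs
pred = estimated_heating_oil(30, 6)
low, high = rough_interval(pred)
```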
