Linear Regression Ed.3


  • 7/30/2019 Linear Regression Ed.3

    1/16

    SitiHawa/MTE3105/ed.2

    Linear Regression


    Think Of

    Below are two sets of data: Set A (coursework scores) and Set B (test scores). Think about what conclusion you can draw about the relationship between these two sets of data.

    Set A    Set B
    83.0     71.0
    92.0     82.0
    80.0     71.0
    83.0     51.0
    80.0     71.0
    80.0     61.0
    90.0     65.0
    83.0     60.0
    80.0     80.0
    80.0     82.0
    94.0     64.0
    94.0     59.0
    80.0     47.0
    80.0     57.0
    90.0     71.0
    90.0     78.0
    92.0     83.0
    94.0     80.0


    Think Of

    Which set is the dependent variable and which is the independent variable?

    Plot a graph of this data and explain it to the class.


    Linear Regression

    Example: a set of data and the plot of a "best-fit" straight line through the data.


    Linear Regression

    The relationship between two sets of data (x and y) is linear if, when the data are plotted (y vs. x), the result is a straight line.

    Such a relationship has a linear correlation and follows the equation of a straight line, y = mx + b.
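    As a rough illustration of the straight-line condition, the sketch below checks whether a handful of points all fall on one line. The function name `is_exactly_linear`, the tolerance, and the sample points are made up for this example; real measurements rarely fall exactly on a line, which is why the later slides fit a best line instead.

```python
import math

def is_exactly_linear(points, tol=1e-9):
    """Return True if every (x, y) point lies on the line through the first two.

    Assumes the first two points are different from each other.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    (x0, y0), (x1, y1) = points[0], points[1]
    # Cross-product test: avoids dividing by a zero difference in x.
    return all(
        math.isclose((x1 - x0) * (y - y0), (y1 - y0) * (x - x0), abs_tol=tol)
        for x, y in points[2:]
    )

# Hypothetical points generated from y = 2x + 1
print(is_exactly_linear([(0, 1), (1, 3), (2, 5), (3, 7)]))
# The last point is moved off the line
print(is_exactly_linear([(0, 1), (1, 3), (2, 6)]))
```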

  • 7/30/2019 Linear Regression Ed.3

    6/16

    SitiHawa/MTE3105/ed.2

    Linear Regression : Concept

    Example of linear regression with one dependent and one independent variable.

    http://en.wikipedia.org/wiki/Image:Normdist_regression.png

    Scatterplots

    A scatterplot is a graphic tool used to display the relationship between two quantitative variables.

    A scatterplot consists of an X axis (the horizontal axis), a Y axis (the vertical axis), and a series of dots.

    Each dot on the scatterplot represents one observation from a data set. The position of the dot on the scatterplot represents its X and Y values.

    [Figure: scatterplot with x on the horizontal axis (0.0 to 8.0) and y on the vertical axis (0.0 to 6.0)]

    http://stattrek.com/Help/Glossary.aspx?Target=Quantitative%20variable
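    To make the idea of "one dot per observation" concrete, here is a minimal text-only sketch that places a `*` on a character grid at the scaled X and Y position of each observation. The function `ascii_scatter`, the grid size, and the sample values are assumptions for illustration; a plotting tool or spreadsheet chart would normally be used instead.

```python
def ascii_scatter(xs, ys, width=20, height=8):
    """Return a text grid with a '*' at the position of each (x, y) observation."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    if x_min == x_max or y_min == y_max:
        raise ValueError("need at least two distinct x values and two distinct y values")
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (X axis) and a row (Y axis).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Row 0 of the grid is the top line, so flip the Y direction.
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line) for line in grid)

# Made-up observations, for illustration only.
print(ascii_scatter([1, 2, 3, 4, 5, 6, 7], [1.0, 1.5, 2.5, 3.0, 4.5, 5.0, 5.5]))
```

    Observations that scale to the same grid cell share a single `*`, so this sketch is only suitable for a quick look at the overall pattern.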

    Linear Regression : Scatter Diagram

    A scatterplot is often used to identify potential associations between two variables: an explanatory variable and a response variable.

    [Figure panels: Positive association (positive slope); Negative association (negative slope); No association]


    Linear Regression

    Given a set of data (xi, yi) with n data points, the slope and y-intercept can be determined using the following:

    $$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

    $$b = \frac{\sum y - m\sum x}{n}$$
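    The slope and y-intercept formulas on this slide can be sketched in Python as follows. The function name `fit_line` and the sample points are made up for illustration; the arithmetic follows the sums Σx, Σy, Σxy and Σx² over the n data points.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = (n * sum_xy - sum_x * sum_y) / denom   # slope
    b = (sum_y - m * sum_x) / n                # y-intercept
    return m, b

# Hypothetical points that lie exactly on y = 2x + 1
m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)  # 2.0 1.0
```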


    Correlation Coefficient

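    Given a set of data (xi, yi) with n data points, the correlation coefficient, r, can be determined by:

    $$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$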

    See example using MS Excel
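    For readers without the spreadsheet, the same calculation can be sketched in Python. The function name `pearson_r` and the sample values are assumptions for illustration; the code computes r from the sums Σx, Σy, Σxy, Σx² and Σy² over the n data points.

```python
import math

def pearson_r(xs, ys):
    """Return the correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("x or y has no variation; r is undefined")
    return numerator / denominator

# Hypothetical data: y doubles x, then the same y values in reverse order
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```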


    Linear Regression

    The correlation coefficient, r, is a measure of the reliability of the linear relationship between the x and y values.

    r = 1 or r = -1 indicates an exact linear relationship between x and y.

    Values of r close to 1 or -1 indicate excellent linear reliability.

    If the correlation coefficient is relatively far from 1 or -1, predictions based on the linear relationship, y = mx + b, will be less reliable.


    Correlation Coefficients, r

    Correlation coefficients measure the strength of association between two variables.

    The most common correlation coefficient, called the Pearson product-moment correlation coefficient, measures the strength of the linear association between variables.


    Scatterplots and Correlation Coefficients

    The scatterplots below show how different patterns of data produce different degrees of correlation.

    [Figure panels:
      Maximum positive correlation (r = 1.0)
      Strong positive correlation (r = 0.80)
      Zero correlation (r = 0)
      Maximum negative correlation (r = -1.0)
      Moderate negative correlation (r = -0.43)
      Strong correlation with outlier (r = 0.71)]

    http://stattrek.com/Help/Glossary.aspx?Target=Scatterplot

    Linear Regression : Concept

    Simple linear regression is the method for finding the "line of best fit" between the dependent variable, y, and the independent variable, x.

    Simple: only one independent variable.

    In general, the goal of linear regression is to find the line that best predicts Y from X.

    Linear regression does this by finding the line that minimizes the sum of the squares of the vertical distances of the points from the line.

    The least squares regression line is the line that minimizes the sum of the squared errors of the data points.
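    The least-squares idea can be checked numerically with a small sketch: fit a line, then confirm that nudging the slope or intercept only increases the sum of squared vertical distances. The helper names `fit_line` and `sum_squared_errors`, the data, and the nudge sizes are all made up for this illustration.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

def sum_squared_errors(xs, ys, m, b):
    """Sum of squared vertical distances from each point to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical, slightly noisy data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m, b = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, m, b)

# Any other line tried here has a larger sum of squared errors.
for dm, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.5), (0.0, -0.5)]:
    other = sum_squared_errors(xs, ys, m + dm, b + db)
    print(f"slope {m + dm:.3f}, intercept {b + db:.3f}: larger error? {other > best}")
```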


    How to Interpret a Correlation Coefficient

    The sign and the absolute value of a correlation coefficient describe the direction and the magnitude of the relationship between two variables.

    The value of a correlation coefficient ranges between -1 and 1.

    The greater the absolute value of a correlation coefficient, the stronger the linear relationship.

    The strongest linear relationship is indicated by a correlation coefficient of -1 or 1.

    The weakest linear relationship is indicated by a correlation coefficient equal to 0.

    A positive correlation means that if one variable gets bigger, the other variable tends to get bigger.

    A negative correlation means that if one variable gets bigger, the other variable tends to get smaller.

    http://stattrek.com/Help/Glossary.aspx?Target=Absolute%20value
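    As an illustration of reading the sign and the absolute value, the sketch below computes r for the coursework and test scores from the "Think Of" slide. Pairing the i-th Set A value with the i-th Set B value is an assumption made here, and the names `pearson_r`, `coursework` and `test_scores` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Return the correlation coefficient r for paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Set A and Set B as listed on the slide; the row-by-row pairing is assumed.
coursework = [83, 92, 80, 83, 80, 80, 90, 83, 80, 80, 94, 94, 80, 80, 90, 90, 92, 94]
test_scores = [71, 82, 71, 51, 71, 61, 65, 60, 80, 82, 64, 59, 47, 57, 71, 78, 83, 80]

r = pearson_r(coursework, test_scores)

# The sign gives the direction; the absolute value gives the magnitude.
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.2f}, direction: {direction}, magnitude |r| = {abs(r):.2f}")
```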

    Interpolation & Extrapolation

    Interpolation is making a prediction within the range of values of the predictor in the sample used to generate the model.

    Extrapolation is making a prediction outside the range of values of the predictor in the sample used to generate the model.
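    The distinction can be expressed as a simple range check on the predictor. In this sketch the function name `prediction_type` is hypothetical, the sample uses the distinct Set A scores on the assumption that coursework is the predictor, and treating the end points of the range as interpolation is also an assumption.

```python
def prediction_type(x_new, sample_xs):
    """Classify a prediction at x_new relative to the predictor values in the sample."""
    lowest, highest = min(sample_xs), max(sample_xs)
    if lowest <= x_new <= highest:
        return "interpolation"
    return "extrapolation"

# Distinct coursework scores from Set A (assumed here to be the predictor)
sample_xs = [80.0, 83.0, 90.0, 92.0, 94.0]

print(prediction_type(85.0, sample_xs))  # interpolation: 85 lies between 80 and 94
print(prediction_type(60.0, sample_xs))  # extrapolation: 60 lies below 80
```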