
Machine Learning and Data Mining

Linear regression

(adapted from) Prof. Alexander Ihler


Supervised learning

• Notation
  – Features x
  – Targets y
  – Predictions ŷ
  – Parameters θ

• Program (“Learner”): characterized by some “parameters” θ; a procedure (using θ) that outputs a prediction
• Training data (examples): features
• Learning algorithm: change θ to improve performance
• Feedback / target values: score performance (the “cost function”)

Linear regression

• Define the form of function f(x) explicitly
• Find a good f(x) within that family

[Figure: target y versus feature x, with a fitted line]

“Predictor”: evaluate the line
  r = θ0 + θ1 x1
  return r

(c) Alexander Ihler
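The “predictor” above can be sketched in Python (the deck’s own snippets are Matlab); the parameter values here are made up for illustration:

```python
# Hypothetical 1-D linear predictor: intercept theta0, slope theta1.
def predict(x, theta0, theta1):
    """Evaluate the line r = theta0 + theta1 * x and return r."""
    return theta0 + theta1 * x

# Example with invented parameters theta0 = 0.5, theta1 = 2.0:
print(predict(10.0, 0.5, 2.0))  # 0.5 + 2.0 * 10 = 20.5
```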

More dimensions?

[Figure: two 3-D surface plots of target y over features x1 and x2; with more dimensions the fit is a plane rather than a line]

Notation

Define “feature” x0 = 1 (constant). Then:

  f(x) = θ0 x0 + θ1 x1 + … + θn xn = θ · x
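As a NumPy sketch (the data and θ values are invented), prepending the constant feature x0 = 1 turns f(x) into a single dot product:

```python
import numpy as np

# Prepend the constant feature x0 = 1 so the intercept folds into theta.
X = np.array([[2.0], [4.0], [6.0]])            # m = 3 examples, one feature each
X1 = np.hstack([np.ones((X.shape[0], 1)), X])  # columns: [x0 = 1, x1]

theta = np.array([1.0, 0.5])                   # invented [intercept, slope]
yhat = X1 @ theta                              # f(x) = theta . x for every row
print(yhat)                                    # [2. 3. 4.]
```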


Measuring error

[Figure: a fitted line with one data point highlighted; the vertical gap between the prediction and the observation is the error, or “residual”]

Mean squared error

• How can we quantify the error?

  J(θ) = (1/m) Σi (yi − f(xi))²

• Could choose something else, of course…
  – Computationally convenient (more later)
  – Measures the variance of the residuals
  – Corresponds to the likelihood under a Gaussian model of the “noise”

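A minimal NumPy sketch of this cost, with toy values not taken from the slides:

```python
import numpy as np

# Mean squared error: the average of the squared residuals
# (prediction minus observation).
y    = np.array([1.0, 2.0, 3.0])   # observed targets (invented)
yhat = np.array([1.5, 2.0, 2.0])   # predictions (invented)

residuals = yhat - y
mse = np.mean(residuals ** 2)      # (0.25 + 0 + 1) / 3
print(mse)
```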

MSE cost function

• Rewrite using matrix form

(Matlab) >> e = y' - th*X'; J = e*e'/m;


Visualizing the cost function

[Figure: the cost surface J(θ) over parameters (θ0, θ1), shown as a 3-D surface and as contours]


Finding good parameters

• Want to find parameters which minimize our error…
• Think of a cost “surface”: the error residual for that θ…


Machine Learning and Data Mining

Linear regression: direct minimization

(adapted from) Prof. Alexander Ihler


MSE Minimum

• Consider a simple problem
  – One feature, two data points
  – Two unknowns: θ0, θ1
  – Two equations:

    y1 = θ0 + θ1 x1
    y2 = θ0 + θ1 x2

• Can solve this system directly:

  θ1 = (y2 − y1) / (x2 − x1),  θ0 = y1 − θ1 x1

• However, most of the time, m > n
  – There may be no linear function that hits all the data exactly
  – Instead, solve directly for the minimum of the MSE function


SSE Minimum

• Setting the gradient to zero and reordering, we have

  θ = yᵀ X (Xᵀ X)⁻¹

• X (Xᵀ X)⁻¹ is called the “pseudo-inverse”
• If X is square and invertible (independent columns), this is the ordinary inverse
• If m > n: the system is overdetermined; this gives the minimum-MSE fit


Matlab SSE

• This is easy to solve in Matlab…

  % y = [y1 ; … ; ym]
  % X = [x1_0 … x1_m ; x2_0 … x2_m ; …]

  % Solution 1: “manual”
  th = y' * X * inv(X' * X);

  % Solution 2: “mrdivide”
  th = y' / X';   % th*X' = y' => th = y'/X'

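The same solve sketched in NumPy, assuming the usual one-row-per-example layout; `np.linalg.lstsq` plays the role of the pseudo-inverse and is numerically preferable to forming the explicit inverse (the data here is invented and exactly linear):

```python
import numpy as np

# Solve min_theta ||X theta - y||^2 directly.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])      # first column is the constant feature x0 = 1
y = np.array([1.0, 3.0, 5.0])   # exactly y = 1 + 2x here

theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)                    # close to [1. 2.]
```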


Effects of MSE choice

• Sensitivity to outliers: a residual of 16 contributes a 16² = 256 cost for this one datum
• Heavy penalty for large errors


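The outlier sensitivity can be seen numerically; the residual of 16 is the value from the slide, and the absolute-error comparison anticipates the L1 cost:

```python
# Squared (L2) versus absolute (L1) cost for one outlier residual of 16:
# the squared cost charges 16^2 = 256 for this single datum,
# while the absolute cost charges only 16.
residual = 16.0
l2_cost = residual ** 2
l1_cost = abs(residual)
print(l2_cost, l1_cost)  # 256.0 16.0
```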

L1 error

[Figure: lines fit under different costs: “L2, original data”, “L1, original data”, “L1, outlier data”]

Cost functions for regression

  J(θ) = (1/m) Σi (yi − f(xi))²   (MSE)
  J(θ) = (1/m) Σi |yi − f(xi)|    (MAE)
  Something else entirely…        (???)

“Arbitrary” cost functions can’t be solved in closed form; use gradient descent.

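A minimal gradient-descent sketch for the MSE cost (NumPy; the step size, iteration count, and toy data are all invented for illustration; the same loop works for costs with no closed-form minimizer):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])      # constant feature + x
y = np.array([1.0, 3.0, 5.0])   # exactly y = 1 + 2x
m = len(y)

theta = np.zeros(2)
for _ in range(5000):
    grad = (2.0 / m) * X.T @ (X @ theta - y)  # gradient of the MSE cost
    theta -= 0.01 * grad                      # small fixed step size
print(theta)                                  # approaches [1, 2]
```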

Machine Learning and Data Mining

Linear regression: nonlinear features

(adapted from) Prof. Alexander Ihler


Nonlinear functions

• What if our hypotheses are not lines?
  – Ex: higher-order polynomials

[Figure: the same data fit with an order-1 polynomial and an order-3 polynomial]

Nonlinear functions

• Single feature x, predict target y:

  ŷ = θ0 + θ1 x + θ2 x² + …

• Sometimes useful to think of a “feature transform” Φ
  – Add features: Φ(x) = [1, x, x², …]
  – Linear regression in the new features: ŷ = θ · Φ(x)

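A NumPy sketch of the feature-transform idea; the cubic target below is invented so the recovered coefficients can be checked:

```python
import numpy as np

# Build the "feature transform" [1, x, x^2, x^3] from a single feature x,
# then ordinary linear regression in the new features fits a cubic.
x = np.linspace(0, 2, 8)
y = 1.0 - x + 0.5 * x**3                  # target generated by a known cubic

Phi = np.vander(x, N=4, increasing=True)  # columns: 1, x, x^2, x^3
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(theta)                              # coefficients close to [1, -1, 0, 0.5]
```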

Higher-order polynomials

• Fit in the same way
• More “features”

[Figure: the same data fit with order-1, order-2, and order-3 polynomials]

Features

• In general, we can use any features we think are useful
• Other information about the problem
  – Sq. footage, location, age, …
• Polynomial functions
  – Features [1, x, x², x³, …]
• Other functions
  – 1/x, sqrt(x), x1 * x2, …
• “Linear regression” = linear in the parameters
  – Features we can make as complex as we want!


Higher-order polynomials

• Are more features better?
• “Nested” hypotheses
  – 2nd order is more general than 1st
  – 3rd order is more general than 2nd, …
• Fits the observed data better

Overfitting and complexity

• More complex models will always fit the training data better
• But they may “overfit” the training data, learning complex relationships that are not really present

[Figure: two fits to the same (X, Y) data: a complex model and a simple model]


Test data

• After training the model
• Go out and get more data from the world
  – New observations (x, y)
• How well does our model perform?

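A sketch of this train/test evaluation in NumPy; the synthetic linear data and noise level are invented for illustration:

```python
import numpy as np

# Fit on training data, then measure MSE on held-out test points;
# the test error estimates performance on new (x, y) observations.
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 5, 20)
y_train = 2 * x_train + 1 + rng.normal(0, 0.1, 20)   # invented: y = 2x + 1 + noise
x_test = rng.uniform(0, 5, 10)
y_test = 2 * x_test + 1 + rng.normal(0, 0.1, 10)

A = np.column_stack([np.ones_like(x_train), x_train])
theta, *_ = np.linalg.lstsq(A, y_train, rcond=None)

A_test = np.column_stack([np.ones_like(x_test), x_test])
test_mse = np.mean((A_test @ theta - y_test) ** 2)
print(test_mse)   # small here: the model family matches the data
```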

[Figure: training data and new “test” data plotted with the fitted model]

Training versus test error

• Plot MSE as a function of model complexity
  – Polynomial order
• Training error decreases
  – More complex functions fit the training data better
• What about new data?

[Figure: mean squared error versus polynomial order, for training and test data]

New, “test” data

• 0th to 1st order
  – Error decreases
  – Underfitting
• Higher order
  – Error increases
  – Overfitting

