CMSC 422: Machine Learning
Linear Regression & Kalman Filters
William Regli, Professor of Computer Science
Slide Credits: Andrew W. Moore, School of Computer Science, Carnegie Mellon University; Hal Daumé, Furong Huang, Marine Carpuat, Computer Science Department, U of Maryland. Other slides and images attributed as best one can, apologies for any errors or omissions; used either with permission or under Fair Use (https://www.copyright.gov/fls/fl102.html).
Regression Models
• Learning a functional relationship that outputs a real-valued number, e.g., when y is tomorrow’s temperature.
• Technically, solving a regression problem means finding the conditional expectation, or average value, of y:
• since y is real-valued, the probability that we have found exactly the right number for y is 0.
• Regression models capture the relationship between one dependent variable and explanatory variable(s)
• Use an equation to set up the relationship:
• Numerical Dependent (Response) Variable
• 1 or More Numerical or Categorical Independent (Explanatory) Variables
• Used Mainly for Prediction & Estimation
Linear Regression
Linear regression assumes that the expected value of the output given an input, E[y|x], is linear.
Simplest case: Out(x) = wx for some unknown w.
Given the data, we can estimate w.
DATASET
inputs      outputs
x1 = 1      y1 = 1
x2 = 3      y2 = 2.2
x3 = 2      y3 = 2
x4 = 1.5    y4 = 1.9
x5 = 4      y5 = 3.1
1-parameter linear regression
Assume that the data is formed by
    yi = w xi + noisei
where…
• the noise signals are independent
• the noise has a normal distribution with mean 0 and unknown variance σ²
Then p(y|w,x) has a normal distribution with
• mean wx
• variance σ²
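A minimal sketch of this model (all data simulated; w_true and sigma are illustrative choices, not from the slides). Under this Gaussian noise assumption, the maximum-likelihood estimate of w is exactly the least-squares solution:

    import numpy as np

    rng = np.random.default_rng(0)
    w_true, sigma = 2.0, 1.0                          # illustrative values
    x = rng.uniform(0, 5, size=100)                   # inputs x_i
    y = w_true * x + rng.normal(0, sigma, size=100)   # y_i = w*x_i + noise_i

    # Gaussian MLE = least squares; closed form for the no-intercept model
    w_hat = (x * y).sum() / (x * x).sum()
    print(w_hat)  # lands near w_true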
Regression examples
Prediction of menu prices (Chahuneau, Gimpel, … and Smith, EMNLP 2012)
…
Types of Regression Models
Regression Models
• Simple (1 explanatory variable): Linear or Non-Linear
• Multiple (2+ explanatory variables): Linear or Non-Linear
Regression Modeling Steps
• Hypothesize Deterministic Component
• Estimate Unknown Parameters
• Specify Probability Distribution of Random Error Term
• Estimate Standard Deviation of Error
• Evaluate the Fitted Model
• Use Model for Prediction & Estimation
Linear regression
• Given an input x we would like to compute an output y
• For example:
• Predict height from age
• Predict Google’s price from Apple’s price
• Predict distance from wall from sensor readings
• Predict BMI from height and weight
• Predict papers published from age
Linear regression
• Given an input x we would like to compute an output y
• In linear regression we assume that y and x are related by the equation
    y = wx + e
where w is a parameter and e represents measurement or other noise

[Plot: observed values (points) in X-Y space; the line wx is what we are trying to predict]
Our goal is to estimate w from training data of <xi, yi> pairs
Optimization goal: minimize squared error (least squares)
Why least squares?
• minimizes squared distance between measurements and predicted line
• has a nice probabilistic interpretation
Linear regression
ŵ = argminw Σi (yi − w xi)²    (for the model y = wx + ε)
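As a sketch, here is that argmin computed on the five-point dataset from the earlier slide, both by the closed form ŵ = Σi xi yi / Σi xi² and by a brute-force grid search over w to confirm they agree:

    import numpy as np

    # Dataset from the earlier slide
    x = np.array([1.0, 3.0, 2.0, 1.5, 4.0])
    y = np.array([1.0, 2.2, 2.0, 1.9, 3.1])

    # Closed-form minimizer of sum_i (y_i - w x_i)^2
    w_hat = (x * y).sum() / (x ** 2).sum()
    print(w_hat)                       # ~0.833

    # Brute-force check: search a grid of candidate w values
    grid = np.linspace(0.0, 2.0, 2001)
    losses = ((y - grid[:, None] * x) ** 2).sum(axis=1)
    print(grid[losses.argmin()])       # ~0.833 as well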
Regression example
• Generated: w = 2
• Recovered: w = 2.03
• Noise: std = 1
Regression example
• Generated: w = 2
• Recovered: w = 2.05
• Noise: std = 2
Regression example
• Generated: w = 2
• Recovered: w = 2.08
• Noise: std = 4
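A sketch reproducing the spirit of these three examples (random draws, so the recovered values will differ from the slides’ 2.03/2.05/2.08, but the pattern should hold: estimates near 2, with more scatter as the noise grows):

    import numpy as np

    rng = np.random.default_rng(42)
    x = rng.uniform(0, 10, size=200)
    for std in (1, 2, 4):
        y = 2.0 * x + rng.normal(0, std, size=x.size)   # generated with w = 2
        w_hat = (x * y).sum() / (x ** 2).sum()          # least-squares recovery
        print(f"noise std = {std}: recovered w = {w_hat:.3f}")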
Bias term
• What if the line does not pass through the origin?
• No problem, simply change the model to
y = w0 + w1x + e
• Can use least squares to determine w0, w1:

    w1 = Σi xi (yi − w0) / Σi xi²
    w0 = Σi (yi − w1 xi) / n

[Plot: fitted line with intercept w0 in X-Y space]
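A sketch of fitting both parameters at once; rather than alternating between the two equations above, this solves the equivalent least-squares problem directly via a design matrix (np.polyfit with degree 1 would give the same answer):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, size=100)
    y = 3.0 + 2.0 * x + rng.normal(0, 1, size=100)   # true w0 = 3, w1 = 2 (illustrative)

    # Column of ones carries the intercept w0
    X = np.column_stack([np.ones_like(x), x])
    w0, w1 = np.linalg.lstsq(X, y, rcond=None)[0]
    print(w0, w1)   # near 3 and 2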
Data points of price versus floor space of houses for sale in Berkeley, CA, in July ‘09:
Regression function hypothesis that minimizes squared error loss
Plot of the loss function
Note: the loss surface is convex, so there is one global minimum.
Multivariate regression
• What if we have several inputs? E.g., stock prices for Apple, Microsoft and Amazon for the Google prediction task
Multivariate Regression
• Model: y = w0 + w1x1 + … + wkxk + e
[Here y is Google’s stock price; x1 and x2 are Apple’s and Microsoft’s stock prices]
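A sketch with made-up numbers standing in for the price series (the data below are hypothetical; only the shape of the computation matters):

    import numpy as np

    rng = np.random.default_rng(2)
    apple = rng.uniform(100, 200, size=50)   # hypothetical Apple prices
    msft  = rng.uniform(100, 200, size=50)   # hypothetical Microsoft prices
    goog  = 10 + 0.5 * apple + 1.2 * msft + rng.normal(0, 5, size=50)

    # y = w0 + w1*x1 + w2*x2 + e, solved by least squares on the design matrix
    X = np.column_stack([np.ones_like(apple), apple, msft])
    w = np.linalg.lstsq(X, goog, rcond=None)[0]
    print(w)   # approximately [10, 0.5, 1.2]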
Non-Linear basis function
• So far we only used the observed values x1, x2, … directly
• However, linear regression can be applied in the same way to functions of these values
• E.g., to add a term w·x1x2, add a new variable z = x1x2, so each example becomes: x1, x2, …, z
• As long as these functions can be directly computed from the observed values, the parameters are still linear in the data and the problem remains a multivariate linear regression problem:

    y = w0 + w1x1 + … + wkxk + e
Non-linear basis functions
• What type of functions can we use?
• A few common examples:
- Polynomial: φj(x) = x^j for j = 0 … n
- Gaussian: φj(x) = (x − μj)² / (2σj²)
- Sigmoid: φj(x) = 1 / (1 + exp(−sj x))
- Logs: φj(x) = log(x + 1)

Any function of the input values can be used. The solution for the parameters of the regression remains the same.
General linear regression problem
• Using our new notation for the basis functions, linear regression can be written as

    y = Σj=0..n wj φj(x)

• where φj(x) can be either xj for multivariate regression or one of the non-linear basis functions we defined
• … and φ0(x) = 1 for the intercept term
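A sketch of this general form: pick a list of basis functions (with φ0(x) = 1 for the intercept), stack them into a design matrix Φ, and solve the same least-squares problem; the weights stay linear no matter how non-linear the φj are. The basis list below is illustrative, drawn from the examples on the previous slide:

    import numpy as np

    def fit_basis(x, y, basis):
        """Least-squares weights for y ≈ sum_j w_j * phi_j(x)."""
        Phi = np.column_stack([phi(x) for phi in basis])
        return np.linalg.lstsq(Phi, y, rcond=None)[0]

    basis = [
        lambda x: np.ones_like(x),         # phi_0(x) = 1 (intercept)
        lambda x: x,                       # phi_1(x) = x
        lambda x: np.log(x + 1),           # phi_2(x) = log(x + 1)
        lambda x: 1 / (1 + np.exp(-x)),    # phi_3(x) = sigmoid, s = 1
    ]

    rng = np.random.default_rng(3)
    x = rng.uniform(0, 5, size=200)
    y = 1 + 2 * np.log(x + 1) + rng.normal(0, 0.1, size=200)
    print(fit_basis(x, y, basis))   # weight on the log term comes out near 2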
Introduction to Kalman Filters
The Problem
• Why do we need Kalman Filters?
• What is a Kalman Filter?
• Conceptual Overview
• The Theory of the Kalman Filter
• Simple Example
• System state cannot be measured directly
• Need to estimate “optimally” from measurements
[Block diagram: External Controls and System Error Sources drive the System (a black box), whose System State is desired but not known; Measuring Devices, subject to Measurement Error Sources, produce Observed Measurements; an Estimator turns these into an Optimal Estimate of System State.]
What is a Kalman Filter?
• Recursive data processing algorithm
• Generates optimal estimate of desired quantities given the set of measurements
• Optimal?
• For linear systems and Gaussian errors, the Kalman filter is the “best” estimate based on all previous measurements
• For non-linear systems, optimality is ‘qualified’
• Recursive?
• Doesn’t need to store all previous measurements and reprocess all data each time step
Conceptual Overview
• Lost on the 1-dimensional line
• Position: y(t)
• Assume Gaussian distributed measurements
Conceptual Overview
• Sextant measurement at t1: mean = z1, variance = σ²z1
• Optimal estimate of position: ŷ(t1) = z1
• Variance of error in estimate: σ²x(t1) = σ²z1
• Boat in same position at time t2 – predicted position is z1
Conceptual Overview
• So we have the prediction ŷ⁻(t2)
• GPS measurement at t2: mean = z2, variance = σ²z2
• Need to correct the prediction using the measurement to get ŷ(t2)
• Closer to the more trusted measurement – linear interpolation?

[Plot: prediction ŷ⁻(t2) and measurement z(t2) as two Gaussians]
Conceptual Overview
• The corrected mean is the new optimal estimate of position
• The new variance is smaller than either of the previous two variances

[Plot: measurement z(t2), prediction ŷ⁻(t2), and corrected optimal estimate ŷ(t2)]
Conceptual Overview
Basic ideas:
• Make a prediction based on previous data: ŷ⁻, σ⁻
• Take a measurement: zk, σz
• Optimal estimate (ŷ) = prediction + (Kalman gain) × (measurement − prediction)
• Variance of estimate = variance of prediction × (1 − Kalman gain)
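In one dimension these two blending rules are only a few lines; a minimal sketch (function and variable names are mine, not from the slides):

    def scalar_update(y_pred, var_pred, z, var_z):
        """Blend a Gaussian prediction with a Gaussian measurement."""
        K = var_pred / (var_pred + var_z)      # Kalman gain: how much to trust z
        y_new = y_pred + K * (z - y_pred)      # prediction + gain * residual
        var_new = var_pred * (1 - K)           # always smaller than var_pred
        return y_new, var_new

    # E.g. prediction at 30 (variance 4) corrected by a measurement at 26 (variance 1)
    print(scalar_update(30.0, 4.0, 26.0, 1.0))   # -> (26.8, 0.8)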
Conceptual Overview
• At time t3, the boat moves with velocity dy/dt = u
• Naïve approach: shift the probability to the right to predict
• This would work if we knew the velocity exactly (a perfect model)

[Plot: ŷ(t2) shifted right to the naïve prediction ŷ⁻(t3)]
Conceptual Overview
• Better to assume imperfect model by adding Gaussian noise
• dy/dt = u + w
• Distribution for prediction moves and spreads out
[Plot: ŷ(t2), the naïve prediction, and the wider prediction ŷ⁻(t3)]
Conceptual Overview
• Now we take a measurement at t3
• Need to once again correct the prediction
• Same as before

[Plot: prediction ŷ⁻(t3), measurement z(t3), and corrected optimal estimate ŷ(t3)]
Conceptual Overview
Summary:
• Initial conditions (ŷk−1 and σk−1)
• Prediction (ŷ⁻k, σ⁻k)
• Use initial conditions and model (e.g., constant velocity) to make a prediction
• Measurement (zk)
• Take measurement
• Correction (ŷk, σk)
• Use measurement to correct prediction by ‘blending’ prediction and residual – always a case of merging only two Gaussians
• Optimal estimate with smaller variance
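A 1-D sketch of one full cycle in the spirit of the boat example (all numbers illustrative): prediction shifts the mean by the modeled velocity and inflates the variance by the model noise; correction then blends in the new measurement.

    def predict(y, var, u, dt=1.0, var_model=0.5):
        """Constant-velocity prediction: move by u*dt, spread out by model noise."""
        return y + u * dt, var + var_model

    def correct(y_pred, var_pred, z, var_z):
        K = var_pred / (var_pred + var_z)
        return y_pred + K * (z - y_pred), var_pred * (1 - K)

    y, var = 25.0, 0.8                            # estimate at t2 (illustrative)
    y, var = predict(y, var, u=3.0)               # boat moves with velocity u
    y, var = correct(y, var, z=29.0, var_z=1.0)   # measurement at t3
    print(y, var)                                 # corrected estimate, smaller variance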
Theoretical Basis
• Process to be estimated:
    yk = A yk−1 + B uk + wk−1    (process noise w, covariance Q)
    zk = H yk + vk               (measurement noise v, covariance R)
• Kalman Filter
Prediction: ŷ⁻k is the estimate based on measurements at previous time steps
    ŷ⁻k = A ŷk−1 + B uk
    P⁻k = A Pk−1 Aᵀ + Q
Correction: ŷk has additional information – the measurement at time k
    K = P⁻k Hᵀ (H P⁻k Hᵀ + R)⁻¹
    ŷk = ŷ⁻k + K (zk − H ŷ⁻k)
    Pk = (I − K H) P⁻k
Kalman Filter Algorithm (notation abuse)
Algorithm Kalman_filter(μt−1, Σt−1, ut, zt):

Prediction:
    μ̄t = At μt−1 + Bt ut
    Σ̄t = At Σt−1 Atᵀ + Rt

Correction:
    Kt = Σ̄t Ctᵀ (Ct Σ̄t Ctᵀ + Qt)⁻¹
    μt = μ̄t + Kt (zt − Ct μ̄t)
    Σt = (I − Kt Ct) Σ̄t

Return μt, Σt
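A direct numpy transcription of this algorithm, as a sketch (note the slide’s ‘notation abuse’: here R is the process-noise covariance and Q the measurement-noise covariance, the reverse of the Q/R roles on the earlier slide):

    import numpy as np

    def kalman_filter(mu, Sigma, u, z, A, B, C, R, Q):
        # Prediction
        mu_bar = A @ mu + B @ u
        Sigma_bar = A @ Sigma @ A.T + R
        # Correction
        K = Sigma_bar @ C.T @ np.linalg.inv(C @ Sigma_bar @ C.T + Q)
        mu = mu_bar + K @ (z - C @ mu_bar)
        Sigma = (np.eye(len(mu_bar)) - K @ C) @ Sigma_bar
        return mu, Sigma

    # Illustrative use: 1-D constant-velocity tracking, state = [position, velocity]
    dt = 1.0
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.zeros((2, 1))                      # no control input in this toy example
    C = np.array([[1.0, 0.0]])                # we observe position only
    R, Q = 0.01 * np.eye(2), np.array([[1.0]])
    mu, Sigma = np.zeros(2), np.eye(2)
    mu, Sigma = kalman_filter(mu, Sigma, np.zeros(1), np.array([1.2]), A, B, C, R, Q)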
Theoretical Basis

Prediction (Time Update)
(1) Project the state ahead:            ŷ⁻k = A ŷk−1 + B uk
(2) Project the error covariance ahead: P⁻k = A Pk−1 Aᵀ + Q

Correction (Measurement Update)
(1) Compute the Kalman gain:               K = P⁻k Hᵀ (H P⁻k Hᵀ + R)⁻¹
(2) Update estimate with measurement zk:   ŷk = ŷ⁻k + K (zk − H ŷ⁻k)
(3) Update the error covariance:           Pk = (I − K H) P⁻k
The Prediction-Correction Cycle
[Figure: the filter loops between the Prediction (time update) and Correction (measurement update) steps above]
Kalman Filter Summary
• Highly efficient: polynomial in measurement dimensionality k and state dimensionality n: O(k^2.376 + n²)
• Optimal for linear Gaussian systems!
• Most robotics systems are nonlinear!
Relating Regression to Kalman Filters
Kalman Filter
• estimates the state
• discrete-time process
• linear stochastic difference equation

Linear Regression
• estimates a value
• finite set of data/values
• linear equations with Gaussian noise
Relating Regression to Kalman Filters
• The Kalman filter produces “real-time” estimates of the coefficients of a linear regression
• The Kalman filter is a linear optimal estimator: it infers parameters from indirect, inaccurate and uncertain observations
• For Gaussian noise, the Kalman filter minimizes the mean square error of the estimated parameters
• Faulty intuition: the Kalman filter is used for prediction of future events based on past data, whereas regression (least squares) does smoothing between endpoints
• This is not really true… both estimators (and almost all estimators you can think of) can do either job.
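A sketch of that “real-time regression” view: treat the coefficients (w0, w1) as a static state (A = I, no process noise, no controls), and each incoming pair (xt, yt) as a measurement with Ht = [1, xt]; the filter then refines the coefficient estimates one observation at a time. All names and numbers below are illustrative:

    import numpy as np

    rng = np.random.default_rng(4)
    w_true = np.array([3.0, 2.0])             # true (w0, w1), illustrative
    mu = np.zeros(2)                          # coefficient estimate
    P = 100.0 * np.eye(2)                     # vague prior over coefficients
    R = 1.0                                   # measurement-noise variance

    for _ in range(200):
        x = rng.uniform(0, 10)
        y = w_true @ np.array([1.0, x]) + rng.normal(0, 1.0)
        H = np.array([[1.0, x]])
        # Static state: the prediction step is a no-op here
        K = P @ H.T / (H @ P @ H.T + R)       # 2x1 Kalman gain
        mu = mu + (K * (y - H @ mu)).ravel()  # blend in the new observation
        P = (np.eye(2) - K @ H) @ P

    print(mu)   # converges toward [3, 2]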
END