LINEAR AND LOGISTIC REGRESSION PROOFS
HAROLD VALDIVIA GARCIA
Contents
1. Linear Regression
2. Logistic Regression
2.1. The Cost Function $J(\theta)$
1. Linear Regression
2. Logistic Regression
For logistic regression, we use $h_\theta(x^{(i)})$ as the estimated probability that the training example $x^{(i)}$ is in class $y = 1$ (or is labeled as $y = 1$).
Here, we assume that the response variables $y^{(1)}, y^{(2)}, \ldots, y^{(m)}$ are Bernoulli distributed, $y^{(i)} \sim \mathrm{Bern}\bigl(p = h_\theta(x^{(i)})\bigr)$.
The hypothesis $h_\theta(x)$ is the logistic function:
$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
The derivative of the sigmoid function has the following nice property (the proof is very easy, so we will not prove it):
$$g'(z) = g(z)\,\bigl(1 - g(z)\bigr)$$
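As a quick numerical sanity check of this property (not part of the original proof), here is a minimal Python/NumPy sketch; the helper names `sigmoid` and `sigmoid_prime` are illustrative, not from the text:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # The claimed identity: g'(z) = g(z) * (1 - g(z))
    g = sigmoid(z)
    return g * (1.0 - g)

# Compare against a central finite-difference approximation of g'(z).
z = np.linspace(-5.0, 5.0, 11)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2.0 * eps)
print(np.max(np.abs(numeric - sigmoid_prime(z))))  # tiny (~1e-11), so the identity checks out
```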
Let's consider $X \in \mathbb{R}^{m \times (n+1)}$ and $Y \in \mathbb{R}^{m \times 1}$ as our dataset.
Now, for the parameters $\theta \in \mathbb{R}^{n+1}$, we have that $X\theta$ is a linear combination of the features of $X$:
$$X\theta =
\begin{bmatrix} \theta^T x^{(1)} \\ \theta^T x^{(2)} \\ \theta^T x^{(3)} \\ \vdots \\ \theta^T x^{(m)} \end{bmatrix}_{m \times 1}
=
\begin{bmatrix} (x^{(1)})^T \theta \\ (x^{(2)})^T \theta \\ (x^{(3)})^T \theta \\ \vdots \\ (x^{(m)})^T \theta \end{bmatrix}_{m \times 1}$$
Let's define the vector $h \in \mathbb{R}^{m \times 1}$ such that:
$$[h]_i = g(\theta^T x^{(i)}) = g\bigl((x^{(i)})^T \theta\bigr) \qquad \text{($g$ is the sigmoid function)}$$
$$h = g(X\theta) \qquad \text{($g$ is the matrix version of the sigmoid function)}$$
$$h = \begin{bmatrix} g\bigl((x^{(1)})^T \theta\bigr) \\ \vdots \\ g\bigl((x^{(i)})^T \theta\bigr) \\ \vdots \\ g\bigl((x^{(m)})^T \theta\bigr) \end{bmatrix}_{m \times 1}$$
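As a side note, a minimal sketch of how $h = g(X\theta)$ might be computed in vectorized form, assuming NumPy and a design matrix whose first column is the intercept term (the toy numbers are illustrative only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: m = 4 examples, n = 2 features plus an intercept column of ones.
X = np.array([[1.0,  0.5,  1.2],
              [1.0, -1.0,  0.3],
              [1.0,  2.0, -0.7],
              [1.0,  0.1,  0.9]])    # shape (m, n+1)
theta = np.array([0.1, -0.4, 0.8])   # shape (n+1,)

# [h]_i = g(theta^T x^(i)) for every row at once, i.e. h = g(X theta).
h = sigmoid(X @ theta)               # shape (m,), one probability per example
print(h)
```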
2.1. The Cost Function $J(\theta)$.
We can obtain the cost function either by using maximum likelihood $L(\theta)$ over the joint distribution of the dataset, or by constructing a cost function that penalizes misclassification. We present the following similar cost functions:
$$J_1(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\log\bigl(h_\theta(x^{(i)})\bigr) + (1 - y^{(i)})\log\bigl(1 - h_\theta(x^{(i)})\bigr)\right]$$
$$J_2(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\log\bigl(h_\theta(x^{(i)})\bigr) + (1 - y^{(i)})\log\bigl(1 - h_\theta(x^{(i)})\bigr)\right]$$
$$J_3(\theta) = \sum_{i=1}^{m}\left[ y^{(i)}\log\bigl(h_\theta(x^{(i)})\bigr) + (1 - y^{(i)})\log\bigl(1 - h_\theta(x^{(i)})\bigr)\right]$$
The last cost function $J_3(\theta)$ is the log-likelihood $\ell(\theta) = \log(L(\theta))$ of the parameters $\theta$. It is easy to demonstrate that minimizing $J_1(\theta)$ is the same as maximizing $J_2(\theta)$ and $J_3(\theta)$:
$$\arg\min_\theta J_1(\theta) = \arg\max_\theta J_2(\theta) = \arg\max_\theta J_3(\theta)$$
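As an aside, $J_1(\theta)$ can be coded directly from its summation form; a minimal sketch assuming NumPy arrays `X` (with intercept column), `y` in {0, 1}, and `theta` (the names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def J1(theta, X, y):
    # J1(theta) = -(1/m) * sum_i [ y_i*log(h_i) + (1 - y_i)*log(1 - h_i) ]
    m = X.shape[0]
    total = 0.0
    for i in range(m):
        h_i = sigmoid(X[i] @ theta)
        total += y[i] * np.log(h_i) + (1.0 - y[i]) * np.log(1.0 - h_i)
    return -total / m
```

Since $J_2(\theta) = -J_1(\theta)$ and $J_3(\theta) = m\,J_2(\theta)$, any of the three could be coded the same way up to sign and scaling.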
2.1.1. Matrix notation for $J(\theta)$. Let's consider the matrix notation for $J_1(\theta)$:
$$J_1(\theta) = -\frac{1}{m}\left[ Y^T \log(h) + (1 - Y)^T \log(1 - h)\right]$$
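The matrix form translates almost verbatim into vectorized code; a self-contained sketch under the same assumptions as above (names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def J1_vectorized(theta, X, y):
    # J1(theta) = -(1/m) * [ y^T log(h) + (1 - y)^T log(1 - h) ]
    m = X.shape[0]
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1.0 - y) @ np.log(1.0 - h)) / m
```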
2.1.2. Gradient Descent for minimizing $J_1(\theta)$.
$$\theta := \theta - \alpha\,\nabla_\theta J_1(\theta)$$
$$\nabla_\theta J_1(\theta) = \;?$$
$$\frac{\partial}{\partial\theta_j} J_1(\theta) = -\frac{1}{m}\,\frac{\partial}{\partial\theta_j}\sum_{i=1}^{m}\left[ y^{(i)}\log\bigl(h_\theta(x^{(i)})\bigr) + (1 - y^{(i)})\log\bigl(1 - h_\theta(x^{(i)})\bigr)\right]$$
$$\frac{\partial}{\partial\theta_j} J_1(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\,\frac{1}{h_\theta(x^{(i)})}\,\frac{\partial}{\partial\theta_j} h_\theta(x^{(i)}) + (1 - y^{(i)})\,\frac{1}{1 - h_\theta(x^{(i)})}\,\frac{\partial}{\partial\theta_j}\bigl(1 - h_\theta(x^{(i)})\bigr)\right]$$
$$\frac{\partial}{\partial\theta_j} J_1(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\,\frac{1}{h_\theta(x^{(i)})} - (1 - y^{(i)})\,\frac{1}{1 - h_\theta(x^{(i)})}\right]\frac{\partial}{\partial\theta_j} h_\theta(x^{(i)})$$
The partial derivative of the hypothesis $h_\theta(x)$ is:
$$\frac{\partial}{\partial\theta_j} h_\theta(x^{(i)}) = \frac{\partial}{\partial\theta_j}\, g(\theta^T x^{(i)})$$
$$\frac{\partial}{\partial\theta_j} h_\theta(x^{(i)}) = g(\theta^T x^{(i)})\bigl(1 - g(\theta^T x^{(i)})\bigr)\,\frac{\partial}{\partial\theta_j}\,\theta^T x^{(i)}$$
$$\frac{\partial}{\partial\theta_j} h_\theta(x^{(i)}) = g(\theta^T x^{(i)})\bigl(1 - g(\theta^T x^{(i)})\bigr)\, x_j^{(i)}$$
$$\frac{\partial}{\partial\theta_j} h_\theta(x^{(i)}) = h_\theta(x^{(i)})\bigl(1 - h_\theta(x^{(i)})\bigr)\, x_j^{(i)}$$
Then, the partial derivative for $J_1(\theta)$ can be written as:
$$\frac{\partial}{\partial\theta_j} J_1(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\,\frac{1}{h_\theta(x^{(i)})} - (1 - y^{(i)})\,\frac{1}{1 - h_\theta(x^{(i)})}\right]\frac{\partial}{\partial\theta_j} h_\theta(x^{(i)})$$
$$\frac{\partial}{\partial\theta_j} J_1(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\bigl(1 - h_\theta(x^{(i)})\bigr) - (1 - y^{(i)})\,h_\theta(x^{(i)})\right] x_j^{(i)}$$
$$\frac{\partial}{\partial\theta_j} J_1(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)} - y^{(i)} h_\theta(x^{(i)}) - h_\theta(x^{(i)}) + y^{(i)} h_\theta(x^{(i)})\right] x_j^{(i)}$$
$$\frac{\partial}{\partial\theta_j} J_1(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)} - h_\theta(x^{(i)})\right] x_j^{(i)}$$
$$\frac{\partial}{\partial\theta_j} J_1(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[ h_\theta(x^{(i)}) - y^{(i)}\right] x_j^{(i)}$$
The expression above in vector notation is:
$$\frac{\partial}{\partial\theta_j} J_1(\theta) = \frac{1}{m}
\begin{bmatrix} x_j^{(1)} & x_j^{(2)} & \cdots & x_j^{(i)} & \cdots & x_j^{(m)} \end{bmatrix}_{1 \times m}
\begin{bmatrix} h_\theta(x^{(1)}) - y^{(1)} \\ h_\theta(x^{(2)}) - y^{(2)} \\ \vdots \\ h_\theta(x^{(i)}) - y^{(i)} \\ \vdots \\ h_\theta(x^{(m)}) - y^{(m)} \end{bmatrix}_{m \times 1}$$
$$\frac{\partial}{\partial\theta_j} J_1(\theta) = \frac{1}{m}\, x_j^T (h - y)$$
$$\nabla_\theta J_1(\theta) = \frac{1}{m}\, X^T (h - y)$$
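As an optional numerical check (not part of the original derivation), the closed-form gradient can be compared against finite differences of $J_1(\theta)$; a sketch assuming NumPy and randomly generated toy data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def J1(theta, X, y):
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1.0 - y) @ np.log(1.0 - h)) / len(y)

def grad_J1(theta, X, y):
    # Closed form derived above: (1/m) * X^T (h - y)
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / len(y)

rng = np.random.default_rng(0)
X = np.c_[np.ones(20), rng.normal(size=(20, 2))]   # intercept column plus 2 features
y = rng.integers(0, 2, size=20).astype(float)
theta = rng.normal(size=3)

eps = 1e-6
numeric = np.array([(J1(theta + eps * e, X, y) - J1(theta - eps * e, X, y)) / (2.0 * eps)
                    for e in np.eye(3)])
print(np.max(np.abs(numeric - grad_J1(theta, X, y))))  # agrees to ~1e-9
```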
The vector notation for the gradient descent rule is as follows:
$$\theta := \theta - \alpha\,\frac{1}{m}\, X^T (h - y)$$
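Putting the pieces together, a minimal sketch of the resulting batch gradient-descent loop (the learning rate $\alpha$ and the iteration count are illustrative choices, not from the text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)                          # h = g(X theta)
        theta = theta - (alpha / m) * (X.T @ (h - y))   # theta := theta - (alpha/m) X^T (h - y)
    return theta
```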
2.1.3. Gradient Ascent for maximizing $J_2(\theta)$.
$$\theta := \theta + \alpha\,\nabla_\theta J_2(\theta)$$
$$\nabla_\theta J_2(\theta) = \;?$$
$$\frac{\partial}{\partial\theta_j} J_2(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ h_\theta(x^{(i)}) - y^{(i)}\right] x_j^{(i)}$$
$$\frac{\partial}{\partial\theta_j} J_2(\theta) = -\frac{1}{m}\, x_j^T (h - y)$$
$$\nabla_\theta J_2(\theta) = -\frac{1}{m}\, X^T (h - y)$$
The vector notation for the gradient ascent rule is as follows:
$$\theta := \theta + \alpha\,\nabla_\theta J_2(\theta) = \theta - \alpha\,\frac{1}{m}\, X^T (h - y)$$
Note that this is exactly the same update as the gradient descent rule for $J_1(\theta)$, as expected, since $J_2(\theta) = -J_1(\theta)$.
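For completeness, a sketch of the ascent version; because $\nabla_\theta J_2(\theta) = -\frac{1}{m}X^T(h - y)$, the update $\theta := \theta + \alpha\,\nabla_\theta J_2(\theta)$ performs exactly the same step as the descent loop above (helper names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_ascent(X, y, alpha=0.1, iters=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)
        grad_J2 = -(X.T @ (h - y)) / m    # gradient of J2(theta) = -J1(theta)
        theta = theta + alpha * grad_J2   # theta := theta + alpha * grad J2(theta)
    return theta
```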