Logistic Regression - MIT OpenCourseWare

Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support, Fall 2005. Instructors: Professor Lucila Ohno-Machado and Professor Staal Vinterbo

6.873/HST.951 Medical Decision Support, Fall 2005

Logistic Regression: Maximum Likelihood Estimation

Lucila Ohno-Machado

Risk Score of Death from Angioplasty

Unadjusted overall mortality rate = 2.1%

[Figure: bar chart of number of cases (0 to 3000) and mortality (0% to 60%) by risk score category (0 to 2, 3 to 4, 5 to 6, 7 to 8, 9 to 10, >10); mortality rises steeply with the risk score, reaching about 62% in the highest category.]

Linear Regression: Ordinary Least Squares (OLS)

Minimize the Sum of Squared Errors (SSE) over the n data points (i is the subscript for each point):

ŷ_i = β0 + β1·x_i

SSE = Σ_{i=1..n} (y_i − ŷ_i)² = Σ_{i=1..n} [y_i − (β0 + β1·x_i)]²

[Figure: scatter of points x1 ... x4 around the fitted regression line.]
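As a concrete illustration (not part of the original slides), a minimal NumPy sketch of the OLS fit; the data values are made up:

```python
# Illustrative sketch: closed-form OLS fit for y = b0 + b1*x on made-up data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # hypothetical predictor values
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])       # hypothetical outcomes

X = np.column_stack([np.ones_like(x), x])      # design matrix: intercept column plus x
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]  # coefficients that minimize the SSE

sse = np.sum((y - (b0 + b1 * x)) ** 2)         # SSE at the fitted coefficients
print(b0, b1, sse)
```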

Logit

p_i = 1 / (1 + e^(−(β0 + β1·x_i)))

p_i = e^(β0 + β1·x_i) / (e^(β0 + β1·x_i) + 1)

logit: log[ p_i / (1 − p_i) ] = β0 + β1·x_i

[Figure: logistic curve of p against x.]
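To make the two forms concrete, here is a small Python sketch (not from the slides) of the logistic and logit transforms; the coefficient values are the ones derived later in the lecture from the 2x2 table:

```python
# Minimal sketch of the logistic (inverse-logit) and logit transforms.
import numpy as np

def logistic(z):
    """p = 1 / (1 + e^(-z)), the probability for linear predictor z = b0 + b1*x."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """log(p / (1 - p)), the linear predictor recovered from a probability."""
    return np.log(p / (1.0 - p))

b0, b1 = -0.8616, 0.385            # coefficients from the color example later in the lecture
x = np.array([0.0, 1.0])           # 0 = green (baseline), 1 = blue
p = logistic(b0 + b1 * x)
print(p)                           # approximately [0.297, 0.383]
print(logit(p))                    # recovers b0 + b1*x
```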

Increasing β

[Figure: panels of logistic curves, p (0 to 1.2) plotted against x (0 to 30), showing how the curve shifts and steepens as β is increased.]
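The panels can be reproduced with a short plotting sketch such as the one below (illustrative only; the β values are made up):

```python
# Sketch of the idea behind the panels: logistic curves for several beta values,
# plotted for x in [0, 30]. Parameter values are illustrative, not from the slides.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 30, 200)
for b0, b1 in [(-5, 0.2), (-5, 0.4), (-5, 0.8), (-10, 0.8)]:
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    plt.plot(x, p, label=f"b0={b0}, b1={b1}")

plt.ylim(0, 1.2)                   # same vertical range as the original panels
plt.xlabel("x")
plt.ylabel("p")
plt.legend()
plt.show()
```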

Finding β0

• Baseline case: p_i = 1 / (1 + e^(−β0))

        Blue (1)   Green (0)   Total
Death      28          22        50
Life       45          52        97
Total      73          74       147

For the baseline (Green) group, p = 22/74 = 0.297:

0.297 = 1 / (1 + e^(−β0))   →   β0 = −0.8616
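A minimal sketch of this baseline calculation (assuming the Green group, coded 0, is the reference):

```python
# Baseline case from the 2x2 table: for the reference group (Green, coded 0),
# p = 1 / (1 + e^(-b0)), so b0 is the logit of the observed baseline death rate.
import math

deaths_green, total_green = 22, 74
p_green = deaths_green / total_green        # ~0.297
b0 = math.log(p_green / (1 - p_green))      # ~ -0.86 (the slides round p to 0.297, giving -0.8616)
print(p_green, b0)
```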

Odds ratio

• Odds: p / (1 − p)
• Odds ratio:

OR = [ p_death|blue / (1 − p_death|blue) ] / [ p_death|green / (1 − p_death|green) ]

        Blue   Green   Total
Death    28      22      50
Life     45      52      97
Total    73      74     147

OR = (28/45) / (22/52) = 1.47

What do coefficients mean?

e^(β_color) = OR_color

OR = (28/45) / (22/52) = 1.47

e^(β_color) = 1.47   →   β_color = 0.385

        Blue   Green   Total
Death    28      22      50
Life     45      52      97
Total    73      74     147

p_blue = 1 / (1 + e^(−(−0.8616 + 0.385))) = 0.383

p_green = 1 / (1 + e^(0.8616)) = 0.297
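A small Python check (not from the slides) that the coefficient for the binary color predictor is the log odds ratio, and that the fitted model reproduces the observed death rates:

```python
# Verify e^(b_color) = OR using the counts from the 2x2 table.
import math

# Deaths / survivors by color
d_blue, l_blue = 28, 45
d_green, l_green = 22, 52

odds_blue = d_blue / l_blue            # 28/45
odds_green = d_green / l_green         # 22/52
OR = odds_blue / odds_green            # ~1.47

b_color = math.log(OR)                 # ~0.385
b0 = math.log(odds_green)              # baseline (green) log-odds, ~ -0.86

# The logistic model with these coefficients reproduces the observed rates
p_blue = 1 / (1 + math.exp(-(b0 + b_color)))   # 28/73 ~ 0.383
p_green = 1 / (1 + math.exp(-b0))              # 22/74 ~ 0.297
print(OR, b_color, p_blue, p_green)
```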

What do coefficients mean?

e^(β_age) = OR_age

OR = [ p_death|age=50 / (1 − p_death|age=50) ] / [ p_death|age=49 / (1 − p_death|age=49) ]

        Age 49   Age 50   Total
Death      28       22      50
Life       45       52      97
Total      73       74     147

For a continuous predictor such as age, e^(β_age) is the odds ratio associated with a one-unit (one-year) increase in age.

Why not search using OLS?

ŷ_i = β0 + β1·x_i

SSE = Σ_{i=1..n} (y_i − ŷ_i)²

logit: log[ p_i / (1 − p_i) ] = β0 + β1·x_i

[Figure: scatter of points x1 ... x4 around the regression line, and the logistic curve of p against x.]

P(model | data)?

If only an intercept is allowed, which value would it have?

p_i = 1 / (1 + e^(−(β0 + β1·x_i)))

[Figure: binary outcomes y plotted against x with candidate logistic curves.]

P(data | model)?

P(data | model) = [ P(model | data) · P(data) ] / P(model)

When comparing models:

• P(model): assume all models are equally likely a priori (i.e., a model with high coefficients is as likely as one with low coefficients, etc.)
• P(data): assume it is the same for all models

Then: P(data | model) ∝ P(model | data)

Maximum Likelihood Estimation

• Maximize P(data | model)
• Maximize the probability that we would observe what we observed (given the assumption of a particular model)
• Choose the best parameters from the particular model


Maximum Likelihood Estimation

• Steps:
  – Define an expression for the probability of the data as a function of the parameters
  – Find the values of the parameters that maximize this expression

Likelihood Function

L = Pr(Y)

L = Pr(y_1, y_2, ..., y_n)

L = Pr(y_1) · Pr(y_2) · ... · Pr(y_n) = Π_{i=1..n} Pr(y_i)

Likelihood Function: Binomial

L = Pr(Y) = Pr(y_1, y_2, ..., y_n)

L = Pr(y_1) · Pr(y_2) · ... · Pr(y_n) = Π_{i=1..n} Pr(y_i)

Pr(y_i = 1) = p_i
Pr(y_i = 0) = 1 − p_i

Pr(y_i) = p_i^(y_i) · (1 − p_i)^(1 − y_i)

Since the model is the logit:

log L = Σ_i y_i · log[ p_i / (1 − p_i) ] + Σ_i log(1 − p_i)

log L = Σ_i y_i · (β·x_i) − Σ_i log(1 + e^(β·x_i))

Likelihood Function

L = Π_{i=1..n} Pr(y_i) = Π_{i=1..n} p_i^(y_i) · (1 − p_i)^(1 − y_i)

L = Π_{i=1..n} [ p_i / (1 − p_i) ]^(y_i) · (1 − p_i)

Log Likelihood Function

L = Π_{i=1..n} Pr(y_i) = Π_{i=1..n} p_i^(y_i) · (1 − p_i)^(1 − y_i)

L = Π_{i=1..n} [ p_i / (1 − p_i) ]^(y_i) · (1 − p_i)

log L = Σ_i y_i · log[ p_i / (1 − p_i) ] + Σ_i log(1 − p_i)

Log Likelihood Function

log L = Σ_i y_i · log[ p_i / (1 − p_i) ] + Σ_i log(1 − p_i)

Since the model is the logit:

log L = Σ_i y_i · (β·x_i) − Σ_i log(1 + e^(β·x_i))
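As an illustration (the data and β below are made up), the two forms of the log-likelihood can be computed and compared numerically:

```python
# Log-likelihood for logistic regression computed two ways: directly from p_i, and
# in the "logit" form sum(y_i * eta_i) - sum(log(1 + e^(eta_i))).
import numpy as np

X = np.array([[1, 0.5], [1, 1.5], [1, 2.5], [1, 3.5]])   # n x 2 design matrix (intercept, x)
y = np.array([0, 0, 1, 1])
beta = np.array([-2.0, 1.0])

eta = X @ beta                        # linear predictor beta * x_i
p = 1 / (1 + np.exp(-eta))            # logistic probabilities

loglik_direct = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
loglik_logit  = np.sum(y * eta) - np.sum(np.log(1 + np.exp(eta)))
print(loglik_direct, loglik_logit)    # identical up to floating-point rounding
```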

Maximize

log L = Σ_i y_i · (β·x_i) − Σ_i log(1 + e^(β·x_i))

∂ log L / ∂β = Σ_i y_i·x_i − Σ_i ŷ_i·x_i = 0,   where ŷ_i = 1 / (1 + e^(−β·x_i))

Not easy to solve, because ŷ is non-linear in β; iterative methods are needed. The most popular is Newton-Raphson.

Newton-Raphson

• Start with random or zero βs
• "Walk" in the "direction" that maximizes the likelihood:
  – direction
  – how big a step (gradient, or score)

Maximizing the Log-Likelihood

[Figure: log-likelihood plotted against β; each Newton-Raphson iteration moves from β_j to β_{j+1}, raising the log-likelihood from its initial value toward the maximum (first iteration, second iteration, ...).]

This is a similar iterative method to minimizing the error with gradient descent (neural nets).

[Figure: error surface plotted against a weight w; starting from w_initial with the initial error, steps follow the sign of the derivative (negative or positive change) until w_trained reaches a local minimum with the final error.]

Newton-Raphson Algorithm

log L = Σ_i y_i · (β·x_i) − Σ_i log(1 + e^(β·x_i))

U(β) = ∂ log L / ∂β = Σ_i x_i·y_i − Σ_i x_i·ŷ_i      (gradient)

I(β) = ∂² log L / ∂β ∂β' = −Σ_i x_i·x_i'·ŷ_i·(1 − ŷ_i)      (Hessian)

β_{j+1} = β_j − I(β_j)^(−1) · U(β_j)      (a step)
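A minimal NumPy sketch of this update rule on made-up data (an illustration of the algorithm, not a production fitter):

```python
# Newton-Raphson for logistic regression: gradient U = X'(y - yhat),
# Hessian I = -X' W X with W = diag(yhat*(1-yhat)), step beta <- beta - I^{-1} U.
import numpy as np

X = np.array([[1, 0.5], [1, 1.5], [1, 2.0], [1, 2.5], [1, 3.5]], dtype=float)
y = np.array([0, 0, 1, 0, 1], dtype=float)        # made-up, non-separated outcomes

beta = np.zeros(X.shape[1])                        # start with zero coefficients
for step in range(25):
    yhat = 1 / (1 + np.exp(-X @ beta))             # fitted probabilities
    U = X.T @ (y - yhat)                           # gradient (score)
    H = -(X.T * (yhat * (1 - yhat))) @ X           # Hessian
    beta_new = beta - np.linalg.solve(H, U)        # beta_{j+1} = beta_j - I^{-1} U
    if np.max(np.abs(beta_new - beta)) < 1e-8:     # simple convergence check
        beta = beta_new
        break
    beta = beta_new

print(beta)
```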

Convergence criterion

β_{j+1} − β_j < 0.0001 · β_j

Convergence problems: complete and quasi-complete separation.

Complete separation

The MLE does not exist (i.e., it is infinite).

[Figure: the log-likelihood keeps increasing from β_j to β_{j+1} without reaching a maximum; the y = 0 and y = 1 points are perfectly separated along x.]

Quasi-complete separation

Same values for the predictors, different outcomes.

[Figure: y plotted against x, separated except for tied predictor values at the boundary.]

No (quasi-)complete separation: it is fine to find the MLE.

[Figure: y plotted against x with overlapping groups; a finite MLE exists.]
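A small numerical illustration of why complete separation breaks the MLE: for perfectly separated (made-up) data, the log-likelihood keeps increasing as the slope grows, so it has no finite maximizer:

```python
# Complete separation: every y=0 has x below every y=1, so the log-likelihood
# approaches 0 from below as the slope grows and never attains a maximum.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0,   0,   0,   1,   1,   1  ])     # perfectly separated at x = 4.5

def loglik(b0, b1):
    eta = b0 + b1 * x
    return np.sum(y * eta) - np.sum(np.log(1 + np.exp(eta)))

# Make the curve steeper and steeper: the log-likelihood keeps increasing toward 0
for b1 in [1, 5, 10, 50]:
    print(b1, loglik(-4.5 * b1, b1))   # intercept chosen so the curve crosses 0.5 at x = 4.5
```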

How good is the model?

• Is it better than predicting the same prior probability for everyone (i.e., a model with just β0)?
• How well do the training data fit?
• How well does it generalize?

Generalized likelihood-ratio test

• Are β_1, β_2, ..., β_n different from 0?

L = Π_{i=1..n} Pr(y_i) = Π_{i=1..n} p_i^(y_i) · (1 − p_i)^(1 − y_i)

log L = Σ_i [ y_i · log p_i + (1 − y_i) · log(1 − p_i) ]

G = −2 log L_0 + 2 log L_1

G has a χ² distribution.

cross-entropy error = −Σ_i [ y_i · log p_i + (1 − y_i) · log(1 − p_i) ]
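A sketch of carrying out the test, assuming SciPy is available; the log-likelihood values and degrees of freedom below are placeholders, not results from the lecture:

```python
# Likelihood-ratio test G = -2*log(L0) + 2*log(L1), referred to a chi-square
# distribution with df = number of extra parameters in the larger model.
from scipy.stats import chi2

loglik_null = -74.3    # intercept-only model (hypothetical value)
loglik_full = -61.8    # model with the predictors added (hypothetical value)
df = 3                 # number of coefficients being tested

G = -2 * loglik_null + 2 * loglik_full
p_value = chi2.sf(G, df)
print(G, p_value)
```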

AIC, SC, BIC

• Used to compare models
• Akaike's Information Criterion, with k parameters:

AIC = −2 log L + 2k

• Schwarz Criterion (Bayesian Information Criterion), with n cases:

BIC = −2 log L + k log n
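For example (placeholder numbers), AIC and BIC are computed directly from the maximized log-likelihood:

```python
# AIC and BIC from a fitted model's log-likelihood; the values are placeholders.
import math

loglik = -61.8   # hypothetical maximized log-likelihood
k = 4            # parameters in the model (including the intercept)
n = 147          # number of cases

AIC = -2 * loglik + 2 * k
BIC = -2 * loglik + k * math.log(n)
print(AIC, BIC)
```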

Summary

• Maximum Likelihood Estimation is used to find the parameters of these models
• MLE maximizes the probability that the data obtained would have been generated by the model
• Coming up: goodness of fit (how good are the predictions?)
  – How well do the training data fit?
  – How well does it generalize?
