
Workshop in R & GLMs: #3

Diane Srivastava

University of British Columbia

srivast@zoology.ubc.ca

Housekeeping

ls() lists the variables in the global environment

rm(list=ls()) removes EVERY variable

q() quits R; you are prompted whether to save the workspace
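
A quick illustration of these commands in a session (the variable x here is just an example, not part of the workshop data):

x <- 1:5          # create an example variable
ls()              # lists what is in the global environment: "x"
rm(list = ls())   # removes every variable
ls()              # now returns character(0)
# q() would quit R and prompt whether to save the workspace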

hard~dens

[Figure: plot() diagnostics for lm(hard ~ dens): Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance; points 31, 32, and 36 flagged]

hard^0.45~dens and log(hard)~dens

[Figure: the same four diagnostic plots (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage) for the transformed models]

Janka exercise

Conclusion:

The best y transformation to optimize the model fit (highest log likelihood)…

…is not the best y transformation for normal residuals.
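
A hedged R sketch of this comparison, assuming a data frame called janka with columns hard and dens as in the formulas above (names are illustrative, not the workshop file): MASS::boxcox profiles the log likelihood over power transformations, while a normality test on the residuals checks the other criterion.

library(MASS)
janka.lm <- lm(hard ~ dens, data = janka)
bc <- boxcox(janka.lm)                       # profile log likelihood over power transformations
bc$x[which.max(bc$y)]                        # lambda with the highest log likelihood (the exercise used 0.45)
shapiro.test(resid(lm(hard^0.45 ~ dens, data = janka)))   # residual normality for that power
shapiro.test(resid(lm(log(hard) ~ dens, data = janka)))   # compare with the log transformation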

This workshop

• Linear, general linear, and generalized linear models.

• Understand how GLMs work [Excel simulation]

• Definitions: e.g. deviance, link functions

• Poisson GLMs [R exercise]
• Binomial distribution and logistic regression
• Fit GLMs in R! [Exercise]

In the beginning there were…

Linear models: a normally-distributed y fit to a continuous x

But wait…couldn’t we just code a categorical variable to be continuous? (See the toy example below the table.)

Y     x
1.2   0
1.3   0
1.1   1
0.9   1
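
With the toy data above, fitting x as a 0/1 number or as a factor gives the same estimates; this is only a sketch of the idea:

y <- c(1.2, 1.3, 1.1, 0.9)
x <- c(0, 0, 1, 1)
coef(lm(y ~ x))           # x coded as a continuous 0/1 variable
coef(lm(y ~ factor(x)))   # x coded as a categorical variable: same contrast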

Then there were…

General Linear Models: a normally-distributed y fit to a continuous OR categorical x

But wait…why do we force our data to be normal when often it isn’t?

Generalized linear models

No more need for tedious transformations!

Proud to be Poisson!

All variances are unequal, but some are more unequal than others…

Because most things in life aren’t normal!

Distribution solution!

What linear models do:

[Figure: schematic plots of Y vs X and log Y vs X]

1. Transform y
2. Fit line to transformed y
3. Back-transform to linear y

What GLMs do:

[Figure: schematic plots of Y vs X and log fitted values vs X]

1. Start with an arbitrary fitted line
2. Back-transform line into linear space
3. Calculate residuals
4. Improve fitted line to maximize likelihood

Many iterations

Maximum likelihood

• Means that an iterative process is used to find the model equation that has the highest probability (likelihood) of explaining the y values given the x values.

• The equation for the likelihood depends on the error distribution chosen.

• Least squares – by contrast – minimizes variation from the model.

• If the data are normally distributed, maximum likelihood gives the same answer as least squares.
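
A quick check of that last point, using made-up data rather than the workshop file: a Gaussian GLM (maximum likelihood) and lm (least squares) return the same coefficients.

set.seed(1)
x <- 1:30
y <- 2 + 0.5 * x + rnorm(30)          # normally distributed data
coef(lm(y ~ x))                       # least squares
coef(glm(y ~ x, family = gaussian))   # maximum likelihood: identical estimates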

GLM simulation exercise

• Simulates fitting a model with normal errors and a log link to data.

• Your task:

(1) understand how the spreadsheet works

(2) find the best slope through an iterative process (an R analogue is sketched below)
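
If you prefer R to the spreadsheet, here is a rough analogue of the same idea, with made-up data and the intercept fixed for simplicity (all names and values are illustrative, not part of the workshop file):

set.seed(1)
x <- 1:20
y <- exp(1 + 0.2 * x) + rnorm(20, sd = 2)    # normal errors around an exponential (log-link) curve
loglik <- function(m) sum(dnorm(y, mean = exp(1 + m * x), sd = 2, log = TRUE))
slopes <- seq(0.1, 0.3, by = 0.001)          # candidate slopes
slopes[which.max(sapply(slopes, loglik))]    # the slope with the highest likelihood (close to 0.2)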

Generalized linear models

In least squares, we fit: y = mx + b + error

In a GLM, the model is fit more indirectly: y = g(mx + b + error)

where g is a function, the inverse of which is called the “link function”:

link fn(expected y) = mx + b, with the error following the specified distribution

LMs vs GLMs

Linear models:
• Use least squares
• Assume normality
• Based on sums of squares
• Fit the model to transformed y

GLMs:
• Use maximum likelihood
• Specify one of several error distributions
• Based on deviance
• Fit the model to untransformed y by means of a link function

All that really matters…

• By using a log link function, we never need to calculate log(0): the link transforms fitted values, not the data.

• Be careful! A log-link model predicts log(y), not y! (See the example below.)

• The error distribution need not be normal: Poisson, binomial, gamma, Gaussian (= normal).
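
For example, once a log-link GLM such as my.first.glm (fitted in the exercise below) exists, predict() returns values on the link (log) scale unless told otherwise:

head(predict(my.first.glm, type = "link"))      # predictions of log(y), the default
head(predict(my.first.glm, type = "response"))  # back-transformed predictions of y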

Exercise

1. Open up the file : Rlecture.csv

diane<-read.table(file.choose(), sep=",", header=TRUE)

2. Look at the dataframe. Make treat a factor (a sketch follows this list).

3. Fit this model:

my.first.glm<-glm(growth~size*treat, family=poisson(link=log), data=diane); summary(my.first.glm)

4. Model diagnostics: par(mfrow=c(2,2)); plot(my.first.glm)
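
A minimal sketch for step 2, assuming the dataframe and column names used above:

str(diane)                           # look at the dataframe
diane$treat <- factor(diane$treat)   # make treat a factor
levels(diane$treat)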

Overdispersion

[Figure: panels labelled Underdispersed, Overdispersed, and Random]

Overdispersion

Is your residual deviance approximately equal to your residual df?

If residual deviance >> residual df, the model is overdispersed.

If residual deviance << residual df, the model is underdispersed.
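
One way to make this check in R, assuming my.first.glm from the exercise above:

deviance(my.first.glm) / df.residual(my.first.glm)   # near 1 is fine; >> 1 suggests overdispersion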

Solution:

second.glm<-glm(growth~size*treat, family=quasipoisson(link=log), data=diane); summary(second.glm)

Options

family     default link   other links
binomial   logit          probit, cloglog
gaussian   identity
Gamma      inverse        identity, log
poisson    log            identity, sqrt

Rlecture.csv

[Figure: scatterplots from Rlecture.csv of Parasitism (%) vs Size, Growth vs Size, and Survival vs Size]

Binomial errors

• Variance gets constrained near limits; binomial accounts for this

• Type 1: Classic example: series of trials resulting in success (value=1) or failure (value=0).

• Type 2: Continuous but bounded (e.g. % mortality, bounded between 0% and 100%).

Logistic regression

• Least squares: arcsine transformations

• GLMs: use logit (or probit) link with binomial errors

[Figure: logistic (S-shaped) curve of y against x, rising from 0 to 1]

Logit

p = proportion of successes

If p = e^(a+bx) / (1 + e^(a+bx)), then

log_e(p/(1-p)) = a + bx

Logits continued

Output from logistic regression with a logit link: predicted log_e(p/(1-p)) = a + bx

To obtain any expected value of p, plug a and b back into the original equation:

p = e^(a+bx) / (1 + e^(a+bx))
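
A small sketch with hypothetical coefficients a and b (not values from the workshop data):

a <- -2; b <- 0.8                            # illustrative intercept and slope from a logistic regression
x <- seq(0, 7, by = 0.5)
p <- exp(a + b * x) / (1 + exp(a + b * x))   # expected proportions on the original scale
# equivalently: plogis(a + b * x)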

Binomial GLMs

Type 1 binomial
• Simply set family = binomial(link=logit)

Type 2 binomial
• First create a vector of % not parasitized.
• Then “cbind” into a matrix (% parasitized, % not parasitized).
• Then run your binomial glm (link = logit) with the matrix as your y (see the sketch below).
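
Following those steps on the workshop data, assuming the percentage column is called parasitism (the column name is a guess from the figure above):

notpara <- 100 - diane$parasitism        # % not parasitized
y <- cbind(diane$parasitism, notpara)    # matrix: (% parasitized, % not parasitized)
para.glm <- glm(y ~ size * treat, family = binomial(link = logit), data = diane)
summary(para.glm)                        # R may warn about non-integer counts when percentages are used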

Homework

1. Fit the binomial glm survival = size*treat

2. Fit the binomial glm parasitism = size*treat

3. Predict what size has 50% parasitism in treatment “0”

Recommended