colby-stoever
INSTITUTIONAL RESEARCH AND REGRESSION
ABSTRACT
• WORKSHOP 5: REGRESSION MODELS AND THEIR USES IN INSTITUTIONAL RESEARCH
THE WORKSHOP WILL NOT BE A STATISTICAL LECTURE ON THE MATHEMATICS UNDERLYING REGRESSION TECHNIQUES, AND IT WILL NOT BE ABOUT PROGRAMMING REGRESSION MODELS INTO MY FAVORITE STATISTICAL PROGRAMMING LANGUAGE. (ALTHOUGH AT DINNER, I WOULD LOVE TO DISCUSS THESE TOPICS). RATHER THIS WORKSHOP WILL INTRODUCE PARTICIPANTS TO THE WORLD OF REGRESSION BY DISCUSSING WHAT TYPES OF REGRESSION MODELS EXIST, WHAT TYPES OF RESEARCH QUESTIONS CAN BE PARTIALLY ANSWERED WITH THESE REGRESSION MODELS, WHERE A RESEARCHER CAN REALLY MESS UP, AND BASIC INTERPRETATION OF RESULTS FOR END USERS.
THE GOAL OF THIS WORKSHOP IS TO EDUCATE PARTICIPANTS ON WHICH REGRESSION MODELS THEY MAY WANT TO INVESTIGATE FURTHER TO BETTER THEIR IR OFFICES.
WHAT WE WILL NOT DO
• MATH IS OUT.
• SOMETIMES THERE WILL BE FORMULAS, BUT YOU WILL NOT HAVE TO REMEMBER OR
UNDERSTAND THEM.
• EACH OF THE TOPICS DISCUSSED IN THIS WORKSHOP COULD FILL A COURSE OR
AT LEAST TWO WEEKS OF LECTURES.
WHAT WE WILL DO
• LEARN ABOUT TYPES OF REGRESSION.
• BECOME AWARE OF WHAT CAN BE DONE. ******
• LEARN HOW TO INTERPRET INFORMATION FROM REGRESSION MODELS. (VERY
BASIC)
• LEARN WHAT RESEARCH QUESTIONS CAN AND CANNOT BE ANSWERED BY EACH
REGRESSION TYPE.
• LEARN ABOUT COMMON PROBLEMS.
• LEARN IR USES.
CONTENTS
• QUICK REVIEW OF INFORMATION
• LINEAR REGRESSION
• LOGISTIC REGRESSION
• MULTINOMIAL REGRESSION
• ORDINAL REGRESSION
• POISSON REGRESSION
• HIERARCHICAL LINEAR MODELING
• EVENT-HISTORY ANALYSIS
• REGRESSION-DISCONTINUITY DESIGN
• TIME SERIES REGRESSION (FORECASTING)
SCALES OF MEASUREMENT
• NOMINAL SCALE
• ORDINAL SCALE
• INTERVAL SCALE
• RATIO SCALE
• NEVER FORCE A METRIC INTO ANOTHER SCALE (NO MEDIAN SPLITS).
TERMS
• OUTCOME VARIABLES (DEPENDENT VARIABLES)
• PREDICTOR VARIABLES (INDEPENDENT VARIABLES)
CORRELATION VERSUS CAUSATION
• MOST IR RESEARCH DOES NOT AND OFTEN CANNOT EXPLAIN A CAUSAL
RELATIONSHIP.
• STATISTICS (NO MATTER HOW WELL DONE) DO NOT BY THEMSELVES PROVE
CAUSAL RELATIONSHIPS.
• ONLY RESEARCH METHODOLOGY CAN HELP PROVE CAUSATION.
• A TRUE EXPERIMENT IS OFTEN NOT POSSIBLE IN IR BECAUSE RANDOM ASSIGNMENT
IS NOT ALWAYS ETHICAL.
• CHOOSE YOUR LANGUAGE WISELY.
MEDIATION VERSUS MODERATION
• MEDIATION (MEDIATOR VARIABLES)- A THIRD VARIABLE THAT ACCOUNTS FOR THE
RELATIONSHIP BETWEEN A PREDICTOR AND AN OUTCOME VARIABLE (THE PREDICTOR
WORKS THROUGH THE MEDIATOR).
• MODERATION (MODERATOR VARIABLES)-INFLUENCES THE DIRECTION (SIGN) OR
STRENGTH OF A RELATIONSHIP BETWEEN A PREDICTOR AND OUTCOME
VARIABLE.
• SPECIAL TECHNIQUES ARE NEEDED TO DETERMINE MODERATION.
• BARON, R. M., & KENNY, D. A. (1986). THE MODERATOR-MEDIATOR VARIABLE DISTINCTION IN SOCIAL
PSYCHOLOGICAL RESEARCH: CONCEPTUAL, STRATEGIC, AND STATISTICAL CONSIDERATIONS. JOURNAL OF
PERSONALITY AND SOCIAL PSYCHOLOGY, 51, 1173-1182.
POWER
• TYPE I ERROR- INCORRECT REJECTION OF A TRUE NULL HYPOTHESIS
• TYPE II ERROR- THE FAILURE TO REJECT A FALSE NULL HYPOTHESIS
• POWER OF A STATISTICAL TEST IS THE PROBABILITY THAT THE TEST WILL REJECT
THE NULL HYPOTHESIS WHEN THE ALTERNATIVE HYPOTHESIS IS TRUE (I.E. THE
PROBABILITY OF NOT COMMITTING A TYPE II ERROR).
• IR HAS A LARGE POTENTIAL TO COMMIT BOTH TYPES OF ERROR.
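A MINIMAL SKETCH OF HOW POWER DEPENDS ON SAMPLE SIZE, ASSUMING A TWO-SIDED, TWO-SAMPLE Z-TEST WITH EQUAL GROUP SIZES AND EQUAL VARIANCES. THE FUNCTION NAME `power_two_sample_z`, THE EFFECT SIZE OF 0.2, AND THE GROUP SIZES ARE ILLUSTRATIVE ASSUMPTIONS, NOT VALUES FROM THE WORKSHOP.

```python
from statistics import NormalDist

def power_two_sample_z(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    d is the standardized mean difference (Cohen's d); both groups are
    assumed to have n_per_group observations and equal variance.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)          # critical value for alpha
    shift = abs(d) * (n_per_group / 2) ** 0.5  # expected z under the alternative
    # Probability of landing in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

small_study = power_two_sample_z(d=0.2, n_per_group=50)
large_study = power_two_sample_z(d=0.2, n_per_group=5000)
print(f"power with 50 per group:   {small_study:.2f}")
print(f"power with 5000 per group: {large_study:.2f}")
```

WITH A SMALL SAMPLE THE TEST WILL USUALLY MISS A REAL BUT SMALL EFFECT (TYPE II ERROR); WITH A VERY LARGE SAMPLE IT WILL ALMOST ALWAYS REJECT THE NULL HYPOTHESIS.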
THE EARTH IS ROUND (P < .05) - JACOB COHEN
PRACTICAL VERSUS STATISTICAL SIGNIFICANCE
• P<0.05, P<.001, OR P<.00001: WHICH IS BETTER?
• P-VALUES DEPEND ON SAMPLE SIZE (N).
• THE LARGER THE N, THE MORE LIKELY A RESULT IS TO BE STATISTICALLY SIGNIFICANT.
• P-VALUES ARE NOT DIRECTLY RELATED TO THE SIZE OF A RELATIONSHIP OR
EFFECT.
• EFFECT SIZE STATISTICS LIKE r², R², COHEN’S d, AND OMEGA-SQUARED DESCRIBE
THE SIZE OF A RELATIONSHIP OR EFFECT.
• IR RESEARCHERS NEED TO BE CAREFUL ABOUT OVERPOWERED MODELS.
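A SHORT SKETCH OF WHY P-VALUES AND EFFECT SIZES ARE DIFFERENT THINGS: THE SAME TINY EFFECT (COHEN'S d = 0.05) IS TESTED AT SEVERAL SAMPLE SIZES. IT ASSUMES A TWO-SAMPLE Z-TEST WITH EQUAL GROUP SIZES; THE FUNCTION NAME `two_sided_p` AND ALL NUMBERS ARE HYPOTHETICAL.

```python
import math

def two_sided_p(d, n_per_group):
    """Two-sided p-value of a two-sample z-test for a fixed Cohen's d."""
    z = abs(d) * math.sqrt(n_per_group / 2)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

d = 0.05  # a tiny standardized difference, held constant
for n in (100, 1_000, 10_000, 100_000):
    print(f"n per group = {n:>7}: d = {d}, p = {two_sided_p(d, n):.4g}")
```

THE EFFECT SIZE NEVER CHANGES, BUT THE P-VALUE SHRINKS AS N GROWS. THIS IS THE OVERPOWERED-MODEL PROBLEM.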
ANOVA, ANCOVA, MANOVA VS REGRESSION
• ANOVA, ANCOVA, & MANOVA ARE SPECIAL FORMS OF REGRESSION.
• NO STATISTIC PROVES CAUSATION.
• ANOVA, ANCOVA, MANOVA DO NOT PROVE CAUSATION.
• REGRESSION IS NOT JUST CORRELATION. IT CAN HELP SUPPORT A CAUSAL ARGUMENT
WHEN PAIRED WITH THE RIGHT RESEARCH METHODOLOGY.
LINEAR REGRESSION
WHAT DOES LINEAR REGRESSION TELL US?
• LINEAR REGRESSION HELPS US UNDERSTAND HOW ONE VARIABLE RELATES TO ONE OR MANY VARIABLES.
• FOR EXAMPLE, HOW DO STUDENTS' SAT MATH SCORES RELATE TO SES, GENDER, AND HIGH SCHOOL GPA?
• LINEAR REGRESSION CAN HELP PREDICT ONE VARIABLE GIVEN ONE OR MULTIPLE VARIABLES.
• LINEAR REGRESSION CAN TELL US HOW MUCH SEVERAL VARIABLES RELATE TO ONE VARIABLE.
• LINEAR REGRESSION CAN TELL US IF, HOW, AND HOW MUCH VARIABLES RELATE TO A VARIABLE.
LINEAR REGRESSION
•OUTCOME VARIABLES (DEPENDENT VARIABLES)
MUST BE INTERVAL OR RATIO SCALE.
• PREDICTOR VARIABLES (INDEPENDENT VARIABLES)
CAN BE ALL TYPES OF VARIABLES.
DUMMY CODING AND REFERENCES
• PREDICTOR VARIABLES (INDEPENDENT VARIABLES) CAN BE ALL TYPES OF
VARIABLES.
• BUT…..
• NOMINAL VARIABLES LIKE GENDER ARE EASY BECAUSE THEY ARE DICHOTOMOUS (TWO
CHOICES).
• ETHNICITY IS NOT SO EASY: IF YOU HAVE 5 ETHNICITIES, YOUR REGRESSION
MODEL WILL NEED 4 DUMMY VARIABLES + 1 REFERENCE GROUP.
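A MINIMAL SKETCH OF DUMMY CODING, ASSUMING FIVE HYPOTHETICAL CATEGORY LABELS ("A" THROUGH "E") WITH "A" CHOSEN AS THE REFERENCE GROUP. THE HELPER NAME `dummy_code` AND THE COLUMN NAMES ARE ILLUSTRATIVE ONLY.

```python
def dummy_code(values, reference):
    """Return one 0/1 column per category except the reference group."""
    categories = sorted(set(values) - {reference})
    return {
        f"is_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }

# Hypothetical category labels, for illustration only.
ethnicity = ["A", "B", "C", "D", "E", "A", "C"]
columns = dummy_code(ethnicity, reference="A")
for name, column in columns.items():
    print(name, column)
```

MEMBERS OF THE REFERENCE GROUP ARE THE ROWS THAT ARE 0 ON ALL FOUR COLUMNS, SO EACH COEFFICIENT IS READ AS A DIFFERENCE FROM THAT GROUP.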
LINEAR REGRESSION
• $Y = a + b_1 X_1 + b_2 X_2$ (UNSTANDARDIZED EQUATION)
• $Y = b_1 X_1 + b_2 X_2$ (STANDARDIZED EQUATION)
• $b$ = REGRESSION COEFFICIENT
• $a$ = CONSTANT
• $X$ = ACTUAL VALUES
• $Y$ = PREDICTED VALUE OF THE OUTCOME VARIABLE
UNSTANDARDIZED EQUATION
• $Y = a + b_1 X_1 + b_2 X_2$
• ALL COEFFICIENTS ARE IN TERMS OF (THE UNITS OF) THE OUTCOME VARIABLE.
• YOU CAN PLUG IN ACTUAL VALUES OF THE PREDICTOR VARIABLES AND
GET A PREDICTED VALUE OF THE OUTCOME VARIABLE.
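PLUGGING VALUES INTO THE UNSTANDARDIZED EQUATION CAN BE SKETCHED AS FOLLOWS. THE MODEL (FIRST-YEAR GPA PREDICTED FROM HIGH SCHOOL GPA AND NUMBER OF AP COURSES), THE CONSTANT, AND THE COEFFICIENTS ARE ALL MADE UP FOR ILLUSTRATION; `predict` IS A HYPOTHETICAL HELPER.

```python
def predict(a, coefficients, values):
    """Plug actual predictor values into Y = a + b1*X1 + b2*X2 + ..."""
    return a + sum(b * x for b, x in zip(coefficients, values))

# Hypothetical model: predicted first-year GPA from high school GPA (X1)
# and number of AP courses (X2). None of these numbers come from real data.
a = 0.50
b = [0.70, 0.05]
predicted_gpa = predict(a, b, [3.2, 4])
print(round(predicted_gpa, 2))
```

BECAUSE THE COEFFICIENTS ARE IN GPA POINTS, THE 0.70 READS AS "0.70 GPA POINTS PER ONE-POINT INCREASE IN HIGH SCHOOL GPA, HOLDING AP COURSES CONSTANT."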
STANDARDIZED EQUATION
• $Y = b_1 X_1 + b_2 X_2$ (ALL VARIABLES ARE CONVERTED TO Z-SCORES, SO THERE IS NO CONSTANT)
• STANDARDIZED REGRESSION COEFFICIENTS USUALLY FALL BETWEEN -1 AND 1.
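A SMALL SKETCH OF HOW AN UNSTANDARDIZED COEFFICIENT BECOMES A STANDARDIZED ONE, ASSUMING A SINGLE PREDICTOR (WHERE THE STANDARDIZED COEFFICIENT EQUALS THE CORRELATION). THE DATA AND THE FUNCTION NAMES `slope` AND `standardized_slope` ARE HYPOTHETICAL.

```python
from statistics import mean, stdev

def slope(x, y):
    """Unstandardized least-squares slope b for a single predictor."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def standardized_slope(x, y):
    """Standardized coefficient: b rescaled by the ratio of standard deviations."""
    return slope(x, y) * stdev(x) / stdev(y)

# Hypothetical data: high school GPA and SAT math score.
hs_gpa = [2.5, 3.0, 3.2, 3.6, 3.9]
sat_math = [450, 520, 500, 610, 640]

print("unstandardized b:", round(slope(hs_gpa, sat_math), 1))
print("standardized b:  ", round(standardized_slope(hs_gpa, sat_math), 3))
```

THE UNSTANDARDIZED SLOPE CHANGES IF SAT SCORES ARE RESCALED; THE STANDARDIZED SLOPE DOES NOT, WHICH IS WHY IT IS USED TO COMPARE PREDICTORS MEASURED IN DIFFERENT UNITS.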
ASSUMPTIONS OF LINEAR REGRESSION
• INDEPENDENCE
• NORMALITY
• HOMOSCEDASTICITY
• LINEARITY
• NO MULTICOLLINEARITY
COOL STUFF
• MULTIPLE R²
• DIFFERENCE IN R²
• PARTIAL CORRELATIONS
• SEMI-PARTIAL CORRELATIONS
HTTP://PAGES.UOREGON.EDU/STEVENSJ/MRA/PARTIAL.PDF
EXAMPLE
• THE OVERALL MODEL USING THE ELA EXIT LEVEL TAKS AS THE VARIABLE OF INTEREST WAS SIGNIFICANT (F(3, 232687) = 4551.81, P < .0001, ADJUSTED R² = .0554).
• THE MODEL USING ELA TAKS SCALE SCORES FOUND THAT MALES (β = .119, P < .0001, SR² = 0.013) AND STUDENTS WHO PARTICIPATED IN FREE/REDUCED LUNCH PROGRAMS (β = -.205, P < .0001, SR² = 0.041) WERE PREDICTED TO HAVE STATISTICALLY LOWER SCORES ON THE ELA TAKS.
• GENDER EXPLAINED ONLY ABOUT 1% OF THE VARIANCE IN THE ELA TAKS AND SHOULD BE VIEWED AS A USELESS VARIABLE IN PREDICTING ELA TAKS SCORES.
• STATISTICALLY, AVID H.S. GRADUATES WERE PREDICTED TO HAVE HIGHER ELA TAKS SCORES THAN NON-AVID H.S. GRADUATES (β = .026, P < .0001, SR² = 0.0006); HOWEVER, SINCE ONLY 0.06% OF THE VARIANCE IN THE TAKS TEST WAS EXPLAINED BY AVID PARTICIPATION, READERS SHOULD VIEW AVID AND NON-AVID H.S. GRADUATES AS PERFORMING THE SAME ON THE ELA EXIT LEVEL TAKS.
LINEAR REGRESSION AND IR
• IT IS GREAT IF YOU HAVE AN OUTCOME VARIABLE THAT IS IN INTERVAL OR
RATIO SCALE.
• MOST INDIVIDUALS CAN UNDERSTAND THE INFORMATION PRODUCED.
• HOWEVER, MANY IR OUTCOME VARIABLES ARE NOT INTERVAL OR RATIO SCALE.
WARNING: LINEAR REGRESSION
• SOMETIMES RELATIONSHIPS ARE NOT LINEAR.
• PREDICTOR VARIABLES THAT ARE HIGHLY RELATED TO EACH OTHER CAN BE A
PROBLEM.
• SMALLER IS BETTER. --- PARSIMONIOUS MODELS ARE BETTER.
• OUTCOME VARIABLES THAT ARE BINARY OR CLASSIFICATION VARIABLES ARE A PROBLEM FOR LINEAR REGRESSION.
• OVERPOWERED MODELS.
LOGISTIC REGRESSION
• WE USE LOGISTIC REGRESSION TO PREDICT DICHOTOMOUS OUTCOMES.
• $\ln\left(\frac{p}{1-p}\right) = b_0 + b_1 X$, WHERE $p$ IS THE PROBABILITY OF THE OUTCOME OCCURRING
• UNLIKE LINEAR REGRESSION, THERE IS NO STANDARDIZED MODEL.
• UNFORTUNATELY, THE UNSTANDARDIZED MODEL IS NOT AS EASY TO
UNDERSTAND AS LINEAR REGRESSION.
• ALSO, THE MAGIC OF R² DOES NOT EXIST FOR LOGISTIC REGRESSION.
• IN FACT, HYPOTHESIS TESTING IS DONE USING A CHI-SQUARE TEST.
LOGISTIC VS LINEAR REGRESSION
EXAMPLE
LOGITS, ODDS, AND PROBABILITIES
PROBABILITIES
• PROBABILITY IS THE LIKELIHOOD OF AN EVENT (OR THING) OCCURRING.
• THE NUMBER OF WAYS THE EVENT CAN OCCUR / THE NUMBER OF POSSIBLE OUTCOMES
• HEADS ON A COIN FLIP: 1/2 = .5
• SIX ON A DIE: 1/6 = .167
• PROBABILITIES RANGE FROM 0 TO 1.
• 0: NO CHANCE OF THE EVENT OCCURRING (ROLLING A 7 ON A SINGLE DIE)
• 0.5: EQUAL CHANCES OF THE EVENT OCCURRING OR NOT OCCURRING
• 1: NO CHANCE OF THE EVENT NOT OCCURRING (1 PICK IN A DRAW)
LOGITS
• LOGITS ARE THE NATURAL OUTPUT OF LOGISTIC REGRESSION.
• THEY TURN A NON-CONTINUOUS OUTCOME INTO A CONTINUOUS ONE.
• THEY ARE COMPUTED THROUGH THE FORMULA
• LOGIT(PROMOTION) = .39(PUBS) - 6.00
• 4 PUBS: LOGIT = .39(4) - 6.00 = -4.44
• LOGITS RANGE FROM -∞ TO +∞.
• THEY ARE NOT EASY TO UNDERSTAND OR EXPLAIN.
ODDS
• ODDS ARE EXP(LOGIT).
• ODDS RANGE FROM 0 TO ∞.
• FROM 0 TO 1: 0 MEANS NO CHANCE OF THE EVENT OCCURRING.
• 1 MEANS A 1-TO-1 CHANCE (A COIN FLIP).
• FROM 1 TO ∞: NOW WE ARE TALKING ABOUT AN INCREASED LIKELIHOOD OF THE EVENT
OCCURRING.
• ABOVE 1 IS EASY TO INTERPRET.
• BELOW 1 IS NOT EASY TO INTERPRET.
LOGITS, ODDS, AND PROBABILITIES
• LOGITS ARE NOT INTUITIVE TO INDIVIDUALS OUTSIDE OF STATISTICS.
• ODDS ARE EASY TO INTERPRET IF ABOVE ONE.
• PROBABILITIES CAN BE HARD FOR PEOPLE TO UNDERSTAND AS WELL.
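THE THREE SCALES ARE THE SAME INFORMATION EXPRESSED DIFFERENTLY. A MINIMAL SKETCH OF THE CONVERSIONS, USING THE PROMOTION FORMULA FROM THE LOGITS SLIDE; THE FUNCTION NAMES AND THE PUBLICATION COUNTS OTHER THAN 4 ARE ASSUMPTIONS ADDED FOR ILLUSTRATION.

```python
import math

def logit_promotion(pubs):
    """Logit from the promotion example: .39 * publications - 6.00."""
    return 0.39 * pubs - 6.00

def logit_to_odds(logit):
    """Odds are exp(logit)."""
    return math.exp(logit)

def odds_to_probability(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1 + odds)

for pubs in (4, 15, 25):
    logit = logit_promotion(pubs)
    odds = logit_to_odds(logit)
    prob = odds_to_probability(odds)
    print(f"{pubs:>2} pubs: logit = {logit:6.2f}, odds = {odds:8.3f}, p = {prob:.3f}")

# Odds ratio for one additional publication: exp(.39).
print("odds ratio per publication:", round(math.exp(0.39), 2))
```

THE ODDS RATIO IS USUALLY THE NUMBER TO REPORT: EACH ADDITIONAL PUBLICATION MULTIPLIES THE ODDS OF PROMOTION BY EXP(.39), REGARDLESS OF THE STARTING POINT.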
COOL STUFF
• DIFFERENCE IN CHI-SQUARE BETWEEN MODELS
• COX AND SNELL INDEX
• NAGELKERKE INDEX
• NON-CENTRALITY PARAMETER
LOGISTIC REGRESSION AND IR
• MANY ISSUES IR WANTS TO STUDY ARE DICHOTOMOUS.
• PRETTY EASY FOR MOST INDIVIDUALS TO UNDERSTAND WHEN ODDS RATIOS
ARE USED.
MULTINOMIAL LOGISTIC REGRESSION
• WHAT IF YOUR OUTCOME VARIABLE IS NOMINAL BUT NOT DICHOTOMOUS?
• LETTER GRADES IN A COURSE.
• HIGH, MEDIUM, LOW
• MULTINOMIAL LOGISTIC REGRESSION CAN TELL YOU HOW LIKELY INDIVIDUALS ARE
TO FALL INTO EACH GROUP OF YOUR OUTCOME VARIABLE.
• I JUST WANT YOU TO BE AWARE OF THE EXISTENCE OF MULTINOMIAL
REGRESSION.
ORDINAL LOGISTIC REGRESSION
• WHAT IF YOUR OUTCOME VARIABLE IS ORDINAL (RANK)?
• CLASS RANK, PLACES IN A RACE OR TOURNAMENT
• I HAVE NEVER DONE THIS BEFORE
POISSON REGRESSION
• COUNT DATA.
• PREDICTS NUMBER OF EVENTS THAT OCCUR IN A SPECIFIC TIME PERIOD FROM
ONE OR MORE INDEPENDENT VARIABLES.
• WORKS EVEN WHEN EVENTS ARE RARE OR MANY PEOPLE HAVE ZERO EVENTS
• GREAT FOR ATTENDANCE DATA, OR WHEN DEATH IS THE EVENT
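A MINIMAL SKETCH OF WHAT A POISSON REGRESSION PREDICTS, ASSUMING THE USUAL LOG LINK. THE COEFFICIENTS, THE PREDICTOR (WEEKLY WORK HOURS), AND THE OUTCOME (ABSENCES PER SEMESTER) ARE HYPOTHETICAL AND NOT ESTIMATED FROM ANY DATA.

```python
import math

def expected_count(b0, b1, x):
    """Poisson regression uses a log link: log(mu) = b0 + b1 * x."""
    return math.exp(b0 + b1 * x)

def poisson_pmf(k, mu):
    """Probability of observing exactly k events when the mean is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Hypothetical coefficients: absences per semester predicted from
# weekly work hours. Not estimated from any real data.
b0, b1 = -1.0, 0.04
for hours in (0, 20, 40):
    mu = expected_count(b0, b1, hours)
    print(f"{hours:>2} work hours: expected absences = {mu:.2f}, "
          f"P(zero absences) = {poisson_pmf(0, mu):.2f}")
```

THE MODEL HANDLES THE "MANY PEOPLE HAVE ZERO EVENTS" CASE NATURALLY: A LOW EXPECTED COUNT IMPLIES A HIGH PROBABILITY OF ZERO.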
EXAMPLE
HIERARCHICAL LINEAR MODEL
HIERARCHICAL LINEAR MODELS
• HIERARCHICAL LINEAR MODELS- REGRESSION ANALYSIS MODELS THAT
CONTAIN PREDICTORS MEASURED AT MORE THAN ONE LEVEL OF AGGREGATION
OF DATA (COHEN, ET AL. , 2003)
• ALSO CALLED MULTILEVEL MODELS.
• LINEAR AND LOGISTIC REGRESSION MODELS ASSUME INDEPENDENCE OF
OBSERVATIONS.
• ARE ANY OBSERVATIONS EVER TRULY INDEPENDENT?
HIERARCHICAL LINEAR MODELS
• STUDENTS → CLASSES → COLLEGE →UNIVERSITY
• CLUSTERING OF OBSERVATIONS IS COMMON.
• WHAT ARE SOME COMMON CLUSTERS?
HIERARCHICAL LINEAR MODELS
• INTRACLASS CORRELATION
• ICC IS THE PROPORTION OF VARIANCE IN THE OUTCOME ATTRIBUTABLE TO THE
GROUPING.
• RANGES FROM 0 TO 1.
• 0 IS SMALL AND 1 IS LARGE.
• SOME RESEARCHERS SAY EVEN A SMALL ICC (0.05) MEANS THE OBSERVATIONS ARE
DEPENDENT.
• THE FORMULA IS LARGE AND NOT NEEDED FOR THIS WORKSHOP.
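FOR THE CURIOUS, ONE COMMON WAY TO ESTIMATE THE ICC CAN BE SKETCHED IN A FEW LINES. THIS ASSUMES EQUAL-SIZED CLUSTERS AND THE ONE-WAY ANOVA ESTIMATOR, WHICH IS ONLY ONE OF SEVERAL ICC DEFINITIONS; THE EXAM SCORES AND THE FUNCTION NAME `icc_oneway` ARE HYPOTHETICAL.

```python
from statistics import mean

def icc_oneway(clusters):
    """ICC(1) from a one-way ANOVA; assumes equal-sized clusters."""
    k = len(clusters[0])  # observations per cluster
    j = len(clusters)     # number of clusters
    grand = mean(v for c in clusters for v in c)
    ms_between = k * sum((mean(c) - grand) ** 2 for c in clusters) / (j - 1)
    ms_within = sum((v - mean(c)) ** 2 for c in clusters for v in c) / (j * (k - 1))
    # This sample estimate can dip below 0 when clusters barely differ.
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical exam scores for four students in each of three classes.
scores_by_class = [
    [70, 72, 71, 69],
    [80, 82, 79, 81],
    [90, 88, 91, 89],
]
icc = icc_oneway(scores_by_class)
print(f"ICC = {icc:.3f}")
```

HERE ALMOST ALL OF THE VARIANCE IS BETWEEN CLASSES, SO THE ICC IS CLOSE TO 1 AND TREATING THE STUDENTS AS INDEPENDENT OBSERVATIONS WOULD BE MISLEADING.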
REMEMBER: HLM IS ALSO CALLED MULTILEVEL MODELING.
BASICALLY, HLM ASSUMES EACH CLUSTER IN YOUR DATA HAS A DIFFERENT REGRESSION LINE.
HLM TAKES ALL OF THESE REGRESSION LINES AND AVERAGES THEM TOGETHER.
HIERARCHICAL LINEAR MODELS
• FOR EXAMPLE – I WANT TO PREDICT MATH ACHIEVEMENT GIVEN MATH ANXIETY
• HLM—MORE THAN ONE LEVEL, MORE THAN ONE EQUATION
• LEVEL ONE IS LIKE NORMAL LINEAR REGRESSION
• IT REPRESENTS THE LOWEST LEVEL (THE TRUE UNIT OF ANALYSIS)—THE INDIVIDUAL
STUDENT –PARTICIPANT
• ONE-LEVEL REGRESSION IS A SINGLE EQUATION CONTAINING COEFFICIENTS
• LEVEL TWO IS STUDENTS NESTED WITHIN CLASSES
• REGRESSION EQUATIONS FOR REGRESSION COEFFICIENTS
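THE "ONE REGRESSION LINE PER CLUSTER, THEN AVERAGE" INTUITION CAN BE SKETCHED AS BELOW. THIS IS A DELIBERATE SIMPLIFICATION: REAL HLM SOFTWARE ESTIMATES BOTH LEVELS JOINTLY AND WEIGHTS CLUSTERS BY THEIR RELIABILITY RATHER THAN TAKING A PLAIN AVERAGE. THE CLASSES, ANXIETY SCORES, AND ACHIEVEMENT SCORES ARE MADE UP.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares (intercept, slope) for one cluster: the level-one equation."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical classes: (math anxiety, math achievement) per student.
classes = {
    "class_1": ([1, 2, 3, 4], [90, 85, 80, 75]),
    "class_2": ([1, 2, 3, 4], [80, 78, 76, 74]),
    "class_3": ([2, 3, 4, 5], [70, 62, 54, 46]),
}

# Level one: a separate regression line for every class.
lines = {name: fit_line(x, y) for name, (x, y) in classes.items()}

# Level two (simplified): treat the coefficients themselves as data.
avg_intercept = mean(a for a, _ in lines.values())
avg_slope = mean(b for _, b in lines.values())

for name, (a, b) in lines.items():
    print(f"{name}: intercept = {a:.1f}, slope = {b:.1f}")
print(f"average intercept = {avg_intercept:.1f}, average slope = {avg_slope:.1f}")
```

A FULL LEVEL-TWO MODEL WOULD GO ONE STEP FURTHER AND PREDICT EACH CLASS'S INTERCEPT AND SLOPE FROM CLASS-LEVEL VARIABLES.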
HLM AND IR
• HLM IS DIFFICULT TO LEARN, EVEN FOR A SEASONED RESEARCHER.
• NESTING WITHIN AN ORGANIZATION CAN BE IMPORTANT.
• HOWEVER, THINK ABOUT HAVING TO EXPLAIN THE IDEA OF AN “AVERAGE OF REGRESSION
COEFFICIENTS” TO END USERS.
EVENT HISTORY ANALYSIS (SURVIVAL ANALYSIS)
EVENT HISTORY ANALYSIS
• LINEAR, LOGISTIC, AND HLM MODELS USE SNAPSHOT DATA.
• THEY ONLY LOOK AT ONE POINT IN TIME.
• WHAT IF YOU NEED TO KNOW WHEN AN EVENT OCCURS OR WHEN IT IS LIKELY
TO OCCUR?
• EVENT HISTORY ANALYSIS IS LOGISTIC REGRESSION WITH TIME AS A VARIABLE.
EVENT HISTORY ANALYSIS
• EVENTS ARE TRANSITIONS IN STATUS (CHANGES FROM ONE STATE TO
ANOTHER)
• METHODOLOGICAL FEATURES OF DATA AND EVENT OCCURRENCE DESIGNS
• TARGET EVENT
• OCCURRENCE OF A PARTICULAR (WELL DEFINED) EVENT IS THE FOCUS OF STUDY
• BEGINNING OF TIME/ BEFORE THE EVENT
• THE POINT THE STUDY STARTS OR WHEN NO ONE HAS YET EXPERIENCED THE TARGET
EVENT
• END OF TIME
• AN EVENT TIME FOR EACH SUBJECT
EVENT HISTORY ANALYSIS
• DISCRETE-TIME ANALYSIS
• CONTINUOUS-TIME ANALYSIS
• WE WILL TALK MAINLY ABOUT DISCRETE-TIME ANALYSIS BECAUSE IR WOULD
MAINLY USE IT.
EVENT HISTORY ANALYSIS
• CENSORING
• LEFT CENSORING - EVENT OCCURRED BEFORE THE STUDY STARTED
• RIGHT CENSORING - EVENT HAD NOT OCCURRED BY THE TIME THE STUDY ENDED (IT MAY OCCUR LATER)
• INTERVAL CENSORING - INDIVIDUAL IS REMOVED FROM RISK AT THE TIME OF
A DIFFERENT EVENT
• LIFE TABLES
• EXAMPLE DEVELOPMENTAL EDUCATION – THE MOVE FROM DEVELOPMENTAL
EDUCATION TO COLLEGE READINESS
• 100 DE STUDENTS START COLLEGE
EVENT HISTORY ANALYSIS
• DISCRETE-TIME HAZARD $p$ - THE ESTIMATED PROBABILITY OF DROPOUT IN AN
INTERVAL, AMONG THOSE STILL ENROLLED AT ITS START
• 20 DROPPED OUT AND 80 REMAIN AFTER ONE SEMESTER: 20/100 = .20
• 20 DROPPED OUT AND 60 REMAIN AFTER TWO SEMESTERS: 20/80 = .25
• SURVIVOR FUNCTION - THE PROPORTION SURVIVING THROUGH THE PREVIOUS
PERIOD TIMES THE PROPORTION WHO SURVIVE THE CURRENT PERIOD (.80 × .75 = .60 AFTER TWO SEMESTERS)
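THE LIFE-TABLE ARITHMETIC CAN BE SKETCHED AS FOLLOWS, USING THE 100 DEVELOPMENTAL EDUCATION STUDENTS FROM THE EXAMPLE. THE FUNCTION NAME `life_table` IS AN ASSUMPTION, AND THE SKETCH IGNORES CENSORING TO KEEP THE LOGIC VISIBLE.

```python
def life_table(at_risk_start, events_per_period):
    """Return a (hazard, survival) pair for each period from event counts."""
    rows = []
    at_risk = at_risk_start
    survival = 1.0
    for events in events_per_period:
        hazard = events / at_risk  # P(event in this period | still at risk)
        survival *= 1 - hazard     # cumulative proportion surviving
        rows.append((hazard, survival))
        at_risk -= events          # no censoring in this simplified sketch
    return rows

# 100 students start; 20 drop out in each of the first two semesters.
rows = life_table(100, [20, 20])
for period, (hazard, survival) in enumerate(rows, start=1):
    print(f"semester {period}: hazard = {hazard:.2f}, survival = {survival:.2f}")
```

NOTE THAT THE SAME 20 DROPOUTS PRODUCE A HIGHER HAZARD IN THE SECOND SEMESTER BECAUSE FEWER STUDENTS ARE STILL AT RISK.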
EVENT HISTORY ANALYSIS
• ONE CAN ALSO ADD OTHER VARIABLES TO SEE HOW THEY INFLUENCE
SURVIVAL.
• FOR EXAMPLE, YOU CAN LEARN WHETHER MALES ARE MORE LIKELY THAN THE
BASELINE RATE TO DROP OUT IN THE SECOND SEMESTER.
EVENT HISTORY ANALYSIS AND IR
• EASY TO LEARN
• CAN ANSWER IMPORTANT QUESTIONS FOR LEADERSHIP
• REGRESSION WITH TIME, NOT JUST ONE POINT IN TIME: WHEN IS THE CRITICAL
TIME POINT?
REGRESSION-DISCONTINUITY DESIGN
OTHER
• STRUCTURAL EQUATION MODELING
• FORECASTING.