Logistic Regression

DESCRIPTION

The complete method of doing logistic regression. Download this and prepare the best predictive models.

Step 1: Descriptive Analysis
Step 2: Univariate Analysis
Step 3: Testing of Collinearity
Step 4: Multivariable Analysis
Step 5: Model Diagnostics

Descriptive Analysis

1. Gain an understanding of the distribution of data for each variable.
2. For continuous variables, create histograms or box plots. For categorical variables, create frequency tables and bar charts.
(a) If quantitative variables are skewed, consider categorisation (or transformation).
(b) Exclude variables that have (i) little variability or (ii) a high number of missing values.
3. Next, examine the independent variables (IDVs) against the dependent variable (DV), that is, understand the distribution of the explanatory variables for each category of the outcome.
(a) For categorical variables, create contingency tables with the outcome, or bar charts stratified by the outcome variable.
(b) For continuous variables, estimate means and standard deviations, or create box plots, for each category of the outcome.
(c) Categorical variables with cells of low or zero frequency are not suitable for a chi-square test, so collapse some of the categories together or delete the category altogether.

Univariate Analysis

1. Test the association of one explanatory variable at a time with the outcome, without worrying about other variables. This is essential for shortlisting the variables for multivariable analysis: variables that show no significant association of their own with the outcome are unlikely to show one after adjusting for other variables.
2. The results of univariate logistic regression are: Wald statistics, maximum likelihood ratio, p-values, parameter estimates and standard errors, and odds ratios with their confidence limits.
3. The p-value gives a probability only and conveys no idea of the magnitude or variability of the association. The values of the parameter estimates are not very intuitive because they are calculated on a log scale. Odds ratio = 1 (no association), >1 (+ve association), <1 (-ve association) [...]

[...] >1) there is potential over-dispersion.
3. If the chi square is equal, the model perfectly fits the data.
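
To make the univariate step more concrete, here is a minimal Python sketch for the simplest case: one binary explanatory variable and a binary outcome. In that case the univariate logistic regression estimate can be read directly off the 2x2 contingency table, so no fitting library is needed. The function names (two_by_two, univariate_odds_ratio), the 0/1 coding and the example data are all hypothetical and are not taken from the original notes. The confidence limits use the usual Wald (normal) approximation.

```python
import math
from collections import Counter


def two_by_two(exposure, outcome):
    """Cross-tabulate two 0/1 sequences into the four cell counts."""
    counts = Counter(zip(exposure, outcome))
    a = counts[(1, 1)]  # explanatory = 1, outcome = 1
    b = counts[(1, 0)]  # explanatory = 1, outcome = 0
    c = counts[(0, 1)]  # explanatory = 0, outcome = 1
    d = counts[(0, 0)]  # explanatory = 0, outcome = 0
    return a, b, c, d


def univariate_odds_ratio(a, b, c, d, z_crit=1.959964):
    """Parameter estimate, standard error, odds ratio, 95% limits, Wald test."""
    if min(a, b, c, d) == 0:
        # Mirrors the descriptive step: zero cells need collapsing first.
        raise ValueError("zero cell: collapse or drop categories first")
    log_or = math.log((a * d) / (b * c))           # estimate on the log scale
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    wald_z = log_or / se
    p_value = math.erfc(abs(wald_z) / math.sqrt(2))  # two-sided normal p-value
    return {
        "estimate": log_or,
        "std_error": se,
        "odds_ratio": math.exp(log_or),
        "ci_low": math.exp(log_or - z_crit * se),
        "ci_high": math.exp(log_or + z_crit * se),
        "wald_chi2": wald_z ** 2,
        "p_value": p_value,
    }


# Hypothetical data: 50 subjects with the explanatory variable = 1, 50 with = 0.
exposure = [1] * 50 + [0] * 50
outcome = [1] * 30 + [0] * 20 + [1] * 15 + [0] * 35

a, b, c, d = two_by_two(exposure, outcome)
result = univariate_odds_ratio(a, b, c, d)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Reading the output alongside the notes: the estimate is on the log scale, the odds ratio is its exponential, and the confidence limits show the variability that the p-value alone does not convey. For continuous or multi-category explanatory variables, a fitted model from a statistics package would be needed instead of this closed form.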