Bio Statics

  • 7/27/2019 Bio Statics

    Good morning

    Biostatistics

    Ashfaq Yaqoob

    18.01.2010

    Introduction

    Any science needs precision for its development.

    For precision, facts, observations or measurements have to be expressed in figures.

    It has been said: "When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind." - Lord Kelvin

    Similarly in medicine, be it diagnosis, treatment or research, everything depends on measurement. E.g. you have to count the number of missing teeth or measure the vertical dimension and express it as a number so that it makes sense.

    A statistic, or datum, means a measured or counted fact or piece of information stated as a figure, such as the height of one person or the birth weight of a baby.

    Statistics, or data, is the plural of the same.

    Statistics is the science of figures.

    It is a field of study concerned with the techniques (methods) of collection of data, classification, summarizing, interpretation, drawing inferences, testing hypotheses and making recommendations.

    Biostatistics is the term used when the tools of statistics are applied to data derived from the biological sciences.

    Data: discrete observations of attributes or events that carry little meaning when considered alone.

    Information is data that has been reduced and adjusted for variations such as age and sex, so that comparisons over time and place are possible.

    Intelligence is the transformation of information through integration and processing with experience and perceptions based on social and political values.

    Any measurable characteristic of a population is called a parameter.

    Statistics used to summarize, or describe, the characteristics of a sample are called descriptive statistics.

    Statistical procedures that are used to make inferences (i.e., draw conclusions) about the population that the sample represents are called inferential statistics.

    Descriptive statistics

    In the real world, we usually cannot study every member of an entire population.

    Instead, we must select a sample in the hope that it will serve as a representative surrogate.

    A sample can be used to estimate quantities in the population as a whole.

    Sampling variation is minimized by:

    adequate sample size

    proper sampling techniques
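
    The effect of sample size on sampling variation can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a method from the slides: the population of 10,000 normally distributed values (mean 50, SD 10), the seed and the sample sizes are hypothetical, and sampling variation is measured as the standard deviation of the means of 200 repeated simple random samples.

```python
import random
import statistics


def spread_of_sample_means(population, n, repeats, rng):
    """Draw `repeats` simple random samples of size n and return the
    standard deviation of their means (a measure of sampling variation)."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)


rng = random.Random(42)  # fixed seed so the run is reproducible
# Hypothetical population of 10,000 values
population = [rng.gauss(50, 10) for _ in range(10_000)]

for n in (10, 100, 1000):
    print(n, round(spread_of_sample_means(population, n, 200, rng), 2))
```

    The spread of the sample means shrinks roughly in proportion to 1/√n (the standard error of the mean is σ/√n), which is why an adequate sample size reduces sampling variation.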

    Non-random sampling: easier and more convenient to perform

    Random sampling

    In random sampling (also called probability sampling), every member of the sampling frame has a known, non-zero probability of being chosen; in simple random sampling this probability is the same for everyone.

    Non-random sampling (also called non-probability sampling) does not aim to give every member of the sampling frame a known chance of selection, but is usually easier and more convenient to perform.

    Convenience or opportunistic sampling is the crudest type of non-random sampling.

    This involves selecting the most convenient group available (e.g. using the first 20 colleagues we see at work).

    Although simple to perform, it is unlikely to result in a sample that is either representative of the population or replicable.

    Random selection of samples is important:

    In simple random sampling, everyone in the sampling frame has an equal probability of being chosen.

    The sample is therefore more likely to be representative of the population.

    It can help minimize bias (bias can be defined as an effect that produces results which are systematically different from the true values).

    Simple random sampling, using random numbers:

    a. Lottery method

    b. Table of random numbers

    Multistage sampling: sampling is carried out in successive stages, e.g. in a school health survey of children.

    Cluster sampling: all of the subjects in the final-stage sample (the selected clusters) are investigated.

    Stratified sampling: subjects are randomly selected from different strata or groups.

    Systematic sampling: formed by selecting one unit at random and then selecting additional units at evenly spaced intervals until a sample of the required size is formed.

    Pathfinder surveys: a specified proportion of the population (1%) is sampled.
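
    The selection rules above can be sketched with Python's `random` module. The sampling frame of 100 patient IDs, the two strata and the sample sizes are hypothetical; the systematic sample uses a sampling interval k = N / n with a random start inside the first interval, which is one common way to implement the description rather than the only one.

```python
import random


def simple_random_sample(frame, n, rng):
    # Every unit has the same chance of selection (the lottery method in code)
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    # Sampling interval k = N // n, with a random start inside the first interval
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]


def stratified_sample(strata, n_per_stratum, rng):
    # A separate simple random sample is drawn from each stratum
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


rng = random.Random(1)  # fixed seed for reproducibility
frame = [f"P{i:03d}" for i in range(100)]            # hypothetical sampling frame
strata = {"male": frame[:60], "female": frame[60:]}  # hypothetical strata

print(simple_random_sample(frame, 5, rng))
print(systematic_sample(frame, 5, rng))
print(stratified_sample(strata, 3, rng))
```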

    Sources of data

    1. Experiments

    2. Surveys

    3. Records

    Data may be primary or secondary.

    Categories:

    1. Quantitative/continuous: measured with a number

    2. Qualitative/discrete: cannot be meaningfully summarized by a number

    Qualitative or discrete data

    In such data there is no notion of the magnitude or size of an attribute, as it cannot be measured.

    Only the number of persons having the same attribute varies, and it is counted, e.g. out of 100 people, 75 have class I occlusion, 15 have class II occlusion and 10 have class III occlusion.

    Classes I, II and III are attributes, which cannot be measured in figures; only the number of people having each can be determined.

    Quantitative or continuous data

    In this, the attribute has a magnitude: both the attribute and the number of persons having the attribute vary.

    E.g. freeway space. It varies from patient to patient: it is a quantity with a different value for each individual and is measurable. It is continuous because it can take any value between 2 and 4 mm, e.g. 2.10, 2.55 or 3.07 mm.

    Data presentation

    Statistical data, once collected, should be systematically arranged and presented:

    To arouse the interest of readers

    For data reduction

    To bring out important points clearly and strikingly

    For easy grasp and meaningful conclusions

    To facilitate further analysis

    To facilitate communication

    Two main types of data presentation are:

    Tabulation

    Graphic representation with charts and diagrams

    Tabulation

    It is the most common method.

    Data are presented in the form of columns and rows.

    General principles for designing tables

    1. Tables should be numbered.

    2. A brief, self-explanatory title should be given for each table.

    3. Headings of rows and columns should be clear and concise.

    4. Data must be presented according to size or importance (e.g. chronologically or alphabetically).

    Tables can be of the following types:

    Simple tables

    Frequency distribution tables

    Simple table

    Number of patients in MCODS Mangalore

    Month            Number of patients
    January 2006     2000
    February 2006    1800
    March 2006       2300

    Frequency distribution table

    The data are first split into convenient groups, and the number of items in each group is shown in adjacent columns.

    Frequency distribution table

    Number of cavities    Number of patients
    0 to 3                78
    3 to 6                67
    6 to 9                32
    9 and above           16
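
    The grouping step can be sketched in Python. The class limits in the table overlap ("0 to 3", "3 to 6"), so this sketch assumes that each interval includes its lower limit and excludes its upper limit; that rule, the helper name `class_label` and the cavity counts are all hypothetical.

```python
from collections import Counter


def class_label(value, width=3, open_from=9):
    """Assign a count to a class interval.

    Assumption: lower limit inclusive, upper limit exclusive, so 3 falls in "3 to 6".
    """
    if value >= open_from:
        return f"{open_from} and above"
    lower = (value // width) * width
    return f"{lower} to {lower + width}"


# Hypothetical number of cavities in 12 patients
cavities = [0, 2, 3, 5, 1, 7, 9, 12, 4, 2, 6, 3]
table = Counter(class_label(c) for c in cavities)

for label in ["0 to 3", "3 to 6", "6 to 9", "9 and above"]:
    print(f"{label:<12} {table[label]}")
```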

    Charts and diagrams

    A useful method of presenting statistical data

    They have a powerful impact on the imagination of people.

    Bar chart

    The length of the bars, drawn vertically or horizontally, is proportional to the frequency of the variable.

    A suitable scale is chosen.

    Bars are usually equally spaced.

    They are of three types:

    - Simple bar chart

    - Multiple bar chart: two or more variables are grouped together

    - Component bar chart: bars are divided into two or more parts, each part representing a certain item and proportional to the magnitude of that item

    [Figure: bar diagrams, panels: simple, sub-divided and multiple]

    Histogram

    - Pictorial diagram of a frequency distribution.

    Frequency polygon: obtained by joining the midpoints of the tops of the histogram blocks (at the height of the frequency) by straight lines, usually forming a polygon.

    [Figure: histogram and frequency polygon of the number of carious lesions, class intervals 0 to 3 through 24 to 27]

    Pie charts

    In this, the frequencies of the groups are shown as segments of a circle. The angle of each segment denotes its frequency.

    The angle is calculated by

    $$\text{Angle} = \frac{\text{class frequency}}{\text{total observations}} \times 360^\circ$$

    [Figure: pie chart by department (PROSTHO, CONSO, PERIO, ORTHO, PEDO); segments labelled 200 (31%), 150 (24%), 180 (29%), 70 (11%) and 30 (5%)]
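
    The angle formula can be checked with the five segment frequencies shown in the pie chart (200, 150, 180, 70 and 30, total 630). Because the extracted figure does not make the pairing of each frequency with a department certain, the sketch leaves them unlabelled; the function name is hypothetical.

```python
def pie_angles(frequencies):
    """Angle of each segment = class frequency / total observations x 360 degrees."""
    total = sum(frequencies)
    return [f / total * 360 for f in frequencies]


frequencies = [200, 150, 180, 70, 30]  # segment values from the pie chart
angles = pie_angles(frequencies)
print([round(a, 1) for a in angles])
```

    The angles always add up to 360°, and a frequency of 70 out of 630 (one ninth of the total) gets a 40° segment.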

    Scatter diagrams show the relationship between two variables.

    If the dots cluster around a straight line, this is evidence of a linear relationship.

    If there is no such cluster, it is probable that there is no linear relationship between the variables.

    [Figure: scatter diagram of carious lesions and sugar exposure]

    Pictogram

    A popular method of presenting data to the common man.

    Spot map or map diagram

    These maps are prepared to show the geographic distribution of frequencies of characteristics.

    Measures of statistical averages or central tendency

    These imply a value in a distribution around which the other values are distributed, and give a picture of the central value:

    1. Arithmetic mean

    2. Median

    3. Mode

    Mean refers to the arithmetic mean: the sum of all the observations divided by the total number of observations (n).

    It is denoted by X̄ for a sample and μ for a population:

    $$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$$

    Advantage: it is easy to calculate.

    Disadvantage: it is influenced by extreme values.

    Median

    When all the observations are arranged in either ascending or descending order, the middle observation is known as the median.

    In the case of an even number of observations, the average of the two middle values is taken.

    The median is a better indicator of the central value when extreme values are present, as it is not affected by them.

    Mode

    The most frequently occurring observation in a set of data is called the mode. It is not often used in medical statistics.

    Example

    Number of decayed teeth in 10 children: 2, 2, 4, 1, 3, 0, 10, 2, 3, 8

    Mean = 35 / 10 = 3.5

    Median: arranged in order (0, 1, 2, 2, 2, 3, 3, 4, 8, 10), the two middle values are 2 and 3, so median = (2 + 3) / 2 = 2.5

    Mode = 2 (occurs 3 times)
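
    The worked example can be checked with Python's standard `statistics` module. Note that the ten values add up to 35, so the mean is 3.5.

```python
import statistics

# Number of decayed teeth in 10 children (data from the example above)
decayed = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

print(statistics.mean(decayed))    # 35 / 10 = 3.5
print(statistics.median(decayed))  # (2 + 3) / 2 = 2.5
print(statistics.mode(decayed))    # 2 occurs three times
```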

    Variations

    Data collected show considerable variation: from person to person, and also in the same person at different times.

    Thus measures of variation (dispersion) are used:

    Range

    Mean (average) deviation

    Standard deviation (σ, sigma)

    Range: the difference between the highest and lowest values.

    Mean deviation: the average of the absolute deviations from the arithmetic mean.

    $$\text{M.D.} = \frac{\sum |X_i - \bar{X}|}{n}$$

    where Xi = each observation, X̄ = mean and n = number of observations.

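    Standard deviation: the root mean square deviation, denoted by σ (sigma) or S.D.

    $$\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}}$$

    (When the SD is estimated from a sample, n - 1 is commonly used in the denominator instead of n.)

    The greater the standard deviation, the greater the magnitude of dispersion from the mean.

    A small standard deviation means a high degree of uniformity of the observations. In a normal distribution about 95% of observations lie within the mean ± 2 SD, so measurements beyond this range are usually considered rare or unusual.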
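
    The three measures of dispersion can be sketched for the decayed-teeth data from the mode example. The n-denominator SD follows the formula given for σ; the n - 1 version (`statistics.stdev`) is included for comparison, since software often reports that one by default.

```python
import math
import statistics

decayed = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]  # data from the mode example
n = len(decayed)
mean = sum(decayed) / n

value_range = max(decayed) - min(decayed)
# Mean deviation: average absolute deviation from the arithmetic mean
mean_deviation = sum(abs(x - mean) for x in decayed) / n
# Standard deviation with n in the denominator
sd_n = math.sqrt(sum((x - mean) ** 2 for x in decayed) / n)
# Sample standard deviation with n - 1 in the denominator
sd_n_minus_1 = statistics.stdev(decayed)

print(value_range, mean_deviation, round(sd_n, 2), round(sd_n_minus_1, 2))
```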

    Variance of the data

    The variance is the square of the standard deviation. Another way to describe dispersion is to present the interquartile range, bounded by the values at the 25th and 75th percentiles, which is not as likely to be influenced by the values at the extreme upper and lower ends of the spread of data points.
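
    The quartiles and interquartile range of the same decayed-teeth data can be sketched with `statistics.quantiles` (Python 3.8 or later). Different software packages use different percentile conventions, so other programs may report slightly different quartiles; this uses the module's default "exclusive" method.

```python
import statistics

decayed = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]  # data from the mode example
q1, q2, q3 = statistics.quantiles(decayed, n=4)  # 25th, 50th and 75th percentiles
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

    The middle quartile equals the median (2.5), and the IQR is hardly affected by the extreme value of 10.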

    For continuous data, the most commonly used measure of central tendency is the mean.

    For ordinal data, the median or mode is used to represent the center of the data.

    The median is also used as a measure of central tendency for continuous data that are skewed, to minimize the effect of extremely large or small values on the estimate of the center of the data.

    Nominal data are summarized by reporting the proportion or percentage of the data that are classified in each level.
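
    Summarizing nominal data by proportions can be sketched with `collections.Counter`, using the occlusion example from the qualitative-data slide (75 class I, 15 class II and 10 class III out of 100 people).

```python
from collections import Counter

# Occlusion classes of 100 people, from the qualitative-data example
occlusion = ["Class I"] * 75 + ["Class II"] * 15 + ["Class III"] * 10

counts = Counter(occlusion)
total = sum(counts.values())
for level, count in counts.items():
    print(f"{level}: {count} ({count / total:.0%})")
```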

    Sample Size and Power

    Designing studies with inadequate sample sizes may lead to errors and false conclusions (false-negative findings).

    False-negative findings can occur either by chance or because the study is underpowered, i.e. its sample is too small to give a high probability (power) of detecting a real effect of the size of interest.

    Careful sample size calculation can guide researchers as to what can and cannot be accomplished in a study with a finite amount of resources.

    Although sample size calculations are performed using mathematical methods, preparing for the calculation requires both statistical reasoning and clinical experience.

    Calculation of sample size requires four things:

    1. Deciding on the design of the study

    2. Assessing the availability of resources

    3. Specifying distribution assumptions

    4. Defining a clinically relevant effect
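
    One commonly used formula, for comparing two means with a two-sided test under a normal approximation, is n per group = 2(z₁₋α/₂ + z₁₋β)²σ² / Δ². It is shown here as an illustration, not necessarily the method the slides have in mind: σ plays the role of the distribution assumption and Δ the clinically relevant effect, and both input values are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means
    (two-sided test, equal group sizes, normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical inputs: SD of 1.0 mm, clinically relevant difference of 0.5 mm
print(n_per_group(sigma=1.0, delta=0.5))
```

    Asking for higher power or a smaller clinically relevant difference increases the required sample size.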

    Inferential statistics

    Inferential statistics are those statistical procedures that compare groups to see whether the groups are significantly different from each other.

    They are of two kinds:

    parametric statistics

    nonparametric statistics.

    Parametric statistics refers to a group of statistical tests that use means and a measure of variation (standard deviation, variance) to help determine whether groups are different from each other.

    Certain conditions regarding the data must be met before the simplest parametric tests, based on means and standard deviations, may be validly used:

    1. The data must be continuous (measured on a continuous scale, e.g. millimeters, pounds, degrees).

    2. A plot of the distribution of the data (e.g. a histogram) must look like a normal distribution (bell-shaped curve).

    3. The dispersion or spread of the data for each variable must be the same in each group being compared (the size of the variance or standard deviation of the variable is the same in each of the groups being compared).

    Distributions

    Begin the initial analysis by plotting the data on a graph to see how they are distributed.

    The points can often be seen to follow some recognized pattern or distribution.

    Many patterns of distributions occur in nature. Frequently, these patterns can be described by mathematical functions, which then enable us to determine the likelihood that a data point will fall under a specific area of the distribution curve.

    The normal distribution, or Gaussian distribution

    Bell-shaped curve

    The data cluster around a central point and spread symmetrically around it. The central point is the mean (which, in a normal distribution, also equals the median and the mode). The width of the bell-shaped curve depends on how much variability there is in the data.

    One way to estimate the amount of variability is to calculate the SD, the square root of the average squared deviation of each data point from the mean value of all the data points.

    The larger the SD, the greater the variability in the data.

    The greater the variability, the wider the curve.
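
    How the SD relates to the area under the normal curve can be sketched with `statistics.NormalDist` (Python 3.8 or later). The helper name `proportion_within` is hypothetical; it returns the share of a normal distribution lying within the mean ± k SD.

```python
from statistics import NormalDist


def proportion_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

    The proportions (about 68%, 95% and 99.7%) are the same for any mean and SD; a larger SD only makes the curve wider. This is also why values beyond the mean ± 2 SD are treated as unusual.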

    Importance of distribution

    Many statistical tests are based on parametric assumptions (i.e., the data are assumed to follow a distribution that can be summarized by parameters), requiring the data to be normally distributed (bell-shaped).

    Many parametric statistical tests are insensitive to mild departures of the data from normality, but severe departures from the normal distribution mandate the use of distribution-free tests (nonparametric statistics).

    Parametric statistics tend to be more powerful than nonparametric statistics.

    This means that they are more likely than nonparametric statistics to detect a significant difference between samples when the difference is real; however, using a parametric test when its assumptions are violated is incorrect.

    Common parametric tests include the

    Student t test and

    Analysis of variance (ANOVA)

    Ordinal data are analyzed by nonparametric procedures. Nonparametric statistics use the ranks (or medians) of the data rather than means and standard deviations to make group comparisons.

    Common nonparametric tests based on ranks include the Mann-Whitney U test, the Wilcoxon signed rank test and the Kruskal-Wallis test.

    Nonparametric statistical tests are also used for continuous data that are not normally distributed (bell-shaped curve).

    The most common test used to analyze nominal data is the χ² (chi-square) test.

    Data that are nominal (e.g. sex, tooth type) cannot be summarized by means or ordered into ranks.

    Ratios or proportions can be determined.

    Test statistics

    Statistical procedures comparing samples provide a test statistic or critical ratio that is associated with a probability level (P value).

    The probability level is the likelihood that, if two groups were drawn from the same population, they would show a difference at least as big as the one detected.

    P value < .05 means that, if the two groups were samples from the same population, a difference at least as large as the one observed would occur 5% of the time (1 in 20) or less.

    By convention, when P < .05, the difference is considered statistically significant.
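
    The definition of the P value can be made concrete with an exact permutation test, a technique swapped in here for illustration rather than one named in the slides. If both groups came from the same population, every split of the pooled values into two groups of the same sizes is equally likely, and P is the share of splits whose difference in means is at least as large as the observed one. The measurements are hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    extreme = total = 0
    # Enumerate every way of assigning n_a of the pooled values to group A
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical measurements from two groups of five
group_a = [12, 14, 15, 16, 18]
group_b = [9, 10, 11, 12, 13]
print(round(permutation_p_value(group_a, group_b), 4))
```

    Here 6 of the 252 possible splits give a difference of at least 4, so P = 6/252, about 0.024, which is below the conventional .05.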

    Parametric Tests

    The Student t test is used when only two groups are being compared.

    The Student t test uses the sample means and standard deviations to calculate a t statistic and its associated probability (P value).

    It helps us to determine whether the means differ because the two groups represent two different populations, or only because the groups contain different subjects while each group represents the same population.

    The t test exists in two forms, depending on whether the two groups under comparison are paired (matched) or independent of each other.

    A common paired design occurs when a single group of subjects is measured before and after a procedure to examine the effect of some intervention (e.g. treatment).

    A matched group study design is one in which the outcome of each subject in the treatment group is compared directly with the outcome in another subject who is as similar as possible to its mate, with the exception of the treatment under investigation.

    An example of a paired study is a comparison of the masticatory efficiency of complete denture wearers with bilateral balanced occlusion before and after selective grinding.
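
    The paired t statistic can be sketched from the before/after differences d: t = d̄ / (s_d / √n), with n - 1 degrees of freedom. The masticatory efficiency scores below are hypothetical, and turning t into a P value would need a t table or a library routine such as SciPy's `scipy.stats.ttest_rel`, which this sketch does not use.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


# Hypothetical masticatory efficiency scores (%) for five denture wearers
before = [60, 62, 58, 65, 61]
after = [64, 66, 60, 70, 63]
t, df = paired_t(before, after)
print(round(t, 3), df)
```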

    Two-sample, independent t test: used to compare independent (unmatched) groups. An example is comparing masticatory efficiency between bilateral balanced occlusion and lingualised occlusion in complete denture wearers.

    In paired study designs, the number of subjects in both groups is the same, whereas in the two-sample, independent design, the sizes of the two samples may be different.
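
    A sketch of the two-sample (Student) t statistic with a pooled variance, which relies on the equal-variance condition listed among the parametric assumptions: t = (x̄₁ - x̄₂) / √(s_p²(1/n₁ + 1/n₂)), with n₁ + n₂ - 2 degrees of freedom. The scores are hypothetical and the groups deliberately have different sizes; SciPy's `scipy.stats.ttest_ind` is one library alternative that also reports P.

```python
import math
import statistics


def independent_t(x, y):
    """Two-sample Student t statistic with pooled variance (equal variances assumed)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / se, n1 + n2 - 2


# Hypothetical masticatory efficiency scores; the groups need not be the same size
balanced = [20, 22, 19, 24, 25]
lingualised = [18, 17, 20, 16]
t, df = independent_t(balanced, lingualised)
print(round(t, 3), df)
```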

    If more than two groups are being compared, ANOVA is used.

    Unlike the t test, which uses the mean and standard deviation of groups for its computations, ANOVA uses the mean and variance of groups for its computations.

    The test statistic is the F statistic: the ratio of the variation between the group means to the variation within the groups.

    ANOVA compares all the groups simultaneously in a single test, rather than making a series of pairwise comparisons between them.
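
    The F statistic of a one-way ANOVA can be sketched as the mean square between groups divided by the mean square within groups. The three groups are hypothetical, and the P value (which needs the F distribution, e.g. via SciPy's `scipy.stats.f_oneway`) is not computed here.

```python
import statistics


def one_way_anova_f(*groups):
    """F = mean square between groups / mean square within groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)


# Hypothetical measurements from three groups
f, dfs = one_way_anova_f([4, 5, 6], [6, 7, 8], [8, 9, 10])
print(f, dfs)
```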

    A significant P value indicates that a difference exists somewhere among the groups (between at least two of them), but ANOVA does not identify which groups are different.

    To determine which pairs differ, post hoc (a posteriori) tests are used to examine the groups in detail and reveal which groups differ significantly from each other.

    Common post hoc tests are the Tukey-Kramer honestly significant difference, Scheffé, Dunnett, Duncan and Newman-Keuls tests.

    Nonparametric Tests

    A common nonparametric test for the comparison of two unpaired samples is the Mann-Whitney U test, also known as the Wilcoxon rank sum test.

    It compares the groups using the ranks of the data and is often described as comparing their medians. Its test statistic is the U statistic.

    Example: grade point averages

    The nonparametric test comparable to the paired t test is the Wilcoxon signed rank test.
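
    The Mann-Whitney U statistic can be sketched in two equivalent ways: by counting, over all pairs, how often a value from one group exceeds a value from the other, or from the rank sum as U = R₁ - n₁(n₁ + 1)/2. The scores are hypothetical and contain no ties, which the simple ranking in `rank_sum_u` assumes.

```python
def mann_whitney_u(x, y):
    """U for sample x: number of (x, y) pairs with x > y, counting ties as 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1
            elif a == b:
                u += 0.5
    return u


def rank_sum_u(x, y):
    """Same U via ranks: U = R_x - n_x(n_x + 1)/2 (assumes no tied values)."""
    ranks = {value: i + 1 for i, value in enumerate(sorted(x + y))}
    r_x = sum(ranks[v] for v in x)
    return r_x - len(x) * (len(x) + 1) / 2


# Hypothetical scores for two unpaired groups
group_1 = [1.5, 2.0, 3.5, 4.0]
group_2 = [2.5, 3.0, 5.0, 6.0, 7.0]
print(mann_whitney_u(group_1, group_2), rank_sum_u(group_1, group_2))
```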

    The nonparametric test comparable to ANOVA is the Kruskal-Wallis procedure.

    It examines intergroup differences based on ranks.

    χ² test

    The chi-square (χ²) test is used to analyze nominal data.

    It is used to compare the proportions of the data that fall into each level of the nominal variable.
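
    A sketch of Pearson's χ² statistic for a contingency table: expected counts come from the row and column totals, and χ² = Σ (observed - expected)² / expected, with (rows - 1)(columns - 1) degrees of freedom. The 2 × 2 table is hypothetical, and no continuity correction is applied (SciPy's `scipy.stats.chi2_contingency` applies Yates' correction to 2 × 2 tables by default, so its value would differ).

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: caries present / absent in two groups of 50
chi2, df = chi_square([[20, 30], [30, 20]])
print(chi2, df)
```

    With 1 degree of freedom the 5% critical value is 3.84, so a χ² of 4.0 would correspond to P < .05.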

    Correlation

    To test whether or not two variables bear a linear relationship to each other (i.e., whether or not they vary together, either positively or negatively), the technique of Pearson product-moment linear correlation is commonly used.

    The correlation coefficient (r) is a dimensionless index of the extent to which the two characteristics vary together.

    r can range from +1, denoting a perfect positive relationship, to -1, characteristic of a perfect negative relationship; r = 0 signifies no linear relationship (which is not necessarily complete independence).

    In practice, r usually takes intermediate values such as 0.6, -0.3 or 0.1.
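
    Pearson's r can be sketched directly from its definition, r = Sxy / √(Sxx·Syy), where S denotes sums of products of deviations from the means. The sugar-exposure and lesion values are hypothetical; in Python 3.10 and later, `statistics.correlation` computes the same coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sugar exposures per day and carious lesions for five subjects
sugar = [1, 2, 3, 4, 5]
lesions = [2, 4, 5, 4, 5]
print(round(pearson_r(sugar, lesions), 3))
```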

    Regression.

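    If a linear relationship is statistically significant and strong enough to be of practical use, the next step is to model it mathematically in the form of a prediction equation so that it can be used clinically:

    Y = A + BX

    where A is the intercept (the value of Y when X = 0) and B is the slope, or regression coefficient (the change in Y for a one-unit change in X).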
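
    The least-squares estimates of A and B can be sketched as B = Sxy / Sxx and A = Ȳ - BX̄. The data are hypothetical; `statistics.linear_regression` (Python 3.10 and later) is a standard-library alternative.

```python
def least_squares_line(x, y):
    """Return (A, B) for the least-squares prediction equation Y = A + BX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Hypothetical sugar exposures per day (X) and carious lesions (Y)
sugar = [1, 2, 3, 4, 5]
lesions = [2, 4, 5, 4, 5]
a, b = least_squares_line(sugar, lesions)
print(round(a, 2), round(b, 2))
print(round(a + b * 6, 2))  # predicted Y for a new subject with X = 6
```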

    Regression and correlation are closely related: correlation deals with the strength of a linear relationship and regression with its form.

    Multivariate Analysis

    Strictly, multivariate analysis is a statistical analysis that involves more than one dependent variable.

    The term is also used more broadly for the analysis of simultaneous relationships among several variables. Examining simultaneously the effects of age, sex and social class on hypertension would be an example of multivariate analysis in this broader sense.

    It considers the interrelationships of several traits at a time.

    Multivariate analysis comprises a set of techniques dedicated to the analysis of data sets with more than one variable.

    One data set

    - Interval or ratio level of measurement: principal component analysis (PCA)

    - Nominal or ordinal level of measurement: correspondence analysis (CA), multiple correspondence analysis (MCA)

    - Similarity or distance: multidimensional scaling (MDS)

    Multidimensional scaling (MDS) is a set of related statistical techniques often used in data visualization for exploring similarities or dissimilarities in data.

    Two data sets

    Case one: one independent variable set and one dependent variable set

    - Multiple linear regression analysis (MLR)

    - Regression with too many predictors and/or several dependent variables: partial least squares regression (PLSR), principal component regression (PCR), ridge regression (RR), reduced rank regression (RRR) or redundancy analysis

    - Multivariate analysis of variance (MANOVA)

    - Predicting a nominal variable: discriminant analysis (DA)

    - Fitting a model: confirmatory factor analysis (CFA)

    Two (or more) dependent variable sets:

    Canonical correlation analysis (CC)

    Multiple factor analysis (MFA)

    Multiple correspondence analysis (MCA)

    Procrustean analysis (PA)

    Regression analysis

    In statistics, regression analysis is used to model relationships between random variables, determine the magnitude of the relationships between variables, and make predictions based on the models.

    Predictor variables may be defined quantitatively or qualitatively (categorically).

    If the predictors are all quantitative, one performs multiple regression.

    If the predictors are all qualitative, one performs analysis of variance.

    If some predictors are quantitative and some qualitative, one performs an analysis of covariance.

    If two or more independent variables are (highly) correlated with each other, we say that the variables are multicollinear.

    Multicollinearity results in parameter estimates that are unbiased and consistent, but which may have relatively large variances.

    Many patterns of distributions occur in nature. Frequently, these patterns can be described by mathematical functions. The most common statistical tests can be applied to data that are normally distributed.

    What if the data obtained are not normally distributed? A log transformation is often used to bring positively skewed data closer to a normal distribution.

    The usual (parametric) statistical tests can then be applied to the log-transformed data, provided the transformed data are approximately normal.
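
    The effect of a log transformation on positively skewed data can be sketched with hypothetical counts spanning several orders of magnitude. On the raw scale the mean is pulled far above the median; on the log scale this particular data set is symmetric, and back-transforming the mean of the logs gives the geometric mean. Log transformation assumes all values are positive.

```python
import math
import statistics

# Hypothetical positively skewed measurements (e.g. bacterial counts)
counts = [1, 10, 100, 1000, 10, 100]
logs = [math.log10(c) for c in counts]

print(statistics.mean(counts), statistics.median(counts))  # raw scale: mean well above median
print(statistics.mean(logs), statistics.median(logs))      # log scale: symmetric here
# Back-transforming the mean of the logs gives the geometric mean
print(round(10 ** statistics.mean(logs), 2))
```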

    Logistic regression

    In statistics, logistic regression is a model used to predict the probability of occurrence of an event.

    It makes use of several predictor variables that may be either numerical or categorical. For example, the probability that a person has a heart attack within a specified time period might be predicted from knowledge of the person's age, sex and body mass index.

    The input is z and the output is f(z), where the logistic function is

    $$f(z) = \frac{1}{1 + e^{-z}}$$

    The logistic function is useful because it can take as input any value from negative infinity to positive infinity, whereas the output is confined to values between 0 and 1.

    The variable z represents the exposure to some set of risk factors, while f(z) represents the probability of a particular outcome, given that set of risk factors. The variable z is a measure of the total contribution of all the risk factors used in the model and is known as the logit.

    $$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$$

    β0 is the intercept: the value of z when the other risk factors are absent (all x = 0).

    β1, β2 and β3 are the regression coefficients.

    x1, x2 and x3 are the risk factors, here for heart disease.

    The application of a logistic regression may be illustrated

    using a fictitious example of death from heart disease. This simplified model uses only three risk factors (age, sex and cholesterol) to predict the 10-year risk of death from heart disease.

    β0 = -5.0 (the intercept), β1 = +2.0, β2 = -1.0, β3 = +1.2

    x1 = age in decades; x2 = sex, where 0 is male and 1 is female; x3 = cholesterol level, in mmol/L

    $$\text{Risk of death} = \frac{1}{1 + e^{-z}}, \quad z = -5.0 + 2.0\,x_1 - 1.0\,x_2 + 1.2\,x_3$$
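
    The fictitious model can be evaluated in a few lines. The coefficients are the ones given for the example; the input values are hypothetical and chosen only to make the arithmetic easy to follow (z = -5.0 + 2.0 - 1.0 + 3.0 = -1.0), not to represent a realistic patient.

```python
import math


def logistic(z):
    """Logistic function f(z) = 1 / (1 + e^(-z)); the output lies between 0 and 1."""
    return 1 / (1 + math.exp(-z))


def ten_year_risk(age_decades, sex, cholesterol):
    """Fictitious model from the example: z = -5.0 + 2.0*x1 - 1.0*x2 + 1.2*x3."""
    z = -5.0 + 2.0 * age_decades - 1.0 * sex + 1.2 * cholesterol
    return z, logistic(z)


# Hypothetical inputs: x1 = 1 decade, x2 = 1 (female), x3 = 2.5
z, risk = ten_year_risk(1, 1, 2.5)
print(round(z, 2), round(risk, 3))
```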

    Discriminant analysis

    Discriminant analysis (also called canonical variate analysis) attempts to establish whether a set of variables can be used to distinguish between two or more groups.

    A clinical example is the discriminant function (modified Maddrey's discriminant function), originally described by Maddrey and Boitnott to predict prognosis in alcoholic hepatitis.

    Suppose we have two samples representing different populations.

    We measure one character in both and find that, although their means for this character are not identical, their distributions overlap considerably, so that on the basis of this character alone one could not, with any degree of accuracy, identify an unknown specimen as belonging to one or the other of the two populations.

    A second character may also differentiate them somewhat, but not absolutely. Two variables, say X1 and X2, can then be used together to distinguish them.

    Discriminant function analysis computes a new variable, say z, which is a linear function of both variables X1 and X2.

    This function is constructed in such a way that as many as possible of the members of one population have high values of z and as many as possible of the members of the other have low values, so that z discriminates between the two populations much better than X1 or X2 taken singly.
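
    Fisher's linear discriminant is one standard way to build such a z; the slides do not name a specific method, so this is a sketch of that technique under stated assumptions. The weights are w = Sw⁻¹(mean of group A - mean of group B), where Sw is the pooled within-group scatter matrix. The two groups of (X1, X2) measurements are hypothetical and were chosen so that each variable overlaps between the groups on its own, while z separates them.

```python
def fisher_discriminant(group_a, group_b):
    """Weights (w1, w2) of Fisher's linear discriminant z = w1*X1 + w2*X2,
    computed as w = Sw^-1 (mean_a - mean_b) for two variables."""

    def mean_vec(g):
        return (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))

    def scatter(g, m):
        # Within-group sums of squares and cross-products
        sxx = sum((p[0] - m[0]) ** 2 for p in g)
        syy = sum((p[1] - m[1]) ** 2 for p in g)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in g)
        return sxx, sxy, syy

    ma, mb = mean_vec(group_a), mean_vec(group_b)
    a_xx, a_xy, a_yy = scatter(group_a, ma)
    b_xx, b_xy, b_yy = scatter(group_b, mb)
    # Pooled within-group scatter matrix Sw = [[sxx, sxy], [sxy, syy]]
    sxx, sxy, syy = a_xx + b_xx, a_xy + b_xy, a_yy + b_yy
    det = sxx * syy - sxy ** 2
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # Explicit inverse of the 2x2 matrix Sw applied to the difference in means
    w1 = (syy * dx - sxy * dy) / det
    w2 = (sxx * dy - sxy * dx) / det
    return w1, w2


# Hypothetical (X1, X2) measurements; each variable alone overlaps between groups
group_a = [(1.0, 2.0), (2.0, 3.5), (3.0, 4.0), (4.0, 5.5)]
group_b = [(2.0, 1.0), (3.0, 2.5), (4.0, 3.0), (5.0, 4.5)]

w1, w2 = fisher_discriminant(group_a, group_b)
z_a = [w1 * x1 + w2 * x2 for x1, x2 in group_a]
z_b = [w1 * x1 + w2 * x2 for x1, x2 in group_b]
print(min(z_a) > max(z_b))  # True when z separates the two groups completely
```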

    Example: blood pressure, cholesterol level and blood sugar differ between people who are obese and those of normal body build.

    Discriminant function analysis can be used to assess the combined effect of the factors that differ between the two groups of subjects.

    Meta-analysis

    In statistics, a meta-analysis combines the results of several studies that address a set of related research hypotheses.

    The first meta-analysis was performed by Karl Pearson in 1904, in an attempt to overcome the problem of reduced statistical power in studies with small sample sizes; analyzing the results from a group of studies can allow more accurate data analysis.
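
    One common way of combining study results is fixed-effect, inverse-variance pooling: each study's estimate is weighted by 1/SE², so that larger, more precise studies count for more. This is shown only as an illustration of the idea; the slides do not specify a pooling method, and the three study estimates and standard errors are hypothetical.

```python
import math


def fixed_effect_pool(estimates, standard_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its standard error."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical effect estimates (e.g. mean differences) from three small studies
pooled, se = fixed_effect_pool([0.2, 0.5, 0.3], [0.1, 0.2, 0.1])
print(round(pooled, 3), round(se, 3))
```

    The pooled standard error is smaller than that of any single study, which is the gain in precision and power that motivates meta-analysis.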

    CONCLUSION

    Understanding the complexities of statistical modeling not only enables the use of test characteristics in the actual design of diagnostic tests; familiarity with the fundamental concepts also facilitates insight into, and critical evaluation of, research that relies on such methodology.

    Thank you