7/27/2019 Bio Statics
Good morning
Biostatistics
Ashfaq yaqoob
18.01.2010
Introduction
Any science needs precision for its development.
For precision, facts, observations or measurements have to be expressed in figures.
It has been said: "When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind." - Lord Kelvin
Similarly in medicine, be it diagnosis, treatment or research, everything depends on measurement.
E.g. you have to count the number of missing teeth, or measure the vertical dimension, and express it as a number so that it makes sense.
A statistic or datum is a measured or counted fact or piece of information stated as a figure, such as the height of one person or the birth weight of a baby.
Statistics or data is the plural of the same.
Statistics is the science of figures.
It is a field of study concerned with techniques/methods for the collection of data, classification, summarizing, interpretation, drawing inferences, testing hypotheses and making recommendations.
Biostatistics is the term used when the tools of statistics are applied to data derived from the biological sciences.
Data: discrete observations of attributes/events that carry little meaning when considered alone.
Information: data that have been reduced and adjusted for variations such as age and sex, so that comparisons over time and place are possible.
Intelligence: the transformation of information through integration and processing with experience and perceptions based on social and political values.
Any measurable characteristic of a population is called a parameter.
Statistics used to summarize, or describe, the characteristics of a sample are called descriptive statistics.
Statistical procedures that are used to make inferences (ie, draw conclusions) about the population that the sample represents are called inferential statistics.
Descriptive statistics
In the real world, we cannot study every member of an entire population, which may be effectively infinite.
Instead, we must select a sample in the hope that it will serve as a representative surrogate.
A sample can be used to estimate quantities in the population as a whole.
Sampling variation is minimized by:
- adequate sample size
- proper sampling techniques
Non-random sampling: easier and more convenient to perform.
Random sampling:
In random sampling (also called probability sampling), everyone in the sampling frame has an equal probability of being chosen.
Non-random sampling (also called nonprobability sampling) does not aim to give everyone an equal chance of selection, but is usually easier and more convenient to perform.
Convenience or opportunistic sampling is the crudest type of non-random sampling.
This involves selecting the most convenient group available (e.g. using the first 20 colleagues we see at work).
Though simple to perform, it is unlikely to result in a sample that is either representative of the population or replicable.
Random selection of samples is important.
In random sampling, everyone in the sampling frame has an equal probability of being chosen.
This makes the sample more likely to be representative of the population.
It can help minimize bias (bias can be defined as an effect that produces results which are systematically different from the true values).
Simple random sample: drawn using random numbers.
a. Lottery method
b. Table of random numbers
Multistage sampling: e.g. a school health survey covering all children.
Cluster sampling: all of the subjects in the final-stage sample are investigated.
Stratified sampling: subjects are randomly selected from different strata or groups.
Systematic sampling: the sample is formed by selecting one unit at random and then selecting additional units at evenly spaced intervals until a sample of the required size is formed.
Pathfinder surveys: a specified proportion of the population is sampled, e.g. 1%.
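The sampling schemes described above can be sketched in a few lines of Python. This is only an illustration: the sampling frame of 100 numbered subjects, the strata names and the fixed seed are assumptions made for the example, not part of the slides.

```python
import random

def simple_random_sample(frame, n, seed=0):
    """Every unit in the frame has the same chance of being drawn."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, n, seed=0):
    """Pick a random start, then take every k-th unit."""
    rng = random.Random(seed)
    k = len(frame) // n            # sampling interval
    start = rng.randrange(k)       # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw a simple random sample separately within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

frame = list(range(1, 101))        # hypothetical frame: subjects numbered 1..100
srs = simple_random_sample(frame, 10)
sys_sample = systematic_sample(frame, 10)
strat = stratified_sample({"boys": frame[:50], "girls": frame[50:]}, 5)
print(srs, sys_sample, strat, sep="\n")
```

The fixed seed only makes the draw reproducible; in a real survey the start point and the draws would not be fixed in advance.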
Sources of data
1. Experiments
2. Surveys
3. Records
Data may be primary or secondary.
Categories
1. Quantitative/continuous: measured with a number.
2. Qualitative/discrete: cannot be meaningfully summarized by a number.
Qualitative or discrete data
In such data there is no notion of magnitude or size of an attribute, as the attribute itself cannot be measured.
What varies, and is counted, is the number of persons having the same attribute.
E.g. out of 100 people, 75 have class I occlusion, 15 have class II occlusion and 10 have class III occlusion.
Classes I, II and III are attributes that cannot be measured in figures; only the number of people having each can be determined.
Quantitative or continuous data
Here the attribute has a magnitude: both the attribute and the number of persons having the attribute vary.
E.g. freeway space. It varies from patient to patient: it is a measurable quantity with a different value for each individual. It is continuous because it can take any value between 2 and 4, e.g. 2.10, 2.55 or 3.07.
Data presentation
Statistical data, once collected, should be systematically arranged and presented:
To arouse the interest of readers
For data reduction
To bring out important points clearly and strikingly
For easy grasp and meaningful conclusions
To facilitate further analysis
To facilitate communication
The two main types of data presentation are:
Tabulation
Graphic representation with charts and diagrams
Tabulation
It is the most common method.
Data are presented in the form of columns and rows.
General principles for designing tables.
1. Tables should be numbered.
2. A brief, self-explanatory title should be given for each table.
3. Headings of rows and columns should be clear and concise.
4. Data must be presented according to size or importance (or chronologically/alphabetically).
Tables can be of the following types:
Simple tables
Frequency distribution tables
Simple table
No. of patients in MCODS Mangalore
Month        No. of patients
Jan 2006     2000
Feb 2006     1800
March 2006   2300
Frequency distribution table
Data are first split into convenient groups, and the number of items in each group is shown in the adjacent column.
Frequency distribution table
Number of Cavities Number of Patients
0 to 3 78
3 to 6 67
6 to 9 32
9 and above 16
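A table like this can be built by tallying raw counts into classes. The sketch below is illustrative only: the cavity counts are made up, and it assumes that each class includes its lower limit and excludes its upper limit (so a patient with 3 cavities falls in "3 to 6"), which the slide does not specify.

```python
from collections import Counter

def frequency_table(values, width=3, open_from=9):
    """Tally values into classes of the given width, with one open-ended top class."""
    counts = Counter()
    for v in values:
        if v >= open_from:
            counts[f"{open_from} and above"] += 1
        else:
            lower = (v // width) * width          # lower limit of the class
            counts[f"{lower} to {lower + width}"] += 1
    return dict(counts)

cavities = [0, 1, 2, 3, 3, 5, 6, 7, 9, 12]       # hypothetical cavity counts
table = frequency_table(cavities)
for label, n in table.items():
    print(f"{label:12} {n}")
```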
Charts and diagrams
Useful method of presenting statistical data
Powerful impact on imagination of the people
Bar chart
The length of the bars, drawn vertically or horizontally, is proportional to the frequency of the variable.
A suitable scale is chosen.
Bars are usually equally spaced.
They are of three types:
- Simple bar chart
- Multiple bar chart: two or more variables are grouped together
- Component bar chart: bars are divided into two or more parts, each part representing a certain item and proportional to the magnitude of that item
Bar diagrams
[Figure: examples of simple, sub-divided and multiple bar diagrams]
Histogram
A pictorial diagram of a frequency distribution.
Frequency polygon: obtained by joining the midpoints of the histogram blocks, at the height of each frequency, with straight lines, usually forming a polygon.
[Figure: histogram of the number of carious lesions, in classes from "0 to 3" to "24 to 27", with a frequency axis from 0 to 80. Data labels shown in the figure: 75, 45, 40, 32, 43, 22, 34, 29, 38.]
Pie charts
Here the frequencies of the groups are shown as segments of a circle.
The angle of each segment denotes the frequency.
The angle is calculated as
$\text{angle} = \dfrac{\text{class frequency}}{\text{total observations}} \times 360^\circ$
[Figure: pie chart with legend PROSTHO, CONSO, PERIO, ORTHO, PEDO; slice labels 200 (31%), 150 (24%), 180 (29%), 70 (11%), 30 (5%)]
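As a worked check of the angle formula, the short sketch below converts the five slice frequencies shown in the chart (200, 150, 180, 70, 30) into angles. The function name is an illustrative choice; only the frequencies come from the slide.

```python
def pie_angles(frequencies):
    """Angle of each slice: class frequency / total observations * 360 degrees."""
    total = sum(frequencies)
    return [f / total * 360 for f in frequencies]

slices = [200, 150, 180, 70, 30]      # slice frequencies from the chart
angles = pie_angles(slices)
for f, a in zip(slices, angles):
    print(f"{f:4d} -> {a:6.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick sanity check when drawing the chart by hand.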
Scatter diagrams: show the relation between two variables.
If the dots cluster around a straight line, this is evidence of a relationship of a linear nature.
If there is no such cluster, it is probable that there is no relation between the variables.
[Figure: scatter diagram with axes "Carious lesion" and "Sugar Exposure"]
Pictogram
A popular method of presenting data to the common man.
Spot map or map diagram
These maps are prepared to show the geographic distribution of frequencies of characteristics.
Measures of statistical averages or central tendency
An average implies a value in the distribution around which the other values are distributed.
It gives a picture of the central value.
1. Arithmetic mean
2. Median
3. Mode
Mean refers to the arithmetic mean.
It is the sum of all the observations divided by the total number of observations (n).
It is denoted by $\bar{X}$ for a sample and $\mu$ for a population:
$\bar{X} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
Advantage: it is easy to calculate.
Disadvantage: it is influenced by extreme values.
Median
When all the observations are arranged in either ascending or descending order, the middle observation is known as the median.
With an even number of observations, the average of the two middle values is taken.
The median is a better indicator of the central value when the data contain extreme values, as it is not affected by them.
Mode
The most frequently occurring observation in a data set is called the mode.
It is not often used in medical statistics.
Example
Number of decayed teeth in 10 children: 2, 2, 4, 1, 3, 0, 10, 2, 3, 8
Mean = 35 / 10 = 3.5
Median: ordered values are 0, 1, 2, 2, 2, 3, 3, 4, 8, 10, so median = (2 + 3) / 2 = 2.5
Mode = 2 (occurs 3 times)
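The three averages for the ten decayed-teeth counts can be recomputed with Python's standard `statistics` module, which is a convenient way to check hand arithmetic. The variable names are illustrative.

```python
from statistics import mean, median, mode

decayed = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]   # decayed teeth in 10 children

avg = mean(decayed)            # sum of observations / number of observations
middle = median(decayed)       # average of the two middle values (n is even)
most_common = mode(decayed)    # most frequently occurring value

print("sum =", sum(decayed), "n =", len(decayed))
print("mean =", avg, "median =", middle, "mode =", most_common)
```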
Variations
Collected data show considerable variation.
There is variation from person to person, and also variation in the same person at different times.
Thus measures of variation/dispersion are used:
Range
Mean/average deviation
Standard deviation (sigma, $\sigma$)
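Range: the difference between the highest and lowest values.
Mean deviation: the average of the absolute deviations from the arithmetic mean.
$\text{M.D.} = \dfrac{\sum |x_i - \bar{X}|}{n}$
where $x_i$ = an observation, $\bar{X}$ = the mean and $n$ = the number of observations.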
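Standard deviation: the root mean square deviation, denoted by $\sigma$ (sigma) or S.D.
$\sigma = \sqrt{\dfrac{\sum (x_i - \bar{X})^2}{n}}$
where $x_i$ = an observation, $\bar{X}$ = the mean and $n$ = the number of observations.
The greater the standard deviation, the greater the dispersion from the mean.
A small standard deviation means a high degree of uniformity of the observations.
Measurements more than 2 SD from the mean are usually considered rare or unusual (about 5% of values in a normal distribution).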
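The three measures of dispersion can be computed directly from their definitions. The sketch below applies them to the ten decayed-teeth counts from the earlier averages example; it divides by n, as on the slide, whereas many texts divide by n - 1 when estimating a population SD from a sample.

```python
from math import sqrt
from statistics import mean

def dispersion(values):
    """Return (range, mean deviation, standard deviation), dividing by n."""
    m = mean(values)
    n = len(values)
    value_range = max(values) - min(values)
    mean_dev = sum(abs(x - m) for x in values) / n      # average absolute deviation
    sd = sqrt(sum((x - m) ** 2 for x in values) / n)    # root mean square deviation
    return value_range, mean_dev, sd

value_range, mean_dev, sd = dispersion([2, 2, 4, 1, 3, 0, 10, 2, 3, 8])
print(value_range, round(mean_dev, 2), round(sd, 2))
```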
Variance of the data
Another way to describe dispersion is to present the interquartile range, i.e. the values at the 25th and 75th percentiles, which are less likely to be influenced by values at the extreme upper and lower ends of the spread of data points.
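A minimal sketch of the interquartile range, again using the ten decayed-teeth counts from the averages example. Note that several conventions exist for computing percentiles; `statistics.quantiles` uses its default "exclusive" method here, so other software may give slightly different quartiles for the same data.

```python
from statistics import quantiles

decayed = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(decayed, n=4)
iqr = q3 - q1

print("25th percentile =", q1, "75th percentile =", q3, "IQR =", iqr)
```

The extreme value 10 stretches the range, but it has little effect on the quartiles.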
For continuous data, the most commonly used measure of central tendency is the mean.
For ordinal data, the median or mode is used to represent the center of the data.
The median is also used as a measure of central tendency for continuous data that are skewed, to minimize the effect of extremely large or small values on the estimate of the center of the data.
Nominal data are summarized by reporting the proportion or percentage of the data that are classified in each level.
Sample Size and Power
Designing studies with inadequate sample sizes may lead to errors and false conclusions (false negative findings).
False negative findings can occur either by chance or because the study is underpowered.
Careful sample size calculation can guide researchers as to what can and cannot be accomplished in a study with a finite amount of resources.
Although sample size calculations are performed using mathematical methods, the preparation for the calculation requires both statistical reasoning and clinical experience.
Calculation of sample size requires four things:
1. Deciding on the design of the study
2. Assessing the availability of resources
3. Specifying distribution assumptions
4. Defining a clinically relevant effect
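To make the four ingredients concrete, here is a sketch for one common design: comparing the means of two independent groups. It uses the standard normal-approximation formula n = 2 (z_alpha/2 + z_beta)^2 sd^2 / effect^2 per group. The formula choice, the 5% significance level, the 80% power and the example numbers are assumptions for illustration; the slides do not prescribe a specific method.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sd, effect, alpha=0.05, power=0.80):
    """Subjects needed per group to detect a difference in means of size `effect`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)            # desired power
    return ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)

# Hypothetical inputs: SD of 1.0 unit, clinically relevant difference of 0.5 unit.
n = n_per_group(sd=1.0, effect=0.5)
print("subjects per group:", n)
```

Halving the clinically relevant effect roughly quadruples the required sample, which is why that choice needs clinical judgment.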
Inferential statistics
Inferential statistics are those statistical procedures that compare groups to see if the groups are significantly different from each other.
There are two kinds:
- parametric statistics
- nonparametric statistics
Parametric statistics refers to a group of statistical tests that use means and a measure of variation (standard deviation, variance) to help determine if groups are different from each other.
Certain conditions regarding the data must be met before the simplest parametric tests, based on means and standard deviations, may be validly used.
1. The data must be continuous (measured on a continuous scale, eg, millimeters, pounds, degrees).
2. A frequency plot of the data (eg, a histogram) must look like a normal distribution (bell-shaped curve).
3. The dispersion or spread of data for each variable must be the same in each group being compared (the size of the variance or standard deviation of the variable is the same in each of the groups being compared).
Distributions
Begin the initial analysis of the data by plotting them on a graph to see how they are distributed.
The points can often be seen to follow some recognized pattern or distribution.
Many patterns of distributions occur in nature. Frequently, these patterns can be described by mathematical functions, which then enable us to determine the likelihood that a data point will fall under a specific area of the distribution curve.
The normal distribution or Gaussian distribution
Bell-shaped curve.
The data cluster around a central point and spread symmetrically around this center point.
The central point is the mean of the sample.
The width of the bell-shaped curve depends on how much variability there is in the data.
The way to estimate the amount of variability is to calculate the SD: the square root of the average squared deviation of each data point from the mean value of all the data points.
The larger the SD, the greater the variability in the data.
The greater the variability, the wider the shape of the curve.
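For a normal distribution, the SD translates directly into areas under the curve. The sketch below uses `statistics.NormalDist` to compute the share of values expected within 1, 2 and 3 SD of the mean; the helper name is illustrative.

```python
from statistics import NormalDist

def within_k_sd(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    z = NormalDist()               # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

coverage = {k: within_k_sd(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} SD: {p:.1%}")
```

Because the result depends only on k, the same proportions hold for any normal curve, wide or narrow.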
Importance of distribution
Many statistical tests are based on parametric assumptions (ie, the data are assumed to follow a distribution that can be summarized by parameters) and require the data to be normally distributed (bell-shaped).
Many parametric statistical tests are insensitive to mild departures of the data from normality, but severe departures from the normal distribution mandate the use of distribution-free tests, ie, nonparametric statistics.
Parametric statistics tend to be more powerful than nonparametric statistics.
This means that they are more likely than nonparametric statistics to detect a significant difference between samples when the difference is real. However, using a parametric test when its assumptions are violated is incorrect.
Common parametric tests include the
Student t test and
Analysis of variance (ANOVA)
Ordinal data are analyzed by nonparametric procedures.
Nonparametric statistics use the ranks/medians of the data rather than means and standard deviations to make group comparisons.
Common nonparametric tests based on ranks include the Mann-Whitney U test, the Wilcoxon signed rank test and the Kruskal-Wallis test.
Nonparametric statistical tests are also used for continuous data that are not normally distributed (bell-shaped curve).
The most common test to analyze nominal data is the $\chi^2$ (chi-square) test.
Data that are nominal (eg, sex, tooth type) cannot be summarized by means or ordered into ranks.
Ratios/proportions can be determined.
Test statistics
Statistical procedures comparing samples provide a test statistic or critical ratio that is associated with a probability level (P value).
The probability level is the likelihood that two groups drawn from the same population would show a difference at least as big as the one detected.
P value < .05 means that, if the two groups were samples from the same population, there would be less than a 5% chance (1 in 20) of observing a difference this large.
By convention, when P ...
Parametric Tests
The Student t test is used when only two groups are being compared.
The Student t test uses sample means and standard deviations to assess whether the observed difference between the groups is larger than would be expected by chance.
It helps us to determine whether the means differ because the two groups represent two different populations, or whether the means differ only because the groups contain different subjects while each group represents the same population.
The t test exists in two forms, depending on whether the two groups under comparison are paired (matched) or independent of each other.
A common paired design occurs when a single group of subjects is measured before and after a procedure to examine the effect of some intervention (eg, treatment).
A matched group study design is one in which the outcome of each subject in the treatment group is compared directly to the outcome in another subject who is as similar as possible to its mate, with the exception of the treatment under investigation.
An example of a paired study is a comparison of the masticatory efficiency of complete denture wearers with bilateral balanced occlusion before and after selective grinding.
Two-sample, independent t test: used to compare independent or unmatched groups.
An example is a comparison of masticatory efficiency between bilateral balanced occlusion and lingualised occlusion in complete denture wearers.
In paired study designs, the number of subjects in both groups is the same, whereas in the two-sample, independent design, the size of the two samples may be different.
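The two forms of the t test differ in how the test statistic is built. The sketch below computes only the t statistic and its degrees of freedom for each form; converting t to a P value needs the t distribution, which is left to statistical software. The scores are invented, and the independent version assumes equal variances in the two groups (the pooled-variance form).

```python
from math import sqrt
from statistics import mean, stdev, variance

def paired_t(before, after):
    """Paired t: based on the within-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

def independent_t(group1, group2):
    """Two-sample t with pooled variance; group sizes may differ."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    t = (mean(group1) - mean(group2)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical masticatory-efficiency scores.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 15]
t_paired, df_paired = paired_t(before, after)

group_a = [10, 12, 9, 11, 13]
group_b = [12, 14, 10, 13, 15, 14]        # unequal group sizes are allowed
t_ind, df_ind = independent_t(group_a, group_b)

print(round(t_paired, 3), df_paired, round(t_ind, 3), df_ind)
```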
If more than two groups are being compared, ANOVA is used.
Unlike the t test, which uses the mean and standard deviation of groups for its computations, ANOVA uses the mean and variance of groups for its computations.
The test statistic is the F statistic.
ANOVA tests all the groups in a single overall comparison of between-group and within-group variance, rather than as a series of separate pair-wise tests.
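The F statistic is the ratio of the variance between group means to the variance within groups. The sketch below computes it for a one-way layout from first principles; the three small groups are invented for illustration, and the P value lookup (from the F distribution) is left to statistical software.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]     # hypothetical data, three groups
f_stat, df_b, df_w = one_way_anova_f(groups)
print(round(f_stat, 2), df_b, df_w)
```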
A significant P value indicates that a difference exists somewhere among the groups, but ANOVA does not identify which groups are different.
To determine which pairs differ, post hoc (a posteriori) tests are used to examine the groups in detail and reveal which groups differ significantly from each other.
Common post hoc tests are the Tukey-Kramer honestly significant difference, Scheffé, Dunnett, Duncan and Newman-Keuls tests.
Nonparametric Tests
A common nonparametric test for comparison of two unpaired samples is the Mann-Whitney U test, also known as the Wilcoxon rank sum test.
It compares the medians of the groups.
The test statistic is the U statistic.
Example: grade point averages.
The nonparametric test comparable to the paired t test is the Wilcoxon signed rank test.
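The U statistic can be obtained by simple counting: for every pair made of one value from each group, note which group has the larger value. The sketch below does exactly that, counting ties as half; the two small samples are invented, and turning U into a P value (via tables or software) is not shown.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b): pairwise 'wins' for each sample, ties counted as 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a    # the two U values sum to n1 * n2
    return u_a, u_b

# Hypothetical scores for two unpaired groups.
u_a, u_b = mann_whitney_u([3, 5, 8, 9], [1, 2, 4, 6, 7])
print("U_a =", u_a, "U_b =", u_b, "test statistic U =", min(u_a, u_b))
```

Only the ordering of the values matters, so the statistic is unchanged by any transformation that preserves rank.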
The nonparametric test comparable to ANOVA is the Kruskal-Wallis procedure.
It examines intergroup differences based on ranks.
$\chi^2$ (chi-square) test
Used to analyze nominal data.
It is used to compare the proportion of the data that fall into each level of the nominal variable.
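As an illustration, the chi-square statistic for a contingency table compares each observed count with the count expected if the proportions were the same in every group. The 2 x 2 table below is invented for the example, and the lookup of the P value is left to tables or software.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = two groups, columns = caries present / absent.
chi2, df = chi_square([[30, 20], [20, 30]])
print("chi-square =", chi2, "df =", df)
```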
Correlation
To test whether or not two variables bear a linear relationship to each other (ie, whether or not they vary together, either positively or negatively), the technique of Pearson product-moment linear correlation is commonly used.
The correlation coefficient (r) is a dimensionless index of the extent to which the two characteristics vary together.
r can range from +1, denoting a perfect positive relationship, to -1, characteristic of a perfect negative relationship; r = 0 signifies no linear relationship.
In practice r takes intermediate values, e.g. 0.6, -0.3 or 0.1.
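The Pearson coefficient can be computed from the sums of squares and cross-products of the deviations from each mean. The sketch below shows the arithmetic on a small invented data set of paired observations.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations (e.g. sugar exposure and carious lesions).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print("r =", round(r, 3))
```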
Regression.
If a linear relationship is statistically significant and strong enough to be of practical use, the next step is to model it mathematically in the form of a prediction equation so that it can be used clinically.
Y = A + BX, where A is the intercept and B is the slope of the line.
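One common way to obtain A and B is ordinary least squares, which chooses the line that minimizes the squared vertical distances of the points from the line. The slides do not name a fitting method, so least squares is an assumption here, and the data are invented.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares estimates (A, B) for the prediction equation Y = A + B*X."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx        # the fitted line passes through the two means
    return intercept, slope

x = [1, 2, 3, 4, 5]                    # hypothetical predictor values
y = [2, 4, 5, 4, 5]                    # hypothetical outcomes
A, B = fit_line(x, y)
predicted = A + B * 6                  # prediction for a new X of 6
print("Y =", round(A, 2), "+", round(B, 2), "* X; prediction at X=6:", round(predicted, 2))
```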
Regression and correlation are closely related: correlation deals with the strength of a linear relationship and regression with its form.
Multivariate Analysis
A statistical analysis that involves more than one dependent variable.
The analysis of simultaneous relationships among several variables.
Examining simultaneously the effects of age, sex and social class on hypertension would be an example of multivariate analysis.
It considers the interrelationships of several traits at a time.
Multivariate analysis comprises a set of techniques dedicated to the analysis of data sets with more than one variable.
One data set
Interval or ratio level of measurement: principal component analysis (PCA)
Nominal or ordinal level of measurement: correspondence analysis (CA), multiple correspondence analysis (MCA)
Similarity or distance: multidimensional scaling (MDS)
- Multidimensional scaling (MDS) is a set of related statistical techniques often used in data visualization for exploring similarities or dissimilarities in data.
Two data sets
Case one: one independent variable set and one dependent variable set
- Multiple linear regression analysis (MLR)
- Regression with too many predictors and/or several dependent variables:
  Partial least squares (PLS) regression (PLSR)
  Principal component regression (PCR)
  Ridge regression (RR)
  Reduced rank regression (RRR) or redundancy analysis
- Multivariate analysis of variance (MANOVA)
- Predicting a nominal variable: discriminant analysis (DA)
- Fitting a model: confirmatory factor analysis (CFA)
Two (or more) dependent variable sets:
Canonical correlation analysis (CC)
Multiple factor analysis (MFA)
Multiple correspondence analysis (MCA)
Procrustean analysis (PA)
Regression analysis
In statistics, regression analysis is used to model relationships between random variables and to determine the magnitude of the relationships between variables; it can also be used to make predictions based on the models.
Predictor variables may be defined quantitatively or qualitatively (categorical).
If the predictors are all quantitative, one performs multiple regression.
If the predictors are all qualitative, one performs analysis of variance.
If some predictors are quantitative and some qualitative, one performs an analysis of covariance.
If two or more independent variables are correlated, we say that the variables are multicollinear.
Multicollinearity results in parameter estimates that are unbiased and consistent, but which may have relatively large variances.
Many patterns of distributions occur in nature. Frequently, these patterns can be described by mathematical functions.
The most common statistical tests can be applied to data that are normally distributed.
What if the data obtained are not normally distributed?
A log transformation can be applied to bring skewed data closer to a normal distribution.
Standard parametric tests can then be applied to the log-transformed values, provided these are approximately normal; they should not be applied to the untransformed skewed data, and the results are interpreted on the log scale.
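A small sketch of what a log transformation does to right-skewed data. The values are invented and deliberately extreme (each is ten times the previous one), so that the effect is easy to see: on the raw scale the mean is pulled far above the median, while on the log scale the two coincide.

```python
from math import log10
from statistics import mean, median

raw = [1, 10, 100, 1000, 10000]          # hypothetical, strongly right-skewed values
logged = [log10(v) for v in raw]         # log transformation (base 10)

print("raw:    mean =", mean(raw), "median =", median(raw))
print("logged: mean =", mean(logged), "median =", median(logged))
```

A log transformation only works for positive values, and whether the transformed data are close enough to normal still has to be checked, for example with a histogram.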
Logistic regression
In statistics, logistic regression is a model used for prediction of the probability of occurrence of an event.
It makes use of several predictor variables that may be either numerical or categorical. For example, the probability that a person has a heart attack within a specified time period might be predicted from knowledge of the person's age, sex and body mass index.
The input is z and the output is f(z).
The logistic function is useful because it can take as an input any value from negative infinity to positive infinity, whereas the output is confined to values between 0 and 1.
The variable z represents the exposure to some set of risk factors, while f(z) represents the probability of a particular outcome, given that set of risk factors.
The variable z is a measure of the total contribution of all the risk factors used in the model and is known as the logit.
$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$
$\beta_0$ is the intercept: the value of z when the other risk factors are absent.
$\beta_1$, $\beta_2$ and $\beta_3$ are regression coefficients.
$x_1$, $x_2$ and $x_3$ are risk factors for heart disease.
The application of a logistic regression may be illustrated using a fictitious example of death from heart disease. This simplified model uses only three risk factors (age, sex and cholesterol) to predict the 10-year risk of death from heart disease.
$\beta_0 = -5.0$ (the intercept)
$\beta_1 = +2.0$
$\beta_2 = -1.0$
$\beta_3 = +1.2$
$x_1$ = age in decades
$x_2$ = sex, where 0 is male and 1 is female
$x_3$ = cholesterol level, in mmol/dl
Risk of death $= \dfrac{1}{1 + e^{-z}}$, where $z = -5.0 + 2.0x_1 - 1.0x_2 + 1.2x_3$
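The fictitious model can be evaluated in a few lines. The coefficients below are the ones used in the expression for z on this slide; the input values passed to the function are hypothetical and chosen only to show the arithmetic, not to represent a realistic patient.

```python
from math import exp

def risk_of_death(age_decades, sex, cholesterol):
    """10-year risk from the slide's fictitious model: f(z) = 1 / (1 + e^-z)."""
    z = -5.0 + 2.0 * age_decades - 1.0 * sex + 1.2 * cholesterol   # the logit
    return 1 / (1 + exp(-z))

# Hypothetical inputs: x1 = 2 (age in decades), x2 = 1 (female), x3 = 1.0 (cholesterol).
risk = risk_of_death(age_decades=2, sex=1, cholesterol=1.0)
print("z = -0.8, risk =", round(risk, 3))
```

Whatever the inputs, the output stays between 0 and 1, which is what makes the logistic function suitable for modelling a probability.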
Discriminant Analysis
Discriminant function (modified Maddrey's discriminant function): originally described by Maddrey and Boitnott to predict prognosis in alcoholic hepatitis.
Canonical variate analysis attempts to establish whether a set of variables can be used to distinguish between two or more groups.
Suppose we have two samples representing different populations.
We measure one character in them and find that, although their means for this character are not identical, their distributions overlap considerably, so that on the basis of this character alone one could not, with any degree of accuracy, identify an unknown specimen as belonging to one or the other of the two populations.
A second character may also differentiate them somewhat, but not absolutely.
Together, the two variables, say X1 and X2, can be used to distinguish them.
Discriminant function analysis computes a new variable, say Z, which is a linear function of both variables X1 and X2.
This function is constructed in such a way that as many as possible of the members of one population have high values of Z and as many as possible of the members of the other have low values, so that Z serves as a much better discriminator between the two populations than do variables X1 and X2 taken singly.
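One standard way of constructing such a Z is Fisher's linear discriminant, sketched below for exactly two variables. The slides do not say which method is meant, so this choice is an assumption, and the two small groups of (X1, X2) measurements are invented so that each variable on its own overlaps between the groups.

```python
from statistics import mean

def fisher_weights(group_a, group_b):
    """Weights (w1, w2) for Z = w1*X1 + w2*X2 using Fisher's linear discriminant."""
    def centre(g):
        return mean(p[0] for p in g), mean(p[1] for p in g)

    def scatter(g, c):
        sxx = sum((p[0] - c[0]) ** 2 for p in g)
        sxy = sum((p[0] - c[0]) * (p[1] - c[1]) for p in g)
        syy = sum((p[1] - c[1]) ** 2 for p in g)
        return sxx, sxy, syy

    ca, cb = centre(group_a), centre(group_b)
    sa, sb = scatter(group_a, ca), scatter(group_b, cb)
    sxx, sxy, syy = (sa[i] + sb[i] for i in range(3))   # pooled within-group scatter
    det = sxx * syy - sxy ** 2
    dx, dy = ca[0] - cb[0], ca[1] - cb[1]               # difference between group means
    # w = (pooled scatter matrix)^-1 * (mean_a - mean_b), via the 2x2 inverse.
    return (syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det

group_a = [(4, 2), (5, 4), (6, 3), (5, 3)]   # hypothetical (X1, X2) pairs
group_b = [(2, 3), (3, 5), (4, 4), (3, 4)]
w1, w2 = fisher_weights(group_a, group_b)
z_a = [w1 * x1 + w2 * x2 for x1, x2 in group_a]
z_b = [w1 * x1 + w2 * x2 for x1, x2 in group_b]
print("weights:", round(w1, 3), round(w2, 3))
print("Z for group A:", [round(z, 2) for z in z_a])
print("Z for group B:", [round(z, 2) for z in z_b])
```

In this toy data set X1 and X2 each overlap between the groups (both groups contain an X1 of 4 and an X2 of 3 or 4), yet the combined Z separates them completely.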
Example: blood pressure, cholesterol level and blood sugar differ between those who are obese and those of normal body build.
Discriminant function analysis can be utilised for assessing the combined effect of the factors that differ between the two groups of subjects.
Meta-analysis
In statistics, a meta-analysis combines the results of several studies that address a set of related research hypotheses.
The first meta-analysis was performed by Karl Pearson in 1904, in an attempt to overcome the problem of reduced statistical power in studies with small sample sizes; analyzing the results from a group of studies can allow more accurate data analysis.
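One simple way of combining study results is a fixed-effect, inverse-variance weighted average, in which more precise studies (smaller standard errors) get more weight. The slides do not specify a pooling method, so this is an illustrative assumption, and the three effect estimates and standard errors are invented.

```python
from math import sqrt

def pool_fixed_effect(effects, standard_errors):
    """Inverse-variance weighted mean of study effects and its standard error."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical results from three small studies.
effects = [0.2, 0.5, 0.3]
standard_errors = [0.1, 0.2, 0.1]
pooled, pooled_se = pool_fixed_effect(effects, standard_errors)
print("pooled effect =", round(pooled, 3), "SE =", round(pooled_se, 3))
```

The pooled standard error is smaller than that of any single study, which is the gain in statistical power that motivates meta-analysis.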
CONCLUSION
Understanding the complexities of statistical modeling not only enables the use of test characteristics in the actual design of diagnostic tests; familiarity with the fundamental concepts will also facilitate insight into, and critical evaluation of, research that relies on such methodology.
Thank you