5. Data Processing


  • 7/31/2019 5Data Processing


    Wakgari Deressa, PhD

    School of Public Health, Addis Ababa University

    Objectives

    The participants should be able to:

    Understand the process involved in data processing

    Use computers to perform data analysis

    Interpret summary statistics and graphical displays

    Understand estimation and hypothesis testing

    Understand statements in published articles

    Introduction

    Describes statistical methods commonly used in health research:

    Data processing and management

    Data analysis

    Interpretation

    Use of computers and statistical software packages

    Epi Info 2002: one of the software packages most commonly used by health researchers

    Preparation of Data for Statistical Analysis

    Data collected should be entered into a computer for analysis.

    This includes tasks such as:

    1. Checking and manual editing

    2. Coding

    3. Creating formats or views for data entry

    4. Entering data into a computer

    5. Cleaning

    6. Transformation

    1. Checking and manual editing

    The quality of the data depends on this step.

    It involves checking the questionnaires to detect incompleteness, inconsistencies, and other obvious errors.

    Most errors in the data should be detected and corrected in the field, before the data are sent away.

    Interviewers and supervisors play a key role in correcting errors close to the source.

    Types of checks

    1. Range checks: e.g., sex coded Male (=1) or Female (=2); age (1-99 yr)

    2. Typographic checks: e.g., 41 entered rather than 14

    3. Consistency checks: e.g., date of birth versus age; a mother aged 20 with a child 15 years old
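The range and consistency checks above can be run automatically once records are in the computer. Below is a minimal Python sketch, assuming each record is a dictionary with hypothetical field names `sex`, `age`, `mother_age` and `child_age`; the limits are the slide's examples, while the 12-year minimum gap between a mother's and her child's age is an illustrative assumption. A typographic error such as 41 for 14 still passes a range check, so it can only be caught by comparing against the questionnaire.

```python
def check_record(rec):
    """Return a list of problems found in one record (a dict)."""
    problems = []
    # Range checks: coded values must be one of the allowed codes
    if rec.get("sex") not in (1, 2):
        problems.append("sex out of range (must be 1=Male or 2=Female)")
    age = rec.get("age")
    if age is None or not 1 <= age <= 99:
        problems.append("age out of range (1-99 years)")
    # Consistency check: a mother must be plausibly older than her child.
    # MIN_GAP is a hypothetical threshold chosen for this sketch.
    MIN_GAP = 12
    mother, child = rec.get("mother_age"), rec.get("child_age")
    if mother is not None and child is not None and mother - child < MIN_GAP:
        problems.append("mother's age inconsistent with child's age")
    return problems


# The slide's example: a 20-year-old mother with a 15-year-old child
print(check_record({"sex": 2, "age": 20, "mother_age": 20, "child_age": 15}))
print(check_record({"sex": 3, "age": 120}))
```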

    2. Coding

    Coding is assigning a separate, non-overlapping numerical code to each response recorded in words, including missing values.

    Statistical programs generally do not understand or accept alphabetical text, letter codes, or verbatim responses.

    Example: 1=Male, 2=Female

    Codes should be both exhaustive and mutually exclusive.

    Closed-ended questions are usually pre-assigned a numerical code (pre-coding = before data collection).

    What is your current marital status? 1. Never married

    2. …

    3. Widowed/separated

    Post-coding = after data collection

    Mainly used for open-ended questions. The code is assigned after reviewing a representative sample of responses from respondents.

    Codebook: a document that lists the codes (or keys) assigned to the values of each variable.

    It guides researchers to find the right code for each answer category.

    Some responses, such as quantitative variables, can be entered directly into a computer without coding. Example: age, weight, body temperature, number of pregnancies and ANC visits, etc.

    Question/Statement             Variable name   Codes
    1. Your sex                    SEX             1=Male, 2=Female
    2. What is your occupation?    OCCUP           1=Farmer, 2=Housewife, 3=Trader, 4=Student, 5=Other

    Variable name: usually contains fewer than 8 characters
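A codebook can be held directly as a lookup table that turns verbatim answers into codes. A minimal Python sketch built from the table above; mapping unlisted occupations to 5=Other and using 9 for a blank answer are assumptions of the sketch, not part of the original codebook.

```python
# Codebook from the table above: variable name -> {answer category: numeric code}
CODEBOOK = {
    "SEX": {"male": 1, "female": 2},
    "OCCUP": {"farmer": 1, "housewife": 2, "trader": 3, "student": 4, "other": 5},
}
OTHER = {"OCCUP": 5}  # unlisted occupations fall into 5=Other (keeps the codes exhaustive)
MISSING = 9           # hypothetical code for a blank answer (not in the original codebook)


def code_answer(variable, answer):
    """Translate one verbatim answer into its numeric code."""
    if answer is None or not answer.strip():
        return MISSING
    key = answer.strip().lower()
    return CODEBOOK[variable].get(key, OTHER.get(variable, MISSING))


print([code_answer("SEX", a) for a in ("Female", "male", "")])   # [2, 1, 9]
print([code_answer("OCCUP", a) for a in ("Trader", "Teacher")])  # [3, 5]
```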

    3. Creating a data format in a computer

    Designing a format or electronic questionnaire in a computer

    Examples

    1. Sex: # 1=male, 2=female

    2. Age: ## in years

    3. Occup: # 1=farmer, 2=housewife, 3=trader, 4=student, 5=other

    4. Sick: # 1=yes, 2=no

    5. RxSource: _____________________

    4. Data Entry

    The transfer of data from a questionnaire to a computer file.

    Before entry, data must be checked for errors.

    Data must be coded.

    Data are entered by a data entry clerk or a researcher.

    5. Data cleaning

    Data entered must be checked for errors, impossible or implausible values, and inconsistencies.

    In most cases some errors are inevitable, but many are avoidable.

    Errors can result from incorrect reading of codes, incorrect reporting, missed entries, incorrect coding, repeated entries, incorrect typing, and so forth.

    Data cleaning involves two types of checks:

    1. Checking for outliers (out-of-range values)

    Sex: 1=male, 2=female. If 3 or any other number is entered rather than 1 or 2, the error should be corrected by looking into the original source.

    2. Performing consistency checks

    Checking whether data in one part of a record are compatible with data in another part. If sex is entered as 1=male and 2 is entered for number of pregnancies, the data are not internally consistent.
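The same two kinds of checks, plus a search for repeated entries, can be run over a whole data file. A sketch with hypothetical records (fields `id`, `sex`, `preg`); it only lists suspect records, since corrections must still come from the original questionnaires.

```python
from collections import Counter

records = [  # hypothetical entered data: ID, SEX (1=male, 2=female), number of pregnancies
    {"id": 101, "sex": 1, "preg": 0},
    {"id": 102, "sex": 3, "preg": 0},   # out-of-range SEX code
    {"id": 103, "sex": 1, "preg": 2},   # male with pregnancies: inconsistent
    {"id": 104, "sex": 2, "preg": 3},
    {"id": 104, "sex": 2, "preg": 3},   # repeated entry
]


def clean_report(rows):
    """List (id, problem) pairs to be checked against the original questionnaires."""
    report = []
    for r in rows:
        # 1. Out-of-range check on a coded variable
        if r["sex"] not in (1, 2):
            report.append((r["id"], "invalid SEX code"))
        # 2. Consistency check between two fields of the same record
        if r["sex"] == 1 and r["preg"] > 0:
            report.append((r["id"], "male with pregnancies"))
    # Repeated entry: the same ID entered more than once
    counts = Counter(r["id"] for r in rows)
    report += [(i, "duplicate ID") for i, n in counts.items() if n > 1]
    return report


for rec_id, problem in clean_report(records):
    print(rec_id, problem)
```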

    6. Data transformation

    Involves restructuring the data set and generating new variables, or recoding some of the existing data fields to define new variables.

    Transforming raw scores into standard scores.

    Facilitates data analysis and interpretation.

    Easily handled directly through instructions to a computer.
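A standard score usually means a z-score, $z = (x - \bar{x})/s$, which expresses each value as a number of standard deviations from the mean. A minimal Python sketch using the standard `statistics` module; using the sample standard deviation rather than the population one is an assumption.

```python
import statistics


def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


print(standardize([2, 4, 6]))  # [-1.0, 0.0, 1.0]
```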

    Example: an infant's original birth weight (BWT, entered in the computer file in grams) can be categorized into a dichotomous variable:

    Recode BWEIGHT (LOWEST THRU 2500=1) (2501 THRU HIGHEST=2)

    1=low birth weight, 2=normal weight

    Age can be recoded from a continuous variable into a categorical one:

    1=15-24, 2=25-34, 3=35-44, 4=45-54
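The two recodes above can be reproduced outside a statistics package. A minimal Python sketch mirroring the RECODE line (up to 2500 g becomes 1, above 2500 g becomes 2) and the four age groups; returning `None` for ages outside 15-54 is an assumption, since the slide gives no code for them.

```python
AGE_GROUPS = [(1, 15, 24), (2, 25, 34), (3, 35, 44), (4, 45, 54)]  # (code, low, high)


def recode_bweight(grams):
    """1 = low birth weight (up to 2500 g), 2 = normal weight (above 2500 g)."""
    return 1 if grams <= 2500 else 2


def recode_age(age):
    """1=15-24, 2=25-34, 3=35-44, 4=45-54; None if outside these groups."""
    for code, low, high in AGE_GROUPS:
        if low <= age < high + 1:  # half-open, so an age of 24.5 still belongs to 15-24
            return code
    return None


print([recode_bweight(w) for w in (1800, 2500, 2501, 3400)])  # [1, 1, 2, 2]
print([recode_age(a) for a in (15, 30, 44, 54, 60)])          # [1, 2, 3, 4, None]
```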

    EPI Info 2002

    Epi Info 2002 is a series of programs for use by public health professionals in conducting outbreak investigations and managing databases for public health surveillance and other epidemiological data.

    Epi Info 2002 software is in the public domain and freely available for use, copying, translation and distribution.

    "Epi Info" is a trademark of the Centers for Disease Control and Prevention (CDC).

    By contrast, SPSS and STATA are commercial packages and can be very expensive.

    EPI Info 2002

    With Epi Info and a personal computer, physicians, epidemiologists, and other public health workers can easily develop a questionnaire or form, customize the data entry process, and enter and analyze data (all in one package).

    EPI Info 2002

    Epi Info is a tool that public health professionals use to:

    Create a questionnaire (format) (MakeView)

    Enter data into a questionnaire (Enter Data)

    Analyze the data (Analyze Data)

    Additional features: shaping data entry, error checking, coding, selecting records, creating new variables, recoding data, and importing and exporting files from other systems.

    Running EPI Info 2002

    Switch on your computer.

    Double-click Epi Info 2002.

    Select a program (MakeView, Enter Data, Analyze Data, etc.) from the menu.

    [Figure: the Epi Info for Windows main menu]

    MakeView Program

    Used to place prompts and data entry fields on one or many pages of a View.

    Used to create questionnaires or data entry forms.

    Regarded as both the form designer and the database design environment.

    Questionnaire / View

    An electronic replication of a paper form or other information source, created to enter and store data.

    Created in the Epi Info MakeView program.

    Running MakeView

    Type the file name you gave earlier (e.g., Mary) and click OK. You can also type another name.

    Right-click to create a field: right-clicking on any empty space opens a box for writing the name of the variable (e.g., Name, Age, Sex).

    Running MakeView

    Below you will see a menu of field types (the type of the variable).

    Select the appropriate field type (e.g., upper-case text ______ for Occupation, numeric # for age, label for a title).

    Click OK.

    Enter Program

    Displays the data entry screen created in MakeView, ready to receive data.

    Data are entered and edited here.

    Controls the data entry process, using the rules specified in MakeView.

    A search function is provided so that records matching values specified for any combination of variables can be located.

    Entering data using the Enter Data program

    Click Enter Data from the main menu.

    Across the top of the screen, click File and select Open.

    You will be asked to "Select a table", with the file name you already gave as the default.

    Click OK.

    The program will now make a data file from your questionnaire and display the variables on the screen, ready to receive data.

    Fill in the blanks until you have completed the form. The program will then be ready to receive the 2nd entry, and so on.

    After finishing your data entry, click Exit.

    The Analyze Data Program

    The analysis program produces lists, frequencies, graphs, tables, and other statistics.

    Select Analyze Data from the main menu.

    The command window will appear. Click Read (Import).

    Choose your file (e.g., Mary.rec).

    Click OK.

    Descriptive Statistics

    A distribution (or probability distribution) describes the way the data are distributed; examining it helps us draw conclusions about a set of data.

    Continuous variables: the aim is to determine whether or not normality may be assumed.

    Categorical variables: we obtain the frequency distribution for each variable.

    Distribution of a Categorical Variable

    A study was conducted to assess the characteristics of a group of 234 smokers by collecting data on gender and other variables.

    Gender: 1 = male, 2 = female

    Gender       Frequency (n)   Relative frequency
    Male (1)     110             47.0%
    Female (2)   124             53.0%
    Total        234             100%
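A frequency table like the one above can be produced by counting codes. A Python sketch that rebuilds the 234 gender codes from the counts in the table (the raw data are not given) and prints frequencies with relative frequencies:

```python
from collections import Counter

# Gender codes rebuilt from the table's counts: 110 males (1), 124 females (2)
gender = [1] * 110 + [2] * 124
labels = {1: "Male", 2: "Female"}

counts = Counter(gender)
total = sum(counts.values())
for code in sorted(counts):
    n = counts[code]
    print(f"{labels[code]} ({code})  {n:4d}  {100 * n / total:5.1f}%")
print(f"Total       {total:4d}  100.0%")
```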

    Frequency distribution of BWT: Bar Chart

    [Figure: bar chart of frequency by BWT category (Very low, Low, Normal, Big)]

    Prevalence of active trachoma among children (1-9 years) by sex and area of residence

    [Figure: bar chart of prevalence (%) of AT, TF, and TI by gender and area of residence (Female, Male, Rural, Urban, Total)]

    [Figure: pie chart with relative frequencies of categories of BWT]

    Distribution of a Continuous Variable

    Examples: age, height, weight, etc.

    A continuous variable can take infinitely many values.

    The probability associated with any particular value is therefore zero. However, the variable will take some value in the interval enclosed by two limits, $x_1$ and $x_2$.

    The probability distribution is visualized as a curve, and probabilities are areas under the curve.

    The total area under a probability distribution is always 1. The section marked A represents the probability of observing a value of 3 or greater, written symbolically as $\Pr(X \ge 3)$. If the area of A is, say, 0.2 units, then $\Pr(X \ge 3) = 0.2$.

    [Figure: probability curve over x = 0 to 5; area B = $\Pr(X \le 1)$, area A = $\Pr(X \ge 3)$]
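Probabilities as areas can be computed with the cumulative distribution function. A sketch using Python's `statistics.NormalDist`, assuming for illustration only that X is normal with mean 2 and standard deviation 1 (the curve on the slide is not specified):

```python
from statistics import NormalDist

X = NormalDist(mu=2, sigma=1)  # hypothetical distribution for illustration

p_at_least_3 = 1 - X.cdf(3)      # area to the right of 3: Pr(X >= 3)
p_at_most_1 = X.cdf(1)           # area to the left of 1: Pr(X <= 1)
p_between = X.cdf(3) - X.cdf(1)  # area between 1 and 3
print(round(p_at_least_3, 4), round(p_at_most_1, 4), round(p_between, 4))  # 0.1587 0.1587 0.6827
# The three pieces together cover the whole area under the curve
print(round(p_at_least_3 + p_at_most_1 + p_between, 10))  # 1.0
```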

    The Normal Distribution

    [Figure: normal curve, characterized by its mean and standard deviation]
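For reference, the normal curve is completely determined by its mean $\mu$ and standard deviation $\sigma$; its density is the standard result

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad -\infty < x < \infty.$$

About 68% of the area under the curve lies within $\mu \pm \sigma$ and about 95% within $\mu \pm 1.96\sigma$.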

    Histograms

    Histograms are frequency distributions with continuous class intervals that have been turned into graphs.

    To construct a histogram, we draw the interval boundaries on a horizontal line and the frequencies on a vertical line. Non-overlapping intervals that cover all of the data values must be used.

    Bars are then drawn over the intervals; the area of each bar is proportional to the number of observations in that interval.
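Counting observations into non-overlapping intervals is the first step in drawing a histogram. A minimal Python sketch with hypothetical birth weights and equal 500 g intervals; with equal widths the bar heights can simply be the counts, whereas unequal widths would need height = count / width to keep area proportional to frequency.

```python
def histogram_counts(values, edges):
    """Count values in consecutive intervals [edges[i], edges[i+1]).

    The intervals do not overlap; the last one also includes its upper edge,
    so every value between edges[0] and edges[-1] is counted exactly once.
    """
    counts = [0] * (len(edges) - 1)
    for x in values:
        if not edges[0] <= x <= edges[-1]:
            raise ValueError(f"{x} lies outside the interval boundaries")
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= x < edges[i + 1] or (last and x == edges[-1]):
                counts[i] += 1
                break
    return counts


# Hypothetical birth weights (g) grouped in 500 g intervals from 1500 to 4500 g
bw = [1900, 2400, 2600, 2900, 3000, 3100, 3300, 3350, 3600, 4100]
edges = [1500, 2000, 2500, 3000, 3500, 4000, 4500]
print(histogram_counts(bw, edges))  # [1, 1, 2, 4, 1, 1]
```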

    Frequency polygon

    A frequency distribution can be portrayed graphically in yet another way, by means of a frequency polygon.

    To draw a frequency polygon, we connect the mid-points of the tops of the cells of the histogram with straight lines.
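The polygon's vertices follow directly from the histogram: one point per interval at (mid-point, frequency). A short Python sketch; adding a zero-frequency point one interval beyond each end so the polygon meets the horizontal axis is a common convention assumed here, not stated on the slide.

```python
def polygon_points(edges, counts):
    """Vertices of a frequency polygon: (interval mid-point, frequency).

    A zero-frequency point one interval width beyond each end closes the
    polygon on the horizontal axis (a convention assumed in this sketch).
    """
    mids = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    first_width = edges[1] - edges[0]
    last_width = edges[-1] - edges[-2]
    return ([(mids[0] - first_width, 0)]
            + list(zip(mids, counts))
            + [(mids[-1] + last_width, 0)])


print(polygon_points([1500, 2000, 2500, 3000], [1, 3, 2]))
# [(1250.0, 0), (1750.0, 1), (2250.0, 3), (2750.0, 2), (3250.0, 0)]
```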

    [Figure: the frequency polygon of birth weight of newborns by sex]

    Numerical Summary Measures

    Measures of location

    Measures of dispersion

    Measures of Location

    The most common measures:

    Mean (arithmetic mean)

    Median

    Mode

    Mean (Arithmetic mean)

    The "average", obtained by adding all the values in a sample or population and dividing the sum by the number of values: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

    Example: 10 numbers:

    19 21 20 20 34 22 24 27 27 27

    Mean = (19 + 21 + ... + 27)/10 = 241/10 = 24.1

    Median

    The median is the value that divides the data set into two equal parts.

    If the number of values is odd, the median is the middle value when all values are arranged in order of magnitude.

    When the number of observations is even, there is no single middle value but two middle observations. In this case the median is the mean of these two middle observations, when all observations have been arranged in order of magnitude.

    In the above data set, arranged in increasing order: 19 20 20 21 22 24 27 27 27 34

    Median = (22 + 24)/2 = 23

    Mode

    Any observation of a variable at which the distribution reaches a peak. Most distributions are unimodal.

    In the above example, the mode is 27: it occurs three times (the most frequent value).

    It is possible to have more than one mode or no mode.
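The three measures of location for the ten-number example can be checked with Python's standard `statistics` module (`multimode` needs Python 3.8 or later):

```python
import statistics

data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]  # the ten numbers from the example

print(statistics.mean(data))       # 24.1
print(statistics.median(data))     # 23.0 (mean of the two middle values, 22 and 24)
print(statistics.multimode(data))  # [27] (a list, since a distribution can have several modes)
```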

    Measures of Dispersion

    Dispersion refers to the variety exhibited by the values of the data.

    The amount may be small when the values are close together.

    Two or more data sets may have the same mean and/or median but still be quite different.

    [Figure: two distributions with the same mean, median, and mode, but with quite different spread]

    Range

    The range is the difference between the largest and smallest values in the set of observations. These values are often called the maximum and the minimum.

    If 167 is the largest and 40 is the smallest value, then the range is 167 - 40 = 127.

    a) The first quartile (Q1): 25% of all the ranked observations are at or below this value.

    Inter-quartile range (IQR)

    The IQR encompasses the middle 50% of the observations:

    $\text{IQR} = Q_3 - Q_1$, where $Q_3$ = 3rd quartile and $Q_1$ = 1st quartile.

    Median = 2nd quartile (dividing the data into two halves)

    1st quartile ($Q_1$) = the $\frac{1}{4}(n + 1)$th observation

    2nd quartile ($Q_2$) = the $\frac{1}{2}(n + 1)$th observation

    3rd quartile ($Q_3$) = the $\frac{3}{4}(n + 1)$th observation

    Variance ($\sigma^2$, $s^2$)

    A measure of dispersion that reflects the scatter of the values about their mean.

    It is based on the squares of the deviations taken from the mean.

    Population variance = $\sigma^2$

    Sample variance = $s^2$

    A sample variance is calculated for a sample of individual values ($X_1, X_2, \ldots, X_n$) and uses the sample mean ($\bar{x}$) rather than the population mean ($\mu$).
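Written out with the usual definitions (a standard result, not reproduced from the slides), the two variances are

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{x})^2,$$

where $N$ is the population size. Dividing by $n - 1$ rather than $n$ compensates for measuring the deviations from $\bar{x}$, which is itself estimated from the same sample.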

    Limitation of the Variance

    The variance is a mean of squared values.

    It is not expressed in the same unit as the observations whose dispersion it represents.

    The variance of a distribution of weights is not expressed in kg, but in kg².

    e.g., mean weight = 36.5 kg, $s^2$ = 257 kg²

    Standard deviation

    Standard deviation = square root of the variance: $\sigma = \sqrt{\sigma^2}$, $s = \sqrt{s^2}$

    Mean = 36.5 kg

    $s^2$ = 257 kg²

    $s = \sqrt{257} \approx 16$ kg

    The standard deviation is expressed in the same units as the measurement it represents.
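Variance, standard deviation and range for the ten-number example can be computed with Python's `statistics` module; `variance`/`stdev` divide by $n - 1$, while `pvariance` treats the data as a whole population:

```python
import math
import statistics

data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]  # the example data used for the mean

s2 = statistics.variance(data)       # sample variance s^2 (divides by n - 1)
s = statistics.stdev(data)           # sample standard deviation s = sqrt(s^2)
sigma2 = statistics.pvariance(data)  # population variance (divides by N)
print(round(s2, 2), round(s, 2), round(sigma2, 2))  # 21.88 4.68 19.69
print(max(data) - min(data))                        # range: 15

# The weight example: a variance of 257 kg^2 gives a standard deviation of about 16 kg
print(round(math.sqrt(257), 1))  # 16.0
```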

    Box Plot

    Another way to display information about the distribution of a set of data. It can be used to display a set of discrete or continuous observations using a single vertical axis; only certain summaries of the data are shown.

    First, the quartiles of the data set must be defined.

    A box is drawn with the top of the box at the third quartile and the bottom at the first quartile.

    The location of the mid-point of the distribution (the median) is indicated with a horizontal line in the box.

    Finally, straight lines are drawn from the centre of the top of the box to the largest observation and from the centre of the bottom of the box to the smallest observation.
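The five values a box plot draws (smallest observation, $Q_1$, median, $Q_3$, largest observation) can be computed directly. A sketch using `statistics.quantiles` (Python 3.8 or later), whose default `exclusive` method places quartiles at the $p(n+1)$th ranked position with linear interpolation; drawing the plot itself is left to a graphics package.

```python
import statistics


def box_plot_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for drawing a box plot."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return min(values), q1, q2, q3, max(values)


data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]
print(box_plot_summary(data))  # (19, 20.0, 23.0, 27.0, 34)
```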

    Percentile = the $p(n+1)$th observation, where $p$ = the required percentile expressed as a proportion

    Arrange the numbers in ascending order.

    A. 1st quartile = $0.25(n+1)$th

    B. 2nd quartile = $0.5(n+1)$th

    C. 3rd quartile = $0.75(n+1)$th

    D. 20th percentile = $0.2(n+1)$th

    E. 15th percentile = $0.15(n+1)$th
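The $p(n+1)$ rule can be written as a small function. A minimal Python sketch; interpolating linearly when the position is not a whole number, and clamping positions outside 1 to $n$, are assumptions, since the slide only gives the position:

```python
def percentile(values, p):
    """Value at the p(n+1)-th ranked position, for 0 < p < 1.

    Non-integer positions are interpolated linearly between the two
    neighbouring ranked values (an assumption of this sketch); positions
    outside 1..n are clamped to the smallest or largest value.
    """
    x = sorted(values)  # arrange the numbers in ascending order
    pos = p * (len(x) + 1)
    if pos <= 1:
        return x[0]
    if pos >= len(x):
        return x[-1]
    k = int(pos)        # whole part: rank of the lower neighbour (1-based)
    frac = pos - k
    return x[k - 1] + frac * (x[k] - x[k - 1])


data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]
for p in (0.25, 0.5, 0.75, 0.2, 0.15):
    print(p, round(percentile(data, p), 2))
```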

    [Figure: box plots of birth weight (grams) by gestational age: pre-term, term, post-term]

    Statistical Inference

    Statistical inference is the process of using samples to make inferences about a population.

    It has two main components: estimation and hypothesis testing.

    Often the population parameter of interest is either a mean or a proportion.

    Parameter Estimation

    Population parameter: a characteristic of the underlying (unknown) distribution of the variable of interest in a population

    Sample statistic: an estimate of a population parameter obtained from a sample

    Types of Estimates

    Point estimation involves calculating a single number to estimate the population parameter (a single value).

    Interval estimation specifies a range of reasonable values for the parameter (a confidence interval).

    An interval estimate provides more information about a population characteristic than does a point estimate.

    Confidence Intervals

    Used for estimating the true value of the population parameter.

    Provide a range of plausible values for the anticipated true population parameter.

    A CI tells us how precise our estimate is likely to be.

    A narrow CI implies high precision, while a wide CI implies low precision.

    A narrow CI reflects a large sample size, low variability, or both.

    The 95% CI is commonly used; sometimes 90% and 99% CIs are used.

    The 95% CI is calculated in such a way that, under the conditions assumed for the underlying distribution, the interval will contain the true population parameter 95% of the time (over repeated samples).

    Loosely speaking, you might interpret a 95% CI as one which you are 95% confident contains the true parameter.

    CIs can also answer the question of whether or not an association exists, or whether a treatment is beneficial or harmful (analogous to P-values).

    e.g., if the CI of an odds ratio includes the value 1.0, we cannot be confident that exposure is associated with disease.

    C.I. for a population mean

    a) Known variance (large sample size)

    A $100(1-\alpha)\%$ C.I. for $\mu$ is

    $$\bar{x} \pm z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

    $\alpha$ is chosen by the researcher; the most common values of $\alpha$ are 0.05, 0.01, 0.001, and 0.1.

    Margin of error (precision of the estimate): $z_{1-\alpha/2}\,\sigma/\sqrt{n}$
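The known-variance interval translates directly into code. A sketch using `statistics.NormalDist` for $z_{1-\alpha/2}$; the birth-weight figures (mean 3000 g, $\sigma$ = 500 g, $n$ = 100) are hypothetical.

```python
import math
from statistics import NormalDist


def ci_mean_known_sigma(xbar, sigma, n, alpha=0.05):
    """100(1 - alpha)% CI for a population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    margin = z * sigma / math.sqrt(n)        # margin of error (precision)
    return xbar - margin, xbar + margin


# Hypothetical numbers: mean birth weight 3000 g, sigma 500 g, n = 100
low, high = ci_mean_known_sigma(3000, 500, 100)
print(round(low), round(high))  # 2902 3098
```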

    b) Unknown variance (small sample size, $n < 30$)

    What if $\sigma$ for the underlying population is unknown and the sample size is small?

    As an alternative, we use Student's t distribution, which is based on degrees of freedom ($n - 1$ here):

    $$\bar{x} \pm t_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}}$$

    Note: t approaches z as n increases.
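When $\sigma$ is unknown, the sample standard deviation replaces it and the critical value comes from the t distribution with $n - 1$ degrees of freedom. Python's standard library has no t quantile function, so this sketch assumes a small lookup table of two-sided 95% values copied from a standard t table; a statistics package would compute them for any df. The tabulated values shrink toward 1.96 as df grows, which is why t approaches z.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t from a standard t table
# (a hypothetical lookup table for this sketch; extend it as needed).
T_975 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}


def ci_mean_t(values):
    """95% CI for a mean when sigma is unknown: xbar +/- t(n-1) * s / sqrt(n)."""
    n = len(values)
    xbar = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD replaces the unknown sigma
    t = T_975[n - 1]              # degrees of freedom = n - 1
    margin = t * s / math.sqrt(n)
    return xbar - margin, xbar + margin


data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]
print([round(v, 2) for v in ci_mean_t(data)])
```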

    C.I. for a population proportion

    Hence,

    $$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

    is an approximate 95% CI for the true proportion $P$.

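    Example: A study on dental health practice. Of 300 adults interviewed, 123 said that they regularly had a dental check-up twice a year. What is the 95% CI for the true proportion in the population?

    $\hat{p} = 123/300 = 0.41$, standard error $= \sqrt{0.41 \times 0.59/300} \approx 0.0284$, so the CI is $0.41 \pm 1.96 \times 0.0284 = (0.354, 0.466)$.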
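The dental check-up interval can be reproduced with the approximate (normal) formula above. A minimal Python sketch; the approximation is usually considered adequate when both $n\hat{p}$ and $n(1-\hat{p})$ are reasonably large (a common rule of thumb is at least 5).

```python
import math


def ci_proportion(x, n, z=1.96):
    """Approximate 95% CI for a proportion: p +/- z * sqrt(p(1 - p) / n)."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


low, high = ci_proportion(123, 300)  # dental check-up example
print(round(low, 3), round(high, 3))  # 0.354 0.466
```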


    Estimation for Two Populations

    Hypothesis testing

    The majority of statistical analyses involve comparison, most obviously between treatments or procedures, or between groups of subjects.

    Hypothesis: a statement about one or more population parameters.

    e.g., Is the true population mean BW equal to 3000 g?


    The alternative hypothesis, $H_A$, is a statement that disagrees with the null hypothesis.

    The effect of interest is not zero; there is a difference.

    It states the line of thinking of the researcher: it is the hypothesis that the researcher wants to demonstrate.

    Examples of Research Hypotheses

    Population Mean

    The average length of stay of patients admitted to the hospital is five days.

    The mean BW of babies delivered by mothers with low SES is lower than that of babies delivered by mothers with higher SES.

    The economic burden of HIV/AIDS on the poor is higher than that on wealthier people.

    Etc.

    Population Proportion

    The proportion of adult smokers in Addis Ababa is p = 0.40.

    The prevalence of HIV among non-married adults is higher than that among married adults.

    A greater proportion of people who live in poverty may have a low health status.

    Inappropriate prescription of drugs is more common in settings where microscopy is unavailable.

    Etc.

    HT using test statistics (e.g., mean):

    Two-tailed: $H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$

    One-tailed: $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$

    One-tailed: $H_0: \mu \ge \mu_0$ versus $H_1: \mu < \mu_0$

    Decide on the appropriate test statistic for the hypothesis (Z, t, $\chi^2$, F, etc.).

    Select the level of significance for the statistical test ($\alpha$ = 0.05, 0.01, 0.001, etc.).

    Another way to state the conclusion:

    Reject $H_0$ if P-value < $\alpha$.

    Accept (that is, do not reject) $H_0$ if P-value $\ge \alpha$.

    The smaller the P-value, the stronger the evidence against $H_0$.
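As an illustration of the decision rule, here is a sketch of a two-tailed one-sample z-test of $H_0: \mu = \mu_0$, assuming $\sigma$ is known; all numbers (sample mean 3110 g, $\sigma$ = 500 g, $n$ = 100, $\mu_0$ = 3000 g) are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test; returns (z, P-value, decision)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    return z, p_value, decision


# Hypothetical: is the true mean BW 3000 g? Sample mean 3110 g, sigma 500 g, n = 100
z, p, decision = z_test_mean(3110, 3000, 500, 100)
print(round(z, 2), round(p, 4), decision)
```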

    P-value

    The P-value is the probability, assuming $H_0$ is true, that a difference at least as large as the one we have observed could have occurred simply by chance.

    It is often described loosely as the probability that we could be wrong if we reject $H_0$.

    In the same loose sense, it indicates whether the association between two variables might be due to chance.

    Types of Errors in Hypothesis Tests

    Whenever we reject or accept $H_0$, we risk committing an error.

    Two types of errors can be committed:

    Type I error

    Type II error

    Type I error ($\alpha$): the probability of rejecting $H_0$ when it is true.

    It is the probability of being wrong when $H_0$ is true.

    The typical value for $\alpha$ (the significance level) is 5%.

    Type II error ($\beta$): the probability of not rejecting $H_0$ when it is actually false.

    Failure to accept $H_A$ when it is true.

    Power: the probability of rejecting $H_0$ when it is false, or of accepting $H_A$ when it is true. Power = $1 - \beta$.

    The typical value for power is 80%.
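Power depends on the true difference, $\sigma$, $n$ and $\alpha$. A sketch for a two-tailed one-sample z-test with hypothetical values ($\mu_0$ = 3000 g, true mean 3100 g, $\sigma$ = 500 g); it shows power rising and $\beta$ falling as $n$ grows. When the true mean equals $\mu_0$, the rejection probability is exactly $\alpha$.

```python
import math
from statistics import NormalDist


def power_z_test(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-tailed one-sample z-test when the true mean is mu1."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))  # mean of the z statistic under mu1
    # Probability that the statistic falls in either rejection region
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


for n in (50, 100, 200):
    power = power_z_test(3000, 3100, 500, n)
    print(n, round(power, 2), "beta =", round(1 - power, 2))
```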

    Statistical Methods for Continuous Variables: Comparison of Groups

    Is there a significant difference between two or more groups?

    Student's t-Test (Unpaired data)

    Student's t-test, or simply the t-test, is commonly used to compare the means of two groups.

    Our null hypothesis is $H_0: \mu_1 = \mu_2$, where $\mu_1$ is the true mean of the first group and $\mu_2$ is the true mean of the second group.

    Assumption:

    The data are independent and normally distributed.

    Examples:

    Comparing mean BWT between males and females

    Comparing mean blood pressure between diabetic and non-diabetic patients

    Etc.
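A minimal sketch of the unpaired t statistic in Python, assuming the classic pooled-variance form (which also assumes equal variances in the two groups; Welch's version relaxes this). The birth weights are hypothetical, and the P-value is left to a t table with $n_1 + n_2 - 2$ degrees of freedom because the standard library has no t distribution.

```python
import math
import statistics


def pooled_t(group1, group2):
    """Student's t statistic for two independent groups (equal variances assumed).

    Returns (t, degrees of freedom); compare |t| with a t table at n1 + n2 - 2 df.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical birth weights (g) of male and female newborns
males = [3200, 3450, 2900, 3600, 3100, 3350]
females = [3000, 3150, 2800, 3250, 2950, 3100]
t, df = pooled_t(males, females)
print(round(t, 2), df)
```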

    Student's t-Test (Paired data)

    Study subjects from one population can be matched, or paired, with particular subjects in the second population. Examples:

    Comparing mean birth weight between twins

    Comparing mean blood pressure of diabetic patients before and after some treatment

    Eyes or ears of the same individual
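For paired data the test works on the within-pair differences. A minimal Python sketch with hypothetical before/after blood pressures; the statistic is compared with a t table on $n - 1$ degrees of freedom.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic: mean difference / (SD of differences / sqrt(n)).

    Returns (t, degrees of freedom = n - 1).
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    d = [b - a for b, a in zip(before, after)]  # within-pair differences
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1


# Hypothetical systolic blood pressure (mmHg) of five patients before and after treatment
before = [150, 142, 160, 155, 148]
after = [140, 138, 151, 149, 145]
print(paired_t(before, after))
```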

    One-way analysis of variance (ANOVA)

    Suitable for deciding whether differences exist between the means of more than two groups. It is a generalization of Student's t-test.

    The null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$, where $\mu_i$ is the true mean of the $i$th group.

    Allows us to test whether the mean of at least one of the groups differs significantly from some other group.
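A minimal sketch of the one-way ANOVA F statistic (between-group mean square over within-group mean square), using made-up groups. With only two groups, F equals the square of the pooled t statistic, which is one way to see ANOVA as a generalization of the t-test.

```python
import statistics


def one_way_anova_f(*groups):
    """F statistic for one-way ANOVA: between-group MS / within-group MS.

    Returns (F, df_between, df_within); compare F with an F table.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```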


    Correlation

    Measures the strength of the relationship between two continuous variables from a single population. The relationship should be linear.

    Procedure: display the data in a scatter plot before carrying out any further analysis.

    One variable is plotted on the X-axis

    The other on the Y-axis

    Nation           % Immunized   Under-five mortality rate (per 1000 live births)
    Bolivia          77            118
    Brazil           69            65
    Cambodia         32            184
    Canada           85            8
    China            94            43
    Czech Republic   99            12
    Egypt            89            55
    Ethiopia         13            208
    Finland          95            7
    France           95            9
    Greece           54            9
    India            89            124
    Italy            95            10
    Japan            87            6
    Mexico           91            33
    Poland           98            16
    Russia           73            32
    Senegal          47            145
    Turkey           76            87
    UK               90            9

    Percentage of children immunized against DPT and under-five mortality rate for 20 countries, 1992

    [Figure: scatter plot of under-five mortality rate against percentage immunized]
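The strength of the linear relationship in the immunization table can be summarized by Pearson's correlation coefficient $r$. A minimal Python sketch using the 20 countries listed above (Python 3.10+ also offers `statistics.correlation`, which gives the same value); a value near -1 indicates a strong negative linear relationship.

```python
import math

# (percentage immunized against DPT, under-five mortality per 1000 live births), 1992
data = {
    "Bolivia": (77, 118), "Brazil": (69, 65), "Cambodia": (32, 184),
    "Canada": (85, 8), "China": (94, 43), "Czech Republic": (99, 12),
    "Egypt": (89, 55), "Ethiopia": (13, 208), "Finland": (95, 7),
    "France": (95, 9), "Greece": (54, 9), "India": (89, 124),
    "Italy": (95, 10), "Japan": (87, 6), "Mexico": (91, 33),
    "Poland": (98, 16), "Russia": (73, 32), "Senegal": (47, 145),
    "Turkey": (76, 87), "UK": (90, 9),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


immunized = [v[0] for v in data.values()]
mortality = [v[1] for v in data.values()]
print(round(pearson_r(immunized, mortality), 2))  # -0.79
```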