1 Basic descriptive epidemiology you can analyze your data! The GWU YES Center Evaluation Team and Manya Magnus, PhD, MPH The George Washington University School of Public Health and Health Services Department of Epidemiology and Biostatistics July 2007 The data are pouring in—use ‘em!

Basic Statistics Spns Summer 2007


Page 1: Basic Statistics Spns Summer 2007

1

Basic descriptive epidemiology

you can analyze your data!

The GWU YES Center Evaluation Team and

Manya Magnus, PhD, MPH

The George Washington University

School of Public Health and Health Services

Department of Epidemiology and Biostatistics

July 2007

The data are pouring in: use ’em!

Page 2: Basic Statistics Spns Summer 2007

2

Outline

• Research question development

– Research question matrix

• Data description

• Descriptive epidemiologic study designs

– Case reports

– Case series

– Ecologic studies

– Outbreak investigations (introduced)

Page 3: Basic Statistics Spns Summer 2007

3

You’ve seen this before!

Page 4: Basic Statistics Spns Summer 2007

4

Research questions

• What is a research question?

– A carefully articulated question that you are asking

– Examples

• Is there an association between exposure to the dairy products at a buffet and vomiting?

• Is there an association between exposure to a new virus and severe respiratory distress?

• Is there an association between highly active antiretroviral treatment (HAART) for HIV and HIV viral load?

Page 5: Basic Statistics Spns Summer 2007

5

Null and alternative hypotheses (1)

• The research question drives development

of formalized null and alternative

statements

– The null hypothesis is the statement of no

difference

– The alternative hypothesis is generally the

formalized hypothesis underlying the research

question—what you think you will find

Page 6: Basic Statistics Spns Summer 2007

6

Null and alternative hypotheses (2)

• Examples

– Research question: Is there an association between highly active antiretroviral treatment (HAART) for HIV and HIV viral load?

– H0: There is no association between receipt of HAART and HIV viral load or viral loads among patients on HAART = viral loads among patients not on HAART

– HA: There is an association between receipt of HAART and HIV viral load or viral loads among patients on HAART ≠ viral loads among patients not on HAART

Page 7: Basic Statistics Spns Summer 2007

7

Null and alternative hypotheses (3)

– Research question: Is there an association between

exposure to the quiche at the buffet and salmonella

food poisoning?

– H0: There is no association between receipt of quiche

and salmonella or salmonella rates among patients

exposed to the quiche = salmonella rates among

patients not exposed to the quiche

– HA: There is an association between receipt of quiche

and salmonella or salmonella rates among patients

exposed to the quiche ≠ salmonella rates among

patients not exposed to the quiche
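The comparison in these hypotheses can be sketched numerically. The snippet below is a minimal illustration, assuming a simple count of ill and well people in each exposure group; the counts are invented for the example and do not come from any real outbreak.

```python
def attack_rate(ill, total):
    """Proportion of a group that became ill."""
    return ill / total

# Hypothetical counts, invented for illustration only.
exposed_ill, exposed_total = 30, 50      # ate the quiche
unexposed_ill, unexposed_total = 10, 50  # did not eat the quiche

rate_exposed = attack_rate(exposed_ill, exposed_total)
rate_unexposed = attack_rate(unexposed_ill, unexposed_total)

# Under H0 the two rates are equal, so this ratio would be close to 1.
risk_ratio = rate_exposed / rate_unexposed

print(f"Attack rate, exposed:   {rate_exposed:.2f}")
print(f"Attack rate, unexposed: {rate_unexposed:.2f}")
print(f"Risk ratio:             {risk_ratio:.2f}")
```

A ratio far from 1 is a clue, not a conclusion; deciding whether the difference is larger than chance alone would produce requires a formal statistical test.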

Page 8: Basic Statistics Spns Summer 2007

8

Null and alternative hypotheses (4)

• These are two-sided hypotheses; one-sided hypotheses are also possible, as follows

– Research question: Is HAART associated with decreased HIV viral loads?

– H0: Receipt of HAART is associated with increased or equal HIV viral loads or viral loads among patients on HAART are ≥ viral loads among patients not on HAART

– HA: Receipt of HAART is associated with decreased HIV viral loads or viral loads among patients on HAART are < viral loads among patients not on HAART

Page 9: Basic Statistics Spns Summer 2007

9

Null and alternative hypotheses (5)

• Note that in a one-sided H0 and HA, the

equal sign remains in the H0 (as ≤ or ≥), not in

the HA

• The HA is the statement that indicates a

difference, whereas the null is the

statement of no difference

• There are statistical testing implications in

this choice
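One way to see the testing implications is to compute a two-sided and a one-sided p-value on the same data. The sketch below uses an exact permutation test on the difference in means; this is one possible test chosen for illustration, not necessarily the one you would use in practice, and the log10 viral load values are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_values(treated, untreated):
    """Exact permutation test on mean(treated) - mean(untreated).

    Returns (two_sided_p, one_sided_p), where the one-sided
    alternative is that the treated group has LOWER values.
    """
    pooled = treated + untreated
    n_treated = len(treated)
    n_untreated = len(untreated)
    observed = mean(treated) - mean(untreated)
    total = sum(pooled)
    tol = 1e-9  # guards against floating-point ties

    n_perms = count_two_sided = count_lower = 0
    # Enumerate every way of relabelling n_treated values as "treated".
    for idx in combinations(range(len(pooled)), n_treated):
        s = sum(pooled[i] for i in idx)
        diff = s / n_treated - (total - s) / n_untreated
        n_perms += 1
        if abs(diff) >= abs(observed) - tol:
            count_two_sided += 1
        if diff <= observed + tol:
            count_lower += 1
    return count_two_sided / n_perms, count_lower / n_perms

# Hypothetical log10 viral loads, invented for illustration only.
on_haart = [2.1, 2.4, 1.9, 2.8, 2.2]
not_on_haart = [4.3, 3.9, 4.8, 4.1, 3.6]

two_sided_p, one_sided_p = permutation_p_values(on_haart, not_on_haart)
print(f"two-sided p = {two_sided_p:.4f}")
print(f"one-sided p = {one_sided_p:.4f}")
```

With these made-up values the one-sided p-value is half the two-sided one, which is why the direction of the hypothesis should be fixed before looking at the data.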

Page 10: Basic Statistics Spns Summer 2007

10

Variable selection (1)

• Essential to select appropriate variables to

evaluate and to operationalize (define)

them in advance

– Independent variable=predictor=x=exposure

– Dependent variable=outcome=y=disease

Page 11: Basic Statistics Spns Summer 2007

11

Variable selection (2)

• Examples

– Is use of a DEET-based insect repellent social

marketing campaign associated with West

Nile Virus?

– Is substance abuse status associated with

unmet need and immunocompromise?

Page 12: Basic Statistics Spns Summer 2007

12

Research question matrix (1)

• The research question matrix is a useful

tool to ensure that you are considering all

independent, dependent, confounder

(potential), and effect modifier (potential)

variables in your study

• Helps ensure that you are able to answer

your research question in the end

Page 13: Basic Statistics Spns Summer 2007

13

Research question matrix (2)

Page 14: Basic Statistics Spns Summer 2007

14

Steps in preparing a study

Page 15: Basic Statistics Spns Summer 2007

15

Background

• Descriptive epidemiology

– Describing public health events, including those concerning infectious disease, with respect to person, place, and time

– Provides information in a format which may be interpreted

– Suggests studies that will eventually test hypotheses about the associations between exposures (e.g., organisms) and outcomes (e.g., infectious disease); also known as hypothesis generating

– One of the first, and still one of the most important, facets and methods of epidemiologic inquiry

Page 16: Basic Statistics Spns Summer 2007

16

Basic tenets of descriptive

epidemiology (1)

• Descriptive epidemiology is the foundation

for all subsequent analytic methods

– No matter what analytic designs are later

selected, a solid description of the data at

hand is required and is always a first step

– Description must be systematic, however; just

because there is no hypothesis being tested

does not imply that rigor in methods is not

required—it still is

Page 17: Basic Statistics Spns Summer 2007

17

Example:

Case reports and HIV/AIDS (2)

• Noticing differences in person, place, and/or

time characteristics is often the first step towards

stopping disease; recognition is crucial

• The first case reports of PCP pneumonia in 1981

ushered in the era of HIV/AIDS

– Had it not been for the keen observation skills of

clinical providers, noticing differences in

characteristics of disease, it could have taken longer

to identify that a problem was present than it did

Page 18: Basic Statistics Spns Summer 2007

18

Example:

Case reports and HIV/AIDS (3)

A snapshot of the

original MMWR report of

the first 5 documented

cases of PCP among

what later became

known to be HIV+ men

Page 19: Basic Statistics Spns Summer 2007

19

Case reports and HIV/AIDS (4)

• Looking at the profile—the description—of the cases is

what shed light on the possibility of there being a

problem

– PCP pneumonia usually among elderly and/or those severely

immunocompromised—seldom among healthy, young

individuals

– Cluster of relatively rare type of pneumonia

– Behavioral clues

– Similar presentation vis a vis other comorbidities (e.g., CMV)

• This approach, looking at individual case reports, is one

of the descriptive epidemiologic designs

Page 20: Basic Statistics Spns Summer 2007

20

Descriptive epidemiologic methods (1)

• Case reports are just one of the methods used in

descriptive epidemiology

• Descriptive methods share the same goals

– Describe and evaluate trends in health outcomes or

behaviors

– Provide data for planning needs

– Suggest future research needs

Page 21: Basic Statistics Spns Summer 2007

21

Descriptive epidemiologic methods (2)

• Comprehensive description of the person, place,

and time of the events is the hallmark of

descriptive epidemiology

• What are these exactly?

Page 22: Basic Statistics Spns Summer 2007

22

Descriptive epidemiologic methods (3)

• Person

– Characteristics of individuals affected by the disease

of interest

• Examples of person

– Who are they?

– Gender?

– Race? Ethnicity?

– Age?

– Religion?

– Educational attainment?

Page 23: Basic Statistics Spns Summer 2007

23

Descriptive epidemiologic methods (4)

– What unites the cases?

– What do they have in common? What do they have

that differs from non-cases or from each other?

– Health status, including specific co-morbidities, places

where they access care, treatments, etc.

– Community

– Socioeconomic status

– Behavior, including licit, illicit

– Many, many more

Page 24: Basic Statistics Spns Summer 2007

24

Descriptive epidemiologic methods (5)

• Place

– Characteristics of where the cases took place

• Examples of place

– Country, state, locality, zip code, census tract?

– One establishment? What type? School?

– Where were the events in relation to each other? Proximal? Isolated?

– Urban, suburban, rural?

– Where was the disease contracted vs. where was it diagnosed (e.g., an air travel problem)?

– Housing type?

– Environmental characteristics: Weather? Dust? Pollution? Pollens? Industrial? Desert? Toxins?

Page 25: Basic Statistics Spns Summer 2007

25

Descriptive epidemiologic methods (6)

• Time

– Characteristics of when the cases took place

• Examples of time

– When did the events first occur? When did the symptoms first occur?

– What was happening around that time?

– Month, day, year?

– Day of week?

– Time of day?

– Sun/dark?

– Relation to other events, such as sewage release? Toxic emissions? Summer break? Holiday?

– Geospatial and temporal clustering

Page 26: Basic Statistics Spns Summer 2007

26

Descriptive epidemiologic methods (7)

• Gathering data systematically is essential

– Haphazard person, place, and time data collection is

seldom fruitful

– Instruments are frequently used to direct interviewing

and data collection process

– These can be essential

Page 27: Basic Statistics Spns Summer 2007

27

Descriptive epidemiologic methods (8)

• Once data have been collected systematically,

they need to be described

• Statistical packages as well as qualitative

analysis software can be used to describe data

obtained

• Additional coursework will be necessary to do

this, but the concepts are straightforward

– We want to comprehensively describe the data in as

many ways as possible to truly understand what is

going on with respect to person, place, and time

Page 28: Basic Statistics Spns Summer 2007

28

Steps in describing data (1)

• Calculate the frequencies of categorical

variables

– This will inform you of how the sample is distributed

among different categories of independent variables

– Summary data of your outcomes are especially

important because they reveal the proportion of

missing data, which can impact your study

enormously

– See how data are communicated in reports and

journals you usually read
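A frequency table that counts missing values as their own category can be sketched in a few lines. The variable name and values below are hypothetical, and `None` is assumed to be how missing responses are stored.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (count, percent)}, treating None as 'Missing'."""
    counts = Counter("Missing" if v is None else v for v in values)
    n = len(values)
    return {cat: (c, round(100 * c / n, 1)) for cat, c in counts.most_common()}

# Hypothetical responses, invented for illustration only.
gender = ["F", "M", "M", None, "F", "M", "F", "M", None, "M"]

table = frequency_table(gender)
for category, (count, percent) in table.items():
    print(f"{category:<8} n={count:<3} {percent}%")
```

Reporting the missing category alongside the others makes it obvious how much of the sample each percentage is really based on.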

Page 29: Basic Statistics Spns Summer 2007

29

Steps in describing data (2)

• Calculate measures of central tendency (mean,

median, mode) and dispersion (standard

deviation or variance) for continuous variables

– How are variables distributed? Do they follow a

normal distribution (that is, like a bell curve)? Or are

they skewed left or right? Are the tails heavy or

skinny?

– This can be assessed visually to some degree, and

tested quantitatively as well
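These summary measures are available in Python's standard `statistics` module. The sketch below uses a hypothetical list of ages; comparing the mean with the median is only a rough hint about skew, not a formal test of normality.

```python
from statistics import mean, median, mode, stdev, variance

# Hypothetical ages, invented for illustration only.
ages = [18, 19, 20, 20, 21, 22, 24, 25, 31, 50]

summary = {
    "mean": mean(ages),
    "median": median(ages),
    "mode": mode(ages),
    "sd": stdev(ages),           # sample standard deviation
    "variance": variance(ages),  # sample variance
}

for name, value in summary.items():
    print(f"{name:<9}{value:.2f}")

# A mean well above the median hints at a right (positive) skew.
if summary["mean"] > summary["median"]:
    print("Mean exceeds median: the distribution may be skewed right.")
```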

Page 30: Basic Statistics Spns Summer 2007

30

Steps in describing data (3)

• Plot the continuous data one variable at a time,

using box plots, stem and leaf plots, or other

graphic displays at your disposal

– This describes the data variable by variable

– In addition, it helps identify where there are out of

range values or missing values, and gives a general

description of your continuous data

Page 31: Basic Statistics Spns Summer 2007

31

Steps in describing data (4)

Stem-and-leaf plot for age (age of index)

• 1f | 45555

• 1s | 6667777777

• 1. | 88888888888888888899999999999999

• 2* | 0000000000000000000000111111111111111

• 2t | 2222222222222222222222233333333

• 2f | 4444444444444444444444455555555555555

• 2s | 66666666666666667777777777

• 2. | 88888888899999999999

• 3* | 000000000000111111111111

• 3t | 22222222222222222333333333

• 3f | 4444444444455555555555555555555

• 3s | 66666666677777777777

• 3. | 88888888888888889999999

• 4* | 000001111111111111

• 4t | 222222233333333333333

• 4f | 44444444555555

• 4s | 666777

• 4. | 8888999

• 5* | 0000000011111

A histogram
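A stem-and-leaf plot is simple enough to build by hand. The sketch below is a simplified version that uses one line per ten-year stem, whereas the plot above splits each stem into five lines; the ages are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit integers by tens digit (stem) and units digit (leaf)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(str(v % 10))
    return {stem: "".join(leaves) for stem, leaves in sorted(stems.items())}

# Hypothetical ages, invented for illustration only.
ages = [14, 15, 18, 21, 22, 22, 30, 35]

plot = stem_and_leaf(ages)
for stem, leaves in plot.items():
    print(f"{stem} | {leaves}")
```

Because every leaf is an actual data value, the display doubles as a quick check for out-of-range entries.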

Page 32: Basic Statistics Spns Summer 2007

32

Steps in describing data (5)

• Plot the data in a scatterplot, placing the dependent

(outcome) variable on the Y-axis (the vertical axis) and

the independent (potential predictor) variable on the X-

axis (the horizontal axis)

– What do the data look like?

– How do they relate to each other?

– Is there any discernable pattern or relationship between the

independent variables and dependent variable under study? Is

there any discernable pattern between independent variables?

– (Remember that we are still looking for “clues”; not seeing a

pattern does not mean there is not one, just as seeing one does

not mean there is one!)

Page 33: Basic Statistics Spns Summer 2007

33

Steps in describing data (6)

• A scatterplot

Page 34: Basic Statistics Spns Summer 2007

34

Steps in describing data (7)

• Look at outliers, datapoints that stand out from the rest of the distribution.

– For continuous variables, this can be quantified by looking at datapoints that extend beyond a set level (e.g., two standard deviations above or below the mean).

– What are they? Get to know each of these outliers. Investigate them. Are they data entry errors? Documentation problems? Or are they true?

– There are a number of techniques available to diagnose outliers and treat them appropriately

– However, sometimes, if the data are correct, the outlier can be a substantive “clue” towards figuring out the problem at hand. Each might represent an acute case, a pronounced relationship, or something “different” that can be extremely useful
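The two-standard-deviation rule mentioned above can be sketched as follows. The cut-off of two standard deviations is the example level from the slide, exposed here as a parameter, and the ages are hypothetical.

```python
from statistics import mean, stdev

def flag_outliers(values, n_sd=2.0):
    """Return values lying more than n_sd sample SDs from the mean."""
    m = mean(values)
    s = stdev(values)
    lower, upper = m - n_sd * s, m + n_sd * s
    return [v for v in values if v < lower or v > upper]

# Hypothetical ages, invented for illustration only.
ages = [18, 19, 20, 20, 21, 22, 24, 25, 31, 50]

outliers = flag_outliers(ages)
print("Flagged for review:", outliers)
```

Flagging is only the first step: each flagged value still has to be investigated before deciding whether it is an error or a genuine clue.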

Page 35: Basic Statistics Spns Summer 2007

35

Steps in describing data (8)

• A box-and-whisker plot

– The dots are out-of-range values, to help see outliers

– The box shows the 75th percentile, median (50th percentile), and

25th percentile

– The whiskers show the upper and lower values
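The numbers behind a box-and-whisker plot can be computed without any plotting library. The sketch below assumes the common convention that points more than 1.5 times the interquartile range beyond the box are drawn as separate dots; statistical packages differ in how they compute quartiles and fences, so treat the rule and the hypothetical ages as illustrative.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Quartiles, whisker ends, and out-of-range points (1.5 * IQR rule)."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": inside[0],    # lowest value within the fences
        "whisker_high": inside[-1],  # highest value within the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Hypothetical ages, invented for illustration only.
ages = [18, 19, 20, 20, 21, 22, 24, 25, 31, 50]

box = box_plot_summary(ages)
for name, value in box.items():
    print(f"{name:<13}{value}")
```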

Page 36: Basic Statistics Spns Summer 2007

36

Specific descriptive methods (1)

• There are several basic descriptive methods

– Case reports

– Case series

– Ecologic studies

– Outbreak investigations

– Descriptive data emerging from cross-sectional

studies (to be covered in a later lecture)

Page 37: Basic Statistics Spns Summer 2007

37

Specific descriptive methods (2)

• Case reports

– Identification of “unusual” cases that are shared with the medical

or public health communities

• Generally consist of one or more of these hallmark

characteristics

– A known disease in an unusual population

– Presentation of a previously unrecognized syndrome or disease

– Presentation of a disease more or less severe than previously

seen, or with a different characteristic than noted before

– Presentation of a disease that was transmitted in a mode not

generally seen or suspected

Page 38: Basic Statistics Spns Summer 2007

38

Specific descriptive methods (3)

• Case series

– A series of case reports put together in a systematic

(or semi-systematic) fashion

– May include some denominator data though will be

historic in nature, and it may not have been

systematically collected

Page 39: Basic Statistics Spns Summer 2007

39

Specific descriptive methods (4)

• Ecologic studies

– These differ from other descriptive (and analytic)

studies in that they utilize aggregate data

– These are data collected on the group level, such as

• Means (averages) per state

• Proportions per city

• And so forth

Page 40: Basic Statistics Spns Summer 2007

40

Specific descriptive methods (5)

• Ecologic studies have one major limitation

– The ecologic fallacy

• This results because we do not know if the association seen on the aggregate level holds on the individual level

• Still, extremely important and useful study design

– Hypothesis generating

– Make comparisons between large groups, for example, continents, which would otherwise be impossible

– Can be done with minimal resources
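The ecologic fallacy can be made concrete with a toy dataset in which the group-level and individual-level associations point in opposite directions. The three cities and all exposure and outcome values below are invented purely to demonstrate the idea.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical (exposure, outcome) values per individual, by city.
cities = {
    "City A": ([1, 2, 3], [3, 2, 1]),
    "City B": ([4, 5, 6], [6, 5, 4]),
    "City C": ([7, 8, 9], [9, 8, 7]),
}

# Aggregate level: correlate the city means, as an ecologic study would.
mean_x = [mean(x) for x, _ in cities.values()]
mean_y = [mean(y) for _, y in cities.values()]
aggregate_r = pearson_r(mean_x, mean_y)

# Individual level: correlate within each city.
within_r = {name: pearson_r(x, y) for name, (x, y) in cities.items()}

print(f"Correlation of city means: {aggregate_r:+.2f}")
for name, r in within_r.items():
    print(f"Correlation within {name}: {r:+.2f}")
```

Here the city means rise together while, inside every city, higher exposure goes with a lower outcome, so a conclusion about individuals drawn from the aggregate data alone would be wrong.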

Page 41: Basic Statistics Spns Summer 2007

41

Descriptive methods (6)

• Descriptive studies are just the beginning

• Once foundational information is gained about the

disease under study, public health action is required

– That may include additional studies

• Observational, case-control or cohort studies

• Intervention, experimental designs

– Or working with the community of other scientists towards greater

understanding of the organism in question

– Or working with the public health community to prevent future

spread of the disease—engaging in preventive measures

immediately, discussion through media