

Where Do Data Come From?

● Conceptualization and operationalization of concepts --> measurement strategy --> data.

● Different strategies --> different ways of obtaining data.

● Closer look at these different ways of collecting/producing data

● The main distinction is between observational data and experimental data

● Important ideas: Sample, Population, Simple random sample, Bias.


Where Do Data Come From?

● Observational Study

– Observes individuals (units of analysis) and measures variables of interest

– Unobtrusive; does not attempt to influence the responses

● Experiment

– Deliberately imposes some “treatment” on the individuals

– Studies whether the treatment causes change in the response

– Not always feasible. e.g. 1: Does living close to hazardous waste sites affect children's academic performance? e.g. 2: Does gender explain voting behavior?


Observational Study (?)

[Figure: Notz01-C01]


Observational Data Examples

● Sample Surveys. Observations on samples of a population. e.g.

– Public opinion polls (such as the Gallup polls)

    ● Population: U.S. Residents 18 years of age and over

    ● Sample: between 1000 and 1500 people

– The Current Population Survey (unemployment and many other economic and social variables)

    ● Population: U.S. Households

    ● Sample: About 60,000 households interviewed each month.

– The General Social Survey (personal and family background, experiences and habits, attitudes and opinions)

    ● Population: Adults (>=18) living in U.S. Households

    ● Sample: about 3000 interviewed in person at home


Observational Data Examples

● Census: Tries to survey the entire population

– Why census? Need block-by-block data to create election districts.

– Issues:

    ● Very expensive, in both time and money

    ● Data can be noisier than from samples

    ● Can only attempt to sample the entire population. 2000 census missed 0.12% of the population, including 2.78% of the black population.

– Reform: the “long form” has been replaced by a sample survey (“American Community Survey”)


Experiments

● Do something to the individuals to see the causal effect of the “treatment”

● e.g., Job training for welfare mothers.

– Under voluntary participation, self selection can be a problem, so that the training group may differ systematically from the control group (in motivation, education, attitude, etc.)

– Randomized experiments help in finding the answer

– Estimate average effects. Individual level effect not possible. Why? (fundamental problem of causal inference)

– Issues: ethical considerations, compliance, etc.

– More in the lecture on experiments
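The self-selection problem in the job training example can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the "motivation" score, the outcome formula, and the true effect of 2.0 are hypothetical and are not taken from any real study. Each simulated person is observed under one condition only, so only a difference in group averages can be estimated.

```python
import random
import statistics

def simulate(n=10000, true_effect=2.0, seed=0):
    """Compare a self-selected training group with a randomized one.

    All numbers here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # Hypothetical "motivation" score that raises the outcome on its own.
    motivation = [rng.gauss(0, 1) for _ in range(n)]

    def outcome(m, treated):
        # Outcome depends on motivation, on treatment, and on noise.
        return 10 + 3 * m + (true_effect if treated else 0) + rng.gauss(0, 1)

    def diff_in_means(assignment):
        # Each person is observed in exactly one condition.
        treated = [outcome(m, True) for m, t in zip(motivation, assignment) if t]
        control = [outcome(m, False) for m, t in zip(motivation, assignment) if not t]
        return statistics.mean(treated) - statistics.mean(control)

    # Self-selection: the more motivated people sign up for training.
    self_selected = [m > 0 for m in motivation]
    # Randomization: a coin flip decides, independent of motivation.
    randomized = [rng.random() < 0.5 for _ in motivation]

    return diff_in_means(self_selected), diff_in_means(randomized)

biased, unbiased = simulate()
print(f"true effect:            2.00")
print(f"self-selected estimate: {biased:.2f}")
print(f"randomized estimate:    {unbiased:.2f}")
```

In this sketch the self-selected comparison mixes the training effect with the motivation gap between the groups, while the randomized comparison should land close to the true average effect.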


Samples, Good and Bad

[Figure: individuals numbered 1 to 17; label: “Sample”]

How to draw samples?


Samples, Good and Bad

● Bad: sample does not represent the population well; systematically biases the results

– Convenience sampling

    ● Select individuals based on convenience (e.g. mall samples; classroom samples); include those easiest to reach in the sample

– Voluntary response sampling

    ● Allowing individuals to choose to be in the sample (write-in or call-in opinion polls)

    ● e.g. Ann Landers asked readers of her advice column whether they'd have children if they were to do it all over again. 10,000 responded, with 70% saying “No!” Representative of the general population? Who likely responded? Who read her columns to start with? (See cartoon)
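A rough simulation can show how voluntary response distorts a result. Everything below is assumed for illustration: the population size, the true share of “No” answers, and the two response probabilities are hypothetical values, not estimates for the Ann Landers poll.

```python
import random

def voluntary_response_poll(pop_size=100_000, true_no_rate=0.10,
                            p_respond_no=0.20, p_respond_yes=0.02, seed=0):
    """Simulate a write-in poll where unhappy people respond more often.

    All parameters are hypothetical and chosen only for illustration.
    Returns (share of "No" among responses, number of responses).
    """
    rng = random.Random(seed)
    responses = []
    for _ in range(pop_size):
        says_no = rng.random() < true_no_rate
        # People with strong negative feelings are assumed more likely to write in.
        p_respond = p_respond_no if says_no else p_respond_yes
        if rng.random() < p_respond:
            responses.append(says_no)
    return sum(responses) / len(responses), len(responses)

share_no, n_responses = voluntary_response_poll()
print(f"true share saying No:   {0.10:.2f}")
print(f"share among responders: {share_no:.2f} (from {n_responses} responses)")
```

Even with thousands of responses, the share among responders can sit far from the population share, because who responds depends on the answer itself. A large sample does not fix a biased selection process.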


“Would you have children if you could do it all over again?”

[Cartoon: Notz02_C01]


Samples, Good and Bad

● Good: Simple random samples

– Get rid of the personal choice element in convenience and voluntary samples.

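– Choosing a sample by impersonal chance. Ensure that every individual in the population has an equal chance of being selected into the sample (more precisely, every possible group of the chosen sample size has an equal chance of being the sample)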

– Idea: “Drawing names from a hat”

– How to do it? Make use of random numbers

    ● Look up a random number table. But better yet:

    ● Use software. e.g. http://www.randomizer.org/

– Label the individuals, from 1 to N (say 1 to 340)

– Decide the sample size k (say 34)

– Software randomizes your list by sorting it by uniform random numbers. After the list is randomized, the first k could be used. Try it.
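The recipe above (label 1 to N, attach uniform random numbers, sort, keep the first k) can be sketched in a few lines of Python. This is one possible implementation, not necessarily how randomizer.org works; the function name and the seed argument are illustrative choices.

```python
import random

def simple_random_sample(n_population, k, seed=None):
    """Draw a simple random sample of k labels from 1..n_population.

    Follows the slide's recipe: attach a uniform random number to each
    label, sort by that number, and keep the first k labels.
    """
    if not 0 <= k <= n_population:
        raise ValueError("k must be between 0 and the population size")
    rng = random.Random(seed)  # the seed only makes the draw repeatable
    keyed = [(rng.random(), label) for label in range(1, n_population + 1)]
    keyed.sort()  # sorting by the random key puts the labels in random order
    return [label for _, label in keyed[:k]]

# N = 340 individuals, sample size k = 34, as in the example above.
sample = simple_random_sample(340, 34, seed=1)
print(sorted(sample))
```

Python's standard library offers the same result more directly with `random.sample(range(1, 341), 34)`; the longer version is shown only because it mirrors the sorting description.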


Samples, Good and Bad

● When you encounter any sample (such as in reading a story about a Gallup poll), ask whether it's a “good” sample. Is it representative of the intended population? Does every individual have an equal chance of being selected?

● Look for the magical words “randomly selected” (or something equivalent)

● Things don't always go as planned. Complications in executing a random sampling plan, such as non-response (missing data), can cause the resulting sample to be non-random.


Samples Good and Bad: More Examples

Free internet polls, e.g. http://www.misterpoll.com

Pick a poll and take a look at the results. e.g., “secret spying”, “Obama's progress so far”

Do you trust the results? (What's the sample size? Who can vote? Can one vote twice? More? Are the questions well designed?)


Samples Good and Bad: More Examples

Many television stations conduct polls by asking viewers to call one phone number if they feel one way about an issue and a different phone number if they feel the opposite.

Does the result of such a poll represent the feelings of the community?

Or does it represent the feelings of all those watching the TV station at the time?

Or some other group?


Samples Good and Bad: More Examples

Suppose you had a telephone directory listing all of the businesses in a city, alphabetized by type of business. If you wanted to phone 100 of them in order to get a representative sampling of opinion on some issue, would it be a good idea to simply use the first 100 businesses listed?
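One way to explore the directory question is to build a toy directory and compare the two approaches. The directory below is entirely hypothetical (eight made-up business types with 75 listings each); it only illustrates why the first 100 entries of a list sorted by type cover very few types.

```python
import random

# Hypothetical directory: (business type, listing name), sorted by type.
business_types = ["Accounting", "Bakeries", "Construction", "Dentists",
                  "Florists", "Groceries", "Law firms", "Restaurants"]
directory = sorted((btype, f"{btype} #{i:02d}")
                   for btype in business_types for i in range(1, 76))

def types_covered(listings):
    """Return the sorted list of business types present in the listings."""
    return sorted({btype for btype, _ in listings})

# Approach 1: simply take the first 100 listings.
first_100 = directory[:100]

# Approach 2: simple random sample of 100 listings (seed for repeatability).
rng = random.Random(0)
srs_100 = rng.sample(directory, 100)

print("first 100 covers:", types_covered(first_100))
print("random 100 covers:", types_covered(srs_100))
```

In this toy setup the first 100 listings come from only the first two business types, so any opinion that varies by type of business would be misrepresented.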