
    Before the Beginning:

    Who, What, Where, Why, When, and How?

    Macintosh HD:DA:DA IX:Volume I:009 Before the Beginning Wednesday, June 12, 1996


    Introduction to Data Analysis: The Rules of Evidence, Joel H. Levine

    Before I analyze data. Before I try to explain anything. Before I compute a single average or look at a single fact: "Who, What, Where, Why, When, and How?" Which means, establish the context. Before you get involved with the detail, ask questions: Who collected the data? What are the data about? Where, if that is important. Why were they collected? When, if that is important. How were they collected? You don't need to use a checklist: ask questions.

    So, for example, in a later chapter I am going to use U.S. Census data reporting the populations of states of the United States. There's the "who": the U.S. Census Bureau. They have a good reputation for accuracy on total population, which is what's in these data. For some kinds of data, the results have known biases, but for these population counts, this is the best I can get. And there's the "what": the data describe the population of the states of the United States. Where? The individual states. Why? To determine representation in the U.S. Congress. When? These data were published in 1991, referring to the populations in 1990. How? The census attempts to count everyone, every last person in the United States, which, strangely enough, makes the Census less accurate (not more accurate) than it would be if it used a carefully selected sample of the population.[1] I don't need the whole "Who, What, Where, ...." The point is to be alert and ask questions.
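The six questions can be written down as a small record, one field per question. What follows is only a sketch: the `Provenance` class and its `unanswered` method are hypothetical names introduced here for illustration, not anything the text prescribes. The values are the Census answers given above.

```python
from dataclasses import dataclass, fields


@dataclass
class Provenance:
    """Step-zero context for a data set (hypothetical helper for illustration)."""
    who: str = ""
    what: str = ""
    where: str = ""
    why: str = ""
    when: str = ""
    how: str = ""

    def unanswered(self):
        """Return the names of the questions that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# The Census example, filled in from the paragraph above.
census = Provenance(
    who="U.S. Census Bureau",
    what="Population of the states of the United States",
    where="The individual states",
    why="To determine representation in the U.S. Congress",
    when="Published in 1991, referring to populations in 1990",
    how="Attempt to count every person in the United States",
)

print(census.unanswered())        # no blanks left for the Census data
print(Provenance().unanswered())  # every question is still open
```

A source that leaves most of these fields blank has not yet earned any analysis. The record is a reminder of what to ask, not a checklist to fill in mechanically.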

    As a mnemonic, think of this as "step 0." In data analysis, step two is two variables (the relation between two variables). Step one is one variable: extracting information from a single variable like population size or growth rate. This is step zero, no variables, the step before the analysis. Step zero is to ask whether the data is worthy of my time, whether it is trustworthy, whether it is pertinent: Who, What, Where, Why, When, and How?

    [1] Curiously, a carefully drawn sample of a population can give more accurate results than an attempt to look at the entire population. The reason is a matter of cost and realism. Really, it costs a lot of money to track down every last person. So, if I talk to only one person in one hundred, I can spend one hundred times more money tracking that person down, making sure that that person is representative and making sure of my results for that one person. So the data from a sample can be more carefully examined, at the same or lower cost, than data from a complete enumeration. See ______ in Tanur, 1989.


    OPINION: Which Environmental Problems Do We Think Are Most Serious

                                                Extremely   Very
                                                Serious     Serious

    Hazardous and toxic waste                   47%         42%
    Oil spills                                  48          36
    Air pollution                               36          44
    Damage to the earth's atmosphere            39          40
    Solid waste disposal                        38          41
    Nuclear waste                               43          35
    Contaminated drinking water                 38          39
    Destruction of forests                      39          37
    Threats to endangered species               26          41
    Use of pesticides                           22          38
    World population growth                     25          32
    Global warming                              22          34
    Inefficient energy use                      17          39
    Reliance of fuels like coal and oil         29          34
    Economic development of natural wetlands    17          33
    Radon gas                                   11          24
    Indoor air pollution                         7          20

    From The Environmental Almanac, Simon and Schuster, New York, 1992, page 11.

    Figure 1. U.S. Attitudes Toward Environmental Problems


    Why do I ask questions? Because I'm skeptical. Because I'm careful. Why so careful? Because this is where you learn that homilies like "don't believe everything you read" are all too valuable. To make the point, let me show you some data that failed step zero. This is data I chose not to analyze; let me show you why not. Preparing myself to write, I said to myself, "What would people be interested in? What am I interested in? Ah, let's get some data on the environment."

    So I went to my local bookstore and looked around, thinking "Get some data sources that everyone can get their hands on." I looked through the almanacs (people who teach data analysis tend to collect almanacs), and there was a new one: The 1992 Information Please Environmental Almanac, compiled by World Resources Institute, Houghton Mifflin, 1992. "Ah," I thought, "just the ticket," and I thumbed through it looking for numbers.

    Here's one set of numbers, reproduced in Figure 1. This is the kind of thing I was looking for. But then I remembered: "Do as you teach. You're trying to teach them that data analysis is not about numbers, it uses numbers. So ask questions. Where does this stuff come from? Who, What, Where, Why, When, and How?"

    "Do as you teach": that slowed me down. Let's see, the Almanac tells me: "Source: Environmental Opinion Study." I wonder what that is.

    Looking through the text for an answer to my question, I find it is "A 1991 poll conducted for Environmental Opinion Study, a nonprofit organization established to provide data on public attitudes on the environment." And now I'm in trouble. Someone is trying to get past me with buzz words and puffery. The text flashes the phrase "nonprofit," implying something or other. It uses the word "data," and it specifies "public attitudes." So far, the text has used a string of words to tell me the source, but the words have told me nothing.

    So now I'm asking questions and I'm on full alert: When there is one loose thread in the credibility of a source, look for others. And so, looking more carefully at these data, the thing begins to unravel: Do they give me enough information so that I can find the original source and check for myself? No. Any secondary report (a report using information from another source) must give me enough information so that I can check the primary report for myself, if I choose to, but this report offers barely a clue. And now that I've seen the Almanac try to get past me with evasive terms, like "public attitudes," I'm even more alert. So I ask "Which public?", "Who are these people?" No answer.

    More alert, I look at the numbers. Oops, the numbers are percentages. Percentages of what? Percentages of 100 people around the office of Environmental Opinion Study? Percentages of a representative sample of 1,000 adults randomly sampled from the U.S. population? Percentage of what? Who knows? And I look again, noting details, noting that the vocabulary is odd. These are not the words and rhythm of standard American speech: too formal. So I wonder, how were the questions put? Did the interviewer ask "What problems do you think are serious?" Or did the interviewer ask "Do you think hazardous waste is serious?" It makes a difference: If it was the latter, then the interviewer might as well have asked whether hazardous waste is hazardous. Who could say "No" to that?
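To see why the missing denominator matters, here is a rough sketch of how much a reported 47% could move depending on whether it rests on 100 or 1,000 respondents. It uses the standard 95% margin of error for a proportion, 1.96 * sqrt(p(1 - p) / n). That formula assumes a simple random sample, which is an assumption made here for illustration and exactly what the Almanac does not tell us. The function name `margin_of_error` is hypothetical.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from n respondents.

    Assumes a simple random sample; the Almanac gives no basis for that assumption.
    """
    return z * math.sqrt(p * (1 - p) / n)


p = 0.47  # "Extremely Serious" share for hazardous and toxic waste in Figure 1
for n in (100, 1000):
    moe = margin_of_error(p, n)
    print(f"n = {n:4d}: 47% plus or minus {100 * moe:.1f} percentage points")
```

Under that assumption the uncertainty is about ten percentage points for 100 respondents and about three for 1,000. For 100 people around the office, who are not a random sample of anything, the formula does not apply at all, so no margin can be stated.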

    And now, as I've kept testing the credibility of these numbers, the whole thing has come apart as I look at the first row of numbers and wonder about 47% plus 42%, that is, 89% saying toxic waste is serious? Really? Eighty-nine percent, eighty-nine out of one hundred people of what population? Do I believe that for any population? Frankly, no. And can I quibble with these published data? You bet I can, particularly because the writers have made it all but impossible for me to reassure myself. So, in truth, these numbers aren't data, they're some sort of numerical decoration taking up space. The stuff looks like data but, really, we've been asked to take the numbers on faith. And that's not the way to deal with controversial issues.
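The 47% plus 42% arithmetic can be repeated for every row of Figure 1. The sketch below only adds the two published columns; the variable names are introduced here for illustration, and the numbers are copied from the figure as printed.

```python
# (extremely serious %, very serious %) for each problem, copied from Figure 1
figure_1 = {
    "Hazardous and toxic waste": (47, 42),
    "Oil spills": (48, 36),
    "Air pollution": (36, 44),
    "Damage to the earth's atmosphere": (39, 40),
    "Solid waste disposal": (38, 41),
    "Nuclear waste": (43, 35),
    "Contaminated drinking water": (38, 39),
    "Destruction of forests": (39, 37),
    "Threats to endangered species": (26, 41),
    "Use of pesticides": (22, 38),
    "World population growth": (25, 32),
    "Global warming": (22, 34),
    "Inefficient energy use": (17, 39),
    "Reliance of fuels like coal and oil": (29, 34),
    "Economic development of natural wetlands": (17, 33),
    "Radon gas": (11, 24),
    "Indoor air pollution": (7, 20),
}

# Combined share calling each problem "extremely" or "very" serious
combined = {problem: extremely + very for problem, (extremely, very) in figure_1.items()}

for problem, total in sorted(combined.items(), key=lambda item: item[1], reverse=True):
    print(f"{total:3d}%  {problem}")
```

The first row gives the 89% questioned above, and it is the largest combined figure in the table. The sum is easy to compute, but it says nothing about who was asked or how, which is the question that remains unanswered.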

    So, what's the moral of the story? Before the beginning, "Who, What, Where, Why, When, and How." You do that to avoid being fooled. And when you write you must provide that information if you yourself want to be taken seriously. For all I know this environmental opinion study is great stuff. Maybe, somewhere in the book, there is even a footnote that answers all my questions. But, if it is great stuff, then it's also a great pity because the authors have sabotaged their own hard work. They didn't precede their data analysis with a solid foundation, before the beginning, and so they might as well not have bothered with the rest.

    Reading:

    How to Lie with Statistics, Darrell Huff, Chapter 1, "The Sample with the Built-in Bias"

    ________________ in Tanur, ~~ "Why samples are more accurate than counts."
