

Exercises created by Jon’a Meyer specifically for use with Crime: Readings, 3e by Crutchfield et al. © SAGE Publications


Statistical Exercise #1: An introduction to statistical analysis using SDA

Welcome to the world of data analysis! In the data analysis exercises, we will be exploring some of the issues we read about in our class text. These exercises are designed to introduce you to the idea of using uncomplicated statistics to engage in some preliminary testing of some of the theories in this book. Don’t worry; testing theoretical ideas with real data is actually an interesting task that some of my students have described as a decent way to spend an afternoon. The setup of these exercises is consistent; after completing a number of guided activities, you will be able to conduct similar analyses on your own. There are two homework assignments at the conclusion of each exercise. The first is a general assignment that asks questions about the guided activities. The second ("further exploration") is an optional set of exercises designed to facilitate your further exploration of the researched relationships. In short, these exercises add a "hands-on" component to the study of theories of crime.

This first exercise is designed to introduce you to statistics and to allow you to do some data exploring. The dataset we’ll be using is study #2833, the National Survey of Adolescents in the United States, 1995. Dean Kilpatrick and Benjamin Saunders collected these data in an effort to examine relationships between a variety of personal characteristics and victimization experiences in a sample of youth. To collect their data, Kilpatrick and Saunders surveyed 4,023 boys and girls aged 12-17 by telephone. Their dataset is a rich one and can be used to test several theories in this text. You can read a more detailed description of the study and gain access to the online options at: http://www.icpsr.umich.edu/cocoon/NACJD/STUDY/02833.xml

The citation for this study is: Kilpatrick, Dean G., and Benjamin E. Saunders. NATIONAL SURVEY OF ADOLESCENTS IN THE UNITED STATES, 1995 [Computer file]. ICPSR version. Charleston, SC: Medical University of South Carolina [producer], 1999. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2000.

The National Archive of Criminal Justice Data (NACJD) and the Inter-university Consortium for Political and Social Research (ICPSR) have done us a favor by storing the datasets where the public can get to them for research purposes and by hosting an online analysis engine called Survey Documentation and Analysis (SDA) so that individuals without access to a statistics program can use the data. SDA is easy to use and allows users to sidestep the sometimes-complicated procedures necessary to get datasets into a format that can be read by their particular statistics software. It also allows users to conduct analyses without installing any software on their computers. We’ll be using this online resource to do some simple analyses that will help shed some light on the theories in our book.

Using the tabs on the original study page above, you can “browse documentation” (i.e., look through the study’s documents such as the codebook), “download data” yourself and work with them in SPSS or another statistical package, “analyze and subset” the data (more on this below), or look at a list of “related literature” (i.e., writings based on analyses of a particular dataset) by clicking on the corresponding tabs above the detailed description (near the top of the page, but below the NACJD banner). For this exercise and the following ones, we are going to take advantage of the online analysis engine; you can get to that by clicking on the “analyze and subset” tab and then clicking on the link designated for the study (towards the bottom of the webpage), or by jumping directly to http://www.icpsr.umich.edu/cgi-bin/bob/newark?study=2833;path=NACJD.

Before you can engage in any analysis or even move to some of the other tabs visible for this dataset, you will have to create a MyData account on ICPSR or log in as a guest. If you already have a MyData account, simply log in, but few students have pre-existing accounts on this service. Those who feel they may want access to more features on ICPSR may choose to create a MyData account. Another option is to simply log in anonymously as a guest. Here’s an information page on MyData: http://webapp.icpsr.umich.edu/cocoon/NACJD-FAQ/0044.xml, but I’m going to assume that you want to log in as a guest just to do these exercises. To do that, click on the button toward the bottom of the page that says “Log in anonymously.” Guest logins are valid for nine hours or until you log out, so you may have to log in more than once to complete your analyses if you reboot your computer or get kicked off the Internet.

Once you have logged in using the guest or account feature, you will be taken to a “Terms of use” page. This is an important page because it details how users are not to attempt to ascertain identities or to misuse the data hosted by NACJD or ICPSR. This helps NACJD and ICPSR maintain one of the most important ethical canons in research, keeping research respondents’ identities confidential. Data that are posted online have all identifiers removed, but some researchers worry that individuals could use multiple data sources to narrow in on the identity of particular individuals who have unique identifiers. In one of my classes many years ago, for example, there was a woman who was Togiak (a tiny Alaskan tribe with only a few hundred members) and the only Mormon in the classroom. Class surveys would have been impossible because even with more than 200 students, it would have been easy to pick out her individual responses meaning that her privacy would be invaded every time survey results were shared. To combat this possibility, datasets that are posted online are “cleaned” so that others cannot use them to figure out who someone is and attribute any particular survey answers to that individual. So, the Togiak woman in my class would likely be listed as “other” for religion (due to the small numbers of Mormons) and “other” for ethnicity (due to the small numbers of Native Americans, particularly Togiak people). So, for tables printed in publications, a reader would have difficulty isolating her particular responses. A problem could still occur, however, if individuals downloaded data containing her responses, because she might be the only female with “other” for both religion and ethnicity. Very few individuals are as unique as she is in terms of identifiers, but situations like this one led ICPSR and NACJD to make researchers who use their datasets agree not to use their data for such purposes. 
If your answers were in a survey about something sensitive (e.g., attitudes toward abortion or the death penalty), you would want to know that individuals are enjoined from using the resulting data to harass you. There are other ways of misusing the datasets, too. So, read over the agreement and click the “I agree” button to proceed. You will have to do this every time you log into the system.

You can now see the SDA analysis screen, as shown below. If you want to review it, a full set of documentation for SDA is available at: http://sda.berkeley.edu/man3h/uindex.html.

Graphic #1: The below screen is visible when you launch SDA


Let’s first get the feel for this system. Across the top of the SDA page are an area to click if you want the “classic interface” which was used until recently (ignore that unless you are familiar with and prefer that interface) and the title of the study whose data you are analyzing. The five buttons below those areas are:

*Analysis- use this button to choose the statistical procedures you want to use (e.g., “run frequency or crosstabulation”).

*Create Variables- use this button to recode or compute (create) new variables or to list any new variables you have created.

*Download- use this button to either download the dataset and its documentation or to download a customized subset. You will not need these options for any exercise in this book, but it’s good to know those options are there in case you want to do more advanced or offline analyses using statistical software of your own.

*Codebook- this handy button allows you to get more information about the variables in the dataset you are analyzing. If you click on the button, a description of the study will pop up in a new window. Along the left-hand side of that window are “group headings” which allow you to narrow in on particular areas of interest, a “standard variable list” which lists all the variables in the order in which they occur in the data set, and an “alphabetical order” button that creates a handy list of the variables in alphabetical order.

*Getting Started- this button opens up more information on SDA and how to use it.

Running a frequency table

When you first open up SDA, it displays the interface for running frequencies or crosstabulation tables. (NOTE: If you do not have Java enabled, you might get the “classic interface”; just click on “Run frequency or crosstabulation” then click the “Start” button to begin). Let’s run a simple frequency to get a feel for SDA. To run a frequency or crosstabulation (more on crosstabulation tables later), you should use the interface on the right-hand side of the screen (labeled “SDA frequencies/crosstabulation program”). The interface on the left is for viewing variables in order to make decisions about which ones to use. You can also use it to run simple frequencies if you wish.


There are two ways to get a frequency using the interface on the right. The first is to simply type the name of the variable into the box beside “Row.” Let’s try that by typing the variable “GENDER” (without quote marks) into the box (as shown below) and hitting the enter (or return) key on your computer. Note that case doesn’t matter. In the writeups, I will capitalize the variables to set them apart from the other text, but you may use upper or lower case when you type the variable names in.

Graphic #2: SDA Analysis screen with variable GENDER entered to obtain frequency table

Voilà! A window with our frequency pops up, showing the output below (note: the graph is omitted from the output below):

NATIONAL SURVEY OF ADOLESCENTS IN THE UNITED STATES, 1995

Variables

Role   Name     Label    Range   MD   Dataset
Row    GENDER   GENDER   1-2          1

Frequency Distribution

Cells contain: -Column percent -N of cases

GENDER       Distribution
1: Male      50.2    2,018
2: Female    49.8    2,005
COL TOTAL    100.0   4,023

Allocation of cases

Valid cases   4,023
Total cases   4,023

Datasets

1 /SDA/NACJD/02833-0001
2 /SDA/NACJD/02833-0001

The output provides a lot of material to help us make sure we have the variable we wanted. Notice the name of the study at the very top; this is very handy if one is testing ideas using multiple datasets. We’ll be using several datasets throughout this book, so you might find this information useful if you find a confusing stack of printouts sitting on your desk that you need to organize. The next important box contains the header “variables” and provides a list of all the variables involved in the statistical procedure you just ran. When you’re running frequencies, it’s not all that useful, but when you have three or more variables, it’s nice to have these summary descriptions in one location. The first column of the “Variables” table lists the “role” of the variable. Since this is a frequency, it’s a “row” variable. When we do crosstabulation tables, you’ll see “column,” and sometimes “control” and “filter” (we’ll talk about these three later). The second column of the table contains the variable name or the name that the computer uses to keep track of the variable for which we’re running the frequency, which just happens to be “GENDER” in this case. The text in the third column is the variable label. In this case, columns two and three are the same (“GENDER”) but they often differ, as we’ll see later. Usually, the label contains much more information than the name, which is typically limited to eight letters. The fourth column is the “range” of the variables or the lowest and highest value that are included in this particular statistical run. This information is useful when “eyeballing” the results to help add context to the study. The fifth column, labeled “MD” lists the code assigned to missing data, if applicable, and the sixth column, labeled “Dataset” simply lists the dataset number.

Then, we get to the actual numerical breakdown. Slightly more than half of the respondents were male: 50.2% of the sample, or 2,018 of the 4,023 respondents. The remainder were female (no respondent was missing a gender code): 49.8%, or 2,005 of the 4,023 subjects. This is quite close to the gender breakdown for the population in 1995, which was 51% female, 49% male (see http://www.census.gov/popest/archives/1990s/nat-agesex.txt for the numbers I used to do these calculations).
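If you are curious where those percentages come from, each one is just a category's count divided by the total number of valid cases. Here is a minimal sketch of that arithmetic in Python, using the counts from the GENDER output; the variable names are my own and are not anything SDA itself uses.

```python
# Counts copied from the GENDER frequency table.
counts = {"Male": 2018, "Female": 2005}

total = sum(counts.values())  # total valid cases

# Column percent = category count / total valid cases * 100, to one decimal
percents = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in percents.items():
    print(f"{label}: {pct}% ({counts[label]:,} of {total:,})")
```

You do not need to do this by hand for the exercises, since SDA reports both the percent and the N in every cell.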

The next displayed item is a graph. The default graph is a stacked bar chart. Personally, I find those less helpful than some of the other options, such as “bar chart” and “line chart,” but I wanted to let you see one of these. To change the type of chart you get, simply scroll down in the interface on the left hand side of the screen and click the dropdown box beside “Type of chart.” You can change other settings, too, such as the orientation, size of the graph, etcetera. You should spend a few minutes tinkering with this section to get a feel for the many types and styles of graphs you can display. Personally, I prefer just the numerical output because the charts can actually be confusing if they are not set up properly and they waste a lot of ink to print. Instead, you might consider using the graph feature to create a masterpiece or two to include in a paper or


other formal report of your research. To get pointers on the graphs and their elements, click on the highlighted links in the interface.

The final two boxes are formalities, but can be useful. The box labeled “Allocation of cases” simply lets you know how many cases were included in this particular output. While that total is also displayed in the bottom right hand column of the numerical table, the information in this box also explains why cases were excluded from the numerical table if that is true. For our gender table, all 4,023 cases/respondents in the study are included so there is no explanation. But, assume 100 of the respondents did not answer the question about gender or their handwriting could not be read on a written survey; those 100 cases would have invalid values for gender and would not be included in any numerical tables for this reason. This box would contain an explanation like “Cases with invalid codes on row variable – 100” so we would know about that. For some variables, the number of cases with invalid codes may exceed the number of those with valid data, so it can be important to know why cases were excluded from our tables. When we use filters to limit which cases are displayed (e.g., if we run a frequency for just the males in a dataset), this box will include that information, too. The final box, labeled “Datasets” simply includes the internal NACJD dataset numbers for the studies used in making this particular output.
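To make the “Allocation of cases” idea concrete, here is a small Python sketch of the hypothetical situation just described, in which 100 respondents have no usable gender code. The responses and the missing-data code 9 are invented for illustration; they are not taken from the actual dataset.

```python
# Hypothetical responses: codes 1 and 2 are valid answers,
# and 9 is a made-up missing-data code (not from the real survey).
responses = [1] * 2000 + [2] * 1923 + [9] * 100
VALID_CODES = {1, 2}

# Keep only cases whose code is a valid answer; the rest are excluded.
valid = [r for r in responses if r in VALID_CODES]
invalid_count = len(responses) - len(valid)

print(f"Valid cases {len(valid):,}")
print(f"Cases with invalid codes on row variable {invalid_count:,}")
print(f"Total cases {len(responses):,}")
```

The three printed lines mirror the rows of the “Allocation of cases” box: the valid cases plus the excluded cases always add up to the total.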

Now, let’s look back at the numerical table itself, which is the central part of this output. From this table, we learned that there are roughly the same number of boys and girls in the dataset. That table wasn’t all that hard to create, so let’s run another frequency to learn more about this dataset. But, first, let’s make a couple of changes to improve the quality of our output. To make the changes, scroll down to the sections labeled “Table options” and “Chart options.” First, add a check to the box beside “Question text” so we can get the question wording when it’s available. Then, add a check to the box beside “Statistics.” For frequencies, SDA will display means, standard deviations and other statistics.

Watch out, though. The statistics are automatically generated without regard for the level of measurement, so some can be essentially meaningless; for example, you cannot calculate a mean gender or a gender standard deviation. SDA provides these statistics because it cannot classify variables and know which statistics are appropriate, so it relies on the user (that’s you) to make those classifications. Ignore the mean and standard deviation for all nominal variables such as gender or race or other variables that cannot be meaningfully ranked. For nominal variables, the mode is the appropriate summary; the minimum, maximum, and range are useful only as a check on which codes appear in the data. If you had run a frequency of age (which is a ratio level variable when it’s recorded in years), all the generated statistics would be appropriate. If you run a crosstabulation table, you will get some very useful statistics, including some that will allow you to ascertain the strength of a relationship between variables.

The final change is to change the “Chart options” to “no chart” (if you prefer to see the charts, change it to a format you feel is most helpful). Click the “Run the table” button to see the differences the changes made. Once you have set the options, you do not have to change them for the remainder of the session. A screenshot showing those changes appears below:

Graphic #3: SDA Analysis screen with options changed to improve output


Now, let’s get that second frequency run. Go back to the top part of the interface, type the variable “RACE” (without quote marks) into the box beside “Row,” and either hit return or click the “Run the table” button at the bottom of the screen. That will cause the following table to be displayed.

SDA 3.1: Tables

NATIONAL SURVEY OF ADOLESCENTS IN THE UNITED STATES, 1995

Variables

Role   Name   Label   Range   MD   Dataset
Row    RACE   RACE    1-7          1

Frequency Distribution

Cells contain: -Column percent -N of cases

RACE                      Distribution
1: White, Non-Hispanic    68.3    2,746
2: African American       14.2    572
3: Hispanic               9.7     390
4: Native American        3.4     135
5: Asian                  1.7     67
6: Other Race             1.0     40
7: Refused                1.8     73
COL TOTAL                 100.0   4,023

Summary Statistics

Mean = 1.66        Std Dev = 1.25     Coef var = .75
Median = 1.00      Variance = 1.56    Min = 1.00
Mode = 1.00        Skewness = 2.39    Max = 7.00
Sum = 6,686.00     Kurtosis = 5.92    Range = 6.00

Inference about the mean:

Std Err = .02      CV(mean) = .01

No text for variables

Allocation of cases

Valid cases   4,023
Total cases   4,023

Datasets

1 /SDA/NACJD/02833-0001
2 /SDA/NACJD/02833-0001

From this output, we learn that more than two-thirds of the respondents were white (68.3%, n=2,746), 14.2% (n=572) were African American, 9.7% (n=390) were Hispanic, 3.4% (n=135) were Native American, 1.7% (n=67) were Asian, 1% (n=40) were some other race, and 1.8% (n=73) refused to answer the question. Because RACE is a nominal variable, we again ignore the mean and standard deviation and other statistics except for those appropriate for nominal variables. I looked up census estimates for 1995 and learned that 73.6% of the U.S. population was non-Hispanic white, 12% was African American, 10.3% was of Hispanic origin, 0.7% was Native American, and 3.4% was Asian [those figures are available at: http://www.census.gov/popest/archives/1990s/nat-srh.txt]. This shows that the racial breakdown for the survey respondents is not all that far off from the racial breakdown in the U.S. at the time of the survey.
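To see why the mean of a nominal variable should be ignored, it helps to reproduce it. The Python sketch below treats the RACE codes as if they were real numbers, which is an assumption about how such a mean gets calculated rather than a description of SDA's internals. It uses only the counts from the RACE table, and the variable names are my own.

```python
# Counts copied from the RACE frequency table (code: number of cases).
race_counts = {1: 2746, 2: 572, 3: 390, 4: 135, 5: 67, 6: 40, 7: 73}

n = sum(race_counts.values())

# Treating the category codes as numbers gives the "Sum" and "Mean".
code_sum = sum(code * count for code, count in race_counts.items())
mean_code = code_sum / n

# The mode (the most common category) is meaningful for a nominal variable.
mode_code = max(race_counts, key=race_counts.get)

print(f"Sum = {code_sum:,}  Mean = {mean_code:.2f}  Mode = {mode_code}")
```

The mean of 1.66 depends entirely on which number happens to be assigned to each category, so it describes the coding scheme rather than the respondents. The mode of 1 (White, Non-Hispanic) tells us the most common answer and stays meaningful however the categories are numbered.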

You might have noticed the box containing “No text for variables.” That means this particular variable does not have additional information. We’ll soon encounter some that do. But, for now, let’s run another frequency. The first two frequencies were easy to do because you knew what the variable names were. But, try running the frequency for AGE. Oops, we get an error message: “Cannot find ‘AGE’ in specified datasets.” It appears that this method only works when we already know the variable name. Since we know that age is (or should be) in the dataset, we’ll have to use an alternative method for obtaining its variable name. You may have noticed that box of information under the boxes where we typed in the variables for which we wanted frequencies. Right under the interface box, we can see “National Survey of Adolescents in the …” followed by sub-folders reading “Identification,” “Analysis weight”, “Geographic” etc. Just for the sake of curiosity, click on the sub-folder that reads “Identification” (you might have to click it twice to get it to open). Inside that folder, four variables are listed (i.e., CASEID, ID, SAMID, and SAMTYPE), each followed by short descriptions. Click on the top variable. As soon as we click on CASEID, that variable name appears in the box above the folders, labeled “Selected.” So, we could look in those sub-folders for suitable variables. Rather than waste part of the afternoon looking in all the sub-folders, scroll down to the one that reads “Respondent (adolescent) demographic…” and open it. Age, gender, and race are all demographics, so maybe it’s in there. Aha, it’s the top variable, named “S1.” Click on it to pop it into the box. Then you can click the “view” button to display a quick frequency or you can click the “Row” button under the variable name to move it into the “Row” box on the interface on the right, and then hit return. That yields the below table: SDA 3.1: Tables

NATIONAL SURVEY OF ADOLESCENTS IN THE UNITED STATES, 1995

Variables

Role   Name   Label   Range   MD   Dataset
Row    S1     AGE     12-17   99   1

Frequency Distribution

Cells contain: -Column percent -N of cases

S1           Distribution
12           14.3    576
13           17.1    685
14           18.5    744
15           18.2    733
16           17.0    682
17           14.9    597
COL TOTAL    100.0   4,017

Summary Statistics

Mean = 14.51       Std Dev = 1.64      Coef var = .11
Median = 15.00     Variance = 2.68     Min = 12.00
Mode = 14.00       Skewness = .00      Max = 17.00
Sum = 58,289.00    Kurtosis = -1.17    Range = 5.00

Inference about the mean:

Std Err = .03      CV(mean) = .00

Statistics exclude missing-data and out-of-range values.

Text for 'S1' How old are you?

Allocation of cases

Valid cases                                4,017
Cases with invalid codes on row variable   6
Total cases                                4,023

Datasets

1 /SDA/NACJD/02833-0001
2 /SDA/NACJD/02833-0001

Notice that the “Name” and “Label” columns in the second box do not match as they did for GENDER and RACE. Instead, the “Name” column contains “S1” (the variable name) and the “Label” column contains “AGE” (the short variable label). And, if you scroll down to the box below the statistics, you’ll see the actual question that was used to obtain the respondents’ ages. Notice also that six cases were excluded from the table due to invalid codes; six out of 4,023 isn’t many. We would be more worried if a substantial number of cases were missing or if we suspected that the missing cases have something in common (e.g., if victims of crime were less likely than other respondents to provide their ages). A quick perusal of the numerical table shows that the youth are aged 12-17, with 14.3% to 18.5% of the sample falling into any single age category. This


appears to be a fairly well balanced sample of respondents. Let’s do one more demographic frequency, so you can learn another way to find variable names. Remember that “Codebook” button at the top of the screen? We’ll use that to access an easy-to-use method for finding variable names. Click on it, and then click on the “Standard variable list” link on the left-hand side. That will produce a fairly easy to read listing of variables. This study’s data are broken into groupings, such as “Respondent demographics/characteristics,” “Community/government views,” “Observation of violence,” and more. If you click on any of the variable names (to the left of the descriptions), you’ll get a quick frequency table for that variable. Scroll down to “Observation of violence” and locate the variable name for “Seen someone actually shoot someone.” Click on the variable name (in this case, “Q1A”). The output appears below and is a quick frequency. You could also use the “Codebook” feature to search for a variable. Assume, for example, that you wanted to locate variables related to witnessing crime; you could use the search function for your Internet browser (typically control-F) and search all the variable labels for those containing the word “seen someone” (try that for fun to see all the different measures somehow related to observing crime). As an extra feature, you can quickly peruse the frequencies generated using the “Codebook” button for other variables simply by scrolling up or down after you display a frequency. This additional feature only provides quick frequencies, even if you are doing higher order statistics, but it’s quite handy for quick examinations of datasets. The tables look a little different from the ones generated using the interface, but all of the required elements are there and the numerical breakdown is the same.

Q1A SEEN SOMEONE ACTUALLY SHOOT SOMEONE

Description of the Variable

Have you ever seen someone actually shoot someone else with a gun?

Percent   N       Value   Label
5.3       213     1       Yes
94.7      3,807   2       No
0.0       2       4       Refused
          1       9       Unknown
100.0     4,023           Total

Properties

Data type: numeric
Missing-data codes: 9
Mean: 1.95
Std Dev: .23
Record/column: 1/43

Looking at this table, we see that 5.3% (n=213) of the youth in the sample had witnessed a shooting. Two youth declined to answer the question and one did not know whether s/he had seen a shooting. During analyses, we would discard these three individuals because there is really no way to know whether any of them witnessed a shooting. We’re going to play with this variable a bit to learn how to run crosstabulation tables. This particular exercise isn’t tied to one of the readings in the book, but it will shed some light on an important issue, the observation by youth of shootings. For now, we’ll confine ourselves to the four variables we have explored (i.e., GENDER, RACE, S1, Q1A). In future exercises, we will use other variables (and other datasets) to explore theories.
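Discarding the three respondents who refused or did not know barely changes the picture, as the short Python sketch below shows. It uses the counts from the Q1A table; treating only “Yes” and “No” as valid answers is my reading of the paragraph above, and the variable names are my own.

```python
# Counts copied from the Q1A quick frequency table.
q1a_counts = {"Yes": 213, "No": 3807, "Refused": 2, "Unknown": 1}

# Keep only the substantive answers; discard "Refused" and "Unknown".
valid_labels = ("Yes", "No")
valid_total = sum(q1a_counts[label] for label in valid_labels)
discarded = sum(q1a_counts.values()) - valid_total

# Percent of valid respondents giving each substantive answer.
valid_pct = {
    label: round(100 * q1a_counts[label] / valid_total, 1)
    for label in valid_labels
}

print(f"Discarded cases: {discarded}")
for label, pct in valid_pct.items():
    print(f"{label}: {pct}% of {valid_total:,} valid cases")
```

With so few cases discarded, the percentage who witnessed a shooting still rounds to 5.3%. The choice matters far more for variables where many respondents refuse or do not know.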

Review #1: How to generate a frequency table in SDA

1. Open SDA for the dataset you are using. For this exercise, you can go to: http://www.icpsr.umich.edu/cgi-bin/bob/newark?study=2833;path=NACJD.

2. Insert the name of the variable for which you want a frequency, by either typing it into the box beside “Row” on the right hand side of the interface or by searching for it using one of the two methods described in the text above (navigating to the appropriate subfolder beneath the “Mode” selection area or by clicking on the Codebook at the top of the screen and locating a suitable variable).

3. Click on the “Run the table” button or hit enter while the cursor is in a text box. Your frequency will appear in a popup window.

The quick and dirty on crosstabulation tables

In their simplest form, crosstabulation tables allow us to break out one variable by another. You've seen many of these before, but may not have known the name for them. When you see political polls broken out by party or gender, you're actually examining crosstabulation tables. Most crosstabulation tables, including all of the ones we will look at in this chapter's exercises, present two variables; one variable appears at the top of the table and the other variable is on the left-hand side. For our first crosstabulation table, let’s look at whether boys or girls are more likely to have witnessed a shooting. I picked this particular crosstabulation table because it will be the easiest to run and interpret, making it a good starting point for us.

Turning our attention to the SDA page, look at the interface box on the right, headed by “SDA Frequencies/Crosstabulation Program” (the same box we used to generate the frequencies). For now, ignore the headings and move your attention to the next row, appropriately titled “Row.” This is where we put the dependent variable, that is, the variable we are trying to predict or explain. Since we want to know which gender is more likely to have witnessed a shooting, witnessing a shooting (i.e., Q1A) is the dependent variable. GENDER, in this case, is the independent variable, or the variable that we are using to predict or explain the dependent variable. An easy way to keep them straight is that, in some ways, the dependent variable “depends on” or “is dependent on” the independent variable. So, we need to put Q1A in the box labeled “Row” and GENDER in the box labeled “Column,” as shown below. My shorthand for crosstabulation tables will be INDEPENDENT VARIABLE → DEPENDENT VARIABLE, in this case GENDER → Q1A.

Graphic #3: SDA Analysis screen with variables GENDER and Q1A entered to obtain crosstabulation table


Now, scroll down to the “table options.” You’ll notice that “Column” is already checked under “percentaging” and that “Color coding” is already checked. Unless you’ve already done so, add checks to the boxes labeled “Statistics with 2 decimal(s)” and “Question text,” since we’ll want some statistics and labels, and change the type of chart to no chart. A screenshot of those changes is shown below:

Graphic #4: SDA Analysis screen with table options marked for crosstabulation table

Then, click the button at the bottom of the page that says, “Run the table.” That yields the output below.

SDA 3.1: Tables
NATIONAL SURVEY OF ADOLESCENTS IN THE UNITED STATES, 1995

Variables

Role     Name     Label                                 Range   MD   Dataset
Row      Q1A      SEEN SOMEONE ACTUALLY SHOOT SOMEONE   1-4     9    1
Column   GENDER   GENDER                                1-2          1

Frequency Distribution

Cells contain:                  GENDER
-Column percent            1         2       ROW
-N of cases             Male    Female     TOTAL

Q1A   1: Yes             5.9       4.6       5.3
                         120        93       213

      2: No             94.0      95.3      94.7
                       1,896     1,911     3,807

      4: Refused          .0        .0        .0
                           1         1         2

      COL TOTAL        100.0     100.0     100.0
                       2,017     2,005     4,022

Means                   1.94      1.95      1.95
Std Devs                 .24       .22       .23

Color coding:     <-2.0   <-1.0   <0.0   >0.0   >1.0   >2.0   Z
N in each cell:   Smaller than expected         Larger than expected

Summary Statistics

Eta* = .03           Gamma = .13    Chisq(P) = 3.45 (p= 0.18)
R = .03              Tau-b = .03    Chisq(LR) = 3.46 (p= 0.18)
Somers' d* = .01     Tau-c = .01    df = 2

*Row variable treated as the dependent variable.

Text for 'Q1A': Have you ever-seen someone actually shoot someone else with a gun?
Text for 'GENDER': Respondent's sex.

Allocation of cases

Valid cases                                             4,022
Cases with invalid codes on row or column variable          1
Total cases                                             4,023

Datasets

1   /SDA/NACJD/02833-0001
2   /SDA/NACJD/02833-0001
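The column percentages in the table above can be rebuilt from the cell counts alone, which is a handy way to convince yourself of what “percentaging by column” means. The following Python sketch is an illustration only, not SDA's own procedure; the counts are copied from the output above and the name `column_percents` is made up for this example.

```python
# Cell counts copied from the Q1A-by-GENDER output above:
# rows are Q1A answers, columns are GENDER categories.
COUNTS = {
    "Yes":     {"Male": 120,  "Female": 93},
    "No":      {"Male": 1896, "Female": 1911},
    "Refused": {"Male": 1,    "Female": 1},
}


def column_percents(counts):
    """Percentage each cell within its column (the independent variable)."""
    columns = list(next(iter(counts.values())))
    col_totals = {c: sum(row[c] for row in counts.values()) for c in columns}
    return {
        answer: {c: round(100 * row[c] / col_totals[c], 1) for c in columns}
        for answer, row in counts.items()
    }


table = column_percents(COUNTS)
for answer, row in table.items():
    print(f"{answer:8s}", row)
```

Each cell is divided by the total for its own column, so every column adds up to 100 (give or take rounding), just as the “col total” row in the SDA output shows.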


In case you’ve never seen a crosstabulation before, let’s work through this one in detail. First, you should verify the orientation of the table: your independent variable should be along the top (like GENDER is for the above table) and your dependent variable should be along the side (like Q1A). Then, glance to make sure the “col total” percentages (the column totals) equal 100 for every column. That means your table is set up correctly. Whenever you run crosstabulation tables, you should take a moment to make certain they are oriented and percentaged correctly to minimize errors (note: if you ever see crosstabulation tables with the variables switched, the “row total” percents should be 100 because you always percentage through your independent variable).

Now let’s look at the numbers in the crosstabulation table. I find that it’s easiest to tackle a crosstabulation table row by row. Let’s start with the first row, which contains all the respondents who answered “yes” to Q1A (i.e., they reported seeing someone shot by another person). The first column (which should be a dark pink in your crosstabulation) contains the numbers 5.9 over 120. The 5.9 means that 5.9% of males (notice the number is in the “male” column) answered “yes” to Q1A; the 120 simply means that 120 males answered “yes” to Q1A. When we are doing crosstabulation table comparisons, the percents are more important than the frequencies. The second column of that row (which should be turquoise in your crosstabulation) contains 4.6 and 93, meaning that 4.6% of the females (i.e., 93 females) answered “yes” to Q1A. The third column of that row contains the total for the row, meaning that 213 or 5.3% of all the respondents answered “yes” to Q1A. The important columns are the ones to the left of the total column. So, 5.9% of males and 4.6% of females had seen someone shot. The percent for males is higher, but is it a real finding?
More on that below, when we talk about the statistics generated by SDA for our use.

Now, let’s look at the second row, which contains all the respondents who answered “no” to Q1A; in other words, those who reported that they had not seen anyone shot. Looking at the numbers in the boxes, we learn that 94% (n=1,896) of males and 95.3% (n=1,911) of females had not seen a shooting. Notice that the color scheme switches, so that the first row is dark pink then turquoise, but the second row is turquoise then dark pink. The reason is that in the first row, females were “underrepresented” among those who had seen shootings (i.e., there were fewer females who said “yes” than would be expected if there were no relationship at all between the variables GENDER and Q1A), but they were “overrepresented” in the second row (i.e., there were more females who said “no” than would be expected if there were no relationship at all between the variables). The third column informs us that 94.7% (n=3,807) of all the respondents, regardless of gender, answered that they had not seen someone else shot.

Ooops! What is that third row there for (the one labeled “4: Refused”)? It contains the two respondents who declined to answer the question about whether they had seen a shooting. This row causes problems for us in two ways. First, it makes our table look bad and harder to interpret because we don’t want those two individuals included in our analyses. But, more importantly, including them in the table throws off the percentages (very slightly in this particular case because only 2 out of 4,022 youth refused to answer the question) and the statistics (this is a larger problem). [Note: An additional respondent is missing from the table because his/her answer wasn’t known, so SDA deleted him/her. That brings our total to 4,023, the number of individuals in the sample.] We’ll need to get rid of those individuals using an easy filtering approach. We’ll do that in a few minutes, after I’ve finished explaining the table.

The fourth row contains the total percents. Because we chose to percentage by columns, these should all be 100. The fifth row contains the means by column; because Q1A is a nominal level variable, means are meaningless. The sixth row contains the standard deviations by column, but they too are meaningless for nominal level measures like Q1A. The seventh and eighth rows contain a handy interpretation chart for the colors used by SDA to code the answers. Darker red numbers are higher than would be expected if there were no relationship at all, while darker blue numbers are lower than would be expected.

The next section contains the statistics, but since they are thrown off by the “Refused” row, let’s add an easy filter to Q1A and rerun them before discussing them or the other remaining sections in the output. Scroll back up to the top of the screen where the “Row” box is. See where we’ve entered Q1A? Go to that box and add the following text, “(1,2)”, just as shown below.

Graphic #5: SDA Analysis screen with an easy filter added to variable Q1A

What that filter will do is limit our analysis to just those cases for which the answers to Q1A are “1” or “2.” How did we know to use “1” and “2”? From the frequency or crosstabulation table that showed “1” and “2” to be associated with “Yes” and “No,” which are both answers we want included. Not all filters are as easy as this one. We’ll see some more complicated ones later. Hit return or click the “Run the table” button to generate the new crosstabulation table, the changed parts of which are shown below:

SDA 3.1: Tables
NATIONAL SURVEY OF ADOLESCENTS IN THE UNITED STATES, 1995

Frequency Distribution

Cells contain:                  GENDER
-Column percent            1         2       ROW
-N of cases             Male    Female     TOTAL

Q1A   1: Yes             6.0       4.6       5.3
                         120        93       213

      COL TOTAL        100.0     100.0     100.0
                       2,016     2,004     4,020

Means                   1.94      1.95      1.95

Color coding:     <-2.0   <-1.0   <0.0   >0.0   >1.0   >2.0   Z
N in each cell:   Smaller than expected         Larger than expected

Summary Statistics

Eta* = .03           Gamma = .13    Chisq(P) = 3.45 (p= 0.06)
R = .03              Tau-b = .03    Chisq(LR) = 3.46 (p= 0.06)
Somers' d* = .01     Tau-c = .01    df = 1

*Row variable treated as the dependent variable.

Allocation of cases

Valid cases                                             4,020
Cases with invalid codes on row or column variable          3
Total cases                                             4,023

You should first notice that the individual percentages changed only slightly. That’s because we only excluded a couple of cases, not enough to change the percentages dramatically. The real differences lie in the “Allocation of cases” box (notice that three cases rather than one were excluded from the new table) and in the statistics generated for the crosstabulation table. While the actual statistical values are similar, the “p” for Chisq(P) changed from .18 to .06. The reason is that the additional row was being treated as a valid outcome in the first table, which affects how the statistics are computed (notice that the df dropped from 2 to 1 once that row was gone).

From the numerical table portion of the output, we learn that 6.0% of the males compared to 4.6% of the females reported witnessing a shooting. More males than females witnessed a shooting, but is that a substantial or important difference? The color-coding can help with our assessment by alerting us that certain classifications are over- or under-represented, but we need to look at the statistics to tell whether a relationship is “statistically significant.” Below is a breakdown of the statistics provided by SDA and the levels of measurement for which they are appropriate:

Level of measurement for variable: Appropriate statistics provided by SDA

Nominal: SDA only gives us Chi-Square, but researchers usually use Phi, Cramer’s V, Lambda and/or the contingency coefficient. The two versions provided are Pearson’s Chi-Square (labeled with a “P”) and Likelihood Ratio Chi-Square (labeled with an “LR”). Pearson’s Chi-Square is the most common. Likelihood Ratio Chi-Square is calculated slightly differently, using maximum likelihood estimation, and seldom yields estimates that differ much from Pearson’s Chi-Square. For our exercises, you can ignore the Chisq(LR).

Ordinal: Gamma, Somers' d, Tau-b, and Tau-c. More on these later.

Interval: R (abbreviation for Pearson Correlation Coefficient) and Eta.

“df”: SDA also provides the “df,” which is simply the degrees of freedom. The degrees of freedom are calculated by multiplying two numbers: the number of data rows in a table minus one and the number of data columns in a table minus one. For our Q1A by GENDER table above, the degrees of freedom is (2-1)(2-1), which is 1x1, which is 1. We use (2-1)(2-1) because there are two data rows and two data columns (the columns and rows containing just labels or totals do not count). We’ll use the df only when writing up the results from our analyses.
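To see where the Chisq(P) and df values come from, and why the “p” moved from .18 to .06 once the “Refused” row was filtered out, it can help to redo the calculation outside of SDA. The sketch below is an illustration under stated assumptions rather than SDA's actual code: the function names are invented for this example, the counts are copied from the two GENDER by Q1A tables above, and the probability function only covers tables with 1 or 2 degrees of freedom (the two cases that occur here), for which simple closed-form formulas exist.

```python
import math


def pearson_chi_square(table):
    """Return (chi-square, df) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # (data rows - 1) x (data columns - 1), as described in the text.
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def chi_square_p(chi2, df):
    """Upper-tail probability, using the closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("this sketch only handles df of 1 or 2")


# Columns are Male, Female; counts copied from the two SDA tables.
unfiltered = [[120, 93], [1896, 1911], [1, 1]]   # Yes, No, Refused
filtered = [[120, 93], [1896, 1911]]             # Yes, No only

for name, table in (("unfiltered", unfiltered), ("filtered", filtered)):
    chi2, df = pearson_chi_square(table)
    p = chi_square_p(chi2, df)
    print(f"{name}: chi-square={chi2:.2f}, df={df}, p={p:.2f}")
```

The chi-square value barely moves between the two runs because the two “Refused” cases contribute almost nothing to it; what changes is the df, and the same chi-square value is judged against a different yardstick when the df is 2 rather than 1.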

Using the information in the above breakdown, paired with the knowledge that gender is a nominal variable and that Q1A would typically be classified as nominal, we can pick the right statistic (in a very technical sense, we could call Q1A ordinal by saying that “yes” is the presence of having witnessed a shooting and “no” is the absence, but let’s treat this variable as nominal because there’s no second-guessing that classification). That means we would use chi-square. The Chisq(P) for the table is 3.45 and the corresponding statistical probability (labeled “p”) is 0.06.

Statistical probability sounds scarier than it is. All statistical probability does is tell the researcher (that's you) how often a relationship at least as strong as the one in the table would turn up by chance alone if there were really no relationship between the two variables. Statistical probability ranges from 0 to 1. A value near 1 means a finding like yours would turn up by chance nearly all the time, so it cannot be trusted at all. A value near 0 means a finding like yours would almost never turn up by chance, so it can be trusted. Very few relationships have significance values of 1 or 0, however, so researchers had to develop a scale of when they would trust their findings. Many years ago, they decided that they would accept any finding with a statistical probability of .05 or lower as statistically significant because that would mean there is only a 5% chance (or lower) of getting such a finding by chance alone. In our GENDER → Q1A crosstabulation, on the other hand, the statistical probability is .06, which is higher than .05, so it’s not statistically significant (some researchers would say it’s “trending” because it’s less than .10, but we’re not going to consider that alternative in order to keep these assignments easy to understand). Another way to interpret the statistical probability is to say that, if gender and witnessing a shooting were truly unrelated, a difference at least as large as the one we observed (i.e., that males witness more shootings than females) would still show up by chance about six percent (6%) of the time. So, while males appear on visual inspection to witness more shootings than females, that finding is not statistically significant because it does not withstand the rigors of scientific scrutiny.

So, what do we do with that chi-square value of 3.45? We include it in the write-up along with the degrees of freedom (df) and the significance. When writing up significance levels, we use only certain cutoffs to streamline our writing, so you will see statistically significant probabilities written up as: p<.05, p<.01, p<.001 or p<.0001. You should pick the lowest one that includes your exact statistical probability, so a probability of .023 would be written up as “p<.05” while a probability of .002 would be written up as “p<.01”; the trick is to always make sure the description you use accurately describes your exact statistical probability (e.g., .023 is less than .05 and .002 is less than .01). If your exact statistical probability is more than .05, you can use “p>.05” (sometimes written as “n.s.” for “not significant”). Using those cutoffs, here’s a model for a full write-up on the GENDER → Q1A crosstabulation table:

Six percent (n=120) of males compared to 4.6% (n=93) of females reported having witnessed a shooting. In chi-square analysis, gender and likelihood of having witnessed a shooting were not significantly related (chi-square=3.45, df=1, p>.05).
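The rule for choosing a cutoff can be written out as a small lookup, which may help if you find yourself second-guessing which label to use. This is just a sketch of the rule described above, assuming you know your exact statistical probability; the names `p_label` and `write_up` are made up for this example, and the handling of a probability of exactly .05 (reported here as “p=.05”) is my own assumption because the text does not cover that case.

```python
# Reporting cutoffs listed in the text, from strictest to most lenient.
CUTOFFS = (
    (0.0001, "p<.0001"),
    (0.001, "p<.001"),
    (0.01, "p<.01"),
    (0.05, "p<.05"),
)


def p_label(p):
    """Pick the strictest cutoff from the text that the exact p falls below."""
    for cutoff, label in CUTOFFS:
        if p < cutoff:
            return label
    if p > 0.05:
        return "p>.05"
    # Assumption: an exact .05 is not covered by the text; report it as is.
    return "p=.05"


def write_up(chi2, df, p):
    """Format the parenthetical that closes a crosstabulation write-up."""
    return f"(chi-square={chi2:.2f}, df={df}, {p_label(p)})"


print(p_label(0.023))
print(p_label(0.002))
print(write_up(3.45, 1, 0.06))
```

The last line reproduces the parenthetical used in the model write-up for the GENDER and Q1A table.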

NOTE: When doing a write-up for a table, chi-square can be written as “chi-square” as I have done above, as χ², or as X². I will use “chi-square” in these exercises because it can be easily written by anyone, even those without word processors, and to minimize confusion with other statistics that are written using Greek letters.

Now, let’s run another crosstabulation, so you can see one with a significant finding and learn how to present those. The remaining variables require some recoding, so I wanted to start with the runs that are easiest to run, which unfortunately didn’t produce a significant finding. Leave “Q1A (1,2)” in the Row box, but change the variable in the Column box to RACE and run the table. Yikes, what a mess! We have seven categories across the top and two down the side. First, we need to get rid of the “Refused” category, and then we should combine some of the smaller groups. I propose a simple white vs. non-white comparison since I’ve seen many of those in other studies. To do that, we need to get rid of category “7” and combine categories “2” through “6.” The formula for doing this would be:

RACE (r: 1 "white"; 2-6 "non-white")

You’ve seen the parentheses after a variable name before, so this isn’t as complicated as it looks. When we wanted to limit our analysis to just those who answered “yes” or “no” to the question about witnessing a shooting, we simply listed the categories we wanted included. Now, we want to collapse or recode a variable, so we include an “r” followed by a colon (the “r” stands for “recode”). Then we include all the values that will be in the first newly created category. Since we will be comparing whites to non-whites, we want only the white respondents to be in the first category and everyone else in the second. Each newly created category is separated by a semicolon. The words enclosed in quote marks are technically optional but are a very good idea to include. Whatever you include in the quote marks will be the label for the newly created category. So, our statement [RACE (r: 1 "white"; 2-6 "non-white")] simply tells SDA to create a new two-category variable out of RACE, including the White respondents in a category we’ve called “white” and the African-American, Hispanic, Native American, Asian, and Other Race respondents in a second category that we called “non-white.” Because we did not include category #7 (Refused) in the recode statement, it will be excluded. [Note: If you cut and paste the recode statements, SDA will sometimes refuse to run the statistics, saying there is an “error in recode specification”; if this happens, simply retype just the quote marks.] Hit return or click the “Run the table” button to generate our new crosstabulation table, which is shown below:

SDA 3.1: Tables
NATIONAL SURVEY OF ADOLESCENTS IN THE UNITED STATES, 1995

Variables

Role     Name            Label                                 Range   MD   Dataset
Row      Q1A(1,2)        SEEN SOMEONE ACTUALLY SHOOT SOMEONE   1-4     9    1
Column   RACE(Recoded)   RACE                                  1-2          1

Frequency Distribution

Cells contain:                   RACE
-Column percent            1           2       ROW
-N of cases            white   non-white     TOTAL

Q1A   1: Yes             2.7        11.4       5.4
                          75         137       212

      2: No             97.3        88.6      94.6
                       2,669       1,067     3,736

      COL TOTAL        100.0       100.0     100.0
                       2,744       1,204     3,948

Means                   1.97        1.89      1.95
Std Devs                 .16         .32       .23

Color coding:     <-2.0   <-1.0   <0.0   >0.0   >1.0   >2.0   Z
N in each cell:   Smaller than expected         Larger than expected

Summary Statistics

Eta* = .18            Gamma = -.64    Chisq(P) = 123.09 (p= 0.00)
R = -.18              Tau-b = -.18    Chisq(LR) = 111.16 (p= 0.00)
Somers' d* = -.09     Tau-c = -.07    df = 1

*Row variable treated as the dependent variable.

Recode for 'RACE': 1 = 1 "white"; 2 = 2-6 "non-white"

Text for 'Q1A': Have you ever-seen someone actually shoot someone else with a gun?

Allocation of cases

Valid cases                                             3,948
Cases with invalid codes on row or column variable         75
Total cases                                             4,023

Datasets

1   /SDA/NACJD/02833-0001
2   /SDA/NACJD/02833-0001

Glancing over the output, you should see a new box, containing our recode for RACE. That’s quite handy in case you print the table then later wonder what categories you included and what values they contained. For this example, it would be rather easy to recreate, but what if you had recoded respondents’ scores on a 100-point assessment scale into “high” and “low”? Even a few days later, it could be very hard to ascertain which cutoff values you used, especially if you tried several different categorizations based on different theories.

So, let’s look at and interpret the numerical part of this table. Since we know that adding those who answered “yes” to those who answered “no” will always produce 100% (assuming they are the only two categories in a table), we talk about the table using just one of the two categories, whichever makes the most sense to us. Here, that is the “yes” row: 11.4% of non-whites compared to 2.7% of whites reported witnessing a shooting. It certainly looks like there’s a difference (11.4% is more than four times 2.7%), but we need to first look to the “p” to see whether the relationship is statistically significant. The “p” is “0.00,” which is the lowest statistical probability we can get with SDA. Though the output says “0.00,” it’s not really zero; instead, the answer has been rounded off because SDA doesn’t have enough space to print the real statistical probability, which could be .000000001 or any other value with more than two zeros to the right of the decimal. When we see “p= 0.00,” we write it up as “p<.001” in order to acknowledge that there is some probability that the relationship is based on chance. So, we could write up the RACE (recoded) → Q1A table as:

Non-whites were significantly more likely than whites to report having witnessed a shooting; 11.4% (n=137) of non-whites reported witnessing a shooting compared to just 2.7% (n=75) of whites (chi-square=123.09, df=1, p<.001).
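If you are curious how small the “0.00” printed by SDA really is, you can estimate it from the chi-square value itself. The snippet below is only a sketch: it relies on the closed-form formula for the chi-square probability when df is 1 (which is the case for this table) and uses the Chisq(P) value copied from the output above; it says nothing about how SDA computes or rounds its own figures.

```python
import math

# Pearson chi-square reported by SDA for the recoded RACE table (df = 1).
chi2 = 123.09

# For df = 1, the upper-tail chi-square probability has this closed form.
p = math.erfc(math.sqrt(chi2 / 2))

print(f"rounded to two decimals: {p:.2f}")
print(f"in scientific notation:  {p:.1e}")
print("exactly zero?", p == 0.0)
```

The probability is tiny but still greater than zero, which is why the write-up says “p<.001” rather than claiming a probability of zero.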

There are several ways to write up the same table, but you should use the models above until you get the hang of interpreting crosstabulation tables. Now, let’s run the remaining table in this exercise so you can examine and interpret it, using the two models above as guides. First, change the column variable to S1 (the variable name for age) and generate the crosstabulation table. There are no cases that need to be excluded from the analysis, so we have a couple of options about how we would like to proceed. You could simply write up the results as they appear, making sure to include the percents and frequencies for each category, or you could collapse the six categories into fewer categories. Notice how the percentage of youth who have witnessed shootings seems to rise with age, but it’s not a clean linear relationship. Instead it drops slightly, then goes up, then drops slightly again. That’s really not a problem because the general trend is there. You can see the trend if you glance at the color coding; see how the 16- and 17-year-olds are bright red, the 15-year-olds are pink, but the younger youth are blue and turquoise? In other words, the older a youth is, the more likely s/he is to have witnessed a shooting. Don’t get too excited about this finding, however, because we would expect that. Remember, every 17-year-old has had five more years in which to witness a shooting than every 12-year-old! But, since this isn’t a statistics class, let’s make this table easier to interpret. Besides, you need some more practice recoding variables.


You should always have some rationale for how you collapse or recode your categories. Based on cutoffs used in other studies, we could compare the 16- and 17-year-olds to the younger youths. Alternatively, we could separate out those who are junior high school aged (12-14) versus senior high school aged (15-17). Or, we could compare the twelve-year-olds to the teenagers (ages 13-17). In fact, there are quite a number of logical comparisons we could make based on our particular needs, but let’s compare the high school aged youth to the junior high school aged respondents. What will your recode statement look like? Try your recode and see if the numerical and statistics portions of your output match the sections below. If they don’t, you need to redo your recode. If you can’t figure it out, I have included the correct recode below the table, but you should first try to figure it out on your own.

SDA 3.1: Tables
NATIONAL SURVEY OF ADOLESCENTS IN THE UNITED STATES, 1995

Frequency Distribution

Cells contain:                          S1
-Column percent                  1             2       ROW
-N of cases            junior high   high school     TOTAL
                              aged          aged

Q1A   1: Yes                   3.4           7.2       5.3
                                68           145       213

      2: No                   96.6          92.8      94.7
                             1,937         1,864     3,801

      COL TOTAL              100.0         100.0     100.0
                             2,005         2,009     4,014

Summary Statistics

Eta* = .09            Gamma = -.38    Chisq(P) = 29.23 (p= 0.00)
R = -.09              Tau-b = -.09    Chisq(LR) = 29.87 (p= 0.00)
Somers' d* = -.04     Tau-c = -.04    df = 1

*Row variable treated as the dependent variable.

Here’s the recode that generated the above output sections:

S1 (r: 12-14 "junior high aged"; 15-17 "high school aged")

Now that you have created the crosstabulation table, try interpreting it. Remember, the easiest way is to look at each value of the dependent variable, one at a time. So, in the AGE (recoded) → Q1A crosstabulation, you will want to look at the percentage of individuals who said they had witnessed a shooting. First, note what percent of junior high school aged respondents said they had witnessed a shooting, then the percentage of high school aged individuals who had witnessed a shooting. Do the percentages differ? Which is higher? What is the chi-square value and is it significant? If you have problems with this table, go back and go over the first two crosstabulation tables, using the text to check your work.
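If it helps to think about what a filter or a recode statement is doing behind the scenes, both amount to a simple lookup applied to every respondent's answer. The following Python sketch mimics the three statements used in this exercise; it is an illustration only and not how SDA is actually implemented, and the names `recode` and `keep` are invented for the example.

```python
def recode(value, spec):
    """Apply a recode given as a list of (low, high, label) ranges.

    Values that fall in none of the ranges return None, mirroring how
    categories left out of a recode statement are dropped from the table.
    """
    for low, high, label in spec:
        if low <= value <= high:
            return label
    return None


def keep(value, allowed):
    """Mimic a simple filter such as Q1A(1,2): keep only the listed codes."""
    return value in allowed


# Q1A (1,2)
Q1A_FILTER = {1, 2}

# RACE (r: 1 "white"; 2-6 "non-white")
RACE_SPEC = [(1, 1, "white"), (2, 6, "non-white")]

# S1 (r: 12-14 "junior high aged"; 15-17 "high school aged")
AGE_SPEC = [(12, 14, "junior high aged"), (15, 17, "high school aged")]

print(keep(4, Q1A_FILTER))    # a "Refused" answer is filtered out
print(recode(3, RACE_SPEC))   # codes 2 through 6 are collapsed together
print(recode(7, RACE_SPEC))   # code 7 was not listed, so it is excluded
print(recode(16, AGE_SPEC))
```

Writing the ranges out this way also makes it easy to try a different grouping, such as comparing the 16- and 17-year-olds to the younger youths, by changing only the list of ranges.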

Review #2: How to generate a crosstabulation table in SDA

1. Open SDA for the dataset you are using. For this exercise, you can go to:
   http://www.icpsr.umich.edu/cgi-bin/bob/newark?study=2833;path=NACJD

2. Insert the name of your dependent variable in the “Row” box and your independent variable in the “Column” box, either by typing them into the appropriate boxes or by searching for them using one of the two methods described in the text above (navigating to the appropriate subfolder beneath the “Mode” selection area or clicking on the Codebook at the top of the screen and locating a suitable variable).

3. Scroll down to the “Table options” section and add a checkmark by “Statistics” and “Question text.”

4. Make any changes to the chart options that you wish.

5. Click on the “Run the table” button. Your crosstabulation table will appear in a popup window.

6. If necessary, recode any variables and rerun the crosstabulation table.

This exercise exposed us to the SDA statistical engine, and we learned how to run frequencies and crosstabulation tables. We also learned how to limit the cases included in our analysis tables and how to recode variables. In the next series of exercises, we will be using a variety of online databases to further illuminate the readings in our text and to test some propositions related to the theories we’ll be learning.

Further exploration

What else might influence a youth’s likelihood of witnessing a shooting? There are plenty of other variables in the dataset for you to look at. Perhaps you feel that witnessing other forms of violence, such as seeing someone beaten up (Q6A), or describing one’s school as a dangerous place (S7) has some effect on a respondent’s likelihood of witnessing a shooting. Perhaps you feel that respondents who have been beaten up themselves (Q18D and Q18E) are more likely to witness shootings. For the "further exploration" homework assignment, you will pick one of these variables (or one of your own choosing) and explore its effects on the likelihood of witnessing a shooting using the same procedures we used for GENDER, RACE, and S1.