Amanda Sullivan & Yerlin Barquero
Cigarette Smoking at Raritan Valley Community College: Attitudes and Prevalence
Semester Project Part IV
Statistics Honors MATH117H-01
Dr. Lynne E. Kowski, Ph.D
December 7th, 2016
Table of Contents
Section One: Introduction
Section Two: Sampling Design
Section Three: Analysis
Question One: How Old Are You? (In Years)
Question Two: Do You Smoke Cigarettes? (Yes or No)
Question Three: If so, How Many Cigarettes Per Day
Question Four: Less People Are Smoking Now Than in the Past Decade
Question Five: How Many People in Your Household Smoke
Question Six: What Do You Believe to Be a Trigger for Someone To Start Smoking
Question Seven: What is Your Gender
Question Eight: Do You Believe It is A Possibility That Smoking Can Be Eventually Eliminated
Analysis for ANOVA
Analysis for Regression
Conclusion
Appendix
Survey
Works Cited
Section One: Introduction
The population under consideration in this study is Raritan Valley Community
College (RVCC) students. Our survey was motivated by the common belief that cigarette smoking
is less prevalent among people today, particularly millennials. It was also motivated by the
abundance of medical literature documenting the harmful effects of cigarette smoking.
Additionally, more methods are available today, such as drugs and support groups, for cigarette
smokers who want to quit. In turn, these factors could reduce the prevalence of cigarette smoking
not just in the millennial generation, but in the entire population. Because RVCC as an
institution draws students of diverse ages, we administered a survey to a sample of the RVCC
student population to make inferences about current trends in cigarette smoking within the
college. Although RVCC consists mostly of younger students, there are some older adults as well,
so we tested our data against figures for all adult smokers.
Section Two: Sampling Design
In order to obtain the data needed for our analysis, we administered our surveys using the
method of systematic random sampling. In this method, the population size is divided by the
desired sample size; if it is not a whole number, it must be rounded down. The number from the
calculation is called m. Afterward, a random number (k) must be obtained, either through
technology or a random number chart. It must fall between the numbers 1 and m. Once m and k
are calculated, the random samples can be obtained through the following procedure: k, k+m,
k+2m, and so on. This procedure is repeated until you have achieved all of your required
samples.
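The procedure above can be sketched in a few lines of Python. This is an illustration only, not the tool used in the study: the function name `systematic_sample` is hypothetical, and the example call plugs in the values reported for this study (963 active classes, a desired sample of 25, and k = 13), passing k directly instead of drawing it at random so the result is reproducible.

```python
import random

def systematic_sample(population_size, sample_size, k=None, seed=None):
    """Return the 1-based positions k, k+m, k+2m, ... for a systematic sample."""
    m = population_size // sample_size  # divide, then round down to a whole number
    if k is None:
        k = random.Random(seed).randint(1, m)  # random start between 1 and m
    if not 1 <= k <= m:
        raise ValueError("k must fall between 1 and m")
    return [k + i * m for i in range(sample_size)]

positions = systematic_sample(963, 25, k=13)
print(positions[:4], "...", positions[-1])  # [13, 51, 89, 127] ... 925
```

Leaving `k` unset lets the function draw the starting point itself, which mirrors the random-number step of the method.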
In our case, the total population of all Raritan Valley Community College students was
grouped into classes. Professor Kowski compiled a list of active classes available during the Fall
2016 semester, with each having an assigned number. This allowed us to perform our systematic
sampling procedure easily. We elected to have a sample size of 25 classes, which would ensure a
sufficient number of responses in case uncontrollable circumstances occurred, such as absent
students or professors declining our request. For the systematic random sampling procedure, we
first divided the total number of active classes at RVCC (963) by our desired sample size (25).
So, 963 divided by 25 was equal to 38.52. Because this is not a whole number, we rounded down
to 38 and labeled the number as m. With m equal to 38, we calculated the number k, which had
to be between 1 and 38 (m). Using a random number generator, we found that k was 13
(Random.org). Then, we applied the final procedure of k, k+m, k+2m as follows: 13, 13+38,
13+2(38), 13+3(38)... and so on until we reached the sample size of 25. This method helped us to
prevent bias because the number k was randomly generated, and thus allowed us to move down
the class list in a systematic way without any interventions on our part. Unlike cluster sampling,
this method mitigates the risk that the selected samples may have attributes in common, such as a
group of classes belonging to the Biology major. Our systematically chosen samples from the
class list are as follows:
*Bold type denotes classes selected for surveying
Systematically Chosen Class Number (Using k, k+m, k+2m) | Class Information (CRN, Section, Course Title, Day and Time, Location, Instructor) | Students Enrolled in Class (As of Beginning of Fall 2016 Semester)
13 | 14494, ACCT-221 51, TAX ACCOUNTING I, W 05:30 pm - 08:20 pm, HNTRDN H116, Nicole DeRosa (No Response) | 19
51 | 14832, ARTS-131 01, WATERCOLOR II, W 01:00 pm - 04:50 pm, ARTS A07, Christopher Koep | 3
89 | 11021, BIOL-101 03X, GENERAL BIOLOGY I, WF 10:00 am - 11:20 am, WEST W114, Lenahan, Melanie | 20
127 | 10243, BIOL-124 51X, HUMAN ANATOMY & PHYSIOLOGY, MW 06:00 pm - 07:20 pm, SCIENC SC151, Anthony Rosania (No Response) | 19
165 | 15262, BUSI-112 02, PRINCIPLES OF MANAGEMENT, W 01:00 pm - 03:50 pm, HNTRDN H328, Tonya Rimple | 38
203 | 14489, CHEM-212 02X, ORGANIC CHEMISTRY II, TR 02:30 pm - 04:20 pm, SCIENC SC151, Paul Schueler (Accepted, Surveys Received) | 11
241 | 13681, CISY-102 95X, COMPUTER LITERACY, W 05:30 pm - 09:40 pm, WEST W309, Robert Bertolatus (No Response) | 22
279 | 10430, COMM-101 03, SPEECH, MW 01:00 pm - 02:20 pm, WEST W113, Thomas Valasek | 24
317 | 16073, CRMJ-122 13, DOMESTIC PREPAREDNESS, F 06:00 pm - 08:50 pm, SMRSET S244, Joseph Howarth (No Response) | 6
355 | 10001, ECON-102 01, MICROECONOMICS, WF 10:00 am - 11:20 am, HNTRDN H330, Daniel Aronson | 37
393 | 11479, ENGL-070 L06, ENGLISH COMPOSITION I WORKSHOP, MW 11:30 am - 12:50 pm, SMRSET S350, Andrew Manno (Accepted, Surveys Received) | 19
431 | 10583, ENGL-111 28, ENGLISH COMPOSITION I, MW 04:00 pm - 05:20 pm, HNTRDN H122A, Melody Pazian | 23
469 | 13904, ENGL-111H 01, ENGLISH COMP I HON:COMP&CONTRO, TR 10:00 am - 11:20 am, SMRSET S243, Christine Pipitone | 21
507 | 10032, ENGR-105 02, INTRODUCTION TO ENGINEERING, W 02:30 pm - 04:20 pm, SCIENC SC100, Antonella Pompo | 26
545 | 12919, FILM-101 04, ART OF MOVIE:FILM APP & ANALYS, W 04:00 pm - 06:50 pm, HNTRDN H120, Thomas Lancaster | 23
583 | 10142, HIST-101 01, WORLD CIVILIZATION I, MW 07:00 am - 08:20 am, HNTRDN H110, Christian Kemp (Accepted, Surveys Received) | 25
621 | 12800, HMNS-240 51, INTRODUCTION TO SPECIAL ED, R 06:00 pm - 08:50 pm, SMRSET S018, Andrea Romano | 15
659 | 15250, MATH-020 13Y, ELEMENTARY ALGEBRA, M 11:00 am - 12:50 pm, SMRSET S349, Tatyana Stepanova | 21
697 | 14840, MATH-030R 51, INTER ALGEBRA W/ALGEBRA REVIEW, TWR 07:30 pm - 09:20 pm, CLLCTR C022, Gregory Bullock | 15
735 | 12428, MATH-112 04X, PRECALCULUS I, MW 11:00 am - 12:50 pm, SMRSET S247, Patricia Hulsen (Accepted, Surveys Received) | 40
773 | 12392, MUSC-103 01, ROCK N' ROLL HIST & CULTURE, TR 10:00 am - 11:20 am, ARTS A33, Lawrence Barbee (Declined Request) | 39
811 | 10304, NURS-L01 05X, FOUND OF NURSG LAB, W 02:00 pm - 03:50 pm, HNTRDN H227, Maryann Balut (Accepted, Surveys Received) | 11
849 | 13390, PHIL-114 03, ETHICS, MW 02:30 pm - 03:50 pm, HNTRDN H329, Sean Meseroll | 17
887 | 10916, PSYC-103 15, INTRODUCTION TO PSYCHOLOGY, TR 04:00 pm - 05:20 pm, HNTRDN H333, Kathleen Manz | 32
925 | 12865, SPAN-103 03X, INTRODUCTION TO SPANISH I, TR 03:00 pm - 04:50 pm, WEST W306, Sandra Reynolds | 19
TOTAL CLASSES: 25
EXPECTED STUDENT RESPONSES: 543
After getting the 25 samples of the classes, which totaled 543 enrolled students,
Professor Kowski informed us that between monitoring responses from the instructors and
analyzing the data, a sample of 25 classes equating to 543 possible surveys would be
overwhelming. We agreed with Professor Kowski and decided to reduce our sample size. So,
beginning with the first class listed, we selected every fifth class and requested permission to
administer surveys from those instructors. This ensured diversity among majors and gave us a
total of five classes to administer surveys to, with an expected 113 student surveys, which was
well over the minimum of 50 surveys needed for analysis.
After sending out the e-mails requesting permission to administer our surveys to the
professors, we received three acceptances out of five classes. We obtained a prompt acceptance
from Professor Paul Schueler. We received the surveys from his class, but not the full expected
number, since some students may have been absent, may have withdrawn from the class by that
time, or may have refused to take the survey. Professor Christian Kemp and Professor Andrew
Manno also approved our request, but again we did not receive all of the expected surveys, for
the same possible reasons. Professor Lawrence Barbee declined our
request to administer the surveys to his class for undisclosed reasons. Unfortunately, Professor
Nicole DeRosa did not respond to our e-mail requests. After one week, we sent her a friendly
reminder. After another week, we finally decided to sample another class and sent an email
informing Professor Nicole DeRosa that we would no longer require the surveys from her class.
Although we were following the correct procedures to ensure a sufficient number of
responses, we realized that across our returned surveys, very few students responded that they
currently smoke cigarettes. Professor Kowski informed us that this would become an issue later
on. For our analysis to rely on an approximately normal sampling distribution, we would need to
have at least 30 students who responded that they currently smoke cigarettes. At that point, we
chose to select more classes to ensure that we could reach the minimum of 30 cigarette smokers
by the posted deadlines. Professor Kowski advised us to select classes that took place during the
evening, had a large number of students enrolled, or were in a discipline that is often believed to
have a large number of smokers (e.g. nursing). This was a source of bias in our sampling because
the selection was not truly random and required intervention on our part. We selected the
following classes, which are represented by their numbers: 127, 241, 317, 735, and 811. These
classes are shown in bold font on the table.
After selecting the above five classes, we emailed the professors and two of the five
responded in the affirmative. Professor Patricia Hulsen accepted our request to administer the
surveys to her class, but we did not receive the exact number of expected responses, possibly for
the reasons mentioned above. Professor Maryann Balut did not respond to our first
email request, but she did respond to the friendly reminder a week later and accepted our request.
We received all expected surveys. Unfortunately, Professors Anthony Rosania, Robert
Bertolatus, and Joseph Howarth did not respond to our emails. We sent an initial email to each,
followed by a friendly reminder one week later. Another week later, we finally decided to stop
pursuing their classes and sent them an email stating we would no longer require surveys from
their class.
Even with the surveys from the extra classes, we did not achieve the 30 minimum surveys
with students answering “yes” that they currently smoke cigarettes. In order to resolve this issue,
Professor Kowski authorized us to give in-person surveys to students we saw smoking cigarettes
around the RVCC campus. Even though this was a definite source of bias in our sampling
design, it was necessary so that we could further progress our study. We waited by the smoking
section of the main parking lot for students to come by. When they did, we asked them if they
were enrolled in at least one class at RVCC. We also ensured that they had not already taken the
survey, which was a possibility since we had already administered a considerable number of
surveys to our selected classes. Most students were cooperative in taking the survey and by the
end of the day, we had received a substantial number of surveys from cigarette smokers. In order
to preserve the integrity of our sampling process, one survey had to be discarded since the person
forcefully grabbed a survey from our pile and filled it out on behalf of someone else, despite us
repeatedly telling him not to. One person filled out a survey circling both “yes” and “no” for the
question of whether or not they smoke cigarettes, so we discarded this as well since there was no
way to analyze it within the parameters of our study. Added to the few affirmative cigarette
smoking surveys from the classes we sampled, we managed to get a total of 33 students who
responded that they smoke cigarettes.
After this complex sampling process, we obtained a sample of 117 surveys from the
RVCC student population, which was well above the minimum of 50 that we needed for our
analysis.
Section Three: Analysis
The first question of our survey is below:
1) How Old Are You (In Years)?
________________
It is important to first list our null and alternate hypotheses for the average ages of
students at RVCC. Professor Kowski provided us with an enrollment report which
contained demographic data on the college’s student body. From this, we noted that as of
the Spring 2015 semester, the average age of RVCC students was equal to 25.5 years old
(“10th Day Enrollment Report”). We rounded up to 26 years old for efficiency. We formed
these hypotheses:
Ho: μ=26
Ha: μ≠ 26
We chose our alternate hypothesis to be not equal to 26 for a variety of reasons.
Many students enroll the academic year directly after their high school graduation, which
could place them well below the mean age of 26 years old. Some older adult students
enroll in order to change their career path or to take a class for personal enrichment, so
their ages could be well above the mean of 26 years old. Before performing further
analyses, we had to determine whether or not there is a difference between the ages of the
smokers and nonsmokers. To see if there is a difference, we looked at summary statistics
of smokers vs non-smokers and performed a Two-Sample T test at the 5% significance
level, as run by Minitab:
                      N   Mean  StDev  SE Mean
Age of Smokers       33  20.52   2.46     0.43
Age of Non-Smokers   84  21.94   6.63     0.72
Difference = μ (Age of Smokers) - μ (Age of Non-Smokers)
Estimate for difference: -1.425
95% CI for difference: (-3.092, 0.241)
T-Test of difference = 0 (vs ≠): T-Value = -1.69 P-Value = 0.093 DF = 114
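As a rough check, the T-Value and DF in this output can be recomputed from the rounded summary statistics alone. The sketch below assumes the unpooled (Welch) form of the two-sample t-test, which is consistent with the reported DF of 114 but is an assumption about how the test was run; the helper name `welch_t` is hypothetical. The p-value is left out because it needs the t distribution, which the Python standard library does not provide.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch two-sample t statistic and degrees of freedom from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error contributed by group 1
    v2 = sd2 ** 2 / n2  # squared standard error contributed by group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Smokers: n = 33, mean 20.52, StDev 2.46; non-smokers: n = 84, mean 21.94, StDev 6.63
t_age, df_age = welch_t(20.52, 2.46, 33, 21.94, 6.63, 84)
print(round(t_age, 2), math.floor(df_age))  # -1.69 114
```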
Based on this hypothesis test, the p-value was .093, which is greater than our set
significance level of 5%, or .05. Therefore, we retained the null hypothesis that there is
no difference in ages between smokers and non-smokers as there is not sufficient
evidence for the alternative hypothesis. Using Minitab, we calculated a tally for each
discrete variable along with their frequencies and relative frequencies. These are shown
in the table below:
Tally for: Age (In Years)
Age Count Percent
17 1 0.85
18 22 18.80
19 30 25.64
20 17 14.53
21 14 11.97
22 9 7.69
23 5 4.27
24 6 5.13
25 2 1.71
27 2 1.71
28 1 0.85
29 1 0.85
30 1 0.85
32 1 0.85
36 1 0.85
37 1 0.85
44 1 0.85
50 1 0.85
56 1 0.85
N= 117
The frequencies and tallies show the ages of the students. A majority of the students are
between 18 and 21 years old. This is expected as many students enroll directly from high school
and are expected to take around two to three years to finish their associate’s degree. The most
common ages are 18 and 19 years old, which make up 18.80% and 25.64% respectively. As
shown in the table, there is still diversity among ages, as some students are in their late
twenties, thirties, forties, and fifties. It is interesting to note that the youngest student was 17
years old, which may mean they are in an early college credits program or graduated from high
school early. From this, we can conclude that the students at Raritan Valley Community College
are mostly young adults. To further investigate the ages of the sampled students, we generated a
histogram and boxplot which include summary statistics. Minitab performed this task, as shown
below:
Summary Report for Age (Minitab graphical summary: histogram and boxplot of age, with 95% confidence intervals for the mean and median)
Anderson-Darling Normality Test: A-Squared 14.90, P-Value <0.005
Mean 21.538, StDev 5.795, Variance 33.578, Skewness 3.7576, Kurtosis 16.6596, N 117
Minimum 17.000, 1st Quartile 19.000, Median 20.000, 3rd Quartile 22.000, Maximum 56.000
95% Confidence Interval for Mean: (20.477, 22.600)
95% Confidence Interval for Median: (19.000, 20.000)
95% Confidence Interval for StDev: (5.135, 6.650)
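The summary figures can be recomputed from the age tally with Python's standard library. This is a sketch for checking purposes, not the Minitab procedure itself: the dictionary `age_counts` is simply the tally table retyped, and the 1.5 × IQR fences follow the usual boxplot convention, which is assumed here to be the rule behind the boxplot's outliers.

```python
import statistics

# Age tally retyped from the table: age -> number of students
age_counts = {17: 1, 18: 22, 19: 30, 20: 17, 21: 14, 22: 9, 23: 5, 24: 6,
              25: 2, 27: 2, 28: 1, 29: 1, 30: 1, 32: 1, 36: 1, 37: 1,
              44: 1, 50: 1, 56: 1}
ages = [age for age, count in age_counts.items() for _ in range(count)]

mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)                   # sample standard deviation (n - 1)
q1, median, q3 = statistics.quantiles(ages, n=4)  # default "exclusive" method
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [a for a in ages if a < lower_fence or a > upper_fence]

print(len(ages), round(mean_age, 3), round(sd_age, 3))
print(q1, median, q3, iqr, upper_fence, len(outliers))
```

With these data the upper fence works out to 26.5, so every age of 27 or older is flagged as an outlier.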
Both the histogram and boxplot reflect our findings in the frequency and relative frequency
tables. Since the histogram is right-skewed, we can see that most of the students’ ages are
distributed around 22.5 years of age or less. The boxplot represents this trend as well: it is
right-skewed, as most students are young adults. Unlike the histogram, the boxplot better
displays the outliers, whose ages are much higher and therefore considerably farther away from
the edge of the boxplot. In this case, the IQR is equal to 3 (22 - 19). Outliers are observations
that lie more than 1.5 times this range beyond the quartiles, which here means ages above
22 + 1.5(3) = 26.5. To find further information on the ages, we will perform a t-interval and one
mean t-test, as generated by Minitab:
One-Sample T: Age
Test of μ = 26 vs ≠ 26
Variable N Mean StDev SE Mean 95% CI T P
Age 117 21.538 5.795 0.536 (20.477, 22.600) -8.33 0.000
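The SE Mean and T columns follow directly from the sample mean, standard deviation, and sample size. A minimal sketch, with a hypothetical helper name `one_sample_t`, using the rounded values printed above (so later decimals may differ slightly from Minitab, which works from the raw data):

```python
import math

def one_sample_t(sample_mean, sample_sd, n, mu0):
    """Return (standard error, t statistic) for a one-sample t-test of mean = mu0."""
    se = sample_sd / math.sqrt(n)
    return se, (sample_mean - mu0) / se

se_age, t_one = one_sample_t(21.538, 5.795, 117, 26)
print(round(se_age, 3), round(t_one, 2))  # 0.536 -8.33
```

The same helper applies to any one-sample t-test on a mean by substituting that test's summary values and hypothesized mean.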
From this calculation, we can see that our p-value is 0.000. We have chosen to perform
our hypothesis testing at the 5% significance level, or .05. Because the p-value of 0.000 is less
than the significance level of .05, we reject the null hypothesis in favor of our alternative. So, the
mean age is not equal to 26. When using the confidence interval with a 95% confidence level,
we can be 95% confident that the mean age of students falls between 20.477 and 22.600 years
old.
Question 2: Do you smoke cigarettes? (Circle One)
a. Yes b. No
For this question, the students taking the survey were given the option to circle
“yes” that they currently do smoke cigarettes, and “no” that they do not currently smoke
cigarettes. Before starting our analysis, we researched the proportion of smokers in the
United States to obtain our null and alternate hypotheses. The Centers for Disease
Control and Prevention stated: “In 2014, nearly 17 of every 100 U.S. adults aged 18 years
and older (16.8%) currently smoked cigarettes” (“Adult Cigarette Smoking Hits All-Time
Low”). Using this, we divided 16.8 by 100 in order to convert the percentage to a proportion
(.168). Here are our null and alternative hypotheses for those that smoke cigarettes:
Ho: p=.168
Ha: p<.168
We chose our alternate hypothesis to be less than .168 because of the common
assumption that fewer people today are smoking cigarettes than in the past. This belief is
especially common in regard to young adults, who made up a majority of the students in
Question 1. As mentioned in our introduction, there is an abundance of medical literature
highlighting the negative side effects of cigarette smoking, which could lead the
proportion of smokers to be less than .168. Below is a frequency and relative frequency
table of the responses students gave for this question:
Tally for: Do you smoke cigarettes? (Yes or No)
Do you smoke cigarettes? Count Percent
No 84 71.79
Yes 33 28.21
N= 117
From this table, 84 students out of the 117 surveyed responded in the negative
that they do not smoke cigarettes, making up 71.79%. 33 out of the 117 surveyed
responded that they do smoke cigarettes, making up 28.21%. From the percentages, it
becomes clear that a majority of the RVCC students do not smoke cigarettes, which is
consistent with the common belief that less people today are smoking cigarettes. To
further analyze this, we created a pie chart. For data entry purposes, 0 represents non-
cigarette smokers and 1 represents cigarette smokers:
Our pie chart further validates that many RVCC students are not cigarette smokers. Those
who responded that they do not smoke cigarettes make up more than half of the pie chart.
Although those who responded that they do smoke cigarettes take up a portion of the pie chart,
they still are the minority. We then performed a one-proportion hypothesis test at the 5%
significance level:
One-Proportion Z-Test
Test of p = .168 vs < .168
Variable X N Sample p 95% CI (Exact) P-Value (Exact)
Smoke Cigarettes? 33 117 0.282051 (0.202759, 0.372708) 0.999
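Because the output labels the p-value as exact, it can be approximated by summing binomial probabilities directly. The sketch below is one way to do that with the standard library; `binom_cdf` is a hypothetical helper, and the left-tailed sum reflects the alternative hypothesis p < .168.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x + 1))

x, n, p0 = 33, 117, 0.168
sample_p = x / n
p_value_lower = binom_cdf(x, n, p0)  # left-tailed, because Ha is p < p0
print(round(sample_p, 6), p_value_lower)
```

A left-tailed p-value this close to 1 arises because the sample proportion (about .282) lies above the hypothesized .168, not below it.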
From the results, the exact p-value is 0.999, which is much higher than our set
significance level of 5%, or .05. So, we retained the null hypothesis that the proportion of
cigarette smokers is equal to .168. The data do not provide sufficient evidence in order to adopt
our alternative hypothesis that a proportion of less than .168 smoke cigarettes. From our
confidence interval, we can be 95% confident that the population proportion of those who smoke
cigarettes lies within the range of .202759 and .372708.
Question 3: If so, how many cigarettes per day? (If you don’t smoke, please answer
“ZERO”)
_______________
For this portion of the survey, we asked those who did smoke how many cigarettes they
smoked per day. To obtain an unbiased benchmark for our test, we looked at the national average
reported on the Centers for Disease Control and Prevention website. In their 2014 press release,
they recorded a decline in cigarettes smoked per day from 16.7 to 14.2 (“Adult cigarette smoking
rate overall hits all-time low”). For our purposes, we used their figure of 14.2 as the value in our
null hypothesis.
With the common assumption of this survey that cigarette smoking has declined, the
alternate hypothesis states that RVCC students smoke fewer than 14.2 cigarettes per day.
Ho: μ=14.2
Ha: μ<14.2
Below is a frequency and relative frequency table of the cigarettes smoked per day by
students generated by Minitab:
Tally for: Cigarettes Smoked Per Day
Cigarettes/Day Count Percent
1.0 2 6.06
2.0 2 6.06
3.5 4 12.12
4.0 2 6.06
4.5 2 6.06
5.0 3 9.09
6.5 1 3.03
8.0 1 3.03
10.0 5 15.15
12.0 1 3.03
12.5 1 3.03
14.0 1 3.03
15.0 2 6.06
20.0 5 15.15
25.0 1 3.03
N= 33
As seen in the relative frequency table above, only 9 smokers smoke 14 or more cigarettes
per day, compared to the remaining 24 individuals who smoke fewer than 14 cigarettes per day.
The minimum is one cigarette and the maximum is 25 cigarettes per day. Below is a histogram of
the data:
The following information will be used to either retain the average from the Centers for
Disease Control and Prevention (CDC) or reject it. We have chosen a confidence level of
95% with a 5% significance level, or .05. Here are the results for the One-Sample T Test,
performed by Minitab:
One-Sample T Test: Cigarettes Smoked Per Day
Test of μ = 14.2 vs < 14.2
Variable N Mean StDev SE Mean 95% CI T P
Cigarettes Smoked/Day 33 9.39 6.78 1.18 (6.99, 11.80) -4.07 0.000
The p-value is 0.000, which is much smaller than our set significance level of 5%, or .05.
So, we rejected the null in favor of our alternate hypothesis. At the 5% significance level, the
data provide sufficient evidence to conclude that the cigarette smoking students smoke less than
14.2 cigarettes per day. From this, we can also be 95% confident that the average number of
cigarettes smoked per day is between 6.99 and 11.80.
Question 4: Less people are smoking now than in the past decade. (Please Circle One)
Strongly Disagree | Disagree | Neutral | Agree | Strongly Agree
Students read this statement and had to respond using a Likert Scale. They were given the
choices of “Strongly Disagree,” “Disagree,” “Neutral,” “Agree,” and “Strongly Agree.” For data
entry purposes, each response was given a number, so “Strongly Disagree” is 1, “Disagree” is 2,
and so on. We formulated our null and alternative hypotheses:
Ho: μ = 3 (Neutral)
Ha: μ ≠ 3 (Neutral)
Even though data suggest that fewer people smoke cigarettes now than in past years,
some people may have different views on the current trends of smoking. As a default, we set our
null hypothesis to the response of “Neutral.” Since we have no concrete way of inferring how
students may respond to this question, we set our alternative hypothesis to be not equal to
“Neutral.” Here is a frequency and relative frequency table of the responses students gave:
Likert Response Count Percent
Strongly Disagree (1) 9 7.69
Disagree (2) 21 17.95
Neutral (3) 28 23.93
Agree (4) 50 42.74
Strongly Agree (5) 9 7.69
N= 117
From the table, many students answered “Agree” in response to our statement that less
people are smoking now than in the past decade. 50 out of 117 responders selected this answer,
making up 42.74%. A considerable number of students responded with “Neutral,” making up 28
out of 117 students, or 23.93%. The other categories have a considerable number of
responses as well. Here is a histogram of the students’ responses to the statement:
The histogram is left-skewed as seen by the longer tail on that side, which is logical since
the majority of the students responded with “Agree,” or 4. This better reflects and provides a
visual of the data we found in the frequency and relative frequency table as it shows the
distribution. So, from both the histogram and table, many students agree that less people are
smoking today than in the past decade. Here is the data needed to test our alternative hypothesis
along with our 95% Confidence Interval each from Minitab:
One-Sample T-Test
Variable N Mean StDev SE Mean 95% CI T P
Question 4 117 3.248 1.082 0.100 (3.050, 3.446) 2.48 0.015
Our p-value from the one mean t-test is equal to 0.015, which is less than our significance
level of 5%, or .05. We rejected the null in favor of our alternate hypothesis. At the 5%
significance level, the data provide sufficient evidence to conclude that students’ feelings on our
statement differ from “Neutral.” From the confidence interval, we can be 95% confident that the
mean response lies between 3.050 and 3.446, which falls just above the “Neutral” category (3)
and toward “Agree” (4).
Question 5: How many people in your household smoke?
How many people in your household smoke, and approximately what are their age in the
following table: (If you live alone answer “ZERO” for each)
Relation: Smoke: Yes / No Age (Approximate)
Spouse
Sibling(s)
Children
Parent
Other_______________
According to the surveys we received, a number of students seemed to have had
difficulty answering this question. We ultimately chose not to take the age column into
consideration for our analysis because some responses received for this part were
incomprehensible. To simplify this, we took a tally of the smokers in each student’s household.
The 2010 Census states that the average household size is 3.14, so for our testing purposes we
rounded this number down to 3 (“Household and Families: 2010”). We could not find data stating
the average number of smokers per household, so we tested against the average number of
people in the household regardless of whether or not they smoke cigarettes. Our null hypothesis
is that the mean is 3. Following the declining trends reported by the Centers for Disease Control
and Prevention, we set the alternate hypothesis to less than 3:
Ho: μ = 3
Ha: μ < 3
Before doing further analysis, we must determine whether or not there is a significant
difference between smokers and non-smokers for this question. Here are the results from
Minitab:
Difference = μ (Smokers in House (Smokers)) - μ (Smokers in House (Non-Smokers))
Estimate for difference: 0.269
95% CI for difference: (-0.074, 0.613)
T-Test of difference = 0 (vs ≠): T-Value = 1.58 P-Value = 0.121 DF = 51
Based on the results of this hypothesis test, the p-value is equal to 0.121. This is greater
than our set significance level of 5%, or .05. So, we retained the null hypothesis that the number
of smokers in the household does not differ depending on whether or not the student smokes
cigarettes. We proceeded with the analysis of cigarette smokers in the household with cigarette
smokers and non-cigarette smokers together. Here is a frequency and relative frequency table of
the number of cigarette smokers in the household:
Tally for: People who smoke in household
# of Cigarette Smokers in Household Count Percent
0 69 58.97
1 33 28.21
2 12 10.26
3 3 2.56
N= 117
As seen in the frequency and relative frequency table, over half of the people surveyed had
0 people who smoked in the household, making up 58.97% of the total surveyed. This is
consistent with the belief that fewer people today are smoking cigarettes. Very few of those
surveyed had 2 or more cigarette smokers in their household, and many had only one cigarette
smoker in their household. Here is a histogram of the data:
As seen above, this histogram is skewed to the right, meaning that many of the students
had 1 cigarette smoker in their household or fewer. This is lower than 3 cigarette smokers in the
household, as those who had 0 cigarette smokers accounted for more than half of those surveyed.
The following is a One-Sample T Test at the 5% significance level:
One-Sample T: People who smoke in household
Test of μ = 3 vs < 3
Variable N Mean StDev SE Mean 95% Upper Bound T P
Smokers in House 117 0.5641 0.7811 0.0722 0.6838 -33.73 0.000
Using the p-value approach, the p-value of 0.000 is much lower than .05. We can
therefore reject the null hypothesis and adopt our alternate hypothesis, which states that the
mean number of household members who smoke cigarettes is less than 3.
Question 6: What do you believe to be a trigger for someone to start smoking?
What do you believe to be a trigger for someone to start smoking? (Fill In)
a. Peer Pressure b. Media c. Stress D. Other:___________
Students were asked to select what they believed to be the biggest trigger for someone to
start smoking cigarettes, with the proposed options listed as: “Peer pressure, Media, Stress” and
the option for “other” in case students believed there was another trigger for beginning cigarette
smoking. Due to the lack of statistical knowledge we had at the time of writing this survey, we
neglected to add the instructions to circle only one answer. This error caused many students to
circle more than one answer. Since we had to satisfy the assumptions for the one-way ANOVA
testing, we decided to create an additional two groups. So, our new groups were “Peer Pressure,”
“Media,” “Stress,” “Two Triggers,” and “Three Triggers.” In order to complete the data, we
reviewed all the surveys and, for those who wrote an answer for “Other,” categorized the answer
into one of the original groups of “Peer Pressure,” “Media,” and “Stress.” For example, in survey
34 the individual stated: “People I work with smoke,” so we grouped this in “Peer Pressure.”
Below we will examine what students at Raritan Valley Community College believed to be the
biggest trigger for someone to start smoking. Three of the surveys were not taken into
consideration for this question’s analysis since they did not provide an answer, so our sample
was reduced to 114 for this question and for the ANOVA investigations we performed after it:
Tally for: Biggest Trigger for Starting Smoking Cigarettes
Smoking Triggers  Count  Percent
Media                 3     2.63
Peer Pressure        38    33.33
Stress               44    38.60
Three Triggers       14    12.28
Two Triggers         15    13.16
N=                  114
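The Percent column of the tally is each count divided by the 114 usable responses. As a quick check, here is a small Python sketch that recomputes the relative frequencies from the counts in the table. The dictionary name `counts` is ours.

```python
# Counts copied from the tally table above.
counts = {
    "Media": 3,
    "Peer Pressure": 38,
    "Stress": 44,
    "Three Triggers": 14,
    "Two Triggers": 15,
}

total = sum(counts.values())

# Relative frequency of each group, as a percentage rounded to two decimals.
percents = {trigger: round(100 * n / total, 2) for trigger, n in counts.items()}

for trigger, pct in percents.items():
    print(f"{trigger:<15}{counts[trigger]:>5}{pct:>8.2f}")
print(f"{'N=':<15}{total:>5}")
```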
With the information from the frequency table we can see that the two leading triggers, as
believed by the students surveyed at Raritan Valley Community College, are “Stress,” making up
38.6%, and following closely behind, “Peer Pressure,” making up 33.33%. This did not come as
a surprise, as it is generally believed that smoking cigarettes is a way to cope with stress. Below
is a pie graph of the data, which provides a visual representation of how students answered this
question:
What did come as a surprise in this study was that only 2.63% of students believed that
“Media” was a sole trigger to begin cigarette smoking. This is shocking since media plays such
an integral role in present society and includes television ads and films showing cigarette
smoking. In fact, the American Cancer Society has stated the following regarding the media and
its influence on smoking: “Tobacco use is also shown in video games, online, and on TV. And
movies showing smokers are another big influence. Studies show that young people who see
smoking in movies are more likely to start smoking” (“Why People Start Smoking and Why It’s
Hard to Stop”). Given this, it’s surprising to see that this category made up such a small
percentage.
Question 7: What is your gender?
What is your gender? (Circle one)
a. Male b. Female c. Other
Students had to respond to this question with their gender. We chose to include “Other”
as one of the options since somebody may not identify as either “Male” or “Female.” However,
nobody responded with “Other.” Because of this, we eliminated it from our observations as it
would have no effect on the final data. First, we had to see if the proportion of cigarette
smokers differed between the genders. Below is the information we needed to
perform the two-proportion Z test:
Sample                     X   N  Sample p
male (Smoke) (1 = yes, 0  24  61  0.393443
female (Smoke) (1 = yes,   9  56  0.160714
Difference = p (male (Smoke) (1 = yes, 0= no)) - p (female (Smoke) (1 = yes, 0= no))
Estimate for difference: 0.232728
95% CI for difference: (0.0769034, 0.388553)
Test for difference = 0 (vs ≠ 0): Z = 2.93 P-Value = 0.003
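The Z value and confidence interval in the output above can be reproduced from the counts alone. The sketch below is a hand calculation in Python, not Minitab's own code. It assumes an unpooled standard error (each group's proportion estimated separately), which is the choice that reproduces Z = 2.93 here; a pooled standard error would give a somewhat smaller Z. The function names are ours.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_proportion_z(x1, n1, x2, n2, z_crit=1.959964):
    """Two-sided two-proportion Z test with an unpooled standard error (assumption)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = diff / se
    p_value = 2 * (1 - normal_cdf(abs(z)))
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, z, p_value, ci

# 24 of 61 males and 9 of 56 females reported smoking.
diff, z, p_value, ci = two_proportion_z(24, 61, 9, 56)
print(f"difference = {diff:.6f}, Z = {z:.2f}, p-value = {p_value:.3f}")
print(f"95% CI = ({ci[0]:.6f}, {ci[1]:.6f})")
```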
Because our p-value is 0.003 and our significance level is 5%, or .05, we rejected the null
hypothesis that there is no difference between the genders of students in their
cigarette smoking status. So, at the 5% significance level, the data provide us with sufficient
evidence to conclude that a difference exists between males and females in terms of their
cigarette smoking status. We will analyze the data for only the cigarette smokers since the
hypothesis test results were significant. Below is a frequency and relative frequency table of
the genders of cigarette smokers:
Gender of Smokers  Count  Percent
Female                 9    27.27
Male                  24    72.73
N=                    33
Males made up the majority of the cigarette smoking students. 24 out of 33 were
males, making up 72.73% of the smokers. Only 9 out of 33 were female, making up 27.27% of
the smokers. Below is a pie chart of the genders of cigarette smoking students:
[Pie chart: Cigarette Smoking Students (By Gender)]
We researched information on cigarette smoking and gender from the World Health
Organization (WHO). WHO had run their own study and found the following: “Globally, about
40% of men smoke as compared with nearly 9% of women. However, the epidemic of tobacco
use among women is increasing in some countries” (“10 Facts on Gender and Tobacco”). Since
WHO is a worldwide recognized institution, we used their proportion of 0.09 for our hypothesis
testing. This gave us our null hypothesis that 9% of women surveyed smoke. For our alternate
hypothesis, we wanted to see if there is a difference from their study, which is why our
alternate hypothesis was set to not equal to .09. When we ran the test in Minitab, “Female” was
the specified attribute (or “Event”):
Ho: p = 0.09
Ha: p ≠ 0.09
Test and CI for One Proportion: Female Cigarette Smokers
Test of p = 0.09 vs p ≠ 0.09
Event = 1
Variable           X   N  Sample p  95% CI                P-Value (Exact)
Gender (X=Female)  9  33  0.272727  (0.132996, 0.455238)  0.002
From the calculated p-value, which is 0.002, the data are significant at 5%, or .05. We had
sufficient evidence to conclude that there is a difference between our proportion of female
cigarette smokers and the statistic reported by WHO. According to WHO, “…40% of men
smoke compared with 9% of women” (“10 Facts on Gender and Tobacco”). From the confidence
interval, we can be 95% confident that the proportion of females who smoke cigarettes is
between .132996 and .455238. Using this, we formulated null and alternate hypotheses for male
cigarette smokers. We chose a not-equal-to alternate hypothesis in order to find out if there is a
difference in the proportion of male smokers compared to WHO:
Ho: p = 0.40
Ha: p ≠ 0.40
Here are the calculations for the one-proportion test, with “Male” set as the attribute,
or “Event”:
Test and CI for One Proportion: Male Cigarette Smokers
Test of p = 0.4 vs p ≠ 0.4
Event = 1
Variable                X   N  Sample p  95% CI                P-Value (Exact)
Male Cigarette Smokers  24  33  0.727273  (0.544762, 0.867004)  0.000
The calculated test results are above. The p-value is 0.000, which is lower than our set
significance level of 5%, or .05. We reject the null and adopt the alternate hypothesis that there is
a difference between the proportion of male cigarette smoker students and the data for males
reported by WHO. From the confidence interval, we can be 95% confident that the proportion of
cigarette smoking males is between .544762 and .867004.
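The exact p-values reported for the two one-proportion tests above come from the binomial distribution rather than a normal approximation. Here is a minimal Python sketch of one common way to compute a two-sided exact p-value: add up the probabilities of every outcome that is no more likely than the observed count under the null proportion. That this is the same rule Minitab applies is our assumption, and the function names are ours.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k events in n trials with event probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_two_sided_p(x, n, p0):
    """Sum the probabilities of all outcomes no more likely than x under p0."""
    pmf = [binom_pmf(k, n, p0) for k in range(n + 1)]
    cutoff = pmf[x] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(prob for prob in pmf if prob <= cutoff))

# 9 female and 24 male smokers among the 33 smokers, against the WHO proportions.
p_female = exact_two_sided_p(9, 33, 0.09)
p_male = exact_two_sided_p(24, 33, 0.40)
print(f"female: {p_female:.3f}, male: {p_male:.3f}")
```

Both values fall well below .05, in line with the rejections described above.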
Question 8: Do you believe it is a possibility that smoking cigarettes can be eventually
eliminated? (Circle One)
a. Possible
b. Impossible
Students were asked to circle the option they believed to be true. Here are our null and
alternative hypotheses for this proportion question:
Ho: p = .50
Ha: p ≠ .50
Like Question 4, we set our hypotheses as such because there is no concrete way to
predict how students will feel about this question. So, we assumed by default that a proportion
of .50 would feel that putting an end to cigarette smoking is “Possible” and set our alternate
hypothesis to be not equal to, since students may feel one way or the other about this question.
Here is a frequency and relative frequency table of the students’ responses:
Response    Count  Percent
Impossible     56    47.86
Possible       61    52.14
N=            117
Most of the students responded that it is “Possible” that smoking cigarettes will eventually be
eliminated, with 61 out of 117 responding this way. This made up 52.14%. However, a
considerable number responded that the prospect of eliminating cigarette smoking was
“Impossible.” They made up 56 out of 117, or 47.86%. From this table, the students’ responses
did not lean dramatically toward either of the responses. Here is a pie chart of the data:
The pie chart reflects the same trend as the frequency and relative frequency table.
Slightly more students responded that it was “Possible” to end cigarette smoking, while slightly
fewer answered “Impossible” to our question. As seen by the slices of the pie graph, the students
did not lean drastically toward any one answer. Here are the data needed to perform a
hypothesis test at the 5% significance level along with the 95% Confidence Interval:
                      X    N  Sample p  95% CI                P-Value
Possible/Impossible  61  117  0.521368  (0.427057, 0.614570)  0.712
The p-value is .712, which is greater than our set significance level of 5%, or .05. We retain
the null that the proportion of students answering “Possible” is equal to .50. At the
5% significance level, the data do not provide sufficient evidence to conclude that the
proportion of students who believe cigarette smoking can be eliminated differs from .50. From
the Confidence Interval, we can be 95% confident that the proportion of students answering that
cigarette smoking can be eliminated is between .427057 and .614570.
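An exact confidence interval for a proportion can be found by searching for the two proportions at which the observed count sits right at the edge of the binomial tails. The Python sketch below does this by bisection for 61 “Possible” answers out of 117. It assumes the interval in the output is the Clopper-Pearson exact interval, which is our reading of the output and not something the output states. The function names are ours.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for a binomial with n trials and event probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def solve_decreasing(f, target):
    """Bisect on [0, 1] for f(p) == target, where f decreases as p grows."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion."""
    # Lower limit: P(X >= x | p) = alpha/2, i.e. P(X <= x - 1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

lower, upper = clopper_pearson(61, 117)
print(f"95% CI = ({lower:.6f}, {upper:.6f})")
```

Since .50 lies inside the interval, the interval agrees with the decision to retain the null.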
One-Way ANOVA Investigations: Numerical and Proportion
Before performing the ANOVA investigations, we had to eliminate three rows since the
students did not provide a response for question 6 about the triggers for starting cigarette
smoking. Because of this, we also had to eliminate the data from those surveys for what we were
testing against. For example, when we performed the investigation of Age versus the Trigger, we
had to omit the age data for those students who did not respond to question 6 in order to prevent
confusion. This reduced our sample size from 117 to 114 for these investigations. We ran
ANOVA tests for all of our numerical and proportional questions. Below are the ones that had a
significant p-value. If none of the tests yielded significance, we simply chose the investigation
that had the lowest p-value:
One-way ANOVA: Age versus Triggers for Starting Smoking
Method
Null hypothesis         All means are equal
Alternative hypothesis  At least one mean is different
Significance level      α = 0.05
Rows unused             3
Equal variances were assumed for the analysis.
Factor Information
Factor                Levels  Values
QUESTION 6 (Trigger)       5  Media, Peer Pressure, Stress, Three Triggers, Two Triggers
Analysis of Variance
Source                 DF  Adj SS  Adj MS  F-Value  P-Value
QUESTION 6 (Trigger)    4   189.6   47.41     1.62    0.175
Error                 109  3192.9   29.29
Total                 113  3382.5
Model Summary
S        R-sq   R-sq(adj)  R-sq(pred)
5.41225  5.61%  2.14%      0.00%
Means
QUESTION 6 (Trigger)   N    Mean  StDev  95% CI
Media                  3   21.00   0.00  (14.81, 27.19)
Peer Pressure         38   22.82   8.29  (21.08, 24.56)
Stress                44  20.477  2.246  (18.860, 22.094)
Three Triggers        14  19.286  1.978  (16.419, 22.153)
Two Triggers          15   22.33   5.21  (19.56, 25.10)
Pooled StDev = 5.41225
The p-value calculated from this investigation was 0.175. This is not significant since our
chosen significance level was 5%, or .05. The data do not provide sufficient evidence to
conclude that the mean age varies depending on what students believe to be a trigger for starting
cigarette smoking.
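The F-Value, S, and R-sq in the Age versus Trigger output all follow from the sums of squares and degrees of freedom in the Analysis of Variance table. The short Python sketch below recomputes them as a check. The function name `anova_summary` is ours, and the p-value is not recomputed because it needs the F distribution, which the Python standard library does not provide.

```python
import math

def anova_summary(ss_factor, df_factor, ss_error, df_error):
    """Recompute F, pooled standard deviation S, and R-sq from an ANOVA table."""
    ms_factor = ss_factor / df_factor  # mean square between groups
    ms_error = ss_error / df_error     # mean square within groups
    return {
        "F": ms_factor / ms_error,
        "S": math.sqrt(ms_error),
        "R-sq": ss_factor / (ss_factor + ss_error),
    }

# Values copied from the Age versus Trigger Analysis of Variance table.
age = anova_summary(ss_factor=189.6, df_factor=4, ss_error=3192.9, df_error=109)
print(f"F = {age['F']:.2f}, S = {age['S']:.4f}, R-sq = {100 * age['R-sq']:.2f}%")
```

An F close to 1 means the variation between the trigger groups is about the same size as the variation within them, which is why the test is not significant.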
One-way ANOVA: Do You Smoke Cigarettes versus Triggers for Starting Cigarette Smoking
Method
Null hypothesis         All means are equal
Alternative hypothesis  At least one mean is different
Significance level      α = 0.05
Rows unused             3
Equal variances were assumed for the analysis.
Factor Information
Factor                Levels  Values
QUESTION 6 (Trigger)       5  Media, Peer Pressure, Stress, Three Triggers, Two Triggers
Analysis of Variance
Source                 DF  Adj SS  Adj MS  F-Value  P-Value
QUESTION 6 (Trigger)    4   1.121  0.2803     1.37    0.250
Error                 109  22.326  0.2048
Total                 113  23.447
Model Summary
S         R-sq   R-sq(adj)  R-sq(pred)
0.452579  4.78%  1.29%      0.00%
Means
QUESTION 6 (Trigger)   N      Mean     StDev  95% CI
Media                  3  0.000000  0.000000  (-0.517881, 0.517881)
Peer Pressure         38    0.2105    0.4132  (0.0650, 0.3560)
Stress                44    0.3636    0.4866  (0.2284, 0.4989)
Three Triggers        14     0.429     0.514  (0.189, 0.668)
Two Triggers          15     0.200     0.414  (-0.032, 0.432)
Pooled StDev = 0.452579
The p-value for this test equated to 0.250. Given our significance level of 5%, or .05, we
had to retain the null. The data do not provide sufficient evidence to conclude that the mean
proportion of students who smoke cigarettes is influenced by what they believe to be a trigger to
start smoking.
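Because the smoking response is coded 1 = yes and 0 = no, this ANOVA can be rebuilt from the number of smokers in each trigger group. The Python sketch below does so. The smoker counts (0, 8, 16, 6, 3) are an assumption: they are not listed in the output, and we recovered them by multiplying each group mean by its N and rounding. Variable names are ours.

```python
# (smokers, group size) per trigger group.
# Smoker counts are assumed: each is the reported group mean times N, rounded.
groups = {
    "Media": (0, 3),
    "Peer Pressure": (8, 38),
    "Stress": (16, 44),
    "Three Triggers": (6, 14),
    "Two Triggers": (3, 15),
}

n_total = sum(n for _, n in groups.values())
smokers = sum(k for k, _ in groups.values())
grand_mean = smokers / n_total

# For 0/1 data the sum of squared responses equals the number of smokers.
ss_total = smokers - n_total * grand_mean**2
ss_between = sum(n * (k / n - grand_mean) ** 2 for k, n in groups.values())
ss_error = ss_total - ss_between

df_between = len(groups) - 1
df_error = n_total - len(groups)
f_value = (ss_between / df_between) / (ss_error / df_error)

print(f"SS factor = {ss_between:.3f}, SS error = {ss_error:.3f}, F = {f_value:.2f}")
```

With these assumed counts the sums of squares and F-Value agree with the output to the digits shown.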
Regression Investigation: Numerical Data
To perform the regression analysis, we ran each of our numerical questions to see if one
the variables were associated. It was appropriate to only consider the data exclusive to cigarette
smokers for these investigations. For example, we only looked at the ages of the cigarette
smokers and the people who smoke cigarettes in the cigarette smoker’s household. Upon
calculating the p-values for each regression investigation, we found the variables of the ages of
smokers was associated with the amount of smokers in their household. Below are the null and
alternate hypotheses and the results of the test:
Ho: β = 0, meaning age of the smokers (x) is not useful for predicting the number of cigarette smokers in the smoker’s household (y)
Ha: β ≠ 0, meaning age of the smokers (x) is useful for predicting the number of cigarette smokers in the smoker’s household (y)
Regression Analysis: Smokers in House Smokers versus Ages of Smokers
Analysis of Variance
Source             DF  Adj SS  Adj MS  F-Value  P-Value
Regression          1   2.995  2.9954     4.41    0.044
  Ages of Smokers   1   2.995  2.9954     4.41    0.044
Error              31  21.065  0.6795
  Lack-of-Fit       6   2.932  0.4886     0.67    0.672
  Pure Error       25  18.133  0.7253
Total              32  24.061
Model Summary
S         R-sq    R-sq(adj)  R-sq(pred)
0.824332  12.45%  9.63%      1.26%
Coefficients
Term               Coef  SE Coef  T-Value  P-Value   VIF
Constant          -1.79     1.22    -1.46    0.153
Ages of Smokers  0.1242   0.0591     2.10    0.044  1.00
Regression Equation
Smokers in House Smokers = -1.79 + 0.1242 Ages of Smokers
Fits and Diagnostics for Unusual Observations
     Smokers in House
Obs           Smokers    Fit  Resid  Std Resid
 12             3.000  0.569  2.431       3.01  R
 19             2.000  1.811  0.189       0.30     X
 20             3.000  1.315  1.685       2.20  R
R  Large residual
X  Unusual X
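Each Fit in the table above is the regression equation evaluated at that smoker’s age, and each Resid is the observed value minus the fit. The Python sketch below illustrates this. The ages 19, 29, and 25 are hypothetical: they are not shown in the output, and we chose them because they give fits close to the reported ones. Small differences in the last digit come from using the rounded coefficients. The function name is ours.

```python
def predict_household_smokers(age, intercept=-1.79, slope=0.1242):
    """Fitted number of household smokers from the reported regression equation."""
    return intercept + slope * age

# (observation number, hypothetical age, observed household smokers)
rows = [(12, 19, 3.0), (19, 29, 2.0), (20, 25, 3.0)]

results = {}
for obs, age, observed in rows:
    fit = predict_household_smokers(age)
    results[obs] = (fit, observed - fit)  # residual = observed - fitted
    print(f"Obs {obs}: fit = {fit:.3f}, resid = {observed - fit:.3f}")
```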
The p-value is 0.044, which is less than our set significance level of 5%, or .05. So, we
reject the null in favor of the alternative. At the 5% significance level, the data provide sufficient
evidence to conclude that the ages of the cigarette smokers are useful for predicting the number
of cigarette smokers in their household. Below is a scatter plot of the data, which includes R²:
[Fitted line plot: Smokers in House Smokers = -1.790 + 0.1242 Ages of Smokers. x-axis: Ages of Smokers; y-axis: Smokers in House Smokers. S = 0.824332, R-Sq = 12.4%, R-Sq(adj) = 9.6%]
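The S, R-Sq, and R-Sq(adj) values shown with the fitted line come straight from the sums of squares in the regression Analysis of Variance table. The Python sketch below recomputes them, along with the F-Value and the slope’s T-Value, as a check. The variable names are ours, and the inputs are the rounded figures from the output, so T squared only roughly equals F.

```python
import math

# Values copied from the regression Analysis of Variance and Coefficients tables.
ss_regression, ss_error, ss_total = 2.995, 21.065, 24.061
df_error, df_total = 31, 32
slope, se_slope = 0.1242, 0.0591

ms_error = ss_error / df_error
s = math.sqrt(ms_error)                              # standard error of the regression
r_sq = ss_regression / ss_total                      # share of variation explained
r_sq_adj = 1 - ms_error / (ss_total / df_total)      # adjusted for degrees of freedom
f_value = ss_regression / ms_error                   # one predictor, so MS regression = SS
t_slope = slope / se_slope                           # T-Value for the slope

print(f"S = {s:.4f}, R-sq = {100 * r_sq:.2f}%, R-sq(adj) = {100 * r_sq_adj:.2f}%")
print(f"F = {f_value:.2f}, T = {t_slope:.2f}")
```

An R-sq near 12% means roughly 88% of the variation in household smokers is left unexplained by age.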
Based on this graph, the data points are only loosely clustered around the line of best
fit. R² is equal to 12.4%, meaning that only 12.4% of the variation in the number of cigarette
smokers in a cigarette smoker’s household is explained by the age of the cigarette smoker. To
further examine the relationship between the two variables, we will examine the residual plot
vs. predictor and the normal probability plot of the residuals:
[Residual Plots for Smokers in House Smokers. Panels: Normal Probability Plot (Percent vs. Standardized Residual); Versus Fits (Standardized Residual vs. Fitted Value); Histogram (Frequency vs. Standardized Residual); Versus Order (Standardized Residual vs. Observation Order)]
From the Normal Probability plot, it is clear that the points do not fall along a straight line,
so the residuals do not appear to have a normal distribution. Although this violates an
assumption for regression testing, we needed to continue for the purposes of our test. For the
residual plot vs. predictor, the residuals are not scattered evenly around the y=0 line, and it is
difficult to tell which assumptions from the textbook it violates.
Conclusion:
As we mentioned in the beginning, the primary focus of this study was to see if the
prevalence of smoking cigarettes in the United States population is actually decreasing, and what
the attitudes of students are in regard to the future of cigarette smoking. From this
study, we were able to gain some insight into the habits of the students surveyed
at Raritan Valley Community College. When it came to the age of the students surveyed, there
was no difference in age between smokers and nonsmokers. We had predicted a mean age
of 26, but this proved to be untrue, as the mean age of those surveyed was under 22.5. When
singling out the proportion of those who smoked vs. those who did not, the national proportion
stated by the CDC was 16.8% of the population. When running the test for this study, we found
that 28.21% of everyone surveyed smoked, and the hypothesis
test led us to retain the null hypothesis. On the other hand, we were extremely happy to see that
those who do smoke smoked fewer cigarettes per day than the national average of 14.2. RVCC
students who smoke are smoking between 6.9891 and 11.79888 (7 and 12 rounded) cigarettes per day.
Additionally, a great majority of the surveyed students who smoked were male. The WHO states
that an overall 9% of women smoke, and in our study we found sufficient evidence that the
proportion of female cigarette smoking students was not equal to .09 (9%). In our analysis
regarding the number of people who smoked in the household, we were surprised to see that a
majority of the students (58.97%) surveyed had no people (0) who smoked in their household,
with the mean falling between 0.421 and 0.707 cigarette smokers per household. At the
completion of this survey, the triggers for smoking as answered by those surveyed proved to be
a shock. Media had only 2.63% of the students’ responses, a number we found surprisingly low
considering how involved the millennial generation is with technology and the access they have
to social media. When questioning students on whether fewer people are smoking now than in the
past decade, our hypothesized response of “Neutral” was rejected after running the analysis, and
we found the response to be not equal to “Neutral.” Based on our tallies, pie chart, and
histogram, many students answered “Agree.” One of the things we were pleased to see was the
responses students had to the future of cigarette smoking. We proposed the question of whether
or not cigarette smoking could eventually be eliminated, and were glad to see that 52.14% of the
students surveyed answered that it would be possible, with the proportion falling between
.427057 and .614570. Following our ANOVA analysis, we found that for the proportion and
numerical investigations having the lowest p-values, the tests did not provide sufficient
evidence that any of the means differ. For the regression, our data violated the key assumptions
needed to run a regression hypothesis test. However, we did so anyway and found that the age of
the cigarette smoking students was mildly useful for predicting the number of cigarette smokers
present in a cigarette smoker’s household.
Our sampling design did come with a lot of bias, simply because of the difficulty in
obtaining a sufficient number of cigarette smokers from the classes surveyed. Because we had to
approach students, this introduced bias into our study. While we were glad to see that most
students in the classes we selected did not smoke, it created a problem in terms of running the
analysis.
To conclude this survey, and ultimately answer the question on the prevalence of
smoking at Raritan Valley Community College, we were able to determine that there was
not sufficient evidence to conclude that RVCC students are smoking less than the national
average, but those who did smoke actually smoked fewer cigarettes per day than the national
average reported by the CDC. In addition, our hypothesis testing and tallies show that RVCC
students hold an optimistic view that fewer people are smoking cigarettes today, and many feel
that smoking can eventually be eliminated.
Appendix
CIGARETTE SMOKING
Questions
1. How old are you (In years)?______________
2. Do you smoke cigarettes? (Circle one)
a. yes b. no
3. If so, how many cigarettes per day? (If you don’t smoke, please answer “ZERO”)
____________________
4. Less people are smoking now than in the past decade. (Please circle one)
Strongly Disagree Disagree Neutral Agree Strongly Agree
5. How many people in your household smoke, and approximately what are their age in the following table: (If you live alone answer “ZERO” for each)
Relation: Smoke: Yes / No Age (Approximate)
Spouse
Sibling(s)
Children
Parent
Other_______________
6. What do you believe to be a trigger for someone to start smoking?
a. Peer Pressure b. Media c. Stress d. Other:___________ (Fill In)
7. What is your gender? (Circle one)
a. Male b. Female c. Other
8. Do you believe it is a possibility that smoking cigarettes can be eventually eliminated? (Circle one)
a. Possible b. Impossible
Works Cited
"10 Facts on Gender and Tobacco." World Health Organization. WHO, 2010. Web. 6 Dec. 2016.
"Adult Cigarette Smoking Rate Overall Hits All-time Low." Centers for Disease Control and
Prevention. Centers for Disease Control and Prevention, 26 Nov. 2014. Web. 06 Dec.
2016.
"Facts about RVCC." Raritan Valley Community College. N.d. Web. 6 Dec. 2016.
http://www.raritanval.edu/about/foundation/facts.html
"Households and Families: 2010." Census. U.S. Census Bureau, Apr. 2012. Web. 6 Dec. 2016.
"Why People Start Smoking and Why It’s Hard to Stop." Why People Start Using Tobacco, and
Why It's Hard to Stop. American Cancer Society, 13 Nov. 2015. Web. 06 Dec. 2016.