DCP EPIDEMIOLOGY PROJECT

Abstract

Topic:

The relation between smoking and state of health/lifestyle.

Authors:

Alex Luojos, Axel Linnovaara, Ashwath Venkatasubramanian, Sergei Gordienko.

Aim of the project:

The aim of our project is to investigate the relationship between smoking and health state.

Hypotheses:

1. Current smokers are sick more often than former smokers, and former smokers are sick more often than never smokers.

2. Current smokers are the least physically active, former smokers are more physically active, and never smokers are the most physically active.

3. Current smokers consume the most alcohol, former smokers consume an intermediate amount, and never smokers consume the least alcohol.

These hypotheses were based on everyday observations of smokers and non-smokers.

Our target population is people aged 18-30. The data will be collected with a questionnaire distributed through social networks. We expect to collect data from approximately 100 people.

We will collect data regarding:

1. Age

2. Sex

3. Smoking habits

4. Weight/height

5. Sport activity (h/week)

6. Cases of being sick (last 6 months)

7. Alcohol consumption (restaurant portions/week)

8. Personal judgment of health state (1-5)

9. Duration of smoking habit

The survey is included in the appendix of this project.

Introduction

Smoking is the single most important preventable health risk in the developed world. It is
also an important cause of premature death worldwide. Smoking causes a wide range of diseases,
including cancers, chronic obstructive pulmonary disease, coronary heart disease and stroke.
There is also great interest in studying unhealthy lifestyles and their causes, because smoking-
and alcohol-related health problems are a major economic and social burden on modern society.
In our research we studied the relation between health state and smoking habits.

The main aim of the project was to understand the relationship between smoking and the health
state of an individual. We also studied whether smokers use alcohol more often. The study was
carried out using an internet survey that was sent to Erasmus and medical school students in
Tartu. The study material was analyzed with the program “Stata”. To be able to understand the
lifestyle and health state of our sample, we collected this data through the questionnaire as well.

Course of work and methods

To collect data, we decided that an online survey would be the most appropriate method, as
many people can access it. Furthermore, opening the survey to a wide range of people introduces
a certain degree of randomization, which helps to reduce bias. The survey comprised 10 questions
designed to obtain the relevant information about the lifestyle and smoking habits of the
subjects. The survey was created and launched on the “surveymonkey.com” platform and spread
through social networks. Overall, 104 exchange and degree students of Tartu University, of
different ages and nationalities, took part in the survey. The raw data collected was later
processed, as described in the next section. Before that, however, the sample was profiled.
See figures 1-6.

Figure 1. Gender distribution of the sample.

Figure 2. Age distribution of the sample

Figure 3. Weight (in kg) distribution of the sample

Figure 4. Height (in cm) distribution of the sample

Figure 5. Smoking habit distribution of the sample

Figure 6. Health self-evaluation distribution amongst the sample

Health state evaluation by subjects.

1. My health state is dangerously poor (no results for this category).

2. My health state is poor but it is not a danger, however it still needs improvement.

3. My health state is not too bad however it is not too good either.

4. My health state is quite good.

5. My health state is extremely good, it can’t get much better.

After reviewing the general profile of the sample, we carried out a more detailed analysis to
check the three hypotheses. Given the distribution of the data obtained, the main methods applied
were Fisher’s exact test and the Kruskal-Wallis test. More detailed information about the data
processing, including graphs and test results, can be found in the next section of the report.

Data processing and results

To process our data we used the program “Stata”, which allowed us to process a large amount
of data quickly. Our first step was to determine which statistical tests suited our data. We
therefore created histograms of the variables that are part of our hypotheses to show how the
data are distributed, since approximately normally distributed data would point to the use of
a t-test. Refer to figures 7, 8 and 9.

Figure 7. Histogram showing distribution of the number of times sick in the last 6 months

Figure 8. Histogram showing distribution of hours of exercise per week

Exercising habits of subjects.

1. 0-2 hours per week

2. 3-5 hours per week

3. 6-8 hours per week

4. 9-11 hours per week

5. More than 11 hours per week

Figure 9. Histogram showing distribution of alcohol consumption (portions per week)

As the graphs above show, none of the variables is normally distributed. Therefore, the t-test
is not appropriate. Instead, we decided to use the Kruskal-Wallis test, along with Fisher’s exact
test for tabulated data. We chose Fisher’s exact test over the chi-squared test because our data
set is small, and the chi-squared approximation is unreliable when cell counts are low.

The first hypothesis to be tested is that current smokers are sick more often than former
smokers, and former smokers are sick more often than never smokers. For this the Kruskal-Wallis
test can be used. It is a rank-based test of whether the samples originate from the same
distribution, that is, whether the grouping variable has a significant effect on the results.
It is similar to a t-test in that we seek to reject the null hypothesis in favour of the
alternative hypothesis. We reject the null hypothesis only when the p-value (the value
resulting from the test) is less than or equal to 0.05.

H0 (null hypothesis) - There is no significant difference between the specified populations; any observed difference is due to sampling or experimental error.

H1 (alternative hypothesis) - The observations are the result of a real effect; the variable in question has a significant effect.
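To make the test concrete, here is a minimal sketch of the Kruskal-Wallis calculation in plain Python. It is not the Stata routine used in this project. The function name `kruskal_wallis` and the three small samples are assumptions made for illustration only; the samples are invented and are not the survey responses. The closed-form p-value relies on the fact that, with three groups, the statistic is compared against a chi-squared distribution with 2 degrees of freedom.

```python
import math
from collections import Counter


def kruskal_wallis(groups):
    """Return (H, p) for the Kruskal-Wallis test on exactly three samples."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value below assumes three groups")
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)

    # Give every distinct value the average of the ranks it occupies,
    # so that tied observations share one mid-rank.
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # H statistic built from the rank sum of each group.
    rank_term = sum(sum(ranks[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Standard correction for ties.
    correction = 1 - sum(t**3 - t for t in Counter(pooled).values()) / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction

    # Survival function of the chi-squared distribution with 2 d.f.
    p = math.exp(-h / 2)
    return h, p


# Hypothetical illustration data (NOT the survey responses):
# number of times sick in the last 6 months for each smoking group.
never = [0, 1, 1, 2, 3]
former = [1, 1, 2, 2, 4]
current = [0, 2, 2, 3, 5]

h, p = kruskal_wallis([never, former, current])
print(f"H = {h:.3f}, p = {p:.4f}")
```

If the printed p-value is greater than 0.05, H0 is not rejected, following the rule stated above.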

The p-value from the Kruskal-Wallis test of sickness by smoking was 0.3768. Since this p-value
is greater than 0.05, we cannot reject the null hypothesis: we found no significant effect of
smoking on the number of times fallen sick. A box and whisker plot can also be generated to
show the relationship between the two variables. See figure 10.

Figure 10. Box and whisker plot showing relationship between smoking and the number of times fallen sick in the last 6 months.

X-axis (smoking): 0 - Never smokers, 1 - Former smokers, 2 - Current smokers

This box and whisker plot also shows no clear relationship, as the medians of all smoking
categories are about the same. If the hypothesis held true, one would expect never smokers to
have the lowest median, former smokers the second highest median and current smokers the
highest median.

To make sure, we also applied Fisher’s exact test to this hypothesis. However, as Fisher’s
exact test requires two categorical variables, we had to recode our sickness variable into two
categories: fallen sick in the last 6 months and not fallen sick in the last 6 months. After
this new variable was generated, we tabulated the data. See table 1.

Table 1. Table showing frequency of falling sick amongst current smokers, former smokers and never smokers (count and row percentage).

Smoking               Not fallen sick   Fallen sick     Total
0 (never smokers)     15 (25.86%)       43 (74.14%)      58 (100.00%)
1 (former smokers)     3 (20.00%)       12 (80.00%)      15 (100.00%)
2 (current smokers)    8 (25.81%)       23 (74.19%)      31 (100.00%)
Total                 26 (25.00%)       78 (75.00%)     104 (100.00%)
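The row percentages in Table 1 can be recomputed from the counts alone. The short sketch below does this in plain Python; the names `table1` and `row_percentages` are chosen here for illustration and are not part of the original analysis.

```python
# Counts from Table 1: (not fallen sick, fallen sick) for each smoking group.
table1 = {
    "never smokers": (15, 43),
    "former smokers": (3, 12),
    "current smokers": (8, 23),
}


def row_percentages(counts):
    """Return each count as a percentage of the row total, to 2 decimals."""
    total = sum(counts)
    return [round(100 * c / total, 2) for c in counts]


for group, counts in table1.items():
    print(group, sum(counts), row_percentages(counts))

# Column totals give the "Total" row of the table.
column_totals = [sum(col) for col in zip(*table1.values())]
print("total", sum(column_totals), row_percentages(column_totals))
```

The proportion that had fallen sick is close to 75% in all three groups, which is why the table on its own already suggests no difference between them.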

Fisher’s exact test is similar to the Kruskal-Wallis test in that it is based on a null and an
alternative hypothesis. The same rule applies: the p-value has to be less than or equal to 0.05
for us to reject the null hypothesis and conclude that there is a statistically significant
effect.

The p-value from Fisher’s exact test is 0.952. This means that we do not have enough evidence
to reject the null hypothesis of no statistically significant difference. Once again, we found
no evidence that smoking affects the number of times one falls sick.
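For readers who want to see what an exact test on a table larger than 2x2 involves, the following is a minimal sketch in plain Python. It assumes the usual extension of Fisher’s exact test to r x c tables: every table with the same row and column totals is enumerated, and the probabilities of those that are no more likely than the observed table are added up. The function names are hypothetical, and the sketch is not the Stata routine that produced the value of 0.952, so it should not be expected to match it digit for digit.

```python
from math import factorial, prod


def compositions(total, caps):
    """Yield tuples x with sum(x) == total and 0 <= x[i] <= caps[i]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in compositions(total - first, caps[1:]):
            yield (first,) + rest


def tables(row_totals, cols_left):
    """Yield every table with the given row totals and column totals."""
    if len(row_totals) == 1:
        yield (tuple(cols_left),)  # the last row is fixed by the margins
        return
    for row in compositions(row_totals[0], cols_left):
        remaining = [c - x for c, x in zip(cols_left, row)]
        for rest in tables(row_totals[1:], remaining):
            yield (row,) + rest


def fisher_exact_rxc(observed):
    """Two-sided exact p-value for an r x c table with fixed margins."""
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    n = sum(rows)
    numerator = prod(factorial(m) for m in rows + cols)

    def probability(table):
        cells = prod(factorial(x) for row in table for x in row)
        return numerator / (factorial(n) * cells)

    p_observed = probability(observed)
    # Small relative tolerance so that equally likely tables are included.
    return sum(
        p for p in map(probability, tables(rows, cols))
        if p <= p_observed * (1 + 1e-7)
    )


# Counts from Table 1: (not fallen sick, fallen sick) per smoking group.
table1 = [[15, 43], [3, 12], [8, 23]]
print(f"p = {fisher_exact_rxc(table1):.3f}")
```

Full enumeration is only practical for small tables such as those in this report; statistical packages use faster algorithms for larger ones.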

The next hypothesis to be tested is that current smokers are the least physically active,
former smokers are more physically active and never smokers are the most physically active.
An immediate problem with the sport activity data was that only one person exercised more than
11 hours per week (5th category). Therefore, the 4th and 5th categories were merged. The new
categories for sport activity are:

1. 0-2 hours per week

2. 3-5 hours per week

3. 6-8 hours per week

4. 9 or more hours per week

See table 2 below.

Table 2. Table showing amount of sport activity amongst current smokers, former smokers and never smokers.

                   Sport activity (hours per week)
Smoking            0-2   3-5   6-8    9+   Total
Never smokers       25    21     7     5      58
Former smokers       8     3     2     2      15
Current smokers     17     8     4     2      31
Total               50    32    13     9     104
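The counts in Table 2 also illustrate why the chi-squared test was avoided. A common rule of thumb, used here as an assumption since the report does not name its criterion, is that the chi-squared approximation should not be trusted when more than about 20% of the expected cell counts are below 5. The sketch below computes the expected counts under independence from the table margins.

```python
# Observed counts from Table 2 (columns: 0-2, 3-5, 6-8, 9+ hours per week).
observed = [
    [25, 21, 7, 5],  # never smokers
    [8, 3, 2, 2],    # former smokers
    [17, 8, 4, 2],   # current smokers
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

small = sum(e < 5 for row in expected for e in row)
cells = len(row_totals) * len(col_totals)

for row in expected:
    print([round(e, 2) for e in row])
print(f"{small} of {cells} expected counts are below 5")
```

Here 5 of the 12 expected counts fall below 5, all of them in the former and current smoker rows, so an exact test is the safer choice for this table.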

With this data we can carry out Fisher’s exact test to check whether there is evidence for
our hypothesis or whether the null hypothesis stands. The p-value from Fisher’s exact test is
0.840. This means that we do not have enough evidence to reject the null hypothesis, which is
that smoking has no effect on sport activity. Once again, this can be shown on a box and
whisker plot. See figure 11.

Figure 11. Box and whisker plot showing relationship between smoking and the number of hours of sport activity in a week.

X-axis (smoking): 0 - Never smokers, 1 - Former smokers, 2 - Current smokers

Y-axis (sport activity): 1: 0-2 hours per week, 2: 3-5 hours per week, 3: 6-8 hours per week, 4: 9-11 hours per week, 5: More than 11 hours per week

This box and whisker plot also shows no clear relationship, as the medians are all in about
the same range. If the hypothesis were true, the median of the never smokers group would be the
highest, that of former smokers in the middle and that of current smokers the lowest.

The third and final hypothesis to be investigated is that current smokers consume the most
alcohol, former smokers consume an intermediate amount and never smokers consume the least
alcohol. The data set for this test is quite large, and therefore Fisher’s exact test cannot be
used. Instead we return to the Kruskal-Wallis test. The p-value of this test was 0.0001. This
means that there is enough evidence to reject the null hypothesis, which is that current
smokers, former smokers and never smokers consume around the same amount of alcohol. Since the
null hypothesis is rejected, we can accept the alternative hypothesis that alcohol consumption
differs between the groups. The test itself does not show the direction of the difference, so
the ordering stated in our hypothesis (current smokers the most, former smokers an intermediate
amount and never smokers the least) has to be checked against the data.

Furthermore, we can show this relationship using a box and whisker plot once again. See
figure 12 below.

Figure 12. Box and whisker plot showing relationship between smoking and the consumption of alcohol (restaurant portions per week)

X-axis (smoking): 0 - Never smokers, 1 - Former smokers, 2 - Current smokers

From this graph, a very clear pattern can be observed, as stated in our hypothesis. The median
of the never smoking group is the lowest, next comes the former smoking group, and the median
of the current smoking group is the highest. This clear pattern supports our hypothesis.

Conclusions

The study centred on three hypotheses that predicted associations between smoking and health,
or smoking and lifestyle. The hypotheses assumed that people who smoke are sick more often,
that smokers are less physically active than non-smokers, and that smokers consume more alcohol
than non-smokers.

For the assumption that smokers are sick more often, no consistent association was found
between susceptibility to infections and smoking using Fisher’s exact test (p=0.952, therefore
H0 couldn’t be rejected). Also, no difference was found between the groups when comparing the
self-evaluation of their health.

For the assumption that smokers are less physically active than non-smokers, no reliable
association was found with Fisher’s exact test (p-value=0.324, H0 couldn’t be rejected). With
this hypothesis, however, the results could have been affected by our population, sample size
and wide categories for physical activity. None of the groups was physically very active, and
with our broad physical activity categories, about half of all answers (50 of 104) fell into
the lowest category of exercise (0-2h a week).

Unlike for the previous two hypotheses, a considerable association was found between the
volume of alcohol consumed and smoking within the sample. The Kruskal-Wallis test was used to
test the equality of populations, and it showed a significant association between smoking and
the volume of alcohol consumed (p-value=0.0001). Even though causality cannot be determined
from this study, it would seem that in this sample of students, the habits of smoking and
drinking alcohol were often concentrated in the same subjects.

When interpreting the validity of the results, a few other concerns should be taken into
account. As the survey was distributed via social media groups, it was probably answered mostly
by Erasmus students, and to some degree by medical students. It can therefore be assumed that
the sample consists mainly of people who are in general very healthy (young age, short smoking
time, ability to acquire higher education and travel abroad). As the sample was very small and
was taken from a very specific population with a presumably similar lifestyle, these results
probably shouldn’t be extrapolated to any wider group of people.

What might also be interpreted from our results is that within these student groups of young
and generally healthy people, the primary health effects of smoking were not yet visible. If a
follow-up study could be conducted, it would be interesting and perhaps more fruitful to see
whether the association would be visible in this sample after 10 or 20 years.

Appendix

Survey questions:

1. How old are you?

2. Male/Female?

3. Are you a smoker? Yes, I currently smoke/No, I used to smoke/No, I never smoked

4. For how many years or months have you smoked for continuously? (Continuous smoking defined as at least a pack of cigarettes a week) **Only answer if you are a current smoker or former smoker**

4. How much do you smoke? (Packs per week)

a) 1

b) 2

c) 3

d) 4

e) 5

f) 6

g) 7

5. What is your weight? (in Kg)

6. What is your height? (in cm)

7. How many times have you fallen sick (flu or cold), in the last 6 months?

8. How many portions of alcohol do you consume in a week? *One portion is defined as a restaurant portion such as 500 ml of beer (standard glass of beer), 175 ml of wine (standard glass of wine), 45 ml of hard liquor (standard shot).*

9. How would you grade your state of health from 1 to 5?

1: My health state is dangerously poor.

2: My health state is poor but it is not a danger, however it still needs improvement.

3: My health state is not too bad however it is not too good either.

4: My health state is quite good.

5: My health state is extremely good, it can’t get much better.

10. How many hours of sporting activity do you do in a week (slow walking does not count!)?

a) 0-2 hours a week

b) 3-5 hours a week

c) 6-8 hours a week

d) 9-11 hours a week

e) More than 11 hours a week