Using large data sets to study factors associated with the incidence of multiple sclerosis.

Tamah Fridman, David Glick, John Kidd

Multiple Sclerosis (MS)

• A complex autoimmune disease with both acute and chronic phases.

• Confounding factors include:
  o genetic background
  o viral infections, including EBV and HSV
  o nutritional factors
  o environmental factors such as latitude and smoking

Multiple Sclerosis (MS)

• In a more general way, this module could be used to explore the difference between correlation and causation.

• For use in a course, the instructor will supply appropriate background information on the immune response as applied to MS.

Multiple Sclerosis (MS)

• There is a vast literature examining the effects of:
  o geography
  o migration
  o infectious diseases
  o sunlight related to vitamin D levels
  o cigarette smoking
  o diet
  o hormones

Multiple Sclerosis (MS)

• Over time a number of data sets have been published that explore relationships between environmental factors and MS.

• Many of these are single studies that were later included in one or more “meta-analysis” articles.

• In addition, there are incidence statistics available from a variety of sources such as CDC, World Life Expectancy.com, WHO, and others.

Multiple Sclerosis (MS)

• To demonstrate the module’s potential, we have constructed several example analyses, using a variety of techniques, that link MS incidence to rainfall and viral diseases via:
  o a GIS plot
  o a scatter plot
  o 3-D Principal Component Analysis (PCA)

• These are based on the same data to demonstrate that large data sets can be visualized and analyzed in a variety of ways.

Multiple Sclerosis (MS)

• The Excel function CORREL was used to look for correlations between MS rates and the rates of a series of viral diseases and one “lifestyle” disease:
  o Hepatitis C: -0.1052
  o Cervical cancer: -0.34991
  o Liver cancer: -0.25501
  o HIV: -0.1451
  o Lung cancer: 0.547928

Multiple Sclerosis (MS)

Country        MS rate   Hep C rate   Cerv. ca. rate   Liver ca. rate   HIV rate   Lung ca. rate
Afghanistan    0.4       3.8          2.6              3.8              0          7.2
Albania        2.8       0.1          1.5              6.7              0.2        31
Algeria        0.1       0.1          3.4              1.3              2          10.6
Andorra        0.4       0.6          0.8              4.9              0          21.6
Angola         0.2       1            12.5             9.6              79.2       2.3
Antigua/Bar.   0         0            5.4              5.2              19.7       8.3

This slide is a sample—the complete spreadsheet contains 192 countries.
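The CORREL calculation can also be reproduced outside Excel. The sketch below is not the authors' spreadsheet; it computes the Pearson correlation coefficient (the statistic CORREL returns) between MS rate and each other column, using only the six sample rows shown on this slide. The column keys (`ms`, `hep_c`, and so on) and the helper names are made up for illustration, and six rows will not reproduce the coefficients obtained from the full 192-country spreadsheet.

```python
from math import sqrt

# The six sample rows from the slide, NOT the full 192-country spreadsheet.
# Column order: ms, hep_c, cervical_ca, liver_ca, hiv, lung_ca (names are illustrative).
ROWS = {
    "Afghanistan":  (0.4, 3.8, 2.6, 3.8, 0.0, 7.2),
    "Albania":      (2.8, 0.1, 1.5, 6.7, 0.2, 31.0),
    "Algeria":      (0.1, 0.1, 3.4, 1.3, 2.0, 10.6),
    "Andorra":      (0.4, 0.6, 0.8, 4.9, 0.0, 21.6),
    "Angola":       (0.2, 1.0, 12.5, 9.6, 79.2, 2.3),
    "Antigua/Bar.": (0.0, 0.0, 5.4, 5.2, 19.7, 8.3),
}
COLUMNS = ("ms", "hep_c", "cervical_ca", "liver_ca", "hiv", "lung_ca")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def column(name):
    """Return one named column of ROWS as a list."""
    i = COLUMNS.index(name)
    return [row[i] for row in ROWS.values()]


def ms_correlations():
    """Correlate MS rate with every other column."""
    ms = column("ms")
    return {name: pearson(column(name), ms) for name in COLUMNS[1:]}


if __name__ == "__main__":
    for name, r in ms_correlations().items():
        print(f"{name:12s} r = {r:+.3f}")
```

With so few rows, a single outlying country can dominate a coefficient, which is one reason to run the calculation on the complete data set before drawing any conclusion.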

Multiple Sclerosis (MS)

• The spreadsheet data above were also used to construct scatter plots of MS rate versus hepatitis C rate (a viral disease) and MS rate versus lung cancer rate (an environmental/lifestyle disease). These plots follow.

Multiple Sclerosis (MS)

ms rate (Y) versus Hep C rate (X)

[Scatter plot of ms rate with linear trendline; x-axis 0 to 6, y-axis 0 to 3]

f(x) = −0.0482397115134393 x + 0.310970668054575
R² = 0.0110671587093064

Multiple Sclerosis (MS)

ms rate (Y) versus lung cancer rate (X)

[Scatter plot of ms rate with linear trendline; x-axis 0 to 60, y-axis 0 to 3]

f(x) = 0.017306523346616 x + 0.0283145007695525
R² = 0.300225342134711
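A linear trendline of this kind is an ordinary least-squares fit, and for a fit with one predictor and an intercept, R² equals the square of the Pearson correlation coefficient, with the sign of r given by the sign of the slope. The sketch below, with illustrative function names that do not come from the original module, shows the fit itself and then recovers r from the slope and R² quoted on the two trendline slides. For the lung cancer plot this gives back the 0.547928 reported by CORREL.

```python
from math import copysign, sqrt


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R²."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def r_from_fit(slope, r_squared):
    """For a one-predictor fit, r = sign(slope) * sqrt(R²)."""
    return copysign(sqrt(r_squared), slope)


# (slope, R²) pairs as quoted on the two trendline slides.
HEP_C = (-0.0482397115134393, 0.0110671587093064)
LUNG = (0.017306523346616, 0.300225342134711)

if __name__ == "__main__":
    print(f"Hep C:       r = {r_from_fit(*HEP_C):+.6f}")
    print(f"Lung cancer: r = {r_from_fit(*LUNG):+.6f}")
```

Comparing the recovered r with the CORREL output is a quick consistency check that the plot and the correlation table were built from the same rows.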

Multiple Sclerosis (MS)

• The complete Excel spreadsheet was also used in Principal Component Analysis (PCA).

• The data were saved in a tab-delimited format and then imported into the NIA Array Analysis Tool for Principal Component Analysis.

• The results are password-protected on this site: http://lgsun.grc.nia.nih.gov/ANOVA/index.html
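For readers without access to those results, the general idea of a 3-D PCA can be sketched in a few lines. The code below is not the NIA Array Analysis Tool's implementation and makes no claim about its settings. It is a generic PCA, assuming standardized columns, applied to the six sample rows from the earlier slide. It finds the three leading principal components by power iteration with deflation and projects each country onto them. All function names are illustrative.

```python
from math import sqrt

# Six sample rows from the slide (ms, hep_c, cervical_ca, liver_ca, hiv, lung_ca).
DATA = [
    (0.4, 3.8, 2.6, 3.8, 0.0, 7.2),     # Afghanistan
    (2.8, 0.1, 1.5, 6.7, 0.2, 31.0),    # Albania
    (0.1, 0.1, 3.4, 1.3, 2.0, 10.6),    # Algeria
    (0.4, 0.6, 0.8, 4.9, 0.0, 21.6),    # Andorra
    (0.2, 1.0, 12.5, 9.6, 79.2, 2.3),   # Angola
    (0.0, 0.0, 5.4, 5.2, 19.7, 8.3),    # Antigua/Bar.
]


def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    sds = [sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(m)]
    return [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in rows]


def covariance(z):
    """Sample covariance matrix of already-centered rows."""
    n, m = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(m)] for i in range(m)]


def matvec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def top_components(cov, k=3, iters=5000):
    """Leading k (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    m = len(cov)
    a = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        v = [1.0 / sqrt(m)] * m
        for _ in range(iters):
            w = matvec(a, v)
            norm = sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(a, v)))
        pairs.append((lam, v))
        # Remove the component just found so the next pass finds the following one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return pairs


def project(z, pairs):
    """Coordinates of each row on the principal components."""
    return [[sum(x * c for x, c in zip(row, vec)) for _, vec in pairs] for row in z]


if __name__ == "__main__":
    z = standardize(DATA)
    pairs = top_components(covariance(z))
    for coords in project(z, pairs):
        print("  ".join(f"{c:+.3f}" for c in coords))
```

Standardizing first means the analysis works on the correlation matrix, so the HIV column, with its much larger range, does not dominate the components purely because of its scale. Whether to standardize is a modeling choice that should be stated alongside any PCA plot.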

Multiple Sclerosis (MS)

• As a completely different approach, meta-analysis data were extracted into Excel and converted for plotting with PGPLOT, and a Fortran program was written to analyze and display these data.

• Fitting disparate data points into congruent categories proved very difficult, so the following graphs are shown with some reservation.

• However, students “inventing” their own analysis can be expected to encounter similar problems.

Multiple Sclerosis (MS)

• We are deeply indebted to:
  o Ileana Betancourt and Colleen McLinn for help with GIS
  o Jeff Lutgen and Bruce Wiggins for help with Excel