55
Linköping Studies in Science and Technology. Dissertations No. 1282. GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS Sara Hägg Department of Physics, Chemistry and Biology Linköping University, SE58183 Linköping, Sweden Linköping, 2009

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Linköping Studies in Science and Technology.

Dissertations No. 1282.

GENE EXPRESSION PROFILING OF HUMAN

ATHEROSCLEROSIS

Sara Hägg

Department of Physics, Chemistry and Biology

Linköping University, SE58183 Linköping, Sweden

Linköping, 2009

Page 2: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

All previously published papers were reproduced with permission from the publisher. Printed by LiU-Tryck. Linköping, Sweden 2009. © Sara Hägg, 2009 ISBN 978-91-7393-502-9 ISSN 0345-7524

Page 3: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

ABSTRACT Atherosclerosis is a progressive inflammatory disease that causes lipid accumulation in the arterial wall, leading to the formation of plaques. The clinical manifestations of plaque rupture—stroke and myocardial infarction—are increasing worldwide and pose an enormous economic burden for society. Atherosclerosis development reflects a complex interaction between environmental exposures and genetic predisposition. To understand this complexity, we hypothesized that a top-down approach—one in which all molecular activities that drive atherosclerosis are examined simultaneously—is necessary to highlight those that are clinically relevant. To this end, we performed whole-genome expression profiling in multiple tissues isolated from patients with coronary artery disease (CAD).

In the Stockholm Atherosclerosis Gene Expression (STAGE) study, biopsies of five tissues (arterial wall with and without atherosclerotic lesions, liver, skeletal muscle and visceral fat) were isolated from 124 CAD patients undergoing coronary artery bypass grafting surgery (CABG) at the Karolinska University Hospital, Solna and carotid lesions from 39 patients undergoing carotid artery surgery at Stockholm Söder Hospital. Detailed clinical characteristics of these patients were assembled together with a total of 303 global gene expression profiles obtained with the Affymetrix GeneChip platform.

In paper 1, a two-way clustering analysis of the data identified 60 tissue clusters of functionally related genes. One cluster, partly present in both visceral fat and atherosclerotic lesions, related to atherosclerosis severity as judged by coronary angiograms. Many of the genes in that cluster were also present in a carotid lesion cluster relating to intima-media thickness (IMT) in the carotid patients. The union of all three clusters relating to extent of atherosclerosis—referred to as the “A-module”—was overrepresented with genes belonging to the transendothelial migration of leukocyte (TEML) pathway. The transcription co-factor, Lim domain binding 2 (LDB2), was identified as putative regulator of the A-module and TEML pathway in validation studies including Ldb2-/- mice.

In paper 2, we investigated the increased incidence of postoperative complications in CABG patients with diabetes. Using the STAGE compendium, we identified an anti-inflammatory marker, dual-specificity phosphatase 1 (DUSP1), as a novel preoperative blood marker of risk for a prolonged hospital stay after CABG.

In paper 3, plaque age was determined with C14-dating in the carotid patients. Interestingly, the strongest correlation with plaque age was not the age of the patients or IMT. Rather, the strongest correlations were with plasma insulin levels and inflammatory gene expression.

Taken together, the findings in this thesis show that a top-down approach using multi-tissue gene expression profiling in CAD and C14-dating of plaques can contribute to a better understanding of the molecular processes underlying atherosclerosis development and to the identification of clinically useful biomarkers.

Page 4: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

POPULÄRVETENSKAPLIG SAMMANFATTNING Allt fler människor i världen utvecklar idag hjärt-kärlsjukdom och drabbas av dess följder som hjärtinfarkt och stroke. Den bakomliggande orsaken är främst förträngningar i kärlen som ger upphov till minskad blodtillförsel i organ och vävnader. Förträngningarna uppstår när kolesteroler i blodet tar sig in i kärlväggen och lagras i fettdepåer. Denna process involverar även immunförsvaret som genom sitt angrepp mot kolesterolet startar en kronisk inflammation i kärlväggen. Genom en förbättrad livsstil, dvs. genom att äta hälsosam mat, öka sin fysiska aktivitet och genom att inte röka kan man minska det sjukliga förloppet av fettinlagringar i kärlen. Tyvärr finns det också ärftliga anlag som påverkar benägenheten att få förträngningar och genom att studera dessa kan vi bättre förstå nya mekanismer bakom sjukdomen.

Vi har studerat arvsmassan, dvs. generna, hos kranskärlssjuka patienter för att hitta nya gener som är viktiga för sjukdomsförloppet. Vi har mätt alla geners uttryck och kopplat detta till graden av förträngning hos varje patient. På så sätt har vi lyckats identifiera en grupp av gener som är kopplade till hur celler i vårt immunförsvar tar sig in i kärlväggen och reagerar på fettinlagringarna. Vi har också hittat en gen vars proteinnivåer i blod tyder på en inflammation hos vissa patienter. Denna inflammation förefaller leda till ökad risk för komplikationer och ökat behov av vård efter genomgången kranskärlsoperation. Slutligen så har vi, med hjälp av C14-dateringsmetoden, för första gången lyckats mäta hur många år sedan det var som fettinlagringen startade hos en grupp patienter. På så vis har vi kunnat konstatera att en snabbare utveckling av kärlförträngningar hos vissa individer är kopplad till en högre grad av inflammation i kärlet. Genom att dela med oss av dessa nya kunskaper hoppas vi kunna öka förståelsen för sjukdomsmekanismen bakom hjärt-kärlsjukdomar. En förståelse som i förlängningen kan leda till förbättrade behandlingar och nya mediciner.

Page 5: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

PREFACE Many people have contributed to the studies in this thesis, which is collaboration between Linköping University, Karolinska Institutet (KI) and Clinical Gene Networks AB. The main co-workers, all with different skills, are presented here.

Biopsies were assembled by surgeons at the Karolinska University Hospital Solna (Torbjörn Ivert, Anders Franco-Cereceda, Jan Liska, Ulf Lockowandt) and at the Stockholm Söder Hospital (Peter Konrad, Rabbe Takolander). Blood sampling of the Tartu cohort in Estonia was done by another surgeon (Arno Ruusalepp). Patient characteristics was collected by a research nurse at the Thoracic research clinic at the Karolinska University Hospital (Merja Heinonen). I gathered additional clinical characterizations and performed angiographic measurements of patients. Intima-media thickness measurements were done by Stefan Rosfors at the Stockholm Söder Hospital. Plaque age estimation by 14C-dating was done by Mehran Salehpour at Uppsala University. RNA isolation was performed in our own laboratory at KI mainly by Peri Noori (PN) and Josefin Skogsberg (JS). Assessment of RNA quality and quantity was performed by me, PN and JS. The expression profiling was also done in our laboratory by PN, JS, myself and Roland Nilsson (RN). Jesper Lundström (JL), RN, myself and Björn Brinne did data pre-processing. As for the analyses, JL and myself worked in collaboration where JL did the two-way clustering and network analysis. I did most of the other calculations such as the gene function annotation, genetic enrichment (together with Judy Zhong from the Schadt group), Spearman rank correlation and statistical analysis. The experimental validation was carried out by JS, PN, Shohreh Maleki and Ming-Mei Shang. I wrote the papers together with Johan Björkegren (JB) and JS. JB designed and had the overall responsibility for the clinical studies, JS for the validation experiments and Jesper Tegnér (JT) for data analysis. JB and JT have joint overall responsibility for the laboratory.

Page 6: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

PUBLICATIONS INCLUDED IN THE THESIS

I. Hägg S*, Skogsberg J*, Lundström J*, Noori P, Nilsson R, Zhong H, Maleki

S, Shang M-M, Brinne B, Bradshaw M, Bajic V, Samnegård A, Silveira A,

Kaplan LM, Gigante B, Leander K, de Faire U, Rosfors S, Lockowandt U,

Liska J, Konrad P, Takolander R, Franco-Cereceda A, Schadt EE, Ivert T,

Hamsten A, Tegnér J, Björkegren J.

Multi-Organ Expression Profiling Uncovers a Gene Module in Coronary Artery

Disease Involving Transendothelial Migration of Leukocytes and LIM Domain

Binding 2: The Stockholm Atherosclerosis Gene Expression (STAGE) Study.

PLoS Genetics. In press. * Shared 1st authors.

II. Hägg S, Alserius T, Noori P, Skogsberg J, Ruusalepp A, Ivert T, Tegnér J,

Björkegren J.

Dual-Specificity Phosphatase-1—An Anti-Inflammatory Marker in Blood

Independently Predicting Prolonged Postoperative Stay after Coronary Artery

Bypass Grafting.

Submitted to JACC

III. Hägg S, Salehpour M, Noori P, Lundström J, Skogsberg J, Konrad P,

Rosfors S, Tegnér J, Björkegren J. Carbon-14 Dating to Determine Carotid Plaque Age. Manuscript

Page 7: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

OTHER PUBLICATIONS

I. Kirkegaard M, Skjönsberg Å, Hägg S, Bucinskaite V, Laurell G,

Ulfendahl M.

Cisplatin-Induced Gene Expression in the Rat Cochlea. Manuscript

II. Lindgren E, Enervald E, Hägg S, Sjögren C, Ström L.

Inactivation of the Cohesion Machinery Alters the Gene Expression Pattern

both Globally and in Response to a Single DNA Double Strand Break.

Manuscript

III. Kostyszyn B, Åkesson E, Hägg S, Ulfendahl M.

Molecular Mechanisms Involved in Early Human Inner Ear Development. Manuscript

Page 8: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

CONTENTS 1  Introduction ................................................................................................. 1 

1.1  Background ....................................................................................... 1 1.2  Atherosclerosis ................................................................................. 1 1.3  Environmental Risk Factors .............................................................. 2 

1.3.1  Modifiable Factors ................................................................ 2 1.3.2  Non-Modifiable Factors......................................................... 3 

1.4  Genetic Risk Factors ........................................................................ 3 1.4.1  Rare Factors ......................................................................... 3 1.4.2  Common Factors .................................................................. 4 

1.5  Clinical Procedures ........................................................................... 4 1.5.1  Angiographic Procedures ..................................................... 4 1.5.2  Coronary Artery Bypass Grafting (CABG) Surgery .............. 5 1.5.3  Carotid Surgery ..................................................................... 5 

1.6  Whole-Genome Expression Profiling ............................................... 5 1.6.1  Gene Expression Measurements ......................................... 5 1.6.2  Gene Analysis ....................................................................... 6 1.6.3  Functional Analysis ............................................................... 7 

1.7  Cell and Animal Models of Atherosclerosis ...................................... 8 2  Aim ............................................................................................................ 10 

2.1  Specific Aim of Each Paper ............................................................ 10 3  Methods .................................................................................................... 11 

3.1  Cohorts ............................................................................................ 11 3.1.1  The Stockholm Atherosclerosis Gene Expression (STAGE) Cohort ............................................................................................ 11 3.1.2  The Carotid Patient Cohort ................................................. 12 3.1.3  The Tartu University Hospital Cohort ................................. 12 

3.2  Plaque Measurements .................................................................... 12 3.2.1  Quantitative Coronary Angiogram (QCA)........................... 12 3.2.2  Intima-Media Thickness (IMT) ............................................ 13 3.2.3  Plaque Age by Carbon-14 Technique ................................ 13 

3.3  Laboratory Measurements .............................................................. 14 3.4  Gene Expression Analysis .............................................................. 14 

3.4.1  Cluster Analysis .................................................................. 14 3.4.2  Resampling Analysis .......................................................... 14 3.4.3  Correlation Analysis ............................................................ 14 3.4.4  Gene Function Analysis ...................................................... 15 

3.5  Genetic Enrichment Analysis .......................................................... 15 3.6  Statistical Analysis .......................................................................... 15 

4  Results and Discussion ............................................................................ 16 4.1  Identification of an Atherosclerosis Gene Module in CAD Patients (Study I) .................................................................................................... 16 4.2  LDB2 as a Regulator of the Atherosclerosis Module (Study I) ..... 19 4.3  DUSP1 – An Anti-inflammatory Marker of PostOperative Stay and Complications (Study II) ........................................................................... 21 

Page 9: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

4.4  A Novel Plaque Characteristic – Carotid Plaque Age Determined by Carbon-14-Dating (Study III) ................................................................25 

5  General Discussion ...................................................................................29 5.1  Experimental Planning ....................................................................29 5.2  Gene Expression Profiling of Heterogeneous Samples .................29 5.3  Clustering of Gene Expression Profiles ..........................................30 5.4  Human Gene Expression Studies in Atherosclerosis .....................31 5.5  Genetics of Gene Expression Studies ............................................31 

6  Future Considerations ...............................................................................33 6.1  Biological Research Moving Forward .............................................33 6.2  Personalized Medicine ....................................................................34 

7  Concluding Remarks .................................................................................35 8  Acknowledgements ...................................................................................36 9  References ................................................................................................38 

Page 10: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

LIST OF ABBREVIATIONS AF Atrial Fibrillation

ALAT Alanine Aminotransferase

AMS Accelerator Mass Spectrometry

BMI Body Mass Index

CABG Coronary Artery By-pass Graft

CAD Coronary Artery Disease

CCA Common Carotid Artery

CRP C-Reactive Protein

CVD Cardiovascular Disease

DAVID Database for Annotation, Visualization, Integration and

Discovery

DNA Deoxyribonucleic Acid

DS Diameter Stenosis

DUSP1 Dual-Specificity Phosphatase 1

EC Endothelial Cells

FBG Fasting Blood Glucose

FDR False Discovery Rate

FH Familial Hypercholesterolemia

GGE Genetics of Gene Expression

GO Gene Ontology

GWAS Genome Wide Association Studies

HDL High Density Lipoprotein

IMT Intima-Media Thickness

KEGG Kyoto Encyclopedia of Genes and Genomes

LCA Left Coronary Artery

LD Lumen Diameter

LDB2 LIM Domain Binding 2

LDL Low Density Lipoprotein

MI Myocardial Infarction

mRNA messenger RNA

QCA Quantitative Coronary Angiography

qRT-PCR Quantitative Real-Time Polymerase Chain Reaction

RefSeq Reference Sequence

RMA Robust Multichip Average

RNA Ribonucleic Acid

SAGE Serial Analysis of Gene Expression

SD Standard Deviation

SMC Smooth Muscle Cells

SNP Single Nucleotide Polymorphism

STAGE Stockholm Atherosclerosis Gene Expression study

SPC Superparamagnetic Clustering

TEML Transendothelial Migration of Leukocytes

TF Transcription Factor

Page 11: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

TIA Transient Ischemic Attack

VLDL Very Low Density Lipoprotein

WHR Waist-to-Hip Ratio

WTCCC Wellcome Trust Case Control Consortium

Page 12: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS
Page 13: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Introduction 1

1 INTRODUCTION 1.1 BACKGROUND

In westernized countries and increasingly in developing countries, a major cause of death is cardiovascular disease (CVD). According to World Health Organization (WHO), about 17 million people worldwide die of CVD every year [1]. CVD comprises many different diseases and coronary artery disease (CAD), largely due to atherosclerosis, is predominant. When atherosclerotic plaques rupture they often give rise to stroke, in which blood supply to the brain is interrupted, or to myocardial infarction (MI), in which the blood supply to the heart is interrupted. These diseases are on the rise around the world and lead to huge socio-economic burden in the cost of healthcare and lost productivity due to early mortality.

Atherosclerosis is a complex disease caused by a combination of many factors, including both environmental and genetic determinants, which act together in an intricate fashion throughout the progression of disease. After release of the human genome in 2001 [2-3], many new measurement techniques were introduced for genetic research—and studies of complex disorders such as CAD shifted from a candidate gene approach to systems biology approaches, which can better take into account the full complexity of disease processes [4]. Thus, our understanding of how all genes and proteins interact in different pathways, cell types and organs during the course of disease will continue to grow, paving the way for new treatments and disease markers for clinical use.

1.2 ATHEROSCLEROSIS

Atherosclerosis is a disease of the large arteries characterized by the accumulation of lipids in the intima, the innermost layer of the artery wall (Figure 1). Atherosclerosis initiates when low density lipoproteins (LDL) are transported across the endothelial cell (EC) barrier into the intima, where they react with proteins and free radicals in the extracellular matrix, forming an oxidized complex. The oxidized LDL particle acts as an antigen and triggers the immune system by attracting monocytes to the location. Through a process called transendothelial migration, monocytes enter the intima via cell-adhesion molecules on the EC and start to proliferate and differentiate into macrophages. Macrophages secrete cytokines, which stimulate the migration of more

Page 14: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

2 Introduction | Sara Hägg

monocytes and other circulating leukocytes, which adhere to and pass through the EC layer. Macrophages start to express scavenger receptors that recognize and bind to the oxidized LDL. Soon, rapid uptake of the lipid particles by macrophages leads to foam-cell formation. As foam cell formation progresses, smooth muscle cells (SMCs) migrate from the media to the intima, forming an extra barrier next to the EC, a fibrous cap. Apoptotic foam cells together with extracellular matrix establish a necrotic core known as the atherosclerotic plaque. A vulnerable plaque has a thin fibrous cap that breaks easily, exposing plaque contents to the blood flow resulting in thrombosis (Figure 1, lower panel) [5-6].

Figure 1. Atherosclerosis progression in the arterial wall. LDL particles enter the intima, become oxidized, and trigger the immune system. Circulating monocytes adhere to and migrate through the EC layer, proliferate into macrophages, and start to incorporate oxidized LDL via scavenger receptors. Foam cells are created as lipid content of the macrophages increases and apoptotic processes start building a necrotic core, or plaque. Thrombosis occurs when the plaque ruptures, exposing its contents to the blood flow (inset, lower right corner).

1.3 ENVIRONMENTAL RISK FACTORS

1.3.1 Modifiable Factors

The traditional environmental risk factors for atherosclerosis are lifestyle choices that are modifiable, including intake of cholesterol-rich foods, low exercise level, and smoking. Low physical activity in combination with an unhealthy diet will elevate plasma levels of LDL (the bad cholesterol), lead to increased lipid accumulation both in

Page 15: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Introduction 3

the artery wall and in fat deposits, and increase the incidence of obesity and the risk of developing type 2 diabetes. A low level of high-density lipoprotein (HDL; the good cholesterol) and an increased level of triglycerides are also unfavorable and associated with increased risk for CVD [7]. Moreover, cigarette smoke exposure increases the level of free radicals and thereby increases oxidative stress in the circulatory system, which leads to endothelial dysfunction and diminished vasomotor function. The artery wall will thus be weakened, LDL oxidation will increase, and inflammatory processes will be triggered, leading to atherosclerosis symptoms [8]. Hypertension (high blood pressure) is a risk factor that also increases the stress on the artery wall. It rises with age and is often associated with a cholesterol-rich diet [9].

1.3.2 Non-Modifiable Factors

Non-modifiable risk factors for atherosclerosis include gender, higher age, heredity, airborne pollutants, and infections. In males, fat distribution is often characterized by central abdominal obesity, in which fat is stored within tissues surrounding the organs (visceral fat), which can lead to an adverse metabolic profile. In females, however, the fat is stored below the skin (subcutaneous fat) and does not cause adverse metabolic effects to the same extent [10]. Air pollution from the combustion of traditional fossil fuels and photochemical air pollution (ozone) are both potent oxidants; in the respiratory and circulatory system, they cause pathogenic reactions similar to those of cigarette smoke [11]. A viral infection can help trigger a local immune response and contribute to atherosclerotic plaque development in several ways [12]; however, it is not entirely clear how infectious agents contribute to CAD. Manifestations of other diseases with shared pathophysiological mechanisms are also risk factors for developing atherosclerosis. Type 2 diabetes and metabolic syndrome display disturbed metabolic activities in the liver and peripheral tissues such as skeletal muscle and adipocytes giving rise to lipoprotein abnormalities [7,13-14]. Autoimmune disorders such as systemic lupus erythematosus and rheumatoid arthritis have many characteristics in common with atherosclerosis, e.g., inflammatory mechanisms including leukocyte infiltration and cytokine secretion [15].

1.4 GENETIC RISK FACTORS

A family history of CAD is a very important risk factor. There could be rare mutations or common genetic variants affecting the genetic predisposition and thereby, in a combination with environmental factors, leading to increased risk of developing CAD.

1.4.1 Rare Factors

Rare diseases caused by single gene mutations are inherited in a Mendelian fashion and can cause very large effects. One such disease is familial hypercholesterolemia (FH), caused by a mutation in the LDL receptor that dramatically reduces LDL uptake from the circulation. This results in extremely high plasma cholesterol levels and premature atherosclerosis. In most populations, FH heterozygotes occur with a frequency of about 1:500 and homozygotes with a frequency of 1:1 000 000. Other monogenic disorders affect the lipoprotein profile in a similar manner. Some mutations,

Page 16: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

4 Introduction | Sara Hägg

however, give rise to favorable conditions. For example, the so-called Milano mutation improves reverse cholesterol transport by HDL. A mutation with opposite effect on HDL is seen in Tangier disease where the transmembrane cholesterol transport is impaired causing decreased HDL formation and increased risk of CAD as a consequence [16].

1.4.2 Common Factors

Single nucleotide polymorphisms (SNPs) are nucleotide variations seen in a large group of a population that give rise to weak, yet sometimes significant, changes in common disease phenotypes. Genome-wide association studies (GWAS) to identify multiple loci containing disease-linked variants require large, well-defined cohorts, necessitating world-wide collaborations. GWAS offers a new approach to discover genes and biological processes that affect heritable disease traits, especially those associated with common complex disorders such as CAD [17].

Recently the Wellcome Trust Case Control Consortium (WTCCC) identified a locus (9q21) that is strongly associated with CAD [18-19]. Repeated studies have confirmed that this locus significantly increase the risk of MI, as shown by the deCODE program in Iceland [20], and of type 2 diabetes, as shown by the Diabetes Genetics Initiative [21]. However, the underlying mechanism by which the affected genes of interest act is still unknown.

Although many new genetic variants have been identified by GWAS, they explain only a small fraction of the overall calculated inherited risk of developing common diseases. Many more disease loci remain to be recognized, and much more reliable techniques must be invented to fully grasp the complex picture of inheritance in common diseases such as CVD [17,22].

1.5 CLINICAL PROCEDURES

1.5.1 Angiographic Procedures

Signs of CVD appear in different forms. When the coronary arteries are affected (i.e., CAD), chest pain and MI are usually noticed. In patients with these signs, angiography of the coronary arteries is performed to evaluate the degree of plaque (i.e., stenosis; see section 3.2.1). Coronary angiography is performed by inserting a catheter into the femoral artery in the groin, guiding it to the aorta in the heart, and injecting an iodinated contrast material. The coronary arteries can then be visualized by x-ray fluoroscopy and recorded by a video camera (Figure 2). If the arterial narrowing is not too great, the artery may be widened by balloon angioplasty, in which an intra-arterial balloon is inflated to compress the plaque. Often, an expandable metal stent, a wire mesh tube, is then inserted. Sometimes, the stent is coated with an eluting drug to prevent restenosis, or regrowth of the plaque.

Page 17: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Introduction 5

Figure 2. A coronary angiogram of the left coronary artery (LCA) from a patient in the STAGE cohort. Coronary arteries are visualized by x-ray after injection of contrast material through a catheter inserted in the femoral artery and guided to the aorta. The camera angle can be changed so that passage of blood through the arteries can be seen in several projections, allowing the extent of plaques (degree of stenosis) to be determined.

1.5.2 Coronary Artery Bypass Grafting (CABG) Surgery

If an angiographic procedure such as balloon angioplasty is not sufficient to relieve symptoms, CABG can be performed. In this procedure, a healthy artery without atherosclerosis, such as the internal mammary artery (IMA) from the chest wall, or a piece of a vein from a leg or an arm is used to make a detour around the blocked part of the coronary artery. However, CABG is an invasive open heart surgery that will increase the risks for co-morbidities and co-mortalities and require many days of hospitalization.

1.5.3 Carotid Surgery

The carotid artery, located in the neck, supplies blood to the brain. When it is blocked, signs of dizziness, fainting, transient ischemic attacks (TIA), or even stroke may occur. In such cases, the carotid artery can be examined by ultrasound to measure intima-media thickness (IMT), an indication of plaque size (see section 5.2.2). If the plaque is large, it must be removed by a surgical procedure called carotid endarterectomy, in which the artery is cut open and the plaque is scraped off. This surgery is most often done with the patient asleep under general anesthesia; however, sometimes a local anesthetic is used, and the patient remains awake.

1.6 WHOLE-GENOME EXPRESSION PROFILING

The last decade has seen a paradigm shift in how complex diseases are studied, from the old-school “bottom-up” approach of looking for candidate genes to the new and promising “top-down” approach of simultaneously assessing many genes. One example of a top-down approach is whole-genome expression measurement, a technique that was made possible by the decoding of the human genome.

1.6.1 Gene Expression Measurements

Gene expression measurements, or mRNA transcript measurements, can be done with different techniques. To measure only a few genes, quantitative real-time polymerase chain reaction (qRT-PCR) is used. For analysis of many genes, serial analysis of gene expression (SAGE) [23] is used to detect small tags of 9 base pairs (bp), sufficient information to identify the gene target. SAGE is rapid and reliable, but microarrays are a less expensive method for whole-genome measurements. With this technology, probes (11–50 bp mapped to gene target sequences) attached to a glass slide hybridize to fluorescent target sequences, and probe intensity is assessed by

Page 18: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

6 Introduction | Sara Hägg

laser scanning [24]. Multiple platforms and manufacturers are available for microarray technology and compatibility problems of available data sets are recognized.

1.6.2 Gene Analysis

1.6.2.1 Single Gene Discovery

Microarray technology generates an immense amount of biological data, which needs to be arranged. A common procedure is to use a supervised method (using a known sample classification) such as a t-test to identify all genes differentially expressed between two states (e.g., disease versus control). However, many thousands of genes are analyzed, which creates a multiple testing problem where many genes by chance will turn out to be falsely discovered. To deal with this issue, the false discovery rate (FDR) can be assessed [25].

Some early studies of gene expression applied fold changes to extract genes with marked up- or downregulation. However, this approach does not provide any statistical significance and is therefore not commonly used anymore.

Another approach is to use Spearman rank correlation (see explanation below), where a single gene profile, mRNA expression levels for a specific gene over all samples, is correlated to a clinical phenotype. Here a top-ranked candidate gene will be prioritized in the following analysis.

1.6.2.2 Gene Clustering

To organize multiple genes at the same time, genes can be grouped into classes (clusters) on the basis of their similarity, i.e., to look for co-expression. This type of approach is unsupervised, as no prior knowledge of sample composition is used.

1.6.2.2.1 Distance Metric

Co-expression can be measured with a distance metric, which can be parametric or nonparametric. Among the most widely used techniques are Euclidian distance and Manhattan distance, which measure the absolute distance between pairs of genes using a straight line or axis-aligned direction, respectively; both distances take into account the magnitude of expression changes. Other metrics include cosine-angle, which measures the angular separation, and Pearson correlation, which measures the directional similarity and is insensitive to the length of the expression vector. However, Pearson correlation is sensitive to outliers in the data that could have great effects on the vector direction. A popular nonparametric measure is Spearman rank correlation, which considers the ranks instead of the actual values in each expression pattern and is thus insensitive to outliers.

When genes are grouped into different clusters, the distances between clusters can be measured by using single, complete, average, or centroid linkage. Single linkage compares the closest neighbors in two different clusters, whereas complete linkage compares the most distant neighbors. Average linkage compares the means of all genes in a cluster, and centroid linkage compares the centroid distance. Single linkage clustering produces large clusters, whereas complete linkage produces small and compact clusters [26].

Page 19: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Introduction 7

1.6.2.2.2 Clustering Algorithms

Hierarchical clustering is a regular form of clustering algorithm that graphically presents the result as a tree diagram (dendrogram). The length of the branches in the tree reflects the co-expression of two genes (Figure 3). All combinations of distance measures used for genes and clusters could be applied, leaving many combinations to consider before analysis [27].

Another type of clustering algorithm is partitioning, in which a number of clusters are defined and the distance between clusters or within clusters is not taken into account. K-means clustering, a common form of partitioning, starts by randomly selecting a mean for each cluster. Then every gene expression profile is assigned to a cluster by finding the closest mean, and the process is iterated by recalculating cluster means and reassigning expression profiles until no more change is needed. K-means clustering has two weaknesses: the difficulty of selecting the appropriate number of clusters and its sensitivity to noise and outliers [26].

Superparamagnetic clustering (SPC) is a partitioning method that differs from hierarchical clustering by allowing genes to belong to multiple clusters or to no cluster at all. The distance measure used by SPC applies the absolute value of Spearman rank correlation. Thus, anti-correlated genes are also put together. The probability of genes belonging to the same cluster is calculated by using a temperature value (T). At low T, many genes have similar properties, and the system is in an ordered ferromagnetic state. At high T, the gene properties change, and the state becomes unordered and paramagnetic. At some value of T, the so-called superparamagnetic state, the system contains some genes in the ordered ferromagnetic state and some in the unordered paramagnetic state—thereby defining genes that belong (ordered), or do not belong (unordered), to a given cluster. The advantage of SPC is that it excludes outliers and noisy data and that the algorithm itself selects the optimal number of clusters [28-29].

1.6.3 Functional Analysis

Genes that cluster tightly together show related functionality. In this manner, co-expression of genes with known function and of poorly characterized or novel genes may provide insights into molecular activities that are not currently available [27,30]. There are many tools to facilitate functional analysis. The most commonly used tool is Gene Ontology (GO) [31] (Figure 3), a database in which all genes and their products are characterized according to a controlled vocabulary of terms. GO uses three subclasses to define a gene: (1) the cellular component of the gene product, (2) its molecular function, and (3) the biological process it regulates. Other projects provide detailed maps of molecular activity, manually drawn as pathways representing existing knowledge of molecular interactions and networks in areas of metabolism, human diseases, and cellular processes (Figure 3). The most prominent of these projects are the KEGG encyclopedia [32] and Biocarta [33].

Page 20: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

8 Introduction | Sara Hägg

Figure 3. Schematic overview of different analysis procedures related to gene expression profiling. A vast amount of DNA microarray data can be organized by clustering algorithms that classify functionally associated genes in terms of co-expression. Biologically relevant information is extracted by using pathway and Gene Ontology (GO) analysis.

1.7 CELL AND ANIMAL MODELS OF ATHEROSCLEROSIS

To validate functional genomics studies in atherosclerosis, it is important to have a good model of the disease. In vitro experiments in all major lesion cell types could provide information on whether a gene is expressed in the cell and what type of action is seen in induced vs. non-induced cells. There could also be major differences depending on what type of in vitro system is used. Some systems use primary cells, such as human umbilical vein endothelial cells derived directly from the umbilical vein. Others use cultured cell lines, such as human endothelial cell line EAHY926, a hybridoma cell line with characteristics of human endothelial cells [34]. Repeatability of experiments is a problem with primary cells, unless they are taken from the same individual under the same conditions—which is difficult to do. Cell lines, on the other hand, are permanent and allow repeated experiments, but because such cells are cloned, they do not have the same properties as primary cells.

In vivo atherosclerosis experiments are conducted predominantly with mouse models. Mice have many advantages. They are small, breed quickly, and are relatively inexpensive to maintain. They are easily genetically modified to mimic human cardiovascular diseases. However, mice have a different lipoprotein profile compared to humans and do not normally develop atherosclerosis. Some strains, when given a

Page 21: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Introduction 9

high-fat diet, can build up plaques, but most atherosclerotic mouse models are genetically engineered.

A common transgenic model is the apolipoprotein E knockout (Apoe-/-) mouse. Apoe is involved in lipoprotein uptake from the circulation by the LDL receptor. Without apoE, mice develop hypercholesterolemia and atherosclerotic lesions. The LDL receptor knockout (Ldlr-/-) mouse has a human-like lipoprotein profile and develops moderate plaques on a normal chow diet and advanced plaques on a high-fat diet. When crossed with mice expressing only apolipoprotein B100 (Apob100/100), which is naturally found in lower amounts in mice and is associated with increased risk for atherosclerosis in humans, the mice (Apob100/100ÿLdlr-/-) develop severe atherosclerosis on a chow diet [16,35].

Page 22: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

10 Aim | Sara Hägg

2 AIM The overall aim of this thesis was to identify functionally associated genes related to atherosclerosis and its co-morbidities by using whole-genome expression profiling in multiple tissues of well-characterized patients with CAD or carotid stenosis and to validate key genes in appropriate model systems of atherosclerosis.

2.1 SPECIFIC AIM OF EACH PAPER

I . To identify groups of functionally related genes in multiple organs important for atherosclerosis development.

I I . To identify biomarkers of adverse outcome after CABG.

I I I . To determine the importance of plaque age for clinical outcome and molecular underpinnings of atherosclerosis.

Page 23: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Methods 11

3 METHODS 3.1 COHORTS

3.1.1 The Stockholm Atherosclerosis Gene Expression (STAGE) Cohort

3.1.1.1 Patient Recruitment

From 2002 through 2004, 124 patients undergoing CABG surgery at the Karolinska Hospital Solna were enrolled in the STAGE study. Exclusion criteria were severe diseases (e.g., cancer, kidney disease, and chronic systemic inflammatory diseases) other than CAD. The studies were approved by the Ethics Committee of Karolinska University Hospital. All patients gave written informed consent. Patients were also characterized by quantitative coronary angiogram (QCA) techniques (see section 3.2.1).

3.1.1.2 Biopsy Collection

Biopsies were taken by four surgeons, who followed a strict protocol to standardize the collection procedure, including a specific order for obtaining each tissue sample in relation to the start of surgery. Anesthesia was standardized to keep systolic blood pressure <150 mm Hg throughout the operation. The time of extraction of each biopsy, deviations from the protocol, and non-routine events were noted. Tissue samples were obtained from each subject at five sites: (1) the wall of the ascending aorta (aortic root) at the site of proximal vein anastomosis, (2) the IMA dissected from the inside of the left chest wall, (3) the skeletal muscle close to the incision in sternum, (4) visceral fat in mediastinal tissue, and (5) liver tissue; the latter biopsy was obtained through a small incision in the diaphragm. The tissue samples were rinsed with RNAlater (Qiagen) to remove blood and immediately put into vials containing fresh RNAlater. The vials were stored at room temperature until the end of surgery, at 4°C overnight, and then at –80°C until further processed. Furthermore, we performed whole-genome expression profiling of 66 samples of liver, skeletal muscle, and visceral fat and 40 samples of the aortic root and IMA, using HG-U133 Plus 2.0 arrays (Affymetrix).

Page 24: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

12 Methods | Sara Hägg

3.1.1.3 Three-Month Follow-up

Patients were scheduled for a follow-up meeting 3 months after the surgery. Three patients had died and seven others did not show up, leaving 114 patients for testing. Using a standard questionnaire, a research nurse obtained a medical history and lifestyle information (e.g., smoking, alcohol consumption, and physical activity). A physical examination was performed and venous blood was sampled. Moreover, the hospitalization time after surgery was defined.

3.1.2 The Carotid Patient Cohort

3.1.2.1 Patient Recruitment

Starting in 2003, 42 patients undergoing carotid endarterectomy at Stockholm Söder Hospital were admitted to the study. Exclusion criteria were severe diseases (e.g., cancer, kidney disease, and chronic systemic inflammatory diseases) other than CAD. The studies were approved by the Ethics Committee of Karolinska University Hospital. All patients gave written informed consent. IMT was measured to estimate the atherosclerosis burden (see section 3.2.2).

3.1.2.2 Biopsy Collection

Two surgeons performed the endarterectomy procedure. The carotid plaque was dissected and immediately embedded in OCT (Histolab Products), frozen in liquid isopentane and dry ice, and stored at –80ºC. We also performed whole-genome expression profiling of 25 carotid stenosis samples using HG-U133 Plus 2.0 arrays (Affymetrix).

3.1.2.3 Three-Month Follow-up

Patients were scheduled for a follow-up meeting 3 months after surgery. Two patients had died and another patient did not show up, leaving 39 patients for testing. Using a standard questionnaire, a research nurse obtained a medical history and lifestyle information (e.g., smoking, alcohol consumption, and physical activity). A physical examination was performed, including venous blood sampling.

3.1.3 The Tartu University Hospital Cohort

Starting in 2007, preoperative blood samples from 250 patients undergoing CABG at Tartu University Hospital were collected according to the STAGE protocol. Before surgery, patients were screened for CAD risk factors by lifestyle questioning, standardized physical examination, and routine lab tests.

3.2 PLAQUE MEASUREMENTS

3.2.1 Quantitative Coronary Angiogram (QCA)

All patients underwent preoperative cardiac catheterization (Judkins technique). The coronary arteries were identified, divided into segments according to the AHA classification [36], and analyzed by QCA. Each segment was measured during end-diastole, and the extent of stenosis was estimated as a percentage of lumen diameter

Page 25: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Methods 13

(LD) reduction, i.e., percentage diameter stenosis (DS) (Figure 4A). A stenosis score was calculated from all atherosclerotic lesions in the coronary arteries (1, 20–50% DS; 2, >50% DS). A mean percentage of plaque area was calculated from the seven biggest segments of the LCA (Figure 4B).

Figure 4. Cross-section of a vessel segment showing angiographic measurements. (A) The LD and the percentage DS, or percentage reduction of LD at a plaque site. (B) The plaque area and the lumen area in a two-dimensional cross-section of a vessel segment. The plaque area of the segment was calculated as (plaque area) / (plaque area + lumen area) and expressed as a percentage.

3.2.2 Intima-Media Thickness (IMT)

Before surgery, the carotid arteries were examined by B-mode ultrasound. Blood flow velocities were recorded, and a velocity >1.2 m/s was used to define a DS >50%. The far wall (better for estimations than the near wall because of the location of the anatomic interface) of the common carotid artery (CCA) and the carotid bulb was used for measurements of IMT on both right and left CCA (Figure 5). The IMTmean was calculated from IMTright and IMTleft, and the IMT from the endarterectomy side (left or right) is referred to as the IMT value.

Figure 5. The CCA and bifurcation. The plaque (gray) is normally situated in the carotid bulb. IMT was measured at the far wall using ultrasound originating from the near wall.

3.2.3 Plaque Age by Carbon-14 Technique

The age of carotid plaque samples was determined by 14C-dating using accelerator mass spectrometry (AMS). The 14C:12C isotopic ratio was measured with Uppsala 5 MV Pelletron Tandem Accelerator (NEC). The fractionation-corrected isotopic ratios were presented in fraction modern (F14C), and the average formation year was extracted based on the measurements of atmospheric CO2 concentration variation in Northern Europe, using the Levin bomb curve data as reference [37-39].

Page 26: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

14 Methods | Sara Hägg

3.3 LABORATORY MEASUREMENTS

Subjects were screened for CAD risk factors, including fasting plasma concentrations and content of very low density lipoprotein (VLDL), LDL, and HDL in cholesterol and triglycerides, fasting glucose levels, insulin and pro-insulin concentrations, liver and kidney failure biomarkers, inflammatory markers such as C-reactive protein (CRP), and anthropometric variables such as body mass index (BMI) and waist-to-hip-ratio (WHR).

3.4 GENE EXPRESSION ANALYSIS

Biopsies collected from different tissues during surgery were preserved at -80C. After RNA isolation, sample quantity and quality was assessed with a spectrophotometer (ND-1000, NanoDrop Technologies) and Agilent Bioanalyzer 2100 before hybridization to HG-U133 Plus 2.0 arrays (Affymetrix). The arrays were processed with a Fluidics Station 450, scanned with a GeneArray Scanner 3000, and analyzed with GeneChip Operational Software 2.0. Gene expression values were pre-processed using the robust multichip average [40] procedure in three steps: (1) background adjustment, (2) quantile normalization, and (3) summarization. Perfect-match Affymetrix probe signals were mapped to transcripts using reference sequence (RefSeq) numbers as identifiers [41]; 15,042 RefSeq transcripts were generated, corresponding to 12,621 unique genes. Some samples showed signs of experimental batch effects (i.e., undesired technical variation, see section 5.1). To address this issue, we used batch normalization methods [42].

3.4.1 Cluster Analysis

Gene expression data were clustered by a coupled two-way approach [43], which identifies gene clusters using the SPC algorithm [28-29] and was stopped after the first iteration. For each gene cluster, patients were classified by hierarchical clustering, using the Manhattan distance between gene profiles and average linkage to calculate the distance between clusters [27]. See section 1.6.2.2.2 for a more detailed description of the SPC algorithm.

3.4.2 Resampling Analysis

To ensure the stability of the clusters, we performed resampling; that is, we constructed data subgroups for cluster reexamination, using Jackknife method with leave-one-out analysis [44], excluding one patient at the time. For each new dataset, clusters were reassembled with the two-way approach. After all runs, a 90% one-sided confidence interval was constructed containing the smallest number of genes that reoccurred in each cluster.

3.4.3 Correlation Analysis

To identify correlations between gene expression levels (mRNA) and other parameters, we used Spearman rank correlation, a nonparametric measure. P-values were calculated with Student´s t-test after a Fisher transformation. For correlation analysis between clinical parameters we used Pearson correlation.

Page 27: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Methods 15

3.4.4 Gene Function Analysis

We used the GO category “molecular function” and pathway analysis to describe the genes of interest. All analyses were performed with Database for Annotation, Visualization and Integrated Discovery (DAVID) software [45], where both KEGG and BioCarta pathways were represented. All gene groups were presented with enrichment p-values and corresponding p-values corrected for the FDR. Genes from the same family were classified using the Panther database (http://www.pantherdb.org/). In addition, we used text mining analysis to generate a list of genes previously related to atherosclerosis. A gene was considered related if it co-occurred in the abstract of a published article in PubMed with the MeSH terms coronary arteriosclerosis, arteriosclerosis, or atherosclerosis or with the text words coronary artery disease, arteriosclerosis, or atherosclerosis.

3.5 GENETIC ENRICHMENT ANALYSIS

We used data from a recent GWAS, the WTCCC study, containing 2000 CAD cases and 3000 controls [19]. Each DNA sample was genotyped on the Affymetrix 500K SNP chip. Analyses were restricted to SNPs that had a minor allele frequency greater than 4% and that did not deviate significantly from Hardy-Weinberg equilibrium. A SNP was considered cis-acting if it was located within 1 megabase of the transcription start or stop codon of the corresponding structural gene according to Affymetrix chip annotation. To calculate an enrichment p-value, we used a random sampling strategy where equally sized groups of genes were selected many times and the percentage of significant SNPs was assessed. With those counts, we built a null distribution to which we compared our observed percentage of SNPs with GWAS p-values < 0.05. We also computed an enrichment p-value using only SNPs whose allele distribution was functionally related to gene expression levels (eSNPs) [46]. See section 5.5 for a discussion of these types of analyses.

3.6 STATISTICAL ANALYSIS

Clinical and metabolic characteristics are given as continuous variables with means and standard deviations (SD) and as categorical variables with numbers and percentages of subjects. For continuous variables, p-values were calculated with unpaired t-tests; skewed values were log-transformed. For categorical variables, chi-square or Fisher exact (n<5) tests were used. Step-wise regression was done to control for covariance. Logistic regression was applied to calculate odds ratios. Statistical enrichment p-values were estimated for gene group overlaps using hypergeometric distribution assumptions. All calculations were performed with Mathematica 6.0, SAS 9.1, R, or StatView 5.0.1.

Page 28: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

16 Results and Discussion | Sara Hägg

4 RESULTS AND DISCUSSION 4.1 IDENTIFICATION OF AN ATHEROSCLEROSIS GENE MODULE IN CAD

PATIENTS (STUDY I)

We used the STAGE cohort to perform whole-genome expression profiling in five tissues: atherosclerotic aortic root, IMA, skeletal muscle, liver, and visceral fat. Expression analysis could not be performed in all tissues from all patients. Therefore, clinical characteristics are presented in expression subgroups of the STAGE patients (i.e., STAGE metabolic contains patients with liver, skeletal muscle and visceral fat profiles and STAGE complete only includes those patients with additional arterial profiles) (Table 1). STAGE entire, metabolic and complete showed similar clinical phenotypes. Thus, the expression subgroups are representative of the entire STAGE cohort.

To highlight atherosclerosis gene expression in the aortic wall, we used atherosclerotic aortic root/IMA ratios (atherosclerotic arterial wall) because both aortic wall and IMA samples contain normal wall gene expression. Unlike the aortic wall, however, the IMA has no atherosclerosis [47] and is therefore used as a graft in the bypass surgery. After data preprocessing (i.e., normalization, background adjustments and probe mapping), we used coupled two-way clustering to identify small and stable gene groups with correlated expression. The coupled two-way approach includes an SPC algorithm that has many advantages over other clustering methods. It is stable against noise, excludes genes with poor correlations, and chooses the optimal numbers of clusters. The first clustering step generated 60 tissue clusters representing 4007 RefSeqs/3958 genes: 20 clusters in visceral fat, 11 in skeletal muscle, 15 in liver, and 14 in atherosclerotic arterial wall samples.

In the second step in the two-way approach, we clustered patients on the basis of their gene expression within each cluster. Two of the 60 clusters segregated patients according to the extent of coronary stenosis (i.e., the stenosis score)—one cluster in the arterial wall (n=49 RefSeqs/48 genes, stenosis score p=0.008) (Figure 6A) and one in visceral fat (n=59 RefSeqs/genes, stenosis score p=0.00015) (Figure 6B). Seven genes were present in both clusters (p=10-10). Resampling strategies (using

Page 29: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Results and Discussion 17

subsets of the original data set to estimate the precision of the clustering algorithm, see section 3.4.2) showed that the clusters were stable and repeatable.

Table 1. Basic patient characteristics.

Characteristics Expression profile

STAGE entire

STAGE Metabolic

p STAGE

complete p

Carotid patients

n (% of total) 114 (100) 66 (58) 40 (35) 25 (100) Age, y (mean±SD) 66 ± 8 66 ± 8 66 ± 8 69 ± 11 Male, n (%) 102 (89) 59 (89) 37 (93) 15 (60) Body-mass index, kg/m2 (mean±SD) 26.6 ± 3.7 26.4 ± 3.9 26.3 ± 3.9 25.3 ± 3.2 Waist-to-hip ratio (mean±SD) 0.94 ± 0.06 0.93 ± 0.06 0.93 ± 0.06 0.91 ± 0.07 Blood pressure, mm Hg (mean±SD)

Systolic 141 ± 19 140 ± 19 135 ± 18 150 ± 19 Diastolic 80 ± 9 80 ± 10 78 ± 8 77 ± 9 Insulin, pmol/L (mean±SD) 62 ± 47 59 ± 49 61 ± 53 44 ± 16 Proinsulin, pmol/L (mean±SD) 5.6 ± 5.7 5.1 ± 5.7 5.5 ± 6.9 4.6 ± 2.4 HbA1c, % (mean±SD) 5.2 ± 1.3 5.0 ± 0.7 5.0 ± 0.6 4.8 ± 0.4 Cholesterol, mmol/L (mean±SD)

Total 4.08 ± 1.01 3.97 ± 1.08 3.83 ± 1.02 4.74 ± 1.21 VLDL 0.32 ± 0.25 0.29 ± 0.25 0.26 ± 0.25 0.22 ± 0.17 LDL 2.09 ± 0.79 2.01 ± 0.84 1.87 ± 0.76 2.60 ± 0.90 HDL 1.49 ± 0.29 1.51 ± 0.33 1.54 ± 0.39 1.74 ± 0.48 Triglycerides, mmol/L (mean±SD)

Total 1.41 ± 0.73 1.36 ± 0.70 1.41 ± 0.76 1.23 ± 0.49 VLDL 1.04 ± 0.67 0.97 ± 0.64 0.98 ± 0.68 0.79 ± 0.42 LDL 0.26 ± 0.09 0.27 ± 0.09 0.28 ± 0.09 0.29 ± 0.09 HDL 0.16 ± 0.05 0.17 ± 0.05 0.19 ± 0.06 <0.01 0.20 ± 0.08 Current smoker, n (%) 8 (7) 4 (6) 2 (5) 1 (4) Former smoker, n (%) 70 (61) 42 (64) 25 (63) 18 (67) Alcohol consumption, g/week (mean±SD) 120 ± 96 117 ± 89 124 ± 82 117 ± 106 Stenosis score (mean±SD) - 5.06 ± 2.41 5.37 ± 2.43 NA IMT, mm (mean±SD) NA NA NA 1.24 ± 0.24 Diabetes mellitus, n (%) 24 (21) 11 (17) 5 (13) <0.05 2 (8) Insulin-requiring 23 (20) 9 (14) 5 (13) 1 (4) Hyperlipidemia, n (%) 84 (74) 49 (74) 27 (68) 13 (52) Statins 101 (89) 61 (92) 37 (93) 15 (60) Hypertension, n (%) 72 (63) 43 (65) 25 (63) 16 (64) Betablocker 103 (90) 62 (94) 38 (95) 11 (44) ACE inhibitors 42 (37) 25 (38) 15 (38) 5 (20) Thiazide diuretics 0 (0) 0 (0) 0 (0) 1 (4) Loop diuretics 26 (23) 13 (20) 10 (25) 3 (12) Calcium-channel blockers 15 (13) 7 (11) 4 (10) 5 (20) p-Values were calculated using unpaired t-tests comparing subgroups in STAGE with the entire STAGE cohort (n=114). Subgroups are included in

the entire cohort. NA indicates not available. HbA1c, glycated haemoglobin; VLDL, very low density lipoprotein; LDL, low density lipoprotein;

HDL, high density lipoprotein; IMT, intima-media thickness; ACE, angiotensin-converting enzyme.

A common statement within the functional genomics field is that clustering always produces clusters and that all different techniques and parameter adjustments will give variations in the result [26]. (See section 5.3 for a discussion of this matter.) Therefore, it is important to evaluate the method used and to validate the findings in several ways. Thus, we used the same two-way clustering approach to look for similar gene clusters in expression profiles from a second atherosclerosis cohort, the carotid patient cohort (Table 1). Eight clusters (904 RefSeqs/894 genes) were identified. One cluster (n=55 RefSeqs/54 genes) segregated the patients according to IMT score (p=0.04; Figure 6C). Remarkably, 16 of the 55 RefSeqs in that cluster overlapped with genes in the visceral fat cluster (p=10-27), and 17 overlapped with the genes in the atherosclerotic arterial wall cluster (p=10-17). Six RefSeqs/genes were found in all three clusters

Page 30: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

18 Results and Discussion | Sara Hägg

related to atherosclerosis severity (p=10-23) and the union of the clusters—defined as the atherosclerosis module (the A-module)—contained 129 RefSeqs/128 genes. Annotation of this module with GO and KEGG revealed a significant association to the transendothelial migration of leukocytes (TEML) pathway (p=6.6ÿ10-5; Figure 7). Eight genes were identical in the TEML pathway and in the A-module, and another 15 genes were related, as shown by Panther family classification. Transendothelial migration of monocytes is essential in the early phases of plaque formation, whereas T-cell migration may be important in later phases [5]. We therefore concluded that many genes in the A-module are involved in the TEML pathway. Future studies may show that a majority of the non-annotated genes also are involved.

Figure 6. Heat maps of the three clusters related to atherosclerosis severity in CAD patients using gene expression signals and coupled two-way clustering in atherosclerosis arterial wall (A), visceral fat (B) and carotid stenosis (C). Columns represent individual patients and rows individual RefSeqs with corresponding gene symbols and mRNA ratios of the two patient groups. Bars below indicate individual stenosis score or IMT together with means and SD, average ratios in each group, and p-values for comparing groups. Cluster A has 49 RefsSeqs or 48 genes. Cluster B has 59 RefSeqs/genes; genes in red were also found in cluster A. Cluster C has 55 RefSeqs or 54 genes; genes in red were also found in cluster A or B.

Page 31: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Results and Discussion 19

Figure 7. The TEML pathway. Red genes were found in the A-module (p=6.6ÿ10-5), and blue genes were associated with genes in the A-module according to Panther family classification. In all, 23 of 128 A-module genes were part of or related to TEML.

To determine whether genes in the A-module are causally important for atherosclerosis, and not merely reactive markers of disease, we sought to determine whether SNPs in any of our 128 genes were enriched for CAD risk. Indeed this was the case, whether we used SNPs linked to expression phenotypes, eSNPs (p=0.004) or not (p=0.003). When only eSNPs are included in the analysis, it increases the potential to extract much more disease-relevant information. This approach will likely be used much more frequently in the future, although we did not see any large differences in our analysis using eSNPs or not.

4.2 LDB2 AS A REGULATOR OF THE ATHEROSCLEROSIS MODULE

(STUDY I)

Of the six genes in the intersection of the three clusters, only one—LIM domain binding 2 (LDB2)—was related to transcriptional regulation, which suggested that it might regulate the activities of the genes in the A-module. A gene expression network was inferred where nodes (i.e., genes) were included if present in at least two of the three separate networks generated from the same expression profiles as the clusters related to atherosclerosis severity. LDB2, one of 49 nodes, had the highest degree of interconnectivity with 19 of 55 edges in total. In silico analysis showed that LDB2 interacts with seven transcription factors (TFs) that have LIM-binding domains. Many

Page 32: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

20 Results and Discussion | Sara Hägg

of these TFs had previously been linked to atherosclerosis activities, such as insulin signaling, angiogenesis, and lymphocyte regulation. At least one of these TFs could bind to target promoters in 81% of the genes with identified promoter sequences in the atherosclerosis module (122 of 128 genes). In relation to other known human promoters, the binding of LIM domain TFs to the A-module genes was enriched (p=0.01). Thus, bioinformatic analyses provide further evidence that LDB2 is important for the A-module and perhaps also for TEML gene regulation.

To further investigate the role of LDB2, we carried out functional validation studies in vitro and in vivo to assess the presence of LDB2 in three major atherosclerosis cell types: ECs, macrophages, and SMCs. In ECs, immunohistochemical staining showed that LDB2 co-localizes with the endothelial marker von Willebrand factor before lesion development; the co-localization was less obvious after lesions had progressed. This finding was confirmed by qRT-PCR analysis in a human EC line, which showed higher expression levels of LDB2 in noninduced cells.

LDB2 also co-localized with the macrophage marker CD68 in early and late lesions. Staining for both markers was a bit stronger in late lesions, although CD68 staining was more intense. Consistent with these findings, LDB2 mRNA expression increased with the differentiation of a human monocytic cell line (THP-1) to macrophages and foam cells. Table 2. mRNA levels measured by qRT-PCR from the aortic arch of 6-week old mice deficient in Ldb2 (Ldb2-/-) and wild-type littermate controls (Ldb2wt/wt). Category Gene Symbol Ldb2wt/wt Ldb2-/- p-Value

A-module genes associated to TEML

Claudin 5 Cldn5 307±108 397±271 0.47

Phospholipase C gamma 2 Plcg2 461±65 726±219 0.019

Cadherin 5 Cdh5 352±114 603±179 0.011

Chemokine (C-X-C motif) ligand 12 Cxcl12 498±103 715±168 0.015

Platelat/endothelial cell adhesion molecule Pecam1 345±122 564±157 0.016

Angiotensin II receptor-like 1 Aplnr 435±253 846±404 0.069

Kinase insert domain receptor Kdr 386±224 964±555 0.043

Protocadherin 12 Pcdh12 491±188 785±339 0.10

Protein Kinase N3 Pkn3 410±193 1076±697 0.050

Protein kinase C eta Prkch 547±199 1045±369 0.019

Protein tyrosine phosphatase receptor type B Ptprb 486±167 1115±575 0.030

Tek tyrosine kinase (endothelial) Tek 430±122 1068±551 0.021

Tyrosine kinase with immunoglobulin-like and EGF-like domains 1 Tie1 524±170 895±374 0.056

Other TEML genes

Intercellular adhesion molecule 1 Icam1 405±54 533±73 0.0042

F11 receptor F11r 388±59 614±151 0.0037

Junction adhesion molecule 2 Jam2 452±70 616±137 0.018

Junction adhesion molecule 3 Jam3 567±53 741±163 0.022

Vascular cell adhesion molecule 1 Vcam1 492±83 730±134 0.0025

Thymus cell antigen 1 Thy1 556±158 707±264 0.23

CDC42 effector protein (Rho GTPase binding) 5 Cdc42ep5 540±127 622±119 0.26 Values are mean±SD. p-Values are calculated with unpaired t-test. Values are normalized to acidic ribosomal phosphoprotien P0 and TATA box binding protein. Ldb2-/-, n=5-6; Ldb2wt/wt, n=6-7.

In SMCs, LDB2 co-localized with SM22, a marker of lesion SMCs, but some areas only showed LDB2 staining. The qRT-PCR results confirmed the expression of LDB2 in primary SMCs.

Thus, LDB2 was expressed in all major atherosclerosis cell types. Before lesion formation and in early lesions, LDB2 appeared mainly in ECs; in late lesions, it

Page 33: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Results and Discussion 21

appeared mainly in macrophages/foam cells and in SMCs. Thus, the expression of LDB2 seems to correspond to plaque progression. ECs are important for leukocyte recruitment in early plaque development, whereas macrophages/foam cells and SMCs are responsible for lesion formation.

The TEML pathway has also been implicated in all three atherosclerosis cell types [48-50] and could potentially be regulated by LDB2. Therefore, we examined mRNA levels of 20 genes central to TEML in the arterial wall of 6-week-old Ldb2-/- mice (Table 2). All genes were expressed at higher levels than in wild-type littermates, and 13 genes were significantly changed. These young Ldb2-/- mice had not yet developed atherosclerosis, and the regulatory role of Ldb2 is not fully understood. However, some mechanistic change was obvious, and five of these genes have been knocked in to create mouse models of atherosclerosis [51-55]. We have also seen an upregulation of Ldb2 expression between early and late lesions in atherosclerotic-prone mice. Again it could be emphasized that LDB2 levels differ according to the stage of plaque progression and are increased in the cell types that are important at the early and late stages of lesion development.

4.3 DUSP1 – AN ANTI-INFLAMMATORY MARKER OF POSTOPERATIVE STAY

AND COMPLICATIONS (STUDY II)

Diabetes and hyperglycemia are associated with an increased risk of mortality and morbidity after CABG [56-60]. Therefore, we divided the STAGE cohort into groups based on their diabetes status: 17% of the patients had a clinical diagnosis of type 2 diabetes at admission (n=11; 2 oral drugs, 6 insulin treated, 3 diet only), 15% had undiagnosed diabetes as indicated by a preoperative fasting blood glucose (FBG) level of ≥6.1 mmol/l (n=10), and 11% were defined as pre-diabetes, with FBG levels of 5.6 to <6.1 mmol/l (n=7). The remaining patients (57%, n=38) had a normal FBG of <5.6 mmol/L (Table 3) [61].

As expected, patients with type 2 diabetes and those with previously undiagnosed diabetes had prolonged postoperative rehabilitation (hospital and rehabilitation time) (Figure 8A) and hospitalization (only hospital time) (Figure 8B) than patients without diabetes. The main reason for the prolonged stays was atrial fibrillation (AF); other causes were infection, bleeding, kidney failure, stroke, and pneumothorax (Figure 8C).

There is growing evidence that systemic inflammation is the primary cause of AF [62-63]. Therefore, we hypothesized that patients with ongoing but subtle inflammation would be more prone to these complications after CABG. To search for molecular markers of prolonged stay and complications after surgery, we analyzed the STAGE gene expression profiles. In skeletal muscle, liver, and visceral fat we correlated individual gene profiles to rehabilitation days using Spearman rank correlation. Many of the correlated genes in skeletal muscle are involved in inflammatory processes (Table 4); the most significant gene was dual-specificity phosphatase 1 (DUSP1).

Page 34: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

22 Results and Discussion | Sara Hägg

Table 3. Basic characteristics of the STAGE cohort divided in subgroups of diabetes status.

Non Pre- Non Tartu

STAGE DM diagnosed diabetics diabetics CABG

(n=66) (n=11) (n=10) (n=7) (n=38) (n=54)

CONTINUOUS VARIABLES Mean SD Mean SD P Mean SD P Mean SD P Mean SD Mean SD

Age – yr 66 8 68 9 68 7 68 11 65 8 67 7,9

Preoperative Preop

FBG - mmol/L 5,6 1,4 6,8 2,2 ** 7,1 1,1 *** 5,7 0,1 *** 4,8 0,4 NA

HbA1c - % 5,2 1 7,1 1,2 *** 5 0,4 4,7 0,4 4,8 0,4 NA

Creatinine - mmol/L 88,8 21,6 95,5 29,5 99,6 36,2 90,5 12,5 83,5 12,2 NA

3 Months postoperative

BMI - kg/m2 26,4 3,9 28,1 4,8 24,6 3,1 25,6 3,9 26,6 3,7 29,1 4,8

WHR 0,93 0,06 0,95 0,07 0,92 0,06 0,93 0,06 0,93 0,05 1 0,1

Blood pressure – mmHg

Systolic 140 19 141 15 157 20 ** 135 22 136 18 132 22

Diastolic 80 10 75 9 86 7 * 79 7 80 10 76 10

HbA1c - % 5 0,7 6 0,8 *** 5 0,4 4,9 0,4 4,8 0,4 6,6 1,1

Creatinine - mmol/L 87,6 20,1 96 27,2 101 34,2 92 7,6 ** 81,5 11,9 80,6 27,4

CRP - mg/L 5,9 9 5,1 3,5 11,6 14,2 7,3 8,2 4,4 8,2 3,5 3,7

Cholesterol – mmol/L

Total 3,97 1,08 3,49 1,07 3,85 0,78 3,74 1,12 4,18 1,13 4,85 1,2

VLDL 0,29 0,25 0,29 0,38 0,29 0,18 0,28 0,24 0,29 0,23 NA

LDL 2,01 0,84 1,62 0,71 * 1,89 0,65 1,77 1,04 2,19 0,85 3,32 1,12

HDL 1,51 0,33 1,45 0,19 1,54 0,35 1,54 0,54 1,51 0,32 1,39 0,63

Triglycerides – mmol/L

Total 1,36 0,7 1,27 0,71 1,23 0,58 1,3 0,66 1,44 0,74 1,46 0,7

Alcohol – g/week 117 89 70 52 ** 138 80 92 55 130 100 43 58

Angiographic score

Mean % SA 7,5 1,4 7,5 1,4 6,4 2,2 7,2 0,9 6,9 1,7 NA

Stenosis score 5,06 2,41 5,64 4,15 4,9 2,18 5 2,31 4,94 1,8 NA

CATEGORICAL VARIABLES No % No % P No % P No % P No % No %

Sex ***

Male 59 89 7 64 9 90 7 100 36 95 32 59

Female 7 11 4 36 1 10 0 0 2 5 22 41

Smokers

Current 4 6 1 9 0 0 1 14 2 5 6 11

Former 42 64 6 55 7 70 4 57 25 66 12 22

Non 20 30 4 36 3 30 2 29 11 29 36 67

Preop heart attack 7 11 1 9 5 50 ** 1 14 6 16 NA

Present disease

Hypertony 43 65 7 64 9 90 * 6 86 21 55 34 63

Hyperlipidemia 49 74 9 82 8 80 6 86 26 68 29 54

Postop dyspnea

Never 24 38 3 27 5 50 2 29 15 41 NA

Heavy effort 14 21 1 9 1 10 1 14 11 30 NA

Moderate effort 20 30 5 46 3 30 3 43 9 24 NA

Easy effort 6 9 2 18 1 10 1 14 2 5 NA

Treatment

Statins 61 92 11 100 10 100 7 100 33 87 32 60

Betablockers 62 94 10 91 9 90 6 86 37 97 37 70

Insulin (oral/subc) 9 14 5 45 0 0 0 0 0 0 NA

FBG, fasting blood glucose; HbA1c, glycated haemoglobin; BMI, body mass index; WHR, waist-hip ratio; CRP, c-reactive protein; VLDL, very low density lipoprotein; LDL, low density lipoprotein; HDL, high density lipoprotein; SA, segment area; P-values are compared to the non diabetics group. * p<0.05; **p<0.01; ***p<0.001; NA, Not Applicable

As the name implies, DUSP1 protein has dual specificity—it dephosphorylates both tyrosine and threonine—and works as an inactivator of genes involved in a signaling cascade that triggers the innate immune system [64]. Through a negative feed-back loop, DUSP1 is upregulated in response to proinflammatory stimuli such as oxidative stress and cytokine secretion [65-66]. DUSP1 possesses regulatory properties, is important in various diseases [67], and has also been associated with the

Page 35: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Results and Discussion 23

A B

C Figure 8. Postoperative stay of patients with diabetes and types of complications in the STAGE cohort. Box plots (values between 25th and 75th percentiles) and bars (between 10th and 90th percentile) of the number of rehabilitation days (A) and hospitalization days (B) in the diabetic groups. (C) Number of STAGE patients with specific postoperative complication. Dots show extreme values.

Table 4. Correlations between inflammatory genes and rehabilitation days in STAGE.

Skeletal muscle Gene Corr P-value Dual specificity phosphatase 1 0,44 0,0002 Zinc finger protein 36 0,41 0,0006 Splicing factor, arginine/serine-rich 10 0,38 0,002 Mitogen-activated protein kinase kinase 1 interacting protein 1 0,36 0,003 Matrix metallopeptidase-like 1 -0,36 0,003 Natural cytotoxicity triggering receptor 1 -0,36 0,003 Interferon gamma receptor 1 0,35 0,004 Forkhead box H1 -0,35 0,004

Liver Gene Corr P-value Major histocompatbility complex, class II, DO beta -0,45 0,0001 Lymphotoxin beta -0,39 0,001

Visceral fat Gene Corr P-value Contactin 6 0,36 0,002 A kinase anchor protein 5 -0,28 0,006

extent of atherosclerosis [68]. Thus, we considered DUSP1 to be a good candidate as an anti-inflammatory marker.

The mRNA levels of DUSP1 not only segregated the STAGE patient subgroups with the longest and shortest rehabilitation (Figure 9A) and hospitalization (Figure 9B) but also discriminated all patients into those with normal (8 days) and prolonged (>8 days) rehabilitation (p=0.004, Figure 9C) and hospitalization (p=0.003, Figure 9D).

Page 36: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

24 Results and Discussion | Sara Hägg

Figure 9. DUSP1 mRNA levels in skeletal muscle and postoperative stay in STAGE. Box plots of number of rehabilitation days (A) and hospitalization days (B) in patients with low (n=10) and high (n=10) mRNA levels of DUSP1 as indicated by the GeneChip profiling. Bar graphs of DUSP1 mRNA levels in all patients with normal (≤8 days) and long (>8 days) rehabilitation (C) and hospitalization (D). Box plots of DUSP1 qRT-PCR expression in patients with normal (n=13) and long (n=12) rehabilitation (E) and normal (n=10) and long (n=8) hospitalization (F). The boxes enclose values between the 25th and 75th percentiles and the bars between the 10th and 90th percentiles. Dots show extreme values. Bar graph plots show means and SDs.

qRT-PCR confirmed the difference in DUSP1 mRNA levels between those with the shortest and longest rehabilitation and hospitalization (Figure 9E and F).

To validate DUSP1 as a candidate marker for prolonged hospitalization stay after CABG, we needed a second validation cohort. For this purpose, we used preoperative blood samples drawn from CABG patients admitted to Tartu University Hospital in Estonia (Table 3). Skeletal muscle is not a feasible sample to use in clinical practice. Therefore, we measured plasma levels of DUSP1 protein rather than skeletal muscle mRNA. DUSP1 levels were assessed with ELISA, and hospitalization times were gathered for the Tartu cohort. DUSP1 blood levels clearly separated patients with 8 or fewer hospital days from those with more than 8 days (Figure 10).

Page 37: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Results and Discussion 25

Figure 10. Preoperative blood levels of DUSP1 in 109 patients with normal (≤8 days) and in 72 patients with long (>8 days) hospitalization after CABG in a validation cohort. The boxes enclose values between the 25th and 75th percentiles, and the bars between the 10th and 90th percentiles. Dots show extreme values.

In summary, we show that elevated plasma levels DUSP1 protein in patients

before CABG is a marker of increased incidence of co-morbidities and prolonged hospitalization after surgery. To identify CAD patients listed for CABG that are at higher risk of postoperative complications, and therefore should be rescheduled for later surgery, would be beneficial both for the individual and from a health economic perspective minimizing the overall risks.

4.4 A NOVEL PLAQUE CHARACTERISTIC – CAROTID PLAQUE AGE

DETERMINED BY CARBON-14-DATING (STUDY III)

In the carotid cohort, we assessed a novel characteristic: plaque age measured by 14C-dating, carried out by accelerator mass spectrometry (AMS). The biological age of the plaques was measured by determining the relative 14C levels and mapping it to atmospheric levels, using the radiocarbon Bomb curve (Figure 11). In the 1950s, the atmospheric concentration of 14C increased rapidly as a consequence of nuclear bomb testing. Our study is to our knowledge the first attempt to characterize plaque age.

Figure 11. The Bomb curve. Carotid plaque samples were 14C-dated with AMS. The cellular birth dates were inferred by determining the time at which the sample´s 14C concentration corresponded to the atmospheric concentration, which peaked during the 1950s after nuclear bomb testing.

The sample-to-sample variation of plaque age was relatively low with a standard deviation of 3.31 years, suggesting a relatively homogenous biological age. Technical replicates also supported this by an average of 1.15 years of difference between samples. A plaque constitutes many cell types and other components which make it

Page 38: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

26 Results and Discussion | Sara Hägg

biologically heterogeneous and therefore our finding was surprising. All plaques except for two were generated 5-15 years before carotid surgery which also suggested that they were not developed slowly but exponentially over a short period of time. The same observation has been seen in atherosclerosis development in mouse models [69].

The average plaque age in all patients was 9.61 years (Table 5). When subdividing the carotid patients into three groups (terciles) based on their plaque age measurements we found that the low plaque age group had significantly higher insulin levels, consumed more alcohol and to be using statins more frequently (Table 5).

Table 5. Basic carotid patients characteristics divided by plaque age terciles.

Characteristics All patients (n=29)

High plaque age

(n=10)

Intermediate plaque age

(n=9)

P Low plaque age

(n=10)

P

Plaque age, y (mean±SD) 9.61 ± 3.31 13.23 ± 1.25 9.48 ± 1.19 *** 6.11 ± 1.73 ***

Plaque start age, y (mean±SD) 58 ± 10 56 ± 8 63 ± 9 57 ± 12

Age, y (mean±SD) 68 ± 10 69 ± 8 72 ± 9 63 ± 12

IMT, mm (mean±SD) 1.25 ± 0.23 1.44 ± 0.30 1.13 ± 0.17 1.19 ± 0.09

Male, % (n) 55 (16) 56 (5) 63 (5) 67 (6)

Body-mass index, kg/m2 (mean±SD) 25.1 ± 3.6 24.2 ± 4.1 25.8 ± 2.7 25.5 ± 3.9

Waist-to-hip ratio (mean±SD) 0.90 ± 0.06 0.87 ± 0.04 0.92 ± 0.05 * 0.92 ± 0.07

Blood pressure, mm Hg (mean±SD)

Systolic 154 ± 19 155 ± 11 152 ± 24 155 ± 15

Diastolic 78 ± 8 76 ± 8 79 ± 10 79 ± 9

Insulin, pmol/L (mean±SD) 45 ± 18 31 ± 11 47 ± 15 * 57 ± 18 **

Proinsulin, pmol/L (mean±SD) 4.9 ± 2.6 5.2 ± 3.1 4.3 ± 2.5 5.2 ± 2.2

HbA1c, % (mean±SD) 4.9 ± 0.5 5.1 ± 0.6 4.6 ± 0.4 5.0 ± 0.4

CRP, mg/L (mean±SD) 10.5 ± 16.1 7.70 ± 10.27 9.40 ± 14.84 14.25 ± 21.99

Cholesterol, mmol/L (mean±SD)

Total 4.64 ± 1.12 4.79 ± 1.32 4.96 ± 1.26 4.21 ± 0.69

VLDL 0.25 ± 0.17 0.28 ± 0.22 0.21 ± 0.10 0.26 ± 0.16

LDL 2.52 ± 0.87 2.60 ± 0.96 2.81 ± 1.00 2.19 ± 0.60

HDL 1.69 ± 0.41 1.75 ± 0.54 1.72 ± 0.34 1.61 ± 0.33

Triglycerides, mmol/L (mean±SD)

Total 1.29 ± 0.46 1.30 ± 0.62 1.19 ± 0.32 1.38 ± 0.42

VLDL 0.85 ± 0.40 0.88 ± 0.60 0.73 ± 0.23 0.92 ± 0.32

LDL 0.30 ± 0.10 0.28 ± 0.08 0.31 ± 0.09 0.30 ± 0.12

HDL 0.19 ± 0.04 0.19 ± 0.04 0.19 ± 0.03 0.19 ± 0.05

Smoking status NA NA

Current, % (n) 7 (2) 20 (2) 0 (0) 0 (0)

Former within two years, % (n) 38 (11) 40 (4) 63 (5) 22 (2)

Former more than two years ago, % (n) 28 (8) 0 (0) 25 (2) 67 (6)

Non, % (n) 21 (6) 40 (4) 13 (1) 11 (1)

Alcohol, g/week (mean±SD) 102 ± 108 58 ± 53 114 ± 164 142 ± 89 *

Diabetes mellitus, % (n) 7 (2) 11 (1) 0 (0) 11 (1)

Insulin-requiring 3 (1) 0 (0) 0 (0) 11 (1)

Hyperlipidemia, % (n) 48 (14) 56 (5) 63 (5) 44 (4)

Statins 59 (17) 44 (4) 57 (4) 100 (9) *

Hypertension, % (n) 62 (18) 78 (7) 75 (6) 56 (5)

Betablocker 38 (11) 44 (4) 50 (4) 33 (3)

IMT, intima-media thickness; HbA1c, glycated haemoglobin; CRP, c-reactive protein; VLDL, very low density lipoprotein; LDL, low density lipoprotein;

HDL, high density lipoprotein; * p<0.05; **p<0.01; ***p<0.001; vs. high plaque age group; NA, Not Applicable

In univariate Pearson correlation analyses of plaque age and other clinical

parameters, plasma insulin levels and plaque age were inversely correlated (r=–0.59, p=0.0014). Surprisingly, the correlations with patient age and plaque size (i.e., IMT) were much weaker and statistically insignificant (r=0.34, p=0.09 for patient age and r=0.29, p=0.24 for IMT). In bivariate correlation analyses including the four top-ranked

Page 39: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Results and Discussion 27

parameters from the univariate analyses (plasma insulin levels, patient age, alanine aminotransferase (ALAT), gamma glutamyltransferase), the inverse correlation with plasma insulin levels was shown to be independent. In addition, plaque age was associated with smoking status when dividing patients into three groups (former smokers: 6.7±2.1 vs. non smokers: 11.5±3.0 (p=0.004); vs. smokers: 10.3±3.1 (p=0.01)).

A high insulin level may indicate insulin resistance, a condition seen in type 2 diabetes and a common co-morbidity in cardiovascular disease. Insulin resistance also increases the levels of proinflammatory cytokines and acute-phase mediators, a state that will promote plaque instability and cause symptomatic carotid atherosclerosis [70-72]. Although the plasma insulin levels in the low and intermediate plaque age groups did not reach the level of manifested insulin resistance, we believe it could be important. Moreover, markers of liver function such as ALAT are largely influenced by inflammatory states and have been positively correlated with insulin levels [73], as in our study.

Smoking status is also important for CVD. Cigarette smoke contributes to premature cellular aging and endothelial dysfunction and thereby works as a proinflammatory agent acting through many different pathophysiological pathways to promote atherosclerosis.

Last, we performed two-way clustering with 24 plaque gene expression profiles from the carotid patient cohort. Of eight carotid stenosis clusters identified, one (n=13 RefSeqs/genes) segregated the patients in two groups according to plaque age (Figure 12). In a GO analysis of this cluster, the biological process immune response was the most significant term (p-value=0.001). In all, eight genes were involved in immune or inflammatory processes linked to atherosclerosis. The expression levels of these genes follow the same inverse relation to plaque age as plasma insulin levels. Thus our earlier discussion on insulin resistance and proinflammatory mediators is relevant to these gene expression results as well. To summarize, we believe that increased plasma insulin levels lead to increased inflammatory gene activity in the plaque, triggering plaque growth as reflected by the low plaque age, which in turn probably leads to a more vulnerable plaque and early clinical complications (e.g., TIA and stroke).

Page 40: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

28 Results and Discussion | Sara Hägg

Figure 12. A gene cluster in carotid plaques relating to plaque age. We identified groups of functionally related genes important for plaque age by two-way clustering analysis of gene expression datasets from carotid plaques. In the first step, a cluster algorithm was used to determine the total number of functionally related gene groups (i.e., gene clusters) in the carotid plaques calculated using 24 gene expression profiles. Eight gene clusters were identified. In the second step, one (n=13 RefSeqs/genes) of the eight clusters segregated the carotid stenosis patients into two groups according to plaque age (p-value=0.04), suggesting these genes are important to this phenotype. Eight of the 13 genes were related to immune or inflammatory processes previously linked to atherosclerosis.

Page 41: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | General Discussion 29

5 GENERAL DISCUSSION 5.1 EXPERIMENTAL PLANNING

Experimental design is crucial for the success of any study. In microarray experiments, with large sample sizes, it is of even greater importance. Ideally all samples should be run from the same array batch, on the same day, and by the same person. However, this is not feasible. Therefore, we have to learn how to control for technical variation [30,74].

Technical variation is seen when the same biological sample is analyzed on two different microarray chips. Replicates should then be combined into one sample by averaging the intensities. Biological variation is seen when two different samples are analyzed in the same run. The aim is usually to make statistical inferences about different biological conditions and thus biological replicates are needed, although the number of samples required depends on the variation level of expression in the samples.

To control for confounding factors such as technical variation, samples must be randomized. In microarray experiments, many steps have to be well thought-out, including biopsy collection, tissue preservation, RNA extraction, sample preparation, hybridization, and chip scanning. In many studies, the date of hybridization to the chip is a main confounder. Bias may also be introduced by the sample protocols used. In the Affymetrix microarray platform, two different amplification protocols are used, depending on the amount of starting total RNA, leading to technical variation in samples.

There are, however, some methods available to compensate for technical sample variation. Even small sample sizes can be adjusted for nonbiological variation [42]. However, to separate batch effects from true biological effects, all samples need to be properly randomized.

5.2 GENE EXPRESSION PROFILING OF HETEROGENEOUS SAMPLES

In this thesis, we analyzed mRNA expression levels in different tissues relevant to atherosclerosis development. Common to all tissue types is the diversity of cells defining the structure of these samples. The microarray data reflect signals from combinations of different cell types and cell interaction activities. In this sense,

Page 42: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

30 General Discussion | Sara Hägg

interpretation of expression data could be difficult, since the precise cellular source of a signal cannot be accounted for. This situation is different from analysis of homogeneous cells in culture. However, when an in vitro disease cell model is removed from its natural environment in the plaque, many important interactions affecting gene expression may be missed. Some studies have been conducted to deal with these issues. Laser capture microdissection technique can extract cells from the local tissue environment before RNA isolation and has been successfully used for expression profiling of macrophages from atherosclerotic lesions [75]. Nevertheless, laser capture microdissection is a demanding task. Each cell must be manually extracted, and hundreds of cells from every sample are required to obtain a sufficient amount of total RNA for analysis. In addition, it is usually difficult to maintain high RNA quality after such procedures.

We believe it is important to catch signals from interacting cells through whole tissue sampling, especially in atherosclerosis, where different cells act together throughout plaque development. An example of this is the interaction of ECs and monocytes in the TEML pathway. However, to extract pathophysiological signals, the tissues collected must be disease relevant. Blood sampling is sometimes a good alternative. For example, in study II, we found a biomarker in blood, DUSP1, that may be useful in clinical practice. Nonetheless, blood contains many diverse circulating cells and macromolecules excreted from all tissues in the body and thus is less specific for disease processes than relevant tissues in complex disorders such as atherosclerosis and obesity [76].

5.3 CLUSTERING OF GENE EXPRESSION PROFILES

Gene expression data capture the biological state in a sample tissue, but cannot reveal whether this state is causing the disease or merely is reactive and thus secondary to disease development. Clustering is commonly used to identify a biological state related to gene groups within these large datasets (see section 1.6.2.2). One advantage of this type of analysis over single gene analysis (see section 1.6.2.1) is that it does not have to deal with the multiple testing issue. However, several other issues must be considered before starting the cluster analysis [26,77]. The type of clustering algorithm to choose depends on the type of data available. The accuracy of the expression profiles is influenced by the experimental design and technical variation (see section 5.1 above) and by the preprocessing of the data. If some values are missing, leading to incomplete datasets, how is this dealt with by the algorithm? The level of variation (i.e., noise) in the data may differ, and a method must be selected to handle outlier data points. Such problems should be addressed in a proper way in order to apply the best clustering algorithm with the preferred distance and linkage measures. For example, in vitro cultured cells usually have less variation than human tissue samples. Therefore, use of an Euclidean distance measure would be adequate for the cell samples. However, Spearman rank correlation would be better suited to deal with the outlier issue in human samples. Choosing a set of parameters that are not optimal for the current dataset could lead to misinterpretation of the data. Another set of parameter choices that better fit the type of condition

Page 43: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | General Discussion 31

studied would render a completely different result. Therefore, it is of utmost importance to conduct extensive validation of the clustering results by several different methods to justify the biological relevance.

5.4 HUMAN GENE EXPRESSION STUDIES IN ATHEROSCLEROSIS

Early studies of human gene expression in atherosclerosis were often conducted using small sample sizes and single gene detection algorithms [78]. A piece of the carotid or coronary artery was used or sometimes an aortic tissue. The classification was based on stenosis grading or plaque stability vs. instability. A problem in CVD expression study design is, of course, the difficulty of obtaining control tissues. Some studies use IMA as an internal control [78], as we did in study I of this thesis. This artery is atherosclerosis free [47]. However, ideally an aged matched, atherosclerosis free control cohort providing the same tissue types would be preferred. If blood samples were used instead, the sampling of a control cohort would be easier since blood, unlike coronary arterial wall samples, can be obtained without going through open heart surgery. In fact, gene signatures have been successfully derived from studies of blood gene expression profiles [79]. On the other hand, those samples have a high degree of cellular heterogeneity, as alluded to above.

The CVD research field has by tradition a strong clinical connection. For the most part, however, the field has not adopted new high-throughput methods and systems biology approaches. In other fields such as cancer, much more effort has been put into these new types of studies. One reason for this difference lies in the nature of the diseases. Tumors and other cancer tissue types show great deviations from normal tissues, which have been successfully described using expression profiling and prediction models [80]. Lately, however, the use of systems biology approaches and pathway tools for cardiovascular disease research has increased [81] and our study (study I) is a good example of this.

5.5 GENETICS OF GENE EXPRESSION STUDIES

Recently a new type of study for elucidating the mechanism of common complex diseases has arrived—genetics of gene expression (GGE) studies. Here a combination of gene expression profiling, genotyping, and disease phenotype assessment of human cohorts have provided insights into disease gene networks. One study in Nature by Emilsson et al. [76] collected blood and adipose tissue samples from an Icelandic cohort. By linking SNP allele frequencies to expression levels, they showed that heritability increases with the number of significant eSNPs. Furthermore, from expression data in adipose tissue, they identified a gene module that was significantly enriched by genetic association to human obesity.

A study by Schadt et al. [46] used human liver samples to perform a GGE study. They also integrated other GWAS data associated with diabetes and demonstrated how this approach could identify candidate susceptibility genes. In paper I, we showed an enrichment of genetic risk for CAD in the atherosclerosis module by combining eSNPs and a GWAS associated with CAD. Similarly, we also showed an enrichment of CAD risk using all SNPs in the GWAS. However, linking genetic variants to

Page 44: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

32 General Discussion | Sara Hägg

expression traits in human disease populations clearly has the potential to identify functional SNPs, making the targeted genes ideal candidates for drug response.

Page 45: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Future Considerations 33

6 FUTURE CONSIDERATIONS 6.1 BIOLOGICAL RESEARCH MOVING FORWARD

Today, conventional cardiovascular medicine and research focus on preventing disease by modifying established risk factors and by treating clinical manifestations of the disease. The study hypothesis derives from candidate gene approaches, where single disease gene and protein interactions are studied and targeted. Lately we have seen a paradigm shift in biology where the “omics” era has started to evolve; high-throughput technologies have appeared and the output is defined in terms of genomics, transcriptomics, metabolomics, and proteomics [82]. A new discipline has emerged—systems biology, where the goal is to identify and describe all parts and interactions in biological networks. The reason for this change in research focus lies in the recent advances in several fields: (1) the human genome was successfully sequenced, (2) multiple disciplines moved into the biology field, and mathematicians, physicians, biologists, chemists, and computer scientists started to work together, (3) the internet created the basis for large data exchanges through public databases, and (4) the development of large-scale methods for analysis made it possible to gather global data sets [83].

The STAGE study used a systems biology framework to identify disease-relevant groups of genes and their interactions. By whole-genome gene activity measurements in multiple organs, we were able to capture molecular disease signatures important for CAD development in several tissue types of the body. In a recent paper by Schadt and co-workers [84], tissue-to-tissue co-expression networks were constructed that consisted of genes in the hypothalamus, liver, and adipose tissues in obese mice. By including only genes whose expression correlated between tissues, they could highlight the communication between tissues and identify highly connected subnetworks enriched for functional categories important for obesity. They concluded that single tissue expression networks would have foreseen most of the relevant genes detected up to date, emphasizing the significance of entire system analysis for understanding complex diseases.

In this sense, systems biology in medical research will lead to improved capacity to model molecular networks and the ability to study drug interactions both in animals and humans. A common misconception in the traditional clinical research community

Page 46: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

34 Future Considerations | Sara Hägg

is that we do not, at this point, have all the tools required to conduct such studies. However, by embracing the complexity and studying the whole system, instead of single parts of it, we can differentiate and extract more subtle information that otherwise would have been hidden, as exemplified in the obesity study described above. Moreover, the combinations of systems biology approaches and GGE studies will in the future lead to much bigger studies with the potential to obtain much more detailed mechanistic insights into common human diseases. Finally, progress in new technology will eventually lead to better diagnostic tests for clinical use.

6.2 PERSONALIZED MEDICINE

Future cardiovascular medicine will be much more tailored to prevent disease at an earlier stage. Current therapies are administered in the same way to a population with heterogeneous background and different severities of disease. As large-scale measurements of the genome/transcriptome/metabolome/proteome are available, patient screening for subclassification of disease phenotypes will advance. Great progress has been made in identifying biomarkers suitable for the clinic. By using nanotechnology and microfluidics platforms, small sample amounts could be tested for proteome multiparameter diagnosis [83]. Many protein measurements together represent a state in a biological system, and by knowing the molecular interactions of the disease network, the results could be analyzed to understand and monitor the disease. In combination with the genetic predisposition of a patient, an individualized risk profile and prevention plan with tailored pharmacotherapy could be carried out.

Page 47: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Concluding Remarks 35

7 CONCLUDING REMARKS During the last decade, biological research has evolved rapidly as a consequence of incorporating new high-throughput techniques and systems biology approaches into the study design. This thesis is an example of how these new studies have the potential to expand the medical field and contribute to the identification of new disease mechanisms and reveal their interplay. We used whole-genome expression profiling in well-characterized CVD patients with atherosclerosis and applied a number of gene discovery methods and functional validation of key network genes. In this fashion, we identified a number of genes with implications for atherosclerosis and inflammatory processes. These genes could be of use for understanding disease, paving the way for novel drug targets and disease markers.

Page 48: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

36 Acknowledgements | Sara Hägg

8 ACKNOWLEDGEMENTS My PhD project was conducted in the Computational Medicine group at the Department of Medicine, Karolinska Institutet and at the Department of Physics, Chemistry and Biology, Linköping University. This work has been supported by the Swedish Knowledge Foundation through the Industrial PhD programme in Medical Bioinformatics at Karolinska Institutet, Corporate Alliances and by Clinical Gene Networks AB. The project begun in late 2003 and over the years I have had the opportunity to work with many people who I would like to thank: My main supervisor Johan Björkegren for your enthusiasm and positive thinking. I have learnt a lot from you and from now on I will never give up on anything; persistence pays! My co-supervisor Jesper Tegnér for your ability to show off your expertise in the field of systems biology and for bringing me onboard as a fresh student when it all begun. My co-supervisor Josefin Skogsberg for your encouragement and never ending patience with all my questions throughout the years. Also, for being a true friend and for sharing my interest in sports among other things. Other members of the Computational Medicine group, in particular my coauthor Jesper Lundström for your helpfulness and positive energy. I guess we finally took off in separate directions. Peri Noori for all your help in the lab and for your genuine interest in scientific methods. Shohreh Maleki for always being open for a discussion on any matter. Ming-Mei Shang for bringing a Chinese flavor around and David for nice coffee breaks. All former group members and students; Olivia, Andrei, Roland, Alex, Kristoffer, Björn, Jose, Guiyuan, Janne, Maria, Simon, Johanna, Idha, Malin, Marina and Per-Erik. My coauthor Thomas Alserius and Torbjörn Ivert for putting the clinical endpoint in focus and for trying to understand the issues in gene expression analysis.

Page 49: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | Acknowledgements 37

The former atherosclerosis unit and GV community; Karin Stenström for refreshing lunch breaks and many wise words, Stina Eneling for sharing thoughts on kids and marriage, Pelle Sjögren for a handful beers and soccer games, Petra Thulin, Anna Aminoff and Dick Wågsäter for all spex activities. Maria Kolak, Maria Mannila Nastase, Justo Sierra Johnson, Ann Samnegård and the rest of the Anders Hamsten group for a nice atmosphere. All current members of the GV house for lunch breaks and friendly faces. Especially Mette Kirkegaard and Beata Kostyszyn for fruitful collaborations across disciplines but also Anna Magnusson, Paula Mannström, Cecilia Söderberg and Mattias Sjöström. Lena Ström and Emma Lindgren for good teamwork and the ability to understand my complicated explanations. All my dear girl friends following each other from Molequl and forever; Anna, Ylva, Milla, Amy and Elin. Moa and Christofer for sharing life. My mother in law for a helping hand. My mother for trying to explain the non-scientific parts of life. My father for being a sincere researcher who works for fun. My sister for always being there. Jonas and Arvid for being my dearest treasures in life. I love you. All other friends and family for your support. Thank you! Sara Hägg

Page 50: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

38 References | Sara Hägg

9 REFERENCES 1. WHO (2007) Cardiovascular diseases. World Health Organization. 317 317.

2. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, et al. (2001) Initial sequencing and analysis of the human genome. Nature 409: 860-921.

3. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, et al. (2001) The sequence of the human genome. Science 291: 1304-1351.

4. Tegner J, Skogsberg J, Bjorkegren J (2007) Thematic review series: systems biology approaches to metabolic and cardiovascular disorders. Multi-organ whole-genome measurements and reverse engineering to uncover gene networks underlying complex traits. J Lipid Res 48: 267-277.

5. Hansson GK (2005) Inflammation, atherosclerosis, and coronary artery disease. N Engl J Med 352: 1685-1695.

6. Lusis AJ (2000) Atherosclerosis. Nature 407: 233-241.

7. Brunzell JD, Davidson M, Furberg CD, Goldberg RB, Howard BV, et al. (2008) Lipoprotein management in patients with cardiometabolic risk: consensus conference report from the American Diabetes Association and the American College of Cardiology Foundation. J Am Coll Cardiol 51: 1512-1524.

8. Ambrose JA, Barua RS (2004) The pathophysiology of cigarette smoking and cardiovascular disease: an update. J Am Coll Cardiol 43: 1731-1737.

9. Smith SC, Jr., Allen J, Blair SN, Bonow RO, Brass LM, et al. (2006) AHA/ACC guidelines for secondary prevention for patients with coronary and other atherosclerotic vascular disease: 2006 update: endorsed by the National Heart, Lung, and Blood Institute. Circulation 113: 2363-2372.

10. Canoy D (2008) Distribution of body fat and risk of coronary heart disease in men and women. Curr Opin Cardiol 23: 591-598.

11. Brunekreef B, Holgate ST (2002) Air pollution and health. Lancet 360: 1233-1242.

12. Soderberg-Naucler C (2008) HCMV microinfections in inflammatory diseases and cancer. J Clin Virol 41: 218-223.

13. Nesto RW (2005) Managing cardiovascular risk inpatients with metabolic syndrome. Clin Cornerstone 7: 46-51.

14. Carl-David Agardh CB, Jan Östman (2004) Diabetes: Liber AB.

Page 51: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | References 39

15. Frostegard J (2005) Atherosclerosis in patients with autoimmune disorders. Arterioscler Thromb Vasc Biol 25: 1776-1785.

16. Chien KR, editor (2004) Molecular basis of cardiovascular disease. Second edition ed. Philadelphia: Saunders. 713 p.

17. Altshuler D, Daly MJ, Lander ES (2008) Genetic mapping in human disease. Science 322: 881-888.

18. Samani NJ, Erdmann J, Hall AS, Hengstenberg C, Mangino M, et al. (2007) Genomewide association analysis of coronary artery disease. N Engl J Med 357: 443-453.

19. (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447: 661-678.

20. Helgadottir A, Thorleifsson G, Manolescu A, Gretarsdottir S, Blondal T, et al. (2007) A common variant on chromosome 9p21 affects the risk of myocardial infarction. Science 316: 1491-1493.

21. Saxena R, Voight BF, Lyssenko V, Burtt NP, de Bakker PI, et al. (2007) Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316: 1331-1336.

22. Kingsmore SF, Lindquist IE, Mudge J, Beavis WD (2007) Genome-wide association studies: progress in identifying genetic biomarkers in common, complex diseases. Biomark Insights 2: 283-292.

23. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW (1995) Serial analysis of gene expression. Science 270: 484-487.

24. Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ (1999) High density synthetic oligonucleotide arrays. Nat Genet 21: 20-24.

25. Efron B, Tibshirani R, Storey J, Tusher V (2001) Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association 96: 1151-1160.

26. Do JH, Choi DK (2008) Clustering approaches to identifying gene expression patterns from DNA microarray data. Mol Cells 25: 279-288.

27. Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95: 14863-14868.

28. Blatt M, Wiseman S, Domany E (1996) Superparamagnetic clustering of data. Phys Rev Lett 76: 3251-3254.

29. Tetko IV, Facius A, Ruepp A, Mewes HW (2005) Super paramagnetic clustering of protein sequences. BMC Bioinformatics 6: 82.

30. Allison DB, Cui X, Page GP, Sabripour M (2006) Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet 7: 55-65.

31. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25-29.

32. Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28: 27-30.

33. BioCarta Pathways.

Page 52: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

40 References | Sara Hägg

34. Edgell CJ, McDonald CC, Graham JB (1983) Permanent cell line expressing human factor VIII-related antigen established by hybridization. Proc Natl Acad Sci U S A 80: 3734-3737.

35. Daugherty A (2002) Mouse models of atherosclerosis. Am J Med Sci 323: 3-10.

36. Austen WG, Edwards JE, Frye RL, Gensini GG, Gott VL, et al. (1975) A reporting system on patients evaluated for coronary artery disease. Report of the Ad Hoc Committee for Grading of Coronary Artery Disease, Council on Cardiovascular Surgery, American Heart Association. Circulation 51: 5-40.

37. Salehpour M, Forsgard N, Possnert G (2008) Accelerator mass spectrometry of small biological samples. Rapid Commun Mass Spectrom 22: 3928-3934.

38. Salehpour M, Forsgard N, Possnert G (2009) FemtoMolar measurements using accelerator mass spectrometry. Rapid Commun Mass Spectrom 23: 557-563.

39. Salehpour M, Possnert G, Bryhni H (2008) Subattomole sensitivity in biological accelerator mass spectrometry. Anal Chem 80: 3515-3521.

40. Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, et al. (2003) Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 31: e15.

41. Mecham BH, Klus GT, Strovel J, Augustus M, Byrne D, et al. (2004) Sequence-matched probes produce increased cross-platform consistency and more reproducible biological results in microarray-based gene expression measurements. Nucleic Acids Res 32: e74.

42. Johnson WE, Li C, Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8: 118-127.

43. Getz G, Levine E, Domany E (2000) Coupled two-way clustering analysis of gene microarray data. Proc Natl Acad Sci U S A 97: 12079-12084.

44. Efron B (1979) Bootstrap methods: another look at the jackknife. AS 7: 1-26.

45. Dennis G, Jr., Sherman BT, Hosack DA, Yang J, Gao W, et al. (2003) DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 4: P3.

46. Schadt EE, Molony C, Chudin E, Hao K, Yang X, et al. (2008) Mapping the genetic architecture of gene expression in human liver. PLoS Biol 6: e107.

47. Sims FH (1983) A comparison of coronary and internal mammary arteries and implications of the results in the etiology of arteriosclerosis. Am Heart J 105: 560-566.

48. Braunersreuther V, Mach F (2006) Leukocyte recruitment in atherosclerosis: potential targets for therapeutic approaches? Cell Mol Life Sci 63: 2079-2088.

49. Cai Q, Lanting L, Natarajan R (2004) Interaction of monocytes with vascular smooth muscle cells regulates monocyte survival and differentiation through distinct pathways. Arterioscler Thromb Vasc Biol 24: 2263-2270.

50. Cai Q, Lanting L, Natarajan R (2004) Growth factors induce monocyte binding to vascular smooth muscle cells: implications for monocyte retention in atherosclerosis. Am J Physiol Cell Physiol 287: C707-714.

51. Hauer AD, Habets KL, van Wanrooij EJ, de Vos P, Krueger J, et al. (2009) Vaccination against TIE2 reduces atherosclerosis. Atherosclerosis 204: 365-371.

Page 53: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | References 41

52. Hauer AD, van Puijvelde GH, Peterse N, de Vos P, van Weel V, et al. (2007) Vaccination against VEGFR2 attenuates initiation and progression of atherosclerosis. Arterioscler Thromb Vasc Biol 27: 2050-2057.

53. Koni PA, Joshi SK, Temann UA, Olson D, Burkly L, et al. (2001) Conditional vascular cell adhesion molecule 1 deletion in mice: impaired lymphocyte migration to bone marrow. J Exp Med 193: 741-754.

54. Stevens HY, Melchior B, Bell KS, Yun S, Yeh JC, et al. (2008) PECAM-1 is a critical mediator of atherosclerosis. Dis Model Mech 1: 175-181; discussion 179.

55. Zernecke A, Liehn EA, Fraemohs L, von Hundelshausen P, Koenen RR, et al. (2006) Importance of junctional adhesion molecule-A for neointimal lesion formation and infiltration in atherosclerosis-prone mice. Arterioscler Thromb Vasc Biol 26: e10-13.

56. Alserius T, Hammar N, Nordqvist T, Ivert T (2006) Risk of death or acute myocardial infarction 10 years after coronary artery bypass surgery in relation to type of diabetes. Am Heart J 152: 599-605.

57. Latham R, Lancaster AD, Covington JF, Pirolo JS, Thomas CS (2001) The association of diabetes and glucose control with surgical-site infections among cardiothoracic surgery patients. Infect Control Hosp Epidemiol 22: 607-612.

58. Pomposelli JJ, Baxter JK, 3rd, Babineau TJ, Pomfret EA, Driscoll DF, et al. (1998) Early postoperative glucose control predicts nosocomial infection rate in diabetic patients. JPEN J Parenter Enteral Nutr 22: 77-81.

59. Takamura T, Sakurai M, Ota T, Ando H, Honda M, et al. (2004) Genes for systemic vascular complications are differentially expressed in the livers of type 2 diabetic patients. Diabetologia 47: 638-647.

60. Umpierrez GE, Isaacs SD, Bazargan N, You X, Thaler LM, et al. (2002) Hyperglycemia: an independent marker of in-hospital mortality in patients with undiagnosed diabetes. J Clin Endocrinol Metab 87: 978-982.

61. Alberti KG, Zimmet PZ (1998) Definition, diagnosis and classification of diabetes mellitus and its complications. Part 1: diagnosis and classification of diabetes mellitus provisional report of a WHO consultation. Diabet Med 15: 539-553.

62. Bruins P, te Velthuis H, Yazdanbakhsh AP, Jansen PG, van Hardevelt FW, et al. (1997) Activation of the complement system during and after cardiopulmonary bypass surgery: postsurgery activation involves C-reactive protein and is associated with postoperative arrhythmia. Circulation 96: 3542-3548.

63. Liuba I, Ahlmroth H, Jonasson L, Englund A, Jonsson A, et al. (2008) Source of inflammatory markers in patients with atrial fibrillation. Europace 10: 848-853.

64. Chi H, Flavell RA (2008) Acetylation of MKP-1 and the control of inflammation. Sci Signal 1: pe44.

65. Abraham SM, Clark AR (2006) Dual-specificity phosphatase 1: a critical regulator of innate immune responses. Biochem Soc Trans 34: 1018-1023.

66. Chi H, Barry SP, Roth RJ, Wu JJ, Jones EA, et al. (2006) Dynamic regulation of pro- and anti-inflammatory cytokines by MAPK phosphatase 1 (MKP-1) in innate immune responses. Proc Natl Acad Sci U S A 103: 2274-2279.

Page 54: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

42 References | Sara Hägg

67. Boutros T, Chevet E, Metrakos P (2008) Mitogen-activated protein (MAP) kinase/MAP kinase phosphatase regulation: roles in cell growth, death, and cancer. Pharmacol Rev 60: 261-310.

68. Reddy ST, Nguyen JT, Grijalva V, Hough G, Hama S, et al. (2004) Potential role for mitogen-activated protein kinase phosphatase-1 in the development of atherosclerotic lesions in mouse models. Arterioscler Thromb Vasc Biol 24: 1676-1681.

69. Skogsberg J, Lundstrom J, Kovacs A, Nilsson R, Noori P, et al. (2008) Transcriptional profiling uncovers a network of cholesterol-responsive atherosclerosis target genes. PLoS Genet 4: e1000036.

70. Handberg A, Levin K, Hojlund K, Beck-Nielsen H (2006) Identification of the oxidized low-density lipoprotein scavenger receptor CD36 in plasma: a novel marker of insulin resistance. Circulation 114: 1169-1176.

71. Handberg A, Skjelland M, Michelsen AE, Sagen EL, Krohg-Sorensen K, et al. (2008) Soluble CD36 in plasma is increased in patients with symptomatic atherosclerotic carotid plaques and is related to plaque instability. Stroke 39: 3092-3095.

72. Montecucco F, Steffens S, Mach F (2008) Insulin resistance: a proinflammatory state mediated by lipid-induced signaling dysfunction and involved in atherosclerotic plaque instability. Mediators Inflamm 2008: 767623.

73. Fernandez-Real JM, Handberg A, Ortega F, Hojlund K, Vendrell J, et al. (2009) Circulating soluble CD36 is a novel marker of liver injury in subjects with altered glucose tolerance. J Nutr Biochem 20: 477-484.

74. Grant GR, Manduchi E, Stoeckert CJ, Jr. (2007) Analysis and management of microarray gene expression data. Curr Protoc Mol Biol Chapter 19: Unit 19 16.

75. Paul A, Yechoor V, Raja R, Li L, Chan L (2008) Microarray gene profiling of laser-captured cells: a new tool to study atherosclerosis in mice. Atherosclerosis 200: 257-263.

76. Emilsson V, Thorleifsson G, Zhang B, Leonardson AS, Zink F, et al. (2008) Genetics of gene expression and its effect on disease. Nature 452: 423-428.

77. Kerr G, Ruskin HJ, Crane M, Doolan P (2008) Techniques for clustering gene expression data. Comput Biol Med 38: 283-293.

78. Bijnens AP, Lutgens E, Ayoubi T, Kuiper J, Horrevoets AJ, et al. (2006) Genome-wide expression studies of atherosclerosis: critical issues in methodology, analysis, interpretation of transcriptomics data. Arterioscler Thromb Vasc Biol 26: 1226-1235.

79. Sinnaeve PR, Donahue MP, Grass P, Seo D, Vonderscher J, et al. (2009) Gene expression patterns in peripheral blood correlate with the extent of coronary artery disease. PLoS One 4: e7037.

80. Valk PJ, Verhaak RG, Beijen MA, Erpelinck CA, Barjesteh van Waalwijk van Doorn-Khosrovani S, et al. (2004) Prognostically useful gene-expression profiles in acute myeloid leukemia. N Engl J Med 350: 1617-1628.

81. Wheelock CE, Wheelock AM, Kawashima S, Diez D, Kanehisa M, et al. (2009) Systems biology approaches and pathway tools for investigating cardiovascular disease. Mol Biosyst 5: 588-602.

Page 55: GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

GENE EXPRESSION PROFILING OF HUMAN ATHEROSCLEROSIS

Sara Hägg | References 43

82. Ouzounian M, Lee DS, Gramolini AO, Emili A, Fukuoka M, et al. (2007) Predict, prevent and personalize: Genomic and proteomic approaches to cardiovascular medicine. Can J Cardiol 23 Suppl A: 28A-33A.

83. Weston AD, Hood L (2004) Systems biology, proteomics, and the future of health care: toward predictive, preventative, and personalized medicine. J Proteome Res 3: 179-196.

84. Dobrin R, Zhu J, Molony C, Argman C, Parrish ML, et al. (2009) Multi-tissue coexpression networks reveal unexpected subnetworks associated with disease. Genome Biol 10: R55.