“10,001 Dalmatians” research programme:
Discovery of genetic variants that control
human quantitative traits and predispose to diseases
Igor Rudan, Mladen Boban, Tatijana Zemunik, Gordan Lauc, Zoran Đogaš, Stipan Janković, Ivica Grković, Ana Marušić,
Janoš Terzić, Rosanda Mulić, Vjekoslav Krželj, Lina Zgaga, Zrinka Biloglav, Ivana Kolčić, Marina Pehlić,
Grgo Gunjača, Danijela Budimir, Ozren Polašek
2001: the human genome sequence was published
Main expectation (general public, investors, researchers, pharma and biotech industries):
Linking genes with diseases and developing new treatments and “personalized medicine” – the race towards this goal began (each group with its own approach)
Main idea:
1) Find “markers” in the genome and “tag” the whole genome as densely as possible;
2) Find consistent associations between some of those markers and disease phenotypes;
3) Find genes in proximity of implicated markers – they are “disease genes”
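Step 2 above, testing whether a marker allele is associated with disease status, can be illustrated with a minimal sketch. It assumes a simple 2x2 table of allele counts in cases and controls and a Pearson chi-square test with one degree of freedom. The counts are invented for illustration and the function name is hypothetical; this is not the method of any particular study cited here.

```python
import math

def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Return (chi2, p_value) for a 2x2 table of allele counts (1 degree of freedom)."""
    n = case_a + case_b + ctrl_a + ctrl_b
    row_totals = (case_a + case_b, ctrl_a + ctrl_b)
    col_totals = (case_a + ctrl_a, case_b + ctrl_b)
    observed = ((case_a, case_b), (ctrl_a, ctrl_b))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Invented counts: allele A is seen 60 times in cases and 40 times in controls.
chi2, p = allelic_chi_square(case_a=60, case_b=40, ctrl_a=40, ctrl_b=60)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

With these toy counts every expected cell is 50, so the statistic is 8.0. In a genome-wide scan the same test would be repeated for every marker, which is why multiple-testing correction matters.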
[Figure: CASES (“affected”) and CONTROLS (“unaffected”), labelled with STR MARKER A; STR MARKERS B, C…; DISEASE GENE (MUTATION); DISEASE GENE (WILD TYPE)]
Short tandem repeats (STR) – e.g. (TA)x4 or (CTG)x7 – hundreds of STRs across the genome
- STR marker maps were not dense, but they were still very useful to “pick” genes that caused monogenic (Mendelian) diseases
Problems with genome-wide linkage analyses using genome-wide STR maps:
1) STR markers and diseases were not always 100% linked because of incomplete penetrance of causal mutations or genetic heterogeneity of the disease: low study power
2) STR markers and disease genes were not always 100% linked because of recombination (crossing over) between them: low study power
3) Even when the marker closest to the disease gene was found with nearly 100% certainty, it still took years to find all candidate genes in regions up to 10 megabases (or more) and sequence them all to find the exact causal mutation
4) Good ideas:
- Choose to study phenotypes that are precisely measurable and in good correlation with genotypes
- Use populations with large linkage disequilibrium
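Why recombination erodes marker-gene association, and why populations with large linkage disequilibrium help, can be sketched with the standard textbook approximation D_t = D_0 (1 - θ)^t. Here θ is the recombination fraction between marker and gene and t is the number of generations since the founding haplotype. This is a minimal sketch that ignores drift, selection and admixture; the function name and the numbers are illustrative assumptions, not values from the studies cited here.

```python
def ld_after_generations(d0, theta, generations):
    """Expected linkage disequilibrium after recombination has acted for some generations."""
    return d0 * (1.0 - theta) ** generations

# Illustrative comparison: the same marker-gene pair (theta = 0.01)
# after few generations versus after many generations.
for generations in (20, 200):
    remaining = ld_after_generations(1.0, 0.01, generations)
    print(f"{generations} generations: {remaining:.0%} of initial LD remains")
```

Under these assumptions about 82% of the initial LD remains after 20 generations, but only about 13% after 200, so the same marker map tags a disease gene far better when fewer generations have passed.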
Strategy (1): In 1999, our group proposed relying on isolated populations (for increased LD) and a pedigree-based approach (which adds information)
Nat Genet 1999; 23: 397-404
Strategy (2): Our group proposed a highly polygenic model for complex traits and diseases in 2003
Genetics 2003; 163: 1011-1021
Trends Genet 2003; 19: 97-106
Our understanding of complex traits and diseases:
[Diagram of levels, from genome to disease]
HIGHLY POLYGENIC GENETIC BASIS (FEW RARE VARIANTS WITH LARGE EFFECTS AND MANY COMMON VARIANTS WITH SMALL EFFECTS)
“-OMICS” LEVEL (PROTEOMICS, LIPIDOMICS, GLYCOMICS, METABOLOMICS)
QUANTITATIVE TRAIT LEVEL (e.g. CHOLESTEROL, BLOOD PRESSURE)
COMPLEX DISEASE PHENOTYPE
ENVIRONMENT (label repeated three times in the diagram)
GWAS: MOST POWER & FUNCTIONAL RELEVANCE
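The highly polygenic model can be made concrete with a small simulation. This is a minimal sketch under loudly stated assumptions: independent loci, Hardy-Weinberg proportions, purely additive effects, and a normally distributed environmental term. The allele frequencies, effect sizes and function names are hypothetical and chosen only to show the bookkeeping.

```python
import random

def genotype(freq, rng):
    # Copies (0, 1 or 2) of the effect allele, assuming Hardy-Weinberg proportions.
    return int(rng.random() < freq) + int(rng.random() < freq)

def simulate_trait(n_people, variants, env_sd, seed=1):
    """Simulate a quantitative trait as additive genetic effects plus environment."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_people):
        genetic = sum(effect * genotype(freq, rng) for freq, effect in variants)
        values.append(genetic + rng.gauss(0.0, env_sd))
    return values

def additive_variance(variants):
    # Each independent locus contributes 2 p (1 - p) * effect^2 to the trait variance.
    return sum(2.0 * p * (1.0 - p) * b * b for p, b in variants)

# Hypothetical architecture: (allele frequency, effect size) pairs.
rare = [(0.01, 1.0)] * 3       # few rare variants with large effects
common = [(0.30, 0.05)] * 200  # many common variants with small effects

trait = simulate_trait(1000, rare + common, env_sd=0.5)
print(additive_variance(rare), additive_variance(common))
```

With these invented numbers the rare variants contribute about 0.06 and the common variants about 0.21 to the additive variance, even though each common variant on its own explains very little.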
Strategy (3): Our group proposed to measure a large number of QTs (closer to genes: more power, more chance of findings; later, networks)
Quantitative traits: More than 100 selected initially

ANTHROPOMETRIC MEASURES
- Body height
- Body weight
- Bicondylar brachial width
- Abdomen circumference
- Hip circumference
- Brachial circumference
- Biceps skinfold
- Triceps skinfold
- Subscapular skinfold
- Suprailiac skinfold
- Abdomen skinfold
- Head circumference

PHYSIOLOGICAL MEASURES
- Systolic blood pressure (1&2)
- Diastolic blood pressure (1&2)
- Impedance - body resistance
- Impedance - body reactance
- Ankle-brachial BP index
- Spirometry - FVC
- Spirometry - FEV1
- Spirometry - PEF
- Spirometry - FEF25
- Spirometry - FEF50
- Peak flow
- Bone mineral density

ELECTROCARDIOGRAM
- ECG (30 sec, digital)*
- P duration
- PR interval
- QRS duration
- QT interval
- QTc interval
- P axis
- QRS axis
- T axis

COGNITIVE & SLEEP TRAITS
- Eysenck Personality Inventory
- Digit-symbol test
- Mill-Hill vocabulary
- Standard Progressive Matrices
- Controlled Oral Word Association
- Wechsler Memory Scale
- Munich Chronotype Questionnaire
- GHQ-30

EYE MEASURES
- Retinal art:ven diameter ratio
- Retinal art leng:diam ratio
- Retinal art branching angle
- Retinal arteriolar tortuosity
- Retinal arterial junction exponent
- Intraocular pressure, OD, OS
- Fundus photography
- Autorefractor measurements
- Intraocular length measurement

LIFESTYLE
- Family disease history
- Birth weight
- Medical/surgical history
- Menstruation, menarche, HRT
- Rose Angina questionnaire*
- Claudication questionnaire*
- Respiratory questionnaire
- Physical activity
- Smoking
- Alcohol
- Diet
- Socioeconomic status
Quantitative traits (continued):

BIOCHEMICAL MEASURES
- Creatinine
- Uric acid
- Total cholesterol
- Triglycerides
- HDL
- LDL
- Calcium
- Phosphorous
- Albumin
- HbA1c
- Glucose

LIPIDOMICS
- A large number (several hundred) of circulating lipid metabolites, e.g. 132 phospholipids, 70 sphingolipids, fatty acids, apolipoproteins, etc.

MARKERS OF INFLAMMATION
- Fibrinogen
- von Willebrand's factor
- D-dimers
- CRP
- tPA inhibitor

GLYCOMICS
- 16 main groups of N-glycans, 4 additional groups based on number of antennas, and 3 derived variables

URINE TRAITS
- A larger number of traits quantitated in urine samples that are biomedically relevant

GENOTYPING
- Cohort 1: STR typing
- Cohort 2: 800 STR, 317,000 SNP
- Cohort 3: 370,000 SNP + CNV
- Cohort 4: 370,000 SNP + CNV
Strategy (4): Finding money to start a large cohort
Grants awarded 2000-2007 (£4.0 M):
2000-2002: The British Council
2001-2004: The Wellcome Trust
2002-2003: Medical Research Council UK (1/3)
2002-2006: Ministry of Science and Technology, Croatia
2003-2005: The Royal Society, UK
2003-2004: National Institutes of Health, USA
2003-2005: Medical Research Council UK (2/3)
2006-2009: EU FP6 EUROSPAN
2005-2010: Medical Research Council UK (3/3)
2007-2012: Ministry of S & T, Croatia (The Croatian Biobank)
COHORT 1 (1001 examinees)
“Susak-10”: served to choose the most appropriate population (2001-2002)
2003: The choice of further populations was based on demography data and population genetic studies
2003: The populations were extremely differentiated (based on analysis of 26 STR markers below); LD studies conducted using 8 STR markers on Xq13-12
COHORT 2 (1024 examinees)
“Vis”: genotyped with (i) 800 STRs and (ii) Illumina 317 k (2003-2005)
COHORT 3 (969 examinees)
“Korcula”: genotyped with Illumina 370 k CNV (2006-2007)
COHORT 4 (1001 examinees)
“Split”: outbred population genotyped with Illumina 370 k CNV (2008-2009)
Year 2005: BAD YEAR
We used an 800-STR-marker scan and analysed the data using genome-wide linkage analysis.
What did we find?
ABSOLUTELY NOTHING.
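For readers unfamiliar with what a linkage scan computes, the classic LOD score can be sketched for the simplest textbook case: phase-known, fully informative meioses between one marker and one trait locus. This is only an illustration of the statistic. It is not the quantitative-trait linkage method applied to these cohorts, and the function name and counts are hypothetical.

```python
import math

def _log_term(count, prob):
    # count * log10(prob), defined as 0 when count is 0 (avoids log10(0)).
    return 0.0 if count == 0 else count * math.log10(prob)

def lod_score(recombinants, meioses, theta):
    """log10 of the likelihood at recombination fraction theta versus theta = 0.5."""
    non_recombinants = meioses - recombinants
    log_l_theta = _log_term(recombinants, theta) + _log_term(non_recombinants, 1.0 - theta)
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null

# Invented example: 1 recombinant observed in 10 informative meioses.
print(round(lod_score(1, 10, 0.1), 2))
```

A LOD of 3 or more is the conventional threshold for declaring linkage. The toy family above gives roughly 1.6, which shows how quickly power drops when recombination, incomplete penetrance or heterogeneity dilute the signal.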
Other approaches (e.g. candidate genes and case-control studies)?
NO REPLICATIONS FOR ANY OF THE THOUSANDS OF REPORTED ASSOCIATIONS (…OK, MAYBE 4-5 MAX.)
The HapMap project
Tried to define “blocks” of genome between “recombination hotspots” and tag each one of them with one of more than 10 million predicted SNPs: new GWAS based on SNPs
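The idea of tagging can be sketched as follows: compute pairwise r² between SNPs from phased haplotypes, then greedily pick tag SNPs until every SNP is correlated with some tag above a threshold. This is a minimal sketch, not the HapMap project's actual procedure. The SNP names and haplotypes are invented, and the 0.8 threshold is a commonly used convention assumed here.

```python
def r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs given phased 0/1 alleles, one entry per haplotype."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom

def greedy_tags(snps, threshold=0.8):
    """Choose tags so that every SNP is a tag or has r^2 >= threshold with a tag."""
    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the SNP that covers the most still-untagged SNPs (ties broken by name).
        best = max(
            sorted(untagged),
            key=lambda s: sum(r_squared(snps[s], snps[t]) >= threshold for t in untagged),
        )
        covered = {t for t in untagged if r_squared(snps[best], snps[t]) >= threshold}
        tags.append(best)
        untagged -= covered | {best}
    return tags

# Invented haplotypes: rs_A and rs_B are in perfect LD, rs_C is only weakly correlated.
snps = {
    "rs_A": [0, 0, 1, 1, 0, 1, 0, 1],
    "rs_B": [0, 0, 1, 1, 0, 1, 0, 1],
    "rs_C": [1, 0, 1, 0, 1, 0, 1, 0],
}
print(greedy_tags(snps))
```

In this toy data one of the two perfectly correlated SNPs is enough to represent both, so two tags cover three SNPs. The same logic is what lets a few hundred thousand tagging SNPs stand in for millions of variants.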
Year 2006: TECHNOLOGICAL BREAKTHROUGH!
Affymetrix Inc. and Illumina Inc.:
Dense genome-wide scans using hundreds of thousands of SNP markers (from the HapMap project – “tagging SNPs”)
Year 2007: THE “BRAVE NEW WORLD” STUDY
(WTCCC, Nature, June 07, 2007)
2006-2007: First analyses of data using SNPs
Results of GWAS of QTs with “disease risk” studied
2008: uric acid & gout Nat Genet 2008; 40: 437-442
2009: lipid levels & coronary heart disease Nat Genet 2009; 41: 47-55
2010: fasting glucose & diabetes type 2 Nat Genet 2010
2010: FVC, FEC & chronic lung disease Nat Genet 2010
2010: creatinine & chronic kidney disease Nat Genet 2010
(2011: blood pressure & stroke) JAMA 2011 ?
(2011: CFH & age-related macular degeneration) Lancet ?
Results of GWAS of QTs without disease risk links
2009: smoking initiation and intensity Nat Genet 2010
2009: clotting factors VII, VIII & vWF Circulation 2010
(2010: sleep duration and latency) Nat Genet 2010 ?
(2010: human height, weight, WHR) 3 x Nat Genet 2010 ?
(2010: global lipids) Nature 2010 ?
(2010: cognitive traits) 2 x Nat Genet 2010 ?
(2010: ECG, urine, CRP, HbA1c, ABPI, P, cortisol…)
PLoS Genet 2009; 5: e1000539
PLoS Genet 2009; 5: e1000504
Strategy (5): Next moves (plan for 2010-2012)
1. GWAS of -OMICS (“1 level down from QT”) & functional follow-up & systems biology / pathways
2. Development of novel methods for analysis of the effect of CNV and rare variants on human QTs
3. Expand the number of phenotypes measured in plasma in at least 3,000 examinees (e.g. ILs, etc.)
4. Whole-genome sequence for 1,000 examinees & the new round of consortia participation
Results of GWAS of LIPIDOMICS traits
Forthcoming (2010): GWAS of 132 circulating phospholipids (PLoS Genetics)
PLoS Genet 2009; 163: 1011-1021
Further interest of our group: GWAS of glycomics, proteomics, other metabolomics and functional follow-up
Progress in GLYCOMICS: dependent on measurement
Nature 2009; 457: 617-620
Rudd PM et al. (Natl. Inst. Bioprocessing Res. Train.): refined chromatography approaches for analysis of glycosylation
High-performance liquid chromatography (HPLC):
- Glycoproteins immobilized
- Glycans released
- Fluorescent labels attached
- Labelled sugars run on a normal phase HPLC column
- Resulting peaks correlated to a pre-run dextran ladder
“GlycoBioGen”: A consortium led by collaboration of Scottish, Croatian & Irish institutions
CROATIAN CENTRE FOR GLOBAL HEALTH
Quantitation of glycans in human plasma:
• Separation of plasma N-glycans in 16 chromatographic peaks using HPLC method (GP1-GP16): area under peak measured as a QT
• Unusual biological variability at population level
• Significant effects of age, gender, environmental factors
• Highly varying heritabilities
• Striking correlations with other biochemical QTs
J Proteome Res 2009; 8: 694-701
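How an “area under peak” becomes a quantitative trait can be sketched with trapezoidal integration of a chromatogram between peak boundaries. This is a minimal sketch under stated assumptions: the signal is already baseline-corrected, the peak boundaries are known, and each peak is expressed as a percentage of the total integrated area. The source does not say whether that normalisation was used, and the data and function names are invented.

```python
def trapezoid_area(times, signal):
    """Area under a sampled curve by the trapezoidal rule."""
    return sum(
        (t1 - t0) * (s0 + s1) / 2.0
        for t0, t1, s0, s1 in zip(times, times[1:], signal, signal[1:])
    )

def peak_percentages(times, signal, boundaries):
    """boundaries: one (start_index, end_index) pair per peak, both inclusive."""
    areas = [trapezoid_area(times[a:b + 1], signal[a:b + 1]) for a, b in boundaries]
    total = sum(areas)
    return [100.0 * area / total for area in areas]

# Invented two-peak chromatogram (retention time in minutes, fluorescence units).
times = [0, 1, 2, 3, 4]
signal = [0, 2, 0, 4, 0]
print(peak_percentages(times, signal, [(0, 2), (2, 4)]))
```

In this toy trace the second peak has twice the area of the first, so the two peaks account for one third and two thirds of the total. Each such percentage, measured per person, is what would then enter a GWAS as a trait.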
Results of GWAS study (Vis island, Croatia):
• FUT8: associated with GP1 in 1,000 subjects (p = 5.09 x 10^-8 to 7.07 x 10^-8)
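To put the FUT8 p-values in context, a simple Bonferroni calculation shows what a genome-wide significance threshold looks like for a scan of this size. This is a minimal sketch: it assumes a family-wise error rate of 0.05 and treats the roughly 317,000 SNPs genotyped in the Vis cohort as independent tests. The source does not state which threshold the study actually applied.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

# Assumption: one test per SNP on a 317,000-SNP array, alpha = 0.05.
threshold = bonferroni_threshold(0.05, 317_000)
print(f"threshold = {threshold:.3g}")

for p_value in (5.09e-8, 7.07e-8):
    print(p_value, "passes" if p_value < threshold else "does not pass")
```

Under these assumptions the threshold is about 1.58 x 10^-7, so both reported p-values fall below it. Correcting additionally for the number of glycan traits tested would make the threshold stricter.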
“Missing heritability”:
• CNVs (copy number variants):
  • Nature (April 2010) – WTCCC – didn’t find any associations with disease at all
• Rare variants:
  • “Moving frames” method (by Eleftheria Zeggini at Sanger, Hinxton, Cambridge): MAGIC, DIAGRAM & SPIROMETA
  • “Exome sequencing” (4-10x)
  • “Deep whole-genome sequencing” (48x)
“Expand phenotypes”:
• Gordan: N-glycans
• Zoran: CRD series
• Tatijana and Vesela: T4, TSH
• Mladen: markers of oxidative stress?
• Janoš: proteomics?
• Rosanda: anti-HBV antigens?
• Ana: interleukins, CD4?
“Whole-genome sequence era”:
• Wellcome Trust Sanger Institute, Hinxton, Cambridge: agreement that 400 / 2500 first examinees with WGS will be Croatians (Korcula)
• Why? – genealogies (expanding the number through “imputation”) and dense phenotyping (hundreds of QTs)
• Project will start: end of 2011
• Value for us: GBP 4 million at present time; should get us into the “next wave” of consortia work; needs Vesna Boraska etc.