The EFR Project: a Collaborative Network to Establish an
Arabian Bio-bank Resource to Identify Disease Genes of
Indigenous Populations.
Habiba Sayeed Al-Safar BSc (Biochemistry)
MSc (Medical Engineering)
This thesis is presented for the degree of
Doctor of Philosophy
Centre for Forensic Science
2011
DEDICATION
To My special Uncle Hamza
Thank you for being there every step of the way
Thank you for guiding me when I went astray
Thank you for everything you did throughout my whole life
DECLARATION
This thesis is submitted to the University of Western Australia in fulfillment of the requirements
for the Degree of Doctor of Philosophy.
This thesis has been composed by myself from results of my own work, except where stated
otherwise, and no part of it has been submitted for a degree at this, or at any other university.
Habiba Sayeed Al Safar
PREFACE
This thesis is presented as a series of eight chapters. The introductory chapter sets the basis for
the work undertaken during the tenure of this study. The final commentary summarises the
main features and findings of the work performed and establishes the next phase of work that
is required. In between are six chapters presented as manuscripts in the format of the journal to
which they have been submitted. Preceding each manuscript, a general synopsis is given and the
specific contributions of each author are declared. Each chapter contains the basic components
of an article, namely an abstract or summary; introduction; materials and methods; results with
concluding remarks; acknowledgments; and a bibliography in the format of the journal to which
the manuscript is submitted.
TABLE OF CONTENTS
DEDICATION ................................................................................................................... i
DECLARATION .............................................................................................................. ii
PREFACE ....................................................................................................................... iii
TABLE OF CONTENTS ................................................................................................ iv
ACKNOWLEDGMENTS ........................................................................................... viii
ABSTRACT ...................................................................................................................... x
LIST OF ABBREVIATIONS ........................................................................................ xv
DEFINITIONS .............................................................................................................. xvi
CHAPTER 1 ..................................................................................................................... 1
LITERATURE REVIEW: AN OVERVIEW OF FACTORS THAT PREDISPOSE
TO TYPE 2 DIABETES IN DIFFERENT POPULATIONS AND THE NEED OF
GENOME STUDIES IN ETHNIC POPULATION OF THE MIDDLE EAST. ........ 1
Epidemiology of Type 2 Diabetes ...................................................................................... 5
Types of Diabetes ............................................................................................................. 10
Risk Factors ...................................................................................................................... 11
Symptoms ......................................................................................................................... 13
Screening .......................................................................................................................... 13
Treatment ......................................................................................................................... 13
Prevention ......................................................................................................................... 15
Genetic approach towards understanding Type 2 Diabetes ............................................. 18
Positional candidate genes approach ................................................................................ 22
Whole genome screen approach ....................................................................................... 23
Association studies ........................................................................................................... 23
Linkage studies ................................................................................................................. 26
Identifying genes contributing to Diabetes ...................................................................... 27
Previous studies ................................................................................................................ 28
Animal studies .................................................................................................................. 28
Human studies .................................................................................................................. 28
Genome wide scans of different populations ................................................................... 30
Asia ................................................................................................................................... 34
Chinese population ........................................................................................................... 34
Japanese population .......................................................................................................... 35
Indian population .............................................................................................................. 38
North America .................................................................................................................. 39
Pima Indian population .................................................................................................... 39
Amish population ............................................................................................................. 41
African American population ........................................................................................... 42
Mexican American population ......................................................................................... 43
Europe .............................................................................................................................. 44
Dutch population .............................................................................................................. 45
Ashkenazi Jews population .............................................................................................. 45
Finnish population ............................................................................................................ 46
French population ............................................................................................................. 48
Middle East ...................................................................................................................... 51
Historical Background of Arabs ....................................................................................... 51
Arab Migration: ................................................................................................................ 54
Genetic Disorders in the Arab world: .............................................................................. 54
Need and Scope of Medical Researches in the Arab World: ........................................... 56
United Arab Emirates (UAE) ........................................................................................... 56
Conclusion ........................................................................................................................ 59
CHAPTER 2 ................................................................................................................... 69
THE PREVALENCE OF TYPE 2 DIABETES MELLITUS IN THE UNITED
ARAB EMIRATES: JUSTIFICATION FOR THE ESTABLISHMENT OF THE
EMIRATES FAMILY REGISTRY. ............................................................................ 69
Abstract ............................................................................................................................ 77
Introduction ...................................................................................................................... 78
Results .............................................................................................................................. 83
Discussion ........................................................................................................................ 91
Conclusion ........................................................................................................................ 96
Acknowledgements .......................................................................................................... 97
References ........................................................................................................................ 98
CHAPTER 3 ................................................................................................................. 101
HERITABILITY OF QUANTITATIVE TRAITS ASSOCIATED WITH TYPE 2
DIABETES IN AN EXTENDED FAMILY FROM THE UNITED ARAB
EMIRATES. .................................................................................................................. 101
Abstract .......................................................................................................................... 107
Introduction .................................................................................................................... 108
Material and Methods ..................................................................................................... 110
Results ............................................................................................................................ 112
Discussion ...................................................................................................................... 116
Acknowledgment ........................................................................................................... 118
References ...................................................................................................................... 119
CHAPTER 4 ................................................................................................................. 121
EVALUATION OF DIFFERENT SOURCES OF DNA FOR USE IN GENOME
WIDE STUDIES ........................................................................................................... 121
Abstract .......................................................................................................................... 131
Introduction .................................................................................................................... 132
Material and Methods ..................................................................................................... 135
Results ............................................................................................................................ 137
Discussion ...................................................................................................................... 146
Acknowledgements ........................................................................................................ 150
Conflict of Interest .......................................................................................................... 151
References ...................................................................................................................... 152
CHAPTER 5 ................................................................................................................. 155
CHARACTERISATION OF MHC POLYMORPHIC ALU INSERTIONS
(POALIN) IN A POPULATION OF ARAB BEDOUINS. ....................................... 155
Abstract .......................................................................................................................... 161
Introduction .................................................................................................................... 162
Materials and Methods ................................................................................................... 164
Results ............................................................................................................................ 167
Discussion ...................................................................................................................... 176
Acknowledgements ........................................................................................................ 179
References ...................................................................................................................... 180
CHAPTER 6 ................................................................................................................. 183
A GENOME WIDE SEARCH FOR TYPE 2 DIABETES SUSCEPTIBILITY
GENES IN ARAB FAMILIES. ................................................................................... 183
Abstract .......................................................................................................................... 189
Introduction .................................................................................................................... 190
Results ............................................................................................................................ 192
Discussion ...................................................................................................................... 205
Materials and methods ................................................................................................... 210
Acknowledgment ........................................................................................................... 213
Conflict of Interest ......................................................................................................... 214
References ...................................................................................................................... 215
CHAPTER 7 ................................................................................................................. 221
A GENOME-WIDE ASSOCIATION STUDY EXAMINING OBESE FACTORS IN
AN ARAB FAMILY WITH A HISTORY OF TYPE 2 DIABETES ....................... 221
Abstract .......................................................................................................................... 227
Introduction .................................................................................................................... 228
Materials and Methods .................................................................................................. 230
Results ............................................................................................................................ 232
Discussion ...................................................................................................................... 245
Acknowledgment ........................................................................................................... 250
References ...................................................................................................................... 251
CHAPTER 8 ................................................................................................................. 257
COMMENTARY AND FINAL REMARKS ............................................................. 257
Commentary and Final Remarks .................................................................................... 259
References ...................................................................................................................... 266
ACKNOWLEDGMENTS
IN THE NAME OF GOD, THE MERCIFUL, THE CLEMENT!
PRAISE BE TO GOD, Lord of the Worlds, and prayer and peace upon the Lord of the
Prophets, Our Lord and Master Muhammad and upon his family and companions prayer and
peace perpetually required until the Day of Judgment.
The marvelous journey has come to an end. Over recent years I have come to realize that
doing a Ph.D. is the best job one can have, and that Australia is actually the best place for
doing it. I consider myself lucky that I have been given the opportunity to do my Ph.D. here, at
the University of Western Australia, both for professional and social reasons.
First and foremost, I would like to thank GOD, the merciful and the compassionate, for giving me
wisdom and guidance throughout my life. I do believe that GOD sends His blessings to
me in the form of people.
I am not able to name everyone individually and thank them for everything that they did for
me; however, I would like to take the opportunity to express a few words of thanks to my
best colleagues, friends and family.
I am grateful to His Excellency, Lieutenant General Dhahi Khalfan Tamim, the Dubai Police
Commander-in-Chief for the scholarship, which enabled me to undertake this doctoral work at
the University of Western Australia. I am also thankful to Mr. Ahmed Al-Mansori, Head of the
Scholarship Section at Dubai Police, for his assistance.
This study would not have been possible without the general support of my two supervisors,
Dr. Guan Tay and Dr. Kamal Khazanehdari who opened up the real ‘world of worms’ to me!
They were generous in providing me this opportunity to receive my Ph.D., and very brave to
take me on as their student. They have always been eager to help me through the toughest
challenges during my time at the University of Western Australia and in Dubai.
I am grateful to the staff at the Molecular Biology and Genetics (MBG) Department of the Central
Veterinary Research Laboratory (CVRL) for their help, friendship and useful discussions. I
would like to thank and acknowledge every one of you. We shared good times, stories and humour,
which will always remain in my memory.
I am also most grateful to the staff at the Telethon Institute for Child Health Research, who always
made me feel very welcome and who gave all possible assistance with
bioinformatics and biostatistics. In particular, I thank Dr. Sarra Jamieson, Richard Francis and
Professor Jenefer Blackwell for all their efforts.
I acknowledge the valuable contributions of Dr Heather Cordell, who not only gave of her
time generously and imparted enormous detail, but also followed up with further advice or
sources of information.
I would like to acknowledge the sources of financial support for this research: CVRL, the
Emirates Foundation and Dubai Police Headquarters. Without them, this study would not have
been possible.
This thesis would not look the way it does without my best friend Jenan. I more than
appreciate her help and support; on occasion, she has dried my tears.
I thank my faithful friends Moza Alnahyan, Amal Alghanim, Laila Alsayegh and Ahlam
Salmeen for their support, perspective, and encouragement.
I would like to thank my cousin Hind Alsafar, a graphic designer, for all her help and support in
designing all the posters that I presented at conferences and seminars.
Last but of course not least, this work would not have been achieved without the support and
understanding of my family. I would like to thank my mum, dad, sisters and brothers. My
hard-working parents have sacrificed their lives for my sisters, brothers and myself and
provided unconditional love and care. I love them so much, and I would not have made it this
far without them. I know I always have my family to count on when times are rough.
The work described in this thesis was performed with approval from the University of Western
Australia's Human Research Ethics Committee with reference # RA/4/1/4432.
ABSTRACT
This project was developed in 2006 with the aim of detecting loci or gene(s) that may
influence susceptibility to Type 2 Diabetes (T2D) and related traits in individuals of Arab
descent. This required a comparative study of patients and unaffected individuals.
Samples were made available by consenting volunteers from the United Arab Emirates (UAE)
population. Phenotypic data and the genotyping results were systematically compiled in a
bio-banking and data repository known as the “Emirates Family Registry” (EFR). When the
project was initially conceived, data on DNA haplotypes in the tribes of the Middle East were
limited. Coincidentally, significant advances in DNA technology, particularly in the field of
DNA arrays, provided the opportunity to study this group of people. Over the past four years,
the basic infrastructure that will allow longitudinal genetic studies has started to emerge. This
study has specifically benefited from access to information on the Bedouin people, a
predominantly desert-dwelling Arab ethnic group.
In the first instance, the study examined the evolutionary relationship between the Arab
Bedouin and other ethnic groups. Polymorphic Alu insertions (POALINs) are genetic markers
that are widely distributed through the human genome. These markers have been used in a
range of applications, including anthropological analyses of human populations. In an effort
to understand the evolutionary relationship of the Bedouin population in the context of other
ethnic groups, the frequencies of individual insertions of four POALINs within the human
Major Histocompatibility Complex (MHC) class I region, namely AluyMICB, AluyTF, AluyHJ
and AluyHF, were compiled. A phylogenetic tree was constructed using MEGA version 4.
The genotype frequencies of each of these POALINs in Bedouins were found to be nearly
identical to those previously reported for Caucasians in an Australian study.
For AluyHJ, the highest frequency for allele*1 was found in Malaysian Chinese, northeastern
Thais, Japanese, and Mongolians (0.376 to 0.292). In contrast, the frequency in Bedouins
(0.242) was similar to that previously reported for Australian Caucasians (0.273), with these
two groups showing the second highest allele frequencies. The African subpopulations showed a lower
frequency of this allele (0.107 to 0.050). Phylogenetic analysis of the relative allele
frequencies of AluyHJ in combination with the remaining three POALIN markers revealed
that Bedouins have a similar lineage to Caucasians, at least for the MHC region studied. The
structure of the phylogenetic tree supports the popular contention that humans originated in
Africa. The nature of the clusters suggests that the Middle East represents a crossroads from
which human populations migrated toward Asia in the east and Europe to the northwest.
The characteristics of Arab populations make them ideal for the study of complex, polygenic,
multifactorial disorders such as Type 2 Diabetes (T2D). In the United Arab Emirates (UAE)
alone, it has been estimated that one out of five people between the ages of 20 and 79 lives with
this disease. Due to an increasing prevalence of T2D in the region, lifestyle management
strategies with an emphasis on prevention are required. An appreciation of the genetic risk
factors can also make an important contribution to understanding the processes leading to the
disease.
Major hospitals and diabetes centres in the UAE were contacted to establish a bio-banking
facility referred to as the EFR (an abbreviation for the “Emirates Family Registry”). Through
assistance made available by the Ministry of Health and collaborators of this network,
demographic data of T2D patients were collected and collated in a database for analysis and
longitudinal studies in the future. Clinical specimens for Genome Wide Association Studies
(GWAS) and biochemical profiling (such as glucose, lipid and HbA1c levels) were also
collected from volunteers who consented to be part of the study.
In the field of epidemiology, GWAS are commonly used to identify genetic
predispositions to many human diseases. Large repositories housing biological specimens for
clinical and genetic investigations have been established to store material and data for these
studies. The logistics of specimen collection and sample storage can be onerous, and new
strategies have to be explored. This study established the utility of FTA™ cards as a viable
storage matrix for cells from which DNA can be extracted to perform GWAS analyses.
Specifically, three different DNA sources (namely, degraded genomic DNA, amplified
degraded genomic DNA and amplified DNA extracted from FTA™ cards) were examined for
GWAS using the Illumina platform. No significant difference in call rate was detected between
amplified degraded genomic DNA extracted from whole blood, the gold standard for GWAS,
and amplified DNA retrieved from FTA™ cards. However, using unamplified degraded
genomic DNA reduced the call rate to a mean of 42.6%, compared with a mean of 96.6% for
amplified DNA extracted from FTA™ cards. It is therefore possible to use biological samples
stored on FTA™ cards as a source of DNA for GWAS, provided that a pre-amplification step is
incorporated into the process.
In the first 24 months of operation, the EFR recruited 23,064 adult volunteers from three
major hospitals and nine primary care centres throughout the UAE. Within this cohort, 88%
were classified as T2D patients from the medical records. The cohort was divided
into age categories, with 59% of T2D patients aged between 40 and 59 years. UAE
nationals comprised 30% of the database, of which 21% were diagnosed with T2D. However,
the percentage of adults with T2D was higher in other ethnic groups, affecting almost 33% of
the Indians who live in the UAE. In Phase I of the study, a total of 741 UAE nationals consented
to donate blood for biochemical testing, of which 23% were diagnosed with T2D, 30%
with pre-T2D and 47% were healthy following the completion of testing.
This study subsequently assessed the value of specific clinical markers for T2D among five
generations of an extended Arab family. This family included 319 members of 41 nuclear
families, from which 178 individuals (86 males, 92 females; 66 diabetic, 112 healthy) formed
the study sample set. The heritability of eight quantitative traits (including fasting glucose,
glycated hemoglobin (HbA1c), cholesterol, triglyceride, urea and creatinine) was determined. Once the
data in the disease and control groups were stratified, a significant relationship between T2D
status and waist circumference (WC) (p = 2.6 × 10⁻⁹) and Body Mass Index (BMI) (p = 1.0 × 10⁻⁶)
was found. The estimated power for these two traits was 80% and 90%, respectively.
Creatinine (p = 0.002) and cholesterol (p = 0.02) levels were also associated with T2D. Not
surprisingly, the results support the link between environmental and genetic factors in the
pathophysiology of T2D and its related phenotypes in an Arab population. To dissect the
mechanisms that cause disease, genetic studies followed.
Firstly, a Family Based Association Test (FBAT) was performed in the same family using the
Illumina Human 660 Quad chip array to better understand the gene(s) that play a role in the
pathways that cause T2D. The study revealed 21 new association signals from single
nucleotide polymorphisms (SNPs) within five genes (RBM47, KCTD8, GABRB1, SCD5 and
PRKD1). Six SNPs within the PRKD1 (Protein Kinase D1) gene on chromosome 14 were found
to be most strongly associated with T2D in this Arab population. It has been suggested that
PRKD1, a serine/threonine kinase, plays an important role in insulin secretion. The strongest
statistical evidence for a new association signal was from rs7154546 in intron 1 of PRKD1,
with the overall estimate of effect returning an odds ratio (OR) of 3.72 (95% confidence
interval, 1.28 to 10.82; p = 8.46 × 10⁻⁶) using an additive model.
As mentioned, WC and BMI are phenotypes that have strong heritability values. Since
overweight and obesity are major risk factors for a number of chronic diseases, including T2D,
a search to identify common genetic variants that may influence obesity and its association
with T2D was undertaken. Specifically, a GWAS was conducted in an extended family
with 178 individuals of Arab descent using WC and BMI as indicators. This study revealed
three loci that reached genome-wide significance. The meta-analysis of Caucasian GWAS
resulted in one previously described locus that was associated with WC on chromosome 16
within the FBXO31 gene region (rs9308437, p = 7.5 × 10⁻⁷). Another novel association was found
for the rs2793823 SNP in the ADAM30 gene (p = 1.86 × 10⁻⁸), which has previously been shown to be
associated with T2D. One novel SNP (rs7120774) in GALNTL4 was also shown to be
associated with BMI (p = 1.82 × 10⁻¹⁰). The positive associations between SNPs from the JAZF1
locus and BMI, WC and T2D were also confirmed. Further work is required to replicate these
preliminary results in other sample sets.
This study is the first GWAS undertaken in T2D candidates in families of Arab descent.
These findings may provide important insights into the pathogenesis of T2D in Middle
Eastern populations. Comparative analysis with sequences from other ethnic groups could
assist in dissecting the mechanisms that cause the disease. These efforts will continue to be
important with the increasing affluence of Arab communities. Greater personal wealth is
linked to greater indulgences. It is important to develop an understanding of the relationship
between ethnic specific allelic and haplotypic patterns that lead to disease, in an effort to
control the spread of and manage the consequences of the disease.
In conclusion, comparative genomics in medical science has been widely used to identify
genetic factors that cause disease. Ethnic differences have also been helpful in this respect.
The genetic links underlying several disorders that are unique to specific ethnic groups (e.g.
hemochromatosis in Caucasians, thalassemia in ethnic groups of the Mediterranean) have been
identified through comparisons of the genomes of different races. Other opportunities,
including DNA and race profiling in forensic science, will benefit from an appreciation of
ethnic specific differences.
The null hypothesis of this study was that the alleles and genes in the Arabic population that
predispose patients to Type 2 Diabetes were the same as those described for other populations
previously studied. The genetic factors of interest were studied using GWAS (Genome Wide
Association Study) technology in the context of lifestyle factors that are known to affect
patients living with diabetes. If this hypothesis is rejected, then the alternative is that novel
genetic factors unique to the Arabic population contribute to the pathophysiology of the disease.
Regardless of the findings, the data gleaned from this study will result in the characterization
and definition of Arabic haplotypes that are associated with the disease. These genetic
characteristics will have other applications, including anthropological and evolutionary
analysis as well as forensic profiling.
LIST OF ABBREVIATIONS
AITD Autoimmune Thyroid Disease
BMI Body Mass Index
CVRL Central Veterinary Research Laboratory
dgDNA Degraded Genomic DNA
DIO Diet-Induced Obese
DNA Deoxyribonucleic Acid
EFR Emirates Family Registry
FBAT Family Based Association Test
GWAS Genome Wide Association Studies
IPA Ingenuity Pathway Analysis
LD Linkage Disequilibrium
LOD Logarithm of the Odds
MHC Major Histocompatibility Complex
OGTT Oral Glucose Tolerance Test
OR Odds Ratio
p-value Probability Value
PCA Principal Component Analysis
PCR Polymerase Chain Reaction
POALINS Polymorphic Alu insertions
QC Quality Control
QTDT Quantitative Trait Transmission Disequilibrium Test
SNP Single Nucleotide Polymorphism
T1D Type 1 Diabetes
T2D Type 2 Diabetes
UAE United Arab Emirates
UWA The University of Western Australia
VMH Ventromedial Hypothalamus
WA Western Australia
WC Waist Circumference
WGA Whole Genome Amplification
WHO World Health Organization
DEFINITIONS
Allele An alternative form of a gene that is located at a specific position
on a specific chromosome.
Candidate gene A gene believed to influence expression of complex phenotypes
due to known biological and/or physiological properties of its
products, or to its location near a region of association or
linkage.
Genome The entire complement of genetic material in a chromosome set.
Genotyping call rate Proportion of samples or SNPs for which a specific allele
can be reliably identified by a genotyping method.
Haplotype A group of specific alleles at neighboring genes or markers that
tend to be inherited together.
HapMap Project Genome-wide database of patterns of common human genetic
sequence variation among multiple ancestral population samples.
Hardy Weinberg Equilibrium Population distribution of 2 alleles (with frequencies p and q)
such that the distribution is stable from generation to generation
and genotypes occur at frequencies of p², 2pq, and q² for the
major allele homozygote, heterozygote, and minor allele
homozygote, respectively, under the assumption that natural
selection does not act on the alleles under consideration.
Heritability The proportion of variation in a phenotype (trait, characteristic
or physical feature) that is thought to be caused by genetic
variation among individuals. The remaining variation is usually
attributed to environmental factors. Studies of heritability
typically estimate the proportional contribution of genetic and
environmental factors to a particular trait or feature.
Linkage disequilibrium Association between 2 alleles located near each other on a
chromosome, such that they are inherited together more
frequently than expected by chance.
Linkage Equilibrium Occurs when the genotype present at one locus is independent of
the genotype at a second locus.
Minor allele frequency Proportion of the less common of 2 alleles in a population (with
2 alleles carried by each person at each autosomal locus) ranging
from less than 1% to less than 50%.
Phenotypes The total characteristics displayed by an organism under a
particular set of environmental factors, regardless of the actual
genotype of the organism.
Polymerase Chain Reaction A method for amplifying segments of DNA, by generating
multiple copies using DNA polymerase enzymes under
controlled conditions. As little as a single copy of the DNA
segment or gene can be amplified into millions of copies, allowing
detection using dyes and other visualization techniques.
Population stratification A form of confounding in genetic association studies caused by
genetic differences between cases and controls unrelated to
disease but due to sampling them from populations of different
ancestries.
Power A statistical term for the probability of identifying a difference
between 2 groups in a study when a difference truly exists.
Single Nucleotide Polymorphism DNA sequence variations that occur when a single nucleotide
(A, T, C, or G) in the genome sequence is altered. Each
individual has many single nucleotide polymorphisms that
together create a unique DNA pattern for that person. SNPs
promise to significantly advance our ability to understand and
treat human disease.
Whole Genome Association Study An examination of genetic variation across a given genome,
designed to identify genetic associations with observable traits.
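The definitions of Hardy Weinberg Equilibrium and minor allele frequency given above can be
made concrete with a short calculation. The Python sketch below is a minimal illustration
under stated assumptions: the function names and the genotype counts are hypothetical, and the
chi-square goodness-of-fit statistic is one common way of checking a SNP against the p², 2pq,
q² expectation rather than the specific quality control procedure applied in this thesis.

```python
def allele_frequencies(n_major_hom, n_het, n_minor_hom):
    """Estimate (p, q) from observed genotype counts at one biallelic SNP."""
    n = n_major_hom + n_het + n_minor_hom
    # Each homozygote carries two copies of its allele, each heterozygote one.
    p = (2 * n_major_hom + n_het) / (2 * n)
    return p, 1.0 - p


def minor_allele_frequency(n_major_hom, n_het, n_minor_hom):
    """Frequency of the less common of the two alleles."""
    p, q = allele_frequencies(n_major_hom, n_het, n_minor_hom)
    return min(p, q)


def hwe_expected_counts(n_major_hom, n_het, n_minor_hom):
    """Expected genotype counts (p^2 * n, 2pq * n, q^2 * n) under equilibrium."""
    n = n_major_hom + n_het + n_minor_hom
    p, q = allele_frequencies(n_major_hom, n_het, n_minor_hom)
    return p * p * n, 2 * p * q * n, q * q * n


def hwe_chi_square(n_major_hom, n_het, n_minor_hom):
    """Chi-square statistic comparing observed with expected genotype counts."""
    observed = (n_major_hom, n_het, n_minor_hom)
    expected = hwe_expected_counts(n_major_hom, n_het, n_minor_hom)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Made-up genotype counts for 100 individuals at a single SNP.
counts = (60, 20, 20)
print("MAF:", round(minor_allele_frequency(*counts), 3))
print("Expected counts:", [round(x, 1) for x in hwe_expected_counts(*counts)])
print("Chi-square:", round(hwe_chi_square(*counts), 2))
```

A large statistic indicates that the observed genotype counts depart from the equilibrium
expectation, which in practice is often treated as a warning sign of genotyping error.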
CHAPTER 1
LITERATURE REVIEW: AN OVERVIEW OF
FACTORS THAT PREDISPOSE TO TYPE 2 DIABETES
IN DIFFERENT POPULATIONS AND THE NEED OF
GENOME STUDIES IN ETHNIC POPULATION OF
THE MIDDLE EAST.
This chapter is a prelude to a study that aims to develop an understanding of the environmental
and genetic predispositions that give rise to the collection of etiologies resulting in Type 2
Diabetes in indigenous populations of the Middle East. Although the focus of the study is on
the Bedouins, the Arab people who have roamed the deserts of the Middle East for centuries,
the work is the beginning of a research effort to understand diseases that commonly
affect the many tribes of Arabs. The processes and methods developed towards understanding
the factors that cause Type 2 Diabetes will be extended beyond this initial effort to the
study of other debilitating diseases.
This chapter will therefore outline the definition of diabetes, the lifestyle and genetic risk
factors of the disease and its potential health consequences. It will also discuss preventive
measures. This review will also touch on the implications of genetic research, with specific
emphasis on the findings of genome wide screening of T2D patients among different
populations, and ultimately discuss the necessity of genetic and genomic research to study the
disease among the indigenous Arab populations.
God has honored the human and excelled in his creation, creating him in the best stature.
He says in the Qur’an, “Surely, we created man of the best stature”. The human
body is one of the most complex biological systems on earth. It is composed of trillions of
cells, which contain the body’s hereditary material in the form of deoxyribonucleic acid
(DNA). A person's DNA represents a "genetic blueprint" that is unique to each individual.
DNA consists of two long strands, arranged as a double helix, composed of units called
nucleotides; a gene, the basic physical and functional unit of heredity, is a specific sequence
of nucleotides. Every person inherits two copies of each chromosome, one from each
biological parent. Because no two human individuals (with the exception of identical twins)
have exactly the same genetic profile, DNA testing is a definitive means of confirming any
biological relationship in doubt.
DNA can provide insights into many intimate aspects of people and their families, including
susceptibility to particular diseases, legitimacy of birth, the identification of criminals,
ancestry and perhaps predispositions to certain behaviors.
In this study, we propose to measure genetic ancestry in the Arab population of the United Arab
Emirates (UAE) using genome-wide Single Nucleotide Polymorphism (SNP) arrays. The
identification of polymorphisms whose frequencies are characteristic of this population will
provide an opportunity to enhance DNA profiling. Ethnic-specific polymorphisms can be used to
profile biological evidence left at a crime scene to provide information that could be useful in an
investigation. The study of DNA from the local ethnic groups provides a double benefit. Apart
from the development of new opportunities in forensic science, the markers will allow the
study of specific diseases that are common in populations of this region, such as Type 2
Diabetes (T2D). Because the frequency of genetic variants can differ across populations, we
aim to detect genes influencing susceptibility to T2D in the UAE population.
Epidemiology of Type 2 Diabetes
Diabetes is a group of metabolic diseases characterised by hyperglycemia resulting
from defects in insulin secretion, the action of insulin, or both [1]. Diabetes is currently one
of the most prevalent chronic diseases; it affects the lives of millions of people worldwide
and is a major cause of morbidity.
According to the International Diabetes Federation, the number of people diagnosed with
diabetes has risen from 30 million to more than 246 million in only the past
twenty years [2] (Figure 1). This illness is well documented in the United States, where the
total annual economic cost of diabetes in 2002 was estimated to be $132
billion, or one out of every 10 health care dollars spent [3]. Further, the
report indicates that seven of the ten countries with the highest number of diabetics are in the
developing world rather than where medicines and treatments might be readily available.
In the Middle East, the percentage of the diabetic population ranges from 12 to 20 percent, and
these numbers increase every year along with the rising costs associated with health care
provision. In 2007, the UAE ranked second highest in terms of diabetes prevalence,
and it is estimated that one out of five people aged 20 to 79 lives with this disease, while a
similar percentage of the population is at risk of developing the disease [4].
The purpose of this review is to outline what is known about diabetes; the lifestyle and genetic
risk factors of the disease and its potential health consequences. It will also touch on
preventive measures as well as management strategies to care for those afflicted with the
disease. This review will also discuss the implications of genetic research, with specific
emphasis on the findings of genome wide screening of T2D patients among different
populations, and ultimately discuss the necessity of genetic and genomic research to study the
disease among the indigenous Arab populations.
Figure 1: The prevalence of Type 2 Diabetes is predicted to rise in all continents according to Wild et al (2004) [5]. The global average of
20% is predicted to rise to 52.8% based on modeling studies by Parves et al (2007) [2]. More significantly, the prevalence will
increase by over 100% in population groups throughout Asia, Africa and the Middle East, with the latter recording the highest rise
of 164%.
Simply, diabetes is a disorder of sugar metabolism in which the body uses its sugar
resources inefficiently, leading to their accumulation. This in turn causes a range of
pathological consequences in the body, and the patient lives in a compromised state. A
primary factor in diabetes is the level of insulin present in the body. Insulin is the protein
hormone that the body produces naturally to manage the levels of glucose in the system. When
the body produces too little insulin, greater amounts of glucose remain in the bloodstream,
thereby causing the symptoms of the disease. Glucose, a simple sugar, enters the body by way
of ingested food and is delivered to the cells via the bloodstream; the cells then break down
the glucose to supply energy throughout the body. Brain cells, as well as some other
tissues, are fueled primarily by glucose. In diabetics, the body is not able to regulate the
levels of glucose and maintain a stable amount in the cells. This means the body has more than
the necessary glucose levels immediately after a meal but too little otherwise. To maintain a
constant blood-glucose level, the healthy body produces glucagon and insulin, two hormones
originating from the pancreas. Typically, there is a balance of these hormones in the
bloodstream, with the insulin acting to prevent the concentration of blood glucose from
increasing disproportionately.
The chronic hyperglycemia of diabetes is associated with long-term damage, dysfunction, and
failure of various organs. Several pathogenic processes are involved in the development of
diabetes. These range from autoimmune destruction of the β-cells of the pancreas with
consequent insulin deficiency to abnormalities that result in resistance to insulin action. Long-
term complications of diabetes include retinopathy with potential loss of vision; nephropathy
leading to kidney failure; peripheral neuropathy with risk of foot ulcers, amputations, and
Charcot joints; and autonomic neuropathy causing gastrointestinal, cardiovascular and
genitourinary symptoms, which can include sexual dysfunction [6]. “These life-threatening
consequences strike people with diabetes more than twice as often as they do others” [7].
Patients with diabetes have an increased incidence of atherosclerotic cardiovascular, peripheral
arterial, and cerebrovascular disease. Hypertension and abnormalities of lipoprotein
metabolism are also often found in people with diabetes (Figure 2).
Figure 2: Major complications of diabetes include retinopathy, nephropathy,
peripheral neuropathy with risk of foot ulcers and Charcot joints, and
cardiovascular disease.
Types of Diabetes
The complexity of the disease has led to the description of many variants of this condition;
however, the most widely used method of classification is into two broad etiopathogenetic
categories: Type 1 Diabetes (T1D) and Type 2 Diabetes (T2D). Put simply, the mode of insulin
deficiency differs between the two, which is why the treatment methodology also varies
considerably. T1D, or juvenile diabetes, which occurs primarily in children, is caused by an
absolute deficiency of insulin secretion. This type of the disease afflicts less than 10 percent of
all diabetics. T2D, which is also referred to as ‘non-insulin-dependent’ or ‘adult-onset’ diabetes,
is caused by a combination of resistance to insulin action and an inadequate compensatory
insulin secretory response. More than 90 percent of diabetics suffer from this form of the
disease, which normally afflicts those over 40 years of age.
In some ways, T1D can be considered the lesser of the two evils, since proper dosing of insulin
at regular intervals enables the person to lead an active and healthy life. Compliance with
treatment is high as the patients and their families are acutely aware of the role of therapy.
T2D, however, is mainly the result of environmental influences, such as a sedentary mode of
living and imbalanced and improper eating habits, compounded by an underlying
genetic background. In most cases, the sufferer is obese, which results in the inability of the
body to take up the excess load of sugar. In this form of the disease, insulin production is
not absent as it is in T1D [8].
Initial epidemiological data give a very simplified picture of the age groups in which
one or the other type of diabetes prevails. For example, T1D generally occurs in younger
patients, who may not be obese and who present with symptoms such as ketoacidosis.
T2D, however, has been reported to be more prevalent in the older age groups, where Body
Mass Index (BMI) is the primary factor that leads to T2D [8]. However, with the spread of a
sedentary mode of living to the younger age groups and children, and with better health care
delivery in the older age groups, the boundaries between the two types of diabetes are very
likely to overlap. Therefore, contemporary definitions are beginning to lose their
specificity with regard to the type of diabetes and its occurrence and prevalence in
various age groups. What is more important is to know that diabetes in its various forms
must be considered in all its aspects in patients regardless of their age [8], and perhaps that
a new set of biomarkers should be considered with advances in the post-genomic era.
Gestational diabetes is similar to T2D and can arise in all categories of women who are
pregnant. Studies have confirmed that women with a history of gestational diabetes
have about a 40 percent chance of developing diabetes in the future. “Other specific types of
diabetes, which may account for one to two percent of all diagnosed cases, result from specific
genetic syndromes, surgery, drugs, malnutrition, infections, and other illnesses” [9]. Women
with gestational diabetes experience an abnormal tolerance to glucose and have somewhat
elevated insulin levels. During pregnancy, the effects of insulin are blocked by various
hormones, which act to desensitize the patient to the insulin her body produces. This form of
diabetes can be effectively treated with supplementary insulin injections and
specialised diets. Normally, the symptoms of gestational diabetes do not continue in the
woman following the birth of the baby.
The classification of diabetes has been of significant value to the progress of research
related to diabetes. The different pathological findings and clinical presentations of each
variant have led to much confusion regarding the pathology of each type and which genes
contribute in each case [8, 10].
Risk Factors
Currently over 170 million people globally suffer from T2D. Most of these patients are middle
aged; however, variations in this regard are not rare and are affected by factors such as
lifestyle and heredity, as well as behavioral factors [11].
Several factors influence T1D, such as the immune system, the environment and
genetics, whereas the risk factors for T2D are more clearly defined. These include obesity,
physical inactivity, advanced age, a family history of diabetes, a past history of gestational
diabetes and impaired glucose tolerance. Ethnicity is another risk factor. For
example “African Americans, Hispanic/Latino Americans, American Indians, and some Asian
Americans and Pacific Islanders are at particularly high risk for T2D” [7].
What is interesting to note is the role of urbanisation and changes in lifestyle in
the propagation and prevalence of the disease. Populations such as the Mapuche Indians and
Chinese living in rural areas of the mainland have a very low percentage of diabetics
amongst them. This points clearly to the role of physical and environmental factors that
contribute to the development of the disease [12]. In contrast, some of the highest numbers
have been seen among the Pima Indians in Arizona and the Nauruans, which points towards the
role of genetics in the development of the condition [12]. This means that diabetes is a
condition that is very much affected by both environmental and genetic factors, and both can
come into play in varying degrees in the pathology among various populations.
Although about 33 percent of people with the illness are unaware of their condition, nearly
three million or almost 12 percent of the African American population over 20 years of age
suffer with symptoms of diabetes. Because of this, African Americans have been identified as
being at greater risk than those of Anglo descent to suffer macro-vascular problems such as
strokes and heart disease. “African Americans are 1.6 times more likely to have diabetes than
non-Latino whites. 25 percent of African Americans between the ages of 65 and 74 have
diabetes. One in four African American women over 55 years of age has diabetes” [7]. The
disproportionate gap that exists between the African American population and others
regarding diabetes continues to widen. “National health surveys during the past 35 years show
that the percentage of the African American population that has been diagnosed with diabetes
is increasing dramatically” [13]. In a thorough investigative study conducted from 1976 to
1980, the total prevalence of diabetes was less than nine percent in African Americans aged 40
to 75. Another similar study conducted between 1988 and 1994 showed that this number had
increased two-fold to more than 18 percent while in the white community the rate rose only
slightly to just over ten percent.
African Americans, Hispanic/Latino Americans, American Indians and those with a family
history of diabetes also have a greater chance of developing gestational diabetes than do
those in other groups. In addition, women who have had this form of
diabetes are at a higher risk of developing T2D later in life.
The prevalence of diabetes varies widely between populations. It is reported to be about 5%
in Asian populations, whereas almost 50% of the Pima Indian population suffers from
diabetes [14].
What is understood is that there are both monogenic and polygenic forms of the
condition, which can present in a wide variety of ways. While the simple classification
into T1D and T2D is helpful in research, it is still not able to identify intermediate cases,
and therefore a more extensive period of continuous research is required to understand
the true nature of this disease [15].
Symptoms
Diabetics display numerous symptoms including “excessive thirst (polydipsia), frequent
urination (polyuria), extreme hunger or constant eating (polyphagia), unexplained weight loss,
presence of glucose in the urine (glycosuria), tiredness or fatigue, changes in vision, numbness
or tingling in the extremities (hands, feet), slow-healing wounds or sores and abnormally high
frequency of infection” [16]. These symptoms are common to both forms of diabetes.
However, patients do not necessarily display all of the signs mentioned above.
Screening
The method of detection of diabetes is mainly through blood glucose analysis at various time
points, which is then compared with the normal levels. Ideally, in the fasting state, the blood
sugar level must be no more than 126 milligrams per deciliter, or 7 millimoles per liter. In
the random state, this level must be no more than 200 milligrams per deciliter, confirmed via
two separate sets of readings. Any level above these thresholds is considered a case
of diabetes, and appropriate medication and lifestyle modifications are advised and
administered [8].
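The thresholds quoted above can be checked with a short calculation. The Python sketch below
is illustrative only: the function names and the example readings are hypothetical, the molar
mass of glucose (about 180.16 g/mol) is used for the unit conversion, and the rule of requiring
two readings above the threshold follows the wording of the paragraph above. Clinical guidelines
commonly state the criterion as "at or above" the threshold, so this sketch should not be read
as a diagnostic tool.

```python
GLUCOSE_MG_PER_MMOL = 180.16  # molar mass of glucose, g/mol (equivalently mg/mmol)
FASTING_THRESHOLD_MG_DL = 126  # fasting threshold quoted in the text
RANDOM_THRESHOLD_MG_DL = 200   # random (non-fasting) threshold quoted in the text


def mg_dl_to_mmol_l(mg_dl):
    """Convert a glucose concentration from mg/dL to mmol/L."""
    # 1 dL is 0.1 L, so multiplying by 10 gives mg/L; dividing by mg/mmol gives mmol/L.
    return mg_dl * 10 / GLUCOSE_MG_PER_MMOL


def exceeds_threshold(readings_mg_dl, fasting):
    """True if at least two readings are supplied and all lie above the threshold.

    Hypothetical helper that mirrors the wording of the text, not a clinical rule.
    """
    threshold = FASTING_THRESHOLD_MG_DL if fasting else RANDOM_THRESHOLD_MG_DL
    return len(readings_mg_dl) >= 2 and all(r > threshold for r in readings_mg_dl)


print(round(mg_dl_to_mmol_l(FASTING_THRESHOLD_MG_DL), 1), "mmol/L")
print(exceeds_threshold([130, 128], fasting=True))
print(exceeds_threshold([210, 190], fasting=False))
```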
A hemoglobin A1c (HbA1c) test measures the proportion of hemoglobin in red blood cells that
has glucose attached, which reflects the average blood glucose level over the preceding two to
three months. The diabetic who has not received treatment may show levels as high as 10
percent while a person not afflicted with the disease tests at close to five percent. As
previously discussed, the lack of insulin production allows higher levels of glucose to remain
in the blood. High levels of glucose (or sugar) in the bloodstream lead to various
diabetes-related health complications if allowed to go unchecked [17].
Treatment
While there is no known cure for the disease, diabetes can be effectively managed with proper
specific lifestyle regimes. “The key to treating diabetes is to closely monitor and manage your
blood-glucose levels through exercise, diet and medications” [16]. The type of diabetes
dictates the type of treatment to be followed. T1D patients must check their blood-glucose
levels many times per day and inject insulin accordingly, usually at mealtimes so as to help
manage the glucose being ingested. Supplementary insulin ensures that blood glucose levels
remain stable. T2D patients are able to control the disease through personal lifestyle
decisions such as losing weight, exercising more and not smoking. In severe
instances, medication may need to be given to control glucose levels. Diabetics are able to
significantly decrease the risk of complications due to the disease if they are willing to
educate themselves and then apply that knowledge to their daily lives.
Modern medicine has the potential to bring the treatment of many diseases up to date. One of
the most important goals of contemporary biomedical research is to tailor medical care to an
individual's needs, based on information from the individual's genotype or gene expression
profile, so-called personalised medicine. These principles can offer huge advances in medical
care but can only succeed if the genetic variation of humans can be accurately mapped.
The advent of a new generation of experimental techniques has now given biomedical
researchers the opportunity to map the complete genetic variation of large numbers of humans
via full genome sequencing. The data produced from such efforts will provide an unparalleled
amount of information that can be used to stratify the human race, and help tailor medical care
that targets the specific needs of different populations and individuals. The technology to test
massive volumes is continuously evolving, and the computing capability to manage datasets is
also keeping pace with the exponential increase in sequencing capability. Personalised
medicine is thus on the brink of a major breakthrough.
A T1D patient’s diet should include about 35 calories per kg of body weight per day. T2D
patients are commonly restricted to a diet of approximately 1500 to 1800 calories per day. These
regimes are intended to control the onset of obesity and to maintain an ideal body mass. These
numbers, of course, vary somewhat depending on the patient’s gender and age along with their
current weight and body type and their level of physical activity. Those diabetics who are
overweight when they begin the nutritional program may require more initial calories until
their weight drops to a more normal level. The reasoning is that too rapid a weight loss can be
unhealthy and it takes additional calorie intake to sustain a larger body frame. Gender also
plays a role in a proper program as males generally possess a greater muscle mass than
females and consequently may require a higher intake of calories. Because muscle uses up
more calories per hour than does fat, people who are not physically active will have less need
for calorie intake, a good reason for everyone, and especially those with diabetes, to exercise
regularly and build-up muscle mass. In other words, if you like to eat, supplement it with
proportional amounts of exercise. There are different theories regarding the most effective
diet but the fact that diet is very important in controlling the symptoms of diabetes is
indisputable (American Diabetes Association, 2006). A diabetic’s daily calorie intake,
generally speaking, should consist of 40 to 60 percent carbohydrates because the lower the
carbohydrate intake, the lower the level of sugar that enters the bloodstream. The advantages
associated with carbohydrate intake are negated by the patient’s intake of foods that are high
in fat. This dilemma can be circumvented by the substitution of polyunsaturated and
monounsaturated fats for saturated fats. “Most people with diabetes find that it is quite helpful
to sit down with a dietician or nutritionist for a consultation about what is the best diet for
them and how many daily calories they need. It is quite important for diabetics to understand
the principles of carbohydrate counting and how to help control blood sugar levels through
proper diet” [18].
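The dietary figures quoted above lend themselves to a simple worked calculation. The Python
sketch below is an illustration under stated assumptions: the function names and the example
body weight are hypothetical, and the conversion of 4 calories per gram of carbohydrate is a
standard nutritional factor that does not come from the text. Actual dietary targets depend on
the individual factors discussed above and should be set by a dietician.

```python
KCAL_PER_GRAM_CARBOHYDRATE = 4  # standard nutritional factor, not taken from the text


def t1d_daily_calories(weight_kg, kcal_per_kg=35):
    """Daily calorie estimate for a T1D patient at about 35 calories per kg."""
    return weight_kg * kcal_per_kg


def carbohydrate_range_grams(daily_kcal, low_share=0.40, high_share=0.60):
    """Grams of carbohydrate corresponding to 40 to 60 percent of daily calories."""
    low = daily_kcal * low_share / KCAL_PER_GRAM_CARBOHYDRATE
    high = daily_kcal * high_share / KCAL_PER_GRAM_CARBOHYDRATE
    return low, high


# Hypothetical 70 kg T1D patient.
daily_kcal = t1d_daily_calories(70)
low_g, high_g = carbohydrate_range_grams(daily_kcal)
print(f"{daily_kcal} calories per day, {low_g:.0f} to {high_g:.0f} g of carbohydrate")
```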
Prevention
According to the Florida Department of Health, the proper management of glucose in the
bloodstream benefits people with both types of diabetes. “For every one point reduction in
HbA1C, the risk for developing micro-vascular complications (eye, kidney and nerve disease)
decreases by up to 40 percent. Blood pressure control can reduce cardiovascular disease
(heart disease and stroke) by 33 to 50 percent and can reduce micro-vascular disease (eye,
kidney and nerve disease) by approximately 33 percent. Improved control of cholesterol and
lipids (e.g. HDL, LDL, and triglycerides) can reduce cardiovascular complications by 20 to 50
percent. Detection and treatment of diabetic eye disease with laser therapy can reduce the
development of severe vision loss by an estimated 50 to 60 percent. Comprehensive foot care
programs can reduce amputation rates by 45 to 85 percent.” [19]. Proper weight control,
increased activity and not smoking should also coincide with regular visits to the doctor in
order to better regulate blood pressure, glucose and cholesterol levels. The patient would be
best served if they form a team-like relationship with their health care professionals. “Because
people with diabetes have a multi-system chronic disease, they are best monitored and
managed by highly skilled health care professionals trained with the latest information on
diabetes to help ensure early detection and appropriate treatment of the serious complications
of the disease” [7].
In “Metallothionein-Mediated Antioxidant Defense System and Its Response to Exercise
Training Are Impaired in Human Type 2 Diabetes” [20], the authors discuss the importance of
metallothioneins I and II (MT1 and MT2) as part of the antioxidant defense system and their
relationship to exercise in the diabetic patient. Previous studies on these antioxidants have
indicated that exercise has only beneficial effects on the production of MT1 and MT2, but the
research team noticed that none of the studies had actually been conducted on people with
T2D. Further evidence had suggested the possibility that these important chemicals are
reduced with exercise in persons with T2D. During the study, it was confirmed that levels of
MT1 and MT2 are increased in the skeletal muscle tissue and plasma of healthy individuals
who have participated in a regular exercise program. Participants who had T2D showed no
corresponding increases though. While the study was careful to note that there were no
increases or decreases in MT1 and MT2 levels in the skeletal musculature in these patients, it
was also noted that levels in the plasma were somewhat decreased. Decreased MT1 and
MT2 can lead to oxidative stress, which “contributes to the development and acceleration of
related conditions such as nephropathy, neuropathy, retinopathy and macro- and microvascular
damage” [20]. At the same time, tissue samples taken from patients with T2D indicated
increased oxidative stress compared with the control group, with tissue appearing more
susceptible to damage.
As further research is conducted into just how important the decreased levels of MT1 and
MT2 are to the overall health and well-being of the diabetic patient, some changes may occur
in the types of physical therapy recommended for these patients. Before this occurs, however,
the exact role these compounds play in the antioxidant defense must be determined, as well
as whether pharmacological or therapeutic treatment options will work best to provide the
patient with the greatest possible benefit.
However, exercise will continue to play a large role in the treatment of diabetic patients thanks
to the many other benefits it offers. According to Kennedy et al (1999), exercise also helps to
distribute GLUT4 throughout the body, a process that does not occur as readily in the person
with diabetes as it does in those without the illness. GLUT4 is the glucose transporter that
brings glucose into the cell through the plasma membrane. For various reasons, GLUT4 is
considered to be “the major mechanism responsible for the increased rate of glucose transport
after insulin or exercise stimulation” [21]. However, this process takes place primarily in
skeletal muscle, which, in the diabetic patient, shows decreased insulin-stimulated uptake.
This study showed that the muscle is not similarly resistant to the effects of exercise by
demonstrating that the GLUT4 transporter enters the plasma membrane in response to exercise
even where it does not respond to insulin. “In contrast to insulin stimulation, acute exercise
promotes normal glucose uptake and GLUT4 translocation” [21]. In addition, the study showed
that exercise can increase GLUT4 levels in the plasma membrane to levels comparable to those
of people who are leaner and younger and do not have diabetes.
Kennedy et al’s (1999) study begins to outline the various ways in which exercise and
physical therapy in diabetic patients can assist them in their disease maintenance. Exercising
the muscle helps to increase the levels of GLUT4 in the plasma membrane making it possible
for the patient’s body to absorb the glucose within the bloodstream more effectively. Even
more specifically, exercise targets an area of dysfunction that insulin has little to no effect
upon as skeletal muscle has been shown through this and other studies to have little to no
reaction to insulin.
This study is supported by a subsequent study conducted by Musi et al (2001) in which it was
determined that AMP-Activated Protein Kinase (AMPK) activity was normal in response to
exercise, as it should be if the previous study regarding the effect of exercise on the GLUT4
transporter held true. “AMPK has recently emerged as a potentially key signaling
intermediary in the regulation of exercise-induced changes in glucose and lipid metabolism in
skeletal muscle” [22]. AMPK plays a significant role in signaling the translocation of GLUT4
to the plasma membrane. This study indicates that AMPK functions properly in T2D patients
during exercise and suggests that it does not function properly at rest. This was done by
comparing the blood sugar levels of a test group of diabetics with the blood sugar levels of the
control groups before, during and after riding an exercise bicycle for 45 minutes. While the
blood sugar levels of the diabetics were significantly reduced after the exercise, the blood
sugar levels of the control groups remained the same. However, like GLUT4, the mean AMPK
content in diabetic patients as compared to the control group did not show a significant
difference. Because of its believed role in the regulation of this process, however, this study
suggests further investigation as to just how the AMPK pathway stimulates the uptake of
glucose with the intent of the development of a new set of drugs designed to stimulate the
exercise-induced response.
With exercise comes the possibility of broken bones, making the studies of Lu et al (2003)
necessary for proper physical therapy and understanding following an accident. In their study,
“Diabetes Interferes with the Bone Formation by Affecting the Expression of Transcription
Factors that Regulate Osteoblast Differentiation,” researchers found that people with T1D do
experience inadequate bone formation, osteopenia and delayed fracture healing as a result of
their illness. Previous studies have established diabetics have decreased bone density and
bone formation as compared to control groups which suggests they have diminished osteoblast
activity. “In streptozotocin-induced diabetic rats, abnormal bone repair was shown to be
insulin dependent because the deficient osseous healing was reversed by insulin treatment.
This finding demonstrates a specific cause and effect relationship between inadequate insulin
production and abnormal bone formation” [23]. The study indicated that these deficiencies
could be reversed with the proper application of insulin, yet finding the mechanism that
prevents the bone formation at the protein level would enable researchers to further negate the
effects of diabetes on patients.
Genetic approach towards understanding Type 2 Diabetes
The role of genetic factors in the etiology of diabetes has long been implicated. This
possibility of “disease” genes was noted when family incidences of diabetes were found to be
highly significant. Patients with diabetes are therefore very likely to have siblings or other
near relatives suffering from the same problem. Further research has also directly
established that diabetes is a condition strongly influenced by genetic factors and
mutations therein [8].
The rates of genetic influence vary between the two forms of the disease, Type 1 Diabetes and
Type 2 Diabetes. While siblings of Type 1 patients have a 6% chance of developing the
condition themselves, this figure rises to 30 to 40% in siblings of patients who
suffer from Type 2 Diabetes Mellitus. This makes the risk 6 to 7 times higher than in any other
group within the population. Similarly, twin studies show a very high probability of
developing the condition, ranging from 20 to 70 percent. Combined with environmental
factors, the rates of diabetes are very likely to increase significantly [8].
Various syndromes have been described in which diabetes is the main feature, such as
Wolfram syndrome [8]. The hereditary incidence of genetic transfer of this
condition in siblings is as high as 70 to 80% [14]. This transfer, however, occurs in only 0.1
to 1 percent of the patients where severe insulin resistance takes place. MELAS
(Mitochondrial Encephalomyopathy, Lactic Acidosis and Stroke-like episodes) syndrome also
arises when a mutation occurs in the mitochondrial DNA [24].
It is however important to identify which genes are in reality diabetogenic in nature and which
are diabetes related genes. While some genes may modify the chances of developing diabetes
in a patient due to problems in fat storage, and use of glucose deposits, they may not
necessarily mean that each case will develop into diabetes. However, certain genes may lead
to progression of diabetes even in the absence of other environmental factors. Therefore,
research should also be able to identify which genes are actual diabetes causing genes and
which are diabetes promoting genes [25].
Therefore, genetic factors play an important role in the development of T2D. Despite
considerable effort, there has been relatively little progress in identifying genes that affect risk.
This may be due, at least in part, to phenotypic heterogeneity, that is, T2D comprises many
diseases characterized by hyperglycemia.
The etiology of T2D is so multifaceted that the debate still continues about the dichotomous
inheritance pattern of the disease. Since the environmental and genetic factors are both
important in the etiology, the individual role of each remains to be understood [26]. More than
60 genes have been researched so far in the pathology of Type 2 Diabetes and this highlights
the complex pattern of diabetic pathology that leads to the formation of the disease [24].
Two basic methods are currently available for the genome wide scan in the particular case of T2D.
Gene mapping requires an elaborate review since this is the main technique that has
enabled scientists to recognize much of the genetic information previously unknown to the
scientific community. The method can vary, however, and usually consists of the positional
candidate approach or the genome wide scan. The genome wide scan is carried out via two
methods: linkage studies and association studies.
Gene mapping gained widespread popularity due to the fact that it is a cheaper option than
other genetic testing methods. This method is also faster and more accurate, and therefore is
one of the most favored methods among researchers [27]. Genetic or linkage mapping is able
to map out combinations of genes and how they can be responsible for various genetic
pathologies. The method is carried out on samples from the patient, namely blood and tissue
samples. With the help of genetic markers and processes such as recombination, gene
mapping is readily achieved [27]. Recent research has yielded a new class of
markers, which are obtained from naturally occurring DNA variation. They do not cause
any changes in the normal DNA and are numerous, and are therefore very effective in
linkage analysis [27].
For this, the markers are used in a variety of techniques such as restriction fragment length
polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs) and randomly amplified
polymorphic DNAs (RAPDs) [27].
There are two modes of mapping. The first is genetic mapping, in which the position of each
gene is determined relative to the others, together with their level of linkage. Physical
mapping is more focused on finding the exact location of a gene on the chromosome [27].
Linkage is defined as the presence of two different genes on the same chromosome. If these
are located near to each other, they are termed as tightly linked. This method is able to help
construct DNA maps by approximating the location of one gene with the other [27]. The
concept of map unit is used in this technique, that is “the effective distance needed to obtain, a
one percent recombination between linked alleles” [27].
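As a minimal sketch of the map unit concept, the recombination frequency between two linked loci can be converted into map units. The function name and the offspring counts below are hypothetical and serve only as an illustration; the direct conversion is a reasonable approximation only for closely linked loci.

```python
def map_units(recombinants: int, total_offspring: int) -> float:
    """Return the genetic distance in map units.

    One map unit corresponds to one percent recombination
    between two linked loci.
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinants <= total_offspring:
        raise ValueError("recombinants must lie between 0 and total_offspring")
    return 100.0 * recombinants / total_offspring


# Hypothetical example: 8 recombinant offspring out of 200 scored.
distance = map_units(8, 200)
print(f"{distance:.1f} map units")
```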
Genome wide association study is defined as follows: “A genome-wide association study is
an approach that involves rapidly scanning markers across the complete sets of DNA, or
genomes, of many people to find genetic variations associated with a particular disease. Once
new genetic associations are identified, researchers can use the information to develop better
strategies to detect, treat and prevent the disease. Such studies are particularly useful in finding
genetic variations that contribute to common, complex diseases, such as asthma, cancer,
diabetes, heart disease and mental illnesses” [27].
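The core calculation behind a single-marker association test can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any study cited here: it applies a Pearson chi-square test with one degree of freedom to a 2x2 table of allele counts in cases and controls, and the function name and counts are hypothetical.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Returns the chi-square statistic and its p value.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_sums)
    if 0 in row_sums or 0 in col_sums:
        raise ValueError("every row and column needs at least one observation")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts for one marker in 500 cases and 500 controls.
chi2, p = allelic_chi_square(300, 700, 200, 800)
print(f"chi-square = {chi2:.2f}, p = {p:.2e}")
```

In a genome wide scan the same test would be repeated for every marker, which is why very small p values are required before an association is accepted.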
Previously, association studies were not possible due to the lack of information about the human
genome. Now, with genome wide scans and information about the human genome available,
association studies are emerging as an important adjunct to health research
[27]. With the help of the information gathered, the LOD score (logarithm of the odds) is then
estimated. In this method, the probability of the observed birth sequence is assessed for an
estimated linkage distance. The result is then divided by the probability of the same birth
sequence assuming the genes are unlinked. The formula applied is as follows:
LOD score = Z = log10 [probability of birth sequence with a given linkage value / probability
of birth sequence with no linkage].
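The LOD score formula can be illustrated with a short sketch. It assumes the simplest case of a phase-known pedigree in which each offspring is scored as recombinant or non-recombinant, so that the likelihood under linkage at recombination fraction theta is theta^r (1 - theta)^(n - r) and the likelihood with no linkage is 0.5^n. The function name and the example counts are hypothetical.

```python
import math


def lod_score(recombinants: int, non_recombinants: int, theta: float) -> float:
    """LOD score for a phase-known pedigree at recombination fraction theta.

    Z = log10( L(theta) / L(0.5) ), where L(0.5) is the likelihood
    of the observed births when the two loci are unlinked.
    """
    n = recombinants + non_recombinants
    if n == 0:
        raise ValueError("at least one offspring is required")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")

    likelihood_linked = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    likelihood_unlinked = 0.5 ** n
    if likelihood_linked == 0.0:
        return float("-inf")  # the data exclude this value of theta
    return math.log10(likelihood_linked / likelihood_unlinked)


# Hypothetical family: 1 recombinant and 9 non-recombinant offspring.
for theta in (0.05, 0.1, 0.2, 0.5):
    print(f"theta = {theta:.2f}  LOD = {lod_score(1, 9, theta):.3f}")
```

By convention, which is not stated in the source, a LOD score of 3 or more is usually taken as significant evidence for linkage; this is a useful reference point when reading the values in Table 1.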
The GENNID study was perhaps one of the most significant research efforts in the area of
diabetes susceptibility gene detection. Carried out by the American Diabetes Association, this
project included various populations within America and tried to establish the role of
different factors in the etiology of the condition. The four population groups that were
studied were Caucasian whites, Mexican Americans, blacks and Japanese
Americans [28]. The selection criterion for the study was elevation of glucose levels above
the normal limits set by international standards. In most of the samples, families were selected
who had first- or second-degree relatives suffering from the same condition. Studies were
carried out through blood sampling. The study looked into diabetes and impaired glucose
tolerance independently as two areas of research. The method of research selected was a whole
genome polymorphism scan, and the markers selected ranged in number from 389 to 395 [28].
The study revealed the presence of almost 24.4% of pedigree errors among the various
families. Various markers were linked to the various populations under study. For example,
D3S2432 was linked to Mexican Americans, D5S1404 was linked to whites, and D10S1412 was
linked to African Americans. Mixed findings were seen for linkage of GATA172D05 on the X
chromosome in the two white samples taken in the study [28].
Genome wide scanning, or genome mapping, is a significant addition to the genetic
identification of various pathologies in the body. The use of modern platforms such as those of
Illumina and Affymetrix is testimony to the fact that current medical research increasingly
relies on genetic research. Among the diseases that are being extensively
followed is T2D. It is important to understand the genetic basis of this disease, so as to devise
treatments that are able to target the problem. With the increase in the prevalence of diabetes
worldwide, the need for intensive research is now a prime need rather than a research fantasy
[29].
The studies have been able to collect this data on the basis of research done on certain rare
variants of the diabetes condition. As mentioned, the division of the diabetes variants into
categories has been of fundamental importance to gain information about the genetic
predisposition to various pathologies of the disease. In this regard, maturity onset diabetes
of the young (MODY) is the prime disease variant that has helped in identifying many of the
genes and their loci [24]. Although prevalent in only 2% of the population, this variant has been
very helpful in resolving some of the most complicated issues in the disease. The genes that have
been identified in MODY include those that encode hepatocyte nuclear factor 4 alpha,
glucokinase, HNF1 alpha, insulin promoter factor 1, HNF1 beta, and NEUROD1/beta 2
respectively [24].
Positional candidate genes approach
The candidate gene approach has been of much help in identifying the genetic components
of T2D. In fact, the initial and most compelling evidence for
genetic components in diabetes pathology was provided by the candidate gene
approach [29]. Three types of candidate genes are distinguished: functional
candidate genes, positional candidate genes, and experimental candidate genes [14].
Through this approach many genes have been implicated in the etiology of T2D. These
include the peroxisome proliferator-activated receptor gamma, the beta cell
adenosine triphosphate sensitive potassium channel, and the peroxisome proliferator-activated
receptor gamma coactivator-1 alpha [14]. These genes are among the very first to be identified
in the pathology of diabetes [29]. The insulin receptor substrate 1 or the IRS1 decreases
insulin signaling, and is being studied further for its possible contribution in the disease
pathology [24]. This protein is essentially a paralogue that has been linked to cellular
functions related to insulin function. The total number of such paralogues identified so far
amounts to 18, a number expected to increase as more information is obtained [24].
The positional cloning approach has recently gained widespread interest and acceptance in
research. This approach was among the first to identify the role of the Calpain 10
gene in the etiology of diabetes [30].
Whole genome screen approach
Genome scanning has been very effective and helpful in identifying sets of genes that may be
the cause of T2D. This set of genes has been found to be very different from the genetic set
recognized for T1D, since there is no constant set of genes identified for type 2 Diabetes.
Genome wide scans to map T2D susceptibility loci have been conducted in many different
populations. Some of the mapped loci have been observed in multiple populations. Other
regions, however, may be unique to specific populations. This may reflect underlying
phenotypic heterogeneity, racial/ethnic differences in susceptibility allele frequencies, or
differences in sample size, study design, and analytical methods.
The more common loci which have shown genetic abnormalities in T2D include
1q25.3, 2q37.3, 3p24.1, 3q28, 10q26.13, 12q24.31 and 18p11.22 [8].
Genetic studies carried out in T2D are mainly of two types, the association studies as well as
the linkage studies [14]. Whatever the type of study chosen, what is understood is that single
or monogenic causes of diabetes are found in only 5% of the population, where the primary
cause of it is impaired insulin secretion or impaired beta cell mass [31].
Association studies
The association study is also known as the candidate gene approach, in which an association
between gene variants and T2D is sought. This type of research, however, requires multiple
studies to reach definite conclusions, since false positives and false negatives are very likely
to occur [14].
Genome linkage studies have shown increased association of various chromosomal
abnormalities in the various populations with similar chromosomal mutations. These findings
are summarized in Table 1.
The primary problem in these studies has been the lack of association studies that can
specifically identify which gene is responsible for a particular phenotypic trait. This leads
primarily to probability recognition of the genes rather than a confirmed analysis of the genes
that are responsible for causing T2D [8]. Association studies are mainly undertaken to
establish associations between markers and disease loci within a specific group or
population [14]. Physical linkage can be ascertained, but a large amount
of evidence is required. This method is, however, more specific, and because the markers
determine the accuracy with which any association can be made, the number of markers
required for genomic scanning is greater [14].
There is, however, a significant pool of genes that have been associated with T2D. Again, these
findings have been supported by very limited data and follow-up research, which limits the
credibility of these studies. Included in these associations is the PPARγ gene, or
peroxisome proliferator-activated receptor gamma. This gene is mainly involved in
adipocyte development. This gene has been found to be protective against T2D in Finnish as
well as second generation Japanese populations, in whom it can reduce the risk by a considerable
70% [8]. This association is perhaps the most thoroughly researched finding in genetic locus
recognition [8].
ABCC8 is another gene that has been implicated in T2D progression. In hyperinsulinism, this
gene is the prime location of mutation. Within this gene, exon 22 in particular has been
implicated in T2D. Similar to this gene is KCNJ11, which has also been implicated in
T2D. Research has strongly supported the association between these genes and T2D
[8].
Table 1: A summary of LOD scores of loci findings of Type 2 Diabetes in various
populations.
Columns (left to right): Chromosome #, Japanese, African American, European, French, Finnish, Pacific Islander, European American, Ashkenazi Jewish, American Indian, East Asian
1 - 0.27[32] 3.30[8] 1.50[14]
3.00[8] 1.27[33] 2.40[8] - 3.30[8]
4.30[14] - 4.10[8, 14] 8.90[14]
2 - 0.38[32] 2.60[8] 2.30[8] 1.60[33] 1.90[14]
4.10[8, 14] - 2.60[8] 2.20[14] - - 2.10[8]
3 1.40[8, 14] 0.54[32] 2.40[8] 2.97[33] 4.70[14] 3.90[8, 14] 1.10[8, 14] 4.10[8] - 1.80[14] -
4 - 0.82[32] - 1.34[33] - - - 1.30[8] - -
5 - 0.36[32] 2.80[8] 1.52[33] - 2.40[8] - - - -
6 - 2.26[32] 1.80[32] 7.30[8] 7.30[8] 4.10[14] 3.20[14] 7.30[8] - 1.80[14] 6.20[14]
7 - 0.75[32] - 1.32[33] - - - - 2.00[8] -
8 - 0.28[32] 2.60[8, 14] - - - 1.30[8] 3.60[14] - - -
9 - 0.92[32] - 1.30[33] 2.40[8] 3.30[8] - - - 2.90[8]
10 - 0.87[32] 1.90[8] 2.00[14] 1.50[33] 3.80[14] - 2.80[8] - - 2.00[8]
11 3.10[8] 0.30[32] 2.10[8] 3.40[8] 1.34[33] - - 2.10[8] - - -
12 - 0.37[32] - 1.50[8] - 3.60[8, 14] 1.50[8, 14] - - -
13 - 0.08[32] - - - - - - - -
14 - 0.58[32] 2.00[8] - - - 2.00[8] - - -
15 - 0.14[32] - - - - - - - -
16 - 0.81[32] 3.40[8] - - - 3.90[8] - - -
17 - 0.25[32] - - - - - - - -
18 - 0.54[32] 1.10[8] - - 4.20[14] 1.10[8] 2.40[14] - - 1.00[14]
19 - 0.71[32] - 1.20[33] - - - - - -
20 2.30[14] 0.21[32] - 2.70[14] - 2.00[8] 4.80[14] 0.90[8] - 2.90[8]
21 - 0.09[32] - - - - - - - -
22 - 1.33[32] - - - - - - - -
Recent research has pointed to variation in the FTO gene that contributes to both diabetes
and obesity. Having an extra copy of this variant increases the risk of
developing diabetes by more than 50%. This finding points towards the possibility that other
genes contributing to obesity may also be etiological factors in the
pathology of T2D [34].
Linkage studies
These types of studies are mainly testers of specific genes that look for association as well as
linkage. They are preferred as they are able to provide reasonably accurate results in
population stratification. However, the statistical validation has to be of considerable size and
value as to be recognized in the findings, and therefore, any weak links or findings are likely
to be left ignored. Any linkage that is found is able to correlate with the physical attribute of
the condition to the genetic problem. This data is mainly obtained from more than two
individual sources, should any family linkage need to be determined [14].
Linkage studies have not been able to locate and clone genes localised to a particular interval.
This limits the ability to fine map the genes that are responsible for disease. However, this
method requires fewer markers for the genome scan, and is therefore able to work efficiently
with less data [14].
Research has not been able to find linkages to phenotypic presentations of the various
symptoms of T2D. While chromosomes such as chromosome 18 have been strongly
implicated in diabetes pathology, the associated physical attributes remain unknown. Limited
research has shown that this particular locus has been implicated in human obesity. Within this
region the genes MC5R and MC4R have been identified, which have been found to be the strongest
link to diabetes [10].
When considering monogenic causes of insulin dysfunction, many genes have been identified in
MODY, which has been used to study diabetes extensively. The genes identified include
HNF4A, GCK, HNF1A, IPF1, HNF1B and NEUROD1. The corresponding chromosomal
regions, in the same order, are 20q, 7p, 12q, 13q, 17q and 2q [12]. Of
these the first three have been identified via linkage studies.
Identifying genes contributing to Diabetes
The identification of genes involved in glucose metabolism and regulation is the key area
from which any diabetes genome related research begins. The introduction of genome wide
scans has made it possible to open new lines of research in this regard. Age of
onset of the condition has been linked to many loci, which
include regions on chromosomes 1qter, 4p15-4q12, 5p15, 12p13-12q13, 12q24 and 14q12-14q21
respectively. These loci have been implicated in other research as well [26].
The chromosome 1 is perhaps one of the most discussed areas that is responsible for diabetes
type II pathology. There are also many candidate genes that have been identified in this
particular area, and these include potassium inwardly rectifying channel, subfamily J,
members 9 and 10, liver specific pyruvate kinase, C reactive protein, lamin A/C, and omentin
[35].
Another region of genetic mutations was found on chromosome 12, which has been strongly
linked to age of onset. The locus is 12p13-12q13. In the 12q24 region, linkage
evidence has been found between D12S324 and D12S1659. The problem with these
studies, however, is that the specific gene locus and significant linkage analysis are still to
be determined. While the above genes have been strongly implicated in the age of onset of
diabetes, the role of obesity and its genetic predetermination remains to be established [26].
SNPs have also been recognized as chief players in the pathology of diabetes. A
significant number of these are found in the CASQ1 gene, also known as the calsequestrin 1
gene. Its calcium regulation activity within the sarcoplasmic reticulum allows it to regulate the
GLUT4 expression within the cell [35]. The expression of this particular protein is increased
in animal models during diabetes, which proposes its role in glucose uptake and glycogen
synthesis, leading to higher risk of T2D [35].
The genes that lead to T2D have to be identified properly, as many are still implicated. For the
most part, researchers claim that etiological factors are also important contributors in the
development of diabetes [10]. This percentage has been identified to constitute 10% of the
diabetic population [10].
Previous studies
Animal studies
Animal models have been very helpful in identifying a significant number of genes and
markers, which are causative of insulin secretion and insulin related disorders. It is well
established that reduced insulin secretion and beta cell mass is responsible for diabetes [31].
For example, some of the early research on the activin proteins was carried out in
animal models. Activins have long been known for their role in axial determination in
embryos and in sonic hedgehog expression in embryonic chicks. Of interest, these proteins
are especially involved in the expression, stimulation and secretion of insulin within rat
models, through actions on the calcium and ATP potassium channels respectively [36].
In addition, many receptors that are involved in the functioning of activin have been identified
through the same animal models. These receptors include ACVR1, ACVR2 and
ACVR2B. Of these, ACVR2 a/b have both been associated with abnormalities in
pancreatic function, which lead to problems in rats such as impaired glucose
tolerance or hypoplastic pancreatic cells [36]. These findings have been largely responsible for
further research on these proteins regarding their role in the pathology of
diabetes.
The IPF1 gene encodes a transcription factor that not only regulates the development of the
pancreas but also regulates expression of the insulin gene. Other genes that it regulates
include GCK, IAPP and SLC2A2. Animal models have shown that the absence of
IPF1 in mice leads to absence of the pancreas.
Human studies
Studies have shown that CAPN10 sequence mutations play a significant role
in the pathology of T2D [8]. The calpain group of proteins is made up of calcium-activated
proteases, which activate or inactivate intracellular signaling, proliferation or differentiation.
They are also considered chief players in insulin signaling and secretion [14].
Variations and haplotype combinations in different locations have been associated with the
disease, however, there are very few studies that have been able to reproduce or carry out
researches on the same lines. While the statistical data produced in the initial study was
significant, the lack of further research in this regard leaves it to be determined with more
consistency [8]. The gene has shown a 2.8 fold increase in the risk of developing T2D in the
affected patient. The African American and Mexican American studies in this regard also
point towards the high probability rates of T2D development [14, 37].
The genes found to be responsible for T2D have been mainly known by the association and
linkage studies carried out. Genome wide linkages have shown the presence of mutations in
1q42.2, 2p21, 2q24.3, 4q34.1, 5q13.3, 5q31.1, 7q32.3, 9p24.2, 9q21.12, 10p14, 11p13, 11q13-
14, 12q15, 14q23, 20p12.3 and Xq23 respectively. These linkages have been found in the
Finnish Caucasians, the French Caucasians, the Australian aborigines, the American
Caucasians, Pima Indians, Mexican Americans, African Americans, and the Japanese
populations [14].
Repeated linkage studies place susceptibility genes for T2D in the region of
chromosome 1q21-q25 [14]. This region is considered an important contributor to
diabetes pathology. It alone encodes the insulin
receptor related receptor (INSRR), hepatic pyruvate kinase (PKLR), lamin
A/C (LMNA) and apolipoprotein A2 [14].
The CAPN10 association, however, has been demonstrated in very few populations at the
moment. One of these is the Mexican American population;
proof in other populations has been very limited [30].
Research by Parker et al (2001) has implicated chromosome 18 in T2D
pathology, which replicates the results of earlier research in the same area. A strong
association was also found with the glucagon receptor gene (GCGR) on
17q25 [10]. Other researchers have shown associations with chromosomes 12q and
20 [10].
Some of the well-established associations in the T2D include TCF7L2, PPARG and KCNJ11
[11]. Another cluster variant is in the IGF2BP2 region or the insulin like growth factor 2
mRNA binding protein [11]. The TCF7L2 region in particular has been identified as one of the
most important identifications of the genetic components. The identification of this gene was
helpful in proving that “a non-candidate gene or region based association effort could work”
[29]. It was also significant for proving that diabetes pathology may be present in unexpected
places in the genomic sequence, and therefore, the process is currently underway for
discovering the genes that are responsible for it [29].
The upstream transcription factor 1 (USF1) has been reported
to be involved in glucose and lipid metabolism [38]. The gene encoding this protein is located on
chromosome 1q22-23, a region widely known to be involved in diabetes progression
and pathology. This is perhaps one of the best supported proteins and chromosomal regions
involved in the pathogenesis of diabetes. However, its expression may vary from one population
to the other [38].
Many of these studies include families, owing to the very strong familial
component found in the pathology of the disease. Siblings and offspring are very likely to
suffer from very high rates of diabetes because of their genetic affiliation, and the more the
prevalence increases within the family, the greater the aggregation of the genes responsible
for it. Twin models have readily demonstrated such affiliations, and both monozygotic and
dizygotic twin models have shown the presence of genetic similarity in the pathology of
diabetes [12].
Figure 3 summarises the timeline of the discovery of genes that predispose to T2D in the past
decade.
Genome wide scans of different populations
Genome wide scans to map T2D susceptibility loci have been conducted in many different
populations through linkage analysis (Table 1) and association analysis (Table 2). Some of
the mapped loci have been observed in multiple populations. Other regions, however, may be
unique to specific populations. This may reflect underlying phenotypic heterogeneity,
racial/ethnic differences in susceptibility allele frequencies, or differences in sample size,
study design, and analytical methods.
Figure 3: Timeline of discovered genes which are associated to Type 2 Diabetes for the
past decade.
Table 2: Thirty-nine genes at fifty-two loci associated with Type 2 Diabetes in previous
genome-wide association studies among different populations, with their p values.
Gene SNP Region p value by study: [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] [11] [52]
Population of each study, in the same order: FR JP FR EU JP JP EU ICLD EU EU EU AMI ICLD FI FR
TCF7L2 rs7903146 10q25.2 1.E-30 8.E-12 - 9.E-30 - - 3.E-23 - 5.E-08 - - - 2.E-10 1.E-08 2.E-34
TCF7L2 rs4506565 10q25.2 - - - 6.E-16 - - - - - 5.E-12 - - - -
TCF7L2 rs7901695 10q25.2 - - - - - - - - - - 1.E-48 - - - -
CDKAL1 rs4712523 6p22.3 2.E-12 7.E-20 - - - - - - - - - - - - -
CDKAL1 rs4712524 6p22.3 - - - - 3.E-10 - - - - - - - - - -
CDKAL1 rs6931514 6p22.3 - - - - - - 1.E-11 - - - - - - - -
CDKAL1 rs9465871 6p22.3 - - - - - - - - - - 3.E-07 - - - -
CDKAL1 rs7754840 6p22.3 - - - - - - - - - - - - - 4.E-11 -
CDKAL1 rs10946398 6p22.3 - - - - - - - - - - 1.E-08 - - - -
CDKAL1 rs7756992 6p22.3 - - - - - - - - - - - - 8.E-09 - -
SLC30A8 rs13266634 8q24.11 8.E-08 2.E-14 - 7.E-06 - - - - - - 5.E-08 - 3.E-06 5.E-08 6.E-08
HHEX rs1111875 10q23.33 - 7.E-12 - - - - - - - - - - - 6.E-10 3.E-06
HHEX rs5015480 10q23.33 - - - - - - 7.E-08 - - - 5.E-06 - - - -
KCNQ1 rs2237892 11p15.5 - 1.E-26 - - - 2.E-42 - - - - - - - - -
KCNQ1 rs2237897 11p15.5 - - - - 1.E-16 - - - - - - - - - -
FTO rs8050136 16q12.2 - - - 2.E-17 - - 7.E-06 - - - 7.E-14 - - 1.E-12 -
FTO rs9939609 16q12.2 - - - - - - - - - - 2.E-07 - - - -
KCNJ11 rs5219 11p15.1 - - - 1.E-09 - - - - - - - - - 7.E-11 -
KCNJ11 rs5215 11p15.1 - - - 5.E-07 - - 4.E-07 - - - 5.E-11 - - - -
LOC64673/IRS1 rs2943641 2q36.3 9.E-12 - - - - - - - - - - - - - -
WFS1/PPP2R2C rs4689388 4p16.1 1.E-08 - - - - - - - - - - - - - -
LOC72901/CETN3 rs12518099 5q14.3 7.E-07 - - - - - - - - - - - - - -
CDKN2A/CDKN2B rs564398 9p21.3 - - - - - - - - - - 1.E-06 - - - -
CDKN2A/CDKN2B rs10811661 9p21.3 - - - 7.E-07 - - - - - - 5.E-06 - - - -
CDKN2A/CDKN2B rs7020996 9p21.3 - - - - - - 2.E-07 - - - - - - - -
CDKN2A/CDKN2B rs2383208 9p21.3 - 2.E-29 - - - - - - - - - - - - -
PPARG rs1801282 3p25.2 - - - - - - - - - - 2.E-06 - - - -
IGF2BP2 rs4402960 3q27.2 - 1.E-06 - - - - 8.E-08 - - - 9.E-16 - - 9.E-16 -
IGF2BP2 rs6769511 3q27.2 - - - - 1.E-09 - - - - - - - - - -
MTNR1B rs1387153 11q21 - - 2.E-36 - - - - - - - - - - - -
CDKAL1 rs10946398 6p22.3 - - - 7.E-07 - - - - - - - - - - -
JAZF1 rs864745 7p15.1 - - - - - - 5.E-14 - - - - - - - -
CDC123/CAMK1D rs12779790 10p13 - - - - - - 1.E-10 - - - - - - 8.E-15 -
TSPAN8/LGR5 rs7961581 12q21.1 - - - - - - 1.E-09 - - - - - - - -
THADA rs7578597 2p21 - - - - - - 1.E-09 - - - - - - - -
ADAMTS9 rs4607103 3p14.1 - - - - - 1.E-08 - - - - - - - -
NOTCH2/ADAM30 rs10923931 1p12 - - - - - - 4.E-08 - - - - - - - -
DCD rs1153188 12q13.2 - - - - - - 2.E-07 - - - - - - - -
SYN2/PPARG rs17036101 3p25.2 - - - - - - 2.E-07 - - - - - - - -
VEGFA rs9472138 6p21.1 - - - - - - 4.E-06 - - - - - - - -
TCF2 rs4430796 17q12 - - - - - - - 1.E-11 - - - - - - -
CETP rs1800775 16q13 - - - - - - - - - 3.E-13 3.E-6 - - - - -
APOE cluster rs4420638 19q13.32 - - - - - - - - - 3.E-13 - - - - -
LPL rs328 8p21.3 - - - - - - - - - 5.E-07 - - - - -
APOB rs693 2p24.1 - - - - - - - - - 7.E-07 - - - - -
PVT1 rs2648875 8q24.21 - - - - - - - - - - - 2.E-06 - - -
NR rs358806 - - - - - - - - - - - 3.E-06 - - - -
NR rs12304921 - - - - - - - - - - - 7.E-06 - - - -
NR rs1495377 - - - - - - - - - - - 7.E-06 - - - -
NR rs7659604 - - - - - - - - - - - 9.E-06 - - - -
Intergenic rs1859962 17q24.3 - - - - - - - 3.E-10 - - - - - - -
Intergenic rs6712932 2q12.1 - - - - - - - - 6.E-06 - - - - - -
* Population abbreviations: FR=French; JP=Japanese; EU=European; AFA=African-American; ICLD=Iceland; AMI=American Indian; FI=Finnish.
Asia
The Asian population is a large pool of ethnic communities that have lived either in isolation
or alongside one another. The result is extensive genetic admixture across these populations,
with a few pockets of communities that have preserved their original genetic structure. The
challenge is the number of genome scans that must be carried out, population by population, to
characterise each genetic background correctly. A brief introduction to this diverse gene pool
follows.
Chinese population
The Chinese population has some of the highest numbers of people with diabetes in the world.
A further complication is that it comprises a very large number of ethnic groups, each of which
may carry its own genetic variants. This makes generalization across China very difficult, and
each group must be studied to understand the true extent of the disease in the country. In
genome scans, this variation appears as heterogeneity in the chromosomal regions linked to
diabetes in this population [53].
Studies show that the incidence of diabetes has risen more than tenfold and, with the growing
population of China, an ever-increasing number of people suffer from diabetes or
diabetes-related complications [53].
The largest Chinese ethnic group is the Han, which constitutes the biggest pool in the Chinese
population. The Han, as well as many other Chinese ethnic groups, show variation on chromosome
1 at the locus D1S1589, which primarily influences the age of onset of the disease. Other
chromosomes associated with the age of onset of diabetes include 6, 12 and 16. Linkage was
also found for regions on 6q and 1q. The Mendelian laws of inheritance also apply to this
disease [53]. The chromosomal regions implicated in T2D in the Chinese population include
1q25.3, 2q37.3, 6q22, 18p11.22 and 20q13.1 [14].
The most significant findings in the Chinese population are on chromosome 6. LOD scores on
this chromosome are as high as 6.2, and the linkage maps to the region 6q21-q23 [14]. This
chromosome is also associated with low fasting glucose levels, a finding shared with the
Finnish, African American, Mexican American and Pima Indian populations. The region has been
implicated in multiple genetic findings and is therefore considered a prime region in the
etiology of T2D [14].
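To make the LOD scores quoted in this section easier to interpret, the following is a minimal
Python sketch of a two-point LOD calculation. It assumes phase-known, fully informative
meioses, which is a textbook simplification and not necessarily the method used in the cited
studies [14]; the function names and the family counts are hypothetical.

```python
import math


def lod_two_point(n_meioses, n_recombinants, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10(L(theta) / L(0.5)), where
    L(theta) = theta**k * (1 - theta)**(n - k) for k recombinants in n meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    n, k = n_meioses, n_recombinants
    likelihood_linked = (theta ** k) * ((1.0 - theta) ** (n - k))
    likelihood_unlinked = 0.5 ** n
    if likelihood_linked == 0.0:
        # Observed recombinants are impossible at theta = 0.
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


def odds_from_lod(lod):
    """Convert a LOD score into odds in favour of linkage (10**LOD to 1)."""
    return 10.0 ** lod


if __name__ == "__main__":
    # Hypothetical family data: 20 informative meioses, 2 recombinants.
    for theta in (0.05, 0.1, 0.2, 0.3):
        print(theta, round(lod_two_point(20, 2, theta), 2))
    # The LOD of 6.2 quoted in the text, expressed as odds.
    print(f"{odds_from_lod(6.2):,.0f} to 1")
```

Under this definition, a LOD score of 6.2 corresponds to odds of roughly 1.6 million to 1 in
favour of linkage.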
Japanese population
The chromosomal regions linked to T2D in the Japanese population include 3q28 and 20q13.1
[14]. Linkage has also been found at 11p13, which is supported by findings in American
Caucasians [14]. Linkage at Xq23, originally identified in American Caucasians, has likewise
been observed in the Japanese population [14].
Chromosome 3 is another region with a high rate of findings related to the pathology of T2D,
on both the long and the short arms. Such linkages have been demonstrated in several
populations, including French Caucasians and Australians. Similar studies found linkage in
the Pima Indians in the region of the GLUT2 gene at 3q26.1 and at the marker D3S1292 [14].
LOD scores as high as 8.91 were found near the marker D1S2815 in the linkage studies
performed [14]. The Hong Kong Chinese have shown susceptibility loci on chromosome
1q21-q25 [14].
Homozygosity for the T54 allele in the Japanese population has been found to be associated
with higher basal and 2-hour insulin levels than other genotypes [14].
Chromosome 11 linkage has been found at the marker D11S935, located at 11p13. Similar
findings were seen in the Finnish population at 11q13 [14].
Notably, studies of MODY in the Japanese population have revealed two more genes responsible
for familial diabetes: a nonsense mutation, Q310X, in the MAPK8IP1 gene, and a missense
mutation, E1506K, in the ABCC8 gene [12].
The objective of the Takeuchi study was to detect new T2D gene variants and to substantiate
previously detected variants through a three-stage GWAS in the Japanese population [40].
In the first stage, 519 case and 503 control individuals were genotyped for 482,625 SNPs. The
Cochran-Armitage trend test was used to test the association between T2D and genetic
variants.
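The Cochran-Armitage trend test mentioned above can be sketched as follows. This is an
illustrative implementation with additive weights (0, 1, 2) for the number of risk alleles,
not the code used in the Takeuchi study [40]; the genotype counts are invented for the
example.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Return (z, two-sided p) for a linear trend in case proportion.

    `cases` and `controls` hold genotype counts ordered by the number of
    risk alleles carried (0, 1, 2).
    """
    n_cases = sum(cases)
    n_controls = sum(controls)
    n_total = n_cases + n_controls
    col_totals = [r + s for r, s in zip(cases, controls)]

    # T = sum_i w_i * (r_i * S - s_i * R), with R cases and S controls in total.
    t_stat = sum(w * (r * n_controls - s * n_cases)
                 for w, r, s in zip(weights, cases, controls))

    # Variance of T under the null hypothesis of no association.
    sum_sq = sum(w * w * c * (n_total - c) for w, c in zip(weights, col_totals))
    cross = sum(weights[i] * weights[j] * col_totals[i] * col_totals[j]
                for i in range(len(weights))
                for j in range(i + 1, len(weights)))
    variance = (n_cases * n_controls / n_total) * (sum_sq - 2 * cross)

    z = t_stat / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


if __name__ == "__main__":
    # Invented counts for genotypes carrying 0, 1 and 2 risk alleles.
    z, p = cochran_armitage_trend(cases=(120, 250, 149), controls=(160, 245, 98))
    print(f"z = {z:.2f}, p = {p:.2e}")
```

A positive z indicates that the proportion of cases rises with the number of risk alleles.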
In the second stage, 1,456 SNPs were genotyped using iPLEX (Sequenom) and GoldenGate
(Illumina) assays. Under the p-value criterion (p < 7 × 10^-5), 30 SNPs representing 17
unique loci were selected as significant.
In addition to the GWAS, a further objective was to replicate the T2D association of 17 SNPs
from 16 candidate loci detected earlier in Europeans.
The third stage comprised replication and estimation of the associations in 4,000 case
subjects and 12,569 population-based sample subjects. An association was considered
significant only if it involved the same risk allele as in the other two stages, and it was
then evaluated with a one-tailed test. A meta-analysis combined the results of two or all
three stages with those of past Japanese studies carried out by three other groups. The
squared correlation coefficient (R²) between a SNP, coded by the number of risk alleles, and
disease status was used to compare the explained sum of squares between the Japanese and
European populations.
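As an illustration of the R² measure described above, the sketch below computes the squared
Pearson correlation between risk-allele counts and a 0/1 disease status. The function name
and the toy data are hypothetical and are not taken from [40].

```python
def squared_correlation(risk_allele_counts, disease_status):
    """Squared Pearson correlation between allele counts (0/1/2) and status (0/1)."""
    n = len(risk_allele_counts)
    if n != len(disease_status) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(risk_allele_counts) / n
    mean_y = sum(disease_status) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(risk_allele_counts, disease_status))
    sxx = sum((x - mean_x) ** 2 for x in risk_allele_counts)
    syy = sum((y - mean_y) ** 2 for y in disease_status)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Toy data: six individuals, coded by risk-allele count and case (1) / control (0).
    genotypes = [0, 1, 2, 0, 1, 2]
    status = [0, 0, 1, 0, 1, 1]
    print(round(squared_correlation(genotypes, status), 3))
```

Because R² is unit-free, it allows the variance explained by a SNP to be compared across
populations with different allele frequencies.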
The results identified four loci, one new and three previously detected. There was
considerable overlap of T2D susceptibility genes between the Japanese and European
populations, although effect sizes and explained variance tended to be higher in the Japanese
population, and the associations were stronger in the Japanese than in the European sample.
Although the study could not establish whether the penetrance of a given genotype of interest
differs considerably between people of Japanese and European descent, 4 of the 7 confirmed
loci showed a considerably higher odds ratio in the Japanese population [40].
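The odds ratios compared above can be illustrated with a small allelic odds ratio
calculation. The sketch below assumes a simple 2 × 2 table of allele counts and a Woolf
(log-based) 95% confidence interval; the study [40] may have estimated its odds ratios
differently, and the counts are invented.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Return (odds ratio, lower 95% limit, upper 95% limit) from allele counts."""
    counts = (case_risk, case_other, control_risk, control_other)
    if min(counts) <= 0:
        raise ValueError("all four counts must be positive")
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Woolf's method: standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Invented allele counts: risk / other allele in cases, then in controls.
    or_, lo, hi = allelic_odds_ratio(460, 578, 380, 626)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An interval that excludes 1 suggests that the risk allele is more common in cases than in
controls.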
Another GWAS was conducted by Unoki and colleagues to detect genetic variants that increase
the risk of type 2 diabetes in the Japanese population [43].
In this study, 268,068 SNPs were genotyped in 194 Japanese subjects with T2D and diabetic
retinopathy and in 1,558 unrelated control subjects. These SNPs covered about 56% of common
Japanese SNPs, and the 207,097 that were successfully genotyped were retained. The 8,323
SNPs with the lowest p values were then genotyped in 1,367 T2D cases and 1,266 controls,
which yielded 6,731 successfully genotyped SNPs for further analysis. Nine SNP loci with p
values below 0.0001 were selected and genotyped in a third Japanese cohort of 3,557 T2D
cases and 1,352 controls. All three cohorts were combined in a subsequent case-control
analysis, which detected six SNPs strongly associated with T2D. Of these, the CDKAL1 and
IGF2BP2 loci had been detected earlier, and three others had p values greater than 0.056 and
were therefore excluded at the third test. The only remaining locus was KCNQ1 (rs2283228),
which was examined further in a number of case-control studies. In addition to the Japanese
population, the analysis was performed on Singaporean and Danish populations to confirm the
association of KCNQ1 with T2D risk.
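The staged design described above, in which only the SNPs with the lowest p values from one
stage are carried forward to the next, can be sketched as follows. The SNP identifiers, p
values and function names are hypothetical; only the selection logic is illustrated.

```python
import heapq


def select_lowest_p(p_values, k):
    """Return the IDs of the k SNPs with the smallest p values, most significant first."""
    ranked = heapq.nsmallest(k, p_values.items(), key=lambda item: item[1])
    return [snp_id for snp_id, _ in ranked]


def below_threshold(p_values, threshold):
    """Return the IDs of SNPs whose p value is strictly below the threshold."""
    return sorted(snp_id for snp_id, p in p_values.items() if p < threshold)


if __name__ == "__main__":
    # Hypothetical stage 1 results (placeholder identifiers, invented p values).
    stage1 = {"snp_a": 0.20, "snp_b": 0.0004, "snp_c": 0.03, "snp_d": 0.00002}
    carried_forward = select_lowest_p(stage1, k=2)
    print(carried_forward)

    # Hypothetical stage 2 results for the SNPs carried forward.
    stage2 = {"snp_b": 0.00005, "snp_d": 0.01}
    print(below_threshold(stage2, threshold=0.0001))
```

Ranking by p value fixes the genotyping cost of the next stage in advance, whereas a fixed
threshold lets the number of SNPs carried forward vary with the data.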
A significant interaction between rs2237897 and rs234844 was detected using stepwise
logistic regression analysis. The results of the Singaporean and Danish studies showed that
rs2237895 and rs2237897 are strongly associated with T2D in populations of East Asian and
European descent, respectively.
KCNQ1 had not been detected in earlier studies, and this was the first attempt to verify its
association with T2D. However, it remains possible that the neighbouring CDKN1C gene harbours
the actual causal variant, which calls for further studies [43].
Indian population
The Indian subcontinent suffers from an acute lack of genome-scanning research in almost all
areas of medicine. The most significant data available are therefore epidemiological. These,
too, are strongly affected by the lack of local health care systems, which leaves a large
proportion of society unable to reach medical help. Much of the population cannot afford
treatment or even a diagnosis of their condition, let alone a genetic scan. With the
population of the region constantly increasing, an exponential increase in the number of
diabetic patients is expected as well [31]. The samples studied are therefore not
representative of all sections of society and may be confined to the economically better-off.
The countries that make up this population include Pakistan, India, Bangladesh and Sri
Lanka.
To date, some of the most significant research in this regard has been carried out in India.
The findings showed a very high prevalence of diabetes among the samples collected, with
more than 12 percent of the subjects suffering from diabetes or insulin resistance. Gender
differences were minimal, and the mean age of onset was 40 years. Epidemiological factors
such as diet, lifestyle, lack of physical activity and BMI were important contributors to
the disease [31].
Scans have shown loci for T2D on chromosomes 1q21, 2q, 3, 5, 11q, 12q and 20q [31]. The most
frequently reported region is 1q21-24, which has also been reported in other populations,
including Utah Caucasians, Pima Indians, English, French, Amish and Chinese [31]. The
evidence for linkage, however, remains to be determined.
Susceptibility genes that may contribute to diabetes in these patients include PPARG,
KCNJ11, CAPN10 and the HNF alpha genes. CAPN10 variants were significant in the Hispanic as
well as the Finnish populations [31]. A study by Sanghera et al. in 2007 implicated some
further genes in the pathology of T2D: IGF2BP2, cyclin-dependent kinase 5, a zinc
transporter protein, CDKN2A, HHEX, TCF7L2, KCNJ11 and FTO [54].
Other chromosomal regions reported in studies of the Indian population include loci at 3q22,
1q44, 8q23 and 2q37 [31]. Further findings include 16q12 and 19p13.3. The findings vary from
one part of the Indian population to another, mainly because of the large number of
different ethnic and racial populations living in the same region. Genetic scans in the
Indian region must therefore be designed around the locality and the type of population
included in the study [31].
Along the same lines, Sanghera et al. found a strong association for PPARG2, IGF2BP2, TCF7L2
and FTO. The study confirmed the role of genes that had previously been implicated in
diabetes pathology. By comparing its results with those of studies on Caucasian populations,
it showed that these genes are also present in the Asian gene pool [54].
North America
North America is perhaps one of the regions most affected by diabetes. Although there have
been considerable advances in improving the quality of life of diabetic patients, their
number has nevertheless increased significantly. The primary reasons are the ageing of the
baby boomer population, longer life expectancy than in the past, and better health
facilities and awareness regarding the care of such patients. This is increasing the number
of people who will require health care for diabetic complications. More importantly, the
complications of diabetes have started to rise in this region as well. There are now more
cases than ever of younger age groups presenting with renal complications, amputations or
blindness as a direct consequence of diabetes [8, 28]. This number continues to rise and,
given the current challenges facing the US health care system, caring for such patients will
be more challenging than ever [8].
Pima Indian population
A genome-wide association study has been carried out on the Pima Indian population to
examine linkage with T2D. An example of a finding repeated across populations is the region
1q25.3, which is seen in both Pima Indians and US Caucasians. The same finding was
replicated in the French and UK Caucasian populations. The marker found in the Pima
population is D1S2127 [8]. Other populations in which 1q25.3 has been reported are the
French Caucasians, the UK Caucasians, the Amish, the Chinese and the Framingham cohort [14].
Pima Indians have also shown linkage at 3q28, with the marker D3S1580, and at 6q22, with the
marker D6S1040 [14].
Studies on the Pima Indian population also identified a role for CAPN10 in the genetic
etiology of T2D. Low calpain 10 mRNA levels were found in patients homozygous for the G
allele of UCSNP-43. Insulin resistance was also demonstrated, as in the Finnish population
[14].
Pima Indians with chromosome 4 mutations, especially the missense mutation A54T, have shown
“greater insulin resistance and higher rates of fat oxidation compared with homozygous
normal controls” [14].
Another study was carried out to detect genetic variants associated with the onset of
diabetes at a young age [50]. The study population suffers from a high occurrence of obesity
and type 2 diabetes.
The study included 300 individuals with T2D and an age of onset of 25 years or less, and 334
control subjects without diabetes aged 45 years or more. In addition, 121 non-diabetic
siblings of the diabetic subjects and 140 diabetic siblings of the 334 control subjects were
included to test for association within families (case-control approach). Genotyping on the
Affymetrix 100K array yielded 80,044 usable SNPs, which were tested for both general and
within-family association in the case and control samples. SNPlex was used to genotype
individuals in the population-based follow-up studies.
The study shows that early onset of diabetes is influenced to a great extent by genetic
determinants. It indicates that there are a number of regions where marker alleles are in
strong linkage disequilibrium with variants that confer susceptibility to early-onset
diabetes mellitus.
Genome-wide mapping analyses are only an initial step in identifying susceptibility
variants. Although the analyses pointed to several regions that may hold genetic variants
affecting susceptibility to early-onset diabetes in American Indians, fine-mapping of these
regions is needed to narrow the signals down to specific genes. Confirming the function of
genes in the identified regions would require replication in other populations as well as
functional analysis [50].
Amish population
The Amish are perhaps one of the most genetically isolated populations in which to study
genetic linkage to disease. The population has a fairly homogeneous lifestyle and preserves
extensive family histories, and hence offers good ground for genetic analysis. Research
based on older adults in this population, or on adults from past generations, ensured a
supply of genetic material that was not influenced by other populations. The genetic
findings for this population can therefore be considered reliable and may help to identify
some of the main genes involved in diabetes pathology in the Amish [35]. The Amish were one
of the first populations in which the role of chromosome 1 in T2D pathology was
demonstrated. Multiple polymorphisms have been found by sequencing exons in the region [35].
Chromosome 1 linkage and mutations related to T2D have also been reported in the Amish [14].
The main complication, however, is the number of plausible candidate genes: at least 450
genes in this region alone have been documented as possibly involved in the etiology of T2D
[35]. Patients were found to have impaired glucose homeostasis, with the linkage peak at the
marker D1S2715. This location is very close to the linkage regions found in the Utah Mormon
and French populations [14]. Other studies point to five populations with chromosome 1
findings related to T2D: Pima Indians, Utah Caucasians, French Caucasians, UK Caucasians and
the Chinese [35].
Rampersaud and colleagues carried out a genome-wide association scan (GWAS) on the Old Order
Amish, a population descended from Swiss immigrants, to identify T2D susceptibility genes
[55].
The study included 124 T2D subjects identified by the Amish Family Diabetes Study (AFDS) and
295 control subjects with normal glucose tolerance. Their DNA was genotyped on the
Affymetrix 100K SNP array. In total, 82,485 SNPs that passed quality control and
Hardy-Weinberg equilibrium tests were examined for association with T2D. The SNPs associated
with T2D were then prioritized according to their association with five oral glucose
tolerance test traits in 427 non-diabetic Amish individuals. This secondary quantitative
analysis retained the highly significant (p < 0.01) associations in the 427 non-diabetic
participants. The resulting SNPs were carried forward for in silico replication in three
independent 100K SNP GWASs, in FHS Caucasians, Pima Indians and Mexican Americans, and in a
500K GWAS in Scandinavians.
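A Hardy-Weinberg equilibrium check of the kind used as a quality filter above can be
sketched as a one-degree-of-freedom chi-square test on genotype counts. This is a generic
illustration with invented counts, and the significance threshold applied in [55] is not
stated here.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi-square, p value) for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # A monomorphic SNP cannot deviate from Hardy-Weinberg proportions.
        return 0.0, 1.0
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Invented genotype counts (AA, AB, BB) for a single SNP in controls.
    chi2, p_value = hwe_chi_square(180, 95, 20)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

SNPs with a very small p value in controls are commonly excluded, since a strong departure
from Hardy-Weinberg proportions often points to genotyping error.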
The results showed that 80 SNPs were nominally associated with T2D in one of the three
independent 100K GWASs; three SNPs, namely rs2540317 in MFSD9, rs10515353 on chromosome 5
and rs2242400 in BCAT1, were associated with T2D in one or more populations; and 11 SNPs
showed an association with T2D among the Scandinavians. The strongest T2D association in the
Amish was detected on chromosome 7 in GRB10, a functionally relevant T2D candidate gene
[55].
African American population
The African American population suffers extensively from T2D. In the USA, the rate of
disease is exceptionally high compared with other groups residing in the country [32]. Part
of the problem may lie in the lack of adequate health care services provided to the African
American community. The incidence and the rates of complications have been found to be very
high in this group, and the age of onset early. This is of major concern because, as the
population of this community grows, the number of diagnosed cases will increase as well. As
many patients have early-onset diabetes, the likelihood of developing complications is also
raised. Combined with other chronic conditions such as heart disease and high blood
pressure, this can lead to a very complex pathology requiring extensive treatment.
The African American community displays a mix of genetic and environmental factors that may
contribute to the progression of the disease. Genetic risk has been found to be as much as
2.9-fold higher in families where the disease is prevalent than in families without affected
members [32]. This figure is, however, only approximate, since very few studies have
examined the prevalence of diabetes and diabetes-related complications in the African
American community. Studies aimed at identifying the genetic loci responsible for the
disease are even rarer. The actual number of people suffering from diabetes may therefore be
much higher than anticipated [32].
Phenotypic data collected so far show that the age of onset in this group is much younger.
The mean age at onset was found to be 41 years, and it may fall further because of the
sedentary lifestyle in this community. A person with the condition therefore lives with it
for around 16 years on average [32]. Studies have also shown that the patients are
predominantly female, with obesity, poor blood sugar control and early onset of the
condition. The genetic findings in these patients showed two main regions of single-locus
linkage: chromosome 6 at 163.5 cM and chromosome 22 at 32 cM. Possible candidate genes for
T2D here include estrogen receptor 1, tubby superfamily protein, insulin-like growth factor
2 receptor, mitogen-activated protein kinase kinase 4 and manganese superoxide dismutase.
Multilocus linkage was found in regions 6q, 7p and 18q. These loci have been associated with
early onset of the condition, as well as with low BMI [32].
The findings also show that some genetic variants are shared with other populations around
the world. Linkage peaks at 6q24-q27 have been found in Pima Indians as well as Han Chinese
[32].
Mexican American population
The Mexican American population has shown many overlapping regions linked to T2D. These are
located at 2q37.3, 3p24.1 and 10q26.13. The same regions have also been seen in American
Caucasians, Chinese, Finnish Caucasians and UK Caucasians [8, 14]. Indo-Mauritians also show
linkage at 2q37.3, with the marker D2S125. The 3p24.1 region is also found in Finnish
Caucasians, and 10q26.13 in the UK Caucasian population [14].
Linkage to the 2q37 region was found to be especially strong in the Mexican American
population when interactions with loci on chromosome 15 were examined [14].
Chromosome 3 findings are also seen in this population. The linkage was again at 3p24.1, as
mentioned above, with a LOD score of 3.91. Very similar findings have been reported in the
Finnish population [14].
Recent studies have found strong evidence for the region 2q37 in non-insulin-dependent
diabetes mellitus, and for calpain 10 in type 2 diabetes mellitus. SNPs 43 and 44 also
contribute to the etiology of the condition [37].
Europe
The European populations, especially that of the UK, are highly varied given the range of
populations living there. The influx of many new populations has added to the gene pool,
which makes this region genetically very diverse. Findings for the UK therefore depend
largely on which populations were included in a study, and the results reflect the
prevalence of diabetes in those populations only. A similar disparity in prevalence is seen
in America; in both regions a much smaller percentage of Caucasians suffer from diabetes
than of communities such as African Americans, who have three- to six-fold higher rates of
diabetes [12].
Many mutations and linkages related to type 2 diabetes mellitus have been found in the main
UK Caucasian population. Linkage has been found at 5q13.3 and 5q31.1, mirroring findings in
American and Finnish Caucasians [14].
Chromosome 8 has also been studied in European populations, with findings in the 8p23
region. These findings have also been demonstrated in American Caucasians, in whom the
candidate gene is PPP1R3B [14]. This gene has also been implicated in the Pima Indian
population, as well as in other populations such as the Japanese and aboriginal Canadians,
and has been associated with increased insulin resistance [14].
Dutch population
Very little research has been done on the Dutch population so far. Genomic scans are
limited, and more research is needed. The 2006 study by Einarsdottir carried out a
genome-wide scan of 59 families. Linkage analysis found strong linkage to chromosomes 2, 3,
7, 11 and 12 [56]. The study confirmed the role of the CAPN10 gene in the risk of T2D, in
support of previous studies in which CAPN10 variants were found in isolated Dutch families
[56].
The study by Rasmussen mirrors some of the findings of Einarsdottir and also points to the
potentially important role of the CAPN10 gene in the pathology of diabetes [57]. Haplotypes
of the gene that may contribute to the genetic pathology of diabetes are still being
discovered. The three polymorphisms identified so far are UCSNP43, UCSNP19 and UCSNP63.
These polymorphisms were first identified in the Mexican American population and later
confirmed by research on the African American population. The research by Rasmussen also
confirmed a strong association of the three polymorphisms with diabetes prevalence [57].
Ashkenazi Jews population
The Ashkenazi population is a very good starting point for new research, as it has undergone
little admixture with other populations. DNA samples with little admixture from other
populations can be fertile ground for significant research [15].
The Ashkenazi Jewish population has shown a major site on chromosome 4, very similar to the
linkage results found in the French population. In this case, the gene thought to be
responsible is FABP2, the fatty acid binding protein 2 gene, in which the missense mutation
A54T is found [14].
Genomic research on relatively unadmixed samples of this population has revealed four
chromosomes involved in the pathology of T2D. The most important are chromosome 4, as
mentioned above, and chromosome 20. Eight markers were found on chromosome 4 and five on
chromosome 20 [15]. These findings are very similar to those of the Finnish studies, which
also identified a significant role for chromosome 20 in the development of T2D [15].
Finnish population
Research carried out in Finland has identified many genes linked to T2D. Other research has
found associations between various physical attributes and the etiology of T2D [10].
Linkage studies have shown high LOD scores on chromosome 4 as well as on chromosome 17 [10].
In other genomic scans, the Finnish population has shown linkage at 2q37.3 and in other
regions as well [8]. Further regions found in the Finnish population include 12q24.31 and
18p11.22, which have also been reported in American Caucasians and Pacific Islanders [8].
Other regions found include 3p24.1, 6q22, 12q24.31 and 20q13.1 [14]. The Finnish gene pool
has shown a larger number of T2D-related findings than other population samples.
Linkage studies have shown involvement of 1q42.2, 5q31.1, 9q21.12, 14q23, 20p12.3 and 4q34.1
in the Finnish population. The same linkages have been found in UK Caucasians, the Chinese,
American Caucasians and Ashkenazi Jews [14].
The most significant findings in the Finnish population are on chromosome 4. Finnish
Caucasians show linkage on 4q, very near the linkage found in French Caucasians at 4q34.1
[14].
The second significant finding is on chromosome 12. The mutation in this region has been
associated with low circulating insulin levels, as supported by research on Finnish
population groups [14]. Studies of US Caucasian families have also shown linkage in this
area.
The third significant linkage was on chromosome 18, where BMI was reported to be the
highest. The 18p11 region has shown linkage to T2D in Dutch Caucasians and Mormon
Caucasians, as well as in Han Chinese [14].
Another study used GWA analysis to detect genetic variants associated with T2D in the
Finnish population [11]. The study also compared its results with those of two similar
studies, by Sladek (2007) and Saxena (2007) [48, 52].
In the first stage, 1,161 T2D cases and 1,174 normal glucose tolerant (NGT) control subjects
from the Finnish population were genotyped for 317,503 SNPs. Based on quality control
criteria and minor allele frequency (MAF), 315,635 SNPs were selected and tested for T2D
association using an additive model. The samples were taken from the Finland–United States
Investigation of Non–Insulin-Dependent Diabetes Mellitus Genetics (FUSION) and Finrisk 2002
studies. Coverage was extended to SNPs that had not been genotyped by an imputation
approach, which used genotype data and linkage disequilibrium information from the HapMap
Centre d’Etude du Polymorphisme Humain (CEPH) samples to estimate the genotypes of autosomal
SNPs not genotyped in the subjects examined. To increase statistical power, the second stage
genotyped 80 SNPs in an additional 1,215 Finnish T2D cases and 1,258 NGT control subjects,
and a combined analysis of the FUSION samples from both stages was performed [11].
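The MAF filter and additive coding mentioned above can be illustrated as follows. The MAF
threshold of 0.01 is an assumption made for the example (the criterion used in [11] is not
given in this text), and the function names are hypothetical.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts for alleles A and B."""
    n_alleles = 2 * (n_aa + n_ab + n_bb)
    if n_alleles == 0:
        raise ValueError("no genotypes supplied")
    freq_a = (2 * n_aa + n_ab) / n_alleles
    return min(freq_a, 1.0 - freq_a)


def passes_maf_filter(n_aa, n_ab, n_bb, threshold=0.01):
    """Keep a SNP only if its MAF reaches the (assumed) threshold."""
    return minor_allele_frequency(n_aa, n_ab, n_bb) >= threshold


def additive_code(genotype, risk_allele):
    """Code a genotype such as "AG" as the number of risk alleles carried (0, 1 or 2)."""
    return genotype.count(risk_allele)


if __name__ == "__main__":
    # Invented genotype counts (AA, AB, BB) for two SNPs.
    print(passes_maf_filter(81, 18, 1))    # common variant
    print(passes_maf_filter(990, 10, 0))   # rare variant
    # Additive coding of three genotypes with G as the risk allele.
    print([additive_code(g, "G") for g in ("AA", "AG", "GG")])
```

Under an additive model, each extra copy of the risk allele is assumed to shift the log-odds
of disease by the same amount, which is why genotypes are coded 0, 1 and 2.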
The combined samples from all three analyses showed evidence for seven other T2D loci. For
the first three loci there was significant confirmation in the FUSION stage 1 GWA data; for
the remaining four, the FUSION stage 1 result was more modest. The results show that genetic
variants associated with T2D are present in an intergenic region of chromosome 11p12, in the
vicinity of the IGF2BP2 and CDKAL1 genes, and in the region of the CDKN2A and CDKN2B genes.
The study also established that genetic variants in the vicinity of TCF7L2, SLC30A8, HHEX,
FTO, PPARG and KCNJ11 are associated with the risk of developing T2D. Together, these
findings give a total of ten loci associated with T2D [11].
Studies of the Finnish population have identified affected sibling pair (ASP) family
populations. These populations show a significant burden of diabetes, with incidence ranging
from 5 to 30% depending on the age group [30]. Studies in this area have found strong
evidence for chromosome 11 through fine mapping. Linkages have also been reported on
chromosomes 2, 6 and 10. The study by Silander et al. in 2003 revealed 12 significant regions
in which linkage related to diabetes has been implicated. The second strongest evidence was
found for chromosome 14, which encodes
for endoplasmic reticulum functioning and ensures proper working of the liver and pancreas
[30]. That study identified four chromosomes linked to diabetes incidence in the Finnish
sibling pairs: 6, 11, 14 and X [30].
French population
French Caucasian populations have shown chromosomal mutations at 3q28, for which the marker
is D3S1580 [8]. Other loci identified include 2q37.3, 3q28 and 20q13.1 [14].
Despite the contribution of this population to the identification of the various genes
causing diabetes, the largest studies yet conducted on it showed linkage to
chromosome 20q12-13.1. This finding was supported by many other studies [15].
Research by Silander et al. (2001) has also shown certain genetic susceptibilities that
indicate possible involvement in diabetes [30]. The research by Scott et al. in 2007 showed
strong evidence for an association on chromosome 11 at SNP rs9300039. Another association
was found in intron 5 of CDKAL1, at rs4712523 and rs7754840 (Scott et al, 2007, pp 1344).
The study showed associations of 10 genes with the pathology of T2D. The genes are IGF2BP2,
CDKAL1, CDKN2A/B, FTO, PPARG, SLC30A8, HHEX, TCF7L2 and KCNJ11 [11]. Two silent variants,
E111E and N486N, have also been reported at high frequency among European and American
populations [36].
Vionnet et al. carried out another significant genome-wide scan in the French Caucasian
population in 2000. This genome scan verified many of the genetic associations and linkages
that had been proposed for the French Caucasian population. Loci identified through this
research included 2q37, for which the four-population study carried out by Ehm et al. in
the same year did not find convincing evidence [28]. The study by Vionnet et al. included
143 families. Including families makes it possible to identify shared traits in the genomic
makeup that could correlate with the phenotypic and genotypic features of diabetes. Multiple
individual families affected by diabetes were selected to ensure that most of the genetic
determinants were included in the scan. Mendelian inheritance patterns were analysed as part
of the research to identify the
genetic picture more deeply. As in the previous studies, the phenotypic characteristics were
determined from the physical presentation and blood analysis of the patients. The phenotypic
traits examined included diabetes status, age of onset and diagnosis, and BMI [33]. The study
strongly associated the phenotypic traits of impaired glucose tolerance and early age of
onset with chromosome 3q27-qter [33]. Many genes were identified as primary candidates for
diabetes occurrence. Chromosome 1q21-q24 was strongly associated with diabetes in lean
patients. This differs markedly from the findings in the Pima Indian population, where
chromosome 1q has been associated with diabetes and obesity [33]. Chromosome 20 was also
implicated in diabetes progression [33]. The study by Gibson et al. in 2005 extended this
work by examining the role of various proteins associated with chromosome 1 in the
pathogenesis of diabetes in the French population. In contrast to the results of
Vionnet et al., its results were not as supportive for the French Caucasian population. The
study, which examined the role of upstream transcription factor 1 (USF1), was unable to
identify any particular role in the French Caucasian population [38].
The findings on the role of activins in the pathogenesis of diabetes and diabetes-related
syndromes have also prompted studies in humans. These concern the pathological development
of conditions such as hypoplastic spleen, abnormal stomach, and defects in axial patterning
and lateral asymmetry due to abnormal expression of ACVR2B. The role of the same protein in
humans was therefore a subject of much interest, and it was examined in the French
population for any association. In humans, three nucleotide variations were found in the
gene encoding this protein: two silent mutations in exons 3 and 11, and a T to C
variation 13 bp upstream of exon 7 [58].
The objective of the study carried out by Rung et al. (2009) was to identify T2D risk
loci in a group of French subjects, starting from a first-stage GWAS followed by a large
second stage concentrating on the 5% of variants most significantly associated with T2D
[39]. This was followed by a third stage, which focused on Danish cases and controls.
In the first stage, 1,376 French subjects were used to obtain 16,360 SNPs that were nominally
associated with T2D, and these SNPs were examined in an independent sample of 4,977
French subjects. The 28 best results were then tested for replication in 7,698 Danish
individuals, which identified 4 SNPs showing potential association with T2D. The
control subjects were chosen from the DESIR study. Association was tested using
EIGENSTRAT. Quantitative traits were analysed by linear regression, and odds ratios were
computed by logistic regression. Incidence of T2D was examined using Cox proportional
hazards models in DESIR. All these analyses were adjusted for sex, age and BMI.
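As a rough illustration of what an odds ratio measures, the sketch below computes an unadjusted allelic odds ratio with an approximate 95% confidence interval (Woolf's method) from a 2x2 table of allele counts. This is a simplification and not the method of the study, which used logistic regression adjusted for sex, age and BMI; the function name and the example counts are hypothetical.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Unadjusted odds ratio and approximate confidence interval.

    Arguments are allele counts: risk and non-risk alleles in cases,
    then in controls. z=1.96 gives an approximate 95% interval.
    """
    counts = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) under Woolf's approximation.
    se_log_or = math.sqrt(sum(1 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lo, hi = allelic_odds_ratio(60, 40, 50, 50)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An interval that includes 1 means the data are compatible with no association at that significance level.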
The analysis detected a previously unreported T2D risk locus in the vicinity of IRS1.
This is one of the first T2D risk loci identified in a GWAS that is related to
insulin resistance and hyperinsulinemia. The study confirmed that the C allele of rs2943641
was associated with hyperinsulinemia and insulin resistance in a total of 14,358 French,
Finnish and Danish subjects. The findings further emphasise the roles of insulin secretion
and insulin sensitivity in T2D risk, and they provide direct evidence of a genetic variant
affecting IRS1 protein and PI(3)K activity, two key steps in insulin signal
transduction.
Although these findings show that G972R and rs2943641 could independently influence T2D
risk and insulin responsiveness, fine-mapping studies are recommended to detect the
aetiological SNPs and to analyse their interactions in different populations in greater
detail [39].
Zeggini et al. (2008) carried out a meta-analysis of GWAS data to detect further
susceptibility loci associated with T2D [45]. Data from three T2D GWAS, comprising 10,128
individuals of European ancestry, were used for the meta-analysis. The 2,202,892 SNPs that
were directly genotyped or imputed were analysed individually in each study to test their
association with T2D. The results were then corrected for residual population
stratification, cryptic relatedness or technical artefacts by means of genomic control.
After that, the results were pooled in a genome-wide meta-analysis of a total of 10,128
samples, with 4,549 case and 5,579 control subjects, drawn from the first stages of the
WTCCC, FUSION and DGI studies. A total of 69 variants were prioritised for the second
stage of the meta-analysis and were examined in three replication sets.
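The two steps described above, genomic control followed by pooling, can be sketched in Python as follows. The sketch assumes one common approach: the inflation factor is estimated as the median association chi-square divided by the median of a 1-degree-of-freedom chi-square distribution, and per-study effect estimates are pooled with fixed-effect inverse-variance weights. The function names are hypothetical, and the exact procedure used in [45] may differ.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(chi2_stats):
    """Estimate the inflation factor; values below 1 are set to 1."""
    return max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)


def apply_genomic_control(chi2_stats):
    """Deflate every test statistic by the estimated inflation factor."""
    lam = genomic_control_lambda(chi2_stats)
    return [x / lam for x in chi2_stats]


def fixed_effect_meta(betas, std_errors):
    """Pool per-study log odds ratios with inverse-variance weights.

    Returns the pooled effect, its standard error and the z-score.
    """
    weights = [1 / se ** 2 for se in std_errors]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1 / total)
    return beta, se, beta / se
```

Studies with smaller standard errors (typically the larger studies) receive more weight, which is why pooling can reveal loci that no single study detects on its own.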
The study identified six additional, previously undetected loci that were significantly
associated with T2D. These were the JAZF1, CDC123-CAMK1D, TSPAN8-LGR5, THADA,
ADAMTS9 and NOTCH2 gene regions, with significant probability values. The most
significant statistical evidence for a new association signal was found at rs864745 in
intron 1 of JAZF1. These loci provide important evidence about the functions involved in the
maintenance of normal glucose homeostasis and in the pathogenesis of T2D [45].
Middle East
The Middle East has few genome-wide population studies of T2D, which has left a serious
gap in the understanding of diabetes trends within the population. Only epidemiological
studies have been carried out so far, and even these are inconsistent and leave much to be
desired in understanding the overall picture. Before going into more detail about genetic
disorders in the Arab world, I would like to define the Arab population and give a brief
introduction to its history and migration.
Historical Background of Arabs
Looking at world history at large, it becomes evident that human societies have
always been stratified on the basis of caste, class, clan, race, region, religion, ethnicity,
gender, age and socioeconomic status. It is ethnic and racial distinction that separates one
nation from another. “Ethnicity is” Macionis submits, “a shared cultural heritage and people
define themselves or others as members of an ethnic category based on common ancestry,
language or religion that gives them a distinctive social identity.” The same is true of the
Arab world, which maintains its own ethnographic identity, historical background, ancestry,
cultural traits, social norms, moral values, religious beliefs and genealogy. People who
speak Arabic as their primary or first language are called Arabs. At present, the total
Arab population, living in twenty-three countries of the world, is estimated at
about 325 million, with a 2.3% annual increase.
Divergent propositions have been articulated regarding the origin and background of the
Arabs. One school of thought holds that the Arabian Peninsula is the origin of the Arabs,
and that the Bedouin clans of that region, who had been living there long before the birth
of Abraham in Babylonia, are their forefathers. The first definite extant reference to the
Arabs occurs in an inscription of the Assyrian king Shalmaneser III, who speaks of the
capture of a thousand camels from Gindibu the Arabian in 854 B. C. (Landau, 1958: 11-21:
quoted in bible.ca). In addition, it has mistakenly been thought that all Arabs are the
descendants of
Ismail (Ishmael), the elder son of Abraham. The basic source of this information is the
Semitic religions, and a large majority of the followers of the Abrahamic religions,
including Jews, Christians and Muslims, view Ismail as the father of the Arabs. According to
the Jewish sources, it was Ishmael whose descendants were blessed and multiplied as a great
nation: “GOD heard the boy (Ishmael) crying, and the angel of GOD called to Hagar from
heaven and said to her, "What is the matter, Hagar? Do not be afraid; God has heard the boy
crying as he lies there. Lift the boy up and take him by the hand, for I will make him into
a great nation."” (Genesis 21:17-18). Historians, on the contrary, do not accept the
tradition that all the Arabian tribes share one and the same ancestor. Instead, they
strongly believe that a significant number of peoples migrated to the Arabian Peninsula
after Abraham left Ismail and his mother Hagar in the desert of Paran. The tradition gained
great popularity because the word Arab simply means a desert with neither water nor trees.
Hence, the relics are taken to indicate that the region was not populated before the advent
of Ismail. “Linguistically, the word "Arab" means deserts and waste barren land well-nigh
waterless and treeless; ever since the dawn of history, the Arabian Peninsula and its people
have been called as such.” After the advent of Ismail and the appearance of the Zamzam
Well, the Qahtani tribes made their way to the peninsula and sought Ismail's permission to
settle in the area. Ismail married a daughter of the Jurhum branch of the Qahtani tribe.
“The Historians generally agree that the ancient Semitic peoples Assyrians, Aramaeans,
Canaanites (including the Phoenicians and Hebrews) and, later, the Arabs themselves migrated
into the area of the Fertile Crescent after successive crises of overpopulation in the
Peninsula beginning in the third millennium before the Common Era (BCE) and ending with the
Muslim conquests of the 7th century CE.”
The historians divide the origin of Arabs into three categories:
Perishing Arabs: Relics and archaeological research have yielded very little knowledge
about the earliest Arab tribes. According to this research, some Arabic-speaking clans are
estimated to have existed in or around present-day Saudi Arabia soon after the construction
of the Holy Ka’aba, and to have perished on the eve of Noah’s flood because of their
disobedience to the ways of GOD. Since there is no authentic record of their origin, life,
activities and descendants, they are often described as the perishing ancient Arabs. It is
thought that some of the ancient nations, including 'Ad, Thamûd, Tasam, Jadis, Emlaq and
others, which were destroyed by the wrath of Almighty GOD because of their misdeeds and
malpractices, were among the very first Arabian tribes.
Pure Arabs: Pure Arabs are the people who trace their ancestry to Qahtan. The
progeny of Ya'rub bin Yashjub bin Qahtan are the pure Arabs, who lived from 2300 BC to
800 BC in the Sayhad region of South Arabia. “In the late 3rd Millenia BC Semitic tribes
began to concentrate in the Sayhad region in South Arabia uniting under the leadership of the
semi-legendary Qahtan.” The Qahtanis began building simple earth dams and canals in the
Marib area in the Sayhad desert. At present, the Qahtani Arabs live in Palestine, Lebanon,
Syria, Egypt, Morocco, Libya, Ethiopia, Nigeria and other parts of the same region.
Arabized Arabs: Almost all theologians and historians are unanimously of the view that an
overwhelming majority of Arabs are the progeny of Ishmael, the elder son of the
Prophet Abraham. They are also called Adnanian Arabs, after Adnan, a pious
man and one of the descendants of Kedar, the second son of Ishmael (Ismail). Adnan is also
the ancestor of the Holy Prophet of Islam Muhammad Bin Abdullah (peace and blessings of
Almighty GOD be upon him and his family). The family of Adnan grew considerably and spread
over a significant part of the Arabian Peninsula in particular. The Adnanian
Arabs were the trustees of the Holy Ka’aba and served the pilgrims arriving from
distant parts of the world to perform the pilgrimage to the Holy Ka’aba. The Adnanian
Arabs travelled widely for trade and commerce in different parts of the region and the
world. Some of them migrated and settled in various areas of the present Middle East as
well as the northern and central parts of Africa. The Arabized Arabs ruled over Yemen,
Heerah, Syria and Hejaz from 650 B. C. onward and retained their unique culture, language,
norms and identity wherever they moved. They gained popularity, strength and respect
particularly after the advent of Islam. The descendants of the Pure and Arabized
Arabs reside in almost all countries of the world, with a majority and stronghold in
twenty-four countries of the Middle East and Africa. Research indicates that the
overwhelming majority of the Arabized Arabs are Muslim and live in almost all the states of
Africa and Asia, particularly from Iraq in the east to Morocco in the west and from Lebanon
in the north to Tanzania in the south. Arab populations are distributed across 23 countries,
namely: Algeria, Bahrain, Comoros, Djibouti, Egypt, Eritrea, Iraq, Jordan, Kuwait, Lebanon,
Libya, Mauritania, Morocco, Oman, Palestine, Qatar, Saudi Arabia, Somalia, Sudan, Syria,
Tunisia, United Arab Emirates, and Yemen.
Arab Migration:
The world has turned into a global village in the contemporary era, and people migrate from
one part of the world to another because travel and communication are easy and fast.
As one of the most dynamic peoples, and rulers of multiple states, the Arabs have also
migrated to the Asian, European, American and Australian
continents for business, study, health, trade and employment. Moreover, the
descendants of the Holy Prophet (peace be upon him) have migrated to Iraq, Iran, Syria,
Lebanon, Egypt, India, Russia and Yemen for political, economic
and religious purposes over the last fourteen centuries. Further, Arab Muslims also settled
in Qatar, Kuwait, UAE, Turkey, Spain, Nigeria, Tanzania, Kenya, Afghanistan, Bangladesh,
China and other parts of the globe, both to sell their merchandise and to
preach their religious beliefs. In addition, other nations have migrated
into the Arab world. There are Africans, Indians, Chinese, Europeans and Australians,
particularly in the United Arab Emirates and Kuwait, working in various positions
and professions.
Genetic Disorders in the Arab world:
Genetic disorders are very common among the Arabs. Research indicates that
the rate of genetic disorders is far higher in the Arab world than in western
and other non-Arab societies. “A genetic disorder is a disease that is caused by an
abnormality in an individual’s DNA. Abnormalities can range from a small mutation in a
single gene to the addition or subtraction of an entire chromosome or set of chromosomes.”
Genetic disorders result in physical or mental disabilities and dysfunctions, and in a high
infant mortality rate. Like other regions of the globe, the Arab world
experiences genetic disorders in its population. There are many causes of such disorders
among Arabs, which are discussed below:
Consanguinity among the Arab Population:
Demographic statistics show that consanguineous marriages (inter-familial marriages,
marriages between blood relations and cousin marriages) are very common among the Arabs,
which multiplies the probability that family diseases are transmitted to the next
generations. Medical researchers have shown that repeated cousin marriages
increase the chance that the same genetic defects are passed on. It has often
been observed that T2D, high blood pressure and heart disease are more common in families
that practise cousin marriage in excess. “Throughout the Arab World”, Tadmouri observes,
“consanguineous marriage is traditionally common. Overall, around 40% to 50% of marriages
in the Arab World are consanguineous. The specific types of consanguineous marriage vary
between and within countries. First cousin marriages are the most common consanguineous
bonds in the Arab World. Estimates indicate that the percentage of first cousin marriages is
approximately 11.4% in Egypt, 21% in Bahrain, 29% in Iraq, 30% in Kuwait, 31% in Saudi
Arabia, and 32% in Jordan.” (2004: p 3) Hence, a chain of consanguineous marriages is one of
the most important causes of genetic disorders among the Arabs.
The High Occurrence of Haemoglobinopathies in the Arab World:
Another important cause of genetic disorders is the high prevalence of metabolic
disorders. “The high prevalence of haemoglobinopathies”, Al-Ghazali opines,
“glucose-6-phosphate dehydrogenase deficiency, autosomal recessive syndromes, and several
metabolic disorders cause genetic disorders among the Arabs.” (2006: p 831) The absence of
proper health measures and medical check-ups during pregnancy also contributes to the
prevalence of haemoglobinopathies in the next generation.
The High Birth Rates:
The number of pregnancies in Arab countries is far higher than in the
western world. This seriously affects the health of the mother, who may suffer immune
deficiency and many other diseases. An ailing and aged mother is less likely to give birth
to healthy children. The infant mortality rate is therefore very high in Arab countries
owing to genetic disorders.
Lack of Physical Exertion:
The discovery and large-scale exploitation of oil, the so-called liquid gold, in the Arab
world during the 1960s revolutionised the lifestyle and financial position of the Arabs.
Economic development has made the Arabs increasingly sedentary. Lack of physical activity
and absence of
hard effort lead to a weak and deficient physique, which consequently
promotes genetic disorders among the next generations of the whole Arab
community.
Need and Scope of Medical Researches in the Arab World:
Modern technological advancements have revolutionized patterns of life and influenced
regions of the world that are poles apart. But there are some societies and peoples that
have not benefited from these scientific inventions and achievements.
This is the case for the Arab and African worlds of today, where
genetic disorders have taken strong root and where their causes remain an open question for
the individuals suffering from these physical and mental
deficiencies. Though some theories have been put forward and research has been conducted
on the causes of genetic disorders among the Arabs, no significant cause of such
deficiencies has yet been discovered. There is vast scope for applying research methods
to identify the sociological, biological,
environmental and psychological factors behind genetic diseases in the Arabs. In
addition, the public health sector is underdeveloped in the Arab world.
United Arab Emirates (UAE)
A federation of seven prosperous and vibrant states in the south-eastern region
of the Middle East, the United Arab Emirates was united in 1971. It has been a
centre of trade and commerce for the last four decades. People from
every part of the world arrive there in search of work, trade and business. The opening of
large commercial institutions, the arrival of popular multinational brands and the
establishment of chains of recreational centers have made the UAE one of the most
fascinating regions of the globe. Though the federation has become an
amalgamation of many of the world's cultures, and individuals belonging to almost every
nation can be found there, the Arabs are the dominant stratum of the culture and
society of the UAE. Though political control of the federation is in the hands of
the native people, it has been estimated that over 81% of its total population consists of
foreign workers, laborers, investors, traders and other professionals. Out of the UAE's
population of 4.7 million, it is estimated that only 19% are UAE nationals, due to the high
rate of
immigration as well as the bright commercial opportunities in the whole region. Most
of the pure and original Arabs have migrated to the UAE from Hijjaz, Iraq, Egypt, Syria,
Bahrain, Oman, Lebanon, Palestine, Libya, Morocco, Somalia, Tunisia, Sudan and Algeria.
T2D has become a major public health problem in the UAE. A survey completed by the
Ministry of Health in the UAE reported that the overall prevalence of diabetes was
19.6% among UAE citizens. Furthermore, recent studies estimated that 25% of adult
Arabs now suffer from diabetes, mainly T2D, and the prevalence of the disease is increasing
[5].
The high frequency of consanguineous marriages leads to an increase in the prevalence of
homozygosity, which greatly facilitates the identification of predisposing genes [59]. Early
and extended childbearing leads to large pedigrees with multiple affected members, which
allow extensive linkage and sibling pair analyses. These factors provide an opportunity to
study the various ethnic/tribal groups of the United Arab Emirates (UAE) in order to
understand genetic predisposition to T2D.
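The link between consanguinity and homozygosity can be made concrete with a standard population-genetics relation: for an allele of frequency q and an inbreeding coefficient F, the expected frequency of homozygotes is q² + Fq(1 − q). The minimal Python sketch below applies it with F = 1/16, the coefficient for the offspring of first cousins. The allele frequency used in the example is a hypothetical value chosen for illustration.

```python
FIRST_COUSIN_F = 1 / 16  # inbreeding coefficient, offspring of first cousins


def homozygote_frequency(q, inbreeding_f=0.0):
    """Expected frequency of homozygotes for an allele of frequency q.

    With inbreeding_f = 0 this reduces to the Hardy-Weinberg value q**2.
    """
    if not (0.0 <= q <= 1.0 and 0.0 <= inbreeding_f <= 1.0):
        raise ValueError("q and inbreeding_f must lie between 0 and 1")
    return q * q + inbreeding_f * q * (1 - q)


# Hypothetical rare allele with frequency 1%.
q = 0.01
random_mating = homozygote_frequency(q)
first_cousins = homozygote_frequency(q, FIRST_COUSIN_F)
print(f"fold increase in homozygotes: {first_cousins / random_mating:.1f}")
```

The rarer the allele, the larger the relative increase, which is why recessive variants are easier to detect in populations with frequent cousin marriage.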
A survey completed by the UAE’s Ministry of Health reported that the overall percentage of
people with diabetes was between 13% and 19% among expatriates living in the UAE.
Furthermore, Malik and colleagues [60] estimated that 25% of UAE nationals suffer from
diabetes, mainly T2D, and that the prevalence of the disease is increasing.
In addition, a study conducted by Reed et al. (2005) [61] on a random sample of UAE
citizens over the age of 30 living around the city of Al-Ain reported that 20% of the
subjects studied suffered from T2D (14% rural to 25% urban). However, the methodology used
may have underestimated prevalence by as much as 20%, as recent studies reported
by the Centre for Arab Genomic Studies (CAGS) indicated that the prevalence of T2D in the
UAE rises with increasing age, reaching 40% in people over 30 years. These observations
emphasise the necessity of considering prevention for diabetes in the UAE.
Unfortunately, very little research has been carried out in the UAE on the
genetic associations of T2D. At present, even epidemiological studies of the
prevalence of the condition in the UAE are lacking [4]. Those available have shown
differences in the prevalence of the disease between urban and rural populations [4].
While the adult population has shown a strong tendency towards diabetes, even fewer
studies have been carried out on the pediatric population of the Arab countries. The
demographics of pediatric diabetes show a high number of new cases detected
each year, with a roughly equal male to female ratio. The majority of these patients
presented with diabetic ketoacidosis at their first visit. This shows that many patients
remain undiagnosed until complications develop [15].
A related study carried out in Bahrain offers some insight into the prevalence of diabetes
in the region. While its findings cannot be taken to describe UAE patients, they may
nevertheless help to identify key features that are shared across the region. The prevalence
of diabetes in Bahrain is very high, and patients above 30 years of age alone constitute
21% of the population [62]. A strong family history was found in 41% of the diabetics, and a
strong association with hypertension was also seen in these patients. Obesity
was as high as 74 percent in these cases, which points to the other health risks that
increase with high BMI [62]. This study is, however, very old and may not be indicative of
current trends and numbers. It does nonetheless show that the problem was already
significant a long time ago.
These observations emphasize the necessity of considering prevention for T2D in the UAE.
To date, no research has been conducted on the implications of genetic testing, or on
genome-wide screening for T2D, in the UAE population or any other Arab population. With the
high prevalence of diabetes and the steady decrease in the average age of onset,
such research is essential for understanding the pathological process involved in this
population.
T2D has not been extensively studied among the Arab populations of the Middle East, yet
the characteristics of Arab populations make them ideal for the study of complex,
polygenic, multifactorial disorders such as diabetes [63]. There is therefore a need for
research on the implications of genetic testing, or genome-wide screening, for
T2D in the UAE population or any other Arab population.
Conclusion
The lack of genome-wide scans leaves very little to be discussed regarding the genetics
of diabetes in the Arab countries. The UAE suffers from the same problem. Clinical research
in this area has helped identify the main trends in diabetes progression in other countries
of the world; without this basic knowledge, however, the UAE population cannot
expect to advance further in treatment strategies for diabetes.
There is therefore a pressing need for genome-wide scans tailored
to the population of the UAE and other Arab nations. Without progress in this area,
there is little hope for proper treatment strategies, and the number of patients with
diabetes is bound to increase.
The aim of this project is to detect loci and genes influencing susceptibility to T2D and
related traits in the UAE population. Data on DNA haplotypes in the tribes of the Middle
East are limited, and advances in DNA technology provide the opportunity to study this group
of people. A range of outcomes is therefore expected from this study, such as:
(A) Medical applications: The study of DNA from the local ethnic groups provides a double
benefit. Apart from the development of new opportunities in forensic science, the markers will
allow the study of specific diseases that are common to populations of this region such as
T2D. If genetic profiling could be used successfully to identify high-risk individuals, this
would result in substantial benefits to both individuals and society. Targeting preventive
measures towards individuals with high-risk genotypes could delay the onset of disease, slow
its progression, and reduce the ultimate severity of the condition. This would result in
substantial improvements in quality of life for affected individuals and a reduction in
healthcare costs.
(B) Forensic biology applications: The understanding of the distribution of ethnic-specific
haplotypes will expand the understanding of STR markers currently employed in crime scene
investigation. Further, it will address possible limitations of STR-based DNA profiling,
especially the identification of novel allele variants, null alleles and mutations. As the
number of samples stored in judiciary databases grows exponentially, unexpected alleles are
constantly being discovered. Previous work has shown new alleles arising from extra or fewer
core repeat units, partial repeats or indels in the sequence flanking the STR repeats. New
STR-based and SNP-based markers could be identified, improving DNA profiling in forensic
science.
The identification of polymorphisms that are unique to these populations will provide an
opportunity to enhance DNA profiling. Ethnic-specific polymorphisms can be used to profile
biological evidence left at the crime scene to provide information that could be useful in an
investigation.
To achieve this goal, collaborations were established with major hospitals and diabetes
centers in the country. Through these collaborations, demographic data of T2D patients of
Arab origin were collected and tabulated in a database. Individuals from small nuclear
families belonging to a large extended family were selected for genome-wide scans after
completing consent forms. Blood samples were taken for genotyping. DNA was extracted
according to standard molecular protocols.
Biochemical data (glucose, lipids, HbA1c, etc.) were collected to complement the genetic
data. Information regarding lifestyle was also recorded for correlation with the genetic
and biochemical data.
Genome wide screening of the samples were preformed using Human Quad 660 chips scanned
on Illumina’s BeadArray™ technology. The data collected was evaluated using strategically
selected single nucleotide polymorphisms.
Family based association analyses were used to identify genomic regions associated to the
disease. Single nucleotide polymorphisms (SNPs) were identified and haplotype association
studies were performed using haplotype relative risk and transmission disequilibrium test
analysis.
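As a minimal sketch of the transmission disequilibrium test named above, assuming the standard McNemar-type statistic for a single biallelic marker: among heterozygous parents of affected children, one count records how often the candidate allele was transmitted and the other how often it was not. The function name and the counts are illustrative only and are not taken from the study data.

```python
import math


def tdt_statistic(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Return the TDT chi-square statistic (1 d.f.) and its p value.

    transmitted / untransmitted are counts of the candidate allele passed,
    or not passed, from heterozygous parents to affected offspring.
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Upper tail of a chi-square distribution with 1 d.f.:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p_value = tdt_statistic(transmitted=60, untransmitted=40)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Under the null hypothesis of no linkage or association, the two counts are expected to be equal, so a large imbalance yields a large statistic.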
Identification of target genes might also lead to development of novel therapeutic modalities.
Further, the data could complement existing information available for other ethnic groups
towards enhancing our knowledge of the genetic etiology of the disease.
The Arab world has not been an active participant in the large international projects in the
field of genomics; the work presented in subsequent chapters aims to change this position and
address the deficiencies that currently exist.
Chapter 2 of this thesis describes the collection of samples and data throughout the United
Arab Emirates (UAE) to establish the Emirates Family Registry (EFR), which develops the
capabilities of a bio-specimen repository, the associated database resources, high-throughput
genotyping capabilities and skills in medical bioinformatics for the UAE. Due to the increasing
prevalence of T2D in the region, lifestyle management strategies with an emphasis on
prevention are required. Consequently, a total of 23,064 volunteers provided consent to allow
their clinical data to be stored in the EFR's database in order to study the prevalence of T2D
in a population of the UAE.
Chapter 3 of this thesis examines the influence of environmental factors in the
pathophysiology of T2D and its related phenotypes in an Arab population, and shows that
lifestyle-related problems are common among Arabs. Multiple factors, both environmental and
genetic, contribute to the incidence and distribution of T2D; this study therefore describes
the role of genes and the influence of the environment on the increasing prevalence of T2D in
Arab populations. Physical and clinical traits were collected for assessment. In addition,
pairwise phenotypic correlations among the eight quantitative traits were observed,
specifically between HbA1c and fasting glucose. This assessment of phenotypic factors will be
followed up with ongoing studies to evaluate the genetic polymorphisms that contribute to the
prevalence of T2D in Arab populations.
In Chapter 4 of this thesis, a new method was assessed to allow sample collection in remote
regions and in developing countries. This study describes the use of FTA™ technology for DNA
storage, with a Whole Genome Amplification step prior to GWAS application, as an alternative
strategy for high-throughput genotyping.
Chapter 5 of this thesis describes the distribution of four Alu markers in the Bedouin
population of the Middle East. Specifically, it establishes the relationship between Arab
populations and others previously studied. Ethnic-specific polymorphisms can be used to
profile biological evidence left at the crime scene to provide information that could be useful
in an investigation.
Chapter 6 of this thesis presents a study to detect loci and genes influencing susceptibility to
T2D in the United Arab Emirates population. In this study more sophisticated technologies
were used to study DNA polymorphisms and their influence on T2D among Arabs. To date, no
research has been conducted on the implications of genetic testing or genome-wide screening
for T2D in the UAE population or in any other Arab population.
Chapter 7 of this thesis presents a study of the genetic associations with obesity in
ethnically homogeneous cohorts from the United Arab Emirates. The study focused on mean Body
Mass Index and mean Waist Circumference. A total of 657,367 SNPs were tested for association
with these two traits in one extended Emirati family of 319 members, of whom only 178 were
genotyped. The modern lifestyle of the Arab population has significantly increased weight gain
early in adult life, thus contributing to the obesity epidemic and associated diseases such as
T2D, which makes this an ideal population in which to conduct such a study.
REFERENCES
1. Leslie, R.D., Metabolic changes in diabetes. Eye (Lond), 1993. 7 ( Pt 2): p. 205-8.
2. Hossain, P., B. Kawar, and M. El Nahas, Obesity and diabetes in the developing
world--a growing challenge. N Engl J Med, 2007. 356(3): p. 213-5.
3. Hogan, P., T. Dall, and P. Nikolov, Economic costs of diabetes in the US in 2002.
Diabetes Care, 2003. 26(3): p. 917-32.
4. El-Sharkawy, T., Diabetes in the United Arab Emirates and Other Arab Countries:
need for Epidemiological and Genetic Studies, in Genetic Disorders in the Arab
World. 2004, Centre for Arab Genomic Studies: Dubai. p. 57.
5. Wild, S., et al., Global prevalence of diabetes: estimates for the year 2000 and
projections for 2030. Diabetes Care, 2004. 27(5): p. 1047-53.
6. Goldstein, I., The mutually reinforcing triad of depressive symptoms, cardiovascular
disease, and erectile dysfunction. Am J Cardiol, 2000. 86(2A): p. 41F-45F.
7. American Diabetes Association: National Diabetes Fact Sheet. Alexandria, VA, ADA,
2002.
8. Florez, J.C., J. Hirschhorn, and D. Altshuler, The inherited basis of diabetes mellitus:
implications for the genetic analysis of complex traits. Annu Rev Genomics Hum
Genet, 2003. 4: p. 257-91.
9. Chu, S.Y., S.Y. Kim, and C.L. Bish, Prepregnancy obesity prevalence in the United
States, 2004-2005. Matern Child Health J, 2009. 13(5): p. 614-20.
10. Parker, A., et al., A gene conferring susceptibility to type 2 diabetes in conjunction
with obesity is located on chromosome 18p11. Diabetes, 2001. 50(3): p. 675-80.
11. Scott, L.J., et al., A genome-wide association study of type 2 diabetes in Finns detects
multiple susceptibility variants. Science, 2007. 316(5829): p. 1341-5.
12. Barroso, I., Genetics of Type 2 Diabetes. Diabet Med, 2005. 22(5): p. 517-35.
13. Acton, R.T., et al., Genes within the major histocompatibility complex predict NIDDM
in African-American women in Alabama. Diabetes Care, 1994. 17(12): p. 1491-4.
14. Huang, Q.-Y., M.-R. Cheng, and S.-L. Ji, Linkage and Association Studies of the
Susceptibility Genes for Type 2 Diabetes. Acta Genetica Sinica, 2006. 33(7): p. 573-
589.
15. Permutt, M.A., et al., A genome scan for type 2 diabetes susceptibility loci in a
genetically isolated population. Diabetes, 2001. 50(3): p. 681-5.
16. Freudenrich, C., How Diabetes Works. 2002.
17. Parnes, B., et al., Provider deferred decisions on hemoglobin A1c results: a report
from the Colorado Research Network (CaReNet) and the High Plains Research
Network (HPRN). J Am Board Fam Med, 2006. 19(1): p. 20-3.
18. Norman, J., The Diabetes Center. 2006, The Norman Parathyroid Clinic: Tampa, FL.
19. National Institute of Diabetes and Digestive and Kidney Diseases. National Diabetes
Statistics fact sheet: general information and national estimates on diabetes in the
United States. 2003, Department of Health and Human Services, National Institutes of
Health: Bethesda, MD: U.S.
20. Scheede-Bergdahl, C., et al., Metallothionein-mediated antioxidant defense system and
its response to exercise training are impaired in human type 2 diabetes. Diabetes,
2005. 54(11): p. 3089-94.
21. Kennedy, J.W., et al., Acute exercise induces GLUT4 translocation in skeletal muscle
of normal human subjects and subjects with type 2 diabetes. Diabetes, 1999. 48(5): p.
1192-7.
22. Musi, N., et al., AMP-Activated Protein Kinase (AMPK) Is Activated in Muscle of
Subjects With Type 2 Diabetes During Exercise. Diabetes, 2001. 50(5): p. 921-927.
23. Lu, H., et al., Diabetes interferes with the bone formation by affecting the expression of
transcription factors that regulate osteoblast differentiation. Endocrinology, 2003.
144(1): p. 346-52.
24. Almind, K., A. Doria, and C.R. Kahn, Putting the genes for type II diabetes on the
map. Nat Med, 2001. 7(3): p. 277-9.
25. Gerich, J.E., The Genetic Basis of Type 2 Diabetes Mellitus: Impaired Insulin
Secretion versus Impaired Insulin Sensitivity. Endocr Rev, 1998. 19(4): p. 491-503.
26. Wiltshire, S., et al., Evidence from a large U.K. family collection that genes
influencing age of onset of type 2 diabetes map to chromosome 12p and to the
MODY3/NIDDM2 locus on 12q24. Diabetes, 2004. 53(3): p. 855-60.
27. Strachan Tom, R.A., Human Molecular Genetics 2 Second ed. 1999 John Wiley &
Sons, Inc.
28. Ehm, M.G., et al., Genomewide Search for Type 2 Diabetes Susceptibility Genes in
Four American Populations. The American Journal of Human Genetics, 2000. 66(6):
p. 1871-1881.
29. Frayling, T.M., Genome-wide association studies provide new insights into type 2
diabetes aetiology. Nat Rev Genet, 2007. 8(9): p. 657-62.
30. Silander, K., et al., A Large Set of Finnish Affected Sibling Pair Families With Type 2
Diabetes Suggests Susceptibility Loci on Chromosomes 6, 11, and 14. Diabetes, 2004.
53(3): p. 821-829.
31. Das, S.K., Genetic Epidemiology of Adult Onset Type 2 Diabetes in Asian Indian
Population: Past, Present and Future. INTERNATIONAL JOURNAL OF HUMAN
GENETICS, 2006. 6(1): p. 1-13.
32. Sale, M.l.M., et al., A Genome-Wide Scan for Type 2 Diabetes in African-American
Families Reveals Evidence for a Locus on Chromosome 6q. Diabetes, 2004. 53(3): p.
830-837.
33. Vionnet, N., et al., Genomewide Search for Type 2 Diabetes-Susceptibility Genes in
French Whites: Evidence for a Novel Susceptibility Locus for Early-Onset Diabetes on
Chromosome 3q27-qter and Independent Replication of a Type 2-Diabetes Locus on
Chromosome 1q21-q24. The American Journal of Human Genetics, 2000. 67(6): p.
1470-1480.
34. Reinberg, S., Exercise Helps Teens Overcome 'Obesity Gene', in HealthDay. 2007.
35. Fu, M., et al., Polymorphism in the calsequestrin 1 (CASQ1) gene on chromosome
1q21 is associated with type 2 diabetes in the old order Amish. Diabetes, 2004. 53(12):
p. 3292-9.
36. Dupont, S., et al., No Evidence for Linkage or for Diabetes-Associated Mutations in
the Activin Type 2B Receptor Gene (ACVR2B) in French Patients With Mature-Onset
Diabetes of the Young or Type 2 Diabetes. Diabetes, 2001. 50(5): p. 1219-1221.
37. Hayes, M.G., et al., Patterns of linkage disequilibrium in the type 2 diabetes gene
calpain-10. Diabetes, 2005. 54(12): p. 3573-6.
38. Gibson, F., S. Hercberg, and P. Froguel, Common Polymorphisms in the USF1 Gene
Are Not Associated With Type 2 Diabetes in French Caucasians. Diabetes, 2005.
54(10): p. 3040-3042.
39. Rung, J., et al., Genetic variant near IRS1 is associated with type 2 diabetes, insulin
resistance and hyperinsulinemia. Nat Genet, 2009. 41(10): p. 1110-5.
40. Takeuchi, F., et al., Confirmation of multiple risk Loci and genetic impacts by a
genome-wide association study of type 2 diabetes in the Japanese population.
Diabetes, 2009. 58(7): p. 1690-9.
41. Bouatia-Naji, N., et al., A variant near MTNR1B is associated with increased fasting
plasma glucose levels and type 2 diabetes risk. Nat Genet, 2009. 41(1): p. 89-94.
42. Timpson, N.J., et al., Adiposity-related heterogeneity in patterns of type 2 diabetes
susceptibility observed in genome-wide association data. Diabetes, 2009. 58(2): p.
505-10.
43. Unoki, H., et al., SNPs in KCNQ1 are associated with susceptibility to type 2 diabetes
in East Asian and European populations. Nat Genet, 2008. 40(9): p. 1098-102.
44. Yasuda, K., et al., Variants in KCNQ1 are associated with susceptibility to type 2
diabetes mellitus. Nat Genet, 2008. 40(9): p. 1092-7.
45. Zeggini, E., et al., Meta-analysis of genome-wide association data and large-scale
replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet, 2008.
40(5): p. 638-45.
46. Gudmundsson, J., et al., Two variants on chromosome 17 confer prostate cancer risk,
and the one in TCF2 protects against type 2 diabetes. Nat Genet, 2007. 39(8): p. 977-
83.
47. Salonen, J.T., et al., Type 2 diabetes whole-genome association study in four
populations: the DiaGen consortium. Am J Hum Genet, 2007. 81(2): p. 338-45.
48. Saxena, R., et al., Genome-wide association analysis identifies loci for type 2 diabetes
and triglyceride levels. Science, 2007. 316(5829): p. 1331-6.
49. Zeggini, E., et al., Replication of genome-wide association signals in UK samples
reveals risk loci for type 2 diabetes. Science, 2007. 316(5829): p. 1336-41.
50. Hanson, R.L., et al., A search for variants associated with young-onset type 2 diabetes
in American Indians in a 100K genotyping array. Diabetes, 2007. 56(12): p. 3045-52.
51. Steinthorsdottir, V., et al., A variant in CDKAL1 influences insulin response and risk of
type 2 diabetes. Nat Genet, 2007. 39(6): p. 770-5.
52. Sladek, R., et al., A genome-wide association study identifies novel risk loci for type 2
diabetes. Nature, 2007. 445(7130): p. 881-885.
53. Xiang, K., et al., Genome-wide search for type 2 diabetes/impaired glucose
homeostasis susceptibility genes in the Chinese: significant linkage to chromosome
6q21-q23 and chromosome 1q21-q24. Diabetes, 2004. 53(1): p. 228-34.
54. Sanghera, D.K., et al., Impact of nine common type 2 diabetes risk polymorphisms in
Asian Indian Sikhs: PPARG2 (Pro12Ala), IGF2BP2, TCF7L2 and FTO variants
confer a significant risk. BMC Med Genet, 2008. 9: p. 59.
55. Rampersaud, E., et al., Identification of novel candidate genes for type 2 diabetes from
a genome-wide association scan in the Old Order Amish: evidence for replication from
diabetes-related quantitative traits and from independent populations. Diabetes, 2007.
56(12): p. 3053-62.
56. Einarsdottir, E., et al., Linkage but not association of calpain-10 to type 2 diabetes
replicated in northern Sweden. Diabetes, 2006. 55(6): p. 1879-83.
57. Rasmussen, S.K., et al., Variants within the calpain-10 gene on chromosome 2q37
(NIDDM1) and relationships to type 2 diabetes, insulin resistance, and impaired acute
insulin secretion among Scandinavian Caucasians. Diabetes, 2002. 51(12): p. 3561-7.
58. Dupont, S., et al., No Evidence for Linkage or for Diabetes-Associated Mutations in
the Activin Type 2B Receptor Gene (ACVR2B) in French Patients With Mature-Onset
Diabetes of the Young or Type 2 Diabetes. Diabetes, 2001. 50(5): p. 1219-1221.
59. de Costa, C.M., Consanguineous marriage and its relevance to obstetric practice.
Obstet Gynecol Surv, 2002. 57(8): p. 530-6.
60. Malik, M., et al., Glucose intolerance and associated factors in the multi-ethnic
population of the United Arab Emirates: results of a national survey. Diabetes Res
Clin Pract, 2005. 69(2): p. 188-95.
61. Reed, R.L., et al., A controlled before-after trial of structured diabetes care in primary
health centres in a newly developed country. Int J Qual Health Care, 2005. 17(4): p.
281-286.
62. Al-Zurba F, A.-G.A., Prevalence of Diabetes Mellitus among Bahrainis Attending
Primary Health Care Centers. Eastern Mediterranean Health Journal, 1996. 2: p. 274-
282.
63. Kambouris, M., Target gene discovery in extended families with type 2 diabetes
mellitus. Atheroscler Suppl, 2005. 6(2): p. 31-6.
CHAPTER 2
THE PREVALENCE OF TYPE 2 DIABETES
MELLITUS IN THE UNITED ARAB EMIRATES:
JUSTIFICATION FOR THE ESTABLISHMENT OF
THE EMIRATES FAMILY REGISTRY.
This chapter was submitted to International Journal of Diabetes in Developing Countries and
the format presented is as per the "Instruction to Authors" from the publishing house.
Chapter 2
The Prevalence of Type 2 Diabetes Mellitus in the United
Arab Emirates: Justification for the Establishment of the
Emirates Family Registry.
Chapter 2 is presented as a manuscript submitted to International Journal of Diabetes in
Developing Countries. The data collected was only possible through the collaborative
network of three hospitals, nine primary care centres, the Dubai Police Clinic, the United
Arab Emirates (UAE) Ministry of Health and the University of Western Australia.
This manuscript describes the "EFR Project" or Emirates Family Registry, which was
established as part of a collaborative effort to develop the capabilities of a bio-specimen
repository, associated database resources, high-throughput genotyping capabilities and skills
in medical bioinformatics for the UAE. To demonstrate its feasibility, a pilot project that
commenced in 2007 recruited volunteers from 3 local hospitals and 9 primary care
centres. Through this network, 23,064 volunteers provided consent to allow their clinical data
to be stored in the EFR's database (Table 1). DNA samples from Bedouins with Type 2 Diabetes
(T2D) were collected from 1,766 donors. Due to the increasing prevalence of T2D in the
region, lifestyle management strategies with an emphasis on prevention are required.
Consequently, understanding the environmental factors and genetic predispositions was an
important aim of this study, to ensure successful implementation of future public awareness
programs.
Table 1: The Emirates Family Registry Database
Disease Status                                       Ethnicity   Males   Females   Total
Type 2 Diabetes, without complication                Bedouin     1,092   1,595     2,687
Type 2 Diabetes, without complication                Others      6,550   8,450     15,000
Type 2 Diabetes, with cardiovascular complications   Bedouin     1,092   1,595     2,687
Healthy volunteers                                   Bedouin     1,330   1,360     2,690
Total                                                                              23,064
The data presented in the manuscript specifically summarises the features of diabetes in a
local population, not previously studied. The UAE consists of a cosmopolitan population that
includes the tribes of the Middle East and expatriates from neighboring Asian nations.
Although unique in the make-up of the ethnic group studied, the project was conceived on the
basis of previous studies in different ethnic groups including those published in the
International Journal of Epidemiology (1992, 21:352-358; 1998, 27:853-1859; 1999, 28:498-
501; and 2006, 35:1553-1562).
It is evident that both lifestyle and inherited risk factors lead to the development of T2D.
This study was therefore established to examine the prevalence of T2D in a population of UAE
residents through the creation of the EFR. The analyses performed revealed that obesity, a
large waist circumference, consanguineous marriage, family history, lack of physical activity,
and an unhealthy diet with high total cholesterol and triglyceride levels were more
prevalent in T2D patients.
The pilot program of the EFR described here was quite successful. The data presented
throughout this manuscript classify the lifestyle features that contribute to disease, in order
to define more effective and specific plans to screen for and manage diabetes and its
complications in the UAE and other developing countries throughout the Middle East region.
Figure 1: Collaborative Links of the EFR Project have been established throughout the
Middle East, United Kingdom and Australia
Further, the collaborative network established with international research groups in
Australia, Europe and the Middle East (see Figure 1) will ensure future development of the
EFR project.
I prepared this manuscript with support from the co-authors listed. The samples were
collected by local healthcare workers and I compiled all the collected data. Dr Hassoun, a
physician at the Joslin Diabetes Centre, an affiliate of the Dubai Health Authority,
contributed as clinical expert and provided guidance. Khadra Jama-Alol, an epidemiologist
at the University of Western Australia’s School of Public Health, assisted me with the
statistical analyses. Dr Tay assisted me throughout the study with specific guidance relating to
the design of the study.
The Prevalence of Type 2 Diabetes in the United Arab Emirates: Justification for the
Establishment of the Emirates Family Registry.
Habiba S Al Safar1, 2, Khadra A Jama-Alol3, Ahmed AK Hassoun4, Guan K Tay1
1 Centre for Forensic Science, University of Western Australia, Western Australia, Australia.
2 Dubai Police General Head Quarters, Dubai, United Arab Emirates.
3 School of Population Health, University of Western Australia, Western Australia, Australia.
4 Joslin Diabetes Centre, Affiliate Dubai Health Authority, Dubai, United Arab Emirates.
Abbreviated title: Type 2 Diabetes, United Arab Emirates, Emirates Family Registry
Keywords: Prevalence, Type 2 Diabetes, UAE, EFR
Publication number HA09-0005 of the Center for Forensic Science at the University of
Western Australia.
Corresponding author:
Associate Professor Guan K Tay
Centre for Forensic Science
The University of Western Australia
35 Stirling Highway, Crawley WA 6009, AUSTRALIA
Phone: + 61 8 6488 7286
Fax: + 61 8 6488 7285
Email: [email protected]
ABSTRACT
This project was conceived with the aim of studying the prevalence of Type 2 Diabetes (T2D)
in a population of United Arab Emirates (UAE) residents through the creation of the “Emirates
Family Registry” (EFR). This resource is the first of its kind as it focuses on the indigenous
populations of the Arab world. It will allow researchers to collect and collate data from
patients with T2D and healthy volunteers to assess features that may contribute to disease
progression among populations of the Middle East.
Methods: Major hospitals and diabetes centres in the UAE were contacted to establish a bio-
banking facility referred to as the EFR. Through assistance made available by the Ministry of
Health and collaborators of this network, demographic data of T2D patients were collected
and collated in a database for analysis and for future longitudinal studies. Clinical
specimens were collected for biochemical profiling (such as glucose, lipid and HbA1c levels).
Results: In the first 24 months of operation, the EFR recruited 23,064 adult volunteers from
three major hospitals and nine primary care centres throughout the UAE. Within this cohort,
88% were classified as T2D patients from their medical records. The cohort was
divided into age categories, with 59% of T2D patients aged between 40 and 59 years. UAE
nationals comprised 30% of the database, of which 21% were diagnosed with T2D. However,
the percentage of adults with T2D was higher in other ethnic groups, affecting almost 33% of
the Indians who live in the UAE. A total of 741 UAE nationals consented to donate blood in
phase I of the study for biochemical testing, of which 23% were diagnosed with T2D, 30%
with pre-T2D and 47% were healthy following the completion of testing.
Conclusion: This study is consistent with the previously reported high prevalence of T2D in
the UAE. Furthermore, analyses of the factors that predispose to the disease have revealed that
obesity, a large waist circumference, consanguineous marriage, family history, lack of
physical activity, unhealthy dietary practices, high total cholesterol, and high triglyceride
levels were more prevalent in T2D patients. The classification of these features will contribute
to defining more effective and specific plans to screen for and manage diabetes and its
complications in the UAE, throughout the Middle East region and in other developing
countries.
INTRODUCTION
Diabetes is a group of metabolic diseases characterised by hyperglycaemia (1).
Several physiological processes are involved in the development of diabetes (2). These range
from autoimmune destruction of the β-cells of the pancreas with consequent insulin deficiency
to abnormalities that result in resistance to insulin action. The majority of diabetic cases
fall into two broad etiopathogenetic categories: Type 1 Diabetes (T1D), caused by an absolute
deficiency of insulin secretion, and Type 2 Diabetes (T2D), caused by a combination of
resistance to insulin action and an inadequate compensatory insulin secretory response. T2D
accounts for some 90 to 95% of those with diabetes. It was previously referred to as
non-insulin dependent diabetes or adult onset diabetes (3). It includes individuals who have
insulin resistance and relative insulin deficiency, and who usually need insulin treatment
mainly later in the course of the disease (4, 5).
The chronic hyperglycaemia resulting from diabetes is associated with long-term dysfunction,
damage and eventually failure of various organs (6, 7). These changes mainly occur due to
micro- and macrovascular complications. Long-term complications of diabetes include
retinopathy with potential loss of vision; nephropathy leading to chronic kidney disease;
peripheral neuropathy with risk of foot ulcers, amputations and Charcot joints; and autonomic
neuropathy causing gastrointestinal, genitourinary, and cardiovascular symptoms and sexual
dysfunction (8). Patients with diabetes have an increased risk of developing atherosclerotic
cardiovascular, peripheral arterial and cerebrovascular disease (9-12). People with diabetes
also have a high prevalence of hypertension and abnormalities of lipoprotein metabolism (13).
The United Arab Emirates (UAE) has a cosmopolitan population of about 4.7 million and
exhibits a unique demographic structure. The UAE sits at a crossroads of the trade routes
between Asia and Europe. It has flourished as a contemporary centre of trade and commerce
over the last four decades. People from every part of the world arrive in search of jobs, trade
and business. UAE nationals make up only 19% of the total population, with the balance
comprising expatriates of different ethnic backgrounds. The largest ethnic group are people of
South Asian origin (approximately 50%). Those from other parts of Asia include people from the
Philippines, China, Hong Kong, Indonesia, Singapore and Thailand. These East Asians are
grouped with Caucasians and comprise up to 8% of the population. Iranians comprise 8% and the
rest of the population are from other Arab states (15%). These estimates are based on the
results of the 2005 census, which included a significantly higher estimate of net immigration
of non-citizens than estimates in July 2009 (14, 15).
T2D has become a major public health problem in the UAE. A survey completed by the UAE’s
Ministry of Health reported that the overall percentage of people with diabetes was between
13% and 19% among expatriates who live in the UAE. Furthermore, Malik and colleagues
(16) have estimated that 25% of UAE nationals suffer from diabetes, mainly T2D, and that the
prevalence of the disease is increasing.
In addition, another study conducted by Reed and colleagues (2005) (17, 18) on a random
sample of UAE citizens over the age of 30 living around the city of Al-Ain reported that 20%
of subjects studied suffered from T2D (14% rural to 25% urban). However, the methodology
used may have resulted in underestimation of prevalence by as much as 20%, as recent
studies reported by the Centre for Arab Genomic Studies (CAGS) indicated that the prevalence
of T2D in the UAE rises with increasing age, reaching 40% in people over 30 years. These
observations emphasise the necessity of considering prevention for diabetes in the UAE.
The Emirates Family Registry (EFR) project was conceived to provide a means to more
accurately estimate prevalence through a longitudinal approach. Secondly, it represents an
important tool and resource, as the genomic era gains momentum, towards assisting in
deciphering the complexity of diseases in humans (19). Similar approaches to assess risk
factors of diabetes in other populations have been conducted (20-23). When the EFR project
commenced, the requirement was to establish a registry with a well-defined description of the
disease (i.e. the phenotype) as well as the genetic background of populations of interest (i.e.
the genotype). This resource is currently not available for the ethnic groups of the Arab world.
Therefore the EFR was developed to address this deficiency. The EFR can be used by local
research groups to systematically study common diseases throughout the Middle East region.
It will also be used to develop regional and international collaborations in biomedical science.
The EFR is a register containing information on the local ethnic population of the region,
designed specifically to study the genetic factors that are unique to this region, which will
lead to better patient care, disease management and improved quality of life.
MATERIAL AND METHODS
Emirates Family Registry (EFR)
Three major hospitals and nine primary care centres in the United Arab Emirates (UAE) were
contacted to establish the EFR. Through this collaboration, data from all patients attending
these clinics and hospitals were collected and tabulated in a database. This study was approved
by the ethics committees of the UAE Ministry of Health and Dubai Police Head Quarters (see
appendix). In general, UAE nationals are the majority group visiting the clinics, and the care
centres from which the samples were collected see mostly patients with UAE identity cards.
For non-local visitors, including expatriates, passports are required in order to receive
treatment. Therefore the nationality of each volunteer was determined from their legal
documents. Patients and volunteers were selected randomly. Verbal consent was obtained
from those patients who agreed to allow their name to be added to the registry, and
informed consent was obtained from all individuals who donated blood before commencement
of the study procedures. The procedure for collecting the data and samples is summarised in
Figure 1.
The database of the registry was constructed using Visual Studio 2006. The EFR comprises
two components: (1) a computer database documenting the details of participants of the
registry and (2) a DNA and bio-specimen repository. Data from patients include demographic
data, biochemical results such as haemoglobin A1c (HbA1c), fasting blood glucose, oral
glucose tolerance test (OGTT), lifestyle variables (healthy diet, daily physical activity,
smoking, quality of life), disease complications (neuropathy, nephropathy, retinopathy) and
family history. There are provisions to expand the registry to include different diseases and
their associated clinical and genetic features.
Subjects
A total of 23,064 adults who reside in the UAE volunteered to participate in this study during
a routine visit to one of the three major hospitals and nine primary care centres. Of the total
group, 20,374 were diagnosed with T2D. Overall, 741 UAE nationals donated blood for
biochemical tests to confirm their diagnosis (diabetic, pre-diabetic or healthy) and to study
the risk factors that contribute to developing T2D.
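The cohort figures above can be cross-checked against the row totals reported in Table 1 of the chapter synopsis. The short sketch below only re-adds those published totals; the dictionary layout and variable names are illustrative assumptions, not part of the EFR database.

```python
# Row totals as reported in Table 1 (disease status, ethnicity) -> volunteers.
registry_rows = {
    ("T2D without complication", "Bedouin"): 2687,
    ("T2D without complication", "Others"): 15000,
    ("T2D with cardiovascular complications", "Bedouin"): 2687,
    ("Healthy volunteers", "Bedouin"): 2690,
}

total_volunteers = sum(registry_rows.values())
t2d_volunteers = sum(
    count
    for (status, _ethnicity), count in registry_rows.items()
    if status.startswith("T2D")
)
t2d_percentage = round(100 * t2d_volunteers / total_volunteers)

print(total_volunteers, t2d_volunteers, t2d_percentage)
```

The sums reproduce the 23,064 volunteers and 20,374 T2D patients stated in the text, and the T2D share rounds to the 88% quoted in the abstract.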
Collection of Phenotype data
Trained nurses measured the height and weight of each participant using a calibrated wall-
mounted stadiometer and a weighing scale, respectively. Body Mass Index (BMI, in kg/m2) was
calculated as weight in kilograms (kg) divided by the square of height in metres (m). Waist
Circumference (WC) was measured in inches. Overweight and obesity were defined according to
the criteria provided by the World Health Organization (WHO). Under the WHO classification,
overweight corresponds to a BMI between 25 and 30 kg/m2. High waist circumference
was defined as ≥ 35 inches for females and ≥ 40 inches for males.
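The anthropometric definitions above can be sketched as follows. The overweight range and the waist circumference thresholds are those stated in the text; treating a BMI of 30 kg/m2 or more as obese follows the usual WHO cut-off, and the function names and example values are illustrative assumptions.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI in kg/m2: weight divided by the square of height."""
    return weight_kg / height_m ** 2


def bmi_category(bmi: float) -> str:
    """Classify BMI using the overweight (25 to 30) and obese (>= 30) ranges."""
    if bmi >= 30:
        return "obese"
    if bmi >= 25:
        return "overweight"
    return "not overweight"


def has_high_waist(waist_inches: float, sex: str) -> bool:
    """High waist circumference: >= 35 in (female) or >= 40 in (male)."""
    thresholds = {"female": 35.0, "male": 40.0}
    return waist_inches >= thresholds[sex]


# Hypothetical participant, for illustration only.
bmi = body_mass_index(weight_kg=85.0, height_m=1.70)
print(round(bmi, 2), bmi_category(bmi), has_high_waist(41.0, "male"))
```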
Biochemistry Profile
Up to 5 ml of peripheral blood was drawn from 741 UAE nationals and collected in EDTA,
Heparin and Fluoride vacutainers for biochemical tests. Fluoride and Heparin tubes were
centrifuged at 3,000 rpm for 5 minutes and serum was collected. Serum from the Fluoride tubes
was used to measure fasting glucose, total cholesterol and oral glucose tolerance, and serum
from the Heparin tubes was used to measure triglyceride, urea and creatinine levels. 25 µl of
blood from the EDTA tube was used to measure haemoglobin component A1c (HbA1c).
An individual was classified as diabetic if the subject (1) was diagnosed with diabetes by a
qualified physician, (2) was on a prescribed drug treatment regime for diabetes and (3) had
biochemical test results that were consistent with the criteria laid down in the World Health
Organization (WHO) consultation group report, which specifies a fasting plasma glucose level of
at least 126 mg/dl. Testing for impaired glucose tolerance was performed only on subjects who did
not have diabetes when enrolled in this study. Individuals were classified as pre-diabetic
if the 2-hour post-glucose level was 140 mg/dl or more, and as having normal glucose
tolerance if the 2-hour post-glucose level was less than 140 mg/dl.
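The classification rules above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, and the "unclassified" outcome for a non-diabetic subject without a 2-hour glucose value is an assumption that is not described in the text.

```python
from typing import Optional

FPG_DIABETIC_MG_DL = 126.0    # fasting plasma glucose cut-off (WHO criterion cited above)
TWO_HOUR_CUTOFF_MG_DL = 140.0  # 2-hour post-glucose cut-off used for pre-diabetes


def classify(physician_diagnosis: bool,
             on_diabetes_medication: bool,
             fasting_glucose_mg_dl: float,
             two_hour_glucose_mg_dl: Optional[float] = None) -> str:
    """Return 'diabetic', 'pre-diabetic', 'healthy' or 'unclassified'."""
    # All three criteria must hold for the diabetic group.
    if (physician_diagnosis and on_diabetes_medication
            and fasting_glucose_mg_dl >= FPG_DIABETIC_MG_DL):
        return "diabetic"
    # The glucose tolerance test applies only to non-diabetic subjects.
    if two_hour_glucose_mg_dl is None:
        return "unclassified"  # assumption: no rule is given for this case
    if two_hour_glucose_mg_dl >= TWO_HOUR_CUTOFF_MG_DL:
        return "pre-diabetic"
    return "healthy"


print(classify(True, True, 180.0))
print(classify(False, False, 105.0, 159.0))
print(classify(False, False, 90.0, 99.0))
```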
All the biochemical tests were performed at Al-Baraha Hospital using the Cobas Integra 800
clinical chemistry system (Roche Diagnostics, Indianapolis, USA).
Statistical analysis
The p values (probability values) for each phenotype studied were calculated using Dunnett's
Multiple Comparison Test in GraphPad Prism version 5.0. The mean, standard deviation and
percentages were calculated from data entered into a Microsoft Excel spreadsheet. A p value <
0.05 was regarded as statistically significant for a two-sided test.
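The descriptive summaries and the significance threshold can be illustrated with a few lines of standard-library Python. This is a sketch only: Dunnett's test itself was run in GraphPad Prism and is not reproduced here, the glucose readings are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption.

```python
from statistics import mean, stdev


def mean_sd(values):
    """Return (mean, sample standard deviation), each rounded to 2 decimals."""
    return round(mean(values), 2), round(stdev(values), 2)


def percent(count, total):
    """Share of `count` in `total` as a percentage with 2 decimals."""
    return round(100.0 * count / total, 2)


def is_significant(p_value, alpha=0.05):
    """Apply the threshold used in this chapter (p < 0.05)."""
    return p_value < alpha


# Hypothetical fasting glucose readings (mg/dl), for illustration only.
readings = [88.0, 92.0, 95.0, 101.0, 84.0]
print(mean_sd(readings))

# 185 smokers among the 741 participants (counts from Table 3).
print(percent(185, 741))

# HbA1c p values from Table 5: healthy vs. pre-diabetic, healthy vs. diabetic.
print(is_significant(0.0565), is_significant(0.0484))
```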
RESULTS
The establishment of a registry that contains essential clinical information linked to genomic
data is vital to a greater understanding of disease mechanisms in the local ethnic groups of
the UAE. The flow chart in Figure 1 shows the well-defined path followed by all patients who
volunteer to participate in the EFR. The patient is interviewed and consent is
obtained during a routine visit to the primary care centre or hospital. The data collected from
the patient are entered into the database and become part of the overall data of the registry.
The patient’s disease status is assessed, and the appropriate specimen types are recommended and
collected. Subsequently, as data from the analyses become available, they are entered into the database.
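The recruitment path described above can be sketched as a simple record that is filled in step by step. Everything in this Python example is hypothetical (class, field and function names, the identifier format and the example values); it is not the schema of the EFR database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RegistryEntry:
    """One volunteer's record; all field names are illustrative."""
    volunteer_id: str
    consent_given: bool
    disease_status: str = "unassessed"
    specimens: List[str] = field(default_factory=list)
    results: Dict[str, float] = field(default_factory=dict)


def enrol(registry: Dict[str, RegistryEntry],
          volunteer_id: str, consent_given: bool) -> RegistryEntry:
    """Interview and consent step: without consent there is no entry."""
    if not consent_given:
        raise ValueError("consent is required before enrolment")
    entry = RegistryEntry(volunteer_id, consent_given)
    registry[volunteer_id] = entry
    return entry


def assess(entry: RegistryEntry, disease_status: str,
           specimens: List[str]) -> None:
    """Record the assessed disease status and the specimens collected."""
    entry.disease_status = disease_status
    entry.specimens.extend(specimens)


def add_result(entry: RegistryEntry, test_name: str, value: float) -> None:
    """Attach an analysis result once it becomes available."""
    entry.results[test_name] = value


registry: Dict[str, RegistryEntry] = {}
entry = enrol(registry, "EFR-0001", consent_given=True)
assess(entry, "T2D", ["EDTA blood", "Fluoride blood"])
add_result(entry, "fasting_glucose_mg_dl", 131.0)
print(registry["EFR-0001"])
```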
Table 1 provides a summary of the data that had been entered into the registry at the time this
manuscript was compiled. As patients’ data are entered as shown in Figure 1, they
accumulate and increase the amount of information available for analysis. Over the lifetime
of the registry this information will become an important resource. To date, the EFR contains
information on 23,064 individuals, of whom 60% were between 40 and 59 years old.
Female volunteers comprise 56% of the entries in the database. Almost 30% are UAE
nationals and 88.3% were diagnosed with T2D.
The registry was set up to collect data to allow estimates of the percentage of the population
burdened with diabetes. In time, the overall prevalence of the disease throughout the
population will be determined. Figure 2 shows a breakdown of the information collected for
the separate age categories studied. Approximately 3% of T2D patients were under the age
of 20, about 13% were aged between 20 and 39 years, 59% were aged between 40 and
59 years and more than 24% were aged 60 years or older. The age breakdown was included in the
study design because most studies show that younger populations
throughout the world are succumbing to T2D.
Figure 1: Flow chart outlining the process of recruiting volunteers into the Emirates Family
Registry from three major hospitals and nine primary care centres in the UAE.
The Emirates Family Registry consists of (a) associated database resources that
contain demographic, clinical and genetic data and (b) a bio-specimen repository.
Table 1: Characteristics of the 23,064 individuals in the Emirates Family Registry.

Characteristic          Value (n)    Percent (%)
Age (years)
  18-20                     1,014           4.40
  21-39                     3,281          14.22
  40-59                    13,126          59.91
  60+                       5,642          24.46
Gender
  Male                     10,059          43.61
  Female                   13,005          56.39
Ethnicity
  UAE National              6,904          29.93
  Others*                  16,160          70.07
Disease Status
  Type 2 Diabetes          20,374          88.34
  Healthy                   2,690          11.66

*Consists of African (124; 0.54%), Asian (14,587; 63.25%), Caucasian (348; 1.51%), Middle Eastern
(except UAE) (1,097; 4.76%) and South American (4; 0.017%) individuals who were residents of
the United Arab Emirates during the study period.
Figure 2: Chart estimating the percentage of Type 2 Diabetes patients by age group in the Emirates Family Registry.
Approximately 3% of T2D patients were under the age of 20, about 13% were aged between 20 and 39 years,
59% were aged between 40 and 59 years and more than 24% were aged 60 years or older.
The EFR reflects the ethnic diversity of the UAE population, and in Table 2 the volunteers are
separated into East Asia, Central Asia and the Middle East. Unfortunately, the local Ministry of
Economy has chosen to combine the minority groups into one category, which combines the
distinct genetic groups of the Orient (East Asia) with Caucasians (western group). Apart from
this discrepancy, information is readily available by country, giving data that are more
specific to each population. Overall, UAE nationals account for 21% of the T2D patients in the EFR.
However, the EFR revealed a higher percentage of T2D in other ethnic groups such as Indians
(33%), because most of the patients at one of the major hospitals were of Indian origin at the
time this study was carried out.
The issue of screening for T2D is important both in terms of an individual’s health and
day-to-day clinical practice and for a country's overall public health system. One of the advantages
of the screening process set out in the EFR programme is the identification of individuals at risk
of having undiagnosed T2D or at risk of developing T2D, which will play an important role in
preventing complications of the disease. Tables 3 and 4 show the risk factors that affect the 741 UAE
nationals who volunteered for the study. There were three groups: those diagnosed with T2D,
those with pre-T2D and healthy individuals. The physical appearance, lifestyle, family
history and biochemical test results of each volunteer were recorded. With regard to
physical appearance, 26.18% of the UAE participants are overweight and 7% are obese. Of the
741 UAE nationals, 39% have a large waist circumference (males and females combined). Additionally,
the lifestyle features in Table 3 show that 58% of the patients have unhealthy food such as
fast food in their diet and 45% do not perform any kind of exercise
(a minimum of 30 minutes of walking a day). Genetic heritability is another risk factor in
developing the disease; Table 3 shows that 65% have a history of T2D in their family (at least
one parent diagnosed with T2D) and 35% are in consanguineous marriages. For the
biochemical tests performed, the percentages of participants with results associated with the
disease are summarised in Table 3.
Table 5 summarises the p values for the comparisons between the healthy group and the pre-T2D group
and between the healthy group and the T2D group, calculated using Dunnett's Multiple Comparison
Test. Among the physical appearance features, age is the most significant risk factor in
developing T2D (p = 0.0065). Lipid profile measures such as cholesterol and triglycerides show
significant p values (0.0018 and 0.0023, respectively) when healthy individuals are compared to
diabetic patients.
Table 2: In 2009, the UAE’s population was estimated at 4.7 million, of which 19% were UAE nationals, while the majority of the
population were expatriates. The largest group was of South Asian origin (50%). Those from other parts of Asia
(including the Philippines, China, Hong Kong, Indonesia, Singapore and Thailand) and those of Caucasian origin
comprised up to 8% of the population, while Iranians comprised 8% of the population and the rest of the population
were from other Arab states (15%). It was estimated that close to 20 percent of UAE nationals have Type 2 Diabetes.
However, the percentage of adults with Type 2 Diabetes is higher in other groups, affecting almost 52.67% of South
Asians who live in the United Arab Emirates.

Ethnic Group               Percent of Ethnic   Number of T2D   Percent of T2D   Prevalence of T2D
                           Group in UAE        in EFR          in EFR           per 100,000
South Asian                50%                 10,732          52.67%           447.31
UAE National               19%                  4,214          20.68%           462.21
Other Arab                 15%                  3,961          19.44%           550.31
Caucasian + East Asian*     8%                    504           2.47%           131.29
Iranian (Persians)          8%                    487           2.39%           126.86

* According to the UAE census bureau, Caucasians and East Asians were consolidated into one minority group.
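The last column of Table 2 can be read as EFR cases divided by the estimated size of each ethnic group, scaled to 100,000. The Python sketch below assumes that reading; the function name is hypothetical and the total population is left as a parameter. The example uses the rounded 4.7 million from the caption, so its output is not expected to reproduce the tabulated values exactly.

```python
def cases_per_100k(cases: int, group_share: float, total_population: float) -> float:
    """EFR cases per 100,000 people in an ethnic group.

    group_share is the group's fraction of the total population
    (for example 0.19 for UAE nationals).
    """
    if not 0 < group_share <= 1 or total_population <= 0:
        raise ValueError("group_share must be in (0, 1] and population positive")
    group_population = group_share * total_population
    return 100_000 * cases / group_population


# UAE nationals: 4,214 cases, 19% of an assumed 4.7 million residents.
print(round(cases_per_100k(4214, 0.19, 4_700_000), 2))
```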
Table 3: Characteristics of clinical data for 741 UAE national adults (Type 2 Diabetes, pre-Type 2
Diabetes and healthy individuals).

Category              Subcategory                     Level                        Value, n (%)
Physical Appearance   Gender                          Male                         470 (63.43%)
                                                      Female                       271 (36.57%)
                      Age* (years)                    18-20                          5 (0.67%)
                                                      21-39                        246 (33.20%)
                                                      40-59                        396 (53.44%)
                                                      60+                           76 (10.26%)
                      Body Mass Index (BMI, kg/m2)    Underweight <18.50            27 (3.64%)
                                                      Normal range 18.50-24.99     329 (44.40%)
                                                      Overweight 25.00-29.99       294 (39.67%)
                                                      Obese ≥30.00                  91 (12.28%)
                      Waist Circumference (WC)        Male ≤40 in                  315 (42.51%)
                                                      Male >40 in                  155 (20.92%)
                                                      Female ≤35 in                135 (18.22%)
                                                      Female >35 in                136 (18.35%)
Lifestyle             Smoking                         Yes                          185 (24.97%)
                                                      No                           556 (75.03%)
                      Physical Activity               Yes                          405 (54.66%)
                                                      No                           336 (45.34%)
                      Diet                            Yes                          314 (42.38%)
                                                      No                           427 (57.62%)
Inheritance           Family History                  Yes                          479 (64.64%)
                                                      No                           262 (35.36%)
                      Consanguinity Marriage          Yes                          259 (34.95%)
                                                      No                           482 (65.05%)
Biochemistry Test     Fasting Plasma Glucose          <100 mg/dl                   340 (45.88%)
                                                      100-125 mg/dl                247 (33.33%)
                                                      ≥126 mg/dl                   154 (20.78%)
                      Oral Glucose Tolerance          <140 mg/dl                   349 (47.10%)
                                                      ≥140 mg/dl                   224 (30.23%)
                      HbA1c                           <6.5 %                       542 (73.14%)
                                                      ≥6.5 %                       199 (26.86%)
                      Cholesterol                     <200 mg/dl                   495 (66.80%)
                                                      ≥200 mg/dl                   246 (33.20%)
                      Serum Triglycerides             <150 mg/dl                   430 (58.03%)
                                                      ≥150 mg/dl                   311 (41.97%)
                      Urea                            <43 mg/dl                    707 (95.41%)
                                                      ≥43 mg/dl                     34 (4.59%)
                      Creatinine                      <1.3 mg/dl                   675 (91.09%)
                                                      ≥1.3 mg/dl                    66 (8.91%)

*The numbers in the age categories do not add up to 741 because some individuals under 18 years old were not included in this study.
Table 4: Clinical and biochemical features of 391 patients diagnosed with T2D or pre-T2D and 350 healthy individuals.
Values with ± are mean ± standard deviation; Life Style and Inheritance rows give the percentage of each group.

Characteristic                        Type 2 Diabetes                 Pre-Type 2 Diabetes             Healthy
                                      Male            Female          Male            Female          Male            Female
                                      n = 85          n = 83          n = 167         n = 56          n = 218         n = 132
Physical Appearance
  Age (years)                         51.75 ± 9.17    50.96 ± 11.01   37.94 ± 9.83    30.66 ± 9.17    48.39 ± 12.95   44.93 ± 9.93
  BMI (kg/m2)                         33.47 ± 7.21    31.94 ± 7.97    28.37 ± 6.75    29.11 ± 8.40    23.92 ± 4.01    23.84 ± 4.06
  Waist Circumference (inch)          41.42 ± 10.16   41.62 ± 6.78    44.52 ± 13.36   42.43 ± 11.41   33.71 ± 6.94    33.50 ± 8.29
Life Style
  Smoking (%)                         49.41           10.84           40.72           1.79            28.90           1.52
  Physical Activity (%)               32.94           28.92           28.74           12.50           87.16           81.82
  Diet (%)                            23.53           16.87           3.59            5.36            75.69           80.30
Inheritance
  Family History (%)                  63.53           71.08           55.69           78.57           3.67            3.03
  Consanguinity Marriage (%)          50.59           51.81           38.92           30.36           29.82           19.70
Biochemical Test
  Fasting Plasma Glucose (mg/dl)      179.91 ± 48.88  160.52 ± 38.99  105.37 ± 9.41   109.25 ± 7.78   89.77 ± 6.36    88.98 ± 7.54
  Impaired Glucose Tolerance (mg/dl)  -               -               159.61 ± 13.16  158.18 ± 14.03  99.73 ± 9.61    98.91 ± 8.37
  HbA1c (%)                           8.24 ± 1.73     7.86 ± 1.90     6.42 ± 1.48     5.98 ± 0.46     4.97 ± 0.63     4.92 ± 0.60
  Cholesterol (mg/dl)                 239.78 ± 35.95  227.98 ± 54.31  195.37 ± 15.42  203.13 ± 15.40  116.11 ± 19.48  122.43 ± 16.40
  Serum Triglycerides (mg/dl)         188.85 ± 35.32  178.11 ± 51.18  153.22 ± 17.98  155.98 ± 19.29  91.60 ± 19.16   100.16 ± 11.38
  Urea (mg/dl)                        43.47 ± 6.77    32.27 ± 7.35    25.17 ± 5.98    23.93 ± 5.92    21.95 ± 5.13    21.55 ± 5.31
  Creatinine (mg/dl)                  1.24 ± 0.16     1.04 ± 0.23     0.81 ± 0.19     0.80 ± 0.22     0.88 ± 0.13     0.85 ± 0.12
DISCUSSION
The importance of a thorough and well-maintained database for significant disease entities
cannot be overstated. Diabetes is an overwhelming healthcare problem throughout the world,
and in the UAE, studies have shown that around 20% of the population has T2D.
The EFR was conceived as a resource to manage T2D in the UAE. It provides data that is
available through clinical testing and DNA screening. The data is stored in a systematic way
within a database and is coupled to a DNA and bio-bank repository to facilitate future
longitudinal studies. To the best of our knowledge such an effort has not been undertaken for
the Arab population. By breaking down the data in different ways, it is easy to establish what
particular strategy might be employed to improve disease management.
The process is the same for each patient and is carried out in the same way, which
simplifies the decision-making process for healthcare workers and allows for consistency in
the material collected. As shown in Figure 1, at the initial consultation the status of each
individual is assessed and specific protocols are followed. A specific questionnaire and an
assessment of clinical information within the UAE healthcare database allow for a first-pass
screen to determine disease status. The volunteers are categorised according to disease status,
and a decision is made as to the nature of the bio-specimens that need to be collected. Using
T2D as a case in point, biochemical tests relating to glucose, triglycerides and others are
requested along with a sample for research purposes. Since there is a lack of information on
genetic factors that predispose Arabs to T2D, DNA samples are stored for present and future
studies. The value of the EFR as a DNA data bank will increase over time as more volunteers are
recruited and genetic studies are completed. It has, thus far, been quite successful, but by
increasing the number of patients in the DNA pool, we also increase our ability to identify gene
polymorphisms that may be related to T2D in Arabs.
When T2D is identified in its early stages, it can be treated, reducing the impact of the disease
and the severity of its complications. However, there are even greater possibilities if the disease
could be reduced through DNA testing. By having a large database of patients, we are more likely
to determine the nature of the genetic polymorphisms that predispose to disease and
possibly the underlying mechanisms, giving rise to the potential for therapy. DNA research in
general has come into its own only over the last two decades, but research in diabetes, and
particularly an understanding of the genetic makeup that is associated with T2D in Arabs, is
desperately needed at this time.
This database will give clinicians and researchers access to information that can
make a tremendous difference to the nature of the treatment that might be put in place to manage
the disease. For example, as illustrated in Table 2, there is a higher number of females who are
affected by T2D in the UAE. Further, the prevalence of the disease increases with age (Figure
2). In addition, specific physical attributes and lifestyle habits (Table 4) are associated with the
disease. A complication related to T2D is metabolic syndrome. Hypercholesterolemia is one
of many significant problems, with levels above 200 mg/dl indicative of disease (Tables 3 and
4). With this information, and in combination with other factors, physicians can monitor patients
with hypercholesterolemia and be aware of signs that could indicate a progression to T2D.
Currently, over 170 million people around the globe suffer from T2D. Most of these patients
are middle aged; however, variations in this regard are not rare and are affected by factors
such as lifestyle and heredity, as well as behavioural factors (24). In this study, Table 3 shows
that young patients aged 30 years with a large waist circumference have fasting blood sugar levels of
108 mg/dl or greater and HbA1c levels of 6.42% (data not shown). It is also noted in Figure 2
that the 40 to 59 year old group is the largest group, but the group of 20 to 39 year olds is not
far behind, with 10% diagnosed with T2D.
The risk of T2D increases as an individual grows older, especially after the age of 45 years. It has
been estimated that one out of five people aged 20 to 79 lives with this disease. Part of the
reason is that as people grow older they tend to become less physically active, and they
gradually lose muscle mass and gain weight (25). However, over recent years a dramatic rise
in T2D among individuals in their 30s and 40s has been observed, and more children and
teenagers are being diagnosed with the disease.
Public awareness campaigns can be used to help reverse the alarming trend of increasing
prevalence. Over the past decade it has become obvious that the
prevalence of T2D is increasing rapidly. Unless appropriate action is taken, it is predicted that
there will be at least 350 million people in the world with T2D by the year 2030 (26).
Risk factors for T2D are well defined. These include obesity, physical inactivity, older
age, a family history of diabetes and impaired glucose tolerance. Table 4
illustrates the importance of maintaining a healthy physique, especially BMI and waist
circumference, and a healthy lifestyle; biochemical testing to monitor physiological changes is
also essential. Abnormal fasting glucose above 126 mg/dl, triglycerides above 150 mg/dl,
cholesterol above 200 mg/dl, and an elevated BMI and waist circumference can mean that the
patient already has metabolic syndrome.
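The thresholds listed above can be checked together in a short Python sketch. It is a screening illustration, not a diagnostic definition of metabolic syndrome: the function name is hypothetical, "elevated BMI" is assumed to mean 25 kg/m2 or more, and the waist cut-offs are those given in the methods (35 inches for females, 40 inches for males). The example inputs are the group means for males in Table 4.

```python
def risk_flags(fasting_glucose_mg_dl, triglycerides_mg_dl, cholesterol_mg_dl,
               bmi_kg_m2, waist_in, sex):
    """Return the names of the measurements that exceed their threshold."""
    waist_cutoff = {"female": 35.0, "male": 40.0}[sex]
    checks = [
        ("fasting glucose", fasting_glucose_mg_dl > 126.0),
        ("triglycerides", triglycerides_mg_dl > 150.0),
        ("cholesterol", cholesterol_mg_dl > 200.0),
        ("BMI", bmi_kg_m2 >= 25.0),            # assumption: overweight or above
        ("waist circumference", waist_in >= waist_cutoff),
    ]
    return [name for name, raised in checks if raised]


# Group means for males from Table 4: T2D group, then healthy group.
print(risk_flags(179.91, 188.85, 239.78, 33.47, 41.42, "male"))
print(risk_flags(89.77, 91.60, 116.11, 23.92, 33.71, "male"))
```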
Previous studies have shown that a family history of T2D is a very important indicator for
developing T2D (27, 28). Among the 741 UAE nationals who donated blood for this study,
63% of males and 71% of females who are diabetic have first-degree relatives with T2D,
whereas only 3.6% of males and 6.06% of females without the disease have a history of T2D in their
family. The EFR has focused on collecting DNA from families to study T2D on the premise that
having one or more first-degree relatives with T2D increases the odds of having the disease.
Further, the use of families provides a degree of redundancy within a registry. Over time,
patients are lost to the system due to migration. These individuals can be readily tracked by
contacting family members to discern their whereabouts.
Consanguineous marriage with first-degree relatives was more common among individuals with T2D
than in the healthy group, an observation that is consistent
with a study by Bener et al. (2005), which showed that consanguineous marriages were more
prevalent in T2D patients (29). This study also confirms previous studies (28) that show T2D to be
closely associated with overweight and obesity (BMI > 25). Okosun and his group (1998)
showed in their study that a large waist circumference is the strongest indicator of T2D risk
(30). Data in Table 4 show that male patients have larger waist circumferences than their
female counterparts.
Additionally, the data in Table 4 show that smoking is more common among T2D and pre-T2D patients
than among healthy individuals. It has been suggested that smoking increases the risk of
diabetes but the evidence has been inconclusive. It is not surprising that smoking plays an
important role, as there is evidence that smoking is bad for the pancreas, causes internal
inflammation and increases the hormones that increase abdominal fat even in thin smokers,
which could hamper the action of insulin (31). Table 5 summarises information
to be used especially by those in primary care clinics. With access to a public health database
such as the EFR, physicians can identify deficiencies and areas where diagnostic processes are poor,
resulting in diabetics being missed. This in turn allows them to determine processes that might
allow for earlier diagnosis, follow-up and prevention of complications.
As with the association between disease and family relationships, ethnicity is another risk
factor. African Americans, Hispanic or Latino Americans, American Indians, and some Asian
Americans and Pacific Islanders are at particularly high risk for T2D (32). Some component of
this factor is most likely related to genes carried from earlier times and passed down through
generations. The data collected in the EFR and tabulated in Table 2 support this, showing
varying degrees of prevalence among populations of different races.
The nature of a DNA profile or genetic makeup is generally population specific and can provide
leads towards best practice for care. To date, the nature of the genetic lesions that lead to T2D
in Arabs is not known. One of the primary objectives of the EFR was to provide a resource to
study the genes of indigenous Arab populations. The DNA repository, when coupled with
longitudinal data, will provide opportunities for researchers to dissect different variations of
the disease and for physicians to determine which long-term management procedures might
be used for monitoring patients.
In summary, the data collected from volunteers through the EFR have revealed that
obesity, large waist circumference, consanguineous marriage, family history, lack of physical
activity, and an unhealthy diet with high total cholesterol and triglyceride levels were more
prevalent in T2D patients. The p values for the comparisons between the healthy group and the
pre-T2D group and between the healthy group and the T2D group are summarised in Table 5.
What is known is that there are both monogenic and polygenic forms of the conditions
that manifest as T2D, which can occur in a wide variety of forms. While the simple classification
of diabetes into Type 1 Diabetes and Type 2 Diabetes is helpful in unlocking the secrets of the
disease, it has not resulted in the identification of clear-cut factors that distinguish the two
forms of the disease in the Arab population. Therefore, a more extensive period of continuous
research is required to understand the true nature of this disease. We believe that the longitudinal
nature of the EFR will allow researchers to assess whether there are confounding
environmental factors or whether a different set of genes accounts for earlier onset T2D.
Table 5: p-value generated by Dunnett's Multiple Comparison Test between Healthy
individuals and Pre-Diabetic and Diabetic patients.

Risk Factors                         p value
                                     Healthy vs. Pre-Diabetic   Healthy vs. Diabetic
Physical Appearance
  Age (years)                        0.0079                     0.0065
  BMI (kg/m2)                        0.0121                     0.0113
  Waist Circumference (inch)         0.0083                     0.0085
Biochemical Test
  Fasting Plasma Glucose (mg/dl)     0.0032                     0.0025
  HbA1c (%)                          0.0565                     0.0484
  Total Serum Cholesterol (mg/dl)    0.0020                     0.0018
  Serum triglycerides (mg/dl)        0.0025                     0.0023
  Urea (mg/dl)                       0.0138                     0.0107
  Creatinine (mg/dl)                 0.2792                     0.2498
CONCLUSION
The Emirates Family Registry or EFR was developed in pursuit of several outcomes: (1)
studying lifestyle variables and other exposures that may be related to the development of
Diabetes Mellitus, (2) evaluating patient awareness about the disease and developing new
trends in disease prevention and management for the UAE; and (3) categorising patients and
their families based on disease complications which may imply different pathophysiology and
therefore different susceptibility genes.
The pilot programme of the EFR described here was successful. The data presented
throughout this paper could not have been gathered any other way, as the tightly knit Bedouin
communities, which are essentially closed to technological advances, could only be
approached through key members at the upper end of the family hierarchy. The study has
provided an initial dataset collected from a large number of volunteers (23,064). The
information gained is useful in many ways, including for genome wide association studies to
identify contributing polymorphisms; these data will be entered into the database when available.
Analysis of the information within the database has revealed much about T2D in the UAE,
allowing for the possibility of earlier diagnosis, treatment and intervention. This information
could help in the diagnosis and treatment of diabetes even before the patient has symptoms, in
the silent stage. It is extremely important to continue to add patients to the database as they
are found and treated, as well as individuals who do not presently have the disease. This
kind of study and the continued collection of data could lead to the genomic studies needed to
control diabetes. This would be of great benefit to patients, families and the healthcare
system of any country.
To date, the lack of genome wide association studies leaves very little to be discussed
regarding the genetic basis of diabetes in the Arab countries. Therefore, there is a
pressing need for the development of genome wide association studies for populations of the
UAE and other Arab nations.
ACKNOWLEDGEMENTS
Publication number HA09-0005 of the Centre for Forensic Science at the University of
Western Australia. Ms Alsafar is a PhD scholar at the University of Western Australia
supported by the Dubai Police General Head Quarters in the United Arab Emirates. Ethics
approval was obtained from the United Arab Emirates Ministry of Health committee. Funding
for this project was provided by the Emirates Foundation. We would like to thank the
Al-Baraha Hospital and the Dubai Police Clinic for assisting with the biochemical tests performed in
this study.
REFERENCES
1. Leslie RD. Metabolic changes in diabetes. Eye (Lond). 1993;7 ( Pt 2):205-8.
2. Chandy A, Pawar B, John M, Isaac R. Association between diabetic nephropathy and
other diabetic microvascular and macrovascular complications. Saudi J Kidney Dis
Transpl. 2008 Nov;19(6):924-8.
3. American Diabetes Association: National Diabetes Fact
Sheet. Alexandria, VA, ADA. 2002.
4. Centers for Disease Control and Prevention. National Diabetes Fact Sheet, General
Information and National Estimates on Diabetes in the United States. Atlanta, U.S.:
Department of Health and Human Services, Centers for Disease Control and
Prevention; 2007.
5. Centers for Disease Control and Prevention Coordinating Center for Health Promotion.
Diabetes: Successes and Opportunities for Population-Based Prevention and Control
At-A-Glance; 2009.
6. Diagnosis and Classification of Diabetes Mellitus. Diabetes Care. 2004
Jan;27(suppl 1):s5-s10.
7. Eaks GA, Tiszka R. Chronic complications of diabetes: a creative management
approach. Nurse Pract Forum. 1998 Jun;9(2):74-86.
8. Goldstein I. The mutually reinforcing triad of depressive symptoms, cardiovascular
disease, and erectile dysfunction. Am J Cardiol. 2000 Jul 20;86(2A):41F-5F.
9. Pan WH, Cedres LB, Liu K, Dyer A, Schoenberger JA, Shekelle RB, et al. Relationship
of clinical diabetes and asymptomatic hyperglycemia to risk of coronary heart disease
mortality in men and women. Am J Epidemiol. 1986 Mar;123(3):504-16.
10. Uusitupa MI, Niskanen LK, Siitonen O, Voutilainen E, Pyorala K. 5-year incidence of
atherosclerotic vascular disease in relation to general risk factors, insulin level, and
abnormalities in lipoprotein composition in non-insulin-dependent diabetic and
nondiabetic subjects. Circulation. 1990 Jul;82(1):27-36.
11. Kannel WB, D'Agostino RB, Wilson PW, Belanger AJ, Gagnon DR. Diabetes,
fibrinogen, and risk of cardiovascular disease: the Framingham experience. Am Heart J.
1990 Sep;120(3):672-6.
12. Laakso M, Kuusisto J. Epidemiological evidence for the association of hyperglycaemia
and atherosclerotic vascular disease in non-insulin-dependent diabetes mellitus. Ann
Med. 1996 Oct;28(5):415-8.
13. Malecki MT, Klupa T. Type 2 diabetes mellitus: from genes to disease. Pharmacol Rep.
2005;57 Suppl:20-32.
14. El-Sharkawy T. Diabetes in the United Arab Emirates and Other Arab Countries: need
for Epidemiological and Genetic Studies. Genetic Disorders in the Arab World. Dubai:
Centre for Arab Genomic Studies; 2004. p. 57.
15. Expat numbers rise rapidly as UAE population touches 6m: Department of Economic
and Social Affairs Population Division; 2009.
16. Malik M, Bakir A, Saab BA, King H. Glucose intolerance and associated factors in the
multi-ethnic population of the United Arab Emirates: results of a national survey.
Diabetes Res Clin Pract. 2005 Aug;69(2):188-95.
17. Reed RL, Revel AD, Carter AO, Saadi HF, Dunn EV. A controlled before-after trial of
structured diabetes care in primary health centres in a newly developed country. Int J
Qual Health Care. 2005 Aug;17(4):281-6.
18. Saadi H, Carruthers SG, Nagelkerke N, Al-Maskari F, Afandi B, Reed R, et al.
Prevalence of diabetes mellitus and its complications in a population-based sample in Al
Ain, United Arab Emirates. Diabetes Res Clin Pract. 2007 Dec;78(3):369-77.
19. Niazi TN, Cannon-Albright LA, Couldwell WT. Utah Population Database: a tool to
study the hereditary element of nonsyndromic neurosurgical diseases. Neurosurg Focus.
Jan;28(1):E1.
20. Nystrom L, Dahlquist G, Ostman J, Wall S, Arnqvist H, Blohme G, et al. Risk of
developing insulin-dependent diabetes mellitus (IDDM) before 35 years of age:
indications of climatological determinants for age at onset. Int J Epidemiol. 1992
Apr;21(2):352-8.
21. Phillips P, Wilson D, Beilby J, Taylor A, Rosenfeld E, Hill W, et al. Diabetes
complications and risk factors in an Australian population. How well are they managed?
Int J Epidemiol. 1998 Oct;27(5):853-9.
22. Sekikawa A, Eguchi H, Tominaga M, Manaka H, Sasaki H, Chang YF, et al. Evaluating
the reported prevalence of type 2 diabetes mellitus by the Oguni diabetes registry using a
two-sample method of capture-recapture. Int J Epidemiol. 1999 Jun;28(3):498-501.
23. Villegas R, Shu XO, Li H, Yang G, Matthews CE, Leitzmann M, et al. Physical activity
and the incidence of type 2 diabetes in the Shanghai women's health study. Int J
Epidemiol. 2006 Dec;35(6):1553-62.
24. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, et al. A genome-
wide association study of type 2 diabetes in Finns detects multiple susceptibility
variants. Science. 2007 Jun 1;316(5829):1341-5.
25. Oguma Y, Sesso HD, Paffenbarger RS, Jr., Lee IM. Weight change and risk of
developing type 2 diabetes. Obes Res. 2005 May;13(5):945-51.
26. Wild S, Roglic G, Green A, Sicree R, King H. Global prevalence of diabetes: estimates
for the year 2000 and projections for 2030. Diabetes Care. 2004 May;27(5):1047-53.
27. de Costa CM. Consanguineous marriage and its relevance to obstetric practice. Obstet
Gynecol Surv. 2002 Aug;57(8):530-6.
28. Chen Y, Rennie DC, Dosman JA. Synergy of BMI and family history on diabetes: the
Humboldt Study. Public Health Nutr. 2009 Aug 26:1-5.
29. Bener A, Zirie M, Al-Rikabi A. Genetics, obesity, and environmental risk factors
associated with type 2 diabetes. Croat Med J. 2005 Apr;46(2):302-7.
30. Okosun IS, Cooper RS, Rotimi CN, Osotimehin B, Forrester T. Association of waist
circumference with risk of hypertension and type 2 diabetes in Nigerians, Jamaicans,
and African-Americans. Diabetes Care. 1998 Nov;21(11):1836-42.
31. Ding EL, Hu FB. Smoking and type 2 diabetes: underrecognized risks and disease
burden. JAMA. 2007 Dec 12;298(22):2675-6.
32. Centers for Disease Control and Prevention, National Diabetes Fact Sheet: General
Information and National Estimates on Diabetes in the United States. Department of
Health and Human Services. 2005.
CHAPTER 3
HERITABILITY OF QUANTITATIVE TRAITS
ASSOCIATED WITH TYPE 2 DIABETES IN AN
EXTENDED FAMILY FROM THE UNITED ARAB
EMIRATES.
This chapter was submitted to the International Journal of Diabetes and Metabolism in the
recommended format presented in the "Instruction to Authors" from the publishing house.
Chapter 3
Heritability of Quantitative Traits Associated with Type 2
Diabetes in an Extended Family from the United Arab
Emirates.
Chapter 3 was prepared as a manuscript which was submitted to International Journal of
Diabetes and Metabolism. In this chapter, the influence of environmental factors in the
pathophysiology of Type 2 Diabetes (T2D) and its related phenotypes in an Arab population
was examined.
Multiple factors, both environmental and genetic, contribute to the incidence and distribution
of T2D. This study therefore describes the role of genes and the influence of the
environment on the increasing prevalence of Type 2 Diabetes in Arab populations. It
expands on a study presented by Mathias and colleagues in a 2009 edition of Metabolism:
Clinical and Experimental (10:1439-45). As the incidence of Type 2 Diabetes is increasing at
an alarming rate, an appreciation of the contributing factors will assist in improving
management strategies.
Physical and clinical traits were collected for assessment. Pair-wise phenotypic correlations
among the eight quantitative traits were observed, specifically between glycated hemoglobin
(HbA1c) and fasting glucose. This assessment of phenotypic factors will be followed up with
ongoing studies to evaluate the contribution of genetic polymorphisms to the
prevalence of T2D in Arab populations.
Diet and lifestyle factors (smoking, exercise, etcetera) are known to play a role in T2D.
Assessment of the quantitative traits collected in this study showed significant contributions by
factors such as Body Mass Index (BMI) and waist circumference (p < 1 x 10^-6). There were
other suggestive traits (cholesterol, creatinine levels; p < 0.05). Although phenotype studies
provide some insight, matching genetic studies will augment the understanding of disease
mechanisms. Towards this, the first Genome Wide Association Study in Bedouins was
performed on 178 volunteers from the EFR project's DNA repository using Illumina's Human
660W-Quad-BeadChip. The outcomes of the GWAS are discussed in Chapters 6 and 7.
This manuscript was prepared by myself with support from the co-authors listed. Drs Cordell,
Blackwell and Jamieson guided me through the statistical analysis and provided me with
valuable comments and feedback. Dr Tay guided me throughout the study, from designing the
study to proofreading the manuscript.
Heritability of Quantitative Traits Associated with Type 2 Diabetes in an Extended
Family from the United Arab Emirates
Habiba S. Al Safar1,2, Sarra E. Jamieson3, Heather J. Cordell4, Jenefer M. Blackwell3,5, Guan
K. Tay1
1 Centre for Forensic Science, The University of Western Australia, Crawley, Western
Australia.
2 Dubai Police General Head Quarters, Dubai, United Arab Emirates.
3 Telethon Institute for Child Health Research, Centre for Child Health Research, The
University of Western Australia, Subiaco, Western Australia.
4 Institute of Human Genetics, Newcastle University, Newcastle upon Tyne, United
Kingdom.
5 Cambridge Institute for Medical Research and Department of Medicine, School of Clinical
Medicine, University of Cambridge, Cambridge, United Kingdom.
Abbreviated title: Type 2 Diabetes, United Arab Emirates
Keywords: Heritability, Quantitative Trait, Type 2 Diabetes
Publication number HA09-0004 of the Centre for Forensic Science at the University of
Western Australia
Corresponding author:
Associate Professor Guan K Tay
Centre for Forensic Science
The University of Western Australia
35 Stirling Highway, Crawley WA 6009, AUSTRALIA
Phone: + 61 8 6488 7286
Fax: + 61 8 6488 7285
Email: [email protected]
ABSTRACT
The prevalence of Type 2 Diabetes (T2D) in the United Arab Emirates (UAE) is steadily
increasing, posing a major public health problem. This study assessed the value of specific
clinical markers for T2D among five generations of an extended Arab family. This family
included 319 members of 41 nuclear families; from which 178 individuals (86 males, 92
females; 66 diabetic, 112 healthy) formed the study sample set. The ages of the participants
ranged from 4 to 88 years. All participants completed a questionnaire that focused on baseline
factors that have previously been associated with T2D such as diet, smoking, and family
history of the disease. The quantitative traits, fasting glucose, glycated hemoglobin (HbA1c),
cholesterol, triglyceride, urea and creatinine levels were measured. Body mass index (BMI)
and waist circumference were also recorded. The heritability of these eight quantitative traits
was determined, with values ranging from 6% to 48%. We found a significant relationship
between T2D diagnosis and waist circumference (p = 2.6 × 10^-9) and BMI (p = 1.0 × 10^-6). The
estimated power for these two traits was 80% and 90%, respectively. Creatinine (p = 0.002) and
cholesterol (p = 0.02) levels were also associated with T2D. Our results support the link
between environmental and genetic factors in the pathophysiology of T2D and its related
phenotypes in an Arab population.
INTRODUCTION
Type 2 Diabetes (T2D) is one of the most widespread chronic diseases, contributing to
severe illness and ultimately to the death of millions of people worldwide. According to
the International Diabetes Federation, the number of people diagnosed with T2D has risen
over the past twenty years from 30 million to more than 246 million (1, 2). In the Middle
East, 12% to 20% of the population suffers from diabetes. This incidence increases every year
along with the rising costs associated with health care provision (3). A Ministry of Health
survey in 1999 and 2000 reported that 19.6% of people in the United Arab Emirates (UAE)
were diagnosed with diabetes. More recent studies have estimated that 25% of adult Arabs
suffer from T2D, and the prevalence of the disease is increasing. In 2007, the UAE population
had the second-highest incidence of diabetes in the world. In this country, an estimated one in
five people aged between 20 and 79 years lives with diabetes, while a similar percentage
of the population is at risk of developing the disease.
A range of factors contribute to the risk of T2D, particularly obesity, physical
inactivity, age, ethnicity, history of gestational diabetes, impaired glucose tolerance, and a
familial history of diabetes (4). The prevalence of diabetes varies between
populations. Approximately 5% of Asian populations are affected while, at the top end of the
spectrum, almost 50% of the Pima Indian population suffers from diabetes (5-7).
Researchers have noted high annual rates of new T2D cases among youth in the United States:
African-American (39 per 100,000), Hispanic-American (29 per 100,000),
American Indian (45 per 100,000), and to a lesser extent Asian-American and Pacific Island
populations (24 per 100,000) (8).
Multiple factors, both environmental and genetic, contribute to the incidence and distribution
of T2D. Urbanisation and concomitant changes in lifestyle have been linked to the prevalence
of the disease (9). For instance, the incidence of T2D is very low in some rural populations
such as the Mapuche Indians of Chile and rural Chinese groups, indicating the role of
environmental factors (10). Some of the highest incidences of T2D, however, have been
among the Pima Indians of Arizona and the Naura of Papua New Guinea, suggesting the
importance of genetic factors in the development of the condition (10).
The increasing prevalence of T2D in the UAE appears to follow similar trends. Families
among the indigenous tribes show varying degrees of predisposition to the disease. With
widespread urbanisation in the Middle East over the past century, environmental factors
increasingly exert an influence. In this report, we estimate the heritability of traits associated
with T2D in an extended family from the UAE. This assessment of phenotypic factors will be
followed up with ongoing studies to evaluate the contribution of genetic polymorphisms
to the prevalence of T2D in Arab populations.
MATERIAL AND METHODS
Subjects
Major hospitals and primary care centers in the UAE were contacted to establish a
collaborative recruiting network for this study. The study was performed with the approval of
the ethical review committee of the United Arab Emirates Ministry of Health. Through this
collaboration, doctor-diagnosed data collected through one-on-one interviews of T2D patients
(and healthy controls) were evaluated. Clinical assessment and questionnaire completion were
conducted at the clinic. Subsequently, 319 individuals belonging to one extended family of
Bedouin origin were identified. Multigeneration family relationships were compiled for these
individuals, and the pedigree of a five-generation extended family was constructed from 41
nuclear families. A total of 178 individuals from this sample agreed to participate in this study.
Physical attributes
The age, waist circumference and body mass index (BMI) of each volunteer were recorded.
Biochemical testing
All biochemical tests were performed at the Al-Baraha Hospital, Dubai, UAE, using the Cobas
Integra 800 clinical chemistry system (Roche Diagnostics, Indianapolis, IN, USA). Peripheral
blood was collected from the 178 individuals in EDTA, heparin and fluoride vacutainers. The
heparin and fluoride tubes were centrifuged at 3,000 rpm for 5 minutes. Serum from the
fluoride tubes was aspirated off to measure fasting glucose, cholesterol and impaired glucose
tolerance, while serum from the heparin tubes was used to measure triglycerides, urea and
creatinine levels. HbA1c was measured using 25 µl of blood from the EDTA tubes. An
individual was classified as diabetic if the subject: (1) was diagnosed with the disease by a
qualified physician; (2) had been prescribed drug treatment for diabetes; and/or (3) met the
fasting plasma glucose criterion of ≥ 126 mg/dl set by the World Health Organisation (WHO).
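The three classification criteria above can be written as a simple rule. The following is only a minimal sketch: the record layout and field names are hypothetical and are not taken from the study database; only the 126 mg/dl threshold comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Fasting plasma glucose threshold quoted in the text (mg/dl).
FASTING_GLUCOSE_THRESHOLD_MG_DL = 126.0


@dataclass
class Participant:
    # Hypothetical field names; the study's actual data schema is not given.
    physician_diagnosis: bool
    on_diabetes_medication: bool
    fasting_glucose_mg_dl: Optional[float]  # None if not measured


def is_diabetic(p: Participant) -> bool:
    """Return True if at least one of the three criteria in the text is met."""
    meets_glucose = (
        p.fasting_glucose_mg_dl is not None
        and p.fasting_glucose_mg_dl >= FASTING_GLUCOSE_THRESHOLD_MG_DL
    )
    return p.physician_diagnosis or p.on_diabetes_medication or meets_glucose
```

Because the criteria are joined by "and/or", a participant who meets any one of them is classified as diabetic in this sketch.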
Statistical analysis
Raw phenotypic data were transformed and adjusted for age and sex. The transformation,
quantile-quantile (QQ) plots and histograms were generated with version 11 of the
STATA statistical software (College Station, TX, USA). To achieve a normal distribution, the
quantitative trait data were log-transformed. Heritability and power estimates were calculated
for each trait using SOLAR version 4 (11). Pairwise correlations between all phenotypic pairs
were calculated using STATA.
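As an illustration of the preprocessing described above, the sketch below log-transforms a trait, removes the linear effects of age and sex by ordinary least squares, and computes a Pearson correlation between two traits. It is a stand-in written in Python, not the STATA procedure used in the study; the function names, the 0/1 coding of sex and the choice of a linear adjustment are all assumptions.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def adjusted_log_trait(values, ages, sexes):
    """Log-transform a trait and return residuals after regression on age and sex.

    Sex is assumed to be coded 0/1 (an assumption, not stated in the text).
    """
    y = [math.log(v) for v in values]
    rows = [[1.0, float(age), float(sex)] for age, sex in zip(ages, sexes)]
    k = 3  # intercept, age, sex
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return [yi - sum(b * xi for b, xi in zip(beta, r)) for r, yi in zip(rows, y)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Under these assumptions, calling `pearson` on the adjusted residuals of two traits would give one entry of a pairwise correlation matrix of the kind reported in Table 3.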
RESULTS
The study population included 66 subjects with T2D and 112 healthy subjects; 86 were male
and 92 were female, ranging from 4 to 97 years of age. The mean age of the cohort was 37
years. The means and standard deviations of the eight quantitative traits used in this study are
presented in Table 1.
Table 2 shows the estimated heritability and power for the eight traits, which were obtained
with SOLAR to evaluate the influence of the genetic component on phenotypic variation. All
traits showed moderate to high familial aggregation, with heritability estimates ranging from
6% to 48%. Waist circumference, BMI, creatinine and cholesterol levels showed significant
heritability (p < 0.05), while the p-values were greater than 0.05 for triglyceride, fasting
glucose, HbA1c and urea levels. Waist circumference (44% heritability) and BMI (48%
heritability) had the highest heritability among the eight traits, with powers of more than 80%
and more than 90%, respectively. Fasting glucose (36% heritability) and HbA1c (6%
heritability) were the only traits that were directly related to T2D.
Table 3 presents the pairwise phenotypic correlations of the eight quantitative traits. The
highest phenotypic correlation observed in this study was that between fasting glucose and
HbA1c (0.89). Another significant pairwise correlation was between BMI and waist
circumference (0.70), both of which are measures of obesity. Waist circumference was also
correlated with both fasting glucose (0.52) and HbA1c (0.40), linking obesity to the two
glucose-related traits.
Table 1: Phenotypic and clinical characteristics of 178 individuals belonging to an
extended family of Arab origin.
Description                      Number
Males                            86
Females                          92
Type 2 Diabetes                  66
Healthy                          112
Variable                         Mean ± SD
Physical Appearance
Age (years)                      37.35 ± 19.24
Waist circumference (inches)     38.41 ± 7.75
Body mass index (BMI)            29.48 ± 7.97
Biochemical Tests
Creatinine (mg/dl)               0.96 ± 0.25
Cholesterol (mg/dl)              177.19 ± 62.23
Triglyceride (mg/dl)             148.24 ± 83.04
Fasting glucose (mg/dl)          117.32 ± 44.14
Urea (mg/dl)                     26.24 ± 8.21
HbA1c (%)                        5.73 ± 1.38
Table 2: Heritability and power estimation (power to obtain a suggested LOD score, LOD = 3) of
eight quantitative traits in 178 individuals. Values have been adjusted for sex and
age. Significant p values (p < 0.05) are marked with an asterisk.
Trait                 H2r (a)   p value (a)     Chi-square (a)   Power estimate
Waist circumference   0.44      2.6 × 10^-9 *   34.04            > 80%
Body mass index       0.48      1.0 × 10^-6 *   28.01            > 90%
Creatinine            0.28      2.0 × 10^-3 *   7.60             > 20%
Cholesterol           0.18      0.02 *          3.59             > 10%
Triglyceride          0.14      0.06            2.28             > 10%
Fasting glucose       0.36      0.10            1.63             > 50%
Urea                  0.10      0.11            1.49             > 10%
HbA1c                 0.06      0.36            0.11             > 10%
(a) Heritability (H2r), p and chi-square values were obtained with tests on transformed
quantitative trait data. The chi-square and p values relate to the likelihood ratio test comparing
polygenic models to sporadic models.
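The relation between the chi-square and p value columns can be sketched as follows. This assumes that the likelihood ratio statistic for a heritability of zero follows a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom, which is the usual boundary correction for a variance component; the chapter does not state this, so the function below is an illustration rather than a description of the SOLAR output.

```python
import math


def heritability_lrt_p_value(chi_square: float) -> float:
    """One-sided p value for a likelihood ratio test of zero heritability.

    Assumption (not stated in the chapter): the statistic follows a 50:50
    mixture of a point mass at zero and a chi-square with 1 degree of freedom.
    """
    if chi_square < 0:
        raise ValueError("a likelihood ratio statistic cannot be negative")
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    upper_tail = math.erfc(math.sqrt(chi_square / 2.0))
    return 0.5 * upper_tail
```

For example, under this assumption a chi-square value of 1.63 gives a p value of roughly 0.10, in line with the fasting glucose row of Table 2.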
Table 3: Pairwise correlation between diabetes-related phenotypic traits in 178 individuals.
                      Waist circ.  BMI    Creatinine  Cholesterol  Triglyceride  Fasting glucose  Urea   HbA1c
Waist circumference   1
BMI                   0.70         1
Creatinine            0.20         0.18   1
Cholesterol           0.21         0.18   0.29        1
Triglyceride          0.23         0.29   0.20        0.31         1
Fasting glucose       0.52         0.26   0.22        0.24         0.24          1
Urea                  0.01         0.09   0.29        0.13         0.07          0.14             1
HbA1c                 0.40         0.28   0.22        -0.04        0.14          0.89             0.16   1
DISCUSSION
Our study of T2D in an extended family of Arab origin provides insights into the roles of
genetic predisposition and environmental influence in the rising prevalence of T2D in Arab
populations. We found strong phenotypic correlations between fasting glucose levels and
HbA1c, and between these two traits and waist circumference. Our findings also indicate a
heritable tendency for obesity in this family, indicated by waist circumference and BMI
values. The heritability of these traits therefore suggests a contribution of genetic factors to
the prevalence of T2D in this population. Obesity results from a combination of genetic and
environmental factors that appear to play a significant role in the development of T2D in this
sample. A major and prevalent public health problem, obesity is associated with numerous
conditions such as hypertension, T2D, coronary heart disease and cancer.
Wide ranges of heritability have been reported for these traits in other populations. Mathias
and colleagues (12) found moderate to high familial aggregation for the traits tested in this
study in a south Indian population, with heritability ranging from 21% to 72%.
Anthropometric measures such as height, weight and BMI showed the highest heritability in
their study, and the results in Arabs shown here are consistent with this finding. The
researchers also found strong correlations between genetic and environmental effects for the
measures most directly related to T2D, especially between fasting insulin levels and
anthropometric measures. However, only two pairs of traits showed evidence for complete
pleiotropy: waist circumference was correlated with BMI and fasting insulin levels. These
results suggest that common genes may exert an influence on obesity and insulin levels in
these pedigrees (12).
A study conducted by the Framingham Heart Study group estimated the heritability of
anthropometric and biochemical traits in a Caucasian population (13-15). The
anthropometric trends of this Arab study were similar to those shown in the
Framingham studies, which found heritability estimates for height (0.52 ± 0.09 to 0.88 ± 0.06),
weight (0.42 ± 0.10 to 0.56 ± 0.50) and BMI (0.46 ± 0.10 to 0.49 ± 0.06). However, the
heritability of cholesterol (0.51 ± 0.04) and triglyceride (0.56) levels was much higher than in
the Arab population studied. Their heritability results for fasting blood glucose (0.17 ± 0.04 to
0.39) were similar to those observed in the Arab study.
In summary, this study supports the influence of both environmental and genetic factors in the
pathophysiology of T2D and its related phenotypes in an Arab population. Waist
circumference and BMI may play a more prominent role in the development of diabetes in this
population. The results presented show a strong familial aggregation of quantitative traits
associated with T2D. Further studies are underway to identify potentially specific genetic loci
in Arab populations.
ACKNOWLEDGMENT
Publication number HA09-0004 of the Centre for Forensic Science at the University of
Western Australia. We gratefully acknowledge the family whose cooperation made this study
possible. We would also like to thank Richard Francis at the Telethon Institute for Child Health
Research for his support, which allowed the statistical work to be carried out for this study.
Ms Alsafar is a PhD scholar at the University of Western Australia supported by the Dubai
Police General Head Quarters in the United Arab Emirates. Funding for this project was
provided by the Emirates Foundation.
REFERENCES
1. Dunstan DW, Zimmet PZ, Welborn TA, De Courten MP, Cameron AJ, Sicree RA, et
al. The rising prevalence of diabetes and impaired glucose tolerance: the Australian
Diabetes, Obesity and Lifestyle Study. Diabetes Care. 2002;25:829-34.
2. Sicree R, Shaw J, and Zimmet P, editors. Diabetes and impaired glucose tolerance. 3rd
edition. Brussels; 2006.
3. International Diabetes Federation. Diabetes Atlas. 2006.
4. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, et al. A genome-
wide association study of type 2 diabetes in Finns detects multiple susceptibility
variants. Science. 2007;316:1341-5.
5. Knowler WC, Bennett PH, Hamman RF, Miller M. Diabetes incidence and prevalence
in Pima Indians: a 19-fold greater incidence than in Rochester, Minnesota. Am J
Epidemiol. 1978;108:497-505.
6. Pavkov ME, Hanson RL, Knowler WC, Bennett PH, Krakoff J, Nelson RG. Changing
patterns of type 2 diabetes incidence among Pima Indians. Diabetes Care.
2007;30:1758-63.
7. Yang X, Pratley RE, Tokraks S, Bogardus C, Permana PA. Microarray profiling of
skeletal muscle tissues from equally obese, non-diabetic insulin-sensitive and insulin-
resistant Pima Indians. Diabetologia. 2002;45:1584-93.
8. Centers for Disease Control and Prevention. National Diabetes Fact Sheet: General
Information and National Estimates on Diabetes in the United States. Department of
Health and Human Services. 2005.
9. Elsharkawy T. Diabetes in the United Arab Emirates and other Arab countries: need
for epidemiological and genetic studies. Genetic Disorders in the Arab World, United
Arab Emirates: Centre for Arab Genomic Studies; 2004. p. 57.
10. O’Rahilly S, Barroso I, Wareham NJ. Genetic factors in type 2 diabetes: the end of the
beginning? Science. 2005:370-3.
11. Almasy L, Blangero J. Multipoint quantitative-trait linkage analysis in general
pedigrees. Am J Hum Genet. 1998;62:1198-211.
12. Mathias RA, Deepa M, Deepa R, Wilson AF, Mohan V. Heritability of quantitative
traits associated with type 2 diabetes mellitus in large multiplex families from South
India. Metabolism. 2009;58:1439-45.
13. Brown WM, Beck SR, Lange EM, Davis CC, Kay CM, Langefeld CD, et al. Age-
stratified heritability estimation in the Framingham Heart Study families. BMC Genet.
2003;4:S32.
14. Mathias RA, Roy-Gagnon MH, Justice CM, Papanicolaou GJ, Fan YT, Pugh EW, et
al. Comparison of year-of-exam- and age-matched estimates of heritability in the
Framingham Heart Study data. BMC Genet. 2003;4:S36.
15. McQueen MB, Bertram L, Rimm EB, Blacker D, Santangelo SL. A QTL genome scan
of the metabolic syndrome and its component traits. BMC Genet. 2003;4:S96.
CHAPTER 4
EVALUATION OF DIFFERENT SOURCES OF DNA
FOR USE IN GENOME WIDE STUDIES
This chapter has been published in Applied Microbiology and Biotechnology according to the
format prescribed by the journal.
Chapter 4
Evaluation of Different Sources of DNA for use in Genome
Wide Studies
Chapter 4 is presented as a manuscript submitted to Applied Microbiology and Biotechnology
Journal. The version of the manuscript presented in this chapter has been corrected after
receiving comments from the editor and reviewers of Applied Microbiology and
Biotechnology. The amended manuscript has been returned to the journal's editor for
publication.
As part of the overall effort, we established a DNA repository with the clinical database to
allow (1) association studies between genotype and phenotype and (2) longitudinal studies
in future work. DNA was collected using traditional methods of extraction. A new
method was assessed to allow collection in remote regions and developing countries. From
our background in forensic science, we use FTA™ for STR analysis. This study describes
another use of FTA™ technology. FTA™ cards were developed by Whatman, a company
which has a respectable track record in filter paper technology and application. The FTA™ card
system incorporates a chemical preservative that allows in-field collection of biological
samples. Applications in forensic science (blood, saliva and semen collection and storage) as
well as conservation biology (storage of DNA from endangered species) have been reported.
DNA samples stored for over 11 years have been successfully amplified for analysis. Storage of
DNA on cards at ambient temperatures represents a substantial saving in infrastructure costs.
This manuscript describes the storage of DNA and a Whole Genome Amplification step prior
to GWAS as an alternative strategy for collecting and storing
bio-specimens for high throughput genotyping.
The use of FTA™ to store DNA for genomic applications is becoming more common.
Whatman reported the successful use of DNA from FTA™ to genotype 1,516 SNPs using
Illumina's GoldenGate platform and subsequently studied 10,000 SNPs using Affymetrix. In
2008, the Hunt Biobank group collected samples on FTA™ paper for future genotyping
applications.
This study expands on a study published in BMC Research Notes by McClure et al. (2009), in
which DNA extracted from cells on FTA™ cards was used to genotype 54,122 cattle SNPs. In this
study, three different sources of DNA (degraded genomic DNA, amplified degraded genomic DNA
and amplified DNA extracted from FTA™ cards) were assessed as templates for genome-wide
analysis using Illumina’s Human660W-Quad BeadChip, which contains 12 times the number of
markers (i.e. 660,000 SNPs). To the best of our knowledge, this is the first
description of FTA™-sourced DNA for high throughput genotyping to study human
polymorphisms.
This manuscript was prepared by myself with support from the co-authors listed. All the
laboratory work at the Central Veterinary Research Laboratory (CVRL), including DNA
extraction and whole genome amplification, was performed by myself. Genotyping was
performed with technical assistance from Dr Abidi, under the guidance of Dr Khazanehdari.
The manuscript was proofread by Dr Dadour, co-advisor to my PhD project. Dr Tay, my
principal advisor, guided me throughout the study, from design to proofreading the
manuscript.
Evaluation of Different Sources of DNA for use in Genome Wide Studies and Forensic
Application
Habiba S Al Safar1,2, Fatima H Abidi3, Kamal A Khazanehdari3, Ian R Dadour1, Guan K Tay1
1 Centre for Forensic Science, The University of Western Australia, Western Australia,
Australia
2 Dubai Police General Head Quarters, Dubai, United Arab Emirates
3 Molecular Biology & Genetics, Central Veterinary Research Laboratory, Dubai, United
Arab Emirates
Abbreviated title: Genome Wide Studies
Keywords: FTA, GWAS, DNA quality
Publication number HA09-0003 of the Centre for Forensic Science at the University of
Western Australia.
Corresponding author:
Associate Professor Guan K Tay
Centre for Forensic Science
The University of Western Australia
35 Stirling Highway, Crawley WA 6009, AUSTRALIA
Phone: + 61 8 6488 7286
Fax: + 61 8 6488 7285
Email: [email protected]
ABSTRACT
In the field of epidemiology, Genome-Wide Association Studies (GWAS) are commonly used
to identify genetic predispositions to many human diseases. Large repositories housing
biological specimens for clinical and genetic investigations have been established to store
material and data for these studies. The logistics of specimen collection and sample storage
can be onerous, and new strategies have to be explored. This study examines three different
DNA sources (namely, degraded genomic DNA, amplified degraded genomic DNA and
amplified DNA extracted from FTA™ cards) for GWAS using the Illumina platform. No
significant difference in call rate was detected between amplified degraded genomic DNA
extracted from whole blood and amplified DNA retrieved from FTA™ cards. However, using
unamplified degraded genomic DNA reduced the call rate to a mean of 42.6%, compared with a
mean of 96.6% for amplified DNA extracted from FTA™ cards. This study establishes the utility of
FTA™ cards as a viable storage matrix for cells from which DNA can be extracted to perform
GWAS analysis.
INTRODUCTION
The collection of biological samples on paper matrices is a common and routine practice. For
example, the use of Guthrie spots on filter paper to store and transport blood samples taken
from newborns by the heel prick method is standard practice. With advances in molecular
techniques, specific preservatives and novel extraction chemicals have been developed to
enhance paper. Flinders Technology Associates (FTA™) technology was developed to simplify
the collection, shipment and archiving of a wide variety of biological specimens. It comprises
a cellulose-based matrix containing chemicals (formamide, citrate and Trizma-base) for cell
lysis and nucleic acid preservation (Moscoso et al., 2004). Chemical activation occurs when a
biological fluid comes into contact with the FTA™ surface. The preservatives on the FTA™ matrix
inactivate bacteria and viruses, thus protecting the biological samples from microbial growth
and contamination. Further, users collecting biological specimens for DNA are protected from
hazardous microbes that may be present in the specimen. FTA™ technology has also been
used in a number of animal tissue culture applications. For example, it has been used to safely
transport samples infected by foot-and-mouth disease virus (FMDV) (Muthukrishnan et al.,
2008). FTA™ paper also provides the advantage of sample storage at ambient room
temperatures.
FTA™ paper has been commonly used as a matrix for DNA storage in a number of
disciplines, particularly in the pharmaceutical sector (Martins et al., 2002; Tolunay et al.,
2006), law enforcement (Raina and Dogra, 2002; Tack et al., 2007), agriculture
(Crabbe, 2003; Ndunguru et al., 2005) and regulatory agencies. In the field of forensic science,
FTA™ technology has excelled (Harvey, 2005; Yoshihiko and Shin-ichi, 2006). The
simplicity of the collection technique, its adaptability to the range of biological specimens
encountered at potential crime scenes and the ease of storage have made the technology the
preferred evidence collection method. It has been shown that specimens stored on FTA™
paper have a long shelf life, with DNA samples recovered from FTA™ after more than 17 years of
storage used for reliable human identification (Ndunguru et al., 2005).
On the forensics front, the use of microsatellite short tandem repeats (STR) for DNA profiling,
first developed by Jeffreys et al. (1985), has been invaluable (Gill et al., 1985). However, with
advances in genome science, new opportunities continue to be considered (Foster et al., 1998).
The large amount of data from Genome-Wide Association Studies (GWAS), once a hindrance
to applied work such as criminal profiling, will become more manageable with the
development of analysis, visualisation and interrogation software. Here, we have shown that
FTA™-archived DNA can be used in GWAS. Consequently, current DNA storage procedures
in forensics will remain acceptable in the event that GWAS or similar genomic methods are
adopted for criminal profiling.
In 2008, the Hunt BioSciences study in Norway, which commenced in 1984, used FTA™
technology for storage of DNA. The study comprises population-based
epidemiological health studies which have focused on factors that predispose to diabetes and
breast cancer. There are some 75,000 participants, with a participation rate of 88%. In their
third and ongoing survey, HUNT3, in which 10,000 samples were collected, biological specimens
were collected and preserved on FTA™.
In this study, the suitability of DNA stored on FTA™ was assessed for more sophisticated
DNA analysis techniques, namely GWAS. GWAS applications have led to a proliferation in
the number of biobanks or biological sample repositories to provide the necessary biological
resource for these substantial genome-wide studies. Considerable effort has been put into
collecting blood and tissue samples and matching these to patient information ranging from
demographic data to specific clinical histories. Over the years, associations between these
phenotypes and genetic polymorphisms have revealed a plethora of genetic associations.
For GWAS, the FTA™ Elute system, when used in combination with whole genome
amplification (WGA) technologies, can create a virtually unlimited supply of nucleic acid
template. Valuable biological samples can be archived or banked at ambient laboratory
temperatures, replacing the need for expensive, space-consuming and energy-demanding
freezer banks. In GWAS, the investigation of large groups is necessary because genetic
factors involved in the cause of multifactorial diseases can only ever supply partial
explanations. There is only a certain probability that genetic factors will result in a given
multifactorial disease, and as the sample number increases, the estimate of this probability
becomes more precise and accurate. However, current storage systems are relatively limited
and require significant infrastructure (e.g. −80°C freezers) and support. Consequently, more
convenient alternatives have to be considered. It is expected that the dissection of genetic
factors that predispose to disease and which explain the etiology of complex multi-factorial
disorders will be the key to preventative strategies, as well as the development of targeted
therapeutic modalities. The development and assessment of technologies, including FTA™, that
facilitate large-scale genomic efforts are critical to these outcomes.
MATERIAL AND METHODS
Sample set
Peripheral blood was drawn and collected in EDTA tubes from three healthy unrelated
individuals (denoted S1, S2 and S3) after receiving ethical approval from the Ministry of
Health in the United Arab Emirates. These three samples were used in each set of
experiments described below. Four drops of blood from each sample were transferred to
FTA™ paper (Whatman, Maidstone, Kent, UK) and stored at ambient room temperature
(20°C).
Preparation of genomic DNA for GWAS analysis
Three different sets of DNA templates were prepared and used in the present study.
Set 1: DNA was extracted from blood embedded in FTA™ and then amplified (abbreviated
PCR-FTA). DNA samples were purified from FTA™ by placing a 3-mm disk in a microcentrifuge
tube. The disk was rinsed twice in TE−1 buffer (10 mM Tris-HCl, 0.1 mM EDTA, pH 8) and
left to stand for 5 min at room temperature (20°C). The buffer was subsequently removed and
fresh TE−1 buffer was added. The disk was left to stand in elution buffer for 20 min at room
temperature. This step was repeated twice. Subsequently, the elution buffer was removed and
the disk was dried at room temperature for 1 h. At the end of the drying process, a complete
WGA step was performed on all three samples separately in a GeneAmp PCR System 9700
thermal cycler (Applied Biosystems, Lincoln Centre Drive, Foster City, CA, USA)
using Sigma's GenomePlex® kit (Sigma #WGA4) according to the manufacturer's instructions
(Sigma-Aldrich, St Louis, MO, USA). Prior to GWAS analysis, the PCR products were
cleaned using a Promega kit (Promega, Madison, WI, USA) according to the protocol
provided.
Set 2: DNA was extracted from whole blood using standard methods and amplified (referred
to as PCR-dgDNA). The quantity and purity of the three DNA samples used were determined
by absorbance measurements using a NanoDrop ND-1000 Spectrophotometer (NanoDrop,
Wilmington, DE, USA). Each DNA sample (10 ng/µl) was amplified using Sigma's
GenomePlex® kit in a GeneAmp PCR System 9700 thermal cycler (Applied Biosystems), with
PCR clean-up performed using the Promega PCR purification kit.
Set 3: DNA was extracted from whole blood using standard methods without further
amplification (dgDNA). Three DNA samples at concentrations of 50 ng/µl were prepared for
GWAS analysis.
All sample sets qualified for GWAS analysis, with DNA absorbance ratios (A260:A280) of 1.9 and
an average DNA concentration of 200 ng/μl. All samples were diluted to a
concentration of 50 ng/μl in Tris-EDTA (TEKnova, Hollister, CA, USA).
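The dilution step follows the standard relation C1 × V1 = C2 × V2. The sketch below shows the arithmetic; the function name and the final volume in the usage note are illustrative assumptions, since the volumes actually prepared are not reported.

```python
def dilution_volumes(stock_ng_per_ul: float, target_ng_per_ul: float,
                     final_volume_ul: float) -> tuple[float, float]:
    """Return (stock volume, diluent volume) in µl from C1 * V1 = C2 * V2."""
    if stock_ng_per_ul <= 0 or target_ng_per_ul <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_ng_per_ul > stock_ng_per_ul:
        raise ValueError("target concentration exceeds stock concentration")
    # V1 = C2 * V2 / C1; the remainder of the final volume is diluent.
    stock_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume, final_volume_ul - stock_volume
```

For a stock at 200 ng/µl and a target of 50 ng/µl, a hypothetical final volume of 20 µl would require 5 µl of stock and 15 µl of Tris-EDTA.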
GWAS assay
A genome-wide study was performed on all three sets of DNA with the Human660W-Quad
BeadChip (Illumina, San Diego CA, USA), which contains 660,000 SNPs derived from the
International HapMap Project. The genotype assays for the three sets of DNA were performed
according to the manufacturer's recommendations. In brief, 200ng of DNA template was
subjected to whole-genome amplification at 37°C for 20 to 24 h. Products were degraded,
precipitated, and re-suspended in hybridisation buffer. The re-suspended samples were
denatured at 95°C for 20 min, loaded onto the BeadChips, and placed in a 48°C hybridisation
chamber for 16 to 20 h. After hybridisation, non-hybridised DNA was washed away from the
BeadChips. An allele-specific single-base extension of the oligonucleotides on the BeadChip
was performed in a 48-position GenePaint™ Slide Chamber Rack (Tecan, Männedorf,
Switzerland) using labelled deoxynucleotides and the captured DNA as a template. After
staining of the extended DNA, BeadChips were washed and scanned on an iScan instrument
(Illumina), and genotypes were called using BeadStudio software version 3.0 (Illumina).
Statistical analysis
Call rates were compared using one-way analysis of variance (ANOVA) followed by
Bonferroni's multiple comparison test.
RESULTS
The integrity of genomic DNA is critical when it is used as a template for GWAS.
The call rates for degraded DNA can be variable, which compromises the integrity of the
study. As illustrated in Figure 1, when GWAS assays were performed using degraded DNA
templates, the ratio of “calls” to “no calls” was highly variable and the efficiency of the
assay was low, with call rates as low as one in five (about 20%).
The use of a WGA step prior to GWAS analysis improved the call rate to around 96% (call
rates for the three samples under the PCR-dgDNA category in Figure 1). In the same
study, the use of FTA™ as a DNA collection and storage medium was assessed, with call
rates of 96% and higher achieved (PCR-FTA, Figure 1).
In Figure 2, the quality of the base-calling function is illustrated. Specifically, in Figure 2C,
the clustering plot shows that genotypes are not assigned when using degraded DNA
templates: the results for three separate samples, S1, S2 and S3, fall outside the ‘call zone’.
There is some improvement when the degraded genomic DNA is processed with an
amplification step prior to GWAS analysis (Figure 2B). Interestingly, amplified genomic
DNA collected using FTA™ cards generated the best results (Figure 2A), suggesting that this
simple method of specimen collection and nucleic acid purification could be a suitable prelude
to GWAS.
For the 20 selected SNPs on chromosome 18, it is clear that genotypes are not called when
degraded genomic DNA is used for analysis. Examples of miscalls and no calls are shown in
Figure 3. The three DNA templates (PCR-FTA, PCR-dgDNA and dgDNA) were compared for
all three individuals (S1, S2 and S3), and a range of call scenarios is presented. There are
three examples of SNP positions where all three DNA templates are concordant: rs10083985
and rs10163808 for S1, and rs1010360 for S3. In sample S1, the only example of a no call
for all three DNA templates is at SNP rs10163736.
Importantly, the type of DNA template used can give rise to erroneous results. These errors
are compounded and easily missed because of the large amount of data
associated with GWAS. An example of a miscalled genotype is shown at position
rs1008899 in sample S1 when using degraded genomic DNA as a template for GWAS assays.
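The three call scenarios described above (concordant call, miscall and no call) can be sketched as a comparison against a reference template. This is a minimal illustration, not the procedure used in the study: treating PCR-FTA as the reference is an assumption, the function names are hypothetical, and only the three S1 genotypes that can be read unambiguously from Figure 3 and the text are included.

```python
# Genotype calls for sample S1 at three chromosome 18 SNPs (Figure 3).
# None stands for a no call.
S1_CALLS = {
    "rs10083985": {"PCR-FTA": "BB", "PCR-dgDNA": "BB", "dgDNA": "BB"},
    "rs1008899": {"PCR-FTA": "AB", "PCR-dgDNA": "AB", "dgDNA": "AA"},
    "rs10163736": {"PCR-FTA": None, "PCR-dgDNA": None, "dgDNA": None},
}


def classify(reference, test):
    """Label one genotype call relative to the reference call."""
    if test is None:
        return "no call"
    if reference is None:
        return "no reference"
    return "concordant" if test == reference else "miscall"


def compare_to_reference(calls, reference_template="PCR-FTA"):
    """Classify every non-reference template at every SNP."""
    report = {}
    for snp, by_template in calls.items():
        ref = by_template[reference_template]
        report[snp] = {
            template: classify(ref, genotype)
            for template, genotype in by_template.items()
            if template != reference_template
        }
    return report


for snp, result in compare_to_reference(S1_CALLS).items():
    print(snp, result)
```

Without an independent reference genotype, a miscall such as dgDNA at rs1008899 would be indistinguishable from a true AA call, which is why such errors are easily missed in large data sets.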
To provide a chromosome-wide perspective of the data selected for Figure 3, the data for
all the SNPs analysed on chromosome 18 are presented using Illumina's Chromosome Browser
(Fig. 4). The density of genotypes called when using DNA templates collected on FTA™
paper is higher than that obtained with amplified degraded genomic DNA. The number of
genotypes called and the accuracy of the calls with degraded genomic DNA were poor. These
results were consistent with the quality control step recommended by Illumina, which uses
box plots to represent the log R ratio (see Fig. 5). The log R values when using amplified DNA
template from FTA™ were typically 0.1 to 0.25 for all three samples studied, the range for a
good call (Fig. 5A). The average score for amplified degraded genomic DNA was acceptable
(Fig. 5B); however, SNPs were not as tightly clustered as seen with amplified templates from
FTA™. As expected, the scores reflected the poor quality of results obtained using degraded
DNA (Fig. 5C).
In summary, for the three subjects studied (S1, S2 and S3), the call rates were variable when
using degraded DNA as a template (19%, 61% and 48%, respectively; Table 1).
While collecting blood in the conventional fashion for S1, S2 and S3, blood spots were also
collected on FTA™ paper. The DNA was harvested and subjected to a whole-genome
amplification step prior to the GWAS assay. The call rates using these DNA templates were
equivalent to (96%) or better than (97%) those of the assays that used amplified degraded
genomic DNA as templates (Table 1).
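The call rates quoted above can be reproduced from the call and no-call counts in Table 1. The sketch below assumes that the call rate is the number of calls divided by the sum of calls and no calls (561,490 per sample in Table 1); this definition is inferred from the tabulated values rather than stated in the text, and the variable and function names are illustrative.

```python
# (no calls, calls) per DNA source and sample, copied from Table 1.
TABLE_1_COUNTS = {
    ("PCR-FTA", "S1"): (18494, 542996),
    ("PCR-FTA", "S2"): (17446, 544044),
    ("PCR-FTA", "S3"): (20156, 541334),
    ("PCR-dgDNA", "S1"): (25007, 536483),
    ("PCR-dgDNA", "S2"): (23455, 538035),
    ("PCR-dgDNA", "S3"): (20041, 541449),
    ("dgDNA", "S1"): (457583, 103907),
    ("dgDNA", "S2"): (218313, 343177),
    ("dgDNA", "S3"): (292182, 269308),
}


def call_rate(no_calls, calls):
    """Fraction of assayed loci that received a genotype call."""
    return calls / (calls + no_calls)


for (source, sample), (no_calls, calls) in TABLE_1_COUNTS.items():
    print(f"{source:10s} {sample}  call rate = {call_rate(no_calls, calls):.4f}")
```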
Results from one-way ANOVA (Table 2) are shown together with pair-wise comparisons of the
three sources of DNA. Overall, there is a significant difference (p = 0.0027) between the call
rates observed for degraded genomic DNA (dgDNA) and those for PCR-amplified degraded
genomic DNA and DNA sourced from FTA™. The call rates of the latter two were similar
(means of 96.0% and 96.6%, respectively). Both exceed the 95% call rate regarded as optimal
for conventional GWAS using pristine-quality DNA.
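As a rough check on the ANOVA summarised above, the F ratio can be recomputed from the nine call rates in Table 1. This is a sketch under stated assumptions: it uses the rounded call rates as tabulated, plain Python rather than the statistics package used in the study, and hypothetical names. The result should therefore land close to, but not necessarily exactly on, the F ratio reported in Table 2.

```python
# Call rates per DNA template (samples S1, S2, S3), copied from Table 1.
CALL_RATES = {
    "PCR-FTA": [0.9671, 0.9689, 0.9641],
    "PCR-dgDNA": [0.9555, 0.9582, 0.9643],
    "dgDNA": [0.1851, 0.6112, 0.4796],
}


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and the F ratio."""
    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)
    means = {name: sum(g) / len(g) for name, g in groups.items()}
    ss_between = sum(
        len(g) * (means[name] - grand_mean) ** 2 for name, g in groups.items()
    )
    ss_within = sum(
        (v - means[name]) ** 2 for name, g in groups.items() for v in g
    )
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_ratio": f_ratio,
    }


result = one_way_anova(CALL_RATES)
print(f"F({result['df_between']}, {result['df_within']}) = {result['f_ratio']:.2f}")
```

Almost all of the within-group variance comes from the dgDNA samples, which is consistent with the highly variable call rates described for degraded DNA.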
Figure 1: Summary of called genotype and no genotype calls of 657,366 SNPs across 23 chromosomes using three sources of DNA: PCR-
FTA, PCR-dgDNA and dgDNA. For each source of DNA, three independent samples were collected (S1, S2 and S3) for testing
and comparison.
[Figure 1: bar chart of the number of SNPs (vertical axis, 0 to 600,000) with calls and no calls for samples S1, S2 and S3 from each DNA source: PCR-FTA, PCR-dgDNA and dgDNA.]
Figure 2: Examples of clustering plots showing the accuracy of calling for SNP rs1013861 on
chromosome 18 using different sources of DNA. a High call rate for the three PCR-FTA samples
(squares, S1; circles, S2; triangles, S3), with the genotype called correctly. b When using
PCR-amplified degraded DNA, there were two correct calls (S2 and S3) and one no call (S1).
c The genotypes of all three samples of degraded DNA could not be assigned due to poor call
rates. d A typical SNP clustering pattern in 178 samples with all genotypes correctly called.
Norm R, normalised intensity; Norm Theta, angle of the centre of the cluster in normalised
polar coordinates. Dark shaded areas, the call zones for the AA (right), AB (middle) and BB
(left) genotypes.
Chromosome 18 SNPs (in column order): rs1000055, rs1004403, rs10083961, rs10083985,
rs1008899, rs1009819, rs1010360, rs1010444, rs1011947, rs1013861, rs10153405, rs1015460,
rs10163657, rs10163736, rs10163808, rs10164009, rs1017252, rs1019989, rs1021599, rs10221443
[Genotype panel: empty cells in the original figure denote no calls and are omitted here, so rows with fewer than 20 entries are not column-aligned.]
S1
PCR-FTA AB AB AA BB AB AA AA AA AA AB AA BB AA AA BB AB BB AB
PCR-dgDNA AB AB AA BB AB AA AA AA AA AB AA BB AA AA BB AB BB AB
dgDNA BB AA AA
S2
PCR-FTA AB AA AA BB AB AA AB AB BB AA AA AA BB AB AA AA AB AB AB BB
PCR-dgDNA AB AA AA BB AB AA AB AB BB AA AA AA BB AB AA AA AB AB AB BB
dgDNA (no genotypes listed)
S3
PCR-FTA AB AB AA AB AB AA AA BB BB AB AA AA BB AA AA AA AA AB AB AA
PCR-dgDNA AB AB AA AB AB AA AA BB BB AB AA AA BB AA AA AA AA AB AB AA
dgDNA AA
Figure 3: Examples of correct calls, miscalls and no calls in three samples (S1, S2 and S3) in a comparison between whole-genome amplified DNA from blood collected on FTA (PCR-FTA), whole-genome amplified degraded DNA (PCR-dgDNA) and degraded genomic DNA (dgDNA). Twenty SNPs on chromosome 18 were randomly selected from the 660,000 SNPs available for all three subjects. At each SNP, the genotypes were either (1) called correctly: see dgDNA genotype of rs10083985 for S1, (2) miscalled: see dgDNA genotype of rs1008899 or (3) not called: see genotype of all three sources of DNA for rs10163736.
Figure 4: The Illumina Chromosome Browser (ICB) plot of the B allele
frequencies along chromosome 18 in sample 1. The horizontal axis denotes
the physical position of SNPs (scale in megabases, Mb), and the vertical axis
denotes the estimated B allele frequency. a Ninety-six percent of SNPs were
called and genotyped as AA, AB or BB using PCR-FTA as a source of DNA. b
Ninety-five percent of SNPs were called and genotyped using PCR-dgDNA as a
source of DNA. c Eighteen percent of SNPs were called and genotyped using
dgDNA. In a and b, a deletion is visible at 55 to 65 Mb; in c, owing to the
poor-quality DNA, the deletion was not obvious.
Figure 5: A box plot representing the distribution of the log R ratio in all three samples using three different sources of DNA. The log R ratio
provides a measure of the noise in the data. Typical values associated with high-quality data are 0.1 to 0.25. a The log R ratio
obtained using PCR-FTA. b The log R ratio was not as tightly grouped when using PCR-dgDNA. c A poor-quality log R ratio was
observed owing to the poor DNA quality when using dgDNA.
(A) PCR‐FTA (B) PCR‐dgDNA (C) dgDNA
Table 1: Summary of the number of “calls” and “no calls”, call rate, genotype frequencies (AA, AB and BB), minor allele frequency
and GenCall (GC) score percentiles at 657,366 loci for PCR-FTA, PCR-dgDNA and dgDNA.
DNA source  Sample  No calls  Calls    Call rate  A/A freq  A/B freq  B/B freq  Minor freq  50% GC score  10% GC score
PCR-FTA     S1      18,494    542,996  0.9671     0.3312    0.2923    0.3765    0.4773      0.4396        0.2867
PCR-FTA     S2      17,446    544,044  0.9689     0.315     0.3261    0.359     0.478       0.4396        0.2861
PCR-FTA     S3      20,156    541,334  0.9641     0.3259    0.3032    0.3709    0.4775      0.4396        0.2861
PCR-dgDNA   S1      25,007    536,483  0.9555     0.3285    0.2956    0.376     0.4763      0.8741        0.5439
PCR-dgDNA   S2      23,455    538,035  0.9582     0.3131    0.327     0.3599    0.4766      0.8795        0.5534
PCR-dgDNA   S3      20,041    541,449  0.9643     0.324     0.3055    0.3706    0.4767      0.8915        0.6483
dgDNA       S1      457,583   103,907  0.1851     0.5549    0.2823    0.1628    0.3039      0.6957        0.2085
dgDNA       S2      218,313   343,177  0.6112     0.2174    0.4559    0.3268    0.4453      0.7994        0.2781
dgDNA       S3      292,182   269,308  0.4796     0.299     0.3517    0.3493    0.4749      0.7853        0.2594
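The minor allele frequency column of Table 1 appears to follow from the three genotype frequencies: each allele frequency is the homozygote frequency plus half the heterozygote frequency, and the minor frequency is the smaller of the two. The sketch below checks this assumed relationship against three rows of the table; the function name is illustrative, and small differences in the fourth decimal place are expected because the tabulated inputs are rounded.

```python
def minor_allele_frequency(freq_aa, freq_ab, freq_bb):
    """Minor allele frequency from genotype frequencies.

    Each AB genotype contributes one A and one B allele, so the A allele
    frequency is f(AA) + f(AB) / 2 and likewise for B.
    """
    freq_a = freq_aa + freq_ab / 2.0
    freq_b = freq_bb + freq_ab / 2.0
    return min(freq_a, freq_b)


# (A/A, A/B, B/B, tabulated minor frequency) for three rows of Table 1.
ROWS = {
    "PCR-FTA S1": (0.3312, 0.2923, 0.3765, 0.4773),
    "PCR-dgDNA S1": (0.3285, 0.2956, 0.3760, 0.4763),
    "dgDNA S1": (0.5549, 0.2823, 0.1628, 0.3039),
}

for label, (aa, ab, bb, tabulated) in ROWS.items():
    computed = minor_allele_frequency(aa, ab, bb)
    print(f"{label:13s} computed = {computed:.4f}  tabulated = {tabulated:.4f}")
```

The shift in the S1 dgDNA row, where A/A genotypes dominate and the minor frequency falls to about 0.30, illustrates how the no calls and miscalls obtained with degraded DNA distort population-level summaries.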
Table 2: Bonferroni's multiple comparison test shows that the call rates for genomic DNA extracted from FTA (96.6%) and PCR-amplified
genomic DNA (average = 96.0%) are significantly higher than that for degraded genomic DNA (42.6%) (p = 0.0027).
Bonferroni's Multiple Comparison Test
Test                   Mean difference  t      Significant (p < 0.05)  95% CI of difference
FTA-PCR vs PCR-dgDNA   0.006            0.065  NO                      -0.33 to 0.34
FTA-PCR vs dgDNA       0.540            5.326  YES                     0.21 to 0.87
PCR-dgDNA vs dgDNA     0.533            5.260  YES                     0.20 to 0.87
ANOVA (one-way analysis of variance)
Test                   Sum of squares  Degrees of freedom  Mean squares  F ratio  p-value
Three DNA templates    0.570           2                   0.30          18.68    0.0027
Call Rate              0.090           6                   0.02
Total                  0.660           8
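The pair-wise comparisons in Table 2 can be approximated with a Bonferroni-adjusted t test that uses the pooled within-group mean square from the ANOVA. The sketch below rests on several assumptions: that the original analysis pooled the variance in this way, that the rounded call rates of Table 1 are an adequate input, and that a closed-form t distribution valid only for 6 degrees of freedom is acceptable in place of a statistics library. All names are hypothetical, and the t values are expected to be close to, not identical with, those in the table.

```python
import math
from itertools import combinations

# Call rates per DNA template (samples S1, S2, S3), copied from Table 1.
CALL_RATES = {
    "PCR-FTA": [0.9671, 0.9689, 0.9641],
    "PCR-dgDNA": [0.9555, 0.9582, 0.9643],
    "dgDNA": [0.1851, 0.6112, 0.4796],
}


def mean(values):
    return sum(values) / len(values)


def within_group_mean_square(groups):
    """Pooled residual mean square and its degrees of freedom."""
    ss = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df = sum(len(g) - 1 for g in groups.values())
    return ss / df, df


def t_cdf_df6(t):
    """CDF of Student's t with exactly 6 degrees of freedom (closed form)."""
    x = t / math.sqrt(t * t + 6.0)
    u = 1.0 - x * x
    return 0.5 + 0.5 * x * (1.0 + u / 2.0 + 3.0 * u * u / 8.0)


def bonferroni_pairs(groups, alpha=0.05):
    """Pair-wise t tests with Bonferroni-adjusted two-sided p-values."""
    ms_within, df = within_group_mean_square(groups)
    if df != 6:
        raise ValueError("t_cdf_df6 only covers 6 residual degrees of freedom")
    pairs = list(combinations(groups, 2))
    results = []
    for a, b in pairs:
        diff = mean(groups[a]) - mean(groups[b])
        se = math.sqrt(ms_within * (1 / len(groups[a]) + 1 / len(groups[b])))
        t = diff / se
        p_raw = 2.0 * (1.0 - t_cdf_df6(abs(t)))
        p_adj = min(1.0, p_raw * len(pairs))  # Bonferroni correction
        results.append((a, b, diff, t, p_adj, p_adj < alpha))
    return results


for a, b, diff, t, p_adj, significant in bonferroni_pairs(CALL_RATES):
    print(f"{a} vs {b}: diff = {diff:.3f}, t = {t:.2f}, "
          f"adjusted p = {p_adj:.4f}, significant = {significant}")
```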
DISCUSSION
DNA collected for SNP analysis needs to be of sufficient quality to ensure high genotype call
rates. Association studies investigating the underlying factors of complex diseases
increasingly require sustainable high-quality DNA resources for large-scale single-nucleotide
polymorphism (SNP) genotyping (Paynter et al., 2006).
While venous blood is often considered the optimal source for DNA, the invasiveness and cost
of obtaining venous blood samples can be prohibitive, especially for large-scale human studies
or those that deal with livestock and wild animals. Additionally, fresh samples collected in the
field may experience degradation before they can be processed. Previous research has shown
that multiple genomic sources, including lymphocytes (Dictor et al., 2007), buccal cells (Milne
et al., 2006), sperm (Yoshihiko and Shin-ichi, 2006) and fingernails (Nakashima et al. 2008),
can be used to generate high-density SNP data provided the DNA sample is of adequate
quality and quantity (Jasmine et al., 2008). The ease of collection, transportation and storage
of samples on FTA™ cards, together with their protection from degradation, provides a possible
solution. McClure et al. (2009) used DNA extracted from cells on FTA™ cards to study SNPs on
Illumina's iSelect BeadChip, which contains 54,122 SNPs. This study
expands on that of McClure et al. (2009) and assesses three different sources of DNA as
suitable templates in a genome-wide association study (GWAS) using Illumina's Human660W-Quad
BeadChip, which contains 660,000 SNP markers.
In this study, three different types of DNA template (PCR-FTA, PCR-dgDNA and dgDNA;
see Materials and Methods) were used for GWAS. A call rate of greater than 95% may be
obtained for GWAS of good-quality DNA on Illumina's Infinium array. On the other hand,
poor-quality DNA, such as degraded DNA, can result in low call rates, with polymorphisms
called erroneously (miscalls) or SNPs not called (no calls). Figure 1
shows that the ratio of “calls” to “no calls” can be highly variable among the three templates. For
instance, degraded DNA (dgDNA) shows a low number of SNP calls, which reduced the call
rate (mean of 42.6%), whereas the use of an amplification step on degraded DNA (PCR-dgDNA)
prior to GWAS improved the call rate (mean of 96.0%). The use of
FTA™ as a DNA collection method also increased the call rate of the samples (mean of
96.6%).
For a SNP to be called or genotyped correctly, it should fall in the call zone
(middle of the darker shade) of the designated AA, AB or BB regions (see Figure 2). Poor-quality
DNA can result in the SNP falling outside the dark shaded area, which results in a "no call" for
the marker. Where an amplification step was used before GWAS, the SNPs fell within the call
zone and were genotyped correctly. Moreover, the highest call rates were obtained when using
DNA from FTA™. This suggests the possibility of using this simple,
relatively inexpensive specimen collection and nucleic acid purification technology as a convenient
method of collecting and storing blood samples before embarking on GWAS.
A further problem when dealing with poor-quality DNA is the miscalled (mistakenly called)
genotype. Figure 3 shows an example of a miscall for SNP rs1008899 in S1
when using degraded DNA. The SNP was genotyped AA, with the call falling outside the call
zone, between the AA and AB areas. When the sample was amplified and subsequently
genotyped, the SNP was called AB, and the call fell in the middle of the shaded area for
AB. Further, the same SNP from the sample sourced from FTA™ confirmed that the call was
indeed AB.
One of the advantages of using the Illumina platform is the ability to study loss of
heterozygosity (LOH). Figure 4 shows the effect of poor-quality DNA on the call rate.
The results for degraded DNA are scattered throughout the plot, and it is difficult to distinguish
whether the call of a SNP is AA, AB or BB. In contrast, with PCR-amplified degraded DNA, the
call rate for SNPs improved. The use of DNA sourced from FTA™ also gave
rise to a high call rate with SNPs genotyped correctly.
Strategies to recover degraded DNA samples for GWAS analyses have previously been used
(Ballantyne et al., 2007), one of which is based on an amplification step prior to the
pre-amplification step that occurs during the GWAS assay (Ryo et al., 2007). In this study,
each degraded DNA sample (10 ng/µl) was amplified using Sigma's GenomePlex® kit,
followed by a clean-up step performed using Promega's PCR purification kit. This additional
amplification step before the GWAS assay proper improved the call rate from 19% to
96% in the first sample (S1 in Table 1). The call rates in S2 and S3 also improved to 96%,
from 61% and 48%, respectively (see Table 1).
Quality control (QC) algorithms for GWAS have been incorporated in the analysis process to
assess, evaluate and guarantee the quality of genotyping. The BeadStudio analysis software
package provides several convenient QC modules, such as the Box Plot, a useful tool to
quickly visualise the variation within an array and between arrays. The log R ratio provides
a measure of noise in the data. The typical values associated with high-quality data range
from 0.1 to 0.25. Figure 5 shows that results generated from DNA extracted from FTA™ had
the least noise of the three templates. This provides some degree of confidence that DNA
from biological samples collected and stored on this matrix can be used for genome-wide
studies. The p value of 0.0027 obtained from ANOVA shows a significant difference between
the three templates. Bonferroni's pair-wise comparison was also performed and showed
significant differences (Table 2) between degraded DNA and both PCR-amplified degraded DNA
and PCR-amplified DNA from FTA™. Although the three
samples discussed to this point show a call rate of 96%, a further analysis of 23
samples using DNA from FTA™ gave an average call rate of 99% (data not shown).
Furthermore, studies have shown that blood spots on FTA™ cards are a
more efficient source of DNA for studying genetic polymorphisms, including STR analysis
(Guangyun et al., 2005). DNA from neonatal blood stored for over 10 years on
Guthrie cards has been successfully extracted for whole-genome microarray analysis using a
modified FTA™ technology known as GenSolve. In contrast, the traditional procedures of
strong alkali or heat treatment used for DNA extraction compromised the physical and
chemical integrity of the nucleic acid (Hardin et al., 2009).
FTA™ has received considerable interest from other sectors of bioscience, such as forensics,
as a non-invasive and cost-effective means of obtaining DNA in large-scale studies.
FTA™ cards have also been shown to be compatible with virtually all cell types (McClure et
al., 2009). While early studies showed that DNA harvested from FTA™ cards was
suitable for genotyping 1,516 SNPs on the Illumina GoldenGate platform and 10,000 SNPs
on the Affymetrix 10K GeneChip, more recently FTA™ cards have been shown to be
suitable for high-throughput genotyping on the Illumina iSelect platform, which currently
assays up to 200,000 SNPs. McClure et al. (2009) concluded that FTA™ cards provide an
excellent medium for harvesting DNA from multiple cell types, and that, when assayed using
the Illumina iSelect technology, this DNA yields high genotype call rates and reproducibility,
particularly when it is extracted using the GenSolve kit. DNA from FTA™
cards has also been used in the Illumina GoldenGate BeadArray assay in ovarian cancer studies to
assess its performance after multiple displacement amplification WGA
(Cunningham et al., 2008). In this study, DNA from FTA™ was
successfully used on Illumina's chip containing 660,000 SNPs and showed the highest
rate of accurate calls in comparison to the other DNA sources, amplified and non-amplified
genomic DNA.
In conclusion, FTA™ cards capture nucleic acid in one easy step. Captured nucleic acid is
ready for downstream applications in less than 30 min. Nucleic acids collected on FTA™
cards are stable for years at room temperature. FTA™ cards are stored at room temperature
before and after sample application, reducing the need for laboratory freezers. They are
suitable for virtually any cell type and any genotyping platform. FTA™ cards come with a
built-in indicator that changes colour upon sample application to facilitate handling of
colourless samples. They are available in a variety of configurations to meet application
requirements. They have been widely used in the fields of forensics, transgenics, transfusion
medicine, plasmid screening, food and agriculture testing, drug discovery, genomics, STR
analysis, animal identification, diagnostics, pharmacogenomics and molecular biology. Thus,
FTA™ cards are a routine and cost-effective technology that provides a simple method for the
preservation of biospecimens and is amenable to high-throughput DNA extraction, all attributes
required to undertake successful GWAS in an efficient manner.
ACKNOWLEDGEMENTS
This is publication number HA09-0003 of the Centre for Forensic Science at the University of
Western Australia. Ms. Alsafar is a Ph.D. scholar at the University of Western Australia
supported by the Dubai Police General Head Quarters in the United Arab Emirates. Funding
for this project was provided by the Emirates Foundation, and support was also kindly
provided by Ali Ridha, the director of the Central Veterinary Research Laboratory (CVRL) in
Dubai, United Arab Emirates.
CONFLICT OF INTEREST
All authors declare that they have no conflict of interest.
REFERENCES
Ballantyne KN, van Oorschot RA, Mitchell RJ. 2007. Comparison of two whole genome
amplification methods for STR genotyping of LCN and degraded DNA samples.
Forensic Sci Int 166:35-41.
Crabbe MJ. 2003. A novel method for the transport and analysis of genetic material from
polyps and zooxanthellae of scleractinian corals. J Biochem Biophys Methods 57:171-
176.
Cunningham JM, Sellers TA, Schildkraut JM, Fredericksen ZS, Vierkant RA, Kelemen LE,
Gadre M, Phelan CM, Huang Y, Meyer JG, Pankratz VS, Goode EL. 2008.
Performance of amplified DNA in an Illumina GoldenGate BeadArray assay. Cancer
Epidemiol Biomarkers Prev 17:1781-1789.
Dictor M, Skogvall I, Warenholt J, Rambech E. 2007. Multiplex polymerase chain reaction on
FTA cards vs. flow cytometry for B-lymphocyte clonality. Clin Chem Lab Med
45:339-345.
Foster EA, Jobling MA, Taylor PG, Donnelly P, de Knijff P, Mieremet R, Zerjal T, Tyler-
Smith C. 1998. Jefferson fathered slave's last child. Nature 396:27-28.
Gill P, Jeffreys AJ, Werrett DJ. 1985. Forensic application of DNA 'fingerprints'. Nature
318:577-579.
Guangyun S, Ritesh K, Prodipto P, Michael W, Diane S, Hong C, Mei L, Ranajit C, Li J,
Ranjan D. 2005. Whole-genome amplification: relative efficiencies of the current
methods. Legal medicine (Tokyo, Japan) 7:279-286.
Hardin J, Finnell RH, Wong D, Hogan ME, Horovitz J, Shu J, Shaw GM. 2009. Whole
genome microarray analysis, from neonatal blood cards. BMC Genet 10:38.
Harvey ML. 2005. An alternative for the extraction and storage of DNA from insects in
forensic entomology. J Forensic Sci 50:627-629.
Jasmine F, Ahsan H, Andrulis IL, John EM, Chang-Claude J, Kibriya MG. 2008. Whole-
genome amplification enables accurate genotyping for microarray-based high-density
single nucleotide polymorphism array. Cancer Epidemiol Biomarkers Prev 17:3499-
3508.
Martins S, Trigo F, Azevedo L, Silva MJ, Guimaraes JE, Amorim A. 2002. Haplotype study
of microsatellites flanking the t(15;17) breakpoint in acute promyelocytic leukemia
patients from North Portugal. Leukemia 16:1353-1357.
McClure M, McKay S, Schnabel R, Taylor J. 2009. Assessment of DNA extracted from
FTA(R) cards for use on the Illumina iSelect BeadChip. BMC Research Notes 2:107.
Milne E, van Bockxmeer FM, Robertson L, Brisbane JM, Ashton LJ, Scott RJ, Armstrong
BK. 2006. Buccal DNA collection: comparison of buccal swabs with FTA cards.
Cancer Epidemiol Biomarkers Prev 15:816-819.
Moscoso H, Thayer SG, Hofacre CL, Kleven SH. 2004. Inactivation, storage, and PCR
detection of Mycoplasma on FTA filter paper. Avian Dis 48:841-850.
Muthukrishnan M, Singanallur NB, Ralla K, Villuppanoor SA. 2008. Evaluation of FTA cards
as a laboratory and field sampling device for the detection of foot-and-mouth disease
virus and serotyping by RT-PCR and real-time RT-PCR. J Virol Methods 151:311-
316.
Nakashima M, Tsuda M, Kinoshita A, Kishino T, Kondo S, Shimokawa O, Niikawa N,
Yoshiura K. 2008. Precision of high-throughput single-nucleotide polymorphism
genotyping with fingernail DNA: comparison with blood DNA. Clin Chem 54:1746-
1748.
Ndunguru J, Taylor NJ, Yadav J, Aly H, Legg JP, Aveling T, Thompson G, Fauquet CM.
2005. Application of FTA technology for sampling, recovery and molecular
characterization of viral pathogens and virus-derived transgenes from plant tissues.
Virol J 2:45.
Paynter RA, Skibola DR, Skibola CF, Buffler PA, Wiemels JL, Smith MT. 2006. Accuracy of
Multiplexed Illumina Platform-Based Single-Nucleotide Polymorphism Genotyping
Compared between Genomic and Whole Genome Amplified DNA Collected from
Multiple Sources. Cancer Epidemiology Biomarkers & Prevention 15:2533-2536.
Raina A, Dogra TD. 2002. Application of DNA fingerprinting in medicolegal practice. J
Indian Med Assoc 100:688-694.
Ryo I, Takamitsu T, Chinatsu S, Mitsugi I, Kazunari U. 2007. Simple and rapid detection of
the porcine reproductive and respiratory syndrome virus from pig whole blood using
filter paper. Journal of Virological Methods 141:102.
Tack LC, Thomas M, Reich K. 2007. Automated forensic DNA purification optimized for
FTA card punches and identifiler STR-based PCR analysis. Clin Lab Med 27:183-191.
Tolunay B, Raymond KB, Robert JC. 2006. Zinc Supplementation of Young Men Alters
Metallothionein, Zinc Transporter, and Cytokine Gene Expression in Leukocyte
Populations. Proceedings of the National Academy of Sciences of the United States of
America 103:1699-1704.
Yoshihiko F, Shin-ichi K. 2006. Application of FTA® technology to extraction of sperm
DNA from mixed body fluids containing semen. Legal medicine (Tokyo, Japan) 8:43-
47.
CHAPTER 5
CHARACTERISATION OF MHC POLYMORPHIC ALU
INSERTIONS (POALIN) IN A POPULATION OF ARAB
BEDOUINS.
This chapter was submitted to Journal of Evolutionary Biology according to the format presented in "Instruction to Authors" from the publishing house.
Chapter 5
Characterisation of MHC Polymorphic Alu Insertions
(POALIN) in a population of Arab Bedouins.
Chapter 5 describes, for the first time, the distribution of four Alu markers located within the human major
histocompatibility complex (MHC) in the Bedouin population of the Middle East. It
expands on work first presented by Dunn et al. (Journal of Molecular Evolution. 2002; 55:718-26) and
subsequently in Tissue Antigens (2007; 70:136-43). Dunn et al. (2002, 2007) studied the distribution
of these MHC markers in Caucasians, northeastern Thais, Japanese, Malaysian Chinese and
southern Africans. The distributions of the MHC markers in Bedouins were compared with the results
presented in these studies by phylogenetic analysis. Specifically, the chapter establishes the
relationship between Arab populations and other populations previously studied.
The identification of polymorphisms that are unique to these populations will provide an
opportunity to enhance DNA profiling. Ethnic-specific polymorphisms can be used to profile
biological evidence left at a crime scene to provide information that could be useful in an
investigation. The comparative analysis revealed the genotype frequencies of each of these
markers in Bedouins to be identical to those previously reported for Australian Caucasians,
suggesting that the Middle East represents a crossroads from which human populations migrated
toward Asia in the east and Europe to the northwest.
My colleagues and I prepared this manuscript. I carried out all laboratory work at the
Central Veterinary Research Laboratory (CVRL) under the guidance of Dr Khazanehdari. Ms
Pitt optimised the initial PCR conditions for the four Alu markers and Mr Ismail provided
technical assistance. Mr Iaschi assisted with the phylogenetic analysis. Dr Tay guided me
throughout the study, from designing the study to proofreading the manuscript. All the
co-authors have proofread the manuscript.
Characterisation of MHC Polymorphic Alu Insertions (POALIN) in an Arab Bedouin
Population
Habiba S Al Safar1,2, Alison P Pitt1, Stephen P.A. Iaschi1, Motasem W Ismail3, Kamal A
Khazanehdari3, Guan K Tay1
1 Centre for Forensic Science, The University of Western Australia, Crawley, Australia.
2 Dubai Police General Head Quarters, Dubai, United Arab Emirates.
3 Molecular Biology & Genetics, Central Veterinary Research Laboratory, Dubai, United Arab Emirates.
Abbreviated title: Major Histocompatibility Complex, POALINS
Keywords: Major Histocompatibility Complex, Polymorphisms, Bedouins, Alu
insertions
Publication number HA09-0002 of the Centre for Forensic Science at the University of
Western Australia
Corresponding author:
Associate Professor Guan K Tay
Centre for Forensic Science
The University of Western Australia
35 Stirling Highway, Crawley WA 6009, AUSTRALIA
Phone: + 61 8 6488 7286
Fax: + 61 8 6488 7285
Email: [email protected]
ABSTRACT
Polymorphic Alu insertions (POALINs) are widespread throughout the human genome and
have been used in a range of applications, including anthropological analyses of human
populations. For example, the evolutionary relationships between populations of African,
European, and Asian descent have been analyzed by comparing the distribution of specific
POALINs of the major histocompatibility complex (MHC) with those of HLA, complement,
and other polymorphic markers.
In the current study, we have broadened this analysis by focusing on a previously
uncharacterized population, the Bedouin from the Middle East (n = 91). Specifically, we
determined the frequency of individual insertions of four POALINs within the MHC class I
region of this population: AluyMICB, AluyTF, AluyHJ and AluyHF.
We found the genotype frequencies of each of these POALINs in Bedouins to be identical to
those previously reported for Australian Caucasians. For AluyHJ, the highest frequency for
allele*1 was found in Malaysian Chinese, northeastern Thais, Japanese, and Mongolians. The
frequency in Bedouins was similar to that previously reported for Australian Caucasians, each
representing the second highest allele frequency in the current analysis. The African
subpopulations showed a lower frequency of this allele. Phylogenetic analysis of the relative
allele frequencies of AluyHJ in combination with the remaining three POALIN markers
revealed that Bedouins have a similar lineage to Caucasians, at least for the MHC region
studied. The structure of the phylogenetic tree supports the popular contention that humans
originated in Africa. The nature of the clusters suggests that the Middle East represents a
crossroads from which human populations migrated toward Asia in the east and Europe to the
northwest.
INTRODUCTION
The human major histocompatibility complex (MHC) lies on the short arm of chromosome 6,
within a gene-rich region that has been intensively studied. The MHC encodes many genes
that participate in the regulation of the immune system. One of the most striking features of
this region is its high gene density, with many of its component genes having been replicated
to form multigene families [1]. Among the genes within a given family, both single nucleotide
polymorphisms and insertion/deletion elements exist. The clustering of these families and their
highly polymorphic nature have been interpreted to be biologically and evolutionarily
significant, as they are involved in the suppression of recombination events [2]. Consequently,
contemporary MHC haplotypes contain highly specific (haplospecific) sequences. These
haplotypes are preserved over time, resulting in the development of ancestral haplotypes, such
that the sharing of one or more haplotypes between individuals implies that they are related
through a remote but common ancestor [3-7]. More recently, repetitive elements have been
used to refine the definition of MHC ancestral haplotypes, which has allowed the dating of
specific human lineages by evolutionary and anthropological methods [8-11].
One class of repetitive elements, polymorphic Alu insertions (POALINs), are members of an
Alu subfamily that appears to have been inserted into the human genome in relatively recent
evolutionary history [2, 12]. Alu repeats are short stretches of retrotransposable DNA that
were originally characterized by the action of the restriction endonuclease Alu I, which
cleaves double-stranded DNA [13, 14]. POALINs have the ability to copy themselves and
insert into new chromosomal locations, and can be diagnostic at particular genomic regions by
being either present or absent. Because inserted or deleted polymorphisms are genetically
inherited, individuals who share a particular polymorphism are assumed to share a common
ancestor [2]. Because the generation of a new Alu insertion event is rare, POALINs are a
desirable DNA marker for studying the genetic relationships between populations [15-17]. Alu
insertions also allow a large number of screenings to be done simultaneously through a single
polymerase chain reaction (PCR). Specifically, a single pair of PCR primers can generate a
number of different amplification products of a length that can be resolved in agarose gels, and
can thereby be analyzed directly for polymorphisms [16].
Alu insertions are rarely deleted and, even if a deletion occurs, a signature of the original
insertion is left behind. As a direct result of this, Alu-specific sequences are abundant
throughout the genome, where they promote genetic recombination events that are responsible
for large-scale deletions, duplications and translocations [3, 18-21]. Deletions occur mostly in
AT–rich regions, and have been determined to be unlikely to have been created independently
of the insertion of the Alu elements [22].
In this study, we have focused on four MHC class I POALINs (Fig. 1). AluyMICB is located
within the first intron of the MICB gene, in the beta block. AluyTF is located in the region
between the beta and kappa regions, adjacent to the TFIIH and CDSN genes. The remaining
two POALINs, AluyHJ and AluyHF, lie at the beginning and the end of the alpha block, close
to the HLA-J, and the HLA-G and HLA-F genes, respectively.
The ease with which the POALINs can be genotyped has made them valuable lineage markers
for the study of human population genetics and pedigrees, which has increased our
understanding of human diversity and evolution. The four MHC POALINs studied here have
been used in a range of applications, primarily focusing on the anthropological analysis of
human populations [16, 23]. The current study expands on previous analyses of specific
population groups [9, 16, 23-28]. In this paper, we report efforts to define the polymorphisms
of four Alu elements in the class I region of the MHC in a previously unstudied population, the
Bedouins of the Middle East.
MATERIALS AND METHODS
Subjects
The study population consisted of 91 healthy, unrelated, Bedouin individuals, each of whom
gave signed, informed consent based on information provided by the ethics committee of the
Dubai police headquarters.
Genomic DNA
After blood was drawn into EDTA tubes, genomic DNA was extracted using the MagNA
Pure LC Total Nucleic Acid Kit (Roche Applied Science, Indianapolis, IN, USA) according to
the recommendations of the manufacturer. Specifically, 300 μl of whole blood from each
sample was mixed with 200 μl of lysis buffer (50 mM Tris pH 8.0, 100 mM EDTA, 100 mM
NaCl, 1% SDS) to lyse the cell membrane and to release the DNA. The procedure also
included the addition of 40 μl of Proteinase K. Next, 100 μl of isopropanol was added
to remove residual amounts of protein, followed by 500 μl of Inhibitor Removal Buffer
(5 M guanidine-HCl, 20 mM Tris-HCl pH 6.6). The DNA was washed with a buffer (20 mM
NaCl; 2 mM Tris-HCl; pH 7.5) and centrifuged twice at 2,000 rpm. The DNA was then washed
using cold 70% ethanol and centrifuged at 3,000 rpm, and the supernatant was discarded, leaving
purified template DNA that was diluted in TE Buffer (1 mM EDTA; 10 mM Tris-HCl, pH 7.5)
to a concentration of approximately 20 ng/μl. Between 2 μl and 4 μl of DNA was used for each
Polymerase Chain Reaction (PCR) assay.
POALIN PCR assay
The presence or absence of the Alu motif at each of the four loci was determined based on the
predicted size of the PCR product for each of the specific primer pairs designed for each
marker. Table 1 summarizes the primer sequences and annealing temperatures for each
marker. For AluyHJ, AluyHF and AluyMICB, the PCR solution (20 μl) contained 80
ng of DNA template, 10 pmol of each primer, 25 nmol of each deoxyribonucleotide
triphosphate (dNTP), 0.4 units of FastStart Taq polymerase (Roche Applied Science,
Indianapolis, IN, USA), 3 mM MgCl2, and 2 μl of 10× PCR buffer (600 mM Tris-HCl, pH 8.3;
250 mM KCl; 1% Triton X-100; 100 mM β-mercaptoethanol). The AluyTF reaction mixture
included 40 ng of DNA template, 5 pmol of each primer, 0.4 μl of each dNTP, 0.5 units of
FastStart Taq polymerase, 1 μl of 3 mM MgCl2, and 1 μl of 10× PCR buffer. PCR was
performed using a DNA Engine Tetrad Thermal Cycler (Bio-Rad Laboratories, Hercules, CA,
USA), with a single hot start step at 95°C for 10 min to activate the FastStart Taq. A total of 35
cycles were used, each consisting of a 30 sec denaturation step at 95°C, a 30 sec annealing step
(59°C for AluyMICB and AluyHF, 55°C for AluyHJ, and 56°C for AluyTF), and an extension
step at 72°C for 45 sec. A final extension step at 72°C for 10 min completed the run. The
PCR products were separated on 1.5% agarose gels in Tris-Borate EDTA (TBE) on a
horizontal model 192 gel electrophoresis sub-cell (Bio-Rad Laboratories, Hercules, CA,
USA), and the gels were stained with ethidium bromide.
Genotype Analysis
The PCR assays were designed to detect the presence and absence of the insertion or deletion
characteristic of each of the MHC POALINs. In each case, a larger PCR product band
indicated the presence of the Alu element (referred to as allele*2), while the smaller band
indicated the absence of the insertion (allele*1). Allele frequencies were obtained using the
gene counting method [29]: the number of copies of allele*1 seen in the study group was
summed (two copies for each 1, 1 individual and one copy for each 1, 2 individual), and this
value was divided by the total number of alleles in the sample population (twice the number
of subjects). The frequency of the alternative allele (allele*2) is 1 - frequency of allele*1 (Table 2).
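The gene counting step can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in the study; the function name `allele_frequencies` is hypothetical, and the genotype counts are those reported for AluyMICB in Table 2.

```python
def allele_frequencies(n11, n12, n22):
    """Gene counting for a biallelic marker.

    n11, n12, n22 are the numbers of (1,1), (1,2) and (2,2) individuals.
    Returns the frequencies of allele*1 and allele*2.
    """
    total_alleles = 2 * (n11 + n12 + n22)   # two alleles per subject
    p = (2 * n11 + n12) / total_alleles     # allele*1: two per 1,1 and one per 1,2
    q = 1 - p                               # allele*2 is the complement
    return p, q


# AluyMICB genotype counts from Table 2 (65, 22 and 2 individuals)
p, q = allele_frequencies(65, 22, 2)
print(f"allele*1 = {p:.3f}, allele*2 = {q:.3f}")
```

With these counts the sketch gives 0.854 and 0.146, in agreement with the allele frequencies listed for AluyMICB in Table 2.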
The expected genotype frequencies were calculated using the Hardy-Weinberg
equilibrium equation p² + 2pq + q² = 1, where p is the frequency of allele*1 and q
is the frequency of allele*2. The expected frequencies of the two homozygous genotypes
(1, 1 and 2, 2) were calculated by squaring the corresponding allele frequencies (p² and q²),
and the expected frequency of heterozygotes (1, 2) was calculated as double the
product of the frequency of allele*1 and the frequency of allele*2 (2pq). The observed and
expected genotype frequencies were then compared using a chi-squared test. The population was
considered to be in Hardy-Weinberg equilibrium if the observed frequencies did not differ
significantly from those predicted by the equation.
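The comparison of observed and expected genotype counts can be illustrated as follows. This is a minimal sketch under stated assumptions: a chi-squared goodness-of-fit test with one degree of freedom is assumed (consistent with the chi-squared and p values reported in Table 2), the function name `hardy_weinberg_test` is hypothetical, and only the Python standard library is used.

```python
import math


def hardy_weinberg_test(n11, n12, n22):
    """Chi-squared test of Hardy-Weinberg proportions for a biallelic marker.

    Returns (chi2, p_value, expected_heterozygosity).
    """
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)              # frequency of allele*1
    q = 1 - p                                  # frequency of allele*2
    observed = (n11, n12, n22)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, 2 * p * q


# AluyHJ genotype counts from Table 2 (50, 38 and 3 individuals)
chi2, p_value, het = hardy_weinberg_test(50, 38, 3)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, heterozygosity = {het:.3f}")
```

For the AluyHJ counts this sketch returns values that round to those given in Table 2 (chi-squared 1.758, p value 0.185, heterozygosity 0.367).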
Phylogenetic analysis
We used Gendist, a component of the PHYLIP package (version 3.69), to calculate
Nei's genetic distances between the Bedouin population and eight previously studied
populations. The distance matrix was converted to MEGA format, and a neighbour-joining
phylogenetic tree was constructed in MEGA (version 4) [30]. Bootstrap values (1000 replicates,
seed = 64,238) were used to indicate the reliability of the tree topology. DisPan (Genetic
Distance and Phylogenetic) analysis was used to confirm the phylogeny.
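For readers unfamiliar with the distance measure, the textbook form of Nei's standard genetic distance for biallelic loci can be sketched as below. This is an assumption-laden illustration, not a reimplementation of Gendist: the function name `nei_distance` is hypothetical, and the values it produces are not guaranteed to reproduce those in Table 4, which were obtained with the software named above.

```python
import math


def nei_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance, D = -ln(Jxy / sqrt(Jx * Jy)).

    freqs_x and freqs_y hold the allele*2 frequency at each biallelic locus
    for populations X and Y; allele*1 is taken as 1 - allele*2.
    """
    jxy = jx = jy = 0.0
    for qx, qy in zip(freqs_x, freqs_y):
        px, py = 1 - qx, 1 - qy
        jxy += px * py + qx * qy   # shared identity between X and Y
        jx += px * px + qx * qx    # identity within X
        jy += py * py + qy * qy    # identity within Y
    return -math.log(jxy / math.sqrt(jx * jy))


# Allele*2 frequencies (AluyMICB, AluyTF, AluyHJ, AluyHF) taken from Table 3
japanese = [0.118, 0.083, 0.376, 0.064]
northeast_thai = [0.117, 0.086, 0.292, 0.018]
print(nei_distance(japanese, northeast_thai))
```

The distance is zero for identical frequency profiles and grows as the profiles diverge, which is the property the neighbour-joining step relies on.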
RESULTS
The POALIN PCR assay results are shown in Fig. 2. For each locus, the smaller band
corresponds to allele*1, while the larger band represents the allele containing the Alu
insertion (allele*2). Homozygotes would thus be expected to show only
one or the other band, with heterozygotes showing both.
For example, lanes 2, 3, 4, 6 and 9 of the AluyMICB panel show results for individuals
homozygous for allele*1, which yields a 502-base-pair band (denoted 1, 1). In contrast, the
single 664-base-pair band visible in lane 5 corresponds to an individual homozygous for the
AluyMICB insertion allele (denoted 2, 2). Lanes 1, 7, and 8 show results for individuals who
were heterozygous for the AluyMICB element (denoted 1, 2).
Similarly, in the AluyTF panel of Fig. 2, the 710-base-pair product apparent in lane 1 indicates
a subject homozygous for the AluyTF insertion, whereas the single 422-base-pair product visible in
lanes 2, 3, and 5 to 9 indicates individuals homozygous for allele*1. Results for a
heterozygous individual are shown in lane 4.
For AluyHJ (Fig. 2), a single 501-base-pair band indicates a subject
homozygous for the AluyHJ insertion (lane 5), a single 163-base-pair band indicates a subject
homozygous for allele*1 (lanes 1, 2, 3, 6, 7, and 9), and the presence of both bands
indicates a heterozygote (lanes 4 and 8).
For AluyHF, allele*1 yields a 458-base-pair product and the Alu insertion yields a
605-base-pair band. Thus, lane 9 indicates an individual homozygous for the AluyHF insertion,
lanes 1 to 5 represent individuals homozygous for allele*1, and lanes 6, 7, and 8 represent
individuals heterozygous for the AluyHF insertion.
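The band-reading logic described above can be summarised in a short sketch. It is illustrative only: the fragment sizes are those given in Table 1, while the function name `call_genotype` and the 20-base-pair sizing tolerance are assumptions introduced for the example and are not part of the study protocol.

```python
# Expected fragment sizes in base pairs from Table 1: (allele*1, allele*2)
FRAGMENT_SIZES = {
    "AluyMICB": (502, 664),
    "AluyTF": (422, 710),
    "AluyHJ": (163, 501),
    "AluyHF": (458, 605),
}


def call_genotype(locus, band_sizes, tolerance=20):
    """Return '1,1', '1,2' or '2,2' from the band sizes seen in one lane.

    tolerance is a hypothetical allowance (bp) for gel sizing error.
    """
    small, large = FRAGMENT_SIZES[locus]
    has_1 = any(abs(band - small) <= tolerance for band in band_sizes)
    has_2 = any(abs(band - large) <= tolerance for band in band_sizes)
    if has_1 and has_2:
        return "1,2"   # both bands: heterozygote
    if has_2:
        return "2,2"   # larger band only: homozygous for the insertion
    if has_1:
        return "1,1"   # smaller band only: homozygous for absence
    raise ValueError(f"no expected band found for {locus}: {band_sizes}")


print(call_genotype("AluyTF", [710]))        # lane with the larger band only
print(call_genotype("AluyTF", [422, 710]))   # lane with both bands
```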
Genotype frequencies were determined for each locus. For each of the POALINs, the number
of individuals with each genotype, either homozygous for the absence of the element (1, 1),
homozygous for the presence of the element (2, 2), or heterozygous (1, 2) were counted.
Frequencies were then established for each genotype by dividing the number of individuals with
that genotype by the total number of individuals in the population. The frequency of observed
genotypes, allele frequencies, Hardy-Weinberg significance, and heterozygosity for
AluyMICB, AluyTF, AluyHJ and AluyHF in the Bedouin population are shown in Table 2.
AluyHJ was the POALIN in which allele*2 was most frequent, counting both the heterozygous and
homozygous states (0.242), followed by AluyHF (0.225), AluyMICB (0.146), and AluyTF
(0.110). All POALINs were in Hardy-Weinberg equilibrium.
Table 3 shows the comparison between the insertion frequencies of the four MHC POALINs
in the Arab Bedouin population and those of previously studied populations. For each of the
four POALINs, the insertion frequencies in the Bedouin population were similar to those in
Australian Caucasians.
Allele frequencies of the four MHC POALINs in nine populations (Table 3) produced the
genetic distance values (Table 4) that were used to construct the phylogenetic tree shown in
Fig. 3. A theoretical outgroup with a frequency close to zero was used to root the tree. Based
on the ancestral form being the root of the tree, the MHC POALIN data indicated that the four
Asian populations (Malaysian Chinese, Japanese, northeastern Thai, and Mongolian) formed a
cluster, while the Australian Caucasians and the Bedouins were separated from both the Asian
cluster and the African subpopulations.
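[Figure 1 map: the MHC class I region drawn on a scale of 0 to 2,000 kb towards the telomere, divided into the β, κ and α blocks and adjoining the MHC class III and class II regions. Genes shown: BAT1, MICB, MICA, HLA-B, HLA-C, CDSN, DDR1, FLOT1, GNL1, HLA-E, MICC, HLA-30, HLA-92, TRIM26, TRIM31, HLA-J, MICD, HLA-A, MICF, HLA-G, MICG, MICE and HLA-F. POALINs marked: AluyMICB, AluyTF, AluyHJ and AluyHF.]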
Figure 1: The human Major Histocompatibility Complex (MHC) spans approximately 4 megabases and is located at 6p21.3. It is composed of
three subregions: class I, class II, and the central MHC region (also known as MHC class III). The class I region is contained
within a 2,000 kilobase region constituting the telomeric portion of the human MHC. Above is a map of the approximate
locations of the four polymorphic Alu insertions (POALINs: AluyMICB, AluyTF, AluyHJ and AluyHF), the HLA class I loci and
related genes within the MHC class I region.
[Figure 2 gel panels, each with lanes 1 to 9, a molecular weight marker (MW; 1,500 bp and 500 bp bands indicated), and negative (-ve) and positive (+ve) controls. AluyMICB: 664 bp and 502 bp products. AluyTF: 710 bp and 422 bp products. AluyHJ: 501 bp and 163 bp products. AluyHF: 605 bp and 458 bp products.]
Figure 2: Gel photograph illustrating the genotypes of nine subjects for the four MHC Class I POALINs studied. PCR assays were designed
to detect the presence or absence of the insertion at four POALINs: AluyMICB, AluyTF, AluyHJ and AluyHF. For each of the four POALINs, the larger PCR product represents the presence of the insertion (referred to as allele*2) and the smaller product represents the absence of the insertion (allele*1). For example, in the panel showing the amplification products for AluyTF, an individual who is homozygous for the larger allele*2 product (710 base pairs) containing the insertion is shown in lane 1 (genotype 2,2). An individual with the heterozygous genotype (1,2) is shown in lane 4. The remaining seven samples were homozygous for the smaller 422-base-pair product without the insertion (1,1). The same allele convention, allele*1 for the smaller product and allele*2 for the larger, is also used for AluyMICB, AluyHJ and AluyHF.
Table 1: The primer sequences and the predicted product size of PCR amplified products of the four POALIN loci.
Aluy locus  Primer name  Primer sequence (5' - 3')                Accession no.  Position         allele*1 (bp)  allele*2 (bp)  Annealing temperature
AluyMICB    AluyMICB.F   GCC TTC CAA TGC CAT TCA CAG              AC006046       38,921-38,941    502            664            59°C
            AluyMICB.R   CTC AGC CCT GCT TTC CCA TCT              AC006046       38,277-38,297
AluyTF      AluyTF.F     GTG CCT GGT AAA AAT TTA AGA GCT GTA      AC005530       7,150-7,177      422            710            56°C
            AluyTF.R     TGC ACC CGG CCT AAA ACC ACT GGT T        AC005530       7,836-7,859
AluyHJ      AluyHJ.F     AAG AAA CCC ATA ACT CAC TTG              AP000519       11,430-11,450    163            501            55°C
            AluyHJ.R     TGT GTC CAG GTT AAA CTT CAG              AP000519       11,909-11,929
AluyHF      AluyHF.F     GCC TCA TGG CCT GAA TCT GCC AGT GTC CTT  AP000521       124,367-124,396  458            605            59°C
            AluyHF.R     GTA ACT GAC GTG CCC TCT ATA GTA TAG TCT  AP000521       124,794-124,825
Table 2: The frequency of the observed genotypes, allele frequencies, Hardy-Weinberg significance and heterozygosity for
AluyMICB, AluyTF, AluyHJ and AluyHF in the Bedouin population.
Aluy locus  n   Genotypes observed(a)  Allele frequencies  Chi-squared  p value  Heterozygosity
                1,1   1,2   2,2        Allele*1  Allele*2
AluyMICB    89  65    22    2          0.854     0.146     0.007        0.931    0.249
AluyTF      91  70    20    1          0.890     0.110     0.157        0.745    0.196
AluyHJ      91  50    38    3          0.758     0.242     1.758        0.185    0.367
AluyHF      91  53    35    3          0.775     0.225     0.944        0.330    0.349
(a) Genotypes: 1,1 homozygote absent; 1,2 heterozygote; and 2,2 homozygote present.
Table 3: The allele frequencies of four MHC POALINs in 9 different populations used for genetic distance calculation.
Population                          n      POALIN allele*2 frequencies(b,c)          Reference
                                           AluyMICB  AluyTF  AluyHJ  AluyHF
Bedouins                            89-91  0.146     0.110   0.242   0.225
Australian Caucasian                105    0.157     0.107   0.073   0.038     (24)
Japanese                            87     0.118     0.083   0.376   0.064     (25)
Malaysian Chinese                   50     0.170     0.040   0.300   0.030     (23)
North-Eastern Thai                  192    0.117     0.086   0.292   0.018     (28)
Mongolian Khalkh                    41     0.378     0.220   0.293   0.098
South African South Eastern Bantu   50     0.030     0.100   0.070   0.090
South African Kung San              42     0.036     0.283   0.107   0.060
South African Sekele San            60     0.050     0.034   0.050   0.083
(b) POALIN = polymorphic Alu insertions. (c) The alternative allele (allele*1) = 1 - frequency of allele*2.
Table 4: Genetic distance values from the four POALIN allele frequencies in nine different populations.
Population                             Genetic distance values
                                       1       2       3       4       5       6       7       8       9
1 Australian Caucasian                 -
2 Japanese                             0.0119
3 North-Eastern Thai                   0.0110  0.0027
4 Chinese                              0.0109  0.0035  0.0016
5 Mongolian Khalkh                     0.0267  0.0308  0.0272  0.0234
6 South African South Eastern Bantu    0.0150  0.0299  0.018   0.0233  0.0526
7 South African Kung San               0.0272  0.0384  0.0257  0.0367  0.0497  0.0101
8 South African Sekele San             0.0158  0.0315  0.0191  0.0219  0.0542  0.0013  0.0181
9 Bedouins                             0.0003  0.0149  0.014   0.0143  0.0306  0.0147  0.0273  0.0158
Root                                   0.0275  0.0392  0.0237  0.0276  0.0678  0.0039  0.0212  0.0020  0.0280
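[Figure 3 tree, tips in order: Mongolian Khalkh, Japanese, North-Eastern Thai, Malaysian Chinese, Australian, Bedouins, South African Kung San, South African South Eastern Bantu, South African Sekele San and Root. Scale bar: 0.002. Bootstrap values shown at the nodes: 58, 36, 48, 80, 51, 67 and 85.]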
Figure 3: Phylogenetic relationship of Bedouins and other studied populations using calculated distances based on frequency data from the
four studied POALINs.
DISCUSSION
The insertion allele frequencies of the MHC POALINs in different populations are generally below
0.4. Thus, the Alu insertion frequencies of the MHC POALINs are lower than those of many
other chromosomal POALINs that have been studied in other populations [23]. This has
allowed these markers to afford a closer comparison of specific populations, such as the
African subpopulations, and has allowed their similarities and differences to be resolved
more precisely.
In this study, we have applied, to the Bedouin population, four MHC POALIN lineage and
linkage markers that have previously been found to be informative in investigating the
ancestral relationships between other populations. These markers have also been shown to
associate with specific groups of HLA class I alleles, microsatellites, and MHC ancestral
haplotypes, which together may help to better identify variation in linkage disequilibrium and
historical recombination events. For example, according to Dunn et al. [23], the AluyMICB*2
allele shows a strong association with four different HLA-B alleles: HLA-B13, HLA-B44,
HLA-B48, and HLA-B57. The AluyHJ*2 allele is strongly associated with HLA-A24 and with
HLA-A1. The association of these Aluy insertions and the distribution of HLA alleles suggests
that there may have been recombination between different haplotypes, rather than separate Alu
insertion events in individuals carrying various HLA alleles [9].
The highest allele frequency of any MHC POALIN insertion in the Middle Eastern Bedouin
samples was 0.242, detected for allele*2 of AluyHJ. When compared with other populations,
the AluyHJ allelic distribution in Bedouin individuals was similar to that reported by Dunn et
al. in Australian Caucasians [24]. Furthermore, the relative frequencies of the AluyHJ alleles
place the Japanese, northeastern Thai, Malaysian Chinese, and Mongolian Khalkh in a separate
cluster from both the present study population and the three previously examined African
subpopulations.
Allele*2 of AluyHF had the second highest allele frequency for the Middle Eastern Bedouin
samples. A comparison with data generated by Dunn and co-workers [24] indicates that the
AluyHF allelic distribution in Bedouin individuals was, again, closest to the Australian
Caucasian allele*2 frequency of 0.038. Similar results were observed for allele*2 of
AluyMICB, which had a frequency of 0.146 in Bedouins versus 0.157 in Caucasians.
Moreover, AluyTF, with a frequency of 0.110, again presented similarity to the Australian
Caucasian population (0.107). Thus, while the Japanese, northeastern Thai, and Malaysian
Chinese appear to share a similar allele frequency distribution, a distinct frequency of the
AluyTF insertion in African subpopulations has given rise to a more sparse topology on the
phylogenetic tree (Fig. 3). These data are in accordance with the hypothesis that early humans
originated in Africa (the “out-of-Africa” hypothesis), with the Middle East having acted as a
crossroads from which populations then migrated east to Asia and northwest to Europe.
In forensic DNA applications, POALINs are potentially useful DNA markers for population
identification. Specifically, they can complement other markers used in forensic science by
assisting in identifying the racial background of individuals. The results presented in this study
should form a basis for research on further racial subpopulations, such as the Middle Eastern
Bedouin, the larger Middle Eastern population, and others. This in turn may provide a more
accurate and complete forensic population database for the region, and enhance the utility of
POALINs as a forensic tool in these geographical regions.
It is of interest that there is some coincidence between scientific findings and the religious
beliefs of populations. For example, according to both Islamic and Christian scriptures, the earth was
completely destroyed during a catastrophic flood, and Noah the prophet and his family
were the sole survivors to continue the human race. According to the Qur’an (Surah Hud
11:27-51), the present population of the world is descended from Noah's three sons: Shem,
Ham, and Japheth. It is believed that Africans are ancient descendants of Ham, that Shem is
the founder of the Arabs and Caucasians, and that Asians are Japheth’s
descendants. According to the Bible, all humans descend from Noah through his three sons,
Shem, Japheth and Ham. Genesis lists seventy descendants of Noah, saying: “from these the
nations were spread about in the earth” (Genesis 10:32). One of the many ways in which these
nations have been classified is with reference to skin colour. The presence of melanin in the skin
of humans, which provides protection against the elements, is believed to be an important trait. Noah
and his three sons all had a measure of this dark pigment. From Shem came the Babylonians,
the Assyrians, the Jews and the Arabs, who vary from fair to light brown skin. The descendants
of Japheth, who include the Indo-European races, vary from light skin to dark brown. As for
Ham, some but not all of his descendants had dark skin. The Egyptians, with light-brown skin,
descended from Ham’s son Mizraim. Therefore the Bible presents Egypt as the land of Ham
(Psalms 78:51; 105:23, 27; 106:22). To unravel the mysteries of these texts and to shed light on
the interracial relationships, research is required.
In summary, based on analysis of the four POALIN markers we have examined here, the
populations we analysed segregate into 3 phylogenetic groups: (1) the Asian subpopulation,
(2) the Bedouins and Caucasians, and (3) the three included African subpopulations. We hope
this study will stimulate further analyses of the Bedouin population, so that we may
understand better both their unique genetic background and the diseases that affect this group
of individuals.
ACKNOWLEDGEMENTS
We would like to thank Ali Ridha, Director of the Dubai Central Veterinary Research
Laboratory (CVRL), for approving the work carried out for this study in the first instance.
Funding for this project was provided by the CVRL. Ms Alsafar is a PhD scholar at the
University of Western Australia supported by the Dubai Police General Head Quarters in the
United Arab Emirates.
REFERENCES
1. Leelayuwat, C., et al., A new polymorphic and multicopy MHC gene family related to
nonmammalian class I. Immunogenetics, 1994. 40(5): p. 339-51.
2. Kulski, J.K., et al., Comparative genomic analysis of the MHC: the evolution of class I
duplication blocks, diversity and complexity from shark to man. Immunol Rev, 2002.
190: p. 95-122.
3. Dawkins, R., et al., Genomics of the major histocompatibility complex: haplotypes,
duplication, retroviruses and disease. Immunol Rev, 1999. 167: p. 275-304.
4. Dawkins, R.L., Martin, E., Andreas-Ziets, A., Keller, Partanen, J., Arnaiz-Villena, A.,
Vicario, J.L. & Alper, C.A., Linkage disequilibrium, interlocus association and
ancestral haplotypes. Immunobiology of HLA, ed. B. Dupont. Vol. I. 1989, New
York: Springer-Verlag. p. 891.
5. Dawkins, R.L., Degli-Esposti, M.A., Abraham, L.J., Zhang, W.J. & Christiansen, F.T. ,
Conservation versus polymorphism of the MHC in relation to transplantation, immune
responses and autoimmune disease. Molecular evolution of the major
histocompatibility, ed. J.K.D. Klein. 1991, Heidelberg: Springer-Verlag. p. 391.
6. Degli-Esposti, M.A., et al., Ancestral haplotypes: conserved population MHC
haplotypes. Hum Immunol, 1992. 34(4): p. 242-52.
7. Zhang, W.J., et al., Differences in gene copy number carried by different MHC
ancestral haplotypes. Quantitation after physical separation of haplotypes by pulsed
field gel electrophoresis. J Exp Med, 1990. 171(6): p. 2101-14.
8. Begovich, A.B., et al., Polymorphism, recombination, and linkage disequilibrium
within the HLA class II region. J Immunol, 1992. 148(1): p. 249-58.
9. Dunn, D.S., B.D. Tait, and J.K. Kulski, The distribution of polymorphic Alu insertions
within the MHC class I HLA-B7 and HLA-B57 haplotypes. Immunogenetics, 2005.
56(10): p. 765-8.
10. Skaug, H.J., Allele-sharing methods for estimation of population size. Biometrics,
2001. 57(3): p. 750-6.
11. Wakeley, J., et al., The discovery of single-nucleotide polymorphisms--and inferences
about human demographic history. Am J Hum Genet, 2001. 69(6): p. 1332-47.
12. Buffery, C., et al., Allele frequency distributions of four variable number tandem
repeat (VNTR) loci in the London area. Forensic Sci Int, 1991. 52(1): p. 53-64.
13. Batzer, M.A., et al., African origin of human-specific polymorphic Alu insertions. Proc
Natl Acad Sci U S A, 1994. 91(25): p. 12288-92.
14. Jurka, J., et al., Active Alu elements are passed primarily through paternal germlines.
Theor Popul Biol, 2002. 61(4): p. 519-30.
15. Batzer, M.A., et al., Structure and variability of recently inserted Alu family members.
Nucleic Acids Res, 1990. 18: p. 6793-8.
16. Dunn, D.S., et al., Polymorphic Alu insertions and their associations with MHC class I
alleles and haplotypes in the northeastern Thais. Ann Hum Genet, 2005. 69(Pt 4): p.
364-72.
17. Walsh, E.C., et al., An integrated haplotype map of the human major histocompatibility
complex. Am J Hum Genet, 2003. 73(3): p. 580-90.
18. Anzai, T., et al., Comparative sequencing of human and chimpanzee MHC class I
regions unveils insertions/deletions as the major path to genomic divergence. Proc
Natl Acad Sci U S A, 2003. 100(13): p. 7708-13.
19. Hedrick, P.W., R.N. Lee, and D. Garrigan, Major histocompatibility complex variation
in red wolves: evidence for common ancestry with coyotes and balancing selection.
Mol Ecol, 2002. 11(10): p. 1905-13.
20. Mungall, A.J., et al., The DNA sequence and analysis of human chromosome 6. Nature,
2003. 425(6960): p. 805-11.
21. Takasu, M., et al., Deletion of entire HLA-A gene accompanied by an insertion of a
retrotransposon. Tissue Antigens, 2007. 70(2): p. 144-50.
22. Callinan, P.A., et al., Alu retrotransposition-mediated deletion. J Mol Biol, 2005.
348(4): p. 791-800.
23. Dunn, D.S., et al., The distribution of major histocompatibility complex class I
polymorphic Alu insertions and their associations with HLA alleles in a Chinese
population from Malaysia. Tissue Antigens, 2007. 70(2): p. 136-43.
24. Dunn, D.S., et al., The association between HLA-A alleles and young Alu dimorphisms
near the HLA-J, -H, and -F genes in workshop cell lines and Japanese and Australian
populations. J Mol Evol, 2002. 55(6): p. 718-26.
25. Dunn, D.S., et al., Association of MHC dimorphic Alu insertions with HLA class I and
MIC genes in Japanese HLA-B48 haplotypes. Tissue Antigens, 2003. 62(3): p. 259-62.
26. Yao, Y., et al., Polymorphic Alu insertions and their associations with MHC class I
alleles and haplotypes in Han and Jinuo populations in Yunnan Province, southwest of
China. J Genet Genomics, 2009. 36(1): p. 51-8.
27. Yao, Y., et al., The association between HLA-A, -B alleles and major
histocompatibility complex class I polymorphic Alu insertions in four populations in
China. Tissue Antigens, 2009. 73(6): p. 575-81.
28. Kulski, J.K. and D.S. Dunn, Polymorphic Alu insertions within the Major
Histocompatibility Complex class I genomic region: a brief review. Cytogenet Genome
Res, 2005. 110(1-4): p. 193-202.
29. Ceppellini, R., M. Siniscalco, and C.A. Smith, The estimation of gene frequencies in a
random-mating population. Ann Hum Genet, 1955. 20(2): p. 97-115.
30. Kumar, S., K. Tamura, and M. Nei, MEGA: Molecular Evolutionary Genetics Analysis.
1993, University Park, PA: Pennsylvania State University.
CHAPTER 6
A GENOME WIDE SEARCH FOR TYPE 2 DIABETES
SUSCEPTIBILITY GENES IN ARAB FAMILIES.
This chapter is a submission to Human Molecular Genetics, and its format follows
the "Instruction to Authors" from the publishing house.
Chapter 6 was prepared as a manuscript and has been submitted to Human Molecular
Genetics. The aim of the study presented in this manuscript was to identify loci that could
potentially influence susceptibility to Type 2 Diabetes (T2D) in patients of Arab descent within
the United Arab Emirates (UAE) population. Data on DNA haplotypes in the tribes of the
Middle East are limited, and recent advances in DNA technology have provided the opportunity
to study this ethnic group. In this specific study, high-throughput DNA arrays were used to
study Single Nucleotide Polymorphisms (SNPs) and their influence on Type 2 Diabetes among
Arabs.
To date, no genome wide screen for genetic factors of Type 2 Diabetes has been performed in the
UAE population or in any other Arab population. To address this, the first Genome Wide Association
Study in Bedouins was performed on 178 volunteers from the DNA repository developed for this
particular study, using Illumina's Human 660W-Quad BeadChip. Work in Caucasians has
previously defined genetic susceptibility regions on Chromosomes 3, 6, 8, 9, 10, 11, 16, and
17. Analysis of data from this study has revealed potential candidate genes on Chromosome
14.
This study revealed some novel genes in the etiology of Type 2 Diabetes in the Arab population of
the UAE. The strongest associations were found within the PRKD1 region on 14q11 of
chromosome 14. Associations with the genes RBM47, KCTD8, GABRB1, SCD5, OC90 and TG
were observed as well. The fact that PRKD1 has not been found in previous studies may
be due to chance sampling variation or power differences, or may be explicable in terms
of a higher level of genetic and environmental heterogeneity in the other populations
compared with the Arab population. Further replication and fine
mapping in a larger cohort of Arab descent will be
essential to validate the results presented here.
My colleagues and I prepared this manuscript. I performed all laboratory work at the
Central Veterinary Research Laboratory (CVRL) under the guidance of Dr Khazanehdari. I
performed the data analyses with assistance from co-authors. Specifically, Dr Jafer provided
technical assistance and Dr Jamieson assisted with the statistical analysis. Drs Cordell and
Blackwell provided endless support and advice regarding the statistical methods and analyses.
Dr Tay guided me throughout the study, from designing the study to proofreading the
manuscripts. All the co-authors have proofread the manuscript.
A Genome Wide Search for Type 2 Diabetes Susceptibility Genes in Arab Families.
Habiba S Al Safar1, 2, Heather J Cordell3, Osman Jafer4, Sarra E Jamieson5, Kamal Khazanehdari4,
Jenefer M Blackwell5, 6, Guan K Tay1
1 Centre for Forensic Science, The University of Western Australia, Crawley, Western Australia.
2 Dubai Police General Head Quarters, Dubai, United Arab Emirates.
3 Institute of Human Genetics, Newcastle University, Newcastle upon Tyne, United Kingdom.
4 Molecular Biology and Genetics Laboratory, Central Veterinary Research Laboratory, Dubai,
United Arab Emirates.
5 Telethon Institute for Child Health Research, Centre for Child Health Research, The University
of Western Australia, Subiaco, Western Australia.
6 Cambridge Institute for Medical Research and Department of Medicine, School of Clinical
Medicine, University of Cambridge, Cambridge, United Kingdom.
Corresponding author:
Associate Professor Guan K Tay
Centre for Forensic Science
The University of Western Australia
35 Stirling Highway, Crawley WA 6009, AUSTRALIA
Phone: + 61 8 6488 7286
Fax: + 61 8 6488 7285
Email: [email protected]
ABSTRACT
Type 2 Diabetes (T2D) is currently the fastest growing debilitating disease in the world. In the
United Arab Emirates (UAE), it has been estimated that one out of five people between the ages
of 20 to 79 lives with this disease. Due to an increasing prevalence of T2D in the region, lifestyle
management strategies with an emphasis on prevention are required. Determining genetic risk
factors can also make an important contribution to understanding the processes leading to disease.
A genome wide association study (GWAS) using a family based association test (FBAT) was
undertaken in an extended family of 178 members from the UAE (66 diabetic and 112 healthy
individuals), who were genotyped using the Illumina Human 660 Quad chip array, in order to identify
gene(s) and mechanisms associated with disease.
The study revealed 21 new association signals from single nucleotide polymorphisms (SNPs)
within five genes (RBM47, KCTD8, GABRB1, SCD5 and PRKD1). Six SNPs within PRKD1 on
chromosome 14 were found to be most strongly associated with T2D in this Arab population. It
has been suggested that PRKD1, a serine/threonine kinase, plays an important role in insulin
secretion. The strongest statistical evidence for this new association signal was from rs10144903
in intron 1 of PRKD1, with the overall estimate of effect returning an odds ratio of 3.72 (95%
confidence interval, 1.28-10.82; p-value = 3.92E-06). This study is the first GWAS for T2D in
families of Arab descent, and these findings may provide important insights into the pathogenesis
of T2D in Middle Eastern populations. Comparative analysis with other ethnic groups could assist
in dissecting the mechanisms that cause the disease.
INTRODUCTION
Diabetes mellitus is a group of metabolic diseases characterised by hyperglycemia resulting from
defects in insulin secretion, insulin action, or both [1, 2]. Diabetes is one of the most prevalent
chronic diseases. It results in significant morbidity and contributes to the death of millions of
people worldwide. Currently, over 170 million people globally suffer from Type 2 Diabetes
(T2D) [3]. Most of these patients are middle aged. However, earlier age-of-onset is becoming
more common as a result of changes in lifestyle and behavioural factors interacting with genetic
predispositions. Ethnicity is also a risk modifier as people of certain ethnic backgrounds are more
likely to develop diabetes than others. It has been reported that African Americans, Hispanic
Americans, American Indians, some Asian Americans and Pacific Islanders are particularly at
high risk for T2D [4], however the genetic factors that account for this observation have yet to be
identified.
As suggested, genetics plays a role in the disease, but exactly how certain genes may cause
diabetes is unknown. An understanding of the genetic basis of T2D could lead to the development
of new treatments to target the problem. With the increase in prevalence of diabetes worldwide,
the need for intensive research is of high priority [5]. Towards this, researchers now have access
to a set of powerful tools that make it possible to find the genetic contributions to common
diseases. Microarray technology has given rise to high throughput and high-density strategies
such as genome-wide association studies (GWAS). Combined with the scaffold data of the human
genome from the Human Genome Project, completed in 2003 [6], and the International HapMap
Project in 2005 [7], it is now possible to analyse whole-genome samples for genetic variations
that contribute to common disease in a fast and efficient manner.
The use of GWAS has greatly increased the number of confirmed genetic loci for T2D in many
different populations such as Pima Indian [8], Mexican American [9], Amish [10], French [11],
Japanese [12], Icelandic [13], Finnish [14], Chinese [15], Korean [15], Caucasian [16-19] and
Swedish [20]. Moreover, some of the mapped loci have been observed to be common across
multiple populations. For example, the single nucleotide polymorphism (SNP) rs7903146 in the
TCF7L2 gene has been found to be associated with T2D in French, Japanese, Finnish, Irish,
British, Israeli and German populations [11, 13, 17, 19, 21-24]. Other regions, however, may be
unique to specific populations (e.g. rs2237892 in KCNQ1 has been found exclusively in the
Japanese population) [12, 15]. This may reflect underlying phenotypic heterogeneity, racial/ethnic
differences in susceptibility allele frequencies, or differences in sample size, study design, and
analytical methods. Understanding the similarities in ethnic specific associations as well as
difference in the genetic make-up of different ethnic groups, particularly for a disease that occurs
globally, is important for unravelling the genetic architecture.
In contrast to most major population groups, Middle Eastern populations have been the subject of
little research, which has created a serious gap in understanding the trend of common diseases
such as diabetes within these populations. Compounding the problem is the fact that T2D has
become a major public health problem in the UAE as the level of affluence has increased. Malik
et al. (2005) estimated that 25% of UAE citizens suffer from T2D [25] and the prevalence of the
disease is increasing [26].
This GWAS was conceived to investigate and identify the genes that may influence susceptibility
to T2D in an Arab family originating from the UAE. The project focussed specifically on an
indigenous Arab population. Characteristics of the Arab population, such as a high rate of
consanguineous marriage, a high birth rate and lifestyle, make it ideal for the study of
complex, polygenic, multifactorial disorders such as T2D. Therefore, to investigate the genetic
factors underlying T2D in this population, a study using the family based association test (FBAT)
was undertaken in an extended family of 178 members from the UAE (66 diabetic and 112 healthy
individuals) using the Human 660 Quad chip by Illumina.
RESULTS
The study cohort comprised 178 individuals from one extended family (319 members) of Arab
descent: 86 males and 92 females, of whom 66 were diabetes patients and 112 were healthy
individuals. Ages ranged from 18 to 95 years, with a mean of 37.35 years (95% confidence
interval, 34.33 to 40.17 years). Table 1 summarises the basic characteristics of the cohort
selected for the GWAS.
The association p-values (Manhattan plot) from the FBAT analysis are shown in Figure 1. Groups
of SNPs with p-values below a specific threshold (p-value = 1E-4) were examined in detail. The
top scoring SNPs for association with T2D, which mapped to chromosomes 4, 8 and 14, are
shown in Table 2. The most significant p-values ranged from 2.7E-05 to 8.46E-06 for six SNPs in
the Protein Kinase D1 (PRKD1) gene on chromosome 14 (Figure 1 and Table 2). The strongest
statistical evidence for a novel association signal was from the SNP rs10144903 in intron 1 of
PRKD1, with the overall estimate of effect returning an odds ratio of 3.72 (95% confidence
interval, 1.28-10.82) (p-value = 3.92E-06) using an additive model. The PRKD1 gene association
has not been reported in any previous study and represents a novel observation. Other SNPs that
showed association with T2D (p-value ≤ 1E-04) include a cluster of SNPs on Chromosome 4
(RBM47 [4p13-p12], KCTD8 [4p13], GABRB1 [4p12] and SCD5 [4q21.22]) and SNPs on Chromosome 8
(OC90 [8q24.22] and TG [8q24]), as summarised in Table 2.
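As an illustrative aside, the relationship between the Z scores and p-values listed in Table 2 can be sketched as follows. The sketch assumes that the reported p-values are two-sided tail probabilities of a standard normal distribution, which is an assumption about the FBAT output rather than something stated in this chapter; the function names are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a Z score, assuming a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def minus_log10_p(p):
    """Height of a SNP on a Manhattan plot."""
    return -math.log10(p)

# Follow-up threshold used in this study (p = 1E-4).
THRESHOLD = 1e-4

# Z scores taken from Table 2 for two PRKD1 SNPs.
for snp, z in [("rs10144903", 4.61), ("rs7154546", 4.45)]:
    p = two_sided_p(z)
    print(snp, f"p = {p:.2e}", f"-log10(p) = {minus_log10_p(p):.2f}",
          "below threshold" if p < THRESHOLD else "above threshold")
```

Because the Z scores in Table 2 are rounded to two decimal places, p-values recomputed in this way should agree with the tabulated values only approximately.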
To investigate the association of PRKD1 gene polymorphisms with T2D, we calculated pairwise
LD coefficients, namely D' and r2, for the six associated intronic PRKD1 SNPs: rs11626603,
rs11622611, rs4981716, rs1953722, rs10144903 and rs7154546. Three haplotype blocks were
observed (Figure 2) across the PRKD1 locus, with all six associated SNPs mapping to the largest
LD block, block 2. The six significant SNPs comprise six haplotypes (AAGAAG, GGAGCA, AAGGAG,
AGGGCG, AAGGCA and GGAGAG). The haplotypes and their frequencies are shown in Table 4 and
Table 5, which illustrate that only two haplotypes occur at any appreciable frequency. Analysis
in UNPHASED indicated that none of the remaining five SNPs were significant when added to a
model that included the effect of the most significant SNP rs10144903 (see Figure 3), i.e. all
the association in the region can be accounted for by rs10144903. However, given the strong LD
between the SNPs, any of these other five SNPs could equally well account for the observed
association.
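The pairwise LD coefficients mentioned above can be illustrated with a minimal sketch. It computes D, D' and r2 for two biallelic SNPs from haplotype frequencies, using as input the six haplotype frequencies tabulated for the PRKD1 region, collapsed to the last two SNPs (rs10144903 and rs7154546). This is not the Haploview procedure, which estimates LD from genotype data; the function name and the collapsing step are assumptions made for illustration, and the values in Figure 2 take precedence over anything computed here.

```python
def pairwise_ld(hap_freqs):
    """Return (D, D', r2) for two biallelic SNPs.

    hap_freqs maps two-letter haplotypes (allele at SNP 1 followed by the
    allele at SNP 2) to frequencies that are expected to sum to 1.
    """
    a = sorted({h[0] for h in hap_freqs})[0]   # reference allele at SNP 1
    b = sorted({h[1] for h in hap_freqs})[0]   # reference allele at SNP 2
    p_a = sum(f for h, f in hap_freqs.items() if h[0] == a)
    p_b = sum(f for h, f in hap_freqs.items() if h[1] == b)
    d = hap_freqs.get(a + b, 0.0) - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Alleles at (rs10144903, rs7154546) for haplotypes ht 1 to ht 6, with the
# tabulated frequencies summed over haplotypes that share the same pair.
freqs = {
    "AG": 0.721 + 0.046 + 0.007,   # ht 1, ht 3, ht 6
    "CA": 0.200,                   # ht 2
    "CG": 0.013 + 0.013,           # ht 4, ht 5
}
d, d_prime, r2 = pairwise_ld(freqs)
print(f"D = {d:.4f}, D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

D' reaches 1 whenever one of the four possible two-SNP haplotypes is absent, whereas r2 also depends on how closely the allele frequencies at the two SNPs match, which is why the two measures are usually reported together.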
Table 3 lists all the previous GWAS and subsequent meta-analyses that have identified risk loci
associated with T2D to date [18]. No association was detected at these loci in the study described
here, with the exception of the WFS1 and PPP2R2C locus (rs4689388, p-value = 0.006). This locus
was previously associated with T2D in a French population [11] at p-value < 1.00E-5 and has
subsequently been confirmed as a T2D risk locus in other replication studies [27].
To explore the biological pathways involving PRKD1, the gene of interest, we sought to identify
significant networks among genes previously known to be associated with the PRKD1 pathway and
possibly with T2D. Ingenuity™ Pathway Analysis (IPA) was used to generate networks from the
dataset of genes that fall within the PRKD1 network (Figure 4). PRKD1 belongs to the protein
kinase C family, members of
which function in many extracellular receptor-mediated signal transduction pathways. PRKD1
itself, also known as protein kinase C mu (PRKCM) and protein kinase D (PKD), encodes a
cytosolic serine-threonine kinase that binds to the trans-golgi network and regulates the fission of
transport carriers specifically destined to the cell surface (OMIM:
http://www.ncbi.nlm.nih.gov/omim/605435).
Since almost all of the genes identified in this study had not previously been associated with T2D
in other studies, we were interested to identify the underlying genetic ancestry of this Arab
population compared to other populations for which HapMap data were available. We therefore
compared the genotype data for ancestors in our cohort with genotype data from the CEU,
JPT+CHB and YRI populations using multidimensional scaling (a form of principal components
analysis (PCA)) undertaken using the PLINK software (Figure 5). Scatter plots of the main axes of
variation, PC1 and PC2, show that the Arab population is more closely related to populations of
European (Caucasian) descent than to those of Asian or African descent. However, our Arab data
are less well-clustered than the data from the three HapMap populations, suggesting that there
may be some population stratification within this Arab cohort. This was controlled for in our
study by using a family-based study design.
Table 1: Description of phenotypic and clinical characteristics of 178 individuals belonging
to one extended family of Arab origin from the UAE.
Total sample size (N) 178
Generations 5
Number of Nuclear Families 41
Gender (number of females) 92
Number of T2D patients 66
T2D Patient: Age (years) 18-87
Normal: Age (years) 18-97
Figure 1: p-values for GWAS SNPs tested for association with Type 2 Diabetes among 178 individuals belonging to one
extended family of Arab origin from the UAE. Horizontal axis shows SNP location and vertical axis is
-log10(p-value) for each SNP tested by FBAT. Blue horizontal line depicts significance threshold (p = 1E-4) and shows
associated SNPs clustering in chromosomes 4, 8 and 14.
Table 2: SNPs showing most significant associations with T2D using FBAT analysis under an additive model. Six SNPs within the PRKD1 gene on chromosome 14 are associated with T2D in the Arab population. The strongest statistical evidence for association was with rs7154546 in intron 1 of PRKD1, with the overall estimate of effect returning a Z score of 4.45 for the minor allele, with a p-value of 8.46E-06. A cluster of SNPs on Chromosome 4 (RBM47, KCTD8, GABRB1 and SCD5) and SNPs on Chromosome 8 (OC90 and TG) also showed association with T2D with p-value ≤ 1E-04.
Chr  SNP         Position   Type     Risk Allele  Za    Allele freq  p-value   Geneb
4    rs10024216  38440507   Unknown  A            4.44  0.557        8.74E-06  -
4    rs1871836   40322700   Intron   G            4.28  0.341        1.80E-05  RBM47
4    rs7675224   44049621   Intron   A            4.14  0.81         3.50E-05  KCTD8
4    rs4407541   44076716   Intron   A            4.42  0.693        9.70E-06  KCTD8
4    rs4695718   44107694   Intron   A            4.14  0.776        3.50E-05  KCTD8
4    rs13144404  44130442   Intron   G            4.14  0.208        3.50E-05  KCTD8
4    rs7692404   45570356   Unknown  A            3.93  0.625        8.30E-05  -
4    rs10517178  46797750   Intron   G            4.60  0.428        4.19E-06  GABRB1
4    rs1372491   46804117   Intron   A            4.60  0.574        4.19E-06  GABRB1
4    rs6535363   83781593   Intron   G            4.52  0.747        6.08E-06  SCD5
4    rs6813901   83784174   Intron   A            4.01  0.19         6.00E-05  SCD5
4    rs6822801   83795853   Intron   A            4.01  0.19         6.00E-05  SCD5
8    rs748978    130072298  Unknown  G            4.19  0.122        2.70E-05  -
8    rs2202068   133114662  Intron   A            4.18  0.836        2.80E-05  OC90
8    rs6998423   134058472  Intron   G            3.89  0.405        1.00E-04  TG
14   rs11626603  29264650   Intron   G            4.19  0.816        2.70E-05  PRKD1
14   rs11622611  29270280   Intron   G            4.31  0.788        1.60E-05  PRKD1
14   rs4981716   29278774   Intron   A            4.19  0.183        2.70E-05  PRKD1
14   rs1953722   29300389   Intron   G            3.94  0.739        8.00E-05  PRKD1
14   rs10144903  29342060   Intron   C            4.61  0.787        3.92E-06  PRKD1
14   rs7154546   29349734   Intron   A            4.45  0.165        8.46E-06  PRKD1
a Positive Z values indicate a positive association of the minor allele with disease.
b Gene information extracted from University of California Santa Cruz (UCSC) Genome Browser.
Figure 2: Haplotype blocks in PRKD1 generated by Haploview. Three haplotype blocks were
identified in PRKD1. Block 2 contains all six of the associated SNPs (rs11626603,
rs11622611, rs4981716, rs1953722, rs10144903 and rs7154546). (A) Colour
scheme of the LD map is based on the standard D'/LOD option in the Haploview
software. Values contained in the boxes at the diagonal intersect of two
polymorphisms indicate the D′ values; boxes with no value indicate complete LD
(i.e. D′ = 1). (B) r2 values across the PRKD1 region.
Table 3: Six possible haplotypes and their frequencies between the six associated SNPs in the PRKD1 region using FBAT.
Haplotype  rs11626603  rs11622611  rs4981716  rs1953722  rs10144903  rs7154546  Frequency  p-value
ht 1       A           A           G          A          A           G          0.721      0.00115
ht 2       G           G           A          G          C           A          0.200      0.00035
ht 3       A           A           G          G          A           G          0.046      0.66385
ht 4       A           G           G          G          C           G          0.013      0.14412
ht 5       A           A           G          G          C           G          0.013      0.30125
ht 6       G           G           A          G          A           G          0.007      0.17971
Table 4: UNPHASED analysis for a single-point locus of the six associated SNPs in PRKD1 region with their risk allele,
chi-square, odds ratio and 95% confidence interval (low and high).
Marker      Allele  Chisq  p-value  Odds-R  95% CI low  95% CI high
rs11626603  G       8.88   0.0028   2.76    1.14        6.70
rs11622611  G       9.57   0.0019   2.90    1.18        7.16
rs4981716   G       8.88   0.0028   0.36    0.14        0.87
rs1953722   G       9.54   0.0020   2.77    1.15        6.66
rs10144903  C       11.88  0.0005   3.72    1.28        10.82
rs7154546   G       10.32  0.0013   0.30    0.10        0.84
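As a reading aid for the odds ratios and confidence limits tabulated above, the following minimal sketch works on the log-odds scale. It assumes that the intervals are Wald-type, i.e. symmetric around log(OR), which is an assumption about the UNPHASED output rather than something stated here; the function name and the critical value of 1.96 are illustrative choices.

```python
import math

def log_scale_summary(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Summarise an odds ratio and its 95% limits on the log-odds scale."""
    log_or = math.log(odds_ratio)
    # Standard error implied by the width of the interval on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    # For an interval symmetric in log(OR), the geometric mean of the
    # limits recovers the point estimate.
    implied_or = math.sqrt(ci_low * ci_high)
    return log_or, se, implied_or

# Values tabulated for rs10144903.
log_or, se, implied_or = log_scale_summary(3.72, 1.28, 10.82)
print(f"log(OR) = {log_or:.3f}, SE = {se:.3f}, implied OR = {implied_or:.2f}")
```

The width of the interval reflects the modest number of informative individuals in a single pedigree, so the point estimate should be read together with its limits.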
Figure 3: A locus zoom plot of log10 (p-values) across the PRKD1 region around rs7154546
(red star) within the last intron of the PRKD1 gene on chromosome 14, shown to be
strongly associated with T2D in the Arab population. The colouring of SNPs indicates
the strength of LD with rs7154546, coded as red (strong, r2 ≥ 0.8), blue (moderate,
0.2 < r2 ≤ 0.4), dark blue (weak, r2 ≤ 0.2). The blue line depicts local recombination
rates.
Table 3: Genes showing genome-wide significant association with T2D in previous studies
among different populations, and their p-values in this study. This study
demonstrated an association with WFS1, PPP2R2C.
Gene            SNP         Population*                 p-value  Reference
TCF7L2          rs7903146   FR, JP, FI, IS, UK, IL, DE  0.704    [11-14, 17, 19, 21, 22, 24]
TCF7L2          rs7901695   FR, JP, FI, IS, UK, IL, DE  0.777    [11-14, 17, 19, 21, 22, 24]
SLC30A8         rs13266634  FR, JP, FI, IS, UK          0.165    [11-14, 17, 21, 22]
HHEX            rs1111875   FR, JP, FI                  0.770    [12, 14, 21, 22]
HHEX            rs5015480   FR, JP, FI                  0.826    [12, 14, 21, 22]
FTO             rs8050136   FI, UK                      0.250    [14, 17, 21]
FTO             rs5215      FI, UK                      0.456    [14, 17, 21]
LOC64673,IRS1   rs2943641   FR                          0.254    [11]
WFS1,PPP2R2C    rs4689388   FR                          0.006    [11, 27]
LOC72901,CETN3  rs12518099  FR                          0.705    [11]
IGF2BP2         rs4402960   JP, FI                      1.000    [12, 14, 21, 73]
MTNR1B          rs1387153   FR                          0.973    [74]
VEGFA           rs9472138   UK                          0.334    [19]
CETP            rs1800775   FI, SE                      0.220    [20]
APOB            rs693       FI, SE                      0.426    [20]
Intergenic#     rs1859962   UK                          0.168    [24]
Intergenic#     rs6712932   UK                          0.420    [24]
* France (FR), Japan (JP), Finland (FI), Iceland (IS), United Kingdom (UK) mainly Caucasian, Israel (IL), Germany (DE), Sweden (SE).
# rs1859962 is located on chromosome 17 and rs6712932 is located on chromosome 2.
Figure 4: Pathway of PRKD1 generated using Ingenuity Pathway Analysis to identify
networks among the early genes, altered in the PRKD1, associated with Type 2
Diabetes. Gray shaded shapes for the PRKD1, GABRB1 and TG genes depict a direct or
indirect role in the aetiology of T2D.
Figure 5: Scatter plot of principal component 1 and principal component 2 for the Arab
population (Red) with 3 continental clusters (Europe (Green), Asia (Blue) and
Africa (Black)). The Arab population is clearly closer to the European (Caucasian)
cluster than to the Asian and African clusters.
DISCUSSION
GWAS have been very effective in mapping disease susceptibility genes. Susceptibility
loci for T2D have been mapped in many different populations, some of which have been observed
in multiple populations and some of which are unique to a specific population. To date, few
GWAS have been performed on Middle Eastern populations, which limits the opportunity to
understand the aetiology of common disease in these populations. In this study, the goal was to
investigate the genes influencing susceptibility to T2D in the UAE population, specifically
in individuals of Arab origin. The GWAS cohort was analysed using FBAT after performing quality
control on the data using PLINK.
The study was conceived to detect SNPs with modest influence on T2D among the Arab
population. However, given the relatively small sample size, we were only well-powered to detect
fairly strong effects, and indeed our most significant finding (in PRKD1) had an allelic OR of
3.72 (95% confidence interval, 1.28-10.82). Our family-based design, consisting of a single large
pedigree, offers the opportunity to detect risk alleles that correlate with disease within the
pedigree. As such, our results may perhaps better be considered as indicative of linkage in the
presence of association rather than of association per se. Previous studies and subsequent
meta-analyses have identified 17 risk loci associated with T2D in various populations (Table 3), of
which only one (rs4689388) showed modest replication in the study presented here. Another study
showed that the rs7903146 and rs12255372 variants of TCF7L2 are strongly associated with
T2D risk in most populations [19]. Evidence that this variant may be associated with T2D in the
study presented here was sought, but the p-value was not significant. In
addition, recent studies of the rs7903146 variant in Arab populations of Saudi and Emirati origin
reported weak or no association with T2D [28, 29]. However, Ereqat et al. (2009) have shown a
significant association of the rs7903146 variant of TCF7L2 with T2D in a Palestinian
population [30].
New association signals were identified at SNPs within seven genes (OC90, TG, RBM47, GABRB1,
SCD5, KCTD8 and PRKD1). Since this study is the first GWAS for T2D candidates in families of
Arab descent, these findings may provide new insights into the pathogenesis of T2D.
One SNP in Otoconin-90 (OC90) gene was positive for association in this study. OC90 encodes
the predominant protein constituent of vestibular otoconia [31]. To date neither the functions of
otoconial proteins nor the process of otoconia genesis are clearly defined. OC90 is the major
protein component of otoconia with sequence (but most likely not functional) homology to
phospholipase A2 [32]. OC90 accounts for 90% of the total otoconial protein which renders the
receptor cells of the vestibular system [31]. In addition otoconia is a key element of the inner ear,
which is responsible for the perception of motion and gravity. Given that Diabetes Mellitus is a
disorder of glucose metabolism, it can be linked with vestibular dysfunction by neuropathy or
nerve damage, which is a common complication in T2D. In 2008, Bainbridge and colleagues
found an increased prevalence of hearing impairment among patients with diabetes [33]. The
study indicated that diabetes may lead to hearing loss by damaging the nerves and blood vessels
of the inner ear. The present study suggests that further exploration of OC90 in T2D patients with
auditory neuropathy is warranted to determine whether there is a direct causal relationship. Despite intensive study,
the mechanism of otoconia formation is still a matter of debate.
A second series of SNPs with association signals of interest was found in thyroglobulin (TG). TG
encodes the glycoprotein precursor to the thyroid hormones T3 (triiodothyronine) and T4
(tetraiodothyronine). Dumont et al. (1989) noted that thyroglobulin provides three things: a
thyroid hormone precursor, storage of iodine, and storage of inactive thyroid hormones [34].
Further evidence for the association of TG with diabetes comes from various data that show a
strong genetic influence on the shared susceptibility to Type 1 Diabetes (T1D) and autoimmune
thyroid disease (AITD) [35-37]. Most of the genes that contribute to the joint susceptibility to
T1D and AITD are involved in immune regulation. Huber et al. (2008) suggested that the
association of AITD with T1D is influenced by HLA [37]. In addition, adult T1D
patients with no history of thyroid disease showed a notably higher thyroid volume
than age and sex matched controls [35].
RBM47 is another gene found in this study to be associated with T2D. The gene encodes an RNA
binding protein, which is a key element in RNA metabolism, regulating the temporal, spatial and
functional dynamics of RNAs [38]. Recent genetic and proteomic information and evidence from
animal models reveal that RNA binding proteins are involved in many human diseases [39-41].
However there is no compelling functional evidence for the association between SNPs in RBM47
and T2D. Nevertheless future studies defining the expression, RNA targets and protein
interactions of RBM47 in relevant tissues, as well as characterisation of the metabolism in the
RBM47 knock-out mouse may provide additional clues. Moreover, resequencing may be
necessary to identify causal variants in RBM47 and might support the direct involvement of
RBM47 in T2D.
The data presented here show two SNPs positive for association in the GABRB1 gene
(gamma-aminobutyric acid receptor 1). Gamma-aminobutyric acid (GABA) receptors are a family of
proteins involved in neurotransmission in the mammalian central nervous system [42] and in the
inhibition of glucagon release mediated by β cells [43]. Baily et al. (2007) showed that the
released GABA receptor from pancreatic β cells inhibits the secretion of glucagon by 50% to 60%
in both pancreatic mouse islets and murine alpha TC1-9 cells. The authors showed that the
inhibition depends on glucose concentration. The overall inhibition effect of GABA with 5 or 10
mmol/l glucose on glucagon release is 15% or 40% respectively [44]. They have also shown that
glucose dose dependently increased the expression of GABA receptors.
Three SNPs positive for association were found in SCD5 (Table 2), a member of the stearoyl-CoA
desaturase gene family. Over-expression of the related stearoyl-CoA desaturase 1 (SCD1) gene
reduces tyrosine and serine phosphorylation of IRS1 (insulin receptor substrate 1) and Akt/protein
kinase B, respectively, and is sufficient to impair glucose uptake and insulin signalling [45].
Miyazaki et al. (2009) showed
that Scd1 deficiency improved insulin sensitivity in leptin-resistant A y/a and diet-induced obese
(DIO) mice [46]. Increase in whole body glucose tolerance and insulin sensitivity has been shown
on various tissues of Scd1-/- mice [47, 48].
The most interesting of the associations identified was with PRKD1. This gene is suggested to
play a role in insulin secretion. PRKD1, PKD2 and PKD3 constitute the recently identified PKD
family, a sub class of the AGC family of serine/threonine kinases, with structural and
enzymological properties different from those of PKC family [49, 50]. PRKD1 is composed of
different domains: a N-terminal region, two cysteine-rich zinc-finger regions, a region rich in
negatively charged amino acids, a pleckstrin-homology domain and a Ser/Thr kinase catalytic
domain [51, 52]. PRKD1 can be activated by growth factors, oxidative stress, thrombin, bioactive
lipids, cross-linking of B- and T-cell receptors and some G-protein coupled receptors (GPCR).
Previous biological studies on the PRKD1 gene support its role in insulin secretion. Sumara et al.
(2009) reported that mice which do not have mitogen-activated protein kinase (MAPK) p38δ
exhibit better glucose tolerance because of enhanced insulin secretion from the β cells of the
pancreas [53]. Furthermore they showed that the protein kinase D (PKD) is vital in monitoring the
level of insulin secretion by the pancreatic β cells. These data imply that the absence of p38δ
improves glucose tolerance and enhances insulin secretion through a direct and β cell-specific
mechanism. They also support a negative regulatory function for p38δ in stimulated insulin
secretion through the inhibition of PRKD1 and control of exocytosis. Furthermore, excessive
inhibition of PKD activity by p38δ may lead to malfunctioning of the β cells in diabetic patients.
The study also suggests that artificially inducing the β cells to secrete insulin through
medication eventually results in failure of these pancreatic cells, and recommends that therapies
should combine an insulinotropic effect with measures to resist failure of the β cells. The
findings of Sumara et al. (2009) [53] suggest that the signalling module of p38δ and PRKD1 may
be a potential therapeutic target for human diabetes.
In addition to the observations of Sumara et al. (2009), it has also been shown that another
member of the protein kinase family, protein kinase C, acts as an alternative mediator of insulin
induced glucose transport [54]. This suggestion comes from the work of Cross and Franke et al.
(1995), which showed that active Akt stimulates glucose uptake in adipocytes [55]; however,
inhibition of Akt does not completely block the effect of insulin on glucose transport [56]. In
the pancreas, the ATP sensitive
potassium channel has also been shown to play a key role in insulin release in response to
changing glucose levels [57, 58].
A 30% decrease of Na+, K+-ATPase activity has been shown in red blood cells (RBC) from
diabetic patients compared to control individuals [59]. Greene et al. (1987) found a decrease in
Na+, K+-ATPase activity due to alteration of PKC activity [60]. An association between RBC
Na+, K+-ATPase activity and plasma C-peptide concentration has also been found among T2D
patients [59]. In the study described here, four significant SNPs in the KCTD8 gene (potassium
channel tetramerisation domain containing 8, see Table 2) were identified. It has been suggested
that potassium channels may play a role in GABAergic activity during hypoglycemia [61, 62]. Chan
et al. (2007) showed that K+ channels in the ventromedial hypothalamus (VMH), a region that
contains glucose responsive neurons, can modulate the magnitude of counter regulatory responses
by altering release of GABA [63]. Various studies have shown that expression of GABA receptors
is affected by depolarising concentrations of K+ [64], cAMP [65] and MAPK [66].
Pathway analysis showed interconnections among the three genes in PKC (PRKD1, GABRB1 and
TG). This is not unexpected for a disease that is known to be multifactorial and for which the
mechanism is likely to require the involvement of a number of genes. Future study of these genes
might shed light on the aetiology of the disease. In addition to the unique SNPs identified in our
population, we analysed the 17 risk loci that have previously been reported to be associated
with T2D in different populations (Table 3). Interestingly, the analysis confirmed that
rs4689388 (between WFS1 and PPP2R2C) is associated with T2D in the Arab population, with a
p-value of 0.006. This SNP was previously reported to be associated with T2D in a large sample of
the French population [11]. Further studies in larger cohorts will be needed to strengthen the
statistical evidence and replicate the association with T2D.
In conclusion, this study identified variation at PRKD1 on [14q11] as being associated with Type
2 Diabetes (T2D) in the Arab population of the UAE. Association at the genes RBM47, KCTD8,
GABRB1, SCD5, OC90 and TG was also observed. The mechanism by which these genes increase or
decrease disease susceptibility remains to be determined. These findings provide a set of
candidate genes to be evaluated in depth in future studies. The fact that PRKD1 has not been
found in previous studies may be due to chance sampling variation or power differences, or may
be explicable in terms of a higher level of genetic and environmental heterogeneity in the other
populations compared with the Arab population. Further replication and fine mapping in a larger
cohort of Arab population samples will be essential to validate the results presented here.
MATERIALS AND METHODS
Subjects
A total of 319 individuals belonging to one extended family of Arab origin were identified during
their routine visit to clinics in the UAE. Multi-generation family relationships were compiled for
these individuals, allowing a five-generation extended family pedigree to be constructed
containing 41 nuclear families. A total of 178 individuals from this pedigree agreed to participate
in this study (86 males and 92 females; 66 diabetic and 112 healthy). Clinical assessment and
questionnaire completion were conducted at the clinic. An individual was classified as T2D if the
subject: (1) had been diagnosed with T2D by a qualified physician, (2) was on a prescribed drug
treatment regimen for T2D and (3) returned biochemical test results of a fasting plasma glucose
level of at least 126 mg/dl, based on the criteria laid down by the World Health Organization (WHO)
consultation group report [67]. Each individual provided signed, informed consent based on
information provided by the ethics committee of the United Arab Emirates Ministry of Health.
DNA Extraction
After blood was drawn into EDTA tubes, genomic DNA was extracted using a Nucleic Acid Kit
(Roche Applied Science, Indianapolis, IN, USA) according to the recommendations of the
manufacturer. Briefly, 300μl of whole blood from each sample was mixed with 200μl of lysis
buffer (50mM Tris pH 8.0, 100mM EDTA, 100mM NaCl, 1% SDS) and 40μl of Proteinase K.
100μl of isopropanol and 500μl of Inhibitor Removal Buffer (5M guanidine-HCl, 20mM Tris-HCl
pH 6.6) were subsequently added. The DNA was washed with a buffer (20mM NaCl; 2mM Tris-
HCl; pH 7.5) and centrifuged twice at 2,000 rpm. The DNA was washed using cold 70% ethanol,
centrifuged at 3,000 rpm and the supernatant was discarded, leaving a pellet that contained
purified genomic DNA. The DNA pellet was diluted in TE buffer (1mM EDTA; 10mM Tris-HCl,
pH 7.5) to a concentration of approximately 50ng.μl-1.
Genotyping
Genotyping using the Infinium Human 660 Quad Chip I-Scan (Illumina Inc. San Diego, USA),
which contained 670,901 SNPs, was performed according to the manufacturer’s recommendations
(Illumina Inc., San Diego, USA). Whole-genome amplification was performed using 200ng of
genomic DNA at 37°C for 20 to 24 hours using reagents provided by Illumina (Illumina Inc., San
Diego, USA). Products were fragmented, precipitated, and resuspended in a proprietary
hybridisation buffer (Illumina Inc., San Diego, USA). The resuspended samples were denatured at
95°C for 20 min and loaded on Illumina Bead Chips. The chips were placed in a hybridisation
chamber for 16 to 20 hours at 48°C. After hybridisation, non-hybridised DNA was washed away.
An allele-specific single-base extension of the oligonucleotides on the BeadChip was performed
in a 48-position Slide Chamber Rack (Illumina Inc., San Diego, USA), using labelled
deoxynucleotides and the captured DNA as a template. After staining of the extended DNA,
BeadChips were washed and scanned with I-Scan (Illumina Inc., San Diego, USA), and raw data
was generated by BeadStudio 3.0 software (Illumina Inc. San Diego, USA).
Quality control (QC)
Genetic integrity of the pedigree was checked using the PedCheck software package [68]. Data
cleaning was performed using the PLINK software developed by Purcell et al (2007) [69]. The
average call rate was 98.99% for all the subjects. SNPs were excluded from the analysis based on
the following criteria: (1) minor allele frequency (MAF) < 0.05, (2) missingness per SNP > 5%,
(3) significant (p-value < 1.0E-06) deviation from Hardy-Weinberg equilibrium, and (4)
Mendelian errors. Individuals with > 5% Mendelian errors within the family and SNPs with >
10% Mendelian errors were checked, and none were excluded. Approximately 70% of SNPs passed QC and were
used in the association analysis.
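The exclusion criteria above can be expressed as a simple per-SNP filter. The following is a minimal sketch, not the PLINK procedure itself: the `SnpSummary` container and its field names are hypothetical, and only the thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class SnpSummary:
    """Per-SNP summary statistics (hypothetical container for illustration)."""
    snp_id: str
    maf: float                # minor allele frequency
    missing_rate: float       # fraction of missing genotype calls
    hwe_p: float              # p-value of the Hardy-Weinberg equilibrium test
    mendel_error_rate: float  # fraction of genotypes showing Mendelian errors


# Thresholds as listed in the QC criteria in the text.
MAF_MIN = 0.05
MISSING_MAX = 0.05
HWE_P_MIN = 1.0e-6
MENDEL_MAX = 0.10


def passes_qc(snp: SnpSummary) -> bool:
    """Return True if the SNP survives all four exclusion criteria."""
    return (
        snp.maf >= MAF_MIN
        and snp.missing_rate <= MISSING_MAX
        and snp.hwe_p >= HWE_P_MIN
        and snp.mendel_error_rate <= MENDEL_MAX
    )


def filter_snps(snps):
    """Keep only the SNPs that pass QC, preserving input order."""
    return [s for s in snps if passes_qc(s)]
```

Applied to a list of summaries, `filter_snps` returns the subset that would be carried forward to the association analysis under these assumed definitions.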
Data Analysis
We analysed the association between individual SNPs and disease trait (T2D) using the family-
based association test (FBAT) [70]. FBAT was used in this study to test for transmission rates of
marker alleles from heterozygous parents to affected offspring under the null hypothesis of no
association and no linkage. Odds ratios and confidence intervals for associated SNPs were
calculated using UNPHASED [71], with results displayed as Manhattan plots generated in
Haploview v4.1 [72]. Subsequently, haplotypes of the PRKD1 gene were analysed using the
HBAT function of FBAT. The advantage of the FBAT method is that it permits
the analysis of large extended family pedigrees. The FBAT software divides pedigrees into
individual nuclear families. Biallelic tests were performed using a dominant genetic model. LD
(without taking into account familial correlations) was determined using Haploview v4.1 [72].
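The idea of testing allele transmission from heterozygous parents to affected offspring can be illustrated with the classic transmission disequilibrium test (TDT), which FBAT generalises. The sketch below is not the FBAT implementation; the function name and the input counts are hypothetical, and the statistic is the standard McNemar-type chi-square with one degree of freedom.

```python
import math


def tdt_statistic(transmitted: int, untransmitted: int):
    """Classic TDT for one biallelic marker.

    transmitted:   times heterozygous parents passed the test allele
                   to an affected offspring
    untransmitted: times heterozygous parents passed the other allele
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a 1-df chi-square distribution: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Under the null hypothesis of no association and no linkage, each allele is transmitted with probability 0.5, so a large imbalance between the two counts yields a small p-value.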
We investigated interactions of the associated genes using Ingenuity™ Pathway Analysis
(IPA; Ingenuity Systems Inc., Redwood City, CA). IPA is a web-based software
application that uses expert compilation of molecular biology data derived from the literature and
many public databases (e.g., OMIM, MGI and NCBI Gene) to identify specific biological
pathways behind each gene, and enables the visualisation and analysis of direct and indirect
interactions among genes of interest. In this study we started with a list of genes of interest to
analyse the common and distinct properties of these genes and how they relate to one another.
IPA generated networks where the gene of interest can be related according to previously known
associations between genes or proteins.
ACKNOWLEDGMENT
Publication number HA010-007 of the Centre for Forensic Science at the University of Western
Australia. We gratefully acknowledge the contribution of participating family members whose
cooperation made this study possible. We also would like to thank Richard Francis at Telethon
Institute for Child Health Research for his specific technical support that has allowed for the
statistical work to be carried out for this study. Part of the data analysis was performed on the
advanced computing resources provided by the Western Australian Advanced Computing
Consortia (iVEC). Habiba Alsafar is a PhD scholar at the University of Western Australia
supported by the Dubai Police General Head Quarters in the United Arab Emirates. Funding for
this project was provided in part by CVRL and the Emirates Foundation.
CONFLICT OF INTEREST
All the authors declare no conflict of interest.
REFERENCES
1. Leslie, R.D., Metabolic changes in diabetes. Eye (Lond), 1993. 7 ( Pt 2): p. 205-8.
2. Stumvoll, M., B.J. Goldstein, and T.W. van Haeften, Type 2 diabetes: principles of
pathogenesis and therapy. Lancet, 2005. 365(9467): p. 1333-46.
3. International Diabetes Federation, Diabetes Atlas, 3rd ed, 2006.
4. Lyssenko, V., et al., Mechanisms by which common variants in the TCF7L2 gene increase
risk of type 2 diabetes. J Clin Invest, 2007. 117(8): p. 2155 - 2163.
5. Frayling, T.M., Genome-wide association studies provide new insights into type 2 diabetes
aetiology. Nat Rev Genet, 2007. 8(9): p. 657-62.
6. HUGO--a UN for the human genome. Nat Genet, 2003. 34(2): p. 115-6.
7. Thorisson, G.A., et al., The International HapMap Project Web site. Genome Res, 2005.
15(11): p. 1592-3.
8. Hanson, R.L., et al., A search for variants associated with young-onset type 2 diabetes in
American Indians in a 100K genotyping array. Diabetes, 2007. 56(12): p. 3045-52.
9. Hayes, M.G., et al., Identification of type 2 diabetes genes in Mexican Americans through
genome-wide association studies. Diabetes, 2007. 56(12): p. 3033-44.
10. Rampersaud, E., et al., Identification of novel candidate genes for type 2 diabetes from a
genome-wide association scan in the Old Order Amish: evidence for replication from
diabetes-related quantitative traits and from independent populations. Diabetes, 2007.
56(12): p. 3053-62.
11. Rung, J., et al., Genetic variant near IRS1 is associated with type 2 diabetes, insulin
resistance and hyperinsulinemia. Nat Genet, 2009. 41(10): p. 1110-5.
12. Takeuchi, F., et al., Confirmation of multiple risk Loci and genetic impacts by a genome-
wide association study of type 2 diabetes in the Japanese population. Diabetes, 2009.
58(7): p. 1690-9.
13. Steinthorsdottir, V., et al., A variant in CDKAL1 influences insulin response and risk of
type 2 diabetes. Nat Genet, 2007. 39(6): p. 770-5.
14. Scott, L.J., et al., A genome-wide association study of type 2 diabetes in Finns detects
multiple susceptibility variants. Science, 2007. 316(5829): p. 1341-5.
15. Yasuda, K., et al., Variants in KCNQ1 are associated with susceptibility to type 2 diabetes
mellitus. Nat Genet, 2008. 40(9): p. 1092-7.
16. Florez, J.C., et al., A 100K genome-wide association scan for diabetes and related traits in
the Framingham Heart Study: replication and integration with other genome-wide
datasets. Diabetes, 2007. 56(12): p. 3063-74.
17. Timpson, N.J., et al., Adiposity-related heterogeneity in patterns of type 2 diabetes
susceptibility observed in genome-wide association data. Diabetes, 2009. 58(2): p. 505-10.
18. Voight, B.F., et al., Twelve type 2 diabetes susceptibility loci identified through large-scale
association analysis. Nat Genet, 2010. 42(7): p. 579-89.
19. Zeggini, E., et al., Meta-analysis of genome-wide association data and large-scale
replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet, 2008.
40(5): p. 638-45.
20. Saxena, R., et al., Genome-wide association analysis identifies loci for type 2 diabetes and
triglyceride levels. Science, 2007. 316(5829): p. 1331-6.
21. Zeggini, E., et al., Replication of genome-wide association signals in UK samples reveals
risk loci for type 2 diabetes. Science, 2007. 316(5829): p. 1336-41.
22. Sladek, R., et al., A genome-wide association study identifies novel risk loci for type 2
diabetes. Nature, 2007. 445(7130): p. 881-885.
23. Scott, L.J., et al., Association of transcription factor 7-like 2 (TCF7L2) variants with type
2 diabetes in a Finnish sample. Diabetes, 2006. 55: p. 2649 - 2653.
24. Salonen, J.T., et al., Type 2 diabetes whole-genome association study in four populations:
the DiaGen consortium. Am J Hum Genet, 2007. 81(2): p. 338-45.
25. Malik, M., et al., Glucose intolerance and associated factors in the multi-ethnic
population of the United Arab Emirates: results of a national survey. Diabetes Res Clin
Pract, 2005. 69(2): p. 188-95.
26. Wild, S., et al., Global prevalence of diabetes: estimates for the year 2000 and projections
for 2030. Diabetes Care, 2004. 27(5): p. 1047-53.
27. Sandhu, M.S., et al., Common variants in WFS1 confer risk of type 2 diabetes. Nat Genet,
2007. 39(8): p. 951-3.
28. Alsmadi, O., et al., Weak or no association of TCF7L2 variants with Type 2 diabetes risk
in an Arab population. BMC Medical Genetics, 2008. 9(1): p. 72.
29. Saadi, H., et al., Association of TCF7L2 polymorphism with diabetes mellitus, metabolic
syndrome, and markers of beta cell function and insulin resistance in a population-based
sample of Emirati subjects. Diabetes Res Clin Pract, 2008. 80(3): p. 392 - 398.
30. Ereqat, S., et al., Association of a common variant in TCF7L2 gene with type 2 diabetes
mellitus in the Palestinian population. Acta Diabetologica, 2009.
31. Pote, K.G. and M.D. Ross, Each otoconia polymorph has a protein unique to that
polymorph. Comp Biochem Physiol B, 1991. 98(2-3): p. 287-95.
32. Wang, Y., et al., Otoconin-90, the mammalian otoconial matrix protein, contains two
domains of homology to secretory phospholipase A2. Proc Natl Acad Sci U S A, 1998.
95(26): p. 15345-50.
33. Bainbridge, K.E., H.J. Hoffman, and C.C. Cowie, Diabetes and hearing impairment in the
United States: audiometric evidence from the National Health and Nutrition Examination
Survey, 1999 to 2004. Ann Intern Med, 2008. 149(1): p. 1-10.
34. Dumont, J.E., et al., Transducing systems in the control of human thyroid cell function,
proliferation and differentiation. Adv Exp Med Biol, 1989. 261: p. 357-72.
35. Bianchi, G.P., et al., Thyroid involvement in patients with active inflammatory bowel
diseases. Ital J Gastroenterol, 1995. 27(6): p. 291-5.
36. Hansen, D., et al., Thyroid function, morphology and autoimmunity in young patients with
insulin-dependent diabetes mellitus. Eur J Endocrinol, 1999. 140(6): p. 512-8.
37. Huber, A., et al., Joint genetic susceptibility to type 1 diabetes and autoimmune
thyroiditis: from epidemiology to mechanisms. Endocr Rev, 2008. 29(6): p. 697-725.
38. Glisovic, T., et al., RNA-binding proteins and post-transcriptional gene regulation. FEBS
Lett, 2008. 582(14): p. 1977-86.
39. Crawford, T.O. and C.A. Pardo, The neurobiology of childhood spinal muscular atrophy.
Neurobiol Dis, 1996. 3(2): p. 97-110.
40. Darnell, R.B. and J.B. Posner, Paraneoplastic syndromes involving the nervous system. N
Engl J Med, 2003. 349(16): p. 1543-54.
41. Garber, K.B., J. Visootsak, and S.T. Warren, Fragile X syndrome. Eur J Hum Genet, 2008.
16(6): p. 666-72.
42. Erdo, S.L. and J.R. Wolff, gamma-Aminobutyric acid outside the mammalian brain. J
Neurochem, 1990. 54(2): p. 363-72.
43. Rorsman, P., et al., Glucose-inhibition of glucagon secretion involves activation of
GABAA-receptor chloride channels. Nature, 1989. 341(6239): p. 233-6.
44. Bailey, J.E. and D.J. Nutt, GABA-A receptors and the response to CO(2) inhalation - a
translational trans-species model of anxiety? Pharmacol Biochem Behav, 2008. 90(1): p.
51-7.
45. Voss, M.D., et al., Gene expression profiling in skeletal muscle of Zucker diabetic fatty
rats: implications for a role of stearoyl-CoA desaturase 1 in insulin resistance.
Diabetologia, 2005. 48(12): p. 2622-30.
46. Miyazaki, M., et al., Stearoyl-CoA desaturase-1 deficiency attenuates obesity and insulin
resistance in leptin-resistant obese mice. Biochem Biophys Res Commun, 2009. 380(4): p.
818-22.
47. Flowers, J.B., et al., Loss of stearoyl-CoA desaturase-1 improves insulin sensitivity in lean
mice but worsens diabetes in leptin-deficient obese mice. Diabetes, 2007. 56(5): p. 1228-
39.
48. Rahman, S.M., et al., Stearoyl-CoA desaturase 1 deficiency elevates insulin-signaling
components and down-regulates protein-tyrosine phosphatase 1B in muscle. Proc Natl
Acad Sci U S A, 2003. 100(19): p. 11110-5.
49. Li, J., et al., The role of protein kinase D in neurotensin secretion mediated by protein
kinase C-alpha/-delta and Rho/Rho kinase. J Biol Chem, 2004. 279(27): p. 28466-74.
50. Yaney, G.C., et al., Potentiation of insulin secretion by phorbol esters is mediated by
PKC-alpha and nPKC isoforms. Am J Physiol Endocrinol Metab, 2002. 283(5): p. E880-
8.
51. Valverde, A.M., et al., Molecular cloning and characterization of protein kinase D: a
target for diacylglycerol and phorbol esters with a distinctive catalytic domain. Proc Natl
Acad Sci U S A, 1994. 91(18): p. 8572-6.
52. Van Lint, J.V., J. Sinnett-Smith, and E. Rozengurt, Expression and characterization of
PKD, a phorbol ester and diacylglycerol-stimulated serine protein kinase. J Biol Chem,
1995. 270(3): p. 1455-61.
53. Sumara, G., et al., Regulation of PKD by the MAPK p38delta in insulin secretion and
glucose homeostasis. Cell, 2009. 136(2): p. 235-48.
54. Kotani, K., et al., Requirement of atypical protein kinase clambda for insulin stimulation
of glucose uptake but not for Akt activation in 3T3-L1 adipocytes. Mol Cell Biol, 1998.
18(12): p. 6971-82.
55. Cross, D.A., et al., Inhibition of glycogen synthase kinase-3 by insulin mediated by protein
kinase B. Nature, 1995. 378(6559): p. 785-9.
56. Franke, T.F., et al., Direct regulation of the Akt proto-oncogene product by
phosphatidylinositol-3,4-bisphosphate. Science, 1997. 275(5300): p. 665-8.
57. Meglasson, M.D. and F.M. Matschinsky, Pancreatic islet glucose metabolism and
regulation of insulin secretion. Diabetes Metab Rev, 1986. 2(3-4): p. 163-214.
58. Cook, D.L., et al., ATP-sensitive K+ channels in pancreatic beta-cells. Spare-channel
hypothesis. Diabetes, 1988. 37(5): p. 495-8.
59. De La Tour, D.D., et al., Erythrocyte Na/K ATPase activity and diabetes: relationship with
C-peptide level. Diabetologia, 1998. 41(9): p. 1080-4.
60. Greene, D.A., et al., Role of sorbitol accumulation and myo-inositol depletion in
paranodal swelling of large myelinated nerve fibers in the insulin-deficient spontaneously
diabetic bio-breeding rat. Reversal by insulin replacement, an aldose reductase inhibitor,
and myo-inositol. J Clin Invest, 1987. 79(5): p. 1479-85.
61. During, M.J., et al., Glucose modulates rat substantia nigra GABA release in vivo via
ATP-sensitive potassium channels. J Clin Invest, 1995. 95(5): p. 2403-8.
62. Margaill, I., et al., KATP channels modulate GABA release in hippocampal slices in the
absence of glucose. Fundam Clin Pharmacol, 1992. 6(7): p. 295-300.
63. Chan, O., et al., ATP-sensitive K(+) channels regulate the release of GABA in the
ventromedial hypothalamus during hypoglycemia. Diabetes, 2007. 56(4): p. 1120-6.
64. Ives, J.H., D.L. Drewery, and C.L. Thompson, Neuronal activity and its influence on
developmentally regulated GABA(A) receptor expression in cultured mouse cerebellar
granule cells. Neuropharmacology, 2002. 43(4): p. 715-25.
65. Brinton, R.D., R.H. Thompson, and E.A. Brownson, Spatial, cellular and temporal basis
of vasopressin potentiation of norepinephrine-induced cAMP formation. Eur J Pharmacol,
2000. 405(1-3): p. 73-88.
66. Bulleit, R.F. and T. Hsieh, MEK inhibitors block BDNF-dependent and -independent
expression of GABA(A) receptor subunit mRNAs in cultured mouse cerebellar granule
neurons. Brain Res Dev Brain Res, 2000. 119(1): p. 1-10.
67. Alberti, K.G. and P.Z. Zimmet, Definition, diagnosis and classification of diabetes
mellitus and its complications. Part 1: diagnosis and classification of diabetes mellitus
provisional report of a WHO consultation. Diabet Med, 1998. 15(7): p. 539-53.
68. O'Connell, J.R. and D.E. Weeks, PedCheck: a program for identification of genotype
incompatibilities in linkage analysis. Am J Hum Genet, 1998. 63(1): p. 259-66.
69. Purcell, S., et al., PLINK: a tool set for whole-genome association and population-based
linkage analyses. Am J Hum Genet, 2007. 81(3): p. 559-75.
70. Laird, N.M., S. Horvath, and X. Xu, Implementing a unified approach to family-based
tests of association. Genet Epidemiol, 2000. 19 Suppl 1: p. S36-42.
71. Dudbridge, F., Pedigree disequilibrium tests for multilocus haplotypes. Genet Epidemiol,
2003. 25(2): p. 115-21.
72. Barrett, J.C., et al., Haploview: analysis and visualization of LD and haplotype maps.
Bioinformatics, 2005. 21(2): p. 263-5.
CHAPTER 7
A GENOME-WIDE ASSOCIATION STUDY
EXAMINING OBESE FACTORS IN AN ARAB FAMILY
WITH A HISTORY OF TYPE 2 DIABETES
This chapter was submitted to The American Journal of Human Genetics and the format
presented is as per the "Instructions to Authors" from the publishing house.
Chapter 7
A Genome-Wide Association Study Examining Obese
Factors in an Arab Family with a History of Type 2
Diabetes
Chapter 7 was prepared as a manuscript for submission to The American Journal of Human
Genetics. The aim of the study presented in this manuscript was to detect and characterise
genes that may influence susceptibility to obesity in Type 2 Diabetes patients from volunteers
of a study population from the United Arab Emirates.
To date, the genes responsible for the obese phenotype in Arabs are not known. Obesity is a
principal factor that contributes to Type 2 Diabetes. Consequently, a genome-wide screen for
obesity among the UAE population of Arab descent was initiated. This study paved the way
towards identifying susceptibility genes for obesity in the UAE population. If genetic profiling
can be used successfully to identify individuals at high risk of obesity, this would result in
substantial benefits to both individuals and society. Targeting preventive measures for
individuals with high-risk genotypes could delay the onset of the disease, slow its progression,
and reduce the ultimate severity of the condition. This would result in substantial
improvements in quality of life for affected individuals and a reduction in healthcare costs.
The identification of target genes might also lead to the development of novel therapeutic
modalities.
In this chapter, we specifically investigated the genetic associations with obesity in one
extended Emirati family of 319 members, of whom only 178 were genotyped. Given that Body Mass
Index (BMI) and Waist Circumference (WC) play a more prominent role in the development of
diabetes in this population, we studied the relation between these two traits and 657,367
Single Nucleotide Polymorphisms (SNPs). This study supports the influence of both
environmental and genetic factors in the pathophysiology of Type 2 Diabetes and its related
phenotypes in an Arab population. The study revealed four loci that were significant. Two
loci, in ADAM30 and JAZF1, which were previously shown through meta-analysis to be
associated with Type 2 Diabetes in Caucasian populations, were also shown to be
associated with Type 2 Diabetes in this Arab population. Two novel associations were noted
in this study: one novel locus on chromosome 16 within the FBXO31 locus was shown to be
associated with the WC phenotype, and one SNP in GALNTL4 on chromosome 11 was found to
be associated with BMI. The results presented show a strong familial aggregation of
quantitative traits associated with Type 2 Diabetes.
My colleagues and I prepared this manuscript. I completed all laboratory work at the Central
Veterinary Research Laboratory (CVRL) under the guidance of Dr Khazanehdari. I
performed the data analysis and drafted the first version of this manuscript. Mr Francis
provided advice on the relevant bioinformatics modules required for the study and established
working accounts to enable complete analysis of the data. Dr Jamieson worked through 4
different software packages with me to ascertain the relevant analytical tools for the data
gathered. Drs Cordell and Blackwell provided support and advice regarding the statistical
methods and identified the relevant analytical tools. Dr Tay guided me throughout the study
from designing the study to proof reading the manuscripts.
A Genome-Wide Association Study Examining Obese Factors in an Arab Family with a
History of Type 2 Diabetes.
Habiba S Al Safar1,2, Heather J Cordell3, Sarra E Jamieson4, Richard Francis4, Kamal
Khazanehdari5, Guan K Tay1, Jenefer M Blackwell4,6
1 Centre for Forensic Science, The University of Western Australia, Crawley, Western
Australia.
2 Dubai Police General Head Quarters, Dubai, United Arab Emirates.
3 Institute of Human Genetics, Newcastle University, Newcastle upon Tyne, United
Kingdom.
4 Telethon Institute for Child Health Research, Centre for Child Health Research, The
University of Western Australia, Subiaco, Western Australia.
5 Molecular Biology and Genetics, Central Veterinary Research Laboratory, Dubai, United
Arab Emirates.
6 Cambridge Institute for Medical Research and Department of Medicine, School of Clinical
Medicine, University of Cambridge, Cambridge, United Kingdom.
Abbreviated title: GWAS of an Arab Family with T2D
Keywords: Type 2 Diabetes, GWAS, QTDT, Arab family, Body Mass Index, Waist Circumference.
Publication number HA010-006 of the Centre for Forensic Science at the University of Western Australia
Corresponding author:
Professor Jenefer Blackwell
Head, Division of Genetics and Health
Telethon Institute for Child Health Research
100 Roberts Road, Subiaco, WA 6008
PO Box 855, West Perth, WA 6873
Tel: +61 8 9489 7910
Fax: +61 8 9489 7700
Email: [email protected]
ABSTRACT
Overweight and obesity are major risk factors for a number of chronic diseases, including
Type 2 Diabetes (T2D), cardiovascular disease and cancer. In the United Arab Emirates
(UAE), it has been estimated that some twenty percent of adults suffer from obesity. The
incidence of T2D in the UAE population is also among the highest in the world. To identify
factors that result in obesity, and its association with T2D, we conducted a Genome-Wide
Association Study (GWAS) and specifically assessed genetic associations with "Body Mass
Index" (BMI) and "Waist Circumference" (WC). GWAS analysis of 178 individuals in an
extended family of Arab descent revealed four loci that reached genome-wide significance,
two of which were found in previous studies. The Single Nucleotide Polymorphism (SNP)
rs2793823 within the ADAM30 locus, previously associated with T2D through meta-analysis
of GWAS data from subjects of Caucasian descent, was also
shown to be associated with the disease in Arabs (p = 1.86E-8). Our study also confirmed the
association between SNPs within the JAZF1 locus and BMI, WC and T2D as reported in other
studies. Two novel associations were noted in our study: (1) a locus on chromosome 16
within FBXO31 (rs9308437, p = 7.5E-7) was shown to be associated with the WC
phenotype, and (2) the SNP rs7120774 in GALNTL4 on chromosome 11 was found to be
associated with BMI (p = 1.82E-10). FBXO31 is a candidate gene for breast cancer, whereas
GALNTL4 plays a role in insulin stimulated glucose transport in muscle. Work continues to
replicate the two latter findings in independent cohorts to confirm the involvement of FBXO31
and GALNTL4.
INTRODUCTION
Obesity is increasing at an alarming rate throughout the world. It is recognised as a major
global public health concern, with much of the underlying problem resulting from poor
lifestyle factors, including unhealthy eating habits and lack of exercise, overlaid on specific
genetic backgrounds that compound the weight gain. Obesity is a chronic condition that
results from an increase in body weight in adults and is arguably the most
important risk factor leading to metabolic diseases such as Type 2 Diabetes (T2D).
Management of the disease can be as simple as adopting lifestyle changes. For example,
obesity in patients that is a consequence of insulin resistance or a reduced number of insulin
receptors can be reversed by weight control and loss [1, 2]. Insulin resistance, a condition in
which cells do not use insulin as they should, results in high levels of sugars in the
bloodstream and can lead to diabetes. A consequence of overall obesity is the accumulation of
body fat; the specific location of this fat has been associated with the development of
cardiovascular disease, stroke, and diabetes [3]. Therefore, the disease reduces quality of life and
between the obesity markers Body Mass Index (BMI) and/or Waist Circumference (WC) with
T2D in adults [5-14]. In addition, it has long been recognised that abdominal obesity, assessed
by WC rather than BMI, can be important, and the weighted evidence indicates that the ratio
of WC to BMI predicts a greater variance in health risk than does BMI alone [15, 16].
Microarray-based genotyping technology is increasingly being used to investigate complex
metabolic and non-metabolic diseases. Due to the decreasing cost, convenience and
improved resolving power of Genome-Wide Association Studies (GWAS) [17, 18], scientists
have been focusing on studying the association between genetic components and risk factors
such as obesity in major metabolic diseases including T2D. Recent genome-wide association
studies have identified multiple risk loci common to obesity, including FTO, MC4R,
TMEM18, GNPDA2, SH2B1, KCTD15, MTCH2, NEGR1 and PCSK1 [19-23].
For example, studies have revealed strong associations of two loci, FTO and MC4R, with
BMI and WC [24, 25]. FTO is highly expressed in hypothalamic nuclei that control eating
behavior [26]. It catalyzes Fe(II)- and 2OG-dependent DNA demethylation [26]; however, the
role of FTO-related DNA methylation in obesity is unknown. MC4R is a G-protein
coupled receptor (GPCR). These receptors sense signals such as light, chemicals or hormones,
and mutations in the receptors are implicated in many diseases such as diabetes [27]. If a
mutation causes even slight changes in MC4R, it may be sufficient to increase food intake,
ultimately leading to obesity [27].
The study described here was conceived as a GWAS to investigate genetic associations with
BMI and WC and T2D in a population not previously studied, specifically in an extended
family of Arab origin from the United Arab Emirates (UAE). It is understood that obesity is a
metabolic disorder; however, the relationship between obesity and T2D is not yet clear.
Investigating associations between the patients' genetic makeup and BMI or WC as well as T2D
may provide clues on the mechanisms of the genes that are involved.
Although GWAS have previously reported associations between BMI and risk of T2D
in populations such as Caucasians and Orientals [21, 25, 26, 28], no such study has been
carried out on the Arab population. Lifestyle changes of this population in recent times have
significantly increased weight gain early in adult life, and are believed to be a major
contributing factor to the obesity epidemic and associated diseases such as T2D.
The purpose of this study was to investigate the genetic associations with obesity in an
ethnically homogeneous cohort from UAE. In this manuscript, BMI and WC are the primary
focus. The relation between these traits and 657,367 Single Nucleotide Polymorphisms (SNP)
in one extended Emirati family of 319 members (of which 178 were genotyped) was studied.
MATERIALS AND METHODS
Participants of study
One hundred and seventy-eight (n=178) individuals from one extended family of Arab origin in the
United Arab Emirates (UAE) agreed to take part in this study (86 males and 92 females; 66
diabetic and 112 healthy). Clinical assessments were conducted and questionnaires were
completed at the Al-Etihad clinic in Dubai. All participants gave their informed consent in
writing. The study was approved by the Ethics Committees of the Ministry of Health in the
United Arab Emirates.
Collection of Phenotype data
Trained nurses measured the height and weight of each participant using a calibrated wall-
mounted stadiometer and a weigh scale, respectively. Body Mass Index (BMI) was calculated
as weight in kilograms divided by the square of the height in metres of each subject (kg/m2). Waist
Circumference (WC) was measured in inches. Overweight and obesity were defined
according to the World Health Organization (WHO) [29]. In the WHO classification, overweight
corresponds to a BMI between 25 and 30 kg/m2, and high waist circumference is defined as ≥ 35
inches for females and ≥ 40 inches for males.
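The phenotype definitions above can be written out as a short calculation. This is a minimal sketch: the function names and category labels are hypothetical, the overweight range and waist thresholds are those quoted in the text, and the obesity cut-off of 30 kg/m2 is assumed from the standard WHO classification.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight in kilograms divided by the square of height in metres."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(bmi: float) -> str:
    """Classify BMI; the >= 30 obesity cut-off is an assumption (standard WHO)."""
    if bmi >= 30.0:
        return "obese"
    if bmi >= 25.0:
        return "overweight"
    return "not overweight"


def has_high_waist(wc_inches: float, sex: str) -> bool:
    """High waist circumference: >= 35 in (females) or >= 40 in (males)."""
    thresholds = {"female": 35.0, "male": 40.0}
    if sex not in thresholds:
        raise ValueError("sex must be 'female' or 'male'")
    return wc_inches >= thresholds[sex]
```

For instance, a participant weighing 90 kg at 1.75 m would fall in the overweight range under these definitions.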
Genotyping
Genotyping was performed on the Infinium Human 660W Quad chip according to the
manufacturer’s recommendations (Illumina Inc., San Diego, USA) at the Molecular Biology &
Genetics Department, Central Veterinary Research Laboratory, based in Dubai, United Arab
Emirates. Whole-genome amplification was performed using 200ng of genomic DNA at 37°C
for 20 to 24 hours. Products were fragmented, precipitated, and resuspended in a
proprietary hybridisation buffer (Illumina Inc., San Diego, USA). The resuspended samples
were denatured at 95°C for 20 min and loaded on Illumina Bead Chips. The chips were placed
in a hybridisation chamber for 16 to 20 hours at 48°C. After hybridisation, non-hybridised
DNA was washed away. An allele-specific single-base extension of the oligonucleotides on
the BeadChip was performed in a 48-position Slide Chamber Rack (Illumina Inc., San Diego,
USA), using labelled deoxynucleotides and the captured DNA as a template. After staining of
the extended DNA, BeadChips were washed and scanned with I-Scan (Illumina Inc., San
Diego, USA), and raw data was generated by BeadStudio 3.0 software (Illumina Inc. San
Diego, USA).
Statistical Methods
Heritability and power calculations for BMI and WC were performed using the SOLAR
package to evaluate the influence of genetic components on phenotypic variation [30]. Data
quality control (QC) was performed using PLINK [31] to remove SNPs with a minor allele
frequency (MAF) < 0.05, a missing genotype rate > 5%, failure of the Hardy-Weinberg equilibrium
(HWE) test at the 0.000001 significance level, or Mendelian errors. Approximately 70% of
SNPs passed QC and were used in the association analysis. Samples that failed quality control
were also excluded from the analysis. The average call rate for the 178 samples was 98.99%. In
addition, PedCheck was used to identify errors in the familial relationships [32]. Genome-wide
association testing between SNPs and quantitative traits (BMI and WC) was performed
using the orthogonal model in the quantitative trait transmission disequilibrium test (QTDT)
program, in which the total association is partitioned into orthogonal within- and between-
family components [33].
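The partition into between- and within-family components can be illustrated for a nuclear family with genotyped parents. The sketch below assumes the usual formulation of the orthogonal model, in which the between-family component is the mean of the parental genotype scores and the within-family component is each offspring's deviation from it; the function name and the 0/1/2 allele-count coding are illustrative choices, and this is not the QTDT program itself.

```python
def orthogonal_components(father: int, mother: int, offspring):
    """Split offspring genotype scores into between/within-family parts.

    Genotypes are coded as counts (0, 1 or 2) of a chosen allele.
    Returns a list of (between, within) tuples, one per offspring,
    such that between + within equals the offspring genotype score.
    """
    for g in (father, mother, *offspring):
        if g not in (0, 1, 2):
            raise ValueError("genotype scores must be 0, 1 or 2")
    between = (father + mother) / 2.0  # family-level expectation
    return [(between, g - between) for g in offspring]
```

Because the within-family deviation depends only on which alleles the parents transmitted, a test based on it is robust to population stratification, which is the motivation for the orthogonal partition.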
RESULTS
One hundred and seventy-eight family members, 112 non-diabetic and 66 diabetic subjects,
were genotyped in this study. The clinical characteristics of the study group are summarised
in Table 1. The age ranges of the two subject categories were similar, ranging from 18 to 87
in patients and 18 to 97 in healthy volunteers.
The estimated heritability and power for the two traits, used to evaluate the influence of genetic
components on phenotypic variation, are shown in Table 2. BMI and WC showed significant
levels of heritability (p < 1e-6). Our study had greater than 80% power to detect a single locus
accounting for all the heritability at a logarithm of the odds (LOD) =3.
The association p-values (Manhattan plot) for the two quantitative traits BMI and WC are
shown in Figure 1 and Figure 2 respectively. The highest scoring SNPs for association with
BMI and WC are shown in Tables 3 and 4. The SNP with the lowest p-value (8.97E-14) for
BMI was rs11711029, located on chromosome 3 within the
HPS3 gene (GeneID 84343). In addition, the SNP with the lowest p-value (7.55E-07) for the
WC trait was rs9308347, located on chromosome 16 within the FBXO31
gene (GeneID 79791). These SNPs have not previously been shown to reach genome-wide
significance in studies involving other populations.
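The relationship between the reported chi-square statistics, their p-values and the −log10 scale of the Manhattan plots can be sketched as follows. The function names are hypothetical; the single degree of freedom is an assumption that appears consistent with the chi-square and p-value pairs in Table 3, and the threshold is the value quoted in the figure captions.

```python
import math

# Threshold quoted in the captions of Figures 1 and 2.
GENOME_WIDE_THRESHOLD = 1.5e-7


def chi2_to_p(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic, assuming 1 df."""
    if chi2 < 0:
        raise ValueError("chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(chi2 / 2.0))


def neg_log10(p_value: float) -> float:
    """Vertical-axis value used in a Manhattan plot."""
    return -math.log10(p_value)


def exceeds_threshold(p_value: float) -> bool:
    """True if the p-value is smaller than the plotted threshold."""
    return p_value < GENOME_WIDE_THRESHOLD
```

A SNP lies above the red line in the plots when `neg_log10(p)` is greater than `neg_log10(GENOME_WIDE_THRESHOLD)`, which is the same comparison as `exceeds_threshold(p)`.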
Tables 5 and 6 show previously reported genes associated with BMI and WC. In the present
study, only three of these loci showed nominal significance: rs6265 in BDNF, rs1333026 in an
unknown gene and rs10838738 in MTCH2 (p-values 0.026, 0.016 and 0.004, respectively).
Two genes, FTO and MC4R, identified in previous studies as related to BMI
and WC, were not significant in our study.
Table 1: Characteristics of 178 family members of Arab origin in this study.
Description Number
Males 86
Females 92
Type 2 Diabetes 66
Healthy 112
Variable
Physical Appearance
T2D
Age Range (years) 18-97
Mean Waist Circumference (inches)
Male 37.96 ± 5.13
Female 39.84 ± 5.40
Mean Body Mass Index (kg/m2) 30.40 ± 6.23
Healthy
Age Range (years) 18-97
Mean Waist Circumference (inches)
Male 38.20 ± 8.70
Female 37.85 ± 9.13
Mean Body Mass Index (kg/m2) 29.00 ± 8.82
Mean data are provided ± Standard Deviation.
Table 2: Heritability and power estimation to obtain a suggested LOD score (LOD = 3) for two quantitative
traits (BMI and Waist Circumference) in 178 individuals. Values have been adjusted
for sex and age.
Trait                H2r   p-value  Chi-square  Power estimate
Waist Circumference  0.44  2.6E-9   34.04       > 80%
Body Mass Index      0.48  1.0E-6   28.01       > 90%
235
Figure 1: Manhattan plot of −log10 (observed p-value) across the genome for each GWAS SNP tested for association with BMI in
178 individuals. The horizontal axis shows SNP location and the vertical axis shows −log10 (p-value) for each SNP tested. The red line
marks the genome-wide significance threshold (1.5E-7); SNPs and implicated genes with p-values beyond it are shown.
Figure 2: Manhattan plot of −log10 (observed p-value) across the genome for each GWAS SNP tested for association with Waist
Circumference in 178 individuals. The horizontal axis shows SNP location and the vertical axis shows −log10 (p-value) for each SNP
tested. The red line marks the threshold (1.5E-7); SNPs and implicated genes with p-values beyond it are shown.
Table 3: Top association results for BMI based on QTDT and their position, chi-square and their p value
Chr  SNP         Position   Chi-square  p value   Gene
1    rs197438    112082197  42.58       6.79E-11  C1orf183
1    rs584096    112131501  29.48       5.65E-08  KCND3
1    rs2788407   112563162  35.22       2.94E-09
1    rs2793823   120239241  31.64       1.86E-08  ADAM30
1    rs11204894  150059798  31.81       1.70E-08  RORC
2    rs2368424   184176711  32.11       1.46E-08
2    rs1349825   184201575  31.63       1.87E-08
2    rs2056156   189556713  28.79       8.07E-08
                                                  COL3A1
2    rs3106796   189558018  31.46       2.04E-08
2    rs12052514  191086415  35.28       2.86E-09  TMEM194B
2    rs6431635   234386183  29.91       4.53E-08
2    rs4663525   235518053  31.35       2.15E-08
2    rs2042831   235521853  33.77       6.20E-09
2    rs3731644   235614616  31.72       1.78E-08
                                                  SH3BP4
2    rs3731646   235614741  30.48       3.37E-08
2    rs3731648   235615023  31.37       2.13E-08
2    rs13396122  237373250  45.32       1.67E-11
3    rs13088151  3024520    29.79       4.81E-08
                                                  CNTN4
3    rs17024684  3030247    32.28       1.33E-08
3    rs7634908   3530312    31.25       2.27E-08
3    rs9853064   3531909    31.25       2.27E-08
3    rs17042585  5468389    39.99       2.55E-10
3    rs6443195   8576334    33.82       6.05E-09
                                                  LMCD1
3    rs1876611   8578483    33.11       8.71E-09
3    rs342892    147817321  38.16       6.52E-10
3    rs342938    147833036  37.65       8.46E-10
3    rs10049224  148733855  33.93       5.71E-09
3    rs4681169   150335145  39.17       3.88E-10
                                                  HPS3
3    rs4681487   150336167  38.88       4.51E-10
3    rs12487928  150343683  39.17       3.88E-10
3    rs11711029  150345546  55.58       8.97E-14
3    rs2689225   150349398  38.00       7.07E-10
4    rs3774820   5511959    29.35       6.04E-08
                                                  STK32B
4    rs3774813   5521161    33.84       5.98E-09
4    rs7679731   5977333    30.55       3.25E-08
4    rs7694823   7583941    34.68       3.89E-09  SORCS2
4    rs1441689   29155396   31.87       1.65E-08
4    rs10002254  29235963   32.94       9.50E-09
4    rs3846269   30021293   33.74       6.30E-09
4    rs1357462   31624867   34.40       4.49E-09
5    rs13160153  1570198    30.09       4.12E-08  LPCAT1
5    rs13187652  3148939    29.75       4.92E-08
5    rs10035578  171408294  28.92       7.54E-08  STK10
8    rs10105056  13655257   45.45       1.57E-11
8    rs10109857  13657264   44.36       2.73E-11
8    rs352774    15692048   30.29       3.72E-08
8    rs1670189   124241390  34.77       3.71E-09
9    rs872257    2486567    30.01       4.30E-08  FLJ35024
10   rs1444418   64230476   48.06       4.13E-12
10   rs4746781   64415595   30.92       2.69E-08
10   rs6479868   64490002   30.55       3.25E-08
10   rs12770187  64578695   34.01       5.48E-09  DKFZp564C1664, NRBF2
11   rs7120774   13759495   40.65       1.82E-10  GALNTL4
11   rs12275375  14593967   32.84       1.00E-08
11   rs10500802  96108514   36.78       1.32E-09  PSMA1
11   rs3019711   99509561   28.49       9.42E-08
11   rs11222898  77236017   28.33       1.02E-07  CNTN5
13   rs9600927   77246192   31.83       1.68E-08  SLAIN1 and DKFZp434A2422
13   rs7328292   78497513   38.70       4.94E-10
13   rs1112971   78499127   29.44       5.77E-08  BX647243 and AK095779
13   rs17181627  19906230   28.59       8.94E-08
13   rs7334914   20742953   28.43       9.71E-08
15   rs11637445  25661883   38.04       6.93E-10  MAP2K5
21   rs12185827  26557786   29.88       4.60E-08
21   rs2826261   26756022   32.04       1.51E-08
21   rs2151      16751358   29.10       6.87E-08
21   rs468241    16757199   28.92       7.54E-08
21   rs190100    23796553   29.35       6.04E-08
22   rs5747395   29431646   38.43       5.68E-10
                                                  MICAL3
22   rs8141766   33836426   30.31       3.68E-08
22   rs6004423   112082197  31.16       2.38E-08  KIAA1671 and CTA-221G9.5
22   rs9606766   112563162  28.29       1.04E-07  OSBP2 and KIAA1664
22   rs4820180   150059798  32.83       1.01E-08
Table 4: Top association results for Waist Circumference based on QTDT analysis and their position, chi-square and their p value
Chr  SNP         Position   Chi-square  p value   Gene
1    rs17534243  38423504   15.06       1.04E-04
1    rs7526314   40834106   16.75       4.26E-05
1    rs12079703  48779621   15.70       7.42E-05  AGBL4
1    rs2494316   192759116  16.47       4.94E-05
2    rs7578740   12336203   15.68       7.50E-05  AK001558
4    rs11737601  10097666   17.24       3.29E-05
4    rs3749558   10103101   18.14       2.05E-05  CLNK
4    rs13109005  10119979   16.34       5.29E-05
4    rs1004327   10120581   18.14       2.05E-05
4    rs4698497   16226648   19.79       8.64E-06  LDB2
4    rs1031326   17097162   16.81       4.13E-05  QDPR
4    rs2939720   37235621   15.30       9.17E-05  C4orf19
4    rs6830246   41334756   16.57       4.69E-05  LIMCH1
4    rs13117610  41921681   19.25       1.15E-05
4    rs729467    41935895   15.82       6.97E-05
4    rs4861178   41946387   17.93       2.29E-05
4    rs13113565  42251414   17.25       3.28E-05  ATP8A1
4    rs7666279   42252312   15.49       8.29E-05  ATP8A1
4    rs17026425  150891964  16.16       5.82E-05  BC031092
5    rs2217346   15664320   16.18       5.76E-05  FBXL7
5    rs12757     15682061   17.03       3.68E-05
5    rs7704791   15685099   18.03       2.17E-05
5    rs12652447  15727635   17.66       2.64E-05
6    rs510957    151383747  20.71       5.34E-06  DKFZp586G1517 & MTHFD1L
7    rs2091321   47372660   15.95       6.50E-05  TNS3
7    rs6964472   47383258   16.56       4.71E-05
                                                  CSMD1
7    rs12668378  53976764   16.17       5.79E-05
7    rs304749    79816339   16.34       5.29E-05
7    rs17162763  89503141   16.38       5.18E-05
8    rs1112779   4358577    19.48       1.02E-05
9    rs12004565  77182642   17.67       2.63E-05
                                                  IGM1
10   rs703424    119939738  15.63       7.70E-05
13   rs465051    31570463   18.37       1.82E-05  FRY
13   rs9603579   39102695   16.20       5.70E-05
13   rs585206    41587853   17.04       3.66E-05  DGKH
14   rs8010158   38066347   16.27       5.49E-05
14   rs2415487   38094871   17.67       2.63E-05
14   rs1597353   38103054   16.80       4.15E-05
14   rs11626845  38151069   15.46       8.43E-05
16   rs150348    55673537   15.77       7.15E-05  NLRC5
16   rs4843479   85475004   15.79       7.08E-05
16   rs7203346   85862570   18.86       1.41E-05  AK125749
16   rs1862788   85893927   17.48       2.90E-05
16   rs7192413   85913378   19.51       1.00E-05
16   rs9308347   85929253   24.47       7.55E-07  FBXO31
20   rs4810899   35643566   20.20       6.98E-06
22   rs8135417   29889255   15.77       7.15E-05  RNF185
Table 5: Comparison of BMI results with the prior literature: previously reported SNPs and the p values obtained for them in this study. Blank Gene or Reference cells continue the entry above.
Gene                 SNP         Reference  p value
BCDIN3D, FAIM2       rs7138803   [54]       0.740
BDNF                 rs6265      [54]       0.025
                     rs925946               0.920
                     rs7481311              -
BMP2                 rs2145270   [23]       -
C20orf133            rs6110577   [55]       0.823
FBN2                 rs374748               -
FLJ20309             rs7603514              0.862
FTO                  rs9939609   [23]       -
                     rs8050136   [54]       0.751
                     rs9939609   [21]       -
                     rs6499640   [54]       0.208
                     rs1121980   [22]       -
                     rs1421085   [56]       -
                     rs1121980   [57]       -
                     rs9941349   [55]       0.178
                     rs9930506   [58]       -
GNPDA2               rs10938397  [23]       -
Intergenic           rs1106683   [28]       -
                     rs1106684              -
                     rs1333026              0.016
ITPR3                rs999943    [55]       -
KCTD15               rs11084753  [23]       -
KCTD15, CHST8        rs29941     [54]       0.479
MAF                  rs1424233   [56]       0.577
MC4R                 rs17782313  [23]       -
                     rs12970134  [54]       0.823
MLN                  rs2274459   [55]       0.639
MTCH2                rs10838738  [23]       0.004
MUC15                rs12295638  [55]       -
NEGR1                rs2568958   [54]       0.823
                     rs2815752   [23]       -
NPC1                 rs1805081   [56]       0.729
NR                   rs10783050  [54]       0.791
PRF1                 rs10999409  [55]       0.265
PTER                 rs10508503  [56]       -
RAFTLIN              rs12635698  [55]       0.532
RARB                 rs1435703              -
RKHD3                rs12324805  [23]       0.055
RTN4                 rs6726292   [55]       0.806
SEC16B, RASAL2       rs10913469  [54]       0.887
SFRS10, ETV5, DGKG   rs7647305              0.289
SH2B1, ATP2A1        rs7498665              -
TMEM18               rs6548238   [23]       -
                     rs7561317   [54]       0.777
TRAM1L1              rs10433903  [55]       -
TRHR                 rs7832552   [59]       -
ZNF248               rs7474896   [55]       0.145
Table 6: Comparison of Waist Circumference results with the prior literature: previously reported SNPs and the p values obtained for them in this study. Blank Reference cells continue the entry above.
Gene                 SNP         Reference  p value
CDH12                rs4701252   [60]       0.639
CETP                 rs3764261   [61]       0.064
FAIM2, BCDIN3D       rs7138803   [60]       0.348
FTO                  rs1558902              -
GCKR                 rs1260326   [25]       0.104
GDAP1                rs4471028   [28]       0.152
Intergenic           rs1875517              -
LPL                  rs2083637   [25]       0.559
MC4R                 rs12970134             0.624
MC4R                 rs489693    [60]       -
NRXN3                rs10146997             0.862
OVCH2                rs7932813              -
PKHD1                rs1555967              -
DISCUSSION
In the current study, our aim was to perform GWAS analysis to detect genetic variants that
affect the incidence of T2D in one extended family of Arab origin from the United Arab
Emirates. One of the factors that increases the risk of T2D is obesity. Obesity is a complex
problem that cannot be entirely explained by one factor alone. Multiple genes may increase
one’s susceptibility to obesity, and the phenotype may also be affected by outside factors
such as an abundant food supply or little physical activity. For example, a study conducted by
Froguel and his group identified two forms of the GAD2 gene: one protected against obesity,
while the other made it more likely by stimulating the appetite [34].
Many previously discovered genes associated with obesity are active in the brain and could
affect behavior around food, rather than how the body breaks down fat or uses up energy.
Researchers found that the NRXN3 gene variant previously associated with alcohol
dependence, cocaine addiction, and illegal substance abuse also predicts the tendency to
become obese [35]. Another study explained how BDNF works in combination with a variety
of other substances that regulate appetite and body weight [36]. Considering how
many factors are involved in obesity, it is notable that research is increasingly pointing to
the brain as being very important in its development.
To investigate genetic determinants of obesity and T2D, a total of 657,367 SNPs were
genotyped in 178 members of one five-generation Emirati family. Of the 178 members,
66 were diagnosed with T2D; in these patients the mean BMI was 30.4 ± 6.23 and the mean
WC was 37.96 ± 5.13 in males and 39.84 ± 5.40 in females. There
is a significant phenotypic correlation (70%) between BMI and Waist Circumference, which
is also related to obesity (data not shown). This is consistent with an influence of both
environmental and genetic factors in the pathophysiology of T2D and its related phenotypes in
an Arab population. Furthermore, the results presented in Table 2 show a strong familial
aggregation of the quantitative traits WC and BMI, which are known to be associated with T2D
and which may play a more prominent role in the development of diabetes in this population.
The most noteworthy outcomes of this study were associations detected at the ADAM30,
GALNTL4, JAZF1, and FBXO31 gene regions. The associations at ADAM30 and JAZF1
replicate previous findings, whereas the associations at GALNTL4 and FBXO31 represent novel findings.
A meta-analysis carried out by Zeggini et al. validated that a SNP (rs10923931)
located on chromosome 1 in the ADAM30 gene is associated with T2D, with a p-value of 4E-8 [37].
In our study, one novel SNP (rs2793823) located in the same gene reached genome-wide
significance (p = 1.86E-8) for association with BMI. The function of ADAM30
(ADAM metallopeptidase domain 30) is still poorly understood. JAZF1 is another gene
studied by Zeggini et al., in which the SNP rs864745 (p = 5.00E-14) was associated with
T2D [37]. In our study, two novel SNPs in the same gene, rs10268254 and rs38523, were
nominally significant (p-values 0.020 and 0.0397, respectively). Very little is known about
the biological function of JAZF1; however, since JAZF1 is expressed in the pancreas [38], one might
consider that a gain-of-function variant in JAZF1 may lead to postnatal growth restriction,
also affecting pancreatic β-cell mass and function. Our study also confirmed that a SNP
(rs7120774) in the GLUT4 gene, located on chromosome 11, is related to obesity (p = 1.82E-10).
The GLUT4 isoform is primarily responsive to insulin and accounts for the majority, if not all, of
insulin-stimulated glucose transport in muscle and adipose tissue under normal physiological
circumstances.
This study identified several loci that were not detected earlier and are associated with T2D
with GWAS-significant probability values (p ≤ 1.00E-7). The most significant statistical
evidence for association with BMI was found at rs11711029 (p = 8.97E-14) in the HPS3 gene, and
for WC at rs9308347 in the FBXO31 gene (p = 7.55E-7).
In this study we also detected association with WC, at lower levels of significance, for three novel SNPs
located in the ATP8A1 gene (p = 3.8E-05 and 8.29E-5), which belongs to the type 4 subfamily of
P-type ATPases. ATP8A1 is highly distributed in skeletal muscle
and thyroid tissues [39]. ATP10A and ATP10D have been proposed as candidates for obesity
and HDL-cholesterol level, respectively [40]. Since ATP10A and ATP10D belong to the same
class of P4 ATPases [41, 42], ATP8A1 may be involved in similar pathways. Therefore, they
may play a role in glucose uptake and fat metabolism.
Genes that contribute to diseases other than T2D were also found to be significant in
this study. This may be because some T2D patients who participated in this study were
suffering from other complications such as breast cancer (in six of the female patients, whose
WC was 42.83 ± 3.76). Genes such as FBXO31 (rs9308347, p = 7.55E-7) were therefore
found to be significant in this study. FBXO31 showed a GWAS significant p-value for
association with WC. FBXO31 is located on chromosome 16q24.3, a region in which there is
loss of heterozygosity in breast, ovarian, hepatocellular and prostate cancers [43-47].
Scientists concluded that obesity and physical inactivity may account for 25 to 30 percent of
several diseases, including major cancers such as cancers of the colon, breast, endometrium,
kidney, and esophagus [48]. Specifically, obesity seems to increase the risk of breast cancer
only among postmenopausal women [49], who have increased levels of estrogen due to their
overweight condition [50]. After menopause, when the ovaries stop producing hormones, fat
tissue becomes the most important estrogen supply [51]. Estrogen levels in postmenopausal
women are 50 to 100 percent higher among heavy versus lean women [52]. Estrogen-sensitive
tissues are therefore exposed to more estrogen stimulation in heavy women, leading
to a more rapid growth of estrogen-responsive breast tumors. Therefore, this gene might play a
role in body weight gain and subsequently in T2D.
Our study found five SNPs (rs4681169, rs4681487, rs12487928, rs11711029 and rs2689225)
with significant p-values of 3.88E-10, 4.51E-10, 3.88E-10, 8.97E-14 and 7.07E-10,
respectively. These five SNPs are located on chromosome 3 within the HPS3 gene. HPS3 underlies one of
the subtypes of Hermansky-Pudlak syndrome (HPS). HPS is a rare autosomal
recessive genetic disorder which occurs due to defects in the melanosomes, platelet-dense granules, and
lysosomes found in various cell types [53]. So far, no previous
studies have shown any association of this gene with T2D or obesity.
In this study we did not observe any significant association of the FTO and MC4R genes with our
traits (BMI and WC). The different genetic backgrounds of Caucasian and Arab
populations could explain the non-significant results for these two genes. Thus, in our
population these genes might not be involved, or association with these genes may only emerge
when large sample sizes are analyzed. It should also be noted that the non-significance of the FTO
SNPs may be partly explained by the similar BMI of the T2D patients and healthy
individuals in our sample.
An interesting aspect to our study is the use of 178 individuals from a single large pedigree.
This means that the test we employed (the orthogonal model of the QTDT) could actually be
considered to represent a joint test of linkage and association rather than a test of association
per se. In fact, in a large pedigree such as this one, one could argue that linkage and
association are essentially the same thing - the correlation between phenotype and marker
alleles occurs firstly because the marker allele happened to be in coupling with the trait allele
on one (or several) haplotypes in founders, and secondly because the marker and disease
alleles are transmitted together through the pedigree (due to a lack of recombination). In our
pedigree, there are likely to be a much larger number of observations for linkage (reflecting
this lack of recombination between trait and marker alleles) than there are for association
(reflecting the fact that the trait and marker allele are correlated in the founders, perhaps due to
linkage disequilibrium in the general population). In theory one could account for the linkage
component of the test in the QTDT through incorporation of observed identity-by-descent
(IBD) sharing between individuals in a variance components framework. However, calculation
of IBD sharing in such a large pedigree is computationally demanding and would most likely
result in a reduction in power. By not incorporating IBD sharing in the calculation, we are able
to exploit the linkage signal in our pedigree in order to increase our power to detect genetic
effects. However, this does impact upon our interpretation of our results, since linkage signals
are generally expected to extend over larger genomic regions than association signals. This
could explain the relatively wide localization of the signals we found (see Tables 3 and 4)
which in some cases stretched over several genes.
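The orthogonal model referred to above can be sketched as follows. This is a minimal summary, in our own illustrative notation, of the decomposition described by Abecasis et al. [33], and not a full specification of the analysis performed here: the genotype score g_ij of individual j in family i is split into a between-family component b_i and a within-family component w_ij, and the expected trait value y_ij is modelled on both. The symbols μ, β_b and β_w are names chosen for this sketch.

```latex
% Sketch of the QTDT orthogonal model (notation is illustrative)
\begin{align}
  g_{ij} &= b_i + w_{ij} \\
  E(y_{ij}) &= \mu + \beta_b\, b_i + \beta_w\, w_{ij}
\end{align}
```

In this formulation, evidence for association is assessed through the within-family effect β_w. As discussed above, when IBD sharing is not modelled in a large pedigree, a test of β_w = 0 also captures the linkage signal.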
In conclusion, our GWAS analysis indicated the involvement of some novel genes in the
etiology of obesity (BMI and WC). GWAS analyses are only an initial step in the elucidation
of susceptibility variants. Although the current analyses have pointed out several regions that
may hold genetic variants that affect susceptibility to T2D in Arab populations, further
investigation of the identified genes is needed to understand the mechanism and association of
these genes with T2D and obesity. Our findings require replication in both Arab and other
ethnic groups. The characteristics of Arab populations, such as consanguineous
marriages, high birth rates and lack of physical exercise, make them ideal for the study of
complex, polygenic, multifactorial disorders such as diabetes. As we uncover more variants, we will
gain a better basic understanding of obesity, which in turn will further previously unimagined
areas of clinically relevant research.
ACKNOWLEDGMENT
Publication number HA010-006 of the Centre for Forensic Science at the University of
Western Australia. We gratefully acknowledge the contribution of participating family
members whose cooperation made this study possible. Part of the data analysis was
performed on the advanced computing resources provided by the Western Australian
Advanced Computing Consortia (iVEC). Habiba Alsafar is a PhD scholar at the University of
Western Australia supported by the Dubai Police General Head Quarters in the United Arab
Emirates. Funding for this project was provided in part by CVRL and the Emirates
Foundation.
REFERENCES
1. Lyen, K.R., The insulin receptor. Ann Acad Med Singapore, 1985. 14(2): p. 364-73.
2. Olefsky, J.M. and O.G. Kolterman, Mechanisms of insulin resistance in obesity and
noninsulin-dependent (type II) diabetes. Am J Med, 1981. 70(1): p. 151-68.
3. Björntorp, P., Obesity and Adipose Tissue Distribution as Risk Factors for the
Development of Disease. Transfusion Medicine and Hemotherapy, 1990. 17(1): p. 24-
27.
4. Charro, A., M. Rubio, and D. Runkle, Checks up in obese and diabetic patients:
preventive medicine. Int J Vitam Nutr Res, 2006. 76: p. 194-9.
5. Wannamethee, S.G., A.G. Shaper, and M. Walker, Overweight and obesity and weight
change in middle aged men: impact on cardiovascular disease and diabetes. J
Epidemiol Community Health, 2005. 59(2): p. 134-9.
6. Wannamethee, S.G. and A.G. Shaper, Weight change and duration of overweight and
obesity in the incidence of type 2 diabetes. Diabetes Care, 1999. 22(8): p. 1266-72.
7. Resnick, H.E., et al., Relation of weight gain and weight loss on subsequent diabetes
risk in overweight adults. J Epidemiol Community Health, 2000. 54(8): p. 596-602.
8. Perry, I.J., et al., Prospective study of risk factors for development of non-insulin
dependent diabetes in middle aged British men. Bmj, 1995. 310(6979): p. 560-4.
9. Holbrook, T., E. Barrett-Connor, and D. Wingard, The association of lifetime weight
and weight control patterns with diabetes among men and women in an adult
community. Int J Obes, 1989. 13: p. 723-9.
10. Haffner, S.M., et al., Incidence of type II diabetes in Mexican Americans predicted by
fasting insulin and glucose levels, obesity, and body-fat distribution. Diabetes, 1990.
39: p. 283-8.
11. Field, A.E., et al., Impact of overweight on the risk of developing common chronic
diseases during a 10-year period. Arch Intern Med, 2001. 161(13): p. 1581-6.
12. Colditz, G.A., et al., Weight gain as a risk factor for clinical diabetes mellitus in
women. Ann Intern Med, 1995. 122(7): p. 481-6.
13. Chan, J., et al., Obesity, fat distribution, and weight gain as risk factors for clinical
diabetes in men. Diabetes Care, 1994. 17(9): p. 961-9.
14. Carey, V.J., et al., Body fat distribution and risk of non-insulin-dependent diabetes
mellitus in women. The Nurses' Health Study. Am J Epidemiol, 1997. 145(7): p. 614-9.
15. Ardern, C.I., et al., Discrimination of health risk by combined body mass index and
waist circumference. Obes Res, 2003. 11(1): p. 135-42.
16. Chan, J.M., et al., Obesity, fat distribution, and weight gain as risk factors for clinical
diabetes in men. Diabetes Care, 1994. 17(9): p. 961-969.
17. Wellcome Trust Case Control Consortium, Genome-wide association study of 14,000 cases
of seven common diseases and 3,000 shared controls. Nature, 2007. 447(7145): p. 661-78.
18. Christensen, K. and J.C. Murray, What genome-wide association studies can do for medicine.
N Engl J Med, 2007. 356: p. 1094-7.
19. Benzinou, M., et al., Common nonsynonymous variants in PCSK1 confer risk of
obesity. Nat Genet, 2008. 40(8): p. 943-5.
20. Chambers, J.C., et al., Common genetic variation near MC4R is associated with waist
circumference and insulin resistance. Nat Genet, 2008. 40(6): p. 716-8.
21. Frayling, T.M., et al., A common variant in the FTO gene is associated with body mass
index and predisposes to childhood and adult obesity. Science, 2007. 316(5826): p.
889-94.
22. Loos, R.J., et al., Common variants near MC4R are associated with fat mass, weight
and risk of obesity. Nat Genet, 2008. 40(6): p. 768-75.
23. Willer, C.J., et al., Six new loci associated with body mass index highlight a neuronal
influence on body weight regulation. Nat Genet, 2009. 41(1): p. 25-34.
24. Kring, S.I., et al., FTO gene associated fatness in relation to body fat distribution and
metabolic traits throughout a broad range of fatness. PLoS One, 2008. 3(8): p. e2958.
25. Chambers, J.C., et al., Common genetic variation near MC4R is associated with waist
circumference and insulin resistance. Nat Genet, 2008. 40(6): p. 716-8.
26. Gerken, T., et al., The obesity-associated FTO gene encodes a 2-oxoglutarate-
dependent nucleic acid demethylase. Science, 2007. 318(5855): p. 1469-72.
27. Vaisse, C., et al., A frameshift mutation in human MC4R is associated with a dominant
form of obesity. Nat Genet, 1998. 20(2): p. 113-4.
28. Fox, C.S., et al., Genome-wide association to body mass index and waist
circumference: the Framingham Heart Study 100K project. BMC Med Genet, 2007. 8
Suppl 1: p. S18.
29. World Health Organization. Obesity: Preventing and Managing the Global Epidemic
(2000) Geneva, World Health Organization. Technical report series 894.
30. Almasy, L. and J. Blangero, Multipoint quantitative-trait linkage analysis in general
pedigrees. Am J Hum Genet, 1998. 62(5): p. 1198-211.
31. Purcell, S., et al., PLINK: a tool set for whole-genome association and population-
based linkage analyses. Am J Hum Genet, 2007. 81(3): p. 559-75.
32. O'Connell, J.R. and D.E. Weeks, PedCheck: a program for identification of genotype
incompatibilities in linkage analysis. Am J Hum Genet, 1998. 63(1): p. 259-66.
33. Abecasis, G.R., L.R. Cardon, and W.O. Cookson, A general test of association for
quantitative traits in nuclear families. Am J Hum Genet, 2000. 66(1): p. 279-92.
34. Boutin, P. and P. Froguel, GAD2: a polygenic contribution to genetic susceptibility for
common obesity? Pathol Biol (Paris), 2005. 53(6): p. 305-7.
35. Kelai, S., et al., Nrxn3 upregulation in the globus pallidus of mice developing cocaine
addiction. Neuroreport, 2008. 19(7): p. 751-5.
36. Gray, J., et al., Hyperphagia, severe obesity, impaired cognitive function, and
hyperactivity associated with functional loss of one copy of the brain-derived
neurotrophic factor (BDNF) gene. Diabetes, 2006. 55(12): p. 3366-71.
37. Zeggini, E., et al., Meta-analysis of genome-wide association data and large-scale
replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet, 2008.
40(5): p. 638-45.
38. Nakajima, T., et al., TIP27: a novel repressor of the nuclear orphan receptor
TAK1/TR4. Nucleic Acids Res, 2004. 32(14): p. 4194-204.
39. Mouro, I., et al., Cloning, expression, and chromosomal mapping of a human ATPase
II gene, member of the third subfamily of P-type ATPases and orthologous to the
presumed bovine and murine aminophospholipid translocase. Biochem Biophys Res
Commun, 1999. 257(2): p. 333-9.
40. Flamant, S., et al., Characterization of a putative type IV aminophospholipid
transporter P-type ATPase. Mamm Genome, 2003. 14(1): p. 21-30.
41. Halleck, M.S., et al., Differential expression of putative transbilayer amphipath
transporters. Physiol Genomics, 1999. 1(3): p. 139-50.
42. Paulusma, C.C. and R.P. Oude Elferink, The type 4 subfamily of P-type ATPases,
putative aminophospholipid translocases with a role in human disease. Biochim
Biophys Acta, 2005. 1741(1-2): p. 11-24.
43. Miller, B.J., et al., Pooled analysis of loss of heterozygosity in breast cancer: a genome
scan provides comparative evidence for multiple tumor suppressors and identifies
novel candidate regions. Am J Hum Genet, 2003. 73(4): p. 748-67.
44. Lin, Y.W., et al., Deletion mapping of chromosome 16q24 in hepatocellular carcinoma
in Taiwan and mutational analysis of the 17-beta-HSD gene localized to the region. Int
J Cancer, 2001. 93(1): p. 74-9.
45. Launonen, V., et al., Loss of heterozygosity at chromosomes 3, 6, 8, 11, 16, and 17 in
ovarian cancer: correlation to clinicopathological variables. Cancer Genet Cytogenet,
2000. 122(1): p. 49-54.
46. Kumar, R., et al., FBXO31 is the chromosome 16q24.3 senescence gene, a candidate
breast tumor suppressor, and a component of an SCF complex. Cancer Res, 2005.
65(24): p. 11304-13.
47. Elo, J.P., et al., Loss of heterozygosity at 16q24.1-q24.2 is significantly associated with
metastatic and aggressive behavior of prostate cancer. Cancer Res, 1997. 57(16): p.
3356-9.
48. Vainio, H. and F. Bianchini, Evaluation of cancer-preventive agents and strategies a
new program at the International Agency for Research on Cancer. Ann N Y Acad Sci,
2001. 952: p. 177-80.
49. Toniolo, P.G., et al., A prospective study of endogenous estrogens and breast cancer in
postmenopausal women. J Natl Cancer Inst, 1995. 87(3): p. 190-7.
50. Zeleniuch-Jacquotte, A., et al., Endogenous estrogens and risk of breast cancer by
estrogen receptor status: a prospective study in postmenopausal women. Cancer
Epidemiol Biomarkers Prev, 1995. 4(8): p. 857-60.
51. Keun-Young, Y., et al., Postmenopausal obesity as a breast cancer risk factor
according to estrogen and progesterone receptor status (Japan). Cancer letters, 2001.
167(1): p. 57-63.
52. Huang, Z., et al., Dual effects of weight and weight gain on breast cancer risk. Jama,
1997. 278(17): p. 1407-11.
53. Shotelersuk, V. and W.A. Gahl, Hermansky-Pudlak syndrome: models for intracellular
vesicle formation. Mol Genet Metab, 1998. 65(2): p. 85-96.
54. Thorleifsson, G., et al., Genome-wide association yields new sequence variants at
seven loci that associate with measures of obesity. Nat Genet, 2009. 41(1): p. 18-24.
55. Cotsapas, C., et al., Common body mass index-associated variants confer risk of
extreme obesity. Hum Mol Genet, 2009. 18(18): p. 3502-7.
56. Meyre, D., et al., Genome-wide association study for early-onset and morbid adult
obesity identifies three new risk loci in European populations. Nat Genet, 2009. 41(2):
p. 157-9.
57. Hinney, A., et al., Genome wide association (GWA) study for early onset extreme
obesity supports the role of fat mass and obesity associated gene (FTO) variants. PLoS
One, 2007. 2(12): p. e1361.
58. Scuteri, A., et al., Genome-wide association scan shows genetic variants in the FTO
gene are associated with obesity-related traits. PLoS Genet, 2007. 3(7): p. e115.
59. Liu, X.G., et al., Genome-wide association and replication studies identified TRHR as
an important gene for lean body mass. Am J Hum Genet, 2009. 84(3): p. 418-23.
60. Heard-Costa, N.L., et al., NRXN3 is a novel locus for waist circumference: a genome-
wide association study from the CHARGE Consortium. PLoS Genet, 2009. 5(6): p.
e1000539.
61. Lindgren, C.M., et al., Genome-wide association scan meta-analysis identifies three
Loci influencing adiposity and fat distribution. PLoS Genet, 2009. 5(6): p. e1000508.
CHAPTER 8
COMMENTARY AND FINAL REMARKS
The pilot program of the Emirates Family Registry (EFR) described in the chapters of this thesis
was overwhelmingly successful. The information gathered for the first Genome Wide Screen
of an Arab population could not have been collated without the support of the volunteers who
enrolled in the Emirates Family Registry. The tightly knit Bedouin communities are
essentially closed to technological advances. However, the Emirates Family Registry
provided a platform, through which key members of the family hierarchy could derive
confidence in a long term approach to addressing an important issue. The structured bio-bank
and associated clinical database (Figure 1) provided a means to systematically match bio-
specimens (blood, DNA samples) with phenotypic and demographic data (Figure 2). The
study focused on Type 2 Diabetes (T2D) in Arabs as it represents an increasing problem in the
Middle East (see review of the genetics of diabetes in Chapter 1).
Figure 1: Structure of the EFR bio-bank and clinical database.
The study has provided the initial dataset collected from 23,064 volunteers. The elements of
the Phase 1 dataset are included in Figure 1. A subset of these volunteers was
analysed to estimate the prevalence and incidence of Type 2 Diabetes in the
United Arab Emirates (Chapter 2) as well as the inheritance of traits known to be associated
with the disease (Chapter 3).
Figure 2: The specific contents of the EFR Phase 1.
The Emirates Family Registry project has not only resulted in a close relationship with
families and individuals who were keen to develop an understanding of the mechanisms that
cause disease; it has also resulted in the establishment of an international collaborative network in
Australia, Europe and the Middle East (see Figure 3), which will ensure future development of
the EFR project. This collaborative network will provide substantial benefits for all the groups,
such as genotyping facilities, analytical tools, and bioinformatics and statistical expertise, which
would be invaluable to the Arab genome and bio-bank community since there is a lack of
biostatisticians in the Middle East. Furthermore, there is the potential to set up a valuable
collaborative network, especially with Gulf Cooperation Council (GCC) countries.
Figure 3: Collaborative Links of the EFR Project have been established throughout the Middle East, United Kingdom and Australia
The nature of the Middle East, and most of Asia, requires further thought. Methods adapted
for sample collection in harsh environments with little access to infrastructure have to be
developed. Blood is typically collected by venipuncture in vacutainers. In parts of the
African and Asian continents, blood collection by this process is problematic. As such, new
sample collection and storage methods have to be considered. In Chapter 4, the FTA™
system was successfully assessed for this purpose.
Analysis of the information within the database has revealed much about the uniqueness of the
genetic background of the Bedouin population and its phylogenetic relationship with other
ethnic groups. In Chapter 5, four specific markers in the Major Histocompatibility Complex
(MHC) on human chromosome 6 were typed. In the study, PCR assays to type the markers
AluyMICB, AluyTF, AluyHJ and AluyHF were developed. Data
from the Arab population were compared phylogenetically with the allelic distribution in Malaysian Chinese,
North Eastern Thai, Japanese, Australian, African and Mongolian populations. The study
showed that Arabs have a similar lineage to Caucasians.
The information, together with the biological specimens, was useful in many ways, including genome-
wide studies to identify contributing polymorphisms (see Chapter 6 and Chapter 7). Phase 1
of the EFR is expected to provide a platform for future longitudinal studies.
In Chapter 1, the major problem that is Type 2 Diabetes is discussed. Currently, over 170
million people globally suffer from Type 2 Diabetes, a disease influenced by lifestyle,
genetic and behavioral factors [1]. The role of genetic factors in the etiology
of diabetes was found to be highly significant. It is therefore important to map disease genes
by comparing cases with controls as well as by performing comparative analysis across
different ethnic groups. Type 2 Diabetes has become a major public health problem in the UAE.
A survey completed by the Ministry of Health in the UAE reported that the overall percentage of
people with diabetes was 19.6% among UAE citizens. Furthermore, recent studies
estimated that 25% of adult Arabs now suffer from diabetes, mainly Type 2 Diabetes, and that the
prevalence of the disease is increasing. These observations emphasize the necessity of
considering prevention of diabetes in the UAE. To this end, the “Emirates Family Registry”
(EFR) was created to detect loci and genes influencing susceptibility to Type 2 Diabetes
(T2D) and related traits in the United Arab Emirates (UAE) population. Thus Chapter 1
touches on the implications of genetic research, with specific emphasis on the findings of genome-
wide screening of T2D patients in different populations.
Chapter 2 discusses the prevalence of Type 2 Diabetes in a small population from the UAE, a
prelude to a more extensive longitudinal study in the future. The disease is currently the
fastest growing debilitating disease in the world. In 2007 the United Arab Emirates was
ranked second in the world for the prevalence of diabetes. One out of
five UAE nationals aged 20 to 79 lives with diabetes. In order to investigate the genes
influencing susceptibility to Type 2 Diabetes in ethnic groups in the UAE population,
collaborations have been established with major hospitals and diabetes centres in the country.
Through these collaborations, demographic data of patients have been evaluated and tabulated in
a professionally maintained database called the Emirates Family Registry. To date the Emirates Family
Registry contains 23,064 volunteers (see Figure 1). Information within the Emirates Family
Registry has revealed that obesity, large waist circumference, consanguineous marriage, family history,
lack of physical activity, and an unhealthy diet with high total cholesterol and triglyceride levels
were more prevalent in Type 2 Diabetes patients in the United Arab Emirates. These
observations could lead to better diagnosis, treatment and intervention. It is extremely important
to continue adding patients to the database as they are found and treated, as well as
individuals who do not presently have the disease. This kind of study and continued collection
of data could lead to the genomic studies needed to control diabetes. This would benefit
patients, families and the healthcare system of any country.
Chapter 3 estimates the heritability of eight traits used to evaluate the influence of the genetic
component on phenotypic variation that is associated with Type 2 Diabetes, and describes the
role of genes and the influence of the environment on the increasing prevalence of Type 2
Diabetes in an extended family of Arab origin. The study revealed strong phenotypic
correlations between fasting glucose levels and HbA1c, and between these two traits and waist
circumference. The findings presented also indicate a heritable tendency for obesity in this
family, indicated by waist circumference and BMI values. The results presented show a
strong familial aggregation of quantitative traits associated with T2D. Further studies are
underway to identify potentially specific genetic loci in Arab populations. This assessment of
phenotypic factors will be followed up with ongoing studies to evaluate the contribution of
genetic polymorphisms that contribute to the prevalence of Type 2 Diabetes in Arab
populations.
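The phenotypic correlations mentioned above can be illustrated with a minimal sketch. It computes a plain Pearson correlation between two traits and deliberately ignores pedigree structure, so it is not the family-based analysis of Chapter 3. The function name pearson_r and the trait values are hypothetical and are not EFR data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length trait vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant trait")
    return sxy / sqrt(sxx * syy)

# Hypothetical illustrative values only (NOT taken from the EFR database)
fasting_glucose = [5.1, 6.3, 7.8, 9.0, 5.6, 8.2]  # assumed units: mmol/L
hba1c = [5.4, 6.1, 7.5, 8.4, 5.8, 7.9]            # assumed units: %
r = pearson_r(fasting_glucose, hba1c)
print(f"phenotypic correlation r = {r:.3f}")
```

In a pedigree, relatives are not independent observations, so a coefficient obtained this way describes the sample only; significance testing would need a method that models relatedness.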
Chapter 4 describes the use of FTA™ technology for DNA storage and a Whole Genome
Amplification step prior to GWAS application as an alternative strategy for high throughput
genotyping. In this study, three different sources of DNA were assessed (namely, degraded
genomic DNA, amplified degraded genomic DNA and amplified DNA extracted from FTA™
cards) as suitable templates for genome-wide analysis using Illumina’s Human660W-Quad
BeadChip. The study showed that amplified DNA extracted from FTA™ cards had the highest
call rates in comparison with the other DNA sources (amplified and non-amplified genomic
DNA). Thus FTA™ cards are a routine, cost-effective technology that offers a simple method
for preservation of bio-specimens and is amenable to high throughput DNA extraction, all
attributes required to undertake successful Genome Wide Association Studies in an efficient
manner. To the best of our knowledge, this is the first description of FTA™-sourced DNA for
high throughput genotyping to study human polymorphisms.
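As an illustration of the call rate metric used to compare DNA sources, the sketch below computes the fraction of markers that received a genotype call for each sample. The missing-call symbol "--", the sample names and the genotype strings are assumptions chosen only to exercise the function; they are not results from Chapter 4.

```python
def call_rate(genotypes, missing="--"):
    """Fraction of markers with a non-missing genotype call for one sample."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    called = sum(1 for g in genotypes if g != missing)
    return called / len(genotypes)

# Hypothetical toy genotype calls (NOT data from the thesis)
samples = {
    "sample_a": ["AA", "AG", "--", "GG"],
    "sample_b": ["AA", "AG", "GG", "GG"],
}
rates = {name: call_rate(calls) for name, calls in samples.items()}
best = max(rates, key=rates.get)  # sample with the highest call rate
for name, rate in rates.items():
    print(f"{name}: call rate = {rate:.2%}")
```

A sample-level call rate of this kind is commonly used as a quality filter before association analysis; the threshold applied is a study-specific choice.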
Chapter 5 examines the evolutionary relationships of a previously unstudied population, the Bedouins of
the Middle East, and compares the distribution of specific polymorphic Alu insertions (POALINs) of the Major
Histocompatibility Complex (MHC) with previous analyses of population groups
of African, European and Asian descent [2-8]. The study segregated the populations into three
phylogenetic groups: the Asian subpopulations, the Bedouins and Caucasians, and the three
included African subpopulations. Based on our results, we concluded that the Bedouin population
was similar to Australian Caucasians. However, further analysis of the Bedouin
population is needed for a better understanding of its unique genetic background and the
diseases that affect this group of individuals.
Chapters 6 and 7 examine the genes that may influence susceptibility to Type 2 Diabetes and
obesity in Type 2 Diabetes patients from the United Arab Emirates by genotyping
660,000 Single Nucleotide Polymorphisms (SNPs) throughout the genome.
To date, genome wide scans for Type 2 Diabetes have been performed in over 20 different
populations, including Europeans, American Caucasians, Mexican Americans, Pima Indians,
African Americans and Asians [1, 9-18]. Results from these studies have indicated that Type 2
Diabetes susceptibility loci reside on a number of different chromosomes. From this
perspective, this study is the first genome wide screen in the Middle East focusing on
identification of the genes involved in the development of Type 2 Diabetes among the UAE
population.
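To make the idea of a genome wide screen concrete, the following sketch tests a single SNP for association by comparing allele counts in cases and controls with a 1-degree-of-freedom chi-square test. It assumes unrelated individuals, so it is a simplification and not the family-based analysis reported in Chapters 6 and 7. The function name, the allele counts and the Bonferroni threshold over 660,000 markers are illustrative assumptions.

```python
from math import erfc, sqrt

def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Chi-square test (1 df) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the two alleles of one SNP.
    Returns (chi2, p_value). Assumes unrelated individuals.
    """
    row_case = case_a + case_b
    row_ctrl = ctrl_a + ctrl_b
    col_a = case_a + ctrl_a
    col_b = case_b + ctrl_b
    n = row_case + row_ctrl
    if 0 in (row_case, row_ctrl, col_a, col_b):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b
    )
    # Survival function of the chi-square distribution with 1 df
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical allele counts for one SNP (NOT data from the thesis)
chi2, p = allelic_chi_square(60, 40, 40, 60)

# Assumed multiple-testing correction: Bonferroni over 660,000 markers
bonferroni_threshold = 0.05 / 660_000
print(f"chi2 = {chi2:.2f}, p = {p:.4g}, significant: {p < bonferroni_threshold}")
```

Because hundreds of thousands of markers are tested, a nominally small p-value at one SNP is far from genome wide significance, which is one reason replication in independent cohorts matters.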
The Genome Wide Association Study (GWAS) analyses of Type 2 Diabetes patients in the Arab
population were therefore only an initial step in the elucidation of susceptibility variants. The
results obtained (Chapter 6) from the GWAS analysis identified variation at PRKD1 (Protein
Kinase D1) on 14q11 as being associated with Type 2 Diabetes in an extended family of
319 members (see Figure 4) of Arab descent living in the United Arab Emirates.
The study (Chapter 7) identified loci previously reported in different cohorts of Caucasian
populations, such as the ADAM30, GALNTL4, JAZF1, and DGKG gene regions, that are
associated with obesity in Type 2 Diabetes patients [18, 19]. Moreover, this study also
identified several loci that were not detected earlier and are associated with Type 2 Diabetes.
The most significant statistical evidence for association was found in the
HPS3 gene for Body Mass Index and in FBX031 for Waist Circumference.
Further investigation of the identified genes is needed to understand the mechanism and
association of these genes with Type 2 Diabetes and obesity. Our findings call for
further replication in other ethnic groups. As we uncover more variants, we will gain a better
basic understanding of Type 2 Diabetes in Arab populations, which in turn will open doors
to previously unimagined areas of clinically relevant research.
As Phase One of the Emirates Family Registry project draws to a close, the collaborations
established with regional and international partners will see the expansion of the project to
other Gulf Cooperation Council countries. Conducting more genome wide association studies
in Arabs requires a joint effort among Arab institutions. Since there is an assortment of ethnic
groups in the region, Phase Two of the Emirates Family Registry will cover a diverse array of
populations (e.g. Arabs, Bedouins, Persians, Kurds, Lebanese, Palestinians, Turks,
etc.). An understanding of the genetic diversity in the region will provide an insight into
mechanisms that cause disease. These developments could lead to improved
intervention and prevention programs to improve the quality of life throughout Arab nations.
REFERENCES
1. Scott, L.J., et al., A genome-wide association study of type 2 diabetes in Finns detects
multiple susceptibility variants. Science, 2007. 316(5829): p. 1341-5.
2. Dunn, D.S., et al., The distribution of major histocompatibility complex class I
polymorphic Alu insertions and their associations with HLA alleles in a Chinese
population from Malaysia. Tissue Antigens, 2007. 70(2): p. 136-43.
3. Dunn, D.S., et al., The association between HLA-A alleles and young Alu dimorphisms
near the HLA-J, -H, and -F genes in workshop cell lines and Japanese and Australian
populations. J Mol Evol, 2002. 55(6): p. 718-26.
4. Dunn, D.S., et al., Association of MHC dimorphic Alu insertions with HLA class I and
MIC genes in Japanese HLA-B48 haplotypes. Tissue Antigens, 2003. 62(3): p. 259-62.
5. Kulski, J.K. and D.S. Dunn, Polymorphic Alu insertions within the Major
Histocompatibility Complex class I genomic region: a brief review. Cytogenet Genome
Res, 2005. 110(1-4): p. 193-202.
6. Dunn, D.S., B.D. Tait, and J.K. Kulski, The distribution of polymorphic Alu insertions
within the MHC class I HLA-B7 and HLA-B57 haplotypes. Immunogenetics, 2005.
56(10): p. 765-8.
7. Yao, Y., et al., Polymorphic Alu insertions and their associations with MHC class I
alleles and haplotypes in Han and Jinuo populations in Yunnan Province, southwest of
China. J Genet Genomics, 2009. 36(1): p. 51-8.
8. Yao, Y., et al., The association between HLA-A, -B alleles and major
histocompatibility complex class I polymorphic Alu insertions in four populations in
China. Tissue Antigens, 2009. 73(6): p. 575-81.
9. Florez, J.C., et al., A 100K genome-wide association scan for diabetes and related
traits in the Framingham Heart Study: replication and integration with other genome-
wide datasets. Diabetes, 2007. 56(12): p. 3063-74.
10. Hayes, M.G., et al., Identification of type 2 diabetes genes in Mexican Americans
through genome-wide association studies. Diabetes, 2007. 56(12): p. 3033-44.
11. Rampersaud, E., et al., Identification of novel candidate genes for type 2 diabetes from
a genome-wide association scan in the Old Order Amish: evidence for replication from
diabetes-related quantitative traits and from independent populations. Diabetes, 2007.
56(12): p. 3053-62.
12. Rung, J., et al., Genetic variant near IRS1 is associated with type 2 diabetes, insulin
resistance and hyperinsulinemia. Nat Genet, 2009. 41(10): p. 1110-5.
13. Saxena, R., et al., Genome-wide association analysis identifies loci for type 2 diabetes
and triglyceride levels. Science, 2007. 316(5829): p. 1331-6.
14. Steinthorsdottir, V., et al., A variant in CDKAL1 influences insulin response and risk of
type 2 diabetes. Nat Genet, 2007. 39(6): p. 770-5.
15. Takeuchi, F., et al., Confirmation of multiple risk Loci and genetic impacts by a
genome-wide association study of type 2 diabetes in the Japanese population.
Diabetes, 2009. 58(7): p. 1690-9.
16. Timpson, N.J., et al., Adiposity-related heterogeneity in patterns of type 2 diabetes
susceptibility observed in genome-wide association data. Diabetes, 2009. 58(2): p.
505-10.
17. Yasuda, K., et al., Variants in KCNQ1 are associated with susceptibility to type 2
diabetes mellitus. Nat Genet, 2008. 40(9): p. 1092-7.
18. Zeggini, E., et al., Meta-analysis of genome-wide association data and large-scale
replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet, 2008.
40(5): p. 638-45.
19. Nakajima, T., et al., TIP27: a novel repressor of the nuclear orphan receptor
TAK1/TR4. Nucleic Acids Res, 2004. 32(14): p. 4194-204.