
The EFR Project: a Collaborative Network to Establish an

Arabian Bio-bank Resource to Identify Disease Genes of

Indigenous Populations.

Habiba Sayeed Al-Safar BSc (Biochemistry)

MSc (Medical Engineering)

This thesis is presented for the degree of

Doctor of Philosophy

Centre for Forensic Science

2011


DEDICATION

To My special Uncle Hamza

Thank you for being there every step of the way

Thank you for guiding me when I went astray

Thank you for everything you did throughout my whole life


DECLARATION

This thesis is submitted to the University of Western Australia in fulfillment of the requirements for the Degree of Doctor of Philosophy.

This thesis has been composed by myself from results of my own work, except where stated

otherwise, and no part of it has been submitted for a degree at this, or at any other university.

Habiba Sayeed Al Safar


PREFACE

This thesis is presented as a series of eight chapters. The introductory chapter sets the basis for the work undertaken during the tenure of this study. The final commentary summarises the main features and findings of the work performed and establishes the next phase of work that is required. In between are six chapters presented as manuscripts, each in the format of the journal to which it has been submitted. Preceding each manuscript, a general synopsis is given and the specific contributions of each author are declared. Each chapter contains the basic components of an article, namely an abstract or summary; introduction; materials and methods; results with concluding remarks; acknowledgments; and a bibliography in the format of the journal to which the manuscript is submitted.


TABLE OF CONTENTS

DEDICATION
DECLARATION
PREFACE
TABLE OF CONTENTS
ACKNOWLEDGMENTS
ABSTRACT
LIST OF ABBREVIATIONS
DEFINITIONS

CHAPTER 1
LITERATURE REVIEW: AN OVERVIEW OF FACTORS THAT PREDISPOSE TO TYPE 2 DIABETES IN DIFFERENT POPULATIONS AND THE NEED OF GENOME STUDIES IN ETHNIC POPULATION OF THE MIDDLE EAST.
Epidemiology of Type 2 Diabetes
Types of Diabetes
Risk Factors
Symptoms
Screening
Treatment
Prevention
Genetic approach towards understanding Type 2 Diabetes
Positional candidate genes approach
Whole genome screen approach
Association studies
Linkage studies
Identifying genes contributing to Diabetes
Previous studies
Animal studies
Human studies
Genome wide scans of different populations
Asia
Chinese population
Japanese population
Indian population
North America
Pima Indian population
Amish population
African American population
Mexican American population
Europe
Dutch population
Ashkenazi Jews population
Finnish population
French population
Middle East
Historical Background of Arabs
Arab Migration:
Genetic Disorders in the Arab world:
Need and Scope of Medical Researches in the Arab World:
United Arab Emirates (UAE)
Conclusion

CHAPTER 2
THE PREVALENCE OF TYPE 2 DIABETES MELLITUS IN THE UNITED ARAB EMIRATES: JUSTIFICATION FOR THE ESTABLISHMENT OF THE EMIRATES FAMILY REGISTRY.
Abstract
Introduction
Results
Discussion
Conclusion
Acknowledgements
References

CHAPTER 3
HERITABILITY OF QUANTITATIVE TRAITS ASSOCIATED WITH TYPE 2 DIABETES IN AN EXTENDED FAMILY FROM THE UNITED ARAB EMIRATES.
Abstract
Introduction
Material and Methods
Results
Discussion
Acknowledgment
References

CHAPTER 4
EVALUATION OF DIFFERENT SOURCES OF DNA FOR USE IN GENOME WIDE STUDIES
Abstract
Introduction
Material and Methods
Results
Discussion
Acknowledgements
Conflict of Interest
References

CHAPTER 5
CHARACTERISATION OF MHC POLYMORPHIC ALU INSERTIONS (POALIN) IN A POPULATION OF ARAB BEDOUINS.
Abstract
Introduction
Materials and Methods
Results
Discussion
Acknowledgements
References

CHAPTER 6
A GENOME WIDE SEARCH FOR TYPE 2 DIABETES SUSCEPTIBILITY GENES IN ARAB FAMILIES.
Abstract
Introduction
Results
Discussion
Materials and Methods
Acknowledgment
Conflict of Interest
References

CHAPTER 7
A GENOME-WIDE ASSOCIATION STUDY EXAMINING OBESE FACTORS IN AN ARAB FAMILY WITH A HISTORY OF TYPE 2 DIABETES
Abstract
Introduction
Materials and Methods
Results
Discussion
Acknowledgment
References

CHAPTER 8
COMMENTARY AND FINAL REMARKS
Commentary and Final Remarks
References


ACKNOWLEDGMENTS

IN THE NAME OF GOD, THE MERCIFUL, THE CLEMENT!

PRAISE BE TO GOD, Lord of the Worlds, and prayer and peace upon the Lord of the

Prophets, Our Lord and Master Muhammad and upon his family and companions prayer and

peace perpetually required until the Day of Judgment.

The marvelous journey has come to an end. Over recent years I have come to realize that

doing a Ph.D. is the best job one can have, and that Australia is actually the best place for

doing it. I consider myself lucky that I have been given the opportunity to do my Ph.D. here, at

the University of Western Australia, both for professional and social reasons.

First and foremost, I would like to thank GOD, the merciful and the compassionate, for giving me wisdom and guidance throughout my life. I do believe that GOD sends his blessings to me in the form of people.

I am not able to name everyone individually or to thank them for everything that they did for me; however, I would like to take this opportunity to express a few words of thanks to my best colleagues, friends and family.

I am grateful to His Excellency, Lieutenant General Dhahi Khalfan Tamim, the Dubai Police

Commander-in-Chief for the scholarship, which enabled me to undertake this doctoral work at

the University of Western Australia. I am also thankful to Mr. Ahmed Al-Mansori, Head of
the Scholarship Section at Dubai Police, for his assistance.

This study would not have been possible without the general support of my two supervisors,

Dr. Guan Tay and Dr. Kamal Khazanehdari who opened up the real ‘world of worms’ to me!

They were generous in providing me this opportunity to receive my Ph.D., and very brave to

take me on as their student. They have always been eager to help me through the toughest

challenges during my time at the University of Western Australia and in Dubai.

I am grateful to the staff of the Molecular Biology and Genetics (MBG) Department at the Central Veterinary Research Laboratory (CVRL) for their help, friendship and useful discussions. I would like to thank and acknowledge every one of you. We shared good times, stories and humour, which will always stay in my memory.

I am also most grateful to the staff at the Telethon Institute for Child Health Research, who always made me feel very welcome and who gave all possible assistance with bioinformatics and biostatistics. In particular, I thank Dr. Sarra Jamieson, Richard Francis and Professor Jenefer Blackwell for all their efforts.

I acknowledge the valuable contributions of Dr Heather Cordell, who not only gave of her

time generously and imparted enormous detail, but also followed up with further advice or

sources of information.

I would like to acknowledge the sources of financial support for this research: CVRL, the Emirates Foundation and Dubai Police Headquarters. Without them, this study would not have been possible.

This thesis would not look the way it does without my best friend Jenan. I more than appreciate her help and support; on occasion, she has dried my tears.

I thank my faithful friends: Moza Alnahyan, Amal Alghanim, Laila Alsayegh and Ahlam Salmeen for their support, perspective, and encouragement.

I would like to thank my cousin Hind Alsafar, a graphic designer, for all her help and support in designing the posters that I presented at conferences and seminars.

Last but of course not least, this work would not have been achieved without the support and

understanding of my family. I would like to thank my mum, dad, sisters and brothers. My

hard-working parents have sacrificed their lives for my sisters, brothers and myself and

provided unconditional love and care. I love them so much, and I would not have made it this

far without them. I know I always have my family to count on when times are rough.

The work described in this thesis was performed with approval from the University of Western

Australia's Human Research Ethics Committee with reference # RA/4/1/4432.


ABSTRACT

This project was developed in 2006 with the aim of detecting loci or gene(s) that may influence susceptibility to Type 2 Diabetes (T2D) and related traits in individuals of Arab descent. This required a comparative study of patients and unaffected individuals. Samples were made available by consenting volunteers from the United Arab Emirates (UAE) population. Phenotypic data and genotyping results were systematically compiled in a bio-bank and data repository known as the “Emirates Family Registry” (EFR). When the project was initially conceived, data on DNA haplotypes in the tribes of the Middle East were limited. Coincidentally, significant advances in DNA technology, particularly in the field of DNA arrays, provided the opportunity to study this group of people. Over the past four years, the basic infrastructure that will allow longitudinal genetic studies has started to emerge. This study has specifically benefited from access to information on the Bedouin people, a predominantly desert-dwelling Arab ethnic group.

In the first instance, the study examined the evolutionary relationship between the Arab Bedouin and other ethnic groups. Polymorphic Alu insertions (POALINs) are genetic markers that are widely distributed through the human genome. These markers have been used in a range of applications, including anthropological analyses of human populations. In an effort to understand the evolutionary relationship of the Bedouin population in the context of other ethnic groups, the frequencies of individual insertions of four POALINs within the human Major Histocompatibility Complex (MHC) class I region, namely AluyMICB, AluyTF, AluyHJ and AluyHF, were compiled. The phylogenetic tree was constructed using MEGA version 4. The genotype frequencies of each of these POALINs in Bedouins were found to be nearly identical to those previously reported for Caucasians in an Australian study. For AluyHJ, the highest frequency for allele*1 was found in Malaysian Chinese, northeastern Thais, Japanese, and Mongolians (0.376 to 0.292). In contrast, the frequency in Bedouins (0.242) was similar to that previously reported for Australian Caucasians (0.273), each representing the second highest allele frequency. The African subpopulations showed a lower frequency of this allele (0.107 to 0.050). Phylogenetic analysis of the relative allele frequencies of AluyHJ in combination with the remaining three POALIN markers revealed that Bedouins have a similar lineage to Caucasians, at least for the MHC region studied. The structure of the phylogenetic tree supports the popular contention that humans originated in Africa. The nature of the clusters suggests that the Middle East represents a crossroads from which human populations migrated toward Asia in the east and Europe to the northwest.
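
As an illustration of how an allele frequency such as those quoted above can be obtained for a biallelic insertion marker, the following Python sketch counts insertion alleles from genotype counts. The function name and the genotype counts are hypothetical and are not taken from the study data; this is a minimal sketch, not the procedure used in the thesis.

```python
def insertion_allele_frequency(n_ins_ins, n_ins_abs, n_abs_abs):
    """Frequency of the insertion allele (allele*1) at a biallelic Alu locus.

    Homozygous-insertion individuals carry two copies of the insertion,
    heterozygotes carry one, and individuals homozygous for absence carry none.
    """
    n_individuals = n_ins_ins + n_ins_abs + n_abs_abs
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    # Each individual contributes two alleles to the denominator.
    return (2 * n_ins_ins + n_ins_abs) / (2 * n_individuals)


# Hypothetical genotype counts, for illustration only.
freq = insertion_allele_frequency(n_ins_ins=6, n_ins_abs=36, n_abs_abs=58)
print(round(freq, 3))
```

Frequencies computed in this way for each population are the kind of input from which a phylogenetic tree can then be built.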

The characteristics of the Arab population make it ideal for the study of complex, polygenic, multifactorial disorders such as T2D. In the UAE alone, it has been estimated that one out of five people between the ages of 20 and 79 lives with this disease. Due to the increasing prevalence of T2D in the region, lifestyle management strategies with an emphasis on prevention are required. An appreciation of the genetic risk factors can also make an important contribution to understanding the processes leading to the disease.

Major hospitals and diabetes centres in the UAE were contacted to establish a bio-banking facility referred to as the EFR (an abbreviation for the “Emirates Family Registry”). Through assistance made available by the Ministry of Health and collaborators of this network, demographic data of T2D patients were collected and collated in a database for analysis and future longitudinal studies. Clinical specimens for Genome Wide Association Studies (GWAS) and biochemical profiling (such as glucose, lipids and HbA1c levels) were also collected from volunteers who consented to be part of the study.

In the field of epidemiology, GWAS are commonly used to identify genetic predispositions to many human diseases. Large repositories housing biological specimens for clinical and genetic investigations have been established to store material and data for these studies. The logistics of specimen collection and sample storage can be onerous, and new strategies have to be explored. This study established the utility of FTA™ cards as a viable storage matrix for cells from which DNA can be extracted to perform GWAS analyses. Specifically, three different DNA sources (namely, degraded genomic DNA, amplified degraded genomic DNA and amplified DNA extracted from FTA™ cards) were examined for GWAS using the Illumina platform. No significant difference in call rate was detected between amplified degraded genomic DNA extracted from whole blood, the gold standard for GWAS, and amplified DNA retrieved from FTA™ cards. However, using unamplified degraded genomic DNA reduced the call rate to a mean of 42.6%, compared with a mean of 96.6% for amplified DNA extracted from FTA™ cards. It is therefore possible to use biological samples stored on FTA™ cards as a source of DNA for GWAS, provided that a pre-amplification step is incorporated into the process.
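
The call rates compared above are, per sample, the percentage of SNPs on the array for which a genotype could be assigned. The Python sketch below shows that calculation and the averaging across samples. The function names, the "--" no-call code and the example genotypes are assumptions made for illustration and do not reproduce the study data or any Illumina software output.

```python
def call_rate(genotype_calls, missing="--"):
    """Percentage of SNPs in one sample that received a genotype call."""
    if not genotype_calls:
        raise ValueError("no SNPs supplied")
    called = sum(1 for g in genotype_calls if g != missing)
    return 100.0 * called / len(genotype_calls)


def mean_call_rate(samples, missing="--"):
    """Mean per-sample call rate (%) across a group of samples."""
    rates = [call_rate(calls, missing) for calls in samples]
    return sum(rates) / len(rates)


# Hypothetical genotype calls for two samples per DNA source.
amplified_fta = [["AA", "AB", "BB", "AB"], ["AA", "AA", "--", "BB"]]
unamplified_degraded = [["AA", "--", "--", "AB"], ["--", "--", "BB", "--"]]

print(mean_call_rate(amplified_fta))         # 87.5
print(mean_call_rate(unamplified_degraded))  # 37.5
```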


In the first 24 months of operation, the EFR recruited 23,064 adult volunteers from three major hospitals and nine primary care centres throughout the UAE. Within this cohort, 88% were classified as T2D patients from their medical records. The cohort was divided into age categories, with 59% of T2D patients aged between 40 and 59 years. UAE nationals comprised 30% of the database, of which 21% were diagnosed with T2D. However, the percentage of adults with T2D was higher in other ethnic groups, affecting almost 33% of the Indians who live in the UAE. A total of 741 UAE nationals consented to donate blood for biochemical testing in Phase I of the study; following the completion of testing, 23% were diagnosed with T2D, 30% with pre-T2D and 47% were healthy.

This study subsequently assessed the value of specific clinical markers for T2D among five generations of an extended Arab family. This family included 319 members of 41 nuclear families, from which 178 individuals (86 males, 92 females; 66 diabetic, 112 healthy) formed the study sample set. The heritability of eight quantitative traits (fasting glucose, glycated hemoglobin (HbA1c), cholesterol, triglyceride, urea and creatinine) was determined. Once the data in the disease and control groups were stratified, a significant relationship was found between T2D status and waist circumference (WC) (p = 2.6 × 10⁻⁹) and Body Mass Index (BMI) (p = 1.0 × 10⁻⁶). The estimated power for these two traits was 80% and 90%, respectively. Creatinine (p = 0.002) and cholesterol (p = 0.02) levels were also associated with T2D. Not surprisingly, the results support the link between environmental and genetic factors in the pathophysiology of T2D and its related phenotypes in an Arab population. To dissect the mechanisms that cause disease, genetic studies followed.

Firstly, a Family Based Association Test (FBAT) was performed in the same family using the Illumina Human 660 Quad chip array to better understand the gene(s) that play a role in the pathways that cause T2D. The study revealed 21 new association signals from single nucleotide polymorphisms (SNPs) within five genes (RBM47, KCTD8, GABRB1, SCD5 and PRKD1). Six SNPs within the PRKD1 (Protein Kinase D1) gene on chromosome 14 were found to be most strongly associated with T2D in this Arab population. It has been suggested that PRKD1, a serine/threonine kinase, plays an important role in insulin secretion. The strongest statistical evidence for a new association signal was from rs7154546 in intron 1 of PRKD1, with the overall estimate of effect returning an odds ratio (OR) of 3.72 (95% confidence interval, 1.28 to 10.82; p = 8.46 × 10⁻⁶) using an additive model.


As mentioned, WC and BMI are phenotypes that have strong heritability values. Since overweight and obesity are major risk factors for a number of chronic diseases, including T2D, a search to identify common genetic variants that may influence obesity and its association with T2D was undertaken. Specifically, a GWAS was conducted in an extended family with 178 individuals of Arab descent using WC and BMI as indicators. This study revealed three loci that reached genome-wide significance. One locus previously described in a meta-analysis of Caucasian GWAS was associated with WC; it lies on chromosome 16 within the FBXO31 gene region (rs9308437, p = 7.5 × 10⁻⁷). Another novel association was found for the rs2793823 SNP (p = 1.86 × 10⁻⁸) in ADAM30, a gene that has previously been shown to be associated with T2D. One novel SNP (rs7120774) in GALNTL4 was also shown to be associated with BMI (p = 1.82 × 10⁻¹⁰). The positive associations between SNPs from the JAZF1 locus and BMI, WC and T2D were also confirmed. Further work is required to replicate and validate these preliminary results in other sample sets.

This study is the first GWAS undertaken in T2D candidates in families of Arab descent. These findings may provide important insights into the pathogenesis of T2D in Middle Eastern populations. Comparative analysis with sequences from other ethnic groups could assist in dissecting the mechanisms that cause the disease. These efforts will continue to be important with the increasing affluence of Arab communities, as greater personal wealth is linked to greater indulgences. It is important to develop an understanding of the relationship between ethnic-specific allelic and haplotypic patterns that lead to disease, in an effort to control the spread of the disease and manage its consequences.

In conclusion, comparative genomics has been widely used in medical science to identify genetic factors that cause disease. Ethnic differences have also been helpful in this respect. The genetic basis of several disorders that are characteristic of specific ethnic groups (e.g. hemochromatosis in Caucasians, thalassemia in ethnic groups of the Mediterranean) has been identified through comparison of the genomes of different races. Other fields, including DNA and race profiling in forensic science, will benefit from an appreciation of ethnic-specific differences.

The null hypothesis of this study was that the alleles and genes in the Arabic population that predispose patients to Type 2 Diabetes are the same as those described for other populations previously studied. The genetic factors of interest were studied using GWAS (Genome Wide Association Study) technology in the context of lifestyle factors that are known to affect patients living with diabetes. If this hypothesis is rejected, then the alternative is that novel genetic factors unique to the Arabic population contribute to the pathophysiology of the disease. Regardless of the findings, the data gleaned from this study will result in the characterization and definition of Arabic haplotypes that are associated with the disease. The genetic characteristics will have other applications, including anthropological and evolutionary analysis as well as forensic profiling.


LIST OF ABBREVIATIONS

AITD Autoimmune Thyroid Disease

BMI Body Mass Index

CVRL Central Veterinary Research Laboratory

dgDNA Degraded Genomic DNA

DIO Diet-Induced Obese

DNA Deoxyribonucleic Acid

EFR Emirates Family Registry

FBAT Family Based Association Test

GWAS Genome Wide Association Studies

IPA Ingenuity Pathway Analysis

LD Linkage Disequilibrium

LOD Logarithm of the Odds

MHC Major Histocompatibility Complex

OGTT Oral Glucose Tolerance Test

OR Odds Ratio

p-value Probability Value

PCA Principal Component Analysis

PCR Polymerase Chain Reaction

POALINS Polymorphic Alu insertions

QC Quality Control

QTDT Quantitative Trait Transmission Disequilibrium Test

SNP Single Nucleotide Polymorphism

T1D Type 1 Diabetes

T2D Type 2 Diabetes

UAE United Arab Emirates

UWA The University of Western Australia

VMH Ventromedial Hypothalamus

WA Western Australia

WC Waist Circumference

WGA Whole Genome Amplification

WHO World Health Organization

DEFINITIONS

Allele An alternative form of a gene that is located at a specific position

on a specific chromosome.

Candidate gene A gene believed to influence expression of complex phenotypes

due to known biological and/or physiological properties of its

products, or to its location near a region of association or

linkage.

Genome The entire complement of genetic material in a chromosome set.

Genotyping call rate Proportion of samples or SNPs for which a specific allele

can be reliably identified by a genotyping method.

Haplotype A group of specific alleles at neighboring genes or markers that

tend to be inherited together.

HapMap Project Genome-wide database of patterns of common human genetic

sequence variation among multiple ancestral population samples.

Hardy Weinberg Equilibrium Population distribution of 2 alleles (with frequencies p and q)

such that the distribution is stable from generation to generation

and genotypes occur at frequencies of p², 2pq, and q² for the

major allele homozygote, heterozygote, and minor allele

homozygote, respectively, under the assumption that natural

selection does not act on the alleles under consideration.

Heritability The proportion of variation in a phenotype (trait, characteristic

or physical feature) that is thought to be caused by genetic

variation among individuals. The remaining variation is usually

attributed to environmental factors. Studies of heritability

typically estimate the proportional contribution of genetic and

environmental factors to a particular trait or feature.

Linkage disequilibrium Association between 2 alleles located near each other on a

chromosome, such that they are inherited together more

frequently than expected by chance.

Linkage Equilibrium Occurs when the genotype present at one locus is independent of

the genotype at a second locus.

Minor allele frequency Proportion of the less common of 2 alleles in a population (with

2 alleles carried by each person at each autosomal locus) ranging

from less than 1% to less than 50%.

Phenotypes The total characteristics displayed by an organism under a

particular set of environmental factors, regardless of the actual

genotype of the organism.

Polymerase Chain Reaction A method for amplifying segments of DNA, by generating

multiple copies using DNA polymerase enzymes under

controlled conditions. As little as a single copy of the DNA

segment or gene can be amplified into millions of copies, allowing

detection using dyes and other visualization techniques.

Population stratification A form of confounding in genetic association studies caused by

genetic differences between cases and controls unrelated to

disease but due to sampling them from populations of different

ancestries.

Power A statistical term for the probability of identifying a difference

between 2 groups in a study when a difference truly exists.

Single Nucleotide Polymorphism DNA sequence variations that occur when a single nucleotide

(A, T, C, or G) in the genome sequence is altered. Each

individual has many single nucleotide polymorphisms that

together create a unique DNA pattern for that person. SNPs

promise to significantly advance our ability to understand and

treat human disease.

Whole Genome Association Study An examination of genetic variation across a given genome,

designed to identify genetic associations with observable traits.
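
Several of the definitions above (minor allele frequency, Hardy Weinberg Equilibrium and linkage disequilibrium) correspond to simple calculations. The Python sketch below is illustrative only and is not the analysis software used in this thesis. The function names, the genotype counts and the haplotype frequencies are all hypothetical and were chosen for the example.

```python
def allele_frequencies(n_major_hom, n_het, n_minor_hom):
    """Return (p, q), the frequencies of the two alleles, from genotype counts."""
    n_individuals = n_major_hom + n_het + n_minor_hom
    if n_individuals <= 0:
        raise ValueError("at least one genotyped individual is required")
    # Each person carries 2 alleles at an autosomal locus.
    p = (2 * n_major_hom + n_het) / (2 * n_individuals)
    return p, 1.0 - p


def minor_allele_frequency(n_major_hom, n_het, n_minor_hom):
    """Frequency of the less common of the 2 alleles (never more than 0.5)."""
    return min(allele_frequencies(n_major_hom, n_het, n_minor_hom))


def hardy_weinberg_expected(p):
    """Expected genotype frequencies (p^2, 2pq, q^2) for allele frequency p."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at the first
    locus and allele B at the second; p_a and p_b are the allele frequencies.
    D = 0 corresponds to linkage equilibrium.
    """
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator == 0.0:
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    return d, d * d / denominator


# Hypothetical genotype counts for 1000 individuals at one SNP.
p, q = allele_frequencies(360, 480, 160)
maf = minor_allele_frequency(360, 480, 160)
expected = hardy_weinberg_expected(p)

# Hypothetical haplotype and allele frequencies for two nearby SNPs.
d, r_squared = linkage_disequilibrium(p_ab=0.50, p_a=0.60, p_b=0.70)
```

In this made-up example the observed genotype proportions (0.36, 0.48, 0.16) match the Hardy Weinberg expectations for p = 0.6, and the positive value of D indicates that the two alleles occur together on the same haplotype more often than expected by chance.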

CHAPTER 1

LITERATURE REVIEW: AN OVERVIEW OF

FACTORS THAT PREDISPOSE TO TYPE 2 DIABETES

IN DIFFERENT POPULATIONS AND THE NEED OF

GENOME STUDIES IN ETHNIC POPULATION OF

THE MIDDLE EAST.

This chapter is a prelude to a study that aims to develop an understanding of the environmental

and genetic predispositions that give rise to the collection of etiologies resulting in Type 2

Diabetes in indigenous populations of the Middle East. Although the focus of the study is on

the Bedouins, the Arab people who have roamed the deserts of the Middle East for centuries,

the work is the beginning of a research effort to understand diseases that commonly

affect the many tribes of Arabs. The processes and methods developed towards understanding

the factors that cause Type 2 Diabetes will be expanded beyond this initial effort to support

the search for the causes of other debilitating diseases.

Therefore this chapter will outline the definition of diabetes, the lifestyle and genetic risk

factors of the disease and its potential health consequences. It will also discuss preventive

measures. This review will also touch on the implications of genetic research, with specific

emphasis on the findings of genome wide screening of T2D patients among different

populations, and ultimately discuss the necessity of genetic and genomic research to study the

disease among the indigenous Arab populations.

God has honored the human and excelled in his creation, creating him in the

best stature. He says in the Qur’an, “Surely, we created man of the best stature”. The human

body is one of the most complex biological systems on earth compared to other living

creatures. It is composed of trillions of cells, which contain the body’s hereditary material in the

form of deoxyribonucleic acid (DNA). A person's DNA represents a "genetic

blueprint" that is unique to each individual. DNA consists of two long strands, arranged as a double

helix, composed of units called nucleotides; a gene, a defined sequence of nucleotides, is the

basic physical and functional unit of heredity. Every person inherits two sets of

chromosomes, one from each biological parent. Because no two human individuals (with the exception

of identical twins) share exactly the same genetic profile, DNA testing is the definitive

means to confirm any biological relationship in doubt.

DNA can provide insights into many intimate aspects of people and their families including

susceptibility to particular diseases, legitimacy of birth, identification of criminals, perhaps

predispositions to certain behaviors, and ancestry.

In this study, we propose to measure genetic ancestry in the Arab population of the United Arab

Emirates (UAE) using genome-wide Single Nucleotide Polymorphism (SNP) arrays. The

identification of polymorphisms whose frequencies are characteristic of this population will

provide an opportunity to enhance DNA profiling. Ethnic-specific polymorphisms can be used to profile

biological evidence left at the crime scene to provide information that could be useful in an

investigation. The study of DNA from the local ethnic groups provides a double benefit. Apart

from the development of new opportunities in forensic science, the markers will allow the

study of specific diseases that are common to populations of this region such as Type 2

Diabetes (T2D). Because the frequency of genetic variants can differ across populations, we

aim to detect genes influencing susceptibility to T2D in the UAE population.

Epidemiology of Type 2 Diabetes

Type 2 Diabetes is a group of metabolic diseases characterised by hyperglycemia resulting

from defects in insulin secretion, the actions of insulin, or both [1]. Diabetes is currently one

of the most prevalent chronic diseases; it affects the lives of millions of people worldwide

and causes considerable morbidity.

According to the International Diabetes Federation, the number of people diagnosed with

diabetes has risen from 30 million people to more than 246 million people in only the past

twenty years [2] (Figure 1). This illness is well documented in the United States. The total

annual economic cost of diabetes in 2002 was estimated to be $132

billion, or one out of every 10 health care dollars spent in the United States [3]. Further, the

report indicates that seven of the ten countries with the highest number of diabetics are in the

developing world rather than where the medicines and treatments might be readily available.

In the Middle East, the percentage of the diabetic population ranges from 12 to 20 percent and

these numbers increase every year along with the rising costs associated with health care

provisions. In 2007, the UAE ranked second highest in terms of diabetes prevalence,

and it is estimated that one out of five people aged 20 to 79 lives with this disease, while a

similar percentage of the population is at risk of developing the disease [4].

The purpose of this review is to outline what is known about diabetes; the lifestyle and genetic

risk factors of the disease and its potential health consequences. It will also touch on

preventive measures as well as management strategies to care for those afflicted with the

disease. This review will also discuss the implications of genetic research, with specific

emphasis on the findings of genome wide screening of T2D patients among different

populations and ultimately discuss the necessity of genetic and genomic research to study the

disease among the indigenous Arab populations.

Figure 1: The prevalence of Type 2 Diabetes is predicted to rise in all continents according to Wild et al (2004) [5]. The global average of

20% is predicted to rise to 52.8% based on modeling studies by Parves et al (2007) [2]. More significantly, the prevalence will

increase by over 100% in population groups throughout Asia, Africa and the Middle East, with the latter recording the highest rise

of 164%.

Simply, diabetes is a disorder of sugar metabolism, which leads to inefficient use of sugar

resources in the body, leading to their accumulation. This in turn causes a range of

pathological consequences in the body, and the patient lives in a compromised state. A

primary factor in diabetes is the level of insulin present in the body. Insulin is the hormone that

the body produces naturally to manage the levels of glucose in the system. When the body

produces too little insulin, or responds poorly to it, excess glucose remains in the bloodstream,

thereby causing the symptoms of the disease. Glucose, a simple sugar, enters the body by way

of ingested food and is carried to the cells via the bloodstream; the cells then break down

the glucose to supply energy throughout the body. Brain cells in particular rely

on glucose as their main fuel. In diabetics, the body is not able to regulate the levels of

glucose and maintain a stable amount in the cells. This means the body has more than the

necessary glucose levels immediately after a meal but too little otherwise. To maintain a

constant blood-glucose level, the healthy body produces glucagon and insulin, two hormones

originating from the pancreas. Typically, there is a balance of these hormones in the

bloodstream with the insulin acting to prevent the concentration of blood glucose from

increasing disproportionately.

The chronic hyperglycemia of diabetes is associated with long-term damage, dysfunction, and

failure of various organs. Several pathogenic processes are involved in the development of

diabetes. These range from autoimmune destruction of the β-cells of the pancreas with

consequent insulin deficiency to abnormalities that result in resistance to insulin action. Long-

term complications of diabetes include retinopathy with potential loss of vision; nephropathy

leading to kidney failure; peripheral neuropathy with risk of foot ulcers, amputations, and

Charcot joints; and autonomic neuropathy causing gastrointestinal, cardiovascular and

genitourinary symptoms which can include sexual dysfunction [6]. “These life-threatening

consequences strike people with diabetes more than twice as often as they do others” [7].

Patients with diabetes have an increased incidence of atherosclerotic cardiovascular, peripheral

arterial, and cerebrovascular disease. Hypertension and abnormalities of lipoprotein

metabolism are also often found in people with diabetes (Figure 2).

Figure 2: Major complications of diabetes include retinopathy, nephropathy, and

peripheral neuropathy with risk of foot ulcers, Charcot joints; and

cardiovascular disease.

Types of Diabetes

The complexity of the disease has led to the description of many variants of this condition;

however, the most widely used classification divides it into two broad etiopathogenetic categories:

Type 1 Diabetes (T1D) and Type 2 Diabetes (T2D). Put simply, the mode of insulin

deficiency differs between the two, which is why the treatment methodology also varies

considerably. T1D, or Juvenile Diabetes, which occurs primarily in children, is caused by an

absolute deficiency of insulin secretion. This type of the disease afflicts less than 10 percent of

all diabetics. T2D, which is also referred to as ‘non-insulin-dependent’ or ‘adult-onset’ diabetes,

is caused by a combination of resistance to insulin action and an inadequate compensatory

insulin secretory response. More than 90 percent of diabetics suffer from this disease, which

normally afflicts those over 40 years of age.

In some ways, T1D can be considered the lesser of the two evils, since proper dosing of insulin

at regular intervals enables the person to lead an active and healthy life. Compliance with

treatment is high as the patients and their families are acutely aware of the role of therapy.

T2D, however, is mainly the result of environmental influences, such as a sedentary mode of

living and imbalanced or improper eating habits, compounded by an underlying

genetic background. In most cases, the sufferer is obese, which results in the inability of the

body to take up the excess load of sugar. In this form of the disease, insulin production is not

absent, as it is in T1D [8].

Initial epidemiological data give a very simplified picture of the age groups in which one or

the other type of diabetes prevails. For example, T1D generally occurs in younger

patients, who may not be obese and who present with symptoms such as ketoacidosis.

T2D, however, has been reported to be more prevalent in the older age groups, where a high Body

Mass Index (BMI) is the primary factor that leads to T2D [8]. However, with the spread of a

sedentary mode of living to the younger age groups and children, and with better health care

delivery in the older age groups, the boundaries between the two types of diabetes are very

likely to overlap. Therefore, contemporary definitions are beginning to lose their

specificity with regard to the type of diabetes and its occurrence and prevalence in

various age groups. What is more important is to know that diabetes and its various forms

must be considered in all their aspects in patients regardless of their age [8], and perhaps a new

set of biomarkers should be considered with advances in the post-genomic era.

Gestational diabetes is similar to T2D and can arise in all categories of women who are

pregnant. Studies have confirmed that women with a history of gestational diabetes

have about a 40 percent chance of developing diabetes in the future. “Other specific types of

diabetes, which may account for one to two percent of all diagnosed cases, result from specific

genetic syndromes, surgery, drugs, malnutrition, infections, and other illnesses” [9]. Women

with gestational diabetes experience an impaired tolerance to glucose and have somewhat

elevated insulin levels. During pregnancy, the effects of insulin are blocked by various

hormones, which act to desensitize the patient to the insulin her body produces. This form of

diabetes can be effectively treated with supplementary insulin injections and

specialised diets. Normally, the symptoms of gestational diabetes do not continue in the

woman following the birth of the baby.

The classification of diabetes has been of significant value in the progress of research

related to diabetes. The different pathological findings and clinical presentations in each

variation led to much confusion regarding the pathology of each type and what genes

contribute in each case [8, 10].

Risk Factors

Currently over 170 million people globally suffer from T2D. Most of these patients are middle

aged; however, variations in this regard are not rare and are affected by factors such as

lifestyle and heredity, as well as behavioral factors [11].

There are several factors that influence T1D such as, the immune system, the environment and

genetics whereas the risk factors for T2D are more clearly defined. These include obesity,

physical inactivity, older age, family history of diabetes, a past history of gestational

diabetes and impaired glucose tolerance. Ethnicity is another risk factor. For

example “African Americans, Hispanic/Latino Americans, American Indians, and some Asian

Americans and Pacific Islanders are at particularly high risk for T2D” [7].

What is interesting to note is the role of urbanisation and changes in lifestyle in the

spread and prevalence of the disease. Populations such as the Mapuche Indians and the

Chinese living in rural areas of the mainland have a very low percentage of diabetics

amongst them. This points clearly to the role of physical and environmental factors that are

also contributory to the development of the disease [12]. In contrast, some of the highest numbers

have been seen among the Pima Indians in Arizona and the Nauruans, which points towards the

role of genetics in the development of the condition [12]. This means that diabetes is a

condition that is very much affected by both environmental and genetic factors, and both can

come into play in varying degrees in the pathology among various populations.

Although about 33 percent of people with the illness are unaware of their condition, nearly

three million or almost 12 percent of the African American population over 20 years of age

suffer with symptoms of diabetes. Because of this, African Americans have been identified as

being at greater risk than those of Anglo descent to suffer macro-vascular problems such as

strokes and heart disease. “African Americans are 1.6 times more likely to have diabetes than

non-Latino whites. 25 percent of African Americans between the ages of 65 and 74 have

diabetes. One in four African American women over 55 years of age has diabetes” [7]. The

disproportionate gap that exists between the African American population and others

regarding diabetes continues to widen. “National health surveys during the past 35 years show

that the percentage of the African American population that has been diagnosed with diabetes

is increasing dramatically” [13]. In a thorough investigative study conducted from 1976 to

1980, the total prevalence of diabetes was less than nine percent in African Americans aged 40

to 75. Another similar study conducted between 1988 and 1994 showed that this number had

increased two-fold to more than 18 percent while in the white community the rate rose only

slightly to just over ten percent.

African Americans, Hispanic/Latino Americans, American Indians and those with a family

history of diabetes also have a greater chance of developing gestational diabetes than do

those of other groups. In addition, women who have had this form of

diabetes find themselves at a higher risk for developing T2D later in life.

The prevalence of diabetes varies widely between populations. It is reported to be approximately

5% in Asian populations, whereas almost 50% of the Pima Indian population suffers from

diabetes [14].

What is understood is that there are both monogenic and polygenic forms of the

condition, which can occur in a wide range of variants. While the simple classification into

T1D and T2D is helpful in research, it is still not able to identify intermediate cases,

and therefore, a more extensive time period of continuous research is required to understand

the true nature of this disease [15].

Symptoms

Diabetics display numerous symptoms including “excessive thirst (polydipsia), frequent

urination (polyuria), extreme hunger or constant eating (polyphagia), unexplained weight loss,

presence of glucose in the urine (glycosuria), tiredness or fatigue, changes in vision, numbness

or tingling in the extremities (hands, feet), slow-healing wounds or sores and abnormally high

frequency of infection” [16]. These various symptoms are common to both forms of diabetes.

However, patients do not necessarily present with all of the signs mentioned above.

Screening

The method of detection of diabetes is mainly through blood glucose analysis at various time

points, which is then compared with the normal levels. Ideally, in the fasting state, the blood

sugar level must be no more than 126 milligrams per deciliter, or 7 millimoles per liter. In the

random state, this level must be no more than 200 milligrams per deciliter, confirmed via two

separate sets of readings. Any level higher than these is considered a case

of diabetes, and proper administration of medication and lifestyle modification techniques are

advised [8].
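
The two thresholds quoted above, and the equivalence between 126 milligrams per deciliter and about 7 millimoles per liter, can be checked with a short calculation. The Python sketch below is illustrative only and is not clinical guidance. The function names are hypothetical, the conversion factor is derived from the molar mass of glucose (about 180.16 g/mol), and the rule of requiring at least two readings in both the fasting and the random state is an assumption made for the example.

```python
# Approximate conversion factor for glucose: 1 mmol/L is about 18.016 mg/dL
# (molar mass of glucose, about 180.16 g/mol, divided by 10).
MG_DL_PER_MMOL_L = 18.016

# Thresholds quoted in the text above, in milligrams per deciliter.
FASTING_LIMIT_MG_DL = 126.0
RANDOM_LIMIT_MG_DL = 200.0


def mg_dl_to_mmol_l(mg_dl):
    """Convert a blood glucose value from mg/dL to mmol/L."""
    return mg_dl / MG_DL_PER_MMOL_L


def exceeds_screening_limit(readings_mg_dl, fasting):
    """Return True if every reading is above the relevant threshold.

    Assumption for this sketch: at least two separate readings are required,
    and all of them must be above the limit, in both states.
    """
    limit = FASTING_LIMIT_MG_DL if fasting else RANDOM_LIMIT_MG_DL
    return len(readings_mg_dl) >= 2 and all(r > limit for r in readings_mg_dl)


fasting_limit_mmol_l = mg_dl_to_mmol_l(FASTING_LIMIT_MG_DL)
random_limit_mmol_l = mg_dl_to_mmol_l(RANDOM_LIMIT_MG_DL)
```

With this factor, the fasting threshold converts to just under 7 millimoles per liter and the random threshold to about 11.1 millimoles per liter.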

A Hemoglobin A1C (HbA1c) test measures the proportion of hemoglobin in red blood cells that has

glucose attached to it. The diabetic who has not received treatment may show levels as high as 10

percent while a person not afflicted with the disease tests at close to five percent. As previously

discussed, the lack of insulin production allows higher levels of glucose in the blood. High levels

of glucose (or sugar) in the bloodstream lead to various diabetes-related health complications if

allowed to go unchecked [17].

Treatment

While there is no known cure for the disease, diabetes can be effectively managed with proper

specific lifestyle regimes. “The key to treating diabetes is to closely monitor and manage your

blood-glucose levels through exercise, diet and medications” [16]. The type of diabetes

dictates the type of treatment to be followed. Patients with T1D must check their blood-glucose

levels many times per day and inject insulin accordingly, usually at mealtime so as to help manage

the glucose being ingested. Supplementing insulin ensures that blood glucose levels

remain stable. Patients with T2D are able to control the disease through personal lifestyle

decisions such as losing weight, exercising more and not smoking at all. In severe

instances, medication may need to be given to control glucose levels. Diabetics are able to

significantly decrease the risks of complications due to the disease if they are willing to

educate themselves then apply that knowledge to their daily lives.

Optimistically, modern medicine can bring the treatment of many diseases up to date. One of

the most important goals of contemporary biomedical research is to tailor medical care to an

individual's needs, based on information from the individual's genotype or gene expression

profile, so-called personalised medicine. These principles can offer huge advances in medical

care but can only succeed if the genetic variation of humans can be accurately mapped.

The advent of a new generation of experimental techniques has now given biomedical

researchers the opportunity to map the complete genetic variation of large numbers of humans

via full genome sequencing. The data produced from such efforts will provide an unparalleled

amount of information that can be used to stratify the human race, and help tailor medical care

that targets the specific needs of different populations and individuals. The technology to test

massive volumes is continuously evolving and the computing capability to manage datasets is

also keeping pace with the exponential increase in sequencing capability. Personalised

medicine is thus on the brink of a major breakthrough.

A T1D patient’s diet should include about 35 calories per kg of body weight per day. T2D

patients are commonly restricted to a diet of approximately 1500 to 1800 calories per day. These

regimes are to control the onset of obesity and to maintain an ideal body mass. These

numbers, of course, vary somewhat depending on the patient’s gender and age along with their

current weight and body type and their level of physical activity. Those diabetics who are

overweight when they begin the nutritional program may require more initial calories until

their weight drops to a more normal level. The reasoning is that too rapid a weight loss can be

unhealthy and it takes additional calorie intake to sustain a larger body frame. Gender also

plays a role in a proper program as males generally possess a greater muscle mass than

females and consequently may require a higher intake of calories. Because muscle uses up

more calories per hour than does fat, people who are not physically active will have less need

for calorie intake, a good reason for everyone, and especially those with diabetes, to exercise

regularly and build-up muscle mass. In other words, if you like to eat, supplement it with

proportional amounts of exercise. There are different theories regarding the most effective

diet but the fact that diet is very important in controlling the symptoms of diabetes is

indisputable (American Diabetes Association, 2006). A diabetic’s daily calorie intake,

generally speaking, should consist of 40 to 60 percent carbohydrates because the lower the

carbohydrate intake, the lower the level of sugar that enters the bloodstream. The advantages

associated with carbohydrate intake are negated by the patient’s intake of foods that are high

in fat. This dilemma can be circumvented by the substitution of polyunsaturated and

monounsaturated fats for saturated fats. “Most people with diabetes find that it is quite helpful

to sit down with a dietician or nutritionist for a consultation about what is the best diet for

them and how many daily calories they need. It is quite important for diabetics to understand

the principles of carbohydrate counting and how to help control blood sugar levels through

proper diet” [18].
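
The dietary figures quoted above (about 35 calories per kg of body weight for T1D, 1500 to 1800 calories per day for T2D, and 40 to 60 percent of calories from carbohydrates) can be turned into a worked example. The Python sketch below is illustrative only and is not dietary advice. The function names and the 70 kg body weight are hypothetical, "calories" is taken to mean kilocalories, and the energy value of 4 kilocalories per gram of carbohydrate is a standard nutritional factor that is not stated in the text.

```python
# Standard energy value of carbohydrate (assumption, not from the text).
KCAL_PER_GRAM_CARBOHYDRATE = 4.0


def t1d_daily_kcal(weight_kg, kcal_per_kg=35.0):
    """Approximate daily energy intake for a T1D patient of the given weight."""
    return weight_kg * kcal_per_kg


def carbohydrate_range_grams(daily_kcal, low_fraction=0.40, high_fraction=0.60):
    """Grams of carbohydrate supplying 40 to 60 percent of the daily energy."""
    low = daily_kcal * low_fraction / KCAL_PER_GRAM_CARBOHYDRATE
    high = daily_kcal * high_fraction / KCAL_PER_GRAM_CARBOHYDRATE
    return low, high


# Hypothetical 70 kg T1D patient.
t1d_kcal = t1d_daily_kcal(70.0)
t1d_carbs = carbohydrate_range_grams(t1d_kcal)

# Lower and upper ends of the T2D range quoted in the text.
t2d_carbs_low_diet = carbohydrate_range_grams(1500.0)
t2d_carbs_high_diet = carbohydrate_range_grams(1800.0)
```

As the text notes, these numbers vary with gender, age, current weight, body type and level of physical activity, so the calculation gives only a starting point for a consultation with a dietician.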

Prevention

According to the Florida Department of Health, the proper management of glucose in the

bloodstream benefits people with both types of diabetes. “For every one point reduction in

HbA1C, the risk for developing micro-vascular complications (eye, kidney and nerve disease)

decreases by up to 40 percent. Blood pressure control can reduce cardiovascular disease

(heart disease and stroke) by 33 to 50 percent and can reduce micro-vascular disease (eye,

kidney and nerve disease) by approximately 33 percent. Improved control of cholesterol and

lipids (e.g. HDL, LDL, and triglycerides) can reduce cardiovascular complications by 20 to 50

percent. Detection and treatment of diabetic eye disease with laser therapy can reduce the

development of severe vision loss by an estimated 50 to 60 percent. Comprehensive foot care

programs can reduce amputation rates by 45 to 85 percent.” [19]. Proper weight control,

increased activity and not smoking should also coincide with regular visits to the doctor in

order to better regulate blood pressure, glucose and cholesterol levels. The patient would be

best served if they form a team-like relationship with their health care professionals. “Because

people with diabetes have a multi-system chronic disease, they are best monitored and

managed by highly skilled health care professionals trained with the latest information on

diabetes to help ensure early detection and appropriate treatment of the serious complications

of the disease” [7].

In “Metallothionein-Mediated Antioxidant Defense System and Its Response to Exercise

Training Are Impaired in Human Type 2 Diabetes” [20], the authors discuss the importance of

metallothioneins I and II (MT1 and MT2) as part of the antioxidant defense system and its

relationship to exercise in the diabetic patient. Previous studies on these antioxidants have

indicated that exercise has only beneficial effects on the production of MT1 and MT2, but the

research team noticed that none of the studies had actually been conducted on people with

T2D. Further evidence had suggested the possibility that these important chemicals are

reduced with exercise in persons with T2D. During the study, it was confirmed that levels of

MT1 and MT2 are increased in the skeletal muscle tissue and plasma of healthy individuals

who have participated in a regular exercise program. Participants who had T2D showed no

corresponding increases though. While the study was careful to note that there were no

increases or decreases in MT1 and MT2 levels in the skeletal musculature in these patients, it

was also noted that levels were decreased somewhat in the plasma. Decreased MT1 and

MT2 can lead to oxidative stress, which “contributes to the development and acceleration of

related conditions such as nephropathy, neuropathy, retinopathy and macro- and microvascular

damage” [20]. At the same time, tissue samples taken from patients with T2D indicated

increased oxidative stress compared with the control group, with tissue appearing more susceptible to

damage.

As further research is conducted as to just how important the decreased levels of MT1 and

MT2 are in the overall health and well-being of the diabetic patient, some changes may occur

in the types of physical therapy recommended for these patients. Before this occurs, however,

the exact role these compounds play in the antioxidant defense must be determined, as well

as whether pharmacological or therapeutic treatment options will work best to provide the

patient with the greatest possible benefit.

However, exercise will continue to play a large role in the treatment of diabetic patients thanks

to the many other benefits it offers. According to Kennedy et al (1999), exercise also helps to

distribute GLUT4 throughout the body, a process that does not occur as readily in the person

with diabetes as it does in those without the illness. GLUT4 is the glucose transporter that

brings glucose into the cell through the plasma membrane. For various reasons, GLUT4 is

considered to be “the major mechanism responsible for the increased rate of glucose transport

after insulin or exercise stimulation” [21]. However, this is a process that takes place

primarily in the skeletal muscle, which, in the diabetic patient, has proven to decrease insulin-

stimulated uptake. This study showed that the muscle is not similarly resistant to the effects of

exercise by demonstrating that the GLUT4 transporter enters the plasma membrane in

response to exercise where it doesn’t respond to insulin. “In contrast to insulin stimulation,

acute exercise promotes normal glucose uptake and GLUT4 translocation” [21]. In addition,

the study showed that exercise can increase the GLUT4 levels in the plasma membrane to levels

comparable to those of people who are leaner and younger and don’t have diabetes.

The study by Kennedy et al. (1999) begins to outline the ways in which exercise and physical therapy can assist diabetic patients in managing their disease. Exercising the muscle helps to increase the levels of GLUT4 in the plasma membrane, making it possible for the patient’s body to absorb glucose from the bloodstream more effectively. More specifically, exercise targets an area of dysfunction on which insulin has little to no effect, as diabetic skeletal muscle has been shown through this and other studies to respond poorly to insulin.

This study is supported by a subsequent study conducted by Musi et al. (2001), in which it was determined that AMP-activated protein kinase (AMPK) activity was normal in response to exercise, as would be expected if the previous findings regarding the effect of exercise on the GLUT4 transporter held true. “AMPK has recently emerged as a potentially key signaling intermediary in the regulation of exercise-induced changes in glucose and lipid metabolism in skeletal muscle” [22]. AMPK plays a significant role in signaling the translocation of GLUT4 to the plasma membrane. The study indicates that AMPK functions properly in T2D patients during exercise and suggests that it does not function properly at rest. This was done by comparing the blood sugar levels of a test group of diabetics with those of the control groups before, during and after riding an exercise bicycle for 45 minutes. While the blood sugar levels of the diabetics were significantly reduced after the exercise, those of the control groups remained the same. However, as with GLUT4, the mean AMPK content in diabetic patients did not differ significantly from that of the control group. Because of the believed role of AMPK in regulating this process, the study suggests further investigation of how the AMPK pathway stimulates glucose uptake, with the aim of developing a new set of drugs designed to stimulate the exercise-induced response.

With exercise comes the possibility of broken bones, making the work of Lu et al. (2003) important for proper physical therapy and understanding following an accident. In their study, “Diabetes Interferes with the Bone Formation by Affecting the Expression of Transcription Factors that Regulate Osteoblast Differentiation,” the researchers found that people with T1D experience inadequate bone formation, osteopenia and delayed fracture healing as a result of their illness. Previous studies have established that diabetics have decreased bone density and bone formation compared with control groups, which suggests diminished osteoblast activity. “In streptozotocin-induced diabetic rats, abnormal bone repair was shown to be insulin dependent because the deficient osseous healing was reversed by insulin treatment. This finding demonstrates a specific cause and effect relationship between inadequate insulin production and abnormal bone formation” [23]. The study indicated that these deficiencies could be reversed with the proper application of insulin, yet identifying the mechanism that impairs bone formation at the protein level would enable researchers to further counteract the effects of diabetes on patients.

Genetic approach towards understanding Type 2 Diabetes

Genetic factors have long been implicated in the etiology of diabetes. The possibility of “disease” genes was noted when the familial incidence of diabetes was found to be highly significant. Patients with diabetes are therefore very likely to have siblings or other near relatives suffering from the same condition. Further research has also established that diabetes is strongly influenced by genetic factors and mutations therein [8].

The degree of genetic influence varies between the two forms of the disease, Type 1 Diabetes and Type 2 Diabetes. While siblings of Type 1 patients have a 6% chance of developing the condition themselves, this figure rises to 30 to 40% in siblings of patients who suffer from Type 2 Diabetes Mellitus. This makes the risk 6 to 7 times higher than in any other group within the population. Similarly, twin studies show a very high probability of developing the condition, ranging from 20 to 70 percent. Combined with environmental factors, the rates of diabetes are very likely to increase significantly [8].

Various syndromes have been described in which diabetes is the main feature, such as Wolfram syndrome [8]. The hereditary incidence of this condition in siblings is as high as 70 to 80% [14]. This transfer, however, occurs in only 0.1 to 1 percent of patients, in whom severe insulin resistance takes place. MELAS (Mitochondrial Encephalomyopathy, Lactic Acidosis and Stroke-like episodes) syndrome likewise arises when mutations occur in the mitochondrial DNA [24].

It is, however, important to distinguish genes that are truly diabetogenic from those that are merely diabetes related. Some genes may modify the chances of developing diabetes through problems in fat storage and in the use of glucose deposits, without meaning that every carrier will develop diabetes. Certain other genes, however, may lead to progression of diabetes even in the absence of environmental factors. Research should therefore be able to identify which genes are actual diabetes-causing genes and which are diabetes-promoting genes [25].

Genetic factors therefore play an important role in the development of T2D. Despite considerable effort, there has been relatively little progress in identifying genes that affect risk. This may be due, at least in part, to phenotypic heterogeneity; that is, T2D comprises many diseases characterized by hyperglycemia.

The etiology of T2D is so multifaceted that the debate still continues about the dichotomous

inheritance pattern of the disease. Since the environmental and genetic factors are both

important in the etiology, the individual role of each remains to be understood [26]. More than

60 genes have been researched so far in the pathology of Type 2 Diabetes and this highlights

the complex pattern of diabetic pathology that leads to the formation of the disease [24].

Two basic approaches are currently available for mapping genes in the particular case of T2D. Gene mapping merits a detailed review because it is the main technique that has enabled scientists to uncover much genetic information previously unknown to the scientific community. The method can vary, and usually consists of either the positional candidate approach or the genome wide scan. The genome wide scan is in turn carried out via two methods: linkage studies and association studies.

Gene mapping gained widespread popularity because it is a cheaper option than other genetic testing methods. It is also faster and more accurate, and is therefore one of the methods most favored by researchers [27]. Genetic or linkage mapping can map out combinations of genes and show how they may be responsible for various genetic pathologies. The method is carried out on patient samples, namely blood and tissue. Gene mapping relies on genetic markers and on processes such as recombination [27]. Recent research has yielded a new class of markers derived from naturally occurring DNA variation. These markers do not alter the normal DNA and are numerous, and are therefore very effective in linkage analysis [27].

For this purpose, the markers are used in a variety of techniques, such as restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs) and randomly amplified polymorphic DNAs (RAPDs) [27].

There are two modes of mapping. The first is genetic mapping, in which the position of each gene is determined relative to the others, along with their degree of linkage. The second, physical mapping, focuses on finding the exact location of a gene on the chromosome [27]. Linkage refers to the presence of two different genes on the same chromosome. If they are located close to each other, they are termed tightly linked. Linkage analysis helps to construct DNA maps by approximating the location of one gene relative to another [27]. The technique uses the concept of the map unit, that is, “the effective distance needed to obtain, a one percent recombination between linked alleles” [27].
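The map unit definition quoted above can be illustrated with a short calculation. The following is only a minimal sketch: the function names and the offspring counts are hypothetical and are not taken from any study cited here. It also assumes that the loci are close enough for the observed recombination percentage to approximate map distance, since double crossovers cause larger distances to be underestimated.

```python
def recombination_fraction(recombinants: int, total_offspring: int) -> float:
    """Fraction of offspring showing recombination between two loci."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinants <= total_offspring:
        raise ValueError("recombinants must lie between 0 and total_offspring")
    return recombinants / total_offspring


def map_units(recombinants: int, total_offspring: int) -> float:
    """Map distance, where one map unit equals one percent recombination."""
    # Validate the counts first, then express the fraction as a percentage.
    recombination_fraction(recombinants, total_offspring)
    return 100.0 * recombinants / total_offspring


# Hypothetical counts: 12 recombinant offspring out of 200 scored.
distance = map_units(12, 200)
print(f"Estimated distance: {distance:.1f} map units")
```

Under these illustrative counts, the two loci would be placed about six map units apart.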

A genome wide association study is defined as follows: “A genome-wide association study is

an approach that involves rapidly scanning markers across the complete sets of DNA, or

genomes, of many people to find genetic variations associated with a particular disease. Once

new genetic associations are identified, researchers can use the information to develop better

strategies to detect, treat and prevent the disease. Such studies are particularly useful in finding

genetic variations that contribute to common, complex diseases, such as asthma, cancer,

diabetes, heart disease and mental illnesses” [27].

Previously, association studies were not possible owing to a lack of information about the human genome. Now that the genome wide scan and information about the human genome are available, association studies are emerging as an important adjunct to health research [27]. In linkage analysis, the information gathered is used to estimate the LOD score (logarithm of the odds). In this method, the probability of the observed birth sequence is calculated for an estimated linkage distance. The result is then divided by the probability of the same birth sequence assuming the genes are unlinked, and the base 10 logarithm of the ratio is taken. The formula applied is as follows:

LOD score = Z = log10 [probability of birth sequence with a given linkage value / probability of birth sequence with no linkage]
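The LOD formula above can be made concrete with a small worked example. The sketch below is illustrative only and is not the procedure of any study cited here. It assumes the simplest textbook case, phase-known meioses in which each offspring can be scored as recombinant or non-recombinant, so that the probability of the birth sequence at recombination fraction theta is theta^k (1 - theta)^(n - k) and the probability with no linkage is 0.5^n. The function names and counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for k recombinants in n phase-known meioses at a given theta."""
    k, n = recombinants, meioses
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need 0 <= recombinants <= meioses and meioses > 0")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0:
        # Complete linkage is impossible once any recombinant has been seen.
        return n * math.log10(2.0) if k == 0 else float("-inf")
    log_linked = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked


def best_lod(recombinants: int, meioses: int, steps: int = 50):
    """Scan theta from 0 to 0.5 and return the (theta, LOD) pair with the highest LOD."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(((t, lod_score(recombinants, meioses, t)) for t in grid),
               key=lambda pair: pair[1])


# Hypothetical family data: 1 recombinant among 10 informative meioses.
theta_hat, z_max = best_lod(1, 10)
print(f"theta = {theta_hat:.2f}, maximum LOD = {z_max:.2f}")
```

By convention in the linkage literature, a LOD score of 3 or more is usually taken as evidence for linkage; that threshold is general background rather than a value stated in the text above.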

The GENNID study was perhaps one of the most significant research efforts in the detection of diabetes susceptibility genes. Carried out by the American Diabetes Association, this project included various populations within America and sought to determine the role of different factors in the etiology of the condition. The four population groups studied were whites, Mexican Americans, African Americans and Japanese Americans [28]. The selection criterion for the study was the elevation of glucose levels above the normal limits set by international standards. In most of the samples, families were selected who had first- or second-degree relatives suffering from the same condition. Studies were carried out through blood sampling. The study examined diabetes and impaired glucose tolerance independently as two areas of research. The method selected was a whole genome polymorphism scan, and the number of markers ranged from 389 to 395 [28].

The study revealed a pedigree error rate of almost 24.4% among the various families. Different markers showed linkage in different study populations. For example, D3S2432 was linked in Mexican Americans, D5S1404 in whites and D10S1412 in African Americans. Mixed findings were seen for linkage of GATA172D05 on the X chromosome in the two white samples taken in the study [28].

Genome wide scanning and genome mapping are a significant addition to the genetic identification of various pathologies in the body. The use of modern platforms such as Illumina and Affymetrix testifies to how central genetic research has become to current medical research. Among the diseases being extensively studied is T2D. It is important to understand the genetic basis of this disease so as to devise treatments that target the problem. With the worldwide increase in the prevalence of diabetes, intensive research is now a pressing need rather than a research fantasy [29].

These studies collected their data from research on certain rare variants of diabetes. As mentioned, dividing diabetes into variant categories has been of fundamental importance in gaining information about the genetic predisposition to the various pathologies of the disease. In this regard, maturity onset diabetes of the young (MODY) is the prime disease variant that has helped in identifying many of the genes and their loci [24]. Although prevalent in only 2% of the population, this variant has been very helpful in resolving some of the most complicated issues in the disease. The genes identified in MODY include those that encode hepatocyte nuclear factor 4 alpha, glucokinase, HNF1 alpha, insulin promoter factor 1, HNF1 beta and NEUROD1/beta 2 [24].

Positional candidate genes approach

The candidate gene approach has been of much help in identifying the genetic components of T2D. Indeed, the initial and most compelling evidence for genetic components in diabetes pathology came from the candidate gene approach [29]. Three types of candidate genes are distinguished: functional candidate genes, positional candidate genes and experimental candidate genes [14]. Through this approach many genes have been implicated in the etiology of T2D. These include the peroxisome proliferator-activated receptor gamma, the beta cell adenosine triphosphate sensitive potassium channel, and the peroxisome proliferator-activated receptor gamma coactivator-1 alpha [14]. These genes are among the very first to be identified in the pathology of diabetes [29]. Insulin receptor substrate 1 (IRS1), which has been linked to decreased insulin signaling, is being studied further for its possible contribution to the disease pathology [24]. This protein is essentially a paralogue that has been linked to cellular functions related to insulin action. The total number of such paralogues identified so far is 18, a number expected to increase as more information is obtained [24].

The positional cloning approach has recently gained widespread interest and acceptance in research. This approach was among the first to identify the role of the Calpain 10 gene in the etiology of diabetes [30].

Whole genome screen approach

Genome scanning has been very effective in identifying sets of genes that may cause T2D. These genes differ markedly from those recognized for T1D, and no constant set of genes has yet been identified for T2D.

Genome wide scans to map T2D susceptibility loci have been conducted in many different

populations. Some of the mapped loci have been observed in multiple populations. Other

regions, however, may be unique to specific populations. This may reflect underlying

phenotypic heterogeneity, racial/ethnic differences in susceptibility allele frequencies, or

differences in sample size, study design, and analytical methods.

The loci that have most commonly shown genetic abnormalities in T2D include 1q25.3, 2q37.3, 3p24.1, 3q28, 10q26.13, 12q24.31 and 18p11.22 [8].

Genetic studies carried out in T2D are mainly of two types: association studies and linkage studies [14]. Whatever the type of study chosen, single-gene (monogenic) causes of diabetes are found in only 5% of the population, where the primary cause is impaired insulin secretion or impaired beta cell mass [31].

Association studies

The association study is also known as the candidate gene approach, in which an association between gene variants and T2D is sought. This type of research, however, requires replication across multiple studies before definite conclusions can be reached, since false positives and false negatives are very likely to occur [14].
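To make the idea of testing for association between a gene variant and T2D concrete, the following sketch applies a standard allelic chi-square test to a 2 x 2 table of allele counts in cases and controls. It is an illustration under stated assumptions rather than the method of any study cited here: the allele counts are invented, the function name is hypothetical, and the p-value uses the exact relation between the one degree of freedom chi-square distribution and the complementary error function.

```python
import math


def allelic_association(case_risk: int, case_other: int,
                        control_risk: int, control_other: int):
    """Chi-square test (1 df) and odds ratio for a 2 x 2 allele count table."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # For one degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return chi2, p_value, odds_ratio


# Hypothetical counts: risk allele seen 60 times in 100 case alleles
# and 40 times in 100 control alleles.
chi2, p_value, odds_ratio = allelic_association(60, 40, 40, 60)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, OR = {odds_ratio:.2f}")
```

A single nominally significant result of this kind is exactly what the text cautions against over-interpreting: with many variants tested, some will reach low p-values by chance, which is why replication in independent samples is needed.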

Genome wide linkage studies have reported linkage signals in similar chromosomal regions across various populations. These findings are summarized in Table 1.

The primary problem in these studies has been the lack of association studies that can specifically identify which gene is responsible for a particular phenotypic trait. This leads to a probabilistic recognition of genes rather than a confirmed analysis of the genes that are responsible for causing T2D [8]. Association studies are mainly undertaken to establish an association between markers and disease loci within a specific group or population [14]. Physical linkage can be ascertained, but a large amount of evidence is required. The method is, however, more specific, and because the markers determine the accuracy with which any association can be made, more markers are required for genomic scanning [14].

There is, however, a significant pool of genes that have been associated with T2D. Again, these findings are supported by very limited data and follow-up research, which limits their credibility. Included in these associations is the PPARγ gene, which encodes the peroxisome proliferator-activated receptor gamma. This gene is mainly involved in adipocyte development. It has been found to be protective against T2D in Finnish as well as second generation Japanese populations, in which it can reduce risk by a considerable 70% [8]. This association is perhaps the most thoroughly researched and best supported finding in genetic locus recognition [8].

ABCC8 is another gene that has been implicated in T2D progression. In hyperinsulinism, this gene is the prime location of mutation. Within this gene, exon 22 has been implicated in T2D. A similar gene, KCNJ11, has also been implicated in T2D. Research has strongly supported the association between these genes and T2D [8].

Table 1: A summary of LOD scores of loci findings of Type 2 Diabetes in various

populations.

Chromosome #   Japanese   African American   European   French   Finnish   Pacific Islander   European American   Ashkenazi Jewish   American Indian   East Asian

1 - 0.27[32] 3.30[8] 1.50[14]

3.00[8] 1.27[33] 2.40[8] - 3.30[8]

4.30[14] - 4.10[8, 14] 8.90[14]

2 - 0.38[32] 2.60[8] 2.30[8] 1.60[33] 1.90[14]

4.10[8, 14] - 2.60[8] 2.20[14] - - 2.10[8]

3 1.40[8, 14] 0.54[32] 2.40[8] 2.97[33] 4.70[14] 3.90[8, 14] 1.10[8, 14] 4.10[8] - 1.80[14] -

4 - 0.82[32] - 1.34[33] - - - 1.30[8] - -

5 - 0.36[32] 2.80[8] 1.52[33] - 2.40[8] - - - -

6 - 2.26[32] 1.80[32] 7.30[8] 7.30[8] 4.10[14] 3.20[14] 7.30[8] - 1.80[14] 6.20[14]

7 - 0.75[32] - 1.32[33] - - - - 2.00[8] -

8 - 0.28[32] 2.60[8, 14] - - - 1.30[8] 3.60[14] - - -

9 - 0.92[32] - 1.30[33] 2.40[8] 3.30[8] - - - 2.90[8]

10 - 0.87[32] 1.90[8] 2.00[14] 1.50[33] 3.80[14] - 2.80[8] - - 2.00[8]

11 3.10[8] 0.30[32] 2.10[8] 3.40[8] 1.34[33] - - 2.10[8] - - -

12 - 0.37[32] - 1.50[8] - 3.60[8, 14] 1.50[8, 14] - - -

13 - 0.08[32] - - - - - - - -

14 - 0.58[32] 2.00[8] - - - 2.00[8] - - -

15 - 0.14[32] - - - - - - - -

16 - 0.81[32] 3.40[8] - - - 3.90[8] - - -

17 - 0.25[32] - - - - - - - -

18 - 0.54[32] 1.10[8] - - 4.20[14] 1.10[8] 2.40[14] - - 1.00[14]

19 - 0.71[32] - 1.20[33] - - - - - -

20 2.30[14] 0.21[32] - 2.70[14] - 2.00[8] 4.80[14] 0.90[8] - 2.90[8]

21 - 0.09[32] - - - - - - - -

22 - 1.33[32] - - - - - - - -

Recent research has pointed to variation in the FTO gene that contributes to both diabetes and obesity. Carrying an extra copy of this variant increases the risk of developing diabetes by more than 50%. This finding points towards the possibility that other genes contributing to obesity may also be etiological factors in the pathology of T2D [34].

Linkage studies

These studies mainly test specific genes or regions for association as well as linkage. They are preferred because they provide reasonably accurate results in the presence of population stratification. However, the statistical evidence has to be of considerable size to be recognized in the findings, and weak linkages are therefore likely to be overlooked. Any linkage that is found relates the physical attributes of the condition to the underlying genetic problem. The data are obtained from more than two individuals whenever family linkage needs to be determined [14].

Linkage studies have not been able to locate and clone genes localised to a particular interval. This limits the ability to fine map the genes that are responsible for disease. However, the method requires fewer markers for the genome scan and can therefore work efficiently with less data [14].

Research has not yet linked these regions to the phenotypic presentation of the various symptoms of T2D. While chromosomes such as chromosome 18 have been strongly implicated in diabetes pathology, the associated physical attributes remain unknown. Limited research has shown that this particular locus has been implicated in human obesity. The genes MC5R and MC4R have been identified in this region and have been found to show the strongest link to diabetes [10].

When considering monogenic causes of insulin dysfunction, many genes have been identified for MODY, which has been used extensively to study diabetes. The genes identified include HNF4A, GCK, HNF1A, IPF1, HNF1B and NEUROD1. The corresponding chromosomal locations are 20q, 7p, 12q, 13q, 17q and 2q respectively [12]. Of these, the first three were identified via linkage studies.

Identifying genes contributing to Diabetes

The identification of genes involved in glucose metabolism and regulation is the key starting point for any genome-related diabetes research. The introduction of genome wide scans has opened new lines of research in this regard. Loci influencing the age of onset of the condition have been mapped to chromosomes 1qter, 4p15-4q12, 5p15, 12p13-12q13, 12q24 and 14q12-14q21. These loci have been implicated in other studies as well [26].

Chromosome 1 is perhaps one of the most discussed regions implicated in T2D pathology. Many candidate genes have been identified in this region, including potassium inwardly rectifying channel, subfamily J, members 9 and 10, liver specific pyruvate kinase, C reactive protein, lamin A/C and omentin [35].

Another region of genetic mutations was found on chromosome 12, which has been strongly implicated in age of onset. This locus spans 12p13-12q13. In the 12q24 region, linkage evidence has been found between D12S324 and D12S1659. The problem with these studies, however, is that the specific gene loci and significant linkage remain to be determined. While the above regions have been strongly implicated in the age of onset of diabetes, the role of obesity and its genetic determinants remains to be established [26].

SNPs have also been recognized as chief players in the pathology of diabetes. A significant number of these are found in the CASQ1 gene, also known as the calsequestrin 1 gene. Its calcium regulation activity within the sarcoplasmic reticulum allows it to regulate GLUT4 expression within the cell [35]. Expression of this protein is increased in animal models of diabetes, which suggests a role in glucose uptake and glycogen synthesis and a contribution to higher risk of T2D [35].

The genes that lead to T2D have to be identified properly, as many are still implicated. For the

most part, researchers claim that etiological factors are also important contributors in the

development of diabetes [10]. This percentage has been identified to constitute 10% of the

diabetic population [10].

Previous studies

Animal studies

Animal models have been very helpful in identifying a significant number of genes and markers involved in insulin secretion and insulin related disorders. It is well established that reduced insulin secretion and beta cell mass are responsible for diabetes [31]. For example, some of the early research on the activin proteins was carried out in animal models. Activins have long been known for their role in axial determination in embryos and in sonic hedgehog expression in embryonic chicks. Of interest, these proteins are especially involved in the expression, stimulation and secretion of insulin in rat models, through actions on the calcium and ATP-sensitive potassium channels respectively [36].

In addition, many receptors involved in activin function have been identified through the same animal models. These receptors include ACVR1, ACVR2 and ACVR2B. Of these, abnormalities in both ACVR2 a/b have been shown to impair pancreatic function, leading to problems in rats such as impaired glucose tolerance or hypoplastic pancreatic cells [36]. These findings have prompted much of the further research on the role of these proteins in the pathology of diabetes.

The IPF1 gene encodes a transcription factor that not only regulates the development of the pancreas but also acts on the insulin gene. Other genes that it regulates include GCK, IAPP and SLC2A2. Animal models have shown that the absence of IPF1 in mice leads to the absence of a pancreas.

Human studies

Studies have shown that sequence mutations in CAPN10 play a significant role in the pathology of T2D [8]. The calpain group of proteins consists of calcium-activated proteases, which activate or inactivate intracellular signaling, proliferation or differentiation. They are also considered key players in insulin signaling and secretion [14]. Variations and haplotype combinations at different locations have been associated with the disease; however, very few studies have been able to reproduce these results or to pursue research along the same lines. While the statistical data produced in the initial study were significant, the lack of further research means that the finding remains to be confirmed with more consistency [8]. The gene has shown a 2.8 fold increase in the risk of developing T2D in the affected patient. The African American and Mexican American studies in this regard also point towards a high probability of T2D development [14, 37].

The genes found to be responsible for T2D are known mainly from the association and linkage studies carried out. Genome wide linkage scans have shown linkage at 1q42.2, 2p21, 2q24.3, 4q34.1, 5q13.3, 5q31.1, 7q32.3, 9p24.2, 9q21.12, 10p14, 11p13, 11q13-14, 12q15, 14q23, 20p12.3 and Xq23. These linkages have been found in Finnish Caucasians, French Caucasians, Australian aborigines, American Caucasians, Pima Indians, Mexican Americans, African Americans and Japanese populations [14].

Repeated linkage studies point to susceptibility genes for T2D in the chromosomal region 1q21-q25 [14]. This region is considered an important contributor to diabetes pathology. It alone encodes the insulin receptor related receptor (INSRR), the hepatic pyruvate kinase (PKLR), lamin A/C (LMNA) and apolipoprotein A2 [14].

The CAPN10 association, however, has so far been demonstrated in very few populations. One of these is the Mexican American population; evidence in other populations has been very limited [30].

Research by Parker et al. (2001) has implicated chromosome 18 in T2D pathology, replicating the results of earlier research on the same region. A strong association was also found for the glucagon receptor gene (GCGR) on 17q25 [10]. Other researchers have shown associations with chromosomes 12q and 20 [10].

Some of the well-established associations in T2D include TCF7L2, PPARG and KCNJ11 [11]. Another cluster of variants lies in the region of IGF2BP2, the insulin like growth factor 2 mRNA binding protein [11]. The identification of TCF7L2 in particular has been one of the most important discoveries of the genetic components of the disease. It was helpful in proving that “a non-candidate gene or region based association effort could work” [29]. It was also significant in showing that genes involved in diabetes pathology may lie in unexpected places in the genomic sequence, and efforts are therefore currently underway to discover the genes responsible [29].

The upstream transcription factor 1 (USF1) has been reported to be involved in glucose and lipid metabolism [38]. The gene encoding this protein is located on chromosome 1q22-23, a region widely known to be involved in diabetes progression and pathology. This is perhaps one of the best supported proteins and chromosomal regions in the pathogenesis of diabetes. However, its expression may vary from one population to another [38].

Many of these studies include families, owing to the very strong familial component in the pathology of the disease. Siblings and offspring are very likely to suffer from high rates of diabetes because of their genetic relatedness, and the higher the prevalence within the family, the greater the aggregation of the genes responsible. Twin models have readily demonstrated such relationships, and both monozygotic and dizygotic twin models have provided evidence of a shared genetic basis in the pathology of diabetes [12].

Figure 3 summarises the timeline of the discovery of genes that predispose to T2D over the past decade.

Genome wide scans of different populations

Genome wide scans to map T2D susceptibility loci have been conducted in many different

populations through linkage analysis (Table 1) and association analysis (Table 2). Some of

the mapped loci have been observed in multiple populations. Other regions, however, may be

unique to specific populations. This may reflect underlying phenotypic heterogeneity,

racial/ethnic differences in susceptibility allele frequencies, or differences in sample size,

study design, and analytical methods.

Figure 3: Timeline of discovered genes associated with Type 2 Diabetes over the past decade.

Table 2: Thirty-nine genes from genome-wide association studies, covering fifty-two loci associated with Type 2 Diabetes in previous studies among different populations, and their p-values.

Gene SNP Region p value by study: [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] [11] [52]
Study population: FR JP FR EU JP JP EU ICLD EU EU EU AMI ICLD FI FR
TCF7L2 rs7903146 10q25.2 1.E-30 8.E-12 - 9.E-30 - - 3.E-23 - 5.E-08 - - - 2.E-10 1.E-08 2.E-34
TCF7L2 rs4506565 10q25.2 - - - 6.E-16 - - - - - 5.E-12 - - - -
TCF7L2 rs7901695 10q25.2 - - - - - - - - - - 1.E-48 - - - -
CDKAL1 rs4712523 6p22.3 2.E-12 7.E-20 - - - - - - - - - - - - -
CDKAL1 rs4712524 6p22.3 - - - - 3.E-10 - - - - - - - - - -
CDKAL1 rs6931514 6p22.3 - - - - - - 1.E-11 - - - - - - - -
CDKAL1 rs9465871 6p22.3 - - - - - - - - - - 3.E-07 - - - -
CDKAL1 rs7754840 6p22.3 - - - - - - - - - - - - - 4.E-11 -
CDKAL1 rs10946398 6p22.3 - - - - - - - - - - 1.E-08 - - - -
CDKAL1 rs7756992 6p22.3 - - - - - - - - - - - - 8.E-09 - -
SLC30A8 rs13266634 8q24.11 8.E-08 2.E-14 - 7.E-06 - - - - - - 5.E-08 - 3.E-06 5.E-08 6.E-08
HHEX rs1111875 10q23.33 - 7.E-12 - - - - - - - - - - - 6.E-10 3.E-06
HHEX rs5015480 10q23.33 - - - - - - 7.E-08 - - - 5.E-06 - - - -
KCNQ1 rs2237892 11p15.5 - 1.E-26 - - - 2.E-42 - - - - - - - - -
KCNQ1 rs2237897 11p15.5 - - - - 1.E-16 - - - - - - - - - -
FTO rs8050136 16q12.2 - - - 2.E-17 - - 7.E-06 - - - 7.E-14 - - 1.E-12 -
FTO rs9939609 16q12.2 - - - - - - - - - - 2.E-07 - - - -
KCNJ11 rs5219 11p15.1 - - - 1.E-9 - - - - - - - - - 7.E-11 -
KCNJ11 rs5215 11p15.1 - - - 5.E-7 - - 4.E-07 - - - 5.E-11 - - - -
LOC64673/IRS1 rs2943641 2q36.3 9.E-12 - - - - - - - - - - - - - -
WFS1/PPP2R2C rs4689388 4p16.1 1.E-08 - - - - - - - - - - - - - -
LOC72901/CETN3 rs12518099 5q14.3 7.E-07 - - - - - - - - - - - - - -
CDKN2A/CDKN2B rs564398 9p21.3 - - - - - - - - - - 1.E-06 - - - -
CDKN2A/CDKN2B rs10811661 9p21.3 - - - 7.E-07 - - - - - - 5.E-06 - - - -
CDKN2A/CDKN2B rs7020996 9p21.3 - - - - - - 2.E-07 - - - - - - - -
CDKN2A/CDKN2B rs2383208 9p21.3 - 2.E-29 - - - - - - - - - - - - -
PPARG rs1801282 3p25.2 - - - - - - - - - - 2.E-06 - - - -
IGF2BP2 rs4402960 3q27.2 - 1.E-06 - - - - 8.E-08 - - - 9.E-16 - - 9.E-16 -
IGF2BP2 rs6769511 3q27.2 - - - - 1.E-09 - - - - - - - - - -


Table 2 (continued)

Gene            SNP         Region    p value
                                      [39]    [40]    [41]    [42]    [43]    [44]    [45]    [46]    [47]    [48]    [49]    [50]    [51]    [11]    [52]
                                      FR      JP      FR      EU      JP      JP      EU      ICLD    EU      EU      EU      AMI     ICLD    FI      FR
MTNR1B          rs1387153   11q21     -       -       2.E-36  -       -       -       -       -       -       -       -       -       -       -       -
CDKAL           rs10946398  6p22.3    -       -       -       7.E-07  -       -       -       -       -       -       -       -       -       -       -
JAZF1           rs864745    7p15.1    -       -       -       -       -       -       5.E-14  -       -       -       -       -       -       -       -
CDC123 CAMK1D   rs12779790  10p13     -       -       -       -       -       -       1.E-10  -       -       -       -       -       -       8.E-15  -
TSPAN8 LGR5     rs7961581   12q21.1   -       -       -       -       -       -       1.E-09  -       -       -       -       -       -       -       -
THADA           rs7578597   2p21      -       -       -       -       -       -       1.E-09  -       -       -       -       -       -       -       -
ADAMTS9         rs4607103   3p14.1    -       -       -       -       -       1.E-08  -       -       -       -       -       -       -       -
NOTCH2 ADAM30   rs10923931  1p12      -       -       -       -       -       -       4.E-08  -       -       -       -       -       -       -       -
DCD             rs1153188   12q13.2   -       -       -       -       -       -       2.E-07  -       -       -       -       -       -       -       -
SYN2 PPARG      rs17036101  3p25.2    -       -       -       -       -       -       2.E-07  -       -       -       -       -       -       -       -
VEGFA           rs9472138   6p21.1    -       -       -       -       -       -       4.E-06  -       -       -       -       -       -       -       -
TCF2            rs4430796   17q12     -       -       -       -       -       -       -       1.E-11  -       -       -       -       -       -       -
CETP            rs1800775   16q13     -       -       -       -       -       -       -       -       -       3.E-13  3.E-6   -       -       -       -       -
APOE cluster    rs4420638   19q13.32  -       -       -       -       -       -       -       -       -       3.E-13  -       -       -       -       -
LPL             rs328       8p21.3    -       -       -       -       -       -       -       -       -       5.E-07  -       -       -       -       -
APOB            rs693       2p24.1    -       -       -       -       -       -       -       -       -       7.E-07  -       -       -       -       -
PVT1            rs2648875   8q24.21   -       -       -       -       -       -       -       -       -       -       -       2.E-06  -       -       -
NR              rs358806    -         -       -       -       -       -       -       -       -       -       -       3.E-06  -       -       -       -
                rs12304921            -       -       -       -       -       -       -       -       -       -       7.E-06  -       -       -       -
                rs1495377             -       -       -       -       -       -       -       -       -       -       7.E-06  -       -       -       -
                rs7659604             -       -       -       -       -       -       -       -       -       -       9.E-06  -       -       -       -
Intergenic      rs1859962   17q24.3   -       -       -       -       -       -       -       3.E-10  -       -       -       -       -       -       -
                rs6712932   2q12.1    -       -       -       -       -       -       -       -       6.E-06  -       -       -       -       -       -

* Population abbreviations: FR=French; JP=Japanese; EU=European; AFA=African-American; ICLD=Iceland; AMI=American Indian; FI=Finnish.


Asia

The Asian population is a large pool of different ethnic communities that have lived either in
isolation from or alongside one another. This has resulted in wide genetic admixture within
these populations, with a few pockets of communities that have preserved their original genetic
structure. The challenge is the number of genome scans that must be carried out, population by
population, to characterise their genetic components correctly. A brief introduction to this
varied gene pool follows.

Chinese population

The Chinese population also has some of the highest numbers of people with diabetes in the
world. A further complication is that the Chinese population comprises a very large number of
ethnic groups, which may differ genetically from one another. This makes generalization in China
very difficult, and each group must be studied to understand the true extent of the disease in
this country. This variation shows itself in diabetes genome scans as heterogeneity in the
chromosomal regions linked to diabetes in this population [53].

Studies show that the incidence of diabetes has risen more than tenfold, and with the increasing
population of China, an ever-increasing number of people suffer from diabetes or
diabetes-related complications [53].

The largest of the Chinese ethnic groups is the Han, which constitutes the biggest pool in the
Chinese population. The Han, as well as many other Chinese ethnic groups, show linkage on
chromosome 1 at the marker D1S1589, which primarily relates to the age of onset of the disease.
Other chromosomal regions associated with the age of onset of diabetes lie on chromosomes 6, 12
and 16. Evidence of linkage was also found for regions on 6q and 1q. The Mendelian laws of
inheritance also apply to this particular disease [53].

The regions implicated in T2D in the Chinese population include 1q25.3, 2q37.3, 6q22, 18p11.22
and 20q13.1 [14].


The most significant findings in the Chinese population are on chromosome 6. LOD scores on this
chromosome are as high as 6.2, and the linkage has been mapped to the region 6q21-q23 [14]. This
region has also been related to low fasting glucose levels in the Finnish, African American,
Mexican American and Pima Indian populations. It has been implicated in multiple genetic
mutations and is therefore considered a prime region in the etiology of T2D [14].
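For readers less familiar with linkage statistics, the LOD scores quoted in this section can be illustrated with a minimal sketch. It assumes the simplest textbook case, a two-point analysis of phase-known meioses, which is not necessarily the model used in the cited scans; the function names are illustrative only.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known meioses (simplified textbook case).

    Compares the likelihood of the data at recombination fraction `theta`
    with the likelihood under free recombination (theta = 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    # Likelihood under linkage at the given recombination fraction.
    linked = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    # Likelihood under no linkage.
    unlinked = 0.5 ** meioses
    if linked == 0.0:
        return float("-inf")
    return math.log10(linked / unlinked)


def lod_to_odds(lod: float) -> float:
    """Convert a LOD score to the odds in favour of linkage."""
    return 10.0 ** lod
```

Because the LOD is a base-10 logarithm of a likelihood ratio, a score of 6.2 corresponds to odds of roughly 1.6 million to one in favour of linkage.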

Japanese population

The chromosomal regions linked to T2D in the Japanese population include 3q28 and 20q13.1 [14].
Linkage has been found at 11p13, which is supported by findings in American Caucasians [14].
Linkage at Xq23, originally identified in American Caucasians, has also been observed in the
Japanese population [14].

Chromosome 3 is another region that has shown a high rate of mutations related to the pathology
of T2D, on both the long and the short arm. Such linkages have been demonstrated in many
populations, including French Caucasians and the Australian population. Similar studies found
linkage in the Pima Indians in the region of the GLUT2 gene on 3q26.1 and at D3S1292 [14].

LOD scores as high as 8.91 were found near the marker D1S2815 in the linkage studies

performed [14]. The Hong Kong Chinese have shown susceptibility loci on chromosome

1q21-q25 [14].

Homozygosity for the T54 allele in the Japanese population has been found to be associated with
higher basal and 2-hour insulin levels than other genotypes [14].

Chromosome 11 mutations have been found to link to the marker D11S935, located at 11p13. Similar
findings were seen in the Finnish population at 11q13 [14].


Of significance, studies of MODY mutations in the Japanese population have revealed two more
genes responsible for familial diabetes: the nonsense mutation Q310X in the MAPK8IP1 gene, and
the missense mutation E1506K in the ABCC8 gene [12].

The objective of the Takeuchi study was to detect new T2D gene variants and to substantiate
previously detected variants through a three-stage GWAS in the Japanese population [40].

In the first stage, 519 case and 503 control individuals were genotyped for 482,625 SNPs. The
Cochran-Armitage trend test was used to test the association between T2D and genetic variants.
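The Cochran-Armitage trend test named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the cited study: genotype counts are ordered by number of risk alleles, the conventional additive weights (0, 1, 2) are assumed, and the p value comes from the normal approximation. The function name and example counts are hypothetical.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Return (z, two-sided p) for the Cochran-Armitage trend test.

    `cases` and `controls` hold genotype counts ordered by the number of
    risk alleles carried (0, 1, 2). `weights` are the assumed scores.
    """
    if not (len(cases) == len(controls) == len(weights)):
        raise ValueError("cases, controls and weights must have equal length")
    total_cases = sum(cases)
    total_controls = sum(controls)
    n = total_cases + total_controls
    columns = [r + s for r, s in zip(cases, controls)]

    # Trend statistic: weighted difference between cases and controls.
    t_stat = sum(
        w * (r * total_controls - s * total_cases)
        for w, r, s in zip(weights, cases, controls)
    )

    # Variance of the statistic under the null hypothesis of no association.
    k = len(weights)
    spread = sum(w * w * c * (n - c) for w, c in zip(weights, columns))
    spread -= 2 * sum(
        weights[i] * weights[j] * columns[i] * columns[j]
        for i in range(k)
        for j in range(i + 1, k)
    )
    variance = (total_cases * total_controls / n) * spread
    if variance <= 0:
        raise ValueError("variance is zero; the SNP may be monomorphic")

    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value
```

With hypothetical counts such as `cochran_armitage_trend([10, 20, 30], [30, 20, 10])`, the risk allele is enriched in cases and the test returns a positive z with a small p value; identical case and control counts give z = 0.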

In the second stage, 1,456 SNPs were genotyped using iPLEX (Sequenom) and GoldenGate (Illumina)
assays. Using a p-value criterion of p < 7 × 10⁻⁵, 30 SNPs representing 17 unique loci were
chosen as significant.

In addition to the GWA study, the objective was also to replicate the T2D association of 17 SNPs
from 16 candidate loci detected earlier in Europeans.

The third stage comprised replication of association in 4,000 case subjects and 12,569
population-based sample subjects. An association was taken to be significant only if it involved
the same risk allele as in the other two stages, and it was then evaluated with a one-tailed
test. A meta-analysis was conducted combining the results of two or all three stages with past
Japanese studies carried out by three other groups. R² is the squared correlation coefficient
between a SNP, coded by the number of risk alleles, and disease status; it was used to compare
the explained sum of squares between the Japanese and European populations.
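The R² measure described above can be illustrated with a short sketch. It assumes individual-level data with each subject's SNP coded as 0, 1 or 2 risk alleles and disease status coded as 1 (case) or 0 (control); the function name is illustrative, and the cited study may have derived the quantity from summary counts instead.

```python
def snp_r_squared(risk_allele_counts, disease_status):
    """Squared Pearson correlation between risk-allele count and disease status."""
    if len(risk_allele_counts) != len(disease_status):
        raise ValueError("inputs must have the same length")
    n = len(risk_allele_counts)
    if n == 0:
        raise ValueError("inputs must not be empty")
    mean_x = sum(risk_allele_counts) / n
    mean_y = sum(disease_status) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in risk_allele_counts)
    syy = sum((y - mean_y) ** 2 for y in disease_status)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(risk_allele_counts, disease_status)
    )
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return (sxy * sxy) / (sxx * syy)
```

Because R² depends on both the effect size and the risk-allele frequency, the same odds ratio can explain different fractions of variance in two populations, which is why it is a convenient scale for the Japanese and European comparison.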

The results gave four loci, one new and three previously detected. There was considerable
overlap of T2D susceptibility genes between the Japanese and European populations, while effect
sizes and explained variance tended to be higher in the Japanese population, and the
associations were stronger in the Japanese than in the European sample.


Although the study could not establish whether the penetrance of a given genotype differs
considerably between people of Japanese and European descent, with respect to genetic effects
4 out of 7 confirmed loci showed a considerably higher odds ratio in the Japanese population
[40].
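The odds ratios compared above can be computed from a 2×2 table of allele counts. The sketch below is a generic illustration, assuming allelic counts in cases and controls and a Woolf (log-scale) confidence interval; the function and parameter names are hypothetical and are not taken from the cited study.

```python
import math


def odds_ratio_with_ci(case_risk, case_other, control_risk, control_other, z=1.96):
    """Allelic odds ratio with an approximate confidence interval (Woolf method).

    Arguments are allele counts: risk and non-risk alleles in cases,
    then risk and non-risk alleles in controls. z = 1.96 gives ~95% limits.
    """
    counts = (case_risk, case_other, control_risk, control_other)
    if min(counts) <= 0:
        raise ValueError("all four counts must be positive")
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR).
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

Comparing the intervals rather than the point estimates alone helps to judge whether a higher odds ratio in one population is distinguishable from sampling noise.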

Another GWAS was conducted by Unoki and colleagues to detect genetic variants that increase the
risk of type 2 diabetes in the Japanese population [43].

In this study, 268,068 SNPs were genotyped in 194 Japanese subjects with T2D and diabetic
retinopathy and in 1,558 unrelated control subjects. These SNPs covered about 56% of common
Japanese SNPs, and the 207,097 SNPs that were successfully genotyped were retained. The 8,323
SNPs with the lowest p values were then genotyped in 1,367 T2D cases and 1,266 controls, which
yielded 6,731 successfully genotyped SNPs for further analysis. Nine SNP loci with p values less
than 0.0001 were chosen and genotyped in a third Japanese cohort of 3,557 T2D cases and 1,352
controls. All these populations were combined in a subsequent case-control analysis, which
detected 6 SNPs that were strongly associated with T2D. Among them, the CDKAL1 and IGF2BP2 loci
had been detected earlier, and three others had a p value greater than 0.056 and hence were
excluded in the third test. The only remaining locus was KCNQ1 (rs2283228), which was therefore
examined further in a number of case-control studies. In addition to the Japanese population,
the analysis was performed on Singaporean and Danish populations to confirm the association of
KCNQ1 with T2D risk.

A significant interactive effect between rs2237897 and rs234844 was detected using stepwise
logistic regression analysis. The results of the Singaporean and Danish studies showed that
rs2237895 and rs2237897 are strongly associated with T2D in people of East Asian and European
descent, respectively.

KCNQ1 had not been detected in earlier studies, and this was the first attempt to verify its
association with T2D. However, it remains possible that the CDKN1C gene near KCNQ1 harbours the
actual variant causing diabetes, which calls for further studies [43].


Indian population

The subcontinent region of Asia suffers from an acute lack of genome-scan research in almost all
areas of medicine. The most significant data available are therefore epidemiological. These too
are strongly affected by the lack of local health care infrastructure, which leaves a large
proportion of society unable to reach medical help. Much of the population cannot afford
treatment or even a diagnosis, let alone a genetic scan. With the constant increase in
population in this region, an exponential increase in the number of diabetic patients is also
expected [31]. Study samples are therefore not representative of all sections of society and may
be confined to the economically better-off population. The countries in this region include
Pakistan, India, Bangladesh and Sri Lanka.

To date, some of the most significant research in this regard has been carried out in India. The
findings showed a very high prevalence of diabetes among the samples collected, with more than
12 percent of the subjects suffering from diabetes or insulin resistance. Gender differences
were minimal, and the mean age of onset was 40 years. Factors such as diet, lifestyle, physical
inactivity and BMI were important contributors to the disease [31].

Scans have shown loci for T2D on chromosomes 1q21, 2q, 3, 5, 11q, 12q and 20q [31]. The most
prominent region was 1q21-24, which has also been reported in other populations, including Utah
Caucasians, Pima Indians, English, French, Amish and Chinese [31]. Its evidence for linkage,
however, remains to be determined.

Susceptibility genes that may contribute to diabetes in these patients were PPARG, KCNJ11,
CAPN10 and HNF alpha. CAPN10 variations were significant in the Hispanic as well as the Finnish
populations [31]. A study carried out by Sanghera et al. in 2007 implicated further genes in the
pathology of T2D. These included IGF2BP2, cyclin-dependent kinase 5, a zinc transporter protein,
CDKN2A, HHEX, TCF7L2, KCNJ11 and FTO [54].

Other regions reported in studies of the Indian population include loci at 3q22, 1q44, 8q23 and
2q37 [31], as well as 16q12 and 19p13.3. The findings vary from one part of the Indian
population to another, mainly because of the large number of different ethnic and racial groups
that live in the same region. Genetic scans in the Indian region will therefore have to be
designed according to locality and the type of population included in the study [31].

A study in the same vein by Sanghera et al. found strong associations for PPARG2, IGF2BP2,
TCF7L2 and FTO. It confirmed the role of genes that had previously been implicated in diabetes
pathology, compared its results with those of studies on Caucasian populations, and identified
the presence of these genes in the Asian gene pool as well [54].

North America

The American region is perhaps one of the regions most affected by diabetes. While there have
been considerable advances in improving the quality of life of diabetic patients, the size of
this population has nevertheless increased significantly. The primary reasons are the ageing of
the baby-boomer population, longer life expectancy than in the past, and better health
facilities and awareness regarding the care of such patients. This is increasing the number of
people who will require health care for diabetic complications. More importantly, the
complications of diabetes have also started to rise in this region. There are now more cases
than ever of younger age groups presenting with renal complications, amputations or blindness as
a direct consequence of diabetes [8, 28]. This number continues to rise, and with the challenges
the US health care system currently faces, the care of such patients will be more difficult than
ever [8].

Pima Indian population

A genome-wide association study has been carried out on the Pima Indian population to examine
this linkage. An example of a finding repeated across populations is the region 1q25.3, which is
seen in both the Pima Indians and U.S. Caucasians. The same findings were replicated in the
French and UK Caucasian populations. The marker found in this population is D1S2127 [8]. Other
groups in which 1q25.3 has been reported are the French Caucasians, the UK Caucasians, the
Amish, the Chinese and the Framingham cohort [14]. Pima Indians have also shown mutations at
3q28, with the marker D3S1580, and at 6q22, with the marker D6S1040 [14].

Studies of the Pima Indian population also found a role for CAPN10 in the genetic etiology of
T2D. Low calpain 10 mRNA levels were found in patients homozygous for the G allele of UCSNP-43.
Insulin resistance was also demonstrated, as in the Finnish population [14].

Pima Indians with chromosome 4 mutations, especially the A54T missense mutation, have shown
“greater insulin resistance and higher rates of fat oxidation compared with homozygous normal
controls” [14].

Another study was carried out to detect genetic variants associated with the onset of diabetes
at a young age [50]. The population studied has a high prevalence of obesity and type 2
diabetes.

The study included 300 individuals with T2D and an age of onset of 25 years or less, and 334
control subjects without diabetes aged 45 or more. In addition, 121 non-diabetic siblings of the
diabetic cases and 140 diabetic siblings of the 334 controls were included to check genetic
association within the family (case-control approach). After genotyping on the Affymetrix 100K
array, 80,044 usable SNPs were obtained. These SNPs were tested for both general and
within-family association in the case and control samples. SNPlex was used to genotype
individuals in the population-based follow-up studies.

The study shows that early onset of diabetes is to a great extent influenced by genetic
determinants. It suggests that there are a number of regions where marker alleles are in strong
linkage disequilibrium with variants that confer susceptibility to early-onset diabetes
mellitus.

Genome-wide mapping analyses are only an initial step in identifying susceptibility variants.
Although the current analyses have pointed to several regions that may hold genetic variants
affecting susceptibility to early-onset diabetes in American Indians, fine-mapping of these
regions is needed to narrow the signals down to specific genes. Confirming the function of genes
in the identified regions would involve replication in other populations as well as functional
analysis [50].

Amish population

The Amish are perhaps one of the most genetically isolated populations in which to study genetic
linkage to disease. The Amish population has a relatively homogeneous lifestyle and preserves
extensive family histories, and hence offers good ground for genetic analysis. Research on older
adults, or on adults of past generations, in this population provided genetic material that was
not influenced by other populations. The genetic findings in this population can therefore be
considered reliable and may help to identify some of the main genes involved in diabetes
pathology in the Amish [35]. The Amish were one of the first populations in which a role for
chromosome 1 mutations in T2D pathology was shown. Multiple polymorphisms have been found by
sequencing exons in the region [35].

Studies of the Amish population have also reported chromosome 1 linkages and mutations for T2D
[14]. The main complication, however, is the number of genes that are plausible candidates for
T2D. At least 450 have been documented as possibly involved in the etiology of T2D in this
region alone [35]. Patients were found to have impaired glucose homeostasis, with a peak at the
marker D1S2715. This location is very near the linkage locations found in the Utah Mormon and
French populations [14]. Other studies point to five populations that show mutations and
associated etiology related to chromosome 1: Pima Indians, Utah Caucasians, French Caucasians,
UK Caucasians and the Chinese [35].

A study was conducted by Rampersaud and colleagues to identify T2D susceptibility genes by
carrying out a genome-wide association scan (GWAS) in the Old Order Amish, a population
descended from Swiss immigrants [55].

The study consisted of 124 T2D subjects identified by the AFDS and 295 control subjects with
normal glucose tolerance. Their DNA was genotyped on the Affymetrix 100K SNP array. In total,
82,485 SNPs that passed quality control and Hardy-Weinberg equilibrium tests were examined for
association with T2D. The SNPs associated with T2D were then prioritized on the basis of
association with 5 oral glucose tolerance test traits in 427 Amish individuals without diabetes.
SNPs that were significant (p<0.01) in this secondary quantitative-trait analysis of the 427
non-diabetic participants were used for in silico replication in three independent 100K SNP
GWASs of FHS Caucasians, Pima Indians and Mexican Americans, along with a 500K GWAS in
Scandinavians.
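The Hardy-Weinberg equilibrium check used as a quality filter above can be sketched as a one-degree-of-freedom chi-square test on genotype counts. This is a generic illustration; the cited study may have used an exact test or a different significance cut-off, and the function name is hypothetical.

```python
import math


def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Takes observed counts of the three genotypes (AA, AB, BB) and returns
    (chi2, p_value) using a chi-square distribution with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if any(e == 0 for e in expected):
        raise ValueError("test is undefined for a monomorphic SNP")
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

SNPs whose control genotypes depart strongly from equilibrium are usually dropped before association testing, since such departures often indicate genotyping error rather than biology.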

The results showed that 80 SNPs were nominally associated with T2D in 1 of the 3 independent
100K GWASs; 3 SNPs, namely rs2540317 in MFSD9, rs10515353 on chromosome 5 and rs2242400 in
BCAT1, were associated with T2D in more than one population; and 11 SNPs showed an association
with T2D among the Scandinavians. The strongest T2D association in the Amish was detected on
chromosome 7 in a functionally relevant T2D candidate gene, GRB10 [55].

African American population

The African American population suffers extensively from T2D. In the USA, the rate of disease in
this group is exceptionally high compared with other groups residing in the country [32]. Part
of the problem may lie in inadequate health care services for the African American community.
The incidence and the rates of complications have been found to be very high in this group, and
onset tends to be early. This is a major concern, because as this community grows, the number of
diagnosed cases will increase as well. As many patients have early-onset diabetes, the
likelihood of developing complications is also raised. Combined with other chronic conditions
such as heart disease and high blood pressure, this can lead to a very complex pathology
requiring extensive treatment.

The African American community also displays a mixture of genetic and environmental factors that
may contribute to the progression of the disease. The genetic risk has been found to be as much
as 2.9-fold higher in families where the disease is prevalent than in families that are
unaffected [32]. This information is, however, largely based on assumptions, since very few
studies have been carried out on the prevalence of diabetes and diabetes-related complications
in the African American community. Studies on the genetic identification of the loci responsible
for the disease are rarer still. The actual number of people suffering from diabetes may
therefore be much higher than anticipated [32].

Phenotypic studies carried out so far have shown that the age of onset in this group is much
younger, with a mean age at onset of 41 years. This mean may fall further because of the
sedentary lifestyle in this community. A person with this condition lives with it for around 16
years on average [32]. Researchers have also shown that the patients are predominantly female,
with obesity, poor blood sugar control and early onset of the condition. The genetic findings in
these patients showed two main regions of single-locus mutations: chromosome 6 at 163.5 cM and
chromosome 22 at 32 cM. The possible candidate genes for T2D here include the estrogen receptor
1 gene, tubby superfamily protein, insulin-like growth factor 2 receptor, mitogen-activated
protein kinase kinase 4 and manganese superoxide dismutase. The multilocus mutations found are
in regions 6q, 7p and 18q. These loci have been associated with early onset of the condition, as
well as with low BMI [32].

The findings have also shown that some genetic variations and mutations are very similar to
those in other populations around the world. Linkage peaks at 6q24-q27 have been found in Pima
Indians as well as Han Chinese [32].

Mexican American population

The Mexican American population has shown many overlapping regions linked to T2D. These are
located at 2q37.3, 3p24.1 and 10q26.13. They have also been seen in American Caucasians,
Chinese, Finnish Caucasians and UK Caucasians [8, 14]. Indo-Mauritians also display mutations at
2q37.3, with the marker D2S125. The 3p24.1 region is also found in Finnish Caucasians, and
10q26.13 in the UK Caucasian population [14].

Linkage at 2q37 has been found to be especially strong in the Mexican American population when
interactions with loci on chromosome 15 were examined [14].


Chromosome 3 mutations are also seen in this population. Linkage was again found at 3p24.1, as
mentioned above, where the LOD score was 3.91. Very similar findings have been reported in the
Finnish population [14].

Recent research has found strong evidence for genetic mutations in the 2q37 region for
Non-Insulin-Dependent Diabetes Mellitus, and in calpain 10 for Type 2 Diabetes Mellitus, along
with a contributing role of SNPs 43 and 44 in the etiology of the condition [37].

Europe

The European populations, especially that of the UK, are highly varied. The influx of many new
populations has added to the gene pool, which makes this region genetically very diverse.
Findings for the UK population are therefore largely determined by which populations were
included in a study, and the results reflect the prevalence of diabetes in those populations
only. Such disparity in prevalence is also seen in America; in both regions, a much lower
percentage of Caucasians suffer from diabetes than of communities such as African Americans, who
have three- to six-fold higher rates of diabetes [12].

Studies of the main UK Caucasian population have found many mutations and linkages related to
Type 2 Diabetes Mellitus. Linkages have been found at 5q13.3 and 5q31.1, mirroring similar
findings in American and Finnish Caucasians [14].

Chromosome 8 has also been studied in European populations, where mutations were found in the
8p23 region. These findings have also been demonstrated in American Caucasians, where the
candidate gene is PPP1R3B [14]. This gene has also been implicated in the Pima Indian
population, where it has been associated with increased insulin resistance, and in other
populations such as the Japanese and aboriginal Canadians [14].


Dutch population

Very little research has been done on the Dutch population to date. Genome scans are limited and
more research is needed. The study by Einarsdottir (2006) carried out a genome-wide scan of 59
families. Linkage analysis found strong linkage on chromosomes 2, 3, 7, 11 and 12 [56]. The
study confirmed the role of the CAPN10 gene in the risk of T2D. This also supported previous
studies in which isolated Dutch families carried CAPN10 variants [56].

The study by Rasmussen mirrors some of the findings of Einarsdottir, and also points to an
important potential role for the CAPN10 gene in the pathology of diabetes [57]. Haplotypes of
this gene that may contribute to the genetic pathology of diabetes are still being discovered.
The three polymorphisms identified so far are UCSNP43, UCSNP19 and UCSNP63. These polymorphisms
were first identified in the Mexican American population and later confirmed by research on the
African American population. The research by Rasmussen also confirmed a strong association of
the three polymorphisms with diabetes prevalence [57].

Ashkenazi Jews population

The Ashkenazi population can be considered a very good starting point for research on a
population that has undergone little admixture with other populations. A DNA sample relatively
free of admixture can therefore be fertile ground for significant research [15].

The Ashkenazi Jewish population has shown a major mutation site on chromosome 4, very similar to
the linkage analysis results found in French populations. In this case the candidate gene is
FABP2, the fatty acid binding protein 2 gene, in which the missense mutation A54T is found [14].

Genomic research on relatively unmixed samples of this population has revealed four chromosomes
involved in the pathogenesis of T2D. The most important are chromosome 4, as mentioned above,
and chromosome 20. Eight markers were found on chromosome 4 and five on chromosome 20 [15].
These findings are very similar to those of the Finnish studies, which have also identified a
significant role for chromosome 20 mutations in T2D [15].

Finnish population

Research carried out in Finland has identified many genes linked to T2D. Other research has
found associations between various physical attributes and the etiology of T2D [10]. Linkage
studies have shown high LOD scores on chromosome 4 as well as on chromosome 17 [10].

In other genome scans, the Finnish population has shown linkage at 2q37.3 and in other regions
[8]. Other mutations found in the Finnish population lie at 12q24.31 and 18p11.22, regions that
have also been reported in American Caucasians and Pacific Islanders [8]. Further regions
include 3p24.1, 6q22, 12q24.31 and 20q13.1 [14]. The Finnish pool has shown a larger number of
mutations for T2D than other population samples.

Linkage studies have shown involvement of 1q42.2, 5q31.1, 9q21.12, 14q23, 20p12.3 and 4q34.1 in
the Finnish population. The same linkages have been found in UK Caucasians, the Chinese,
American Caucasians and Ashkenazi Jews [14].

The most significant findings in the Finnish population are on chromosome 4. Finnish Caucasians
show linkage on 4q, very near the linkage found in French Caucasians at 4q34.1 [14].

The second significant finding is on chromosome 12. The mutation in this region has been
associated with low circulating insulin levels, supported by research on Finnish population
groups [14]. Studies of US Caucasian families have also shown linkage in this area.

The third significant linkage was on chromosome 18, where the BMI was reported to be the
highest. The 18p11 region has shown linkage to T2D in Dutch Caucasians and Mormon Caucasians, as
well as Han Chinese [14].

Another study used GWA analysis to detect genetic variants associated with T2D in the Finnish
population [11], and compared its results with those of two similar studies by Sladek (2007)
and Saxena (2007) [48, 52].

In the first stage, 1,161 T2D cases and 1,174 NGT control subjects from the Finnish
population were genotyped on 317,503 SNPs. Based on quality control criteria and Minor Allele
Frequency (MAF) values, 315,635 SNPs were selected and examined for T2D association using an
additive model. The samples were taken from the Finland–United States Investigation of
Non–Insulin-Dependent Diabetes Mellitus Genetics (FUSION) and Finrisk 2002 studies. Coverage
of ungenotyped SNPs was extended by an imputation approach that used genotype data and linkage
disequilibrium information from the HapMap Centre d’Etude du Polymorphisme Humain (CEU)
samples to estimate genotypes of autosomal SNPs that were not genotyped in the subjects
examined. To strengthen the statistical evidence obtained in this stage, the second stage
genotyped 80 SNPs in an additional 1,215 Finnish T2D cases and 1,258 NGT control subjects,
followed by a combined analysis of the FUSION samples from both stages [11].
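To illustrate what testing a SNP under an additive model involves, the following is a minimal sketch using the Cochran-Armitage trend test with genotype scores 0, 1 and 2. This is a standard stand-in chosen for illustration and is not necessarily the exact procedure used in [11]; the function name and the genotype counts are hypothetical.

```python
import math

def additive_trend_test(case_counts, control_counts):
    """Cochran-Armitage trend test with additive scores (0, 1, 2).

    case_counts and control_counts are (AA, Aa, aa) genotype counts,
    ordered by the number of copies of the tested allele.
    Returns the 1-df chi-square statistic and its p-value.
    """
    scores = (0, 1, 2)
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    n_total = n_cases + n_controls
    col = [a + b for a, b in zip(case_counts, control_counts)]

    # Trend statistic: score-weighted difference between cases and controls.
    t = sum(w * (a * n_controls - b * n_cases)
            for w, a, b in zip(scores, case_counts, control_counts))

    # Variance of the trend statistic under the null hypothesis.
    var = (n_cases * n_controls / n_total) * (
        sum(w * w * c * (n_total - c) for w, c in zip(scores, col))
        - 2 * sum(scores[i] * scores[j] * col[i] * col[j]
                  for i in range(3) for j in range(i + 1, 3))
    )

    chi2 = t * t / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value

# Hypothetical genotype counts for a single SNP (not data from the study).
chi2, p = additive_trend_test((300, 560, 301), (380, 570, 224))
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

In a genome-wide setting the same test would be repeated for every SNP that passes quality control, so the resulting p-values need a correspondingly stringent significance threshold.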

The combined samples from all three analyses showed evidence for seven other T2D loci. For
the first three loci there was significant confirmation in the FUSION stage 1 GWA data; for
the remaining four, the FUSION stage 1 result was more modest. The results show that genetic
variants associated with T2D are present in an intergenic region of chromosome 11p12, in the
vicinity of the IGF2BP2 and CDKAL1 genes, and in the region of the CDKN2A and CDKN2B genes.
The study also established that genetic variants in the vicinity of TCF7L2, SLC30A8, HHEX,
FTO, PPARG and KCNJ11 are associated with the risk of developing T2D. Together, these findings
give a total of ten loci associated with T2D [11].

Studies of the Finnish population have made use of affected sibling pair (ASP) families.
These populations show a significant burden of diabetes, with incidence ranging from 5 to 30%
depending on the age group [30]. Studies of these families have found strong evidence for
linkage on chromosome 11 via fine mapping. Linkage has also been reported on chromosomes 2, 6
and 10. The study by Silander et al. in 2003 revealed 12 regions where linkage related to
diabetes has been implicated. The second strongest evidence was found for chromosome 14, in a
region containing genes involved in endoplasmic reticulum function and in the proper working
of the liver and pancreas [30]. The study identified four chromosomes linked to diabetes in
the Finnish sibling pairs: 6, 11, 14 and X [30].

French population

The French Caucasian population has shown linkage at 3q28, marked by D3S1580 [8]. Other
regions identified include 2q37.3, 3q28 and 20q13.1 [14].

Despite the contribution of this population to the identification of various genes causing
diabetes, the largest studies conducted on it so far showed linkage to chromosome 20q12-13.1.
This finding was supported by many other studies [15].

Research by Silander et al. (2001) has also shown certain genetic susceptibilities that
indicate possible involvement in diabetes [30]. The research by Scott et al. in 2007 showed
strong evidence for an association on chromosome 11 at SNP rs9300039. Another association was
found in intron 5 of CDKAL1, at rs4712523 and rs7754840 (Scott et al., 2007, p. 1344). The
study showed associations of 10 genes with the pathology of T2D: IGF2BP2, CDKAL1, CDKN2A/B,
FTO, PPARG, SLC30A8, HHEX, TCF7L2 and KCNJ11 [11]. Two silent variants, E111E and N486N, have
also been reported at high frequency among European and American populations [36].

Vionnet et al. carried out another significant genome-wide scan in the French Caucasian
population in 2000. This scan verified many of the genetic associations and linkages that had
been proposed for the French Caucasian population. Loci identified included 2q37, for which
the four-population study carried out by Ehm et al. in the same year had not found convincing
evidence [28]. The study by Vionnet included 143 families. Studying families makes it possible
to identify shared features of the genome that could correlate with the phenotypic and
genotypic features of diabetes. Multiple families affected by diabetes were selected so that
the scan would cover most of the genetic determinants. Mendelian inheritance patterns were
examined as part of the research to characterise the genetic picture more fully. The
phenotypic characteristics, as in the previous studies, were determined from physical
presentation and analysis of the patients' blood. The phenotypic traits included diabetes
status, age of onset and diagnosis, and BMI [33]. The study strongly associated impaired
glucose tolerance and early age of onset with chromosome 3q27-qter [33]. Many genes were
identified as primary candidates for diabetes. Chromosome 1q21-q24 was strongly associated
with diabetes in lean patients. This differs markedly from the findings in the Pima Indian
population, where chromosome 1q has been associated with diabetes and obesity [33]. Chromosome
20 was also implicated in diabetes progression [33].

The study by Gibson et al. in 2005 extended this work by examining the role of proteins
encoded on chromosome 1 in the pathogenesis of diabetes in the French population. In contrast
to the results of Vionnet et al., its results were not as supportive for the French Caucasian
population. The study, which looked into the role of upstream transcription factor 1 (USF1),
was unable to identify any particular role for it in the French Caucasian population [38].

Findings on the role of activins in the pathogenesis of diabetes and diabetes-related
syndromes have also prompted studies in humans. Abnormal expression of ACVR2B has been linked
to the development of conditions such as hypoplastic spleen, abnormal stomach, and defects in
axial patterning and lateral asymmetry. The role of the same protein in humans was therefore a
subject of much interest, and was checked in the French population for any association. In
humans, three nucleotide variations were found in the gene encoding this protein: two silent
mutations in exons 3 and 11, and a T to C variation 13 bp upstream of exon 7 [58].

The objective of the study carried out by Rung et al. (2009) was to identify T2D risk loci in
a group of French subjects using a first-stage GWAS, followed by a large second stage
concentrating on the 5% of variants most significantly associated with T2D [39]. A third stage
then focused on Danish cases and controls.

In the first stage, 1,376 French subjects were used to obtain 16,360 SNPs that were nominally
associated with T2D, and these SNPs were examined in an independent sample of 4,977 French
subjects. The 28 best results were then tested for replication in 7,698 Danish individuals,
which yielded 4 SNPs showing potential association with T2D. The control subjects were chosen
from the DESIR study. Association was tested using EIGENSTRAT. Quantitative traits were
analysed by linear regression, and odds ratios were computed by logistic regression. Incidence
of T2D was examined using Cox proportional hazards models in DESIR. All these analyses were
adjusted for sex, age and BMI.

The analysis detected T2D risk loci in the vicinity of IRS1 that had not been reported
previously. This is one of the first T2D risk loci identified in a GWAS that is related to
insulin resistance and hyperinsulinemia. The study confirmed that the C allele of rs2943641
was linked with hyperinsulinemia and insulin resistance in a total of 14,358 French, Finnish
and Danish subjects. The findings further emphasize the role of insulin secretion and insulin
sensitivity in T2D risk, and they also provide direct evidence of a genetic variant affecting
IRS1 protein and PI(3)K activity, two key steps in insulin signal transduction.

Although these findings show that G972R and rs2943641 could independently influence T2D risk
and insulin sensitivity, fine-mapping studies are recommended to detect the etiological SNPs
and analyze their interactions in different populations in greater detail [39].

Zeggini et al. (2008) carried out a meta-analysis of GWAS data to detect further
susceptibility loci associated with T2D [45]. Data from three T2D GWAS, comprising 10,128
individuals of European ancestry, were used for the meta-analysis. The 2,202,892 SNPs that
were directly genotyped or imputed were analyzed individually in each study to test their
association with T2D. The results were then corrected for residual population stratification,
cryptic relatedness and methodological artifacts by means of genomic control. These results
were then pooled in a genome-wide meta-analysis of 10,128 samples (4,549 cases and 5,579
controls) drawn from the first stages of the WTCCC, FUSION and DGI studies. A total of 69 SNPs
were prioritised for the second stage of the meta-analysis and were tested in three
replication sets.
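The two statistical steps described above, genomic control followed by pooling across studies, can be sketched as follows. This is a minimal illustration assuming a fixed-effect inverse-variance meta-analysis of per-study log odds ratios; the function names and all numbers are hypothetical and are not taken from [45].

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control_lambda(chi2_stats):
    """Inflation factor: observed median chi-square over its expected value, floored at 1."""
    return max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)

def fixed_effect_meta(betas, std_errs, lambdas=None):
    """Inverse-variance pooled log odds ratio across studies.

    Each study's variance is multiplied by its lambda before pooling,
    which is the genomic-control adjustment.
    """
    if lambdas is None:
        lambdas = [1.0] * len(betas)
    weights = [1.0 / (lam * se * se) for se, lam in zip(std_errs, lambdas)]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, p_value

# Hypothetical per-study estimates for one SNP (log odds ratio, standard error).
betas = [0.10, 0.12, 0.08]
std_errs = [0.05, 0.04, 0.06]
lambdas = [1.00, 1.05, 1.02]  # hypothetical per-study inflation factors
pooled_beta, pooled_se, p = fixed_effect_meta(betas, std_errs, lambdas)
print(f"pooled OR = {math.exp(pooled_beta):.3f}, p = {p:.2e}")
```

Pooling shrinks the standard error below that of any single study, which is why a meta-analysis can detect loci of modest effect that the individual scans miss.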

The study identified six additional loci, not detected earlier, that were significantly
associated with T2D. These were the JAZF1, CDC123-CAMK1D, TSPAN8-LGR5, THADA, ADAMTS9 and
NOTCH2 gene regions. The strongest statistical evidence for a new association was found at
rs864745 in intron 1 of JAZF1. These loci provide important evidence about the functions
involved in maintaining normal glucose homeostasis and in the pathogenesis of T2D [45].

Middle East

The Middle East lacks genome-wide population studies of T2D, which has created a serious gap
in the understanding of diabetes trends within the population. Only epidemiological studies
have been carried out so far, and even these are inconsistent and leave much to be desired in
understanding the total picture. Before going into more detail about genetic disorders in the
Arab world, I would like to define the Arab population and give a brief introduction to its
history and migration.

Historical Background of Arabs

Looking into the history of the world at large, it becomes evident that human societies have
always been stratified on the basis of caste, class, clan, race, region, religion, ethnicity,
gender, age and socioeconomic status. It is ethnic and racial distinction that sets one nation
apart from another. “Ethnicity is” Macionis submits, “a shared cultural heritage and people
define themselves or others as members of an ethnic category based on common ancestry,
language or religion that gives them a distinctive social identity.” The same is true of the
Arab world, which maintains its unique ethnographic identity, historical background, ancestry,
cultural traits, social norms, moral values, religious beliefs and genealogy. People who speak
Arabic as their primary or first language are called Arabs. At present, the total Arab
population, living in twenty-three countries, is estimated to be about 325 million, with a
2.3% annual increase.

Divergent propositions have been articulated regarding the origin and background of the
Arabs. One school of thought holds that the Arabian Peninsula is the origin of the Arabs and
that the Bedouin clans of that region are their forefathers, who had been living there long
before the birth of Abraham in Babylonia. The first definite extant reference to the Arabs
occurs in an inscription of the Assyrian king Shalmaneser III, who speaks of the capture of a
thousand camels from Gindibu the Arabian in 854 B.C. (Landau, 1958: 11-21, quoted in
bible.ca). In addition, it has been mistakenly considered that all Arabs are descendants of
Ismail (Ishmael), the elder son of Abraham. The basic source of this view is the Semitic
religions: a large majority of the followers of the Abrahamic religions, including Jews,
Christians and Muslims, view Ismail as the father of the Arabs. According to the Jewish
sources, it was Ishmael whose descendants were blessed and multiplied as a great nation: “GOD
heard the boy (Ishmael) crying, and the angel of GOD called to Hagar from heaven and said to
her, "What is the matter, Hagar? Do not be afraid; God has heard the boy crying as he lies
there. Lift the boy up and take him by the hand, for I will make him into a great nation."”
(Genesis 21:17-18)

Historians, on the contrary, do not accept the tradition that all the Arabian tribes have one
and the same ancestor. Instead, they strongly believe that a significant number of peoples
migrated to the Arabian Peninsula after Abraham left Ismail and his mother Hagar in the desert
of Paran. The tradition gained great popularity because the word Arab simply means a desert
with neither water nor trees, which has been taken to show that the region was not populated
before the advent of Ismail. “Linguistically, the word "Arab" means deserts and waste barren
land well-nigh waterless and treeless; ever since the dawn of history, the Arabian Peninsula
and its people have been called as such.” After the advent of Ismail and the appearance of the
Zamzam Well, the Qahtani tribes made their way to the peninsula and sought the permission of
Ismail to settle in the area. Ismail married the daughter of the Jurhum branch of the Qahtani
tribe. “The Historians generally agree that the ancient Semitic peoples Assyrians, Aramaeans,
Canaanites (including the Phoenicians and Hebrews) and, later, the Arabs themselves migrated
into the area of the Fertile Crescent after successive crises of overpopulation in the
Peninsula beginning in the third millennium before the Common Era (BCE) and ending with the
Muslim conquests of the 7th century CE.”

The historians divide the origin of Arabs into three categories:

Perishing Arabs: Relics and archaeological research have yielded very little knowledge about
the earliest Arab tribes. According to this research, some Arabic-speaking clans are estimated
to have existed in or around present-day Saudi Arabia soon after the construction of the Holy
Ka’aba, and to have perished at the time of Noah’s flood because of their disobedience to the
ways of GOD. Since there is no authentic record of their origin, life, activities and
descendants, they are often referred to as the perishing ancient Arabs. It is thought that
some of the ancient nations, including 'Ad, Thamûd, Tasam, Jadis, Emlaq and others, which were
destroyed by the wrath of Almighty GOD for their misdeeds and malpractices, were among the
very first Arabian tribes.

Pure Arabs: Pure Arabs are the people who trace their ancestry to Qahtan. The progeny of
Ya'rub bin Yashjub bin Qahtan are the pure Arabs, who lived from 2300 BC to 800 BC in the
Sayhad region of South Arabia. “In the late 3rd Millenia BC Semitic tribes began to
concentrate in the Sayhad region in South Arabia uniting under the leadership of the
semi-legendary Qahtan.” The Qahtanis began building simple earth dams and canals in the Marib
area in the Sayhad desert. At present, the Qahtani Arabs live in Palestine, Lebanon, Syria,
Egypt, Morocco, Libya, Ethiopia, Nigeria and other parts of the same region.

Arabized Arabs: Almost all theologians and historians hold that an overwhelming majority of
Arabs descend from Ishmael, the elder son of the Prophet Abraham. They are also called
Adnanian Arabs, after Adnan, a pious man and one of the descendants of Kedar, the second son
of Ishmael (Ismail). Adnan is also the ancestor of the Holy Prophet of Islam Muhammad Bin
Abdullah (peace and blessings of Almighty GOD be upon him and his family). The family of Adnan
grew considerably and spread over a significant part of the Arabian Peninsula in particular.
The Adnanian Arabs were the trustees of the Holy Ka’aba and served the pilgrims arriving from
distant lands to perform the pilgrimage. They traveled widely for trade and commerce in
different parts of the region and the world. Some of them migrated and settled in various
areas of the present Middle East as well as the northern and central parts of Africa. The
Arabized Arabs ruled over Yemen, Heerah, Syria and Hejaz from 650 B.C. onward and retained
their unique culture, language, norms and identity wherever they moved. They gained
popularity, strength and respect particularly after the advent of Islam. The descendants of
Pure and Arabized Arabs reside in almost all countries of the world, with a majority and
stronghold in twenty-four countries of the Middle East and Africa. Research indicates that the
overwhelming majority of the Arabized Arabs are Muslim and live in almost all the states of
Africa and Asia, particularly from Iraq in the east to Morocco in the west and Lebanon in the
north to Tanzania in the south. Arab populations are distributed across 23 different
countries, namely: Algeria, Bahrain, Comoros, Djibouti, Egypt, Eritrea, Iraq, Jordan, Kuwait,
Lebanon, Libya, Mauritania, Morocco, Oman, Palestine, Qatar, Saudi Arabia, Somalia, Sudan,
Syria, Tunisia, United Arab Emirates, and Yemen.

Arab Migration:

The world has turned into a global village in the contemporary era, and people migrate from
one part of the world to another because travel and communication have become easy and fast.
Being one of the most dynamic peoples, and rulers of multiple states, the Arabs have also
migrated to the Asian, European, American and Australian continents for business, studies,
health, trade and employment. Moreover, the descendants of the Holy Prophet (peace be upon
him) have migrated to Iraq, Iran, Syria, Lebanon, Egypt, India, Russia and Yemen over the last
fourteen centuries for political, economic and religious reasons. Further, Arab Muslims also
settled in Qatar, Kuwait, UAE, Turkey, Spain, Nigeria, Tanzania, Kenya, Afghanistan,
Bangladesh, China and other parts of the globe, to sell their merchandise and to preach their
religious beliefs. In addition, other nations have migrated into the Arab world. There are
Africans, Indians, Chinese, Europeans and Australians, particularly in the United Arab
Emirates and Kuwait, working in various positions and professions.

Genetic Disorders in the Arab world:

Genetic disorders are very common among the Arabs. Research indicates that the rate of
genetic disorders is far higher in the Arab world than in Western and other non-Arab
societies. “A genetic disorder is a disease that is caused by an abnormality in an
individual’s DNA. Abnormalities can range from a small mutation in a single gene to the
addition or subtraction of an entire chromosome or set of chromosomes.” Genetic disorders
result in physical or mental disabilities and dysfunctions, and in a high infant mortality
rate. Like other regions of the globe, the Arab world experiences genetic disorders in its
population. There are several causes of such disorders among Arabs, which are considered
below:

Consanguinity among the Arab Population:

Demographic statistics show that consanguineous marriages (marriages within the family or
between blood relations, including cousin marriages) are very common among the Arabs, which
increases the probability that familial diseases are transmitted to the next generations.
Medical research has shown that repeated cousin marriages increase the chance that the same
gene defects are passed on. It has often been observed that T2D, high blood pressure and heart
disease are more common in families that practise cousin marriage in excess. “Throughout the
Arab World”, Tadmouri observes, “consanguineous marriage is traditionally common. Overall,
around 40% to 50% of marriages in the Arab World are consanguineous. The specific types of
consanguineous marriage vary between and within countries. First cousin marriages are the most
common consanguineous bonds in the Arab World. Estimates indicate that the percentage of first
cousin marriages is approximately 11.4% in Egypt, 21% in Bahrain, 29% in Iraq, 30% in Kuwait,
31% in Saudi Arabia, and 32% in Jordan.” (2004: p 3) Hence, a chain of consanguineous
marriages is one of the most important causes of genetic disorders among the Arabs.

The High Occurrence of Haemoglobinopathies in the Arab World:

Another important cause of genetic disorders is the high frequency of metabolic disorders.
“The high prevalence of haemoglobinopathies”, Al-Ghazali opines, “glucose-6-phosphate
dehydrogenase deficiency, autosomal recessive syndromes, and several metabolic disorders cause
genetic disorders among the Arabs.” (2006: p 831) The absence of proper health measures and
medical check-ups during pregnancy also contributes to the prevalence of haemoglobinopathies
in the next generation.

The High Birth Rates:

The number of pregnancies in Arab countries is far higher than in the western world. This
seriously affects the health of the mother, who may suffer immune deficiency and many other
diseases. An ailing and aged mother is less likely to give birth to healthy children. The
infant mortality rate is therefore very high in Arab countries due to genetic disorders.

Lack of Physical Exertion:

The discovery and large-scale exploitation of oil in the Arab world during the 1960s
revolutionised the lifestyle and financial position of the Arabs. Economic development has
made life increasingly sedentary. Lack of physical activity and of strenuous effort weakens
the physique, which consequently promotes genetic disorders among the next generations of the
whole Arab community.

Need and Scope of Medical Researches in the Arab World:

Modern technological advancements have revolutionized the patterns of life and influenced
widely separated regions of the world. But there are some societies that have not benefited
from these scientific inventions and achievements. This is the case in the Arab and African
worlds of today, where genetic disorders have taken strong root and their causes remain
unknown to the individuals suffering from these physical and mental deficiencies. Though some
theories have been articulated and research has been conducted on the causes of genetic
disorders among the Arabs, no significant cause has yet been discovered. There is vast scope
to investigate the causes by applying research methods to uncover the sociological,
biological, environmental and psychological factors behind genetic diseases in the Arabs. In
addition, the public health sector is underdeveloped in the Arab world.

United Arab Emirates (UAE)

A federation of seven prosperous states in the south-eastern region of the Middle East, the
United Arab Emirates was formed in 1971. It has been a centre of trade and commerce for the
last four decades. People from every part of the world arrive there in search of work, trade
and business. The opening of major commercial institutions, the arrival of popular
multinational brands and the establishment of chains of recreational centers have made the UAE
one of the most attractive regions of the globe. Though the federation has become an
amalgamation of many cultures, and individuals from almost every nation can be found there,
the Arabs remain the dominant stratum of its culture and society. Though political control of
the federation is in the hands of the native people, it has been estimated that over 81% of
its total population consists of foreign workers, laborers, investors, traders and other
professionals. Out of the UAE's population of 4.7 million, it is estimated that only 19% are
UAE nationals, owing to the high rate of immigration and the commercial opportunities in the
region. Most of the pure and original Arabs have migrated to the UAE from Hejaz, Iraq, Egypt,
Syria, Bahrain, Oman, Lebanon, Palestine, Libya, Morocco, Somalia, Tunisia, Sudan and Algeria.

T2D has become a major public health problem in the UAE. A survey completed by the UAE
Ministry of Health reported that the overall prevalence of diabetes was 19.6% among UAE
citizens. Furthermore, recent studies have estimated that 25% of adult Arabs now suffer from
diabetes, mainly T2D, and that the prevalence of the disease is increasing [5].

The high frequency of consanguineous marriages leads to an increase in homozygosity, which
greatly facilitates the identification of predisposing genes [59]. Early and extended
childbearing leads to large pedigrees with multiple affected members, which allows extensive
linkage and sibling pair analyses. These factors provide an opportunity to study the various
ethnic/tribal groups of the United Arab Emirates (UAE) towards understanding genetic
predisposition to T2D.
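The link between consanguinity and homozygosity can be made concrete with the standard population-genetics expression for genotype frequencies under inbreeding: for an allele of frequency q and inbreeding coefficient F, the expected homozygote frequency is q² + Fq(1 − q). The sketch below assumes F = 1/16, the textbook value for the offspring of first cousins; the allele frequency and the function name are hypothetical and chosen only for illustration.

```python
def homozygote_frequency(q, f):
    """Expected frequency of the aa genotype for allele frequency q and inbreeding coefficient f."""
    return q * q + f * q * (1.0 - q)

F_FIRST_COUSINS = 1.0 / 16.0  # inbreeding coefficient of offspring of first cousins

q = 0.01  # hypothetical frequency of a recessive risk allele
random_mating = homozygote_frequency(q, 0.0)
first_cousin = homozygote_frequency(q, F_FIRST_COUSINS)

print(f"random mating:        {random_mating:.6f}")
print(f"first-cousin parents: {first_cousin:.6f}")
print(f"fold increase:        {first_cousin / random_mating:.2f}")
```

The rarer the allele, the larger the relative excess of homozygotes, which is one reason consanguineous pedigrees are informative for mapping recessive variants.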

A survey completed by the UAE’s Ministry of Health also reported that the overall prevalence
of diabetes was between 13% and 19% among expatriates living in the UAE. Furthermore, Malik
and his colleagues [60] have estimated that 25% of UAE nationals suffer from diabetes, mainly
T2D, and that the prevalence of the disease is increasing.

In addition, a study conducted by Reed et al. (2005) [61] on a random sample of UAE citizens
over the age of 30 living around the city of Al-Ain reported that 20% of the subjects studied
suffered from T2D (14% rural to 25% urban). However, the methodology used may have resulted in
underestimation of prevalence by as much as 20%, as recent studies reported by the Centre for
Arab Genomic Studies (CAGS) indicated that the prevalence of T2D in the UAE rises with
increasing age, reaching 40% in people over 30 years. These observations emphasise the
necessity of considering prevention for diabetes in the UAE.

Unfortunately, very little research has been carried out in the UAE regarding genetic
associations of T2D. At present, even epidemiological studies of the prevalence of the
condition in the UAE are lacking [4]. The available studies have shown differences in the
percentage of disease between urban and rural populations [4].

While the adult population has shown a strong tendency towards diabetes, even fewer studies
have been carried out on the pediatric population of the Arab countries. The demographics of
pediatric diabetes show a high number of new cases being detected each year, with roughly
equal numbers of males and females. The majority of these patients presented with diabetic
ketoacidosis at their first visit. This shows that many patients remain undiagnosed until
complications develop [15].

A related study carried out in Bahrain offers some insight into the prevalence of diabetes in
the region. While its findings cannot be applied directly to UAE patients, they may
nevertheless help identify key features that are shared across the region. The prevalence of
diabetes in Bahrain is very high, and patients above 30 years of age alone constitute 21% of
the population [62]. In 41% of the diabetics a strong family history was found, and a strong
association with hypertension was also seen in these patients. Obesity was as high as 74% in
these cases, which points to the other health risks that increase with high BMI [62]. This
study, however, is very old and may not be indicative of current trends and numbers.
Nevertheless, it does point to the significance of the problem even a long time ago.

These observations emphasize the necessity of considering prevention for T2D in the UAE. To
date, no research has been conducted on the implications of genetic testing, or on genome-wide
screening for T2D, in the UAE population or any other Arab population. With the high
prevalence of diabetes and the steady decrease in the average age of onset, such research is
essential for understanding the pathological process involved in this population.

T2D has not been extensively studied among the Arab populations of the Middle East, yet the
characteristics of the Arab population make it ideal for the study of complex, polygenic,
multifactorial disorders such as diabetes [63]. There is therefore a need for research on the
implications of genetic testing, or genome-wide screening, for T2D in the UAE population or
any other Arab population.

Conclusion

The lack of genome-wide scans leaves very little to be discussed regarding the genetic basis
of diabetes in the Arab countries. The UAE suffers from the same problem. Clinical research in
this area has helped identify the main trends in diabetes progression in other countries of
the world; without this basic knowledge, however, the UAE population cannot expect to advance
further in treatment strategies for diabetes.

Therefore, there is a pressing need for genome-wide scans that cater to the population of the
UAE and other Arab nations. Without progress in this area, there is no hope for proper
treatment strategies, and the number of patients with diabetes is bound to increase.

The aim of this project is to detect loci and genes influencing susceptibility to T2D and
related traits in the UAE population. Data on DNA haplotypes in the tribes of the Middle East
are limited, and advances in DNA technology provide the opportunity to study this group of
people. There is therefore a range of expected outcomes from this study, such as:

(A) Medical applications: The study of DNA from the local ethnic groups provides a double

benefit. Apart from the development of new opportunities in forensic science, the markers will

allow the study of specific diseases that are common to populations of this region such as

T2D. If genetic profiling could be used successfully to identify high-risk individuals, this

would result in substantial benefits to both individuals and society. Targeting preventive

measures towards individuals with high-risk genotypes could delay the onset of disease, slow

its progression, and reduce the ultimate severity of the condition. This would result in

substantial improvements in quality of life for affected individuals and a reduction in

healthcare costs.

(B) Forensic biology applications: Understanding the distribution of ethnic-specific
haplotypes will expand the understanding of the STR markers currently employed in crime scene
investigation. Further, it will address possible limitations of STR-based DNA profiling,
especially the identification of novel allele variants, null alleles and mutations. As the
number of samples stored in judiciary databases grows exponentially, unexpected alleles are
constantly being discovered. Previous work has shown new alleles arising from additional or
fewer core repeat units, partial repeats, or indels in the sequence flanking the STR repeats.
New STR-based and SNP-based markers could be identified, improving DNA profiling in forensic
science.

The identification of polymorphisms that are unique to these populations will provide an
opportunity to profile biological evidence left at the crime scene to provide information that
could be useful in an investigation.

In order to achieve this goal, collaborations were established with major hospitals and diabetes
centres in the country. Through this collaboration, demographic data of T2D patients of
Arabic origin were collected and tabulated in a database. Individuals from small nuclear
families belonging to a large extended family were selected for genome-wide scans after
completing consent forms. Blood samples were taken for genotyping. DNA samples were extracted
according to standard molecular protocols.

The biochemical data (glucose, lipids, HbA1c, etc.) were collected to complement the genetic
data. Information regarding the participants' lifestyle was also recorded for correlation with
the genetic and biochemical data.

Genome-wide screening of the samples was performed using Human Quad 660 chips scanned
with Illumina’s BeadArray™ technology. The data collected were evaluated using strategically
selected single nucleotide polymorphisms.

Family-based association analyses were used to identify genomic regions associated with the

disease. Single nucleotide polymorphisms (SNPs) were identified and haplotype association

studies were performed using haplotype relative risk and transmission disequilibrium test

analysis.
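As an illustration of the transmission disequilibrium test (TDT) named above, the following is a minimal sketch of the standard biallelic TDT statistic, (b - c)^2 / (b + c), where b and c are the numbers of heterozygous parents who did and did not transmit the candidate allele to an affected child. The function name and the counts are hypothetical and are not taken from this study.

```python
import math


def tdt_statistic(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Return (chi-square, p-value) for a biallelic TDT.

    transmitted / untransmitted count heterozygous parents who did / did not
    pass the candidate allele to an affected child (hypothetical inputs).
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for illustration only.
chi2, p = tdt_statistic(transmitted=60, untransmitted=40)
print(f"TDT chi-square = {chi2:.2f}, p = {p:.4f}")
```

Under the null hypothesis of no linkage or association, the allele is transmitted half of the time, so a large imbalance between the two counts yields a large statistic and a small p-value.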

Identification of target genes might also lead to development of novel therapeutic modalities.

Further, the data could complement existing information available for other ethnic groups

towards enhancing our knowledge of the genetic etiology of the disease.


The Arab world has not been an active participant in the large international projects in the
field of genomics, and the work presented in subsequent chapters aims to change this position
and to address the deficiencies that currently exist.

In Chapter 2 of this thesis, samples and data were collected throughout the United Arab
Emirates to establish the Emirates Family Registry (EFR) and to develop the capabilities of a
bio-specimen repository, the associated database resources, high-throughput genotyping
capabilities and skills in medical bioinformatics for the UAE. Due to the increasing prevalence
of T2D in the region, lifestyle management strategies with an emphasis on prevention are
required. Consequently, a total of 23,064 volunteers provided consent to allow their clinical
data to be stored in the EFR database in order to study the prevalence of T2D in a population
of the United Arab Emirates (UAE).

In Chapter 3 of this thesis, we examine the influence of environmental factors on the
pathophysiology of T2D and its related phenotypes in an Arab population, following the
observation that lifestyle-related risk factors are common among Arabs. Multiple factors, both
environmental and genetic, contribute to the incidence and distribution of T2D; this study
therefore describes the role of genes and the influence of the environment on the increasing
prevalence of T2D in Arab populations. Physical and clinical traits were collected for
assessment. In addition, pairwise phenotypic correlations among the eight quantitative traits
were observed, specifically between HbA1c and fasting glucose. This assessment of phenotypic
factors will be followed up with ongoing studies to evaluate the genetic polymorphisms that
contribute to the prevalence of T2D in Arab populations.

In Chapter 4 of this thesis, a new method was assessed to allow sample collection in remote
regions and in developing countries. This study describes the use of FTA™ technology for DNA
storage and a whole genome amplification step prior to GWAS application as an alternative
strategy for high-throughput genotyping.

Chapter 5 of this thesis describes the distribution of four Alu markers in the Bedouin

population of the Middle East. Specifically, it establishes the relationship between Arab

populations and others previously studied. Ethnic-specific polymorphisms can be used to

profile biological evidence left at the crime scene to provide information that could be useful

in an investigation.


Chapter 6 of this thesis presents a study to detect loci and genes influencing susceptibility to
T2D in the United Arab Emirates population. In this study, more sophisticated technologies
were used to study DNA polymorphisms and their influence on T2D among Arabs. To date, no
research has been conducted on the implications of genetic testing, or genome-wide screening,
for T2D in the UAE population or in any other Arab population.

Chapter 7 of this thesis describes a study of the genetic associations with obesity in
ethnically homogeneous cohorts from the United Arab Emirates. The study focused on mean
Body Mass Index and mean Waist Circumference. A total of 657,367 SNPs were tested for
association with these two traits in one extended Emirati family of 319 members, of whom 178
were genotyped. The modern lifestyle of the Arab population has significantly increased weight
gain early in adult life, contributing to the obesity epidemic and associated diseases such as
T2D, which makes this an ideal population in which to conduct such a study.


REFERENCES

1. Leslie, R.D., Metabolic changes in diabetes. Eye (Lond), 1993. 7 ( Pt 2): p. 205-8.

2. Hossain, P., B. Kawar, and M. El Nahas, Obesity and diabetes in the developing

world--a growing challenge. N Engl J Med, 2007. 356(3): p. 213-5.

3. Hogan, P., T. Dall, and P. Nikolov, Economic costs of diabetes in the US in 2002.

Diabetes Care, 2003. 26(3): p. 917-32.

4. El-Sharkawy, T., Diabetes in the United Arab Emirates and Other Arab Countries:

need for Epidemiological and Genetic Studies, in Genetic Disorders in the Arab

World. 2004, Centre for Arab Genomic Studies: Dubai. p. 57.

5. Wild, S., et al., Global prevalence of diabetes: estimates for the year 2000 and

projections for 2030. Diabetes Care, 2004. 27(5): p. 1047-53.

6. Goldstein, I., The mutually reinforcing triad of depressive symptoms, cardiovascular

disease, and erectile dysfunction. Am J Cardiol, 2000. 86(2A): p. 41F-45F.

7. American Diabetes Association: National Diabetes Fact Sheet. Alexandria, VA, ADA,

2002.

8. Florez, J.C., J. Hirschhorn, and D. Altshuler, The inherited basis of diabetes mellitus:

implications for the genetic analysis of complex traits. Annu Rev Genomics Hum

Genet, 2003. 4: p. 257-91.

9. Chu, S.Y., S.Y. Kim, and C.L. Bish, Prepregnancy obesity prevalence in the United

States, 2004-2005. Matern Child Health J, 2009. 13(5): p. 614-20.

10. Parker, A., et al., A gene conferring susceptibility to type 2 diabetes in conjunction

with obesity is located on chromosome 18p11. Diabetes, 2001. 50(3): p. 675-80.

11. Scott, L.J., et al., A genome-wide association study of type 2 diabetes in Finns detects

multiple susceptibility variants. Science, 2007. 316(5829): p. 1341-5.

12. Barroso, I., Genetics of Type 2 Diabetes. Diabet Med, 2005. 22(5): p. 517-35.

13. Acton, R.T., et al., Genes within the major histocompatibility complex predict NIDDM

in African-American women in Alabama. Diabetes Care, 1994. 17(12): p. 1491-4.

14. Huang, Q.-Y., M.-R. Cheng, and S.-L. Ji, Linkage and Association Studies of the

Susceptibility Genes for Type 2 Diabetes. Acta Genetica Sinica, 2006. 33(7): p. 573-

589.

15. Permutt, M.A., et al., A genome scan for type 2 diabetes susceptibility loci in a

genetically isolated population. Diabetes, 2001. 50(3): p. 681-5.


16. Freudenrich, C., How Diabetes Works. 2002.

17. Parnes, B., et al., Provider deferred decisions on hemoglobin A1c results: a report

from the Colorado Research Network (CaReNet) and the High Plains Research

Network (HPRN). J Am Board Fam Med, 2006. 19(1): p. 20-3.

18. Norman, J., The Diabetes Center. 2006, The Norman Parathyroid Clinic: Tampa, FL.

19. National Institute of Diabetes and Digestive and Kidney Diseases. National Diabetes

Statistics fact sheet: general information and national estimates on diabetes in the

United States. 2003, Department of Health and Human Services, National Institutes of

Health: Bethesda, MD: U.S.

20. Scheede-Bergdahl, C., et al., Metallothionein-mediated antioxidant defense system and

its response to exercise training are impaired in human type 2 diabetes. Diabetes,

2005. 54(11): p. 3089-94.

21. Kennedy, J.W., et al., Acute exercise induces GLUT4 translocation in skeletal muscle

of normal human subjects and subjects with type 2 diabetes. Diabetes, 1999. 48(5): p.

1192-7.

22. Musi, N., et al., AMP-Activated Protein Kinase (AMPK) Is Activated in Muscle of

Subjects With Type 2 Diabetes During Exercise. Diabetes, 2001. 50(5): p. 921-927.

23. Lu, H., et al., Diabetes interferes with the bone formation by affecting the expression of

transcription factors that regulate osteoblast differentiation. Endocrinology, 2003.

144(1): p. 346-52.

24. Almind, K., A. Doria, and C.R. Kahn, Putting the genes for type II diabetes on the

map. Nat Med, 2001. 7(3): p. 277-9.

25. Gerich, J.E., The Genetic Basis of Type 2 Diabetes Mellitus: Impaired Insulin

Secretion versus Impaired Insulin Sensitivity. Endocr Rev, 1998. 19(4): p. 491-503.

26. Wiltshire, S., et al., Evidence from a large U.K. family collection that genes

influencing age of onset of type 2 diabetes map to chromosome 12p and to the

MODY3/NIDDM2 locus on 12q24. Diabetes, 2004. 53(3): p. 855-60.

27. Strachan, T. and A.P. Read, Human Molecular Genetics 2. 2nd ed. 1999: John Wiley &

Sons, Inc.

28. Ehm, M.G., et al., Genomewide Search for Type 2 Diabetes Susceptibility Genes in

Four American Populations. The American Journal of Human Genetics, 2000. 66(6):

p. 1871-1881.

29. Frayling, T.M., Genome-wide association studies provide new insights into type 2

diabetes aetiology. Nat Rev Genet, 2007. 8(9): p. 657-62.


30. Silander, K., et al., A Large Set of Finnish Affected Sibling Pair Families With Type 2

Diabetes Suggests Susceptibility Loci on Chromosomes 6, 11, and 14. Diabetes, 2004.

53(3): p. 821-829.

31. Das, S.K., Genetic Epidemiology of Adult Onset Type 2 Diabetes in Asian Indian

Population: Past, Present and Future. International Journal of Human

Genetics, 2006. 6(1): p. 1-13.

32. Sale, M.M., et al., A Genome-Wide Scan for Type 2 Diabetes in African-American

Families Reveals Evidence for a Locus on Chromosome 6q. Diabetes, 2004. 53(3): p.

830-837.

33. Vionnet, N., et al., Genomewide Search for Type 2 Diabetes-Susceptibility Genes in

French Whites: Evidence for a Novel Susceptibility Locus for Early-Onset Diabetes on

Chromosome 3q27-qter and Independent Replication of a Type 2-Diabetes Locus on

Chromosome 1q21-q24. The American Journal of Human Genetics, 2000. 67(6): p.

1470-1480.

34. Reinberg, S., Exercise Helps Teens Overcome 'Obesity Gene', in HealthDay. 2007.

35. Fu, M., et al., Polymorphism in the calsequestrin 1 (CASQ1) gene on chromosome

1q21 is associated with type 2 diabetes in the old order Amish. Diabetes, 2004. 53(12):

p. 3292-9.

36. Dupont, S., et al., No Evidence for Linkage or for Diabetes-Associated Mutations in

the Activin Type 2B Receptor Gene (ACVR2B) in French Patients With Mature-Onset

Diabetes of the Young or Type 2 Diabetes. Diabetes, 2001. 50(5): p. 1219-1221.

37. Hayes, M.G., et al., Patterns of linkage disequilibrium in the type 2 diabetes gene

calpain-10. Diabetes, 2005. 54(12): p. 3573-6.

38. Gibson, F., S. Hercberg, and P. Froguel, Common Polymorphisms in the USF1 Gene

Are Not Associated With Type 2 Diabetes in French Caucasians. Diabetes, 2005.

54(10): p. 3040-3042.

39. Rung, J., et al., Genetic variant near IRS1 is associated with type 2 diabetes, insulin

resistance and hyperinsulinemia. Nat Genet, 2009. 41(10): p. 1110-5.

40. Takeuchi, F., et al., Confirmation of multiple risk Loci and genetic impacts by a

genome-wide association study of type 2 diabetes in the Japanese population.

Diabetes, 2009. 58(7): p. 1690-9.

41. Bouatia-Naji, N., et al., A variant near MTNR1B is associated with increased fasting

plasma glucose levels and type 2 diabetes risk. Nat Genet, 2009. 41(1): p. 89-94.


42. Timpson, N.J., et al., Adiposity-related heterogeneity in patterns of type 2 diabetes

susceptibility observed in genome-wide association data. Diabetes, 2009. 58(2): p.

505-10.

43. Unoki, H., et al., SNPs in KCNQ1 are associated with susceptibility to type 2 diabetes

in East Asian and European populations. Nat Genet, 2008. 40(9): p. 1098-102.

44. Yasuda, K., et al., Variants in KCNQ1 are associated with susceptibility to type 2

diabetes mellitus. Nat Genet, 2008. 40(9): p. 1092-7.

45. Zeggini, E., et al., Meta-analysis of genome-wide association data and large-scale

replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet, 2008.

40(5): p. 638-45.

46. Gudmundsson, J., et al., Two variants on chromosome 17 confer prostate cancer risk,

and the one in TCF2 protects against type 2 diabetes. Nat Genet, 2007. 39(8): p. 977-

83.

47. Salonen, J.T., et al., Type 2 diabetes whole-genome association study in four

populations: the DiaGen consortium. Am J Hum Genet, 2007. 81(2): p. 338-45.

48. Saxena, R., et al., Genome-wide association analysis identifies loci for type 2 diabetes

and triglyceride levels. Science, 2007. 316(5829): p. 1331-6.

49. Zeggini, E., et al., Replication of genome-wide association signals in UK samples

reveals risk loci for type 2 diabetes. Science, 2007. 316(5829): p. 1336-41.

50. Hanson, R.L., et al., A search for variants associated with young-onset type 2 diabetes

in American Indians in a 100K genotyping array. Diabetes, 2007. 56(12): p. 3045-52.

51. Steinthorsdottir, V., et al., A variant in CDKAL1 influences insulin response and risk of

type 2 diabetes. Nat Genet, 2007. 39(6): p. 770-5.

52. Sladek, R., et al., A genome-wide association study identifies novel risk loci for type 2

diabetes. Nature, 2007. 445(7130): p. 881-885.

53. Xiang, K., et al., Genome-wide search for type 2 diabetes/impaired glucose

homeostasis susceptibility genes in the Chinese: significant linkage to chromosome

6q21-q23 and chromosome 1q21-q24. Diabetes, 2004. 53(1): p. 228-34.

54. Sanghera, D.K., et al., Impact of nine common type 2 diabetes risk polymorphisms in

Asian Indian Sikhs: PPARG2 (Pro12Ala), IGF2BP2, TCF7L2 and FTO variants

confer a significant risk. BMC Med Genet, 2008. 9: p. 59.

55. Rampersaud, E., et al., Identification of novel candidate genes for type 2 diabetes from

a genome-wide association scan in the Old Order Amish: evidence for replication from


diabetes-related quantitative traits and from independent populations. Diabetes, 2007.

56(12): p. 3053-62.

56. Einarsdottir, E., et al., Linkage but not association of calpain-10 to type 2 diabetes

replicated in northern Sweden. Diabetes, 2006. 55(6): p. 1879-83.

57. Rasmussen, S.K., et al., Variants within the calpain-10 gene on chromosome 2q37

(NIDDM1) and relationships to type 2 diabetes, insulin resistance, and impaired acute

insulin secretion among Scandinavian Caucasians. Diabetes, 2002. 51(12): p. 3561-7.

58. Dupont, S., et al., No Evidence for Linkage or for Diabetes-Associated Mutations in

the Activin Type 2B Receptor Gene (ACVR2B) in French Patients With Mature-Onset

Diabetes of the Young or Type 2 Diabetes. Diabetes, 2001. 50(5): p. 1219-1221.

59. de Costa, C.M., Consanguineous marriage and its relevance to obstetric practice.

Obstet Gynecol Surv, 2002. 57(8): p. 530-6.

60. Malik, M., et al., Glucose intolerance and associated factors in the multi-ethnic

population of the United Arab Emirates: results of a national survey. Diabetes Res

Clin Pract, 2005. 69(2): p. 188-95.

61. Reed, R.L., et al., A controlled before-after trial of structured diabetes care in primary

health centres in a newly developed country. Int J Qual Health Care, 2005. 17(4): p.

281-286.

62. Al-Zurba F, A.-G.A., Prevalence of Diabetes Mellitus among Bahrainis Attending

Primary Health Care Centers. Eastern Mediterranean Health Journal, 1996. 2: p. 274-

282.

63. Kambouris, M., Target gene discovery in extended families with type 2 diabetes

mellitus. Atheroscler Suppl, 2005. 6(2): p. 31-6.


CHAPTER 2

THE PREVALENCE OF TYPE 2 DIABETES

MELLITUS IN THE UNITED ARAB EMIRATES:

JUSTIFICATION FOR THE ESTABLISHMENT OF

THE EMIRATES FAMILY REGISTRY.

This chapter was submitted to International Journal of Diabetes in Developing Countries and

the format presented is as per the "Instruction to Authors" from the publishing house.


Chapter 2

The Prevalence of Type 2 Diabetes Mellitus in the United

Arab Emirates: Justification for the Establishment of the

Emirates Family Registry.

Chapter 2 is presented as a manuscript submitted to the International Journal of Diabetes in
Developing Countries. Data collection was only possible through the collaborative
network of three hospitals, nine primary care centres, the Dubai Police Clinic, the United
Arab Emirates (UAE) Ministry of Health and the University of Western Australia.

This manuscript describes the "EFR Project", or Emirates Family Registry, which was
established as part of a collaborative effort to develop the capabilities of a bio-specimen
repository, associated database resources, high-throughput genotyping capabilities and skills
in medical bioinformatics for the UAE. To demonstrate its feasibility, a pilot project that
commenced in 2007 recruited volunteers from 3 local hospitals and 9 primary care
centres. Through this network, 23,064 volunteers provided consent to allow their clinical data
to be stored in the EFR database (Table 1). DNA samples from Bedouins with Type 2 Diabetes
(T2D) were collected from 1,766 donors. Due to the increasing prevalence of T2D in the
region, lifestyle management strategies with an emphasis on prevention are required.
Consequently, understanding the environmental factors and genetic predispositions was an
important aim of this study, to ensure successful implementation of future public awareness
programs.

Table 1: The Emirates Family Registry Database

Disease Status                                       Ethnicity   Males   Females    Total
Type 2 Diabetes, without complication                Bedouin     1,092     1,595    2,687
                                                     Others      6,550     8,450   15,000
Type 2 Diabetes, with cardiovascular complications   Bedouin     1,092     1,595    2,687
Healthy volunteers                                   Bedouin     1,330     1,360    2,690
Total                                                                              23,064


The data presented in the manuscript specifically summarises the features of diabetes in a

local population, not previously studied. The UAE consists of a cosmopolitan population that

includes the tribes of the Middle East and expatriates from neighboring Asian nations.

Although unique in the make-up of the ethnic group studied, the project was conceived on the

basis of previous studies in different ethnic groups including those published in the

International Journal of Epidemiology (1992, 21:352-358; 1998, 27:853-1859; 1999, 28:498-

501; and 2006, 35:1553-1562).

It is evident that both lifestyle and inherited risk factors lead to the development of T2D.
This study was therefore established to examine the prevalence of T2D in a population of UAE
residents through the creation of the EFR. The analyses performed revealed that obesity, large
waist circumference, consanguineous marriage, family history, lack of physical activity,
unhealthy diet, and high total cholesterol and triglyceride levels were more prevalent in T2D
patients.

The pilot program of the EFR described here was successful. The data presented
throughout this manuscript classify lifestyle features that contribute to disease, in order to
define more effective and specific plans to screen for and manage diabetes and its
complications in the UAE and other developing countries throughout the Middle East region.

Figure 1: Collaborative Links of the EFR Project have been established throughout the

Middle East, United Kingdom and Australia


Further, the collaborative network established with international research groups in

Australia, Europe and the Middle East (see Figure 1) will ensure future development of the

EFR project.

I prepared this manuscript with support from the co-authors listed. The samples were
collected by local healthcare workers and I compiled all the collected data. Dr Hassoun, a
physician at the Joslin Diabetes Centre, an affiliate of the Dubai Health Authority,
contributed as a clinical expert and provided guidance. Khadra Jama-Alol, an epidemiologist
at the University of Western Australia’s School of Public Health, assisted me with the
statistical analyses. Dr Tay assisted me throughout the study with specific guidance relating to
the design of the study.


The Prevalence of Type 2 Diabetes in the United Arab Emirates: Justification for the

Establishment of the Emirates Family Registry.

Habiba S Al Safar1, 2, Khadra A Jama-Alol3, Ahmed AK Hassoun4, Guan K Tay1

1 Centre for Forensic Science, University of Western Australia, Western Australia, Australia.

2 Dubai Police General Head Quarters, Dubai, United Arab Emirates.

3 School of Population Health, University of Western Australia, Western Australia, Australia.

4 Joslin Diabetes Centre, Affiliate Dubai Health Authority, Dubai, United Arab Emirates.

Abbreviated title: Type 2 Diabetes, United Arab Emirates, Emirates Family Registry

Keywords: Prevalence, Type 2 Diabetes, UAE, EFR

Publication number HA09-0005 of the Center for Forensic Science at the University of

Western Australia.

Corresponding author:

Associate Professor Guan K Tay


The University of Western Australia

35 Stirling Highway, Crawley WA 6009, AUSTRALIA

Phone: + 61 8 6488 7286

Fax: + 61 8 6488 7285

Email: [email protected]


ABSTRACT

This project was conceived with the aim of studying the prevalence of Type 2 Diabetes (T2D)

in a population of United Arab Emirates (UAE) residents through the creation of the “Emirates

Family Registry” (EFR). This resource is the first of its kind as it focuses on the indigenous

populations of the Arab world. It will allow researchers to collect and collate data from

patients with T2D and healthy volunteers to assess features that may contribute to disease

progression among populations of the Middle East.

Methods: Major hospitals and diabetes centres in the UAE were contacted to establish a bio-

banking facility referred to as the EFR. Through assistance made available by the Ministry of

Health and collaborators of this network, demographic data of T2D patients were collected

and collated in a database for analysis and longitudinal studies into the future. Clinical

specimens were collected for biochemical profiling (such as glucose, lipids and HbA1c levels).

Results: In the first 24 months of operation, the EFR recruited 23,064 adult volunteers from
three major hospitals and nine primary care centres throughout the UAE. Within this cohort,
88% were classified as T2D patients from their medical records. The cohort was divided into
age categories, with 59% of T2D patients aged between 40 and 59 years. UAE nationals
comprised 30% of the database, of whom 21% were diagnosed with T2D. However, the percentage
of adults with T2D was higher in other ethnic groups, affecting almost 33% of the Indians who
live in the UAE. In phase I of the study, a total of 741 UAE nationals consented to donate
blood for biochemical testing; following the completion of testing, 23% were diagnosed with
T2D, 30% with pre-T2D and 47% were healthy.

Conclusion: This study is consistent with the previously reported high prevalence of T2D in
the UAE. Furthermore, analyses of the factors that predispose to the disease have revealed that
obesity, a large waist circumference, consanguineous marriage, family history, lack of
physical activity, unhealthy dietary practices, high total cholesterol and high triglyceride
levels were more prevalent in T2D patients. The classification of these features will contribute
to defining more effective and specific plans to screen for and manage diabetes and its
complications in the UAE and in other developing countries, both within the Middle East region
and beyond.


INTRODUCTION

Diabetes mellitus is a group of metabolic diseases characterised by hyperglycaemia (1).
Several physiological processes are involved in the development of diabetes (2). These range
from autoimmune destruction of the β-cells of the pancreas with consequent insulin deficiency
to abnormalities that result in resistance to insulin action. The majority of diabetic cases fall
into two broad etiopathogenetic categories: Type 1 Diabetes (T1D), caused by an absolute
deficiency of insulin secretion, and Type 2 Diabetes (T2D), caused by a combination of
resistance to insulin action and an inadequate compensatory insulin secretory response. T2D
accounts for some 90 to 95% of those with diabetes. It was previously referred to as
non-insulin dependent diabetes or adult onset diabetes (3). It includes individuals who have
insulin resistance and relative insulin deficiency, and who usually need insulin treatment only
later in the course of the disease (4, 5).

The chronic hyperglycaemia resulting from diabetes is associated with long-term dysfunction,

damage and eventually failure of various organs (6, 7). These changes mainly occur due to

micro- and macrovascular complications. Long-term complications of diabetes include

retinopathy with potential loss of vision; nephropathy leading to chronic kidney disease;

peripheral neuropathy with risk of foot ulcers, amputations and Charcot joints; and autonomic

neuropathy causing gastrointestinal, genitourinary, and cardiovascular symptoms and sexual

dysfunction (8). Patients with diabetes have an increased risk of developing atherosclerotic

cardiovascular, peripheral arterial and cerebrovascular disease (9-12). People with diabetes

often have a high prevalence of hypertension and abnormalities of lipoprotein metabolism (13).

The United Arab Emirates (UAE) has a cosmopolitan population of about 4.7 million and
exhibits a unique demographic structure. The UAE sits at a crossroad of the trade routes
between Asia and Europe. It has flourished as a contemporary centre of trade and commerce
over the last four decades. People from every part of the world arrive in search of jobs, trade
and business. UAE nationals make up only 19% of the total population, with the balance
comprising expatriates of different ethnic backgrounds. The largest ethnic group is people of
South Asian origin (approximately 50%). Those from other parts of Asia include people from the
Philippines, China, Hong Kong, Indonesia, Singapore and Thailand. These East Asians are
grouped with Caucasians and comprise up to 8% of the population. Iranians comprise 8% and
the rest of the population are from other Arab states (15%). These estimates are based on the
results of the 2005 census, which included a significantly higher estimate of net immigration
of non-citizens than estimates in July 2009 (14, 15).

T2D has become a major public health problem in the UAE. A survey completed by the UAE’s
Ministry of Health reported that the overall percentage of people with diabetes was between
13% and 19% among expatriates who live in the UAE. Furthermore, Malik and colleagues
(16) have estimated that 25% of UAE nationals suffer from diabetes, mainly T2D, and that the
prevalence of the disease is increasing.

In addition, a study conducted by Reed and colleagues (2005) (17, 18) on a random
sample of UAE citizens over the age of 30 living around the city of Al-Ain reported that 20%
of the subjects studied suffered from T2D (14% rural to 25% urban). However, the methodology
used may have resulted in underestimation of prevalence by as much as 20%, as recent
studies reported by the Centre for Arab Genomic Studies (CAGS) indicated that the prevalence
of T2D in the UAE rises with increasing age, reaching 40% in people over 30 years. These
observations emphasise the necessity of considering prevention of diabetes in the UAE.

The Emirates Family Registry (EFR) project was conceived to provide a means to more
accurately estimate prevalence through a longitudinal approach. It also represents an
important tool and resource, as the genomic era gains momentum, for deciphering the
complexity of diseases in humans (19). Similar approaches to assess risk factors of diabetes
in other populations have been conducted (20-23). When the EFR project commenced, the
requirement was to establish a registry with a well-defined description of the disease (i.e. the
phenotype) as well as the genetic background of the populations of interest (i.e. the
genotype). Such a resource is currently not available for the ethnic groups of the Arab world,
and the EFR was developed to address this deficiency. The EFR can be used by local
research groups to systematically study common diseases throughout the Middle East region.
It will also be used to develop regional and international collaborations in biomedical science.
The EFR is a register containing information on the local ethnic population of the region,
designed specifically to study the genetic factors that are unique to this region, which will
lead to better patient care, disease management and improved quality of life.


MATERIAL AND METHODS

Emirates Family Registry (EFR)

Three major hospitals and nine primary care centres in the United Arab Emirates (UAE) were
contacted to establish the EFR. Through this collaboration, data from all patients attending
these clinics and hospitals were collected and tabulated in a database. This study was approved
by the ethics committee of the UAE Ministry of Health and the Dubai Police Head Quarters (see
appendix). In general, UAE nationals are the majority group visiting the clinics, and the care
centres from which the samples were collected see mostly patients with UAE identity cards.
For non-local visitors, including expatriates, passports are required in order to receive
treatment. The nationality of each volunteer was therefore determined from their legal
documents. Patients and volunteers were selected randomly. Verbal consent was obtained
from those patients who agreed to allow their name to be added to the registry, and
informed consent was obtained from all individuals who donated blood before commencement
of the study procedures. The procedure for collecting the data and samples is summarised in
Figure 1.

The database of the registry was constructed using Visual Studio 2006. The EFR comprises

two components: (1) a computer database documenting the details of participants of the

registry and (2) a DNA and bio-specimen repository. Data from patients include demographic

data, biochemical results such as haemoglobin A1c (HbA1c), fasting blood glucose, oral

glucose tolerance test (OGTT), lifestyle variables (healthy diet, daily physical activity,

smoking, quality of life), disease complications (neuropathy, nephropathy, retinopathy) and

family history. There are provisions to expand the registry to include different diseases and

their associated clinical and genetic features.
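The kind of participant record described above could be modelled roughly as follows. This is an illustrative sketch only: the class names, field names and units are hypothetical and are not taken from the EFR database, whose schema is not given here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BiochemistryResults:
    """Laboratory values named in the text; field names and units are assumptions."""
    hba1c_percent: Optional[float] = None
    fasting_glucose_mg_dl: Optional[float] = None
    ogtt_2h_glucose_mg_dl: Optional[float] = None


@dataclass
class RegistryEntry:
    """One participant: the database component plus links to the repository."""
    participant_id: str            # hypothetical identifier format
    age_years: int
    gender: str
    nationality: str
    biochemistry: BiochemistryResults = field(default_factory=BiochemistryResults)
    lifestyle: dict = field(default_factory=dict)       # e.g. {"smoking": False}
    complications: list = field(default_factory=list)   # e.g. ["retinopathy"]
    family_history_of_t2d: Optional[bool] = None
    specimen_ids: list = field(default_factory=list)    # DNA / bio-specimen links

    def has_specimen(self) -> bool:
        """True if at least one bio-specimen is linked to this entry."""
        return len(self.specimen_ids) > 0


entry = RegistryEntry("EFR-0001", 45, "female", "UAE")
entry.specimen_ids.append("DNA-0001")
```

Keeping the specimen identifiers on the record is one simple way to link the two components of the registry; a relational design with a separate specimen table would serve equally well.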

Subjects

A total of 23,064 adults residing in the UAE volunteered to participate in this study during their routine visits to the three major hospitals and nine primary care centres. Of the total group, 20,374 were diagnosed with T2D. Overall, 741 UAE nationals donated blood for biochemical tests to confirm their diagnosis (diabetic, pre-diabetic or healthy) and to study the risk factors that contribute to developing T2D.

Collection of Phenotype Data

Trained nurses measured the height and weight of each participant using a calibrated wall-mounted stadiometer and a weighing scale, respectively. Body Mass Index (BMI, in kg/m2) was calculated as weight in kilograms (kg) divided by the square of height in metres (m). Waist circumference (WC) was measured in inches. Overweight and obesity were defined according to the criteria provided by the World Health Organization (WHO), under which a BMI between 25 and 30 kg/m2 is classified as overweight. High waist circumference was defined as ≥ 35 inches for females and ≥ 40 inches for males.
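The BMI calculation and the cut-offs above can be sketched as follows. The function names are hypothetical; the underweight, normal and obese bands are taken from the BMI ranges listed in Table 3, and the waist thresholds follow the "≥" wording of this section.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight (kg) divided by the square of height (m)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Band a BMI value using the ranges listed in Table 3."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    return "obese"


def high_waist_circumference(wc_inches: float, sex: str) -> bool:
    """High WC: >= 35 inches for females, >= 40 inches for males."""
    threshold = 35.0 if sex.lower() == "female" else 40.0
    return wc_inches >= threshold


value = bmi(80.0, 1.75)
print(round(value, 2), bmi_category(value))
```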

Biochemistry Profile

Up to 5 ml of peripheral blood was drawn from each of the 741 UAE nationals and collected in EDTA, heparin and fluoride vacutainers for biochemical tests. Fluoride and heparin tubes were centrifuged at 3,000 rpm for 5 minutes and the serum was collected. Serum from the fluoride tubes was used to measure fasting glucose, total cholesterol and oral glucose tolerance, and serum from the heparin tubes was used to measure triglyceride, urea and creatinine levels. A 25 µl aliquot of blood from the EDTA tube was used to measure haemoglobin A1c (HbA1c).

An individual was classified as diabetic if the subject (1) had been diagnosed with diabetes by a qualified physician, (2) was on a prescribed drug treatment regime for diabetes and (3) had biochemical test results consistent with the criteria laid down in the World Health Organization (WHO) consultation group report, which specifies a fasting plasma glucose level of at least 126 mg/dl. The oral glucose tolerance test was performed only on subjects who did not have diabetes when enrolled in this study. Individuals were classified as pre-diabetic if their 2-hour post-load glucose level was 140 mg/dl or more, and as having normal glucose tolerance if it was less than 140 mg/dl.
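The classification rules above can be expressed as a short decision function. This is a minimal sketch, not the procedure actually used in the study: the function and parameter names are hypothetical, and the "unclassified" outcome for a subject with no oral glucose tolerance result is an assumption added so that the function always returns a value.

```python
from typing import Optional

FASTING_DIABETES_MG_DL = 126.0   # WHO fasting plasma glucose criterion cited in the text
OGTT_2H_CUTOFF_MG_DL = 140.0     # 2-hour post-load glucose cut-off cited in the text


def classify_glycaemic_status(
    physician_diagnosis: bool,
    on_diabetes_medication: bool,
    fasting_glucose_mg_dl: float,
    ogtt_2h_mg_dl: Optional[float] = None,
) -> str:
    """Return 'diabetic', 'pre-diabetic', 'healthy' or 'unclassified'."""
    meets_fasting_criterion = fasting_glucose_mg_dl >= FASTING_DIABETES_MG_DL
    # The text requires all three conditions for the diabetic group.
    if physician_diagnosis and on_diabetes_medication and meets_fasting_criterion:
        return "diabetic"
    # The tolerance test was only run on subjects without diabetes.
    if ogtt_2h_mg_dl is None:
        return "unclassified"  # assumption: not described in the text
    if ogtt_2h_mg_dl >= OGTT_2H_CUTOFF_MG_DL:
        return "pre-diabetic"
    return "healthy"


print(classify_glycaemic_status(False, False, 105.0, 160.0))
```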

All the biochemical tests were performed at Al-Baraha Hospital using the Cobas Integra 800

clinical chemistry system (Roche Diagnostics, Indianapolis, USA).

Statistical analysis

The p values (probability value) for each phenotype studied were calculated using Dunnett's

Multiple Comparison Test in GraphPad Prism version 5.0. The standard deviation, mean and

percentages were calculated from data input into a Microsoft Excel spreadsheet. A p value <

0.05 was regarded as statistically significant for a two-sided test.
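The descriptive statistics mentioned above (mean, standard deviation and percentages) were computed in Excel; an equivalent calculation might look like the sketch below. The helper names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev


def summarise(values: list) -> str:
    """Format a list of measurements as 'mean ± SD' (sample SD, n - 1)."""
    return f"{mean(values):.2f} ± {stdev(values):.2f}"


def percentage(count: int, total: int) -> float:
    """Share of `count` in `total`, expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Example with counts reported in this chapter: 20,374 of 23,064 with T2D.
print(round(percentage(20374, 23064), 2))
```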

RESULTS

The establishment of a registry that contains essential clinical information linked to genomic data is vital to a greater understanding of disease mechanisms in the local ethnic groups of the UAE. The flow chart in Figure 1 shows the well-defined path followed by all patients who volunteer to participate in the EFR. The patient is interviewed and consent is obtained during their routine visit to the primary care centre or hospital. The data collected from the patient are entered into the database and become part of the overall data of the registry. The patient’s disease status is assessed, and specimen types are recommended and collected. Subsequently, as data from the analysis become available they are entered into the database.

Table 1 provides a summary of the data that had been entered into the registry at the time this manuscript was compiled. As the patients’ data are entered as shown in Figure 1, they accumulate and increase the amount of information available for analysis. Over the lifetime of the registry this information will become an important resource. To date the EFR contains information on 23,064 individuals, of whom 60% were between 40 and 59 years old. Female volunteers comprise 56% of the entries in the database. Almost 30% are UAE nationals and 88.3% were diagnosed with T2D.

The registry was set up to collect data that allow the percentage of the population burdened with diabetes to be estimated. In time, the overall prevalence of the disease throughout the population will be determined. Figure 2 shows a breakdown of the information collected for the separate age categories studied. Approximately 3% of T2D patients were under the age of 20, about 13% were aged between 20 and 39 years, 59% were aged between 40 and 59 years and more than 24% were aged 60 years or older. Age was included in the study design because most studies show that younger populations throughout the world are succumbing to T2D.

Figure 1: Flow chart outlining the process of recruiting volunteers into the Emirates Family

Registry from three major hospitals and nine primary care centres in the UAE.

The Emirates Family Registry consists of (a) associated database resources that

contain demographic, clinical and genetic data and (b) a bio-specimen repository.

Table 1: Characteristics of the 23,064 individuals in the Emirates Family Registry.

Characteristic                Value (n)    Percent (%)
Age (years)
  18-20                       1,014        4.40
  21-39                       3,281        14.22
  40-59                       13,126       59.91
  60+                         5,642        24.46
Gender
  Male                        10,059       43.61
  Female                      13,005       56.39
Ethnicity
  UAE National                6,904        29.93
  Others*                     16,160       70.07
Disease Status
  Type 2 Diabetes             20,374       88.34
  Healthy                     2,690        11.66

*Consisting of African (124; 0.54%), Asian (14,587; 63.25%), Caucasian (348; 1.51%), Middle Eastern (except UAE) (1,097; 4.76%) and South American (4; 0.017%) individuals who were residents of the United Arab Emirates during the study period.

Figure 2: Chart estimating the percentage of Type 2 Diabetes patients by age group in the Emirates Family Registry. Approximately 3% of T2D patients were under the age of 20, about 13% were aged between 20 and 39 years, 59% were aged between 40 and 59 years and more than 24% were aged 60 years or older.

The EFR reflects the ethnic diversity of the UAE population, and in Table 2 the volunteers are separated into East Asia, Central Asia and the Middle East. Unfortunately, the local Ministry of Economy has chosen to combine the minority groups into one category, which combines the distinct genetic groups of the Orient (East Asia) with Caucasians (western group). Apart from this discrepancy, information is readily available by country, giving data that are more specific to each population. Overall, the proportion of UAE nationals with T2D is 21%. However, the EFR revealed a higher percentage of T2D in other ethnic groups such as Indians (33%), because most of the patients at one of the major hospitals were of Indian origin at the time this study was carried out.

Screening for T2D is important for an individual’s health and day-to-day clinical practice, as well as for a country's overall public health system. One advantage of the screening process set out in the EFR programme is that it identifies individuals who have undiagnosed T2D or are at risk of developing T2D, which will play an important role in preventing complications of the disease. Tables 3 and 4 show the risk factors that affect the 741 UAE nationals who volunteered for the study. There were three groups: those diagnosed with T2D, those with pre-T2D and healthy individuals. The physical appearance, lifestyle, family history and biochemical test results of each volunteer were recorded. With regard to physical appearance, 26.18% of the UAE participants are overweight and 7% are obese. Of the 741 UAE nationals, 39% (male and female) have a large waist circumference. Lifestyle features are also given in Table 3, which shows that 58% of the patients have unhealthy food such as fast food in their diet and 45% do not perform any kind of exercise (a minimum of 30 minutes of walking a day). Genetic heritability is another risk factor in developing the disease; Table 3 shows that 65% have a history of T2D in their family (at least one parent diagnosed with T2D) and 35% are in consanguineous marriages. The percentages of participants whose biochemical test results are associated with the disease are also summarised in Table 3.

Table 5 summarises the p values for the comparisons between the healthy group and the pre-T2D group, and between the healthy group and the T2D group, obtained using Dunnett's Multiple Comparison Test. Among the physical appearance features, age is the most significant risk factor for developing T2D (p = 0.0065). Lipid profile measures such as cholesterol and triglycerides show significant p values (0.0018 and 0.0023, respectively) when healthy individuals are compared with diabetic patients.
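To make the significance threshold concrete, the sketch below applies the p < 0.05 criterion from the statistical analysis section to the p values reported in Table 5. The variable and function names are hypothetical; the numbers are copied from Table 5 and nothing is recomputed from raw data.

```python
ALPHA = 0.05  # two-sided significance threshold stated in the methods

# (healthy vs. pre-diabetic, healthy vs. diabetic) p values copied from Table 5
P_VALUES = {
    "Age": (0.0079, 0.0065),
    "BMI": (0.0121, 0.0113),
    "Waist Circumference": (0.0083, 0.0085),
    "Fasting Plasma Glucose": (0.0032, 0.0025),
    "HbA1c": (0.0565, 0.0484),
    "Total Serum Cholesterol": (0.0020, 0.0018),
    "Serum Triglycerides": (0.0025, 0.0023),
    "Urea": (0.0138, 0.0107),
    "Creatinine": (0.2792, 0.2498),
}


def is_significant(p_value: float, alpha: float = ALPHA) -> bool:
    """A comparison is significant when its p value is below alpha."""
    return p_value < alpha


def non_significant_factors(p_values: dict, column: int) -> list:
    """Names of risk factors whose p value in `column` is not significant."""
    return [name for name, ps in p_values.items() if not is_significant(ps[column])]


print(non_significant_factors(P_VALUES, 0))  # healthy vs. pre-diabetic
print(non_significant_factors(P_VALUES, 1))  # healthy vs. diabetic
```

On these values, creatinine falls outside the threshold in both comparisons, and HbA1c does so only in the comparison with the pre-diabetic group.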

Table 2: In 2009, the UAE’s population was estimated at 4.7 million, of which 19% were UAE nationals, while the majority of the population were expatriates. The largest group was of South Asian origin (50%). Those from other parts of Asia (including the Philippines, China, Hong Kong, Indonesia, Singapore and Thailand) and those of Caucasian origin together comprised up to 8% of the population, Iranians comprised 8%, and the rest of the population were from other Arab states (15%). It was estimated that close to 20 percent of UAE nationals have Type 2 Diabetes. However, the percentage of adults with Type 2 Diabetes is higher in other groups, affecting almost 52.67% of South Asians who live in the United Arab Emirates.

Ethnic Group               Percent of Ethnic    Number of T2D    Percent of T2D    Prevalence of T2D
                           Group in UAE         in EFR           in EFR            per 100,000
South Asian                50%                  10,732           52.67%            447.31
UAE National               19%                  4,214            20.68%            462.21
Other Arab                 15%                  3,961            19.44%            550.31
Caucasian + East Asian*    8%                   504              2.47%             131.29
Iranian (Persians)         8%                   487              2.39%             126.86

* According to the UAE census bureau, Caucasians and East Asians were consolidated into a single minority group.
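The source does not state how the "per 100,000" column was derived. One plausible reading, sketched below under that assumption, divides the number of registry cases in a group by the group's estimated population (its share of the national total) and scales to 100,000. The function name is hypothetical, and the 4.7 million total comes from the caption; with that total the results are close to, but a few percent higher than, the tabulated values, so the authors may have used a slightly different population estimate or method.

```python
UAE_POPULATION_2009 = 4_700_000  # total population quoted in the Table 2 caption


def cases_per_100k(
    cases: int,
    group_share: float,
    total_population: int = UAE_POPULATION_2009,
) -> float:
    """Registry cases per 100,000 members of an ethnic group.

    group_share is the group's fraction of the total population (0.19 for 19%).
    """
    group_population = group_share * total_population
    if group_population <= 0:
        raise ValueError("group population must be positive")
    return cases / group_population * 100_000


# UAE nationals: 4,214 registry cases, 19% of the population.
print(round(cases_per_100k(4214, 0.19), 2))
```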

Table 3: Characteristics of clinical data for 741 UAE national adults (Type 2 Diabetes, pre-Type 2 Diabetes and healthy individuals).

Category              Subcategory                Level                      n (%)
Physical Appearance   Gender                     Male                       470 (63.43%)
                                                 Female                     271 (36.57%)
                      Age (years)*               18-20                      5 (0.67%)
                                                 21-39                      246 (33.20%)
                                                 40-59                      396 (53.44%)
                                                 60+                        76 (10.26%)
                      Body Mass Index (BMI)      Underweight <18.50         27 (3.64%)
                                                 Normal range 18.50-24.99   329 (44.40%)
                                                 Overweight 25.00-29.99     294 (39.67%)
                                                 Obese ≥30.00               91 (12.28%)
                      Waist Circumference (WC)   Male ≤40 in                315 (42.51%)
                                                 Male >40 in                155 (20.92%)
                                                 Female ≤35 in              135 (18.22%)
                                                 Female >35 in              136 (18.35%)
Lifestyle             Smoking                    Yes                        185 (24.97%)
                                                 No                         556 (75.03%)
                      Physical Activity          Yes                        405 (54.66%)
                                                 No                         336 (45.34%)
                      Healthy Diet               Yes                        314 (42.38%)
                                                 No                         427 (57.62%)
Inheritance           Family History             Yes                        479 (64.64%)
                                                 No                         262 (35.36%)
                      Consanguineous Marriage    Yes                        259 (34.95%)
                                                 No                         482 (65.05%)
Biochemistry Test     Fasting Plasma Glucose     <100 mg/dl                 340 (45.88%)
                                                 100-125 mg/dl              247 (33.33%)
                                                 ≥126 mg/dl                 154 (20.78%)
                      Oral Glucose Tolerance     <140 mg/dl                 349 (47.10%)
                                                 ≥140 mg/dl                 224 (30.23%)
                      HbA1c                      <6.5%                      542 (73.14%)
                                                 ≥6.5%                      199 (26.86%)
                      Cholesterol                <200 mg/dl                 495 (66.80%)
                                                 ≥200 mg/dl                 246 (33.20%)
                      Serum Triglycerides        <150 mg/dl                 430 (58.03%)
                                                 ≥150 mg/dl                 311 (41.97%)
                      Urea                       <43 mg/dl                  707 (95.41%)
                                                 ≥43 mg/dl                  34 (4.59%)
                      Creatinine                 <1.3 mg/dl                 675 (91.09%)
                                                 ≥1.3 mg/dl                 66 (8.91%)

*The numbers in the age category do not total 741 because some individuals under 18 years old were not included in this study.

Table 4: Clinical and biochemical features of 391 patients diagnosed with T2D or pre-T2D and 350 healthy individuals. Values are mean ± standard deviation or percentages (%).

                                      Type 2 Diabetes                   Pre-Type 2 Diabetes               Healthy
Characteristic                        Male             Female           Male             Female           Male             Female
                                      n = 85           n = 83           n = 167          n = 56           n = 218          n = 132
Physical Appearance
  Age (years)                         51.75 ± 9.17     50.96 ± 11.01    37.94 ± 9.83     30.66 ± 9.17     48.39 ± 12.95    44.93 ± 9.93
  BMI (kg/m2)                         33.47 ± 7.21     31.94 ± 7.97     28.37 ± 6.75     29.11 ± 8.40     23.92 ± 4.01     23.84 ± 4.06
  Waist Circumference (inches)        41.42 ± 10.16    41.62 ± 6.78     44.52 ± 13.36    42.43 ± 11.41    33.71 ± 6.94     33.50 ± 8.29
Life Style (%)
  Smoking                             49.41            10.84            40.72            1.79             28.90            1.52
  Physical Activity                   32.94            28.92            28.74            12.50            87.16            81.82
  Diet                                23.53            16.87            3.59             5.36             75.69            80.30
Inheritance (%)
  Family History                      63.53            71.08            55.69            78.57            3.67             3.03
  Consanguineous Marriage             50.59            51.81            38.92            30.36            29.82            19.70
Biochemical Test
  Fasting Plasma Glucose (mg/dl)      179.91 ± 48.88   160.52 ± 38.99   105.37 ± 9.41    109.25 ± 7.78    89.77 ± 6.36     88.98 ± 7.54
  Impaired Glucose Tolerance (mg/dl)  -                -                159.61 ± 13.16   158.18 ± 14.03   99.73 ± 9.61     98.91 ± 8.37
  HbA1c (%)                           8.24 ± 1.73      7.86 ± 1.90      6.42 ± 1.48      5.98 ± 0.46      4.97 ± 0.63      4.92 ± 0.60
  Cholesterol (mg/dl)                 239.78 ± 35.95   227.98 ± 54.31   195.37 ± 15.42   203.13 ± 15.40   116.11 ± 19.48   122.43 ± 16.40
  Serum Triglycerides (mg/dl)         188.85 ± 35.32   178.11 ± 51.18   153.22 ± 17.98   155.98 ± 19.29   91.60 ± 19.16    100.16 ± 11.38
  Urea (mg/dl)                        43.47 ± 6.77     32.27 ± 7.35     25.17 ± 5.98     23.93 ± 5.92     21.95 ± 5.13     21.55 ± 5.31
  Creatinine (mg/dl)                  1.24 ± 0.16      1.04 ± 0.23      0.81 ± 0.19      0.80 ± 0.22      0.88 ± 0.13      0.85 ± 0.12

DISCUSSION

The importance of a thorough and well-maintained database for significant disease entities

cannot be overstated. Diabetes is an overwhelming healthcare problem throughout the world,

and studies in the UAE have shown that around 20% of the population has T2D.

The EFR was conceived as a resource to manage T2D in the UAE. It provides data that is

available through clinical testing and DNA screening. The data is stored in a systematic way

within a database and is coupled to a DNA and bio-bank repository to facilitate future

longitudinal studies. To the best of our knowledge such an effort has not been undertaken for

the Arab population. By breaking down the data in different ways, it is easy to establish what

particular strategy might be employed to improve disease management.

The process is the same for each patient and is carried out in the same way, which simplifies decision making for healthcare workers and ensures consistency in the material collected. As shown in Figure 1, the status of each individual is assessed at the initial consultation and specific protocols are followed. A specific questionnaire, together with an assessment of clinical information within the UAE healthcare database, allows for a first-pass screen to determine disease status. The volunteers are categorised according to disease status, and a decision is made as to the nature of the bio-specimens that need to be collected. Using T2D as a case in point, biochemical tests relating to glucose, triglycerides and others are requested, along with a sample for research purposes. Since there is a lack of information on the genetic factors that predispose Arabs to T2D, DNA samples are stored for present and future studies. The value of the EFR as a DNA data bank will increase over time as more volunteers are recruited and genetic studies are completed. It has, thus far, been quite successful, but by increasing the number of patients in the DNA pool we also increase our ability to identify gene polymorphisms that may be related to T2D in Arabs.

When T2D is identified in its early stages it can be treated, reducing the impact of the disease and the severity of its complications. However, there are even greater possibilities if the disease burden could be decreased through DNA testing. By having a large database of patients, we are more likely to determine the nature of the genetic polymorphisms that predispose to disease, and possibly the underlying mechanisms, giving rise to the potential for therapy. DNA research in general has come into its own only over the last two decades, but research in diabetes, and particularly an understanding of the genetic makeup associated with T2D in Arabs, is desperately needed at this time.

This database will give clinicians and researchers access to information that can make a tremendous difference to the nature of the treatment put in place to manage the disease. For example, as illustrated in Table 2, a higher number of females are affected by T2D in the UAE. Further, the prevalence of the disease increases with age (Figure 2), and specific physical attributes and lifestyle habits (Table 4) are associated with the disease. A complication related to T2D is metabolic syndrome. Hypercholesterolemia is one of many significant problems, with levels above 200 mg/dl indicative of disease (Tables 3 and 4). With this information, and in combination with other factors, physicians can monitor patients with hypercholesterolemia and be aware of signs that could indicate a progression to T2D.

Currently over 170 million people around the globe suffer from T2D. Most of these patients are middle aged; however, variations in this regard are not rare and are affected by factors such as lifestyle and heredity, as well as behavioural factors (24). In this study, Table 3 shows that young patients around 30 years of age with a large waist circumference have fasting blood sugar levels of 108 mg/dl or greater and HbA1c levels of 6.42% (data not shown). It is also noted in Figure 2 that the 40 to 59 year old group is the largest group, but the 20 to 39 year old group is not far behind, with 10% diagnosed with T2D.

The risk of T2D increases as an individual grows older, especially after the age of 45 years. It has been estimated that one out of five people aged 20 to 79 lives with this disease. Part of the reason is that as people grow older they tend to become less physically active, gradually lose muscle mass and gain weight (25). However, over recent years a dramatic rise in T2D among individuals in their 30s and 40s has been observed, and more children and teenagers are being diagnosed with the disease.

Public awareness can be increased using campaigns to reverse the alarming trend of increasing

prevalence among patients. Moreover, over the past decade it has been obvious that the

prevalence of T2D is increasing rapidly. Unless appropriate action is taken, it is predicted that

there will be at least 350 million people in the world with T2D by the year 2030 (26).

Risk factors for T2D are well defined. These include obesity, physical inactivity, older age, a family history of diabetes and impaired glucose tolerance. Table 4 illustrates the importance of maintaining a healthy physique (especially BMI and waist circumference) and lifestyle, and shows that biochemical testing to monitor physiological changes is essential. Abnormal fasting glucose above 126 mg/dl, triglycerides above 150 mg/dl, cholesterol above 200 mg/dl, and an elevated BMI and waist circumference can mean that the patient already has metabolic syndrome.

Previous studies have shown that a family history of T2D is a very important indicator for developing T2D (27, 28). Among the 741 UAE nationals who donated blood for this study, 63% of males and 71% of females who are diabetic have first-degree relatives with T2D, whereas only 3.6% of males and 6.06% of females without the disease have a history of T2D in their family. The EFR has focused on collecting DNA from families to study T2D on the premise that having one or more first-degree relatives with T2D increases the odds of having the disease. Further, the use of families provides a degree of redundancy within a registry. Over time, patients are lost to the system due to migration. These individuals can be readily tracked by contacting family members to discern their whereabouts.

Consanguineous marriages with first-degree relatives were more common among T2D patients than in the healthy group, an observation that is consistent with a study by Bener et al. (2005), which showed that consanguineous marriages were more prevalent in T2D patients (29). This study also confirms previous studies (28) showing that T2D is closely associated with overweight and obesity (BMI > 25). Okosun and colleagues (1998) showed that a large waist circumference is the strongest indicator of T2D risk (30). Data in Table 4 show that male patients have larger waist circumferences than their female counterparts.

Additionally, the data in Table 4 show that the prevalence of smoking is higher among T2D and pre-T2D patients than among healthy individuals. It has been suggested that smoking increases the risk of diabetes, but the evidence has been inconclusive. It is not surprising that smoking plays an important role, as there is evidence that smoking is bad for the pancreas, causes internal inflammation and increases the hormones that increase abdominal fat even in thin smokers, which could hamper the action of insulin (31). Table 5 summarises information to be used especially by those in primary care clinics. With access to a public health database such as the EFR, physicians can identify deficiencies and establish where diagnostic processes are poor, resulting in diabetics being missed. This in turn allows them to determine processes that might allow for earlier diagnosis and follow-up, and for the prevention of complications.

As with the association between disease and family relationships, ethnicity is another risk

factor. African Americans, Hispanic or Latino Americans, American Indians, and some Asian

Americans and Pacific Islanders are at particularly high risk for T2D (32). Some component of

this factor is most likely related to genes carried from earlier times, passed down through

generations. The data collected in the EFR and tabulated in Table 2 support this, showing

varying degrees of prevalence among populations of different races.

The nature of a DNA profile or genetic makeup is generally population specific and can provide leads toward best practice for care. To date, the nature of the genetic lesions that lead to T2D in Arabs is not known. One of the primary objectives of the EFR was to provide a resource to study the genes of indigenous Arab populations. The DNA repository, when coupled with longitudinal data, will provide opportunities for researchers to dissect different variations of the disease and for physicians to determine which long-term management procedures might be used for monitoring patients.

In summary, the data collected from volunteers through the EFR have revealed that obesity, large waist circumference, consanguineous marriage, family history, lack of physical activity, and an unhealthy diet with high total cholesterol and triglyceride levels were more prevalent in T2D patients. The p values for the comparisons between the healthy group and the pre-T2D group, and between the healthy group and the T2D group, are summarised in Table 5.

What is known is that the conditions that manifest as T2D have both monogenic and polygenic forms and can occur in a wide variety of variations. While the simple classification into Type 1 Diabetes and Type 2 Diabetes is helpful in unlocking the secrets of the disease, it has not resulted in the identification of clear-cut factors that distinguish the two forms of the disease in the Arab population. A more extensive period of continuous research is therefore required to understand the true nature of this disease. We believe that the longitudinal nature of the EFR will allow researchers to assess whether there are confounding environmental factors or whether a different set of genes accounts for earlier onset T2D.

Table 5: p values generated by Dunnett's Multiple Comparison Test between healthy individuals and pre-diabetic and diabetic patients.

                                      p value
Risk Factor                           Healthy vs. Pre-Diabetic    Healthy vs. Diabetic
Physical Appearance
  Age (years)                         0.0079                      0.0065
  BMI (kg/m2)                         0.0121                      0.0113
  Waist Circumference (inch)          0.0083                      0.0085
Biochemical Test
  Fasting Plasma Glucose (mg/dl)      0.0032                      0.0025
  HbA1c (%)                           0.0565                      0.0484
  Total Serum Cholesterol (mg/dl)     0.0020                      0.0018
  Serum Triglycerides (mg/dl)         0.0025                      0.0023
  Urea (mg/dl)                        0.0138                      0.0107
  Creatinine (mg/dl)                  0.2792                      0.2498

CONCLUSION

The Emirates Family Registry or EFR was developed in pursuit of several outcomes: (1)

studying lifestyle variables and other exposures that may be related to the development of

Diabetes Mellitus, (2) evaluating patient awareness about the disease and developing new

trends in disease prevention and management for the UAE; and (3) categorising patients and

their families based on disease complications which may imply different pathophysiology and

therefore different susceptibility genes.

The pilot programme of the EFR described here was successful. The data presented throughout this paper could not have been gathered any other way, as the tightly knit Bedouin communities, which are essentially closed to technological advances, could only be approached through key members at the upper end of the family hierarchy. The study has provided an initial dataset collected from a large number of volunteers (23,064). The information gained is useful in many ways, including for genome wide association studies to identify contributing polymorphisms; these data will be entered into the database when available. Analysis of the information within the database has revealed much about T2D in the UAE, allowing for the possibility of earlier diagnosis, treatment and intervention. This information could help in the diagnosis and treatment of diabetes even before the patient has symptoms, in the silent stage. It is extremely important to continue adding patients to the database as they are found and treated, as well as individuals who do not presently have the disease. This kind of study, and the continued collection of data, could lead to the genomic studies needed to control diabetes. This would be of great benefit to patients, families and the healthcare system of any country.

To date, the lack of genome wide association studies leaves very little to be discussed

regarding the genetic prevalence of diabetes in the Arab countries. Therefore, there is a

pressing need for the development of genome wide association studies for populations of the

UAE and other Arab nations.

ACKNOWLEDGEMENTS

Publication number HA09-0005 of the Centre for Forensic Science at the University of Western Australia. Ms Alsafar is a PhD scholar at the University of Western Australia supported by the Dubai Police General Headquarters in the United Arab Emirates. Ethics approval was obtained from the United Arab Emirates Ministry of Health committee. Funding for this project was provided by the Emirates Foundation. We would like to thank the Al-Baraha Hospital and the Dubai Police Clinic for assisting with the biochemical tests performed in this study.

REFERENCES

1. Leslie RD. Metabolic changes in diabetes. Eye (Lond). 1993;7 ( Pt 2):205-8.

2. Chandy A, Pawar B, John M, Isaac R. Association between diabetic nephropathy and

other diabetic microvascular and macrovascular complications. Saudi J Kidney Dis

Transpl. 2008 Nov;19(6):924-8.

3. American Diabetes Association: National Diabetes Fact

Sheet. Alexandria, VA, ADA. 2002.

4. Centers for Disease Control and Prevention. National Diabetes Fact Sheet, General

Information and National Estimates on Diabetes in the United States. Atlanta, U.S.:

Department of Health and Human Services, Centers for Disease Control and

Prevention; 2007.

5. Centers for Disease Control and Prevention Coordinating Center for Health Promotion.

Diabetes: Successes and Opportunities for Population-Based Prevention and Control

At-A-Glance; 2009.

6. Diagnosis and Classification of Diabetes Mellitus. Diabetes Care. 2004

Jan;27(suppl 1):s5-s10.

7. Eaks GA, Tiszka R. Chronic complications of diabetes: a creative management

approach. Nurse Pract Forum. 1998 Jun;9(2):74-86.

8. Goldstein I. The mutually reinforcing triad of depressive symptoms, cardiovascular

disease, and erectile dysfunction. Am J Cardiol. 2000 Jul 20;86(2A):41F-5F.

9. Pan WH, Cedres LB, Liu K, Dyer A, Schoenberger JA, Shekelle RB, et al. Relationship

of clinical diabetes and asymptomatic hyperglycemia to risk of coronary heart disease

mortality in men and women. Am J Epidemiol. 1986 Mar;123(3):504-16.

10. Uusitupa MI, Niskanen LK, Siitonen O, Voutilainen E, Pyorala K. 5-year incidence of

atherosclerotic vascular disease in relation to general risk factors, insulin level, and

abnormalities in lipoprotein composition in non-insulin-dependent diabetic and

nondiabetic subjects. Circulation. 1990 Jul;82(1):27-36.

11. Kannel WB, D'Agostino RB, Wilson PW, Belanger AJ, Gagnon DR. Diabetes,

fibrinogen, and risk of cardiovascular disease: the Framingham experience. Am Heart J.

1990 Sep;120(3):672-6.

12. Laakso M, Kuusisto J. Epidemiological evidence for the association of hyperglycaemia

and atherosclerotic vascular disease in non-insulin-dependent diabetes mellitus. Ann

Med. 1996 Oct;28(5):415-8.

13. Malecki MT, Klupa T. Type 2 diabetes mellitus: from genes to disease. Pharmacol Rep.

2005;57 Suppl:20-32.

14. El-Sharkawy T. Diabetes in the United Arab Emirates and Other Arab Countries: need

for Epidemiological and Genetic Studies. Genetic Disorders in the Arab World. Dubai:

Centre for Arab Genomic Studies; 2004. p. 57.

15. Expat numbers rise rapidly as UAE population touches 6m: Department of Economic

and Social Affairs Population Division; 2009.

16. Malik M, Bakir A, Saab BA, King H. Glucose intolerance and associated factors in the

multi-ethnic population of the United Arab Emirates: results of a national survey.

Diabetes Res Clin Pract. 2005 Aug;69(2):188-95.

17. Reed RL, Revel AD, Carter AO, Saadi HF, Dunn EV. A controlled before-after trial of

structured diabetes care in primary health centres in a newly developed country. Int J

Qual Health Care. 2005 Aug 1;17(4):281-6.

18. Saadi H, Carruthers SG, Nagelkerke N, Al-Maskari F, Afandi B, Reed R, et al.

Prevalence of diabetes mellitus and its complications in a population-based sample in Al

Ain, United Arab Emirates. Diabetes Res Clin Pract. 2007 Dec;78(3):369-77.

19. Niazi TN, Cannon-Albright LA, Couldwell WT. Utah Population Database: a tool to

study the hereditary element of nonsyndromic neurosurgical diseases. Neurosurg Focus.

Jan;28(1):E1.

20. Nystrom L, Dahlquist G, Ostman J, Wall S, Arnqvist H, Blohme G, et al. Risk of

developing insulin-dependent diabetes mellitus (IDDM) before 35 years of age:

indications of climatological determinants for age at onset. Int J Epidemiol. 1992

Apr;21(2):352-8.

21. Phillips P, Wilson D, Beilby J, Taylor A, Rosenfeld E, Hill W, et al. Diabetes

complications and risk factors in an Australian population. How well are they managed?

Int J Epidemiol. 1998 Oct;27(5):853-9.

22. Sekikawa A, Eguchi H, Tominaga M, Manaka H, Sasaki H, Chang YF, et al. Evaluating

the reported prevalence of type 2 diabetes mellitus by the Oguni diabetes registry using a

two-sample method of capture-recapture. Int J Epidemiol. 1999 Jun;28(3):498-501.

23. Villegas R, Shu XO, Li H, Yang G, Matthews CE, Leitzmann M, et al. Physical activity

and the incidence of type 2 diabetes in the Shanghai women's health study. Int J

Epidemiol. 2006 Dec;35(6):1553-62.

24. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, et al. A genome-

wide association study of type 2 diabetes in Finns detects multiple susceptibility

variants. Science. 2007 Jun 1;316(5829):1341-5.

25. Oguma Y, Sesso HD, Paffenbarger RS, Jr., Lee IM. Weight change and risk of

developing type 2 diabetes. Obes Res. 2005 May;13(5):945-51.

26. Wild S, Roglic G, Green A, Sicree R, King H. Global prevalence of diabetes: estimates

for the year 2000 and projections for 2030. Diabetes Care. 2004 May;27(5):1047-53.

27. de Costa CM. Consanguineous marriage and its relevance to obstetric practice. Obstet

Gynecol Surv. 2002 Aug;57(8):530-6.

28. Chen Y, Rennie DC, Dosman JA. Synergy of BMI and family history on diabetes: the

Humboldt Study. Public Health Nutr. 2009 Aug 26:1-5.

29. Bener A, Zirie M, Al-Rikabi A. Genetics, obesity, and environmental risk factors

associated with type 2 diabetes. Croat Med J. 2005 Apr;46(2):302-7.

30. Okosun IS, Cooper RS, Rotimi CN, Osotimehin B, Forrester T. Association of waist

circumference with risk of hypertension and type 2 diabetes in Nigerians, Jamaicans,

and African-Americans. Diabetes Care. 1998 November 1998;21(11):1836-42.

31. Ding EL, Hu FB. Smoking and type 2 diabetes: underrecognized risks and disease

burden. Jama. 2007 Dec 12;298(22):2675-6.

32. Centers for Disease Control and Prevention, National Diabetes Fact Sheet: General

Information and National Estimates on Diabetes in the United States. Department of

Health and Human Services. 2005.


CHAPTER 3

HERITABILITY OF QUANTITATIVE TRAITS

ASSOCIATED WITH TYPE 2 DIABETES IN AN

EXTENDED FAMILY FROM THE UNITED ARAB

EMIRATES.

This chapter was submitted to the International Journal of Diabetes and Metabolism in the

recommended format presented in the "Instruction to Authors" from the publishing house.


Chapter 3

Heritability of Quantitative Traits Associated with Type 2

Diabetes in an Extended Family from the United Arab

Emirates.

Chapter 3 was prepared as a manuscript which was submitted to International Journal of

Diabetes and Metabolism. In this chapter, the influence of environmental factors in the

pathophysiology of Type 2 Diabetes (T2D) and its related phenotypes in an Arab population

was examined.

Multiple factors, both environmental and genetic, contribute to the incidence and distribution
of T2D. This study therefore describes the role of genes and the influence of the
environment on the increasing prevalence of Type 2 Diabetes in Arab populations. It
expands on a study presented by Mathias and colleagues in a 2009 edition of Metabolism:
Clinical and Experimental (58(10):1439-45). As the incidence of Type 2 Diabetes is increasing
at an alarming rate, an appreciation of the contributing factors will assist in improving
management strategies.

Physical and clinical traits were collected for assessment. Pairwise phenotypic correlations
among the eight quantitative traits were observed, most notably between glycated hemoglobin
(HbA1c) and fasting glucose. This assessment of phenotypic factors will be followed up with
ongoing studies to evaluate the contribution of genetic polymorphisms to the prevalence of
T2D in Arab populations.

Diet and lifestyle factors (smoking, exercise, etc.) are known to play a role in T2D.
Assessment of the quantitative traits collected in this study showed significant contributions by
factors such as Body Mass Index (BMI) and waist circumference (p ≤ 1 × 10⁻⁶). There were
other suggestive traits (cholesterol and creatinine levels; p < 0.05). Although phenotype studies
provide some insight, matching genetic studies will augment the understanding of disease
mechanisms. Towards this, the first Genome Wide Association Study in Bedouins was
performed on 178 volunteers from the EFR project's DNA repository using Illumina's Human
660W-Quad BeadChip. The outcomes of the GWAS are discussed in Chapters 6 and 7.

This manuscript was prepared by myself with support from the co-authors listed. Drs Cordell,
Blackwell and Jamieson guided me through the statistical analysis and provided me with
valuable comments and feedback. Dr Tay guided me throughout the study, from designing the
study to proofreading the manuscript.


Heritability of Quantitative Traits Associated with Type 2 Diabetes in an Extended

Family from the United Arab Emirates

Habiba S. Al Safar¹,², Sarra E. Jamieson³, Heather J. Cordell⁴, Jenefer M. Blackwell³,⁵, Guan
K. Tay¹

¹ Centre for Forensic Science, The University of Western Australia, Crawley, Western
Australia.
² Dubai Police General Head Quarters, Dubai, United Arab Emirates.
³ Telethon Institute for Child Health Research, Centre for Child Health Research, The
University of Western Australia, Subiaco, Western Australia.
⁴ Institute of Human Genetics, Newcastle University, Newcastle upon Tyne, United
Kingdom.
⁵ Cambridge Institute for Medical Research and Department of Medicine, School of Clinical
Medicine, University of Cambridge, Cambridge, United Kingdom.

Abbreviated title: Type 2 Diabetes, United Arab Emirates

Keywords: Heritability, Quantitative Trait, Type 2 Diabetes


Western Australia






Phone: + 61 8 6488 7286

Fax: + 61 8 6488 7285



ABSTRACT

The prevalence of Type 2 Diabetes (T2D) in the United Arab Emirates (UAE) is steadily
increasing, posing a major public health problem. This study assessed the value of specific
clinical markers for T2D among five generations of an extended Arab family. This family
included 319 members of 41 nuclear families, from which 178 individuals (86 males, 92
females; 66 diabetic, 112 healthy) formed the study sample set. The ages of the participants
ranged from 4 to 88 years. All participants completed a questionnaire that focused on baseline
factors that have previously been associated with T2D, such as diet, smoking, and family
history of the disease. The quantitative traits fasting glucose, glycated hemoglobin (HbA1c),
cholesterol, triglyceride, urea and creatinine levels were measured. Body mass index (BMI)
and waist circumference were also recorded. The heritability of these eight quantitative traits
was determined, with values ranging from 6% to 48%. We found a significant relationship
between T2D diagnosis and waist circumference (p = 2.6 × 10⁻⁹) and BMI (p = 1.0 × 10⁻⁶). The
estimated power for these two traits was 80% and 90%, respectively. Creatinine (p = 0.002) and
cholesterol (p = 0.02) levels were also associated with T2D. Our results support the link
between environmental and genetic factors in the pathophysiology of T2D and its related
phenotypes in an Arab population.


INTRODUCTION

Type 2 Diabetes (T2D) is one of the most widespread chronic diseases, contributing to
severe illness and ultimately to the death of millions of people worldwide. According to
the International Diabetes Federation, the number of people diagnosed with T2D has risen
over the past twenty years from 30 million to more than 246 million (1, 2). In the Middle
East, 12% to 20% of the population suffers from diabetes. This figure increases every year,
along with the rising costs associated with health care provision (3). A Ministry of Health
survey in 1999 and 2000 reported that 19.6% of people in the United Arab Emirates (UAE)
were diagnosed with diabetes. More recent studies have estimated that 25% of adult Arabs
suffer from T2D, and the prevalence of the disease is increasing. In 2007, the UAE population
had the second-highest incidence of diabetes in the world. In this country, an estimated one in
five people aged 20 to 79 years lives with diabetes, while a similar percentage
of the population is at risk of developing the disease.

A range of factors contribute to the risk of T2D, particularly obesity, physical
inactivity, age, ethnicity, history of gestational diabetes, impaired glucose tolerance, and a
familial history of diabetes (4). The prevalence of diabetes varies between different
populations. Approximately 5% of Asian populations are affected, while at the top end of this
spectrum almost 50% of the Pima Indian population suffers from diabetes (5-7).
Researchers have noted high rates of new T2D cases each year among youth in the United
States: African-American (39 per 100,000), Hispanic-American (29 per 100,000),
American Indian (45 per 100,000), and to a lesser extent Asian-American and Pacific Island
populations (24 per 100,000) (8).

Multiple factors, both environmental and genetic, contribute to the incidence and distribution
of T2D. Urbanisation and concomitant changes in lifestyle have been linked to the prevalence
of the disease (9). For instance, the incidence of T2D is very low in some rural populations
such as the Mapuche Indians of Chile and rural Chinese groups, indicating the role of
environmental factors (10). Some of the highest incidences of T2D, however, have been
among the Pima Indians of Arizona and the Nauruans of the Pacific island of Nauru, suggesting
the importance of genetic factors in the development of the condition (10).


The increasing prevalence of T2D in the UAE appears to follow similar trends. Families
among the indigenous tribes show varying degrees of predisposition to the disease. With
widespread urbanisation in the Middle East over the past century, environmental factors
increasingly exert an influence. In this report, we estimate the heritability of traits associated
with T2D in an extended family from the UAE. This assessment of phenotypic factors will be
followed up with ongoing studies to evaluate the contribution of genetic polymorphisms to
the prevalence of T2D in Arab populations.

MATERIALS AND METHODS

Subjects

Major hospitals and primary care centers in the UAE were contacted to establish a
collaborative recruiting network for this study. The study was performed with the approval of
the ethical review committee of the United Arab Emirates Ministry of Health. Through this
collaboration, doctor-diagnosed data collected through one-on-one interviews of T2D patients
(and healthy controls) were evaluated. Clinical assessment and questionnaire completion were
conducted at the clinic. Subsequently, 319 individuals belonging to one extended family of
Bedouin origin were identified. Multigeneration family relationships were compiled for these
individuals, and the pedigree of the five-generation extended family was constructed from 41
nuclear families. A total of 178 individuals from this sample agreed to participate in this study.

Physical attributes

The age, waist circumference and body mass index (BMI) of each volunteer were recorded.

Biochemical testing

All biochemical tests were performed at the Al-Baraha Hospital, Dubai, UAE, using the Cobas

Integra 800 clinical chemistry system (Roche Diagnostics, Indianapolis, IN, USA). Peripheral

blood was collected from the 178 individuals in EDTA, heparin and fluoride vacutainers. The

heparin and fluoride tubes were centrifuged at 3,000 rpm for 5 minutes. Serum from the

fluoride tubes was aspirated off to measure fasting glucose, cholesterol and impaired glucose

tolerance, while serum from the heparin tubes was used to measure triglycerides, urea and

creatinine levels. HbA1c was measured with 25 µl of blood from the EDTA tubes. An

individual was classified as diabetic if the subject: (1) was diagnosed with the disease by a

qualified physician; (2) had been prescribed drug treatment for diabetes; and/or (3) met the

fasting plasma glucose criterion of ≥ 126 mg/dl set by the World Health Organisation (WHO).
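The three classification criteria above can be expressed as a short rule. The following is a minimal sketch only; the function and parameter names are hypothetical and are not taken from the study's records or software.

```python
# Fasting plasma glucose threshold (mg/dl) quoted in the text.
FASTING_GLUCOSE_THRESHOLD_MG_DL = 126.0


def is_diabetic(physician_diagnosis, on_diabetes_medication, fasting_glucose_mg_dl=None):
    """Return True if any one of the three criteria in the text is met.

    All argument names are illustrative assumptions. A missing glucose
    measurement (None) simply does not count towards criterion (3).
    """
    meets_glucose_criterion = (
        fasting_glucose_mg_dl is not None
        and fasting_glucose_mg_dl >= FASTING_GLUCOSE_THRESHOLD_MG_DL
    )
    return bool(physician_diagnosis or on_diabetes_medication or meets_glucose_criterion)
```

Because the criteria are joined by "and/or", a subject with a normal fasting glucose value is still classified as diabetic when a physician's diagnosis or a drug prescription is on record.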


Raw phenotypic data were transformed and adjusted for age and sex. The transformations,
quantile-quantile (QQ) plots and histograms were generated with version 11 of the
STATA statistical software (College Station, TX, USA). To achieve a normal distribution, the
quantitative trait data were log-transformed. Heritability and power estimates were calculated
for each trait using SOLAR version 4 (11). Pairwise correlations between all phenotypic pairs
were calculated using STATA.
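The preprocessing described above (log transformation, adjustment for age and sex, pairwise correlation) can be illustrated with a small sketch. This is not the STATA procedure used in the study: it assumes that adjustment means taking residuals from an ordinary least-squares regression of the log-transformed trait on age and sex, and all function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjust_log_trait(values, ages, sexes):
    """Log-transform a trait and return residuals after regressing on age and sex.

    Assumption: sex is coded 0/1 and the model is log(trait) = b0 + b1*age + b2*sex.
    """
    y = [math.log(v) for v in values]
    x = [[1.0, float(age), float(sex)] for age, sex in zip(ages, sexes)]
    k = 3
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return [yi - sum(b * xi for b, xi in zip(beta, row)) for row, yi in zip(x, y)]


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)
```

Applying `adjust_log_trait` to each trait and then `pearson` to every pair of residual vectors would give a correlation matrix laid out like Table 3, under the stated assumptions.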


RESULTS

The study population included 66 subjects with T2D and 112 healthy subjects; 86 were male

and 92 were female, ranging from 4 to 97 years of age. The mean age of the cohort was 37

years. The means and standard deviations of the eight quantitative traits used in this study are

presented in Table 1.

Table 2 shows the estimated heritability and power for the eight traits, calculated with SOLAR
to evaluate the influence of the genetic component on phenotypic variation. All traits showed
moderate to high familial aggregation, with heritability estimates ranging from 6% to 48%.
Waist circumference, BMI, creatinine and cholesterol levels showed significant levels of
heritability (p < 0.05), while the p values were greater than 0.05 for triglyceride, fasting
glucose, HbA1c and urea levels. Waist circumference (44% heritability) and BMI (48%
heritability) had the highest heritability estimates among the eight traits, with estimated
power above 80% and 90%, respectively. Fasting glucose (36% heritability) and HbA1c (6%
heritability) were the only traits that were directly related to T2D.
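The footnote to Table 2 states that the p values come from a likelihood ratio test comparing polygenic and sporadic models. As a hedged illustration, the sketch below assumes the convention commonly used in variance-components analysis: because heritability cannot be negative, the test statistic is referred to a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom, so the p value is half the one-degree-of-freedom tail probability. Whether every value in Table 2 was obtained exactly this way is an assumption, and the function name is hypothetical.

```python
import math


def heritability_lrt_p_value(chi_square):
    """p value for H0: heritability = 0, given the likelihood ratio statistic.

    Assumes the statistic follows a 50:50 mixture of a point mass at 0 and a
    chi-square distribution with 1 degree of freedom. The chi-square(1) tail
    probability equals erfc(sqrt(x / 2)).
    """
    if chi_square <= 0:
        return 1.0  # no improvement over the sporadic model
    return 0.5 * math.erfc(math.sqrt(chi_square / 2.0))
```

Under this assumption, the statistic reported for fasting glucose (1.63) gives a p value of about 0.10 and the statistic for waist circumference (34.04) gives a value between 2 × 10⁻⁹ and 3 × 10⁻⁹, both close to the values listed in Table 2.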

Table 3 presents the pairwise phenotypic correlations of the eight quantitative traits. The
highest phenotypic correlation observed in this study was that between fasting glucose and
HbA1c (0.89). Another significant pairwise correlation was between BMI and waist
circumference (0.70), which is related to obesity. There was also a phenotypic correlation
between waist circumference and both fasting glucose (0.52) and HbA1c (0.40), both of which
are related to obesity.


Table 1: Phenotypic and clinical characteristics of 178 individuals belonging to an
extended family of Arab origin.

Description                      Number
Males                            86
Females                          92
Type 2 Diabetes                  66
Healthy                          112

Variable                         Mean ± SD
Physical Appearance
  Age (years)                    37.35 ± 19.24
  Waist circumference (inches)   38.41 ± 7.75
  Body mass index (BMI)          29.48 ± 7.97
Biochemical Tests
  Creatinine (mg/dl)             0.96 ± 0.25
  Cholesterol (mg/dl)            177.19 ± 62.23
  Triglyceride (mg/dl)           148.24 ± 83.04
  Fasting glucose (mg/dl)        117.32 ± 44.14
  Urea (mg/dl)                   26.24 ± 8.21
  HbA1c (%)                      5.73 ± 1.38


Table 2: Heritability and estimated power to obtain a LOD score of 3 for eight
quantitative traits in 178 individuals. Values have been adjusted for sex and
age. Significant p values (p < 0.05) are marked with an asterisk.

Trait                 H2r (a)   p value (a)     Chi-square (a)   Power estimate
Waist circumference   0.44      2.6 × 10⁻⁹ *    34.04            > 80%
Body mass index       0.48      1.0 × 10⁻⁶ *    28.01            > 90%
Creatinine            0.28      2.0 × 10⁻³ *    7.60             > 20%
Cholesterol           0.18      0.02 *          3.59             > 10%
Triglyceride          0.14      0.06            2.28             > 10%
Fasting glucose       0.36      0.10            1.63             > 50%
Urea                  0.10      0.11            1.49             > 10%
HbA1c                 0.06      0.36            0.11             > 10%

(a) Heritability (H2r), p and chi-square values were obtained with tests on transformed
quantitative trait data. The chi-square and p values relate to the likelihood ratio test comparing
polygenic models to sporadic models.


Table 3: Pairwise correlation between diabetes-related phenotypic traits in 178 individuals.

                      Waist                                                        Fasting
                      circumference  BMI    Creatinine  Cholesterol  Triglyceride  glucose  Urea   HbA1c
Waist circumference   1
BMI                   0.70           1
Creatinine            0.20           0.18   1
Cholesterol           0.21           0.18   0.29        1
Triglyceride          0.23           0.29   0.20        0.31         1
Fasting glucose       0.52           0.26   0.22        0.24         0.24          1
Urea                  0.01           0.09   0.29        0.13         0.07          0.14     1
HbA1c                 0.40           0.28   0.22        -0.04        0.14          0.89     0.16   1


DISCUSSION

Our study of T2D in an extended family of Arab origin provides insights into the roles of
genetic predisposition and environmental influence in the rising prevalence of T2D in Arab
populations. We found strong phenotypic correlations between fasting glucose levels and
HbA1c, and between these two traits and waist circumference. Our findings also indicate a
heritable tendency for obesity in this family, as shown by waist circumference and BMI
values. The heritability of these traits therefore suggests a contribution of genetic factors to
the prevalence of T2D in this population. Obesity results from a combination of genetic and
environmental factors that appear to play a significant role in the development of T2D in this
sample. A major and prevalent public health problem, obesity is associated with numerous
conditions such as hypertension, T2D, coronary heart disease and cancer.

Wide ranges of heritability have been reported for these traits in other populations. Mathias

and colleagues (12) found moderate to high familial aggregation for the traits tested in this

study in a south Indian population, with heritability ranging from 21% to 72%.

Anthropometric measures such as height, weight and BMI showed the highest heritability in

their study, and the results in Arabs shown here are consistent with this finding. The

researchers also found strong correlations between genetic and environmental effects for the

measures most directly related to T2D, especially between fasting insulin levels and

anthropometric measures. However, only two pairs of traits showed evidence for complete

pleiotropy: waist circumference was correlated with BMI and fasting insulin levels. These

results suggest that common genes may exert an influence on obesity and insulin levels in

these pedigrees (12).

Studies conducted by the Framingham Heart Study group estimated the heritability of
anthropometric and biochemical traits in a Caucasian population (13-15). The
anthropometric trends in this Arab study were similar to those shown in the
Framingham studies, which found heritability estimates for height (0.52 ± 0.09 to 0.88 ± 0.06),
weight (0.42 ± 0.10 to 0.56 ± 0.50) and BMI (0.46 ± 0.10 to 0.49 ± 0.06). However, the
heritability of cholesterol (0.51 ± 0.04) and triglyceride (0.56) levels was much higher than in
the Arab population studied. Their heritability results for fasting blood glucose (0.17 ± 0.04 to
0.39) were similar to those observed in the Arab study.


In summary, this study supports the influence of both environmental and genetic factors in the
pathophysiology of T2D and its related phenotypes in an Arab population. Waist
circumference and BMI may play a more prominent role in the development of diabetes in this
population. The results presented show a strong familial aggregation of quantitative traits
associated with T2D. Further studies are underway to identify genetic loci that may be specific
to Arab populations.


ACKNOWLEDGMENT


Western Australia. We gratefully acknowledge the family whose cooperation made this study

possible. We would also like to thank Richard Francis at the Telethon Institute for Child Health

Research for his support, which allowed the statistical work for this study to be carried out.

Ms Alsafar is a PhD scholar at the University of Western Australia supported by the Dubai

Police General Head Quarters in the United Arab Emirates. Funding for this project was

provided by the Emirates Foundation.


REFERENCES

1. Dunstan DW, Zimmet PZ, Welborn TA, De Courten MP, Cameron AJ, Sicree RA, et

al. The rising prevalence of diabetes and impaired glucose tolerance: the Australian

Diabetes, Obesity and Lifestyle Study. Diabetes Care. 2002;25:829-34.

2. Sicree R, Shaw J, and Zimmet P, editors. Diabetes and impaired glucose tolerance. 3rd

edition. Brussels; 2006.

3. International Diabetes Federation. Diabetes Atlas. 2006.

4. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, et al. A genome-

wide association study of type 2 diabetes in Finns detects multiple susceptibility

variants. Science. 2007;316:1341-5.

5. Knowler WC, Bennett PH, Hamman RF, Miller M. Diabetes incidence and prevalence

in Pima Indians: a 19-fold greater incidence than in Rochester, Minnesota. Am J

Epidemiol. 1978;108:497-505.

6. Pavkov ME, Hanson RL, Knowler WC, Bennett PH, Krakoff J, Nelson RG. Changing

patterns of type 2 diabetes incidence among Pima Indians. Diabetes Care.

2007;30:1758-63.

7. Yang X, Pratley RE, Tokraks S, Bogardus C, Permana PA. Microarray profiling of

skeletal muscle tissues from equally obese, non-diabetic insulin-sensitive and insulin-

resistant Pima Indians. Diabetologia. 2002;45:1584-93.

8. Centers for Disease Control and Prevention. National Diabetes Fact Sheet: General

Information and National Estimates on Diabetes in the United States. Department of

Health and Human Services. 2005.

9. Elsharkawy T. Diabetes in the United Arab Emirates and other Arab countries: need

for epidemiological and genetic studies. Genetic Disorders in the Arab World, United

Arab Emirates: Centre for Arab Genomic Studies; 2004. p. 57.

10. O'Rahilly S, Barroso I, Wareham NJ. Genetic factors in type 2 diabetes: the end of the

beginning? Science. 2005;307:370-3.

11. Almasy L, Blangero J. Multipoint quantitative-trait linkage analysis in general

pedigrees. Am J Hum Genet. 1998;62:1198-211.


12. Mathias RA, Deepa M, Deepa R, Wilson AF, Mohan V. Heritability of quantitative

traits associated with type 2 diabetes mellitus in large multiplex families from South

India. Metabolism. 2009;58:1439-45.

13. Brown WM, Beck SR, Lange EM, Davis CC, Kay CM, Langefeld CD, et al. Age-

stratified heritability estimation in the Framingham Heart Study families. BMC Genet.

2003;4:S32.

14. Mathias RA, Roy-Gagnon MH, Justice CM, Papanicolaou GJ, Fan YT, Pugh EW, et

al. Comparison of year-of-exam- and age-matched estimates of heritability in the

Framingham Heart Study data. BMC Genet. 2003;4:S36.

15. McQueen MB, Bertram L, Rimm EB, Blacker D, Santangelo SL. A QTL genome scan

of the metabolic syndrome and its component traits. BMC Genet. 2003;4:S96.


CHAPTER 4

EVALUATION OF DIFFERENT SOURCES OF DNA

FOR USE IN GENOME WIDE STUDIES

This chapter has been published in Applied Microbiology and Biotechnology according to the

format prescribed by the journal.


Chapter 4

Evaluation of Different Sources of DNA for use in Genome

Wide Studies

Chapter 4 is presented as a manuscript submitted to Applied Microbiology and Biotechnology

Journal. The version of the manuscript presented in this chapter has been corrected after

receiving comments from the editor and reviewers of Applied Microbiology and

Biotechnology. The amended manuscript has been returned to the journal's editor for

publication.

As part of the overall effort, we established a DNA repository with the clinical database to
allow (1) association studies between genotype and phenotype and (2) longitudinal studies
for future work. DNA was collected using the traditional methods of extraction. A new
method was assessed to allow collection in remote regions and developing countries. From
our background in forensic science, we use FTA™ for STR analysis. This study describes
another use of FTA™ technology. FTA™ cards were developed by Whatman, a company
which has a respectable track record in filter paper technology and application. The FTA™ card
system incorporates a chemical preservative that allows in-field collection of biological
samples. Applications in forensic science (blood, saliva and semen collection and storage) as
well as conservation biology (storage of DNA from endangered species) have been reported.
DNA samples stored for over 11 years have been successfully amplified for analysis. Storage of
DNA on cards at ambient temperatures represents a substantial saving in infrastructure costs.
This manuscript describes the storage of DNA and a Whole Genome Amplification step prior
to GWAS genotyping as an alternative strategy for collecting and storing bio-specimens
for high throughput genotyping.

The use of FTA™ to store DNA for genomic applications is becoming more common.
Whatman reported the successful use of DNA from FTA™ to genotype 1,516 SNPs using
Illumina's GoldenGate platform and subsequently studied 10,000 SNPs using Affymetrix. In
2008, the Hunt Biobank group collected samples on FTA™ paper for future genotyping
applications.

This study expands on a study published in BMC Research Notes by McClure et al. (2009), in
which DNA extracted from cells on FTA™ cards was used to genotype 54,122 cattle SNPs. In this
study, three different sources of DNA (degraded genomic DNA, amplified degraded genomic DNA
and amplified DNA extracted from FTA™ cards) were assessed as templates for genome-wide
analysis using Illumina's Human 660W-Quad BeadChip, which contains 12 times the number of
markers (i.e. 660,000 SNPs). To the best of our knowledge, this is the first
description of FTA™-sourced DNA for high throughput genotyping to study human
polymorphisms.

This manuscript was prepared by myself with support from the co-authors listed. All the
laboratory work at the Central Veterinary Research Laboratory (CVRL), including DNA
extraction and whole genome amplification, was performed by myself. Genotyping was
performed with technical assistance from Dr Abidi, under the guidance of Dr Khazanehdari.
The manuscript was proofread by Dr Dadour, co-advisor to my PhD project. Dr Tay, my
principal advisor, guided me throughout the study, from design to proofreading the
manuscript.


Evaluation of Different Sources of DNA for use in Genome Wide Studies and Forensic

Application

Habiba S Al Safar¹,², Fatima H Abidi³, Kamal A Khazanehdari³, Ian R Dadour¹, Guan K Tay¹

¹ Centre for Forensic Science, the University of Western Australia, Western Australia,
Australia
² Dubai Police General Head Quarters, Dubai, United Arab Emirates
³ Molecular Biology & Genetics, Central Veterinary Research Laboratory, Dubai, United
Arab Emirates

Abbreviated title: Genome Wide Studies

Keywords: FTA, GWAS, DNA quality


Western Australia.






Phone: + 61 8 6488 7286

Fax: + 61 8 6488 7285



ABSTRACT

In the field of epidemiology, Genome-Wide Association Studies (GWAS) are commonly used
to identify genetic predispositions to many human diseases. Large repositories housing
biological specimens for clinical and genetic investigations have been established to store
material and data for these studies. The logistics of specimen collection and sample storage
can be onerous, and new strategies have to be explored. This study examines three different
DNA sources (namely, degraded genomic DNA, amplified degraded genomic DNA and
amplified DNA extracted from FTA™ cards) for GWAS using the Illumina platform. No
significant difference in call rate was detected between amplified degraded genomic DNA
extracted from whole blood and amplified DNA retrieved from FTA™ cards. However, using
unamplified degraded genomic DNA reduced the call rate to a mean of 42.6%, compared with
a mean of 96.6% for amplified DNA extracted from FTA™ cards. This study establishes the
utility of FTA™ cards as a viable storage matrix for cells from which DNA can be extracted to
perform GWAS analysis.
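The comparison above rests on the call rate, taken here to mean the proportion of SNP assays on the array that yield a genotype call for a sample. The sketch below shows that calculation under stated assumptions: genotypes are given as a list of strings, and the no-call symbol "--" and the function names are hypothetical rather than taken from the genotyping software used in the study.

```python
def call_rate(genotypes, no_call="--"):
    """Return the fraction of SNP assays that produced a genotype call.

    `genotypes` holds one string per SNP (for example "AA", "AB", "BB");
    the `no_call` marker is an illustrative assumption.
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    called = sum(1 for genotype in genotypes if genotype != no_call)
    return called / len(genotypes)


def mean_call_rate(samples, no_call="--"):
    """Average the per-sample call rates across a set of samples."""
    rates = [call_rate(genotypes, no_call) for genotypes in samples]
    return sum(rates) / len(rates)
```

A mean call rate for each DNA source, such as the 42.6% and 96.6% quoted above, would then be the average of the per-sample rates for the samples prepared from that source.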


INTRODUCTION

The collection of biological samples on paper matrices is a common and routine practice. For
example, the use of Guthrie spots on filter paper to store and transport blood samples taken
from newborns by the heel prick method is standard practice. With advances in molecular
techniques, specific preservatives and novel extraction chemicals have been developed to
enhance paper. Flinders Technology Associates (FTA™) technology was developed to simplify
the collection, shipment and archiving of a wide variety of biological specimens. It comprises
a cellulose-based matrix containing chemicals (formamide, citrate and Trizma-base) for cell
lysis and nucleic acid preservation (Moscoso et al., 2004). Chemical activation occurs when a
biological fluid comes into contact with the FTA™ surface. The preservatives on the FTA™
matrix inactivate bacteria and viruses, thus protecting the biological samples from microbial
growth and contamination. Further, users collecting biological specimens for DNA are
protected from hazardous microbes that may be present in the specimen. FTA™ technology has
also been used in a number of animal tissue culture applications. For example, it has been
used to safely transport samples infected by foot-and-mouth disease virus (FMDV)
(Muthukrishnan et al., 2008). FTA™ paper also provides the advantage of sample storage at
ambient room temperatures.

FTA™ paper has been commonly used as a matrix for DNA storage in a number of
disciplines, particularly in the pharmaceutical sector (Martins et al., 2002; Tolunay et al.,
2006), law enforcement groups (Raina and Dogra, 2002; Tack et al., 2007), agriculture
(Crabbe, 2003; Ndunguru et al., 2005) and regulatory agencies. In the field of forensic science,
FTA™ technology has excelled (Harvey, 2005; Yoshihiko and Shin-ichi, 2006). The
simplicity of the collection technique, its adaptability to the range of biological specimens
encountered at potential crime scenes and the ease of storage have made the technology the
preferred evidence collection method. It has been shown that specimens stored on FTA™
paper have a long shelf life, with DNA samples recovered from FTA™ after more than 17 years
of storage used for reliable human identification (Ndunguru et al., 2005).

On the forensics front, the use of microsatellite short tandem repeats (STR) for DNA profiling,
first developed by Jeffreys et al. (1985), has been invaluable (Gill et al., 1985). However, with
advances in genome science, new opportunities continue to be considered (Foster et al., 1998).
The large amount of data from Genome-Wide Association Studies (GWAS), once a hindrance
to applied work such as criminal profiling, will become more manageable with the
development of analysis, visualisation and interrogation software. Here, we have shown that
FTA™-archived DNA can be used in GWAS. Consequently, current DNA storage procedures
in forensics would remain acceptable in the event that GWAS or similar genomic methods are
adopted for criminal profiling.

In 2008, the Hunt BioSciences study in Norway, which commenced in 1984, used FTA™
technology for storage of DNA. The study comprises population-based
epidemiological health studies which have focused on factors that predispose to diabetes and
breast cancer. There are some 75,000 participants, with a participation rate of 88%. In their
third, ongoing survey (HUNT3), in which 10,000 samples were collected, biological specimens
were preserved on FTA™.

In this study, the suitability of DNA stored on FTA™ was assessed for more sophisticated

DNA analysis techniques, namely GWAS. GWAS applications have led to a proliferation in

the number of biobanks or biological sample repositories to provide the necessary biological

resource for these substantial genome-wide studies. Considerable effort has been put into

collecting blood and tissue samples and matching these to patient information ranging from

demographic data to specific clinical histories. Over the years, associations between these

phenotypes and genetic polymorphisms have revealed a plethora of genetic associations.

For GWAS, the FTA™ Elute system, when used in combination with whole genome
amplification (WGA) technologies, can create a virtually unlimited supply of nucleic acid
template. Valuable biological samples can be archived or banked at ambient laboratory
temperatures, replacing the need for expensive, space-consuming and energy-demanding
freezer banks. In GWAS, the investigation of large groups is necessary because genetic
factors involved in the cause of multifactorial diseases can only ever supply partial
explanations. There is only a certain probability that genetic factors will result in a given
multifactorial disease, and as the sample number increases, the estimate of that probability
becomes more precise and accurate. However, current storage systems are relatively limited
and require significant infrastructure (e.g. −80°C freezers) and support. Consequently, more
convenient alternatives have to be considered. It is expected that the dissection of genetic
factors that predispose to disease and which explain the etiology of complex multifactorial
disorders will be the key to preventative strategies, as well as the development of targeted
therapeutic modalities. The development and assessment of technologies, including FTA™,
that facilitate large-scale genomic efforts are critical to these outcomes.

MATERIALS AND METHODS

Sample set

Peripheral blood was drawn and collected in EDTA tubes from three healthy unrelated

individuals (denoted S1, S2 and S3) after receiving ethical approval from the Ministry of

Health in the United Arab Emirates. These three samples were used in each set of the

experiments mentioned below. Four drops of blood from each sample were transferred to FTA™ paper (Whatman, Maidstone, Kent, UK) and stored at ambient room temperature (20°C).

Preparation of genomic DNA for GWAS analysis

Three different sets of DNA templates were prepared and used in the present study.

Set 1: DNA was extracted from blood embedded in FTA™ paper and then amplified (abbreviated PCR-FTA). DNA samples were purified from FTA™ by placing a 3-mm disk in a microcentrifuge tube. The disk was rinsed twice in TE−1 buffer (10 mM Tris–HCl, 0.1 mM EDTA, pH 8) and left to stand for 5 min at room temperature (20°C). The buffer was subsequently removed and fresh TE−1 buffer was added. The disk was left to stand in elution buffer for 20 min at room temperature. This step was repeated twice. Subsequently, the elution buffer was removed and the disk was dried at room temperature for 1 h. At the end of the drying process, a complete WGA step was performed on all three samples separately on a GeneAmp PCR System 9700 thermal cycler (Applied Biosystems, Lincoln Centre Drive, Foster City, CA, USA) using Sigma's GenomePlex® kit (Sigma #WGA4) according to the manufacturer's instructions (Sigma-Aldrich, St Louis, MO, USA). Prior to GWAS analysis, the PCR products were cleaned using a Promega kit (Promega, Madison, WI, USA) according to the protocol provided.

Set 2: DNA was extracted from whole blood using standard methods and amplified (referred to as PCR-dgDNA). The quantity and purity of the three DNA samples were determined by absorbance measurements using a NanoDrop ND-1000 Spectrophotometer (NanoDrop, Wilmington, DE, USA). Each DNA sample, at 10 ng/µl, was amplified using Sigma's GenomePlex® kit on a GeneAmp PCR System 9700 thermal cycler (Applied Biosystems), with PCR clean-up performed using the Promega PCR purification kit.

Set 3: DNA was extracted from whole blood using standard methods without further

amplification (dgDNA). Three DNA samples at concentrations of 50ng/µl were prepared for

GWAS analysis.

All sample sets qualified for GWAS analysis, with A260:A280 ratios of 1.9 and an average DNA concentration of 200 ng/μl. All samples were diluted to a concentration of 50 ng/μl in Tris-EDTA (TEKnova, Hollister, CA, USA).
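The dilution from the 200 ng/μl average stock concentration to the 50 ng/μl working concentration follows C1V1 = C2V2. A minimal sketch, assuming a hypothetical final volume of 20 μl (the working volume is not stated in the text):

```python
def dilution_volumes(stock_conc, target_conc, final_volume):
    """Return (stock volume, diluent volume) from C1 * V1 = C2 * V2."""
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume

# 200 ng/ul stock diluted to 50 ng/ul; the 20 ul final volume is an assumption
stock_ul, te_ul = dilution_volumes(200.0, 50.0, 20.0)
print(f"{stock_ul:.1f} ul DNA + {te_ul:.1f} ul Tris-EDTA")
```

In practice each sample would be diluted from its own measured concentration rather than from the average.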

GWAS assay

A genome-wide study was performed on all three sets of DNA with the Human660W-Quad BeadChip (Illumina, San Diego, CA, USA), which contains 660,000 SNPs derived from the International HapMap Project. The genotype assays for the three sets of DNA were performed according to the manufacturer's recommendations. In brief, 200 ng of DNA template was subjected to whole-genome amplification at 37°C for 20 to 24 h. Products were fragmented, precipitated and re-suspended in hybridisation buffer. The re-suspended samples were denatured at 95°C for 20 min, loaded onto the BeadChips and placed in a 48°C hybridisation chamber for 16 to 20 h. After hybridisation, non-hybridised DNA was washed away from the BeadChips. An allele-specific single-base extension of the oligonucleotides on the BeadChip was performed in a 48-position GenePaint™ Slide Chamber Rack (Tecan, Männedorf, Switzerland) using labelled deoxynucleotides and the captured DNA as a template. After staining of the extended DNA, BeadChips were washed and scanned on an iScan apparatus (Illumina), and genotypes were called using the BeadStudio software version 3.0 (Illumina).


Statistical analysis of the data generated was carried out using one-way analysis of variance (ANOVA) and Bonferroni's multiple comparison test.


RESULTS

The integrity of genomic DNA is critical when it is used as a template for GWAS. Call rates for degraded DNA can be variable, which compromises the integrity of a study. As illustrated in Figure 1, when GWAS assays were performed using degraded DNA templates, the ratio of “calls” to “no calls” was highly variable. The efficiency of the assay was low, with call rates as low as one in five (about 20%). The use of a WGA step prior to GWAS analysis improved the call rate to around 96% (call rates for the three samples under the PCR-dgDNA category in Figure 1). In the same study, the use of FTA™ as a DNA collection and storage medium was assessed, with call rates of 96% and higher achieved (PCR-FTA, Figure 1).

Figure 2 illustrates the quality of the base-calling function. Specifically, in Figure 2C, the clustering of the plots shows that genotypes are not assigned when using degraded DNA templates: the results for the three separate samples, S1, S2 and S3, fall outside the ‘call zone’. There is some improvement when the degraded genomic DNA is processed with an amplification step prior to GWAS analysis (Figure 2B). Interestingly, amplified genomic DNA collected using FTA™ cards generated the best results (Figure 2A), suggesting that this simple method of specimen collection and nucleic acid purification could be a suitable prelude to GWAS.

Among the 20 selected SNPs on chromosome 18, it is clear that genotypes are generally not called when degraded genomic DNA is used for analysis. Examples of the types of miscalls and no calls are shown in Figure 3. The three DNA templates (PCR-FTA, PCR-dgDNA and dgDNA) were compared for all three individuals (S1, S2 and S3), and a range of call scenarios is presented. There are three examples of SNP positions where all three DNA templates are concordant: rs10083985 and rs10163808 for S1, and rs1010360 for S3. In sample S1, the only example of a no call observed for all three DNA templates is at SNP rs10163736.

Importantly, the type of DNA template used can give rise to erroneous results. These errors are compounded, and generally missed, because of the large amount of data associated with GWAS. An example of a miscalled genotype is shown at position rs1008899 in sample S1 when degraded genomic DNA was used as the template for the GWAS assay.

To provide a chromosome-wide perspective of the data selected for Figure 3, the data for all the SNPs analysed on chromosome 18 are presented using Illumina's Chromosome Browser (Fig. 4). The density of genotypes called is higher with DNA templates collected on FTA™ paper than with amplified degraded genomic DNA. With degraded genomic DNA, both the number of genotypes called and the accuracy of the calls were poor. These results were consistent with the quality control step recommended by Illumina, which uses box plots to represent the log R ratio (see Fig. 5). The log R values for amplified DNA template from FTA™ were typically 0.1 to 0.25 for all three samples studied, the range for a good call (Fig. 5A). The average score for amplified degraded genomic DNA was acceptable (Fig. 5B); however, SNPs were not as tightly clustered as seen with amplified templates from FTA™. As expected, the scores reflected the poor quality of results obtained using degraded DNA (Fig. 5C).

In summary, for three subjects studied (S1, S2 and S3), the call rates were variable when using

degraded DNA as a template (19%, 61% and 48%, respectively, Table 1).

While collecting blood in the conventional fashion for S1, S2 and S3, blood spots were also collected on FTA™ paper. The DNA was harvested and subjected to a genome-wide amplification step prior to the GWAS assay. The call rates using these DNA templates were equivalent to (96%) or better than (97%) those of the assays that used amplified degraded genomic DNA as templates (Table 1).

Results from the one-way ANOVA and Bonferroni's multiple comparison test (Table 2) show pair-wise comparisons of the three sources of DNA. Overall, there is a significant difference (p = 0.0027) between the call rates observed for degraded genomic DNA (dgDNA) and those for PCR-amplified degraded genomic DNA and DNA sourced from FTA™. The call rates of the latter two were similar (means of 96.0% and 96.6%, respectively). These call rates exceed the 95% level expected of conventional GWAS using pristine-quality DNA.


Figure 1: Summary of called genotypes and no calls for 657,366 SNPs across 23 chromosomes using three sources of DNA: PCR-FTA, PCR-dgDNA and dgDNA. For each source of DNA, three independent samples (S1, S2 and S3) were collected for testing and comparison. Bar chart: number of SNPs (vertical axis, 0 to 600,000) scored as Calls and No Calls for S1, S2 and S3 under each DNA source.


Figure 2: Examples of clustering plots showing the accuracy of calling for SNP rs1013861 on chromosome 18 using different sources of DNA. (A) High call rate for the three PCR-FTA samples (squares S1, circles S2 and triangles S3), with the genotype called correctly. (B) When using PCR-amplified degraded DNA, there were two correct calls (S2 and S3) and one no call (S1). (C) The genotypes of all three samples of degraded DNA could not be assigned due to poor call rates. (D) A typical SNP clustering pattern in 178 samples with all genotypes being correctly called. Norm R, normalised intensity; Norm Theta, angle of the centre of the cluster in normalised polar coordinates. Dark shaded areas, the call zones for the AA (right), AB (middle) and BB (left) genotypes.


Chromosome 18 SNPs (in column order): rs1000055, rs1004403, rs10083961, rs10083985, rs1008899, rs1009819, rs1010360, rs1010444, rs1011947, rs1013861, rs10153405, rs1015460, rs10163657, rs10163736, rs10163808, rs10164009, rs1017252, rs1019989, rs1021599, rs10221443

Called genotypes are listed in column order. No-call cells are blank in the figure, so rows with fewer than 20 entries cannot be aligned to individual SNPs here.

S1
PCR-FTA    AB AB AA BB AB AA AA AA AA AB AA BB AA AA BB AB BB AB
PCR-dgDNA  AB AB AA BB AB AA AA AA AA AB AA BB AA AA BB AB BB AB
dgDNA      BB AA AA

S2
PCR-FTA    AB AA AA BB AB AA AB AB BB AA AA AA BB AB AA AA AB AB AB BB
PCR-dgDNA  AB AA AA BB AB AA AB AB BB AA AA AA BB AB AA AA AB AB AB BB
dgDNA      (no genotypes called)

S3
PCR-FTA    AB AB AA AB AB AA AA BB BB AB AA AA BB AA AA AA AA AB AB AA
PCR-dgDNA  AB AB AA AB AB AA AA BB BB AB AA AA BB AA AA AA AA AB AB AA
dgDNA      AA

Figure 3: Examples of correct calls, miscalls and no calls in three samples (S1, S2 and S3) in a comparison between PCR-amplified DNA from blood samples collected on FTA (PCR-FTA), whole genome amplified degraded DNA (PCR-dgDNA) and degraded genomic DNA (dgDNA). Twenty SNPs on chromosome 18 were randomly selected from the 660,000 SNPs available for all three subjects. At each SNP, the genotypes were either (1) called correctly: see dgDNA genotype of rs10083985 for S1; (2) miscalled: see dgDNA genotype of rs1008899; or (3) not called: see genotype of all three sources of DNA for rs10163736.
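Concordance between templates, as compared in Figure 3, can be tallied directly. A minimal sketch using the S2 rows of Figure 3, which list all 20 genotypes for both amplified templates; the function name and the use of None to stand for a no call are illustrative assumptions, not part of the BeadStudio workflow.

```python
def concordance(calls_a, calls_b):
    """Return (SNPs called in both templates, SNPs with identical genotype).

    A no call is represented by None and is excluded from the comparison.
    """
    compared = [(a, b) for a, b in zip(calls_a, calls_b)
                if a is not None and b is not None]
    matching = sum(1 for a, b in compared if a == b)
    return len(compared), matching

# S2 genotypes for the 20 chromosome 18 SNPs, copied from Figure 3
s2_pcr_fta = "AB AA AA BB AB AA AB AB BB AA AA AA BB AB AA AA AB AB AB BB".split()
s2_pcr_dgdna = "AB AA AA BB AB AA AB AB BB AA AA AA BB AB AA AA AB AB AB BB".split()
# No genotype was called for S2 with degraded genomic DNA
s2_dgdna = [None] * 20

both_called, identical = concordance(s2_pcr_fta, s2_pcr_dgdna)
print(f"PCR-FTA vs PCR-dgDNA: {identical}/{both_called} concordant")

both_called_dg, identical_dg = concordance(s2_pcr_fta, s2_dgdna)
print(f"PCR-FTA vs dgDNA: {both_called_dg} SNPs available for comparison")
```

Excluding no calls from the denominator keeps concordance separate from call rate, so a template can be judged on both how often it yields a genotype and how often that genotype agrees with a reference.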


Figure 4: The Illumina Chromosome Browser (ICB) plot of the B allele frequencies along chromosome 18 in sample S1. The horizontal axis denotes the physical position of SNPs (scale in megabases, Mb), and the vertical axis denotes the estimated B allele frequency. (A) Ninety-six percent of SNPs were called and genotyped as AA, AB or BB using PCR-FTA as a source of DNA. (B) Ninety-five percent of SNPs were called and genotyped using PCR-dgDNA as a source of DNA. (C) Eighteen percent of SNPs were called and genotyped using dgDNA. In (A) and (B), a deletion is visible at 55 to 65 Mb; in (C), owing to poor-quality DNA, the deletion was not obvious.


Figure 5: Box plots representing the distribution of the log R ratio in all three samples using three different sources of DNA. The log R ratio provides a measure of the noise in the data. Typical values associated with high-quality data are 0.1 to 0.25. (A) The log R ratio using PCR-FTA. (B) The log R ratio was not as tightly grouped when using PCR-dgDNA. (C) A poor-quality log R ratio was observed, owing to poor DNA quality, when using dgDNA.


Table 1: Summary of the number of “calls” and “no calls”, call rate, genotype frequencies for AA, AB and BB, minor allele frequency and GenCall (GC) score percentiles at 657,366 loci for PCR-FTA, PCR-dgDNA and dgDNA.

DNA source  Sample  #No Calls  #Calls   Call Rate  A/A Freq  A/B Freq  B/B Freq  Minor Freq  50% GC Score  10% GC Score
PCR-FTA     S1      18,494     542,996  0.9671     0.3312    0.2923    0.3765    0.4773      0.4396        0.2867
PCR-FTA     S2      17,446     544,044  0.9689     0.315     0.3261    0.359     0.478       0.4396        0.2861
PCR-FTA     S3      20,156     541,334  0.9641     0.3259    0.3032    0.3709    0.4775      0.4396        0.2861
PCR-dgDNA   S1      25,007     536,483  0.9555     0.3285    0.2956    0.376     0.4763      0.8741        0.5439
PCR-dgDNA   S2      23,455     538,035  0.9582     0.3131    0.327     0.3599    0.4766      0.8795        0.5534
PCR-dgDNA   S3      20,041     541,449  0.9643     0.324     0.3055    0.3706    0.4767      0.8915        0.6483
dgDNA       S1      457,583    103,907  0.1851     0.5549    0.2823    0.1628    0.3039      0.6957        0.2085
dgDNA       S2      218,313    343,177  0.6112     0.2174    0.4559    0.3268    0.4453      0.7994        0.2781
dgDNA       S3      292,182    269,308  0.4796     0.299     0.3517    0.3493    0.4749      0.7853        0.2594
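The call rates in Table 1 can be recomputed from the counts. A minimal sketch, assuming the call rate is the number of calls divided by the sum of calls and no calls, which is the definition that reproduces the tabulated values; the variable and function names are illustrative.

```python
# (number of no calls, number of calls) per sample, copied from Table 1
counts = {
    "PCR-FTA": {"S1": (18494, 542996), "S2": (17446, 544044), "S3": (20156, 541334)},
    "PCR-dgDNA": {"S1": (25007, 536483), "S2": (23455, 538035), "S3": (20041, 541449)},
    "dgDNA": {"S1": (457583, 103907), "S2": (218313, 343177), "S3": (292182, 269308)},
}

def call_rate(no_calls, calls):
    """Fraction of scored loci for which a genotype was called."""
    return calls / (calls + no_calls)

call_rates = {
    source: {sample: call_rate(*pair) for sample, pair in samples.items()}
    for source, samples in counts.items()
}
mean_rates = {
    source: sum(rates.values()) / len(rates)
    for source, rates in call_rates.items()
}

for source, rates in call_rates.items():
    per_sample = ", ".join(f"{s} {r:.4f}" for s, r in rates.items())
    print(f"{source}: {per_sample} (mean {mean_rates[source]:.3f})")
```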


Table 2: Bonferroni's multiple comparison test shows that the call rates for genomic DNA extracted from FTA (96.6%) and PCR-amplified genomic DNA (average = 96.0%) are significantly higher than those for degraded genomic DNA (42.6%) (p = 0.0027).

Bonferroni's multiple comparison test

Comparison            Mean difference  t      Significant (p < 0.05)  95% CI of difference
FTA-PCR vs PCR-dgDNA  0.006            0.065  No                      -0.33 to 0.34
FTA-PCR vs dgDNA      0.540            5.326  Yes                     0.21 to 0.87
PCR-dgDNA vs dgDNA    0.533            5.260  Yes                     0.20 to 0.87

ANOVA (one-way analysis of variance)

Source               Sum of squares  Degrees of freedom  Mean squares  F ratio  p-value
Three DNA templates  0.570           2                   0.30          18.68    0.0027
Call rate            0.090           6                   0.02
Total                0.660           8
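The structure of the analysis in Table 2 can be sketched from the rounded call rates in Table 1. This is an illustrative recomputation, not the software used in the study: it gives an F ratio of about 18 and a p-value of about 0.003, close to but not identical with the tabulated values because it starts from rounded inputs. The closed-form p-value used here holds only when the numerator has 2 degrees of freedom, as it does for three groups.

```python
# Call rates per sample (S1, S2, S3), copied from Table 1
call_rates = {
    "PCR-FTA": [0.9671, 0.9689, 0.9641],
    "PCR-dgDNA": [0.9555, 0.9582, 0.9643],
    "dgDNA": [0.1851, 0.6112, 0.4796],
}

def mean(values):
    return sum(values) / len(values)

def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_ratio)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio

def f_upper_tail_two_df(f_ratio, df_within):
    """Upper-tail probability of F(2, df_within); valid only for 2 numerator df."""
    return (1.0 + 2.0 * f_ratio / df_within) ** (-df_within / 2.0)

def pairwise_t(group_a, group_b, ms_within):
    """t statistic for a pairwise comparison using the pooled within-group mean square."""
    se = (ms_within * (1.0 / len(group_a) + 1.0 / len(group_b))) ** 0.5
    return (mean(group_a) - mean(group_b)) / se

ss_b, ss_w, df_b, df_w, f_ratio = one_way_anova(list(call_rates.values()))
p_value = f_upper_tail_two_df(f_ratio, df_w)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}, p = {p_value:.4f}")

ms_within = ss_w / df_w
t_fta_vs_dg = pairwise_t(call_rates["PCR-FTA"], call_rates["dgDNA"], ms_within)
print(f"PCR-FTA vs dgDNA: t = {t_fta_vs_dg:.2f}")
```

A Bonferroni correction for the three pairwise comparisons would then compare each pairwise p-value against 0.05/3 (or, equivalently, multiply each p-value by 3).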


DISCUSSION

DNA collected for SNP analysis needs to be of sufficient quality to ensure high genotype call

rates. Association studies investigating the underlying factors of complex diseases

increasingly require sustainable high-quality DNA resources for large-scale single-nucleotide

polymorphism (SNP) genotyping (Paynter et al., 2006).

While venous blood is often considered the optimal source for DNA, the invasiveness and cost

of obtaining venous blood samples can be prohibitive, especially for large-scale human studies

or those that deal with livestock and wild animals. Additionally, fresh samples collected in the

field may experience degradation before they can be processed. Previous research has shown

that multiple genomic sources, including lymphocytes (Dictor et al., 2007), buccal cells (Milne

et al., 2006), sperm (Yoshihiko and Shin-ichi, 2006) and fingernails (Nakashima et al. 2008),

can be used to generate high-density SNP data provided the DNA sample is of adequate

quality and quantity (Jasmine et al., 2008). The ease of collection, transportation and storage of samples on FTA™ cards, and their protection from degradation, provides a possible solution.

McClure et al. (2009) used DNA extracted from cells on FTA™ cards to study SNPs on Illumina's iSelect BeadChip, which contains 54,122 SNPs. The present study expands on that work by assessing three different sources of DNA as suitable templates in a genome-wide association study (GWAS) using Illumina's Human660W-Quad BeadChip, which contains 660,000 SNP markers.

In this study, three different types of DNA template (PCR-FTA, PCR-dgDNA and dgDNA; see methodology) were used for GWAS. A call rate of greater than 95% may be obtained for GWAS of good-quality DNA on Illumina's Infinium array. On the other hand, poor-quality DNA, such as degraded DNA, can result in low call rates because polymorphisms are called erroneously (miscalls) or not called at all (no calls). Figure 1 shows that the ratio of “calls” to “no calls” can be highly variable among the three templates. For instance, degraded DNA (dgDNA) shows a low number of SNP calls and hence a low call rate (mean of 42.6%), whereas the use of an amplification step on degraded DNA (PCR-dgDNA) prior to GWAS improved the call rate (mean of 96.0%). It would appear that the use of FTA™ as a DNA collection method also increased the call rate of the samples (mean of 96.6%).


For a SNP to be called (genotyped) correctly, it should fall in the call zone (the darker shaded area) of the designated AA, AB or BB regions (see Figure 2). Poor-quality DNA can result in the SNP falling outside the dark shaded area, which results in a "no call" for the marker. Where an amplification step was used before GWAS, the SNPs fell within the call zone and were genotyped correctly. Moreover, the highest call rates were obtained when using DNA from FTA™. This suggests the possibility of using this simple, relatively inexpensive specimen collection and nucleic acid purification technology as a convenient method for the collection and storage of blood samples before embarking on GWAS.

A further problem when dealing with poor-quality DNA is the miscalled (mistakenly called) genotype. Figure 3 shows an example of a miscall for SNP rs1008899 in S1 when using degraded DNA. The SNP was genotyped AA, with the call falling outside the call zone, between the AA and AB areas. When the sample was amplified and subsequently genotyped, the SNP was called AB, with the call falling in the middle of the shaded area for AB. Further, the same SNP from the sample sourced from FTA™ confirmed that the call was indeed AB.

One of the advantages of using the Illumina platform is the ability to study loss of heterozygosity (LOH). Figure 4 shows the effect of poor-quality DNA on the call rate. The results for degraded DNA are scattered throughout the plot, and it is difficult to distinguish whether the call for a SNP is AA, AB or BB. In PCR-amplified degraded DNA, by contrast, the call rate (efficiency) for SNPs improved. The use of DNA sourced from FTA™ also gave rise to a high call rate with SNPs genotyped correctly.

Strategies to recover degraded DNA samples for GWAS analyses have previously been used (Ballantyne et al., 2007), one of which is based on an amplification step prior to the pre-amplification step that occurs during the GWAS assay (Ryo et al., 2007). In this study, each degraded DNA sample, at 10 ng/µl, was amplified using Sigma's GenomePlex® kit, followed by a clean-up step performed using Promega's PCR purification kit. This additional amplification step before the GWAS assay proper improved the call rate from 19% to 96% in the first sample (S1 in Table 1). The call rates in S2 and S3 also improved to 96% from 61% and 48%, respectively (see Table 1).


Quality control (QC) algorithms for GWAS have been incorporated in the analysis process to assess, evaluate and guarantee the quality of genotyping. The BeadStudio analysis software package provides several convenient QC modules, such as the Box Plot, a useful tool to quickly visualise the variation within an array and between arrays. The log R ratio provides a measure of noise in the data. Typical values associated with high-quality data range from 0.1 to 0.25. Figure 5 shows that results generated from DNA extracted from FTA™ had the least noise of the three templates. This provides some degree of confidence that DNA from biological samples collected and stored on this matrix can be used for genome-wide studies. The p value of 0.0027 obtained from ANOVA shows a significant difference between the three templates. Bonferroni's pair-wise comparison was also performed and showed significant differences (Table 2) between degraded DNA and both PCR-amplified degraded DNA and PCR-amplified DNA from FTA™. Although the three samples discussed to this point show a call rate of 96%, analysis was performed across 23 samples with an average call rate of 99% (data not shown) when using DNA from FTA™.

Furthermore, studies have shown that blood spots on FTA™ cards are a more efficient source of DNA for studying genetic polymorphisms, including STR analysis (Guangyun et al., 2005). DNA from neonatal blood stored for over 10 years on Guthrie cards has been successfully extracted for whole genome microarray analysis using a modified FTA™ technology known as GenSolve. In contrast, the traditional procedures of strong alkali or heat treatment used for DNA extraction compromised the physical and chemical integrity of the nucleic acid (Hardin et al., 2009).

FTA™ has received considerable interest from other sectors of bioscience, such as forensics, as a non-invasive and cost-effective means of obtaining DNA in large-scale studies. FTA™ cards have also been shown to be compatible with virtually all cell types (McClure et al., 2009). While early studies showed that DNA harvested from FTA™ cards was suitable for genotyping 1,516 SNPs on the Illumina GoldenGate platform and 10,000 SNPs on the Affymetrix 10K GeneChip, more recently FTA™ cards have been shown to be suitable for high-throughput genotyping on the Illumina iSelect platform, which currently assays up to 200,000 SNPs. McClure et al. (2009) concluded that FTA™ cards provide an excellent medium for harvesting DNA from multiple cell types and that, when assayed using the Illumina iSelect technology, they yield high genotype call rates and reproducibility, particularly when the DNA is extracted using the GenSolve kit. Cunningham et al. (2008) used DNA from FTA™ cards in the Illumina GoldenGate BeadArray assay in ovarian cancer studies to assess its performance following whole genome amplification. In this study, DNA from FTA™ was successfully used on Illumina's chip containing 660,000 SNPs and showed the highest call rates in comparison with the other DNA sources, amplified and non-amplified genomic DNA.

In conclusion, FTA™ cards capture nucleic acid in one easy step. Captured nucleic acid is ready for downstream applications in less than 30 min. Nucleic acids collected on FTA™ cards are stable for years at room temperature. FTA™ cards are stored at room temperature before and after sample application, reducing the need for laboratory freezers. They are suitable for virtually any cell type and any genotyping platform. FTA™ cards come with a built-in indicator that changes colour upon sample application to facilitate handling of colourless samples. They are available in a variety of configurations to meet application requirements. They have been widely used in the fields of forensics, transgenics, transfusion medicine, plasmid screening, food and agriculture testing, drug discovery, genomics, STR analysis, animal identification, diagnostics, pharmacogenomics and molecular biology. Thus, FTA™ cards are a routine and cost-effective technology that provides a simple method for the preservation of biospecimens and is amenable to high-throughput DNA extraction, all attributes required to undertake successful GWAS in an efficient manner.


ACKNOWLEDGEMENTS


Western Australia, Ms. Alsafar is a Ph.D. scholar at the University of Western Australia

supported by the Dubai Police General Head Quarters in the United Arab Emirates. Funding

for this project was provided by the Emirates Foundation, and support was also kindly

provided by Ali Ridha, the director of Central Veterinary Research Laboratory (CVRL) in

Dubai, United Arab Emirates.


CONFLICT OF INTEREST

All authors declare that they have no conflict of interest.


REFERENCES

Ballantyne KN, van Oorschot RA, Mitchell RJ. 2007. Comparison of two whole genome

amplification methods for STR genotyping of LCN and degraded DNA samples.

Forensic Sci Int 166:35-41.

Crabbe MJ. 2003. A novel method for the transport and analysis of genetic material from

polyps and zooxanthellae of scleractinian corals. J Biochem Biophys Methods 57:171-

176.

Cunningham JM, Sellers TA, Schildkraut JM, Fredericksen ZS, Vierkant RA, Kelemen LE,

Gadre M, Phelan CM, Huang Y, Meyer JG, Pankratz VS, Goode EL. 2008.

Performance of amplified DNA in an Illumina GoldenGate BeadArray assay. Cancer

Epidemiol Biomarkers Prev 17:1781-1789.

Dictor M, Skogvall I, Warenholt J, Rambech E. 2007. Multiplex polymerase chain reaction on

FTA cards vs. flow cytometry for B-lymphocyte clonality. Clin Chem Lab Med

45:339-345.

Foster EA, Jobling MA, Taylor PG, Donnelly P, de Knijff P, Mieremet R, Zerjal T, Tyler-

Smith C. 1998. Jefferson fathered slave's last child. Nature 396:27-28.

Gill P, Jeffreys AJ, Werrett DJ. 1985. Forensic application of DNA 'fingerprints'. Nature

318:577-579.

Guangyun S, Ritesh K, Prodipto P, Michael W, Diane S, Hong C, Mei L, Ranajit C, Li J,

Ranjan D. 2005. Whole-genome amplification: relative efficiencies of the current

methods. Legal medicine (Tokyo, Japan) 7:279-286.

Hardin J, Finnell RH, Wong D, Hogan ME, Horovitz J, Shu J, Shaw GM. 2009. Whole

genome microarray analysis, from neonatal blood cards. BMC Genet 10:38.

Harvey ML. 2005. An alternative for the extraction and storage of DNA from insects in

forensic entomology. J Forensic Sci 50:627-629.


Jasmine F, Ahsan H, Andrulis IL, John EM, Chang-Claude J, Kibriya MG. 2008. Whole-

genome amplification enables accurate genotyping for microarray-based high-density

single nucleotide polymorphism array. Cancer Epidemiol Biomarkers Prev 17:3499-

3508.

Martins S, Trigo F, Azevedo L, Silva MJ, Guimaraes JE, Amorim A. 2002. Haplotype study

of microsatellites flanking the t(15;17) breakpoint in acute promyelocytic leukemia

patients from North Portugal. Leukemia 16:1353-1357.

McClure M, McKay S, Schnabel R, Taylor J. 2009. Assessment of DNA extracted from

FTA(R) cards for use on the Illumina iSelect BeadChip. BMC Research Notes 2:107.

Milne E, van Bockxmeer FM, Robertson L, Brisbane JM, Ashton LJ, Scott RJ, Armstrong

BK. 2006. Buccal DNA collection: comparison of buccal swabs with FTA cards.

Cancer Epidemiol Biomarkers Prev 15:816-819.

Moscoso H, Thayer SG, Hofacre CL, Kleven SH. 2004. Inactivation, storage, and PCR

detection of Mycoplasma on FTA filter paper. Avian Dis 48:841-850.

Muthukrishnan M, Singanallur NB, Ralla K, Villuppanoor SA. 2008. Evaluation of FTA cards

as a laboratory and field sampling device for the detection of foot-and-mouth disease

virus and serotyping by RT-PCR and real-time RT-PCR. J Virol Methods 151:311-

316.

Nakashima M, Tsuda M, Kinoshita A, Kishino T, Kondo S, Shimokawa O, Niikawa N,

Yoshiura K. 2008. Precision of high-throughput single-nucleotide polymorphism

genotyping with fingernail DNA: comparison with blood DNA. Clin Chem 54:1746-

1748.

Ndunguru J, Taylor NJ, Yadav J, Aly H, Legg JP, Aveling T, Thompson G, Fauquet CM.

2005. Application of FTA technology for sampling, recovery and molecular

characterization of viral pathogens and virus-derived transgenes from plant tissues.

Virol J 2:45.


Paynter RA, Skibola DR, Skibola CF, Buffler PA, Wiemels JL, Smith MT. 2006. Accuracy of

Multiplexed Illumina Platform-Based Single-Nucleotide Polymorphism Genotyping

Compared between Genomic and Whole Genome Amplified DNA Collected from

Multiple Sources. Cancer Epidemiology Biomarkers & Prevention 15:2533-2536.

Raina A, Dogra TD. 2002. Application of DNA fingerprinting in medicolegal practice. J

Indian Med Assoc 100:688-694.

Ryo I, Takamitsu T, Chinatsu S, Mitsugi I, Kazunari U. 2007. Simple and rapid detection of

the porcine reproductive and respiratory syndrome virus from pig whole blood using

filter paper. Journal of Virological Methods 141:102.

Tack LC, Thomas M, Reich K. 2007. Automated forensic DNA purification optimized for

FTA card punches and identifiler STR-based PCR analysis. Clin Lab Med 27:183-191.

Tolunay B, Raymond KB, Robert JC. 2006. Zinc Supplementation of Young Men Alters

Metallothionein, Zinc Transporter, and Cytokine Gene Expression in Leukocyte

Populations. Proceedings of the National Academy of Sciences of the United States of

America 103:1699-1704.

Yoshihiko F, Shin-ichi K. 2006. Application of FTA® technology to extraction of sperm DNA from mixed body fluids containing semen. Legal medicine (Tokyo, Japan) 8:43-47.


CHAPTER 5

CHARACTERISATION OF MHC POLYMORPHIC ALU

INSERTIONS (POALIN) IN A POPULATION OF ARAB

BEDOUINS.

This chapter was submitted to Journal of Evolutionary Biology according to the format presented in "Instruction to Authors" from the publishing house.


Chapter 5

Characterisation of MHC Polymorphic Alu Insertions

(POALIN) in a population of Arab Bedouins.

Chapter 5 describes, for the first time, the distribution of four Alu markers located within the human Major Histocompatibility Complex (MHC) in the Bedouin population of the Middle East. It expands on work first presented by Dunn et al. (Journal of Molecular Evolution. 2002; 55:718-26) and subsequently in Tissue Antigens (2007; 70:136-43). Dunn et al. (2002, 2007) studied the distribution of these MHC markers in Caucasians, Northeastern Thais, Japanese, Malaysian Chinese and Southern Africans. The distributions of the MHC markers in Bedouins were compared with the results presented in these studies by phylogenetic analysis. Specifically, the chapter establishes the relationship between Arab populations and other populations previously studied.

The identification of polymorphisms that are unique to these populations will provide an


biological evidence left at a crime scene to provide information that could be useful in an

investigation. The comparative analysis revealed the genotype frequencies of each of these markers in Bedouins to be identical to those previously reported for Australian Caucasians, suggesting that the Middle East represents a crossroads from which human populations migrated toward Asia in the east and Europe to the northwest.

My colleagues and I have prepared this manuscript. I carried out all laboratory work at

Central Veterinary Research Laboratory (CVRL) under the guidance of Dr Khazanehdari. Ms

Pitt optimised the initial PCR conditions for the four Alu markers and Mr Ismail provided technical assistance. Mr Iaschi assisted with the phylogenetic analysis. Dr Tay guided me throughout the study, from designing the study to proofreading the manuscript. All the co-authors have proofread the manuscript.


Characterisation of MHC Polymorphic Alu Insertions (POALIN) in Arab Bedouins

Population

Habiba S Al Safar1, 2, Alison P Pitt1, Stephen P.A. Iaschi1, Motasem W Ismail3, Kamal A

Khazanehdari3, Guan K Tay1

1 Centre for Forensic Science, The University of Western Australia, Crawley, Australia.
2 Dubai Police General Head Quarters, Dubai, United Arab Emirates.
3 Molecular Biology & Genetics, Central Veterinary Research Laboratory, Dubai, United Arab Emirates.

Abbreviated title: Major Histocompatibility Complex, POALINS

Keywords: Major Histocompatibility Complex, Polymorphisms, Bedouins, Alu

insertions


Western Australia






Phone: + 61 8 6488 7286

Fax: + 61 8 6488 7285



ABSTRACT

Polymorphic Alu insertions (POALINs) are widely distributed throughout the human genome and have been used in a range of applications, including anthropological analyses of human populations. For example, the evolutionary relationships between populations of African, European and Asian descent have been analyzed by comparing the distribution of specific POALINs of the major histocompatibility complex (MHC) with those of HLA, complement and other polymorphic markers.

In the current study, we have broadened this analysis by focusing on a previously

uncharacterized population, the Bedouin from the Middle East (n = 91). Specifically, we

determined the frequency of individual insertions of four POALINs within the MHC class I

region of this population: AluyMICB, AluyTF, AluyHJ and AluyHF.

We found the genotype frequencies of each of these POALINS in Bedouins to be identical to

that previously reported for Australian Caucasians. For AluyHJ, the highest frequency for

allele*1 was found in Malaysian Chinese, northeastern Thais, Japanese, and Mongolians. The

frequency in Bedouins was similar to that previously reported for Australian Caucasians, each

representing the second highest allele frequency in the current analysis. The African

subpopulations showed a lower frequency of this allele. Phylogenetic analysis of the relative

allele frequencies of AluyHJ in combination with the remaining three POALINs markers

revealed that Bedouins have a similar lineage to Caucasians, at least for the MHC region

studied. The structure of the phylogenetic tree supports the popular contention that humans

originated in Africa. The nature of the clusters suggests that the Middle East represent a

crossroads from which humans populations migrated toward Asia in the east and Europe to the

northwest.


INTRODUCTION

The human major histocompatibility complex (MHC) lies on the short arm of chromosome 6,

within a gene-rich region that has been intensively studied. The MHC encodes many genes

that participate in the regulation of the immune system. One of the most striking features of

this region is its high gene density, with many of its component genes having been replicated

to form multigene families [1]. Among the genes within a given family, both single nucleotide

polymorphisms and insertion/deletion elements exist. The clustering of these families and their

highly polymorphic nature has been interpreted to be biologically and evolutionarily

significant, as they are involved in the suppression of recombination events [2]. Consequently,

contemporary MHC haplotypes contain highly specific (haplospecific) sequences. These

haplotypes are preserved over time, resulting in the development of ancestral haplotypes, such

that the sharing of one or more haplotypes between individuals implies that they are related

through a remote but common ancestor [3-7]. More recently, repetitive elements have been

used to refine the definition of MHC ancestral haplotypes, which has allowed the dating of

specific human lineages by evolutionary and anthropological methods [8-11].

One class of repetitive elements, polymorphic Alu insertions (POALINs), are members of an

Alu subfamily that appears to have been inserted into the human genome in relatively recent

evolutionary history [2, 12]. Alu repeats are short stretches of retrotransposable DNA that

were originally characterized by the action of the restriction endonuclease Alu I, which

cleaves double-stranded DNA [13, 14]. POALINs have the ability to copy themselves and

insert into new chromosomal locations, and can be diagnostic at particular genomic regions by

being either present or absent. Because inserted or deleted polymorphisms are genetically

inherited, individuals who share a particular polymorphism are assumed to share a common

ancestor [2]. Because the generation of a new Alu insertion event is rare, POALINs are a

desirable DNA marker for studying the genetic relationships between populations [15-17]. Alu

insertions also allow a large number of screenings to be done simultaneously through a single

polymerase chain reaction (PCR). Specifically, a single pair of PCR primers can generate a number of different amplification products of a length that can be resolved in agarose gels, and can thereby be analyzed directly for polymorphisms [16].


Alu insertions are rarely deleted and, even if a deletion occurs, a signature of the original

insertion is left behind. As a direct result of this, Alu-specific sequences are abundant

throughout the genome, where they promote genetic recombination events that are responsible

for large-scale deletions, duplication and translocations [3, 18-21]. Deletions occur mostly in

AT–rich regions, and have been determined to be unlikely to have been created independently

of the insertion of the Alu elements [22].

In this study, we have focused on four MHC class I POALINs (Fig. 1). AluyMICB is located within the first intron of the MICB gene, in the beta block. AluyTF is located in the region

between the beta and kappa regions, adjacent to the TFIIH and CDSN genes. The remaining

two POALINs, AluyHJ and AluyHF, lie at the beginning and the end of the alpha block, close

to the HLA-J, and the HLA-G and HLA-F genes, respectively.

The ease with which the POALINs can be genotyped has made them valuable lineage markers

for the study of human population genetics and pedigrees, which has increased our understanding of human diversity and evolution. The four MHC POALINs studied here have

been used in a range of applications, primarily focusing on the anthropological analysis of

human populations [16, 23]. The current study expands on previous analyses of specific

population groups [9, 16, 23-28]. In this paper, we report efforts to define the polymorphisms

of four Alu elements in the class I region of the MHC in a previously unstudied population, the

Bedouins of the Middle East.


MATERIALS AND METHODS

Subjects

The study population consisted of 91 healthy, unrelated, Bedouin individuals, each of whom

gave signed, informed consent based on information provided by the ethics committee of the

Dubai police headquarters.

Genomic DNA

After blood was drawn into EDTA tubes, genomic DNA was extracted using the MagNA Pure LC Total Nucleic Acid Kit (Roche Applied Science, Indianapolis, IN, USA) according to the recommendations of the manufacturer. Specifically, 300 μl of whole blood from each sample was mixed with 200 μl of lysis buffer (50 mM Tris pH 8.0, 100 mM EDTA, 100 mM NaCl, 1% SDS) to lyse the cell membrane and release the DNA. The procedure also included the addition of 40 μl of Proteinase K. Subsequently, 100 μl of isopropanol was added to remove residual amounts of protein, followed by 500 μl of Inhibitor Removal Buffer (5 M guanidine-HCl, 20 mM Tris-HCl pH 6.6). The DNA was washed with a buffer (20 mM NaCl; 2 mM Tris-HCl; pH 7.5) and centrifuged twice at 2,000 rpm. The DNA was then washed with cold 70% ethanol and centrifuged at 3,000 rpm, and the supernatant was discarded, leaving purified template DNA that was diluted in TE buffer (1 mM EDTA; 10 mM Tris-HCl, pH 7.5) to a concentration of approximately 20 ng/μl. Between 2 μl and 4 μl of DNA was used for each polymerase chain reaction (PCR) assay.

POALIN PCR assay

The presence or absence of the Alu motif at each of the four loci was determined from the predicted size of the PCR product generated by the primer pair designed for each marker. Table 1 summarizes the primer sequences and annealing temperatures for each marker. For AluyHJ, AluyHF and AluyMICB, the PCR solution (20 μl) contained 80 ng of DNA template, 10 pmol of each primer, 25 nmol of each deoxyribonucleotide triphosphate (dNTP), 0.4 units of FastStart Taq polymerase (Roche Applied Science, Indianapolis, IN, USA), 3 mM MgCl2, and 2 μl of 10× PCR buffer (600 mM Tris-HCl, pH 8.3; 250 mM KCl; 1% Triton X-100; 100 mM β-mercaptoethanol). The AluyTF reaction mixture included 40 ng of DNA template, 5 pmol of each primer, 0.4 μl of each dNTP, 0.5 units of FastStart Taq polymerase, 1 μl of 3 mM MgCl2, and 1 μl of 10× PCR buffer. PCR was performed using a DNA Engine Tetrad Thermal Cycler (Bio-Rad Laboratories, Hercules, CA, USA), with a single hot-start step at 95°C for 10 min to activate the FastStart Taq. A total of 35 cycles were used, each consisting of a 30 sec denaturation step at 95°C, a 30 sec annealing step (59°C for AluyMICB and AluyHF, 55°C for AluyHJ, and 56°C for AluyTF), and an extension step at 72°C for 45 sec. A final extension step at 72°C for 10 min completed the run. The PCR products were separated on 1.5% agarose gels in Tris-Borate EDTA (TBE) buffer on a horizontal model 192 gel electrophoresis sub-cell (Bio-Rad Laboratories, Hercules, CA, USA) and stained with ethidium bromide.

Genotype Analysis

The PCR assays were designed to detect the presence or absence of the insertion characteristic of each of the MHC POALINs. In each case, the larger PCR product indicated the presence of the Alu element (referred to as allele*2), while the smaller product indicated the absence of the insertion (allele*1). Allele frequencies were obtained using the gene counting method [29]: the copies of allele*1 observed in the study group were summed (two for each 1,1 homozygote and one for each 1,2 heterozygote), and this value was divided by the total number of alleles in the sample population (twice the number of subjects). The frequency of the alternative allele (allele*2) is 1 - frequency of allele*1 (Table 2).
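The gene counting step can be sketched in a few lines of Python. This is an illustrative sketch rather than the software used in the study; the function name `allele_frequencies` is an assumption introduced here, and the example counts are the AluyMICB genotype counts listed in Table 2.

```python
from typing import Tuple


def allele_frequencies(n11: int, n12: int, n22: int) -> Tuple[float, float]:
    """Return (frequency of allele*1, frequency of allele*2) by gene counting.

    n11, n12 and n22 are the numbers of 1,1 homozygotes, 1,2 heterozygotes
    and 2,2 homozygotes observed.
    """
    total_alleles = 2 * (n11 + n12 + n22)  # two alleles per subject
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    # Each 1,1 homozygote carries two copies of allele*1, each heterozygote one
    allele1 = (2 * n11 + n12) / total_alleles
    return allele1, 1.0 - allele1


# AluyMICB genotype counts from Table 2: 65 (1,1), 22 (1,2), 2 (2,2)
p, q = allele_frequencies(65, 22, 2)
print(f"allele*1 = {p:.3f}, allele*2 = {q:.3f}")  # allele*1 = 0.854, allele*2 = 0.146
```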

The expected genotype frequencies were calculated using the Hardy-Weinberg equilibrium equation p² + 2pq + q² = 1, where p is the frequency of allele*1 and q is the frequency of allele*2. The expected frequencies of the two homozygous genotypes (1,1 and 2,2) were calculated by squaring the corresponding allele frequency (p² and q²), and the expected frequency of heterozygotes (1,2) was calculated as double the product of the two allele frequencies (2pq). The observed and expected genotype frequencies were then compared. The population was considered to be in Hardy-Weinberg equilibrium if the observed frequencies did not differ significantly from those predicted by the equation (chi-squared test, Table 2).
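As a worked illustration of this comparison, the sketch below computes the expected genotype counts, a Pearson chi-squared statistic and the expected heterozygosity (2pq) from observed genotype counts. It is a minimal sketch and not the authors' software: the function name `hardy_weinberg_test` is hypothetical, and the use of one degree of freedom (three genotype classes, minus one, minus one estimated allele frequency) is an assumption, although it is consistent with the p values listed in Table 2.

```python
import math
from typing import Dict


def hardy_weinberg_test(n11: int, n12: int, n22: int) -> Dict[str, object]:
    """Compare observed genotype counts with Hardy-Weinberg expectations."""
    n = n11 + n12 + n22
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n11 + n12) / (2 * n)  # allele*1 frequency by gene counting
    q = 1.0 - p                    # allele*2 frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n11, n12, n22)
    # Pearson chi-squared, skipping classes whose expected count is zero
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return {
        "expected": expected,
        "chi2": chi2,
        "p_value": p_value,
        "heterozygosity": 2 * p * q,
    }


# AluyHJ genotype counts from Table 2: 50 (1,1), 38 (1,2), 3 (2,2)
result = hardy_weinberg_test(50, 38, 3)
print(f"chi2 = {result['chi2']:.3f}, p = {result['p_value']:.3f}, "
      f"heterozygosity = {result['heterozygosity']:.3f}")
# chi2 = 1.758, p = 0.185, heterozygosity = 0.367
```

A p value above the chosen significance threshold (0.05 is conventional) gives no evidence of departure from Hardy-Weinberg equilibrium.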

Phylogenetic analysis

We used Gendist, a component of the Phylip package (version 3.69), to calculate Nei's genetic distance between the Bedouin population and eight previously studied populations. The distance matrix was converted to MEGA format, and a neighbour-joining phylogenetic tree was constructed in MEGA (version 4) [30]. Bootstrap values (1,000 replicates; seed = 64,238) were used to indicate the reliability of the tree topology. DisPan (Genetic Distance and Phylogenetic Analysis) was used to confirm the phylogeny.
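For readers who wish to check individual distance values by hand, the sketch below computes Nei's standard genetic distance for biallelic loci from insertion (allele*2) frequencies. It assumes that this is the measure reported by Gendist in the analysis above; the function name `nei_standard_distance` is introduced here for illustration only. The example uses the Japanese and North-Eastern Thai frequencies listed in Table 3.

```python
import math
from typing import Sequence


def nei_standard_distance(freqs_x: Sequence[float], freqs_y: Sequence[float]) -> float:
    """Nei's standard genetic distance between two populations.

    Each sequence holds the allele*2 frequency at each biallelic locus;
    the allele*1 frequency at a locus is taken as 1 minus that value.
    """
    if len(freqs_x) != len(freqs_y) or not freqs_x:
        raise ValueError("both populations need the same, non-empty set of loci")
    jx = jy = jxy = 0.0
    for x, y in zip(freqs_x, freqs_y):
        jx += x * x + (1 - x) ** 2          # homozygosity within population X
        jy += y * y + (1 - y) ** 2          # homozygosity within population Y
        jxy += x * y + (1 - x) * (1 - y)    # identity between X and Y
    return -math.log(jxy / math.sqrt(jx * jy))


# Allele*2 frequencies for AluyMICB, AluyTF, AluyHJ and AluyHF (Table 3)
japanese = (0.118, 0.083, 0.376, 0.064)
thai = (0.117, 0.086, 0.292, 0.018)
print(f"{nei_standard_distance(japanese, thai):.4f}")  # 0.0027
```

The value obtained for this pair agrees with the corresponding entry in Table 4.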


RESULTS

The POALIN PCR assay results are shown in Fig. 2. For each locus, the smaller band corresponds to allele*1, while the larger band represents the allele containing the Alu insertion (allele*2). Homozygotes would thus be expected to show only one or the other band, with heterozygotes showing both.

For example, lanes 2, 3, 4, 6 and 9 show the AluyMICB assay results for individuals homozygous for allele*1, which yields a 502-base-pair band (denoted 1, 1). In contrast, the single 664-base-pair band visible in lane 5 corresponds to an individual homozygous for the AluyMICB insertion allele (denoted 2, 2). Lanes 1, 7, and 8 show results for individuals who were heterozygous for the AluyMICB element (denoted 1, 2).

Similarly, in Fig. 2 the 710-base-pair product apparent in lane 1 indicates that the subject is homozygous for the AluyTF insertion, whereas the single 422-base-pair product visible in lanes 2, 3, and 5 to 9 indicates individuals homozygous for allele*1. Results for a heterozygous individual are shown in lane 4.

Fig. 2 also shows results for AluyHJ, in which a single 501-base-pair band indicates a subject homozygous for the AluyHJ insertion (lane 5), a single 163-base-pair band indicates a subject homozygous for allele*1 (lanes 1, 2, 3, 6, 7, and 9), and the presence of both bands indicates a heterozygote (lanes 4 and 8).

For AluyHF, allele*1 yields a 458-base-pair product and the Alu insertion yields a 605-base-pair band. Thus, lane 9 indicates an individual homozygous for the AluyHF insertion, lanes 1 to 5 represent individuals homozygous for allele*1, and lanes 6, 7, and 8 represent individuals heterozygous for the AluyHF insertion.
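The band-to-genotype reading described above can be expressed as a simple rule. The following is a minimal sketch only; in the study the genotypes were read from the gels. The function name `call_genotype` and the 25 bp sizing tolerance are assumptions introduced for illustration, while the expected fragment sizes are those listed in Table 1.

```python
from typing import Dict, Iterable, Optional, Tuple

# Expected product sizes in bp (allele*1, allele*2), taken from Table 1
EXPECTED_BANDS: Dict[str, Tuple[int, int]] = {
    "AluyMICB": (502, 664),
    "AluyTF": (422, 710),
    "AluyHJ": (163, 501),
    "AluyHF": (458, 605),
}


def call_genotype(locus: str, observed_bands: Iterable[float],
                  tolerance_bp: float = 25.0) -> Optional[str]:
    """Return '1,1', '1,2' or '2,2' from observed band sizes, or None if no call.

    tolerance_bp is a hypothetical allowance for sizing error on the gel.
    """
    small, large = EXPECTED_BANDS[locus]
    bands = list(observed_bands)
    has_allele1 = any(abs(b - small) <= tolerance_bp for b in bands)
    has_allele2 = any(abs(b - large) <= tolerance_bp for b in bands)
    if has_allele1 and has_allele2:
        return "1,2"  # both bands present: heterozygote
    if has_allele1:
        return "1,1"  # only the smaller band: homozygous, insertion absent
    if has_allele2:
        return "2,2"  # only the larger band: homozygous, insertion present
    return None       # no band of the expected size (failed or off-target PCR)


print(call_genotype("AluyTF", [710]))       # 2,2
print(call_genotype("AluyTF", [422, 710]))  # 1,2
print(call_genotype("AluyHJ", [163]))       # 1,1
```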

Genotype frequencies were determined for each locus. For each of the POALINs, the number of individuals with each genotype, either homozygous for the absence of the element (1, 1), homozygous for the presence of the element (2, 2), or heterozygous (1, 2), was counted. Frequencies were then established for each genotype by dividing the number of individuals with that genotype by the total number of individuals in the population. The frequency of observed genotypes, allele frequencies, Hardy-Weinberg significance, and heterozygosity for AluyMICB, AluyTF, AluyHJ and AluyHF in the Bedouin population are shown in Table 2. AluyHJ was the POALIN in which allele*2 was most frequent, either in the heterozygous or homozygous state (0.242), followed by AluyHF (0.225), AluyMICB (0.146), and AluyTF (0.110). All POALINs were in Hardy-Weinberg equilibrium.

Table 3 shows the comparison between the insertion frequencies of the four MHC POALINs in the Arab Bedouin population and those of previously studied populations. For each of the four POALINs, the insertion frequencies in the Bedouin population were similar to those in Australian Caucasians.

Allele frequencies of the four MHC POALINs in nine populations (Table 3) produced the genetic distance values (Table 4) that were used to construct the phylogenetic tree shown in Fig. 3. A theoretical outgroup with a frequency close to zero was used to root the tree. With the ancestral form as the root of the tree, the MHC POALIN data indicated that the four Asian populations (Malaysian Chinese, Japanese, northeastern Thai, and Mongolian) formed a cluster, while the Australian Caucasians and the Bedouins were separated from both the Asian cluster and the African subpopulations.

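[Figure 1 (map): MHC class I region on a scale of 0 to 2,000 kb, with the telomeric end marked. Labelled regions: MHC Class III, MHC Class II, β block, κ block and α block. Marked on the map are the four POALINs (AluyMICB, AluyTF, AluyHJ and AluyHF) and genes including BAT1, MICB, MICA, HLA-B, HLA-C, CDSN, DDR1, FLOT1, GNL1, HLA-E, TRIM26, TRIM31, HLA-J, HLA-A, HLA-G and HLA-F.]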

Figure 1: The human Major Histocompatibility Complex (MHC) spans approximately 4 megabases and is located at 6p21.3. It is composed of three subregions: class I, class II, and the central MHC region (also known as MHC class III). The class I region is contained within a 2,000 kilobase region constituting the telomeric portion of the human MHC. The map shows the approximate locations of the four polymorphic Alu insertions (POALINs: AluyMICB, AluyTF, AluyHJ and AluyHF), the HLA class I loci and related genes within the MHC class I region.

[Figure 2 (gel photograph): four panels, AluyMICB (664 bp and 502 bp), AluyTF (710 bp and 422 bp), AluyHJ (501 bp and 163 bp) and AluyHF (605 bp and 458 bp). Each panel shows lanes 1 to 9, a molecular weight marker lane (MW, with the 1,500 bp and 500 bp bands labelled), and negative (-ve) and positive (+ve) control lanes.]

Figure 2: Gel photograph illustrating the genotypes of nine subjects for the four MHC class I POALINs studied. PCR assays were designed to detect the presence or absence of the insertion at four POALINs: AluyMICB, AluyTF, AluyHJ and AluyHF. For each of the four POALINs, the larger PCR product represents the presence of the insertion (referred to as allele*2) and the smaller product represents the absence of the insertion (allele*1). For example, in the panel showing the amplification products for AluyTF, an individual who is homozygous for the larger allele*2 product (710 base pairs) containing the insertion is shown in lane 1 (genotype: 2,2). An individual with the heterozygous genotype (1,2) is shown in lane 4. The remaining seven samples were homozygous for the smaller 422-base-pair product without the insertion (1,1). The same allele convention, allele*1 for the smaller product and allele*2 for the larger, is also used for AluyMICB, AluyHJ and AluyHF.

Table 1: The primer sequences and the predicted product sizes of the PCR-amplified products of the four POALIN loci.

Aluy locus  Primer name  Primer sequence (5' - 3')                Accession number  Position         Fragment size (bp)   Annealing temperature
                                                                                                     allele*1  allele*2
AluyMICB    AluyMICB.F   GCC TTC CAA TGC CAT TCA CAG              AC006046          38,921-38,941    502       664        59°C
            AluyMICB.R   CTC AGC CCT GCT TTC CCA TCT              AC006046          38,277-38,297
AluyTF      AluyTF.F     GTG CCT GGT AAA AAT TTA AGA GCT GTA      AC005530          7,150-7,177      422       710        56°C
            AluyTF.R     TGC ACC CGG CCT AAA ACC ACT GGT T        AC005530          7,836-7,859
AluyHJ      AluyHJ.F     AAG AAA CCC ATA ACT CAC TTG              AP000519          11,430-11,450    163       501        55°C
            AluyHJ.R     TGT GTC CAG GTT AAA CTT CAG              AP000519          11,909-11,929
AluyHF      AluyHF.F     GCC TCA TGG CCT GAA TCT GCC AGT GTC CTT  AP000521          124,367-124,396  458       605        59°C
            AluyHF.R     GTA ACT GAC GTG CCC TCT ATA GTA TAG TCT  AP000521          124,794-124,825

Table 2: The frequency of the observed genotypes, allele frequencies, Hardy-Weinberg significance and heterozygosity for AluyMICB, AluyTF, AluyHJ and AluyHF in the Bedouin population.

Aluy locus  n    Genotypes observed (a)  Allele frequencies   Chi-squared  p value  Heterozygosity
                 1,1   1,2   2,2         Allele*1  Allele*2
AluyMICB    89   65    22    2           0.854     0.146      0.007        0.931    0.249
AluyTF      91   70    20    1           0.890     0.120      0.157        0.745    0.196
AluyHJ      91   50    38    3           0.758     0.242      1.758        0.185    0.367
AluyHF      91   53    35    3           0.775     0.225      0.944        0.330    0.349

(a) Genotypes: 1,1 homozygote absent; 1,2 heterozygote; 2,2 homozygote present.

Table 3: The allele frequencies of four MHC POALINs in 9 different populations used for genetic distance calculation.

Population                         n      POALIN allele*2 frequencies (b, c)    Reference
                                          AluyMICB  AluyTF  AluyHJ  AluyHF
Bedouins                           89-91  0.146     0.110   0.242   0.225
Australian Caucasian               105    0.157     0.107   0.073   0.038       (24)
Japanese                           87     0.118     0.083   0.376   0.064       (25)
Malaysian Chinese                  50     0.170     0.040   0.300   0.030       (23)
North-Eastern Thai                 192    0.117     0.086   0.292   0.018       (28)
Mongolian Khalkh                   41     0.378     0.220   0.293   0.098
South African South Eastern Bantu  50     0.030     0.100   0.070   0.090
South African Kung San             42     0.036     0.283   0.107   0.060
South African Sekele San           60     0.050     0.034   0.050   0.083

(b) POALIN = polymorphic Alu insertions.
(c) The alternative allele (allele*1) = 1 - frequency of allele*2.

Table 4: Genetic distance values from the four POALIN allele frequencies in nine different populations.

Population                            Genetic distance values
                                      1       2       3       4       5       6       7       8       9
1 Australian Caucasian                -
2 Japanese                            0.0119
3 North-Eastern Thai                  0.0110  0.0027
4 Chinese                             0.0109  0.0035  0.0016
5 Mongolian Khalkh                    0.0267  0.0308  0.0272  0.0234
6 South African South Eastern Bantu   0.0150  0.0299  0.018   0.0233  0.0526
7 South African Kung San              0.0272  0.0384  0.0257  0.0367  0.0497  0.0101
8 South African Sekele San            0.0158  0.0315  0.0191  0.0219  0.0542  0.0013  0.0181
9 Bedouins                            0.0003  0.0149  0.014   0.0143  0.0306  0.0147  0.0273  0.0158
Root                                  0.0275  0.0392  0.0237  0.0276  0.0678  0.0039  0.0212  0.0020  0.0280

[Figure 3 (phylogenetic tree): tips, in order, are Mongolian Khalkh, Japanese, North-Eastern Thai, Malaysian Chinese, Australian, Bedouins, South-African Kung San, South-African South Eastern Bantu, South-African Sekele San and Root, with bootstrap values at the nodes and a scale bar of 0.002.]

Figure 3: Phylogenetic relationship of Bedouins and other studied populations using calculated distances based on frequency data from the four studied POALINs.


DISCUSSION

The insertion frequencies of the MHC POALINs in different populations are generally less than 0.4. Thus, the Alu insertion frequencies of the MHC POALINs are lower than those of many other chromosomal POALINs that have been studied in other populations [23]. This has allowed these markers to afford a closer comparison of specific populations, such as the African subpopulations, and to resolve their similarities and differences more precisely.

In this study, we have applied, to the Bedouin population, four MHC POALIN lineage and

linkage markers that have previously been found to be informative in investigating the

ancestral relationships between other populations. These markers have also been shown to

associate with specific groups of HLA class I alleles, microsatellites, and MHC ancestral

haplotypes, which together may help to better identify variation in linkage disequilibrium and

historical recombination events. For example, according to Dunn et al. [23], the AluyMICB*2

allele shows a strong association with four different HLA-B alleles: HLA-B13, HLA-B44,

HLA-B48, and HLA-B57. The AluyHJ*2 allele is strongly associated with HLA-A24 and with

HLA-A1. The association of these Aluy insertions and the distribution of HLA alleles suggests

that there may have been recombination between different haplotypes, rather than separate Alu

insertion events in individuals carrying various HLA alleles [9].

The highest allele frequency of any MHC POALIN insertion in the Middle Eastern Bedouin

samples was 0.242, detected for allele*2 of AluyHJ. When compared with other populations,

the AluyHJ allelic distribution in Bedouin individuals was similar to that reported by Dunn et

al. in Australian Caucasians [24]. Furthermore, the relative frequencies of the AluyHJ alleles place the Japanese, northeastern Thai, Malaysian Chinese, and Mongolian Khalkh in a separate cluster from either the present study population or the three previously examined African subpopulations.

Allele*2 of AluyHF had the second highest allele frequency in the Middle Eastern Bedouin samples. A comparison with data generated by Dunn and co-workers [24] indicates that the AluyHF allelic distribution in Bedouin individuals was, again, closest to the Australian Caucasian frequency of 0.038. Similar results were observed for allele*2 of AluyMICB, which had a frequency of 0.146 in Bedouins versus 0.157 in Caucasians. Moreover, AluyTF, with a frequency of 0.110, again resembled the Australian Caucasian population (0.107). Thus, while the Japanese, northeastern Thai, and Malaysian Chinese appear to share a similar allele frequency distribution, a distinct frequency of the AluyTF insertion in African subpopulations has given rise to a sparser topology on the phylogenetic tree (Fig. 3). These data are in accordance with the hypothesis that early humans originated in Africa (the “out-of-Africa” hypothesis), with the Middle East having acted as a crossroads from which populations then migrated east to Asia and northwest to Europe.

In forensic DNA applications, POALINs are potentially useful DNA markers for population

identification. Specifically, they can complement other markers used in forensic science by

assisting in identifying the racial background of individuals. The results presented in this study

should form a basis for research on further racial subpopulations, such as the Middle Eastern Bedouin, the larger Middle Eastern population, and others. This in turn may provide a more accurate and complete forensic population database for the region, and enhance the utility of POALINs as a forensic tool in these geographical regions.

It is of interest that there is some coincidence between the scientific findings and popular religious beliefs. For example, according to both Islamic and Christian scriptures, the earth was completely destroyed during a catastrophic flood, and the prophet Noah and his family were the sole survivors to continue the human race. According to the Qur’an (Surah Hud 11:27-51), the present population of the world is descended from Noah's three sons: Shem, Ham, and Japheth. It is believed that Africans are ancient descendants of Ham, that Shem is the founder of the Arabs and Caucasians, and that Asians are Japheth’s descendants. According to the Bible, all humans descend from Noah through his three sons, Shem, Japheth and Ham. Genesis lists seventy descendants of Noah, saying: “from these the nations were spread about in the earth” (Genesis 10:32). One of the many ways in which these nations have been classified is with reference to skin colour. The presence of melanin in human skin, which provides protection against the elements, is believed to be an important trait. Noah and his three sons all had a measure of this dark pigment. From Shem came the Babylonians, the Assyrians, the Jews and the Arabs, who vary from fair to light brown skin. The descendants of Japheth, who include the Indo-European races, vary from light skin to dark brown. As for Ham, some but not all of his descendants had dark skin. The Egyptians, with light-brown skin, descended from Ham’s son Mizraim. Therefore the Bible presents Egypt as the land of Ham (Psalms 78:51; 105:23, 27; 106:22). Research is required to unravel the mysteries of these texts and to shed light on the interracial relationships.

In summary, based on analysis of the four POALIN markers we have examined here, the

populations we analysed segregate into 3 phylogenetic groups: (1) the Asian subpopulation,

(2) the Bedouins and Caucasians, and (3) the three included African subpopulations. We hope

this study will stimulate further analyses of the Bedouin population, so that we may

understand better both their unique genetic background and the diseases that affect this group

of individuals.


ACKNOWLEDGEMENTS

We would like to thank Ali Ridha, Director of the Dubai Central Veterinary Research Laboratory (CVRL), for approving the work carried out for this study in the first instance.

Funding for this project was provided by the CVRL. Ms Alsafar is a PhD scholar at the

University of Western Australia supported by the Dubai Police General Head Quarters in the

United Arab Emirates.


REFERENCES

1. Leelayuwat, C., et al., A new polymorphic and multicopy MHC gene family related to

nonmammalian class I. Immunogenetics, 1994. 40(5): p. 339-51.

2. Kulski, J.K., et al., Comparative genomic analysis of the MHC: the evolution of class I

duplication blocks, diversity and complexity from shark to man. Immunol Rev, 2002.

190: p. 95-122.

3. Dawkins, R., et al., Genomics of the major histocompatibility complex: haplotypes,

duplication, retroviruses and disease. Immunol Rev, 1999. 167: p. 275-304.

4. Dawkins, R.L., Martin, E., Andreas-Ziets, A., Keller, Partanen, J., Arnaiz-Villena, A.,

Vicario, J.L. & Alper, C.A., Linkage disequilibrium, interlocus association and

ancestral haplotypes. Immunobiology of HLA, ed. B. Dupont. Vol. I. 1989, New York: Springer-Verlag. p. 891.

5. Dawkins, R.L., Degli-Esposti, M.A., Abraham, L.J., Zhang, W.J. & Christiansen, F.T. ,

Conservation versus polymorphism of the MHC in relation to transplantation, immune

responses and autoimmune disease. Molecular evolution of the major

histocompatibility, ed. J.K.D. Klein. 1991, Heidelberg: Springer-Verlag. p. 391.

6. Degli-Esposti, M.A., et al., Ancestral haplotypes: conserved population MHC

haplotypes. Hum Immunol, 1992. 34(4): p. 242-52.

7. Zhang, W.J., et al., Differences in gene copy number carried by different MHC

ancestral haplotypes. Quantitation after physical separation of haplotypes by pulsed

field gel electrophoresis. J Exp Med, 1990. 171(6): p. 2101-14.

8. Begovich, A.B., et al., Polymorphism, recombination, and linkage disequilibrium

within the HLA class II region. J Immunol, 1992. 148(1): p. 249-58.

9. Dunn, D.S., B.D. Tait, and J.K. Kulski, The distribution of polymorphic Alu insertions

within the MHC class I HLA-B7 and HLA-B57 haplotypes. Immunogenetics, 2005.

56(10): p. 765-8.

10. Skaug, H.J., Allele-sharing methods for estimation of population size. Biometrics,

2001. 57(3): p. 750-6.

11. Wakeley, J., et al., The discovery of single-nucleotide polymorphisms--and inferences

about human demographic history. Am J Hum Genet, 2001. 69(6): p. 1332-47.

12. Buffery, C., et al., Allele frequency distributions of four variable number tandem

repeat (VNTR) loci in the London area. Forensic Sci Int, 1991. 52(1): p. 53-64.


13. Batzer, M.A., et al., African origin of human-specific polymorphic Alu insertions. Proc

Natl Acad Sci U S A, 1994. 91(25): p. 12288-92.

14. Jurka, J., et al., Active Alu elements are passed primarily through paternal germlines.

Theor Popul Biol, 2002. 61(4): p. 519-30.

15. Batzer, M.A., et al., Structure and variability of recently inserted Alu family members. Nucleic Acids Res, 1990. 18: p. 6793-8.

16. Dunn, D.S., et al., Polymorphic Alu insertions and their associations with MHC class I

alleles and haplotypes in the northeastern Thais. Ann Hum Genet, 2005. 69(Pt 4): p.

364-72.

17. Walsh, E.C., et al., An integrated haplotype map of the human major histocompatibility

complex. Am J Hum Genet, 2003. 73(3): p. 580-90.

18. Anzai, T., et al., Comparative sequencing of human and chimpanzee MHC class I

regions unveils insertions/deletions as the major path to genomic divergence. Proc

Natl Acad Sci U S A, 2003. 100(13): p. 7708-13.

19. Hedrick, P.W., R.N. Lee, and D. Garrigan, Major histocompatibility complex variation

in red wolves: evidence for common ancestry with coyotes and balancing selection.

Mol Ecol, 2002. 11(10): p. 1905-13.

20. Mungall, A.J., et al., The DNA sequence and analysis of human chromosome 6. Nature,

2003. 425(6960): p. 805-11.

21. Takasu, M., et al., Deletion of entire HLA-A gene accompanied by an insertion of a

retrotransposon. Tissue Antigens, 2007. 70(2): p. 144-50.

22. Callinan, P.A., et al., Alu retrotransposition-mediated deletion. J Mol Biol, 2005.

348(4): p. 791-800.

23. Dunn, D.S., et al., The distribution of major histocompatibility complex class I

polymorphic Alu insertions and their associations with HLA alleles in a Chinese

population from Malaysia. Tissue Antigens, 2007. 70(2): p. 136-43.

24. Dunn, D.S., et al., The association between HLA-A alleles and young Alu dimorphisms

near the HLA-J, -H, and -F genes in workshop cell lines and Japanese and Australian

populations. J Mol Evol, 2002. 55(6): p. 718-26.

25. Dunn, D.S., et al., Association of MHC dimorphic Alu insertions with HLA class I and

MIC genes in Japanese HLA-B48 haplotypes. Tissue Antigens, 2003. 62(3): p. 259-62.


26. Yao, Y., et al., Polymorphic Alu insertions and their associations with MHC class I

alleles and haplotypes in Han and Jinuo populations in Yunnan Province, southwest of

China. J Genet Genomics, 2009. 36(1): p. 51-8.

27. Yao, Y., et al., The association between HLA-A, -B alleles and major

histocompatibility complex class I polymorphic Alu insertions in four populations in

China. Tissue Antigens, 2009. 73(6): p. 575-81.

28. Kulski, J.K. and D.S. Dunn, Polymorphic Alu insertions within the Major

Histocompatibility Complex class I genomic region: a brief review. Cytogenet Genome

Res, 2005. 110(1-4): p. 193-202.

29. Ceppellini, R., M. Siniscalco, and C.A. Smith, The estimation of gene frequencies in a

random-mating population. Ann Hum Genet, 1955. 20(2): p. 97-115.

30. Kumar, S., K. Tamura, and M. Nei, MEGA: Molecular Evolutionary Genetics Analysis. 1993, University Park, PA: Pennsylvania State University.


CHAPTER 6

A GENOME WIDE SEARCH FOR TYPE 2 DIABETES

SUSCEPTIBILITY GENES IN ARAB FAMILIES.

This chapter is a submission to Human Molecular Genetics and the format is presented as per the "Instruction to Authors" from the publishing house.


Chapter 6

A Genome Wide Search for Type 2 Diabetes Susceptibility

Genes in Arab Families.

Chapter 6 was prepared as a manuscript and has been submitted to Human Molecular

Genetics. The aim of the study presented in this manuscript was to identify loci that could

potentially influence susceptibility to Type 2 Diabetes (T2D) in patients of Arab descent within

the United Arab Emirates (UAE) population. Data on DNA haplotypes in the tribes of the Middle East are limited, and recent advances in DNA technology have provided the opportunity to study this ethnic group. In this specific study, high-throughput DNA arrays were used to study Single Nucleotide Polymorphisms (SNPs) and their influence on Type 2 Diabetes among Arabs.

To date, no genome-wide screen for genetic factors of Type 2 Diabetes has been reported in the UAE population or in any other Arab population. To address this, the first Genome Wide Association Study in Bedouins was performed on 178 volunteers from the DNA repository developed for this particular study, using Illumina's Human 660W-Quad-BeadChip. Work in Caucasians has previously defined genetic susceptibility regions on Chromosomes 3, 6, 8, 9, 10, 11, 16, and 17. Analysis of data from this study has revealed potential candidate genes on Chromosome 14.

This study revealed some novel genes in the etiology of Type 2 Diabetes in the Arab population of the

UAE. The strongest associations were found within the PRKD1 region on 14q11 of

chromosome 14. Associations with the genes RBM47, KCTD8, GABRB1, SCD5, OC90 and TG

were observed as well. The fact that PRKD1 has not been found in previous studies may

be due to chance sampling variation or power differences, or may be explicable in terms

of a higher level of genetic and environmental heterogeneity in the other populations

compared with the Arab population. Further replication and fine

mapping in a larger cohort of Arab descent will be

essential to validate the results presented here.

My colleagues and I have prepared this manuscript. I performed all laboratory work at

Central Veterinary Research Laboratory (CVRL) under the guidance of Dr Khazanehdari. I

performed the data analyses with assistance from co-authors. Specifically, Dr Jafer provided

technical assistance and Dr Jamieson assisted with the statistical analysis. Drs Cordell and

Blackwell provided endless support and advice regarding the statistical methods and analyses.

Dr Tay guided me throughout the study, from designing the study to proofreading the

manuscript. All the co-authors have proofread the manuscript.

A Genome Wide Search for Type 2 Diabetes Susceptibility Genes in Arab Families.

Habiba S Al Safar1, 2, Heather J Cordell3, Osman Jafer4, Sarra E Jamieson5, Kamal Khazanehdari4,

Jenefer M Blackwell5, 6, Guan K Tay1

1 Centre for Forensic Science, The University of Western Australia, Crawley Western Australia.

2 Dubai Police General Head Quarters, Dubai, United Arab Emirates.

3 Institute of Human Genetics, Newcastle University, Newcastle upon Tyne, United Kingdom.

4 Molecular Biology and Genetics Laboratory, Central Veterinary Research Laboratory, Dubai,


5 Telethon Institute for Child Health Research, Centre for Child Health Research, The University

of Western Australia, Subiaco, Western Australia.

6 Cambridge Institute for Medical Research and Department of Medicine, School of Clinical,







Phone: + 61 8 6488 7286

Fax: + 61 8 6488 7285


ABSTRACT

Type 2 Diabetes (T2D) is currently the fastest growing debilitating disease in the world. In the

United Arab Emirates (UAE), it has been estimated that one out of five people between the ages

of 20 to 79 lives with this disease. Due to an increasing prevalence of T2D in the region, lifestyle

management strategies with an emphasis on prevention are required. Determining genetic risk

factors can also make an important contribution to understanding the processes leading to disease.

A genome wide association study (GWAS) using a family based association test (FBAT) was undertaken in an

extended family of 178 members from the UAE (66 diabetic and 112 healthy individuals), who were

genotyped using the Illumina Human 660 Quad chip array, in order to identify

gene(s) and mechanisms associated with disease.

The study revealed 21 new association signals from single nucleotide polymorphisms (SNPs)

within five genes (RBM47, KCTD8, GABRB1, SCD5 and PRKD1). Six SNPs within PRKD1 on

chromosome 14 were found to be most strongly associated with T2D in this Arab population. It

has been suggested that PRKD1, a serine/threonine kinase, plays an important role in insulin

secretion. The strongest statistical evidence for this new association signal was from rs10144903

in intron 1 of PRKD1, with the overall estimate of effect returning an odds ratio of 3.72 (95%

confidence interval, 1.28-10.82; p-value = 3.92E-06). This study is the first GWAS for T2D in

families of Arab descent, and these findings may provide important insights into the pathogenesis

of T2D in Middle Eastern populations. Comparative analysis with other ethnic groups could assist

in dissecting the mechanisms that cause the disease.

INTRODUCTION

Diabetes mellitus is a group of metabolic diseases characterised by hyperglycemia resulting from

defects in insulin secretion, insulin action, or both [1, 2]. Diabetes is one of the most prevalent

chronic diseases. It results in significant morbidity and contributes to the death of millions of

people worldwide. Currently, over 170 million people globally suffer from Type 2 Diabetes

(T2D) [3]. Most of these patients are middle aged. However, earlier age-of-onset is becoming

more common as a result of changes in lifestyle and behavioural factors interacting with genetic

predispositions. Ethnicity is also a risk modifier as people of certain ethnic backgrounds are more

likely to develop diabetes than others. It has been reported that African Americans, Hispanic

Americans, American Indians, some Asian Americans and Pacific Islanders are particularly at

high risk for T2D [4], however the genetic factors that account for this observation have yet to be

identified.

As suggested, genetics plays a role in the disease, but exactly how certain genes may cause

diabetes is unknown. An understanding of the genetic basis of T2D could lead to the development

of new treatments to target the problem. With the increase in prevalence of diabetes worldwide,

the need for intensive research is of high priority [5]. Towards this, researchers now have access

to a set of powerful tools that make it possible to find the genetic contributions to common

diseases. Microarray technology has given rise to high throughput and high-density strategies

such as genome-wide association studies (GWAS). Combined with the scaffold data of the human

genome from the Human Genome Project, completed in 2003 [6], and the International HapMap

Project in 2005 [7], it is now possible to analyse whole-genome samples for genetic variations

that contribute to common disease in a fast and efficient manner.

The use of GWAS has greatly increased the number of confirmed genetic loci for T2D in many

different populations, such as Pima Indian [8], Mexican American [9], Amish [10], French [11],

Japanese [12], Icelandic [13], Finnish [14], Chinese [15], Korean [15], Caucasian [16-19] and

Swedish [20]. Moreover, some of the mapped loci have been observed to be common across

multiple populations. For example, the single nucleotide polymorphism (SNP) rs7903146 in

the TCF7L2 gene has been found to be associated with T2D in French, Japanese, Finnish, Irish,

British, Israeli and German populations [11, 13, 17, 19, 21-24]. Other regions, however, may be

unique to specific populations (e.g. rs2237892 in KCNQ1 has been found exclusively in the Japanese

population) [12, 15]. This may reflect underlying phenotypic heterogeneity, racial/ethnic

differences in susceptibility allele frequencies, or differences in sample size, study design and

analytical methods. Understanding the similarities and differences in ethnic-specific associations and in

the genetic make-up of different ethnic groups, particularly for a disease that occurs

globally, is important for unravelling its genetic architecture.

In contrast to most major population groups, Middle Eastern populations have been little studied, which has

created a serious gap in understanding the trends of common diseases such as diabetes within these

populations. Compounding the problem is the fact that T2D has become a major public health

problem in the UAE as the level of affluence has increased. Malik et al. (2005) have estimated

that 25% of UAE citizens suffer from T2D [25] and the prevalence of the disease is increasing

[26].

This GWAS was conceived to investigate and identify the genes that may influence susceptibility

to T2D in an Arab family originating from the UAE. The project focussed specifically on an

indigenous Arab population. Characteristics of the Arab population, such as a high rate of

consanguineous marriage, a high birth rate and lifestyle, make it ideal for the study of

complex, polygenic, multifactorial disorders such as T2D. Therefore, to investigate the genetic

factors of T2D in this population, a family based association test (FBAT) study in an extended family

of 178 members from the UAE (66 diabetic and 112 healthy individuals) was undertaken using the

Human 660 Quad chip by Illumina.

RESULTS

The study cohort comprised 178 individuals from one extended family (319 members) of Arab

descent: 86 males and 92 females, of whom 66 were diabetes

patients and 112 were healthy individuals. The age of the study group ranged from 18 to 95 years,

with a mean of 37.35 years (95% confidence interval, 34.33 to 40.17 years).

Table 1 summarises the basic characteristics of the cohort selected for the GWAS.

The association p-values (Manhattan plot) from the FBAT analysis are shown in Figure 1. Groups

of SNPs with p-values below a specific threshold (p-value = 1E-4) were examined in detail. The

top scoring SNPs for association with T2D, which mapped to chromosomes 4, 8 and 14, are

shown in Table 2. The most significant p-values ranged from 2.7E-05 to 8.46E-06 for six SNPs in

the Protein Kinase D1 (PRKD1) gene on chromosome 14 (Figure 1 and Table 2). The strongest

statistical evidence for a novel association signal was from the SNP rs10144903 in intron 1 of

PRKD1, with the overall estimate of effect returning an odds ratio of 3.72 (95% confidence

interval, 1.28-10.82) (p-value = 3.92E-06) using an additive model. The PRKD1 gene association

has not been reported in any previous study and represents a novel observation. Other SNPs that

showed association with T2D (p-value ≤ 1E-04) include a cluster of SNPs on Chromosome 4

(RBM47 [4p13-p12], KCTD8 [4p13], GABRB1 [4p12] and SCD5 [4q21.22]) and Chromosome 8

(OC90 [8q24.22] and TG [8q24]), as summarised in Table 2.

To investigate the association of PRKD1 gene polymorphisms with T2D, we calculated pairwise

LD coefficients, namely D' and r2, for the six associated intronic SNPs in PRKD1:

rs11626603, rs11622611, rs4981716, rs1953722, rs10144903 and rs7154546. Three haplotype

blocks were observed (Figure 2) across the PRKD1 locus, with all six associated SNPs mapping to

the largest LD block, block 2. These six SNPs form six haplotypes (AAGAAG, GGAGCA, AAGGAG,

AGGGCG, AAGGCA and GGAGAG). The haplotypes and their frequencies are shown in Table

4 and Table 5, which illustrate that only two haplotypes occur at any appreciable frequency.

Analysis in UNPHASED indicated that none of the remaining five SNPs were significant when

added to a model that included the effect of the most significant SNP, rs10144903 (see Figure 3),

i.e. all the association in the region can be accounted for by rs10144903. However, given the strong

LD between the SNPs, any of these other five SNPs could equally well account for the observed

association.
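
The pairwise LD coefficients mentioned above can be illustrated with a minimal sketch. It assumes the six PRKD1 haplotype frequencies reported in this chapter (ht 1 to ht 6) and treats each SNP as biallelic; the function name `pairwise_ld` is hypothetical, and Haploview itself estimates LD from genotype data, so this only shows the arithmetic behind D' and r2 rather than the software's procedure.

```python
def pairwise_ld(hap_freqs, i, j):
    """Return (D, D', r^2) for biallelic loci i and j (0-based positions).

    hap_freqs maps a haplotype string to its frequency. The allele carried
    by the first haplotype at each locus is used as the reference allele.
    """
    total = sum(hap_freqs.values())
    first = next(iter(hap_freqs))
    ref_i, ref_j = first[i], first[j]

    p_i = sum(f for h, f in hap_freqs.items() if h[i] == ref_i) / total
    p_j = sum(f for h, f in hap_freqs.items() if h[j] == ref_j) / total
    p_ij = sum(f for h, f in hap_freqs.items()
               if h[i] == ref_i and h[j] == ref_j) / total

    d = p_ij - p_i * p_j
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_i * (1 - p_j), (1 - p_i) * p_j)
    else:
        d_max = min(p_i * p_j, (1 - p_i) * (1 - p_j))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Haplotype frequencies as tabulated for the PRKD1 region (SNP order:
# rs11626603, rs11622611, rs4981716, rs1953722, rs10144903, rs7154546).
PRKD1_HAPLOTYPES = {
    "AAGAAG": 0.721,
    "GGAGCA": 0.200,
    "AAGGAG": 0.046,
    "AGGGCG": 0.013,
    "AAGGCG": 0.013,
    "GGAGAG": 0.007,
}

# LD between rs10144903 (position 4) and rs7154546 (position 5).
d, d_prime, r2 = pairwise_ld(PRKD1_HAPLOTYPES, 4, 5)
print(f"D = {d:.4f}, D' = {d_prime:.3f}, r2 = {r2:.3f}")
```

Under these assumptions the two SNPs are in complete LD by D' and in strong LD by r2, which is in line with the statement that any of the six SNPs could account for the association.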

Table 3 lists the previous GWAS and subsequent meta-analyses that have identified risk loci

associated with T2D to date [18]. No association was detected at these loci in the study described

here, with the exception of the WFS1/PPP2R2C locus (rs4689388, p-value = 0.006). This locus

was previously associated with T2D in a French population [11] at p-value < 1.00E-5 and has

subsequently been confirmed as a T2D risk locus in other replication studies [27].

To explore the biological pathways involving PRKD1, we sought to identify significant

networks among genes previously known to be associated with the PRKD1 pathway and possibly

with T2D. Ingenuity™ Pathway Analysis (IPA) was used to generate networks from the dataset of genes that

fall within the PRKD1 network (Figure 4). PRKD1 belongs to the protein kinase C family, members of

which function in many extracellular receptor-mediated signal transduction pathways. PRKD1

itself, also known as protein kinase C mu (PRKCM) and protein kinase D (PKD), encodes a

cytosolic serine-threonine kinase that binds to the trans-golgi network and regulates the fission of

transport carriers specifically destined to the cell surface (OMIM:

http://www.ncbi.nlm.nih.gov/omim/605435).

Since almost all of the genes identified in this study had not previously been associated with T2D

in other studies, we were interested to identify the underlying genetic ancestry of this Arab

population compared to other populations for which HapMap data were available. We therefore

compared the genotype data for ancestors in our cohort with genotype data from the CEU,

JPT+CHB and YRI populations using multidimensional scaling (a form of principal components

analysis (PCA)) undertaken using the PLINK software (Figure 5). Scatter plots of the main axes of

variation, PC1 and PC2, show that the Arab population is more closely related to populations of

European (Caucasian) descent than to those of Asian or African descent. However, our Arab data are less well

clustered than the data from the three HapMap populations, suggesting that there may be some

population stratification within this Arab cohort. This was controlled for in our study by using a

family-based study design.
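
Multidimensional scaling of genotype data is generally described as operating on a matrix of pairwise identity-by-state (IBS) distances between individuals. The sketch below shows how such a matrix could be built from genotypes coded as 0, 1 or 2 copies of an allele. The function names and the toy genotypes are hypothetical and are not taken from the study data or from PLINK.

```python
def ibs_distance(g1, g2):
    """IBS distance between two individuals.

    Genotypes are coded as 0, 1 or 2 copies of a chosen allele; None marks a
    missing call. At each SNP the pair shares 2 - |a - b| alleles out of 2.
    """
    shared = 0
    n_snps = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs with a missing call in either individual
        shared += 2 - abs(a - b)
        n_snps += 1
    if n_snps == 0:
        raise ValueError("no SNPs with calls in both individuals")
    return 1.0 - shared / (2.0 * n_snps)


def ibs_distance_matrix(genotypes):
    """Symmetric matrix of pairwise IBS distances for a list of individuals."""
    n = len(genotypes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = ibs_distance(genotypes[i], genotypes[j])
            matrix[i][j] = dist
            matrix[j][i] = dist
    return matrix


# Toy example: three hypothetical individuals typed at five SNPs.
toy = [
    [0, 1, 2, 0, 1],
    [0, 1, 2, 0, 2],
    [2, 1, 0, 2, None],
]
for row in ibs_distance_matrix(toy):
    print(["%.2f" % value for value in row])
```

The leading axes of variation (PC1 and PC2 in Figure 5) would then be obtained by applying classical multidimensional scaling to this distance matrix, a step omitted here for brevity.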

Table 1: Description of phenotypic and clinical characteristics of 178 individuals belonging

to one extended family of Arab origin from the UAE.

Total sample size (N) 178

Generations 5

Number of Nuclear Families 41

Gender (number of females) 92

Number of T2D patients 66

T2D Patient: Age (years) 18-87

Normal: Age (years) 18-97

Figure 1: p-values for GWAS SNPs tested for association with Type 2 Diabetes among 178 individuals belonging to one

extended family of Arab origin from the UAE. Horizontal axis shows SNP location and vertical axis is -log10(p-

value) for each SNP tested by FBAT. Blue horizontal line depicts significance threshold (p = 1E-4) and shows

associated SNPs clustering on chromosomes 4, 8 and 14.
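
The vertical axis of the Manhattan plot and its threshold line can be illustrated with a short sketch. It converts a few of the p-values reported in Table 2 to the -log10 scale and compares them with the p = 1E-4 threshold; the variable and function names are hypothetical.

```python
import math

THRESHOLD_P = 1e-4  # significance threshold used in this chapter


def neg_log10(p):
    """Transform a p-value to the -log10 scale used on a Manhattan plot."""
    return -math.log10(p)


# A few p-values taken from Table 2.
table2_pvalues = {
    "rs10144903": 3.92e-06,
    "rs7154546": 8.46e-06,
    "rs6998423": 1.00e-04,
}

threshold_line = neg_log10(THRESHOLD_P)  # height of the horizontal line
for snp, p in table2_pvalues.items():
    flag = "passes" if p <= THRESHOLD_P else "fails"
    print(f"{snp}: -log10(p) = {neg_log10(p):.2f} ({flag} threshold "
          f"at {threshold_line:.1f})")
```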

Table 2: SNPs showing most significant associations with T2D using FBAT analysis under an additive model. Six SNPs within the PRKD1 gene on chromosome 14 are associated with T2D in the Arab population. The strongest statistical evidence for association was with rs7154546 in intron 1 of PRKD1, with the overall estimate of effect returning a Z score of 4.45 for the minor allele, with a p-value of 8.46E-06. A cluster of SNPs on Chromosome 4 (RBM47, KCTD8, GABRB1 and SCD5) and Chromosome 8 (OC90 and TG) also showed association with T2D with p-value ≤ 1E-04.

Chr  SNP         Position   Type     Risk Allele  Z(a)  Allele freq  p-value   Gene(b)
4    rs10024216  38440507   Unknown  A            4.44  0.557        8.74E-06  -
4    rs1871836   40322700   Intron   G            4.28  0.341        1.80E-05  RBM47
4    rs7675224   44049621   Intron   A            4.14  0.81         3.50E-05  KCTD8
4    rs4407541   44076716   Intron   A            4.42  0.693        9.70E-06  KCTD8
4    rs4695718   44107694   Intron   A            4.14  0.776        3.50E-05  KCTD8
4    rs13144404  44130442   Intron   G            4.14  0.208        3.50E-05  KCTD8
4    rs7692404   45570356   Unknown  A            3.93  0.625        8.30E-05  -
4    rs10517178  46797750   Intron   G            4.60  0.428        4.19E-06  GABRB1
4    rs1372491   46804117   Intron   A            4.60  0.574        4.19E-06  GABRB1
4    rs6535363   83781593   Intron   G            4.52  0.747        6.08E-06  SCD5
4    rs6813901   83784174   Intron   A            4.01  0.19         6.00E-05  SCD5
4    rs6822801   83795853   Intron   A            4.01  0.19         6.00E-05  SCD5
8    rs748978    130072298  Unknown  G            4.19  0.122        2.70E-05  -
8    rs2202068   133114662  Intron   A            4.18  0.836        2.80E-05  OC90
8    rs6998423   134058472  Intron   G            3.89  0.405        1.00E-04  TG
14   rs11626603  29264650   Intron   G            4.19  0.816        2.70E-05  PRKD1
14   rs11622611  29270280   Intron   G            4.31  0.788        1.60E-05  PRKD1
14   rs4981716   29278774   Intron   A            4.19  0.183        2.70E-05  PRKD1
14   rs1953722   29300389   Intron   G            3.94  0.739        8.00E-05  PRKD1
14   rs10144903  29342060   Intron   C            4.61  0.787        3.92E-06  PRKD1
14   rs7154546   29349734   Intron   A            4.45  0.165        8.46E-06  PRKD1

(a) Positive Z values indicate a positive association of the minor allele with disease.

(b) Gene information extracted from the University of California Santa Cruz (UCSC) Genome Browser.
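
The relationship between the Z scores and p-values in Table 2 can be checked with a minimal sketch, assuming that each p-value is the two-sided tail probability of a standard normal Z statistic. The function name is hypothetical, and small differences from the tabulated p-values are to be expected because the Z scores are rounded to two decimals.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic z."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Z scores and reported p-values for three SNPs in Table 2.
examples = {
    "rs10144903": (4.61, 3.92e-06),
    "rs7154546": (4.45, 8.46e-06),
    "rs6998423": (3.89, 1.00e-04),
}

for snp, (z, reported_p) in examples.items():
    print(f"{snp}: Z = {z:.2f}, computed p = {two_sided_p_from_z(z):.2e}, "
          f"reported p = {reported_p:.2e}")
```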

(A)

(B)

Figure 2: Haplotype blocks in PRKD1 generated by Haploview. Three haplotype blocks were

identified in PRKD1. Block 2 contains all six of the associated SNPs (rs11626603,

rs11622611, rs4981716, rs1953722, rs10144903 and rs7154546). (A) Colour

scheme of the LD map is based on the standard D'/LOD option in the Haploview

software. Values in the boxes at the diagonal intersection of two

polymorphisms indicate the D′ values; boxes with no value indicate complete LD

(i.e. D′ = 1). (B) r2 values across the PRKD1 region.

Table 4: Six possible haplotypes and their frequencies for the six associated SNPs in the PRKD1 region using FBAT.

Haplotype  rs11626603  rs11622611  rs4981716  rs1953722  rs10144903  rs7154546  Frequency  p-value
ht 1       A           A           G          A          A           G          0.721      0.00115
ht 2       G           G           A          G          C           A          0.200      0.00035
ht 3       A           A           G          G          A           G          0.046      0.66385
ht 4       A           G           G          G          C           G          0.013      0.14412
ht 5       A           A           G          G          C           G          0.013      0.30125
ht 6       G           G           A          G          A           G          0.007      0.17971

Table 5: UNPHASED single-locus analysis of the six associated SNPs in the PRKD1 region, with their risk allele, chi-

square, odds ratio and 95% confidence interval (low and high).

Marker      Allele  Chisq  p-value  Odds ratio  95% CI low  95% CI high
rs11626603  G       8.88   0.0028   2.76        1.14        6.70
rs11622611  G       9.57   0.0019   2.90        1.18        7.16
rs4981716   G       8.88   0.0028   0.36        0.14        0.87
rs1953722   G       9.54   0.0020   2.77        1.15        6.66
rs10144903  C       11.88  0.0005   3.72        1.28        10.82
rs7154546   G       10.32  0.0013   0.30        0.10        0.84
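
The single-SNP UNPHASED results above can be sanity-checked with a short sketch. It assumes that each chi-square statistic has one degree of freedom and that the confidence intervals are symmetric on the log odds scale; both are assumptions made for illustration and are not stated in the table. The function names are hypothetical.

```python
import math


def p_from_chisq_1df(chisq):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # A 1-df chi-square is a squared standard normal, so
    # P(X >= x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chisq / 2.0))


def se_from_ci(ci_low, ci_high, z=1.96):
    """Back-calculate the standard error of log(OR) from a 95% CI."""
    return (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)


def wald_ci(odds_ratio, se_log_or, z=1.96):
    """95% confidence interval built on the log odds scale."""
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


# Values reported for rs10144903.
chisq, odds_ratio, ci_low, ci_high = 11.88, 3.72, 1.28, 10.82

print(f"p-value from chi-square: {p_from_chisq_1df(chisq):.4f}")

# If the interval is symmetric on the log scale, the odds ratio should be
# close to the geometric mean of the two bounds.
print(f"geometric mean of CI bounds: {math.sqrt(ci_low * ci_high):.2f}")

se = se_from_ci(ci_low, ci_high)
low, high = wald_ci(odds_ratio, se)
print(f"reconstructed 95% CI: {low:.2f} to {high:.2f}")
```

The same check shows that the odds ratios below 1 (rs4981716 and rs7154546) are close to the reciprocals of the other four, as expected when the listed allele is the complementary, non-risk allele.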

Figure 3: A locus zoom plot of -log10(p-values) across the PRKD1 region around rs7154546

(red star) within the last intron of PRKD1 gene on chromosome 14 shown to be

strongly associated with T2D in Arab population. The colouring of SNPs indicates

the strength of LD with rs7154546, coded as red (strong, r2 ≥ 0.8), blue (moderate,

0.2 < r2 ≤ 0.4), dark blue (weak, r2 ≤ 0.2). The blue line depicts local recombination

rates.

Table 3: Genes showing genome-wide significant association with T2D in previous studies

among different populations, and their p-values in this study. This study

demonstrated an association with WFS1, PPP2R2C.

Gene            SNP         Population*                 p-value  Reference
TCF7L2          rs7903146   FR, JP, FI, IS, UK, IL, DE  0.704    [11-14, 17, 19, 21, 22, ,24]
                rs7901695                               0.777
SLC30A8         rs13266634  FR, JP, FI, IS, UK          0.165    [11-14, 17, 21, 22]
HHEX            rs1111875   FR, JP, FI                  0.770    [12, 14, 21, 22]
                rs5015480                               0.826
FTO             rs8050136   FI, UK                      0.250    [14, 17, 21]
                rs5215                                  0.456
LOC64673,IRS1   rs2943641   FR                          0.254    [11]
WFS1,PPP2R2C    rs4689388   FR                          0.006    [11,27]
LOC72901,CETN3  rs12518099  FR                          0.705    [11]
IGF2BP2         rs4402960   JP, FI                      1.000    [12, 14, 21, 73]
MTNR1B          rs1387153   FR                          0.973    [74]
VEGFA           rs9472138   UK                          0.334    [19]
CETP            rs1800775   FI, SE                      0.220    [20]
APOB            rs693       FI, SE                      0.426    [20]
Intergenic#     rs1859962   UK                          0.168    [24]
                rs6712932   UK                          0.420

* France (FR), Japan (JP), Finland (FI), Iceland (IS), United Kingdom (UK) mainly Caucasian, Israel (IL), Germany (DE), Sweden (SE).

# rs1859962 is located on chromosome 17 and rs6712932 is located on chromosome 2.

Figure 4: Pathway of PRKD1 generated using Ingenuity Pathway Analysis to identify

networks among the early genes, altered in the PRKD1, associated with Type 2

Diabetes. Gray shaded shapes in PRKD1, GABRB1 and TG genes depict direct or

indirect role in etiology of the T2D.

Figure 5: Scatter plot of principal component 1 and principal component 2 for the Arab

population (Red) with three continental clusters: Europe (Green), Asia (Blue) and

Africa (Black). The Arab population is clearly closer to the European (Caucasian) cluster than to

the Asian and African clusters.

DISCUSSION

GWAS have been very effective in mapping disease susceptibility genes. Susceptibility

loci for T2D have been mapped in many different populations; some have been observed

in multiple populations and some are unique to a specific population. To date, there is a

lack of GWAS performed on Middle Eastern populations, which gives little opportunity to

understand the aetiology of common disease in these populations. In this study, the goal was to

investigate the genes influencing susceptibility to T2D in the UAE population,

specifically in individuals of Arab origin. The GWAS cohort was analysed using FBAT after performing quality

control on the data using PLINK.

The study was conceived to detect SNPs with modest influence on T2D among the Arab

population. However, given the relatively small sample size, we were only well-powered to detect

fairly strong effects, and indeed our most significant finding (in PRKD1) had an allelic OR of

3.72 (95% confidence interval, 1.28-10.82). Our family-based design, consisting of a single large

pedigree, offers the opportunity to detect risk alleles that correlate with disease within the

pedigree. As such, our results may perhaps better be considered as indicative of linkage in the

presence of association rather than of association per se. Previous studies and subsequent meta-

analyses have identified 17 risk loci associated with T2D in various populations (Table 3), of which

only one (rs4689388) showed modest replication in the study presented here. Another study

showed that the rs7903146 and rs12255372 variants of TCF7L2 are strongly associated with

T2D risk in most populations [19]. Evidence that variants in this gene may be associated with

T2D in the study presented here was sought, but the p-value was not significant. In

addition, recent studies of the rs7903146 variant in Arab populations of Saudi and Emirati origin

reported weak or no association with T2D [28, 29]. However, Ereqat et al. (2009) have shown a

significant association of the rs7903146 variant of TCF7L2 with T2D in a Palestinian

population [30].

New association signals were observed at SNPs within seven genes (OC90, TG, RBM47, GABRB1, SCD5, KCTD8

and PRKD1). Since this study is the first GWAS for T2D candidates in families of Arab descent,

these findings may provide new insights into the pathogenesis of T2D.

One SNP in Otoconin-90 (OC90) gene was positive for association in this study. OC90 encodes

the predominant protein constituent of vestibular otoconia [31]. To date neither the functions of

otoconial proteins nor the process of otoconia genesis are clearly defined. OC90 is the major

protein component of otoconia with sequence (but most likely not functional) homology to

phospholipase A2 [32]. OC90 accounts for 90% of the total otoconial protein which renders the

receptor cells of the vestibular system [31]. In addition otoconia is a key element of the inner ear,

which is responsible for the perception of motion and gravity. Given that Diabetes Mellitus is a

disorder of glucose metabolism, it can be linked with vestibular dysfunction by neuropathy or

nerve damage, which is a common complication in T2D. In 2008, Bainbridge and colleagues

found an increased prevalence of hearing impairment among patients with diabetes [33]. The

study indicated that diabetes may lead to hearing loss by damaging the nerves and blood vessels

of the inner ear. The present study suggests that OC90 warrants further exploration in T2D patients with auditory

neuropathy, to determine whether there is a direct cause-and-effect relationship. Despite intensive study,

the mechanism of otoconia formation is still a matter of debate.

A second series of SNPs with association signals of interest was found in thyroglobulin (TG). TG

encodes the glycoprotein precursor to the thyroid hormones T3 (triiodothyronine) and T4

(tetraiodothyronine). Dumont et al. (1989) noted that thyroglobulin provides three things: a

thyroid hormone precursor, storage of iodine, and storage of inactive thyroid hormones [34].

Further evidence for the association of TG with diabetes comes from various data that show a

strong genetic influence on the shared susceptibility to Type 1 Diabetes (T1D) and autoimmune

thyroid disease (AITD) [35-37]. Most of the genes that contribute to the joint susceptibility to

T1D and AITD are involved in immune regulation. Huber et al. (2008) suggested that the

association of AITD with T1D is influenced by HLA [37]. In addition, adult T1D

patients with no history of thyroid disease showed a notably higher thyroid volume

than age and sex matched controls [35].

RBM47 is another gene found in this study to be associated with T2D. The gene encodes an RNA

binding protein, which is a key element in RNA metabolism, regulating the temporal, spatial and

functional dynamics of RNAs [38]. Recent genetic and proteomic information and evidence from

animal models reveal that RNA binding proteins are involved in many human diseases [39-41].

However there is no compelling functional evidence for the association between SNPs in RBM47

and T2D. Nevertheless future studies defining the expression, RNA targets and protein

interactions of RBM47 in relevant tissues, as well as characterisation of the metabolism in the

RBM47 knock-out mouse may provide additional clues. Moreover, resequencing may be

necessary to identify causal variants in RBM47 and might support the direct involvement of

RBM47 in T2D.

The data presented here show two SNPs positive for association in the GABRB1 gene

(gamma-aminobutyric acid receptor 1). Gamma-aminobutyric acid (GABA) receptors are a family of

proteins involved in neurotransmission in the mammalian central nervous system [42] and in the

inhibition of glucagon release mediated by β cells [43]. Baily et al. (2007) showed that the

released GABA receptor from pancreatic β cells inhibits the secretion of glucagon by 50% to 60%

in both pancreatic mouse islets and murine alpha TC1-9 cell. The authors showed that the

inhibition depends on glucose concentration. The overall inhibition effect of GABA with 5 or 10

mmol/l glucose on glucagon release is 15% or 40% respectively [44]. They have also shown that

glucose dose dependently increased the expression of GABA receptors.

SCD5 (stearoyl-CoA desaturase 5), another of the genes that we found to contain three SNPs positive for

association (Table 2), belongs to the same family as stearoyl-CoA desaturase 1 (SCD1). Overexpression of

SCD1 reduces tyrosine and serine phosphorylation of IRS1 (insulin receptor substrate 1) and

Akt/protein kinase B, respectively, and is sufficient to impair glucose uptake and insulin

signalling [45]. Miyazaki et al. (2009) showed

that Scd1 deficiency improved insulin sensitivity in leptin-resistant A y/a and diet-induced obese

(DIO) mice [46]. An increase in whole body glucose tolerance and insulin sensitivity has been shown

in various tissues of Scd1-/- mice [47, 48].

The most interesting of the associations identified was with PRKD1. This gene is suggested to

play a role in insulin secretion. PRKD1, PKD2 and PKD3 constitute the recently identified PKD

family, a sub class of the AGC family of serine/threonine kinases, with structural and

enzymological properties different from those of PKC family [49, 50]. PRKD1 is composed of

different domains: a N-terminal region, two cysteine-rich zinc-finger regions, a region rich in

negatively charged amino acids, a pleckstrin-homology domain and a Ser/Thr kinase catalytic

domain [51, 52]. PRKD1 can be activated by growth factors, oxidative stress, thrombin, bioactive

lipids, cross-linking of B- and T-cell receptors and some G-protein coupled receptors (GPCR).

Previous biological studies on the PRKD1 gene support its role in insulin secretion. Sumara et al.

(2009) reported that mice lacking mitogen-activated protein kinase (MAPK) p38δ

exhibit better glucose tolerance because of enhanced insulin secretion from the β cells of the

pancreas [53]. Furthermore, they showed that protein kinase D (PKD) is vital in regulating the

level of insulin secretion by the pancreatic β cells. These data imply that the absence of p38δ

improves glucose tolerance and insulin secretion through a direct, β cell-specific mechanism.

They also support a negative regulatory function for p38δ in stimulated insulin secretion through the

inhibition of PRKD1 and control of exocytosis. Furthermore, excessive inhibition of PKD

activity by p38δ may lead to malfunctioning of the β cells in diabetic patients. The study also

suggests that artificially inducing the β cells to secrete insulin through medication eventually

results in failure of these pancreatic cells, and recommends that therapies should combine an

insulinotropic effect with measures to prevent β cell failure. The findings of Sumara et

al. (2009) [53] suggest that the signalling module of p38δ and PRKD1 may be a potential

therapeutic target for human diabetes.

In addition to the observations of Sumara et al. (2009), it has also been shown that another

member of the protein kinase family, protein kinase C, acts as an alternative mediator of insulin induced glucose

transport [54]. This suggestion follows from the work of Cross and Franke et al. (1995), which showed that

active Akt stimulates glucose uptake in adipocytes [55]; however, inhibition of Akt does not

completely block the effect of insulin on glucose transport [56]. In the pancreas, the ATP sensitive

potassium channel has also been shown to play a key role in insulin release in response to

changing glucose levels [57, 58].

A 30% decrease of Na+, K+-ATPase activity has been shown in red blood cells (RBC) from

diabetic patients compared to control individuals [59]. Greene et al. (1987) found a decrease in

Na+, K+-ATPase activity due to alteration of PKC activity [60]. The group found an association

between RBC Na+, K+-ATPase activity and plasma C-peptide concentration among T2D patients

[59]. In the study described here, four significant SNPs in the KCTD8 gene (potassium channel

tetramerisation domain containing 8, see Table 2) were identified. It has been suggested that

potassium channels may play a role in GABAergic activity during hypoglycemia [61, 62]. Chan et

al. (2007) showed that K+ channels in the ventromedial hypothalamus (VMH), a region that contains

glucose responsive neurons, can modulate the magnitude of counter regulatory responses by

altering the release of GABA [63]. Various studies have shown that expression of GABA receptors is

affected by depolarising concentrations of K+ [64], cAMP [65] and MAPK [66].

Pathway analysis showed interconnections among three genes in the PKC pathway (PRKD1, GABRB1 and

TG). This is not unexpected for a disease that is known to be multifactorial and for which the

mechanism is likely to require the involvement of a number of genes. Future study of these genes

might shed light on the aetiology of the disease. In addition to the unique SNPs identified in our

population, we analysed the 17 risk loci that have previously been reported to be associated

with T2D in different populations (Table 3). Interestingly, the analysis confirmed that

rs4689388 (between WFS1 and PPP2R2C) is associated with T2D in the Arab population, with a p-value

of 0.006. This SNP was previously reported to be associated with T2D in a large sample from the

French population [11]. Further studies in larger cohorts will be needed to strengthen the evidence

and replicate the association with T2D.

In conclusion, this study identified variation at PRKD1 on 14q11 as being associated with Type

2 Diabetes (T2D) in the Arab population of the UAE. Association at the genes RBM47, KCTD8, GABRB1,

SCD5, OC90 and TG was also observed. The mechanism by which these genes increase or decrease disease

susceptibility remains to be determined. These findings provide a set of candidate genes to be

evaluated in depth in future studies. The fact that PRKD1 has not been found in previous

studies may be due to chance sampling variation or power differences, or may be explicable

in terms of a higher level of genetic and environmental heterogeneity in the other populations

compared with the Arab population. Further replication and fine

mapping in a larger cohort of Arab population samples will be essential to validate the results

presented here.


Subjects

A total of 319 individuals belonging to one extended family of Arab origin were identified during

their routine visit to clinics in the UAE. Multi-generation family relationships were compiled for

these individuals, allowing a five-generation extended family pedigree to be constructed

containing 41 nuclear families. A total of 178 individuals from this pedigree agreed to participate

in this study (86 males and 92 females; 66 diabetic and 112 healthy). Clinical assessment and

questionnaire completion were conducted at the clinic. An individual was classified as having T2D if the

subject (1) was diagnosed with T2D by a qualified physician, (2) was on a prescribed drug treatment

regimen for T2D and (3) returned biochemical test results showing a fasting plasma glucose level of at

least 126 mg/dl, based on the criteria laid down by the World Health Organization (WHO)

consultation group report [67]. Each individual provided signed, informed consent based on

information provided by the ethics committee of the United Arab Emirates Ministry of Health.
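
As an illustration only, the three classification criteria listed above can be expressed as a short Python check. The class and function names are hypothetical and were not part of the study workflow; the sketch assumes that all three criteria must hold together, as the wording "(1), (2) and (3)" suggests.

```python
from dataclasses import dataclass

FPG_THRESHOLD_MG_DL = 126.0  # fasting plasma glucose cut-off quoted in the text


@dataclass
class Subject:
    # Hypothetical record layout, assumed for illustration.
    physician_diagnosis: bool     # (1) diagnosed with T2D by a qualified physician
    on_t2d_medication: bool       # (2) on a prescribed drug treatment regimen for T2D
    fasting_glucose_mg_dl: float  # (3) fasting plasma glucose in mg/dl


def is_t2d_case(s: Subject) -> bool:
    """Return True only when all three criteria hold."""
    return (
        s.physician_diagnosis
        and s.on_t2d_medication
        and s.fasting_glucose_mg_dl >= FPG_THRESHOLD_MG_DL
    )
```

Under this reading, a subject with a physician diagnosis and medication but a fasting glucose below 126 mg/dl would not be classified as a case.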

DNA Extraction

After blood was drawn into EDTA tubes, genomic DNA was extracted using a Nucleic Acid Kit

(Roche Applied Science, Indianapolis, IN, USA) according to the recommendations of the

manufacturer. Briefly, 300μl of whole blood from each sample was mixed with 200μl of lysis

buffer (50mM Tris pH 8.0, 100mM EDTA, 100mM NaCl, 1% SDS) and 40μl of Proteinase K.

100μl of isopropanol and 500μl of Inhibitor Removal Buffer (5M guanidine-HCl, 20mM Tris-HCl

pH 6.6) were subsequently added. The DNA was washed with a buffer (20mM NaCl; 2mM Tris-HCl;

pH 7.5) and centrifuged twice at 2,000 rpm. The DNA was then washed using cold 70% ethanol and

centrifuged at 3,000 rpm, and the supernatant was discarded, leaving a pellet that contained

purified genomic DNA. The DNA pellet was diluted in TE buffer (1mM EDTA; 10mM Tris-HCl,

pH 7.5) to a concentration of approximately 50 ng/μl.

Genotyping

Genotyping using the Infinium Human 660 Quad Chip I-Scan (Illumina Inc. San Diego, USA),

which contained 670,901 SNPs, was performed according to the manufacturer’s recommendations

(Illumina Inc., San Diego, USA). Whole-genome amplification was performed using 200ng of

genomic DNA at 37°C for 20 to 24 hours using reagents provided by Illumina (Illumina Inc., San

Diego, USA). Products were fragmented, precipitated, and resuspended in a proprietary

hybridisation buffer (Illumina Inc., San Diego, USA). The resuspended samples were denatured at

95°C for 20 min and loaded on Illumina Bead Chips. The chips were placed in a hybridisation

chamber for 16 to 20 hours at 48°C. After hybridisation, non-hybridised DNA was washed away.

An allele-specific single-base extension of the oligonucleotides on the BeadChip was performed

in a 48-position Slide Chamber Rack (Illumina Inc., San Diego, USA), using labelled

deoxynucleotides and the captured DNA as a template. After staining of the extended DNA,

BeadChips were washed and scanned with I-Scan (Illumina Inc., San Diego, USA), and raw data

was generated by BeadStudio 3.0 software (Illumina Inc. San Diego, USA).

Quality control (QC)

Genetic integrity of the pedigree was checked using the PedCheck software package [68]. Data

cleaning was performed using the PLINK software developed by Purcell et al (2007) [69]. The

average call rate was 98.99% for all the subjects. SNPs were excluded from the analysis based on

the following criteria: (1) minor allele frequency (MAF) < 0.05, (2) missingness per SNP > 5%,

(3) significant (p-value < 1.0E-06) deviation from Hardy-Weinberg equilibrium and (4)

Mendelian errors. Individuals with > 5% Mendelian errors within the family and SNPs with >

10% Mendelian errors were checked for, and none was excluded. Approximately 70% of SNPs passed QC and were

used in the association analysis.
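
The first three exclusion criteria can be sketched in Python as follows. This is a simplified illustration and not the PLINK implementation: the function name and its inputs are hypothetical, the Hardy-Weinberg check is an assumed chi-square goodness-of-fit approximation with one degree of freedom, and relatedness within the pedigree is ignored.

```python
import math

MAF_MIN = 0.05      # criterion (1): exclude SNPs with MAF < 0.05
MISSING_MAX = 0.05  # criterion (2): exclude SNPs with missingness > 5%
HWE_P_MIN = 1.0e-6  # criterion (3): exclude SNPs with HWE p-value < 1.0E-06


def snp_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int) -> dict:
    """Summarise one SNP from genotype counts (AA, AB, BB) and missing calls."""
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_called == 0:
        return {"maf": 0.0, "missing": 1.0, "hwe_p": 1.0, "passed": False}

    p = (2 * n_aa + n_ab) / (2 * n_called)  # frequency of allele A
    q = 1.0 - p
    maf = min(p, q)
    missing = n_missing / n_total

    # Chi-square goodness-of-fit against Hardy-Weinberg proportions (1 df).
    expected = (p * p * n_called, 2 * p * q * n_called, q * q * n_called)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df

    passed = maf >= MAF_MIN and missing <= MISSING_MAX and hwe_p >= HWE_P_MIN
    return {"maf": maf, "missing": missing, "hwe_p": hwe_p, "passed": passed}
```

The Mendelian error criterion (4) needs the pedigree structure and is therefore left out of this sketch.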

Data Analysis

We analysed the association between individual SNPs and disease trait (T2D) using the family-

based association test (FBAT) [70]. FBAT was used in this study to test for transmission rates of

marker alleles from heterozygous parents to affected offspring under the null hypothesis of no

association and no linkage. Odds ratios and confidence intervals for associated SNPs were

calculated using UNPHASED [71], and results were displayed as Manhattan plots generated with

Haploview v4.1 [72]. Subsequently, haplotypes of the PRKD1 gene were analysed using

the HBAT function of FBAT. The advantage of the FBAT method is that it permits

the analysis of large extended family pedigrees. The FBAT software divides pedigrees into

individual nuclear families. Biallelic tests were performed using a dominant genetic model. LD

(without taking into account familial correlations) was determined using Haploview v4.1 [72].
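
The idea of testing transmission of marker alleles from heterozygous parents to affected offspring can be illustrated with the classical transmission disequilibrium test (TDT), which family-based association tests generalise. The sketch below is not the FBAT algorithm and does not implement the dominant model used here; the function name and inputs are hypothetical, and it assumes that transmissions have already been counted.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """McNemar-type TDT for one biallelic marker.

    transmitted:   times heterozygous parents passed the test allele
                   to an affected child
    untransmitted: times they passed the other allele instead
    Returns the chi-square statistic (1 df) and its p-value.
    """
    total = transmitted + untransmitted
    if total == 0:
        return 0.0, 1.0  # no informative transmissions
    chi2 = (transmitted - untransmitted) ** 2 / total
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df
    return chi2, p_value
```

Under the null hypothesis, each allele of a heterozygous parent is transmitted with probability 0.5, so a large imbalance between the two counts yields a small p-value.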

We investigated interactions of the associated genes using Ingenuity™ Pathway Analysis

(IPA; Ingenuity Systems Inc., Redwood City, CA). IPA is a web-based software

application that uses expert compilation of molecular biology data derived from the literature and

many public databases (e.g., OMIM, MGI and NCBI Gene) to identify specific biological

pathways behind each gene, and enables the visualisation and analysis of direct and indirect

interactions among genes of interest. In this study, we started with a list of genes of interest to

analyse the common and distinct properties of these genes and how they relate to one another.

IPA generated networks in which the genes of interest are related according to previously known

associations between genes or proteins.

ACKNOWLEDGMENT

Publication number HA010-007 of the Centre for Forensic Science at the University of Western

Australia. We gratefully acknowledge the contribution of participating family members whose

cooperation made this study possible. We also would like to thank Richard Francis at Telethon

Institute for Child Health Research for his specific technical support that has allowed for the

statistical work to be carried out for this study. Part of the data analysis was performed on the

advanced computing resources provided by the Western Australian Advanced Computing

Consortia (iVEC). Habiba Alsafar is a PhD scholar at the University of Western Australia

supported by the Dubai Police General Head Quarters in the United Arab Emirates. Funding for

this project was provided in part by CVRL and the Emirates Foundation.

CONFLICT OF INTEREST

All the authors declare no conflict of interest.

REFERENCES

1. Leslie, R.D., Metabolic changes in diabetes. Eye (Lond), 1993. 7 ( Pt 2): p. 205-8.

2. Stumvoll, M., B.J. Goldstein, and T.W. van Haeften, Type 2 diabetes: principles of

pathogenesis and therapy. Lancet, 2005. 365(9467): p. 1333-46.

3. International Diabetes Federation, Diabetes Atlas, 3rd ed, 2006.

4. Lyssenko, V., et al., Mechanisms by which common variants in the TCF7L2 gene increase

risk of type 2 diabetes. J Clin Invest, 2007. 117(8): p. 2155 - 2163.

5. Frayling, T.M., Genome-wide association studies provide new insights into type 2 diabetes

aetiology. Nat Rev Genet, 2007. 8(9): p. 657-62.

6. HUGO--a UN for the human genome. Nat Genet, 2003. 34(2): p. 115-6.

7. Thorisson, G.A., et al., The International HapMap Project Web site. Genome Res, 2005.

15(11): p. 1592-3.

8. Hanson, R.L., et al., A search for variants associated with young-onset type 2 diabetes in

American Indians in a 100K genotyping array. Diabetes, 2007. 56(12): p. 3045-52.

9. Hayes, M.G., et al., Identification of type 2 diabetes genes in Mexican Americans through

genome-wide association studies. Diabetes, 2007. 56(12): p. 3033-44.

10. Rampersaud, E., et al., Identification of novel candidate genes for type 2 diabetes from a

genome-wide association scan in the Old Order Amish: evidence for replication from


56(12): p. 3053-62.



12. Takeuchi, F., et al., Confirmation of multiple risk Loci and genetic impacts by a genome-

wide association study of type 2 diabetes in the Japanese population. Diabetes, 2009.

58(7): p. 1690-9.





15. Yasuda, K., et al., Variants in KCNQ1 are associated with susceptibility to type 2 diabetes

mellitus. Nat Genet, 2008. 40(9): p. 1092-7.

16. Florez, J.C., et al., A 100K genome-wide association scan for diabetes and related traits in

the Framingham Heart Study: replication and integration with other genome-wide

datasets. Diabetes, 2007. 56(12): p. 3063-74.


susceptibility observed in genome-wide association data. Diabetes, 2009. 58(2): p. 505-10.

18. Voight, B.F., et al., Twelve type 2 diabetes susceptibility loci identified through large-

scale association analysis. Nat Genet. 42(7): p. 579-589.



40(5): p. 638-45.

20. Saxena, R., et al., Genome-wide association analysis identifies loci for type 2 diabetes and

triglyceride levels. Science, 2007. 316(5829): p. 1331-6.

21. Zeggini, E., et al., Replication of genome-wide association signals in UK samples reveals

risk loci for type 2 diabetes. Science, 2007. 316(5829): p. 1336-41.

22. Sladek, R., et al., A genome-wide association study identifies novel risk loci for type 2

diabetes. Nature, 2007. 445(7130): p. 881-885.

23. Scott, L.J., et al., Association of transcription factor 7-like 2 (TCF7L2) variants with type

2 diabetes in a Finnish sample. Diabetes, 2006. 55: p. 2649 - 2653.

24. Salonen, J.T., et al., Type 2 diabetes whole-genome association study in four populations:

the DiaGen consortium. Am J Hum Genet, 2007. 81(2): p. 338-45.

25. Malik, M., et al., Glucose intolerance and associated factors in the multi-ethnic

population of the United Arab Emirates: results of a national survey. Diabetes Res Clin

Pract, 2005. 69(2): p. 188-95.

26. Wild, S., et al., Global prevalence of diabetes: estimates for the year 2000 and projections

for 2030. Diabetes Care, 2004. 27(5): p. 1047-53.

27. Sandhu, M.S., et al., Common variants in WFS1 confer risk of type 2 diabetes. Nat Genet,

2007. 39(8): p. 951-3.

28. Alsmadi, O., et al., Weak or no association of TCF7L2 variants with Type 2 diabetes risk

in an Arab population. BMC Medical Genetics, 2008. 9(1): p. 72.

29. Saadi, H., et al., Association of TCF7L2 polymorphism with diabetes mellitus, metabolic

syndrome, and markers of beta cell function and insulin resistance in a population-based

sample of Emirati subjects. Diabetes Res Clin Pract, 2008. 80(3): p. 392 - 398.

30. Ereqat, S., et al., Association of a common variant in TCF7L2 gene with type 2 diabetes

mellitus in the Palestinian population. Acta Diabetologica, 2009.

31. Pote, K.G. and M.D. Ross, Each otoconia polymorph has a protein unique to that

polymorph. Comp Biochem Physiol B, 1991. 98(2-3): p. 287-95.

32. Wang, Y., et al., Otoconin-90, the mammalian otoconial matrix protein, contains two

domains of homology to secretory phospholipase A2. Proc Natl Acad Sci U S A, 1998.

95(26): p. 15345-50.

33. Bainbridge, K.E., H.J. Hoffman, and C.C. Cowie, Diabetes and hearing impairment in the

United States: audiometric evidence from the National Health and Nutrition Examination

Survey, 1999 to 2004. Ann Intern Med, 2008. 149(1): p. 1-10.

34. Dumont, J.E., et al., Transducing systems in the control of human thyroid cell function,

proliferation and differentiation. Adv Exp Med Biol, 1989. 261: p. 357-72.

35. Bianchi, G.P., et al., Thyroid involvement in patients with active inflammatory bowel

diseases. Ital J Gastroenterol, 1995. 27(6): p. 291-5.

36. Hansen, D., et al., Thyroid function, morphology and autoimmunity in young patients with

insulin-dependent diabetes mellitus. Eur J Endocrinol, 1999. 140(6): p. 512-8.

37. Huber, A., et al., Joint genetic susceptibility to type 1 diabetes and autoimmune

thyroiditis: from epidemiology to mechanisms. Endocr Rev, 2008. 29(6): p. 697-725.

38. Glisovic, T., et al., RNA-binding proteins and post-transcriptional gene regulation. FEBS

Lett, 2008. 582(14): p. 1977-86.

39. Crawford, T.O. and C.A. Pardo, The neurobiology of childhood spinal muscular atrophy.

Neurobiol Dis, 1996. 3(2): p. 97-110.

40. Darnell, R.B. and J.B. Posner, Paraneoplastic syndromes involving the nervous system. N

Engl J Med, 2003. 349(16): p. 1543-54.

41. Garber, K.B., J. Visootsak, and S.T. Warren, Fragile X syndrome. Eur J Hum Genet, 2008.

16(6): p. 666-72.

42. Erdo, S.L. and J.R. Wolff, gamma-Aminobutyric acid outside the mammalian brain. J

Neurochem, 1990. 54(2): p. 363-72.

43. Rorsman, P., et al., Glucose-inhibition of glucagon secretion involves activation of

GABAA-receptor chloride channels. Nature, 1989. 341(6239): p. 233-6.

44. Bailey, J.E. and D.J. Nutt, GABA-A receptors and the response to CO(2) inhalation - a

translational trans-species model of anxiety? Pharmacol Biochem Behav, 2008. 90(1): p.

51-7.

45. Voss, M.D., et al., Gene expression profiling in skeletal muscle of Zucker diabetic fatty

rats: implications for a role of stearoyl-CoA desaturase 1 in insulin resistance.

Diabetologia, 2005. 48(12): p. 2622-30.

46. Miyazaki, M., et al., Stearoyl-CoA desaturase-1 deficiency attenuates obesity and insulin

resistance in leptin-resistant obese mice. Biochem Biophys Res Commun, 2009. 380(4): p.

818-22.

47. Flowers, J.B., et al., Loss of stearoyl-CoA desaturase-1 improves insulin sensitivity in lean

mice but worsens diabetes in leptin-deficient obese mice. Diabetes, 2007. 56(5): p. 1228-

39.

48. Rahman, S.M., et al., Stearoyl-CoA desaturase 1 deficiency elevates insulin-signaling

components and down-regulates protein-tyrosine phosphatase 1B in muscle. Proc Natl

Acad Sci U S A, 2003. 100(19): p. 11110-5.

49. Li, J., et al., The role of protein kinase D in neurotensin secretion mediated by protein

kinase C-alpha/-delta and Rho/Rho kinase. J Biol Chem, 2004. 279(27): p. 28466-74.

50. Yaney, G.C., et al., Potentiation of insulin secretion by phorbol esters is mediated by

PKC-alpha and nPKC isoforms. Am J Physiol Endocrinol Metab, 2002. 283(5): p. E880-

8.

51. Valverde, A.M., et al., Molecular cloning and characterization of protein kinase D: a

target for diacylglycerol and phorbol esters with a distinctive catalytic domain. Proc Natl

Acad Sci U S A, 1994. 91(18): p. 8572-6.

52. Van Lint, J.V., J. Sinnett-Smith, and E. Rozengurt, Expression and characterization of

PKD, a phorbol ester and diacylglycerol-stimulated serine protein kinase. J Biol Chem,

1995. 270(3): p. 1455-61.

53. Sumara, G., et al., Regulation of PKD by the MAPK p38delta in insulin secretion and

glucose homeostasis. Cell, 2009. 136(2): p. 235-48.

54. Kotani, K., et al., Requirement of atypical protein kinase clambda for insulin stimulation

of glucose uptake but not for Akt activation in 3T3-L1 adipocytes. Mol Cell Biol, 1998.

18(12): p. 6971-82.

55. Cross, D.A., et al., Inhibition of glycogen synthase kinase-3 by insulin mediated by protein

kinase B. Nature, 1995. 378(6559): p. 785-9.

56. Franke, T.F., et al., Direct regulation of the Akt proto-oncogene product by

phosphatidylinositol-3,4-bisphosphate. Science, 1997. 275(5300): p. 665-8.

57. Meglasson, M.D. and F.M. Matschinsky, Pancreatic islet glucose metabolism and

regulation of insulin secretion. Diabetes Metab Rev, 1986. 2(3-4): p. 163-214.

58. Cook, D.L., et al., ATP-sensitive K+ channels in pancreatic beta-cells. Spare-channel

hypothesis. Diabetes, 1988. 37(5): p. 495-8.

59. De La Tour, D.D., et al., Erythrocyte Na/K ATPase activity and diabetes: relationship with

C-peptide level. Diabetologia, 1998. 41(9): p. 1080-4.

60. Greene, D.A., et al., Role of sorbitol accumulation and myo-inositol depletion in

paranodal swelling of large myelinated nerve fibers in the insulin-deficient spontaneously

diabetic bio-breeding rat. Reversal by insulin replacement, an aldose reductase inhibitor,

and myo-inositol. J Clin Invest, 1987. 79(5): p. 1479-85.

61. During, M.J., et al., Glucose modulates rat substantia nigra GABA release in vivo via

ATP-sensitive potassium channels. J Clin Invest, 1995. 95(5): p. 2403-8.

62. Margaill, I., et al., KATP channels modulate GABA release in hippocampal slices in the

absence of glucose. Fundam Clin Pharmacol, 1992. 6(7): p. 295-300.

63. Chan, O., et al., ATP-sensitive K(+) channels regulate the release of GABA in the

ventromedial hypothalamus during hypoglycemia. Diabetes, 2007. 56(4): p. 1120-6.

64. Ives, J.H., D.L. Drewery, and C.L. Thompson, Neuronal activity and its influence on

developmentally regulated GABA(A) receptor expression in cultured mouse cerebellar

granule cells. Neuropharmacology, 2002. 43(4): p. 715-25.

65. Brinton, R.D., R.H. Thompson, and E.A. Brownson, Spatial, cellular and temporal basis

of vasopressin potentiation of norepinephrine-induced cAMP formation. Eur J Pharmacol,

2000. 405(1-3): p. 73-88.

66. Bulleit, R.F. and T. Hsieh, MEK inhibitors block BDNF-dependent and -independent

expression of GABA(A) receptor subunit mRNAs in cultured mouse cerebellar granule

neurons. Brain Res Dev Brain Res, 2000. 119(1): p. 1-10.

67. Alberti, K.G. and P.Z. Zimmet, Definition, diagnosis and classification of diabetes

mellitus and its complications. Part 1: diagnosis and classification of diabetes mellitus

provisional report of a WHO consultation. Diabet Med, 1998. 15(7): p. 539-53.

68. O'Connell, J.R. and D.E. Weeks, PedCheck: a program for identification of genotype

incompatibilities in linkage analysis. Am J Hum Genet, 1998. 63(1): p. 259-66.

69. Purcell, S., et al., PLINK: a tool set for whole-genome association and population-based

linkage analyses. Am J Hum Genet, 2007. 81(3): p. 559-75.

70. Laird, N.M., S. Horvath, and X. Xu, Implementing a unified approach to family-based

tests of association. Genet Epidemiol, 2000. 19 Suppl 1: p. S36-42.

71. Dudbridge, F., Pedigree disequilibrium tests for multilocus haplotypes. Genet Epidemiol,

2003. 25(2): p. 115-21.

72. Barrett, J.C., et al., Haploview: analysis and visualization of LD and haplotype maps.

Bioinformatics, 2005. 21(2): p. 263-5.

CHAPTER 7

A GENOME-WIDE ASSOCIATION STUDY

EXAMINING OBESE FACTORS IN AN ARAB FAMILY

WITH A HISTORY OF TYPE 2 DIABETES

This chapter was a submission to the American Journal of Human Genetics and the format

presented is as per the "Instruction to Authors" from the publishing house.

Chapter 7

A Genome-Wide Association Study Examining Obese

Factors in an Arab Family with a History of Type 2

Diabetes

Chapter 7 was prepared as a manuscript for submission to The American Journal of Human

Genetics. The aim of the study presented in this manuscript was to detect and characterise

genes that may influence susceptibility to obesity in Type 2 Diabetes patients from volunteers

of a study population from the United Arab Emirates.

To date, the genes responsible for the obese phenotype in Arabs are not known. Obesity is a

principal factor that contributes to Type 2 Diabetes. Consequently, a genome wide screen for

obesity among the UAE population of Arab descent was initiated. This study paved the way

towards identifying susceptibility genes for obesity in the UAE population. If genetic profiling

can be used successfully to identify individuals at high risk of obesity, this would result in

substantial benefits to both individuals and society. Targeting preventive measures for

individuals with high-risk genotypes could delay the onset of the disease, slow its progression,

and reduce the ultimate severity of the condition. This would result in substantial

improvements in quality of life for affected individuals and a reduction in healthcare costs.

The identification of target genes might also lead to the development of novel therapeutic

modalities.

In this chapter, we specifically investigated the genetic associations with obesity in one

extended Emirati family of 319 members, of whom 178 were genotyped. Given that Body Mass

Index (BMI) and Waist Circumference (WC) play a more prominent role in the development of

diabetes in this population, we studied the relation between these two traits and 657,367

Single Nucleotide Polymorphisms (SNP). This study supports the influence of both

environmental and genetic factors in the pathophysiology of Type 2 Diabetes and its related

phenotypes in an Arab population. The study revealed four loci that were significant. Two

loci, in ADAM30 and JAZF1, which a previous meta-analysis showed to be associated with Type 2

Diabetes in a Caucasian population, were also shown to be associated with Type 2 Diabetes in

the Arab population. Two novel associations were noted

in this study: one novel locus on chromosome 16 within the FBXO31 locus was shown to be

associated with the WC phenotype, and one SNP in GALNTL4 of chromosome 11 was found to

be associated with BMI. The results presented show a strong familial aggregation of

quantitative traits associated with Type 2 Diabetes.

My colleagues and I prepared this manuscript. I completed all laboratory work at Central

Veterinary Research Laboratory (CVRL) under the guidance of Dr Khazanehdari. I

performed the data analysis and drafted the first version of this manuscript. Mr Francis

provided advice on the relevant bioinformatics modules required for the study and established

working accounts to enable complete analysis of the data. Dr Jamieson worked through four

different software packages with me to ascertain the relevant analytical tools for the data

gathered. Drs Cordell and Blackwell provided support and advice regarding the statistical

methods and identified the relevant analytical tools. Dr Tay guided me throughout the study

from designing the study to proofreading the manuscripts.

A Genome-Wide Association Study Examining Obese Factors in an Arab Family with a

History of Type 2 Diabetes.

Habiba S Al Safar1, 2, Heather J Cordell3, Sarra E Jamieson4, Richard Francis4, Kamal

Khazanehdari5, Guan K Tay1, Jenefer M Blackwell4,6

1 Centre for Forensic Science, The University of Western Australia, Crawley, Western

Australia.

2 Dubai Police General Head Quarters, Dubai, United Arab Emirates.

3 Institute of Human Genetics, Newcastle University, Newcastle upon Tyne, United

Kingdom.

4 Telethon Institute for Child Health Research, Centre for Child Health Research, The

University of Western Australia, Subiaco, Western Australia.

5 Molecular Biology and Genetics, Central Veterinary Research Laboratory, Dubai, United

Arab Emirates.

6 Cambridge Institute for Medical Research and Department of Medicine, School of Clinical,


Abbreviated title: GWAS of an Arab Family with T2D

Keywords: Type 2 Diabetes, GWAS, QTDT, Arab family, Body Mass Index, Waist Circumference.

Publication number HA010-006 of the Centre for Forensic Science at the University of Western Australia


Professor Jenefer Blackwell

Head, Division of Genetics and Health

Telethon Institute for Child Health Research

100 Roberts Road, Subiaco, WA 6008

PO Box 855, West Perth, WA 6873

Tel: +61 8 9489 7910

Fax: +61 8 9489 7700

ABSTRACT

Overweight and obesity are major risk factors for a number of chronic diseases, including

Type 2 Diabetes (T2D), cardiovascular disease and cancer. In the United Arab Emirates

(UAE), it has been estimated that some twenty percent of adults suffer from obesity. The

incidence of T2D in the UAE population is also among the highest in the world. To identify

factors that result in obesity, and its association with T2D, we conducted a Genome-Wide

Association Study (GWAS) and specifically assessed genetic associations with "Body Mass

Index" (BMI) and "Waist Circumference" (WC). GWAS analysis of 178 individuals in an

extended family of Arab descent revealed four loci that reached genome-wide significance,

two of which were found in previous studies. The previously described association of

the Single Nucleotide Polymorphism (SNP) rs2793823 within the ADAM30 locus

(identified through meta-analysis of GWAS of subjects of Caucasian descent) with the disease was also

observed in Arabs (p = 1.86E-8). Our study also confirmed the

association between SNPs within the JAZF1 locus and BMI, WC and T2D, as reported in other

studies. Two novel associations were noted in our study: (1) a SNP on chromosome 16

within the FBXO31 locus (rs9308437, p = 7.5E-7) was shown to be associated with the WC

phenotype, and (2) the SNP rs7120774 in GALNTL4 on chromosome 11 was found to be

associated with BMI (p = 1.82E-10). FBXO31 is a candidate gene for breast cancer, whereas

GALNTL4 plays a role in insulin-stimulated glucose transport in muscle. Work continues to

replicate the two latter findings in independent cohorts to confirm the involvement of FBXO31

and GALNTL4.

INTRODUCTION

Obesity is increasing at an alarming rate throughout the world. It is recognised as a major

global public health concern, with much of the underlying problem resulting from poor

lifestyle factors, including unhealthy eating habits and lack of exercise, overlaid on specific

genetic backgrounds that compound the weight gain. Obesity is a chronic condition that

results from an increase in body weight in adults and is arguably the most

important risk factor leading to metabolic diseases such as Type 2 Diabetes (T2D).

Management of the disease can be as simple as adopting lifestyle changes. For example,

obesity in patients that is a consequence of insulin resistance or a reduced number of insulin

receptors can be reversed by weight control and loss [1, 2]. Insulin resistance, a condition in

which cells do not use insulin as they should, results in high levels of sugars in the

bloodstream and can lead to diabetes. A consequence of overall obesity is the accumulation of

body fat; the specific location of this fat has been associated with the development of

cardiovascular disease, stroke, and diabetes [3]. Therefore, the disease reduces quality of life and

increases morbidity and mortality [4]. Many studies have specifically reported associations

between the obesity markers Body Mass Index (BMI) and/or Waist Circumference (WC) with

T2D in adults [5-14]. In addition, it has long been recognised that abdominal obesity, assessed

by WC rather than BMI, can be important, and the weighted evidence indicates that the ratio

of WC to BMI predicts a greater variance in health risk than does BMI alone [15, 16].

Microarray-based genotyping technology is increasingly being used to investigate complex

metabolic and non-metabolic diseases. Due to the decreasing cost, the convenience and

improved resolving power of Genome-Wide Association Studies (GWAS) [17, 18], scientists

have been focusing on studying the association between genetic components and risk factors

such as obesity in major metabolic diseases including T2D. Recent genome-wide association

studies have identified multiple risk loci common to obesity, including FTO, MC4R,

TMEM18, GNPDA2, SH2B1, KCTD15, MTCH2, NEGR1 and PCSK1 [19-23].

For example, studies have revealed strong associations of two loci, FTO and MC4R, with

BMI and WC [24, 25]. FTO is highly expressed in hypothalamic nuclei that control eating

behavior [26]. It catalyzes Fe(II)- and 2OG-dependent DNA demethylation [26]; however, the

role of FTO-related DNA methylation in obesity is unknown. MC4R is a G-protein-coupled

receptor (GPCR). These receptors sense signals such as light, chemicals or hormones,

and mutations in the receptors are implicated in many diseases such as diabetes [27]. If a

mutation causes slight changes in MC4R, it may be sufficient to increase food intake,

ultimately leading to obesity [27].

The study described here was conceived as a GWAS to investigate genetic associations with

BMI and WC and T2D in a population not previously studied, specifically in an extended

family of Arab origin from the United Arab Emirates (UAE). It is understood that obesity is a

metabolic disorder; however, the relationship between obesity and T2D is not yet clear.

Investigating associations between patients' genetic makeup and BMI, WC and T2D

may provide clues to the mechanisms of the genes that are involved.

Although GWAS have previously reported associations between BMI and risk of T2D

in populations such as Caucasians and Orientals [21, 25, 26, 28], no such study has been

carried out on the Arab population. Lifestyle changes of this population in recent times have

significantly increased weight gain early in adult life, and are believed to be a major

contributing factor to the obesity epidemic and associated diseases such as T2D.

The purpose of this study was to investigate the genetic associations with obesity in an

ethnically homogeneous cohort from the UAE. In this manuscript, BMI and WC are the primary

focus. The relation between these traits and 657,367 Single Nucleotide Polymorphisms (SNP)

in one extended Emirati family of 319 members (of which 178 were genotyped) was studied.


Participants of study

One hundred and seventy-eight (n=178) individuals from one extended family of Arab origin in the

United Arab Emirates (UAE) agreed to take part in this study (86 males and 92 females; 66

diabetic and 112 healthy). Clinical assessments were conducted and questionnaires were

completed at the Al-Etihad clinic in Dubai. All participants gave their informed consent in

writing. The study was approved by the Ethics Committees of the Ministry of Health in the


Collection of Phenotype data

Trained nurses measured the height and weight of each participant using a calibrated wall-mounted

stadiometer and a weighing scale, respectively. Body Mass Index (BMI) was calculated

as weight in kilograms divided by the square of height in metres (kg/m2). Waist

Circumference (WC) was measured in inches. In this study, overweight and obesity were defined

according to the World Health Organization (WHO) [29]. Under the WHO classification, overweight

corresponds to a BMI between 25 and 30 kg/m2, and high waist circumference is defined as ≥ 35

inches for females and ≥ 40 inches for males.
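
The phenotype definitions above can be written as a short Python sketch. The function names and category labels are hypothetical. The sketch assumes that a BMI of 30 kg/m2 or more is labelled obese (the standard WHO cut-off, which the text does not state explicitly) and that the lower bound of each range is inclusive.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index in kg/m^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Label a BMI value using the 25 and 30 kg/m^2 cut-offs."""
    if value >= 30.0:  # assumed WHO obesity threshold
        return "obese"
    if value >= 25.0:  # overweight range quoted in the text
        return "overweight"
    return "not overweight"


def high_waist(wc_inches: float, sex: str) -> bool:
    """High waist circumference: >= 35 in (female) or >= 40 in (male)."""
    cutoff = {"female": 35.0, "male": 40.0}[sex]
    return wc_inches >= cutoff
```

In the association analysis itself, BMI and WC were treated as quantitative traits, so these categories serve only to describe the cohort.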

Genotyping

Genotyping was performed on the Infinium Human 660W Quad chip according to the

manufacturer’s recommendations (Illumina Inc., San Diego, USA) at the Molecular Biology &

Genetics Department, Central Veterinary Research Laboratory based in Dubai, United Arab

Emirates. Whole-genome amplification was performed using 200ng of genomic DNA at 37°C for

20 to 24 hours. Products were fragmented, precipitated, and resuspended in a

proprietary hybridisation buffer (Illumina Inc., San Diego, USA). The resuspended samples

were denatured at 95°C for 20 min and loaded on Illumina Bead Chips. The chips were placed

in a hybridisation chamber for 16 to 20 hours at 48°C. After hybridisation, non-hybridised

DNA was washed away. An allele-specific single-base extension of the oligonucleotides on

the BeadChip was performed in a 48-position Slide Chamber Rack (Illumina Inc., San Diego,

USA), using labelled deoxynucleotides and the captured DNA as a template. After staining of

the extended DNA, BeadChips were washed and scanned with I-Scan (Illumina Inc., San

Diego, USA), and raw data was generated by BeadStudio 3.0 software (Illumina Inc. San

Diego, USA).

Statistical Methods

Heritability and power calculations for BMI and WC were performed using the SOLAR

package to evaluate the influence of genetic components on phenotypic variation [30]. Data

quality control (QC) was performed using PLINK [31] to remove SNPs with a minor allele

frequency (MAF) <0.05, >5% missing genotype rate, failing the Hardy-Weinberg equilibrium

(HWE) test at the 0.000001 significance level and Mendelian error. Approximately 70% of

SNPs passed QC and were used in the association analysis. Samples that failed quality control

were also excluded from the analysis. The average call rate for the 178 samples was 98.99%. In addition, PedCheck was used to identify errors in the familial relationships [32]. Genome-wide association testing between SNPs and the quantitative traits (BMI and WC) was performed

using the orthogonal model in the quantitative trait transmission disequilibrium test (QTDT)

program, in which the total association is partitioned into orthogonal within- and between-

family components [33].
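The SNP-level filters listed above can be illustrated with a short sketch. This is not PLINK's implementation: the function name and the genotype-count inputs are hypothetical, the HWE check uses a simple 1-degree-of-freedom chi-square test as a stand-in for whatever test the software applies, and the Mendelian-error filter is omitted because it needs pedigree information.

```python
import math


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  maf_min=0.05, missing_max=0.05, hwe_alpha=1e-6):
    """Apply MAF, missingness and HWE filters to one SNP.

    n_aa, n_ab, n_bb are genotype counts; n_missing is the number of
    samples with no call. Thresholds default to those quoted in the text.
    """
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False  # nothing was genotyped for this SNP

    missing_rate = n_missing / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    if maf < maf_min or missing_rate > missing_max:
        return False

    # Expected genotype counts under Hardy-Weinberg proportions.
    freq_b = 1.0 - freq_a
    expected = (n_called * freq_a ** 2,
                2 * n_called * freq_a * freq_b,
                n_called * freq_b ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Upper-tail p-value of a chi-square statistic with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    return hwe_p >= hwe_alpha
```

In a family-based sample such as this one, an HWE check of this kind would normally be restricted to unrelated founders; the sketch ignores relatedness for brevity.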


RESULTS

One hundred and seventy eight family members, 112 non-diabetic subjects and 66 diabetic,

were genotyped in this study. The clinical characteristics of the study group are summarised

in Table 1. The age ranges of the two subject categories were similar: 18 to 87 years in patients and 18 to 97 years in healthy volunteers.

The estimated heritability and power for the two traits used to evaluate the influence of genetic

component on phenotypic variation are shown in Table 2. BMI and WC showed significant

levels of heritability (p < 1e-6). Our study had greater than 80% power to detect a single locus

accounting for all the heritability at a logarithm of the odds (LOD) score of 3.

The association p-values (Manhattan plot) for the two quantitative traits BMI and WC are

shown in Figure 1 and Figure 2 respectively. The highest scoring SNPs for association with

BMI and WC are shown in Tables 3 and 4. The SNP with the lowest p-value (8.97E-14) for BMI was rs11711029, located on chromosome 3 within the HPS3 gene (GeneID 84343). The SNP with the lowest p-value (7.55E-07) for the WC trait was rs9308347, located on chromosome 16 within the FBXO31 gene (GeneID 79791). These SNPs have not previously been reported to reach genome-wide significance in studies involving other populations.
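The relationship between the chi-square statistics and p-values in Tables 3 and 4, and the vertical axis of the Manhattan plots, can be sketched as follows. The sketch assumes that each test statistic follows a chi-square distribution with 1 degree of freedom; under that assumption the chi-square values tabulated for the two top SNPs (55.58 and 24.47) give p-values close to those reported. The function names are illustrative, and the threshold constant is the value quoted in the figure captions.

```python
import math

# Threshold quoted in the Manhattan plot captions.
GENOME_WIDE_THRESHOLD = 1.5e-7


def chi2_to_p(chi2):
    """Upper-tail p-value of a chi-square statistic, assuming 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def manhattan_height(p_value):
    """Vertical coordinate of a SNP on a Manhattan plot: -log10(p)."""
    return -math.log10(p_value)


def is_beyond_threshold(p_value, threshold=GENOME_WIDE_THRESHOLD):
    """True when the p-value is smaller than the plotted threshold."""
    return p_value < threshold


if __name__ == "__main__":
    # Chi-square values for the top BMI and WC SNPs, taken from Tables 3 and 4.
    for snp, chi2 in [("rs11711029", 55.58), ("rs9308347", 24.47)]:
        p = chi2_to_p(chi2)
        print(snp, f"p={p:.3g}", f"-log10(p)={manhattan_height(p):.2f}",
              is_beyond_threshold(p))
```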

Tables 5 and 6 list genes previously reported to be associated with BMI and WC. In the present study, only three of these loci showed nominal significance: rs6265 in BDNF, rs1333026 in an unknown gene and rs10838738 in MTCH2 (p-values 0.026, 0.016 and 0.004, respectively). Two genes identified in previous studies as related to BMI and WC, FTO and MC4R, were not significant in our study.


Table 1: Characteristics of the 178 family members of Arab origin in this study.

Description                              Number
Males                                    86
Females                                  92
Type 2 Diabetes                          66
Healthy                                  112

Variable                                 Physical Appearance
T2D
  Age Range (years)                      18-97
  Mean Waist Circumference (inches)
    Male                                 37.96 ± 5.13
    Female                               39.84 ± 5.40
  Mean Body Mass Index (kg/m2)           30.40 ± 6.23
Healthy
  Age Range (years)                      18-97
  Mean Waist Circumference (inches)
    Male                                 38.20 ± 8.70
    Female                               37.85 ± 9.13
  Mean Body Mass Index (kg/m2)           29.00 ± 8.82

Mean data are given ± standard deviation.


Table 2: Heritability and power estimates, for a LOD score of 3, of two quantitative traits (BMI and Waist Circumference) in 178 individuals. Values have been adjusted for sex and age.

Trait                 H2r    p-value   Chi-square   Power estimate
Waist Circumference   0.44   2.6E-9    34.04        > 80%
Body Mass Index       0.48   1.0E-6    28.01        > 90%


Figure 1: Manhattan plot of −log10 (observed p-value) across the genome for each GWAS SNP tested for association with BMI in 178 individuals. The horizontal axis shows SNP location and the vertical axis shows −log10 (p-value) for each SNP tested. The red line marks the genome-wide significance threshold (1.5×10−7); SNPs and implicated genes with p-values beyond it are shown.


Figure 2: Manhattan plot of −log10 (observed p-value) across the genome for each GWAS SNP tested for association with Waist Circumference in 178 individuals. The horizontal axis shows SNP location and the vertical axis shows −log10 (p-value) for each SNP tested. The red line marks the threshold (1.5×10−7); SNPs and implicated genes with p-values beyond it are shown.


Table 3: Top association results for BMI based on QTDT analysis, with position, chi-square and p value.

Trait: BMI

Chr  SNP          Position    Chi-square  p value    Gene
1    rs197438     112082197   42.58       6.79E-11   C1orf183
1    rs584096     112131501   29.48       5.65E-08   KCND3
1    rs2788407    112563162   35.22       2.94E-09
1    rs2793823    120239241   31.64       1.86E-08   ADAM30
1    rs11204894   150059798   31.81       1.70E-08   RORC
2    rs2368424    184176711   32.11       1.46E-08
2    rs1349825    184201575   31.63       1.87E-08
2    rs2056156    189556713   28.79       8.07E-08
                                                     COL3A1
2    rs3106796    189558018   31.46       2.04E-08
2    rs12052514   191086415   35.28       2.86E-09   TMEM194B
2    rs6431635    234386183   29.91       4.53E-08
2    rs4663525    235518053   31.35       2.15E-08
2    rs2042831    235521853   33.77       6.20E-09
2    rs3731644    235614616   31.72       1.78E-08
                                                     SH3BP4
2    rs3731646    235614741   30.48       3.37E-08
2    rs3731648    235615023   31.37       2.13E-08
2    rs13396122   237373250   45.32       1.67E-11
3    rs13088151   3024520     29.79       4.81E-08
                                                     CNTN4
3    rs17024684   3030247     32.28       1.33E-08
3    rs7634908    3530312     31.25       2.27E-08
3    rs9853064    3531909     31.25       2.27E-08
3    rs17042585   5468389     39.99       2.55E-10
3    rs6443195    8576334     33.82       6.05E-09
                                                     LMCD1
3    rs1876611    8578483     33.11       8.71E-09
3    rs342892     147817321   38.16       6.52E-10
3    rs342938     147833036   37.65       8.46E-10
3    rs10049224   148733855   33.93       5.71E-09
3    rs4681169    150335145   39.17       3.88E-10
                                                     HPS3
3    rs4681487    150336167   38.88       4.51E-10
3    rs12487928   150343683   39.17       3.88E-10
3    rs11711029   150345546   55.58       8.97E-14
3    rs2689225    150349398   38.00       7.07E-10
4    rs3774820    5511959     29.35       6.04E-08
                                                     STK32B
4    rs3774813    5521161     33.84       5.98E-09
4    rs7679731    5977333     30.55       3.25E-08
4    rs7694823    7583941     34.68       3.89E-09   SORCS2
4    rs1441689    29155396    31.87       1.65E-08
4    rs10002254   29235963    32.94       9.50E-09
4    rs3846269    30021293    33.74       6.30E-09
4    rs1357462    31624867    34.40       4.49E-09
5    rs13160153   1570198     30.09       4.12E-08   LPCAT1
5    rs13187652   3148939     29.75       4.92E-08
5    rs10035578   171408294   28.92       7.54E-08   STK10
8    rs10105056   13655257    45.45       1.57E-11
8    rs10109857   13657264    44.36       2.73E-11
8    rs352774     15692048    30.29       3.72E-08
8    rs1670189    124241390   34.77       3.71E-09
9    rs872257     2486567     30.01       4.30E-08   FLJ35024
10   rs1444418    64230476    48.06       4.13E-12
10   rs4746781    64415595    30.92       2.69E-08
10   rs6479868    64490002    30.55       3.25E-08
10   rs12770187   64578695    34.01       5.48E-09   DKFZp564C1664, NRBF2
11   rs7120774    13759495    40.65       1.82E-10   GALNTL4
11   rs12275375   14593967    32.84       1.00E-08
11   rs10500802   96108514    36.78       1.32E-09   PSMA1
11   rs3019711    99509561    28.49       9.42E-08
11   rs11222898   77236017    28.33       1.02E-07   CNTN5
13   rs9600927    77246192    31.83       1.68E-08   SLAIN1 and DKFZp434A2422
13   rs7328292    78497513    38.70       4.94E-10
13   rs1112971    78499127    29.44       5.77E-08   BX647243 and AK095779
13   rs17181627   19906230    28.59       8.94E-08
13   rs7334914    20742953    28.43       9.71E-08
15   rs11637445   25661883    38.04       6.93E-10   MAP2K5
21   rs12185827   26557786    29.88       4.60E-08
21   rs2826261    26756022    32.04       1.51E-08
21   rs2151       16751358    29.10       6.87E-08
21   rs468241     16757199    28.92       7.54E-08
21   rs190100     23796553    29.35       6.04E-08
22   rs5747395    29431646    38.43       5.68E-10
                                                     MICAL3
22   rs8141766    33836426    30.31       3.68E-08
22   rs6004423    112082197   31.16       2.38E-08   KIAA1671 and CTA-221G9.5
22   rs9606766    112563162   28.29       1.04E-07   OSBP2 and KIAA1664
22   rs4820180    150059798   32.83       1.01E-08


Table 4: Top association results for Waist Circumference based on QTDT analysis, with position, chi-square and p value.

Trait: Waist Circumference

Chr  SNP          Position    Chi-square  p value    Gene
1    rs17534243   38423504    15.06       1.04E-04
1    rs7526314    40834106    16.75       4.26E-05
1    rs12079703   48779621    15.70       7.42E-05   AGBL4
1    rs2494316    192759116   16.47       4.94E-05
2    rs7578740    12336203    15.68       7.50E-05   AK001558
4    rs11737601   10097666    17.24       3.29E-05
4    rs3749558    10103101    18.14       2.05E-05   CLNK
4    rs13109005   10119979    16.34       5.29E-05
4    rs1004327    10120581    18.14       2.05E-05
4    rs4698497    16226648    19.79       8.64E-06   LDB2
4    rs1031326    17097162    16.81       4.13E-05   QDPR
4    rs2939720    37235621    15.30       9.17E-05   C4orf19
4    rs6830246    41334756    16.57       4.69E-05   LIMCH1
4    rs13117610   41921681    19.25       1.15E-05
4    rs729467     41935895    15.82       6.97E-05
4    rs4861178    41946387    17.93       2.29E-05
4    rs13113565   42251414    17.25       3.28E-05   ATP8A1
4    rs7666279    42252312    15.49       8.29E-05   ATP8A1
4    rs17026425   150891964   16.16       5.82E-05   BC031092
5    rs2217346    15664320    16.18       5.76E-05   FBXL7
5    rs12757      15682061    17.03       3.68E-05
5    rs7704791    15685099    18.03       2.17E-05
5    rs12652447   15727635    17.66       2.64E-05
6    rs510957     151383747   20.71       5.34E-06   DKFZp586G1517 & MTHFD1L
7    rs2091321    47372660    15.95       6.50E-05   TNS3
7    rs6964472    47383258    16.56       4.71E-05
                                                     CSMD1
7    rs12668378   53976764    16.17       5.79E-05
7    rs304749     79816339    16.34       5.29E-05
7    rs17162763   89503141    16.38       5.18E-05
8    rs1112779    4358577     19.48       1.02E-05
9    rs12004565   77182642    17.67       2.63E-05
                                                     IGM1
10   rs703424     119939738   15.63       7.70E-05
13   rs465051     31570463    18.37       1.82E-05   FRY
13   rs9603579    39102695    16.20       5.70E-05
13   rs585206     41587853    17.04       3.66E-05   DGKH
14   rs8010158    38066347    16.27       5.49E-05
14   rs2415487    38094871    17.67       2.63E-05
14   rs1597353    38103054    16.80       4.15E-05
14   rs11626845   38151069    15.46       8.43E-05
16   rs150348     55673537    15.77       7.15E-05   NLRC5
16   rs4843479    85475004    15.79       7.08E-05
16   rs7203346    85862570    18.86       1.41E-05   AK125749
16   rs1862788    85893927    17.48       2.90E-05
16   rs7192413    85913378    19.51       1.00E-05
16   rs9308347    85929253    24.47       7.55E-07   FBXO31
20   rs4810899    35643566    20.20       6.98E-06
22   rs8135417    29889255    15.77       7.15E-05   RNF185


Table 5: Comparison of BMI results with prior literature for SNPs that are present in this study, with our p value results.

Trait: BMI

Gene                   SNP          References  p value
BCDIN3D, FAIM2         rs7138803    [54]        0.740
BDNF                   rs6265       [54]        0.025
                       rs925946                 0.920
                       rs7481311                -
BMP2                   rs2145270    [23]        -
C20orf133              rs6110577    [55]        0.823
FBN2                   rs374748                 -
FLJ20309               rs7603514                0.862
FTO                    rs9939609    [23]        -
                       rs8050136    [54]        0.751
                       rs9939609    [21]        -
                       rs6499640    [54]        0.208
                       rs1121980    [22]        -
                       rs1421085    [56]        -
                       rs1121980    [57]        -
                       rs9941349    [55]        0.178
                       rs9930506    [58]        -
GNPDA2                 rs10938397   [23]        -
Intergenic             rs1106683    [28]        -
                       rs1106684                -
                       rs1333026                0.016
ITPR3                  rs999943     [55]        -
KCTD15                 rs11084753   [23]        -
KCTD15, CHST8          rs29941      [54]        0.479
MAF                    rs1424233    [56]        0.577
MC4R                   rs17782313   [23]        -
                       rs12970134   [54]        0.823
MLN                    rs2274459    [55]        0.639
MTCH2                  rs10838738   [23]        0.004
MUC15                  rs12295638   [55]        -
NEGR1                  rs2568958    [54]        0.823
                       rs2815752    [23]        -
NPC1                   rs1805081    [56]        0.729
NR                     rs10783050   [54]        0.791
PRF1                   rs10999409   [55]        0.265
PTER                   rs10508503   [56]        -
RAFTLIN                rs12635698   [55]        0.532
RARB                   rs1435703                -
RKHD3                  rs12324805   [23]        0.055
RTN4                   rs6726292    [55]        0.806
SEC16B, RASAL2         rs10913469   [54]        0.887
SFRS10, ETV5, DGKG     rs7647305                0.289
SH2B1, ATP2A1          rs7498665                -
TMEM18                 rs6548238    [23]        -
                       rs7561317    [54]        0.777
TRAM1L1                rs10433903   [55]        -
TRHR                   rs7832552    [59]        -
ZNF248                 rs7474896    [55]        0.145


Table 6: Comparison of Waist Circumference results with prior literature for SNPs that are present in this study, with our p value results.

Trait: Waist Circumference

Gene                   SNP          References  p value
CDH12                  rs4701252    [60]        0.639
CETP                   rs3764261    [61]        0.064
FAIM2, BCDIN3D         rs7138803    [60]        0.348
FTO                    rs1558902                -
GCKR                   rs1260326    [25]        0.104
GDAP1                  rs4471028    [28]        0.152
Intergenic             rs1875517                -
LPL                    rs2083637    [25]        0.559
MC4R                   rs12970134               0.624
MC4R                   rs489693     [60]        -
NRXN3                  rs10146997               0.862
OVCH2                  rs7932813                -
PKHD1                  rs1555967                -


DISCUSSION

In the current study, our aim was to perform GWAS analysis to detect genetic variants that

affect the incidence of T2D in one extended family of Arab origin from the United Arab

Emirates. One of the factors that increases the risk of T2D is obesity. Obesity is a complex problem that cannot be explained by one factor alone. Multiple genes may increase susceptibility to obesity, and the phenotype may also be affected by external factors such as an abundant food supply or little physical activity. For example, a study conducted by Froguel and his group identified two forms of the GAD2 gene: one protected against obesity, while the other made it more likely by stimulating the appetite [34].

Many previously discovered genes associated with obesity are active in the brain, and could

affect behavior around food, rather than how the body breaks down fat or uses up energy.

Researchers found that the NRXN3 gene variant previously associated with alcohol dependence, cocaine addiction, and illegal substance abuse also predicts the tendency to become obese [35]. Another study explained how BDNF works in combination with a variety of other substances that regulate appetite and body weight [36]. Considering how many factors are involved in obesity, it is notable that research increasingly points to the brain as being very important in its development.

To investigate genetic determinants of obesity and T2D, a total of 657,367 SNPs were genotyped in 178 members of one Emirati family spanning five generations. Of the 178 members, 66 were diagnosed with T2D; among these patients the mean BMI was 30.4 ± 6.23 kg/m2 and the mean WC was 37.96 ± 5.13 inches in males and 39.84 ± 5.40 inches in females. There was a significant phenotypic correlation (70%) between BMI and waist circumference, both of which are related to obesity (data not shown). This is consistent with an influence of both

environmental and genetic factors in the pathophysiology of T2D and its related phenotypes in

an Arab population. Furthermore the results presented in Table 2 show a strong familial

aggregation of quantitative traits WC and BMI, which are known to be associated with T2D

and which may play a more prominent role in the development of diabetes in this population.


The most noteworthy outcomes of this study were associations detected at the ADAM30, GALNTL4, JAZF1, and FBXO31 gene regions. The associations at ADAM30 and JAZF1 replicate previous findings, whereas the associations at GALNTL4 and FBXO31 represent novel findings.

A meta-analysis that was carried out by Zeggini et al. validated that a SNP (rs10923931)

located in chromosome 1 in ADAM30 gene is associated with T2D with a p-value 4E-8 [37].

In our study, one novel SNP (rs2793823) located in the same gene reached genome-wide significance (p = 1.86E-8) for association with BMI. The function of ADAM30 (ADAM metallopeptidase domain 30) is still poorly understood. JAZF1 is another gene studied by Zeggini et al., in which the SNP rs864745 (p = 5.00E-14) was associated with T2D [37]. In our study, two novel SNPs in the same gene, rs10268254 and rs38523, were nominally significant (p-values 0.020 and 0.0397, respectively). Very little is known about the biological function of JAZF1; however, since JAZF1 is expressed in the pancreas [38], one might consider that a gain-of-function variant in JAZF1 may lead to postnatal growth restriction that also affects pancreatic β-cell mass and function. Our study also confirmed that a SNP

(rs7120774) in GLUT4 gene located in chromosome 11 is related to obesity (p = 1.82E-10).

GLUT4 isoform is primarily responsive to insulin and accounts for the majority, if not all, of

insulin-stimulated glucose transport in muscle and adipose tissue under normal physiological

circumstances.

This study identified several loci that were not detected earlier and are associated with T2D at GWAS-significant probability values (p ≤ 1.00E-7). The most significant statistical evidence for association was found at rs11711029 in the HPS3 gene for BMI (p = 8.97E-14) and at rs9308347 in the FBXO31 gene for WC (p = 7.55E-7).

In this study we also detected association, at lower levels of significance, between WC and three novel SNPs located in the ATP8A1 gene (p = 3.8E-05 and 8.29E-5), which belongs to the type 4 subfamily of P-type ATPases. ATP8A1 is highly distributed in skeletal muscle

and thyroid tissues [39]. ATP10A and ATP10D have been proposed as candidates for obesity

and HDL-cholesterol level respectively [40]. Since ATP10A and ATP10D belong to the same

class of P4 ATPases [41, 42], ATP8A1 may be involved in similar pathways. Therefore, they

may play a role in glucose uptake and fat metabolism.


Genes that contribute to diseases other than T2D were also found to be significant in this study. This may be because some T2D patients who participated in this study were suffering from other conditions, such as breast cancer (in six of the female patients, whose WC was 42.83 ± 3.76 inches). This may explain why genes such as FBXO31 (rs9308347, p = 7.55E-7) were found to be significant in this study. FBXO31 showed a GWAS-significant p-value for association with WC. FBXO31 is located in chromosome 16q24.3, a region in which there is

loss of heterozygosity in breast, ovarian, hepatocellular and prostate cancers [43-47].

Scientists concluded that obesity and physical inactivity may account for 25 to 30 percent of

several diseases including major cancers; such as cancers of the colon, breast, endometrium,

kidney, and esophagus [48]. Specifically, obesity seems to increase the risk of breast cancer only among postmenopausal women [49], who have increased levels of estrogen due to their overweight condition [50]. After menopause, when the ovaries stop producing hormones, fat

tissue becomes the most important estrogen supply [51]. Estrogen levels in postmenopausal

women are 50 to 100 percent higher among heavy versus lean women [52]. Therefore

estrogen-sensitive tissues are exposed to more estrogen stimulation in heavy women, leading

to a more rapid growth of estrogen-responsive breast tumors. Therefore, this gene might play a

role in body weight gain and subsequently in T2D.

Our study found five SNPs (rs4681169, rs4681487, rs12487928, rs11711029 and rs2689225) with significant p-values of 3.88E-10, 4.51E-10, 3.88E-10, 8.97E-14 and 7.07E-10, respectively. These five SNPs are located on chromosome 3 within the HPS3 gene. HPS3 underlies one of the subtypes of Hermansky-Pudlak syndrome (HPS), a rare autosomal recessive genetic disorder that arises from defects in the melanosomes, platelet-dense granules, and lysosomes found in various cell types [53]. So far, no previous studies have shown any association of this gene with T2D or obesity.

In this study we have not seen any significant association of FTO and MC4R genes with our

traits (BMI and WC). The different genetic backgrounds of Caucasian and Arab populations could explain the non-significant results for these two genes. Thus, in our population these genes might not be involved, or association with these genes may only emerge when larger sample sizes are analyzed. It should also be noted that the non-significance of the FTO SNPs may be partly explained by the similar BMI of the T2D patients and healthy

individuals in our sample.


An interesting aspect to our study is the use of 178 individuals from a single large pedigree.

This means that the test we employed (the orthogonal model of the QTDT) could actually be

considered to represent a joint test of linkage and association rather than a test of association

per se. In fact, in a large pedigree such as this one, one could argue that linkage and

association are essentially the same thing - the correlation between phenotype and marker

alleles occurs firstly because the marker allele happened to be in coupling with the trait allele

on one (or several) haplotypes in founders, and secondly because the marker and disease

alleles are transmitted together through the pedigree (due to a lack of recombination). In our

pedigree, there are likely to be a much larger number of observations for linkage (reflecting

this lack of recombination between trait and marker alleles) than there are for association

(reflecting the fact that the trait and marker allele are correlated in the founders, perhaps due to

linkage disequilibrium in the general population). In theory one could account for the linkage

component of the test in the QTDT through incorporation of observed identity-by-descent

(IBD) sharing between individuals in a variance components framework. However, calculation

of IBD sharing in such a large pedigree is computationally demanding and would most likely

result in a reduction in power. By not incorporating IBD sharing in the calculation, we are able

to exploit the linkage signal in our pedigree in order to increase our power to detect genetic

effects. However, this does impact upon our interpretation of our results, since linkage signals

are generally expected to extend over larger genomic regions than association signals. This

could explain the relatively wide localization of the signals we found (see Tables 3 and 4)

which in some cases stretched over several genes.
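The partition into between- and within-family components used by the orthogonal model can be sketched for a single nuclear family. This is a simplified illustration of the general idea in [33], not the QTDT program's implementation: genotypes are assumed to be coded as allele counts (0, 1 or 2), the between-family component is taken as the mean parental genotype when both parents are genotyped and the sibship mean otherwise, and the function name is hypothetical.

```python
def orthogonal_components(offspring_genotypes, father=None, mother=None):
    """Split offspring genotype scores into between- and within-family parts.

    Genotypes are allele counts (0, 1 or 2). Returns the family's between
    component and a list of within components, one per offspring.
    """
    if father is not None and mother is not None:
        # Both parents genotyped: use the mean parental genotype.
        between = (father + mother) / 2.0
    else:
        # Otherwise fall back on the mean genotype of the sibship.
        between = sum(offspring_genotypes) / len(offspring_genotypes)

    # The within component is each offspring's deviation from the family value.
    within = [g - between for g in offspring_genotypes]
    return between, within
```

In a regression of the trait on both components, the within-family term captures the transmission-based evidence, while the between-family term absorbs differences in allele frequency between families.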

In conclusion, our GWAS analysis indicated the involvement of some novel genes in the

etiology of obesity (BMI and WC). GWAS analyses are only an initial step in the explication

of susceptibility variants. Although the current analyses have pointed out several areas that

may hold genetic variants that affect susceptibility to T2D in Arab populations, further

investigation of the identified genes is needed to understand the mechanism and association of

these genes with T2D and obesity. Our findings require replication in both Arab and other

ethnic groups. The characteristics of Arab populations make them well suited to the study of complex, polygenic, multifactorial disorders such as diabetes, owing to consanguineous marriages, high birth rates and lack of physical exercise. As we uncover more variants, we will gain a better basic understanding of obesity, which in turn will further previously unimagined areas of clinically relevant research.


ACKNOWLEDGMENT


Western Australia. We gratefully acknowledge the contribution of participating family

members whose cooperation made this study possible. Part of the data analysis was

performed on the advanced computing resources provided by the Western Australian

Advanced Computing Consortia (iVEC). Habiba Alsafar is a PhD scholar at the University of

Western Australia supported by the Dubai Police General Head Quarters in the United Arab

Emirates. Funding for this project was provided in part by CVRL and the Emirates

Foundation.


REFERENCES

1. Lyen, K.R., The insulin receptor. Ann Acad Med Singapore, 1985. 14(2): p. 364-73.

2. Olefsky, J.M. and O.G. Kolterman, Mechanisms of insulin resistance in obesity and

noninsulin-dependent (type II) diabetes. Am J Med, 1981. 70(1): p. 151-68.

3. Björntorp, P., Obesity and Adipose Tissue Distribution as Risk Factors for the

Development of Disease. Transfusion Medicine and Hemotherapy, 1990. 17(1): p. 24-

27.

4. Charro, A., M. Rubio, and D. Runkle, Checks up in obese and diabetic patients:

preventive medicine. Int J Vitam Nutr Res, 2006. 76: p. 194-9.

5. Wannamethee, S.G., A.G. Shaper, and M. Walker, Overweight and obesity and weight

change in middle aged men: impact on cardiovascular disease and diabetes. J

Epidemiol Community Health, 2005. 59(2): p. 134-9.

6. Wannamethee, S.G. and A.G. Shaper, Weight change and duration of overweight and

obesity in the incidence of type 2 diabetes. Diabetes Care, 1999. 22(8): p. 1266-72.

7. Resnick, H.E., et al., Relation of weight gain and weight loss on subsequent diabetes

risk in overweight adults. J Epidemiol Community Health, 2000. 54(8): p. 596-602.

8. Perry, I.J., et al., Prospective study of risk factors for development of non-insulin

dependent diabetes in middle aged British men. Bmj, 1995. 310(6979): p. 560-4.

9. Holbrook, T., E. Barrett-Connor, and D. Wingard, The association of lifetime weight

and weight control patterns with diabetes among men and women in an adult

community. Int J Obes, 1989. 13: p. 723–9.

10. Haffner, S.M., et al., Incidence of type II diabetes in Mexican Americans predicted by

fasting insulin and glucose levels, obesity, and body-fat distribution. Diabetes 1990.

39: p. 283–8.

11. Field, A.E., et al., Impact of overweight on the risk of developing common chronic

diseases during a 10-year period. Arch Intern Med, 2001. 161(13): p. 1581-6.

12. Colditz, G.A., et al., Weight gain as a risk factor for clinical diabetes mellitus in

women. Ann Intern Med, 1995. 122(7): p. 481-6.

13. Chan, J., et al., Obesity, fat distribution, and weight gain as risk factors for clinical

diabetes in men. Diabetes Care, 1994(17): p. 961–9.


14. Carey, V.J., et al., Body fat distribution and risk of non-insulin-dependent diabetes

mellitus in women. The Nurses' Health Study. Am J Epidemiol, 1997. 145(7): p. 614-9.

15. Ardern, C.I., et al., Discrimination of health risk by combined body mass index and

waist circumference. Obes Res, 2003. 11(1): p. 135-42.

16. Chan, J.M., et al., Obesity, fat distribution, and weight gain as risk factors for clinical

diabetes in men. Diabetes Care, 1994. 17(9): p. 961-969.

17. Genome-wide association study of 14,000 cases of seven common diseases and 3,000

shared controls. Nature, 2007. 447(7145): p. 661-78.

18. Christensen, K. and J.C. Murray, What genome-wide association studies can do for medicine.

N Engl J Med, 2007. 356: p. 1094–7.

19. Benzinou, M., et al., Common nonsynonymous variants in PCSK1 confer risk of

obesity. Nat Genet, 2008. 40(8): p. 943-5.

20. Chambers J.C., et al., Common genetic variation near MC4R is associated with waist

circumference and insulin resistance. Nat. Genet, 2008. 40: p. 716–718.

21. Frayling, T.M., et al., A common variant in the FTO gene is associated with body mass

index and predisposes to childhood and adult obesity. Science, 2007. 316(5826): p.

889-94.

22. Loos, R.J., et al., Common variants near MC4R are associated with fat mass, weight

and risk of obesity. Nat Genet, 2008. 40(6): p. 768-75.

23. Willer, C.J., et al., Six new loci associated with body mass index highlight a neuronal

influence on body weight regulation. Nat Genet, 2009. 41(1): p. 25-34.

24. Kring, S.I., et al., FTO gene associated fatness in relation to body fat distribution and

metabolic traits throughout a broad range of fatness. PLoS One, 2008. 3(8): p. e2958.

25. Chambers, J.C., et al., Common genetic variation near MC4R is associated with waist

circumference and insulin resistance. Nat Genet, 2008. 40(6): p. 716-8.

26. Gerken, T., et al., The obesity-associated FTO gene encodes a 2-oxoglutarate-

dependent nucleic acid demethylase. Science, 2007. 318(5855): p. 1469-72.

27. Vaisse, C., et al., A frameshift mutation in human MC4R is associated with a dominant

form of obesity. Nat Genet, 1998. 20(2): p. 113-4.

28. Fox, C.S., et al., Genome-wide association to body mass index and waist

circumference: the Framingham Heart Study 100K project. BMC Med Genet, 2007. 8

Suppl 1: p. S18.


29. World Health Organization. Obesity: Preventing and Managing the Global Epidemic

(2000) Geneva, World Health Organization. Technical report series 894.

30. Almasy, L. and J. Blangero, Multipoint quantitative-trait linkage analysis in general

pedigrees. Am J Hum Genet, 1998. 62(5): p. 1198-211.

31. Purcell, S., et al., PLINK: a tool set for whole-genome association and population-

based linkage analyses. Am J Hum Genet, 2007. 81(3): p. 559-75.

32. O'Connell, J.R. and D.E. Weeks, PedCheck: a program for identification of genotype

incompatibilities in linkage analysis. Am J Hum Genet, 1998. 63(1): p. 259-66.

33. Abecasis, G.R., L.R. Cardon, and W.O. Cookson, A general test of association for

quantitative traits in nuclear families. Am J Hum Genet, 2000. 66(1): p. 279-92.

34. Boutin, P. and P. Froguel, GAD2: a polygenic contribution to genetic susceptibility for

common obesity? Pathol Biol (Paris), 2005. 53(6): p. 305-7.

35. Kelai, S., et al., Nrxn3 upregulation in the globus pallidus of mice developing cocaine

addiction. Neuroreport, 2008. 19(7): p. 751-5.

36. Gray, J., et al., Hyperphagia, severe obesity, impaired cognitive function, and

hyperactivity associated with functional loss of one copy of the brain-derived

neurotrophic factor (BDNF) gene. Diabetes, 2006. 55(12): p. 3366-71.



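37. Zeggini, E., et al., Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet, 2008. 40(5): p. 638-45.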

38. Nakajima, T., et al., TIP27: a novel repressor of the nuclear orphan receptor

TAK1/TR4. Nucleic Acids Res, 2004. 32(14): p. 4194-204.

39. Mouro, I., et al., Cloning, expression, and chromosomal mapping of a human ATPase

II gene, member of the third subfamily of P-type ATPases and orthologous to the

presumed bovine and murine aminophospholipid translocase. Biochem Biophys Res

Commun, 1999. 257(2): p. 333-9.

40. Flamant, S., et al., Characterization of a putative type IV aminophospholipid

transporter P-type ATPase. Mamm Genome, 2003. 14(1): p. 21-30.

41. Halleck, M.S., et al., Differential expression of putative transbilayer amphipath

transporters. Physiol Genomics, 1999. 1(3): p. 139-50.

42. Paulusma, C.C. and R.P. Oude Elferink, The type 4 subfamily of P-type ATPases,

putative aminophospholipid translocases with a role in human disease. Biochim

Biophys Acta, 2005. 1741(1-2): p. 11-24.


43. Miller, B.J., et al., Pooled analysis of loss of heterozygosity in breast cancer: a genome

scan provides comparative evidence for multiple tumor suppressors and identifies

novel candidate regions. Am J Hum Genet, 2003. 73(4): p. 748-67.

44. Lin, Y.W., et al., Deletion mapping of chromosome 16q24 in hepatocellular carcinoma

in Taiwan and mutational analysis of the 17-beta-HSD gene localized to the region. Int

J Cancer, 2001. 93(1): p. 74-9.

45. Launonen, V., et al., Loss of heterozygosity at chromosomes 3, 6, 8, 11, 16, and 17 in

ovarian cancer: correlation to clinicopathological variables. Cancer Genet Cytogenet,

2000. 122(1): p. 49-54.

46. Kumar, R., et al., FBXO31 is the chromosome 16q24.3 senescence gene, a candidate

breast tumor suppressor, and a component of an SCF complex. Cancer Res, 2005.

65(24): p. 11304-13.

47. Elo, J.P., et al., Loss of heterozygosity at 16q24.1-q24.2 is significantly associated with

metastatic and aggressive behavior of prostate cancer. Cancer Res, 1997. 57(16): p.

3356-9.

48. Vainio, H. and F. Bianchini, Evaluation of cancer-preventive agents and strategies a

new program at the International Agency for Research on Cancer. Ann N Y Acad Sci,

2001. 952: p. 177-80.

49. Toniolo, P.G., et al., A prospective study of endogenous estrogens and breast cancer in

postmenopausal women. J Natl Cancer Inst, 1995. 87(3): p. 190-7.

50. Zeleniuch-Jacquotte, A., et al., Endogenous estrogens and risk of breast cancer by

estrogen receptor status: a prospective study in postmenopausal women. Cancer

Epidemiol Biomarkers Prev, 1995. 4(8): p. 857-60.

51. Keun-Young, Y., et al., Postmenopausal obesity as a breast cancer risk factor

according to estrogen and progesterone receptor status (Japan). Cancer letters, 2001.

167(1): p. 57-63.

52. Huang, Z., et al., Dual effects of weight and weight gain on breast cancer risk. Jama,

1997. 278(17): p. 1407-11.

53. Shotelersuk, V. and W.A. Gahl, Hermansky-Pudlak syndrome: models for intracellular

vesicle formation. Mol Genet Metab, 1998. 65(2): p. 85-96.

54. Thorleifsson, G., et al., Genome-wide association yields new sequence variants at

seven loci that associate with measures of obesity. Nat Genet, 2009. 41(1): p. 18-24.


55. Cotsapas, C., et al., Common body mass index-associated variants confer risk of

extreme obesity. Hum Mol Genet, 2009. 18(18): p. 3502-7.

56. Meyre, D., et al., Genome-wide association study for early-onset and morbid adult

obesity identifies three new risk loci in European populations. Nat Genet, 2009. 41(2):

p. 157-9.

57. Hinney, A., et al., Genome wide association (GWA) study for early onset extreme

obesity supports the role of fat mass and obesity associated gene (FTO) variants. PLoS

One, 2007. 2(12): p. e1361.

58. Scuteri, A., et al., Genome-wide association scan shows genetic variants in the FTO

gene are associated with obesity-related traits. PLoS Genet, 2007. 3(7): p. e115.

59. Liu, X.G., et al., Genome-wide association and replication studies identified TRHR as

an important gene for lean body mass. Am J Hum Genet, 2009. 84(3): p. 418-23.

60. Heard-Costa, N.L., et al., NRXN3 is a novel locus for waist circumference: a genome-

wide association study from the CHARGE Consortium. PLoS Genet, 2009. 5(6): p.

e1000539.

61. Lindgren, C.M., et al., Genome-wide association scan meta-analysis identifies three

Loci influencing adiposity and fat distribution. PLoS Genet, 2009. 5(6): p. e1000508.

CHAPTER 8

COMMENTARY AND FINAL REMARKS

The pilot program of the Emirates Family Registry (EFR) described in the chapters of this thesis was overwhelmingly successful. The information gathered for the first genome-wide screen of an Arab population could not have been collated without the support of the volunteers who enrolled in the Emirates Family Registry. The tightly knit Bedouin communities are essentially closed to technological advances. However, the Emirates Family Registry provided a platform through which key members of the family hierarchy could gain confidence in a long-term approach to addressing an important issue. The structured bio-bank and associated clinical database (Figure 1) provided a means to systematically match bio-specimens (blood, DNA samples) with phenotypic and demographic data (Figure 2). The study focused on Type 2 Diabetes (T2D) in Arabs because it represents an increasing problem in the Middle East (see the review of the genetics of diabetes in Chapter 1).
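
The matching of bio-specimens to phenotypic and demographic records can be illustrated with a minimal sketch. It is not the EFR database schema: the class names, field names and identifiers below are hypothetical and are chosen only to show the idea of keying every specimen to a volunteer record.

```python
from dataclasses import dataclass, field


@dataclass
class Volunteer:
    """Phenotypic and demographic record (all field names are hypothetical)."""
    volunteer_id: str
    age: int
    sex: str
    t2d_status: bool


@dataclass
class Specimen:
    """A stored bio-specimen, keyed back to its donor."""
    specimen_id: str
    volunteer_id: str
    kind: str  # e.g. "blood" or "DNA"


@dataclass
class Registry:
    volunteers: dict = field(default_factory=dict)
    specimens: list = field(default_factory=list)

    def add_volunteer(self, v: Volunteer) -> None:
        self.volunteers[v.volunteer_id] = v

    def add_specimen(self, s: Specimen) -> None:
        # Refuse specimens that cannot be matched to a clinical record.
        if s.volunteer_id not in self.volunteers:
            raise KeyError(f"unknown volunteer: {s.volunteer_id}")
        self.specimens.append(s)

    def specimens_for(self, volunteer_id: str) -> list:
        """All specimens donated by one volunteer."""
        return [s for s in self.specimens if s.volunteer_id == volunteer_id]

    def matched(self, kind: str) -> list:
        """Pair each specimen of the given kind with its donor's record."""
        return [(s, self.volunteers[s.volunteer_id])
                for s in self.specimens if s.kind == kind]
```

Rejecting a specimen whose donor is not registered is one simple way to guarantee that every sample in storage can be traced to phenotypic data; a production system would more likely enforce this with a foreign-key constraint in a relational database.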

Figure 1: Structure of the EFR bio-bank and clinical database.

The study has provided the initial dataset, collected from 23,064 volunteers. The elements of the Phase 1 dataset are shown in Figure 1. A subset of these volunteers was analysed to estimate the prevalence and incidence of Type 2 Diabetes in the United Arab Emirates (Chapter 2), as well as the inheritance of traits known to be associated with the disease (Chapter 3).

Figure 2: The specific contents of the EFR Phase 1.

The Emirates Family Registry project has not only built a close relationship with families and individuals who were keen to understand the mechanisms that cause disease; it has also established an international collaborative network spanning Australia, Europe and the Middle East (see Figure 3), which will ensure the future development of the EFR project. This collaborative network will provide substantial benefits to all the groups involved, such as genotyping facilities, analytical tools, and bioinformatics and statistical expertise. These would be invaluable to the Arab genome and bio-bank community, since there is a shortage of biostatisticians in the Middle East. Furthermore, there is the potential to set up a valuable collaborative network, especially with the Gulf Cooperation Council (GCC) countries.

Figure 3: Collaborative Links of the EFR Project have been established throughout the Middle East, United Kingdom and Australia

The nature of the Middle East, and of most of Asia, requires further thought. Methods adapted for sample collection in harsh environments with little access to infrastructure have to be developed. Blood is typically collected by venipuncture into Vacutainer tubes. In parts of the African and Asian continents, blood collection by this process is problematic. As such, new sample collection and storage methods have to be considered. In Chapter 4, the FTA™ system was successfully assessed for this purpose.

Analysis of the information within the database has revealed much about the uniqueness of the genetic background of the Bedouin population and its phylogenetic relationship with other ethnic groups. In Chapter 5, four specific markers in the Major Histocompatibility Complex (MHC) on human chromosome 6 were typed. PCR assays were developed to type the markers AluyMICB, AluyTF, AluyHJ and AluyHF. Data from the Arab population were compared phylogenetically with the allelic distributions in Malaysian Chinese, North Eastern Thai, Japanese, Australian, African and Mongolian populations. The study showed that Arabs have a similar lineage to Caucasians.

The information, together with the biological specimens, was useful in many ways, including genome-wide studies to identify contributing polymorphisms (see Chapter 6 and Chapter 7). Phase 1 of the EFR is expected to provide a platform for future longitudinal studies.

In Chapter 1, the major problem posed by Type 2 Diabetes is discussed. Currently, over 170 million people worldwide suffer from Type 2 Diabetes, which is influenced by lifestyle, genetic and behavioral factors [1]. The role of genetic factors in the etiology of diabetes was found to be highly significant. It is therefore important to map disease genes by comparing disease and control groups, as well as by performing comparative analyses across different ethnic groups. Type 2 Diabetes has become a major public health problem in the UAE. A survey completed by the UAE Ministry of Health reported that the overall prevalence of diabetes among UAE citizens was 19.6%. Furthermore, recent studies estimated that 25% of adult Arabs now suffer from diabetes, mainly Type 2 Diabetes, and that the prevalence of the disease is increasing. These observations emphasize the need to consider diabetes prevention in the UAE. To this end, the Emirates Family Registry (EFR) was created to detect loci and genes influencing susceptibility to Type 2 Diabetes (T2D) and related traits in the United Arab Emirates (UAE) population. Chapter 1 therefore touches on the implications of genetic research, with specific emphasis on the findings of genome-wide screening of T2D patients in different populations.

Chapter 2 discusses the prevalence of Type 2 Diabetes in a small population from the UAE, as a prelude to a more extensive longitudinal study in the future. The disease is currently the fastest growing debilitating disease in the world. In 2007, the United Arab Emirates was ranked second in the world for prevalence of diabetes. One in five UAE nationals aged 20 to 79 lives with diabetes. To investigate the genes influencing susceptibility to Type 2 Diabetes in ethnic groups within the UAE population, collaborations have been established with major hospitals and diabetes centres in the country. Through these collaborations, demographic data of patients have been evaluated and tabulated in a professionally maintained database, the Emirates Family Registry. To date, the Emirates Family Registry contains 23,064 volunteers (see Figure 1). Information within the registry has revealed that obesity, high waist circumference, consanguineous marriage, family history, lack of physical activity, and an unhealthy diet with high total cholesterol and triglyceride levels were more prevalent in Type 2 Diabetes patients in the United Arab Emirates. These observations could lead to better diagnosis, treatment and intervention. It is extremely important to continue adding patients to the database as they are found and treated, as well as individuals who do not presently have the disease. This kind of study, and the continued collection of data, could lead to the genomic studies needed to control diabetes, to the benefit of patients, families and the healthcare system of any country.

Chapter 3 estimates the heritability of eight traits in order to evaluate the influence of the genetic component on phenotypic variation associated with Type 2 Diabetes. It also describes the role of genes and the influence of the environment on the increasing prevalence of Type 2 Diabetes in an extended family of Arab origin. The study revealed strong phenotypic correlations between fasting glucose levels and HbA1c, and between these two traits and waist circumference. The findings also indicate a heritable tendency for obesity in this family, as shown by waist circumference and BMI values. Overall, the results show a strong familial aggregation of quantitative traits associated with T2D. Further studies are underway to identify genetic loci that may be specific to Arab populations. This assessment of phenotypic factors will be followed by ongoing studies to evaluate the genetic polymorphisms that contribute to the prevalence of Type 2 Diabetes in Arab populations.

Chapter 4 describes the use of FTA™ technology for DNA storage, followed by a Whole Genome Amplification step prior to GWAS application, as an alternative strategy for high-throughput genotyping. In this study, three different sources of DNA (namely, degraded genomic DNA, amplified degraded genomic DNA and amplified DNA extracted from FTA™ cards) were assessed as templates for genome-wide analysis using Illumina's Human660W-Quad BeadChip. The study showed that amplified DNA extracted from FTA™ cards had the highest rate of accurate calls compared with the other DNA sources, amplified and non-amplified genomic DNA. FTA™ cards are thus a routine, simple and cost-effective technology for preserving bio-specimens and are amenable to high-throughput DNA extraction, all attributes required to undertake Genome Wide Association Studies efficiently. To the best of our knowledge, this is the first description of FTA™-sourced DNA being used for high-throughput genotyping to study human polymorphisms.
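
As an illustration of how DNA sources can be compared on genotyping performance, the sketch below computes a per-sample call rate, defined here as the fraction of markers that received a genotype call. The genotype encoding (strings such as "AA", with None for a no-call) and the function names are assumptions made for this example; they are not taken from the Chapter 4 analysis, and call rate on its own measures completeness rather than concordance with a reference genotype.

```python
def call_rate(genotypes):
    """Fraction of markers with a non-missing genotype call.

    `genotypes` is a sequence of calls such as "AA", "AB" or "BB";
    None marks a no-call (this encoding is an assumption).
    """
    if not genotypes:
        raise ValueError("no markers supplied")
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def rank_sources(samples):
    """Sort DNA sources by call rate, best first.

    `samples` maps a source label to its list of genotype calls.
    """
    rates = {name: call_rate(calls) for name, calls in samples.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

For example, `call_rate(["AA", None, "AB", "BB"])` returns 0.75, because three of the four markers were called.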

Chapter 5 examines the evolutionary relationships of a previously unstudied population, the Bedouins of the Middle East, and compares the distribution of specific polymorphic Alu insertions (POALINs) of the Major Histocompatibility Complex (MHC) with previous analyses of specific population groups, such as those of African, European and Asian descent [2-8]. The study segregated the populations into three phylogenetic groups: the Asian subpopulations; the Bedouins and Caucasians; and the three African subpopulations included. Based on our results, we concluded that the Bedouin population was similar to Australian Caucasians. However, further analysis of the Bedouin population is needed to better understand their unique genetic background and the diseases that affect this group of individuals.
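
Comparisons of this kind start from the frequency of the insertion allele at each marker in each population. The sketch below estimates that frequency by gene counting from genotype counts and then summarises how far two populations differ. The function names are hypothetical, and the mean absolute difference is only a crude dissimilarity score used for illustration; it is not the phylogenetic method applied in Chapter 5.

```python
def insertion_frequency(n_ii, n_id, n_dd):
    """Insertion-allele frequency from genotype counts (gene counting).

    n_ii: individuals homozygous for the insertion
    n_id: heterozygous individuals
    n_dd: individuals homozygous for absence of the insertion
    """
    total = n_ii + n_id + n_dd
    if total == 0:
        raise ValueError("no genotyped individuals")
    # Each individual carries two alleles; heterozygotes contribute one.
    return (2 * n_ii + n_id) / (2 * total)


def mean_abs_difference(freqs_a, freqs_b):
    """Mean absolute frequency difference over the markers both share.

    Each argument maps a marker name to an insertion-allele frequency.
    """
    shared = sorted(set(freqs_a) & set(freqs_b))
    if not shared:
        raise ValueError("no markers in common")
    return sum(abs(freqs_a[m] - freqs_b[m]) for m in shared) / len(shared)
```

With 10 insertion homozygotes, 20 heterozygotes and 70 individuals lacking the insertion, `insertion_frequency(10, 20, 70)` gives 0.2.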

Chapters 6 and 7 examine the genes that may influence susceptibility to Type 2 Diabetes, and to obesity in Type 2 Diabetes patients, in the United Arab Emirates by genotyping 660,000 Single Nucleotide Polymorphisms (SNPs) throughout the genome. To date, genome-wide scans for Type 2 Diabetes have been performed in over 20 different populations, including Europeans, American Caucasians, Mexican Americans, Pima Indians, African Americans and Asians [1, 9-18]. Results from these studies indicate that Type 2 Diabetes susceptibility loci reside on a number of different chromosomes. From this perspective, this study is the first genome-wide screen in the Middle East to focus on identifying the genes involved in the development of Type 2 Diabetes in the UAE population.
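
Testing several hundred thousand markers raises a multiple-testing problem, which the following sketch illustrates. It is a generic example and not the analysis used in Chapters 6 and 7: it applies a Bonferroni correction (one common convention) and a basic allelic chi-square test that assumes unrelated cases and controls, an assumption that does not hold for an extended family, where family-based methods are required. The function names and the significance level of 0.05 are assumptions.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-marker significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Returns the statistic and its p-value. Assumes unrelated individuals.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    if 0 in (row_case, row_ctrl, col_a, col_b):
        raise ValueError("degenerate table: a row or column sums to zero")
    chi2 = (n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
            / (row_case * row_ctrl * col_a * col_b))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With 660,000 markers and a significance level of 0.05, `bonferroni_threshold(0.05, 660000)` is roughly 7.6e-8, so a marker would need a p-value below that to be declared significant under this convention.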

The Genome Wide Association Study (GWAS) analyses of Type 2 Diabetes patients in the Arab population were therefore only an initial step in elucidating susceptibility variants. The GWAS analysis (Chapter 6) identified variation at PRKD1 (Protein Kinase D1) on 14q11 as being associated with Type 2 Diabetes in an extended family of 319 members (see Figure 4) of Arab descent living in the United Arab Emirates.

The study in Chapter 7 identified loci that had previously been replicated in different Caucasian cohorts, such as the ADAM30, GALNTL4, JAZF1 and DGKG gene regions, which are associated with obesity in Type 2 Diabetes patients [18, 19]. Moreover, this study identified several loci that had not been detected earlier and are associated with Type 2 Diabetes. The most significant statistical evidence for association was found at the HPS3 gene for Body Mass Index and at FBXO31 for Waist Circumference.

Further investigation of the identified genes is needed to understand the mechanism by which they are associated with Type 2 Diabetes and obesity. Our findings call for further replication in other ethnic groups. As we uncover more variants, we will gain a better basic understanding of Type 2 Diabetes in Arab populations, which in turn will open doors to previously unimagined areas of clinically relevant research.

As Phase One of the Emirates Family Registry project draws to a close, the collaborations established with regional and international partners will see the project expand to other Gulf Cooperation Council countries. Conducting more genome-wide association studies in Arabs requires a joint effort among Arab institutions. Because the region is home to an assortment of ethnic groups, Phase Two of the Emirates Family Registry will cover a diverse array of populations (e.g. Arabs, Bedouins, Persians, Kurds, Lebanese, Palestinians and Turks). An understanding of the genetic diversity in the region will provide insight into the mechanisms that cause disease. These developments could lead to improved intervention and prevention programs and, in turn, a better quality of life throughout Arab nations.

REFERENCES



2. Dunn, D.S., et al., The distribution of major histocompatibility complex class I

polymorphic Alu insertions and their associations with HLA alleles in a Chinese

population from Malaysia. Tissue Antigens, 2007. 70(2): p. 136-43.

3. Dunn, D.S., et al., The association between HLA-A alleles and young Alu dimorphisms

near the HLA-J, -H, and -F genes in workshop cell lines and Japanese and Australian

populations. J Mol Evol, 2002. 55(6): p. 718-26.

4. Dunn, D.S., et al., Association of MHC dimorphic Alu insertions with HLA class I and

MIC genes in Japanese HLA-B48 haplotypes. Tissue Antigens, 2003. 62(3): p. 259-62.

5. Kulski, J.K. and D.S. Dunn, Polymorphic Alu insertions within the Major

Histocompatibility Complex class I genomic region: a brief review. Cytogenet Genome

Res, 2005. 110(1-4): p. 193-202.

6. Dunn, D.S., B.D. Tait, and J.K. Kulski, The distribution of polymorphic Alu insertions

within the MHC class I HLA-B7 and HLA-B57 haplotypes. Immunogenetics, 2005.

56(10): p. 765-8.

7. Yao, Y., et al., Polymorphic Alu insertions and their associations with MHC class I

alleles and haplotypes in Han and Jinuo populations in Yunnan Province, southwest of

China. J Genet Genomics, 2009. 36(1): p. 51-8.

8. Yao, Y., et al., The association between HLA-A, -B alleles and major

histocompatibility complex class I polymorphic Alu insertions in four populations in

China. Tissue Antigens, 2009. 73(6): p. 575-81.

9. Florez, J.C., et al., A 100K genome-wide association scan for diabetes and related

traits in the Framingham Heart Study: replication and integration with other genome-

wide datasets. Diabetes, 2007. 56(12): p. 3063-74.

10. Hayes, M.G., et al., Identification of type 2 diabetes genes in Mexican Americans

through genome-wide association studies. Diabetes, 2007. 56(12): p. 3033-44.

11. Rampersaud, E., et al., Identification of novel candidate genes for type 2 diabetes from

a genome-wide association scan in the Old Order Amish: evidence for replication from


56(12): p. 3053-62.



13. Saxena, R., et al., Genome-wide association analysis identifies loci for type 2 diabetes

and triglyceride levels. Science, 2007. 316(5829): p. 1331-6.



15. Takeuchi, F., et al., Confirmation of multiple risk Loci and genetic impacts by a

genome-wide association study of type 2 diabetes in the Japanese population.

Diabetes, 2009. 58(7): p. 1690-9.


susceptibility observed in genome-wide association data. Diabetes, 2009. 58(2): p.

505-10.

17. Yasuda, K., et al., Variants in KCNQ1 are associated with susceptibility to type 2

diabetes mellitus. Nat Genet, 2008. 40(9): p. 1092-7.



40(5): p. 638-45.

19. Nakajima, T., et al., TIP27: a novel repressor of the nuclear orphan receptor

TAK1/TR4. Nucleic Acids Res, 2004. 32(14): p. 4194-204.
