GENE-ASSOCIATED SNP DISCOVERY AND MOLECULAR CLONING
OF FULL-LENGTH cDNA OF CINNAMATE 4-HYDROXYLASE AND
CINNAMYL ALCOHOL DEHYDROGENASE IN A TROPICAL TIMBER
TREE Neolamarckia cadamba
Tchin Boon Ling
Master of Science
(Plant Biotechnology)
2013
Faculty of Resource Science and Technology
A thesis submitted in fulfilment of the requirements for the degree of Master of Science
Faculty of Resource Science and Technology
UNIVERSITI MALAYSIA SARAWAK
2013
ACKNOWLEDGEMENTS
First, I would like to express my sincere appreciation and deepest gratitude to my obliging
supervisor, Dr. Ho Wei Seng, Faculty of Resource Science and Technology, UNIMAS, who
gave me this golden opportunity to carry out my master's research in the Forest Genomics and
Informatics Laboratory (fGiL), and to thank him for the patience, valuable advice and
encouragement that enabled me to complete my study successfully.
I would also like to thank my co-supervisor, Dr. Pang Shek Ling, a researcher from
Sarawak Forestry Corporation (SFC), for her continuous guidance throughout the research and
for the many insights she gave me into research techniques and the problems that arise.
My sincere thanks also go to Assoc. Prof. Dr. Ismail Jusoh, Department of Plant
Science and Environmental Ecology, Faculty of Resource Science and Technology, UNIMAS,
who provided me with the necessary information and techniques for the wood properties study.
My sincere gratitude goes to our lab assistant, Miss Kamalia, and all labmates in the Forest
Genomics and Informatics Laboratory (fGiL), UNIMAS, especially Mr. Liew Kit Siong, for
his care, advice and support. Without his assistance, it would have been impossible to
complete my study within the stipulated period.
Finally, I would like to thank my family, friends and everyone else who directly or
indirectly contributed to my study, especially my lovely family members, who have always
cared for and supported me throughout my study at UNIMAS.
ABSTRACT
Neolamarckia cadamba, locally known as Kelampayan, is one of the indigenous tree
species selected for forest plantation establishment in Sarawak because of its high
productivity and short rotation time. Understanding the structure and
composition of Kelampayan wood through genome integration is vital to better utilize this
wood material. Concurrently, a Kelampayan wood formation genomic resource database,
known as Cadamomics (10,368 ESTs), has been developed, opening the gateway for
researchers to explore in depth the genomic basis of Kelampayan. An EST database is a useful
resource for gene discovery. Further analysis of this database yielded two full-length lignin
biosynthesis genes, namely C4H and CAD. Validation by RT-PCR with full-length gene-specific
primers confirmed the identities of the genes discovered. The full-length C4H
cDNA, designated NcC4H, is 1,651 bp long, with a 1,518 bp open reading frame encoding a
protein of 505 amino acids, an 18 bp 5’-UTR and a 115 bp 3’-UTR. NcC4H showed higher
identity with the class I C4Hs, which are preferentially involved in the phenylpropanoid
biosynthesis pathway. Meanwhile, sequence analysis of the full-length CAD cDNA,
designated NcCAD, showed that it is 1,240 bp long, with a 1,086 bp open reading frame
encoding a protein of 361 amino acids, a 68 bp 5’-UTR and an 86 bp 3’-UTR. Phylogenetic
analysis revealed that NcCAD grouped in the cluster containing both CAD and SAD genes,
both of which are involved in lignin biosynthesis. The full-length NcC4H and NcCAD
cDNAs identified serve as good candidate genes for association genetics studies in
Kelampayan. Association genetics is a powerful approach to detect potential genetic
variants, i.e. SNPs, underlying common and complex adaptive traits. Thus far, single
nucleotide polymorphisms (SNPs) detected in C4H and CAD genes are known to be
correlated with other phenotypic variations in forest tree species, and not with lignin
production only. Hence, attempts were made to discover SNPs in the partial C4H and CAD
genomic DNA sequences. Overlapping primers were designed to flank the partial C4H and
CAD DNA from 12 Kelampayan samples. The amplified DNA fragments were cloned and
sent for sequencing. Furthermore, wood cores were collected and the basic wood density was
measured for each tree. Sequence variation analysis revealed 60 and 32 SNPs
in the partial C4H and CAD DNA sequences, respectively. The SNPs detected were
distributed throughout the exon, intron, 5’-UTR and 3’-UTR regions. Among the SNPs
detected in the exon regions of C4H, 16 were synonymous mutations and eight were
nonsynonymous mutations. For CAD, six SNPs led to synonymous mutations and one SNP
led to a nonsynonymous mutation. Synonymous mutations (71%) were more common than
nonsynonymous mutations (29%) across the C4H and CAD genes combined. The association
genetics study also revealed that four and six SNPs from the C4H and CAD genes, respectively,
were significantly associated with the basic wood density of Kelampayan (p<0.05). Genetic
variations identified by the SNP markers, once validated, will facilitate the selection of
Kelampayan parental lines or seedlings with optimal quality through the gene-assisted
selection (GAS) approach.
Keywords: Neolamarckia cadamba, Cinnamate 4-hydroxylase, Cinnamyl alcohol
dehydrogenase, Single nucleotide polymorphism, Association genetics study.
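The lengths and percentages quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary layout and the function name `check` are hypothetical, and it assumes that each quoted open reading frame length includes the untranslated stop codon and that the 71 % and 29 % figures pool the exonic SNPs of both genes.

```python
# Figures quoted in the abstract; the record layout is an illustrative assumption.
cdna = {
    "NcC4H": {"total": 1651, "utr5": 18, "orf": 1518, "utr3": 115, "aa": 505},
    "NcCAD": {"total": 1240, "utr5": 68, "orf": 1086, "utr3": 86, "aa": 361},
}


def check(rec):
    """Return (parts_sum_ok, protein_length_ok) for one cDNA record."""
    parts_ok = rec["utr5"] + rec["orf"] + rec["utr3"] == rec["total"]
    # Assumption: the ORF length counts the stop codon, which is not translated.
    aa_ok = rec["orf"] % 3 == 0 and rec["orf"] // 3 - 1 == rec["aa"]
    return parts_ok, aa_ok


# Exonic SNP counts from the abstract, pooled over C4H and CAD (assumption).
synonymous = 16 + 6
nonsynonymous = 8 + 1
total = synonymous + nonsynonymous
pct_synonymous = round(100 * synonymous / total)
pct_nonsynonymous = round(100 * nonsynonymous / total)

for name, rec in cdna.items():
    print(name, check(rec))
print(pct_synonymous, pct_nonsynonymous)
```

Under these assumptions both cDNA records are internally consistent, and the pooled counts reproduce the quoted percentages.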
GENE-ASSOCIATED SNP DISCOVERY AND CLONING OF FULL-LENGTH CINNAMATE 4-
HYDROXYLASE AND CINNAMYL ALCOHOL DEHYDROGENASE cDNA
FROM THE TROPICAL TREE Neolamarckia cadamba
ABSTRAK
Neolamarckia cadamba (local name: Kelampayan) is an indigenous tree species selected
for forest plantation establishment in Sarawak because of its high productivity and
short rotation time. Understanding the structure and composition of Kelampayan wood
through genome integration is important. Concurrently, a genomic resource database for
Kelampayan wood formation (Cadamomics, 10,368 ESTs) has been developed, which
enables researchers to explore in depth the genomic basis of
Kelampayan. An EST database is a very useful resource for gene discovery.
Further analysis of this database yielded two full-length lignin biosynthesis genes, namely
C4H and CAD. Validation by RT-PCR with gene-specific primers confirmed the
identities of the genes discovered. The full-length C4H cDNA, or NcC4H, is 1,651 bp long, with
a 1,518 bp open reading frame encoding 505 amino acids, an 18 bp 5'-UTR and a
115 bp 3'-UTR. NcC4H is categorized among the class I C4Hs, which are mainly involved in
the phenylpropanoid biosynthesis process. Meanwhile, analysis of the full-length CAD cDNA, or
NcCAD, showed that it is 1,240 bp long, with a 1,086 bp open reading frame
encoding 361 amino acids, a 68 bp 5'-UTR and an 86 bp 3'-UTR. Phylogenetic
analysis showed that NcCAD falls within the cluster containing
both CAD and SAD genes, both of which are involved in lignin
biosynthesis. The NcC4H and NcCAD cDNAs can be used as good candidate genes for
association genetics studies in Kelampayan. Association genetics is a
powerful approach to detect genetic variants, for example single nucleotide
polymorphisms (SNPs), that potentially underlie the common and complex adaptive traits of
Kelampayan. Thus far, SNPs detected in the C4H and CAD genes are known to be
correlated with several other phenotypic variations in forest tree species, and not with
lignin production only. Therefore, attempts were made to find SNPs in the partial
C4H and CAD DNA sequences. Overlapping primers were designed to amplify
C4H and CAD DNA from 12 Kelampayan samples. The amplified DNA fragments
were cloned and sent for sequencing. In addition, wood cores were
collected and the basic wood density was measured for each tree. Sequence variation analysis
detected 60 and 32 SNPs in the partial C4H and CAD DNA, respectively. The SNPs detected
were located in the exon, intron, 5'-UTR and 3'-UTR regions. Among the SNPs detected in the
exon regions of C4H, 16 were synonymous mutations and 8 were nonsynonymous mutations. For CAD, 6
SNPs led to synonymous mutations and one SNP led to a nonsynonymous
mutation in the translated amino acid sequence. Synonymous mutations (71%)
occurred more frequently than nonsynonymous mutations (29%) for both the C4H and
CAD genes. The association genetics study also showed that 4 and 6 SNPs from the C4H and
CAD genes, respectively, were significantly associated with the wood density of
Kelampayan (p<0.05). The genetic variations identified by the SNP markers, once
validated, will facilitate the selection of Kelampayan parental lines or seedlings of optimal
quality through the gene-assisted selection (GAS) method.
Keywords: Neolamarckia cadamba, Cinnamate 4-hydroxylase, Cinnamyl alcohol
dehydrogenase, Single nucleotide polymorphism (SNP), Association genetics study
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER I INTRODUCTION
CHAPTER II LITERATURE REVIEW
2.1 Neolamarckia cadamba (Roxb.) Bosser
2.1.1 Anatomy Characteristics
2.1.2 Scientific Classification
2.1.3 Potential Uses of Kelampayan
2.1.4 Pharmacological Values of Kelampayan
2.1.5 Genomics Study of Kelampayan
2.2 Wood
2.2.1 Wood Density
2.2.2 Wood Formation
2.2.3 Plant Cell Wall
2.2.4 Cellulose
2.2.5 Hemicelluloses
2.2.6 Lignin
2.2.7 Lignin Biosynthesis Pathway
2.2.7.1 Cinnamate 4-Hydroxylase (C4H) Gene
2.2.7.2 Cinnamyl Alcohol Dehydrogenase (CAD) Gene
2.3 Molecular Markers for Forest Tree Genomics Research
2.3.1 Single Nucleotide Polymorphisms (SNPs)
2.3.1.1 Advantages and Disadvantages of SNPs
2.3.1.2 SNP Markers Development
2.3.1.3 Applications of SNPs in Genome Analysis
2.4 Association Genetics Study
CHAPTER III MATERIALS AND METHODS
3.1 C4H and CAD EST Data Analysis
3.2 Primer Design for Full-length C4H and CAD cDNA
3.3 Total RNA Isolation
3.3.1 Plant Materials
3.3.2 Total RNA Isolation Protocol
3.4 Reverse Transcription of Total RNA
3.5 Cloning and Sequencing of Full-length C4H and CAD cDNA
3.5.1 Rapid Amplification of Full-length C4H and CAD cDNA
3.5.2 Purification of PCR Amplicons from Agarose Gel
3.5.3 cDNA Ligation
3.5.4 Transformation
3.5.5 Blue/White Colony Screening
3.5.6 Plasmids Isolation
3.5.7 Confirmation for Desired Insert Through Restriction Enzyme Digestion
3.5.8 Sequencing
3.6 In Silico Analysis of Full-length C4H and CAD cDNA
3.7 Sequence Variations of C4H and CAD Genes
3.7.1 Plant Materials
3.7.2 DNA Extraction
3.7.3 DNA Purification
3.7.4 Overlapping Primer Design
3.7.5 PCR Amplification
3.7.6 Cloning of PCR Amplicons
3.7.7 Sequencing
3.7.8 Sequence Variation Analysis
3.8 Wood Properties
3.8.1 Wood Cores Collection
3.8.2 Wood Density Measurement
3.9 Statistical Analysis
3.9.1 Nucleotide Diversity Analysis
3.9.2 Association Analysis
3.9.3 In Silico Development of CAPs from SNP Genotypes
CHAPTER IV RESULTS AND DISCUSSION
4.1 C4H and CAD EST Data Analysis
4.2 Total DNA and RNA Extraction
4.2.1 Total RNA Extraction
4.2.2 Total DNA Extraction
4.2.2.1 DNA Qualification and Quantification
4.3 Full-length C4H and CAD cDNA Discovery and Analysis
4.3.1 Cinnamate 4-hydroxylase (C4H)
4.3.1.1 NcC4H cDNA Sequence
4.3.1.2 C4H Genomic DNA Sequence
4.3.1.3 In Silico Analysis of NcC4H cDNA Sequence
4.3.2 Cinnamyl alcohol dehydrogenase (CAD)
4.3.2.1 Full-Length NcCAD cDNA Sequence
4.3.2.2 CAD Genomic DNA Sequence
4.3.2.3 In Silico Analysis of NcCAD cDNA Sequence
4.4 Sequence Variations of C4H and CAD DNA Sequences
4.4.1 Cloning of C4H and CAD DNA Fragments
4.4.2 Single Nucleotide Polymorphisms (SNPs) Analysis
4.4.3 Nucleotide Diversity Analysis
4.4.4 Phylogenetic Relationship Study Using SNP Markers
4.4.5 Linkage Disequilibrium (LD)
4.5 Basic Wood Density
4.6 Association Genetics Study
4.7 In Silico Restriction Enzymes Analysis
CHAPTER V CONCLUSIONS
REFERENCES
APPENDICES
Appendix A Repetitive sequencing results for selected samples
Appendix B Sequence alignments showing the positions of start and stop codons
Appendix C Alignment of three full-length C4H and CAD clones together with their respective hypothetical sequence
Appendix D Gel purified PCR products of C4H and CAD genes after analysis on 1.5% agarose gel. In total, four C4H regions and three CAD regions were amplified and purified from 12 Kelampayan samples
Appendix E SNPs detection in C4H DNA sequences
Appendix F SNPs detection in CAD DNA sequences
Appendix G Synonymous and nonsynonymous mutations in C4H and CAD amino acid sequences
Appendix H Shared allele distance calculated using PowerMarker software
Appendix I Neighbour joining trees constructed using shared allele distance
Appendix J LD analysis of SNPs in CAD and C4H genes using TASSEL v.3 software
Appendix K LD analysis of SNPs in C4H and CAD genes using DnaSP5 software
LIST OF TABLES
Table 2.1 The pros and cons of low-lignin wood.
Table 2.2 Various molecular markers and their characteristics.
Table 2.3 Applications and features of various molecular markers.
Table 3.1 Ligation reaction mixture and volume.
Table 3.2 Restriction digestion reaction mixture and volume.
Table 3.3 DBH and GPS reading for 12 Kelampayan trees collected.
Table 3.4 PCR profile for each C4H and CAD primer set.
Table 4.1 Blasting result of the hypothetical C4H amino acid sequence.
Table 4.2 Blasting result of the hypothetical CAD amino acid sequence.
Table 4.3 The concentration of total RNA extracted from developing xylem tissue of Kelampayan.
Table 4.4 The concentration of total DNA extracted from inner bark tissue of Kelampayan.
Table 4.5 Full-length forward and reverse primers designed for C4H and CAD genes.
Table 4.6 The BLASTn output for full-length NcC4H cDNA sequence discovered from Kelampayan.
Table 4.7 Intron and gene lengths of known C4H genomic DNA sequences. The gene length is calculated from start codon to stop codon.
Table 4.8 Comparison of NcC4H protein structure against structures in PDB by using Dali server.
Table 4.9 The BLASTn output for full-length NcCAD cDNA sequence discovered from Kelampayan.
Table 4.10 Intron-exon structures of CAD genes from Populus.
Table 4.11 Comparison of NcCAD protein structure against structures in PDB by using Dali server.
Table 4.12 Primers designed to flank the C4H and CAD genomic DNA sequences.
Table 4.13 SNPs detected within partial C4H and CAD DNA sequences.
Table 4.14 Synonymous and nonsynonymous mutations in C4H and CAD genes.
Table 4.15 The distribution of SNPs in C4H and CAD DNA sequences and the resulting amino acid substitution.
Table 4.16 Haplotype (Hd) diversity, nucleotide diversity (θ and π) and neutrality test statistics (D) in C4H and CAD candidate genes.
Table 4.17 Mean nucleotide diversity (θW and π) in different nucleotide sites or gene regions for C4H and CAD candidate genes.
Table 4.18 The number of significant pairwise LD calculated using Chi-square Test in DnaSP5 software.
Table 4.19 Estimate of the recombination parameter in the history of the C4H and CAD loci.
Table 4.20 Basic wood density measured for 12 Kelampayan samples.
Table 4.21 Association test for SNPs from C4H genes with basic wood density.
Table 4.22 Association test for SNPs from CAD genes with basic wood density.
Table 4.23 Restriction enzymes identified for specific cutting on C4H and CAD SNP regions.
LIST OF FIGURES
Figure 2.1 (a) Kelampayan tree; (b) Kelampayan seedlings; (c) Kelampayan flowers (Source: http://www.flickr.com/photos/ravi_gogte/3821932711/); and (d) Kelampayan fruits (Krisnawati et al., 2011).
Figure 2.2 Structure of wood.
Figure 2.3 Structure of plant cell wall.
Figure 2.4 Three types of monolignols.
Figure 2.5 Phenylpropanoid pathway leading to monolignol precursors of lignin.
Figure 2.6 Schematic diagram of the pivotal role of C4H as a functional link between the cytosolic enzymes of general phenylpropanoid metabolism, PAL and 4CL, and the membrane-associated electron-transfer reactions catalyzed by CPR.
Figure 2.7 Schematic representation of LD among genetic markers, genes and a causal mutation.
Figure 2.8 Steps involved in conducting an association genetics study.
Figure 2.9 Genotyping strategies for candidate gene and genome-wide association study (GWAS).
Figure 3.1 Grouping of the C4H EST singletons according to the alignment score and position on gene.
Figure 3.2 Grouping of the CAD EST singletons according to the alignment score and position on gene.
Figure 3.3 Collection of Kelampayan developing xylem tissue.
Figure 3.4 Collection of Kelampayan inner bark tissues.
Figure 3.5 The map for Kelampayan samples collected.
Figure 3.6 The locations of overlapping primers designed for C4H and CAD DNA sequences.
Figure 3.7 Collection of Kelampayan wood cores.
Figure 4.1 The hypothetical full-length C4H predicted through contig mapping approach.
Figure 4.2 The hypothetical full-length CAD predicted through contig mapping approach.
Figure 4.3 The hypothetical full-length C4H cDNA sequences.
Figure 4.4 The hypothetical full-length CAD cDNA sequences.
Figure 4.5 Total RNA isolated from developing xylem tissue of Kelampayan.
Figure 4.6 Total DNA isolated from 12 inner bark tissues of Kelampayan.
Figure 4.7 Gel purified PCR products for full-length CAD and C4H cDNA.
Figure 4.8 Plasmids isolated for full-length CAD and C4H genes.
Figure 4.9 Restriction digestions on CAD and C4H plasmids by using EcoRI restriction enzyme.
Figure 4.10 The full-length NcC4H cDNA sequence discovered from Kelampayan. The start (ATG) and stop (TAG) codons were bolded and underlined.
Figure 4.11 The NcC4H amino acid sequence translated for Kelampayan.
Figure 4.12 The C4H genomic DNA sequence discovered from Kelampayan.
Figure 4.13 The graphical presentation of full-length NcC4H cDNA and partial C4H genomic DNA sequence.
Figure 4.14 Multiple alignment of NcC4H protein sequence with C4H protein sequences from other species.
Figure 4.15 Classification of C4H genes from different plant species.
Figure 4.16 Phylogenetic tree constructed for NcC4H gene from Kelampayan by using MEGA5 software.
Figure 4.17 The secondary structure of NcC4H protein predicted by using CDM software.
Figure 4.18 Tertiary structure of NcC4H protein predicted by using Phyre2.
Figure 4.19 The NcCAD cDNA sequence discovered from Kelampayan. The start (ATG) and stop (TAG) codons were bolded and underlined.
Figure 4.20 The NcCAD amino acid sequence translated for Kelampayan.
Figure 4.21 The CAD genomic DNA sequences discovered from Kelampayan. The start (ATG) and stop (TAG) codons were bolded and underlined.
Figure 4.22 The graphical presentation of full-length NcCAD cDNA and partial CAD genomic DNA sequence.
Figure 4.23 The motif domains detected within NcCAD amino acid sequence.
Figure 4.24 Phylogenetic tree constructed for NcCAD gene from Kelampayan by using MEGA5 software.
Figure 4.25 Phylogenetic tree showing the classification of CADs in CAD gene family.
Figure 4.26 The secondary structure of NcCAD protein predicted by using CDM software.
Figure 4.27 Tertiary structure of NcCAD protein modelled by using Phyre2 (colour by secondary structure).
Figure 4.28 Colony PCR product for CAD3 region after analysis on 1.5% agarose gel.
Figure 4.29 Plasmid isolated from C4H1 region. Lane M1: Supercoiled DNA ladder (Promega, USA); Lane M2: λ HindIII DNA marker (Promega, USA).
Figure 4.30 Plasmid isolated from C4H2 region. Lane M1: Supercoiled DNA ladder (Invitrogen, USA); Lane M2: λ HindIII DNA marker (Promega, USA).
Figure 4.31 Plasmid isolated from C4H3 region. Lane M1: Supercoiled DNA ladder (Promega, USA); Lane M2: λ HindIII DNA marker (Promega, USA).
Figure 4.32 Plasmid isolated from C4H4 region. Lane M1: Supercoiled DNA ladder (Promega, USA); Lane M2: λ HindIII DNA marker (Promega, USA).
Figure 4.33 Plasmid isolated from CAD1 region. Lane M1: Supercoiled DNA ladder (Promega, USA); Lane M2: λ HindIII DNA marker (Promega, USA).
Figure 4.34 Plasmid isolated from CAD3 region. Lane M1: Supercoiled DNA ladder (Promega, USA); Lane M2: λ HindIII DNA marker (Promega, USA).
Figure 4.35 Plasmid isolated from CAD4 region. Lane M1: Supercoiled DNA ladder (Promega, USA); Lane M2: λ HindIII DNA marker (Promega, USA).
Figure 4.36 Different nucleotide substitutions observed in C4H and CAD DNA sequences.
Figure 4.37 Two-nucleotide (A/C) and three-nucleotide (A/C/T) SNPs detected in C4H3 region.
Figure 4.38 The deletion of the G nucleotide at position 894 of the C4H3_NcMT2 fragment changed the amino acid sequence starting from the mutation site.
Figure 4.39 Neighbour joining tree constructed using shared allele distance calculated from the combination of CAD and C4H SNP data.
Figure 4.40 The TASSEL generated triangle plot for pairwise LD between SNP marker sites in C4H and CAD gene fragments (above the diagonal displays r² values and below the diagonal displays the corresponding p-values).
Figure 4.41 LD plot for all paired polymorphic sites in C4H and CAD genes and fitted with a linear and logarithmic trend line.
LIST OF ABBREVIATIONS
AFLP Amplified fragment length polymorphism
C4H Cinnamate 4-hydroxylase
CAD Cinnamyl alcohol dehydrogenase
CAPS Cleaved amplified polymorphic sequence
CCR Cinnamoyl-CoA reductase
DBH Diameter at breast height
EST Expressed sequence tag
GAS Gene-assisted selection
GLM General linear model
GWAS Genome-wide association study
Indel Insertion and deletion mutation
LD Linkage disequilibrium
LKCT Lesser-known commercial timbers
MAS Marker-assisted selection
PCR Polymerase chain reaction
QTL Quantitative trait loci
QTN Quantitative trait nucleotide
RAPD Random amplified polymorphic DNA
RFLP Restriction fragment length polymorphism
SAD Sinapyl alcohol dehydrogenase
SNP Single nucleotide polymorphism
SSR Simple sequence repeat
UTR Untranslated region
CHAPTER I
INTRODUCTION
Wood is undoubtedly the most versatile raw material available to humans for construction,
paper, fuel and non-wood forest products. The wood supply from natural forests is
insufficient to meet the ever-increasing demand for wood as the wood-using population
continues to grow (FAO, 2010). Rapid deforestation and declining timber production
have made the development of planted forests imperative, with emphasis on fast-growing
indigenous tree species. The economic advantages of planted forests with genetically superior
trees have therefore gained great attention worldwide (Sedjo, 1999). Hence, the future of
planted forests, as well as of the forest products industry, will rely upon the ability to domesticate
indigenous tree species and adapt or alter them for maximum economic yield in highly
controlled environments (Plomion et al., 2001).
Neolamarckia cadamba, locally known as Kelampayan, is one of the indigenous tree
species selected for forest plantation establishment in Sarawak because of its high
productivity and short rotation time (Sarawak Timber Industry Development
Corporation, 2009). Kelampayan is one of the lesser-known commercial timbers (LKCT)
and has various uses in the wood-based industry, such as picture frames, moulding,
skirting, wooden sandals, disposable chopsticks, general utility furniture, veneer and plywood,
as well as pulp and paper making (Lim et al., 2005). Moreover, the leaves and bark of
Kelampayan have been reported to have high pharmacological value (Joker, 2000; Patel and
Kumar, 2008). Root extracts of Kelampayan can also reduce blood glucose
concentration, suggesting the utility of Kelampayan extracts in the treatment of
diabetes (Acharyya et al., 2010).
Planting stock of Kelampayan is very important in ensuring an adequate supply of
timber to meet the high market demand. Various efforts have been made to
scale up production while maintaining high wood quality. To date, molecular-level studies
on Kelampayan are still limited. However, a Kelampayan wood formation
genomic resource database, known as Cadamomics (10,368 ESTs), has been developed by
researchers from the Forest Genomics and Informatics Laboratory, UNIMAS
(http://fgilab.com/) and Sarawak Forestry Corporation. This database has yielded an array of
useful information and resources for researchers to explore in depth the genomic basis of
Kelampayan.
Understanding the molecular and physiological mechanisms of wood formation is
now considered a main research area that demands serious attention from diverse
disciplines, spanning conventional biochemistry and wood science to genomics. To
date, the study of adaptive traits, especially wood formation, has gained a lot of attention in
numerous forest tree species, such as poplar (Sterky et al., 1998; Jansson and Douglas, 2007),
Eucalyptus (Rengel et al., 2009), Acacia hybrid (Yong et al., 2011), and loblolly pine
(Whetten et al., 2001). Therefore, understanding the structure and composition of
Kelampayan wood through genome integration is vital to better utilize this wood material.
Wood is a complex material composed of the polymers cellulose, hemicelluloses and
lignin, which are physically and chemically bonded together. Lignin, the second most abundant
organic compound after cellulose, represents approximately 20-30% of plant biomass. It is
mainly found in supporting and conducting tissues of plants, such as fibres and tracheary
elements. Lignin is formed through dehydrogenative polymerization of the monolignols
p-coumaryl alcohol, coniferyl alcohol and sinapyl alcohol, which give rise to p-hydroxyphenyl
(H), guaiacyl (G) and syringyl (S) units, respectively (Brett and Waldron, 1990). Because
lignin is mechanically rigid and is deposited in the cell wall, it provides
mechanical and structural support to plants and allows water to be transported
more smoothly through the tracheids and vessels. Moreover, lignin is very resistant to
degradation in nature and thus provides a significant protective function against pathogens
and decay fungi (Brett and Waldron, 1990; Higuchi, 1997).
In the pulp and paper industry, lignin has to be separated from cellulose and
hemicelluloses by an expensive and polluting process (Sederoff, 1999). In this regard, the
study of lignin biosynthesis genes such as CAD, which encodes cinnamyl alcohol
dehydrogenase, and C4H, which encodes cinnamate 4-hydroxylase, has opened new
possibilities for the pulp and paper industry, as up- or down-regulation of these genes
results in altered lignin production (Baucher et al., 2003). The main function of C4H is
to catalyze the hydroxylation of cinnamate to 4-coumarate at an early stage of the lignin
biosynthesis pathway, while CAD catalyzes the reduction of cinnamaldehydes to p-coumaryl,
coniferyl and sinapyl alcohols during the final stage of the pathway (Lewis,
1999).
To date, a considerable number of full-length C4H and CAD gene sequences have been
published in NCBI, but no such information is available for Kelampayan. Moreover, C4H
and CAD genes are known to be correlated with phenotypic variations in forest
tree species other than lignin production alone (Yu et al., 2006; Gonzalez-Martinez et al.,
2007; Tchin et al., 2011; Schilmiller et al., 2009; Bjurhager et al., 2010; Wegrzyn et al.,
2010). Abreu et al. (2009) also proposed that the high proportion of β-O-4 (alkyl aryl ether)
bonds in the lignin of angiosperms may affect wood properties. Therefore, extensive study,
especially at the molecular level, is needed to determine the correlation between the C4H and
CAD genes and Kelampayan wood properties.
Phenotypic variation among individuals may be due to the cumulative effect of a number
of genes and/or environmental influences (Bentz et al., 2011; Flatscher et al., 2012).
However, the basic genetic architecture of such complex adaptive traits is very difficult to
uncover through traditional linkage-based approaches. Moreover, the traditional linkage-based
approach to choosing planting materials with desired traits is laborious, time-consuming and
expensive because a mapping population must be established (Myles et al., 2009).
With advances in science and technology, a promising approach to studying complex adaptive
traits is to investigate the sequence variations (single nucleotide polymorphisms, SNPs) of
candidate genes at the nucleotide level, and the correlation of such variations with the
phenotypes of the trees (Neale and Kremer, 2011). This approach, namely association genetics,
has been broadly embraced in forest tree studies and is more powerful in
identifying the genes or loci that contribute to variation in complex traits (Long and
Langley, 1999). An association genetics study is a natural, uncontrolled experiment that can
give a higher mapping resolution than linkage mapping (Myles et al., 2009). In the
space of just a few years, association genetics has been widely applied in forest tree
species such as Pinus, Pseudotsuga, Populus and Eucalyptus (Neale and Kremer, 2011).
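The idea of correlating a sequence variant with a phenotype can be illustrated with a minimal single-marker test. The sketch below uses an exact permutation test on the difference in mean trait value between two genotype classes; this is a swapped-in, assumption-light technique chosen for illustration and is not presented as the statistical procedure used in this study. The function name, genotype labels and density values are all hypothetical toy data.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(genotypes, trait):
    """Exact two-sided permutation p-value for a difference in group means."""
    classes = sorted(set(genotypes))
    if len(classes) != 2:
        raise ValueError("this sketch handles two genotype classes only")
    first = [i for i, g in enumerate(genotypes) if g == classes[0]]
    n, k = len(trait), len(first)

    def abs_diff(chosen):
        # Absolute difference in mean trait between the chosen trees and the rest.
        chosen = set(chosen)
        inside = [trait[i] for i in range(n) if i in chosen]
        outside = [trait[i] for i in range(n) if i not in chosen]
        return abs(mean(inside) - mean(outside))

    observed = abs_diff(first)
    # Enumerate every way of assigning k of the n trees to the first class.
    diffs = [abs_diff(c) for c in combinations(range(n), k)]
    extreme = sum(d >= observed - 1e-12 for d in diffs)
    return extreme / len(diffs)


# Hypothetical toy data for 12 trees: genotype class at one SNP and
# basic wood density (arbitrary illustrative values, not measured data).
genotypes = ["AA"] * 6 + ["AG"] * 6
density = [0.30, 0.31, 0.32, 0.33, 0.34, 0.35,
           0.40, 0.41, 0.42, 0.43, 0.44, 0.45]

p = permutation_p_value(genotypes, density)
print(p, p < 0.05)
```

With only a dozen trees the full set of relabellings is small enough to enumerate, which avoids any distributional assumption; for larger samples a parametric model or random permutations would be the more practical choice.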
A single nucleotide polymorphism (SNP), where sequences differ at only a single
nucleotide, has become the marker of choice for association genetics studies owing to its
abundance, stability, ubiquity and interspersed distribution in the nuclear genome (Fusari et al., 2008).
SNPs are less polymorphic than other genetic markers, for example simple sequence repeats
(SSRs), but still provide information on the genetic constitution of an organism
(Rafalski, 2002a). SNPs are also far easier to detect because only a single base in a specific
sequence is targeted (Prince et al., 2001). Hayashi et al. (2004) also argued that SNP
markers are preferable to RFLP markers because of their efficiency and cost-effectiveness.
In addition, SNP markers are suitable for germplasm selection at the early
seedling stage because the amount of genomic DNA required for detection is relatively low
(Hayashi et al., 2004).
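To make the definition concrete, the following minimal sketch scans a set of already aligned sequences column by column and reports positions where more than one base occurs. The sequences, sample names and the function name `find_snps` are hypothetical and serve only as an illustration; the sketch assumes a gap-padded alignment of equal-length sequences and ignores gaps and ambiguity codes.

```python
def find_snps(aligned):
    """Return [(position, alleles)] for polymorphic columns (1-based positions)."""
    sequences = list(aligned.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to equal length")
    snps = []
    for pos in range(length):
        column = {seq[pos].upper() for seq in sequences}
        bases = column & set("ACGT")  # ignore gaps and ambiguous codes
        if len(bases) > 1:
            snps.append((pos + 1, sorted(bases)))
    return snps


# Hypothetical aligned fragments from four trees (toy data).
aligned = {
    "tree1": "ATGGCTCTTCTA",
    "tree2": "ATGGCCCTTCTA",
    "tree3": "ATGGCTCTTATA",
    "tree4": "ATGGCACTT-TA",
}

snps = find_snps(aligned)
print(snps)
```

In this toy alignment one site carries three alleles and another carries two, which shows that a SNP site need not be strictly biallelic.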
Application of genomic science in Kelampayan breeding programmes will improve our
understanding of the species' unique biology and accelerate the discovery of genes controlling
economically and ecologically important traits through candidate gene-based association
genetics studies. Gene-assisted selective breeding in the forest industry using SNP
markers is expected to increase selection efficiency and reduce the time and cost
associated with measuring traits (Fusari et al., 2008). However, the prerequisite for
determining nucleotide polymorphism in candidate genes is knowledge of the gene
sequences. Many partial or full-length gene sequences are now accessible through online