Proteome bioinformatics and genetics for associating proteins with grain phenotype Rudi Appels, Centre for Comparative Genomics, Murdoch University, Australia Paula Moolhuijzen, Brett Chapman, Wujun Ma, Dean Diepeveen, Matthew Bellgard, Centre for Comparative Genomics, Murdoch University and Department of Food and Agriculture WA, Australia. Yueming Yan, Shunli Wang, Capital Normal University, Beijing Angela Juhasz, Agricultural Institute, Martonvásár, Hungary Frank Bekes, FBFD Pty Ltd, Beecroft, Sydney, Australia 2119 CENTRE FOR COMPARATIVE GENOMICS

Proteome bioinformatics and genetics for associating proteins with grain phenotype


International Gluten Workshop, 11th; Beijing (China); 12-15 Aug 2012


Page 1: Proteome bioinformatics and genetics for associating proteins with grain phenotype

Proteome bioinformatics and genetics for associating proteins with grain phenotype

Rudi Appels, Centre for Comparative Genomics, Murdoch University, Australia

Paula Moolhuijzen, Brett Chapman, Wujun Ma, Dean Diepeveen, Matthew Bellgard, Centre for Comparative Genomics, Murdoch University and Department of Food and Agriculture WA, Australia

Yueming Yan, Shunli Wang, Capital Normal University, Beijing

Angela Juhasz, Agricultural Institute, Martonvásár, Hungary

Frank Bekes, FBFD Pty Ltd, Beecroft, Sydney, Australia 2119

CENTRE FOR COMPARATIVE GENOMICS

Page 2: Proteome bioinformatics and genetics for associating proteins with grain phenotype

Centre for Comparative Genomics (CCG) at Murdoch University

Supercomputer

• Stage 1A Pawsey Centre (SKA)
• Ranked 87 in the world
• 9600 cores


Page 4: Proteome bioinformatics and genetics for associating proteins with grain phenotype


• Genome sequencing and high resolution genetic maps of wheat

• Integrating new wheat protein level analyses

• Translating research findings to industry – the Decision Matrix

Page 5: Proteome bioinformatics and genetics for associating proteins with grain phenotype

• The integration of new efforts to obtain reference sequences for the bread wheat and barley genomes is accelerating gene discovery.

• Locations of traits and proteins on DNA sequence assemblies, via genetic maps, define gene networks.

• The genomic resources are refining molecular marker development and mapping strategies for combining yield with quality attributes of the grain that meet market requirements.


Page 7: Proteome bioinformatics and genetics for associating proteins with grain phenotype

Locations of proteins within a genetic map can be determined.

One of the first examples was published by Amiour et al (2003), who used 2D gels to identify the chromosomal locations of amphiphilic proteins from wheat grains.

Later, Chen et al (2007) carried out mapping using MALDI-TOF-defined peaks of gliadin.

Progress in the DNA sequencing of the transcribed genes of wheat now allows higher resolution maps to be established.

Amiour N, et al (2003) Theor. Appl. Genet. 108: 62–72.

Chen J, et al (2007) Rapid Comm Mass Spectrometry 21: 2913–2917

Page 8: Proteome bioinformatics and genetics for associating proteins with grain phenotype

2007 – 2012: Suites of genomic resources and knowledge have been established to provide the foundation for sequencing the wheat and barley genomes

• International Wheat Genome Sequencing Consortium (www.wheatgenome.org)

• UK WISP consortium (www.wheatisp.org)

• International Barley Sequencing Consortium (www.barleygenome.org)

• European TriticeaeGenome FP7 project (www.triticeaegenome.eu)

The initiatives built on long standing resources such as:

• KOMUGI in Japan (www.shigen.nig.ac.jp/wheat/komugi/)

• Graingenes in the USA (wheat.pw.usda.gov/GG2/index.shtml)

• Extensive EST collections (ITEC http://avena.pw.usda.gov/genome/)

Page 9: Proteome bioinformatics and genetics for associating proteins with grain phenotype

• Reducing the complexity of the wheat genome through flow sorting of chromosome arms has formed the basis for the international effort to produce a reference sequence for the variety Chinese Spring

• All the chromosome arms now have a completed survey sequence analysis. This provides a pool of DNA contigs that can be used to anchor gene sequences and proteins to chromosome arms

Page 10: Proteome bioinformatics and genetics for associating proteins with grain phenotype

Array technologies for assaying single nucleotide polymorphisms (SNPs) are now establishing genetic maps with 2000-3000 molecular markers.

Figure: map for chromosomes 1A, 1B and 1D from an Avalon x Cadenza cross.

Allen AM, Barker GLA, Berry ST, Coghill JA, Gwilliam R, Kirby S, Robinson P, Brenchley RC, D’Amore R, McKenzie N, Waite D, Hall A, Bevan M, Hall N, Edwards KJ (2011) Transcript-specific, single-nucleotide polymorphism discovery and linkage analysis in hexaploid bread wheat (Triticum aestivum L.). Plant Biotechnology Journal 2011: 1–14

Page 11: Proteome bioinformatics and genetics for associating proteins with grain phenotype

The 9000 SNP array (“chip”) technology for assaying SNPs has been used to establish a 2000 molecular marker map for a set of 225 doubled haploid lines from a Westonia x Kauz cross.

A large study in Australia is examining progeny from a complex cross (MAGIC). This is currently a 4-way cross using Baxter, Yitpi, Westonia and Chara (1500 lines, with markers from a 9K SNP chip and markers from a 90K chip planned). This work is at CSIRO with Colin Cavanagh.

For an 8-way cross using Baxter, Yitpi, Westonia, AC Barrie (Canada), Alsen (US), Pastor (CIMMYT), Xiaoyan 54 (China) and Volcani (Israel), 5000 lines are being characterized.

Figure: Chromosome 7A

Page 12: Proteome bioinformatics and genetics for associating proteins with grain phenotype

In a large population of 5,000 lines (as required for accurate mapping) it is not feasible to phenotype all progeny.

The marker information can be used to define families of progeny for phenotyping.

For the 1500 lines from the 4-way MAGIC cross, a population of 370 families has been defined for phenotyping (in duplicated/randomized designs). While we are still in the middle of this analysis (which includes milling yield), some QTL for % wet gluten at the LMW-glutenin locus of chromosome 1B are evident.

It is interesting that in the high resolution maps the QTL may not be exactly superimposed on the LMW-glutenin locus.

Page 13: Proteome bioinformatics and genetics for associating proteins with grain phenotype

GluStar system for “wet gluten” measurements on 4.5 g flour

• MAGIC and assignment of a QTL for % wet gluten to 1B near the LMW glutenin locus but not coincident with it

• The high density of markers allows a fine resolution of map location when 1,500 progeny are analyzed

Tomoshozi S, Budapest University of Technology and Economics; http://www.labintern.hu

Page 14: Proteome bioinformatics and genetics for associating proteins with grain phenotype

Low molecular weight glutenins

To determine protein fingerprints as a “phenotype” we have explored MALDI-TOF as a means for increasing the number of lines we can analyse.

Li et al (2010). BMC Plant Biology 10:124


Page 16: Proteome bioinformatics and genetics for associating proteins with grain phenotype

High molecular weight glutenins (70,000–90,000 Da)

Gao L et al (2010). J Ag Food Chem 58: 2777–2786

Li et al (2009). Cereal Sci. 50: 295-301

Page 17: Proteome bioinformatics and genetics for associating proteins with grain phenotype

HMW-GS    Mr (Da) deduced from coding gene    Mr (Da) by MALDI-TOF
1Ax2*     86309                               86200
1Bx6      Unknown                             86500
1Bx7      82524                               82300
1Bx7OE    83134                               82900
1Bx7b*    Unknown                             82600
1Bx13     Unknown                             83000
1Bx14     84012                               83600
1Bx17     78607                               77900, 78400
1Bx20     Unknown                             82100
1Dx2      87022                               87000
1Dx3      Unknown                             85400
1Dx5      88128                               87900
1By8      75156                               74900
1By8a*    Unknown                             74800
1By8b*    Unknown                             75000
1By9      73515                               73300
1By15     75733                               74900
1By16     Unknown                             76900
1By18     Unknown                             75000
1By20     Unknown                             74900
1Dy10     67473                               67300
1Dy12     68652                               68300

Li et al (2009) Cereal Sci. 50: 295-301
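The agreement between the two mass columns can be expressed as a relative deviation. The following is a minimal sketch, not part of the original analysis: it uses only the subunits for which the table lists a gene-deduced mass, and the function names are illustrative.

```python
# (subunit, Mr deduced from coding gene, Mr by MALDI-TOF), copied from the table.
# 1Bx17 is left out because the table lists two MALDI-TOF values for it.
HMW_GS = [
    ("1Ax2*", 86309, 86200),
    ("1Bx7", 82524, 82300),
    ("1Bx7OE", 83134, 82900),
    ("1Bx14", 84012, 83600),
    ("1Dx2", 87022, 87000),
    ("1Dx5", 88128, 87900),
    ("1By8", 75156, 74900),
    ("1By9", 73515, 73300),
    ("1By15", 75733, 74900),
    ("1Dy10", 67473, 67300),
    ("1Dy12", 68652, 68300),
]


def relative_deviation(deduced, measured):
    """Signed deviation of the measured mass from the deduced mass, as a fraction."""
    return (measured - deduced) / deduced


def deviations(rows):
    """Map each subunit name to its relative deviation."""
    return {name: relative_deviation(d, m) for name, d, m in rows}


if __name__ == "__main__":
    for name, dev in deviations(HMW_GS).items():
        print(f"{name:8s} {dev * 100:+.2f}%")
```

For the rows used here, every MALDI-TOF value is below the gene-deduced value, and the largest difference (1By15) is about 1.1%.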

Page 18: Proteome bioinformatics and genetics for associating proteins with grain phenotype

The MALDI-TOF based analyses of the LMW and HMW glutenins have provided a good basis for establishing a high throughput analysis for breeding programs. This analysis now runs as a fee-for-service (Saturn Biotech; AUS$6/sample).

The glutenin subunit protein loci known to date, however, can account for only approximately 60% of the variation in measured grain quality attributes.

More detailed genetic analyses are yielding new information.

Page 19: Proteome bioinformatics and genetics for associating proteins with grain phenotype

Chromosome 1D: L29183, L33288, L33529

Map based on DH lines from a Westonia x Kauz cross.

The classic designation of the LMW glutenin locus of Westonia on chromosome 1D is LMWG-D3c (in addition to A3c, B3h). The Kauz designation is not known.

Peaks from:
Westonia = L33288
Kauz = L29183, L33529

Peaks found in LMWG-D3c (based on Li et al 2010):
33021
33290
33453

Li et al (2010). BMC Plant Biology 10:124
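Relating a mapped peak to the reference peaks listed for LMWG-D3c amounts to a nearest-mass lookup. The sketch below makes two assumptions that the slide does not state: that a label such as L33288 denotes a peak at 33288 Da, and that a 10 Da tolerance is an acceptable match window.

```python
# Reference peaks listed for LMWG-D3c (based on Li et al 2010), in Da.
D3C_PEAKS = [33021, 33290, 33453]

# Mapped peaks; the masses are read off the peak labels (an assumption).
MAPPED = {"L29183": 29183, "L33288": 33288, "L33529": 33529}


def nearest_peak(mass, reference, tolerance=10):
    """Return (reference_mass, difference) for the closest reference peak,
    or None if no reference peak lies within the tolerance (in Da)."""
    best = min(reference, key=lambda r: abs(r - mass))
    diff = mass - best
    return (best, diff) if abs(diff) <= tolerance else None


if __name__ == "__main__":
    for label, mass in MAPPED.items():
        print(label, nearest_peak(mass, D3C_PEAKS))
```

Under these assumptions only the Westonia peak L33288 falls next to a D3c reference peak (33290); the two Kauz peaks do not match any of the three.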


Page 21: Proteome bioinformatics and genetics for associating proteins with grain phenotype

Chromosome 7A: L32831, L31965

Classical mapping of LMW-glutenin loci defined the chromosome 1A, 1B and 1D loci based on single-dimension SDS-PAGE technology (Gupta and Shepherd, 1990), and it was noted then that the protein family was complex.

We now find that some of the peaks in the MALDI-TOF analysis map to other chromosomes, such as chromosome 7A.

We used our wheat proteome database to see if we could identify the L32831 and L31965 proteins.

Gupta and Shepherd (1990). Two-step one-dimensional SDS-PAGE analysis of LMW subunits of glutenin. I. Variation and genetic control of the subunits in hexaploid wheats. Theor. Appl. Genet. 80: 65-74

Page 22: Proteome bioinformatics and genetics for associating proteins with grain phenotype

Chromosome 7A: L32831, L31965

In this analysis we are accessing a complex part of the LMW glutenin protein spectrum that was not available for analysis in the earlier SDS gel-based studies


Page 24: Proteome bioinformatics and genetics for associating proteins with grain phenotype

Query: L31965

Criteria for database search:

(1) Qualitative – amino acid composition (occurrence of QQQ etc) consistent with being co-extracted with LMW-glutenins (gliadins removed beforehand)

(2) Quantitative – molecular weight within 10 daltons

Hits (database entry, molecular weight in Da):

IWGSC_4DS_v1_2275417.fa.genscan.pep.1     31960
IWGSC_2AL_v1_6356128.fa.genscan.pep.2     31960
IWGSC_4BS_v1_4917914.fa.genscan.pep.1     31960
IWGSC_1AL_v2_3915175.fa.genscan.pep.1     31960
Komugi_AJ133603_1                         31960
IWGSC_3B_v1_10586963.fa.genscan.pep.1     31961
IWGSC_5DS_v1_2734070.fa.genscan.pep.1     31961
IWGSC_2BS_v1_5247743.fa.genscan.pep.3     31961

>Komugi_AJ133603_1 AJ133603 7209247 [Triticum aestivum] Triticum aestivum mRNA for alpha-gliadin storage protein, clone alpha-9
MVRVTVPQLQPQNPSQQQPQEQ
VPLVQQQQFLGQQQPFPPQQPYP
QPQPFPSQQPYLQLQPFPQPQLP
YSQPQPFRPQQPYPQPQPQYSQP
QQPISQQQQQQQQQQQQQQQQ
QQQQQQQILQQILQQQLIPCMDV
VLQQHNIVHGRSQVLQQSTYQLL
QELCCQHLWQIPEQSQCQAIHNV
VHAIILHQQQKQQQQPSSQVSFQ
QPLQQYPLGQGSFRPSQQNPQAQ
GSVQPQQLPQFEEIRNLALQTLPA
MCNVYIPPYCTIAPFGIFGTNYR
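The two search criteria can be read as a pair of filters over database entries. The sketch below is one possible reading, not the search tool actually used: it reduces the qualitative criterion to the presence of a `QQQ` run (the slide's "etc" implies further composition checks), and the `Entry` type, function names, and the two "hypothetical" entries are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    mass: float     # molecular weight in Da
    sequence: str   # amino acid sequence, one-letter code


def matches(entry, query_mass, tolerance=10.0, motif="QQQ"):
    """Apply both criteria: motif present and mass within the tolerance."""
    return motif in entry.sequence and abs(entry.mass - query_mass) <= tolerance


def search(entries, query_mass, tolerance=10.0, motif="QQQ"):
    """Return matching entries, closest mass first."""
    hits = [e for e in entries if matches(e, query_mass, tolerance, motif)]
    return sorted(hits, key=lambda e: abs(e.mass - query_mass))


DATABASE = [
    # Mass from the hit list; sequence truncated to the first two lines shown.
    Entry("Komugi_AJ133603_1", 31960,
          "MVRVTVPQLQPQNPSQQQPQEQVPLVQQQQFLGQQQPFPPQQPYP"),
    # Hypothetical entries, invented only to exercise the two filters.
    Entry("hypothetical_no_motif", 31962, "MKTAAVLLAGSSAVA"),
    Entry("hypothetical_wrong_mass", 33500, "MKQQQPFPQQQ"),
]

if __name__ == "__main__":
    for hit in search(DATABASE, query_mass=31965):
        print(hit.name, hit.mass)
```

With the query mass taken as 31965 Da (read from the label L31965, an assumption), only the Komugi entry passes both filters in this toy database.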

Page 25: Proteome bioinformatics and genetics for associating proteins with grain phenotype

Query: L32831

Criteria for database search:

(1) Qualitative – amino acid composition (occurrence of QQQ etc) consistent with being co-extracted with LMW-glutenins (gliadins removed beforehand)

(2) Quantitative – molecular weight within 10 daltons

Hits (database entry, molecular weight in Da):

IWGSC_4BL_v1_6996674.fa.genscan.pep.4     31980
Solomon_Q8H0J4_WHEAT                      31934
Solomon_B2ZRD2_WHEAT                      32829

>Solomon_B2ZRD2_WHEAT B2ZRD2 SubName: Full=Alpha-gliadin; [Triticum aestivum (Wheat).]
MKTFLILALLAIVATTATTAGRVPVPQL
QPQNPSQQQPQEQVPLVQQQQFLGQ
QQPFPPQQPYPQPQPFPSQQPYLQLQP
FPQPQLPYSQPQPFRPQQPYPQPQPQY
SQPQQPISQQQQQQQQQQQQQQQEQ
QILQQILQQQLIPCMDVVLQQHNIAH
GRSQVLQQSTYQLLQELCCQHLWQIP
EQSQCQAIHNVVHAIILHQQQKQQQQ
PSSQFSFQQPLQQYPLGQGSSRPSQQN
PQAQGSVQPQQLPQFEEIRNLALQTLP
AMCNVYIPPYCTIAPFGIFGTN

Page 26: Proteome bioinformatics and genetics for associating proteins with grain phenotype

Chromosome 7A: L32831, L31965

This analysis suggests that there are probably more genetic loci for major seed storage proteins than we have found to date.

Genome sequencing and proteome analyses, combined with genetic mapping, can define these new loci and provide molecular markers for breeding and selection.

It turns out that a 1980 report did find LMWG/gliadins on 4B and 7A:

Salcedo G, Prada J, Sanchez-Monge R, Aragoncillo C (1980). Aneuploid analysis of low molecular weight gliadins from wheat. Theor Appl Genet 56: 65-69

Page 27: Proteome bioinformatics and genetics for associating proteins with grain phenotype

Chromosome 7A: L32831, L31965

The “hits” on chromosome 7A will be resolved, as we have now started to sequence this chromosome as a national project in Australia. This is part of the International Wheat Genome Sequencing Consortium (IWGSC), in which different countries around the world are each sequencing a chromosome.


Page 29: Proteome bioinformatics and genetics for associating proteins with grain phenotype

The Wheat Proteome database:

Motivation: wheat genome, transcriptome and proteome studies are now advanced and need a reference proteome database for

• annotating the genes in wheat
• assigning peptides, obtained from high level proteomic analyses, to wheat proteins

Content of proteins/peptides:

• wheat/Triticum entries from SwissProt, UniProt, TrEMBL (2,690)
• translations from the KOMUGI full-length cDNA collection (13,717)
• peptides from INRA (France), USDA (USA), CNU (China) labs (still sorting out a final non-redundant set)
• IWGSC genome-wide-sequence (GWS) gene model translations (144,920)

Page 30: Proteome bioinformatics and genetics for associating proteins with grain phenotype

The Wheat Proteome database:

(1) Translations of conserved genes.

The IWGSC-GWS database for each chromosome arm typically identifies 4000-9000 genic sequences per chromosome. These include gene fragments and pseudogenes. Following their identification, genes conserved between wheat, Brachypodium, rice, sorghum and barley (Klaus Mayer “chromosome zipper”) can be clustered into syntenic groups.

(2) Non-redundant proteins/peptides known to originate from wheat.

30-40% of the gene complement in wheat and barley does not reside in the conserved syntenic gene order space.

All gene and protein/peptide sequences need to be anchored to the IWGSC-GWS chromosome arm DNA sequences. So far only 205 KOMUGI translations and 6 from the SwissProt/UniProt/TrEMBL dataset have been anchored to the IWGSC-GWS translations, so there is quite a bit of curation to carry out.


Page 32: Proteome bioinformatics and genetics for associating proteins with grain phenotype

To complete this presentation it is important to consider translating research findings to industry.

(1) Further streamlining of the MALDI-TOF scoring of wheat proteins

(2) Assigning a toxicity score to specific proteins in considering celiac and wheat allergy reactions to wheat flour

The aim is to be able to enter specific features of the wheat grain as a number into a Decision Matrix.

Figure: Decision Matrix. Feature columns: genome fingerprint, gene marker, protein marker, other traits. A row of weights is assigned to the features, and the final column holds the selection index values.

For each breeding line (matrix rows) the feature score (matrix columns) is multiplied by the feature weight. These are then added to provide a selection index (SI).

This SI is used to rank breeding lines for suitability for an end-product in industry.
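The selection index described here is a weighted sum: for line i, SI_i is the sum over features j of weight_j x score_ij. A minimal sketch follows; the feature names come from the slide, while the weights, scores and line names are hypothetical placeholders.

```python
FEATURES = ["genome fingerprint", "gene marker", "protein marker", "other traits"]

# Hypothetical weights, one per feature, in the same order as FEATURES.
WEIGHTS = [1.0, 2.0, 3.0, 0.5]

# Hypothetical feature scores: one row per breeding line.
MATRIX = {
    "line_A": [1, 0, 1, 2],
    "line_B": [0, 1, 1, 2],
    "line_C": [1, 1, 0, 0],
}


def selection_index(scores, weights):
    """Weighted sum of the feature scores for one breeding line."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights))


def rank_lines(matrix, weights):
    """Return (line, SI) pairs sorted from highest to lowest SI."""
    si = {line: selection_index(scores, weights) for line, scores in matrix.items()}
    return sorted(si.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for line, si in rank_lines(MATRIX, WEIGHTS):
        print(f"{line}: SI = {si}")
```

Changing the weight vector re-ranks the same breeding lines for a different end-product without touching the feature scores.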

Page 33: Proteome bioinformatics and genetics for associating proteins with grain phenotype

(1) Further streamlining of the MALDI-TOF scoring of wheat proteins: we are following the MALDIquant process described by Sebastian Gibb (IMISE, University of Leipzig)

Dean Diepeveen

Figure panels:
1: raw
2: variance stabilization
3: smoothing
4: baseline correction
5: peak detection
6: peak plot
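The processing steps named in the panels can be illustrated with a small standalone sketch. This is not MALDIquant (an R package) and does not reproduce its algorithms: it substitutes deliberately simple stand-ins (square-root transform, moving average, rolling-minimum baseline, local-maximum peak picking), and all window sizes, thresholds and the synthetic spectrum are assumptions chosen for illustration.

```python
import math


def stabilize_variance(intensities):
    """Step 2: square-root transform of the raw intensities."""
    return [math.sqrt(max(y, 0.0)) for y in intensities]


def _window(i, n, half_window):
    """Index bounds of a centred window, clipped at the spectrum edges."""
    return max(0, i - half_window), min(n, i + half_window + 1)


def smooth(intensities, half_window=2):
    """Step 3: centred moving average."""
    n = len(intensities)
    out = []
    for i in range(n):
        lo, hi = _window(i, n, half_window)
        out.append(sum(intensities[lo:hi]) / (hi - lo))
    return out


def remove_baseline(intensities, half_window=20):
    """Step 4: subtract a rolling-minimum estimate of the baseline."""
    n = len(intensities)
    out = []
    for i in range(n):
        lo, hi = _window(i, n, half_window)
        out.append(intensities[i] - min(intensities[lo:hi]))
    return out


def detect_peaks(masses, intensities, half_window=5, min_height=1.0):
    """Step 5: local maxima within the window that exceed min_height."""
    n = len(intensities)
    peaks = []
    for i in range(n):
        lo, hi = _window(i, n, half_window)
        y = intensities[i]
        if y >= min_height and y == max(intensities[lo:hi]):
            peaks.append((masses[i], y))
    return peaks


def process(masses, raw):
    """Run steps 2 to 5 in order on one raw spectrum."""
    y = stabilize_variance(raw)
    y = smooth(y)
    y = remove_baseline(y)
    return detect_peaks(masses, y)


def synthetic_spectrum():
    """Hypothetical spectrum: flat baseline plus two Gaussian peaks."""
    masses = [31000 + 10 * i for i in range(301)]
    raw = [
        100.0
        + 400.0 * math.exp(-((m - 32000) ** 2) / (2 * 30.0 ** 2))
        + 900.0 * math.exp(-((m - 33000) ** 2) / (2 * 30.0 ** 2))
        for m in masses
    ]
    return masses, raw


if __name__ == "__main__":
    masses, raw = synthetic_spectrum()
    for mass, height in process(masses, raw):
        print(mass, round(height, 2))
```

The order of the steps matters: the baseline is estimated on the smoothed, variance-stabilized signal so that the peak height threshold is applied on a common scale.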

Page 34: Proteome bioinformatics and genetics for associating proteins with grain phenotype

(2) Assigning a toxicity score to specific proteins in considering celiac disease (CD) and wheat allergy (WA) reactions to wheat flour

Proof of concept by Angela Juhasz and Frank Bekes, carried out on the data set published by DuPont et al (2011).

Every protein in the wheat grain defined by DuPont et al (2011) was assigned a toxicity score, which is the amount of protein in the grain x the number of epitopes present that are known to relate to CD and/or WA.
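The toxicity score as defined is a simple product, which makes it easy to enter into the Decision Matrix as a number. A minimal sketch, with entirely hypothetical protein names, amounts (arbitrary units) and epitope counts:

```python
# (name, amount in the grain in arbitrary units, number of known CD/WA epitopes)
# All three entries are hypothetical placeholders, not values from DuPont et al.
PROTEINS = [
    ("protein_1", 2.5, 4),
    ("protein_2", 10.0, 0),
    ("protein_3", 1.0, 7),
]


def toxicity_score(amount, epitope_count):
    """Amount of protein in the grain multiplied by its number of known epitopes."""
    return amount * epitope_count


def score_table(proteins):
    """Map each protein name to its toxicity score."""
    return {name: toxicity_score(amount, n) for name, amount, n in proteins}


if __name__ == "__main__":
    for name, score in score_table(PROTEINS).items():
        print(name, score)
```

Note that an abundant protein with no known epitopes scores zero, while a low-abundance protein carrying many epitopes can still score highly.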


Page 36: Proteome bioinformatics and genetics for associating proteins with grain phenotype

• Genome sequencing and high resolution genetic maps of wheat

• Integrating new wheat protein level analyses

• Translating research findings to industry – the Decision Matrix

The proteins of the wheat grain form a significant phenotype in breeding, industry processing and marketing, and will become more important in defining the product.