Credit seminar on rice genomics (corrected)

Ph. D. COURSE SEMINAR ON

Rice Genomics

Department of Genetics & Plant Breeding

INSTITUTE OF AGRICULTURAL SCIENCES

BANARAS HINDU UNIVERSITY, VARANASI - 221005

Supervisor: Prof. S.K. Singh

Co-Supervisor: Prof. Rajesh Singh

Presented by: Prudhvi Raj Vennela

FLOW OF PRESENTATION

1. Genomics

2. Types of genomics

3. History of genomics

4. Rice genomics

5. Pre-Genome sequencing era in Rice

6. Whole genome sequencing of Rice

7. Post-Genome sequencing era in Rice

8. Applications

9. Case study

10. Conclusion

GENOMICS

The word "genomics" was coined by Thomas Roderick in 1986. It is the study of the structure and function of the entire genome of a living organism.

Structural Genomics (study of the structure of the entire genome of an organism)

Functional Genomics (study of the function of the entire genome of an organism)

Comparative Genomics (study of the relationship of genome structure and function across different biological species or strains)

1980 – DNA markers (RFLP)

1983 – Kary Mullis invented the PCR technique; several PCR-based markers were subsequently developed, e.g. RAPD, AFLP, SSR, SNP, CAPS, STS, SCAR, EST, DFP, etc.

1986 – Leroy Hood and Lloyd Smith developed the first semi-automatic DNA sequencer.

1990 – Development of pyrosequencing by Pål Nyrén.

1990 – The U.S. National Institutes of Health (NIH) begins large-scale sequencing trials on Mycoplasma, Escherichia coli, Caenorhabditis elegans, and Saccharomyces cerevisiae.

1995 – Craig Venter and Hamilton Smith at The Institute for Genomic Research (TIGR) publish the first complete genome of a free-living organism, the bacterium Haemophilus influenzae.

1996 – Sequence of the Saccharomyces cerevisiae genome completed.

1998 – The genome of the first multicellular organism, the roundworm (Caenorhabditis elegans), was completed.

1999 – Sequence of the first human chromosome (chromosome 22) was completed.

2000 – Arabidopsis thaliana became the first plant to have its genome completely sequenced.

2001 – A draft sequence of the human genome is published.

2002 – Draft sequencing of the rice genome was completed

2003 – Human genome sequencing was completed

2004 – 454 Life Sciences markets a pyrosequencing machine. ‘454 sequencing’ has been used for barley and many other genomes. The first version of the machine reduced sequencing costs 6-fold compared to automated Sanger sequencing methods.

Recently:

2012 – Wheat genome was sequenced – R. Brenchley et al.

2012 – Pigeonpea genome was sequenced – R.K. Varshney et al.

2013 – Chickpea genome was sequenced – R.K. Varshney et al.

Nature Review, 2010 & Nature Biotechnology, 2001-2013

Rice Genomics ???

From here… …to here

Rice is a model cereal plant

• The small size of its genome (430 Mb)

• Its relatively short generation time

• Its relative genetic simplicity (it is diploid, i.e. has two copies of each chromosome).

• Easy to transform genetically.

• Belongs to the grass family

• It has the greatest biodiversity among the cereal crops

Developmental Milestones in Rice Genomics

Development of the first saturated (RFLP) map.

The application of PCR-based markers such as SSR markers.

Identification of QTLs for many agronomically important traits and marker-assisted breeding.

Development of efficient techniques for genetic transformation, which make rice the easiest cereal to transform.

Complete sequencing and annotation of indica and japonica rice genomes and development of new-generation markers.

Synteny between genomes of rice and other cereals.

Pre-Genome sequencing era in Rice

In the early to mid 1990s, the available markers were:

RFLP and RAPD

Sequence Tagged Sites (STS) markers

Simple Sequence Repeats (SSR)

The first SSRs were reported in 1996 (O. Panaud et al.).

By 1997, 121 validated SSRs had been reported, but they were of limited use for marker-assisted selection (MAS) because of limited genome coverage.

By 2001, a total of ~500 SSRs had been developed from 57.8 Mb of available rice genome data, which further increased the utility of these markers.

Why was the rice genome sequenced?

Institutes and the chromosomes they sequenced

Sr. no.  Rice sequencing participant                                           Chromosomes
1        Rice Genome Research Program (RGP), Japan                             1, 6, 7, 8
2        Korea Rice Genome Research Program, Korea                             1
3        CCW (US): CUGI (Clemson University), Cold Spring Harbor Laboratory    3, 10
4        TIGR, US                                                              3, 10
5        PGIR, US                                                              10
6        University of Wisconsin, US                                           11
7        National Center of Gene Research, Chinese Academy of Sciences, China  4
8        Indian Rice Genome Program, University of Delhi                       11
9        Academia Sinica Plant Genomic Center, Taiwan                          5
10       Universidade Federal de Pelotas, Brazil                               12
11       Kasetsart University, Thailand                                        9
12       McGill University, Canada                                             9
13       John Innes Centre, U.K.                                               2

Milestones in rice genome sequencing

1) Sept 1997 – Sequencing of the rice genome was initiated as an international collaboration among 10 countries.

2) Feb 1998 – IRGSP was launched under the coordination of RGP.

3) April 2000 – Monsanto Co. produced a draft sequence of BACs covering 260 Mb of the rice genome; 95% of rice genes were identified.

4) Feb 2001 – Syngenta produced a draft sequence with 99.8% sequence accuracy and identified 32,000 to 50,000 genes, i.e. 99% of the rice genes.

5) Dec 2002 – IRGSP finished a high-quality draft sequence (clone-by-clone approach) with a sequence length, excluding overlaps, of 366 Mb, corresponding to ~92% of the rice genome.

6) Dec 2004 – IRGSP produced the ‘high-quality’ sequence of the entire rice genome, with 99.99% accuracy and without any sequence gap.

India's work on rice genome sequencing

India joined the IRGSP in June 2000 and chose to sequence a part of chromosome 11.

India has invested Rs. 48.83 crores in the "Indian Initiative for Rice Genome Sequencing (IIRGS)".

The initiative is a joint effort by the Department of Plant Molecular Biology (DPMB), University of Delhi South Campus (UDSC), the National Research Centre on Plant Biotechnology (NRCPB) and the Indian Agricultural Research Institute (IARI), New Delhi.

Findings

Chromosome 11 was known to carry several disease resistance genes, including the Xac1 bacterial blight resistance gene.

The chromosome segment sequenced by IARI comprises ~6.825 million bp, in which 1,005 genes with unknown function were predicted.

IRGSP

The IRGSP effort revolved around a few basic points:

The sequencing strategy.

The rice cultivar to be sequenced.

The accuracy of sequence and the sequence release policy.

Why Nipponbare?

The Rice Genome Research Program (RGP), Japan, had used it as a source for EST sequencing and had constructed a dense linkage map and a YAC physical map.

The guidelines for the method of sequencing, sequence quality and release policy were developed largely along the same lines as the Human Genome Project.

A sequence-ready physical map was developed from a PAC library comprising 71,040 clones and a BAC library consisting of 48,960 clones.

A BAC library (~90,000 clones) made at the Clemson University Genomics Institute (CUGI) and BAC libraries made by Monsanto were also used.

Two libraries were made for each clone, with insert sizes of 2 and 5 kb, respectively.

Backbone of IRGSP

The IRGSP had set the target of finishing the rice genome sequence by 2008. This goal changed when Monsanto released the draft sequence of ‘japonica’ in 2000.

Two other groups, Syngenta and BGI, published drafts of ‘japonica’ and ‘indica’, respectively, simultaneously in 2002.

The draft sequence was released by the consortium at a meeting held in Japan in December 2002. This task was sped up by Monsanto’s decision to provide its BAC libraries, sequenced up to 5X coverage, to the IRGSP.

How was rice sequenced?

Clone-by-clone sequencing

Clone-by-clone sequencing is also called directed sequencing of the BAC contigs.

The chromosomes were mapped

Then split up into sections

A rough map was drawn for each of these sections

Then the sections themselves were split into smaller bits.

Each of these smaller bits would be sequenced.

(* BAC clones are 80-100 kb long DNA fragments arranged in contigs.)

THE HIERARCHICAL SHOTGUN SEQUENCING METHOD

In this approach, genomic DNA is cut into pieces

Inserted into BAC vectors

Transformed into E. coli where they are replicated

The BAC inserts are isolated

Mapped to determine the order of each cloned fragment. This is referred to as the Tiling Path.

Each BAC fragment in the Golden Path is fragmented randomly into smaller pieces

Each piece is cloned into a plasmid

Sequenced on both strands.

These sequences are aligned so that identical sequences are overlapping.
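The final alignment step, in which reads from one BAC are joined wherever their ends overlap, can be illustrated with a toy sketch. This is a naive greedy overlap merge, not the assembly software used by the IRGSP; the function names, the minimum overlap and the example sequence are all hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads sharing the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining reads stay separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical BAC insert and three overlapping reads sampled from it.
bac_insert = "ATGGCGTACCTGAAGTCCGATTAG"
reads = ["ATGGCGTACC", "GTACCTGAAGTC", "GAAGTCCGATTAG"]
contigs = greedy_assemble(reads)
```

Real assemblers also handle sequencing errors, reverse-complement reads and repeats, which this sketch ignores.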

Rice Genome Annotation

The accuracy of a genome sequence should be evaluated by the quality of its annotation, i.e. the assignment of biological function to the sequence.

Gene modelling for a given sequence using gene prediction and similarity search programmes facilitates gene discovery in a systematic and comprehensive manner.

Rice GAAS (Rice Genome Automated Annotation System; Sakata et al., 2002) has been developed by combining:

Coding region prediction programmes

Splice site prediction programmes

tRNA gene prediction programmes

Similarity search analysis programmes

Although the interpretation of the coding region is fully automated, gene modelling is completed with manual evaluation.

What does rice genome sequencing reveal?

The map-based sequence covered 95% of the 389 Mb rice genome.

A total of 37,544 genes were predicted, with an average gene density of one gene per 9.9 kb and an average gene length of 2,699 bp.

Chromosomes 1 and 3 have the highest gene density.

Chromosomes 11 and 12 have the lowest gene density.

Transposable element content was highest for chromosomes 8 (38%) and 12 (38.3%) and lowest for chromosomes 1 (31%), 2 (29.8%) and 3 (29%).

The genome contains at least 35% repeat elements.

The japonica genome sequence showed that almost 60% of the genome is duplicated.

421 chloroplast and 909 mitochondrial DNA insertions contribute ~0.2% each to the nuclear genome.

The overall GC content is 43.6%, with 54.2% in exons and 38.3% in introns.
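Two of the summary statistics above are simple to compute, and a short sketch may make them concrete. The functions and the two toy sequences below are hypothetical illustrations, not data from the rice genome. With the slide's own figures, 95% of 389 Mb divided by 37,544 genes gives roughly 9.8 kb per gene, close to the quoted 9.9 kb.

```python
def gc_content(seq):
    """Percentage of G and C among the A/C/G/T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gene_density_kb(sequence_mb, n_genes):
    """Average amount of sequence (in kb) per predicted gene."""
    return sequence_mb * 1000.0 / n_genes


# Figures quoted on the slide: 95% of a 389 Mb genome, 37,544 genes.
kb_per_gene = gene_density_kb(389 * 0.95, 37544)

# Hypothetical toy sequences: a GC-rich "exon" and an AT-rich "intron".
toy_exon = "ATGGCCGCGTGCGGA"
toy_intron = "GTAAGTATTTTAATTCAG"
exon_gc = gc_content(toy_exon)
intron_gc = gc_content(toy_intron)
```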

Post-Genome sequencing era in Rice

The post-genome sequencing era opened up a “treasure chest” of new rice markers. They are:

SSRs

SNPs

Indels

Custom-made markers

SSRs:

More than 2,200 validated SSRs were released in 2002.

18,828 class 1 SSRs were released after the completion of the Nipponbare genome sequence in 2005.

The density of SSRs is extremely high (approx. 51 SSRs per Mb).

Around 24,000 SSR markers for rice are now available in the database.
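Mining SSRs from a genome sequence amounts to scanning for short motifs repeated in tandem. The sketch below shows one simple way to do this with a regular expression; it is not the pipeline used to build the rice SSR sets, and the motif sizes, the 20 bp length threshold and the test sequence are assumptions chosen for illustration.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. not GAGA or AA)."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq, min_unit=2, max_unit=4, min_length=20):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        pattern = re.compile(r"([ACGT]{%d})\1+" % unit)
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # already reported under its shorter repeat unit
            if len(match.group(0)) >= min_length:
                hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


# Hypothetical sequence containing a (GA)12 and an (ATT)7 repeat.
toy_seq = "TTACG" + "GA" * 12 + "CCTAG" + "ATT" * 7 + "GC"
ssrs = find_ssrs(toy_seq)
```

Primers designed in the unique sequence flanking each hit would then turn it into a candidate SSR marker.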

Single nucleotide polymorphisms (SNPs)

They are the most abundant and ubiquitous polymorphisms.

Lower levels of SNP marker polymorphism are detected in indica × indica or japonica × japonica derived material than between the japonica and indica reference genotypes.

The frequency of SNPs between subspecies was 0.68% to 0.70%, whereas it was 0.03% to 0.05% between japonica cultivars and 0.49% between indica cultivars.

Currently, the total number of collected SNPs is 23,458,338 in 17 accessions/cultivars.
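An SNP frequency such as those quoted above is the share of compared positions at which two aligned sequences carry different bases. A minimal sketch, assuming the two sequences are already aligned to equal length (the function name and example strings are hypothetical):

```python
def snp_frequency(seq_a, seq_b):
    """Percent of aligned, ungapped A/C/G/T positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    snps = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":  # skip gaps and ambiguous bases
            compared += 1
            if a != b:
                snps += 1
    return 100.0 * snps / compared if compared else 0.0


# Hypothetical aligned fragments differing at one of ten positions.
freq = snp_frequency("ACGTACGTAC", "ACGTACGAAC")
```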

Indels (Insertion/deletion):

Identified in silico by direct comparison of japonica and indica genome sequences.

A large number of indels were reported in indica × japonica populations.

Introns “tolerate” insertion/deletion mutations better than exons.

Many indels have been identified and exploited through the development of Intron Length Polymorphic (ILP) markers.

The majority of them are reliable, co-dominant and polymorphic between varieties within both subspecies.
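In silico indel discovery from a pairwise alignment comes down to locating runs of gap characters in one sequence or the other. The following sketch assumes a gapped alignment with "-" as the gap symbol; the function name and the toy alignment are hypothetical, and the labels "insertion" and "deletion" are relative to the first (reference) sequence.

```python
def find_indels(ref_aln, alt_aln):
    """Return (column, kind, length) for each gap run in a pairwise alignment."""
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned sequences must have the same length")
    indels = []
    i, n = 0, len(ref_aln)
    while i < n:
        if ref_aln[i] == "-" or alt_aln[i] == "-":
            # A gap in the reference means the other sequence has extra bases.
            kind = "insertion" if ref_aln[i] == "-" else "deletion"
            gapped = ref_aln if kind == "insertion" else alt_aln
            j = i
            while j < n and gapped[j] == "-":
                j += 1
            indels.append((i, kind, j - i))
            i = j
        else:
            i += 1
    return indels


# Hypothetical japonica (reference) and indica alignment of one intron.
indels = find_indels("ACGT--ACGTAC", "ACGTTTAC---C")
```

An intron whose length differs between the two sequences in this way is a candidate ILP marker, with primers placed in the flanking exons.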

“Custom-made” markers

The information in genome sequences permits the development of markers that are tightly linked to target loci, i.e. “custom-made” or “tailor-made” markers.

A number of such markers have been generated using the rice genome sequence.

Candidate gene (CG) identification can be integrated with customized marker design, giving markers that are usually more tightly linked to the gene or QTL controlling the trait.

Comparison of Rice with cereal genomes

Analysis of the rice genome draft sequence showed that homologues of almost 98% of wheat, barley and maize proteins could be identified in rice.

Wheat–rice synteny analysis using 4,485 wheat ESTs revealed a general conservation of genes and their order in the two species.

Comparison of rice and maize revealed 656 putative orthologs with several breaks in co-linearity.

Similar sequence-based alignments of rice with sorghum and barley revealed some rearrangements along with a general conservation of synteny.

Comparative genomics based on the syntenic relationship of rice with other cereals has helped in locating genes and QTLs in those cereals, such as:

QTL for malting quality in barley

Major heading date QTL in perennial ryegrass

Liguleless region in sorghum

Ror2, for resistance to powdery mildew disease in barley

The 3K Rice Genome project

Rice is known for its tremendous within-species and within-genus genetic diversity.

Exploring this diversity at the sequence level has been, until recently, only a dream of rice scientists.

“The 3,000 (3K) Rice Genomes Project” is the answer.

Joint organizers of the project:

1. The Chinese Academy of Agricultural Sciences (CAAS)

2. The Beijing Genomics Institute (BGI)

3. The International Rice Research Institute (IRRI)

It is a major step towards revealing the genomic diversity in all of the world’s rice germplasm collections.

Current status and Plans

Sequencing of 3,000 rice genomes has been completed.

The sequenced set contains:

A diverse set of accessions originating from 89 countries.

Accessions drawn from the ~180,000 rice accessions conserved in the International Rice Genebank Collection (IRGC) at IRRI and the China National Crop Genebank (CNCG).

400 parental lines of popular varieties and genome-wide introgression lines for multiple complex traits.

Outcomes of the 3K Rice Genomes Project

1. New population-specific genotyping arrays useful for a wide range of genetic and breeding applications.

2. Population structures that have been shaped by evolution, domestication and selection.

3. Identification of unique cryptic structural genomic variants across the rice genome.

Sequencing-based GWAS in rice

It allows efficient detection of the genetic diversity of germplasm for mapping agronomically important traits.

GWAS in rice showed that the integrated approach of sequence-based GWAS and functional genome annotation can be used as a complementary strategy to classical biparental cross mapping for dissecting complex traits in rice.
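At its core, a GWAS tests each marker in turn for association between genotype and phenotype across a panel of accessions. The sketch below ranks SNPs by a Welch t statistic comparing the two allele classes. It is a deliberately simplified illustration: real rice GWAS typically uses mixed models that correct for population structure, and the marker names, genotype codes and phenotype values here are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means of two samples."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    diff = mean(group_a) - mean(group_b)
    if se == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / se


def scan_markers(genotypes, phenotype):
    """Rank SNPs by |t| between accessions carrying allele 1 and allele 0."""
    scores = {}
    for snp, calls in genotypes.items():
        ref = [p for g, p in zip(calls, phenotype) if g == 0]
        alt = [p for g, p in zip(calls, phenotype) if g == 1]
        if len(ref) < 2 or len(alt) < 2:
            continue  # too few accessions in one allele class to test
        scores[snp] = abs(welch_t(alt, ref))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: heading date (days) for 8 accessions and two SNPs.
heading_date = [95, 97, 96, 98, 110, 112, 111, 109]
genotypes = {
    "snp_1": [0, 0, 0, 0, 1, 1, 1, 1],  # tracks the phenotype
    "snp_2": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the phenotype
}
ranked = scan_markers(genotypes, heading_date)
```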

Rice breeding in Post Genomics era

Extensive efforts were made to achieve the ‘Green Revolution’ (GR) of the 1960s, which doubled rice productivity under modern high-input agricultural conditions.

The successful commercialization of hybrid rice in China since the late 1970s resulted in a second leap in rice productivity.

However, world rice production has to be doubled again by 2030 to meet the projected demand of the increasing world population, and much of this increase has to come from improved rice cultivars.

‘Super inbred and hybrid rice’ cultivars have been produced by:

‘Ideotype’ breeding.

Exploiting inter-subspecific heterosis.

But ‘super rice’ or ‘super hybrid rice’ cultivars require very high inputs to realize their yield potentials, resulting in serious environmental pollution and related problems.

Modern semi-dwarf rice cultivars have rarely achieved their yield potentials in farmers’ fields because of many abiotic and biotic stresses.

To achieve sustainable yield increases in rice, there has been a call for developing ‘Green Super Rice’ (GSR) cultivars that can produce high and stable yields with fewer inputs.

In addition, high iron and zinc contents have become important objectives in many breeding programs.

Future rice breeding will require breeders to improve many ‘green’ traits in addition to high yield potential and desirable quality.

This can be achieved by improving conventional breeding methodologies with high-throughput genomic techniques.

Applications

In conventional breeding, the presence of a gene can normally be identified only when it is expressed.

With genomic research, the presence or absence of a gene can now be identified easily at an early stage.

1. Genotype identity testing

For simple F1 hybrids

Seed purity or intra-variety variation

Hybrid rice lines

SSRs from mitochondrial genes have been targeted for the development of markers to study maternally inherited traits such as cytoplasmic male sterility or the maternal origin of rice accessions.

2. Genetic diversity analysis of breeding material

Hybrid rice breeding

3. Gene surveys in parental material

An example of this was demonstrated by Wang et al. (2007), who used a set of dominant allele-specific markers to survey a large germplasm collection for the presence of the Pi-ta resistance gene for rice blast.

Some important genes tagged using molecular markers

Trait                        Gene    Markers                   Chromosome
Blast resistance             Pi-1    RZ 536, RG303, NpB 181    11
                             Pi-2    RG 64, XNpb 294           6
Bacterial blight resistance  Xa-1    XNpb 235, XNpb 120        4
                             Xa-21   Y03700                    4
Gall midge                   Gm(2)   RG 329, RG476             4

etc.

4. Marker-assisted backcrossing (MABC)

5. Pyramiding

6. Use in transgenes: For example, transgenic rice (southern U.S. japonica-type varieties) with the inherent ability to produce beta-carotene, developed by Syngenta, is available at IRRI and in several other national programs.

GR1 events (GR1-146, GR1-309, and GR1-652) were used as donor parents, while 2 IRRI-bred mega varieties (IR64 and IR36) and a popular Bangladeshi variety (BR29) were used as recurrent parents.

Characterization of genes controlling important traits

Biofortified Rice Development

Case Study

Breeding strategies in post-genomics era

IRRI and China have been following a breeding strategy for developing GSR cultivars ever since 1998.

This strategy contains two major, well-integrated components carried out in three steps:

1) Developing trait-specific introgression lines (ILs).

2) Large-scale gene/QTL discovery and allele mining.

3) Developing GSR cultivars with multiple green traits by designed QTL pyramiding (DQP) or by marker-assisted recurrent selection (MARS).
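The selection step in QTL pyramiding can be pictured as a filter over marker data: keep only those progeny that carry the favourable allele at every target QTL. The sketch below is a minimal illustration under that assumption; the line names, QTL names and genotype codes are hypothetical and do not come from the GSR programme.

```python
def select_pyramided(lines, target_loci, favourable="FF"):
    """Return names of lines homozygous for the favourable allele at every target locus."""
    selected = []
    for name, genotype in lines.items():
        if all(genotype.get(locus) == favourable for locus in target_loci):
            selected.append(name)
    return selected


# Hypothetical marker calls: "FF" = homozygous favourable (donor) allele,
# "Ff" = heterozygous, "ff" = homozygous recurrent-parent allele.
progeny = {
    "IL-01": {"qDT1": "FF", "qSUB9": "FF", "qSAL3": "FF"},
    "IL-02": {"qDT1": "FF", "qSUB9": "Ff", "qSAL3": "FF"},
    "IL-03": {"qDT1": "ff", "qSUB9": "FF", "qSAL3": "FF"},
    "IL-04": {"qDT1": "FF", "qSUB9": "FF", "qSAL3": "FF"},
}
pyramided = select_pyramided(progeny, ["qDT1", "qSUB9", "qSAL3"])
```

In practice the selected lines would still be phenotyped, since marker calls alone do not guarantee trait expression.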

Large scale rice Molecular breeding- an example

Step-I

Large-scale backcross breeding activities were conducted using 25 of the best commercial varieties and hybrid parents as recipients and 203 mini-core germplasm accessions from around the world as donors.

Advanced backcross populations developed from these crosses were screened and progeny tested for a wide range of abiotic and biotic stresses, resulting in the development of multiple sets of trait-specific ILs.

Results obtained from the massive introgression breeding activities showed that:

1. There is a tremendous amount of useful genetic diversity in the gene pool of O. sativa for all complex target traits, hidden in the exotic germplasm accessions.

2. Backcross breeding with strong phenotypic selection is a powerful way to exploit this rich source of hidden genetic diversity.

3. Selection of parental lines for breeding based on target phenotype(s), as practiced by most breeders, is a poor way of exploiting this hidden genetic variation for complex traits.

Step-II

Selected ILs will be progeny tested in replicated experiments for the selected target traits and important non-target traits along with genotyping to detect QTL and QTL networks.

The generated genetic information is used for characterizing genome wide responses to strong phenotypic selection in the ILs.

Step-III

Superior ILs carrying favorable alleles from different donors are selected based on accurate genetic information generated in step II to cross with one another.

Segregating populations from these crosses will be subjected to strong phenotypic selection and/or GS for developing new cultivars. Selected progeny will be characterized for target and non-target traits in genotyping and phenotyping experiments to verify loci for target traits identified in step II and to remove undesirable genetic drags.

Schematic representation

Conclusion

The recent integration of advances in molecular biology, genomic research, transgenic breeding and molecular marker applications with conventional plant breeding practices has created the foundation for molecular plant breeding or ‘precision’ breeding.

Rice genomics can play a significant role in enhancing the quantity and quality of rice production in order to feed more of the world’s population.

Thank You