DE NOVO RNA-SEQ FOR THE STUDY OF ODAP SYNTHESIS PATHWAY IN
LATHYRUS SATIVUS
Calabuig Serna Tono, Martínez Rodero Iris & Segarra Martín Eva
Escola Tècnica Superior d’Enginyeria Agronòmica i del Medi Natural, Universitat Politècnica de València
May, 2015
INDEX
1. INTRODUCTION
2. FIRST APPROACH TO THE PROJECT
3. EXPERIMENTAL DESIGN
a. SAMPLE RECOVERY
b. RNA-SEQ ASSAY
c. RAW DATA PROCESSING
i. DATA FILTERING
ii. TRANSCRIPTOME ASSEMBLY
d. DATA ANALYSIS
4. BUDGET ESTIMATE
5. CONCLUSIONS
1. INTRODUCTION
LATHYRUS SATIVUS
Commonly known as grass pea
• The ‘insurance crop’
  Agronomical and biological advantages
  Main food source when other crops fail
  Areas prone to famine and drought: Asia and East Africa
• But ODAP synthesis
  × Neuro-excitatory amino acid
  × Associated with neurolathyrism: neurodegenerative disease
1. INTRODUCTION
OUR AIM
• Converting L. sativus into a safe food: need of genetic improvement
• Discovering ODAP synthesis control: gene global expression assay
• Results: future grass pea with low ODAP
2. FIRST APPROACH TO THE PROJECT
HOW L. SATIVUS & ODAP ARE RELATED
ODAP content (>> means much higher):
• Genotype effects: ‘Jamalpur’ variety >> ‘LS-5602’ variety
• Environmental effects: drought conditions >> normal conditions
• Development stage: seed >> vegetative tissues
2. FIRST APPROACH TO THE PROJECT
GLOBAL EXPRESSION STUDY: WHICH TECHNOLOGY?
1st) Microarray assay
• We contacted Agilent: probes design
• But genome sequence was needed!
2nd) Including sequencing & annotation step
2. FIRST APPROACH TO THE PROJECT
GLOBAL EXPRESSION STUDY: WHICH TECHNOLOGY?
3rd) RNA-seq ‘de novo’
3. EXPERIMENTAL DESIGN
SAMPLE RECOVERY
• Taking advantage of previous knowledge:

Sample  Variety      Tissue  Environmental conditions
1       ‘Jamalpur’   Seed    Drought
2       ‘Jamalpur’   Seed    Control
3       ‘Jamalpur’   Stem    Drought
4       ‘Jamalpur’   Stem    Control
5       ‘LS-8603’    Seed    Drought
6       ‘LS-8603’    Seed    Control
7       ‘LS-8603’    Stem    Drought
8       ‘LS-8603’    Stem    Control

• 50 g of seeds
• 100 mg of stem
• No need of replicates
3. EXPERIMENTAL DESIGN
RNA-SEQ ASSAY
1) RNA extraction
2) cDNA library construction: random hexamer primers, dNTPs, RNase H and DNA polymerase I
3) Libraries qualification and quantification: Agilent 2100 Bioanalyzer, ABI StepOnePlus Real-Time PCR System
4) cDNA sequencing: HiSeq 2000
3. EXPERIMENTAL DESIGN
RAW DATA PROCESSING
1) Data filtering: avoid sequencing errors
Reads removed:
• Sequences with adapter
• Low quality at both ends
• Average Phred quality score < 15
• Too short (< 36 bp)
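The four removal criteria above can be sketched as a single check per read. This is a minimal illustration, not the pipeline actually used: the adapter sequence, the Phred+33 encoding, the end-quality threshold `END_MIN_PHRED` and the reading of "low quality at both ends" as "first and last base both below that threshold" are all assumptions made for the example. Only the two thresholds 15 and 36 come from the slide.

```python
MIN_MEAN_PHRED = 15          # from the slide: average quality score < 15 is removed
MIN_LENGTH = 36              # from the slide: reads shorter than 36 bp are removed
END_MIN_PHRED = 20           # assumption: threshold for a "low quality" end base
ADAPTER = "AGATCGGAAGAGC"    # assumption: hypothetical adapter, for illustration only


def rejection_reason(seq, qual, adapter=ADAPTER):
    """Return why a read is removed, or None if it passes every filter.

    `qual` is assumed to be a Phred+33 encoded quality string of the
    same length as `seq`.
    """
    if len(seq) < MIN_LENGTH:
        return "too_short"
    if adapter in seq:
        return "adapter"
    scores = [ord(c) - 33 for c in qual]
    if scores[0] < END_MIN_PHRED and scores[-1] < END_MIN_PHRED:
        return "low_quality_ends"
    if sum(scores) / len(scores) < MIN_MEAN_PHRED:
        return "low_mean_quality"
    return None


def filter_reads(reads):
    """Keep only the (seq, qual) pairs that pass all filters."""
    return [(s, q) for s, q in reads if rejection_reason(s, q) is None]
```

Returning the reason instead of a plain boolean makes it easy to count how many reads each criterion removes.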
3. EXPERIMENTAL DESIGN
RAW DATA PROCESSING
2) Transcriptome assembly
TRINITY: ‘de novo’ transcriptome assembler
• Recovers more full-length transcripts
• Sensitivity across a range of expression levels
• 3 modules, sequentially applied to large volumes of RNA-seq reads:
  • Inchworm
  • Chrysalis
  • Butterfly
A. INCHWORM: reconstructs linear transcript contigs
1. Builds a k-mer dictionary from all sequence reads
2. Removes likely error-containing k-mers
3. Selects the most frequent k-mer as the seed of a contig
4. Extends the seed contig in each direction: finds the highest-occurring k-mer with a (k-1) overlap and adds its terminal base to the growing contig
5. Extends the sequence in either direction until it cannot be further extended, then reports the CONTIG
6. Repeats steps 3–5 until the k-mer dictionary is exhausted
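The greedy extension described in the Inchworm steps can be sketched in a few lines. This is a toy version under stated assumptions, not Trinity's implementation: it ignores reverse complements, treats "error-containing" simply as "seen fewer than `min_count` times" (a hypothetical parameter), breaks ties alphabetically, and requires k ≥ 2.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Step 1: k-mer dictionary from all sequence reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def _best(candidates, available):
    """Most frequent candidate k-mer; alphabetical order breaks ties."""
    return max(candidates, key=lambda km: (available[km], km))


def greedy_contigs(reads, k, min_count=1):
    """Toy greedy contig construction following steps 1 to 6."""
    # Step 2 (assumption): rare k-mers are treated as likely errors.
    available = {km: c for km, c in kmer_counts(reads, k).items() if c >= min_count}
    contigs = []
    while available:                      # step 6: until dictionary exhausted
        seed = _best(available, available)  # step 3: most frequent k-mer
        del available[seed]
        contig = seed
        while True:                       # steps 4-5: extend to the right
            suffix = contig[-(k - 1):]
            cands = [suffix + b for b in "ACGT" if suffix + b in available]
            if not cands:
                break
            nxt = _best(cands, available)
            del available[nxt]
            contig += nxt[-1]             # add the terminal base
        while True:                       # steps 4-5: extend to the left
            prefix = contig[:k - 1]
            cands = [b + prefix for b in "ACGT" if b + prefix in available]
            if not cands:
                break
            nxt = _best(cands, available)
            del available[nxt]
            contig = nxt[0] + contig
        contigs.append(contig)
    return contigs
```

Each k-mer is deleted from the dictionary as soon as it is used, which is what guarantees that the outer loop terminates.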
B. CHRYSALIS: constructs complete de Bruijn graphs
1. Groups Inchworm contigs into connected components if they perfectly overlap by k-1 bases
2. Contigs in a component are likely to be:
   • alternative splice forms
   • closely related paralogs
3. Builds a de Bruijn graph for each component
4. Assigns each read to the component with which the read shares the largest number of k-mers
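Step 4 of Chrysalis, assigning each read to the component with which it shares the most k-mers, can be illustrated as follows. The component names, the dictionary layout and the choice to leave a read unassigned when it shares no k-mer at all are assumptions of this sketch. Ties go to the component listed first.

```python
def kmers(seq, k):
    """Set of all k-mers contained in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_reads(reads, components, k):
    """Map each read to the component sharing the most k-mers with it.

    `components` is a dict {component_name: [contig sequences]}
    (hypothetical layout). Reads sharing no k-mer map to None.
    """
    component_kmers = {
        name: set().union(*(kmers(contig, k) for contig in contigs))
        for name, contigs in components.items()
    }
    assignment = {}
    for read in reads:
        read_kmers = kmers(read, k)
        shared = {name: len(read_kmers & ks) for name, ks in component_kmers.items()}
        best = max(shared, key=shared.get)
        assignment[read] = best if shared[best] > 0 else None
    return assignment
```

For example, with components `{"comp0": ["ACGTACGTAC"], "comp1": ["TTTTGGGGCC"]}` and k = 4, the read `CGTACG` shares three k-mers with `comp0` and none with `comp1`, so it is assigned to `comp0`.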
C. BUTTERFLY: reconstructs full-length linear transcripts
• by reconciling the individual de Bruijn graphs generated by Chrysalis with:
  • original reads
  • paired ends
1. Graph simplification: iterates between consecutive nodes and obtains linear paths in de Bruijn graphs, merging them into nodes representing longer sequences
2. Deletes edges that represent minor deviations: likely sequencing errors
3. EXPERIMENTAL DESIGN
DATA ANALYSIS
A) Transcriptome annotation
1. CDS assignment
BLASTx against:
• NCBI non-redundant protein database
• Swiss-Prot
• Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database
The best hits give the transcription direction and coding region of transcripts
ESTScan:
• detects coding regions in DNA sequences, even of low quality
Transcripts with CDS > 100 bp were translated
3. EXPERIMENTAL DESIGN
DATA ANALYSIS
A) Transcriptome annotation
2. BLASTn
Transcripts against:
• NCBI non-redundant nucleotide database
3. Blast2GO
GO annotations: transcripts annotated with GO terms
• molecular function
• biological processes
• cellular component
3. EXPERIMENTAL DESIGN
DATA ANALYSIS
B) Differential expression
• Mapping: reads aligned to the assembled transcriptome with the BWA aligner
• Transcripts normalization:
$$\mathrm{FPKM} = \frac{\text{total fragments}}{\text{mapped reads (millions)} \times \text{exon length (kb)}}$$
• Different FPKM: different expression
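The FPKM normalization can be written as a short function. This is a sketch with hypothetical names (`fpkm`, `fragments`, `length_bp`, `total_mapped_fragments`). For a de novo assembly, the "exon length" term is taken here to be the length of the assembled transcript, which is an assumption.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    fragments: fragments mapped to this transcript
    length_bp: transcript (exon) length in base pairs
    total_mapped_fragments: all mapped fragments in the sample
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    millions = total_mapped_fragments / 1e6
    kilobases = length_bp / 1e3
    return fragments / (millions * kilobases)


# Hypothetical numbers: 500 fragments on a 2,000 bp transcript,
# in a sample with 10 million mapped fragments.
example = fpkm(500, 2000, 10_000_000)   # 500 / (10 * 2) = 25.0
```

Dividing by both the sequencing depth and the transcript length is what makes values comparable across the eight samples and across transcripts of different sizes.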
3. EXPERIMENTAL DESIGN
DATA ANALYSIS
B) Differential expression
• Different selected variables: different ODAP content
• Different ODAP content: different expressed genes

Sample  Variety      Tissue  Environmental conditions
1       ‘Jamalpur’   Seed    Drought
2       ‘Jamalpur’   Seed    Control
3       ‘Jamalpur’   Stem    Drought
4       ‘Jamalpur’   Stem    Control
5       ‘LS-8603’    Seed    Drought
6       ‘LS-8603’    Seed    Control
7       ‘LS-8603’    Stem    Drought
8       ‘LS-8603’    Stem    Control

Look for the more highly expressed genes: probably synthesizing ODAP
3. EXPERIMENTAL DESIGN
DATA ANALYSIS
C) Statistical analysis
ANOVA detects differentially expressed genes: it considers the possible effect of all the level combinations over the response variable (expression level for each gene)
• Factor: variable which takes values in the experiment
• Level: possible values for each factor
3. EXPERIMENTAL DESIGN
DATA ANALYSIS
C) Statistical analysis
ANOVA: 2^3 factorial design
• 2 levels
• 3 factors

Factors        Levels
Variety        Jamalpur (+)   LS-8603 (-)
Tissue         seed (+)       stem (-)
E. condition   drought (+)    control (-)
3. EXPERIMENTAL DESIGN
DATA ANALYSIS
C) Statistical analysis

Sample  Variety  Tissue  E. condition
1       +        +       +
2       +        +       -
3       +        -       +
4       +        -       -
5       -        +       +
6       -        +       -
7       -        -       +
8       -        -       -

SAMPLE 1: more ODAP content
Genes with higher expression in sample 1 than in the other samples: candidate genes for ODAP synthesis
ANOVA detects just the genes involved in ODAP synthesis
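The 2^3 design matrix and the "higher in sample 1" selection can be sketched as below. This is an illustration only: it computes main-effect contrasts rather than a full ANOVA table, the expression values are invented, and the fold threshold `min_fold` is a hypothetical parameter that does not appear in the slides.

```python
from itertools import product

FACTORS = ("variety", "tissue", "condition")


def design_matrix():
    """Rows in slide order: sample 1 is (+, +, +), sample 8 is (-, -, -)."""
    return [dict(zip(FACTORS, levels)) for levels in product((+1, -1), repeat=3)]


def main_effects(expression):
    """Mean at the (+) level minus mean at the (-) level, per factor.

    `expression` holds one value per sample (for example FPKM),
    ordered as samples 1 to 8.
    """
    design = design_matrix()
    effects = {}
    for factor in FACTORS:
        high = [y for row, y in zip(design, expression) if row[factor] == +1]
        low = [y for row, y in zip(design, expression) if row[factor] == -1]
        effects[factor] = sum(high) / len(high) - sum(low) / len(low)
    return effects


def is_candidate(expression, min_fold=2.0):
    """True if sample 1 is at least `min_fold` times every other sample."""
    top, rest = expression[0], expression[1:]
    return all(top >= min_fold * y for y in rest)


# Invented expression values for one gene, samples 1 to 8.
gene = [80, 40, 20, 10, 40, 20, 10, 5]
effects = main_effects(gene)   # variety 18.75, tissue 33.75, condition 18.75
```

A gene such as the invented one above, with positive effects for all three factors and its maximum in sample 1, is the kind of profile the slides describe as a candidate for ODAP synthesis.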
4. BUDGET ESTIMATE
• RNA extraction: $150 / sample
• RNA-seq: $3,600 / sample
• Trinity: free
• ESTScan: free
• BWA aligner: free
TOTAL = 150·8 + 3,600·8 = $30,000
5. CONCLUSIONS
• Contribution to the knowledge of Lathyrus sativus and ODAP biosynthesis
• Possible future modification: lower ODAP content variant
• Development of the project:
  • Always support our ideas with references
  • Need for collaboration
  • Lot of work!
THANK YOU FOR YOUR ATTENTION
Any questions?