In-silico Analysis for Unknown Data
Tata Santosh Rama Bhadra Rao, Agri Biotech Foundation
What is Bioinformatics?
Mathematics and Statistics
Biology
Computer Science
"All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts of biological information in databases. The information involved includes gene sequences, biological activity/function, pharmacological activity, biological structure, molecular structure, protein-protein interactions, and gene expression. Bioinformatics uses powerful computers and statistical techniques to accomplish research objectives, for example, to discover a new pharmaceutical or herbicide."
Task flow
• The data we have
• Search for similar data in available databases
• Clustal-W
• Phylogenetic analysis
• Classification
• Structural analysis
• Functional analysis
• Reporting
Data Outcome
• The data may be a nucleotide sequence (mRNA, gene or genome) or a protein sequence.
• 16S rRNA is most often used to classify a species.
• With both forward and reverse sequences the result is more accurate.
• We can also check with the protein sequence.
Genetic code table
[Figure: sample for DNA isolation; panels 1, 2 and 3, labelled DNA]
Symbol   Meaning         Explanation
G        G               Guanine
A        A               Adenine
T        T               Thymine
C        C               Cytosine
U        U               Uracil
R        A or G          puRine
Y        C or T          pYrimidine
N        A, C, G or T    Any base

Double helix
DNA   5’ A C G T C A T G 3’
      3’ T G C A G T A C 5’   (template)
RNA   5’ A C G U C A U G 3’
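The base-pairing rules in the double helix figure can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, and the pairing of the ambiguity codes (R with Y, N with N) follows from the symbol table.

```python
# Watson-Crick pairing, extended with the ambiguity codes from the symbol table.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G",
              "R": "Y", "Y": "R", "N": "N"}

def complement(strand: str) -> str:
    """Return the base-by-base complement, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the opposite strand written 5' to 3'."""
    return complement(strand)[::-1]

def transcribe(coding_strand: str) -> str:
    """The RNA carries the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")

coding = "ACGTCATG"                # top strand of the slide, 5' to 3'
print(complement(coding))          # TGCAGTAC (template strand, read 3' to 5')
print(reverse_complement(coding))  # CATGACGT (template strand, read 5' to 3')
print(transcribe(coding))          # ACGUCAUG
```

The reverse complement is also what is needed to bring a reverse sequencing read into the same orientation as the forward read before the two are compared.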
Isolation of the gene of interest from unknown sample
[Figure: workflow diagram; recovered labels follow]
• cDNA library construction kit from Stratagene
• Control mRNA: 1st strand cDNA preparation and mRNA removal
• Hybridization of stress mRNA with excess of complementary 1st strand control cDNA
• Removal of commonly hybridized population by magnetic separation
• Commonly expressed mRNA population
• Differentially up-regulated mRNA population
Gene and protein of EIF4A
ATGGCGGCGSCCACCACSTCCCGCCGCGGCGCCGGCGCCTCCCGCAGCATGGACGACGAGAACCTCACCTTCGAGACCTCCCCGGGTGTCGAGGTCGTCAGCAGCTTCGACCAGATGGGGATCAAGGACGACCTCCTCCGCGGCATCTACGGCTACGGGTTCGAGAAGCCCTCCGCCATCCAGCAGCGCGCCGTCCTCCCCATCATCAACGGACGCGACGTCATCGCGCAGGCCCAGTCCGGCACCGGGAAGTCATCCATGATCTCACTCACCGTATGCCAGATCGTCGACACCGCAGTCCGCGAGGTCCAGGCTCTGATCCTCTCACCCACCAGGGAGCTCGCTTCGCAGACAGAGAAGGTTATGCTGGCTGTCGGCGACTACCTCAATATCCAAGTGCACGCTTGCATTGGTGGGAAAAGTATCAGCGAGGATATCAGGAGGCTTGAGAACGGAGTCCATGTTGTCTCTGGGACTCCGGGCAGAGTCTGCGATATGATCAAGAGGAGGACCCTGCGGACAAGAGCCATCAAGCTTCTAGTTCTGGATGAGGCTGATGAGATGTTGAGCAGAGGCTTTAAGGATCAGATTTACGATGTCTACAGATACCTCCCACCCGAACTTCAGGTCGTTTTGATCTCCGCCACTCTTCCTCACGAGATCCTAGAGATGACTAGCAAGTTCATGACCGAACCAGTTAGGATCCTTGTGAAGCGTGATGAGTTGACCCTGGAGGGTATCAAACAATTCTTCGTTGCTGTTGAGAAAGAGGAATGGAAGTTTGATACGCTGTGTGATCTTTATGATACGTTGACCATCACCCAAGCTGTTATTTTCTGCAATACTAAGAGAAAGGTGGATTGGCTTACTGAAAGAATGCGCAGCAATAACTTCACAGTATCAGCTATGCATGGTGACATGCCCCAACAGGAAAGGGATGCCATCATGACAGAGTTCAGGTCTGGTGCAACTCGTGTGCTAATCACTACGGATGTTTGGGCTCGAGGGCTGGATGTTCAGCAGGTTTCACTTGTCATAAATTATGATCTCCCAAATAATCGTGAGCTTTACATCCATCGCATCGGTCGCTCTGGTCGTTTTGGGCGCAAGGGTGTGGCGATCAATTTTGTGCGCAAGGATGACATCCGTATCCTGAGGGATATAGAACAGTACTACAGCACACAAATTGATGAGATGCCAATGAATGTTGCTGATCTAATTTGA
"MAAXTTSRRGAGASRSMDDENLTFETSPGVEVVSSFDQMGIKDDLLRGIYGYGFEKPSAIQQRAVLPIINGRDVIAQAQSGTGKSSMISLTVCQIVDTAVREVQALILSPTRELASQTEKVMLAVGDYLNIQVHACIGGKSISEDIRRLENGVHVVSGTPGRVCDMIKRRTLRTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTSKFMTEPVRILVKRDELTLEGIKQFFVAVEKEEWKFDTLCDLYDTLTITQAVIFCNTKRKVDWLTERMRSNNFTVSAMHGDMPQQERDAIMTEFRSGATRVLITTDVWARGLDVQQVSLVINYDLPNNRELYIHRIGRSGRFGRKGVAINFVRKDDIRILRDIEQYYSTQIDEMPMNVADLI"
In-silico generated protein structures
ABOUT THE GENE AND PROTEIN
GENE LENGTH : 1224 bp
NUMBER OF INTRONS : 7
NUMBER OF EXONS : 8
GENE MOLECULAR WEIGHT : 378411.66 - 378491.72 Daltons
PROTEIN LENGTH : 407 AA
PROTEIN MOLECULAR WEIGHT : 45.2 kDa
ISOELECTRIC POINT : 6.10
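These figures can be cross-checked with simple arithmetic. The sketch below treats the reported 1224 bp as the coding sequence and uses a rule-of-thumb average of about 110 Da per amino acid residue; that average is an assumption made for this check, not a value from the slides.

```python
gene_length_bp = 1224          # length reported on the slide, treated as coding sequence
codons = gene_length_bp // 3   # 408 codons
residues = codons - 1          # the last codon (TGA) is a stop codon

AVERAGE_RESIDUE_MASS_DA = 110  # assumed rule-of-thumb average residue mass
approx_mass_kda = residues * AVERAGE_RESIDUE_MASS_DA / 1000

print(residues)         # 407
print(approx_mass_kda)  # 44.77
```

The result matches the reported protein length of 407 AA, and the rough mass estimate is close to the reported 45.2 kDa.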
Search for similar data in available databases
• The data are subjected to a similarity search in NCBI, Phytozome or other available databases using the BLAST tool.
• Download the matching data from the database.
Note:
• Always keep the data in Notepad (plain text) for working convenience.
• The data presented here are unpublished.
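Sequences kept as plain text are usually stored in FASTA format, which is also what BLAST and Clustal-W accept as input. The following is a minimal sketch of reading and writing that format; the function names and the record name `query_eIF4A` are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}

def to_fasta(records, width=60):
    """Write {header: sequence} back out, wrapping sequence lines at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

# Hypothetical record name; the residues are the start of the eIF4A protein.
text = ">query_eIF4A\nMAAXTTSRRG\nAGASRSMDDE\n"
records = parse_fasta(text)
print(records["query_eIF4A"])  # MAAXTTSRRGAGASRSMDDE
```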
BLAST
Clustal-W
• The finalized data are then subjected to Clustal alignment to assess sequence similarity.
• Clustal-W is a multiple sequence alignment tool that finds and maps the similarities between sequences.
• It accepts both nucleotide and protein sequences.
• Protein sequences are mostly used for the alignment, for accuracy.
Mega
SB4g             RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTSKFMTEP 232
SACetif          RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTSKFMTEP 232
ZEAMMB73         RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTSKFMTEP 231
SIDb             RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEITSKFMTEP 232
PgeiF4a          RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTSKFMTEP 232
OS3g             RTRAIKLLILDEADEMLGRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTSKFMTDP 229
H                RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHDILEITSKFMTDP 237
Phys             RTRSIKLLILDESDEMLSRGFKDQIYDVYRYLPPELQVVLVSATLPHEILEMTNKFMTDP 222
Jat              RTRAIRLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVVLISATLPNEILEMTSKFMTDP 235
RC               RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVVLISATLPNEILEMTSKFMTDP 232
GM               RTRAIKMLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 232
PHAVU            RTRAIKMLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 231
CA               RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 231
M                RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 231
CS-EIF4A-3-like  RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTNKFMTDP 235
MD               RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTNKFMTEP 227
                 ***:*::*:***:****.****************:*** *:*****::***:*.****:*
Alignment for retrieved sequences
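The last line of a Clustal alignment marks how well each column is conserved. As a simplified sketch, the code below reproduces only the `*` marks (columns where every sequence has the same residue); the `:` and `.` marks, which Clustal assigns to groups of similar residues, are not modelled here. The three rows are columns 31 to 50 of the PgeiF4a, SB4g and GM rows in the alignment above, and the function name is hypothetical.

```python
def identity_line(aligned):
    """Return '*' for fully identical, gap-free columns and ' ' otherwise."""
    return "".join(
        "*" if len(set(column)) == 1 and "-" not in column else " "
        for column in zip(*aligned)
    )

# Columns 31-50 of three rows taken from the alignment.
window = {
    "PgeiF4a": "YLPPELQVVLISATLPHEIL",
    "SB4g":    "YLPPELQVCLISATLPHEIL",
    "GM":      "YLPPDLQVCLISATLPHEIL",
}
for name, seq in window.items():
    print(f"{name:8s} {seq}")
print(f"{'':8s} {identity_line(window.values())}")
```

Only columns 5 and 9 of this window differ (E/D and V/C), so they are the two unmarked positions.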
Phylogenetic analysis
• After alignment, the data are subjected to phylogenetic analysis.
• Here the relationship between the data sources is evaluated.
• The most similar sequences are placed close together; less similar sequences are placed farther apart.
• By measuring these distances we can quantify the relationship between the data sources.
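One simple way to put a number on the distance between two aligned sequences is the p-distance: the fraction of aligned positions at which they differ. The sketch below uses it only as an illustration; tree-building programs such as MEGA offer several distance models, and the slides do not say which one was used. The sequences are columns 31 to 50 of three rows of the alignment, and the function name is hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing positions, ignoring columns with a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

# Columns 31-50 of three rows taken from the alignment.
window = {
    "PgeiF4a": "YLPPELQVVLISATLPHEIL",
    "SB4g":    "YLPPELQVCLISATLPHEIL",
    "GM":      "YLPPDLQVCLISATLPHEIL",
}
for (name1, seq1), (name2, seq2) in combinations(window.items(), 2):
    print(f"{name1:8s} {name2:8s} {p_distance(seq1, seq2):.2f}")
```

In this window PgeiF4a differs from SB4g at one position and from GM at two, so SB4g would be placed nearer to PgeiF4a than GM would.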
Phylogenetic tree
Fig 1:
• 20-404: P-loop containing nucleoside triphosphate hydrolase (IPR027417)
• 34-62: RNA helicase, DEAD-box type, Q motif (IPR014014)
• 246-407: Helicase C-terminal (IPR001650)
• 183-186: Presence of the DEAD amino acids
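The position of a short motif such as DEAD can be confirmed with a plain string search. This is a minimal sketch with a hypothetical function name; the fragment is residues 169 to 192 of the translated protein, so the search is offset by 169 to report positions in the full-length sequence.

```python
def find_motif(sequence, motif, start_position=1):
    """Return (first, last) 1-based positions of every occurrence of motif."""
    hits = []
    index = sequence.find(motif)
    while index != -1:
        first = start_position + index
        hits.append((first, first + len(motif) - 1))
        index = sequence.find(motif, index + 1)
    return hits

fragment = "RRTLRTRAIKLLVLDEADEMLSRG"  # residues 169-192 of the eIF4A protein
print(find_motif(fragment, "DEAD", start_position=169))  # [(183, 186)]
```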
ATG GCG GCG SCC ACC ACS TCC CGC CGC GGC GCC GGC GCC TCC CGC AGC ATG GAC GAC GAG AAC CTC ACC TTC
M A A X T T S R R G A G A S R S M D D E N L T F 24
GAG ACC TCC CCG GGT GTC GAG GTC GTC AGC AGC TTC GAC CAG ATG GGG ATC AAG GAC GAC CTC CTC CGC GGC
E T S P G V E V V S S F D Q M G I K D D L L R G 48
ATC TAC GGC TAC GGG TTC GAG AAG CCC TCC GCC ATC CAG CAG CGC GCC GTC CTC CCC ATC ATC AAC GGA CGC
I Y G Y G F E K P S A I Q Q R A V L P I I N G R
GAC GTC ATC GCG CAG GCC CAG TCC GGC ACC GGG AAG TCA TCC ATG ATC TCA CTC ACC GTA TGC CAG ATC GTC
D V I A Q A Q S G T G K S S M I S L T V C Q I V
GAC ACC GCA GTC CGC GAG GTC CAG GCT CTG ATC CTC TCA CCC ACC AGG GAG CTC GCT TCG CAG ACA GAG AAG
D T A V R E V Q A L I L S P T R E L A S Q T E K
GTT ATG CTG GCT GTC GGC GAC TAC CTC AAT ATC CAA GTG CAC GCT TGC ATT GGT GGG AAA AGT ATC AGC GAG
V M L A V G D Y L N I Q V H A C I G G K S I S E
GAT ATC AGG AGG CTT GAG AAC GGA GTC CAT GTT GTC TCT GGG ACT CCG GGC AGA GTC TGC GAT ATG ATC AAG
D I R R L E N G V H V V S G T P G R V C D M I K
AGG AGG ACC CTG CGG ACA AGA GCC ATC AAG CTT CTA GTT CTG GAT GAG GCT GAT GAG ATG TTG AGC AGA GGC
R R T L R T R A I K L L V L D E A D E M L S R G
TTT AAG GAT CAG ATT TAC GAT GTC TAC AGA TAC CTC CCA CCC GAA CTT CAG GTC GTT TTG ATC TCC GCC ACT
F K D Q I Y D V Y R Y L P P E L Q V V L I S A T
CTT CCT CAC GAG ATC CTA GAG ATG ACT AGC AAG TTC ATG ACC GAA CCA GTT AGG ATC CTT GTG AAG CGT GAT
L P H E I L E M T S K F M T E P V R I L V K R D
GAG TTG ACC CTG GAG GGT ATC AAA CAA TTC TTC GTT GCT GTT GAG AAA GAG GAA TGG AAG TTT GAT ACG CTG
E L T L E G I K Q F F V A V E K E E W K F D T L
TGT GAT CTT TAT GAT ACG TTG ACC ATC ACC CAA GCT GTT ATT TTC TGC AAT ACT AAG AGA AAG GTG GAT TGG
C D L Y D T L T I T Q A V I F C N T K R K V D W
CTT ACT GAA AGA ATG CGC AGC AAT AAC TTC ACA GTA TCA GCT ATG CAT GGT GAC ATG CCC CAA CAG GAA AGG
L T E R M R S N N F T V S A M H G D M P Q Q E R
GAT GCC ATC ATG ACA GAG TTC AGG TCT GGT GCA ACT CGT GTG CTA ATC ACT ACG GAT GTT TGG GCT CGA GGG
D A I M T E F R S G A T R V L I T T D V W A R G
CTG GAT GTT CAG CAG GTT TCA CTT GTC ATA AAT TAT GAT CTC CCA AAT AAT CGT GAG CTT TAC ATC CAT CGC
L D V Q Q V S L V I N Y D L P N N R E L Y I H R
ATC GGT CGC TCT GGT CGT TTT GGG CGC AAG GGT GTG GCG ATC AAT TTT GTG CGC AAG GAT GAC ATC CGT ATC
I G R S G R F G R K G V A I N F V R K D D I R I
CTG AGG GAT ATA GAA CAG TAC TAC AGC ACA CAA ATT GAT GAG ATG CCA ATG AAT GTT GCT GAT CTA ATT TGA
L R D I E Q Y Y S T Q I D E M P M N V A D L I *
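The codon-by-codon translation above can be reproduced with the standard genetic code. The sketch below is one way to do it, not necessarily the tool used for the slides. It also handles the ambiguity code S (C or G), which appears twice in this sequence: SCC can encode A or P and is therefore reported as X, while ACS encodes T either way. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
         "K": "GT", "M": "AC", "N": "ACGT"}

def translate_codon(codon):
    """Translate one codon; return 'X' if its ambiguity allows several residues."""
    choices = product(*(IUPAC[base] for base in codon))
    residues = {CODON_TABLE["".join(choice)] for choice in choices}
    return residues.pop() if len(residues) == 1 else "X"

def translate(dna):
    """Translate a coding sequence in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))

print(translate("ATGGCGGCGSCCACCACSTCC"))  # MAAXTTS (first 7 codons)
print(translate("GATCTAATTTGA"))           # DLI*    (last 4 codons)
```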
Structural analysis
• Structural analysis of the protein is carried out through homology modeling and docking.
• Both the secondary structure and the tertiary structure of the protein sequence must be analyzed.
• The resulting structure must be evaluated against nuclear magnetic resonance (NMR) and X-ray crystallographic scores.
• The Ramachandran plot is especially important for structural validation.
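A Ramachandran plot shows the two backbone dihedral angles of each residue, phi (atoms C of the previous residue, N, CA, C) and psi (atoms N, CA, C, N of the next residue). Validation tools compute these angles from the atomic coordinates of the model. The sketch below shows only the underlying geometry, with made-up coordinates and hypothetical function names; it is not a replacement for a validation program.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

# Made-up coordinates: the central bond lies along the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(round(abs(cis)), round(abs(trans)))  # 0 180
```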
In-silico analysis of eIF4A
Homology modeling: using Modeller version 9.12, we modeled the structure of eIF4A from Pennisetum glaucum.
[Figure labels: α-helices; β-pleated sheets; DEAD box motif]
Fig: Homology model of the amino acid sequence of eIF4A from P. glaucum, revealing the signature DEAD box motifs and Mg2+ binding sites. eIF4A showed the ----helices and --------sheets.
Nuclear magnetic resonance analysis for protein structure
Ramachandran plots for the rice and pearl millet eIF4A structures, generated with PROCHECK
Peptide position and bonds
Functional analysis
• Functional analysis is based on domains, conserved motifs and active sites.
• These are evaluated through docking and amino acid composition.
• The protein structure can be described in terms of its α-helices and β-pleated sheets.
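Amino acid composition is simply the percentage of each residue in the sequence. A minimal sketch follows, using residues 169 to 192 of the eIF4A protein as input; the function name is hypothetical, and in practice the full-length sequence would be used.

```python
from collections import Counter

def composition(protein):
    """Return {residue: percentage}, most frequent residue first."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    counts = Counter(residues)
    total = len(residues)
    return {aa: 100 * n / total for aa, n in counts.most_common()}

fragment = "RRTLRTRAIKLLVLDEADEMLSRG"  # residues 169-192 of the eIF4A protein
for aa, percent in composition(fragment).items():
    print(f"{aa} {percent:5.1f}%")
```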
Docking analysis and motif localization in Pennisetum glaucum eIF4A
Docking analysis was performed with Sybyl version 6.7 to study the motifs and structural stability.
Active sites, motifs and domains of eIF4A in rice and pearl millet, respectively, identified by docking studies
Classification
• Functional analysis and structural analysis together can classify our protein.
• First, phylogenetic analysis gives the relationships of the protein.
• Structural and functional characteristics can then be included to arrive at a clear classification.
Reporting
• Once the data have been evaluated accurately in this way, you can publish or report them.
• Many submissions and sequence uploads take place at various levels.
• Genes, proteins and genomes are all reported to these databases.
• They are then available for further research.
Conclusion
• With in-silico studies you can get 60 to 70% accuracy in the information about your work.
• With this you can confirm whether you are working on the right target before starting your in-vitro studies.
• So you can proceed with your work on the basis of 70% in-silico information and complete the project with 100% success in .
Acknowledgement
• Agri Biotech Foundation
• Department of Biotechnology
• Prof. G. Pakkireddy
• Dr. J. S. Bentur
• Dr. G. Mallikarjun
• My friends and colleagues
• Dearest participants (transformed with high energy and patience)