8/12/2019 FINALbioinformatics SEMINAR
BIOINFORMATICS & COMPUTING
METHODS
Presented by:
Sudhakar Tripathi, Research Scholar
Computer Engineering Department
IT-BHU
Supervisor:
Prof. R.B.Mishra
Bioinformatics Definition
An interdisciplinary field involving biology,
computer science, mathematics and statistics
to analyze biological sequence data, genome
content, arrangement and to predict the
function & structure of macromolecules.
-David C. Mount
What is Bioinformatics?
The creation and development of advanced
information and computational technologies
for problems in biology, most commonly
molecular biology (but increasingly in other areas of biology). As such, it deals with
methods for storing, retrieving and analyzing
biological data, such as nucleic acid (DNA/RNA) and protein sequences, structures,
functions, pathways and genetic interactions.
Need for and Use of Bioinformatics
Bioinformatics plays a key role in modern biology and is especially important
in:
- Molecular biology
- Genomics
- Functional genomics
- Systems biology
- Protein design and engineering
- Pharmaceutical development
- Medicine
- Ecology / population genetics
- Finding genes, locating coding regions, predicting function
- Function, Evolution, Sequence, Structure (FESS relationships)
- Metabolic genotype, phenotype, redundancy
- Genes to Pathways; Genes to Biological Knowledge
- Proteomics: Proteome of an Organism
- Assigning Gene Sets to different Species: Orthologs vs Paralogs
- Expression profiles, relation to Metabolic Pathways / Genetic Networks; experimentally analyse thousands of genes simultaneously
- Gene Synteny between Species: Gene Adjacency in Genomes
- Polymorphisms, Haplotypes, Propensity for Genetic Disease
- Searching databases for nucleotide or amino acid sequences that match sequences in unknown samples
- Inferring a protein's shape and function from a given sequence of amino acids
- Finding all the genes and proteins in a given genome
- Determining sites in the protein structure where drug molecules can be attached
Aim of research in Bioinformatics
Understand the functioning of living things - to
improve the quality of life.
drug design
identification of genetic risk factors
gene therapy
genetic modification of food crops and animals, etc.
application to e.g. biotechnology
How will this benefit humanity?
Genetically modified crops - contamination escapes
Genetically modified ... - whisky?
Genes & behaviour - really?
Testing on animals - why?
Gene therapy - benefits outweigh dangers?
Bio weapons?
Genetic material
Information transfer (mRNA)
Protein synthesis (tRNA/mRNA)
Some catalytic activity
Most cellular functions are performed or facilitated by proteins.
Primary biocatalyst
Cofactor transport/storage
Mechanical motion/support
Immune protection
Control of growth/differentiation
Genome Sequence
Finding Genes in Genomic DNA
introns
exons
promoters
Characterizing Repeats in
Genomic DNA
Statistics
Patterns
Duplications in the Genome
The Complexity of Biological Data
Nucleotide sequences
Nucleotide structures
Gene expressions
Protein structures
Protein functions
Protein-protein interaction (pathways)
Cell
Cell signaling
Tissues
Organs
Physiology
Organisms
Basic cell architecture
Cells are the smallest functional units of life
Types of cell
Prokaryotes | Eukaryotes
Single cell | Single or multi cell
No nucleus | Nucleus
No membrane-bound organelles | Membrane-bound organelles
Typically one circular chromosome (plasmids may also be present) | Multiple linear chromosomes
Generally no mRNA post-transcriptional modification | Exon/intron splicing
Proteins
Proteins are biological molecules of primary importance to the functioning of living organisms
Perform many and varied functions
Structural proteins: the organism's basic building blocks, e.g. collagen, nails, hair, etc.
Enzymes: biological engines which mediate a multitude of biochemical reactions. Usually
enzymes are very specific and catalyze only a single type of reaction, but they can play a
role in more than one pathway.
Transmembrane proteins: they are the cell's housekeepers, e.g. by regulating cell
volume, extracting and concentrating small molecules from the extracellular
environment, and generating ionic gradients essential for muscle and nerve cell function (the sodium/potassium pump is an example)
Understanding protein structure is key to understanding function and dysfunction
Amino Acid Sequences
AAs polymerised into chains (residues)
Gene sequence determines protein sequence
Protein Structure
Chains fold into specific compact structures
Structure formation (folding) is spontaneous
Sequence determines structure
Structure determines function
Information flow in the cell - Central Dogma
DNA (4 bases, {A,C,G,T}) transcribed into
RNA (4 bases, {A,C,G,U}) translated into
Protein (20 amino acid residues,
{A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y}) by triplets
(codons) of RNAs
UCA -> Serine (S)
Start codon AUG -> Methionine (M)
3 stop codons (UGA, UAA, UAG) in most species
As always in biology, there are exceptions!
Some species use different stop codons. The codon table (codon -> AA) is not the same for
all species; mitochondria, for example, use a different codon table.
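The information flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `transcribe` and `translate` are made up for this example, the input DNA is taken to be the coding strand (so transcription is just T -> U), and the table built here is the standard genetic code, not one of the variant codes mentioned above.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order.
# "*" marks the three stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(dna):
    """Coding-strand DNA -> mRNA (assumption: input is the coding strand)."""
    return dna.upper().replace("T", "U")


def translate(rna, table=CODON_TABLE):
    """Translate from the first AUG until a stop codon or the end of the RNA."""
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        aa = table[rna[i:i + 3]]
        if aa == "*":  # stop codon reached
            break
        protein.append(aa)
    return "".join(protein)


print(translate(transcribe("ATGTCATGA")))  # AUG UCA UGA
```

To model one of the exceptions, pass a modified copy of the table as `table`. In the vertebrate mitochondrial code, for instance, UGA encodes tryptophan rather than stop.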
How DNA Codes for Protein
Various Problem areas in Bioinformatics
Sequencing
Sequence analysis
Sequence alignment
The RNA Secondary Structure Prediction
Identifying Gene Regulatory Networks
Protein structure analysis
Protein structure comparison
Protein folding
Domain pattern recognition
Sequence representation
Genotype Analysis
Splicing Site prediction
Protein - protein interaction
Database development
Modeling genetics History
Ancient DNA
cDNAs
Population Genetics Simulations
Finding SNPs
Genome-wide Association Studies
Homology Search
The Sequence DB Search problem
Efficient searching in large data sets
Interfacing with data to support genomics research - software,
databases, and
HGT Analysis
Finding signal in the datasets - statistical and computational
methods
Need to get more efficient in how the data is processed,
organized, and accessed
How do we represent the large amount of data? Dynamically
and interactively?
Gene network reconstruction from time series data
Gene function prediction
Clustering of Gene Expression Data
Characterization of Metabolic Pathways between Different
Genomes
Organizing biological knowledge in databases
Signal transduction and other biochemical pathways
Phylogenetics: predicting the genetic or evolutionary relationships
among a set of organisms.
Alternative splicing
Gene disease relationships
Microarray data collection, calibration and analysis
Polymorphism Analysis and visualization
Pathway Analysis: sequence comparison, searches in sequence
databases
Sequence Matching: tracing phylogeny, finding family
relationships between species by tracking similarities between species.
Molecular Networks
Protein Threading
Sequence Comparisons and Sequence-Based Database Searches
Clinical Diagnosis
Gene Expression Prediction
Genetic Linkage Analysis
Protein Function Prediction
Various Computational methods
used in Bioinformatics
Mathematical Computing methods
Statistical Computing methods
Intelligent Computing methods
Neural Network Approaches
Integrated Differential Fuzzy Clustering
Fuzzy Computing
Genetic and Evolutionary Computing Algorithms
Probabilistic Computing and Belief Networks
HYBRID INTELLIGENT SYSTEMS
Swarm Intelligence
Rough Set Theory
Granular Computing
Artificial Immune Systems
Chaos Theory
The Differential Evolution Algorithm
Soft Computing
Dynamic Programming & various Algorithmic Computations
Gene Prediction
Overview of steps & strategies
What sequence signals can be used?
What other types of information can be used?
Algorithms
HMMs, discriminant functions, neural nets
Gene prediction software
3 major types
many, many programs!
Overview of gene prediction strategies
What sequence signals can be used?
Transcription: TF binding sites, promoter, initiation site, terminator
Processing signals: splice donor/acceptor sites, polyA signal
Translation: start (AUG = Met) & stop (UGA, UAA, UAG)
ORFs, codon usage
What other types of information can be used?
cDNAs & ESTs (experimental data, pairwise alignment)
Homology (sequence comparison, BLAST)
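The translation signals listed above (a start codon, an in-frame stop codon, and the open reading frame between them) can be illustrated with a minimal ORF scanner. This is a sketch, not how the gene finders named in this deck work: the function name `find_orfs` and the `min_codons` parameter are invented for the example, only the forward strand is scanned, and introns are ignored, so it is closer to the prokaryotic case.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA spellings of UAA, UAG, UGA


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon,
    with `end` pointing just past the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


for start, end, frame in find_orfs("CCATGAAATTTTAGGG"):
    print(frame, start, end)
```

A fuller scanner would also search the reverse complement and would score candidates by codon usage, which is the other translation-level signal named on the slide.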
Examples of gene prediction software
1) Similarity-based or Comparative
BLAST
SGP2 (extension of GeneID)
2) Ab initio = from the beginning
GeneID
GENSCAN
GeneMark.hmm
3) Combined "evidence-based"
GeneSeqer (Brendel et al., ISU)
BEST? GENSCAN, GeneMark.hmm, GeneSeqer
but depends on organism & specific task
Signals: Pre-mRNA Splicing
[Figure: genomic DNA (exons and introns, start codon to stop codon) -> transcription -> pre-mRNA (Cap- ... -Poly(A)) -> splicing -> mRNA (Cap- ... -Poly(A)) -> translation -> protein. Splice sites: donor site (GT) and acceptor site (AG) at the intron boundaries.]
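The GT donor and AG acceptor signals can be turned into a naive search for candidate introns. The sketch below is only an illustration: `candidate_introns` and `min_len` are hypothetical names, every GT/AG pair is reported without any scoring, and real gene finders weigh these signals with statistical models such as the HMMs and neural nets listed earlier.

```python
def candidate_introns(dna, min_len=6):
    """List (start, end) spans that begin with GT and end with AG.

    `end` is exclusive, so dna[start:end] is the candidate intron.
    """
    dna = dna.upper()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    return [(d, a + 2) for d in donors for a in acceptors
            if a + 2 - d >= min_len]


def splice(dna, intron):
    """Remove one candidate intron and join the flanking exons."""
    start, end = intron
    return dna[:start] + dna[end:]


seq = "AAGTCCCCAGTT"
for intron in candidate_introns(seq):
    print(seq[intron[0]:intron[1]], "->", splice(seq, intron))
```

Because GT and AG are so short, most matches in a real genome are false positives, which is why the dinucleotides alone are not enough to predict splice sites.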
Post Transcription Splicing
[Figure: genomic DNA with start and stop codons; spliced mRNA (Cap- ... -Poly(A)) with 5'-UTR, start codon, stop codon and 3'-UTR.]
Horizontal Gene Transfer
The movement of genetic material BETWEEN prokaryotes (rather than from parent to offspring)
Common in prokaryotes. Useful for environmental adaptation (can confer new traits faster than point mutations alone)
PHYLOGENY
Homology & Similarity
Homology: conserved sequences arising from a common ancestor
Orthologs: homologous genes that share a common ancestor in the absence of any gene duplication, i.e. separated by speciation (mouse and human hemoglobin)
Paralogs: genes related through gene duplication (one gene is a copy of another, e.g. fetal and adult hemoglobin)
Similarity: genes that share common sequences but are not necessarily related
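Similarity, unlike homology, is something that can be measured directly. A minimal sketch, assuming the two sequences have already been aligned to equal length with "-" marking gaps (the function name `percent_identity` and the gap handling are choices made for this example):

```python
def percent_identity(a, b):
    """Percent of aligned, ungapped positions where two sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


print(percent_identity("ACGT-A", "ACCTTA"))
```

A high score is evidence for homology but does not prove it, which is the distinction the slide draws: homology is an inference about ancestry, similarity is an observation.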
Phylogenetics
What is Phylogenetics?
Science of identifying and interpreting
evolutionary relationships between biological
entities (species, genes, etc)
What is a phylogenetic tree?
Dendrogram (tree) composed of nodes and
branches representing the putative genealogy
of the taxonomic units
Phylogenetic Trees
A Graph Representing The Evolutionary History Of
Sequences
Relationship of sequences to one another (how everything is connected)
Dissect the order of appearance of insertions, deletions, and mutations
Identify related sequences, predict function,
observe epidemiology (analyze changes in viral strains)
Tree Characteristics
Tree Properties
Clade: all the descendants of a common ancestor represented by a node
Distance: number of changes that have taken place along a branch
Tree Types
Cladogram: shows the branching order of nodes
Phylogram: shows branching order and distances
[Figure: phylogram of taxa A, B, C and D with branch lengths .035, .009, .057, .044, .012 and .016]
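The difference between a cladogram and a phylogram, and the idea of a clade, can be shown with a small tree data structure. The topology below and the assignment of the six branch lengths to particular branches are hypothetical, since the original figure is not recoverable; the function names are also invented for this sketch. Output is written in Newick notation, a common text format for trees.

```python
# A node is either a leaf name (str) or a list of (child, branch_length) pairs.
# Hypothetical topology: ((A,B),(C,D)); lengths assigned arbitrarily.
TREE = [
    ([("A", 0.035), ("B", 0.009)], 0.012),
    ([("C", 0.057), ("D", 0.044)], 0.016),
]


def leaves(node):
    """All taxa descending from a node, i.e. the clade it defines."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child, _ in node))


def clades(node):
    """Yield the leaf set under every internal node, root first."""
    if isinstance(node, str):
        return
    yield leaves(node)
    for child, _ in node:
        yield from clades(child)


def newick(node, lengths=True):
    """Newick text: with lengths a phylogram, without them a cladogram."""
    if isinstance(node, str):
        return node
    parts = []
    for child, length in node:
        text = newick(child, lengths)
        parts.append(f"{text}:{length}" if lengths else text)
    return "(" + ",".join(parts) + ")"


print(newick(TREE, lengths=False) + ";")  # branching order only
print(newick(TREE) + ";")                 # branching order and distances
```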
Methods
Distance-based
Parsimony
Maximum likelihood
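As an illustration of the distance-based family, here is a minimal sketch of UPGMA, one well-known distance-based clustering method; the slide does not say which distance method the speaker had in mind. The sketch assumes pre-aligned sequences of equal length, uses the simple proportion of differing sites as the distance, and returns only the branching order (no branch lengths). The names `p_distance` and `upgma` are chosen for this example.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Cluster aligned sequences with UPGMA; return a nested-tuple tree."""
    names = list(seqs)
    trees = {i: name for i, name in enumerate(names)}  # cluster id -> subtree
    sizes = {i: 1 for i in trees}                      # cluster id -> leaf count
    dist = {(i, j): p_distance(seqs[names[i]], seqs[names[j]])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = ((sizes[i] * d_ik + sizes[j] * d_jk)
                                  / (sizes[i] + sizes[j]))
        del dist[(i, j)]
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))


toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGAACTT", "D": "TCGAACTA"}
print(upgma(toy))
```

UPGMA assumes a constant rate of change along all lineages; parsimony and maximum likelihood, the other two families on the slide, work from the aligned characters themselves instead of a distance matrix.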
Levels of Protein Structure
Primary (1°) structure: amino acid sequence of protein
Secondary (2°) structure: local structure (alpha helices or beta strands)
Tertiary (3°) structure: 3-dimensional structure of protein
Quaternary (4°) structure: structure of a multiple protein complex
Protein Structure Prediction
Protein structures can be determined
experimentally (in most cases) by
x-ray crystallography
nuclear magnetic resonance (NMR)
but this is very expensive and time-consuming
Can we predict structures by computational
means instead?
PDB Content Growth
Methods for Secondary Structure Prediction
Chou-Fasman method
Based on the propensities of different amino acids to adopt different
secondary structures
Predictions are made using a rules-based approach to identify groups of
amino acids with shared secondary structure propensities
Garnier, Osguthorpe, Robson (GOR) method
Statistical method of secondary structure prediction based on information theory & Bayesian probability
Multiple Sequence Alignment (MSA) methods
Perform secondary structure prediction on a multiple sequence alignment as opposed to a single protein sequence
Neural network-based methods
Example: Profile network from Heidelberg (PHD)
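The propensity idea behind the Chou-Fasman method can be sketched with a sliding window. This is a simplification, not the published rule set: the function name, window size and threshold are assumptions, and the propensity values in `TOY_HELIX_PROPENSITY` are made-up placeholders, not the published Chou-Fasman parameters.

```python
# Placeholder values for illustration only (NOT the published parameters).
# Values above 1.0 stand for residues that favour helix, below 1.0 for
# residues that disfavour it.
TOY_HELIX_PROPENSITY = {"A": 1.4, "E": 1.5, "L": 1.2, "G": 0.6, "P": 0.6}


def helix_windows(seq, propensity, window=6, threshold=1.0):
    """Start indices of windows whose mean helix propensity exceeds threshold.

    Residues missing from the table are treated as neutral (1.0).
    """
    hits = []
    for i in range(len(seq) - window + 1):
        scores = [propensity.get(aa, 1.0) for aa in seq[i:i + window]]
        if sum(scores) / window > threshold:
            hits.append(i)
    return hits


print(helix_windows("AELAELGPGPGP", TOY_HELIX_PROPENSITY))
```

The full method adds nucleation and extension rules and a separate table for beta strands; the window average shown here only captures the first step, finding stretches enriched in helix-favouring residues.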
Methods for Tertiary Structure
Prediction
Tertiary (3D) Structure Prediction
Homology Modeling
Fold Recognition (Protein Threading)
Ab initio structure prediction
Quaternary Structure
THANKS!