FINALbioinformatics SEMINAR

  • 8/12/2019 FINALbioinformatics SEMINAR

    1/59

    BIOINFORMATICS & COMPUTING

    METHODS

    Presented by:

    Sudhakar Tripathi, Research Scholar

    Computer Engineering Department

    IT-BHU

    Supervisor:

    Prof. R.B.Mishra

    Bioinformatics Definition

    An interdisciplinary field involving biology,

    computer science, mathematics and statistics

    to analyze biological sequence data, genome

    content, arrangement and to predict the

    function & structure of macromolecules.

    -David C. Mount

    What is Bioinformatics?

    The creation and development of advanced

    information and computational technologies

    for problems in biology, most commonly

    molecular biology (but increasingly in other areas of biology). As such, it deals with

    methods for storing, retrieving and analyzing

    biological data, such as nucleic acid (DNA/RNA) and protein sequences, structures,

    functions, pathways and genetic interactions.

    Need for and Use of Bioinformatics

    Bioinformatics plays a key role in modern biology and is especially important

    in:

    - Molecular biology

    - Genomics

    - Functional genomics

    - Systems biology

    - Protein design and engineering

    - Pharmaceutical development

    - Medicine

    - Ecology / population genetics

    - Finding genes, locating coding regions, predicting function

    - Function, Evolution, Sequence, Structure (FESS relationships)

    - Metabolic genotype, phenotype, redundancy

    - Genes to Pathways; Genes to Biological Knowledge

    - Proteomics: Proteome of an Organism

    - Assigning Gene Sets to different Species: Orthologs vs Paralogs

    - Expression profiles, relation to Metabolic Pathways / Genetic Networks

    - Experimentally analyse thousands of genes simultaneously

    - Gene Synteny between Species: Gene Adjacency in Genomes

    - Polymorphisms, Haplotypes, Propensity for Genetic Disease

    - Searching databases for nucleotide or amino acid sequences that match sequences in unknown samples

    - Inferring a protein's shape and function from a given sequence of amino acids

    - Finding all the genes and proteins in a given genome

    - Determining sites in the protein structure where drug molecules can be attached

    Aim of research in Bioinformatics

    Understand the functioning of living things - to

    improve the quality of life.

    drug design

    identification of genetic risk factors

    gene therapy

    genetic modification of food crops and animals, etc.

    application to e.g. biotechnology

    How will this benefit humanity?

    Genetically modified crops? - contamination escapes

    Genetically modified [illegible] - whisky?

    Genes & behaviour - really?

    Testing on animals - why?

    Gene therapy - benefits outweigh dangers?

    Bio weapons?

    Genetic material

    Information transfer (mRNA)

    Protein synthesis (tRNA/mRNA)

    Some catalytic activity

    Most cellular functions are performed or facilitated by proteins.

    Primary biocatalyst

    Cofactor transport/storage

    Mechanical motion/support

    Immune protection

    Control of growth/differentiation

    Genome Sequence

    Finding Genes in Genomic DNA

    introns

    exons

    promoters

    Characterizing Repeats in Genomic DNA

    Statistics

    Patterns

    Duplications in the Genome

    The Complexity of Biological Data

    Nucleotide sequences

    Nucleotide structures

    Gene expressions

    Protein Structures

    Protein functions

    Protein-protein interaction (pathways)

    Cell

    Cell signaling

    Tissues

    Organs

    Physiology

    Organisms

    Basic cell architecture

    Cells are the smallest functional units of life

    Types of cell

    Prokaryotes                                     Eukaryotes
    Single cell                                     Single or multi cell
    No nucleus                                      Nucleus
    No membrane-bound organelles                    Organelles
    One circular chromosome (often plus plasmids)   Chromosomes
    No mRNA post-transcriptional modification       Exon/intron splicing

    Proteins

    Proteins are biological molecules of primary importance to the functioning of living organisms. They perform many and varied functions.

    Structural proteins: the organism's basic building blocks, e.g. collagen and the keratin of nails and hair.

    Enzymes: biological engines which mediate a multitude of biochemical reactions. Usually enzymes are very specific and catalyze only a single type of reaction, but they can play a role in more than one pathway.

    Transmembrane proteins: the cell's housekeepers, e.g. regulating cell volume, extracting and concentrating small molecules from the extracellular environment, and generating the ionic gradients essential for muscle and nerve cell function (the sodium/potassium pump is an example).

    Understanding protein structure is key to understanding function and dysfunction

    Amino Acid Sequences: AAs polymerised into chains (residues)

    Gene sequence determines protein sequence

    Protein Structure: chains fold into specific compact structures

    Structure formation (folding) is spontaneous

    Sequence determines Structure

    Structure determines function

    Information flow in the cell - Central Dogma

    DNA (4 bases, {A,C,G,T}) is transcribed into

    RNA (4 bases, {A,C,G,U}), which is translated into

    Protein (20 amino acid residues, {A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y}) by reading triplets (codons) of RNA bases

    Example: UCA -> Serine (S)

    Start codon: AUG -> Methionine (M)

    3 stop codons (UGA, UAA, UAG) in most species

    As always in biology, there are exceptions!

    Some species use different stop codons. The codon table (codon -> AA) is not the same for all species; mitochondria, for example, have a different codon table.
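
    The information flow above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the names BASES, CODON_TABLE, transcribe and translate are hypothetical, the table is the standard genetic code, and the input is assumed to be the coding strand of the DNA.

```python
from itertools import product

# Standard genetic code, written in the usual U, C, A, G order for each
# codon position. "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T is replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(rna, table=CODON_TABLE):
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = rna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = table[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGTCATGA")))  # AUG UCA UGA
```

    Passing a different dictionary as table is one way to model the exceptions mentioned above. For instance, in the vertebrate mitochondrial code UGA encodes tryptophan, so dict(CODON_TABLE, UGA="W") would read through that codon.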

    How DNA Codes for Protein

    Various Problem areas in Bioinformatics

    Sequencing

    Sequence analysis

    Sequence alignment

    The RNA Secondary Structure Prediction

    Identifying Gene Regulatory Networks

    Protein structure analysis

    Protein structure comparison

    Protein folding

    Domain pattern recognition

    Sequence representation

    Genotype Analysis

    Splicing Site prediction

    http://en.wikipedia.org/wiki/Genotyping

    http://en.wikipedia.org/wiki/RNA_splicing

    Protein - protein interaction

    Database development

    Modeling genetic history

    Ancient DNA

    cDNAs

    Population Genetics Simulations

    Finding SNPs

    Genome-wide Association Studies

    Homology Search

    The Sequence DB Search problem

    Efficient searching in large data sets

    Interfacing with data to support genomics research - software, databases, and

    HGT Analysis

    Finding signal in the datasets - statistical and computational

    methods

    Need to get more efficient in how the data is processed,

    organized, and accessed

    How do we represent the large amount of data? Dynamically and interactively?

    Gene network reconstruction from time series data

    Gene function prediction

    Clustering of Gene Expression Data

    Characterization of Metabolic Pathways between Different

    Genomes

    Organizing biological knowledge in databases

    Signal transduction and other biochemical pathways

    Phylogenetics: predicting the genetic or evolutionary relationships of a set of organisms.

    Alternative splicing

    http://en.wikipedia.org/wiki/Phylogenetics

    Gene disease relationships

    Microarray data collection, calibration and analysis

    Polymorphism Analysis and visualization

    Pathway Analysis: sequence comparison, searches in sequence databases

    Sequence Matching: tracing phylogeny, finding family relationships between species by tracking similarities between species

    Molecular Networks

    Protein Threading

    Sequence Comparisons and Sequence-Based Database Searches

    Clinical Diagnosis

    Gene Expression Prediction

    Genetic Linkage Analysis

    Protein Function Prediction

    Various Computational Methods used in Bioinformatics

    Mathematical Computing methods

    Statistical Computing methods

    Intelligent Computing methods

    Neural Network Approaches

    Integrated Differential Fuzzy Clustering

    Fuzzy Computing

    Genetic and Evolutionary Computing Algorithms

    Probabilistic Computing and Belief Networks

    HYBRID INTELLIGENT SYSTEMS

    Swarm Intelligence

    Rough Set Theory

    Granular Computing

    Artificial Immune Systems

    Chaos Theory

    The Differential Evolution Algorithm

    Soft Computing

    Dynamic Programming & various Algorithmic Computations
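
    As one concrete instance of the dynamic programming entry, applied to the sequence alignment problem listed earlier, here is a minimal sketch of a Needleman-Wunsch style global alignment score. The slides do not name a specific algorithm, so this choice, the function name global_alignment_score and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b with a linear gap penalty.

    Only the previous row of the dynamic programming table is kept,
    so memory use is proportional to len(b).
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))
```

    Recovering the alignment itself, rather than only its score, would require keeping the full table and tracing back from the last cell.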

    Gene Prediction

    Overview of steps & strategies

    What sequence signals can be used?

    What other types of information can be used?

    Algorithms

    HMMs, discriminant functions, neural nets

    Gene prediction software

    3 major types

    many, many programs!

    Overview of gene prediction strategies

    What sequence signals can be used?

    Transcription: TF binding sites, promoter, initiation site, terminator

    Processing signals: splice donor/acceptors, polyA signal

    Translation: start (AUG = Met) & stop (UGA, UAA, UAG)

    ORFs, codon usage

    What other types of information can be used?

    cDNAs & ESTs (experimental data, pairwise alignment)

    Homology (sequence comparison, BLAST)
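
    The translation signals above (a start codon, an in-frame stop codon, and the open reading frame between them) can be scanned for directly. The following is a minimal sketch under simplifying assumptions: it looks only at the forward strand of an RNA sequence, ignores splicing and codon usage, and the names find_orfs and min_codons are hypothetical.

```python
STOP_CODONS = {"UGA", "UAA", "UAG"}


def find_orfs(rna, min_codons=2):
    """Return (start, end) index pairs of open reading frames in rna.

    Each ORF runs from an AUG to the first in-frame stop codon; end is
    exclusive and includes the stop. min_codons counts the codons from
    the AUG up to, but not including, the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # keep scanning this frame for later ORFs
    return orfs


sequence = "CCAUGGCUUAAGG"
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])
```

    Real gene finders combine such signals with statistical models and, as noted above, with experimental or homology evidence; an ORF scan alone produces many false positives.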

    Examples of gene prediction software

    1) Similarity-based or Comparative

    BLAST

    SGP2 (extension of GeneID)

    2) Ab initio = from the beginning

    GeneID

    GENSCAN

    GeneMark.hmm

    3) Combined "evidence-based"

    GeneSeqer (Brendel et al., ISU)

    BEST? GENSCAN, GeneMark.hmm, GeneSeqer

    but depends on organism & specific task

    Signals: Pre-mRNA Splicing

    [Figure: genomic DNA with start codon, stop codon, exons and introns; transcription gives pre-mRNA (Cap- ... -Poly(A)); splicing at the splice sites (donor site GT, acceptor site AG) gives mRNA (Cap- ... -Poly(A)); translation gives protein]
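
    The GT ... AG splice-site signal above can be illustrated with a small sketch that removes annotated introns from a DNA sequence and checks each one against the donor/acceptor pattern. The function name splice and its coordinate convention are assumptions made for this example, and the check covers only the canonical GT-AG case; introns with other boundaries exist and would be rejected here.

```python
def splice(dna, introns):
    """Remove introns from dna and return the joined exons.

    introns is a list of (start, end) pairs with end exclusive, sorted
    by position and non-overlapping. Every intron must begin with the
    donor signal GT and finish with the acceptor signal AG.
    """
    exons = []
    position = 0
    for start, end in introns:
        intron = dna[start:end]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} does not follow the GT-AG rule")
        exons.append(dna[position:start])
        position = end
    exons.append(dna[position:])  # final exon after the last intron
    return "".join(exons)


gene = "ATGGCC" + "GTAAGTTTTCAG" + "TTTTAA"  # exon, intron, exon
print(splice(gene, [(6, 18)]))
```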

    Post Transcription Splicing

    [Figure: genomic DNA with start and stop codons, and the spliced mRNA (Cap- ... -Poly(A)) showing the 5'-UTR, start codon, stop codon and 3'-UTR]

    Horizontal Gene Transfer

    The movement of genetic material BETWEEN prokaryotes

    Common in prokaryotes. Useful for environmental adaptation (better than point mutations)

    PHYLOGENY

    Homology & Similarity

    Homology: conserved sequences arising from a common ancestor

    Orthologs: homologous genes that share a common ancestor in the absence of any gene duplication (mouse and human hemoglobin)

    Paralogs: genes related through gene duplication (one gene is a copy of another - fetal and adult hemoglobin)

    Similarity: genes that share common sequences but are not necessarily related

    Phylogenetics

    What is Phylogenetics?

    Science of identifying and interpreting

    evolutionary relationships between biological

    entities (species, genes, etc)

    What is a phylogenetic tree?

    Dendrogram (tree) composed of nodes and branches representing the putative genealogy of the taxonomic units

    Phylogenetic Trees

    A graph representing the evolutionary history of sequences

    Relationship of sequences to one another (how everything is connected)

    Dissect the order of appearance of insertions, deletions, and mutations

    Identify related sequences, predict function, observe epidemiology (analyze changes in viral strains)

    Tree Characteristics

    Tree Properties

    Clade: all the descendants of a common ancestor represented by a node

    Distance: number of changes that have taken place along a branch

    Tree Types

    Cladogram: shows the branching order of nodes

    Phylogram: shows branching order and distances

    [Figure: phylogram of taxa A, B, C and D with branch lengths .035, .009, .057, .044, .012 and .016]

    Methods

    Distance-based

    Parsimony

    Maximum likelihood
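
    To make the distance-based family concrete, here is a minimal sketch of UPGMA clustering on pairwise p-distances (the fraction of differing sites between aligned sequences). UPGMA is chosen here only as a simple representative; the slides do not say which distance method was presented. The names p_distance and upgma are hypothetical, the input sequences are assumed to be already aligned and of equal length, and the sketch returns only the branching order (a cladogram), not branch lengths.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Fraction of aligned positions at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def upgma(seqs):
    """Cluster named, aligned sequences; return the tree as nested tuples."""
    names = list(seqs)
    trees = {i: name for i, name in enumerate(names)}  # cluster id -> subtree
    sizes = {i: 1 for i in trees}                      # cluster id -> leaf count
    dist = {
        (i, j): p_distance(seqs[names[i]], seqs[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        merged_size = sizes[i] + sizes[j]
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            # Size-weighted average keeps every leaf equally weighted.
            dist[(k, next_id)] = (sizes[i] * d_ik + sizes[j] * d_jk) / merged_size
        del dist[(i, j)]
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = merged_size
        del sizes[i], sizes[j]
        next_id += 1
    return next(iter(trees.values()))


aligned = {"A": "AAAAAAAA", "B": "AAAAAAAT", "C": "AAAATTTT", "D": "AAATTTTT"}
print(upgma(aligned))
```

    UPGMA assumes a roughly constant rate of change along all lineages; parsimony and maximum likelihood methods do not reduce the data to a distance matrix and are not covered by this sketch.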

    Levels of Protein Structure

    Primary (1°) structure: amino acid sequence of protein

    Secondary (2°) structure: local structure (alpha helices or beta strands)

    Tertiary (3°) structure: 3-dimensional structure of protein

    Quaternary (4°) structure: structure of a multiple protein complex

    Protein Structure Prediction

    Protein structures can be determined experimentally (in most cases) by

    x-ray crystallography

    nuclear magnetic resonance (NMR)

    but this is very expensive and time-consuming

    Can we predict structures by computational means instead?

    [Figure: PDB Content Growth]

    Methods for Secondary Structure Prediction

    Chou-Fasman method

    Based on the propensities of different amino acids to adopt different secondary structures

    Predictions are made using a rules-based approach to identify groups of amino acids with shared secondary structure propensities

    Garnier, Osguthorpe, Robson (GOR) method

    Statistical method of secondary structure prediction based on information theory & Bayesian probability

    Multiple Sequence Alignment (MSA) methods

    Perform secondary structure prediction on a multiple sequence alignment as opposed to a single protein sequence

    Neural network-based methods

    Example: Profile network from Heidelberg (PHD)
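
    The propensity idea behind the Chou-Fasman method can be sketched with a sliding window that averages per-residue helix propensities. This is a deliberately simplified illustration, not the published method: the real procedure uses nucleation and extension rules and a full 20-residue parameter table. The name helix_candidates is hypothetical, and the values in TOY_HELIX_PROPENSITY are made up for a reduced alphabet so that the example runs.

```python
# Hypothetical propensities for six residues only; NOT the published
# Chou-Fasman parameters. Values above 1 favour a helix, below 1 disfavour it.
TOY_HELIX_PROPENSITY = {"A": 1.4, "E": 1.5, "L": 1.2, "G": 0.6, "P": 0.6, "S": 0.8}


def helix_candidates(seq, propensity, window=6, threshold=1.0):
    """Return (start, end) windows whose mean propensity exceeds threshold.

    end is exclusive. Overlapping windows are reported separately.
    """
    hits = []
    for i in range(len(seq) - window + 1):
        residues = seq[i:i + window]
        mean = sum(propensity[aa] for aa in residues) / window
        if mean > threshold:
            hits.append((i, i + window))
    return hits


print(helix_candidates("AELAELGPGPSG", TOY_HELIX_PROPENSITY))
```

    With a real parameter table the same loop could be run once per structure type (helix, strand, turn) and the highest-scoring type assigned to each region.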

    Methods for Tertiary Structure Prediction

    Tertiary (3D) Structure Prediction

    Homology Modeling

    Fold Recognition / Protein Threading

    Ab initio structure prediction

    Quaternary Structure

    THANKS!