Bioinformatics: Lecture 1
Muhammad Usman Ghani Khan
UET Lahore

  • Outline

    1 Introduction

        Definitions

        Related Fields

        The New Biology

        Motivation and Background

    2 Sources of Biological Data

    3 Course Plan

  • Definitions

    Over 43,000 definitions are available on the internet

    Definition 1: Bioinformatics is the application of computer technology to the management and analysis of biological data [1]

    Definition 2: Biologists doing stuff with computers?

    Definition 3: The design, construction and use of software tools to generate, store, annotate, access and analyse data and information relating to Molecular Biology

    * Here we consider the use of Bioinformatics tools rather than their design and construction

    * Here we consider the access and analysis of data and information items rather than their generation, storage or annotation

    [1] European Bioinformatics Institute (EBI)

  • Definitions

    Every application of computer science to biology

    * Sequence analysis, image analysis, sample management, population modeling

    Analysis of data coming from large-scale biological projects

    * Genomes, transcriptomes, proteomes, metabolomes, etc.

    Solving biological problems with computation?

    Collecting, storing and analysing biological data?

    Informatics - library science?

    But: I do not think all biological computing is bioinformatics, e.g. mathematical modelling is not bioinformatics, even when connected with biology-related problems. In my opinion, bioinformatics has to do with management and the subsequent use of biological information, particularly genetic information. (Richard Durbin)

  • Definitions

    What is not bioinformatics?

    * Biologically-inspired computation, e.g., genetic algorithms and neural networks

    * However, applying neural networks to solve a biological problem could be called bioinformatics

    * What about DNA computing?

  • Related Fields

    Computational biology: the application of computing to biology (broad definition)

    * Often used interchangeably with bioinformatics

    Biometry: the statistical analysis of biological data

    Biophysics: an interdisciplinary field which applies techniques from the physical sciences to understanding biological structure and function [2]

    Mathematical biology: tackles biological problems, but the methods it uses need not be numerical and need not be implemented in software or hardware

    [2] British Biophysical Society

  • Related Fields

    Computational biology and bioinformatics overlap: both use computational techniques to try to understand biological phenomena. Computational biology puts more emphasis on mathematical modelling to explain biological mechanisms, whereas bioinformatics has more to do with the storage and synthesis of experimental data (e.g. pattern recognition and data mining).

  • New Biology

    Traditional Biology

    Small team working on a specialized topic

    Well defined experiment to answer precise questions

    New high-throughput biology

    Large international teams using cutting-edge technology, which defines the project

    Results are given raw to the scientific community without any underlying hypothesis

  • Examples of High Throughput

    Complete genome sequencing

    Simultaneous expression analysis of thousands of genes (DNA microarrays, SAGE)

    Large-scale sampling of the proteome

    Protein-protein interaction analysis: large-scale two-hybrid screens (yeast, worm)

    Large-scale 3D structure production (yeast)

    Metabolism modeling

    Biodiversity

  • Motivation

    Rapid growth of biology-related data: an explosion of publicly available biological material

    * Modern molecular biology, and especially genomics, has led to vast quantities of data: DNA/protein sequences, gene expression.

    * This mainly consists of vast strings/matrices of letters/numbers, which in their raw form are not very interesting.

    Management problem: how to handle this data?

    * Analysis

    * Understanding

    * Presentation

    Approaches:

    * Computing techniques are very good at extracting useful patterns.

    * Bioinformatics consists of methods to address these issues.
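
    As a small illustration of extracting a pattern from a raw string of letters, the sketch below computes two simple summaries of a DNA sequence: its GC content and its most frequent overlapping 3-letter substrings (k-mers). The sequence and the function names gc_content and kmer_counts are invented for illustration and are not taken from the lecture; real analyses would read sequences from a database file.

```python
from collections import Counter

# Hypothetical DNA fragment, invented for illustration only.
sequence = "ATGCGCGATATATGCGCTAGCTAGGCGC"


def gc_content(seq):
    """Return the fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


print(f"GC content: {gc_content(sequence):.2f}")
print("Most common 3-mers:", kmer_counts(sequence, 3).most_common(3))
```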

  • Motivation

    In order to extract useful information, it is necessary to understand the biological principles involved.

    In this course we will introduce some basic molecular biology/genomics and look at ways in which computers can be used to analyse it.

  • Motivation

    Sample Ultimate Problems

    What is the role of a particular gene?

    Does a particular gene help cause a disease?

    How does a drug affect a cell?

    Can we insert a gene into corn to protect it against diseases or pests?

    Can we design a drug to accomplish a particular purpose?

    Can we build a cell that eats pollution?

  • Motivation

    Why would a student choose this course?

    To prepare for graduate study in Bioinformatics or Computational Biology.

    To prepare for certain jobs in the pharmaceutical or biotechnology industries. The future is hard to predict. There are jobs related to high-tech agriculture (new varieties of plants), industrial organisms, biofuels, and pharmaceuticals (designer drugs).

  • So what data can we generate?

    Biological data can be generated at many different levels

    Genomics (DNA)

    Transcriptomics (RNA)

    Proteomics (proteins)

    Metabolomics (small compounds)

    Lipidomics (lipids)

    Hundreds of omics have been catalogued

  • What does an omics dataset look like?

    In most cases datasets present a similar structure

    Each sample is characterised by a large number of variables (RNA, proteins, lipids, etc.)

    Each variable indicates (usually quantitatively) the presence of that element in the sample

    Due to the high cost of most omics technologies, there are usually many more variables than samples

    * Problems of over-fitting
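
    The over-fitting risk can be illustrated with a minimal sketch, assuming purely random data rather than any real omics measurements. It generates a random outcome for 10 samples, then checks how strongly the best of 5 and the best of 5000 random variables correlate with that outcome. The sample sizes, the seed and the function names are arbitrary choices for illustration. With many more variables than samples, some variable tends to look strongly associated with the outcome by chance alone, which is why apparent patterns in such datasets need validation on independent samples.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_samples, n_variables, seed=0):
    """Largest |r| between pure-noise variables and a pure-noise outcome."""
    rng = random.Random(seed)
    # The outcome is random noise: no variable truly explains it.
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_variables):
        variable = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(variable, outcome)))
    return best


few_variables = strongest_chance_correlation(n_samples=10, n_variables=5)
many_variables = strongest_chance_correlation(n_samples=10, n_variables=5000)
print(f"5 variables:    max |r| = {few_variables:.2f}")
print(f"5000 variables: max |r| = {many_variables:.2f}")
```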

  • Research Areas

    (Genome-scale) Sequence Analysis

    * Sequence alignments, motif discovery, genome-wide association (to study diseases such as cancers)

    Computational Evolutionary Biology

    * Phylogenetics, evolution modeling

    Analysis of Gene Regulation

    * Gene expression analysis, alternative splicing, protein-DNA interactions, gene regulatory networks

    Structural Biology

    * Drug discovery, protein folding, protein-protein interactions

    Synthetic Biology

    High-throughput Imaging Analysis

  • Course Contents

    Lecture 1

    Introduction, Definitions.

    Applications, Scope, Motivation.

    Lecture 2

    Introduction to molecular biology

    Structure of DNA, RNA, Proteins

    Announcement of term projects

    Lecture 3

    Bioinformatics databases: GenBank, EMBL, Prot, etc.

    Practical demonstration of databanks and their structures.

  • Course Contents

    Lecture 4

    Database formats: FASTA, seq, Data

    Quiz 1

    Lecture 5

    Sequence Alignment; Sequence Motifs; Gene Finding

    Practical demonstration of BioJava/.NET Bio tools for biology-related tasks

    Lecture 6

    Sequence Alignment (Part 2)

    Computing with Biological Structures

  • Course Contents

    Lecture 7

    Phylogenetic Algorithms

    Lecture 8

    Mid-term break

    Lecture 9

    Microarray Data Analysis

    Lecture 10

    Term project presentations and discussion

  • Course Contents

    Lecture 11

    Comparative Genomics

    Lecture 12

    Proteomics

    Lecture 13

    Biological Ontologies; Biological Text Mining

    Lecture 14

    Genetic Networks

    Lecture 15

    Final Viva and term project submissions

  • Term Project Ideas

    Architectures and data management techniques for the life sciences

    Query processing and optimization for biological data

    Biological data sharing and update propagation

    Query formulation assistance for scientists

    Modeling of life sciences data

    Biomedical data integration issues in eScience

    Laboratory information management systems in biology (including workflow systems)

    Quality assurance in integrated data repositories

    Biomedical metadata management (including provenance)

    Mining integrated life sciences data and text resources

    Standards for biomedical data integration and annotation

    Scientific results arising from innovative data integration solutions

  • Term Project Ideas

    Exposing biomedical data for integration purposes (APIs, Linked Open Data, SPARQL endpoints)

    Creation and use of clinical data repositories

    Data integration in clinical and translational research

    Integration of genotypic and phenotypic data

    Challenges and opportunities with big data in the life sciences

    Ethical, legal and social issues with biomedical data integration

  • Useful Books

    Bryan Bergeron, M.D.: Bioinformatics Computing, Prentice Hall, 2002 (freely available on the internet).

    Richard C. Deonier, Simon Tavaré & Michael S. Waterman: Computational Genome Analysis: An Introduction, Springer, 2005

    Some other helpful books

    * Alberts et al. - Molecular Biology of the Cell

    * Stryer - Biochemistry

    * Baldi and Brunak - Bioinformatics: The Machine Learning Approach

    * Durbin, Eddy, Krogh and Mitchison - Biological Sequence Analysis

    * Kanehisa - Post-genome Informatics

    * Lesk - Introduction to Bioinformatics

    * Orengo, Jones and Thornton - Bioinformatics
