Introduction to Bioinformatics: some definitions
Peter K. Rogan, Ph.D.
Laboratory of Human Molecular Genetics, Children’s Mercy Hospital
Schools of Medicine & Computer Science and Engineering, UMKC
http://www.sce.umkc.edu/~roganp
Definition of Bioinformatics: What is bioinformatics?
• Roughly, bioinformatics describes any use of computers to handle biological information
• In practice, the definition used by most people is narrower; bioinformatics to them is a synonym for "computational molecular biology"---the use of computers to characterize the molecular components of living things.
"Classical" bioinformatics
• Most biologists talk about "doing bioinformatics" when they use computers to store, retrieve, analyze or predict the composition or the structure of biomolecules.
• As computers become more powerful, you could probably add "simulate" to this list of bioinformatics verbs.
• "Biomolecules" include your genetic material---nucleic acids---and the products of your genes: proteins.
• These are the concerns of "classical" bioinformatics, dealing primarily with sequence analysis.
• Richard Durbin, Head of Informatics at the Wellcome Trust Sanger Institute, expressed an interesting opinion on this distinction in an interview:
– "I do not think all biological computing is bioinformatics, e.g. mathematical modelling is not bioinformatics, even when connected with biology-related problems. In my opinion, bioinformatics has to do with management and the subsequent use of biological information, particularly genetic information."
Monomers and polymers
• Most large biological molecules are polymers: ordered chains of simpler molecular modules called monomers.
• Monomers that can combine in a chain are of the same general class, but each kind of monomer in that class has its own well-defined set of characteristics.
• Many monomer molecules can be joined together to form a single, far larger, macromolecule. Macromolecules can have exquisitely specific informational content and/or chemical properties.
• The monomers in a given macromolecule of DNA or protein can be treated computationally as letters of an alphabet (strings), put together in pre-programmed arrangements to carry messages or do work in a cell.
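The string view of macromolecules can be made concrete in a few lines. This is a minimal sketch, assuming sequences are plain uppercase strings over the DNA alphabet ACGT or the 20 standard one-letter amino-acid codes; the function names are hypothetical, chosen only for illustration.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")
# Standard one-letter codes for the 20 common amino acids.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")

def is_valid(seq: str, alphabet: set) -> bool:
    """True if every monomer (letter) in seq belongs to the alphabet."""
    return set(seq) <= alphabet

def composition(seq: str) -> dict:
    """Count how many times each monomer occurs in the chain."""
    return dict(Counter(seq))

def reverse_complement(dna: str) -> str:
    """Read the opposite DNA strand: complement each base, then reverse."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(dna))

if __name__ == "__main__":
    dna = "ATGGCGTTA"
    print(is_valid(dna, DNA_ALPHABET))  # True
    print(composition(dna))             # counts per base
    print(reverse_complement(dna))      # TAACGCCAT
```

Treating a molecule as a string means ordinary string operations (counting, slicing, searching) become biological operations on the chain.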
Living in the "post-genomic" era
• From multiple whole genome sequences we can look for differences and similarities between all the genes of multiple species. From such studies we can draw particular conclusions about species and general ones about evolution. This kind of science is often referred to as comparative genomics [EXAMPLE].
• There are now technologies designed to measure the relative number of copies of a genetic message (levels of gene expression) at different stages in development or disease or in different tissues.
• Large-scale ways of identifying gene functions and associations will grow in significance and with them the accompanying bioinformatics of functional genomics.
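Relative expression levels of this kind are often summarized as a ratio between two conditions, commonly on a log2 scale so that a doubling and a halving are symmetric (+1 and -1). This is a minimal sketch, assuming one already-normalized expression value per gene per condition; the gene names, values and the pseudocount are hypothetical.

```python
import math

def log2_ratio(test: float, reference: float, pseudocount: float = 1.0) -> float:
    """log2 of the test/reference expression ratio.

    The pseudocount (an assumption, not a fixed standard) avoids
    division by zero or log(0) for genes not detected in one sample.
    """
    return math.log2((test + pseudocount) / (reference + pseudocount))

def relative_expression(test: dict, reference: dict) -> dict:
    """Per-gene log2 ratio for genes measured in both conditions."""
    shared = test.keys() & reference.keys()
    return {gene: log2_ratio(test[gene], reference[gene]) for gene in sorted(shared)}

if __name__ == "__main__":
    # Hypothetical values for a diseased and a healthy tissue sample.
    diseased = {"geneA": 399.0, "geneB": 49.0, "geneC": 0.0}
    healthy = {"geneA": 99.0, "geneB": 99.0, "geneC": 0.0}
    for gene, ratio in relative_expression(diseased, healthy).items():
        print(gene, ratio)
```

Here geneA comes out four times higher in the diseased sample (log2 ratio 2.0), geneB at half the level (-1.0), and geneC unchanged (0.0).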
Comparing 2 genomes vs. reference genome, with matches defined as above 70% similarity
Reference genome: E. coli K-12 MG1655
This figure shows the protein matches of each comparison genome to the reference genome. Each ring of the circular display represents a genome and each tick mark represents a gene match along the length of the genome. The outer ring displays the reference genome and the inner rings display each comparison genome.
Summary statistics for reference genome:
3129 reference genes match a comparison genome at least once for the given criteria
125 reference genes match all comparison genomes for the given criteria
1160 reference genes match none of the comparison genomes for the given criteria
Staphylococcus epidermidis ATCC 12228 146 genes match reference
Salmonella typhimurium LT2 SGSC1412 3555 genes match reference
3701 total gene matches to reference genome
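The three summary counts above ("at least once", "all", "none") are set operations over per-genome match sets. This is a minimal sketch, assuming each comparison produced (reference gene, comparison genome, % similarity) hits and that a match means similarity strictly above 70%, as in the figure title. The gene and genome labels are hypothetical toy data, not the results shown, and the per-genome counts here are distinct reference genes, which may not be how the counts above were tallied.

```python
from typing import Dict, Iterable, List, Tuple

# A hit is (reference gene, comparison genome, percent similarity).
Hit = Tuple[str, str, float]

def summarize(reference_genes: List[str], genomes: List[str],
              hits: Iterable[Hit], threshold: float = 70.0) -> Dict[str, int]:
    """Count reference genes matched by at least one, all, or none of the genomes.

    A hit counts as a match only if its similarity is strictly above
    `threshold`. Requires at least one comparison genome.
    """
    reference = set(reference_genes)
    matched = {genome: set() for genome in genomes}
    for gene, genome, similarity in hits:
        if similarity > threshold and genome in matched and gene in reference:
            matched[genome].add(gene)

    any_match = set().union(*matched.values())
    all_match = set.intersection(*matched.values())
    return {
        "match_at_least_once": len(any_match),
        "match_all": len(all_match),
        "match_none": len(reference - any_match),
        # Sum over genomes of the distinct reference genes each one matches.
        "total_matches": sum(len(genes) for genes in matched.values()),
    }

if __name__ == "__main__":
    # Hypothetical toy data, not the E. coli results above.
    reference = ["g1", "g2", "g3", "g4"]
    genomes = ["genomeA", "genomeB"]
    hits = [
        ("g1", "genomeA", 100.0),
        ("g1", "genomeB", 85.0),
        ("g2", "genomeB", 72.5),
        ("g3", "genomeA", 65.0),  # below the 70% cut-off, so not a match
    ]
    print(summarize(reference, genomes, hits))
```

Sets make the criteria explicit: "matches all" is an intersection, "matches at least once" a union, and "matches none" the reference set minus that union.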
Results of Query: Gene sequences found in E. coli, S. typhimurium and S. epidermidis
% similarity | Common name | Locus accession number | Matching common name | Matching locus
70.212769 | putative 2-component transcriptional regulator | b2855 | putative transcriptional regulator (LuxR/UhpA family) | STM3606
70.212769 | thiogalactoside acetyltransferase | b0342 | UDP-3-O-(3-hydroxymyristoyl)-glucosamine N-acyltransferase | STM0226
70.238098 | ATP-binding component of a transporter | b0199 | putative ABC-type transport system ATPase component/cell division | STM0511
100 | uridylate kinase | b0171 | uridylate kinase | STM0218
100 | dnaK suppressor protein | b0145 | dnaK suppressor protein | STM0186
100 | ATP-binding cell division protein, septation process, complexes with FtsZ, associated with junctions of inner and outer membranes | b0094 | ATP-binding cell division protein, septation process, complexes | STM0132
100 | cell division protein; ingrowth of wall at septum | b0083 | cell division protein; ingrowth of wall at | STM0121
100 | transcriptional repressor of fru operon and others | b0080 | transcriptional repressor of fru operon and others | STM0118
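The % similarity column reports how closely each pair of genes agrees. As a simpler stand-in, here is a minimal sketch of percent identity between two sequences assumed to be already aligned (equal length, '-' for gaps). Similarity scores from real alignment tools may also credit chemically similar amino acids and use other gap conventions, so this may differ from how the values above were produced.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned columns where both sequences share a residue.

    Columns where either sequence has a gap ('-') are excluded from the
    denominator; other conventions exist and give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)

if __name__ == "__main__":
    # Hypothetical aligned protein fragments: 4 of 5 ungapped columns agree.
    print(percent_identity("MKT-LV", "MRTALV"))  # 80.0
```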
Detailed analysis of conserved sequence region in Salmonella typhimurium
[Figure: annotation tracks along the genome coordinate (+ and - strands); labels include Prosite, TIGR and "Gene Evidence: GenBank"]
Living in the … (continued)
• Shift in emphasis (of sequence analysis especially) from genes themselves to gene products:
– attempts to catalogue the activities and characterize the interactions between all gene products (in humans): proteomics
– attempts to crystallize and/or predict the structures of all proteins (in humans): structural genomics
– fewer DNA double helices in bad sci-fi movies!
Living in the … (continued 2)
• What some people refer to as research or medical informatics, the management of all biomedical experimental data associated with particular molecules or patients, will move into the mainstream of cell and molecular biology and migrate from the commercial and clinical to academic sectors.
What is medical informatics?
• Biomedical Informatics is an emerging discipline that has been defined as the study, invention, and implementation of structures and algorithms to improve communication, understanding and management of medical information.
• The end objective of biomedical informatics is the coalescing of data, knowledge, and the tools necessary to apply that data and knowledge in the decision-making process, at the time and place that a decision needs to be made.
• The focus on the structures and algorithms necessary to manipulate the information separates Biomedical Informatics from other medical disciplines where information content is the focus.
The distinction
• This suggests that one difference between bioinformatics and medical informatics as disciplines lies in their approaches to the data: there are bioinformaticists interested in the theory behind the manipulation of that data, and there are bioinformatics scientists concerned with the data itself and its biological implications.
• Medical informatics, for practical reasons, is more likely to deal with data obtained at "grosser" biological levels (that is, information from super-cellular systems, right up to the population level), while most bioinformatics is concerned with information about cellular and biomolecular structures and systems.
University of Missouri-Columbia has a Ph.D. program in Medical Informatics.
Seminars this semester at MU on Health Informatics:
• Sam Schulz, Ph.D., Professor, Director of Health Informatics, HMI: "ROI, Integration and the Effects of Receivership upon the IT Enterprise for AMCs"
• Steven Waldren, MD, NLM Postdoctoral Fellow, HMI: "Determining Family Medicine Residency Needs and Expectations of an Electronic Medical Record"
• Raman Seth: "Biochemical Names Database"
• Kathryn J. Nelson, Project Director, Clinical Outcomes: "Implementing an electronic medical error reporting system"
• Timothy B. Patrick, Ph.D., Assistant Professor, HMI: "A Text Corpus Approach to an Analysis of the Shared Use of Core Terminology"
• John Gorman, NLM Fellow, HMI: "Pursuing Best Practices for Information System Management in Academic Medical Centers"
• Swetha Sridhar, NLM Fellow, HMI: "A comparison of face-to-face and virtual dermatology visits"
• Jeannette Jackson-Thompson, MSPH, Ph.D., Resident Assistant Professor, HMI; Director, Missouri Cancer Registry: "Quality of Life for Cancer Survivors: A Collaborative Research Project Involving the Missouri Cancer Registry, the Department of Family and Community Medicine and the National Office of the American Cancer Society"
What is Genomics?
• Genomics is a field which existed before the completion of the sequences of genomes.
• Genomics can be practised even in the crudest of forms: for example, the oft-referenced estimate of 100,000 genes in the human genome came from an (in)famous piece of "back of an envelope" genomics, guessing the weight of chromosomes and the density of the genes they bear. [We now have evidence for ~35,000 genes in the human genome]
• Genomics is any attempt to analyze or compare the entire genetic complement of one species or of several species. It is, of course, possible to compare genomes by comparing more-or-less representative subsets of genes within genomes.
What is Mathematical Biology?
• Mathematical biology also tackles biological problems, but the methods it uses to tackle them need not be numerical and need not be implemented in software or hardware.
• Indeed, such methods need not "solve" anything; in mathematical biology it would be considered reasonable to publish a result which merely establishes that a biological problem belongs to a particular general class.
• According to Alex Kasman:
– bioinformatics "...seems to focus almost exclusively on specific algorithms that can be applied to large molecular biological data sets..." whereas
– mathematical biology "...includes things of theoretical interest which are not necessarily algorithmic, not necessarily molecular in nature, and are not necessarily useful in analyzing collected data."
Research at the Center for Mathematical Biology, University of Oxford
Spatial and spatiotemporal pattern formation
Specifically, partial differential equation modelling of the chemical and mechanical aspects of the generation of pattern and form in embryology and development. Applications include skeletal patterning in the vertebrate limb, primitive streak formation, somitogenesis, skin organ formation (e.g. feather germ formation, tooth initiation), tissue movement during invagination processes, tissue-tissue interactions (for example, in determining lung morphology), cell aggregation in Dictyostelium and pattern generation in Hydra. We have recently begun to investigate discrete models to understand pattern formation on a cellular level (e.g. Delta-Notch intercellular signalling).
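For orientation, a generic two-species reaction-diffusion system, the textbook starting point for PDE models of chemical pattern formation, can be written as below. Here u and v are concentrations, D_u and D_v their diffusion coefficients, and f and g the reaction kinetics. This general form is shown only as a sketch; the specific models used at the CMB are not given here.

```latex
\begin{aligned}
\frac{\partial u}{\partial t} &= D_u \nabla^2 u + f(u, v),\\
\frac{\partial v}{\partial t} &= D_v \nabla^2 v + g(u, v).
\end{aligned}
```

Turing's insight was that a uniform steady state which is stable without diffusion can become unstable when D_u and D_v differ sufficiently, which is one route by which spatial pattern can arise spontaneously.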
Wound healing
Recent biological advances in the understanding of foetal wound healing have shed new light on the role of the interaction between cells and their environment in both foetal and adult wound repair. We are investigating normal and abnormal wound healing. Applications include modelling wound contraction, fibroproliferative diseases, scar tissue formation and corneal wound healing. This investigation is being carried out in collaboration with experimental colleagues in the Biology Department at Manchester University.
Mathematical Modelling to Improve Cancer Therapy
The CMB is a member of the Research Training Network project "Using Mathematical Modelling and Computer Simulation to Improve Cancer Therapy." The aim of the network is to develop the whole modelling process from phenomenological observation to simulation and validation, through the development of mathematical models and their qualitative and quantitative study, in order to simulate the different aspects of tumor dynamics within the full range of scales: sub-cellular, cellular and macroscopic. Developing mathematical models at all the scales mentioned requires making use of a wide variety of theoretical tools from a range of disciplines (e.g., continuum mechanics, kinetic theory, stochastic processes, system theory, compartmental models, multiphase systems) and developing different mathematical tools to obtain both qualitative and quantitative results.
From Individual to Collective Behaviour in Ecology
We are using various mathematical methods to investigate the relationship between the behaviour of individual animals and the dynamics of their population. A focus of this research work has been on social insects and honey bees in particular. We are interested in how insects use simple rules and local information to generate complex and functional patterns. More recent work has concentrated on applying these techniques to the dynamics of populations in ecological systems. We are working in collaboration with researchers in the Zoology department on the behaviour and dynamics of locust swarms.
What is proteomics?
• Tyers & Mann, Nature 2003 Mar 13;422(6928):193-7:
– "The term proteome was first coined to describe the set of proteins encoded by the genome.
– The study of the proteome, called proteomics, now evokes
• not only all the proteins in any given cell,
• but also the set of all protein isoforms and modifications,
• the interactions between them,
• the structural description of proteins and their higher-order complexes,
• and for that matter almost everything 'post-genomic'."
What is Pharmacogenetics?
• All individuals respond differently to drug treatments: some positively, others with little obvious change in their conditions, and yet others with side effects or allergic reactions.
• Much of this variation is known to have a genetic basis.
• Pharmacogenetics is a subset of pharmacogenomics that uses genomic/bioinformatic methods to identify genomic correlates (for example, SNPs: Single Nucleotide Polymorphisms) characteristic of particular patient response profiles, and uses those markers to inform the administration and development of therapies.
– Strikingly, such approaches have been used to "resurrect" drugs previously thought to be ineffective but subsequently found to work in a subset of patients.
– They can also be used to optimize the doses of chemotherapy for particular patients.
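To make the SNP idea concrete, here is a minimal sketch that lists single-base differences between a reference sequence and a sample sequence, assuming both are already aligned, gap-free and of equal length. Real SNP calling works from sequencing reads with quality and allele-frequency thresholds; this only illustrates what a single nucleotide polymorphism is, and the function name is hypothetical.

```python
from typing import List, Tuple

def single_base_differences(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes gap-free sequences of equal length that are already aligned.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]

if __name__ == "__main__":
    # Hypothetical sequences differing at positions 5 and 8.
    print(single_base_differences("ACGTACGT", "ACGTTCGA"))
```

A marker panel for patient response would then be a set of such positions whose alleles correlate with how patients respond.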
What is Pharmacogenomics?
• Pharmacogenomics is the application of genomic approaches and technologies to the identification of drug targets.
• Examples include trawling entire genomes for potential receptors by bioinformatics means,
• or by investigating patterns of gene expression in both pathogens and hosts during infection,
• or by examining the characteristic expression patterns found in tumor or patient samples for diagnostic purposes.
What is Bioinformatics? The loose definition
• There are other fields, for example medical imaging/image analysis, which might be considered part of bioinformatics.
• There is also a whole other discipline of biologically-inspired computation: genetic algorithms, AI, neural networks.
• Example: neural networks, inspired by crude models of the functioning of nerve cells in the brain, are used to predict, surprisingly accurately, the secondary structures of proteins from their primary sequences.
• What almost all bioinformatics has in common is the processing of large amounts of biologically-derived information, whether DNA sequences or breast X-rays.
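Classic neural-network secondary-structure predictors commonly classify each residue from a fixed window of neighbouring residues, one-hot encoded as the network's input. This is a minimal sketch of that input encoding only (no network is trained), assuming a window of 13 residues, sequences restricted to the 20 standard amino-acid letters, and an extra padding symbol for positions beyond the chain ends; all three are illustrative assumptions.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# One extra input slot marks positions that fall off either end of the chain.
SYMBOLS = AMINO_ACIDS + "-"
INDEX = {symbol: i for i, symbol in enumerate(SYMBOLS)}

def window_features(sequence: str, window: int = 13) -> List[List[int]]:
    """One flat one-hot vector per residue, covering `window` residues centred on it.

    Each vector has window * len(SYMBOLS) entries; a network would map it
    to a secondary-structure class (e.g. helix, strand or coil).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    half = window // 2
    padded = "-" * half + sequence + "-" * half
    features = []
    for centre in range(len(sequence)):
        vector = [0] * (window * len(SYMBOLS))
        for offset, symbol in enumerate(padded[centre:centre + window]):
            vector[offset * len(SYMBOLS) + INDEX[symbol]] = 1
        features.append(vector)
    return features

if __name__ == "__main__":
    vectors = window_features("MKVLAAGIV")
    print(len(vectors), len(vectors[0]))  # 9 residues, 13 * 21 inputs each
```

The window gives each prediction local sequence context while keeping the network's input size fixed regardless of protein length.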
Example: Sequence Annotation of the Prader-Willi and Angelman Syndromes Critical Region on Human Chromosome 15 (http://www.genome.ucsc.edu)