8/6/2019 INDO Thai What is Bioinformatics,A.sharMA
http://slidepdf.com/reader/full/indo-thai-what-is-bioinformaticsasharma 1/156
20th June - 20th July, 2006
"Bioinformatics: Techniques and usage"
Dr. Ashok Sharma
Head, Bioinformatics and Co-ordinator, Bioinformatics Centre
Central Institute of Medicinal and Aromatic Plants
PO. CIMAP, Lucknow-226015, India.
Web site: www.cimap.res.in
E-mail: [email protected]
CIMAP Summer Training on
Biotechnology & Bioinformatics
Sequences
Biological Knowledge
Databases
Greater Biological Knowledge
Bioinformatics
Bioinformatics:
Why
What
Computational Methods
Resources and Tools
?
If you are one of the many biologists for whom genome
databases are about as comprehensible as a mass of supermarket
barcodes, it is a good time to team up with a friendly
bioinformaticist and join the action before it is too late.
If biologists do not adapt to the powerful
computational tools needed to exploit huge data
sets, they could find themselves floundering
in the wake of advances in genomics.
It is predicted that the potential to integrate different levels of
genomic data (such as raw sequence from the human
genome and those of model organisms, data on genetic
variability between individuals and on gene expression in
different tissues) will radically change biological research.
It is also agreed that small experiments driven by individual
investigators will give way to a world in which
multidisciplinary teams, sharing huge online data sets, emerge
as key players.
Bioinformatics: a brave new world
• Radical change in biological research, away from small
experiments driven by individual investigators
• Multidisciplinary teams sharing huge online data sets will be the key players
Era of 'systems biology': the ability to create mathematical
models describing the function of networks of genes and proteins is just as important as traditional lab skills.
Who will have the competitive advantage?
Those who learn to conduct high-throughput genomic analyses,
and who can master the computational tools needed to exploit biological databases.
The outcome of this natural selection will see many current top
scientists, research groups and even whole institutes relegated
to the second division.
What is the solution?
In the long run, the change will come through the emergence of a new breed of biologists who are steeped in computational biology
as an integral part of their education. This means that the subject
must be included as a core module in all undergraduate biology
courses, rather than as a specialist option. Although this is starting to
happen, the availability of teachers with the appropriate expertise is still a limiting factor.
The emerging new breed
So, if the majority of biologists are not to be disenfranchised, what is the solution?
Emergence of a new breed of biologists who are steeped in
computational biology as an integral part of their education.
Limiting factor: availability of teachers with the appropriate
expertise.
One model solution has emerged in the U.S.A.
Funding agencies are also trying to drive change by ploughing money into initiatives that require a multidisciplinary approach
and a strong computational component.
The US National Institutes of Health, for instance, through its National Institute of General Medical Sciences, has created a
programme of 'glue grants' for integrative and collaborative
approaches to research. Under this programme, the Alliance for Cellular Signaling will
draw a complete map of interactions between some 1,000
proteins in two types of cells.
The consortium unites traditional experimentalists with
computational biologists.
Glue grants
Integrative and collaborative approaches to research
US National Institutes of Health
Alliance for Cellular Signaling
Complete map of interactions between
some 1,000 proteins in two types of cells
Consortium unites traditional experimentalists
with computational biologists.
• The Human Genome Project and other genome projects, such as the sequencing of bacterial and yeast genomes, have produced enormous
amounts of DNA sequence data.
• Large-scale biological research involving micro-sequencing of proteins,
2-D gel patterns of proteins and polypeptides, metabolic pathways, physical and genetic maps of organisms, cell line information, microbial
strain data, etc. has been responsible for the unprecedented growth of
biological data.
• Projects such as Species-2000, the global plant checklist, information on
the release of organisms into the environment, and Animal Virus Information
are producing hard data at the species level in multimedia format.
• The rate of growth of the biological data is estimated to be more than 200
million base pairs per year.
• The database content itself is doubling in size approximately every year.
• Nucleotide and protein sequences are not the only data that are
accumulating rapidly. The number of characterized genes from a variety
of organisms and the number of solved protein structures are also
doubling every two years.
The enormous growth of biological data and its availability in the major international databases is serving as a source of knowledge for life
scientists.
The whole paradigm shift in molecular biology towards data-intensive
research in search of useful genes is basically due to the fact that genetic data is becoming the major driving force in drug
discovery, protein engineering, design of new molecules, and other
related areas.
The large stores of biological data hold the promise of serving as the "Discovery Super Highway" for innovations in biotechnology through
a process of analysis and transformation of molecular and structural data
into biological knowledge for prosperity.
In the face of the challenges imposed by the growing size and
complexity of biological data, a new discipline of science, known
as 'Bioinformatics', has emerged in the recent past.
Bioinformatics deals with the various issues related to the biological
data. It also covers the development of data analysis tools, modeling
of biological macromolecules and their complexes, metabolic
pathways, designing of new molecules such as drugs, peptide
vaccines, proteins, etc.
Gradually, Bioinformatics has evolved to deal with four related but still
distinct problem areas, viz.:
a) Handling and management of biological data, including its organization,
control, linkages, analysis, and so forth.
b) Communication among people, projects, and institutions engaged in
the biological research and applications. The communication may
include e-mail, file transfer, remote login, computer conferencing,
electronic bulletin boards, or establishment of web-based information
resources.
c) Organization, access, search and retrieval of biological information,
documents, and literature.
d) Analysis and interpretation of biological data through
computational approaches including visualization, mathematical
modeling, and development of algorithms for highly parallel
processing of complex biological structures.
Bioinformatics may be defined as a scientific discipline that
encompasses all aspects of biological information, viz.,
acquisition, processing, storage, distribution, analysis and interpretation, and that combines the tools and techniques of
mathematics, computer science, and biology with the aim of
understanding the biological significance of a variety of data.
Bioinformatics has acquired great importance due to its application in
the Genome projects.
The target of decoding the three billion base pairs of the human DNA
has become achievable only through the use of various innovative
techniques and methods evolved by the Bioinformatics scientists.
Bioinformatics has become an essential component of biotechnology
based product and process development.
The process of drug design and development is expensive and time-consuming.
The application of the tools and techniques of
Bioinformatics has reduced both the cost and the
development cycle of drugs. This has a tremendous impact
on society. If a newly discovered drug is a life-saving one, then the
resulting gains are not only in terms of financial savings but also in saving the lives of several million people. Major pharmaceutical and
biotechnology companies have set up large R&D groups in
Bioinformatics.
Bioinformatics is a multidisciplinary subject. Though only about a decade old, it has become very important for the growth of biosciences,
biotechnology, and the economic prosperity of nations.
Three well-identified divisions of Bioinformatics may be considered:
a) Molecular Bioinformatics,
b) Cellular and sub-cellular Bioinformatics, and
c) Organismic and community Bioinformatics.
FUNCTIONS OF A BIOINFORMATICS CENTRE
i. The principal objective of a Bioinformatics Centre is to function as an information base in each specialty so that scientists have ready
access to computer-based information on resources and databases
in their subject fields, and to build up expertise in bioinformatics in keeping
with the rapid development in this area.
ii. To provide a computer-based information storage and retrieval system of databases that collects structured information generated
by research and industrial institutions in the identified fields of
biotechnology, to continually update the databases and to make the
information available to the users.
iii. To operate in an active network mode, in which scientists get access to the biotechnology community in the identified areas, answer requests
for information in an interactive, discussion-based mode and actively
initiate dialogue among groups with common interests.
iv. To provide retrieval service either online or offline in their
specialized areas and to give overall information support even in
areas other than those assigned to them.
v. To provide a communication link with international databases for
selective bibliographic information for the user scientist.
vi. To develop software packages and databases specific to user needs.
vii. To conduct training courses in the specialized areas
periodically to meet the special requirements of manpower
development in the area and to promote awareness about the
computerized storage and retrieval facility among bio scientists
and information scientists.
Bioinformatics: What?
• A mixture of Biochemistry, Molecular Biology, and Computer Science
• Obtaining, storing, organizing, and analyzing biological and genetic information
for understanding its activity in living organisms
• Main goal is to convert a multitude of complex data into useful information and
knowledge
• Data includes gene and protein sequences, cDNA, nucleotide sequences
• Data comes from gene sequencing, combinatorial chemical synthesis, gene-expression
investigations, pharmacogenomics, proteomic studies, and other methods of study
• Information is used to build synthetic and predictive models, allowing scientists to better
understand complex living systems
• Future applications in biology, chemistry, pharmaceuticals, medicine, and agriculture
What is the Role of Bioinformatics?
• The role of the Bioinformatics group is to:
Research and develop tools and systems that provide understanding
and integration of genomic data across technologies
Work with other Research Information staff to make these tools
available to research scientists
What kinds of data are we interested in?
• Sequence data
• Profile data: gene expression and proteins
• Mapping data
• Function and phenotype
• Pathways
TECHNOLOGIES IN BIOINFORMATICS
Data-Acquisition Systems
These are required mainly at research labs generating large amounts of data. These systems
include inventory control software for tracking hundreds
of thousands of reagents, gels and other materials;
reagent manipulation software; robotic systems to carry
out high-volume, high-precision laboratory
manipulation in genome research; and sequence
production software that will help improve sequencing.
TECHNOLOGIES IN BIOINFORMATICS
Data-Analysis Systems
Studying sequences, predicting protein structure and
comparing genomes on an extensive scale all require
informatics tools such as sequence analysis software that
performs alignments, detects homologies, identifies coding
regions and extracts features. Protein folding software
addresses how genetic information is transformed into function via
proteins, whose functional specificities are determined by
their 3-D shapes. Genetic mapping software systems play
a key role in the analysis of genetic mapping data.
Classification software extracts features from DNA
sequences, places proteins into gene families and tracks
protein motifs.
TECHNOLOGIES IN BIOINFORMATICS
Data-Management Systems
Various genome projects are generating
information that cannot be accommodated by
traditional publishing. Electronic data management
and publishing systems are crucial components of genomic research.
CHALLENGES IN BIOINFORMATICS
Bioinformatics, which is the intersection of Information Technology and Mathematics with molecular biology / genetics, has created several challenges for the Computer Science community.
Information Storage
Storing huge amounts of genetic information, amenable to rapid access and manipulation, is a great challenge.
One million bases (1 Mb) ≈ 1 Megabyte (1 MB). Thus, one would require 3 Gigabytes (3 GB) of computer data storage space to store the entire Human Genome comprising three Gigabases (3 Gb).
This includes nucleotide sequence data only and does not include data annotations and other information associated with the sequence data.
With time, more annotations, entered either (a) by scientists as a result of laboratory findings, literature searches, data analysis, or personal communications, and/or
(b) as a result of automated data analysis programs or autoannotators, will be associated with the sequence data, increasing the storage requirements significantly beyond the 3 GB for the human genome.
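The storage estimate on this slide assumes one byte (one text character) per base. As a rough sketch, the arithmetic can be checked in Python, along with the common alternative of packing each of A, C, G and T into 2 bits. The helper names `storage_bytes` and `pack_2bit` are illustrative only and do not come from the slides.

```python
# Back-of-the-envelope storage estimate for raw nucleotide sequence.
GENOME_BASES = 3_000_000_000  # ~3 Gb, the figure quoted on the slide


def storage_bytes(n_bases: int, bits_per_base: int) -> int:
    """Bytes needed to store n_bases at a fixed number of bits per base."""
    total_bits = n_bases * bits_per_base
    return (total_bits + 7) // 8  # round up to whole bytes


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into 2 bits per base (illustrative helper)."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | code[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


one_char_per_base = storage_bytes(GENOME_BASES, 8)  # the slide's assumption
two_bits_per_base = storage_bytes(GENOME_BASES, 2)  # packed encoding
print(one_char_per_base / 1e9, "GB vs", two_bits_per_base / 1e9, "GB")
```

Under these assumptions the packed form needs about a quarter of the space, but it cannot represent ambiguity codes such as N, and annotations still dominate storage as the slide notes.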
Protein Folding Software
Genetic information is transformed into function via proteins, whose functional specificities are determined by their three-dimensional shapes. Prediction of protein structure from amino acid sequences is an important and challenging problem.
Map Assembly & Integration Software
Computation plays an increasingly central role in the assembly and integration of large maps composed of different kinds and combinations of data.
Comparative Genomics Tools
As the genome projects mature and large amounts of genomic information become available for a number of species, comparative genomics is emerging as an active area of study.
Gene Mining
Methods for mapping genes to their physical locations on the genome; searching for related genes; analysing the database to find families of related genes and to understand their coordinated expression; finding correlations between specific diseases and the expression of related genes.
SERENDIPITY EFFECT
One of the most exciting aspects of the information revolution is that it allows us to combine many different items of information and many different kinds of information on a scale never seen before. Large international databases, for instance, include contributions from thousands of different sources. Also, the hypertext links (Information Super Highway) between sites make it possible to draw together many different kinds of information that bear on a particular problem.
These activities not only promote collaboration on a truly vast scale, they also enrich research. One important effect is the "Serendipity effect": combining different datasets makes possible entirely new kinds of study. New studies inevitably lead to new and unexpected discoveries.
Using public databases and data formats
The first key skill for biologists is to learn to use online search tools to find information. Literature searching is no longer a matter of
looking up references in a printed index. You can find links to most
of the scientific publications you need online. There are central
databases that collect reference information so you can search
dozens of journals at once. You can even set up "agents" that notify
you when new articles are published in an area of interest.
Searching the public molecular-biology databases requires the same
skills as searching for literature references: you need to know how
to construct a query statement that will pluck the particular needle you're looking for out of the database haystack.
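As an illustration of constructing a query statement, the sketch below builds a field-restricted search string and the matching request URL for the NCBI E-utilities `esearch` endpoint. The slides do not name a specific service, so the choice of NCBI, the example organism and keyword, and the helper names `build_query` and `esearch_url` are all assumptions. The code only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(terms: dict[str, str]) -> str:
    """Join field-restricted terms with AND, e.g. 'kinase[Title] AND ...'."""
    return " AND ".join(f"{value}[{field}]" for field, value in terms.items())


def esearch_url(db: str, query: str, retmax: int = 20) -> str:
    """Return the request URL for a search of one database (no network access)."""
    params = {"db": db, "term": query, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Hypothetical example: kinase sequences from one plant species.
query = build_query({"Organism": "Arabidopsis thaliana", "Title": "kinase"})
url = esearch_url("nucleotide", query)
print(query)
print(url)
```

Restricting each term to a field is what narrows the haystack: the same keyword without a field tag would match any part of a record.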
Sequence alignment and sequence searching
Being able to compare pairs of DNA or protein sequences and extract partial matches has made it possible to use a biological
sequence as a database query. Sequence-based searching is another
key skill for biologists; a little exploration of the biological
databases at the beginning of a project often saves a lot of valuable
time in the lab. Identifying homologous sequences provides a basis
for phylogenetic analysis and sequence-pattern recognition.
Sequence-based searching can be done online through web forms,
so it requires no special computing skills, but to judge the quality
of your search results you need to understand how the underlying sequence-alignment method works and go beyond simple sequence
alignment to other types of analysis.
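To make "extracting partial matches" concrete, here is a minimal sketch of local alignment scoring by dynamic programming (the Smith-Waterman recurrence). The slides do not name an algorithm, so this is one standard choice rather than the method behind any particular search tool. The scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what lets an alignment
            # start and end anywhere (a "partial match").
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # shared core is "ACG"
```

Production search tools use heuristics and statistical significance estimates on top of this idea, which is why the score alone is not enough to judge a hit.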
Gene prediction
Gene prediction is only one of a cluster of methods for attempting to detect meaningful signals in uncharacterized DNA sequences.
Until recently, most sequences deposited in GenBank were already
characterized at the time of deposition. That is, someone had
already gone in and, using molecular biology, genetic, or
biochemical methods, figured out what the gene did. However, now that the genome projects are in full swing, there's a lot of DNA
sequence out there that isn't characterized.
Software for prediction of open reading frames, genes, exon splice
sites, promoter binding sites, repeat sequences, and tRNA genes
helps molecular biologists make sense out of this unmapped DNA.
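The simplest of these signals, an open reading frame, can be found with a short scan. The sketch below is deliberately naive: it reads only the forward strand, uses the standard stop codons, treats ATG as the only start, and skips ATGs nested inside an ORF it has already reported. The function name `find_orfs` and the `min_codons` threshold are illustrative assumptions, and real gene predictors do far more.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for ATG...stop ORFs on the forward strand.

    `end` is exclusive and the returned sequence includes the stop codon.
    `min_codons` counts codons before the stop, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC"))
```

Scanning the reverse complement with the same function would cover the other strand.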
Multiple sequence alignment
Multiple sequence-alignment methods assemble pairwise sequence
alignments for many related sequences into a picture of sequence
homology among all members of a gene family. Multiple sequence
alignments aid in visual identification of sites in a DNA or protein
sequence that may be functionally important. Such sites are usually conserved; that is, the same amino acid is present at that site in each
one of a group of related sequences. Multiple sequence alignments
can also be quantitatively analyzed to extract information about a
gene family. Multiple sequence alignments are an integral step in
phylogenetic analysis of a family of related sequences, and they also provide the basis for identifying sequence patterns that characterize
particular protein families.
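One way to analyze a finished alignment quantitatively is to score each column by how often its most common residue occurs. The sketch below assumes the alignment is already computed and supplied as equal-length strings with "-" for gaps. The toy sequences and function names are made up for illustration.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences sharing the most common residue in each column.

    Gaps ("-") never count towards the most common residue.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        residues = [row[col] for row in alignment if row[col] != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(alignment))
    return scores


def conserved_sites(alignment: list[str], threshold: float = 1.0) -> list[int]:
    """Column indices whose conservation score reaches the threshold."""
    scores = column_conservation(alignment)
    return [i for i, score in enumerate(scores) if score >= threshold]


toy_alignment = ["MKVLA", "MKILA", "MRVLA", "MK-LA"]  # hypothetical family
print(column_conservation(toy_alignment))
print(conserved_sites(toy_alignment))
```

Columns at a score of 1.0 are the candidates for functionally important sites described above; lower thresholds tolerate conservative substitutions.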
Phylogenetic analysis
Phylogenetic analysis attempts to describe the evolutionary
relatedness of a group of sequences. A traditional phylogenetic tree or
cladogram groups species into a diagram that represents their relative
evolutionary divergence. Branchings of the tree that occur furthest
from the root separate individual species; branchings that occur close
to the root group species into kingdoms, phyla, classes, families, genera, and so on.
The information in a molecular sequence alignment can be used to
compute a phylogenetic tree for a particular family of gene sequences.
The branchings in phylogenetic trees represent evolutionary distance
based on sequence similarity scores or on information-theoretic
modeling of the number of mutational steps required to change one
sequence into another. Phylogenetic analyses of protein sequence
families describe not the evolution of the entire organism but
evolutionary change in specific coding regions.
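A tree-building method needs pairwise evolutionary distances as input. As a hedged sketch, the code below computes the proportion of differing sites (the p-distance) between aligned DNA sequences and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), which is one simple model of the number of mutational steps. The slides do not specify a model, and the example sequences are invented.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites in two aligned sequences.

    Columns where either sequence has a gap ("-") are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed p-distance."""
    if p >= 0.75:
        raise ValueError("p is too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Corrected distance for every unordered pair of named sequences."""
    names = list(seqs)
    return {(m, n): jukes_cantor(p_distance(seqs[m], seqs[n]))
            for i, m in enumerate(names) for n in names[i + 1:]}


toy = {"seqA": "ACGTACGT", "seqB": "ACGTACGA", "seqC": "ACGAACTA"}
print(distance_matrix(toy))
```

The corrected distance is always at least the observed one because it accounts for repeated changes at the same site. A clustering method such as neighbor joining would take this matrix as its input.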
Extraction of patterns and profiles from sequence data
A motif is a sequence of amino acids that defines a substructure in a protein that can be connected to function or to structural stability. In a
group of evolutionarily related gene sequences, motifs appear as
conserved sites. Sites in a gene sequence tend to be conserved (that is, to
remain the same in all or most representatives of a sequence family)
when there is selection pressure against copies of the gene that have mutations at that site. Nonessential parts of the gene sequence will
diverge from each other in the course of evolution, so the conserved
motif regions show up as a signal in a sea of mutational noise.
Sequence profiles are statistical descriptions of these motif signals;
profiles can help identify distantly related proteins by picking out a
motif signal even in a sequence that has diverged radically from other
members of the same family.
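A sequence profile can be sketched as a table of per-position log-odds scores built from aligned motif instances. The example below assumes a uniform background over the 20 amino acids and a pseudocount of 1, both simplifications. The motif instances and the query sequence are hypothetical, and the code assumes the query contains only the 20 standard residue letters.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(instances: list[str], alphabet: str = AMINO_ACIDS,
                  pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-position log2-odds scores versus a uniform background."""
    width = len(instances[0])
    background = 1.0 / len(alphabet)
    profile = []
    for pos in range(width):
        column = [seq[pos] for seq in instances]
        total = len(column) + pseudocount * len(alphabet)
        profile.append({
            ch: math.log2(((column.count(ch) + pseudocount) / total) / background)
            for ch in alphabet
        })
    return profile


def best_hit(profile: list[dict[str, float]], seq: str) -> tuple[float, int]:
    """Return (score, offset) of the best-scoring window in seq."""
    width = len(profile)
    scored = [
        (sum(profile[k][seq[i + k]] for k in range(width)), i)
        for i in range(len(seq) - width + 1)
    ]
    return max(scored)


profile = build_profile(["GKS", "GKT", "GKS", "GKT"])  # hypothetical motif
score, offset = best_hit(profile, "MAVLGKTPQ")
print(round(score, 2), offset)
```

Because the profile scores every residue at every position, a window can still score well when only some positions match, which is how profiles pick out diverged family members.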
Protein sequence analysis
The amino-acid content of a protein sequence can be used as the
basis for many analyses, from computing the isoelectric point and
molecular weight of the protein and the characteristic peptide mass
fingerprints that will form when it's digested with a particular
protease, to predicting secondary structure features and
post-translational modification sites.
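As a small sketch of sequence-only analysis, the code below counts amino-acid composition and performs an in-silico digest using the commonly quoted trypsin rule: cleave after K or R unless the next residue is P. Computing peptide masses or the isoelectric point would also need residue mass and pKa tables, which are deliberately left out here. The example sequence is made up.

```python
from collections import Counter


def composition(protein: str) -> dict[str, int]:
    """Count of each amino acid in the sequence."""
    return dict(Counter(protein))


def trypsin_digest(protein: str) -> list[str]:
    """Peptides from cleaving after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


example = "MKWVTFRPLLKA"  # hypothetical sequence
print(composition(example))
print(trypsin_digest(example))
```

Summing residue masses over each peptide in the digest would give the predicted mass fingerprint to compare against a spectrum.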
Protein structure prediction
It is a lot harder to determine the structure of a protein
experimentally than it is to obtain DNA sequence data. One very
active area of bioinformatics and computational biology research is
the development of methods for predicting protein structure from protein sequence. Methods such as secondary structure prediction
and threading can help determine how a protein might fold,
classifying it with other proteins that have similar topology, but
they don't provide a detailed structure model.
Protein structure property analysis
Protein structures have many measurable properties that are of
interest to crystallographers and structural biologists. Protein
structure validation tools are used by crystallographers to measure
how well a structure model conforms to structural rules extracted from existing structures or chemical model compounds. These tools
may also analyze the "fitness" of every amino acid in a structure
model for its environment, flagging such oddities as buried charges
with no countercharge or large patches of hydrophobic amino acids
found on a protein surface. These tools are useful for evaluating both
experimental and theoretical structure models.
Protein structure alignment and comparison
Even when two gene sequences aren't apparently homologous, the
structures of the proteins they encode can be similar. New tools for
computing structural similarity are making it possible to detect
distant homologies by comparing structures, even in the absence of
much sequence similarity.
Biochemical simulation
Biochemical simulation uses the tools of dynamical systems
modeling to simulate the chemical reactions involved in
metabolism. Simulations can extend from individual metabolic
pathways to transmembrane transport processes and even properties
of whole cells or tissues. Biochemical and cellular simulations
traditionally have relied on the ability of the scientist to describe a
system mathematically, developing a system of differential
equations that represent the different reactions and fluxes occurring
in the system. However, new software tools can build the
mathematical framework of a simulation automatically from a
description provided interactively by the user, making mathematical
modeling accessible to any biologist who knows enough about a
system to describe it according to the conventions of dynamical
systems modeling.
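As a minimal sketch of the "system of differential equations" idea, the example below integrates a hypothetical two-step pathway A -> B -> C with mass-action kinetics using the forward Euler method. The pathway, the rate constants `k1` and `k2`, and the function name `simulate_pathway` are illustrative assumptions, not taken from any particular simulation package.

```python
def simulate_pathway(a0, k1, k2, dt=0.01, steps=1000):
    """Forward-Euler integration of a hypothetical pathway A -> B -> C.

    dA/dt = -k1*A
    dB/dt =  k1*A - k2*B
    dC/dt =  k2*B
    Returns a list of (time, A, B, C) tuples.
    """
    a, b, c = a0, 0.0, 0.0
    trajectory = [(0.0, a, b, c)]
    for i in range(1, steps + 1):
        flux1 = k1 * a          # rate of A -> B at the current state
        flux2 = k2 * b          # rate of B -> C at the current state
        a -= flux1 * dt
        b += (flux1 - flux2) * dt
        c += flux2 * dt
        trajectory.append((i * dt, a, b, c))
    return trajectory


# Example run with made-up rate constants (arbitrary units).
trajectory = simulate_pathway(a0=1.0, k1=1.0, k2=0.5)
final_time, final_a, final_b, final_c = trajectory[-1]
```

Forward Euler is chosen here only because it is short and easy to read; real simulation tools generally use more robust integrators.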
Whole genome analysis
As more and more genomes are sequenced completely, the
analysis of raw genome data has become a more important
task. There are a number of perspectives from which one can
look at genome data: for example, it can be treated as a long
linear sequence, but it's often more useful to integrate DNA
sequence information with existing genetic and physical map
data. This allows you to navigate a very large genome and
find what you want.
Primer design
Many molecular biology protocols require the design of
oligonucleotide primers. Proper primer design is critical for the
success of polymerase chain reaction (PCR), oligo hybridization,
DNA sequencing, and microarray experiments. Primers must
hybridize with the target DNA to provide a clear answer to the
question being asked, but they must also have appropriate
physicochemical properties; they must not self-hybridize or
dimerize; and they should not have multiple targets within the
sequence under investigation. There are several web-based services
that allow users to submit a DNA sequence and automatically
detect appropriate primers, or to compute the properties of a
desired primer DNA sequence.
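The checks described above can be sketched in a few lines. This is a rough illustration, not the method of any specific primer design service: it computes GC content, estimates melting temperature with the simple Wallace rule (2 °C per A/T, 4 °C per G/C, a rule of thumb for short oligos), looks for self-complementary stretches, and counts exact binding sites on a template. All function names and the example sequences are assumptions made for this sketch.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def gc_content(primer):
    """Fraction of bases that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Wallace rule estimate of melting temperature in degrees Celsius."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def has_self_complementary_stretch(primer, min_len=4):
    """True if any window of min_len bases can pair with another part of the primer."""
    primer = primer.upper()
    for i in range(len(primer) - min_len + 1):
        window = primer[i:i + min_len]
        if reverse_complement(window) in primer:
            return True
    return False


def count_occurrences(needle, haystack):
    """Count possibly overlapping occurrences of needle in haystack."""
    count, start = 0, haystack.find(needle)
    while start != -1:
        count += 1
        start = haystack.find(needle, start + 1)
    return count


def count_binding_sites(primer, template):
    """Count exact primer matches on either strand of the template."""
    primer, template = primer.upper(), template.upper()
    rc = reverse_complement(primer)
    sites = count_occurrences(primer, template)
    if rc != primer:  # avoid double-counting palindromic primers
        sites += count_occurrences(rc, template)
    return sites


# Made-up primer and template, for illustration only.
primer = "ATGCGTACGTTAGC"
template = "GGATGCGTACGTTAGCCC"
report = {
    "gc": gc_content(primer),
    "tm": wallace_tm(primer),
    "self_complementary": has_self_complementary_stretch(primer),
    "sites": count_binding_sites(primer, template),
}
```

A primer would normally be rejected if `sites` is greater than one or if a long self-complementary stretch is found; the thresholds are a design choice left to the user.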
DNA microarray analysis
DNA microarray analysis is a relatively new molecular biology
method that expands on classic probe hybridization methods to
provide access to thousands of genes at once.
The main task in microarray analysis as it's currently done is
an image analysis step, in which individual spots on the array
image are identified and their signal intensities are measured.
Proteomics analysis
Before they're ever crystallized and biochemically characterized,
proteins are often studied using a combination of gel
electrophoresis, partial sequencing, and mass spectroscopy. 2-D gel
electrophoresis can separate a mixture of thousands of proteins into
distinct components; the individual spots of material can be blotted
or even cut from the gel and analyzed. Simple computational tools
can provide some information to aid in the process of analyzing
protein mixtures. It's trivial to compute molecular weight and pI
from a protein sequence; by using these values in combination, sets
of candidate identities can be found for each spot on a gel. It's also
possible to compute, from a protein sequence, the peptide
fingerprint that is created when that protein is broken down into
fragments by enzymes with specific protein cleavage sites.
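The two computations mentioned above can be sketched as follows, assuming standard average residue masses and the common simplified rule for trypsin (cleave after K or R unless the next residue is P). The function names and the example sequence are made up for illustration; pI calculation is left out to keep the sketch short.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def molecular_weight(sequence):
    """Average molecular weight of a peptide: residue masses plus one water."""
    return sum(RESIDUE_MASS[res] for res in sequence.upper()) + WATER


def tryptic_digest(sequence):
    """Split a protein after K or R, except when the next residue is P."""
    sequence = sequence.upper()
    peptides, start = [], 0
    for i, res in enumerate(sequence):
        next_res = sequence[i + 1] if i + 1 < len(sequence) else ""
        if res in "KR" and next_res != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_fingerprint(sequence):
    """List of (peptide, mass) pairs for the tryptic fragments of a protein."""
    return [(pep, round(molecular_weight(pep), 2)) for pep in tryptic_digest(sequence)]


# Made-up sequence, for illustration only.
fragments = tryptic_digest("AGKRPLKMW")
fingerprint = peptide_fingerprint("AGKRPLKMW")
```

Comparing such a computed list of fragment masses with the masses measured for a gel spot is one way to narrow down candidate identities.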
Databases
The internet is a powerful resource containing a large volume of data and tools
to manipulate them. Unfortunately, connecting data between them can
sometimes be tricky.
What is a database?
An organized body of related information. A collection of information organized and presented to serve a specific
purpose. A computerized database is an updated, organized file of machine-readable
information that is rapidly searched and retrieved by computer.
- computerized storehouse of data (records)
- allows user-defined queries
- allows extraction of specified records
- allows adding, changing, removing, and merging of records
- uses standardized formats
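These database operations can be illustrated with Python's built-in `sqlite3` module. The table layout, the accession numbers (`DEMO0001` and so on) and the helper names are invented for this sketch and do not correspond to any real sequence database.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database holding a few made-up sequence records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        "accession TEXT PRIMARY KEY, organism TEXT, length INTEGER)"
    )
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [
            ("DEMO0001", "Homo sapiens", 1500),
            ("DEMO0002", "Mus musculus", 900),
            ("DEMO0003", "Homo sapiens", 300),
        ],
    )
    return conn


def accessions_for(conn, organism, min_length=0):
    """User-defined query: accessions for one organism above a length cutoff."""
    rows = conn.execute(
        "SELECT accession FROM records "
        "WHERE organism = ? AND length >= ? ORDER BY accession",
        (organism, min_length),
    )
    return [row[0] for row in rows]


conn = build_demo_database()
human = accessions_for(conn, "Homo sapiens")  # extraction of specified records

# Changing and removing records.
conn.execute("UPDATE records SET length = 350 WHERE accession = 'DEMO0003'")
conn.execute("DELETE FROM records WHERE accession = 'DEMO0002'")

remaining = [
    row[0] for row in conn.execute("SELECT accession FROM records ORDER BY accession")
]
```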
The ideal sequence database for computational analyses and data-mining:
- It must be complete with minimal redundancy
- It must contain as much up-to-date information (annotation) as possible on each sequence
- All the information items must be retrievable by computer programs in a consistent manner
- It must be highly interoperable with other databases
Database Categories List
Nucleotide Sequence Databases
RNA sequence databases
Protein sequence databases
Structure Databases
Genomics Databases (non-vertebrate)
Metabolic and Signaling Pathways
Human and other Vertebrate Genomes
Human Genes and Diseases
Microarray Data and other Gene Expression Databases
Proteomics Resources
Other Molecular Biology Databases
Organelle databases
Plant databases
Immunological databases
Nucleotide Sequence Databases
The nucleotide sequence databases are data repositories, accepting nucleic
acid sequence data from the scientific community and making it freely
available. The databases strive for completeness, with the aim of recording
every publicly known nucleic acid sequence. These data are heterogeneous:
they vary with respect to the source of the material (e.g. genomic versus
cDNA), the intended quality (e.g. finished versus single pass sequences), the
extent of sequence annotation and the intended completeness of the
sequence relative to its biological target (e.g. complete versus partial
coverage of a gene or a genome). The nucleotide databases are distributed
free of charge over the internet.
DDBJ, GenBank and EMBL-Bank exchange new and updated data on a
daily basis to achieve optimal synchronisation. The result is that they
contain exactly the same information, except for sequences that have been
added in the last 24 hours.
Nucleotide Sequence Databases can be further subdivided into the following:
1) International Nucleotide Sequence Database Collaboration
2) Coding and non-coding DNA
3) Gene structure, introns and exons, splice sites
4) Transcriptional regulator sites and transcription factors
Nucleotide Sequence Databases
Database name Full name and/or description URL
1.1. International Nucleotide Sequence Database Collaboration
GenBank An annotated collection of all publicly available nucleotide and protein sequences http://www.ncbi.nlm.nih.gov/
EMBL Nucleotide Sequence Database An annotated collection of all publicly available nucleotide and protein sequences http://www.ebi.ac.uk/embl.html
DDBJ: DNA Data Bank of Japan An annotated collection of all publicly available nucleotide and protein sequences http://www.ddbj.nig.ac.jp
Online databases
Primary repositories of sequence data:
- European Bioinformatics Institute (EBI)
- DNA Data Bank of Japan (DDBJ)
- GenBank, National Center for Biotechnology Information (NCBI)
Each of these databases contains equivalent information (formats vary slightly).
1.2. DNA sequences: genes, motifs and regulatory sites
1.2.1. Coding and non-coding DNA
ACLAME A classification of genetic mobile elements http://aclame.ulb.ac.be/
CUTG Codon usage tabulated from GenBank http://www.kazusa.or.jp/codon/
Genetic Codes Deviations from the standard genetic code in various organisms and organelles http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
HERVd Human endogenous retrovirus database http://herv.img.cas.cz
IMGT/LIGM-DB Immunoglobulin, T cell receptor and MHC nucleotide sequences from human and other vertebrates http://imgt.cines.fr/cgi-bin/IMGTlect.jv
Imprinted Gene Catalogue Imprinted genes and parent-of-origin effects in animals http://www.otago.ac.nz/IGC
Islander Pathogenicity islands and prophages in bacterial genomes http://www.indiana.edu/islander
MICdb Prokaryotic microsatellites http://www.cdfd.org.in/micas
STRBase Short tandem DNA repeats database http://www.cstl.nist.gov/div831/strbase/
TIGR Gene Indices Organism-specific databases of EST and gene sequences http://www.tigr.org/tdb/tgi.shtml
Transterm Codon usage, start and stop signals http://uther.otago.ac.nz/Transterm.html
UniGene Unified clusters of ESTs and full-length mRNA sequences http://www.ncbi.nlm.nih.gov/UniGene/
UniVec Vector sequences, adapters, linkers and primers used in DNA cloning; can be used to check for vector contamination http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html
VectorDB Characterization and classification of nucleic acid vectors http://genome-www2.stanford.edu/vectordb/
Xpro Eukaryotic protein-encoding DNA sequences, both intron-containing and intron-less genes http://origin.bic.nus.edu.sg/xpro/
1.2.2. Gene structure, introns and exons, splice sites
ASAP Alternative spliced isoforms http://www.bioinformatics.ucla.edu/ASAP
ASD EBI's alternative splicing database project, which includes three databases: AltSplice, AltExtron and AEdb http://www.ebi.ac.uk/asd
ASDB Alternative splicing database: protein products and expression patterns of alternatively-spliced genes http://hazelton.lbl.gov/teplitski/alt
EASED Extended alternatively spliced EST database http://eased.bioinf.mdc-berlin.de/
EID Exon-intron database: introns in protein-coding genes http://mcb.harvard.edu/gilbert/EID/
ExInt Exon-intron structure of eukaryotic genes http://intron.bic.nus.edu.sg/exint/exint.html
HS3D Homo sapiens splice sites dataset http://www.sci.unisannio.it/docenti/rampone/
IDB/IEDB Intron sequence and evolution databases http://nutmeg.bio.indiana.edu/intron/index.html
Intronerator Introns and alternative splicing in C. elegans and C. briggsae http://www.cse.ucsc.edu/kent/intronerator/
SpliceDB Canonical and non-canonical mammalian splice sites http://genomic.sanger.ac.uk/spldb/SpliceDB.html
SpliceNest A tool for visualizing splicing of genes from EST data http://splicenest.molgen.mpg.de/
YIDB Yeast nuclear and mitochondrial intron sequences http://www.embl-heidelberg.DE/ExternalInfo/seraphin/yidb.html
1.2.3. Transcriptional regulator sites and transcription factors
ACTIVITY Functional DNA/RNA site activity http://util.bionet.nsc.ru/databases/activity.html
DBTBS Bacillus subtilis promoters and transcription factors http://dbtbs.hgc.jp/
DBTSS A database of transcriptional start sites http://dbtss.hgc.jp/
DPInteract Binding sites for E. coli DNA-binding proteins http://arep.med.harvard.edu/dpinteract
EPD Eukaryotic promoter database http://www.epd.isb-sib.ch
HemoPDB Hematopoietic promoter database: transcriptional regulation in hematopoiesis http://bioinformatics.med.ohio-state.edu/HemoPDB
HvrBase Primate mitochondrial DNA control region sequences http://www.hvrbase.org/
JASPAR PSSMs for transcription factor DNA-binding sites http://jaspar.cgb.ki.se
PLACE Plant cis-acting regulatory DNA elements http://www.dna.affrc.go.jp/htdocs/PLACE
PlantCARE Plant promoters and cis-acting regulatory elements http://intra.psb.ugent.be:8080/PlantCARE/
PlantProm Plant promoter sequences for RNA polymerase II http://mendel.cs.rhul.ac.uk/
RNA sequence databases
The RNA sequence databases compile complete or nearly complete RNA
sequences, either across all RNA types or for specific ones such as
ribosomal RNA. Some of them contain secondary structure information;
additional information about the sequences, such as the taxonomic
classification of the organism from which they were obtained, and
literature references are also provided. There are databases covering
16S and 23S ribosomal RNA mutations, 5S rRNA sequences, genomic tRNA,
all complete or nearly complete rRNA sequences, etc.
2. RNA sequence databases
16S and 23S rRNA Mutation Database 16S and 23S ribosomal RNA mutations http://ribosome.fandm.edu/
5S rRNA Database 5S rRNA sequences http://biobases.ibch.poznan.pl/5SData/
Aptamer database Small RNA/DNA molecules binding nucleic acids, proteins http://aptamer.icmb.utexas.edu/
ARED AU-rich element-containing mRNA database http://rc.kfshrc.edu.sa/ared
Mobile group II introns A database of group II introns, self-splicing catalytic RNAs http://www.fp.ucalgary.ca/group2introns/
European rRNA database All complete or nearly complete rRNA sequences http://www.psb.ugent.be/rRNA/
GtRDB Genomic tRNA database http://rna.wustl.edu/GtRDB
Guide RNA Database RNA editing in various kinetoplastid species http://biosun.bio.tu-darmstadt.de/goringer/gRNA/gRNA.html
HIV Sequence Database HIV RNA sequences http://hiv-web.lanl.gov/
HyPaLib Hybrid pattern library: structural elements in classes of RNA http://bibiserv.techfak.uni-bielefeld.de/HyPa/
IRESdb Internal ribosome entry site database http://ifr31w3.toulouse.inserm.fr/IRESdatabase/
miRNA Registry Database of microRNAs (small non-coding RNAs) http://www.sanger.ac.uk/Software/Rfam/mirna/
NCIR Non-canonical interactions in RNA structures http://prion.bchs.uh.edu/bp_type/
ncRNAs Database Non-coding RNAs with regulatory functions http://biobases.ibch.poznan.pl/ncRNA/
PLANTncRNAs Plant non-coding RNAs http://www.prl.msu.edu/PLANTncRNAs
Plant snoRNA DB snoRNA genes in plant species http://www.scri.sari.ac.uk/plant_snoRNA/
PLMItRNA Plant mitochondrial tRNA http://bighost.area.ba.cnr.it/PLMItRNA/
PseudoBase Database of RNA pseudoknots http://wwwbio.leidenuniv.nl/~Batenburg/PKB.html
RDP Ribosomal database project: rRNA sequence data http://rdp.cme.msu.edu
Rfam Non-coding RNA families http://www.sanger.ac.uk/Software/Rfam/
RISSC Ribosomal internal spacer sequence collection http://ulises.umh.es/RISSC
RNA Modification Database Naturally modified nucleosides in RNA http://medlib.med.utah.edu/RNAmods/
RRNDB rRNA operon numbers in various prokaryotes http://rrndb.cme.msu.edu/
Small RNA Database Small RNAs from prokaryotes and eukaryotes http://mbcr.bcm.tmc.edu/smallRNA
SRPDB Signal recognition particle database http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html
Subviral RNA Database Viroids and viroid-like RNAs http://subviral.med.uottawa.ca/cgi-bin/home.cgi
tmRNA Website tmRNA sequences and alignments http://www.indiana.edu/tmrna
tmRDB tmRNA database http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html
tRNA database tRNA viewer and sequence editor http://www.uni-bayreuth.de/departments/biochemie/trna/
UTRdb/UTRsite 5'- and 3'-UTRs of eukaryotic mRNAs http://bighost.area.ba.cnr.it/srs6/
Types of protein databases
1. Protein sequence databases (example sequence in figure: SCIENCEISFN)
2. Protein motif databases (figure: GLAWEWINQTR aligned with GREWEWINES)
3. Protein structure databases
Protein sequence databases
The protein databases are the most comprehensive source of information
on proteins. It is necessary to distinguish between universal databases
covering proteins from all species and specialised data collections storing
information about specific families or groups of proteins, or about the
proteins of a specific organism.
Two categories of universal protein databases can be discerned: simple
archives of sequence data, and annotated databases where additional
information has been added to the sequence record.
The upcoming slides list databases such as:
- Primary protein sequence databases such as UniProt/Swiss-Prot
- Specialised protein sequence databases such as GOA
- Specialised protein databases such as ENZYME
- Secondary protein databases such as InterPro
- Structure databases such as PDB
3. Protein sequence databases
3.1. General sequence databases
EXProt Sequences of proteins with experimentally verified function http://www.cmbi.kun.nl/EXProt/
NCBI Protein database All protein sequences: translated from GenBank and imported from other protein databases http://www.ncbi.nlm.nih.gov/entrez
PIR Protein information resource: a collection of protein sequence databases, part of the UniProt project http://pir.georgetown.edu/
PIR-NREF PIR's non-redundant reference protein database http://pir.georgetown.edu/pirwww/pirnref.shtml
PRF Protein research foundation database of peptides: sequences, literature and unnatural amino acids http://www.prf.or.jp/en
Swiss-Prot Curated protein sequence database with a high level of annotation (protein function, domain structure, modifications) http://www.expasy.org/sprot
TrEMBL Translations of EMBL nucleotide sequence entries: computer-annotated supplement to Swiss-Prot http://www.expasy.org/sprot
UniProt Universal protein knowledgebase: a database of protein sequence from Swiss-Prot, TrEMBL and PIR http://www.uniprot.org/
3.2. Protein properties
AAindex Physicochemical properties of amino acids http://www.genome.ad.jp/aaindex/
ProTherm Thermodynamic data for wild-type and mutant proteins http://gibk26.bse.kyutech.ac.jp/jouhou/Protherm/protherm.html
3.3. Protein localization and targeting
DBSubLoc Database of protein subcellular localization http://www.bioinfo.tsinghua.edu.cn/dbsubloc.html
MitoDrome Nuclear-encoded mitochondrial proteins of Drosophila http://bighost.area.ba.cnr.it/BIG/MitoDrome
NESbase Nuclear export signals database http://www.cbs.dtu.dk/databases/NESbase
NLSdb Nuclear localization signals http://cubic.bioc.columbia.edu/db/NLSdb/
THGS Transmembrane helices in genome sequences http://pranag.physics.iisc.ernet.in/thgs/
TMPDB Experimentally characterized transmembrane topologies http://bioinfo.si.hirosaki-.ac.jp/TMPDB/
3.4. Protein sequence motifs and active sites
ASC Active sequence collection: biologically active peptides http://bioinformatica.isa.cnr.it/ASC/
Blocks Alignments of conserved regions in protein families http://blocks.fhcrc.org/
CSA Catalytic site atlas: enzyme active sites and catalytic residues in enzymes of known 3D structure http://www.ebi.ac.uk/thornton-srv/databases/CSA/
COMe Co-ordination of metals etc.: classification of bioinorganic proteins (metalloproteins and some other complex proteins) http://www.ebi.ac.uk/come
eMOTIF Protein sequence motif determination and searches http://motif.stanford.edu/emotif
Metalloprotein Site Database Metal-binding sites in metalloproteins http://metallo.scripps.edu/
O-GlycBase O- and C-linked glycosylation sites in proteins http://www.cbs.dtu.dk/databases/OGLYCBASE/
PhosphoBase Protein phosphorylation sites http://www.cbs.dtu.dk/databases/PhosphoBase/
PROMISE Prosthetic centers and metal ions in protein active sites http://metallo.scripps.edu/PROMISE
PROSITE Biologically significant protein patterns and profiles http://www.expasy.org/prosite
3.5. Protein domain databases; protein classification
CDD Conserved domain database: includes protein domains from Pfam, SMART and COG databases http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
CluSTr Clusters of Swiss-Prot+TrEMBL proteins http://www.ebi.ac.uk/clustr
Hits A database of protein domains and motifs http://hits.isb-sib.ch/
InterPro Integrated resource of protein families, domains and functional sites http://www.ebi.ac.uk/interpro
iProClass Integrated protein classification database http://pir.georgetown.edu/iproclass/
MetaFam Database of protein family annotations http://metafam.ahc.umn.edu/
Pfam Protein families: multiple sequence alignments and profile hidden Markov models of protein domains http://www.sanger.ac.uk/Software/Pfam/
PIRSF Family/superfamily classification of whole proteins http://pir.georgetown.edu/pirsf/
PRINTS Hierarchical gene family fingerprints http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/
PIR-ALN Curated database of protein sequence alignments http://pir.georgetown.edu/pirwww/dbinfo/piraln.html
ProClass Protein families defined by PIR superfamilies and PROSITE patterns http://pir.georgetown.edu/gfserver/proclass.html
ProDom Protein domain families http://www.toulouse.inra.fr/prodom.html
ProtoMap Hierarchical classification of Swiss-Prot proteins http://protomap.cornell.edu/
ProtoNet Hierarchical clustering of Swiss-Prot proteins http://www.protonet.cs.huji.ac.il/
SBASE Protein domain sequences and tools http://www.icgeb.org/sbase
SMART Simple modular architecture research tool: signalling, extracellular and chromatin-associated protein domains http://smart.embl-heidelberg.de/
SUPFAM Grouping of sequence families into superfamilies http://pauling.mbu.iisc.ernet.in/supfam
SYSTERS Systematic re-searching and clustering of proteins http://systers.molgen.mpg.de/
TIGRFAMs TIGR protein families adapted for functional annotation http://www.tigr.org/TIGRFAMs
3.6. Databases of individual protein families
AARSDB Aminoacyl-tRNA synthetase database http://rose.man.poznan.pl/aars/index.html
ABCdb ABC transporters database http://ir2lcb.cnrs-mrs.fr/ABCdb/
ASPD Artificial selected proteins/peptides database http://wwwmgs.bionet.nsc.ru/mgs/gnw/aspd/
BacTregulators Transcriptional regulators of AraC and TetR families http://www.bactregulators.org/
CSDBase Cold shock domain-containing proteins http://www.chemie.uni-marburg.de/~csdbase/
DExH/D Family Database DEAD-box, DEAH-box and DExH-box proteins http://www.helicase.net/dexhd/dbhome.htm
Endogenous GPCR List G protein-coupled receptors; expression in cell lines http://www.tumor-gene.org/GPCR/gpcr.html
ESTHER Esterases and other alpha/beta hydrolase enzymes http://www.ensam.inra.fr/esther
EyeSite Families of proteins functioning in the eye http://eyesite.cryst.bbk.ac.uk/
GPCRDB G protein-coupled receptors database http://www.gpcr.org/7tm/
Histone Database Histone fold sequences and structures http://research.nhgri.nih.gov/histones/
HIV Molecular Immunology Database HIV epitopes http://hiv-web.lanl.gov/immunology/
HIV Protease Database HIV reverse transcriptase and protease sequences http://hivdb.stanford.edu/
Homeobox Page Homeobox proteins, classification and evolution http://www.biosci.ki.se/groups/tbu/homeo.html
Homeodomain Resource Homeodomain sequences, structures and related genetic and genomic information http://research.nhgri.nih.gov/homeodomain
HORDE Human olfactory receptor data exploratorium http://bioinfo.weizmann.ac.il/HORDE/
InBase Inteins (protein splicing elements) database: properties, sequences, bibliography http://www.neb.com/neb/inteins.html
Kabat Database Sequences of proteins of immunological interest http://immuno.bme.nwu.edu/
KinG Ser/Thr/Tyr-specific protein kinases encoded in complete genomes http://hodgkin.mbu.iisc.ernet.in/king
Knottins Database of knottins: small proteins with an unusual 'disulfide through disulfide' knot http://knottin.cbs.cnrs.fr
LGICdb Ligand-gated ion channel subunit sequences database http://www.pasteur.fr/recherche/banques/LGIC/LGIC.html
Lipase Engineering Database Sequence, structure and function of lipases and esterases http://www.led.uni-stuttgart.de/
LOX-DB Mammalian, invertebrate, plant and fungal lipoxygenases http://www.dkfz-heidelberg.de/spec/lox-db/
MEROPS Database of proteolytic enzymes (peptidases) http://www.merops.ac.uk/
MHCPEP MHC-binding peptides http://wehih.wehi.edu.au/mhcpep/
MPIMP Mitochondrial protein import machinery of plants http://millar3.biochem.uwa.edu.au/~lister/index.html
NPD Nuclear protein database http://npd.hgu.mrc.ac.uk/
NucleaRDB Nuclear receptor superfamily http://www.receptors.org/NR/
Nuclear Receptor Resource Nuclear receptor superfamily http://nrr.georgetown.edu/nrr/nrr.html
NUREBASE Nuclear hormone receptors database http://www.ens-lyon.fr/LBMC/laudet/nurebase/nurebase.html
Olfactory Receptor Database Sequences for olfactory receptor-like molecules http://ycmi.med.yale.edu/senselab/ordb/
ooTFD Object-oriented transcription factors database http://www.ifti.org/ootfd
PKR Protein kinase resource: sequences, enzymology, genetics and molecular and structural properties http://pkr.sdsc.edu/
PLANT-PIs Plant protease inhibitors http://bighost.area.ba.cnr.it/PLANT-PIs
PlantsP/PlantsT Plant proteins involved in phosphorylation and membrane transport http://plantsp.sdsc.edu/
Prolysis Proteases and natural and synthetic protease inhibitors http://delphi.phys.univ-tours.fr/Prolysis/
REBASE Restriction enzymes and associated methylases http://rebase.neb.com/rebase/rebase.html
Ribonuclease P Database RNase P sequences, alignments and structures http://www.mbio.ncsu.edu/RNaseP/home.html
RPG Ribosomal protein gene database http://ribosome.miyazaki-med.ac.jp/
RTKdb Receptor tyrosine kinase sequences http://pbil.univ-lyon1.fr/RTKdb/
S/MARt DB Nuclear scaffold/matrix attached regions http://smartdb.bioinf.med.uni-goettingen.de/
SDAP Structural database of allergenic proteins and food allergens http://fermi.utmb.edu/SDAP
SENTRA Sensory signal transduction proteins http://wit.mcs.anl.gov/WIT2/Sentra/HTML/sentra.html
SEVENS 7-transmembrane helix receptors (G-protein-coupled) http://sevens.cbrc.jp/
SRPDB Proteins of the signal recognition particles http://bio.lundberg.gu.se/dbs/SRPDB/SRPDB.html
TrSDB Transcription factor database http://ibb.uab.es/trsdb
VIDA Homologous viral protein families database http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html
VKCDB Voltage-gated potassium channel database http://vkcdb.biology.ualberta.ca/
Wnt Database Wnt proteins and phenotypes http://www.stanford.edu/rnusse/wntwindow.html
8/6/2019 INDO Thai What is Bioinformatics,A.sharMA
http://slidepdf.com/reader/full/indo-thai-what-is-bioinformaticsasharma 83/156
Database Categories List
Nucleotide Sequence Databases
RNA sequence databases
Protein sequence databases
Structure Databases
Genomics Databases (non-vertebrate)
Metabolic and Signaling Pathways
Human and other Vertebrate Genomes
Human Genes and Diseases
Microarray Data and other Gene Expression Databases
Proteomics Resources
Other Molecular Biology Databases
Organelle databases
Plant databases
Immunological databases
Structure Databases
The number of known molecular structures is increasing
very rapidly, and these structures are available through
databases that hold structural information on specific
classes of molecule. The subcategories in this division
of molecular databases are:
1) Small molecules
2) Carbohydrates
3) Nucleic acid structure
4) Protein structure
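The subcategories of structure databases can be modelled as a small nested catalogue. The sketch below is only an illustration: it copies a handful of entries from the listings in section 4, and the names `STRUCTURE_DATABASES` and `find_entries` are hypothetical helpers introduced here, not part of any of the listed resources.

```python
# Illustrative catalogue: subcategory -> list of (name, description, url).
# Only a few entries from the slides are included; the layout is an assumption.
STRUCTURE_DATABASES = {
    "Small molecules": [
        ("CSD", "Cambridge structural database",
         "http://www.ccdc.cam.ac.uk/prods/csd/csd.html"),
        ("LIGAND", "Chemical compounds and reactions in biological pathways",
         "http://www.genome.ad.jp/ligand/"),
    ],
    "Carbohydrates": [
        ("Glycan", "Carbohydrate database, part of the KEGG system",
         "http://glycan.genome.ad.jp/"),
    ],
    "Nucleic acid structure": [
        ("NDB", "Nucleic acid-containing structures",
         "http://ndbserver.rutgers.edu/"),
    ],
    "Protein structure": [
        ("PDB", "Protein structure databank", "http://www.rcsb.org/pdb"),
        ("SCOP", "Structural classification of proteins",
         "http://scop.mrc-lmb.cam.ac.uk/scop"),
    ],
}


def find_entries(catalogue, keyword):
    """Return (subcategory, name) pairs whose name or description contains keyword."""
    keyword = keyword.lower()
    hits = []
    for subcategory, entries in catalogue.items():
        for name, description, _url in entries:
            if keyword in name.lower() or keyword in description.lower():
                hits.append((subcategory, name))
    return hits


if __name__ == "__main__":
    for subcategory, name in find_entries(STRUCTURE_DATABASES, "protein"):
        print(subcategory, "->", name)
```

A case-insensitive substring match is the simplest possible search; a real catalogue would likely need controlled keywords rather than free-text descriptions.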
4. Structure Databases
4.1. Small molecules
CSD Cambridge structural database: crystal structure information for organic and metal-organic compounds http://www.ccdc.cam.ac.uk/prods/csd/csd.html
HIC-Up Hetero-compound Information Centre - Uppsala http://xray.bmc.uu.se/hicup
AANT Amino acid-nucleotide interaction database http://aant.icmb.utexas.edu/
Klotho Collection and categorization of biological compounds http://www.biocheminfo.org/klotho
LIGAND Chemical compounds and reactions in biological pathways http://www.genome.ad.jp/ligand/
4.2. Carbohydrates
CCSD Complex carbohydrate structure database (CarbBank) http://bssv01.lancs.ac.uk/gig/pages/gag/carbbank.htm
Glycan Carbohydrate database, part of the KEGG system http://glycan.genome.ad.jp/
GlycoSuiteDB N- and O-linked glycan structures and biological sources http://www.glycosuite.com/
Monosaccharide Browser Space filling Fischer projections of monosaccharides http://www.jonmaber.demon.co.uk/monosaccharide
SWEET-DB Annotated carbohydrate structure and substance information http://www.dkfz-heidelberg.de/spec2/sweetdb/
4.3. Nucleic acid structure
NDB Nucleic acid-containing structures http://ndbserver.rutgers.edu/
NTDB Thermodynamic data for nucleic acids http://ntdb.chem.cuhk.edu.hk/
RNABase RNA-containing structures from PDB and NDB http://www.rnabase.org/
SCOR Structural classification of RNA: RNA motifs by structure, function and tertiary interactions http://scor.lbl.gov/
PRODORIC NET Prokaryotic database of gene regulation networks http://prodoric.tu-bs.de/
PromEC E.coli promoters with experimentally-identified transcriptional start sites http://bioinfo.md.huji.ac.il/marg/promec
SELEX_DB DNA and RNA binding sites for various proteins, found by systematic evolution of ligands by exponential enrichment http://wwwmgs.bionet.nsc.ru/mgs/systems/selex/
TESS Transcription element search system http://www.cbil.upenn.edu/tess
TRANSCompel Composite regulatory elements affecting gene transcription in eukaryotes http://www.gene-regulation.com/pub/databases.html#transcompel
TRANSFAC Transcription factors and binding sites http://transfac.gbf.de/TRANSFAC/index.html
TRRD Transcription regulatory regions of eukaryotic genes http://www.bionet.nsc.ru/trrd/
4.4. Protein structure
ArchDB Automated classification of protein loop structures http://gurion.imim.es/archdb
ASTRAL Sequences of domains of known structure, selected subsets and sequence-structure correspondences http://astral.stanford.edu/
BAliBASE A database for comparison of multiple sequence alignments http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE2/index.html
BioMagResBank NMR spectroscopic data for proteins and nucleic acids http://www.bmrb.wisc.edu/
CADB Conformational angles in proteins database http://cluster.physics.iisc.ernet.in/cadb/
CATH Protein domain structures database http://www.biochem.ucl.ac.uk/bsm/cath_new
CE 3D Protein structure alignments http://cl.sdsc.edu/ce.html
CKAAPs DB Structurally-similar proteins with dissimilar sequences http://ckaap.sdsc.edu/
Dali Protein fold classification using the Dali search engine http://www.bioinfo.biocenter.helsinki.fi:8080/dali/
Decoys 'R' Us Computer-generated protein conformations http://dd.stanford.edu/
DisProt Database of Protein Disorder: information about proteins that lack fixed 3D structure in their native states http://divac.ist.temple.edu/disprot
DomIns Domain insertions in known protein structures http://stash.mrc-lmb.cam.ac.uk/DomIns
DSDBASE Native and modeled disulfide bonds in proteins http://www.ncbs.res.in/~faculty/mini/dsdbase/dsdbase.html
DSMM Database of simulated molecular motions http://projects.villaosch.de/dbase/dsmm/
eF-site Electrostatic surface of Functional site: electrostatic potentials and hydrophobic properties of the active sites http://ef-site.protein.osaka-u.ac.jp/eF-site
FSSP Fold classification based on structure-structure alignment of proteins, currently maintained as Dali database http://www.ebi.ac.uk/dali/fssp
Gene3D Precalculated structural assignments for whole genomes http://www.biochem.ucl.ac.uk/bsm/cath_new/Gene3D/
GTD Genomic threading database: structural annotations of complete genomes http://bioinf.cs.ucl.ac.uk/GTD
GTOP Protein fold predictions from genome sequences http://spock.genes.nig.ac.jp/~genome/
Het-PDB Navi Hetero-atoms in protein structures http://daisy.nagahama-i-bio.ac.jp/golab/hetpdbnavi.html
HOMSTRAD Homologous structure alignment database: curated structure-based alignments for protein families http://www-cryst.bioc.cam.ac.uk/homstrad
IMB Jena Image Library Visualization and analysis of 3D biopolymer structures http://www.imb-jena.de/IMAGE.html
IMGT/3Dstructure-DB Sequences and 3D structures of vertebrate immunoglobulins, T cell receptors and MHC proteins http://imgt3d.igh.cnrs.fr
ISSD Integrated sequence-structure database http://www.protein.bio.msu.su/issd
LPFC Library of protein family core structures http://www-smi.stanford.edu/projects/helix/LPFC
MMDB NCBI's database of 3D structures, part of NCBI Entrez http://www.ncbi.nlm.nih.gov/Structure
E-MSD EBI's macromolecular structure database http://www.ebi.ac.uk/msd
ModBase Annotated comparative protein structure models http://salilab.org/modbase
MolMovDB Database of macromolecular movements: descriptions of protein and macromolecular motions, including movies http://bioinfo.mbb.yale.edu/MolMovDB/
PALI Phylogeny and alignment of homologous protein structures http://pauling.mbu.iisc.ernet.in/~pali
PASS2 Structural motifs of protein superfamilies http://ncbs.res.in/~faculty/mini/campass/pass.html
PepConfDB A database of peptide conformations http://202.41.70.49:8080/pepconfdb/index.htm
PDB Protein structure databank: all publicly available 3D structures of proteins and nucleic acids http://www.rcsb.org/pdb
PDB-REPRDB Representative protein chains, based on PDB entries http://www.cbrc.jp/pdbreprdb/
PDBsum Summaries and analyses of PDB structures http://www.biochem.ucl.ac.uk/bsm/pdbsum
SCOP Structural classification of proteins http://scop.mrc-lmb.cam.ac.uk/scop
Sloop Classification of protein loops http://www-cryst.bioc.cam.ac.uk/~sloop/
Structure-Superposition Database Pairwise superposition of TIM-barrel structures http://ssd.rbvi.ucsf.edu/
SWISS-MODEL Repository Database of annotated 3D protein structure models http://swissmodel.expasy.org/repository
SUPERFAMILY Assignments of proteins to structural superfamilies http://supfam.org/
SURFACE Surface residues and functions annotated, compared and evaluated: a database of protein surface patches http://cbm.bio.uniroma2.it/surface
TargetDB Target data from worldwide structural genomics projects http://targetdb.pdb.org/
3D-GENOMICS Structural annotations for complete proteomes http://www.sbg.bio.ic.ac.uk/3dgenomics
TOPS Topology of protein structures database http://www.tops.leeds.ac.uk
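Each catalogue entry in these listings follows a name, description, URL pattern. Once an entry sits on a single line, it can be split into fields with a few lines of Python. This is a rough sketch under two stated assumptions: the URL, when present, is the last whitespace-delimited token, and the name is the first token (which is not true for multi-word names such as "IMB Jena Image Library"). The function name `parse_entry` is hypothetical.

```python
def parse_entry(line):
    """Split a 'NAME description URL' line into a dict.

    Assumptions (not guaranteed by the source listings):
    - the URL, if any, is the final token and starts with http://, https:// or www.
    - the database name is the first token only
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty catalogue line")

    url = None
    if len(tokens) > 1 and tokens[-1].startswith(("http://", "https://", "www.")):
        url = tokens.pop()  # remove the URL so it is not part of the description

    name = tokens[0]
    description = " ".join(tokens[1:])
    return {"name": name, "description": description, "url": url}


if __name__ == "__main__":
    entry = parse_entry(
        "SCOP Structural classification of proteins http://scop.mrc-lmb.cam.ac.uk/scop"
    )
    print(entry["name"], "|", entry["description"], "|", entry["url"])
```

Entries whose URL was truncated in the source parse with `url` set to `None` and the fragment left in the description, so they are easy to spot and check by hand.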
Genomics Databases
For organisms of major interest to geneticists, there is a long history of conventionally
published catalogues of genes or mutations. In the past few years, most of these have
been made available in electronic form, and a variety of new databases have been
developed. These databases vary greatly in the classes of data captured and in how the
data are stored. This category comprises databases holding information on the genomes
of humans, plants, viruses, invertebrates, microbes and other organisms. Its subcategories are:
1) Genome annotation terms, ontologies and nomenclature
2) Taxonomy and identification
3) General genomics databases
4) Viral genome databases
5) Prokaryotic genome databases
6) Unicellular eukaryotes genome databases
7) Fungal genome databases
8) Invertebrate genome databases
9) Human genome databases, maps and viewers
5. Genomics Databases (non-human)
5.1. Genome annotation terms, ontologies and nomenclature
Genew Human gene nomenclature: approved gene symbols http://www.gene.ucl.ac.uk/nomenclature
GO Gene ontology consortium database http://www.geneontology.org/
GOA Gene ontology annotation project http://www.ebi.ac.uk/GOA
IUBMB Nomenclature database Nomenclature of enzymes, membrane transporters, electron transport proteins and other proteins htt
IUPAC Nomenclature database Nomenclature of biochemical and organic compounds approved by the IUBMB-IUPAC Joint Commission http://www.chem.qmul.ac.uk/iupac
IUPHAR-RD The International Union of Pharmacology recommendations on receptor nomenclature and drug classification http://www.iuphar-db.org/iuphar-rd/
PANTHER Gene products organized by biological function http://panther.celera.com/
SOURCE Functional genomic resource for annotations, ontologies and expression data http://source.stanford.edu/
UMLS Unified medical language system http://umlsks.nlm.nih.gov/
5.1.1. Taxonomy and Identification
ICB gyrB database for identification and classification of bacteria http://www.mbio.co.jp/icb
NCBI Taxonomy Names and taxonomic lineages of all organisms in GenBank http://www.ncbi.nlm.nih.gov/Taxonomy/
RIDOM rRNA-based differentiation of medical microorganisms http://www.ridom-rdna.de/
RDP Ribosomal database project http://rdp.cme.msu.edu
Tree of Life Information on phylogeny and biodiversity http://phylogeny.arizona.edu/tree/phylogeny.html
5.2. General genomics databases
COG Clusters of orthologous groups of proteins from unicellular microorganisms http://www.ncbi.nlm.nih.gov/COG
CORG Comparative regulatory genomics: conserved non-coding sequence blocks http://corg.molgen.mpg.de/
DEG Database of essential genes from bacteria and yeast http://tubic.tju.edu.cn/deg
EBI Genomes EBI's collection of databases for the analysis of complete and unfinished viral, pro- and eukaryotic genomes http://www.ebi.ac.uk/genomes
EGO Eukaryotic gene orthologs: orthologous DNA sequences in the TIGR gene indices http://www.tigr.org/tdb/tgi/ego/
EMGlib Enhanced microbial genomes library: completely sequenced genomes of unicellular organisms http://pbil.univ-lyon1.fr/emglib/emglib.html
Entrez Genomes NCBI's collection of databases for the analysis of complete and unfinished viral, pro- and eukaryotic genomes http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome
ERGOLight Integrated biochemical data on seven bacterial genomes: publicly available portion of the ERGO database http://www.ergo-light.com/ERGO
FusionDB Database of bacterial and archaeal gene fusion events http://igs-server.cnrs-mrs.fr/FusionDB
Genome information broker DDBJ's collection of databases for the analysis of complete and unfinished viral, pro- and eukaryotic genomes http://gib.genes.nig.ac.jp
GOLD Genomes online database: a listing of completed and ongoing genome projects http://www.genomesonline.org/
TIGR Microbial Database Lists of completed and ongoing genome projects with links to complete genome sequences http://www.tigr.org/tdb/mdb/mdbcomplete.html
HGT-DB Putative horizontally transferred genes in prokaryotic genomes http://www.fut.es/~debb/HGT/
KEGG Kyoto encyclopedia of genes and genomes: integrated suite of databases on genes, proteins, and metabolic pathways http://www.genome.ad.jp/kegg
MBGD Microbial genome database for comparative analysis http://mbgd.genome.ad.jp/
ORFanage Database of orphan ORFs (ORFs with no homologs) in complete microbial genomes http://www.cs.bgu.ac.il/~nomsiew/ORFans
PACRAT Archaeal and bacterial intergenic sequence features http://www.biosci.ohio-state.edu/~pacrat
PEDANT Results of an automated analysis of genomic sequences http://pedant.gsf.de
TIGR Comprehensive Microbial Resource Various data on complete microbial genomes: uniform annotation, properties of DNA and predicted proteins http://www.tigr.org/CMR
TransportDB Predicted membrane transporters in complete genomes, classified according to the TC classification system http://www.membranetransport.org
WIT What is there? Metabolic reconstruction for completely sequenced microbial genomes http://wit.mcs.anl.gov/WIT2/
5.3. Organism-specific genomic databases
5.3.1. Viruses
HCVDB The hepatitis C virus database http://hepatitis.ibcp.fr/
HIV Drug Resistance Database Mutations in HIV genes that confer resistance to anti-HIV drugs http://resdb.lanl.gov/Resist_DB/default.htm
VirGen Annotated and curated database for complete viral genome sequences http://bioinfo.ernet.in/virgen/virgen.html
5.3.2. Prokaryotes
5.3.2.1. Escherichia coli
ASAP A systematic annotation package for community analysis of E.coli and related genomes https://asap.ahabs.wisc.edu/annotation/php/ASAP1.htm
CCDB CyberCell database: E.coli database at U. Alberta http://redpoll.pharmacy.ualberta.ca/CCDB
coliBase A database for E.coli, Salmonella and Shigella http://colibase.bham.ac.uk/
Colibri E.coli genome database at Institut Pasteur http://genolist.pasteur.fr/Colibri/
Essential genes in E.coli First results of an E.coli gene deletion project http://magpie.genome.wisc.edu/~chris/essential.html
GenoBase E.coli genome database at Nara Institute http://ecoli.aist-nara.ac.jp/
GenProtEC E.coli K-12 genome and proteome database http://genprotec.mbl.edu
PEC Profiling of E.coli chromosome http://shigen.lab.nig.ac.jp/ecoli/pec
EcoCyc E.coli K-12 genes, metabolic pathways, transporters, and gene regulation http://ecocyc.org/
EcoGene Sequence and literature data on E.coli genes and proteins http://bmb.med.miami.edu/EcoGene/EcoWeb/
RegulonDB Transcriptional regulation and operon organization in E.coli http://www.cifn.unam.mx/Computational_Genomics/regulondb/
5.3.2.2. Bacillus subtilis
BSORF Bacillus subtilis genome database at Kyoto U. http://bacillus.genome.ad.jp/
NRSub Non-redundant Bacillus subtilis database at U. Lyon http://pbil.univ-lyon1.fr/nrsub/nrsub.html
SubtiList Bacillus subtilis genome database at Institut Pasteur http://genolist.pasteur.fr/SubtiList/
5.3.2.3. Other bacteria
BioCyc Pathway/genome databases for many bacteria http://biocyc.org/
CampyDB Database for Campylobacter genome analysis http://campy.bham.ac.uk/
ClostriDB Finished and unfinished genomes of Clostridium spp. http://clostri.bham.ac.uk/
CyanoBase Cyanobacterial genomes http://www.kazusa.or.jp/cyano
LeptoList Leptospira interrogans genome http://bioinfo.hku.hk/LeptoList
MolliGen Genomic data on mollicutes http://cbi.labri.fr/outils/molligen/
RsGDB Rhodobacter sphaeroides genome http://www-mmg.med.uth.tmc.edu/sphaeroides
5.3.3. Unicellular eukaryotes
5.3.3.1. Yeast
SGD Saccharomyces genome database http://www.yeastgenome.org/
CYGD MIPS Comprehensive yeast genome database http://mips.gsf.de/proj/yeast
Génolevures A comparison of S.cerevisiae and 14 other yeast species http://cbi.labri.fr/Genolevures
MitoPD Yeast mitochondrial protein database http://bmerc-www.bu.edu/mito
5.3.3.2. Other unicellular eukaryotes
ApiEST-DB EST sequences from various Apicomplexan parasites http://www.cbil.upenn.edu/paradbs-servlet
CryptoDB Cryptosporidium parvum genome database http://cryptodb.org/
DictyBase Genome information, literature and experimental resources for Dictyostelium discoideum http://dictybase.org/
Full-Malaria Full-length cDNA library from erythrocytic-stage Plasmodium falciparum http://fullmal.ims.u-tokyo.ac.jp/
GeneDB Curated database for Trypanosoma brucei, Leishmania major, S.pombe and other Sanger-sequenced genomes http://www.genedb.org/
PlasmoDB Plasmodium genome database http://plasmodb.org/
TcruziDB Trypanosoma cruzi genome database http://tcruzidb.org/
ToxoDB Toxoplasma gondii genome database http://toxodb.org/
5.3.4. Plants
5.3.4.1. General plant databases
CropNet Genome mapping in crop plants http://ukcrop.net/
FLAGdb++ Integrative database about plant genomes http://genoplante-info.infobiogen.fr/FLAGdb/
GénoPlante-Info Plant genomic data from the Génoplante consortium http://genoplante-info.infobiogen.fr/
GrainGenes Molecular and phenotypic information on wheat, barley, rye, triticale and oats http://wheat.pw.usda.gov or http://www.graingenes.org
Mendel Database of plant EST and STS sequences annotated with gene family information http://www.mendel.ac.uk/
PHYTOPROT Clusters of (predicted) plant proteins http://genoplante-info.infobiogen.fr/phytoprot
PlantGDB Plant genome database: actively-transcribed plant genomic sequences http://www.plantgdb.org/
Sputnik Plant EST clustering and functional annotation http://mips.gsf.de/proj/sputnik
TIGR plant repeat database Classification of repetitive sequences in plant genomes http://www.tigr.org/tdb/e2k1/plant.repeats
TropGENE DB Genetic and genomic information about tropical crops: sugarcane, banana, cocoa http://tropgenedb.cirad.fr/
5.3.4.2. Arabidopsis thaliana
ARAMEMNON Arabidopsis thaliana membrane proteins and transporters http://aramemnon.botanik.uni-koeln.de/
AthaMap Genome-wide map of putative transcription factor binding sites in Arabidopsis thaliana http://www.athamap.de/
CATMA Complete Arabidopsis transcriptome microarray: gene sequence tags http://www.catma.org
FLAGdb/FST Arabidopsis thaliana T-DNA transformants http://genoplante-info.infobiogen.fr/
MAtDB MIPS Arabidopsis thaliana database http://mips.gsf.de/proj/thal/db
SeedGenes Genes essential for Arabidopsis development http://www.seedgenes.org/
TAIR The Arabidopsis information resource http://www.arabidopsis.org/
5.3.4.3. Rice
BGI-RISe Beijing genomics institute rice information system http://rise.genomics.org.cn/
INE Integrated rice genome explorer http://rgp.dna.affrc.go.jp/giot/INE.html
IRIS International rice information system: all rice data http://www.iris.irri.org/
MOsDB MIPS Oryza sativa database http://mips.gsf.de/proj/rice
Oryzabase Rice genetics and genomics http://www.shigen.nig.ac.jp/rice/oryzabase/
RiceGAAS Rice genome automated annotation system http://ricegaas.dna.affrc.go.jp/
Rice PIPELINE Unification tool for rice databases http://cdna01.dna.affrc.go.jp/PIPE
RPD Rice proteome database http://gene64.dna.affrc.go.jp/RPD/
5.3.4.4. Other plants
MaizeGDB Maize genetics and genomics database, a successor to MaizeDB and ZmDB databases http://www.maizegdb.org/
MGI Medicago genome initiative: ESTs, gene expression and proteomic data http://xgi.ncgr.org/mgi
MtDB Medicago truncatula genome http://www.medicago.org/MtDB
SGMD Soybean genomics and microarray database http://psi081.ba.ars.usda.gov/SGMD/default.htm
5.3.5. Fungi
CADRE Central Aspergillus data repository http://www.cadre.man.ac.uk/
COGEME Phytopathogenic fungi and oomycete EST database http://cogeme.ex.ac.uk
MagnaportheDB Magnaporthe grisea integrated physical/genetic map http://www.fungalgenomics.ncsu.edu/Projects/mgdatabase/int.htm
MNCDB MIPS Neurospora crassa database http://mips.gsf.de/proj/neurospora/
Phytophthora Genome Consortium Database ESTs from Phytophthora infestans and P.sojae https://xgi.ncgr.org/pgc
5.3.6. Invertebrates
5.3.6.1. Caenorhabditis elegans
C.elegans Project Genome sequencing data at the Sanger Institute http://www.sanger.ac.uk/Projects/C_elegans
Intronerator Introns and alternative splicing in C.elegans and C.briggsae http://www.cse.ucsc.edu/~kent/intronerator/
RNAiDB RNAi phenotypic analysis of C.elegans genes http://www.rnai.org/
WILMA C.elegans annotation database http://www.came.sbg.ac.at/wilma/
WorfDB C.elegans ORFeome http://worfdb.dfci.harvard.edu/
WormBase Data repository for C.elegans and C.briggsae: curated genome annotation, genetic and physical maps, pathways http://www.wormbase.org/
5.3.6.2. Drosophila melanogaster
FlyBase Drosophila sequences and genomic information http://flybase.bio.indiana.edu/
GadFly Genome annotation database of Drosophila http://www.fruitfly.org
FlyBrain Database of the Drosophila nervous system http://flybrain.neurobio.arizona.edu
FlyTrap Drosophila transgenic lines created using an intron protein trap strategy http://flytrap.med.yale.edu/
InterActive Fly Drosophila genes and their roles in development http://sdb.bio.purdue.edu/fly/aimain/1aahome.htm
Drosophila microarray centre Data and tools for Drosophila gene expression studies http://www.flyarrays.com/fruitfly
5.3.6.3. Other invertebrates
AppaDB A database on the nematode Pristionchus pacificus http://appadb.eb.tuebingen.mpg.de
CnidBase Cnidarian evolution and gene expression database http://cnidbase.bu.edu/
Nematode.net Parasitic nematode sequencing project http://nematode.net/
NEMBASE Nematode sequence and functional data database http://www.nematodes.org
Metabolic and Signaling Pathways
The metabolic and signaling pathways category is a collection of
pathway and signaling databases. Most databases in this
collection describe the genome and metabolic pathways
of a single organism, although a few are exceptions. The
subcategories are:
1) Enzymes and enzyme nomenclature
2) Metabolic pathways
3) Intermolecular interactions and signaling pathways
6. Metabolic Enzymes and Pathways; Signaling Pathways
6.1. Enzymes and Enzyme Nomenclature
ENZYME Enzyme nomenclature and properties http://www.expasy.org/enzyme
BRENDA Enzyme names and properties: sequence, structure, specificity, stability, reaction parameters, isolation data http://www.brenda.uni-koeln.de
IntEnz Integrated enzyme database and enzyme nomenclature http://www.ebi.ac.uk/intenz
Enzyme Nomenclature IUBMB Nomenclature Committee recommendations http://www.chem.qmw.ac.uk/iubmb/enzyme
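The enzyme nomenclature databases listed above are organised around EC numbers, which have four dot-separated numeric fields (class, subclass, sub-subclass, serial number). As a minimal sketch, the Python below parses a fully specified EC number and reports its top-level class. The function names `parse_ec` and `ec_class_name` are hypothetical, and the table covers only the six classes in use when these slides were written; the IUBMB scheme has since been extended.

```python
# Top-level EC classes (the six classes of the scheme at the time of the slides).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec(ec):
    """Parse 'EC 2.7.1.1' or '2.7.1.1' into a tuple of four integers.

    Partial numbers such as '1.1.1.-' are rejected in this sketch.
    """
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()  # drop the optional 'EC' prefix

    parts = text.split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"not a fully specified EC number: {ec!r}")

    numbers = tuple(int(part) for part in parts)
    if numbers[0] not in EC_CLASSES:
        raise ValueError(f"unknown EC class in {ec!r}")
    return numbers


def ec_class_name(ec):
    """Return the name of the top-level class for an EC number."""
    return EC_CLASSES[parse_ec(ec)[0]]


if __name__ == "__main__":
    # Alcohol dehydrogenase (EC 1.1.1.1) and hexokinase (EC 2.7.1.1).
    for number in ("EC 1.1.1.1", "2.7.1.1"):
        print(number, "->", ec_class_name(number))
```

Rejecting partial numbers keeps the example short; a fuller version would accept a dash in trailing fields, which the nomenclature uses for incompletely classified enzymes.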
6.2. Metabolic Pathways
KEGG Kyoto encyclopedia of genes and genomes: metabolic and regulatory pathways encoded in complete genomes http://www.genome.ad.jp/kegg
MetaCyc Metabolic pathways and enzymes from various organisms http://metacyc.org
PathDB Biochemical pathways, compounds and metabolism http://www.ncgr.org/pathdb
UM-BBD University of Minnesota biocatalysis and biodegradation database: microbial catabolism and biotransformations http://umbbd.ahc.umn.edu/
WIT2 Integrated system for functional curation and development of metabolic models http://wit.mcs.anl.gov/WIT2/
6.3. Intermolecular Interactions and Signaling Pathways
aMAZE A system for the annotation, management and analysis of biochemical and signaling pathway networks http://www.amaze.ulb.ac.be/
BIND Biomolecular interaction network database http://www.bind.ca
BioCarta Online maps of metabolic and signaling pathways http://www.biocarta.com/genes/allPathways.asp
BRITE Biomolecular relations in information transmission and expression, part of the KEGG system http://www.genome.ad.jp/brite
DIP Database of interacting proteins: experimentally determined protein-protein interactions http://dip.doe-mbi.ucla.edu
DRC Database of ribosomal crosslinks http://www.mpimg-berlin-dahlem.mpg.de/~ag_ribo/ag_brimacombe/drc
GeneNet Database on gene network components http://wwwmgs.bionet.nsc.ru/mgs/gnw/genenet
IntAct project Protein-protein interaction data http://www.ebi.ac.uk/intact
InterDom Putative protein domain interactions http://interdom.lit.org.sg
JenPep Functional and quantitative thermodynamic data on peptide binding to immunological biomacromolecules http://www.jenner.ac.uk/Jenpep2
MPID MHC-peptide interaction database http://surya.bic.nus.edu.sg/mpid
ROSPath Reactive oxygen species (ROS) signaling pathway http://rospath.ewha.ac.kr
STCDB Signal transductions classification database http://www.techfak.uni-bielefeld.de/~mchen/STCDB
STRING Predicted functional associations between proteins www.bork.embl-heidelberg.de/STRING
TRANSPATH Gene regulatory networks and microarray analysis http://www.biobase.de/pages/products/databases.html
Human and other Vertebrate Genomes
The human and other vertebrate genomes category collects
databases covering the human genome and the genomes of
other vertebrates. Its subcategories are:
1) Model organisms, comparative genomics
2) Human genome databases, maps and viewers
3) Human ORFs
77.Human and other Vertebrate Genomes.Human and other Vertebrate Genomes
7.1. Mitochondrial Genes and Proteins7.1. Mitochondrial Genes and Proteins
AMmtDB Metazoan mitochondrial genes http://bighost.area.ba.cnr.it/mitochondriome
GOBASE Organelle genome database http://megasun.bch.umontreal.ca/gobase/gobase.html
MitoDat Mitochondrial proteins (predominantly human) http://www-lecb.ncifcrf.gov/mitoDat/
MitoMap Human mitochondrial genome http://www.mitomap.org/
MitoNuc Nuclear genes coding for mitochondrial proteins http://bio-www.ba.cnr.it:8000/BioWWW/#MitoNuc
MITOP2 Mitochondrial proteins, genes and diseases http://ihg.gsf.de/mitop2/
MitoProteome Mitochondrial protein sequences encoded by mitochondrial and nuclear genes http://www.mitoproteome.org
OGRe Complete mitochondrial genome sequences for 200 metazoan species http://www.bioinf.man.ac.uk/ogre
7.2. Model organisms, comparative genomics
ACeDB C. elegans, S. pombe, and human sequences and genomic information http://www.acedb.org/
AllGenes Human and mouse gene, transcript and protein annotation http://www.allgenes.org/
ArkDB Genome databases for farm and other animals http://www.thearkdb.org/
Cre Transgenic Database Cre transgenic mouse lines with links to publications http://www.mshri.on.ca/nagy/
DRESH Human cDNA clones homologous to Drosophila mutant genes http://www.tigem.it/LOCAL/drosophila/dros.html
Ensembl Annotated information on eukaryotic genomes http://www.ensembl.org/
FANTOM Functional annotation of mouse full-length cDNA clones http://fantom2.gsc.riken.go.jp
FREP Functional repeats in mouse cDNAs http://facts.gsc.riken.go.jp/FREP/
IPD-MHC Database Non-human major histocompatibility complex sequences http://www.ebi.ac.uk/ipd/mhc
GenetPig Genes controlling economic traits in pig http://www.infobiogen.fr/services/Genetpig
KOG Eukaryotic orthologous groups of proteins http://www.ncbi.nlm.nih.gov/COG/new/shokog.cgi
LocusLink Curated sequences and descriptions of genetic loci http://www.ncbi.nlm.nih.gov/LocusLink
Mouse Genome Database Mouse genome database http://www.informatics.jax.org/
Mouse SAGE SAGE libraries from various mouse tissues and cell lines http://mouse.biomed.cas.cz/sage
Mouse Targeted Mutations Information on transgenic animals and targeted mutations http://tbase.jax.org/
MTID Mouse transposon insertion database http://mouse.ccgb.umn.edu/transposon/
PEDE Pig EST data explorer: full-length cDNA libraries and ESTs http://pede.gene.staff.or.jp/
Rat Genome Database Rat genetic and genomic data http://rgd.mcw.edu/
TIGR Gene Indices Organism-specific databases of EST and gene sequences http://www.tigr.org/tdb/tgi.shtml
UniGene Unified clusters of ESTs and full-length mRNA sequences http://www.ncbi.nlm.nih.gov/UniGene/
UniSTS Unified non-redundant view of sequence tagged sites with marker and mapping data from a variety of resources http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unists
ZFIN Genetic, genomic and developmental data from zebrafish http://zfin.org/
7.3. Human genome databases, maps and viewers
Ensembl Annotated information on eukaryotic genomes http://www.ensembl.org/
AluGene Complete Alu map in the human genome http://alugene.tau.ac.il/
CroW 21 Human chromosome 21 database http://bioinfo.weizmann.ac.il/crow21/
G3-RH Stanford G3 and TNG radiation hybrid maps http://www-shgc.stanford.edu/RH/
GB4-RH Genebridge4 human radiation hybrid maps http://www.sanger.ac.uk/Software/RHserver/RHserver.shtml
GDB Human genes and genomic maps http://www.gdb.org/
GenAtlas Human genes, markers and phenotypes http://www.citi2.fr/GENATLAS/
GeneCards Integrated database of human genes, maps, proteins and diseases http://bioinfo.weizmann.ac.il/cards/
GeneLoc Gene location database (formerly UDB, the Unified database for human genome mapping) http://genecards.weizmann.ac.il/geneloc/
GeneNest Gene indices of human, mouse, zebrafish, etc. http://genenest.molgen.mpg.de/
GenMapDB Mapped human BAC clones http://genomics.med.upenn.edu/genmapdb
Gene Resource Locator Alignment of ESTs with finished human sequence http://grl.gi.k.u-tokyo.ac.jp/
HOWDY Human organized whole genome database http://www-alis.tokyo.jst.go.jp/HOWDY/
HuGeMap Human genome genetic and physical map data http://www.infobiogen.fr/services/Hugemap
Human BAC Ends Database Non-redundant human BAC end sequences http://www.tigr.org/tdb/humgen/bac_end_search/bac_end_intro.html
IXDB Physical maps of human chromosome X http://ixdb.mpimg-berlin-dahlem.mpg.de/
NCBI RefSeq Non-redundant DNA and protein sequence collection http://www.ncbi.nlm.nih.gov/RefSeq/
UCSC Genome Browser Genome assemblies and annotation http://genome.ucsc.edu/
ParaDB Paralogy mapping in human genomes http://abi.marseille.inserm.fr/paradb/
RHdb Radiation hybrid map data http://www.ebi.ac.uk/RHdb
STACK Sequence tag alignment and consensus knowledgebase http://www.sanbi.ac.za/Dbases.html
Human Genes and Diseases
Human Genes and Diseases is the category of databases that hold information on disease-causing genes, including databases of cancer genes, human ORFs, etc.
1) Human ORFs
2) General human genetics databases
3) General polymorphism databases
4) Cancer gene databases
5) Gene-, system- or disease-specific databases
7.4. Human proteins
HPMR Human plasma membrane receptome: protein sequences, literature, and expression database http://receptome.stanford.edu/
HPRD Human protein reference database: domain architecture, post-translational modifications, and disease association http://www.hprd.org
HUNT Human novel transcripts: annotated full-length cDNAs http://www.hri.co.jp/HUNT
HUGE Human unidentified gene-encoded large (>50 kDa) protein and cDNA sequences http://www.kazusa.or.jp/huge
LIFEdb Localization, interaction and functional assays of human proteins http://www.dkfz.de/LIFEdb
trome, trEST and trGEN Databases of predicted human protein sequences ftp://ftp.isrec.isb-sib.ch/pub/databases/
8. Human Genes and Diseases
8.1. General Databases
Genetics Home Reference A general guide on human hereditary diseases http://ghr.nlm.nih.gov/
Homophila Drosophila homologs of human disease genes http://homophila.sdsc.edu/
IMGT International immunogenetics information system: immunoglobulins, T cell receptors, MHC and RPI http://imgt.cines.fr/
Mutation Spectra Database Mutations in viral, bacterial, yeast and mammalian genes http://info.med.yale.edu/mutbase/
OMIA Online Mendelian inheritance in animals: a catalog of animal genetic and genomic disorders http://www.angis.org.au/omia
OMIM Online Mendelian inheritance in man: a catalog of human genetic and genomic disorders http://www.ncbi.nlm.nih.gov/Omim/
ORFDB Collection of ORFs that are sold by Invitrogen http://orf.invitrogen.com/
PathBase European mutant mice pathology database: histopathology photomicrographs and macroscopic images http://www.pathbase.net/
PMD Compilation of protein mutant data http://pmd.ddbj.nig.ac.jp/
8.2. Human Mutations Databases
8.2.1. General polymorphism database
ALFRED Allele frequencies and DNA polymorphisms http://alfred.med.yale.edu/
BayGenomics Genes relevant to cardiovascular and pulmonary disease http://baygenomics.ucsf.edu/
dbSNP Database of single nucleotide polymorphisms www.ncbi.nlm.nih.gov/SNP/
FIMM Functional molecular immunology data http://sdmc.krdl.org.sg:8080/fimm/
HGVS Databases A compilation of human mutation databases http://www.hgvs.org/
HGVbase Human genome variation database: curated human polymorphisms http://hgvbase.cgb.ki.se/
HGMD Human gene mutation database http://www.hgmd.org/
IPD Immuno polymorphism database: data on human killer-cell Ig-like receptors and human platelet antigens http://www.ebi.ac.uk/ipd
JSNP Japanese SNP database http://snp.ims.u-tokyo.ac.jp/
rSNP Guide SNPs in regulatory gene regions http://util.bionet.nsc.ru/databases/rsnp.html
SNP Consortium database SNP Consortium data http://snp.cshl.org/
TopoSNP Topographic database of non-synonymous SNPs http://gila.bioengr.uic.edu/snp/toposnp
8.2.2. Cancer
Atlas of Genetics and Cytogenetics in Oncology and Haematology Cancer related genes, chromosomal abnormalities in oncology and haematology, and cancer-prone diseases http://www.infobiogen.fr/services/chromcancer/
CGED Cancer gene expression database http://love2.aist-nara.ac.jp/CGED
Database of Germline p53 Mutations Mutations in human tumor and cell line p53 gene http://www.lf2.cuni.cz/win/projects/germline_mut_p53.htm
IARC TP53 Database Human TP53 somatic and germline mutations http://www.iarc.fr/p53/
MTB Mouse tumor biology database: mouse tumor types, genes, classification, incidence, pathology http://tumor.informatics.jax.org/
Oral Cancer Gene Database Cellular and molecular data for genes involved in oral cancer http://www.tumor-gene.org/Oral/oral.html
RB1 Gene Mutation Database Mutations in the human retinoblastoma (RB1) gene http://www.d-lohmann.de/Rb/
RTCGD Mouse retroviral tagged cancer gene database http://rtcgd.ncifcrf.gov/
SNP500Cancer Re-sequenced SNPs from 102 reference samples http://snp500cancer.nci.nih.gov
SV40 Large T-Antigen Mutant Database Mutations in SV40 large tumor antigen gene http://bigdaddy.bio.pitt.edu/SV40/
Tumor Gene Family Databases Cellular, molecular and biological data about genes involved in various cancers http://www.tumor-gene.org/tgdf.html
8.2.3. Gene, system or disease-specific
ALPSbase Autoimmune lymphoproliferative syndrome database http://research.nhgri.nih.gov/alps/
Androgen Receptor Gene Mutations Database Mutations in the androgen receptor gene http://www.mcgill.ca/androgendb/
BTKbase Mutation registry for X-linked agammaglobulinemia http://bioinf.uta.fi/BTKbase/
CASRDB Calcium-sensing receptor database: CASR mutations causing hypercalcemia and/or hyperparathyroidism http://www.casrdb.mcgill.ca/
Cytokine Gene Polymorphism in Human Disease Cytokine gene polymorphism literature database http://bris.ac.uk/pathandmicro/services/GAI/cytokine4.htm
Collagen Mutation Database Human type I and type III collagen gene mutations http://www.le.ac.uk/genetics/collagen/
ERGDB Estrogen responsive genes database http://sdmc.lit.org.sg/ergdb/cgi-bin/explore.pl
FUNPEP Low-complexity peptides capable of forming amyloid plaque http://www.cmbi.kun.nl/swift/FUNPEP/gergo/
GOLD.db Genomics of lipid-associated disorders database http://gold.tugraz.at
tGRAP Mutants of G-protein coupled receptors of family A http://tinygrap.uit.no/GRAP/
HaemB Factor IX gene mutations, insertions and deletions http://www.kcl.ac.uk/ip/petergreen/haemBdatabase.html
HbVar Human hemoglobin variants and thalassemias http://globin.cse.psu.edu/globin/hbvar
Human p53/hprt, rodent lacI/lacZ databases Mutations at the human p53 and hprt genes; rodent transgenic lacI and lacZ mutations http://www.ibiblio.org/dnam/mainpage.html
Human PAX2 Allelic Variant Database Mutations in human PAX2 gene http://pax2.hgu.mrc.ac.uk/
Human PAX6 Allelic Variant Database Mutations in human PAX6 gene http://pax6.hgu.mrc.ac.uk/
IL2Rgbase X-linked severe combined immunodeficiency mutations http://research.nhgri.nih.gov/scid/
IMGT/Gene-DB Vertebrate immunoglobulin and T cell receptor genes http://imgt.cines.fr/cgi-bin/GENElect.jv
IMGT/HLA Polymorphism of human MHC and related genes http://www.ebi.ac.uk/imgt/hla/
INFEVERS Hereditary inflammatory disorder and familial Mediterranean fever mutation data http://fmf.igh.cnrs.fr/infevers
KinMutBase Disease-causing protein kinase mutations http://www.uta.fi/imt/bioinfo/KinMutBase/
Lowe Syndrome Mutation Database Phosphatidylinositol-4,5-bisphosphate 5-phosphatase mutations causing Lowe oculocerebrorenal syndrome http://research.nhgri.nih.gov/lowe/
NCL Mutation Database Polymorphisms in neuronal ceroid lipofuscinoses genes http://www.ucl.ac.uk/ncl/
PAHdb Mutations at the phenylalanine hydroxylase locus http://www.pahdb.mcgill.ca/
PGDB Prostate and prostatic diseases gene database http://www.ucsf.edu/PGDB
PHEXdb PHEX mutations causing X-linked hypophosphatemia http://www.phexdb.mcgill.ca/
PTCH1 Mutation Database Mutations and SNPs found in PTCH1 gene http://www.cybergene.se/PTCH/ptchbase.html
Microarray Data and other Gene Expression Databases
Microarrays are producing massive amounts of data.
These data, like genome sequence data, can be used to gain insights into underlying biological processes only if they are carefully recorded and stored in databases, where they can be queried, compared and analysed by different computer software programs.
A gene expression database can be regarded as consisting of three parts:
the gene expression data matrix,
gene annotation
and sample annotation.
Hence the Microarray Data and other Gene Expression Databases category consists of repositories of microarray data and gene expression data.
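The three-part view of a gene expression database can be sketched in a few lines of Python. This is only an illustration: the class name `ExpressionDataset`, its field names, the gene symbols and the sample records are all hypothetical and do not correspond to the schema of any database listed here.

```python
from dataclasses import dataclass


@dataclass
class ExpressionDataset:
    """Hypothetical container mirroring the three parts named above."""
    matrix: list             # expression values: rows = genes, columns = samples
    gene_annotation: list    # one dict per matrix row
    sample_annotation: list  # one dict per matrix column

    def __post_init__(self):
        # The annotations must line up with the matrix dimensions.
        if len(self.matrix) != len(self.gene_annotation):
            raise ValueError("one gene annotation record is needed per row")
        for row in self.matrix:
            if len(row) != len(self.sample_annotation):
                raise ValueError("one sample annotation record is needed per column")

    def values_for(self, symbol, **sample_filter):
        """Return the values of one gene in all samples matching the filter."""
        row = next(i for i, gene in enumerate(self.gene_annotation)
                   if gene["symbol"] == symbol)
        columns = [j for j, sample in enumerate(self.sample_annotation)
                   if all(sample.get(key) == value
                          for key, value in sample_filter.items())]
        return [self.matrix[row][j] for j in columns]


# Made-up example data, for illustration only.
example = ExpressionDataset(
    matrix=[[2.0, 8.5, 7.9],
            [5.1, 5.0, 4.8]],
    gene_annotation=[{"symbol": "geneA"}, {"symbol": "geneB"}],
    sample_annotation=[{"id": "s1", "tissue": "liver"},
                       {"id": "s2", "tissue": "brain"},
                       {"id": "s3", "tissue": "brain"}],
)

print(example.values_for("geneA", tissue="brain"))
```

Keeping the annotations separate from the matrix is what lets a query such as "geneA in brain samples" be answered without touching the numeric data layout.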
9. Microarray Data and other Gene Expression Databases
ArrayExpress Public collection of microarray gene expression data http://www.ebi.ac.uk/arrayexpress
Axeldb Gene expression in Xenopus laevis http://www.dkfz-heidelberg.de/abt0135/axeldb.htm
BodyMap Human and mouse gene expression data http://bodymap.ims.u-tokyo.ac.jp/
BGED Brain gene expression database http://love2.aist-nara.ac.jp/BGED
CleanEx Expression reference database, linking heterogeneous expression data to facilitate cross-dataset comparisons http://www.cleanex.isb-sib.ch/
EICO DB Expression-based imprint candidate organiser: a database for discovery of novel imprinted genes http://fantom2.gsc.riken.jp/EICODB/
emap Atlas Edinburgh mouse atlas: a digital atlas of mouse embryo development and spatially-mapped gene expression http://genex.hgu.mrc.ac.uk/
EPConDB Endocrine pancreas consortium database http://www.cbil.upenn.edu/EPConDB
EpoDB Genes expressed during human erythropoiesis http://www.cbil.upenn.edu/EpoDB/
FlyView Drosophila development and genetics http://pbio07.uni-muenster.de/
GeneAnnot Revised and improved annotation of Affymetrix human gene probe sets http://genecards.weizmann.ac.il/geneannot/
GeneNote Human genes expression profiles in healthy tissues http://genecards.weizmann.ac.il/genenote/
GenePaint Gene expression patterns in the mouse http://www.genepaint.org/Frameset.html
GeneTrap Expression patterns in an embryonic stem library of gene trap insertions http://www.cmhd.ca/sub/genetrap.asp
GermOnline Expression data relevant for the mitotic and meiotic cell cycle and gametogenesis in yeast and higher eukaryotes http://www.germonline.org/
GXD Mouse gene expression database http://www.informatics.jax.org/menus/expression_menu.shtml
HemBase Genes transcribed in differentiating human erythroid cells http://hembase.niddk.nih.gov/
HugeIndex Expression levels of human genes in normal tissues http://hugeindex.org/
Interferon Stimulated Gene Database Genes induced by treatment with interferons http://www.lerner.ccf.org/labs/williams/xchip-html.cgi
Kidney Development Database Kidney development and gene expression http://golgi.ana.ed.ac.uk/kidhome.html
MAGEST Ascidian (Halocynthia roretzi) gene expression patterns http://www.genome.ad.jp/magest
MEPD Medaka (freshwater fish Oryzias latipes) gene expression pattern database http://medaka.dsp.jst.go.jp/MEPD
MethDB DNA methylation data, patterns and profiles http://www.methdb.de/
NASCarrays Nottingham Arabidopsis Stock Centre microarray database http://affymetrix.arabidopsis.info
NetAffx Public Affymetrix probesets and annotations http://www.affymetrix.com/
PEDB Prostate expression database: ESTs from prostate tissue and cell type-specific cDNA libraries http://www.pedb.org/
PEPR Public expression profiling resource: expression profiles in a variety of diseases and conditions http://microarray.cnmcresearch.org/pgadatatable.asp
RECODE Genes using programmed translational recoding in their expression http://recode.genetics.utah.edu/
RefExA Reference database for human gene expression analysis http://www.lsbm.org/db/index_e.html
Stanford Microarray Database Raw and normalized data from microarray experiments http://genome-www.stanford.edu/microarray
Tooth Development Database Gene expression in dental tissue http://bite-it.helsinki.fi/
Proteomics Resources
The proteomics resources are databases containing proteomics information from various genomes/proteomes.
Applications of Proteomics
- Characterization of Protein Complexes
- Protein Expression Profiling
- Proteome Mining
- Protein Arrays
What is Proteomics?
Defined as "the analysis of the entire protein complement in a given cell, tissue, or organism."
Proteomics "also assesses activities, modifications, localization, and interactions of proteins in complexes."
Technology of Proteomics
Separation and Isolation of Proteins
1D and 2D PAGE
Edman Sequencing
Mass Spectrometry
Database utilization
Types of Proteomics
Protein Expression
Quantitative study of protein expression between samples that differ by some variable
Structural Proteomics
Goal is to map out the 3-D structure of proteins and protein complexes
Functional Proteomics
10. Proteomics Resources
GelBank 2D gel electrophoresis patterns of proteins from complete microbial genomes http://gelbank.anl.gov/
PEP Predictions for entire proteomes: summarized analyses of protein sequences http://cubic.bioc.columbia.edu/pep/
Proteome Analysis Database Functional classification of proteins in whole genomes http://www.ebi.ac.uk/proteome/
RESID Pre-, co- and post-translational protein modifications http://www-nbrf.georgetown.edu/pirwww/dbinfo/resid.html
SWISS-2DPAGE Annotated 2D gel electrophoresis database http://www.expasy.org/ch2d/
Other Molecular Biology Databases
This category holds the remaining types of databases. It can be further subdivided into the following divisions:
1) BioImage
2) MetaRouter
3) PubMed
4) Drugs and drug design
5) Molecular probes and primers
11. Other Molecular Biology Databases
11.1. Drugs and drug design
ANTIMIC Database of natural antimicrobial peptides http://research.i2r.a-star.edu.sg/Templar/DB/ANTIMIC/
APD Antimicrobial peptide database http://aps.unmc.edu/AP/main.php
BSD Biodegradative strain database: microorganisms that can degrade aromatic and other organic compounds http://bsd.cme.msu.edu/
DART Drug adverse reaction target database http://xin.cz3.nus.edu.sg/group/drt/dart.asp
Peptaibol Peptaibol (antibiotic peptide) sequences http://www.cryst.bbk.ac.uk/peptaibol/welcome.html
Pharmacogenomics and Pharmacogenetics Knowledge Base Variation in drug response based on human variation http://www.pharmgkb.org/
TTD Therapeutic target database http://xin.cz3.nus.edu.sg/group/cjttd/ttd.asp
11.2. Probes
IMGT/PRIMER-DB Immunogenetics oligonucleotide primer database http://imgt3d.igh.cnrs.fr/PrimerDB/Query_PrDB.pl
MPDB Information on synthetic oligonucleotides proven useful as primers or probes http://www.biotech.ist.unige.it/interlab/mpdb.html
probeBase rRNA-targeted oligonucleotide probe sequences, DNA microarray layouts and associated information http://www.microbialecology.net/probebase
RTPrimerDB Real-time PCR primer and probe sequences http://medgen31.ugent.be/primerdatabase/index.php
VirOligo Virus-specific oligonucleotides for PCR and hybridization http://viroligo.okstate.edu/
11.3. Unclassified databases
PubMed Citations and abstracts of biomedical literature http://pubmed.gov/
BioImage Database of multidimensional biological images http://www.bioimage.org/
Bioinformatics Tools
BLAST (Basic Local Alignment Search Tool)
BLAST is the algorithm used by a family of five programs that align your query sequence against sequences in a molecular database.
Statistical methods are applied to judge the significance of matches.
Alignments (i.e. database sequences that are similar to your query sequence) are reported in order of significance, as estimated by the applied statistics.
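The slides do not name the statistics involved. BLAST significance is commonly described with Karlin-Altschul statistics, in which the expected number of chance alignments with raw score at least S is E = K·m·n·e^(−λS) for a query of length m and a database of length n. The sketch below, under that assumption, ranks a few hits by E-value. The values chosen for λ and K, the sequence lengths, and the hit names and scores are illustrative assumptions, not output from a real search, and refinements such as effective-length corrections are ignored.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalised score: S' = (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Assumed, illustrative parameters (not taken from the slides).
LAM, K = 0.267, 0.041
QUERY_LEN, DB_LEN = 250, 5_000_000

# Hypothetical hits: (name, raw alignment score).
hits = [("hitA", 40), ("hitB", 120), ("hitC", 75)]

# Most significant first: the smallest E-value leads the report.
ranked = sorted(hits, key=lambda hit: e_value(hit[1], QUERY_LEN, DB_LEN, LAM, K))

for name, score in ranked:
    e = e_value(score, QUERY_LEN, DB_LEN, LAM, K)
    print(f"{name}\tbits={bit_score(score, LAM, K):.1f}\tE={e:.2e}")
```

Because the E-value falls exponentially with the score, sorting by E-value and sorting by score give the same order within a single search; the E-value adds a scale that accounts for query and database size.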
BLASTN
Compares a nucleotide query sequence against a nucleotide sequence database.
BLASTP
Compares an amino acid query sequence against a protein sequence database.
BLASTX
Compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.
TBLASTN
Compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands).
TBLASTX
Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
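The "six-frame translation" used by BLASTX, TBLASTN and TBLASTX can be illustrated with a small, self-contained sketch: three reading frames on the given strand plus three on its reverse complement. It assumes the standard genetic code and an unambiguous DNA alphabet; the function names and the example sequence are hypothetical, and this is not how the BLAST programs themselves are implemented.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to conceptual translations."""
    dna = dna.upper()
    frames = {}
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(sequence[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")  # made-up example sequence
for label, protein in frames.items():
    print(label, protein)
```

Here `*` marks a stop codon. Searching all six frames is what allows a nucleotide sequence of unknown strand and phase to be compared at the protein level.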
CLUSTALX
Clustal X (Thompson et al. 1997) is a version of Clustal W with a graphical user interface. This programme is used for multiple sequence alignment.
Multiple Alignment
Phylogenetic Analysis
Nucleic acid and protein sequences are used to infer phylogenetic relationships.
Molecular phylogeny methods allow phylogenetic trees to be proposed from a given set of aligned sequences.
The phylogenetic trees aim at reconstructing the history of successive divergence which took place during evolution between the considered sequences and their common ancestor.
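One common starting point for building a tree from aligned sequences is a matrix of pairwise distances. The sketch below computes the simplest such measure, the p-distance (the proportion of differing sites among columns where neither sequence has a gap). The sequence names and the toy alignment are made up for illustration, and the slides do not say which distance measure or tree-building method a given programme uses.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns that contain a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(alignment):
    """Return {(name_1, name_2): p-distance} for every pair of sequences."""
    return {(p, q): p_distance(alignment[p], alignment[q])
            for p in alignment for q in alignment}


# Hypothetical toy alignment.
alignment = {
    "seqA": "ACGTACGT",
    "seqB": "ACGTACGA",
    "seqC": "ACG-TCGA",
}

distances = distance_matrix(alignment)
for (p, q), d in distances.items():
    if p < q:
        print(f"{p} vs {q}: {d:.3f}")
```

Distance-based methods such as neighbour joining or UPGMA take a matrix of this kind as input; in practice a corrected distance is usually preferred over the raw p-distance, because multiple substitutions at the same site make raw differences underestimate divergence.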
Phylogenetic programmes
PHYLIP
PAUP
MEGA
Treeview
ODEN
PHYLOWIN
TREECON
DENDRON
Gene Identification
AAT: Analysis and Annotation Tool
FGENESH: Splice sites, protein coding exons & gene models
Genie: Gene finder based on hidden Markov models
GenScan: Identification of gene structures in genomic DNA
Grail: DNA sequence analysis tool
ORF Finder: Search for open reading frames, at NCBI
Protein Structure Prediction
3D-PSSM: Protein Fold Recognition
Multicoil: Predict coiled coil structures
NNPredict: Protein secondary structure prediction
PredictProtein: Sequence analysis and structure prediction
SAPS: Statistical analysis of protein sequences
Protein 3D Structure / Modelling
FUGUE: Sequence-structure homology recognition
PDB Viewer: Protein structure database
Proinformatix: Modeling oligopeptides for energetically minimized structures
SWISS-MODEL: An automated knowledge-based protein modelling server