
INDO Thai What is Bioinformatics,A.sharMA


Page 1: INDO Thai What is Bioinformatics,A.sharMA

8/6/2019 INDO Thai What is Bioinformatics,A.sharMA

http://slidepdf.com/reader/full/indo-thai-what-is-bioinformaticsasharma 1/156

20th June to 20th July, 2006

"Bioinformatics: Techniques and Usage"

Dr. Ashok Sharma

Head, Bioinformatics and Co-ordinator, Bioinformatics Centre

Central Institute of Medicinal and Aromatic Plants

PO. CIMAP, Lucknow-226015, India.

Web site: www.cimap.res.in

E-mail: [email protected]

CIMAP Summer Training on

Biotechnology & Bioinformatics


[Diagram labels: Sequences; Biological Knowledge; Databases; Greater Biological Knowledge; Bioinformatics]


Bioinformatics:

Why? What?

Computational Methods, Resources and Tools


If you are one of the many biologists for whom genome databases are as comprehensible as a mass of supermarket barcodes, it is a good time to team up with a friendly bioinformaticist and join the action before it is too late.


If biologists do not adapt to the powerful computational tools needed to exploit huge data sets, they could find themselves floundering in the wake of advances in genomics.


It is predicted that the potential to integrate different levels of genomic data (such as raw sequence from the human genome and those of model organisms, data on genetic variability between individuals and on gene expression in different tissues) will radically change biological research.

It is also agreed that small experiments driven by individual

investigators will give way to a world in which

multidisciplinary teams, sharing huge online data sets, emerge

as key players.


Bioinformatics: a brave new world

• Radical change in biological research from small experiments driven by individual investigators

• Multidisciplinary teams sharing huge online data will be the key players


Era of 'systems biology': the ability to create mathematical models describing the function of networks of genes and proteins is just as important as traditional lab skills.


Who will have the competitive advantage?

Those who learn to conduct high-throughput genomic analyses, and who can master the computational tools needed to exploit biological databases.


Outcome of this natural selection will see many current top

scientists, research groups and even whole institutes relegated

to the second division


What is the solution?

In the long run, the change will come through the emergence of a new breed of biologists who are steeped in computational biology as an integral part of their education. This means that the subject must be included as a core module in all undergraduate biology courses, rather than as a specialist option. Although this is starting to happen, the availability of teachers with the appropriate expertise is still a limiting factor.


The emerging new breed

So, if the majority of biologists are not to be disenfranchised, what is the solution?

Emergence of a new breed of biologists who are steeped in computational biology as an integral part of their education.

Limiting factor: availability of teachers with the appropriate expertise.


One model solution has emerged in the U.S.A.

Funding agencies are also trying to drive change by ploughing money into initiatives that require a multidisciplinary approach and a strong computational component.

The US National Institutes of Health, for instance, through its National Institute of General Medical Sciences, has created a programme of 'glue grants' for integrative and collaborative approaches to research. Under this programme, the Alliance for Cellular Signaling will draw a complete map of interactions between some 1,000 proteins in two types of cells.

The consortium unites traditional experimentalists with computational biologists.


Glue grants

Integrative and collaborative approaches to research

US National Institutes of Health

Alliance for Cellular Signaling

Complete map of interactions between

some 1,000 proteins in two types of cells

Consortium unites traditional experimentalists with computational biologists.


• The Human Genome Project and other genome projects, such as sequencing of bacterial genomes and yeast genomes, etc., have produced enormous amounts of DNA sequence data.

• Large-scale biological research involving micro sequencing of proteins, 2-D gel patterns of proteins and polypeptides, metabolic pathways, physical and genetic maps of the organisms, cell line information, and microbial strain data etc. has been responsible for the unprecedented growth of biological data.

• Projects such as Species-2000, global plant check list, information on release of organisms in environment, and Animal Virus Information, etc. are producing hard data at the species level in multimedia format.


• The rate of growth of the biological data is estimated to be more than 200 million base pairs per year.

• The database content itself is doubling in size approximately every year.

• Nucleotide and protein sequences are not the only data that are accumulating rapidly. The number of characterized genes from a variety of organisms and the number of solved protein structures are also doubling every two years.
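To make the doubling figures above concrete, the following minimal sketch computes how much a quantity grows under a fixed doubling time. The function name `projected_size` and the starting size of 1.0 are assumptions chosen for illustration; they are not taken from the slides.

```python
def projected_size(current_size: float, years: float, doubling_time_years: float) -> float:
    """Size after `years` if the quantity doubles every `doubling_time_years`."""
    return current_size * 2 ** (years / doubling_time_years)


if __name__ == "__main__":
    # Compare yearly doubling (database content) with doubling every
    # two years (characterized genes, solved structures).
    for years in (1, 2, 5, 10):
        yearly = projected_size(1.0, years, 1.0)
        biennial = projected_size(1.0, years, 2.0)
        print(f"after {years} years: x{yearly:.0f} vs x{biennial:.1f}")
```

Under these assumptions, ten years of yearly doubling gives roughly a thousandfold increase, while doubling every two years gives a factor of 32.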


The enormous growth of biological data and its availability in the major international databases is serving as a source of knowledge to the life scientists.

The whole paradigm shift in molecular biology towards data-intensive research in search of useful genes is basically due to the fact that genetic data is becoming the major driving force in drug discovery, protein engineering, design of new molecules, and other related areas.

The large stores of biological data hold the promise of serving as the "Discovery Super Highway" for innovations in biotechnology through a process of analysis and transformation of molecular and structural data into biological knowledge for prosperity.


In the face of the challenges imposed by the growing size and complexity of the biological data, a new discipline of science, known as 'Bioinformatics', has emerged in the recent past.

Bioinformatics deals with the various issues related to the biological

data. It also covers the development of data analysis tools, modeling

of biological macromolecules and their complexes, metabolic

pathways, designing of new molecules such as drugs, peptide

vaccines, proteins, etc.


Gradually, Bioinformatics has evolved to deal with four related but still

distinct problem areas, viz.:

a) Handling and management of biological data, including its organization,

control, linkages, analysis, and so forth.

b) Communication among people, projects, and institutions engaged in

the biological research and applications. The communication may

include e-mail, file transfer, remote login, computer conferencing,

electronic bulletin boards, or establishment of web-based information

resources.

c) Organization, access, search and retrieval of biological information,

documents, and literature.

d) Analysis and interpretation of the biological data through computational approaches including visualization, mathematical modeling, and development of algorithms for highly parallel processing of complex biological structures.


Bioinformatics may be defined as a scientific discipline that encompasses all aspects of biological information, viz., acquisition, processing, storage, distribution, analysis and interpretation, and that combines the tools and techniques of mathematics, computer science, and biology with the aim of understanding the biological significance of a variety of data.


Bioinformatics has acquired great importance due to its application in

the Genome projects.

The target of decoding the three billion base pairs of the human DNA

has become achievable only through the use of various innovative

techniques and methods evolved by the Bioinformatics scientists.

Bioinformatics has become an essential component of biotechnology-based product and process development.

The process of drug design and development is expensive and time-consuming. The application of the tools and techniques of Bioinformatics has reduced the cost and the development cycle of drugs. This has a tremendous impact on society. If a newly discovered drug is a life-saving one, then the resulting gains are not only in terms of financial savings but also in saving the lives of several million people. Major pharmaceutical and biotechnology companies have set up large R&D groups in Bioinformatics.


Bioinformatics is a multidisciplinary subject. Though only about a decade old, it has become very important for the growth of biosciences, biotechnology, and the economic prosperity of nations.

Three well-identified divisions of Bioinformatics may be considered:

a) Molecular Bioinformatics,

b) Cellular and sub-cellular Bioinformatics, and

c) Organismic and community Bioinformatics.


 FUNCTIONS OF A BIOINFORMATICS CENTRE

i. The principal objective of a Bioinformatics Centre is to function as an information base in each specialty so that the scientists have ready

access to the computer-based information on resources, databases

in subject fields, and build up expertise in bioinformatics in keeping

with the rapid development in this area.

ii. To provide a computer-based information storage and retrieval system of database that collects structured information generated

by research and industrial institutions in the identified fields of 

biotechnology, continually update the databases and make the

information available to the users.

iii. An active network mode, in which the scientists get access to the biotechnology community in the identified areas, answer requests

for information in an interactive and discussive mode and actively

initiate dialogue among groups with common interest.


iv. To provide retrieval service either online or offline in their  

specialized areas and to give overall information support even in

areas other than those assigned to them.

v. To provide communication link with international databases for 

selective bibliographic information for the user scientist.

vi. To develop software packages and databases specific to user needs.

vii. To conduct training courses in the specialized areas

periodically to meet the special requirements of manpower 

development in the area and to promote awareness about the

computerized storage and retrieval facility among bio scientists

and information scientists.


Bioinformatics: What?

• A mixture of Biochemistry, Molecular Biology, and Computer Science

• Obtaining, storing, organizing, and analyzing biological and genetic information for understanding its activity in living organisms

• The main goal is to convert a multitude of complex data into useful information and knowledge

• Data includes gene and protein sequences, cDNA, nucleotide sequences

• Data from gene sequencing, combinatorial chemical synthesis, gene-expression investigations, pharmacogenomics, proteomic studies, and other methods of study

• Information used to build synthetic and predictive models allowing scientists to better understand complex living systems

• Future applications in biology, chemistry, pharmaceuticals, medicine, and agriculture


What is the Role of Bioinformatics?

• The role of the Bioinformatics group is to:

Research and develop tools and systems that provide understanding and integration of genomic data across technologies

Work with other Research Information staff to make these tools available to research scientists


What kinds of data are we interested in?

• Sequence data

• Profile data: gene expression and proteins

• Mapping data

• Function and phenotype

• Pathways


TECHNOLOGIES IN BIOINFORMATICS

Data-acquisition Systems

These are required mainly at research labs generating large amounts of data. These systems include inventory control software tracking hundreds of thousands of reagents, gels and other materials; reagent manipulation software; robotic systems to carry out high-volume, high-precision laboratory manipulation in genome research; and sequence production software that will help improve sequencing.


TECHNOLOGIES IN BIOINFORMATICS

Data-Analysis Systems

Studying sequences, predicting protein structure and comparing genomes on an extensive scale all require informatics tools such as sequence analysis software that performs alignments, detects homologies, identifies coding regions and extracts features. Protein folding software is used because genetic information is transformed into function via proteins, whose functional specificities are determined by their 3-D shapes. Genetic mapping software systems play a key role in the analysis of genetic mapping data. Classification software extracts features from DNA sequences, places proteins into gene families and tracks protein motifs.


TECHNOLOGIES IN BIOINFORMATICS

Data-Management System

Various genome projects are generating information that cannot be accommodated by traditional publishing. Electronic data management and publishing systems are crucial components of genomic research.


CHALLENGES IN BIOINFORMATICS

Bioinformatics, which is the intersection of Information Technology and Mathematics with molecular biology/genetics, has created several challenges for the Computer Science community.

Information Storage

Storing huge amounts of genetic information, amenable to rapid access and manipulation, is a great challenge.

One million bases (1 Mb) ≈ 1 Megabyte (1 MB), at one byte per base. Thus, one would require 3 Gigabytes (3 GB) of computer data storage space to store the entire Human Genome comprising three Gigabases (3 Gb).

This includes nucleotide sequence data only and does not include data annotations and other information associated with the sequence data.

With time, more annotations, entered either (a) by scientists as a result of laboratory findings, literature searches, data analysis, or personal communications, and/or (b) as a result of automated data analysis programs or autoannotators, will be associated with the sequence data, increasing the storage requirements significantly beyond the 3 GB for the human genome.
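The storage estimate for the human genome can be checked with a few lines of arithmetic. This is a minimal sketch: the function name `genome_storage_bytes` is hypothetical, a decimal gigabyte (10^9 bytes) is assumed, and the 2-bit packing of A, C, G and T is shown only as a common alternative encoding that the slides do not mention.

```python
def genome_storage_bytes(n_bases: int, bits_per_base: int = 8) -> float:
    """Bytes needed to store n_bases at the given number of bits per base."""
    return n_bases * bits_per_base / 8


HUMAN_GENOME_BASES = 3_000_000_000  # three gigabases, as stated in the slides
GB = 10**9                          # decimal gigabyte (assumption)

# One byte per base: each nucleotide stored as one text character.
one_byte_per_base = genome_storage_bytes(HUMAN_GENOME_BASES) / GB
# Two bits per base: A, C, G, T packed four to a byte (no ambiguity codes).
two_bits_per_base = genome_storage_bytes(HUMAN_GENOME_BASES, 2) / GB

print(f"{one_byte_per_base:.2f} GB at 8 bits/base")
print(f"{two_bits_per_base:.2f} GB at 2 bits/base")
```

Either way, the figure covers raw sequence only; annotations add to it, as noted in the slides.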


Protein Folding Software

Genetic information is transformed into function via proteins, whose functional specificities are determined by their three-dimensional shapes. Prediction of the protein structure from amino acid sequences is an important and challenging problem.

Map Assembly & Integration Software

Computation plays an increasingly central role in the assembly and integration of large maps composed of different kinds and combinations of data.

Comparative Genomics Tools

As the genome projects mature and large amounts of genomic information become available for a number of species, comparative genomics is emerging as an active area of study.

Gene Mining

Methods for mapping genes to their physical locations on the genome; searching for related genes; analysing the database to find families of related genes and to understand their coordinated expression; finding correlations between specific diseases and expression of related genes.


SERENDIPITY EFFECT

One of the most exciting aspects of the information revolution is that it allows us to combine many different items of information and many different kinds of information on a scale never seen before.

Large international databases, for instance, include contributions from thousands of different sources. Also, the hypertext links (Information Super Highway) between sites make it possible to draw together many different kinds of information that bear on a particular problem.

These activities not only promote collaboration on a truly vast scale, they also enrich research. One important effect is the "Serendipity effect": combining different datasets makes possible entirely new kinds of study. New studies inevitably lead to new and unexpected discoveries.


Using public databases and data formats

The first key skill for biologists is to learn to use online search tools to find information. Literature searching is no longer a matter of looking up references in a printed index. You can find links to most of the scientific publications you need online. There are central databases that collect reference information so you can search dozens of journals at once. You can even set up "agents" that notify you when new articles are published in an area of interest.

Searching the public molecular-biology databases requires the same skills as searching for literature references: you need to know how to construct a query statement that will pluck the particular needle you're looking for out of the database haystack.
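As an illustration of what constructing a query statement can look like, the sketch below assembles a fielded boolean query in the `term[Field]` style used by Entrez-like search systems. The helper `build_query` is hypothetical, and the field names shown are assumptions; the fields actually available differ from database to database, so check the documentation of the one you search.

```python
def build_query(terms: dict[str, str], operator: str = "AND") -> str:
    """Join field-restricted terms into one boolean query string."""
    parts = []
    for field, value in terms.items():
        # Quote multi-word values so they are treated as a single phrase.
        text = f'"{value}"' if " " in value else value
        parts.append(f"{text}[{field}]")
    return f" {operator} ".join(parts)


# Field names here are illustrative assumptions, not a definitive list.
query = build_query({"Organism": "Homo sapiens", "Gene Name": "BRCA1"})
print(query)
```

Restricting each term to a field is what narrows the haystack: the same words searched without field tags would also match records that merely mention them.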


Sequence alignment and sequence searching

Being able to compare pairs of DNA or protein sequences and extract partial matches has made it possible to use a biological sequence as a database query. Sequence-based searching is another key skill for biologists; a little exploration of the biological databases at the beginning of a project often saves a lot of valuable time in the lab. Identifying homologous sequences provides a basis for phylogenetic analysis and sequence-pattern recognition.

Sequence-based searching can be done online through web forms, so it requires no special computing skills, but to judge the quality of your search results you need to understand how the underlying sequence-alignment method works and go beyond simple sequence alignment to other types of analysis.
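To give a feel for how a pairwise comparison can extract partial matches, here is a minimal sketch of Smith-Waterman local alignment scoring. The slides do not name a specific algorithm, so this choice is an assumption, as are the scoring parameters (match +2, mismatch -1, linear gap -2). Production search tools use faster heuristics and substitution matrices.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming table, so it returns the score, not the alignment.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: that is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared segment ACGT is found even though the flanks differ.
print(local_alignment_score("GGACGTGG", "TTACGTTT"))
```

The score depends entirely on the chosen parameters, which is one reason search results need to be judged with the underlying method in mind.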


Gene prediction

Gene prediction is only one of a cluster of methods for attempting to detect meaningful signals in uncharacterized DNA sequences. Until recently, most sequences deposited in GenBank were already characterized at the time of deposition. That is, someone had already gone in and, using molecular biology, genetic, or biochemical methods, figured out what the gene did. However, now that the genome projects are in full swing, there's a lot of DNA sequence out there that isn't characterized.

Software for prediction of open reading frames, genes, exon splice sites, promoter binding sites, repeat sequences, and tRNA genes helps molecular biologists make sense out of this unmapped DNA.
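The simplest of these signals, the open reading frame, can be sketched in a few lines. The function below is a hypothetical illustration that scans only the forward strand, uses the standard stop codons, and treats an ORF as the stretch from an ATG to the first in-frame stop. Real gene-prediction software also scans the reverse strand and relies on statistical models rather than this rule alone.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for ORFs on the forward strand.

    An ORF runs from an ATG to the first in-frame stop codon, inclusive.
    Coordinates are 0-based and end-exclusive; min_codons counts the
    start and stop codons.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ATG in this frame
    return orfs


print(find_orfs("CCATGAAATAGCC"))
```

Raising `min_codons` is the usual way to suppress short ORFs that occur by chance.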


Multiple sequence alignment

Multiple sequence-alignment methods assemble pairwise sequence alignments for many related sequences into a picture of sequence homology among all members of a gene family. Multiple sequence alignments aid in visual identification of sites in a DNA or protein sequence that may be functionally important. Such sites are usually conserved; that is, the same amino acid is present at that site in each one of a group of related sequences. Multiple sequence alignments can also be quantitatively analyzed to extract information about a gene family. Multiple sequence alignments are an integral step in phylogenetic analysis of a family of related sequences, and they also provide the basis for identifying sequence patterns that characterize particular protein families.
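Once an alignment exists, spotting conserved sites is a matter of scanning its columns. The sketch below assumes the alignment is already computed and supplied as equal-length strings with `-` for gaps; the function name `conserved_columns`, the threshold parameter and the toy alignment are made up for illustration.

```python
from collections import Counter


def conserved_columns(alignment: list[str], threshold: float = 1.0) -> list[int]:
    """Return 0-based column indices where the most common residue
    (gaps excluded) occurs in at least `threshold` of all sequences."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(len(alignment[0])):
        residues = [row[col] for row in alignment if row[col] != "-"]
        if not residues:
            continue  # column made up entirely of gaps
        _, count = Counter(residues).most_common(1)[0]
        if count / len(alignment) >= threshold:
            conserved.append(col)
    return conserved


# Toy alignment, invented for this example.
toy_alignment = [
    "MKV-LH",
    "MRVALH",
    "MKVGLH",
]
print(conserved_columns(toy_alignment))
```

Lowering `threshold` below 1.0 admits columns that are mostly, but not perfectly, conserved.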


Phylogenetic analysis

Phylogenetic analysis attempts to describe the evolutionary

relatedness of a group of sequences. A traditional phylogenetic tree or 

cladogram groups species into a diagram that represents their relative

evolutionary divergence. Branchings of the tree that occur furthest

from the root separate individual species; branchings that occur close

to the root group species into kingdoms, phyla, classes, families, genera, and so on.

The information in a molecular sequence alignment can be used to

compute a phylogenetic tree for a particular family of gene sequences.

The branchings in phylogenetic trees represent evolutionary distance based on sequence similarity scores or on information-theoretic modeling of the number of mutational steps required to change one sequence into the other. Phylogenetic analyses of protein sequence families talk not about the evolution of the entire organism but about evolutionary change in specific coding regions.
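A tree is usually computed from pairwise distances between aligned sequences. The sketch below shows one simple way to obtain such distances for nucleotide sequences: the observed fraction of differing sites (p-distance) followed by the Jukes-Cantor correction. Jukes-Cantor is one common substitution model chosen here for illustration; the slides do not specify a model, and the function names are hypothetical.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned, ungapped positions at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for nucleotides (requires p < 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Corrected distance for every unordered pair of named sequences."""
    names = list(seqs)
    return {
        (m, n): jukes_cantor(p_distance(seqs[m], seqs[n]))
        for i, m in enumerate(names)
        for n in names[i + 1:]
    }
```

The correction matters because repeated mutations at the same site hide some changes, so the corrected distance is always at least as large as the observed one. A tree-building method such as neighbour joining would then take this matrix as input.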


Extraction of patterns and profiles from sequence data

A motif is a sequence of amino acids that defines a substructure in a protein that can be connected to function or to structural stability. In a

group of evolutionarily related gene sequences, motifs appear as

conserved sites. Sites in a gene sequence tend to be conserved, that is, to

remain the same in all or most representatives of a sequence family,

when there is selection pressure against copies of the gene that have mutations at that site. Nonessential parts of the gene sequence will

diverge from each other in the course of evolution, so the conserved

motif regions show up as a signal in a sea of mutational noise.

Sequence profiles are statistical descriptions of these motif signals;

 profiles can help identify distantly related proteins by picking out a

motif signal even in a sequence that has diverged radically from other 

members of the same family.
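
A sequence profile can be illustrated with a small position-specific frequency matrix. The Python sketch below builds one from a few aligned motif instances and slides it along a longer sequence to find the best-scoring window. The function names, the pseudocount, and the uniform background are assumptions made for this example; profile databases use more elaborate statistics.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(instances, pseudocount=1.0):
    """Per-position residue frequencies from equal-length motif instances."""
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("motif instances must have the same length")
    total = len(instances) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in instances)
        profile.append({aa: (counts[aa] + pseudocount) / total
                        for aa in AMINO_ACIDS})
    return profile


def score_window(profile, window, background=1.0 / len(AMINO_ACIDS)):
    """Log-odds score (bits) of a window against a uniform background."""
    return sum(math.log2(col[aa] / background)
               for col, aa in zip(profile, window))


def best_match(profile, sequence):
    """Return (score, start) of the highest-scoring window in `sequence`."""
    width = len(profile)
    return max((score_window(profile, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


# Invented motif instances and query sequence (standard residues only).
profile = build_profile(["GKSGK", "GKTGK", "GKSGR"])
print(best_match(profile, "AAAGKSGKAAA"))
```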


Protein sequence analysis

The amino-acid content of a protein sequence can be used as the

 basis for many analyses, from computing the isoelectric point and

molecular weight of the protein and the characteristic peptide mass

fingerprints that will form when it's digested with a particular

protease, to predicting secondary structure features and

post-translational modification sites.
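
As a hedged illustration of a composition-based calculation, the Python sketch below estimates the average molecular weight of a protein by summing residue masses and adding one water. The mass table holds commonly tabulated average residue masses in daltons and should be checked against a reference before serious use; the function names are assumptions made for this example.

```python
from collections import Counter

# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528


def composition(sequence):
    """Count of each amino acid in the sequence."""
    return Counter(sequence.upper())


def molecular_weight(sequence):
    """Average molecular weight of an unmodified linear peptide, in Da."""
    counts = composition(sequence)
    unknown = set(counts) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError("unsupported residue codes: %s" % sorted(unknown))
    return sum(RESIDUE_MASS[aa] * n for aa, n in counts.items()) + WATER_MASS


print(round(molecular_weight("GG"), 2))  # glycylglycine
```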


Protein structure prediction

It is a lot harder to determine the structure of a protein

experimentally than it is to obtain DNA sequence data. One very

active area of bioinformatics and computational biology research is

the development of methods for predicting protein structure from protein sequence. Methods such as secondary structure prediction

and threading can help determine how a protein might fold,

classifying it with other proteins that have similar topology, but

they don't provide a detailed structure model.


Protein structure property analysis

Protein structures have many measurable properties that are of 

interest to crystallographers and structural biologists. Protein

structure validation tools are used by crystallographers to measure

how well a structure model conforms to structural rules extracted from existing structures or chemical model compounds. These tools

may also analyze the "fitness" of every amino acid in a structure

model for its environment, flagging such oddities as buried charges

with no countercharge or large patches of hydrophobic amino acids

found on a protein surface. These tools are useful for evaluating both

experimental and theoretical structure models.


Protein structure alignment and comparison

Even when two gene sequences aren't apparently homologous, the

structures of the proteins they encode can be similar. New tools for

computing structural similarity are making it possible to detect

distant homologies by comparing structures, even in the absence of

much sequence similarity.


Biochemical simulation

Biochemical simulation uses the tools of dynamical systems

modeling to simulate the chemical reactions involved in

metabolism. Simulations can extend from individual metabolic

 pathways to transmembrane transport processes and even properties

of whole cells or tissues. Biochemical and cellular simulations traditionally have relied on the ability of the scientist to describe a

system mathematically, developing a system of differential

equations that represent the different reactions and fluxes occurring

in the system. However, new software tools can build the

mathematical framework of a simulation automatically from a description provided interactively by the user, making mathematical

modeling accessible to any biologist who knows enough about a

system to describe it according to the conventions of dynamical

systems modeling.
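
To make the idea of a system of differential equations concrete, the Python sketch below simulates a hypothetical two-step pathway A -> B -> C with first-order rate constants, using simple explicit Euler integration. The pathway, the rate constants and the function name are all assumptions chosen for illustration; dedicated simulation packages use more robust integrators.

```python
def simulate_pathway(a0, b0, c0, k1, k2, dt, steps):
    """Integrate dA/dt = -k1*A, dB/dt = k1*A - k2*B, dC/dt = k2*B.

    Returns a list of (time, A, B, C) tuples, one per time step.
    """
    a, b, c = a0, b0, c0
    trajectory = [(0.0, a, b, c)]
    for n in range(1, steps + 1):
        # Reaction fluxes evaluated at the current concentrations.
        flux_ab = k1 * a
        flux_bc = k2 * b
        a += -flux_ab * dt
        b += (flux_ab - flux_bc) * dt
        c += flux_bc * dt
        trajectory.append((n * dt, a, b, c))
    return trajectory


# Arbitrary starting concentrations and rate constants.
result = simulate_pathway(a0=1.0, b0=0.0, c0=0.0, k1=1.0, k2=0.5,
                          dt=0.001, steps=1000)
t, a, b, c = result[-1]
print("t=%.2f A=%.3f B=%.3f C=%.3f" % (t, a, b, c))
```

Because every unit of material that leaves one pool enters the next, the total A + B + C stays constant, which is a useful sanity check on any such model.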


Whole genome analysis

As more and more genomes are sequenced completely, the

analysis of raw genome data has become a more important

task. There are a number of perspectives from which one can

look at genome data: for example, it can be treated as a long linear sequence, but it's often more useful to integrate DNA

sequence information with existing genetic and physical map

data. This allows you to navigate a very large genome and

find what you want.


Primer design

Many molecular biology protocols require the design of  

oligonucleotide primers. Proper primer design is critical for the

success of polymerase chain reaction (PCR), oligo hybridization,

DNA sequencing, and microarray experiments. Primers must

hybridize with the target DNA to provide a clear answer to the question being asked, but they must also have appropriate

 physicochemical properties; they must not self-hybridize or 

dimerize; and they should not have multiple targets within the

sequence under investigation. There are several web-based services

that allow users to submit a DNA sequence and automatically

detect appropriate primers, or to compute the properties of a

desired primer DNA sequence.
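
Some of the primer properties mentioned above can be computed in a few lines. The Python sketch below is a rough illustration only: it uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is a coarse estimate intended for short oligonucleotides, and a very simple 3'-end self-complementarity test. The function names and the default window of 4 bases are assumptions, not the method of any particular primer-design service.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Approximate melting temperature in °C by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def has_3prime_self_complementarity(primer, k=4):
    """True if the last k bases could pair with another part of the primer."""
    primer = primer.upper()
    return reverse_complement(primer[-k:]) in primer


def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    count, start = 0, 0
    while True:
        idx = text.find(pattern, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1


def count_binding_sites(template, primer):
    """Exact matches of the primer sequence on either strand of the template."""
    template, primer = template.upper(), primer.upper()
    return (count_occurrences(template, primer)
            + count_occurrences(reverse_complement(template), primer))


primer = "ATGCATGC"  # invented example
print(gc_content(primer), wallace_tm(primer))
```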


DNA microarray analysis

DNA microarray analysis is a relatively new molecular biology

method that expands on classic probe hybridization methods to

 provide access to thousands of genes at once.

The main tasks in microarray analysis as it's currently done include

an image analysis step, in which individual spots on the array

image are identified and their signal intensities are measured.


Proteomics analysis

Before they're ever crystallized and biochemically characterized, proteins are often studied using a combination of gel

electrophoresis, partial sequencing, and mass spectrometry. 2-D gel

electrophoresis can separate a mixture of thousands of proteins into

distinct components; the individual spots of material can be blotted

or even cut from the gel and analyzed. Simple computational tools

can provide some information to aid in the process of analyzing

protein mixtures. It's trivial to compute molecular weight and pI

from a protein sequence; by using these values in combination, sets

of candidate identities can be found for each spot on a gel. It's also possible to compute, from a protein sequence, the peptide

fingerprint that is created when that protein is broken down into

fragments by enzymes with specific protein cleavage sites.
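
A peptide fingerprint can be sketched as an in silico digest. The Python example below assumes trypsin as the enzyme and applies the commonly stated rule that it cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The function name and the test sequence are illustrative, and missed cleavages are ignored.

```python
def trypsin_digest(sequence):
    """Split a protein sequence into peptides at tryptic cleavage sites."""
    sequence = sequence.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        # Cleave on the C-terminal side of K or R, but not before P.
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


# Invented sequence; the R followed by P is deliberately left uncut.
for peptide in trypsin_digest("MKWVTFRPLLK"):
    print(peptide, len(peptide))
```

The masses of the resulting peptides, compared against masses measured by mass spectrometry, are what make up the fingerprint used to suggest candidate identities.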


Databases

The internet is a powerful resource containing a large volume of data and tools

to manipulate them... unfortunately, connecting data between them can

sometimes be tricky.

What is a database?

An organized body of related information. A collection of information organized and presented to serve a specific

purpose. A computerized database is an updated, organized file of machine

readable information that is rapidly searched and retrieved by computer.

computerized storehouse of data (records).

allows user-defined queries.

allows extraction of specified records.

allows adding, changing, removing, and merging of records.

uses standardized formats.
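
The properties listed above can be tried out with a tiny database. The Python sketch below uses the standard-library `sqlite3` module with an in-memory database; the table layout, the accession numbers and the helper function names are invented for illustration and do not correspond to any real sequence database.

```python
import sqlite3


def create_database():
    """Create an empty in-memory store of sequence records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences ("
        "accession TEXT PRIMARY KEY, organism TEXT NOT NULL, "
        "sequence TEXT NOT NULL)"
    )
    return conn


def add_record(conn, accession, organism, sequence):
    conn.execute("INSERT INTO sequences VALUES (?, ?, ?)",
                 (accession, organism, sequence))


def find_by_organism(conn, organism):
    """User-defined query: accessions of all records for one organism."""
    rows = conn.execute(
        "SELECT accession FROM sequences WHERE organism = ? ORDER BY accession",
        (organism,))
    return [row[0] for row in rows]


def get_sequence(conn, accession):
    row = conn.execute("SELECT sequence FROM sequences WHERE accession = ?",
                       (accession,)).fetchone()
    return row[0] if row else None


def update_sequence(conn, accession, new_sequence):
    conn.execute("UPDATE sequences SET sequence = ? WHERE accession = ?",
                 (new_sequence, accession))


def remove_record(conn, accession):
    conn.execute("DELETE FROM sequences WHERE accession = ?", (accession,))


db = create_database()
add_record(db, "DEMO0001", "Mentha arvensis", "ATGGCC")   # hypothetical record
add_record(db, "DEMO0002", "Ocimum sanctum", "ATGAAA")    # hypothetical record
print(find_by_organism(db, "Mentha arvensis"))
```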


The ideal sequence database for computational analyses and data-mining:

It must be complete with minimal redundancy

It must contain as much up-to-date information (annotation) as possible on each sequence

All the information items must be retrievable by computer programs in a consistent manner

It must be highly interoperable with other databases


Database Categories List

Nucleotide Sequence Databases

RNA sequence databases

Protein sequence databases

Structure Databases

Genomics Databases (non-vertebrate)

Metabolic and Signaling Pathways

Human and other Vertebrate Genomes

Human Genes and Diseases

Microarray Data and other Gene Expression Databases

Proteomics Resources

Other Molecular Biology Databases

Organelle databases

Plant databases

Immunological databases


Nucleotide Sequence Databases

The nucleotide sequence databases are data repositories, accepting nucleic

acid sequence data from the scientific community and making it freely

available. The databases strive for completeness, with the aim of recording

every publicly known nucleic acid sequence. These data are heterogeneous:

they vary with respect to the source of the material (e.g. genomic versus

cDNA), the intended quality (e.g. finished versus single pass sequences), the

extent of sequence annotation and the intended completeness of the

sequence relative to its biological target (e.g. complete versus partial

coverage of a gene or a genome). The nucleotide databases are distributed

free of charge over the internet.


DDBJ, GenBank and EMBL-Bank exchange new and updated data on a

daily basis to achieve optimal synchronisation. The result is that they

contain exactly the same information, except for sequences that have been

added in the last 24 hours.

Nucleotide Sequence Databases can be further subdivided into the following:

1) International Nucleotide Sequence Database Collaboration

2) Coding and non-coding DNA

3) Gene structure, introns and exons, splice sites

4) Transcriptional regulator sites and transcription factors


Nucleotide Sequence Databases

Database name | Full name and/or description | URL

1.1. International Nucleotide Sequence Database Collaboration

GenBank | An annotated collection of all publicly available nucleotide and protein sequences | http://www.ncbi.nlm.nih.gov/
EMBL Nucleotide Sequence Database | An annotated collection of all publicly available nucleotide and protein sequences | http://www.ebi.ac.uk/embl.html
DDBJ (DNA Data Bank of Japan) | An annotated collection of all publicly available nucleotide and protein sequences | http://www.ddbj.nig.ac.jp


Online databases

Primary repositories of sequence data:

- European Bioinformatics Institute (EBI)

- DNA Data Bank of Japan (DDBJ)

- GenBank, National Center for Biotechnology Information (NCBI)

Each of these databases contains equivalent information (formats vary slightly).


1.2. DNA sequences: genes, motifs and regulatory sites

1.2.1. Coding and non-coding DNA

ACLAME | A classification of genetic mobile elements | http://aclame.ulb.ac.be/
CUTG | Codon usage tabulated from GenBank | http://www.kazusa.or.jp/codon/
Genetic Codes | Deviations from the standard genetic code in various organisms and organelles | http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
HERVd | Human endogenous retrovirus database | http://herv.img.cas.cz
IMGT/LIGM-DB | Immunoglobulin, T cell receptor and MHC nucleotide sequences from human and other vertebrates | http://imgt.cines.fr/cgi-bin/IMGTlect.jv
Imprinted Gene Catalogue | Imprinted genes and parent-of-origin effects in animals | http://www.otago.ac.nz/IGC
Islander | Pathogenicity islands and prophages in bacterial genomes | http://www.indiana.edu/islander
MICdb | Prokaryotic microsatellites | http://www.cdfd.org.in/micas
STRBase | Short tandem DNA repeats database | http://www.cstl.nist.gov/div831/strbase/
TIGR Gene Indices | Organism-specific databases of EST and gene sequences | http://www.tigr.org/tdb/tgi.shtml


Transterm | Codon usage, start and stop signals | http://uther.otago.ac.nz/Transterm.html
UniGene | Unified clusters of ESTs and full-length mRNA sequences | http://www.ncbi.nlm.nih.gov/UniGene/
UniVec | Vector sequences, adapters, linkers and primers used in DNA cloning, can be used to check for vector contamination | http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html
VectorDB | Characterization and classification of nucleic acid vectors | http://genome-www2.stanford.edu/vectordb/
Xpro | Eukaryotic protein-encoding DNA sequences, both intron-containing and intron-less genes | http://origin.bic.nus.edu.sg/xpro/

1.2.2. Gene structure, introns and exons, splice sites

ASAP | Alternative spliced isoforms | http://www.bioinformatics.ucla.edu/ASAP
ASD | EBI's alternative splicing database project, which includes three databases: AltSplice, AltExtron and AEdb | http://www.ebi.ac.uk/asd


ASDB | Alternative splicing database: protein products and expression patterns of alternatively-spliced genes | http://hazelton.lbl.gov/teplitski/alt
EASED | Extended alternatively spliced EST database | http://eased.bioinf.mdc-berlin.de/
EID | Exon-intron database: introns in protein-coding genes | http://mcb.harvard.edu/gilbert/EID/
ExInt | Exon-intron structure of eukaryotic genes | http://intron.bic.nus.edu.sg/exint/exint.html
HS3D | Homo sapiens splice sites dataset | http://www.sci.unisannio.it/docenti/rampone/
IDB/IEDB | Intron sequence and evolution databases | http://nutmeg.bio.indiana.edu/intron/index.html
Intronerator | Introns and alternative splicing in C. elegans and C. briggsae | http://www.cse.ucsc.edu/kent/intronerator/
SpliceDB | Canonical and non-canonical mammalian splice sites | http://genomic.sanger.ac.uk/spldb/SpliceDB.html
SpliceNest | A tool for visualizing splicing of genes from EST data | http://splicenest.molgen.mpg.de/
YIDB | Yeast nuclear and mitochondrial intron sequences | http://www.embl-heidelberg.DE/ExternalInfo/seraphin/yidb.html


1.2.3. Transcriptional regulator sites and transcription factors

ACTIVITY | Functional DNA/RNA site activity | http://util.bionet.nsc.ru/databases/activity.html
DBTBS | Bacillus subtilis promoters and transcription factors | http://dbtbs.hgc.jp/
DBTSS | A database of transcriptional start sites | http://dbtss.hgc.jp/
DPInteract | Binding sites for E. coli DNA-binding proteins | http://arep.med.harvard.edu/dpinteract
EPD | Eukaryotic promoter database | http://www.epd.isb-sib.ch
HemoPDB | Hematopoietic promoter database: transcriptional regulation in hematopoiesis | http://bioinformatics.med.ohio-state.edu/HemoPDB
HvrBase | Primate mitochondrial DNA control region sequences | http://www.hvrbase.org/
JASPAR | PSSMs for transcription factor DNA-binding sites | http://jaspar.cgb.ki.se
PLACE | Plant cis-acting regulatory DNA elements | http://www.dna.affrc.go.jp/htdocs/PLACE
PlantCARE | Plant promoters and cis-acting regulatory elements | http://intra.psb.ugent.be:8080/PlantCARE/
PlantProm | Plant promoter sequences for RNA polymerase II | http://mendel.cs.rhul.ac.uk/



RNA sequence databases

The RNA sequence databases compile all complete or nearly complete

sequences for all RNAs or for specific classes of RNA, such as ribosomal

RNA. Some of them contain secondary structure

information; additional information about the sequences, such as the taxonomic

classification of the organism from which they were obtained, and

literature references are also provided. There are databases containing

information on 16S and 23S ribosomal RNA mutations, 5S rRNA

sequences, genomic tRNA, all complete or nearly complete rRNA sequences,

etc.


2. RNA sequence databases

16S and 23S rRNA Mutation Database | 16S and 23S ribosomal RNA mutations | http://ribosome.fandm.edu/
5S rRNA Database | 5S rRNA sequences | http://biobases.ibch.poznan.pl/5SData/
Aptamer database | Small RNA/DNA molecules binding nucleic acids, proteins | http://aptamer.icmb.utexas.edu/
ARED | AU-rich element-containing mRNA database | http://rc.kfshrc.edu.sa/ared
Mobile group II introns | A database of group II introns, self-splicing catalytic RNAs | http://www.fp.ucalgary.ca/group2introns/
European rRNA database | All complete or nearly complete rRNA sequences | http://www.psb.ugent.be/rRNA/
GtRDB | Genomic tRNA database | http://rna.wustl.edu/GtRDB
Guide RNA Database | RNA editing in various kinetoplastid species | http://biosun.bio.tu-darmstadt.de/goringer/gRNA/gRNA.html
HIV Sequence Database | HIV RNA sequences | http://hiv-web.lanl.gov/
HyPaLib | Hybrid pattern library: structural elements in classes of RNA | http://bibiserv.techfak.uni-bielefeld.de/HyPa/
IRESdb | Internal ribosome entry site database | http://ifr31w3.toulouse.inserm.fr/IRESdatabase/


miRNA Registry | Database of microRNAs (small non-coding RNAs) | http://www.sanger.ac.uk/Software/Rfam/mirna/
NCIR | Non-canonical interactions in RNA structures | http://prion.bchs.uh.edu/bp_type/
ncRNAs Database | Non-coding RNAs with regulatory functions | http://biobases.ibch.poznan.pl/ncRNA/
PLANTncRNAs | Plant non-coding RNAs | http://www.prl.msu.edu/PLANTncRNAs
Plant snoRNA DB | snoRNA genes in plant species | http://www.scri.sari.ac.uk/plant_snoRNA/
PLMItRNA | Plant mitochondrial tRNA | http://bighost.area.ba.cnr.it/PLMItRNA/
PseudoBase | Database of RNA pseudoknots | http://wwwbio.leidenuniv.nl/~Batenburg/PKB.html
RDP | Ribosomal database project: rRNA sequence data | http://rdp.cme.msu.edu
Rfam | Non-coding RNA families | http://www.sanger.ac.uk/Software/Rfam/
RISCC | Ribosomal internal spacer sequence collection | http://ulises.umh.es/RISSC
RNA Modification Database | Naturally modified nucleosides in RNA | http://medlib.med.utah.edu/RNAmods/
RRNDB | rRNA operon numbers in various prokaryotes | http://rrndb.cme.msu.edu/


Small RNA Database | Small RNAs from prokaryotes and eukaryotes | http://mbcr.bcm.tmc.edu/smallRNA
SRPDB | Signal recognition particle database | http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html
Subviral RNA Database | Viroids and viroid-like RNAs | http://subviral.med.uottawa.ca/cgi-bin/home.cgi
tmRNA Website | tmRNA sequences and alignments | http://www.indiana.edu/tmrna
tmRDB | tmRNA database | http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html
tRNA database | tRNA viewer and sequence editor | http://www.uni-bayreuth.de/departments/biochemie/trna/
UTRdb/UTRsite | 5'- and 3'-UTRs of eukaryotic mRNAs | http://bighost.area.ba.cnr.it/srs6/



Types of protein databases

1. Sequence databases (figure: example sequence SCIENCEISFN)

2. Protein motif databases (figure: example alignment of GLAWEWINQTR with GREWEWINES)

3. Protein structure databases


Protein sequence databases

The protein databases are the most comprehensive source of information

on proteins. It is necessary to distinguish between universal databases

covering proteins from all species and specialised data collections storing

information about specific families or groups of proteins, or about the

proteins of a specific organism.

Two categories of universal protein databases can be discerned: simple archives of sequence data; and annotated databases where additional

information has been added to the sequence record.

In the upcoming slides you will find a list of databases such as:

- Primary protein sequence databases such as UniProt/Swiss-Prot

- Specialised protein sequence databases such as GOA

- Specialised protein databases such as ENZYME

- Secondary protein databases such as InterPro

- Structure databases such as PDB


3. Protein sequence databases

3.1. General sequence databases

EXProt | Sequences of proteins with experimentally verified function | http://www.cmbi.kun.nl/EXProt/
NCBI Protein database | All protein sequences: translated from GenBank and imported from other protein databases | http://www.ncbi.nlm.nih.gov/entrez
PIR | Protein information resource: a collection of protein sequence databases, part of the UniProt project | http://pir.georgetown.edu/
PIR-NREF | PIR's non-redundant reference protein database | http://pir.georgetown.edu/pirwww/pirnref.shtml
PRF | Protein research foundation database of peptides: sequences, literature and unnatural amino acids | http://www.prf.or.jp/en
Swiss-Prot | Curated protein sequence database with a high level of annotation (protein function, domain structure, modifications) | http://www.expasy.org/sprot
TrEMBL | Translations of EMBL nucleotide sequence entries: computer-annotated supplement to Swiss-Prot | http://www.expasy.org/sprot


UniProt | Universal protein knowledgebase: a database of protein sequence from Swiss-Prot, TrEMBL and PIR | http://www.uniprot.org/

3.2. Protein properties

AAindex | Physicochemical properties of amino acids | http://www.genome.ad.jp/aaindex/
ProTherm | Thermodynamic data for wild-type and mutant proteins | http://gibk26.bse.kyutech.ac.jp/jouhou/Protherm/protherm.html

3.3. Protein localization and targeting

DBSubLoc | Database of protein subcellular localization | http://www.bioinfo.tsinghua.edu.cn/dbsubloc.html
MitoDrome | Nuclear-encoded mitochondrial proteins of Drosophila | http://bighost.area.ba.cnr.it/BIG/MitoDrome
NESbase | Nuclear export signals database | http://www.cbs.dtu.dk/databases/NESbase
NLSdb | Nuclear localization signals | http://cubic.bioc.columbia.edu/db/NLSdb/


THGS | Transmembrane helices in genome sequences | http://pranag.physics.iisc.ernet.in/thgs/
TMPDB | Experimentally characterized transmembrane topologies | http://bioinfo.si.hirosaki-.ac.jp/TMPDB/

3.4. Protein sequence motifs and active sites

ASC | Active sequence collection: biologically active peptides | http://bioinformatica.isa.cnr.it/ASC/
Blocks | Alignments of conserved regions in protein families | http://blocks.fhcrc.org/
CSA | Catalytic site atlas: enzyme active sites and catalytic residues in enzymes of known 3D structure | http://www.ebi.ac.uk/thornton-srv/databases/CSA/
COMe | Co-ordination of metals etc.: classification of bioinorganic proteins (metalloproteins and some other complex proteins) | http://www.ebi.ac.uk/come


eMOTIF | Protein sequence motif determination and searches | http://motif.stanford.edu/emotif
Metalloprotein Site Database | Metal-binding sites in metalloproteins | http://metallo.scripps.edu/
O-GlycBase | O- and C-linked glycosylation sites in proteins | http://www.cbs.dtu.dk/databases/OGLYCBASE/
PhosphoBase | Protein phosphorylation sites | http://www.cbs.dtu.dk/databases/PhosphoBase/
PROMISE | Prosthetic centers and metal ions in protein active sites | http://metallo.scripps.edu/PROMISE
PROSITE | Biologically significant protein patterns and profiles | http://www.expasy.org/prosite


3.5. Protein domain databases; protein classification

CDD | Conserved domain database: includes protein domains from Pfam, SMART and COG databases | http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
CluSTr | Clusters of Swiss-Prot+TrEMBL proteins | http://www.ebi.ac.uk/clustr
Hits | A database of protein domains and motifs | http://hits.isb-sib.ch/
InterPro | Integrated resource of protein families, domains and functional sites | http://www.ebi.ac.uk/interpro
iProClass | Integrated protein classification database | http://pir.georgetown.edu/iproclass/
MetaFam | Database of protein family annotations | http://metafam.ahc.umn.edu/
Pfam | Protein families: multiple sequence alignments and profile hidden Markov models of protein domains | http://www.sanger.ac.uk/Software/Pfam/
PIRSF | Family/superfamily classification of whole proteins | http://pir.georgetown.edu/pirsf/
PRINTS | Hierarchical gene family fingerprints | http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/


PIR-ALN | Curated database of protein sequence alignments | http://pir.georgetown.edu/pirwww/dbinfo/piraln.html
ProClass | Protein families defined by PIR superfamilies and PROSITE patterns | http://pir.georgetown.edu/gfserver/proclass.html
ProDom | Protein domain families | http://www.toulouse.inra.fr/prodom.html
ProtoMap | Hierarchical classification of Swiss-Prot proteins | http://protomap.cornell.edu/
ProtoNet | Hierarchical clustering of Swiss-Prot proteins | http://www.protonet.cs.huji.ac.il/
SBASE | Protein domain sequences and tools | http://www.icgeb.org/sbase
SMART | Simple modular architecture research tool: signalling, extracellular and chromatin-associated protein domains | http://smart.embl-heidelberg.de/
SUPFAM | Grouping of sequence families into superfamilies | http://pauling.mbu.iisc.ernet.in/supfam
SYSTERS | Systematic re-searching and clustering of proteins | http://systers.molgen.mpg.de/
TIGRFAMs | TIGR protein families adapted for functional annotation | http://www.tigr.org/TIGRFAMs


3.6. Databases of individual protein families3.6. Databases of individual protein families

AARSDB Aminoacyl-tRNA synthetase database http://rose.man.poznan.pl/aars/index.html

ABCdb ABC transporters database http://ir2lcb.cnrs-mrs.fr/ABCdb/

ASPD Artificial selected proteins/peptides database http://wwwmgs.bionet.nsc.ru/mgs/gnw/aspd/

BacTregulators Transcriptional regulators of AraC and TetR families http://www.bactregulators.org/

CSDBase Cold shock domain-containing proteins http://www.chemie.uni-marburg.de/~csdbase/

DExH/D Family Database DEAD-box, DEAH-box and DExH-box proteins http://www.helicase.net/dexhd/dbhome.htm

Endogenous GPCR List G protein-coupled receptors; expression in cell lines http://www.tumor-gene.org/GPCR/gpcr.html

ESTHER Esterases and other alpha/beta hydrolase enzymes http://www.ensam.inra.fr/esther

EyeSite Families of proteins functioning in the eye http://eyesite.cryst.bbk.ac.uk/

GPCRDB G protein-coupled receptors database http://www.gpcr.org/7tm/

Histone Database Histone fold sequences and structures http://research.nhgri.nih.gov/histones/

HIV Molecular Immunology Database HIV epitopes http://hiv-web.lanl.gov/immunology/

HIV Protease Database HIV reverse transcriptase and protease sequences http://hivdb.stanford.edu/

Homeobox Page Homeobox proteins, classification and evolution http://www.biosci.ki.se/groups/tbu/homeo.html

Homeodomain Resource Homeodomain sequences, structures and related genetic and genomic information http://research.nhgri.nih.gov/homeodomain

HORDE Human olfactory receptor data exploratorium http://bioinfo.weizmann.ac.il/HORDE/

InBase Inteins (protein splicing elements) database: properties, sequences, bibliography http://www.neb.com/neb/inteins.html

Kabat Database Sequences of proteins of immunological interest http://immuno.bme.nwu.edu/

KinG Ser/Thr/Tyr-specific protein kinases encoded in complete genomes http://hodgkin.mbu.iisc.ernet.in/king

Knottins Database of knottins: small proteins with an unusual 'disulfide through disulfide' knot http://knottin.cbs.cnrs.fr


LGICdb Ligand-gated ion channel subunit sequences database http://www.pasteur.fr/recherche/banques/LGIC/LGIC.html

Lipase Engineering Database Sequence, structure and function of lipases and esterases http://www.led.uni-stuttgart.de/

LOX-DB Mammalian, invertebrate, plant and fungal lipoxygenases http://www.dkfz-heidelberg.de/spec/lox-db/

MEROPS Database of proteolytic enzymes (peptidases) http://www.merops.ac.uk/

MHCPEP MHC-binding peptides http://wehih.wehi.edu.au/mhcpep/

MPIMP Mitochondrial protein import machinery of plants http://millar3.biochem.uwa.edu.au/~lister/index.html

NPD Nuclear protein database http://npd.hgu.mrc.ac.uk/

NucleaRDB Nuclear receptor superfamily http://www.receptors.org/NR/

Nuclear Receptor Resource Nuclear receptor superfamily http://nrr.georgetown.edu/nrr/nrr.html

NUREBASE Nuclear hormone receptors database http://www.ens-lyon.fr/LBMC/laudet/nurebase/nurebase.html


Olfactory Receptor Database Sequences for olfactory receptor-like molecules http://ycmi.med.yale.edu/senselab/ordb/

ooTFD Object-oriented transcription factors database http://www.ifti.org/ootfd

PKR Protein kinase resource: sequences, enzymology, genetics and molecular and structural properties http://pkr.sdsc.edu/

PLANT-PIs Plant protease inhibitors http://bighost.area.ba.cnr.it/PLANT-PIs

PlantsP/PlantsT Plant proteins involved in phosphorylation and membrane transport http://plantsp.sdsc.edu/

Prolysis Proteases and natural and synthetic protease inhibitors http://delphi.phys.univ-tours.fr/Prolysis/

REBASE Restriction enzymes and associated methylases http://rebase.neb.com/rebase/rebase.html

Ribonuclease P Database RNase P sequences, alignments and structures http://www.mbio.ncsu.edu/RNaseP/home.html

RPG Ribosomal protein gene database http://ribosome.miyazaki-med.ac.jp/

RTKdb Receptor tyrosine kinase sequences http://pbil.univ-lyon1.fr/RTKdb/

S/MARt dB Nuclear scaffold/matrix attached regions http://smartdb.bioinf.med.uni-goettingen.de/

SDAP Structural database of allergenic proteins and food allergens http://fermi.utmb.edu/SDAP

SENTRA Sensory signal transduction proteins http://wit.mcs.anl.gov/WIT2/Sentra/HTML/sentra.html

SEVENS 7-transmembrane helix receptors (G-protein-coupled) http://sevens.cbrc.jp/

SRPDB Proteins of the signal recognition particles http://bio.lundberg.gu.se/dbs/SRPDB/SRPDB.html

TrSDB Transcription factor database http://ibb.uab.es/trsdb

VIDA Homologous viral protein families database http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html

VKCDB Voltage-gated potassium channel database http://vkcdb.biology.ualberta.ca/

Wnt Database Wnt proteins and phenotypes http://www.stanford.edu/rnusse/wntwindow.html


Database Categories List

Nucleotide Sequence Databases

RNA sequence databases

Protein sequence databases

Structure Databases

Genomics Databases (non-vertebrate)

Metabolic and Signaling Pathways

Human and other Vertebrate Genomes

Human Genes and Diseases

Microarray Data and other Gene Expression Databases

Proteomics Resources

Other Molecular Biology Databases

Organelle databases

Plant databases

Immunological databases


Structure Databases

The number of known molecular structures is increasing very rapidly, and these structures are available through databases that hold structural information on specific classes of molecules. The subcategories in this division of molecular databases are:

1)Small molecules

2)Carbohydrates

3)Nucleic acid structure

4)Protein structure


4. Structure Databases

4.1. Small molecules

CSD Cambridge structural database: crystal structure information for organic and metal-organic compounds http://www.ccdc.cam.ac.uk/prods/csd/csd.html

HIC-Up Hetero-compound Information Centre, Uppsala http://xray.bmc.uu.se/hicup

AANT Amino acid-nucleotide interaction database http://aant.icmb.utexas.edu/

Klotho Collection and categorization of biological compounds http://www.biocheminfo.org/klotho

LIGAND Chemical compounds and reactions in biological pathways http://www.genome.ad.jp/ligand/

4.2. Carbohydrates

CCSD Complex carbohydrate structure database (CarbBank) http://bssv01.lancs.ac.uk/gig/pages/gag/carbbank.htm

Glycan Carbohydrate database, part of the KEGG system http://glycan.genome.ad.jp/


GlycoSuiteDB N- and O-linked glycan structures and biological sources http://www.glycosuite.com/

Monosaccharide Browser Space filling Fischer projections of monosaccharides http://www.jonmaber.demon.co.uk/monosaccharide

SWEET-DB Annotated carbohydrate structure and substance information http://www.dkfz-heidelberg.de/spec2/sweetdb/

4.3. Nucleic acid structure

NDB Nucleic acid-containing structures http://ndbserver.rutgers.edu/

NTDB Thermodynamic data for nucleic acids http://ntdb.chem.cuhk.edu.hk/

RNABase RNA-containing structures from PDB and NDB http://www.rnabase.org/

SCOR Structural classification of RNA: RNA motifs by structure, function and tertiary interactions http://scor.lbl.gov/


PRODORIC NET Prokaryotic database of gene regulation networks http://prodoric.tu-bs.de/

PromEC E.coli promoters with experimentally-identified transcriptional start sites http://bioinfo.md.huji.ac.il/marg/promec

SELEX_DB DNA and RNA binding sites for various proteins, found by systematic evolution of ligands by exponential enrichment http://wwwmgs.bionet.nsc.ru/mgs/systems/selex/

TESS Transcription element search system http://www.cbil.upenn.edu/tess

TRANSCompel Composite regulatory elements affecting gene transcription in eukaryotes http://www.gene-regulation.com/pub/databases.html#transcompel

TRANSFAC Transcription factors and binding sites http://transfac.gbf.de/TRANSFAC/index.html

TRRD Transcription regulatory regions of eukaryotic genes http://www.bionet.nsc.ru/trrd/

4.4. Protein structure

ArchDB Automated classification of protein loop structures http://gurion.imim.es/archdb

ASTRAL Sequences of domains of known structure, selected subsets and sequence-structure correspondences http://astral.stanford.edu/

BAliBASE A database for comparison of multiple sequence alignments http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE2/index.html

BioMagResBank NMR spectroscopic data for proteins and nucleic acids http://www.bmrb.wisc.edu/

CADB Conformational angles in proteins database http://cluster.physics.iisc.ernet.in/cadb/

CATH Protein domain structures database http://www.biochem.ucl.ac.uk/bsm/cath_new

CE 3D Protein structure alignments http://cl.sdsc.edu/ce.html

CKAAPs DB Structurally-similar proteins with dissimilar sequences http://ckaap.sdsc.edu/

Dali Protein fold classification using the Dali search engine http://www.bioinfo.biocenter.helsinki.fi:8080/dali/


Decoys 'R' Us Computer-generated protein conformations http://dd.stanford.edu/

DisProt Database of Protein Disorder: information about proteins that lack fixed 3D structure in their native states http://divac.ist.temple.edu/disprot

DomIns Domain insertions in known protein structures http://stash.mrc-lmb.cam.ac.uk/DomIns

DSDBASE Native and modeled disulfide bonds in proteins http://www.ncbs.res.in/~faculty/mini/dsdbase/dsdbase.html

DSMM Database of simulated molecular motions http://projects.villaosch.de/dbase/dsmm/

eF-site Electrostatic surface of Functional site: electrostatic potentials and hydrophobic properties of the active sites http://ef-site.protein.osaka-u.ac.jp/eF-site

FSSP Fold classification based on structure-structure alignment of proteins, currently maintained as Dali database http://www.ebi.ac.uk/dali/fssp

Gene3D Precalculated structural assignments for whole genomes http://www.biochem.ucl.ac.uk/bsm/cath_new/Gene3D/

GTD Genomic threading database: structural annotations of complete genomes http://bioinf.cs.ucl.ac.uk/GTD

GTOP Protein fold predictions from genome sequences http://spock.genes.nig.ac.jp/~genome/


Het-PDB Navi Hetero-atoms in protein structures http://daisy.nagahama-i-bio.ac.jp/golab/hetpdbnavi.html

HOMSTRAD Homologous structure alignment database: curated structure-based alignments for protein families http://www-cryst.bioc.cam.ac.uk/homstrad

IMB Jena Image Library Visualization and analysis of 3D biopolymer structures http://www.imb-jena.de/IMAGE.html

IMGT/3Dstructure-DB Sequences and 3D structures of vertebrate immunoglobulins, T cell receptors and MHC proteins http://imgt3d.igh.cnrs.fr

ISSD Integrated sequence-structure database http://www.protein.bio.msu.su/issd

LPFC Library of protein family core structures http://www-smi.stanford.edu/projects/helix/LPFC

MMDB NCBI's database of 3D structures, part of NCBI Entrez http://www.ncbi.nlm.nih.gov/Structure

E-MSD EBI's macromolecular structure database http://www.ebi.ac.uk/msd

ModBase Annotated comparative protein structure models http://salilab.org/modbase

MolMovDB Database of macromolecular movements: descriptions of protein and macromolecular motions, including movies http://bioinfo.mbb.yale.edu/MolMovDB/

PALI Phylogeny and alignment of homologous protein structures http://pauling.mbu.iisc.ernet.in/~pali

PASS2 Structural motifs of protein superfamilies http://ncbs.res.in/~faculty/mini/campass/pass.html

PepConfDB A database of peptide conformations http://202.41.70.49:8080/pepconfdb/index.htm

PDB Protein structure databank: all publicly available 3D structures of proteins and nucleic acids http://www.rcsb.org/pdb

PDB-REPRDB Representative protein chains, based on PDB entries http://www.cbrc.jp/pdbreprdb/

PDBsum Summaries and analyses of PDB structures http://www.biochem.ucl.ac.uk/bsm/pdbsum

SCOP Structural classification of proteins http://scop.mrc-lmb.cam.ac.uk/scop

Sloop Classification of protein loops http://www-cryst.bioc.cam.ac.uk/~sloop/

Structure-Superposition Database Pairwise superposition of TIM-barrel structures http://ssd.rbvi.ucsf.edu/

SWISS-MODEL Repository Database of annotated 3D protein structure models http://swissmodel.expasy.org/repository

SUPERFAMILY Assignments of proteins to structural superfamilies http://supfam.org/

SURFACE Surface residues and functions annotated, compared and evaluated: a database of protein surface patches http://cbm.bio.uniroma2.it/surface

TargetDB Target data from worldwide structural genomics projects http://targetdb.pdb.org/

3D-GENOMICS Structural annotations for complete proteomes http://www.sbg.bio.ic.ac.uk/3dgenomics

TOPS Topology of protein structures database http://www.tops.leeds.ac.uk


Genomics Databases

For organisms of major interest to geneticists, there is a long history of conventionally published catalogues of genes or mutations. In the past few years, most of these have been made available in electronic form, and a variety of new databases have been developed. These databases vary greatly in the classes of data captured and in how these data are stored. This category comprises databases that hold information on various genomes, such as human, plant, viral, invertebrate and microbial genomes. The subcategories are:

1)Genome annotation terms, ontologies and nomenclature

2)Taxonomy and identification

3)General genomics databases

4)Viral genome databases

5)Prokaryotic genome databases

6)Unicellular eukaryotes genome databases

7)Fungal genome databases

8)Invertebrate genome databases

9)Human genome databases, maps and viewers.


5. Genomics Databases (non-human)

5.1. Genome annotation terms, ontologies and nomenclature

Genew Human gene nomenclature: approved gene symbols http://www.gene.ucl.ac.uk/nomenclature

GO Gene ontology consortium database http://www.geneontology.org/

GOA Gene ontology annotation project http://www.ebi.ac.uk/GOA

IUBMB Nomenclature database Nomenclature of enzymes, membrane transporters, electron transport proteins and other proteins htt

IUPAC Nomenclature database Nomenclature of biochemical and organic compounds approved by the IUBMB-IUPAC Joint Commission http://www.chem.qmul.ac.uk/iupac

IUPHAR-RD The International Union of Pharmacology recommendations on receptor nomenclature and drug classification http://www.iuphar-db.org/iuphar-rd/

PANTHER Gene products organized by biological function http://panther.celera.com/


SOURCE Functional genomic resource for annotations, ontologies and expression data http://source.stanford.edu/

UMLS Unified medical language system http://umlsks.nlm.nih.gov/

5.1.1. Taxonomy and Identification

ICB gyrB database for identification and classification of bacteria http://www.mbio.co.jp/icb

NCBI Taxonomy Names and taxonomic lineages of all organisms in GenBank http://www.ncbi.nlm.nih.gov/Taxonomy/

RIDOM rRNA-based differentiation of medical microorganisms http://www.ridom-rdna.de/

RDP Ribosomal database project http://rdp.cme.msu.edu

Tree of Life Information on phylogeny and biodiversity http://phylogeny.arizona.edu/tree/phylogeny.html

5.2. General genomics databases

COG Clusters of orthologous groups of proteins from unicellular microorganisms http://www.ncbi.nlm.nih.gov/COG

CORG Comparative regulatory genomics: conserved non-coding sequence blocks http://corg.molgen.mpg.de/

DEG Database of essential genes from bacteria and yeast http://tubic.tju.edu.cn/deg

EBI Genomes EBI's collection of databases for the analysis of complete and unfinished viral, pro- and eukaryotic genomes http://www.ebi.ac.uk/genomes

EGO Eukaryotic gene orthologs: orthologous DNA sequences in the TIGR gene indices http://www.tigr.org/tdb/tgi/ego/

EMGlib Enhanced microbial genomes library: completely sequenced genomes of unicellular organisms http://pbil.univ-lyon1.fr/emglib/emglib.html

Entrez Genomes NCBI's collection of databases for the analysis of complete and unfinished viral, pro- and eukaryotic genomes http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome

ERGO Light Integrated biochemical data on seven bacterial genomes: publicly available portion of the ERGO database http://www.ergo-light.com/ERGO

FusionDB Database of bacterial and archaeal gene fusion events http://igs-server.cnrs-mrs.fr/FusionDB

Genome information broker DDBJ's collection of databases for the analysis of complete and unfinished viral, pro- and eukaryotic genomes http://gib.genes.nig.ac.jp

GOLD Genomes online database: a listing of completed and ongoing genome projects http://www.genomesonline.org/

TIGR Microbial Database Lists of completed and ongoing genome projects with links to complete genome sequences http://www.tigr.org/tdb/mdb/mdbcomplete.html

HGT-DB Putative horizontally transferred genes in prokaryotic genomes http://www.fut.es/~debb/HGT/

KEGG Kyoto encyclopedia of genes and genomes: integrated suite of databases on genes, proteins, and metabolic pathways http://www.genome.ad.jp/kegg

MBGD Microbial genome database for comparative analysis http://mbgd.genome.ad.jp/

ORFanage Database of orphan ORFs (ORFs with no homologs) in complete microbial genomes http://www.cs.bgu.ac.il/~nomsiew/ORFans

PACRAT Archaeal and bacterial intergenic sequence features http://www.biosci.ohio-state.edu/~pacrat

PEDANT Results of an automated analysis of genomic sequences http://pedant.gsf.de


TIGR Comprehensive Microbial Resource Various data on complete microbial genomes: uniform annotation, properties of DNA and predicted proteins http://www.tigr.org/CMR

TransportDB Predicted membrane transporters in complete genomes, classified according to the TC classification system http://www.membranetransport.org

WIT What is there? Metabolic reconstruction for completely sequenced microbial genomes http://wit.mcs.anl.gov/WIT2/

5.3. Organism-specific genomic databases

5.3.1. Viruses

HCVDB The hepatitis C virus database http://hepatitis.ibcp.fr/

HIV Drug Resistance Database Mutations in HIV genes that confer resistance to anti-HIV drugs http://resdb.lanl.gov/Resist_DB/default.htm

VirGen Annotated and curated database for complete viral genome sequences http://bioinfo.ernet.in/virgen/virgen.html

5.3.2. Prokaryotes

5.3.2.1. Escherichia coli

ASAP A systematic annotation package for community analysis of E.coli and related genomes https://asap.ahabs.wisc.edu/annotation/php/ASAP1.htm

CCDB CyberCell database: E.coli database at U. Alberta http://redpoll.pharmacy.ualberta.ca/CCDB

coliBase A database for E.coli, Salmonella and Shigella http://colibase.bham.ac.uk/

Colibri E.coli genome database at Institut Pasteur http://genolist.pasteur.fr/Colibri/

Essential genes in E.coli First results of an E.coli gene deletion project http://magpie.genome.wisc.edu/~chris/essential.html

GenoBase E.coli genome database at Nara Institute http://ecoli.aist-nara.ac.jp/

GenProtEC E.coli K-12 genome and proteome database http://genprotec.mbl.edu

PEC Profiling of E.coli chromosome http://shigen.lab.nig.ac.jp/ecoli/pec

EcoCyc E.coli K-12 genes, metabolic pathways, transporters, and gene regulation http://ecocyc.org/

EcoGene Sequence and literature data on E.coli genes and proteins http://bmb.med.miami.edu/EcoGene/EcoWeb/


RegulonDB Transcriptional regulation and operon organization in E.coli http://www.cifn.unam.mx/Computational_Genomics/regulondb/

5.3.2.2. Bacillus subtilis

BSORF Bacillus subtilis genome database at Kyoto U. http://bacillus.genome.ad.jp/

NRSub Non-redundant Bacillus subtilis database at U. Lyon http://pbil.univ-lyon1.fr/nrsub/nrsub.html

SubtiList Bacillus subtilis genome database at Institut Pasteur http://genolist.pasteur.fr/SubtiList/

5.3.2.3. Other bacteria

BioCyc Pathway/genome databases for many bacteria http://biocyc.org/

CampyDB Database for Campylobacter genome analysis http://campy.bham.ac.uk/

ClostriDB Finished and unfinished genomes of Clostridium spp. http://clostri.bham.ac.uk/


CyanoBase Cyanobacterial genomes http://www.kazusa.or.jp/cyano

LeptoList Leptospira interrogans genome http://bioinfo.hku.hk/LeptoList

MolliGen Genomic data on mollicutes http://cbi.labri.fr/outils/molligen/

RsGDB Rhodobacter sphaeroides genome http://www-mmg.med.uth.tmc.edu/sphaeroides

5.3.3. Unicellular eukaryotes

5.3.3.1. Yeast

SGD Saccharomyces genome database http://www.yeastgenome.org/

CYGD MIPS Comprehensive yeast genome database http://mips.gsf.de/proj/yeast

Génolevures A comparison of S.cerevisiae and 14 other yeast species http://cbi.labri.fr/Genolevures

MitoPD Yeast mitochondrial protein database http://bmerc-www.bu.edu/mito

5.3.3.2. Other unicellular eukaryotes

ApiEST-DB EST sequences from various Apicomplexan parasites http://www.cbil.upenn.edu/paradbs-servlet

CryptoDB Cryptosporidium parvum genome database http://cryptodb.org/

DictyBase Genome information, literature and experimental resources for Dictyostelium discoideum http://dictybase.org/

Full-Malaria Full-length cDNA library from erythrocytic-stage Plasmodium falciparum http://fullmal.ims.u-tokyo.ac.jp/

GeneDB Curated database for Trypanosoma brucei, Leishmania major, S.pombe and other Sanger-sequenced genomes http://www.genedb.org/

PlasmoDB Plasmodium genome database http://plasmodb.org/

TcruziDB Trypanosoma cruzi genome database http://tcruzidb.org/

ToxoDB Toxoplasma gondii genome database http://toxodb.org/

5.3.4. Plants

5.3.4.1. General plant databases

CropNet Genome mapping in crop plants http://ukcrop.net/

FLAGdb++ Integrative database about plant genomes http://genoplante-info.infobiogen.fr/FLAGdb/

GénoPlante-Info Plant genomic data from the Génoplante consortium http://genoplante-info.infobiogen.fr/

GrainGenes Molecular and phenotypic information on wheat, barley, rye, triticale and oats http://wheat.pw.usda.gov or http://www.graingenes.org

Mendel Database of plant EST and STS sequences annotated with gene family information http://www.mendel.ac.uk/

PHYTOPROT Clusters of (predicted) plant proteins http://genoplante-info.infobiogen.fr/phytoprot

PlantGDB Plant genome database: actively-transcribed plant genomic sequences http://www.plantgdb.org/

Sputnik Plant EST clustering and functional annotation http://mips.gsf.de/proj/sputnik

TIGR plant repeat database Classification of repetitive sequences in plant genomes http://www.tigr.org/tdb/e2k1/plant.repeats

TropGENE DB Genetic and genomic information about tropical crops: sugarcane, banana, cocoa http://tropgenedb.cirad.fr/

5.3.4.2. Arabidopsis thaliana

ARAMEMNON Arabidopsis thaliana membrane proteins and transporters http://aramemnon.botanik.uni-koeln.de/

AthaMap Genome-wide map of putative transcription factor binding sites in Arabidopsis thaliana http://www.athamap.de/

CATMA Complete Arabidopsis transcriptome microarray: gene sequence tags http://www.catma.org

FLAGdb/FST Arabidopsis thaliana T-DNA transformants http://genoplante-info.infobiogen.fr/

MAtDB MIPS Arabidopsis thaliana database http://mips.gsf.de/proj/thal/db

SeedGenes Genes essential for Arabidopsis development http://www.seedgenes.org/

TAIR The Arabidopsis information resource http://www.arabidopsis.org/

5.3.4.3. Rice


BGI-RISe Beijing genomics institute rice information system http://rise.genomics.org.cn/

INE Integrated rice genome explorer http://rgp.dna.affrc.go.jp/giot/INE.html

IRIS International rice information system: all rice data http://www.iris.irri.org/

MOsDB MIPS Oryza sativa database http://mips.gsf.de/proj/rice

Oryzabase Rice genetics and genomics http://www.shigen.nig.ac.jp/rice/oryzabase/

RiceGAAS Rice genome automated annotation system http://ricegaas.dna.affrc.go.jp/

Rice PIPELINE Unification tool for rice databases http://cdna01.dna.affrc.go.jp/PIPE

RPD Rice proteome database http://gene64.dna.affrc.go.jp/RPD/

5.3.4.4. Other plants


MaizeGDB

Maize genetics and genomics database, a successor to

MaizeDB and ZmDB databases http://www.maizegdb.org/

MGI

 M edica g o genome initiative: ESTs, gene expression and

 proteomic data http://xgi.ncgr.org/mgi

MtDB M edica g o truncul at a genome http://www.medicago.org/MtDB

SGMD Soybean genomics and microarray database

http://psi081.ba.ars.usda.gov/SGMD/defaul

t.htm

5.3.5. Fungi5.3.5. Fungi

CADR E Central  A s pergillu s data repository http://www.cadre.man.ac.uk/

COGEME Phytopathogenic fungi and oomycete EST database http://cogeme.ex.ac.uk  

MagnaportheD

B M a gna porthe  gri sea integrated physical/genetic map

http://www.fungalgenomics.ncsu.edu/Proje

cts/mgdatabase/int.htm

MNCDB MIPS N eur os por a cr assa database http://mips.gsf.de/proj/neurospora/

Phytophthora

Genome

Consortium

Database ESTs from P hyt o phthor a infe st an s and P.sojae https://xgi.ncgr.org/pgc


5.3.6. Invertebrates

5.3.6.1. Caenorhabditis elegans

C. elegans Project Genome sequencing data at the Sanger Institute http://www.sanger.ac.uk/Projects/C_elegans

Intronerator Introns and alternative splicing in C. elegans and C. briggsae http://www.cse.ucsc.edu/~kent/intronerator/

RNAiDB RNAi phenotypic analysis of C. elegans genes http://www.rnai.org/

WILMA C. elegans annotation database http://www.came.sbg.ac.at/wilma/

WorfDB C. elegans ORFeome http://worfdb.dfci.harvard.edu/

WormBase Data repository for C. elegans and C. briggsae: curated genome annotation, genetic and physical maps, pathways http://www.wormbase.org/

5.3.6.2. Drosophila melanogaster


FlyBase Drosophila sequences and genomic information http://flybase.bio.indiana.edu/

GadFly Genome annotation database of Drosophila http://www.fruitfly.org

FlyBrain Database of the Drosophila nervous system http://flybrain.neurobio.arizona.edu

FlyTrap Drosophila transgenic lines created using an intron protein trap strategy http://flytrap.med.yale.edu/

InterActive Fly Drosophila genes and their roles in development http://sdb.bio.purdue.edu/fly/aimain/1aahome.htm

Drosophila microarray centre Data and tools for Drosophila gene expression studies http://www.flyarrays.com/fruitfly

5.3.6.3. Other invertebrates

AppaDB A database on the nematode Pristionchus pacificus http://appadb.eb.tuebingen.mpg.de

CnidBase Cnidarian evolution and gene expression database http://cnidbase.bu.edu/

Nematode.net Parasitic nematode sequencing project http://nematode.net/

NEMBASE Nematode sequence and functional data database http://www.nematodes.org


Database Categories List

Nucleotide Sequence Databases

RNA sequence databases

Protein sequence databases

Structure Databases

Genomics Databases (non-vertebrate)

Metabolic and Signaling Pathways

Human and other Vertebrate Genomes

Human Genes and Diseases

Microarray Data and other Gene Expression Databases

Proteomics Resources

Other Molecular Biology Databases

Organelle databases

Plant databases

Immunological databases

Metabolic and Signaling Pathways


The Metabolic and Signaling Pathways category is a collection of pathway and signaling databases. Most databases in this collection describe the genome and metabolic pathways of a single organism, with some exceptions. The categories in this collection are:

1) Enzymes and enzyme nomenclature

2) Metabolic pathways

3) Intermolecular interactions and signaling pathways

6. Metabolic Enzymes and Pathways; Signaling Pathways


6.1. Enzymes and Enzyme Nomenclature

ENZYME Enzyme nomenclature and properties http://www.expasy.org/enzyme

BRENDA Enzyme names and properties: sequence, structure, specificity, stability, reaction parameters, isolation data http://www.brenda.uni-koeln.de

IntEnz Integrated enzyme database and enzyme nomenclature http://www.ebi.ac.uk/intenz

Enzyme Nomenclature IUBMB Nomenclature Committee recommendations http://www.chem.qmw.ac.uk/iubmb/enzyme
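The enzyme nomenclature resources listed above are organized around EC (Enzyme Commission) numbers, four-level identifiers such as EC 3.4.21.4. The following is only a minimal sketch of how such an identifier can be split into its levels; the function name `parse_ec_number` is hypothetical and is not part of any of the listed databases.

```python
# Top-level enzyme classes of the EC numbering scheme.
# Class 7 was added to the scheme in 2018, after this material was prepared.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec_number(ec):
    """Split an EC number such as 'EC 3.4.21.4' into (class name, levels).

    Hypothetical helper for illustration only.
    """
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels: {ec!r}")
    # A '-' marks a level that is left unspecified, as in '3.4.-.-'.
    levels = [None if part == "-" else int(part) for part in parts]
    if levels[0] not in EC_CLASSES:
        raise ValueError(f"unknown enzyme class in {ec!r}")
    return EC_CLASSES[levels[0]], levels


class_name, levels = parse_ec_number("EC 3.4.21.4")
print(class_name, levels)
```

The first level selects the enzyme class, and each further level narrows the reaction type, which is why partial identifiers with trailing dashes are accepted by the sketch.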

6.2. Metabolic Pathways

KEGG Kyoto encyclopedia of genes and genomes: metabolic and regulatory pathways encoded in complete genomes http://www.genome.ad.jp/kegg

MetaCyc Metabolic pathways and enzymes from various organisms http://metacyc.org

PathDB Biochemical pathways, compounds and metabolism http://www.ncgr.org/pathdb

UM-BBD University of Minnesota biocatalysis and biodegradation database: microbial catabolism and biotransformations http://umbbd.ahc.umn.edu/

WIT2 Integrated system for functional curation and development of metabolic models http://wit.mcs.anl.gov/WIT2/


6.3. Intermolecular Interactions and Signaling Pathways

aMAZE A system for the annotation, management and analysis of biochemical and signaling pathway networks http://www.amaze.ulb.ac.be/

BIND Biomolecular interaction network database http://www.bind.ca

BioCarta Online maps of metabolic and signaling pathways http://www.biocarta.com/genes/allPathways.asp

BRITE Biomolecular relations in information transmission and expression, part of the KEGG system http://www.genome.ad.jp/brite

DIP Database of interacting proteins: experimentally determined protein-protein interactions http://dip.doe-mbi.ucla.edu

DRC Database of ribosomal crosslinks http://www.mpimg-berlin-dahlem.mpg.de/~ag_ribo/ag_brimacombe/drc

GeneNet Database on gene network components http://wwwmgs.bionet.nsc.ru/mgs/gnw/genenet


IntAct project Protein-protein interaction data http://www.ebi.ac.uk/intact

InterDom Putative protein domain interactions http://interdom.lit.org.sg

JenPep Functional and quantitative thermodynamic data on peptide binding to immunological biomacromolecules http://www.jenner.ac.uk/Jenpep2

MPID MHC-peptide interaction database http://surya.bic.nus.edu.sg/mpid

ROSPath Reactive oxygen species (ROS) signaling pathway http://rospath.ewha.ac.kr

STCDB Signal transductions classification database http://www.techfak.uni-bielefeld.de/~mchen/STCDB

STRING Predicted functional associations between proteins www.bork.emblheidelberg.de/STRING

TRANSPATH Gene regulatory networks and microarray analysis http://www.biobase.de/pages/products/databases.html


Human and other Vertebrate Genomes


The Human and other Vertebrate Genomes category collects databases covering the human genome as well as other vertebrate genomes.

1) Model organisms, comparative genomics

2) Human genome databases, maps and viewers

3) Human ORFs


7. Human and other Vertebrate Genomes

7.1. Mitochondrial Genes and Proteins

AMmtDB Metazoan mitochondrial genes http://bighost.area.ba.cnr.it/mitochondriome

GOBASE Organelle genome database http://megasun.bch.umontreal.ca/gobase/gobase.html

MitoDat Mitochondrial proteins (predominantly human) http://www-lecb.ncifcrf.gov/mitoDat/

MitoMap Human mitochondrial genome http://www.mitomap.org/

MitoNuc Nuclear genes coding for mitochondrial proteins http://bio-www.ba.cnr.it:8000/BioWWW/#MitoNuc

MITOP2 Mitochondrial proteins, genes and diseases http://ihg.gsf.de/mitop2/

MitoProteome Mitochondrial protein sequences encoded by mitochondrial and nuclear genes http://www.mitoproteome.org

OGRe Complete mitochondrial genome sequences for 200 metazoan species http://www.bioinf.man.ac.uk/ogre

7.2. Model organisms, comparative genomics


ACeDB C. elegans, S. pombe, and human sequences and genomic information http://www.acedb.org/

AllGenes Human and mouse gene, transcript and protein annotation http://www.allgenes.org/

ArkDB Genome databases for farm and other animals http://www.thearkdb.org/

Cre Transgenic Database Cre transgenic mouse lines with links to publications http://www.mshri.on.ca/nagy/

DRESH Human cDNA clones homologous to Drosophila mutant genes http://www.tigem.it/LOCAL/drosophila/dros.html

Ensembl Annotated information on eukaryotic genomes http://www.ensembl.org/

FANTOM Functional annotation of mouse full-length cDNA clones http://fantom2.gsc.riken.go.jp

FREP Functional repeats in mouse cDNAs http://facts.gsc.riken.go.jp/FREP/

IPD-MHC Database Non-human major histocompatibility complex sequences http://www.ebi.ac.uk/ipd/mhc

GenetPig Genes controlling economic traits in pig http://www.infobiogen.fr/services/Genetpig

KOG Eukaryotic orthologous groups of proteins http://www.ncbi.nlm.nih.gov/COG/new/shokog.cgi


LocusLink Curated sequences and descriptions of genetic loci http://www.ncbi.nlm.nih.gov/LocusLink

Mouse Genome Database Mouse genome database http://www.informatics.jax.org/

Mouse SAGE SAGE libraries from various mouse tissues and cell lines http://mouse.biomed.cas.cz/sage

Mouse Targeted Mutations Information on transgenic animals and targeted mutations http://tbase.jax.org/

MTID Mouse transposon insertion database http://mouse.ccgb.umn.edu/transposon/

PEDE Pig EST data explorer: full-length cDNA libraries and ESTs http://pede.gene.staff.or.jp/

Rat Genome Database Rat genetic and genomic data http://rgd.mcw.edu/

TIGR Gene Indices Organism-specific databases of EST and gene sequences http://www.tigr.org/tdb/tgi.shtml

UniGene Unified clusters of ESTs and full-length mRNA sequences http://www.ncbi.nlm.nih.gov/UniGene/

UniSTS Unified non-redundant view of sequence tagged sites with marker and mapping data from a variety of resources http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unists

ZFIN Genetic, genomic and developmental data from zebrafish http://zfin.org/

7.3. Human genome databases, maps and viewers


Ensembl Annotated information on eukaryotic genomes http://www.ensembl.org/

AluGene Complete Alu map in the human genome http://alugene.tau.ac.il/

CroW 21 Human chromosome 21 database http://bioinfo.weizmann.ac.il/crow21/

G3-RH Stanford G3 and TNG radiation hybrid maps http://www-shgc.stanford.edu/RH/

GB4-RH Genebridge4 human radiation hybrid maps http://www.sanger.ac.uk/Software/RHserver/RHserver.shtml

GDB Human genes and genomic maps http://www.gdb.org/

GenAtlas Human genes, markers and phenotypes http://www.citi2.fr/GENATLAS/

GeneCards Integrated database of human genes, maps, proteins and diseases http://bioinfo.weizmann.ac.il/cards/

GeneLoc Gene location database (formerly UDB: Unified database for human genome mapping) http://genecards.weizmann.ac.il/geneloc/

GeneNest Gene indices of human, mouse, zebrafish, etc. http://genenest.molgen.mpg.de/

GenMapDB Mapped human BAC clones http://genomics.med.upenn.edu/genmapdb


Gene Resource Locator Alignment of ESTs with finished human sequence http://grl.gi.k.u-tokyo.ac.jp/

HOWDY Human organized whole genome database http://www-alis.tokyo.jst.go.jp/HOWDY/

HuGeMap Human genome genetic and physical map data http://www.infobiogen.fr/services/Hugemap

Human BAC Ends Database Non-redundant human BAC end sequences http://www.tigr.org/tdb/humgen/bac_end_search/bac_end_intro.html

IXDB Physical maps of human chromosome X http://ixdb.mpimg-berlin-dahlem.mpg.de/

NCBI RefSeq Non-redundant DNA and protein sequence collection http://www.ncbi.nlm.nih.gov/RefSeq/

UCSC Genome Browser Genome assemblies and annotation http://genome.ucsc.edu/

ParaDB Paralogy mapping in human genomes http://abi.marseille.inserm.fr/paradb/

RHdb Radiation hybrid map data http://www.ebi.ac.uk/RHdb

STACK Sequence tag alignment and consensus knowledgebase http://www.sanbi.ac.za/Dbases.html


Human Genes and Diseases

Human Genes and Diseases is a category of databases that hold information on disease-causing genes, including databases of cancer genes, human ORFs, etc.

1) Human ORFs

2) General human genetics databases

3) General polymorphism databases

4) Cancer gene databases

5) Gene-, system- or disease-specific databases

7.4. Human proteins


HPMR Human plasma membrane receptome: protein sequences, literature, and expression database http://receptome.stanford.edu/

HPRD Human protein reference database: domain architecture, post-translational modifications, and disease association http://www.hprd.org

HUNT Human novel transcripts: annotated full-length cDNAs http://www.hri.co.jp/HUNT

HUGE Human unidentified gene-encoded large (>50 kDa) protein and cDNA sequences http://www.kazusa.or.jp/huge

LIFEdb Localization, interaction and functional assays of human proteins http://www.dkfz.de/LIFEdb

trome, trEST and trGEN Databases of predicted human protein sequences ftp://ftp.isrec.isb-sib.ch/pub/databases/

8. Human Genes and Diseases

8.1. General Databases

Genetics Home Reference A general guide on human hereditary diseases http://ghr.nlm.nih.gov/

Homophila Drosophila homologs of human disease genes http://homophila.sdsc.edu/


IMGT International immunogenetics information system: immunoglobulins, T cell receptors, MHC and RPI http://imgt.cines.fr/

Mutation Spectra Database Mutations in viral, bacterial, yeast and mammalian genes http://info.med.yale.edu/mutbase/

OMIA Online Mendelian inheritance in animals: a catalog of animal genetic and genomic disorders http://www.angis.org.au/omia

OMIM Online Mendelian inheritance in man: a catalog of human genetic and genomic disorders http://www.ncbi.nlm.nih.gov/Omim/

ORFDB Collection of ORFs that are sold by Invitrogen http://orf.invitrogen.com/

PathBase European mutant mice pathology database: histopathology photomicrographs and macroscopic images http://www.pathbase.net/

PMD Compilation of protein mutant data http://pmd.ddbj.nig.ac.jp/

8.2. Human Mutations Databases


8.2.1. General polymorphism database

ALFRED Allele frequencies and DNA polymorphisms http://alfred.med.yale.edu/

BayGenomics Genes relevant to cardiovascular and pulmonary disease http://baygenomics.ucsf.edu/

dbSNP Database of single nucleotide polymorphisms www.ncbi.nlm.nih.gov/SNP/

FIMM Functional molecular immunology data http://sdmc.krdl.org.sg:8080/fimm/

HGVS Databases A compilation of human mutation databases http://www.hgvs.org/

HGVbase Human genome variation database: curated human polymorphisms http://hgvbase.cgb.ki.se/

HGMD Human gene mutation database http://www.hgmd.org/

IPD Immuno polymorphism database: data on human killer-cell Ig-like receptors and human platelet antigens http://www.ebi.ac.uk/ipd

JSNP Japanese SNP database http://snp.ims.u-tokyo.ac.jp/

rSNP Guide SNPs in regulatory gene regions http://util.bionet.nsc.ru/databases/rsnp.html

SNP Consortium database SNP Consortium data http://snp.cshl.org/

TopoSNP Topographic database of non-synonymous SNPs http://gila.bioengr.uic.edu/snp/toposnp

8.2.2. Cancer


Atlas of Genetics and Cytogenetics in Oncology and Haematology Cancer related genes, chromosomal abnormalities in oncology and haematology, and cancer-prone diseases http://www.infobiogen.fr/services/chromcancer/

CGED Cancer gene expression database http://love2.aist-nara.ac.jp/CGED

Database of Germline p53 Mutations Mutations in human tumor and cell line p53 gene http://www.lf2.cuni.cz/win/projects/germline_mut_p53.htm

IARC TP53 Database Human TP53 somatic and germline mutations http://www.iarc.fr/p53/

MTB Mouse tumor biology database: mouse tumor types, genes, classification, incidence, pathology http://tumor.informatics.jax.org/

Oral Cancer Gene Database Cellular and molecular data for genes involved in oral cancer http://www.tumor-gene.org/Oral/oral.html

RB1 Gene Mutation Database Mutations in the human retinoblastoma (RB1) gene http://www.d-lohmann.de/Rb/

RTCGD Mouse retroviral tagged cancer gene database http://rtcgd.ncifcrf.gov/

SNP500Cancer Re-sequenced SNPs from 102 reference samples http://snp500cancer.nci.nih.gov

SV40 Large T-Antigen Mutant Database Mutations in SV40 large tumor antigen gene http://bigdaddy.bio.pitt.edu/SV40/

Tumor Gene Family Databases Cellular, molecular and biological data about genes involved in various cancers http://www.tumor-gene.org/tgdf.html

8.2.3. Gene, system or disease-specific

ALPSbase Autoimmune lymphoproliferative syndrome database http://research.nhgri.nih.gov/alps/

Androgen Receptor Gene Mutations Database Mutations in the androgen receptor gene http://www.mcgill.ca/androgendb/

BTKbase Mutation registry for X-linked agammaglobulinemia http://bioinf.uta.fi/BTKbase/

CASRDB Calcium-sensing receptor database: CASR mutations causing hypercalcemia and/or hyperparathyroidism http://www.casrdb.mcgill.ca/

Cytokine Gene Polymorphism in Human Disease Cytokine gene polymorphism literature database http://bris.ac.uk/pathandmicro/services/GAI/cytokine4.htm

Collagen Mutation Database Human type I and type III collagen gene mutations http://www.le.ac.uk/genetics/collagen/

ERGDB Estrogen responsive genes database http://sdmc.lit.org.sg/ergdb/cgi-bin/explore.pl

FUNPEP Low-complexity peptides capable of forming amyloid plaque http://www.cmbi.kun.nl/swift/FUNPEP/gergo/

GOLD.db Genomics of lipid-associated disorders database http://gold.tugraz.at

tGRAP Mutants of G-protein coupled receptors of family A http://tinygrap.uit.no/GRAP/


HaemB Factor IX gene mutations, insertions and deletions http://www.kcl.ac.uk/ip/petergreen/haemBdatabase.html

HbVar Human hemoglobin variants and thalassemias http://globin.cse.psu.edu/globin/hbvar

Human p53/hprt, rodent lacI/lacZ databases Mutations at the human p53 and hprt genes; rodent transgenic lacI and lacZ mutations http://www.ibiblio.org/dnam/mainpage.html

Human PAX2 Allelic Variant Database Mutations in human PAX2 gene http://pax2.hgu.mrc.ac.uk/

Human PAX6 Allelic Variant Database Mutations in human PAX6 gene http://pax6.hgu.mrc.ac.uk/

IL2Rgbase X-linked severe combined immunodeficiency mutations http://research.nhgri.nih.gov/scid/

IMGT/Gene-DB Vertebrate immunoglobulin and T cell receptor genes http://imgt.cines.fr/cgi-bin/GENElect.jv

IMGT/HLA Polymorphism of human MHC and related genes http://www.ebi.ac.uk/imgt/hla/


INFEVERS Hereditary inflammatory disorder and familial Mediterranean fever mutation data http://fmf.igh.cnrs.fr/infevers

KinMutBase Disease-causing protein kinase mutations http://www.uta.fi/imt/bioinfo/KinMutBase/

Lowe Syndrome Mutation Database Phosphatidylinositol-4,5-bisphosphate 5-phosphatase mutations causing Lowe oculocerebrorenal syndrome http://research.nhgri.nih.gov/lowe/

NCL Mutation Database Polymorphisms in neuronal ceroid lipofuscinoses genes http://www.ucl.ac.uk/ncl/

PAHdb Mutations at the phenylalanine hydroxylase locus http://www.pahdb.mcgill.ca/

PGDB Prostate and prostatic diseases gene database http://www.ucsf.edu/PGDB

PHEXdb PHEX mutations causing X-linked hypophosphatemia http://www.phexdb.mcgill.ca/

PTCH1 Mutation Database Mutations and SNPs found in PTCH1 gene http://www.cybergene.se/PTCH/ptchbase.html


Microarray Data and other Gene Expression Databases


Microarrays are producing massive amounts of data. These data, like genome sequence data, can be used to gain insights into underlying biological processes only if they are carefully recorded and stored in databases, where they can be queried, compared and analysed by different computer software programs.

A gene expression database can be regarded as consisting of three parts: the gene expression data matrix, gene annotation, and sample annotation.

Hence the Microarray Data and other Gene Expression Databases category consists of repositories of microarray data and gene expression data.
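The three-part view of a gene expression database can be sketched as a small data structure. This is an illustration under stated assumptions, not the schema of any listed repository: the class name `ExpressionDataset`, its field names, and the gene and sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExpressionDataset:
    """Hypothetical container mirroring the three parts named in the text."""
    matrix: list             # expression data matrix: rows = genes, columns = samples
    gene_annotation: list    # one dict per matrix row
    sample_annotation: list  # one dict per matrix column

    def __post_init__(self):
        # The annotations must line up with the matrix dimensions.
        if len(self.matrix) != len(self.gene_annotation):
            raise ValueError("one gene annotation is needed per matrix row")
        for row in self.matrix:
            if len(row) != len(self.sample_annotation):
                raise ValueError("one sample annotation is needed per matrix column")

    def values_for(self, gene_id, **sample_filter):
        """Return the values of one gene in samples matching the filter."""
        for row_index, gene in enumerate(self.gene_annotation):
            if gene["id"] == gene_id:
                break
        else:
            raise KeyError(gene_id)
        columns = [
            j for j, sample in enumerate(self.sample_annotation)
            if all(sample.get(key) == value for key, value in sample_filter.items())
        ]
        return [self.matrix[row_index][j] for j in columns]


# Made-up example data: two genes measured in three samples.
dataset = ExpressionDataset(
    matrix=[[2.0, 8.0, 4.0], [5.0, 5.0, 1.0]],
    gene_annotation=[
        {"id": "geneA", "description": "hypothetical gene A"},
        {"id": "geneB", "description": "hypothetical gene B"},
    ],
    sample_annotation=[
        {"name": "s1", "tissue": "liver"},
        {"name": "s2", "tissue": "brain"},
        {"name": "s3", "tissue": "liver"},
    ],
)
liver_values = dataset.values_for("geneA", tissue="liver")
print(liver_values)
```

Keeping the annotations separate from the matrix is what allows queries such as "this gene in liver samples": the sample annotation selects columns and the gene annotation selects rows.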

9. Microarray Data and other Gene Expression Databases


ArrayExpress Public collection of microarray gene expression data http://www.ebi.ac.uk/arrayexpress

Axeldb Gene expression in Xenopus laevis http://www.dkfz-heidelberg.de/abt0135/axeldb.htm

BodyMap Human and mouse gene expression data http://bodymap.ims.u-tokyo.ac.jp/

BGED Brain gene expression database http://love2.aist-nara.ac.jp/BGED

CleanEx Expression reference database, linking heterogeneous expression data to facilitate cross-dataset comparisons http://www.cleanex.isb-sib.ch/

EICO DB Expression-based imprint candidate organiser: a database for discovery of novel imprinted genes http://fantom2.gsc.riken.jp/EICODB/

emap Atlas Edinburgh mouse atlas: a digital atlas of mouse embryo development and spatially-mapped gene expression http://genex.hgu.mrc.ac.uk/

EPConDB Endocrine pancreas consortium database http://www.cbil.upenn.edu/EPConDB

EpoDB Genes expressed during human erythropoiesis http://www.cbil.upenn.edu/EpoDB/


FlyView Drosophila development and genetics http://pbio07.uni-muenster.de/

GeneAnnot Revised and improved annotation of Affymetrix human gene probe sets http://genecards.weizmann.ac.il/geneannot/

GeneNote Human genes expression profiles in healthy tissues http://genecards.weizmann.ac.il/genenote/

GenePaint Gene expression patterns in the mouse http://www.genepaint.org/Frameset.html

GeneTrap Expression patterns in an embryonic stem library of gene trap insertions http://www.cmhd.ca/sub/genetrap.asp

GermOnline Expression data relevant for the mitotic and meiotic cell cycle and gametogenesis in yeast and higher eukaryotes http://www.germonline.org/

GXD Mouse gene expression database http://www.informatics.jax.org/menus/expression_menu.shtml

HemBase Genes transcribed in differentiating human erythroid cells http://hembase.niddk.nih.gov/

HugeIndex Expression levels of human genes in normal tissues http://hugeindex.org/

Interferon Stimulated Gene Database Genes induced by treatment with interferons http://www.lerner.ccf.org/labs/williams/xchip-html.cgi

Kidney Development Database Kidney development and gene expression http://golgi.ana.ed.ac.uk/kidhome.html

MAGEST Ascidian (Halocynthia roretzi) gene expression patterns http://www.genome.ad.jp/magest


MEPD Medaka (freshwater fish Oryzias latipes) gene expression pattern database http://medaka.dsp.jst.go.jp/MEPD

MethDB DNA methylation data, patterns and profiles http://www.methdb.de/

NASCarrays Nottingham Arabidopsis Stock Centre microarray database http://affymetrix.arabidopsis.info

NetAffx Public Affymetrix probesets and annotations http://www.affymetrix.com/

PEDB Prostate expression database: ESTs from prostate tissue and cell type-specific cDNA libraries http://www.pedb.org/

PEPR Public expression profiling resource: expression profiles in a variety of diseases and conditions http://microarray.cnmcresearch.org/pgadatatable.asp

RECODE Genes using programmed translational recoding in their expression http://recode.genetics.utah.edu/

RefExA Reference database for human gene expression analysis http://www.lsbm.org/db/index_e.html

Stanford Microarray Database Raw and normalized data from microarray experiments http://genome-www.stanford.edu/microarray

Tooth Development Database Gene expression in dental tissue http://bite-it.helsinki.fi/


Proteomics Resources


The Proteomics Resources category has databases containing proteomics information from various genomes/proteomes.

Applications of Proteomics

• Characterization of Protein Complexes

• Protein Expression Profiling

• Proteome Mining

• Protein Arrays


What is Proteomics?

Defined as "the analysis of the entire protein complement in a given cell, tissue, or organism."

Proteomics "also assesses activities, modifications, localization, and interactions of proteins in complexes."

Technology of Proteomics


Separation and Isolation of Proteins

1D and 2D PAGE

Edman Sequencing

Mass Spectrometry

Database utilization

Types of Proteomics

Protein Expression

Quantitative study of protein expression between samples that differ by some variable

Structural Proteomics

Goal is to map out the 3-D structure of proteins and protein complexes

Functional Proteomics

10. Proteomics Resources

GelBank: 2D gel electrophoresis patterns of proteins from complete microbial genomes. http://gelbank.anl.gov/

PEP: Predictions for entire proteomes (summarized analyses of protein sequences). http://cubic.bioc.columbia.edu/pep/

Proteome Analysis Database: Functional classification of proteins in whole genomes. http://www.ebi.ac.uk/proteome/

RESID: Pre-, co- and post-translational protein modifications. http://www-nbrf.georgetown.edu/pirwww/dbinfo/resid.html

SWISS-2DPAGE: Annotated 2D gel electrophoresis database. http://www.expasy.org/ch2d/

Other Molecular Biology Databases

This category holds the remaining types of databases. It can be further subdivided into the following divisions:

1) BioImage

2) MetaRouter

3) PubMed

4) Drugs and drug design

5) Molecular probes and primers

11. Other Molecular Biology Databases

11.1. Drugs and drug design

ANTIMIC: Database of natural antimicrobial peptides. http://research.i2r.a-star.edu.sg/Templar/DB/ANTIMIC/

APD: Antimicrobial peptide database. http://aps.unmc.edu/AP/main.php

BSD: Biodegradative strain database (microorganisms that can degrade aromatic and other organic compounds). http://bsd.cme.msu.edu/

DART: Drug adverse reaction target database. http://xin.cz3.nus.edu.sg/group/drt/dart.asp

Peptaibol: Peptaibol (antibiotic peptide) sequences. http://www.cryst.bbk.ac.uk/peptaibol/welcome.html

Pharmacogenomics and Pharmacogenetics Knowledge Base: Variation in drug response based on human variation. http://www.pharmgkb.org/

TTD: Therapeutic target database. http://xin.cz3.nus.edu.sg/group/cjttd/ttd.asp

11.2. Probes

IMGT/PRIMER-DB: Immunogenetics oligonucleotide primer database. http://imgt3d.igh.cnrs.fr/PrimerDB/Query_PrDB.pl

MPDB: Information on synthetic oligonucleotides proven useful as primers or probes. http://www.biotech.ist.unige.it/interlab/mpdb.html

probeBase: rRNA-targeted oligonucleotide probe sequences, DNA microarray layouts and associated information. http://www.microbialecology.net/probebase

RTPrimerDB: Real-time PCR primer and probe sequences. http://medgen31.ugent.be/primerdatabase/index.php

VirOligo: Virus-specific oligonucleotides for PCR and hybridization. http://viroligo.okstate.edu/

11.3. Unclassified databases

PubMed: Citations and abstracts of biomedical literature. http://pubmed.gov/

BioImage: Database of multidimensional biological images. http://www.bioimage.org/

Bioinformatics Tools

BLAST (Basic Local Alignment Search Tool)

BLAST is the algorithm used by a family of five programs that align your query sequence against sequences in a molecular database.

Statistical methods are applied to judge the significance of matches.

Alignments (i.e. database sequences that are similar to your query sequence) are reported in order of significance, as estimated by the applied statistics.
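
One common way to express this significance is the expect value (E-value) from Karlin-Altschul statistics, E = K * m * n * exp(-lambda * S), where S is the raw alignment score, m the query length and n the database length. The sketch below only illustrates ranking hits by that estimate. The function names, the hit list and the default lambda and K values are assumptions made for this example and are not taken from the slides; real values of lambda and K depend on the scoring system in use.

```python
import math


def expect_value(raw_score, query_len, db_len, lam=0.267, k=0.041):
    """Karlin-Altschul expect value E = K * m * n * exp(-lambda * S).

    lam and k are illustrative placeholders (an assumption of this sketch);
    in practice they are determined by the scoring matrix and gap penalties.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def rank_hits(hits, query_len, db_len):
    """Return (name, score, e_value) tuples, most significant hit first."""
    scored = [
        (name, score, expect_value(score, query_len, db_len))
        for name, score in hits
    ]
    # A smaller E-value means the match is less likely to occur by chance.
    return sorted(scored, key=lambda hit: hit[2])


if __name__ == "__main__":
    # Hypothetical hits: (sequence name, raw alignment score).
    example_hits = [("seqA", 40), ("seqB", 120), ("seqC", 75)]
    for name, score, e_value in rank_hits(example_hits, 300, 1_000_000):
        print(f"{name}\tscore={score}\tE={e_value:.3g}")
```

A higher raw score always gives a smaller E-value for a fixed query and database size, so sorting by E-value places the strongest matches at the top of the report.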

BLASTN

Compares a nucleotide query sequence against a nucleotide sequence database.

BLASTP

Compares an amino acid query sequence against a protein sequence database.

BLASTX

Compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.

TBLASTN

Compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands).

TBLASTX

Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.

CLUSTALX

Clustal X (Thompson et al. 1997) is a version of Clustal W with a graphical user interface. This programme is used for multiple sequence alignment.

Multiple Alignment

Phylogenetic Analysis

Nucleic acid and protein sequences are used to infer phylogenetic relationships.

Molecular phylogeny methods allow phylogenetic trees to be proposed from a given set of aligned sequences.

Phylogenetic trees aim to reconstruct the history of successive divergence that took place during evolution between the sequences considered and their common ancestor.
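
As a rough illustration of how a tree can be proposed from aligned sequences, the sketch below computes pairwise p-distances (the proportion of differing sites) and clusters the sequences with UPGMA. UPGMA is only one simple distance-based method, chosen here for brevity; the slides do not name a specific method, and the function names and example sequences are assumptions made for this example.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned, gap-free sites at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, seqs):
    """Cluster aligned sequences with UPGMA and return a nested-tuple tree."""
    # Each cluster id maps to (subtree, number of leaves in the subtree).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): p_distance(seqs[i], seqs[j])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        i, j = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: dist[frozenset(pair)],
        )
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for k in clusters:
            d_ik = dist[frozenset((i, k))]
            d_jk = dist[frozenset((j, k))]
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


if __name__ == "__main__":
    # Hypothetical aligned sequences of equal length.
    names = ["A", "B", "C", "D"]
    seqs = ["ACGTACGT", "ACGTACGA", "TCGAACCT", "TCGAACCA"]
    print(upgma(names, seqs))
```

Programs such as those listed on the next slide offer more refined methods; this sketch is meant only to show the step from an alignment to a distance matrix to a tree.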

Phylogenetic programmes

PHYLIP

PAUP

MEGA

Treeview

ODEN

PHYLOWIN

TREECON

DENDRON

Gene Identification

AAT: Analysis and Annotation Tool

FGENESH: Splice sites, protein coding exons & gene models

Genie: Gene finder based on hidden Markov models

GenScan: Identification of gene structures in genomic DNA

Grail: DNA sequence analysis tool

ORF Finder: Search for open reading frames, at NCBI
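
To make the idea of an open reading frame search concrete, here is a minimal sketch that scans the three forward reading frames for stretches running from an ATG start codon to the first in-frame stop codon. It is an illustration only: the function name and the `min_codons` parameter are hypothetical, only the forward strand is scanned, and the sketch is not meant to mirror how the NCBI tool works.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Find ATG...stop open reading frames on the forward strand.

    Returns a list of (frame, start, end, sequence) tuples, where frame is
    1, 2 or 3, start and end are 0-based slice positions, and the sequence
    includes the stop codon. min_codons counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame + 1, start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATGGTAGCC"):
        print(orf)
```

Running the same function on the reverse complement of the sequence would cover the remaining three frames.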

Protein Structure Prediction

3D-PSSM: Protein Fold Recognition

Multicoil: Predict coiled coil structures

NNPredict: Protein secondary structure prediction

PredictProtein: Sequence analysis and structure prediction

SAPS: Statistical analysis of protein sequences

Protein 3D Structure / Modelling

FUGUE: Sequence-structure homology recognition

PDB Viewer: Protein structure database

Proinformatix: Modeling oligopeptides for energetically minimized structures

SWISS-MODEL: An automated knowledge-based protein modelling server
