Martin John Bishop, UK HGMP Resource Centre, Hinxton, Cambridge CB10 1SB, mbishop@hgmp.mrc.ac.uk


Martin John Bishop

UK HGMP Resource Centre
Hinxton
Cambridge CB10 1SB
mbishop@hgmp.mrc.ac.uk
http://www.hgmp.mrc.ac.uk

Bioinformatics scope

Genome sequences - DNA
Transcripts - RNA
Proteins
Protein interactions
Macromolecular assemblies
Development and cellular function
Genetic linkage analysis

Molecular biology needs bioinformatics

Biological data (molecules)    Computer analysis (methods)
Sequences                      Comparison
Structures                     Modelling
Gene expression                Co-regulation
Proteomes                      Mass spectrometry
Pathways                       Knowledge bases
Evolution                      Phylogenetics

Molecular biology is about information

Central dogma: DNA <-> RNA -> protein -> phenotype <- DNA

Molecules Processes

Central paradigm: Genome repository <-> RNA world -> Protein sequence -> Protein structure -> Protein function -> Phenotype <- Fed back to genome

Information processing
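The information flow in the central dogma above can be illustrated with a minimal Python sketch. It assumes the input is the coding strand of the DNA and uses only a small, illustrative subset of the standard genetic code; the function names are hypothetical and not part of any HGMP-RC tool.

```python
# Illustrative subset of the standard genetic code (RNA codon -> amino acid).
# "*" marks a stop codon. A real table has 64 entries.
CODON_TABLE = {
    "AUG": "M",
    "UUU": "F", "UUC": "F",
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(dna: str) -> str:
    """DNA -> RNA: read the coding strand and replace T with U."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """RNA -> protein: read codons in frame until a stop codon.

    Codons missing from the illustrative table are reported as "X".
    """
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE.get(rna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    rna = transcribe("ATGTTTGGCTAA")
    print(rna, translate(rna))
```

Each step is a deterministic mapping from one sequence alphabet to another, which is the sense in which molecular biology can be treated as information processing.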

The activities of HGMP-RC

Bioinformatics Services
MHC, Fugu, Mouse sequencing
Technology development
Research
Biological materials by mail order
Biological services including hotel facilities
Contract R&D
Biology Services
HGMP-RC

On-line service

Mail
Network News
Files/Backup
Services
Data Links
Unrestricted
Public Data
Private Data
Registered users
Information
Analytical tools
On-line service

HGMP-RC SERVICE

Web menu
X (or VNC)
Java
Telnet
Telnet menu / Unix login

GENOME WEB

Up to date
Relevant
Fully searchable
Fully verified
Extensive

INTEGRATED ANALYSIS

BLAST, NIX, PIX, GLUE, PIE, MAGI, PINT

COMMON OPTIONS

EMBOSS, GCG, PINE, CLUSTAL, STADEN, PASSWORD

GENOMICS APPLICATIONS

Linkage Analysis
Radiation Hybrid Mapping
Sequence Ready Clone Maps
Genome Databases
Polymorphisms
Sequence Analysis
Gene Prediction
Expression Profiling
Phylogenetic Analysis
Integrated Tools - GLUE, RHYME, NIX, PIE

PROTEOMICS APPLICATIONS

Protein Sequence Analysis
Protein Structure Analysis
Protein Structural Modelling
Proteome Databases
Tools for Peptide Sequence Determination
Protein Cellular Localisation
Protein Functional Studies
Pathways and Protein Interactions
Integrated tools and databases - PIX

NETWORK / JANET SERVICE

LONDON
Currently: 34 Mbps main link
Future: keep 34 Mbps link for backup

CAMBRIDGE
Currently: 8 Mbps redundant link
Future: Gigabit Ethernet
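To give a feel for what these link speeds mean in practice, the following Python sketch estimates ideal transfer times for a data file over each link. The 1 GB file size is a hypothetical example, Gigabit Ethernet is taken at its nominal 1000 Mbps, and protocol overhead and contention are ignored, so real transfers would be slower.

```python
# Link speeds from the slide, in megabits per second.
# Gigabit Ethernet is taken at its nominal rate of 1000 Mbps.
LINKS_MBPS = {
    "London main link": 34,
    "Cambridge redundant link": 8,
    "Cambridge Gigabit Ethernet (future)": 1000,
}


def transfer_seconds(size_bytes: int, link_mbps: float) -> float:
    """Ideal transfer time: bits to send divided by bits per second."""
    return size_bytes * 8 / (link_mbps * 1_000_000)


if __name__ == "__main__":
    file_size = 10**9  # hypothetical 1 GB database release
    for name, mbps in LINKS_MBPS.items():
        print(f"{name}: {transfer_seconds(file_size, mbps):.0f} s")
```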

SERVERS

More than 80 servers
1, 4 and 8 CPU SMP
Sparc and Intel
Solaris and Linux
Databases doubling every 14 months
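A doubling time of 14 months can be turned into a growth factor for any planning horizon. The sketch below assumes smooth exponential growth, which is a simplification; the function name is illustrative.

```python
DOUBLING_MONTHS = 14  # doubling time quoted on the slide


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Size multiplier after `months`, assuming steady exponential growth."""
    return 2 ** (months / doubling_months)


if __name__ == "__main__":
    for months in (12, 14, 28, 60):
        print(f"after {months} months: x{growth_factor(months):.2f}")
```

Under this assumption, storage needs grow by roughly 80% per year and fourfold in 28 months.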

LOADS

Load is the percentage of processes trying to run
Interactive load: 50%
Job queues load: 100%
Jobs waiting can be 6-10 times the work being processed
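The backlog figure can be read as a rough waiting-time estimate. The sketch below assumes a first-in, first-out queue with constant throughput, and the 2-hour figure for clearing the work currently running is a hypothetical example, not a measurement from the service.

```python
def estimated_wait_hours(backlog_ratio: float, hours_to_clear_running_work: float) -> float:
    """Rough wait for a newly submitted job.

    Assumes FIFO scheduling and constant throughput: the job must wait
    for a backlog that is `backlog_ratio` times the work now running.
    """
    return backlog_ratio * hours_to_clear_running_work


if __name__ == "__main__":
    running_hours = 2.0  # hypothetical time to finish the work now running
    for ratio in (6, 10):
        print(f"backlog x{ratio}: about {estimated_wait_hours(ratio, running_hours):.0f} h wait")
```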

PROCESSES AND QUEUES

Menu service (hot swop)
General analysis (overloaded)
Sun BLAST and NIX queue
Dell BLAST queue
BLAST data file server
Interactive Linkage queue
Heavy Linkage queue

USERS’ REAL WORLD PROBLEMS

Comparative method
Extrapolate from known to similar
Hints to reduce the amount of experimental work that needs to be done
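The idea of extrapolating from known to similar can be sketched as follows: score a query sequence against annotated sequences and take the annotation of the closest one as a hint. This is a deliberately simple percent-identity comparison of equal-length sequences, not BLAST or any tool named in these slides, and the sequences and labels are made up for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def closest_known(query: str, known: dict) -> tuple:
    """Return (label, identity) for the annotated sequence most similar to the query."""
    scored = ((label, percent_identity(query, seq)) for label, seq in known.items())
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical annotated sequences; labels and residues are illustrative only.
    known = {"kinase_like": "MKTAYIAKQR", "transporter_like": "MLSPLLGWAV"}
    print(closest_known("MKTAYLAKQR", known))
```

A high-scoring match is only a hint about function; it narrows down which experiments are worth doing rather than replacing them.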

SOFTWARE SYSTEMS

A variety of technical solutions are used:
BLAST, NCBI Entrez, SRS, GeneCards, NIX, ENSEMBL

HELPING THE USER

Information discovery – completeness
Communication – multiple sites
Ontology – uniformity?
Software integration – ease of use
Reasoning about results
Monitoring – repeat queries

MAJOR CHALLENGES

User interface
Back end processing
Cost recovery

NEW TECHNOLOGIES?

Web services
GRID (EMBnet)
Object-orientated computing
Multi-agent systems

TREASURE

Web service with top level container
Customise for the user
User selects a service and opens it as an application
An alternative view can be built around user data as the fundamental objects

IMPLEMENTATION

EMBREO library written in Java handles web service layer (also CORBA, XML-RPC, JDBC and other connectivity)

Also handles file access and transfer and display of results (including use of VNC)

Simple Object Access Protocol (SOAP)
Browser channel uses XML format
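To show what an XML-encoded remote call looks like on the wire, here is a minimal Python sketch. It uses XML-RPC, which the slide lists among the connectivity options, rather than SOAP, because Python's standard library supports it directly. It is not the EMBREO API (EMBREO is a Java library), and the method name `analysis.runBlast` and its parameters are hypothetical.

```python
import xmlrpc.client

# Hypothetical remote method name; not taken from EMBREO or the HGMP-RC service.
METHOD_NAME = "analysis.runBlast"


def build_request(sequence: str, database: str) -> str:
    """Encode a remote call as an XML-RPC request document."""
    return xmlrpc.client.dumps((sequence, database), methodname=METHOD_NAME)


def parse_request(request_xml: str) -> tuple:
    """Decode an XML-RPC request back into (method name, parameters)."""
    params, method = xmlrpc.client.loads(request_xml)
    return method, params


if __name__ == "__main__":
    xml_text = build_request("ATGTTTGGCTAA", "embl")
    print(xml_text)
    print(parse_request(xml_text))
```

The point of the sketch is that both the call and its arguments travel as plain XML text, so client and server can be written in different languages.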

USER ACCOUNTING AND CUSTOMIZATION

Currently very complex: HED, NIS+, Filesystem configuration files
Future: a single database, Lightweight Directory Access Protocol (LDAP)

CREDITS

Gary Williams: Menu systems and Genome Web

Geoff Gibbs: Network and systems

Peter Tribble: Web servers, Queues, Treasure
