Discovery Net Yike Guo, John Darlington (Dept. of Computing), John Hassard (Depts. of Physics and Bioengineering) Bob Spence (Dept. of Electrical Engineering) Tony Cass (Department of Biochemistry), Sevket Durucan (T. H. Huxley School of Environment) Imperial College London Discovery Net

Page 1

Discovery Net

Yike Guo, John Darlington (Dept. of Computing), John Hassard (Depts. of Physics and Bioengineering),

Bob Spence (Dept. of Electrical Engineering), Tony Cass (Department of Biochemistry), Sevket Durucan (T. H. Huxley School of Environment)

Imperial College London

Discovery Net

Page 2

AIM

To design, develop and implement an infrastructure to support real-time processing, interaction, integration, visualisation and mining of massive amounts of time-critical data generated by high-throughput devices.

Page 3

The Consortium

Industry connection: 4 spin-off companies + related companies (AstraZeneca, Pfizer, GSK, Cisco, IBM, HP, Fujitsu, Gene Logic, Applera, Evotec, International Power, Hydro Quebec, BP, British Energy, …)

Page 4

Industrial Contribution

Hardware : sensors (photodiode arrays, hybrid photodiodes, PMTs), systems (optics, mechanical systems, DSPs, FPGAs)

Software (analysis packages, algorithms, data warehousing and mining systems)

Intellectual Property: access to IP portfolio suite at no cost

Data: raw and processed data from biotechnology, pharmacogenomics, remote sensing (GUSTO installations, satellite data from geo-hazard programmes) and renewable energy (from our own remote tidal power systems)

Page 5

High Throughput Sensing

Characteristics

Different devices but same computational characteristics:

• Data intensive & data dispersive
• Large-scale, heterogeneous, distributed data
• Real-time data manipulation

Need to:

• calibrate
• integrate
• analyse

GRID issues: wide area, high volume, scalability (data, users), collaboration

Data issues: different measurements for same object: Data registration, normalisation, calibration & quality control

Information issues: annotations, semantics, reference, integrated view of data

Discovery issues: Distributed Knowledge Discovery and Management; Incremental, Interactive and Collaborative Discovery

[Figure labels: Distributed Devices, Distributed Warehousing, Distributed Reference DBs, Distributed Users, Collaborative Applications]
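The data issues listed on this slide (different measurements of the same object needing calibration and normalisation) can be illustrated with a minimal sketch. The linear calibration model, the gain and offset values and the device readings are all hypothetical and chosen only for illustration; the slides do not describe the actual Discovery Net procedures.

```python
from statistics import mean, pstdev

def calibrate(raw, gain, offset):
    """Apply a linear calibration: value = gain * raw + offset (assumed model)."""
    return [gain * x + offset for x in raw]

def normalise(values):
    """Z-score normalisation: shift to zero mean and scale to unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series carries no variation to scale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Two hypothetical devices measuring the same object on different scales.
device_a = calibrate([10.0, 12.0, 14.0], gain=1.0, offset=0.0)
device_b = calibrate([100.0, 120.0, 140.0], gain=0.1, offset=0.0)

norm_a = normalise(device_a)
norm_b = normalise(device_b)
```

After both steps the two series are directly comparable, which is the precondition for integrating and analysing data from heterogeneous devices.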

Page 6

High Throughput Computing Services

Utilising Grid Infrastructure for HT Computing

DNet Architecture:

• High Throughput Sensing (HTS) Applications: Large-scale Dynamic Real-time Decision Support; Large-scale Dynamic System Knowledge Discovery
• Grid-based Knowledge Discovery: Grid-based Data Mining, Collaborative Visualisation
• Information Structuring: Information Integration & Composition, Semantics & Domain-based Ontologies, Sharing
• Distributed Data Engineering: Data Registration, Data Normalisation, Data Quality
• Grid Basic Infrastructure: Globus/Condor/SRB

Based on Kensington Discovery Platform

Based on Globus & ORB Infrastructure

Page 7

Testbed Applications

HTS Applications:
• Large-scale Dynamic Real-time Decision Support
• Large-scale Dynamic System Knowledge Discovery

Bio Chip Applications:
• Protein-folding chips: SNP chips, Diff. Gene chips using LFII
• Protein-based fluorescent micro arrays

Renewable Energy Applications:
• Tidal Energy
• Connections to other renewable initiatives (solar, biomass, fuel cells), & to CHP and baseload stations

Remote Sensing Applications:
• Air Sensing, GUSTO
• Geological, geohazard analysis

Throughput (GB/s) | Size (petabytes) | Node number | Operations
1-100 | 10-100 | >50000 | Image Registration, Visualisation, Predictive Modelling, RT decisions
1-1000 | 10-1000 | >10000 | Data Quality, Visualisation, Structuring, Clustering, Distributed Dynamic Knowledge Management
1-10 | 1-10 | >20000 | Structuring, Mining, Optimisation, RT decisions

Page 8

Large-scale urban air sensing applications

Each GUSTO air pollution system produces 1 kbit per second, or 10^10 bits per year. We expect to increase the number of systems from the present 2 to over 20,000 over the next 3 years, to reach a total of 0.6 petabytes of data within the 3-year ramp-up.
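The data-volume figures for the GUSTO systems can be checked with a short back-of-envelope calculation. This sketch assumes a constant 1 kbit/s per system, a 365-day year, and all 20,000 systems running for a full year; the deployment schedule during the ramp-up is not given on the slide, so no cumulative three-year total is attempted.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, assuming a non-leap year

def bits_per_year(rate_bits_per_s):
    """Bits produced in one year at a constant rate."""
    return rate_bits_per_s * SECONDS_PER_YEAR

RATE_PER_SYSTEM = 1_000   # 1 kbit/s per GUSTO system (from the slide)
SYSTEM_COUNT = 20_000     # target number of systems (from the slide)

per_system = bits_per_year(RATE_PER_SYSTEM)   # bits per system per year
fleet = per_system * SYSTEM_COUNT             # bits per year for the full fleet

fleet_petabits = fleet / 1e15
fleet_petabytes = fleet / 8 / 1e15
aggregate_mbit_s = RATE_PER_SYSTEM * SYSTEM_COUNT / 1e6

print(f"per system: {per_system:.3e} bits/year")
print(f"fleet: {fleet_petabits:.3f} petabits/year = {fleet_petabytes:.3f} petabytes/year")
print(f"aggregate rate: {aggregate_mbit_s:.0f} Mbit/s")
```

Under these assumptions each system produces about 3.2 × 10^10 bits per year, and 20,000 systems running for a full year produce roughly 0.63 petabits (about 0.08 petabytes) at an aggregate rate of 20 Mbit/s.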

[Figure: GUSTO, NO simulant, 6.7.2001]

The useful information comes from time-resolved correlations among remote stations, and with other environmental data sets.
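A time-resolved correlation between two remote stations can be sketched as a lagged Pearson correlation. This is a minimal illustration and not the project's analysis method: the two series are made-up readings in which the second station sees the same pulse two samples later, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)

def best_lag(series_a, series_b, max_lag):
    """Return (lag, r) maximising the correlation of series_b[t + lag] with series_a[t]."""
    best = (0, pearson(series_a, series_b))
    for lag in range(1, max_lag + 1):
        r = pearson(series_a[:-lag], series_b[lag:])
        if r > best[1]:
            best = (lag, r)
    return best

# Hypothetical readings: station B records the same pulse two samples after station A.
station_a = [1, 2, 5, 9, 5, 2, 1, 1, 1, 1]
station_b = [1, 1, 1, 2, 5, 9, 5, 2, 1, 1]

lag, r = best_lag(station_a, station_b, max_lag=4)
```

The lag with the strongest correlation indicates how long a feature takes to appear at the second station, which is the kind of cross-station information that a single sensor cannot provide.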


Page 9

Electrical grid

There is large potential in embedded generation renewable sources: they will dominate in new build (nuclear, hydro and carbon) power stations. Decentralised power is the new paradigm.

Renewables characterised by:
• large number of small units
• often in remote areas
• wireless connectivity
• fluctuating, unpredictable loading

As the total exceeds 12%, grid control becomes very difficult without an RT e-grid:
• active management
• RT monitoring
• RT control
• minute-to-minute security
• pan-network optimisation

• This requires very high bandwidth
• RT remote station data acquisition
• warehousing and analysis
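The 12% threshold above can be turned into a simple monitoring check. This sketch assumes the figure refers to the renewable share of total generation, which the slide does not state explicitly; the unit outputs, the total generation value and the function names are hypothetical.

```python
RT_CONTROL_THRESHOLD = 0.12  # the 12% figure quoted on the slide

def renewable_share(unit_outputs_kw, total_generation_kw):
    """Fraction of total generation supplied by the listed renewable units."""
    if total_generation_kw <= 0:
        raise ValueError("total generation must be positive")
    return sum(unit_outputs_kw) / total_generation_kw

def needs_rt_control(share, threshold=RT_CONTROL_THRESHOLD):
    """True when the renewable share exceeds the threshold."""
    return share > threshold

# Hypothetical kW readings from three small units at two points in time.
readings = [[40, 55, 30], [90, 120, 60]]
total_kw = 2000  # hypothetical total generation on the network

shares = [renewable_share(r, total_kw) for r in readings]
flags = [needs_rt_control(s) for s in shares]
```

Because the unit outputs fluctuate, the share has to be recomputed as new readings arrive, which is why the check depends on real-time acquisition from the remote stations.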

Page 10

The IC Advantage

The IC infrastructure: microgrid for the testbed

ICPC Resource
• +20 TB of disk storage
• +25 TB of tape storage
• 3 clusters (> 1 Tera Flops)

Network upgrade
• More than 12,000 end devices
• 10 Mb/s – 1 Gb/s to end devices
• 1 Gb/s between floors
• 10 Gb/s to backbone
• 10 Gb/s between backbone router matrix and wireless capability
• 2 x 1 Gb/s to LMAN II (10 Gb/s scheduled 2004)
• Access to disparate off-campus sites: IC hospitals, Wye College etc.

[Figure: campus network diagram. Tiers: London MAN/JANET, proposed firewall, core router switches, building router switches, floor switches, end devices (including wireless). Links: core fibre, core to building fibre, building riser fibre, Cat 5 floor wiring. Central Computing Facilities: workstation cluster, storage, SMP.]

£3m SRIF funding

150 Gflops Processing

>100 GB Memory

5 TB of disk storage

Page 11

Particle Physics and Astronomy Research Council (PPARC)

ASTROGRID (http://www.astrogrid.ac.uk/)

a ~£5M project aimed at building a data-grid for UK astronomy, which will form the UK contribution to a global Virtual Observatory

Page 12

Particle Physics and Astronomy Research Council (PPARC)

GridPP (http://www.gridpp.ac.uk/)

to develop the Grid technologies required to meet the LHC computing challenge

collaboration with international grid developments in Europe and the US

Page 13

EPSRC Testbeds (1)

MyGrid: Personalised extensible environments for data-intensive in silico experiments in biology

Distributed Aircraft Maintenance Environment

RealityGrid: closely couples high performance computing, high throughput experiment and visualization

Page 14

EPSRC Testbeds (2)

GEODISE : Grid Enabled Optimisation and DesIgn Search for Engineering

CombiChem : Combinatorial Chemistry Structure-Property Mapping

Discovery Net : High Throughput Sensing