Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Dan ReedDan [email protected]@microsoft.com
Corporate Vice PresidentCorporate Vice PresidentExtreme Computing Group (XCG)Extreme Computing Group (XCG)
2
Commodity clustersProliferation of inexpensive hardware
“Attack of the Killer Micros”Race for MachoFLOPSLow level programming challenges
Rise of dataScientific instruments and surveysStorage, management and provenanceData fusion and analysis
Distributed servicesMultidisciplinary collaborationsInteroperability and scalabilityMulti-organizational social engineering
3
Domain IndependentInfrastructure
Domain IndependentInfrastructure
ComplexApplications
ComplexApplications
Domain SpecificInfrastructure
Domain SpecificInfrastructure
Scientific InstrumentPipelines
Scientific InstrumentPipelines
Diverse DisciplinaryData Archives
Diverse DisciplinaryData Archives
LaboratoryComputingLaboratoryComputing
High-endComputing
Infrastructure
High-endComputing
Infrastructure
Community Building
and Outreach
Community Building
and Outreach
4
Insatiable demandCycles, storage, software, support
Distributed acquisition/deploymentSometimes, duplicative, non-shared infrastructure
Distributed cost structures Power, space, staff, staff, hardware
Long-term sustainabilityDecades rather than months/years
The shape of the triangleApex versus mainstream users
5
Petascale/Exascale/…
Mobile/Desktopcomputing
Laboratory clusters
University infrastructure
National infrastructure
Data, data, data
Data, data, data
6
Bulk computing is almost free… but applications and power are not
Inexpensive sensors are ubiquitous… but data fusion remains difficult
Moving lots of data is {still} hard… because we’re missing trans-terabit/second networks
People are really expensive!… and robust software remains extremely labor intensive
Application challenges are increasingly complex … and social engineering is not our forte
Our political/technical approaches must change… or we risk solving irrelevant problems
7
Moore’s “Law” favored consumer commoditiesEconomics drove enormous improvementsSpecialized processors and mainframes falteredThe commodity software industry was born
Today’s economicsManycore processors/acceleratorsSoftware as a service/cloud computingMultidisciplinary data analysis and fusion
They will drive change in technical computingJust as did “killer micros” and inexpensive clusters
8
Manycore
HPC
Clouds
9
Very few users love technology itselfClusters and parallel programmingDistributed services, grids or cloudsData models and databases
Successful technologies are invisibleThey enable but are unobtrusive
Desktop/mobile accelerationSeamlessly accessibleStandard metaphors/tools
Manycore
HPC
Clouds
10
Increasing Abstraction
and Invisibility
11
Scholarly communicationsDomain-specific
services
Instant messaging
Identity
Document store
blogs &social
networking
Notification
Searchbooks
citations
Visualization and analysis services
Storage/data services
ComputeServices and virtualization
Project management
Reference management
Knowledge management
Knowledge discovery
Source: Tony Hey, Microsoft
12
Scale data throughput and store capacityFuse and analyze multidisciplinary data
Scale OnScale On--Demand Demand and Cost Effectivelyand Cost Effectively
Protect against data loss and unauthorized accessAddress failure and disaster scenarios
Ensure Service Continuity Ensure Service Continuity in the Cloudin the Cloud
Enable rapid development of new applications/servicesEasy access to consume multiple data sources
Support Support Emerging Applications Emerging Applications RRapidlyapidly
Hardware and software independenceLower operational cost of managing data
Reduce Infrastructure and Management Costs
13
Infrastructure as a Service
Applications as a Service
Software as a Service
Enables
www.azure.com
15
Hypervisor
Guest Partition (VM)
Host Partition(VM)
Guest Partition(VM)
Hardware
VirtualizationStack(VSP)
Drivers
Host OSServer Core
ApplicationsApplications
RD OSVirtualization
Stack(VSC)
Guest OSServer Enterprise
VirtualizationStack(VSC)
Guest OSServer Enterprise
NICNIC Disk1
Disk1
VMBUS VMBUS VMBUS
Disk2
Disk2 CPUCPU
Azure Services
LoadBalancer
Public Internet
Worker Role(s)
Front-endWeb Role
16
Switches
Highly-availableFabric Controller
Out-of-band communication – hardware control
In-band communication –software control
WS08 Hypervisor
VMVM
VM
Control VM
Service RolesControl Agent
WS08
Node can be a VM or a physical machine
Load-balancers
17
C#
18
Experiments Archives LiteratureSimulations
Many PetabytesDoubling every
2 years
19
Hypothesis-driven“I have an idea, let me verify it.”
Exploratory“What correlations can I glean from everyone’s data?”
Different tools and techniquesExploratory analysis relies on deep data mining
supervised and unsupervised learning“grep” is not a data mining tool
… but an RDBMS really isn't either
Massive, multidisciplinary dataRising rapidly and at unprecedented scale
20
Metagenomics sample 50 roles, speedup 45100 roles, speedup 94
BLAST user selects DBs and
input sequence
BlastWeb Role Input
SplitterWorker
Role
BLASTExecution
Worker Role #n…
.
CombinerWorker Role
GenomeDB 1
GenomeDB K
BLAST DBConfiguration
Azure Blob Storage
BLASTExecution
Worker Role #1
Basic MapReduce- 2 GB database in each
worker role- 500 MB input file
Map reduce-styleParallel BLASTDNA matching
21
Query planLINQ query
LINQ: .NET Language Integrated QueryDeclarative SQL-like programming with C# and Visual StudioEasy expression of data parallelismElegant and unified data model
Dryad
select
where
logs
Automatic query plan generation
Distributed query execution by
Dryad
var logentries =from line in logswhere
!line.StartsWith("#")select new
LogEntry(line);
Source: Yuan Yu et al
22
Each data center is Each data center is ~10X~10X
the size of a the size of a soccer/football soccer/football fieldfield
23
Cooling technologiesOperating points, heat dissipation, …
New packaging technologiesOptoelectronics, memory stacking, …
New storage models/algorithmsSolid state storage
Locality-aware algorithmsThe speed of light is pretty slow
Programming modelsEffective scale-invariant abstractions
Intelligent power managementAdaptation and power down
System adaptation and integrationReliability and power as first class objects
24
Manycore
HPC
Clouds
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it
should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.