1
High Performance Presentation: 5 slides/minute?
(65 slides / 15 minutes)
IO and DB “stuff” for LSST: a new world record?
Jim Gray
Microsoft Research
2
TerraServer Lessons Learned
• Hardware is 5 9's (with clustering)
• Software is 5 9's (with clustering)
• Admin is 4 9's (offline maintenance)
• Network is 3 9's (mistakes, environment)
• Simple designs are best
• 10 TB DB is the management limit:
  1 PB = 100 x 10 TB DBs; this is 100x better than 5 years ago.
  (Yahoo! and HotMail are 300 TB, Google is 2 PB)
• Minimize use of tape
  – Backup to disk (snapshots)
  – Portable disk TBs
3
Serving BIG Images
• Break into tiles (compressed):
  – 10 KB for modems
  – 1 MB for LANs
• Mosaic the tiles for pan, crop
• Store an image pyramid for zoom
  – 2x zoom only adds 33% overhead: 1 + 1/4 + 1/16 + …
• Use a spatial index to cluster & find objects
[Diagram: 1.6x1.6 km2 image → .8x.8 km2 image → .4x.4 km2 image → .2x.2 km2 tile]
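The 33% figure is the tail of a geometric series: each zoom-out level holds a quarter of the pixels of the level below it. A quick sketch of the arithmetic (the level count is just illustrative):

```python
# Total pyramid storage relative to the base image:
# 1 + 1/4 + 1/16 + ... -> 4/3, i.e. ~33% overhead.
def pyramid_size(levels):
    return sum(0.25 ** i for i in range(levels))

print(round(pyramid_size(10) - 1, 3))  # overhead ~0.333
```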
4
Economics
• People are more than 50% of costs
• Disks are more than 50% of capital
• Networking is the other 50%
  – People
  – Phone bill
  – Routers
• CPUs are free (they come with the disks)
5
SkyServer / SkyQuery Lessons
• DB is easy
• Search
  – It is BEST to index
  – You can put objects and attributes in a row (SQL puts big blobs off-page)
  – If you can't index, you can extract attributes and quickly compare
  – SQL can scan at 5M records/cpu/second
  – Sequential scans are embarrassingly parallel
• Web services are easy
• XML DataSets:
  – a universal way to represent answers
  – minimize round trips: 1 request/response
  – Diffgrams allow disconnected update
6
How Will We Find Stuff?
Put everything in the DB (and index it)
• Need DBMS features: consistency, indexing, pivoting, queries, speed/scalability, backup, replication.
  If you don't use one, you're creating one!
• Simple logical structure:
  – Blob and link is all that is inherent
  – Additional properties (facets == extra tables) and methods on those tables (encapsulation)
• More than a file system
• Unifies data and meta-data
• Simpler to manage
• Easier to subset and reorganize
• Set-oriented access
• Allows online updates
• Automatic indexing, replication
7
How Do We Represent Data To The Outside World?
• File metaphor too primitive: just a blob
• Table metaphor too primitive: just records
• Need metadata describing data context
  – Format
  – Provenance (author/publisher/citations/…)
  – Rights
  – History
  – Related documents
• In a standard format
• XML and XML Schema
• DataSet is a great example of this
• World is now defining standard schemas
Example (schema plus data as a diffgram):
<?xml version="1.0" encoding="utf-8" ?>
<DataSet xmlns="http://WWT.sdss.org/">
  <xs:schema id="radec" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
    <xs:element name="radec" msdata:IsDataSet="true">
      <xs:element name="Table">
        <xs:element name="ra" type="xs:double" minOccurs="0" />
        <xs:element name="dec" type="xs:double" minOccurs="0" /> …
  <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
    <radec xmlns="">
      <Table diffgr:id="Table1" msdata:rowOrder="0">
        <ra>184.028935351008</ra>
        <dec>-1.12590950121524</dec>
      </Table>
      …
      <Table diffgr:id="Table10" msdata:rowOrder="9">
        <ra>184.025719033547</ra>
        <dec>-1.21795827920186</dec>
      </Table>
    </radec>
  </diffgr:diffgram>
</DataSet>
8
Emerging Concepts
• Standardizing distributed data
  – Web Services, supported on all platforms
  – Custom configure remote data dynamically
  – XML: eXtensible Markup Language
  – SOAP: Simple Object Access Protocol
  – WSDL: Web Services Description Language
  – DataSets: standard representation of an answer
• Standardizing distributed computing
  – Grid Services
  – Custom configure remote computing dynamically
  – Build your own remote computer, and discard it
  – Virtual Data: new data sets on demand
9
Szalay's Law:
The utility of N comparable datasets is N²
• Metcalfe's law applies to telephones, fax, the Internet.
• Szalay argues as follows:
  Each new dataset gives new information;
  the N(N-1)/2 two-way combinations give new information too.
• Example: combine these 3 datasets
  – (ID, zip code)
  – (ID, birth day)
  – (ID, height)
• Other example: the quark star: Chandra X-ray + Hubble optical + 600-year-old records.
  Drake, J. J. et al. Is RX J185635-375 a Quark Star? Preprint (2002).
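The pairwise-combination count behind the law is easy to make concrete with the slide's three example datasets (all keyed by a common ID):

```python
from itertools import combinations

# Three datasets keyed by a common ID (the slide's example)
datasets = ["zip code", "birth day", "height"]

# Each dataset alone gives N pieces of information; joining pairs
# gives N*(N-1)/2 more, so utility grows roughly as N^2.
pairs = list(combinations(datasets, 2))
print(len(pairs))  # 3 = N*(N-1)/2 for N = 3
```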
[Image: X-ray, optical, infrared, and radio views of the nearby Crab Nebula, which is now in a state of chaotic expansion after a supernova explosion first sighted in 1054 A.D. by Chinese astronomers.]
10
Science is hitting a wall: FTP and GREP are not adequate
• You can GREP 1 MB in a second
• You can GREP 1 GB in a minute
• You can GREP 1 TB in 2 days
• You can GREP 1 PB in 3 years
• Oh, and 1 PB ~ 10,000 disks
• You can FTP 1 MB in 1 second
• You can FTP 1 GB in a minute (= 1 $/GB)
• … 1 TB takes 2 days and 1 K$; 1 PB takes 3 years and 1 M$
• At some point you need indices to limit search, plus parallel data search and analysis tools
• This is where databases can help
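The disk count is simple division; a sketch assuming the ~100 GB drives typical when this talk was given:

```python
# How many disks does a petabyte take?
PB = 1e15         # bytes
DISK = 100e9      # assumed ~100 GB per drive (circa 2002)
print(int(PB / DISK))  # 10000 disks
```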
11
Networking: Great Hardware & Software
• WANs @ 5 GBps (λ = 40 Gbps)
• Gbps Ethernet common (~100 MBps)
  – Offload gives ~2 Hz/byte
  – Will improve with RDMA & zero-copy
  – 10 Gbps mainstream by 2004
• Faster I/O
  – 1 GB/s today (measured)
  – 10 GB/s under development
  – SATA (serial ATA): 150 MBps/device
12
Bandwidth: 3x bandwidth/year for 25 more years
• Today:
  – 40 Gbps per channel (λ)
  – 12 channels per fiber (WDM): 500 Gbps
  – 32 fibers/bundle = 16 Tbps/bundle
• In the lab: 3 Tbps/fiber (400x WDM)
• In theory: 25 Tbps per fiber
• 1 Tbps = USA 1996 WAN bisection bandwidth
• Aggregate bandwidth doubles every 8 months!
1 fiber = 25 Tbps
13
Hero/Guru Networking
[Map: Redmond/Seattle, WA to San Francisco, CA, New York, and Arlington, VA: 5626 km, 10 hops. Participants: Information Sciences Institute, Microsoft, Qwest, University of Washington, Pacific Northwest Gigapop, HSCC (high speed connectivity consortium), DARPA]
14
Real Networking
• Bandwidth for the 1 Gbps “stunt” cost 400 K$/month
  – ~200 $/Mbps/month (at each end + hardware + admin)
  – Price not improving very fast
  – Doesn't include operations / local hardware costs
• Admin… costs more: ~1 $/GB to 10 $/GB
• Challenge: go home and FTP from a “fast” server
• The Guru Gap: FermiLab <-> JHU
  – Both “well connected”: vBNS, NGI, Internet2, Abilene, …
  – Actual desktop-to-desktop ~100 KBps
  – 12 days/TB (but it crashes first)
• The reality: to move 10 GB, mail it! TeraScale Sneakernet
15
How Do You Move A Terabyte?

Context     Speed (Mbps)  Rent ($/month)  $/Mbps  $/TB sent  Time/TB
Home phone  0.04          40              1,000   3,086      6 years
Home DSL    0.6           70              117     360        5 months
T1          1.5           1,200           800     2,469      2 months
T3          43            28,000          651     2,010      2 days
OC3         155           49,000          316     976        14 hours
100 Mbps    100                                              1 day
Gbps        1000                                             2.2 hours
OC 192      9600          1,920,000       200     617        14 minutes

Source: TeraScale Sneakernet, Microsoft Research, Jim Gray et al.
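The Time/TB column is just capacity over line rate; a sketch reproducing a few rows of the table from the Speed column alone:

```python
def hours_per_tb(mbps):
    """Hours to push one terabyte (10^12 bytes) through a link
    of the given speed in megabits per second."""
    bits = 1e12 * 8
    return bits / (mbps * 1e6) / 3600

for context, mbps in [("Home phone", 0.04), ("T1", 1.5), ("OC 192", 9600)]:
    print(f"{context}: {hours_per_tb(mbps):,.1f} hours")
```

OC 192 works out to ~0.23 hours (14 minutes) and T1 to ~1,500 hours (2 months), matching the table.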
16
There Is A Problem
• GREAT!!!
  – XML documents are portable objects
  – XML documents are complex objects
  – WSDL defines the methods on objects (the class)
• But will all the implementations match?
  – Think of UNIX or SQL or C or …
• This is a work in progress.
Niklaus Wirth: Algorithms + Data Structures = Programs
17
Changes To DBMS’s
• Integration of programs and data
  – Put programs inside the database (allows OODB)
  – Gives you parallel execution
• Integration of relational, text, XML, time
• Scaleout (even more)
• AutoAdmin (“no knobs”)
• Manage petascale databases (utilities, geoplex, online, incremental)
18
Publishing Data
Roles       Traditional  Emerging
Authors     Scientists   Collaborations
Publishers  Journals     Project web site
Curators    Libraries    Data+Doc Archives
Archives    Archives     Digital Archives
Consumers   Scientists   Scientists
19
The Core Problem: No Economic Model
• The archive user has not yet been born. How can he pay you to curate the data?
• The scientist gathered data for his own purpose. Why should he pay (invest time) for your needs?
• Answer to both: that's the scientific method.
• Curating data (documenting the design, the acquisition, and the processing) is very hard, and there is no reward for doing it. The results are rewarded, not the process of getting them.
• Storage/archiving is NOT the problem (it's almost free).
• Curating/publishing is expensive.
20
SDSS Data Inflation – Data Pyramid
• Level 1A: grows 5 TB of pixels/year, growing to 25 TB
  ~2 TB/y compressed, growing to 13 TB
  ~4 TB today (level 1A in NASA terms)
• Level 2: derived data products, ~10x smaller, but there are many catalogs
• Publish a new edition each year
  – Fixes bugs in the data
  – Must preserve old editions
  – Creates a data pyramid
• Store each edition: 1 + 2 + 3 + … + N ~ N² bytes
• Net data inflation: L2 ≥ L1
[Diagram: editions E1 … E4 over time. 4 editions of level 1A data (source data) and 4 editions of level 2 derived data products. Each derived product is small, but they are numerous; this proliferation combined with the data pyramid implies that level 2 data more than doubles the total storage volume.]
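Keeping every yearly edition turns linear data growth into quadratic storage growth; a sketch using the slide's ~2 TB/year compressed level-1A rate (the rate is the only assumed number):

```python
# If edition k contains k years of data, storing all N editions costs
# 1 + 2 + ... + N = N*(N+1)/2 edition-years: ~N^2 growth.
def total_editions_tb(n, tb_per_year=2):
    return tb_per_year * sum(range(1, n + 1))

for n in (1, 2, 4, 8):
    print(n, "editions:", total_editions_tb(n), "TB")
```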
21
What’s needed?(not drawn to scale)
[Diagram: Scientists pose science data & questions; Plumbers build the database to store data and execute queries; Miners supply data mining algorithms; question & answer and visualization tools sit on top.]
22
CS Challenges For Astronomers
• Objectify your field:
  – Precisely define what you are talking about
  – Objects and methods / attributes
  – This is REALLY difficult
  – UCDs are a great start, but there is a long way to go
• “Software is like entropy, it always increases.” -- Norman Augustine, Augustine's Laws
  – Beware of legacy software: cost can eat you alive
  – Share software where possible
  – Use standard software where possible
  – Expect it will cost you 25% to 40% of the project
• Explain what you want to do with the VO: 20 queries or something like that
23
Challenge to Data Miners: Linear and Sub-Linear Algorithms
• Today most correlation / clustering algorithms are polynomial: N² or N³ or …
• N² is VERY big when N is big (10¹⁸ is big)
• Need sub-linear algorithms
• Current approaches are near optimal given current assumptions.
• So, need new assumptions: probably heuristic and approximate
24
Challenge to Data Miners: Rediscover Astronomy
• Astronomy needs deep understanding of physics.
• But some of it was discovered as variable correlations, then “explained” with physics.
• Famous example: the Hertzsprung-Russell diagram: star luminosity vs. color (= temperature)
• Challenge 1 (the student test): how much of astronomy can data mining discover?
• Challenge 2 (the Turing test): can data mining discover NEW correlations?
25
Plumbers: Organize and Search Petabytes
• Automate instrument-to-archive pipelines
  – It is a messy business: very labor intensive
  – Most current designs do not scale (too many manual steps)
  – BaBar (1 TB/day) and the ESO pipeline seem promising
  – Needs a job-scheduling or workflow system
• Physical database design & access
  – Data access patterns are difficult to anticipate
  – Aggressively and automatically use indexing, sub-setting
  – Search in parallel
• Goals
  – Answer easy queries in 10 seconds
  – Answer hard queries (correlations) in 10 minutes
26
Scaleable Systems
• Scale UP: grow by adding components to a single system.
• Scale OUT: grow by adding more systems.
27
What’s New – Scale Up
• 64-bit & TB-size main memory
• SMP on chip: everything's SMP
• 32 … 256-way SMP: locality/affinity matters
• TB-size disks
• High-speed LANs
28
Who needs 64-bit addressing? You! Need 64-bit addressing!
• “640K ought to be enough for anybody.” Bill Gates, 1981
• But that was 21 years ago: at one address bit every 18 months, that's 21/1.5 = 14 bits ago.
• 20 bits + 14 bits = 34 bits, so… “16 GB ought to be enough for anybody.” Jim Gray, 2002
• 34 bits > 31 bits (the practical 32-bit limit), so 34 bits means 64 bits.
• YOU need 64-bit addressing!
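The arithmetic behind the joke, spelled out (dates and bit counts are the slide's own):

```python
# One address bit every 18 months (Moore's law restated for addressing)
years = 2002 - 1981          # since "640K ought to be enough"
extra_bits = years / 1.5     # 14 bits
needed = 20 + extra_bits     # 640 KB was a ~20-bit space
print(extra_bits, needed)        # 14.0 34.0
print(2 ** 34 // 2 ** 30, "GB")  # 16 GB
```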
29
64 bit – Why bother?
• 1966 Moore's law: 4x more RAM every 3 years = 1 bit of addressing every 18 months
• 36 years later: 36/1.5 = 24 more bits. Not exactly right, but…
  32 bits are not enough for servers; 32 bits give no headroom for clients.
  So, time is running out (has run out).
• Good news: Itanium™ and Hammer™ are maturing, and so is the base software (OS, drivers, DB, web, …).
  Windows & SQL @ 256 GB today!
30
64 bit – Why bother?
• Memory-intensive calculations: you can trade memory for IO and processing
• Example: data analysis & clustering at JHU
  – In memory, CPU time is ~N log N, N ~ 100M
  – On disk, in M chunks, time ~ M²
  – Must run many times
• Now running on HP Itanium, Windows .NET Server 2003, SQL Server
[Graph: CPU time (hours: day, week, month, year, decade) vs. number of galaxies in millions (0-100), for machines with 1, 4, 32, and 256 GB of memory. Courtesy of Alex Szalay & Adrian Pope of Johns Hopkins University.]
31
Amdahl's Balanced System Laws
• 1 MIPS needs 4 MB of RAM and 20 IO/s
• At 1 billion instructions per second:
  – need 4 GB per cpu
  – need 50 disks per cpu!
• 64 cpus … 3,000 disks
[Diagram: 1 bips cpu, 4 GB RAM, 50 disks, 10,000 IOps, 7.5 TB]
32
The 5 Minute Rule – Trade RAM for Disk Arms
• If data is re-referenced every 5 minutes, it is cheaper to cache it in RAM than to get it from disk.
• A disk access/second costs ~50$: that buys ~50 MB re-referenced every second, or ~50 KB re-referenced every 1,000 seconds.
• Each app has a memory “knee”: up to the knee, more memory helps a lot.
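The 50 MB / 50 KB figures come from equating the cost of a disk arm against the cost of the RAM that would cache the data; a sketch with both prices assumed at the slide's round numbers ($50 per access/second, ~$1/MB of RAM):

```python
# Five-minute-rule style break-even (assumed circa-2002 prices)
DOLLARS_PER_ACCESS_PER_SEC = 50.0
DOLLARS_PER_MB_RAM = 1.0

def break_even_seconds(size_kb):
    """Re-reference interval at which caching `size_kb` in RAM costs
    the same as one disk access per interval."""
    ram_cost = (size_kb / 1024) * DOLLARS_PER_MB_RAM
    return DOLLARS_PER_ACCESS_PER_SEC / ram_cost

print(round(break_even_seconds(50)))     # ~1024 s: "50 KB for 1,000 seconds"
print(round(break_even_seconds(51200)))  # ~1 s:    "50 MB for 1 second"
```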
33
Three TPC Benchmarks: GBs help a LOT!
(even if the cpu clock is slower)
[Chart: transactions per second (0-100,000) for 4x1.6 GHz IA32 + 8 GB, 4x1.6 GHz IA32 + 32 GB, and 4x1 GHz Itanium 2 + 48 GB]
64 bit Reduces IO, Saves Disks
• Large memory reduces IO
• 64-bit simplifies code
• Processors can be faster (wider word)
• RAM is cheap (4 GB ~ 1 K$ to 20 K$)
• Can trade RAM for disk IO
• Better response time
• Example: TPC-C
  – 4x1 GHz Itanium 2 vs. 4x1.6 GHz IA32
  – 40 extra GB → 60% extra throughput
34
AMD Hammer™ Coming Soon
• AMD Hammer™ is 64-bit capable
• 2003: millions of Hammer™ CPUs will ship
• 2004: most AMD CPUs will be 64-bit
• 4 GB of RAM is less than 1,000$ today; less than 500$ in 2004
• Desktops (Hammer™) and servers (Opteron™)
• You do the math… who will demand 64-bit capable software?
35
A 1 TB Main Memory
• Amdahl's law: 1 MIPS/MB, now more like 1:5, so ~20 x 10 GHz cpus need 1 TB of RAM
• 1 TB of RAM ~ 250 K$ … 2 M$ today; ~25 K$ … 200 K$ in 5 years
• 128 million pages
  – Takes a LONG time to fill
  – Takes a LONG time to refill
• Needs new algorithms
• Needs parallel processing
• Which leads us to…
  – the memory hierarchy
  – SMP
  – NUMA
36
Hyper-Threading: SMP on Chip
• If the cpu is always waiting for memory: predict memory requests and prefetch (done)
• If the cpu is still always waiting for memory: multi-program it (multiple hardware threads per cpu)
  – Hyper-Threading: everything is SMP
  – 2 threads now, more later
  – Also multiple cpus per chip
• If your program is single threaded
  – You waste ½ the cpu and memory bandwidth
  – Eventually waste 80%
• App builders need to plan for threads.
37
The Memory Hierarchy
• Locality REALLY matters
• CPU at 2 GHz, RAM at 5 MHz: RAM is no longer random access.
• Organizing the code gives 3x (or more)
• Organizing the data gives 3x (or more)

Level      Latency (clocks)  Size
Registers  1                 1 KB
L1         2                 32 KB
L2         10                256 KB
L3         30                4 MB
Near RAM   100               16 GB
Far RAM    300               64 GB
38
[Diagram: registers and L1 cache (Icache, Dcache) feed the Arithmetic Logical Unit; L2 cache sits on chip; the bus connects off-chip RAM, remote cache, remote RAM, other CPUs, and disk/network.]
39
Scaleup Systems: Non-Uniform Memory Architecture (NUMA)
• Coherent, but… remote memory is even slower
• All cells see a common memory
• Slow local main memory, slower remote main memory
• Scaleup by adding cells: planning for 64 cpu, 1 TB RAM
• Interconnect, service processor, and partition management are vendor specific
• Several vendors doing this (Itanium and Hammer)
[Diagram: four cells, each with 4 CPUs, memory, and an I/O chipset, joined by a crossbar/switch system interconnect; a partition manager with a config DB and service processors oversees the cells.]
40
Changed Ratios Matter
• If everything changes by 2x, then nothing changes.
• So, it is the different rates that matter.
Improving FAST: cpu speed; memory & disk size; network bandwidth
Slowly changing: speed of light; people costs; memory bandwidth; WAN prices
41
Disks Are Becoming Tapes
• Capacity: 150 GB now, 300 GB this year, 1 TB by 2007
• Bandwidth: 40 MBps now, 150 MBps by 2007
• Read time: 2 hours sequential, 2 days random now; 4 hours sequential, 12 days random by 2007
[Diagram: today's disk: 150 GB, 150 IO/s, 40 MBps; 2007 disk: 1 TB, 200 IO/s, 150 MBps]
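The sequential/random gap is easy to reproduce from the nominal figures above. Note the slide's own times are roughly double these, presumably because it assumes lower effective (average) rates than the peak numbers; this sketch uses the peak numbers:

```python
def read_whole_disk(capacity_gb, mb_per_s, ios_per_s, page_kb=8):
    """(hours to scan sequentially, days to read in random 8 KB pages)."""
    seq_hours = capacity_gb * 1024 / mb_per_s / 3600
    pages = capacity_gb * 1024 * 1024 / page_kb
    rand_days = pages / ios_per_s / 86400
    return round(seq_hours, 1), round(rand_days, 1)

print(read_whole_disk(150, 40, 150))    # today's disk:  (1.1 h, 1.5 days)
print(read_whole_disk(1000, 150, 200))  # 2007's disk:   (1.9 h, 7.6 days)
```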
42
Disks Are Becoming Tapes: Consequences
• Use most disk capacity for archiving: Copy on Write (COW) file systems in Windows and other OSs
• RAID10 saves arms, costs space (OK!)
• Backup to disk
• Pretend a 1 TB disk is a 100 GB disk + a 1 TB disk:
  – Keep the hot 10% of data on the fastest part of the disk
  – Keep the cold 90% on the colder part of the disk
• Organize computations to read/write disks sequentially in large blocks.
43
Wiring Is Going Serial and Getting FAST!
• Gbps Ethernet and SATA built into chips
• RAID controllers: inexpensive and fast
• 1U storage bricks @ 2-10 TB
• SAN or NAS (iSCSI or CIFS/DAFS)
[Diagram: Enet at 100 MBps/link; 8x SATA at 150 MBps/link]
44
NAS – SAN Horse Race
• Storage hardware: 1 K$/TB/y; storage management: 10 K$ … 300 K$/TB/y
• So, as with server consolidation: storage consolidation
• Two styles:
  – NAS (Network Attached Storage): file server
  – SAN (System Area Network): disk server
• I believe NAS is more manageable.
45
SAN/NAS Evolution
[Diagram: Monolithic → Modular → Sealed]
46
IO Throughput: K Accesses Per Second vs. RPM
[Chart: Kaps (0-200) vs. RPM (0-20,000)]
47
Comparison of Disk Cost: $'s for similar performance
Seagate disk prices*

Model #   Size     Speed     Connect.  Cost  $/K Rev
X15 36LP  36.7 GB  15K RPM   Fibre     $455  $29.7
X15 36LP  36.7 GB  15K RPM   SCSI      $455  $29.7
36 ES 2   36.7 GB  10K RPM   SCSI      $325  $32.5
ATA 1000  40 GB    7200 RPM  ATA       $101  $14.0
ATA 100   40 GB    5400 RPM  ATA       $86   $15.9

*Source: Seagate online store, quantity-one prices
48
Comparison of Disk Costs: ¢/MB for different systems

Mfg.     Size    Type      Cost   Cost/MB
Seagate  181 GB  Int SCSI  $1155  6.4¢
WD       120 GB  Ext. ATA  $276   2.3¢
Dell     80 GB   Int. ATA  $115   1.4¢
EMC      XX GB   SAN              xx¢

Source: Dell
49
Why Serial ATA Matters
• Modern interconnect
• Point-to-point drive connection: 150 MBps -> 300 MBps
• Facilitates ATA disk arrays
• Enables inexpensive “cool” storage
50
Performance (on Y2K SDSS data)
[Chart: cpu and elapsed seconds (log scale, 1-1,000) vs. query ID]
• Run times on a 15 K$ HP server (2 cpu, 1 GB RAM, 8 disks)
• Some take 10 minutes
• Some take 1 minute
• Median ~22 sec
• GHz processors are fast!
  – (10 mips/IO, 200 instructions/byte)
  – 2.5 M records/s/cpu
[Chart: IO count vs. cpu seconds: ~1,000 IOs/cpu sec ~ 64 MB of IO/cpu sec]
51
NVO: How Will It Work?
• Define commonly used ‘atomic’ services
• Build higher-level toolboxes/portals on top
• We do not build ‘everything for everybody’
• Use the 90-10 rule:
  – Define the standards and interfaces
  – Build the framework
  – Build the 10% of services that are used by 90%
  – Let the users build the rest from the components
[Chart: fraction of users served vs. number of services]
52
Data Federations of Web Services
• Massive datasets live near their owners:
  – Near the instrument's software pipeline
  – Near the applications
  – Near data knowledge and curation
  – Super Computer centers become Super Data Centers
• Each archive publishes a web service
  – Schema: documents the data
  – Methods on objects (queries)
• Scientists get “personalized” extracts
• Uniform access to multiple archives: a common global schema
53
Grid and Web Services Synergy
• I believe the Grid will be many web services that share data (computrons are free)
• IETF standards provide:
  – Naming
  – Authorization / security / privacy
  – Distributed objects: discovery, definition, invocation, object model
  – Higher-level services: workflow, transactions, DB, …
• Synergy: commercial Internet & Grid tools
54
Web Services: The Key?
• Web SERVER:
  – Given a URL + parameters
  – Returns a web page (often dynamic)
• Web SERVICE:
  – Given an XML document (SOAP msg)
  – Returns an XML document
  – Tools make this look like an RPC: F(x,y,z) returns (u,v,w)
  – Distributed objects for the web
  – + naming, discovery, security, …
• Internet-scale distributed computing
[Diagram: your program calls a web server over http and gets back a web page; your program calls a web service with a SOAP message and gets back an object, as XML, in your address space.]
55
Grid?
• Harvesting spare cpu cycles is not important
  – They are “free” (1 $/cpu-day)
  – They need applications and data, which are not free (1 $/GB shipped)
• Accessing distributed data IS important
  – Send the programs to the data
  – Send the questions to the databases
• Super Computer Centers become Super Data Centers / Super Application Centers
56
The Grid: Foster & Kesselman (Argonne National Laboratory)
“Internet computing and Grid technologies promise to change the way we tackle complex problems. They will enable large-scale aggregation and sharing of computational, data, and other resources across institutional boundaries… transform scientific disciplines ranging from high-energy physics to the life sciences.”
57
Grid/Globus
• Leader of the pack for Grid middleware
• Layered software toolkit:
  – 1: Grid fabric (OS, TCP)
  – 2: Grid services: Globus Resource Allocation Manager; Globus Information Service (meta-computing directory service); Grid Security Infrastructure; GridFTP
  – 3: Application toolkits: job submission; MPICH-G2 message-passing interface
  – 4: Specific applications: OVERFLOW Navier-Stokes flow solver
58
Globus in gory detail
SHELL SCRIPTS

globus-mds-search '(&(hn=denali.mcs.anl.gov)(objectclass=GlobusSystemDynamicInformation))' cpuload1 |\
  sed -n -e '/^hn=/p' -e '/^cpuload1=/p' |\
  sed -e 's/,.*$//' -e 's/=/ /g' |\
  awk '/^hn/{printf "%s", $2} /^cpuload/{printf " %s\n", $2}'

if [ $# -eq 0 ]; then
  echo "provide argument <number of processes to start>" 1>&2
  exit 1
fi
if [ -z "$GRAMCONTACT" ]; then
  GRAMCONTACT="`globus-hostname2contacts -type fork pitcairn.mcs.anl.gov`"
fi
pwd=`/bin/pwd`
rsl="&(executable=${pwd}/myjobtest)(count=$1)"
arch=`${GLOBUS_INSTALL_PATH}/sbin/config.guess`
${GLOBUS_INSTALL_PATH}/tools/${arch}/bin/globusrun -o -r "${GRAMCONTACT}" "${rsl}"
LIBRARIES

/* get process id and hostname */
pid = getpid();
rc = globus_libc_gethostname(hn, 256);
globus_assert(rc == GLOBUS_SUCCESS);

/* get current time and convert to string format.
   setting [25] to zero will strip the newline character. */
mytime = time(GLOBUS_NULL);
timestr = globus_libc_ctime_r(&mytime, buf, 30);
timestr[25] = '\0';
globus_libc_printf("%s : process %d on %s came to life\n", timestr, pid, hn);

/* THE BARRIER!!! */
globus_duroc_runtime_barrier();

/* passed the barrier: get current time again and print it out. */
mytime = time(GLOBUS_NULL);
timestr = globus_libc_ctime_r(&mytime, buf, 30);
globus_libc_printf("%s : process %d on %s passed the barrier\n", timestr, pid, hn);

/* TODO 1: get the layout of the DUROC job using first
   globus_duroc_runtime_intra_subjob_rank() and then
   globus_duroc_runtime_inter_subjob_structure(). */

/* we are done. */
rc = globus_module_deactivate_all();
globus_assert(rc == GLOBUS_SUCCESS);
return 0;
59
Shielding Users
• Users do not want to deal with XML; they want their data
• Users do not want to deal with configuring grid computing; they want results
• SOAP: data appears in user memory; the XML is invisible
• A SOAP call is just a remote procedure call
60
Atomic Services
• Metadata information about resources– Waveband– Sky coverage– Translation of names to universal dictionary (UCD)
• Simple search patterns on the resources– Cone Search– Image mosaic– Unit conversions
• Simple filtering, counting, histogramming
• On-the-fly recalibrations
61
Higher Level Services
• Built on atomic services
• Perform more complex tasks
• Examples:
  – Automated resource discovery
  – Cross-identifications
  – Photometric redshifts
  – Outlier detection
  – Visualization facilities
• Expectation: build custom portals in a matter of days from existing building blocks (like today in IRAF or IDL)
62
SkyQuery
• Distributed query tool using a set of web services
• Feasibility study, built in 6 weeks from scratch
  – Tanu Malik (JHU CS grad student)
  – Tamas Budavari (JHU astro postdoc)
• Implemented in C# and .NET
• Won 2nd prize in the Microsoft XML Contest
• Allows queries like:

SELECT o.objId, o.r, o.type, t.objId
FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
WHERE XMATCH(o,t) < 3.5
  AND AREA(181.3, -0.76, 6.5)
  AND o.type = 3 AND (o.i - t.m_j) > 2
63
Architecture
[Diagram: Web Page → SkyQuery → SkyNodes (SDSS, 2MASS, FIRST) + Image Cutout service]
64
Cross-id Steps
• Parse query
• Get counts
• Sort by counts
• Make plan
• Cross-match, recursively, from small to large
• Select necessary attributes only
• Return output
• Insert cutout image

SELECT o.objId, o.r, o.type, t.objId
FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
WHERE XMATCH(o,t) < 3.5
  AND AREA(181.3, -0.76, 6.5)
  AND (o.i - t.m_j) > 2 AND o.type = 3
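At heart, XMATCH is an angular-distance cut between catalogs. A toy sketch of that cut (NOT SkyQuery's actual algorithm, and the coordinates below are invented for illustration), matching two object lists within 3.5 arcseconds:

```python
import math

def ang_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation of two (ra, dec) points: degrees in, arcsec out."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cos_sep = (math.sin(d1) * math.sin(d2)
               + math.cos(d1) * math.cos(d2) * math.cos(r1 - r2))
    return math.degrees(math.acos(min(1.0, cos_sep))) * 3600

# Toy (ra, dec) rows; values are made up.
sdss = [(184.028935, -1.125910)]
twomass = [(184.028940, -1.125905), (184.5, -1.0)]

matches = [(o, t) for o in sdss for t in twomass
           if ang_sep_arcsec(*o, *t) < 3.5]
print(len(matches))  # 1: only the first 2MASS object is within 3.5 arcsec
```

A real SkyNode avoids the all-pairs loop by using a spatial index, which is exactly the "use a spatial index to cluster & find objects" point from slide 3.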
65
Show Cutout Web Service