The Data Logistics Toolkit
Martin Swany
Professor, School of Informatics and Computing
Executive Associate Director, Center for Research in Extreme Scale Computing (CREST)
Indiana University
The Data Logistics Toolkit
• Logistics: the management of the flow of resources from the point of origin to the point of consumption
• The DLT integrates local and distributed storage infrastructure, file transfer software, and performance monitoring and tuning
• The DLT software distribution supports the creation of network-optimized data nodes
DLT Overview
• Set of packages with configuration scripts, etc.
• Allows the configuration of
– DTN with GridFTP
– IBP storage depot for content distribution
– Phoebus WAN accelerator
– On-ramp for Internet2 AL2S using XSP
• Includes Periscope/perfSONAR monitoring
• Automatic network tuning
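Host tuning for wide-area transfers usually starts from the bandwidth-delay product (BDP): the number of bytes that must be in flight to keep a path full. The sketch below shows that calculation and turns it into Linux buffer limits. It is an assumption about the kind of calculation involved, not the DLT's own tuning script. The 2x headroom factor and the min/default values in `tcp_rmem`/`tcp_wmem` are illustrative choices.

```python
"""Sketch: sizing TCP buffers from the bandwidth-delay product (BDP).

Assumption: illustrates the kind of calculation host tuning relies on;
this is not the DLT's own tuning script.
"""


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to keep a path of this bandwidth and RTT full."""
    return bandwidth_bps * rtt_ms // (8 * 1000)


def sysctl_lines(bandwidth_bps: int, rtt_ms: int, headroom: int = 2) -> list[str]:
    """Suggest Linux buffer limits; the default 2x headroom factor is an assumption."""
    max_buf = bdp_bytes(bandwidth_bps, rtt_ms) * headroom
    return [
        f"net.core.rmem_max = {max_buf}",
        f"net.core.wmem_max = {max_buf}",
        # tcp_rmem/tcp_wmem take "min default max"; min/default here are common stock values
        f"net.ipv4.tcp_rmem = 4096 87380 {max_buf}",
        f"net.ipv4.tcp_wmem = 4096 65536 {max_buf}",
    ]


if __name__ == "__main__":
    # A 10 Gbit/s path with a 50 ms round-trip time
    print(bdp_bytes(10_000_000_000, 50))
    for line in sysctl_lines(10_000_000_000, 50):
        print(line)
```

At 10 Gbit/s and 50 ms RTT the BDP is about 62.5 MB, far above typical default socket buffer limits. This is why untuned hosts fall well short of line rate on long paths.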
DTN with AL2S On-Ramp
• Working with the Globus team at U. Chicago and Argonne
• Leveraging our eXtensible Session Protocol (XSP) to create end-to-end “sessions”
– user-network interface (UNI)
• XSP daemon acts as network controller
– signals AL2S/OESS, OSCARS, OpenFlow
• GridFTP XIO driver, updating to use the Globus Transfer Network Controller API
• Generic, transparent on-ramp to circuit networks like AL2S
WAN Acceleration
• A key reason the Science DMZ model “works” is the separation of lossy access networks from high-bandwidth, long-latency links
• Terminating TCP connections at “middleboxes” splits the path into shorter segments, which can increase throughput by reducing the RTT each connection sees
• Protocol translation
• Storage in the network to buffer and burst
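One way to see why splitting a TCP connection helps is the Mathis et al. steady-state approximation, rate ≈ (MSS / RTT) · (C / √p). This model is swapped in here for illustration and is not taken from the slides. The sketch compares one connection across a lossy access segment plus a clean WAN segment against one connection per segment relayed by a middlebox. The RTT and loss figures are assumptions, and middlebox buffering is assumed to be large enough.

```python
"""Sketch: why splitting a TCP connection at a middlebox can raise throughput.

Uses the Mathis et al. approximation
    rate <= (MSS / RTT) * (C / sqrt(p)),  C = sqrt(3/2)
as a stand-in model; the RTT and loss figures below are illustrative assumptions.
"""
import math

MSS_BITS = 1460 * 8
C = math.sqrt(1.5)


def mathis_bps(rtt_s: float, loss: float) -> float:
    """Approximate loss-limited TCP throughput in bits per second."""
    return (MSS_BITS / rtt_s) * (C / math.sqrt(loss))


def end_to_end_bps(segments: list[tuple[float, float]]) -> float:
    """One TCP connection spanning every segment: RTTs add, losses compound."""
    rtt = sum(r for r, _ in segments)
    delivered = math.prod(1 - p for _, p in segments)
    return mathis_bps(rtt, 1 - delivered)


def split_bps(segments: list[tuple[float, float]]) -> float:
    """One TCP connection per segment, relayed by middleboxes: the slowest hop wins."""
    return min(mathis_bps(r, p) for r, p in segments)


if __name__ == "__main__":
    # (RTT in seconds, packet loss rate): lossy campus access, then a clean WAN path
    path = [(0.005, 1e-4), (0.075, 1e-7)]
    print(f"end-to-end: {end_to_end_bps(path) / 1e6:.1f} Mbit/s")
    print(f"split:      {split_bps(path) / 1e6:.1f} Mbit/s")
```

Under these assumed numbers, the split case keeps the access-network loss inside a short-RTT control loop. The long WAN segment then only has to cope with its own, much lower loss rate.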
Distributed Storage for Content Distribution
• IBP provides a primitive, scalable, in-network storage service
• File-like abstractions can be built on top of this
• Uses a data structure known as an exNode (like a Unix inode) to track allocations
• These basic building blocks can be used to build various instances
– Parallel filesystem
– Distributed RAID-like storage
– Content distribution network
– BitTorrent-like peer-to-peer transfers
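The exNode plays the role of an inode: it records which byte extents of a file live in which storage allocations, possibly with several replicas per extent. The sketch below is loosely modeled on that idea. The field names (`depot`, `read_cap`), the hostnames, and the lookup methods are hypothetical and do not reproduce the real exNode serialization or any IBP client API.

```python
"""Sketch of an exNode-like structure: a file's byte extents mapped onto
IBP-style allocations held on different depots.

Field names (depot, read_cap) and the lookup logic are hypothetical
illustrations, not the real exNode serialization or an IBP client API.
"""
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mapping:
    offset: int      # first byte of the file stored in this allocation
    length: int      # number of bytes stored
    depot: str       # depot holding the allocation (hypothetical hostname)
    read_cap: str    # opaque capability used to read the allocation

    @property
    def end(self) -> int:
        return self.offset + self.length


@dataclass
class ExNode:
    size: int
    mappings: list[Mapping] = field(default_factory=list)

    def add(self, m: Mapping) -> None:
        self.mappings.append(m)

    def covering(self, start: int, stop: int) -> list[Mapping]:
        """All allocations (including replicas) overlapping bytes [start, stop)."""
        return sorted(
            (m for m in self.mappings if m.offset < stop and m.end > start),
            key=lambda m: (m.offset, m.depot),
        )

    def gaps(self) -> list[tuple[int, int]]:
        """Byte ranges of the file that no allocation currently holds."""
        holes, pos = [], 0
        for m in sorted(self.mappings, key=lambda m: m.offset):
            if m.offset > pos:
                holes.append((pos, m.offset))
            pos = max(pos, m.end)
        if pos < self.size:
            holes.append((pos, self.size))
        return holes


if __name__ == "__main__":
    x = ExNode(size=300)
    x.add(Mapping(0, 100, "depot-a.example.org", "cap-a0"))
    x.add(Mapping(0, 100, "depot-b.example.org", "cap-b0"))  # replica of the first extent
    x.add(Mapping(200, 100, "depot-c.example.org", "cap-c2"))
    print([m.depot for m in x.covering(50, 60)])
    print(x.gaps())
```

The same structure can back the instances listed above. Striping extents across depots gives parallel or RAID-like layouts. Replicating extents on depots near readers gives content distribution.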
Architecture
• Unified Network Information Service (UNIS)
– Descendant of perfSONAR Lookup and Topology Services
– Network and service “graph”
• Intelligent Data Movement Service (IDMS)
– Data dispatcher
– Operates on UNIS data
– Spawns storage services dynamically in GENI
• Periscope/perfSONAR
– Monitoring for operational integrity and optimization, BLiPP
• Storage Services
– IBP, prototype based on Ceph
• Other services
– Data transfer (GridFTP), WAN acceleration
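A data dispatcher working on a network and service graph can, for example, pick the depots closest to a client by path latency. The sketch below shows that idea with Dijkstra's algorithm over a small undirected graph. The node names, latencies, symmetric links, and nearest-k policy are all hypothetical. Real UNIS records and IDMS placement policies carry much more information.

```python
"""Sketch: a dispatcher choosing depots from a network graph, in the spirit of
IDMS operating on UNIS topology data.

The graph, node names, latencies, and nearest-k policy are hypothetical.
"""
import heapq


def build_graph(links: list[tuple[str, str, float]]) -> dict[str, dict[str, float]]:
    """Adjacency map of link latencies (ms); links are assumed symmetric."""
    graph: dict[str, dict[str, float]] = {}
    for a, b, ms in links:
        graph.setdefault(a, {})[b] = ms
        graph.setdefault(b, {})[a] = ms
    return graph


def shortest_latency(graph: dict[str, dict[str, float]], src: str) -> dict[str, float]:
    """Dijkstra over link latencies from src to every reachable node."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def pick_depots(graph: dict[str, dict[str, float]], depots: set[str],
                client: str, k: int) -> list[str]:
    """Return the k reachable depots with the lowest path latency from the client."""
    dist = shortest_latency(graph, client)
    reachable = [d for d in depots if d in dist]
    return sorted(reachable, key=lambda d: (dist[d], d))[:k]


if __name__ == "__main__":
    g = build_graph([("client", "r1", 2), ("r1", "depot-a", 10), ("r1", "r2", 5),
                     ("r2", "depot-b", 3), ("r2", "depot-c", 30)])
    print(pick_depots(g, {"depot-a", "depot-b", "depot-c"}, "client", 2))
```

Link latency is only one possible weight. Measurements from perfSONAR-style monitoring, such as achievable throughput, could be substituted without changing the search.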
Earth Observation Depot Network (EODN): An open, community-specific content distribution network for remote sensing data
Landsat data
• Landsat 8 launched February 11, 2013
• Covers the entire land surface of the Earth every 16 days
– 8-day offset from Landsat 7
– ~700 scenes each day
• Each scene contains a GeoTIFF product: high-resolution sensor images
– ~1 GB compressed, ~2 GB uncompressed
• Traditionally used for environmental monitoring and for land use and land cover change studies
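The figures above imply a steady data volume that a distribution network has to absorb. This back-of-envelope sketch uses only the slide's approximate numbers: ~700 scenes per day, ~1 GB compressed and ~2 GB uncompressed per scene, and a 16-day cycle. It assumes decimal units (1 GB = 10^9 bytes).

```python
"""Back-of-envelope Landsat 8 data volumes from the slide's approximate figures.

Assumes decimal units (1 GB = 1e9 bytes, 1 TB = 1000 GB).
"""
SCENES_PER_DAY = 700
GB_COMPRESSED = 1
GB_UNCOMPRESSED = 2
REVISIT_DAYS = 16


def daily_gb(gb_per_scene: float) -> float:
    """Data volume acquired per day, in GB."""
    return SCENES_PER_DAY * gb_per_scene


def sustained_mbit_s(gb_per_day: float) -> float:
    """Average rate needed to move one day's volume within 24 hours."""
    return gb_per_day * 8e3 / 86_400  # GB -> Mbit, divided by seconds per day


if __name__ == "__main__":
    for label, size in (("compressed", GB_COMPRESSED), ("uncompressed", GB_UNCOMPRESSED)):
        per_day = daily_gb(size)
        print(f"{label}: {per_day} GB/day, "
              f"{per_day * REVISIT_DAYS / 1e3:.1f} TB per {REVISIT_DAYS}-day cycle, "
              f"{sustained_mbit_s(per_day):.1f} Mbit/s sustained")
```

Compressed, that works out to roughly 700 GB per day, or about 11.2 TB per 16-day cycle. The average rate is modest. Interactive use depends on delivering individual ~1 GB scenes quickly on demand.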
EODN
Figure: EODN workflow. Components: Landsat Ground Network, EODN Harvester, EODN (DLT) depots at WISC, IU, NYSER and MIZZ, UNIS/DMS (discover / measure), RealEarth (UW-Madison), web GUI and client. Steps: (1) subscribe, (2) harvest, (3) stage sensing data, (4) publish, (5) fast download, (6) processing, (7) WMS upload.
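The EODN workflow pairs a subscribe step (1) with a publish step (4). Clients register interest in scenes, and they are notified once the harvester has staged matching data. The sketch below shows that publish/subscribe pattern using WRS-2 path/row tiles as the filter. All names (`Scene`, `Broker`, the `exnode_url` field, the example URLs) are hypothetical. The real EODN services and message formats may differ.

```python
"""Sketch of the subscribe/publish steps in the EODN workflow.

All names (Scene, Broker, path/row filters, exnode_url) are hypothetical;
the real EODN services and message formats may differ.
"""
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Scene:
    scene_id: str
    path: int        # WRS-2 path of the scene
    row: int         # WRS-2 row of the scene
    exnode_url: str  # where a client could fetch the staged exNode (hypothetical)


Callback = Callable[[Scene], None]


class Broker:
    def __init__(self) -> None:
        self._subs: list[tuple[set[tuple[int, int]], Callback]] = []

    def subscribe(self, tiles: set[tuple[int, int]], callback: Callback) -> None:
        """Step (1): a client asks to hear about scenes covering these path/row tiles."""
        self._subs.append((tiles, callback))

    def publish(self, scene: Scene) -> int:
        """Step (4): announce a staged scene; returns how many subscribers were notified."""
        notified = 0
        for tiles, callback in self._subs:
            if (scene.path, scene.row) in tiles:
                callback(scene)
                notified += 1
        return notified


if __name__ == "__main__":
    broker = Broker()
    inbox: list[str] = []
    broker.subscribe({(24, 29)}, lambda s: inbox.append(s.scene_id))
    broker.publish(Scene("scene-A", 24, 29, "http://depot.example.org/a.xnd"))
    broker.publish(Scene("scene-B", 100, 60, "http://depot.example.org/b.xnd"))
    print(inbox)
```

Filtering on path/row keeps notifications scoped to a client's area of interest. A client can then fetch only the scenes it needs in the fast download step (5).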
Cisco Appliance Platform
• In collaboration with Internet2, Cisco and Fusion-io
• Cisco C220 server
– 2x Intel® Xeon® E5-2680 (16 cores total, 2.7 GHz base clock), 64 GB DDR3 RAM
– Fusion-io ioDrive2 1.2 TB
• CentOS 6.4 Linux with DLT RPMs and tuning for data transfer throughput
Acknowledgements
• Staff Scientist Dr. Ezra Kissel leads the DLT development efforts and is PI of the GENI IDMS effort
• CC-NIE integration project with U. Tennessee and Vanderbilt U.
• CC-NIE integration project with the Globus team at U. Chicago and Argonne Nat’l Lab
• EODN development with AmericaView, U. Wisconsin