Federated Data Stores: Volume, Velocity & Variety
Future of Big Data Management Workshop
Imperial College London, June 27-28, 2013
Andrew Hanushevsky, SLAC
http://xrootd.org
June 27-28, 2013, Workshop On the Future Of Big Data Management
Big Data Access & The 3 V’s
- Volume: increasing amount of data
  - No single site can host all of the data
- Velocity: increasing number of analysis jobs
  - No single site can host all of the jobs
- Variety: increasing number of sites
  - Introduces many different storage systems
Data & Access & The World
- Data: many places
  - Complete subsets (sometimes not)
- Compute: many places
  - Data co-located (sometimes not)
Data is distributed and often replicated, largely driven by computational needs.
Multiple Sites – Unified View
Reality check…
- Multiple sites
- Different administrative domains
How to logically combine all the storage?
- Provide storage access across multiple sites
- Requires a minimal set of rules
  - Intersecting security model
  - Promise of minimal service
Data Storage Federations
“A collection of disparate space resources managed by co-operating but independent administrative domains transparently accessible via a common name space.”
- Unifies storage access
- Independent of data and compute location
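To make the "common name space" idea concrete, the sketch below merges hypothetical per-site inventories into one logical view, where each logical file name maps to the sites holding a replica. The site names and paths are invented for illustration, and the static table is only a conceptual model: XRootD itself locates files on demand at lookup time rather than precomputing such a table.

```python
from typing import Dict, List, Set

# Hypothetical inventories: each independent administrative domain exports
# its own set of logical file names. Names are illustrative only.
SITE_INVENTORIES: Dict[str, Set[str]] = {
    "site_a": {"/atlas/run1/file1.root", "/atlas/run1/file2.root"},
    "site_b": {"/atlas/run1/file2.root", "/atlas/run2/file3.root"},
    "site_c": {"/atlas/run2/file3.root"},
}


def build_namespace(inventories: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Merge per-site inventories into one logical namespace.

    Each logical name maps to the sorted list of sites holding a replica;
    every site keeps its own storage, only the view is unified.
    """
    namespace: Dict[str, List[str]] = {}
    for site, names in inventories.items():
        for name in names:
            namespace.setdefault(name, []).append(site)
    return {name: sorted(sites) for name, sites in namespace.items()}


if __name__ == "__main__":
    ns = build_namespace(SITE_INVENTORIES)
    for name in sorted(ns):
        print(name, "->", ns[name])
```

Note that replication shows up naturally: `/atlas/run1/file2.root` resolves to two sites without either site giving up control of its storage.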
A Solution Using XRootD
- A system for scalable cluster data access
- Not a file system
- Not just for file systems
  - To handle variety
- Used in HEP and Astrophysics
(Diagram: a node running the xrootd and cmsd daemons)
May 15-17, 2013, Google I/O
XRootD Synergistic Approach
- Minimize latency
- Minimize hardware requirements
- Minimize human cost
- Maximize scaling
- Maximize utility
(Diagram labels: Velocity, Volume, Variety)
Variety Via Plug-In Architecture
- Storage System: HDFS, GPFS, Lustre, UFS, …
- Authentication: krb5, sss, X.509, …
- Clustering: cmsd
- Authorization: entity names
- Logical File System: dpm, sfs, sql, …
- Protocol: cms, http, xroot, …
- Protocol Driver: any n protocols
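The plug-in layering above can be sketched as a server that composes independently chosen components behind one access interface. This is a conceptual model only: the real XRootD plug-ins are C++ shared libraries selected through server configuration, while the class names, the in-memory backend, and the `"token"` authentication scheme here are hypothetical stand-ins.

```python
from typing import Callable, Dict, Protocol


class StorageSystem(Protocol):
    """Minimal interface a storage plug-in must provide (hypothetical)."""

    def read(self, path: str) -> bytes: ...


class InMemoryStorage:
    """Toy stand-in for a real backend such as a UFS or HDFS plug-in."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files

    def read(self, path: str) -> bytes:
        return self._files[path]


# Registry keyed by plug-in name; a real deployment picks plug-ins in its
# configuration, which this sketch replaces with a plain dictionary.
AUTHENTICATORS: Dict[str, Callable[[str], bool]] = {
    "token": lambda credential: credential == "valid-token",  # hypothetical scheme
}


class DataServer:
    """Composes a storage plug-in and an authentication plug-in."""

    def __init__(self, storage: StorageSystem, auth_name: str) -> None:
        self.storage = storage
        self.authenticate = AUTHENTICATORS[auth_name]

    def open_and_read(self, credential: str, path: str) -> bytes:
        if not self.authenticate(credential):
            raise PermissionError("authentication failed")
        return self.storage.read(path)


if __name__ == "__main__":
    server = DataServer(InMemoryStorage({"/data/a": b"hello"}), "token")
    print(server.open_and_read("valid-token", "/data/a"))  # b'hello'
```

Swapping the backend or the authentication scheme changes one constructor argument, not the server logic, which is the point of handling variety through plug-ins.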
Volume Via B64 Scaling
Figure: a tree of nodes, each running an xrootd/cmsd pair, with a Manager (root node), Supervisors (interior nodes), and Data Servers (leaf nodes). Labels: Private Cluster, GCE Ephemeral Storage, SLAC. Each level multiplies capacity by 64: 64^1 = 64, 64^2 = 4096, 64^3 = 262144, 64^4 = 16777216.
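The capacity figures in the B64 tree follow from a fan-out of 64 subordinates per manager or supervisor: a tree with n layers below the root addresses 64^n data servers. The sketch below reproduces that arithmetic and also inverts it to find how many layers a given number of servers needs; the function names are illustrative.

```python
FANOUT = 64  # each manager or supervisor clusters up to 64 subordinates


def max_data_servers(levels: int, fanout: int = FANOUT) -> int:
    """Leaf (data server) capacity of a tree with `levels` layers below the root."""
    return fanout ** levels


def levels_needed(servers: int, fanout: int = FANOUT) -> int:
    """Smallest number of layers below the root that can hold `servers` leaves."""
    if servers < 1:
        raise ValueError("need at least one data server")
    levels, capacity = 1, fanout
    while capacity < servers:
        levels += 1
        capacity *= fanout
    return levels


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"64^{n} = {max_data_servers(n)}")
    print(levels_needed(10_000))  # 4096 < 10000 <= 262144, so 3
```

Because capacity grows exponentially with depth, a lookup crosses only a handful of levels even at millions of servers.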
WYSIWYG Scalable Access
Figure: the client calls open() on the manager and is redirected to a supervisor; open() at the supervisor redirects it to the data server that holds the file (tree levels of 64^1 = 64 and 64^2 = 4096 nodes).
Request routing is very different from traditional data management models
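The redirect flow can be sketched as a client that keeps following redirects until it reaches a data server. Managers and supervisors only point the client onward; the bytes come from the leaf. All node and file names are hypothetical, and the recursive `can_reach` check is a simplification: roughly speaking, a real cmsd queries its subordinates and caches where files were found.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class DataServerNode:
    """Leaf node: actually holds files and serves their bytes."""
    name: str
    files: Dict[str, bytes]


@dataclass
class RedirectorNode:
    """Manager or supervisor: never serves data, only redirects the client."""
    name: str
    children: List[Union["RedirectorNode", DataServerNode]] = field(default_factory=list)


Node = Union[RedirectorNode, DataServerNode]


def can_reach(node: Node, path: str) -> bool:
    """True if `path` lives on this data server or anywhere below this redirector."""
    if isinstance(node, DataServerNode):
        return path in node.files
    return any(can_reach(child, path) for child in node.children)


def client_open(entry: Node, path: str) -> Tuple[bytes, List[str]]:
    """Follow redirects from the entry point until a data server answers.

    Returns the file bytes and the names of the nodes the client contacted.
    """
    node, hops = entry, [entry.name]
    while isinstance(node, RedirectorNode):
        # Pick the first subordinate that can reach the file (simplified lookup).
        target = next((c for c in node.children if can_reach(c, path)), None)
        if target is None:
            raise FileNotFoundError(path)
        node = target
        hops.append(node.name)  # each step is a redirect back to the client
    return node.files[path], hops


if __name__ == "__main__":
    ds1 = DataServerNode("ds1", {"/a": b"A"})
    ds2 = DataServerNode("ds2", {"/b": b"B"})
    manager = RedirectorNode("manager", [RedirectorNode("supervisor", [ds1, ds2])])
    print(client_open(manager, "/b"))  # (b'B', ['manager', 'supervisor', 'ds2'])
```

Unlike a proxy or a central catalog that streams data through itself, the redirectors here never touch file contents, so they stay lightweight as the tree grows.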
Real World Example (HEP): Federated ATLAS XRootD (FAX)
Independent sites federated by region
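Figure: a global redirector above regional redirectors (regional1, regional2); regional1 fronts endpoint1 and endpoint2, and regional2 fronts endpoint3. Annotations a, b, and c, with c = max(a, b). (Graphic courtesy of Rob Gardner)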
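Federating by region can be sketched as a two-stage lookup: try the client's own regional redirector first, and escalate to the global level only when the region misses. The topology mirrors the figure's labels, but the file paths and the lookup order among regions are assumptions made for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical topology following the figure: two regional redirectors,
# each fronting its own site endpoints, under one global redirector.
REGIONS: Dict[str, Dict[str, Set[str]]] = {
    "regional1": {"endpoint1": {"/d/x"}, "endpoint2": {"/d/y"}},
    "regional2": {"endpoint3": {"/d/z"}},
}


def find_in_region(region: str, path: str) -> Optional[str]:
    """Return the first endpoint in `region` that holds `path`, if any."""
    for endpoint, files in sorted(REGIONS[region].items()):
        if path in files:
            return endpoint
    return None


def regional_lookup(home_region: str, path: str) -> Optional[str]:
    """Try the home region first; escalate to the global level only on a miss."""
    endpoint = find_in_region(home_region, path)
    if endpoint is not None:
        return endpoint
    # Global redirector: search the other regions.
    for region in sorted(REGIONS):
        if region != home_region:
            endpoint = find_in_region(region, path)
            if endpoint is not None:
                return endpoint
    return None


if __name__ == "__main__":
    print(regional_lookup("regional1", "/d/y"))  # endpoint2 (found locally)
    print(regional_lookup("regional1", "/d/z"))  # endpoint3 (via global)
```

Keeping most lookups inside a region limits load on the global redirector, which is one way a regional network can provide lookup scalability.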
ATLAS FAX Infrastructure (From Rob Gardner)
- Provides a global namespace
- Unifies dCache, DPM, Lustre/GPFS, and XRootD storage backends
- XRootD: an efficient protocol for WAN access
- Main use case: fall-back, in production at many sites
- Regional redirection network provides lookup scalability
A powerful capability which must be introduced to production carefully
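A fall-back read, as generally understood, means a job first reads from its local site and goes through the federation only when the local read fails. The sketch below models both storage paths as hypothetical in-memory readers; in a real deployment the federated read would go over the WAN through a redirector.

```python
from typing import Callable, Dict


def read_with_fallback(
    path: str,
    local_read: Callable[[str], bytes],
    federation_read: Callable[[str], bytes],
) -> bytes:
    """Read from local site storage; on failure, fetch through the federation."""
    try:
        return local_read(path)
    except OSError:  # includes FileNotFoundError
        return federation_read(path)


def dict_reader(store: Dict[str, bytes]) -> Callable[[str], bytes]:
    """Build a hypothetical reader over an in-memory store."""
    def read(path: str) -> bytes:
        if path not in store:
            raise FileNotFoundError(path)
        return store[path]
    return read


if __name__ == "__main__":
    local = dict_reader({"/d/local": b"L"})
    federation = dict_reader({"/d/local": b"L", "/d/remote": b"R"})
    print(read_with_fallback("/d/remote", local, federation))  # b'R'
```

The job's code path stays the same whether the data was local or remote; only the failure case pays the WAN cost.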
HEP Deployment
- LHC ALICE: data-catalog-driven federation
- LHC ATLAS: regional topology
- LHC CMS: uniform topology
- LSST (Large Synoptic Survey Telescope): clusters MySQL servers for parallel queries
Conclusion
- Federated storage is key for big data
  - Distributed management + uniform access
  - Preserves administrative autonomy
  - Inherently scalable
  - The whole is greater than the sum of its parts
- XRootD provides flexible federation
  - Addresses volume, velocity, and variety
  - The three main big data challenges
Acknowledgements
Current Software Contributors
- ATLAS: Doug Benjamin, Patrick McGuigan
- CERN: Lukasz Janyst, Andreas Peters, Justin Salmon
- Fermi: Tony Johnson
- JINR: Danila Oleynik, Artem Petrosyan
- ROOT: Gerri Ganis, Bertrand Bellenot, Fons Rademakers
- SLAC: Andrew Hanushevsky, Wilko Kroeger, Daniel Wang, Wei Yang
- UCSD: Matevz Tadel
- UNL: Brian Bockelman
- WLCG: Fabrizio Furano, David Smith
US Department of Energy Contract DE-AC02-76SF00515 with Stanford University