Federated Data Stores: Volume, Velocity & Variety
Future of Big Data Management Workshop, Imperial College London, June 27-28, 2013
Andrew Hanushevsky, SLAC, http://xrootd.org


Page 2: Big Data Access & The 3 V’s

  • Volume: increasing amount of data
    No single site can host all of the data
  • Velocity: increasing number of analysis jobs
    No single site can run all of the jobs
  • Variety: increasing number of sites
    Introduces many different storage systems

Page 3: Data & Access & The World

  • Data: many places
    Complete subsets, sometimes not
  • Compute: many places
    Data co-located, sometimes not

Data is distributed, and often replicated, largely driven by computational needs.

Page 4: Multiple Sites – Unified View

Reality check…
  • Multiple sites
  • Different administrative domains

How to logically combine all the storage?
  • Provide storage access across multiple sites
  • Requires a minimal set of rules
    - Intersecting security model
    - Promise of minimal service
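The slide does not define "intersecting security model". One reading is that a federation can rely only on security mechanisms that every member site accepts. The minimal Python sketch below computes that intersection. The site names and their policies are hypothetical, and the mechanism names are borrowed from the plug-in slide. This is not how XRootD security is actually configured.

```python
from typing import Dict, Set

# Hypothetical sites; mechanism names are taken from the plug-in slide.
SITE_AUTH: Dict[str, Set[str]] = {
    "site-a": {"krb5", "x.509"},
    "site-b": {"x.509", "sss"},
    "site-c": {"krb5", "sss", "x.509"},
}


def common_mechanisms(site_auth: Dict[str, Set[str]]) -> Set[str]:
    """Mechanisms accepted by every site: the intersection of all site policies."""
    policies = list(site_auth.values())
    if not policies:
        return set()
    return set.intersection(*policies)


def client_can_federate(client: Set[str], site_auth: Dict[str, Set[str]]) -> bool:
    """True if the client supports at least one mechanism that every site accepts."""
    return bool(client & common_mechanisms(site_auth))


print(common_mechanisms(SITE_AUTH))              # {'x.509'}
print(client_can_federate({"krb5"}, SITE_AUTH))  # False
```

The intersection shrinks as sites are added, which is one way to read the slide's emphasis on keeping the required rule set minimal.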

Page 5: Data Storage Federations

“A collection of disparate space resources managed by co-operating but independent administrative domains transparently accessible via a common name space.”
  • Unifies storage access
  • Independent of data and compute location
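The minimal Python sketch below makes "transparently accessible via a common name space" concrete. Clients use only a logical file name, and each site maps that name onto its own storage prefix. The class, site names, paths, and the `site:path` notation are all hypothetical; this is not XRootD's own name-mapping mechanism.

```python
from typing import Dict, List, Set


class GlobalNamespace:
    """Toy common name space: one logical file name (LFN), many site-local paths.

    Each site keeps its own local prefix (administrative autonomy); clients
    only ever see the logical name.
    """

    def __init__(self) -> None:
        self._prefix: Dict[str, str] = {}
        self._holdings: Dict[str, Set[str]] = {}

    def add_site(self, site: str, local_prefix: str) -> None:
        self._prefix[site] = local_prefix.rstrip("/")
        self._holdings[site] = set()

    def register(self, site: str, lfn: str) -> None:
        """Record that `site` holds a copy of the logical file `lfn`."""
        self._holdings[site].add(lfn)

    def resolve(self, lfn: str) -> List[str]:
        """Return a site-qualified physical path for every replica of `lfn`."""
        return [
            f"{site}:{self._prefix[site]}{lfn}"
            for site in sorted(self._holdings)
            if lfn in self._holdings[site]
        ]


ns = GlobalNamespace()
ns.add_site("siteA", "/data/xrd/")
ns.add_site("siteB", "/lustre/exp")
for site in ("siteA", "siteB"):
    ns.register(site, "/store/run1/f.root")
print(ns.resolve("/store/run1/f.root"))
# ['siteA:/data/xrd/store/run1/f.root', 'siteB:/lustre/exp/store/run1/f.root']
```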

Page 6: A Solution Using XRootD

A system for scalable cluster data access
  • Not a file system
  • Not just for file systems, to handle variety
  • Used in HEP and Astrophysics

[Diagram: a node running the xrootd (data server) and cmsd (cluster management) daemons]

Page 7: XRootD Synergistic Approach

  • Minimize latency
  • Minimize hardware requirements
  • Minimize human cost
  • Maximize scaling
  • Maximize utility

[Diagram: these goals together address Velocity, Volume, and Variety]

Page 8: Variety Via Plug-In Architecture

  • Storage System: HDFS, gpfs, Lustre, UFS, …
  • Authentication: krb5, sss, x.509, …
  • Clustering: cmsd
  • Authorization: entity names
  • Logical File System: dpm, sfs, sql, …
  • Protocol: cms, http, xroot, …
  • Protocol Driver: any n protocols
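The plug-in idea can be sketched in Python as an abstract interface per layer, plus a registry from which configuration picks an implementation by name. This is a hypothetical illustration: `StorageSystem`, `REGISTRY`, `register`, and `load` are invented names, and XRootD's real plug-ins are C++ shared libraries, not this code.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict


class StorageSystem(ABC):
    """Interface a storage plug-in must implement (hypothetical)."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class InMemoryStorage(StorageSystem):
    """Toy backend standing in for HDFS, GPFS, Lustre, UFS, and so on."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._files[path]

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data


# category -> plug-in name -> factory; categories mirror the slide's layers
REGISTRY: Dict[str, Dict[str, Callable[[], Any]]] = {}


def register(category: str, name: str, factory: Callable[[], Any]) -> None:
    REGISTRY.setdefault(category, {})[name] = factory


def load(category: str, name: str) -> Any:
    """Instantiate the plug-in that configuration selects for a layer."""
    factory = REGISTRY.get(category, {}).get(name)
    if factory is None:
        raise LookupError(f"no {category} plug-in named {name!r}")
    return factory()


register("storage", "mem", InMemoryStorage)
store = load("storage", "mem")
store.write("/a", b"hello")
print(store.read("/a"))  # b'hello'
```

Callers depend only on the interface. Supporting another storage system then means registering one more factory, with no change to the code that uses it.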

Page 9: Volume Via B64 Scaling

[Diagram: a tree of xrootd/cmsd node pairs. A Manager (root node) sits above Supervisors (interior nodes), which sit above Data Servers (leaf nodes); each level multiplies capacity by 64. Labels: Private Cluster, GCE Ephemeral Storage, SLAC.]

  $64^1 = 64$
  $64^2 = 4096$
  $64^3 = 262144$
  $64^4 = 16777216$
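The $64^n$ labels describe a tree with a fan-out of 64: one manager, as many supervisor levels as needed, and data servers at the leaves. The Python sketch below does that arithmetic, assuming each manager or supervisor accepts at most 64 subscribers. The function names are illustrative only.

```python
FANOUT = 64  # assumed maximum subscribers per manager or supervisor


def max_data_servers(depth: int) -> int:
    """Leaf data servers reachable below one manager through `depth` levels."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    return FANOUT ** depth


def levels_needed(n_servers: int) -> int:
    """Smallest depth whose 64-way tree has room for n_servers leaves."""
    if n_servers < 1:
        raise ValueError("need at least one data server")
    depth, capacity = 1, FANOUT
    while capacity < n_servers:
        depth += 1
        capacity *= FANOUT
    return depth


def supervisors_needed(n_servers: int) -> int:
    """Interior nodes needed between the manager and n_servers leaves,
    packing each parent with as many as 64 children."""
    total, width = 0, n_servers
    while width > FANOUT:
        width = -(-width // FANOUT)  # ceiling division: parents for this level
        total += width
    return total


for d in range(1, 5):
    print(d, max_data_servers(d))  # 64, 4096, 262144, 16777216 as on the slide
print(levels_needed(5000), supervisors_needed(5000))  # 3 81
```

Because capacity grows geometrically, each added level of supervisors multiplies the reachable servers by 64, while the number of redirect hops grows by only one.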

Page 10: WYSIWYG Scalable Access

[Diagram: the client's open() goes to the manager, which redirects it; the redirected open() goes to a supervisor (one of $64^1 = 64$), which redirects it again to a data server (one of $64^2 = 4096$).]

Request routing is very different from traditional data management models.
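The redirect pattern can be sketched in Python as a client that keeps re-issuing open() at whichever node it was redirected to, until a data server accepts. The `Node` and `client_open` names are hypothetical. The subtree search in `has()` is a simplification that stands in for however the cluster actually locates a file.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    """Managers/supervisors have children; data servers hold files."""
    name: str
    children: List["Node"] = field(default_factory=list)
    files: Set[str] = field(default_factory=set)

    def has(self, path: str) -> bool:
        # Simplified location lookup: search this node's subtree.
        return path in self.files or any(c.has(path) for c in self.children)

    def open(self, path: str) -> Tuple[str, Optional["Node"]]:
        """('ok', None) if this node serves the file, else ('redirect', child)."""
        if path in self.files:
            return "ok", None
        for child in self.children:
            if child.has(path):
                return "redirect", child
        raise FileNotFoundError(path)


def client_open(root: Node, path: str, max_hops: int = 16) -> Tuple[str, int]:
    """Follow redirects from the root until a data server accepts the open()."""
    node, hops = root, 0
    while hops <= max_hops:
        status, target = node.open(path)
        if status == "ok":
            return node.name, hops
        node, hops = target, hops + 1
    raise RuntimeError("too many redirects")


leaves = [Node(f"ds{i}") for i in range(4)]
leaves[3].files.add("/store/f.root")
mgr = Node("manager", children=[Node("sup0", leaves[:2]), Node("sup1", leaves[2:])])
print(client_open(mgr, "/store/f.root"))  # ('ds3', 2)
```

Data never flows through the manager or supervisors. They only hand the client a new destination, and the client then reads directly from the data server.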

Page 11: Real World Example (HEP): Federated ATLAS XRootD (FAX)

Independent sites federated by region

[Diagram: a global redirector above regional1 (endpoint1, endpoint2) and regional2 (endpoint3), with links labeled a, b, and c, where $c = \max(a, b)$. Graphic courtesy of Rob Gardner.]
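One plausible reading of $c = \max(a, b)$ is that the global redirector queries its regional redirectors in parallel. A lookup then costs about as much as the slowest branch, not the sum of all branches. The Python sketch below works under that assumption. The per-node latencies are made up, and `Redirector` is a hypothetical type.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Redirector:
    name: str
    local_ms: float = 0.0  # hypothetical time spent at this node
    children: List["Redirector"] = field(default_factory=list)


def parallel_lookup_ms(node: Redirector) -> float:
    """Children queried concurrently: own time plus the slowest child."""
    return node.local_ms + max((parallel_lookup_ms(c) for c in node.children), default=0.0)


def sequential_lookup_ms(node: Redirector) -> float:
    """Children queried one after another: own time plus every child."""
    return node.local_ms + sum(sequential_lookup_ms(c) for c in node.children)


regional1 = Redirector("regional1", 5.0, [Redirector("endpoint1", 10.0), Redirector("endpoint2", 20.0)])
regional2 = Redirector("regional2", 8.0, [Redirector("endpoint3", 15.0)])
root = Redirector("global", 0.0, [regional1, regional2])

a, b = parallel_lookup_ms(regional1), parallel_lookup_ms(regional2)
print(a, b, parallel_lookup_ms(root))  # 25.0 23.0 25.0, i.e. c = max(a, b)
print(sequential_lookup_ms(root))      # 58.0
```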

Page 12: ATLAS FAX Infrastructure (From Rob Gardner)

  • Provides a global namespace
  • Unifies dCache, DPM, Lustre/GPFS, and Xrootd storage backends
  • Xrootd: an efficient protocol for WAN access
  • Main use case, fall-back, is in production at many sites
  • Regional redirection network provides lookup scalability

A powerful capability which must be introduced to production carefully.
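The fall-back use case can be sketched in Python. A job first tries its site's own storage and, only if that open fails, reads the same logical file through the federation. Both open functions here are hypothetical stand-ins backed by dictionaries. In a real deployment the second would typically go through an XRootD `root://` URL that points at a redirector.

```python
from typing import Callable, Dict, Tuple


def open_with_fallback(
    path: str,
    open_local: Callable[[str], bytes],
    open_federated: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Try the site's own storage first; on failure read through the federation."""
    try:
        return "local", open_local(path)
    except OSError:
        # Local copy missing or storage unavailable: fall back to the federation.
        return "federation", open_federated(path)


LOCAL: Dict[str, bytes] = {"/store/a.root": b"A"}
FEDERATION: Dict[str, bytes] = {"/store/a.root": b"A", "/store/b.root": b"B"}


def open_local(path: str) -> bytes:
    if path not in LOCAL:
        raise FileNotFoundError(path)
    return LOCAL[path]


def open_federated(path: str) -> bytes:
    if path not in FEDERATION:
        raise FileNotFoundError(path)
    return FEDERATION[path]


print(open_with_fallback("/store/b.root", open_local, open_federated))
# ('federation', b'B')
```

The local path stays the fast, common case. The federation absorbs only the failures, which is one way to introduce it to production carefully.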

Page 13: HEP Deployment

  • LHC ALICE: data-catalog-driven federation
  • LHC ATLAS: regional topology
  • LHC CMS: uniform topology
  • LSST (Large Synoptic Survey Telescope): clusters MySQL servers for parallel queries
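The LSST entry applies clustering to databases rather than files: one query runs on many servers at once. Below is a scatter-gather sketch of that idea in Python. `sqlite3` in-memory databases stand in for MySQL servers, and the `obj` table is made up. The sketch illustrates the pattern only and is not LSST's actual query system.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def make_shard(rows: List[Tuple[int, float]]) -> sqlite3.Connection:
    """One in-memory database standing in for one MySQL server in the cluster."""
    # check_same_thread=False lets a worker thread use this connection
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE obj (id INTEGER PRIMARY KEY, mag REAL)")
    conn.executemany("INSERT INTO obj (id, mag) VALUES (?, ?)", rows)
    return conn


def shard_count(conn: sqlite3.Connection, max_mag: float) -> int:
    """Partial result computed entirely on one shard."""
    return conn.execute("SELECT COUNT(*) FROM obj WHERE mag < ?", (max_mag,)).fetchone()[0]


def parallel_count(shards: List[sqlite3.Connection], max_mag: float) -> int:
    """Scatter the same query to all shards concurrently, then gather by summing."""
    if not shards:
        return 0
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(shard_count, shards, [max_mag] * len(shards)))
    return sum(partials)


shards = [
    make_shard([(1, 20.5), (2, 23.1)]),
    make_shard([(3, 19.0), (4, 25.2), (5, 21.7)]),
]
print(parallel_count(shards, 22.0))  # 3
```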

Page 14: Conclusion

  • Federated storage is key for big data
    - Distributed management + uniform access
    - Preserves administrative autonomy
    - Inherently scalable
    - The whole is greater than the sum of its parts
  • XRootD provides flexible federation
    - Addresses volume, velocity, and variety, the three main big data challenges

Page 15: Acknowledgements

Current Software Contributors
  • ATLAS: Doug Benjamin, Patrick McGuigan
  • CERN: Lukasz Janyst, Andreas Peters, Justin Salmon
  • Fermi: Tony Johnson
  • JINR: Danila Oleynik, Artem Petrosyan
  • Root: Gerri Ganis, Bertrand Bellenet, Fons Rademakers
  • SLAC: Andrew Hanushevsky, Wilko Kroeger, Daniel Wang, Wei Yang
  • UCSD: Matevz Tadel
  • UNL: Brian Bockelman
  • WLCG: Fabrizio Furano, David Smith

US Department of Energy Contract DE-AC02-76SF00515 with Stanford University