GRNET eScience platform for Big Data management Codename: orka Monday, February 1, 2016

Project Vision

• Data-Intensive Science (store and process big data, at Petabyte scale)
• Scientific workflows
• Virtual Research Environment
• Data streaming

Big data

• The problem: data deluge
• Solution:
  – PaaS over
    • ~okeanos (VM, processing)
    • Pithos+ (storage)

Hadoop project

• Most popular implementation for the MapReduce programming paradigm
• Open source, commodity hardware
• Hadoop core (MapReduce, Hadoop distributed file system)
• Rich ecosystem (Pig, Hive, HBase, many more)
• Researcher focuses on the algorithm and not the software install/maintain/scale etc.
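
To make the MapReduce paradigm concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases for a word count. It is plain Python, not Hadoop code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and a real Hadoop job would distribute these phases across the nodes of a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases over a collection of input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big science"]))
```

The researcher only writes the map and reduce functions; grouping, distribution and fault handling are what the framework takes over.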

Hadoop cluster with ~orka

• GUI, CLI, REST on top of ~okeanos to:
  – Create cluster (with configurable options) from a range of Hadoop distros (aka images)
  – Transfer your data
  – Submit, execute, monitor jobs
  – Delete cluster
  – Start/stop/format cluster
  – Scale cluster, add/remove nodes
  – Save cluster creation metadata for reproducibility

Hadoop cluster with ~orka

Add-ons to basic Hadoop

• Other components & runtimes
  – Spark
• Apache Hadoop-based distros
  – Cloudera
  – Hue (HDFS explorer, Oozie web editor)
• Storage backend
  – Pithos ↔ HDFS connector (analogous to Amazon S3 Filesystem for Hadoop)

Scientific Workflows

• Orchestration of atomic jobs
• Apache Oozie
• Apache Pig
  – Built-in in orka images
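
Orchestrating atomic jobs amounts to running them in an order that respects their dependencies. The sketch below illustrates that idea with Python's standard `graphlib` module (Python 3.9 or later). It is not Oozie or Pig code; the job names (`ingest`, `clean`, `aggregate`, `report`) and the `run_workflow` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_workflow(
    jobs: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
) -> List[str]:
    """Run each job after all of its dependencies; return the execution order."""
    # Give every job an entry so that independent jobs are scheduled too.
    graph = {name: depends_on.get(name, set()) for name in jobs}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        jobs[name]()  # each job is one atomic unit of work
    return order


if __name__ == "__main__":
    log: List[str] = []
    # Hypothetical jobs: each one only records that it ran.
    jobs = {
        name: (lambda n=name: log.append(n))
        for name in ("ingest", "clean", "aggregate", "report")
    }
    depends_on = {
        "clean": {"ingest"},
        "aggregate": {"clean"},
        "report": {"aggregate"},
    }
    print(run_workflow(jobs, depends_on))
```

A dependency cycle makes `TopologicalSorter` raise `graphlib.CycleError`, which mirrors the requirement that a workflow be an acyclic graph of jobs.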

Collaborative scientific research

• Virtual Research Environment
• Complete system for teams and projects
• Components:
  – Research/Project homepage (portal, wiki)
  – Project Management
  – Teleconference
  – Digital repositories
• Implemented as Docker images

Virtual Research Environment

Category               Software stack
Portal / CMS           Drupal (v7.37)
Wiki, blog, forum      MediaWiki (v1.2.4)
Project management     Redmine (v3.04)
Web conferencing       BigBlueButton (v0.81)
Digital repositories   DSpace (v5.3)

Reproducible Research

• Save your experiment's metadata as a bundle
• Domain Specific Language (DSL) that fully describes an experiment/job
• Text editor => simple YAML file
• Re-play, possibly with different parameters
• Save bundle to Pithos
• Share your bundle with other ~okeanos users
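
The re-play idea can be pictured as loading the saved description and overriding selected parameters before resubmitting. The following is a minimal sketch under stated assumptions: the bundle is shown as a Python dictionary standing in for the parsed YAML file, and every field name and value (`cluster`, `image`, `job`, `params`, `hadoop-base`, and so on) as well as the `replay` helper are hypothetical, not orka's actual DSL schema.

```python
import copy
from typing import Any, Dict

# Hypothetical bundle, standing in for a parsed YAML experiment file.
# None of these field names or values come from orka's real schema.
BUNDLE: Dict[str, Any] = {
    "cluster": {"image": "hadoop-base", "nodes": 4},
    "job": {"type": "mapreduce", "params": {"input": "in/", "reducers": 2}},
}


def replay(bundle: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return a copy of the bundle with some job parameters replaced."""
    new_bundle = copy.deepcopy(bundle)  # the saved bundle stays untouched
    new_bundle["job"]["params"].update(overrides)
    return new_bundle


if __name__ == "__main__":
    rerun = replay(BUNDLE, reducers=8)
    print(rerun["job"]["params"])   # parameters for the new run
    print(BUNDLE["job"]["params"])  # original parameters, unchanged
```

Working on a deep copy keeps the stored bundle as the record of the original experiment, so every re-play starts from the same description.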

Data streams into HDFS

• Apache Flume
• Integrated into the Hadoop ecosystem
• Focus on streaming data

High-level Architecture

Technology Stack eScience
Subsystem 1 [Orka 0.1.1]
Orka SubSystem: Technologies Overview

Front-End
• Single Page Application (SPA): ✓ HTML5 ✓ CSS3 ✓ EmberJS ✓ Bootstrap
• Command Line (CLI) API: ✓ Orka API (Python scripts)

Back-End
• Web Server: ✓ nginx
• App Server: ✓ uWSGI
• REST API: ✓ Django REST Framework
• External APIs/Technologies: ✓ Synnefo/kamaki ✓ Authentication ✓ Hadoop
• Supported also (in progress): ✓ RabbitMQ message broker ✓ Celery task manager
• Data: ✓ Postgres DBMS

Current state

– github.com/grnet/e-science

– escience.grnet.gr

lambda on demand

λ

λ lambda.grnet.gr

Simplifying Computing: The lambda architecture

a) a useful framework to think about designing big data applications
b) a robust framework for ingesting real-time streams of data while providing efficient stream and batch analytics
c) fault-tolerant against both hardware failures and human errors
d) serves a wide range of use cases, including those in which low-latency reads and updates are required

λ: lambda architecture

The Lambda Architecture solves the problem of computing arbitrary functions on arbitrary data in real time by decomposing the problem into three layers: the batch layer, the serving layer, and the speed layer.

• Batch layer: has two functions: (i) managing the master dataset (an immutable, append-only set of raw data), and (ii) pre-computing arbitrary query functions, called batch views.
• Serving layer: indexes the batch views so that they can be queried in a low-latency, ad-hoc way.
• Speed layer: compensates for the high latency of updates to the serving layer and deals with recent data only.

λ: lambda architecture, an example

[Figure: incoming data feeds the batch layer (master dataset) and the speed layer; the serving layer holds the batch views, the speed layer holds the real-time views; queries read from both. Numbers 1 to 5 mark the steps below.]

1. Data is dispatched to both the batch layer and the speed layer for processing.
2. The batch layer precomputes the batch views.
3. The serving layer indexes the batch views.
4. The speed layer deals with recent data only.
5. Any incoming query can be answered by merging results from batch views and real-time views.
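
The five steps can be sketched in a few lines of Python. This is a toy, single-process illustration under stated assumptions, not the lambda.grnet.gr implementation: the `LambdaSketch` class, its method names and the page-view counting use case are all hypothetical, and a `Counter` stands in for the indexed batch and real-time views.

```python
from collections import Counter
from typing import List


class LambdaSketch:
    """Toy page-view counter organised along the three lambda layers."""

    def __init__(self) -> None:
        self.master: List[str] = []              # batch layer: append-only raw data
        self.batch_view: Counter = Counter()     # serving layer: indexed batch view
        self.realtime_view: Counter = Counter()  # speed layer: recent data only

    def ingest(self, page: str) -> None:
        """Step 1: dispatch new data to both the batch and the speed layer."""
        self.master.append(page)      # raw data is only ever appended
        self.realtime_view[page] += 1  # step 4: incremental update on recent data

    def run_batch(self) -> None:
        """Steps 2 and 3: recompute the batch view from the whole master dataset."""
        self.batch_view = Counter(self.master)
        # The batch view now covers everything ingested so far,
        # so the speed layer can discard what it had accumulated.
        self.realtime_view.clear()

    def query(self, page: str) -> int:
        """Step 5: answer a query by merging batch and real-time results."""
        return self.batch_view[page] + self.realtime_view[page]


if __name__ == "__main__":
    sketch = LambdaSketch()
    for page in ("home", "home", "docs"):
        sketch.ingest(page)
    sketch.run_batch()      # slow, periodic recomputation
    sketch.ingest("home")   # arrives after the batch run
    print(sketch.query("home"))
```

In a real deployment the batch run takes a long time and data keeps arriving while it executes, so the speed layer has to keep exactly the events the latest batch view has not yet absorbed; the sketch sidesteps this by running the batch step instantaneously.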

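Provisioning a λ instance

[Figure: ~okeanos users reach the "Lambda on demand" service through the λ API; the service manages λ instances, each made up of λ layers (speed, batch). Caption: "Based on" (the items it pointed to are not preserved).]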

λambda UI: Dashboard, Instances, Applications and Help

• λ Instances (manage lambda instances): Create your lambda instances based on your needs. Manage and start your lambda instances and deploy applications to them.
• Applications (manage your applications): Upload your Java or Scala application for streaming and batch jobs. Your applications are stored on the Pithos+ storage service.
• Help (informational guides): Short guides on how to i) deploy, run and manage your lambda instances, ii) deploy, run and manage your applications, iii) export and view your results.

Experienced User: Use the λambda API

λ API
• lambda instance: create, manage, delete
• lambda applications: upload, manage, delete

Well documented with Swagger and mkdocs.

e-science vs λ
Use the λambda API

Lambda (λ): focuses on analysing streaming data

e-Science: focuses on existing data and offers pre-installed collaborative tools to handle data

Questions?