GRNET eScience platform for Big Data management · 2016-02-01 · λ lambda.grnet.gr 2 Simplifying...

GRNET eScience platform

for Big Data management

Codename: orka

Monday, February 1, 2016

Project Vision

• Data-Intensive Science (store and process big data, at Petabyte scale)

• Scientific workflows
• Virtual Research Environment
• Data streaming

• The problem: data deluge
• Solution:

– PaaS over:
  • ~okeanos (VM, processing)
  • Pithos+ (storage)

Big data

Hadoop project

• Most popular implementation of the MapReduce programming paradigm

• Open source, commodity hardware
• Hadoop core (MapReduce, Hadoop Distributed File System)

• Rich ecosystem (Pig, Hive, HBase, many more)
• Researcher focuses on the algorithm and not on installing, maintaining and scaling the software
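To make the MapReduce paradigm mentioned above concrete, here is a minimal, single-process sketch of a word count in plain Python. It does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and a real Hadoop job would distribute these phases across cluster nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data at scale", "big clusters process big data"])
```

The researcher only writes the mapper and the reducer; grouping, distribution and fault tolerance are what the framework provides.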

Hadoop cluster with ~orka

• GUI, CLI, REST on top of ~okeanos to:
– Create cluster (with configurable options) from a range of Hadoop distros (aka images)
– Transfer your data
– Submit, execute, monitor jobs
– Delete cluster
– Start/stop/format cluster
– Scale cluster, add/remove nodes
– Save cluster creation metadata for reproducibility

Hadoop cluster with ~orka

Add-ons to basic Hadoop

• Other components & runtimes
– Spark

• Apache Hadoop-based distros
– Cloudera
– Hue (HDFS explorer, Oozie web editor)

• Storage backend
– Pithos ↔ HDFS connector (analogous to the Amazon S3 filesystem for Hadoop)

Scientific Workflows

• Orchestration of atomic jobs
• Apache Oozie
• Apache Pig

– Built into orka images

Collaborative scientific research

• Virtual Research Environment
• Complete system for teams and projects
• Components:

– Research/Project homepage (portal, wiki)
– Project management
– Teleconference
– Digital repositories

• Implemented as Docker images

Virtual Research Environment

Category               Software stack
Portal / CMS           Drupal (v7.37)
Wiki, blog, forum      Mediawiki (v1.2.4)
Project management     Redmine (v3.04)
Web conferencing       BigBlueButton (v0.81)
Digital repositories   DSpace (v5.3)

Reproducible Research

• Save your experiment's metadata as a bundle
• Domain Specific Language (DSL) that fully describes an experiment/job

• Text editor => simple YAML file
• Re-play, possibly with different parameters
• Save bundle to Pithos
• Share your bundle with other ~okeanos users
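The re-play idea above can be sketched as follows. This is only an illustration: the slides do not show the orka DSL, so the experiment keys below (`cluster`, `job`, `nodes`, and so on) and the `replay` helper are hypothetical, and the description is written as a Python dict standing in for a loaded YAML file.

```python
import copy

# Hypothetical experiment description, as it might look once a YAML
# bundle has been loaded into a dict. None of these keys come from orka.
experiment = {
    "cluster": {"image": "hadoop-base", "nodes": 4},
    "job": {"name": "wordcount", "args": {"input": "in/", "output": "out/"}},
}


def replay(bundle, overrides=None):
    """Return a copy of the bundle with dotted-path overrides applied.

    The saved bundle itself is never modified, so the original
    experiment stays reproducible.
    """
    result = copy.deepcopy(bundle)
    for path, value in (overrides or {}).items():
        *parents, leaf = path.split(".")
        node = result
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return result


# Re-play the same experiment on a larger cluster, writing elsewhere.
rerun = replay(experiment, {"cluster.nodes": 8, "job.args.output": "out-8/"})
```

Keeping the saved description immutable and applying parameter changes to a copy is what lets a shared bundle be re-run by other users with their own settings.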

Data streams into HDFS

• Apache Flume
• Integrated into the Hadoop ecosystem
• Focus on streaming data

High-level Architecture

Technology Stack eScience

Subsystem 1 [Orka 0.1.1]

Orka Subsystem: Technologies Overview

Front-End

Single Page Application (SPA)
✓ HTML5
✓ CSS3
✓ EmberJS
✓ Bootstrap

Command Line (CLI) API
✓ Orka API (Python scripts)

Back-End

Web Server
✓ nginx

App Server
✓ uWSGI

REST API
✓ Django REST Framework

External APIs / Technologies
✓ Synnefo/kamaki
✓ Authentication
✓ Hadoop

Supported also (in progress)
✓ RabbitMQ message broker
✓ Celery task manager

Data
✓ Postgres DBMS

Currentstate

– github.com/grnet/e-science

– escience.grnet.gr

lambda on demand

λ lambda.grnet.gr

Simplifying Computing

The lambda architecture

• a useful framework for thinking about the design of big data applications

• a robust framework for ingesting real-time streams of data while providing efficient stream and batch analytics

• fault-tolerant against both hardware failures and human errors

• serves a wide range of use cases, including those in which low-latency reads and updates are required

λ: lambda architecture

The Lambda Architecture solves the problem of computing arbitrary functions on arbitrary data in real time by decomposing the problem into three layers: the batch layer, the serving layer, and the speed layer.

Batch layer

The batch layer has two functions: (i) managing the master dataset (an immutable, append-only set of raw data), and (ii) pre-computing arbitrary query functions, called batch views.

Serving layer

The serving layer indexes the batch views so that they can be queried in a low-latency, ad-hoc way.

Speed layer

The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only.

λ: lambda architecture, an example

[Diagram: the batch layer holds the master dataset; the serving layer holds the batch view; the speed layer holds the real-time views.]

1. Incoming data is dispatched to both the batch layer and the speed layer for processing.

2. The batch layer precomputes the batch views.

3. The serving layer indexes the batch views.

4. The speed layer deals with recent data only.

Any incoming query can be answered by merging results from batch views and real-time views.
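The flow above, including the merge at query time, can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions: the class name `LambdaSketch`, the page-view counting use case and the method names are all invented here, and a real deployment would use separate distributed systems for each layer.

```python
from collections import Counter


class LambdaSketch:
    """Toy page-view counter with batch, serving and speed layers."""

    def __init__(self):
        self.master_dataset = []        # immutable, append-only raw events
        self.batch_view = Counter()     # precomputed by the batch layer
        self.realtime_view = Counter()  # maintained by the speed layer

    def ingest(self, page):
        """Step 1: dispatch each event to both the batch and speed layers."""
        self.master_dataset.append(page)
        self.realtime_view[page] += 1

    def run_batch(self):
        """Steps 2-3: recompute the batch view from the whole master dataset.

        Once the batch view is up to date, the speed layer only has to
        cover data that arrives afterwards (step 4), so it is reset.
        """
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view = Counter()

    def query(self, page):
        """Answer a query by merging the batch view and the real-time view."""
        return self.batch_view[page] + self.realtime_view[page]


arch = LambdaSketch()
for page in ["home", "docs", "home"]:
    arch.ingest(page)
arch.run_batch()         # batch view now covers the first three events
arch.ingest("home")      # arrives after the batch run: speed layer only
total = arch.query("home")
```

Because the batch view is always rebuilt from the raw master dataset, a bug in the view logic can be fixed and the view recomputed, which is the sense in which the architecture tolerates human errors.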

Lambda on demand service

[Diagram: ~okeanos users, the Lambda on demand service, the λ API, λ instances, and the λ layers (speed, batch) of each instance.]

Provisioning a λ instance

Based on

λambda UI

Dashboard, Instances, Applications and Help

λ - Instances: manage lambda instances

Create your lambda instances based on your needs. Manage them, deploy applications and start your lambda instances.

Applications: manage your applications

Upload your Java or Scala applications for streaming and batch jobs. Your applications are stored on the Pithos+ storage service.

Help: informational guides

Short guides on how to: (i) deploy, run and manage your lambda instances; (ii) deploy, run and manage your applications; (iii) export and view your results.

Experienced User

Use the λambda API

λ - API

• lambda instance: create, manage, delete
• lambda applications: upload, manage, delete
• well documented: Swagger, mkdocs

e-science vs λ

Use the λambda API

Lambda λ: focuses on analysing streaming data

e-Science: focuses on existing data and offers pre-installed collaborative tools to handle data

Questions?
