Scaling and Managing Big Data Apps in the Cloud


Title: Scaling and Managing Big Data Apps on Public Clouds

Abstract: The massive computing and storage resources needed to support big data applications make on-demand, elastic cloud environments an ideal fit. However, managing a big data app on the cloud is no walk in the park: configuration, orchestration, high availability, and auto-scaling are all complex, and so is choosing the right cloud for you, whether public, private, or hybrid. This is where Cloudify and Eucalyptus come together. In this session, you'll learn how to deploy, manage, monitor, and scale your big data apps on the open source Eucalyptus cloud platform using Cloudify, as well as how to test drive your apps locally and then migrate the workload to Amazon Web Services EC2.


Big Data in the Cloud

@natishalom


About GigaSpaces

Managing Big Data on the Cloud

100s of Enterprise Customers

My Data Out of My Hands..

No Way!


The Reality of Big Data..

• 2.7 ZB: global digital data
• 0.5 petabytes: two years of tweets
• 66%: plan to use big data/cloud
• 43% think that data analytics could be improved in their organization if it were part of cloud services

Large ISV Case Study

• Application: call center surveillance
• Background: previously, voice data
• Goal for a new system
  – Monitor data & voice
  – Multiple data sources
  – Advanced correlations

The Challenges...

Ever Growing Data

Deeper Correlation

Tight Performance

A Classic Case for...

A Typical Big Data System…

The Challenge

Cost
• Infrastructure
• Operational

Business Impact
• Lower margins
• Competitiveness
• Time to market
• Customer satisfaction

The Solution: Big Data in the Cloud

Big Data in the Cloud - 3 Reasons

• Skills
  – Do you really need/want this all in-house?
• Huge amounts of external data
  – Does it make sense to move and manage all this data behind your firewall?
• Focus on the value of your data
  – Instead of big data management

Holger Kisker

Managing Big Data on the Cloud

• Auto start VMs
• Install and configure app components
• Monitor
• Repair
• (Auto) Scale
• Burst…

Big Data in the Cloud...

Reduce the Infrastructure Cost

Choose the Right Cloud for the Job

Use Eucalyptus for private data, AWS for sporadic workloads.

Big Data in the Cloud...

Reducing the Operational Complexity

• Consistent Management

• Automation Through the Entire Stack

Let’s Take a Closer Look…

Consistent Management

Portability

Automation


Consistent Management

Recipes: a consistent description for running any app:

• What middleware services to run
• Dependencies between services
• How to install services
• Where application and service binaries are
• When to spawn or terminate instances
• How to monitor each of the services
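The items a recipe describes can be modeled as plain data. The following is a minimal sketch in Python, not Cloudify's actual recipe DSL: the `ServiceSpec` class, its field names, and the install script names are hypothetical, and the Storm and ZooKeeper services are borrowed from the service catalog and demo later in this deck. It illustrates how declared dependencies between services determine the order in which they are started.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class ServiceSpec:
    """Hypothetical stand-in for one service entry in a recipe."""
    name: str
    install_script: str      # how to install the service
    min_instances: int = 1   # lower bound when terminating instances
    max_instances: int = 1   # upper bound when spawning instances
    depends_on: list = field(default_factory=list)  # dependencies between services


def start_order(services):
    """Return service names so that every dependency comes before its dependents."""
    graph = {s.name: set(s.depends_on) for s in services}
    return list(TopologicalSorter(graph).static_order())


# Illustrative recipe: a Storm cluster that needs ZooKeeper first.
recipe = [
    ServiceSpec("storm-supervisor", "supervisor_install.sh", 1, 4, ["storm-nimbus"]),
    ServiceSpec("storm-nimbus", "nimbus_install.sh", depends_on=["zookeeper"]),
    ServiceSpec("zookeeper", "zookeeper_install.sh"),
]

print(start_order(recipe))
```

Keeping the description declarative means the same data can drive installation, monitoring, and scaling without per-environment scripts.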

The Right Cloud for the Job (Cloud Portability)


Choosing the Right Cloud for the Job

The recipe refers to a compute template by name:

    compute {
        template "SMALL_LINUX"
    }

Template definition for Amazon EC2:

    SMALL_LINUX : template {
        imageId "us-east-1/ami-76f0061f"
        remoteDirectory "/home/ec2-user/gs-files"
        machineMemoryMB 1600
        hardwareId "m1.small"
        locationId "us-east-1"
        localDirectory "upload"
        keyFile "myKeyFile.pem"
        options ([
            "securityGroups" : ["default"] as String[],
            "keyPair" : "myKeyFile"
        ])
        overrides ([
            "jclouds.ec2.ami-query" : "",
            "jclouds.ec2.cc-ami-query" : ""
        ])
        privileged true
    }

Template definition for Eucalyptus:

    SMALL_LINUX : template {
        imageId linuxImageId
        remoteDirectory "/home/user/gs-files"
        machineMemoryMB 1600
        hardwareId "m1.medium"
        locationId "us-west-1"
        localDirectory "upload"
        keyFile "myEucaKeyFile.pem"
        username "user"
        options ([
            "securityGroups" : ["default"] as String[],
            "keyPair" : keyPair
        ])
        overrides ([
            "endpoint" : "http://communitycloud.eucalyptus.com"
        ])
        privileged true
    }
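The portability idea here is that the service only names a template, and each cloud supplies its own definition of that name. A minimal Python sketch of that lookup follows; it is an illustration only, not how Cloudify resolves templates internally. The `TEMPLATES` layout and the `resolve_template` function are hypothetical, while the setting values are copied from the two SMALL_LINUX templates above.

```python
# Per-cloud definitions of the same template name (hypothetical layout).
TEMPLATES = {
    "ec2": {
        "SMALL_LINUX": {
            "imageId": "us-east-1/ami-76f0061f",
            "hardwareId": "m1.small",
            "locationId": "us-east-1",
            "keyFile": "myKeyFile.pem",
        },
    },
    "eucalyptus": {
        "SMALL_LINUX": {
            # imageId is left out: the slide sets it from a variable (linuxImageId).
            "hardwareId": "m1.medium",
            "locationId": "us-west-1",
            "keyFile": "myEucaKeyFile.pem",
            "endpoint": "http://communitycloud.eucalyptus.com",
        },
    },
}


def resolve_template(cloud, template_name):
    """Look up the cloud-specific settings behind a template name."""
    try:
        return TEMPLATES[cloud][template_name]
    except KeyError:
        raise ValueError(f"no template {template_name!r} for cloud {cloud!r}") from None


# The service asks for "SMALL_LINUX" either way; only the target cloud changes.
for cloud in ("ec2", "eucalyptus"):
    print(cloud, resolve_template(cloud, "SMALL_LINUX")["hardwareId"])
```

Because the recipe never mentions an image ID or endpoint directly, moving a workload between clouds means swapping the template definitions, not editing the recipe.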

Automation Across the Stack

1 Upload your recipe.

2 Cloudify creates VMs and installs agents.

3 Agents install and manage your app.

4 Cloudify automates the scaling.
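The scaling step can be pictured as a simple threshold rule. The sketch below is a generic Python illustration under stated assumptions, not Cloudify's scaling-rule syntax or algorithm: the function name, its parameters, and the sample numbers are all hypothetical. It adds one instance when a monitored metric exceeds a high threshold, removes one when it falls below a low threshold, and keeps the result within the allowed instance range.

```python
def desired_instances(current, metric, low, high, min_instances, max_instances):
    """Threshold-based scaling decision.

    Adds one instance above `high`, removes one below `low`, and always
    clamps the result to the range [min_instances, max_instances].
    """
    if metric > high:
        target = current + 1
    elif metric < low:
        target = current - 1
    else:
        target = current
    return max(min_instances, min(max_instances, target))


# Hypothetical numbers: requests per second handled by each instance.
print(desired_instances(current=2, metric=950, low=200, high=800,
                        min_instances=1, max_instances=4))
```

In practice a cooldown period between decisions is usually added so that a brief spike does not cause instances to be started and stopped repeatedly.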

Big Data Apps, on Any Cloud, Your Way

Open Source (Apache2)

Big Data On Demand with Cloudify

• Relational DB clusters: MySQL, PostgreSQL
• NoSQL clusters: MongoDB, Cassandra, Couchbase, ElasticSearch
• Hadoop: Hadoop (Hive, Pig, ..), Storm, ZooKeeper

® Copyright 2011 Gigaspaces Ltd. All Rights Reserved


Demo Time: Storm Cluster

Large ISV Case Study

• Application: call center surveillance system
• Background: previously, voice data
• Goal for a new system
  – Monitor data & voice
  – Multiple data sources
  – Advanced correlations

Mission Accomplished

Additional Benefits

• True Cloud Economics

• One product -> Any Customer Environment

• Increased Agility

Thank You!

References:
http://www.cloudifysource.org
http://github.com/CloudifySource