Large dataset processing in the Cloud
Kevin Glenny and GridwiseTech team


Simplified data-oriented system

Internal or external data sources

Applications working on data

IT systems are constantly growing

Increased number of users

Increased number of applications

Increased amount of data

Result: infrastructure bottleneck

Example

Electronics manufacturer

24/7 production

Report computation takes too long for decision making

2.5 million transactions daily

4 TB of data to manage
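Expressed as a rate, the daily volume is easier to compare with what a single database server handles. A minimal Python sketch, assuming transactions are spread evenly over the 24/7 day; the 5x peak factor is a hypothetical assumption, not a figure from the example:

```python
# Back-of-envelope load estimate for the electronics-manufacturer example.
# Only the 2.5 million transactions/day figure comes from the example; the
# peak factor used below is a hypothetical assumption.

SECONDS_PER_DAY = 24 * 60 * 60  # 24/7 production spreads load over the full day


def average_tps(transactions_per_day: float) -> float:
    """Average transactions per second over a full day."""
    return transactions_per_day / SECONDS_PER_DAY


def peak_tps(transactions_per_day: float, peak_factor: float) -> float:
    """Peak rate if the busiest period runs at `peak_factor` times the average."""
    return average_tps(transactions_per_day) * peak_factor


if __name__ == "__main__":
    daily = 2_500_000
    print(f"average:           {average_tps(daily):.1f} tx/s")
    print(f"peak (assumed 5x): {peak_tps(daily, 5):.1f} tx/s")
```

On these assumptions the average is roughly 29 transactions per second, which suggests the strain comes from computing reports over the accumulated data rather than from ingesting new transactions.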

What is Cloud computing?

“Transparent access to capabilities using a pay-per-use business model”

Benefits:
– Dynamic scaling
– Pay-per-use
– Off-shored administration
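To see why dynamic scaling and pay-per-use go together, compare a cluster sized for the peak hour with one resized every hour to match demand. A minimal Python sketch; the hourly rate, per-instance capacity and demand profile are all hypothetical values chosen for illustration, not provider prices:

```python
"""Compare a fixed, peak-sized cluster with pay-per-use dynamic scaling.

All prices and demand figures are hypothetical; they only illustrate why
paying per instance-hour can be cheaper when load varies over the day.
"""

import math

HOURLY_RATE = 0.10      # hypothetical price per instance-hour
CAPACITY_PER_NODE = 50  # hypothetical requests/s one instance can serve


def nodes_needed(load: float) -> int:
    """Smallest number of instances that can serve `load` requests/s (at least one)."""
    return max(1, math.ceil(load / CAPACITY_PER_NODE))


def fixed_cost(hourly_load: list[float]) -> float:
    """Cost of keeping enough instances for the peak hour running all day."""
    peak_nodes = nodes_needed(max(hourly_load))
    return peak_nodes * len(hourly_load) * HOURLY_RATE


def pay_per_use_cost(hourly_load: list[float]) -> float:
    """Cost when the cluster is resized every hour to match demand."""
    return sum(nodes_needed(load) for load in hourly_load) * HOURLY_RATE


if __name__ == "__main__":
    # Hypothetical day: 8 quiet hours, 8 busy hours, 8 medium hours (requests/s).
    day = [40.0] * 8 + [200.0] * 8 + [100.0] * 8
    print(f"fixed:       ${fixed_cost(day):.2f}")
    print(f"pay-per-use: ${pay_per_use_cost(day):.2f}")
```

With this made-up profile the fixed cluster pays for 96 instance-hours and the scaled one for 56; the gap grows as the difference between peak and average load grows.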

What are the delivery models?

SaaS (Software as a Service) – Salesforce.com, 63,000 clients

PaaS (Platform as a Service) – Google App Engine (2008), Microsoft Azure (2008)

IaaS (Infrastructure as a Service) – Amazon Elastic Compute Cloud, 8.2 million instances launched since 2006

Application data processing

Database sharding (MySQL, PostgreSQL, etc.)

NoSQL (Google's Bigtable, Amazon's Dynamo, etc.)

Data grids (GigaSpaces XAP, Oracle Coherence, Infinispan, etc.)
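All three approaches partition data across nodes. One common routing scheme is consistent hashing, the partitioning idea described for Amazon's Dynamo, which can equally route rows to MySQL or PostgreSQL shards. A minimal Python sketch; the shard names and the 100 virtual nodes per shard are hypothetical choices, and this is not the routing code of any product listed above:

```python
"""Route keys to shards with a consistent-hashing ring (illustrative sketch)."""

import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes  # virtual nodes per shard smooth out the key distribution
        self._ring: list[tuple[int, str]] = []  # sorted (position, shard) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place `vnodes` points for this shard on the ring, keeping it sorted."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        """Drop every ring point that belongs to this shard."""
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key: str) -> str:
        """Return the first shard clockwise from the key's position on the ring."""
        if not self._ring:
            raise LookupError("ring has no nodes")
        index = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]  # wrap around past the end


if __name__ == "__main__":
    ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
    keys = [f"order-{n}" for n in range(10_000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("shard-d")  # scale out by one shard
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"{moved / len(keys):.0%} of keys moved")
```

The design choice matters for dynamic scaling: with plain `hash(key) % n`, changing `n` reshuffles almost every key, whereas here adding a fourth shard moves only the keys that now land on it, roughly a quarter of them.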

Data-grid and sharding in the Cloud

All data processing and persistence in the Cloud

Achievements:
• Near real-time
• Dynamic scaling (application and resources)
• Pay-per-use
• Reduced administration
• High availability (HA)
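Dynamic scaling of resources needs a rule for how many nodes to run. A minimal Python sketch of a proportional rule with the same shape as the Kubernetes Horizontal Pod Autoscaler formula (desired = ceil(current × current metric / target metric)); the node limits are hypothetical, and this is not the mechanism used in the setup described above:

```python
import math


def desired_nodes(current_nodes: int, current_util_pct: float,
                  target_util_pct: float, min_nodes: int = 1,
                  max_nodes: int = 100) -> int:
    """Resize the cluster so average utilization moves back to the target.

    Utilization is given in percent; min_nodes/max_nodes are hypothetical
    limits that stop the rule from scaling to zero or without bound.
    """
    if target_util_pct <= 0:
        raise ValueError("target_util_pct must be positive")
    desired = math.ceil(current_nodes * current_util_pct / target_util_pct)
    return max(min_nodes, min(max_nodes, desired))


if __name__ == "__main__":
    print(desired_nodes(4, 90, 60))  # overloaded: scale out
    print(desired_nodes(6, 20, 60))  # mostly idle: scale in
```

Four nodes at 90% against a 60% target grow to six; six nodes at 20% shrink to two. A production autoscaler would also add a tolerance band or cooldown so it does not oscillate.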

Remaining issues

Getting large datasets in and out of the Cloud
– Limited client-side bandwidth
– Resort to mailing hard drives!

Performance – 2 to 50% slowdown

Data security/privacy - trust

SLAs – plan for the worst
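Two of these issues reduce to simple arithmetic: how long a dataset takes to upload, and how much downtime an availability figure still allows. A minimal Python sketch; only the 4 TB size comes from the earlier example, while the link speeds, the 80% link efficiency and the 99.95% availability figure are hypothetical values:

```python
"""Back-of-envelope numbers for data transfer and SLA planning (illustrative)."""


def transfer_days(size_bytes: float, link_bits_per_s: float, efficiency: float = 0.8) -> float:
    """Days needed to push `size_bytes` over a link used at `efficiency` of its nominal rate."""
    seconds = size_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 86_400


def allowed_downtime_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Downtime an availability figure still permits over `period_days`."""
    return (1 - availability_pct / 100) * period_days * 24 * 60


if __name__ == "__main__":
    dataset = 4e12  # 4 TB (decimal) from the manufacturer example
    for mbps in (10, 100, 1000):
        print(f"{mbps:>5} Mbit/s: {transfer_days(dataset, mbps * 1e6):6.1f} days")
    print(f"99.95% over 30 days allows {allowed_downtime_minutes(99.95):.1f} min of downtime")
```

Under these assumptions 4 TB needs about 46 days at 10 Mbit/s and over 4.5 days at 100 Mbit/s, which is why shipping drives can win; 99.95% availability still permits about 21.6 minutes of outage a month, so plans should assume that downtime will actually happen.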

Conclusions

Datasets in data-oriented systems keep growing, causing infrastructure bottlenecks

Datasets in the Cloud can be processed with scalable technologies (sharding, NoSQL, data grids)

Challenges remain

Main challenge: how to get the data into the Cloud?