Data Modeling and Scale Out Jason Stamper, 451 Research Vladi Vexler and Paul Campaniello, ScaleBase

Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015


Data Modeling and Scale Out

Jason Stamper, 451 Research

Vladi Vexler and Paul Campaniello, ScaleBase


Agenda

Data Modeling and Scale Out

1. 451 Research

• Key challenges in the data landscape

• Evolution of distributed database environments

2. ScaleBase

• Pros and cons of abstracting complex databases topology

• Top strategies of distributed data modeling

• Advanced data modeling and “what-if” simulations with Analysis Genie

• Scaling real apps – From need to deployment

• Demo

3. Q & A (please type questions directly into the GoToWebinar side panel)


Today’s Presenters

Jason Stamper, Analyst, Data Management and Analytics - 451 Research

• Over 20 years of experience in IT

• Formerly Editor of Computer Business Review & Technology Editor at The New Statesman

Vladi Vexler, Vice President, Tech. & Product Marketing - ScaleBase

• Over 15 years of experience in software development and product management

• Author of patents in the fields of database innovation, dynamic data caching and machine learning analytics

Paul Campaniello, Vice President, Worldwide Marketing - ScaleBase

• Over 25 years of software marketing & sales experience

• Held senior marketing and sales positions at Mendix, Lumigent, ESI, ComBrio, Savantis and Precise Software


About 451 Research

Founded in 2000

210+ employees, including over 100 analysts

1,000+ clients: technology & service providers, corporate advisory, finance, professional services, and IT decision makers

10,000+ senior IT professionals in our research community

Over 52 million data points each quarter

Headquartered in New York with offices in Boston, San Francisco, Washington, London…

Research & Data

Advisory Services

Events


The Challenge

Businesses and their users are facing what one might call a perfect storm – decision-makers need insight faster than ever, and yet IT is struggling to avoid becoming a bottleneck.


The Facts Speak for Themselves…

Recent survey by trade magazine Computer Business Review: 98% (of 200 UK CIOs) admit a “significant gap” between what the business expects and what IT can deliver.


So What Does the Business Want?

Speed

Information, not data

Flexibility

Ease-of-use

Mobility

New ways of working

Self-service

Scale

Collaboration


What Causes IT to Become a Bottleneck?

Governance

Control

Security

Budget

Legacy

Staff


What Have We Learned So Far?

• So far, the emergence of so-called ‘hot’ data platform and analytics technologies has not solved the IT information bottleneck.

• Hadoop isn’t going to save the world (and neither is NoSQL).

• The ability to analyze large data sets, in real- or near real-time, is only set to grow in the era of the Internet of Things.

• IT is still critical, but it needs to enable the business to help itself. The question is how to achieve the right blend of usability, value-for-money and scalability.


A Word or Two on Hadoop Adoption

[Chart: storage capacity by workload, 2012 vs. 2013, on a scale of 0 to 8,000. Workloads: DW and DBMS, Unstructured file, Virtualized server/OS, Backup, Archive, Other, Big data/Hadoop]

Average total storage capacity (TBs), and total storage footprint by workload illustrate the low level of adoption today


451 Research’s View of the ‘Total Data Approach’


What is Driving the Change?

Developers: Agile, REST, JSON, Schemaless, Schema-on-read, Flexible

Applications: Web, Social, Mobile, Always-on, Interactive, Local

Architecture: Cloud, Scalable, Elastic, Virtual, Distributed, Flexible

• New applications require distributed architecture

• Distributed architecture encourages new development approaches

• New development approaches demand new architecture

• Distributed architecture enables new applications

• New app requirements demand new development approaches

• New dev approaches enable new lightweight apps


The Database Challenge

– The traditional relational database has been stretched beyond its normal capacity limits by the needs of high-volume, highly distributed or highly complex applications.

– There are workarounds – such as DIY sharding – but manual, homegrown efforts can result in database administrators being stretched beyond their available capacity in terms of managing complexity.

– Scalability

– Performance

– Relaxed consistency

– Agility

– Intricacy

– Necessity

Increased willingness to look for emerging alternatives


Scalability, and Other Challenges

• As usage of MySQL and MariaDB has grown, so has the usage of applications that depend on MySQL and MariaDB:

– Games; Social; Customer Facing; Web; Business apps like Ad Networks

• This has highlighted a number of challenges:

– Scalability of master-slave architecture

– Performance and predictability at scale

– Lower latency; greater throughput; richer apps

– User expectations rising

– Manageability of increasing database/app sprawl

• External factors driving greater complexity:

– Distributed computing architectures

– Proliferation of cloud and elasticity requirements

– Geo-distributed application requirements

– Viral success means growth can come very quickly


Conclusions

• The success of MySQL and MariaDB has led to scalability concerns

• Distributed computing, proliferation of cloud, and geo-distributed applications are adding to the complexity

• Manual sharding techniques transfer the strain from the database to the database administrator

• MySQL, and MySQL administrators, have never been under so much strain

• Database scalability software enables users to move beyond the limitations and complexity of DIY sharding; precisely how data is managed with a distributed database, in the cloud or on premises, is key.


Scale Out Designs


About ScaleBase

Distributed Database Management System

Architected for the Cloud

Simple. Reliable. Powerful.


Designs of Distributed MySQL Environments

Quick Scale Out

Medium scale needs

Multiple database replicas performing load balancing with read/write splitting

Massive Scale Out

High scale needs

Complete distributed database environment, with policy-based data sharding/distribution


Quick Scale-Out

Read/Write Splitting and Continuous Availability

[Diagram labels: Application, Redirection (ip/port), MySQL Master (R/W), MySQL Replicas (R, R, R)]
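The read/write splitting idea on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not ScaleBase's implementation: the class name, the host names and the rule for detecting reads are all assumptions made for the example. Writes go to the master, while plain reads are spread round-robin across the replicas.

```python
import itertools


class ReadWriteSplitter:
    """Route writes to the master and spread plain reads across replicas."""

    # Statement prefixes treated as reads (a simplifying assumption).
    READ_PREFIXES = ("SELECT", "SHOW")

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master when no replica is configured.
        self._replica_cycle = itertools.cycle(replicas or [master])

    def route(self, sql, in_transaction=False):
        statement = sql.lstrip().upper()
        is_read = statement.startswith(self.READ_PREFIXES)
        # Reads inside a transaction and locking reads stay on the master.
        if is_read and not in_transaction and "FOR UPDATE" not in statement:
            return next(self._replica_cycle)
        return self.master


# Hypothetical endpoints, for illustration only.
splitter = ReadWriteSplitter(
    "master:3306", ["replica1:3306", "replica2:3306", "replica3:3306"]
)
print(splitter.route("SELECT * FROM users"))
print(splitter.route("UPDATE users SET name = 'x' WHERE id = 1"))
```

Keeping transactional and locking reads on the master is one common way to avoid reading stale data from a lagging replica; a real proxy would also need to handle replication lag and failover.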


Massive Scale-Out

[Diagram: Shards 0, 1, 2, etc., each with a Master and Replicas]


The Right Solution for You Depends on Your Goals

• Scale (mostly) reads

• Scale (mostly) writes

• Performance of reads

– Affected by joins and scans of big tables

• Performance of writes

– Affected by I/O reads/writes, CPU and table indexes (a growing overhead)

• Locks

• CPU/IO/RAM issues

• Load peaks

• Data growth

• Geo-distribution, special data distribution needs


Pros and Cons of Abstracting Complex Database Topology


Pros of Abstracting Complex Database Topology

• Development Agility – Accelerates your innovation speed

• Simplifies application code

• Reduces maintenance costs and simplifies maintenance

• Operations Efficiency – Zero downtime for applications

• Reduces operation costs

• Better monitoring, analytics, HA, scale, elasticity, etc.


Cons of Abstracting Complex Database Topology

• Additional technology component may increase complexity

• Additional layer to monitor and manage

• Additional machines to monitor and manage (possible increased opex)

• Less control at the application code level (the layer is transparent to the application)


Scale Out Methodologies Comparison


Characteristics & Modeling in a Distributed Database System


Characteristics of Distributed Table Types

• MASTER – On master shard (0) only. Examples: site settings, admin data tables

• GLOBAL – Full copy on all shards. Examples: lookups, frequently joined tables, slow-growing tables

• DISTRIBUTED-ROOT – Distribution based on a key column. Example: User.Id

• DISTRIBUTED-CASCADED (child) – Based on parent row. Examples: User_Photos, User_Photos_Likes – depend on Users

Shards:        0            1            2            3

MASTER:        Full table   -            -            -

GLOBAL:        Full table   Full table   Full table   Full table

DISTRIBUTED:   ¼ table      ¼ table      ¼ table      ¼ table
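The four table types can be illustrated with a small Python sketch that decides which shards hold a given row. Everything here is assumed for the example: the policy dictionary, the table names and the CRC32-modulo placement are illustrative and do not describe ScaleBase's actual policy format or hashing scheme.

```python
import zlib

NUM_SHARDS = 4  # matches the four shards (0-3) shown on the slide

# Hypothetical distribution policy: table name -> table type.
POLICY = {
    "site_settings": "MASTER",
    "countries": "GLOBAL",
    "users": "DISTRIBUTED-ROOT",
    "user_photos": "DISTRIBUTED-CASCADED",
}


def shard_for_key(key):
    """Map a distribution key to a shard with a stable hash (assumed scheme)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_SHARDS


def target_shards(table, root_key=None):
    """Return the list of shards that hold a row of `table`.

    `root_key` is the root table's key value (for example users.id).
    Cascaded child rows pass their parent's key, so they land on the
    same shard as the parent row.
    """
    table_type = POLICY[table]
    if table_type == "MASTER":
        return [0]
    if table_type == "GLOBAL":
        return list(range(NUM_SHARDS))
    if root_key is None:
        raise ValueError(f"{table} is distributed and needs a root key")
    return [shard_for_key(root_key)]


# A user's photos are stored on the same shard as the user row.
print(target_shards("users", 42) == target_shards("user_photos", 42))
```

Because child rows follow their parent's key, a join between a user and that user's photos can be answered by a single shard.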


Characteristics of Distributed Queries

• ONE-DB – 1 shard, 1 node. Most efficient.

1) Any call when the data is known to be in one shard (Distributed/Master)

2) Call to a Global table (load balanced)

• ALL-DB – All shards, 1 node.

1) Aggregated reads (like map-reduce)

2) DML (writes) on Global tables

3) DDL (create, drop, alter schema)

• FULL-DB – All shards, all nodes.

Session calls (USE, SET)

• CROSS-DB – #n shards, 1 node. Least efficient, but critical.

Cross-shard conflict resolution.

Note: Not all sharding platforms support all distributed query types.
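The difference between a ONE-DB call and an ALL-DB aggregated read can be sketched with in-memory lists standing in for shards. This is a toy model under stated assumptions: the data, the modulo placement and the function names are invented for the example and are not taken from any sharding product.

```python
# Four in-memory "shards", each holding its slice of a distributed orders table.
# Rows are placed by user_id modulo the shard count (an assumed scheme).
SHARDS = [
    [{"user_id": 4, "amount": 10}, {"user_id": 8, "amount": 5}],
    [{"user_id": 1, "amount": 7}],
    [{"user_id": 2, "amount": 3}, {"user_id": 6, "amount": 20}],
    [],
]


def shard_of(user_id):
    return user_id % len(SHARDS)


def one_db_total(user_id):
    """ONE-DB: the distribution key is known, so only one shard is queried."""
    rows = SHARDS[shard_of(user_id)]
    return sum(r["amount"] for r in rows if r["user_id"] == user_id)


def all_db_total():
    """ALL-DB: every shard returns a partial (sum, count); the caller merges them."""
    partials = [(sum(r["amount"] for r in rows), len(rows)) for rows in SHARDS]
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    average = total / count if count else None
    return total, count, average


print(one_db_total(6))
print(all_db_total())
```

Note that the average is rebuilt from the merged sum and count rather than by averaging per-shard averages, which would give a wrong result when shards hold different numbers of rows.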


Why Is Data Modeling Important?

• DATA and LOAD – Efficient distribution of:

– DATA - all / main tables and data

– READS

– WRITES

• QUERIES

– Handle ALL-DB Queries (Map-reduce concept)

– Minimize (but support!) CROSS-DB Queries – higher performance and scale

• OPTIMIZE DEVELOPMENT with SQL ANALYTICS

– Insight into the real database usage


Data Relationships Can be Extremely Complex

Usually, scale out is applied to growing-mature apps.

How do you define an optimal data distribution policy?


Analysis Genie: MySQL Visual Analysis & Optimal Distribution Policy Configuration


ScaleBase Analysis Genie

• A tool enabling MySQL visual analysis and building an optimal data distribution policy; designed for DBAs, architects and dev. managers

• Two-step process:

– Analysis Assistant: an agent that captures app/DB information, including SQL traffic and database metrics, then obfuscates, summarizes and packages the app-DB data

– Analysis Genie: a SaaS application that receives the Analysis Assistant (AA) package, presents the visual analysis and details the policy configuration

[Diagram labels: Analysis Assistant, Analysis Genie]


ScaleBase Analysis Genie

• Advanced analytics

– Schemas, data & queries

– Semantic structure analysis

– Usage, Load and Scale analytics

• Data Modeling and Scale-out planning

– Customized for the most complex applications

– Auto identification of optimal data distribution policy

– Complete policy control

• Quality assurance

– Review before production

• Simulation of results

– “What-if” analysis


Relationship Identification

Mapping includes:

• Schema structures

• Table and column name matching

• Query parsing and identification of joined tables and columns

• Statistics on the size and access of every object
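As a rough illustration of query parsing to identify joined tables and columns, the sketch below uses regular expressions to pull equality join conditions out of SQL text and count how often each column pair is joined. It is a deliberately naive stand-in, not how Analysis Genie works: the function names are hypothetical, and a real tool would use a proper SQL parser (this version ignores subqueries, string literals, USING clauses and unqualified columns).

```python
import re
from collections import Counter

# Matches equality conditions between two qualified columns, e.g. "u.id = p.user_id".
JOIN_CONDITION = re.compile(r"(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)")

# Matches "FROM users u" or "JOIN user_photos AS p" to resolve aliases to tables.
# The negative lookahead stops a following SQL keyword being taken as an alias.
_KEYWORDS = r"(?:ON|WHERE|JOIN|INNER|LEFT|RIGHT|CROSS|GROUP|ORDER|LIMIT|USING)\b"
TABLE_ALIAS = re.compile(
    r"\b(?:FROM|JOIN)\s+(\w+)(?:\s+(?:AS\s+)?(?!" + _KEYWORDS + r")(\w+))?",
    re.IGNORECASE,
)


def joined_columns(sql):
    """Return the (table, column) pairs joined by equality in one query."""
    aliases = {}
    for table, alias in TABLE_ALIAS.findall(sql):
        aliases[table] = table
        if alias:
            aliases[alias] = table
    pairs = []
    for a, col_a, b, col_b in JOIN_CONDITION.findall(sql):
        left = (aliases.get(a, a), col_a)
        right = (aliases.get(b, b), col_b)
        # Sort so that "a = b" and "b = a" count as the same relationship.
        pairs.append(tuple(sorted((left, right))))
    return pairs


def relationship_counts(queries):
    """Count how often each column pair is joined across captured queries."""
    counts = Counter()
    for sql in queries:
        counts.update(joined_columns(sql))
    return counts


captured = [
    "SELECT * FROM users u JOIN user_photos p ON u.id = p.user_id WHERE u.id = 7",
    "select * from user_photos join users on users.id = user_photos.user_id",
]
print(relationship_counts(captured))
```

Column pairs that are joined most often are natural candidates for a parent-child (cascaded) distribution, since keeping them on one shard avoids cross-shard joins.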


Analyzing Relationships: From Chaos to Order

Understanding and mapping complex relationships


ScaleBase Genie Demo


MySQL Visual Analysis Demo

• Visual analysis

• Distribution policy identification and configuration

• Scale out load via data sharding (massive scale out)

[Diagram labels: ScaleBase Enterprise, Analysis Genie]


Summary


Reading Plus

Who:

• Online education company

Problem:

• Busy season (back-to-school) was approaching and they needed a solution that could be quickly implemented, while guaranteeing uptime

• With increasing growth, they needed to implement a scale out solution quickly

Alternatives Considered:

• A clustering technology, which proved to be infeasible due to schema complexity and a lengthy re-architecture requirement

Solution:

• Used visual analysis to determine best scale out plan

• ScaleBase Lite for instant scale out and continuous availability

• 35 Tomcat application servers were connected to 3 ScaleBase controllers

• ScaleBase performed automated read/write splitting and load balancing


Next Gen SaaS ERP Company

Who:

• Inventory management ecommerce company

• Hosted on Rackspace (ScaleBase Partner)

Problem:

• Largest available hardware could not support workload

Alternatives Considered:

• Initially went with a “black box” solution, encountering many issues

Solution:

• Scaled out a single MySQL instance to 8 clustered shards

• On-demand growth – current workload over 20,000 TPS

– Plan to double footprint in next quarter

– Support all production customers during Black Friday & Cyber Monday


Scale out to unlimited users

Continuous availability

Dynamic workload optimization

Fast and simple deployment

Easily scale out a single MySQL instance

Optimized for the Cloud

Reduces time-to-market

No changes needed to app or database

Database usage analytics

Intelligent load balancing

Centralized data management

ScaleBase Distributed Database Management System


How Can I Learn More?

Use visual analysis to plan your scale out strategy

Download the Analysis Genie:

https://www.scalebase.com/software

Read the 451 report about ScaleBase (& the DB market)

Download Jason’s Report (authored last week)

https://www.scalebase.com/resources/whitepapers


Questions?

Contact Info:

Paul Campaniello

[email protected]

Vladi Vexler

[email protected]

Resources:

www.scalebase.com

www.scalebase.com/resources

www.scalebase.com/blog

[email protected]

(617) 630.2800