In search of database nirvana: The challenges of delivering Hybrid Transaction/Analytical Processing

Rohit Jain, CTO – 2016

(C) Copyright 2015 Esgyn Corporation, Esgyn Confidential


Agenda

• The swinging database pendulum
• Hybrid Transaction/Analytical Processing (HTAP) workloads
• Query versus storage engines
• The challenges of HTAP
  ◦ Single query engine for all workloads
  ◦ Supporting multiple storage engines
  ◦ Same data model for all workloads
  ◦ Enterprise-caliber capabilities
• Conclusion

The swinging database pendulum: from RDBMS to NoSQL

RDBMS challenges with Big Data
• High TCO
• Lack of elastic scalability
• Did not meet performance requirements
• No support for semi-structured & unstructured data
• Inability to parallelize user code
• No schema flexibility
• Too complex for simple needs

Enter NoSQL – polyglot programming & persistence
• Key value stores
• Wide column stores (Bigtable)
• Document stores
• Text search
• Graph database
• Column stores

The swinging database pendulum: from NoSQL back to SQL

But enterprises wanted SQL
• Skills prevalent
• Existing tools & applications
• Transaction support often useful
• More efficient when joins needed
• Easier than coding MapReduce
• Merit in rigor of pre-defining columns
• Uniform metadata across applications

But still …
• Too many languages, interfaces, & data structures
• Too much gluing of technologies together
• Compatibility between different versions
• No end-to-end view of workload performance
• Support contracts with multiple vendors
• Too many skills required to develop and manage
• Too much data movement
• No single solution for varied interfaces & use cases

Hybrid Transaction/Analytical Processing (HTAP) Workloads

OLTP
• Mostly transactional
• Sub-second response
• Customer experience
• Large update volume
• Online updates
• No historical data
• High concurrency
• Scales linearly
• Normalized data model
• Custom applications or third-party solutions
• Keyed updates/queries
• Mostly SMP; MPP for web-scale

ODS
• Can be transactional
• Sub-second to seconds
• Customer experience or business internal
• Low update volume
• Batch to streaming feeds from OLTP
• Some historical data
• Low concurrency if internal, high otherwise
• Near linear scale
• Normalized data model
• Custom apps/3rd party
• Keyed queries
• Mostly MPP

BI
• Non-transactional
• Seconds to minutes
• Business internal
• No direct updates
• Batch to streaming feeds from OLTP/ODS
• Historical data
• Low to high concurrency
• Less linear in scale
• Dimension data model
• BI, OLAP, ROLAP tools – reporting and dashboards
• Ad hoc and scheduled queries and large extracts
• Mostly MPP

Analytics
• Non-transactional
• Minutes to hours
• Business internal
• No direct updates
• Batch/aggregates from BI
• Historical and big data
• Low concurrency
• Complex queries, nonlinear scale
• Columnar store
• Analytical tools
• Ad hoc queries; analytics in database
• Mostly MPP

Essential to operate the business

To improve performance of the company

Query versus storage engines

[Figure labels: Hadoop Cluster; Switch; Operational, Business Intelligence, Analytics; In-memory; Single Query Engine]

Query Engine
• Allow clients to connect & submit queries
• Distribute connections across cluster
• Compile query
• Execute query
• Return results of query to client

Storage Engine
• Storage structure
• Partitioning
• Automatic data repartitioning
• Select columns
• Select rows based on predicates
• Caching writes and reads
• Clustering by key
• Fast access paths or filtering
• Transactional support
• Replication
• Compression & encryption
• Mixed workload support
• Bulk data ingest/extract
• Indexing
• Colocation or node locality
• Data governance
• Security
• Disaster recovery
• Backup, archive, restore
• Multi-temperature data support

The challenges of HTAP: Single query engine for all workloads

• Data structure – key support, clustering, partitioning
• Statistics
• Predicates on non-leading or non-key columns
• Indexes and materialized views
• Degree of parallelism
• Reducing the search space
• Join type
• Data flow and access
• Mixed workload
• Feature support

The challenges of HTAP: Single query engine for all workloads

Data structure – key support, clustering, partitioning

[Figure labels: Table A; Table B; Partitioned; Non-partitioned; Salting / Partitioning (hash, range, …); Salt key; Clustered by Primary Key; Multi-column clustering key (A, B, C)]

The challenges of HTAP: Single query engine for all workloads

Statistics

• Unique Entry Count
• Lowest and highest values
• Multiple key / join column cardinalities
• Sampling for fast stats updates
• Incremental update stats
• Skew – equal height histograms

[Figure: Equal-height histograms]
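To make the equal-height histogram bullet concrete, here is a minimal sketch, assuming a single numeric column held in memory. The function names `equal_height_histogram` and `estimate_rows_le` are hypothetical and are not taken from any product; a real optimizer would typically work from samples and would interpolate within buckets.

```python
from bisect import bisect_right

def equal_height_histogram(values, num_buckets):
    """Return upper boundaries of buckets that each hold about the same row count."""
    ordered = sorted(values)
    n = len(ordered)
    boundaries = []
    for i in range(1, num_buckets + 1):
        # Index of the last row that falls in bucket i.
        last = (i * n) // num_buckets - 1
        boundaries.append(ordered[last])
    return boundaries

def estimate_rows_le(boundaries, total_rows, value):
    """Estimate rows with column <= value, counting only completely covered buckets."""
    rows_per_bucket = total_rows / len(boundaries)
    full_buckets = bisect_right(boundaries, value)
    return full_buckets * rows_per_bucket

# Skewed data: the value 7 appears far more often than any other value.
column = [1, 2, 3, 4, 5, 6] + [7] * 10 + [8, 9, 10, 11]
bounds = equal_height_histogram(column, 4)
estimate = estimate_rows_le(bounds, len(column), 7)
```

In this toy data the boundary 7 shows up twice, which is how an equal-height histogram exposes skew: a heavily repeated value occupies more than one bucket.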

The challenges of HTAP: Single query engine for all workloads

[Figure labels: Skew Buster; 80 minutes; 2 minutes]

The challenges of HTAP: Single query engine for all workloads

Predicates on non-leading or non-key columns

Week        Item  Store  …
01/07/2016  1     1      …
01/07/2016  1     3      …
01/07/2016  1     5      …
01/07/2016  2     34     …
01/07/2016  3     13     …
01/07/2016  3     3      …
01/07/2016  4     2      …
01/07/2016  4     4      …
01/14/2016  1     2      …
01/14/2016  1     4      …
01/14/2016  1     5      …
01/14/2016  1     35     …
01/14/2016  3     1      …
01/14/2016  3     20     …

Where is item = 1, Stores 2 through 5?

• Use of various statistics to generate an efficient plan
• Sequence of column access for column stores
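One way to answer the question above without scanning every row is a skip-scan style access: for each distinct value of the leading key column, seek directly to the matching sub-range. The sketch below is an illustration under assumptions, not the method of any particular engine. It assumes the table is clustered on (week, item, store), rewrites the slide's dates as ISO strings so that tuple order matches date order, and uses the hypothetical name `skip_scan`.

```python
from bisect import bisect_left, bisect_right

# Rows clustered by the composite key (week, item, store), taken from the slide.
rows = sorted([
    ("2016-01-07", 1, 1), ("2016-01-07", 1, 3), ("2016-01-07", 1, 5),
    ("2016-01-07", 2, 34), ("2016-01-07", 3, 13), ("2016-01-07", 3, 3),
    ("2016-01-07", 4, 2), ("2016-01-07", 4, 4),
    ("2016-01-14", 1, 2), ("2016-01-14", 1, 4), ("2016-01-14", 1, 5),
    ("2016-01-14", 1, 35), ("2016-01-14", 3, 1), ("2016-01-14", 3, 20),
])

def skip_scan(rows, item, store_lo, store_hi):
    """Probe one key range per distinct leading value instead of reading all rows."""
    matches = []
    pos = 0
    while pos < len(rows):
        week = rows[pos][0]
        # Seek directly to the slice for (week, item, store_lo..store_hi).
        start = bisect_left(rows, (week, item, store_lo))
        end = bisect_right(rows, (week, item, store_hi))
        matches.extend(rows[start:end])
        # Jump past every remaining row of this week to the next distinct week.
        pos = bisect_right(rows, (week, float("inf")))
    return matches

result = skip_scan(rows, item=1, store_lo=2, store_hi=5)
```

Whether this beats a full scan depends on the number of distinct leading values, which is one reason the optimizer needs the statistics mentioned in the bullets.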

The challenges of HTAP: Single query engine for all workloads

Indexes and materialized views

Indexes
• Kinds of indexes and how they are leveraged
• Unique index
• Transactional consistency with base table
• Impact on updates
• Updates during bulk loads

Materialized Views
• Synchronous and asynchronous maintenance
• Overhead of maintenance
• Automatic query rewrite
• User defined materialized views

The challenges of HTAP: Single query engine for all workloads

Degree of parallelism

• Serial vs parallel plans

[Figure labels: Client Application; Master; Multi-fragment; ESP; Node 1, Node 2, Node n; Ethernet; HBase Region 1 through HBase Region 5; Filters; Coprocessors; HDFS]

The challenges of HTAP: Single query engine for all workloads

[Figure labels: Qry1, Qry2, Qry3, Qry4, Qry5, Qry6, Qry7]

The challenges of HTAP: Single query engine for all workloads

Reducing the search space

• Optimizer technology, e.g., Cascades used by Apache Trafodion and Microsoft SQL Server
• Query plan caching for operational workloads
• Query plan cache management
• Extensibility of optimizer to evolve with varied workloads
• Recognizing query patterns, such as star joins

The challenges of HTAP: Single query engine for all workloads

Join type

Adaptive and parallel joins
• Nested join
• Probe cache for nested join
• Merge join
• Matching partition join
• Repartitioned hash join
• Replication by broadcast hash join
• Inner / outer child broadcast
• Dimensional schema star join

• Inner join
• Left join
• Right join
• Full outer join
• Self join

Cost premiums for nested joins or serial plans

The challenges of HTAP: Single query engine for all workloads

[Figure labels: Compute Cost; Execution Environment; Physical Properties; Estimates Confidence; Cardinality, Distribution, Correlation; Sensitivity To Estimates; Evaluate Risk; Risk Adjustment; Benefit; Risk]

Risk Premiums
• Nested join 20%
• Merge join 10%
• Serial plan 5%
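The risk premiums can be read as surcharges on a plan's estimated cost, so that a plan that is cheap only if the estimates hold must beat a safer plan by a margin. The sketch below uses the three percentages from the slide. How premiums combine is not stated in the source, so the multiplicative rule is an assumption, as are the names `risk_adjusted_cost` and `choose_plan` and the example cost numbers.

```python
# Risk premiums from the slide, expressed as fractional surcharges.
RISK_PREMIUMS = {"nested_join": 0.20, "merge_join": 0.10, "serial_plan": 0.05}

def risk_adjusted_cost(base_cost, traits):
    """Inflate a plan's estimated cost by the premium of each risky trait it uses."""
    cost = base_cost
    for trait in traits:
        # Assumption: premiums compound multiplicatively; unknown traits add nothing.
        cost *= 1.0 + RISK_PREMIUMS.get(trait, 0.0)
    return cost

def choose_plan(candidates):
    """Pick the (name, base_cost, traits) candidate with the lowest adjusted cost."""
    return min(candidates, key=lambda c: risk_adjusted_cost(c[1], c[2]))

plans = [
    ("nested join, serial", 100.0, ["nested_join", "serial_plan"]),
    ("hash join, parallel", 115.0, []),
]
best = choose_plan(plans)
```

Here the nested-join serial plan has the lower raw estimate, but after the surcharges it costs about 126 against 115, so the parallel hash join is chosen.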

The challenges of HTAP: Single query engine for all workloads

Data flow and access

• Data flow architecture
• No materialization of intermediate results
• Graceful overflow to disk for large memory operations
• Efficiencies such as pre-fetch
• Fast path for operational workloads

[Figure labels: Scan; Scan; Join; Group by]

The challenges of HTAP: Single query engine for all workloads

Mixed workload

• Priority / SLA-based execution
• Allocation of resources by service level
• Decrease priority with usage increase
• Anti-starvation / switch between queries based on priority

[Figure labels: Query Low; Query Medium; Query High; Queue (Low, Medium, High); Memstore; HBase Region 1; HBase Region 3; HBase Region 5]
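The bullets on decreasing priority with usage and on anti-starvation can be illustrated with a small time-slice scheduler. This is a sketch under assumptions, not how any particular engine schedules work: the `Scheduler` class, the one-unit usage penalty, and the query names are all hypothetical.

```python
import heapq

class Scheduler:
    """Run the query with the best effective priority, one time slice at a time."""

    def __init__(self, usage_penalty=1.0):
        self.usage_penalty = usage_penalty
        self._heap = []      # entries: (negated effective priority, arrival order, name)
        self._base = {}      # base priority per query
        self._used = {}      # time slices consumed per query
        self._arrivals = 0   # tie-breaker so equal priorities run first come, first served

    def submit(self, name, priority):
        self._base[name] = priority
        self._used[name] = 0
        self._push(name)

    def _push(self, name):
        # Effective priority drops as a query consumes more time slices.
        effective = self._base[name] - self.usage_penalty * self._used[name]
        heapq.heappush(self._heap, (-effective, self._arrivals, name))
        self._arrivals += 1

    def run_slice(self):
        """Pop the best query, charge it one slice, and requeue it."""
        _, _, name = heapq.heappop(self._heap)
        self._used[name] += 1
        self._push(name)
        return name

sched = Scheduler()
sched.submit("high", 3)
sched.submit("medium", 2)
sched.submit("low", 1)
order = [sched.run_slice() for _ in range(6)]
```

Because each slice lowers the running query's effective priority, the high-priority query gets the most slices but the low-priority query still gets a turn instead of starving.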

The challenges of HTAP: Single query engine for all workloads

Feature support

Operational workloads
• Referential integrity
• Stored procedures
• Triggers
• Various levels of transactional isolation and consistency
• …

BI and Analytics workloads
• Materialized views
• Fast / bulk extract, transform, load (ETL)
• OLAP, time series, statistical, data mining, and other functions
• …

Needed by both
• Scalar and table mapping UDFs
• Inner, outer, and full outer joins
• Un-nesting of subqueries
• Converting correlated subqueries to joins
• Predicate push down
• Sort avoidance strategies
• Constant folding
• Recursive union
• …

The challenges of HTAP: Supporting multiple storage engines

• Statistics
• Key structure
• Partitioning
• Data type support
• Projection and selection
• Extensibility
• Security enforcement
• Transaction management
• Metadata support
• Performance, scale, and concurrency considerations
• Error handling
• Other operational aspects

The challenges of HTAP: Supporting multiple storage engines

Statistics

• Storage engine statistics, used by query engine
• Sampling
• Access to changed data for incremental updates
• Update counters to determine refresh schedule

[Figure label: Refresh]

The challenges of HTAP: Supporting multiple storage engines

Key structure

• Random single row and range access for operational workloads

[Figure labels: Query Engine, multi-column key (A, B, C); Storage Engine, single clustering key A+B+C; range access for A = 2 over sample key values]
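When the storage engine offers only a single byte-ordered clustering key, the query engine has to encode its multi-column key (A, B, C) so that byte order matches column order; a predicate such as A = 2 then becomes one contiguous range. The sketch below is a minimal illustration that assumes three unsigned 32-bit columns and a leading value below 2**32 - 1. The helper names `encode_key` and `range_for_a` are hypothetical, and the sample values are made up.

```python
import struct
from bisect import bisect_left

def encode_key(a, b, c):
    """Pack three unsigned 32-bit columns big-endian so byte order equals (a, b, c) order."""
    return struct.pack(">III", a, b, c)

def range_for_a(a):
    """Half-open byte range [start, stop) covering every key whose leading column is a."""
    start = struct.pack(">I", a)
    stop = struct.pack(">I", a + 1)
    return start, stop

# A toy sorted "storage engine": keys kept in byte order, like a clustered store.
store = sorted(encode_key(a, b, c) for a, b, c in [
    (1, 5, 5), (2, 4, 2), (2, 1, 7), (3, 1, 2), (2, 9, 3), (1, 2, 3),
])

start, stop = range_for_a(2)
hits = store[bisect_left(store, start):bisect_left(store, stop)]
decoded = [struct.unpack(">III", key) for key in hits]
```

Big-endian packing is the important design choice: little-endian integers would not sort correctly as raw bytes. Signed, variable-length, or descending columns need additional encoding rules.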

The challenges of HTAP: Supporting multiple storage engines

Partitioning

• Data partitioning across disks and nodes
• Hash, range, or combination
• Salting support
• Query engine imposed salting
• Repartitioning as the cluster expands/contracts
• Read/write access while being rebalanced
• Localize data access to avoid shuffling

CREATE TABLE t(a integer not null primary key, b integer) SALT USING 4 PARTITIONS;

[Figure labels: INSERT(s); SELECT(s); PART 1, PART 2, PART 3, PART 4; HBase Region; HDFS]
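The effect of salting can be sketched as hashing the primary key into a fixed number of partitions and prefixing the stored key with that partition number. The code below illustrates the idea only. CRC32 is an arbitrary stand-in and is not claimed to be the hash any engine uses; `salt` and `salted_row_key` are hypothetical names.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # matches SALT USING 4 PARTITIONS in the statement above

def salt(primary_key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition with a stable hash (CRC32 is an illustrative choice)."""
    return zlib.crc32(str(primary_key).encode("utf-8")) % num_partitions

def salted_row_key(primary_key):
    """Prefix the key with its salt so sequential keys can land in different partitions."""
    return (salt(primary_key), primary_key)

# Without salting, monotonically increasing keys would all append to one region.
spread = Counter(salt(k) for k in range(1000))
```

The trade-off is that a range scan on the original key now has to visit every partition, which is why the query engine needs to be aware of the salt.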

The challenges of HTAP: Supporting multiple storage engines

Data type support

• Data types supported
• Query to storage engine data type mapping
• Value constraint enforcement

CHARACTER(n): Character string. Fixed length n.
VARCHAR(n) or CHARACTER VARYING(n): Character string. Variable length. Maximum length n.
BINARY(n): Binary string. Fixed length n.
BOOLEAN: Stores TRUE or FALSE values.
VARBINARY(n) or BINARY VARYING(n): Binary string. Variable length. Maximum length n.
INTEGER(p): Integer numerical (no decimal). Precision p.
SMALLINT: Integer numerical (no decimal). Precision 5.
INTEGER: Integer numerical (no decimal). Precision 10.
BIGINT: Integer numerical (no decimal). Precision 19.
DECIMAL(p,s): Exact numerical, precision p, scale s. Example: decimal(5,2) is a number that has 3 digits before the decimal point and 2 digits after it.
NUMERIC(p,s): Exact numerical, precision p, scale s. (Same as DECIMAL.)
FLOAT(p): Approximate numerical, mantissa precision p. A floating number in base 10 exponential notation. The size argument for this type consists of a single number specifying the minimum precision.
REAL: Approximate numerical, mantissa precision 7.
FLOAT: Approximate numerical, mantissa precision 16.
DOUBLE PRECISION: Approximate numerical, mantissa precision 16.
DATE: Stores year, month, and day values.
TIME: Stores hour, minute, and second values.
TIMESTAMP: Stores year, month, day, hour, minute, and second values.
INTERVAL: Composed of a number of integer fields, representing a period of time, depending on the type of interval.
ARRAY: A set-length and ordered collection of elements.

The challenges of HTAP: Supporting multiple storage engines

Data type support (continued)

• Data types supported
• Query to storage engine data type mapping
• Value constraint enforcement
• Referential constraints
• Character sets
• Collations
• Compression
• Encryption

The challenges of HTAP: Supporting multiple storage engines

Projection and selection

• Projection at storage or query engine level
• Predicates evaluated by query and storage engines
• Predicates applied to compressed data
• Multi-column predicates
• IN lists; size of IN lists
• Multiple predicates with ORs and ANDs (pushdown)
• Evaluate predicates in sequence of filtering effectiveness
• Predicates comparing different columns of same table
• Complex expression evaluation
• Evaluation of functions
• Default or missing values on retrieval

[Figure: projection of selected columns from sample rows with columns C1 to C6]
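Evaluating predicates in sequence of filtering effectiveness means running the predicate that rejects the most rows first, so that later predicates see fewer rows. The sketch below counts predicate evaluations for two orderings of the same conjunction. The selectivity numbers are supplied by hand where an optimizer would use statistics, and the names `order_predicates` and `filter_rows` are hypothetical.

```python
def order_predicates(predicates):
    """Sort (name, selectivity, test) so the most selective predicate runs first."""
    return sorted(predicates, key=lambda p: p[1])

def filter_rows(rows, predicates):
    """Apply ANDed predicates in order, stopping at the first failure; count evaluations."""
    evaluations = 0
    kept = []
    for row in rows:
        for _, _, test in predicates:
            evaluations += 1
            if not test(row):
                break
        else:
            # No predicate failed, so the row qualifies.
            kept.append(row)
    return kept, evaluations

rows = [{"c1": i % 10, "c2": i % 2} for i in range(100)]
predicates = [
    ("c2 = 0", 0.5, lambda r: r["c2"] == 0),  # passes about half of the rows
    ("c1 = 4", 0.1, lambda r: r["c1"] == 4),  # passes about a tenth of the rows
]
naive_rows, naive_evals = filter_rows(rows, predicates)
tuned_rows, tuned_evals = filter_rows(rows, order_predicates(predicates))
```

Both orderings return the same rows; only the work differs. A fuller cost model would also weigh how expensive each predicate is to evaluate, not just how many rows it passes.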

The challenges of HTAP: Supporting multiple storage engines

Extensibility

Server-side extensibility, e.g., HBase coprocessors or Cassandra triggers, to push down:
• Complex predicate evaluation with expressions and functions
• Pre-aggregation
• Collocated joins or index maintenance
• Transactional support
• Security enforcement
• Some ANSI trigger actions

The challenges of HTAP: Supporting multiple storage engines

Security enforcement

• Mapping of security frameworks for the query and storage engines to enforce ANSI SQL security
• Integration with underlying Hadoop Kerberos security
• Integration with security solutions, like Sentry or Ranger
• Integration with security logging and SIEM solutions

The challenges of HTAP: Supporting multiple storage engines

Transaction management

• Replication for high availability, backup and restore, and multi-data center support from query & storage engines
• ACID or BASE transactional support
• Integration between the query and storage engines, such as write ahead logs, and use of coprocessors
• Completely scalable and distributed transaction management architecture
• Multi-data center support – active-active single or multiple master replication
• Overhead of transactions on throughput and system resources
• Online backup and point-in-time recovery

The challenges of HTAP: Supporting multiple storage engines

Transaction management

[Figure labels: Single-Master; Multiple-Masters]

The challenges of HTAP: Supporting multiple storage engines

Point-in-time recovery

[Figure: timeline]
• Full transactionally consistent snapshot
• Snapshots after non-transactional changes such as bulk loads
• Transactional changes captured continuously

The challenges of HTAP: Supporting multiple storage engines

Point-in-time recovery

[Figure: timeline]
• Drop table or erroneous large transactional update
• Restore previous full snapshot
• Initiate recovery to point-in-time
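The recovery sequence on these two slides amounts to restoring the most recent snapshot taken before the chosen point and replaying the captured changes up to that point. The sketch below models this with plain dictionaries. The record layout (`time`, `op`, `key`, `value`) and the function name `recover_to_point_in_time` are assumptions made for illustration only.

```python
def recover_to_point_in_time(snapshots, change_log, target_time):
    """Restore the latest snapshot at or before target_time, then replay later changes."""
    usable = [s for s in snapshots if s["time"] <= target_time]
    if not usable:
        raise ValueError("no snapshot precedes the requested recovery point")
    base = max(usable, key=lambda s: s["time"])
    state = dict(base["data"])  # copy so the stored snapshot is never mutated
    for change in sorted(change_log, key=lambda c: c["time"]):
        # Replay only changes made after the snapshot and up to the target time.
        if base["time"] < change["time"] <= target_time:
            if change["op"] == "put":
                state[change["key"]] = change["value"]
            elif change["op"] == "delete":
                state.pop(change["key"], None)
    return state

snapshots = [{"time": 10, "data": {"a": 1}}]
change_log = [
    {"time": 12, "op": "put", "key": "b", "value": 2},
    {"time": 15, "op": "delete", "key": "a"},  # the erroneous change
    {"time": 18, "op": "put", "key": "c", "value": 3},
]
recovered = recover_to_point_in_time(snapshots, change_log, target_time=14)
```

Recovering to time 14 keeps the valid change at time 12 and leaves out the erroneous delete at time 15 along with everything after it.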

The challenges of HTAP: Supporting multiple storage engines

Metadata support

• Mapping storage to query engine metadata
• Handling storage engine specific options
• Support provided for external tables
• Changes to external tables outside of the query engine
• Operational vs. analytics objects

The challenges of HTAP: Supporting multiple storage engines

Performance, scale, and concurrency considerations

• Transactional consistency across bulk loads
• Rowset inserts and selects
• Fast scanning options – snapshot scans, prefetching
• Integration for parallel operations
• Concurrency and mixed workload capability
• Elastic scale for Cloud deployments

As nodes are added, the query engine immediately uses them for queries and transactions.

The storage engine rebalances data automatically.

The challenges of HTAP: Supporting multiple storage engines

Error handling

• Storage and query engine error logging
• Mapping of storage engine errors to meaningful error messages and resolution options by the query engine

The challenges of HTAP: Supporting multiple storage engines

Other operational aspects

• Minimize operational and performance impact of storage engine operational aspects, e.g., compaction or splitting

The challenges of HTAP: Same data model for all workloads …

Normal form
• 1NF
• 2NF
• 3NF
• BCNF
• 4NF
• 5NF
• 6NF

Star Schema

Snowflake Schema

Query engine integration with storage engine(s) to support all these data models


The challenges of HTAP: Same data model for all workloads

… and these!

NoSQL data models: "NoSQL Data Modeling Techniques" by Ilya Katsov, Highly Scalable Blog

The challenges of HTAP: Enterprise-caliber capabilities

• High Availability
• Security
• Manageability

The challenges of HTAP: Enterprise-caliber capabilities

High Availability

• Percentage of uptime: from 99.99% (52.56 minutes of downtime per year) to 99.999% (5.26 minutes per year)
• Online operations (data available for reads and writes)
  ◦ Upgrading the OS
  ◦ Upgrading the file system
  ◦ Upgrading the storage engine
  ◦ Upgrading the query engine
  ◦ Redistributing data to accommodate node and/or disk expansions and contractions
  ◦ Changing table definition, e.g., data type changes, and adding, dropping, renaming columns
  ◦ Creating/dropping secondary indexes
  ◦ Full and incremental backups
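The downtime figures above follow directly from the uptime percentage applied to the minutes in a year. A quick check, assuming a 365-day year (the function name `downtime_minutes_per_year` is just for illustration):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years

def downtime_minutes_per_year(uptime_percent):
    """Minutes of allowed downtime per year for a given uptime percentage."""
    return MINUTES_PER_YEAR * (100.0 - uptime_percent) / 100.0

four_nines = downtime_minutes_per_year(99.99)
five_nines = downtime_minutes_per_year(99.999)
```

Rounded to two decimals these give 52.56 and 5.26 minutes, so every planned maintenance operation in the list has to fit inside that budget or run online.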


The challenges of HTAP: Enterprise-caliber capabilities

Manageability

Schema Management | Performance Management | Monitoring | Security Management | BAR Management
Object Management | Performance Monitoring | Database Monitor | User Management | Backup Analysis
Graphical Object Editor | Live Performance Monitoring | Event Monitoring | Role Management | Recovery
Cross-Platform Schema | Knowledge Data Repository | Live Event Monitoring | Account Migration | Log Backup
 | Bottleneck Analysis | Threshold Alerts | Audit Report | Backup Reports
SQL Management | Job/Workload Analysis | Health Index | Alarm | Archival
Query Builder | Job/Workload Wizard | Live Health Monitoring | |
Visual Difference Tool | Job/Workload Management | Response Times | Maintenance | Configuration Management
Data Management | Live Job/Workload Monitoring | Alert Center | Repository Aging | OS Provisioning
Data Migration | OS Analysis | Remote Monitoring | Automated Maintenance | Cluster Provisioning
SQL Profiler | Capacity Capture | Central Monitoring | | Instance Provisioning
Automated Import | Capacity Trending | Hardware Inventory | Change Management | Cloud Provisioning
Visual Explain Plans | Capacity Forecast | Hardware Monitoring | Schema Capture | Configuration Editor
Session Management | Space Management | | Schema Compare and Synch |
Lock Management | Reorganization Management | Troubleshooting | Notifications |
Process Management | Query Cost Simulation | Health Analysis | Schema Rotation |
Consistency Checks | Historical Reports | Problem Correlation | Collaboration |
Online Schema Evolution | Bottleneck Tuning | Automated Actions | Virtual Changes |
Built-In Automation | Access Path Analysis | | |

The challenges of HTAP: Enterprise-caliber capabilities

Manageability

• Operational performance by transactions per second
• Analytical performance by query
• Overhead of gathering metrics on operational and analytical workloads
• Configurable statistics collection
• Workload management by Service Level Objectives
  ◦ Based on priority and/or resource allocation
  ◦ High priority operational workloads vs analytical workloads
• End-to-end visibility of transaction and query metrics
• Metric breakdown down to the query operation
• Metrics for table access across workloads down to the partition level
• Skew or bottlenecks
• Integration with YARN

Conclusion

It ain’t easy!! Very few products can even come close.

Detailed O’Reilly report: http://www.oreilly.com/data/free/in-search-of-database-nirvana.csp