Oracle Week 2016 - Modern Data Architecture

Modern Operational Data Architecture

Arthur Gimpel, DataZone

About Me

• Name: Arthur Gimpel

• Position: Technology Evangelist, Solutions Architect, Trainer

• Tech Stack: MongoDB, SQL Server, Couchbase, Elastic Stack, Redis, Kafka, Python, .NET

Relational Databases

• The first RDBMS products were introduced in the late 1970s

• Exist in all possible flavors but share one thing - ACID

• Still dominate the database market

RDBMS In Theory

• Atomicity: All or nothing approach, transactions

• Consistency: Hard state, every transaction moves the database from one valid state to another

• Isolation: Transactions cannot interfere with each other

• Durability: Every transaction is persisted
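As a rough illustration of atomicity, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the account names, and the transfer and balances helpers are assumptions made up for this example; they are not part of the talk.

```python
import sqlite3

# In-memory database; the schema and the names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was applied.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


# alice only has 100, so the second UPDATE fails and the credit to bob is undone.
ok = transfer(conn, "alice", "bob", 500)
print(ok, balances(conn))
```

The credit to bob runs first on purpose: when the debit then violates the constraint, the rollback shows that the earlier, already executed statement is undone as well.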

RDBMS Is Not Perfect

• Everything is persisted, synchronously. Limited by IO performance

• All data is bound to a tabular schema, hard to make changes in big databases

• ACID makes horizontal scaling nearly* impossible

• Complex schema slows down aggregations and queries drastically

NoSQL

• Distributed / Horizontal Scalability

• Mostly Open Source

• Mostly schemaless:

  • Key - Value

  • Document

  • Graph

• Serves specific purposes

NoSQL - Key Value Stores

• Key:

  • Usually string, equivalent to primary key in a relational database

• Value:

  • Simple values: Int, Float, DateTime

  • Complex values: Array, Binary, XML, JSON

Key Value - Characteristics

• Database is usually a set of unique keys and their values

• KV data stores are usually easy to distribute

• Key Value access usually is VERY fast

• Indexing and querying values is usually challenging
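The characteristics above can be sketched with a toy in-memory store. The ShardedKV class and its methods are hypothetical names invented for this illustration, and plain modulo hashing is a simplification; production stores typically use more elaborate schemes such as consistent hashing.

```python
import hashlib


class ShardedKV:
    """Toy key-value store that spreads keys over several in-memory 'nodes'."""

    def __init__(self, node_count):
        # Each dict stands in for one server in a cluster.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash, so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        # Access by key touches exactly one node and one dict lookup.
        return self._node_for(key).get(key, default)

    def find_by_value(self, predicate):
        # No index on values: every node and every entry has to be scanned.
        return [k for node in self.nodes for k, v in node.items() if predicate(v)]


store = ShardedKV(node_count=3)
store.put("user:1", {"age": 30})
store.put("user:2", {"age": 41})
print(store.get("user:1"))
print(store.find_by_value(lambda v: v["age"] > 35))
```

Lookups by key need only the hash of the key, which is why distribution is easy and access is fast. A query on values has to visit every node, which is why indexing and querying values is the hard part.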

Key Value - Use Cases

• Distributed caching

• Session / temporary user data

• Ad tech: Impressions

• Ad tech: Serving data - profiles, segments

• Recommendation engines - main data store
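For the session / temporary data use case, a minimal sketch of a key-value store with expiring entries might look like this. SessionStore, its methods, and the injectable clock are assumptions for illustration only; real key-value stores usually offer expiry as a built-in feature.

```python
import time


class SessionStore:
    """Toy session store: each key expires ttl_seconds after it was last set."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._data = {}     # session_id -> (expires_at, payload)

    def set(self, session_id, payload):
        self._data[session_id] = (self.clock() + self.ttl, payload)

    def get(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if self.clock() >= expires_at:
            del self._data[session_id]  # lazy expiry: remove on first read after TTL
            return None
        return payload


# A fake clock (hypothetical helper) lets the example advance time by hand.
now = [0.0]
sessions = SessionStore(ttl_seconds=30, clock=lambda: now[0])
sessions.set("s1", {"user": "arthur"})
fresh = sessions.get("s1")
now[0] = 31.0
expired = sessions.get("s1")
print(fresh, expired)
```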

NoSQL - Graph Stores

“In computing, a graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data” (Wikipedia)

Graph - Characteristics

• Nodes are entities - for example a person

• Properties describe nodes - for example age, name

• Edges are relations between nodes and/or properties
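The node / property / edge model above can be sketched in plain Python. All names, ages, and the KNOWS relation are made up for the example, and a real graph database would expose a query language instead of these hypothetical helper functions.

```python
from collections import deque

# Nodes are entities; the dict attached to each node holds its properties.
nodes = {
    "ann": {"type": "person", "age": 34},
    "bob": {"type": "person", "age": 29},
    "cat": {"type": "person", "age": 41},
    "dan": {"type": "person", "age": 25},
}

# Edges are directed relations: (source, relation, target).
edges = [
    ("ann", "KNOWS", "bob"),
    ("bob", "KNOWS", "cat"),
    ("cat", "KNOWS", "dan"),
]


def neighbors(node, relation):
    return [dst for src, rel, dst in edges if src == node and rel == relation]


def within_hops(start, relation, max_hops):
    """Breadth-first traversal: nodes reachable from start in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors(node, relation):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found


# "Friends of friends": everyone ann can reach in one or two KNOWS hops.
reachable = within_hops("ann", "KNOWS", 2)
print(reachable, [nodes[n]["age"] for n in reachable])
```

Traversals like this one are the core operation behind the link analysis use cases; in a relational schema each extra hop would typically become another self-join.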

Graph - Use Cases

• Fraud detection

• Recommendation engines - link analysis

• Intelligence systems

• Social Networks

• Medical Research

NoSQL - Document Stores

• Document databases usually store JSON

• Used to store object oriented data

• Usually used to avoid relational - object mismatch

• Document stores have the highest adoption rate among NoSQL databases

Document Store - Characteristics

• Information is stored in JSON variations

• Some document stores support secondary indexes for easier querying

• Documents are usually divided into logical groups (collections, buckets, types - instead of RDBMS tables)
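To make these characteristics concrete, here is a minimal sketch of a collection of JSON documents with an optional secondary index. The Collection class, its methods, and the sample documents are hypothetical and only illustrate the idea; indexed values are assumed to be simple hashable values such as strings.

```python
import json
from collections import defaultdict


class Collection:
    """Toy document collection: JSON documents keyed by _id, plus secondary indexes."""

    def __init__(self):
        self._docs = {}     # _id -> JSON string
        self._indexes = {}  # field -> {value -> set of _ids}

    def create_index(self, field):
        index = defaultdict(set)
        for doc_id, raw in self._docs.items():
            doc = json.loads(raw)
            if field in doc:
                index[doc[field]].add(doc_id)
        self._indexes[field] = index

    def insert(self, doc):
        doc_id = doc["_id"]
        if doc_id in self._docs:
            raise KeyError(f"duplicate _id: {doc_id}")
        self._docs[doc_id] = json.dumps(doc)
        # Keep every existing index up to date.
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)

    def find(self, field, value):
        if field in self._indexes:
            # Indexed field: jump straight to the matching ids.
            ids = self._indexes[field].get(value, set())
            return [json.loads(self._docs[i]) for i in sorted(ids)]
        # No index: scan and parse every document.
        return [d for d in map(json.loads, self._docs.values()) if d.get(field) == value]


users = Collection()
users.insert({"_id": "u1", "name": "Ann", "city": "Haifa", "tags": ["admin"]})
users.insert({"_id": "u2", "name": "Bob", "city": "Tel Aviv"})
users.create_index("city")
users.insert({"_id": "u3", "name": "Cat", "city": "Haifa"})
print([d["name"] for d in users.find("city", "Haifa")])
```

Documents in the same collection do not have to share the same fields (only u1 has tags), which is the schemaless property mentioned earlier.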

Document Store - Use Cases

• “Relational” use cases where there is a need for high scale (volume, velocity, variety)

• Hierarchical data - aggregations

• Search use cases

NoSQL - Challenges

• Every data store has its purpose. There is no single solution to all database needs

• NoSQL does not implement all of RDBMS’s abilities (CDC, Jobs, Stored Procedures, Triggers)

• Every data store has its own language and APIs. There is no ANSI SQL

Not Only SQL

Polyglot Persistence Sample Use Cases

• Add search capabilities to your database

• Split session / temporary data processing to key value stores

• Add Graph analysis capabilities to your operational database

Search Use Case

Search: Architecture #1

Search: Architecture #2

Architecture Comparison

                            | Architecture #1                | Architecture #2
Data distribution strategy  | Data store based               | Application based
Data distribution component | Data Pipeline                  | Message Queue
Implementation Team         | Data Engineers / DevOps        | DevOps / Developers
Implementation Complexity   | Low: Data pipeline development | High: data access layer refactor
Scalability                 | Limited to RDBMS Scale         | Fully scalable regardless of RDBMS
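The diagrams for the two architectures are not reproduced in this text, so the following is only one possible reading of the comparison above, with in-memory stand-ins for the RDBMS, the search engine, and the message queue. Every class and function name here is hypothetical.

```python
from collections import deque


class Store:
    """Stand-in for any data store (RDBMS, search engine); just a dict."""

    def __init__(self):
        self.rows = {}

    def upsert(self, key, doc):
        self.rows[key] = doc


# Architecture #1 (data store based): the application writes only to the RDBMS,
# and a separate data pipeline copies rows from the RDBMS to the search engine.
def app_write_v1(rdbms, key, doc):
    rdbms.upsert(key, doc)


def run_pipeline(rdbms, search):
    # The pipeline reads from the RDBMS, so it can never outpace it.
    for key, doc in rdbms.rows.items():
        search.upsert(key, doc)


# Architecture #2 (application based): the application publishes each change
# to a message queue, and every data store is fed from the queue.
def app_write_v2(queue, key, doc):
    queue.append((key, doc))


def drain(queue, *stores):
    while queue:
        key, doc = queue.popleft()
        for store in stores:
            store.upsert(key, doc)


# Architecture #1 in action
rdbms1, search1 = Store(), Store()
app_write_v1(rdbms1, 1, {"title": "Modern Data Architecture"})
run_pipeline(rdbms1, search1)

# Architecture #2 in action
rdbms2, search2, queue = Store(), Store(), deque()
app_write_v2(queue, 1, {"title": "Modern Data Architecture"})
drain(queue, rdbms2, search2)

print(search1.rows == rdbms1.rows, search2.rows == rdbms2.rows)
```

In the first variant the application code is untouched and only the pipeline is new, which matches the "Low" complexity entry. In the second, every write path in the application has to go through the queue, which matches the data access layer refactor, but the search engine no longer depends on reading from the RDBMS.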

Summary

• Choose the right database engine for each mission - replacing databases is not easy

• Do not hesitate to use more than one database engine in your operational application; a single point of truth will be created in the analytical stack

• Sizing is no replacement for benchmarking. Check your deployment carefully

DataZone Advanced Data Solutions

Enterprise Search

Data Flow Management

Centralized Logging

Operational Analytics

Polyglot Persistence

Business Analytics

DataZone Scale With Confidence

Troubleshooting & Tuning

Technological Evaluation

Training Services

Architecture Review

Cost Management

End-to-End Implementations

Infrastructure Support / DevOps

Our Ecosystem

Keep in touch: contact@DataZone.io