Apache Trafodion™ (incubating)
Push Hadoop Beyond Analytics
trafodion.apache.org
Speaker: Rao Kakarlamudi (rao.kakarlamudi@esgyn.com)
© 2015 Esgyn Corporation
Use Case: Internet of Things
Business Needs
◦ Enormous vehicle fleet
◦ Real-time capture, monitoring, and analysis at scale with high concurrency
Problem
◦ Optimize usage
◦ Understand scheduling
◦ Understand maintenance
◦ Real-time customer information
Challenge
◦ 559 million vehicle records per day
◦ Sub-second response time
◦ Sustained performance at >100 concurrent users
Solution
◦ Trafodion on standard x86 Linux cluster
◦ Data load, query, and extract in parallel
◦ Users can query both current and historical data
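To put the 559 million records per day in perspective, the figure can be converted to an average per-second rate. This is a back-of-the-envelope sketch that assumes records arrive evenly over 24 hours, which the slide does not state; real peaks would be higher. The function name is illustrative only.

```python
# Back-of-the-envelope ingest rate for the vehicle fleet use case.
RECORDS_PER_DAY = 559_000_000    # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

def average_rate_per_second(records_per_day: int) -> float:
    """Average sustained rate, ASSUMING uniform arrival over the day."""
    return records_per_day / SECONDS_PER_DAY

avg = average_rate_per_second(RECORDS_PER_DAY)
print(f"average ingest rate: {avg:,.0f} records/second")
```

Under that assumption the cluster has to absorb roughly 6,500 records every second, around the clock, while still answering queries.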
Use Case: Finance
Business Needs
◦ Customers need to query their recent balances and their transactions from months or even years ago.
◦ They also want more information than can easily be stored in a separated architecture story (vendor name, ATM location, transfer location, etc.).
Problem
◦ Retail businesses receive a high overall volume of transactions from a wide variety of sources, such as credit card transactions, tellers, electronic transfers, and ATMs.
◦ Customers make queries about individual transactions in the last day, month, and year, but the storage and query performance required to give full information about all transactions is beyond the capacity of traditional architectures.
Challenge
◦ Query data from the current day’s transactions with high reliability and low latency, without impacting the performance of the primary transactional system.
Solution
◦ EsgynDB initially provides an ODS for the mission-critical transaction system, offloading near-real-time queries there to allow the primary transactional system to meet its SLAs.
◦ The same data lake also includes the historical data, allowing for seamless connection of data over time, with no extra data replication. And with EsgynDB’s ability to integrate structured, semi-structured, and unstructured data, customers and employees have access to more information about each transaction.
Use Case: Telecommunications
Business Needs
◦ 24x7 ingest and analysis of voice, SMS, and data file business transactions
◦ Build new solutions for 100s of millions of users
Problem
◦ Up-to-date information within a few minutes
◦ Support and upsell
◦ Trust your data
Challenge
◦ Load GB of data in minutes on an ongoing basis
◦ Comprehensive queries against historical and recent data
◦ Data quality and rapid analysis to engage customer
Solution
◦ Trafodion on standard x86 Linux cluster
◦ Ingest raw data at arrival, rate and load into Trafodion
◦ Transactional inquiries
◦ Detail reports
Use Case: E-Commerce
Business Need
◦ Ad-driven revenue model
◦ Need near-real-time decisions to optimize ad placement
Problem
◦ Log files → Hive → Traditional Database
◦ Too slow to meet business requirements
Challenge
◦ 2 TB of data daily, 42 GB/hour peak
◦ Misses critical data + lots of redundant data
◦ High-volume transactions and concurrency
◦ Produce account summaries in hours
Solution
◦ Query Hive data directly; store in Trafodion
◦ Same data lake, no ETL needed
◦ Near-real-time data access using SQL
(C) Copyright 2015 Esgyn Corporation Esgyn Confidential
Trafodion Brings:
• Open Source Apache Trafodion (Incubating) project and license
• Hadoop HBase scalability up to petabytes
• Full ANSI SQL support
• ACID Transactions (Atomic, Consistent, Isolated, Durable) across rows, tables, and servers
• Cost effective scale out
• Enterprise ready active-active replication across multiple data centers
• ODBC / JDBC / ADO.Net / Hibernate support
• Proven and hardened database engine with 20+ years of Tandem / Compaq / HP innovation
• Data federation (e.g. Kafka) and schema flexibility
• Optimized for real-time transaction processing, operational reporting, and operational data store (ODS) workloads that demand sub-second response times with high concurrency
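The ACID bullet above can be illustrated with a small all-or-nothing transfer. This sketch uses Python's built-in `sqlite3` module purely as a single-node stand-in; it does not connect to Trafodion and says nothing about how Trafodion coordinates transactions across tables and servers. The `accounts` table and the `transfer` helper are hypothetical names chosen for the example.

```python
import sqlite3

# In-memory SQLite database used ONLY as a stand-in for an ACID SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both row updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False

def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

ok = transfer(conn, 1, 2, 30)        # succeeds
failed = transfer(conn, 1, 2, 500)   # would overdraw account 1, so it is undone
final = balances(conn)
print(ok, failed, final)
```

The second transfer credits account 2 before the debit fails, yet the final balances show no trace of that credit: this is the atomicity guarantee the slide refers to, shown here on one node only.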
Trafodion Stack Overview – Running Queries
[Figure: Trafodion stack diagram. Labels, top to bottom: Client (User and ISV Operational Applications); Database Connectivity (JDBC / ODBC Driver); SQL (Master, CMP, ESP, DTM; Compiler and Optimizer, SQL Parallelism, Distributed Transaction Management); Data Store Integration (Storage Engine: HBase, Hive); Relational Schema (Trafodion Tables), Native HBase Tables (KVS, Columnar), Native Hive Tables (Multi-Structured); HDFS.]
Why Apache Trafodion?
Ingredients for a world-class relational database
1. Time, Money, and Talent
◦ 20+ years of investment
◦ $300+ million invested
◦ Database developers grew up on
  ◦ Shared nothing Massively Parallel Architecture
  ◦ With a single system image across clusters
◦ 300+ years of database experience
  ◦ On building OLTP and BI engines
ANSI and non-ANSI functionality supported, performance, scalability, concurrency, throughput, stability, high availability, transactional, UDF, SPJ, OLAP, etc.
2. World Class Optimizer
◦ Rule-driven and cost-based optimizer
◦ Based on Cascades & Large Scope Rules
  ◦ Reduces search space
  ◦ Recognizes patterns such as star joins
◦ Considers multiple join strategies
  ◦ Nested and nested cache for operational
  ◦ Merge and hybrid hash for large complex queries
◦ Optimizes inner, outer, & full outer joins
◦ Considers serial & parallel plans based on cardinality
◦ Uses equal-height histograms to indicate skew
◦ Leverages skew buster to eliminate skew
◦ Un-nests subqueries
◦ Converts correlated subqueries to joins
◦ Pushes down predicates to lowest operation
  ◦ Filters e.g. row selection (start-stop key)
  ◦ Coprocessors e.g. pre-aggregation
◦ Leverages Multi-Dimensional Access (MDAM)
  ◦ To avoid full scans when no predicates on leading key columns specified
◦ Considers sort avoidance strategies
  ◦ Uses hash group by to avoid sorts
  ◦ Leverages key order
  ◦ Does in-memory sort when possible
◦ Uses sophisticated plan caching techniques
◦ And a lot more …
Built & tuned to handle complexities & differences inherent in varied enterprise class workloads
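The MDAM bullet describes avoiding a full scan when a query has no predicate on the leading key column. The idea can be sketched over a sorted list of composite keys: instead of reading every row, probe once per distinct leading-key value and skip ahead. This is a simplified illustration, not Trafodion's implementation; the function names and the `(a, b)` key layout are assumptions made for the example, and a real optimizer would choose this access path based on cost.

```python
from bisect import bisect_left, bisect_right

def full_scan(rows, b_value):
    """Baseline: examine every row and keep those whose second column matches."""
    return [r for r in rows if r[1] == b_value], len(rows)

def mdam_style_scan(rows, b_value):
    """Probe (a, b_value) once per distinct leading value a, skipping the rest.

    `rows` must be a list of (a, b) tuples sorted in key order.
    Returns the matching rows and the number of probes performed.
    """
    matches, probes, pos = [], 0, 0
    while pos < len(rows):
        a = rows[pos][0]
        # Jump directly to the position of (a, b_value) inside this key group.
        lo = bisect_left(rows, (a, b_value), pos)
        hi = bisect_right(rows, (a, b_value), lo)
        probes += 1
        matches.extend(rows[lo:hi])
        # Skip to the first row whose leading column is greater than a.
        pos = bisect_right(rows, (a, float("inf")), hi)
    return matches, probes

# 5 distinct leading-key values, 1000 second-column values each: 5000 rows.
rows = sorted((a, b) for a in range(5) for b in range(1000))
scan_matches, examined = full_scan(rows, 417)
mdam_matches, probes = mdam_style_scan(rows, 417)
print(f"full scan examined {examined} rows; MDAM-style scan used {probes} probes")
```

Both approaches return the same five rows, but the probing version touches the index once per leading-key value. The benefit shrinks as the number of distinct leading values grows, which is why the choice is left to a cost-based optimizer.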
[Figure: cluster diagram. A Client Application connects over Ethernet to Node 1, Node 2, ... Node n; each node shows HBase (with Filters and Coprocessors) on top of HDFS.]
3. World Class Parallel Data Flow Execution Engine
◦ Data Flow pipeline parallel architecture
◦ Intermediate results materialized only for blocking operations like sorts
◦ Data overflow to disk only for large hash joins
◦ Adaptive Segmentation to use only needed resources
◦ Co-located joins & repartitioning when necessary
◦ Uses inner and outer child broadcasts
◦ Parallel secondary index maintenance
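The difference between pipelined and blocking operators mentioned in the list above can be sketched with Python generators. This is a toy single-process model, not the Trafodion executor; the operator names (`scan`, `select`, `sort_by`) and the `produced` log are hypothetical and exist only to show how many rows the scan has emitted before the first result appears.

```python
produced = []  # log of rows the scan operator has emitted so far

def scan(table):
    """Leaf operator: emits rows one at a time."""
    for row in table:
        produced.append(row)
        yield row

def select(rows, predicate):
    """Streaming operator: passes each qualifying row on immediately."""
    for row in rows:
        if predicate(row):
            yield row

def sort_by(rows, key):
    """Blocking operator: must consume ALL input before emitting anything."""
    yield from sorted(rows, key=key)

table = [(i, (i * 7) % 10) for i in range(10)]

def is_large(row):
    return row[1] > 5

# Pipelined plan: the first result is available after scanning only two rows.
produced.clear()
first_streamed = next(select(scan(table), is_large))
rows_before_first_streamed = len(produced)

# Plan with a sort: the whole input is materialized before the first result.
produced.clear()
first_sorted = next(sort_by(select(scan(table), is_large), key=lambda r: r[1]))
rows_before_first_sorted = len(produced)

print(first_streamed, rows_before_first_streamed)
print(first_sorted, rows_before_first_sorted)
```

In the pipelined plan nothing is buffered between operators, whereas the sort forces the entire input to be read first. That is the sense in which intermediate results are materialized only for blocking operations.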
[Figure: multi-fragment parallel plan showing a Master process and layers of ESP processes.]
Supports salting of data across region servers
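Salting spreads rows across region servers by prefixing each row key with a bucket number derived from a hash of the key, so that monotonically increasing keys such as timestamps do not all land on one server. The sketch below shows the general technique only; the hash function, the key format, and the helper names are assumptions for illustration and are not taken from Trafodion.

```python
import hashlib
from collections import Counter

def salt_bucket(key: str, num_buckets: int) -> int:
    """Map a row key to a bucket in [0, num_buckets) with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets

def salted_row_key(key: str, num_buckets: int) -> str:
    """Prefix the key with its bucket so storage order is bucket-major."""
    return f"{salt_bucket(key, num_buckets):02d}|{key}"

# Sequential keys would normally all append to the same end of the key space.
NUM_BUCKETS = 8
keys = [f"2015-11-01T00:00:{i:04d}" for i in range(1000)]
counts = Counter(salt_bucket(k, NUM_BUCKETS) for k in keys)

for bucket in sorted(counts):
    print(f"bucket {bucket}: {counts[bucket]} rows")
print(salted_row_key(keys[0], NUM_BUCKETS))
```

The trade-off is that a range scan over the original key order now has to visit every bucket, so salting favors write distribution and point lookups over ordered scans.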
4. World Class Distributed Transaction Management system
Performance
YCSB and Order Entry scale linearly!
[Figure: throughput charts for Transactional Order Entry and for YCSB (Selects, Updates, and 50/50 mix); vertical axis: Throughput.]
Try and Contribute
Apache Trafodion Download:
◦ trafodion.apache.org
Try Trafodion on AWS:
◦ https://aws.amazon.com/marketplace/pp/B018RBMFG0
Documentation:
◦ trafodion.apache.org
Become a contributor: add a new feature, fix a bug, translate documentation, and more
◦ Discuss your changes on the dev mailing list
◦ Create a JIRA issue
◦ Set up your development environment
◦ Prepare a patch containing your changes
◦ Submit the patch