[Figure: platform architecture. Components: operator systems (OCS, IN, CDR, PCC, CRM); data flow integration layer; RT complex event processing; decisioning engine; iCLM UI (marketing operations; business discovery, monitoring & reporting; visual rules); subscriber data store (HBase); marketing, CSR and monitoring; big data analytics (Hadoop M/R for event, aggregation and profile data; Hive DWH); subscriber profile; channels.]
We conducted an RFP to select the most telco-grade platform. The RFP focused on non-functional capabilities such as sustainable performance, high availability and manageability.
The approach
Each step should increase scalability and reduce TCO.
Runtime (OLTP) processing:
We replace the underlying plumbing with minimal changes to business logic. All changes can be turned on/off through GUI configuration:
Modular hybrid architecture. Ability to work in dual mode: good for QA, but also for production (legacy).
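The dual-mode idea above can be pictured as a thin routing layer in front of two storage backends, with switches that a GUI would flip. This is a minimal Python sketch under stated assumptions: `DictStore`, `Toggles`, `DualModeStore` and the flag names are hypothetical, the backends are in-memory dictionaries standing in for the legacy and new stores, and nothing here reflects the authors' actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


class DictStore:
    """In-memory stand-in for a storage backend (hypothetical)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def put(self, key: str, value: str) -> None:
        self.data[key] = value


@dataclass
class Toggles:
    """Hypothetical switches that a GUI configuration screen would control."""
    write_legacy: bool = True
    write_new: bool = False
    read_from_new: bool = False


class DualModeStore:
    """Routes reads and writes between a legacy and a new backend."""

    def __init__(self, legacy: DictStore, new: DictStore, toggles: Toggles) -> None:
        self.legacy = legacy
        self.new = new
        self.toggles = toggles

    def put(self, key: str, value: str) -> None:
        # Dual mode = both write flags on: every write reaches both backends.
        if self.toggles.write_legacy:
            self.legacy.put(key, value)
        if self.toggles.write_new:
            self.new.put(key, value)

    def get(self, key: str) -> Optional[str]:
        # Reads are served by exactly one backend; flipping the flag switches over.
        source = self.new if self.toggles.read_from_new else self.legacy
        return source.get(key)

    def mismatches(self, keys: Iterable[str]) -> List[str]:
        """QA aid: keys whose values differ between the two backends."""
        return [k for k in keys if self.legacy.get(k) != self.new.get(k)]
```

With both write flags on, the new store receives production traffic while reads still come from legacy, and `mismatches` gives QA a direct comparison before `read_from_new` is switched on.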
Analytics processing:
Calculate the profile in M/R (Java). Scalable. We have the best Java developers.
Wrap it with a DSL (Domain-Specific Language). That's how we have worked for years (see the ModelTalk paper). Non-Java programmers can do the job.
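To illustrate what wrapping Map/Reduce with a DSL can mean, the sketch below keeps the profile definition in a declarative table and uses generic map and reduce functions to interpret it. It is written in Python for brevity rather than Java; `PROFILE_SPEC`, the event fields and the aggregation names are hypothetical and are not the ModelTalk DSL.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical declarative spec: (profile attribute, event type, aggregation).
# Analysts edit this table; the generic map/reduce code below stays unchanged.
PROFILE_SPEC: List[Tuple[str, str, str]] = [
    ("total_topup", "topup", "sum"),
    ("call_count", "call", "count"),
    ("max_call_minutes", "call", "max"),
]

AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "count": len,
    "max": max,
}

Event = Dict[str, object]  # e.g. {"subscriber": "s1", "type": "call", "amount": 3}
Grouped = Dict[Tuple[str, str], List[float]]


def map_phase(events: Iterable[Event]) -> Grouped:
    """Emit each event's amount under (subscriber, attribute), as a shuffle would group it."""
    grouped: Grouped = defaultdict(list)
    for event in events:
        for attribute, event_type, _ in PROFILE_SPEC:
            if event["type"] == event_type:
                grouped[(str(event["subscriber"]), attribute)].append(float(event["amount"]))
    return grouped


def reduce_phase(grouped: Grouped) -> Dict[str, Dict[str, float]]:
    """Apply the aggregation named in the spec to each (subscriber, attribute) group."""
    aggregation_of = {attribute: agg for attribute, _, agg in PROFILE_SPEC}
    profiles: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (subscriber, attribute), values in grouped.items():
        profiles[subscriber][attribute] = AGGREGATORS[aggregation_of[attribute]](values)
    return dict(profiles)
```

Adding a profile attribute then means adding a row to the spec, not writing new Map/Reduce code.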
Legacy Architecture
Phase 1
Phase 1 – File queues in NFS
Resulting Context
Pure plumbing change: no changes to business logic code. Offloading Oracle: 2x performance boost. No Big Data technology. Windows NFS client performance is a bottleneck.
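The source does not describe how the NFS file queues work. A common pattern, shown here as an assumption, is to write each message to a temporary file and publish it with a rename, and to have consumers claim a message by renaming it; this relies on rename being atomic on the mounted file system, which should be verified for the specific NFS client and server. `enqueue`, `dequeue` and the file-name conventions are hypothetical.

```python
import os
import tempfile
import time
import uuid
from typing import Optional


def enqueue(queue_dir: str, payload: bytes) -> str:
    """Write the message to a hidden temp file, then rename it into the queue.

    Consumers only look at '*.msg' names, so they never see a half-written file.
    """
    fd, tmp_path = tempfile.mkstemp(dir=queue_dir, prefix=".tmp-")
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    # Time-based prefix gives roughly FIFO order; the UUID keeps names unique.
    final_path = os.path.join(queue_dir, f"{time.time_ns():020d}-{uuid.uuid4().hex}.msg")
    os.rename(tmp_path, final_path)
    return final_path


def dequeue(queue_dir: str, worker: str) -> Optional[bytes]:
    """Claim the oldest visible message by renaming it; if two workers race,
    only one rename succeeds and the other moves on."""
    for name in sorted(os.listdir(queue_dir)):
        if not name.endswith(".msg"):
            continue
        src = os.path.join(queue_dir, name)
        claimed = f"{src}.{worker}.claimed"
        try:
            os.rename(src, claimed)
        except FileNotFoundError:
            continue  # another worker claimed it first
        with open(claimed, "rb") as f:
            data = f.read()
        os.remove(claimed)  # a real consumer would delete only after processing
        return data
    return None
```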
Phase   | Customers | Events
Legacy  | 10M       | 120M
Phase 1 | 10M       | 200M
Phase 2
Phase 2 – Introducing MapR Hadoop Cluster
Resulting Context
MapR FS + NFS: horizontally scalable; cheap compared to high-end NFS solutions; fast and highly available (using VIPs). Avoids another hop into HDFS (such as Flume or Kafka). Many small files (100s of millions) are stored in HDFS with no need to merge them.
Phase   | Customers | Events
Legacy  | 10M       | 120M
Phase 1 | 10M       | 200M
Phase 2 | unlimited | 200M
Resulting Context
Avro files:
Complex object graphs. Troubleshooting with Pig. Out-of-the-box schema upgrades (e.g. adding a field). Map/Reduce is incremental: the Avro record captures the subscriber state. Map/Reduce efficiency: avoiding huge joins.
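The incremental Map/Reduce point can be sketched as follows: the previous per-subscriber state record and the new raw events are emitted under the same key (the subscriber), so the shuffle groups them together and the reducer folds the new events into the previous state, with no join against the full history. This is a simplified in-memory Python illustration; the record shapes and the `total`/`events` fields are hypothetical, and a real job would read and write Avro files.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

State = Dict[str, float]          # hypothetical per-subscriber profile state
Record = Tuple[str, str, dict]    # (subscriber, kind, payload); kind is "state" or "event"


def map_records(states: Dict[str, State], events: Iterable[Tuple[str, float]]) -> List[Record]:
    """Emit previous state and new events under the same key: the subscriber."""
    out: List[Record] = [(sub, "state", dict(st)) for sub, st in states.items()]
    out += [(sub, "event", {"amount": amount}) for sub, amount in events]
    return out


def reduce_records(records: Iterable[Record]) -> Dict[str, State]:
    """Fold each subscriber's new events into its previous state.

    The grouping step plays the role of the shuffle: state and events meet by
    key, so no separate join over the full event history is required.
    """
    grouped: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        grouped[rec[0]].append(rec)

    result: Dict[str, State] = {}
    for sub, recs in grouped.items():
        state: State = {"total": 0.0, "events": 0.0}  # new subscriber: empty state
        for _, kind, payload in recs:
            if kind == "state":
                state = dict(payload)
        for _, kind, payload in recs:
            if kind == "event":
                state["total"] += payload["amount"]
                state["events"] += 1
        result[sub] = state
    return result
```

Each daily run reads only the previous state plus one day of events, which is what makes the calculation incremental.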
Subscriber Profile calculation: runs in 2-3 hours. Linear scalability: no limit on the number of subscribers or the volume of raw data (buy more nodes). A fast run over historical data allows an early launch.
Sqoop: very fast inserts into MS-SQL (10s of millions of records in minutes). Data analysts started working in the Hive environment. No HA for Oozie yet. Hue is immature. MS-SQL and ODBC over Hive are slow and limited.
Phase 3
Phase 3 – Introducing MapR M7 Table
Extensive YCSB load tests to find the best table structure and read/update granularity. Main conclusions: M7 can handle a very big heap (90GB). Update granularity: small updates (using columns) = fast reads(*), whereas other KV stores need to update the entire BLOB.
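A rough way to see why column-level updates beat a BLOB layout is to count the bytes each write touches. This Python sketch ignores per-cell overhead (row key, timestamp) and uses JSON as a stand-in for whatever serialization a BLOB would use; both functions are hypothetical.

```python
import json
from typing import Dict


def blob_update_bytes(row: Dict[str, str], column: str, value: str) -> int:
    """BLOB layout: the whole row is one serialized value, so changing one
    field means rewriting the entire blob (after reading it back first)."""
    updated = dict(row)
    updated[column] = value
    return len(json.dumps(updated, sort_keys=True).encode("utf-8"))


def column_update_bytes(column: str, value: str) -> int:
    """Column layout: only the changed qualifier and its value are written
    (per-cell overhead such as row key and timestamp is ignored here)."""
    return len(column.encode("utf-8")) + len(value.encode("utf-8"))
```

For a row of 50 columns of 100 characters each, a one-column change writes about 100 bytes in the column layout but the whole multi-kilobyte blob otherwise.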
CSR tables migrated from Oracle to M7 Table:
10s of billions of records; need sub-second random access per subscriber.
99.9% writes, by Runtime machines (almost every event-processing operation produces an update).
0.1% reads, by the customer's CSR representatives.
Rows: one per subscriber key, 10s of millions.
2 CFs, TTL 365 days, 1 version.
Qualifier: key: [date_class_event_id], value: record. Up to thousands per row.
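The CSR qualifier design (`[date_class_event_id]`, thousands per row) can be sketched with an in-memory sorted row. The encoding below, a zero-padded `YYYYMMDD` date prefix so that lexicographic order is chronological, is an assumption about the format, and `CsrRow` is hypothetical. In M7/HBase the TTL is enforced by the store on cell timestamps; `expire` only approximates that using the date embedded in the qualifier.

```python
from bisect import bisect_left
from datetime import date, timedelta
from typing import Dict, List, Tuple


def qualifier(day: date, event_class: str, event_id: str) -> str:
    # Hypothetical encoding of [date_class_event_id]: a YYYYMMDD prefix keeps
    # qualifiers in chronological order when sorted as strings.
    return f"{day:%Y%m%d}_{event_class}_{event_id}"


class CsrRow:
    """One row per subscriber; qualifiers kept sorted, like cells in a row."""

    def __init__(self) -> None:
        self.qualifiers: List[str] = []
        self.values: Dict[str, str] = {}

    def put(self, q: str, record: str) -> None:
        if q not in self.values:
            self.qualifiers.insert(bisect_left(self.qualifiers, q), q)
        self.values[q] = record

    def scan(self, start: date, end: date) -> List[Tuple[str, str]]:
        """Return cells whose date lies in [start, end], in chronological order."""
        lo = bisect_left(self.qualifiers, f"{start:%Y%m%d}")
        hi = bisect_left(self.qualifiers, f"{end + timedelta(days=1):%Y%m%d}")
        return [(q, self.values[q]) for q in self.qualifiers[lo:hi]]

    def expire(self, today: date, ttl_days: int = 365) -> None:
        """Approximate the TTL by dropping cells dated before the cutoff."""
        cutoff = f"{today - timedelta(days=ttl_days):%Y%m%d}"
        keep = bisect_left(self.qualifiers, cutoff)
        for q in self.qualifiers[:keep]:
            del self.values[q]
        self.qualifiers = self.qualifiers[keep:]
```

A CSR lookup for "this subscriber's events last week" then becomes a single-row range read rather than a query over billions of records.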
Resulting Context
Choosing the right features: nothing too demanding performance-wise. Easy to create and manage tables, though some tweaking is still needed. No cross-table ACID: we need to develop a solution for keeping consistency across M7 Table, Oracle and the file system. Hard for QA compared to an RDBMS: no easy way to query, so we need to develop tools.
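Since there is no cross-table ACID, consistency across M7 Table, Oracle and the file system has to be handled by the application. The source does not say how; one common approach, sketched here as an assumption, is an intent journal: record all writes of a logical operation first, apply them, and re-apply any unfinished operation on recovery, which requires every write to be idempotent. `IntentJournal` is hypothetical and keeps its journal in memory, whereas a real one would have to be durable.

```python
from typing import Dict, List, Tuple

Write = Tuple[str, str, str]  # (store name, key, value)


class IntentJournal:
    """Hypothetical intent log for keeping several stores consistent without
    cross-table transactions. Every write must be idempotent (a put of a final
    value is), because recovery may apply the same write twice."""

    def __init__(self, stores: Dict[str, Dict[str, str]]) -> None:
        self.stores = stores
        self.pending: Dict[int, List[Write]] = {}  # must be durable in practice
        self.next_id = 0

    def begin(self, writes: List[Write]) -> int:
        """Record the full set of writes before touching any store."""
        op_id = self.next_id
        self.next_id += 1
        self.pending[op_id] = list(writes)
        return op_id

    def apply(self, op_id: int, crash_after: int = -1) -> None:
        """Apply the writes in order; crash_after simulates a failure after N writes."""
        for i, (store, key, value) in enumerate(self.pending[op_id]):
            if i == crash_after:
                raise RuntimeError("simulated crash")
            self.stores[store][key] = value
        del self.pending[op_id]  # operation complete on every store

    def recover(self) -> None:
        """Re-apply every operation that did not complete."""
        for op_id in sorted(self.pending):
            self.apply(op_id)
```

This gives consistency after recovery, not isolation: a reader can still observe a partially applied operation in the meantime.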
Phase   | Customers | Events
Legacy  | 10M       | 120M
Phase 1 | 10M       | 200M
Phase 2 | unlimited | 200M
Phase 3 | unlimited | 300M
Phase 4
Phase 4 – Migrating OLTP features to M7 tables
Subscriber State table migrated from Oracle to M7 Table:
25% writes, by Runtime machines updating the state.
100% reads, by Runtime.
Rows: one per subscriber key, 10s of millions.
1 CF, TTL -1, 1 version.
YCSB used to validate the solution and the sizing model.
Qualifier: key: state_name, value: state value. Dozens per row, but only 10% are updated per event.
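Because only about 10% of a subscriber's state qualifiers change per event, an update can be limited to the changed columns. This hypothetical `changed_columns` helper computes that subset from the state before and after processing an event; how the real system tracks changed state is not described in the source.

```python
from typing import Dict, Optional


def changed_columns(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Qualifiers to write after an event: new or changed values, plus None
    for qualifiers that were removed (to be issued as deletes)."""
    changes: Dict[str, Optional[str]] = {q: v for q, v in after.items() if before.get(q) != v}
    for q in before.keys() - after.keys():
        changes[q] = None
    return changes
```

With 20 state qualifiers of which 2 change, the put carries 2 cells instead of 20.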
Subscriber Profile Table migrated from MS-SQL to M7 Table. Bulk insert once a day.
Outbound Queue Table migrated from MS-SQL to M7 Table.
Resulting Context
No longer dependent on Oracle for OLTP. Real-time processing can handle billions of events per day. Sizing is linear and easy to calculate:
80% of (number of subscribers × state size) should reside in cache. HW spec: 128GB RAM, 12 SAS drives.
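The sizing rule above turns into simple arithmetic. In this Python sketch, the state size and the share of each node's 128GB RAM usable as cache (`usable_ram_fraction`, 50% by default) are illustrative assumptions, not figures from the source; `nodes_needed` is hypothetical.

```python
import math


def nodes_needed(subscribers: int, state_bytes: int, cache_fraction: float = 0.8,
                 node_ram_gb: int = 128, usable_ram_fraction: float = 0.5) -> int:
    """Nodes required so that cache_fraction of all subscriber state fits in cache.

    node_ram_gb follows the stated HW spec; usable_ram_fraction (share of RAM
    available as cache) is an illustrative assumption, not a measured figure.
    """
    cache_bytes = subscribers * state_bytes * cache_fraction
    cache_per_node = node_ram_gb * 1024**3 * usable_ram_fraction
    return math.ceil(cache_bytes / cache_per_node)
```

For example, 50M subscribers with 4KB of state each need about 164GB of cache (80% of about 205GB), which is 3 nodes under these assumptions; doubling the subscriber count gives 5.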
Consistency management is very complicated.
Phase   | Customers | Events
Legacy  | 10M       | 120M
Phase 1 | 10M       | 200M
Phase 2 | unlimited | 200M
Phase 3 | unlimited | 300M
Phase 4 | unlimited | unlimited
Phase 5
Phase 5 – Decommission legacy RDBMS
Resulting Context
MySQL is not a new technology in our stack (it is part of the MapR distribution). Removing Oracle/MS-SQL from our architecture has a significant impact on system cost, deployment, monitoring, etc.
Atzmon Hen-Tov Lior Schachter