
Data on the Move: Transitioning from a Legacy Architecture to a Big Data Platform


[Slide diagram] Operator systems (OCS, IN, CDR, PCC, CRM) send their data flow through an Integration Layer into RT Complex Event Processing and the Decisioning Engine. The iCLM UI covers Marketing Operations, Business Discovery, Monitoring & Reporting, and Visual Rules. A Subscriber Data Store (HBase) serves Marketing/CSR Monitoring. Big Data Analytics: Hadoop M/R processes event, aggregation and profile data into a Hive DWH; the resulting Subscriber Profile feeds the Decisioning Engine and the outbound Channels.


We conducted an RFP to select the most Telco-grade platform. The RFP focused on non-functional capabilities such as sustainable performance, high availability, and manageability.


The approach: each step should increase scalability and reduce TCO.

Runtime (OLTP) processing: we replace the underlying plumbing with minimal changes to the business logic. All changes can be turned on/off via GUI configuration:

- Modular, hybrid architecture.
- Ability to work in dual mode: good for QA, but also for falling back to the legacy path in production.

Analytics processing: calculate the profile in M/R (Java):

- Scalable, and we have the best Java developers.
- Wrap it with a DSL (Domain-Specific Language): that is how we have worked for years (see the ModelTalk paper), so non-Java programmers can do the job.
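The dual-mode idea above can be sketched as a configuration-driven switch between the legacy and the new plumbing behind one stable API. This is a minimal illustration only; the names (EventStore, StoreMode) and the in-memory sinks are hypothetical stand-ins for the real GUI-configured backends.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the dual-mode plumbing switch. Business logic calls
// one stable write() API; configuration decides which backend(s) receive
// the event. All names here are illustrative, not the production code.
public class EventStore {
    public enum StoreMode { LEGACY_RDBMS, FILE_QUEUE, DUAL }

    // In the real system this flag comes from GUI-managed configuration.
    private final StoreMode mode;
    private final List<String> legacySink = new ArrayList<>(); // stands in for Oracle
    private final List<String> fileSink = new ArrayList<>();   // stands in for the NFS file queue

    public EventStore(StoreMode mode) { this.mode = mode; }

    public void write(String event) {
        if (mode == StoreMode.LEGACY_RDBMS || mode == StoreMode.DUAL)
            legacySink.add(event);
        if (mode == StoreMode.FILE_QUEUE || mode == StoreMode.DUAL)
            fileSink.add(event);
    }

    public int legacyCount() { return legacySink.size(); }
    public int fileCount() { return fileSink.size(); }
}
```

Dual mode writes to both sinks, which is what makes it useful both for QA comparisons and for a safe production rollback to the legacy path.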


Legacy Architecture


Phase 1


Phase 1 – File queues in NFS

Resulting Context

- Pure plumbing change: no changes to business-logic code.
- Offloading Oracle: 2x performance boost.
- No Big Data technology yet.
- Windows NFS client performance is a bottleneck.

Phase    # Customers  # Events
Legacy   10M          120M
Phase 1  10M          200M
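A minimal sketch of the Phase 1 file-queue idea: a producer writes each batch to a temporary file and atomically renames it into the queue directory, so consumers never observe a partially written batch. All names are illustrative, a local directory stands in for the NFS mount, and rename atomicity ultimately depends on the underlying file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.List;

// Hypothetical sketch of a file queue on a shared mount (NFS in Phase 1).
// The write-to-temp-then-rename pattern makes a batch visible only once
// it is complete.
public class FileQueue {
    private final Path queueDir;
    private long seq = 0;

    public FileQueue(Path queueDir) {
        this.queueDir = queueDir;
        try {
            Files.createDirectories(queueDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes one batch of events; returns the final, visible file.
    public Path enqueue(List<String> events) {
        Path tmp = queueDir.resolve("batch-" + seq + ".tmp");
        Path done = queueDir.resolve("batch-" + seq + ".done");
        seq++;
        try {
            Files.write(tmp, events);
            // Atomic rename within the same directory: consumers only ever
            // see complete ".done" files.
            return Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```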


Phase 2


Phase 2 – Introducing MapR Hadoop Cluster

Resulting Context

MapR FS + NFS:

- Horizontally scalable.
- Cheap compared to high-end NFS solutions.
- Fast and highly available (using VIPs).
- Avoids another hop into HDFS (Flume, Kafka).
- Hundreds of millions of small files are stored in HDFS with no need to merge them.

Phase    # Customers  # Events
Legacy   10M          120M
Phase 1  10M          200M
Phase 2  unlimited    200M


Phase 2 – Introducing MapR Hadoop Cluster

Resulting Context

Avro files:
- Complex object graphs.
- Troubleshooting with Pig.
- Out-of-the-box schema upgrades (e.g. adding a field).
- Map/Reduce is incremental: the Avro record captures the subscriber state.
- Map/Reduce efficiency: huge joins are avoided.

Subscriber Profile calculation:
- Performance: 2-3 hours.
- Linear scalability: no limit on the number of subscribers or the amount of raw data (buy more nodes).
- A fast run over historical data allows for an early launch.

- Sqoop: very fast insertions into MS-SQL (tens of millions of records in minutes).
- Data analysts started working in the Hive environment.
- No HA for Oozie yet.
- Hue is premature.
- MS-SQL and ODBC over Hive are slow and limited.
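The incremental Map/Reduce point above can be illustrated without Hadoop: because the (Avro) record carries the accumulated subscriber state, each run folds only the new events into it instead of re-joining the full history. The field names below are invented for the sketch, not the real profile schema.

```java
import java.util.List;

// Sketch of an incremental profile fold: previous state + new events ->
// new state. A plain Java method stands in for the reducer; the two
// fields stand in for the real Avro subscriber-state record.
public class ProfileFold {
    public static class SubscriberState {
        public long eventCount;
        public double totalSpend;
    }

    // One incremental step: no join against historical raw data needed,
    // because the state record already summarizes it.
    public static SubscriberState reduce(SubscriberState prev, List<Double> spends) {
        SubscriberState next = new SubscriberState();
        next.eventCount = prev.eventCount + spends.size();
        next.totalSpend = prev.totalSpend;
        for (double s : spends) next.totalSpend += s;
        return next;
    }
}
```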


Phase 3


Phase 3 – Introducing MapR M7 Tables

Extensive YCSB load tests to find the best table structure and read/update granularity. Main conclusions:
- M7 knows how to handle a very big heap (90GB).
- Update granularity: small updates (using columns) = fast reads(*), whereas in other KV stores the entire BLOB must be updated.

CSR tables migrated from Oracle to an M7 table:
- Tens of billions of records.
- Need sub-second random access per subscriber.
- 99.9% writes, by runtime machines (almost every event-processing operation produces an update).
- 0.1% reads, by the customer's CSR representatives.
- Rows: one per subscriber key, tens of millions.
- 2 CFs, TTL 365 days, 1 version.
- Qualifier: key [date_class_event_id], value: record. Up to thousands per row.
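A toy illustration of the layout described above, assuming one row per subscriber and one qualifier per event in date_class_eventId form, so qualifiers sort chronologically within the row and all of a subscriber's events come back in one row fetch. The formats are illustrative, not the production schema.

```java
// Hypothetical key/qualifier builders for the CSR table sketch.
public class CsrKeys {
    // One row per subscriber.
    public static String rowKey(String subscriberId) {
        return subscriberId;
    }

    // Date first, so qualifiers sort chronologically within the row.
    public static String qualifier(String date, String eventClass, long eventId) {
        return date + "_" + eventClass + "_" + eventId;
    }
}
```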


Phase 3 – Introducing MapR M7 Tables

Resulting Context

- Choosing the right features: nothing too demanding performance-wise.
- Tables are easy to create and manage, though some tweaking remains.
- No cross-table ACID: we need to develop our own solution for keeping consistency across the M7 table, Oracle, and the file system.
- Hard for QA compared to an RDBMS: there is no easy way to query, so we need to develop tools.

Phase    # Customers  # Events
Legacy   10M          120M
Phase 1  10M          200M
Phase 2  unlimited    200M
Phase 3  unlimited    300M


Phase 4


Phase 4 – Migrating OLTP features to M7 tables

Subscriber State table migrated from Oracle to an M7 table:
- 25% writes, by runtime machines updating the state.
- 100% reads, by runtime.
- Rows: one per subscriber key, tens of millions.
- 1 CF, TTL -1 (no expiry), 1 version.
- YCSB was used to validate the solution and the sizing model.
- Qualifier: key: state_name, value: state value. Dozens per row, but only ~10% are updated per event.

Subscriber Profile table migrated from MS-SQL to an M7 table; bulk insert once a day.

Outbound Queue table migrated from MS-SQL to an M7 table.
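Per-column updates pay off here precisely because only a small fraction of a subscriber's states change per event: writing just the changed columns is much cheaper than rewriting the whole state blob. A sketch of computing that delta, with maps standing in for the M7 row and all names invented:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of column-granular updates: given the current row and the next
// state, emit only the columns (state_name -> value) that actually
// changed. A whole-BLOB store would have to rewrite everything instead.
public class StateUpdate {
    public static Map<String, String> changedColumns(Map<String, String> current,
                                                     Map<String, String> next) {
        Map<String, String> delta = new HashMap<>();
        for (Map.Entry<String, String> e : next.entrySet()) {
            if (!e.getValue().equals(current.get(e.getKey()))) {
                delta.put(e.getKey(), e.getValue());
            }
        }
        return delta;
    }
}
```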


Phase 4 – Migrating OLTP features to M7 tables

Resulting Context

- No longer dependent on Oracle for OLTP.
- Real-time processing can handle billions of events per day.
- Sizing is linear and easy to calculate: 80% of (number of subscribers * state size) should reside in cache. HW spec: 128GB RAM, 12 SAS drives.
- Consistency management is very complicated.
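The linear sizing rule quoted above ("number of subscribers * state size * 80% should reside in cache") reduces to simple arithmetic; a sketch, where the node-count helper and every number in the usage below are illustrative assumptions, not figures from the deck:

```java
// Sketch of the linear sizing rule: 80% of the total subscriber state
// should fit in cache. Integer math avoids floating-point rounding.
public class Sizing {
    public static long requiredCacheBytes(long subscribers, long stateSizeBytes) {
        return subscribers * stateSizeBytes * 8 / 10; // 80% of total state
    }

    // Hypothetical helper: nodes needed given cache RAM available per node.
    public static long nodesNeeded(long requiredBytes, long cacheBytesPerNode) {
        return (requiredBytes + cacheBytesPerNode - 1) / cacheBytesPerNode; // ceiling division
    }
}
```

Because the rule is linear, capacity planning stays a back-of-the-envelope exercise: double the subscribers, double the cache.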

Phase    # Customers  # Events
Legacy   10M          120M
Phase 1  10M          200M
Phase 2  unlimited    200M
Phase 3  unlimited    300M
Phase 4  unlimited    unlimited


Phase 5


Phase 5 – Decommission legacy RDBMS

Resulting Context

- MySQL is not a new technology in our stack (it is part of the MapR distribution).
- Removing Oracle/MS-SQL from our architecture has a significant impact on system cost, deployment, monitoring, etc.


Atzmon Hen-Tov Lior Schachter