Big Data Workloads Drawn from Real-time Analytics Scenarios Across Three Deployed Solutions
Tao Zhong, K. Doshi, Xi Tang, Ting Lou, Zhongyan Lu, Hong Li
Software and Services Group, Intel

Page 1: Big Data Workloads Drawn from Real-time Analytics Scenarios Across Three  Deployed  Solutions

Big Data Workloads Drawn from Real-time Analytics Scenarios Across Three Deployed Solutions

Tao Zhong, K. Doshi, Xi Tang, Ting Lou, Zhongyan Lu, Hong Li

Software and Services Group, Intel

Page 2

Statement of faith:

Real-time (low-latency) analytics will become more important to end users – if not for all queries, then for a non-trivial fraction of them.

Page 3

We walk through three workload scenarios in this short presentation.

Objective: generate ideas for workloads that reflect low-latency and high-throughput demands simultaneously.

All three use cases described here are in deployment or in pre-deployment testing among Intel partners in the PRC.

Page 4

1. Smart City Application:

Detect and Prevent License Plate Fraud

Page 5

[Diagram: video analytics pipeline with four stages: CAPTURE, EXTRACT, STORE, COMPUTE. Labels: video frames; objects (person, vehicle) plus types and attributes; image files and description files; analysis services.]

Page 6

[Diagram: the same CAPTURE, EXTRACT, STORE, COMPUTE pipeline (video frames; objects (person, vehicle) plus types and attributes; image files and description files; analysis services), now with an RDBMS holding registration and traffic history records.]

Page 7

SMART CITY Workload Solution Flow

[Diagram: components: Registration Records, Enforcement, File System, Extraction System, Real-time Analytics. Operations: Query, Integrate, Retrieve, Feed, Persist, Notify, Merge, Evolve, Detect. Flow steps are marked 1 to 5 and A to F.]
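The solution flow above names Registration Records, an Extraction System, and a Detect step, but the slide does not spell out the detection rule. As a minimal sketch under a loudly stated assumption, the Python below flags a sighting when the attributes extracted from video disagree with the registration record for the same plate. The `Registration` and `Sighting` types, the choice of attributes, and `detect_plate_fraud` are all hypothetical names introduced for illustration, not the deployed system's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    # One row of the registration records (fields are assumed).
    plate: str
    vehicle_type: str
    color: str


@dataclass(frozen=True)
class Sighting:
    # One vehicle description produced by the extraction system (assumed shape).
    plate: str
    vehicle_type: str
    color: str
    camera_id: str


def detect_plate_fraud(sighting, registrations):
    """Return a reason string if the sighting looks fraudulent, else None.

    Assumed rule: a plate is suspicious if it is unknown, or if the
    extracted vehicle attributes differ from the registered ones.
    """
    record = registrations.get(sighting.plate)
    if record is None:
        return "plate not registered"
    mismatches = [
        name
        for name in ("vehicle_type", "color")
        if getattr(sighting, name) != getattr(record, name)
    ]
    if mismatches:
        return "attribute mismatch: " + ", ".join(mismatches)
    return None


registrations = {"A12345": Registration("A12345", "sedan", "white")}
ok = Sighting("A12345", "sedan", "white", "cam-1")
cloned = Sighting("A12345", "truck", "white", "cam-7")
unknown = Sighting("Z99999", "sedan", "black", "cam-2")

for s in (ok, cloned, unknown):
    print(s.camera_id, detect_plate_fraud(s, registrations))
```

In a real deployment the lookup would hit the registration store and the result would feed the Notify and Enforcement steps; here a dictionary stands in for both.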

Page 8

SMART CITY Workload Characteristics

[Diagram: the Smart City solution flow repeated: Registration Records, Enforcement, File System, Extraction System, Real-time Analytics; Query, Integrate, Retrieve, Feed, Persist, Notify, Merge, Evolve, Detect; steps 1 to 5 and A to F.]

• Structured and unstructured data

• Transactional and analytic activities

• Scale-out in-memory processing combined with distributed persistent data stores

• Real-time and batch operations

• Information inflows from sensor and non-sensor devices

Page 9

2. Content Management and Integration

Page 10

Rapid Content Management -- Solution Flow

[Diagram labels: New Media and Traditional Media (repeated sources); Information Accumulation over time; Digest and Cross Reference; RDBMS; Log Extract and Transform; Sqoop; HBase; bulk move older data; sparse edits; Search; Data Analysis Logic; Hive; Hibernate Driver; HBase Driver; Hive Dialect.]

Page 11

Rapid Content Management – Workload Characteristics

[Diagram: the Rapid Content Management solution flow repeated: New Media and Traditional Media sources; Information Accumulation over time; Digest and Cross Reference; RDBMS; Log Extract and Transform; Sqoop; HBase; bulk move older data; sparse edits; Search; Data Analysis Logic; Hive; Hibernate Driver; HBase Driver; Hive Dialect.]

• Structured and unstructured data

• Transactional and analytic activities

• Fast searches over “hot” data, slow searches over rest

• RDBMS ops mixed with HBase
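The split between fast searches over "hot" data and slow searches over the rest, together with the "bulk move older data" label, suggests a two-tier layout. The sketch below illustrates that idea only: two in-memory dictionaries stand in for the RDBMS and HBase, and the class name, the age-based cutoff, and the substring search are assumptions made for the example, not details taken from the slides.

```python
from datetime import datetime, timedelta


class TieredContentStore:
    """Toy two-tier store: 'hot' stands in for the RDBMS, 'cold' for HBase."""

    def __init__(self, hot_window):
        self.hot_window = hot_window  # how long a document stays hot (assumed policy)
        self.hot = {}   # doc_id -> (timestamp, text)
        self.cold = {}  # doc_id -> (timestamp, text)

    def put(self, doc_id, timestamp, text):
        # New content always lands in the hot tier.
        self.hot[doc_id] = (timestamp, text)

    def bulk_move_older(self, now):
        """Move every hot document older than the window into the cold tier."""
        cutoff = now - self.hot_window
        old_ids = [d for d, (ts, _) in self.hot.items() if ts < cutoff]
        for doc_id in old_ids:
            self.cold[doc_id] = self.hot.pop(doc_id)
        return len(old_ids)

    def search(self, term, include_cold=False):
        """Search the hot tier; scan the cold tier only when asked to."""
        hits = sorted(d for d, (_, text) in self.hot.items() if term in text)
        if include_cold:
            hits += sorted(d for d, (_, text) in self.cold.items() if term in text)
        return hits


now = datetime(2013, 1, 10)
store = TieredContentStore(hot_window=timedelta(days=7))
store.put("a", datetime(2013, 1, 9), "election news")
store.put("b", datetime(2013, 1, 1), "election archive")

moved = store.bulk_move_older(now)
print(moved, store.search("election"), store.search("election", include_cold=True))
```

Keeping the cold scan opt-in mirrors the workload mix on the slide: most queries touch only recent data and stay fast, while the occasional full search pays the higher cost.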

Page 12

3. Fraud Detection

Page 13

Telecom Payment Fraud Detection/Prevention -- Solution Flow

[Diagram labels: Recharge Transaction; Mid-transaction Analytics; Transactions History; Credit Records; ALERT.]

SELECT phone_number, SUM(charge_time), SUM(charge_amount)
FROM trans_table
GROUP BY phone_number
HAVING SUM(charge_time) > threshold_1 AND SUM(charge_amount) > threshold_2
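As a runnable illustration of the per-phone aggregate check on this slide, the sketch below loads a few made-up rows into an in-memory SQLite database and applies the two thresholds. Standard SQL expresses a filter over `SUM(...)` with `GROUP BY` and `HAVING`, so the query is written that way. The sample rows, the threshold values, the column types, and the helper name `flag_suspicious` are assumptions for the example; the slide does not say what `charge_time` measures, so it is treated here simply as a numeric column.

```python
import sqlite3


def flag_suspicious(rows, threshold_1, threshold_2):
    """Return (phone_number, total_time, total_amount) for phones over both thresholds."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trans_table ("
        "phone_number TEXT, charge_time INTEGER, charge_amount REAL)"
    )
    conn.executemany("INSERT INTO trans_table VALUES (?, ?, ?)", rows)
    query = """
        SELECT phone_number, SUM(charge_time), SUM(charge_amount)
        FROM trans_table
        GROUP BY phone_number
        HAVING SUM(charge_time) > ? AND SUM(charge_amount) > ?
        ORDER BY phone_number
    """
    result = conn.execute(query, (threshold_1, threshold_2)).fetchall()
    conn.close()
    return result


# Made-up recharge transactions: (phone_number, charge_time, charge_amount).
sample_rows = [
    ("555-0001", 1, 50.0),
    ("555-0001", 1, 60.0),
    ("555-0001", 1, 70.0),
    ("555-0002", 1, 20.0),
    ("555-0003", 1, 500.0),
]

alerts = flag_suspicious(sample_rows, threshold_1=2, threshold_2=100.0)
print(alerts)
```

Only the first number exceeds both thresholds; the third has a large total amount but too few charges, which shows why the two conditions are combined with AND.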

Page 14

Summary

• Workload scenarios from several “real life” use cases

• Blend of SQL and NoSQL approaches

• Recent data is available for queries nearly instantaneously

• Real-time responsiveness combined with high data volumes

• Mix of slow and fast operations (low-latency analytics on recent data, complex analytics on historical data)