Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications Tom Lubinski Founder and CEO SL Corporation 7 March 2012


DESCRIPTION

The most critical large-scale applications today, regardless of industry, demand real-time transfer and visualization of potentially large volumes of data. With this demand come numerous challenges and limiting factors, especially if these applications are deployed in virtual or cloud environments. In this session, SL’s CEO, Tom Lubinski, explains how to overcome the top four challenges to real-time application performance: database performance, network data transfer bandwidth limitations, processor performance and lack of real-time predictability. Solutions discussed will include design of the proper data model for the application data, along with design patterns that minimize and optimize data transfer across networks.


Page 1: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Tom Lubinski

Founder and CEO

SL Corporation 7 March 2012

Page 2: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Disclaimers

In 30 years, we’ve learned a lot

(a grizzled veteran)

But, we don’t know everything …

… we could be wrong !

My other computer is a Mac

We have “shipped” …

Page 3: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Connecticut Valley Power Grid Management System

Extensive background in real-time process monitoring

Critical Tax Season Applications at Intuit

Large volumes of dynamic data

OOCL World Wide Shipment Tracking

Visualization technologies

NASA Space Shuttle Launch Control System

Mission-critical applications

Background

Page 4: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Here to talk about Scalability and Performance

Problem Space:

Collection, Analysis, and Visualization in Real-Time of large volumes of monitoring data from large-scale, complex, distributed applications

Emphasis: Real-Time, Large Volumes of Data

Page 5: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Challenges

Challenge #1:

Database Performance

Common to see queries taking minutes

How can you get real-time that way ?

Page 6: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Challenges

Challenge #2:

Network Data-Transfer Bandwidth

Bigger pipes, but there’s more data to send

How do you get the greatest throughput ?

Page 7: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Challenges

Challenge #3:

Processor Performance

More cores just means more processes !

How do you optimize your utilization ?

Page 8: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Challenges

Challenge #4:

Lack of Real-Time Predictability

Virtualization is the new time-share !

How can you trust your data ?

“time-sharing”, “network computer”, “cloud”, do things ever really change ?

Page 9: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Solution – Clues ?

Facts of Life:

Database – can’t live with it, can’t live without it

Network – it’s a funnel, no way around it

Processor – must limit what you ask it to do

Virtualization – it’s erratic, have to compensate

Page 10: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Solutions

Solution #1:

Proper Data Model

Data structures designed for real-time

In-memory structures to buffer database

Page 11: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Can your application be …

… like a high-performance racecar ?

Page 12: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

What is the most important part of a racecar ? (besides the engine)

… the Transmission …

Page 13: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

For Real-Time performance, it’s the Cache …

Not a simple “current value” cache

High-performance Real-time Multi-dimensional Data Cache

Page 14: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Real-Time Cache – optimized for performance !

Current / History Tables:

In: Indexed insertion - asynchronous real-time data

Out: Indexed extraction - optimized transfer to clients
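A minimal sketch of the current / history idea, assuming rows are plain dicts with a "time" field. The class name RealTimeCache and its methods are hypothetical, not the RTView API. Insertion updates both tables through the index key; extraction reads by key without scanning.

```python
from collections import defaultdict, deque


class RealTimeCache:
    """Hypothetical current/history cache keyed by an index column."""

    def __init__(self, index_column, max_history=1000):
        self.index_column = index_column
        self.current = {}  # index value -> latest row
        # index value -> bounded history of rows, oldest first
        self.history = defaultdict(lambda: deque(maxlen=max_history))

    def insert(self, row):
        """Indexed insertion: update the current and history tables by key."""
        key = row[self.index_column]
        self.current[key] = row
        self.history[key].append(row)

    def extract_current(self, keys=None):
        """Indexed extraction of the latest rows, optionally for some keys."""
        if keys is None:
            return list(self.current.values())
        return [self.current[k] for k in keys if k in self.current]

    def extract_history(self, key, since=None):
        """History rows for one key, optionally only those newer than `since`."""
        rows = self.history.get(key, ())
        return [r for r in rows if since is None or r["time"] > since]
```

The `since` argument is what lets a client ask only for rows it has not seen yet.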

Page 15: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Real-Time Cache – Data Processing / Aggregation

Reduction, Resolution, Aging

Detail Views

Summary Views

Aggregation

Raw Data → Reduced
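One possible reading of reduction, resolution and aging, sketched under assumptions: samples are (time, value) pairs, reduction means averaging per fixed-width time bucket, and aging means dropping samples older than a cutoff. The function name and parameters are hypothetical.

```python
from collections import OrderedDict


def reduce_history(samples, bucket_seconds, max_age_seconds, now):
    """Average (time, value) samples into fixed-width time buckets.

    Samples older than max_age_seconds are dropped (aging); the rest are
    averaged per bucket (reduction to a coarser resolution).
    """
    buckets = OrderedDict()
    for t, value in sorted(samples):
        if now - t > max_age_seconds:
            continue  # aged out
        start = t - (t % bucket_seconds)  # start time of this sample's bucket
        total, count = buckets.get(start, (0.0, 0))
        buckets[start] = (total + value, count + 1)
    return [(start, total / count) for start, (total, count) in buckets.items()]
```

Summary views can read the reduced series while detail views keep using the raw one.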

Page 16: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Real-Time Cache – Database read/write through (optimized for timestamped multi-dimensional data)

Seamless timeline navigation with automatic database query

Real-Time data automatically written to DB
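A rough sketch of read/write through for timestamped data, using Python's built-in sqlite3 module as a stand-in database. The class, table and column names are assumptions for illustration only. It also assumes timestamps arrive in increasing order.

```python
import sqlite3


class WriteThroughHistory:
    """Hypothetical cache: recent rows in memory, all rows in a database."""

    def __init__(self, conn, keep_in_memory=100):
        self.conn = conn
        self.keep = keep_in_memory  # assumed to be at least 1
        self.recent = []  # (time, value) pairs, oldest first
        conn.execute("CREATE TABLE IF NOT EXISTS history (time REAL, value REAL)")

    def insert(self, time, value):
        """Write-through: every real-time row goes to memory and to the DB."""
        self.recent.append((time, value))
        del self.recent[:-self.keep]  # keep only the newest rows in memory
        self.conn.execute("INSERT INTO history VALUES (?, ?)", (time, value))

    def query_range(self, start, end):
        """Read-through: serve from memory if it covers the range, else the DB."""
        if self.recent and start >= self.recent[0][0]:
            return [(t, v) for t, v in self.recent if start <= t <= end]
        cur = self.conn.execute(
            "SELECT time, value FROM history "
            "WHERE time BETWEEN ? AND ? ORDER BY time",
            (start, end),
        )
        return cur.fetchall()
```

A client scrolling back along the timeline calls query_range either way; only the older ranges cost a database query.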

Page 17: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

This sounds a bit like Oracle Coherence …

Buffer database

Read/write through

Listeners

Indexed queries

What’s different ?

Page 18: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Multi-Tier Visibility into Monitoring Data

In-depth Monitoring of Middleware Components

Unified Real-time display of data from all Application tiers

Update for ORCL

Page 19: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Different tools for different problems !

Real-Time Multi-dimensional data:

Current / History Tables: Multiple rows (time range) of selected columns returned in one query

Coherence cache distributes objects (rows) = optimized horizontally

Real-Time multi-dimensional cache manages columns and optimizes vertically
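To illustrate "optimizes vertically", here is a sketch of a column-oriented history table: one list per column plus a shared sorted time column. It is an assumption about what such a layout could look like, not a description of the actual product. One query returns a time range of selected columns.

```python
from bisect import bisect_left, bisect_right


class ColumnarHistory:
    """Hypothetical history table stored as one list per column."""

    def __init__(self, columns):
        self.times = []  # sorted timestamps, shared by all columns
        self.columns = {name: [] for name in columns}

    def append(self, time, **values):
        """Append one row; assumes timestamps arrive in increasing order."""
        self.times.append(time)
        for name, column in self.columns.items():
            column.append(values.get(name))

    def query(self, start, end, names):
        """Return the selected columns for all rows with start <= time <= end."""
        lo = bisect_left(self.times, start)
        hi = bisect_right(self.times, end)
        result = {"time": self.times[lo:hi]}
        for name in names:
            result[name] = self.columns[name][lo:hi]
        return result
```

The binary search on the time column finds the range; columns that were not requested are never touched.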

Page 20: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Benefits: Indexed Real-Time Caching

Slow SQL queries minimized

Users shielded from database details

Minimize CPU load using effective indexing

Page 21: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Solutions

Solution #2:

Server-Side Aggregation

(am I being too obvious with this one ?)

Know the use cases

Joins and GroupBy done on server

SQL does this, but do you need it ?

Page 22: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Problems with SQL Database Queries

Slow

Slowwer with concurrent queries

If you need it fast, it goes even slowwwwwwer !

SQL = Not portable

(Timestamps, especially)

Page 23: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Know your problem space !

Real-Time Monitoring:

Join and GroupBy heavily used

We wrote our own! Performed in real-time on server-side data

Optimized for real-time requirements

Page 24: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Display of Large Data Volumes

Typical large implementation, distributed over several regions with many custom applications

Heatmap View showing current state of entire system – size represents number of servers for application

Color represents how close metric is to SLA – large red boxes are worst – drilldown to detail

Page 25: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Complex Visualizations of historical data

Observe “internal load balancing” of Data Grid

Page 26: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Example: Server-Side Aggregation/Caching

[Diagram: Raw Data (Servlet Data, App Data, Server Data) → Join on App, GroupBy App → Totals By App; Join on Server, GroupBy Server → Totals By Server; results sent To Clients]
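The aggregation example above can be sketched in a few lines, assuming rows are dicts. The data, field names and the join / group_by helpers are all hypothetical; a real server would run this as data arrives and cache the results.

```python
from collections import defaultdict


def join(rows, lookup, key):
    """Join each row with the lookup row that shares the same key value."""
    return [{**row, **lookup[row[key]]} for row in rows if row[key] in lookup]


def group_by(rows, key, metric):
    """Sum a metric per distinct key value."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[metric]
    return dict(totals)


# Hypothetical raw data: per-servlet request counts plus app metadata.
servlet_data = [
    {"servlet": "login", "app": "shop", "server": "s1", "requests": 5},
    {"servlet": "cart", "app": "shop", "server": "s2", "requests": 7},
    {"servlet": "report", "app": "admin", "server": "s1", "requests": 2},
]
app_data = {"shop": {"owner": "web"}, "admin": {"owner": "ops"}}

joined = join(servlet_data, app_data, "app")
totals_by_app = group_by(joined, "app", "requests")
totals_by_server = group_by(servlet_data, "server", "requests")
```

Clients then ask for totals_by_app or totals_by_server directly and do no processing of their own.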

Page 27: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Each cache can maintain its own history

[Diagram: Cached Data And Aggregates (Servlet Data, Totals By App, Totals By Server, …), each with its own history, sent To Clients]

Page 28: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Result: trend chart of Totals by History has all data available immediately

Using SQL would require:

Query 3 tables

2 GroupBys, 2 Joins, + Join on Timestamp (not portable)

Page 29: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Benefits: Server-Side Aggregation

Client requests and gets exactly what is needed

Client processing = zero

Server processing = done ahead of time

Current/History for aggregates readily available (No SQL)

Response time = fast

Page 30: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Solutions

Solution #3:

Use Appropriate Design Patterns

Server-Side vs. Client-Side Processing

Efficient Data Transfer Patterns

Page 31: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Pattern #1:

Data Compaction

(obvious, initial approach for any data transfers)

Packets only partially filled …

… replaced with full packets

[Diagram: Server (encode) → Client (decode)]

… even simple, non-proprietary algorithms can make big difference
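A minimal sketch of compaction, assuming rows can be serialized as JSON and using zlib from the Python standard library as the simple, non-proprietary algorithm. The encode / decode names mirror the diagram labels; the rest is an assumption.

```python
import json
import zlib


def encode(rows):
    """Server side: batch rows into one payload and compress it."""
    raw = json.dumps(rows, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw)


def decode(payload):
    """Client side: decompress and parse the batched rows."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))
```

Batching many rows into one payload is what fills the packets; compression then shrinks the repeated column names and values.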

Page 32: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Pattern #2:

Data Current / Changed

(large data tables with sparse real-time updates)

Entire table sent every update …

… instead, send only changed rows

[Diagram: Server (encode) → Client (decode)]

… little more complex, requires indexing
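A sketch of the current / changed pattern, assuming each table is a dict from index key to row. Function names are hypothetical. Deleted rows are not handled here; a real protocol would need a way to signal them.

```python
def changed_rows(previous, current):
    """Server side: rows that are new or differ from the last snapshot sent.

    Both tables are dicts mapping an index key to a row dict.
    """
    return {k: row for k, row in current.items() if previous.get(k) != row}


def apply_changes(table, changes):
    """Client side: merge changed rows into the local copy of the table."""
    table.update(changes)
    return table
```

The index key is what makes this work: without it the client cannot tell which local row a change replaces.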

Page 33: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Pattern #3:

Data History / Current

(trend chart invoked with real-time updates)

Entire history table sent every update …

… instead, send history once, then current updates

[Diagram: Server (manage) → Client (merge)]

… similar to current / changed pattern, but specific to history
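The client half of the history / current pattern might look like this. A sketch only: TrendClient and its methods are hypothetical, points are (time, value) pairs, and updates are assumed to arrive in time order.

```python
class TrendClient:
    """Hypothetical client-side buffer behind a trend chart."""

    def __init__(self, max_points=500):
        self.max_points = max_points
        self.points = []  # (time, value) pairs, oldest first

    def load_history(self, history):
        """Called once, when the chart opens."""
        self.points = list(history)[-self.max_points:]

    def merge_update(self, point):
        """Called per real-time update; skips points the client already has."""
        if self.points and point[0] <= self.points[-1][0]:
            return
        self.points.append(point)
        del self.points[:-self.max_points]  # drop the oldest beyond the limit
```

The duplicate check covers the overlap between the end of the history transfer and the first live updates.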

Page 34: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Pattern #4:

Data Current / Subset

(optimizing transfer of data subsets to multiple clients)

Changed subset sent to every client …

… instead, send subset only to registered client

[Diagram: Server (register, indexed) → Clients (listen, indexed)]

… requires registration logic coupled with cache
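A sketch of registration logic coupled with the cache, assuming clients are represented by callbacks and register per index key. SubscriptionCache is a hypothetical name, not an existing API.

```python
from collections import defaultdict


class SubscriptionCache:
    """Hypothetical cache that pushes a changed row only to registered clients."""

    def __init__(self):
        self.rows = {}  # index key -> latest row
        self.listeners = defaultdict(set)  # index key -> client callbacks

    def register(self, key, callback):
        """A client registers interest in one index key."""
        self.listeners[key].add(callback)
        if key in self.rows:
            callback(key, self.rows[key])  # send the current value right away

    def update(self, key, row):
        """Store the row and notify only the clients listening on this key."""
        self.rows[key] = row
        for callback in self.listeners.get(key, ()):
            callback(key, row)
```

Because listeners are indexed by the same key as the rows, an update costs one lookup rather than a pass over every client.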

Page 35: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Drill-Down to Detail Metrics

Drilldown to detail level metrics showing internal metrics from each application

Sophisticated history and alert view with fine-tuning of thresholds for each metric

Page 36: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Benefits: Design Patterns for Data Transfer

The same problem, over and over again, solved in a similar way

Reduce load on network

Optimize response time – no unnecessary data

Page 37: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Conclusions

Conclusion #1:

Know your data !

Data Model designed for real-time

In-memory structures to buffer database

Server-side aggregations

Page 38: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Conclusions

Conclusion #2:

Respect Design Patterns !

Server-Side vs. Client-Side Processing

Efficient Data Transfer Patterns

Don’t over-generalize – solve the problem

Page 39: Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, Data-Centric Applications

Questions?

See www.sl.com for more info about SL and RTView

Don’t miss SL Booth on Exhibit Floor !