© Hortonworks Inc. 2012

Go beyond debug: Wire Tap your App for knowledge with Hadoop

Tom McCuch, Solution Engineering @ Hortonworks (Twitter: tmccuch)
Oleg Zhurakousky, Principal Architect @ Hortonworks (Twitter: z_oleg)

Go Beyond 'Debug': Wire Tap your App for Knowledge with Hadoop


DESCRIPTION

Today, application developers devote roughly 80% of their code to persisting roughly 20% of the total data flowing through the applications. That means two things:

* 80% of the data flowing through our applications is at best lost in rolling log files and at worst never collected, without ever being analyzed or accounted for.
* Application-level database programming, licensing, storage, administration, and ETL processing have maxed out IT budgets and have constrained app development teams from keeping pace with the rate of change in the business.

The other 80% of the data is "Event Data" that can no longer be ignored if you want to stay competitive. Changes to application state are already stored as a sequence of events in application and middleware logs. In fact, because this data has historically held value only for the developer, a lot of potentially valuable information is often never collected. With Hadoop, we can:

* store and query these events (transaction tracing),
* use the event log to reconstruct the application domain at any point in time (ETL),
* use the same event log to construct new domains we haven't planned for (ELT), and
* automatically adjust our data domains to cope with retroactive changes (???).

In this talk, we will demonstrate how capturing all event data could dramatically simplify data collection and management within the enterprise.


Page 1: Go Beyond 'Debug': Wire Tap your App for Knowledge with Hadoop

© Hortonworks Inc. 2012

Go beyond debug

Wire Tap your App for knowledge with Hadoop

Tom McCuch

Solution Engineering @ Hortonworks

Twitter: tmccuch

Oleg Zhurakousky

Principal Architect @ Hortonworks

Twitter: z_oleg


The Application Development Dilemma

• Today, application developers devote roughly 80% of their code to persisting roughly 20% of the total data flowing through their applications

–80% of the data flowing through our applications is at best lost in rolling log files, at worst never collected -- without ever being analyzed or accounted for

–For the remaining 20% we do currently collect – application-level database programming, licensing, storage, administration, and ETL processing have maxed out IT operations budgets and have constrained app development teams from keeping pace with the rate of change in the business


Example: Data Available During Ingest

• Record count
• Highest/Lowest record length
• Average record length
• Compression ratio

But with a little more work...

• Field parsing
  – Unique values
  – Unique values per field
  – Access to values of each field independently from the record
  – Relatively fast field-based searches, without indexing
  – Value encoding
  – Etc.

These are cross-cutting concerns!
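The first four statistics on this slide can be gathered in a single pass while records stream by. The following is a minimal sketch, not the presenters' implementation: the names `IngestStats` and `collect_ingest_stats` are hypothetical, records are assumed to be `bytes`, and `zlib` stands in for whatever codec the real pipeline uses.

```python
import zlib
from dataclasses import dataclass


@dataclass
class IngestStats:
    record_count: int
    min_length: int
    max_length: int
    avg_length: float
    compression_ratio: float  # raw bytes divided by compressed bytes


def collect_ingest_stats(records):
    """Compute simple statistics in one pass over an iterable of byte records."""
    compressor = zlib.compressobj()
    count = total = compressed = max_len = 0
    min_len = None
    for record in records:
        size = len(record)
        count += 1
        total += size
        min_len = size if min_len is None else min(min_len, size)
        max_len = max(max_len, size)
        # Feed the same bytes to a streaming compressor to measure the ratio.
        compressed += len(compressor.compress(record))
    compressed += len(compressor.flush())
    return IngestStats(
        record_count=count,
        min_length=min_len or 0,
        max_length=max_len,
        avg_length=total / count if count else 0.0,
        compression_ratio=total / compressed if compressed else 0.0,
    )
```

Because the function only observes the records, it can sit beside the ingest path without changing what the ingest path does.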


How do we address cross-cutting concerns without disturbing the existing process flow?


Wire Tap Defined


Wire Tap is an Enterprise Integration Pattern
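As a rough illustration of the pattern, a wire tap hands a copy of every message on a channel to a secondary consumer while the primary flow continues unchanged. The sketch below is a toy in-process version; the `Channel` class and its methods are hypothetical and are not the API of any particular integration framework. Messages are assumed to be plain dicts.

```python
class Channel:
    """A minimal in-process message channel with optional wire taps."""

    def __init__(self):
        self._subscribers = []
        self._taps = []

    def subscribe(self, handler):
        """Register a primary consumer of the channel."""
        self._subscribers.append(handler)

    def add_wire_tap(self, tap):
        """Register a secondary consumer that only observes messages."""
        self._taps.append(tap)

    def send(self, message):
        for tap in self._taps:
            try:
                # Each tap gets a shallow copy, so replacing a top-level key
                # in the tap does not affect the message in the main flow.
                tap(dict(message))
            except Exception:
                # A failing tap must never disturb the primary flow.
                pass
        for handler in self._subscribers:
            handler(message)
```

The design choice worth noting is that tap failures are swallowed: the tap is an observer, so the primary flow should behave the same whether or not a tap is attached.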


Other Enterprise Integration Patterns

• Transformer: Convert payload or modify headers
• Filter: Discard messages based on boolean evaluation
• Router: Determine next channel based on content
• Splitter: Generate multiple messages from one
• Aggregator: Assemble a single message from multiple
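To make the five patterns concrete, each one can be reduced to a small function over a message. This is only an illustrative sketch: the message shape (a dict with `headers` and `payload` keys), the function names, and the specific rules (upper-casing, the `ERROR` marker, the channel names) are all assumptions chosen for the example.

```python
def transformer(message):
    """Transformer: convert the payload (here, upper-case it)."""
    return {**message, "payload": message["payload"].upper()}


def message_filter(message):
    """Filter: boolean test; messages with an empty payload are discarded."""
    return bool(message["payload"])


def router(message):
    """Router: choose the next channel name based on content."""
    return "errors" if "ERROR" in message["payload"] else "events"


def splitter(message):
    """Splitter: generate one message per line of the payload."""
    return [{**message, "payload": line}
            for line in message["payload"].splitlines()]


def aggregator(messages):
    """Aggregator: assemble a single message by joining the payloads."""
    return {"headers": {}, "payload": "\n".join(m["payload"] for m in messages)}
```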


The Business Case


© Hortonworks Inc. 2013

6 Key Hadoop DATA TYPES

1. Sentiment: Understand how your customers feel about your brand and products, right now

2. Clickstream: Capture and analyze website visitors' data trails and optimize your website

3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines

4. Geographic: Analyze location-based data to manage operations where they occur

5. Server Logs: Research logs to diagnose process failures and prevent security breaches

6. Text: Understand patterns in text across millions of web pages, emails, and documents



© Hortonworks Inc. 2013

20 Apache Hadoop Enterprise Use Cases

Vertical            Use Case                            Data Type
------------------  ----------------------------------  ----------------------------
Financial Services  New Account Risk Screens            Text, Server Logs
Financial Services  Fraud Prevention                    Server Logs
Financial Services  Trading Risk                        Server Logs
Financial Services  Maximize Deposit Spread             Text, Server Logs
Financial Services  Insurance Underwriting              Geographic, Sensor, Text
Financial Services  Accelerate Loan Processing          Text
Telecom             Call Detail Records (CDRs)          Machine, Geographic
Telecom             Infrastructure Investment           Machine, Server Logs
Telecom             Next Product to Buy (NPTB)          Clickstream
Telecom             Real-time Bandwidth Allocation      Server Logs, Text, Sentiment
Telecom             New Product Development             Machine, Geographic
Retail              360° View of the Customer           Clickstream, Text
Retail              Analyze Brand Sentiment             Sentiment
Retail              Localized, Personalized Promotions  Geographic
Retail              Website Optimization                Clickstream
Retail              Optimal Store Layout                Sensor
Manufacturing       Supply Chain and Logistics          Sensor
Manufacturing       Assembly Line Quality Assurance     Sensor
Manufacturing       Proactive Maintenance               Machine
Manufacturing       Crowdsourced Quality Assurance      Sentiment


Fraud Prevention

Business Problem
• Financial institutions are always at risk of fraud
• Fraudsters test bank systems for vulnerabilities
• This testing leaves subtle patterns often undetected by bank employees or law enforcement
• Fraud losses cost banks millions

Solution
• HDP reduces the cost to detect fraudulent activity
• HDP stores more types of data for longer
• Analysis of data in the "data lake" exposes fraudulent patterns that would otherwise have gone undetected

Financial Services Data: Server Logs


Credit Request Process Flow - Before

Credit Request Processing
• Credit Request arrives on a Gateway
• Credit Request is sent over a Channel
• Credit Request Processor
  – Receives Request
  – Processes the Request
  – Issues a Response
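The gateway, channel, and processor steps listed above can be sketched as three small objects. This is a hedged illustration of the shape of the flow only: the class names, the synchronous request/response channel, and the `LIMIT` approval rule are all hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class CreditRequest:
    applicant: str
    amount: float


@dataclass
class CreditResponse:
    applicant: str
    approved: bool


class DirectChannel:
    """Synchronous channel: passes a request to its single subscriber."""

    def __init__(self):
        self._handler = None

    def subscribe(self, handler):
        self._handler = handler

    def send(self, request):
        if self._handler is None:
            raise RuntimeError("channel has no subscriber")
        return self._handler(request)


class CreditRequestProcessor:
    LIMIT = 10_000  # hypothetical approval threshold, for illustration only

    def handle(self, request):
        """Receive the request, process it, and issue a response."""
        return CreditResponse(request.applicant, request.amount <= self.LIMIT)


class Gateway:
    """Entry point: turns an incoming call into a message on the channel."""

    def __init__(self, channel):
        self._channel = channel

    def submit(self, applicant, amount):
        return self._channel.send(CreditRequest(applicant, amount))
```

Wiring it together takes three lines: create a `DirectChannel`, subscribe `CreditRequestProcessor().handle` to it, and pass the channel to a `Gateway`.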


Cross-Cutting Concerns

• Credit Scoring
• Fraud Detection
• Gathering Data Available during Credit Request Process Flow


Demo


Credit Request Processing Flow - After

HDP
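One way to picture the "after" flow is a tap that writes a copy of every credit request to a sink before the processor runs, leaving the processor itself untouched. The sketch below is an assumption-laden stand-in: `make_tapped_channel` and `process_credit_request` are hypothetical names, requests are plain dicts, and any file-like object plays the role that HDP storage plays in the slide's diagram.

```python
import io
import json


def make_tapped_channel(handler, sink):
    """Return a send function that records each request to sink, then handles it."""
    def send(request):
        try:
            # One JSON document per line is easy to load for later analysis.
            sink.write(json.dumps(request, sort_keys=True) + "\n")
        except (TypeError, ValueError, OSError):
            # The tap is best-effort; the credit flow must still proceed.
            pass
        return handler(request)
    return send


def process_credit_request(request):
    """Unchanged business logic (the approval rule here is hypothetical)."""
    return {"applicant": request["applicant"],
            "approved": request["amount"] <= 10_000}
```

In a test, `io.StringIO()` works as the sink; in a real deployment the sink would be whatever writer delivers events to the cluster.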


Example: HTTP Header Collection


Example: Data Available During Ingest

• Record count
• Highest/Lowest record length
• Average record length
• Compression ratio

But with a little more work...

• Field parsing: unstructured data is not all that unstructured…
  – Unique values
  – Unique values per field
  – Access to values of each field independently from the record
  – Relatively fast field-based searches, without indexing
  – Value encoding
  – Etc.

These are cross-cutting concerns!
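The field-level items on this slide follow naturally once records are split into columns. The sketch below is one possible reading, not the demo's code: it assumes delimited text records with the same number of fields in every record, and the function names are hypothetical. "Value encoding" is interpreted here as dictionary encoding, which is an assumption.

```python
from collections import defaultdict


def parse_fields(records, delimiter=","):
    """Split each record and store values per field position (column-wise)."""
    columns = defaultdict(list)
    for record in records:
        for index, value in enumerate(record.split(delimiter)):
            columns[index].append(value)
    return dict(columns)


def unique_values(columns):
    """Unique values per field."""
    return {index: set(values) for index, values in columns.items()}


def dictionary_encode(values):
    """Replace each distinct value with a small integer code."""
    codes = {}
    encoded = [codes.setdefault(value, len(codes)) for value in values]
    return codes, encoded


def search_field(columns, index, wanted):
    """Scan a single column for a value and return matching row numbers."""
    return [row for row, value in enumerate(columns.get(index, []))
            if value == wanted]
```

Searching one column touches only that field's values rather than whole records, which is what makes a scan without an index tolerable.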


Demo


Thank You!

Questions & Answers

Follow: @tmccuch, @z_oleg, @hortonworks
