Igniting Audience Measurement at Time Warner Cable

Tim Case

Agenda

• Who is Time Warner Cable & Time Warner Cable Media

• What is Audience Measurement?

• Challenges With Legacy Architecture

• Next Generation Architecture

• Lessons Learned


Who Am I?

• 10+ years in E-commerce

• Focused on Data Warehousing for the last 5 years

• Certifications

– Cloudera Certified Administrator for Apache Hadoop (CCAH)

– Cloudera Certified Developer for Apache Hadoop (CCDH)

– Teradata Certified Professional

– IBM Certified Specialist - PureData System for Analytics

– Tableau Server Certified Professional

– MicroStrategy Certified Engineering Principal

– Certified ScrumMaster

– Certified SAFe Agilist

• College sports fan – Go Noles!


Time Warner Cable & Time Warner Cable Media

Time Warner Cable is among the largest providers of video, high-speed data and voice services in the U.S., connecting more than 15 million customers to entertainment, information and each other.

• Serves customers in 29 states

• More than 50,000 employees across the U.S.

Time Warner Cable Media, the advertising arm of Time Warner Cable, provides national, regional and local marketers and agencies with innovative, strategic and cost-effective advertising solutions.


What is Audience Measurement?

The Audience Measurement platform enables census reporting of subscriber viewership and allows us to answer the Five Ws:

Who is watching?

– Anonymized demographics, consumer behaviors

What are they watching?

– Station, program information, advertisements

When are they watching?

– Day of week, daypart, time-shifted

Where are they watching?

– Set-top box, TWC TV apps

Why are they watching?

– Program metadata


Audience Measurement By the Numbers

Viewership Data

• Set-top box

– Processing more than 500 million events per day

• Largest table is Program Tuning Event Fact

– 75 TB of raw data

– 180+ billion records

• TWC TV app (iPad, iPhone, Android, Xbox, etc.)

• Video On Demand (VOD)

Ads Data

• TWC Media and 3rd party spots

Reference Data

• Household demographics

• Program data

• Automotive data

• Political affiliation

5 Heavy Analytical Users

200 Audience Finder Users

50 Tableau Consumers

• Around 100 Tableau Workbooks

– Authored by the business and IT

• Numerous ad hoc queries
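As a rough sanity check on the set-top box figures above, the sketch below derives two back-of-the-envelope numbers: the average raw size per record in the Program Tuning Event Fact table, and how many days of events that table would hold at the stated daily rate. It assumes decimal terabytes, treats "180+ billion" and "more than 500 million" as exact values, and assumes one event becomes one fact record. None of these assumptions come from the slides.

```python
# Back-of-the-envelope estimates from the slide figures.
# Assumptions (not from the source): decimal TB, the "+" and "more than"
# qualifiers are ignored, and one event corresponds to one fact record.

RAW_BYTES = 75 * 10**12          # 75 TB of raw data
RECORDS = 180 * 10**9            # 180+ billion records
EVENTS_PER_DAY = 500 * 10**6     # more than 500 million events per day

bytes_per_record = RAW_BYTES / RECORDS
days_of_history = RECORDS / EVENTS_PER_DAY

print(f"~{bytes_per_record:.0f} bytes per record")
print(f"~{days_of_history:.0f} days of history at the stated rate")
```

Under these assumptions the table averages roughly 417 bytes per record and holds about 360 days of events.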


Video Viewership Analyzer (VVA)

• Custom application that enables complex audience definition by the user community

– Date range

– Geography (DMA, Ad Zone or Zip Code)

– Platform (Classic, IPTV)

– Audience Definition

• Daypart

• Station and/or Program

• Demographics (includes line-of-business, propensities, Tribes and automotive)

• Platform usage (VOD, IPTV, high-speed data)

• Custom segmentation

• Output includes ranked list of stations and some high-level metrics
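The audience definition described above can be pictured as a set of optional filters applied to viewing events, followed by a ranking step. The following is a minimal sketch under stated assumptions, not the VVA implementation: the `TuningEvent` and `AudienceDefinition` types, their field names, the sample values, and the choice to rank by total seconds viewed are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TuningEvent:
    """One viewing event; every field name here is hypothetical."""
    household_id: str   # anonymized identifier
    station: str
    event_date: date
    dma: str
    platform: str       # e.g. "Classic" or "IPTV"
    daypart: str
    seconds_viewed: int


@dataclass(frozen=True)
class AudienceDefinition:
    """A subset of the filters listed above; None means no restriction."""
    start: date
    end: date
    dma: Optional[str] = None
    platform: Optional[str] = None
    daypart: Optional[str] = None

    def matches(self, e: TuningEvent) -> bool:
        return (
            self.start <= e.event_date <= self.end
            and self.dma in (None, e.dma)
            and self.platform in (None, e.platform)
            and self.daypart in (None, e.daypart)
        )


def rank_stations(events, definition):
    """Return (station, total_seconds) pairs, most-viewed first."""
    totals = Counter()
    for e in events:
        if definition.matches(e):
            totals[e.station] += e.seconds_viewed
    return totals.most_common()


# Illustrative data only; station and DMA names are placeholders.
events = [
    TuningEvent("hh1", "STN-A", date(2015, 6, 1), "DMA-1", "IPTV", "Prime", 1800),
    TuningEvent("hh2", "STN-A", date(2015, 6, 1), "DMA-1", "Classic", "Prime", 600),
    TuningEvent("hh3", "STN-B", date(2015, 6, 2), "DMA-1", "IPTV", "Prime", 1200),
    TuningEvent("hh4", "STN-B", date(2015, 6, 2), "DMA-2", "IPTV", "Prime", 900),
]
definition = AudienceDefinition(date(2015, 6, 1), date(2015, 6, 7), dma="DMA-1")
ranking = rank_stations(events, definition)
print(ranking)
```

Keeping the definition as plain data separate from the ranking function is one way to let users compose filters without code changes; demographic and custom segmentation filters would be further optional fields in the same style.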


Audience Finder

Audience Finder: Reference Program


Technology


Legacy Platform Architecture

• 3rd Party application ingests raw data and performs anonymization, correlation and some enrichment/mediation

• TWC ingests files provided by 3rd Party and performs additional enrichment as well as applying business rules and stitching logic

– Executed in Netezza using SQL and shell scripts

• Two Netezza appliances

– TwinFin 36 used for ELT processing

– TwinFin 72 used for BI and customer-facing workloads

[Diagram: Source Data; Core Logic (Anonymization, Correlation, Mediation, Enrichment); TWC Media Business Logic (Stitching, Filtering, Zombie Logic)]

Challenges With Legacy Architecture

Collection

• Inconsistent reliability and availability of source and reference data

Processing

• Slow catch-up process

• Architecture does not promote speed to market for new features

Data Storage + Delivery

• Platform instability

• Does not support concurrent users

Analysis + Presentation

• Limited exploration and interactive capabilities

Metrics to Assess

• SLAs for T-3 and T-14

• Frequency of reprocessing

• Reference data quality

• Duration of reprocessing

• Team velocity when introducing ETL changes

• Platform availability

• Query response times

• Response time SLAs during mixed workload

• User satisfaction with the interface

• Customer dependency on IT for changes

Technical Criteria

• Performance

• Supports batch and streaming

• Leverage software engineering patterns

• Open source momentum

• “-ilities”

– Scalability

– Elasticity

– Availability

– Durability

– Extensibility

• Enables DevOps to complement Agile adoption

– Automated testing

– Test-driven Development (TDD)

– Continuous Integration (CI)

• Strong foundation for Data Lake

Technologies Selected

Data Integration: Apache Spark is a more appropriate solution for set-top box processing logic.

• Reduces complexity, simplifies code maintenance, and improves defect resolution time and run time.

• Can be applied in batch or near real-time with modest changes, which positions us for T-x data availability (where ‘x’ is limited only by the availability of reference data).

• Enables use of Agile development principles (test-driven development and continuous integration), thereby improving time-to-market and code quality and radically reducing QA costs and time.

Hadoop: Hadoop/HDFS for storing large historical data positions the organization to leverage the evolving open source big data analytics technologies (machine learning, SQL on Hadoop, graph processing, etc.).

Data Warehouse: Teradata will allow large volumes of tuning event data to be secure, easily accessible, and highly available to large numbers of users at reasonable cost.

Visualization: Tableau enables self-service analytics, including advanced algorithms, against the audience measurement data, and presents information to various consumers in meaningful ways.

Event Persistence: Kafka is a high-performance, fault-tolerant, real-time messaging platform that will allow us to keep a history of tuning events for faster reprocessing. This component is critical once we are performing near real-time streaming of events.
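The test-driven development point made for Spark can be illustrated by writing a processing rule as a pure function with a unit test beside it. The sketch below is an assumption-laden example, not the production logic: the slides do not define the stitching rule, so it assumes stitching means merging consecutive tunes on the same device and station when the gap between them is small. The `Tune` type, its fields, and the 5-second default threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Tune:
    device: str    # anonymized device id (hypothetical field)
    station: str
    start: int     # epoch seconds
    end: int       # epoch seconds


def stitch(events: Iterable[Tune], max_gap: int = 5) -> List[Tune]:
    """Merge back-to-back tunes on the same device and station.

    Two tunes are merged when the second starts no more than
    `max_gap` seconds after the first ends. Both the rule and the
    default threshold are assumptions made for illustration.
    """
    stitched: List[Tune] = []
    for ev in sorted(events, key=lambda t: (t.device, t.start)):
        prev = stitched[-1] if stitched else None
        if (
            prev is not None
            and prev.device == ev.device
            and prev.station == ev.station
            and ev.start - prev.end <= max_gap
        ):
            # Extend the previous tune instead of emitting a new one.
            stitched[-1] = Tune(prev.device, prev.station,
                                prev.start, max(prev.end, ev.end))
        else:
            stitched.append(ev)
    return stitched


def test_stitch_merges_small_gaps_only():
    events = [
        Tune("d1", "STN-A", 0, 100),
        Tune("d1", "STN-A", 103, 200),   # 3 s gap: merged
        Tune("d1", "STN-A", 260, 300),   # 60 s gap: kept separate
        Tune("d2", "STN-A", 100, 150),   # different device: never merged
    ]
    assert stitch(events) == [
        Tune("d1", "STN-A", 0, 200),
        Tune("d1", "STN-A", 260, 300),
        Tune("d2", "STN-A", 100, 150),
    ]


test_stitch_merges_small_gaps_only()
```

Because the function has no dependency on a cluster, the test runs in a continuous integration job in milliseconds; the same function could then be applied to each device's events inside a Spark job, in batch or streaming mode.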

Initial Nextgen Architecture

• Replace MicroStrategy with Tableau to enable self-service

• Replace Netezza for customer-facing workloads with Teradata, improving platform stability and enabling sandboxes (e.g., Data Labs) and workload management tools that assist in managing to performance SLAs

• Replace 3rd Party application and Netezza ELT with Spark for Collection and Processing logic (anonymization, correlation, enrichment, filtering, stitching, and zombie logic)

[Diagram: Source Data; Core Logic (Anonymization, Enrichment); TWC Media Business Logic (Stitching, Filtering, Zombie Logic)]

Long-Term Architecture

• Implement an enterprise Data Lake to enable non-Media use cases

• Migrate to Spark Streaming and Kafka to enable near real-time use cases

• Evaluate dedicated infrastructure for more predictable performance
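The case for persisting events in Kafka is that a retained event history can be replayed when business rules or reference data change. The sketch below illustrates only that replay idea with an in-memory stand-in; it does not use any Kafka client API. The `EventLog` class, the record fields, and the channel-to-station mappings are hypothetical.

```python
from typing import Callable, Dict, List


class EventLog:
    """In-memory stand-in for a retained, append-only event log."""

    def __init__(self) -> None:
        self._records: List[dict] = []

    def append(self, record: dict) -> int:
        """Store a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> List[dict]:
        """Return every record at or after `offset`."""
        return self._records[offset:]


def seconds_by_station(records: List[dict],
                       enrich: Callable[[dict], dict]) -> Dict[str, int]:
    """Recompute an aggregate after applying an enrichment rule."""
    totals: Dict[str, int] = {}
    for rec in map(enrich, records):
        totals[rec["station"]] = totals.get(rec["station"], 0) + rec["seconds"]
    return totals


log = EventLog()
log.append({"channel": 7, "seconds": 120})
log.append({"channel": 9, "seconds": 300})
log.append({"channel": 7, "seconds": 60})

# Hypothetical reference data: the stale mapping assigns channel 9 wrongly.
stale = {7: "STN-A", 9: "STN-A"}
fixed = {7: "STN-A", 9: "STN-B"}

first = seconds_by_station(
    log.read_from(0), lambda r: {**r, "station": stale[r["channel"]]})
# After the reference data is corrected, replay the retained events.
replayed = seconds_by_station(
    log.read_from(0), lambda r: {**r, "station": fixed[r["channel"]]})
print(first, replayed)
```

Because the raw events are retained, correcting the reference data only requires rerunning the enrichment from a chosen offset rather than re-collecting the source files.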

[Diagram: Event Data; Reference Data; Data Lake (Anonymization, Enrichment); Business Logic (Stitching, Filtering, Zombie Logic)]

Lessons Learned

• Have executive support

• Infrastructure is critical

– Node sizes

– Network

• Leverage the open source community

– Enhancements

– Extensions (Spark Packages)

• Talent is hard to find

– Consider abstractions


Partners


We’re Hiring!

http://jobs.timewarnercable.com/

Thank You!

@timrcase

http://www.linkedin.com/in/timrcase