Analytics from 330 million smartphones
Sean Byrnes
CTO & Co-founder
Flurry Overview
App Developers: 60,000
Live Applications: 160,000
Flurry Analytics: Better apps on iOS, Android, BlackBerry, Windows Phone, HTML5
Devices per month: 480M
Sessions per month: 33B
AppCircle Network: Acquisition & Monetization on iOS, Android
App Developers: 6,200
Devices per month: 200M
Events per month: 300B
Daily Completed Views: 3M
How Flurry Works
Flurry’s Scale
1.2 Billion Sessions / Day
900 Servers
1.56 PB
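A rough back-of-the-envelope reading of these figures, as a minimal sketch that assumes decimal units (1 PB = 10^15 bytes) and evenly spread load. Real traffic is peaky, and the slide does not say whether 1.56 PB counts replicated HDFS copies.

```python
# Back-of-the-envelope averages from the scale figures above.
# Assumes evenly spread load and decimal units (1 PB = 10**15 bytes).
SESSIONS_PER_DAY = 1.2e9
SERVERS = 900
STORAGE_BYTES = 1.56e15  # 1.56 PB
SECONDS_PER_DAY = 24 * 60 * 60

sessions_per_second = SESSIONS_PER_DAY / SECONDS_PER_DAY
storage_per_server_tb = STORAGE_BYTES / SERVERS / 1e12

print(f"average sessions/second: {sessions_per_second:,.0f}")
print(f"storage per server: {storage_per_server_tb:.2f} TB")
```

Roughly 13,900 sessions per second on average and about 1.7 TB per server, before accounting for peaks or replication.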
Topics
1. Big Data Collection (HDFS)
2. Big Data Processing (Hadoop)
3. Data Mining at Scale (HBase)
BIG DATA COLLECTION
Incoming Data
Peak Connections per Second: 25,000
Data per Day: 1.5 TB
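The same kind of arithmetic for incoming data, assuming decimal units and uniform traffic, and borrowing the 1.2 billion sessions/day figure from the scale slide. Peak connections are not directly comparable to sessions (one connection may carry several reports), so they are left out.

```python
# Average ingest rates implied by the incoming-data figures.
# Assumes decimal units (1 TB = 10**12 bytes) and uniform traffic;
# SESSIONS_PER_DAY is taken from the "Flurry's Scale" slide.
BYTES_PER_DAY = 1.5e12
SESSIONS_PER_DAY = 1.2e9
SECONDS_PER_DAY = 86_400

avg_mb_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 1e6
avg_bytes_per_session = BYTES_PER_DAY / SESSIONS_PER_DAY

print(f"average ingest: {avg_mb_per_second:.1f} MB/s")
print(f"average incoming bytes per session: {avg_bytes_per_session:,.0f}")
```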
Data Collection
[Diagram: incoming reports pass through load balancers to a tier of data collectors, which write files that are loaded into HDFS.]
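The collector tier can be pictured as processes that append incoming reports to local files and roll to a new file at a size limit, so that only complete files are shipped to HDFS (for example with `hdfs dfs -put`). The class below is a hypothetical sketch of that idea; `RollingFileCollector`, its file naming and the 64 MB default are assumptions, not Flurry's implementation.

```python
import os
import tempfile
from pathlib import Path


class RollingFileCollector:
    """Hypothetical collector: append reports to local files, rolling at max_bytes."""

    def __init__(self, directory: str, max_bytes: int = 64 * 1024 * 1024):
        self.directory = directory
        self.max_bytes = max_bytes
        self.closed_files: list[str] = []  # complete files, ready to ship to HDFS
        self._seq = 0
        self._open_next()

    def _open_next(self) -> None:
        self._path = os.path.join(self.directory, f"reports-{self._seq:06d}.log")
        self._seq += 1
        self._file = open(self._path, "wb")
        self._size = 0

    def append(self, report: bytes) -> None:
        record = report + b"\n"  # one report per line
        # Roll to a new file if this record would push the current one past the limit.
        if self._size > 0 and self._size + len(record) > self.max_bytes:
            self._file.close()
            self.closed_files.append(self._path)
            self._open_next()
        self._file.write(record)
        self._size += len(record)

    def close(self) -> None:
        self._file.close()
        if self._size > 0:
            self.closed_files.append(self._path)
        else:
            os.remove(self._path)  # don't ship empty files


# Tiny limit so the roll is visible: each record is 5 bytes, so two fit per file.
with tempfile.TemporaryDirectory() as workdir:
    collector = RollingFileCollector(workdir, max_bytes=10)
    for report in (b"aaaa", b"bbbb", b"cccc"):
        collector.append(report)
    collector.close()
    contents = [Path(p).read_bytes() for p in collector.closed_files]

print(contents)
```

Writing whole files and uploading them in bulk keeps HDFS away from the per-report request path; a collector crash loses at most the open file.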
Data Collection
[Diagram: Location A and Location B each receive reports into their own HDFS.]
BIG DATA PROCESSING
• Agent Report
• Normalization: identify device, country, carrier, etc.
• Data Correction: bad phone clocks, partial session reports
• De-duplication: handle duplicate reports
• Metrics Computation: flexible calculation, configurable dimensions
• Portfolio Analysis, Benchmarking, Clustering: data mining and analysis, audience segmentation, industry trends, application analytics, merchandising analytics
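A toy version of the first processing stages, chained together. Everything here is an assumption for illustration: the `Report` fields, the `DEVICE_ALIASES` table, the one-day clock-skew tolerance, and the policy of replacing an implausible phone timestamp with the collector's receive time. Partial session reports are not handled.

```python
from collections import Counter
from dataclasses import dataclass, replace

# Hypothetical lookup table; real normalization would use much larger mappings.
DEVICE_ALIASES = {"iphone4,1": "iPhone 4S", "gt-i9300": "Galaxy S III"}
MAX_CLOCK_SKEW_SECONDS = 24 * 60 * 60  # assumed tolerance for bad phone clocks


@dataclass(frozen=True)
class Report:
    report_id: str
    device: str
    country: str
    device_time: int    # seconds since epoch, as reported by the phone
    received_time: int  # seconds since epoch, as recorded by the collector


def normalize(r: Report) -> Report:
    # Map raw device identifiers to canonical names and canonicalize country codes.
    device = DEVICE_ALIASES.get(r.device.lower(), r.device)
    return replace(r, device=device, country=r.country.upper())


def correct_clock(r: Report) -> Report:
    # Trust the server clock when the phone's clock is implausibly far off.
    if abs(r.device_time - r.received_time) > MAX_CLOCK_SKEW_SECONDS:
        return replace(r, device_time=r.received_time)
    return r


def deduplicate(reports):
    # Drop any report whose id has already been seen.
    seen = set()
    for r in reports:
        if r.report_id not in seen:
            seen.add(r.report_id)
            yield r


def compute_metrics(reports, dimensions):
    # Count reports per combination of the configured dimensions.
    return Counter(tuple(getattr(r, d) for d in dimensions) for r in reports)


def process(raw_reports, dimensions=("device", "country")):
    cleaned = (correct_clock(normalize(r)) for r in raw_reports)
    return compute_metrics(deduplicate(cleaned), dimensions)


raw = [
    Report("a", "iPhone4,1", "us", 1_000, 1_000),
    Report("a", "iPhone4,1", "us", 1_000, 1_005),  # same report delivered twice
    Report("b", "GT-I9300", "de", 0, 1_000_000),   # phone clock far in the past
]
metrics = process(raw)
print(metrics)
```

Passing a different `dimensions` tuple is one way to read "configurable dimensions": the same cleaned stream can be counted along other axes.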
Analytics Processing
Large-scale Data Processing
[Diagram components: Input Data from Collectors; a Real-Time path (Consumer/Producer Systems); a Batch path (MapReduce jobs); a NoSQL DataStore; External Actions.]
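The real-time path is labeled Consumer/Producer Systems. A minimal in-process sketch of that pattern, using Python's `queue.Queue` with several producer threads and one consumer keeping running counts; the function names and the sentinel shutdown are illustrative choices, not taken from the slides.

```python
import queue
import threading
from collections import Counter

_STOP = object()  # sentinel telling the consumer to exit


def producer(q: queue.Queue, events) -> None:
    for event in events:
        q.put(event)  # blocks if the queue is full (back-pressure)


def consumer(q: queue.Queue, counts: Counter) -> None:
    while True:
        event = q.get()
        if event is _STOP:
            break
        counts[event] += 1  # only this thread mutates counts


def run(event_batches) -> Counter:
    q: queue.Queue = queue.Queue(maxsize=1000)
    counts: Counter = Counter()
    consumer_thread = threading.Thread(target=consumer, args=(q, counts))
    consumer_thread.start()
    producers = [threading.Thread(target=producer, args=(q, batch)) for batch in event_batches]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    q.put(_STOP)  # all producers are done; the FIFO queue delivers this last
    consumer_thread.join()
    return counts


counts = run([["session_start", "purchase"], ["session_start"]])
print(counts)
```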
Map/Reduce Management
• Challenge: Task Starvation
• Challenge: Task Roadblocking
• Challenge: Network Connection Waiting
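The slides name task starvation without describing Flurry's remedy. One common mitigation, used in spirit by Hadoop's Fair Scheduler, is to share freed slots among jobs instead of handing them out first-come, first-served. The greedy function below is a simplified, hypothetical sketch of that fair-share idea, not Flurry's scheduler.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pending: int      # tasks waiting for a slot
    running: int = 0  # tasks currently holding a slot


def assign_slots(jobs: list[Job], free_slots: int) -> list[str]:
    """Greedy fair share: each free slot goes to the runnable job with the fewest running tasks."""
    assignments = []
    for _ in range(free_slots):
        runnable = [j for j in jobs if j.pending > 0]
        if not runnable:
            break
        job = min(runnable, key=lambda j: j.running)
        job.pending -= 1
        job.running += 1
        assignments.append(job.name)
    return assignments


big = Job("big-batch", pending=100, running=8)
small = Job("small-report", pending=3)
order = assign_slots([big, small], free_slots=4)
print(order)
```

Under first-come, first-served the large job would take all four slots and the small job would keep waiting; here the small job finishes its three tasks first.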
Network Topology: Chained
[Diagram: Rack 1, Rack 2 and Rack 3 sit behind Switch 1, Switch 2 and Switch 3, which are linked in a chain.]
Network Topology: Star
[Diagram: racks sit behind Switch 1 to Switch 4, each of which connects directly to a central Trunk.]
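One way to see why the star layout helps: count the links a packet crosses between the two most distant racks. This sketch assumes one switch per rack (as the diagrams suggest) and models each topology as an undirected graph searched breadth-first; the helper names are hypothetical.

```python
from collections import deque
from itertools import combinations


def chained(n_racks):
    """Each rack hangs off its own switch; switches are linked in a line."""
    edges = [(f"rack{i}", f"switch{i}") for i in range(1, n_racks + 1)]
    edges += [(f"switch{i}", f"switch{i + 1}") for i in range(1, n_racks)]
    return edges


def star(n_racks):
    """Each rack hangs off its own switch; every switch links to a central trunk."""
    edges = [(f"rack{i}", f"switch{i}") for i in range(1, n_racks + 1)]
    edges += [(f"switch{i}", "trunk") for i in range(1, n_racks + 1)]
    return edges


def hops(edges, src, dst):
    # Breadth-first search over an undirected graph; returns the link count.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    dist = {src: 0}
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                frontier.append(nxt)
    return dist[dst]


def worst_case_hops(edges, n_racks):
    racks = [f"rack{i}" for i in range(1, n_racks + 1)]
    return max(hops(edges, a, b) for a, b in combinations(racks, 2))


for n in (3, 8):
    print(n, "racks:", worst_case_hops(chained(n), n), "hops chained,", worst_case_hops(star(n), n), "hops star")
```

With three racks both layouts need four hops, but the chain grows by one hop per added rack while the star stays at four. In a chain, traffic between the end racks also shares every inter-switch link.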
DATA MINING AT SCALE
Stages of Data
Normalized: 80 Billion Rows
OLAP Cube: 160 Billion Rows
Raw Data: 500 Billion Records
NoSQL Tables
Index        Column Family A   Column Family B
111111111    Data              Data
222222222    Data              Data
333333333    Data              Data
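The table layout above is a sorted map from row key to column families, each holding named cells. A toy in-memory model of that structure, with hypothetical method names (`put`, `get`, `row_keys`) rather than HBase's client API:

```python
from collections import defaultdict


class WideColumnTable:
    """Toy in-memory model: row key -> column family -> qualifier -> value."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed up front
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family):
        # .get() on the defaultdicts avoids creating empty rows on read.
        return dict(self.rows.get(row_key, {}).get(family, {}))

    def row_keys(self):
        # Stores such as HBase keep rows sorted by key; sort here to mimic that.
        return sorted(self.rows)


table = WideColumnTable(["A", "B"])
table.put("222222222", "A", "col1", "Data")
table.put("111111111", "A", "col1", "Data")
table.put("111111111", "B", "col2", "Data")
print(table.row_keys())
print(table.get("111111111", "B"))
```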
NoSQL OLAP
Row key: metric.dimension
Index                                      Column Family A
metric.dimensionA                          #
metric.dimensionB                          #
metric.dimensionC                          #
metric.dimensionA.dimensionB.dimensionC    #
metric.dimensionA.dimensionB               #
metric.dimensionA.dimensionC               #
...
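The OLAP layout stores one row per combination of dimensions, which is why the cube grows quickly: n dimensions give 2^n - 1 non-empty combinations. A short sketch that generates the row-key templates in the `metric.dimension` style shown above (`cube_row_keys` is a hypothetical helper):

```python
from itertools import combinations


def cube_row_keys(metric: str, dimensions: list[str]) -> list[str]:
    """Row-key templates for every non-empty combination of dimensions.

    combinations() preserves the input order, so a given combination always
    produces the same key (A.B, never B.A)."""
    keys = []
    for size in range(1, len(dimensions) + 1):
        for combo in combinations(dimensions, size):
            keys.append(".".join((metric, *combo)))
    return keys


keys = cube_row_keys("metric", ["dimensionA", "dimensionB", "dimensionC"])
for key in keys:
    print(key)
print(len(cube_row_keys("metric", [f"d{i}" for i in range(10)])))  # 2**10 - 1
```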
Lexicographical Ordering
Row key: metric.dimensionA.dimensionB
metric   dimensionA   dimensionB   index
3        1            1            311
3        1            11           3111
3        11           1            3111
Without padding, different dimension values can produce the same key (3111).
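Lexicographical Ordering
Row key: metric.dimensionA.dimensionB (dimension ids zero-padded to three digits)
metric   dimensionA   dimensionB   index
3        001          001          3001001
3        001          011          3001011
3        011          001          3011001
Fixed-width ids make every key unique and make keys sort in dimension order.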
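The two tables show why fixed-width keys matter. This sketch, with a hypothetical `row_key` helper, builds keys by concatenation as the slides do, with and without zero padding. It assumes every dimension id fits in three digits; a multi-digit metric id would need the same treatment.

```python
def row_key(metric: int, dims: list[int], width: int = 0) -> str:
    """Concatenate ids into a row key, zero-padding each dimension id to `width` digits."""
    return str(metric) + "".join(str(d).zfill(width) for d in dims)


rows = [(3, 1, 1), (3, 1, 11), (3, 11, 1), (3, 2, 1)]
unpadded = [row_key(m, [a, b]) for m, a, b in rows]
padded = [row_key(m, [a, b], width=3) for m, a, b in rows]

print(unpadded)          # (3, 1, 11) and (3, 11, 1) both become "3111"
print(sorted(unpadded))  # "3111" sorts before "321" even though 11 > 2
print(sorted(padded))    # order now follows (dimensionA, dimensionB)
```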
NoSQL OLAP
Row key: metric.dimension.date
Index: metric.dimension.1_1_12 ... metric.dimension.3_1_12
Row Scan: metric from 1/1/12 to 3/1/12
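Because rows are sorted by key, a date range for one metric and dimension becomes a contiguous row scan between a start and a stop key. If the slide's suffixes are unpadded month_day_year values, `10_1_12` would sort before `3_1_12`, so this sketch assumes a hypothetical zero-padded, year-first suffix instead and uses `bisect` on a sorted list to stand in for the table.

```python
from bisect import bisect_left
from datetime import date, timedelta


def date_key(metric: str, dimension: str, day: date) -> str:
    # Hypothetical suffix format: year first and zero-padded (2012_03_01),
    # so lexicographic key order matches chronological order.
    return f"{metric}.{dimension}.{day:%Y_%m_%d}"


def row_scan(sorted_keys: list[str], start_key: str, stop_key: str) -> list[str]:
    """Keys in [start_key, stop_key), like a start/stop-row scan over a sorted table."""
    lo = bisect_left(sorted_keys, start_key)
    hi = bisect_left(sorted_keys, stop_key)
    return sorted_keys[lo:hi]


# One row per month of 2012 for a single metric and dimension.
keys = sorted(date_key("metric", "dimension", date(2012, month, 1)) for month in range(1, 13))

# Scan 1/1/12 through 3/1/12 inclusive: the stop key is the day after the end date.
start = date_key("metric", "dimension", date(2012, 1, 1))
stop = date_key("metric", "dimension", date(2012, 3, 1) + timedelta(days=1))
hits = row_scan(keys, start, stop)
print(hits)
```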
blog.flurry.com
Sean Byrnes, [email protected]
Flurry, Inc.
282 2nd St. Suite 202
San Francisco, CA 94105
http://www.flurry.com