This talk reviews the components required to build large-scale data pipelines on Hadoop, drawing on the experience of building large-scale data pipelines at Yahoo.
Sameer Raheja, Director of Engineering, Yahoo!
July 18, 2012
Data Pipeline Overview
• What is a Data Pipeline?
• What components are required for Data Pipelines?
• How Hadoop is used to solve the Data Pipeline challenge at Yahoo
• Wikipedia defines Pipeline Computing as
– “A set of data processing elements connected in series, so that the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion; in that case, some amount of buffer storage is often inserted between elements.”
– http://en.wikipedia.org/wiki/Pipeline_%28computing%29
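A minimal Python sketch of that definition, assuming each processing element is a plain function: the elements run in series, each in its own thread, with a bounded queue acting as the buffer storage between consecutive elements. The names `run_pipeline` and `buffer_size` are illustrative only and do not come from any Yahoo system.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream

def run_pipeline(source, elements, buffer_size=4):
    """Run `elements` in series, each in its own thread, with a bounded
    queue (the buffer storage) between consecutive elements."""
    buffers = [queue.Queue(maxsize=buffer_size) for _ in range(len(elements) + 1)]

    def produce():
        # Feed the source into the first buffer, then signal completion.
        for item in source:
            buffers[0].put(item)
        buffers[0].put(_DONE)

    def element(fn, inbox, outbox):
        # Apply one processing element to every item until the sentinel arrives.
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)
                return
            outbox.put(fn(item))

    threads = [threading.Thread(target=produce)]
    for i, fn in enumerate(elements):
        threads.append(threading.Thread(target=element, args=(fn, buffers[i], buffers[i + 1])))
    for t in threads:
        t.start()

    # Drain the last buffer in the calling thread.
    results = []
    while (item := buffers[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 10])` returns `[10, 20, 30, 40, 50]`. Each element consumes its buffer in FIFO order, so output order matches input order, and the bounded queues apply back-pressure when a downstream element is slower. The sketch does not handle exceptions raised inside an element.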
What makes a Data Pipeline complex?
Data Volume
Time
Frequency
Parallelism
Catch Up
Reprocessing
Coordination
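Catch Up and Reprocessing both come down to enumerating the instances a stage still owes. This is a small sketch that assumes an instance is identified by the start of its window and becomes runnable once that window has closed plus an allowed latency. The helper name `pending_instances` is hypothetical.

```python
from datetime import datetime, timedelta

def pending_instances(last_done, now, frequency, latency=timedelta(0)):
    """Start times of instances after `last_done` whose window
    [start, start + frequency) has closed, plus `latency`, by `now`.
    Returned oldest first so a stage that fell behind catches up in order."""
    instances = []
    start = last_done + frequency
    while start + frequency + latency <= now:
        instances.append(start)
        start += frequency
    return instances
```

If a 15m stage last completed its 05:00 instance and the clock reads 06:02, it owes the 05:15, 05:30 and 05:45 instances. Reprocessing can reuse the same enumeration by passing an earlier `last_done`.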
A few more
• Data Policy
• Capacity Planning
• Monitoring
• Alerting
• Retries
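Retries are usually bounded and spaced out so that a transient failure, such as a busy cluster, does not turn into a retry storm. This sketch assumes exponential backoff. `run_with_retries` is a hypothetical helper, and its `sleep` argument is injectable so the backoff schedule can be tested without waiting.

```python
import time

def run_with_retries(job, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call job() until it succeeds. After the n-th failure, wait
    base_delay * 2**(n - 1) seconds. Re-raise the last error once
    max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

In a real scheduler, the retry count and delays would be per-stage settings, and exhausting retries would raise an alert rather than silently failing.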
Physical Representation of a Pipeline
Definition: Stage
[Figure: four stages, filter_serve (5m), filter_click (15m), click_serve_join (15m) and click_by_demo (hourly)]
<Stage name="filter_click"> <Schedule frequency="15m" offset="0" timezone="UTC"/> ... </Stage>
<Stage name="click_serve_join"> <Schedule frequency="15m" offset="0" timezone="UTC"/> ... </Stage>
<Stage name="click_by_demo"> <Schedule frequency="hourly" offset="0" timezone="UTC"/> ... </Stage>
<Stage name="filter_serve"> <Schedule frequency="5m" offset="0" timezone="UTC"/> ... </Stage>
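A stage definition like the ones above can be read with Python's standard `xml.etree.ElementTree`. This sketch assumes frequency strings are either `Nm` (minutes) or `hourly`, and it treats `offset` as minutes, which the slide does not specify. The names `parse_stage` and `instance_starts` are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

def parse_frequency(value):
    """Map the slide's frequency strings to a timedelta (assumed formats)."""
    if value == "hourly":
        return timedelta(hours=1)
    if value.endswith("m"):
        return timedelta(minutes=int(value[:-1]))
    raise ValueError(f"unsupported frequency: {value!r}")

def parse_stage(xml_text):
    """Extract the name and schedule of one <Stage> element."""
    stage = ET.fromstring(xml_text)
    schedule = stage.find("Schedule")
    return {
        "name": stage.get("name"),
        "frequency": parse_frequency(schedule.get("frequency")),
        # The offset unit is not given on the slide; minutes is an assumption.
        "offset": timedelta(minutes=int(schedule.get("offset", "0"))),
        "timezone": schedule.get("timezone", "UTC"),
    }

def instance_starts(stage, window_start, window_end):
    """Start times of the stage's instances in [window_start, window_end),
    assuming window_start is aligned to the stage's frequency."""
    t = window_start + stage["offset"]
    starts = []
    while t < window_end:
        starts.append(t)
        t += stage["frequency"]
    return starts

stage = parse_stage(
    '<Stage name="filter_serve">'
    '<Schedule frequency="5m" offset="0" timezone="UTC"/>'
    '</Stage>'
)
```

For the 5m `filter_serve` stage, the 05:00 to 05:15 window yields instances starting at 05:00, 05:05 and 05:10.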
Definition: Stage Dependencies
[Figure: the same four stages, linked by their dependencies]
<Stage name="filter_serve"> <Schedule frequency="5m" offset="0" timezone="UTC"/> ... </Stage>
<Stage name="filter_click"> <Schedule frequency="15m" offset="0" timezone="UTC"/> ... </Stage>
<Stage name="click_serve_join"> <Dependencies> <DependsOn stageName="filter_click"/> <DependsOn stageName="filter_serve" start="$stageStartTime - 2" end="$stageStartTime"/> </Dependencies> ... </Stage>
<Stage name="click_by_demo"> <Dependencies> <DependsOn stageName="click_serve_join" start="$stageStartTime - 3" end="$stageStartTime"/> </Dependencies> ... </Stage>
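The `start` and `end` attributes of `DependsOn` appear to be offsets counted in instances of the upstream stage. With a 5m `filter_serve`, `$stageStartTime - 2` through `$stageStartTime` spans three instances, which is exactly one 15m window, and `- 3` over the 15m `click_serve_join` spans four instances, which is one hour. That reading is inferred from the numbers; the slide does not state it. This is a sketch that resolves such a window under that assumption. `upstream_instances` is a hypothetical name.

```python
import re
from datetime import datetime, timedelta

# Matches "$stageStartTime", "$stageStartTime - 2", "$stageStartTime + 1", ...
_EXPR = re.compile(r"^\$stageStartTime\s*(?:([+-])\s*(\d+))?$")

def offset_of(expr):
    """Turn a window expression into a signed instance offset."""
    m = _EXPR.match(expr.strip())
    if not m:
        raise ValueError(f"cannot parse {expr!r}")
    sign, n = m.groups()
    if n is None:
        return 0
    return -int(n) if sign == "-" else int(n)

def upstream_instances(stage_time, upstream_freq,
                       start="$stageStartTime", end="$stageStartTime"):
    """Times of the upstream instances one stage instance waits for,
    assuming offsets count upstream instances and both ends are inclusive."""
    lo, hi = offset_of(start), offset_of(end)
    return [stage_time + k * upstream_freq for k in range(lo, hi + 1)]
```

Without `start` and `end`, as in the `filter_click` dependency, the stage waits only for the upstream instance at its own time.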
Definition: Feed Dependencies
[Figure: filter_serve (5m) produces feeds bsrv_5m and gsrv; filter_click (15m) produces bclk_15m and gclk_15m; click_serve_join (15m) consumes them and produces aclk_15m; click_by_demo (hourly) consumes aclk and produces cdem_hourly]
<Stage name="filter_serve"> <Data> <Outputs> <Output feedID="gsrv"/> <Output feedID="bsrv"/> </Outputs> </Data> ... </Stage>
<Stage name="filter_click"> <Data> <Outputs> <Output feedID="gclk"/> <Output feedID="bclk"/> </Outputs> </Data> ... </Stage>
<Stage name="click_serve_join"> <Data> <Inputs> <Input feedID="bclk"/> <Input feedID="gclk"/> <Input feedID="gsrv"/> <Input feedID="bsrv"/> </Inputs> <Outputs> <Output feedID="aclk"/> </Outputs> </Data> ... </Stage>
<Stage name="click_by_demo"> <Data> <Inputs> <Input feedID="aclk"/> </Inputs> <Outputs> <Output feedID="cdem"/> </Outputs> </Data> ... </Stage>
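Stage-to-stage edges do not need to be declared separately. They can be derived by matching each stage's input feeds to the stage that outputs them. This sketch uses the standard-library `graphlib` module (Python 3.9+). The `<Pipeline>` wrapper element and the helper name `stage_graph` are assumptions made so the four stages parse as a single document.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# The four stages from the slide; <Pipeline> is a hypothetical wrapper.
PIPELINE_XML = """
<Pipeline>
  <Stage name="filter_serve"><Data><Outputs>
    <Output feedID="gsrv"/><Output feedID="bsrv"/>
  </Outputs></Data></Stage>
  <Stage name="filter_click"><Data><Outputs>
    <Output feedID="gclk"/><Output feedID="bclk"/>
  </Outputs></Data></Stage>
  <Stage name="click_serve_join"><Data>
    <Inputs><Input feedID="bclk"/><Input feedID="gclk"/>
            <Input feedID="gsrv"/><Input feedID="bsrv"/></Inputs>
    <Outputs><Output feedID="aclk"/></Outputs>
  </Data></Stage>
  <Stage name="click_by_demo"><Data>
    <Inputs><Input feedID="aclk"/></Inputs>
    <Outputs><Output feedID="cdem"/></Outputs>
  </Data></Stage>
</Pipeline>
"""

def stage_graph(xml_text):
    """Map each stage to the set of stages that produce its input feeds.
    Feeds with no producing stage (external sources) are ignored."""
    root = ET.fromstring(xml_text.strip())
    producer, inputs = {}, {}
    for stage in root.iter("Stage"):
        name = stage.get("name")
        inputs[name] = [i.get("feedID") for i in stage.iter("Input")]
        for out in stage.iter("Output"):
            producer[out.get("feedID")] = name
    return {name: {producer[f] for f in feeds if f in producer}
            for name, feeds in inputs.items()}

graph = stage_graph(PIPELINE_XML)
order = list(TopologicalSorter(graph).static_order())  # upstream stages first
```

Deriving edges from feeds keeps the definition in one place. If a stage adds a new input feed, its upstream dependency follows automatically.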
Definition: Jobs & Parallelism
[Figure: each stage instance runs as parallel jobs labeled W1, W2, etc., as many as the stage's Parallelism value]
<Stage name="filter_click"> <Parallelism value="2"/> ... </Stage>
<Stage name="click_serve_join"> <Parallelism value="6"/> ... </Stage>
<Stage name="filter_serve"> <Parallelism value="4"/> ... </Stage>
<Stage name="click_by_demo"> <Parallelism value="3"/> ... </Stage>
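The slide gives each stage a job count but does not say how work is split among the jobs. For a join such as `click_serve_join`, one common approach (assumed here, not taken from the talk) is to hash-partition records on the join key so that matching clicks and serves land in the same job. `partition` and the `serve_id` field are hypothetical. The sketch uses `zlib.crc32` instead of Python's `hash()` because string hashing is randomized per process.

```python
import zlib

def partition(records, key, parallelism):
    """Assign records to jobs W1..Wn by a stable hash of key(record)."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    jobs = {f"W{i + 1}": [] for i in range(parallelism)}
    for record in records:
        bucket = zlib.crc32(str(key(record)).encode("utf-8")) % parallelism
        jobs[f"W{bucket + 1}"].append(record)
    return jobs

# Hypothetical click and serve records sharing a serve_id join key.
records = [
    {"type": "serve", "serve_id": "s1"},
    {"type": "click", "serve_id": "s1"},
    {"type": "serve", "serve_id": "s2"},
    {"type": "serve", "serve_id": "s3"},
    {"type": "click", "serve_id": "s3"},
]
jobs = partition(records, key=lambda r: r["serve_id"], parallelism=6)
```

Because the hash is stable, reprocessing an instance sends the same records to the same job numbers, which makes job-level retries reproducible.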
Definition: Execution Plan
[Figure: timeline from 05:05 to 06:15 showing stage instances, feed instances and jobs. Every 5 minutes a filter_serve (5m) instance produces gsrv and bsrv; every 15 minutes filter_click (15m) produces gclk and bclk, followed by click_serve_join (15m), which produces aclk; the hourly click_by_demo instance produces cdem. Legend: Stage Instances, Feed Instances, Jobs.]
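An execution plan like the one pictured can be generated by expanding each stage's schedule over a time window. This sketch assumes that each stage fires at every multiple of its frequency after `start`, up to and including `end`, and that `start` is aligned to the hour. Ties at the same timestamp keep the dependency order, so upstream stages are listed first. `execution_plan` and `STAGES` are illustrative names.

```python
from datetime import datetime, timedelta

# (stage name, frequency), listed in dependency order.
STAGES = [
    ("filter_serve", timedelta(minutes=5)),
    ("filter_click", timedelta(minutes=15)),
    ("click_serve_join", timedelta(minutes=15)),
    ("click_by_demo", timedelta(hours=1)),
]

def execution_plan(start, end, stages=STAGES):
    """List (time, stage) instances in (start, end], ordered by time and
    then by each stage's position in `stages`."""
    plan = []
    for rank, (name, freq) in enumerate(stages):
        t = start + freq
        while t <= end:
            plan.append((t, rank, name))
            t += freq
    plan.sort()
    return [(t, name) for t, _, name in plan]

plan = execution_plan(datetime(2012, 7, 18, 5, 0), datetime(2012, 7, 18, 6, 0))
```

For the hour ending at 06:00, this yields 21 stage instances: twelve for filter_serve, four each for filter_click and click_serve_join, and one for click_by_demo.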
Data Pipeline Components – how to put it together
• Data Collection: Ability to transport data from data event producers to a single repository
• Data Acquisition: Ability to pull data from a variety of external sources
• Data Storage: System to store and access large volumes of data quickly
• Data Processing: Ability to transform data in various useful ways, including annotation, filtering and aggregation
• Table Management / Meta Data: Provide a consistent API for data consumers with a standard metadata system
• Job Coordination / Scheduling: Ability to schedule, submit, manage, retry, reprocess and catch up a DAG of jobs
• Data Output: Push- or pull-based delivery of data, subject to policies
• Data Policy Management: Anonymize, retain, clean up and archive data
• Monitoring / System Management: Provide the ability to operate, visualize and install pipelines
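Data Policy Management often reduces to a retention rule applied to time-partitioned data. This sketch assumes hypothetical partition names in `YYYYMMDDHHMM` form. A real system would then delete, anonymize or archive the partitions it returns.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d%H%M"  # hypothetical partition naming scheme

def expired_partitions(partitions, now, retention):
    """Partition names whose timestamp is older than now - retention,
    oldest first."""
    cutoff = now - retention
    stamps = sorted(datetime.strptime(p, FMT) for p in partitions)
    return [s.strftime(FMT) for s in stamps if s < cutoff]
```

Because the rule depends only on the partition timestamp, it can run as its own scheduled stage without touching the processing stages.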
What is a Data Pipeline at Yahoo?
Sample Pipeline Flow
[Figure: Event Stream → Raw Data → Event Transformer → Inter Event Joins → Fraud Detection → Pre Aggregate → Analysis, Optimization, Reporting, Research and Targeting. Phases are labeled Collection; Extract, Transform and Load; Business Logic; and Subflows, with Verification, PreAggregate and Definitive Metrics steps.]
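The Verification steps in this flow can be as simple as reconciling per-instance record counts between a step's input and its output. This sketch uses hypothetical counts keyed by instance time. `reconcile` and its relative-loss `tolerance` are assumptions, not the checks Yahoo actually ran.

```python
def reconcile(input_counts, output_counts, tolerance=0.0):
    """Compare per-instance record counts entering and leaving a step.
    Returns {instance: relative_loss} for instances whose loss (or gain)
    exceeds `tolerance`; a missing output instance counts as a full loss."""
    failures = {}
    for instance, n_in in input_counts.items():
        n_out = output_counts.get(instance, 0)
        loss = (n_in - n_out) / n_in if n_in else 0.0
        if abs(loss) > tolerance:
            failures[instance] = loss
    return failures
```

A check like this can block the downstream Definitive Metrics step, so that reports are never built on an instance with missing data.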
Sample DAG
[Figure: the DAG of a Yahoo pipeline, with stages such as GD_SERVE_BDB, SCJ, PREAGG_GD_QUERY, AM_GD_QUERY and SOX_METRICS_GD_IMPR. Each node lists its input feeds (i:) and output feeds (o:) with their frequency (5m, 15m, 30m or hourly), a priority between 100 and 500, and, for many nodes, a latency value.]
[latency: 60][priority: 200]
IR_PUB_PERF_MERGE_INITi: ir_pub_perf,ir_pub_perf_ngd,ir_pub_perf_ks (hourly)
[priority: 200]
IR_ADV_NET_PUB_NGD_QUERYi: network_report_ngd (15m)
o: ir_adv_net_pub_ngd (hourly)[latency: 60]
[priority: 200]
SOX_OF_NGD_HOURLY_DEF_METRICS_CHECKi: sox_metrics_ngd (hourly)
[priority: 500]
SOX_METRICS_NGD_HOURLY_ROLLUPi: sox_metrics_ngd_impr,sox_metrics_ngd_click,sox_metrics_ngd_conv (5m)
o: sox_metrics_ngd (hourly)[latency: 60]
[priority: 500]
IR_PUB_PERF_QUERY_HOURLYi: pub_ep_report (15m)o: ir_pub_perf (hourly)
[latency: 60][priority: 400]
IR_PUB_PERF_KS_QUERY_HOURLYi: pub_ep_report_ks (15m)o: ir_pub_perf_ks (hourly)
[latency: 60][priority: 200]
IR_ADV_NET_PUB_KS_QUERYi: network_report_ks (15m)
o: ir_adv_net_pub_ks (hourly)[latency: 60]
[priority: 200]
PREDICT_DAILYVOL_HOURLY_QUERYi: ngd_predict_preagg (5m)
o: ngd_predict_dailyvol (hourly)[latency: 60]
[priority: 100]
OF_NGD_ORDER_HOURLY_QUERYi: ngd_preagg (5m)
o: of_ngd_order (hourly)[latency: 60]
[priority: 100]
SOX_OF_NGD_HOURLY_INITi: of_ngd_order (hourly)
[priority: 500]
KS_PREAGG_HOURLY_QUERYi: ks_preagg (5m)
o: ks_preagg (hourly)[latency: 90]
[priority: 300]
NGD_RECONCILER_LZ2_HOURLY_QUERYi: ngd_predict_preagg (5m)
o: ngd_reconciler_lz2 (hourly)[latency: 60]
[priority: 100]
KS_CLICK_BIDDED_HOURLY_QUERYi: ks_bidded_click,cm_click_bidded_terms (hourly)
o: ks_click_bidded (hourly)[latency: 60]
[priority: 500]
KS_CLICK_BIDDED_HOURLY_EXP_REPORTING_TAG_QUERYi: annotated_ks_click (hourly)o: ks_bidded_click (hourly)
[latency: 60][priority: 500]
KS_OFFER_BIDDED_HOURLY_CFIo: cm_serve_bidded_terms (hourly)
[latency: 60][priority: 500]
SOX_METRICS_HOURLY_ROLLUPi: sox_metrics_impr (5m)
o: sox_metrics_impr (hourly)[latency: 60]
[priority: 500]
IMS_QUERYi: post_tp_annotated_gd_impression (5m)
o: gd_ims (hourly)[latency: 60]
[priority: 100]
ACT_SRV_TGTSRVi: gd_serve (5m)
o: act_apex_serves,act_apex_targeted_serves (hourly)[latency: 60]
[priority: 500]
CURRENCY_LOOKUP
CURRENCY_LOOKUP_DATA_CHECKi: currency_lookup (hourly)
[priority: 500]
ACT_YOO_SRV_TGTSRVi: yoo_gd_serve_sorted (hourly)
o: act_yoo_serves,act_yoo_targeted_serves (hourly)[latency: 60]
[priority: 100]
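Each node in the pipeline listing names its input and output feeds, which is enough to recover the DAG's edges: a stage depends on whichever stage writes a feed it reads. The Python sketch below rebuilds an execution order for a few of the KS stages using Kahn's topological sort. It is a sketch under stated assumptions, not how the Yahoo! pipeline schedules work: a feed is identified by name plus frequency (so ks_preagg (5m) and ks_preagg (hourly) are distinct), each feed has exactly one producer, feeds with no producer in the list are external, and a larger priority number means "run first" (the slides do not define the priority semantics). `Stage` and `execution_order` are hypothetical names.

```python
import heapq
from dataclasses import dataclass, field

# A feed is (dataset name, frequency): the listing treats ks_preagg (5m)
# and ks_preagg (hourly) as different feeds.
Feed = tuple[str, str]


@dataclass
class Stage:
    name: str
    inputs: list[Feed] = field(default_factory=list)
    outputs: list[Feed] = field(default_factory=list)
    priority: int = 0


def execution_order(stages: list[Stage]) -> list[str]:
    """Kahn's topological sort; among ready stages, higher priority runs first (assumed)."""
    by_name = {s.name: s for s in stages}
    producer: dict[Feed, str] = {}
    for s in stages:
        for feed in s.outputs:
            if feed in producer:
                raise ValueError(f"{feed} has more than one producer")
            producer[feed] = s.name

    # deps[x]: stages x still waits on; consumers[x]: stages that read x's output.
    deps: dict[str, set[str]] = {s.name: set() for s in stages}
    consumers: dict[str, set[str]] = {s.name: set() for s in stages}
    for s in stages:
        for feed in s.inputs:
            p = producer.get(feed)  # None means an external feed
            if p is not None and p != s.name:  # ignore a stage rereading its own output
                deps[s.name].add(p)
                consumers[p].add(s.name)

    # heapq is a min-heap, so store -priority; the name breaks ties deterministically.
    ready = [(-by_name[n].priority, n) for n, d in deps.items() if not d]
    heapq.heapify(ready)
    order: list[str] = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for c in consumers[name]:
            deps[c].discard(name)
            if not deps[c]:  # last dependency finished: c becomes runnable
                heapq.heappush(ready, (-by_name[c].priority, c))
    if len(order) != len(stages):
        raise ValueError("dependency cycle among stages")
    return order


if __name__ == "__main__":
    ks = [
        Stage("KS_PREAGG_HOURLY_QUERY", [("ks_preagg", "5m")], [("ks_preagg", "hourly")], 300),
        Stage("KS_OFFER_BIDDED_HOURLY_EXP_REPORTING_TAG_QUERY",
              [("ks_preagg", "hourly")], [("ks_offer", "hourly")], 500),
        Stage("KS_OFFER_BIDDED_HOURLY_CFI", [], [("cm_serve_bidded_terms", "hourly")], 500),
        Stage("KS_OFFER_BIDDED_HOURLY_QUERY",
              [("ks_offer", "hourly"), ("cm_serve_bidded_terms", "hourly")],
              [("ks_offer_bidded", "hourly")], 500),
        Stage("IR_ADV_PERF_KS_QUERY_HOURLY", [("ks_preagg", "5m")], [("ir_adv_perf_ks", "hourly")], 200),
    ]
    for name in execution_order(ks):
        print(name)
```

Feeding the full listing into this sketch would raise on ir_path_perf_merged (hourly), which four stages write; a real scheduler needs a richer rule for shared output feeds than this one assumes.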
Challenges
• Scale
• Low Latency
• Operational challenges
  – Zero-downtime upgrades
  – Reprocessing
  – Late data processing
  – Catch-up
  – Capacity Planning
• Data Quality
• Business Agility
  – Schema evolution
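Catch-up, reprocessing and late data processing all come down to arithmetic over time-sliced stage instances. A minimal sketch, assuming each instance covers a half-open interval [start, start + frequency) aligned to a UTC epoch (matching the timezone="UTC" schedules in the stage definitions): `pending_instances` lists the backlog a stage must catch up on after downtime, and `instances_to_reprocess` maps late-arriving events back to instances that have already run. All function names here are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed alignment: instances start at EPOCH + k * freq, in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def instance_start(ts: datetime, freq: timedelta) -> datetime:
    """Start of the instance [start, start + freq) that contains ts."""
    return EPOCH + ((ts - EPOCH) // freq) * freq


def pending_instances(last_done: datetime, now: datetime,
                      freq: timedelta) -> list[datetime]:
    """Catch-up backlog. last_done is the start of the most recently completed
    instance; returns later instances whose window has fully closed by now."""
    pending = []
    start = last_done + freq
    while start + freq <= now:  # window must be closed before it can run
        pending.append(start)
        start += freq
    return pending


def instances_to_reprocess(late_events: list[datetime], last_done: datetime,
                           freq: timedelta) -> list[datetime]:
    """Already-processed instances that late-arriving events fall into."""
    starts = {instance_start(ts, freq) for ts in late_events}
    # Instances after last_done have not run yet and will see the data normally.
    return sorted(s for s in starts if s <= last_done)


if __name__ == "__main__":
    hourly = timedelta(hours=1)
    day = datetime(2012, 7, 18, tzinfo=timezone.utc)
    last_done = day + 9 * hourly  # the 09:00 instance has completed
    now = day + timedelta(hours=12, minutes=30)
    print(pending_instances(last_done, now, hourly))  # 10:00 and 11:00 instances
    late = [day + timedelta(hours=9, minutes=5), day + timedelta(hours=11, minutes=30)]
    print(instances_to_reprocess(late, last_done, hourly))  # only the 09:00 instance
```

Splitting the two cases matters operationally: catch-up runs instances for the first time, while reprocessing reruns instances whose outputs consumers may already have read.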
Data Pipeline Components
Component | Definition | Product
Data Collection | Ability to transport data from data event producers to a single repository | Y! Data Highway
Data Acquisition | Ability to pull data from a variety of external sources | GDM
Data Storage | System to store and quickly access large volumes of data | HDFS
Data Processing | Ability to transform data in various useful ways, including annotation, filtering and aggregation | MapReduce, Pig, Hive
Table Management / Metadata | Provides a consistent API for data consumers, backed by a standard metadata system | HCatalog
Job Coordination / Scheduling | Ability to schedule, submit, manage, retry, reprocess and catch up a DAG of jobs | Oozie
Data Output | Enables push- or pull-based delivery of data, subject to policies | HDFS Proxy
Data Policy Management | Anonymize, retain, clean up and archive data | GDM archive
Monitoring / System Management | Provides the ability to operate, visualize and install pipelines | Custom
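Job coordination hinges on data availability: a stage instance should start only when every input partition in its window has landed. The sketch below assumes a stage at one frequency reading a finer-grained feed (for example an hourly stage over a 5m feed, as many stages in the pipeline listing do) and represents landed partitions as a set of partition start times; the function names are hypothetical. Oozie coordinators express a similar check declaratively, through dataset definitions and input events, rather than in user code.

```python
from datetime import datetime, timedelta, timezone


def required_partitions(start: datetime, stage_freq: timedelta,
                        input_freq: timedelta) -> list[datetime]:
    """Start times of the input partitions covering [start, start + stage_freq)."""
    if stage_freq % input_freq:  # non-zero remainder: windows would not line up
        raise ValueError("stage frequency must be a multiple of the input frequency")
    count = stage_freq // input_freq  # e.g. 1 hour // 5 minutes == 12
    return [start + i * input_freq for i in range(count)]


def missing_partitions(start: datetime, stage_freq: timedelta,
                       input_freq: timedelta, landed: set[datetime]) -> list[datetime]:
    """Required partitions that have not landed yet; empty means the instance is ready."""
    return [p for p in required_partitions(start, stage_freq, input_freq)
            if p not in landed]


if __name__ == "__main__":
    hour = datetime(2012, 7, 18, 10, 0, tzinfo=timezone.utc)
    five_min = timedelta(minutes=5)
    # Hypothetical state: every 5m partition of the hour has landed except 10:35.
    landed = {hour + i * five_min for i in range(12)} - {hour + 7 * five_min}
    missing = missing_partitions(hour, timedelta(hours=1), five_min, landed)
    print("ready" if not missing else f"waiting on {[p.strftime('%H:%M') for p in missing]}")
```

A coordinator would typically re-run this check on a timer and combine it with a timeout, so that an instance blocked on a missing partition raises an alert instead of waiting forever.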
Questions?