© 2014 NIMBLE STORAGE | CONFIDENTIAL: DO NOT DISTRIBUTE
Avito
Lars Rönnbäck, Nikolay Golov
Big Data Normalization for Massively Parallel Processing Databases

Slide 2: AVITO - Clear #1 in Russia

Avito Business Development
[Chart: Weekly Page Views (m). Source: Google Analytics, LiveInternet, Internal data.]

Chart annotations:
- Q: Focus on Moscow and St.Pete
- September 2010: Target 13 additional cities
- August 2011: Target total of 28 cities
- Q: Merger with Slando and Olx reaffirmed #1 position in the Russian market
- January 2012: Avito has national coverage
- Q: Launch of Domofond, a dedicated real estate classified
- Q: Launch of a new revenue stream: Listing Fees
- Verticals: Goods C2C, +RE & Cars, +B2C, +Jobs, +Services
- Monetization: +Vertical, +Listing Fees, +Pro tools

Path from Investment Stage to Cash Flow Generation:
- Stage 1. Position: competing with others. Economics: heavy investment. Focus: build user base.
- Stage 2. Position: ahead of competition. Economics: approximately break-even. Focus: develop business model and build leading brand.
- Stage 3. Position: x times ahead of competition. Economics: high EBITDA margin. Focus: monetization enhancement; attract professional classifieds market spend.

Slide 3

[Diagram labels: Back office, Click stream, BI Team, Antifraud, MDM, CRM; Node01, Node02, Node03, Node04, Node05]

Slide 4: Table types of Anchor Modeling

- Anchor. Table for an entity.
- Attribute. Table for a single attribute of an entity.
- Tie. Table for a link between entities.
- Knot. Table for a dictionary.

Slide 5: Avito DWH evolution

Slide 6: Benefits and drawbacks of normalization in an MPP environment

Benefits:
- Semi-automatic addition of new entities, attributes and links to the data model
- Universal approach to choosing data segmentation for tables
- Efficient logical compression of data

Drawback:
- The existing query optimizer cannot produce efficient query execution plans for reports over a normalized data model in HP Vertica.

Slide 7: Efficient query plans for highly normalized data in an MPP environment

- Maximum use of merge joins instead of hash joins, to minimize the risk of RAM depletion
- Use of temporary tables to avoid repeatedly reading table data from disk
- Automatic generation of query plans from Anchor Modeling metadata
- Reports over hundred-billion-row tables can be processed in minutes instead of hours
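The preference for merge joins over hash joins on slide 7 can be illustrated with a minimal Python sketch. It is not Vertica's implementation or Avito's plan generator; the function names and the sample `ad_title` / `ad_price` data are hypothetical. The point is only that a merge join over inputs already sorted on the join key walks both inputs once and holds just the current run of equal keys, while a hash join must first load one whole input into an in-memory table.

```python
from typing import Any, Iterator, List, Tuple

Row = Tuple[int, Any]  # (join key, value); inputs to merge_join must be sorted by key


def merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[int, Any, Any]]:
    """Inner join of two key-sorted inputs, advancing one cursor per input."""
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == left_key:
                for _, right_value in right[j:j_end]:
                    yield (left_key, left[i][1], right_value)
                i += 1
            j = j_end


def hash_join(left: List[Row], right: List[Row]) -> List[Tuple[int, Any, Any]]:
    """Inner join that first builds a hash table over the whole right input."""
    table: dict = {}
    for key, value in right:
        table.setdefault(key, []).append(value)
    return [(k, lv, rv) for k, lv in left for rv in table.get(k, [])]


# Hypothetical attribute tables, both sorted by the anchor key.
ad_title = [(1, "bike"), (2, "sofa"), (4, "phone")]
ad_price = [(1, 100), (3, 250), (4, 300)]

print(list(merge_join(ad_title, ad_price)))
```

Both functions return the same rows for sorted input; the difference is in what has to be resident in memory while the join runs.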
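The four Anchor Modeling table types from slide 4 can be made concrete with a small schema. This is a sketch under stated assumptions: it uses SQLite from the Python standard library rather than HP Vertica, it leaves out historization, and every table and column name (`ad_anchor`, `ad_title`, `status_knot`, `ad_posted_by_user`, and so on) is invented for illustration and does not come from the Avito model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Anchor: one row per entity instance, holding only the surrogate key.
CREATE TABLE ad_anchor   (ad_id   INTEGER PRIMARY KEY);
CREATE TABLE user_anchor (user_id INTEGER PRIMARY KEY);

-- Knot: a small dictionary of shared values.
CREATE TABLE status_knot (
    status_id   INTEGER PRIMARY KEY,
    status_name TEXT NOT NULL UNIQUE
);

-- Attribute: one table per single attribute of an entity.
CREATE TABLE ad_title (
    ad_id INTEGER PRIMARY KEY REFERENCES ad_anchor(ad_id),
    title TEXT NOT NULL
);
-- A knotted attribute stores a reference to the knot instead of the value.
CREATE TABLE ad_status (
    ad_id     INTEGER PRIMARY KEY REFERENCES ad_anchor(ad_id),
    status_id INTEGER NOT NULL REFERENCES status_knot(status_id)
);

-- Tie: a link between entities.
CREATE TABLE ad_posted_by_user (
    ad_id   INTEGER NOT NULL REFERENCES ad_anchor(ad_id),
    user_id INTEGER NOT NULL REFERENCES user_anchor(user_id),
    PRIMARY KEY (ad_id, user_id)
);

INSERT INTO ad_anchor   VALUES (1), (2);
INSERT INTO user_anchor VALUES (10);
INSERT INTO status_knot VALUES (1, 'active'), (2, 'closed');
INSERT INTO ad_title    VALUES (1, 'Red bike'), (2, 'Sofa');
INSERT INTO ad_status   VALUES (1, 1), (2, 2);
INSERT INTO ad_posted_by_user VALUES (1, 10), (2, 10);
""")

# A report reassembles the entity by joining every piece on the anchor key.
rows = conn.execute("""
    SELECT a.ad_id, t.title, k.status_name, p.user_id
    FROM ad_anchor a
    JOIN ad_title t          ON t.ad_id = a.ad_id
    JOIN ad_status s         ON s.ad_id = a.ad_id
    JOIN status_knot k       ON k.status_id = s.status_id
    JOIN ad_posted_by_user p ON p.ad_id = a.ad_id
    ORDER BY a.ad_id
""").fetchall()

print(rows)
```

Even this toy report needs four joins for three columns, which is why join strategy dominates query cost once the model is normalized this far.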