Big Data Summit - Google Venice
October 20, 2015
Agenda
2:00 – 2:30  Registration & Welcome
2:30 – 3:30  GCP Big Data Overview by Rohit Khare, Google PM
3:30 – 4:00  Customer Stories - BlueCava & Pixalate
4:00 – 4:30  Panel Discussion, Q&A
4:30 – 5:00  Partner Story, Magnus Unum
5:00 – 6:00  Reception & Networking
Logistics
● Parking behind Chaya Restaurant on Navy Street
● Visitor badges
● Washrooms
● Beverage & food service
● Wireless access “GoogleGuest”
01 GCP Big Data Overview
Rohit Khare, Google PM

Confidential & Proprietary | Google Cloud Platform
Build. Store. Analyze.
Google Cloud Platform for Big Data: focus on insights, not infrastructure
Big Data Summit, Los Angeles — October 20, 2015
Rohit Khare, Google Cloud Product Manager
William Vambenepe, Lead Product Manager for Big Data
Build · Connect · Visualize · Find · Access

IaaS (Infrastructure-as-a-Service) · PaaS (Platform-as-a-Service) · SaaS (Software-as-a-Service)
Google Cloud Platform
Cloud Computing
The enterprise cloud platform market will exceed $43B globally by 2018.
IT Trends

● Affordable capacity: The decreasing cost of storage enables virtually unlimited storage in the cloud. $600 can buy enough storage for the world’s music. (Source: McKinsey Global Institute, May 2011)
● On-demand computing: Computing as a utility is now available for easy purchase, provided from massively efficient data centers. (Source: Nicholas Carr, The Big Switch, 2008)
● Instant access: The internet allows for a model of real-time access to new innovation, information, and applications from a wide range of devices.
Cloud Computing Patterns

On and Off
• On-and-off workloads (e.g. batch jobs)
• Over-provisioned capacity is wasted
• Time to market can be cumbersome

Growing Fast
• Successful services need to grow/scale
• Keeping up with growth is a big IT challenge
• Cannot provision hardware fast enough
Unpredictable Bursting
• Unexpected/unplanned peak in demand
• Sudden spike impacts performance
• Can’t over-provision for extreme cases

Predictable Bursting
• Services with micro-seasonality trends
• Peaks due to periodic increased demand
• IT complexity and wasted capacity
Cloud Economics: 10x cost benefit for large-scale deployments
[Chart: cost per server for public cloud vs. private cloud, across deployments of 100 to 100,000 servers]
Google Ecosystem + APIs
• Take advantage of Google’s entire ecosystem of services: search, web analytics, monetization, app distribution
Support

● Bronze (free): All customers receive Bronze support, with access to online documentation, community forums, and billing support.
● Silver ($150/month): Direct access to our support team for questions about service functionality, best-practice architectures, and service errors.
● Gold (starts at $400/month): 24x7 phone support, faster target initial response times, and consultation on application development and architecture for your specific use case.
● Platinum (contact sales): The most comprehensive, personal, and customized support we offer. Includes everything in Gold support, plus direct access to the Technical Account Management team.
Certifications

| Product   | SSAE-16 SOC 1 | SSAE-16 SOC 2 | SSAE-16 SOC 3 | ISO 27001 | HIPAA (BAA) | PCI DSS v3.0 | FISMA            | FedRAMP |
| GAE       | Complete      | Complete      | Complete      | Complete  | H1 15       | Complete     | FISMA (Moderate) | H2 15   |
| GCS       | Complete      | Complete      | Complete      | Complete  | Complete    | Complete     | n/a              | H2 15   |
| GCE       | Complete      | Complete      | Complete      | Complete  | Complete    | Complete     | n/a              | H2 15   |
| Datastore | Complete      | Complete      | Complete      | Complete  | H1 15       | Complete     | n/a              | H2 15   |
| BigQuery  | Complete      | Complete      | Complete      | Complete  | Complete    | Complete     | n/a              | H2 15   |
| Cloud SQL | Complete      | Complete      | Complete      | Complete  | Complete    | Complete     | n/a              | H2 15   |
| Genomics  | H1 15         | H1 15         | H1 15         | Complete  | H1 15       | n/a          | n/a              | H2 15   |
| Apps      | Complete      | Complete      | Complete      | Complete  | Complete    | n/a          | GAFG only        | H2 15   |
Pricing

Philosophy: Pricing should be flexible and easy to understand. You shouldn’t need a PhD to understand prices, and you should get the best price automatically.

Per-minute billing: Compute Engine instances are charged in one-minute increments (with a 10-minute minimum), so you only pay for what you use.

Sustained-use discounts: If you use a Compute Engine VM for more than 25% of a month, you receive discounts automatically.
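As a rough sketch of how per-minute billing and sustained-use discounts combine, here is a small Python model. The discount bands and the base rate are illustrative assumptions for this sketch, not Google's actual price sheet.

```python
# Illustrative model of per-minute billing with a 10-minute minimum and
# incremental sustained-use discounts. Band thresholds/multipliers below
# are assumptions for illustration, not real Compute Engine prices.

MINUTES_IN_MONTH = 30 * 24 * 60  # simplified 30-day month

# (fraction-of-month threshold, multiplier applied to usage in that band)
DISCOUNT_BANDS = [(0.25, 1.0), (0.50, 0.8), (0.75, 0.6), (1.00, 0.4)]

def vm_charge(minutes_used, base_rate_per_minute):
    """Return the monthly charge for one VM under these assumed bands."""
    billable = max(minutes_used, 10)  # 10-minute minimum, then per-minute
    charged = 0.0
    prev = 0.0
    for threshold, multiplier in DISCOUNT_BANDS:
        band_minutes = (threshold - prev) * MINUTES_IN_MONTH
        in_band = min(max(billable - prev * MINUTES_IN_MONTH, 0), band_minutes)
        charged += in_band * base_rate_per_minute * multiplier
        prev = threshold
    return charged

# A VM that runs the full month pays an effective 70% of list price here,
# because the four bands average to (1.0 + 0.8 + 0.6 + 0.4) / 4 = 0.7.
full_month = vm_charge(MINUTES_IN_MONTH, 0.001)
list_price = MINUTES_IN_MONTH * 0.001
print(round(full_month / list_price, 2))  # 0.7
```

The point of the sketch is the shape of the scheme: the discount is applied automatically per usage band, with no reservation or long-term contract required.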
For the past 15 years, Google has been building one of the fastest, most powerful, highest-quality cloud infrastructures in the world.
Cloud Platform is built on the same infrastructure that powers Google.
Google Innovations in Software
[Timeline, 2002–2014: GFS, MapReduce, BigTable, Dremel, Colossus, Spanner, then Dataflow and Kubernetes]

A look inside Google Cloud Platform
Google Cloud Platform product areas: Compute, Storage, Networking, Big Data, Management, Mobile, Developer Tools

Compute
● Compute Engine
● Container Engine
● App Engine

Storage
● Cloud Storage
● Cloud SQL
● Cloud Datastore
● Cloud Bigtable
Easy-to-use storage options: NoSQL, SQL, blob, and block
Cloud Storage
Cloud Storage: Value
• Safe: Redundant storage at multiple physical locations; OAuth and granular access controls provide strong, configurable security
• Ease of use: Same APIs as other GCP products
• High performance: 99.95% availability SLA and 24x7 phone support
• Pricing: Pay only for what you use, with some of the lowest prices in the industry
Cloud Storage: Features
• Three storage options:
○ Standard: The highest level of durability, availability, and performance
○ DRA (Durable Reduced Availability): The same durability at lower cost, with reduced availability
○ Nearline: Low-cost storage for data archiving, online backup, and disaster recovery
Cloud Datastore
Cloud Datastore: Value
• Accessible Anywhere
• Secure Sharing
• Same High Replication Datastore Used By App Engine Apps Today
• Equally Fast Queries For Any Sized Dataset
• Data is Replicated Across Several Data Centers
• Use From Any Application or Language
• Serving 4.5 Trillion Requests Per Month
Cloud Datastore: Features
• Auto-scale
• Schemaless Access
• SQL-like Capabilities
• Authentication That Just Works
• Fast and Easy Provisioning
• RESTful Endpoints
• ACID Transactions
• Local Development Tools
• Built-in Redundancy
Cloud SQL
Cloud SQL
• Fully managed
• Ease of Use
• Highly Reliable
• Flexible Charging
• Security, Availability, Durability
• EU and US Data Centers
• Easy Migration & Data Portability
• Control
Cloud Bigtable

Big Data
● BigQuery
● Cloud Pub/Sub
● Cloud Dataflow
Manage the Entire Lifecycle of Big Data: Capture → Store → Process → Analyze

● Capture: Cloud Logs, Google App Engine, Google Analytics Premium, Cloud Pub/Sub, Cloud Monitoring
● Store: BigQuery storage (tables), Cloud Bigtable (NoSQL), Cloud Storage (files), Cloud Datastore
● Process (batch and stream): Cloud Dataflow, Cloud Dataproc
● Analyze: BigQuery analytics (SQL), Cloud Bigtable, real-time analytics and alerts, Cloud Dataflow
BigQuery
BigQuery: Value
● Performance: Ingest data at 100K rows/second and process real-time queries
● Ease of use: No administration for performance and scale
● Scale: No need to worry about growing data. Unlimited storage with pay as you go pricing model
● Non-technical analysts can drive queries on massive datasets using BI tools
BigQuery: Features
● Interactive query performance: Query multi-terabyte datasets in an ad hoc manner
● SQL: Familiar SQL-like query syntax and intuitive user interface
● Data mashup: Query across diverse datasets
● Highly Available: Data replication in multiple geographies. Data is available and durable even in the case of extreme failure modes
● Secure: Access to data is controlled using customer-owned ACLs
Cloud Pub/Sub
Cloud Pub/Sub: Value
● Scalable, flexible, and reliable enterprise message-oriented middleware in the cloud
● Provides asynchronous messaging, allowing secure and highly available communication between independently written applications
● Delivers low-latency, durable messaging that helps developers quickly integrate systems hosted on the Google Cloud Platform and externally
Cloud Pub/Sub: Features
• Unified messaging: Durability and low-latency delivery in a single product
• Global presence: Connect services located anywhere in the world
• Flexible delivery options: Both push- and pull-style subscriptions supported
• Data reliability: Replicated storage and guaranteed at-least-once message delivery
• Data security and protection: Encryption of data on the wire and at rest
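The at-least-once guarantee above can be illustrated with a tiny in-memory sketch: a message stays queued until the subscriber acks it, so a subscriber that "crashes" before acking sees the message again. All class and method names here are invented for illustration; this is not the Cloud Pub/Sub client API.

```python
# Minimal in-memory sketch of pull-style, at-least-once delivery.
from collections import deque

class Topic:
    def __init__(self):
        self.subscriptions = []

    def publish(self, message):
        # Fan out: every subscription gets its own copy of the message
        for sub in self.subscriptions:
            sub._pending.append(message)

class Subscription:
    def __init__(self, topic):
        self._pending = deque()   # delivered-but-not-yet-pulled messages
        self._unacked = {}        # pulled messages awaiting ack
        self._next_ack_id = 0
        topic.subscriptions.append(self)

    def pull(self):
        """Return (ack_id, message), or None; the message stays unacked."""
        if not self._pending:
            return None
        ack_id, self._next_ack_id = self._next_ack_id, self._next_ack_id + 1
        message = self._pending.popleft()
        self._unacked[ack_id] = message
        return ack_id, message

    def ack(self, ack_id):
        self._unacked.pop(ack_id)

    def nack_expired(self):
        """Simulate ack-deadline expiry: redeliver everything unacked."""
        for message in self._unacked.values():
            self._pending.append(message)
        self._unacked.clear()

topic = Topic()
sub = Subscription(topic)
topic.publish("event-1")
ack_id, msg = sub.pull()
sub.nack_expired()          # subscriber "crashed" before acking
ack_id, msg = sub.pull()    # the same message is delivered again
sub.ack(ack_id)             # now it is gone for good
print(msg, sub.pull())      # event-1 None
```

The redelivery-until-ack loop is exactly why subscribers in an at-least-once system must be idempotent: the same message can arrive more than once.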
Cloud Dataflow
Cloud Dataflow: Value
• Reduce cost of processing large datasets
• Save time: Automatically optimizes data-centric pipeline code by collapsing multiple logical passes into a single execution pass
• Increase efficiencies: Fully manages the lifecycle of required compute resources
• Simple: Dataflow makes it easy to write data-processing pipelines that incorporate both batch and stream-processing capabilities and is language-agnostic
Cloud Dataflow: Features
• Unified programming model for both batch and stream-based data analysis
• Managed scaling: Manages the lifecycle of required compute resources
• Reliable & consistent processing: Built-in support for fault-tolerant execution
• Monitoring: Provides lifecycle statistics, including in-flight information such as real-time pipeline throughput, real-time step lag, and real-time worker log inspection
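The "unified programming model" idea can be sketched in plain Python: the same chain of transforms runs unchanged over a bounded in-memory source (batch) and an unbounded generator (stream). This is an illustration of the concept only, not the Dataflow SDK.

```python
# Sketch: one pipeline definition, two kinds of sources.
import itertools

def build_pipeline(source):
    """words -> lowercase -> keep words longer than 3 chars -> word lengths."""
    cleaned = (w.lower() for w in source)
    kept = (w for w in cleaned if len(w) > 3)
    return (len(w) for w in kept)

# Batch: a bounded, in-memory source is fully consumed
batch_result = list(build_pipeline(["Data", "is", "Flowing", "fast"]))

# "Stream": an unbounded generator; we take only the first few outputs
def endless_words():
    while True:
        yield from ["tick", "to", "tock"]

stream_result = list(itertools.islice(build_pipeline(endless_words()), 4))
print(batch_result, stream_result)  # [4, 7, 4] [4, 4, 4, 4]
```

Because the transforms are lazy generators, nothing in `build_pipeline` needs to know whether its input ever ends, which is the essence of a batch/stream-unified model.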
Cloud Dataproc
Typical Big Data processing: programming, resource provisioning, performance tuning, monitoring, reliability, deployment & configuration, handling growing scale, utilization improvements

Big Data with Google: just programming. Focus on insight, not infrastructure.
Reduce Time to Understanding

Traditional Big Data = Big Problems. Hurdles to innovating and iterating with Big Data:
1. Continuously accommodating greater data volumes and new data sources
2. Capturing and storing all data for all business functions
3. The complexity of building and maintaining a Big Data system with consistent ease of use
4. Reducing the time from data collection to action
5. Managing the cost of the data platform
6. Keeping the system reliable and running
7. Keeping your data secure
8. Collaboration within or across organizations
Google Cloud Platform reference architecture: (1) data collection → (2) ETL → (3) raw data storage → (4) aggregation → (5) analytics storage → (6) visualization. Built on Google Cloud Storage, Google BigQuery, and Google Compute Engine / App Engine scalable VMs: TBs of data, processed in seconds.

[Diagram: collection and transformation/data processing feed raw data storage (BigQuery staging); cleansing feeds BigQuery aggregate staging; analytics are served via ad-hoc queries and a REST API to interactive dashboards and apps, BI tools, and Google Spreadsheets]
Google Confidential │ Do not distribute

Overview: Data to process: the Consolidated Audit Trail (CAT), a data repository of all equities and options orders, quotes, and events.
Challenges: Process the CAT and organize 100 billion market events into an “order lifecycle” within a 4-hour window; store 6 years (~30 PB) of data.
Solution: Cloud Bigtable to process and run queries and tolerate volume increases.

● 6 billion market events written per hour, with bursts of 10 billion
● 1.7 gigabytes per second (6 TB per hour), with bursts up to 10 terabytes per hour
Overview: Data to process: standard game KPIs, marketing data, custom game insight; several dozen gigabytes of raw logs per day.
Challenges: Struggled to process large volumes of data; long delays between triggering logs and querying data, problematic for games running live events; issues controlling permissions; long-running queries and clunky analysis.

“BigQuery has helped us focus on actually using data instead of exhausting ourselves just trying to get to the data.”

● Crunch 150 GB of data in 15 seconds
● Instant log ingestion
● Scale without clogging the system
● Flexibility on permission controls
Snapchat sends 700 million photos and videos each day. Google App Engine scaled seamlessly during growth to millions of users; a small team is able to innovate quickly and expand globally.

“App Engine enabled us to focus on developing the application. We wouldn’t have gotten here without the ease of development that App Engine gave us.” Bobby Murphy, CTO
Big Data Partner Ecosystem
Chartio
cloud.google.com
02 Customer Story - BlueCava
Reza Qorbani, CTO

BLUECAVA, INC. / 2015
CROSS SCREEN STARTS HERE
BlueCava: Business / Product / Challenges
INTRODUCTION
Reza QorbaniCTO @ BlueCava
• Worked with the Google Big Data team over the past 1.5 years
• Moved from 100% private cloud to a hybrid environment
• Deep integration with BigQuery
@qorbani
ABOUT – BlueCava

An open network that optimizes cross-screen marketing: real-time intelligence across display, mobile, video, exchange, and social, with validation, demographic, location, exchange, and coverage data feeding an Association Graph that connects DataTech platforms, AdTech platforms, and MarTech platforms & services.
ABOUT – Association Graph
[Diagram: a household contains consumers A, B, and C, each linked to device identifiers such as IDFA, APN, and BCID]
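The device → consumer → household rollup in the diagram can be sketched as a pair of lookup tables. All identifiers and mappings below are invented for illustration; they are not BlueCava data or code.

```python
# Minimal sketch of an association graph: raw device IDs roll up to a
# consumer, and consumers roll up to a household. Data is invented.

DEVICE_TO_CONSUMER = {
    "idfa-123": "consumer-a",
    "apn-456": "consumer-a",   # same person seen on two devices
    "bcid-789": "consumer-b",
}
CONSUMER_TO_HOUSEHOLD = {
    "consumer-a": "household-1",
    "consumer-b": "household-1",
}

def household_for(device_id):
    """Resolve a raw device ID to its household, or None if unknown."""
    consumer = DEVICE_TO_CONSUMER.get(device_id)
    return CONSUMER_TO_HOUSEHOLD.get(consumer)

# Two different device IDs resolve to the same household
print(household_for("idfa-123"), household_for("bcid-789"))
```

The value of the graph is exactly this cross-screen join: an ad impression on one device can be attributed to the same consumer or household seen elsewhere.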
ABOUT – Coverage
● 100M households
● 240M consumers
● 600M devices
ABOUT – Volume
● 5 TB of daily raw logs
● 250k req/sec from partners and exchanges
● 1.3 PB total storage
● 25 billion IDs, including partner IDs
ABOUT – Challenge

Cost
− Bandwidth cost
− Storage cost
− Infrastructure cost
− Operation cost

Flexibility
− Easily run ad-hoc queries
− Handle lots of POCs
− Flexible to change
− Unified data store

Delivery
− Generate data for customers
− Multiple extractions at a time
− Keep data for months
− Highly available
ARCHITECTURE: BlueCava Platform Overview / Before / Now / Future!
ARCHITECTURE – BlueCava Platform Overview
[Diagram: core (EDGEX, BIDDER), internal (OPERATIONS, QUALITY), and customer (API, PORTAL) layers on the platform; metadata/prepare, logging, aggregate, filter, and detector stages across transfer/prepare, process/association, and analyze/report]
ARCHITECTURE – Before
[Diagram: two data centers, WEST (Irvine) and EAST (Ashburn), with geographic load balancing and a cross-datacenter network; core, internal, and customer platforms in the West, backup/DR in the East]
ARCHITECTURE – Before / Challenges

Cost: Estimated $1.5M upfront to scale up; high monthly bandwidth cost; need to extend the operations team
Scalability: Datacenter issues with traffic spikes; need to scale down after a POC finishes
Performance: Some processes took more than a day; customer delivery takes 5–10 hours; ad-hoc queries take hours
Storage: Need more historical data to increase quality; need to keep customer data for months; deliver large amounts of data to customers
Complexity: Simple tasks require data engineering expertise; customizing data output was hard
Resource limitations: Data scientists need meaningful data sets; QA/dev environment separation; ad-hoc queries create issues for production
ARCHITECTURE – Before / Solution
BigQuery
▪ Big Data as a Service
▪ Extremely cost-effective for our use case
▪ Supports a hierarchical data model
▪ Extremely fast
▪ Query using SQL
▪ Solved most of our Big Data challenges
▪ Fraction of the cost (it was unbelievable)
▪ Customer delivery in seconds!
▪ We dropped the delivery Spark cluster (10 nodes)
▪ We dropped the ad-hoc Hadoop cluster (100+ nodes)
▪ Offloaded ALL customer-facing jobs
▪ Only 2 sprints of development (6 weeks)
Cloud Storage
▪ Nice integration with BigQuery
▪ No file-size limit (unlike S3)
▪ HDFS integration using the Hadoop connector
▪ Seamless cost saving: DRA and Nearline
▪ Solved most of our storage challenges
▪ Simplified our file delivery
▪ Extremely competitive pricing
▪ No need for backup ☺
Compute Engine
▪ Great sustained-use pricing
▪ No need for a long-term contract
▪ Simple CLI for automation
▪ bdutil library for Hadoop
▪ Elastic environment, which saved us cost
▪ 100+ node Hadoop cluster in under 6 minutes
▪ Used as an on-demand resource, as needed
▪ Stopped purchasing more hardware!
ARCHITECTURE – Now
[Diagram: WEST (Irvine) datacenter with core, internal, and customer platforms, connected via Interconnect and simple DNS to Google Cloud Platform (Cloud Storage and BigQuery)]
ARCHITECTURE – Future!
● Cost: Move everything into the cloud
● Scalability: Worldwide coverage
● Performance: Real-time association
● Simplify: Data Science Lab
Using Container Engine, Dataproc, Dataflow, and Datalab
[Diagram: core batch and real-time processing feeding the association graph, with query, lab, and storage layers serving internal and customer users]

THANK YOU
03 Customer Story - Pixalate
Amin Bandeali, Co-Founder & CTO, Pixalate, Inc.
Agenda
● What is Pixalate?
● My Role @ Pixalate
● Pixalate Breadth and Depth
● What is Ad Fraud and why is it important to solve?
● Challenges
● Ad Fraud
● Real World BigQuery Use Cases
● Conclusion
Our Mission
To rate the whole internet… and YES, we also see what Google doesn’t see!
What is Pixalate?
Pixalate is the de facto ratings standard for programmatic advertising.
SellerTrustIndex.com
My Role @ Pixalate
● Co-Founder, CTO, and Solution Architect
● Real-time data junkie; contributed to the Apache Hadoop project
● Largest AWS DynamoDB user upon launch (not using it anymore!)
● Largest AWS SQS user (not using it anymore!)
● The Pixalate backend runs Java, NodeJS, Redis, Solr, S3, and BigQuery
● Declined 25,000 free hours of AWS Redshift!
● 70% of Pixalate technology runs on AWS, 30% on BigQuery
● We move 2 TB of data from AWS to Google Storage just for BigQuery
Challenges
● Process 1+ trillion ad transactions per month
● Processing up to 3 PB/month
● Analyze massive amounts of data to detect fraud
● Create customized reports with NO engineering support!
● Close to 1 trillion rows of data in BigQuery
What is Ad Fraud?
Ad fraud against AdMob and McDonald’s
Our real-time fraud map
http://www.pixalate.com/map
What’s wrong with this data?
A day in the life of the Data Science Team
● An account manager asks the data science team for a customized report for a client, measuring specific metrics over the last 6 months of their data.
● Solution 1: AWS EMR (boring, and takes hours!)
○ The big data engineers execute an EMR (Hive) job that extracts the data and creates the report
● Solution 2: BigQuery (fun, and takes seconds!)
○ The data science team implements a usually complex query that calculates all the metrics in SQL
○ BigQuery processes a couple of TB of data and creates the report in a few seconds
Bypassing the Engineers!
● We need to expand a list of 500,000 network addresses in CIDR format (e.g. 128.0.0.1/24) to regular IP format and use them in client reports
● Solution 1: Java
○ Provide the Java engineers with the requirements
○ Wait for implementation completion
○ Wait for UAT and the production push
○ Store the data in a database
○ Total time: ~3 workdays (in startup timezone)
● Solution 2: BigQuery
○ The data science team writes a query with 25+ table JOINs and UNIONs that handles the expansion in a clean, easy-to-test way, and runs it in BigQuery
○ Total time: ~3 hours
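For comparison, the same CIDR-expansion transformation can be sketched in a few lines with Python's standard `ipaddress` module. This illustrates what the expansion does, not what either team actually shipped.

```python
# Expand CIDR blocks into their individual host addresses.
import ipaddress

def expand_cidrs(cidrs):
    """Yield every host address covered by a list of CIDR blocks."""
    for block in cidrs:
        # strict=False accepts host bits set, e.g. "128.0.0.1/24"
        net = ipaddress.ip_network(block, strict=False)
        for addr in net.hosts():
            yield str(addr)

# A /30 network has exactly two usable host addresses
print(list(expand_cidrs(["128.0.0.1/30"])))
# ['128.0.0.1', '128.0.0.2']
```

At 500,000 input blocks the generator form matters: it streams addresses out instead of materializing hundreds of millions of strings in memory at once.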
From Waste Picking to Innovation
● The amount of digital data in the universe is growing at an exponential rate, doubling every two years, and changing how we live in the world.
○ YET only 0.5% of that data is analyzed!
● If you can’t mine these data easily and extract semantics, how is data collection different from waste-picking?
● BigQuery enables innovation:
○ It breaks the dependency between data scientists and big-data engineers
○ Data scientists can now write complex queries and analyse massive amounts of data without any backend coding (e.g. Java) or some other big data framework
○ It enables a deep understanding of complex data and their dependencies
Cost reduction using BigQuery
● Complex data processing pipelines impose a new cost-optimization challenge
● Main questions to be answered:
○ Where do I store the data I collect?
○ Where/how do I aggregate the data I collect?
○ How do I enhance the data I collect with other metadata?
○ How do I process the collected data such that the overall cost is minimized?
● BigQuery can HELP!
Health Monitoring Using BigQuery
But Wait! Here’s the real benefit...

Zero-Cost Queries Over Petabytes!
● How can you query PETABYTES of historical data and create time series to detect traffic anomalies (e.g. network failures)?
● BigQuery zero-cost queries (a.k.a. table metadata)
○ can give you the big picture of a table’s data health
■ within seconds
■ without having to run any costly queries
[Chart: time series of table statistics, with a dip marking suspicious activity]
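A metadata-driven health check of this kind can be sketched as follows: compare each day's row count (the kind of figure available from table metadata, without scanning the data itself) against a trailing average and flag outliers. The counts and thresholds below are invented for illustration.

```python
# Flag days whose row count deviates sharply from a trailing average.
def flag_anomalies(daily_row_counts, window=3, tolerance=0.5):
    """Return days whose count deviates more than `tolerance` (as a
    fraction) from the average of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_row_counts)):
        day, count = daily_row_counts[i]
        baseline = sum(c for _, c in daily_row_counts[i - window:i]) / window
        if abs(count - baseline) > tolerance * baseline:
            flagged.append(day)
    return flagged

# d4 collapses to 12 rows against a ~100-row baseline: likely an
# ingestion failure (or the kind of dip a fraud team wants to see).
counts = [("d1", 100), ("d2", 110), ("d3", 95), ("d4", 12), ("d5", 105)]
print(flag_anomalies(counts))  # ['d4']
```

Because the inputs are per-table metadata rather than query results, the whole check stays cheap even when the tables themselves hold petabytes.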
BigQuery success is all about the architecture: spend a LOT of time on table schemas (hint: keep them flat)
Learnings
● BigQuery has its gotchas!
○ The wrong sharding strategy can slow you down
○ Know your quotas well; they will haunt you!
○ Balance table JOINs appropriately
○ Don’t use ORDER BY unless it’s mandatory
○ Avoid “SELECT *” queries on “fat” tables over long time ranges
● Secret recipe
○ Push as much complexity as possible into BigQuery using advanced queries (usually > 100 lines of SQL code)
○ Use backend languages (e.g. Java) simply to orchestrate the data pipeline
○ Don’t be scared of data duplication; storage cost is much cheaper than analysis cost!
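One concrete example of the sharding point, stated as an assumption about the setup rather than Pixalate's actual schema: with daily date-sharded tables (a common BigQuery pattern at the time, e.g. `events_20151020`), a query helper should enumerate only the shards in the requested window instead of scanning everything.

```python
# Enumerate the daily shard table names covering a date range, so a
# query touches only the shards it needs. The "events" prefix and the
# YYYYMMDD suffix convention are illustrative assumptions.
from datetime import date, timedelta

def shard_tables(prefix, start, end):
    """List daily shard table names for the inclusive range [start, end]."""
    tables = []
    day = start
    while day <= end:
        tables.append(f"{prefix}_{day.strftime('%Y%m%d')}")
        day += timedelta(days=1)
    return tables

print(shard_tables("events", date(2015, 10, 19), date(2015, 10, 21)))
# ['events_20151019', 'events_20151020', 'events_20151021']
```

Scanning three daily shards instead of a year-wide "fat" table is exactly the cost and latency difference the "avoid SELECT * over long time ranges" advice is about.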
Q&A
Amin Bandeali
p: 888.749.2528 m: 714.757.9544e: [email protected] t: http://twitter.com/aminbandeali
Panel Q&A
Rohit Khare, GCP Big Data PM
Reza Qorbani, BlueCava CTO
Amin Bandeali, Pixalate Co-Founder & CTO
04 Partner Story - Magnus Unum
Rajesh Babu, BI, Big Data & Analytics Solutions Architect
Subash D’Souza, Big Data Evangelist
Modern BI & Big Data platform with Google Cloud
Magnus Unum… what we do
We are an LA-based Big Data, Data Science & Analytics consulting services firm, specialized in advising our clients on strategy, road map/blueprint, implementation, deployment, and maintenance/support/operations for their Big Data, Data Science, BI, and Analytics solutions.
Magnus Unum… Leadership
• Raj Babu
  • Co-Founder, Magnus Unum
  • Founder, Agile iSS
  • 20 years of experience in the BI & Analytics field
  • Worked on numerous very large BI migration and integration projects
• Subash D’Souza
  • Over 10 years of experience building scalable solutions for various enterprise companies
  • Organizer of several LA user groups, including Big Data, Apache Spark & Apache HBase
  • Organizer of Big Data Day LA
  • Recognized as a Champion of Big Data by Cloudera
Magnus Unum – Key Services
• Architect, design & build Big Data solutions
• Cloud migration services for Big Data, Analytics & BI
• Big Data engineering & staffing
• Big Data managed & support services
• Data Science solutions & services
Magnus Unum – Expertise
• On-Prem: Cloudera, Hortonworks, IBM, Pivotal & MapR
• Cloud: Google Cloud, Amazon AWS & Microsoft Azure
• Analytics/Reporting: Tableau, MicroStrategy, SAP BO, Qlik & Pentaho
• Data Science: Machine Learning, R, SAS & Data Analytics
Why Google Cloud Platform?
Use Case 1 – Migrating your Data Warehouse and BI to Google Cloud
• Capture / Migrate
• Storage / Data Management
• Data Processing
• Query/Analytics
• Data Integration
• Access Control
Use Case 2 – Google Analytics Detailed Analysis
• Limitation in the Google Analytics daily export
• More detailed analysis is available as part of Google Cloud Platform (must have premium access)
• Can analyze granular details of user interaction on websites and aggregate the results for display on-prem or within GCP
Please reach out to us for a free consultation & assessment of your BI, Big Data & Analytics needs, plus an additional $500 in GCP credits!