
Why GCP over AWS - Datatiede (datatiede.fi/wp-content/uploads/2018/...GCP-Workshop-big-data-pipel…)


April, 2018

© Copyright 2018 Koivu Solutions Oy CONFIDENTIAL 1

GCP Workshop


Koivu Solutions Oy

• Porist – already since 2017

• A Solutions Company

• Knowledge to help define the digital roadmap

• Capabilities and tools to rapidly develop digital pilots to showcase benefits

• Deployment experience to expand breakthrough digital technologies to the entire organization

• Extensive Fortune Global 1000 experience

• Expertise developed across many digital assessments in multiple industries

• More than 25 years of enterprise solutions development experience

• More than 100 enterprise customer projects delivered across 4 continents


Koivu Founding Team – Born Global


• Sami Lahti, Senior Technology Architect. Former JDA, i2, Innomat

• Samu Lahti, Senior Software Architect. Former JDA, i2, Tieto

• Chris Morhard, Senior Business Architect. Former E2open, MercuryGate, JDA, i2

• Janne Salmi, Senior Business Architect. Former Chainalytics, ROCE Partners, McKinsey

• Harri Rajala, Senior Software Architect. Former JDA, i2, Innomat


Our team’s work has been for industry giants

…and many more

A stop-and-go mentality is the Achilles' heel of today’s enterprise software

• Nightly batch processing stops the business. A 24-hour day is not a 24-hour business day.

• Shrinking nightly time windows for batch data processing put pressure on systems to be available earlier the next morning, with data that is good enough for the coming day's business.

• These systems are sequence-oriented, incapable of parallel data processing, and hard to scale. Peak-load capacity must always be online.

[Figure: timeline alternating between "Stop and load" and "run" phases]

The modern way to build enterprise applications is to connect the dots on a cloud platform


IoT and Data Analytics solutions are meant to be real-time.


A message queue is a core component of a modern enterprise application

• When do you need messages to be delivered?

• People send messages all the time: email, SMS, chat, comments, …

• These are just a fraction of the messages that systems send to each other.

• Traditionally, many message-type processes have been handled as batch loads, for example nightly. Today real-time operation is a business requirement.

• A message hub consumes messages and passes them on for others to act on.

• It works as a shock absorber for load fluctuation.

• Message content can be ‘anything’.

• Message buffering and guaranteed delivery enable high availability for the complete communication chain.

• Publishers and subscribers work independently and without knowledge of each other.

• There can be many topics / channels on which messages are published.

• A new consumer can start listening to the messages without changes to the publisher(s).
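The decoupling described in these bullets can be illustrated with a toy in-memory hub. This is only a sketch: the class `MessageHub`, its methods and the topic and subscriber names are hypothetical and are not the Cloud Pub/Sub API. It shows per-subscriber buffering and a consumer that joins later without any change to the publisher.

```python
from collections import defaultdict, deque


class MessageHub:
    """Toy in-memory hub: buffers messages per subscriber until they are pulled."""

    def __init__(self):
        # topic name -> {subscriber name -> queue of undelivered messages}
        self._queues = defaultdict(dict)

    def subscribe(self, topic, subscriber):
        # A new consumer joins without any change to publishers.
        self._queues[topic].setdefault(subscriber, deque())

    def publish(self, topic, message):
        # The publisher does not know who, if anyone, is listening.
        for queue in self._queues[topic].values():
            queue.append(message)

    def pull(self, topic, subscriber, max_messages=10):
        # Each subscriber drains its own buffer at its own pace.
        queue = self._queues[topic][subscriber]
        count = min(max_messages, len(queue))
        return [queue.popleft() for _ in range(count)]


hub = MessageHub()
hub.subscribe("orders", "billing")
hub.publish("orders", {"id": 1})
hub.subscribe("orders", "analytics")   # joins later, publisher code unchanged
hub.publish("orders", {"id": 2})

billing_messages = hub.pull("orders", "billing")
analytics_messages = hub.pull("orders", "analytics")
print(billing_messages, analytics_messages)
```

In this sketch a subscriber only receives messages published after it subscribed, which is why the late "analytics" consumer sees only the second message.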


Google Cloud Pub/Sub is a highly available messaging hub for asynchronous communication

• Cloud Pub/Sub is a simple, reliable, scalable foundation for stream analytics and event-driven computing systems.

• As part of Google Cloud’s stream analytics solution, the service ingests event streams and delivers them to Cloud Dataflow for processing and BigQuery for analysis as a data warehousing solution.


Spotify is a music streaming service that also generates a huge number of messages.

“Whenever a user performs an action in the Spotify client—such as listening to a song or searching for an artist—a small piece of information, an event, is sent to our servers. Event delivery, the process of making sure that all events gets transported safely from clients all over the world to our central processing system, is an interesting problem. ”

https://labs.spotify.com/2016/03/03/spotifys-event-delivery-the-road-to-the-cloud-part-ii/

• Spotify is now on their 3rd generation of messaging solutions (2016).

• How do they handle millions of messages 24/7?


Spotify did a massive Pub/Sub performance test

• Spotify used 29 servers to send messages to Pub/Sub, generating a continuous test load of 2 million messages per second.

• “Enabling batching and compression on the Event Service machines resulted in ~1Gbps of network traffic towards Pub/Sub.”

• “Pub/Sub passed the test with flying colours. We published 2M messages [per second] without any service degradation and received almost no server errors from the Pub/Sub backend.”

• At current Google prices (2018) this would cost about $17 per hour.

• $0.04 per GB

• A monthly rate of about 330 TB.
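The cost and volume figures above can be roughly reproduced from the ~1 Gbps traffic figure. The sketch below assumes decimal gigabytes and an average month of 730 hours, neither of which is stated on the slide. Under those assumptions it gives about $18 per hour and about 330 TB per month, close to the slide's numbers; the small difference in the hourly cost presumably comes from rounding or from the unit convention used.

```python
def hourly_volume_gb(gigabits_per_second: float) -> float:
    """Data volume per hour in decimal gigabytes for a given line rate."""
    bytes_per_second = gigabits_per_second * 1e9 / 8
    return bytes_per_second * 3600 / 1e9


PRICE_PER_GB = 0.04       # USD, the 2018 figure quoted on the slide
HOURS_PER_MONTH = 730     # assumption: average month length

volume_gb = hourly_volume_gb(1.0)              # ~1 Gbps from the load test
cost_per_hour = volume_gb * PRICE_PER_GB
monthly_tb = volume_gb * HOURS_PER_MONTH / 1000

print(f"{volume_gb:.0f} GB/hour, ${cost_per_hour:.2f}/hour, {monthly_tb:.1f} TB/month")
```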


The second part of the test was to let Pub/Sub buffer messages for one hour before starting to consume them.

• 7.2 billion messages buffered

• Consumption ran at only a slightly higher rate than publishing.

• It took 8 hours to catch up on the backlog.

• No messages were observed to be lost!

• Pub/Sub managed the backlog and hid the ‘blackout’ of the back-end system.
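The backlog figures above follow from simple arithmetic. The sketch below assumes that publishing continued at 2 million messages per second during catch-up. The slide does not say so explicitly, so the implied consumption rate is an inference, not a reported measurement.

```python
PUBLISH_RATE = 2_000_000      # messages per second, from the load test
BUFFER_SECONDS = 3600         # one hour without any consumer
CATCH_UP_SECONDS = 8 * 3600   # time reported to clear the backlog

backlog = PUBLISH_RATE * BUFFER_SECONDS

# Assumption: publishing continued at the same rate during catch-up, so the
# backlog shrinks only by the surplus of consumption over publishing.
surplus_rate = backlog / CATCH_UP_SECONDS
consume_rate = PUBLISH_RATE + surplus_rate

print(f"backlog: {backlog:,} messages")
print(f"implied consumption: {consume_rate:,.0f} msg/s "
      f"({surplus_rate / PUBLISH_RATE:.1%} above publishing)")
```

Under that assumption, a consumer only 12.5% faster than the publisher is enough to clear a one-hour outage in eight hours.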


What was required from Spotify to set up, run and maintain Pub/Sub for a scalability test of this size?

Absolutely nothing.

Pub/Sub is a fully managed, global service that needs no configuration and is operated entirely by Google.

No capacity reservations, no system configuration and no administration are needed.

It just runs, like a telecommunication switch for phones.

“Based on these tests, we felt confident that Cloud Pub/Sub was the right choice for us. Latency was low and consistent, and the only capacity limitations we encountered was the one explicitly set by the available quota. In short, choosing Cloud Pub/Sub rather than Kafka 0.8 for our new event delivery platform was an obvious choice.”


Serverless – NoOps


BigQuery is Google's serverless, highly scalable, low cost enterprise data warehouse designed to make all your data analysts productive.

• Because there is no infrastructure to manage, you can focus on analyzing data to find meaningful insights using familiar SQL and you don't need a database administrator.

• BigQuery enables you to analyze all your data by creating a logical data warehouse over managed, columnar storage as well as data from object storage, and spreadsheets.

• BigQuery makes it easy to securely share insights within your organization and beyond as datasets, queries, spreadsheets and reports.

• BigQuery allows organizations to capture and analyze data in real-time using its powerful streaming ingestion capability so that your insights are always current.

• BigQuery is free for up to 1TB of data analyzed each month and 10GB of data stored.


Public data set used: New York Taxi Trips – 2016

Table ID: nyc-tlc:yellow.trips

Table Size: 130 GB

Long Term Storage Size: 130 GB

Number of Rows: 1,108,779,463

A data set of more than a billion rows

This dataset includes trip records from all trips completed in yellow taxis in NYC since 2009. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab Passenger Enhancement Program (TPEP).


Calculate the average trip price by passenger count

• Categorize all 1.1 billion rows by passenger count.

• Calculate average price, distance and time for each category.

• This uses values from every one of the 1.1 billion rows.

• 41 GB of data scanned.

• 6 seconds execution time.

• Theoretical query cost… invisible.

• (Data set cost: Google provides public data sets)
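To put the "invisible" query cost in perspective, the 41 GB scanned can be compared with the free allowance of 1 TB analyzed per month quoted on the BigQuery slide earlier in this deck. The sketch assumes a decimal terabyte (1000 GB); it uses only figures from the slides.

```python
FREE_TIER_GB = 1000   # 1 TB analyzed per month free (decimal TB assumed)
SCANNED_GB = 41       # data scanned by the demo query

share_of_free_tier = SCANNED_GB / FREE_TIER_GB
free_runs_per_month = FREE_TIER_GB // SCANNED_GB

print(f"one run uses {share_of_free_tier:.1%} of the monthly free tier")
print(f"about {free_runs_per_month} runs per month fit inside it")
```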

SELECT
  CASE
    WHEN passenger_count <= 0 THEN 'Unknown'
    WHEN passenger_count >= 7 THEN '>= 7'
    ELSE STRING(passenger_count)
  END AS PassengerCount,
  COUNT(passenger_count) AS TripCount,
  ROUND(AVG(total_amount),2) AS AverageTotalAmout,
  ROUND(AVG(trip_distance),2) AS AverageTripDistance,
  TIME(SEC_TO_TIMESTAMP(AVG((dropoff_datetime-pickup_datetime)/1000/1000))) AS AverageTripDuration
FROM
  [nyc-tlc:yellow.trips]
GROUP BY
  PassengerCount
ORDER BY
  PassengerCount

The demo can be run online using this SQL.
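For readers who want to check the grouping logic without a BigQuery account, the CASE expression and the average can be mirrored in a few lines of plain Python. This is a small sketch on hypothetical sample rows, not the taxi data set, and it covers only the trip count and average amount columns.

```python
from collections import defaultdict


def passenger_bucket(passenger_count: int) -> str:
    """Mirror the CASE expression: 'Unknown', '>= 7', or the count as text."""
    if passenger_count <= 0:
        return "Unknown"
    if passenger_count >= 7:
        return ">= 7"
    return str(passenger_count)


def average_by_bucket(trips):
    """Return {bucket: (trip_count, average_total_amount)} sorted by bucket."""
    totals = defaultdict(lambda: [0, 0.0])   # bucket -> [count, sum of amounts]
    for passenger_count, total_amount in trips:
        entry = totals[passenger_bucket(passenger_count)]
        entry[0] += 1
        entry[1] += total_amount
    return {
        bucket: (count, round(amount_sum / count, 2))
        for bucket, (count, amount_sum) in sorted(totals.items())
    }


# Hypothetical sample rows: (passenger_count, total_amount)
sample_trips = [(1, 10.0), (1, 14.0), (2, 20.0), (0, 8.0), (9, 30.0)]
result = average_by_bucket(sample_trips)
print(result)
```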


Demo Recording


Adding more calculations without increasing the data scanned incurs no extra cost

• Also find the maximum trip price.

• The same column is used for both the average and the max calculation.

• No extra costs.

• Less than 6 seconds execution time.

SELECT
  CASE
    WHEN passenger_count <= 0 THEN 'Unknown'
    WHEN passenger_count >= 7 THEN '>= 7'
    ELSE STRING(passenger_count)
  END AS PassengerCount,
  COUNT(passenger_count) AS TripCount,
  ROUND(AVG(total_amount),2) AS AverageTotalAmout,
  ROUND(MAX(total_amount),2) AS MaxTotalAmout,
  ROUND(AVG(trip_distance),2) AS AverageTripDistance,
  TIME(SEC_TO_TIMESTAMP(AVG((dropoff_datetime-pickup_datetime)/1000/1000))) AS AverageTripDuration
FROM
  [nyc-tlc:yellow.trips]
GROUP BY
  PassengerCount
ORDER BY
  PassengerCount


Adding more results columns means that more data is scanned

• Also find the average tip.

• A new column is introduced into the query.

• More data is scanned, because BigQuery is a columnar store.

• Still less than 6 seconds execution time, thanks to parallel processing.
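Why one extra aggregate is free while one extra column is not can be shown with a simplified cost model. In the sketch below the per-column sizes are hypothetical placeholders (the real table's column sizes differ), and the model assumes only that a columnar store reads each referenced column once, however many expressions use it.

```python
# Hypothetical on-disk sizes per column in GB; the real table's sizes differ.
COLUMN_SIZES_GB = {
    "passenger_count": 8.0,
    "total_amount": 8.0,
    "trip_distance": 8.0,
    "pickup_datetime": 8.0,
    "dropoff_datetime": 8.0,
    "tip_amount": 8.0,
}


def scanned_gb(referenced_columns):
    """Sum the sizes of the distinct columns a query references."""
    return sum(COLUMN_SIZES_GB[name] for name in set(referenced_columns))


base_query = ["passenger_count", "total_amount", "trip_distance",
              "pickup_datetime", "dropoff_datetime"]
with_max = base_query + ["total_amount"]   # MAX reuses an already-read column
with_tip = with_max + ["tip_amount"]       # AVG(tip_amount) adds a new column

print(scanned_gb(base_query), scanned_gb(with_max), scanned_gb(with_tip))
```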

SELECT
  CASE
    WHEN passenger_count <= 0 THEN 'Unknown'
    WHEN passenger_count >= 7 THEN '>= 7'
    ELSE STRING(passenger_count)
  END AS PassengerCount,
  COUNT(passenger_count) AS TripCount,
  ROUND(AVG(total_amount),2) AS AverageTotalAmout,
  ROUND(MAX(total_amount),2) AS MaxTotalAmout,
  ROUND(AVG(tip_amount),2) AS AverageTipAmout,
  ROUND(AVG(trip_distance),2) AS AverageTripDistance,
  TIME(SEC_TO_TIMESTAMP(AVG((dropoff_datetime-pickup_datetime)/1000/1000))) AS AverageTripDuration
FROM
  [nyc-tlc:yellow.trips]
GROUP BY
  PassengerCount
ORDER BY
  PassengerCount


Serverless – NoOps


Google Cloud Dataflow is a service for real-time data stream processing

• Cloud Dataflow is a fully-managed service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness -- no more complex workarounds or compromises needed.

• And with its serverless approach to resource provisioning and management, you have access to virtually limitless capacity to solve your biggest data processing challenges, while paying only for what you use.


A dynamic solution runs continuously and manages changes using small time windows

• Hours-long nightly runs are spread around the clock as small ‘mini-batches’ that take minutes or seconds.

• Windowing for data processing is continuous, and data volumes fluctuate up and down, which requires a dynamic software architecture.

The delay in reacting to new data and making intelligent decisions becomes minutes instead of days.
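The idea of small time windows can be sketched as a fixed (tumbling) window aggregation in plain Python. This is an illustration only: the function name and sample events are hypothetical, and a real streaming engine such as Dataflow also has to deal with late and out-of-order data, which this sketch ignores.

```python
from collections import defaultdict


def tumbling_window_sums(events, window_seconds):
    """Group (timestamp_seconds, value) events into fixed windows and sum each."""
    sums = defaultdict(float)
    for timestamp, value in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        sums[window_start] += value
    return dict(sorted(sums.items()))


# Hypothetical events: (seconds since start, measured value)
events = [(3, 1.0), (42, 2.0), (61, 4.0), (119, 1.5), (120, 3.0)]
per_minute = tumbling_window_sums(events, window_seconds=60)
print(per_minute)
```

Each key is the start of a one-minute window, so a result is available per minute instead of once per night.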


Dataflow combines batch and stream operations when needed

• The same logic, and therefore the same code, is used to process both batch and stream data.

• No double maintenance for two different types of operation.
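A minimal way to picture "same code for batch and stream" is a transform written against an iterable, so that it does not care whether the input is a stored list or a live feed. The sketch below is plain Python rather than the Dataflow SDK, and the field names and `live_feed` source are hypothetical.

```python
from typing import Iterable, Iterator


def enrich(readings: Iterable[dict]) -> Iterator[dict]:
    """One transform for both modes: drop invalid readings, add speed in m/s."""
    for reading in readings:
        if reading["speed_kmh"] < 0:
            continue
        yield {**reading, "speed_ms": round(reading["speed_kmh"] / 3.6, 2)}


def live_feed() -> Iterator[dict]:
    """Stand-in for an unbounded source; a real stream would never end."""
    yield {"station": "A", "speed_kmh": 72.0}
    yield {"station": "B", "speed_kmh": -1.0}
    yield {"station": "A", "speed_kmh": 90.0}


# Batch: a bounded, already stored list.
batch_result = list(enrich([{"station": "A", "speed_kmh": 36.0}]))

# Stream: items are processed one by one as they arrive.
stream_result = list(enrich(live_feed()))
print(batch_result, stream_result)
```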


Example Dataflow implementation for real-time metrics and log data processing

• Statistics calculation for 300+ metrics arriving at 20 messages per second:

• Sums, averages, 10%, 90%, medians

• Baseline compute, top contributor compute

• Minute, hourly, daily values

• Window statistics and cumulative statistics

• Bottom-up statistics from instance, environment, application, customer, company.

• Log, exception and error file processing.

• Real-time processing of a constant data stream costs about $300 per month.

• Great performance. Cloud Dataflow is 2-3x faster and cheaper than Hadoop when evaluating classic MapReduce based pipelines, such as PageRank and WordCount.

• And with dynamic work rebalancing, Cloud Dataflow effectively optimizes resource utilization which provides additional performance gains without requiring manual intervention.
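The per-window statistics listed in this example (sums, averages, medians and the 10% and 90% points) can be sketched with Python's standard `statistics` module. This assumes "10%, 90%" means the 10th and 90th percentiles; the sample window is hypothetical and the code is not the implementation described on the slide.

```python
import statistics


def summarize(values):
    """Sum, mean, median and 10th/90th percentiles for one window of values."""
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return {
        "sum": sum(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "p10": deciles[0],
        "p90": deciles[-1],
    }


# Hypothetical window of metric values: 1.0 .. 11.0
window = [float(v) for v in range(1, 12)]
summary = summarize(window)
print(summary)
```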


Customers are saying…


Hands-on Workshop


[Architecture diagram: Real Time Stream Processing - Internet of Things. Stages: Ingest, Pipelines, Storage, Analytics, Application & Presentation. Devices: Standard Devices (HTTPS); Constrained Devices (Non-TCP, e.g. BLE) via Gateway. Components: Entry Point, IoT Core, Cloud Pub/Sub, Cloud Functions, Cloud Dataflow, Cloud Storage, Cloud Datastore, Cloud Bigtable, BigQuery, Cloud Dataproc, Cloud Datalab, App Engine, Container Engine, Compute Engine, Data Studio, Monitoring, Logging.]

[Workshop architecture diagram: Real Time Stream Processing - Internet of Things. Stages: Ingest, Pipelines, Analytics. Components: Standard Devices (HTTPS), Cloud Pub/Sub, Cloud Functions, BigQuery, Data Studio. Workshop steps are marked 1 to 4.]

Demo Data

• Road traffic data

• https://www.liikennevirasto.fi/avoindata/tietoaineistot/lam-tiedot

• http://digitraffic.liikennevirasto.fi/

• LAM stations (automatic traffic measurement stations, induction loops) at Pori, Nakkila, Karkkila and Vihti: vehicle speeds by vehicle type, 1 April 2018.

• http://digitraffic.liikennevirasto.fi/tieliikenne/#ajantasaiset-lam-mittaustiedot


GitHub Link

• Guidance to workshop details and source code:

• https://github.com/koivusolutions/koivu-workshops

• Open this in another browser window.


Cloud Console - https://console.cloud.google.com/


BigQuery – Create New Dataset


Create Table


Import Data to Table (Create Table)


Preview Table


Query Table


Data Studio - https://datastudio.google.com


Add Data Source


Create Data Connector


Creating Data Connector


Creating Data Connector


Add Table


Add Chart


Add Filter


Extra Report Challenge


Pub/Sub Real-Time Data into Pipeline - https://console.cloud.google.com/cloudpubsub


Create Topic


Manual operations to Pub/Sub


Cloud Functions - Workshop


Cloud Functions - Create
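As a rough picture of what a Pub/Sub-triggered function has to do, the sketch below decodes the base64-encoded `data` field of an incoming event into a row. It is not the workshop's function (that source code is in the GitHub repository linked earlier and may be written in another language). The function names, the JSON payload and its fields are hypothetical, and the insert into BigQuery is deliberately left out.

```python
import base64
import json


def event_to_row(event: dict) -> dict:
    """Decode the base64 `data` field of a Pub/Sub event into a row dict."""
    payload = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(payload)


def handle_message(event: dict, context=None) -> dict:
    """Hypothetical entry point; a real function would insert the row into BigQuery."""
    row = event_to_row(event)
    print(f"would insert row: {row}")
    return row


# Simulate what the platform passes in when a message is published.
message = {"station": "Pori", "speed_kmh": 78}
fake_event = {
    "data": base64.b64encode(json.dumps(message).encode("utf-8")).decode("ascii")
}
decoded = handle_message(fake_event)
```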


Publish Message from UI


Did Cloud Function Activate?


Do you see data?


What Next?

• Dataprep

• Dataflow

• Datalab

• Machine Learning

• Bigtable

• IoT Core

• …
