Data Pipeline & Data Lake (in Google Cloud)

Megazone Google Cloud Team

Jung hoo Park

Agenda

● What is a Data Pipeline?
● Data Engineering in Google Cloud

01 What is a Data Pipeline?

Conceptual Data Pipeline

Application → ETL / ELT → Data Analyst / Data Scientist

● Application: PC Game, Mobile Game, Media Contents Delivery Service, etc.
● ETL / ELT: Ingest, Processing, Governance (Streaming and Batch)
● Data Analyst: Data Wrangling, Visualization, Report & Dashboard
● Data Scientist: Model Training, Model Serving

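Provisioning a typical data processing environment

Server purchase, server setup, OS install, OS configuration, OS optimization, OS debugging, provisioning, reconfiguration, scaling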

02 Google Cloud Service for Data Pipeline

[Diagram: Google Cloud services across the pipeline stages Ingest → Process → Store → Analyze → Visualize. Services shown: Stackdriver Logging, Cloud Pub/Sub, Cloud Storage, Cloud IoT Core, Cloud Datastore, Cloud Bigtable, Cloud Dataproc, Cloud Dataflow, BigQuery, BigQuery Streaming API, Cloud Datalab, Cloud Spanner, Cloud SQL, Cloud Dataprep, Cloud Composer, Data Studio, Transfer Appliance, Transfer Service, 3rd Party.]

Ingest

Cloud Pub/Sub

● Global by default
● No provisioning, auto-everything
● Exactly-once processing
● Seek and replay
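
"Seek and replay" means a subscription can rewind its acknowledgement state so that messages it has already acknowledged are delivered again. The sketch below is a toy in-memory model of that idea, not the Cloud Pub/Sub client API: the class `ToySubscription` and its methods are hypothetical names, and a list index stands in for the position the real service tracks (where a seek targets a timestamp or a snapshot).

```python
from dataclasses import dataclass, field


@dataclass
class ToySubscription:
    """Toy in-memory model of a subscription with seek and replay.

    Hypothetical illustration only; this is not the Cloud Pub/Sub client API.
    """

    log: list = field(default_factory=list)  # retained messages, oldest first
    cursor: int = 0  # index of the next unacknowledged message

    def publish(self, message: str) -> None:
        # Append to the retained log; acknowledging never deletes a message.
        self.log.append(message)

    def pull(self) -> list:
        # Return every message at or after the cursor, then acknowledge them.
        batch = self.log[self.cursor:]
        self.cursor = len(self.log)
        return batch

    def seek(self, position: int) -> None:
        # Move the cursor (clamped to the log) so later pulls replay from there.
        self.cursor = max(0, min(position, len(self.log)))


sub = ToySubscription()
for event in ["login", "purchase", "logout"]:
    sub.publish(event)

first_delivery = sub.pull()  # all three events, now acknowledged
nothing_new = sub.pull()     # empty: everything has been acknowledged
sub.seek(1)                  # rewind to just before "purchase"
replayed = sub.pull()        # "purchase" and "logout" are delivered again
```

Replay of this kind is mainly useful for reprocessing events after a bug fix in downstream code.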

Storage Transfer Service

● Centralized job management
● High-performance copies
● Transfer data from cloud to cloud
● Data security
● Transfer data from bucket to bucket

Processing

Cloud Dataflow

● Dataflow SQL
● Dataflow Template
● Streaming Engine
● Inline Monitoring
● Dataflow Shuffle
● Auto-Scaling

Cloud Composer

● Integration
● Python Language
● Multi-Cloud
● Fully-Managed
● Hybrid
● Open Source
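
Cloud Composer is built on the open source Apache Airflow project, where a workflow is written in Python as a directed acyclic graph (DAG) of tasks. The sketch below shows only the core idea of dependency ordering, using the standard library rather than the Airflow API. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "transfer_files": set(),
    "run_dataflow_job": {"transfer_files"},
    "load_to_bigquery": {"run_dataflow_job"},
    "refresh_dashboard": {"load_to_bigquery"},
    "send_report": {"load_to_bigquery"},
}


def run_order(deps):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

A scheduler does more than this (retries, schedules, parallel execution of independent tasks), but every run still has to respect an ordering like the one computed here.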

Store & Analyze

Cloud Storage

● Pub/Sub Notifications for Cloud Storage
● Customer-managed encryption keys
● Object Versioning
● Cloud Audit Logs with Cloud Storage
● Retention Policies
● Object Lifecycle Management
● Storage classes
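
Object lifecycle management and storage classes work together: a rule can move older objects to a cheaper class and later delete them. The sketch below builds such a configuration in the JSON lifecycle format that Cloud Storage accepts. The helper name `lifecycle_config` and the 30-day and 365-day thresholds are assumptions chosen for illustration.

```python
import json


def lifecycle_config(nearline_after_days, delete_after_days):
    """Build a lifecycle configuration: demote objects first, delete them later."""
    return {
        "rule": [
            {
                # Move Standard objects to Nearline once they reach the given age.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {
                    "age": nearline_after_days,
                    "matchesStorageClass": ["STANDARD"],
                },
            },
            {
                # Delete objects once they reach the second, larger age.
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
        ]
    }


config = lifecycle_config(nearline_after_days=30, delete_after_days=365)
print(json.dumps(config, indent=2))
```

The printed JSON can be saved to a file and applied to a bucket with the usual Cloud Storage tooling.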

BigQuery

● Foundation for AI & BI
● Big data ecosystem integration
● Petabyte Scale
● Geo-expansion
● Data Transfer Service
● Serverless

03 Demo

Architecture: Demo Scenario

Real-Time Events → Cloud Pub/Sub → Cloud Dataflow → BigQuery
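
One way to express the demo architecture is as an Apache Beam streaming pipeline in Python, which Cloud Dataflow can run. This is a minimal sketch, not the code used in the demo. The topic, table, schema, and event fields below are hypothetical, and the message body is assumed to be JSON.

```python
import json

# Hypothetical resource names; replace with real ones before running.
TOPIC = "projects/my-project/topics/game-events"
TABLE = "my-project:demo.events"
SCHEMA = "user_id:STRING,event_type:STRING,event_time:TIMESTAMP"


def parse_event(payload: bytes) -> dict:
    """Decode one Pub/Sub message body (assumed JSON) into a BigQuery row."""
    event = json.loads(payload.decode("utf-8"))
    return {
        "user_id": str(event["user_id"]),
        "event_type": event.get("event_type", "unknown"),
        "event_time": event["event_time"],
    }


def run(argv=None):
    """Build and run the streaming pipeline (requires apache-beam[gcp])."""
    # Imported here so parse_event can be exercised without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(argv, streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadEvents" >> beam.io.ReadFromPubSub(topic=TOPIC)
            | "ParseJson" >> beam.Map(parse_event)
            | "WriteRows" >> beam.io.WriteToBigQuery(
                TABLE,
                schema=SCHEMA,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


# Local check of the parsing step only; call run() to launch the pipeline.
sample = b'{"user_id": 42, "event_type": "login", "event_time": "2020-01-01T00:00:00Z"}'
row = parse_event(sample)
```

Keeping the parsing step as a plain function means it can be tested locally before the pipeline is submitted to Dataflow.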

Thank you