Distributed Database Architecture for GDPR
Karthik Ranganathan
PostgresConf Silicon Valley, Oct 15, 2018
© 2018 All rights reserved.




About Us

Kannan Muthukkaruppan, CEO: Nutanix ♦ Facebook ♦ Oracle

IIT-Madras, University of California-Berkeley

Karthik Ranganathan, CTO: Nutanix ♦ Facebook ♦ Microsoft

IIT-Madras, University of Texas-Austin

Mikhail Bautin, Software Architect: ClearStory Data ♦ Facebook ♦ D.E.Shaw

Nizhny Novgorod State University, Stony Brook

• Founded Feb 2016

• Apache HBase committers and early engineers on Apache Cassandra

• Built Facebook’s NoSQL platform powered by Apache HBase

• Scaled the platform to serve many mission-critical use cases:
  • Facebook Messages (Messenger)
  • Operational Data Store (time series data)

• Reassembled the same Facebook team at YugaByte along with engineers from Oracle, Google, Nutanix and LinkedIn

Founders


WHAT IS YUGABYTE DB?


A transactional, planet-scale database for building high-performance cloud services.


NoSQL + SQL Cloud Native


Design Principles

TRANSACTIONAL
• Single Shard & Distributed ACID Txns
• Document-Based, Strongly Consistent Storage

HIGH PERFORMANCE
• Low Latency, Tunable Reads
• High Throughput

OPEN SOURCE
• Apache 2.0
• Popular APIs Extended: Apache Cassandra, Redis and PostgreSQL (BETA)

PLANET-SCALE
• Auto Sharding & Rebalancing
• Global Data Distribution

CLOUD NATIVE
• Built For The Container Era
• Self-Healing, Fault-Tolerant


WHAT IS GDPR?


GDPR: General Data Protection Regulation


People in the EU can control how businesses share and protect their personal data.


Personal data, also called PII (Personally Identifiable Information), includes:

• User name

• Email address

• Date of birth

• Bank details

• Location details

• Computer IP address


Control over personal data

Database concerns

• Consent & data location

• Data privacy and safety

• Right to be forgotten

• Data access on demand

Application concerns

• Notify on data breach

• Data portability

• Ability to fix errors in data

• Restrict processing


#1 USER CONSENT AND DATA LOCATION


Data must be stored in the EU by default. Businesses need explicit user consent to move it outside the EU.


Why is this hard?

• EU user data must live in the EU region

• Other countries have their own compliance regulations, which means more geographies

• Public clouds may not cover every required region, which leads to hybrid deployments

• The right architecture depends on the data, so a single service may need several

Think global deployments first!


Example: online e-commerce site

• The products table needs global replication; it is not PII data


[Figure: Global Replication with YugaByte DB (non-PII data; global replication and read replicas)]


Example: online e-commerce site

• Users, orders and shipments need locality; they are PII data

• The product locations table needs scale; it may contain PII
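The table-level decision in this example can be sketched as a simple rule: PII tables are geo-partitioned by the owner's region, and non-PII tables are replicated globally. This is a minimal sketch, assuming a hypothetical `TableSpec` with a `contains_pii` flag; the table names come from the example, and treating the "may be PII" product locations table as PII is a conservative assumption rather than something the slide prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    GLOBAL = "replicate to every region"                    # non-PII, e.g. the product catalog
    GEO_PARTITIONED = "pin rows to the user's home region"  # PII


@dataclass(frozen=True)
class TableSpec:
    name: str
    contains_pii: bool


def placement_for(table: TableSpec) -> Placement:
    """PII tables stay in the owner's region; non-PII tables can be replicated globally."""
    return Placement.GEO_PARTITIONED if table.contains_pii else Placement.GLOBAL


# Hypothetical schema for the e-commerce example.
TABLES = [
    TableSpec("products", contains_pii=False),
    TableSpec("users", contains_pii=True),
    TableSpec("orders", contains_pii=True),
    TableSpec("shipments", contains_pii=True),
    # "May be PII": treated as PII here until proven otherwise (an assumption).
    TableSpec("product_locations", contains_pii=True),
]

for t in TABLES:
    print(f"{t.name:18} -> {placement_for(t).name}")
```

Defaulting unknown tables to the stricter placement means a classification mistake costs some latency rather than a compliance violation.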


[Figure: Geo-Partitioning with YugaByte DB (PII data: primary data in the EU, non-EU data in other regions)]
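Geo-partitioning is done inside the database, but the routing rule it applies can be sketched in a few lines: each PII row is stored only in the partition for its owner's home region. This is an in-memory illustration only, assuming a hypothetical `country` column on each row and hypothetical region names (`eu`, `us`, `other`); it does not show YugaByte DB's own partitioning syntax.

```python
from collections import defaultdict

# Hypothetical, abbreviated country lists; a real system would use a complete mapping.
EU_COUNTRIES = {"DE", "FR", "IE", "NL", "ES", "IT"}
NA_COUNTRIES = {"US", "CA"}


def home_region(country_code: str) -> str:
    """Map a user's country to the (hypothetical) partition that owns their rows."""
    code = country_code.upper()
    if code in EU_COUNTRIES:
        return "eu"
    if code in NA_COUNTRIES:
        return "us"
    return "other"


class GeoPartitionedTable:
    """In-memory stand-in for a table whose rows are pinned to a home region."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.partitions = defaultdict(list)  # region -> rows stored only there

    def insert(self, row: dict) -> str:
        region = home_region(row["country"])
        self.partitions[region].append(row)
        return region


orders = GeoPartitionedTable("orders")
orders.insert({"order_id": 1, "user_id": 42, "country": "DE"})  # lands in "eu"
orders.insert({"order_id": 2, "user_id": 7, "country": "US"})   # lands in "us"
```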


Replicate data to other geographies on demand

• The user may consent to replicating their data

• Read replicas on demand (for remote, low-latency reads)

• Change data capture (for analytics)
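Combining the EU-by-default rule with on-demand replication gives a consent check that runs before any copy leaves the primary region. A minimal sketch, assuming a hypothetical `UserConsent` record with per-region and analytics flags; the region names and the consent model are illustrative, not part of YugaByte DB.

```python
from dataclasses import dataclass, field
from typing import List, Set

PRIMARY_REGION = "eu"  # hypothetical name for the EU primary cluster


@dataclass
class UserConsent:
    user_id: int
    # Non-EU regions the user explicitly agreed to (hypothetical consent model).
    allowed_regions: Set[str] = field(default_factory=set)
    allow_analytics: bool = False


def replica_targets(consent: UserConsent, requested: List[str]) -> List[str]:
    """Keep only the regions this user's data may be copied to: the EU by default,
    anything else only with explicit consent."""
    return [r for r in requested if r == PRIMARY_REGION or r in consent.allowed_regions]


def may_emit_cdc(consent: UserConsent) -> bool:
    """Stream this user's changes to the analytics pipeline only with consent."""
    return consent.allow_analytics


alice = UserConsent(user_id=1, allowed_regions={"us"})
print(replica_targets(alice, ["eu", "us", "apac"]))  # ['eu', 'us']
```

Keeping the check as a pure function over the consent record makes it easy to re-run when a user withdraws consent and replicas must be pruned.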


[Figure: Read Replicas with YugaByte DB (PII data: primary data in the EU, with read replicas in other regions)]


#2 DATA PRIVACY AND SAFETY


Data must be secured using best practices by default, and users must be notified of a breach.


Implement end-to-end encryption from day one


• Use TLS Encryption

• Between client and server for app interaction

• Between database servers for replication

Encrypt All Network Communication
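The slides do not name a client library, so this sketch uses only Python's standard `ssl` module to show the two settings that matter: the client verifies the server's certificate and hostname, and database nodes require certificates from each other (mutual TLS) for replication traffic. The certificate and key file paths are placeholders you would supply; TLS 1.2 as the floor is an assumption, not a requirement stated in the talk.

```python
import ssl
from typing import Optional


def client_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client -> database: verify the server certificate and hostname, require TLS 1.2+."""
    ctx = ssl.create_default_context(cafile=ca_file)  # CERT_REQUIRED + check_hostname
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def node_tls_context(cert_file: str, key_file: str, ca_file: str) -> ssl.SSLContext:
    """Database node accepting replication traffic: mutual TLS, so peers must
    present a certificate signed by the cluster CA as well."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(cert_file, key_file)
    return ctx
```

A client would wrap its connection with `client_tls_context().wrap_socket(sock, server_hostname=host)`; a node that initiates replication would likewise load its own certificate into a client-side context so both ends authenticate.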


[Figure: TLS Encryption between the user and the database cluster, and for server-to-server communication inside the cluster]


• Encryption at rest

• Integrate with external Key Management Systems

• Ability to rotate keys on demand

Encrypt All Storage

For fine-grained control, keep a key-value table that maps each id to a cipher key, and encrypt PII data with that cipher key. More on this in the next section.
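Encryption at rest with an external KMS and on-demand key rotation are commonly built with envelope encryption: data is encrypted with a data key, and only the data key is encrypted ("wrapped") by a master key that stays in the KMS. This is a general technique, not necessarily how YugaByte DB implements it. The sketch assumes a hypothetical in-process `FakeKMS` in place of a real KMS and uses AES-GCM from the third-party `cryptography` package.

```python
import os
from typing import Dict, Tuple

from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

NONCE_LEN = 12  # standard AES-GCM nonce size


def _seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)


def _open(key: bytes, blob: bytes) -> bytes:
    return AESGCM(key).decrypt(blob[:NONCE_LEN], blob[NONCE_LEN:], None)


class FakeKMS:
    """Hypothetical stand-in for an external KMS. Master keys never leave it;
    callers can only wrap and unwrap data keys."""

    def __init__(self) -> None:
        self._master: Dict[int, bytes] = {}
        self.current = 0
        self.rotate()

    def rotate(self) -> int:
        self.current += 1
        self._master[self.current] = AESGCM.generate_key(bit_length=256)
        return self.current

    def wrap(self, data_key: bytes) -> Tuple[int, bytes]:
        return self.current, _seal(self._master[self.current], data_key)

    def unwrap(self, version: int, wrapped: bytes) -> bytes:
        return _open(self._master[version], wrapped)


def encrypt_block(kms: FakeKMS, plaintext: bytes) -> dict:
    data_key = AESGCM.generate_key(bit_length=256)  # fresh key per block
    version, wrapped = kms.wrap(data_key)
    return {"version": version, "wrapped_key": wrapped, "data": _seal(data_key, plaintext)}


def decrypt_block(kms: FakeKMS, block: dict) -> bytes:
    return _open(kms.unwrap(block["version"], block["wrapped_key"]), block["data"])


def rewrap(kms: FakeKMS, block: dict) -> dict:
    """Key rotation: re-encrypt only the small data key under the newest master
    key; the encrypted data itself is not rewritten."""
    data_key = kms.unwrap(block["version"], block["wrapped_key"])
    block["version"], block["wrapped_key"] = kms.wrap(data_key)
    return block
```

Because rotation only rewraps data keys, rotating on demand stays cheap even when the encrypted data is large.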


[Figure: Encryption at Rest. The database cluster encrypts data on disk, using keys from a Key Management Service.]


#3 RIGHT TO BE FORGOTTEN


Data must be erased on explicit request, or when it is no longer relevant to its original purpose.


• Keep a key-value table that maps each id to a cipher key

• Encrypt PII data with the cipher key on write

• Decrypt PII data on access

• Delete the cipher key to forget the PII data

Use Encryption of Data Attributes
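The four steps above (often called crypto-shredding) can be sketched end to end. Assumptions: a hypothetical in-memory `UserKeyTable` stands in for the id-to-key table, per-user keys are AES-256-GCM keys from the third-party `cryptography` package, and the user id is bound as associated data so a ciphertext cannot be silently moved to another user's row. Deleting the key only makes the data unrecoverable if no other copy of that key survives, so backups of the key table need their own retention limit.

```python
import os
from typing import Dict, Optional

from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography


class UserKeyTable:
    """Hypothetical key-value table: user id -> per-user cipher key.
    In production this would live in its own protected table or a KMS."""

    def __init__(self) -> None:
        self._keys: Dict[int, bytes] = {}

    def get_or_create(self, user_id: int) -> bytes:
        if user_id not in self._keys:
            self._keys[user_id] = AESGCM.generate_key(bit_length=256)
        return self._keys[user_id]

    def get(self, user_id: int) -> Optional[bytes]:
        return self._keys.get(user_id)

    def forget(self, user_id: int) -> None:
        self._keys.pop(user_id, None)


def encrypt_pii(keys: UserKeyTable, user_id: int, value: str) -> bytes:
    """On write: encrypt with the user's key; the user id is bound as associated data."""
    nonce = os.urandom(12)
    aad = str(user_id).encode()
    return nonce + AESGCM(keys.get_or_create(user_id)).encrypt(nonce, value.encode(), aad)


def decrypt_pii(keys: UserKeyTable, user_id: int, blob: bytes) -> Optional[str]:
    """On read: decrypt with the user's key, or return None once the key is gone."""
    key = keys.get(user_id)
    if key is None:
        return None  # key deleted: the stored ciphertext can no longer be read
    aad = str(user_id).encode()
    return AESGCM(key).decrypt(blob[:12], blob[12:], aad).decode()


keys = UserKeyTable()
profiles = {42: {"email": encrypt_pii(keys, 42, "alice@example.com")}}
print(decrypt_pii(keys, 42, profiles[42]["email"]))  # alice@example.com
keys.forget(42)                                       # "right to be forgotten"
print(decrypt_pii(keys, 42, profiles[42]["email"]))  # None
```

Forgetting a user becomes a single key delete instead of a hunt for every copy of their rows across replicas and backups.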


Example: Storing User Profile Data

SET email=<email address> FOR USER ID=XXX

• Get the encryption key for the user

• Encrypt the PII data

• Store the encrypted data: SET email=ENCRYPTED FOR USER ID=XXX

Reads require decryption, and the data is not accessible without the key.


• In many cases the actual value is not needed

• Anonymize PII data with one-way hash functions

• Use hashed ids in the data warehouse

• There is no directly identifying PII data if hashed ids are used!

Use Anonymization of Data Attributes


Example: Website Analytics

USER=<email address> CHECKED OUT PRODUCT=X, CATEGORY=Gadget

One-way hash the user id, then send the event to analytics:

USER=HASHED_VAL CHECKED OUT PRODUCT=X, CATEGORY=Gadget


Example: Website Analytics

• The user is no longer identifiable

• The hashed data is still useful!
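The USER=HASHED_VAL example can be sketched with the standard library. One deliberate substitution: a bare SHA-256 of an email can be reversed by hashing candidate emails, so this sketch uses a keyed hash (HMAC-SHA256) with a hypothetical secret `PEPPER` kept outside the warehouse. Anyone holding that secret can re-link the ids, so it should be protected like any other key. The event fields mirror the slide; the sample addresses are made up.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret, stored outside the data warehouse (e.g. in a KMS).
PEPPER = b"replace-with-a-secret-from-your-kms"


def pseudonymize(user_email: str) -> str:
    """One-way, keyed hash of a normalized email address."""
    normalized = user_email.strip().lower()
    return hmac.new(PEPPER, normalized.encode(), hashlib.sha256).hexdigest()


def to_analytics_event(event: dict) -> dict:
    """Replace the identifying user field before the event leaves the primary store."""
    out = dict(event)
    out["user"] = pseudonymize(out.pop("user"))
    return out


events = [
    {"user": "alice@example.com", "action": "CHECKED OUT", "product": "X", "category": "Gadget"},
    {"user": "Alice@Example.com", "action": "CHECKED OUT", "product": "Y", "category": "Gadget"},
    {"user": "bob@example.com", "action": "CHECKED OUT", "product": "X", "category": "Gadget"},
]
anonymized = [to_analytics_event(e) for e in events]

# Still useful: count distinct (hashed) buyers per category.
distinct_buyers = Counter(cat for cat, _ in {(e["category"], e["user"]) for e in anonymized})
print(distinct_buyers)  # Counter({'Gadget': 2})
```

Normalizing before hashing keeps the same person on one id, which is what makes joins and distinct counts in the warehouse still work.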


#4 DATA ACCESS ON DEMAND


Ability to inform a user about what data is being used, for what purpose, and where it is stored.


• Store the tags in a separate information architecture table

• Make tagging part of the process for adding tables and columns

• This makes it easy to find, on demand, which PII data is stored

Tag Tables and Columns with PII
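One way to make the tags queryable is to keep them as rows in an ordinary table, so answering a data access request becomes a lookup. The sketch models that information architecture table in memory; the `ColumnTag` fields `purpose` and `region` are assumptions chosen to answer the three questions above (what data, for what purpose, where it is stored), and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ColumnTag:
    table: str
    column: str
    is_pii: bool
    purpose: str  # why the data is collected
    region: str   # where the primary copy lives


# Hypothetical information architecture table, filled in whenever a column is added.
INFO_ARCH = [
    ColumnTag("users", "email", True, "account login and receipts", "eu"),
    ColumnTag("users", "date_of_birth", True, "age verification", "eu"),
    ColumnTag("orders", "shipping_address", True, "order delivery", "eu"),
    ColumnTag("orders", "product_id", False, "order fulfilment", "eu"),
    ColumnTag("products", "name", False, "catalog", "global"),
]


def pii_report() -> List[Tuple[str, str, str]]:
    """What PII is stored, why, and where: answered from the tags alone."""
    return [(f"{t.table}.{t.column}", t.purpose, t.region) for t in INFO_ARCH if t.is_pii]


for field_name, purpose, region in pii_report():
    print(f"{field_name:26} {purpose:28} {region}")
```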


• Ensure PII columns are encrypted

• Ensure non-PII columns do not contain sensitive data

• Use Spark or Presto to run the scan periodically

• Run the scan on a read replica so it does not impact production

Run Continuous Compliance Checks
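The slides suggest running these checks with Spark or Presto against a read replica; the per-row rules themselves can be sketched in plain Python before porting them to a distributed job. Assumptions: encrypted values are stored as raw bytes, the set of PII columns comes from the tags, and the email and IPv4 regexes are deliberately simple, so they will miss or over-match some cases.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Deliberately simple patterns (assumptions); real scanners use richer detectors.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def looks_encrypted(value: object) -> bool:
    """Assumption: encrypted PII is stored as raw bytes, never as plain text."""
    return isinstance(value, (bytes, bytearray))


def check_rows(table: str, rows: Iterable[Dict[str, object]],
               pii_columns: Set[str]) -> List[Tuple[str, int, str, str]]:
    """Flag plaintext values in PII columns and PII-looking values elsewhere."""
    violations = []
    for i, row in enumerate(rows):
        for col, value in row.items():
            if col in pii_columns:
                if not looks_encrypted(value):
                    violations.append((table, i, col, "PII column not encrypted"))
            elif isinstance(value, str) and (EMAIL_RE.search(value) or IPV4_RE.search(value)):
                violations.append((table, i, col, "possible PII in non-PII column"))
    return violations


sample = [
    {"email": b"\x8f\x01\x9c", "note": "gift wrap please"},
    {"email": "bob@example.com", "note": "call me at 10.0.0.1"},
]
for v in check_rows("users", sample, {"email"}):
    print(v)
```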


[Figure: Tag PII columns, then ensure PII columns are encrypted and ensure no PII data in other columns]


PUTTING IT ALL TOGETHER


GDPR Reference Architecture

• Primary cluster (in the EU): app clients perform reads and writes over encrypted connections

• Read replica clusters (anywhere in the world): analytics clients have read-only, encrypted access

• Encrypted async replication from the primary cluster to the read replica clusters

• At-rest encryption for all nodes in every cluster

• PII columns encrypted with a cipher key

• Tag PII columns, ensure PII columns are encrypted, and ensure no PII data in other columns


Questions?

Try it at docs.yugabyte.com/quick-start