Confidential Computing

Analytics with data privacy and control

Brian Thorne


Privacy Preserving Linkage

Motivation

Multi-Organisation Analytics Today

[Diagram: Org A and Org B hand their data to a Mutually Trusted party, which performs the Computation and returns the Result.]

But many Opportunities are Blocked

[Diagram: two organisations each hold Confidential data; the party that would perform the Computation and produce the Result is NOT Trusted.]


Entity Matching

Overview of a typical data integration project within GOV

Area I’m covering today


Privacy-preserving entity resolution

● Goal: match corresponding rows in two distinct databases

● Constraint: can’t share Personally Identifiable Information (PII)

● Solution: fuzzy & private matching


How?

For every record we process the PII into a Cryptographic Longterm Key (CLK).

Briefly, we hash the bigrams of each PII feature into a single Bloom filter.

https://github.com/n1analytics/clkhash/
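The idea can be sketched in a few lines of Python. This is a minimal illustration and not the clkhash encoding itself: the padding rule, the filter size of 1024 bits, the four hash positions per bigram and the use of HMAC-SHA256 are all assumptions made for the example, and the function names are hypothetical.

```python
import hashlib
import hmac


def bigrams(value: str) -> list[str]:
    """Split a string into overlapping two-character tokens, padded with spaces."""
    padded = f" {value.strip().lower()} "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def clk(features: list[str], secret: bytes, size: int = 1024, k: int = 4) -> int:
    """Fold the bigrams of every PII feature into one Bloom filter.

    The filter is returned as a Python int used as a bitset.
    `size` and `k` are illustrative defaults, not values from the talk.
    """
    bits = 0
    for feature in features:
        for token in bigrams(feature):
            for i in range(k):
                # Keyed hash: the bit positions depend on the shared secret,
                # so they cannot be recomputed for a guessed bigram without it.
                digest = hmac.new(secret, f"{i}:{token}".encode(), hashlib.sha256).digest()
                bits |= 1 << (int.from_bytes(digest[:8], "big") % size)
    return bits


record = ["john", "smith", "1980-01-31"]
key = clk(record, secret=b"shared secret")
```

Two parties that use the same secret and the same feature list produce identical filters for identical records, and similar filters for records that differ by a typo, which is what makes fuzzy matching on the hashes possible.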


Cryptographic Longterm Key


Private Record Linkage

[Diagram: Company A and Company B each run a Hasher over their Personally Identifiable Information, keyed with a Shared Secret Salt, to produce Anonymous Hashes. A Semi-Trusted Party runs the Fuzzy Matcher over both sets of hashes and produces the Linkage Table.]

PII cannot be recovered from the hashes
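A sketch of what the fuzzy matcher might do with the anonymous hashes follows. The slides do not name the similarity measure; the Sørensen-Dice coefficient used here is a common choice for Bloom-filter linkage and is an assumption, as are the 0.8 threshold, the greedy one-to-one assignment and the function names.

```python
def popcount(bits: int) -> int:
    """Number of set bits in an int used as a bitset."""
    return bin(bits).count("1")


def dice(a: int, b: int) -> float:
    """Sorensen-Dice coefficient of two bitsets: 2|A & B| / (|A| + |B|)."""
    total = popcount(a) + popcount(b)
    return 2 * popcount(a & b) / total if total else 0.0


def link(hashes_a, hashes_b, threshold=0.8):
    """Greedy one-to-one matching: best-scoring pairs first, each row used once."""
    scored = sorted(
        ((dice(a, b), i, j)
         for i, a in enumerate(hashes_a)
         for j, b in enumerate(hashes_b)),
        reverse=True,
    )
    used_a, used_b, table = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # everything after this point scores lower still
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            table.append((i, j, score))
    return table


# Tiny bitsets stand in for real hashes.
linkage_table = link([0b1111, 0b110000], [0b110000, 0b0111])
```

The linkage table holds row indices and scores only, so the matcher never needs to see the underlying PII.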


anonlink


Semi-trusted Third Party

• Only hashed data is uploaded to the entity resolution service

• Hash security relies on a shared secret between parties

• The service is implemented with a simple JSON + REST API

• All communication is secured with HTTPS

• Authentication tokens created for each job

• Result type and agreed schema are set at the beginning


Client side: Command Line Utility

• Locally hashes PII data

• Creates new mapping jobs on the server

• Uploads hash data

• Retrieves results
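The workflow above (create a job, upload hashes, retrieve results) can be sketched with an in-memory stand-in for the service. Everything here is hypothetical: the class, method names, token format and result shape are invented for illustration and are not the real service's REST API, and the matcher is reduced to exact equality to keep the example short.

```python
import secrets


class LinkageService:
    """In-memory stand-in for the entity resolution service (hypothetical API)."""

    def __init__(self):
        self.jobs = {}

    def create_job(self, schema, result_type):
        # Schema and result type are fixed when the job is created.
        job_id = secrets.token_hex(8)
        # One authentication token per party, valid for this job only.
        tokens = {party: secrets.token_urlsafe(16) for party in ("a", "b")}
        self.jobs[job_id] = {"schema": schema, "result_type": result_type,
                             "tokens": tokens, "uploads": {}}
        return job_id, tokens

    def _party(self, job_id, token):
        for party, expected in self.jobs[job_id]["tokens"].items():
            if secrets.compare_digest(expected, token):
                return party
        raise PermissionError("invalid token for this job")

    def upload(self, job_id, token, hashes):
        # Only hashes are accepted; the PII never leaves the client.
        self.jobs[job_id]["uploads"][self._party(job_id, token)] = list(hashes)

    def results(self, job_id, token):
        self._party(job_id, token)
        uploads = self.jobs[job_id]["uploads"]
        if len(uploads) < 2:
            return None  # still waiting for the other party
        # Placeholder matcher: exact equality only, where a real service is fuzzy.
        index_b = {h: j for j, h in enumerate(uploads["b"])}
        return {i: index_b[h] for i, h in enumerate(uploads["a"]) if h in index_b}


service = LinkageService()
job_id, tokens = service.create_job(schema=["name", "dob"], result_type="mapping")
service.upload(job_id, tokens["a"], [0b1011, 0b0110])  # hashed locally by party A
service.upload(job_id, tokens["b"], [0b0110, 0b1111])  # hashed locally by party B
mapping = service.results(job_id, tokens["a"])          # {1: 0}
```

Scoping tokens to a single job means a leaked token exposes at most that job's uploads and results.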


Performance & Case Study


Speed and Scale

• 1.3B hash comparisons/s

• Handle uploads of 35M hashes

• 1M x 1M match takes around 5 hours

Running on four r4.4xlarge instances on AWS
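A quick back-of-the-envelope check helps put these figures side by side. The numbers are taken from the slide; the calculation assumes that a 1M x 1M match compares every row of one dataset against every row of the other. The slide does not say how the two figures relate, so the gap between them should be read as an open question rather than a breakdown of where the time goes.

```python
comparisons = 1_000_000 * 1_000_000   # every row of A against every row of B
peak_rate = 1.3e9                     # hash comparisons per second (from the slide)

# Time for the comparisons alone if the peak rate were sustained throughout.
raw_seconds = comparisons / peak_rate
print(f"{raw_seconds / 60:.1f} minutes of pure comparison at the peak rate")

# Effective end-to-end rate implied by the quoted five hours.
quoted_seconds = 5 * 3600
effective_rate = comparisons / quoted_seconds
print(f"{effective_rate:.2e} comparisons/s averaged over the quoted 5 hours")
```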


Computing similarity between CLKs is a highly parallel problem: every pair can be scored independently. Our implementation uses multiple workers on a Kubernetes cluster to carry out the comparisons.
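A minimal sketch of why the problem parallelises so easily: split both datasets into chunks, and each pair of chunks becomes an independent task. Here a local thread pool stands in for the cluster workers, and the Dice coefficient, chunk size, threshold and function names are assumptions made for the example rather than details from the talk.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def dice(a: int, b: int) -> float:
    """Sorensen-Dice coefficient of two ints used as bitsets."""
    total = bin(a).count("1") + bin(b).count("1")
    return 2 * bin(a & b).count("1") / total if total else 0.0


def chunks(items, size):
    """Yield (offset, slice) pairs that together cover the list."""
    for start in range(0, len(items), size):
        yield start, items[start:start + size]


def compare_chunk(task, threshold):
    """Score one chunk of A against one chunk of B, keeping pairs above threshold."""
    (off_a, part_a), (off_b, part_b) = task
    return [(off_a + i, off_b + j, score)
            for i, a in enumerate(part_a)
            for j, b in enumerate(part_b)
            if (score := dice(a, b)) >= threshold]


def parallel_candidates(hashes_a, hashes_b, chunk_size=2, threshold=0.8, workers=4):
    # Every (chunk of A, chunk of B) pair is an independent unit of work.
    tasks = list(product(chunks(hashes_a, chunk_size), chunks(hashes_b, chunk_size)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda task: compare_chunk(task, threshold), tasks))
    return sorted(pair for chunk in results for pair in chunk)


candidates = parallel_candidates([0b1111, 0b110000, 0b1], [0b110000, 0b0111, 0b1])
```

Because no task depends on another, the same partitioning works whether the workers are threads, processes or pods; only the final merge of candidate pairs needs to see all results.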


Data61 Privacy Projects

Protari, SENDA, Risk Identification, N1, Private Linkage

Confidential Computing

How can we learn insights from data from multiple sources and protect its value?

[Diagram: Organisation 1 and Organisation 2 each hold sensitive data (health, sensor, finance, government, location, etc.) in their own cloud / data center. Joint analysis across both produces insights.]

Protari


N1 Analytics

Goals:

• Release your data without losing control

• Access data that is currently too sensitive

Technologies:

• Fully, Somewhat, Partially Homomorphic encryption

• Secure Multiparty Compute

• Learning from Aggregates

Capabilities:

• Learn and deploy models

• Secure aggregation of data

• Clustering/Anomaly Detection

Confidential Computing

Analytics with data privacy and control

www.n1analytics.com