35
Trusted Smart Statistics: What it is Why it comes Where it brings us Fabio Ricciato [email protected] EUROSTAT - Big Data Task Force Smart Statistics 4 Smart Cities Kalamata, Greece, 6.10.2018

Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Trusted Smart Statistics: What it is Why it comes Where it brings us

Fabio Ricciato [email protected]

EUROSTAT - Big Data Task Force Smart Statistics 4 Smart Cities Kalamata, Greece, 6.10.2018

Page 2: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

The new datafied world

•  The cyber world is natively digitial. And the physical world is being increasingly digitized (IoT, Smart Devices…)

•  “Anything that goes digital, gets logged” (somewehere, by somebody) 1° fundamental law of datafication

digitalization à datafication

my mobile phone operator

Page 3: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

The new datafied world

•  The cyber world is natively digitial. And the physical world is being increasingly digitized (IoT, Smart Devices…)

•  “Anything that goes digital, gets logged” (somewehere, by somebody) 1° fundamental law of datafication

digitalization à datafication •  Individuals, organizations, places … become “data fountains” •  More and more business companies become “data buckets”

my mobile phone operator my energy provider

my app provider

me and my smart devices

Page 4: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

data and new data

Name. Gender. Birth date. Marital Status. Residence address. Occupation. Household composition… Monthly income. Monthly expenditures per good category. Number of touristic trips in a year.

“micro-data” •  Features about the

individual •  changing slowly or rarely •  recorded at coarse

temporal aggregation (months, years).

Page 5: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

data and new data

Name. Gender. Birth date. Marital Status. Residence address. Occupation. Household composition… Monthly income. Monthly expenditures per good category. Number of touristic trips in a year.

Your exact location, every second. Every single heart-beat, blood pressure… Every single transaction, purchases, encounter, event involving you… Your current opinion on any single fact…

… •  Features about single

events, transactions à highly pervasive, sub-individual level

•  changing continuously •  recorded at fine temporal

aggregation (minutes, seconds)

“micro-data”

“nano-data”

•  Features about the individual

•  changing slowly or rarely •  recorded at coarse

temporal aggregation (months, years).

Page 6: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

data and new data

Name. Gender. Birth date. Marital Status. Residence address. Occupation. Household composition… … Monthly income. Monthly expenditures per good category. Number of touristic trips in a year …

Your exact location, every second. Every single heart-beat, blood pressure… Every single transaction, purchases, encounter, event involving you… Your current opinion on any single fact…

“Deep data”

“Shallow data”

“micro-data”

“nano-data”

Page 7: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Official Statistics.

•  The ultimate goal of Official Statistics is to produce macro-data (statistics) from input micro-data •  Collection of micro-data as ancillary task

macro-data (statistics)

micro-data (abut individual)

Page 8: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Official Statistics. Augmented

•  Availability of new (deep, nano) data sources as opportunity to extend & empower Official Statistics

macro-data (statistics)

micro-data (abut individual)

nano-data (sub-individual)

Additional statistical products: more dimensions, better timeliness, finer spatio/temporal granularity, …

Additional processes

Additional Input Data Sources

Additional micro-data, possibly derived from nano-data

Page 9: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Where the data can be accessed?

smart car smart home

smartphone

carmaker energy

company online

platforms

smartwatch Statistical

Office

B2G channel Business(Bucket?)-to-Government access to privately-held data private-public partnerships …

C2G channel Citizens-to-Government Crodwsourcing, Smart Surveys Citizen Statistics!

Page 10: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Official Statistics based on survey data

society, economy policy,

media, research SO

collection processing

Public sector

SO: Statistical Office

Page 11: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Official Statistics based on survey data and administrative data

society, economy policy,

media, research SO

Public sector

collection processing

processing

SO: Statistical Office

Page 12: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

and now Big Data come into play

society, economy policy,

media, research SO

Public sector

collection processing

processing

Private sector (business and citizens)

Page 13: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Handling the new in old ways Pull data in

society, economy policy,

media, research SO

Public sector

collection processing

processing

processing processing processing processing

Private sector

x This is not feasible. Technical scalability, organisational, legal (risk concentration), …

Page 14: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Handling the new in old ways Pull data in

society, economy policy,

media, research SO

Public sector

collection processing

processing

processing processing processing processing

Private sector

Deep data

“Shallow data”

micro-data

nano-data

x This is not feasible. Technical scalability, organisational, legal (risk concentration), …

Page 15: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Handle the new in new ways Push computation out (partially)

society, economy policy,

media, research SO

Public sector

collection processing

processing

Private sector

Page 16: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

society, economy policy,

media, research SO

Public sector

collection processing

Handle the new in new ways Push computation out (partially)

processing processing processing processing

processing processing processing processing processing processing processing processing

Private sector

processing

Trusted Smart Statistics

Page 17: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Trusted Smart Statistics

processing processing processing processing

processing processing processing processing processing processing processing processing

Smart: externalization towards data sources of the (intial) part of processing execution Leveraging the “smart” features of the data sources (often Smart Systems, Smart Objects) and other “smart technologies” (e.g., Smart Contracts).

Trusted: ensure an articulated set of trust guarantees to all players (SO as “taker” and “giver” of trust guarantees)

Smart Statiscs as an opportunity to deliver more advanced statistical products, more timely (nowcasting), more targeted to specific user groups, through novel reporting and presentation ways …

Private sector (business and citizens)

SO

Trusted Smart Statistics

Guarantee that data are processed for the agreed purpose, by the agreed method, respect of user privacy &

business confidentiality, compliance with legal provisions …

Page 18: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Towards a Reference Architecture for Trusted Smart Statistics

Design Principles

Reference Architecture

Implementation

Work-in-progress at Eurostat in coordination with ESS European Statistical System in dialogue with other stakeholders •  Private Data Holders •  Researchers, Academic communities •  Data Protection Authorities •  other arms of European Commission •  National and Local authorities •  …

Specifications

Page 19: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Some design principles

1.  Processing method (algorithm) transparent to all involved parties •  co-designed or at least agreed-upon (consensus-based design)

2.  Data are not “moved to/shared with”, but only “used by” the Statistical Office – goal is the output, not the input! •  Adopt technologies for Secure Private Computing technologies,

e.g., Secure Multy-Party Computation

3.  Engage and partner with the input parties •  Incentives might involve “giving back” computation output to them

4.  Agreement for data usage bound to computation instance. •  Technological means guarantee that data cannot be used for other query/

purpose other than the agreed one(s)

5.  Purpose and algorithms open for public scrutiny •  public transparency à public trust

Page 20: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

consensus

SO DH-1

source code approved by all parties

1

DH-2 CA

Certification Authority?

Statistical Office

Data Holders

Page 21: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Some design principles

1.  Processing method (algorithm) transparent to all involved parties •  co-designed or at least agreed-upon (consensus-based design)

2.  Data are not “moved to/shared with”, but only “used by” the Statistical Office – goal is the output, not the input! •  Adopt technologies for Secure Private Computing technologies,

e.g., Secure Multy-Party Computation

3.  Engage and partner with the input parties •  Incentives might involve “giving back” computation output to them

4.  Agreement for data usage bound to computation instance. •  Technological means guarantee that data cannot be used for other query/

purpose other than the agreed one(s)

5.  Purpose and algorithms open for public scrutiny •  public transparency à public trust

Page 22: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

DH-1

consensus

CA SO DH-1

source code approved by all parties

1

confidential input data

2

SO

official statistics

[…]

DH-2

authenticated binary code executed in secure hardwar

DH-2

secret shares

non-personal intermediate data exported to SO

Page 23: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

secret shares

Secure Multi-Party Computation (SMPC) infrastructure

confidential input data

computation output (non-personal)

SMPC computation

An infrastructure (technology + organizational provisions) to let the output information be extracted without exchanging the input data

Page 24: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

DH-1

confidential input data

SO

official statistics

[…]

DH-2

authenticated binary code executed in secure hardwar

secret shares

B2G scenario with multiple DHs

Page 25: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

DH-1

confidential input data

SO

official statistics

[…]

DH-2

authenticated binary code executed in secure hardwar

secret shares

confidential input data

SO

BG2G scenario: SO providing input data

Page 26: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

DH-1

confidential input data

SO

official statistics

non-persona ldata exported to SO

[…]

DH-2

authenticated binary code executed in secure hardwar

secret shares

confidential input data

SO

commercial analytics

non-personal data exported for commercial purpose

[…]

private company

B2G2B scenario: giving back to the private sector!

B&G Partnership model?

Returning some output analytics product to the private sector for legitimate business purposes (with certification), might facilitate partnership models between Statistical Offices and private Data Holders

Page 27: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

DH-1

confidential input data

DH-2

authenticated binary code executed in secure hardwar

secret shares

commercial analytics

non-personal data exported for commercial purpose

[…]

private company

Reusing the infrastructure for B2B analytics?

Page 28: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Some design principles

1.  Processing method (algorithm) transparent to all involved parties •  co-designed or at least agreed-upon (consensus-based design)

2.  Data are not “moved to/shared with”, but only “used by” the Statistical Office – goal is the output, not the input! •  Adopt technologies for Secure Private Computing technologies,

e.g., Secure Multy-Party Computation

3.  Engage and partner with the input parties •  Incentives might involve “giving back” computation output to them

4.  Agreement for data usage bound to computation instance. •  Technological means guarantee that data cannot be used for other query/

purpose other than the agreed one(s)

5.  Purpose and algorithms open for public scrutiny •  public transparency à public trust

Page 29: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Some design principles

1.  Processing method (algorithm) transparent to all involved parties •  co-designed or at least agreed-upon (consensus-based design)

2.  Data are not “moved to/shared with”, but only “used by” the Statistical Office – goal is the output, not the input! •  Adopt technologies for Secure Private Computing technologies,

e.g., Secure Multy-Party Computation

3.  Engage and partner with the input parties •  Incentives might involve “giving back” computation output to them

4.  Agreement for data usage bound to computation instance. •  Technological means guarantee that data cannot be used for other query/

purpose other than the agreed one(s)

5.  Purpose and algorithms open for public scrutiny •  public transparency à public trust

Page 30: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Sharing input data Using input data on per-purpose basis

Statistical Office

agreement #1

agreement #2

query #1

query #2

Statistical Office

agreement

data import query #1

query #2

query #3

Page 31: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Some design principles

1.  Processing method (algorithm) transparent to all involved parties •  co-designed or at least agreed-upon (consensus-based design)

2.  Data are not “moved to/shared with”, but only “used by” the Statistical Office – goal is the output, not the input! •  Adopt technologies for Secure Private Computing technologies,

e.g., Secure Multy-Party Computation

3.  Engage and partner with the input parties •  Incentives might involve “giving back” computation output to them

4.  Agreement for data usage bound to computation instance. •  Technological means guarantee that data cannot be used for other query/

purpose other than the agreed one(s)

5.  Purpose and algorithms open for public scrutiny •  more public transparency à more public trust

more data pervasiveness

more public transparency

Page 32: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Some design principles

1.  Processing method (algorithm) transparent to all involved parties •  co-designed or at least agreed-upon (consensus-based design)

2.  Data are not “moved to/shared with”, but only “used by” the Statistical Office – goal is the output, not the input! •  Adopt technologies for Secure Private Computing technologies,

e.g., Secure Multy-Party Computation

3.  Engage and partner with the input parties •  Incentives might involve “giving back” computation output to them

4.  Agreement for data usage bound to computation instance. •  Technological means guarantee that data cannot be used for other query/

purpose other than the agreed one(s)

5.  Purpose and algorithms open for public scrutiny •  more public transparency à more public trust

Page 33: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Some slogans to shout loud

•  Let the information flow, not the data! •  Don’t show your data to me, but let me use it!

•  Share/distribute the computation share/distribute the control don’t share/distribute the data!

•  Close the data, open the algorithms!

•  Using more pervasive data calls for •  à more public transparency (open-source) •  à more checks and balances

(distributed control, consensus, certification authorities?) •  à stronger engagement of sources (fountains and buckets)

Page 34: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Take home message

•  Trusted Smart Statistics = the future of Official Statistics •  New sources of “big” data as input: more pervasive, timely,

heterogeneous... and often privately held!

•  Exploiting such data for Official Statistics requires a new architecture to build “trust” among all stakeholders à ongoing work in Eurostat

•  Key ingredients: SMPC and/or Trusted Hardware, open algorithms, source-code certification (?),…

•  Once deployed, the same platform can be reused for other public interest purposes (and perhaps even for B2B applications)

Page 35: Trusted Smart Statistics: What it is Why it comes Where it ... · being increasingly digitized (IoT, ... • The ultimate goal of Official Statistics is to produce macro-data (statistics)

Thanks for your attention

For follow-up contact [email protected]