Open licensing of software and data for public policy analysis and for collaborative research

Open source software and open data

Robbie Morrison • Berlin, Germany

10 February 2021

Online presentation to Seminar – Introduction to Software Licensing in Europe • Chair of Civil Law, Technology, and IT Law • Humboldt University • Berlin, Germany

Copyright (c) 2021 Robbie Morrison <[email protected]>

This work is licensed under a Creative Commons Attribution 4.0 International (CC‑BY‑4.0) License

Release 03 • 12 February 2021


2

Scene setting

● European focus:

– database protection under Database Directive 96/9/EC

● source code for research and for public policy analysis

● numerical data that is public and non‑personal, therefore:

– exclude privately‑held data, including consortium data and brokered data

– exclude personal data and therefore allied privacy issues

● examples from energy system analysis and related public interest goals:

– net‑zero carbon by 2050 or earlier (non‑negotiable and based on available budgets)

– accessible and reliable

– nature protection

3

Two revolutions

● open source software

– estimates of 90%

– partly facilitated by the cloud: software as a service (SaaS)

– large software stacks: challenge of software license identification and compliance

– quality through development: bug‑fixes, refactoring, testing, user‑base feedback

● data "tsunami"

– mostly facilitated by advances in computer science and hardware

– data needs curation: one model is community curation

– provenance counts: difficult to recover lost quality

– data versioning at scale remains a research question

– primary data versus processed data

4

Simple code example

● source code (explicitly compiled language because the steps are distinct)

– hello.cc

// SPDX-FileCopyrightText: 2021 Robbie Morrison <[email protected]>
// SPDX-License-Identifier: GPL-3.0

#include <iostream>

int main()
{
    std::cout << "Hello, world!" << std::endl;
}

● compiled software (build and test steps omitted)

– hello.exe

– which you can distribute under the GPL–3.0 license

– notwithstanding, the example shown is too short to attract copyright protection

– more information under FSFE REUSE initiative: https://reuse.software
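The SPDX comment lines at the top of hello.cc are meant to be machine-readable. As a minimal sketch, assuming each tag appears as `<tag>: <value>` on a single line, the helper below pulls the value of a named tag out of source text. The function name `spdx_tag` is hypothetical and is not part of any REUSE tooling.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of any REUSE tool): return the value of the
// first "<tag>: <value>" entry found in `source`, or an empty string when the
// tag is absent or carries no value.
std::string spdx_tag(const std::string& source, const std::string& tag)
{
    assert(!tag.empty());
    const std::string key = tag + ":";
    const std::string blanks = " \t\r";
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
        const std::string::size_type pos = line.find(key);
        if (pos == std::string::npos) continue;  // tag not on this line
        const std::string value = line.substr(pos + key.size());
        // Trim surrounding whitespace from the value.
        const std::string::size_type first = value.find_first_not_of(blanks);
        if (first == std::string::npos) return "";  // tag present, value empty
        const std::string::size_type last = value.find_last_not_of(blanks);
        return value.substr(first, last - first + 1);
    }
    return "";
}
```

Called on the text of hello.cc with the tag `SPDX-License-Identifier`, the sketch returns the license identifier given in the header. Production compliance tooling handles many more cases than this.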

5

Data example

weather and satellite data is nonetheless run through powerful climate models to produce the displayed results

Masson-Delmotte, Valérie, Panmao Zhai, Hans-Otto Pörtner, Debra Roberts, Jim Skea, Priyadarshi R Shukla, Anna Pirani, Wilfran Moufouma-Okia, Clotilde Péan, Roz Pidcock, Sarah Connors, JB Robin Matthews, Yang Chen, Xiao Zhou, Melissa I Gomis, Elisabeth Lonnoy, Tom Maycock, Melinda Tignor, and Tim Waterfield (editors) (2018). Global Warming of 1.5°C. An IPCC Special Report on the impacts of global warming of 1.5°C above pre-industrial levels and related global greenhouse gas emission pathways, in the context of strengthening the global response to the threat of climate change, sustainable development, and efforts to eradicate poverty. Geneva, Switzerland: Intergovernmental Panel on Climate Change (IPCC).

Figure SPM.1.a from IPCC SR15

6

Energy system modeling

● computer simulations

– no graphical interface

– often backcast from some future desired state

– necessarily scenario‑based (forecasts not possible)

– embed constraints (demand is covered, climate protection goals, limits on technology uptake, land availability)

– embed feedbacks (cost reductions from deployment)

– usually contain explicit management objectives (least incremental short‑run cost or least long‑run cost)

– essentially report a sequence of decisions (concerning operations, investments, and technology choices)

– increasing interest from policymakers

● various modeling paradigms

– interdisciplinary: engineering, microeconomics, optimization, public policy, decision science, climate science, land use

– integrated assessment models to 2100 (which often select carbon capture technologies after 2050)

– energy system models to 2050 (technology rich, solved sequentially or as entire horizon)

– next-steps models (also used for utility planning)

– hybrid models with private decision‑taking (bounded rationality, social physics)
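To make the "demand is covered" constraint and the "least short‑run cost" objective concrete, here is a toy sketch of a single-hour merit-order dispatch. It assumes linear costs and capacity limits only. The names `Plant`, `Dispatch`, and `least_cost_dispatch` are illustrative and come from no particular modeling framework. Real energy system models cover many time steps and technologies and are normally handed to an optimization solver.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative types only, not taken from any modeling framework.
struct Plant {
    std::string name;
    double capacity_mw;   // maximum output for the hour
    double cost_per_mwh;  // short-run marginal cost
};

struct Dispatch {
    std::vector<double> output_mw;  // same order as the input plants
    double total_cost;              // cost of serving one hour
    bool demand_covered;            // false if capacity runs out
};

// Fill demand from the cheapest plant upwards (merit order).
Dispatch least_cost_dispatch(const std::vector<Plant>& plants, double demand_mw)
{
    assert(demand_mw >= 0.0);
    std::vector<std::size_t> order(plants.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) {
                  return plants[a].cost_per_mwh < plants[b].cost_per_mwh;
              });

    Dispatch result{std::vector<double>(plants.size(), 0.0), 0.0, false};
    double remaining = demand_mw;
    for (std::size_t i : order) {
        const double used = std::min(plants[i].capacity_mw, remaining);
        result.output_mw[i] = used;
        result.total_cost += used * plants[i].cost_per_mwh;
        remaining -= used;
    }
    result.demand_covered = remaining <= 1e-9;
    return result;
}
```

The returned `output_mw` vector is a (very short) instance of the "sequence of decisions" such models report. The `demand_covered` flag shows what happens when the constraint cannot be met.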

7

Another revolution

domain‑wide cooperation on data

● once research artifacts are freely usable and reusable, researchers can start cooperating on shared agendas:

– data semantics:

  ● domain ontology (highly formal representation of concepts and relationships)

  ● standardized terminology (derives from above)

  ● metadata standards

– technical standards:

  ● data storage and interchange (low‑level protocols)

  ● data‑centric tooling (high‑level schemas)

– data portals:

  ● funded curation and community curation

  ● work on linked open data (LOD) concepts to federate databases dispersed across the web

– standardized scenarios:

  ● common reporting

  ● cross‑model comparisons (vitally important for confidence)

8

Legal contexts: European Union versus United States

– software patents: rejected (European Union), yes (United States)

– database protection: Directive 96/9/EC (European Union), unlikely (United States)

– injunctions against intermediaries: problematic

● competition law and law of civil wrongs and doctrines like misappropriation may apply

● data covered by personal privacy omitted from this discussion

● concerted effort by European Commission to develop a new industrial data right (IDR) possibly shelved

Anon (24 January 2020). B2 — Analytical report on EU law applicable to sharing of non-personal data — V2.0. Capgemini Invent, Fraunhofer FOCUS, Timelex, Support Centre for Data Sharing. Report for DG Connect (DG = European Union Directorate-General).

recommended

Husovec, Martin (November 2017). Injunctions against intermediaries in the European Union: accountable but not liable. Cambridge, United Kingdom: Cambridge University Press. ISBN 978-1-108-41506-4. doi:10.1017/9781108227421.

9

Database Directive 96/9/EC

● legal definition of a "database" is very broad:

– includes printed maps

● a database is protected where both the:

– direct investment is substantial

– extraction is substantial

– the substantiality principle derives from copyright law

● directive intended to support a database industry in Europe

– instead material gets harvested and used to stock US servers

– creates legal uncertainty for risk‑averse researchers in the absence of suitable open licenses

Covers European Economic Area (EEA)

Davidson, Mark J (January 2008). The legal protection of databases. Cambridge, United Kingdom: Cambridge University Press. ISBN 978-0-521-04945-0. Paperback edition.

10

US Copyright Office (2017)

"The notion of technical databases being creative is largely mutually exclusive. Orthodox database are highly structured, but they are not much selected and arranged. Nonorthodox databases, while not highly structured, are similarly even less likely to be selected and arranged."

US Copyright Office (November 2017). The Compendium of US Copyright Office Practices — Third edition: Chapter 700. US Government.

§727

11

German copyright definitions

Urheberrechtsgesetz (UrhG) (Deutschland)

UML (unified modeling language) class diagram

12

Open/closed spectrum

non-revealed data

negotiated bilateral agreements

consortium data

brokered data

open data choice of license issue

trade secret must have competition value

shared data

legally siloed

widely usable and reusable

high transaction costs

with open licenses (to provide certainty, particularly in Europe)

consortium data portals being replaced by more nuanced brokered data

data usage attributes are tagged by the provider

proposed: CC‑BY‑4.0 or CC0‑1.0 (or something inbound compatible)

some public policy support for sharing non-open data

may involve commercial or personal privacy

private data

may also be of public interest

privately‑held data

although some may be subject to statutory reporting

data under disclosure

European legislation silent on licensing

statutory reporting

on public interest grounds

general data not covered

public data

13

Touchstone definitions

See also: Morrison, Robbie and contributors (22 February 2019). Definitions for open. openmod forum. Germany.

Open Knowledge Foundation (no date). Open Definition 2.1 — Defining open in open data, open content and open knowledge. Open Knowledge Foundation (OKF). Cambridge, United Kingdom.

European Commission (26 June 2019). "Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information — PE/28/2019/REV/1". Official Journal of the European Union. L 172: 56–83. The directive entered into force on 16 July 2019. Recital 16 (page 58) begins:

"Open data as a concept is generally understood to denote data in an open format that can be freely used, re-used and shared by anyone for any purpose."

Touchstone definitions for "open data", one community and one statutory

14

Open data licenses

License compatibility graph

15

Code/data landscape

16

Open data licenses

Creative Commons CC‑BY‑4.0

– introduced 25 November 2013

– first data‑capable license (that deals with the Database Directive)

– requires attribution and attribution tracking

– material may be modified or mixed and licensed under more restrictive terms but the attribution requirement must remain

Creative Commons CC0‑1.0

– public domain dedication

– falls back to maximally permissive license in civil law jurisdictions like those in Europe

– no legal obligations for users

– metadata should always be licensed CC0‑1.0

my personal picks, others might add ODC ODbL‑1.0

in most cases, open data licenses do not provide permission, rather they offer certainty
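The attribution tracking that CC‑BY‑4.0 requires can be pictured as bookkeeping that travels with the data. The sketch below is an illustration only and not legal advice. It assumes every input is licensed either CC‑BY‑4.0 or CC0‑1.0, gathers the credits owed when datasets are mixed, and labels the result CC‑BY‑4.0 as soon as any input carries that license. The names `Dataset` and `mix` are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for illustration: a dataset with its license tag
// and the parties that must be credited.
struct Dataset {
    std::string name;
    std::string license;                 // "CC-BY-4.0" or "CC0-1.0" in this sketch
    std::set<std::string> attributions;  // credits that must travel with the data
};

// Combine inputs into one derived dataset, carrying forward every
// attribution owed under CC-BY-4.0. CC0-1.0 inputs add no obligations.
Dataset mix(const std::string& name, const std::vector<Dataset>& inputs)
{
    assert(!inputs.empty());
    Dataset out{name, "CC0-1.0", {}};
    for (const Dataset& in : inputs) {
        if (in.license == "CC-BY-4.0") {
            out.license = "CC-BY-4.0";  // attribution requirement must remain
            out.attributions.insert(in.attributions.begin(), in.attributions.end());
        } else if (in.license != "CC0-1.0") {
            // Anything else is outside the assumptions of this sketch.
            throw std::invalid_argument("unsupported license: " + in.license);
        }
    }
    return out;
}
```

Mixing two CC‑BY‑4.0 datasets with one CC0‑1.0 dataset in this sketch yields a CC‑BY‑4.0 result that lists both original credits. A mix of CC0‑1.0 inputs alone stays free of obligations.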

17

Statutory reporting

Information under mandate to be published for public interest reasons

European Commission (15 June 2013). "Commission Regulation (EU) No 543/2013 of 14 June 2013 on submission and publication of data in electricity markets and amending Annex I to Regulation (EC) No 714/2009 of the European Parliament and of the Council (text with EEA relevance)". Official Journal of the European Union. L 163: 1–12.

European Commission (8 December 2011). "Regulation (EU) No 1227/2011 of the European Parliament and of the Council of 25 October 2011 on wholesale energy market integrity and transparency (text with EEA relevance)". Official Journal of the European Union. L 326: 1–16.

● public interest reasons include promoting system security and protecting against market failure

● the problem is that the legislation mandated publication but was silent on licensing

● the power exchanges (PX) make every effort to ensure reported material cannot be harvested and used

● push from researchers and regulators to adopt CC‑BY‑4.0 licensing

18

Coda

I want to acknowledge the shocking state that my generation has left our planet in. The 1990 IPCC First Assessment Report was entirely clear on the magnitude and urgency of the climate emergency.

First Fridays for Future school strike on 14 December 2018, Berlin, Germany. Robbie Morrison. Copyright retained.

19

Wish list

● public sector information be CC‑BY‑4.0 licensed by default

● material under statutory reporting be CC‑BY‑4.0 licensed by law

● the Database Directive 96/9/EC be repealed

● better support provided for public domain dedication

● avoid national data licenses (like the German government dl-de/by-2-0 license)

20

Some readings

Anon (24 January 2020). B2 — Analytical report on EU law applicable to sharing of non-personal data — V2.0. Capgemini Invent, Fraunhofer FOCUS, Timelex, Support Centre for Data Sharing. Report for DG Connect (DG = European Union Directorate-General).

Hirth, Lion (1 January 2020). "Open data for electricity modeling: legal aspects". Energy Strategy Reviews. 27: 100433. ISSN 2211-467X. doi:10.1016/j.esr.2019.100433. Open access.

Bimesdörfer, Kathrin (editor) (February 2019). Datenlizenzen für Open Government Data: Rechtliches Kurzgutachten: Handreichung zu den Nutzungsrechteregelungen gebräuchlicher Open Data Lizenzen und Empfehlungen für ihren Einsatz [Data licenses for Open Government Data: Legal brief: Guidance on the usage rights of common open data licenses and recommendations for their use] (in German). Düsseldorf, Germany: Ministerium für Wirtschaft, Innovation, Digitalisierung und Energie des Landes Nordrhein-Westfalen.

Davidson, Mark J (January 2008). The legal protection of databases. Cambridge, United Kingdom: Cambridge University Press. ISBN 978-0-521-04945-0. Paperback edition.

Stepanov, Ivan (2 January 2020). "Introducing a property right over data in the EU: the data producer's right — an evaluation". International Review of Law, Computers and Technology. 34 (1): 65–86. ISSN 1360-0869. doi:10.1080/13600869.2019.1631621. Open access.