Provenance abstraction for implementing security policies
Learning Health System and securing provenance of health data
Dr Vasa Curcin
King’s College London
Overview
• Learning Health System
• LHS requirements for provenance data
• TRANSFoRm project
• Transformation-oriented Access Control Language for Provenance (TACLP)
Learning Health System
“ ... one in which progress in science, informatics, and care culture align to generate new knowledge as an ongoing, natural by-product of the care experience, and seamlessly refine and deliver best practices for continuous improvement in health and health care.” (Institute of Medicine)
We can’t afford to waste data!
Learning Health System
Defining functions of a LHS are to:
1. routinely and securely aggregate data from disparate sources
2. convert the data to knowledge
3. disseminate that knowledge, in actionable forms, to everyone who can benefit from it.
c/o C. Friedman
Learning Health System take-up
• US medical/academic centres
  o Mayo, Duke, Vanderbilt
  o PCORI
• National data aggregators
  o Clinical Practice Research Datalink
  o NIVEL
• EHR vendors
  o CSC, Asseco, TPP, InPractice Systems
• European academic-industrial collaborations
  o TRANSFoRm, EHR4CR, Semantic HealthNet
…and Bill
Example: Clinical trial challenges
• Major motivation for the LHS work
• Trials too expensive and difficult to run
• Efficacy-effectiveness gap (EEG)
  o Disconnect between outcomes from clinical trials and information needed for clinical practice
  o Interaction of drug effect and real-life contextual factors
  o Challenge to identify contextual factors
• LHS provides context and workflow
LHS for Clinical Trials
• EHR integration
  o Eligibility checking done automatically from EHR data
  o eCRFs partially filled based on EHR information
  o All collected data stored in the EHR system as well as the research database
• Closing the loop
  o eCRF data enriches the EHR
  o Helps the clinician
  o Adds value to the EHR system
• Data does not go to waste!
Trust in the LHS
• Research community is struggling to ensure transparency and correctness of published research
• Reasons complex and interleaving (positive bias, intractable analysis, deluge of journals)
• Bayer Healthcare team published work showing that only 25% of the academic studies they examined could be replicated
  o Prinz et al. Nat. Rev. Drug Discov. 10, 712, 2011
• Of 53 oncology studies from 2001-2011, each highlighting big new apparent advances in the field, only 11% (6) could be robustly replicated.
  o Begley & Ellis Nature 483, 531–533, 2012
Trust in the LHS (cont.)
• The problem is by no means restricted to preclinical studies
• Twelve randomised clinical trials tested 52 observational claims and failed to reproduce a single one
  o Young SS, Karr A. Deming, data and observational studies. Significance Sep 2011; 8(3):116–120
• Replication of 100 experiments published in 2008 in three high-ranking psychology journals: fewer than half of the findings replicated
  o Estimating the reproducibility of psychological science. Science Aug 2015; 349(6251)
• Random sample of 441 biomedical journal articles 2000–2014: none made all their data available, one provided full protocol, majority did not disclose funding or conflicts of interest
  o Iqbal et al. Reproducible Research Practices and Transparency across the Biomedical Literature. PLoS Biology 2016; 14(1)
• Cost of irreproducible research in life science is estimated at $28 billion per year in the U.S.
  o Freedman LP, Cockburn IM, Simcoe TS. The Economics of Reproducibility in Preclinical Research. PLOS Biology Jun 2015; 13(6)
• Each component in the healthcare system produces and consumes data:
  • Epidemiological research using record linkages
  • Research data embedded in the EHR
  • Decision support for diagnosis
• Provenance infrastructure required to support all these domains
Data in the Learning Health System
Specific research data
• Clinical trials
• Controlled populations
• Well-defined questions
Routinely collected data
• EHR systems
• Wide coverage
• Vast quantity
• May lack in detail and quality
Actionable data
• Distilled scientific findings
• Usable in clinical practice
• Decision support
TRANSFoRm project
• €7.5M European Commission 2010-2015
• Funded under the Patient Safety Work Program of FP7
• Developing methods, models, services, validated architectures and demonstrations to support:
  o Epidemiological research using GP records, including genotype-phenotype studies and other record linkages
  o Clinical trials embedded in the EHR
  o Decision support for diagnosis
www.transformproject.eu
TRANSFoRm software landscape
[Diagram components: Middleware; Secure data transport; RCT tools (Electronic Data Collection); Epidemiological study tools (Data queries); Authentication framework; Diagnostic support tools; Data source connectivity module; Provenance framework; Vocabulary service]
Use case 1: Type 2 Diabetes
• Research Question: In type 2 diabetic patients, are selected single nucleotide polymorphisms (SNPs) associated with variations in drug response to oral antidiabetic drugs (Sulfonylurea)?
• Design: Case-control study
• Data: primary care databases (phenotype data) pre-linked to genomic databases (genetic risk factors) – data federation
Use case 2: Gastro-oesophageal reflux disease (GORD)
• Research Question: What gives the best symptom relief and improvement in Quality of Life: continuous or on demand Proton Pump Inhibitor use?
• Design: Randomised Controlled Trial (RCT)
• Data: Collection through EHR & web based questionnaire – electronic case report forms AND mobile Patient Related Outcome Measures
• Provenance and security
Use case 3: Diagnostic Decision Support
• Early diagnostic suggestions for presenting problems:
  • chest pain
  • abdominal pain
  • shortness of breath
• Clinical Prediction Rule web service (with underlying ontology)
• Prototype Decision Support System integrated with a commercial electronic health record system
  • Vision by InPractice Systems
Provenance challenge for TRANSFoRm
• Viable methods for adoption in a heterogeneous software environment
  o No shared workflow middleware to rely on
• Need to achieve domain specificity
• Able to demonstrate conformance to standards
  o Title 21 of the Code of Federal Regulations; Electronic Records; Electronic Signatures (21 CFR Part 11)
  o Good Clinical Practice (GCP)
  o EudraLex Vol. 4 Annex 11: Computerised Systems in EU
  o CONSORT, STROBE, RECORD
Semantic annotations
• Semantic concepts in the provenance graph defined using TRANSFoRm ontologies:
  o Clinical Research Information Model (CRIM)
  o Software infrastructure ontology
  o Clinical evidence ontology
• Ontology concepts annotations on provenance nodes
• Provenance templates define domain actions that map to provenance fragments
[Diagram components: PCROM (UML Model); Randomised Clinical Trial Ontology (RCTO); Randomised Clinical Trial Provenance Ontology (RCTPO); Provenance templates; Provenance database; Provenance server; Existing tools; API service calls; OPM graphs annotated with OWL]
1. Tools are agnostic to provenance representation
2. Service invocation matches some provenance template in Provenance server
3. Template is instantiated into a provenance graph fragment with OWL concept annotations
4. Graphs merged inside the database
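The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TRANSFoRm implementation: the template store `TEMPLATES`, the service name `submit_ecrf`, the `ex:` concept names and the helper functions `instantiate` and `merge` are all hypothetical. Only the relation names (`wasGeneratedBy`, `wasControlledBy`) are taken from the Open Provenance Model vocabulary.

```python
# Hypothetical template store: service name -> provenance fragment with placeholders.
# Relation names follow OPM edge types; the "ex:" concepts stand in for ontology terms.
TEMPLATES = {
    "submit_ecrf": {
        "edges": [
            ("{form}", "wasGeneratedBy", "{call}"),
            ("{call}", "wasControlledBy", "{user}"),
        ],
        "annotations": {"{form}": "ex:eCRF", "{user}": "ex:Clinician"},
    }
}

def instantiate(service, bindings):
    """Steps 2 and 3: look up the template for a service call and fill in its placeholders."""
    template = TEMPLATES[service]
    edges = {
        tuple(part.format(**bindings) for part in edge)
        for edge in template["edges"]
    }
    annotations = {
        node.format(**bindings): concept
        for node, concept in template["annotations"].items()
    }
    return edges, annotations

def merge(graph, fragment):
    """Step 4: merge a fragment into the stored graph (edge set union, annotation dict update)."""
    graph_edges, graph_annotations = graph
    edges, annotations = fragment
    return graph_edges | edges, {**graph_annotations, **annotations}

# Two service calls by the same user produce fragments that share the user node.
graph = (set(), {})
graph = merge(graph, instantiate("submit_ecrf", {"form": "form42", "call": "call7", "user": "drSmith"}))
graph = merge(graph, instantiate("submit_ecrf", {"form": "form43", "call": "call8", "user": "drSmith"}))
```

The calling tool only supplies a service name and bindings, which is one way to keep tools agnostic to the provenance representation (step 1).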
Provenance security
• Use a single provenance graph for:
  o Full trial audit
  o Reporting studies
  o Publication review
  o Collaborators
  o Readers
• Need to abstract parts of the graph
• Access control and view generation for provenance graphs
  o Future Generation Computer Systems, Volume 49, August 2015, Pages 8-27. Roxana Danger, Vasa Curcin, Paolo Missier, Jeremy Bryans
Basic idea
• The aim of an access control strategy is not only to determine if the resource can be viewed or not, but to construct a view of the graph which satisfies the security constraints
• The goal is for maximum amount of information to be retained
• NB Based on TRANSFoRm use cases but not implemented in the live system
Access control
• Ensuring that a principal (person, process, etc.) can only access the services or data in a system that they are authorized to
• Implemented through security policies that try to enforce a certain protection goal such as to prevent unauthorized disclosure (secrecy) and intentional or accidental unauthorized changes (integrity)
• Authorizations for some resource can be:
  o Positive (allow)
  o Negative (deny)
Access control
• Two classical approaches:
  o Closed policy
    • Deny-by-default
    • Access to a resource is only granted if a corresponding positive authorization policy exists
  o Open policy
    • Permit-by-default
    • Access unless a corresponding negative authorization policy exists.
• Combined approach used to support policy exceptions
• Conflict resolution needed if multiple policies apply, e.g.
  o denials-take-precedence
  o most-specific-takes-precedence
  o priority levels
  o time-dependent access.
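A minimal sketch of how closed and open defaults combine with denials-take-precedence conflict resolution. The function name `decide` and the string encoding of authorizations are assumptions made for illustration only.

```python
def decide(authorizations, default_policy="closed"):
    """Combine the authorizations that match a request into one decision.

    authorizations: iterable of "allow" / "deny" strings from matching policies.
    default_policy: "closed" (deny-by-default) or "open" (permit-by-default).
    """
    matched = list(authorizations)
    if "deny" in matched:
        return "deny"       # denials-take-precedence resolves allow/deny conflicts
    if "allow" in matched:
        return "allow"
    # No policy matched: fall back to the default of the chosen approach.
    return "deny" if default_policy == "closed" else "allow"
```

For example, `decide(["allow", "deny"])` resolves the conflict to `"deny"`, while `decide([], "open")` falls back to the open default and allows access.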
Access control languages for provenance
• Qin Ni et al.
  o Semantic description of subjects (user roles) and resources to be accessed
  o Conditions under which restrictions are applied
  o Four different types of access permissions
• Cadenhead et al.
  o Added regular expressions for resource and condition descriptions
• Transformation-oriented Access Control Language for Provenance (TACLP)
  o Allows users to define subgraphs to be transformed, with three different levels of abstraction (namely hide, minimal and maximal).
External effects and causes
• External effects and causes of the set of nodes S w.r.t. a set of nodes R
  o Set of nodes that represent the immediate effects/causes of S that would be affected by removal of nodes in R from the graph
  o If S=R, then denote as ef(R) and ca(R)
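One simplified reading of this definition can be sketched as follows. Two assumptions are made here that the slide does not state: an edge `(effect, cause)` points from a node to the node it depends on, as in OPM, and "external" means "outside R". The function names are hypothetical and the formal definition in the cited paper may differ in detail.

```python
def external_causes(edges, S, R):
    """Nodes outside R that some node in S directly depends on."""
    return {cause for effect, cause in edges if effect in S and cause not in R}

def external_effects(edges, S, R):
    """Nodes outside R that directly depend on some node in S."""
    return {effect for effect, cause in edges if cause in S and effect not in R}

def ca(edges, R):
    """Shorthand for the case S = R."""
    return external_causes(edges, R, R)

def ef(edges, R):
    """Shorthand for the case S = R."""
    return external_effects(edges, R, R)

# Chain d -> c -> b -> a, where x -> y means "x depends on y".
edges = {("d", "c"), ("c", "b"), ("b", "a")}
R = {"b", "c"}
```

With this chain, removing `R = {"b", "c"}` touches the cause `a` and the effect `d`, so `ca(edges, R)` is `{"a"}` and `ef(edges, R)` is `{"d"}`.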
Basic operations
• Node removal
  o Subgraph needs to be hidden
  o e.g. if it is unnecessary for an analysis or user access to it has been restricted.
• Node replacement
  o Removing details of data and operations in a subgraph while retaining some information (abstract entity) of the existence of such subgraph.
Operation: node removal
• Let Prov = (V, E, type) and R ⊆ V be a set of nodes to be removed. Result is a new provenance graph Prov′ = (V′, E′, type′), such that:
Abstract nodes and edges
• Dummy nodes introduced during entity replacement
• Preserve the causality of the rest of the graph
• Two types of dependencies:
  o Indirect
    • Denoted with double lines
    • Represent multi-step dependencies (wdf+, u+, wgb+, wtb+)
  o Soft dependencies
    • Denoted with double dashed lines
    • Generic transitive relationship which is not one of the above
False dependencies
• False dependencies introduce a previously non-existent path in the new graph, e.g. removing A, B
Causality preserving transformation
• A transformation is called causality preserving if it does not introduce false dependencies.
• Given a provenance graph and a set of entities to be abstracted/hidden, the question is how can these entities be joined or removed from the graph using only causality-preserving transformations?
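The notion of a false dependency can be checked mechanically by comparing reachability among the retained nodes before and after a transformation. The sketch below is an illustration, not the algorithm from the cited paper. It assumes edges are `(effect, cause)` pairs and uses a deliberately naive replacement (collapsing a group into one abstract node); all function names are hypothetical.

```python
def reachable_pairs(edges, nodes):
    """All (x, y) with x, y in nodes such that a directed path leads from x to y."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)
    pairs = set()
    for start in nodes:
        stack, seen = list(successors.get(start, ())), set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            stack.extend(successors.get(n, ()))
        pairs |= {(start, n) for n in seen if n in nodes}
    return pairs

def replace_nodes(edges, group, label):
    """Naive replacement: collapse every node in group into one abstract node."""
    collapsed = set()
    for u, v in edges:
        u2 = label if u in group else u
        v2 = label if v in group else v
        if u2 != v2:            # drop edges internal to the group
            collapsed.add((u2, v2))
    return collapsed

def false_dependencies(old_edges, new_edges, kept):
    """Dependencies between kept nodes that exist only after the transformation."""
    return reachable_pairs(new_edges, kept) - reachable_pairs(old_edges, kept)

# c depends on x via A, and d depends on y via B; A and B are unrelated.
edges = {("c", "A"), ("A", "x"), ("d", "B"), ("B", "y")}
kept = {"c", "d", "x", "y"}
merged = replace_nodes(edges, {"A", "B"}, "AB")
single = replace_nodes(edges, {"A"}, "absA")
```

Collapsing the unrelated nodes `A` and `B` into one abstract node makes `c` appear to depend on `y` and `d` on `x`, so that transformation is not causality preserving. Replacing `A` alone introduces no new paths.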
Causality preserving partition and transformation
• Given a set of nodes R ⊆ V, a causality preserving partition of R is such that removing or replacing any set of nodes will not introduce causal dependencies
• A graph transformation by partition of R is then a sequential application of Remp or Repp
• The necessary and sufficient condition for such transformation to be causality preserving is that for each set P in the partition, all of P’s external causes and effects are connected
Optimal causality preserving partition
• Default partition of R consists of singletons, i.e. each node in R is a set in the partition.
• Optimal partition is such that none of its sets have the same sets of external causes and effects w.r.t. R
• Partitioning algorithm
  o Step 1, determine external causes and effects for default partition
  o Step 2, gradually merge the partitions until optimal.
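The two steps can be sketched as a pairwise merge over the default partition. This is a simplified illustration rather than the published algorithm: it assumes edges are `(effect, cause)` pairs, takes the external causes and effects of a set to be its direct neighbours outside R, and merges two sets whenever those coincide. The names `signature` and `merge_partition` are hypothetical.

```python
def signature(edges, S, R):
    """External causes and effects of S w.r.t. R (direct neighbours outside R)."""
    causes = frozenset(c for e, c in edges if e in S and c not in R)
    effects = frozenset(e for e, c in edges if c in S and e not in R)
    return causes, effects

def merge_partition(edges, R):
    # Step 1: default partition of singletons.
    partition = [frozenset([n]) for n in sorted(R)]
    # Step 2: merge pairs with identical signatures until none remain.
    merged = True
    while merged:
        merged = False
        for i in range(len(partition)):
            for j in range(i + 1, len(partition)):
                if signature(edges, partition[i], R) == signature(edges, partition[j], R):
                    union = partition[i] | partition[j]
                    partition = [p for k, p in enumerate(partition) if k not in (i, j)]
                    partition.append(union)
                    merged = True
                    break
            if merged:
                break
    return set(partition)

# A and B both link effect c to cause x; C links effect d to cause y.
edges = {("c", "A"), ("A", "x"), ("c", "B"), ("B", "x"), ("d", "C"), ("C", "y")}
result = merge_partition(edges, {"A", "B", "C"})
```

Here `A` and `B` share the same external cause and effect, so they end up in one set, while `C` stays on its own. The pairwise comparison is what makes the cost quadratic in the number of nodes to abstract.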
Provenance graph transformation algorithm
• Once the partition is computed, the transformations are iteratively applied to each element in the partition
• Labels input provides names for generated abstract nodes
• Levels input provides abstraction level for each partition
  o Hide
    • remove operation
  o Minimum abstraction, maximum abstraction
    • replace operation
    • isolated singletons removed as a special case.
Computational efficiency
• Transformation algorithm performance depends on the performance of the partition algorithm
• The other steps are linear in the cardinality of the set of partitions and its edges
• The partition algorithm considers pair-wise combinations of nodes.
• Overall complexity is O(|R|²), where R is the set of nodes to abstract
Experimental results
• Provenance view transformation algorithm was implemented in Python 2.7 using Networkx API.
• Experiments were executed on Ubuntu 12.04, Intel Core i7-3687U CPU with 2.10GHz and 8GB RAM
• Synthetic provenance graphs used, randomly generating edges for each node within the degree range 2-10
• Two parameters:
  o the percentage of nodes to abstract (from 5 to 25 with a step of 5)
  o the percentage of nodes to abstract which are causally dependent (from 0 to 100 with a step of 25)
• Each configuration was executed 10 times and the plots presented show the averages of these executions.
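The slides do not describe the graph generator itself, so the following is only a guess at a comparable setup using the standard library. Each node receives between 2 and 10 random edges to earlier nodes, which keeps the graph acyclic; edges are `(effect, cause)` pairs. The function `synthetic_graph` and the constants are hypothetical names.

```python
import itertools
import random

def synthetic_graph(n_nodes, rng, min_deg=2, max_deg=10):
    """Random acyclic provenance-like graph: each node depends on earlier nodes only."""
    edges = set()
    for node in range(1, n_nodes):
        # The first few nodes cannot reach the requested degree, so cap it.
        k = min(rng.randint(min_deg, max_deg), node)
        for cause in rng.sample(range(node), k):
            edges.add((node, cause))
    return edges

# Parameter grid from the slide: 5..25 step 5 and 0..100 step 25, 10 runs each.
ABSTRACT_PERCENT = range(5, 30, 5)
DEPENDENT_PERCENT = range(0, 125, 25)
RUNS = 10
configs = list(itertools.product(ABSTRACT_PERCENT, DEPENDENT_PERCENT))

sample_edges = synthetic_graph(50, random.Random(0))
```

Passing a seeded `random.Random` instance keeps repeated runs of one configuration reproducible.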
Performance behaviour
• Execution time (Y) in seconds as a function of the number of nodes (X) and the percentage of nodes to abstract (Z)
• Quadratic time
Use case: Access to health data
• Access control for the provenance data collected from an Electronic Health Record (EHR) and clinical trial systems
• Rules:
  o Auditors. Healthcare system auditors or law enforcement agencies can access the whole provenance graph during the auditing process.
  o Family doctors and patients. Electronic health records and their data provenance can only be accessed by patients during weekends, and by family doctors (FDs) during weekdays.
  o Active FDs. Active FDs have access to the EHRs of their patients and the associated provenance data.
  o Clinical trial 1. If some data comes from a clinical trial, the GP needs to be a participant of the trial to see the subgraph associated with that trial.
  o Clinical trial 2. Patients do not have access to clinical trial processes.
  o Laboratory. Patients do not have access to laboratory processes.
  o Automatic diagnosis recommendation. Patients have no access to any information related to the automatic diagnosis recommendation nor to the graph segment connecting it with the clinical evidence.
TACLP
• Transformation-oriented Access Control Language for Provenance (TACLP)
• Extends the works of Ni and Cadenhead by introducing transformations
• A policy consists of:
  o Target
  o Effect
  o Transformation
  o Condition (optional)
  o Obligation (optional)
TACLP Target
• Subject element
  o Set of users (subject element) to which the policy should be applied, expressed through IRI references
• Record element
  o Set of resources to which the policy should be applied, expressed through IRI references
• Restriction element (optional)
  o A conditional expression under which the policy is applied
  o Either a relational comparison between a value in a property path and a literal, or a full logical expression.
• Scope element (optional)
  o If the policy is ‘transferable’ or ‘non-transferable’ with respect to subjects
  o Whether it applies to all the ancestors of matched elements in the graph, or to the matched elements only.
TACLP Effect
• Specifies the intended outcome
• Four possibilities:
  o Absolute permit guarantees access to the graph regardless of the effect of other policies
    • e.g. for allowing access to auditors or law enforcement agencies, and avoids the need for additional conditions in deny policies
  o Deny guarantees that certain parts of the graph will not be accessed by users in the subject element.
  o Necessary permit is used to describe the necessary, but not always sufficient, conditions for accessing certain parts of the graphs
  o Permit is used to describe those parts of the graph that can be accessed if there are no other policies denying access to it.
TACLP Transformation
• How to transform the provenance graph in order to hide certain resources
• Specification of which nodes need to be hidden and Removal/Replace operations to be applied to them
• Set of policies comprising
  o Policy type (target, record, condition, effect, transformation element and obligation)
  o Policy evaluation type (deny-takes-precedence or permit-takes-precedence)
TACLP Transformation
• Abstraction level
  o Hide
    • Matched nodes of the subgraph have to be completely hidden (removed) from the graph
    • Remove transformation is applied
  o Minimum abstraction
    • Replace transformation is applied
    • No caused-by relationship (soft dependencies) will appear in the transformed graph.
  o Maximum abstraction
    • Replace transformation is applied
    • Soft dependencies can appear in the transformed graph.
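The policy components listed on the TACLP slides can be pictured as a small data model. The sketch below is a hypothetical Python representation for illustration only. It is not TACLP's concrete syntax, and the class names, field names and `ex:` IRIs are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class Effect(Enum):
    ABSOLUTE_PERMIT = "absolute-permit"
    DENY = "deny"
    NECESSARY_PERMIT = "necessary-permit"
    PERMIT = "permit"

class AbstractionLevel(Enum):
    HIDE = "hide"        # remove transformation
    MINIMUM = "minimum"  # replace, no soft dependencies in the result
    MAXIMUM = "maximum"  # replace, soft dependencies may appear

@dataclass(frozen=True)
class Target:
    subjects: Tuple[str, ...]          # IRI references to users
    records: Tuple[str, ...]           # IRI references to resources
    restriction: Optional[str] = None  # optional conditional expression
    scope: Optional[str] = None        # optional, e.g. "transferable"

@dataclass(frozen=True)
class Policy:
    target: Target
    effect: Effect
    level: AbstractionLevel
    condition: Optional[str] = None
    obligation: Optional[str] = None

def operation(policy):
    """Graph operation implied by the abstraction level."""
    return "remove" if policy.level is AbstractionLevel.HIDE else "replace"

# Rule "Laboratory": patients do not have access to laboratory processes.
lab_policy = Policy(
    target=Target(subjects=("ex:Patient",), records=("ex:LaboratoryProcess",)),
    effect=Effect.DENY,
    level=AbstractionLevel.HIDE,
)
```

Encoding the laboratory rule from the health data use case this way gives a deny policy whose hide level maps to the remove operation.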
Access control evaluation algorithms
• Aim to produce an abstracted graph that satisfies the constraints
• Deny-takes-precedence
  1. Absolute permit policies evaluated first
  2. Necessary permit and deny policies
  3. Permit policies
• Permit-takes-precedence
  1. Absolute permit policies evaluated first
  2. Necessary permit policies
  3. Permit policies
  4. Deny policies
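The two evaluation orders can be expressed as rank tables applied with a stable sort. This sketch only orders policies and does not perform the graph transformation. The effect strings, the `(name, effect)` tuple encoding and the function name `evaluation_order` are assumptions for illustration.

```python
# Rank of each effect under the two evaluation types (lower is evaluated earlier).
DENY_TAKES_PRECEDENCE = {
    "absolute-permit": 0, "necessary-permit": 1, "deny": 1, "permit": 2,
}
PERMIT_TAKES_PRECEDENCE = {
    "absolute-permit": 0, "necessary-permit": 1, "permit": 2, "deny": 3,
}

def evaluation_order(policies, mode="deny-takes-precedence"):
    """Sort (name, effect) pairs into evaluation order; ties keep their input order."""
    rank = DENY_TAKES_PRECEDENCE if mode == "deny-takes-precedence" else PERMIT_TAKES_PRECEDENCE
    return sorted(policies, key=lambda policy: rank[policy[1]])

policies = [
    ("p1", "permit"),
    ("p2", "deny"),
    ("p3", "absolute-permit"),
    ("p4", "necessary-permit"),
]
deny_first = [name for name, _ in evaluation_order(policies)]
permit_first = [name for name, _ in evaluation_order(policies, "permit-takes-precedence")]
```

Under deny-takes-precedence the necessary permit and deny policies share a rank, so they are evaluated together in their original order before any plain permit policy.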
Summary
• Learning Health System presenting new set of challenges for medical and informatics communities
• Provenance can help establish trust in the LHS
• Methods needed to verify trust
• Abstraction of provenance traces needed to address requirements of multiple stakeholders
  o Researchers
  o Regulators
  o Publishers
• Future work
  o Projects running on provenance of decision support and visual analytics for health data
  o Looking for partnerships to investigate applications of the security work
Acknowledgements
• Thanks to:
  o Roxana Danger
  o Paolo Missier
  o Jeremy Bryans
  o Derek Corrigan
  o Brendan Delaney