DOE Scientific Data Management Center – Scientific Process Automation
A Model for Sharing of Confidential Provenance Information in a Query Based System
Meiyappan Nagappan
Mladen A. Vouk
North Carolina State University
IPAW 2008
June 17th, 2008
Agenda
• Problem Motivation
• A Scenario: Sharing Provenance
• Research Objective
• Implementation Model
• Discussions
• Conclusion
• Future Work
Problem Motivation
• Provenance is increasingly being used as part of analyses to speed up the process, extend its scope beyond raw data, and enable handling of very large data sets.
• Attendant problem: sharing of provenance information
  • Keeping this information appropriately but selectively confidential/protected
• Confidentiality: “Ensuring that information is accessible only to those authorized to have access” – ISO/IEC 17799
Problem Motivation
• Unauthorized access to provenance could be used to
  • Reverse engineer a process
  • Compromise the privacy of the user
  • Etc.
• On the other hand, lack of sharing for the sake of confidentiality could hinder scientific discovery
• Frequent current solution: export and mail the data that is to be shared
  • Duplication of data: meta-data sets are large and growing
    • A typical simulation may generate ~1 GB of meta data
  • Cannot revoke access
Scenario: Sharing Provenance
[Figure: sharing scenario diagram with nodes A, B, C; R1, R2, R3; S11, S12, S21, S31, S32, S33]
Research Goal
The goal of the current work is to develop a model, in the context of provenance for scientific simulations, that
• Enables easy sharing of provenance data
• Allows for dynamic changes in the confidentiality levels to serve multiple and different users
• Does not compromise the confidentiality of the provenance data (including privacy)
Implementation Model - Architecture
[Figure: architecture diagram. Components: Super Computer running Simulations; Laptop running Kepler; Provenance Store; Web Interface to Query Provenance; Authorization Service; APIs labeled Record, Query, and MGMT.]
Sub Goals
The goal is to build a model that enables sharing provenance in an environment where the confidentiality level changes dynamically.
We attempt to achieve the goal through the following 5 objectives (sub-goals):
• Sub Goal 1: The person who generates simulation data is the owner of the original provenance data
• Sub Goal 2: Users cannot edit/delete; the administrator can, but must leave an audit trail
• Sub Goal 3: Owner can annotate their data
• Sub Goal 4: Owner can choose collaborators
• Sub Goal 5: Auditors have full read-only access
Sub Goal 1
What? The person who generates simulation data is the owner of the original provenance data
Why? Each dataset is clearly traced to one owner
What is the risk? Dispute over who has the authority to share the data in the first place
Implementation? 3-tiered approach: Client – Application Logic – Database
Sub Goal 2
What? Editing and audit trail
• No edits/deletes by owner, collaborator, or other users
• Administrator can edit, but must leave an audit trail
Why?
• Consistency of data (particularly shared data)
• Auditing
Risk? Each time, the collaborator may get different results
How?
• Restrict privileges at the DB level
• Log all super user actions
[Figure: Provenance Store, MGMT. API]
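The "How?" of Sub Goal 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses an in-memory SQLite database, enforces the restriction in application code rather than with real DB-level privileges, and every table, column, and function name (`provenance`, `audit_log`, `make_store`, `edit_record`) is a hypothetical choice made for the example.

```python
import sqlite3


def make_store():
    """Create an in-memory store (hypothetical schema, for illustration only)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE provenance (id INTEGER PRIMARY KEY, owner TEXT, payload TEXT)"
    )
    db.execute(
        "CREATE TABLE audit_log (id INTEGER PRIMARY KEY, actor TEXT, action TEXT, "
        "record_id INTEGER, old_payload TEXT, new_payload TEXT)"
    )
    return db


def edit_record(db, actor, role, record_id, new_payload):
    """Allow edits only for the 'admin' role and always log them."""
    if role != "admin":
        # Owners, collaborators, and other users may not edit or delete.
        raise PermissionError(f"{actor} ({role}) may not edit provenance records")
    row = db.execute(
        "SELECT payload FROM provenance WHERE id = ?", (record_id,)
    ).fetchone()
    if row is None:
        raise KeyError(record_id)
    # The edit and its audit entry are committed in one transaction,
    # so an edit can never exist without its trail.
    with db:
        db.execute(
            "UPDATE provenance SET payload = ? WHERE id = ?", (new_payload, record_id)
        )
        db.execute(
            "INSERT INTO audit_log (actor, action, record_id, old_payload, new_payload) "
            "VALUES (?, 'edit', ?, ?, ?)",
            (actor, record_id, row[0], new_payload),
        )
```

Writing the change and the log entry in the same transaction is one way to make sure a super user action cannot be recorded without its trail, or the reverse.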
Sub Goal 3
What? Data annotation
Why?
• User-specified meta data
• Collaborator may have a different interpretation
Risk?
• Loss of valuable meta data about provenance
• Cannot flag inaccurate data, and would therefore need delete privileges
How?
• Annotation field in all tables of the schema
• Through the web interface (WI), annotate provenance data and saved queries
[Figure: Provenance Store, WI, API, Query]
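A minimal sketch of the annotation idea, assuming each table carries an `annotation` column that the owner may set while the record body stays untouched. The table names (`provenance`, `saved_query`), the columns, and the `annotate` helper are hypothetical and only illustrate flagging a record instead of deleting it.

```python
import sqlite3

# Hypothetical tables that carry an annotation field.
ANNOTATABLE = ("provenance", "saved_query")


def make_db():
    """Create in-memory tables, each with an annotation column (assumed layout)."""
    db = sqlite3.connect(":memory:")
    for table in ANNOTATABLE:
        db.execute(
            f"CREATE TABLE {table} "
            "(id INTEGER PRIMARY KEY, owner TEXT, body TEXT, annotation TEXT)"
        )
    return db


def annotate(db, table, record_id, user, note):
    """Set the annotation on a record the user owns; the body is never changed."""
    if table not in ANNOTATABLE:
        raise ValueError(f"unknown table: {table}")
    with db:
        cur = db.execute(
            f"UPDATE {table} SET annotation = ? WHERE id = ? AND owner = ?",
            (note, record_id, user),
        )
    if cur.rowcount != 1:
        raise PermissionError("record not found or not owned by this user")
```

For example, an owner could mark a run as suspect with `annotate(db, "provenance", 1, "alice", "suspect: wrong input mesh")`, which keeps the original record available to auditors.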
Sub Goal 4
What? Data sharing with dynamically changing confidentiality levels
Why?
• To share data on a “What You See Is What You Want To Share” basis
• Each time, a different subset of the data
Risk?
• Share the entire data set or nothing
• Disk space wasted on saving a separate copy of each subset
How? Query sharing
Sub Goal 4 (contd.)
[Figure: sequence diagram among User, Authorization, API, and DB]
• Authenticate with username and password
• Request data; execute query; return data
• Save data for collaborator: save the query
• Annotate the query
• View queries saved for me by other collaborators
• View data saved in query for me by other collaborators
Query Table
• Query ID
• Saved by
• Saved for
• Query
• Timestamp
• Allow Cascading
• Revoke Active
Annot Table
• Query ID
• User ID
• Annotation
• Viewable
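The query-sharing flow above can be sketched roughly as below. This is an illustration under stated assumptions, not the system described in the talk: the `saved_query` columns loosely follow the Query Table fields, the single `active` flag is one possible reading of "Revoke Active", "Allow Cascading" and the Annot Table are left out, and all function names and the `provenance` layout are hypothetical.

```python
import sqlite3
import time


def make_db():
    """In-memory store with a provenance table and a saved-query table (assumed layout)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE provenance (id INTEGER PRIMARY KEY, owner TEXT, step TEXT, value TEXT)"
    )
    db.execute(
        "CREATE TABLE saved_query (query_id INTEGER PRIMARY KEY, saved_by TEXT, "
        "saved_for TEXT, query TEXT, timestamp REAL, active INTEGER DEFAULT 1)"
    )
    return db


def share_query(db, saved_by, saved_for, query):
    """Save a query (not its results) for a collaborator and return its ID."""
    # Simplistic guard for the sketch: only read queries may be shared.
    if not query.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT queries can be shared")
    with db:
        cur = db.execute(
            "INSERT INTO saved_query (saved_by, saved_for, query, timestamp) "
            "VALUES (?, ?, ?, ?)",
            (saved_by, saved_for, query, time.time()),
        )
    return cur.lastrowid


def revoke(db, saved_by, query_id):
    """Deactivate a shared query; only the user who saved it can revoke it."""
    with db:
        db.execute(
            "UPDATE saved_query SET active = 0 WHERE query_id = ? AND saved_by = ?",
            (query_id, saved_by),
        )


def run_shared(db, user, query_id):
    """Execute a query saved for this user, if it is still active."""
    row = db.execute(
        "SELECT query FROM saved_query "
        "WHERE query_id = ? AND saved_for = ? AND active = 1",
        (query_id, user),
    ).fetchone()
    if row is None:
        raise PermissionError("no active shared query for this user")
    return db.execute(row[0]).fetchall()
```

Because only the query text is stored, no copy of the subset is written to disk, and revoking access is a single flag update. The cost is that the query runs again on every access.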
Sub Goal 4 (contd.)
Why Query Sharing
• Dynamically decide what to share
• Size of the set of information to be shared is large
• Share a subset of information rather than individual records
Sub Goal 5
What? Data audit and verification
Why?
• Prevent tampering by malicious users
• Maintain accuracy
Risk?
• Collaborators may try to break the system
• Administrators may misuse super user privileges
How?
• Authorized and authenticated auditors
• Full read-only access to
  • Original data, provenance data, annotations
  • Edit trails and logs of super user actions
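One way to give auditors full read-only access can be sketched with SQLite's read-only URI mode. The choice of SQLite, the file-based store, and the `open_for_audit` helper are assumptions made for this example; the talk does not name a specific database, and authentication of the auditor is out of scope here.

```python
import sqlite3
from pathlib import Path


def open_for_audit(db_path):
    """Open the store read-only; any write attempt raises sqlite3.OperationalError."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

An auditor holding such a connection can read every table, including edit trails and super user logs, while statements such as `DELETE` or `UPDATE` fail at the database layer rather than relying on application checks.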
Issues
• The model is query centric
• Automatic run-time collection of provenance data is required
• Restricted to provenance data from scientific workflow systems
• Collaborator can annotate the shared subset only as a whole
• Does not address issues in long-term storage and scalability
Conclusion
• With the increasing emphasis on provenance data collection in scientific workflows, the issue of its confidentiality becomes more important
• Not much research has been done in this area of provenance
• This model addresses confidentiality in a collaborative environment
• Tradeoff: Disk Space : Time :: Query Sharing : Data Sharing
Future Work
• Validating our model against other solutions using different threat scenarios
• Responsibility for sharing data is with the user
  • Privacy of the user is at stake
  • Tools are required to foresee inferences from provenance data
• Large data sets
  • Provenance data and shared queries grow steadily in size
  • Accessing them will be difficult
  • Tools are required to improve the HCI aspect
Questions?
Related Work: References
[1] Hasan, R., Sion, R. and Winslett, M.: Introducing secure provenance: problems and challenges. Proceedings of the 2007 ACM Workshop on Storage Security and Survivability, ACM, Alexandria, Virginia, USA (2007), pp. 13-18.
[2] Griffiths, P.P. and Wade, B.W.: An authorization mechanism for a relational database system. ACM Transactions on Database Systems 1 (3) (Sep. 1976), pp. 242-255.
[3] Sandhu, R. and Samarati, P.: Authentication, access control, and audit. ACM Computing Surveys 28 (1) (Mar. 1996), pp. 241-243. DOI = http://doi.acm.org/10.1145/234313.234412
[4] Tan, V., Groth, P., Miles, S., Jiang, S., Munroe, S., Tsasakou, S. and Moreau, L.: Security Issues in a SOA-Based Provenance System. LNCS, Volume 4145 (Provenance and Annotation of Data), pp. 203-211. Springer Berlin / Heidelberg (2006).