The NORC Data Enclave for Sensitive Microdata
Timothy M. Mulcahy
Senior Research Scientist, NORC/University of Chicago
[email protected]
Overview
• Enclave Mission
• Data Protection
• Metadata Documentation
• Portfolio Approach
• Focus on Research Collaboration/Developing Metrics
• Current Status
• Summary
Enclave Mission
• To promote access to sensitive NIST microdata
  – Serves the mandate of TIP to "accelerate the development of high-risk, transformative research targeted to address key societal challenges."
  – NIST has a unique source of innovation data that researchers can use to study:
    • Entrepreneurship & innovation
    • Early-stage technology development
    • Commercialisation of high-risk R&D
• To protect confidentiality
  – Technical
  – Legal
  – Organizational
  – Statistical
• To archive, index and curate ATP microdata
What's in it for NIST?
• Researcher access to a database for examining entrepreneurship and firm behavior
• Development of a research community, including graduate students (and possibly undergraduates)
• High-quality research => more insight into the value added by the ATP/TIP program
  a) High-quality analysis leverages the federal investment
  b) Metadata documentation improves scientific quality
Ideal System
• Secure
• Flexible
• Low cost
• Meets the replication standard
  – The only way to understand and evaluate an empirical analysis fully is to know the exact process by which the data were generated
  – A replication dataset includes all information necessary to replicate the empirical results
  – Metadata are crucial to meeting the standard
    • Composed of documentation and structured metadata
    • Undocumented data are useless
• Creates a foundation for metadata documentation and extends the data lifecycle
Metadata & Survey Cycle
Data collection is not a static process; it is a lifecycle.
It evolves dynamically over time and involves many players.
It extends to aggregate data to reach decision makers.
Metadata are crucial to capturing this knowledge.
*Exhibit courtesy of Chuck Humphrey
NORC Data Enclave: Mechanics
1. Data protection
   a) NORC already collects data for multiple statistical agencies (BLS, Federal Reserve (IRS data), EIA, NSF/SRS, etc.) => safeguards in place
   b) NIST-approved IT security plan
2. Provision of access: a portfolio approach
   a) Statistical protection (Statistical)
   b) Researcher training (Educational)
   c) Dissemination to the researcher community (Operational)
   d) Agency-specific data protection requirements (Legal)
Statistical, Technical, Legal & Operational Controls
[Figure: Utility vs. Confidentiality]
Data Protection
The Data Enclave is fully compliant with the DOC IT Security Program Policy (Section 6.5.2), the Federal Information Security Management Act (FISMA), the provisions of mandatory Federal Information Processing Standards (FIPS), and all other applicable NIST IT system and physical security requirements.
IT Security
• Encrypted connection to the data enclave using virtual private network (VPN) technology. The VPN prevents an outsider from reading the data transmitted between the researcher's computer and NORC's network.
• Users access the data enclave only from specific, pre-defined IP addresses.
• Citrix Web-based technology
  – All applications and data run on the server at the data enclave.
  – The data enclave can prevent the user from transferring any data from the enclave to a local computer.
  – Data files cannot be downloaded from the remote server to the user's local PC.
  – Users cannot use the Windows "cut and paste" feature to move data out of the Citrix session.
  – Users are prevented from printing the data on a local computer.
• Audit logs and audit trails
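The IP restriction and audit trail can be illustrated with a short Python sketch. This is not NORC's implementation: the network ranges (taken from the reserved documentation address blocks), the user name and the JSON log format are hypothetical, chosen only to show how a pre-defined IP allowlist check and an audit entry for every login attempt might fit together.

```python
import ipaddress
import json
from datetime import datetime, timezone

# Hypothetical pre-defined addresses; real values would come from each
# researcher's access agreement, not from this sketch.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.10/32"),    # a single approved address
    ipaddress.ip_network("198.51.100.0/28"),  # an approved subnet (.0 to .15)
]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any pre-defined network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


def audit_record(user: str, client_ip: str, allowed: bool) -> str:
    """Build one audit-trail entry as a JSON line."""
    return json.dumps({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "ip": client_ip,
        "decision": "allow" if allowed else "deny",
    })


def check_login(user: str, client_ip: str, log: list) -> bool:
    """Decide on a login attempt and record it, whether allowed or denied."""
    allowed = is_allowed(client_ip)
    log.append(audit_record(user, client_ip, allowed))
    return allowed


if __name__ == "__main__":
    log = []
    print(check_login("researcher01", "198.51.100.7", log))  # True: inside the /28
    print(check_login("researcher01", "203.0.113.5", log))   # False: not pre-defined
    print("\n".join(log))
```

Logging denied attempts as well as successful ones is what gives auditors a trail of attempted access from unexpected locations.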
Provision of Access
[Figure: Sample Modalities. Menu options for Agency X (and Study Y): licensing (different levels of anonymization), onsite access, and remote access, with control options Educational (1-4), Operational (1-5), Statistical (1-5), and Legal (1-4).]
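One way to read the menu options for Agency X (and Study Y) is that each access modality combines one option from each control dimension. The Python sketch below encodes that reading. The option counts come from the slide, but the `Modality` class and the example choices are hypothetical, since the slide does not say what each numbered option contains.

```python
import math
from dataclasses import dataclass

# Number of menu options per control dimension, as listed on the slide.
OPTION_COUNTS = {"educational": 4, "operational": 5, "statistical": 5, "legal": 4}


@dataclass(frozen=True)
class Modality:
    """A hypothetical access modality: one chosen option per control dimension."""
    name: str
    educational: int
    operational: int
    statistical: int
    legal: int

    def __post_init__(self) -> None:
        # Reject option numbers that are not on the menu.
        for dim, count in OPTION_COUNTS.items():
            choice = getattr(self, dim)
            if not 1 <= choice <= count:
                raise ValueError(f"{dim} option {choice} is outside 1..{count}")


def total_combinations() -> int:
    """Number of distinct control bundles the menu allows for one modality."""
    return math.prod(OPTION_COUNTS.values())


if __name__ == "__main__":
    remote = Modality("Remote access", educational=2, operational=3, statistical=1, legal=2)
    print(remote)
    print(total_combinations())  # 4 * 5 * 5 * 4 = 400
```

If every combination were permitted, the menu would allow 400 distinct control bundles per modality, which illustrates how a portfolio approach can be customized to each agency's requirements.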
Provision of Research Access
Two approaches:
• Remote access
  – External researchers access data via an encrypted VPN connection to the data enclave
  – RSA smart card
  – User access restricted to specific, pre-defined IP addresses
  – Citrix technology for accessing applications, configured so that no downloading, cutting and pasting, or printing is possible
• Onsite access
  – Secure room at NORC sites (Bethesda, MD & Chicago, IL)
  – Secure machines
  – Video camera
  – Audit logs and trails
  – Workspaces
Legal and Statistical Protections
• Legal
  – Access agreement signed by the institution and by the individual researcher
  – Approved institutions only
  – Access limited to the data requested and authorized
• Statistical
  – Remove obvious identifiers and replace them with unique identifiers
  – Statistical techniques chosen by the agency (recognising data quality issues)
Note: Both are at the discretion of the agency, which can go above and beyond the minimum level of protection.
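The slide does not say how obvious identifiers are replaced. A common technique is keyed hashing (HMAC), shown below as a Python sketch under that assumption; the identifier field names, the 16-character ID length and the key handling are hypothetical. Because the key stays with the enclave, researchers cannot feasibly recompute an ID from a known firm identifier, while the same firm still receives the same ID across files.

```python
import hashlib
import hmac

# Hypothetical direct identifiers; the real list is chosen by the agency.
DIRECT_IDENTIFIERS = {"firm_name", "address", "ein"}


def pseudonym(value: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) truncated to 16 hex characters.

    The same value and key always yield the same ID, so records for one
    firm stay linkable; without the key the ID cannot feasibly be recomputed.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def deidentify(record: dict, id_field: str, key: bytes) -> dict:
    """Drop obvious identifiers and add a pseudonymous unique ID ("uid")."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["uid"] = pseudonym(str(record[id_field]), key)
    return cleaned


if __name__ == "__main__":
    key = b"placeholder-key"  # a real key would be random, secret and held by the enclave
    row = {"firm_name": "Example Corp", "ein": "00-0000000", "employees": 40}
    print(deidentify(row, "ein", key))
```

This covers only the first statistical bullet; the disclosure-limitation techniques chosen by the agency would be applied as a separate step.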
Researcher Training
Subjects
  – Basic confidentiality
  – Agency-specific (joint with agency)
  – Dataset-specific (joint with agency)
Locations
  – Onsite
  – Web-based
  – Researcher locations (AAEA, JSM, AOM, ASA, ASSA, NBER Summer Institute)
Note: The training is designed to go above and beyond current practice in terms of both frequency and coverage.
Data Enclave Training Agenda
NORC – University of Chicago, 4350 East West Highway, Suite 800, Bethesda, MD 20814

Day 1
8:30-9:00     Welcome (NASS/ERS/NORC)
9:00-10:30    Data enclave navigation (NORC)
10:30-10:45   Break
10:45-12:15   Metadata documentation (NORC)
12:15-1:15    Lunch
1:15-2:45     Confidentiality and data disclosure (NORC)
2:45-3:00     Break
3:00-4:00     ARMS survey overview (ERS staff)
4:00-4:10     Confidentiality agreement signing

Day 2
8:30-9:00     Data files and documentation (Data Producer)
9:00-10:00    Sampling and weights (Data Producer)
10:00-10:15   Break
10:15-11:15   Item quality control and treatments for non-response (Data Producer)
11:15-12:15   Statistical testing (Data Producer)
12:15-12:30   Closing and adjournment
Researcher Responsibilities
• Serve agency mission
• Metadata documentation
  – Code
  – Information about variables
• Post research output
• Cite sources
• Evaluation and feedback
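Documenting code and variables is what lets others replicate a result. As a hypothetical illustration, the Python sketch below records a researcher-created variable together with its source variables and the exact code that derives it, and serializes the record to JSON so it could be shared alongside research output. The field names and the example variable are assumptions, not an enclave or DDI schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class VariableDoc:
    """Hypothetical metadata record for a researcher-created variable."""
    name: str
    label: str
    source_variables: List[str]
    code: str      # the exact statement that derives the variable
    author: str
    notes: str = ""

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "VariableDoc":
        return cls(**json.loads(text))


if __name__ == "__main__":
    doc = VariableDoc(
        name="rd_intensity",
        label="R&D spending as a share of revenue",
        source_variables=["rd_spend", "revenue"],
        code="rd_intensity = rd_spend / revenue",
        author="researcher01",
        notes="Missing when revenue is zero.",
    )
    print(doc.to_json())
```

Keeping the derivation code in the same record as the variable description means the documentation cannot drift away from what was actually computed.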
Developing a Virtual Collaboratory
• Value added
  – Serve agency mission
  – Metadata documentation
    • Code
    • Information about variables
• Policy relevance
  – Research output
    • Cite sources
    • Evaluation and feedback
Logging On
The browser downloads the .ica file and launches the Citrix client.
Enclave Level Portal
[Figure: portal layout showing the site menu and the content display area]
Enclave Level Features
• Informs users about enclave updates, events, publications, new features, etc.
• Guidelines and technical assistance for new users
• Calendar of events such as conferences, data releases, trainings, etc.
• Background information on the data enclave
• Overview and catalog of surveys available in the enclave
• General information on clients or survey series
• Access to enclave documentation and public survey documents (reports, questionnaires; no data). This information consists of files organized in folders and can also be searched by category.
• A wiki-based knowledge area maintained by the enclave managers, providing FAQs, technical information, tips & tricks, etc.
• An issue tracking system for users to request technical assistance from the enclave staff or to report issues with the survey data
• Collaborative features reserved for data enclave managers (not visible to regular users)
Group Level Features
Summary
• Goal: To promote access to sensitive ATP microdata while protecting confidentiality
• Benefits:
  – Secure, low-cost approach to leveraging ATP's investment in data collection
  – Archiving, indexing, and curation of ATP microdata
  – Applicable and customizable to agency needs and requirements
Next Steps
• Developing metrics
  – Number of interactions
  – Additions to the wiki, code, combined variables, macros
  – Research output (how to quantify)
• Developing incentives
  – Establish leaders
  – External communications
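The first two metrics could be computed from an activity log. The Python sketch below assumes a hypothetical log of `(user, event_type)` pairs and hypothetical event names; it deliberately leaves research output out, since the slide lists how to quantify it as an open question.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical event types counted as "additions" to the collaboratory.
ADDITION_TYPES = ("wiki_edit", "code", "combined_variable", "macro")


def collaboratory_metrics(events: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Summarize interactions and additions from (user, event_type) pairs."""
    events = list(events)  # accept any iterable, including generators
    kinds = Counter(kind for _, kind in events)
    return {
        "interactions": len(events),  # every logged event counts as one interaction
        "active_users": len({user for user, _ in events}),
        "additions": {k: kinds[k] for k in ADDITION_TYPES},  # missing types count as 0
    }


if __name__ == "__main__":
    log = [
        ("alice", "login"),
        ("alice", "wiki_edit"),
        ("bob", "code"),
        ("bob", "macro"),
        ("bob", "login"),
    ]
    print(collaboratory_metrics(log))
```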
Contact InformationContact Information• Timothy M. MulcahyTimothy M. Mulcahy• [email protected]@norc.uchicago.edu• WebsiteWebsite
– http://dataenclave.norc.orghttp://dataenclave.norc.org