CCEGA Informatics Hemminger CCEGA Informatics Working Group Bradley Hemminger School of Information and Library Science Supported in part by NIH Grant 5P20RR020751-02


CCEGA Informatics Hemminger

CCEGA Informatics Working Group

Bradley Hemminger

School of Information and Library Science

Supported in part by NIH Grant 5P20RR020751-02


Participants

• Roger Akers, Shepp Center
• Peter DeSaix, Epidemiology
• Xiaojun Guan, RENCI
• Kevin Gamiel, RENCI
• Barrie Hayes, Health Sciences Library
• Brad Hemminger (chair), School of Information & Library Science
• Clark Jeffries, RENCI
• Joel Kingsolver, Biology
• Lavanya Ramakrishnan, RENCI
• David Threadgill, Genetics
• Kirk Wilhelmsen, Genetics
• Dong Xiang, Lineberger Cancer Center


Aims

• Universal data model sharable by everyone.

• Standardized, independent methods, so location can be anywhere.

• Practical. Adoptable by many disparate groups for both new and legacy systems.

• Utilize existing domain standards, controlled vocabularies and ontologies (e.g. GO, MIAME, caBIG, …)

• Data repository should be safe and secure, with only controlled and accountable access by appropriate qualified entities.


Areas of Focus

• Development of common data model

• Determine ways the common data model can be implemented as a common shared digital repository that allows for the ingest of digital content from many varied sources (both existing projects and new projects), and controlled access by appropriate people and automated agents.


Areas of Focus cont’d

• Address practical issues of how such a repository could be used by different groups with different needs in different contexts. Demonstrate the advantages of using the repository, to encourage groups to adopt it.

• Define security and privacy issues for the repository, and propose and implement methods to address them.

• Preservation and curation.


Overview

• Status quo (difficulties summarized in Kirk’s talk).

• Diagram and brief explanation of planned architecture.

• How labs, clinics, and analysis would interact with repository.


Issues: Lab and Clinic to Analysis

• Independent data management
  – Data security
  – Version control
  – Redundancy
  – Controlled access

[Diagram labels: Clinical, Laboratory, Analysis, ELSI]


CCEGA Model

[Diagram labels: Lab, Clinic, Analysis, ELSI, Integration & Informatics]

We want the integration of the data operations across the labs, clinics, and analysis.


[Architecture diagram; labels: Lab (shown four times), mapping (Ingest), mapping (Output), Repository, Data Store, Analysis Methods, Association Table, Permissions]
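
The "mapping" and "Ingest" labels in this diagram suggest that each lab's own record layout is translated into the common data model on the way into the repository. A minimal sketch of that idea in Python follows. Everything named here is an assumption made for illustration: the `CommonRecord` fields, the lab names, and the per-lab field mappings are hypothetical and are not taken from the working group's model.

```python
from dataclasses import dataclass


# Hypothetical common-model record; the field names are illustrative only.
@dataclass(frozen=True)
class CommonRecord:
    sample_key: str
    assay: str
    value: float
    source_lab: str


# Hypothetical per-lab mappings: lab-specific field name -> common field name.
LAB_MAPPINGS = {
    "lab_a": {"specimen": "sample_key", "test": "assay", "result": "value"},
    "lab_b": {"id": "sample_key", "assay_name": "assay", "reading": "value"},
}


def ingest(lab: str, row: dict) -> CommonRecord:
    """Translate one lab-specific row into the common record layout."""
    mapping = LAB_MAPPINGS[lab]
    # Rename each lab-specific field to its common-model name.
    fields = {common: row[local] for local, common in mapping.items()}
    return CommonRecord(
        sample_key=str(fields["sample_key"]),
        assay=str(fields["assay"]),
        value=float(fields["value"]),
        source_lab=lab,
    )


if __name__ == "__main__":
    print(ingest("lab_a", {"specimen": "K1", "test": "genotype", "result": "1.5"}))
    print(ingest("lab_b", {"id": "K1", "assay_name": "genotype", "reading": 1.5}))
```

Keeping the mapping as data rather than code is one way a legacy lab system could be added without changing the repository itself; the output side could use the same tables in reverse.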


Timeline

• First intramural workshop (spring 2005)
• Weekly meetings (beginning spring 2005)
  – Development of draft common model based on wealth of experience in local labs, and existing standards
  – Analysis of data requirements, and existing infrastructure at UNC. Internal interviews with labs
• Second intramural workshop (summer/fall 2005)
  – Present draft common model for review and feedback by UNC community


Timeline continued

• Extramural workshop (winter 2005)
  – Bring community of experts to UNC for discussions.
  – Learn in more detail about related work outside of UNC.
  – Present our draft model to get feedback and criticism.
• Refine model
• Implement and test model using data from the three main projects identified in this grant.
• Think about and plan for how this model spreads. How to promote its use by groups with existing infrastructure as well as by new groups.


Common Data Model

• Survey schema/models in use by labs
• Develop set of general requirements
• Get ELSI and HIPAA requirements
• Develop generalized model capable of meeting needs
• Test model with data collection and analysis programs for alcoholism and addiction, breast cancer, and epidemiology studies that are part of the grant.


Initial Examples

• Epidemiology Specimen Collection and Tracking System (Roger)

• Alcoholism and Addiction Study (Kirk)

• Proteomics Core Facility General Model (Brad)


Security

• Security will be designed into the CCEGA model and implemented in the repository to protect information while still allowing researchers timely access to data.

• Data will be protected via trusted broker methodology.

• Information is made anonymous by use of randomly chosen keys assigned by the trusted broker. The assignment is made at the clinic-database interface.

• The coded key will be used to identify experimental data, while providing linkage to the source organism's private information in a secure association table.
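
The trusted broker scheme described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TrustedBroker` class, the `deidentify` helper, and the `subject_id` and `sample_key` field names are hypothetical, and the in-memory dictionaries stand in for what would need to be a separately secured association table.

```python
import secrets


class TrustedBroker:
    """Assigns random keys and holds the private association table (sketch)."""

    def __init__(self):
        self._key_by_subject = {}  # private identifier -> coded key
        self._subject_by_key = {}  # coded key -> private identifier

    def assign_key(self, subject_id: str) -> str:
        """Return the coded key for a subject, creating one on first use."""
        if subject_id in self._key_by_subject:
            return self._key_by_subject[subject_id]
        key = secrets.token_hex(16)  # random, not derived from the identifier
        while key in self._subject_by_key:  # collision is very unlikely; guard anyway
            key = secrets.token_hex(16)
        self._key_by_subject[subject_id] = key
        self._subject_by_key[key] = subject_id
        return key

    def resolve(self, key: str) -> str:
        """Look up the private identifier; only the broker can do this."""
        return self._subject_by_key[key]


def deidentify(broker: TrustedBroker, clinic_record: dict) -> dict:
    """Swap the identifying field for the coded key at the clinic-database interface."""
    coded = {k: v for k, v in clinic_record.items() if k != "subject_id"}
    coded["sample_key"] = broker.assign_key(clinic_record["subject_id"])
    return coded


if __name__ == "__main__":
    broker = TrustedBroker()
    print(deidentify(broker, {"subject_id": "MRN-001", "phenotype": "case"}))
```

Because the key is drawn at random rather than computed from the identifier, experimental data carrying only the key cannot be linked back to the source without the broker's table.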


Accountability

• Access permissions will determine which entities are allowed access to which data.

• All access to data is tracked via logs.

• “Audit-readiness” will be maintained to respond quickly to an outside investigation and challenge with the goal of quick clearance.

• Regular or random internal security audits will be included in a management strategy. Documents used in audits include 24/7 logs, flowcharts of procedures, training documents, incident reports, etc.
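
The first two points on this slide, permission checks and logged access, might be combined as in the Python sketch below. The permission table, entity names, dataset names, and log fields are all hypothetical, and the in-memory list is a stand-in for a persistent, tamper-resistant log.

```python
from datetime import datetime, timezone

# Hypothetical permission table: entity -> datasets it may read.
PERMISSIONS = {
    "analyst_1": {"genotypes"},
    "agent_qc": {"genotypes", "specimens"},
}

# In-memory access log for this sketch; a real log would be persisted.
ACCESS_LOG = []


def read_dataset(entity: str, dataset: str, store: dict):
    """Check permission, record the attempt, then return the data if allowed."""
    allowed = dataset in PERMISSIONS.get(entity, set())
    # Log every attempt, including denied ones, so audits see the full picture.
    ACCESS_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "entity": entity,
        "dataset": dataset,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{entity} may not read {dataset}")
    return store[dataset]


if __name__ == "__main__":
    store = {"genotypes": [1, 2], "specimens": ["s1"]}
    print(read_dataset("analyst_1", "genotypes", store))
    try:
        read_dataset("analyst_1", "specimens", store)
    except PermissionError as err:
        print("denied:", err)
    print(len(ACCESS_LOG), "log entries")
```

Writing the log entry before the permission decision is enforced means denied requests are recorded too, which is the kind of trail an internal or outside audit would likely want.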


Future (P50) Goals

• Comprehensive survey and publication of different schemas, architectures, controlled vocabularies/ontologies used by different groups. Comparison of similarities and differences.

• Digital content preservation planning.

• Study of what factors determine how well such models are adopted in this environment.

• Make publicly available the developed resources (data model, digital repository content, database structure/schema).


End