Pollock 3/30/11
The Biostatistician’s Role in Managing Clinical Translational Research Data
Brad Pollock, MPH, PhD
Chairman, Department of Epidemiology and Biostatistics
University of Texas Health Science Center at San Antonio
Main Campus
Biostatistics and Informatics Core, Cancer Therapy & Research Center
Biostatistics and Research Design Core, Institute for the Integration of Medicine and Science (CTSA)
Biomedical Informatics Core, Institute for the Integration of Medicine and Science (CTSA)
Children’s Oncology Group Community Clinical Oncology Program (CCOP) Research Base
• Cooperative group statistician for the Pediatric Oncology Group and its successor, the Children’s Oncology Group
• Cancer center biostatistics core director
• GCRC and CTSA biostatistics and informatics core director
• Biostatistics cores: computational infrastructure and data management
• Data quality
• Discipline roles and responsibilities
• Projects
• Trends
Biostatistical Support Units
Types of Biostatistical Support Units
• CTSA Biostatistics, Epidemiology, Research Design (BERD) units
• Cancer Center Support Grants (P30)
• Data Coordinating Centers
• Statistical and Data Centers for NCI cooperative groups
Academic Homes for Biostatistical Support Units
• Units can be based in:
– Divisions
– Departments
– Schools/colleges
– Centers/institutes
– Administrative units of universities
– External coordinating centers
Biostatistics Core Functions
• Design studies
– Clarify hypotheses and objectives
– Define endpoints
– Select study/experimental design
– Sample size/power calculations
– Develop analytic plans
• Monitor studies
– Efficacy/futility
– Safety
• Analyze studies
– Statistical analysis
– Writing reports/manuscripts
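The design and monitoring functions above rest on standard formulas. As a minimal sketch (not a method presented in this talk), the code below computes a per-group sample size for a two-sided, two-sample comparison of means with the usual normal approximation, n = 2(z_{1-α/2} + z_{1-β})² σ² / δ², and the cumulative alpha spent at an interim look under the Lan-DeMets O'Brien-Fleming-type spending function, one common choice for efficacy boundaries. The function names and the example effect size are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-sided, two-sample comparison of means.

    Normal approximation with equal allocation and a common standard deviation.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided critical value
    z_beta = z(power)           # quantile corresponding to the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)         # round up to whole participants


def obf_alpha_spent(t: float, alpha: float = 0.05) -> float:
    """Cumulative two-sided alpha spent at information fraction t (0 < t <= 1).

    Lan-DeMets O'Brien-Fleming-type spending: 2 * (1 - Phi(z_{1-alpha/2} / sqrt(t))).
    """
    if not 0 < t <= 1:
        raise ValueError("information fraction must be in (0, 1]")
    nd = NormalDist()
    return 2 * (1 - nd.cdf(nd.inv_cdf(1 - alpha / 2) / math.sqrt(t)))


if __name__ == "__main__":
    print(n_per_group(delta=0.5, sigma=1.0))  # standardized effect of 0.5 SD
    for t in (0.25, 0.5, 0.75, 1.0):
        print(f"t={t:.2f}  cumulative alpha={obf_alpha_spent(t):.5f}")
```

With δ = 0.5 SD, α = 0.05 and 80% power this gives 63 per group. The spending function releases very little alpha early (about 0.006 by the halfway look), which is why stopping early for efficacy demands strong evidence.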
Computation
Who should define, manage, and oversee clinical translational research data operations?
Premise
• With some exceptions, computation in biostatistics has been heavily focused on analysis
• With the CTSAs, managing data for clinical translational research may be shifting toward the biomedical informatics discipline
WIKIPEDIA
• Biostatistics: …biostatistics encompasses the design of biological experiments, especially in medicine and agriculture; the collection, summarization, and analysis of data from those experiments; and the interpretation of, and inference from, the results.
• Biomedical Informatics: …at the intersection of information science, computer science, and health care. It deals with the resources, devices, and methods required to optimize the acquisition, storage, retrieval, and use of information in health and biomedicine. Health informatics tools include not only computers but also clinical guidelines, formal medical terminologies, and information and communication systems.
Biomedical Informatics
Focus areas:
– Ontologies
– Vocabulary/terminology
– Data models
– Human-machine interface
– Natural language processing
– Electronic health records
– Data repositories
Who should define, manage, and oversee clinical translational research data operations?
It depends…
Answer:
• No-brainer if you are a:
– NCI Cooperative Group statistician
– Director of a NIH-funded Data Coordinating Center (DCC)
– Director of a structured biostatistics core: e.g., Centers for AIDS Research (CFAR), Alzheimer’s Disease Core Centers, etc.
• Often a requirement of the RFA
Answer:
• Less clear if you are a:
– Director of a CTSA BERD unit
– Director of a CCSG P30 Biostatistics Core
– Director of an institutional biostatistics support unit with a separate informatics group:
• Clinical informatics
• Bioinformatics
– National Children’s Study center
DATA MANAGEMENT
What is Data Management?
• The development, execution and supervision of plans, policies, programs and practices that control, protect, deliver, and enhance the value of data and information assets*
*Data Management Association, Data Management Body of Knowledge (DAMA-DMBOK), 2008
Who’s Involved in Data Management
• Subjects / Participants / Patients
• Investigators
• Clinicians / Research Staff / Clinical Staff
• Statisticians / Epidemiologists / Analytic Staff
• Central IT (CIO, ISO, SNO)
• Research IT (Analysts, Programmers, DBAs)
End-to-End Process
Data Management within the Research Process
(figure: Protocol Development, the Data Management Process, and Final Statistical Analysis, with IT involvement)
Data Management Changing Within the Research Process
(figure: Protocol Development, the Data Management Process, and Final Statistical Analysis)
• Data management considerations are beginning to influence the science
• Storage and long-term utilization affect the data long after the protocol’s final analysis
Data Management Responsibilities
• Maintain a functional, flexible, scalable, cost-efficient resource to handle a variety of data:
– Demographic
– Clinical/laboratory
– Bioinformatics
– Environmental
• Data quality and compliance with regulatory requirements:
– HIPAA
– 21 CFR Part 11
– FISMA
• Planning for:
– Long time horizons (e.g., NCS)
– Interoperability and federation (e.g., caTissue Suite, caGRID, OpenMDR)
Database Management Functions
• Database design
– Data elements
– Relationships (data model)
– Access control/security/integrity
• Application development
– Data capture
– Data curation
– Querying
– Reporting
– Audit
• Database operation
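To make the design functions concrete, here is a minimal sketch using Python's built-in sqlite3 module and a hypothetical two-table schema (participant, lab_result). It shows data elements guarded by a CHECK constraint, a foreign key expressing the relationship, and a trigger that writes an audit row whenever a lab value changes. It is an illustration only: an audit trail meeting 21 CFR Part 11 would also need to record who made each change and why, plus real access control, none of which this sketch provides.

```python
import sqlite3

# Hypothetical schema: table names, columns, and ranges are illustrative only.
SCHEMA = """
CREATE TABLE participant (
    participant_id INTEGER PRIMARY KEY,
    study_id       TEXT NOT NULL,
    birth_year     INTEGER CHECK (birth_year BETWEEN 1900 AND 2100)  -- integrity rule
);

CREATE TABLE lab_result (
    result_id      INTEGER PRIMARY KEY,
    participant_id INTEGER NOT NULL REFERENCES participant (participant_id),  -- relationship
    analyte        TEXT NOT NULL,
    value          REAL,
    unit           TEXT NOT NULL
);

-- Audit table: old and new value for every change to lab_result.value
CREATE TABLE lab_result_audit (
    audit_id   INTEGER PRIMARY KEY,
    result_id  INTEGER NOT NULL,
    old_value  REAL,
    new_value  REAL,
    changed_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE TRIGGER lab_result_value_audit
AFTER UPDATE OF value ON lab_result
BEGIN
    INSERT INTO lab_result_audit (result_id, old_value, new_value)
    VALUES (OLD.result_id, OLD.value, NEW.value);
END;
"""


def build_database(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = build_database()
    conn.execute("INSERT INTO participant VALUES (1, 'STUDY-01', 1975)")
    conn.execute("INSERT INTO lab_result VALUES (10, 1, 'glucose', 95.0, 'mg/dL')")
    conn.execute("UPDATE lab_result SET value = 105.0 WHERE result_id = 10")
    print(conn.execute("SELECT result_id, old_value, new_value FROM lab_result_audit").fetchall())
    try:
        conn.execute("INSERT INTO lab_result VALUES (11, 99, 'glucose', 90.0, 'mg/dL')")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)  # there is no participant 99
```

Putting integrity rules in the database itself, rather than only in the data-entry screens, means every application that touches the data (web forms, batch loads, curation scripts) is held to the same rules.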
How Data Are Handled
• Paper forms (CRFs) and keypunch
• Client-server DBMS and networked DBMS
• Web front-end DBMS
– Pediatric Oncology Group replaced paper in 1998
• Web front-end
• Oracle back-end
• Clinical Trials Management System (CTMS)
Advancing Technology
Clinical Trials Management Systems
IMPACT® CTMS
• Maintain and manage: planning, preparation, performance, and reporting of clinical trials
• Up-to-date contact information for participants
• Tracking deadlines and milestones
• Regulatory approval
• Progress reports
IDEAS
DATA QUALITY
Criteria for Reproducible Epidemiologic Research*
Research Component | Requirement
Data | Analytical data set is available.
Methods | Computer code underlying figures, tables, and other principal results is made available in a human-readable form. In addition, the software environment necessary to execute that code is available.
Documentation | Adequate documentation of the computer code, software environment, and analytical data set is available to enable others to repeat the analyses and to conduct other similar ones.
Distribution | Standard methods of distribution are used for others to access the software, data, and documentation.
*from Peng, Dominici, Zeger. Am J Epidemiol 2006;163:783–789
Little emphasis on how we get to this point!
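One small step toward the Data and Methods criteria is to record exactly which analytic data set and code produced a result, together with the software environment. The sketch below uses hypothetical helper names (it is not part of Peng et al.'s proposal): it fingerprints files with SHA-256 and records the Python version and platform in a manifest that can be distributed alongside the results.

```python
import hashlib
import json
import platform
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_files, code_files) -> dict:
    """Record data and code fingerprints plus the software environment."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "data": {str(p): file_sha256(Path(p)) for p in data_files},
        "code": {str(p): file_sha256(Path(p)) for p in code_files},
    }


if __name__ == "__main__":
    # Hypothetical usage: pass the analytic data set and the analysis scripts.
    print(json.dumps(build_manifest(data_files=[], code_files=[]), indent=2))
```

If a later re-analysis produces a different checksum for the "same" data set, the discrepancy is caught before anyone compares results.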
Endgame
• Our goal is to do meaningful analyses to address study hypotheses
• Ethical analysis requires quality data
– Gelfond et al., “Principles for the Ethical Analysis of Clinical and Translational Research” (resubmitted to Statistics in Medicine)
Information vs. Analytical Quality
The features that make information useful are directly related to the features that make statistical analyses useful:
1. Statistical analyses should preserve the good qualities of the data.
2. The value of statistical analysis depends heavily on information quality.
(figure: Information Quality and Statistical Quality)
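Information quality is usually protected by automated edit checks at data capture and curation. The following minimal sketch shows three common check types: completeness, plausibility range, and cross-field consistency. The record fields and the 60 to 260 mmHg systolic range are hypothetical placeholders, not validated clinical limits.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Record:
    participant_id: str
    birth_date: Optional[date]
    enrollment_date: Optional[date]
    systolic_bp: Optional[float]  # mmHg


SBP_RANGE = (60.0, 260.0)  # hypothetical plausibility limits, not clinical guidance


def check_record(r: Record) -> List[str]:
    """Return a list of data-quality problems found in one record."""
    problems: List[str] = []
    # Completeness: required fields must be present
    if r.birth_date is None:
        problems.append("missing birth_date")
    if r.enrollment_date is None:
        problems.append("missing enrollment_date")
    # Plausibility: value must fall inside the accepted range
    low, high = SBP_RANGE
    if r.systolic_bp is not None and not (low <= r.systolic_bp <= high):
        problems.append(f"systolic_bp out of range: {r.systolic_bp}")
    # Consistency: enrollment cannot precede birth
    if r.birth_date is not None and r.enrollment_date is not None and r.enrollment_date < r.birth_date:
        problems.append("enrollment_date precedes birth_date")
    return problems


if __name__ == "__main__":
    print(check_record(Record("P002", date(2000, 1, 1), date(1999, 1, 1), 400.0)))
```

Catching these problems when data are entered is far cheaper than discovering them during the final analysis, which is the practical meaning of statistical quality depending on information quality.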
DISCIPLINE ROLES AND RESPONSIBILITIES
Biostatistics Cores
Computational Disciplines for Clinical Translational Research
(figure: Research IT / Computer Science, Biomedical Informatics, and the Clinical Translational Research Enterprise)
Computational and Biostatistics Disciplines for Clinical Translational Research
(figure: Research IT / Computer Science, Biomedical Informatics, Biostatistics, and the Clinical Translational Research Enterprise)
University of Texas Health Science Center at San Antonio
Informatics Data Exchange and Acquisition System (IDEAS)
• Began database development in 2001 for the San Antonio Cancer Institute (P30) using the general design approach of the POG Data System
• Extended to support the GCRC in 2002
• Adapted to caBIG requirements in 2007
caBIG
• NCI’s cancer Biomedical Informatics Grid launched in 2004
• Goals
– Connect scientists and practitioners through a shareable and interoperable infrastructure
– Develop standard rules and a common language to more easily share information
– Build or adapt tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care.
Informatics Data Exchange and Acquisition System (IDEAS) (continued)
• Extended to support the Institute for the Integration of Medicine and Science (CTSA) in 2008
– Single Point of Contact portal
• Practice-Based Research Network (PBRN) support added in 2010
IDEAS Design Philosophy
• Open-development: Tools and infrastructure developed through an open, participatory process.
• Open-access: Resources are freely obtainable…to ensure broad data-sharing and collaboration.
• Open-source: Source code is available to view, alter, and redistribute.
• Federated: Software and resources are widely distributed, interlinked, and available.
Complexity Encapsulation
• Object-based templates
• Common business objects
• Custom object libraries
• Standard interfaces
• User Interface: web programmers
• Business Rules: domain experts and informatics analysts
• Data: DBAs
Informatics Data Exchange and Acquisition System
The IDEAS Framework: an interwoven structure of interdependent components
(figure: IDEAS three-tier MVC framework)
• Security application
• Data collection database
– Web
– Interface
– Batch
• Pathology & Genetics
• Security
• Protocols
• Patient
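A three-tier design separates storage, domain rules, and presentation so that DBAs, domain experts, and web programmers can each work on their own layer. The sketch below illustrates that separation in plain Python. It is not IDEAS code; the class names, the in-memory dictionary standing in for the database, and the age-based eligibility rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Patient:
    patient_id: str
    age: int


class PatientRepository:
    """Data tier: storage access only (a dict stands in for the DBMS)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Patient] = {}

    def save(self, patient: Patient) -> None:
        self._rows[patient.patient_id] = patient

    def get(self, patient_id: str) -> Patient:
        return self._rows[patient_id]


class EligibilityRules:
    """Business-rule tier: domain logic with no storage or display code."""

    def __init__(self, min_age: int, max_age: int) -> None:
        self.min_age = min_age
        self.max_age = max_age

    def is_eligible(self, patient: Patient) -> bool:
        return self.min_age <= patient.age <= self.max_age


def render_eligibility(repo: PatientRepository, rules: EligibilityRules, patient_id: str) -> str:
    """Presentation tier: formats a result and delegates everything else."""
    patient = repo.get(patient_id)
    status = "eligible" if rules.is_eligible(patient) else "not eligible"
    return f"Patient {patient.patient_id}: {status}"


if __name__ == "__main__":
    repo = PatientRepository()
    repo.save(Patient("P001", 12))
    repo.save(Patient("P002", 34))
    rules = EligibilityRules(min_age=0, max_age=21)  # hypothetical pediatric age limit
    print(render_eligibility(repo, rules, "P001"))  # Patient P001: eligible
    print(render_eligibility(repo, rules, "P002"))  # Patient P002: not eligible
```

Because the rules object knows nothing about storage, a domain expert's change to the age limit touches neither the database nor the web layer.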
IDEAS Interoperable Components
• IDEAS custom metadata generator
• Shibboleth: Federated Single Sign-On Authentication Service
• caTissue Suite
• Patient Study Calendar (PSC)
• Qualtrics
IDEAS and the IIMS-Affiliated Practice-Based Research Networks (PBRNs)
1. StarNet (family practice) PBRN
2. Psychiatry PBRN
3. Dental PBRN
4. VA PBRN
Genetics and Biology of Liver Tumorigenesis in Children
• Bioinformatics and Biostatistics Core (BIBSC)
• Bring together disparate data:
– Pediatric Oncology Group, Children’s Cancer Group, Children’s Oncology Group, the Cooperative Human Tissue Network (CHTN), Baylor pathology reference lab
– Bioinformatics data from a range of high-throughput platforms: Illumina, Affy, NextGen Sequencing, etc.
– Demographic and clinical information
Human Studies Database Project
The Human Studies Database (HSDB) Project
• Premise:
– Study results and design information should be made computable for large-scale data mining, synthesis, re-analysis, and reuse
• HSDB: A CTSA multi-institutional project to federate study design descriptors and results of the human research portfolio over a grid-based architecture.
HSDB Use Cases
• Inform the design of new studies
• Facilitate systematic reviews/meta-analyses
• Identify potential collaborators by:
• Disease, population, bio-specimens available, analytic method of interest, etc.
• Aid in research management:
– Portfolio management (inventory of studies by design type)
– Comparison of human research portfolios across institutions
– Subject recruitment and community engagement
Ontology of Clinical Research (OCRe)
• HSDB is being developed using the Ontology of Clinical Research (OCRe) and common clinical vocabularies to standardize the storage of information
• Focus on: study design (Study Design Classifier), interventions, exposures, and analytic methods of individual-human studies
Any design type, for any intent, in any clinical domain
Federation across CTSAs
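HSDB relies on OCRe's Study Design Classifier to label studies consistently. OCRe's actual classifier is considerably richer than this. The function below is only a simplified, textbook-style decision tree (investigator-assigned exposure, randomization, sampling on outcome, follow-up over time) showing how design questions can be turned into a computable label. The function and parameter names are hypothetical.

```python
def classify_design(investigator_assigns_exposure: bool,
                    randomized: bool = False,
                    has_comparison_group: bool = True,
                    sampled_on_outcome: bool = False,
                    followed_over_time: bool = False) -> str:
    """Return a coarse study-design label (simplified, hypothetical scheme)."""
    if investigator_assigns_exposure:
        # Interventional branch: the investigator controls the exposure
        if not has_comparison_group:
            return "single-arm interventional study"
        return "randomized controlled trial" if randomized else "non-randomized controlled trial"
    # Observational branch: exposure is observed, not assigned
    if sampled_on_outcome:
        return "case-control study"
    if followed_over_time:
        return "cohort study"
    return "cross-sectional study"


if __name__ == "__main__":
    print(classify_design(True, randomized=True))
    print(classify_design(False, sampled_on_outcome=True))
```

Answering a few structured questions at registration, instead of parsing free-text protocol titles later, is what lets a portfolio be queried by design type across institutions.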
TRENDS
Twenty-five years from now…
…we will almost certainly have:
– New programming languages
– New methods for data management
– New architectures
– Completely different applications
– Unforeseen revolutions in how we create, store, manage, and process information
We Need to Expand Computing Education in the Statistics Curricula
• Nolan and Temple Lang*
– “Computational literacy and programming are as fundamental to statistical practice and research as is mathematics”
– “Statisticians must be able to access data from various sources”
*Nolan D, Temple Lang D. American Statistician, 2010, 64:97-107
Computation in Biostatistical Education
• Focus has been on statistical packages, statistical programming
• In 1990, UCLA set up a new concentration in Data Management for the MS in Biostatistics degree (the concentration later fizzled out)
• Emerging trend is increased training in bioinformatics and statistical genetics/genomics in curricula
Comparative Effectiveness Research (CER) Concerns
• Death of randomized controlled clinical trials
• Weak analytic designs without appropriate control for bias
• Poor data quality from existing data stores (e.g. EMR)
• Data mining and data dredging
– Data dredging is the inappropriate (sometimes deliberately so) use of data mining to uncover misleading relationships in data.
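Data dredging is dangerous largely because of multiplicity: with m independent tests at level α and no true effects, the chance of at least one false positive is 1 - (1 - α)^m, about 0.64 for 20 tests at α = 0.05. The sketch below computes that rate and applies the Holm step-down procedure, one standard familywise-error correction. Function names are illustrative, and the correction addresses multiplicity only, not bias or confounding.

```python
from typing import List


def familywise_error(alpha: float, m: int) -> float:
    """P(at least one false positive) among m independent tests when every null is true."""
    return 1 - (1 - alpha) ** m


def holm_reject(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down procedure; returns reject flags in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # first failure: this and all larger p-values are not rejected
    return reject


if __name__ == "__main__":
    print(round(familywise_error(0.05, 20), 3))  # 0.642
    print(holm_reject([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```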
“Turning Dross into Gold”
• Just as alchemists tried turning dross into gold, the use of massive repositories of clinical data (from electronic medical records) does not necessarily yield valid and meaningful inferences
• Data quantity is no substitute for quality
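A short simulation makes the point that more data do not remove systematic error. Assuming, hypothetically, a blood-pressure measurement that reads 5 mmHg high (for example, a miscalibrated device feeding an EMR), the sample mean converges to the biased value rather than the true one no matter how large n gets. All numbers are illustrative.

```python
import random
from statistics import fmean


def biased_sample_mean(n: int, true_mean: float, bias: float, sd: float, seed: int = 1) -> float:
    """Mean of n simulated readings from an instrument with a fixed systematic offset."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return fmean(rng.gauss(true_mean + bias, sd) for _ in range(n))


if __name__ == "__main__":
    TRUE_SBP, BIAS, SD = 120.0, 5.0, 10.0  # hypothetical values in mmHg
    for n in (100, 10_000, 100_000):
        estimate = biased_sample_mean(n, TRUE_SBP, BIAS, SD)
        print(f"n={n:>7}  estimate={estimate:7.2f}  error vs truth={estimate - TRUE_SBP:5.2f}")
```

Random error shrinks roughly as 1/√n, but the 5 mmHg offset stays put; a larger repository only makes the wrong answer look more precise.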
Need to Increase Interactions with Bioinformatics and Cross Train
• NIH’s 2000 definition
– Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
– Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.
http://www.bisti.nih.gov/docs/CompuBioDef.pdf
Interaction of the Disciplines (Bayat A., BMJ, 324:324-27)
Clinical Research
Biostatistics
Clinical Informatics
Need to Increase Interactions with Bioinformatics and Cross Train
• This is already beginning to happen in some places:
– Organizational structures
– Curricula
SUMMARY
Take Home Points
• Computational technologies for managing data are changing faster than technologies for analysis
• Data management → Data quality
• Data quality → Analytic quality
Take Home Points (continued)
• We need to think beyond the immediate project when designing our databases
• Future-proof them
– Will the data collected by tomorrow’s technology be scientifically comparable with data collected by today’s technology if the technology is vastly different?
– Considerations:
• Software
• Hardware platform
• Database content
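One way to keep today's data comparable with tomorrow's is to store the unit and the measurement method alongside every value rather than a bare number. The sketch below assumes a hypothetical Measurement record and converts glucose to mg/dL with the approximate factor 18 (1 mmol/L of glucose is about 18 mg/dL). Unit conversion handles only the change of unit; differences between assay methods would still need a separate calibration or comparability study, which is why the method is kept with each value.

```python
from dataclasses import dataclass

# Approximate glucose conversion: 1 mmol/L is about 18 mg/dL (molar mass ~180 g/mol)
MG_DL_PER_MMOL_L_GLUCOSE = 18.0


@dataclass(frozen=True)
class Measurement:
    analyte: str
    value: float
    unit: str
    method: str  # hypothetical assay/instrument identifier, kept for comparability review


def glucose_mg_dl(m: Measurement) -> float:
    """Express a glucose measurement in mg/dL; refuse anything it cannot interpret."""
    if m.analyte != "glucose":
        raise ValueError(f"not a glucose measurement: {m.analyte}")
    if m.unit == "mg/dL":
        return m.value
    if m.unit == "mmol/L":
        return m.value * MG_DL_PER_MMOL_L_GLUCOSE
    raise ValueError(f"unknown unit: {m.unit}")


if __name__ == "__main__":
    old = Measurement("glucose", 99.0, "mg/dL", "analyzer-A")
    new = Measurement("glucose", 5.5, "mmol/L", "analyzer-B")
    print(glucose_mg_dl(old), glucose_mg_dl(new))
```

Raising an error on an unknown unit, instead of silently passing the number through, keeps an unanticipated future format from quietly contaminating a pooled analysis.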
Take Home Points (continued)
• Databases should be designed specifically with the analysis plan in mind
• Proper statistical analysis is still the main study goal, not creating the “perfect” database
Other Take Home Points
• Ramp up the data side of computation into the biostatistics curriculum
• CER efforts should focus on:
– Hypothesis testing vs. data mining
– Use of complete, high quality data
– Use of appropriate data models and analysis methods
Other Take Home Points (continued)
• Take advantage of opportunities to partner with biomedical informaticians:
– Development of translational research that melds biological data with clinical/population data
– Adaptive design methods in clinical trials
– National research networking
– Future-proofing and repurposing our databases…
“Databases for Clinical Translational Research: Re-Purposing and Designing for Unanticipated Needs”
2:30 PM – 3:45 PM Thursday, April 28, 2011