SCALABLE, COLLABORATIVE, REPRODUCIBLE, AND EXTENSIBLE ANALYSIS OF TCGA DATA IN THE CLOUD
Brandi Davis-Dusenbery, PhD
AACR
April 18, 2016
DISCLOSURE & FUNDING
This project has been funded in whole or in part with Federal funds from the National
Cancer Institute, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN261201400008C.
I am an employee of Seven Bridges.
GUIDING PRINCIPLES
Making data available isn’t enough to make it usable
The best science happens in teams
Reproducibility shouldn’t be hard
The impact of TCGA is extended by new data & tools
MAKING DATA AVAILABLE ISN’T ENOUGH TO MAKE IT USABLE
THE CGC ALLOWS YOU TO ACCESS MORE THAN 1 PB OF MULTIDIMENSIONAL -OMICS DATA.
Multiple samples per case: Primary Tumor, Solid Tissue Normal, Blood Derived Normal, Metastatic, …
Multiple analyses per sample: Genomic, Transcriptomic, Proteomic, Epigenomic, …
Open Data and Controlled Data
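The case, sample, and analysis hierarchy above can be sketched as a small data model. This is only an illustration: the class names, field names, and the `find_analyses` helper are hypothetical and are not the CGC's actual schema or query interface, and the case ID is a placeholder rather than a real TCGA barcode.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified model of the hierarchy on the slide:
# a Case has several Samples, and each Sample has several Analyses.


@dataclass
class Analysis:
    data_type: str  # e.g. "Genomic", "Transcriptomic"
    access: str     # "open" or "controlled"


@dataclass
class Sample:
    sample_type: str  # e.g. "Primary Tumor", "Blood Derived Normal"
    analyses: List[Analysis] = field(default_factory=list)


@dataclass
class Case:
    case_id: str
    samples: List[Sample] = field(default_factory=list)


def find_analyses(cases, sample_type, data_type, access="open"):
    """Return (case_id, Analysis) pairs matching all three criteria."""
    return [
        (case.case_id, analysis)
        for case in cases
        for sample in case.samples
        if sample.sample_type == sample_type
        for analysis in sample.analyses
        if analysis.data_type == data_type and analysis.access == access
    ]


# Made-up example data (assumption, not real TCGA content).
example_cases = [
    Case("CASE-0001", [
        Sample("Primary Tumor", [
            Analysis("Genomic", "controlled"),
            Analysis("Transcriptomic", "open"),
        ]),
        Sample("Blood Derived Normal", [Analysis("Genomic", "controlled")]),
    ]),
]

matches = find_analyses(example_cases, "Primary Tumor", "Transcriptomic")
print(matches)
```

Keeping the access level on each analysis, rather than on the case, reflects the slide's point that open and controlled data sit side by side for the same sample.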
EXPLORE THE DATASET…
… AND THEN IMMEDIATELY RUN AN ANALYSIS.
THE BEST SCIENCE HAPPENS IN TEAMS
SECURE AND COMPLIANT PROJECT MEMBERSHIP
• Projects serve as isolated workspaces for your data and tools.
• Fine-grained permissions give you control over who can see and use your assets.
• Access to TCGA Controlled Data projects is limited to authorized users.
RICH COMMUNICATION & EFFECTIVE COLLABORATION
Project descriptions, conversations, and real-time notifications keep everyone on the same page.
REPRODUCIBILITY SHOULDN’T BE HARD
The inputs, outputs, and parameters, as well as the precise tool versions (including dependencies!), are always linked and available for reference days or months later.
EACH TASK IS REPRODUCIBLE & REMEMBERABLE
• Even the most complex workflows are captured as small, runnable text files.
• Easy to share and save.
… AND SELF-CONTAINED
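The idea of capturing a run as a small text file can be sketched as follows. This is a minimal illustration, not the CGC's task format: the record layout, the `describe_task`, `to_text`, and `fingerprint` helpers, and all tool names and values are assumptions made up for the example.

```python
import hashlib
import json


def describe_task(tool, tool_version, docker_image, inputs, parameters):
    """Build a plain, JSON-serializable record of one analysis run."""
    return {
        "tool": tool,
        "tool_version": tool_version,
        "docker_image": docker_image,
        "inputs": inputs,
        "parameters": parameters,
    }


def to_text(task):
    """Serialize deterministically so identical tasks give identical text."""
    return json.dumps(task, indent=2, sort_keys=True)


def fingerprint(task):
    """Short SHA-256 digest of the serialized task, usable as a run label."""
    return hashlib.sha256(to_text(task).encode("utf-8")).hexdigest()[:12]


# All names and values below are placeholders (assumptions).
task = describe_task(
    tool="example-aligner",
    tool_version="1.2.3",
    docker_image="example/aligner:1.2.3",
    inputs={"reads": "sample_R1.fastq"},
    parameters={"threads": 4},
)

text = to_text(task)          # small text file content, easy to share and save
restored = json.loads(text)   # reading it back recovers the same record
print(text)
print(fingerprint(task))
```

Because the text is sorted and deterministic, two collaborators holding the same file can confirm they are describing the same run by comparing fingerprints.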
THE IMPACT OF TCGA IS EXTENDED BY NEW DATA & TOOLS
FOUR WAYS TO ADD YOUR OWN DATA
• Graphical uploader
• Command Line uploader
• FTP / HTTP
• API
EASILY ANNOTATE UPLOADED DATA TO MAKE IT EASIER TO FIND LATER
~40 properties in visual interface, unlimited custom properties via API.
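Combining a fixed set of standard properties with free-form custom ones, and then using them to find files again, might look like the sketch below. It makes no API calls and does not reproduce the CGC's metadata schema: the three names in `STANDARD_FIELDS`, the `build_metadata` and `find_files` helpers, and the file names are all hypothetical.

```python
# Hypothetical subset of built-in property names (assumption);
# the slide says the visual interface offers about 40.
STANDARD_FIELDS = {"sample_id", "sample_type", "experimental_strategy"}


def build_metadata(standard=None, custom=None):
    """Merge standard and custom properties into one flat metadata mapping."""
    standard = dict(standard or {})
    custom = dict(custom or {})
    unknown = set(standard) - STANDARD_FIELDS
    if unknown:
        raise ValueError(f"Not a standard field: {sorted(unknown)}")
    clash = set(custom) & STANDARD_FIELDS
    if clash:
        raise ValueError(f"Custom key shadows a standard field: {sorted(clash)}")
    return {**standard, **custom}


def find_files(files, **criteria):
    """Return names of files whose metadata matches every criterion."""
    return [
        name
        for name, meta in files.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    ]


# Placeholder files and values (assumptions).
files = {
    "tumor.bam": build_metadata(
        {"sample_type": "Primary Tumor"}, {"lab_batch": "B7"}
    ),
    "normal.bam": build_metadata(
        {"sample_type": "Blood Derived Normal"}, {"lab_batch": "B7"}
    ),
}

print(find_files(files, lab_batch="B7", sample_type="Primary Tumor"))
```

Rejecting custom keys that collide with standard names keeps the two kinds of property from silently overwriting each other.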
AS THE AMOUNT OF DATA HAS GROWN, SO TOO HAS THE NUMBER OF TOOLS AVAILABLE TO ANALYZE IT
11,160 -omics data analysis tools* (each with many versions)
50+ used in a single TCGA marker paper
*omictools.com
DOCKER + CWL MAKES IT EASY TO PUT THESE TOOLS ON THE CGC … AND OTHER PLACES
DEFINE THE TOOL, INPUTS, OUTPUTS AND PARAMETERS
ADD YOUR TOOL TO HUNDREDS OF EXISTING TOOLS TO CREATE A WORKFLOW
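Defining a tool, its inputs, its outputs, and its Docker image can be sketched by generating a CWL description from Python. The field names (`cwlVersion`, `class`, `baseCommand`, `DockerRequirement`, `dockerPull`, `inputBinding`, `outputBinding`, `glob`, `stdout`) follow the CWL v1.0 specification, but which CWL version the CGC accepts is not stated on the slide, so `v1.0` is an assumption. The `make_cwl_tool` helper, the `wc -l` command, and the image name are illustrative placeholders.

```python
import json


def make_cwl_tool(base_command, docker_image, inputs, stdout_name):
    """Return a dict shaped like a CWL v1.0 CommandLineTool.

    `inputs` is a list of (name, cwl_type) pairs; each becomes a positional
    command-line argument in the order given.
    """
    return {
        "cwlVersion": "v1.0",  # assumed version
        "class": "CommandLineTool",
        "baseCommand": base_command,
        "requirements": [
            {"class": "DockerRequirement", "dockerPull": docker_image}
        ],
        "inputs": {
            name: {"type": cwl_type, "inputBinding": {"position": position}}
            for position, (name, cwl_type) in enumerate(inputs, start=1)
        },
        # Capture standard output into a file and expose it as the result.
        "stdout": stdout_name,
        "outputs": {
            "result": {"type": "File", "outputBinding": {"glob": stdout_name}}
        },
    }


# Placeholder tool: count lines in a text file inside a container.
tool = make_cwl_tool(
    base_command=["wc", "-l"],
    docker_image="ubuntu:16.04",
    inputs=[("text_file", "File")],
    stdout_name="line_count.txt",
)

# JSON is a subset of YAML, so this text can be saved as a .cwl file.
cwl_text = json.dumps(tool, indent=2)
print(cwl_text)
```

Since the description names the container image explicitly, the same file can in principle be run by any CWL-aware executor, which is the portability the slide refers to.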
WWW.CANCERGENOMICSCLOUD.ORG
MORE THAN $1M IN COMPUTE AND STORAGE CREDITS AVAILABLE FOR YOU TO USE
Tiered model allows everyone to access up to $1,600 (~ enough to do whole exome analysis of all pancreatic carcinoma samples)
Request up to $10,000 credits for large collaborative projects (Graduate students and Post-docs are particularly encouraged to submit a request)
NEARLY 500 RESEARCHERS ARE USING THE CGC TODAY …
Early Adopter
Open Release
… JOIN THEM
Booth 452 Networking event
THANK YOU