Presentation at the Canadian Cancer Research Conference satellite bioinformatics.ca workshop. This one is an introduction to the TCGA, ICGC and COSMIC databases.
Canadian Bioinformatics Workshops
www.bioinformatics.ca
Canadian Cancer Research Conference
November 3-6, 2013
Module 1: Cancer Genomic Databases bioinformatics.ca
You are free to:
Copy, share, adapt, or re-mix;
Photograph, film, or broadcast;
Blog, live-blog, or post video of;
This presentation. Provided that:
You attribute the work to its author and respect the rights and licenses associated with its components.
Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero. Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at: http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites
Module 1: Cancer Genomic Databases
Schedule for Module 1: Cancer Genomic Databases
• The Databases:
– The International Cancer Genome Consortium (ICGC)
– The Cancer Genome Atlas (TCGA)
– The Catalogue of Somatic Mutations in Cancer (COSMIC)
• Data Access: human genomes and security and privacy issues, Open vs. Controlled Access data
http://bioinformatics.ca/
Workshops planned for 2014: http://bioinformatics.ca/workshops
1. Exploratory Analysis of Biological Data using R
2. Bioinformatics for Cancer Genomics
3. Informatics for RNA-sequence Analysis
4. Informatics on High Throughput Sequencing Data
5. Pathway and Network Analysis of -omics Data
6. Flow Cytometry Data Analysis using R
7. Microarray Data Analysis
8. Informatics and Statistics for Metabolomics
http://bioinformatics.ca/workshops/2013
E-mail: [email protected]
Web: http://bioinformatics.ca
Workshop announcement mailing list:
http://bioinformatics.ca/mailman/listinfo/announce
Soap-Box time!
• Open Access, Open Data and Open Source are essential for good Science.
• Openness is a responsibility, an obligation, and something that comes with the privilege of doing publicly funded work.
Open Access
Open Source
Open Data
Opencourseware
Cancer therapy is like beating the dog with a stick to get rid of his fleas.
- Anna Deavere Smith, Let Me Down Easy
http://goo.gl/Yhbsj
The revolution in cancer research can be summed up in a single sentence: cancer is, in essence, a genetic disease.
- Bert Vogelstein
Cancer: a Disease of the Genome
Challenge in Treating Cancer:
• Every tumour is different
• Every cancer patient is different
Chin et al., Genes Dev. 2011 March 15; 25(6): 534-555
http://www.ncbi.nlm.nih.gov/pubmed/?term=21406553
Cancer Genomic Databases
TCGA
The Cancer Genome Atlas is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.
About the TCGA
• National Cancer Institute (NCI)
• National Human Genome Research Institute (NHGRI)
• Phased Structure:
– Three-year pilot in 2006 with an investment of $50 million from each institute
– TCGA will collect and characterize more than 20 additional tumour types (now at 16)
Where to start with the TCGA?
Wiki: https://wiki.nci.nih.gov/display/TCGA/About+TCGA
Division of Labour
• Biospecimen Core Resource (BCR)
– centre where samples are carefully catalogued, processed, quality-checked and stored along with participant clinical information
• Genome Sequencing Centre (GSC)
– uses high-throughput methods to identify changes to DNA sequences that are associated with specific cancer types
• Genome Characterization Centre (GCC)
– uses high-throughput technologies to analyze genomic changes involved in cancer
• Genome Data Analysis Centre (GDAC)
– provides novel informatics tools to the research community
– provides analysis results using TCGA data
• Data Coordinating Centre (DCC)
– central provider of TCGA data
– standardizes data formats and validates submitted data
TCGA Data
• Sequence reads from newer sequencing technologies are available at the Cancer Genome Hub: https://cghub.ucsc.edu/
• Higher level sequence data (variation calls and abundance measures) are available at the TCGA Portal: http://cancergenome.nih.gov/
TCGA data flow
http://goo.gl/b5nojx
Data Coordinating Centre
• Plays a central role:
– Receiving data from BCR, GSC and GCC sites
– Providing access to users
– Performing analysis of data
• Responsibilities:
– Protecting participant privacy and confidentiality
– Developing data standards and controlled vocabularies
– Establishing informatics pipelines for data flow
– Developing new analytical and visualization technologies to facilitate data analysis, for all audiences
TCGA DCC Data Portal
• Provides a platform to search, download and analyze TCGA data sets
• Two data access tiers: Open and Controlled
• Analytic tools include: Cancer Molecular Analysis and Cancer Genome Workbench (NCBIB), Integrative Genomics Viewer (Broad) and Cancer Genomics Analysis (MSKCC).
TCGA Data Browser
https://tcga-data.nci.nih.gov/tcga/
Query TCGA data online using the TCGA Data Browser
The International Cancer Genome Consortium (ICGC)
• http://www.icgc.org/
• “ICGC was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe”
[Diagram: ICGC BAM/FASTQ; TCGA BAM/FASTQ; ICGC Open Data (includes TCGA Open Data); COSMIC Open Data]
ICGC Map – November 2013: 67 projects launched
ICGC datasets to date
[Chart: ICGC Data Portal Cumulative Donor Count for Member Projects. Y-axis: number of donors, 1,000 to 10,000. X-axis: December 2011 to September 2013, with Releases 7 to 14 marked. Credit: Hardeep Nahal]
• Cancer types: 41
• Donors: 8,532 (18,056 specimens)
• Simple somatic mutations: 1,995,134
• Copy number mutations: 18,526,593
• Structural rearrangements: 18,614
• Genes affected* by simple somatic mutations: 22,074
• Genes affected* by non-synonymous coding mutations: 19,150
• Genes affected* by copy number mutations: 20,341
• Genes affected* by structural rearrangements: 1,884
• *out of 22,259 protein-coding genes annotated in Ensembl Human release 69
• Open tier and controlled data currently available
ICGC dataset version 14, September 2013
Hardeep Nahal
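To put the "genes affected" counts in proportion, here is a minimal sketch that expresses each count as a share of the 22,259 protein-coding genes quoted on the slide. All figures are copied from the slide; the function and variable names are illustrative only.

```python
# Figures copied from the ICGC dataset version 14 summary (September 2013).
TOTAL_PROTEIN_CODING_GENES = 22_259  # Ensembl Human release 69, per the slide

genes_affected = {
    "simple somatic mutations": 22_074,
    "non-synonymous coding mutations": 19_150,
    "copy number mutations": 20_341,
    "structural rearrangements": 1_884,
}

DONORS = 8_532
SPECIMENS = 18_056


def percent_of_genes(count: int, total: int = TOTAL_PROTEIN_CODING_GENES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


if __name__ == "__main__":
    for category, count in genes_affected.items():
        share = percent_of_genes(count)
        print(f"{category}: {share}% of annotated protein-coding genes")
    # Average number of specimens contributed per donor.
    print(f"specimens per donor: {SPECIMENS / DONORS:.2f}")
```

Read this way, nearly every annotated protein-coding gene carries at least one simple somatic mutation somewhere in the dataset, while fewer than one in ten is touched by a structural rearrangement.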
Select “Pancreatic cancer – Canada”
… But where is the data?
http://dcc.icgc.org/
Can do bulk download of the data …
[Diagram labels: DACO, ICGC, dbGaP, EGA, TCGA, ERA, BAM, BAM + EGA id]
http://icgc.org/daco
ICGC Controlled Access Datasets
• Detailed Phenotype and Outcome data: Region of residence, Risk factors, Examination, Surgery, Radiation, Sample, Slide, Specific histological features, Analyte, Aliquot, Donor notes
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files
ICGC Open Access (OA) Datasets
• Cancer Pathology: Histologic type or subtype, Histologic nuclear grade
• Patient/Person: Gender, Age range, Vital status, Survival time, Relapse type, Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and Loss of Heterozygosity
• Newly discovered somatic variants
http://goo.gl/w4mrV
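The open versus controlled split can be sketched as a simple lookup. This is only an illustration of the two lists on this slide, not an ICGC API: the names `Tier`, `tier_for` and `needs_daco_approval` are hypothetical, and treating unlisted categories as controlled is an assumption made here as the cautious default.

```python
from enum import Enum


class Tier(Enum):
    OPEN = "open"
    CONTROLLED = "controlled"


# Data categories as listed on the slide, keyed in lower case.
TIER_BY_CATEGORY = {
    "cancer pathology": Tier.OPEN,
    "patient/person": Tier.OPEN,
    "gene expression (normalized)": Tier.OPEN,
    "dna methylation": Tier.OPEN,
    "computed copy number and loss of heterozygosity": Tier.OPEN,
    "newly discovered somatic variants": Tier.OPEN,
    "detailed phenotype and outcome data": Tier.CONTROLLED,
    "gene expression (probe-level data)": Tier.CONTROLLED,
    "raw genotype calls": Tier.CONTROLLED,
    "gene-sample identifier links": Tier.CONTROLLED,
    "genome sequence files": Tier.CONTROLLED,
}


def tier_for(category: str) -> Tier:
    """Look up the access tier for a data category.

    Assumption: categories not on the slide default to CONTROLLED.
    """
    return TIER_BY_CATEGORY.get(category.strip().lower(), Tier.CONTROLLED)


def needs_daco_approval(category: str) -> bool:
    """Controlled-tier data requires an approved DACO application."""
    return tier_for(category) is Tier.CONTROLLED


if __name__ == "__main__":
    for name in ("DNA methylation", "Raw genotype calls"):
        print(name, "->", tier_for(name).value)
```

Note that gene expression appears in both tiers: normalized values are open, while probe-level data is controlled.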
Identify yourself
Fill out a detailed form which includes:
• Contact and Project Information
• Information Technology details and procedures for keeping data secure
• Data Access Agreement
All of these documents are put into a PDF file that you print and get your institution to sign off on your behalf
DACO approved projects
DACO/DCC User Data Access Process
• Users approved through DACO are now automatically granted access to ICGC controlled access datasets available through the ICGC Data Portal and the EBI’s EGA repository
[Diagram: DACO Web Application, DCC User Registry, DCC Data Portal, EBI EGA; steps: application approved by DACO, user accounts activated]
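The access process on this slide can be modelled as a toy state change: once DACO approves an application, the user's accounts for both controlled-access resources are activated together. This is a sketch of the described flow only. `User`, `approve_application` and `can_access_controlled` are hypothetical names, not part of any DACO or DCC software.

```python
from dataclasses import dataclass, field

# The two places the slide says approved users gain access to.
CONTROLLED_RESOURCES = ("DCC Data Portal", "EBI EGA")


@dataclass
class User:
    name: str
    daco_approved: bool = False
    active_accounts: set = field(default_factory=set)


def approve_application(user: User) -> User:
    """Record DACO approval and activate accounts on every controlled resource.

    Mirrors the slide: approval automatically grants access, with no
    separate per-resource request.
    """
    user.daco_approved = True
    user.active_accounts.update(CONTROLLED_RESOURCES)
    return user


def can_access_controlled(user: User, resource: str) -> bool:
    """True only for approved users with an activated account on the resource."""
    return user.daco_approved and resource in user.active_accounts


if __name__ == "__main__":
    researcher = User("example researcher")
    print(can_access_controlled(researcher, "EBI EGA"))  # before approval
    approve_application(researcher)
    print(can_access_controlled(researcher, "EBI EGA"))  # after approval
```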
Catalogue of Somatic Mutations in Cancer (COSMIC)
• http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/
• COSMIC is designed to store and display somatic mutation information and related details and contains information relating to human cancers.
COSMIC
• Somatic Mutations Only
• Diverse sources
– Literature (Arrays, Next-Gen, PCR...)
– TCGA
– ICGC
• Diverse ways to look at data
– Gene
– Variation
– Tumour type
– Cell line
– Experiment
FAQ
Looking up your favorite gene
In closing
• Remember all these sites have great amounts of documentation
• The field is changing quickly, and so are the portals.
• New features are planned as we speak, and so you need to use the sites, and keep coming back.
• Don’t be afraid to explore
• Interested in learning more after today? Consider one of the bioinformatics.ca workshops!
Acknowledgements: the CBW gang
Michelle Brazas
Michael Stromberg
Marc Fiume
Michael Brudno