Canadian Bioinformatics Workshops www.bioinformatics.ca Canadian Cancer Research Conference November 3-6, 2013

Introduction to Cancer Genomics Databases

Presentation at the Canadian Cancer Research Conference satellite workshop run by bioinformatics.ca. This one is an introduction to the TCGA, ICGC and COSMIC databases.

Module 1: Cancer Genomic Databases bioinformatics.ca

You are free to:

Copy, share, adapt, or re-mix;

Photograph, film, or broadcast;

Blog, live-blog, or post video of;

This presentation, provided that:

You attribute the work to its author and respect the rights and licenses associated with its components.

Slide concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only: ccZero. Social media icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at: http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites

@bffo

[email protected]

Schedule for Module 1: Cancer Genomic Databases

• The Databases:
– The International Cancer Genome Consortium (ICGC)
– The Cancer Genome Atlas (TCGA)
– The Catalogue of Somatic Mutations in Cancer (COSMIC)
• Data Access: human genomes and security and privacy issues, Open vs. Controlled Access data

http://bioinformatics.ca/

Workshops planned for 2014: http://bioinformatics.ca/workshops

1. Exploratory Analysis of Biological Data using R
2. Bioinformatics for Cancer Genomics
3. Informatics for RNA-sequence Analysis
4. Informatics on High Throughput Sequencing Data
5. Pathway and Network Analysis of -omics Data
6. Flow Cytometry Data Analysis using R
7. Microarray Data Analysis
8. Informatics and Statistics for Metabolomics

http://bioinformatics.ca/workshops/2013

E-mail: [email protected]

Web: http://bioinformatics.ca

Workshop announcement mailing list:

http://bioinformatics.ca/mailman/listinfo/announce

Soap-Box time!

• Open Access, Open Data and Open Source are essential for good Science.

• Openness is a responsibility, an obligation, and something that comes with the privilege of doing publicly funded work.

Open Access

Open Source

Open Data

Opencourseware

Cancer therapy is like beating the dog with a stick to get rid of his fleas.

- Anna Deavere Smith, Let Me Down Easy

http://goo.gl/Yhbsj

The revolution in cancer research can be summed up in a single sentence: cancer is, in essence, a genetic disease.

- Bert Vogelstein

Cancer: a Disease of the Genome

Challenge in Treating Cancer:

• Every tumour is different
• Every cancer patient is different

Chin et al., Genes Dev. 2011 March 15; 25(6): 534-555. http://www.ncbi.nlm.nih.gov/pubmed/?term=21406553

Cancer Genomic Databases

TCGA

The Cancer Genome Atlas is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.

About the TCGA

• National Cancer Institute (NCI)
• National Human Genome Research Institute (NHGRI)
• Phased structure:
– Three-year pilot started in 2006 with an investment of $50 million from each institute
– TCGA will collect and characterize more than 20 additional tumour types (now at 16)

Where to start with the TCGA?

Wiki: https://wiki.nci.nih.gov/display/TCGA/About+TCGA

Division of Labour

• Biospecimen Core Resource (BCR)
– centre where samples are carefully catalogued, processed, quality-checked and stored along with participant clinical information
• Genome Sequencing Centre (GSC)
– uses high-throughput methods to identify changes to DNA sequences that are associated with specific cancer types
• Genome Characterization Centre (GCC)
– uses high-throughput technologies to analyze genomic changes involved in cancer
• Genome Data Analysis Centre (GDAC)
– provides novel informatics tools to the research community
– provides analysis results using TCGA data
• Data Coordinating Centre (DCC)
– central provider of TCGA data
– standardizes data formats and validates submitted data

TCGA Data

• Sequence reads from newer sequencing technologies are available at the Cancer Genomics Hub (CGHub): https://cghub.ucsc.edu/

• Higher level sequence data (variation calls and abundance measures) are available at the TCGA Portal: http://cancergenome.nih.gov/

TCGA data flow

http://goo.gl/b5nojx

Data Coordinating Centre

• Plays a central role:
– receiving data from BCR, GSC and GCC sites
– providing access to users
– performing analysis of data
• Responsibilities:
– protecting participant privacy and confidentiality
– developing data standards and controlled vocabularies
– establishing informatics pipelines for data flow
– developing new analytical and visualization technologies to facilitate data analysis, for all audiences

TCGA DCC Data Portal

• Provides a platform to search, download and analyze TCGA data sets

• Two data access tiers: Open and Controlled
• Analytic tools include: Cancer Molecular Analysis and Cancer Genome Workbench (NCBIB), Integrative Genomics Viewer (Broad) and Cancer Genomics Analysis (MSKCC).

TCGA Data Browser: https://tcga-data.nci.nih.gov/tcga/

Query TCGA data online using the TCGA Data Browser

The International Cancer Genome Consortium (ICGC)

• http://www.icgc.org/

• “ICGC was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe”

[Diagram: where the data live. ICGC BAM/FASTQ; TCGA BAM/FASTQ; ICGC Open Data (includes TCGA Open Data); COSMIC Open Data]

ICGC Map – November 2013: 67 projects launched

ICGC datasets to date

[Figure: ICGC Data Portal cumulative donor count for member projects, December 2011 to September 2013, across Releases 7 to 14; y-axis: number of donors, up to 10,000. Credit: Hardeep Nahal]

• Cancer types: 41

• Donors: 8,532 (18,056 specimens)

• Simple somatic mutations: 1,995,134

• Copy number mutations: 18,526,593

• Structural rearrangements: 18,614

• Genes affected* by simple somatic mutations: 22,074

• Genes affected* by non-synonymous coding mutations: 19,150

• Genes affected* by copy number mutations: 20,341

• Genes affected* by structural rearrangements: 1,884

• *out of 22,259 protein-coding genes annotated in Ensembl Human release 69

• Open-tier and controlled-access data currently available

ICGC dataset version 14, September 2013

Hardeep Nahal
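The gene counts above are easier to compare as shares of the 22,259 annotated protein-coding genes. A minimal Python sketch of that arithmetic, using only the version 14 figures listed on this slide:

```python
# Figures from ICGC dataset version 14 (September 2013), as listed above.
PROTEIN_CODING_GENES = 22_259  # Ensembl Human release 69
DONORS = 8_532
SPECIMENS = 18_056

GENES_AFFECTED = {
    "simple somatic mutations": 22_074,
    "non-synonymous coding mutations": 19_150,
    "copy number mutations": 20_341,
    "structural rearrangements": 1_884,
}


def percent_of_genes(count: int, total: int = PROTEIN_CODING_GENES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for category, count in GENES_AFFECTED.items():
        print(f"{category}: {percent_of_genes(count)}% of protein-coding genes")
    # Average number of specimens collected per donor.
    print(f"specimens per donor: {SPECIMENS / DONORS:.2f}")
```

Run as a script, it reports about 99.2% of protein-coding genes for simple somatic mutations, 86.0% for non-synonymous coding mutations, 91.4% for copy number mutations and 8.5% for structural rearrangements, with roughly 2.12 specimens per donor.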

Select “Pancreatic cancer – Canada”

… But where is the data?

http://dcc.icgc.org/

You can also bulk-download the data …
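Once a bulk download is on disk, a common first step is counting how many donors carry a mutation in each gene. A minimal Python sketch, assuming the file is a tab-separated table (optionally gzip-compressed) with a header row; the column names `icgc_donor_id` and `gene_affected` are assumptions, so check them against the header of the file you actually download:

```python
import csv
import gzip
from collections import Counter
from typing import Iterable, List, Tuple

# ASSUMPTION: column names in the downloaded table; adjust to the real header.
DONOR_COLUMN = "icgc_donor_id"
GENE_COLUMN = "gene_affected"


def donors_per_gene(lines: Iterable[str]) -> Counter:
    """Count distinct donors with at least one mutation in each gene.

    `lines` is any iterable of tab-separated text lines, header first.
    """
    seen = set()
    counts: Counter = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        gene = (row.get(GENE_COLUMN) or "").strip()
        donor = (row.get(DONOR_COLUMN) or "").strip()
        if not gene or not donor:
            continue  # skip records with no gene (e.g. intergenic) or no donor
        if (gene, donor) not in seen:  # one count per donor, however many rows
            seen.add((gene, donor))
            counts[gene] += 1
    return counts


def top_genes(path: str, n: int = 10) -> List[Tuple[str, int]]:
    """Open a plain or gzip-compressed TSV file and return the n most-hit genes."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return donors_per_gene(handle).most_common(n)
```

For example, `top_genes("ssm_download.tsv.gz")` (a hypothetical file name) returns the ten genes mutated in the most donors. Counting distinct donors rather than rows avoids inflating a gene when the file lists the same mutation on several rows.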

[Diagram: controlled-access data routes linking DACO, ICGC, dbGaP, EGA, TCGA and ERA, with BAM files referenced by EGA id]

http://icgc.org/daco

ICGC Controlled Access Datasets

• Detailed phenotype and outcome data: region of residence, risk factors, examination, surgery, radiation, sample, slide, specific histological features, analyte, aliquot, donor notes
• Gene expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files

ICGC Open Access (OA) Datasets

• Cancer pathology: histologic type or subtype, histologic nuclear grade
• Patient/person: gender, age range, vital status, survival time, relapse type, status at follow-up
• Gene expression (normalized)
• DNA methylation
• Computed copy number and loss of heterozygosity
• Newly discovered somatic variants

http://goo.gl/w4mrV
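Only the controlled tier requires DACO approval, so one way to plan an analysis is to check which of the data types it needs fall in that tier. A minimal Python sketch; the category labels are hypothetical keys paraphrasing the two lists above, not identifiers used by any ICGC file or service, and unlisted categories are conservatively treated as controlled:

```python
from enum import Enum
from typing import Iterable, List


class Tier(Enum):
    OPEN = "open"
    CONTROLLED = "controlled"


# Hypothetical category keys paraphrasing the slide's two lists.
ICGC_TIERS = {
    "detailed phenotype and outcome data": Tier.CONTROLLED,
    "gene expression (probe-level)": Tier.CONTROLLED,
    "raw genotype calls": Tier.CONTROLLED,
    "gene-sample identifier links": Tier.CONTROLLED,
    "genome sequence files": Tier.CONTROLLED,
    "cancer pathology": Tier.OPEN,
    "patient/person": Tier.OPEN,
    "gene expression (normalized)": Tier.OPEN,
    "dna methylation": Tier.OPEN,
    "copy number and loh": Tier.OPEN,
    "newly discovered somatic variants": Tier.OPEN,
}


def needs_daco_approval(categories: Iterable[str]) -> List[str]:
    """Return the requested categories that fall in the controlled tier.

    Unknown categories default to CONTROLLED, so nothing is assumed open
    unless it appears in the open list.
    """
    return sorted(
        c for c in categories
        if ICGC_TIERS.get(c.strip().lower(), Tier.CONTROLLED) is Tier.CONTROLLED
    )
```

Defaulting unknown categories to controlled errs on the side of privacy: a mislabelled data type triggers an unnecessary access request rather than an accidental assumption that it is open.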

Identify yourself

Fill out a detailed form, which includes:
• Contact and Project Information
• Information Technology details and procedures for keeping data secure
• Data Access Agreement

All of these documents are put into a PDF file that you print and have your institution sign on your behalf.

DACO-approved projects

DACO/DCC User Data Access Process

• Users approved through DACO are now automatically granted access to ICGC controlled access datasets available through the ICGC Data Portal and the EBI’s EGA repository

[Diagram: DACO Web Application → application approved by DACO → DCC User Registry → user accounts activated → DCC Data Portal and EBI EGA]
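The access process above behaves like a small state machine: an application moves from submission to DACO approval to account activation, and only activation opens both the DCC Data Portal and the EBI EGA. A hypothetical Python model of those stages (an illustration, not the actual DACO or DCC software):

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    SUBMITTED = auto()  # application filed through the DACO web application
    APPROVED = auto()   # application approved by DACO
    ACTIVATED = auto()  # user accounts activated in the DCC user registry


# Only forward moves are allowed; ACTIVATED is terminal.
NEXT_STAGE = {Stage.SUBMITTED: Stage.APPROVED, Stage.APPROVED: Stage.ACTIVATED}

# Resources that become available once accounts are activated.
ACTIVATION_GRANTS = frozenset({"DCC Data Portal", "EBI EGA"})


@dataclass
class AccessRequest:
    user: str
    stage: Stage = Stage.SUBMITTED
    granted: set = field(default_factory=set)

    def advance(self) -> None:
        """Move to the next stage, granting access on activation."""
        if self.stage not in NEXT_STAGE:
            raise ValueError(f"{self.user} is already at {self.stage.name}")
        self.stage = NEXT_STAGE[self.stage]
        if self.stage is Stage.ACTIVATED:
            self.granted |= ACTIVATION_GRANTS

    def can_access(self, resource: str) -> bool:
        return resource in self.granted
```

Granting both resources in a single activation step mirrors the slide's point that approved users are automatically given access through both the ICGC Data Portal and EGA.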

Catalogue of Somatic Mutations in Cancer (COSMIC)

• http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/

• COSMIC is designed to store and display somatic mutation information and related details and contains information relating to human cancers.

COSMIC

• Somatic mutations only
• Diverse sources
– Literature (arrays, next-gen, PCR...)
– TCGA
– ICGC
• Diverse ways to look at data
– Gene
– Variation
– Tumour type
– Cell line
– Experiment

FAQ

Looking up your favorite gene

In closing

• Remember that all these sites have extensive documentation.
• The field is changing quickly, and so are the portals.
• New features are being planned as we speak, so keep using the sites and keep coming back.
• Don’t be afraid to explore.
• Interested in learning more after today? Consider one of the bioinformatics.ca workshops!

Acknowledgements: the CBW gang

Michelle Brazas

Michael Stromberg

Marc Fiume

Michael Brudno