ChimeraScan: Chimeric transcript discovery by paired-end transcriptome sequencing.

BC-Cancer ChimeraScan Presentation




AGENDA

• Overview: What is ChimeraScan?

• ChimeraScan Method (Algorithm)

• How to run ChimeraScan?

• ChimeraScan Results

• Limitations: What could be done better?

• Comparison with current software (deFUSE, Trans-Abyss)


WHAT IS CHIMERASCAN?

• A tool for discovering chimeric transcripts or fusions in sequencing data.


ChimeraScan Method

● ChimeraScan differs from other fusion finders (e.g., deFUSE) in that it adds a fragmentation step on top of the whole paired-end approach, which deFUSE also uses.

Tell me more!


ChimeraScan Algorithm

Fragmentation


ChimeraScan Algorithm

Step 1: Prepare reads for alignment

ChimeraScan parses the FASTQ files and:

1) converts all quality scores to Sanger format (Phred+33)

2) converts each read's qname from an arbitrarily long string to a number (1/1, 1/2 for paired-end reads)

This reduces the storage requirements for intermediate steps.
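A minimal Python sketch of the Step 1 read preparation, assuming the input qualities are Phred+64 (older Illumina encoding) and that the two mate files have already been read into parallel lists of (name, sequence, quality) tuples. `phred64_to_sanger` and `prepare_pairs` are hypothetical helpers for illustration, not ChimeraScan functions.

```python
from typing import Iterator, List, Tuple

# A FASTQ record as (name, sequence, quality string).
Record = Tuple[str, str, str]


def phred64_to_sanger(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Sanger (Phred+33)."""
    # The Phred score is unchanged; only the ASCII offset moves from 64 to 33.
    return "".join(chr(ord(c) - 31) for c in qual)


def prepare_pairs(mate1: List[Record], mate2: List[Record],
                  phred64: bool = True) -> Iterator[Tuple[Record, Record]]:
    """Yield mate pairs with short numeric qnames and Sanger qualities."""
    for read_num, (r1, r2) in enumerate(zip(mate1, mate2), start=1):
        renamed = []
        for mate_idx, (_, seq, qual) in enumerate((r1, r2), start=1):
            if phred64:
                qual = phred64_to_sanger(qual)
            # The long instrument qname becomes "<read number>/<mate number>".
            renamed.append((f"{read_num}/{mate_idx}", seq, qual))
        yield renamed[0], renamed[1]
```

For example, a pair named `HWI-ST123:4:1101:1234:5678#0/1` and `.../2` becomes `1/1` and `1/2`, which is much cheaper to store in every intermediate file.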


ChimeraScan Algorithm

The pysam package is used.

Step 3: Create a sorted/indexed BAM file

Enables fast lookup of original read alignments by genomic coordinates.
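The slide notes that pysam is used here; pysam exposes samtools commands such as `pysam.sort` and `pysam.index`, and `pysam.AlignmentFile.fetch` runs region queries on an indexed BAM. The toy class below does not use pysam. It is a hypothetical illustration of why coordinate sorting makes lookups fast: once alignments are sorted per chromosome, a region query is two binary searches instead of a scan over every read.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Tuple

Alignment = Tuple[str, str, int]  # (qname, chromosome, 0-based start)


class CoordinateIndex:
    """Toy stand-in for a sorted/indexed BAM (hypothetical, not pysam)."""

    def __init__(self, alignments: List[Alignment]) -> None:
        by_chrom: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
        for qname, chrom, start in alignments:
            by_chrom[chrom].append((start, qname))
        self._starts: Dict[str, List[int]] = {}
        self._names: Dict[str, List[str]] = {}
        for chrom, rows in by_chrom.items():
            rows.sort()  # coordinate sort, as for a sorted BAM
            self._starts[chrom] = [s for s, _ in rows]
            self._names[chrom] = [q for _, q in rows]

    def fetch(self, chrom: str, start: int, end: int) -> List[str]:
        """Return qnames whose alignment *starts* in [start, end).

        Simplified: a real BAM index query also returns reads that
        start before the region but overlap it.
        """
        starts = self._starts.get(chrom, [])
        if not starts:
            return []
        lo = bisect_left(starts, start)
        hi = bisect_left(starts, end)
        return self._names[chrom][lo:hi]
```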

Step 4: Estimate insert size distribution

Only uniquely mapping reads are used to sample the insert size distribution, which later steps use to help localize fusion breakpoints.
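A minimal sketch of Step 4, assuming each aligned pair is summarized as (number of hits for mate 1, number of hits for mate 2, insert size). The input format, the `max_samples` cap and the returned summary fields are assumptions for illustration; ChimeraScan's own sampling may differ.

```python
import statistics
from typing import Dict, Iterable, Tuple


def insert_size_stats(pairs: Iterable[Tuple[int, int, int]],
                      max_samples: int = 100_000) -> Dict[str, float]:
    """Summarize insert sizes using only uniquely mapping pairs.

    Each item is (hits_mate1, hits_mate2, insert_size).
    """
    sizes = []
    for hits1, hits2, isize in pairs:
        if hits1 == 1 and hits2 == 1:  # keep uniquely mapping pairs only
            sizes.append(abs(isize))   # sign only reflects mate orientation
            if len(sizes) >= max_samples:
                break
    if len(sizes) < 2:
        raise ValueError("need at least two uniquely mapping pairs")
    return {
        "n": len(sizes),
        "mean": statistics.mean(sizes),
        "stdev": statistics.stdev(sizes),
        "median": statistics.median(sizes),
        "max": max(sizes),
    }
```

Multi-mapping pairs are skipped because their insert size depends on which alignment is chosen, which would blur the distribution used later to place breakpoints.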


ChimeraScan Algorithm

Step 5: Realign initially unmapped reads (fragmentation)

All of the initially unmapped reads are treated as single reads and realigned. Additionally, the reads are trimmed so that only the sequences at the ends of the fragment are aligned (default = 25 bp).
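A hypothetical sketch of the Step 5 trimming, assuming standard paired-end reads where the 5' end of each mate is one end of the sequenced fragment. Each mate becomes an independent single-end read cut to its first `trim_len` bases; the helper name and naming scheme are illustrative only.

```python
from typing import List, Tuple


def trim_to_fragment_ends(qname: str, mate1: str, mate2: str,
                          trim_len: int = 25) -> List[Tuple[str, str]]:
    """Split a pair into two single reads trimmed to the fragment ends.

    Assumption: the first `trim_len` bases of each mate are the
    fragment ends. Reads shorter than `trim_len` are kept whole.
    """
    singles = []
    for mate_idx, seq in ((1, mate1), (2, mate2)):
        singles.append((f"{qname}/{mate_idx}", seq[:trim_len]))
    return singles
```

Short end segments are more likely to align entirely on one side of a fusion junction, which is why a pair that failed to align full-length can still be placed.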

Step 6: Discover discordant reads
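A simplified sketch of what "discordant" means in Step 6, assuming each mate's best alignment is summarized by a reference name and start position. The `MateHit` type and the distance test are hypothetical; the idea is that a pair is discordant when its mates land on different references, or on the same reference but farther apart than the insert size distribution allows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MateHit:
    ref: str  # transcript, gene or chromosome the mate aligned to
    pos: int  # 0-based alignment start


def is_discordant(m1: Optional[MateHit], m2: Optional[MateHit],
                  max_isize: int) -> bool:
    """Flag a pair whose mates both align, but not as a normal pair."""
    if m1 is None or m2 is None:
        return False  # an unmapped mate is not a discordant pair
    if m1.ref != m2.ref:
        return True   # mates land on different references
    # Same reference but implausibly far apart for one fragment.
    return abs(m2.pos - m1.pos) > max_isize
```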


ChimeraScan Algorithm

Step 7: Nominate chimeras (using the fragment size distribution)
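A hypothetical sketch of Step 7, assuming each discordant pair has been reduced to (5' gene, distance from mate 1 to the 5' breakpoint, 3' gene, distance from the 3' breakpoint to the end of mate 2). Under that model the implied fragment size of the fused transcript is the sum of the two distances, and pairs whose implied size exceeds the insert size distribution are not counted. The tuple layout and `min_pairs` threshold are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (gene_5p, dist_to_5p_breakpoint, gene_3p, dist_from_3p_breakpoint)
DiscordantPair = Tuple[str, int, str, int]


def nominate_chimeras(pairs: List[DiscordantPair], max_isize: int,
                      min_pairs: int = 2) -> Dict[Tuple[str, str], int]:
    """Count size-consistent discordant pairs per (5' gene, 3' gene)."""
    support: Dict[Tuple[str, str], int] = defaultdict(int)
    for gene5, dist5, gene3, dist3 in pairs:
        # Fragment size implied if the two genes were joined at the breakpoint.
        if dist5 + dist3 <= max_isize:
            support[(gene5, gene3)] += 1
    return {key: n for key, n in support.items() if n >= min_pairs}
```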

Step 8: Extract chimeric breakpoint sequences (from the genome FASTA file)

The bowtie indexer is used to create a new alignment index of these breakpoint sequences.
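A minimal sketch of Step 8, assuming 0-based breakpoint coordinates on the + strand and a genome already loaded as a dict of chromosome sequences (strand handling and multi-exon transcripts are ignored). The breakpoint sequence is `flank` bases ending at the 5' breakpoint joined to `flank` bases starting at the 3' breakpoint. The FASTA text it produces could then be indexed with `bowtie-build breakpoints.fa breakpoints` (file names hypothetical).

```python
from typing import Dict


def breakpoint_sequence(genome: Dict[str, str], chrom5: str, bp5: int,
                        chrom3: str, bp3: int, flank: int) -> str:
    """Join sequence upstream of the 5' breakpoint to sequence downstream
    of the 3' breakpoint (0-based, + strand only; a simplification)."""
    left = genome[chrom5][max(0, bp5 - flank):bp5]
    right = genome[chrom3][bp3:bp3 + flank]
    return left + right


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one record as FASTA text with fixed-width sequence lines."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```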

Step 9: Nominate reads that could span breakpoints
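A simplified sketch of the idea behind Step 9, assuming we know, for each read that did not align on its own, where its mate aligned. A read is a candidate for spanning the 5' side of a breakpoint when its mate sits on the same chromosome within one maximum insert size upstream of the breakpoint. The tuple layout is hypothetical, and a full implementation would treat the 3' side symmetrically.

```python
from typing import Iterable, List, Optional, Tuple

# (qname, mate chromosome or None if unmapped, mate 0-based start, read itself mapped?)
Candidate = Tuple[str, Optional[str], int, bool]


def nominate_spanning_reads(reads: Iterable[Candidate], chrom5: str,
                            bp5: int, max_isize: int) -> List[str]:
    """Return qnames that could span the 5' breakpoint (5' side only)."""
    hits = []
    for qname, mate_chrom, mate_pos, self_mapped in reads:
        if self_mapped or mate_chrom is None:
            continue  # already placed, or no anchored mate to localize it
        # The anchored mate must lie within one insert size upstream.
        if mate_chrom == chrom5 and bp5 - max_isize <= mate_pos < bp5:
            hits.append(qname)
    return hits
```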


ChimeraScan Algorithm

Step 10: Align against the breakpoint sequence database (created in Step 8)

Step 11: Assess breakpoint-spanning alignment results (the minimum anchor must be greater than the number of bases that are homologous between the 5' and 3' partners at the breakpoint plus the number of mismatches allowed)

Reads that align to the breakpoint sequence index are discarded if their overlap with the breakpoint is small (less than anchor_min bases), or if the overlap is larger but contains mismatches (red reads).

Reads overlapping the breakpoint by more than anchor_length bases are retained (green read).
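A minimal sketch of the Step 11 rules as described on this slide. The anchor is taken as the shorter part of the read on either side of the breakpoint; `anchor_min` and `anchor_length` follow the slide's names, while `max_anchor_mismatches` and both function names are hypothetical. This is a simplification, not ChimeraScan's exact code.

```python
def anchor_min_is_safe(homologous_bases: int, mismatches_allowed: int,
                       anchor_min: int) -> bool:
    """Slide rule: anchor_min > homologous bases + mismatches allowed."""
    return anchor_min > homologous_bases + mismatches_allowed


def keep_spanning_read(overlap_5p: int, overlap_3p: int,
                       anchor_mismatches: int, anchor_min: int,
                       anchor_length: int,
                       max_anchor_mismatches: int = 0) -> bool:
    """Decide whether a breakpoint-spanning alignment is retained."""
    # The anchor is the shorter part of the read on either side of the junction.
    anchor = min(overlap_5p, overlap_3p)
    if anchor < anchor_min:
        return False  # overlap too small: discard
    if anchor > anchor_length:
        return True   # long anchor: retain (green read)
    # Intermediate anchor: discard if it carries mismatches (red read).
    return anchor_mismatches <= max_anchor_mismatches
```

The first rule guards against homology at the junction: if the 5' and 3' partners share a few identical bases there, an anchor no longer than that shared stretch (plus tolerated mismatches) cannot tell a true junction read from a read that simply belongs to one partner.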


ChimeraScan Algorithm

Step 12: Filter chimeras

The user can specify many filters to minimize the number of false positives (e.g., known false positives, fragment size distribution, number of supporting reads).
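A hypothetical sketch of two of the Step 12 filters: dropping gene pairs on a known-false-positive list and requiring a minimum amount of read support. The `Chimera` fields, the gene-pair key and the threshold are illustrative assumptions, not ChimeraScan's option names or defaults.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass
class Chimera:
    gene5: str
    gene3: str
    supporting_pairs: int  # discordant read pairs
    spanning_reads: int    # breakpoint-spanning reads


def filter_chimeras(chimeras: List[Chimera],
                    known_false_positives: FrozenSet[Tuple[str, str]],
                    min_total_support: int = 2) -> List[Chimera]:
    """Keep chimeras that pass a blacklist and a support threshold."""
    kept = []
    for c in chimeras:
        if (c.gene5, c.gene3) in known_false_positives:
            continue  # listed as a known false positive
        if c.supporting_pairs + c.spanning_reads < min_total_support:
            continue  # too little read evidence
        kept.append(c)
    return kept
```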

Step 13: Produce a text output file (BEDPE file)


How to run ChimeraScan

STEP 1: Generate paired-read FASTQ files from merged BAM files

'bash /projects/trans_scratch/software/deFUSE/scripts/bam2fastq.converter.sh'

INPUT(S):

BAM_FILE_PATH (ABSOLUTE)

LIBRARY_ID

OUTPUT_DIRECTORY


How to run ChimeraScan

STEP 2: Submit ChimeraScan to the cluster

'python /projects/trans_scratch/chimerascan/chimerascan-0.4.5/bin/chimerascan_run.py'

INPUT(S):

-v: verbose (for logging and debugging)

-p: number of processors (tested with -p 8)

chimerascan_index (generated during the ChimeraScan installation)

Fastq_1, Fastq_2 (both generated in step 1)
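A small Python sketch that assembles the STEP 2 command line from the inputs listed above, which can help when generating many cluster jobs. The fourth positional argument (an output directory) is not listed on the slide; it is an assumption based on the chimerascan_run.py usage and should be checked against `chimerascan_run.py --help`. The helper names are hypothetical.

```python
import shlex
from typing import List

CHIMERASCAN_RUN = ("/projects/trans_scratch/chimerascan/"
                   "chimerascan-0.4.5/bin/chimerascan_run.py")


def chimerascan_command(index_dir: str, fastq_1: str, fastq_2: str,
                        output_dir: str, processors: int = 8,
                        verbose: bool = True) -> List[str]:
    """Build the chimerascan_run.py argument list (not executed here)."""
    cmd = ["python", CHIMERASCAN_RUN]
    if verbose:
        cmd.append("-v")
    cmd += ["-p", str(processors)]
    # Positional arguments: index, mate 1, mate 2, output dir (assumed).
    cmd += [index_dir, fastq_1, fastq_2, output_dir]
    return cmd


def as_shell(cmd: List[str]) -> str:
    """Quote the argument list for inclusion in a job script."""
    return shlex.join(cmd)
```

Building the command as a list (rather than one string) keeps paths with unusual characters safe; `as_shell` quotes it only at the point where a job script is written.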


How to run ChimeraScan

Combine steps 1 & 2:

'bash /projects/trans_scratch/chimerascan/chimerascan_setup.sh'

INPUT(S):

PATIENT_ID

LIBRARY_ID

BAM_FILE_PATH

PROJECT_DIRECTORY

OUTPUT(S): 'qsub_all_chimerascan.sh', a script that submits both steps 1 & 2 to the cluster. The jobs run serially (the FASTQ files are created before the ChimeraScan job is submitted).


ChimeraScan Results

Output(S):

ChimeraScan outputs a tabular file, chimeras.bedpe.

The chimeras.bedpe file contains information about the chromosomal regions, transcript IDs, genes, and statistics for each chimera. The file adheres to the BEDPE format for representing paired intervals (courtesy of Aaron Quinlan and the BEDTools project).

The chimeras.bedpe file also contains the numbers of spanning and supporting reads (total score) for each reported event.

Other intermediate files are also created during the run; they are not needed once the run is complete and can be deleted.
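A minimal reader for chimeras.bedpe, assuming a tab-separated file whose header line starts with `#`. The first ten columns follow the generic BEDPE layout (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2); ChimeraScan appends further columns (transcripts, genes, read counts) whose exact order should be taken from the file's own header, so this sketch keeps them as a raw list.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List, TextIO


@dataclass
class BedpeRecord:
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    name: str
    score: str   # kept as text; interpret according to the header
    strand1: str
    strand2: str
    extra: List[str] = field(default_factory=list)  # tool-specific columns


def read_bedpe(handle: TextIO) -> List[BedpeRecord]:
    """Parse BEDPE rows, skipping blank lines and '#' header lines."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        records.append(BedpeRecord(
            row[0], int(row[1]), int(row[2]),
            row[3], int(row[4]), int(row[5]),
            row[6], row[7], row[8], row[9], row[10:]))
    return records


def read_bedpe_text(text: str) -> List[BedpeRecord]:
    """Convenience wrapper for parsing BEDPE content held in a string."""
    return read_bedpe(io.StringIO(text))
```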


ChimeraScan Results

OUR VALIDATION (run settings: 8 cores, 8 parallel jobs):

PROJECT    LIBRARY_ID   TOTAL TIME   TOTAL SPACE
MCF7       A37098       ~23 HRS      178 GB
UHR        Z01229       ~21 HRS      132 GB
COLO-829   A36972       ~20 HRS      157 GB


Limitations: What could be better?

Lack of an injective (one-to-one) mapping from chimeras.bedpe event types to our current set of event types:

Translocation ---> {interchromosomal}

Duplication ---> {intrachromosomal_complex, adjacent_complex}

Deletion ---> {intrachromosomal, intrachromosomal_diverging, intrachromosomal_complex}

Inversion ---> {intrachromosomal_diverging}

Relies on an annotated set of genes (found in the reference index)

High sensitivity, but also a high number of false positives (a trade-off?)
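The mapping problem above can be checked mechanically. This sketch encodes the slide's mapping from our event types to ChimeraScan event types and inverts it; any ChimeraScan type that maps back to more than one of our types cannot be converted automatically. The dictionary contents come from the slide; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set

# Our event types -> ChimeraScan event types, as listed on this slide.
OUR_TO_CHIMERASCAN: Dict[str, Set[str]] = {
    "Translocation": {"interchromosomal"},
    "Duplication": {"intrachromosomal_complex", "adjacent_complex"},
    "Deletion": {"intrachromosomal", "intrachromosomal_diverging",
                 "intrachromosomal_complex"},
    "Inversion": {"intrachromosomal_diverging"},
}


def invert(mapping: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each ChimeraScan type back to the set of our types using it."""
    inverse: Dict[str, Set[str]] = defaultdict(set)
    for ours, theirs in mapping.items():
        for t in theirs:
            inverse[t].add(ours)
    return dict(inverse)


def ambiguous_types(mapping: Dict[str, Set[str]]) -> Set[str]:
    """ChimeraScan types that correspond to more than one of our types."""
    return {t for t, ours in invert(mapping).items() if len(ours) > 1}
```

With the slide's mapping, `intrachromosomal_complex` (Duplication or Deletion) and `intrachromosomal_diverging` (Deletion or Inversion) are the two ambiguous types.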


Comparison with current software

MCF7 LIBRARY ChimeraScan DeFUSE(filtered)

Trans-Abyss (1.4.8)

Total events 629 503 161

Validated events found

32/89 35/89 33/89

Validated events not found

57 54 56

Novel events found

2 3 4

89 events were listed in the publications

18 events out of 89 were novel events

71 events out of 89 were previously known events

Note: The events for trans-abyss were taken from the 'sense_fusions.tsv' tabular file.
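The table's rows can be reproduced from three sets: a tool's predicted events, the 89 validated events, and the 18 novel events among them. This sketch assumes events are matched by some hashable key (for example an ordered gene pair); the matching criterion and the function name are assumptions. As worked arithmetic from the table, the fraction of validated events recovered is 32/89 ≈ 36% for ChimeraScan, 35/89 ≈ 39% for deFUSE and 33/89 ≈ 37% for Trans-Abyss.

```python
from typing import Dict, Hashable, Set


def validation_summary(predicted: Set[Hashable], validated: Set[Hashable],
                       novel: Set[Hashable]) -> Dict[str, object]:
    """Summarize one tool's events against a validated event list."""
    found = predicted & validated
    return {
        "total_events": len(predicted),
        "validated_found": f"{len(found)}/{len(validated)}",
        "validated_not_found": len(validated - found),
        "novel_found": len(found & novel),  # novel = subset of validated
        "recall": len(found) / len(validated),
    }
```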


Comparison with current software (Validated Events)

Library: MCF7 (A37098)

Total Events Found: 45/89

Events unique to ChimeraScan: 2/89

Events unique to deFUSE: 5/89

Events unique to Trans-Abyss: 5/89

[Venn diagram: overlap of validated events found by Trans-Abyss, deFUSE and ChimeraScan]
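The counts in these Venn slides can be derived by grouping each event by the exact set of tools that report it: "Events unique to X" is the region keyed by {X} alone, and "Total Events Found" is the size of the union. This sketch assumes events from different tools have already been matched to a common hashable key; that matching step is an assumption and is not shown.

```python
from typing import Dict, FrozenSet, Hashable, Set


def venn_regions(sets: Dict[str, Set[Hashable]]) -> Dict[FrozenSet[str], int]:
    """Count events in each exclusive region of a Venn diagram.

    The key is the exact set of tools reporting the event.
    """
    membership: Dict[Hashable, Set[str]] = {}
    for tool, events in sets.items():
        for event in events:
            membership.setdefault(event, set()).add(tool)
    counts: Dict[FrozenSet[str], int] = {}
    for tools in membership.values():
        key = frozenset(tools)
        counts[key] = counts.get(key, 0) + 1
    return counts
```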


Comparison with current software (All Events)

Library: MCF7 (A37098)

Total Events Found: 10,160

Events unique to ChimeraScan: 587/10,160

Events unique to deFUSE: 502/10,160

Events unique to Trans-Abyss: 8857/10,160

[Venn diagram: overlap of all events found by Trans-Abyss, deFUSE and ChimeraScan]


Comparison with current software

UHR LIBRARY                  ChimeraScan   deFUSE (filtered)   Trans-Abyss (1.4.8)
Total events                 1304          192                 78
Validated events found       21/68         14/68               21/68
Validated events not found   47            54                  47

68 events were listed in the publications

14 events out of 68 were externally verified events

44 events out of 68 were previously known events

Note: The events for Trans-Abyss were taken from the 'sense_fusions.tsv' tabular file.


Comparison with current software (Validated Events)

Library: UHR (Z01229)

Total Events Found: 28/68

Events unique to ChimeraScan: 4/68

Events unique to deFUSE: 0/68

Events unique to Trans-Abyss: 5/68

[Venn diagram: overlap of validated events found by Trans-Abyss, deFUSE and ChimeraScan]


Comparison with current software (All Events)

Library: UHR (Z01229)

Total Events Found: 18,015

Events unique to ChimeraScan: 1279/18,015

Events unique to deFUSE: 154/18,015

Events unique to Trans-Abyss: 16,558/18,015

[Venn diagram: overlap of all events found by Trans-Abyss, deFUSE and ChimeraScan]


Comparison with current software (All Events)

Library: COLO-829 (A36972)

Total Events Found: 3,361

Events unique to ChimeraScan: 458/3,361

Events unique to deFUSE: 225/3,361

Events unique to Trans-Abyss: 2,668/3,361

[Venn diagram: overlap of all events found by Trans-Abyss, deFUSE and ChimeraScan]


What's Next?

• Improve runtime

• Find an injective mapping from chimeras.bedpe event types to our current set of event types


Reference(s)

• Iyer MK, Chinnaiyan AM, Maher CA. ChimeraScan: a tool for identifying chimeric transcription in sequencing data. Bioinformatics. 2011;27(20):2903-2904. doi:10.1093/bioinformatics/btr467.

• Weirather JL, Afshar PT, Clark TA, et al. Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing. Nucleic Acids Research. 2015;43(18):e116. doi:10.1093/nar/gkv562.


ACKNOWLEDGEMENTS

Karen Mungall, Caleb Choo