Chapter 6: Structural Variation and Medical Genomics CS-6293 Bioinformatics Instructor: Dr. Jianhua Ruan Presented by: Nesthor Perez


Page 1: Chapter 6:  Structural Variation and  Medical Genomics

Chapter 6: Structural Variation and Medical Genomics

CS-6293 Bioinformatics

Instructor: Dr. Jianhua Ruan

Presented by: Nesthor Perez

Page 2: Chapter 6:  Structural Variation and  Medical Genomics

Outline

1. Introduction

2. Germline and Somatic SVs

3. Technologies for Measurement of SVs

4. Resequencing Strategies for SVs

5. Representation of SVs

6. Challenges for Cancer Genomics

7. Future Prospects

Page 4: Chapter 6:  Structural Variation and  Medical Genomics

1. Introduction

• Every human has a different genome.
• Each genome carries particular traits related to disease.
• Genome-wide association studies (GWAS) have identified common germline DNA variants associated with diabetes, heart disease, and other diseases.
• GWAS have explained only a fraction of the heritability of these traits.

Page 5: Chapter 6:  Structural Variation and  Medical Genomics

1. Introduction

Every person has a different genome sequence.

Depending on each person's genetics and genome, specific traits are associated with each disease.

Page 6: Chapter 6:  Structural Variation and  Medical Genomics

1. Introduction

• Cancer genome sequencing studies have identified somatic mutations associated with cancer progression.
• These mutations are very heterogeneous.
• Few mutations are shared between patients.
• This makes it hard to link individual mutations to the causes of cancer.
• Comprehensive studies must consider all variants, so an individual genome is required for each case.

Page 7: Chapter 6:  Structural Variation and  Medical Genomics

1. Introduction

• GWAS focus on single nucleotide polymorphisms (SNPs); every human genome is unique.
• Germline variants occur at scales of DNA sequence ranging from SNPs to structural variants (SVs).
• Examples of SVs:
– Duplications.
– Deletions.
– Inversions.
– Translocations.

Page 8: Chapter 6:  Structural Variation and  Medical Genomics

1. Introduction

• GWAS then identified common SNPs:
– Common SNPs for common diseases (similarities).
– Common variants between diseases (differences).
• Main purpose: disease association and cancer genetics studies.
• In the last 5 years, next-generation DNA sequencing technology has become commercially available from companies such as:
– Illumina.
– Life Technologies.
– Complete Genomics.

Page 9: Chapter 6:  Structural Variation and  Medical Genomics

1. Introduction

[Figure: chromosome components]

Page 10: Chapter 6:  Structural Variation and  Medical Genomics

1. Introduction

Variation relative to a reference genome ranges from SNPs to structural variants. [Figure]

Page 11: Chapter 6:  Structural Variation and  Medical Genomics

1. Introduction

In the last 5 years, these companies have developed new sequencing technology. [Figure]

Consequently, the cost of DNA sequencing has decreased.

Page 12: Chapter 6:  Structural Variation and  Medical Genomics

1. Introduction

• Consequently, the cost of DNA sequencing has decreased.
• With low-cost sequencing, studying all variants becomes possible.
• All variants:
– Germline.
– Somatic.
– SNPs (single nucleotide polymorphisms).
– SVs (structural variants).
• This chapter discusses these sequencing technologies, with a focus on structural variants (SVs).

Page 15: Chapter 6:  Structural Variation and  Medical Genomics

2.1 Germline Structural Variation

• A major goal of human genetics: identify what makes each individual's DNA sequence unique.
• Attempts:
– Identifying common SNPs (HapMap project).
– Whole-genome sequencing and microarray measurements found common SVs, including duplications, deletions, and inversions.
• Common SVs have since been linked to:
– Autism.
– Schizophrenia.

Page 16: Chapter 6:  Structural Variation and  Medical Genomics

2.1 Germline Structural Variation

Purpose of human genetics studies: identify what makes each individual's DNA sequence unique.

Steps:

• Identify common SNPs.
• Whole-genome sequencing and microarray measurements found similar SVs:
– Duplications.
– Deletions.
– Inversions.

[Figure label: large DNA sequence]

Page 17: Chapter 6:  Structural Variation and  Medical Genomics

2.2 Somatic Structural Variation

• Cancer is driven by somatic mutations accumulated over a lifetime: a "micro-evolutionary process".
• Early studies focused on leukemia and lymphoma.
• These studies identified "recurrent chromosomal rearrangements".
• Such rearrangements are present in many patients with the same cancer.
• Next-generation DNA sequencing can reconstruct how cancer genomes are organized at single-nucleotide resolution.

Page 18: Chapter 6:  Structural Variation and  Medical Genomics

2.3 Mechanisms of Structural Variation

• Based on the amount of sequence similarity (homology) at the breakpoints of SVs, there are two mechanisms:
– NHEJ (non-homologous end joining): little or no sequence similarity.
– NAHR (non-allelic homologous recombination): high sequence similarity.
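
As an illustration of the homology criterion above, the following minimal sketch compares the sequences flanking the two breakpoints of an SV. The function names, the ungapped position-by-position comparison, and the 0.9 identity threshold are assumptions made for this example, not values from the chapter; real NAHR events typically involve far longer homologous stretches than these toy strings.

```python
def flank_identity(flank_a: str, flank_b: str) -> float:
    """Fraction of identical positions between two equal-length flanking sequences."""
    if not flank_a or len(flank_a) != len(flank_b):
        raise ValueError("flanks must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(flank_a, flank_b))
    return matches / len(flank_a)


def likely_mechanism(flank_a: str, flank_b: str, threshold: float = 0.9) -> str:
    """Label a breakpoint pair by homology (threshold is a hypothetical cutoff)."""
    if flank_identity(flank_a, flank_b) >= threshold:
        return "NAHR-like"   # high sequence similarity at the breakpoints
    return "NHEJ-like"       # little or no sequence similarity


# Toy usage: identical flanks versus unrelated flanks.
same = likely_mechanism("ACGTACGTAC", "ACGTACGTAC")
different = likely_mechanism("ACGTACGTAC", "TTGCAGTCCA")
```

A production classifier would align the flanks with gaps and consider homology length, but the thresholding idea is the same.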

Page 19: Chapter 6:  Structural Variation and  Medical Genomics

2.3 Mechanisms of Structural Variation

Cytogenetic techniques (figures):

• Chromosome painting.
• Fluorescent in situ hybridization (FISH).

Page 25: Chapter 6:  Structural Variation and  Medical Genomics

3. Technologies for Measurement of Structural Variation

• SVs are characterized by:
– Size.
– Complexity.
– Range: from hundreds of nucleotides to large-scale chromosomal rearrangements.
• Cytogenetic techniques:
– Chromosome painting.
– Spectral karyotyping (SKY).
– Fluorescent in situ hybridization (FISH).

Page 26: Chapter 6:  Structural Variation and  Medical Genomics

3. Technologies for Measurement of Structural Variation

• Large SVs can be observed directly on chromosomes. [Figure]

Page 27: Chapter 6:  Structural Variation and  Medical Genomics

3.1 Microarrays

• This technology was used for the first genome-wide surveys in 2004.
• The technique applies array comparative genomic hybridization (aCGH).
• The reference genome is labeled with a fluorescent dye, in a different color from the test genome.
• Arrays with hundreds of thousands of probes are now available.
• Because individual copy number ratios are subject to experimental error, computational techniques are required to analyze aCGH data.
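
To make the last point concrete, here is a minimal sketch of one simple way to tame noisy per-probe ratios: smooth the log2(test/reference) ratios with a running median, then threshold them. The function names, the window size, and the ±0.3 thresholds are assumptions chosen for illustration; dedicated segmentation algorithms are used in practice.

```python
from statistics import median


def smooth_log2_ratios(ratios, window=3):
    """Running median over neighbouring probes to suppress single-probe noise."""
    half = window // 2
    return [
        median(ratios[max(0, i - half): i + half + 1])
        for i in range(len(ratios))
    ]


def call_probes(ratios, window=3, gain=0.3, loss=-0.3):
    """Label each probe as gain, loss, or neutral (thresholds are hypothetical)."""
    calls = []
    for value in smooth_log2_ratios(ratios, window):
        if value >= gain:
            calls.append("gain")
        elif value <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Toy data: one noisy outlier probe (1.5) and a three-probe deletion.
example_ratios = [0.0, 0.1, 1.5, -0.1, 0.0, -1.0, -0.9, -1.1, 0.0, 0.1]
example_calls = call_probes(example_ratios)
```

The isolated outlier is smoothed away, while the run of consistently low ratios is still reported as a loss.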

Page 29: Chapter 6:  Structural Variation and  Medical Genomics

3.1 Microarrays

• aCGH can measure both germline SVs in normal genomes and somatic SVs in cancer genomes.
• aCGH was initially developed for cancer genomics applications.
• aCGH is now also used to detect copy number variants in large numbers of genomes at low cost.
• aCGH limitations:
– Detects only copy number variants.
– Requires that genomic probes from the reference genome lie in non-repetitive regions.

Page 30: Chapter 6:  Structural Variation and  Medical Genomics

3.2 Next-generation DNA Sequencing Technologies

• As DNA sequencing technology has become more sophisticated, the cost of DNA analysis has fallen substantially.
• One limitation is the length of the DNA molecule that can be sequenced.
• Short reads range from 30 to 1000 nucleotides, or base pairs (bp).

Page 31: Chapter 6:  Structural Variation and  Medical Genomics

3.2 Next-generation DNA Sequencing Technologies

• Some sequencing technologies use a paired-end protocol to increase the effective read length.
• In earlier Sanger sequencing protocols, the DNA fragment size depended on the cloning vector.
• In next-generation technologies, several techniques have been used to generate paired reads.
• Current techniques produce paired reads from fragments ranging from a few hundred bp to 2-3 kb.

Page 32: Chapter 6:  Structural Variation and  Medical Genomics

3.2 Next-generation DNA Sequencing Technologies

• Next-generation sequencing technologies have limited read lengths and limited insert sizes compared with Sanger sequencing.
• There are two approaches to detecting SVs with next-generation sequencing. The first is de novo assembly:
– Sophisticated algorithms reconstruct the genome sequence from overlaps between reads.
– Human genome assemblies built this way are highly fragmented.
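
The overlap idea behind assembly can be sketched in a few lines. This is a toy illustration only: the function names and the minimum overlap length are assumptions, and real assemblers handle sequencing errors, repeats, and billions of reads with far more elaborate data structures.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_reads(a: str, b: str, min_len: int = 3):
    """Merge two reads on their overlap, or return None if they do not overlap."""
    k = overlap(a, b, min_len)
    if k == 0:
        return None
    return a + b[k:]


# Toy usage: the last three bases of the first read start the second read.
contig = merge_reads("ACGTAC", "TACGGA")
```

Repetitive sequence makes many such overlaps ambiguous, which is one reason assemblies from short reads end up fragmented.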

Page 33: Chapter 6:  Structural Variation and  Medical Genomics

3.2 Next-generation DNA Sequencing Technologies

• There are two approaches to detecting SVs with next-generation sequencing. The second is resequencing:
– Differences are sought between an individual genome and a closely related reference genome.
– These differences are inferred from the differences between the aligned reads and the reference sequence.
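
A minimal sketch of that comparison, assuming the read has already been aligned without gaps at a known position (the function name and the tuple layout are hypothetical):

```python
def read_differences(read: str, reference: str, pos: int):
    """List (reference_position, reference_base, read_base) for each mismatch.

    Assumes an ungapped alignment of `read` starting at `pos` in `reference`.
    """
    if pos < 0 or pos + len(read) > len(reference):
        raise ValueError("read does not fit inside the reference at this position")
    return [
        (pos + i, reference[pos + i], base)
        for i, base in enumerate(read)
        if reference[pos + i] != base
    ]


# Toy usage: the read differs from the reference at one position.
diffs = read_differences("GTTC", "ACGTACGTAC", 2)
```

Single-base differences like this one are the easy case; structural variants show up as more complex alignment signatures.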

Page 34: Chapter 6:  Structural Variation and  Medical Genomics

3.2 Next-generation DNA Sequencing Technologies

From earlier sequencing generations to new sequencing technology:

• Advantages: [figure]
• Disadvantages:
– Limited length of the DNA molecule that can be sequenced.
– Today's technologies produce short sequences of DNA, ranging from 30 to 1000 nucleotides.
• To increase the effective read length, these sequencing technologies use paired-end or mate-pair protocols.

Page 35: Chapter 6:  Structural Variation and  Medical Genomics

3.2 Next-generation DNA Sequencing Technologies

There are two approaches to detecting SVs: de novo assembly and resequencing. [Figure]

Page 36: Chapter 6:  Structural Variation and  Medical Genomics

3.3 New DNA Sequencing Technologies

• Previous sequencing technologies face several limitations.
• For example: SV breakpoints in highly repetitive sequences.
• Third-generation and single-molecule technologies offer additional advantages for SV detection:
– Longer read lengths.
– Easier sample preparation.
– Lower input DNA requirements.
– Higher throughput.

Page 37: Chapter 6:  Structural Variation and  Medical Genomics

3.3 New DNA Sequencing Technologies

• Expected improvements from third-generation technologies:
– Paired reads that include more than two reads from a single DNA fragment.
– Long-range sequence information with low input DNA requirements.
• Sequencing technologies continue to develop rapidly thanks to improvements in:
– Chemistry.
– Imaging.
– Manufacturing technology.

Page 38: Chapter 6:  Structural Variation and  Medical Genomics

3.3 New DNA Sequencing Technologies

• Further improvements are expected in:
– Read lengths.
– Insert lengths.
– Throughput.
• A newer technology is nanopore sequencing, which directly reads the nucleotides of long DNA molecules and represents a dramatic advance.
• Nanopore sequencing generates extremely long reads (tens of kb).

Page 39: Chapter 6:  Structural Variation and  Medical Genomics

3.3 New DNA Sequencing Technologies

New features (figures):

• Longer read lengths.
• Higher throughput.
• Easier sample preparation.
• Lower input DNA requirements.

Active development continues thanks to improvements in chemistry, image processing, and data processing.

Page 45: Chapter 6:  Structural Variation and  Medical Genomics

4. Resequencing Strategies for Structural Variation

• Purpose: predict SVs from alignments of sequence reads to the reference genome.
• Steps:
– Alignment of reads.
– Prediction of SVs from the alignments.
• Resequencing is straightforward in principle, but detecting SVs in human genomes is difficult in practice.
• Some types of SVs are easy to detect; others are very difficult.

Page 46: Chapter 6:  Structural Variation and  Medical Genomics

4. Resequencing Strategies for Structural Variation

Step 1: Alignment of reads. [Figure: reads]

Page 47: Chapter 6:  Structural Variation and  Medical Genomics

4. Resequencing Strategies for Structural Variation

Step 2: Prediction of SVs from alignments. [Figure label: "Disease"]

Page 48: Chapter 6:  Structural Variation and  Medical Genomics

4. Resequencing Strategies for Structural Variation

• Some SVs are hard to detect because of technological limitations and biological features.
• Technological limitations:
– Sequencing errors.
– Limited read lengths.
– Limited insert sizes.
• Biological features of SVs:
– Enrichment for repetitive sequences near their breakpoints.
– Overlap: multiple states or complex architectures.
– Recurrent variants at the same locus.

Page 49: Chapter 6:  Structural Variation and  Medical Genomics

4. Resequencing Strategies for Structural Variation

• Therefore, aligning reads and predicting SVs are not easy tasks.
• Effective algorithms are required for highly sensitive and specific prediction of SVs.
• Three approaches to identify SVs from aligned reads:
– Split reads.
– Depth of coverage analysis.
– Paired-end mapping.

Page 50: Chapter 6:  Structural Variation and  Medical Genomics

4.1 Read Alignment

• This is one of the most heavily researched problems in bioinformatics.
• The specialized task of aligning millions to billions of individual short reads is performed by software such as:
– Maq.
– BWA.
– Bowtie/Bowtie2.
– BFAST.
– mrsFAST.

Page 51: Chapter 6:  Structural Variation and  Medical Genomics

4.1 Read Alignment

• An aligner can report a single alignment for each read, or report all high-quality alignments for reads that have more than one.
• Another option is to choose one alignment at random when a read has multiple alignments of equal score.
• With unique alignments only, SVs with breakpoints in repetitive regions are hard to detect.
• With ambiguous alignments, SV prediction requires an algorithm that distinguishes between the multiple possible alignments of each read.
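
The difference between unique and ambiguous alignments can be seen with a toy exact-match search. This is only a sketch under simplifying assumptions (exact matching, forward strand only, hypothetical function names); the aligners listed earlier use indexed, error-tolerant algorithms.

```python
def exact_alignments(read: str, reference: str):
    """Return every start position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)  # allow overlapping matches
    return positions


def alignment_status(read: str, reference: str) -> str:
    """Classify a read as unaligned, unique, or ambiguous."""
    hits = len(exact_alignments(read, reference))
    if hits == 0:
        return "unaligned"
    return "unique" if hits == 1 else "ambiguous"


# Toy reference containing the repeat "ACGT" at two positions.
toy_reference = "ACGTTTACGTAAGGC"
repeat_hits = exact_alignments("ACGT", toy_reference)
```

A read falling inside the repeat cannot be placed with certainty, which is exactly the situation that complicates SV detection near repetitive breakpoints.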

Page 52: Chapter 6:  Structural Variation and  Medical Genomics

4.2 Split Reads

• This is a direct approach: it detects SVs from reads whose alignment to the reference is split into two parts.
• Multiple split reads are required to reduce false positive predictions.
• The split-read approach is feasible only when reads are sufficiently long.
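
The following toy sketch shows the split-read idea for a deletion: a read that does not align contiguously is cut in two, and each part is placed separately on the reference. Exact matching, the function name, and the `min_part` parameter are assumptions for illustration; real tools tolerate mismatches and require support from several reads.

```python
def find_split_alignment(read: str, reference: str, min_part: int = 8):
    """Return (prefix_pos, split_index, suffix_pos) or None.

    The read is reported as split when its prefix and suffix both match the
    reference exactly, in order, but the read as a whole does not.
    """
    if read in reference:
        return None  # aligns contiguously, so no split is needed
    for split in range(min_part, len(read) - min_part + 1):
        prefix_pos = reference.find(read[:split])
        if prefix_pos == -1:
            continue
        suffix_pos = reference.find(read[split:], prefix_pos + split)
        if suffix_pos != -1:
            return prefix_pos, split, suffix_pos
    return None


# Toy example: the sample genome lacks the 12 bp `deleted` segment.
left, deleted, right = "ACGTTGCAAGCTTAGC", "TTTTTTTTTTTT", "GGATCCGTAACGCTAG"
toy_reference = left + deleted + right
spanning_read = left[-10:] + right[:10]  # read crossing the deletion junction

hit = find_split_alignment(spanning_read, toy_reference)
if hit is not None:
    prefix_pos, split, suffix_pos = hit
    # The interval between the two aligned parts is the candidate deletion.
    deletion_interval = (prefix_pos + split, suffix_pos)
```

Each part must be long enough to be placed unambiguously, which is why the approach depends on sufficiently long reads.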

Page 53: Chapter 6:  Structural Variation and  Medical Genomics

4.3 Depth of Coverage

• Depth of coverage analysis detects differences in the number of reads that align to intervals of the reference genome.
• The average number of reads covering a nucleotide is $c = \frac{NL}{G}$, where:
– N is the number of reads.
– L is the length of each read.
– G is the length of the genome.
– c is the coverage.
• For example, "30X coverage" means that each nucleotide is covered by c = 30 reads on average.
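
The coverage formula can be checked with a short calculation. The genome length of 3 billion bp and the read length of 100 bp below are round numbers assumed for the example, and the function names are hypothetical.

```python
import math


def coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average coverage c = N * L / G."""
    return num_reads * read_length / genome_length


def reads_needed(target_coverage: float, read_length: int, genome_length: int) -> int:
    """Number of reads N required to reach a target coverage c (N = c * G / L)."""
    return math.ceil(target_coverage * genome_length / read_length)


# Assumed round numbers: G = 3e9 bp, L = 100 bp.
GENOME_LENGTH = 3_000_000_000
READ_LENGTH = 100

c = coverage(900_000_000, READ_LENGTH, GENOME_LENGTH)   # 900 million reads
n = reads_needed(30, READ_LENGTH, GENOME_LENGTH)        # reads for 30X
```

Under these assumptions, 900 million reads of 100 bp give 30X coverage.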

Page 54: Chapter 6:  Structural Variation and  Medical Genomics

4.3 Depth of Coverage

• If an individual genome has lost one of its two copies of a segment (a heterozygous deletion), coverage of that segment drops to half.
• If an interval of the reference genome is duplicated or amplified, coverage increases in proportion to the number of copies.
• The depth of coverage therefore indicates the number of copies of the interval in the genome.
• Coverage calculations are affected by repetitive sequences.
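
A minimal sketch of turning depth into copy number, assuming a diploid genome and using the median window depth as the two-copy baseline. The function name and the toy depths are assumptions; real methods also correct for GC content and mappability.

```python
from statistics import median


def estimate_copy_number(window_depths, ploidy: int = 2):
    """Estimate an integer copy number per window from read depth.

    The median depth is taken as the baseline that corresponds to `ploidy`
    copies, so copy number is roughly ploidy * depth / baseline.
    """
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    return [round(ploidy * depth / baseline) for depth in window_depths]


# Toy depths: a heterozygous deletion (about 15X) and an amplification (about 60X)
# on a 30X background.
toy_depths = [30, 31, 29, 15, 14, 30, 61, 59, 30, 30]
copy_numbers = estimate_copy_number(toy_depths)
```

Windows near half the baseline come out as one copy, and windows near twice the baseline come out as four copies.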

Page 55: Chapter 6:  Structural Variation and  Medical Genomics

4.4 Paired-end Sequencing and Mapping

• This is the most common resequencing approach.
• It is used to identify both somatic SVs in cancer genomes and germline SVs.
• It is supported by several next-generation sequencing technologies.
• Paired reads are obtained from opposite ends of a larger DNA fragment.
• The exact length of any particular sequenced fragment is unknown.
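
Because the length of each individual fragment is unknown, paired-end analysis typically compares the mapped distance of a pair against the expected fragment-size range. The sketch below assumes a mean fragment size of 300 bp with a standard deviation of 30 bp, a cutoff of three standard deviations, and orientation codes "FR", "FF", "RR", and "RF"; all of these, and the function name, are illustrative assumptions rather than values from the chapter.

```python
def classify_pair(mapped_distance: int, orientation: str,
                  mean_insert: float = 300.0, sd_insert: float = 30.0,
                  k: float = 3.0) -> str:
    """Classify one read pair from its mapped distance and orientation.

    "FR" (forward, then reverse) is taken as the expected orientation.
    """
    if orientation in ("FF", "RR"):
        return "inversion_candidate"
    if orientation == "RF":
        return "tandem_duplication_candidate"
    if orientation != "FR":
        raise ValueError(f"unknown orientation: {orientation}")
    if mapped_distance > mean_insert + k * sd_insert:
        # Reads map farther apart than the fragment allows: sequence present in
        # the reference is missing from the sample.
        return "deletion_candidate"
    if mapped_distance < mean_insert - k * sd_insert:
        # Reads map closer together than expected: the sample carries extra sequence.
        return "insertion_candidate"
    return "concordant"


# Toy usage with the assumed 300 +/- 90 bp concordant range.
labels = [
    classify_pair(310, "FR"),
    classify_pair(1500, "FR"),
    classify_pair(100, "FR"),
    classify_pair(300, "FF"),
]
```

As with split reads, a single discordant pair is weak evidence; callers usually require a cluster of pairs supporting the same event.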

Page 58: Chapter 6:  Structural Variation and  Medical Genomics

5. Representation of Structural Variants

• Recent DNA technologies have reduced the cost of surveying SVs.
• The Cancer Genome Atlas (TCGA) is performing paired-end sequencing and aCGH on many human genomes.
• Microarray-based techniques, meanwhile, are used for small or single-investigator projects.
• An enormous number of SV measurements is therefore expected in the future.

Page 61: Chapter 6:  Structural Variation and  Medical Genomics

6. Challenges for Cancer Genomics Studies

• Most cancer genomes are aneuploid, so the number of copies of genomic regions varies.
• High-resolution reconstruction of cancer genomes is needed because many rearrangements are too small to be detected by cytogenetics.
• A tumor is a heterogeneous mixture of cells, possibly with differing numbers of mutations.
• Heterogeneity refers to admixture (with normal cells) and to subpopulations of tumor cells.
• Some subpopulations contain mutations that others lack.
• Most cancer genome studies do not sequence single tumor cells; they sequence a mixture of cells.
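
One consequence of sequencing a mixture can be sketched numerically: normal cells dilute the copy number signal of the tumor cells. The linear mixing model, the function name, and the `purity` parameter (fraction of tumor cells in the sample) are simplifying assumptions for illustration, and subpopulations within the tumor are ignored.

```python
def expected_depth_ratio(purity: float, tumor_copies: int, normal_copies: int = 2) -> float:
    """Expected read depth of a region relative to a normal two-copy region.

    Assumes a simple two-component mixture: a fraction `purity` of tumor cells
    carrying `tumor_copies` copies, and the rest normal cells.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mixed_copies = purity * tumor_copies + (1.0 - purity) * normal_copies
    return mixed_copies / normal_copies


# A one-copy loss in a pure tumor versus the same loss in a 50% pure sample.
pure_ratio = expected_depth_ratio(1.0, 1)
mixed_ratio = expected_depth_ratio(0.5, 1)
```

In the pure sample the deleted region sits at half the normal depth, but in the mixed sample it sits much closer to normal, so the same deletion is harder to detect.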

Page 64: Chapter 6:  Structural Variation and  Medical Genomics

7. Future Prospects

• It will become possible to systematically measure nearly all but the most complex variants in an individual genome.
• SVs between nearly identical sequences might remain inaccessible until significantly different types of DNA sequencing technologies become available.
• Once a complete list of germline SVs is available, unexplained heritability for a trait can no longer readily be attributed to a lack of measured genetic information.
• Establishing the efficacy of particular treatments will require additional hard work before successful results are achieved.

Page 65: Chapter 6:  Structural Variation and  Medical Genomics

Thanks