Exome sequencing and complex disease


1

EXOME SEQUENCING AND COMPLEX DISEASE: practical aspects of rare variant association studies

Alice Bouchoms

Amaury Vanvinckenroye

Maxime Legrand

2

WHAT IS EXOME SEQUENCING?

Exon: a coding segment of a gene

Exome sequencing:

Aim: to sequence the coding part of the DNA, i.e. the exons
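Exome sequencing works from a predefined list of exon intervals. As a minimal sketch, assuming the targets are stored as sorted, non-overlapping, 0-based half-open intervals on one chromosome (the coordinates and the names `EXONS`, `in_exome` and `exome_size` are hypothetical), one could check whether a position falls inside the targeted exome like this:

```python
from bisect import bisect_right

# Hypothetical exon intervals on one chromosome, 0-based half-open [start, end),
# as they might be read from a capture-kit BED file (coordinates are made up).
EXONS = [(100, 250), (400, 520), (900, 1_100)]

def in_exome(pos: int, exons=EXONS) -> bool:
    """Return True if a 0-based position lies inside one of the sorted, non-overlapping exons."""
    starts = [start for start, _ in exons]
    i = bisect_right(starts, pos) - 1   # index of the last exon starting at or before pos
    return i >= 0 and pos < exons[i][1]

def exome_size(exons=EXONS) -> int:
    """Total number of targeted bases."""
    return sum(end - start for start, end in exons)

print(in_exome(120), in_exome(300), exome_size())  # True False 470
```

Binary search keeps each lookup logarithmic in the number of exons, which matters when the target list holds hundreds of thousands of intervals.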

3

INTRODUCTION

GWAS: helped discover common variants

Exome sequencing: also captures rare coding variants; faster, and better with large samples (> 10,000 individuals)
Before 2010: only a few publications on PubMed
Now: more than 2,000 publications on PubMed

[Figure: publication counts for 2011, 2012 and 2013]

4

KEY QUESTIONS TO ASK YOURSELF

5

STUDY DESIGN

State objectives; focus on extreme outcomes

• Unusual phenotypes or traits
• But careful: de novo mutations

Geographical restrictions?
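Focusing on extreme outcomes often means sequencing only the individuals in the two tails of a quantitative trait. A minimal sketch, assuming the trait is available as a mapping from hypothetical sample ids to values and that 5% per tail is wanted (both choices are illustrative, as is the simulated cohort):

```python
import random

def extreme_sample(trait: dict[str, float], fraction: float = 0.05) -> tuple[list[str], list[str]]:
    """Pick the individuals in the lower and upper tails of a quantitative trait.

    Returns (low_tail_ids, high_tail_ids), each holding about `fraction` of the cohort.
    """
    ranked = sorted(trait, key=trait.get)       # ids ordered by trait value
    k = max(1, int(len(ranked) * fraction))     # size of each tail
    return ranked[:k], ranked[-k:]

random.seed(1)
cohort = {f"id{i}": random.gauss(3.0, 0.8) for i in range(1000)}  # simulated trait values
low, high = extreme_sample(cohort, 0.05)
print(len(low), len(high))  # 50 50
```

Sampling the tails enriches for carriers of large-effect variants, which is the point of the design; the warning on de novo mutations still applies, because an extreme phenotype may also be caused by a single new mutation.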

6

STUDY DESIGN

Sequencing strategy?

Quality of the sample: 20x or greater coverage

Depth of sequencing per person: 60x or greater
Non-coding regions: can still be useful

Determining ancestry or estimating genotypes: 0.2x to 2x
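The coverage figures above follow from simple arithmetic: average depth is the number of sequenced bases landing on the target divided by the target size, and the 20x criterion is the fraction of target bases reaching that depth. A minimal sketch with illustrative numbers only (a 30 Mb target, 100 bp reads and, unrealistically, every read on target; the function names are hypothetical):

```python
def mean_depth(n_reads: int, read_length: int, target_bp: int) -> float:
    """Expected average depth = total sequenced bases on target / size of target."""
    return n_reads * read_length / target_bp

def fraction_at_least(depths: list[int], threshold: int = 20) -> float:
    """Fraction of targeted bases covered by at least `threshold` reads."""
    return sum(d >= threshold for d in depths) / len(depths)

# Assumes all 18 million reads fall on the 30 Mb target (no off-target loss).
print(mean_depth(18_000_000, 100, 30_000_000))                 # 60.0
print(fraction_at_least([5, 18, 22, 40, 61, 75, 90, 33], 20))  # 0.75
```

Because depth is uneven across targets, a 60x average still leaves some bases below 20x, which is why both numbers are worth reporting.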

7

VARIANT CALLING

Goal: obtain high-quality genotypes. Several steps:

Check for DNA contamination and DNA fingerprints; good sample follow-up?

Alignment to the reference genome, recalibration of base quality scores, removal of duplicate reads.
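Duplicate reads (typically PCR or optical copies of one DNA fragment) would otherwise be counted as independent evidence for a variant. A simplified sketch, assuming single-end reads identified by chromosome, leftmost aligned position and strand (the `Read` class and `remove_duplicates` are hypothetical; real tools key on the unclipped 5' end and on both mates of a pair):

```python
from dataclasses import dataclass

@dataclass
class Read:
    name: str
    chrom: str
    start: int        # leftmost aligned position
    strand: str       # "+" or "-"
    mean_qual: float  # mean base quality of the read

def remove_duplicates(reads: list[Read]) -> list[Read]:
    """Keep one read per (chrom, start, strand); the highest-quality copy wins."""
    best: dict[tuple[str, int, str], Read] = {}
    for r in reads:
        key = (r.chrom, r.start, r.strand)
        if key not in best or r.mean_qual > best[key].mean_qual:
            best[key] = r
    return list(best.values())

reads = [Read("a", "chr1", 1000, "+", 30.0), Read("b", "chr1", 1000, "+", 35.0),
         Read("c", "chr1", 1000, "-", 32.0), Read("d", "chr1", 1500, "+", 28.0)]
print(sorted(r.name for r in remove_duplicates(reads)))  # ['b', 'c', 'd']
```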

8

VARIANT CALLING

After read mapping: sample quality metrics (to spot samples with outlier properties)

Variant calling:

Look for differences between the reads overlapping each position and the reference genome
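The idea of variant calling can be illustrated with a deliberately naive sketch: at one reference position, count how many overlapping reads disagree with the reference base and turn that fraction into a genotype. The 0.2 / 0.8 cut-offs, the 20-read minimum and the function name `call_site` are assumptions for illustration; production callers model base qualities and genotype likelihoods instead.

```python
from typing import Optional

def call_site(ref_base: str, read_bases: list[str], min_depth: int = 20) -> Optional[str]:
    """Naive genotype call at one position from the bases of overlapping reads.

    Returns "0/0", "0/1" or "1/1", or None when depth is too low.
    All non-reference bases are lumped together as the alternate allele.
    """
    if len(read_bases) < min_depth:
        return None
    alt_count = sum(b != ref_base for b in read_bases)  # reads disagreeing with the reference
    alt_frac = alt_count / len(read_bases)
    if alt_frac < 0.2:
        return "0/0"
    if alt_frac > 0.8:
        return "1/1"
    return "0/1"

print(call_site("A", list("AAAAAAAAAAAGGGGGGGGGGG")))  # 0/1 (11 A, 11 G)
print(call_site("A", list("A" * 25)))                # 0/0
print(call_site("A", list("G" * 10)))                # None (depth 10 < 20)
```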

9

VARIANT CALLING

Machine-learning-based classifier:

Separates true polymorphic variants from artifacts; evaluation metrics: true / false positives
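Evaluating a classifier in terms of true and false positives requires a trusted reference call set. A minimal sketch, assuming such a truth set exists for the same samples (for instance array genotypes) and that variants are keyed as hypothetical `(chrom, pos, alt)` tuples:

```python
def evaluate_calls(called: set, truth: set) -> dict:
    """Compare a call set with a trusted truth set of variant keys (chrom, pos, alt)."""
    tp = len(called & truth)   # true positives: called and real
    fp = len(called - truth)   # false positives: called but not real (artifacts)
    fn = len(truth - called)   # false negatives: real but missed
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": tp / (tp + fp) if called else 0.0,
            "sensitivity": tp / (tp + fn) if truth else 0.0}

truth = {("chr1", 100, "G"), ("chr1", 200, "T"), ("chr2", 50, "C")}
called = {("chr1", 100, "G"), ("chr2", 50, "C"), ("chr3", 10, "A")}
print(evaluate_calls(called, truth))
```

Tightening the classifier's threshold usually trades sensitivity for precision, so both numbers should be tracked together.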

Quality metrics on samples
Recommendation: minimum depth of coverage 20x
Development of standards for storing sequence data and variant calls
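Applying the 20x recommendation at the genotype level amounts to masking low-depth calls. A minimal sketch, assuming each sample's call at one site is given as a `(genotype, depth)` pair, a simplified stand-in for the GT and DP fields of a VCF record (the function name is hypothetical):

```python
def filter_by_depth(genotypes: dict[str, tuple[str, int]], min_dp: int = 20) -> dict[str, str]:
    """Set genotypes with read depth below `min_dp` to missing ("./.")."""
    return {s: (gt if dp >= min_dp else "./.") for s, (gt, dp) in genotypes.items()}

site = {"s1": ("0/1", 34), "s2": ("0/0", 12), "s3": ("1/1", 20)}
print(filter_by_depth(site))  # {'s1': '0/1', 's2': './.', 's3': '1/1'}
```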

ASSOCIATION ANALYSIS

Goal: find the functional effects of variants
Score: indicates the effect on protein function

Separation of highly damaging variants from the others

If multiple annotations (one per transcript), 3 options:
• Focus on the longest transcript
• Focus on the most deleterious effect
• Focus on the canonical transcript
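The three options can disagree for the same variant. A minimal sketch, assuming each transcript annotation is a dict with hypothetical keys (`transcript`, `length`, `canonical`, `consequence`) and a made-up severity ranking; real annotation tools define their own consequence terms and orderings:

```python
# Hypothetical severity ranking, most to least deleterious.
SEVERITY = ["stop_gained", "splice_site", "missense", "synonymous"]

def pick_annotation(annotations: list[dict], strategy: str) -> dict:
    """Choose one annotation for a variant that hits several transcripts."""
    if strategy == "longest":
        return max(annotations, key=lambda a: a["length"])
    if strategy == "most_deleterious":
        return min(annotations, key=lambda a: SEVERITY.index(a["consequence"]))
    if strategy == "canonical":
        return next(a for a in annotations if a["canonical"])
    raise ValueError(f"unknown strategy: {strategy}")

anns = [
    {"transcript": "T1", "length": 3200, "canonical": False, "consequence": "synonymous"},
    {"transcript": "T2", "length": 2100, "canonical": True, "consequence": "missense"},
    {"transcript": "T3", "length": 900, "canonical": False, "consequence": "stop_gained"},
]
for s in ("longest", "most_deleterious", "canonical"):
    print(s, pick_annotation(anns, s)["transcript"])
```

Here the same variant is synonymous, stop-gained or missense depending on the rule, which can move it in or out of the "highly damaging" group.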

11

ASSOCIATION ANALYSIS

Single-variant association test; check of data quality
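A single-variant test can be as simple as comparing allele counts between cases and controls in a 2x2 table. A self-contained sketch of a two-sided Fisher's exact test follows (in practice `scipy.stats.fisher_exact` does the same job); the function name and the example counts are illustrative assumptions:

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cases / controls; columns: alternate / reference allele counts.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Two-sided: add every table at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

# Hypothetical counts: 12 alt alleles in 1000 case alleles vs 2 in 1000 control alleles
print(fisher_exact_two_sided(12, 988, 2, 998))
```

For very rare variants such single-variant tests have little power, but unexpected signals (for example at common variants or at known artifacts) are a useful data-quality check.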

Usual way of analysing rare variants: group the variants in the same gene and analyse them together
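Grouping by gene means that, for each individual, the rare alleles in a gene are collapsed into one number. A minimal sketch, assuming variant records with hypothetical `id`, `gene` and `maf` fields, genotypes coded as alternate-allele counts, and an illustrative 1% frequency cut-off:

```python
from collections import defaultdict

def gene_carrier_counts(variants: list[dict], genotypes: dict[str, dict[str, int]],
                        max_maf: float = 0.01) -> dict[str, dict[str, int]]:
    """Collapse rare variants by gene.

    variants:  list of {"id", "gene", "maf"} records
    genotypes: sample id -> {variant id -> alt allele count (0, 1 or 2)}
    Returns gene -> {sample id -> number of rare alt alleles carried in that gene}.
    """
    rare_by_gene = defaultdict(list)
    for v in variants:
        if v["maf"] < max_maf:
            rare_by_gene[v["gene"]].append(v["id"])
    return {gene: {s: sum(g.get(vid, 0) for vid in ids) for s, g in genotypes.items()}
            for gene, ids in rare_by_gene.items()}

variants = [{"id": "v1", "gene": "GENE_A", "maf": 0.002},
            {"id": "v2", "gene": "GENE_A", "maf": 0.005},
            {"id": "v3", "gene": "GENE_B", "maf": 0.30},   # common: excluded
            {"id": "v4", "gene": "GENE_B", "maf": 0.001}]
genotypes = {"s1": {"v1": 1, "v3": 2}, "s2": {"v2": 1, "v4": 1}, "s3": {}}
print(gene_carrier_counts(variants, genotypes))
```

Gene names and genotypes are invented; the per-gene counts are the input to the group tests that follow.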

12

ASSOCIATION ANALYSIS

2 methods for processing groups:

Comparison of the number of variants between cases and controls
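Comparing the number of variants between cases and controls is what burden (collapsing) tests do. A minimal sketch of one simple variant, assuming per-individual rare-allele counts for one gene: individuals are classed as carriers or non-carriers and the 2x2 carrier table is tested with a 1-df chi-square (weighted burden scores and exact tests are common alternatives; the function name is hypothetical):

```python
from math import erfc, sqrt

def burden_test(case_burden: list[int], control_burden: list[int]) -> tuple[float, float]:
    """Collapsing burden test: carriers of >= 1 rare allele in cases vs controls.

    Returns (chi-square statistic, p-value), 1 df, no continuity correction.
    """
    a = sum(x > 0 for x in case_burden)       # case carriers
    b = len(case_burden) - a                  # case non-carriers
    c = sum(x > 0 for x in control_burden)    # control carriers
    d = len(control_burden) - c               # control non-carriers
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, erfc(sqrt(chi2 / 2))         # upper tail of chi-square with 1 df

cases = [1] * 30 + [0] * 470
controls = [1] * 10 + [0] * 490
print(burden_test(cases, controls))
```

With small carrier counts the chi-square approximation is poor and an exact test is preferable.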

Comparison with chance expectations

Recommendation: at least one test from each category, with different frequency thresholds; if there is no natural threshold, use a variety of frequency cut-offs
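Comparison with chance expectations can be illustrated with a dispersion statistic in the style of the C-alpha test: under the null, the copies of each variant should split between cases and controls binomially, and an excess of variance (variants piling up in cases or in controls) suggests association even when risk and protective variants cancel out in a burden count. A simplified sketch, assuming per-variant copy counts and omitting refinements such as special handling of singletons (names and data are illustrative):

```python
from math import comb, erfc, sqrt

def c_alpha(case_counts: list[int], total_counts: list[int], p0: float) -> tuple[float, float]:
    """C-alpha-style dispersion test for one gene.

    case_counts[i]:  copies of rare variant i observed in cases
    total_counts[i]: copies of variant i in cases + controls
    p0:              proportion of cases in the sample (expected case share under H0)
    Returns (Z, one-sided p-value).
    """
    def dev(u: int, n: int) -> float:
        # Squared deviation from the binomial mean minus its null expectation
        return (u - n * p0) ** 2 - n * p0 * (1 - p0)

    t = sum(dev(y, n) for y, n in zip(case_counts, total_counts))
    var = sum(sum(dev(u, n) ** 2 * comb(n, u) * p0 ** u * (1 - p0) ** (n - u)
                  for u in range(n + 1))
              for n in total_counts)
    z = t / sqrt(var) if var > 0 else 0.0
    return z, 0.5 * erfc(z / sqrt(2))   # upper-tail standard normal probability

# Variants found only in cases or only in controls: strong over-dispersion
print(c_alpha([5, 0, 4, 0, 6], [5, 4, 4, 5, 6], p0=0.5))
```

Running both this kind of test and a burden test, each over several frequency cut-offs, follows the recommendation above.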

13

ASSOCIATION ANALYSIS

Packages are available to perform the tests on subsets of the data

Example:
1. missense, splice and stop-altering variants
2. a subset of deleterious variants
3. splice and stop-altering variants
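The three example subsets can be expressed as filters ("masks") over annotated variants, each mask then feeding the same gene-level test. A minimal sketch, assuming hypothetical consequence labels and a boolean `deleterious` flag (for instance from a prediction score), and reading subset 2 as the predicted-deleterious part of subset 1:

```python
STOP_ALTERING = {"stop_gained", "stop_lost"}

def mask_variants(variants: list[dict]) -> dict[str, list[str]]:
    """Return the variant ids that enter each of three nested test sets."""
    def m1(v):  # missense, splice and stop-altering variants
        return v["consequence"] in ({"missense", "splice"} | STOP_ALTERING)

    def m2(v):  # predicted-deleterious subset of m1
        return m1(v) and v["deleterious"]

    def m3(v):  # splice and stop-altering variants only
        return v["consequence"] in ({"splice"} | STOP_ALTERING)

    return {name: [v["id"] for v in variants if f(v)]
            for name, f in (("mask1", m1), ("mask2", m2), ("mask3", m3))}

variants = [{"id": "v1", "consequence": "missense", "deleterious": True},
            {"id": "v2", "consequence": "missense", "deleterious": False},
            {"id": "v3", "consequence": "splice", "deleterious": True},
            {"id": "v4", "consequence": "synonymous", "deleterious": False},
            {"id": "v5", "consequence": "stop_gained", "deleterious": True}]
print(mask_variants(variants))
```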

14

ASSOCIATION ANALYSIS

No optimal choice of analysis, because variants and their characteristics vary between genes.

Permutation-based approaches to assess statistical significance

If no permutation-based threshold is available, use p ≤ 5 × 10⁻⁷
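A permutation approach shuffles the case/control labels many times and asks how often the shuffled data give a statistic at least as extreme as the observed one. A minimal sketch, assuming per-individual rare-allele burdens for one gene and using the difference in mean burden as the (illustrative) statistic. Since the smallest reachable p-value is 1/(n_perm + 1), resolving 5 × 10⁻⁷ would need at least about 2 million permutations.

```python
import random

def permutation_pvalue(burden: list[int], is_case: list[bool],
                       n_perm: int = 10_000, seed: int = 0) -> float:
    """Empirical two-sided p-value for one gene by shuffling case/control labels.

    Statistic: difference in mean rare-allele burden between cases and controls.
    """
    def stat(labels: list[bool]) -> float:
        cases = [x for x, c in zip(burden, labels) if c]
        controls = [x for x, c in zip(burden, labels) if not c]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    rng = random.Random(seed)
    observed = abs(stat(is_case))
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(stat(labels)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one avoids reporting p = 0

burden = [1] * 20 + [0] * 80 + [1] * 5 + [0] * 95
is_case = [True] * 100 + [False] * 100
print(permutation_pvalue(burden, is_case, n_perm=2000))
```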

QQ plots to summarize the results
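A QQ plot compares the sorted observed −log10 p-values with those expected under the null (uniform p-values); systematic departure across the whole range, often summarised by the genomic-control inflation factor λ, hints at stratification or technical artifacts rather than true association. A minimal sketch that computes the plot coordinates and λ without plotting (the function names are hypothetical; p-values must lie in (0, 1]):

```python
from math import log10
from statistics import NormalDist, median

def qq_points(pvalues: list[float]) -> list[tuple[float, float]]:
    """(expected, observed) -log10 p-value pairs, smallest p first."""
    obs = sorted(pvalues)
    n = len(obs)
    return [(-log10((i + 1) / (n + 1)), -log10(p)) for i, p in enumerate(obs)]

def inflation_factor(pvalues: list[float]) -> float:
    """Genomic-control lambda: median 1-df chi-square of the p-values / null median (~0.4549)."""
    chi2 = [NormalDist().inv_cdf(1 - p / 2) ** 2 for p in pvalues]
    return median(chi2) / 0.4549364

null_p = [(i + 1) / 1000 for i in range(999)]   # perfectly uniform p-values
print(round(inflation_factor(null_p), 3))
```

The points can be passed to any plotting library; λ close to 1 is expected when most genes are not associated.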

15

APPROACHES FOR FOLLOW-UP

To confirm an association found in the analysed samples, additional samples are needed.

16

APPROACHES FOR FOLLOW-UP

Exome chip experiments examine most of the variants, but are less sensitive in non-European populations.

17

APPROACHES FOR FOLLOW-UP

Statistical imputation

Take the typed variant most highly correlated with the missing one, and assume the missing variant carries the corresponding allele (i.e. minor or major). But again, this is often not possible for mixed populations.
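The single-best-tag idea can be sketched as follows: in a reference panel, find the typed marker most correlated (r²) with the missing variant, then predict the missing dosage from the reference individuals who share the sample's dosage at that tag. Using the lookup rather than copying the dosage handles tags whose minor allele matches the target's major allele. This is only the simple scheme described on the slide, not haplotype-based imputation; marker names and panel data are hypothetical.

```python
from collections import Counter
from typing import Optional

def r_squared(x: list[int], y: list[int]) -> float:
    """Squared Pearson correlation between two allele-dosage vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy ** 2 / (sxx * syy)

def impute_by_best_tag(reference: dict[str, list[int]], target: str,
                       sample_markers: dict[str, int]) -> tuple[str, Optional[int]]:
    """Impute `target` for one sample from its single best-correlated typed marker."""
    tag = max(sample_markers, key=lambda m: r_squared(reference[m], reference[target]))
    d = sample_markers[tag]
    # Most common target dosage among reference individuals with the same tag dosage
    matches = [t for g, t in zip(reference[tag], reference[target]) if g == d]
    return tag, (Counter(matches).most_common(1)[0][0] if matches else None)

reference = {"snpX": [0, 1, 2, 0, 1, 2, 0, 0],
             "m1":   [0, 1, 2, 0, 1, 2, 0, 0],
             "m2":   [1, 1, 0, 0, 2, 1, 0, 1],
             "m3":   [2, 1, 0, 2, 1, 0, 2, 2]}   # m3 = 2 - snpX (alleles flipped)
print(impute_by_best_tag(reference, "snpX", {"m1": 2, "m2": 0}))  # ('m1', 2)
```

As the slide notes, tag correlations learned in one population may not hold in another, so this breaks down in mixed samples.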

18

ROLE OF FUNCTIONAL ASSAYS

Study the changes in the proteins due to coding variants

Study why these changes result in diverse diseases.

19

FORWARD GENETICS

Another approach to studying functional variants: first look at which proteins show changes, then search the DNA sequence for the responsible variant(s)

20

DISCUSSION

In other articles:
• more care about sample quality
• variant calls gain sensitivity when made jointly across several samples
• indels are the major source of false positives in variant calls; an alignment algorithm that allows gapped alignment is needed

Check association results in databases
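Why gapped alignment matters for indels: if a read carrying a 1 bp deletion is compared base by base with the reference, every base after the deletion looks like a mismatch, i.e. a run of false SNVs. A minimal Needleman-Wunsch global alignment with a linear gap penalty (scores chosen arbitrarily; real read aligners use local, affine-gap and heuristic variants) places a single gap instead:

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str, int]:
    """Needleman-Wunsch global alignment; returns aligned a, aligned b ('-' = gap), score."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

ref, read = "ACGTTGCATG", "ACGTGCATG"   # read lacks one T
ungapped_mismatches = sum(x != y for x, y in zip(read, ref))
print(ungapped_mismatches)              # 5 apparent SNVs without gaps
print(global_align(ref, read))          # one gap, no mismatches
```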

21

DISCUSSION

Because of costs, exome sequencing studies focus on the coding part of the genome. They are therefore not suitable for non-exonic sequence or for structural variants and chromosomal rearrangements.

These problems should be partly solved as the cost of sequencing falls.
