96
Variant View Visualizing Sequence Variants in their Gene Context Joel A. Ferstay 1* , Cydney B. Nielsen 2 , Tamara Munzner 1 University of British Columbia 1 , now at AeroInfo Systems/Boeing Canada* BC Cancer Agency 2 1

Variant View Visualizing Sequence Variants in their Gene Context Joel A. Ferstay 1*, Cydney B. Nielsen 2, Tamara Munzner 1 University of British Columbia

Embed Size (px)

Citation preview

  • Slide 1

Variant View Visualizing Sequence Variants in their Gene Context Joel A. Ferstay 1*, Cydney B. Nielsen 2, Tamara Munzner 1 University of British Columbia 1, now at AeroInfo Systems/Boeing Canada* BC Cancer Agency 2 1 Slide 2 Design study Real users, real tasks, real data Find out what users are doing Support with visualization tool Reflect and present guidelines Analys t Tool 2 Slide 3 Design Study The Design Process 3 Slide 4 Collaborated with analysts at the Genome Sciences Centre Study genetic basis of leukemia Needed help interpreting their data Major problems: What do we show? How do we show it? 4 Slide 5 Design cycle Intervie w 5 Slide 6 Design cycle Intervie w Data and Tasks 6 Slide 7 Design cycle Intervie w Data and Tasks Create Data Sketch 7 Slide 8 Design cycle Intervie w Data and Tasks Create Data Sketch Present Data Sketch 8 Slide 9 Design cycle Intervie w Data and Tasks Create Data Sketch Present Data Sketch REPE AT 9 Slide 10 Data sketches Alternative to paper prototyping Load and show real data Beneficial when the data is complex 10 [Lloyd and Dykes, InfoVis 2011] Slide 11 Can identify features of interest in data 11 Emphasize some features over others Slide 12 Can identify design dead ends early 12 Slide 13 Methods and durations Semi-structured interviews 7 months Once per week One hour in duration Presented data sketches 8 deployed over 5 months Implemented with D3 toolkit [Bostock et al., InfoVis 2011 ] 13 Design study methodology [Sedlmair et al., 2012] Slide 14 Problem characterization: Data 14 Slide 15 The data Data are sequence variants Difference between reference genome and a given genome 15 Slide 16 The data Data are sequence variants Difference between reference genome and a given genome 16 Reference Genome DNA:ATA TGA TCA ACA CTT Sample 1 Genome DNA:ATA TGG TCA ATA CTT Sample 2 Genome DNA:ATA TGA TGA ACA CCT Slide 17 The data Data are sequence variants Difference between reference genome and a given genome 17 Reference Genome DNA:ATA TGA TCA ACA CTT Sample 1 Genome DNA:ATA TGG TCA ATA CTT Sample 2 Genome DNA:ATA TGA TGA ACA CCT Harmful ? Harmles s? Slide 18 Multi-scale data 18 DNA of the organism Slide 19 Multi-scale data 19 Genes are functional units Slide 20 Multi-scale data 20 Exons code for proteins Slide 21 Multi-scale data 21 Regions within proteins perform activities Slide 22 Related work: Genome scale 22 Slide 23 Ensembl genome browser 23 Explore genome and variant data [Chen et al., BMC Genomics 2010] Slide 24 Genome scale shown at the top 24 Slide 25 User data is stacked in horizontal tracks 25 Slide 26 User data is stacked in horizontal tracks 26 Very flexible framework Slide 27 Problem with the genome scale 27 Features of interest become small Slide 28 Problem with the genome scale 28 Features of interest become small Varian t Slide 29 Problem with the genome scale 29 Features of interest become small Varian t Exo n Slide 30 Analysts must pan and zoom 30 Slide 31 Analysts must pan and zoom 31 Heavy interaction costs with zooming Slide 32 Analysts must pan and zoom 32 Heavy interaction costs with zooming Varian t Must know where to look Slide 33 Raw variant data (What they looked at before) 33 Slide 34 Raw data variant-centric 34 Tabular data format Slide 35 Raw data variant-centric 35 Tabular data format Variant row 1 Slide 36 Raw data variant-centric 36 Tabular data format Variant row 2 Slide 37 Raw data variant-centric 37 Tabular data format Variant row 2 Difficult to reason about variants without their biological context Slide 38 Multiple biological levels/scales What to show? 38 Slide 39 Many biological levels and scales 39 Slide 40 Only some levels and scales are beneficial for variant analysis 40 Slide 41 Filter out whole genome; keep genes 41 Slide 42 Filter out non-exon regions 42 Slide 43 Left with a filtered scope 43 Slide 44 Related work: Filtered scope 44 Slide 45 The Ensembl Variation Image 45 [Chen et al., BMC Genomics 2010] Slide 46 First filtering step: per-gene view 46 [Chen et al., BMC Genomics 2010] One gene shown Slide 47 Second filtering step: partial removal inter-exon regions 47 [Chen et al., BMC Genomics 2010] Partial removal Slide 48 Problem: Extends multiple screens 48 [Chen et al., BMC Genomics 2010] Slide 49 Problem: Extends multiple screens 49 [Chen et al., BMC Genomics 2010] 1 st Screen Slide 50 Problem: Extends multiple screens 50 [Chen et al., BMC Genomics 2010] 2 nd Screen Slide 51 Problem: Features of interest small 51 [Chen et al., BMC Genomics 2010] Exon regions small Color coding difficult to see Slide 52 Our Goal: Show attributes necessary for variant analysis 52 Slide 53 Use information-dense visual encoding 53 Slide 54 Use information-dense visual encoding 54 Slide 55 Use information-dense visual encoding 55 Slide 56 Use information-dense visual encoding 56 Slide 57 Use information-dense visual encoding 57 Slide 58 Use information-dense visual encoding 58 Slide 59 Use information-dense visual encoding 59 Slide 60 The tool: Variant View 60 Slide 61 Variant View 61 Slide 62 Variant View 62 Information-dense single gene view Slide 63 Variant View 63 Information-dense single gene view No need for pan and zoom Slide 64 Variant View 64 Sorting metrics guide gene navigation Slide 65 Variant View 65 Sorting metrics guide gene navigation Control what shows up here Slide 66 Variant View 66 Peripheral supporting data Slide 67 Related work: Targeted for variant analysis 67 Slide 68 MuSiC variant visualization plot 68 [Dees et al., Genome Research 2012] Slide 69 Side-by-side comparison 69 Slide 70 Side-by-side comparison 70 Protein regions can overlap Regions get separate lanes Slide 71 Side-by-side comparison 71 Large bloom of repeated elements: more salient Many collocated variants Slide 72 Driving biological tasks 72 Slide 73 Task 1: Discover genes Tool originally designed to discover genes with harmful variants Sorting metrics guide single gene navigation Uncover new genes affected by variants in leukemia Want to see if Variant View exposed known genes in leukemia 73 Slide 74 74 Slide 75 Sorting by derived metric revealed known leukemia genes 75 Slide 76 Highly scored gene by sorting metric 76 Slide 77 Visual inspection reveals collocation of variants 77 Slide 78 Several functional protein regions affected 78 Slide 79 Highly scored by metric and not known 79 Slide 80 Protein chemical class change evident 80 Slide 81 In contrast, low scoring gene 81 Slide 82 No collocation of variants 82 Slide 83 Mostly unaffected protein regions 83 Slide 84 Variant tasks Started with the main task of discover gene Shared tool with analysts Identified two more tasks! Patient compare Debug pipeline 84 Slide 85 Task 2: Patient compare Clinical setting application Compare patient data to known harmful variants The challenge Similarity is loosely understood rather than fully characterized What constitutes a match requires visual inspection 85 Slide 86 Adapted Variant View with minimal changes 86 Slide 87 Navigate through patient data with list 87 Slide 88 Patient data emphasized with arrows 88 Slide 89 Patient has same harmful L to P mutation 89 Slide 90 These variants probably do not match 90 Slide 91 Task 3: Debug pipeline Debug data generation pipeline Remove errors from data before analysis takes place Analysts originally did not think they needed this support 91 Slide 92 Tool revealed errors in the data 92 The tool exposed artifacts in the data that slid past at least two rounds of quality metric filtering this type of problem would not have been caught by our previous, automated methods. - Analyst 3 Slide 93 Conclusions 93 Designed, implemented, and deployed tool for visual variant impact assessment Originally designed for Discover Genes task Adapted to two others with minimal changes Methods What to show Filtering data scope How to show it Carefully selected visual encodings Goals Navigation-free main overview at gene level Reveal genes of interest; accomplished by sorting by new, derived metrics Slide 94 Acknowledgements 94 Our collaborators at the GSC Dr Aly Karsan Rod Docking Dr Linda Chang Dr Gerben Duns Simon Chang Funding VIVA, AeroInfo Systems/Boeing Canada, MITACS Slide 95 Questions? 95 Joel Ferstay: [email protected] Paper Page: http://www.cs.ubc.ca/labs/imager/tr/2013/ VariantView/ Software Available as Open Source Slide 96 Future work: Other use contexts 96 Scaling up beyond the current design target ~50 variants at once Possibly integrate Variant View with MedSavant Tool by Fiume et al., BioVis Posters 2012 Focus on interactive filtering