HMS Genetics Department 2010 Retreat
Computational Biology breakout session
• John Aach, PhD, Lecturer, Church Lab, Department of Genetics
• Mark Borowsky, PhD, Director, Molecular Biology bioinformatics team & co-Director, Illumina sequencing core
• Peter Park, PhD, Assistant Professor of Pediatrics & Associate Director of Bioinformatics, PCPGM
HMS Genetics Computational Biology breakout session
About this session …
• Purpose: Encourage interactions with computationalists; discuss how computational methods can work effectively in research
• Session objectives:
– Reason to be here: question about specific tools or specific data in your project. What we hope to do: tell you what we can … but don’t expect Car Talk!
– Reason to be here: broader questions about comp. bio. or how to approach larger computational issues. What we hope to do: give guidance on best practices and possible starting points for your thinking
– Reason to be here: just curious … What we hope to do: give you an idea of what we do
• What we get out of it: Learn about things important to you, ideas we can develop, possibilities for collaboration
• Feedback: Contact us or Vonda Shannon for suggestions or follow-up on this session
John Aach, PhD [email protected], 617-432-0061
• Currently: Lecturer, HMS Dept. of Genetics, Church Lab (since 1996)
• Background
– PhD Boston U. 1985 (philosophy / psychology); BA Princeton U. 1975 (music)
– many years as developer, manager, technology architect in IT
• Focus / interests: Like to be at interface between data and biology.
– Church Lab comp. bio. requirements for “omics”, synthetic bio:
  • Develop new forms of data by error analysis, integration with other data, etc.
  • modeling / performance assessment / optimization of technology or bio. system
  • fast-moving develop/demonstrate vs. production orientation
  • close interaction with bench; interface with many other fields
• Projects have included
– next-gen sequencing analysis for Church Lab targeted sequencing methods
– automated image processing for cell morphology (with Perrimon Lab)
– mathematical modeling (“polony” formation, metabolic models)
– (early) computational miRNA search (with Ruvkun Lab)
– microarray analysis; expression data “time-warping” method
Peter Park
• Currently: Assistant Professor (2006-)
• Background
– Instructor, HMS; Postdoctoral fellow, HSPH (biostatistics)
– PhD Caltech 1999 (applied mathematics); BA Harvard 1994 (applied mathematics)
• Focus / interests:
– Microarray-based
  • gene expression, ChIP-chip, copy number, microRNA
  • Platforms: Affymetrix, Agilent, Illumina, etc.
– Sequence-based
  • ChIP-seq, RNA-seq (SAGE-like vs whole transcript), copy number (whole-genome sequencing/targeted sequencing)
  • Platforms: Illumina, SOLiD, Helicos
• Longwood sequencing facilities: Partners—Landsdowne/MGH, HMS Biopolymer, Children’s (SOLiD), DFCI (Helicos), others?
Mark Borowsky, PhD
• Molecular and cell biologist…
– Developmental gene regulation in flies
– Cell adhesion in vertebrates
– Infectious disease
• …turned informaticist
– Microbial genome sequencing and analysis
– Human annotation and cDNA sequencing
– Gene expression analysis
– Next gen sequence analysis (Illumina)
– Tools for bench scientists
Make computational analysis available to biologists to answer biological questions.
Some possible discussion topics
• What are some “best practices” in computational biology?
• How do you design an experiment with a computational component?
• How does one pick a set of tools for a problem?
• When does one write custom software vs rely on pre-existing tools?
• How do you work with a computational collaborator?
• What are some models for computational biology support?
• How do computational biology projects develop?
• How is the field of computational biology evolving?
• Other questions?
Computational “best practices” (John Aach)
Pay attention to fundamentals
• Record every bit of analysis in scripts, spreadsheets, etc., and document it internally with comments and externally in a project log or notebook, so that it can be repeated and/or varied.
• Double-check every coded computation
– Individual computations by working out examples by hand or with other software
– Whole systems by comparing with other systems or biological expectations
• Don’t re-invent the wheel. If a tool is available that does close to what you need, it’s worth trying to use it.
• Work closely with experimentalist partners to
– ensure you’re addressing the right problems
– know what parameters affect computations
– keep pace with changing protocols
– provide feedback on experimental controls, performance, and data integration
• Keep your programs and files well organized, and write them to the level of performance needed by the project.
– See Noble (2009) PLoS Comput Biol e1000424 for one set of recommendations
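As an illustration of double-checking an individual computation by hand, here is a minimal Python sketch. The function gc_fraction and the test sequences are hypothetical examples, not part of the session material; the point is only that each assertion encodes an answer worked out by hand before the code is trusted.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases among the A/C/G/T bases in seq."""
    seq = seq.upper()
    counted = [base for base in seq if base in "ACGT"]  # skip N and other codes
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


# Hand-worked example: "ATGCGC" has 6 bases, of which G, C, G, C count -> 4/6.
assert abs(gc_fraction("ATGCGC") - 4 / 6) < 1e-12
# Edge cases worked out by hand: lower-case input, and N bases that are skipped.
assert gc_fraction("ggcc") == 1.0
assert gc_fraction("ATNN") == 0.0
```

Keeping such checks in the script itself also serves the first point above: the verification is recorded and can be re-run whenever the code changes.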
Plan your experiment with your computationalist (Mark Borowsky)
1. Define the biological question.
2. Choose metrics to evaluate the quality of your data.
3. Choose metrics to answer your question.
4. Determine how much data you will need to achieve significance.
5. Determine how many replicates you will need.
6. Define sources of bias and necessary controls.
7. Be realistic about yields from high throughput instruments.
8. Estimate a failure rate and plan extra samples.
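Steps 5 and 8 can be roughed out numerically before talking to a computationalist. The Python sketch below is one possible approach under stated assumptions, not the presenters' method: it uses the standard normal approximation for comparing two group means, and the effect size, significance level, power, and failure rate are all hypothetical placeholders to be replaced with values from your own system.

```python
import math
from statistics import NormalDist


def replicates_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate replicates per group for a two-group comparison of means.

    Normal approximation: n = 2 * ((z_(1-alpha/2) + z_power) / d)^2,
    where d is the expected difference in units of standard deviations.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def samples_to_prepare(samples_needed: int, failure_rate: float) -> int:
    """Inflate the sample count so the expected number of successes covers the need."""
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return math.ceil(samples_needed / (1 - failure_rate))


n = replicates_per_group(effect_size=1.0)             # hypothetical effect of 1 SD
total = samples_to_prepare(2 * n, failure_rate=0.25)  # hypothetical 25% failure rate
print(n, total)
```

The approximation ignores multiple testing and the count-based noise models typical of sequencing data, so it is only a starting point for the planning conversation.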
Picking out computational tools (John Aach)
1. The central issue
• All algorithms
  o make assumptions about data (e.g., biological source, error models, data content …)
  o generate a computational result (alignment, expression significance, …)
• Research the algorithms and make sure you use ones that generate the results you need and whose assumptions your data meets
2. Routine problems
• Use any tool that is convenient and conforms to “best practices”
3. Complex problems
• Break into main parts and look for solutions to each. Start with the harder parts.
4. Inevitable compromises
• If algorithms don’t do exactly what you need or their data assumptions are not exactly met, consider whether they are close enough to use with suitable adjustments
• Performance / convenience are important; they influence overall research time allocation
• Sometimes you simply can’t get an algorithm to run
5. Other considerations
• Always check the results of the algorithm on data whose results you know
• Try to keep abreast of new tools (difficult…)
• If there is a choice of tools, choose ones that have been shown to perform better on similar data. Otherwise, choose ones that have a better theoretical foundation
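One way to check a tool on data whose results you know is to build a synthetic input with a planted answer. The Python sketch below assumes a hypothetical find_motif function standing in for whatever tool is being evaluated; the motif, positions, and sequence length are made up for illustration.

```python
import random


def find_motif(sequence: str, motif: str) -> list[int]:
    """Stand-in for the tool under test: 0-based start positions of exact matches."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append(start)
        start = sequence.find(motif, start + 1)
    return hits


def planted_sequence(length: int, motif: str, positions: list[int], seed: int = 0) -> str:
    """Build a random A/T-only background and plant the motif at known positions."""
    rng = random.Random(seed)
    bases = [rng.choice("AT") for _ in range(length)]
    for pos in positions:
        bases[pos:pos + len(motif)] = motif  # overwrite background with the motif
    return "".join(bases)


# The background contains no G or C, so the G/C-only motif can occur
# only where it was planted: the correct answer is known in advance.
known = [10, 50, 90]
seq = planted_sequence(120, "GCCGGC", known)
assert find_motif(seq, "GCCGGC") == known
```

A real tool would also be run on published or benchmark data, but a planted control like this catches off-by-one and coordinate-convention errors cheaply.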
When does one write software (John Aach; Mark Borowsky)
• Only when necessary!
• Existing tools don’t work, don’t perform well enough, or don’t integrate well enough to do a task
• You need to run the same processing repeatedly on different data sets or with changes restricted to a fixed set of parameters.
• You have an idea for a new algorithm
Care and handling of your computationalist (Mark Borowsky)
• Contact when planning experiments
• Approach as collaborators
• Educate
– Us about your biological system and questions
– Yourself about computational approaches used/accepted in your field (provide references)
• Ask to be educated about assumptions, costs and benefits
• Discuss resources
– Development time
– Compute time
– Hardware and disk space (ours and yours)
• Ask about the queue
• Ask what you can do to facilitate
Don’t get frustrated, get in touch.
Bioinformatics Support? (Peter Park)
• Software packages can be used to formulate hypotheses or carry out initial analysis
• But manual intervention is necessary to get to publication
• Even miscellaneous tasks like data deposition could take a lot of time when done manually.
• The problem will be more acute in the future with sequencing data.
• Infrastructure will be an issue, given the size of data sets.
• HMS Orchestra cluster is available (~$0.70/GB/year for storage)
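To get a feel for the storage figure quoted above, a back-of-the-envelope calculation helps. This Python sketch uses the ~$0.70/GB/year rate from the slide; the data set size and retention period are hypothetical.

```python
CENTS_PER_GB_YEAR = 70  # from the ~$0.70/GB/year figure quoted above


def storage_cost_dollars(size_gb: int, years: int) -> float:
    """Cost of keeping size_gb of data on the cluster for the given number of years."""
    return CENTS_PER_GB_YEAR * size_gb * years / 100


# Hypothetical project: 2,000 GB of sequencing data kept for 3 years.
print(storage_cost_dollars(2000, 3))  # 4200.0
```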
Possible models for Bioinformatics Support? (Peter Park)
• Joint grants
– a considerable lag-time until the grant is funded, if at all
• Fee for service
– there aren’t many places that offer service (some companies do this)
– quality is unpredictable
– investigators generally under-estimate the cost
– small data sets often require as much work as large data sets
• Institutional core services
– Institutional grants
• ‘Collaboration’ between labs
– Enough incentives for the bioinformatician?
• How do other places do it?
– Not many places do it well
– How does Broad do it?
– DFCI: additional computing charges in grants for personnel
– DFCI: Center for Cancer Computational Biology
– Harvard Catalyst Genetics and Bioinformatics Consulting Program for “clinical and translational investigators”
Collaboration life cycle (Mark Borowsky)
[Figure: cycle diagram. Stages shown: Plan; Generate primary data; Implement analysis methods; Analysis output; Review; Refine algorithm / Generate more data]
Example (John Aach)
Need: automated morphology analysis (hard!)
Phases
1. Initial analysis: established need for stochastic labeling (with Amy Kiger and Pam Bradley)
2. Strategy development (with Chris Bakal)
– different cell line more amenable to analysis
– smaller no. of perturbations compatible with less than full automation
3. Analysis methods development
– supervised learning approach
– statistics
– integrate with biological knowledge
4. Publish
How computational biology evolves (John Aach)
1. State of computational biology at any given time = established tools & data + tools in development that make “best guesses” at phenomena currently hard to measure
– Example: DNA motif discovery tools
2. New experimental techniques are developed and demonstrated that capture new relevant data
– Example: ChIP2
3. The techniques and their tools mature and their data gets put into databases
– some imputation still needed, but much drops off
– databasing difficult due to massive data and changing technology
[Figure, built up over three slides. Elements shown throughout: your data; your analysis; low level data management tools; data analysis & interpretation tools; databases; imputation tools. Stage 1: imputation tools (forefront of research!). Stage 2: imputation tools (still developing!!); new experimental technique; demo analyses. Stage 3: old imputation tools; maturing experimental technique.]