#BeyondTheBench #BECareer2013
#CurrentExchange
Establishing your
online presence
Robert Aboukhalil
Why? You’re being Googled
#1: LinkedIn
Why LinkedIn?
• Online CV + networking
• Recruiters use LinkedIn
• Find jobs posted on LinkedIn
• Apply to jobs
www.linkedin.com/pub/robert-aboukhalil/84/a648df/
#2: Facebook
#3: Twitter
#4: Your website
Step 1: Wordpress.com
Step 2: themeforest.net
Step 3: Have an awesome portfolio
Now what?
A language all scientists should know
How R helped me look at billions of genotypes and how it can help you too
Mitchell Bekritsky
WSBS Graduate Student
What is R?
• Language for statistical
analysis, data manipulation
and graphics
• Open source
• Flexible language
• Powerful built-in functions
• Strong user community
• Publication quality graphs
• Free!
Graphic from http://blenditbayes.blogspot.com/2013/06/visualising-crime-hotspots-in-england_25.html
Who uses R?
Source: http://www.revolutionanalytics.com/what-is-open-source-r/companies-using-r.php
What is R used for?
• Movie recommendations
• Credit risk analysis
• Tailoring online advertising
• Predicting economic activity
• Clinical drug development
• News graphics
• Modeling oil spills
• Predicting election outcomes
Graphic from http://www.nytimes.com/interactive/2009/06/25/arts/0625-jackson-graphic.html
But I’m a biologist…
How R helped me see my data
• First time looking at microsatellite genotypes
• How many microsatellites differ from reference genome?
• By how much?
Problems:
– Lots of data (4.7 million genotypes)
– Complex information
– Too big for Excel
– No good graphics in Excel either
One of my first graphs in R
Lessons learned about my data
• Lots of microsatellites differ
from reference by a little bit
• Thousands differ by ± 20 bp
• 8.27% of all microsatellites
differ from reference (~400k)
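Summaries like the ones above take only a few lines of R. The following is a minimal sketch on simulated data, not the data or code from the talk: the vector `len_diff` and its probabilities are made up for illustration.

```r
# Simulated stand-in for real genotypes (hypothetical values, not the talk's data)
set.seed(1)
n <- 10000

# Difference in bp between the observed allele length and the reference length
len_diff <- sample(c(0, -2, 2, -4, 4), size = n, replace = TRUE,
                   prob = c(0.92, 0.03, 0.03, 0.01, 0.01))

# How many microsatellites differ from the reference?
n_diff   <- sum(len_diff != 0)
pct_diff <- 100 * n_diff / n

# By how much? Tabulate the non-zero differences
diff_table <- table(len_diff[len_diff != 0])

# A quick first look at the distribution
barplot(diff_table,
        xlab = "Difference from reference (bp)",
        ylab = "Number of microsatellites")
```

The same three steps (count, tabulate, plot) also work on millions of rows, which is where a spreadsheet stops being practical.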
Lessons learned about my graph
• This is a terrible graph
A bad R graph is better than no R graph
Bad graphs helped me
• Understand my data better
• Improve my analyses
• Improve how I communicate
my data
• R has incredible flexibility for
graphing—if you can dream it,
you can probably build it
My best R graphs make one point clearly without clutter
For example…
How R saved my thesis
• Processing lots of sequencing
data from hundreds of people
• Too many people and
processes to monitor every step
of the pipeline by eye while the
data were being processed
Sanity check
• After data processing, did the
data look bi-allelic?
No!!
Troubleshooting using R
• People don’t actually have massive deletions and amplifications
• My pipeline was deleting files because of a bug, which would
remove large chunks of chromosomes
• Thanks to R, I found people where this had happened, tracked
down the bug, and didn’t report massive CNVs in autistic children
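A check of this kind could look roughly like the sketch below. It is not the pipeline from the talk: the data are simulated, and the column names (`person`, `chrom`, `n_loci`) and the 0.5 threshold are hypothetical. The idea is to count genotyped loci per person per chromosome and flag anyone far below the typical count.

```r
# Simulated per-person, per-chromosome locus counts (hypothetical data)
set.seed(42)
people <- paste0("person", 1:20)
chroms <- paste0("chr", 1:5)
counts <- expand.grid(person = people, chrom = chroms,
                      stringsAsFactors = FALSE)
counts$n_loci <- rpois(nrow(counts), lambda = 1000)

# Simulate the bug: one person loses most of one chromosome
bug <- counts$person == "person7" & counts$chrom == "chr3"
counts$n_loci[bug] <- 100

# Compare each count to the median for that chromosome
chrom_median <- ave(counts$n_loci, counts$chrom, FUN = median)
counts$ratio <- counts$n_loci / chrom_median

# Flag person/chromosome pairs with less than half the typical count
flagged <- counts[counts$ratio < 0.5, ]
print(flagged)
```

Flagged rows are candidates for a processing failure and worth checking before any biological interpretation.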
Side note
• If it looks too good to be true, it probably is
R helped me build a better genotyper
• Some non-reference alleles
aren’t covered well
• Leads to incorrect genotype
calls
Problem
• How do I develop a smarter
genotyper and know that it
works?
[Figure: chr19:54772760 A repeat, reference length 8. x-axis: 8 bp allele coverage (0 to 100); y-axis: 10 bp allele coverage (0 to 100). Genotype legend: 10|-1, 10|10, 8|-1, 8|10, 8|8]
Modeling genotypes in R
• Built a model for biased
genotypes in R
• Model helped me build a more
accurate genotyper
• When applied to real data,
clear improvements
R finds de novo mutations for me
• >300 million genotypes
• How do I find de novo mutations in all that data?
R to the rescue!
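As a rough illustration of the idea (not the method from the talk), a candidate de novo allele can be defined as a child allele seen in neither parent. The trio table below is invented, and the rule ignores genotyping error and coverage, which any real analysis would need to model.

```r
# Hypothetical trio genotypes: allele lengths (bp) at three loci
trios <- data.frame(
  locus    = c("locA", "locB", "locC"),
  child_1  = c(8, 10, 12), child_2  = c(8, 12, 14),
  mother_1 = c(8, 10, 12), mother_2 = c(10, 12, 12),
  father_1 = c(8, 10, 12), father_2 = c(8, 10, 12),
  stringsAsFactors = FALSE
)

# One row per locus, one column per parental allele
parent_alleles <- as.matrix(
  trios[, c("mother_1", "mother_2", "father_1", "father_2")]
)

# TRUE where the child's allele matches at least one parental allele
# (the child vector has one entry per row, so it is compared row by row)
in_parents <- function(child) rowSums(parent_alleles == child) > 0

# Candidate de novo: either child allele is absent from both parents
candidate <- !in_parents(trios$child_1) | !in_parents(trios$child_2)
print(trios$locus[candidate])
```

Because the comparison is vectorized over all loci at once, the same code scales from three rows to hundreds of millions without an explicit loop.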
What R has done for me
Data mining
• Finding de novo mutations
• Quality control for my data
Data manipulation
• Converting raw read counts to genotypes
Data simulation and modeling
• Finding ways to improve my genotyper
Data visualization
R has extensive support for biologists
Bioconductor is an incredible resource for biological analyses in R
• Microarrays
• Differential expression (DESeq, edgeR, cummeRbund)
• Gene models
• Flow cytometry (flowCore, flowStats, flowViz)
• Interacting with Ensembl, Cosmic, Gramene, etc. (biomaRt)
Installing R
• R can be downloaded from r-project.org
• R runs on PCs, Macs and
Linux computers
• The R project website has an
R manual to get you started
Working in R
Native R interface can be hard to
work with
• Lots of windows
• Difficult to keep things
organized
RStudio interface
• All your variables, help pages,
script windows and consoles
in one place
• Highlights R code for easier
programming
• Tabbed windows for multiple
scripts
• History saves all previous
commands, plot history saves
all previous plots
• Find it at rstudio.com
Learning R
Many online tutorials
• R has its own introduction
• Statistics Using R with Biological Examples
Take interesting data, use it to explore R
• Plot, graph, use statistical tests
Ask someone who knows R
• Getting started is pretty easy
• Learn what you need when you need it
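For a first session along the lines suggested above, one option is to start from a dataset that ships with R. The sketch below uses the built-in `iris` data; the choice of dataset and of a t-test is only an example.

```r
# iris ships with R: flower measurements for three species
data(iris)

# Summarize one variable
summary(iris$Sepal.Length)

# Plot it by group
boxplot(Sepal.Length ~ Species, data = iris,
        ylab = "Sepal length (cm)")

# Statistical test: compare two of the three species
two <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
result <- t.test(Sepal.Length ~ Species, data = two)
print(result$p.value)
```

Typing `?t.test` or `?boxplot` at the console opens the help page for each function, a quick way to learn what you need when you need it.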
Thanks!!
The Bioscience Enterprise Club is dedicated to helping CSHL’s science research professionals and alumni cultivate and leverage their cross-disciplinary skill sets and expertise to transition into diverse careers.
Current Exchange is CSHL’s very own student-run magazine. We feature articles about science aimed at a general audience. Check out our inaugural issue at issuu.com/currentexchange. Send your articles to raboukha@cshl.edu by November 5, 2013.