Getting Started with CellProfiler
Mark-Anthony Bray, Ph.D.
Imaging Platform, Broad Institute
Cambridge, Massachusetts, USA

Software Overview
• Available from www.cellprofiler.org
• Free, open source (Python)
• Software available for Windows, Mac and Linux
Image Analysis & Quantification
Image-centric Data Analysis

CellProfiler: Overview
• Process large sets of images
• Identify and measure objects
• Export data for further analysis
• Goal: Provide powerful image analysis methods with a user-friendly interface
• Philosophy: Measure everything, ask questions later...
• Support data analysis based on individual cells

Typical CellProfiler Pipeline Workflow
• For image-based assays, the basic objective is always to
– Identify cells/organisms
– Measure feature(s) of interest
• The uniqueness of each assay comes in
– Deciding what compartments to identify and how to identify them
– Determining which measure(s) are most useful to identify interesting samples

The CellProfiler Interface
• Pipeline panel: Displays modules in pipeline
– Modules executed in order from top to bottom
– Panel buttons: Change module position, add or remove modules, module help

The CellProfiler Interface
• File panel: Displays files in default image folder
– Load a pipeline by double-clicking on it
– View images by double-clicking on the filename

The CellProfiler Interface
• The figure window has additional menu options
• Toolbar menu: Pan, zoom in/out
• CellProfiler Image Tools
– Image Tool (also displayed by clicking on image)
– Interactive zoom
– Show pixel data (location, intensity)

The CellProfiler Interface
• Folder panel: Change default input and output directories
– Usually these should be separate folders
– Input folder: Contains images to be analyzed
– Output folder: Contains the output file plus exported data and images

The CellProfiler Interface
• Settings panel: View and change settings for each module
– Clicking on a different module updates the settings view

Module Categories
• File processing: Image input, file output
• Image processing: Often used for pre-processing prior to object identification
• Object processing: Identification, modification of objects of interest
• Measurement: Collection of measurements from objects of interest
• Data Tools: Measurement exploration, measurement output

The First Module: LoadImages
• Loads an “image set”, which is a group of related images, in preparation for further processing
• Related how? Depending on the imaging device, one file may represent
– One channel at one imaging location
– Multiple channels at one imaging location
– Multiple channels at multiple locations
– Etc.
[Figure: example image set with DNA and GFP channels]

The First Module: LoadImages
• Can use text matching to define the difference between images in a set
– All images stained for GFP have the text Channel1- in the name
– Same for DNA images (Channel2-)
– Assign each image a meaningful name for downstream reference
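
The text-matching idea on this slide can be sketched in a few lines of Python. This is only an illustration of the grouping logic, not how LoadImages is implemented: the function name, the example file names, and the rule "strip the channel text to get the image-set key" are all assumptions.

```python
from collections import defaultdict

# Channel text -> meaningful name, following the slide's example.
CHANNEL_TOKENS = {"Channel1-": "GFP", "Channel2-": "DNA"}


def group_image_sets(filenames):
    """Group file names into image sets.

    Files that differ only by their channel text end up in the same set
    (assumed convention for this sketch).
    """
    image_sets = defaultdict(dict)
    for name in filenames:
        for token, channel in CHANNEL_TOKENS.items():
            if token in name:
                key = name.replace(token, "", 1)  # what remains identifies the set
                image_sets[key][channel] = name
                break
    return dict(image_sets)


# Hypothetical file names for illustration.
files = [
    "Channel1-01-A-01.tif",
    "Channel2-01-A-01.tif",
    "Channel1-02-A-02.tif",
    "Channel2-02-A-02.tif",
]
sets = group_image_sets(files)
for key, channels in sets.items():
    print(key, channels)
```

Each image set then maps the names GFP and DNA to one file each, which is what downstream steps would refer to.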

What Is An “Image”?
Images from Carolina Wählby

Object Identification
• Once the images are loaded, how do you find objects of interest?
• Step 1: Distinguish the foreground from the background by picking a good threshold
• Step 2: Identify objects as regions brighter than the threshold
• Step 3: Cut and join objects to “improve” their shape
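
Steps 1 and 2 can be sketched in plain Python as follows. This is a minimal illustration, assuming the image is a list of rows of intensities and that "region" means 4-connected pixels; it is not CellProfiler's code, and Step 3 (cutting and joining) is not covered here.

```python
def label_objects(image, threshold):
    """Label 4-connected regions of pixels brighter than `threshold`.

    Returns (labels, count); label 0 marks background.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                stack = [(r, c)]
                while stack:  # flood fill from the first pixel of the object
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] > threshold
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            stack.append((ny, nx))
    return labels, count


# Toy image: one bright object top-left, one bottom-right.
toy = [
    [9, 9, 0, 0],
    [9, 0, 0, 8],
    [0, 0, 8, 8],
]
toy_labels, toy_count = label_objects(toy, threshold=5)
```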

Primary Object Identification
• Many options for thresholding, cut and join methods, etc.

Thresholding
• Definition: Division of the image into background and foreground
• What is the best threshold value for dividing the intensity histogram into foreground and background pixels?
[Figure: intensity histogram (x-axis: pixel values; y-axis: frequency) with two candidate threshold positions marked “Here?” and “Or here?”]
• Method: Pick the method that provides the best results
– Otsu: Default. Good for readily identifiable foreground / background
– Background, RobustBackground: Good for images in which most of the image consists of background
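
To make the histogram question concrete, here is a minimal sketch of Otsu's method, which picks the threshold that maximizes the between-class variance of the two pixel groups. It assumes integer pixel values in the range 0 to 255 and is a textbook version written for illustration, not CellProfiler's implementation.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance.

    Pixels <= t count as background, pixels > t as foreground.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w_bg = 0        # number of background pixels so far
    sum_bg = 0      # sum of background intensities so far
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue            # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break               # no foreground pixels left
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy data: dim background near 10, bright foreground near 200.
toy_pixels = [10] * 50 + [12] * 50 + [200] * 20 + [210] * 20
toy_t = otsu_threshold(toy_pixels)
```

For the toy data the chosen threshold falls between the dim and bright groups, which is the "readily identifiable foreground / background" case the slide mentions.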

Thresholding
• Correction factor
– Multiplication factor applied to threshold
– Adjusts threshold stringency/leniency
– Setting this factor is empirical
• Upper/lower bounds
– Set safety limits on the automatic threshold to guard against false positives
– Helpful for unexpected images: Empty wells, images with dramatic artifacts, etc.
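
How the correction factor and the bounds combine can be sketched as below. The function and parameter names are made up for this example, and the order (scale first, then clamp) is an assumption of the sketch.

```python
def adjust_threshold(auto_threshold, correction_factor=1.0,
                     lower_bound=0.0, upper_bound=1.0):
    """Scale an automatic threshold, then clamp it to safety limits."""
    scaled = auto_threshold * correction_factor
    return min(max(scaled, lower_bound), upper_bound)


# A factor above 1 makes the threshold more stringent.
stricter = adjust_threshold(0.2, correction_factor=1.5, lower_bound=0.05)

# An empty well gives a tiny automatic threshold; the lower bound
# keeps noise from being called foreground.
empty_well = adjust_threshold(0.001, lower_bound=0.05)
```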

Object Separation
• Once the foreground objects have been identified, we need to distinguish multiple objects contained in the same “clump”
Images from Carolina Wählby

Object Separation
• Two-step process in “de-clumping”
1. Identification of the objects in a clump
2. Drawing boundaries between the clumped objects
Adjust settings to “de-clump” objects

Object Separation
• Clump identification: Two options
– Intensity: Works best if objects are brighter at center, dimmer at edges
– Shape: Works best if objects have indentations where clumps touch (esp. if objects are round)
[Figure: clumped objects numbered 1 and 2, with intensity peaks and indentations marked]

Object Separation
• Drawing boundaries: Two options
– Distance: Draws boundary lines midway between object centers
– Intensity: Draws boundary lines at dimmest line between objects
• Test mode allows users to view results of all setting combinations

Object Separation
• Additional separation settings: Adjust these settings if objects are being incorrectly split into pieces or merged together
• Smoothing: Increase to reduce intensity irregularities which produce over-segmentation of objects
[Figure panels: Original image; Smoothing filter size = 4; Smoothing filter size = 8]

Object Separation
• Suppress Local Maxima
– Minimum allowed distance between intensity peaks; peaks closer together than this are treated as one object rather than as separate objects in a clump
– Decrease to reduce improper merging of objects in clumps
[Figure panels: Original image; Maxima distance = 4; Maxima distance = 8]
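
The effect of the maxima-suppression distance can be seen on a one-dimensional intensity profile. The sketch below is a simplification for illustration (real images are 2D, and this is not CellProfiler's code): it finds local maxima, then keeps the brightest peaks while discarding any peak closer than `min_distance` to one already kept.

```python
def find_peaks(profile, min_distance):
    """Return indices of local maxima at least `min_distance` apart."""
    last = len(profile) - 1
    candidates = [
        i for i in range(len(profile))
        if (i == 0 or profile[i] > profile[i - 1])
        and (i == last or profile[i] >= profile[i + 1])
    ]
    # Visit the brightest peaks first so that weaker neighbours are suppressed.
    candidates.sort(key=lambda i: profile[i], reverse=True)
    kept = []
    for i in candidates:
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)


# Toy profile across a clump: peaks at positions 2, 4 and 9.
toy_profile = [0, 2, 5, 3, 4, 2, 0, 1, 6, 7, 6, 1, 0]
peaks_small = find_peaks(toy_profile, min_distance=2)
peaks_large = find_peaks(toy_profile, min_distance=4)
```

With the smaller distance three peaks survive (three objects); with the larger distance the weaker peak at position 4 is absorbed by its neighbour, so lowering the distance is what reduces improper merging.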

Object Separation
• The proper settings are usually a matter of trial and error
– The automatic settings are a good starting point, though
• However, adjusting these parameters can produce more improper segmentation than it solves
[Figure panels: Original image; Smoothing filter size = 4; Smoothing filter size = 8]

Filtering Invalid Objects
• Discard objects that fail size criterion or touch the image border
• See FilterObjects module for more advanced filtering options
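
The two filtering rules can be sketched as follows. This example assumes each object is a set of (row, column) pixel coordinates and uses pixel area as a stand-in for the size criterion; both are assumptions made for illustration.

```python
def filter_objects(objects, image_shape, min_area, max_area):
    """Keep objects that satisfy the size limits and do not touch the border.

    `objects` maps a label to a set of (row, col) pixels.
    """
    rows, cols = image_shape
    kept = {}
    for label, pixels in objects.items():
        touches_border = any(
            r in (0, rows - 1) or c in (0, cols - 1) for r, c in pixels
        )
        if not touches_border and min_area <= len(pixels) <= max_area:
            kept[label] = pixels
    return kept


toy_objects = {
    1: {(0, 0), (0, 1)},                  # touches the top border
    2: {(2, 2), (2, 3), (3, 2), (3, 3)},  # valid
    3: {(5, 5)},                          # too small
}
valid = filter_objects(toy_objects, image_shape=(8, 8), min_area=2, max_area=10)
```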

Primary Object Identification
• Colors used to label each segmented object
– Shows if each object has been identified and separated properly
• Outlines highlight valid objects
– Green: Valid
– Yellow: Invalid (touching border)
– Red: Invalid (size criterion)
• Gives object count as a measurement

Secondary Object Identification
• Goal: Identify individual cell boundaries by “growing” primary objects using a staining channel
– Nuclei typically more uniform in shape, more easily separated than cells
• Segment nuclei first, then use segmented nuclei to start cell segmentation

Secondary Object Identification
• Methods
– Distance-N: Ignores image information
  • Useful in cases where no cell stain is present
– Watershed, Propagation, Distance-B: Use image information
  • Find dividing lines between objects and background / neighbors
• Test mode allows user to view results of all methods
[Figure panels: Propagation; Distance-N]
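
The Distance-N idea, growing each nucleus outward by a fixed number of pixels without looking at the image, can be sketched like this. It is a simplified illustration that assumes 4-connected growth steps and lets the first label to reach a pixel claim it; the actual module may differ in both respects.

```python
from collections import deque


def grow_labels(labels, max_distance):
    """Expand each nonzero label by up to `max_distance` pixel steps."""
    rows, cols = len(labels), len(labels[0])
    grown = [row[:] for row in labels]
    # Breadth-first search started from every labeled pixel at once.
    queue = deque(
        (r, c, 0) for r in range(rows) for c in range(cols) if labels[r][c]
    )
    while queue:
        r, c, dist = queue.popleft()
        if dist == max_distance:
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grown[nr][nc] == 0:
                grown[nr][nc] = grown[r][c]
                queue.append((nr, nc, dist + 1))
    return grown


# Two single-pixel "nuclei" labeled 1 and 2.
toy_nuclei = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
toy_cells = grow_labels(toy_nuclei, max_distance=1)
```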

Secondary Object Identification
• Regularization: Controls the precise dividing line between cells that touch each other
– Performed by balancing between intensity and distance
– Usually not adjusted
• Correction factor, lower/upper bounds on threshold: Same purpose as in IdentifyPrimaryObjects
[Figure panels: Regularization = 0; Regularization = ∞]

Tertiary Object Identification
• Goal: Identify tertiary objects by removing the primary objects from secondary objects
– “Subtract” the nuclei objects from cell objects to obtain cytoplasm
Cells - Nuclei = Cytoplasm
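
The subtraction on this slide maps directly onto a set difference. The sketch below assumes that each object is a set of (row, column) pixels and that a cell and its nucleus share the same label; both are assumptions made for the example.

```python
def subtract_objects(cells, nuclei):
    """Return cytoplasm pixels: cell pixels that are not nucleus pixels."""
    return {
        label: pixels - nuclei.get(label, set())
        for label, pixels in cells.items()
    }


toy_cell = {1: {(r, c) for r in range(3) for c in range(3)}}  # 3x3 cell
toy_nucleus = {1: {(1, 1)}}                                    # center pixel
cytoplasm = subtract_objects(toy_cell, toy_nucleus)
```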

Measurement Modules: Object Morphology
Select the objects to measure

Module: MeasureObjectAreaShape
• Goal: Measure morphological features such as
– Area
– Perimeter
– Eccentricity
– MajorAxisLength
– MinorAxisLength
– Orientation
– FormFactor: Compactness measure, circle = 1, line = 0
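
FormFactor is the one feature in this list with a short closed form: 4π × Area / Perimeter². The sketch below evaluates it for an ideal circle and for a long thin rectangle to show the "circle = 1, line = 0" behavior; the shapes are idealized, so measured values on pixelated objects will differ somewhat.

```python
import math


def form_factor(area, perimeter):
    """Compactness measure: 4 * pi * area / perimeter ** 2."""
    return 4 * math.pi * area / perimeter ** 2


# Ideal circle of radius 3.
circle_ff = form_factor(math.pi * 3 ** 2, 2 * math.pi * 3)

# 100 x 1 rectangle, which is nearly a line.
line_ff = form_factor(100 * 1, 2 * (100 + 1))
```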

Measurement Modules: Object Intensity
Select the image to measure from
Select the objects to measure

Module: MeasureObjectIntensity
• Goal: Measure object intensity features such as
– Integrated intensity: Sum of the pixel intensities within an object
– Mean, median, standard deviation intensities
– Maximal and minimal pixel intensities
– Lower/Upper quartile
• The object intensity may be obtained from any image, not just the image used to identify the object
– Example: Ph3 intensity may be measured using the nuclei objects
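
The point that objects found in one image can be measured in another is easy to show in code. In this sketch the object is a set of pixel coordinates (for example from a nuclei segmentation) and the image is any channel of the same size; the dictionary keys are illustrative names, not the exact feature names CellProfiler writes.

```python
import statistics


def object_intensity_features(image, object_pixels):
    """Compute basic intensity statistics over one object's pixels."""
    values = [image[r][c] for r, c in object_pixels]
    return {
        "IntegratedIntensity": sum(values),
        "MeanIntensity": statistics.mean(values),
        "MedianIntensity": statistics.median(values),
        "StdIntensity": statistics.pstdev(values),
        "MinIntensity": min(values),
        "MaxIntensity": max(values),
    }


# Toy second-channel image, measured inside a hypothetical nucleus outline.
toy_channel = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
toy_nucleus_pixels = [(0, 0), (1, 1), (2, 2)]
features = object_intensity_features(toy_channel, toy_nucleus_pixels)
```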

Measurement Modules: Object Texture
Select the image to measure from
Select the objects to measure
Select the spatial scale

MeasureObjectTexture
• Goal: Determine whether the staining pattern is smooth on a particular scale
• Selection of the appropriate texture scale is essentially empirical
– A higher number measures larger patterns of texture
– Smaller numbers measure more localized (finer) patterns of texture
• Can also add several texture modules to the pipeline, each measuring a different texture scale

Other Measurement Modules
• CalculateMath: Arithmetic operations for measurements
• CalculateStatistics: Assay quality (V and Z' factors) and dose response data (EC50) for all measurements
• Image-based measures
– MeasureImageAreaOccupied
– MeasureImageGranularity
– MeasureImageIntensity
• Object-based measures
– MeasureCorrelation
– MeasureObjectNeighbors
– MeasureRadialDistribution
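
As a reminder of what the Z' factor expresses, here is a sketch using its standard definition, Z' = 1 − 3(σ_pos + σ_neg) / |μ_pos − μ_neg|. The control values are made up, and the use of the sample standard deviation is an assumption of this example; CalculateStatistics may differ in such details.

```python
import statistics


def z_prime(positive_controls, negative_controls):
    """Z' factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    mean_pos = statistics.mean(positive_controls)
    mean_neg = statistics.mean(negative_controls)
    sd_pos = statistics.stdev(positive_controls)
    sd_neg = statistics.stdev(negative_controls)
    return 1 - 3 * (sd_pos + sd_neg) / abs(mean_pos - mean_neg)


# Hypothetical per-well measurements for the two control groups.
toy_pos = [98, 100, 102]
toy_neg = [8, 10, 12]
toy_z = z_prime(toy_pos, toy_neg)
```

Values close to 1 indicate well-separated controls with little spread; values near or below 0 indicate overlapping controls.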

Data Export Modules
• User may output images or image measurements
Select the objects to export

Measurement Display
• The average measurements for all objects in the image are displayed in the figure window
• However, the individual measurements for each object are stored in the output file

Data Export Modules
• Goal: Retain images of intermediate image processing steps for quality control or save measurements for later analysis and exploration
• SaveImages: Writes an image to a file
– Intermediate images in the pipeline are not saved unless requested
– Choice of many image formats to write, so the module can be used as an image format converter
• ExportToSpreadsheet: Export measurements as a comma-separated file readable by spreadsheet programs
• ExportToDatabase: Export measurements as per-object and per-image tables plus a configuration file for upload to a MySQL database
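
Because the spreadsheet export is plain comma-separated text, it can be read with any CSV tool. The sketch below averages a per-object measurement by image using only the Python standard library. The file contents and column names here are placeholders for illustration; check the header row of your own export for the real names.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a per-object export file (contents are made up).
EXAMPLE_CSV = """ImageNumber,ObjectNumber,AreaShape_Area
1,1,100
1,2,140
2,1,90
"""


def mean_per_image(csv_text, column):
    """Average one per-object column over all objects in each image."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        image = int(row["ImageNumber"])
        sums[image] += float(row[column])
        counts[image] += 1
    return {image: sums[image] / counts[image] for image in sums}


area_means = mean_per_image(EXAMPLE_CSV, "AreaShape_Area")
```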

Illumination Correction
• The physical limitations of any microscope produce nonuniformities in the optical path of the sample, microscope, and/or camera
• Example: Tiling raw images shows that there is uneven illumination from left to right in each image
– This heterogeneity can lead to inaccurate intensity measurements
– A cell located at (a) appears brighter than one at (b) even if the cells have the same amount of fluorescent material
[Figure: tiled raw images with locations (a) and (b) marked]
Carpenter et al., Genome Biology 2006, 7:R100

Illumination Correction
• Illumination correction ensures that object segmentation and measurements (e.g. DNA content) are more accurate
Carpenter et al, Genome Biology 2006, 7:R100

Illumination Correction
• Two modules
– CorrectIlluminationCalculate: Creates an illumination correction function
– CorrectIlluminationApply: Applies the function to your images
• Available options
– Correct each image individually, or all images together as an ensemble?
– Calculate the illumination function by using foreground pixels or background pixels?
– Apply the function using division or subtraction?
• Additional considerations
– Create a new illumination correction function if you image on a different microscope or change plates
– Correct each channel since absolute illumination intensities may differ between channels
– First, create and save the function from the image set, then load and apply it prior to identification
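
One combination of the options above (all images together as an ensemble, applied by division) can be sketched as follows. This is a bare-bones illustration, not the modules' actual algorithm: it averages the images pixel by pixel, rescales so the dimmest position has value 1, and omits the smoothing step that would normally be applied to the function.

```python
def calculate_illumination_function(images):
    """Average images pixel-wise and rescale so the minimum equals 1.

    Assumes all images have the same size and a positive minimum mean.
    """
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    mean = [
        [sum(img[r][c] for img in images) / n for c in range(cols)]
        for r in range(rows)
    ]
    floor = min(min(row) for row in mean)
    return [[value / floor for value in row] for row in mean]


def apply_by_division(image, illumination):
    """Divide each pixel by the illumination function at that position."""
    return [
        [pixel / factor for pixel, factor in zip(img_row, ill_row)]
        for img_row, ill_row in zip(image, illumination)
    ]


# Toy 1 x 2 images: the right-hand pixel is lit twice as strongly.
toy_images = [[[2, 4]], [[4, 8]]]
illum = calculate_illumination_function(toy_images)
corrected = apply_by_division([[10, 20]], illum)
```

After correction the two positions report the same intensity, which is the situation described for cells (a) and (b) that contain the same amount of fluorescent material.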

Cluster Computing
• If processing time is too great on a single computer, then run the pipeline on a cluster
– Download and install CellProfiler on a computing cluster
– Add the ExportToDatabase module
– Add the CreateBatchFiles module to the end of the pipeline and configure it appropriately
– Run the first image cycle locally
– Submit the batches to your cluster for processing
– Check the progress of processing
• For really big screens, it is necessary to process images in batches on a computing cluster
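
Splitting a screen into batches amounts to dividing the image sets into consecutive ranges, one per cluster job. The helper below is a hypothetical illustration of that bookkeeping with 1-based, inclusive ranges; how each range is handed to a job depends on your cluster and CellProfiler setup and is not shown.

```python
def batch_ranges(n_image_sets, batch_size):
    """Return (first, last) image-set numbers for each batch."""
    return [
        (start, min(start + batch_size - 1, n_image_sets))
        for start in range(1, n_image_sets + 1, batch_size)
    ]


toy_batches = batch_ranges(n_image_sets=10, batch_size=4)
for first, last in toy_batches:
    print(f"batch covers image sets {first} to {last}")
```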

Data Analysis
• At the end of a pipeline, you may have 500+ features per cell
– Size, shape, staining intensity, texture (smoothness), etc.
• Remember our philosophy: “Measure everything, ask questions later...”

Data Analysis
• What does this data set look like?
• Cytological profile, or Cytoprofile
• Shows all the measurements acquired
– For each individual cell
– In every image
– In the entire experiment
[Figure: cytoprofile of Cell #6111617, with feature values (e.g. -.2, .7, -.1, 0, .2, -.9) color-coded on a scale from -1 to +1]

CellProfiler Analyst: Overview
• Explore data from large sets of images
• Identify interesting subpopulations and see the original images
• Identify interesting phenotypes automatically
• Goal: Provide the user with a powerful suite of image exploration and machine learning methods

The CellProfiler Analyst Interface
• CellProfiler Analyst (CPA) allows you to explore the data with a variety of tools
• Upon startup, CPA requests a properties file which contains
– Locations of the measurement tables
– How the images are referenced
– Other assorted information

Plate Viewer
• Displays data in plate layout
– 96- or 384-well format
– Measurements are shown as color-coded wells or mouse tooltips
– Right-clicking on a well reveals a list of images to display

Image Viewer
• Displays an image referenced by number
• Color display
– Colors are assigned to each channel of image data
– Shown as a merged color image
– Toggle channel visibility and color scaling

Plotting Tools
• Various plotting tools allow user to explore and sift through the measurements and make discoveries

Data Analysis
• Why make so many measurements?
– For many screens, only a few measurements are necessary to obtain the phenotype
[Figure: scatter plot; x-axis: DNA content; y-axis: phospho-H3 staining]

Data Analysis
• Unfortunately, for other phenotypes, the proper features are not so simple to find…
[Figure panels: Wild-type HT29 cells; Cells on the move; Crescent-shaped nuclei; Peas in a pod; Crooked projections; Actin dots at junctions; Long projections; Hyphae-like projections]

Data Analysis
• Concentrating on single cells allows us to avoid problems of heterogeneous populations, and to detect rare events (such as mitosis)
• However, determining which combinations of features and values are appropriate for a phenotype is tedious and impractical
• We have included a machine learning classification tool to automatically choose the features and values required to score a rare or subtle phenotype

Automated Cell Image Processing
• Cytoprofile of 500+ features measured for each cell
• Thousands of wells
• 10^4 images, 10^3 cells in each: total of 10^7 cells/experiment
• Each cell with cytoprofile

Iterative Machine Learning
• System presents ~500 cells to biologists for scoring
• System defines rule based on cytoprofile of scored cells
[Figure: cells scored “Yes” or “No” feed a rule, which is refined by iteration]
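
To give a feel for what "defining a rule from scored cells" can mean, the sketch below learns the simplest possible rule: a single feature and cutoff ("positive if feature > threshold") that misclassifies the fewest scored cells. This is a deliberately reduced stand-in chosen for illustration; the slides do not specify the learning algorithm, and the real tool combines many features.

```python
def learn_rule(profiles, scores):
    """Find (feature_index, threshold, errors) with the fewest errors.

    The rule predicts positive when profile[feature_index] > threshold.
    """
    best = None
    for feature in range(len(profiles[0])):
        for threshold in sorted({p[feature] for p in profiles}):
            errors = sum(
                (p[feature] > threshold) != score
                for p, score in zip(profiles, scores)
            )
            if best is None or errors < best[2]:
                best = (feature, threshold, errors)
    return best


# Made-up two-feature cytoprofiles and the biologist's Yes/No scores.
toy_profiles = [(0.1, 5.0), (0.9, 4.0), (0.2, 6.0), (0.8, 5.5)]
toy_scores = [False, True, False, True]
toy_rule = learn_rule(toy_profiles, toy_scores)
```

Here the first feature separates the scored cells perfectly, so the rule uses it and ignores the second feature.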

Iterative Machine Learning
• Scored cells are sorted by well: Identify samples with a high proportion of positive cells
[Figure: the rule is applied to all 10^7 cells to produce scored cells]

Final Notes
• Where to get help
– Access help from the CellProfiler main window
– Ask for help on the CellProfiler.org forum

The Team
• Image assay development: Apply image analysis methods to biological questions
• Algorithm development & software engineering: Develop & test new image analysis and data mining methods and create open-source software tools
• IT/Administration
• Director
Team members: Mark Bray, Anne Carpenter, David Logan, Peggy (Margaret) Anthony, Kate Madden, Ray Jones, Vebjørn Ljoså, Auguste Genovesio (begins 2010), Adam Fraser, Carolina Wählby, Lee Kamentsky