

Integrating automated Semantic Annotation and Information Visualization for the analysis of multichannel Fluorescence Micrographs from Pancreatic Tissue

Julia Herold a, Sylvie Abouna b, Luxian Zhou b, Stella Pelengaris c, David Epstein d, Michael Khan b, Tim W. Nattkemper a

a Biodata Mining & Applied Neuroinformatics Group, Bielefeld University, Germany
b Biomedical Research Institute, Department of Biological Sciences, University of Warwick, Coventry CV4 7AL, UK
c Medical School, University of Warwick, Coventry CV4 7AL, UK
d Mathematics Institute, University of Warwick, Coventry CV4 7AL, UK

Abstract

The computational analysis of bioimages, and in particular of fluorescence micrographs, has received much attention recently. However, the computer science community seems to focus only on the problems of image segmentation and automated classification of regions of interest (ROI). One particular strength of fluorescence microscopy is the simultaneous visualization of multiple molecules by staining with different fluorescent dyes. In the analysis of such multichannel images, segmentation of ROIs is therefore only a first step, which must be followed by a second step analyzing the ROIs' signals in the different channels. In this paper we present a system that combines image segmentation and information visualization principles for an integrated analysis of fluorescence micrographs of tissue samples. The analysis aims at the detection and annotation of cells of the Islets of Langerhans and the whole pancreas, which is of great importance in diabetes studies and in the search for new anti-diabetes treatments. The system operates with two modules. The automatic annotation module applies supervised machine learning for cell detection and segmentation. An information visualization module can be used for interactive classification and visualization of cell types following the link-and-brush principle for filtering.

Key words: Fluorescence Microscopy, Bioimage Informatics, Image Processing, Pattern Recognition, Machine Vision, Information Visualization, Exploratory Data Analysis (EDA)

1. Introduction

Nowadays, the rapidly advancing field of Bioinformatics is developing a new branch: Bioimage Informatics.

Email addresses: [email protected] (Julia Herold)

Preprint submitted to Computerized Medical Imaging and Graphics, March 27, 2009

The established lab techniques allow a very broad and high-dimensional view on the molecular statistics. Genome-oriented projects aim at sequencing the whole genome of multiple organisms, and many genes can be functionally annotated by wet lab experiments or using sequence similarity to annotated genes. However, the function of most genes cannot be determined using the genome-based approach, and for most genes the function is still unknown [1]. Although additional information is gained by 2D gel experiments in proteomics, the lack of spatial information limits the significance of the data if one attempts to map proteins to functions or to understand molecular networks. This point has been identified as the situation where imaging can play a significant role [2] in further understanding how proteins contribute to cell function. It has the benefit of, for example, being able to capture quantitative data and offering subcellular resolution, and, by using fluorescence labeling, multiple proteins can be analyzed in the same sample, making it possible to study patterns of co-localization [3]. Due to developments in the field of bioimaging, an increasing amount of data needs to be analyzed, so that manual evaluation is no longer feasible. In most applications of imaging in biomedical research, it is necessary to restrict the analysis of image features, e.g. protein location, to biologically significant objects in order to reduce the amount of data to be analyzed and to enable a semantic interpretation of the information obtained. This semantic annotation can be, for example, the identification of individual cells or cell compartments, which is mostly achieved by intensity-, shape- or texture-based segmentation. As a next step, quantitative information is to be extracted from the annotated regions, i.e. cells, and written to a database. To explore the hidden patterns in this data, the quantitative values must be visualized with a direct linkage to the image domain.

In this paper we demonstrate how machine learning based semantic annotation, in combination with visualization, can be used to obtain information relevant to diabetes study and to the screening of new anti-diabetes treatments. Increasingly, researchers are interested in the ratio or balance of alpha and beta cells, as type 2 diabetes, which affects around 200,000,000 people worldwide, is primarily caused by a defective number of beta cells, which produce the sugar-lowering hormone insulin. However, an excess of alpha cells and glucagon production, which raise glucose, may contribute to hyperglycaemia. Also, current drugs may partly work by reducing glucagon. Our approach operates in two steps. First, SVMs are applied for semantic annotation of cell nuclei of the Islets of Langerhans and the whole pancreas in pancreatic tissue sample images. Based on the annotation result, quantitative information for each cell is extracted from two additional fluorescence images. In the second step, the signal parameters for each cell are visualized in a scatter plot. Using sliders, thresholds can be applied to the parameters to classify cells as alpha or beta cells, which is directly visualized in an image display.

2. MATERIAL

Images are obtained by confocal microscopy with a ×40 objective. Each tissue section is immunostained for two proteins, namely insulin and glucagon. Insulin is produced by beta cells (β-cells) and glucagon by alpha cells (α-cells), so this staining enables us to identify and distinguish these different cell types. In addition we use DAPI (4',6-Diamidino-2-phenylindol), which stains DNA blue irrespective of the cell type. As an example, fig. 1 displays the images obtained for one tissue sample. Figure 1 shows (left) the staining with DAPI displaying nuclei, (center) α-cell immunostaining for glucagon, and (right) β-cell immunostaining for insulin.

3. METHODS

In the following section we describe the image acquisition setup as well as the individual image analysis steps, which are displayed schematically in figure 2. After image acquisition, a semantic image annotation is carried out with a supervised machine learning approach to locate biologically significant objects, i.e. cell nuclei (step (a) in figure 2) [4]. Subsequently, object specific features are extracted and visualized for each image channel, i.e. stained protein (step (b)). Based on this visualization, the feature as well as the image domain can be interactively explored to differentiate α-, β-, and non-islet cells based on their underlying multichannel characteristics (step (c)).

3.1. Image acquisition and preprocessing

Images of pancreatic tissue sections were obtained by confocal microscopy with a ×40 objective. We used DAPI (4',6-Diamidino-2-phenylindol) to stain the DNA of each cell with a blue color, which allows for the identification of cell nuclei. Additionally, each section was immunostained against two proteins, glucagon and insulin, which allows us to identify α- and β-cells, as they produce glucagon and insulin, respectively. We thereby obtain three channels for each tissue section, each 512×512 pixels in size, representing nuclei (DAPI-channel), α-cells (glucagon-channel), and β-cells (insulin-channel). As an example, figure 1 shows images of one tissue section stained with (left) DAPI, and immunostained against (center) glucagon and (right) insulin. Images were manually enhanced for better visibility.

Images were normalized to cover the whole 8-bit intensity range of 0-255. To reduce noise in the images, we applied bilateral filtering to each image [5]. The geometric spread parameter σd was fitted based upon the maximum object size (Nmax), measured in pixels, as σd = (Nmax − 1)/4. The photometric spread was set to σr = 20, which has proven to be an appropriate value for our image domain.
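For illustration, this preprocessing chain could look as follows in Python; OpenCV's bilateral filter serves as a stand-in for the filter of Tomasi and Manduchi [5], and the function and variable names are ours, not part of the published system:

```python
# A minimal preprocessing sketch, assuming OpenCV and NumPy are available.
import cv2
import numpy as np

def preprocess(img: np.ndarray, n_max: int = 17) -> np.ndarray:
    """Normalize to the full 8-bit range, then denoise with a bilateral filter."""
    # Stretch intensities to cover 0-255.
    img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # Geometric spread derived from the maximum object size, as in the text.
    sigma_d = (n_max - 1) / 4.0
    sigma_r = 20.0  # photometric spread
    # d=-1 lets OpenCV derive the neighborhood diameter from sigma_d.
    return cv2.bilateralFilter(img, d=-1, sigmaColor=sigma_r, sigmaSpace=sigma_d)
```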

3.2. Semantic image annotation

A support vector machine (SVM) approach, as also described in [4], was applied for semantic image annotation, i.e. localizing cell nuclei stained with DAPI. To this end, each image location had to be classified as being the center of a cell nucleus or not. The proposed detection approach consists of three steps. First, as the SVM is a supervised machine learning approach, a training step has to be performed. Second, the trained SVM is applied pointwise to each image location. In a third step, the SVM output is post-processed to compute nucleus center positions.

To obtain a representative training set for SVM training, a DAPI stained nucleus image was manually evaluated by a human expert, labeling an equal number of positive and negative training samples. For each labeled position px,y, an image patch of size N×N (N = Nmax + 2) centered at px,y was extracted and the gray values were written to an N²-dimensional gray value feature vector xx,y. The parameter Nmax represents the maximum object size and needed to be set manually by the user. As we can assume that the class of a patch is invariant with respect to rotation, we increased the size of our training set fourfold by rotating each patch four times by an angle of β = k · 360°/4 with k ∈ {0, 1, 2, 3}. The dimension of xx,y was reduced by a PCA [6] projection onto the first eight eigenvectors, i.e. those with the highest eigenvalues, resulting in an eight-dimensional feature vector x̃x,y. This dimension reduction allowed for fewer training samples and a faster computation. Training was performed with a Gaussian kernel. Kernel parameterization was achieved through a fully automatic ten-fold cross validation.
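A minimal sketch of this training step, under the assumption that a scikit-learn RBF-kernel SVC is an acceptable stand-in for the SVM used here; the parameter grid and all names are illustrative, not taken from the paper:

```python
# Training-set construction, PCA projection, and cross-validated SVM training.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def extract_patch(img, x, y, n):
    """Cut an n x n patch (n odd) centered at (x, y)."""
    h = n // 2
    return img[y - h:y + h + 1, x - h:x + h + 1]

def train_nucleus_svm(img, positions, labels, n_max=17):
    """positions/labels: expert-labeled patch centers and their classes (0/1)."""
    n = n_max + 2
    X, y = [], []
    for (px, py), label in zip(positions, labels):
        patch = extract_patch(img, px, py, n)
        # Fourfold augmentation: rotate by k * 360/4 degrees, k = 0..3,
        # since the class of a patch is rotation invariant.
        for k in range(4):
            X.append(np.rot90(patch, k).ravel())
            y.append(label)
    X = np.asarray(X, dtype=float)
    # PCA projection onto the eight eigenvectors with highest eigenvalues.
    pca = PCA(n_components=8).fit(X)
    # Gaussian (RBF) kernel; parameters via ten-fold cross validation.
    # The parameter grid itself is an assumption; the paper does not list one.
    grid = GridSearchCV(SVC(kernel="rbf"),
                        {"C": [1, 10, 100], "gamma": [1e-3, 1e-2, "scale"]},
                        cv=10)
    grid.fit(pca.transform(X), np.asarray(y))
    return grid.best_estimator_, pca
```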

To detect cell nuclei in the whole micrograph, the trained SVM was applied pointwise to each image location px,y. For each location px,y the feature vector x̃x,y was extracted and classified by the trained SVM. If the patch was classified as a nucleus patch, the SVM confidence value, i.e. the distance to the hyperplane, was written to position px,y of a new matrix of the same size as the original image, called the confidence map. After classification of each image location, this confidence map was evaluated to obtain exact nucleus positions. Areas of high confidence correspond to nucleus areas, where the highest value is likely to be the center of the nucleus. Therefore, local maxima exceeding a user defined threshold t within an Nd × Nd area with Nd = (N − 1)/2 were considered as nucleus positions. Nd can be considered as the distance between two separate objects. Modifying the threshold t allows for interactively adapting the number of detected nuclei according to their confidence value.

The performance of the SVM was measured by evaluation against a gold standard obtained through manual labeling of all nuclei in the image. The standard ROC [7] performance measures sensitivity (SE) and positive predictive value (PPV), in bioinformatics also referred to as specificity or precision, were used to statistically assess the detection performance of the SVM. SE reflects the percentage of nuclei labeled by the human expert which were also detected by the SVM approach. PPV measures the percentage of nuclei detected by the SVM which were also labeled by the expert.
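Written out, with TP, FP and FN denoting the counts obtained by matching detections against the gold standard:

```python
# The two detection measures, written out.
def sensitivity(tp: int, fn: int) -> float:
    """SE: fraction of expert-labeled nuclei also found by the SVM."""
    return tp / (tp + fn)

def positive_predictive_value(tp: int, fp: int) -> float:
    """PPV: fraction of SVM detections also labeled by the expert."""
    return tp / (tp + fp)
```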

3.3. Multichannel feature extraction

The semantic annotation described in the previous section makes it possible to restrict the evaluation and analysis of further channels to regions of interest. Multichannel features can thus be linked to specific biological objects. Furthermore, the data which needs to be analyzed is reduced to a minimum.

Throughout this study, we evaluated the object specific characteristics of two additional channels to classify a nucleus as an α-, β- or non-islet-cell nucleus. In a first step, we extracted ROIs based on the semantic annotation. To this end, a nucleus border was calculated for each detected nucleus position px,y, see figure 3(a). An Nmax × Nmax image patch centered at px,y was extracted and a gray value closing was performed [8]. The resulting image was thresholded with the Otsu method [9], and the nucleus border pixels were calculated by applying morphological and logical operations [8].
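One possible reading of this border extraction, in scikit-image; the structuring element sizes and the specific border definition (object minus eroded object) are assumptions, as the text only names the operations:

```python
# Gray value closing, Otsu binarization, and a morphological border.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import binary_erosion, closing, disk

def nucleus_border(img, px, py, n_max=17):
    h = n_max // 2
    patch = img[py - h:py + h + 1, px - h:px + h + 1]
    closed = closing(patch, disk(2))          # gray value closing
    mask = closed > threshold_otsu(closed)    # Otsu binarization
    # Border pixels = object pixels minus the eroded object
    # (one combination of morphological and logical operations).
    return mask & ~binary_erosion(mask)
```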

Based on the extracted nucleus borders, object specific features were extracted for each channel. In this study, we applied two different feature extraction strategies. For the first strategy, both channels were classified into object and background pixels via Otsu thresholding. For each channel, we calculated the percentage of nucleus border pixels (border coverage) which were also object pixels in the thresholded channel, as shown in figure 3(b). This strategy has the advantage that, based on the Otsu thresholded images, we were additionally able to compute the islet specific α- and β-cell mass with respect to the whole islet mass, a measure relevant for diabetes study. To obtain the whole islet mass, the thresholded insulin and glucagon channels were merged and all object pixels were counted. However, the border coverage strategy always requires a binarization of the image through the Otsu method, which can lead to undesirable results. Therefore, we also present a second strategy which does not require a thresholding of the images. Here, a median intensity along the nucleus border was calculated for each channel (see figure 3(c)). Both feature extraction methods produced two dimensional feature vectors z(x,y) = (z1, z2) for each nucleus. Based on these feature vectors, cell nuclei were classified as α-, β- or non-islet-cell nuclei: if the median intensity or the border coverage, depending on the feature used for the analysis, exceeded a user defined threshold, i.e. z1 > tα or z2 > tβ, the nucleus was classified as an α- or β-cell nucleus, respectively. The remaining nuclei were classified as non-islet-cell nuclei. Different measures were calculated based on this classification, e.g. the number of α- and β-cells, the percentage of α- and β-cells with respect to all cells, and the α- to β-cell ratio. This ratio is relevant for diabetes studies and the testing of anti-diabetes treatments, as type 2 diabetes is primarily caused by a defective number of β-cells, but an excess of α-cells, and thus glucagon production, may contribute to hyperglycaemia.

To obtain a gold standard for the classification of α- and β-cells, image stacks were manually evaluated four times by a human expert. The average of the α- and β-cell counts was used as the gold standard.

3.4. Visual data mining and interaction

Appropriate visualization techniques, which allow the feature domain as well as the image domain to be explored interactively and simultaneously, are required for data analysis. Data exploration in the image domain is especially important, as biological experts are very familiar with interpreting data in this domain and (probably even more important) the spatial image domain represents the special information gain, complementary to the standard non-spatial techniques, as outlined in the Introduction. To explore different variables in different domains, the new field of Information Visualization [10, 11] proposes several techniques. One of the most powerful approaches is the link & brush principle. In link & brush-based data exploration, the variables are plotted in at least two different displays. Using a mouse pointer, variable subsets are selected in one display and automatically highlighted in the other display(s). Throughout this study, we analyzed two channels in addition to the channel for semantic annotation. It was therefore sufficient to use a scatter plot showing the cellular features (z1, z2) in one display (see the four background windows in Figure 4). In the scatter plot, two sliders can be used to select thresholds tα and tβ for the two variables. Each new selection of a threshold triggers a new display in the image domain (α- or β-channel) where all cells with variable values above the threshold are highlighted through differently colored boxes or different symbols, as shown in Figure 2. Here, overlays of the different channels as well as the detection and classification results were constructed and updated according to changes of the channel specific thresholds. If the number of features, i.e. channels, were to increase, the display would be extended into a scatterplot matrix and link & brush techniques would be employed.
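To make the interaction concrete, a minimal link & brush sketch with a single threshold slider, using matplotlib; this only mirrors the described behavior and is not the system's actual interface. z1, z2 are the per-nucleus features and positions an array of (row, col) nucleus centers:

```python
# Brushing in the feature display is linked to markers in the image display.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.widgets import Slider

def link_and_brush(z1, z2, positions, channel_img):
    fig, (ax_feat, ax_img) = plt.subplots(1, 2, figsize=(10, 5))
    pts = ax_feat.scatter(z1, z2, c="gray")
    ax_feat.set_xlabel("z1 (glucagon)")
    ax_feat.set_ylabel("z2 (insulin)")
    ax_img.imshow(channel_img, cmap="gray")
    marks = ax_img.scatter(positions[:, 1], positions[:, 0], s=80,
                           facecolors="none", edgecolors="none")
    # One slider for t_alpha; the real display offers a second one for t_beta.
    slider = Slider(fig.add_axes([0.15, 0.01, 0.3, 0.03]), "t_alpha",
                    float(z1.min()), float(z1.max()), valinit=float(z1.mean()))

    def update(t):
        sel = z1 > t
        # Each threshold change recolors both linked displays.
        pts.set_color(np.where(sel, "red", "gray"))
        marks.set_edgecolors(np.where(sel, "red", "none"))
        fig.canvas.draw_idle()

    slider.on_changed(update)
    plt.show()
```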

4. RESULTS

In the following section we present results obtained for two sample images. First, the performance of the nucleus detection is shown, followed by the results of the cell type classification. As a comparison, results for the pixel-wise clustering of the image data are presented.

4.1. Nucleus detection

For nucleus detection, an SVM was trained with 59 positive and 59 negative hand selected samples from a tissue sample stained with DAPI. The parameter Nmax was set to Nmax = 17. The trained SVM was then used for nucleus detection on two previously unseen sample images A and B. A confidence threshold of t = 0 was applied for both images. A total of 283 and 517 nuclei were detected for tissue samples A and B, respectively. With respect to the gold standard obtained by human labeling, an SE of 91% and a PPV of 96% were achieved, averaged over both detection results. Figure 5(a) presents a detection result for tissue sample A overlaid on top of the original nucleus image. Correctly detected nuclei (true positives, TP) are marked in white, false positives (FP), i.e. nuclei which were detected by the SVM but not marked by the expert, in red, and false negatives (FN), i.e. nuclei marked by the expert but not detected by the SVM, in orange. Most FP were cell nuclei showing low intensity and diffuse boundaries. However, taking a closer look at these FP, see fig. 5(b), shows that they were correctly detected as nuclei and were missed by the expert. FN detections were mostly nuclei which were very small and close to another bigger and brighter nucleus. However, most overlapping and irregularly shaped nuclei were correctly detected.

4.2. Feature extraction and data mining

For both tissue samples, nucleus boundaries were extracted as described in section 3.3. Figures 6(a) and (b) show the extracted boundaries for samples A and B, respectively. Nucleus boundaries were extracted properly, especially for separated nuclei. Overlapping nuclei mostly featured overlapping boundaries at their intersection point, as displayed in figure 6(c) with a lighter gray color. This information could be used to find the appropriate separation point if required. For our study, however, the overlapping boundaries did not pose a problem.

Both feature extraction strategies described in section 3.3 were applied to extract a two dimensional feature vector for each cell nucleus. Based on the scatterplot visualization, a preliminary threshold was set manually. Through the interactive linking of the two visualizations, scatterplot and image visualization, the threshold could conveniently be fine tuned. Figure 7(a) shows the scatterplot for the intensity feature for tissue sample A. In a first step, thresholds were set to 15 for the glucagon-channel and 20 for the insulin-channel. By analyzing the image visualization, we could see that too many nuclei were classified as β-cell nuclei. By interactively changing the threshold, it became clear that 30 was a more appropriate threshold for the insulin-channel. The resulting classification is shown in figures 7(b) and (c). Similar fine tuning was performed to set the thresholds for the border coverage features. A scatter plot visualization of the border coverage feature and the classification result for sample B are shown in figures 7(d-f).

Table 1 displays the number of nuclei in both samples classified as α- and β-cell nuclei for both feature strategies, as well as the human counting result. The percentage of each class with respect to all detected nuclei is shown in parentheses. As table 1 shows, our proposed semi-automatic approach agreed very well with the human expert's counts. However, there was a mismatch in the counting of the α-cell nuclei for sample B for both feature extraction strategies. A closer look at the nuclei classified as α-cell nuclei showed that no nucleus was missed which should have been counted as an α-cell nucleus. The classification result for the α-cells of sample B with the border coverage feature is shown in figure 7(e). In this figure, one can see that all cells showing a nucleus were correctly classified, but there are cells which do not show a nucleus yet do show glucagon staining. This explains the disagreement, as these cells were also counted by the human expert.

Besides the number of classified cells, we also computed the α- to β-cell ratio for each analysis, as this is valuable information for diabetes study. A ratio of 0.24 was calculated for sample A with the border coverage feature and a ratio of 0.25 with the median intensity feature. For sample B, ratios of 0.21 and 0.27 were calculated for the border coverage and median intensity features, respectively.

As mentioned before, the border coverage feature strategy has the advantage of additionally allowing the computation of the cell masses of α- and β-cells with respect to the whole islet cell mass. For tissue sample A, we computed an α-cell mass of 12% and a β-cell mass of 88%. Quite similar results were obtained for sample B, with an α-cell mass of 14% and a β-cell mass of 85%.

5. DISCUSSION

In this paper, we presented and evaluated a first integrated software system for the analysis of multichannel microscopy images which allows for a robust semantic image annotation, i.e. ROI detection, a semantics-based feature extraction, and an explorative data analysis of the features. Through the concept of semantic annotation, the system enables an object based multichannel image analysis, i.e. object specific multichannel features are extracted and analyzed, rather than analyzing the multichannel characteristics of each pixel in the image. We could show that by simply analyzing image pixel information, no cell type specific information could be extracted, and thus this strategy is not applicable for extracting diabetes relevant information [12].

Our system requires only a few parameters, which makes it easily usable also by non-computer experts. For SVM training, only the maximum object size Nmax, from which further parameters can be derived, and the PCA target dimension, which can easily be set by analyzing a visualization of the eigenvalues in our software, need to be set. Once an SVM has been trained, analyzing new images only requires the selection of the feature specific thresholds.


This process is guided through concepts from visual data mining, which allows for an interactive, explorative analysis of the data. Simultaneously, the user can explore the image as well as the feature domain to choose appropriate settings. This strategy makes the system applicable to problems where there is only little knowledge of the underlying data and the evaluation goal is quite vague. We have shown that results similar to a manual expert classification were obtained by applying our semi-automatic approach with different feature extraction strategies. The cell type classification allowed us to compute diabetes relevant information such as the α- to β-cell ratio. Furthermore, by applying the border coverage feature we could also compute cell type specific cell masses. However, the requirement of thresholding for this computation can sometimes lead to unwanted results, especially if staining is not perfect.

The process of semantic annotation with a trained SVM only requires around 20 seconds on an Intel Core 2 Duo CPU with 3 GHz and 2 GB RAM. Analyzing the multichannel information can also be completed within a few seconds. However, the time requirements always depend on how much fine tuning of the thresholds is required. With respect to a manual human evaluation, our system provides an enormous time saving and is thus suitable for application to studies with a reasonably high throughput of high-content data.

Although we only presented results for a three channel setup in this study, the system can easily be adapted to higher dimensional data. In this paper, we intended to provide inspiration on how to incorporate visual data mining concepts into the field of high-content bioimage analysis. It is interesting to note that quite basic feature extraction and visualization strategies were already sufficient to provide valuable information for diabetes study.

6. ACKNOWLEDGEMENTS

J.H. was supported by the International NRW Graduate School in Bioinformatics and Genome Research, funded by the state of North Rhine-Westphalia (NRW). Special thanks go to Yvonne Dyck for programming the interactive exploration interface.

References

[1] V. Starkuviene, R. Pepperkok, The potential of high-content high-throughput microscopy in drug discovery, Br J Pharmacol 152 (1) (2007) 62-71. doi:10.1038/sj.bjp.0707346.

[2] S. Megason, S. Fraser, Imaging in systems biology, Cell 130 (5) (2007) 784-795. doi:10.1016/j.cell.2007.08.031.

[3] Z. Perlman, M. Slack, Y. Feng, T. Mitchison, L. Wu, S. Altschuler, Multidimensional drug profiling by automated microscopy, Science 306 (5699) (2004) 1194-1198. doi:10.1126/science.1100709.

[4] J. Herold, S. Abouna, L. Zhou, S. Pelengaris, D. B. Epstein, M. Khan, T. W. Nattkemper, A machine learning based system for multichannel fluorescence analysis in pancreatic tissue bioimages, in: Proc. of IEEE International Conference on Bioinformatics and Bioengineering (BIBE 2008), Athens, 2008.

[5] C. Tomasi, R. Manduchi, Bilateral filtering for gray and color images, in: ICCV, 1998, pp. 839-846.

[6] K. Pearson, On lines and planes of closest fit to a system of points in space, Philosophical Magazine 2 (1901) 559-572.

[7] J. Hanley, Receiver operating characteristic methodology: the state of the art, Crit. Rev. in Diagn. Imaging 29 (1989) 307-335.

[8] R. Gonzalez, R. Woods, Digital Image Processing (2nd Edition), Prentice Hall, New Jersey, 2002.

[9] N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans Syst Man Cybern 9 (1979) 62-66.

[10] C. Ware, Information Visualization, Morgan Kaufmann Publishers, 2004.

[11] R. Spence, Information Visualization: Design for Interaction, Prentice-Hall, 2006.

[12] A way towards analyzing high-content bioimage data by means of semantic annotation and visual data mining.

[13] D. Keim, Information visualization and visual data mining, IEEE Trans Vis Comput Graphics 8 (1) (2002) 1-8. doi:10.1109/2945.981847.


Figure 1: 512×512 px pancreatic tissue sample images stained with (left) DAPI showing cell nuclei, (center) immunostained for glucagon showing α-cells, and (right) immunostained for insulin showing β-cells.

Figure 2: Analysis strategy for multichannel pancreatic images. (a) First, a semantic annotation is performed to obtain biologically meaningful objects, marked with white rectangles in the second image. (b) For each object, multichannel features are extracted and visualized in a two dimensional scatterplot, displaying e.g. the median border intensity of the α-channel against the median border intensity of the β-channel. (c) A threshold for each channel can be determined interactively by analyzing the scatterplot as well as the classification outcome (cross = α-cell, rectangle = β-cell).

Figure 3: Schematic visualization of the feature extraction strategies. First, a nucleus border is extracted for each nucleus position (a). Subsequently, different strategies can be applied for feature extraction. (b) The channels are thresholded and the border coverage is calculated through logical operations. (c) The median intensity across the nucleus border is calculated for each channel.

Figure 4: Interactive feature exploration with link and brush: in a scatter plot visualization of the extracted features z (x-axis: z1, y-axis: z2), the user can select the threshold for the α-channel (see for instance the background window on the left in (a)). Those cells with a z1 value above this threshold are marked simultaneously in the α-channel (foreground window on the left in (a)). The top row (a) shows results obtained for conservative thresholding, leading to too small a number of cells. The lower row (b) shows reasonable results, obtained after tuning both thresholds.

Figure 5: Nucleus detection result for sample A. Correctly detected nuclei are highlighted in white, false positives in red, and false negatives in orange. The right image shows an enlarged subimage.


Figure 6: Extracted nuclei borders, shown in white, for (a) sample A and (b) sample B. (c) shows sub-images displaying two overlapping nuclei. Intersections of the borders are visible through a lighter gray color.

Table 1: Number of nuclei classified as α- and β-cell nuclei for the border coverage feature, the intensity feature, and the manual evaluation, for both sample images A and B. The percentage with respect to all detected nuclei is shown in parentheses.

                tissue sample A                     tissue sample B
            border     intensity  human     border      intensity   human
# α cells   22 (8%)    23 (8%)    21        23 (4%)     28 (5%)     40
# β cells   92 (33%)   92 (33%)   91        110 (22%)   101 (20%)   100


Figure 7: α- and β-cell nucleus classification results for tissue samples A (a)-(c) and B (d)-(f). (a) and (d) show scatter plot visualizations for the intensity feature and border coverage feature, respectively. (b) and (e) show the classification results for the α-cell nuclei, marked with a white rectangle, and (c) and (f) the results for the β-cell nuclei classification.