Zharkova Sunspot

8/2/2019 Zharkova Sunspot

http://slidepdf.com/reader/full/zharkova-sunspot 1/16

1

Technique for Automated Recognition of Sunspots on Full Disk Solar

Images

S.Zharkov, V.Zharkova, S.Ipson and A. Benkhalil

AbstractA new robust technique is presented for automated identification of sunspots on full disk whitelight (WL) solar images obtained from the Michelson Doppler Imager (MDI) instrument aboardthe SOHO satellite and on Ca II K1 line images from the Meudon Observatory, France. Thetechnique applies image standardisation procedures for elimination of limb darkening and non-circular image shape. To find sunspot candidates edge-detection methods are applied followedby local thresholding producing initial over segmentation of images that is remedied with amedian filter. By using morphological closing operations the features are smoothed and filledby applying watershed, followed by dilation operator to define regions of interest containingsunspots. Each region is examined separately and the values of thresholds used to definesunspot boundaries are determined as a function of the full disk quiet Sun intensity value and

local statistical properties of the region, such as: mean intensity and standard deviation. Thedetected sunspots are accepted if they have strong magnetic field presence in the SOHO/MDImagnetogram data. A number of physical and geometrical parameters of detected sunspotfeatures are extracted and stored in a relational database along with umbra/penumbrainformation in the form of pixel run-length data within a bounding rectangle. The detectionresults reveal very good agreement with the manual synoptic maps from Meudon (France) andmanual drawings from Locarno (Switzerland). The temporal variations of sunspot areas detectedfrom white light images in 2003 using the presented technique showed a very high correlation(96%) with those produced manually at the National Oceanic and Atmospheric Administration(NOAA), USA. Reliable automated detection of solar features like sunspots is very importantfor the data mining of increasing volumes of datasets and for the reliable forecast of solaractivity and space weather.

Keywords: Digital Solar Image, Sunspots, Local Threshold, Edge Detection, MorphologicalOperators, Sunspot Area Time Series

1. Introduction

Sunspot identification and characterisation including location, lifetime, contrast, etc. arerequired for a quantitative study of the solar cycle. Sunspot studies also play an essential part inthe modelling of the total solar irradiance during the solar cycle . As a component of SolarActive Regions, sunspots and their behaviour are also used in the study of Active Regionevolution and in the forecast of solar flare activity (Steinegger et al. [1]).

Manual sunspot catalogues in different formats are produced at various locations allover the world such as the Meudon Observatory (France), the Locarno Solar Observatory(Switzerland), the Mount Wilson Observatory (USA) and many others. The Zurich relativesunspot numbers (or since 1981 Sunspot Index Data (SIDC)), compiled from these manualcatalogues, are used as a primary indicator of solar activity (Hoyt and Schatten [2, 3] andTemmer, Veronig, and Hanslmeier [4]).

With the substantial increase in the size of solar image data archives, the automateddetection and verification of various features of interest is becoming increasingly important for,among other applications, data mining and the reliable forecast of solar activity and spaceweather. This imposes stringent requirements on the accuracy of the automate feature detectionand verification procedures in comparison with the existing manual ones in order to create afully automated Solar Feature Catalogue.

A sunspot is a dark cooler part of the Sun’s surface. It is cooler than the surroundingatmosphere because of the presence of a strong magnetic field that inhibits the transport of heatvia convective motion in the Sun. The magnetic field is formed below the Sun's surface, and



2

extends out into the solar corona. Sunspots are best observed in the visible continuous spectrumalso known as ‘white light’ (WL thereafter). Larger sunspots can also be observed in CaII K1absorption line images as well as in and CaII K3 absorption line images. Sunspots generallyconsist of two parts: a darker, often circular central umbra, and a lighter outer penumbra (seeFigure 1).

Figure 1. A segment of a high-resolution SOHO/MDI image showing sunspots withdark umbras and lighter penumbras on a grey quiet Sun background.

In white light and CaII K1 line digital images sunspots can be characterised by thefollowing two properties: they are considerably darker than the surrounding photosphere andthey have well-defined borders, i.e. the change in intensity from the quiet photosphere near thesunspot to the sunspot itself occurs over a distance no more than 2-4 arcseconds (or 1-2 pixelsfor the data used in this study). Most existing techniques for sunspot detection rely on theseproperties by using thresholding and/or edge-detection operators.

The first thresholding methods for the extraction of sunspot areas used an a prioriestimated intensity threshold on white-light full-disk solar images [5-7]. Sunspots were definedas features with intensity 15% [6] or 8.5% [7, 8] below the quiet Sun background and sunspotareas were estimated by simply counting all pixels below these values. Similar methods wereapplied to high-resolution images of solar disk regions containing a sunspot or a group of sunspots using constant intensity boundaries for the umbra–penumbra and thepenumbra–photosphere transitions at 59% and 85% of the photospheric intensity, respectively

[8, 9]. For digital solar images the thresholding methods were improved by using imagehistograms to help determine the threshold levels. Steinegger et al. [1] used the so-calleddifference histogram method to determine the intensity boundary between the penumbra and thephotosphere that was defined for each individual spot. Another method, based on the cumulativesum of sunspot area contained in the successive brightness bins of the histogram [10], wasapplied to determine the umbral areas of sunspots observed in high resolution images capturinga fragment of the solar disk [10, 11]. A method using sunspot contrast and contiguity, and basedon a region growing technique was developed and described by Preminger et al. in [12].

There is also a Bayesian technique for active region and sunspot detection and labellingdeveloped by Turmon et al. [13] that is rather computationally expensive. Moreover, the methodis more oriented toward faculae detection and does not detect sunspot umbras. Although the

algorithm performs well when trained appropriately, the training process itself can be ratherdifficult to arrange on images with different background variations corresponding to varyingobserving conditions.

Another approach to sunspot area measurements utilizing edge detection and boundarygradient intensity was suggested for high-resolution observations of individual sunspot groups,and/or non full disk segments by [14]. The method is very accurate when applied to datawith sufficiently high resolution. However, in its current form, this method is not suitable forthe automated sunspot detection on full disk images of the low and moderate resolutions that areavailable in most archives. Therefore, all the existing techniques described above in theiroriginal form are not suitable for the automatic detection and identification of sunspots on fulldisk images, since their performance depends on the images with high resolution [14] and/orquality [12] that cannot be guaranteed for full disk images.

In the current paper a new hybrid technique for automatic identification of sunspots onfull disk images using edge detection is proposed, which is significantly improved by usingimage standardization and enhancement procedures. The techniques presented are used for the



3

detection of sunspots on white light and Ca II K1 line full disk images, extracting sunspotsizes, locations, umbra and penumbra areas and intensities with high accuracy restricted onlyby pixel resolution. The techniques can provide fast automated data processing online fromground-based and space based instruments. The techniques applied for image pre-processingand sunspot detection on white light and Ca II KI images are described in Section 2, theverification of detected features is presented in Section 3 and the conclusions are drawn inSection 4.

2. The technique for sunspot detection

2.1 Observations and pre-processing techniques

2.1.1 Observations and their synchronisation

The following two sets of solar full disk images, provided in the Flexible ImageTransport System (FITS) file format (http://fits.gsfc.nasa.gov/), were used for this study: thefirst was supplied by the Meudon Observatory, and the second was obtained from the MDIinstrument aboard the SOHO satellite. Both sets cover the time period spanning April 1 to 30,

2002 and July 1 to 31, 2002, while the SOHO/MDI data were processed for the 8 year periodfrom 1996 to 2003.

The CaII K1 line spectro-heliograms from Meudon provide images of the solarphotosphere in the blue wing of the CaII K 3934 Å line, or K1 line taken at a given time TCa of solar rotation. These data are acquired once a day on film by performing scans of the solar diskusing an entrance slit. The film image is then digitised, providing a pixel size of about 2.3arcseconds. The other set of data used, from the SOHO/MDI instrument, provides almostcontinuous observations of the Sun in the white light continuum in the vicinity of the Ni I 6768 line with a pixel size of about 2 arc seconds taken at the time Twl. Intensities of all pixelsoutside the solar disk in the SOHO/MDI WL images are set to zero.

These images were, in addition, complemented by magnetic field measurements fromthe line-of-sight (LOS) magnetograms captured by the same SOHO/MDI instrument at themoment TM, keeping the data consistent with the WL images. A magnetogram is an imageobtained by an instrument, which can detect the strength and location of the magnetic fieldsfrom the Zeeman polarisation of the radiation in this field. In the magnetogram shown in Figure2, grey areas indicate low magnetic field regions, while black and white areas indicate regionswhere there are strong negative and positive magnetic fields, respectively.

Figure 2. A sample magnetogram fragment from SOHO/MDI . The darkest areas are regions of negative magnetic polarity (directed toward the centre of the Sun) and the white areas areregions of positive magnetic polarity (directed towards the observer). The grey areas indicateregions of weak magnetic field.

For the determination of the magnetic field inside detected sunspot areas the white lightimages and magnetograms were synchronised as follows. We rotate a solar magnetogram imageto the time, Twl, and point of view corresponding to a WL image using standard IDL SolarSoftlibraries to allow pixel-by-pixel comparison of both images. Using sunspot detection results,defined on WL images, as masks applied to corresponding synchronised magnetograms allows

us to extract pixel values from these areas in magnetic field units calibrated by the SOHO/MDIteam (Scherrer et al. [15]).



4

2.1.2 Pre-processing technique

The images from both data sets were pre-processed, with FITS file header informationchecked and amended where necessary using the techniques described by Zharkova et al. [16].These techniques include limb fitting; removal of geometrical distortion; centre position andsize standardisation.

The limb fitting method has three stages: (1) computing an initial approximation of thedisk centre and radius; (2) using edge-detection to provide candidate points for fitting an ellipseusing information from the initial estimate; (3) fitting an ellipse to the candidate limb pointsusing a least squares approach to iteratively remove outlying points. The procedure starts bymaking an initial estimate of the solar centre and radius from image data thresholded at anintensity obtained from an analysis of the image histogram, then smoothes the result by usingGaussian smoothing kernel of size 5×5 that is recommended by the MDI team (Scherrer et al.[15]) as the first stage of applying Canny edge-detection routine (Canny [17]) to the original 12-bit data. Candidate edge points for the limb are selected using a radial histogram method basedon the initial centre estimate and the chosen points fitted to a quadratic function by minimisingthe algebraic distance using Singular Value Decomposition. The five parameters of the ellipse

fitting the limb are extracted from the quadratic function. These parameters are used to define anaffine transformation that converts the image shape into a circle. Transformed images aregenerated using bilinear interpolation.

Often solar images require intensity renormalisation because of radial limb darkening[18] caused by the radiation projection from the spherical atmosphere onto a flat solar imagethat increases the radiation’s optical depth towards the limb and results in pixel darkening. Thisis achieved by fitting a background function to a set of radial sample points having medianradial intensities. The median filtering of the radial intensity starts by transformation of astandardised solar disk onto a rectangular image using a Cartesian to polar co-ordinatestransformation. The median value of each row is used to replace all the intensities in each row.The median transformation is a very effective way of removing artefacts often present in theimages taken from ground-based observatories. However, the presence of non-radial

illumination effects in an image such as stripes and lines caused by dust present at the spectralslit may cause larger than sunspot length variations of the background intensity along each rowof fixed radius and then the median of the row is no longer an appropriate background estimatebut would require a sophisticated segmentation procedure [16]. Such a segmentation procedureis not implemented yet, so these images are automatically disregarded by the software if suchnon-radial variations are too severe. By removing the limb darkening one obtains a “flat”,sometimes called “contrast”, image [12] of the solar photosphere using the procedure describedin the first paragraph above (see Zharkova et al. [16]).

In order to compare the image quality in both data sets, we have used three basicstatistical moments of the digital image data values taken from the image headers (SOHO/MDI)[15] or generated from images directly (Meudon). These include mean (formula 1), variance(formula 2, not plotted here), skewness (the lack of symmetry of pixel values towards the

central pixel, formula 3) and kurtosis (a measure of whether the data are peaked or flat relativeto a normal distribution towards the central pixel, formula 4) which were calculated for the fulldisk pixel data x j ,(j=1,N) as follows:

Mean = ∑−

=

=1

0

1 N

j

j x N

x , (1)

Variance = ( )∑−

=

−−

1

0

2

11 N

j

j x x

N , (2)

Skewness =

31

0

1 ∑−

=

− N

j

j

Variance

x x

N , (3)

Kurtosis = 31 41

0

−

−∑−

=

N

j

j

Variance

x x

N . (4)



5

Figure 3. Full disk data statistics (Mean, Skewness and Kurtosis) presented for theSOHO/MDI continuum (the 3 top plots) and Meudon Observatory Ca II k1 line (3 bottom plots)datasets covering February to May 2002. Peaks and dips in the SOHO/MDI plots correspond todefective data (images) that were automatically rejected by our software.

The results of the comparison between the Ca II K1 and MDI WL datasets for theperiod February to May 2002 are presented in Figure 3, where the three upper plots are from theMDI data and the three bottom plots are deduced from the Meudon data. In each set the upper



6

plot presents the mean, the middle one presents the skewness and the bottom one presents thekurtosis calculated for every daily image.

The general quality of Meudon images is highly dependent on atmospheric conditions atthe time of the observation. A number of instrumental artefacts which are difficult to eliminate,such as dust lines, are often present in these images, thus making image unsuitable forautomated detection. Together, atmospheric conditions and instrumental artefacts produce thevariations shown in the bottom three plots of Figure 3. The SOHO satellite data is not subject to

clouds as is demonstrated in Figure 3 (3 upper plots), though there are dropouts from time totime due to spacecraft problems. Hence, the pre-processing for SOHO images consisted of limbdarkening removal only while for Meudon images it included noise filtering with median or/andGaussian filters. Hence, the pre-processed images containing quiet Sun pixels with darker and,possibly (for the images in Ca II K1 line) brighter features superimposed, which are suitable forsunspot detection.

For a full-disk solar image free of the limb-darkening, the quiet Sun intensity value isestablished from an image histogram as the intensity with the highest pixel count (see forexample Figure 4a and 4b). Thus, in a manner similar to [1], by analysing the histogram of the“flat” image, an average quiet Sun intensity,

QSun I , can be determined.

2.2 Description of the Technique

2.2.1 Automatic Detection on the SOHO/MDI white light images

The technique developed for the SOHO/MDI data relies on the good quality of theimages evident from the top three graphs in Figure 3. This allows a number of parameters,including threshold values as percentages of the quiet Sun intensity, to be set constant for thewhole data set. Since sunspots are characterized by strong magnetic field, the synchronisedmagnetogram data is then used for sunspot verification by checking the magnetic flux at theidentified feature location.

Basic (binary) morphological operators such as dilation, closing and watershed [19, 20]are used in our detection code. Binary morphological dilation also known as Minkowskiaddition is defined as

{ } U B x

x x A A B x B A

∈

=∅≠∩=⊕ )(: ,

where A is the signal or image being operated on and B is called the “Structuring Element”. Thisequation simply means that B is moved over A and the intersection of B reflected and translatedwith A is found. Dilation using disk structuring elements corresponds to isotropic swelling orexpansion algorithms common to binary image processing.

Binary morphological erosion also known as Minkowski subtraction is defined as

A { } I B x

x x A A B x B∈

=∅≠⊆= )(: .

The equation simply means that erosion of A by B is the set of points x such that B translated by

x is contained in A. When the structuring element contains the origin, Erosion can be seen as ashrinking of the original image.Morphological closing us defined as dilation followed by erosion. Morphological

closing is an idempotent operator. Closing an image with a disk structuring element eliminatessmall holes, fills gaps on the contours and fuses narrow breaks and long, thin gulfs.

The morphological watershed operator segments images into watershed regions andtheir boundaries. Considering the gray scale image as a surface, each local minimum can bethought of as the point to which water falling on the surrounding region drains. The boundariesof the watersheds lie on the tops of the ridges. This operator labels each watershed region with aunique index, and sets the boundaries to zero. We apply the watershed operator provided in theIDL library by Research Systemc Inc. to binary image where it floods enclosed boundaries andthus is used in a filling algorithm. For a detailed discussion of mathematical morphology see the

refernces within the text and numerous books on Digital Imaging.The detection code is applied to a “flattened“ full disk SOHO/MDI continuum image,

∆ , (Figure 5a), with estimated quiet Sun intensity, QSun I (Figure 4b), image size, solar disk



7

centre pixel coordinates, disk radius, date of observation, and resolution (in arc seconds perpixel). Because of the Sun’s rotation around its axis, a SOHO/MDI magnetogram, ,Μ taken atthe time TM, is synchronized to the continuum image time TWL via a spatial displacement of thepixels to the position they had at the time TWL in order to obtain the same point of view as thosefor the continuum.

The technique presented in the current paper uses edge detection with threshold applied

on the gradient image. This technique is significantly less sensitive to noise than the global

threshold since it uses the background intensity in the vicinity of a sunspot. We considersunspots as connected features characterized by strong edges, lower than surrounding quiet Sunintensity and strong magnetic field. Sunspot properties vary over the solar disk, so a two stageprocedure is adopted. First, sunspot candidate regions are defined. Second, these are analyzedon the basis of their local properties to determine sunspot umbra and penumbra regions. This isfollowed by verification using magnetic information. A detailed description of the procedure isprovided in the pseudo-code below.

Sunspot candidate regions are determined by combining two approaches: edge-detection and low intensity region detection (steps 1-3). First, we obtain a gradient grey-levelimage, ∆ p, from the original pre-processed image, ∆ (Figure 5a) by applying Gaussiansmoothing with a sliding window (5x5) followed by Sobel gradient operator (step 1). Then (step

2) we locate strong edges via iterative thresholding of the gradient image starting from initialthreshold, 0T , whose value is not critical but should be small. The threshold is applied followed

by 5×5 median filter and the number of connected components, c N , and the ratio of the number

edge pixel to the total number of disk pixels, R , are determined. If the ratio is too large or thenumber of components is greater than 250, the threshold is incremented. The number 250 is

(a) (b)

Figure 4. Flat image histograms for the Meudon Observatory CaII K1 line (a) and SOHO/MDIwhite light images (b).

based on the available recorded maximum number of sunspots which is around 1701. Since atthis stage of the detection we are dealing with noise and the possibility of several features joinedinto a single candidate region, this limit is increased 250 to ensure that no sunspots areexcluded. The presence of noise and fine structures in the original “flat” image will contributemany low gradient value pixels resulting in just a few very large connected regions, if thethreshold is too close to zero. Imposing an upper limit of 0.7 on the ratio of numbers of edgepixels to disk pixels excludes this situation. A lower value increment in the iterativethresholding loop ensures better accuracy, at the cost of computational time. The increment

value can also be set as a function of the intermediate values of c N and R . The resulting binary

1 The number varies depending on the observer.



8

The pseudo-code describing the sunspot detection algorithm in SOHO/MDI white light images.

1. Apply Gaussian smoothing with sliding window 5x5 followed by the Sobel operator to a

copy of ∆ ;

2. Using the initial threshold value, 0T , threshold the edge map and apply the median 5x5

filter to the result. Count the number of connected components,c N , and the ratio of the

number edge pixel to the total number of disk pixels, R . (Feature Candidates, Figure 5b

and 6b). If c N is greater than 250 or R is larger than 0.7, increase 0T by set value (1 or

larger depending onc N and R ) and repeat step 3 from the beginning.

3. Similarly, iteratively threshold original flat image to define less then 100 dark regions.Combine (using OR operator) two binary images into one binary Feature Candidate Map(Figure 5c).

4. Remove the edge corresponding to the limb from Candidate Map and fill the possible gapsin the feature outlines using IDL's morphological closure and watershed operators (Figures5d and 6c).

5. Use blob colouring to define a region of interest,i

F , as a set of pixels representing a

connected component on the resulting binary image, ∆Β6. Create an empty Sunspot Candidate Map, ∆Β , a byte mask which will contain the detection

results with pixels belonging to umbra marked as 2, penumbra as 1.

7. For everyiF extract a cropped image containing

iF and define Ts and Tu :

i. if iF =< 5 pixels assign the thresholds:

for penumbra Ts = 0 .9 1 QSun I ; for umbra Tu = 0 .6 QSun I

ii. if i

F > 5 pixels assign the thresholds:

for penumbra: Ts = m a x { 0 .9 3 QSun I ; (< F i> - 0 .5 * ∆ F i )} ;

for umbra: Tu = m a x { 0 .5 5 QSun I ; (< F i> - ∆ F i )},

where < F i> is a mean intensity and ∆ F i a standard deviation foriF .

8. Threshold a cropped image at this value to define the candidate umbral and penumbral

pixels and insert the results back into ∆Β (Figures 5e and 6d). Use blob colouring to define

a candidate sunspot,iS as a set of pixels representing a connected component in ∆Β .

9. To verify the detection results, cross check ∆Β with the synchronised magnetogram, ,Μ as

follows:

for every sunspot candidatei

S of ∆Β extract

)|)(max()(max ii S p pS B ∈Μ=)|)(min()(min ii

S p pS B ∈Μ=if 100)))(()),((max( minmax <ii S BabsS Babs then disregard

iS as noise.

10. For eachi

S extract and store the following parameters: gravity center coordinates

(Carrington and projective), area, diameter, umbra size, number of umbras detected,maximum-minimum-mean photometric intensity (as related to flattened image), maximum-minimum magnetic flux, total magnetic flux and total umbral flux.

image (Figure 5b) contains complete and incomplete sunspot boundaries as well as noise andother features such as the solar limb. Similarly, the original “flat” image is iteratively thresholded to define dark regions (step

3). The resulting binary image contains fragments of sunspot regions and noise. The two binaryimages are combined using the logical OR operator (Figure 5c). The image will contain featureboundaries and blobs corresponding to the areas of high gradient and/or low intensity as well asthe limb edge. After removing the limb boundary, a 7x7 morphological closure operator [19, 20]



9

is applied to close incomplete boundaries. Closed boundaries are then filled by applying a fillingalgorithm based on the IDL watershed function [21] to the binary image. A 7x7 dilationoperator is then applied to define the regions of interest which possibly contain sunspots (step 4,Figure 5d, regions masked in dark grey).

In the second stage of detection, these regions are uniquely labeled using blob colouringalgorithm [22] (step 5) and individually analyzed (steps 6-7, Figure 5e). Penumbra and umbraboundaries are determined by thresholding at values

u

T ands

T which are functions of the

region’s statistical properties and quiet Sun intensity defined in step 7. Practically, the formulaefor determining

uT andsT (step 7), including the quiet Sun intensity coefficients, 0.91, 0.93,

0.6, 0.55, are determined by applying the algorithm to a training set of about 200 SOHO/MDIWL images. Since smaller regions of interest (step 7i) carry less statistical intensity informationlower

sT value reduces the probability of false identification. In order to apply this sunspot

detection technique to datasets from other instruments, the values of the constants appearing inthe formulas for

uT and

sT should be determined for each dataset. As mentioned in the

Introduction, other authors [6-9] apply global thresholds at the different values 0.85 or 0.925 for

sT and 0.59 for uT .

In the final stage of the algorithm (steps 8-9) we verify the resulting sunspot candidatesby determining the maximum magnetic field within the candidate region (Figure 5f). Thisinformation is extracted from synchronized magnetogram Μ , as described in step 9 when wecan verify a sunspot candidate as a sunspot, if this magnetic field is higher than the magneticfield threshold. The latter is chosen to be equal to 100 Gauss, that is appropriate for smallestsunspots, or pores, [18] and is a factor 5 higher than the noise in magnetic field measurementsby the MDI instrument [15].

The method works particularly well for larger features. It also detects a number of smaller features (under 5 pixels) for which there is often not enough information in thecontinuum image to make a decision whether the detected feature corresponds to a true featureor an artifact (Figure 5e). This detection is verified with great accuracy by using the magneticfield information, extracted from the synchronized magnetograms (Figure 5f). By comparing

Figures 5e and 5f we can see that the false detection (marked by a circle) between the two topand bottom sunspot groups has been remedied. Detected feature parameters are extracted andstored in the ASCII format in the final step 10.

2.2.2. Technique modifications for CaII K1 images

The technique described above can also be also applied to CaII k1 images with thefollowing modifications . First, these images contain substantial noise and distortions owing toinstrumental and atmospheric conditions, so their quality is much lower than the SOHO/MDIwhite light data (see Figure 3, bottom three plots). Hence, the threshold in step 7 (i.e. item 7 inthe pseudo code) has to be set lower, i.e. Ts = max { 0.91 QSun I ; (<F i> - 0.25* F i )} for

penumbra and Tu = max { 0.55 QSun I ; (<F i> - F i)}, for umbra, where F i is the meanabsolute deviation of the region of interest F i.

The examples of sunspot detection with this technique applied to a Ca II K1 imagetaken on 2/04/02 are presented in Figure 6. First, the full disk solar image is pre-processed(Figure 6a) by correcting, if necessary, the shape of the disk to a circular one (via automatedlimb ellipse-fitting) and by removing the limb-darkening as described in Section 2.1.2. ThenSobel edge detection (similar results were also achieved with morphological gradient operation[19] defined as the result of subtracting an eroded version of the original image from a dilatedversion of the original image) is applied to the pre- processed image, followed by thresholdingin order to detect strong edges (Figure 6b). This over segments the image; and then a 5 × 5median filter and an 8 × 8 morphological closing filter [19, 20] are applied to remove noise, to

smooth the edges and to fill in small holes in edges. After removing the limb edge, the



10

Figure 5. A sample of the sunspot detection technique applied to the WL disk

SOHO/MDI image from 19th

April, 2002 (presented for a cropped fragment for betterresolution): (a) a part of the original image; (b) the detected edges; (c) candidate map; (d) theregions of interests after filtering as masks on original; (e) the detection results beforemagnetogram verification, false identification is circled; (f) the final detection results.



11

watershed transform [21] is applied to the thresholded binary image in order to fill-in closedsunspot boundaries. Regions of interest are defined, similar to the WL case, via morphologicalclosure and dilation [19,20]. Candidate sunspot features are then detected by local thresholdingusing threshold values in the previous paragraph. The candidate features’ statistical propertiessuch as size, principal component coefficients and eigenvalues, intensity mean and meanabsolute deviation are then used to aid the removal of false identifications such as the artefactsand lines, often present in the Meudon Observatory images (Figures 6d and 6e).

It can be seen that on the Ca K1 image shown in Figure 6 the technique performs as well as onthe white light data (Figure 5). However, in many other Ca images the technique can stillproduce a relatively large number of false identifications for smaller features under 10 pixelwhere there is not enough statistical information to differentiate between the noise and sunspots.This raises the problem of verification of the detected features that is discussed below.

2.3 Verification and Accuracy

There are two possible means of verification. The first option assumes the existence of atested well-established source of the target data that is used for a straight-forward comparisonwith the automated detection results. In our case, such data would be the records (sunspot

drawings) produced by a trained observer. However, the number (and geometry) of visible/detectable sunspots depend on the time (date) of observation (sunspot lifetime can beless than an hour), location of the observer, wavelength and resolution (in case of digitalimaging). Therefore, this method works best when the input data for both detection methods isthe same. Otherwise, a number of differences can appear naturally when comparing the twomethods.

The second option is comparing two different datasets describing the same sunspotstaken close together in time by different instruments, by extracting from each dataset a carefullychosen invariant parameter (or set of parameters), such as Sunspot Area, and looking at itscorrelation. For our technique verification both methods were applied and the outcome ispresented below.

2.3.1 Verification with drawings and synoptic maps

The verification of the automated sunspot detection results started by comparison withthe sunspot database produced manually at the Meudon Observatory and published as synopticmaps in ASCII format. The comparison is shown in Table 1. The two cases presented in Table 1correspond to two ways of accepting/rejecting detected features. In general, by consideringfeature size, shape (ie principal components), mean intensity, variance, quiet sun intensity andproximity to other features, one can decide whether the result is likely to be a true feature. InTable 1, case 1, we have included features with sizes over 5 pixels, mean intensities less thanthe quiet Sun’s, mean absolute deviations exceeding 20 (which is about 5 % of the quiet Sunintensity), principal component ratios less than 2.1. In case 2, we include practically all detectedcandidate features by setting the deviation and principal component ratio thresholds to 0.05.

The differences between the manual and automatic methods are expressed bycalculating the False Acceptance Rate (FAR) (where we detect a feature and they do not) andthe False Rejection Rate (FRR) (where they detect a feature and we do not). FAR and FRR werecalculated for the available observations for the two different classifier settings described in theprevious paragraph. The FAR is lowest for the classifier case 1 and does not exceed 8.8% of thetotal sunspot number detected on a day. By contrast, FRR is lowest for the classifier case 2 anddoes not exceed 15.2 % of the total sunspot number.

The error rates in the table 1 are the consequences of several factors related to the imagequality. First, different seeing conditions can adversely influence automated recognition results;for example, a cloud can obstruct a significant portion of the disk, thus greatly reducing thequality of that segment of the image, making it difficult to detect the finer details of that part of the solar photosphere. Second, some small (less than 5-8 pixels) dust lines and image artefactscan be virtually indistinguishable from smaller sunspots leading to false identifications.

Also, in order to interpret the data presented in Table 1, the following points have to beclarified. Sunspot counting methods are different for different observatories. For example, a



12

single large sunspot with one umbra is counted as a single feature at Meudon, but can becounted as 3 or more sunspots (depending on the sunspot group configuration) at the LocarnoObservatory. Similarly, there are differences between the Meudon approach and our approach.For example, a large sunspot with several umbras is counted as one feature by us, but can becounted as several features by the Meudon observer. Furthermore, interpretation of sunspot dataat Meudon is influenced by the knowledge of earlier data and can sometimes be revised in thelight of the subsequent observations. Hence, for the instance, on April 2, 2002 there are 20

sunspots detected at Meudon Observatory. The automated detection applied to the same image

Figure 6. Sunspot Detection on the CaII K1 line full disk image obtained from MeudonObservatory on 01/04/02 : (a) The original image cleaned; (b) The detected edges; (c) Thefound regions (dilated); (d) The final detection results superimposed on original image; e) Theextract of a single sunspot group from (d).

a b

c d

e



13

yielded 18 sunspots corresponding to 17 of the Meudon sunspots with one of the Meudonsunspots detected as two. Thus, in this case FAR is zero, and FRR is 3.

Currently, our automated detection approach is based on extracting all the availableinformation from a single observation and storing this information digitally. Further analysisand classification of the archive data is in progress that will allow us to produce sunspotnumbers identical to the existing spot counting techniques.

Table 1. The accuracy of sunspot detection and classification for CaII K1 line observations incomparison with the manual set obtained from Meudon Observatory (see the text for descriptionof defined cases 1 and 2).

Date#Spots,Meudon

#Spots,case 1

FAR,case 1

FRR,case 1

#Spots,case 2

FAR,case 2

FRR,case 2

01-Apr-02 16 17 2 1 19 4 102-Apr-02 20 18 0 3 18 0 3

03-Apr-02 14 13 0 3 24 10 2

04-Apr-02 13 15 2 2 20 6 1

05-Apr-02 16 18 0 2 18 1 206-Apr-02 10 10 0 5 15 5 5

07-Apr-02 11 9 1 5 13 4 4

08-Apr-02 14 17 3 2 22 7 0

09-Apr-02 16 17 0 2 17 0 2

10-Apr-02 12 12 0 4 14 1 3

11-Apr-02 12 9 0 7 10 1 7

12-Apr-02 18 20 2 0 21 3 0

14-Apr-02 20 23 2 2 34 13 2

15-Apr-02 13 16 1 4 18 2 3

16-Apr-02 10 13 3 1 19 9 117-Apr-02 11 11 1 1 13 2 0

18-Apr-02 12 11 1 1 12 1 0

19-Apr-02 11 14 0 0 15 1 0

20-Apr-02 13 10 0 2 11 1 2

21-Apr-02 9 8 1 1 15 7 0

22-Apr-02 12 13 1 0 15 3 0

23-Apr-02 14 13 0 1 15 1 0

24-Apr-02 18 15 0 3 17 0 1

25-Apr-02 17 13 0 3 17 2 1

27-Apr-02 9 7 0 1 9 2 1

28-Apr-02 9 10 1 0 11 2 029-Apr-02 8 12 5 0 20 13 0

For the verification of sunspot detection on the SOHO/MDI images, which have betterquality (see Figure 3 three upper plots), less noise and better time coverage (4 per day) we usedthe daily sunspot drawings produced manually since 1965 at the Locarno Observatory inSwitzerland. The Locarno manual drawings are produced in accordance with the techniquedeveloped by M. Waldmeier [23] and the results are stored in the Solar Index Data Catalogue(SIDC) at the Royal Belgian Observatory in Brussels [24]. While the Locarno observations are

subject to seeing conditions, this is counterbalanced by the higher resolution of liveobservations (about 1 arcsecond). We have compared the results of our automated detection inwhite-light images with the available drawings in Locarno for June-July 2002, as well as forJanuary- February 2004 along with a number of random dates in 2002 and 2003 (about 100



14

daily drawings with up to 100 sunspots per drawing). The comparison has shown a goodagreement (~95 to 98%). The automated method detects all sunspots visually observable in theSOHO/MDI WL observations. The discrepancies are found at the level of sunspot pores(smaller structures), and can be explained by the time difference between the observations andby the lower resolution of the SOHO/MDI images.

2.3.2 Verification with the NOAA dataset

Figure 7. A comparison of temporal variations of daily sunspot areas extracted in 2003from the USAF/ NOAA archive (US) and from the Solar Feature Catalogue with the presentedtechnique.

Comparison of temporal variations of the daily sunspot areas extracted from the EGSO SolarFeature Catalogue in 2003 presented in Figure 7 (bottom plot) with those available as ASCIIfiles obtained from the drawings of about 365 daily images obtained in 2003 at the US AirForce / NOAA (taken from National Observatory for Astronomy and Astrophysics NationalGeophysical Data Centre, US [25]), revealed a correlation coefficient of 96 percent (Figure 7,upper plot). This is a very high accuracy of detection that ensures a good quality of extracted

parameters within the resolution limits defined by a particular instrument.Further verification of sunspot detection in WL images can be obtained by comparing

sunspot area statistics with solar activity index such as sunspot numbers. Sunspot numbers aregenerated manually from sunspot drawings [24] and are related to the number of sunspotgroups. The first attempt to compare the sunspot area statistics detected by us with the sunspotnumber revealed a correlation of up to 86% (Zharkov and Zharkova [26]). For accuratecomparison classification of sunspots into groups is required. Manually this is done usingsunspot magnetic field polarity tags and the property that neighbouring sunspots with oppositemagnetic field polarities are paired into groups. Implementation of the automated classificationof sunspots into groups is the scope of a future paper.



15

3. The conclusions

New improved techniques are presented for automated sunspot detection on full disksolar images obtained in the Ca II K1 line from the Meudon Observatory (~300 images) and inWL from the MDI instrument aboard the SOHO satellite (10,082 images).

The technique applies automated image cleaning procedures for elimination of limbdarkening and non-circular image shape. Edge-detection methods and global thresholdingmethods are used to produce initial image segmentation. The resulting over-segmentation isremedied using a median filter followed by morphological closing operations to closeboundaries. A watershed region filling operation, 7x7 morphological closing and dilationoperators are used to define the regions of interest possibly containing sunspots. Each region isthen examined separately and the values of thresholds used to define sunspot umbra andpenumbra boundaries are determined as a function of the full disk quiet Sun intensity value andstatistical properties of the region (such as a mean intensity, standard deviation and absolutemean deviation). The sunspots detected in WL are verified using the SOHO/MDI magnetogramdata.

The detection results for the selected months in 2002-2003 show a good agreement with

the Meudon manual synoptic maps and a very good agreement with the Locarno manualdrawings (95-98%). The temporal variations of sunspot areas detected in 2003 with thepresented technique from white light images revealed a very high correlation (96%) with thoseproduced manually at the National Observatory for Astronomy and Astrophysics (NOAA), US.

A number of physical and geometrical parameters of sunspot features are extracted andstored in the relational database along with run-length encoded umbra and penumbra regionswithin the bounding rectangle for each sunspot. The database is accessible via web-services andhttp://solar.inf.brad.ac.uk website.

In order to significantly reduce the errors contained in acceptance and rejection ratecoefficients FARs and FRRs and validate the detection with the existing activity index [25], theSunspot Candidate classification into groups has to be implemented. This can be done byexamining the adjacent observations, their magnetic polarity and values and is the scope of a

forthcoming paper.

4. Acknowledgements

The research has been done for European Grid of Solar Observations (EGSO) fundedby the European Commission within the IST Fifth Framework, grant IST-2001-32409.

5. References

[1] Steinegger, M., Vazquez, M., Bonet, J.A. and Brandt, P.N. On the energy balance of SolarActive Region, 1996, Ap.J., 461, 478.[2] Hoyt, D.V., Schatten, K.H. Group Sunspot Numbers: A New Solar Activity Reconstruction,

1998a, Solar Phys., 179, 189-219[3] Hoyt, D.V., Schatten, K.H. Group Sunspot Numbers: A New Solar Activity Reconstruction,1998b, Solar Phys., 181, 491-512[4] Temmer, M., Veronig, A., Hanslmeier, A. Hemispheric Sunspot Numbers Rn and Rs:Catalogue and N-S asymmetry analysis, A & A, 2002, 390, 707-715[5] Chapman, G. A. and Groisman, G. A digital analysis of sunspot areas, 1984, Solar Phys., 91:45[6] Chapman, G. A., Cookson, A. M., and Dobias, J. J., Observations of changes in thebolometric contrast of sunspots, 1994, Ap.J., 432, 403-408[7] Chapman, G. A., Cookson, A. M., and Hoyt, D. V., Solar irradiance from Nimbus-7compared with ground-based photometry, 1994, Solar Phys., 149, 249[8] Steinegger, M., Brandt, P. N., Pap, J., and Schmidt, W. Sunspot photometry and the totalsolar irradiance deficit measured in 1980 by ACRIM, 1990, Astr.Sp.Sci., 170, 127-133[9] Brandt, P. N., Schmidt, W., and Steinegger, M., On the umbra-penumbra area ratio of sunspots, 1990, Solar Phys., 129: 191

http://solar.inf.brad.ac.uk/

http://solar.inf.brad.ac.uk/



16

[10] Pettauer, T., Brandt, P.N. On novel methods to determine areas of sunspots fromphotoheliograms, 1997, Solar Phys., 175, 197[11] Steinegger, M., Bonet J., A., Vazquez, M.. Simulation of seeing influences on thephotometric determination of sunspot areas, 1997, Solar Phys., 171: 303[12] Preminger, D. G., Walton, S.R., Chapman, G.A.: Solar Feature Identification usingContrast and Contiguity. 2001, Solar Phys., 202: 53[13] Turmon, M., Pap, J. M., Mukhtar S.: Statistical Pattern Recognition for Labeling Solar

Active Regions: Application to SOHO/MDI Imagery, 2002, Ap.J., 568:396-407[14] L.: Automation of area measurement of Sunspots, 1998, Solar Phys., 180:109-130[15] Sherrer, P.H., Bogart, R. S., Bush, R. I., et al.: The Solar Oscillations Investigation -Michelson Doppler Imager, 1995, Ann. Rev. Astr.Astrophys., 2, 363[16] Zharkova, V.V., Ipson, S. S., Zharkov, S.I., Benkhalil, A., Aboudarham, J., Bentley, R.D.,A full disk image standardisation of the synoptic solar observations at the Meudon Observatory.2003, Solar Phys., 214: 89-105[17] J. Canny, A Computational Approach to Edge Detection, 1986, IEEE Transactions onPattern Analysis, 6, 679-698[18] Allen, C. W.: 1973, Astrophysical Quantities, Athlone Press, London[19] Serra, J.: Image Analysis and Mathematical Morphology, Vol. 1, 1988, Academic Press

Limited, London.[20] Matheron, G: Random Sets and Integral Geometry, 1975, John Wiley & Sons[21] Jackway, P.: Gradient Watersheds in Morphological Scale-Space, 1996, IEEE Transactions

on Image Processing, v.5, 913-921.[22] Rosenfeld, A., Pfaltz, J.L.: Sequential Operations in Digital Image Processing, 1966,J.Assoc. for Comp. Machinery, Vol. 13, , pp.471-494[23] Waldmeier, M. The sunspot activity in the years 1610-1960, Sunspot Tables, 1961,Schulthess Publisher, Zurich.[24] Solar Index Data Catalogue, Belgian Royal Observatory, http://sidc.oma.be/products/ [25] NOAA National Geophysical Data Center,

http://www.ngdc.noaa.gov/stp/SOLAR/ftpsunspotregions.html [26] Zharkov, S.I and Zharkova, V.V. Statistical properties of sunspots and their magnetic field

detected from full disk SOHO/MDI images in 1996-2003, 2005, Adv.Sp.Res., in press.

Documents

Zharkova Sunspot