SDO Progress Presentation

Page 1: SDO Progress Presentation. Agenda Benchmark dataset – Acquisition – Future additions – Class balancing and problems Image Processing – Image parameters

SDO Progress Presentation

Page 2:

Agenda

Benchmark dataset– Acquisition– Future additions– Class balancing and problems

Image Processing– Image parameters to extract– Segmentation– Extraction times– Future parameters to add

Unsupervised Attribute Evaluation– Correlation Maps– Multi Dimensional Scaling Maps

Classifiers

Page 3:

Benchmark Dataset: Creation

Searched the Heliophysics Event Knowledgebase (HEK) for solar events from 1999-11-03 to 2008-11-04

Extracted 1,600 images from the TRACE mission image repository

Page 4:

Benchmark Dataset: Class balancing

We selected 8 different phenomena (or classes) and retrieved 200 random images for each phenomenon

For phenomena with 200+ reported events we selected one random image of each event

For phenomena with fewer than 200 reported events, we selected images from the same event but taken at different times

We selected images in the 171 and 1600 wavelengths due to their similarity (all from the TRACE mission)

We re-sized all images to be 1024x1024 pixels

Page 5:

Benchmark Dataset: Problems

The HEK reports 14 different types of events (at the time of benchmark generation)

– There are not enough, or no, events reported for several phenomena
– Different phenomena are reported in different wavelengths
– Different image resolutions occur within the same phenomena
– Several phenomena are missing (Sigmoids, Polarity Inversion Line Mapping, Bright Points)

So far we have received a response from Manolis K. Georgoulis regarding images containing sigmoids

Page 6:

Benchmark Dataset: Future Additions

Add the missing 6 classes of events if we can find suitable images

Add more images to our benchmark dataset

Create different ‘benchmarks’ for other wavelengths, since images between some wavelengths are very different

Page 7:

Benchmark Dataset

Event Name           # of Images retrieved  Wavelength  Resolution (count)             HEK reported events
Active Region        200                    1600        768x768 (200)                  200+
Coronal Jet          200                    171         1024x1024 (155), 768x768 (45)  25
Emerging Flux        200                    1600        768x768 (200)                  12
Filament             200                    171         1024x1024 (127), 768x768 (73)  45
Filament Activation  200                    171         1024x1024 (200)                27
Filament Eruption    200                    171         1024x1024 (49), 768x768 (151)  53
Flare                200                    171         1024x1024 (64), 768x768 (136)  200+
Oscillation          200                    171         1024x1024 (50), 768x768 (150)  3

Page 8:

Image Processing

Our goals in selecting image processing techniques and parameters are:

– Fast extraction of parameters from images
– Helping to clearly distinguish between different phenomena (not an easy task)

Page 9:

Image Processing: Parameters to Extract

So far we have extracted the following textural parameters:

Entropy
Mean
Standard Deviation
3rd Moment (skewness)
4th Moment (kurtosis)
Relative Smoothness
Fractal Dimension
Tamura Contrast
Tamura Directionality
Histogram R
Histogram J
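Most of the first-order statistical parameters above can be computed directly from a block's pixel intensities. The sketch below is a hypothetical numpy helper, not the project's actual extraction code; the relative-smoothness normalization used here is one common convention and is an assumption.

```python
import numpy as np

def texture_params(block, bins=256):
    """Compute a few of the listed textural parameters for one image block.

    Illustrative sketch only; Fractal Dimension, the Tamura features, and
    the Histogram R/J parameters are not implemented here.
    """
    x = block.astype(float).ravel()
    mean = x.mean()
    std = x.std()
    # 3rd and 4th standardized moments (skewness, kurtosis)
    z = (x - mean) / std if std > 0 else np.zeros_like(x)
    skewness = (z ** 3).mean()
    kurtosis = (z ** 4).mean()
    # Shannon entropy of the intensity histogram
    hist, _ = np.histogram(x, bins=bins, range=(0, bins))
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -(p * np.log2(p)).sum()
    # Relative smoothness R = 1 - 1/(1 + sigma^2), with intensities
    # normalized to [0, 1] (a common convention, assumed here)
    rs = 1.0 - 1.0 / (1.0 + (std / 255.0) ** 2)
    return {"mean": mean, "std": std, "skewness": skewness,
            "kurtosis": kurtosis, "entropy": entropy, "rs": rs}
```

A constant block, for example, yields zero standard deviation and zero entropy, which is why flat "quiet Sun" regions separate easily from active ones on these parameters.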

Page 10:

Image Processing: Segmentation

Segmentation breaks each image into smaller pieces, which lets us analyze parts of the image and decide which parts are interesting.

We used grid segmentation to split the images into 128 by 128 pixel blocks, and we extracted all our parameters from each block
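The grid split itself is a cheap array operation. A minimal numpy sketch (a hypothetical helper, assuming single-channel images whose sides are exact multiples of the block size):

```python
import numpy as np

def grid_segment(image, block=128):
    """Split an image into non-overlapping block x block tiles.

    For a 1024x1024 image and 128-pixel blocks this yields an 8x8 grid,
    i.e. 64 tiles, each of which gets its own parameter vector.
    """
    h, w = image.shape
    return (image.reshape(h // block, block, w // block, block)
                 .swapaxes(1, 2)
                 .reshape(-1, block, block))

tiles = grid_segment(np.zeros((1024, 1024)))
print(tiles.shape)  # (64, 128, 128)
```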

Page 11:

Image Processing: Segmentation Example

Page 12:

Image Processing: Extraction Time

Total time for 1600 images

[Bar chart: extraction time in seconds (0–450) by grid size (4 x 4, 8 x 8, 16 x 16), broken down per parameter: Tamura Directionality, Tamura Contrast, Histogram R, Histogram J, Kurtosis, Fractal Dimension, Entropy, Standard Deviation, Skewness, RS, Uniformity, Mean]

Page 13:

Image Processing: Future Parameters to Add

Feature                                              Type
Asymmetry index                                      Asymmetry
Lengthening index                                    Asymmetry
Extent                                               Asymmetry
Middle divergence                                    Asymmetry
Compactness (irregularity) index, circularity, CIRC  Border irregularity
Regularity of contour                                Border irregularity
Edge abruptness, sharpness                           Border irregularity
Pigmentation transition                              Border irregularity
Greatest diameter                                    Others
Thinness ratio                                       Others
Lesion area and perimeter                            Others
Minimum diameter                                     Others
Transition area, transition region imbalance         Others
Background region imbalance                          Others
Peripheral dark.                                     Others
Roundness                                            Others
Fullness ratio                                       Others

Page 14:

Unsupervised Attribute Evaluation- Preliminary Results -

Page 15:

Why Correlation Analysis?

If parameters are highly correlated, we can use just one of them when tuning our classifiers.

Removing correlated parameters will shorten processing time (especially important since we plan to implement them in SDO's pipeline) and reduce future storage (the less storage used, the faster image retrieval will perform)
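Such a correlation check is a one-liner over the extracted parameter matrix. A small synthetic sketch (the feature columns and numbers below are made up purely for illustration):

```python
import numpy as np

# Hypothetical feature matrix: rows = image blocks, columns = parameters.
rng = np.random.default_rng(0)
third_moment = rng.normal(size=500)
features = np.column_stack([
    third_moment,                                           # 3rd moment
    1.9 * third_moment + rng.normal(scale=0.05, size=500),  # stand-in for a strongly correlated 4th moment
    rng.normal(size=500),                                   # an unrelated parameter
])

# Pairwise Pearson correlations between parameter columns.
corr = np.corrcoef(features, rowvar=False)
# If |r| is near 1 for a pair, one of the two columns can be dropped
# before classification with little loss of information.
print(round(corr[0, 1], 3))
```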

Page 16:

Unsupervised Attribute Evaluation: Correlation Maps Within Same Class

Page 17:

Unsupervised Attribute Evaluation: Correlation Maps Within Same Class

Page 18:

Unsupervised Attribute Evaluation: Correlation Maps Within Same Class

We can certainly reduce dimensionality in two cases: the 3rd and 4th moments, since they are strongly correlated in the majority of our correlation maps, and likewise Uniformity and RS. Any other parameters that appeared strongly correlated during our experiments (e.g. Tamura Contrast and Standard Deviation) have to be analyzed further, since their correlation is inconsistent across different phenomenon types.

We were also able to identify a few correlations that occur only within certain types of phenomena (e.g. Filament, Filament Activation, and Filament Eruption). This would allow us in the future to discern these phenomena from other ones, and to focus our parameter analysis on distinguishing among that reduced set of classes.

Page 19:

Unsupervised Attribute Evaluation: Multi-Dimensional Scaling for previously shown

correlation maps

Page 20:

Unsupervised Attribute Evaluation: Multi-Dimensional Scaling for previously shown

correlation maps

Page 21:

Classifiers

Based on our correlation analysis we selected 3 different scenarios to run our classification experiments (based on the Active Region class):

1) Original 10 image parameters

2) Removing 3 uncorrelated parameters: Fractal Dimension, Tamura Directionality, and Tamura Contrast

3) Removing 3 of the parameters that are correlated with other parameters: Standard Deviation, Uniformity and Tamura Contrast

Page 22:

Classifiers

We ran all 8 classes plus Empty Sun as a class

We used the NaiveBayes classifier with 10-fold cross-validation

Page 23:

Classifiers: Evaluation Measures

There are many different measures used to evaluate the accuracy of classifiers. Many popular ones use the Confusion Matrix as a source of information about the classifier.

Commonly used measures are Precision, Recall, and F-measure, which can be defined as (using the symbols from the Confusion Matrix above):

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-measure = 2TP / (2TP + FN + FP)
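The three formulas translate directly into code; note that the F-measure form above is algebraically the harmonic mean of Precision and Recall:

```python
def evaluation_measures(tp, fp, fn):
    """Precision, Recall, and F-measure from confusion-matrix counts.

    F-measure = 2TP / (2TP + FN + FP) equals
    2 * Precision * Recall / (Precision + Recall).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * tp / (2 * tp + fn + fp)
    return precision, recall, f_measure
```

For example, 8 true positives with 2 false positives and 2 false negatives give Precision, Recall, and F-measure of 0.8 each.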

Page 24:

Classifiers: Evaluation Measures

Other commonly used measures are ROC Curves (Receiver Operating Characteristic), which characterize the trade-off between true positive hits and false alarms.

The Area Under the ROC Curve (ROC area) is a measure of the accuracy of the model:
– a predictor model with perfect accuracy will have an area of 1.0
– the closer we are to the diagonal line (i.e., the closer the area is to 0.5), the less accurate the model is
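The ROC area can also be computed without plotting the curve, as the probability that a randomly chosen positive is scored above a randomly chosen negative. A numpy sketch (a hypothetical helper, binary labels assumed):

```python
import numpy as np

def roc_area(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) identity:
    the fraction of (positive, negative) pairs ranked correctly,
    with ties counting one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties
```

A perfect ranking gives 1.0; a constant score (no discrimination, the diagonal line) gives 0.5.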

Page 25:

Preliminary Results - Classifiers: Naïve Bayes

Naïve Bayes is a linear classifier

A linear classifier groups items that have similar feature values by making a classification decision based on the value of a linear combination of the features.
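For a concrete picture of the decision rule, a Gaussian Naive Bayes can be written in a few lines of numpy: per class it fits one mean and variance per feature, then picks the class with the highest log-posterior. This is an illustrative sketch, not the exact implementation used for the experiments:

```python
import numpy as np

class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes (illustrative sketch only)."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # Small epsilon keeps zero-variance features from dividing by zero.
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.prior = np.array([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # log P(c) + sum over features of log N(x_f; mu_cf, var_cf),
        # evaluated for every (sample, class) pair; pick the max per sample.
        ll = (np.log(self.prior)
              - 0.5 * np.sum(np.log(2 * np.pi * self.var), axis=1)
              - 0.5 * ((X[:, None, :] - self.mu) ** 2 / self.var).sum(axis=2))
        return self.classes[np.argmax(ll, axis=1)]
```

The "naive" independence assumption is what makes the per-feature terms simply add up.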

Page 26:

Preliminary Results - Classifiers: Naïve Bayes -

Using all 10 features:

Correctly Classified Instances   4421   31.6509 %
Incorrectly Classified Instances 9547   68.3491 %

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.838    0.194    0.351      0.838   0.494      0.898     ActiveRegion
0.152    0.033    0.368      0.152   0.215      0.775     CoronalJet
0.773    0.112    0.463      0.773   0.579      0.929     EmergingFlux
0.077    0.056    0.148      0.077   0.102      0.686     Empty
0.067    0.007    0.533      0.067   0.119      0.692     Filament
0.357    0.073    0.38       0.357   0.368      0.808     FilamentActivation
0.463    0.215    0.212      0.463   0.291      0.747     FilamentExplosion
0.002    0.01     0.024      0.002   0.004      0.62      Flare
0.12     0.07     0.177      0.12    0.143      0.692     Oscillation

Page 27:

Preliminary Results - Classifiers: Naïve Bayes -

Removing 3 uncorrelated parameters: Fractal Dimension, Tamura Directionality, and Tamura Contrast

Correctly Classified Instances   3993   28.5868 %
Incorrectly Classified Instances 9975   71.4132 %

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.815    0.282    0.265      0.815   0.4        0.846     ActiveRegion
0.073    0.022    0.293      0.073   0.117      0.674     CoronalJet
0.8      0.142    0.413      0.8     0.545      0.924     EmergingFlux
0.001    0.007    0.011      0.001   0.001      0.577     Empty
0.001    0        0.143      0.001   0.001      0.605     Filament
0.377    0.085    0.357      0.377   0.367      0.778     FilamentActivation
0.481    0.233    0.205      0.481   0.287      0.725     FilamentExplosion
0.003    0.01     0.039      0.003   0.006      0.557     Flare
0.022    0.021    0.116      0.022   0.037      0.664     Oscillation

Conclusion: By removing parameters “randomly” we can lower our chances of recognizing solar phenomena (we dropped from 31% to 28%).

Page 28:

Preliminary Results - Classifiers:Naïve Bayes -

Removing 3 correlated parameters: Standard Deviation, Uniformity and Tamura Contrast

Correctly Classified Instances   4641   33.2259 %
Incorrectly Classified Instances 9327   66.7741 %

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.808    0.156    0.393      0.808   0.529      0.902     ActiveRegion
0.22     0.056    0.33       0.22    0.264      0.776     CoronalJet
0.811    0.137    0.426      0.811   0.558      0.93      EmergingFlux
0.073    0.064    0.126      0.073   0.093      0.671     Empty
0.067    0.01     0.452      0.067   0.117      0.692     Filament
0.391    0.076    0.392      0.391   0.392      0.818     FilamentActivation
0.512    0.218    0.227      0.512   0.314      0.757     FilamentExplosion
0.059    0.019    0.282      0.059   0.098      0.643     Flare
0.048    0.016    0.276      0.048   0.081      0.696     Oscillation

Conclusion: Here, by removing 3 parameters (out of 10) that were highly correlated with other parameters, we reduce storage and processing time, while being able to maintain comparable accuracy (33%, vs. 31% originally).

Page 29:

Preliminary Results - Classifiers: C4.5

Based on the dimensionality and distribution of our values from our Image Parameters, we decided to investigate the results of a Decision Tree classifier, as the next step after a linear classifier.

A Decision Tree classifier has the goal of creating a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.
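C4.5 itself is usually run as J48 in Weka; as an illustrative stand-in, scikit-learn's CART-style decision tree shows the same interior-node/leaf structure and the 10-fold evaluation setup. The dataset below is synthetic, not our solar benchmark:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 10 features, several classes, as in our setup.
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
scores = cross_val_score(tree, X, y, cv=10)  # 10-fold cross-validation
print(scores.mean())  # mean accuracy across folds
```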

Page 30:

Preliminary Results - Classifiers: C4.5 -

Using all 10 features:

Correctly Classified Instances   9163   65.5999 %
Incorrectly Classified Instances 4805   34.4001 %

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.757    0.044    0.681      0.757   0.717      0.902     ActiveRegion
0.636    0.055    0.589      0.636   0.612      0.831     CoronalJet
0.799    0.023    0.812      0.799   0.805      0.919     EmergingFlux
0.668    0.032    0.725      0.668   0.696      0.908     Empty
0.624    0.048    0.62       0.624   0.622      0.816     Filament
0.765    0.033    0.746      0.765   0.755      0.902     FilamentActivation
0.484    0.056    0.518      0.484   0.5        0.785     FilamentExplosion
0.534    0.051    0.567      0.534   0.55       0.796     Flare
0.637    0.045    0.641      0.637   0.639      0.845     Oscillation

Page 31:

Preliminary Results - Classifiers: C4.5 -

Removing 3 uncorrelated parameters: Fractal Dimension, Tamura Directionality, and Tamura Contrast

Correctly Classified Instances   8278   59.264 %
Incorrectly Classified Instances 5690   40.736 %

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.653    0.067    0.549      0.653   0.596      0.841     ActiveRegion
0.587    0.061    0.546      0.587   0.566      0.821     CoronalJet
0.778    0.031    0.759      0.778   0.769      0.908     EmergingFlux
0.573    0.041    0.634      0.573   0.602      0.861     Empty
0.564    0.055    0.56       0.564   0.562      0.791     Filament
0.772    0.035    0.731      0.772   0.751      0.908     FilamentActivation
0.414    0.063    0.452      0.414   0.432      0.747     FilamentExplosion
0.424    0.058    0.478      0.424   0.449      0.732     Flare
0.568    0.046    0.605      0.568   0.586      0.83      Oscillation

Conclusion: Again, by removing parameters “randomly” we can lower our chances of recognizing solar phenomena (we dropped from 65% to 59%).

Page 32:

Preliminary Results - Classifiers: C4.5 -

Removing 3 correlated parameters: Standard Deviation, Uniformity and Tamura Contrast

Correctly Classified Instances   8876   63.5452 %
Incorrectly Classified Instances 5092   36.4548 %

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.716    0.04     0.692      0.716   0.704      0.898     ActiveRegion
0.586    0.056    0.567      0.586   0.576      0.822     CoronalJet
0.791    0.029    0.773      0.791   0.782      0.914     EmergingFlux
0.678    0.039    0.687      0.678   0.682      0.905     Empty
0.61     0.053    0.589      0.61    0.599      0.811     Filament
0.767    0.034    0.737      0.767   0.752      0.9       FilamentActivation
0.448    0.057    0.498      0.448   0.472      0.776     FilamentExplosion
0.506    0.055    0.537      0.506   0.521      0.778     Flare
0.615    0.048    0.616      0.615   0.616      0.838     Oscillation

Conclusion: And once again, by removing 3 parameters that were highly correlated with other parameters, we were able to maintain comparable accuracy (63%, vs. 65% originally – a minimal decrease).

Page 33:

Preliminary Results - Classifiers: Adaboost C4.5

Going after better results? Boosting may help

AdaBoost, short for Adaptive Boosting, is a machine learning algorithm that is adaptive in the sense that subsequent classifiers are tweaked in favor of those instances misclassified by previous classifiers

We decided to use this boosting algorithm in order to improve our classification results
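A boosted-tree run of this kind can be sketched with scikit-learn's AdaBoostClassifier (synthetic data for illustration; the base learner here defaults to a shallow decision tree rather than a full C4.5):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 10-parameter feature vectors.
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

# Each boosting round re-weights the training set toward the instances
# the previous rounds misclassified, then the weighted votes are combined.
boosted = AdaBoostClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(boosted, X, y, cv=10)  # 10-fold cross-validation
print(scores.mean())
```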

Page 34:

Preliminary Results - Classifiers: Adaboost C4.5 -

Using all 10 features:

Correctly Classified Instances   10114  72.4084 %
Incorrectly Classified Instances 3854   27.5916 %

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.781    0.032    0.755      0.781   0.768      0.964     ActiveRegion
0.715    0.042    0.681      0.715   0.697      0.943     CoronalJet
0.838    0.017    0.861      0.838   0.849      0.968     EmergingFlux
0.756    0.031    0.752      0.756   0.754      0.947     Empty
0.698    0.037    0.704      0.698   0.701      0.942     Filament
0.847    0.026    0.801      0.847   0.823      0.971     FilamentActivation
0.546    0.044    0.607      0.546   0.575      0.899     FilamentExplosion
0.617    0.04     0.656      0.617   0.636      0.911     Flare
0.719    0.041    0.685      0.719   0.701      0.945     Oscillation

Page 35:

Preliminary Results - Classifiers: Adaboost C4.5 -

Removing 3 uncorrelated parameters: Fractal Dimension, Tamura Directionality, and Tamura Contrast

Correctly Classified Instances   8920   63.8603 %
Incorrectly Classified Instances 5048   36.1397 %

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.66     0.058    0.589      0.66    0.622      0.923     ActiveRegion
0.646    0.049    0.622      0.646   0.634      0.909     CoronalJet
0.811    0.025    0.802      0.811   0.807      0.952     EmergingFlux
0.653    0.043    0.656      0.653   0.654      0.912     Empty
0.591    0.043    0.634      0.591   0.612      0.896     Filament
0.795    0.033    0.752      0.795   0.773      0.959     FilamentActivation
0.461    0.059    0.495      0.461   0.477      0.843     FilamentExplosion
0.486    0.053    0.533      0.486   0.509      0.844     Flare
0.644    0.045    0.644      0.644   0.644      0.913     Oscillation

Conclusion: And again, by removing parameters “randomly” we can lower our chances of recognizing solar phenomena (we dropped from 72% to 63%).

Page 36:

Preliminary Results - Classifiers: Adaboost C4.5 -

Removing 3 correlated parameters: Standard Deviation, Uniformity and Tamura Contrast

Correctly Classified Instances   9706   69.4874 %
Incorrectly Classified Instances 4262   30.5126 %

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.776    0.033    0.747      0.776   0.761      0.956     ActiveRegion
0.667    0.046    0.644      0.667   0.655      0.927     CoronalJet
0.826    0.018    0.851      0.826   0.838      0.965     EmergingFlux
0.737    0.031    0.748      0.737   0.742      0.94      Empty
0.663    0.04     0.672      0.663   0.668      0.925     Filament
0.809    0.029    0.776      0.809   0.792      0.962     FilamentActivation
0.51     0.054    0.544      0.51    0.526      0.874     FilamentExplosion
0.59     0.049    0.6        0.59    0.595      0.899     Flare
0.676    0.043    0.664      0.676   0.67       0.931     Oscillation

Conclusion: And for the last time - by removing 3 parameters (out of 10) that were highly correlated with other parameters, we reduce storage and processing time while maintaining comparable accuracy (69%, vs. 72% originally – a 3% decrease, which seems like a good deal when compared against the 30% reduction in the number of parameters).

Page 37:

Preliminary Conclusions

Based on these preliminary classification results, we can drop the following image parameters: Standard Deviation, Uniformity, and Tamura Contrast. This reduces storage space and computational costs, since we achieve classification percentages similar to those obtained with the complete set of parameters.