ENHANCEMENT CLASSIFICATION OF GALAXY IMAGES
APPROVED BY SUPERVISING COMMITTEE:
Artyom Grigoryan, Ph.D., Chair
Walter Richardson, Ph.D.
David Akopian, Ph.D.
Accepted:
Dean, Graduate School
Copyright 2014 John Jenkinson
All rights reserved.
DEDICATION
To my family.
ENHANCEMENT CLASSIFICATION OF GALAXY IMAGES
by
JOHN JENKINSON, M.S.
DISSERTATION
Presented to the Graduate Faculty of
The University of Texas at San Antonio
In Partial Fulfillment
Of the Requirements
For the Degree of
MASTER OF SCIENCE IN ELECTRICAL ENGINEERING
THE UNIVERSITY OF TEXAS AT SAN ANTONIO
College of Engineering
Department of Electrical and Computer Engineering
December 2014
All rights reserved
INFORMATION TO ALL USERS
The quality of this reproduction is dependent upon the quality of the copy submitted.
In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.
Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code.
ProQuest LLC
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106-1346
UMI 1572687
Published by ProQuest LLC (2015). Copyright in the Dissertation held by the Author.
UMI Number: 1572687
ACKNOWLEDGEMENTS
My most sincere regard is given to Dr. Artyom Grigoryan for giving me the opportunity to learn
to research and for being here for the students, to Dr. Walter Richardson, Jr. for teaching complex
topics from the ground up and leading this horse of a student to mathematical waters applicable
to my research, to Dr. Mihail Tanase for being the study group that I have never had, and to Dr.
Azima Mottaghi for constant motivation, support and the remark, "You can finish it all in one day."
Additionally, this work progressed through discussions with Mehdi Hajinoroozi, Skei, hftf,
and pavonia. I also acknowledge the UTSA Mexico Center for their support of this research.
December 2014
iv
ENHANCEMENT CLASSIFICATION OF GALAXY IMAGES
John Jenkinson, B.S.
The University of Texas at San Antonio, 2014
Supervising Professor: Artyom Grigoryan, Ph.D., Chair
With the advent of astronomical imaging technology and the increased capacity of digital storage, the production of photographic atlases of the night sky has begun to generate volumes of data which need to be processed autonomously. As part of the construction of the Tonantzintla Digital Sky Survey, the present work involves software development for the digital image processing of astronomical images, in particular operations that precede feature extraction and classification. Recognition of galaxies in these images is the primary objective of the present work.
Many galaxy images have poor resolution or contain faint galaxy features, resulting in the misclassification of galaxies. An enhancement of these images by the method of the Heap transform is proposed, and experimental results are provided which demonstrate that the enhancement strengthens faint galaxy features, thereby improving classification accuracy. The feature extraction was performed using morphological features that have been widely used in previous automated galaxy investigations. Principal component analysis was applied to the original and enhanced data sets for a performance comparison between the original and reduced feature spaces. Classification was performed by the Support Vector Machine learning algorithm.
v
TABLE OF CONTENTS
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Galaxy Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Hubble Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 de Vaucouleurs Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Digital Data Volumes in Modern Astronomy . . . . . . . . . . . . . . . . . . . . . 12
1.2.1 Digitized Sky Surveys . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.2 Problem Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3 Problem Description and Proposed Solution . . . . . . . . . . . . . . . . . . . . . 14
1.4 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4.1 Survey of Automated Galaxy Classification . . . . . . . . . . . . . . . . . 15
1.4.2 Survey of Support Vector Machines . . . . . . . . . . . . . . . . . . . . . 17
1.4.3 Survey of Enhancement Methods . . . . . . . . . . . . . . . . . . . . . . 18
Chapter 2: Morphological Classification and Image Analysis . . . . . . . . . . . . . . . 20
2.1 Astronomical Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 Image enhancement measure (EME) . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3 Spatial domain image enhancement . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.1 Negative Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3.2 Logarithmic Transformation . . . . . . . . . . . . . . . . . . . . . . . . . 28
vi
2.3.3 Power Law Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3.4 Histogram Equalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3.5 Median Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4 Transform-based image enhancement . . . . . . . . . . . . . . . . . . . . . . . . 37
2.4.1 Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.4.2 Enhancement methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.5 Image Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.5.1 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.5.2 Rotation, Shifting and Resizing . . . . . . . . . . . . . . . . . . . . . . . 53
2.5.3 Canny Edge Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.6 Data Mining and Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.6.1 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.6.2 Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . 64
2.6.3 Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.7 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.8 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Appendix A: Project Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
A.1 Preprocessing and Feature Extraction codes . . . . . . . . . . . . . . . . . . . . . 85
A.2 SVM Classification codes with data . . . . . . . . . . . . . . . . . . . . . . . . . 92
A.2.1 Original data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
A.2.2 Enhanced data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Vita
vii
LIST OF TABLES
Table 1.1 Hubble’s Original Classification of Nebulae Table . . . . . . . . . . . . . . 3
Table 2.1 Morphological Feature Descriptions . . . . . . . . . . . . . . . . . . . . . 64
Table 2.2 Feature Values Per Class . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Table 2.3 Galaxy list and relation between NED classification and current project
classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Table 2.4 Summary of classification results for original and enhanced data. Accuracy
improved by 12.924% due to enhancement. . . . . . . . . . . . . . . . . . 81
viii
LIST OF FIGURES
Figure 1.1 Hubble Tuning Fork Diagram. Image from http://www.physast.uga.edu/rls/astro1020/ch20/ch26_fig26_9.jpg. . . . . . . . . . . . . 2
Figure 1.2 Plate scan of Elliptical and Irregular Nebulae from Mount Wilson Obser-
vatory originally included in Hubble’s paper, Extra-galactic Nebulae. . . . . 4
Figure 1.3 Plate scan of Spiral and Barred Spiral Nebulae from Mount Wilson Obser-
vatory originally included in Hubble’s paper, Extra-galactic Nebulae. . . . . 6
Figure 1.4 A plane projection of the revised classification scheme. . . . . . . . . . . . 10
Figure 1.5 A 3-Dimensional representation of the revised classification volume and
notation system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Figure 1.6 Sloan Digital Sky Survey coverage map. http://www.sdss.org/sdss-surveys/.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Figure 2.1 Schmidt Camera of Tonantzintla. Permission to use image from the Insti-
tuto Nacional de Astrofísica, Óptica y Electrónica (INAOE). . . . . . . . . 20
Figure 2.2 Plate Sky Coverage. Permission to use image from the Instituto Nacional
de Astrofísica, Óptica y Electrónica (INAOE). . . . . . . . . . . . . . . . . 21
Figure 2.3 Digitized plate AC8431 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Figure 2.4 Marked plate scan AC8431 . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Figure 2.5 Plate scan AC8409 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Figure 2.6 Marked plate scan AC8409 . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Figure 2.7 Cropped galaxies from plate scans AC8431 and AC8409 read left to right
and top to bottom: NGC 4251, 4274, 4278, 4283, 4308, 4310, 4314, 4393,
4414, 4448, 4559, 3985, 4085, 4088, 4096, 4100, 4144, 4157, 4217, 4232,
4218, 4220, 4346, 4258. . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Figure 2.8 Negative, log and power transformations. . . . . . . . . . . . . . . . . . . 28
ix
Figure 2.9 Top to bottom: Galaxy NGC4258 and its Negative Image. . . . . . . . . . . 29
Figure 2.10 Logarithmic and nth root transformations. . . . . . . . . . . . . . . . . . . 30
Figure 2.11 γ-power transformation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Figure 2.12 Galaxy NGC 4217 power law transformations. . . . . . . . . . . . . . . . . 32
Figure 2.13 Histogram processing to enhance Galaxy NGC 6070. . . . . . . . . . . . . 34
Figure 2.14 Top to Bottom: Histogram of original and enhanced image. . . . . . . . . . 35
Figure 2.15 Illustration of the median of a set of points in different dimensions. . . . . . 36
Figure 2.16 Signal-flow graph of determination of the five-point transformation by a
vector x = (x0, x1, x2, x3, x4)′. . . . . . . . . . . . . . . . . . . . . . . . . 43
Figure 2.17 Network of the x-induced DsiHT of the signal z. . . . . . . . . . . . . . . . 44
Figure 2.18 Intensity values and spectral coefficients of Galaxy NGC 4242. . . . . . . . 46
Figure 2.19 Butterworth lowpass filtering performed in the Fourier (frequency) domain. 47
Figure 2.20 α-rooting enhancement of Galaxy NGC 4242. . . . . . . . . . . . . . . . . 47
Figure 2.21 Top: Galaxy PIA 14402, Bottom: NGC 5194, both processed by Heap
transform. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Figure 2.22 Computational scheme for galaxy classification. . . . . . . . . . . . . . . . 49
Figure 2.23 Background subtraction of Galaxy NGC 4274 by manual and Otsu’s thresh-
olding. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Figure 2.24 Morphological opening for star removal from Galaxy NGC 5813. . . . . . 54
Figure 2.25 Rotation of Galaxy image NGC 4096 by galaxy second moment defined
angle. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Figure 2.26 Resizing of Galaxy NGC 4220. . . . . . . . . . . . . . . . . . . . . . . . . 59
Figure 2.27 Canny edge detection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Figure 2.28 PCA rotation of axes for a bivariate Gaussian distribution. . . . . . . . . . 65
Figure 2.29 Pictorial representation of the development of the geometric margin. . . . . 69
Figure 2.30 Maximum geometric margin. . . . . . . . . . . . . . . . . . . . . . . . . . 70
Figure 2.31 SVM applied to galaxy data. . . . . . . . . . . . . . . . . . . . . . . . . . 73
x
Figure 2.32 Classification iteration class pairs. . . . . . . . . . . . . . . . . . . . . . . 77
Figure 2.33 PCA feature space iteration 1 classification. . . . . . . . . . . . . . . . . . 78
Figure 2.34 PCA feature space iteration 2 classification. . . . . . . . . . . . . . . . . . 79
Figure 2.35 PCA feature space iteration 3 classification. . . . . . . . . . . . . . . . . . 79
Figure 2.36 PCA feature space iteration 4 classification. . . . . . . . . . . . . . . . . . 80
Figure 2.37 PCA feature space iteration 1 classification of enhanced data. . . . . . . . . 81
Figure 2.38 PCA feature space iteration 2 classification of enhanced data. . . . . . . . . 82
Figure 2.39 PCA feature space iteration 3 classification of enhanced data. . . . . . . . . 82
Figure 2.40 PCA feature space iteration 4 classification of enhanced data. . . . . . . . . 83
xi
Chapter 1: INTRODUCTION
1.1 Galaxy Classification
Why classify galaxies? It is an inherent characteristic of man to classify objects. Our country's government classifies families according to annual income to establish tax laws. Medical doctors classify our blood types, making successful transfusions possible. Organic genes are classified by genetic engineers so that freeze-resistant DNA from a fish can be used to "infect" a tomato cell, making the tomato less susceptible to cold. Words in the English language are assigned to the categories noun, verb, adjective, adverb, pronoun, preposition, conjunction, determiner, and exclamation, allowing for the structured composition of sentences. Differential equations are classified as ordinary (ODEs) and partial (PDEs), with ODEs having sub-categories: linear homogeneous, exact differential equations, n-th order equations, etc., which allows ease of study and lets solution methods be developed for certain classes, such as the method of undetermined coefficients for linear ordinary differential equations with constant coefficients. If we say that a system is linear, there is no need to mention that the system's input-output relationship is observed to be additive and homogeneous. Classification pervades every industry, and enables improved communication, organization and operation within society. For galaxy classification in particular, astrophysicists think that to understand the formation and subsequent evolution of galaxies one must first distinguish between the two main morphological classes of massive systems: spirals and early-type systems, which are also called ellipticals. Galaxies with spiral arms, for example, are normally rotating disks of stars, dust and gas with plenty of fuel for future star formation. Ellipticals, however, are normally more mature systems which long ago finished forming stars. A galaxy's history is also revealed by its class; dust-lane early-type galaxies are starburst systems formed in gas-rich mergers of smaller spiral galaxies. A galaxy's classification can also reveal information about its environment: a morphology-density relationship has been observed in many studies, with spiral galaxies tending to be located in low-density environments and ellipticals in denser environments [1,2,3].
1
There are many physical parameters of galaxies that are useful for their classification, but this paper considers the classification of galaxies by their morphology, a word derived from the Greek morphē, meaning shape or form.
1.1.1 Hubble Scheme
Hubble’s scheme was visually popularized by the "tuning fork" diagram which displays examples
of each nebulae class, described in this section, in the transition sequence from early-type elliptical
to late-type spiral. The tuning fork diagram is shown in Figure 1.1. While the basic classification
Figure 1.1: Hubble Tuning Fork Diagram. Image from http://www.physast.uga.edu/rls/astro1020/ch20/ch26_fig26_9.jpg.
of galaxy morphology assigns members to the categories of elliptical and spiral, the most prominent classification scheme was introduced by Edwin Hubble in his 1926 paper, "Extra-galactic Nebulae." This classification scheme is based on galaxy structure. The individual members of a class differ only in apparent size and luminosity. Originally, Hubble stated that the forms divide
class differ only in apparent size and luminosity. Originally, Hubble stated that the forms divide
themselves naturally into two groups: those found in or near the Milky Way and those in moderate
2
or high galactic latitudes. This paper, along with Hubble's classification scheme, will only consider the extra-galactic division: Table 1.1 shows that this scheme contains two main divisions,
Table 1.1: Hubble’s Original Classification of Nebulae

Type                                              Symbol    Example (N.G.C.)
A. Regular:
   1. Elliptical ...............................  En
      (n = 1, 2, ..., 7 indicates the
      ellipticity of the image)                   E0        3379
                                                  E2        221
                                                  E5        4621
                                                  E7        2117
   2. Spirals:
      a) Normal spirals ........................  S
         (1) Early .............................  Sa        4594
         (2) Intermediate ......................  Sb        2841
         (3) Late ..............................  Sc        5457
      b) Barred spirals ........................  SB
         (1) Early .............................  SBa       2859
         (2) Intermediate ......................  SBb       3351
         (3) Late ..............................  SBc       7479
B. Irregular ...................................  Irr       4449
regular and irregular galaxies. Within the regular division, three main classes exist: elliptical,
spirals, and barred spirals. The terms nebulae and galaxies are used interchangeably, with a brief discussion of the rationale for this at the end of this subsection. N.G.C. and U.G.C. are acronyms for the New General Catalogue and Uppsala General Catalogue, respectively, and are designations for deep sky objects.
Elliptical galaxies range in shape from circular through flattened ellipses to a limiting lenticular figure in which the ratio of axes is about 1 to 3 or 4. They contain no apparent structure except for their luminosity distribution, which is maximum at the center of the galaxy and decreases to unresolved edges. The degree to which an elliptical nebula is flattened is determined by the criterion of elongation, defined as (a − b)/a, where a and b are the semi-major and semi-minor axes, respectively, of an ellipse fitted to the nebula. The elongation mentioned here is different from, and not to be confused with, the morphic feature elongation that is introduced later in this paper. Elliptical nebulae are designated by the symbol "E," followed by the numerical value of ellipticity.
3
The complete series is E0, E1,. . ., E7, the last representing a definite limiting figure which marks
the junction with spirals. Examples of nebulae with differing ellipticities are shown in Figure 1.2.
Figure 1.2: Plate scan of Elliptical and Irregular Nebulae from Mount Wilson Observatory origi-
nally included in Hubble’s paper, Extra-galactic Nebulae.
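The En designation follows directly from the elongation criterion: in the standard convention n is the elongation (a − b)/a scaled by ten, so E7 corresponds to an elongation of 0.7. A minimal Python sketch (the function name is mine, not from the project software):

```python
def hubble_type(a: float, b: float) -> str:
    """Hubble ellipticity class En, with n = 10*(a - b)/a rounded.

    a, b are the semi-major and semi-minor axes of an ellipse fitted
    to the nebula; the series is truncated at E7, beyond which the
    regular nebulae are spirals.
    """
    if a <= 0 or b < 0 or b > a:
        raise ValueError("require a > 0 and 0 <= b <= a")
    n = round(10 * (a - b) / a)
    return f"E{n}" if n <= 7 else "spiral regime (beyond E7)"

print(hubble_type(10, 10))  # circular: E0
print(hubble_type(10, 5))   # E5
print(hubble_type(10, 3))   # limiting figure: E7
```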
All regular nebulae with ellipticities greater than about E7 are spirals, and no spirals are known
4
with ellipticity less than this limit. Spirals are designated by the symbol "S". The classification criteria for spiral nebulae are: (1) relative size of the unresolved nuclear region; (2) extent to which the arms are unwound; (3) degree of resolution in the arms. The relative size of the nucleus decreases as the arms of the spiral open more widely. The stages of this transition of spiral galaxies are designated "a" for early types, "b" for intermediate types, and "c" for late types. Nebulae intermediate between E7 and Sa are occasionally designated as S0, or lenticular.
Barred spirals are a class of spirals which have a bar of nebulosity extending diametrically across the nucleus. This class is designated by the symbol "SB", with a sequence which parallels that of normal spirals, leading to the subdivisions "SBa", "SBb", and "SBc" for early, intermediate and late type barred spirals, respectively. Examples of normal and barred spirals along with their subclasses are shown in Figure 1.3.
Irregular nebulae are extra-galactic nebulae that lack both discriminating nuclei and rotational
symmetry. Individual stars may emerge from an unresolved background in these galaxies.
For any given imaging system, there is a limiting resolution beyond which classification cannot be made with any confidence. Hubble designated galaxies within this category by the letter "Q."
On the usage of nebulae versus galaxy, the astronomical term nebulae has come down through
the centuries as the name for permanent, cloudy patches in the sky that are beyond the limits of
the solar system. In 1958, the term nebulae was used for two types of astronomical bodies: clouds
of dust and gas which are scattered among the stars of the galactic system (galactic nebulae),
and the remaining objects, which are now recognized as independent stellar systems scattered
through space beyond the limits of the galactic system (extra-galactic nebulae). Some astronomers argued that, since nebulae are now recognized as stellar systems, they should be designated by some other name, one which does not carry the connotation of clouds or mist.
this consideration refer to other stellar systems as external galaxies. Since this paper only considers
external galaxies we will drop the adjective and employ the term galaxies for whole external stellar
systems [4].
5
Figure 1.3: Plate scan of Spiral and Barred Spiral Nebulae from Mount Wilson Observatory orig-
inally included in Hubble’s paper, Extra-galactic Nebulae.
6
1.1.2 de Vaucouleurs Scheme
The de Vaucouleurs Classification system is an extension of the Hubble Classification system, and
is the most commonly used system. For this reason it is noted in this paper.
About 1935, Hubble undertook a systematic morphological study of the approximately 1000 brighter galaxies listed in the Shapley-Ames Catalogue, north of -30° declination, with a view to refining his original classification scheme. The main revisions include a) the introduction of the
S0 and SB0 types regarded as transition stages between ellipticals and spirals at the branching off
point of the tuning fork. S0, or lenticular galaxies resemble spiral galaxies in luminosity, but do
not contain visible spiral arms. A visible lens surrounds these galaxies bordered by a faint ring
of nebulosity. Characteristics of lenticular galaxies are a bright nucleus in the center of a disc
or lens. Near the perimeter of the galaxy, there exists a faint rim or envelope with unresolved
edges. Hubble separated the lenticulars into two groups, S0(1) and S0(2). These groups have a
smooth lens and envelope, and some structure in the envelope in the form of a dark zone and ring,
respectively. S0/a is the transition stage between S0 and Sa and shows apparent developing spiral
structure in the envelope. SB0 objects are characterized by a bar through the central lens. Hubble distinguished three groups of SB0 objects: group SB0(1) have a bright lens with a broad, hazy bar and no ring, surrounded by a larger, fainter envelope, sometimes circular; group SB0(2) have a broad, weak bar across a primary ring, with faint outer secondary rings; and group SB0(3) have a well-developed bar and ring pattern, with the bar stronger than the ring.
c) Harlow Shapley proposed an extension to the normal spiral sequence beyond Sc, designating galaxies showing a very small, bright nucleus and many knotty irregular arms by Sd. A parallel extension of the barred spiral sequence beyond the stage SBc was introduced by de Vaucouleurs in 1955, which may be denoted SBd or SBm [5,6].
For Irregular type galaxies related to the Magellanic Clouds, I(m), an important characteristic is their small diameter and low luminosity, which marks them as dwarf galaxies.
d) Shapley discovered the existence of dwarf ellipticals (dE) by observation of ellipticals with
7
very low surface brightness.
de Vaucouleurs noted that after all such types or variants have been assigned into categories,
there remains a hard core of "irregular" objects which do not seem to fit into any of the recognized
types. These outliers are presently discarded, and only isolated galaxies are considered in the
present article.
The coherent classification scheme proposed by de Vaucouleurs which included most of the
current revision and additions to the standard classification is described here. Classification and
notation of the scheme are illustrated in Figure 1.4, which may be considered as a plane projection
of the three dimensional representation in Figure 1.5. Four Hubble classes are retained: ellipticals
E, lenticulars S0, spirals S, irregulars I.
Lenticulars and spirals were re-designated "ordinary" SA and "barred" SB, respectively, to
allow for the use of the compound symbol SAB for the transition stage between these two classes.
The symbol S alone is used when a spiral object cannot be more accurately classified as either SA
or SB because of poor resolution, unfavorable tilt, etc.
Lenticulars were divided into two subclasses, denoted SA0 and SB0, where SB0 galaxies have
a bar structure across the lens and SA0 galaxies do not. SAB0 denotes objects with a very weak
bar. The symbol S0 is now used for a lenticular object which cannot be more precisely classified
as either SA0 or SB0; this is often the case for edgewise objects.
Two main varieties are recognized in each of the lenticular and spiral families, the "annular" or "ringed" type, denoted (r), and the "spiral" or "S-shaped" type, denoted (s). Intermediate types are noted (rs). In the "ringed" variety the structure includes circular (sometimes elliptical) arcs or rings (S0) or consists of spiral arms or branches emerging tangentially from an inner circular ring (S). In the "spiral" variety two main arms start at right angles from a globular or slightly elongated nucleus (SA) or from an axial bar (SB). The distinction between the two families A and B and between the two varieties (r) and (s) is most clearly marked at the transition stage S0/a between the S0 and S classes. It vanishes at the transition stage between E and S0 on the one hand, and at the transition stage between S and I on the other (cf. Figure 1.4).
8
Four sub-divisions or stages are distinguished along each of the four spiral sequences SA(r), SA(s), SB(r), SB(s), viz. "early", "intermediate" and "late", denoted a, b, c as in the standard classification, with the addition of a "very late" stage, denoted d. Intermediate stages are noted Sab, Sbc, Scd. The transition stage towards the magellanic irregulars (whether barred or not) is noted Sm, e.g. the Large Magellanic Cloud is SB(s)m. Along each of the non-spiral sequences the signs + and - are used to denote "early" and "late" subdivisions; thus E+ denotes a "late" E, the first stage of the transition towards the S0 class. In both the SA0 and SB0 sub-classes three stages, noted S0-, S0°, S0+, are thus distinguished; the transition stage between S0 and Sa, noted S0/a by Hubble, may also be noted Sa-. Notations such as Sa+, Sb-, etc. may be used occasionally in the spiral sequences, but the distinction is so slight between, say, Sa+ and Sb- that for statistical purposes it is convenient to group them together as Sab, etc. Experience shows that this makes the transition subdivisions, Sab, Sbc, etc., as wide as the main sub-divisions, Sa, Sb, etc.
Irregulars which do not clearly show the characteristic spiral structure are noted I(m).
Figure 1.4 shows a plane projection of the revised classification scheme. Compare with Figure 1.5. The ordinary spirals SA are in the upper half of the figure, the barred spirals SB in the lower half. The ring types (r) are to the left, the spiral types (s) to the right. Ellipticals and lenticulars are near the center, magellanic irregulars near the rim. The main stages of the classification
sequence from E to Im through S0-, S0, S0+, Sa, Sb, Sc, Sd, Sm are illustrated, approximately
on the same scale, along each of the four main morphological series SA(r), SA(s), SB(r), SB(s).
A few mixed or "intermediate" types SAB and S(rs) are shown along the horizontal and vertical
diameters respectively. This scheme is superseded by the slightly revised and improved system
illustrated in Figure 1.5.
Figure 1.5 shows a 3-Dimensional representation of the revised classification volume and no-
tation system. From left to right are the four main classes: ellipticals E, lenticulars S0, spirals S,
and Irregulars I. Above are ordinary families SA, below the barred families SB; on the near side
9
Figure 1.4: A plane projection of the revised classification scheme.
are the S-shaped varieties S(s), on the far side the ringed varieties S(r). The shape of the volume indicates that the separation between the various sequences SA(s), SA(r), SB(r), SB(s) is greatest
at the transition stage S0/a between lenticulars and spirals and vanishes at E and Im. A central
cross-section of the classification volume illustrates the relative location of the main types and the
notation system. There is a continuous transition of mixed types between the main families and va-
10
rieties across the classification volume and between stages along each sequence; each point in the
classification volume represents potentially a possible combination of morphological characteris-
tics. For classification purposes this infinite continuum of types is represented by a finite number
of discrete "cells" [5, 6, 7]. The classification scheme included here defers to [5, 6] for a complete
Figure 1.5: A 3-Dimensional representation of the revised classification volume and notation sys-
tem.
description.
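The three classification axes described above (family A/AB/B, variety (r)/(rs)/(s), and stage) compose into a single type string. The following Python sketch is illustrative only; the function and table names are mine and do not come from the thesis software or any catalog tool:

```python
# Illustrative composition of a de Vaucouleurs-style type string from the
# three classification axes described in this section.
FAMILIES = {"ordinary": "A", "mixed": "AB", "barred": "B"}
VARIETIES = {"ringed": "(r)", "mixed": "(rs)", "s-shaped": "(s)"}
STAGES = ["0-", "0", "0+", "a", "ab", "b", "bc", "c", "cd", "d", "m"]

def devaucouleurs_type(family: str, variety: str, stage: str) -> str:
    """Build a type string such as SB(s)m from the three classification axes."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return "S" + FAMILIES[family] + VARIETIES[variety] + stage

# The Large Magellanic Cloud, given in the text as SB(s)m:
print(devaucouleurs_type("barred", "s-shaped", "m"))   # SB(s)m
print(devaucouleurs_type("ordinary", "ringed", "b"))   # SA(r)b
```

Each combination corresponds to one of the discrete "cells" of the classification volume; intermediate cases map onto the mixed family AB and variety (rs).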
11
1.2 Digital Data Volumes in Modern Astronomy
1.2.1 Digitized Sky Surveys
Modern astronomy has produced massive volumes of data relative to that produced at the start of
the 20th century. Digitized sky surveys attempt to construct a virtual photographic atlas of the
universe through the identification and cataloging of observed celestial phenomena for the purpose
of understanding the large-scale structure of the universe, the origin and evolution of galaxies,
the relationship between dark and luminous matter, and many other topics of research interest
in astronomy. This idea is being realized through the efforts of multiple organizations and all
sky surveys. Notable surveys and their night sky coverage contribution and data collection are
mentioned here.
The Sloan Digital Sky Survey (SDSS) is the most prominent ongoing all-sky survey; in its seventh data release, almost 1 billion objects have been identified in approximately 35% of the night sky. Comprehensive data collection for the survey, which uses electronic light detectors for imaging, is projected at 15 terabytes [8]. An image from the SDSS displaying the current coverage of the sky in orange with selected regions displayed in higher resolution is shown in Figure 1.6.
The Galaxy Evolution Explorer (GALEX), a NASA mission led by Caltech, has used microchannel plate detectors in two bands to image 2/3 of the night sky from the GALEX satellite between 2003 and the present in its survey [9]. In 1969, the Two-Micron Sky Survey (TMSS) scanned 70% of the sky and detected approximately 5,700 celestial sources of infrared radiation [10]. With the advancement of infrared sensing technology, the Two Micron All-Sky Survey (2MASS) detected an 80,000-fold increase in sources over the TMSS between 1997 and 2001. The 2MASS was conducted by two separate observatories, at Mount Hopkins, Arizona, and the Cerro Tololo Inter-American Observatory (CTIO), Chile, using 1.3-meter telescopes equipped with a 3-channel camera and a 256x256 electronic light detector. Each night of released data consisted of 250,000 point sources, 2,000 galaxies, and 5,000 images totaling about 13.8 gigabytes per facility. The compiled catalog has over 1,000,000 galaxies, extracted from 99.998% sky coverage and 4,121,439 atlas images [11].
Figure 1.6: Sloan Digital Sky Survey coverage map. http://www.sdss.org/sdss-surveys/.
Sky coverage by the Space Telescope Science Institute's Guide Star Catalog 2 (GSC-2) survey, which ran from 2000 to 2009, was 100%. The optical catalog produced by this survey used 1" resolution scans of 6.5x6.5 degree photographic plates from the Palomar and UK Schmidt telescopes. Almost 1 billion point sources were imaged. Each plate was digitized using a modified microdensitometer with a pixel size of either 25 or 15 microns (1.7 or 1.0 arcsec, respectively). The digital images are 14000x14000 (0.4 GB) or 23040x23040 (1.1 GB) in size [12]. The second Palomar Observatory Sky Survey (POSS2) imaged 897 plates between the early 1980s and 1999, covering the entire northern celestial hemisphere using the Oschin Schmidt telescope [13].
One of the main objectives of the ROSAT All-Sky Survey was to conduct the first all-sky survey in X-rays with an imaging telescope, leading to a major increase in sensitivity and source location accuracy. ROSAT was conducted between 1990 and 1991, covering 99.7% of the sky [14]. The Faint Images of the Radio Sky at Twenty-centimeters (FIRST) project was designed to produce the radio equivalent of the Palomar Observatory Sky Survey over 10,000 square degrees of the North and South Galactic Caps. The survey began in 1993 and is currently active [15, 16]. The Deep Near Infrared Survey (DENIS) is a survey of the southern sky in two infrared and one optical band conducted at the European Southern Observatory at La Silla, Chile. The survey ran from 1996 through 2001 and cataloged 355 million point sources [17]. The present work is part of the Tonantzintla Digital Sky Survey, which is discussed in Chapter 2.
1.2.2 Problem Motivation
The image quantity and data volume produced by digital sky surveys present human analysis with an impossible task. Source detection and classification in modern astronomy therefore necessitate automation of image processing and analysis, providing the motivation for the present work. To address this problem, an algorithm for processing astronomical images is presented and implemented, followed by class discrimination of the detected galaxies according to the scheme mentioned in Section 1.1.1. Class discrimination is performed using extracted galaxy feature values, whose accuracy varies with the method of segmentation. Faint regions of galaxies can be lost during segmentation, leading to increased error during feature extraction and subsequent classification. Enhancement of the galaxy image by multiple methods is proposed and implemented to reduce data loss during segmentation and to improve the accuracy of feature extraction, implied through the increase of classification performance.
1.3 Problem Description and Proposed Solution
This project is part of the ongoing work within the Tonantzintla Digital Sky Survey. The present work focuses on automated astronomical image processing and classification. The final performance criterion is 100% classification into the categories E0, . . . , E7, S0, Sa, Sb, Sc, SBa, SBb, SBc, and Irr, while the present work builds towards that goal by incremental improvement of classification performance with the categories elliptical "E," spiral "S," lenticular "S0," barred spiral "SB," and irregular "Irr." The intent of this work is to partially or fully resolve the classification performance limitations within the galaxy segmentation, edge detection and feature extraction stages of the image processing pipeline by enhancing the galaxy images with the Heap transform, preserving the faint regions of the galaxies which may be lost when images are processed without enhancement. Classification is performed by the supervised machine learning algorithm Support Vector Machines (SVM).
1.4 Previous Work
1.4.1 Survey of Automated Galaxy Classification
Morphological classification of galaxies into 5 broad categories was performed with an artificial neural network (ANN) machine learning algorithm with back propagation, trained using 13 parameters, by Storrie-Lombardi in [18]. Odewahn classified galaxies from large sky surveys using ANNs in [35, 36, 37]. The development of an automatic star/galaxy classifier using Kohonen Self-Organizing Maps was presented in [38, 39], and one using learning vector quantization and fuzzy classifiers with back-propagation based neural networks in [39]. An automatic system to classify images of varying resolution based on morphology was presented in [40]. Owens, in [19], showed that oblique decision trees induced with different impurity measures perform comparably to the artificial neural network used in [18], and that classification of the original data could be performed with less well-defined categories. In [20] an artificial neural network was trained on galaxy features whose classes were defined as the mean of the types assigned by 6 independent experts. The network performed comparably to the overall root mean square dispersion between the experts. A comparison of the classification performance of an artificial neural network machine learning algorithm to that of human experts, for 456 galaxies drawn from the SDSS in [20], was detailed in [21]. Lahav showed the classification performance on galaxy images and spectra of an unsupervised artificial neural network trained with galaxy spectra
de-noised and compressed by principal component analysis; a supervised artificial neural network was also trained with classes determined by human experts [22]. Folkes, Lahav and Maddox trained an artificial neural network using a small number of principal components selected from galaxy spectra with the low signal-to-noise ratios characteristic of redshift surveys. Classification was then performed into 5 broad morphological classes, and it was shown that artificial neural networks are useful in discriminating normal and unusual galaxy spectra [23]. The use of the galaxy parameters luminosity and color, together with the image-structure parameters size, image concentration, asymmetry and surface brightness, to classify galaxy images into three classes was examined by Bershady, Jangren and Conselice; the essential features for discrimination were determined to be a combination of spectral index, e.g., color, with concentration, asymmetry, and surface brightness [24]. A comparison of ensembles of classifiers for the classification methods Naive Bayes, back-propagation artificial neural networks, and a decision-tree induction algorithm with pruning was performed by Bazell; the artificial neural network produced the best results, and ensemble methods improved the performance of all classification methods [30]. A computational scheme to develop an automatic galaxy classifier using galaxy morphology was shown to provide robustness for classification using artificial neural networks in [26, 34]. Bazell derived 22 morphological features, including asymmetry, which were used to train an artificial neural network for the classification of galaxy images and to determine which features were most important [27]. Strateva used visual morphology and spectral classification to show that two peaks correspond roughly to early-type (E, S0, Sa) and late-type (Sb, Sc, Irr) galaxies; it was also shown that the color of galaxies correlates with their radial profile [28]. The Gini coefficient, a statistic commonly used in econometrics to measure the distribution of wealth among a population, was used to quantify galaxy morphology based on galaxy light distribution in [29]. In [31], an algorithm for preprocessing galaxy images for morphological classification was proposed; in addition, the classification performance of an artificial neural network, locally weighted regression and homogeneous ensembles of classifiers was compared for 2 and 3 galaxy classes, and compression and discrimination by principal component analysis was performed. The artificial neural network performed best under all conditions. In [32], principal component analysis was applied to galaxy images, and a structural type estimator named "ZEST" used 5 nonparametric diagnostics to classify galaxy structure. Finally, Banerji presented morphological classification by artificial neural networks for 3 classes, yielding 90% accuracy in comparison to human classifications [33].
1.4.2 Survey of Support Vector Machines
This method of class segregation is performed by hyperplanes, which can be defined by a variety of functions, both linear and nonlinear. The development of this method is presented in Chapter 2. Support vector machines (SVMs) have been employed widely in the areas of pattern recognition and prediction. Here a limited survey of SVM applications is presented, which includes two surveys conducted by researchers in the field. Romano applied SVMs to photometric and geometric features computed from astronomical imagery for the identification of possible supernovae in [42]. M. Huertas-Company applied SVMs to 5 morphological features, luminosity and redshift calculated from galaxy images in [43]. Freed and Lee classified galaxies by morphological features into 3 classes using an SVM in [44]. Saybani conducted a survey of SVMs used in oil refineries in [45]. Xie proposed a method for predicting crude oil prices using an SVM in [90]. Petković used an SVM to predict the power consumption of an oil refinery in [47]. Balabin performed near infrared spectroscopy for gasoline classification using nine different multivariate classification methods, including SVMs, in [48]. Byun and Lee conducted a comprehensive survey on applications of SVMs for pattern recognition and prediction in [41]; references contained therein are included here in support of the present survey. For classification with q classes (q > 2), classes are trained pairwise. The pairwise classifiers are arranged in trees, where each tree node represents an SVM. A bottom-up tree originally proposed for recognition of 2-D objects was applied to face recognition in [49, 50]; in contrast, an interesting approach is the top-down tree published in [51]. SVMs applied to improve the classification speed of face detection were presented in [63, 53]. Face detection from multiple views was presented in [56, 55, 54]. An SVM was applied to coarse eigenface detection followed by fine detection in [57]. Frontal face detection using SVMs was discussed in [58]. [59] presented
SVMs for face and eye detection. Independent component analysis features of faces were input to the SVM in [60], orthogonal Fourier-Mellin moments in [61], and an overcomplete wavelet decomposition in [62]. A myriad of other applications have been ventured using SVMs, including but not limited to 2-D and 3-D object recognition [64, 65, 66], texture recognition [66], people and pose recognition [67, 68, 69, 70, 71], moving vehicle detection [72], radar target recognition [73, 76], handwritten character and digit recognition [74, 75, 71, 77], speaker or speech recognition [78, 79, 80, 81], image retrieval [82, 83, 84, 85], prediction of financial time series [86] and bankruptcy [87], and other classifications such as gender [88], fingerprints [89], bullet-holes for automatic scoring [90], white blood cells [91], spam categorization [92], hyperspectral data [93], storm cells [94], and image classification [95].
1.4.3 Survey of Enhancement Methods
Image enhancement is the process of visually improving the quality of a region of or the entire
image with respect to some measure of quality, e.g., the Image Enhancement Measure (EME)
introduced in Chapter 2. Enhancement methods can be classified as either spatial domain or trans-
form domain methods depending on whether the manipulation of the image is performed directly
on the pixels or on the spectral coefficients, respectively. Here, a survey of both spatial and trans-
form domain methods is presented for the enhancement of astronomical images and images in
general. Spatial domain methods are commonly referred to as contrast enhancement methods. At the core of these methods are histogram equalization, logarithmic and inverse log transformations, negative and identity transformations, nth-power and nth-root transformations, histogram matching and local histogram processing. Adaptive histogram equalization, which uses local contrast stretching to calculate several histograms corresponding to distinct sections of the image, was
applied after denoising to improve the contrast of astronomical images in [96, 99, 100, 34] and
generic images in [106]. Traditional histogram equalization was applied to the Hale-Bopp comet
image for enhancement in [98] and other astronomical images in [97, 101, 103, 104, 105]. [102]
included histogram equalization in the development of two algorithms for point extraction and
matching for registration of infrared astronomical images. Astronomical images were logarithmi-
cally transformed for visualization in [108] and likewise for generic images in [127]. Inverse log
transformations, negative and identity transformations, nth-power and nth-root transformations,
histogram matching and local histogram processing are introduced and applied to generic images
in [107, 126, 127, 129]. At the core of transform domain methods for image enhancement are the discrete Fourier, Heap, α-rooting, Tensor, and Wavelet transforms. Astronomical image enhancement by the discrete Fourier transform was presented in [109, 111, 112], by the Wavelet transform in [110], by the Heap and α-rooting transforms in [113], and by the Curvelet transform in [114, 98]. The enhancement of generic images can be seen in [115, 127, 128, 129] by the discrete Fourier and Cosine transforms, in [116] by the Heap transform, in [117, 118, 127, 128] by α-rooting, in [119, 120, 121, 122] by the Tensor or Paired transform, in [123, 98, 124] by the Wavelet transform, and in [124, 125] by other methods of transform domain processing.
Chapter 2: MORPHOLOGICAL CLASSIFICATION AND IMAGE
ANALYSIS
2.1 Astronomical Data Collection
Figure 2.1: Schmidt Camera of Tonantzintla. Permission to use image from the Instituto Nacional
de Astrofísica, Óptica y Electrónica (INAOE).
The Tonantzintla Schmidt camera was constructed in the Harvard Observatory shop under the
guidance of Dr. Harlow Shapley, and started operation in 1942. The spherical mirror is 762 mm
in diameter and coupled to a 660.4 mm correcting plate. The camera is shown in Figure 2.1. The 8x8 inch photographic plates cover a 5°x5° field with a plate scale of 95 arcsec/mm. The existing collection consists of a total of 14565 glass plates: 10445 taken in direct image mode, and 4120
through a 3.96° objective prism. Figure 2.2 shows the sky covered by the complete plate collection,
marking the center of each observed field [130].
Figure 2.2: Plate Sky Coverage. Permission to use image from the Instituto Nacional de As-
trofísica, Óptica y Electrónica (INAOE).
The plates are first digitized at the maximum optical resolution of the scanner, 4800 dots per inch (dpi), and then rebinned by a factor of 3 for a final pixel size of ~15 μm (1.51 arcsec/pixel) and transformed to transparency (positive) mode. Each image has 12470 x 12470 pixels (about 350 MB in 16-bit mode) and is stored in FITS format.
The images in this project were received from the collection of digitized photographic plates at the Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE). The present data set consists of 6 plate scans, all of which were marked to indicate the galaxies contained within the image. The goal is to process the digitized plates automatically, i.e., to segment the galaxies within the image, calculate their features and perform classification. In initial attempts to process the plate scans in Matlab on an Alienware M14x with an Intel Core i7-3840QM 2.80GHz CPU and 12.0GB DDRAM5, e.g., applying the watershed algorithm for segmentation, memory consumption errors were encountered. Consequently, the galaxies within each plate scan were cropped and processed individually. Figures 2.3, 2.4, 2.5, 2.6, and 2.7 show the original digitized plates AC8431 and AC8409, their marked versions indicating the captured galaxies, and the cropped galaxies from both plates. Once automatic classification of the cropped images is complete, one of the University of Texas at San Antonio's (UTSA) high performance computing clusters, SHAMU, will be used for
the automatic classification of whole plate scans. SHAMU consists of twenty-two computational
nodes and two high-end visualization nodes. Each computational node is powered by dual Quad-
core Intel Xeon E5345 2.33GHz processors (8M Cache). SHAMU consists of twenty-three Sun
Fire X4150 servers, four Penguin Relion 1800E servers, a DELL Precision R5400 and a DELL
PowerEdge R5400. SHAMU utilizes GlusterFS open-source file system over high speed Infini-
Band connection. A Sun StorageTek 2530 SAS array, fully populated with twelve 500GB hard
drives, acts as SHAMU’s physical storage in a RAID 5 configuration. SHAMU is networked to-
gether with two DELL PowerConnect Ethernet switches and one QLogic Silverstorm InfiniBand
switch.
2.2 Image enhancement measure (EME)
To measure the quality of images and to select optimal processing parameters, we consider the quantitative measure of image enhancement described in [131, 128], which relates to Weber's law of the human visual system. This measure can be used to select the best parameters for image enhancement by the Fourier transform, as well as by other unitary transforms. The measure is defined as follows. A discrete image {f_{n,m}} of size N_1 × N_2 is divided into k_1 k_2 blocks of size L_1 × L_2,
Figure 2.3: Digitized plate AC8431
where the integers L_i = [N_i/k_i], i = 1, 2. The quantitative measure of enhancement of the processed image, M_a : {f_{n,m}} → {\hat{f}_{n,m}}, is defined by

$$\mathrm{EME}_a(\hat{f}) = \frac{1}{k_1 k_2} \sum_{k=1}^{k_1} \sum_{l=1}^{k_2} 20 \log_{10} \left[ \frac{\max_{k,l}(\hat{f})}{\min_{k,l}(\hat{f})} \right],$$

where max_{k,l}(\hat{f}) and min_{k,l}(\hat{f}) are, respectively, the maximum and minimum of the image \hat{f}_{n,m} inside the (k, l)-th block, and a is a parameter, or a vector parameter, of the enhancement algorithm.
Figure 2.4: Marked plate scan AC8431
EME_a(\hat{f}) is called a measure of enhancement, or measure of improvement, of the image f. We define a parameter a_0 such that EME_Φ(f) = EME_{a_0}(f) to be the best (or optimal) Φ-transform-based image enhancement vector parameter. Experimental results show that the discrete Fourier transform can be considered optimal when compared with the cosine, Hartley, Hadamard, and other transforms. When Φ is the identity transformation I, the EME of \hat{f} = f is called the enhancement measure of the image f, i.e., EME(f) = EME_I(f). EME values of the enhanced galaxy images are presented in subsequent subsections.
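As a check on the definition, the EME can be computed directly by looping over the k_1 × k_2 blocks. The sketch below is illustrative only (block sizes are truncated to L_i = [N_i/k_i], remainder pixels are ignored, and a small ε guards against zero minima — all assumptions of this sketch, not necessarily the exact code used in this work):

```python
import numpy as np

def eme(image, k1=4, k2=4, eps=1e-9):
    """EME: mean over k1*k2 blocks of 20*log10(block max / block min)."""
    n1, n2 = image.shape
    L1, L2 = n1 // k1, n2 // k2              # block sizes [N_i / k_i]
    total = 0.0
    for k in range(k1):
        for l in range(k2):
            block = image[k * L1:(k + 1) * L1, l * L2:(l + 1) * L2]
            total += 20.0 * np.log10((block.max() + eps) / (block.min() + eps))
    return total / (k1 * k2)
```

A constant image gives EME = 0, since every block has max = min; larger within-block contrast raises the value, which is how the measure is used to compare enhancement parameters.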
Figure 2.5: Plate scan AC8409
2.3 Spatial domain image enhancement
Contrast enhancement is the process of improving image quality by manipulating the values of single pixels in an image. This processing is said to occur in the spatial domain, meaning that the image involved in processing is represented as a plane in 2-dimensional Euclidean space, which is why contrast enhancement methods are also called spatial domain methods. Contrast enhancement in the spatial domain is paralleled by transform based methods, which operate in the frequency domain, as
Figure 2.6: Marked plate scan AC8409
is shown in the following subsections. The image enhancement is described by a transformation T
T : f(x, y)→ g(x, y) = T[f(x, y)]
where f(x, y) is the original image, g(x, y) is the processed image, and T is the enhancement
operator. As a rule, T is considered to be a monotonic and invertible transformation.
Figure 2.7: Cropped galaxies from plate scans AC8431 and AC8409 read left to right and top to
bottom: NGC 4251, 4274, 4278, 4283, 4308, 4310, 4314, 4393, 4414, 4448, 4559, 3985, 4085,
4088, 4096, 4100, 4144, 4157, 4217, 4232, 4218, 4220, 4346, 4258.
2.3.1 Negative Image
This transformation is especially useful for processing binary images, e.g., text-document images,
and is described as
Tn : f(x, y) → g(x, y) = M − f(x, y)
for every pixel (x, y) in the image plane, where M is the maximum intensity in the image f(x, y). Figure 2.8 shows this transformation for an image with 0 ≤ f(x, y) ≤ L − 1, where L is the number of intensity levels in the image. In the discrete case, M is the maximum level, M = L − 1, and T_n : r → s = L − 1 − r, where r is the original image intensity and s is the intensity mapped by the transformation. An example of an image negative is given in Figure 2.9.
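As a minimal sketch (assuming M is taken as the maximum of the input array, per the definition above):

```python
import numpy as np

def negative(image):
    """Image negative T_n: g(x, y) = M - f(x, y), M = max intensity."""
    return image.max() - image
```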
Figure 2.8: Negative, log and power transformations: identity, negative, 46 log(1 + r), 16 √(1 + r), 40 (1 + r)^{1/3}, 0.004 r², and c r³.
2.3.2 Logarithmic Transformation
The logarithmic function is used in image enhancement because it is a monotonically increasing function. The transformation is described as

T_l : f(x, y) → g(x, y) = c_0 log(1 + f(x, y))
Figure 2.9: Top to bottom: Galaxy NGC4258 and its Negative Image.
where c_0 is a constant calculated as c_0 = M/log(1 + M) in order to preserve the gray-scale range of the enhanced image. For example, for a 256-gray-level image, c_0 ≈ 46.
Other versions of this transform are based on the use of the nth roots instead of the log function as
shown in Figure 2.8. For example,

$$T_2 : f(x, y) \to g(x, y) = c_0 \sqrt{1 + f(x, y)},$$

where the constant c_0 = 16 when processing a 256-level gray scale image. Examples of image enhancement by such transformations are given in Figure 2.10.
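A sketch of both transformations, assuming a 256-level gray scale (so c_0 = M/log(1 + M) ≈ 46 for the log transform and c_0 = 16 for the square root):

```python
import numpy as np

def log_transform(image, M=255):
    """T_l: g = c0 * log(1 + f), with c0 = M / log(1 + M)."""
    c0 = M / np.log(1.0 + M)
    return c0 * np.log1p(image)

def root_transform(image, n=2, c0=16.0):
    """nth-root variant, e.g. g = c0 * sqrt(1 + f) for n = 2."""
    return c0 * (1.0 + image) ** (1.0 / n)
```

By construction, log_transform maps 0 to 0 and M back to M, preserving the gray-scale range.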
(a) Original image (b) log transformation
(c) square root transformation (d) 3rd root transformation
Figure 2.10: Logarithmic and nth root transformations.
2.3.3 Power Law Transformation
These transformations are parameterized by γ and described as

$$T_\gamma : f(x, y) \to g(x, y) = c_\gamma (1 + f(x, y))^\gamma,$$

where γ > 0 is a constant selected by the user. The constant c_γ is used to normalize the gray scale levels to within [0, M].
For 0 < γ < 1, the transform maps a narrow range of dark samples of the image into a wide range of bright samples, and it smooths the differences between the intensities of bright samples of the original image. The power law transformation is shown with γ = 0.05, 0.85, 1.65, 2.45, 3.25, 4.05, and 4.85 in Figure 2.11.
Figure 2.11: γ-power transformation for γ = 0.05, 0.85, 1.65, 2.45, 3.25, 4.05, and 4.85.
Examples of image enhancement by power law transformations are given in Figure 2.12.
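A sketch of the power-law transform; since the exact normalization c_γ is not spelled out above, a min-max rescale of (1 + f)^γ to [0, M] is assumed here:

```python
import numpy as np

def power_law(image, gamma, M=255.0):
    """T_gamma: g = c_gamma * (1 + f)**gamma, rescaled to [0, M]
    (min-max normalization is an assumption of this sketch)."""
    g = (1.0 + image) ** gamma
    return M * (g - g.min()) / (g.max() - g.min())
```

For γ < 1 the mapping is concave, so dark values are spread apart and bright values compressed, matching the description above.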
2.3.4 Histogram Equalization
Consider an image of size N×N as a realization of a random variable ξ that takes values r in the range [r_min, r_max], and let h(r) = f_ξ(r) be the probability density function of ξ. It is desirable to transform the image in such a way that the new image has a uniform distribution. This equates to a change of
(a) Original image (b) γ = 0.005
(c) γ = 0.3 (d) γ = 0.9
Figure 2.12: Galaxy NGC 4217 power law transformations.
random variable

$$\xi \to \hat{\xi} = w(\xi) \qquad (w : r \to s),$$

such that w is a monotonically increasing function and

$$\hat{h}(s) = f_{\hat{\xi}}(s) = \frac{1}{w(r_{\max}) - w(r_{\min})}.$$

The following fact is well known:

$$\hat{h}(s) = h(r)\,\frac{dr}{ds},$$

or h(r)dr = \hat{h}(s)ds. Integrating this equality yields

$$\int_{w(r_{\min})}^{w(r)} \frac{ds}{w(r_{\max}) - w(r_{\min})} = \int_{r_{\min}}^{r} h(a)\,da,$$

which, for s = w(r), yields

$$\frac{w(r) - w(r_{\min})}{w(r_{\max}) - w(r_{\min})} = \int_{r_{\min}}^{r} h(a)\,da = F(r).$$

In the particular case when r_min = 0 and w(r_min) = 0, the following result is obtained:

$$w(r) = w(r_{\max})\,F(r).$$

In the case of a digital image, where the image has been sampled and quantized, the discrete version of this transform has the representation

$$r \to s = \begin{cases} \left[\, M \displaystyle\sum_{k=1}^{r} h(k) \right] & \text{if } r = 1, 2, \ldots, M - 1, \\[2mm] 0 & \text{if } r = 0, \end{cases}$$

where r is the integer value of the original image, s is the quantized value of the transformed image, and h(k) is the normalized histogram of the image.
So, independent of the image intensity probability density function, the intensity density function of the processed image is uniform:

$$f_{\hat{\xi}}(s) = \frac{1}{w(r_{\max}) - w(r_{\min})}.$$
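The discrete mapping can be sketched with a lookup table built from the cumulative normalized histogram (the common variant that includes k = 0 in the cumulative sum is assumed here):

```python
import numpy as np

def equalize(image, M=255):
    """Histogram equalization: r -> s = round(M * sum_{k<=r} h(k)),
    with h(k) the normalized histogram of the image."""
    img = np.asarray(image).astype(np.int64)
    hist = np.bincount(img.ravel(), minlength=M + 1)
    h = hist / img.size                      # normalized histogram h(k)
    lut = np.round(M * np.cumsum(h)).astype(np.int64)
    return lut[img]
```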
Histogram equalization applied to galaxy NGC 6070 is shown in Figure 2.13, with the corresponding original and enhanced image histograms shown in Figure 2.14. The histogram equalization destroys the details of the galaxy image, indicating that spatial methods of enhancement are not suitable for all images. This is part of the motivation for using α-rooting, the Heap transform, and other transform-based methods, which are described in the next section.
(a) Original image (b) Histogram equalization
Figure 2.13: Histogram processing to enhance Galaxy NGC 6070.
2.3.5 Median Filter
A noteworthy spatial domain filter is the median filter, which is based on order statistics. Given a set of numbers S = {1, 2, 1, 4, 2, 5, 6, 7}, the values in S are rearranged in descending order, i.e., 7, 6, 5, 4, 2, 2, 1, 1, and labeled as order statistics, i.e., 7 is the 1st order statistic and the final 1 is the 8th order statistic. The 4 and the adjacent 2 can both be considered
Figure 2.14: Top to bottom: Histograms of the original and enhanced images.
as the median here, and the selection is made at the discretion of the user. In general, for a set of n values, the last value in this ordering is regarded as the nth order statistic.
The median filter arises from the following problem. Given a set of points S = {x_1, x_2, . . . , x_n} containing the median point m, i.e., m ∈ S, which point in the set is closest to every other point in the set? Figure 2.15 illustrates this in two different ways.
The median m is found by minimization of the following function:

$$|m - x_1| + |m - x_2| + |m - x_3| + \cdots + |m - x_n| = \sum_{k=1}^{n} |x_k - m|.$$
In signal filtration, the median filter preserves the range and edges of the original signal, in contrast to the mean filter, which destroys the signal edges. For signals with many consecutive noisy points, the length of the median filter must be extended to retain this behavior. The median filter has the root property, whereby after a certain number of filtration iterations the output becomes identical to the previous output. The median filter is effective in removing salt and pepper noise.
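A one-dimensional sliding-window sketch (edge padding, so that the output length matches the input, is an implementation assumption):

```python
import numpy as np

def median_filter_1d(signal, length=3):
    """Sliding-window median with odd window `length`."""
    pad = length // 2
    x = np.pad(np.asarray(signal, dtype=float), pad, mode='edge')
    return np.array([np.median(x[i:i + length]) for i in range(len(signal))])
```

A single salt-and-pepper spike is removed, while a step edge passes through unchanged, illustrating the edge-preserving behavior noted above.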
Figure 2.15: Illustration of the median of a set of points in different dimensions: (a) median on the line; (b) median in space.
2.4 Transform-based image enhancement
In parallel to directly processing image pixels in the spatial domain by contrast enhancement meth-
ods, transform based methods of enhancement manipulate the spectral coefficients of an image in
the domain of the transform. The primary benefits of these methods are low computational com-
plexity and the usefulness of unitary transforms for filtering, coding, recognition, and restoration
analysis in signal and image processing. First the operators that transform the domain of the image
are introduced followed by methods of enhancement in the transform domain.
2.4.1 Transforms
Each of the transforms presented here in one dimension extends readily to two dimensions, which is where the transforms become useful for image processing.
Fourier Transform
The one dimensional discrete Fourier transform (1-D DFT) maps a real-valued signal in the time domain to complex-valued coefficients in the frequency domain. The direct and inverse transform pair are defined, for a discrete function x_n, as
$$F_p = \sum_{n=0}^{N-1} x_n \cos\!\left(\frac{2\pi np}{N}\right) - j\, x_n \sin\!\left(\frac{2\pi np}{N}\right)$$

$$x_n = \frac{1}{N} \sum_{p=0}^{N-1} F_p \cos\!\left(\frac{2\pi np}{N}\right) + j\, F_p \sin\!\left(\frac{2\pi np}{N}\right)$$
where n = 0, 1, . . . , N − 1 represents discrete time points and p = 0, 1, . . . , N − 1 represents
discrete frequency points. The basis functions for this transform are complex exponentials. The
"real" and "imaginary" parts of this sum are considered as the sum of the cosine terms and the sum
of the sine terms, respectively, and are computed by the fast Fourier transform.
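The cosine and sine sums can be evaluated directly and checked against the fast Fourier transform (a sketch for verification, not an efficient implementation):

```python
import numpy as np

def dft_real_imag(x):
    """Direct 1-D DFT: returns the cosine-sum (real) and
    minus-sine-sum (imaginary) parts separately."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    arg = 2.0 * np.pi * n * n.reshape(-1, 1) / N   # rows: frequency p
    re = (x * np.cos(arg)).sum(axis=1)
    im = -(x * np.sin(arg)).sum(axis=1)
    return re, im
```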
Hartley Transform
Similar to the Fourier transform is the Hartley transform, which, however, generates only real coefficients. This transform is defined in the one dimensional case as

$$H_p = \sum_{n=0}^{N-1} x_n \left(\cos\!\left(\frac{2\pi np}{N}\right) + \sin\!\left(\frac{2\pi np}{N}\right)\right) = \sum_{n=0}^{N-1} x_n\, \mathrm{cas}\!\left(\frac{2\pi np}{N}\right),$$

where the basis function cas(t) = cos(t) + sin(t). The inverse transform is calculated by

$$x_n = \frac{1}{N} \sum_{p=0}^{N-1} H_p\, \mathrm{cas}\!\left(\frac{2\pi np}{N}\right).$$
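A sketch using the cas kernel; since the unnormalized kernel is its own inverse up to the factor 1/N, applying it twice and dividing by N recovers the signal:

```python
import numpy as np

def cas(t):
    """Hartley basis function cas(t) = cos(t) + sin(t)."""
    return np.cos(t) + np.sin(t)

def hartley(x):
    """Forward DHT: H_p = sum_n x_n cas(2*pi*n*p/N)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    return cas(2.0 * np.pi * np.outer(n, n) / N) @ x
```

The coefficients are real, and they relate to the DFT by H_p = Re(F_p) − Im(F_p).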
Cosine Transform
The cosine transform, or cosine transform of type 2, is determined by the following basis functions:

$$\varphi_p(n) = \begin{cases} \dfrac{1}{\sqrt{2N}}, & \text{if } p = 0, \\[2mm] \dfrac{1}{\sqrt{N}} \cos\!\left(\dfrac{\pi(n + 1/2)p}{N}\right), & \text{if } p \neq 0, \end{cases}$$

so that, for the case p = 0,

$$X^c_0 = \frac{1}{\sqrt{2N}} \sum_{n=0}^{N-1} x_n,$$

and, for the case p ≠ 0,

$$X^c_p = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x_n \cos\!\left(\frac{\pi(n + 1/2)p}{N}\right) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x_n \left(\cos\!\left(\frac{\pi p}{2N}\right)\cos\!\left(\frac{\pi np}{N}\right) - \sin\!\left(\frac{\pi p}{2N}\right)\sin\!\left(\frac{\pi np}{N}\right)\right),$$

where p = 1 : (N − 1).
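A direct sketch of the two cases, with the scaling written exactly as above:

```python
import numpy as np

def cosine_transform(x):
    """DCT of type 2: X_0 = (1/sqrt(2N)) sum x_n and, for p != 0,
    X_p = (1/sqrt(N)) sum x_n cos(pi*(n + 1/2)*p/N)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    X = np.empty(N)
    X[0] = x.sum() / np.sqrt(2 * N)
    for p in range(1, N):
        X[p] = (x * np.cos(np.pi * (n + 0.5) * p / N)).sum() / np.sqrt(N)
    return X
```

For a constant signal all p ≠ 0 coefficients vanish, since the shifted cosines sum to zero over a full set of sample points.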
Paired Transform
The one dimensional unitary discrete paired transform (DPT), also known as the Grigoryan trans-
form is described in the following way. The transform describes a frequency-time representation
of the signal by a set of short signals which are called the splitting-signals. Each such signal is
generated by a frequency and carries the spectral information of the original signal in a certain set
of frequencies. These sets are disjoint. Therefore, the paired transform transfers the signal into a
space with frequency and time, or space which represents a source "bridge" between the time and
frequency. Consider the most interesting case, when the length of signals is N = 2r , r > 1. Let
p, t ∈ XN = {0, 1, . . . , N − 1}, and let χp,t(n) be the binary function
χp,t(n) =
⎧⎪⎨⎪⎩ 1, if np = tmodN
0, otherwisen = 0 : (N − 1).
Given a sample p ∈ XN and integer t ∈ [0, N/2], the function
χ′p,t(n) = χp,t(n)− χp,t+n/2(n)
is called the 2-paired, or shortly the paired function.
The complete set of these functions is defined for frequency points p = 2k, k = 0, . . . , r − 1
and p = 0, and time points 2kt. The binary paired functions can also be written as the following
transformation of the consine function:
χ′2k,2kt(n) = M(cos(2π(n− t)/2r−k)), (χ′
0,0(n) ≡ 1),
where t = 0 : (2r−k−1 − 1). M(x) is the real function which is not zero only on the bounds
of the interval [−1, 1] and takes values M(−1) = −1 and M(1) = 1. The paired functions are
determined by the extremal values of the consine functions, when they run through the interval
with different frequencies.
39
The totality of the N paired functions
{χ′2k,2kt; n = 0 : (r − 1), t = 0 : (2r−n−1 − 1, 1}
is the complete and orthogonal set of functions [132,134].
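The binary paired functions can be generated directly from the definition and their orthogonality checked numerically (a sketch; the row ordering is an assumption of this illustration):

```python
import numpy as np

def paired_matrix(N):
    """Rows chi'_{p,t}(n) = chi_{p,t}(n) - chi_{p,t+N/2}(n) for
    p = 2**k, t = 2**k * m, plus the constant row chi'_{0,0} = 1."""
    r = int(np.log2(N))
    rows = []
    for k in range(r):
        p = 2 ** k
        for m in range(2 ** (r - k - 1)):
            t = p * m
            row = np.zeros(N)
            for n in range(N):
                if (n * p) % N == t:
                    row[n] += 1.0            # chi_{p,t}(n)
                if (n * p) % N == t + N // 2:
                    row[n] -= 1.0            # chi_{p,t+N/2}(n)
            rows.append(row)
    rows.append(np.ones(N))                  # chi'_{0,0} = 1
    return np.array(rows)
```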
Haar Transform
The Haar transform was the first orthogonal transform found after the Fourier transform and is now widely used in wavelet theory and in image processing applications. In the case N = 2^r, r > 1, the transform is defined, without normalization, by the following matrices:

$$[HA_2] = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad [HA_4] = \begin{bmatrix} [HA_2] & [HA_2] \\ \sqrt{2}\, I_2 & -\sqrt{2}\, I_2 \end{bmatrix},$$

where I_2 is the 2 × 2 unit matrix, and, for k ≥ 2,

$$[HA_{2^{k+1}}] = \begin{bmatrix} [HA_{2^k}] & [HA_{2^k}] \\ \sqrt{2^k}\, I_{2^k} & -\sqrt{2^k}\, I_{2^k} \end{bmatrix}.$$
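The recursion translates directly into code (unnormalized, as above):

```python
import numpy as np

def haar_matrix(r):
    """Haar matrix of order N = 2**r via the recursion
    HA_{2n} = [[HA_n, HA_n], [sqrt(n) I_n, -sqrt(n) I_n]]."""
    H = np.array([[1.0, 1.0], [1.0, -1.0]])
    while H.shape[0] < 2 ** r:
        n = H.shape[0]
        H = np.vstack([np.hstack([H, H]),
                       np.hstack([np.sqrt(n) * np.eye(n),
                                  -np.sqrt(n) * np.eye(n)])])
    return H
```

The rows are mutually orthogonal, which is what makes the (normalized) transform unitary.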
Heap Transform
The discrete Heap transform is a new concept which was introduced by Artyom Grigoryan in 2006 [135]. The basis functions of the transformation represent certain waves which propagate in the "field" associated with the signal generator. The composition of the N-point discrete heap transform, T, is based on a special selection of a set of parameters (angles) φ_1, . . . , φ_m taken from the signal generator and given rules, where m ≥ N − 1. The transformation T is considered separable, which means that there exist transformations T_{φ_1}, T_{φ_2}, . . . , T_{φ_m} such that

$$T = T_{\varphi_1, \ldots, \varphi_m} = T_{\varphi_{i(m)}} \cdots T_{\varphi_{i(2)}} T_{\varphi_{i(1)}},$$

where i(k) is a permutation of the numbers k = 1, 2, . . . , m.
Consider the case when each transformation Tϕkchanges only two components of the vec-
tor z = (z1, ..., zN−1)′. These two components may be chosen arbitrarily and such a selection is
defined by a path of the transform. Thus, Tϕkis represented as
Tϕk: z→ (z1, ..., zk1−1, fk1(z, ϕk), zk1+1, ..., zk2−1, fk2(z, ϕk), zk2+1, ..., zm). (2.1)
Here the pair of numbers (k1, k2) is uniquely defined by k, and 1 ≤ k1 < k2 ≤ m. For simplicity
of calculations, we assume that all first functions fk1(z, ϕ) in (2.1) are equal to a function f(z, ϕ),
as well as all functions fk2(z, ϕ) equal to a function g(z, ϕ). The n-dimensional transformation
T = Tϕ1,...,ϕm is composed by the transformations
Tk1,k2(ϕk) : (zk1, zk2) → (f(zk1, zk2 , ϕk), g(zk1, zk2, ϕk)).
The selection of the parameters ϕk, k = 1 : m, is based on specified signal-generators x, the number of which is defined through the given decision equations, to achieve uniqueness of the parameters and the desired properties of the transformation T. Consider the case of two decision equations with one signal-generator.
Let f(x, y, ϕ) and g(x, y, ϕ) be functions of three variables; ϕ is referred to as the rotation parameter, such as an angle, and x and y as the coordinates of a point (x, y) on the plane. It is assumed that, for a specified set of numbers a, the equation g(x, y, ϕ) = a has a unique solution with respect to ϕ for each point (x, y) on the plane or its chosen subset.

The system of equations

f(x, y, ϕ) = y0
g(x, y, ϕ) = a

is called the system of decision equations [135]. First, the value of ϕ is calculated from the second equation, which we call the angular equation. Then, the value of y0 is calculated from the given input (x, y) as y0 = f(x, y, ϕ). It is also assumed that the two-point transformation

Tϕ: (z0, z1) → (z′0, z′1) = (f(z0, z1, ϕ), g(z0, z1, ϕ)),

which is derived from the given decision equations by Tϕ: (x, y) → (f(x, y, ϕ), a), is unitary. We call Tϕ the basic transformation.
Example 1: Consider the following functions that describe the elementary rotation:

f(x, y, ϕ) = x cos ϕ − y sin ϕ,
g(x, y, ϕ) = x sin ϕ + y cos ϕ.

Given a real number a, the basic transformation is defined as the rotation of the point (x, y) onto the horizontal line Y = a,

Tϕ: (x, y) → (x cos ϕ − y sin ϕ, a).

The rotation angle ϕ is calculated by

ϕ = arccos( a / √(x² + y²) ) + arctan( y / x ).
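A minimal Python sketch of this example for the case a = 0, i.e., Givens rotations that heap the whole energy of the generator into the first component (function names are illustrative, not from the thesis):

```python
import math

def basic_transform(x0, xk):
    """Basic transformation T_phi that rotates the point (x0, xk) onto the
    line Y = 0: solves the angular equation g = x0*sin(phi) + xk*cos(phi) = 0."""
    phi = -math.atan2(xk, x0)                      # a = 0 case
    f = x0 * math.cos(phi) - xk * math.sin(phi)    # new first component
    return f, phi

def heap_angles(x):
    """Generate the angles phi_1..phi_{N-1} from the signal-generator x;
    the energy is heaped into the first component: T(x) = (y0, 0, ..., 0)."""
    y0, phis = x[0], []
    for xk in x[1:]:
        y0, phi = basic_transform(y0, xk)
        phis.append(phi)
    return y0, phis

def heap_transform(z, phis):
    """Apply the x-induced transform along the same path to an input z."""
    z = list(z)
    for k, phi in enumerate(phis, start=1):
        c, s = math.cos(phi), math.sin(phi)
        z[0], z[k] = z[0] * c - z[k] * s, z[0] * s + z[k] * c
    return z

x = [1.0, 2.0, 2.0, 4.0]
y0, phis = heap_angles(x)
print(round(y0, 6))  # 5.0, the Euclidean norm of x: the "heaped" energy
```

Applying `heap_transform` to the generator x itself reproduces (y0, 0, 0, 0), as described below for the case a = 0.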
The first pair to be processed is (x0, x1),

(x0, x1) → (x0^(1), a),

the next is (x0^(1), x2),

(x0^(1), x2) → (x0^(2), a),

with the new value x0 = x0^(2), and so on. The first component of the signal is renewed at each step and participates in the calculation of all (N − 1) basic transformations Tk = T_{ϕk}, k = 1 : (N − 1). Therefore, at stage k, the first component of the transform is y0 = x0^(k).

The complete transform of the signal-generator x is

T(x) = (y0, a1, a2, . . . , a_{N−1}),  (y0 = x0^(N−1)).
The signal-flow graph of processing the five-point generator x is shown in Figure 2.16.
Figure 2.16: Signal-flow graph of the determination of the five-point transformation by a vector x = (x0, x1, x2, x3, x4)′.
This transform is applied to the input signal z_n in the same order, or path P, as to the generator x. In the first stage, the first two components are processed,

T_{ϕ1}: (z0, z1) → (z0^(1), z1^(1)),

next,

T_{ϕ2}: (z0^(1), z2) → (z0^(2), z2^(1)),
Figure 2.17: Network of the x-induced DsiHT of the signal z.
and so on. The result of the transform is

T[z] = (z0^(N−1), z1^(1), z2^(1), . . . , z_{N−1}^(1)),  (a = 0).
Now consider the case when all parameters a_k = 0, i.e., when the whole energy of the vector x is collected in one heap and then transferred to the first component. In other words, we consider the Givens rotations of vectors, or points (y0, x_k), onto the horizontal line Y = 0. Figure 2.17 shows the network of the transform of the signal z = (z0, z1, z2, ..., z_{N−1})′. The parameters (angles) of the transformation are generated by the signal-generator x. In the first level, at the kth stage of the flow-graph, the angle ϕ_k is calculated from the inputs (x0^(k−1), x_k), where k ∈ {1, ..., N − 1} and x0^(0) = x0. This angle is used in the basic transform T_k = T_{ϕk} to define the next component x0^(k), as well as to perform the transform of the input signal z in the second level. The full graph represents a coordinated network of transformation of the vector z under the action of x.
2.4.2 Enhancement methods
The common algorithm for image enhancement by a 2-D invertible transform consists of the steps of Algorithm 2.1. The transform-based method can be represented as

x → X = T(x) → O · X → T^{−1}[O(X)] = x̂.
Algorithm 2.1 Transform-based image enhancement

1. Perform the 2-D unitary transform.
2. Multiply the transform coefficients X(p, s) by some factor O(p, s).
3. Perform the 2-D inverse unitary transform.
O is an operator which could be applied to the coefficients X(p, s) of the transform, or to its real and imaginary parts a_{p,s} and b_{p,s} if the transform is complex. For instance, they could be X(p, s), a^α_{p,s}, b^α_{p,s}, or log^α a_{p,s}, log^α b_{p,s}. The cases of greatest interest are when O(X)_{p,s} is an operator of magnitude and when O(X)_{p,s} is performed separately on the coefficients.
Let X(p, s) be the transform coefficients and let the enhancement operator O be of the form X(p, s) · C(p, s), where the latter is a real function of the magnitude of the coefficients, i.e., C(p, s) = f(|X|)(p, s). C(p, s) must be real, since only modification of the magnitude, and not of the phase information, is desired. The following possibilities are a subset of methods for modifying the magnitude coefficients within this framework:

1. C_1(p, s) = C(p, s)^γ |X(p, s)|^{α−1}, 0 ≤ α < 1 (the so-called modified α-rooting);

2. C_2(p, s) = log^β[ |X(p, s)|^λ + 1 ], 0 ≤ β, 0 < λ;

3. C_3(p, s) = C_1(p, s) · C_2(p, s).

Here α, λ, and β are the parameters of the enhancement, selected by the user to achieve the desired enhancement. Denoting by θ(p, s) ≥ 0 the phase of the transform coefficient X(p, s), the transform coefficient can be expressed as

X(p, s) = |X(p, s)| e^{jθ(p,s)},

where |X(p, s)| is the magnitude of the coefficient. The operator O applied to the moduli of the transform coefficients, instead of directly to the transform coefficients X(p, s), will be investigated; it is performed as

O(X)(p, s) = O(|X|)(p, s) e^{jθ(p,s)}.
It is assumed that the enhancement operator O(|X|) takes one of the forms C_i(p, s)|X(p, s)|, i = 1, 2, 3, at every frequency point (p, s). Figure 2.18 shows Galaxy NGC 4242 in the spatial domain (pixel intensity values) and in the frequency domain (spectral coefficients).
(a) intensity image (b) spectral coefficients
Figure 2.18: Intensity values and spectral coefficients of Galaxy NGC 4242.
Figure 2.19 shows Butterworth lowpass filtering of Galaxy UGC 7617 with n = 2 and D0 = 120. The transfer function of the filter of order n, with cutoff frequency at distance D0 from the origin, is defined as

X(p, s) = 1 / (1 + [D(p, s)/D0]^{2n}).
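As a sketch, the filtering can be reproduced in Python/NumPy (the function names and the centered-spectrum convention are my assumptions, not from the thesis):

```python
import numpy as np

def butterworth_lowpass(shape, D0, n):
    """Butterworth lowpass transfer function 1 / (1 + [D/D0]^(2n)),
    where D(p, s) is the distance from the center of the shifted spectrum."""
    P, S = shape
    p = np.arange(P) - P // 2
    s = np.arange(S) - S // 2
    D = np.sqrt(p[:, None]**2 + s[None, :]**2)
    return 1.0 / (1.0 + (D / D0)**(2 * n))

def filter_image(f, D0=120, n=2):
    """Filter in the Fourier domain: transform, multiply, inverse transform."""
    F = np.fft.fftshift(np.fft.fft2(f))
    H = butterworth_lowpass(f.shape, D0, n)
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))
```

At the origin D = 0, so the transfer function equals 1 and low frequencies pass unattenuated.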
α-rooting
Figure 2.20 shows the enhancement of Galaxy NGC 4242 by method C1(p, s) with α = 0.02.
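A minimal Python sketch of α-rooting via the 2-D DFT, taking the basic case where the factor C(p, s)^γ in C_1 is unity, so each coefficient is multiplied by |X(p, s)|^{α−1} (function name and library are my assumptions):

```python
import numpy as np

def alpha_rooting(f, alpha=0.02):
    """Transform-based enhancement (Algorithm 2.1): keep the phase of each
    Fourier coefficient and raise its magnitude to the power alpha, i.e.
    multiply X(p, s) by C(p, s) = |X(p, s)|^(alpha - 1)."""
    X = np.fft.fft2(f)
    mag = np.abs(X)
    C = np.zeros_like(mag)
    nz = mag > 0
    C[nz] = mag[nz]**(alpha - 1.0)   # avoid 0 raised to a negative power
    return np.real(np.fft.ifft2(X * C))
```

Since C(p, s) is real and symmetric in |X|, the phase is untouched and the output magnitudes are exactly |X(p, s)|^α.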
Heap transform
Figure 2.21 shows the results of enhancing galaxy images PIA 14402 and NGC 5194 by the Heap
transform.
(a) original image (b) low pass filtering
Figure 2.19: Butterworth lowpass filtering performed in the Fourier (frequency) domain.
(a) original image (b) enhancement by α = 0.02
Figure 2.20: α-rooting enhancement of Galaxy NGC 4242.
2.5 Image Preprocessing
The steps taken to prepare the galaxy images for feature extraction are detailed in this section. The position, size, and orientation of the galaxy vary from image to image. Therefore, the preprocessing steps produce a training set that is invariant to galaxy position, scale, and orientation. Individual galaxies were cropped from the digitized photographic plates and processed manually by adjusting parameters at several stages in the pipeline. Automatic selection of these parameters is part of future work. Figure 2.22 shows the computational scheme for the classification pipeline.
Figure 2.21: Top: Galaxy PIA 14402, Bottom: NGC 5194, both processed by Heap transform.
2.5.1 Segmentation
Other than the object of interest, galaxy images contain stars, gas, dust, and artifacts induced during the imaging and scanning process. For a galaxy to be recognized, such contents not belonging to the galaxy need to be removed. In general, this process involves denoising and inpainting. Here, the background is subtracted via a single threshold or Otsu's method. Otsu's method is calculated in Matlab by the command graythresh. Otsu's method automatically selects a good threshold for images in which there are few stars and the galaxy intensity varies greatly from the background. As the quantity and size of stars in the image increase, or when the background is close in intensity to the galaxy, Otsu's method performs poorly. After background subtraction by thresholding, stars and other artifacts are removed by the morphological opening operation with different values of pixel connectivity, using the Matlab function bwareaopen.
A grayscale image relates to a function f(x, y) that takes values in a finite interval [0, M]. In the discrete case, M is considered to be a positive integer. Consider an image with only one
Figure 2.22: Computational scheme for galaxy classification. The pipeline is: Galaxy Images → Segmentation (Thresholding, Morphological Opening) → Feature Invariance (Rotation, Centering, Resizing) → Canny Edge Detection → Feature Extraction (Elongation, Form Factor, Convexity, Bounding-rectangle-to-fill-factor, Bounding-rectangle-to-perimeter, Asymmetry Index) → Support Vector Machine → Galaxy Classes.
object,

f(x, y) = 1 if (x, y) ∈ O ⊂ X, and f(x, y) = 0 otherwise,

where O is the set of pixels in the object and X is the whole domain of the image. The function f(x, y) represents a binary image. Any number can be used instead of 1, e.g., 255. Thresholding is defined as the following procedure:

g(x, y) = g_T(x, y) = 1 if f(x, y) ≥ T, and 0 otherwise,

where T is a positive number from the interval [0, M], called the threshold.
Otsu's method begins by representing a grayscale image by L gray levels. Let n_i represent the number of pixels at level i, and the total number of pixels N = n_1 + n_2 + . . . + n_L. The image histogram is then described by a probability distribution

p_i = n_i / N,  p_i ≥ 0,  Σ_{i=1}^{L} p_i = 1.
The intensity values are then separated into two classes C_0 and C_1 by a threshold k, where C_0 represents the intensities [1, . . . , k] and C_1 the intensities [k + 1, . . . , L]. The probabilities of occurrence and the mean levels of each class are respectively given by

w_0 = Pr(C_0) = Σ_{i=1}^{k} p_i = w(k),

w_1 = Pr(C_1) = Σ_{i=k+1}^{L} p_i = 1 − w(k),

and

μ_0 = Σ_{i=1}^{k} i Pr(i|C_0) = Σ_{i=1}^{k} i p_i / w_0 = μ(k)/w(k),

μ_1 = Σ_{i=k+1}^{L} i Pr(i|C_1) = Σ_{i=k+1}^{L} i p_i / w_1 = (μ_T − μ(k)) / (1 − w(k)),
where w(k) and μ(k) are the zeroth- and first-order cumulative moments of the histogram up to the kth level, respectively, and

μ_T = μ(L) = Σ_{i=1}^{L} i p_i

is the total mean level of the original image. The following relationships are easily verified for any k:

w_0 μ_0 + w_1 μ_1 = μ_T,  w_0 + w_1 = 1.  (2.2)
The class variances are given by

σ_0² = Σ_{i=1}^{k} (i − μ_0)² Pr(i|C_0) = Σ_{i=1}^{k} (i − μ_0)² p_i / w_0,

σ_1² = Σ_{i=k+1}^{L} (i − μ_1)² Pr(i|C_1) = Σ_{i=k+1}^{L} (i − μ_1)² p_i / w_1.
The following criteria for measuring the effectiveness of a threshold k are introduced from discriminant analysis:

λ = σ_B² / σ_W²,  κ = σ_T² / σ_W²,  η = σ_B² / σ_T²,

where

σ_W² = w_0 σ_0² + w_1 σ_1²,

σ_B² = w_0(μ_0 − μ_T)² + w_1(μ_1 − μ_T)²,

and, from equation (2.2),

σ_T² = Σ_{i=1}^{L} (i − μ_T)² p_i

are the within-class variance, the between-class variance, and the total variance of levels, respectively.
Through relationships between the criteria, the problem reduces to finding the k that maximizes the criterion η, or equivalently σ_B², with

η(k) = σ_B²(k) / σ_T²

or

σ_B²(k) = [μ_T w(k) − μ(k)]² / ( w(k)[1 − w(k)] ),

and, as shown in [136], the optimal threshold k*, restricted to the range S* = {k; 0 < w(k) < 1}, satisfies

σ_B²(k*) = max_{1≤k<L} σ_B²(k).
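The derivation above maps directly onto cumulative sums of the histogram. A Python sketch of the method (an informal analogue of Matlab's graythresh; the function name is mine):

```python
import numpy as np

def otsu_threshold(image, L=256):
    """Otsu's method: choose k maximizing the between-class variance
    sigma_B^2(k) = [mu_T w(k) - mu(k)]^2 / (w(k)[1 - w(k)])."""
    hist = np.bincount(image.ravel(), minlength=L).astype(float)
    p = hist / hist.sum()              # probability distribution p_i
    i = np.arange(L)
    w = np.cumsum(p)                   # zeroth-order cumulative moment w(k)
    mu = np.cumsum(i * p)              # first-order cumulative moment mu(k)
    mu_T = mu[-1]                      # total mean level
    valid = (w > 0) & (w < 1)          # the range S* = {k; 0 < w(k) < 1}
    sigma_B2 = np.zeros(L)
    sigma_B2[valid] = (mu_T * w[valid] - mu[valid])**2 / (w[valid] * (1 - w[valid]))
    return int(np.argmax(sigma_B2))
```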
Figure 2.23 shows original images with subtracted backgrounds by different manual thresholds
and Otsu’s method.
(a) Original image (b) T = 60
(c) T = 74 (d) Otsu’s T = 85
Figure 2.23: Background subtraction of Galaxy NGC 4274 by manual and Otsu’s thresholding.
The average difference between single thresholds and thresholds by Otsu’s method for the
enhanced data set was 6.67 with a standard deviation of 11.21.
Mathematical morphology provides image processing with powerful nonlinear filters which operate according to Minkowski's addition and subtraction. Given subsets X and B of R^n, Minkowski's addition, X ⊕ B, of the sets X and B is the set

X ⊕ B = ∪_{b∈B} X_b,  X_b = {x + b; x ∈ X}.

For the set B̆ = {−b; b ∈ B} symmetric to B with respect to the origin, the set X ⊕ B̆ is called a dilation of the set X by B. The set B is said to be a structuring element. So, in the symmetric case B̆ = B, Minkowski's addition of the sets X and B and the dilation of X by B are the same concept.

The dual operation to Minkowski's addition of the sets X and B is the subtraction, X ⊖ B, which is defined as

X ⊖ B = (X^c ⊕ B)^c = ∩_{b∈B} X_b.

The set X ⊖ B̆, dual to the dilation X ⊕ B̆, is called an erosion of the set X by B. By means of dilation and erosion of sets, the corresponding operations of opening, X ○ B, and closing, X • B, can be defined as

X ○ B = (X ⊖ B) ⊕ B = ∪ {x + B; x + B ⊂ X},

X • B = (X^c ○ B)^c = (X ⊕ B) ⊖ B.

Herewith, the operation of opening of X by B is dual to the operation of closing of X by B, i.e., X ○ B = (X^c • B)^c. Figure 2.24 shows star and artifact removal from Galaxy NGC 5813 with pixel connectivity P = 64.
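A small Python sketch of binary opening built from the dilation and erosion definitions above (the thesis uses Matlab's bwareaopen, an area-based opening; the plain structural opening below, with names of my choosing, illustrates the same idea of removing objects smaller than the structuring element):

```python
import numpy as np

def dilate(X, B):
    """Binary dilation: union of the translates of X by the offsets in B."""
    out = np.zeros_like(X)
    for dy, dx in B:
        out |= np.roll(np.roll(X, dy, axis=0), dx, axis=1)
    return out

def erode(X, B):
    """Binary erosion: intersection of translates (dual of dilation)."""
    out = np.ones_like(X)
    for dy, dx in B:
        out &= np.roll(np.roll(X, -dy, axis=0), -dx, axis=1)
    return out

def opening(X, B):
    """Opening X o B = (X erode B) dilate B."""
    return dilate(erode(X, B), B)

# 3x3 square structuring element given by its offsets
B = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

X = np.zeros((12, 12), dtype=bool)
X[2:7, 2:7] = True      # a 5x5 "galaxy": survives the opening
X[9, 9] = True          # an isolated "star" pixel: removed by the opening
print(opening(X, B).sum())  # 25
```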
2.5.2 Rotation, Shifting and Resizing
To achieve invariance to position, orientation, and scale, the galaxies were respectively shifted by their geometrical center, rotated by the angle between their first principal component and the image x-axis, and resized to a uniform size of 128 × 128 pixels.
(a) original image (b) thresholded image
(c) opened image
Figure 2.24: Morphological opening for star removal from Galaxy NGC 5813.
The geometrical center, or centroid, of an object in an image is the center of mass of the object.
The center is the point where one can concentrate the whole mass of the object without changing
the first moment relative to any axis. The first moment with respect to the x-axis is defined by

μ_x ∫∫_X f(x, y) dx dy = ∫∫_X x f(x, y) dx dy.

The first moment with respect to the y-axis is defined by

μ_y ∫∫_X f(x, y) dx dy = ∫∫_X y f(x, y) dx dy.
The coordinate of the object center is then (μx, μy).
In the discrete case, the first moment with respect to the x-axis is defined by

μ_x Σ_n Σ_m f_{n,m} = Σ_n Σ_m n f_{n,m},

and with respect to the y-axis,

μ_y Σ_n Σ_m f_{n,m} = Σ_n Σ_m m f_{n,m},
where the summation is performed over all pixels (n, m) of the object O.
The center of the object is defined as

(μ_x, μ_y) = ( Σ_n Σ_m n f_{n,m} / Σ_n Σ_m f_{n,m},  Σ_n Σ_m m f_{n,m} / Σ_n Σ_m f_{n,m} ).

In the discrete binary case, the center is defined as

(μ_x, μ_y) = ( Σ_{(n,m)∈O} n / card(O),  Σ_{(n,m)∈O} m / card(O) ),

where card(O) is the cardinality of the set O that defines the binary image.
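In the binary case the centroid is just the mean of the object's pixel coordinates, as a short Python sketch shows (the function name is mine):

```python
import numpy as np

def centroid(f):
    """Geometrical center (mu_x, mu_y) of a binary image f:
    the sums of n and of m over the object pixels, divided by card(O)."""
    n, m = np.nonzero(f)       # coordinates (n, m) of the pixels in O
    return n.mean(), m.mean()

f = np.zeros((8, 8), dtype=int)
f[2:5, 3:7] = 1                # a 3x4 rectangular object
mu_x, mu_y = centroid(f)
print(mu_x, mu_y)              # 3.0 4.5
```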
To find the orientation of an object in an image, if such exists and is unique, consider the line along which the second moment is minimal. In other words, consider the integral

E = μ_2(l) = ∫∫_X r² f(x, y) dx dy,  (2.3)

where r is the distance of the point (x, y) from the line l, i.e., the length of the perpendicular dropped from the point (x, y) to the line l. The line l is described by the equation

l : x sin θ − y cos θ + p = 0,

where p is the length of the perpendicular drawn from the origin (0, 0) to the line l. Therefore, (2.3) can be rewritten as

E = E(θ) = ∫∫_X (x sin θ − y cos θ + p)² f(x, y) dx dy.  (2.4)
The following notation is introduced for the image coordinates shifted by the geometrical center of the object,

x′ = x − μ_x,  y′ = y − μ_y,

and the second moments of the shifted object are denoted by

a = ∫∫_X (x′)² f(x, y) dx′ dy′,  c = ∫∫_X (y′)² f(x, y) dx′ dy′,  b = 2 ∫∫_X x′ y′ f(x, y) dx′ dy′.

E(θ) can then be rewritten as

E(θ) = a sin²θ − b sin θ cos θ + c cos²θ,

or

E(θ) = (1/2)(a + c) − (1/2)(a − c) cos 2θ − (1/2) b sin 2θ.

Differentiating E with respect to θ and setting the derivative to zero gives

E′(θ) = 0 → tan 2θ = b / (a − c)  (a ≠ c).

Therefore, the angle of the orientation line l(θ) is found from

sin 2θ = ± b / √(b² + (a − c)²),  cos 2θ = ± (a − c) / √(b² + (a − c)²).
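A Python sketch of the orientation computation for a binary object, using the central second moments a, c and (as reconstructed above) b as twice the cross moment; the function name and the use of arctan2 to pick a branch of tan 2θ = b/(a − c) are my choices:

```python
import numpy as np

def orientation_angle(f):
    """Orientation theta of a binary object, with tan(2*theta) = b / (a - c),
    computed from the moments of the coordinates shifted by the centroid."""
    n, m = np.nonzero(f)
    x, y = n - n.mean(), m - m.mean()        # shifted coordinates x', y'
    a, c = (x**2).sum(), (y**2).sum()        # second moments
    b = 2.0 * (x * y).sum()                  # twice the cross moment
    return 0.5 * np.arctan2(b, a - c)        # angle of minimal second moment

f = np.zeros((16, 16), dtype=int)
f[np.arange(4, 12), np.arange(4, 12)] = 1    # object along the 45-degree diagonal
print(np.degrees(orientation_angle(f)))
```

For the diagonal object above, a = c and b > 0, so 2θ = π/2 and the recovered orientation is 45 degrees.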
The angle of the orientation line l(θ) was calculated for each galaxy image and then used to rotate the image with the Matlab function imrotate. Figure 2.25 shows this rotation for galaxy image NGC 4096 by the angle −64 degrees. Note that the x-axis of the image in Matlab is vertical, and the desired orientation of the galaxy's first principal component being collinear with the horizontal axis of the image is achieved by rotating the galaxy an additional 90 degrees.
(a) segmented galaxy (b) rotated galaxy
Figure 2.25: Rotation of Galaxy image NGC 4096 by galaxy second moment defined angle.
Resizing an image involves either subsampling, if the desired image size is smaller than the original image size, or resampling, if the desired image size is greater than the original. Subsampling reduces the size of an image by creating a new image whose pixel value a is calculated from the values of a neighborhood of pixels about a in the original image. Resampling from the image size 128 × 128 into 256 × 256 replicates each pixel into a 2 × 2 block:

[ · · · · ]      [ · · · · · · ]
[ · a b · ]      [ · a a b b · ]
[ · c d · ]  →   [ · a a b b · ]
[ · · · · ]      [ · c c d d · ]
                 [ · c c d d · ]
                 [ · · · · · · ]
Another process of subsampling is defined by the calculation of means, as follows for the 2 × 2 subsampling example, where

a = (a1 + a2 + b1 + b2)/4,  b = (a3 + a4 + b3 + b4)/4,

c = (c1 + c2 + d1 + d2)/4,  d = (c3 + c4 + d3 + d4)/4.
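Both resizing operations can be sketched in a few lines of Python (function names are mine): replication-based upsampling to double size, and mean-based 2 × 2 subsampling.

```python
import numpy as np

def upsample2(img):
    """Resample to double size by replicating each pixel into a 2x2 block."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def downsample2(img):
    """Subsample to half size: each new pixel is the mean of a 2x2 block."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

img = np.array([[1.0, 3.0], [5.0, 7.0]])
up = upsample2(img)                          # 4x4 image of 2x2 blocks
print(np.array_equal(downsample2(up), img))  # True
```

Subsampling a replicated image averages four equal values, so the pair of operations recovers the original exactly.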
Image resizing is performed in Matlab by the function imresize. Figure 2.26 shows an example of image resizing from size 138 × 197 into size 128 × 128 for galaxy image NGC 4220.
2.5.3 Canny Edge Detection
The Canny edge detection method was developed by John Canny in 1986. The Canny edge detector was designed to satisfy three performance criteria: (1) good detection, (2) good localization, and (3) only one response to a single edge. Good detection means reducing false positives (non-edges being detected as edges) and false negatives (edges not being detected). Good localization means that minimal error exists between identified edge points and true edge points. Only one response
(a) cropped image size 138× 197 (b) image resized to 128× 128
Figure 2.26: Resizing of Galaxy NGC 4220.
to a single edge ensures that the operator eliminates the multiple-maxima output of the filter at step edges. Canny formulated each of these three criteria mathematically and found solutions through numerical optimization. The result is that the impulse response of the first derivative of a Gaussian approximates the optimal edge detector with respect to the signal-to-noise ratio and localization, i.e., the first two criteria. The edge detection algorithm is presented below. Let f(x, y) denote the input image and G(x, y) the Gaussian function

G(x, y) = e^{−(x² + y²)/(2σ²)}.
The convolution of these two functions results in a smoothing of the input image and is written as
s(x, y) = f(x, y) ∗G(x, y),
where σ controls the degree of smoothing of the image.
First-order finite-difference approximations are used to compute the gradient of s(x, y), which is written as [s_x, s_y], where

s_x = ∂s/∂x,  s_y = ∂s/∂y.

The gradient magnitude and orientation (angle) are respectively computed by

M(x, y) = √(s_x² + s_y²)

and

α(x, y) = tan⁻¹( s_y / s_x ).
The array of gradient magnitudes will contain large values in the directions of greatest change. The array is then thinned so that only the magnitudes at the points of greatest local change remain. This procedure is called nonmaxima suppression. An example presents this notion. Consider a 3 × 3 grid in which 4 possible orientations pass through the center point: horizontal, vertical, +45 degrees, and −45 degrees. All possible orientations are discretized into these 4 orientations by specifying a range of angles for each. The edge direction is determined by the edge normal, computed by α(x, y).

Let d_k, k = 1, 2, . . . , n, represent the discrete orientations, where n is the number of orientations. Using the 3 × 3 grid, the nonmaxima suppression scheme at every point (x, y) of α(x, y) can be formulated as in Algorithm 2.2, where s_t(x, y) is the nonmaxima-suppressed image.
Algorithm 2.2 Nonmaxima suppression algorithm

1. Find the orientation d_k which is closest to α(x, y).
2. Set s_t(x, y) = 0 if M(x, y) is less than at least one of its two neighbors along d_k; otherwise, set s_t(x, y) = M(x, y).
Finally, hysteresis thresholding is applied to s_t(x, y) to reduce falsely detected edges. Two thresholds are used here, referred to as a weak (or low) threshold τ1 and a strong (or high) threshold τ2. Too low a threshold will retain false positives; too high a threshold will remove correctly detected edges. The double threshold produces two new images, written as

s_tw(x, y) = s_t(x, y) ≥ τ1,

where s_tw(x, y) denotes the image created by the weak threshold, and

s_ts(x, y) = s_t(x, y) ≥ τ2,

where s_ts(x, y) denotes the image created by the strong threshold. Edges in s_ts(x, y) are linked into contours by searching through an 8-pixel neighborhood in s_tw(x, y) for edges that can be linked to the end of the current edge. The output of the algorithm is the image of all nonzero points in s_tw(x, y) appended to s_ts(x, y). Canny edge detection was performed using the Matlab function edge with τ1 = 0.3, τ2 = 0.9, and σ = 1.5. Figure 2.27 shows the Canny edge detector applied to multiple galaxy images.
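Two of the steps above, the finite-difference gradient and the double threshold, can be sketched in Python (the Matlab function edge performs the full algorithm; smoothing, nonmaxima suppression, and edge linking are omitted here, and the function names are mine):

```python
import numpy as np

def gradient(s):
    """First-order finite differences s_x, s_y of the (smoothed) image, and
    the gradient magnitude M and orientation alpha (last row/column dropped)."""
    sx = s[1:, :-1] - s[:-1, :-1]
    sy = s[:-1, 1:] - s[:-1, :-1]
    return np.sqrt(sx**2 + sy**2), np.arctan2(sy, sx)

def double_threshold(st, tau1, tau2):
    """Hysteresis step: weak image s_tw = (s_t >= tau1), strong s_ts = (s_t >= tau2)."""
    return st >= tau1, st >= tau2

s = np.zeros((5, 5))
s[:, 2:] = 1.0                       # a vertical step edge
M, alpha = gradient(s)
weak, strong = double_threshold(M, 0.3, 0.9)
print(int(strong.sum()))             # 4 edge points along the step
```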
2.6 Data Mining and Classification
The canonical problem addressed in the field of data mining and classification is the following: given a very large family of vectors (signals, images, etc.), each of which lives in a high-dimensional space, how can this data be effectively represented for storage and retrieval, for recognizing patterns within the images, and for classifying objects? In the subsequent sections, a small subset of the tools used in statistics, data mining, and machine learning in astronomy will be applied to the posed problem of the representation and classification of galaxy images.
2.6.1 Feature Extraction
A useful galaxy feature descriptor varies in value so that a classifier can discriminate between input galaxies and place each galaxy into one of several classes. The shape, or morphological, features used in this work are described in [26, 31, 137]: Elongation (E), Form Factor (F), Convexity (C), Bounding-rectangle-to-fill-factor (BFF), Bounding-rectangle-to-perimeter (BP), and Asymmetry Index (AI). Table 2.2 gives the average values of the original data for these features.

(a) NGC 6070 original (b) NGC 6070 canny edge
(c) NGC 4460 original (d) NGC 4460 canny edge
(e) NGC 4283 original (f) NGC 4283 canny edge

Figure 2.27: Canny edge detection.
Elongation has higher values for spiral and lenticular galaxies and lower values for irregular and elliptical galaxies. This feature can be written as

E = (a − b)/(a + b),

where a is the major axis and b is the minor axis.
Form factor is useful in separating spiral galaxies from the other classes. This feature can be written as

F = A/P²,

where A is the number of pixels in the galaxy and P is the number of pixels in the galaxy edge found by Canny edge detection.
Convexity has larger values for spirals with open winding arms and lower values for compact galaxies, such as those in the elliptical class. This feature can be written as

C = P/(2H + 2W),

where P is as defined above and H and W are the height and width of the minimum bounding rectangle of the galaxy.
Bounding-rectangle-to-fill-factor measures how completely the galaxy fills its minimum bounding rectangle. This feature is defined as

BFF = A/(HW),

where A, H, and W are as defined above.
Bounding-rectangle-to-perimeter shows a decreasing trend from compact and circular galaxies to open and edge-on galaxies. This feature can be written as

BP = HW/(2H + 2W)²,

where H and W are as defined above.

Table 2.1: Morphological Feature Descriptions

Feature  Formula                                          Description
E        (a − b)/(a + b)                                  Higher for spiral and lenticular galaxies, lower for irregular and elliptical galaxies
F        A/P²                                             Useful in separating spiral galaxies from the other classes
C        P/(2H + 2W)                                      Larger for spirals with open winding arms, lower for compact galaxies
BFF      A/(HW)                                           Measures how completely the galaxy fills its bounding rectangle
BP       HW/(2H + 2W)²                                    Decreases from compact and circular galaxies to open and edge-on galaxies
AI       Σ_{i,j}|I(i,j) − I180(i,j)| / Σ_{i,j}|I(i,j)|    Tends toward zero for images invariant under a 180-degree rotation

Table 2.2: Feature Values Per Class

Feature  Elliptical  Lenticular  Simple Spiral  Barred Spiral  Irregular
E        0.071       0.382       0.547          0.485          0.214
F        0.059       0.049       0.025          0.029          0.044
C        0.888       0.872       1.05           1.01           0.953
BFF      0.744       0.699       0.609          0.583          0.634
BP       0.062       0.052       0.043          0.048          0.059
AI       0.274       0.375       0.510          0.464          0.354
The asymmetry index tends toward zero when the image is invariant under a 180-degree rotation. This feature can be written as

AI = Σ_{i,j} |I(i, j) − I_180(i, j)| / Σ_{i,j} |I(i, j)|,

where I is the original image and I_180 is the image rotated by 180 degrees.
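A Python sketch of several of these features for a binary galaxy mask. The function name is mine; the edge-pixel count P is approximated here by the 4-neighbor boundary pixels rather than a Canny edge, and E is omitted since it needs the fitted axis lengths:

```python
import numpy as np

def shape_features(mask):
    """Morphological features F, C, BFF, BP, AI of a binary mask."""
    n, m = np.nonzero(mask)
    H, W = n.max() - n.min() + 1, m.max() - m.min() + 1  # bounding rectangle
    A = mask.sum()                                       # pixel area
    interior = (np.roll(mask, 1, 0) & np.roll(mask, -1, 0) &
                np.roll(mask, 1, 1) & np.roll(mask, -1, 1))
    P = (mask & ~interior).sum()                         # boundary pixel count
    I180 = mask[::-1, ::-1]                              # 180-degree rotation
    return {
        "F":   A / P**2,                                 # form factor
        "C":   P / (2 * H + 2 * W),                      # convexity
        "BFF": A / (H * W),                              # fill factor
        "BP":  H * W / (2 * H + 2 * W)**2,               # rectangle-to-perimeter
        "AI":  np.abs(mask.astype(int) - I180.astype(int)).sum() / A,
    }
```

A centered rectangle fills its bounding box exactly (BFF = 1) and is invariant under the 180-degree rotation (AI = 0).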
2.6.2 Principal Component Analysis
Data may be highly correlated, but represented such that its axes are not aligned with the directions
in which the data varies the most. A data set generated by N observations with K measurements
per observation lives in a K-dimensional space, each dimension, or axis, representing a feature of the data. To represent the data in a more compact form, the axes can be rotated to be collinear with the directions of maximum variance in the data, thereby discriminating between the data points. In other words, this rotation results in the first feature being collinear with the direction of maximum variance, the second feature being orthogonal to the first and maximizing the residual variance, and so on. This dimensionality-reduction technique is called Principal Component Analysis (PCA), also known as the Karhunen-Loève transform or Hotelling transform, and is depicted in Figure 2.28 for a bivariate Gaussian distribution.

Figure 2.28: PCA rotation of axes for a bivariate Gaussian distribution.

Consider the data set x_i with N observations and K features, written as the N × K matrix X. The covariance matrix of zero-mean data is estimated as

C_X = (1/(N − 1)) X^T X,
where N is the number of observations and the division by N − 1 is necessary for C_X to be an unbiased estimate of the covariance matrix. Nonzero off-diagonal entries represent correlation between the features, whereas zero entries represent uncorrelated data. PCA transforms the original data into equivalent uncorrelated data, so that the covariance matrix of the new data is diagonal, with the diagonal entries decreasing from top to bottom. To achieve this, PCA attempts to find a nonsingular matrix R which transforms X into such an ideal matrix. The data transforms to Y = XR and its covariance estimate to

C_Y = (1/(N − 1)) R^T X^T X R = R^T C_X R.

The first column r_1 of R is the first principal component and lies along the direction of maximum variance of the data. The columns of R, which are called the principal components, form an orthonormal basis of the data space. The first principal component r_1 can therefore be derived using Lagrange multipliers, setting to zero the derivative of the cost function

φ(r_1, λ) = r_1^T C_X r_1 − λ_1(r_1^T r_1 − 1).
Setting ∂φ(r_1, λ)/∂r_1 = 0 then gives

C_X r_1 − λ_1 r_1 = 0, or C_X r_1 = λ_1 r_1.

This shows that λ_1 is an eigenvalue of the covariance matrix C_X, i.e., a root of det(C_X − λ_1 I) = 0. With λ_1 = r_1^T C_X r_1 being the largest eigenvalue of C_X, the variance along the first principal component is maximized. The remaining principal components are derived in the same manner.
The matrix C_Y is the transformation of C_X in the basis consisting of the columns of R, the eigenvectors of C_X. Since C_X is symmetric by definition, the Spectral Theorem guarantees that the eigenvectors of C_X are orthogonal. These eigenvectors can be listed in any order, and C_Y will remain diagonal. However, PCA requires listing them such that the diagonal entries of C_Y are in decreasing order, which fixes a unique order of the eigenvectors that make up the columns of R: the components (or dimensions) are rank-ordered according to variance. With C_X = R C_Y R^T and the eigenvectors in this order, the set of principal components is defined.
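The procedure can be sketched in a few lines of Python via an eigen-decomposition of the covariance estimate (the function name and use of eigh are my choices):

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via eigen-decomposition of C_X = X^T X / (N - 1), with the
    eigenvectors rank-ordered by decreasing variance."""
    Xc = X - X.mean(axis=0)                 # zero-mean data
    C = Xc.T @ Xc / (len(X) - 1)            # covariance estimate C_X
    eigvals, R = np.linalg.eigh(C)          # symmetric -> orthogonal R
    order = np.argsort(eigvals)[::-1]       # decreasing order of variance
    R = R[:, order]
    return Xc @ R[:, :n_components]         # Y = X R, first components kept

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
Y = pca(X, 2)
C_Y = Y.T @ Y / (len(Y) - 1)                # diagonal: new features uncorrelated
print(np.allclose(C_Y[0, 1], 0.0, atol=1e-10))  # True
```

This mirrors the reduction used here from 6 morphological features to the first 2 principal components.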
The morphological feature data described in Section 2.6.1 was reduced in dimension from 6 to 2 by keeping the first two principal components, both for comparing classification performance with compressed data and for visualization. All classification figures in the following sections were generated from the classification of the PCA features.
2.6.3 Support Vector Machines
The Support Vector Machine (SVM) learning algorithm captures the structure of a multi-class training data set in order to predict the class membership of unknown data correctly and with high decision confidence. Classes are divided by a decision boundary, or hyperplane; the minimum distance between the boundary and the nearest point in each class defines the margins of the boundary, which the SVM maximizes. Points that lie on the margin are called support vectors.

Consider a linear classifier for a binary classification problem with labels y ∈ {−1, 1} and features x. The classifier is written as

h_{w,b}(x) = g(w^T x + b),

with

g(z) = 1 if z ≥ 0, and g(z) = −1 otherwise,

where w is the weight vector and b is the bias of the hyperplane. Given a training example (x^(i), y^(i)), the functional margin of (w, b) with respect to the training example is defined as

γ̂^(i) = y^(i)(w^T x^(i) + b).
If y^(i) = 1, then w^T x^(i) + b needs to be a large positive number for a large functional margin; conversely, if y^(i) = −1, then w^T x^(i) + b needs to be a large negative number. A large functional margin represents a confident and correct prediction.

With the chosen g, if w and b are scaled by 2, the functional margin is scaled by a factor of 2. However, since g(w^T x + b) = g(2w^T x + 2b), no change occurs in h_{w,b}(x). This shows that h_{w,b}(x) depends only on the sign, and not on the magnitude, of w^T x + b.

Given a training set S = {(x^(i), y^(i)); i = 1, 2, . . . , m}, the functional margin of (w, b) with respect to S is defined as the smallest functional margin over the individual training examples,

γ̂ = min_{i=1,...,m} γ̂^(i).
Another type of margin is the geometric margin. Consider the training set in Figure 2.29. The hyperplane defined by (w, b) is shown, along with the vector w, which is normal to the hyperplane. Point A represents a positive training example x^(i) with label y^(i) = 1. The geometric margin of point A, γ^(i), is the distance of the line segment AB. Point B is given by x^(i) − γ^(i) w/||w||. Since point B lies on the decision boundary, which satisfies the equation w^T x + b = 0,

w^T ( x^(i) − γ^(i) w/||w|| ) + b = 0.

Solving for γ^(i) yields

γ^(i) = (w^T x^(i) + b)/||w|| = (w/||w||)^T x^(i) + b/||w||.
In general, the geometric margin of (w, b) with respect to any training example (x^(i), y^(i)) is given by

γ^(i) = y^(i)( (w/||w||)^T x^(i) + b/||w|| ).

Figure 2.29: Pictorial representation of the development of the geometric margin.
Note that if ||w|| = 1, then the geometric margin equals the functional margin. Additionally, the
geometric margin is invariant to scaling the parameters w and b.
Given a training set S = {(x(i), y(i)); i = 1, 2, . . . , m}, the geometric margin of (w, b) with
respect to S is defined as the smallest geometric margin of the individual training examples and is
written as
γ = min_{i=1,...,m} γ(i).
Assuming the training data is linearly separable, the problem of determining the decision boundary
that maximizes the geometric margin is posed as the following optimization problem:

max_{γ,w,b} γ subject to y(i)(wT x(i) + b) ≥ γ, i = 1, 2, . . . , m, and ||w|| = 1.
The ||w|| = 1 constraint is non-convex. To work towards recasting the optimization problem as
convex, first recall that the geometric margin is the functional margin divided by the norm of w,
γ = γ̂/||w||. With this relation, the problem can then be written as an
Figure 2.30: Maximum geometric margin.
optimization of the functional margin that achieves the geometric margin optimization:

max_{γ̂,w,b} γ̂/||w|| subject to y(i)(wT x(i) + b) ≥ γ̂, i = 1, 2, . . . , m.
Again, the objective function γ̂/||w|| is non-convex, and the problem cannot be solved by standard
optimization software.
Recall that w and b can be scaled without affecting the decision of our classifier. The scaling
constraint that the functional margin of (w, b) with respect to the training set must be 1 is
introduced: γ̂ = 1. The objective γ̂/||w|| then becomes 1/||w||, and since maximizing 1/||w|| is
equivalent to minimizing ||w||, the geometric margin convex optimization problem is then posed as

min_{w,b} (1/2)||w||^2 subject to y(i)(wT x(i) + b) ≥ 1, i = 1, 2, . . . , m.
This problem can be solved with standard quadratic programming (QP) software. Figure 2.30 illustrates
the maximum geometric margin for a training set.
Whereas the previous problem is referred to as the primal form, optimization theory provides
a dual form for expressing the primal problem. Constructing the Lagrangian for the optimization
problem gives
L(w, b, α) = (1/2)||w||^2 − Σ_{i=1}^{m} αi [y(i)(wT x(i) + b) − 1]. (2.5)
To find the dual form of the problem, L(w, b, α) is minimized with respect to w and b for fixed α.
Setting the derivatives of L with respect to w and b to zero gives
∇w L(w, b, α) = w − Σ_{i=1}^{m} αi y(i) x(i) = 0,

which implies that

w = Σ_{i=1}^{m} αi y(i) x(i). (2.6)
Taking the derivative with respect to b and setting it to zero gives

Σ_{i=1}^{m} αi y(i) = 0. (2.7)
Substituting the definition of w in (2.6) into the Lagrangian in (2.5) yields
L(w, b, α) = Σ_{i=1}^{m} αi − (1/2) Σ_{i,j=1}^{m} y(i) y(j) αi αj (x(i))T x(j) − b Σ_{i=1}^{m} αi y(i),
but from (2.7) the last term is equal to zero, which gives
L(w, b, α) = Σ_{i=1}^{m} αi − (1/2) Σ_{i,j=1}^{m} y(i) y(j) αi αj (x(i))T x(j).
Combining this result with the constraints αi ≥ 0 and (2.7), the following dual optimization problem
is obtained:
max_α W(α) = Σ_{i=1}^{m} αi − (1/2) Σ_{i,j=1}^{m} y(i) y(j) αi αj ⟨x(i), x(j)⟩

subject to αi ≥ 0, i = 1, 2, . . . , m, and Σ_{i=1}^{m} αi y(i) = 0.
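For a symmetric two-point data set the dual can be maximized by hand (the equality constraint forces α1 = α2 = a, and W reduces to 2a − 4a², maximized at a = 1/4), which makes conditions (2.6) and (2.7) easy to verify numerically. The data and solution below are a hand-worked illustration, not part of the thesis pipeline:

```python
import numpy as np

X = np.array([[1.0, 1.0], [-1.0, -1.0]])  # one positive, one negative point
y = np.array([1.0, -1.0])

alpha = np.array([0.25, 0.25])  # hand-computed dual maximizer

w = (alpha * y) @ X        # w = sum_i alpha_i y(i) x(i), equation (2.6)
b = 0.0                    # by symmetry of the two points
margins = y * (X @ w + b)  # functional margins of the two examples
```

Both margins come out exactly 1, i.e. both primal constraints are active and both points are support vectors.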
Suppose the model’s parameters have been fit to a training set. The task now is to predict the class
membership of a new input point x by calculating wT x + b and, if this quantity is greater than zero,
predicting y = 1. Using the expression for w in (2.6), this calculation can be written
wT x + b = (Σ_{i=1}^{m} αi y(i) x(i))T x + b (2.8)

= Σ_{i=1}^{m} αi y(i) ⟨x(i), x⟩ + b, (2.9)
where the points x(i) for which αi ≠ 0 are the support vectors.
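In code, the prediction rule (2.8)–(2.9) needs only the support vectors and their dual weights; the values below are hypothetical, chosen for illustration:

```python
import numpy as np

# Hypothetical support vectors, labels, dual weights alpha_i, and bias b.
sv = np.array([[1.0, 1.0], [-1.0, -1.0]])
y_sv = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])
b = 0.0

def predict(x):
    # w'x + b = sum_i alpha_i y(i) <x(i), x> + b  (equations 2.8-2.9)
    score = (alpha * y_sv) @ (sv @ x) + b
    return 1 if score > 0 else -1
```

Only inner products with the support vectors are required, which is what makes the later substitution of a kernel function possible.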
So far, the data has been assumed to be linearly separable. In application this assumption is
relaxed by the introduction of slack variables ξi, leading to the primal minimization formulation

min_{w,b} (1/2)||w||^2 subject to y(i)(wT x(i) + b) ≥ 1 − ξi, i = 1, 2, . . . , m,
with the following constraints limiting the amount of slack:

ξi ≥ 0 and Σ_i ξi ≤ C.
Therefore, the total amount of misclassification is bounded by C.
Finally, the SVM optimization is equivalent to minimizing

Σ_{i=1}^{m} (1 − y(i) g(x(i)))_+ + λ||w||^2, (2.10)

where λ is related to the misclassification bound C and the subscript + indicates x_+ = max(0, x).
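Objective (2.10) is a regularized hinge loss and can be evaluated directly; the toy data, parameters, and λ below are illustrative:

```python
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0])
w = np.array([1.0, 1.0])
b = -0.5
lam = 0.1

g = X @ w + b                            # g(x(i)) = w'x(i) + b
hinge = np.maximum(0.0, 1.0 - y * g)     # (1 - y(i) g(x(i)))_+
objective = hinge.sum() + lam * (w @ w)  # equation (2.10)
```

Examples with y(i) g(x(i)) ≥ 1 (classified correctly with margin at least 1) contribute nothing to the sum; here all three do, so only the λ||w||² term remains.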
Figure 2.31 shows the SVM decision boundary computed for the data of 15 galaxies, each belonging
to either the Irregular or the Regular class. The SVM maps data from the input space Υ to
a feature space F using a nonlinear map φ : Υ → F, associated with a kernel function, so that the discriminant
Figure 2.31: SVM applied to galaxy data.
function becomes
hw,b(x) = wT φ(x) + b. (2.11)
Many kernel functions are possible; the present work used the polynomial kernel

K(x, x′) = (xT x′ + 1)^d (2.12)

with d = 2, the quadratic kernel.
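Kernel (2.12) is a one-line function; the sample vectors are arbitrary:

```python
import numpy as np

def quadratic_kernel(x, xp, d=2):
    # K(x, x') = (x'x + 1)^d, equation (2.12); d = 2 is the quadratic kernel.
    return (np.dot(x, xp) + 1.0) ** d

x = np.array([1.0, 2.0])
xp = np.array([3.0, 0.5])
k = quadratic_kernel(x, xp)  # (1*3 + 2*0.5 + 1)^2 = 25
```

The kernel evaluates an inner product in the higher-dimensional feature space F without forming φ(x) explicitly.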
2.7 Results and Discussion
The galaxy data used in this classification is listed in Table 2.3. The name of each galaxy is given
along with its corresponding classification obtained from the NASA/IPAC Extragalactic Database
(NED) and the relation between the NED classification and the scheme used in the present work.
Only the major galaxy classes Elliptical "E," Lenticular "S0," Spiral "S," Barred Spiral "SB," and
Irregular "Irr" were used in classification. All subclasses listed in the table below, such as Sa, Sd,
SBm, etc., were generalized in the SVM training and validation to belong to their respective
major class. Galaxy NGC 4457 has NED classification SAB0/a(s), which is interpreted as either
S0 or SBa for compliance with the present scheme, and was judiciously assigned to class barred
spiral (SB) due to similarities between the feature values of NGC 4457 and the SB class. Galaxy
NGC 4144 has NED classification SAB(s)cd? edge-on and was not used in classification since a
definite relation to the present classification scheme could not be determined.
Table 2.3: Galaxy list and relation between NED classification and current project classification
Galaxy name N.E.D. Class Present Work Class
NGC 4278 E1-2 E
NGC 4283 E0 E
NGC 4308 E? E
NGC 5813 E1-2 E
NGC 5831 E3 E
NGC 5846 E0-1 E
NGC 5846A compact E2+ E
NGC 4346 S0 edge-on S0
NGC 4460 SB0ˆ+(s)? edge-on S0
NGC 4251 SB0? edge-on S0
NGC 4220 SA0ˆ+(r) S0
NGC 4346 S0 edge-on S0
NGC 4324 SA0ˆ+(r) S0
NGC 5854 SB0 S0
NGC 5838 SA0ˆ- S0
NGC 5839 SAB0ˆ0?(rs) S0
NGC 5864 SB0ˆ0(s)? edge-on S0
NGC 5865 SAB0ˆ- S0
NGC 5868 SAB0ˆ- S0
NGC 4310 SAB0ˆ+(r) S0
NGC 4218 Sa? Sa
NGC 4217 Sb edge-on Sb
NGC 4100 SA(rs)bc Sb/Sc
UGC 10288 Sc: edge-on Sc
NGC 6070 SA(s)cd Sc/Sd
UGC 07617 Sd Sd
NGC 4457 SAB0/a(s) (S0)/SBa
NGC 4314 SB(rs)a SBa
NGC 4274 SB(r)ab SBa/SBb
NGC 4448 SB(r)ab SBa/SBb
NGC 4157 SAB(s)b? edge-on SBb
NGC 5850 SB(r)b SBb
NGC 5806 SAB(s)b SBb
NGC 4232 SBb pec? SBb
NGC 4088 SAB(rs)bc SBb/SBc
NGC 4258 (Messier 106) SAB(s)bc SBb/SBc
NGC 4527 SAB(s)bc SBb/SBc
NGC 4389 SB(rs)bc pec? SBb/SBc
NGC 4496 SBc SBc
NGC 4085 SAB(s)c SBc
NGC 4096 SAB(rs)c SBc
NGC 4480 SAB(s)c SBc
UGC 10133 SAB(r)c SBc
NGC 4559 SAB(rs)cd SBc/SBd
NGC 4242 SAB(s)dm SBd
NGC 4393 SABd SBd
NGC 4288 SB(s)dm SBd/SBm
NGC 3985 SB(s)m SBm
NGC 4449 IBm Irr
UGC 07408 IAm Irr
UGC 07577 Im Irr
UGC 07639 Im Irr
UGC 07690 Im Irr
NGC 4496B IB(s)m Irr
NGC 4144 SAB(s)cd? edge-on not used
The classification scheme used in this project is a subset of Hubble’s classification scheme;
galaxies are assigned to 1 of the 5 major classes: Elliptical "E," Lenticular "S0," Spiral "S," Barred
Spiral "SB," and Irregular "Irr." Classification was performed two classes at a time using Support
Vector Machines (SVM) from Matlab’s Statistics Toolbox, with both a linear and a quadratic kernel
and default parameters. The Matlab functions svmtrain and svmclassify were used to train
the classifiers and perform validation, respectively. The pairs used at each iteration are shown
in Figure 2.32. The idea was to iteratively perform classification between the whole remaining set
and a single class, removing the classified set from the remaining whole in the next iteration. The
training and validation sets were separated such that approximately one third of the data was used
for validation while the remainder was used for training. The extracted feature data was listed
in a spreadsheet and sorted by class starting from elliptical to lenticular through barred spirals and
Iteration 1: Irregular vs. Regular (Elliptical, Lenticular, Spiral, Barred Spiral); Iteration 2: Elliptical vs. Not Elliptical; Iteration 3: Lenticular vs. Spiral; Iteration 4: Simple Spiral vs. Barred Spiral.
Figure 2.32: Classification iteration class pairs.
irregular. The bottom one third of each class was reserved for validation while the top two thirds
were used for training. This process was a single-fold validation.
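The peeling structure of Figure 2.32 can be sketched as a cascade of two-class decisions. In the sketch below a crude nearest-mean rule on a single hypothetical feature stands in for the svmtrain/svmclassify pair, so that the iteration logic is runnable on its own; the class means are invented for illustration:

```python
import numpy as np

# Invented 1-D class "means"; the thesis uses 6 morphic features and SVMs.
means = {"Irr": 6.0, "E": 0.0, "S0": 1.0, "S": 2.0, "SB": 3.0}

def cascade(x):
    # Iteration 1: Irregular vs Regular (pool of all remaining classes).
    remaining = ["E", "S0", "S", "SB"]
    pool_mean = np.mean([means[c] for c in remaining])
    if abs(x - means["Irr"]) < abs(x - pool_mean):
        return "Irr"
    # Iterations 2-4: peel Elliptical, then Lenticular, then Spiral vs Barred.
    for target in ["E", "S0", "S"]:
        remaining.remove(target)
        rest_mean = np.mean([means[c] for c in remaining])
        if abs(x - means[target]) < abs(x - rest_mean):
            return target
    return "SB"
```

Each iteration separates one class from the pool still in play and then removes it, mirroring how the classified set was removed from the remaining whole.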
For Iteration 1, galaxies were assigned to the Irregular or Regular class. Training was per-
formed on a set of 40 galaxies consisting of 5 irregular and 35 regular, with class membership
ranging from elliptical to spiral and barred spiral. All 6 morphic features were used in the training.
The validation set contained 15 galaxies: 1 irregular and 14 regular. Of the validation set, 7/15
galaxies were classified correctly giving an accuracy of 46.6667%. Principal component analysis
(PCA) was applied to the training and validation sets, and the data was projected onto the first two
principal components. Using the reduced data as input, the SVM yielded a classification accuracy
of 13.3333%. Using the quadratic kernel for the SVM classification yielded 13/15 (86.6667%) and
12/15 (80%) accuracy for 6 and 2 features, respectively. Figure 2.33 shows classification in the
PCA feature space for each kernel. The legend indicates the symbols Irregular (I) and Regular (R).
For all subsequent classification of un-enhanced galaxy images the irregular class was removed
from the training and validation sets.
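The projection onto the first two principal components (Matlab’s coeff = pca(training); training*coeff(:,1:2) in the appendix listings) can be reproduced with an SVD of the mean-centered data; the random matrix below stands in for the real 40×6 feature matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
training = rng.normal(size=(40, 6))  # stand-in for the 40 x 6 feature data

# Matlab's pca() derives the coefficient matrix from centered data; the
# appendix code then projects the uncentered rows: training*coeff(:,1:2).
centered = training - training.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
coeff = Vt.T                       # columns are the principal directions
reduced = training @ coeff[:, :2]  # 40 x 2 reduced feature matrix
```

One caveat visible in the appendix listings: a separate basis (coeff2) is computed from the test set alone, so the training and test points are projected into bases that need not be aligned.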
The next pair of classes used for SVM training is Elliptical and Not Elliptical. The label vector
(a) linear kernel
(b) quadratic kernel
Figure 2.33: PCA feature space iteration 1 classification.
used was binary with entries 1 for Elliptical and 0 for Not Elliptical. The training set consisted of
34 galaxies: 5 elliptical and 29 galaxies belonging to classes lenticular, spiral and barred spiral.
The validation set contained 15 galaxies: 2 elliptical and 13 others. All 6 morphic features were
used. Classification accuracy was 13/15 (86.6667%). PCA was applied to the data set and the data
was projected onto the first two principal components. Classification in the reduced feature space
was 12/15 correctly classified galaxies or 80% accuracy. Using the quadratic kernel for the SVM
classification yielded 3/15 (20%) accuracy for both sets of 6 and 2 features. Figure 2.34 shows
classification in the PCA feature space for each kernel. The legend indicates the symbols Elliptical
(1) and Not Elliptical (0).
Elliptical galaxies were then removed from the training and test sets for all subsequent classi-
fication of un-enhanced galaxy images.
Lenticular and Spiral are the next two classes to be trained by the SVM. The training set con-
sisted of 9 lenticular galaxies and 20 spiral galaxies, while the test set consisted of 4 lenticular and
9 spiral galaxies. All 6 morphic features were used. Classification accuracy was 11/13 (84.6154%)
and 8/13 (61.5385%) with the linear and quadratic kernels, respectively. Classification accuracy of
the two PCA features with the linear and quadratic kernels respectively was 9/13 (69.2308%) and
3/13 (23.0769%). Figure 2.35 shows classification in the PCA feature space for each kernel. The
(a) linear kernel
(b) quadratic kernel
Figure 2.34: PCA feature space iteration 2 classification.
legend indicates the symbols Lenticular (1) and Spiral (0).

(a) linear kernel

(b) quadratic kernel

Figure 2.35: PCA feature space iteration 3 classification.

Lenticular galaxies were then removed from the training and test sets for all subsequent
classification of un-enhanced galaxy images.
The final categories to be trained by SVM are Simple Spiral, which are referred to as spiral,
and Barred Spiral. The training set contained 5 simple spirals and 15 barred spirals. Validation
was performed with 2 simple and 7 barred spirals. All 6 morphic features were used. The SVM clas-
sified 7/9 galaxies correctly giving 77.7778% accuracy. After PCA, 2/9 galaxies were classified
correctly giving 22.2222% accuracy. Using the quadratic kernel for the SVM classification yielded
8/9 (88.8889%) and 2/9 (22.2222%) accuracy for 6 and 2 features, respectively. Figure 2.36 shows
classification in the PCA feature space for each kernel. The legend indicates the symbols Simple
Spiral (1) and Barred Spiral (0).

(a) linear kernel

(b) quadratic kernel

Figure 2.36: PCA feature space iteration 4 classification.

Classification was then performed for the Heap transform enhanced galaxy image set. Table 2.4
summarizes the classification results for both the original and
enhanced data, 6 and 2 features, and linear and quadratic kernels. Figures 2.37, 2.38, 2.39, and 2.40
show the classification results in the PCA feature space for the enhanced data set. The total classification
accuracy for the original data was 51.570% and for the enhanced data was 64.494%, an overall
improvement in classification performance of 12.924 percentage points due to galaxy image enhancement.
Classification Results
Linear Kernel
Original Data 6 Features 2 PCA Features
Irregular/Regular 7/15 (46.6667%) 2/15 (13.3333%)
Elliptical/Not Elliptical 13/15 (86.6667%) 3/15 (20%)
Lenticular/Spiral 11/13 (84.6154%) 9/13 (69.2308%)
Spiral/Barred Spiral 7/9 (77.7778%) 2/9 (22.2222%)
Enhanced Data
Irregular/Regular 4/15 (26.6667%) 2/15 (13.3333%)
Elliptical/Not Elliptical 11/15 (73.3333%) 10/15 (66.6667%)
Lenticular/Spiral 11/13 (84.6154%) 9/13 (69.2308%)
Spiral/Barred Spiral 8/9 (88.8889%) 7/9 (77.7778%)
Quadratic Kernel
Original Data 6 Features 2 PCA Features
Irregular/Regular 13/15 (86.6667%) 12/15 (80%)
Elliptical/Not Elliptical 10/15 (66.6667%) 3/15 (20%)
Lenticular/Spiral 8/13 (61.5385%) 3/13 (23.0769%)
Spiral/Barred Spiral 4/9 (44.4444%) 2/9 (22.2222%)
Enhanced Data
Irregular/Regular 12/15 (80%) 0/15 (0%)
Elliptical/Not Elliptical 12/15 (80%) 13/15 (86.6667%)
Lenticular/Spiral 11/13 (84.6154%) 9/13 (69.2308%)
Spiral/Barred Spiral 6/9 (66.6667%) 6/9 (66.6667%)
Table 2.4: Summary of classification results for original and enhanced data. Accuracy improved
by 12.924 percentage points due to enhancement.
(a) linear kernel
(b) quadratic kernel
Figure 2.37: PCA feature space iteration 1 classification of enhanced data.
(a) linear kernel
(b) quadratic kernel
Figure 2.38: PCA feature space iteration 2 classification of enhanced data.
(a) linear kernel
(b) quadratic kernel
Figure 2.39: PCA feature space iteration 3 classification of enhanced data.
2.8 Future Work
Improve the segmentation scheme to capture more accurately the shape of the galaxies. Extend the
classification scheme to include the classes Sa, Sb, Sc, SBa, SBb, SBc, SBd, SBm and the elliptical
subclasses E0, . . . , E7. Use a sparse dictionary to perform classification of image data. Download
a data set from the CDS Strasbourg to increase the size of the training and validation sets. Perform
5-fold and 10-fold cross validation for classification. Implement classification procedures in Python.
Develop a graphical user interface for user-driven or automated classification software.
(a) linear kernel
(b) quadratic kernel
Figure 2.40: PCA feature space iteration 4 classification of enhanced data.
Appendix A: PROJECT SOFTWARE
The Matlab codes used in this work are listed and briefly described below.
• galaxy_processing.m: preprocessing and feature extraction as delineated in sections 2.5 and
2.6.1.
• centroid.m: calculates the center of brightness of the galaxy image to be used for shifting the
image by the centroid.
• galaxy_shift.m: shifts the galaxy image so that the center of brightness and the image center
are coincident.
• secondmoment.m: calculates the second moments of the galaxy image and the angle between
the principal axis defined by the second moments and the vertical axis of the image.
• calculateEllipse.m: calculates and plots an ellipse defined by the centroid, ellipse axes, and
angle of rotation on the galaxy image.
• classification_Irr_Reg.m: original data classification for classes irregular and regular. (gen-
erated figure 2.33)
• classification_E_NE.m: " " elliptical and not elliptical. (generated figure 2.34)
• classification_S0_S.m: " " lenticular and spiral. (generated figure 2.35)
• classification_S_SB.m: " " simple spiral and barred spiral. (generated figure 2.36)
• heap_classification_Irr_Reg.m: enhanced data classification for classes irregular and regular.
(generated figure 2.37)
• heap_classification_E_NE.m: " " elliptical and not elliptical. (generated figure 2.38)
• heap_classification_S0_S.m: " " lenticular and spiral. (generated figure 2.39)
• heap_classification_S_SB.m: " " simple spiral and barred spiral. (generated figure 2.40)
A.1 Preprocessing and Feature Extraction codes
% call: galaxy_processing.m
%
% Background subtraction by thresholding. Threshold is determined
% by either manual inspection of threshold image iterations of the
% histogram levels or Otsu's method. Star/object removal by morphological
% opening. Shift image so galaxy centroid and image center are coincident.
% Galaxy rotation by angle between 2nd moment and vertical image axis.
% Crop and resize image to 128x128. Edge detection and calculate best fit
% ellipse for use in feature extraction by 6 morphological features.
%% Read image in
A=imread('AC8431_NGC3985.tif');
[N M L]=size(A);
A=A(:,:,1);
%% Find best threshold
H=imhist(A,65535); %imhist(A,65535) for uint16 images
figure;
subplot(2,2,1)
imshow(A)
subplot(2,2,[3,4])
plot(H)
% x1=1*10^4; % for uint16 images
for i=50:200
subplot(2,2,[3,4])
hold on;
T=i;
xx=[T T];
yy=[0 H(T)];
hline=line(xx,yy);
set(hline,'Color',[1 0 0]);
htext=text(T-5, H(T),'T');
set(htext,'Color',[1 0 0]);
Ab=(A>T);
subplot(2,2,2)
imshow(Ab)
ss=sprintf('Thresholding by %g',T);
stitle=title(ss);
pause(.1)
delete(htext); delete(hline);
end
% Thresholding
bw=im2bw(A,19200/65535);
bw=1-bw;
bw2=bwareaopen(bw,256);
% cc=bwconncomp(bw2); %use for more than 1 object
% L=labelmatrix(cc);
% L(L~=2)=0;
% L=double(L);
% imshow(L,[])
% X=double(A);
% g=L.*X;
imshow(bw2,[]); %colormap(gray(65535))
X=double(A);
g=bw2.*X;
imshow(g,[]); %colormap(gray(65535))
%% Shifting image by centroid
[xc,yc]=centroid(g,1);
Y=galaxy_shift(g,xc,yc);
%% Rotate image by angle defined by 2nd moments
[m11,m20,m02]=secondmoment(g);
theta=(1/2)*atan2(2*m11,m20-m02);
alpha=theta*(180/pi);
gr=imrotate(g,alpha); % rotate by alpha computed above ('angle' was undefined)
imshow(gr,[])
% Crop galaxy
% reduce size of rotated galaxy by size(gr)/n, n=1,2,...
% use the reduced size to compose a new image I which contains
% the galaxy.
I=imcrop(gr,[102 214 129 125]);
gs=imresize(I,[128 128]);
imshow(gs,[])
[N M L]=size(gs);
%% Calculating morphics features
bs=im2bw(gs);
p=regionprops(bs,'all');
p=p(1);
xc=p.Centroid(1);
yc=p.Centroid(2);
a=p.MajorAxisLength/2;
b=p.MinorAxisLength/2;
BBox=round(p.BoundingBox);
[X,Y]=calculateEllipse(p.Centroid(1),p.Centroid(2),a,b,0);
% Edge detection
[gCanny, gt]=edge(gs,'canny',[0.3 .9], 0.5);
imshow(gCanny)
G=find(gCanny>0);
figure;
imshow(gs,[]); hold on;
plot(X,Y,'b*');
rectangle('Position',p.BoundingBox,'EdgeColor','r')
plot(G,'g-');
% Elongation: (a-b)/(b+a).
Elongation=(a-b)/(b+a)
% Form Factor: ratio of the area of the galaxy
% (number of pixels in the galaxy) to its perimeter
% (number of pixels in canny edge detection).
numpixels_galaxy=0;
for n=1:N
for m=1:M
if(gs(n,m)~=0)
numpixels_galaxy=numpixels_galaxy+1;
end
end
end
numpixels_perimeter=numel(find(gCanny>0));
Formfactor=numpixels_galaxy/numpixels_perimeter
% Convexity: ratio of the galaxy perimeter to the
% perimeter of the minimum bounding rectangle.
% imshow(A) %show bounding rectangle superimposed on galaxy.
% rectangle('position',[xmin ymin width height],'EdgeColor','r');
rectangle_perimeter=2*BBox(3)+2*BBox(4);
Convexity=numpixels_perimeter/rectangle_perimeter
%Bounding-rectangle-to-fill-factor (BFF): area of the bounding rectangle
%to the number of pixels within the rectangle.
rectangle_area=BBox(3)*BBox(4);
L1=BBox(1);
W1=BBox(2);
L=BBox(1)+BBox(3);
W=BBox(2)+BBox(4);
numpixels_bounding_box=0;
for n=L1:L
for m=W1:W
numpixels_bounding_box=numpixels_bounding_box+1;
end
end
BFF=rectangle_area/numpixels_bounding_box
% Bounding-rectangle-to-perimeter: area of the bounding rectangle
% to the number of pixels included in the perimeter.
Bounding_rectangle_to_perimeter=rectangle_area/rectangle_perimeter
% Asymmetry index: taking the difference between the galaxy image
% and the same image rotated 180 degrees about the center of the galaxy.
% The sum of the absolute value of the pixels in the difference image
% is divided by the sum of pixels in the original image to give the
% asymmetry parameter.
gs_rotated=imrotate(gs,180);
difference_image=gs-gs_rotated;
Asymmetry_index=sum(sum(abs(difference_image)))/sum(sum(gs))
%===============================================================
% call: centroid.m
%
% calculate the first moment of an image. centroid(X,I) calculates
% the centroid for binary or grayscale image X. If X is binary, I=0.
% If X is intensity image, I=1.
% John Jenkinson, Dr. Artyom Grigoryan, ECE UTSA 2014.
function[xc,yc]=centroid(X,I)
[N M L]=size(X);
X=double(X(:,:,1));
xbar=0; ybar=0;
for n=1:N
for m=1:M
a=X(n,m);
xbar = xbar + n*a;
ybar = ybar + m*a;
end
end
if(I==1)
ss=sum(X(:)); %faster than sum(sum(X)) for type double
else if(I==0)
ss=N*M;
end
end
xc=round(xbar/ss); yc=round(ybar/ss);
end
%===============================================================
% call: galaxy_shift.m
%
% Shift the center of brightness to the image center.
% John Jenkinson, ECE UTSA 2014.
function[Y]=galaxy_shift(g,xc,yc)
[N M L]=size(g);
Y=zeros(N,M);
if(N/2-yc<0 & M/2-xc<0)
Y(1:N+(N/2-yc),1:M+(M/2-xc))=g(1-(N/2-yc):N,1-(M/2-xc):M);
else if(N/2-yc<0 & M/2-xc>0)
Y(1:N+(N/2-yc),1+(M/2-xc):M)=g(1-(N/2-yc):N,1:M-(M/2-xc));
else if(N/2-yc>0 & M/2-xc<0)
Y(1+(N/2-yc):N,1:M+(M/2-xc))=g(1:N-(N/2-yc),1-(M/2-xc):M);
else if(N/2-yc>0 & M/2-xc>0)
Y(1+(N/2-yc):N,1+(M/2-xc):M)=g(1:N-(N/2-yc),1:M-(M/2-xc));
end
end
end
end
end
%===============================================================
% call: secondmoment.m
%
% Say you have the image A(n,m) of a galaxy of size NxM; the second moments
% m11, m20 and m02 are calculated as follows:
% by Art Grigoryan edited by John Jenkinson
function [m11,m20,m02]=secondmoment(A)
[N,M]=size(A);
m11=0;
m20=0;
m02=0;
for n=0:N-1
n1=n+1;
for m=0:M-1
a=A(n1,m+1);
ma=m*a;
na=n*a;
m11=m11+n*ma;
m20=m20+n*na;
m02=m02+m*ma;
end
end
if(islogical(A)==1)
% normalization
ss=N*M;
m11=m11/ss;
m20=m20/ss;
m02=m02/ss;
else
% normalization
ss=sum(sum(A));
m11=round(m11/ss);
m20=round(m20/ss);
m02=round(m02/ss);
end
end
%===============================================================
% call: calculateEllipse.m
%
% calculate points to draw an ellipse
function [X,Y] = calculateEllipse(x, y, a, b, angle, steps)
% x coordinate
% y coordinate
% semimajor axis
% semiminor axis
% angle of the ellipse (in degrees)
narginchk(5, 6);
if nargin<6, steps = 36; end
beta = -angle * (pi / 180);
sinbeta = sin(beta);
cosbeta = cos(beta);
alpha = linspace(0, 360, steps)' .* (pi / 180);
sinalpha = sin(alpha);
cosalpha = cos(alpha);
X = x + (a * cosalpha * cosbeta - b * sinalpha * sinbeta);
Y = y + (a * cosalpha * sinbeta + b * sinalpha * cosbeta);
if nargout==1, X = [X Y]; end
end
A.2 SVM Classification codes with data
A.2.1 Original data
training=[0.2379 0.031 1.1141 0.6371 0.0604 0.338
0.2066 0.0623 0.8261 0.7143 0.0595 0.7111
0.3681 0.0423 0.8803 0.586 0.0559 0.1604
0.1589 0.0275 1.1492 0.5895 0.0617 0.2602
0.2876 0.058 0.8281 0.6792 0.0586 0.3558
0.0577 0.059 0.8803 0.7329 0.0624 0.2386
0.0175 0.0585 0.9 0.7582 0.0624 0.1724
0.054 0.0497 0.9521 0.7206 0.0625 0.1144
0.0316 0.0767 0.7955 0.7769 0.0625 0.2979
0.1817 0.0733 0.7895 0.75 0.0609 0.303
0.5137 0.0393 0.8707 0.651 0.0458 0.4838
0.5666 0.0372 0.9038 0.6854 0.0444 0.1155
0.3609 0.0482 0.8878 0.6913 0.055 0.0932
0.6616 0.0284 0.9259 0.6455 0.0377 0.2113
0.3547 0.047 0.871 0.6524 0.0546 0.219
0.4334 0.0457 0.8917 0.6918 0.0525 0.1033
0.461 0.0428 0.8625 0.6395 0.0498 0.5098
0.2049 0.0629 0.84 0.74 0.06 0.2342
0.1287 0.0718 0.8158 0.7841 0.0609 0.2609
0.5203 0.032 0.9405 0.625 0.0454 0.64
0.1891 0.0412 1.011 0.6946 0.0606 0.1607
0.7442 0.0141 1.0774 0.5336 0.0306 0.5857
0.6239 0.0347 0.9126 0.7114 0.0406 0.4327
0.411 0.0415 0.9273 0.6789 0.0525 0.2029
0.7521 0.0145 1.0337 0.497 0.0312 1.0387
0.1306 0.0462 0.9239 0.6423 0.0614 0.3473
0.5882 0.0291 0.9153 0.5535 0.0441 0.1941
0.4012 0.0192 1.3214 0.6382 0.0527 0.4836
0.5257 0.0356 0.9414 0.7045 0.0448 0.1554
0.7802 0.0155 1 0.5871 0.0264 0.2839
0.5228 0.0184 1.2409 0.5717 0.0496 0.3557
0.505 0.0305 0.9783 0.5842 0.0499 0.3856
0.4325 0.0322 0.8718 0.4838 0.0506 0.443
0.5556 0.0282 0.8941 0.5242 0.0429 0.6762
0.521 0.0233 1.119 0.6573 0.0443 0.3256
0.453 0.044 0.8519 0.6118 0.0521 0.2796
0.7246 0.0227 0.9924 0.6446 0.0347 0.7385
0.5626 0.0248 1.115 0.6618 0.0466 0.9537
0.6077 0.0318 0.9091 0.657 0.04 0.5912
0.6071 0.0219 1.0536 0.5515 0.0441 0.3169];
%NGC 4449,UGC 7408,UGC 7577,UGC 7639,UGC 7690
%NGC 4278,NGC 4283,NGC 4308,NGC 5813,NGC 5831
%NGC 4346,NGC 4460,NGC 4251,NGC 4220,NGC 4346
%NGC 4324,NGC 5854,NGC 5838,NGC 5839,NGC 5864
%NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288
%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157
%NGC 5850,NGC 5806,NGC 4232,NGC 4088,NGC 4258
%NGC 4527,NGC 4389,NGC 4496,NGC 4085,NGC 4096
Y=['I'; 'I'; 'I'; 'I'; 'I'; 'R'; 'R'; 'R'; 'R'; 'R';...
'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R';...
'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R';...
'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R'];
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(training,Y,'kernel_function',...
'quadratic','showplot',true);
test=[0.0284 0.044 0.9241 0.6016 0.0625 0.299
0.0474 0.0469 0.9684 0.705 0.0624 0.3738
0.1105 0.0548 0.9314 0.7682 0.0619 0.4194
0.1687 0.0637 0.8448 0.75 0.0606 0.1961
0.1563 0.0692 0.85 0.8333 0.06 0.2
0.4373 0.051 0.8421 0.7037 0.0514 1.6172
0.3642 0.0147 1.5035 0.603 0.055 0.5245
0.7489 0.0199 0.9471 0.5502 0.0324 0.6252
0.304 0.0258 1.1216 0.5528 0.0588 0.4888
0.2894 0.0161 1.3588 0.4942 0.06 0.6418
0.6478 0.0129 1.3286 0.5558 0.0411 0.6026
0.3865 0.0333 0.9158 0.5154 0.0541 0.4956
0.3934 0.0403 0.8406 0.5123 0.0556 0.4945
0.484 0.0319 0.9286 0.4949 0.0556 0.5979
0.2565 0.0618 0.7857 0.6361 0.06 0.3743];
%test set is first row irregular, remaining regular.
%NGC 4496B,NGC 5846,NGC 5846A,NGC 5865,NGC 5868,NGC 4310
%NGC 6070,UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242
%NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
group = svmclassify(svmStruct,test,'showplot',true);
%===============================================================
% elliptical versus not elliptical
training=[0.0577 0.059 0.8803 0.7329 0.0624 0.2386
0.0175 0.0585 0.9 0.7582 0.0624 0.1724
0.054 0.0497 0.9521 0.7206 0.0625 0.1144
0.0316 0.0767 0.7955 0.7769 0.0625 0.2979
0.1817 0.0733 0.7895 0.75 0.0609 0.303
0.5137 0.0393 0.8707 0.651 0.0458 0.4838
0.5666 0.0372 0.9038 0.6854 0.0444 0.1155
0.3609 0.0482 0.8878 0.6913 0.055 0.0932
0.6616 0.0284 0.9259 0.6455 0.0377 0.2113
0.3547 0.047 0.871 0.6524 0.0546 0.219
0.4334 0.0457 0.8917 0.6918 0.0525 0.1033
0.461 0.0428 0.8625 0.6395 0.0498 0.5098
0.2049 0.0629 0.84 0.74 0.06 0.2342
0.1287 0.0718 0.8158 0.7841 0.0609 0.2609
0.1891 0.0412 1.011 0.6946 0.0606 0.1607
0.7442 0.0141 1.0774 0.5336 0.0306 0.5857
0.6239 0.0347 0.9126 0.7114 0.0406 0.4327
0.411 0.0415 0.9273 0.6789 0.0525 0.2029
0.7521 0.0145 1.0337 0.497 0.0312 1.0387
0.1306 0.0462 0.9239 0.6423 0.0614 0.3473
0.5882 0.0291 0.9153 0.5535 0.0441 0.1941
0.4012 0.0192 1.3214 0.6382 0.0527 0.4836
0.5257 0.0356 0.9414 0.7045 0.0448 0.1554
0.7802 0.0155 1 0.5871 0.0264 0.2839
0.5228 0.0184 1.2409 0.5717 0.0496 0.3557
0.505 0.0305 0.9783 0.5842 0.0499 0.3856
0.4325 0.0322 0.8718 0.4838 0.0506 0.443
0.5556 0.0282 0.8941 0.5242 0.0429 0.6762
0.521 0.0233 1.119 0.6573 0.0443 0.3256
0.453 0.044 0.8519 0.6118 0.0521 0.2796
0.7246 0.0227 0.9924 0.6446 0.0347 0.7385
0.5626 0.0248 1.115 0.6618 0.0466 0.9537
0.6077 0.0318 0.9091 0.657 0.04 0.5912
0.6071 0.0219 1.0536 0.5515 0.0441 0.3169];
%NGC 4278,NGC 4283,NGC 4308,NGC 5813,NGC 5831,NGC 4346,NGC 4460
%NGC 4251,NGC 4220,NGC 4346,NGC 4324,NGC 5854,NGC 5838
%NGC 5839,NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288
%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850
%NGC 5806,NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527
%NGC 4389,NGC 4496,NGC 4085,NGC 4096
% 1 for Elliptical 0 for Not Elliptical
Y=[1 1 1 1 ...
1 0 0 0 0 0 ...
0 0 0 0 0 0 ...
0 0 0 0 0 0 ...
0 0 0 0 0 0 ...
0 0 0 0 0 0];
Y=Y';
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(reduced_training,Y,'kernel_function',...
'quadratic','showplot',true);
% 'kernel_function','quadratic'
test=[0.0474 0.0469 0.9684 0.705 0.0624 0.3738
0.1105 0.0548 0.9314 0.7682 0.0619 0.4194
0.5203 0.032 0.9405 0.625 0.0454 0.64
0.1687 0.0637 0.8448 0.75 0.0606 0.1961
0.1563 0.0692 0.85 0.8333 0.06 0.2
0.4373 0.051 0.8421 0.7037 0.0514 1.6172
0.3642 0.0147 1.5035 0.603 0.055 0.5245
0.7489 0.0199 0.9471 0.5502 0.0324 0.6252
0.304 0.0258 1.1216 0.5528 0.0588 0.4888
0.2894 0.0161 1.3588 0.4942 0.06 0.6418
0.6478 0.0129 1.3286 0.5558 0.0411 0.6026
0.3865 0.0333 0.9158 0.5154 0.0541 0.4956
0.3934 0.0403 0.8406 0.5123 0.0556 0.4945
0.484 0.0319 0.9286 0.4949 0.0556 0.5979
0.2565 0.0618 0.7857 0.6361 0.06 0.3743];
%NGC 5846,NGC 5846A,NGC 5864,NGC 5865,NGC 5868,NGC 4310
%NGC 6070,UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242
%NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
group = svmclassify(svmStruct,reduced_test,'showplot',true);
%===============================================================
% lenticular versus spiral
clear all; close all; clc
training=[0.5137 0.0393 0.8707 0.651 0.0458 0.4838
0.5666 0.0372 0.9038 0.6854 0.0444 0.1155
0.3609 0.0482 0.8878 0.6913 0.055 0.0932
0.6616 0.0284 0.9259 0.6455 0.0377 0.2113
0.3547 0.047 0.871 0.6524 0.0546 0.219
0.4334 0.0457 0.8917 0.6918 0.0525 0.1033
0.461 0.0428 0.8625 0.6395 0.0498 0.5098
0.2049 0.0629 0.84 0.74 0.06 0.2342
0.1287 0.0718 0.8158 0.7841 0.0609 0.2609
0.1891 0.0412 1.011 0.6946 0.0606 0.1607
0.7442 0.0141 1.0774 0.5336 0.0306 0.5857
0.6239 0.0347 0.9126 0.7114 0.0406 0.4327
0.411 0.0415 0.9273 0.6789 0.0525 0.2029
0.7521 0.0145 1.0337 0.497 0.0312 1.0387
0.1306 0.0462 0.9239 0.6423 0.0614 0.3473
0.5882 0.0291 0.9153 0.5535 0.0441 0.1941
0.4012 0.0192 1.3214 0.6382 0.0527 0.4836
0.5257 0.0356 0.9414 0.7045 0.0448 0.1554
0.7802 0.0155 1 0.5871 0.0264 0.2839
0.5228 0.0184 1.2409 0.5717 0.0496 0.3557
0.505 0.0305 0.9783 0.5842 0.0499 0.3856
0.4325 0.0322 0.8718 0.4838 0.0506 0.443
0.5556 0.0282 0.8941 0.5242 0.0429 0.6762
0.521 0.0233 1.119 0.6573 0.0443 0.3256
0.453 0.044 0.8519 0.6118 0.0521 0.2796
0.7246 0.0227 0.9924 0.6446 0.0347 0.7385
0.5626 0.0248 1.115 0.6618 0.0466 0.9537
0.6077 0.0318 0.9091 0.657 0.04 0.5912
0.6071 0.0219 1.0536 0.5515 0.0441 0.3169];
%NGC 4346,NGC 4460,NGC 4251,NGC 4220,NGC 4346,NGC 4324,NGC 5854
%NGC 5838,NGC 5839,NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288
%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850,NGC 5806
%NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527,NGC 4389
%NGC 4496,NGC 4085,NGC 4096
% 1 for Lenticular 0 for Spiral
Y=[1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
Y=Y';
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(reduced_training,Y,'showplot',true);
%,'kernel_function','quadratic'
test=[0.5203 0.032 0.9405 0.625 0.0454 0.64
0.1687 0.0637 0.8448 0.75 0.0606 0.1961
0.1563 0.0692 0.85 0.8333 0.06 0.2
0.4373 0.051 0.8421 0.7037 0.0514 1.6172
0.3642 0.0147 1.5035 0.603 0.055 0.5245
0.7489 0.0199 0.9471 0.5502 0.0324 0.6252
0.304 0.0258 1.1216 0.5528 0.0588 0.4888
0.2894 0.0161 1.3588 0.4942 0.06 0.6418
0.6478 0.0129 1.3286 0.5558 0.0411 0.6026
0.3865 0.0333 0.9158 0.5154 0.0541 0.4956
0.3934 0.0403 0.8406 0.5123 0.0556 0.4945
0.484 0.0319 0.9286 0.4949 0.0556 0.5979
0.2565 0.0618 0.7857 0.6361 0.06 0.3743];
%NGC 5864,NGC 5865,NGC 5868,NGC 4310,NGC 6070
%UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242
%NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
group = svmclassify(svmStruct,reduced_test,'showplot',true);
%===============================================================
% Simple spiral versus barred spiral
training=[0.1891 0.0412 1.011 0.6946 0.0606 0.1607
0.7442 0.0141 1.0774 0.5336 0.0306 0.5857
0.6239 0.0347 0.9126 0.7114 0.0406 0.4327
0.411 0.0415 0.9273 0.6789 0.0525 0.2029
0.7521 0.0145 1.0337 0.497 0.0312 1.0387
0.1306 0.0462 0.9239 0.6423 0.0614 0.3473
0.5882 0.0291 0.9153 0.5535 0.0441 0.1941
0.4012 0.0192 1.3214 0.6382 0.0527 0.4836
0.5257 0.0356 0.9414 0.7045 0.0448 0.1554
0.7802 0.0155 1 0.5871 0.0264 0.2839
0.5228 0.0184 1.2409 0.5717 0.0496 0.3557
0.505 0.0305 0.9783 0.5842 0.0499 0.3856
0.4325 0.0322 0.8718 0.4838 0.0506 0.443
0.5556 0.0282 0.8941 0.5242 0.0429 0.6762
0.521 0.0233 1.119 0.6573 0.0443 0.3256
0.453 0.044 0.8519 0.6118 0.0521 0.2796
0.7246 0.0227 0.9924 0.6446 0.0347 0.7385
0.5626 0.0248 1.115 0.6618 0.0466 0.9537
0.6077 0.0318 0.9091 0.657 0.04 0.5912
0.6071 0.0219 1.0536 0.5515 0.0441 0.3169];
%NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288,NGC 4457
%NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850,NGC 5806
%NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527
%NGC 4389,NGC 4496,NGC 4085,NGC 4096
% 1 for Spiral 0 for Barred Spiral
Y=[1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
Y=Y';
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(reduced_training,Y,'kernel_function',...
'quadratic','showplot',true);
%,'kernel_function','quadratic'
test=[0.3642 0.0147 1.5035 0.603 0.055 0.5245
0.7489 0.0199 0.9471 0.5502 0.0324 0.6252
0.304 0.0258 1.1216 0.5528 0.0588 0.4888
0.2894 0.0161 1.3588 0.4942 0.06 0.6418
0.6478 0.0129 1.3286 0.5558 0.0411 0.6026
0.3865 0.0333 0.9158 0.5154 0.0541 0.4956
0.3934 0.0403 0.8406 0.5123 0.0556 0.4945
0.484 0.0319 0.9286 0.4949 0.0556 0.5979
0.2565 0.0618 0.7857 0.6361 0.06 0.3743];
%NGC 6070,UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242
%NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
group = svmclassify(svmStruct,reduced_test,'showplot',true);
A.2.2 Enhanced data
training=[0.2028 0.0399 1.0698 0.7504 0.0608 1.0613
0.2187 0.0379 1.02 0.646 0.0611 0.4876
0.4311 0.0116 1.5897 0.5422 0.0541 1.4891
0.0873 0.0179 1.4145 0.5727 0.0625 0.2709
0.1025 0.0493 0.9038 0.6488 0.0621 0.2294
0.0616 0.0223 1.3416 0.6442 0.0623 0.2499
0.0439 0.0594 0.8462 0.6845 0.0621 0.2609
0.0498 0.0386 1.0259 0.6544 0.062 0.4055
0.066 0.0297 1.2147 0.7027 0.0624 0.297
0.1106 0.0612 0.8429 0.7007 0.062 0.1972
0.563 0.0361 0.9012 0.6811 0.043 0.3022
0.5703 0.0343 0.9 0.6169 0.045 0.1646
0.4029 0.0437 0.85 0.6012 0.0525 0.203
0.6352 0.0297 0.8824 0.5779 0.04 0.3413
0.5132 0.0402 0.8814 0.6323 0.0494 0.0874
0.4404 0.0377 0.9557 0.6677 0.0516 0.2233
0.4778 0.0393 0.9455 0.7083 0.0496 0.4565
0.2595 0.0531 0.89 0.7148 0.0589 0.1188
0.1686 0.0687 0.7857 0.6927 0.0612 0.2556
0.5027 0.0415 0.8942 0.6748 0.0492 0.1393
0.1871 0.0386 1.0429 0.6948 0.0603 0.2403
0.6458 0.0125 1.1555 0.4424 0.0377 0.6822
0.606 0.0331 0.9237 0.6895 0.0409 0.3285
0.3777 0.047 0.8939 0.6923 0.0542 0.23
0.7385 0.0209 0.9314 0.5232 0.0347 0.8889
0.2116 0.044 0.9107 0.6126 0.0596 1.8642
0.5645 0.0352 0.9286 0.6695 0.0454 0.2674
0.4421 0.0279 1.1424 0.7056 0.0516 0.3175
0.5088 0.0314 1.0037 0.6476 0.0489 0.1525
0.7701 0.0192 0.9634 0.6308 0.0283 0.2815
0.4965 0.0186 1.2083 0.4951 0.0548 0.4783
0.487 0.0362 0.9202 0.6272 0.0488 0.3401
0.3847 0.043 0.8571 0.55 0.0574 0.3939
0.5871 0.0271 0.9656 0.6006 0.042 0.5289
0.5288 0.0309 0.9645 0.6443 0.0446 0.3304
0.4683 0.0366 0.9167 0.6028 0.051 0.4332
0.4643 0.042 0.9167 0.672 0.0525 0.4112
0.5166 0.0175 1.297 0.5687 0.0518 0.4288
0.6254 0.0331 0.8987 0.6617 0.0404 0.5547
0.6314 0.031 0.9389 0.6861 0.0398 0.1997];
%NGC 4449,UGC 7408,UGC 7577,UGC 7639,UGC 7690
%NGC 4278,NGC 4283,NGC 4308,NGC 5813,NGC 5831
%NGC 4346,NGC 4460,NGC 4251,NGC 4220,NGC 4346
%NGC 4324,NGC 5854,NGC 5838,NGC 5839,NGC 5864
%NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288
%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157
%NGC 5850,NGC 5806,NGC 4232,NGC 4088,NGC 4258
%NGC 4527,NGC 4389,NGC 4496,NGC 4085,NGC 4096
Y=['I'; 'I'; 'I'; 'I'; 'I'; 'R'; 'R'; 'R'; 'R'; 'R';...
'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R';...
'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R';...
'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R'];
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(reduced_training,Y,'kernel_function',...
'quadratic','showplot',true);
%,'kernel_function','quadratic'
test=[0.1574 0.0321 1.1029 0.6243 0.0625 0.42
0.0188 0.0368 1.0887 0.6989 0.0625 0.3746
0.0763 0.0406 1.0671 0.7388 0.0625 0.9791
0.1338 0.0592 0.8514 0.6912 0.0621 0.2128
0.0194 0.0653 0.875 0.8 0.0625 0.3
0.4014 0.0365 0.9178 0.6007 0.0512 0.9695
0.3985 0.0389 0.9258 0.6263 0.0533 0.3471
0.7009 0.0124 1.1456 0.3995 0.0406 0.5291
0.2914 0.0296 1.067 0.577 0.0583 0.3457
0.246 0.0247 1.1345 0.5328 0.0597 0.5092
0.5692 0.0229 1.0414 0.5557 0.0447 0.5544
0.2172 0.0124 1.5533 0.4904 0.0611 0.5086
0.3022 0.0153 1.3596 0.4925 0.0576 0.7544
0.3798 0.029 1.0152 0.4938 0.0604 0.4923
0.2377 0.0559 0.8617 0.6898 0.0602 0.2997];
%test set is first row irregular, remaining regular.
%NGC 4496B,NGC 5846,NGC 5846A,NGC 5865,NGC 5868,NGC 4310
%NGC 6070,UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242
%NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
GROUP = svmclassify(svmStruct,reduced_test,'showplot',true);
%===============================================================
% Elliptical versus not elliptical
training=[0.0616 0.0223 1.3416 0.6442 0.0623 0.2499
0.0439 0.0594 0.8462 0.6845 0.0621 0.2609
0.0498 0.0386 1.0259 0.6544 0.062 0.4055
0.066 0.0297 1.2147 0.7027 0.0624 0.297
0.1106 0.0612 0.8429 0.7007 0.062 0.1972
0.563 0.0361 0.9012 0.6811 0.043 0.3022
0.5703 0.0343 0.9 0.6169 0.045 0.1646
0.4029 0.0437 0.85 0.6012 0.0525 0.203
0.6352 0.0297 0.8824 0.5779 0.04 0.3413
0.5132 0.0402 0.8814 0.6323 0.0494 0.0874
0.4404 0.0377 0.9557 0.6677 0.0516 0.2233
0.4778 0.0393 0.9455 0.7083 0.0496 0.4565
0.2595 0.0531 0.89 0.7148 0.0589 0.1188
0.1686 0.0687 0.7857 0.6927 0.0612 0.2556
0.1871 0.0386 1.0429 0.6948 0.0603 0.2403
0.6458 0.0125 1.1555 0.4424 0.0377 0.6822
0.606 0.0331 0.9237 0.6895 0.0409 0.3285
0.3777 0.047 0.8939 0.6923 0.0542 0.23
0.7385 0.0209 0.9314 0.5232 0.0347 0.8889
0.2116 0.044 0.9107 0.6126 0.0596 1.8642
0.5645 0.0352 0.9286 0.6695 0.0454 0.2674
0.4421 0.0279 1.1424 0.7056 0.0516 0.3175
0.5088 0.0314 1.0037 0.6476 0.0489 0.1525
0.7701 0.0192 0.9634 0.6308 0.0283 0.2815
0.4965 0.0186 1.2083 0.4951 0.0548 0.4783
0.487 0.0362 0.9202 0.6272 0.0488 0.3401
0.3847 0.043 0.8571 0.55 0.0574 0.3939
0.5871 0.0271 0.9656 0.6006 0.042 0.5289
0.5288 0.0309 0.9645 0.6443 0.0446 0.3304
0.4683 0.0366 0.9167 0.6028 0.051 0.4332
0.4643 0.042 0.9167 0.672 0.0525 0.4112
0.5166 0.0175 1.297 0.5687 0.0518 0.4288
0.6254 0.0331 0.8987 0.6617 0.0404 0.5547
0.6314 0.031 0.9389 0.6861 0.0398 0.1997];
%NGC 4278,NGC 4283,NGC 4308,NGC 5813,NGC 5831,NGC 4346,NGC 4460
%NGC 4251,NGC 4220,NGC 4346,NGC 4324,NGC 5854,NGC 5838
%NGC 5839,NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288
%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850
%NGC 5806,NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527
%NGC 4389,NGC 4496,NGC 4085,NGC 4096
% 1 for Elliptical 0 for Not Elliptical
Y=[1 1 1 1 ...
1 0 0 0 0 0 ...
0 0 0 0 0 0 ...
0 0 0 0 0 0 ...
0 0 0 0 0 0 ...
0 0 0 0 0 0];
Y=Y';
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(reduced_training,Y,'showplot',true);
% 'kernel_function','quadratic'
test=[0.0188 0.0368 1.0887 0.6989 0.0625 0.3746
0.0763 0.0406 1.0671 0.7388 0.0625 0.9791
0.5027 0.0415 0.8942 0.6748 0.0492 0.1393
0.1338 0.0592 0.8514 0.6912 0.0621 0.2128
0.0194 0.0653 0.875 0.8 0.0625 0.3
0.4014 0.0365 0.9178 0.6007 0.0512 0.9695
0.3985 0.0389 0.9258 0.6263 0.0533 0.3471
0.7009 0.0124 1.1456 0.3995 0.0406 0.5291
0.2914 0.0296 1.067 0.577 0.0583 0.3457
0.246 0.0247 1.1345 0.5328 0.0597 0.5092
0.5692 0.0229 1.0414 0.5557 0.0447 0.5544
0.2172 0.0124 1.5533 0.4904 0.0611 0.5086
0.3022 0.0153 1.3596 0.4925 0.0576 0.7544
0.3798 0.029 1.0152 0.4938 0.0604 0.4923
0.2377 0.0559 0.8617 0.6898 0.0602 0.2997];
%NGC 5846,NGC 5846A,NGC 5864,NGC 5865,NGC 5868,
%NGC 4310,NGC 6070,UGC 07617,NGC 4480,UGC 10133,NGC 4559,
%NGC 4242,NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
group = svmclassify(svmStruct,reduced_test,'showplot',true);
%===============================================================
% Lenticular versus spiral
training=[0.563 0.0361 0.9012 0.6811 0.043 0.3022
0.5703 0.0343 0.9 0.6169 0.045 0.1646
0.4029 0.0437 0.85 0.6012 0.0525 0.203
0.6352 0.0297 0.8824 0.5779 0.04 0.3413
0.5132 0.0402 0.8814 0.6323 0.0494 0.0874
0.4404 0.0377 0.9557 0.6677 0.0516 0.2233
0.4778 0.0393 0.9455 0.7083 0.0496 0.4565
0.2595 0.0531 0.89 0.7148 0.0589 0.1188
0.1686 0.0687 0.7857 0.6927 0.0612 0.2556
0.1871 0.0386 1.0429 0.6948 0.0603 0.2403
0.6458 0.0125 1.1555 0.4424 0.0377 0.6822
0.606 0.0331 0.9237 0.6895 0.0409 0.3285
0.3777 0.047 0.8939 0.6923 0.0542 0.23
0.7385 0.0209 0.9314 0.5232 0.0347 0.8889
0.2116 0.044 0.9107 0.6126 0.0596 1.8642
0.5645 0.0352 0.9286 0.6695 0.0454 0.2674
0.4421 0.0279 1.1424 0.7056 0.0516 0.3175
0.5088 0.0314 1.0037 0.6476 0.0489 0.1525
0.7701 0.0192 0.9634 0.6308 0.0283 0.2815
0.4965 0.0186 1.2083 0.4951 0.0548 0.4783
0.487 0.0362 0.9202 0.6272 0.0488 0.3401
0.3847 0.043 0.8571 0.55 0.0574 0.3939
0.5871 0.0271 0.9656 0.6006 0.042 0.5289
0.5288 0.0309 0.9645 0.6443 0.0446 0.3304
0.4683 0.0366 0.9167 0.6028 0.051 0.4332
0.4643 0.042 0.9167 0.672 0.0525 0.4112
0.5166 0.0175 1.297 0.5687 0.0518 0.4288
0.6254 0.0331 0.8987 0.6617 0.0404 0.5547
0.6314 0.031 0.9389 0.6861 0.0398 0.1997];
%NGC 4346,NGC 4460,NGC 4251,NGC 4220,NGC 4346,NGC 4324,NGC 5854
%NGC 5838,NGC 5839,NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288
%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850,NGC 5806
%NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527,NGC 4389
%NGC 4496,NGC 4085,NGC 4096
% 1 for Lenticular 0 for Spiral
Y=[1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
Y=Y';
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(reduced_training,Y,'showplot',true);
% 'kernel_function','quadratic',
test=[0.5027 0.0415 0.8942 0.6748 0.0492 0.1393
0.1338 0.0592 0.8514 0.6912 0.0621 0.2128
0.0194 0.0653 0.875 0.8 0.0625 0.3
0.4014 0.0365 0.9178 0.6007 0.0512 0.9695
0.3985 0.0389 0.9258 0.6263 0.0533 0.3471
0.7009 0.0124 1.1456 0.3995 0.0406 0.5291
0.2914 0.0296 1.067 0.577 0.0583 0.3457
0.246 0.0247 1.1345 0.5328 0.0597 0.5092
0.5692 0.0229 1.0414 0.5557 0.0447 0.5544
0.2172 0.0124 1.5533 0.4904 0.0611 0.5086
0.3022 0.0153 1.3596 0.4925 0.0576 0.7544
0.3798 0.029 1.0152 0.4938 0.0604 0.4923
0.2377 0.0559 0.8617 0.6898 0.0602 0.2997];
%NGC 5864,NGC 5865,NGC 5868,NGC 4310,NGC 6070
%UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242
%NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
group = svmclassify(svmStruct,reduced_test,'showplot',true);
%===============================================================
% Simple spiral versus barred spiral
training=[0.1871 0.0386 1.0429 0.6948 0.0603 0.2403
0.6458 0.0125 1.1555 0.4424 0.0377 0.6822
0.606 0.0331 0.9237 0.6895 0.0409 0.3285
0.3777 0.047 0.8939 0.6923 0.0542 0.23
0.7385 0.0209 0.9314 0.5232 0.0347 0.8889
0.2116 0.044 0.9107 0.6126 0.0596 1.8642
0.5645 0.0352 0.9286 0.6695 0.0454 0.2674
0.4421 0.0279 1.1424 0.7056 0.0516 0.3175
0.5088 0.0314 1.0037 0.6476 0.0489 0.1525
0.7701 0.0192 0.9634 0.6308 0.0283 0.2815
0.4965 0.0186 1.2083 0.4951 0.0548 0.4783
0.487 0.0362 0.9202 0.6272 0.0488 0.3401
0.3847 0.043 0.8571 0.55 0.0574 0.3939
0.5871 0.0271 0.9656 0.6006 0.042 0.5289
0.5288 0.0309 0.9645 0.6443 0.0446 0.3304
0.4683 0.0366 0.9167 0.6028 0.051 0.4332
0.4643 0.042 0.9167 0.672 0.0525 0.4112
0.5166 0.0175 1.297 0.5687 0.0518 0.4288
0.6254 0.0331 0.8987 0.6617 0.0404 0.5547
0.6314 0.031 0.9389 0.6861 0.0398 0.1997];
%NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288,NGC 4457
%NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850,NGC 5806
%NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527
%NGC 4389,NGC 4496,NGC 4085,NGC 4096
% 1 for Spiral 0 for Barred Spiral
Y=[1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
Y=Y';
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(reduced_training,Y,'kernel_function',...
'quadratic','showplot',true);
%'kernel_function','quadratic',
test=[0.3985 0.0389 0.9258 0.6263 0.0533 0.3471
0.7009 0.0124 1.1456 0.3995 0.0406 0.5291
0.2914 0.0296 1.067 0.577 0.0583 0.3457
0.246 0.0247 1.1345 0.5328 0.0597 0.5092
0.5692 0.0229 1.0414 0.5557 0.0447 0.5544
0.2172 0.0124 1.5533 0.4904 0.0611 0.5086
0.3022 0.0153 1.3596 0.4925 0.0576 0.7544
0.3798 0.029 1.0152 0.4938 0.0604 0.4923
0.2377 0.0559 0.8617 0.6898 0.0602 0.2997];
%NGC 6070,UGC 7617,NGC 4480,UGC 10133,NGC 4559,NGC 4242
%NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
group = svmclassify(svmStruct,reduced_test,'showplot',true);
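% For readers without MATLAB's svmtrain/svmclassify (removed in recent MATLAB
% releases), the same PCA-plus-quadratic-SVM pipeline can be sketched in Python
% with scikit-learn. The data below are random stand-ins, not the galaxy feature
% vectors; note also that, unlike the listings above, this sketch projects the
% test set with the principal components fitted on the training set rather than
% refitting PCA on the test set, which is the usual practice.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-in data: rows are galaxies, columns are six shape features.
X_train = rng.random((20, 6))
y_train = np.array([1] * 5 + [0] * 15)  # e.g. 1 = spiral, 0 = barred spiral

# Project onto the first two principal components (cf. pca + coeff(:,1:2)).
pca = PCA(n_components=2).fit(X_train)
X_red = pca.transform(X_train)

# Quadratic-kernel SVM (cf. svmtrain with 'kernel_function','quadratic').
clf = SVC(kernel="poly", degree=2).fit(X_red, y_train)

# Classify new samples after projecting them with the SAME training basis.
X_test = rng.random((5, 6))
group = clf.predict(pca.transform(X_test))
print(group.shape)
```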
BIBLIOGRAPHY
[1] Lintott, Chris J. and Schawinski, Kevin and Slosar, Anze and Land, Kate and Bamford, Steven
and others. Galaxy Zoo: morphologies derived from visual inspection of galaxies from the
Sloan Digital Sky Survey. Mon. Not. R. Astron. Soc., 389, 1179-1189, 2008.
[2] R. A. Skibba, S. P. Bamford, R. C. Nichol, C. J. Lintott, D. Andreescu, E. M. Edmondson,
P. Murray and M. J. Raddick. Galaxy Zoo: disentangling the environmental dependence of
morphology and colour. Mon. Not. R. Astron. Soc., 399, 966-982, 2009.
[3] Shabala, S. S. and Ting, Y. S. and Kaviraj, S. and Lintott, C. and Crockett, R. M. and Silk, J.
and Sarzi, M. and Schawinski, K. and Bamford, S. P. and Edmondson, E. Galaxy Zoo: dust
lane early-type galaxies are tracers of recent, gas-rich minor mergers. Mon. Not. R. Astron.
Soc., 423, 59-67, 2012.
[4] Hubble, E. P. Extragalactic nebulae. Astrophysical Journal, 64, 321-369, 1926.
[5] de Vaucouleurs, G. Classification and Morphology of External Galaxies. Handbuch der
Physik, 53, 275-310, 1959.
[6] de Vaucouleurs, G. Revised Classification of 1500 Bright Galaxies. Astrophysical Journal
Supplement, 8, 31, 1963.
[7] Hubble, E. P. The Realm of the Nebulae. Yale University Press, 1936.
[8] Abazajian et al. The First Data Release of the Sloan Digital Sky Survey. The Astronomical
Journal, 126, 2081-2086, 2003.
[9] Patrick Morrissey et al. The Calibration and Data Products of GALEX. The Astrophysical
Journal Supplement Series , 173, 682, 2007.
[10] Neugebauer, G. and Leighton, R. B. Two-micron sky survey. A preliminary catalogue. NASA
SP-3047, Washington: NASA, 1969.
[11] Skrutskie et al. The Two Micron All Sky Survey (2MASS). The Astronomical Journal, 131,
1163-1183, 2006.
[12] Lasker et al. The Second-Generation Guide Star Catalog: Description and Properties. The
Astronomical Journal, 136, 735-766, 2008.
[13] Reid et al. The Second Palomar Sky Survey. Publications of the Astronomical Society of the
Pacific, 103, 661-674, 1991.
[14] W. Voges et al. The ROSAT all-sky survey bright source catalogue. Astronomy and Astro-
physics, 349, 389-405, 1999.
[15] Becker, R. H. and White, R. L. and Helfand, D. J. The FIRST Survey: Faint Images of the
Radio Sky at Twenty Centimeters. The Astrophysical Journal, 450, 559, 1995.
[16] Becker, R. H. and White, R. L. and Helfand, D. J. The VLA’s FIRST Survey. Astronomical
Society of the Pacific Conference Series, 61, 165, 1994.
[17] Epchtein, N. et al. The deep near-infrared southern sky survey (DENIS). The Messenger, 87,
27-34, 1997.
[18] Storrie-Lombardi, MC et al. Morphological Classification of Galaxies by Artificial Neural
Networks. Mon. Not. R. Astron. Soc., 259, 8-12, 1992.
[19] Owens, E. A. and Griffiths, R. E. and Ratnatunga, K. U. Using oblique decision trees for the
morphological classification of galaxies. Mon. Not. R. Astron. Soc., 281, 153-157, 1996.
[20] Naim, A. and Lahav, O. and Sodre, Jr., L. and Storrie-Lombardi, M. C. Automated morpho-
logical classification of APM galaxies by supervised artificial neural networks. Mon. Not. R.
Astron. Soc., 275, 567-590, 1995.
[21] Nicholas M. Ball. Morphological Classification of Galaxies Using Artificial Neural Net-
works. MSc thesis, University of Sussex, UK, 2001.
[22] Lahav, O. Artificial neural networks as a tool for galaxy classification. Data Analysis in
Astronomy, 43-51, 1997.
[23] Folkes, S. R. and Lahav, O. and Maddox, S. J. An artificial neural network approach to the
classification of galaxy spectra. Mon. Not. R. Astron. Soc., 283, 651-665, 1996.
[24] Bershady, M. A. and Jangren, A. and Conselice, C. J. Structural and Photometric Classifica-
tion of Galaxies. I. Calibration Based on a Nearby Galaxy Sample. The Astronomical Journal,
119, 2645-2663, 2000.
[25] D. Bazell and David W. Aha Ensembles of Classifiers for Morphological Galaxy Classifica-
tion. The Astrophysical Journal, 548, 219, 2001.
[26] Goderya, Shaukat N. and Lolling, Shawn M. Morphological Classification of Galaxies using
Computer Vision and Artificial Neural Networks: A Computational Scheme. Astrophysics and
Space Science, 279, 377-387, 2002.
[27] D. Bazell. Feature relevance in morphological galaxy classification. Mon. Not. R. Astron.
Soc., 316, 519-528, 2000.
[28] Strateva, I. et al. Color Separation of Galaxy Types in the Sloan Digital Sky Survey Imaging
Data. The Astronomical Journal, 122, 1861-1874, 2001.
[29] Abraham, R. G. and van den Bergh, S. and Nair, P. A New Approach to Galaxy Morphology.
I. Analysis of the Sloan Digital Sky Survey Early Data Release. The Astrophysical Journal,
588, 218-229, 2003.
[30] D. Bazell and David W. Aha Ensembles of Classifiers for Morphological Galaxy Classifica-
tion. The Astrophysical Journal, 548, 219, 2001.
[31] de la Calleja, J. and Fuentes, O. Machine learning and image analysis for morphological
galaxy classification. Mon. Not. R. Astron. Soc., 349, 87-93, 2001.
[32] C. Scarlata et al. COSMOS Morphological Classification with the Zurich Estimator of Struc-
tural Types (ZEST) and the Evolution Since z = 1 of the Luminosity Function of Early, Disk,
and Irregular Galaxies. The Astrophysical Journal Supplement Series, 172, 406-433, 2007.
[33] Banerji, M. et al. Galaxy Zoo: reproducing galaxy morphologies via machine learning. Mon.
Not. R. Astron. Soc., 406, 342-353, 2010.
[34] Goderya, S. and Andreasen, J. D. and Philip. Advances in Automated Algorithms For Mor-
phological Classification of Galaxies Based on Shape Features. Astronomical Data Analysis
Software and Systems (ADASS) XIII, 314, 617, 2004.
[35] Odewahn, S. C. Automated galaxy classification in large sky surveys. Neural Networks,
1999. IJCNN ’99. International Joint Conference on, 6, 3824-3829, 1999.
[36] Odewahn, S. C. Automated galaxy classification with the APS digitization of POSS I. Astro-
physical Letters and Communications, 31, 55-64, 1995.
[37] Odewahn, S. C. and Windhorst, R. A. and Driver, S. P. and Keel, W. C. Automated Morpho-
logical Classification in Deep Hubble Space Telescope UBVI Fields: Rapidly and Passively
Evolving Faint Galaxy Populations. Astrophysical Journal Letters, 472, L13-L16, 1996.
[38] Maehoenen, P. H. and Hakala, P. J. Automated Source Classification Using a Kohonen Net-
work. Astrophysical Journal Letters, 452, L77, 1995.
[39] Cortiglioni, F. and Mähönen, P. and Hakala, P. and Frantti, T. Automated Star-Galaxy Dis-
crimination for Large Surveys. The Astrophysical Journal, 556, 937-943, 2001.
[40] Baillard, A. and Bertin, E. and Mellier, Y. and McCracken, H. J. and Géraud, T. and Pelló,
R. and Leborgne, F. and Fouqué, P. Project EFIGI: Automatic Classification of Galaxies.
Astronomical Society of the Pacific, 351, 236, 2006.
[41] Byun, Hyeran and Lee, Seong-Whan. Applications of Support Vector Machines for Pattern
Recognition: A Survey. Proceedings of the First International Workshop on Pattern Recogni-
tion with Support Vector Machines, 213-236, 2002.
[42] Romano, Raquel A. and Aragon, Cecilia R. and Ding, Chris. Supernova Recognition Us-
ing Support Vector Machines. Proceedings of the 5th International Conference on Machine
Learning and Applications, 77-82, 2006.
[43] Huertas-Company, M. et al. A robust morphological classification of high-redshift galaxies
using support vector machines on seeing limited images. Astronomy & Astrophysics, 497,
743-753, 2009.
[44] Freed, M. and Jeonghwa Lee. Application of Support Vector Machines to the Classifica-
tion of Galaxy Morphologies. Computational and Information Sciences (ICCIS), 2013 Fifth
International Conference on, 322-325, 2013.
[45] Saybani et al. Applications of support vector machines in oil refineries: A survey. Interna-
tional Journal of the Physical Sciences, 6(27), 6295-6302, 2011.
[46] Xie W, Yu L, Xu S, Wang S. A New Method for Crude Oil Price Forecasting Based on
Support Vector Machines. Computational Science-ICCS, 3994, 444-451, 2006.
[47] Petkovic, Milena R. and Rapaic, Milan R. and Jakovljevic, Boris B. Electrical Energy Con-
sumption Forecasting in Oil Refining Industry Using Support Vector Machines and Particle
Swarm Optimization. WSEAS Trans. Info. Sci. and App., 6(11), 1761-1770, 2009.
[48] Balabin RM, Safieva RZ, Lomakina EI. Gasoline classification using near infrared (NIR)
spectroscopy data: comparison of multivariate techniques. Anal Chim Acta., 671, 27-35, 2010.
[49] Guo, Guodong and Li, Stan Z. and Chan, Kapluk. Face Recognition by Support Vector
Machines. Proceedings of the Fourth IEEE International Conference on Automatic Face and
Gesture Recognition 2000, 196-201, 2000.
[50] Guodong Guo and Stan Z. Li and Kap Luk Chan. Support vector machines for face recogni-
tion. Image and Vision Computing, 19, 631-638, 2001.
[51] John C. Platt and Nello Cristianini and John Shawe-taylor. Large Margin DAGs for Multiclass
Classification. Advances in Neural Information Processing Systems, 547-553, 2000.
[52] Haizhou Ai and Luhong Liang and Guangyou Xu. Face detection based on template match-
ing and support vector machines. Image Processing, 2001. Proceedings. 2001 International
Conference on, 1, 1006-1009, 2001.
[53] Romdhani, S. and Torr, P. and Scholkopf, B. and Blake, A. Computationally efficient face
detection. Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Con-
ference on, 2, 695-700, 2001.
[54] Yongmin Li and Shaogang Gong and Liddell, H.. Support vector regression and classification
based multi-view face detection and recognition. Automatic Face and Gesture Recognition,
2000. Proceedings. Fourth IEEE International Conference on, 300-305, 2000.
[55] Ng, Jeffrey and Gong, Shaogang. Multi-View Face Detection and Pose Estimation Using a
Composite Support Vector Machine Across the View Sphere. Proceedings of the International
Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems,
14–, 1999.
[56] Jeffrey Ng and Shaogang Gong. Composite support vector machines for detection of faces
across views and pose estimation. Image and Vision Computing, 20, 359-368, 2002.
[57] Yongmin Li and Shaogang Gong and Sherrah, J. and Liddell, H. Multi-view face detection
using support vector machines and eigenspace modelling. Knowledge-Based Intelligent Engi-
neering Systems and Allied Technologies, 2000. Proceedings. Fourth International Conference
on, 1, 241-244, 2000.
[58] Osuna, E. and Freund, R. and Girosi, F. Training support vector machines: an application
to face detection. Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE
Computer Society Conference on, 130-136, 1997.
[59] Kumar, V.P. and Poggio, T. Learning-based approach to real time tracking and analysis of
faces. Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE Interna-
tional Conference on, 96-101, 2000.
[60] Yuan Qi and Doermann, D. and DeMenthon, D. Hybrid independent component analysis
and support vector machine learning scheme for face detection. Acoustics, Speech, and Signal
Processing, 2001. Proceedings. (ICASSP ’01). 2001 IEEE International Conference on, 3,
1481-1484, 2001.
[61] Terrillon, J.-C. and Shirazi, M.N. and Sadek, M. and Fukamachi, H. and Akamatsu, S. Hybrid
independent component analysis and support vector machine learning scheme for face detec-
tion. Pattern Recognition, 2000. Proceedings. 15th International Conference on, 4, 210-217,
2000.
[62] Papageorgiou, C.P. and Oren, M. and Poggio, T. A general framework for object detection.
Computer Vision, 1998. Sixth International Conference on, 555-562, 1998.
[63] Haizhou Ai and Luhong Liang and Guangyou Xu. Face detection based on template match-
ing and support vector machines. Image Processing, 2001. Proceedings. 2001 International
Conference on, 1, 1006-1009, 2001.
[64] Roobaert, D. and Van Hulle, M.M. View-based 3D object recognition with support vector
machines. Neural Networks for Signal Processing IX, 1999. Proceedings of the 1999 IEEE
Signal Processing Society Workshop., 77-84, 1999.
[65] Pontil, Massimiliano and Verri, Alessandro Support Vector Machines for 3D Object Recog-
nition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 637-646, 1998.
[66] Yingjie Wang and Chin-Seng Chua and Yeong-Khing Ho. Facial feature detection and face
recognition from 2D and 3D images. Pattern Recognition Letters, 23, 1191-1202, 2002.
[67] Kim, K.I and Kim, J. and Jung, K. Recognition of facial images using support vector ma-
chines. Statistical Signal Processing, 2001. Proceedings of the 11th IEEE Signal Processing
Workshop on, 468-471, 2001.
[68] M. Pittore, C. Basso, and A. Verri. Representing and recognizing visual dynamic events with
support vector machines. In Proceedings of Int. Conference on Image Analysis and Processing,
18-23, 1999.
[69] C. Nakajima, M. Pontil, and T. Poggio. People recognition and pose estimation in image
sequences. In Proceedings of IEEE Int. Joint Conference on Neural Net-works, 4, 189-194,
2000.
[70] S. Gutta, J.R.J. Huang, P. Jonathon, and H. Wechsler. Mixture of experts for classification of
gender, ethnic origin, and pose of human. IEEE Trans. on Neural Networks, 4, 948-960, 2001.
[71] Loo-Nin Teow and Kia-Fock Loe. Robust vision-based features and classification schemes
for off-line handwritten digit recognition. Pattern Recognition, 35, 2355-2364, 2002.
[72] Dashan Gao and Jie Zhou and Leping Xin. SVM-based detection of moving vehicles for
automatic traffic monitoring. Intelligent Transportation Systems, 2001. Proceedings. 2001
IEEE, 745-749, 2001.
[73] Kent, S. and Kasapoglu, N. G. and Kartal, M. Radar target classification based on support
vector machines and High Resolution Range Profiles. Radar Conference, 2008. RADAR ’08.
IEEE, 1-6, 2008.
[74] Choisy, C. and Belaid, A. Handwriting recognition using local methods for normalization
and global methods for recognition. Document Analysis and Recognition, 2001. Proceedings.
Sixth International Conference on, 23-27, 2001.
[75] Gorgevik, D. and Cakmakov, D. and Radevski, V.. Handwritten digit recognition by combin-
ing support vector machines using rule-based reasoning. Information Technology Interfaces,
2001. ITI 2001. Proceedings of the 23rd International Conference on, 1, 139-144, 2001.
[76] Junxian Li and Limin Shen and Shuo Yang. A Novel Radar Target Recognition Algorithm
Based on SVM. Intelligent Information Technology Application Workshops, 2008. IITAW ’08.
International Symposium on, 431-434, 2008.
[77] Oliveira, L. and Sabourin, R. Support vector machines for handwritten numerical string
recognition. Frontiers in Handwriting Recognition, 2004. IWFHR-9 2004. Ninth International
Workshop on, 39-44, 2004.
[78] Xin Dong and Wu Zhaohui. Speaker recognition using continuous density support vector
machines. Electronics Letters, 37, 1099-1101, 2001.
[79] Bengio, S. and Mariethoz, J. Learning the decision function for speaker verification. Acous-
tics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP ’01). 2001 IEEE Interna-
tional Conference on, 1, 425-428, 2001.
[80] Changxue Ma and Randolph, M.A and Drish, J. A support vector machines-based rejection
technique for speech recognition. Acoustics, Speech, and Signal Processing, 2001. Proceed-
ings. (ICASSP ’01). 2001 IEEE International Conference on, 35, 381-384, 2001.
[81] Wan, V. and Campbell, W.M. Support vector machines for speaker verification and identifi-
cation. Neural Networks for Signal Processing X, 2000. Proceedings of the 2000 IEEE Signal
Processing Society Workshop, 2, 775-784, 2000.
[82] Guo, G. and Hong-Jiang Zhang and Li, S.Z. Distance-from-boundary as a metric for texture
image retrieval. Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP ’01).
2001 IEEE International Conference on, 3, 1629-1632, 2001.
[83] Qi Tian and Hong, Pengyu and Huang, T.S. Update relevant image weights for content-based
image retrieval using support vector machines. Multimedia and Expo, 2000. ICME 2000. 2000
IEEE International Conference on, 2, 1199-1202, 2000.
[84] H. Drucker, B. Shahrary, and D.C. Gibbon. Support vector machines: relevance feedback and
information retrieval. Information Processing & Management, 3, 305-323, 2002.
[85] Lei Zhang and Fuzong Lin and Bo Zhang. Support vector machine learning for image re-
trieval. Image Processing, 2001. Proceedings. 2001 International Conference on, 2, 721-724,
2001.
[86] Francis E.H. Tay and L.J. Cao. Modified support vector machines in financial time series
forecasting. Neurocomputing, 48, 847 - 861, 2002.
[87] Fan, A and Palaniswami, M. Selecting bankruptcy predictors using a support vector ma-
chine approach. Neural Networks, 2000. IJCNN 2000, Proceedings of the IEEE-INNS-ENNS
International Joint Conference on, 6, 354-359, 2000.
[88] B. Moghaddam and M. H. Yang. Gender classification using support vector machines. In
Proceedings of IEEE Int. Conference on Image Processing, 2, 471-474, 2000.
[89] Yuan Yao and Gian Luca Marcialis and Massimiliano Pontil and Paolo Frasconi and Fabio
Roli. Combining Flat and Structured Representations for Fingerprint Classification With
Recursive Neural Networks and Support Vector Machines. Pattern Recognition, 36, 397-406,
2003.
[90] W. F. Xie, D. J. Hou, and Q. Song. Bullet-hole image classification with support vector machines. Proceedings of the 2000 IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing X, 1, 318-327, 2000.
[91] C. Ongun, U. Halici, K. Leblebicioglu, V. Atalay, M. Beksac, and S. Beksac. Feature extraction and classification of blood cells for an automated differential blood count system. Proceedings of the International Joint Conference on Neural Networks, 2461-2466, 2001.
[92] H. Drucker, S. Wu, and V. N. Vapnik. Support vector machines for spam categorization. IEEE Transactions on Neural Networks, 10, 1048-1054, 1999.
[93] J. Zhang, Y. Zhang, and T. Zhou. Classification of hyperspectral data using support vector machine. Proceedings of the International Conference on Image Processing (ICIP 2001), 1, 882-885, 2001.
[94] L. Ramirez, W. Pedrycz, and N. Pizzi. Severe storm cell classification using support vector machines and radial basis function approaches. Proceedings of the Canadian Conference on Electrical and Computer Engineering, 1, 87-91, 2001.
[95] O. Chapelle, P. Haffner, and V. N. Vapnik. Support vector machines for histogram-based image classification. IEEE Transactions on Neural Networks, 10, 1055-1064, 1999.
[96] G. Xavier, T. E. Philip, T. V. N. Deepthi, and K. P. Soman. An efficient algorithm for the segmentation of astronomical images. IOSR Journal of Computer Engineering, 6, 21-29, 2012.
[97] G. Privett. Creating and Enhancing Digital Astro Images. Springer Science & Business Media, 2007.
[98] J.-L. Starck and F. Murtagh. Astronomical Image and Data Analysis. Springer-Verlag, Berlin Heidelberg, 2006.
[99] E. W. Peng, H. C. Ford, K. C. Freeman, and R. L. White. A young blue tidal stream in NGC 5128. The Astronomical Journal, 124, 3144-3156, 2002.
[100] J. Lucas, B. Calef, and K. Knox. Image enhancement for astronomical scenes. Proc. SPIE 8856, Applications of Digital Image Processing XXXVI, 885603, doi:10.1117/12.2025191, 2013.
[101] E. A. Kubickova. Processing of astronomical images using the MATLAB Image Processing Toolbox. Ad Alta: Journal of Interdisciplinary Research, 2011.
[102] A. Chandrasekhar. Point extraction and matching for registration of infrared astronomical images. Master's thesis, Chester F. Carlson Center for Imaging Science, College of Science, Rochester Institute of Technology, 1999.
[103] N. M. Hoekzema and P. N. Brandt. Small-scale topology of solar atmosphere dynamics. Astronomy and Astrophysics, 353, 389-395, 2000.
[104] S. Comerón, J. H. Knapen, K. Sheth, M. W. Regan, J. L. Hinz, A. Gil de Paz, K. Menéndez-Delmestre, J.-C. Muñoz-Mateos, M. Seibert, T. Kim, E. Athanassoula, A. Bosma, R. J. Buta, B. G. Elmegreen, L. C. Ho, B. W. Holwerda, E. Laurikainen, H. Salo, and E. Schinnerer. The thick disk in the galaxy NGC 4244 from S4G imaging. The Astrophysical Journal, 729, 18, 2011.
[105] D. R. Davis and W. B. Hayes. Scalable automated detection of spiral galaxy arm segments. The Astrophysical Journal, 790, 87, 2014.
[106] T.-L. Ji, M. K. Sundareshan, and H. Roehrig. Adaptive image contrast enhancement based on human visual properties. IEEE Transactions on Medical Imaging, 13, 573-586, 1994.
[107] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Prentice Hall, 3rd edition, 2012.
[108] J.-L. Starck, F. Murtagh, B. Pirenne, and M. Albrecht. Astronomical image compression based on noise suppression. Publications of the Astronomical Society of the Pacific, 108, 446-455, 1996.
[109] M. Faundez-Abans and M. de Oliveira-Abans. Looking for fine structures in galaxies. Astronomy and Astrophysics, 128, 289-297, 1998.
[110] J. F. Scholl. Image enhancement of the galaxy VV371c using the 2D fast wavelet transform. Proc. SPIE 2308, Visual Communications and Image Processing '94, 1268, doi:10.1117/12.185886, 1994.
[111] M. S. Burkhead and W. Matuska. Fourier transform enhanced photography of the M51 system. AAS Photo Bulletin, 23, 13, 1980.
[112] S. Djorgovski. Enhancement of features in galaxy images. Proc. SPIE 0627, Instrumentation in Astronomy VI, 674, doi:10.1117/12.968146, 1986.
[113] J. Jenkinson et al. Machine learning and image processing in astronomy with sparse data sets. Submitted to IEEE Transactions on Systems, Man, and Cybernetics, 2014.
[114] J.-L. Starck, D. L. Donoho, and E. J. Candès. Astronomical image representation by the curvelet transform. Astronomy and Astrophysics, 398, 785-800, 2003.
[115] L. P. Yaroslavsky. Local adaptive image restoration and enhancement with the use of DFT and DCT in a running window. Proc. SPIE 2825, Wavelet Applications in Signal and Image Processing IV, 2, doi:10.1117/12.255218, 1996.
[116] A. M. Grigoryan and M. Hajinoroozi. Image and audio signal filtration with discrete heap transforms. Applied Mathematics and Sciences: An International Journal (MathSJ), 1, 2014.
[117] A. M. Grigoryan and S. S. Agaian. Alpha-rooting method of color image enhancement by discrete quaternion Fourier transform. Proc. SPIE 9019, Image Processing: Algorithms and Systems XII, 901904, doi:10.1117/12.2040596, 2014.
[118] J. H. McClellan. Artifacts in alpha-rooting of images. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '80), 5, 449-452, 1980.
[119] F. T. Arslan and A. M. Grigoryan. Image enhancement by the tensor transform. Proceedings of the IEEE International Symposium on Biomedical Imaging: Nano to Macro, 1, 816-819, 2004.
[120] F. T. Arslan and A. M. Grigoryan. Fast splitting alpha-rooting method of image enhancement: tensor representation. IEEE Transactions on Image Processing, 15, 3375-3384, 2006.
[121] F. T. Arslan and A. M. Grigoryan. Enhancement of medical images by the paired transform. Proceedings of the IEEE International Conference on Image Processing (ICIP 2007), 1, 537-540, 2007.
[122] A. M. Grigoryan and K. Naghdali. On a method of paired representation: enhancement and decomposition by series direction images. Journal of Mathematical Imaging and Vision, 34, 185-199, 2009.
[123] R. R. Coifman and A. Sowa. Combining the calculus of variations and wavelets for image enhancement. Applied and Computational Harmonic Analysis, 9, 1-18, 2000.
[124] R. Mishra, U. Sharma, and M. Shrivastava. Contrast enhancement of remote sensing images using DWT with kernel filter and DTCWT. International Journal of Computer Applications, 87, 43-49, 2014.
[125] K. Naghdali, R. Ranjith, and A. M. Grigoryan. Fast signal-induced transforms in image enhancement. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC 2009), 565-570, 2009.
[126] W. M. Morrow, R. B. Paranjape, R. M. Rangayyan, and J. E. L. Desautels. Region-based contrast enhancement of mammograms. IEEE Transactions on Medical Imaging, 11, 392-406, 1992.
[127] S. S. Agaian, B. Silver, and K. A. Panetta. Transform coefficient histogram-based image enhancement algorithms using contrast entropy. IEEE Transactions on Image Processing, 16, 741-758, 2007.
[128] S. S. Agaian, K. Panetta, and A. M. Grigoryan. Transform-based image enhancement algorithms with performance measure. IEEE Transactions on Image Processing, 10, 367-382, 2001.
[129] D. F. Elliott. Handbook of Digital Signal Processing: Engineering Applications. Academic Press, 1988.
[130] R. Díaz-Hernández, J. J. González, R. Costero, and J. Guichard. Retrieval of spectroscopic information from the Tonantzintla Schmidt camera archival plates. Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, 8011, 2011.
[131] A. M. Grigoryan and S. S. Agaian. Multidimensional Discrete Unitary Transforms: Representation, Partitioning and Algorithms. Marcel Dekker Inc., New York, 2003.
[132] A. Grigoryan and M. Grigoryan. Brief Notes in Advanced DSP: Fourier Analysis with MATLAB. CRC Press, Taylor and Francis Group, 2009.
[133] S. S. Agaian, K. Panetta, and A. M. Grigoryan. Discrete unitary transforms generated by moving waves. Proc. SPIE 6701, Wavelets XII (Optics+Photonics), 25, 2007.
[134] A. M. Grigoryan. 2-D and 1-D multipaired transforms: frequency-time type wavelets. IEEE Transactions on Signal Processing, 49, 344-353, 2001.
[135] A. M. Grigoryan and M. M. Grigoryan. Nonlinear approach of construction of fast unitary transforms. Proceedings of the 40th Annual Conference on Information Sciences and Systems, 1073-1078, 2006.
[136] N. Otsu. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics, 9, 62-66, 1979.
[137] R. G. Abraham, F. Valdes, H. K. C. Yee, and S. van den Bergh. The morphologies of distant galaxies. 1: An automated classification system. Astrophysical Journal, Part 1, 432, 75-90, 1994.
[138] Ž. Ivezić, A. J. Connolly, J. T. Vanderplas, and A. Gray. Statistics, Data Mining, and Machine Learning in Astronomy. Princeton University Press, Princeton, NJ, 2014.
VITA
John Jenkinson is from Austin, Texas. He graduated with a Bachelor of Science degree from the
University of Texas at San Antonio. He is currently completing his Master of Science degree in
Electrical Engineering at the University of Texas at San Antonio (UTSA). His future plans include
attending a Ph.D. program at UTSA.