MACHINE LEARNING FOR CLUSTER-GALAXY CLASSIFICATION

Silvia de Castro García

16/03/2018

Supervisors: Dr. Ricardo Pérez Martínez, Dr. Ana María Pérez García



INTRODUCTION


Context: Galaxy clusters are giant cosmic laboratories harbouring thousands of objects with different origins and characteristics. It is commonly accepted that the evolution of galaxies within clusters differs from that in the field, although the main processes are still poorly understood. Key to a full characterization of these objects in such high-density environments is a comprehensive study of a coherent set of clusters, using a wide variety of photometric data from different space observatories and optical surveys from ground-based telescopes.

Figure: Galaxy cluster SDSS J1044+4112


Problem: Current classification techniques are limited and do not scale to the vast volume of data and the variety of data formats available.


Solution: Apply machine learning techniques (both supervised and unsupervised learning) to multi-wavelength datasets in order to classify cluster galaxies efficiently.


Science Case Objective

Cluster membership determination: Develop a fast photo-z estimator able to establish cluster membership with an accuracy comparable to that of spectroscopic redshifts.


BACKGROUND


Machine learning techniques are starting to be widely used in astronomy. Several works address photometric redshift estimation in different domains:

o “Cooperative photometric redshift estimation”, S. Cavuoti+ 2017
o “Metaphor: a ML based method for the probability density estimation of photometric redshifts”, S. Cavuoti+ 2017
o “Mapping the galaxy color-redshift relation: optimal photometric redshift calibration strategies for cosmology surveys”, D. Masters+ 2015
o “Photometric redshifts for quasars in multi-band surveys”, M. Brescia+ 2013


THE DATA


Combining data from 7 different catalogues:
o XMM-Newton and Chandra catalogues for X-ray data;
o GALEX for ultraviolet data;
o Moran et al. (2005) catalogue of optical/NIR information, including HST and ground-based broad-band data (from the CFHT and Hale 200-inch telescopes);
o IRAC and MIPS data from Spitzer;
o PACS and SPIRE data from Herschel.

Multi-wavelength photometric catalogue of cluster ZwCl0024+1652 produced by Pérez Martínez et al. (2016):
o 19670 sources;
o 1262 cluster members;
o 32 photometric points.


THE TOOL

WEKA (Waikato Environment for Knowledge Analysis) https://www.cs.waikato.ac.nz/ml/weka/Witten_et_al_2016_appendix.pdf

WEKA is a data mining framework providing state-of-the-art machine learning techniques.

Figure: Weka GUI, Explorer and Visualization

Advantages:
o Easy to use: GUI available;
o Highly portable: written in Java;
o Wide set of ML techniques, including data pre-processing, classification, regression, clustering, association rules and visualization capabilities;
o Open source: GNU General Public License.

Drawbacks:
o Dedicated file format (*.arff), not FITS-compatible;
o Not widely used in astronomy, so few use cases are available;
o Models cannot be trained on large data sets from the Weka Explorer GUI, although the developers state that this should be possible with the CLI (further work for Big Data is to be explored).
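Because WEKA reads its own ARFF text format rather than FITS, catalogue data has to be converted before use. The following is a minimal sketch of the writing half of such a conversion in Python; it is not the converter used in this work. It assumes that every attribute is numeric and that the rows have already been read from the FITS table (reading is not shown). The relation name, column names and values are invented for illustration.

```python
import io


def write_arff(relation, attributes, rows, out):
    """Write numeric data to `out` in WEKA's ARFF text format.

    attributes: list of column names (all declared NUMERIC in this sketch;
                names containing spaces would need quoting, not handled here).
    rows:       iterable of sequences; None marks a missing value ('?' in ARFF).
    out:        any text file-like object.
    """
    out.write(f"@RELATION {relation}\n\n")
    for name in attributes:
        out.write(f"@ATTRIBUTE {name} NUMERIC\n")
    out.write("\n@DATA\n")
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("row length does not match attribute count")
        cells = ("?" if value is None else repr(float(value)) for value in row)
        out.write(",".join(cells) + "\n")


# Hypothetical columns and values, for illustration only.
columns = ["mag_B", "mag_V", "z_spec"]
table = [(22.1, 21.4, 0.395), (23.0, None, 0.401)]

buffer = io.StringIO()
write_arff("cluster_photometry", columns, table, buffer)
arff_text = buffer.getvalue()
print(arff_text)
```

Writing to a file-like object keeps the sketch testable in memory; passing an open file handle instead would produce a file that the WEKA Explorer can load.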


SCIENCE CASE 1: PHOTO-Z ESTIMATOR

1. DATA PRE-PROCESSING
o FITS2ARFF conversion;
o Adding attributes (deriving colours from photometric points): 10 colours;
o Removing redundant/irrelevant attributes.

2. CLUSTERING
o Objective: find clusters in the colour data of the training set (1262 galaxies with spectroscopic z);
o ML technique: k-means algorithm with Euclidean distance.

3. CLASSIFYING
o Objective: classify the test set using the clusters found in the previous step;
o ML technique: k-nearest neighbours.

4. PHOTO-Z DETERMINATION
o Objective: estimate the photo-z;
o ML technique: computing the median photo-z of the sources of the cluster.
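Steps 1, 3 and 4 above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the WEKA workflow itself: colours are taken as differences between consecutive-band magnitudes (the slides do not say which band pairs were used), the cluster labels of the training set are supplied by hand where the real workflow would use the output of the k-means step, and the redshift returned is the median redshift of the training sources in the assigned cluster. All numbers are invented toy values.

```python
import math
from collections import Counter
from statistics import median


def colours(mags):
    """Colours as differences between magnitudes in consecutive bands (assumption)."""
    return [a - b for a, b in zip(mags, mags[1:])]


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_cluster(x, train_x, train_labels, k=3):
    """Assign x to the majority cluster among its k nearest training sources."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(x, train_x[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def photo_z(x, train_x, train_labels, train_z, k=3):
    """Median redshift of the training sources in the cluster assigned to x."""
    label = knn_cluster(x, train_x, train_labels, k)
    return median(z for z, lab in zip(train_z, train_labels) if lab == label)


# Invented toy training set: three-band magnitudes and redshifts.
train_mags = [
    [22.0, 21.0, 20.5], [22.3, 21.2, 20.6], [21.8, 20.9, 20.5],
    [23.0, 22.8, 22.7], [23.2, 22.9, 22.7], [22.9, 22.8, 22.8],
]
train_z = [0.39, 0.40, 0.41, 0.80, 0.82, 0.84]
train_labels = [0, 0, 0, 1, 1, 1]  # hypothetical output of the clustering step
train_x = [colours(m) for m in train_mags]

z_a = photo_z(colours([22.1, 21.1, 20.6]), train_x, train_labels, train_z)
z_b = photo_z(colours([23.0, 22.8, 22.7]), train_x, train_labels, train_z)
print(z_a, z_b)
```

Using the median rather than the mean keeps the estimate robust to a few outlying redshifts inside a colour cluster, at the cost of returning only one value per cluster.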


IN PROGRESS
o Pre-processing: selecting the most significant colours;
o Clustering:
    - improving k selection for k-means (elbow method);
    - Manhattan distance vs Euclidean distance;
o Classifying: testing different options of k-NN.
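The elbow method mentioned above can be illustrated with a short Python sketch: run k-means for several values of k, record the within-cluster sum of squares (WCSS), and look for the k after which the decrease flattens. The code is a dependency-free sketch, not WEKA's SimpleKMeans. It uses Euclidean distance only and a deterministic farthest-first initialisation (an assumption made so that the example is reproducible), and the points are invented toy data. A Manhattan-distance variant would normally also replace the mean with a component-wise median in the centroid update; that variant is not shown.

```python
def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Start from the first point, then repeatedly add the point that is
    farthest from the centroids chosen so far (deterministic initialisation)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans_wcss(points, k, max_iter=100):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an emptied cluster in place
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Invented toy data: three compact groups in a two-colour plane.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11), (6, 10), (6, 11),
]
wcss = {k: kmeans_wcss(points, k) for k in range(1, 6)}
for k, value in wcss.items():
    print(k, round(value, 2))
```

On this toy data the WCSS falls steeply up to k = 3 and only marginally afterwards, which is the "elbow" one would read off a plot of WCSS against k.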


NEXT STEPS
o Keep tuning the clustering and classification methods to improve results;
o Explore other ML techniques for the photo-z estimator (e.g. Self-Organising Maps or Expectation Maximization for clustering; Random Forest, SVM and Deep Learning for classification);
o Explore the semi-supervised approach;
o Extend the methodology to data from other clusters;
o Compare results and draw conclusions.

Technology:
o Test WEKA CLI performance with larger datasets;
o Explore WEKA for Big Data;
o Check suitability of WEKA vs. other tools (Python SciPy / Keras).


QUESTIONS?

THANK YOU