Bidhu Final v36by48 2

  • 7/26/2019 Bidhu Final v36by48 2

    Big Data of the Huge Universe (BIDHU)

    When Galaxies Collide: A Machine Learning Approach

    Duilia F. de Mello1,2, Felipe Augusto de Castro e Silva1,3, Gabriel Alexandre Santos4,5, Pedro Xavier4,6

    1The Catholic University of America, 2NASA Goddard Space Flight Center, 3University of Campinas, 4Northern Virginia Community College (NOVA), 5Uniandrade PR, 6Federal Institute of Goias

    We present the latest results of an ongoing research project searching for clues regarding the evolution of galaxies. Our own galaxy, the Milky Way, is actively colliding with small dwarf galaxies and will collide with the large Andromeda galaxy in approximately 4 billion years. In this project, we are searching for evidence that other galaxies are going through similar processes. We are using the largest database of the universe, known as the Sloan Digital Sky Survey, to search for colliding galaxies. First, we selected a small slice of the universe and were able to identify 100 candidates. We then improved our search code and were able to select 40,000 pairs of galaxies. We are now implementing code that trains the computer (machine learning) to verify whether these pairs of galaxies are in advanced stages of collision. Here we present the latest results of the machine learning approach.

    Abstract

    We adopt a machine learning approach that uses a convolutional neural network to determine, from galaxy images, whether or not a collision is taking place. The results were considerably successful, and some new approaches are proposed to improve the method in future work.

    Galaxy Collisions through Convolutional Neural Networks

    Convolutional Neural Networks are mathematical models, inspired by the organization of the animal visual cortex, that are widely used in image processing and recognition. Their operation can be compared to human vision: if you see a dog, within a fraction of a second your eye registers the image, transforms it into electrical pulses, and sends them to neurons that classify the data and make you understand that there is a dog in front of you. In this work, we make a learning machine look at galaxy images and classify them as colliding or non-colliding galaxies.
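
    The core operation of a convolutional layer can be illustrated with a minimal sketch. This is not the BIDHU team's code: the function names `convolve2d` and `relu`, the toy 4x4 image, and the hand-written kernel are all assumptions made for illustration. In a real network the kernel values are learned during training rather than chosen by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


# Hypothetical toy image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical hand-written kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(convolve2d(image, kernel))
for row in feature_map:
    print(row)
```

    The feature map is largest where the kernel's pattern (here, a vertical edge) appears in the image. A convolutional neural network stacks many such filters and feeds their responses to a final classifier.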

    To make this classification possible, we need to teach the mathematical model how to tell whether these galaxies are colliding or not. This is done through a process called training, in which 200 galaxies pre-classified by eye by our team (100 colliding and 100 non-colliding) were presented to the machine so that it learns which images contain colliding galaxies and which contain non-colliding galaxies.

    COLLIDING and NOT

    Various architectures of convolutional neural networks were tested. In the best outcome, on a set of 80 images (40 colliding galaxies and 40 non-colliding galaxies), every galaxy classified as colliding was really in collision, but 12 colliding galaxies were misclassified as non-colliding (false negatives), which may indicate underfitting.

    We conclude that our trained machine was correct 85% of the time, BUT the current sample is small and needs to be increased!
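
    The 85% figure can be reproduced from the counts stated in the text. The sketch below is illustrative only; the function name `classification_metrics` is an assumption. The counts are derived from the description: 40 colliding galaxies of which 12 were missed (28 true positives, 12 false negatives), and no non-colliding galaxy was labeled as colliding (0 false positives, 40 true negatives).

```python
def classification_metrics(true_pos, false_pos, false_neg, true_neg):
    """Return accuracy, precision, and recall from confusion-matrix counts."""
    total = true_pos + false_pos + false_neg + true_neg
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    return {
        "accuracy": (true_pos + true_neg) / total,
        # Fraction of galaxies labeled "colliding" that really are colliding.
        "precision": true_pos / predicted_pos if predicted_pos else 0.0,
        # Fraction of truly colliding galaxies that the classifier found.
        "recall": true_pos / actual_pos if actual_pos else 0.0,
    }


# Counts derived from the 80-image test described in the text.
metrics = classification_metrics(true_pos=28, false_pos=0, false_neg=12, true_neg=40)
print(metrics)
```

    Accuracy is (28 + 40) / 80 = 85%, matching the reported value. Precision is 100%, while recall is 28 / 40 = 70%, so the errors are concentrated entirely in missed collisions.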

    [Figure: grid of galaxy images labeled YES or NO]

    Results

    Conclusions and future work

    Our method can already be used to classify real data.

    Due to hardware processing limitations and the limited number of available pre-classified images, more complex architectures could not be tested yet.

    Many other machine learning methods can be combined with ours for more precise results.

    We will soon expand the classification to 40,000 galaxies.

    We will create a GUI that allows each user to select a galaxy on the screen and let the device do the classification.

    Two years ago, D. de Mello embarked on a BIG DATA project that she didn't have much experience with, but she could see the potential it had for introducing undergrads to research.

    Most of the students she talked to had heard of big data, but when they agreed to work on this project they had no idea that their database would be the entire universe!

    Today we are a group of 10 people spread across the US and Brazil working on a project we call the Big Data of the Huge Universe (BIDHU). This is what we have accomplished so far:

    First step: sample selection by undergraduate students at CUA, W. Barbosa, Ana Nascimento (visiting), and Rocio Rossi: 100 colliding galaxies

    Second step: sample expansion with international collaborators A. Borges, M. Goya, and S. Puga: 40,000 candidates

    Third step: machine learning by undergraduate students at CUA and NOVA, Felipe de Castro e Silva, Gabriel A. Santos, and Pedro Xavier

    1 undergraduate thesis in computer science: W. Barbosa (UFAL)

    1 honorable mention at an undergraduate science fair: Ana C. Nascimento (UFRJ)

    2 conference presentations: Rocio Rossi (CUA) and Ana C. Nascimento (UFRJ)

    The BIDHU Project: A Success Story

    This work uses a machine learning approach based on Convolutional Neural Networks to classify galaxy images as colliding or non-colliding.

    Convolutional Neural Networks are mathematical models that are widely used to classify image data.

    200 pre-classified images were used to train the neural network.

    Various neural network architectures were tested, with good outcomes for the classification of colliding images (85% success), but a larger sample is required.

    Future work may use other machine learning methods to improve the results and will expand the classification to 40,000 galaxies.

    Summary

    Interested in joining the BIDHU project? Email [email protected], follow our posts on

    LinkedIn and come to 206 Hannan Hall.

    We have billions of galaxies to classify and many discoveries to be made!

    Examples of colliding galaxies discovered by the BIDHU team