Databases@MPA, access methods and plans

Toledo, 2006-02-25, Databases @ MPA

With contributions from JHU: Alex Szalay, Jan Vanderberg; MPA: Jeremy Blaizot, Jarle Brinchmann, Guinevere Kauffmann, Anja von der Linden, Ben Panter, Guo Qi, Volker Springel, Vivienne Wild


Page 1: Databases@MPA,  access methods and plans


Databases@MPA, access methods and plans

With contributions from
• JHU: Alex Szalay, Jan Vanderberg
• MPA: Jeremy Blaizot, Jarle Brinchmann, Guinevere Kauffmann, Anja von der Linden, Ben Panter, Guo Qi, Volker Springel, Vivienne Wild

Page 2: Databases@MPA,  access methods and plans


Last year, Budapest

• Presented milli-Millennium halo merger tree database

• Requests:
  – More properties (lambda, ...) ✗
  – Galaxies ✓
  – Correlation with environment (galaxies in voids) ✓
  – Millennium

• Why use databases? Ask Alex.

Page 3: Databases@MPA,  access methods and plans


Current status

• milli-Millennium
  – Galaxies added: merger trees, links to their parent halos
  – Density field at various smoothings
  – Updated web site (demo)

• Millennium subset
  – Subset (~2%, 10x milli-Mil) of halo and galaxy trees
  – z=0 density field

• Millennium
  – Halo trees in database (proprietary)
  – SAM galaxies under way (settle on model etc.)
  – Density fields at all z will be added: 1056964608 rows

• Durham
  – milli_Millennium mirror (Postgres)
  – Durham halo tree and galaxy catalogues
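As a quick consistency check, the row count quoted for the full density fields equals 63 grids of 256^3 cells, the size given on the "Plans: Millennium" slide. Reading the 63 as the number of stored grids (one row per grid cell) is an assumption; the slides do not say how the table is laid out.

```python
# Consistency check on the quoted density-field row count.
# Assumption: one row per grid cell, and "63" counts the stored grids.
CELLS_PER_SIDE = 256
N_GRIDS = 63

rows = N_GRIDS * CELLS_PER_SIDE ** 3
print(rows)  # 1056964608
```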

Page 4: Databases@MPA,  access methods and plans


Other databases

• ROSAT: source catalogues and RASS photons (~100 million)

• SDSS peripherals
  – SDSS_MPA (Brinchmann, Kauffmann, Tremonti et al.)
  – MOPED (Ben Panter)
  – SDSS_PCA (Vivienne Wild et al.)

• GalICS (Jeremy Blaizot)

• HEALPix all-sky maps (Alex Szalay, Tony Banday)
  – WMAP (3-year data soon!)
  – extinction maps
  – radio maps (Bonn)
  – ROSAT background (hopefully)

Page 5: Databases@MPA,  access methods and plans


Access

• Public: http://www.g-vo.org/mpasims
• Local web apps to Millennium, BESTDR3 and peripherals: http://www.g-vo.org/sdssdr3/
• Public web browser queries limited (1 min, 10000 rows)
• Local databases + web apps less limited

Page 6: Databases@MPA,  access methods and plans


Streaming

• Query results temporarily buffered on server: limited by memory

• Streaming queries: faster, less limited (only timeout)

• Access:
  – IDL (with Ben Panter)
    • wget --http-user=*** --http-password=*** -O localfile.csv "http://www.g-vo.org/sdssdr3/DBQueryStream?SQL=select * from moped..agebin"
    • GUI asking for username/password
    • Interprets the CSV stream and turns it into IDL components
  – TOPCAT
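The same streaming access can be sketched outside IDL. The example below only builds the request URL and parses a CSV reply; it does not contact the server. The endpoint is the one from the wget line; that it accepts arbitrary percent-encoded SQL is an assumption, and the helper names and the sample column names are hypothetical.

```python
import csv
import io
from urllib.parse import quote

# Endpoint copied from the wget example on this slide.
STREAM_ENDPOINT = "http://www.g-vo.org/sdssdr3/DBQueryStream"


def stream_url(sql: str) -> str:
    """Return the streaming-query URL with the SQL text percent-encoded."""
    return f"{STREAM_ENDPOINT}?SQL={quote(sql, safe='')}"


def parse_csv(text: str) -> list:
    """Turn a CSV reply (header line + rows) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


url = stream_url("select * from moped..agebin")
print(url)

# Hypothetical reply, only to show the parsing step; real column names may differ.
sample_rows = parse_csv("bin,age\n1,0.5\n2,1.0\n")
print(sample_rows[0]["age"])
```

Encoding the query avoids problems with the spaces and the `*` that the raw URL contains. Fetching with the user name and password is left out because the authentication scheme is not stated on the slide.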

Page 7: Databases@MPA,  access methods and plans


Plans: Millennium

• Millennium:
  – Tune database
    • 750000000 halos
    • N x 1000000000 galaxies
    • 63 x 256^3 density field grid cells
  – More halo properties (shape, λ, ...)
  – More galaxy catalogues
    • different parameters
    • different algorithms (GalICS, Durham, ...)
  – Light cone mock catalogues
  – Galaxy spectra (+ PCA)
  – Links to SDSS mirror and peripherals
  – Proper metadata handling (à la SkyServer)
  – "SAM online"
  – Move web apps to MPA
  – Use JHU services, install CasJobs

Page 8: Databases@MPA,  access methods and plans


Plans: SDSS mirror + peripherals

• Make mirror web site public
• Upgrade SDSS mirror to DR4 …
• Stabilize, document, publish SDSS peripherals
• Proper metadata handling
• Links to Millennium
• Personal databases: MyDB (à la SkyServer)
• Add logos

Page 9: Databases@MPA,  access methods and plans


Theory VO: spectra

• Combine theory and observations
• Example: query-by-example on theory spectra
• Find similar theory spectra and, from these, the underlying galaxy formation history
• Chi-squared on all stored spectra? Slow, requires storing all of them
• Idea (not original, see HVO/JHU talks): use PCA to compress data
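The trade-off on this slide can be illustrated with a minimal sketch on synthetic data. It compares a brute-force match against every stored spectrum with a match in the space of a few PCA amplitudes. The spectra are random stand-ins, the chi-squared uses unit errors, and all sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for theory spectra: 200 spectra, 500 wavelength bins,
# built from 3 hidden components plus small noise (purely illustrative).
n_spec, n_bins, n_true = 200, 500, 3
basis = rng.normal(size=(n_true, n_bins))
spectra = rng.normal(size=(n_spec, n_true)) @ basis
spectra += 0.01 * rng.normal(size=(n_spec, n_bins))

# PCA via SVD of the mean-subtracted training set.
mean = spectra.mean(axis=0)
_, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
n_keep = 3
eigenspectra = vt[:n_keep]                      # (n_keep, n_bins), orthonormal rows
amplitudes = (spectra - mean) @ eigenspectra.T  # (n_spec, n_keep): what gets stored

# Query spectrum: a noisy copy of spectrum 17.
query = spectra[17] + 0.01 * rng.normal(size=n_bins)

# Brute force: chi-squared (unit errors) against every stored spectrum.
best_full = int(np.argmin(((spectra - query) ** 2).sum(axis=1)))

# Compressed: project the query, then compare amplitudes only.
q_amp = (query - mean) @ eigenspectra.T
best_pca = int(np.argmin(((amplitudes - q_amp) ** 2).sum(axis=1)))

print(best_full, best_pca, amplitudes.shape)
```

Here the comparison works on 3 numbers per spectrum instead of 500, and only the amplitudes need to be kept in the database for the search itself.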

Page 10: Databases@MPA,  access methods and plans


PCA

• Need training sample of theory spectra to create eigenspectra

• Project all spectra

• Store PCA amplitudes in DB

• Provide web service:
  – Upload (observational) spectrum (IVOA SSA/SED)
  – Project onto theory eigenspectra
  – Use amplitudes as parameters in query for “nearby” amplitudes
  – Return corresponding theory spectra
  – Return corresponding galaxy formation histories, or their halos, or their environment …
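The "nearby amplitudes" query step could look roughly like the sketch below, which uses an in-memory SQLite database as a stand-in. The table name, column names and sample values are hypothetical and only two amplitudes are stored; the actual schema and database engine are not given on this slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per theory spectrum with its first two PCA amplitudes.
conn.execute(
    "CREATE TABLE pca_amplitudes (spectrum_id INTEGER PRIMARY KEY, pc1 REAL, pc2 REAL)"
)
conn.executemany(
    "INSERT INTO pca_amplitudes VALUES (?, ?, ?)",
    [(1, 0.10, 0.20), (2, 0.12, 0.18), (3, 0.90, -0.40), (4, 0.11, 0.95)],
)
conn.execute("CREATE INDEX idx_pc ON pca_amplitudes (pc1, pc2)")


def nearby(conn, a1, a2, radius, limit=10):
    """Return ids of spectra within `radius` of (a1, a2), nearest first."""
    sql = """
        SELECT spectrum_id,
               (pc1 - :a1) * (pc1 - :a1) + (pc2 - :a2) * (pc2 - :a2) AS d2
        FROM pca_amplitudes
        WHERE pc1 BETWEEN :a1 - :r AND :a1 + :r      -- cheap box cut first
          AND pc2 BETWEEN :a2 - :r AND :a2 + :r
          AND (pc1 - :a1) * (pc1 - :a1) + (pc2 - :a2) * (pc2 - :a2) <= :r * :r
        ORDER BY d2
        LIMIT :n
    """
    rows = conn.execute(sql, {"a1": a1, "a2": a2, "r": radius, "n": limit})
    return [row[0] for row in rows]


# Amplitudes of an uploaded spectrum after projection (made-up numbers).
ids = nearby(conn, 0.105, 0.195, radius=0.05)
print(ids)  # [1, 2]
```

The returned ids would then be used to look up the theory spectra and, through them, the formation histories, halos or environment.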

Page 11: Databases@MPA,  access methods and plans


Issues

• Dealing with errors, gaps: “gappy PCA” (Connolly & Szalay)

• Normalization:
  – Incoming spectrum generally comes from a very different dataset and needs a common normalization
  – Incoming set will have gaps, errors
  – Ad hoc normalization possible (and works quite well)

• Indexing of complex multi-dimensional point set for quick k-nearest-neighbours search (Voronoi? See Laszlo's work)
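For the gaps and errors, a minimal sketch of a gappy projection is given below: the amplitudes are obtained by weighted least squares, with zero weight in the gaps, rather than by a plain dot product. This follows the general idea of gappy PCA; the function name, the synthetic eigenspectra and the gap position are assumptions, and the spectrum is taken to be already mean-subtracted and normalized.

```python
import numpy as np


def gappy_project(flux, weights, eigenspectra):
    """Weighted least-squares amplitudes.

    weights: 0 in gaps, 1/sigma^2 elsewhere; eigenspectra: shape (k, n_bins).
    """
    m = (eigenspectra * weights) @ eigenspectra.T  # M_ij = sum_l w_l e_il e_jl
    f = (eigenspectra * weights) @ flux            # F_i  = sum_l w_l f_l e_il
    return np.linalg.solve(m, f)


rng = np.random.default_rng(1)
n_bins, n_eig = 300, 4
# Orthonormal stand-in eigenspectra (rows) from a QR decomposition.
q, _ = np.linalg.qr(rng.normal(size=(n_bins, n_eig)))
eigenspectra = q.T

true_amps = np.array([2.0, -1.0, 0.5, 0.25])
flux = true_amps @ eigenspectra   # noise-free, mean-subtracted spectrum

weights = np.ones(n_bins)
weights[100:160] = 0.0            # a 60-bin gap
flux_gappy = flux.copy()
flux_gappy[100:160] = 0.0         # missing data filled with zeros

naive = eigenspectra @ flux_gappy                        # plain projection, biased by the gap
amps = gappy_project(flux_gappy, weights, eigenspectra)  # gap-aware projection
print(np.round(naive, 3), np.round(amps, 3))
```

In this noise-free toy case the gap-aware fit recovers the input amplitudes, while the plain projection does not.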

Page 12: Databases@MPA,  access methods and plans


Normalized gappy PCA

• Fit normalization factor at same time as PCA amplitudes. Model:

• Minimize (over a_i and N):
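The equations of this slide did not survive in the text. As a hedged illustration only, the sketch below assumes the model f ≈ N (m + Σ a_i e_i), with m the mean spectrum and e_i the eigenspectra, and minimizes the weighted squared residual over N and a_i. The slide's own model may be written differently. With b_i = N a_i the problem is linear in (N, b_i), so a single linear solve is enough.

```python
import numpy as np


def normalized_gappy_project(flux, weights, mean_spectrum, eigenspectra):
    """Fit flux ~ N * (mean + sum_i a_i e_i) by weighted linear least squares.

    Assumed model (not taken from the slide). Solves linearly for
    (N, b_i = N * a_i) and returns N and a_i = b_i / N.
    """
    design = np.vstack([mean_spectrum, eigenspectra])  # rows: m, e_1, ..., e_k
    lhs = (design * weights) @ design.T
    rhs = (design * weights) @ flux
    sol = np.linalg.solve(lhs, rhs)
    norm = sol[0]
    return norm, sol[1:] / norm


rng = np.random.default_rng(2)
n_bins, n_eig = 300, 3
q, _ = np.linalg.qr(rng.normal(size=(n_bins, n_eig)))
eigenspectra = q.T
mean_spectrum = 1.0 + 0.5 * np.sin(np.linspace(0.0, 6.0, n_bins))

true_norm = 3.7
true_amps = np.array([0.3, -0.2, 0.1])
flux = true_norm * (mean_spectrum + true_amps @ eigenspectra)

weights = np.ones(n_bins)
weights[50:90] = 0.0   # gap: ignored by the fit
flux[50:90] = 0.0

norm, amps = normalized_gappy_project(flux, weights, mean_spectrum, eigenspectra)
print(round(float(norm), 6), np.round(amps, 6))
```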


Page 16: Databases@MPA,  access methods and plans


So far

• Ran PCA on BC03 stochastic bursts (Vivienne)

• On first GalICS+milli-Millennium spectra (Jeremy)

• Projected SDSS spectra on both
• Defined a PCA data model/schema
• Stored PCAs in database
• TOPCAT

Page 17: Databases@MPA,  access methods and plans


PCA data model (RDB schema available)

[UML class diagram; the layout did not survive extraction. Recoverable content:]

• Classes: PCADecompositionAlgorithm, SpectrumCatalogue, Spectrum, PhotometryPoint, PCARun, PCASpectrum, PCAEigenSpectrum, PCAPreProcessing, PCAProjectionRun, PCAAmplitudes, PCAProjectionAlgorithm

• Attributes (in diagram order): redshift, target; lambda, bin, flux, error; restRedshift : double; assumedRedshift : double, featureMask; pcaRank : int; lambda, mean, variance, wavelengthMask; normalization : double, redshiftShift : double, amplitudes : double

• Association roles: spectrum, spectra, algorithm, catalogue, inputSpectra, eigenSpectra, preprocessing, amplitudes, pcaDecomposition
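A small part of this data model might map onto code as sketched below. The class and attribute names are taken from the diagram, but which attribute belongs to which class cannot be read unambiguously from the flattened text, so the grouping here is a guess. The `flux` and `spectrum_id` fields are hypothetical additions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PCAEigenSpectrum:
    pca_rank: int        # "pcaRank : int" in the diagram
    flux: List[float]    # hypothetical: per-bin values of the eigenspectrum


@dataclass
class PCARun:
    rest_redshift: float  # "restRedshift : double" (assignment to PCARun is a guess)
    eigen_spectra: List[PCAEigenSpectrum] = field(default_factory=list)


@dataclass
class PCAAmplitudes:
    normalization: float
    redshift_shift: float
    amplitudes: List[float]
    spectrum_id: int      # hypothetical key standing in for the "spectrum" association


@dataclass
class PCAProjectionRun:
    pca_decomposition: PCARun  # "pcaDecomposition" role
    amplitudes: List[PCAAmplitudes] = field(default_factory=list)


run = PCARun(rest_redshift=0.0)
run.eigen_spectra.append(PCAEigenSpectrum(pca_rank=1, flux=[0.6, 0.8]))

projection = PCAProjectionRun(pca_decomposition=run)
projection.amplitudes.append(
    PCAAmplitudes(normalization=1.0, redshift_shift=0.0, amplitudes=[0.3], spectrum_id=42)
)
print(len(projection.amplitudes), projection.pca_decomposition.eigen_spectra[0].pca_rank)
```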


Page 20: Databases@MPA,  access methods and plans


milliMil-GalICS PC1 vs PC2 Voronoi tessellation

Page 21: Databases@MPA,  access methods and plans


Issues for query-by-example

• Overlap quite good, but good enough?
• GalICS spread less than SDSS.
• BC03 comparable with SDSS, but different slope.
• Systematics
  – Model:
    • physics very preliminary (see Blaizot & de Lucia?)
    • resolution effects
  – Preprocessing SDSS galaxies
    • Rebinning: different algorithms give comparable results
    • (slightly) wrong redshift? Can be easily simulated
  – Projection algorithm: normalization does not affect outcome
  – Observational systematics: use virtual telescope (+ virtual spectrograph) to test on the theory spectra. Easier to blow up simulation than to shrink observation cloud

Page 22: Databases@MPA,  access methods and plans


Comments

• Millennium database being used for science projects (Guo Qi)

• SDSS peripherals used for science projects (see Vivienne’s talk, Ben Panter)

• Use of MyDB for debugging and testing (Jeremy)

• Please give comments, feedback.