12
How to speed up search of ILMT light curves using t HTM (Hierarchical Triangular Mesh) method in relational databases ARC Liège, 11 February 2010 ILMT Software Data acquisition Clustering Framework (HPC) Data reduction Image subtraction (application to GL) Databases (RDBMS) …. GAIA QSO Classifier Software Poels, J.

How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases ARC Liège, 11 February 2010 ILMT

Embed Size (px)

Citation preview

Page 1: How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases ARC Liège, 11 February 2010 ILMT

How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh)

method in relational databases

ARC Liège, 11 February 2010

•ILMT Software• Data acquisition• Clustering Framework (HPC)• Data reduction• Image subtraction (application to GL)• Databases (RDBMS)• ….

•GAIA QSO Classifier Software

Poels, J.

Page 2: How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases ARC Liège, 11 February 2010 ILMT

Motivation

• We explore ways of doing spatial search within a relational database

• hierarchical triangular mesh and HEALPix (a tessellation of the sphere) as a zoned bucketing system, representing areas as disjunctive-normal form constraints.

• The approach has the virtue that the zone mechanism works well on B-Trees native to all SQL systems and integrates naturally with current query optimizers

• Involved projects: – SDSS (Sloan Digital Sky Survey)– GSC (Guide Star Catalog) Palomar and UK Schmidt surveys – COBE (Cosmic Background Explorer)– WMAP (Wilkinson Microwave Anisotropy Probe)– ….

Page 3: How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases ARC Liège, 11 February 2010 ILMT

ILMT_operating_param

timetemperatureCCD positionrotation (ppm)........

Page 4: How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases ARC Liège, 11 February 2010 ILMT

ILMT_reference_cat

2MASS/SDSS cat

ILMT_reference_cat

Iterative_cat_1

Iterative_cat_N

Iterative_cat_2

Iterative_cat_(N-1)

Page 5: How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases ARC Liège, 11 February 2010 ILMT

Fits_files_cat

fits_file_idfilenamepathfile_typeprocessing_levelx_global_0y_global_0alpha_0

Page 6: How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases ARC Liège, 11 February 2010 ILMT

Reference_img_cat

source_idfits_file_id1fits_file_id2x_localy_localx_globaly_globalalphadeltaflux_apertureflux_psf_fitMag_RObject_typeProcessing_NoFwhmIsolated_flag

Page 7: How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases ARC Liège, 11 February 2010 ILMT

Night catalogs

source_idfits_file_id1fits_file_id2x_localy_localx_globaly_globalalphadeltaflux_apertureflux_psf_fitMag_RObject_typeProcessing_NrFwhmIsolated_flag

Night_1_cat

source_idfits_file_id1fits_file_id2x_localy_localx_globaly_globalalphadeltaflux_apertureflux_psf_fitMag_RObject_typeProcessing_NrFwhmIsolated_flag

Night_2_cat

source_idfits_file_id1fits_file_id2x_localy_localx_globaly_globalalphadeltaflux_apertureflux_psf_fitMag_RObject_typeProcessing_NrFwhmIsolated_flag

Night_3_cat

source_idfits_file_id1fits_file_id2x_localy_localx_globaly_globalalphadeltaflux_apertureflux_psf_fitMag_RObject_typeProcessing_NrFwhmIsolated_flag

Night_N_catsource_idNight1_rowidNight2_rowid…………………

NightN_rowidRef_img_cat_rowid

Objects_rowid_ptr

Page 8: How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases ARC Liège, 11 February 2010 ILMT

Is this (horizontal time) model suitable ?

• At the beginning the RDBMS should run smoothly but after 5 years of operation ?

• Indexing is not an easy task• A given source will be measures an order of 10^3 where each measure set

featuring ~ 200 bytes • Assuming ~10x10^6 sources we get a multi-TB DB (for alphanumeric data

only)! Consider also ~ 25TB of image data• Example of performance bottleneck: Search all constant point-like sources.

We have to scan the whole DB and for each source, track its history. This means that for each source we have to issue 10x10^6x(N-1) SQL statements ! Forget it.

• Beyond the query complexity, the DMS prefetch the rows which are more likely to be read: useless and slow disk activity.

• The data must be rearranged• Solution: HTM

Page 9: How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases ARC Liège, 11 February 2010 ILMT

HTM (Hierachical Triangular Mesh)• HTM [18] maps triangular regions

of the sphere to unique identifiers• The technique for subdividing the

sphere in spherical triangles is a recursive process

• At each level of the recursion, the area of the resulting triangles is roughly the same

• In areas with a larger data density, the recursion process can be applied with a greater level of detail than in areas with lower density

• The starting point is a spherical octahedron which identifies 8 spherical triangles of equal size

Page 10: How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases ARC Liège, 11 February 2010 ILMT

• The term quadtree is used to describe a class of hierarchical data structures based on the principle of a fast recursive decomposition of the space.

• Sky tessellation with various mapping functions have been proposed. It is a matter of fact that the astronomical community is accepting the HTM and HEALPix (Hierarchical, Equal Area, and iso-Latitude Pixelisation) schema as the default for object catalogues and for maps visualization and analysis, respectively. HEALPix gives a hierarchical iso-area and iso-latitude tessellation of the sphere and so are convenient for harmonic data analysis on the sphere (densities, integrals, spherical harmonics, Fourier transforms,etc.,)

Page 11: How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases ARC Liège, 11 February 2010 ILMT

• Using a 64 bit long integer to store the index IDs leads to a limit for the pixels size of about 7.7 and 0.44 milli-arcsec on a side for HTM and HEALPix, respectively. Being able to quickly retrieve the list of objects in a given sky region is crucial in several projects.

Page 12: How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases ARC Liège, 11 February 2010 ILMT

Application to the ILMT

• The indexing scheme does not have to go as deep as the actual pixel resolution• Each triangle is linked to its own database table (the GSC approach)• SQL queries involving searches based on RA,DEC (+range) quickly provide the HtmId(s) which in turn is used to build the table_name(s) to be accessed. • We can dynamically choose a triangle surface coverage depending on the maximum number of sources (e.g. max 100 sources per triangle)• One single SQL statement returns a cursor with the whole source_id history • The problem is now: for each triangle database table, select distinct all source_id and fetch history rows ~ 10x10^6 SQL runs ! Manageable • The indexing C++ software is freely available at:

http://www.sdss.jhu.edu/