How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh)
method in relational databases
ARC Liège, 11 February 2010
• ILMT Software
– Data acquisition
– Clustering Framework (HPC)
– Data reduction
– Image subtraction (application to GL)
– Databases (RDBMS)
– ….
• GAIA QSO Classifier Software
Poels, J.
Motivation
• We explore ways of doing spatial search within a relational database.
• We use the Hierarchical Triangular Mesh (HTM) and HEALPix (a tessellation of the sphere) as a zoned bucketing system, representing areas as disjunctive normal form constraints.
• The approach has the virtue that the zone mechanism works well on the B-Trees native to all SQL systems and integrates naturally with current query optimizers.
• Involved projects:
– SDSS (Sloan Digital Sky Survey)
– GSC (Guide Star Catalog), Palomar and UK Schmidt surveys
– COBE (Cosmic Background Explorer)
– WMAP (Wilkinson Microwave Anisotropy Probe)
– ….
ILMT_operating_param: time, temperature, CCD position, rotation (ppm), ...
ILMT_reference_cat
2MASS/SDSS cat
Iterative_cat_1, Iterative_cat_2, ..., Iterative_cat_(N-1), Iterative_cat_N
Fits_files_cat: fits_file_id, filename, path, file_type, processing_level, x_global_0, y_global_0, alpha_0
Reference_img_cat: source_id, fits_file_id1, fits_file_id2, x_local, y_local, x_global, y_global, alpha, delta, flux_aperture, flux_psf_fit, Mag_R, Object_type, Processing_No, Fwhm, Isolated_flag
Night catalogs (Night_1_cat, Night_2_cat, Night_3_cat, ..., Night_N_cat), each with: source_id, fits_file_id1, fits_file_id2, x_local, y_local, x_global, y_global, alpha, delta, flux_aperture, flux_psf_fit, Mag_R, Object_type, Processing_Nr, Fwhm, Isolated_flag
Objects_rowid_ptr: source_id, Night1_rowid, Night2_rowid, ..., NightN_rowid, Ref_img_cat_rowid
Is this (horizontal time) model suitable?
• At the beginning the RDBMS should run smoothly, but what about after 5 years of operation?
• Indexing is not an easy task.
• A given source will be measured on the order of 10^3 times, each measurement set taking ~200 bytes.
• Assuming ~10x10^6 sources, we get a multi-TB DB (for alphanumeric data only)! Consider also ~25 TB of image data.
• Example of a performance bottleneck: search for all constant point-like sources. We have to scan the whole DB and, for each source, track its history across the night catalogs. This means issuing (N-1) SQL statements per source, i.e. ~10x10^6 x (N-1) in total! Forget it.
• Beyond the query complexity, the DBMS prefetches the rows that are most likely to be read next, which here causes useless and slow disk activity.
• The data must be rearranged.
• Solution: HTM
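The orders of magnitude quoted above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the function names are illustrative, and equating the number of nights N with the ~10^3 measurements per source is an assumption made here, not a figure from the slides.

```python
# Back-of-the-envelope check of the figures quoted on the slide.
# Assumption (not from the slides): one measurement per night, so N ~ 10^3 nights.

def catalog_size_tb(n_sources: int, n_measurements: int, bytes_per_measurement: int) -> float:
    """Size of the alphanumeric data in terabytes (1 TB = 10^12 bytes)."""
    return n_sources * n_measurements * bytes_per_measurement / 1e12


def statements_horizontal_model(n_sources: int, n_nights: int) -> int:
    """SQL statements needed to track every source through the N-1 other night catalogs."""
    return n_sources * (n_nights - 1)


size_tb = catalog_size_tb(10 * 10**6, 10**3, 200)
n_statements = statements_horizontal_model(10 * 10**6, 10**3)
print(f"alphanumeric data: {size_tb:.1f} TB")
print(f"SQL statements for a full history scan: {n_statements:.2e}")
```

With these inputs the alphanumeric data alone comes to about 2 TB, and a full history scan needs roughly 10^10 statements, which is why the horizontal layout is ruled out.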
HTM (Hierarchical Triangular Mesh)
• HTM [18] maps triangular regions of the sphere to unique identifiers.
• The technique for subdividing the sphere into spherical triangles is a recursive process.
• At each level of the recursion, the area of the resulting triangles is roughly the same.
• In areas with a larger data density, the recursion process can be applied with a greater level of detail than in areas with lower density.
• The starting point is a spherical octahedron, which identifies 8 spherical triangles of equal size.
• The term quadtree is used to describe a class of hierarchical data structures based on the principle of a fast recursive decomposition of the space.
• Sky tessellations with various mapping functions have been proposed. The astronomical community has largely adopted HTM and HEALPix (Hierarchical, Equal Area, and iso-Latitude Pixelisation) as the default schemes for object catalogues and for map visualization and analysis, respectively. HEALPix gives a hierarchical iso-area and iso-latitude tessellation of the sphere and so is convenient for harmonic data analysis on the sphere (densities, integrals, spherical harmonics, Fourier transforms, etc.).
• Using a 64-bit long integer to store the index IDs leads to a limit on the pixel size of about 7.7 and 0.44 milli-arcsec on a side for HTM and HEALPix, respectively. Being able to quickly retrieve the list of objects in a given sky region is crucial in several projects.
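The recursive subdivision described above can be sketched in a few lines. This is a minimal illustration, not the SDSS C++ library: the vertex ordering of the eight root triangles and the numbering of roots (8 to 15) and children (2 bits appended per level) follow the convention as commonly described for HTM and should be treated as an assumption, as should the function name `htm_id`.

```python
import math

# Vertices of the starting octahedron (unit vectors).
V = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]

# Eight root triangles as (id, vertex indices), counterclockwise seen from outside.
# Assumed numbering: S0..S3 -> 8..11, N0..N3 -> 12..15.
ROOTS = [
    (8, (1, 5, 2)), (9, (2, 5, 3)), (10, (3, 5, 4)), (11, (4, 5, 1)),
    (12, (1, 0, 4)), (13, (4, 0, 3)), (14, (3, 0, 2)), (15, (2, 0, 1)),
]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def midpoint(a, b):
    """Midpoint of two unit vectors, projected back onto the sphere."""
    s = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    norm = math.sqrt(dot(s, s))
    return (s[0] / norm, s[1] / norm, s[2] / norm)


def inside(p, a, b, c, eps=1e-12):
    """True if p lies within the spherical triangle (a, b, c), given counterclockwise."""
    return (dot(cross(a, b), p) >= -eps
            and dot(cross(b, c), p) >= -eps
            and dot(cross(c, a), p) >= -eps)


def htm_id(ra_deg, dec_deg, depth):
    """Identifier of the triangle containing (RA, DEC) after `depth` subdivisions."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    p = (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))
    for root_id, (i, j, k) in ROOTS:
        if inside(p, V[i], V[j], V[k]):
            node, tri = root_id, (V[i], V[j], V[k])
            break
    else:
        raise ValueError("point not on the unit sphere")
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = midpoint(v1, v2), midpoint(v0, v2), midpoint(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        for child, t in enumerate(children[:3]):
            if inside(p, *t):
                break
        else:
            child, t = 3, children[3]  # not in a corner triangle: take the middle one
        node, tri = (node << 2) | child, t  # append 2 bits per level
    return node
```

Because each level appends two bits, the identifier of a triangle at depth d is a prefix of the identifiers of all triangles nested inside it, so `htm_id(ra, dec, d + 1) >> 2 == htm_id(ra, dec, d)`. This prefix property is what lets a sky region be expressed as a few integer ranges that a B-Tree index can serve.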
Application to the ILMT
• The indexing scheme does not have to go as deep as the actual pixel resolution.
• Each triangle is linked to its own database table (the GSC approach).
• SQL queries involving searches based on RA, DEC (+ range) quickly provide the HtmId(s), which in turn are used to build the table_name(s) to be accessed.
• We can dynamically choose the triangle surface coverage depending on the maximum number of sources (e.g. max 100 sources per triangle).
• One single SQL statement returns a cursor with the whole source_id history.
• The problem is now: for each triangle database table, select all distinct source_id values and fetch their history rows, i.e. ~10x10^6 SQL runs! Manageable.
• The indexing C++ software is freely available at:
http://www.sdss.jhu.edu/
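The one-table-per-triangle layout can be illustrated with a small in-memory SQLite sketch. Everything named here is hypothetical: the `htm_<id>` table naming, the reduced column set (`source_id`, `night`, `mag_r`), and the helper functions are chosen for illustration and are not taken from the ILMT or GSC software.

```python
import sqlite3


def table_name(htm_id: int) -> str:
    """Hypothetical naming rule: one table per HTM triangle."""
    return f"htm_{int(htm_id)}"  # int() keeps the interpolated name safe


def store_measurement(conn, htm_id, source_id, night, mag_r):
    """Insert one measurement into the table of the triangle it falls in."""
    t = table_name(htm_id)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {t} (source_id INTEGER, night INTEGER, mag_r REAL)"
    )
    # B-Tree index so that a source history is one index range scan.
    conn.execute(f"CREATE INDEX IF NOT EXISTS {t}_src ON {t} (source_id, night)")
    conn.execute(f"INSERT INTO {t} VALUES (?, ?, ?)", (source_id, night, mag_r))


def light_curve(conn, htm_id, source_id):
    """Single SQL statement returning the whole history of one source."""
    t = table_name(htm_id)
    cur = conn.execute(
        f"SELECT night, mag_r FROM {t} WHERE source_id = ? ORDER BY night",
        (source_id,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
store_measurement(conn, 15, source_id=42, night=2, mag_r=18.3)
store_measurement(conn, 15, source_id=42, night=1, mag_r=18.2)
store_measurement(conn, 15, source_id=7, night=1, mag_r=20.1)
print(light_curve(conn, 15, 42))
```

All measurements of a source live in the table of its triangle, so its light curve comes back from one statement against one small table instead of one lookup per night catalog.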