Web-based 2d visualization with large data sets
NASA/IPAC Infrared Science Archive
Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu
Larger Data Sets
2003: 2MASS Point Source Catalog
0.5 billion rows, > 100 columns
2013: AllWISE Source Catalog
0.75 billion rows, > 300 columns
IRSA’s Firefly tri-view
Gum31, AllWISE Source Catalog, 0.5d search. Data are selected in each of the 3 views.
Problem
Sky area: box with center 150.12, +2.21 and length 5400 arcsec.
Catalog: rows, columns (short form default); space on disk (ASCII IPAC Table)
- AllWISE Source Catalog: 30,000 rows, 47 columns; 13 MB (9 B per cell)
- COSMOS Cassata morphology Catalog: 230,000 rows, 15 columns; 62 MB (18 B per cell)
- Spitzer Source List: 250,000 rows, 148 columns; 416 MB (11 B per cell)
The table covers one page at a time. The image overlay and plot should cover all rows.
How do we visualize this much data?
Overplotting
Points on top of each other
- hard to distinguish
- hard to interpret
- can be aggregated
Plot area: 400 x 400 px²
Symbol size: 5 x 5 px²
160,000 px² / 25 px² = 6400
230,000 catalog rows are plotted with 5960 square symbols
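The arithmetic above can be written out as a small check. This is only a sketch: the function name is hypothetical, and it assumes symbols are placed on a regular grid with no overlap, which gives an upper bound on how many distinct symbols the plot area can show.

```python
def max_distinct_symbols(plot_w_px, plot_h_px, sym_w_px, sym_h_px):
    """Upper bound on non-overlapping symbols that fit in the plot area."""
    return (plot_w_px * plot_h_px) // (sym_w_px * sym_h_px)

# Values from the slide: 400 x 400 px plot, 5 x 5 px symbols, 230,000 rows.
capacity = max_distinct_symbols(400, 400, 5, 5)
rows = 230_000
rows_per_slot = rows / capacity  # average rows competing for one symbol slot

print(capacity)
print(round(rows_per_slot, 1))
```

With far more rows than symbol slots, most points necessarily land on top of others, which is the motivation for aggregating them.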
Binning
Data aggregation technique
Used by statistical packages (R or SDSS)
2-d histogram; shade represents the number of points (Np) in each bin
Outlier preserving
Color-Color Diagram
Color-color diagram created from AllWISE Source Catalog. 1 degree cone search. Lockman Hole. 46,475 data points are represented by 1,598 bins.
Color-Color Diagram (2)
Same diagram, different shading scheme. Darker – 3.1 times more points.
Binning – calculation
x:y – aspect ratio
Nbins – maximum number of bins
Nx = (int)sqrt( Nbins * [x:y] )
Ny = (int)sqrt( Nbins / [x:y] )
binsizex = (xmax - xmin) / Nx + padx
binsizey = (ymax - ymin) / Ny + pady
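The calculation above might be sketched as follows. The function names are hypothetical, and the slide does not say how padx and pady are chosen; here a single tiny `pad` value is assumed, whose only purpose is to make the maximum value fall inside the last bin rather than one past it.

```python
import math

def bin_grid(n_bins_max, aspect, xmin, xmax, ymin, ymax, pad=1e-9):
    """Return (nx, ny, binsize_x, binsize_y) for a grid of at most n_bins_max bins.

    aspect is the plot's x:y ratio; pad is an assumed small padding value.
    """
    nx = int(math.sqrt(n_bins_max * aspect))
    ny = int(math.sqrt(n_bins_max / aspect))
    binsize_x = (xmax - xmin) / nx + pad
    binsize_y = (ymax - ymin) / ny + pad
    return nx, ny, binsize_x, binsize_y

def bin_of(x, y, xmin, ymin, binsize_x, binsize_y):
    """Map a point to its (ix, iy) bin on the grid."""
    return int((x - xmin) // binsize_x), int((y - ymin) // binsize_y)

# Square plot, at most 6400 bins, data spanning 0..8 on both axes.
nx, ny, bx, by = bin_grid(6400, 1.0, 0.0, 8.0, 0.0, 8.0)
print(nx, ny)
print(bin_of(8.0, 8.0, 0.0, 0.0, bx, by))  # the maximum stays in the last bin
```

Because both counts are truncated to integers, Nx * Ny never exceeds Nbins.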
Server-side vs. Client-side
SERVER SIDE
- Reduces transferred data size
- Used for larger tables (> 30,000 rows)
CLIENT SIDE
- Reduces rendered data size
- Common plot operations (zoom, select) do not require a server call
- Used for smaller tables (up to 30,000 rows)
Preparing data for visualization
1. Retrieve data from low-level query and data service
2. Apply dynamic [current table] filters
3. Apply current sorting order
4. Aggregate data for visualization
Policies
- stream table processing: one row at a time
- cache intermediate results
- fix plot aspect ratio
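The filter and aggregate steps combined with the one-row-at-a-time policy could look roughly like the sketch below. It is an illustration, not the Firefly implementation: `aggregate_stream` and `keep` are hypothetical names, rows are assumed to arrive as (x, y) pairs from an iterator, and the grid parameters are assumed to be known before the pass starts. Sorting is left out because bin counts do not depend on row order.

```python
from collections import Counter

def aggregate_stream(rows, keep, xmin, ymin, binsize_x, binsize_y):
    """Count points per (ix, iy) bin while reading rows one at a time.

    rows is any iterable of (x, y) pairs; keep is a filter predicate.
    Only the bin counts are held in memory, never the full table.
    """
    counts = Counter()
    for x, y in rows:
        if not keep(x, y):
            continue  # row removed by the current table filter
        ix = int((x - xmin) // binsize_x)
        iy = int((y - ymin) // binsize_y)
        counts[(ix, iy)] += 1
    return counts

# A generator stands in for rows streamed from the query service.
stream = iter([(0.1, 0.1), (0.2, 0.3), (1.5, 0.2), (9.0, 9.0)])
counts = aggregate_stream(stream, lambda x, y: x < 5, 0.0, 0.0, 1.0, 1.0)
print(dict(counts))
```

Memory use grows with the number of occupied bins rather than the number of rows, which is what makes the approach workable for tables with hundreds of thousands of rows.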
Binning: implications for tri-view
Filtering from image overlay. How to find matching rows?
Aggregation parameters must be preserved!
What do we save?
Aggregation parameters
- X, Y names or expressions
- Minimum values: xmin, ymin
- Step sizes: binsizex, binsizey
For each aggregated value
- Bin index
- Number of points
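One way the saved parameters could be used to go from selected bins back to table rows is sketched below. The slides do not specify how the bin index is encoded; this sketch assumes it is an (ix, iy) pair, and `rows_in_bins` is a hypothetical helper name.

```python
def rows_in_bins(rows, selected_bins, xmin, ymin, binsize_x, binsize_y):
    """Return indices of rows whose (x, y) falls into any selected bin.

    The same xmin, ymin and bin sizes used for aggregation must be reused,
    otherwise rows are assigned to different bins than the plot shows.
    """
    selected = set(selected_bins)
    matches = []
    for row_id, (x, y) in enumerate(rows):
        key = (int((x - xmin) // binsize_x), int((y - ymin) // binsize_y))
        if key in selected:
            matches.append(row_id)
    return matches

table = [(0.1, 0.1), (1.5, 0.2), (0.9, 0.8), (2.5, 2.5)]
matched = rows_in_bins(table, [(0, 0)], 0.0, 0.0, 1.0, 1.0)
print(matched)
```

Recomputing each row's bin from the stored parameters avoids keeping a list of row ids per bin.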
Conclusions
Binning is an efficient aggregation technique
Use client-side binning for smaller tables
Preserve aggregation parameters to move between aggregated and full data
Process one row at a time / cache on server
Fix aspect ratio on client