Web-based 2d visualization with large data sets

NASA/IPAC Infrared Science Archive

Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu

Page 2

Larger Data Sets

2003: 2MASS Point Source Catalog, 0.5 billion rows, > 100 columns

2013: AllWISE Source Catalog, 0.75 billion rows, > 300 columns

Page 3

IRSA’s Firefly tri-view

Gum31, AllWISE Source Catalog, 0.5d search. Data are selected in each of the 3 views.

Page 4

Problem

Sky area: box with center 150.12, +2.21 and length 5400 arcsec.

Catalog                            Rows, columns (short form default)  Space on disk (ASCII IPAC Table)
AllWISE Source Catalog             30,000 rows, 47 columns             13 MB / 9 B per cell
COSMOS Cassata morphology Catalog  230,000 rows, 15 columns            62 MB / 18 B per cell
Spitzer Source List                250,000 rows, 148 columns           416 MB / 11 B per cell

Table covers one page at a time.

Image overlay and plot should cover all rows.

How do we visualize this much data?

Page 5

Overplotting

Points on top of each other

- hard to distinguish

- hard to interpret

- can be aggregated

Plot area: 400 x 400 px = 160,000 px²

Symbol size: 5 x 5 px = 25 px²

160,000 px² / 25 px² = 6,400 non-overlapping symbols at most

230,000 catalog rows are plotted with 5960 square symbols
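
The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `max_distinct_symbols` is made up for the example, and the numbers are the ones quoted on the slide.

```python
def max_distinct_symbols(plot_w_px, plot_h_px, symbol_w_px, symbol_h_px):
    """Upper bound on the number of non-overlapping symbols in the plot area."""
    return (plot_w_px // symbol_w_px) * (plot_h_px // symbol_h_px)

capacity = max_distinct_symbols(400, 400, 5, 5)   # 6400 symbol slots
rows = 230_000                                    # COSMOS catalog rows from the slide
rows_per_slot = rows / capacity                   # average rows competing for one slot

print(capacity, round(rows_per_slot, 1))
```

With roughly 36 rows per available symbol slot, most plotted points necessarily cover other points.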

Page 6

Binning

Data aggregation technique

Used by statistical packages (R or SDSS)

2-d histogram; shade represents the number of points (Np) in a bin

Outlier preserving
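
A minimal sketch of such a 2-d histogram, assuming bins are identified by an integer pair (ix, iy) and only non-empty bins are stored. The function name `bin_points` and the sample points are hypothetical; the slides do not show the actual implementation.

```python
import math
from collections import Counter

def bin_points(points, xmin, ymin, binsize_x, binsize_y):
    """Count points per (ix, iy) bin; empty bins are simply never created."""
    counts = Counter()
    for x, y in points:
        ix = math.floor((x - xmin) / binsize_x)
        iy = math.floor((y - ymin) / binsize_y)
        counts[(ix, iy)] += 1
    return counts

# Three clustered points and one isolated point (made-up data).
points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (9.7, 9.9)]
counts = bin_points(points, xmin=0.0, ymin=0.0, binsize_x=1.0, binsize_y=1.0)
print(dict(counts))
```

The isolated point keeps its own bin with a count of 1, so it still appears in the plot. That is what "outlier preserving" refers to here, in contrast to random sampling, which could drop it.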

Page 7

Color-Color Diagram

Color-color diagram created from the AllWISE Source Catalog: 1 degree cone search of the Lockman Hole. 46,475 data points are represented by 1,598 bins.

Page 8

Color-Color Diagram (2)

Same diagram, different shading scheme: each darker shade means 3.1 times more points.
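
The slide gives only the ratio between shades, not how shades are assigned. One possible reading is a geometric scale in which shade k starts at 3.1^k points per bin. The sketch below assumes exactly that; the function name `shade_level` and the default of five shades are hypothetical.

```python
import math

def shade_level(count, factor=3.1, n_shades=5):
    """Map a bin count to a shade index: 0 is lightest, n_shades - 1 is darkest.

    Assumption: shade k covers counts in [factor**k, factor**(k + 1)).
    """
    if count < 1:
        raise ValueError("only non-empty bins are shaded")
    level = int(math.log(count) / math.log(factor))
    return min(level, n_shades - 1)   # very dense bins share the darkest shade

for n in (1, 3, 4, 10, 1000):
    print(n, shade_level(n))
```

A geometric scale keeps sparse bins distinguishable from each other even when a few bins hold thousands of points, which a linear scale would not.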

Page 9

Binning – calculation

x:y = aspect ratio

Nbins = maximum number of bins

Nx = (int)sqrt( Nbins * [x:y] )

Ny = (int)sqrt( Nbins / [x:y] )

binsizex = (xmax - xmin) / Nx + padx

binsizey = (ymax - ymin) / Ny + pady
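
The formulas translate directly into code. In this sketch the function name `bin_grid` is made up, and `pad_x` / `pad_y` are treated as small caller-supplied paddings because the slide does not define them. The `max(1, ...)` guard is an addition that is not on the slide.

```python
import math

def bin_grid(n_bins_max, aspect_xy, xmin, xmax, ymin, ymax, pad_x=0.0, pad_y=0.0):
    """Return (nx, ny, binsize_x, binsize_y) for a grid of at most n_bins_max bins."""
    # max(1, ...) avoids a zero bin count for tiny n_bins_max (added guard).
    nx = max(1, int(math.sqrt(n_bins_max * aspect_xy)))
    ny = max(1, int(math.sqrt(n_bins_max / aspect_xy)))
    binsize_x = (xmax - xmin) / nx + pad_x
    binsize_y = (ymax - ymin) / ny + pad_y
    return nx, ny, binsize_x, binsize_y

# A 2:1 plot with at most 5000 bins over x in [0, 10], y in [0, 5].
nx, ny, bx, by = bin_grid(5000, 2.0, 0.0, 10.0, 0.0, 5.0)
print(nx, ny, bx, by)
```

Because Nx/Ny equals the aspect ratio, bins come out square on screen, and Nx * Ny never exceeds Nbins since both values are truncated.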

Page 10

Server-side vs. Client-side

SERVER SIDE

- Reduces transferred data size

- Used for larger tables (> 30,000 rows)

CLIENT SIDE

- Reduces rendered data size

- Common plot operations (zoom, select) do not require a server call

- Used for smaller tables (up to 30,000 rows)

Page 11

Preparing data for visualization

1. Retrieve data from low-level query and data service

2. Apply dynamic [current table] filters

3. Apply current sorting order

4. Aggregate data for visualization

Policies

- stream table processing: one row at a time

- cache intermediate results

- fix plot aspect ratio
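
The "one row at a time" policy can be sketched with Python generators. This is an illustration under assumptions, not the Firefly server code: the function names, the column names, and the sample rows are all made up. Sorting (step 3) and caching are left out because bin counts do not depend on row order.

```python
import math
from collections import Counter

def apply_filters(rows, predicates):
    """Step 2: lazily keep rows that pass every current table filter."""
    for row in rows:
        if all(pred(row) for pred in predicates):
            yield row

def aggregate(rows, x_col, y_col, xmin, ymin, binsize_x, binsize_y):
    """Step 4: count rows per (ix, iy) bin, holding only one row in memory."""
    counts = Counter()
    for row in rows:
        ix = math.floor((row[x_col] - xmin) / binsize_x)
        iy = math.floor((row[y_col] - ymin) / binsize_y)
        counts[(ix, iy)] += 1
    return counts

# Step 1 stand-in: an iterator of rows; a real data service would stream them.
table = iter([
    {"ra": 150.15, "dec": 2.25, "w1": 14.2},
    {"ra": 150.16, "dec": 2.26, "w1": 16.8},   # rejected by the filter below
    {"ra": 150.17, "dec": 2.27, "w1": 13.9},
    {"ra": 150.35, "dec": 2.45, "w1": 12.0},
])
filters = [lambda r: r["w1"] < 15.0]

counts = aggregate(apply_filters(table, filters), "ra", "dec",
                   xmin=150.0, ymin=2.0, binsize_x=0.1, binsize_y=0.1)
print(dict(counts))
```

Memory use grows with the number of non-empty bins rather than with the number of table rows, which is the point of streaming the aggregation.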

Page 12

Binning: implications for tri-view

Filtering from image overlay. How to find matching rows?

Aggregation parameters must be preserved!

Page 13

What do we save?

Aggregation parameters

- X, Y names or expressions

- Minimum values: xmin, ymin

- Step sizes: binsizex, binsizey

For each aggregated value

- Bin index

- Number of points
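
A sketch of why these parameters are enough to get from selected bins back to the matching table rows: recompute each row's bin with the same formula used for aggregation and keep the rows whose bin was selected. The class and function names are hypothetical, the bin index is assumed to be an (ix, iy) pair, and the expressions are evaluated by caller-supplied functions rather than parsed. The column names are used for illustration only.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class AggregationParams:
    x_expr: str        # X name or expression, kept for bookkeeping
    y_expr: str        # Y name or expression
    xmin: float
    ymin: float
    binsize_x: float
    binsize_y: float

    def bin_of(self, x, y):
        """Bin index pair for a point, using the same formula as the aggregation."""
        return (math.floor((x - self.xmin) / self.binsize_x),
                math.floor((y - self.ymin) / self.binsize_y))

def rows_in_bins(rows, params, selected_bins, get_x, get_y):
    """Return the full-table rows that fall into any of the selected bins."""
    return [r for r in rows if params.bin_of(get_x(r), get_y(r)) in selected_bins]

params = AggregationParams("w1mpro-w2mpro", "w2mpro-w3mpro",
                           xmin=-1.0, ymin=0.0, binsize_x=0.5, binsize_y=0.5)
rows = [
    {"w1mpro": 15.0, "w2mpro": 14.75, "w3mpro": 12.0},    # x=0.25,  y=2.75
    {"w1mpro": 16.0, "w2mpro": 15.5, "w3mpro": 12.5},     # x=0.5,   y=3.0
    {"w1mpro": 14.0, "w2mpro": 13.875, "w3mpro": 11.0},   # x=0.125, y=2.875
]
get_x = lambda r: r["w1mpro"] - r["w2mpro"]
get_y = lambda r: r["w2mpro"] - r["w3mpro"]

selected = rows_in_bins(rows, params, {(2, 5)}, get_x, get_y)
print(len(selected))
```

Reusing one formula for both directions avoids rows near a bin edge being assigned to different bins during aggregation and during filtering.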

Page 14

Conclusions

Binning is an efficient aggregation technique

Use client-side binning for smaller tables

Preserve aggregation parameters to move between aggregated and full data

Process one row at a time / cache on server

Fix aspect ratio on client
