Polaris A System for Query, Analysis and Visualization of Multidimensional Relational Databases Ugur YENIER

Polaris A System for Query, Analysis and Visualization of Multidimensional Relational Databases

Ugur YENIER

Introduction

Need for data interfaces emerging with Data warehousing Scientific Computation Business Analysis

Graphic Representations are more effective Allowing multiple views of the same data Easy discovery on massive data

Introduction

Meaning of data Discover Structure Find Patterns Derive Casual Relationships

n-Dimensional Data Cubes Cube Dimension = Relational Schema Dimension

Introduction

Most popular method : PIVOT TABLE Allow data cube to be rotated, pivoted Dimensions = Rows or Columns Remaining Dimensions are aggregated Cross-tabulations and summaries are provided

Further exploit : Graphs Projections of data cubes in

Bar Charts Scatter points Parallel Coordinate Displays

Introduction

POLARIS Interface for exploring multi-dimensional

databases Extends Pivot Table to directly generate rich

graphical displays Builds tables using algebraic formalism

involving fields of the database Each table contains layers and panes

Overview

Support interactive exploration of large multidimensional relational databases

A relational database may contain heterogeneous but interrelated tables

Field Characteristics Nominal Ordinal Quantitative Interval

Overview

Polaris Field Categorization Intervals = Quantitative (Ordered) Nominal = Ordinal

Dimension : Product Name Measure : Product Prize, Size Ordinal Fields Dimensions Quantitative Measure

Overview

Target Specifications Data-dense displays Multiple display types Exploratory Interface

Polaris meets specs. providing rapidly and incrementally generating table-based displays

Table = Rows, Columns + LAYERS

Overview

Each table axis may contain multiple nested dimensions

Each table entry (pane) consists a set of records represented with marks

Sample Polaris Interface

Interface Characteristics

Multivariate: Multiple dimension of data can be explicitly encoded

Comparative : Small-multiple displays to compare, exposing patterns and trends

Familiar : Statisticians are accustomed to using tabular display of graphics

Visualization

Multiple data sources may be combined in a single visualization

Dimensions are displayed in x,y,z shelves Record partitioning and layering Grouping information Graphic Type Field mappings to retinal properties

Visualization

Selecting a mark pops up detail window displaying specified tuples

It is possible to draw a rubber band around a set of marks to brush

(will be discussed later…)

Generating Graphics

There are three components Specifications of the different table

configurations Type of graphics inside each pane Details of the visual encodings

Table Algebra

Formal mechanism to specify table configurations

When a field is placed in a shelf, algebra expression is generated

x,y axes partition into rows and columns, z partitions to layers

Table Algebra

A,B,C representing ordinal fields P,Q,R representing quantitative fields

Assignment of sets to symbols reflect difference in how two types of fields will be encoded in the structure of the tables

Ordinal fields into rows and columns Quantitative fields into axes within the panes

Table Algebra

Valid expression is an ordered sequence of one or more symbols

Between each adjacent symbol there are operators

Operators (in order of precedence) (X) Cross (/) Nest (+) Concatenation

Concatenation

Performs union operations

Cross

Performs Cartesian product operations

Nest

Similar to cross operator, but only creates set entries for which there exist records with those domain values.

Interpretation is “B within A” For example, given the fields quarter and month, the

expression quarter/month would be interpreted as months within each quarter

Table Algebra

Every expression in the algebra can be reduced to a single set

Each entry in the set being an ordered concatenation of zero of more ordinal values with zero or more quantitative field names

This set evaluation of an expression is normalized set form

Normalized Set Form

Table axis is partitioned into columns (rows or layers) so that there is a one-to-one correspondence between set entries in the set and columns

Normalized Set Form

Types of Graphics

Once the table configuration is specified, next step is to specify the type of graphic in each pane

Three graphic families ordinal-ordinal ordinal-quantitative quantitative-quantitative

Each family contains a number of ways to mark records

Type of Graphics

Supported Polaris types Rectangle Circle Glyph Text Gantt bar Line Polygon Image

Types of Graphics

Dependent and independent dimensions are interpreted differently

By default dimensions are treated as independent dimensions

Aggregations affect the type of graphics

Ordinal-Ordinal

Axis variables are typically independent of each other Task is focused on understanding patterns and trends

in some function

ƒ(Ox,Oy) R

Typical example is studying sales and margin as a function of product type, month and state of items sold by a coffee chain

Ordinal-Quantitative

Typically bar chart, possibly clustered or stacked, the dot plot and Gantt Chart

Quantitative variable is often dependent on the ordinal variable and the aim is to understand or compare the properties of some function

ƒ(O) Q


Matrix of bar charts used to study several functions of the independent variables product and month


The cardinality of the record set does affect the structure of the graphics

When the cardinality of the record is set is one, the graphics are simple bar or dot plots

When the cardinality of the record is set to greater than one, the graphic is stacked bar chart


Major wars over the past 500 year shown as a Gantt chart

Additional layer in figure displays pictures of major scientists plotted as a function of the independent variables country of birth and date of birth

Quantitative- Quantitative

Used to understand the distribution of data as a function of one or both quantitative variables and to discover casual relationships between the two quantitative variables

Quantitative-Quantitative

Typical Map Flight scheduling varies with the

region of the country in which the flight originated

Number of flights between major airports has been plotted as a function of latitude and longitude

Plotted in two layers, the location plots and the geography of each state as a polygon

Visual Mappings

Each record in a pane is mapped to a mark Two Components

Type of graphic and mark Encoding of fields of the records into visual or retinal properties

of the selected mark Visual properties in Polaris are based on Bertin's retinal variables

Shape Size Orientation Color (value and hue) Texture (not supported in the current version of Polaris)

Retinal Properties

The different retinal properties that can be used to encode fields of the data and examples of the default mappings that are generated when a given type of data field is encoded in each of the retinal properties

Visual Mappings

Retinal properties of the display greatly enhances the data density and the variety of displays that can be generated

Analysts should not be required to construct the mappings

Instead, they should be able to simply specify that a field be encoded as a visual property

System should then generate an effective mapping from the domain of the field to the range of the visual property

DATA TRANSFORMATIONS AND VISUAL QUERIES Rapidly change the table configuration, type of

graphic, and visual encodings used to visualize a data set for interactive exploration

Resulting display is also manipulable Analyst is able to sort, filter, and transform the data

to uncover useful relationships and information Also form ad hoc groupings and partitions that

reflect this newly uncovered information

Data Transformations and Visual Queries Polaris supports four features to perform

visual queries Deriving additional fields Sorting and filtering Brushing and tool tips Undo and Redo

Deriving Additional Fields

The generated fields are aggregates or statistical summaries

Polaris currently provides five methods for deriving additional fields Simple aggregation of quantitative measures Counting of distinct values in ordinal dimensions Discrete partitioning of quantitative measures Ad hoc grouping within ordinal dimensions Threshold aggregation

Deriving Additional Fields Simple Aggregation Basic aggregation operations (that are

applied to a single quantitative field) Summation Average Minimum Maximum

Right-Click and apply, change type Easily extended to provide any statistical

aggregate that can be generated from relational data

Deriving Additional FieldsCounting of Ordinal Dimensions Counting of distinct values for an ordinal field

within the data set Right-Click and apply Applying the count operator changes the field

type (to quantitative) and thus change the table configuration and graph type in each pane

Deriving Additional FieldsDiscrete Partitioning Used to discretize a continuous domain Polaris provides two discretization methods

Binning, allows the analyst to specify a regular bin size in which to aggregate the data, useful for creating graphs, such as histograms, in which there are many regularly sized bins

Partitioning, allows the user to individually specify the size and name of each bin, useful for encoding additional categorizations into the data

Right-Click and apply

Deriving Additional FieldsAd hoc Grouping Ordinal version of quantitative partitioning,

where the user can choose to group together different ordinal values

Allows the analyst to add own domain knowledge to the analysis and to change the groupings as the exploration uncovers additional patterns


Deriving Additional FieldsThreshold Aggregation It is derived from two source fields: an ordinal field

and a quantitative field If the quantitative field is less than a certain

threshold value for any values of an ordinal field, those values are aggregated together to form an "Other" category

Allows the user to specify threshold values below which the data is considered uninteresting


Sorting and Filtering

Filtering allows the user to choose which values to display so that he can focus on and devote more screen space and attention to the areas of interest

For ordinal fields, a listbox with all possible values is shown and the user can check or uncheck each value to display it or not

For quantitative fields, a dynamic query slider allows the user to choose a new domain

Additionally, there are textboxes showing the chosen minimum and maximum values that the user can use to directly enter a new domain.

Sorting and Filtering

Sorting allows the user to uncover hidden patterns by changing the order of values within a field's domain or the ordering of tuples in the data

The ordering of tuples affects the drawing order of marks within a pane.

Polaris provides three ways for a user to sort the domain. User can bring up the filter window and drag-and drop the values

within that window to reorder the domain If the field has been used to partition the table into rows or

columns, the user can drag-and-drop the table row or column headers to reorder the domain values

Polaris provides programmatic sorting, allowing the user to sort one field based on the value in another field

Brushing and Tooltips

Analysts want to directly interact with the data, visually querying the data to highlight correlated marks or getting more details on demand Brushing allows the user to choose a set of

interesting data points by drawing a rubberband around them

Tooltips allow the user to get more details on demand.

Brushing

The user selects a single field whose values are then used to identify related marks and tuples

All marks corresponding to tuples sharing selected field values with the selected tuples are subsequently highlighted in all other panes or linked Polaris views

Allowing correlation between different projections of the same data set or relationships between distinct data sets.

Tooltips

If the user hovers over a data point or pane, additional details, such as specific field values for the tuple corresponding to the selected mark, are shown

Analysts can use tooltips to understand the relationship between the graphical marks and the underlying data

Undo and Redo

Unlimited undo and redo within an analysis sessio

Users can use the "Back" and "Forward" buttons on the top toolbar to either return to a previous visual specification or to move forward again.

GENERATING DATABASE QUERIES

Results

Throughout the analyses users want to see data and how they want to see it change continually

Analysts form hypotheses create new views to perform tests and experiments

Certain displays enable an understanding of overall trends, whereas others show causal relationships

As the analysts better understand the data, they may want to drill-down in the visible dimensions or display entirely different dimensions

Polaris supports this exploratory process through its visual interface By formally categorizing the types of graphics, Polaris is able to provide

a simple interface for rapidly generating a wide range of displays This allows analysts to focus on the analysis task rather than the steps

needed to retrieve and display the data

Discussions

Comparison with similar work is omitted in this presentation

Interpretation of visual specifications as database queries

Interactivity and performance of Polaris

Interpretation of Visual Specifications as Database Queries

Polaris generates the SQL query for each table pane

Similar to CUBE operator generating the queries to create the cross-tab and Pivot Table displays

However the CUBE operator is not applicable for Polaris because it assumes that the sets of relations partitioned into each table pane do not overlap

Interactivity and Performance of Polaris Polaris at its first implementations focuses on

the techniques, semantics and formalism rather then the interactivity

It has been experienced that the query response time does not need to be real-time in order to maintain a feeling of exploration (several tens of seconds)

Interactivity and Performance of Polaris Test Data:

A subset of a packet trace of a mobile network over a 13 week period, approx. 6 million tuples

A subset of the data collected from Sloan Digital Sky Survey (approx. 650MB)

Both stored on MS SQL Server 2000 Paper does not provide numeric data on

performance but the personal experiences of the testers

Conclusion

Polaris extends the well known Pivot Table interface to display relational query results using a rich inexpensive set of graphical displays

Succinct visual specification for describing table-based graphical displays of relational data

Interpretation of visual specifications as a precise sequence of relational database operations

Future Work

Performance evaluation Hierarchical data cubes Correspondence of marks to data tuples

(dynamic mark generation) Animation shelf to display sequencing data

Thank You

Documents

Polaris A System for Query, Analysis and Visualization of Multidimensional Relational Databases Ugur YENIER