Polaris: A System for Query, Analysis and Visualization of Multidimensional Relational Databases
Ugur YENIER
Introduction
The need for data interfaces is emerging with data warehousing, scientific computation, and business analysis
Graphical representations are more effective: they allow multiple views of the same data and make discovery in massive data sets easier
Introduction
Understanding the meaning of data: discover structure, find patterns, derive causal relationships
n-dimensional data cubes: each cube dimension corresponds to a dimension of the relational schema
Introduction
Most popular method: the Pivot Table
It allows the data cube to be rotated (pivoted): selected dimensions become rows or columns, and the remaining dimensions are aggregated
Cross-tabulations and summaries are provided
Further exploitation: graphs, i.e., projections of data cubes as bar charts, scatter plots, and parallel coordinate displays
Introduction
POLARIS is an interface for exploring multidimensional databases
It extends the Pivot Table to directly generate rich graphical displays
It builds tables using an algebraic formalism involving the fields of the database
Each table contains layers and panes
Overview
Support interactive exploration of large multidimensional relational databases
A relational database may contain heterogeneous but interrelated tables
Field characteristics: nominal, ordinal, quantitative, interval
Overview
Polaris field categorization: interval fields are treated as quantitative; nominal fields are treated as ordinal
Dimensions (e.g., Product Name) are ordinal fields; measures (e.g., Product Price, Size) are quantitative fields
Overview
Target specifications: data-dense displays, multiple display types, and an exploratory interface
Polaris meets these specifications by rapidly and incrementally generating table-based displays
Table = Rows, Columns + LAYERS
Overview
Each table axis may contain multiple nested dimensions
Each table entry (pane) consists of a set of records, each represented by a mark
Sample Polaris Interface
Interface Characteristics
Multivariate: multiple dimensions of the data can be explicitly encoded
Comparative: small-multiple displays support comparison, exposing patterns and trends
Familiar: statisticians are accustomed to using tabular displays of graphics
Visualization
Multiple data sources may be combined in a single visualization
Fields are placed on the x, y, and z shelves; the interface also specifies record partitioning and layering, grouping information, the graphic type, and field mappings to retinal properties
Visualization
Selecting a mark pops up a detail window displaying the specified tuples
It is possible to draw a rubber band around a set of marks to brush
(will be discussed later…)
Generating Graphics
There are three components:
the specification of the different table configurations
the type of graphic inside each pane
the details of the visual encodings
Table Algebra
Formal mechanism to specify table configurations
When a field is placed on a shelf, an algebraic expression is generated
The x and y expressions partition the table into columns and rows; the z expression partitions it into layers
Table Algebra
A, B, C represent ordinal fields; P, Q, R represent quantitative fields
The assignment of sets to symbols reflects the difference in how the two types of fields are encoded in the structure of the tables
Ordinal fields partition the table into rows and columns; quantitative fields are mapped to axes within the panes
Table Algebra
A valid expression is an ordered sequence of one or more symbols
Adjacent symbols are separated by an operator
Operators (in order of precedence): cross (×), nest (/), concatenation (+)
Concatenation
Performs an ordered union of the sets of its two operands
Cross
Performs Cartesian product operations
Nest
Similar to the cross operator, but only creates set entries for which there exist records with those domain values
Interpretation is “B within A”: for example, given the fields quarter and month, the expression quarter/month is interpreted as months within each quarter
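The three operators can be sketched in Python, assuming each operand is an ordered list of tuples (its set entries), an ordinal field denotes its ordered domain values, and a quantitative field denotes a single entry holding its field name. The helper names and the quarter/month records are hypothetical, and the sketch does not enforce the paper's restrictions on which operand types each operator accepts.

```python
from itertools import product

# Hypothetical sketch: an operand is an ordered list of tuples (set entries).

def ordinal(domain):
    """An ordinal field denotes the ordered set of its domain values."""
    return [(value,) for value in domain]

def quantitative(name):
    """A quantitative field denotes a single entry holding its field name."""
    return [(name,)]

def concat(a, b):
    """A + B: ordered union, entries of A followed by the new entries of B."""
    return a + [entry for entry in b if entry not in a]

def cross(a, b):
    """A x B: Cartesian product, each entry of A joined with each entry of B."""
    return [x + y for x, y in product(a, b)]

def nest(a, b, a_field, b_field, records):
    """A / B for two ordinal fields: like cross, but keeps only the (a, b)
    pairs that occur together in at least one record ("B within A")."""
    present = {(r[a_field], r[b_field]) for r in records}
    return [x + y for x, y in product(a, b) if (x[0], y[0]) in present]

records = [
    {"quarter": "Q1", "month": "Jan"},
    {"quarter": "Q1", "month": "Feb"},
    {"quarter": "Q2", "month": "Apr"},
]
quarter = ordinal(["Q1", "Q2"])
month = ordinal(["Jan", "Feb", "Apr"])

print(len(cross(quarter, month)))  # 6: every quarter paired with every month
print(nest(quarter, month, "quarter", "month", records))
# [('Q1', 'Jan'), ('Q1', 'Feb'), ('Q2', 'Apr')]
print(cross(quarter, quantitative("Profit")))
# [('Q1', 'Profit'), ('Q2', 'Profit')]
print(concat(quantitative("Sales"), quantitative("Profit")))
# [('Sales',), ('Profit',)]
```

The contrast between cross and nest is visible in the output: cross yields all six quarter/month combinations, while nest keeps only the three that actually occur in the data.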
Table Algebra
Every expression in the algebra can be reduced to a single set
Each entry in the set is an ordered concatenation of zero or more ordinal values with zero or more quantitative field names
This set evaluation of an expression is called its normalized set form
Normalized Set Form
A table axis is partitioned into columns (or rows, or layers) so that there is a one-to-one correspondence between the entries in the set and the columns
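A minimal Python sketch of this one-to-one correspondence, assuming the normalized set has already been evaluated and is given as a list of tuples. The function name and the example expression quarter × (Profit + Sales) are hypothetical; the sketch simply splits each entry into its ordinal values (the column header) and its quantitative field names (the axis drawn inside that column's panes).

```python
def partition_axis(normalized_set, quantitative_fields):
    """Return one column description per set entry (one-to-one)."""
    columns = []
    for entry in normalized_set:
        # Ordinal values label the column header.
        header = tuple(v for v in entry if v not in quantitative_fields)
        # Quantitative field names become an axis inside the column's panes.
        axis = tuple(v for v in entry if v in quantitative_fields)
        columns.append({"header": header, "axis": axis})
    return columns

# Normalized set form of the hypothetical expression quarter x (Profit + Sales)
normalized = [("Q1", "Profit"), ("Q1", "Sales"), ("Q2", "Profit"), ("Q2", "Sales")]
for column in partition_axis(normalized, {"Profit", "Sales"}):
    print(column)
```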
Normalized Set Form
Types of Graphics
Once the table configuration is specified, the next step is to specify the type of graphic in each pane
There are three graphic families: ordinal-ordinal, ordinal-quantitative, and quantitative-quantitative
Each family contains a number of ways to mark records
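The choice of family can be sketched as a small classification function. This assumes the family is determined by the kinds of the innermost fields on the x and y axes of a pane; the function name and string labels are hypothetical.

```python
def graphic_family(x_kind, y_kind):
    """Classify a pane by the kinds of the innermost fields on its two axes."""
    valid = {"ordinal", "quantitative"}
    if x_kind not in valid or y_kind not in valid:
        raise ValueError(f"unknown field kind: {x_kind!r}, {y_kind!r}")
    if x_kind == y_kind == "ordinal":
        return "ordinal-ordinal"
    if x_kind == y_kind == "quantitative":
        return "quantitative-quantitative"
    return "ordinal-quantitative"  # the order of the two axes does not matter

print(graphic_family("ordinal", "quantitative"))  # ordinal-quantitative
```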
Types of Graphics
Mark types supported by Polaris: rectangle, circle, glyph, text, Gantt bar, line, polygon, image
Types of Graphics
Dependent and independent dimensions are interpreted differently
By default dimensions are treated as independent dimensions
Aggregations affect the type of graphics
Ordinal-Ordinal
Axis variables are typically independent of each other
The task is focused on understanding patterns and trends in some function $f(O_x, O_y) \rightarrow R$
A typical example is studying sales and margin as a function of the product type, month, and state of items sold by a coffee chain
Ordinal-Quantitative
Typical displays are the bar chart (possibly clustered or stacked), the dot plot, and the Gantt chart
The quantitative variable is often dependent on the ordinal variable, and the aim is to understand or compare the properties of some function $f(O) \rightarrow Q$
Ordinal-Quantitative
A matrix of bar charts used to study several functions of the independent variables product and month
Ordinal-Quantitative
The cardinality of the record set in each pane affects the structure of the graphics
When the record set has cardinality one, the graphics are simple bar charts or dot plots
When the cardinality is greater than one, the graphic is a stacked bar chart
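The cardinality rule can be sketched in Python as follows, assuming one pane per value of a single ordinal field and non-negative measure values. The function name and the month/sales records are hypothetical: a pane with one record gets a simple bar, and a pane with several records gets stacked segments, each starting where the previous one ended.

```python
from collections import defaultdict

def bar_layout(records, pane_field, measure):
    """Map each pane to a simple bar (one record) or stacked segments (several)."""
    panes = defaultdict(list)
    for record in records:
        panes[record[pane_field]].append(record[measure])
    layout = {}
    for pane, values in panes.items():
        if len(values) == 1:
            layout[pane] = ("bar", values[0])
        else:
            segments, base = [], 0
            for value in values:
                segments.append((base, base + value))  # segment spans [base, base + value]
                base += value
            layout[pane] = ("stacked", segments)
    return layout

records = [
    {"month": "Jan", "sales": 5},
    {"month": "Feb", "sales": 3},
    {"month": "Feb", "sales": 4},
]
print(bar_layout(records, "month", "sales"))
# {'Jan': ('bar', 5), 'Feb': ('stacked', [(0, 3), (3, 7)])}
```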
Ordinal-Quantitative
Major wars over the past 500 years shown as a Gantt chart
An additional layer in the figure displays pictures of major scientists plotted as a function of the independent variables country of birth and date of birth
Quantitative-Quantitative
Used to understand the distribution of data as a function of one or both quantitative variables and to discover causal relationships between the two quantitative variables
Quantitative-Quantitative
Typical example, a map: flight scheduling varies with the region of the country in which the flight originated
The number of flights between major airports has been plotted as a function of latitude and longitude
Plotted in two layers: the location plots, and the geography of each state drawn as a polygon
Visual Mappings
Each record in a pane is mapped to a mark
Two components:
the type of graphic and mark
the encoding of fields of the records into visual or retinal properties of the selected mark
Visual properties in Polaris are based on Bertin's retinal variables: shape, size, orientation, color (value and hue), and texture (not supported in the current version of Polaris)
Retinal Properties
The different retinal properties that can be used to encode fields of the data and examples of the default mappings that are generated when a given type of data field is encoded in each of the retinal properties
Visual Mappings
The retinal properties of the display greatly enhance the data density and the variety of displays that can be generated
Analysts should not be required to construct the mappings
Instead, they should be able to simply specify that a field be encoded as a visual property
System should then generate an effective mapping from the domain of the field to the range of the visual property
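Two such automatic mappings can be sketched in Python: a quantitative field mapped to mark size by linear interpolation over its domain, and an ordinal field mapped to distinct hues. The function names, the default size range, the hex palette, and the product names are all hypothetical choices, not Polaris's actual defaults.

```python
def size_mapping(domain_min, domain_max, size_min=2.0, size_max=20.0):
    """Quantitative field -> mark size, by linear interpolation over the domain."""
    span = domain_max - domain_min

    def to_size(value):
        if span == 0:
            return size_min  # degenerate domain: every mark gets the same size
        fraction = (value - domain_min) / span
        return size_min + fraction * (size_max - size_min)

    return to_size

def hue_mapping(domain, palette):
    """Ordinal field -> color hue, one distinct palette entry per domain value."""
    if len(domain) > len(palette):
        raise ValueError("palette has fewer hues than the field has values")
    return dict(zip(domain, palette))

to_size = size_mapping(0, 100)
print(to_size(50))  # 11.0
print(hue_mapping(["Coffee", "Tea", "Espresso"], ["#1f77b4", "#ff7f0e", "#2ca02c"]))
```

The analyst only names the field and the retinal property; the domain bounds and the palette assignment are derived by the system.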
DATA TRANSFORMATIONS AND VISUAL QUERIES
Analysts rapidly change the table configuration, type of graphic, and visual encodings used to visualize a data set for interactive exploration
The resulting display is also manipulable: the analyst is able to sort, filter, and transform the data to uncover useful relationships and information
Analysts can also form ad hoc groupings and partitions that reflect this newly uncovered information
Data Transformations and Visual Queries
Polaris supports four features for performing visual queries: deriving additional fields, sorting and filtering, brushing and tooltips, and undo and redo
Deriving Additional Fields
The generated fields are aggregates or statistical summaries
Polaris currently provides five methods for deriving additional fields:
simple aggregation of quantitative measures
counting of distinct values in ordinal dimensions
discrete partitioning of quantitative measures
ad hoc grouping within ordinal dimensions
threshold aggregation
Deriving Additional Fields: Simple Aggregation
Basic aggregation operations applied to a single quantitative field: summation, average, minimum, maximum
Right-click to apply an aggregation or change its type
Easily extended to provide any statistical aggregate that can be generated from relational data
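A minimal Python sketch of simple aggregation over a single quantitative field, grouped by an ordinal field. The registry name `AGGREGATES`, the function name, and the sales records are hypothetical; the extra `MEDIAN` entry illustrates how registering one more function extends the set of available aggregates.

```python
from collections import defaultdict
from statistics import mean, median

# Registry of aggregate functions applied to a single quantitative field.
AGGREGATES = {"SUM": sum, "AVG": mean, "MIN": min, "MAX": max}

def aggregate(records, group_field, measure, op="SUM"):
    """Aggregate `measure` within each value of `group_field`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_field]].append(record[measure])
    func = AGGREGATES[op]
    return {key: func(values) for key, values in groups.items()}

# Extending the set of aggregates is a one-line registration.
AGGREGATES["MEDIAN"] = median

sales = [
    {"product": "Coffee", "sales": 10},
    {"product": "Coffee", "sales": 20},
    {"product": "Tea", "sales": 7},
]
print(aggregate(sales, "product", "sales", "SUM"))  # {'Coffee': 30, 'Tea': 7}
print(aggregate(sales, "product", "sales", "AVG"))
```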
Deriving Additional Fields: Counting of Ordinal Dimensions
Counts the distinct values of an ordinal field within the data set
Right-click and apply
Applying the count operator changes the field type (to quantitative) and thus changes the table configuration and the graphic type in each pane
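The count operator can be sketched in Python as a distinct count, either over the whole data set or per value of another field. The function and field names are hypothetical. Note that the result is a number, which is why the derived field becomes quantitative.

```python
from collections import defaultdict

def count_distinct(records, ordinal_field, group_field=None):
    """Count distinct values of ordinal_field, overall or per value of group_field.
    The result is a number, i.e. a quantitative field."""
    if group_field is None:
        return len({record[ordinal_field] for record in records})
    distinct = defaultdict(set)
    for record in records:
        distinct[record[group_field]].add(record[ordinal_field])
    return {key: len(values) for key, values in distinct.items()}

orders = [
    {"state": "CA", "product": "Coffee"},
    {"state": "CA", "product": "Tea"},
    {"state": "CA", "product": "Coffee"},
    {"state": "NY", "product": "Tea"},
]
print(count_distinct(orders, "product"))           # 2
print(count_distinct(orders, "product", "state"))  # {'CA': 2, 'NY': 1}
```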
Deriving Additional Fields: Discrete Partitioning
Used to discretize a continuous domain; Polaris provides two discretization methods
Binning allows the analyst to specify a regular bin size in which to aggregate the data; it is useful for creating graphs, such as histograms, that have many regularly sized bins
Partitioning allows the user to individually specify the size and name of each bin; it is useful for encoding additional categorizations into the data
Right-Click and apply
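Both discretization methods can be sketched in Python. The function names, the price values, and the bin names are hypothetical; binning assigns each value to the lower edge of a regularly sized bin (enough to build a histogram), while partitioning looks the value up in a list of user-named intervals.

```python
import math
from collections import Counter

def bin_lower_edge(value, bin_size, origin=0):
    """Binning: lower edge of the regular bin containing value."""
    return origin + math.floor((value - origin) / bin_size) * bin_size

def partition_name(value, partitions):
    """Partitioning: partitions is a list of (name, lower, upper), lower <= value < upper."""
    for name, lower, upper in partitions:
        if lower <= value < upper:
            return name
    return None  # value falls outside every user-defined bin

prices = [3, 7, 12, 14, 27]
histogram = Counter(bin_lower_edge(p, 10) for p in prices)
print(sorted(histogram.items()))  # [(0, 2), (10, 2), (20, 1)]

bins = [("cheap", 0, 10), ("regular", 10, 20), ("premium", 20, math.inf)]
print([partition_name(p, bins) for p in prices])
# ['cheap', 'cheap', 'regular', 'regular', 'premium']
```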
Deriving Additional Fields: Ad hoc Grouping
The ordinal version of quantitative partitioning: the user can choose to group together different ordinal values
Allows analysts to add their own domain knowledge to the analysis and to change the groupings as the exploration uncovers additional patterns
Right-Click and apply
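Ad hoc grouping can be sketched in Python as a user-supplied mapping from group names to member values, used to derive a new ordinal field. The function name, the `region` field, and the state grouping are hypothetical; values that belong to no group keep their original value here, which is one possible convention.

```python
def group_values(records, field, groups, new_field):
    """Derive new_field: the user-chosen group name for each value of `field`,
    or the original value when it belongs to no group."""
    lookup = {value: name for name, members in groups.items() for value in members}
    return [{**record, new_field: lookup.get(record[field], record[field])}
            for record in records]

stores = [{"state": "California"}, {"state": "Oregon"}, {"state": "Texas"}]
regions = {"West Coast": ["California", "Oregon", "Washington"]}
print([r["region"] for r in group_values(stores, "state", regions, "region")])
# ['West Coast', 'West Coast', 'Texas']
```

Changing the grouping as the exploration proceeds only requires editing the `regions` dictionary and re-deriving the field.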
Deriving Additional Fields: Threshold Aggregation
Derived from two source fields: an ordinal field and a quantitative field
If the quantitative field is less than a certain threshold value for any values of the ordinal field, those values are aggregated together to form an "Other" category
Allows the user to specify threshold values below which the data is considered uninteresting
Right-Click and apply
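A minimal Python sketch of threshold aggregation, assuming the threshold is compared against the summed quantitative value of each ordinal value (the slide does not specify which aggregate is compared). The function name and the product sales are hypothetical.

```python
from collections import defaultdict

def threshold_aggregate(records, ordinal_field, measure, threshold, other="Other"):
    """Sum `measure` per ordinal value; values whose total falls below
    `threshold` are merged into a single `other` category."""
    totals = defaultdict(float)
    for record in records:
        totals[record[ordinal_field]] += record[measure]
    result = defaultdict(float)
    for value, total in totals.items():
        result[value if total >= threshold else other] += total
    return dict(result)

sales = [
    {"product": "Coffee", "sales": 100},
    {"product": "Tea", "sales": 5},
    {"product": "Cocoa", "sales": 3},
]
print(threshold_aggregate(sales, "product", "sales", threshold=10))
# {'Coffee': 100.0, 'Other': 8.0}
```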
Sorting and Filtering
Filtering allows the user to choose which values to display so that he can focus on and devote more screen space and attention to the areas of interest
For ordinal fields, a listbox with all possible values is shown and the user can check or uncheck each value to display it or not
For quantitative fields, a dynamic query slider allows the user to choose a new domain
Additionally, there are textboxes showing the chosen minimum and maximum values that the user can use to directly enter a new domain.
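Both kinds of filter can be sketched in Python: a set of checked values per ordinal field (the listbox) and an inclusive minimum/maximum per quantitative field (the slider or textboxes). The function name, its parameters, and the sample rows are hypothetical.

```python
def filter_records(records, ordinal_filters=None, quantitative_filters=None):
    """ordinal_filters: field -> set of checked values (listbox).
    quantitative_filters: field -> (minimum, maximum), inclusive (slider/textboxes)."""
    ordinal_filters = ordinal_filters or {}
    quantitative_filters = quantitative_filters or {}

    def keep(record):
        for field, checked in ordinal_filters.items():
            if record[field] not in checked:
                return False
        for field, (low, high) in quantitative_filters.items():
            if not low <= record[field] <= high:
                return False
        return True

    return [record for record in records if keep(record)]

rows = [
    {"state": "CA", "profit": 120},
    {"state": "NY", "profit": 40},
    {"state": "TX", "profit": 300},
]
print(filter_records(rows, {"state": {"CA", "TX"}}, {"profit": (100, 200)}))
# [{'state': 'CA', 'profit': 120}]
```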
Sorting and Filtering
Sorting allows the user to uncover hidden patterns by changing the order of values within a field's domain or the ordering of tuples in the data
The ordering of tuples affects the drawing order of marks within a pane.
Polaris provides three ways for a user to sort the domain:
The user can bring up the filter window and drag-and-drop the values within that window to reorder the domain
If the field has been used to partition the table into rows or columns, the user can drag-and-drop the table row or column headers to reorder the domain values
Polaris provides programmatic sorting, allowing the user to sort one field based on the value in another field
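Programmatic sorting can be sketched in Python by ordering the domain of one field by an aggregate of another. This assumes the aggregate is a sum; the function name and the product sales are hypothetical.

```python
from collections import defaultdict

def sort_domain_by(records, field, by_field, descending=True):
    """Order the domain of `field` by the summed value of `by_field`."""
    totals = defaultdict(float)
    for record in records:
        totals[record[field]] += record[by_field]
    return sorted(totals, key=totals.get, reverse=descending)

sales = [
    {"product": "Tea", "sales": 40},
    {"product": "Coffee", "sales": 90},
    {"product": "Cocoa", "sales": 15},
    {"product": "Tea", "sales": 30},
]
print(sort_domain_by(sales, "product", "sales"))  # ['Coffee', 'Tea', 'Cocoa']
```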
Brushing and Tooltips
Analysts want to directly interact with the data, visually querying it to highlight correlated marks or to get more details on demand
Brushing allows the user to choose a set of interesting data points by drawing a rubber band around them
Tooltips allow the user to get more details on demand.
Brushing
The user selects a single field whose values are then used to identify related marks and tuples
All marks corresponding to tuples sharing selected field values with the selected tuples are subsequently highlighted in all other panes or linked Polaris views
This allows correlations between different projections of the same data set, or relationships between distinct data sets, to be seen
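The brushing rule can be sketched in Python: given the records inside the rubber band and the chosen field, highlight every record that shares one of the selected values of that field. The function name and the airport/month records are hypothetical; in Polaris the highlighted indices would drive mark highlighting across panes and linked views.

```python
def brush(records, selected, field):
    """Indices of every record that shares a `field` value with a selected record."""
    brushed_values = {records[i][field] for i in selected}
    return [i for i, record in enumerate(records) if record[field] in brushed_values]

records = [
    {"airport": "SFO", "month": "Jan"},
    {"airport": "JFK", "month": "Jan"},
    {"airport": "SFO", "month": "Feb"},
    {"airport": "ORD", "month": "Mar"},
]
# The rubber band selected record 0; the highlighted set depends on the chosen field.
print(brush(records, [0], "airport"))  # [0, 2]
print(brush(records, [0], "month"))    # [0, 1]
```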
Tooltips
If the user hovers over a data point or pane, additional details, such as specific field values for the tuple corresponding to the selected mark, are shown
Analysts can use tooltips to understand the relationship between the graphical marks and the underlying data
Undo and Redo
Unlimited undo and redo within an analysis session
Users can use the "Back" and "Forward" buttons on the top toolbar to either return to a previous visual specification or to move forward again.
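Unlimited Back/Forward navigation can be sketched in Python with two stacks of visual specifications. The class name and the dictionary form of a specification are hypothetical, and clearing the forward stack when a new specification is applied is an assumed convention, not something the slides state.

```python
class History:
    """Back/Forward navigation over visual specifications, using two stacks."""

    def __init__(self, initial_spec):
        self.current = initial_spec
        self._back = []     # specifications we can return to
        self._forward = []  # specifications undone and available again

    def apply(self, spec):
        """Record a new specification; assumed to discard the forward stack."""
        self._back.append(self.current)
        self._forward.clear()
        self.current = spec

    def back(self):
        if self._back:
            self._forward.append(self.current)
            self.current = self._back.pop()
        return self.current

    def forward(self):
        if self._forward:
            self._back.append(self.current)
            self.current = self._forward.pop()
        return self.current

history = History({"x": "quarter", "y": "profit"})
history.apply({"x": "quarter/month", "y": "profit"})
print(history.back())     # {'x': 'quarter', 'y': 'profit'}
print(history.forward())  # {'x': 'quarter/month', 'y': 'profit'}
```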
GENERATING DATABASE QUERIES
Results
Throughout an analysis, the data users want to see, and how they want to see it, change continually
Analysts form hypotheses and create new views to perform tests and experiments
Certain displays enable an understanding of overall trends, whereas others show causal relationships
As the analysts better understand the data, they may want to drill-down in the visible dimensions or display entirely different dimensions
Polaris supports this exploratory process through its visual interface
By formally categorizing the types of graphics, Polaris is able to provide a simple interface for rapidly generating a wide range of displays
This allows analysts to focus on the analysis task rather than on the steps needed to retrieve and display the data
Discussions
Comparison with similar work is omitted in this presentation
Interpretation of visual specifications as database queries
Interactivity and performance of Polaris
Interpretation of Visual Specifications as Database Queries
Polaris generates an SQL query for each table pane
This is similar to using the CUBE operator to generate the queries that create cross-tab and Pivot Table displays
However, the CUBE operator is not applicable to Polaris because it assumes that the sets of relations partitioned into each table pane do not overlap
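A per-pane query can be sketched in Python as a parameterized SQL string: the ordinal values that identify the pane go into the WHERE clause, the fields that distinguish individual marks go into GROUP BY, and the quantitative measures are aggregated. The function name, the `sales` table, and its columns are hypothetical, and this is not the exact query form the paper describes. Table and field names are interpolated directly, so they are assumed to come from a trusted schema; pane values are passed as parameters.

```python
def pane_query(table, pane_values, group_fields, measures, agg="SUM"):
    """Build one parameterized SQL query for a single table pane.
    pane_values: ordinal field -> value identifying the pane (WHERE clause).
    group_fields: fields whose distinct values become individual marks (GROUP BY).
    measures: quantitative fields to aggregate."""
    select = list(group_fields) + [f"{agg}({m}) AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if pane_values:
        sql += " WHERE " + " AND ".join(f"{field} = ?" for field in pane_values)
    if group_fields:
        sql += " GROUP BY " + ", ".join(group_fields)
    return sql, list(pane_values.values())

sql, params = pane_query("sales", {"quarter": "Q1", "region": "West"}, ["product"], ["profit"])
print(sql)
# SELECT product, SUM(profit) AS profit FROM sales WHERE quarter = ? AND region = ? GROUP BY product
print(params)  # ['Q1', 'West']
```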
Interactivity and Performance of Polaris
The first implementation of Polaris focuses on the techniques, semantics, and formalism rather than on interactivity
Experience showed that query response times do not need to be real-time to maintain a feeling of exploration, even with response times of several tens of seconds
Interactivity and Performance of Polaris
Test data:
A subset of a packet trace of a mobile network over a 13-week period (approx. 6 million tuples)
A subset of the data collected by the Sloan Digital Sky Survey (approx. 650 MB)
Both were stored on MS SQL Server 2000
The paper does not provide numeric performance data, only the personal experiences of the testers
Conclusion
Polaris extends the well-known Pivot Table interface to display relational query results using a rich, expressive set of graphical displays
Succinct visual specification for describing table-based graphical displays of relational data
Interpretation of visual specifications as a precise sequence of relational database operations
Future Work
Performance evaluation
Hierarchical data cubes
Correspondence of marks to data tuples (dynamic mark generation)
Animation shelf to display sequencing data
Thank You