SAP HANA SPS 11 - What’s New? HANA Spatial
Raj Rathee, SAP HANA Product Management
December 2015 (Delta from SPS 10 to SPS 11)
© 2014 SAP AG or an SAP affiliate company. All rights reserved.

What's new for Spatial in SAP HANA SPS 11


© 2015 SAP SE or an SAP affiliate company. All rights reserved. Public

Disclaimer

This presentation outlines our general product direction and should not be relied on in making a purchase decision. This presentation is not subject to your license agreement or any other agreement with SAP.

SAP has no obligation to pursue any course of business outlined in this presentation or to develop or release any functionality mentioned in this presentation. This presentation and SAP’s strategy and possible future developments are subject to change and may be changed by SAP at any time for any reason without notice.

This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent.


Agenda

• Spatial Clustering
• SQLScript Support for Spatial Types
• Miscellaneous

Spatial Clustering


Spatial Clustering: Use Cases

Retail Store Location
• Retailer wants to establish two chains of stores: one luxury brand, one discount
• Has location and income data for 120M US households
• Needs to locate areas of high-income and low-income households in order to place the two chains strategically

Analysis of Vending Outlets
• Retailer has vending machines located across the US
• Has the location of, and revenue generated by, each machine
• Needs to be able to
  • Analyze lucrative and non-lucrative locations
  • Easily visualize these locations
  • Do ad-hoc analysis, e.g. zoom in and get specific machine details

• Spatial clustering groups a set of points that meet certain criteria into clusters
• A cluster is a subset of the original set of points
• Spatial clustering essentially allows geographical information to be used to group data


Spatial Clustering (and GROUP BY/PARTITION BY)
• GROUP BY/PARTITION BY can be used to group data on certain attributes (e.g. city, state, country), but provides only basic information
• With spatial clustering, geographical information can be used together with other attributes (e.g. income, revenue) to cluster data

GROUP BY: Groups the data set using information contained in the column(s)
CLUSTER BY: Splits the data set into clusters. Clusters are determined from geospatial point data using an algorithm


Spatial Clustering (and GROUP BY/PARTITION BY)

    CREATE COLUMN TABLE HOUSEHOLDS (
        hhid     INT PRIMARY KEY,
        state    CHAR (2),
        location ST_POINT,
        income   DECIMAL (11, 2)
    );

Grouping on a predefined area (by state):

    SELECT state,
           count(*)    AS number_of_households,
           avg(income) AS household_income
    FROM households
    WHERE income > 120000
    GROUP BY state
    HAVING count(*) >= 300;

Using spatial clustering (a cluster algorithm creates the groups; the USING clause specifies the clustering algorithm to use):

    SELECT ST_ClusterId()       AS cluster_id,
           ST_ClusterCentroid() AS centroid,
           count(*)             AS num_hholds,
           avg(income)          AS avg_clus_income
    FROM households
    WHERE income > 120000
    GROUP CLUSTER BY location USING DBSCAN EPS 4 MINPTS 1000
    HAVING count(*) >= 300;

Spatial Clustering (Example)
[Two example slides; figures not reproduced in this text.]

Spatial Clustering: Clustering Algorithms
• Spatial clustering relies on clustering algorithms
  • Parameters to the algorithm are used to influence the grouping criteria
• Algorithms supported
  • Grid
  • K-Means
  • DBSCAN
• Typical operations done on a cluster
  • Determining which cluster a data point belongs to (i.e. its cluster ID)
  • Finding the centroid (“central point”) and envelope (“bounding area”) of a cluster
  • Aggregations on data points within a cluster
    o No. of points
    o Average (e.g. average household income)
    o Spatial aggregation functions (ST_IntersectionAggr, ST_EnvelopeAggr etc.)

Spatial Clustering: Clustering Algorithms

Grid
• Good for a first impression
• Easy to use
• Extremely fast grid clustering

DBSCAN
• Best for non-spherical clusters
• Density based
• Higher complexity, better insights

K-Means
• Best for spherical clusters
• Centroid based
• Higher complexity, better insights

Spatial Clustering: Grid Clustering Algorithm
• Quick and easy way to cluster; use other algorithms for deeper analysis
• Clustering is defined w.r.t. a grid; every cell is a unique cluster (cluster ID)
• Parameters
  • X CELLS <int> Y CELLS <int> (mandatory)
    • The number of cells is X times Y. The number of clusters can be less than X times Y, because cells without points do not count as clusters.
  • BETWEEN <left-point> AND <right-point> (optional, X values)
    • Determines the X values of the overall grid rectangle, regardless of the points to be investigated
  • BETWEEN <lower-point> AND <upper-point> (optional, Y values)
    • Determines the Y values of the overall grid rectangle, regardless of the points to be investigated
• Methods supported for the algorithm
  • ST_ClusterID(), ST_ClusterEnvelope()
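As a rough sketch of the optional extent parameters, the query below pins the grid to a fixed rectangle so that cell boundaries do not shift with the data. The table and column names are borrowed from the vending-machine examples elsewhere in this deck. The coordinate bounds are made-up values, and the placement of BETWEEN before CELLS is an assumption about the clause order that should be checked against the SAP HANA Spatial Reference.

```sql
-- Grid clustering over a fixed rectangle (bounds are illustrative assumptions).
-- Assumed clause order: <axis> BETWEEN <min> AND <max> CELLS <n>.
SELECT ST_ClusterID()       AS cluster_id,
       ST_ClusterEnvelope() AS cell,
       COUNT(*)             AS machines
FROM CLUSTER.VENDING_MACHINES
GROUP CLUSTER BY location
USING GRID X BETWEEN 0 AND 100 CELLS 10
           Y BETWEEN 0 AND 50  CELLS 5;
```

With 10 x 5 cells the result has at most 50 rows, since cells without points produce no cluster.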


Spatial Clustering: K-Means Clustering Algorithm
• Higher-complexity algorithm for better insights; suited for spherical clusters
• Clustering is defined w.r.t. a central point, i.e. centroid based
• Not deterministic: different runs can produce different results
• Parameters
  • CLUSTERS <int> (mandatory)
    • Number of clusters to be created
  • MAXITERATIONS <int> (optional, default 128)
    • Maximum number of iterations to be performed
  • THRESHOLD <number> (optional, default 0)
    • Threshold for the change of the sum of the squared distances between two iterations
  • INIT RANDOM PARTITION or INIT FORGY (optional, default FORGY)
    • Initialization method; FORGY is the recommended method
• Methods supported for the algorithm
  • ST_ClusterID(), ST_ClusterCentroid()


Spatial Clustering: DBSCAN Clustering Algorithm
• Higher-complexity algorithm for better insights; suited for non-spherical clusters
• Density based; suited to handle “noise” (outliers)
• Parameters
  • EPS <int> (mandatory)
    • Neighborhood radius
  • MINPTS <int> (mandatory)
    • The number of data points that must be contained in the neighborhood of a point in order to make it a core point
• Methods supported for the algorithm
  • ST_ClusterID()
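A minimal sketch of DBSCAN in the window-function form, so that each row keeps its own attributes alongside the cluster it falls in. The vending_machines table and its columns are taken from the examples elsewhere in this deck. The EPS and MINPTS values are arbitrary placeholders, and the slides do not say how noise points are labelled, so check the reference documentation before filtering on a specific cluster ID.

```sql
-- Assign a DBSCAN cluster ID to every vending machine.
-- EPS 5 / MINPTS 3 are placeholder values, not recommendations;
-- EPS is interpreted in the units of the column's spatial reference system.
SELECT ST_ClusterID() OVER (CLUSTER BY location USING DBSCAN EPS 5 MINPTS 3) AS cluster_id,
       vm_id,
       location,
       revenue
FROM vending_machines
ORDER BY cluster_id, revenue;
```

Raising MINPTS or lowering EPS makes the density requirement stricter, which tends to leave more points outside any cluster.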


Spatial Clustering: New Methods
• ST_ClusterID()
  • returns the cluster ID for a given point; works for all algorithms (Grid, K-Means, DBSCAN)
• ST_ClusterCentroid()
  • returns the centroid of a cluster; works for the K-Means algorithm
• ST_ClusterEnvelope()
  • returns the geometry of a cell; works for the Grid algorithm
• Examples:

    SELECT ST_ClusterCentroid() AS centroidcluster,
           COUNT(*) AS counter
    FROM CLUSTER.VENDING_MACHINES
    GROUP CLUSTER BY location
    USING KMEANS CLUSTERS 5 MAXITERATIONS 10 THRESHOLD 0.01 INIT RANDOM PARTITION;

    SELECT ST_ClusterID() AS cluster_id,
           COUNT(*) AS counter
    FROM CLUSTER.VENDING_MACHINES
    GROUP CLUSTER BY location
    USING GRID X CELLS 10 Y CELLS 10;

    SELECT ST_ClusterID() AS cluster_id,
           ST_ClusterEnvelope() AS clusterenvelope
    FROM CLUSTER.VENDING_MACHINES
    GROUP CLUSTER BY LOCATION
    USING GRID X CELLS 10 Y CELLS 10;


Spatial Clustering: Examples

Identify the corresponding cluster for each point (similar to window functions):

    SELECT ST_ClusterId() OVER (CLUSTER BY location USING KMEANS CLUSTERS 5) AS cluster_id,
           vm_id, location, revenue
    FROM vending_machines
    WHERE revenue < 15000
    ORDER BY cluster_id, revenue

    cluster_id  vm_id  location        revenue
    1           1      Point(1 1)      1311
    1           2      Point(1.4 1.2)  1166
    1           3      Point(1.2 1.3)  799
    2           4      Point(5.3 5.0)  2125
    2           5      Point(5.7 6)    1750
    3           6      Point(20 20)    1532

Hypothetical example: clustering with metadata accessors (similar to GROUP BY aggregates):

    SELECT ST_ClusterId()       AS cluster_id,
           ST_ClusterCentroid() AS centroid,
           ST_ClusterEnvelope() AS envelope,
           count(*)             AS num_hholds,
           avg(income)          AS avg_clus_income
    FROM households
    WHERE income > 120000
    GROUP CLUSTER BY location USING […Clustering Algorithm and Params]
    HAVING count(*) >= 300

    cluster_id  centroid      envelope       num_hholds  avg_clus_income
    1           Point(5 4)    Polygon((…))   311         304123
    2           Point(15 78)  Polygon((..))  621         714234

Note: There are limitations on which functions work with which clustering algorithm, so the examples above are for illustrative purposes only.


Spatial Clustering: New Aggregation Methods

ST_ConvexHullAggr()
– returns the convex hull of all the geometries in a group
– the convex hull of a geometry is the smallest convex geometry that contains all of the points in the geometry

ST_EnvelopeAggr()
– returns the bounding rectangle of all the geometries in a group

ST_IntersectionAggr()
– returns a geometry that is the spatial intersection of all the geometries in a group

Note: All aggregation functions can be combined with spatial clustering.
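To illustrate that note, here is a small sketch that combines a spatial aggregate with a clustering query. It reuses the vending_machines table from the deck's other examples; CLUSTERS 5 is an arbitrary choice, and passing the geometry column as the aggregate's argument is an assumption, since the slide lists the methods without arguments.

```sql
-- One row per K-Means cluster: its centroid plus the convex hull
-- of the member points. CLUSTERS 5 is an arbitrary illustrative value.
SELECT ST_ClusterID()              AS cluster_id,
       ST_ClusterCentroid()        AS centroid,
       ST_ConvexHullAggr(location) AS hull,
       COUNT(*)                    AS machines
FROM vending_machines
GROUP CLUSTER BY location
USING KMEANS CLUSTERS 5;
```

The convex hull gives a tighter outline of each cluster than a bounding rectangle, which can be useful when drawing clusters on a map.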

SQLScript Support for Spatial Types


Spatial: SQLScript Support

SQLScript now supports the spatial data type ST_GEOMETRY and SQL spatial functions to access and manipulate spatial data.

Example:

    CREATE FUNCTION get_distance (IN first ST_GEOMETRY, IN second ST_GEOMETRY)
    RETURNS distance DOUBLE
    AS
    BEGIN
        distance = :first.st_distance(:second);
    END;

    CREATE PROCEDURE nested_call (
        IN first ST_GEOMETRY,
        IN second ST_GEOMETRY,
        OUT distance DOUBLE,
        OUT res3 CLOB
    )
    AS
    BEGIN
        distance = get_distance (:first, :second);
        res3 = :first.st_astext();
    END;

The procedure call:

    CALL nested_call(
        first    => st_geomfromtext('Point(7 48)'),
        second   => st_geomfromtext('Point(2 55)'),
        distance => ?,
        res3     => ?
    );

Returns:

    Out(1)             Out(2)
    -------------------------------
    8.602325267042627  POINT(7 48)
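Along the same lines, a scalar function can wrap a spatial method in ordinary SQLScript control flow. The sketch below is an illustration only: is_near, its parameters, and the 0/1 return convention are hypothetical names chosen here, and it reuses the :var.st_distance(...) call form from the example above.

```sql
-- Hypothetical helper: returns 1 when two geometries lie within max_dist
-- of each other, else 0. Distance units follow the spatial reference system.
CREATE FUNCTION is_near (IN a ST_GEOMETRY, IN b ST_GEOMETRY, IN max_dist DOUBLE)
RETURNS near INTEGER
AS
BEGIN
    DECLARE d DOUBLE;
    d = :a.st_distance(:b);
    IF :d <= :max_dist THEN
        near = 1;
    ELSE
        near = 0;
    END IF;
END;
```

For the two points used above, Point(7 48) and Point(2 55), the distance is about 8.6, so a max_dist of 10 would yield 1 and a max_dist of 5 would yield 0.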

Miscellaneous


Spatial: Misc. Support

Tables with spatial data can now be partitioned
– Great for scale-out processing of spatial data

OGC Compliance Certified (in progress)
– Exchange data with 3rd-party applications across multiple sources

Delivery unit available for all spatial reference IDs (in progress)

SAP HANA EIM / SAP HANA Smart Data Quality provides native geocoding and reverse geocoding capabilities
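As a sketch of the partitioning point, a table with a spatial column can be declared with a regular partitioning clause. The column list is copied from the HOUSEHOLDS table earlier in the deck; the table name households_part, the hash key, and the partition count are assumptions made for illustration.

```sql
-- Hash-partition a table that holds a spatial column.
-- Partition key and count are arbitrary illustrative choices;
-- the key is the primary key, a non-spatial column.
CREATE COLUMN TABLE households_part (
    hhid     INT PRIMARY KEY,
    state    CHAR(2),
    location ST_POINT,
    income   DECIMAL(11, 2)
)
PARTITION BY HASH (hhid) PARTITIONS 4;
```

In a scale-out landscape the partitions can be distributed across hosts, which is where the slide's scale-out benefit would come from.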

Summary


Summary

Spatial Clustering
– Support for three clustering algorithms (Grid, K-Means, DBSCAN) and associated methods
– New aggregation methods

SQLScript Support for Spatial Types

Miscellaneous
– Partitioning of spatial data
– OGC certification


Thank you

Contact information

SAP HANA Product [email protected]