© 2014 SAP AG or an SAP affiliate company. All rights reserved.
SAP HANA SPS 11 - What’s New? HANA Spatial
(Delta from SPS 10 to SPS 11)
Raj Rathee, SAP HANA Product Management
December 2015
© 2015 SAP SE or an SAP affiliate company. All rights reserved. Public
Disclaimer
This presentation outlines our general product direction and should not be relied on in making a purchase decision. This presentation is not subject to your license agreement or any other agreement with SAP.
SAP has no obligation to pursue any course of business outlined in this presentation or to develop or release any functionality mentioned in this presentation. This presentation and SAP’s strategy and possible future developments are subject to change and may be changed by SAP at any time for any reason without notice.
This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or through gross negligence.
Agenda
• Spatial Clustering
• SQLScript Support for Spatial Types
• Miscellaneous
Spatial Clustering: Use Cases
Retail Store Location
• A retailer wants to establish two chains of stores: one luxury brand and one discount brand.
• It has location and income data for 120M US households.
• It needs to locate areas of high-income and low-income households in order to position the two chains strategically.
Analysis of Vending Outlets
• A retailer has vending machines located across the US.
• It has the location of each machine and the revenue that machine generates.
• It needs to:
  • Analyze lucrative and non-lucrative locations
  • Easily visualize these locations
  • Do ad-hoc analysis, e.g. zoom in and get specific machine details
• Spatial clustering groups a set of points that meet certain criteria into clusters.
• Together, the clusters form a partition of the original set of points.
• Spatial clustering essentially allows geographical information to be used to group data.
Spatial Clustering (and GROUP BY/PARTITION BY)
• GROUP BY/PARTITION BY can be used to group data on certain attributes (e.g. city, state, country), but this provides only basic information.
• With spatial clustering, geographical information can be combined with other attributes (e.g. income, revenue) to cluster data.
GROUP BY: groups the data set using the information contained in the column(s).
CLUSTER BY: splits the data set into clusters; the clusters are determined from geospatial point data by an algorithm.
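To make the contrast concrete, here is a minimal Python sketch (plain Python, not HANA SQL). A GROUP BY key is a value already stored in the row, whereas a clustering key is computed from the point coordinates. The toy rows, the function names, and the coarse grid-cell key used as a stand-in for a real clustering algorithm are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy rows: (household id, state, (x, y) location, income)
households = [
    (1, "CA", (1.0, 1.0), 150000.0),
    (2, "CA", (9.0, 9.0), 130000.0),
    (3, "NV", (1.5, 1.2), 140000.0),
    (4, "NV", (9.2, 8.8), 125000.0),
]

def group_by_attribute(rows) -> Dict[str, List[int]]:
    """GROUP BY state: the group key is a value stored in the row."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for hhid, state, _location, _income in rows:
        groups[state].append(hhid)
    return dict(groups)

def group_by_location(rows, cell_size: float = 5.0) -> Dict[Tuple[int, int], List[int]]:
    """Stand-in for CLUSTER BY: the key is derived from the coordinates
    (here a coarse grid cell, the simplest possible spatial grouping)."""
    groups: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for hhid, _state, (x, y), _income in rows:
        groups[(int(x // cell_size), int(y // cell_size))].append(hhid)
    return dict(groups)

print(group_by_attribute(households))  # {'CA': [1, 2], 'NV': [3, 4]}
print(group_by_location(households))   # {(0, 0): [1, 3], (1, 1): [2, 4]}
```

Households 1 and 3 end up in the same location group even though they belong to different states, which is exactly the kind of grouping a state column cannot express.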
Spatial Clustering (and GROUP BY/PARTITION BY): Example
CREATE COLUMN TABLE HOUSEHOLDS (hhid INT PRIMARY KEY, state CHAR(2), location ST_POINT, income DECIMAL(11, 2));

Grouping on a predefined area (by state):
SELECT state, count(*) AS number_of_households, avg(income) AS household_income
FROM households
WHERE income > 120000
GROUP BY state
HAVING count(*) >= 300;

Using spatial clustering (a clustering algorithm creates the groups; the USING clause specifies which algorithm to use):
SELECT
  ST_ClusterId() AS cluster_id,
  ST_ClusterCentroid() AS centroid,
  count(*) AS num_hholds,
  avg(income) AS avg_clus_income
FROM households
WHERE income > 120000
GROUP CLUSTER BY location USING DBSCAN EPS 4 MINPTS 1000
HAVING count(*) >= 300;
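Both queries on this slide are evaluated in the same order. WHERE filters rows, the grouping step (GROUP BY state or GROUP CLUSTER BY location) assigns each remaining row to a group, aggregates are computed per group, and HAVING discards small groups. The Python sketch below models only that order. `grouped_query`, its `key_fn` hook (which returns either the state or a cluster ID), and the sample rows are hypothetical and not part of HANA.

```python
from typing import Callable, Dict, Hashable, List, Tuple

def grouped_query(rows: List[dict], key_fn: Callable[[dict], Hashable],
                  min_income: float = 120000, min_count: int = 300) -> List[Tuple]:
    """Mimics: SELECT key, count(*), avg(income) FROM rows
    WHERE income > min_income GROUP BY key HAVING count(*) >= min_count."""
    groups: Dict[Hashable, List[dict]] = {}
    for row in rows:
        if row["income"] > min_income:                      # WHERE
            groups.setdefault(key_fn(row), []).append(row)  # GROUP BY / CLUSTER BY
    result = []
    for key, members in groups.items():
        if len(members) >= min_count:                       # HAVING
            avg_income = sum(r["income"] for r in members) / len(members)
            result.append((key, len(members), avg_income))
    return result

# Hypothetical data: 300 + 10 high-income rows, plus 5 rows removed by WHERE
sample = ([{"state": "A", "income": 130000.0}] * 300
          + [{"state": "B", "income": 130000.0}] * 10
          + [{"state": "A", "income": 50000.0}] * 5)
print(grouped_query(sample, lambda r: r["state"]))  # [('A', 300, 130000.0)]
```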
Spatial Clustering: Clustering Algorithms
• Spatial clustering relies on clustering algorithms; the parameters passed to the algorithm influence the grouping criteria.
• Supported algorithms:
  • Grid
  • K-Means
  • DBSCAN
• Typical operations on a cluster:
  • Determining which cluster a data point belongs to (its cluster ID)
  • Finding the centroid (“central point”) and envelope (“bounding area”) of a cluster
  • Aggregating the data points within a cluster:
    o Number of points
    o Average (e.g. average household income)
    o Spatial aggregation functions (ST_IntersectionAggr, ST_EnvelopeAggr, etc.)
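Once every point has a cluster ID, the per-cluster operations listed above reduce to simple aggregations. The following is a minimal Python sketch. It assumes the centroid is the arithmetic mean of the member points and the envelope is the axis-aligned bounding box (min_x, min_y, max_x, max_y). HANA's own ST_ClusterCentroid() and ST_ClusterEnvelope() may define these differently, and `summarize_clusters` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def summarize_clusters(points: List[Point], labels: List[int]) -> Dict[int, dict]:
    """Per-cluster count, centroid (mean of the member points) and envelope
    (axis-aligned bounding box as (min_x, min_y, max_x, max_y))."""
    members: Dict[int, List[Point]] = {}
    for point, label in zip(points, labels):
        members.setdefault(label, []).append(point)
    summary = {}
    for label, pts in members.items():
        xs = [x for x, _ in pts]
        ys = [y for _, y in pts]
        summary[label] = {
            "count": len(pts),
            "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
            "envelope": (min(xs), min(ys), max(xs), max(ys)),
        }
    return summary

pts = [(0, 0), (2, 0), (2, 2), (0, 2), (10, 10)]
print(summarize_clusters(pts, [1, 1, 1, 1, 2])[1])
# {'count': 4, 'centroid': (1.0, 1.0), 'envelope': (0, 0, 2, 2)}
```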
Spatial Clustering: Clustering Algorithms
[Figure: three x/y scatter plots illustrating Grid, DBSCAN, and K-Means clustering]
• Grid: good for a first impression; easy to use; extremely fast
• DBSCAN: best for non-spherical clusters; density based; higher complexity, better insights
• K-Means: best for spherical clusters; centroid based; higher complexity, better insights
Spatial Clustering: Grid Clustering Algorithm
• A quick and easy way to cluster; use the other algorithms for deeper analysis
• Clustering is defined with respect to a grid; every cell is a unique cluster (cluster ID)
• Parameters:
  • X CELLS <int> Y CELLS <int> (mandatory)
    • The number of cells is X times Y. The number of clusters can be smaller than X times Y, because cells without points do not count as clusters.
  • BETWEEN <left-point> AND <right-point> (optional, X values)
    • Determines the X values of the overall grid rectangle, regardless of the points being investigated
  • BETWEEN <lower-point> AND <upper-point> (optional, Y values)
    • Determines the Y values of the overall grid rectangle, regardless of the points being investigated
• Methods supported for the algorithm: ST_ClusterID(), ST_ClusterEnvelope()
[Figure: grid clustering of points in the x/y plane]
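The cell assignment described above can be sketched in Python as follows. Several details are assumptions made for illustration: the row-major cell numbering starting at 1, the clamping of points that fall on (or outside) the grid border into the edge cells, and the names `grid_cluster` and `cell_envelope`. The sketch only shows why empty cells produce no cluster and how the optional BETWEEN bounds fix the grid rectangle independently of the data.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Range = Tuple[float, float]

def _cell_index(v: float, lo: float, hi: float, n: int) -> int:
    """Index (0..n-1) of the cell containing v along one axis."""
    if hi == lo:
        return 0
    i = math.floor((v - lo) / (hi - lo) * n)
    return min(max(i, 0), n - 1)  # assumption: clamp border/outside values to edge cells

def grid_cluster(points: List[Point], x_cells: int, y_cells: int,
                 x_range: Optional[Range] = None,
                 y_range: Optional[Range] = None) -> List[int]:
    """One cell-based cluster id per point (row-major, starting at 1).
    Without explicit ranges (the BETWEEN clauses), the points' bounding box is used."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_lo, x_hi = x_range if x_range else (min(xs), max(xs))
    y_lo, y_hi = y_range if y_range else (min(ys), max(ys))
    return [_cell_index(y, y_lo, y_hi, y_cells) * x_cells
            + _cell_index(x, x_lo, x_hi, x_cells) + 1
            for x, y in points]

def cell_envelope(cluster_id: int, x_cells: int, y_cells: int,
                  x_range: Range, y_range: Range) -> Tuple[float, float, float, float]:
    """Bounding rectangle (min_x, min_y, max_x, max_y) of the cell behind a cluster id."""
    ix, iy = (cluster_id - 1) % x_cells, (cluster_id - 1) // x_cells
    w = (x_range[1] - x_range[0]) / x_cells
    h = (y_range[1] - y_range[0]) / y_cells
    x0, y0 = x_range[0] + ix * w, y_range[0] + iy * h
    return (x0, y0, x0 + w, y0 + h)

pts = [(0, 0), (1, 1), (9, 9), (10, 10)]
ids = grid_cluster(pts, 2, 2, x_range=(0, 10), y_range=(0, 10))
print(ids)            # [1, 1, 4, 4]
print(len(set(ids)))  # 2 clusters, although the grid has 4 cells
```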
Spatial Clustering: K-Means Clustering Algorithm
• A higher-complexity algorithm for better insights; suited for spherical clusters
• Clustering is defined with respect to a central point, i.e. it is centroid based
• Not deterministic: different runs can produce different results
• Parameters:
  • CLUSTERS <int> (mandatory)
    • Number of clusters to be created
  • MAXITERATIONS <int> (optional, default 128)
    • Maximum number of iterations to be performed
  • THRESHOLD <double> (optional, default 0)
    • Threshold for the change in the sum of squared distances between two iterations
  • INIT RANDOM PARTITION or INIT FORGY (optional, default FORGY)
    • Initialization method; FORGY is the recommended method
• Methods supported for the algorithm: ST_ClusterID(), ST_ClusterCentroid()
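As a rough model of what these parameters control, here is a minimal Python sketch of Lloyd's k-means with the two initialization methods named on the slide. It is not SAP's implementation. The function name `kmeans`, the `seed` argument, and the exact stopping rule (stop once the sum of squared distances changes by no more than `threshold`) are assumptions modeled on the parameter descriptions. The random initial choices also show where the non-determinism comes from.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]

def _sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def _mean(pts: List[Point]) -> Point:
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def kmeans(points: List[Point], clusters: int, max_iterations: int = 128,
           threshold: float = 0.0, init: str = "forgy",
           seed: Optional[int] = None) -> Tuple[List[int], List[Point]]:
    rng = random.Random(seed)
    if init == "forgy":
        # FORGY: pick `clusters` distinct input points as the initial centroids.
        centroids = rng.sample(points, clusters)
    else:
        # RANDOM PARTITION: put each point in a random cluster, start from the means.
        start = [rng.randrange(clusters) for _ in points]
        centroids = []
        for c in range(clusters):
            members = [p for p, lab in zip(points, start) if lab == c]
            centroids.append(_mean(members) if members else rng.choice(points))

    labels: List[int] = []
    prev_sse: Optional[float] = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(clusters), key=lambda c: _sq_dist(p, centroids[c]))
                  for p in points]
        sse = sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        # Update step: move each centroid to the mean of its members.
        for c in range(clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = _mean(members)
        # Stop once the sum of squared distances changes by at most `threshold`.
        if prev_sse is not None and abs(prev_sse - sse) <= threshold:
            break
        prev_sse = sse
    return labels, centroids

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(pts, clusters=2, seed=7)
```

With a different seed the same two groups may come back with their cluster IDs swapped. On less clearly separated data, a different partition altogether is possible.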
Spatial Clustering: DBSCAN Clustering Algorithm
[Figure: DBSCAN clustering of points in the x/y plane]
• A higher-complexity algorithm for better insights; suited for non-spherical clusters
• Density based; suited to handling “noise” (outliers)
• Parameters:
  • EPS <int> (mandatory)
    • Neighborhood radius
  • MINPTS <int> (mandatory)
    • The number of data points that must be contained in the neighborhood of a point to make it a core point
• Methods supported for the algorithm: ST_ClusterID()
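The roles of EPS and MINPTS can be illustrated with a compact version of the classic DBSCAN algorithm in Python. This is a quadratic-time sketch for small inputs, not SAP's implementation. It assumes that EPS is a Euclidean radius, that a point counts as part of its own neighborhood, that noise points get the label 0, and that clusters are numbered from 1.

```python
from collections import deque
from typing import List, Optional, Tuple

Point = Tuple[float, float]
NOISE = 0  # hypothetical label for outliers; clusters are numbered from 1

def dbscan(points: List[Point], eps: float, minpts: int) -> List[int]:
    def neighbors(i: int) -> List[int]:
        # All points within distance eps of points[i], including i itself.
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps * eps]

    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < minpts:       # not a core point (may later become a border point)
            labels[i] = NOISE
            continue
        cluster += 1                  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:    # border point reached from a core point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= minpts:   # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(pts, eps=1.5, minpts=3))  # [1, 1, 1, 1, 2, 2, 2, 2, 0]
```

The isolated point (50, 50) has fewer than MINPTS neighbors and is not reachable from any core point, so it stays noise. The clusters themselves can take any shape that a chain of dense neighborhoods can trace.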
Spatial Clustering: New Methods
• ST_ClusterID(): returns the cluster ID for a given point; works for all algorithms (Grid, K-Means, DBSCAN)
• ST_ClusterCentroid(): returns the centroid of a cluster; works for the K-Means algorithm
• ST_ClusterEnvelope(): returns the geometry of a cell; works for the Grid algorithm
• Examples:
SELECT ST_ClusterCentroid() AS centroidcluster, COUNT(*) AS counter
FROM CLUSTER.VENDING_MACHINES
GROUP CLUSTER BY location
USING KMEANS CLUSTERS 5 MAXITERATIONS 10 THRESHOLD 0.01 INIT RANDOM PARTITION;

SELECT ST_ClusterID() AS cluster_id, COUNT(*) AS counter
FROM CLUSTER.VENDING_MACHINES
GROUP CLUSTER BY location
USING GRID X CELLS 10 Y CELLS 10;

SELECT ST_ClusterID() AS cluster_id, ST_ClusterEnvelope() AS clusterenvelope
FROM CLUSTER.VENDING_MACHINES
GROUP CLUSTER BY location
USING GRID X CELLS 10 Y CELLS 10;
Spatial Clustering: Examples
Identify the corresponding cluster for each point (similar to window functions):
SELECT ST_ClusterId() OVER (CLUSTER BY location USING KMEANS CLUSTERS 5) AS cluster_id, vm_id, location, revenue
FROM vending_machines
WHERE revenue < 15000
ORDER BY cluster_id, revenue;

Cluster_id | Vm_id | Location       | Revenue
1          | 1     | Point(1 1)     | 1,311
1          | 2     | Point(1.4 1.2) | 1,166
1          | 3     | Point(1.2 1.3) | 799
2          | 4     | Point(5.3 5.0) | 2,125
2          | 5     | Point(5.7 6)   | 1,750
3          | 6     | Point(20 20)   | 1,532
…

Hypothetical example: clustering with metadata accessors (similar to GROUP BY aggregates):
SELECT ST_ClusterId() AS cluster_id, ST_ClusterCentroid() AS centroid, ST_ClusterEnvelope() AS envelope, count(*) AS num_hholds, avg(income) AS avg_clus_income
FROM households
WHERE income > 120000
GROUP CLUSTER BY location USING […Clustering Algorithm and Params]
HAVING count(*) >= 300;

Cluster_id | Centroid     | Envelope      | Num_hholds | Avg_clus_income
1          | Point(5 4)   | Polygon((…))  | 311        | 304,123
2          | Point(15 78) | Polygon((..)) | 621        | 714,234
…

Note: There are limitations on which functions work with which clustering algorithm, so the example above is for illustrative purposes only.
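The two query styles differ in the shape of the result. The window form keeps one row per input point and tags it with its cluster, while the GROUP CLUSTER BY form returns one row per cluster with aggregates. The following is a small Python sketch using the illustrative vending-machine rows from this slide, with revenue read as whole numbers (e.g. 1.311 as 1,311). The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (cluster_id, vm_id, revenue), taken from the illustrative result table
rows: List[Tuple[int, int, int]] = [
    (1, 1, 1311), (1, 2, 1166), (1, 3, 799),
    (2, 4, 2125), (2, 5, 1750), (3, 6, 1532),
]

def per_point(rows: List[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """Window style: one output row per input row, tagged with its cluster id."""
    return [(cluster_id, vm_id) for cluster_id, vm_id, _ in rows]

def per_cluster(rows: List[Tuple[int, int, int]]) -> Dict[int, Tuple[int, float]]:
    """Group style: one output row per cluster with count(*) and avg(revenue)."""
    revenues: Dict[int, List[int]] = defaultdict(list)
    for cluster_id, _, revenue in rows:
        revenues[cluster_id].append(revenue)
    return {cid: (len(r), sum(r) / len(r)) for cid, r in sorted(revenues.items())}

print(len(per_point(rows)))  # 6
print(per_cluster(rows))     # {1: (3, 1092.0), 2: (2, 1937.5), 3: (1, 1532.0)}
```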
Spatial Clustering: New Aggregation Methods
• ST_ConvexHullAggr()
  – Returns the convex hull of all the geometries in a group
  – The convex hull of a geometry is the smallest convex geometry that contains all of the points in the geometry
• ST_EnvelopeAggr()
  – Returns the bounding rectangle of all the geometries in a group
• ST_IntersectionAggr()
  – Returns a geometry that is the spatial intersection of all the geometries in a group
Note: All aggregation functions can be combined with spatial clustering.
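To give some intuition for what the three aggregates compute, here is a minimal Python sketch restricted to simple inputs. It computes the convex hull of a set of points using Andrew's monotone chain algorithm, a standard technique chosen here for brevity and not necessarily what HANA uses. It also computes the envelope and the intersection of axis-aligned rectangles given as (min_x, min_y, max_x, max_y). HANA's aggregates accept general geometries, and the function names below are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull_aggr(points: List[Point]) -> List[Point]:
    """Convex hull via Andrew's monotone chain: vertices counter-clockwise,
    starting at the leftmost (then lowest) point; collinear points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def envelope_aggr(rects: List[Rect]) -> Rect:
    """Bounding rectangle of all rectangles in the group."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def intersection_aggr(rects: List[Rect]) -> Optional[Rect]:
    """Intersection of all rectangles in the group; None if it is empty."""
    x0, y0 = max(r[0] for r in rects), max(r[1] for r in rects)
    x1, y1 = min(r[2] for r in rects), min(r[3] for r in rects)
    return (x0, y0, x1, y1) if x0 <= x1 and y0 <= y1 else None

print(convex_hull_aggr([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)]
print(envelope_aggr([(0, 0, 4, 4), (2, 1, 6, 5)]))      # (0, 0, 6, 5)
print(intersection_aggr([(0, 0, 4, 4), (2, 1, 6, 5)]))  # (2, 1, 4, 4)
```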
Spatial: SQLScript Support
SQLScript now supports the spatial data type ST_GEOMETRY and SQL spatial functions to access and manipulate spatial data.
Example:
CREATE FUNCTION get_distance (IN first ST_GEOMETRY, IN second ST_GEOMETRY)
RETURNS distance double
AS
BEGIN
  distance = :first.st_distance(:second);
END;

CREATE PROCEDURE nested_call (IN first ST_GEOMETRY, IN second ST_GEOMETRY, OUT distance double, OUT res3 CLOB)
AS
BEGIN
  distance = get_distance(:first, :second);
  res3 = :first.st_astext();
END;

The procedure call:
CALL nested_call(first => st_geomfromtext('Point(7 48)'), second => st_geomfromtext('Point(2 55)'), distance => ?, res3 => ?);
returns:
Out(1)            | Out(2)
8.602325267042627 | POINT(7 48)
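The returned value can be checked by hand. It is the planar Euclidean distance between Point(7 48) and Point(2 55), i.e. sqrt((7 - 2)^2 + (48 - 55)^2) = sqrt(74). The following is a quick Python check. It assumes the geometries use a planar (Cartesian) spatial reference system, since the call does not specify one.

```python
import math

first = (7.0, 48.0)
second = (2.0, 55.0)
# Planar Euclidean distance: sqrt(5**2 + 7**2) = sqrt(74)
distance = math.sqrt((first[0] - second[0]) ** 2 + (first[1] - second[1]) ** 2)
print(distance)
```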
Spatial: Miscellaneous Support
• Tables with spatial data can now be partitioned.
  – Great for scale-out processing of spatial data
• OGC compliance certification (in progress)
  – Exchange data with third-party applications across multiple sources
• Delivery unit available for all spatial reference IDs (in progress)
• SAP HANA EIM / SAP HANA Smart Data Quality provides native geocoding and reverse geocoding capabilities
Summary
• Spatial Clustering
  – Support for three clustering algorithms (Grid, K-Means, DBSCAN) and associated methods
  – New aggregation methods
• SQLScript Support for Spatial Types
• Miscellaneous
  – Partitioning of spatial data
  – OGC certification (in progress)
Thank you
Contact information
SAP HANA Product Management
[email protected]