Public DMM117 – SAP HANA Processing Services: Text, Spatial, Graph, Series, and Predictive


© 2016 SAP SE or an SAP affiliate company. All rights reserved. 2Public

Speakers

Bangalore, October 5 - 7

Priyanka Nalakath

M S Poornapragna

Las Vegas, Sept 19 - 23

Anthony Waite

May Chen

Barcelona, Nov 8 - 10

Markus Fath

Anthony Waite


Disclaimer

The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission of SAP. Except for your obligation to protect confidential information, this presentation is not subject to your license agreement or any other service or subscription agreement with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or any related document, or to develop or release any functionality mentioned therein.

This presentation, or any related document and SAP's strategy and possible future developments, products and/or platform directions and functionality are all subject to change and may be changed by SAP at any time for any reason without notice. The information in this presentation is not a commitment, promise or legal obligation to deliver any material, code or functionality. This presentation is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. This presentation is for informational purposes and may not be incorporated into a contract. SAP assumes no responsibility for errors or omissions in this presentation, except if such damages were caused by SAP’s intentional or gross negligence.

All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.


Agenda

Introduction: a platform to analyze various data types

Text

Spatial

Graph

Series

Numbers


Introduction


Example scenarios

Public Security

Generate real-time intelligence from multiple sources

• Case management, activities, master data

• Social media

• Phone monitoring

• Traffic data

Insurance

Analyze the impact of natural disasters from many perspectives

• Policy data, locations

• News/media

• Satellite imagery

• Business networks


SAP HANA – The Platform Powers the Digital Transformation

Figure: SAP HANA platform (on-premise | cloud | hybrid)


Text


What types of text processing capabilities are supported?

Full-text search: In addition to string matching, SAP HANA features full-text search, which works on content stored in tables or exposed via views. Just like searching on the Internet, full-text search finds terms irrespective of the sequence of characters and words.

Text analysis: Capabilities range from basic tokenization and stemming to more complex semantic analysis in the form of entity and fact extraction. Text analysis applies within individual documents and is the foundation for both full-text search and text mining.

Text mining: Text mining makes semantic determinations about the overall content of documents relative to other documents. Capabilities include key term identification and document categorization. Text mining is complementary to text analysis.


Full-text search

SAP HANA provides an in-database search engine:
• Supports 32 languages and handles binary file formats
• Modeling tools for search
• Search queries via built-in procedure, SQL, and OData
• Linguistic and fuzzy (error-tolerant) search


Full-text index and full-text search

CREATE COLUMN TABLE "RESEARCH_PAPERS" (
  "ID" INTEGER PRIMARY KEY,
  "AUTHOR" NVARCHAR(200),
  "MIMETYPE" NVARCHAR(200),
  "DOCUMENT" BLOB
);

-- Full-text index on the binary document column
CREATE FULLTEXT INDEX "FTI_RESEARCH_PAPERS_DOCUMENT" ON "RESEARCH_PAPERS"("DOCUMENT");

-- Error-tolerant search across AUTHOR and DOCUMENT; FUZZY(0.8) sets the minimum fuzzy score
SELECT "ID", "AUTHOR", "DOCUMENT"
FROM "RESEARCH_PAPERS"
WHERE CONTAINS(("AUTHOR", "DOCUMENT"), 'roberd software', FUZZY(0.8));

Figure: Full Text Indexing (rows with ID and DOC are inserted into the table; the full-text index is built from the DOC column)
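To make error-tolerant matching concrete, here is a minimal plain-Python sketch. It is not SAP HANA's fuzzy scoring: it assumes difflib.SequenceMatcher's ratio as a stand-in similarity score, a hypothetical rule that every query word must match some word in the searched columns, and the same 0.8 threshold as FUZZY(0.8). The sample rows are invented.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in for a fuzzy score, not SAP HANA's algorithm."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_contains(text: str, query: str, threshold: float = 0.8) -> bool:
    """Hypothetical rule: every query word must fuzzily match at least one word of text."""
    words = text.split()
    return all(
        any(similarity(q, w) >= threshold for w in words)
        for q in query.split()
    )

# Invented sample rows: (ID, AUTHOR, DOCUMENT text)
rows = [
    (1, "Robert Smith", "A survey of software testing"),
    (2, "Alice Jones", "Graph databases in practice"),
]

for row_id, author, document in rows:
    # Search both columns together, mirroring CONTAINS(("AUTHOR", "DOCUMENT"), ...)
    if fuzzy_contains(f"{author} {document}", "roberd software"):
        print(row_id)
```

Here "roberd" still matches "Robert" because their similarity ratio (about 0.83) clears the 0.8 threshold, while the second row has no word close to "roberd".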


Search models

In a search model you define the structure of your “search object” and how it is exposed to an application:
• Tables and joins
• Columns
  – Default columns for search
  – Weights for ranking
  – Fuzziness
  – Default columns for facets

Figure: tables → model → access


Search models and data access

CALL ESH_SEARCH(query, ?);
Built-in procedure to search on multiple search models with an OData query and a JSON response.

CALL ESH_CONFIG(config);
Built-in procedure to add search annotations (request/response, facets, UI areas, etc.) to views.

Figure: search annotations on *any* view over tables; queries are issued via SQL and return JSON to the UI.


Text analysis

SAP HANA provides in-database text analysis

Linguistic analysis

Entity extraction

e.g. persons, organizations

Fact extraction

e.g. sentiments, mergers & acquisitions

Grammatical role analysis

subject-predicate-object

Custom dictionaries and rules for domain adaptation

e.g. chemical substances, product launch


Text analysis

Text analysis runs as an optional processing step “on top” of full-text indexing, and can also be applied to non-persisted data.

Figure panels: “Full Text Indexing with TA” (inserted ID/DOC rows feed the full-text index and a text analysis results table) and “Text Analysis on non-persisted data” (text passed to text analysis returns the results directly); SAP HANA and SAP HANA Extended Application Services are shown as the layers involved.


Text analysis: advanced configuration options

Custom dictionaries for domain-specific entity extraction:
• Dictionaries are stored in the repository
• Updates to dictionaries are picked up “immediately”

Standard form           Variant   Type
Arnold Schwarzenegger   Arnie     American Film Actor
Sylvester Stallone      Sly       American Film Actor
SAP SE                  SAP AG    Company
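A minimal sketch of dictionary-based entity extraction using the table above, written in plain Python rather than SAP HANA's dictionary format (which it does not reproduce): each standard form and variant maps to its standard form and type, and matches are found on whole-word boundaries.

```python
import re

# The custom dictionary above as a Python mapping: surface form -> (standard form, type).
DICTIONARY = {
    "Arnold Schwarzenegger": ("Arnold Schwarzenegger", "American Film Actor"),
    "Arnie": ("Arnold Schwarzenegger", "American Film Actor"),
    "Sylvester Stallone": ("Sylvester Stallone", "American Film Actor"),
    "Sly": ("Sylvester Stallone", "American Film Actor"),
    "SAP SE": ("SAP SE", "Company"),
    "SAP AG": ("SAP SE", "Company"),
}

def extract_entities(text):
    """Return (surface form, standard form, type) for each whole-word match, in text order."""
    hits = []
    for surface, (standard, entity_type) in DICTIONARY.items():
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            hits.append((match.start(), surface, standard, entity_type))
    hits.sort()  # order by position in the text
    return [(surface, standard, entity_type) for _, surface, standard, entity_type in hits]

print(extract_entities("Arnie and Sly met executives of SAP AG."))
```

Normalizing variants to a standard form is what lets “Arnie” and “Arnold Schwarzenegger” be counted as the same entity downstream.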


Text analysis: advanced configuration options

Custom rules for domain-specific fact extraction:
• Rules are stored in the repository
• Updates to rules are picked up “immediately”

Rule elements:
• Tokens, stems, part-of-speech tags
• Iteration operators
• Wildcards, alternation, negation
• Character classifiers (case sensitivity)
• Grouping and containment (regEx)

Example rule (schematic): [type: company] [stem: acquire | buy] [type: company] for [currency]

Matches:
SAP acquired Sybase for $5.8 billion
IBM buys Softlayer for $2 billion
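The acquisition rule can be approximated with an ordinary regular expression. This is a plain-Python sketch, not SAP HANA's rule syntax: the company list and the inflected verb forms standing in for the stems acquire and buy are hypothetical, hard-coded assumptions, whereas a real rule relies on entity types and stemming supplied by text analysis.

```python
import re

# Hypothetical stand-ins for what text analysis would supply:
# entities of type "company" and inflected forms of the stems "acquire" and "buy".
COMPANIES = ["SAP", "Sybase", "IBM", "Softlayer"]
VERB_FORMS = ["acquire", "acquires", "acquired", "buy", "buys", "bought"]

company = "|".join(re.escape(c) for c in COMPANIES)
verb = "|".join(VERB_FORMS)

# [type: company] [stem: acquire | buy] [type: company] for [currency]
ACQUISITION = re.compile(
    rf"\b(?P<buyer>{company})\s+(?:{verb})\s+(?P<target>{company})\s+for\s+"
    r"(?P<price>\$\d+(?:\.\d+)?(?:\s+(?:million|billion))?)"
)

def extract_acquisitions(text):
    """Return one dict (buyer, target, price) per sentence fragment matching the rule."""
    return [m.groupdict() for m in ACQUISITION.finditer(text)]

print(extract_acquisitions("SAP acquired Sybase for $5.8 billion. IBM buys Softlayer for $2 billion."))
```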


Text analysis: using text analysis results

Search-based applications
– Include text analysis results in a search model for navigation and filtering

Analytics
– Simple calculations like term frequencies and co-occurrence
– Clustering, topic modeling, or other text mining techniques (R, Predictive Analysis Library (PAL) functions)

Geotagging
– Assign longitude/latitude coordinates to “location” entities

Graph analysis
– Store co-occurrences or semantic triples as a graph for pattern matching, reasoning, etc.

Figure: mock search result list showing each document with its abstract
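For the analytics case, term frequencies and co-occurrence are simple counts over text analysis output. A minimal Python sketch, assuming each document has already been reduced to a list of extracted entities (the sample lists are hypothetical) and counting each term pair at most once per document:

```python
from collections import Counter
from itertools import combinations

def term_frequencies(docs):
    """Count how often each extracted term occurs across all documents."""
    return Counter(term for terms in docs for term in terms)

def co_occurrences(docs):
    """Count, per term pair, the number of documents in which both terms occur."""
    pairs = Counter()
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical text analysis output: entities extracted from three documents
docs = [
    ["SAP", "Sybase", "SAP"],
    ["SAP", "Sybase"],
    ["IBM", "Softlayer"],
]
print(term_frequencies(docs))
print(co_occurrences(docs))
```

The pair counts are exactly the edge weights one would store when keeping co-occurrences as a graph.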


Text mining

SAP HANA provides in-database text mining

Identify similar documents

Identify key terms of a document

Identify related terms

Categorize new documents based on a training corpus

Scenarios

Highlight the key terms when viewing a patent document

Identify similar incidents for faster problem solving

Categorize new scientific papers along a hierarchy of topics

Figure: term-document matrix (terms t1 … tn by documents d1, d2, …)


Text mining

The text mining table is built from the results of linguistic analysis.

Essentially, it is a large term-document matrix.

The matrix is fully accessible for custom algorithms.

Figure: Full Text Indexing with TA and TM (inserted ID, DOC rows → full-text index → text analysis table → text mining table)
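A term-document matrix and a “related documents” lookup can be sketched in a few lines of Python. This assumes raw term counts, whitespace tokenization, and cosine similarity; SAP HANA's text mining weighting and similarity measures may differ, and the sample documents are invented.

```python
import math
from collections import Counter

def term_document_matrix(docs):
    """Return (vocabulary, rows): one row per document, one column per term, raw counts."""
    vocabulary = sorted({term for doc in docs for term in doc.split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows

def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(docs, i):
    """Index of the document whose row is closest (by cosine) to document i."""
    _, rows = term_document_matrix(docs)
    others = [j for j in range(len(docs)) if j != i]
    return max(others, key=lambda j: cosine(rows[i], rows[j]))

docs = [
    "database index search",
    "search index ranking",
    "graph shortest path",
]
vocabulary, rows = term_document_matrix(docs)
print(vocabulary)
print(most_similar(docs, 0))
```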


Text mining

Text mining functions:
• Related documents
• Relevant terms
• Related terms
• Classify (kNN)
• and more

Figure: text mining tables in SAP HANA, accessed via TM SQL and, through SAP HANA Extended Application Services, via the Text Mining .js API
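Classify (kNN) assigns a new document the majority category among its k most similar training documents. A plain-Python sketch assuming a bag-of-words model (lowercase whitespace tokens, raw counts, cosine similarity); the training corpus and the function name classify_knn are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words vector as a Counter of lowercase tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two Counters; missing terms count as zero."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify_knn(training, text, k=3):
    """training: list of (text, category). Majority vote among the k most similar documents."""
    query = vectorize(text)
    ranked = sorted(training, key=lambda tc: cosine(query, vectorize(tc[0])), reverse=True)
    votes = Counter(category for _, category in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training corpus
training = [
    ("graph shortest path algorithm", "graph"),
    ("graph vertices edges", "graph"),
    ("spatial polygon distance", "spatial"),
    ("spatial point cluster", "spatial"),
    ("graph pattern matching", "graph"),
]
print(classify_knn(training, "shortest path in a graph"))
```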


Spatial


Spatial

SAP HANA provides native spatial data processing

Store 2D and 3D vector datatypes

50+ geospatial functions and algorithms

Geocoding and reverse geocoding

Geo content (GAB) and mapping services

Open standards (OGC, 1999 SQL/MM)

SDK for custom geospatial algorithms

Bulk and streaming data integration capabilities

Integration with Esri, Pitney Bowes, HERE and more

Related session: DMM270 – Spatial Analytics with SAP HANA (H2)


Geographic data categories

Vector data
– Point, Linestring, Polygon, MultiPoint, …
– Networks, topologies, point clouds, …
– Metadata: spatial reference systems (SRS), units of measure (UOM)

Raster data
– Gridded data, e.g. digital terrain elevation, weather information
– Image data, e.g. created from optical or spectral sensors
– Metadata: raster and grid information, spatial and band reference system

Figure: vector examples (Point, Linestring, Polygon, CircularString) and a raster grid of cell values


Spatial predicates

Here $I(g)$ and $E(g)$ denote the interior and exterior of a geometry $g$.

g1.ST_Touches(g2): $(g_1 \cap g_2 \neq \emptyset) \wedge (I(g_1) \cap I(g_2) = \emptyset)$

g1.ST_Within(g2): $(g_1 \cap g_2 = g_1) \wedge (I(g_1) \cap E(g_2) = \emptyset)$

g1.ST_Equals(g2): $g_1 = g_2$

g1.ST_Crosses(g2): $(I(g_1) \cap I(g_2) \neq \emptyset) \wedge (g_1 \cap g_2 \neq g_1) \wedge (g_1 \cap g_2 \neq g_2)$

g1.ST_Overlaps(g2): $(I(g_1) \cap I(g_2) \neq \emptyset) \wedge (I(g_1) \cap E(g_2) \neq \emptyset) \wedge (E(g_1) \cap I(g_2) \neq \emptyset)$

g1.ST_Intersects(g2): $g_1 \cap g_2 \neq \emptyset$

g1.ST_Disjoint(g2): $g_1 \cap g_2 = \emptyset$

g1.ST_Contains(g2): $(g_1 \cap g_2 = g_2) \wedge (I(g_1) \cap I(g_2) \neq \emptyset)$

g1.ST_Covers(g2) *: $g_1 \cap g_2 = g_2$

* Not an OGC standard predicate
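For intuition, the set definitions above reduce to coordinate comparisons when both geometries are closed, axis-aligned rectangles with positive area. The Rect type and the function names below are hypothetical; this is not SAP HANA's ST_Geometry implementation and does not handle general polygons.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    """Closed axis-aligned rectangle; assumes xmin < xmax and ymin < ymax."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def intersects(g1: Rect, g2: Rect) -> bool:
    # g1 ∩ g2 ≠ ∅; closed sets, so a shared edge or corner counts
    return (g1.xmin <= g2.xmax and g2.xmin <= g1.xmax
            and g1.ymin <= g2.ymax and g2.ymin <= g1.ymax)

def disjoint(g1: Rect, g2: Rect) -> bool:
    # g1 ∩ g2 = ∅
    return not intersects(g1, g2)

def interiors_intersect(g1: Rect, g2: Rect) -> bool:
    # I(g1) ∩ I(g2) ≠ ∅; interiors are open, so comparisons are strict
    return (g1.xmin < g2.xmax and g2.xmin < g1.xmax
            and g1.ymin < g2.ymax and g2.ymin < g1.ymax)

def touches(g1: Rect, g2: Rect) -> bool:
    # (g1 ∩ g2 ≠ ∅) ∧ (I(g1) ∩ I(g2) = ∅)
    return intersects(g1, g2) and not interiors_intersect(g1, g2)

def within(g1: Rect, g2: Rect) -> bool:
    # g1 ∩ g2 = g1, i.e. g1 is a subset of g2; then I(g1) ∩ E(g2) = ∅ holds as well
    return (g2.xmin <= g1.xmin and g1.xmax <= g2.xmax
            and g2.ymin <= g1.ymin and g1.ymax <= g2.ymax)

def contains(g1: Rect, g2: Rect) -> bool:
    # g2 inside g1; with positive areas the interiors then intersect too
    return within(g2, g1)

def overlaps(g1: Rect, g2: Rect) -> bool:
    # interiors meet, and each rectangle has interior points outside the other
    return interiors_intersect(g1, g2) and not within(g1, g2) and not within(g2, g1)

a, b, c = Rect(0, 0, 2, 2), Rect(2, 0, 4, 2), Rect(1, 1, 3, 3)
print(touches(a, b), overlaps(a, c), contains(a, Rect(0.5, 0.5, 1.5, 1.5)))
```

Rectangles a and b share only the edge x = 2, so they touch but do not overlap; a and c share interior area while neither contains the other, so they overlap.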


Spatial clustering and joins

Clustering: grid, k-means, DBSCAN

SELECT ST_ClusterId() AS CID, ST_ClusterCentroid() AS CENTROID, COUNT(*) AS C
FROM "RESEARCH_ORGANIZATIONS"
GROUP CLUSTER BY "LON_LAT" USING KMEANS CLUSTERS 5;
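The KMEANS clustering above runs inside SAP HANA. To show what k-means itself does, here is Lloyd's algorithm in plain Python. Assumptions: coordinates are treated as planar (x, y) values, the first k points serve as initial centroids, and an empty cluster keeps its previous centroid; SAP HANA's initialization and distance handling may differ.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on (x, y) tuples; returns (labels, centroids)."""
    centroids = list(points[:k])  # assumption: first k points as initial centroids
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        updated = []
        for c, members in enumerate(clusters):
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = updated
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return labels, centroids

labels, centroids = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(labels, centroids)
```

Treating longitude/latitude as planar coordinates is only a rough approximation over small areas; k-means also favors compact, roughly spherical clusters, which is why density-based methods such as DBSCAN are offered as well.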

Join

-- Pairs of organizations and project locations less than 100 km apart
SELECT * FROM "RESEARCH_ORGANIZATIONS" AS T1, "PROJECT_LOCATION" AS T2
WHERE T2."LON_LAT".ST_DISTANCE(T1."LON_LAT", 'kilometer') < 100;

Figure: spherical clusters vs. non-spherical clusters
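The join condition keeps pairs that are less than 100 km apart. A minimal Python sketch of the same idea uses the haversine great-circle distance on a spherical Earth (radius 6371 km) and a nested-loop join; SAP HANA's ST_DISTANCE result depends on the spatial reference system, so values will not match exactly. The sample coordinates are approximate and for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def distance_join(left, right, max_km):
    """Nested-loop join returning (left name, right name) pairs closer than max_km."""
    return [
        (org_name, proj_name)
        for org_name, (org_lon, org_lat) in left
        for proj_name, (proj_lon, proj_lat) in right
        if haversine_km(org_lon, org_lat, proj_lon, proj_lat) < max_km
    ]

# Approximate (longitude, latitude) pairs, for illustration only
organizations = [("Walldorf", (8.64, 49.31)), ("Berlin", (13.40, 52.52))]
projects = [("Heidelberg", (8.67, 49.40)), ("Munich", (11.58, 48.14))]
print(distance_join(organizations, projects, 100))
```

A nested loop compares every pair; a database would normally use a spatial index to prune candidate pairs first.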


Spatial joins in Calculation View modeler


Spatial: geocoding

SAP HANA supports geocoding, reverse geocoding, and address cleansing.

This data transformation/enrichment can run either locally (reference data stored in SAP HANA) or via a remote service.

Local geocoding and address cleansing are handled by SAP HANA smart data quality.

Figure: address data is converted to longitude/latitude by a geocode transform or geocode index, using either geocode reference data in SAP HANA or a geocoding service (e.g. HERE)


Spatial: geo content and services

SAP HANA includes HERE mapping content and services:
• Mapping services API/SDK
• Map content for “generalized administration boundaries” (GAB) and “postcode areas” (POC)

Figure: map content stored in SAP HANA alongside a mapping service


Sample spatial clients

Figure: clients connected to SAP HANA: Esri ArcGIS Server, Esri ArcGIS Portal, Esri ArcGIS Desktop, SAP BusinessObjects Cloud, and a native SAPUI5 app. Connection paths shown: ODBC, map service, query layer, shapefile upload, and SAP HANA Extended Application Services.


Graph


Graph

SAP HANA provides a native graph engine:
• Property graph model
• Full transactional (ACID) properties
• Basic graph functions like shortest path and strongly connected components
• Native graph viewer
• Tightly integrated in SAP HANA operations (security, backup, etc.)

Benefits
• Store and analyze graph data in real time
• Tools and graph algorithms to navigate and extract insight from relationship data
• Combine text, spatial, and advanced analytics with relationship intelligence

Related session: DMM212 – SAP HANA Graph Processing: Information and Demonstration (L1)


Property graph

Powerful and flexible property graph model:
• Vertex (node) and edge (relationship) tables
• Vertices connected via multiple edges of any type
• Dynamic graph workspace view
• Up-to-date insights without replicating data
• Enhance graph semantics by adding new attributes to vertices and edges

Vertices
Key      Name            Birthdate
Herman   Herman Hesse    19270530
Samuel   Samuel Becket   19281001

Edges
Key   Source   Target   Type
1     Maria    Herman   hasSon
2     Maria    Samuel   hasSon


Graph algorithms

• Neighborhood search
• Shortest path
• Strongly connected components
• Pattern matching

Figure: example graph of Greek deities (Aphrodite, Hera, Artemis, Cronus, Leto, Hades, Poseidon, Gaia)
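Neighborhood search returns the vertices reachable from a start vertex within a given number of hops. A breadth-first-search sketch in Python; the edges are hypothetical “parent of” relations among some of the deities in the figure, not the figure's actual edges.

```python
from collections import deque

def neighborhood(edges, start, max_depth):
    """Breadth-first search over directed edges; returns {vertex: hop count}
    for every vertex reachable from start in at most max_depth hops."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
    depth = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if depth[vertex] == max_depth:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in depth:
                depth[neighbor] = depth[vertex] + 1
                queue.append(neighbor)
    return depth

# Hypothetical "parent of" edges between some of the deities in the figure
edges = [
    ("Gaia", "Cronus"),
    ("Cronus", "Hades"),
    ("Cronus", "Poseidon"),
    ("Cronus", "Hera"),
    ("Leto", "Artemis"),
]
print(neighborhood(edges, "Gaia", 1))
```

Because BFS visits vertices in order of hop count, each vertex's recorded depth is its minimum number of hops from the start.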


Graph modeler

In a calculation view, you can use a graph node that triggers a graph algorithm. The algorithm is executed when data is retrieved from the calculation view, for example:

SELECT * FROM GET_SHORTEST_PATHS ORDER BY "WEIGHT"
WITH PARAMETERS ('placeholder' = ('$$start$$', 'zeus'),
                 'placeholder' = ('$$level$$', '5'));
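Conceptually, a shortest-path computation like the one behind such a graph node can be illustrated with Dijkstra's algorithm. The Python sketch below assumes non-negative edge weights; the edge list and weights are hypothetical, and it does not model the level parameter or SAP HANA's actual implementation.

```python
import heapq

def shortest_paths(edges, start):
    """Dijkstra's algorithm over directed edges (source, target, weight >= 0).
    Returns {vertex: (distance, path)} for every vertex reachable from start."""
    adjacency = {}
    for source, target, weight in edges:
        adjacency.setdefault(source, []).append((target, weight))
    best = {start: (0, [start])}
    heap = [(0, start)]
    done = set()
    while heap:
        dist, vertex = heapq.heappop(heap)
        if vertex in done:
            continue  # stale heap entry; a shorter distance was already settled
        done.add(vertex)
        for neighbor, weight in adjacency.get(vertex, []):
            candidate = dist + weight
            if neighbor not in best or candidate < best[neighbor][0]:
                best[neighbor] = (candidate, best[vertex][1] + [neighbor])
                heapq.heappush(heap, (candidate, neighbor))
    return best

# Hypothetical weighted edges
edges = [("zeus", "hera", 1), ("zeus", "leto", 4), ("hera", "leto", 2), ("leto", "artemis", 1)]
print(shortest_paths(edges, "zeus"))
```

The direct edge zeus → leto (weight 4) loses to the two-hop route through hera (weight 3), which is the kind of result a shortest-path query ordered by weight would surface.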


Series


Series data

SAP HANA provides native support for series data

Store and generate series data

SQL integration for query processing

Detect and correct errors or anomalies

“Horizontal” aggregation/disaggregation (e.g. hourly to daily)

Series analysis (similarity, regression, smoothing, binning etc.)

Benefits

Efficient, scalable storage of series data

Simple and concise SQL interface

Optimized series algorithms

Seamless integration into existing database


Series table

CREATE COLUMN TABLE "WEATHER" (
  "STATION_ID" varchar(3) not null references "WEATHER_STATION",
  "DATE" date not null,
  "MAXTEMP" decimal(3,1),
  primary key ("STATION_ID", "DATE")
) SERIES (
  -- One series per station; one element per day; gaps are rejected
  SERIES KEY ("STATION_ID")
  EQUIDISTANT INCREMENT BY 1 DAY MISSING ELEMENTS NOT ALLOWED
  PERIOD FOR SERIES ("DATE", NULL)
);


Series data functions

Functions that make it easier to manipulate series data

SERIES_GENERATE – Generate a complete series

SERIES_DISAGGREGATE – Move from coarse units (day) to finer (hour)

SERIES_ROUND – Convert a single value to a coarser resolution

SERIES_PERIOD_TO_ELEMENT – Convert a timestamp in a series to its offset from start

SERIES_ELEMENT_TO_PERIOD – Convert an integer to the associated period
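For an equidistant series, converting between periods and element numbers is simple arithmetic on the start and the increment. A Python sketch for a daily series; it assumes 1-based element numbers and a half-open [start, end) range for generation, so check the SAP HANA SQL reference for the exact conventions of SERIES_GENERATE, SERIES_PERIOD_TO_ELEMENT, and SERIES_ELEMENT_TO_PERIOD. The helper missing_periods is hypothetical and lists the gaps a complete series would fill.

```python
from datetime import date, timedelta

def element_to_period(start: date, step: timedelta, element: int) -> date:
    # Assumption: element numbers are 1-based, so element 1 is the series start.
    return start + (element - 1) * step

def period_to_element(start: date, step: timedelta, period: date) -> int:
    # Inverse of element_to_period; only defined for periods on the equidistant grid.
    offset = period - start
    if offset % step:
        raise ValueError("period is not on the series grid")
    return offset // step + 1

def generate_series(start: date, end: date, step: timedelta) -> list:
    # Every period from start (inclusive) to end (exclusive, an assumption).
    periods, current = [], start
    while current < end:
        periods.append(current)
        current += step
    return periods

def missing_periods(observed, start: date, end: date, step: timedelta) -> list:
    # Hypothetical helper: grid periods that have no observation.
    seen = set(observed)
    return [p for p in generate_series(start, end, step) if p not in seen]

day = timedelta(days=1)
start = date(2016, 1, 1)
print(element_to_period(start, day, 10))
print(period_to_element(start, day, date(2016, 2, 1)))
print(missing_periods([date(2016, 1, 1), date(2016, 1, 3)], start, date(2016, 1, 4), day))
```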


Analytical functions

Functions for analyzing series data:

LINEAR_APPROX – Replace NULL values by linear interpolation between adjacent non-NULL values

CUBIC_SPLINE_APPROX – Replace NULL values by cubic spline interpolation over the non-NULL values

CORR – Pearson product-moment correlation coefficient

CORR_SPEARMAN – Spearman rank correlation

DFT – Compute the discrete Fourier transform

MEDIAN – Median value

AUTO_CORR – Correlation of a (sub-)series with itself at varying lags
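Linear interpolation of NULL values, as described for LINEAR_APPROX, can be sketched in Python with None standing in for NULL. Assumption: leading and trailing None values, which lack a neighbor on one side, are left unchanged; SAP HANA's own handling of those cases may differ.

```python
def linear_approx(values):
    """Replace None entries by linear interpolation between the nearest non-None neighbors.
    Leading/trailing None entries are left unchanged (an assumption for this sketch)."""
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        slope = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + slope * (i - left)
    return result

print(linear_approx([1.0, None, None, 4.0, None]))
```

Each gap is filled along the straight line between its two known endpoints, so [1.0, None, None, 4.0] becomes evenly spaced steps of 1.0.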


Advanced Analytics


Advanced Analytics

SAP HANA provides in-database data mining

Application Function Library (AFL) contains packages for data mining and predictive analysis, e.g. Predictive Analysis Library (PAL)

– Native algorithms for advanced analysis

– In-database processing for fast results

– Support for common data mining tasks like clustering, classification, association, time series etc.

R integration for SAP HANA
– Use the open-source R environment in the context of SAP HANA
– R integration via a fast, parallelized connection
– R script is embedded within SAP HANA SQLScript

Related sessions: DMM271 – Introduction to Predictive Modeling and Application Deployment for SAP HANA (H2); BA101 (L1)


Advanced Analytics

Figure: Advanced Analytics in the SAP HANA Platform. Components shown: SAP applications; SAP Predictive Analytics; integration services (Smart Data Access, Smart Data Integration, Event Stream Processing); Spatial; Text Analysis, Text Mining; Graph; Rules Engine; Application Function Library (APL, BFL, PAL, UDF, OFL, etc.); R; SAP HANA Studio & Application Function Modeler; Embedded Predictive. Data shown: machine data, location data, text, transaction data, and other data.


Advanced Analytics: Predictive Analysis Library (PAL)

SAP HANA In-Memory Predictive Analytics

SAP HANA embeds multiple advanced analytics function libraries, optimized for massively parallel in-memory processing.

Predictive Analysis Library
– Core of numerous native predictive algorithms for in-database, in-memory processing in SAP HANA, resulting in quicker insight and faster implementations

Content and usage
– The library includes common as well as specialized algorithms targeting various data mining and machine learning areas
– Leveraged and embedded in native SAP applications, and usable from within SAP HANA development tools as well as SAP Predictive Analytics

Scenarios and use cases
– Various LoB/industry scenarios using association analysis, time series forecasting, link prediction, predictive modeling, etc.

Figure: the Predictive Analysis Library within the SAP HANA Platform, with continuous growth and enhancements


Advanced Analytics: Predictive Analysis Library (PAL)

Association Analysis: Apriori, Apriori Lite, FP-Growth, KORD (Top K Rule Discovery)

Classification Analysis: CART, C4.5 Decision Tree Analysis, CHAID Decision Tree Analysis, K Nearest Neighbor, Logistic Regression (incl. SGD), Neural Network, Naïve Bayes, Random Forest, Support Vector Machine, Parameter Selection / Model Evaluation (Confusion Matrix, Area Under Curve)

Regression: Multiple Linear Regression, Polynomial Regression, Exponential Regression, Bi-Variate Geometric Regression, Bi-Variate Logarithmic Regression

Probability Distribution: Distribution Fit, Cumulative Distribution Function, Quantile Function, Kaplan-Meier Survival Analysis

Outlier Detection: Inter-Quartile Range Test (Tukey’s Test), Variance Test, Anomaly Detection, Grubbs Outlier Test

Link Prediction: Common Neighbors, Jaccard’s Coefficient, Adamic/Adar, Katz β

Data Preparation: Sampling, Random Distribution Sampling, Binning, Scaling, Partitioning, Principal Component Analysis (PCA)

Statistic Functions (Univariate): Mean, Median, Variance, Standard Deviation, Kurtosis, Skewness

Statistic Functions (Multivariate): Covariance Matrix, Pearson Correlations Matrix, Chi-squared Tests (Test of Goodness of Fit, Test of Independence), F-test (variance equal test)

Other: Weighted Scores Table, Substitute Missing Values

Cluster Analysis: ABC Classification, DBSCAN, K-Means, K-Medoid Clustering, K-Medians, Kohonen Self-Organizing Maps, Agglomerative Hierarchical, Affinity Propagation, Latent Dirichlet Allocation (LDA), Gaussian Mixture Model (GMM), Cluster Assignment

Time Series Analysis: Single/Double/Triple Exponential Smoothing, Forecast Smoothing, ARIMA/Seasonal ARIMA, Brown Exponential Smoothing, Croston Method, Linear Regression with Damped Trend and Seasonal Adjust, Forecast Accuracy Measures, Tests for White Noise, Trend, and Seasonality
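As one example from the time series group, single exponential smoothing fits in a few lines of Python. This sketch initializes the smoothed series with the first observation, which is one common choice and an assumption here; PAL's parameters and initialization may differ.

```python
def single_exponential_smoothing(series, alpha):
    """s_0 = x_0 and s_t = alpha * x_t + (1 - alpha) * s_(t-1).
    The last smoothed value serves as the one-step-ahead forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(single_exponential_smoothing([10, 20, 10], alpha=0.5))
```

A larger alpha weights recent observations more heavily and reacts faster to change; a smaller alpha smooths more aggressively.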


Demo



SAP TechEd Online

Continue your SAP TechEd education after the event!

Access replays of:
• Keynotes
• Demo Jam
• SAP TechEd live interviews
• Select lecture sessions
• Hands-on sessions
• …

http://sapteched.com/online


Further information

Related SAP TechEd sessions:
• DMM212 – SAP HANA Graph Processing: Information and Demonstration (L1)
• DMM270 – Spatial Analytics with SAP HANA (H2)
• DMM271 – Introduction to Predictive Modeling and Application Deployment for SAP HANA (H2)

SAP Public Web: scn.sap.com, www.sap.com

SAP Education and Certification Opportunities: www.sap.com/education

Watch SAP TechEd Online: www.sapteched.com/online


Thanks for attending this session.

Please complete your session evaluation for DMM117.

Contact information:

Markus Fath
