Human Face of Big Data - archive.sap.com

Welcome!

© 2013 SAP AG. All rights reserved.

Big Data - Introduction to SAP Big Data Technologies

Big Data - Streaming Analytics

Big Data - Smarter Data Virtualization

Big Data - Gain New Insight from Hadoop

Big Data - Spatial Data Processing for Richer Insights

Big Data - Text Analytics

SAP Big Data Webinar Series

Speaker Introductions

Marie Goodell is a Senior Director on the Big Data Solution Marketing team at SAP. She specializes in Big Data topics such as text analytics, spatial analytics, and Hadoop. In this role, she is engaged with customers and partners on Big Data projects that deliver unprecedented insight for real-time business. Prior to SAP, Marie worked in various marketing leadership roles at Adobe, Oracle and IBM. Marie holds an MS from the University of Minnesota. Follow Marie at @mhgoodell on Twitter.

Presented by: Marie Goodell, SAP

Gain Unprecedented Insight with Text Analytics

SAP Big Data Webinar Series

There is no doubt that the amount of Big Data created, collected & managed grows more each day

Traditional content types, including unstructured data, are growing by up to 80% per year.

What if you could combine business data with unstructured content…

• Web content
• Call centers
• Social networks
• Data streams

…to gain unprecedented contextual insight from big data?

Structured: “what”
Unstructured: “why”

Unified Information Access from SAP

Gaining insight from text sources

Unified Information Access Platform

• Integrate large volumes of structured, semi-structured and unstructured data
• Process information in real-time for rapid analysis and decision making
• Combine search, text analysis, and database processing for ad hoc analysis
• Offer traditional business intelligence (e.g. exploration, dashboards, visualizations, reports)

SAP HANA

Search and Text Analysis

SAP HANA

• Real-time Replication Services
• Data Services
• In-Memory Database
• Planning and Calculation Engine
• R and Hadoop Integration
• Predictive Analysis & Business Function Libraries
• Information Composer and Modeling Studio
• Spatial Processing
• Text Search & Text Analysis

SAP HANA

Platform for a new class of real-time analytics & applications

Real-time Applications

Real-time Analytics

Real-time Platform

Operational Reporting

Core Business Acceleration

Database

Data Warehousing

Planning and Optimization

Mobile

Predictive & Text Analytics on Big Data

Sensing and Response

Cloud

SAP HANA Search

Search structured and unstructured text in single platform

• Native search: one infrastructure for both unstructured content and structured data (OLAP or OLTP); full-text, fuzzy, free-style
• Graphical modeling: easy-to-use search definition, built into existing tools
• Information Access toolkit: rapid development of search-based applications through reusable UI building blocks

SAP HANA Info Access

Configuration toolkit for search-based applications

SAP HANA Info Access includes:
• UI & Client Library Toolkit
• Application
• SAP Services

Capabilities:
• Allows for visualization of and interaction with data in SAP HANA (full-text searching, filtering, drill-down, charts)
• Supports iPad (download from Apple App Store) and Browser (HTML5) deployments
• Consumes SAP HANA models
• NOT a general purpose BI tool!

Benefits:
• Quick development and deployment time
• Low TCO and fast response times with 2-tier architecture
• Included with SAP HANA license

SAP HANA Text Analysis

Unlock key information from text to drive business insight

Acquire Unstructured Text

1. Extract meaning
2. Transform into structured data for analysis
3. Cleanse and match

Once structured it can be…
• Integrated
• Queried
• Analyzed
• Visualized
• Reported against

SAP HANA: Unlocks key information from text sources to drive business insight

SAP HANA Text Analysis

Filter files, extract meaning, structure, and analyze

File Filtering: Unlock text from binary documents
• Extract & process unstructured text data from popular file formats (txt, html, xml, pdf, doc, ppt, xls, rtf, msg, etc.)

Native Text Analysis: Expose linguistic markup for text mining uses
• Classify core entities (people, companies, things, etc.)
• Identify domain facts (sentiments, topics, requests, etc.)
• Supports up to 31 languages for linguistic mark-up and extraction dictionary and 13 languages for predefined core extractions

Transform to Structured Data: Query / Analyze, Visualize / Report

Core and Domain Extraction

Extract 2 sorts of elements from text

Core Entities: Davy Jones was one of the Monkees.
<PERSON> Davy Jones </PERSON> was one of the Monkees.

Domain Facts: I love your product.
I <STRONGPOSITIVESENTIMENT> love </STRONGPOSITIVESENTIMENT><TOPIC> your product </TOPIC>.

This is not a keyword search. Text Data Processing applies full linguistic and statistical techniques to make sure the entities which get returned are correct.

Grammatical Parsing
• Can we bill you?
• Bill was the president.
• I talked to Bill yesterday.
• The duck has a bill.
• The bill was signed into law.

Predefined Entity Types: Examples

Location
  ADDRESS1                      245 First Street Floor 16
  ADDRESS2                      Cambridge, MA 02142
  LOCALITY                      Cambridge
  REGION@MINOR                  Napa County
  REGION@MAJOR                  Connecticut
  COUNTRY                       Brazil
  CONTINENT                     South America
  GEO_FEATURE                   Mount Fuji
  GEO_AREA                      Scandinavia

Personal Data
  NAME_DESIGNATOR               c/o, attn
  TITLE                         President
  PERSON                        Barack Obama
  PEOPLE                        Greeks
  LANGUAGE                      Greek

Organization
  ORGANIZATION@COMMERCIAL       AT&T
  ORGANIZATION@EDUCATIONAL      University of Washington
  ORGANIZATION@OTHER            FBI
  PRODUCT                       iPhone
  TICKER                        NYSE:SAP

Social Media
  SOCIAL_MEDIA@TWITTER_ID       @SAP
  SOCIAL_MEDIA@TWITTER_TOPIC    #HANA

Time
  DATE                          2/14/2013
  DAY                           Monday
  MONTH                         June
  YEAR                          2013
  TIME                          3:47pm
  TIME_PERIOD                   3 days, from 9 to 5pm
  HOLIDAY                       Memorial Day

Format
  CURRENCY                      17 euros
  MEASURE                       217 meters
  PERCENT                       4%
  PHONE                         617-677-2030
  NIN@US_SSN                    522-89-2255
  NIN@FR_INSEE                  xxx
  NIN@CA_SIN                    xxx
  URI@EMAIL                     [email protected]
  URI@IP                        165.14.2.0
  URI@URL                       http://sap.com

Syntactic Entities
  NOUN_GROUP                    big umbrella
  PROP_MISC                     Cup o’ Soup
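To make the entity-type idea concrete, here is a minimal Python sketch that tags a handful of the pattern-like types listed above (PHONE, PERCENT, URI@URL, Twitter IDs and topics, and so on). The regular expressions are illustrative assumptions only. The product applies linguistic and statistical techniques, which a few regexes cannot replace.

```python
import re

# Toy patterns for a few entity types from the list above.
# These regexes are illustrative assumptions, not the rules SAP HANA uses.
PATTERNS = {
    "PERCENT": re.compile(r"\b\d+(?:\.\d+)?%"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "NIN@US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "URI@URL": re.compile(r"https?://[^\s,;]+"),
    "URI@IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "SOCIAL_MEDIA@TWITTER_ID": re.compile(r"(?<!\w)@\w+"),
    "SOCIAL_MEDIA@TWITTER_TOPIC": re.compile(r"(?<!\w)#\w+"),
}


def extract_entities(text):
    """Return (entity_type, matched_text) pairs ordered by position in the text."""
    hits = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), entity_type, match.group()))
    return [(entity_type, value) for _, entity_type, value in sorted(hits)]


if __name__ == "__main__":
    sample = "Call 617-677-2030 or visit http://sap.com and follow @SAP for #HANA news"
    for entity_type, value in extract_entities(sample):
        print(entity_type, value)
```

Types such as PERSON or ORGANIZATION@COMMERCIAL have no fixed surface pattern, which is why the slides stress that this is not a keyword search.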

Building a Search-Based Application with SAP HANA

Text search and/or text analysis

• Configure App: Use SAP HANA Info Access toolkit to define layout and data for the App
• Create Model: Use SAP HANA Studio to define the search data model and configure the search behavior
• Run Text Analysis: Extract salient information from text (Linguistic Markup, Entity & Sentiment Extraction)
• Create Full-text Index: Use SAP HANA Studio to create full-text indexes for search, file filtering, and optionally run Text Analysis
• Consume Data: Search on Text and/or filter, analyze, and perform advanced analytics on Text Analysis table output
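The slides describe these steps as SAP HANA Studio actions. As a rough sketch of what they might look like in SQL, the Python helpers below only build statement strings. The table, column, and index names are hypothetical. The `CREATE FULLTEXT INDEX … TEXT ANALYSIS ON`, `CONTAINS(…, FUZZY(…))`, and `$TA_<index>` forms are written from memory of the SAP HANA SQL reference, so check them against the documentation for your release.

```python
def quote_literal(value):
    """Quote a SQL string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_fulltext_index(index, table, column,
                          configuration="EXTRACTION_CORE_VOICEOFCUSTOMER"):
    """Statement that creates a full-text index and switches text analysis on.

    The configuration name is an assumption; verify it for your HANA release.
    """
    return (
        f'CREATE FULLTEXT INDEX "{index}" ON "{table}" ("{column}") '
        f"CONFIGURATION {quote_literal(configuration)} TEXT ANALYSIS ON"
    )


def fuzzy_search(table, column, term, min_score=0.8):
    """Statement that runs a fuzzy full-text search on one column."""
    return (
        f'SELECT * FROM "{table}" '
        f'WHERE CONTAINS("{column}", {quote_literal(term)}, FUZZY({min_score}))'
    )


def text_analysis_output(index):
    """Statement that reads extracted tokens and types from the $TA_ result table."""
    return f'SELECT TA_TOKEN, TA_TYPE FROM "$TA_{index}"'


if __name__ == "__main__":
    # Hypothetical names used only for illustration.
    print(create_fulltext_index("REVIEW_IDX", "REVIEWS", "BODY"))
    print(fuzzy_search("REVIEWS", "BODY", "washer"))
    print(text_analysis_output("REVIEW_IDX"))
```

The helpers return strings rather than executing anything, so the sketch runs without a database connection.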

SAP HANA Search and Text Analysis

Benefits

For the Business

• Exploit Unstructured Data: Ability to extract and analyze information from unstructured content
• Flexibility: Perform text search, text analysis, and analytics all in one unified platform
• Faster Time to Analysis: Achieve faster search and analysis results by leveraging a high-performing in-memory platform

For IT

• Landscape Simplification: Reduces redundant data persistency, engines, and data movement
• Total Cost of Development: One unified platform and model for text search, text analysis, and analytics
• Unified Access Layer: Quickly develop and connect applications to search and explore data in SAP HANA

SAP Data Services

Text Data Processing

SAP Data Services: Text data processing architecture

[Architecture diagram. Labels: Sources; SAP Data Services Designer (TDP job set up); SAP Data Services Text Data Processing (entity extraction); entities, concepts, sentiments; Data Quality transforms; Targets; Semantic Layer for query & analytics; End User Apps or dashboards; roles: ETL Designer, Business User]

Execute Text Data Processing to Mine Sentiment

Extract valuable data from Hadoop without coding

SAP Data Services

SAP Data Services
1. Detects the Hadoop data source and pushes down the text analytics query
2. Generates a Pig script, which is sent to Hadoop

Apache Hadoop Distribution
3. Initiates a MapReduce job to execute text data processing in Hadoop
4. Starts MapReduce sub-tasks on nodes where data resides

SAP Data Services
5. Extracts relevant text data
6. Rapidly loads to SAP HANA

[Diagram: SAP Data Services (Pig generator), Apache Hadoop (MapReduce, text data processing), and SAP HANA, with callouts 1 to 6 marking the steps]
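The pushdown idea in steps 3 to 5 (do the extraction where the data resides, move only the small result) can be sketched locally in Python. This is not the Pig script that SAP Data Services generates. The partitions, the keyword lexicon, and the function names are all assumptions made for illustration.

```python
from collections import Counter
from functools import reduce

# Hypothetical data: each inner list stands for the text stored on one Hadoop node.
NODE_PARTITIONS = [
    ["I love this washer", "The dryer is terrible"],
    ["I love the new fridge", "Delivery was fine"],
]

# Toy keyword lexicon; a real job would run full text data processing instead.
LEXICON = {"love": "positive", "terrible": "negative"}


def map_partition(lines):
    """Map step: runs next to the data and emits only the small extracted result."""
    counts = Counter()
    for line in lines:
        for word in line.lower().split():
            if word in LEXICON:
                counts[LEXICON[word]] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merges the per-node results."""
    return left + right


def run_job(partitions):
    """Apply the map step to every partition, then fold the results together."""
    return reduce(reduce_counts, map(map_partition, partitions), Counter())


if __name__ == "__main__":
    print(run_job(NODE_PARTITIONS))
```

Only the aggregated counts leave the "nodes", which mirrors the point of step 6: the relevant extract, not the raw text, is what gets loaded into SAP HANA.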

Configure a Text Data Processing Job

SAP Data Services Designer

Integrate Structured and Unstructured Fields

SAP Data Services (Data Integration)

Input models and serial numbers (entity extraction)

Integrate Structured and Unstructured Fields (continued)

SAP Data Services (Text Data Processing)

Consumer doesn’t like the gentle action of thew washer, she feels that it should be slower. Tech explained the operation of the gentle mode, consumer is still not happy and feels STS is falsily advertisting the product. The washer is operating as designed, gentle mode is the same speed as normal, however; in the gentle mode the wash time is shorter. A new washer will not solve the issue.

tech stated the unit appears locked but is not. advise to ck or replace wiring to door lock ass'y and ck the pressure switch.repairable

tech stated while putting on the doors the frt of unit on frzr side is dented in at bottom frame and doors do not seal. this unit has an air leakage permanent. nonrepairable.

Solenode chattering. Recommend replacing dispenser switch WR23X366.

Identify and Extract Concepts

SAP Data Services (Text Data Processing)

Concepts extracted from full text description
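As a rough illustration of pulling structured fields out of free-text technician notes like the ones above, the Python sketch below extracts part numbers and a repair status. The part-number pattern is an assumption inferred from the single example WR23X366, and the field names are hypothetical. Text Data Processing itself relies on dictionaries and linguistic rules rather than two regexes.

```python
import re

# Assumed part-number shape (two letters, two digits, a letter, three digits),
# inferred only from the example "WR23X366"; real catalogs need their own pattern.
PART_NUMBER = re.compile(r"\b[A-Z]{2}\d{2}[A-Z]\d{3}\b")

# search() returns the leftmost match, so "nonrepairable" is reported whole
# rather than as its substring "repairable".
STATUS = re.compile(r"nonrepairable|repairable", re.IGNORECASE)


def extract_fields(note):
    """Return the part numbers and repair status mentioned in one technician note."""
    status = STATUS.search(note)
    return {
        "part_numbers": PART_NUMBER.findall(note),
        "status": status.group().lower() if status else None,
    }


if __name__ == "__main__":
    print(extract_fields("Solenode chattering. Recommend replacing dispenser switch WR23X366."))
    print(extract_fields("ck the pressure switch.repairable"))
```

The extracted fields can then sit in ordinary columns next to the structured model and serial number data.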

Aggregate Concepts In Output

SAP Data Services (Data Quality Management)

Match aggregates concepts based on similarity into groups that can be used for reporting
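A minimal sketch of grouping similar concepts, assuming a simple string-similarity measure (Python's `difflib.SequenceMatcher`) and a greedy pass that compares each concept with the first member of every existing group. The Match transform in SAP Data Services uses its own matching rules. The threshold and the sample concepts here are assumptions.

```python
from difflib import SequenceMatcher


def group_similar(concepts, threshold=0.85):
    """Greedily group concepts whose similarity to a group's first member meets the threshold."""
    groups = []
    for concept in concepts:
        for group in groups:
            if SequenceMatcher(None, group[0], concept).ratio() >= threshold:
                group.append(concept)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([concept])
    return groups


if __name__ == "__main__":
    # Hypothetical extracted concepts, including a misspelling and a plural.
    concepts = ["door lock", "pressure switch", "door locks",
                "presure switch", "dispenser switch"]
    for group in group_similar(concepts):
        print(group)
```

Each group can then be reported under one label, for example counting "pressure switch" and "presure switch" as the same concept.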

When to Use…

SAP HANA (Text Analytics) vs SAP Data Services (TDP)

Columns: If you want to… | Text Analysis in HANA | Text Data Processing in Data Services

• Load data into HANA using SAP SLT or a 3rd party ETL tool; then analyze textual data using text analysis capabilities in HANA
• Leverage native search capabilities in HANA in conjunction with text analytics (e.g. search-based and text mining applications for investigative discovery)
• Have HANA automatically re-index frequent changes to text analysis processes (without having to re-load the data)
• Access linguistic markup generated when text is processed, which is persisted in HANA (e.g. tokenization, uninflected forms / stemming, part of speech)
• Have high-performing text analytics in HANA
• Not load, store, or process the unstructured text data or documents in HANA (because of cost / space concerns)
• Perform text analytics at the source (e.g. push TDP natively down into Hadoop) to uncover relevant nuggets of info that can be loaded into HANA
• Perform transformations before loading data into HANA (e.g. cleanse, match / de-duplicate and enrich text data)
• Utilize your own custom dictionaries and rules
• Support continuous text analytics workloads that are submitted regularly
• Expose text analytics as real-time Web Service

SAP HANA Sentiment Intelligence

Rapid Deployment Solution

Analyze Customer Sentiment

Improve satisfaction and competitiveness

Gain unvarnished insights and direct touch with your most vocal customers regarding:
• Sentiments / Opinions
• Product perceptions
• Buying experience
• Service quality
• Requests
• Trends
• Issues / Problems
• Topics / Contexts

Tap the full potential of unstructured data and social media

Source: http://www.pinnaclecart.com/blog/2012/12/21/6-ways-to-make-social-media-work-for-your-ecommerce-business

SAP Sentiment Intelligence rapid deployment solution

Accelerate insight from unstructured data

• Pre-configured data acquisition from public social media and other unstructured text sources, automated text data processing (NLP)
• Integration with SAP CRM campaign / promotions and service management
• SAP HANA HTML5 Information Access (InA), SAP BusinessObjects Explorer Views, SAP Lumira, Mobile for prebuilt analytical reporting and action taking, extraction routines, transformations, loading, universes, and analytics, based on pre-defined SAP HANA models
• How-to guides and additional service offering to extend unstructured channels integration and analytical reporting

From (un)structured data… …to insight!

• SAP HANA Info Access, SAP BusinessObjects BI, SAP Lumira, Mobile
• SAP Data Services
• SAP HANA with text analysis and models & views

SAP HANA Sentiment Intelligence

Step 1: Acquire Unstructured Data

SAP Data Services: Extraction, transformation, and loading (ETL) data flow
• Designer
• Job server
• Repository

SAP Data Services Designer
• Online source’s API calls with configurable search parameters
• Data Flow with user-defined transforms via Python
• Power user configuration environment
• Job Schedule for real time data loading

Step 2: Analyze Text within SAP HANA

Leverage voice of customer domain extraction

SAP HANA Text Analysis: System view
• Core entity and fact domain extraction
• Predefined core entities (who, what, when, where, etc.)
• Customization via dictionaries and rules
• Natural language processing (NLP)
• Named entity recognition (NER)
• Leveraging of the voice of the customer domain rule set
• Sentiment status augmentation on a detailed entity level

How it Works: Extract & Classify Sentiment

Voice of the customer domain fact extraction

The following major fact types are classified:
• Sentiments: expression of a customer’s feelings about something
• Problems: a statement about something which impedes a customer’s work
• Requests: expression of a customer’s desire for an enhancement/change
• Profanity: defines a set of pejorative vocabulary
• Emoticons: expression of someone's feelings about the whole sentence or situation

Within each of these rules certain sub entities are classified. Any rule may have an associated TOPIC sub entity which, in addition to the sub entities described on the following slides, describes the person, service, product, etc. which the sentiment, problem, or request is about.

Examples:
• I hate this book.
• I never received the book.
• Please send me a new book.
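A toy Python sketch of sorting sentences into three of the fact types named on this slide (Request, Problem, Sentiment). The cue patterns are assumptions chosen to fit the three "book" example sentences. They are not the voice of the customer rule set, which works on linguistic analysis rather than keyword lists.

```python
import re

# Toy cue patterns, checked in order. "would like" must be tested before the
# bare sentiment cue "like", so Request comes first.
RULES = [
    ("Request", re.compile(r"\b(please|would like|send me|call me)\b", re.IGNORECASE)),
    ("Problem", re.compile(r"\b(never received|crash(es|ed)?|fails?|broken)\b", re.IGNORECASE)),
    ("Sentiment", re.compile(r"\b(love|like|hate|great|horrible|terrible)\b", re.IGNORECASE)),
]


def classify_fact(sentence):
    """Return the first fact type whose cue pattern matches, or None."""
    for fact_type, pattern in RULES:
        if pattern.search(sentence):
            return fact_type
    return None


if __name__ == "__main__":
    for sentence in ("I hate this book.",
                     "I never received the book.",
                     "Please send me a new book."):
        print(classify_fact(sentence), "<-", sentence)
```

In all three sentences the TOPIC sub entity would be "book"; picking that out needs the grammatical parsing described earlier, which this sketch does not attempt.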

How it Works: Sub-classify Sentiment Entities

Voice of the customer domain fact extraction

• Strong Positive Sentiment – expression of a strongly positive opinion. Cues: great, excellent, love, etc. Example: Barbara loves SAP.
• Weak Positive Sentiment – expression of a weakly positive opinion. Cues: good, nice feature, fine, like, etc. Example: I like BusinessObjects.
• Neutral Sentiment – expression of an opinion neither positive nor negative. Cues: ok, acceptable, can live with, etc. Example: I’m ok with respect to X’s latest product offerings.
• Weak Negative Sentiment – expression of a weakly negative opinion. Cues: bad, don’t like, etc. Example: I don’t enjoy working with company X.
• Strong Negative Sentiment – expression of a strongly negative opinion. Cues: hate, horrible, terrible, unusable, etc. Example: Their office suite is horrible.
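The five sentiment levels can be approximated with a cue-word lookup, sketched below in Python. The cue lists mostly come from the slide, with "loves" and "don't enjoy" added so the slide's own examples match. The label names and the order in which levels are checked are assumptions. A keyword lookup has no notion of grammar, so treat this as an illustration of the classification only.

```python
import re

# Levels are checked in order; negative phrases come first so that
# "don't like" is not mistaken for the weak positive cue "like".
LEVELS = [
    ("StrongNegativeSentiment", ["hate", "horrible", "terrible", "unusable"]),
    ("WeakNegativeSentiment", ["don't like", "don't enjoy", "bad"]),
    ("StrongPositiveSentiment", ["great", "excellent", "love", "loves"]),
    ("WeakPositiveSentiment", ["good", "nice", "fine", "like"]),
    ("NeutralSentiment", ["ok", "acceptable", "can live with"]),
]


def sentiment_level(sentence):
    """Return the first level with a whole-word cue in the sentence, or None."""
    # Normalize typographic apostrophes so "don’t" matches "don't".
    text = sentence.lower().replace("\u2019", "'")
    for label, cues in LEVELS:
        for cue in cues:
            if re.search(r"\b" + re.escape(cue) + r"\b", text):
                return label
    return None


if __name__ == "__main__":
    for sentence in ("Barbara loves SAP.",
                     "I like BusinessObjects.",
                     "Their office suite is horrible."):
        print(sentiment_level(sentence), "<-", sentence)
```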

How it Works: Identify problems or requests

Voice of the customer domain fact extraction

• Major Problem – expression describes an impediment with no work around. Cues: crashes, fails, etc. Example: Your database installer crashed my computer.
• Minor Problem – expression describes an impediment with work around. Cues: reboot, slows down, etc. Example: Running X in the background seems to slow down my computer.
• General Request – request for an enhancement to an existing product or service. Cues: would like, please create, etc. Example: I would like a product that will handle my SQL data. Cues: please make x do y, would like, etc. Example: I would like to have an XI plugin for Excel.
• Contact Request – request for direct and immediate contact. Examples: Send me information on Text Data Processing. Call me now at 555-1212.
• Contact Info – phone numbers or e-mails associated with a contact request. Example: Call me now at 555-1212.

How it Works: Identify profanity or emoticons

Voice of the customer domain fact extraction

Profanity
• Ambiguous: words and phrases that are pejorative only in certain contexts. Example: Those hooligans threw toilet paper on my lawn.
• Unambiguous: words and phrases that are always pejorative. Example: I cannot express how angry I am with this asshole.

Emoticons
• Weak Positive: extracts emoticons conveying weak positive sentiment. Example: Loving my new BlackBerry! No iPhone needed over here.
• Strong Positive: extracts emoticons conveying strong positive sentiment. Example: The show was hilarious :-D
• Weak Negative: extracts emoticons conveying weak negative sentiment. Example: I hate this phone I'm using :-(
• Strong Negative: extracts emoticons conveying strong negative sentiment. Example: The Dow Jones fell 200 points :-(((
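The emoticon rows can be sketched as a small Python classifier. The mapping is an assumption read off the three emoticons that survive in the examples: `:-D` as strong positive, `:-(` as weak negative, and a repeated mouth such as `:-(((` as strong. The real extraction rules may differ.

```python
import re

# Eyes (":" or ";"), an optional nose, then one or more mouth characters.
EMOTICON = re.compile(r"[:;]-?(?P<mouth>[D)(]+)")


def emoticon_sentiment(text):
    """Classify the first emoticon found in the text, or return None."""
    match = EMOTICON.search(text)
    if not match:
        return None
    mouth = match.group("mouth")
    polarity = "Negative" if mouth[0] == "(" else "Positive"
    # Assumption: a "D" mouth or a repeated mouth character signals strength.
    strong = mouth[0] == "D" or len(mouth) > 1
    return ("Strong" if strong else "Weak") + polarity


if __name__ == "__main__":
    for text in ("The show was hilarious :-D",
                 "I hate this phone I'm using :-(",
                 "The Dow Jones fell 200 points :-((("):
        print(emoticon_sentiment(text), "<-", text)
```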

Step 3: Design Models with SAP HANA Studio

• Use predefined information models
• Take advantage of standard SAP CRM integration with campaigns, promotions, and service entries
• Mash-up of correlation analysis between campaigns, promotions, and service data with text data

Step 4: Create Insight to Action Views

• Sentiment Cockpit – overview dashboard
• SAP BusinessObjects Explorer – all categories on mobile (Apple iPad)
• Exploration view – sentiment details, including geo-locations

Sentiment Analysis Information Views

Key Screens
• SAP HANA – UI and tool kit (HTML5)
• Exploration Views in SAP BusinessObjects Explorer (optional)
• SAP Lumira (Visual Intelligence) (optional)

Implementation

Testing

Key User Training

Successful rollout and adoption

Configuration documentation

Rapid Deployment of SAP HANA Sentiment Intelligence with estimated project duration of 5 to 7 weeks

Phases: 1 Start, 2 Deploy, 3 Run

Expectations

Project management

Kick-off workshop participation

Preparing technical infrastructure

Mutually-approved scope document

Working SAP systems

User-acceptance testing

Onsite and remote support

Superior support to ensure smooth functioning

Note: This slide represents a typical deployment. Exact details may differ according to solution.

Results

Customer Examples

Mantis Technology Group – Internet

Industry: Software solution provider specializing in enterprise custom services for online retailers & high transaction volume provision systems

Product: Pulse Analytics – Social Media Analytics by SAP HANA One (Cloud)

Business Challenges

• Offer rapid analysis of social media channels to track consumers and influencers and measure brand against industry metrics
• Scale social media analytics service offering to handle ever increasing volumes of data cost-effectively

Technical Challenges

Reduce the ETL load times to deliver real-time analysis

Analyze large volumes of social media data – more than 1M documents daily

Lower cost of managing cluster of 18 Text Analysis XI and 3 MySQL servers

Benefits

• New real-time analytical capabilities allow for visual presentation of data that is free from previous performance-based constraints
• Faster natural-language-based sentiment analysis with topic identification
• Data Architecture simplification by replacing 20+ separate servers with 1 instance of SAP HANA One

• Significant Simplification: Moved from 23 servers to 1 SAP HANA One server
• 99% reduced ETL times
• 6x faster Text analysis processing

“We can get close to an order of magnitude improvement in performance, additional headroom, access to new practical capabilities (as a result of the performance improvements) AND… still save money!”

Doug Turner, CEO of Mantis Technology Group

M8kng Snse of Txt Msgs

Text Analysis

Challenge: Automatically process customer requests from email

Solution: Use text data processing to classify unstructured text, extract the essential information, analyze, and take action

Benefits:
• Faster and more accurate response to customers’ problems and requests
• Increase customer satisfaction and reduce call center costs

Extracting Real-Time Information from Text

Text Analysis

Challenge: Deliver personalized insight to financial professionals in real-time

Solution: Use text analysis to extract relevant data from news feeds & combine with financial data

Benefit: Faster and more accurate decision making ahead of the market

Thank You!

SAP Big Data Webinar Series

Presented by: Marie Goodell, SAP - [email protected]
