Big Data - Introduction to SAP Big Data Technologies
Big Data - Streaming Analytics
Big Data - Smarter Data Virtualization
Big Data - Gain New Insight from Hadoop
Big Data - Spatial Data Processing for Richer Insights
Big Data - Text Analytics
SAP Big Data Webinar Series
© 2013 SAP AG. All rights reserved.
Speaker Introductions
Marie Goodell is a Senior Director on the Big Data Solution
Marketing team at SAP. She specializes in Big Data topics
such as text analytics, spatial analytics, and Hadoop. In this
role, she is engaged with customers and partners on Big Data
projects that deliver unprecedented insight for real-time
business. Prior to SAP, Marie worked in various marketing
leadership roles at Adobe, Oracle and IBM. Marie holds an MS
from the University of Minnesota. Follow Marie at @mhgoodell
on Twitter.
Presented by: Marie Goodell, SAP
Gain Unprecedented Insight with Text Analytics
There is no doubt that the amount of
Big Data created, collected &
managed grows more each day
Traditional content types, including
unstructured data, are growing by
up to 80% per year.
What if you could combine
business data with unstructured content….
Web content
Call centers
Social networks
Data streams
Unified Information Access from SAP
Gaining insight from text sources
Unified
Information
Access
Platform
Integrate large volumes of structured,
semi-structured and unstructured data
Process information in real-time for
rapid analysis and decision making
Combine search, text analysis, and
database processing for ad hoc analysis
Offer traditional business intelligence
(e.g. exploration, dashboards,
visualizations, reports)
SAP HANA
Real-time Replication Services
Data Services
In-Memory Database
Planning and Calculation Engine
R and Hadoop Integration
Predictive Analysis & Business Function Libraries
Information Composer and Modeling Studio
Spatial Processing
Text Search & Text Analysis
SAP HANA
Platform for a new class of real-time analytics & applications
Real-time Applications
Real-time Analytics
Real-time Platform
Operational Reporting
Core Business Acceleration
Database
Data Warehousing
Planning and Optimization
Mobile
Predictive & Text Analytics on Big Data
Sensing and Response
Cloud
SAP HANA Search
Search structured and unstructured text in single platform
Native search: one infrastructure for both unstructured content and structured data (OLAP or OLTP)
Full-text, fuzzy, free-style
Graphical modeling: easy-to-use search definition
Built into existing tools
Information Access toolkit: rapid development of search-based applications through reusable UI building blocks
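As a rough illustration of what "fuzzy" search means in the list above, the following Python sketch ranks stored terms by string similarity to a misspelled query. It uses the standard library's difflib, not anything from SAP HANA; the sample terms, the function name `fuzzy_search`, and the 0.6 cutoff are assumptions made for the example.

```python
from difflib import get_close_matches

# Hypothetical indexed terms; in practice these would come from a search index.
INDEXED_TERMS = ["washer", "dishwasher", "dispenser", "pressure switch", "door lock"]

def fuzzy_search(query, terms=INDEXED_TERMS, cutoff=0.6, limit=3):
    """Return up to `limit` terms whose similarity to `query` is at least `cutoff`,
    best match first."""
    return get_close_matches(query.lower(), terms, n=limit, cutoff=cutoff)

if __name__ == "__main__":
    print(fuzzy_search("dispensor"))
```

Raising the cutoff makes the search stricter; lowering it tolerates more typos at the price of more false hits.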
Capabilities: SAP HANA Info Access includes:
UI & Client Library Toolkit
Application
SAP Services
Allows for visualization and interaction of data in SAP HANA (full-text searching, filtering, drill-down, charts)
Supports iPad (download from Apple App Store) and Browser (HTML5) deployments
Consumes SAP HANA models
NOT a general purpose BI tool!
Benefits: Quick development and deployment time
Low TCO and fast response times with 2-tier architecture
Included with SAP HANA license
SAP HANA Info Access
Configuration toolkit for search-based applications
SAP HANA Text Analysis
Unlock key information from text to drive business insight
Acquire Unstructured Text
Once structured it can be…
• Integrated
• Queried
• Analyzed
• Visualized
• Reported against
1. Extract meaning
2. Transform into structured data for analysis
3. Cleanse and match
SAP HANA
Unlocks key information from
text sources to drive business
insight
SAP HANA Text Analysis
Filter files, extract meaning, structure, and analyze
File Filtering: Unlock text from binary documents
Extract & process unstructured text data from popular file formats (txt, html, xml, pdf, doc, ppt, xls, rtf, msg, etc.)
Native Text Analysis: Expose linguistic markup for text mining uses
Classify core entities (people, companies, things, etc.)
Identify domain facts (sentiments, topics, requests, etc.)
Supports up to 31 languages for linguistic mark-up and extraction dictionary and 13 languages for predefined core extractions
Transform to Structured Data: Query / Analyze / Visualize / Report
Core and Domain Extraction
Extract 2 sorts of elements from text
Core Entities: Davey Jones was one of the Monkeys.
<PERSON> Davey Jones </PERSON> was
one of the Monkeys.
Domain Facts: I love your product.
I <STRONGPOSITIVESENTIMENT> love
</STRONGPOSITIVESENTIMENT><TOPIC> your product </TOPIC>.
This is not a keyword search. Text Data
Processing applies full linguistic and statistical
techniques to make sure the entities which
get returned are correct.
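To make the step from annotated text to structured data concrete, here is a minimal Python sketch that turns inline markup like the examples above into rows. The tag layout simply mirrors the slide's illustration and is not an actual SAP output format; the function name `markup_to_rows` is hypothetical. The sketch covers only the final structuring step, not the linguistic analysis that produces the markup.

```python
import re

# Matches <TYPE> text </TYPE>; the layout follows the slide's illustration only.
TAG_PATTERN = re.compile(r"<(?P<type>[A-Z_@]+)>\s*(?P<text>.*?)\s*</(?P=type)>")

def markup_to_rows(annotated):
    """Turn inline entity markup into (entity_type, text) rows."""
    return [(m.group("type"), m.group("text")) for m in TAG_PATTERN.finditer(annotated)]

if __name__ == "__main__":
    sample = ("I <STRONGPOSITIVESENTIMENT> love </STRONGPOSITIVESENTIMENT>"
              "<TOPIC> your product </TOPIC>.")
    for row in markup_to_rows(sample):
        print(row)
```

Once the entities are rows, they can be queried, aggregated, and joined with business data like any other table.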
Grammatical Parsing
• Can we bill you?
• Bill was the president.
• I talked to Bill yesterday.
• The duck has a bill.
• The bill was signed into law.
Predefined Entity Types
Examples
Location
  ADDRESS1                      245 First Street Floor 16
  ADDRESS2                      Cambridge, MA 02142
  LOCALITY                      Cambridge
  REGION@MINOR                  Napa County
  REGION@MAJOR                  Connecticut
  COUNTRY                       Brazil
  CONTINENT                     South America
  GEO_FEATURE                   Mount Fuji
  GEO_AREA                      Scandinavia
Personal Data
  NAME_DESIGNATOR               c/o, attn
  TITLE                         President
  PERSON                        Barack Obama
  PEOPLE                        Greeks
  LANGUAGE                      Greek
Organization
  ORGANIZATION@COMMERCIAL       AT&T
  ORGANIZATION@EDUCATIONAL      University of Washington
  ORGANIZATION@OTHER            FBI
  PRODUCT                       iPhone
  TICKER                        NYSE:SAP
Social Media
  SOCIAL_MEDIA@TWITTER_ID       @SAP
  SOCIAL_MEDIA@TWITTER_TOPIC    #HANA
Time
  DATE                          2/14/2013
  DAY                           Monday
  MONTH                         June
  YEAR                          2013
  TIME                          3:47pm
  TIME_PERIOD                   3 days, from 9 to 5pm
  HOLIDAY                       Memorial Day
Format
  CURRENCY                      17 euros
  MEASURE                       217 meters
  PERCENT                       4%
  PHONE                         617-677-2030
  NIN@US_SSN                    522-89-2255
  NIN@FR_INSEE                  xxx
  NIN@CA_SIN                    xxx
  URI@EMAIL                     [email protected]
  URI@IP                        165.14.2.0
  URI@URL                       http://sap.com
Syntactic Entities
  NOUN_GROUP                    big umbrella
  PROP_MISC                     Cup o’ Soup
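A few of the pattern-like entity types listed here (PHONE, PERCENT, URI@URL, and the Twitter types) can be approximated with regular expressions. The Python sketch below is only an illustration under that assumption: the patterns are deliberately simplified, the function name `extract_entities` is hypothetical, and the other predefined types (people, organizations, locations) need the linguistic processing described earlier rather than pattern matching.

```python
import re

# Simplified, assumed patterns for a handful of the pattern-like entity types.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "PERCENT": re.compile(r"\b\d+(?:\.\d+)?%"),
    "URI@URL": re.compile(r"https?://[^\s,;]+"),
    "SOCIAL_MEDIA@TWITTER_ID": re.compile(r"(?<!\w)@\w+"),
    "SOCIAL_MEDIA@TWITTER_TOPIC": re.compile(r"(?<!\w)#\w+"),
}

def extract_entities(text):
    """Return (entity_type, matched_text) pairs, grouped in pattern order."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        found.extend((entity_type, m.group(0)) for m in pattern.finditer(text))
    return found

if __name__ == "__main__":
    text = ("Call 617-677-2030 or see http://sap.com and follow @SAP "
            "for #HANA news; usage grew 4% today.")
    for entity in extract_entities(text):
        print(entity)
```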
Building a Search-Based Application with SAP HANA
Text search and/or text analysis
Configure App
Use SAP HANA Info Access toolkit to define layout and data for the App
Create Model
Use SAP HANA Studio to define the search data model and configure the search behavior
Run Text Analysis
Extract salient information from text (Linguistic Markup, Entity & Sentiment Extraction)
Create Full-text Index
Use SAP HANA Studio to create full-text indexes for search, file filtering, and optionally run Text Analysis
Consume Data
Search on Text and/or filter, analyze, and perform advanced analytics on Text Analysis table output
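The idea behind a full-text index can be sketched in a few lines of Python: map every token to the documents that contain it, then answer queries by intersecting those sets. This is a toy in-memory inverted index written for illustration, not how SAP HANA implements its full-text indexes; the function names and sample documents are assumptions.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase a document and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every token of the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

if __name__ == "__main__":
    docs = {
        1: "Recommend replacing dispenser switch.",
        2: "Doors do not seal; unit has an air leak.",
        3: "Check the pressure switch and door lock wiring.",
    }
    index = build_index(docs)
    print(search(index, "switch"))
    print(search(index, "door lock"))
```

Because the index is built once and queried many times, lookups stay fast even as the number of documents grows.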
SAP HANA Search and Text Analysis
Benefits
For the Business For IT
Exploit Unstructured Data
Ability to extract and analyze information
from unstructured content
Flexibility
Perform text search, text analysis, and
analytics all in one unified platform
Faster Time to Analysis
Achieve faster search and analysis results
by leveraging a high-performing in-memory
platform
Landscape Simplification
Reduces redundant data persistency,
engines, and data movement
Total Cost of Development
One unified platform and model for text
search, text analysis, and analytics
Unified Access Layer
Quickly develop and connect applications to
search and explore data in SAP HANA
SAP Data Services
Text data processing architecture
SAP Data Services
Text Data Processing
Entity extraction
SAP Data Services Designer
TDP Job Set up
Sources
Semantic Layer for
query & analytics
End User Apps or
dashboards ETL Designer
Business User
Entities,
concepts,
sentiments
Data Quality
transforms
Targets
Execute Text Data Processing to Mine Sentiment
Extract valuable data from Hadoop without coding
SAP Data Services
1. Detects the Hadoop data source and pushes down
text analytics query
2. Generates Pig script which is sent
to Hadoop
Apache Hadoop Distribution
3. Initiates MapReduce job to execute text data
processing in Hadoop
4. Starts MapReduce sub-tasks on
nodes where data resides
SAP Data Services
5. Extracts relevant text data
6. Rapidly loads to SAP HANA
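The push-down idea in steps 3 to 6 (process the text where it resides, move only the small results) can be sketched in plain Python. This is not Pig, Hadoop, or SAP Data Services code; the keyword sets stand in for real text data processing, and all names are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical keyword sets standing in for real text data processing.
POSITIVE = {"love", "great", "excellent"}
NEGATIVE = {"hate", "horrible", "terrible"}

def map_partition(lines):
    """Map step: run extraction where the data lives and emit only small counts."""
    counts = Counter()
    for line in lines:
        words = set(line.lower().replace(".", " ").split())
        if words & POSITIVE:
            counts["positive"] += 1
        if words & NEGATIVE:
            counts["negative"] += 1
    return counts

def reduce_counts(partials):
    """Reduce step: merge per-node counts into one result ready for loading."""
    return reduce(lambda left, right: left + right, partials, Counter())

if __name__ == "__main__":
    partitions = [
        ["I love this washer.", "The dryer is horrible."],
        ["Great service.", "I hate the noise.", "Delivery arrived Monday."],
    ]
    print(dict(reduce_counts(map_partition(p) for p in partitions)))
```

Only the merged counts leave the cluster in this sketch, which is the point of pushing the processing down: the raw text never has to be moved.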
[Diagram: numbered flow (steps 1 to 6) between SAP Data Services (Pig generator), Apache Hadoop (MapReduce), and SAP HANA]
Text data processing
Integrate Structured and Unstructured Fields
SAP Data Services (Data Integration)
Input models and serial
numbers (entity extraction)
Integrate Structured and Unstructured Fields (continued)
SAP Data Services (Text Data Processing)
Consumer doesn’t like the gentle action of thew washer, she feels that it should be slower. Tech
explained the operation of the gentle mode, consumer is still not happy and feels STS is falsily
advertisting the product. The washer is operating as designed, gentle mode is the same speed as
normal, however; in the gentle mode the wash time is shorter. A new washer will not solve the issue.
tech stated the unit appears locked but is not. advise to ck or replace wiring to door lock ass'y and ck
the pressure switch.repairable
tech stated while putting on the doors the frt of unit on frzr side is dented in at bottom frame and
doors do not seal. this unit has an air leakage permanent. nonrepairable.
Solenode chattering. Recommend replacing dispenser switch WR23X366.
Identify and Extract Concepts
SAP Data Services (Text Data Processing)
Concepts extracted
from full text
description
Aggregate Concepts In Output
SAP Data Services (Data Quality Management)
Match aggregates
concepts based on
similarity into groups
that can be used for
reporting
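Grouping concepts by similarity can be illustrated with a small greedy routine in Python. It is a sketch using the standard library's difflib, not the matching logic of SAP Data Services; the 0.8 threshold, the function names, and the sample concepts are assumptions.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def group_concepts(concepts, threshold=0.8):
    """Greedy grouping: attach each concept to the first group whose
    representative (its first member) is at least `threshold` similar."""
    groups = []
    for concept in concepts:
        for group in groups:
            if similarity(concept, group[0]) >= threshold:
                group.append(concept)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([concept])
    return groups

if __name__ == "__main__":
    extracted = ["door lock", "door locks", "Door Lock",
                 "pressure switch", "presure switch"]
    for group in group_concepts(extracted):
        print(group)
```

Each group can then be reported under a single label, so spelling variants of the same part no longer fragment the counts.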
When to Use…
SAP HANA (Text Analytics) vs SAP Data Services (TDP)
If you want to… | Text Analysis in HANA | Text Data Processing in Data Services
• Load data into HANA using SAP SLT or a 3rd party ETL tool; then analyze textual data using text analysis capabilities in HANA
• Leverage native search capabilities in HANA in conjunction with text analytics (e.g. search-based and text mining applications for investigative discovery)
• Have HANA automatically re-index frequent changes to text analysis processes (without having to re-load the data)
• Access linguistic markup generated when text is processed, which is persisted in HANA (e.g. tokenization, uninflected forms / stemming, part of speech)
• Have high-performing text analytics in HANA
• Not load, store, or process the unstructured text data or documents in HANA (because of cost / space concerns)
• Perform text analytics at the source (e.g. push TDP natively down into Hadoop) to uncover relevant nuggets of info that can be loaded into HANA
• Perform transformations before loading data into HANA (e.g. cleanse, match / de-duplicate and enrich text data)
• Utilize your own custom dictionaries and rules
• Support continuous text analytics workloads that are submitted regularly
• Expose text analytics as real-time Web Service
Analyze Customer Sentiment
Improve satisfaction and competitiveness
Sentiments/ Opinions
Product perceptions
Buying experience
Service quality
Requests Trends Issues/ Problems
Topics/ Contexts
Gain unvarnished insights and direct touch with your most
vocal customers regarding:
Tap the full potential of unstructured data and
social media
Source: http://www.pinnaclecart.com/blog/2012/12/21/6-ways-to-make-social-
media-work-for-your-ecommerce-business
SAP Sentiment Intelligence rapid deployment solution
Accelerate insight from unstructured data
Pre-configured data acquisition from
public social media and other
unstructured text sources, automated text
data processing (NLP)
Integration with SAP CRM campaign /
promotions and service management
SAP HANA HTML5 Information Access
(InA), SAP BusinessObjects Explorer
Views, SAP LUMIRA, Mobile for prebuilt
analytical reporting and action taking,
extraction routines, transformations,
loading, universes, and analytics, based on
pre-defined SAP HANA models
How-to guides and additional service
offering to extend unstructured channels
integration and analytical reporting
From (un)structured data… …to insight!
SAP HANA Info Access, SAP BusinessObjects BI,
SAP Lumira, Mobile
SAP Data Services
SAP HANA with text analysis and models &
views
SAP HANA
Sentiment
Intelligence
Step 1: Acquire Unstructured Data
SAP Data Services: Extraction, transformation,
and loading (ETL) data flow
Designer
Job server
Repository
SAP Data Services Designer
Online source’s API calls with configurable
search parameters
Data Flow with user-defined transforms via
Python
Power user configuration environment
Job Schedule for real time data loading
SAP HANA Text Analysis : System view
Core entity and fact domain extraction
Predefined core entities (who, what,
when, where, etc.)
Customization via dictionaries and rules
Natural language processing (NLP)
Named entity recognition (NER)
Leveraging of the voice of the customer
domain rule set
Sentiment status augmentation on a
detailed entity level
Step 2: Analyze Text within SAP HANA
Leverage voice of customer domain extraction
The following major fact types are classified:
Sentiments: expression of a customer’s feelings about something
Problems: a statement about something which impedes a customer’s work
Requests: expression of a customer’s desire for an enhancement/change
Profanity: defines a set of pejorative vocabulary
Emoticons: expression of someone's feelings about the whole sentence or situation
Within each of these rules certain sub entities are classified. Any rule may have an associated
TOPIC sub entity which, in addition to the sub entities described on the following slides, describes
the person, service, product, etc. which the sentiment, problem, or request is about.
I hate this book.
I never received the book.
Please send me a new book.
How it Works: Extract & Classify Sentiment
Voice of the customer domain fact extraction
Strong Positive Sentiment – expression of a strongly positive opinion (great, excellent, love, etc.). Example: Barbara loves SAP.
Weak Positive Sentiment – expression of a weakly positive opinion (good, nice feature, fine, like, etc.). Example: I like BusinessObjects.
Neutral Sentiment – expression of an opinion neither positive nor negative (ok, acceptable, can live with, etc.). Example: I’m ok with respect to X’s latest product offerings.
Weak Negative Sentiment – expression of a weakly negative opinion (bad, don’t like, etc.). Example: I don’t enjoy working with company X.
Strong Negative Sentiment – expression of a strongly negative opinion (hate, horrible, terrible, unusable, etc.). Example: Their office suite is horrible.
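The five sentiment levels can be mimicked with a small keyword lexicon, which helps show what the classification output looks like. The Python sketch below uses only the trigger words quoted on the slide, slightly simplified; it is a toy under that assumption and not the voice-of-the-customer rule set, which relies on linguistic analysis rather than keyword lookup. The names `LEXICON`, `RULES`, and `classify` are hypothetical.

```python
import re

# Keyword lists taken from the slide's examples; real rule sets are far richer.
LEXICON = {
    "Strong Positive Sentiment": ["great", "excellent", "love"],
    "Weak Positive Sentiment": ["good", "nice", "fine", "like"],
    "Neutral Sentiment": ["ok", "acceptable", "can live with"],
    "Weak Negative Sentiment": ["bad", "don't like"],
    "Strong Negative Sentiment": ["hate", "horrible", "terrible", "unusable"],
}

# Longest phrases first so that "don't like" is tested before "like".
RULES = sorted(
    ((phrase, label) for label, phrases in LEXICON.items() for phrase in phrases),
    key=lambda rule: len(rule[0]),
    reverse=True,
)

def classify(sentence):
    """Return the sentiment label of the first matching phrase, or None."""
    text = sentence.lower().replace("\u2019", "'")
    for phrase, label in RULES:
        # Optional trailing "s" lets "love" also match "loves".
        if re.search(r"\b" + re.escape(phrase) + r"s?\b", text):
            return label
    return None

if __name__ == "__main__":
    for s in ["Barbara loves SAP.", "I like BusinessObjects.",
              "Their office suite is horrible."]:
        print(s, "->", classify(s))
```

A keyword lookup like this misses negation and paraphrase (for instance "I don't enjoy working with company X" matches nothing here), which is exactly the gap that linguistic rules are meant to close.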
How it Works: Sub-classify Sentiment Entities
Voice of the customer domain fact extraction
Major Problem - expression describes an impediment with no workaround (crashes, fails, etc.). Example: Your database installer crashed my computer.
Minor Problem - expression describes an impediment with a workaround (reboot, slows down, etc.). Example: Running X in the background seems to slow down my computer.
General Request - request for an enhancement to an existing product or service (would like, please create, etc.). Example: I would like a product that will handle my SQL data.
please make x do y, would like, etc.: I would like to have an XI plugin for Excel.
Contact Request - request for direct and immediate contact. Examples: Send me information on Text Data Processing. Call me now at 555-1212.
Contact Info - phone numbers or e-mails associated with a contact request. Example: Call me now at 555-1212.
How it Works: Identify problems or requests
Voice of the customer domain fact extraction
Ambiguous: words and phrases that are pejorative only in certain contexts. Example: Those hooligans threw toilet paper on my lawn.
Unambiguous: words and phrases that are always pejorative. Example: I cannot express how angry I am with this asshole.
Weak Positive: extracts emoticons conveying weak positive sentiment. Example: Loving my new BlackBerry! No iPhone needed over here.
Strong Positive: extracts emoticons conveying strong positive sentiment. Example: The show was hilarious :-D
Weak Negative: extracts emoticons conveying weak negative sentiment. Example: I hate this phone I'm using :-(
Strong Negative: extracts emoticons conveying strong negative sentiment. Example: The Dow Jones fell 200 points :-(((
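The emoticon categories can be approximated with one regular expression. In the Python sketch below, the rule that a "D" mouth or a repeated mouth character means strong sentiment is an assumption inferred from the three emoticons shown in the examples, and the function name `emoticon_sentiment` is hypothetical.

```python
import re

# Colon eyes, optional nose, then a mouth: one or more ")" or "(", or a single "D".
EMOTICON = re.compile(r":-?(\)+|\(+|D)")

def emoticon_sentiment(text):
    """Classify the first emoticon in `text`; return None if there is none."""
    match = EMOTICON.search(text)
    if match is None:
        return None
    mouth = match.group(1)
    polarity = "Negative" if mouth.startswith("(") else "Positive"
    # Assumption: a "D" mouth or a repeated mouth character signals strong sentiment.
    strength = "Strong" if mouth == "D" or len(mouth) > 1 else "Weak"
    return f"{strength} {polarity}"

if __name__ == "__main__":
    for s in ["The show was hilarious :-D",
              "I hate this phone I'm using :-(",
              "The Dow Jones fell 200 points :-((("]:
        print(s, "->", emoticon_sentiment(s))
```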
How it Works: Identify profanity or emoticons
Voice of the customer domain fact extraction
Step 3: Design Models with SAP HANA Studio
Use predefined information
models
Take advantage of standard SAP
CRM integration with campaigns,
promotions, and service entries
Mash-up of correlation analysis
between campaigns, promotions,
and service data with text data
Step 4: Create Insight to Action Views
Sentiment Cockpit – overview dashboard
SAP BusinessObjects Explorer – all
categories on mobile (Apple iPad)
Exploration view – sentiment details,
including geo-locations
Sentiment Analysis
Information Views
Key Screens
SAP HANA – UI and tool kit (HTML5)
Exploration Views in SAP BusinessObjects
Explorer (optional)
SAP Lumira (Visual Intelligence) (optional)
Implementation
Testing
Key User Training
Successful rollout and adoption
Configuration documentation
Rapid Deployment of SAP HANA Sentiment Intelligence
with estimated project duration of 5 to 7 weeks
Start Run Deploy
Expectations
Project management
Kick-off workshop participation
Preparing technical infrastructure
Mutually-approved scope document
Working SAP systems
User-acceptance testing
Onsite and remote support
Superior support to ensure smooth
functioning
Note: This slide represents a typical deployment. Exact details may differ according to solution.
Results
Mantis Technology Group – Internet
Industry: Software solution provider specializing in enterprise custom services for online retailers & high transaction volume provision systems
Product: Pulse Analytics – Social Media Analytics by SAP HANA One (Cloud)
Business Challenges
Offer rapid analysis of social media channels to track consumers and influencers and measure
brand against industry metrics
Scale social media analytics service offering to handle ever increasing volumes of data cost-effectively
Technical Challenges
Reduce the ETL load times to deliver real-time analysis
Analyze large volumes of social media data – more than 1M documents daily
Lower cost of managing cluster of 18 Text Analysis XI and 3 MySQL servers
Benefits
New real-time analytical capabilities allow for visual presentation of data that is free from previous
performance-based constraints
Faster natural-language-based sentiment analysis with topic identification
Data architecture simplification by replacing 20+ separate servers with 1 instance of SAP HANA One
Significant simplification: moved from 23 servers to 1 SAP HANA One server
99% reduced ETL times
6x faster text analysis processing
“We can get close to an order of magnitude improvement in performance, additional headroom, access to new practical capabilities (as a result of the performance improvements) AND… still save money!”
Doug Turner, CEO of Mantis Technology Group
M8kng Snse of Txt Msgs
Text Analysis
Challenge: Automatically process
customer requests from email
Solution: Use text data processing to
classify unstructured text, extract the
essential information, analyze, and
take action
Benefits:
Faster and more accurate response to
customers’ problems and requests
Increase customer satisfaction and reduce
call center costs
Extracting Real-Time Information from Text
Text Analysis
Challenge: Deliver personalized
insight to financial professionals in
real-time
Solution: Use text analysis to
extract relevant data from news
feeds & combine with financial
data
Benefit: Faster and more
accurate decision making ahead
of the market
Thank You!
Presented by: Marie Goodell, SAP - [email protected]