Stacy Faught
IT Business Solutions
11/18/2016
Introduction
GOAL OF THIS PRESENTATION: Define text analytics capabilities to build the business case for creating the service.
Ontology Of Text Analytics BUZZ Words
[Diagram: how the text analytics buzzwords relate to one another]
Terms: Text Analytics, Text Mining, Big Data, Linguistics, Machine Learning, NLP, Faceted Search, Taxonomy, Thesaurus, Ontology, Semantic Network, Syntax, Semantics, Morphology, Disambiguation, Entity Extraction, Sentiment
Relationship labels: associated with, relies upon, sub-discipline of, works together with, technology tools for, enables, synonymous with, more complex, used to build
Beginnings
Text analytics leverages and learns from massive quantities of textual data to reveal customer intentions and sentiment…
Text Analytics World
Gaining Interest
Text Analytics is the process of deriving information from text sources.
Gartner
Text Analytics is a method for making unstructured content useful and accessible.
Expert System
It’s All About the CONCEPT
Current State - Keyword Searching
All these documents contain the keywords “big cat oil discoveries”. Read ALL the documents to find the ones relevant to you.
Keyword Search Not Always Enough
Advantages
Speculative searching (e.g., Where are the best tacos in Houston?)
Finding general information (e.g., the address of Access Sciences’ website)
Disadvantages
Large result sets mean not enough time to read all documents
Noisy and irrelevant hits have to be filtered out
Narrowing the question may mean missing a key result
Have to type in all variants of a term
e.g., significant oil discovery, large oil find, >200M barrels
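The "type in all variants" problem can be illustrated with a minimal sketch. The three-document corpus and the function names `keyword_search` and `variant_search` are assumptions made up for this example; only the three variant phrases come from the slide.

```python
# Hypothetical three-document corpus; the wording is illustrative only.
DOCS = {
    "doc1": "Operator reports a significant oil discovery offshore.",
    "doc2": "Explorer confirms a large oil find in the basin.",
    "doc3": "New field is estimated to hold >200M barrels.",
}


def keyword_search(query, docs):
    """Return ids of documents containing the exact phrase (case-insensitive)."""
    q = query.lower()
    return sorted(doc_id for doc_id, text in docs.items() if q in text.lower())


def variant_search(variants, docs):
    """Return ids of documents matching any of the hand-typed variants."""
    hits = set()
    for variant in variants:
        hits.update(keyword_search(variant, docs))
    return sorted(hits)


# One phrasing finds only the document that happens to use it.
single = keyword_search("significant oil discovery", DOCS)

# The searcher must enumerate every variant to find the rest.
all_variants = variant_search(
    ["significant oil discovery", "large oil find", ">200M barrels"], DOCS
)

print(single)
print(all_variants)
```

The second call only succeeds because the searcher already knew every phrasing in advance, which is the burden text analytics is meant to remove.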
What Text Analytics Does
TYPICAL SEARCH
INDEX
Text Analytics occurs here
Analyzes content & extracts meaningful metadata
Entities
Themes
Sentiment
SMARTER Searching
How it works
Concept: big cat
Definition: Carnivorous mammal
Child: Jaguar
Relationship: Wild
Synonym: Feline
Constraint: Not domestic
Parent: Mammal
Concept: big cat
Definition: Caterpillar machinery
Child: Drilling equipment
Relationship: Mining
Synonym: Heavy equipment
Constraint: Not Hitachi
Parent: Equipment
Concept: big cat
Definition: Large oil discovery
Child: Shale big cat
Relationship: Elephant
Synonym: Significant
Constraint: >200Mboe
Parent: Oil discovery
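One way to picture the three "big cat" concepts is as records that share a label but carry different context. The sketch below is an illustration, not how any particular product works: the `Concept` class, the `cues` sets, and the word-overlap scoring in `disambiguate` are all assumptions chosen for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    label: str
    definition: str
    parent: str
    child: str
    synonym: str
    constraint: str
    # Assumption: hand-picked context words that hint at this sense.
    cues: set = field(default_factory=set)


SENSES = [
    Concept("big cat", "Carnivorous mammal", "Mammal", "Jaguar", "Feline",
            "Not domestic", {"jaguar", "wild", "feline", "mammal"}),
    Concept("big cat", "Caterpillar machinery", "Equipment", "Drilling equipment",
            "Heavy equipment", "Not Hitachi",
            {"caterpillar", "drilling", "mining", "equipment", "machinery"}),
    Concept("big cat", "Large oil discovery", "Oil discovery", "Shale big cat",
            "Significant", ">200Mboe",
            {"oil", "discovery", "barrels", "shale", "elephant"}),
]


def disambiguate(text, senses):
    """Pick the sense whose cue words overlap most with the text (ties: first wins)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return max(senses, key=lambda sense: len(sense.cues & words))


oil_sense = disambiguate(
    "Total has confirmed just one big cat with more than 200 million barrels.",
    SENSES,
)
animal_sense = disambiguate("A big cat was seen in the wild.", SENSES)

print(oil_sense.definition)
print(animal_sense.definition)
```

A real system would draw its cues from the taxonomy relationships rather than a hand-written set, but the principle is the same: surrounding words select the sense.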
Interpreting the meaning of text
• Groups words into meaningful units
• Searches for different forms of words (morphology)
• Searches for words with semantic relationships
Sentences
Noun groups: Match entities
Verb groups: Match actions
Morphology: Match different forms
Semantics: Match related meanings
Total has confirmed just one “big cat” -- with more than 200 million barrels -- in Bolivia in May 2011 that extends a 2004 discovery.
Shell has discovered oil on three big cat prospects offshore Nigeria, plus a large gas-condensate field in the Norwegian Sea.
Firm makes major Gatwick oil find.
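The three sentences above never share one keyword for "discovery", yet all describe one. A minimal sketch of morphological and semantic matching follows; the two lookup tables are tiny hand-written assumptions standing in for a real lexicon, and `mentions_concept` is a hypothetical helper name.

```python
import re

SENTENCES = [
    'Total has confirmed just one "big cat" -- with more than 200 million '
    "barrels -- in Bolivia in May 2011 that extends a 2004 discovery.",
    "Shell has discovered oil on three big cat prospects offshore Nigeria, "
    "plus a large gas-condensate field in the Norwegian Sea.",
    "Firm makes major Gatwick oil find.",
]

# Assumption: a toy morphology table mapping word forms to a base form.
MORPHOLOGY = {"discovered": "discovery", "discoveries": "discovery"}
# Assumption: a toy semantic table mapping related words to the same concept.
SEMANTICS = {"find": "discovery"}


def tokens(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def mentions_concept(text, concept):
    """True if any token equals the concept directly, by form, or by meaning."""
    for tok in tokens(text):
        base = MORPHOLOGY.get(tok, tok)   # match different forms
        base = SEMANTICS.get(base, base)  # match related meanings
        if base == concept:
            return True
    return False


keyword_hits = [concept_word in tokens(s) for s in SENTENCES
                for concept_word in ["discovery"]]
concept_hits = [mentions_concept(s, "discovery") for s in SENTENCES]

print(keyword_hits)
print(concept_hits)
```

The plain keyword test finds only the first sentence; the concept test finds all three.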
Automatically Extract Known Entities
• Entity extraction
People
Erle P. Halliburton
Charles Holiday (Shell Chairman) vs. 4th of July Holiday
Organizations
Total S.A. (French oil & gas co.) vs. total (adj. meaning entire)
Saudi Aramco
Royal Dutch Shell
Exxon Mobil
Places
Europe
France
Paris
Monaco
Tyrrhenian Sea
Oceania
Basins, Plays, Fields
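Entity extraction with the "Total S.A. vs. total" style of disambiguation can be sketched with a lookup list plus a context check. Everything here is a simplifying assumption for illustration: the `GAZETTEER` and `AMBIGUOUS` tables, the cue words, and the rule that an ambiguous name counts only when a cue word appears in the same sentence.

```python
import re

# Names assumed to be unambiguous; matched case-sensitively as whole phrases.
GAZETTEER = {
    "Saudi Aramco": "Organization",
    "Royal Dutch Shell": "Organization",
    "Exxon Mobil": "Organization",
    "Erle P. Halliburton": "Person",
    "Tyrrhenian Sea": "Place",
    "Monaco": "Place",
}

# Ambiguous surface forms: (canonical name, type, cue words that confirm it).
AMBIGUOUS = {
    "Total": ("Total S.A.", "Organization", {"oil", "gas", "barrels"}),
    "Holiday": ("Charles Holiday", "Person", {"shell", "chairman"}),
}


def contains(name, sentence):
    """Case-sensitive whole-phrase match."""
    return re.search(r"\b" + re.escape(name) + r"\b", sentence) is not None


def extract_entities(sentence):
    """Return (name, type) pairs found in the sentence."""
    found = [(name, kind) for name, kind in GAZETTEER.items()
             if contains(name, sentence)]
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    for surface, (canonical, kind, cues) in AMBIGUOUS.items():
        # Accept an ambiguous name only when a cue word supports it.
        if contains(surface, sentence) and cues & words:
            found.append((canonical, kind))
    return found


print(extract_entities("Total found oil near the Tyrrhenian Sea."))
print(extract_entities("The total cost was high."))
print(extract_entities("Offices close for the 4th of July Holiday."))
```

Commercial tools use far richer linguistic evidence than a cue-word list; the sketch only shows why context is needed at all.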
Put the Puzzle Pieces Together…
• Faceted search
Concepts: Big Cat, Discoveries, Prospects
Play: Shale, Conventional
Entities
Organizations: Shell, Total
Places: Africa, S. America
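Once concepts and entities are stored as metadata, faceted search reduces to filtering and counting on those fields. In the sketch below the three records, their facet assignments, and the helper names `filter_docs` and `facet_counts` are hypothetical; only the facet values themselves appear on the slide.

```python
from collections import Counter

# Hypothetical indexed documents with extracted metadata as facets.
DOCS = [
    {"id": 1, "concept": "Big Cat", "play": "Shale",
     "organization": "Shell", "place": "Africa"},
    {"id": 2, "concept": "Big Cat", "play": "Conventional",
     "organization": "Total", "place": "S. America"},
    {"id": 3, "concept": "Prospects", "play": "Conventional",
     "organization": "Shell", "place": "Africa"},
]


def filter_docs(docs, **selected):
    """Keep documents whose facets equal every selected value."""
    return [d for d in docs if all(d.get(k) == v for k, v in selected.items())]


def facet_counts(docs, facet):
    """Count how many documents fall under each value of one facet."""
    return Counter(d[facet] for d in docs)


big_cat_docs = filter_docs(DOCS, concept="Big Cat")
by_org = facet_counts(big_cat_docs, "organization")
narrowed = filter_docs(DOCS, concept="Big Cat", place="Africa")

print([d["id"] for d in big_cat_docs])
print(dict(by_org))
print([d["id"] for d in narrowed])
```

The counts are what a search interface would show beside each facet value, so the user can narrow results without reading every document.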
Find the Missing Piece
Automatically Extract Relevant Facts
WHO               WHEN          WHERE
Total             May 2011      Bolivia
Shell             July 2005     Offshore Nigeria
UK Oil and Gas    April 2015    Sussex
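A who/when/where row like those above could be filled from a sentence with a rough sketch such as the following. The `ORGS` and `PLACES` lists, the "Month YYYY" date pattern, and the `extract_fact` helper are assumptions for illustration; real fact extraction relies on full linguistic analysis rather than lookups.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Assumption: dates appear as "Month YYYY".
DATE_RE = re.compile(rf"\b(?:{MONTHS}) \d{{4}}\b")

# Assumption: small known-entity lists, longest names first.
ORGS = ["UK Oil and Gas", "Total", "Shell"]
PLACES = ["Bolivia", "Nigeria", "Sussex"]


def first_match(candidates, text):
    """Return the first candidate that occurs in the text as a whole phrase."""
    for candidate in candidates:
        if re.search(r"\b" + re.escape(candidate) + r"\b", text):
            return candidate
    return None


def extract_fact(text):
    """Build a who/when/where record; missing parts are left as None."""
    date = DATE_RE.search(text)
    return {
        "who": first_match(ORGS, text),
        "when": date.group(0) if date else None,
        "where": first_match(PLACES, text),
    }


fact = extract_fact(
    'Total has confirmed just one "big cat" -- with more than 200 million '
    "barrels -- in Bolivia in May 2011 that extends a 2004 discovery."
)
print(fact)
```

When a sentence carries no date, the `when` field stays empty rather than being guessed.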
Use Cases
Auto Classification
Competitive Intel
Auto Classification
Internal Data Sources
• In its simplest form
In Place: Share Drives, Legacy Datasets
Migrations: Consolidation/Expansion, Mergers and Acquisitions
Competitive Intel
External Data Sources
Public Domain
Industry Publications
Regulatory Reporting
Industries and Drivers
Oil & Gas
Pharmaceuticals
Government Agencies
Legal
Competitive Intel
Research
Knowledge Areas/Roles
Text Mining
Library Sciences
Taxonomy
Linguistics
Foreign Language
Technology
Domain Expert
Technology Tools
Expert System Cogito
conceptSearching
Smartlogic Semaphore
HP Autonomy
Linguamatics I2E
SAS Text Analytics Suite
IBM LanguageWare / Content Analytics
Lexalytics Text Analytics
Provalis Research QDA Miner / WordStat
Pingar API
AlchemyAPI
Content Analyst
Angoss KnowledgeREADER
NetOwl
Language Computer Corp.
Basis Technology
MeaningCloud
Forest Rim’s Textual ETL
Q & A