IBM Watson Content Analytics: Discover Hidden Value in Your Unstructured Data

ABOUT PERFICIENT

Perficient is a leading information technology consulting firm serving clients throughout North America. We help clients implement business-driven technology solutions that integrate business processes, improve worker productivity, increase customer loyalty and create a more agile enterprise to better respond to new business opportunities.

PERFICIENT PROFILE

• Founded in 1997

• Public, NASDAQ: PRFT

• 2013 revenue $373M

• Major market locations: Allentown, Ann Arbor, Atlanta, Boston, Charlotte, Chicago, Cincinnati, Columbus, Dallas, Denver, Detroit, Fairfax, Houston, Indianapolis, Lafayette, Milwaukee, Minneapolis, New York City, Northern California, Oxford (UK), Philadelphia, Southern California, St. Louis, Toronto, Washington, D.C.

• Global delivery centers in China and India

• >2,600 colleagues

• Dedicated solution practices

• ~90% repeat business rate

• Alliance partnerships with major technology vendors

• Multiple vendor/industry technology and growth awards

INDUSTRIES

• Healthcare
• Financial Services
• Life Sciences
• Retail & Consumer Goods
• Automotive & Transportation
• High Tech
• Telecom
• Energy & Utilities
• Manufacturing
• Media & Entertainment

OUR SOLUTIONS PORTFOLIO

PORTAL
• Portal Frameworks
• Search, Security, Web Analytics, Web Content Management
• Social & Collaboration, Mobility, Experience Design

INTEGRATION
• Integration Frameworks
• Cloud Architecture, Reference Architecture
• Application Integration: Enterprise Application Integration, Service Oriented Architecture
• Process & Content Integration: Business Process Management, Complex Event Processing, Rules Engines

DATA & CONTENT
• Business Analytics: Business Intelligence, Predictive Analytics, Reporting
• Structured Data Management: Data Integration, Quality & Governance; Enterprise Data Warehouse; Master Data Management
• Unstructured Data Management: Big Data, Content Intelligence, Content Management, Enterprise Search

CUSTOMER EXPERIENCE
• Customer 360: Multi Channel Enablement, Relationship Management, Social Engagement
• Commerce: Marketing Strategy Implementation, Order Management, Supply Chain Management, Service & Support
• Sales & Service Support: Customer Service, Sales Force Automation
• Experience Design: Strategic Roadmaps & Envision Workshops, User Research & Metrics Analysis, Creative & Interaction Design, Custom & Responsive UI Development
• Management Consulting

BUSINESS OPERATIONS
• Corporate Performance Management: Budgeting, Forecasting & Planning; Business Analysis & Predictive Analytics
• Enterprise Business Solutions: Oracle EBS, Vertex Tax Solutions
• Human Resource Solutions: Employee Portals, Human Resource Management, Talent Management
• Enterprise Social Platforms: Social Strategy, Lync Unified Communications, Office 365
• Management Consulting

INTRODUCTIONS

Christine Livingston, Senior Project Manager, Enterprise Content Intelligence, Perficient

Christine Livingston leads Perficient's Advanced Case Management and Watson Content Analytics practice and works in collaboration with IBM to develop industry-leading solutions that incorporate IBM's case management, information lifecycle governance, enterprise content management, and business process management technologies.

David Meintel, Director, IBM Channel Development, Executive – Healthcare, Perficient

David Meintel has 20 years of experience in data warehousing and analytics in a wide range of industries. He has worked with a large number of provider and health plan organizations to help them better leverage their data assets. David manages the healthcare organization's relationship with IBM and assists in the development of new offerings for healthcare clients.

AGENDA

• Introduction

– Why do we need Content Analytics?

– Where does Watson Content Analytics fit?

• Overview

– How is unstructured content analyzed?

– How is Watson different?

– How is it applied?

• Healthcare Examples

• Healthcare Accelerators

• Demonstration

“There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing.”

Google CEO Eric Schmidt, August 2010

BIG DATA: WHY WCA?

• 90% of the world’s data was created in the last two years
• 80% of the world’s data today is unstructured
• 1 trillion connected devices generate 2.5 quintillion bytes of data per day

CONTENT
• Volume: 12 terabytes of Tweets created daily
• Velocity: 5 million trade events per second
• Variety: structured, unstructured, multimedia, text
• Veracity: uncertainty from inconsistency and ambiguities

• 15 petabytes of new information created daily
• 80% of information growth is unstructured content
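The Schmidt quotation and the “2.5 quintillion bytes per day” figure describe the same rate. A quick arithmetic check, assuming decimal (SI) prefixes (1 exabyte = 10^18 bytes) and the short-scale quintillion (10^18); these unit conventions are assumptions, not stated on the slides:

```python
# Assumptions: SI decimal prefixes and the short-scale "quintillion".
EXABYTE = 10**18      # bytes in one exabyte
QUINTILLION = 10**18  # short-scale quintillion

def exabytes_per_day(total_exabytes: float, days: float) -> float:
    """Average creation rate in exabytes per day."""
    return total_exabytes / days

# Schmidt: 5 exabytes created every 2 days.
schmidt_rate = exabytes_per_day(5, 2)

# Slide: 2.5 quintillion bytes created per day, converted to exabytes.
slide_rate = 2.5 * QUINTILLION / EXABYTE

print(f"Schmidt: {schmidt_rate} EB/day, slide: {slide_rate} EB/day")
```

Under these conventions both figures work out to 2.5 exabytes per day.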

STRUCTURED VS. UNSTRUCTURED DATA

Structured Data: a high degree of organization, such as a relational database

| Column | Value |
|---|---|
| Patient | Joe Brown |
| Date of Birth | 02/13/1972 |
| Date Admitted | 02/05/2014 |

Unstructured Data: information that is difficult to organize using traditional mechanisms

“The patient came in complaining of chest pain, shortness of breath, and lingering headaches…smokes 2 packs a day… family history of heart disease…has been experiencing similar symptoms for the past 12 hours….”
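To make the contrast concrete, here is a minimal rule-based sketch that turns the sample note into a flat, table-like record. The symptom vocabulary, field names and regular expressions are hypothetical and chosen only for this note; they are not how Watson Content Analytics extracts facts.

```python
import re

# Sample note from the slide (ellipses normalized to ASCII).
NOTE = (
    "The patient came in complaining of chest pain, shortness of breath, and "
    "lingering headaches...smokes 2 packs a day... family history of heart "
    "disease...has been experiencing similar symptoms for the past 12 hours...."
)

# Hypothetical symptom vocabulary, for illustration only.
SYMPTOMS = ["chest pain", "shortness of breath", "headaches"]

def extract_facts(note: str) -> dict:
    """Turn a free-text note into a flat record using simple rules."""
    text = note.lower()
    packs = re.search(r"smokes (\d+) packs? a day", text)
    duration = re.search(r"for the past (\d+) hours", text)
    return {
        "symptoms": [s for s in SYMPTOMS if s in text],
        "packs_per_day": int(packs.group(1)) if packs else None,
        "family_history_heart_disease": "family history of heart disease" in text,
        "symptom_duration_hours": int(duration.group(1)) if duration else None,
    }

print(extract_facts(NOTE))
```

Hand-written rules like these break quickly on rephrased notes, which is the gap that linguistic analysis aims to close.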

WATSON CONTENT ANALYTICS OVERVIEW

80% of enterprise content is unstructured; 100% of social content is unstructured.

Watson Content Analytics mines unstructured content to provide a holistic and contextual understanding – the “Why” behind the “What”.

What is happening?
• Analyzing structured data only gives you a partial view of the world around you
• Only 20 percent of enterprise content is structured
• Data analytics gives you the who, what, where and when of a subject

Why is it happening?
• Mining unstructured content gives you a comprehensive understanding of the world around you
• 80 percent of enterprise content is unstructured
• Content analytics distinctively adds the why and the how

ANALYZING UNSTRUCTURED CONTENT

Clue: “In May, 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.”

Candidate passage: “In May, Gary arrived in India after he celebrated his anniversary in Portugal.”

[Figure: the clue and the candidate passage aligned on shared keyword hits (“In May”, “India”, “celebrated”, “anniversary”, “Portugal”) and on “arrived in” / “arrival in”; legend: keyword hit, reference text, answer, weak evidence]

This evidence suggests “Gary” is the answer, BUT the system must learn that keyword matching may be weak relative to other types of evidence.

Answering complex natural language questions requires more than keyword evidence.
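A minimal sketch of pure keyword-overlap scoring shows why the wrong passage looks attractive. The stopword list is an assumption for this example; the “Vasco da Gama” passage is the supporting text from the Watson example in this deck.

```python
import re

# Small stopword list; an assumption for this sketch.
STOPWORDS = {"a", "after", "he", "his", "in", "of", "on", "s", "the", "this"}

def keywords(text: str) -> set:
    """Lowercased alphanumeric tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def keyword_score(clue: str, passage: str) -> float:
    """Fraction of the clue's keywords that also occur in the passage."""
    clue_terms = keywords(clue)
    return len(clue_terms & keywords(passage)) / len(clue_terms)

CLUE = "In May, 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India"
GARY = "In May, Gary arrived in India after he celebrated his anniversary in Portugal"
VASCO = "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach"

for name, passage in [("Gary", GARY), ("Vasco da Gama", VASCO)]:
    print(name, round(keyword_score(CLUE, passage), 2))
```

The “Gary” passage shares 5 of the clue’s 9 keywords, while the correct passage shares only “may”, so keyword evidence alone ranks the wrong answer first.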

THE WATSON DIFFERENCE: LEVERAGING MULTIPLE ALGORITHMS

Clue: “In May, 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.”

Supporting passage: “On the 27th of May 1498, Vasco da Gama landed in Kappad Beach.”

[Figure: evidence linking the clue to the answer “Vasco da Gama”: temporal reasoning (the 400th anniversary in May 1898 points back to May 1498, matching “27th May 1498”), statistical paraphrasing (“landed in” ≈ “arrival in”), and geospatial reasoning (a geo knowledge base places Kappad Beach in India); legend: temporal reasoning, reference text, answer, statistical paraphrasing, geospatial reasoning]

Stronger evidence can be much harder to find and score … and the evidence is still not 100% certain.

• Search far and wide
• Explore many hypotheses
• Find and judge evidence
• Many inference algorithms
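The idea of combining several kinds of evidence can be sketched as a weighted sum of independent scorers. Everything here is a toy assumption: the paraphrase list, the two-entry geo table and the fixed weights stand in for the learned models and knowledge bases a system like Watson would use, and a weighted sum is only one simple way to merge evidence.

```python
import re

CLUE = "In May, 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India"
CLUE_YEAR, ANNIVERSARY = 1898, 400

# Toy resources standing in for learned models and knowledge bases (hypothetical).
STOPWORDS = {"a", "after", "he", "his", "in", "of", "on", "s", "the", "this"}
PARAPHRASES_OF_ARRIVAL = ["arrival in", "arrived in", "landed in"]
GEO_KB = {"kappad beach": "india", "india": "india"}  # place -> country

# Hypothetical fixed weights; a real system would learn these from training data.
WEIGHTS = {"keyword": 0.1, "temporal": 0.4, "paraphrase": 0.2, "geo": 0.3}

def _terms(text):
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def keyword_evidence(passage):
    clue_terms = _terms(CLUE)
    return len(clue_terms & _terms(passage)) / len(clue_terms)

def temporal_evidence(passage):
    """1.0 if the passage mentions the year the anniversary points back to."""
    years = {int(y) for y in re.findall(r"\b1\d{3}\b", passage)}
    return 1.0 if CLUE_YEAR - ANNIVERSARY in years else 0.0

def paraphrase_evidence(passage):
    """1.0 if the passage contains a phrase treated as equivalent to 'arrival in'."""
    text = passage.lower()
    return 1.0 if any(p in text for p in PARAPHRASES_OF_ARRIVAL) else 0.0

def geo_evidence(passage):
    """1.0 if the passage names a place the toy KB locates in India."""
    text = passage.lower()
    return 1.0 if any(place in text and country == "india" for place, country in GEO_KB.items()) else 0.0

SCORERS = {
    "keyword": keyword_evidence,
    "temporal": temporal_evidence,
    "paraphrase": paraphrase_evidence,
    "geo": geo_evidence,
}

def score(passage):
    return sum(WEIGHTS[name] * fn(passage) for name, fn in SCORERS.items())

candidates = {
    "Gary": "In May, Gary arrived in India after he celebrated his anniversary in Portugal",
    "Vasco da Gama": "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach",
}
ranked = sorted(candidates, key=lambda c: score(candidates[c]), reverse=True)
for c in ranked:
    print(c, round(score(candidates[c]), 3))
```

Once temporal evidence carries more weight than keyword overlap, “Vasco da Gama” outranks “Gary”, although neither score amounts to certainty.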

WATSON CONTENT ANALYTICS APPLIED

Processing stages: PreProcessing → Natural Language Processing → Content Analysis → PostProcessing

Color key: Disease / Not a Disease; Symptom / Not a Symptom; Drug / Dosage; Patient / Doctor; Procedure

• Language Identification
• Lexical Analysis
• Classification
• Disambiguation
• Entity Extraction
• Fact Extraction
• Concept Extraction
• Relationship Extraction
• Inferencing
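The staged flow can be pictured as a chain of functions that each enrich a shared document object. This is a minimal sketch, assuming toy heuristics and a hypothetical four-term dictionary keyed by the color-key categories. It is not the UIMA-based pipeline WCA actually runs.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical dictionary keyed by the slide's color-key categories.
DICTIONARY = {
    "heart failure": "Disease",
    "shortness of breath": "Symptom",
    "furosemide": "Drug",
    "echocardiogram": "Procedure",
}

@dataclass
class Document:
    text: str
    language: Optional[str] = None
    tokens: List[str] = field(default_factory=list)
    entities: List[Tuple[str, str]] = field(default_factory=list)

def identify_language(doc: Document) -> Document:
    # Pre-processing (toy heuristic): tag as English if common function words appear.
    words = set(re.findall(r"[a-z]+", doc.text.lower()))
    doc.language = "en" if words & {"the", "and", "of", "with"} else "unknown"
    return doc

def lexical_analysis(doc: Document) -> Document:
    # NLP: lowercase word tokens.
    doc.tokens = re.findall(r"[a-z0-9]+", doc.text.lower())
    return doc

def extract_entities(doc: Document) -> Document:
    # Content analysis: dictionary lookup over the token stream.
    joined = " ".join(doc.tokens)
    for term, label in DICTIONARY.items():
        if re.search(r"\b" + re.escape(term) + r"\b", joined):
            doc.entities.append((term, label))
    return doc

def post_process(doc: Document) -> Document:
    # Post-processing: de-duplicate and sort annotations.
    doc.entities = sorted(set(doc.entities))
    return doc

PIPELINE: List[Callable[[Document], Document]] = [
    identify_language, lexical_analysis, extract_entities, post_process,
]

def run(text: str) -> Document:
    doc = Document(text)
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

doc = run("Patient with heart failure and shortness of breath; started furosemide.")
print(doc.language, doc.entities)
```

Keeping each stage as a separate function mirrors the idea that annotators can be added, removed or reordered without touching the others.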

WATSON CONTENT ANALYTICS APPLIED

• Accurately identify and extract facts from text, including negation
– “55%” = LVEF
– “Patient does not show signs” = Negative Symptom
• Accurately interpret and assign values to ambiguous statements
– “Shows slightly elevated levels” = 10% under condition A, 20% under condition B
• Infer meaning from non-contextual content
– “Cut back from two packs to one per day” = Smoker
• Cleanse, enhance and normalize raw data
– “Myocardial infarction” and “heart attack” = the same thing
– Enhance or augment by assigning the correct RxNorm, SNOMED, ICD-10 or other codes / terminology
• Preserve and structure facts and concepts from contextual content
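Negation detection and synonym normalization can be sketched in a few lines. The negation check below is a simplified NegEx-style rule (a trigger phrase within a few words before the finding); the trigger list, window size and synonym table are assumptions. A production system would map terms to actual SNOMED CT or ICD-10 codes rather than to the plain labels used here.

```python
# Simplified NegEx-style trigger list (an assumption for this sketch).
NEGATION_TRIGGERS = ["denies", "does not show signs of", "no evidence of", "no", "without"]

# Hypothetical synonym table: surface form -> normalized concept label.
SYNONYMS = {"heart attack": "myocardial infarction"}

def is_negated(sentence: str, finding: str, window: int = 6) -> bool:
    """True if a negation trigger appears within `window` words before the finding."""
    s = sentence.lower()
    start = s.find(finding.lower())
    if start == -1:
        raise ValueError(f"{finding!r} not found in sentence")
    preceding = s[:start].split()[-window:]
    context = " " + " ".join(preceding) + " "
    return any(f" {t} " in context for t in NEGATION_TRIGGERS)

def normalize(term: str) -> str:
    """Map a surface form to its normalized concept label."""
    key = term.lower().strip()
    return SYNONYMS.get(key, key)

print(is_negated("Patient does not show signs of chest pain.", "chest pain"))  # True
print(is_negated("Patient reports chest pain.", "chest pain"))                 # False
print(normalize("Heart attack") == normalize("Myocardial infarction"))        # True
```

Because the context is split on whitespace, a trigger glued to punctuation (for example “No,”) would be missed; real implementations tokenize more carefully.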

HEALTHCARE USE CASES

READMISSION PREDICTORS AT SETON: THE VALUE OF UNSTRUCTURED DATA

The Data We Thought Would Be Useful … Wasn’t
• Structured data was unavailable or not accurate enough without the unstructured data, which was more trustworthy

What We Thought Was Causing 30-Day Readmissions … Wasn’t
• The 113 possible candidate predictors expanded and changed after mining the data for hidden insights

New Hidden Indicators Emerged … Readmissions Is a Highly Predictive Model
• 18 accurate indicators or predictors

| Predictor Analysis | % Encounters (Structured Data) | % Encounters (Unstructured Data) |
|---|---|---|
| Ejection Fraction (LVEF) | 2% | 74% |
| Smoking Indicator | 35% (65% accurate) | 81% (95% accurate) |
| Living Arrangements | <1% | 73% (100% accurate) |
| Drug and Alcohol Abuse | 16% | 81% |
| Assisted Living | 0% | 13% |

49% at 20th percentile
97% at 80th percentile
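The slide does not define “% Encounters.” Assuming it means the share of encounters where a predictor could be populated from a given source, the comparison can be sketched as below. The four encounters are hypothetical, not Seton data.

```python
from typing import Dict, List, Optional

# Hypothetical encounters: the smoking-status value each source yielded (None = nothing found).
ENCOUNTERS: List[Dict[str, Optional[str]]] = [
    {"structured": "smoker", "unstructured": "smoker"},
    {"structured": None, "unstructured": "smoker"},
    {"structured": None, "unstructured": "non-smoker"},
    {"structured": None, "unstructured": None},
]

def coverage(records: List[Dict[str, Optional[str]]], *sources: str) -> float:
    """Percent of encounters where at least one of the given sources has a value."""
    hits = sum(any(r[s] is not None for s in sources) for r in records)
    return 100.0 * hits / len(records)

print(coverage(ENCOUNTERS, "structured"))                  # 25.0
print(coverage(ENCOUNTERS, "unstructured"))                # 75.0
print(coverage(ENCOUNTERS, "structured", "unstructured"))  # 75.0
```

Accuracy figures such as “95% accurate” would additionally require a reference label (for example, manual chart review) to compare each extracted value against.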

READMISSION PREDICTORS AT SETON: TOP 18 FACTORS

1. Jugular Venous Distention Indicator
2. Paid by Medicaid Indicator
3. Immunity Disorder Disease Indicator
4. Cardiac Rehab Admit Diagnosis with CHF Indicator
5. Lack of Emotion Support Indicator
6. Self COPD Moderate Limit Health History Indicator
7. With Genitourinary System and Endocrine Disorders
8. Heart Failure History
9. High BNP Indicator
10. Low Hemoglobin Indicator
11. Low Sodium Level Indicator
12. Assisted Living
13. High Cholesterol History
14. Presence of Blood Diseases in Diagnosis History
15. High Blood Pressure Health History
16. Self Alcohol / Drug Use Indicator
17. Heart Attack History
18. Heart Disease History

[Chart: ranking of strength of model variable (1–18) vs. projected odds ratio (axis 0–6)]
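The chart ranks variables by projected odds ratio, but the slides do not say how those ratios were computed. For reference, here is a sketch of the two standard definitions: the odds ratio from a 2×2 table, and exp(β) for a logistic regression coefficient. The counts are hypothetical, and whether Seton’s model was a logistic regression is an assumption.

```python
import math

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from a 2x2 table.

    a: readmitted, indicator present     b: not readmitted, indicator present
    c: readmitted, indicator absent      d: not readmitted, indicator absent
    """
    return (a * d) / (b * c)

def odds_ratio_from_coefficient(beta: float) -> float:
    """In logistic regression, exp(beta) is the odds ratio for a one-unit increase."""
    return math.exp(beta)

# Hypothetical counts, not Seton data.
or_table = odds_ratio(a=30, b=70, c=40, d=360)
print(round(or_table, 2))  # 3.86

# The two views agree: a coefficient of log(OR) maps back to the same odds ratio.
print(round(odds_ratio_from_coefficient(math.log(or_table)), 2))  # 3.86
```

An odds ratio above 1 means the indicator is associated with higher odds of readmission. It says nothing about whether the factor is actionable.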

READMISSION PREDICTORS AT SETON: NEW INSIGHTS UNCOVERED BY COMBINING CONTENT & PREDICTIVE ANALYTICS

• The top indicator, JVDI (Jugular Venous Distention Indicator), was not on the original list of 113 candidate predictors, and neither were several others
• Assisted Living and Drug and Alcohol Abuse emerged as key predictors, found only in unstructured data
• LVEF and Smoking are significant indicators of CHF but not of readmissions
• The model combines actionable and non-actionable risk factors

RADIOLOGY DIAGNOSIS NOTES

Case: A patient is sent for a chest scan to determine whether pneumonia is present. The radiologist examines the scan results and documents findings in a combination of structured and unstructured data.

“No sign of fluid or other indicators of pneumonia are present within patient. Observed suspicious dark area that should be followed up with primary physician for further diagnostics.”

Without content analytics reviewing the unstructured notes, the secondary finding might go unnoticed, leading to further complications.
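A minimal sketch of how a secondary finding could be surfaced: split the note into sentences and flag any sentence that contains a follow-up cue. The cue patterns and the naive sentence splitter are assumptions for this example, not WCA annotators.

```python
import re
from typing import List

# Hypothetical cue patterns suggesting an incidental finding needs follow-up.
FOLLOW_UP_CUES = [
    r"\bfollow(?:ed)?[- ]up\b",
    r"\bsuspicious\b",
    r"\bfurther diagnostics?\b",
]

def sentences(text: str) -> List[str]:
    """Naive sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def follow_up_sentences(note: str) -> List[str]:
    """Sentences that match at least one follow-up cue (case-insensitive)."""
    return [
        s for s in sentences(note)
        if any(re.search(p, s, re.IGNORECASE) for p in FOLLOW_UP_CUES)
    ]

NOTE = (
    "No sign of fluid or other indicators of pneumonia are present within patient. "
    "Observed suspicious dark area that should be followed up with primary physician "
    "for further diagnostics."
)

for s in follow_up_sentences(NOTE):
    print("FLAG:", s)
```

The negative pneumonia sentence passes through unflagged, and only the incidental “suspicious dark area” sentence is routed for follow-up.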

MEMBER CHURN FOR HEALTH PLANS

Case: Reduce member churn by identifying dissatisfied members.

A call center operator fields calls from members on a wide range of topics and documents important aspects of each call in the notes as the call is resolved. Content analytics can analyze these unstructured notes to identify members at high risk of changing health plans.
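As a rough illustration, call notes can be scored against a weighted lexicon of churn signals. The terms, weights and threshold below are hypothetical. A deployed approach would more likely learn them from members who actually left, and would use richer annotations than whole-word matching.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lexicon of churn signals and weights.
CHURN_SIGNALS: Dict[str, int] = {
    "cancel": 3,
    "switch": 3,
    "frustrated": 2,
    "denied": 1,
    "billing error": 1,
}
HIGH_RISK_THRESHOLD = 3  # hypothetical cut-off

def churn_score(note: str) -> int:
    """Sum the weights of signal terms present in the note (whole-word match)."""
    text = note.lower()
    return sum(
        weight for term, weight in CHURN_SIGNALS.items()
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    )

def flag_members(notes: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (member_id, score) for notes at or above the threshold, highest first."""
    scored = [(member, churn_score(note)) for member, note in notes.items()]
    at_risk = [item for item in scored if item[1] >= HIGH_RISK_THRESHOLD]
    return sorted(at_risk, key=lambda item: item[1], reverse=True)

NOTES = {
    "M001": "Member frustrated about denied claim; asked how to switch to another plan.",
    "M002": "Member requested a replacement ID card.",
}
print(flag_members(NOTES))
```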

WCA AND EPIC INTEGRATION

What’s new?
• Care providers are adopting electronic medical records, but traditional doctors’ notes still play an important role in tracking and managing patients
• Q1 2014: Integration testing with the Epic EMR 2014 release and IBM Advanced Care Insights for Natural Language Processing (NLP) was successfully completed, solidifying the leadership of both companies in their respective markets

What value does this provide?
• Converting doctors’ notes has traditionally been a manual process; IBM’s software can analyze the notes and transform them into a format that can be readily uploaded into the patient record, including automatically adding industry-standard diagnosis and treatment codes
• Allows doctors to accurately capture information from unstructured text in real time, improving patient outcomes and simplifying administrative processes

What does this mean?
• Empowers the health systems that have adopted Epic to capture actionable insight from IBM’s NLP capabilities, the same technology used in the revolutionary Watson cognitive system

ACCELERATORS & DEMONSTRATION

HEALTHCARE ACCELERATORS

• Problems

– Result of a series of interim annotations that identify diseases, symptoms, and disorders

– Normalize to standard terms and standard coding systems including SNOMED CT, ICD-9, HCC, CCS

– Capture timeframes of the problem

• Past or current problem

– Determine confidence

• Positive, Negative, Rule Out

• Negation example

• “abdominal pain”

• Procedures

– Identify compound procedures

– Normalize to standard terms and standard coding systems including SNOMED CT, CCS, CPT

– Capture timeframes of the procedure

• Medications

– Series of interim annotations that identify drugs, administrations, measurements

– Normalize to standard terms RxNorm

• Demographic and Social

– Patient Age

– Living Arrangement

– Employment status

– Smoking status

– Alcohol use

• Compliance & Noncompliance
– Compliance: patient's history of medication compliance with directions such as “take all doses, even if you feel better earlier”
– Noncompliance: patient's history of medication noncompliance with directions
• Lab results
– Type of lab test performed, unit of measure, result value
• Ejection Fraction – in support of CHF use cases
• Coding Systems – can identify these codes
– CPT
– CCS
– HCC
– NDC (National Drug Codes)
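As an example of the kind of value the Ejection Fraction accelerator targets, here is a regex sketch that pulls single EF values and ranges out of free text. The pattern is a hypothetical simplification that will miss many real-world phrasings. It is not the accelerator’s actual rule set.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: an EF keyword, up to 20 non-digit characters,
# then a value or a range (e.g. "35-40" or "35 to 40") followed by "%".
EF_PATTERN = re.compile(
    r"\b(?:lvef|ejection fraction|ef)\b[^0-9%]{0,20}?(\d{1,2})(?:\s*(?:-|to)\s*(\d{1,2}))?\s*%",
    re.IGNORECASE,
)

def extract_ef(text: str) -> List[Tuple[int, int]]:
    """Return (low, high) EF percentages; a single value gives low == high."""
    results = []
    for m in EF_PATTERN.finditer(text):
        low = int(m.group(1))
        high = int(m.group(2)) if m.group(2) else low
        results.append((low, high))
    return results

print(extract_ef("LVEF 55%. Prior echo: ejection fraction of 35-40%."))
print(extract_ef("EF is 20 %"))
```

Returning a (low, high) pair keeps reported ranges intact instead of silently collapsing them to a single number.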

WCA INTERFACES

Content Analytics Miner – Primary user interface for unstructured content analysis. Provides customizable dashboard views to identify deviations, trends, patterns, etc.

Content Analytics Studio – Eclipse interface to create custom UIMA annotators, rules, dictionaries, noise filters, etc.

IBM WATSON CONTENT ANALYTICS


THINGS OF INTEREST

10 Tech Trends Impacting Healthcare in 2015

Download the Trend Guide Today

www.Perficient.com > Thought Leadership > White Papers

HIMSS15 – April 12-15, 2015 – Chicago, IL

Our healthcare experts will be on hand to answer your questions.

Email [email protected] to set up a meeting, or stop by booth #4460.


QUESTIONS?


THANK YOU

For more information contact:

Christine Livingston

[email protected]

David Meintel

[email protected]


APPENDIX


WCA DATA SOURCES (BUILT-IN CRAWLERS)

• Content Management

– IBM Content Manager

– EMC/Documentum

– FileNet P8

– SharePoint

• Database Platforms

– DB2

– IBM IMS

– Microsoft SQL Server

– MySQL

– Oracle

– Sybase

• IBM Case Manager

• Email, File Systems, Web

– Microsoft Exchange Server

– UNIX file systems

– Windows 2008 Server

– Web servers

• WebSphere Portal

– IBM WebSphere Portal

– IBM Web Content Manager

– Lotus Quickr

• IBM Connections

• Lotus Domino

– Lotus Notes

– Lotus Quickr for Domino

WCA INTEGRATION

• Cognos
– Dynamically search and explore content for new business insight
– Quickly generate Cognos BI reports
– Integrate unstructured content with structured content to deliver key insight
• SPSS Analytics Systems
– Combines the power of analyzing the past and present with the predictive analysis capabilities of SPSS
• IBM Content Classification
– Improve search quality to return highly relevant documents
– Train and improve classification in ICC by exporting from WCA
• Advanced Case Management
– Performs full-text indexing and unstructured analytics on content objects
– Provides a connector that crawls case folders, indexing both metadata and document contents

WCA – HOW IT WORKS

Source Information: Corporate (Contact Center, Test Data, Dealer notes, ECM, etc.) and External (NHTSA, Edmunds, Consumer Reports, MotorTrend, etc.)

Connectors and interfaces: Content Analytics Crawlers, IBM Master Data Mgmt, RDB, Real-time NLP REST API, Content Push API

Content Analytics: UIMA Pipeline + Annotators, with fine-grained control over the entities and facets that are created

Analyzed Content (and Data): “Owner” “reports” “check engine lite” “flashes” “after refueling” ...
– Grammatical annotations: Noun, Verb, Noun Phrase, Prep Phrase
– Semantic annotations: Person, Issue, Warning, Driver action
– Extracted Concept: Component Issue: “Engine Light”; Situation: “Refueling”
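The “check engine lite” example shows two steps: normalizing a variant spelling to a canonical component, and reading a situation from a prepositional phrase. The sketch below imitates that outcome with hypothetical dictionaries and one regular expression. Real WCA annotators run inside the UIMA pipeline and would be built in Content Analytics Studio, not written like this.

```python
import re
from typing import Dict

# Hypothetical normalization dictionaries (variant spelling -> canonical concept).
COMPONENT_VARIANTS = {
    "check engine lite": "Engine Light",
    "check engine light": "Engine Light",
    "engine light": "Engine Light",
}
SITUATION_VARIANTS = {"refueling": "Refueling", "refuelling": "Refueling"}

def extract_concept(text: str) -> Dict[str, str]:
    """Build a concept with a component issue and, if present, a situation."""
    lowered = text.lower()
    concept: Dict[str, str] = {}
    for variant, canonical in COMPONENT_VARIANTS.items():
        if variant in lowered:
            concept["component_issue"] = canonical
            break
    # Treat a prepositional phrase such as "after refueling" as the situation.
    m = re.search(r"\b(?:after|during|while)\s+(\w+)", lowered)
    if m and m.group(1) in SITUATION_VARIANTS:
        concept["situation"] = SITUATION_VARIANTS[m.group(1)]
    return concept

print(extract_concept("Owner reports check engine lite flashes after refueling"))
```

Normalizing to a canonical value lets “engine lite” and “engine light” reports from different sources be counted together as one facet.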

WCA v3.5 SYSTEM ARCHITECTURE

[Figure: WCA v3.5 system architecture. Recoverable component labels:
• Crawler Framework: Custom, QuickPlace, Domino Doc, Notes, SharePoint, Exchange, NNTP, DB2, Content Integrator, DB2 Content Mgr, FileNet P8, Web, Seed List, Web Content Mgr, WebSphere Portal, Agent for File System, JDBC DB, Win FS, Unix FS, Case Mgr and Social Media crawlers; RDB crawler plug-in
• Importer Framework: CSV Importer
• Document Processor: Parser / Document Generator, UIMA annotators (Text Analytics & Search Runtime), Inspector, Custom Point
• Stores and indexes: Document Cache, Raw Data Store, Search Index, Taxonomy Index, Facet Count Sub Index, Thumbnail Index, RDF Store; Indexer, Indexer Service
• Global Processing: Web Link Analysis, Thumbnail Generation, Document Categorizer, Document Cluster, Term of Interest, Collection Export Plug-in
• Exporter: XML, CSV, RDF
• Applications: Content Miner UI, Admin UI, Enterprise Search UI, REST Application, Real-time NLP Application, SIAPI Application, CA Studio, Cognos BI Integration / Cognos BI
• Common Infrastructure: Scheduler, Logging, Control, Configuration, Monitor, Security]