Copyright © SAS Institute Inc. All rights reserved.

Tuba Islam, SAS Global Artificial Intelligence Team

AI: Extracting the Hidden Value in Unstructured Data 

Unstructured text is the largest human generated data source

Text generated per minute:

• 16,000,000 texts

• 156,000,000 emails

• 470,000 tweets

• 510,000 posts

• 2,400,000 searches

Rich textual data is collected across every part of an organization

• Call Center Notes

• Live Chat

• Online Forums

• CRM Comments

• Field Notes

• Blogs

• Survey Feedback

• HR Data

• Consumer Reviews

• Research & Pubs

• Medical Records

• Online News

• Claims & Case Notes

• Contract/Application

• Social Networks

Source types: proprietary sources and public sources

However, unearthing the full potential within these complex data sources can be tricky

• Large data volumes and inconsistent formats

• Multiple sources and languages

• Misspellings, slang, and abbreviations

• Highly subjective to interpretation and context

Language is messy!

Manual review is both inconsistent and time-consuming, and a sampling approach can mean missing out on valuable information and the big picture.
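As a rough illustration of the "misspellings, slang, and abbreviations" problem, the sketch below normalizes free text with a small lookup table. The `ABBREVIATIONS` table and the `normalize` function are hypothetical names introduced here, not part of any SAS product, and a real system would need far broader coverage and language-specific handling.

```python
import re

# Hypothetical lookup table (illustrative only): abbreviation -> standard form.
ABBREVIATIONS = {
    "pls": "please",
    "thx": "thanks",
    "acct": "account",
    "asap": "as soon as possible",
}


def normalize(text: str) -> str:
    """Lowercase the text, drop punctuation, and expand known abbreviations."""
    # Keep runs of letters, digits, and apostrophes as tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Replace a token only if it appears in the lookup table.
    expanded = [ABBREVIATIONS.get(token, token) for token in tokens]
    return " ".join(expanded)


print(normalize("Pls fix my ACCT asap, thx!"))
```

A lookup table only handles forms someone has already listed, which is one reason manual approaches struggle to keep up with messy language.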

Using technology to scale the human acts of reading, organizing, and quantifying freeform text in meaningful ways.

Natural Language Processing (NLP)

A branch of artificial intelligence that helps computers understand, interpret and manipulate human language.

• Parsing and Information Extraction

• Natural Language Understanding and Natural Language Generation

• Automatic Summarization, Search

• Topic Detection, Text Clustering and Profiling

• Classification (Categories, Sentiment)

• Speech to Text

Data + Technology + Domain Expertise

Diagram labels: Unstructured Text Data, Human Input, Machine Learning (ML), Discovery, Context

Topic Discovery, Entity Extraction, Categorization, Sentiment Analysis

Emerging Trends, Predictive Analytics, Operational Insights, Automated Summarization, Chatbots

Hybrid Approach to Text Analytics

Rule Based: Tokenization, Lemmatization, Part-of-speech tagging, Concept extraction, Sentiment analysis, Categorization

Supervised: Language modeling, Part-of-speech tagging, Named entity recognition, Sentiment analysis, Categorization, Topic rule generation, Text summarization

Unsupervised: Topic Discovery, Text Clustering
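One way to picture a hybrid approach is a categorizer that tries hand-written rules first and falls back to a model learned from labeled examples. The sketch below is a minimal illustration under that assumption; the `RULES` keywords, the `TRAINING` examples, and all function names are hypothetical, and the word-count scorer stands in for whatever supervised model a real system would use.

```python
import re
from collections import Counter


def tokenize(text):
    """Rule-based tokenization: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


# Rule-based layer: hypothetical keyword rules per category.
RULES = {
    "billing": {"invoice", "refund", "charge"},
    "delivery": {"late", "shipping", "courier"},
}


def rule_category(tokens):
    """Return the first category whose keywords appear in the tokens, else None."""
    token_set = set(tokens)
    for category, keywords in RULES.items():
        if keywords & token_set:
            return category
    return None


def train(examples):
    """Supervised layer: count how often each word occurs under each label."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts


def predict(counts, tokens):
    """Pick the label whose training word counts overlap the tokens most.

    Ties (including all-zero scores) resolve to the first label seen in training.
    """
    scores = {label: sum(c[t] for t in tokens) for label, c in counts.items()}
    return max(scores, key=scores.get)


def categorize(text, counts):
    """Hybrid: use a rule when one fires, otherwise fall back to the learned model."""
    tokens = tokenize(text)
    return rule_category(tokens) or predict(counts, tokens)


# Invented labeled examples, for illustration only.
TRAINING = [
    ("the app crashes on login", "technical"),
    ("password reset email never arrives", "technical"),
    ("I was charged twice", "billing"),
]
model = train(TRAINING)

print(categorize("Where is my refund?", model))  # a rule fires on "refund"
print(categorize("login page crashes", model))   # no rule fires; the model decides
```

Rules give predictable, auditable behavior on known patterns, while the learned layer covers text the rule author did not anticipate.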

Global Language Support

Arabic, Chinese, Croatian, Czech, Danish, Dutch, English, Farsi, Finnish, French, German, Greek, Hebrew, Hungarian, Hindi, Indonesian, Italian, Japanese, Kazakh, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovene, Spanish, Swedish, Tagalog, Thai, Turkish, Vietnamese

Customer Experience

Simple, intuitive and efficient interactions to improve customer satisfaction

Text Analytics for Customer Experience

Data Sources: Social Media, Call Center Transcripts, Service Calls

+ Natural Language Processing: Multiple Languages, Entity Recognition, Categorization, Concept Extraction, Sentiment Analysis, Topic Discovery

+ Machine Learning: Automatic logging, Severity assigned

+ Visual Interface: Dashboard, Daily Report

= Improved accuracy & response time

SAS Visual Text Analytics

A modern, flexible and end-to-end text analytics framework that combines text mining, contextual extraction, categorization, sentiment analysis and search.

Best Practice Pipelines

Text Parsing

Predefined Concepts

Automatic Rule Generation

Topic Discovery

Automated Scoring

Adverse Event Analysis with SAS Text Analytics

Problem: Better understand and manage adverse reactions patients are experiencing due to vaccinations

Data: Unstructured "case notes" found in the publicly available VAERS data

Solution: Use SAS to create structure from unstructured text, enabling better reporting and exploration
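To make "creating structure from unstructured text" concrete, the sketch below turns free-text notes into rows with a symptom field that can be counted and reported on. It is not the SAS workflow from the slides: the `SYMPTOM_TERMS` lexicon is hypothetical, the notes are invented examples rather than VAERS records, and simple keyword matching ignores negation and context.

```python
import re
from collections import Counter

# Hypothetical symptom lexicon (illustrative only).
SYMPTOM_TERMS = ["fever", "headache", "rash", "nausea"]

# Invented example notes; these are NOT taken from VAERS.
NOTES = [
    "Patient developed fever and headache two days after vaccination.",
    "Mild rash at injection site. No fever reported by parent.",
]

# Match any lexicon term as a whole word, ignoring case.
PATTERN = re.compile(r"\b(" + "|".join(SYMPTOM_TERMS) + r")\b", re.IGNORECASE)


def extract(note_id, text):
    """Return a structured row listing the distinct lexicon terms found in the note."""
    found = sorted({match.lower() for match in PATTERN.findall(text)})
    return {"note_id": note_id, "symptoms": found}


rows = [extract(i, note) for i, note in enumerate(NOTES, start=1)]

# Once the text is structured, simple reporting becomes possible.
counts = Counter(symptom for row in rows for symptom in row["symptoms"])

for row in rows:
    print(row)
print(counts.most_common())
```

Note that the second note says "No fever", yet keyword matching still records "fever". This is the kind of context sensitivity that motivates concept rules and linguistic analysis over plain string matching.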

Thank You.

Tuba Islam

linkedin.com/in/tubaislam/

@tubaislam

www.sas.com/vta