Content Analytics Insights from Unstructured Data Mayank Tyagi April 09, 2015

CONTENT ANALYTICS UNLOCKS BUSINESS VALUE FROM UNSTRUCTURED CONTENT

DELIVERING ANSWERS TO IMPORTANT QUESTIONS VIA SEMANTIC TECHNOLOGIES

Business Need

A large percentage (estimated at 80% or more) of the information in a company is maintained as unstructured content, which includes valuable assets such as emails, customer correspondence, free-form fields on applications, wikis, blobs of text in a database, content in enterprise content repositories, social media posts, and messages of all kinds. Because this content lacks structure, it is difficult to search and analyze without extensive effort and automation.

Structured vs Unstructured Data

Structured Data

A high degree of organization, such as a relational database

Column          Value
Patient         Joe Brown
Date of Birth   02/13/1972
Date Admitted   02/05/2014

Unstructured Data

Information that is difficult to organize using traditional mechanisms

“The patient came in complaining of chest pain, shortness of breath, and lingering headaches… smokes 2 packs a day… family history of heart disease… has been experiencing similar symptoms for the past 12 hours….”
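To make the contrast concrete, the sketch below pulls a few structured fields out of the free-text note quoted above. It is only an illustration under stated assumptions: the symptom vocabulary, the field names, and the regular expressions are hypothetical, and a real content analytics solution would rely on clinical dictionaries and linguistic analysis rather than hand-written patterns.

```python
import re

# The free-text note from the slide (ellipses normalized to "...").
NOTE = (
    "The patient came in complaining of chest pain, shortness of breath, "
    "and lingering headaches... smokes 2 packs a day... family history of "
    "heart disease... has been experiencing similar symptoms for the past "
    "12 hours...."
)

# Hypothetical symptom vocabulary; an assumption made for this sketch.
SYMPTOMS = ["chest pain", "shortness of breath", "headaches"]


def extract_fields(note: str) -> dict:
    """Turn a free-text note into a small structured record."""
    text = note.lower()
    packs = re.search(r"smokes (\d+) packs? a day", text)
    hours = re.search(r"past\s+(\d+) hours", text)
    return {
        "symptoms": [s for s in SYMPTOMS if s in text],
        "packs_per_day": int(packs.group(1)) if packs else None,
        "family_history_heart_disease": "family history of heart disease" in text,
        "symptom_duration_hours": int(hours.group(1)) if hours else None,
    }


record = extract_fields(NOTE)
print(record)
```

Once the note is reduced to a record like this, it can sit next to the Patient, Date of Birth and Date Admitted columns and be queried in the same way.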

Big Content

• Beyond conventional Big Data, there exists a tsunami of information in the big data universe that has largely remained untapped

• Big Data has morphed into a world of unstructured machine-generated data and human-generated content that is referred to as ‘Big Content’: for example, chat logs, emails, documents, sales and service notes, CRM case notes, support tickets, weblogs, social media feeds, and more

Content Analytics

Content analytics is the act of applying business intelligence and business analytics practices to this Big Content.

Companies use content analytics software to provide visibility into the amount of content that is being created, the nature of that content and how it is used. This contextual, value-adding information has remained under-used due to lack of recognition and inadequate technologies.

Big Content

The Content Analytics approach leverages multiple algorithms to draw patterns and identify insights from unstructured data

1 Analyze unstructured content

A content analytics solution processes textual data in ways that help users search, discover, and perform the same analytics on textual data that is currently performed on structured data in a business intelligence style of application. With content analytics solutions, unstructured data can be used in ways that were previously attainable only from structured data sets.

2 Better business understanding & visibility

Content Analytics delivers new business understanding and visibility from the content and context of textual information. For example, it can identify patterns, view trends and deviations over time, and reveal unusual correlations or anomalies. It can explain why events are occurring and find new opportunities by aggregating the voices of customers, suppliers, and the market.

3 Tool for reporting statistics and deriving actionable insights

With content analytics solutions, we can define many facets (or aspects) of your data, with each facet potentially leading to valuable insights for various users.

Content Analytics brings the power of business intelligence to all enterprise information, not just structured information (which is less than 20% of the entire enterprise repository).
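The idea of facets described above can be sketched in a few lines. This is a minimal illustration, not how any particular product works: the documents, the facet names (`product`, `sentiment`, `channel`) and the helper functions are all hypothetical, and in practice the facet values would be produced by text analysis rather than typed in by hand.

```python
from collections import Counter, defaultdict

# Hypothetical documents, already annotated with facet -> value pairs.
DOCS = [
    {"product": "savings account", "sentiment": "negative", "channel": "email"},
    {"product": "credit card", "sentiment": "positive", "channel": "chat"},
    {"product": "credit card", "sentiment": "negative", "channel": "email"},
    {"product": "credit card", "sentiment": "negative", "channel": "chat"},
]


def facet_counts(docs):
    """Count how often each value occurs within each facet."""
    counts = defaultdict(Counter)
    for doc in docs:
        for facet, value in doc.items():
            counts[facet][value] += 1
    return counts


def drill_down(docs, **filters):
    """Keep only the documents that match every facet=value filter."""
    return [d for d in docs if all(d.get(k) == v for k, v in filters.items())]


counts = facet_counts(DOCS)
negative_cards = drill_down(DOCS, product="credit card", sentiment="negative")
print(counts["product"].most_common())
print(len(negative_cards))
```

Counting gives the reporting view (which products are discussed most), while drilling down on a combination of facets narrows the collection to the documents that explain a pattern.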

Content Analytics Solutions

Evolution of Content Analytics

Traditional Approach – Text Analytics

Text analytics, or natural language processing, is a set of linguistic, statistical, and machine learning techniques that allow text to be analysed and key information to be extracted for business integration. However, it answered only the who, what, where and when of a subject; the why was left to subjective assessment.

Contemporary Solution – Content Analytics

• Content Analytics (Text Analytics + Mining) refers to the text analytics process plus the ability to visually identify and explore trends, patterns, and statistically relevant facts found in various types of content spread across internal and external content sources.

• Content analytics distinctively adds the why and the how, and provides a comprehensive understanding of the world around the subject

• Identify meaning, trends, patterns, preferences and tastes from text for better business decision making

• Understand customers at a granular level, primarily through semantic and sentiment analysis

• Extract more value from your social media community by building a richer profile of each person in the customer database

• Quickly identify trends among the customer base by filtering and giving structure to the data

• Reuse and curate content from partner organisations and external sources that is pertinent to the target market

• Customer-centric marketing: because content analytics can determine the interests of individual customers and prospects, the most relevant content can be customized for each person and personalised propositions can be delivered

Content Analytics complements business intelligence to provide a more detailed

and accurate understanding of market and customer needs


Key Benefits of Content Analytics

Key Challenges of Content Analytics

Beyond Volume, Variety and Velocity is the issue of Big Data Veracity

Velocity

• 90% of the world’s data was created in the last two years

• 5 million trade events per second

Volume

• 1 trillion connected devices generate 2.5 quintillion bytes of data per day

• 12 terabytes of Tweets created daily

Veracity

• With big data there is a tendency for errors to snowball, e.g. user entry errors, redundancy and corruption all add uncertainty and ambiguity to the quality of data

Variety

• Structured, unstructured, multimedia, text; varied content creation

• 80% of the world’s data today is unstructured

Content Analytics is used in many verticals and for various applications solving

varied business needs

Note: *This is just a representative list to showcase the capabilities of content analytics and not exhaustive

Usage of Content Analytics Solutions*

Financial Services

• Market intelligence

• Case management

• Compliance

• Risk scoring

Healthcare and Life Sciences

• Scientific discovery

• Bio-surveillance

• Clinical trials

Media and Advertising

• Digital asset management

• Content mining

• Contextual advertising

Education and Govt.

• Security

• Intelligence

• Digital library services

• E-learning

Examples of Business Problems that can be addressed

• “What features of our Banking Services are most liked/hated by our customers?”

• “What caused this recent drop in sales for Product X?”

• “Give me a media profile of Mr. X including Trends, Quotes, Roles, Contacts etc.”

• “Which regulatory causes and sentences from the past have hindered the objective of universal education?”

Content Analytics Solutions - Industry Overview

› Content Analytics solutions are usually evolutionary products of Enterprise Content Management (ECM) solution providers. These solutions enable the management of business information throughout the content lifecycle, from creation to disposition. As a technical architecture, ECM consists of a platform or a set of applications that interoperate but that can be sold and used separately.

› The Content Analytics and ECM market will grow from $5.1 billion in 2013 to over $9.3 billion in 2017, at a CAGR of 16% over the period.

› Leading providers of content analytics solutions are IBM, Open Text, EMC, Perceptive Software, Hyland, Microsoft and Oracle. Several newer entrants such as Xerox, Alfresco and Newgen Software have also developed solutions that are rated highly by industry experts and labeled as visionaries by IT research firms such as Gartner.
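The market growth figures quoted above can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The short sketch below applies it to the $5.1 billion (2013) and $9.3 billion (2017) values; the function name `cagr` is only an illustrative choice.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Market size in billions of dollars, as quoted in the overview.
growth = cagr(5.1, 9.3, 2017 - 2013)
print(f"{growth:.1%}")
```

Over the four years from 2013 to 2017 this evaluates to roughly 16% per year, in line with the quoted CAGR.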

• The Content Analytics market includes key players that provide purpose-built and job-aligned offerings, including case management, composite content applications and customer communications management. Key assessments of the leading players in the Content Analytics market are detailed below.

Key Players

IBM

Strengths

• Wide variety of content management and related capabilities, from content ingestion to archiving

• Deep analytics and business intelligence tools

Weaknesses

• IBM's greatest strength also poses its greatest challenge: the breadth of its products may make it hard for customers to understand where to start or how to extend their current offerings

Open Text

Strengths

• Open Text's relationship with SAP provides a firm foundation for expansion and has enabled it to command a strong position in markets where SAP is strong

Weaknesses

• Complicated architecture

• High pricing

• Poor after-sales support

EMC

Strengths

• Extensive content management stack that includes most ECM elements

• Customized industry solutions, specifically for the healthcare, life sciences, energy and engineering sectors

Weaknesses

• Only a limited and tactical solution in applicability

Perceptive Software

Strengths

• Strong product and solution capabilities

• Deep focus on vertical markets, with specialized solutions for the healthcare and higher education sectors

Weaknesses

• Increasing fragmentation of its product architecture and a lack of clarity about its road map

• Lack of interoperability

Hyland

Strengths

• Long and extensive experience in developing content-enabled applications

• Solution capability for mobile and cloud deployment

Weaknesses

• Limited global footprint, with 85% of sales coming from NA

• Limited capabilities to manage sophisticated digital asset management requirements

Major Trends in Content Analytics

Trends

• Increased focus on social media text analytics, as social media is creating a huge amount of unstructured data.

• Large-scale changes in system architecture as new data-centric models and solutions emerge. Large data will live in persistent memory, and many CPUs/clients will use a shallow hierarchy.

• Significant benefits from Content Analytics are likely to continue for at least 5-10 more years before it reaches the “Plateau of Productivity”.

Implications

The outlook for growth in the Content Analytics space will remain bright as businesses continue to seek these solutions to enhance their operational efficiency and better understand their current and prospective customers.

Annexure

HOW DOES CONTENT ANALYTICS WORK? AN EXAMPLE

Analyzing Unstructured Content – Text Analytics

Answering complex natural language questions requires more than keyword evidence

This evidence suggests “Gary” is the answer, BUT the system must learn that keyword matching may be weak relative to other types of evidence
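The weakness of keyword evidence can be shown with a small sketch. The question and the two passages below are illustrative assumptions, since the slide's original figure is not part of this text; only the candidate name "Gary" comes from the slide. The scoring function is a plain keyword-overlap measure chosen for the illustration, not the method of any specific product.

```python
import re

# Small hypothetical stopword list for this sketch.
STOPWORDS = {"the", "in", "of", "a", "this", "his", "he", "on", "after", "s"}


def keywords(text: str) -> set:
    """Lowercase the text and keep the non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def keyword_score(question: str, passage: str) -> float:
    """Fraction of the question's keywords that also occur in the passage."""
    q = keywords(question)
    return len(q & keywords(passage)) / len(q)


# Illustrative question and passages (assumptions, not taken from the slide).
QUESTION = ("In May 1898 Portugal celebrated the 400th anniversary of "
            "this explorer's arrival in India.")
PASSAGES = {
    "Gary": ("In May, Gary arrived in India after he celebrated his "
             "anniversary in Portugal."),
    "Vasco da Gama": ("On the 27th of May 1498, Vasco da Gama landed in "
                      "Kappad Beach."),
}

scores = {name: keyword_score(QUESTION, text) for name, text in PASSAGES.items()}
best_by_keywords = max(scores, key=scores.get)
print(scores, best_by_keywords)
```

The passage about Gary shares far more keywords with the question than the passage that actually names the explorer, so a ranking based on keyword overlap alone picks the wrong candidate.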


Analyzing Unstructured Content – Content Analytics

The CA approach leverages multiple algorithms to draw patterns and identify insights

Stronger evidence can be much harder to find and score … and the evidence is still not 100% certain

• Search far and wide

• Explore many hypotheses

• Find and judge evidence

• Many inference algorithms
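One way to picture the steps listed above is to score each candidate answer with several independent kinds of evidence and then merge those scores into a single confidence. The sketch below is a minimal illustration under stated assumptions: the evidence types, the scores, the weights and the bias are all hypothetical numbers, and the logistic combination is simply one common way to merge scores, not a description of any vendor's algorithm.

```python
import math

# Hypothetical evidence scores in [0, 1] for two candidate answers.
EVIDENCE = {
    "Gary": {"keyword": 0.56, "temporal": 0.0, "geospatial": 0.2},
    "Vasco da Gama": {"keyword": 0.11, "temporal": 0.9, "geospatial": 0.8},
}

# Hypothetical weights and bias; in practice these would be learned from
# examples of questions with known answers.
WEIGHTS = {"keyword": 0.5, "temporal": 2.0, "geospatial": 1.5}
BIAS = -1.5


def confidence(scores: dict) -> float:
    """Combine weighted evidence scores into a confidence between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in scores.items())
    return 1 / (1 + math.exp(-z))


confidences = {name: confidence(s) for name, s in EVIDENCE.items()}
ranked = sorted(confidences, key=confidences.get, reverse=True)
print(ranked, confidences)
```

Because keyword overlap carries a small weight relative to the other evidence types, the candidate with weaker keyword evidence but stronger temporal and geospatial evidence ranks first, and its confidence still stays below 1, which mirrors the point that the evidence is never 100% certain.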

Thank You