Pingar - The Future of Text Analytics


Agenda

• Who is Chris

• The problem

• What is text analytics

• Why use it

• How text analytics evolved

• Use cases

• The future

• Who is Pingar?

Who is Chris

Just learned what Cricket is!

VP Marketing @ Pingar

Author in the area of Content Management

Twitter: @HoardingInfo

Unstructured Data Problem

Unstructured content makes up 80% of all digital content *

The value of unstructured content diminishes exponentially after it is published

Metadata is key to making any use of a document after it is published

* Source: AIIM.org, 2012

Why use it?

Without metadata, the time spent producing content is lost, and the content poses a risk to the organization

Extracting metadata without text analytics is a manual process, which is expensive and prone to human error and inconsistency

What is Text Analytics?

Technology that extracts value from unstructured content

Turns documents into keywords and entities (metadata)

Transforms unstructured content into transactional data
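To make "turning a document into keywords and entities" concrete, here is a minimal sketch in Python. It is not how Pingar's or any commercial engine works: it assumes simple word-frequency scoring over a tiny hand-written stopword list for keywords, and treats runs of capitalized words as entity candidates. The function name `extract_metadata` is hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real engines use far larger ones).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "has", "in",
    "is", "it", "its", "of", "on", "or", "the", "to", "was", "were", "with",
}


def extract_metadata(text, top_n=5):
    """Return naive keyword and entity metadata for one document."""
    # Keywords: the most frequent lowercase words that are not stopwords.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    keywords = [w for w, _ in counts.most_common(top_n)]

    # Entity candidates: runs of two or more capitalized words, e.g. "Acme Oil".
    entities = sorted(set(re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b", text)))
    return {"keywords": keywords, "entities": entities}
```

Given "Acme Oil moved its drilling reports to the new ECM system. The drilling reports mention Gulf Coast wells.", this ranks "drilling" and "reports" as the top keywords and returns "Acme Oil" and "Gulf Coast" as entity candidates. The capitalization heuristic misses "ECM" entirely, which illustrates why statistical rules alone fall short and why modern engines add machine learning and disambiguation.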

Evolution of text analytics

Started appearing around 2003

Initial engines were statistical: accurate, but a lot of work

Modern engines use machine learning: the power of disambiguation and Linked Data

Several general-purpose engines, but mostly vertical solutions

Use Cases


Content Migration and Discovery

Content Classification and Organization

Internal Content Publishing

Content Migration & Discovery - Problem

A large oil and gas company in the US was recently sued and lost, at a cost of millions of dollars. Because of poor content control, documents that should have stayed inside the organization had left it.

The company decided to implement an enterprise content management (ECM) system. But 90% of the organization's content is stored on a file share, the “Z” drive, and no one knows what is there.

To move to ECM, they need to quickly analyze the file share, isolate the relevant content, remove what is not relevant, and prepare the rest for migration.

Content Migration & Discovery - Solution

Analyze the file share to produce a list of content by type and by its relationships to other content.

Determine which content is relevant and which should be removed, and build an information architecture for a proper ECM platform.

Visualize the content by location, people, etc., to gain insight and decide how to handle it so as to avoid future litigation.

Content Migration & Discovery - Result

• New ECM system with relevant content only

• Purged non-relevant content

• Better control, which means less legal risk

• Ability to make better business decisions

Content Classification & Organization - Problem

One of the largest commercial banks in the US regularly produces collateral and promotional materials. Because the resulting scripts and media files are poorly organized, the bank is duplicating effort on new campaigns and losing valuable, expensive content.

They need to improve the organization of these assets and the cross-pollination of information.

Content Classification & Organization - Solution

Build a hierarchy of content (a taxonomy) to use for filing content. As content is saved to the rich-media repository, have it filed automatically according to the taxonomy.

Automatically generate search filters so that navigating the content is more efficient and the team misses fewer documents.
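A rough sketch of automatic filing against a taxonomy, and of deriving search-filter counts from the result. The taxonomy paths and trigger terms below are invented for illustration, and the scoring (count of shared terms, best match wins) is an assumed stand-in for the trained classifiers a real text analytics engine would use. `TAXONOMY`, `classify`, and `facet_counts` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical taxonomy: each path maps to trigger terms. Hand-written here for
# illustration; a production engine would learn or curate these.
TAXONOMY = {
    "Marketing/Campaigns/Mortgages": {"mortgage", "home", "loan"},
    "Marketing/Campaigns/Credit Cards": {"credit", "card", "rewards"},
    "Marketing/Brand": {"logo", "brand", "tagline"},
}


def classify(text):
    """Return the taxonomy path sharing the most terms with the text, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {path: len(terms & words) for path, terms in TAXONOMY.items()}
    best = max(scores, key=scores.get)  # ties go to the path listed first
    return best if scores[best] > 0 else None


def facet_counts(docs):
    """Count filed documents per taxonomy path, e.g. to label search filters."""
    paths = (classify(d) for d in docs)
    return Counter(p for p in paths if p is not None)
```

A script titled "Spring mortgage campaign script for first home buyers" shares two terms with the mortgages node and is filed there; "Quarterly board minutes" matches nothing and stays unfiled. The counts from `facet_counts` are the kind of numbers a search UI shows next to each filter.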

Content Classification & Organization - Result

• Users spend 50% less time finding content

• Content is now organized by topic automatically

• $750,000 a year saved in duplicated effort

• Improved idea sharing

Internal Content Publishing - Problem

One of the world's largest chemical manufacturers has many R&D departments. As new chemicals are invented, scientists publish documents describing the intellectual property behind these inventions. The articles are published to other scientists so they can use the knowledge to further their own research and development.

The system for publishing this content is manual and costly. A highly paid chemical scientist has to tag and summarize each article by hand before it is saved to a content management system. Scientists have to “search” for content they might find interesting, but they don’t always know what to look for. The process is costly and prone to human error, and information gets lost.

Internal Content Publishing - Solution

Automatically tag, classify, and summarize content as scientists publish it.

Generate emails with summaries of and links to the articles. Send the emails to scientists based on their profiles, showing each one only the content that is relevant to them.
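The routing step can be sketched as a set-overlap match between the tags produced for each article and the interest tags stored in each scientist's profile. The `Article` and `Scientist` types, the `build_digests` function, and the rule that a single shared tag makes an article relevant are all assumptions for illustration; the deck does not describe how the actual system scores relevance.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    url: str
    summary: str      # e.g. produced by the automatic summarizer
    tags: set[str]    # e.g. produced by the automatic tagger


@dataclass
class Scientist:
    email: str
    interests: set[str]  # profile tags (how they are collected is assumed)


def build_digests(articles, scientists):
    """Map each scientist's email to a digest of only the relevant articles."""
    digests = {}
    for person in scientists:
        # Assumed relevance rule: at least one tag in common with the profile.
        relevant = [a for a in articles if a.tags & person.interests]
        if relevant:  # scientists with no matches get no email at all
            lines = [f"{a.title}: {a.summary} ({a.url})" for a in relevant]
            digests[person.email] = "\n".join(lines)
    return digests
```

The resulting dictionary would then be handed to whatever mail system the organization uses. Skipping people with no matches reflects the "push only what is relevant" goal; a real engine would more likely rank articles by a similarity score than require an exact tag match.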

Internal Content Publishing - Result

• 70% cost reduction in publishing process

• Content is published 150x faster

• Scientists no longer have to search; content is pushed to them

• The content auditors can focus on other responsibilities

Text analytics is increasing the value of unstructured content, reducing risk, and making organizations more efficient

The future

Text analytics will be mandatory for all organizations doing unified information access

Machine Learning Engines take over

BigData and BigContent join forces

The need for language scientists and data scientists increases

Buzzwords: Unified Information, Content Intelligence, BigContent

Who Is Pingar?

• The Text Analytics Subject Matter Experts

• Helping you make money with a Text Analytics practice