chris-riley
Presentation given by the Pingar India office to Partners in India entering the Text Analytics Market.
The future of Text Analytics
Agenda
Who is Chris
The problem
What is text analytics
Why use it
How text analytics evolved
Use cases
The future
Who is Pingar?
Who is Chris
Just learned what Cricket is!
VP Marketing @ Pingar
Author in the area of Content Management
Twitter: @HoardingInfo
Unstructured Data Problem
Unstructured content makes up 80% of all digital content *
The value of unstructured content diminishes exponentially after it is published
Metadata is key to making any use of a document after it is published
* ref: AIIM.org 2012
Why use it?
Without metadata, the time spent producing content is lost, and the content poses a risk for the organization
Extracting metadata without text analytics is a manual process, which is expensive and prone to human error and inconsistency
What is Text Analytics
Technology that extracts value from unstructured content
Turns documents into Keywords and Entities - Metadata
Transforms unstructured to transactional
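To make "documents into keywords and entities" concrete, here is a minimal sketch in Python. It is a toy under stated assumptions, not how any commercial engine works: keywords are simply the most frequent non-stopword terms, and "entities" are guessed as runs of two or more capitalized words. The function names and the stopword list are illustrative only.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); real engines use far larger ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for",
             "on", "that", "with", "was", "by"}

def extract_keywords(text, top_n=3):
    """Return the top_n most frequent non-stopword terms, lowercased."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]

def extract_entities(text):
    """Naive entity guess: runs of two or more capitalized words."""
    return sorted(set(re.findall(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+\b", text)))

def to_metadata(text):
    """Turn raw text into a small metadata record."""
    return {"keywords": extract_keywords(text),
            "entities": extract_entities(text)}

if __name__ == "__main__":
    sample = ("Acme Chemical signed a contract. The contract covers pipeline "
              "safety. Pipeline audits follow the contract.")
    print(to_metadata(sample))
```

The resulting dictionary is the kind of structured record that can be stored and queried like any other transactional data.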
Evolution of text analytics
Started appearing around 2003
Initial engines were statistical
Accurate but lots of work
Modern engines use machine learning
Power of disambiguation & Linked Data
Several general-purpose engines, but mostly vertical solutions
Use Cases
Content Migration and Discovery
Content Classification and Organization
Internal Content Publishing
Content Migration & Discovery - Problem
A large oil and gas company in the US was recently sued and lost (millions of dollars). Due to poor content control, documents left the organization that should not have.
So the company decided to implement an ECM system. But 90% of the organization's content is stored in a file share, the “Z” drive, and no one knows what is there.
To move to ECM, they need to quickly analyze the file share to isolate relevant content, remove what is not relevant, and prepare for migration.
Content Migration & Discovery - Solution
Analyze the file share to produce a list of content by type and relationships to other content.
Determine what content is relevant, what content should be removed, and build an information architecture for a proper ECM platform.
Visualize the content by location, people, etc. to gain insight and decide how to handle the content to avoid future litigation.
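The first step of this solution, listing content by type, can be sketched with nothing but the Python standard library. This is an illustrative inventory only: it groups files by extension and does not attempt the relationship analysis or relevance decisions described above. The function names are hypothetical.

```python
import os
from collections import defaultdict

def inventory_by_type(root):
    """Walk root and group relative file paths by lowercase extension."""
    inventory = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Files without an extension are grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            inventory[ext].append(rel)
    return {ext: sorted(paths) for ext, paths in inventory.items()}

def count_by_type(inventory):
    """Return (extension, file count) pairs, largest group first."""
    pairs = ((ext, len(paths)) for ext, paths in inventory.items())
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))

if __name__ == "__main__":
    # "." stands in for the mapped file share; point it at the real root.
    for ext, count in count_by_type(inventory_by_type(".")):
        print(f"{ext}\t{count}")
```

A count per type is a cheap first look at an unknown share before running heavier analysis on the files that matter.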
Content Migration & Discovery - Result
• New ECM system with relevant content only
• Purged non-relevant content
• Better control which means less legal risk
• Ability to make better business decisions
Content Classification & Organization - Problem
One of the US’s largest commercial banks produces regular collateral and promotional materials. Because the resulting scripts and media files are poorly organized, the bank duplicates effort on future campaigns and loses valuable, expensive content.
They need to improve organization of these assets and cross-pollination of information.
Content Classification & Organization - Solution
Build a hierarchy of content, a taxonomy to be used to file content. As content is saved to the rich media content repository, have it automatically filed according to the taxonomy.
Automatically generate search filters so navigation of the content is more efficient, and fewer documents are missed by the team.
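As a rough illustration of filing against a taxonomy, the sketch below scores a document against each category by counting trigger terms and picks the best match, then counts documents per category to produce simple search filters. The taxonomy, its terms, and the function names are all invented for this example; a production classifier would be trained or rule-tuned rather than hand-listed like this.

```python
import re
from collections import Counter

# Hypothetical taxonomy: category path -> trigger terms (assumption).
TAXONOMY = {
    "Campaigns/Mortgages": {"mortgage", "home", "refinance"},
    "Campaigns/Credit Cards": {"card", "rewards", "cashback"},
    "Campaigns/Small Business": {"business", "payroll", "merchant"},
}

def classify(text, taxonomy=TAXONOMY, default="Unfiled"):
    """File text under the category sharing the most trigger terms."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & terms) for cat, terms in taxonomy.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

def build_filters(documents, taxonomy=TAXONOMY):
    """Count documents per category; the counts can drive search facets."""
    return dict(Counter(classify(doc, taxonomy) for doc in documents))

if __name__ == "__main__":
    docs = [
        "Script for the spring refinance and home mortgage ad",
        "Voiceover: earn cashback rewards with the new card",
        "Holiday party notes",
    ]
    print(build_filters(docs))
```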
Content Classification & Organization - Result
• Users spend 50% less time finding content
• Content is now organized by topic automatically
• Save $750,000 a year in duplicated effort
• Improve idea sharing
Internal Content Publishing - Problem
One of the world’s largest chemical manufacturers has many R&D departments. As new chemicals are invented, scientists publish documents discussing the intellectual property of these inventions. The articles are to be published to other scientists so they can use the knowledge to further their research and development.
The system for publishing this content is manual and costly. A highly paid chemical scientist has to manually tag and summarize articles before they are saved to a content management system. Scientists have to “search” for content they might find interesting, but they don’t always know what to look for. This is costly and prone to human error, and information is lost.
Internal Content Publishing - Solution
Automatically tag, classify, and summarize content as it’s being published by scientists.
Generate emails with summaries and links to articles. Send the emails to scientists based on their profile, showing only content that is relevant to them.
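The push model described here can be sketched as a small routing step: take an already-tagged article, build a short summary, and address it only to readers whose profile overlaps the article's tags. Everything below is hypothetical (the profiles, the article fields, the example.com link), and the "summary" is just the first sentences of the text, standing in for real summarization.

```python
import re

# Hypothetical reader profiles: name -> topics of interest (assumption).
PROFILES = {
    "asha": {"polymers", "catalysts"},
    "ben": {"solvents"},
    "chen": {"catalysts", "solvents"},
}

def first_sentences(text, max_sentences=2):
    """Stand-in summary: keep only the first max_sentences sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])

def recipients_for(tags, profiles):
    """Names of readers whose interests overlap the article tags."""
    tag_set = set(tags)
    return sorted(name for name, interests in profiles.items()
                  if interests & tag_set)

def build_digest(article, profiles):
    """Map each matching reader to an email body with summary and link."""
    body = "\n".join([article["title"],
                      first_sentences(article["text"]),
                      article["link"]])
    return {name: body for name in recipients_for(article["tags"], profiles)}

if __name__ == "__main__":
    article = {
        "title": "New catalyst family",
        "text": "First finding. Second finding. Third finding.",
        "tags": ["catalysts"],
        "link": "https://example.com/articles/123",  # placeholder URL
    }
    for name, body in build_digest(article, PROFILES).items():
        print(f"To: {name}\n{body}\n")
```

Actually sending the messages is left out; the point is that matching on profile replaces the search step the scientists used to do by hand.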
Internal Content Publishing - Result
• 70% cost reduction in publishing process
• Content is published 150x faster
• Scientists no longer have to search; content is pushed to them
• The content auditors can focus on other responsibilities
Text Analytics is increasing the value of unstructured content,
reducing risk, and making organizations more efficient
The future
Text Analytics will be mandatory for all organizations doing unified information access
Machine Learning Engines take over
BigData and BigContent join forces
The need for Language Scientists and Data Scientists increases
Buzz Words: Unified Information, Content Intelligence, BigContent
Who Is Pingar?
• The Text Analytics Subject Matter Experts
• Helping you make money with a Text Analytics practice