Future-ready business intelligence platform

Gain actionable insight

Technical White Paper

Table of contents

Understand the nature of unstructured data

Look beyond with unstructured data analytics

Use HPE IDOL/Vertica for unstructured analytics

Bridge BI and Big Data

Be a data-driven agile enterprise

Forward-looking enterprises are aggressively building new analytic platforms that create actionable analytics from operational and external data generated by enterprise and external data feeds. This gives your organization greater insight to make better decisions and drive innovation, ensuring better business outcomes.

Secure better outcomes with analytics

Organizations can no longer ignore unstructured data. It includes important insight into customers, behaviors, and trends; it is generated in doctors' notes, Twitter feeds, and financial documents; and it tends to be of variable quality and consistency. Traditional analytic platforms need expanded capabilities to stay relevant.

With increasing globalization, businesses are being pushed to innovate in order to maintain a competitive advantage within their industry segments. Information is ubiquitous, and it becomes easier for competitors to challenge industry leaders as technology improves and startup costs decrease.

This trend continues against a backdrop of information overload, where the amount of information doubles every 12 to 18 months.[1] This new deluge of data generally consists of enterprise data, generated in the context of operations, along with external data residing in social media, blogs, and other repositories. Although the volume is overwhelming, proactive companies are beginning to develop frameworks that let them ask questions and receive answers with greater insight, speed, and clarity.

Add to this the convergence of intelligent devices and mobile, cloud, social, and Big Data technologies, which promises to deliver another disruptive (but positive) wave of innovation to enterprises that are truly ready to harness it. The new business imperative for organizations is to use new analytic platforms that integrate and correlate all of these data assets.

1 "Knowledge Doubling Every 12 Months, Soon to Be Every 12 Hours," David Russell Schilling [http://www.industrytap.com/knowledge-doubling-every-12-months-soon-to-be-every-12-hours/3950]

The stakes are high, and competitive businesses need to know, in real time, what their customers are saying and to quickly identify or discount new areas of innovation. Benefits within specific industry segments can include:

• In the pharmaceutical industry, integrated analytic techniques can leverage the large amount of clinical and scientific data to improve the drug discovery process.

• In the financial industry, monitoring stock sentiment, combined with price momentum and trends, can be used to predict stock price and momentum more accurately.

• In the telecommunications industry, analytic techniques can be used to improve customer service, call center compliance, and customer satisfaction.

• In the consumer goods industry, monitoring reactions to a product launch, generated from news media and customers in real time, enables quick reaction to protect the brand.

These forward-looking solutions need to integrate three types of information: operational, external, and internal.

Operational information is generated by machines or by humans working within well-defined systems. Sources include accounting applications, radio frequency identification (RFID) devices, and other applications that record structured data. This data has traditionally been easy to incorporate into analytics, but it represents only a fraction of the total information generated.

External information is generated by customers and through external social media sites and blogs. This type of information often varies in quality and in importance to a business.

Internal information, the majority of the content and information generated, resides in emails and other human-friendly data created during a work day.

Creating a unified framework to bring all this information together is a unique technical challenge that requires integrating and correlating structured and unstructured data.

Understand the nature of unstructured data

Unstructured data is approximately 80% of the information residing within an enterprise.[2] It is the information generated in emails, presentations, Microsoft® Word documents, videos, and audio files. These documents often reside in content management systems (CMS), file shares, and even rich media and multimedia servers. This information is diverse and varied, but it generally has the following characteristics:

• The exact meaning of terms is not rigid, making them ambiguous.

• No standard file format or document structure exists.

• Document content and style are as diverse as human language.

• The content varies in quality and usefulness.

Structured data, on the other hand, is an organization's operational data. It resides in accounting systems, devices, and other business systems, and it is generated during the course of day-to-day operations. All operational data resides in a database. Structured data generally has the following characteristics:

• The meaning of data is well defined and unambiguous.

• Highly controlled methods and standards exist for entering and maintaining the data.

• Data and data entry are restricted to workflows and applications.

The primary tool for managing structured data is the database, and database techniques have revolutionized business over the last 50 years. The database lets businesses quickly calculate income, revenue, and accounts payable. It gave us the modern stock market, with the ability to buy and sell stocks through electronic brokerages, not to mention monitoring stock price and performance in real time.

On the other hand, the tools available to manage unstructured data are still in their infancy. The primary tool today is the Enterprise Search System,[3] which has gained prominence over the last two decades. Prior to it, no electronic mechanism existed for finding documents. The paradigm of a user typing in free-form text and receiving a set of documents is well known from Internet search engines.

An early leader in this space was Autonomy, which burst onto the scene with technology rooted in pattern recognition and advanced mathematics. The basic idea involves identifying key concepts within a document and relating them to the text entered by a user. The novel approach, known as Adaptive Probabilistic Concept Modelling, is rooted in Bayesian inference and Shannon's information theory; the technique proved effective at finding associations between concepts within a document. These techniques, protected by over 120 patents, continue to provide value and can significantly compound value when integrated into data governance and other structured data techniques.

2 Seth Grimes, "Breakthrough Analysis" [http://breakthroughanalysis.com/2008/08/01/unstructured-data-and-the-80-percent-rule]

3 AIIM, "What is Enterprise Search?" [http://www.aiim.org/what-is-enterprise-search]

Look beyond with unstructured data analytics

The analytics-driven business needs to look beyond merely finding and locating documents. Additional insights can be derived when advanced text analytics techniques are incorporated and correlated with other forms of enterprise data.

Linked data analysis

Today, most companies use enterprise search in isolated data silos as a mechanism for finding documents. It returns documents, not data. But linking data in documents to internal and external information systems enables a 360-degree view of a topic. When data is linked, it opens up a greater possibility for generating analytic insight. Analytic agents capable of monitoring specific topics or analyzing data trends in real time become possible.

For example, a drug company might generate toxicity results that get immediately linked to externally published numbers derived from research journals. The configured agents responsible for monitoring and aggregating lab results can immediately alert research scientists to discrepancies. Such capabilities require the marriage of unstructured text analytics with the structured knowledge residing in a database, known as a knowledgebase. Text analytic tools are capable of mining text to generate facts, for example, "the toxicity is X for compound Y."
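The toxicity example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not HPE IDOL functionality: the sentence pattern, the `published` lookup table, the compound names, and the 10% tolerance are all invented for the example.

```python
import re

# Hypothetical pattern for sentences such as "the toxicity is 4.2 for compound ABC-1".
FACT_PATTERN = re.compile(
    r"toxicity is (?P<value>\d+(?:\.\d+)?) for compound (?P<compound>[\w-]+)",
    re.IGNORECASE,
)

def mine_facts(text):
    """Return (compound, toxicity) pairs found in free text."""
    return [(m["compound"], float(m["value"])) for m in FACT_PATTERN.finditer(text)]

def find_discrepancies(lab_facts, published, tolerance=0.10):
    """Flag lab results that differ from published values by more than `tolerance`."""
    alerts = []
    for compound, value in lab_facts:
        reference = published.get(compound)
        if reference is None:
            continue  # nothing published to compare against
        if abs(value - reference) > tolerance * reference:
            alerts.append((compound, value, reference))
    return alerts

# Illustrative data only (assumed values, not real measurements).
published = {"ABC-1": 4.0, "XYZ-9": 2.5}
notes = (
    "Run 12: the toxicity is 4.1 for compound ABC-1. "
    "The toxicity is 3.4 for compound XYZ-9."
)
alerts = find_discrepancies(mine_facts(notes), published)
```

Here only the second compound ends up in `alerts`, because its mined value falls outside the assumed tolerance around the published number. A production agent would replace the regular expression with a real text analytics engine and the dictionary with a curated knowledgebase.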

A knowledgebase contains systematic knowledge of an industry area, often generated by a group of experts in the field. It models key concepts and subject relationships, such as "Morgan Doe is an expert in cancer research" and "cancer is a disease." The knowledgebase acts as the source of truth on a subject area, operating as a textbook on key concepts and their relationships. It becomes important as a guide for text analytic tools as facts are mined and extracted. It can aid in disambiguating terms and in providing provenance for facts and metadata extracted from documents.

For example, a text mining agent might come across a group of names associated with an important journal article on cancer. During the extraction process, the names can be compared to the knowledgebase containing all known experts in cancer research. New names might be encountered in the mined information, indicating the need to add them to the known list of experts in cancer research. In this example, data mining becomes a kind of expert location tool that continually updates the source of truth of known experts.
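The expert-location loop described above might look like the following sketch. It assumes the knowledgebase can be represented as a set of (subject, relation, object) triples; the relation names, the helper functions, and the second author name are hypothetical.

```python
# A toy knowledgebase stored as (subject, relation, object) triples.
# The relation names and entries are illustrative assumptions.
knowledgebase = {
    ("Morgan Doe", "is_expert_in", "cancer research"),
    ("cancer", "is_a", "disease"),
}

def known_experts(kb, field):
    """Return the set of people recorded as experts in `field`."""
    return {s for (s, rel, o) in kb if rel == "is_expert_in" and o == field}

def review_authors(kb, authors, field):
    """Split extracted author names into confirmed experts and new candidates."""
    experts = known_experts(kb, field)
    confirmed = [a for a in authors if a in experts]
    candidates = [a for a in authors if a not in experts]
    return confirmed, candidates

# Names a text mining agent might extract from a journal article (made up).
extracted = ["Morgan Doe", "Sam Rivera"]
confirmed, candidates = review_authors(knowledgebase, extracted, "cancer research")

# After review, candidates are added back, updating the source of truth.
for name in candidates:
    knowledgebase.add((name, "is_expert_in", "cancer research"))
```

In practice the step that adds candidates would likely pass through a human review or rules engine rather than running automatically.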

Entity analysis

The first step toward enabling linked data analysis is entity extraction. It recognizes important terms (such as people, places, locations, or credit card numbers) located within text documents. Entity extraction algorithms are specific to a particular industry and are usually intelligent enough to disambiguate similar names or synonyms. A drug might be known by a number of different names, and entity extraction algorithms will associate all of them with the same drug.

Other uses include locating and extracting important people, places, and locations. Once extracted, entities can be cross-referenced, disambiguated, or incorporated into a human workflow or rules engine. This cleansing process improves the quality of the entities returned and ensures their accuracy as they relate to important concepts.
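A dictionary-based extractor is one simple way to picture how name variations are mapped to a single drug. The sketch below is an assumption-laden stand-in for an industry-specific extraction algorithm: the synonym table is a small hand-written sample, and matching is plain case-insensitive word matching.

```python
import re

# Hypothetical synonym table: every known variant maps to one canonical name.
DRUG_SYNONYMS = {
    "acetaminophen": "acetaminophen",
    "paracetamol": "acetaminophen",
    "tylenol": "acetaminophen",
    "ibuprofen": "ibuprofen",
    "advil": "ibuprofen",
}

# One alternation over all variants, matched on word boundaries.
_DRUG_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, DRUG_SYNONYMS)) + r")\b", re.IGNORECASE
)

def extract_drugs(text):
    """Return (surface form, canonical name, offset) for each drug mention."""
    return [
        (m.group(1), DRUG_SYNONYMS[m.group(1).lower()], m.start())
        for m in _DRUG_RE.finditer(text)
    ]

text = "Patient switched from Tylenol to paracetamol; Advil was stopped."
mentions = extract_drugs(text)

# Three surface forms collapse to two canonical drugs.
canonical = sorted({name for _, name, _ in mentions})
```

Keeping the character offset with each mention makes it possible to trace an extracted entity back to its place in the source document, which supports the cross-referencing and review steps described above.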

Sentiment analysis

Another technique for gaining insight is sentiment analysis. It gauges the aggregate view on a particular subject and is especially useful in call center analysis. It can parse call center transcripts and determine whether the text contains positive, negative, or neutral feelings toward a subject.

The technique is particularly effective in the telecommunications industry. The phrase "I had a positive experience using brand X" indicates positive sentiment, while the phrase "I am not happy with brand X" indicates negative sentiment. When call center transcripts are run through sentiment analysis tools, the results provide insight into customer service and satisfaction.

When customer sentiment is monitored in real time, it provides a mechanism for troubleshooting and pinpointing problems with customer satisfaction. Negative sentiment tied to a particular customer call can indicate an unhappy customer who requires further help. This kind of analytic insight provides a mechanism for focusing on customer satisfaction and reducing customer attrition. When a high-value customer calls the call center and highly negative sentiment is identified in the transcript, it is an indication that action needs to be taken to maintain the customer relationship. Similarly, a company can monitor the response to its product based on the unfiltered comments found in blogs and tweets.
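The two example phrases can be classified with a very small lexicon-based scorer. This is a toy sketch, not the method used by any particular product: the word lists and the one-word negation rule are assumptions chosen so the example stays readable.

```python
import re

# Tiny illustrative lexicon; real systems use far larger, domain-tuned resources.
POSITIVE = {"positive", "happy", "great", "helpful"}
NEGATIVE = {"negative", "unhappy", "terrible", "frustrated"}
NEGATORS = {"not", "never", "no"}

def sentiment(text):
    """Classify text as 'positive', 'negative', or 'neutral'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        # Flip polarity when the previous word is a negator ("not happy").
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

first = sentiment("I had a positive experience using brand X")
second = sentiment("I am not happy with brand X")
```

The negation rule is what lets "not happy" count against the brand even though "happy" is in the positive list. A transcript-level score computed this way could then be joined with customer value data to prioritize follow-up calls.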

Conceptual analysis

Closely related to sentiment analysis, conceptual analysis identifies key concepts within textual content. This can be a powerful technique for watching new trends emerge within social media text.

For example, a healthcare company interested in monitoring patient side effects of a drug during clinical trials might use the capability. As users enter comments, they might describe their symptoms as feeling sleepy or nauseous, or as having a loss of appetite. The technique becomes particularly important when the number of documents is too large for a human to read.

It helps the business monitor "unknown unknowns." Most drug side effects would be well documented and incorporated into structured systems. Yet there may be unknown side effects documented by a small minority of users, such as the few cases where the drug resulted in extreme itching or migraines. Conceptual analysis would uncover these situations.
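A much simpler stand-in for conceptual analysis is to count symptom mentions in patient comments and surface those that are not yet documented. The sketch below assumes a fixed symptom vocabulary, which a true concept-discovery engine would not need; the vocabulary, the documented list, and the comments are all invented.

```python
from collections import Counter
import re

# Hypothetical inputs: side effects already documented in structured systems,
# and a broader symptom vocabulary used to recognize candidate concepts.
DOCUMENTED = {"sleepy", "nauseous", "loss of appetite"}
SYMPTOM_VOCAB = DOCUMENTED | {"itching", "migraine", "dizziness"}

def count_symptoms(comments):
    """Count how many comments mention each symptom in the vocabulary."""
    counts = Counter()
    for comment in comments:
        lowered = comment.lower()
        for symptom in SYMPTOM_VOCAB:
            # Leading word boundary only, so "migraine" also matches "migraines".
            if re.search(r"\b" + re.escape(symptom), lowered):
                counts[symptom] += 1
    return counts

def undocumented(counts):
    """Return symptoms seen in comments but absent from the documented list."""
    return {s: n for s, n in counts.items() if s not in DOCUMENTED}

comments = [
    "Felt sleepy most of the afternoon.",
    "Nauseous and some loss of appetite.",
    "Extreme itching on my arms since day two.",
    "Two migraines this week, which is unusual for me.",
]
emerging = undocumented(count_symptoms(comments))
```

With these sample comments, `emerging` contains only the itching and migraine mentions, which mirrors the minority cases the text describes.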

Other text analytic capabilities

Other techniques include geolocation, categorization, and summarization. Used in specialized situations, they serve as important metadata enrichment techniques.

Geolocation determines latitude and longitude based on a postal code or a location referenced within text. Categorization automatically assigns documents to categories based on a taxonomy. Summarization produces a summary of a document or group of documents. These techniques round out the text analytics tool chest.
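Taxonomy-based categorization can be illustrated with simple keyword overlap. This is only a sketch: the two-category taxonomy and its keywords are assumptions, and production categorizers typically rely on trained or concept-based models rather than exact word matches.

```python
import re

# Hypothetical taxonomy: category -> indicative keywords.
TAXONOMY = {
    "billing": {"invoice", "refund", "charge", "payment"},
    "network": {"outage", "signal", "coverage", "dropped"},
}

def categorize(text, taxonomy=TAXONOMY):
    """Return the category whose keywords overlap most with the text, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {
        category: len(words & keywords)
        for category, keywords in taxonomy.items()
    }
    best = max(scores, key=scores.get)
    # A document that matches no keywords stays uncategorized.
    return best if scores[best] > 0 else None

billing_doc = categorize("I was promised a refund for the double charge.")
network_doc = categorize("Calls dropped during the outage.")
```

The returned category can be stored as metadata alongside the document, which is the enrichment role described above.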

Use HPE IDOL/Vertica for unstructured analytics

DXC Technology is uniquely positioned to provide unified analytics. Using Vertica[4] and HPE IDOL[5] together enables analytics on all types of information sources. HPE IDOL has a competitive advantage with its unique pattern matching technology and text analytic capabilities for unstructured data. Vertica has a competitive advantage as a fast columnar data store built from the ground up with Big Data and analytics in mind. The integration of Vertica with HPE IDOL makes the unified Actionable Analytics framework possible. In terms of architecture, a few options are available.

DXC IDOL within Vertica

HPE IDOL UDX Pack for Vertica[6] represents a major move to incorporate HPE IDOL technologies into the native Vertica database. With the HPE IDOL User-Defined Extension Pack for Vertica, a user can use an SQL-based tool to analyze human-friendly and business data together. This is powerful for unified analytics and is appropriate for use cases that need to store native text content within Vertica.

Rather than indexing text into a standalone full-text HPE IDOL index, text can be included as a column within the Vertica database. Once it is incorporated, users can query unstructured data directly using SQL syntax. This is appropriate for business use cases where full text is a component of a data solution requiring a relational data store. In addition, only a subset of HPE IDOL functionality is exposed, making this option appropriate for more lightweight unstructured use cases.
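The idea of keeping text as a column next to structured fields, and querying both with one SQL statement, can be shown with a generic sketch. Python's built-in `sqlite3` module and a plain `LIKE` filter are used here purely as stand-ins: this is not Vertica, and the actual HPE IDOL extension functions are not named in this paper, so none are shown. The table and column names are hypothetical.

```python
import sqlite3

# In-memory database standing in for a relational store with a text column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE support_calls ("
    "call_id INTEGER PRIMARY KEY, customer_tier TEXT, transcript TEXT)"
)
conn.executemany(
    "INSERT INTO support_calls VALUES (?, ?, ?)",
    [
        (1, "gold", "I am not happy with the dropped calls this week."),
        (2, "basic", "Thanks, the billing question was resolved quickly."),
        (3, "gold", "Coverage is fine, but I was charged twice."),
    ],
)

# One SQL statement filters on a structured column and on free text together.
rows = conn.execute(
    "SELECT call_id FROM support_calls "
    "WHERE customer_tier = 'gold' AND transcript LIKE '%not happy%' "
    "ORDER BY call_id"
).fetchall()
conn.close()
```

In the integrated architecture, the substring filter would be replaced by the text analytic functions that the extension pack exposes to SQL, while the structured predicate would stay the same.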

Current HPE IDOL functionality provided within Vertica includes:

• Entity text analytics lets important entities be extracted from documents that have been imported and stored in the HPE IDOL Server.

• Geolocation extracts important postal codes, country codes, and locations.

• KeyView capabilities can be integrated so that text can be extracted from PDF, Microsoft Word, and over 1,000 other file types.

• Language detection identifies the language of a document.

• Summarization enables a document summary to be generated from text.

Using the integrated HPE IDOL capability is appropriate in the following situations:

• Documents are stored on a file share accessible by the Vertica database.

• Textual data represents a component of a database schema.

• Full HPE IDOL analytic capabilities are not required.

• Full-text search capability is not a requirement.

4 DXC Vertica MarketPlace [https://vertica.hpwsportal.com/Home/Show]

5 "The Most Advanced Search and Analytics Platform: DXC IDOL 10" [http://www.autonomy.com/assets/global/pdf/Products/Power/IDOL/20140114_PI_B_HP_IDOL10_Migration_Overview_web.pdf]

6 IDOL for Hadoop [http://www.autonomy.com/products/idol-hadoop-data]

HPE IDOL, Hadoop, and Vertica

For use cases that require full HPE IDOL text analytic capabilities, the native HPE IDOL Vertica integration may not be appropriate, and a full Vertica/HPE IDOL architecture may need to be adopted. Using HPE IDOL, Hadoop, and Vertica together is a way to combine all the analytic capabilities into one framework. A few different HPE IDOL/Hadoop integration packages are available, and the most powerful, the Extreme HPE IDOL Analytics pack,[7] supports text analytic enrichment such as sentiment analysis, entity extraction, and conceptual analysis.

The Vertica repository can integrate with this technology, along with internal Vertica Advanced Analytics capabilities such as predictive analytics. In this fashion, insights generated from the unstructured repository are integrated with the relational insights for a comprehensive view across all information sources.

In summary, the full HPE IDOL/Vertica architecture is appropriate when:

• Full-text search over a large number of documents is a priority.

• Documents reside in a diverse number of repositories, requiring diverse connectors.

• Full analytic capabilities are required.

• Structured information needs to be stored in a relational index for analytics.

• The full analytic capabilities available in a Big Data relational data store are required.

7 IDOL for Hadoop [http://www.autonomy.com/products/idol-hadoop-data]

Bridge BI and Big Data

To address new business intelligence (BI) needs and help organizations become truly data-driven, DXC has launched a suite of services known as DXC Business Intelligence Modernization Services. It provides a proven, business-led approach that bridges traditional BI with new Big Data technologies, enabling you to transition to the business analytics of the future. We use several methods to help transform BI and analytics capabilities on your terms:

• Discovery Services: Explore, test, share, and learn to understand your data and share new insights. Data lakes, data visualization tools, and services enable rapid enterprise-wide data sharing and analytic discovery collaboration.

• Analytic Solutions: Analyze, understand, and act to apply insights to your business. Solutions that address specific analytics needs help run the business better.

• Hybrid Data Management Services: Optimize, integrate, govern, and manage to become more agile through hybrid data management services. These services enable enterprises to deliver production-grade analytics, integrated into business processes and systems, that leverage 100% of relevant data.

Our end-to-end approach enables you to develop and execute on a strategic roadmap: to quickly respond to growing Big Data challenges, optimize your existing BI investments, control upfront and long-term costs, avoid technology obsolescence, and ultimately improve business outcomes in a real-time analytics environment.

Be a data-driven agile enterprise

Providing an analytic platform that enables organizations to derive the most insight has become a business imperative. The majority of information today is human-friendly information, and business analytic frameworks need to incorporate it with structured data to provide actionable analytics. DXC BI Modernization Services provides a pragmatic approach to bridging traditional BI with new Big Data technologies, enabling the transition to a data-driven and agile enterprise.

About the authors

Hussain Mazhar, Practice Leader, Analytics and Data Management

Dave Tindell, Solutions Architect, HPE IDOL solutions

Learn more at www.dxc.technology/analytics

About DXC

DXC Technology (NYSE: DXC) is the world's leading independent, end-to-end IT services company, helping clients harness the power of innovation to thrive on change. Created by the merger of CSC and the Enterprise Services business of Hewlett Packard Enterprise, DXC Technology serves nearly 6,000 private and public sector clients across 70 countries. The company's technology independence, global talent, and extensive partner alliance combine to deliver powerful next-generation IT services and solutions. DXC Technology is recognized among the best corporate citizens globally. For more information, visit www.dxc.technology.

© 2017 DXC Technology Company. All rights reserved. DXC_4AA5-6430ENW, November 2015, Rev. 1

Page 2: Gain actionable insight€¦ · disease. The knowledgebase acts as the source of truth on a subject area, operating as a textbook on key concepts and their relationships. It becomes

5

6

9

11

Table of contents

Understand the nature of unstructured data

Look beyond with unstructured data analytics

Use HPE IDOLVertica for unstructured analytics

Bridge BI and Big Data

Be a data-driven agile enterprise 11

Forward-looking enterprises are aggressively building new analytic platforms that create actionable analytics from operational and external data generated by enterprise and external data feeds This gives your organization greater insight to make better decisions and drive innovationmdashensuring better business outcomes are reached

3

Technical White Paper

Secure better outcomes with analyticsOrganizations can no longer ignore unstructured data it includes important insight into customers behaviors and trends and information generated in doctorsrsquo notes twitter feeds or financial documents and tends to be of variable quality and consistency Traditional analytic platforms need expanded capabilities to stay relevantWith increasing globalization businesses are being pushed to innovate in order to maintain a competitive advantage within their industry segments Information is ubiquitous and the ability for competitors to challenge industry leaders becomes easier as technology improves and startup costs decrease

This trend continues amidst the backdrop of information overload where the amount of information doubles every 12 to 18 months1 This new deluge of data generally consists of enterprise datamdashgenerated in the context of operations along with external data residing in social media blogs and other repositories Although overwhelming proactive companies are beginning to develop frameworks that will let them ask questions and receive answers with greater insight speed and clarity

Add to this the convergence of intelligent devices and mobile cloud social and Big Data technologies which promises to deliver another disruptivemdashbut positivemdashwave of innovation to enterprises that are truly ready to harness it The new business imperative for organizations is using new analytic platforms that integrate and correlate all of these data assets

1 ldquoKnowledge Doubling Every 12 Months Soon to Be Every 12 Hoursrdquo David Russell Schilling [http wwwindustrytapcomknowledge- doubling-every- 12-months-soon-to-be-every-12-hours3950]

Technical White Paper

4

Technical White Paper

The stakes are high and competitive businesses need to knowmdashin real timemdashwhat their customers are saying and to quickly identify or discount new areas of innovation Some of the benefits within specific industry segments can include

bull In the pharmaceutical industry integrated analytic techniques can leverage thelarge amount of clinical and scientific data to improve the drug discovery process

bull In the financial industry monitoring stock sentiment combined with pricemomentum and trends can be effectively used to more accurately predict stockprice and momentum

bull In the telecommunications industry analytic techniques can be used to improvecustomer service call center compliance and customer satisfaction

bull In the consumer goods industry monitoring reactions to a product launchmdashgenerated from news media and customers in real timemdashenables quick reaction toprotect the brand

These forward-looking solutions need to integrate three types of information operational external and internal

Operational information is generated by machines or humans with well-defined systems They consist of accounting applications radio frequency identification (RFID) devices and other applications that record structured data This data has traditionally been easy to incorporate into analytics but it represents only a fraction of the total information generated

External information is generated by customers and through external social media sites and blogs This type of information often varies in quality and importance to a business

Internal informationmdashthe majority of the content and information generatedmdashresides in emails and human-friendly data created during a work day

Creating a unified framework to bring all this information together is a unique technical challenge that requires integrating and correlating structured and unstructured data

5

Technical White Paper

Understand the nature of unstructured dataUnstructured data is approximately 80 of the information residing within an enterprise2 Itrsquos the information generated in emails presentations Microsoftreg Word documents videos and audio files These documents often reside in content management systems (CMS) file share documents and even rich media and multimedia servers This information is diverse and varied but generally has the following characteristics

bull The exact meaning of terms are not rigid making them ambiguous

bull No standard file format or document structure exists

bull Document content and style is as diverse as the human language

bull The quality of content is variable differing in quality and usefulness

Structured data on the other hand is an organizationrsquos operational data It resides in accounting systems devices and other business systems generated during the course of day-to-day operations All operational data resides in a database Structured data generally has the following characteristics

bull The meaning of data is well defined and unambiguous

bull Highly controlled methods and standards exist for entering and maintainingthe data

bull Data and data entry is restricted to work flows and applications

The primary tool for managing structured data is the database and these techniques have revolutionized business over the last 50 years The database lets businesses quickly calculate income revenue and accounts payable It gave us the modern stock market with its ability to buy and sell stocks in electronic brokerage agencies not to mention monitoring stock price and performance in real time

On the other hand the tools available to manage unstructured data today are still in their infancy The primary tool today is the Enterprise Search System3 which has gained prominence over the last two decades Prior to it no electronic mechanism existed for finding documents The paradigm of the user typing in some free-form text and receiving a set of documents is well known from Internet search engines

An early leader in this space was Autonomy which burst onto the scene with its technology rooted in pattern recognition and advanced mathematics The basic idea involves key concepts within a document and relating them to the text entered by a user The novel approach known as Adaptive Probabilistic Concept Modelling is rooted in Bayesian Inference and Shannonrsquos Information Theory the technique proved effective at finding associations between concepts within a document These techniques protected by over 120 patents continue to provide value and can significantly compound value when integrated into data governance and other structured data techniques

2 Seth Grimes ldquoBreakthrough Analysisrdquo [http breakthroughanalysiscom20080801 unstructured-data-and-the-80-percent-rule]

3 AIIM ldquoWhat is Enterprise Searchrdquo [httpwww aiimorgwhat-is-enterprise-search]

6

Technical White Paper

Look beyond with unstructured data analyticsThe analytics-driven business needs to look beyond merely finding and locating documents There are additional insights to be derived when advanced text analytics techniques can be incorporated and correlated into other forms of enterprise data

Linked data analysis

Today most companies use enterprise search in isolated data silos as a mechanism for finding documents It returns documents not data But linking data in documents to internal and external information systems enables a 360-degree view on a topic When data is linked it opens up a greater possibility for generating analytic insight Analytic agents capable of monitoring specific topics or analyzing data trends in real time becomes possible

For example a drug company might generate toxicity results that get immediately linked to externally published numbers derived from research journals The configured agents responsible for monitoring and aggregating lab results can immediately alert research scientists to discrepancies Such capabilities require the perfect marriage of unstructured text analytics with the structured knowledge residing in a database known as a knowledgebase Text analytic tools are capable of mining text to generate factsmdashfor example the toxicity is X for compound Y

A knowledgebase contains systematic knowledge of an industry area often being generated by a group of experts in the field and it models key concepts and subject relationships such as Morgan Doe is an expert in cancer research cancer is a disease The knowledgebase acts as the source of truth on a subject area operating as a textbook on key concepts and their relationships It becomes important as a guide for text analytic tools as important facts are mined and extracted It can aid in disambiguating terms and providing provenance to facts and metadata extracted from documents

For example a text mining agent might come across a group of names associated with an important journal article on cancer During the extraction process the names can be compared to the knowledgebase containing all known experts in cancer research Based on information gleaned from mined information new names might be encountered indicating the need to add them to the known list of experts in cancer research In this example data mining becomes a kind of expert location tool that continually updates the source of truth of known experts

7

Technical White Paper

Entity analysis

The first step toward enabling linked data analysis is entity extraction It recognizes important termsmdashsuch as people places locations or credit card numbersmdashlocated within text documents Entity extraction algorithms are specific to a particular industry and are usually intelligent enough to disambiguate similar names or synonyms A drug might have a number of different name variations for the same drug and entity extraction algorithms will associate them to the same drug

Other uses include locating and extracting important people places and locations Once extracted they can be cross- referenced disambiguated or incorporated into a human workflow or rules engine This cleansing process improves the quality of entities returned and ensures the accuracy of the entities as they relate to important concepts

Sentiment analysis

Another technique for gaining insight is sentiment analysis. It gauges the aggregate view on a particular subject and is especially useful in call center analysis: it can parse call center transcripts and determine whether the text expresses positive, negative, or neutral feelings toward a subject.

It's an especially powerful technique in the telecommunications industry. The phrase "I had a positive experience using brand X" indicates positive sentiment, while the phrase "I am not happy with brand X" indicates negative sentiment. When call center transcripts are run through sentiment analysis tools, the results serve as a gauge of customer service and satisfaction.

When customer sentiment is monitored in real time, it provides a mechanism for troubleshooting and pinpointing problems with customer satisfaction. Negative sentiment tied to a particular call can indicate an unhappy customer who requires further help. This kind of insight helps a company focus on customer satisfaction and reduce customer attrition. When a high-value customer calls the call center and the transcript shows highly negative sentiment, that is a signal to act in order to maintain the customer relationship. Beyond the call center, a company can also monitor the response to its products through the unfiltered comments found in blogs and tweets.
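As a rough sketch of the mechanics, a tiny lexicon-based scorer with one negation rule is enough to classify the two example phrases above. The word lists, the `sentiment_score` function, and the `needs_follow_up` rule are assumptions for illustration. Commercial sentiment engines use much richer linguistic models.

```python
import re

# Hypothetical mini-lexicons; real tools use far larger, weighted vocabularies.
POSITIVE = {"positive", "happy", "great", "good", "pleased"}
NEGATIVE = {"negative", "unhappy", "bad", "poor", "angry"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum +1/-1 per lexicon word; a negator directly before a word flips it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def label(score):
    """Map a numeric score to positive, negative, or neutral."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def needs_follow_up(transcript, high_value_customer):
    """Flag calls from high-value customers whose transcript scores negative."""
    return high_value_customer and sentiment_score(transcript) < 0
```

For instance, `label(sentiment_score("I am not happy with brand X"))` evaluates to `"negative"` because the negator flips the polarity of "happy", and `needs_follow_up` turns that score into the kind of retention alert described above.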


Conceptual analysis

Closely related to sentiment analysis, conceptual analysis identifies key concepts within textual content. This can be a powerful technique for watching new trends emerge within social media text.

For example, a healthcare company interested in monitoring patient side effects of a drug during clinical trials might use this capability. As users enter comments, they might describe their symptoms as being sleepy or nauseous, or having a loss of appetite. The technique becomes particularly important when the number of documents is too large for a human to read.

It helps the business monitor "unknown unknowns." Most drug side effects would be well documented and incorporated into structured systems. Yet there may be unknown side effects documented by a small minority of users, such as the few cases where the drug resulted in extreme itching or migraines. Conceptual analysis would uncover these situations.
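The "unknown unknowns" idea can be approximated with simple term counting, shown here as a heavily simplified sketch. It assumes that the three symptoms named in the example are the already documented ones and that a fixed symptom vocabulary is available. Both are assumptions for illustration, as are all names in the code. Real conceptual analysis derives concepts statistically and does not depend on a hand-written list.

```python
from collections import Counter

# Assumed to be already recorded in structured systems (illustrative only).
KNOWN_SIDE_EFFECTS = {"sleepy", "nauseous", "loss of appetite"}

# Hypothetical symptom vocabulary a concept extractor might recognize.
SYMPTOM_TERMS = KNOWN_SIDE_EFFECTS | {"itching", "migraine", "dizzy"}


def unknown_side_effects(comments, min_mentions=1):
    """Count comments mentioning each symptom and keep the undocumented ones."""
    counts = Counter()
    for comment in comments:
        text = comment.lower()
        for term in SYMPTOM_TERMS:
            if term in text:
                counts[term] += 1  # one vote per comment, not per occurrence
    return {
        term: n
        for term, n in counts.items()
        if term not in KNOWN_SIDE_EFFECTS and n >= min_mentions
    }


comments = [
    "Felt sleepy all afternoon.",
    "Nauseous after the second dose and a loss of appetite.",
    "Extreme itching on my arms since Monday.",
    "Sleepy, and I had a migraine for two days.",
    "Another migraine today.",
]
signals = unknown_side_effects(comments)
```

Here `signals` contains only the undocumented symptoms (itching and migraine) with the number of comments that mention each, so a reviewer sees the rare cases without reading every comment.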

Other text analytic capabilities

These techniques include geolocation, categorization, and summarization. Used in specialized situations, they serve as important metadata enrichment techniques.

Geolocation uses analytic approaches to determine latitude and longitude from a postal code or location referenced within text. Categorization automatically assigns documents to categories based on a taxonomy. Summarization condenses a document or group of documents into a summary. Together, these techniques add further text analytic tools to the tool chest.
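Of the three, categorization is the easiest to sketch. The example below assigns a document to the taxonomy category with the most keyword hits. The `TAXONOMY` contents and the `categorize` function are assumptions made up for illustration, and keyword overlap is only one of many possible ways to categorize against a taxonomy.

```python
# Hypothetical taxonomy: category -> keywords that signal it.
TAXONOMY = {
    "finance": {"invoice", "revenue", "payable", "stock"},
    "healthcare": {"patient", "dose", "clinical", "symptom"},
    "telecom": {"network", "roaming", "handset", "outage"},
}


def categorize(text, taxonomy=TAXONOMY):
    """Return the category with the most keyword hits, or None if none match."""
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    hits = {category: len(words & keywords) for category, keywords in taxonomy.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None
```

The returned category can then be stored as metadata alongside the document, which is the enrichment role described above.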


Use HPE IDOL/Vertica for unstructured analytics

DXC Technology is uniquely positioned to provide unified analytics. Using Vertica [4] and HPE IDOL [5] together enables analytics on all types of information sources. HPE IDOL has a competitive advantage in its pattern matching technology and text analytic capabilities for unstructured data. Vertica has a competitive advantage as a fast columnar data store built from the ground up with Big Data and analytics in mind. Integrating Vertica with HPE IDOL makes the unified Actionable Analytics framework possible. In terms of architecture, there are a few options available.

DXC IDOL within Vertica

The HPE IDOL UDX Pack for Vertica [6] represents a major move to incorporate HPE IDOL technologies into the native Vertica database. With the HPE IDOL User-Defined Extension Pack for Vertica, a user can use an SQL-based tool to analyze human-friendly and business data together. This is powerful for unified analytics and is appropriate for use cases that need to store native text content within Vertica.

Rather than indexing text into a standalone full-text HPE IDOL index, text can be included as a column within the Vertica database. Once it is incorporated, users can query unstructured data directly using SQL syntax. This is appropriate for business use cases where full text is one component of a data solution that requires a relational data store. In addition, only a subset of HPE IDOL functionality is exposed, which makes this option appropriate for more lightweight unstructured use cases.
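The pattern of keeping text in a column and filtering it in the same SQL statement as structured data can be illustrated with any relational engine. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in. It does not use Vertica or the HPE IDOL UDX functions, and the table names, column names, and sample rows are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a relational store with a text column.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, annual_spend REAL);
    CREATE TABLE call_notes (customer_id INTEGER, call_date TEXT, note TEXT);
    """
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Acme Ltd", 250000.0), (2, "Globex", 8000.0)],
)
conn.executemany(
    "INSERT INTO call_notes VALUES (?, ?, ?)",
    [
        (1, "2017-03-01", "Customer is not happy with billing errors."),
        (1, "2017-03-09", "Follow-up call, issue resolved."),
        (2, "2017-03-02", "Customer not happy with coverage."),
    ],
)

# Structured filter (annual_spend) and free-text filter (note) in one statement.
rows = conn.execute(
    """
    SELECT c.name, n.call_date
    FROM customers AS c
    JOIN call_notes AS n ON n.customer_id = c.id
    WHERE c.annual_spend > ? AND n.note LIKE ?
    ORDER BY n.call_date
    """,
    (100000, "%not happy%"),
).fetchall()
conn.close()
```

The query returns only the high-spend customer's complaint call. With an integrated text analytics extension, the `LIKE` predicate would presumably be replaced by richer functions, but the join between structured and unstructured columns would look much the same.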

Current HPE IDOL functionality provided within Vertica includes:

• Entity text analytics lets important entities be extracted from documents that have been imported and stored in the HPE IDOL Server

• Geolocation extracts important postal codes, country codes, and locations

• KeyView capabilities can be integrated so that text can be extracted from PDF, Microsoft Word, and over 1,000 different file types

• Language detection determines the language of the text

• Summarization generates a document summary from text

Using the integrated HPE IDOL capability is appropriate in the following situations:

• Documents are stored on a file share accessible by the Vertica database

• Textual data represents a component of a database schema

• Full HPE IDOL analytic capabilities are not required

• Full-text search capability is not a requirement

4 DXC Vertica MarketPlace [httpsvertica hpwsportalcomHomeShow]

5 "The Most Advanced Search and Analytics Platform DXC IDOL 10" [httpwwwautonomycomassets globalpdfProductsPower IDOL20140114_ PI_B_HP_IDOL10_Migration_Overview_webpdf]

6 IDOL for Hadoop [httpwwwautonomy comproductsidol-hadoop-data]


HPE IDOL, Hadoop, and Vertica

For use cases that require full HPE IDOL text analytic capabilities, the native HPE IDOL integration within Vertica may not be appropriate, and a full Vertica/HPE IDOL architecture may need to be adopted. Using HPE IDOL, Hadoop, and Vertica together is a way to combine all the analytic capabilities into one framework. There are a few different HPE IDOL/Hadoop integration packages available, and the most powerful, the Extreme HPE IDOL Analytics pack [7], supports text analytic enrichment such as sentiment analysis, entity extraction, and conceptual analysis.

The Vertica repository can integrate with this technology, along with the internal Vertica Advanced Analytics capabilities such as predictive analytics. In this fashion, insights generated from the unstructured repository are integrated with the relational insights for a comprehensive view across all information sources.

The full HPE IDOL/Vertica architecture is appropriate when:

• Full-text search over a large number of documents is a priority

• Documents reside in a diverse set of repositories requiring diverse connectors

• Full analytic capabilities are required

• Structured information needs to be stored in a relational index for analytics

• The full analytic capabilities available in a Big Data relational data store are required
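The two checklists can be condensed into a small decision helper. This is only a sketch of the selection logic as described in this paper: the `Requirements` fields and the `recommend_architecture` function are hypothetical names, and a real architecture decision would weigh more factors than these three flags.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical flags derived from the checklist criteria."""
    full_text_search: bool = False
    diverse_repositories: bool = False
    full_idol_analytics: bool = False  # e.g. sentiment or conceptual analysis


def recommend_architecture(req):
    """Map the checklist criteria to one of the two options described here."""
    if req.full_text_search or req.diverse_repositories or req.full_idol_analytics:
        return "HPE IDOL, Hadoop, and Vertica"
    return "HPE IDOL within Vertica"
```

A lightweight use case with text stored as one column of a schema leaves every flag unset and maps to the integrated option, while any need for full-text search, diverse connectors, or the full analytic set points to the combined architecture.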

7 IDOL for Hadoop [httpwwwautonomy comproductsidol-hadoop-data]


Bridge BI and Big Data

To address new business intelligence (BI) needs and help organizations become truly data-driven, DXC has launched a suite of services known as DXC Business Intelligence Modernization Services. It provides a proven, business-led approach that bridges traditional BI with new Big Data technologies, enabling you to transition to the business analytics of the future. We use several methods to help transform BI and analytics capabilities on your terms:

• Discovery Services: Explore, test, share, and learn to understand your data and share new insights. Data lakes, data visualization tools, and services enable rapid enterprise-wide data sharing and analytic discovery collaboration.

• Analytic Solutions: Analyze, understand, and act to apply insights to your business. Solutions that address specific analytics needs help run the business better.

• Hybrid Data Management Services: Optimize, integrate, govern, and manage to become more agile through hybrid data management services. Services enable enterprises to deliver production-grade analytics integrated into business processes and systems that leverage 100% of relevant data.

Our end-to-end approach enables you to develop and execute on a strategic roadmap so that you can quickly respond to growing Big Data challenges, optimize your existing BI investments, control upfront and long-term costs, avoid technology obsolescence, and ultimately improve business outcomes in a real-time analytics environment.

Be a data-driven agile enterprise

Providing an analytic platform that enables organizations to derive the most insight has become a business imperative. The majority of information today is human-friendly information, and business analytic frameworks need to combine it with structured data to provide actionable analytics. DXC BI Modernization Services provides a pragmatic approach to bridging traditional BI with new Big Data technologies, enabling the transition to a data-driven and agile enterprise.

About the authors

Hussain Mazhar, Practice Leader, Analytics and Data Management

Dave Tindell, Solutions Architect, HPE IDOL solutions

Learn more at [www.dxc.technology/analytics]


About DXC

DXC Technology (NYSE: DXC) is the world's leading independent, end-to-end IT services company, helping clients harness the power of innovation to thrive on change. Created by the merger of CSC and the Enterprise Services business of Hewlett Packard Enterprise, DXC Technology serves nearly 6,000 private and public sector clients across 70 countries. The company's technology independence, global talent, and extensive partner alliance combine to deliver powerful next-generation IT services and solutions. DXC Technology is recognized among the best corporate citizens globally. For more information, visit www.dxc.technology.

© 2017 DXC Technology Company. All rights reserved. DXC_4AA5-6430ENW, November 2015, Rev. 1

Page 4: Gain actionable insight€¦ · disease. The knowledgebase acts as the source of truth on a subject area, operating as a textbook on key concepts and their relationships. It becomes

4

Technical White Paper

The stakes are high and competitive businesses need to knowmdashin real timemdashwhat their customers are saying and to quickly identify or discount new areas of innovation Some of the benefits within specific industry segments can include

bull In the pharmaceutical industry integrated analytic techniques can leverage thelarge amount of clinical and scientific data to improve the drug discovery process

bull In the financial industry monitoring stock sentiment combined with pricemomentum and trends can be effectively used to more accurately predict stockprice and momentum

bull In the telecommunications industry analytic techniques can be used to improvecustomer service call center compliance and customer satisfaction

bull In the consumer goods industry monitoring reactions to a product launchmdashgenerated from news media and customers in real timemdashenables quick reaction toprotect the brand

These forward-looking solutions need to integrate three types of information: operational, external, and internal.

Operational information is generated by machines or humans within well-defined systems. These systems include accounting applications, radio frequency identification (RFID) devices, and other applications that record structured data. This data has traditionally been easy to incorporate into analytics, but it represents only a fraction of the total information generated.

External information is generated by customers and through external social media sites and blogs. This type of information often varies in quality and importance to a business.

Internal information, the majority of the content and information generated, resides in emails and other human-friendly data created during a work day.

Creating a unified framework to bring all this information together is a unique technical challenge that requires integrating and correlating structured and unstructured data.


Understand the nature of unstructured data

Unstructured data is approximately 80% of the information residing within an enterprise.² It’s the information generated in emails, presentations, Microsoft® Word documents, videos, and audio files. These documents often reside in content management systems (CMS), file shares, and even rich media and multimedia servers. This information is diverse and varied but generally has the following characteristics:

• The exact meaning of terms is not rigid, making them ambiguous.

• No standard file format or document structure exists.

• Document content and style are as diverse as human language.

• Content varies in quality and usefulness.

Structured data, on the other hand, is an organization’s operational data. It resides in accounting systems, devices, and other business systems and is generated during the course of day-to-day operations. All operational data resides in a database. Structured data generally has the following characteristics:

• The meaning of data is well defined and unambiguous.

• Highly controlled methods and standards exist for entering and maintaining the data.

• Data and data entry are restricted to workflows and applications.

The primary tool for managing structured data is the database, and database techniques have revolutionized business over the last 50 years. The database lets businesses quickly calculate income, revenue, and accounts payable. It gave us the modern stock market, with its ability to buy and sell stocks through electronic brokerage agencies, not to mention monitoring stock price and performance in real time.

On the other hand, the tools available to manage unstructured data today are still in their infancy. The primary tool today is the Enterprise Search System,³ which has gained prominence over the last two decades. Prior to it, no electronic mechanism existed for finding documents. The paradigm of the user typing in some free-form text and receiving a set of documents is well known from Internet search engines.

An early leader in this space was Autonomy, which burst onto the scene with technology rooted in pattern recognition and advanced mathematics. The basic idea involves identifying key concepts within a document and relating them to the text entered by a user. The novel approach, known as Adaptive Probabilistic Concept Modelling, is rooted in Bayesian inference and Shannon’s information theory; the technique proved effective at finding associations between concepts within a document. These techniques, protected by over 120 patents, continue to provide value and can significantly compound value when integrated into data governance and other structured data techniques.

2 Seth Grimes, “Breakthrough Analysis” [http://breakthroughanalysis.com/2008/08/01/unstructured-data-and-the-80-percent-rule]

3 AIIM, “What is Enterprise Search?” [http://www.aiim.org/what-is-enterprise-search]


Look beyond with unstructured data analytics

The analytics-driven business needs to look beyond merely finding and locating documents. Additional insights can be derived when advanced text analytics techniques are incorporated and correlated with other forms of enterprise data.

Linked data analysis

Today, most companies use enterprise search in isolated data silos as a mechanism for finding documents. It returns documents, not data. But linking data in documents to internal and external information systems enables a 360-degree view of a topic. When data is linked, it opens up a greater possibility for generating analytic insight. Analytic agents capable of monitoring specific topics or analyzing data trends in real time become possible.

For example, a drug company might generate toxicity results that get immediately linked to externally published numbers derived from research journals. The configured agents responsible for monitoring and aggregating lab results can immediately alert research scientists to discrepancies. Such capabilities require the marriage of unstructured text analytics with structured knowledge residing in a database, known as a knowledgebase. Text analytic tools are capable of mining text to generate facts, for example, “the toxicity is X for compound Y.”
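The monitoring agent described above can be illustrated with a minimal Python sketch. It assumes that text mining has already produced facts of the form "the toxicity is X for compound Y"; the `ToxicityFact` type, the `PUBLISHED` reference values, the compound names, and the 10% tolerance are all hypothetical and are not taken from any HPE IDOL or Vertica interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToxicityFact:
    """A mined fact: 'the toxicity is <value> for compound <compound>'."""
    compound: str
    value: float  # hypothetical unit; any consistent unit works


# Hypothetical reference values standing in for externally published numbers.
PUBLISHED = {"compound-y": 120.0, "compound-z": 45.0}


def find_discrepancies(lab_results, published, tolerance=0.10):
    """Return (fact, reference) pairs whose relative difference exceeds tolerance."""
    alerts = []
    for fact in lab_results:
        reference = published.get(fact.compound.lower())
        if reference is None:
            continue  # no published number to compare against
        if abs(fact.value - reference) / reference > tolerance:
            alerts.append((fact, reference))
    return alerts


lab_results = [
    ToxicityFact("Compound-Y", 118.0),  # close to the published 120.0
    ToxicityFact("Compound-Z", 60.0),   # well above the published 45.0
]
alerts = find_discrepancies(lab_results, PUBLISHED)
for fact, reference in alerts:
    print(f"ALERT: {fact.compound} measured {fact.value}, published {reference}")
```

In this sketch only the second result is flagged, because it differs from its reference value by more than the chosen tolerance. A production agent would also need unit normalization and provenance for each published number.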

A knowledgebase contains systematic knowledge of an industry area, often generated by a group of experts in the field, and it models key concepts and subject relationships, such as “Morgan Doe is an expert in cancer research” and “cancer is a disease.” The knowledgebase acts as the source of truth on a subject area, operating as a textbook on key concepts and their relationships. It becomes important as a guide for text analytic tools as important facts are mined and extracted. It can aid in disambiguating terms and providing provenance to facts and metadata extracted from documents.

For example, a text mining agent might come across a group of names associated with an important journal article on cancer. During the extraction process, the names can be compared to the knowledgebase containing all known experts in cancer research. New names might be encountered in the mined information, indicating the need to add them to the known list of experts in cancer research. In this example, data mining becomes a kind of expert location tool that continually updates the source of truth of known experts.
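The expert location example can be sketched in a few lines of Python. The sketch assumes the knowledgebase is held as a set of (subject, relation, object) triples; the relation names `expert_in` and `is_a` and the author "Alex Roe" are hypothetical, chosen only for illustration.

```python
# Knowledgebase as (subject, relation, object) triples; entries are illustrative.
knowledgebase = {
    ("Morgan Doe", "expert_in", "cancer research"),
    ("cancer", "is_a", "disease"),
}


def known_experts(kb, field):
    """Return the names the knowledgebase lists as experts in the given field."""
    return {s for (s, rel, o) in kb if rel == "expert_in" and o == field}


def update_experts(kb, extracted_names, field):
    """Add names not yet known as experts; return the set of names that were new."""
    new_names = set(extracted_names) - known_experts(kb, field)
    for name in new_names:
        kb.add((name, "expert_in", field))
    return new_names


# Names extracted from a journal article by a text mining agent (hypothetical).
article_authors = ["Morgan Doe", "Alex Roe"]
added = update_experts(knowledgebase, article_authors, "cancer research")
print(sorted(added))
print(sorted(known_experts(knowledgebase, "cancer research")))
```

Here new names are added automatically to keep the sketch short. In practice, a review step would probably sit between extraction and the update, since the knowledgebase is meant to remain the source of truth.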


Entity analysis

The first step toward enabling linked data analysis is entity extraction. It recognizes important terms, such as people, places, locations, or credit card numbers, located within text documents. Entity extraction algorithms are specific to a particular industry and are usually intelligent enough to disambiguate similar names or synonyms. A drug might have a number of different name variations, and entity extraction algorithms will associate them with the same drug.

Other uses include locating and extracting important people, places, and locations. Once extracted, they can be cross-referenced, disambiguated, or incorporated into a human workflow or rules engine. This cleansing process improves the quality of entities returned and the accuracy of the entities as they relate to important concepts.
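As a rough illustration of how name variations can be resolved to a single entity, the following Python sketch uses a small alias dictionary and a regular expression. It is a deliberately simple stand-in for the industry-specific extraction algorithms described above, not a description of how HPE IDOL performs entity extraction; the `ALIASES` table and the sample note are assumptions made for the example.

```python
import re

# Alias -> canonical drug name. A real system would load this from a knowledgebase.
ALIASES = {
    "acetaminophen": "acetaminophen",
    "paracetamol": "acetaminophen",
    "tylenol": "acetaminophen",
    "ibuprofen": "ibuprofen",
    "advil": "ibuprofen",
}

# One case-insensitive alternation over all aliases, longest alias first.
PATTERN = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, ALIASES), key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def extract_drugs(text):
    """Return canonical drug names mentioned in text, in order of first mention."""
    seen = []
    for match in PATTERN.finditer(text):
        canonical = ALIASES[match.group(1).lower()]
        if canonical not in seen:
            seen.append(canonical)
    return seen


note = "Patient took Tylenol in the morning and paracetamol at night, then Advil."
print(extract_drugs(note))
```

Both "Tylenol" and "paracetamol" resolve to the same canonical entry, so the note yields two distinct drugs rather than three. Dictionary lookup cannot handle misspellings or context-dependent names, which is where more capable extraction techniques come in.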

Sentiment analysis

Another technique for gaining insight is sentiment analysis. It’s a powerful technique for gauging the aggregate view on a particular subject and is especially useful in call center analysis. It can parse call center transcripts and determine whether the text contains positive, negative, or neutral feelings toward a subject.

It’s an especially powerful technique when used in the telecommunications industry. The phrase “I had a positive experience using brand X” indicates positive sentiment, while the phrase “I am not happy with brand X” indicates negative sentiment. When call center transcripts are run through sentiment analysis tools, they can serve as a gauge for providing insight into customer service and satisfaction.

When customer sentiment is monitored in real time, it provides a mechanism for troubleshooting and pinpointing problems with customer satisfaction. Negative sentiment tied to a particular customer call can indicate an unhappy customer who requires further help. This kind of analytic insight provides a mechanism for focusing on customer satisfaction and reducing customer attrition. When a high-value customer calls into the call center and highly negative sentiment is identified in the transcripts, it is an indication that action needs to be taken to maintain the customer relationship. In the same way, a company can monitor the response to its product based on the unfiltered comments found in blogs and tweets.
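The two example phrases can be scored with a minimal lexicon-based sketch in Python. This is a toy approach, assuming a tiny hand-written word list and a one-word negation rule; commercial sentiment engines are considerably more sophisticated, and the `LEXICON`, `NEGATORS`, and `needs_follow_up` names are hypothetical.

```python
import re

# Tiny illustrative lexicon: word -> polarity.
LEXICON = {"positive": 1, "happy": 1, "great": 1,
           "negative": -1, "unhappy": -1, "terrible": -1}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum word polarities, flipping a word's polarity when a negator precedes it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def label(score):
    """Map a numeric score to positive, negative, or neutral."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def needs_follow_up(transcript, high_value):
    """Flag calls from high-value customers whose transcript scores negative."""
    return high_value and sentiment_score(transcript) < 0


for phrase in ("I had a positive experience using brand X",
               "I am not happy with brand X"):
    print(phrase, "->", label(sentiment_score(phrase)))
```

The `needs_follow_up` helper mirrors the escalation rule described above: a negative transcript alone is informative, but combined with a structured attribute such as customer value it becomes actionable.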


Conceptual analysis

Closely related to sentiment analysis, conceptual analysis identifies key concepts within textual content. This can be a powerful technique for watching new trends emerge within social media text.

For example, a healthcare company interested in monitoring patient side effects of a drug during clinical trials might use the capability. As users enter comments, they might describe their symptoms as being sleepy or nauseous, or having a loss of appetite. This technique becomes particularly important when the number of documents becomes too large for a human to read.

It helps the business monitor “unknown unknowns.” Most drug side effects would be well documented and incorporated into structured systems. Yet there may be unknown side effects documented by a small minority of users, such as the few cases where the drug resulted in extreme itching or migraines. Conceptual analysis would uncover these situations.
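One crude way to surface such "unknown unknowns" is to look for terms that recur in new comments but never appeared in an earlier baseline. The Python sketch below does this with simple document frequencies. It is a term-counting approximation, not the statistical concept modelling described earlier in this paper, and the comment texts and the `min_docs` threshold are invented for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def emerging_terms(baseline, new, min_docs=2):
    """Return terms absent from the baseline that occur in at least min_docs new documents."""
    known = {token for doc in baseline for token in tokenize(doc)}
    doc_freq = Counter()
    for doc in new:
        doc_freq.update(set(tokenize(doc)) - known)
    return sorted(term for term, n in doc_freq.items() if n >= min_docs)


# Earlier comments: the already documented side effects.
baseline_comments = [
    "felt sleepy after the first dose",
    "nauseous and sleepy all day",
    "loss of appetite after the second dose",
]
# Newer comments: a small minority mention something undocumented.
new_comments = [
    "sleepy all day after the dose",
    "extreme itching after the first dose",
    "itching and a migraine on day two",
    "migraine after the second dose",
]
print(emerging_terms(baseline_comments, new_comments))
```

Requiring a term to appear in more than one new document filters out one-off words, at the cost of missing a side effect reported only once.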

Other text analytic capabilities

These techniques include geolocation, categorization, and summarization. Used in specialized situations, they serve as important metadata enrichment techniques.

Geolocation uses complex analytic approaches to determine latitude and longitude based on a postal code or location referenced within text. Categorization automatically categorizes documents based on a taxonomy. Summarization summarizes a document or group of documents. These techniques provide additional text analytic approaches that can be leveraged as further tools in the tool chest.
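Of these techniques, taxonomy-based categorization is the simplest to illustrate. The following Python sketch assigns a document to the category whose keywords it mentions most often; the three-category `TAXONOMY` and its keywords are hypothetical, and real categorization engines typically use trained models rather than keyword overlap.

```python
import re

# Hypothetical taxonomy: category -> characteristic keywords.
TAXONOMY = {
    "finance": {"invoice", "revenue", "payable", "stock"},
    "healthcare": {"patient", "drug", "clinical", "dose"},
    "telecom": {"call", "network", "roaming", "handset"},
}


def categorize(text, taxonomy=TAXONOMY):
    """Return the category with the most keyword hits, or None if nothing matches."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    hits = {category: len(tokens & keywords) for category, keywords in taxonomy.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


print(categorize("The patient reported nausea after the second dose of the drug"))
print(categorize("Quarterly revenue and stock performance"))
print(categorize("Hello world"))
```

The resulting category can then be stored as metadata alongside the document, which is what makes it usable in later structured queries.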


Use HPE IDOL/Vertica for unstructured analytics

DXC Technology is uniquely positioned to provide unified analytics. Using Vertica⁴ and HPE IDOL⁵ together enables analytics to be done on all types of information sources. HPE IDOL has a competitive advantage with its unique pattern matching technology and text analytic capabilities for unstructured data. Vertica has a competitive advantage as a fast columnar data store built from the ground up with Big Data and analytics in mind. The integration of Vertica with HPE IDOL is a natural fit and makes the unified Actionable Analytics framework possible. In terms of architecture, there are a few options available.

HPE IDOL within Vertica

HPE IDOL UDX Pack for Vertica⁶ represents a major move to incorporate HPE IDOL technologies into the native Vertica database. With the HPE IDOL User-Defined Extension Pack for Vertica, a user can use an SQL-based tool to analyze human-friendly and business data together. This becomes powerful for unified analytics and is appropriate for use cases that need to store native text content within Vertica.

Rather than indexing text into a standalone full-text HPE IDOL index, text can be included as a column within the Vertica database. Once incorporated, users can query unstructured data directly using SQL syntax. This is appropriate for business use cases where full text is a component of a data solution requiring a relational data store. In addition, only a subset of HPE IDOL functionality is exposed, making it appropriate for more lightweight unstructured use cases.
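The general pattern of storing text as a column and querying it with SQL through a user-defined function can be sketched with Python's built-in `sqlite3` module. SQLite is used here purely as a stand-in so the example runs anywhere; it is not Vertica, and the `contains_complaint` function, the `calls` table, and its columns are hypothetical rather than part of the HPE IDOL UDX Pack.

```python
import sqlite3


def contains_complaint(text):
    """Toy text-analytic function: 1 if the transcript mentions a complaint keyword."""
    keywords = ("not happy", "cancel", "refund")
    lowered = text.lower()
    return int(any(keyword in lowered for keyword in keywords))


conn = sqlite3.connect(":memory:")
# Register the Python function so it can be called from SQL.
conn.create_function("contains_complaint", 1, contains_complaint)

# Structured columns and free text live side by side in one table.
conn.execute(
    "CREATE TABLE calls (customer_id INTEGER, annual_spend REAL, transcript TEXT)"
)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        (1, 12000.0, "I am not happy with brand X and want a refund"),
        (2, 300.0, "Thanks, the upgrade works well"),
        (3, 9500.0, "Please cancel my plan"),
    ],
)

# One query combines a text-derived condition with a structured one.
rows = conn.execute(
    "SELECT customer_id, annual_spend FROM calls "
    "WHERE contains_complaint(transcript) = 1 AND annual_spend > 5000 "
    "ORDER BY annual_spend DESC"
).fetchall()
print(rows)
```

The point of the sketch is the shape of the query: a single SQL statement filters on both a text-derived signal and a business attribute, which is the kind of unified analysis the integrated approach is meant to enable.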

Current HPE IDOL functionality provided within Vertica includes:

• Entity text analytics is supported and lets important entities be extracted from documents that have been imported and stored in the HPE IDOL Server.

• Geolocation extracts important postal codes, country codes, and locations.

• KeyView capabilities can be integrated so text can be extracted from PDF, Microsoft Word, and over 1,000 other file types.

• Language detection determines the language of the text.

• Summarization enables a document summary to be generated from text.

Using the integrated HPE IDOL capability is appropriate in the following situations:

• Documents are stored on a file share accessible by the Vertica database.

• Textual data represents a component of a database schema.

• Full HPE IDOL analytic capabilities are not required.

• Full-text search capability is not a requirement.

4 DXC Vertica MarketPlace [https://vertica.hpwsportal.com/Home/Show]

5 “The Most Advanced Search and Analytics Platform: DXC IDOL 10” [http://www.autonomy.com/assets/global/pdf/Products/Power/IDOL/20140114_PI_B_HP_IDOL10_Migration_Overview_web.pdf]

6 IDOL for Hadoop [http://www.autonomy.com/products/idol-hadoop-data]


HPE IDOL, Hadoop, and Vertica

For use cases that require full HPE IDOL text analytic capabilities, the native HPE IDOL Vertica integration may not be appropriate, and a full Vertica/HPE IDOL architecture may need to be adopted. Using HPE IDOL, Hadoop, and Vertica together is a way to combine all the analytic capabilities into one framework. There are a few different HPE IDOL/Hadoop integration packages available, and the most powerful, the Extreme HPE IDOL Analytics pack,⁷ supports text analytic enrichment such as sentiment analysis, entity extraction, and conceptual analysis.

The Vertica repository can integrate with this technology, along with the internal Vertica Advanced Analytics capabilities such as predictive analytics. In this fashion, insights generated from the unstructured repository are integrated with the relational insights for a comprehensive view across all information sources.

In summary, the full HPE IDOL/Vertica architecture is appropriate when:

• Full-text search over a large number of documents is a priority.

• Documents reside in a diverse set of repositories, requiring diverse connectors.

• Full analytic capabilities are required.

• Structured information needs to be stored in a relational index for analytics.

• The full analytic capabilities available in a Big Data relational data store are required.

7 IDOL for Hadoop [http://www.autonomy.com/products/idol-hadoop-data]


Bridge BI and Big Data

To help organizations address new business intelligence (BI) needs and become truly data-driven, DXC has launched a suite of services known as DXC Business Intelligence Modernization Services. It provides a proven, business-led approach that bridges traditional BI with new Big Data technologies, enabling you to transition to the business analytics of the future. We use several methods to help transform BI and analytics capabilities on your terms:

• Discovery Services: Explore, test, share, and learn to understand your data and share new insights. Data lakes, data visualization tools, and services enable rapid enterprise-wide data sharing and analytic discovery collaboration.

• Analytic Solutions: Analyze, understand, and act to apply insights to your business. Solutions that address specific analytics needs help run the business better.

• Hybrid Data Management Services: Optimize, integrate, govern, and manage to become more agile through hybrid data management services. Services enable enterprises to deliver production-grade analytics integrated into business processes and systems that leverage 100% of relevant data.

Our end-to-end approach enables you to develop and execute on a strategic roadmap to quickly respond to growing Big Data challenges, optimize your existing BI investments, control upfront and long-term costs, avoid technology obsolescence, and ultimately improve business outcomes in a real-time analytics environment.

Be a data-driven, agile enterprise

Providing an analytic platform that enables organizations to derive the most insight has become a business imperative. The majority of information today consists of human-friendly information; business analytic frameworks need to incorporate it with structured data to provide actionable analytics. DXC BI Modernization Services provides a pragmatic approach to bridging traditional BI with new Big Data technologies, enabling the transition to a data-driven and agile enterprise.

About the authors

Hussain Mazhar, Practice Leader, Analytics and Data Management

Dave Tindell, Solutions Architect, HPE IDOL solutions

Learn more at www.dxc.technology/analytics



Look beyond with unstructured data analytics

The analytics-driven business needs to look beyond merely finding and locating documents. Additional insights can be derived when advanced text analytics techniques are incorporated into, and correlated with, other forms of enterprise data.

Linked data analysis

Today, most companies use enterprise search in isolated data silos as a mechanism for finding documents. It returns documents, not data. But linking data in documents to internal and external information systems enables a 360-degree view of a topic. When data is linked, it opens up greater possibilities for generating analytic insight. Analytic agents capable of monitoring specific topics or analyzing data trends in real time become possible.

For example, a drug company might generate toxicity results that are immediately linked to externally published numbers derived from research journals. The configured agents responsible for monitoring and aggregating lab results can immediately alert research scientists to discrepancies. Such capabilities require the perfect marriage of unstructured text analytics with the structured knowledge residing in a database known as a knowledgebase. Text analytic tools are capable of mining text to generate facts, for example, “the toxicity is X for compound Y.”
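The monitoring agent in the drug-company example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ToxicityFact` record, the `find_discrepancies` function, and the 10 percent relative tolerance are hypothetical names and values chosen for the sketch, not part of HPE IDOL or Vertica.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToxicityFact:
    """A mined fact of the form 'the toxicity is X for compound Y' (hypothetical record)."""
    compound: str
    value: float   # toxicity measurement; units are assumed to be consistent
    source: str    # e.g. "lab" or the name of a journal


def find_discrepancies(lab_facts, published_facts, tolerance=0.10):
    """Return (lab, published) pairs whose values differ by more than
    `tolerance`, measured relative to the published value."""
    published_by_compound = {}
    for fact in published_facts:
        published_by_compound.setdefault(fact.compound, []).append(fact)

    alerts = []
    for lab in lab_facts:
        # Link each lab result to published numbers for the same compound.
        for pub in published_by_compound.get(lab.compound, []):
            baseline = max(abs(pub.value), 1e-12)  # avoid division by zero
            if abs(lab.value - pub.value) / baseline > tolerance:
                alerts.append((lab, pub))
    return alerts
```

Each returned pair is a candidate alert for a research scientist; a production agent would also need unit normalization and provenance checks, which are omitted here.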

A knowledgebase contains systematic knowledge of an industry area, often generated by a group of experts in the field, and it models key concepts and subject relationships, such as “Morgan Doe is an expert in cancer research” and “cancer is a disease.” The knowledgebase acts as the source of truth on a subject area, operating as a textbook on key concepts and their relationships. It becomes important as a guide for text analytic tools as important facts are mined and extracted. It can aid in disambiguating terms and in providing provenance for facts and metadata extracted from documents.

For example, a text mining agent might come across a group of names associated with an important journal article on cancer. During the extraction process, the names can be compared to the knowledgebase containing all known experts in cancer research. New names might be encountered in the mined information, indicating the need to add them to the known list of experts in cancer research. In this example, data mining becomes a kind of expert location tool that continually updates the source of truth of known experts.
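The expert-location example can be sketched as a comparison between extracted names and the knowledgebase. The function names and the simple case and whitespace normalization below are assumptions made for illustration; a real system would use the knowledgebase's own disambiguation rules.

```python
def normalize(name):
    """Lower-case a name and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def reconcile_experts(extracted_names, known_experts):
    """Split names mined from an article into those already in the
    knowledgebase and new candidates that may need to be added."""
    known = {normalize(n) for n in known_experts}
    matched, candidates = [], []
    seen = set()
    for name in extracted_names:
        key = normalize(name)
        if key in seen:          # skip repeated mentions of the same name
            continue
        seen.add(key)
        (matched if key in known else candidates).append(name)
    return matched, candidates
```

Candidates would normally go to a human reviewer or a rules engine before being written back to the source of truth.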

Entity analysis

The first step toward enabling linked data analysis is entity extraction. It recognizes important terms, such as people, places, locations, or credit card numbers, located within text documents. Entity extraction algorithms are specific to a particular industry and are usually intelligent enough to disambiguate similar names or synonyms. A drug might have a number of different name variations, and entity extraction algorithms will associate them with the same drug.

Other uses include locating and extracting important people, places, and locations. Once extracted, they can be cross-referenced, disambiguated, or incorporated into a human workflow or rules engine. This cleansing process improves the quality of the entities returned and ensures their accuracy as they relate to important concepts.
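As a minimal sketch of how name variations can be associated with the same drug, the following Python uses a small alias dictionary and a regular expression. The alias table is an illustrative assumption (it is not a complete drug vocabulary), and dictionary lookup is a much simpler technique than the industry-specific extraction algorithms described above.

```python
import re

# Hypothetical alias table: canonical drug name -> known name variations.
ALIASES = {
    "acetaminophen": ["acetaminophen", "paracetamol", "tylenol"],
    "ibuprofen": ["ibuprofen", "advil", "motrin"],
}


def build_pattern(aliases):
    """Compile one case-insensitive pattern matching every alias as a whole word."""
    lookup = {}
    for canonical, names in aliases.items():
        for name in names:
            lookup[name.lower()] = canonical
    # Longest aliases first so longer names win over shorter overlapping ones.
    alternation = "|".join(
        re.escape(name) for name in sorted(lookup, key=len, reverse=True)
    )
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE), lookup


def extract_drugs(text, aliases=ALIASES):
    """Return (mention, canonical_name) pairs in order of appearance."""
    pattern, lookup = build_pattern(aliases)
    return [(m.group(0), lookup[m.group(0).lower()]) for m in pattern.finditer(text)]
```

Both "Tylenol" and "paracetamol" resolve to the same canonical entity, which is what allows later cross-referencing against structured data.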

Sentiment analysis

Another technique for gaining insight is sentiment analysis. It’s a powerful technique for gauging the aggregate view on a particular subject and is particularly useful in call center analysis. It can parse call center transcripts and determine whether the text contains positive, negative, or neutral feelings toward a subject.

It’s an especially powerful technique when used in the telecommunications industry. The phrase “I had a positive experience using brand X” indicates positive sentiment, while the phrase “I am not happy with brand X” indicates negative sentiment. When call center transcripts are run through sentiment analysis tools, they can serve as a gauge of customer service and satisfaction.

When customer sentiment is monitored in real time, it provides a mechanism for troubleshooting and pinpointing problems with customer satisfaction. Negative sentiment tied to a particular customer call can indicate an unhappy customer who requires further help. This kind of analytic insight provides a mechanism for focusing on customer satisfaction and reducing customer attrition. When a high-value customer calls into the call center and highly negative sentiment is identified in the transcripts, it is an indication that action needs to be taken to maintain the customer relationship. Beyond the call center, a company can monitor the response to its product through the unfiltered comments found in blogs and tweets.
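The two example phrases above can be scored with a very small lexicon-based sketch. The word lists, the one-word negation rule, and the `needs_follow_up` helper are assumptions for illustration only; commercial sentiment engines use far richer linguistic models.

```python
import re

# Tiny illustrative lexicons (assumed, not from any product).
POSITIVE = {"positive", "happy", "great", "helpful"}
NEGATIVE = {"negative", "unhappy", "terrible", "frustrated"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum word polarities, flipping a word's polarity when it directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = 1 if token in POSITIVE else -1 if token in NEGATIVE else 0
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def label(text):
    """Map a score to 'positive', 'negative', or 'neutral'."""
    score = sentiment_score(text)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def needs_follow_up(transcript, high_value_customer):
    """Flag a call for action when a high-value customer expresses negative sentiment."""
    return high_value_customer and sentiment_score(transcript) < 0
```

With these lexicons, `label("I had a positive experience using brand X")` yields `"positive"` and `label("I am not happy with brand X")` yields `"negative"`, matching the examples in the text.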

Conceptual analysis

Closely related to sentiment analysis, conceptual analysis identifies key concepts within textual content. This can be a powerful technique for watching new trends emerge within social media text.

For example, a healthcare company interested in monitoring patient side effects of a drug during clinical trials might use the capability. As users enter comments, they might describe their symptoms as being sleepy, nauseous, or having a loss of appetite. This technique becomes particularly important when the number of documents becomes too large for a human to read.

It helps the business monitor “unknown unknowns.” Most drug side effects would be well documented and incorporated into structured systems. Yet there may be unknown side effects documented by a small minority of users, such as the few cases where the drug resulted in extreme itching or migraines. Conceptual analysis would uncover these situations.
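One way to picture the “unknown unknowns” idea is to count terms that recur across patient comments but are absent from the list of documented side effects. The sketch below uses plain term frequency as a stand-in for conceptual analysis, which in practice relies on statistical concept modeling; the stopword list, the known side-effect set, and the `min_count` threshold are all assumptions.

```python
import re
from collections import Counter

# Illustrative assumptions: documented side effects and a minimal stopword list.
KNOWN_SIDE_EFFECTS = {"sleepy", "nauseous", "appetite"}
STOPWORDS = {"i", "a", "the", "and", "of", "my", "feel", "after", "all", "again"}


def emerging_terms(comments, known=KNOWN_SIDE_EFFECTS, min_count=2):
    """Return terms that appear in at least `min_count` comments and are
    neither documented side effects nor stopwords."""
    counts = Counter()
    for comment in comments:
        # Use a set so each comment contributes at most once per term.
        tokens = set(re.findall(r"[a-z]+", comment.lower()))
        counts.update(tokens - known - STOPWORDS)
    return [term for term, n in counts.most_common() if n >= min_count]
```

Terms surfaced this way are only candidates; a reviewer still has to decide whether a recurring word such as "itching" represents a genuine new side effect.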

Other text analytic capabilities

These techniques include geolocation, categorization, and summarization. Used in specialized situations, they serve as important metadata enrichment techniques.

Geolocation uses complex analytic approaches to determine latitude and longitude based on a postal code or location referenced within text. Categorization automatically categorizes documents based on a taxonomy. Summarization summarizes a document or group of documents. These techniques provide further text analytic approaches that can be leveraged as additional tools in the tool chest.
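Taxonomy-based categorization can be illustrated with a keyword-overlap sketch. The two-category taxonomy and the scoring rule (count of matching keywords) are hypothetical simplifications and do not describe how any particular product categorizes documents.

```python
import re

# Hypothetical taxonomy: category -> indicative keywords.
TAXONOMY = {
    "finance": {"invoice", "payment", "loan"},
    "healthcare": {"patient", "drug", "trial"},
}


def categorize(text, taxonomy=TAXONOMY):
    """Return the category whose keywords overlap most with the text,
    or None when no keyword matches at all."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {category: len(tokens & keywords) for category, keywords in taxonomy.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

The resulting category can be stored alongside the document as metadata, which is the enrichment role described above.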

Use HPE IDOL/Vertica for unstructured analytics

DXC Technology is uniquely positioned to provide unified analytics. Using Vertica [4] and HPE IDOL [5] together enables analytics to be done on all types of information sources. HPE IDOL has a competitive advantage with its unique pattern matching technology and text analytic capabilities for unstructured data. Vertica has a competitive advantage as a fast columnar data store built from the ground up with Big Data and analytics in mind. The integration of Vertica with HPE IDOL is the perfect marriage and makes the unified Actionable Analytics framework possible. In terms of architecture, there are a few options available.

HPE IDOL within Vertica

HPE IDOL UDX Pack for Vertica [6] represents a major move to incorporate HPE IDOL technologies into the native Vertica database. With the HPE IDOL User-Defined Extension Pack for Vertica, a user can use an SQL-based tool to analyze human-friendly and business data together. This is powerful for unified analytics and is appropriate for use cases that need to store native text content within Vertica.

Rather than indexing text into a standalone full-text HPE IDOL index, text can be included as a column within the Vertica database. Once incorporated, users can query unstructured data directly using SQL syntax. This is appropriate for business use cases where full text is a component of a data solution requiring a relational data store. In addition, only a subset of HPE IDOL functionality is exposed, making it appropriate for more lightweight unstructured use cases.
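The idea of text stored as a column and analyzed through SQL with a user-defined function can be illustrated by analogy. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Vertica, and the `summarize` function, the `support_calls` table, and its columns are hypothetical; none of this is the actual HPE IDOL UDX Pack API or Vertica syntax.

```python
import sqlite3


def summarize(text, max_words=8):
    """Toy summarizer: keep the first few words (a stand-in for real summarization)."""
    words = text.split()
    suffix = "..." if len(words) > max_words else ""
    return " ".join(words[:max_words]) + suffix


def query_summaries(region):
    """Store transcripts next to structured columns and summarize them in SQL."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function so SQL can call it, similar in spirit to a UDx.
    conn.create_function("summarize", 1, summarize)
    conn.execute(
        "CREATE TABLE support_calls (customer_id INTEGER, region TEXT, transcript TEXT)"
    )
    conn.executemany(
        "INSERT INTO support_calls VALUES (?, ?, ?)",
        [
            (1, "EMEA", "The router keeps dropping the connection every evening after the update"),
            (2, "APAC", "Billing question resolved"),
        ],
    )
    rows = conn.execute(
        "SELECT customer_id, region, summarize(transcript) "
        "FROM support_calls WHERE region = ? ORDER BY customer_id",
        (region,),
    ).fetchall()
    conn.close()
    return rows
```

The point of the analogy is that a structured filter (`region`) and a text analytic function (`summarize`) appear in the same SQL statement, which is what a relational store with embedded text functions makes possible.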

Current HPE IDOL functionality provided within Vertica includes:

• Entity text analytics is supported and lets important entities be extracted from documents that have been imported and stored in the HPE IDOL Server

• Geolocation extracts important postal codes, country codes, and locations

• KeyView capabilities can be integrated so text can be extracted from PDF, Microsoft Word, and over 1,000 different file types

• Language detection determines the language of the text

• Summarization enables a document summary to be generated from text

Using the integrated HPE IDOL capability is appropriate in the following situations:

• Documents are stored on a file share accessible by the Vertica database

• Textual data represents a component of a database schema

• Full HPE IDOL analytic capabilities are not required

• Full-text search capability is not a requirement

4 DXC Vertica MarketPlace [httpsvertica hpwsportalcomHomeShow]

5 “The Most Advanced Search and Analytics Platform DXC IDOL 10” [httpwwwautonomycomassets globalpdfProductsPower IDOL20140114_ PI_B_HP_IDOL10_Migration_Overview_webpdf]

6 IDOL for Hadoop [httpwwwautonomy comproductsidol-hadoop-data]

HPE IDOL, Hadoop, and Vertica

For use cases that require full HPE IDOL text analytic capabilities, the native HPE IDOL Vertica integration may not be appropriate, and a full Vertica/HPE IDOL architecture may need to be adopted. Using HPE IDOL, Hadoop, and Vertica together is a way to combine all the analytic capabilities into one framework. There are a few different HPE IDOL/Hadoop integration packages available, and the most powerful, the Extreme HPE IDOL Analytics pack [7], supports text analytic enrichment such as sentiment analysis, entity extraction, and conceptual analysis.

The Vertica repository can integrate with this technology along with the internal Vertica Advanced Analytics capabilities, such as predictive analytics. In this fashion, insights generated from the unstructured repository are integrated with the relational insights for a comprehensive view across all information sources.

In summary, the full HPE IDOL/Vertica architecture is appropriate when:

• Full-text search over a large number of documents is a priority

• Documents reside in a diverse set of repositories requiring diverse connectors

• Full analytic capabilities are required

• Structured information needs to be stored in a relational index for analytics

• The full analytic capabilities available in a Big Data relational data store are required
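The selection criteria for the two architectures can be condensed into a small decision helper. This is a hedged sketch: the `UseCase` fields and the `recommend_architecture` function are hypothetical names, and the rule order (full architecture takes precedence whenever any of its criteria applies) is an interpretation of the lists in this paper rather than official guidance.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """Hypothetical checklist capturing the criteria listed in this paper."""
    needs_full_text_search: bool
    needs_full_idol_analytics: bool
    needs_diverse_connectors: bool
    text_is_part_of_schema: bool


def recommend_architecture(use_case):
    """Return the architecture suggested by the criteria, or 'undetermined'."""
    if (
        use_case.needs_full_text_search
        or use_case.needs_full_idol_analytics
        or use_case.needs_diverse_connectors
    ):
        return "full HPE IDOL/Vertica architecture"
    if use_case.text_is_part_of_schema:
        return "HPE IDOL within Vertica"
    return "undetermined"
```

For instance, a use case that only needs text as one column of a schema, with no full-text search requirement, maps to the integrated option.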

7 IDOL for Hadoop [httpwwwautonomy comproductsidol-hadoop-data]

Bridge BI and Big Data

To address new business intelligence (BI) needs and help organizations become truly data-driven, DXC has launched a suite of services known as DXC Business Intelligence Modernization Services. It provides a proven, business-led approach that bridges traditional BI with new Big Data technologies, enabling you to transition to the business analytics of the future. We use several methods to help transform BI and analytics capabilities on your terms:

• Discovery Services: Explore, test, share, and learn to understand your data and share new insights. Data lakes, data visualization tools, and services enable rapid enterprise-wide data sharing and analytic discovery collaboration.

• Analytic Solutions: Analyze, understand, and act to apply insights to your business. Solutions that address specific analytic needs help run the business better.

• Hybrid Data Management Services: Optimize, integrate, govern, and manage to become more agile through hybrid data management services. Services enable enterprises to deliver production-grade analytics integrated into business processes and systems that leverage 100% of relevant data.

Our end-to-end approach enables you to develop and execute on a strategic roadmap to quickly respond to growing Big Data challenges, optimize your existing BI investments, control upfront and long-term costs, avoid technology obsolescence, and ultimately improve business outcomes in a real-time analytics environment.

Be a data-driven, agile enterprise

Providing an analytic platform that enables organizations to derive the most insight has become a business imperative. The majority of information today is human-friendly information, so business analytic frameworks need to incorporate it with structured data to provide actionable analytics. DXC BI Modernization Services provides a pragmatic approach to bridging traditional BI with new Big Data technologies, enabling the transition to a data-driven and agile enterprise.

About the authors

Hussain Mazhar, Practice Leader, Analytics and Data Management

Dave Tindell, Solutions Architect, HPE IDOL solutions

Learn more at www.dxc.technology/analytics

About DXC

DXC Technology (NYSE: DXC) is the world’s leading independent, end-to-end IT services company, helping clients harness the power of innovation to thrive on change. Created by the merger of CSC and the Enterprise Services business of Hewlett Packard Enterprise, DXC Technology serves nearly 6,000 private and public sector clients across 70 countries. The company’s technology independence, global talent, and extensive partner alliance combine to deliver powerful next-generation IT services and solutions. DXC Technology is recognized among the best corporate citizens globally. For more information, visit www.dxc.technology.

© 2017 DXC Technology Company. All rights reserved. DXC_4AA5-6430ENW, November 2015, Rev. 1

Page 7: Gain actionable insight€¦ · disease. The knowledgebase acts as the source of truth on a subject area, operating as a textbook on key concepts and their relationships. It becomes

7

Technical White Paper

Entity analysis

The first step toward enabling linked data analysis is entity extraction It recognizes important termsmdashsuch as people places locations or credit card numbersmdashlocated within text documents Entity extraction algorithms are specific to a particular industry and are usually intelligent enough to disambiguate similar names or synonyms A drug might have a number of different name variations for the same drug and entity extraction algorithms will associate them to the same drug

Other uses include locating and extracting important people places and locations Once extracted they can be cross- referenced disambiguated or incorporated into a human workflow or rules engine This cleansing process improves the quality of entities returned and ensures the accuracy of the entities as they relate to important concepts

Sentiment analysis

Another technique for gaining insight is sentiment analysis Itrsquos a powerful technique for gauging the aggregate view on a particular subject and is powerful in call center analysis It can parse through call center transcripts and determine whether the text contains positive negative or neutral feelings toward a subject

Itrsquos an especially powerful technique when used in the telecommunications industry The phrase ldquoI had a positive experience using brand Xrdquo indicates positive sentiment while the phrase ldquoI am not happy with brand Xrdquo indicates negative sentiment When call center transcripts are run through sentiment analysis tools they can serve as a gauge for providing insight into customer service and satisfaction

When customer sentiment is monitored in real time it provides a mechanism for trouble-shooting and pinpointing problems with customer satisfaction Negative sentiment tied to a particular customer call can indicate an unhappy customer who requires further help to ensure customer satisfaction This kind of analytic insight provides a mechanism for focusing on customer satisfaction and reducing customer attrition When a high-value customer calls into the call center with highly negative sentiment identified in the transcripts it becomes an indication that action needs to be taken in order to maintain the customer relationship A company is able to monitor the response to its product based on the unfiltered comments found in blogs and tweets

8

Technical White Paper

Conceptual analysis

Closely related to sentiment analysis conceptual analysis identifies key concepts within textual content This can be a powerful technique for watching new trends emerge within social media text

For example a healthcare company interested in monitoring patient side effects to a drug during clinical trials might use the capability As users enter comments they might discuss their symptoms as being sleepy nauseous or having a loss of appetite Particularly when the number of documents becomes too large for a human to read this technique becomes important

It helps the business monitor ldquounknown unknownsrdquo Most drug side effects would be well documented and incorporated into structured systems Yet there may be unknown side effects documented by a small minority of usersmdash such as the few cases where it resulted in extreme itching or migraines Conceptual analysis would uncover these situations

Other text analytic capabilities

These techniques include geolocation categorization and summarization Used in specialized situations they serve as important metadata enrichment techniques

Geolocation uses complex analytic approaches to determine latitude and longitude based on a postal code or location referenced within text Categorization automatically categorizes documents based on a taxonomy Summarization summarizes a document or group of documents These additional techniques provide additional text analytic approaches that can be leveraged as another tool in the tool chest

9

Technical White Paper

Use HPE IDOLVertica for unstructured analyticsDXC Technology is uniquely positioned to provide unified analytics Using Vertica4 and HPE IDOL5 together enables analytics to be done on all types of information sources HPE IDOL has a competitive advantage with its unique pattern matching technology and text analytic capabilities for unstructured data Vertica has a competitive advantage as a fast columnar data store built from the ground upmdashwith Big Data and analytics in mind The integration of Vertica with HPE IDOL is the perfect marriage and makes the unified Actionable Analytics framework possible In terms of architecture there are a few other options available

DXC IDOL within Vertica

HPE IDOL UDX Pack for Vertica6 represents a major move to incorporate HPE IDOL technologies into the native Vertica database With the HPE IDOL User-Defined Extension Pack for Vertica a user can use an SQL-based tool to analyze human friendly and business data together This becomes powerful for unified analytics and is appropriate for use cases that want to store native text content within Vertica

Rather than indexing text into a standalone full-text HPE IDOL index text can be included as a column within the Vertica database Once incorporated users can query unstructured data directlymdashusing SQL syntax This is appropriate for business use cases where full text is a component of a data solution requiring a relational data store In addition only a subset of HPE IDOL functionality is exposed making it appropriate for more lightweight unstructured use cases

Current HPE IDOL functionality provided within Vertica includes

bull Entity Text Analytics is supported and lets important entities be extracted from documents that have been imported and stored in the HPE IDOL Server

bull Geolocation extracts important postal codes country codes and locations

bull Key view capabilities can be integrated so text can be extracted from a PDF Microsoft Word and over 1000 different file types

bull Language detection determines the language supported

bull Summarization enables a document summary to be generated from text

Using the integrated HPE IDOL capability is appropriate in the following situations

bull Documents are stored on a file share accessible by the Vertica database

bull Textual data represents a component of a database schema

bull Full HPE IDOL analytic capabilities are not required

bull Full-text search capability is not a requirement

4 DXC Vertica MarketPlace [httpsvertica hpwsportalcomHomeShow]

5 ldquoThe Most Advanced Search and Analytics Platform DXC IDOL 10rdquo [httpwwwautonomycomassets globalpdfProductsPower IDOL20140114_ PI_B_HP_IDOL10_Migration_Overview_webpdf]

6 IDOL for Hadoop [httpwwwautonomy comproductsidol-hadoop-data]

10

Technical White Paper

HPE IDOL Hadoop and Vertica

For use cases that require full HPE IDOL text analytic capabilities the native HPE IDOL Vertica integration may not be appropriate and a full VerticaHPE IDOL architecture may need to be adopted Using HPE IDOL Hadoop and Vertica together is a way to combine all the analytic capabilities into one framework There are few different HPE IDOLHadoop integration packages available and the most powerfulmdashthe Extreme HPE IDOL Analytics pack7mdashsupports text analytic enrichment such as sentiment analysis entity extraction and conceptual analysis capabilities

The Vertica repository can integrate with this technology along with the internal Vertica Advanced Analytics capabilities such as predictive analytics In this fashion insights generated from the unstructured repository are integrated into the relational insights for a comprehensive view from all information sources

The summary of the appropriate times to leverage the full HPE IDOLVertica is

bull Full text search over a large number of documents is a priority

bull Documents reside in a diverse number of repositories requiring diverse connectors

bull Full analytic capabilities are required

bull Structured information needs to be stored in a relational index for analytics

bull Full analytic capabilities available in a Big Data relational data storemdashis required

7 IDOL for Hadoop [httpwwwautonomy comproductsidol-hadoop-data]

11

Technical White Paper

Bridge BI and Big DataIn order to address new business intelligence (BI) needs and become truly data-driven DXC has launched a suite of services known as DXC Business Intelligence Modernization Services It provides a proven business-led approach that bridges traditional BI with new Big Data technologiesmdash enabling you to transition to the business analytics of the future We use several methods to help transform BI and analytics capabilities on your terms

bull Discovery ServicesmdashExplore test share and learn to understand your data andshare new insights Data lakes data visualization tools and services enable rapidenterprise-wide data sharing and analytic discovery collaboration

bull Analytic SolutionsmdashAnalyze understand and act to apply insights to yourbusiness Solutions that address specific analytics run the business better

bull Hybrid Data Management ServicesmdashOptimize integrate govern and manageto become more agile through hybrid data management services Services enableenterprises to deliver production-grade analytics integrated into business processesand systems that leverage 100 of relevant data

Our end-to-end approach enables you to develop and execute on a strategic roadmapmdashto quickly respond to growing Big Data challenges optimize your existing BI investments control upfront and long-term costs avoid technology obsolescence and ultimately improve business outcomes in a real-time analytics environment

Be a data-driven agile enterpriseProviding an analytic platform that enables organizations to derive the most insight has become a business imperative The majority of information today consists of human friendly information business analytic frameworks need to incorporate it with structured data to provide actionable analytics DXC BI Modernization Services provides a pragmatic approach to bridging traditional BI with new Big Data technologiesmdashenabling the transition to a data-driven and agile enterprise

About the authorsHussain Mazhar Practice Leader Analytics and Data Management

Dave Tindell Solutions Architect HPE IDOL solutions

Learn more at[wwwdxctechnology analytics]

Technical White Paper

wwwdxctechnology

About DXC DXC Technology (NYSE DXC) is the worldrsquos leading independent end-to-end IT services company helping clients harness the power of innovation to thrive on change Created by the merger of CSC and the Enterprise Services business of Hewlett Packard Enterprise DXC Technology serves nearly 6000 private and public sector clients across 70 countries The companyrsquos technology independence global talent and extensive partner alliance combine to deliver powerful next-generation IT services and solutions DXC Technology is recognized among the best corporate citizens globally For more information visit wwwdxctechnology

copy 2017 DXC Technology Company All rights reserved DXC_4AA5-6430ENW November 2015 Rev 1

Page 8: Gain actionable insight€¦ · disease. The knowledgebase acts as the source of truth on a subject area, operating as a textbook on key concepts and their relationships. It becomes

8

Technical White Paper

Conceptual analysis

Closely related to sentiment analysis conceptual analysis identifies key concepts within textual content This can be a powerful technique for watching new trends emerge within social media text

For example a healthcare company interested in monitoring patient side effects to a drug during clinical trials might use the capability As users enter comments they might discuss their symptoms as being sleepy nauseous or having a loss of appetite Particularly when the number of documents becomes too large for a human to read this technique becomes important

It helps the business monitor ldquounknown unknownsrdquo Most drug side effects would be well documented and incorporated into structured systems Yet there may be unknown side effects documented by a small minority of usersmdash such as the few cases where it resulted in extreme itching or migraines Conceptual analysis would uncover these situations

Other text analytic capabilities

These techniques include geolocation categorization and summarization Used in specialized situations they serve as important metadata enrichment techniques

Geolocation uses complex analytic approaches to determine latitude and longitude based on a postal code or location referenced within text Categorization automatically categorizes documents based on a taxonomy Summarization summarizes a document or group of documents These additional techniques provide additional text analytic approaches that can be leveraged as another tool in the tool chest

9

Technical White Paper

Use HPE IDOLVertica for unstructured analyticsDXC Technology is uniquely positioned to provide unified analytics Using Vertica4 and HPE IDOL5 together enables analytics to be done on all types of information sources HPE IDOL has a competitive advantage with its unique pattern matching technology and text analytic capabilities for unstructured data Vertica has a competitive advantage as a fast columnar data store built from the ground upmdashwith Big Data and analytics in mind The integration of Vertica with HPE IDOL is the perfect marriage and makes the unified Actionable Analytics framework possible In terms of architecture there are a few other options available

DXC IDOL within Vertica

HPE IDOL UDX Pack for Vertica6 represents a major move to incorporate HPE IDOL technologies into the native Vertica database With the HPE IDOL User-Defined Extension Pack for Vertica a user can use an SQL-based tool to analyze human friendly and business data together This becomes powerful for unified analytics and is appropriate for use cases that want to store native text content within Vertica

Rather than indexing text into a standalone full-text HPE IDOL index text can be included as a column within the Vertica database Once incorporated users can query unstructured data directlymdashusing SQL syntax This is appropriate for business use cases where full text is a component of a data solution requiring a relational data store In addition only a subset of HPE IDOL functionality is exposed making it appropriate for more lightweight unstructured use cases

Current HPE IDOL functionality provided within Vertica includes

bull Entity Text Analytics is supported and lets important entities be extracted from documents that have been imported and stored in the HPE IDOL Server

bull Geolocation extracts important postal codes country codes and locations

bull Key view capabilities can be integrated so text can be extracted from a PDF Microsoft Word and over 1000 different file types

bull Language detection determines the language supported

bull Summarization enables a document summary to be generated from text

Using the integrated HPE IDOL capability is appropriate in the following situations

bull Documents are stored on a file share accessible by the Vertica database

bull Textual data represents a component of a database schema

bull Full HPE IDOL analytic capabilities are not required

bull Full-text search capability is not a requirement

4 DXC Vertica MarketPlace [httpsvertica hpwsportalcomHomeShow]

5 ldquoThe Most Advanced Search and Analytics Platform DXC IDOL 10rdquo [httpwwwautonomycomassets globalpdfProductsPower IDOL20140114_ PI_B_HP_IDOL10_Migration_Overview_webpdf]

6 IDOL for Hadoop [httpwwwautonomy comproductsidol-hadoop-data]

10

Technical White Paper

HPE IDOL Hadoop and Vertica

For use cases that require full HPE IDOL text analytic capabilities the native HPE IDOL Vertica integration may not be appropriate and a full VerticaHPE IDOL architecture may need to be adopted Using HPE IDOL Hadoop and Vertica together is a way to combine all the analytic capabilities into one framework There are few different HPE IDOLHadoop integration packages available and the most powerfulmdashthe Extreme HPE IDOL Analytics pack7mdashsupports text analytic enrichment such as sentiment analysis entity extraction and conceptual analysis capabilities

The Vertica repository can integrate with this technology along with the internal Vertica Advanced Analytics capabilities such as predictive analytics In this fashion insights generated from the unstructured repository are integrated into the relational insights for a comprehensive view from all information sources

In summary, the full HPE IDOL/Vertica architecture is appropriate when:

• Full-text search over a large number of documents is a priority.

• Documents reside in a diverse set of repositories requiring diverse connectors.

• Full analytic capabilities are required.

• Structured information needs to be stored in a relational index for analytics.

• The full analytic capabilities available in a Big Data relational data store are required.
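The two checklists (integrated HPE IDOL within Vertica versus the full HPE IDOL/Vertica architecture) can be read as a simple decision rule. The following is a minimal sketch of that rule, assuming a workload can be described by a handful of yes/no requirements. The `UseCase` fields and the `recommend_architecture` function are hypothetical names introduced for illustration and are not part of any HPE IDOL or Vertica API.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """Hypothetical description of a workload; field names are illustrative."""
    needs_full_text_search: bool = False
    needs_full_idol_analytics: bool = False
    needs_diverse_connectors: bool = False
    text_is_part_of_schema: bool = False
    documents_on_shared_file_store: bool = False


def recommend_architecture(use_case: UseCase) -> str:
    """Map a use case to one of the two options described in the text."""
    # Any of these requirements goes beyond the subset of HPE IDOL
    # functionality exposed inside Vertica.
    if (use_case.needs_full_text_search
            or use_case.needs_full_idol_analytics
            or use_case.needs_diverse_connectors):
        return "full HPE IDOL/Vertica architecture"
    # Lightweight cases where text lives alongside relational data.
    if use_case.text_is_part_of_schema or use_case.documents_on_shared_file_store:
        return "HPE IDOL within Vertica"
    return "no clear preference"


lightweight = UseCase(text_is_part_of_schema=True)
search_heavy = UseCase(needs_full_text_search=True, text_is_part_of_schema=True)
print(recommend_architecture(lightweight))
print(recommend_architecture(search_heavy))
```

The rule checks the heavier requirements first because, according to the lists above, any one of them rules out the integrated option regardless of where the text is stored.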

7 IDOL for Hadoop [http://www.autonomy.com/products/idol-hadoop-data]


Bridge BI and Big Data

To address new business intelligence (BI) needs and become truly data-driven, DXC has launched a suite of services known as DXC Business Intelligence Modernization Services. It provides a proven, business-led approach that bridges traditional BI with new Big Data technologies, enabling you to transition to the business analytics of the future. We use several methods to help transform BI and analytics capabilities on your terms:

• Discovery Services: Explore, test, share, and learn to understand your data and share new insights. Data lakes, data visualization tools, and services enable rapid enterprise-wide data sharing and analytic discovery collaboration.

• Analytic Solutions: Analyze, understand, and act to apply insights to your business. Solutions that address specific analytic needs help you run the business better.

• Hybrid Data Management Services: Optimize, integrate, govern, and manage to become more agile through hybrid data management services. Services enable enterprises to deliver production-grade analytics, integrated into business processes and systems, that leverage 100% of relevant data.

Our end-to-end approach enables you to develop and execute on a strategic roadmap so that you can quickly respond to growing Big Data challenges, optimize your existing BI investments, control upfront and long-term costs, avoid technology obsolescence, and ultimately improve business outcomes in a real-time analytics environment.

Be a data-driven agile enterprise

Providing an analytic platform that enables organizations to derive the most insight has become a business imperative. Because the majority of information today is human-friendly information, business analytic frameworks need to incorporate it with structured data to provide actionable analytics. DXC BI Modernization Services provides a pragmatic approach to bridging traditional BI with new Big Data technologies, enabling the transition to a data-driven and agile enterprise.

About the authors

Hussain Mazhar, Practice Leader, Analytics and Data Management

Dave Tindell, Solutions Architect, HPE IDOL solutions

Learn more at www.dxc.technology/analytics


About DXC

DXC Technology (NYSE: DXC) is the world's leading independent, end-to-end IT services company, helping clients harness the power of innovation to thrive on change. Created by the merger of CSC and the Enterprise Services business of Hewlett Packard Enterprise, DXC Technology serves nearly 6,000 private and public sector clients across 70 countries. The company's technology independence, global talent, and extensive partner alliance combine to deliver powerful next-generation IT services and solutions. DXC Technology is recognized among the best corporate citizens globally. For more information, visit www.dxc.technology.

© 2017 DXC Technology Company. All rights reserved. DXC_4AA5-6430ENW, November 2015, Rev. 1

