Text Mining in Combination with Enterprise Search

Thomas Herbst, CEO, B-S-S GmbH

7th Fraunhofer Symposium on Text Mining, 5/6 October 2009

Today's Challenge: Information Overload

CMS, KM, Search, DMS, Apps, WWW

30% of working time is spent searching for relevant information.

85% of all relevant data are unstructured.

The amount of unstructured information doubles approximately every 8 months.

Users need information combined from all sources.

Users lack a 360° view of all relevant content.

21.04.23 B-S-S Business Software Solutions GmbH

What Customers ask for...

...provide a dynamic, holistic view of all information in a proper context.


Today's Information System Architecture


Classic information architecture

Portal

News, App, KM, Intranet

DMS

WCMS, KM, MOSS

...siloed content that cannot be used in a combined context.

DMS, DB


Enterprise Search Today

Enterprise search

...finds most of the content, but returns only links into the content silos.


Today's KM + Search Infrastructure

KM, Search 1, Search 2

Information Worker

Web Search

Google, Oracle Web, Lucene

• Search or KM systems often address only one specific need or purpose

• Data must be transferred and transformed between the systems

• Time-consuming

• Information is lost

• A holistic view cannot be created, because every system is a new data silo that cannot be combined with the others

• Users must learn the query language of every system


Enterprise Search + Text Mining based on an Information Access Layer

Information access layer

CMS, KM, Search, DMS, Apps, WWW

IAL (Information Access Layer)


Create Virtual Datasources

CMS, DB, App, Search, DMS

Portal 1, Portal 2, App 1, App 2
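A minimal sketch of the virtual-datasource idea, assuming each silo can be read as a list of items with an id and a text field. `Record`, `InformationAccessLayer`, `ingest` and `virtual_datasource` are hypothetical names for illustration, not part of any product described here. Content is converted once into a common form; each portal or app then gets its own filtered view without touching the source systems.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Record:
    source: str  # originating silo, e.g. "CMS", "DMS", "DB"
    doc_id: str
    text: str


class InformationAccessLayer:
    """Hypothetical common access layer: every silo is converted once into Records."""

    def __init__(self) -> None:
        self.records: list[Record] = []

    def ingest(self, source: str, items: list[dict]) -> None:
        # Convert each silo's native items (here: dicts with "id" and "text")
        # into the common Record form; the silo itself is only read, never changed.
        for item in items:
            self.records.append(Record(source, str(item["id"]), item["text"]))

    def virtual_datasource(self, sources: list[str]) -> Callable[[str], list[Record]]:
        """Return a search function restricted to the given silos."""
        allowed = set(sources)

        def search(query: str) -> list[Record]:
            q = query.lower()
            return [r for r in self.records if r.source in allowed and q in r.text.lower()]

        return search
```

Substring matching stands in for a real full-text index; the point is that "Portal 1" and "App 2" can be different views over the same converted content.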


Marketing, Healthcare, Brand Protection, MarketWatch, Intranet


Conversion, Language, Company, Geography, People

Lemmas

Ontology, Plug-in

Part-of-speech tagger

Alert, Search

Taxonomy, Sentiment, Entities

Pipeline = Extract + Enrich

<pages>

<page id="1"><abstract id="0"><sentence id="0">dpa-afx <location country="Germany" long="46225533" lat="13452345">FRANKFURT</location>.</sentence><sentence id="1">"We will continue to grow profitably, improve quality and increase the operating margin," said CEO <person typ="male" class="economy">Wolfgang Mayrhuber</person> on Thursday in <location country="Germany" long="46225533" lat="13452345">Frankfurt</location>.</sentence>...

</abstract></page>

</pages>
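The annotated output above can be approximated with a small extract-then-enrich sketch. This is an assumption-laden illustration, not the B-S-S pipeline: the gazetteer is hypothetical, sentence splitting is naive, and entity tagging is plain string replacement (overlapping names would need a real tokenizer). Parsing the generated string with `ElementTree` doubles as a well-formedness check.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical gazetteer: surface form -> (tag name, attributes).
# A real pipeline would plug in NER, ontology and sentiment stages instead.
GAZETTEER = {
    "Frankfurt": ("location", {"country": "Germany"}),
    "Wolfgang Mayrhuber": ("person", {"typ": "male", "class": "economy"}),
}


def extract_sentences(text: str) -> list[str]:
    """Extract step: naive sentence split after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def enrich_sentence(sentence: str) -> str:
    """Enrich step: escape the text, then wrap known entities in XML tags."""
    out = escape(sentence)
    for name, (tag, attrs) in GAZETTEER.items():
        attr_str = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        # naive string replacement; fine for this non-overlapping gazetteer
        out = out.replace(escape(name), f"<{tag}{attr_str}>{escape(name)}</{tag}>")
    return out


def run_pipeline(text: str, page_id: int = 1) -> ET.Element:
    body = "".join(
        f'<sentence id="{i}">{enrich_sentence(s)}</sentence>'
        for i, s in enumerate(extract_sentences(text))
    )
    xml = f'<pages><page id="{page_id}"><abstract id="0">{body}</abstract></page></pages>'
    # Parsing the result doubles as a well-formedness check.
    return ET.fromstring(xml)
```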

Cerebral infarct

Cerebral infarcts, Apoplexy, Apoplectic insult, Stroke

"Cerebral infarct", Cerebral infarkt, Serebral infarct, Cetebral ingarct

Cerebral disease, Infarction

Cerebral infarct / medicine, Cerebral infarct / biology

Cerebral infarct / conferences

Infarctus cérébral

Phrasing, Doc type classification, Spellchecking – Phonetic match, Synonymy, Thesaurus support, Refinement, Character normalization, Lemmatization, Topic classification, Ambiguous queries

Advanced Linguistics
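Several of the linguistic steps above (character normalization, lemmatization, spell correction, synonym expansion) can be chained into a query-expansion sketch. The vocabulary and thesaurus are tiny hypothetical stand-ins for a domain thesaurus, the lemmatizer only strips a plural "s", and spell correction uses Levenshtein edit distance rather than a true phonetic algorithm, so this swaps in a simpler technique than the "phonetic match" named on the slide.

```python
import unicodedata

# Hypothetical vocabulary and thesaurus; a real system would load a domain
# thesaurus (e.g. a medical one) instead of these tiny illustrative sets.
VOCABULARY = {"cerebral", "infarct", "stroke", "apoplexy"}
SYNONYMS = {"cerebral infarct": ["stroke", "apoplexy", "apoplectic insult"]}


def normalize(text: str) -> str:
    """Character normalization: lowercase and strip diacritics (é -> e)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def lemmatize(token: str) -> str:
    """Naive lemmatizer: strip a plural 's' when the singular is a known word."""
    if token.endswith("s") and token[:-1] in VOCABULARY:
        return token[:-1]
    return token


def correct(token: str, max_dist: int = 2) -> str:
    """Spell correction: map an unknown token to the closest known word."""
    if token in VOCABULARY:
        return token
    best = min(sorted(VOCABULARY), key=lambda w: edit_distance(token, w))
    return best if edit_distance(token, best) <= max_dist else token


def expand_query(query: str) -> list[str]:
    """Normalize, lemmatize and correct each token, then add thesaurus synonyms."""
    tokens = [correct(lemmatize(t)) for t in normalize(query).split()]
    phrase = " ".join(tokens)
    return [phrase] + SYNONYMS.get(phrase, [])
```

With these assumptions, the misspelled query "Cetebral ingarct" expands to "cerebral infarct" plus its thesaurus synonyms.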


Architecture Overview

• Intuitive generation of dynamic applications and portals

• Enables search-driven portals

• Highly flexible to modify, adapt and update

• Rank-based content delivery (popularity, expected sales, confidence)
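Rank-based delivery can be sketched as a weighted sum of the three signals named above. The weights, the dict-based item shape and the assumption that every signal is pre-normalized to [0, 1] are all hypothetical; the slide names the signals but not how they are combined.

```python
# Hypothetical weights; the slide names the signals but not how they are combined.
WEIGHTS = {"popularity": 0.5, "expected_sales": 0.3, "confidence": 0.2}


def delivery_score(item: dict, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of ranking signals, each assumed pre-normalized to [0, 1]."""
    return sum(w * item.get(signal, 0.0) for signal, w in weights.items())


def rank_content(items: list[dict], weights: dict[str, float] = WEIGHTS) -> list[dict]:
    """Order content for delivery, highest score first."""
    return sorted(items, key=lambda it: delivery_score(it, weights), reverse=True)
```

A linear combination keeps each signal's influence explicit and easy to retune per portal; a missing signal simply contributes zero.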

CMS, DB, App, Search, DMS

Portal frontend

• Building a real information layer

• Integrate all needed content

• Convert to one common access layer

• Combine all content into virtual datasources

• WITHOUT INFLUENCING THE EXISTING INFRASTRUCTURE

Information Access Layer

e.g. Portal 1, Portal 2, App 1, App 2


Dynamic Content Networking

Portal: Boulevard, Sport, Gallery, Events

Automatic cross-linking of content based on user context, content context or extracted entities

• A sports article about "Tiger Woods" links to galleries and boulevard news about him

• A boulevard article also offers upcoming events
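Entity-based cross-linking can be sketched as "link every item that shares an extracted entity, strongest overlap first". `ContentItem` and `related_items` are hypothetical names, and the entities are assumed to come from an upstream extraction step; user-context linking is not covered here.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    item_id: str
    section: str  # e.g. "sport", "boulevard", "gallery", "events"
    entities: set[str] = field(default_factory=set)  # from entity extraction


def related_items(item: ContentItem, corpus: list[ContentItem]) -> list[str]:
    """Ids of other items sharing at least one entity, largest overlap first."""
    scored = []
    for other in corpus:
        if other.item_id == item.item_id:
            continue
        overlap = len(item.entities & other.entities)
        if overlap:
            scored.append((-overlap, other.item_id))
    # sorting on (-overlap, id) gives most shared entities first, ties by id
    return [item_id for _, item_id in sorted(scored)]
```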


Automatic content linking

• Paragraphs

• Persons

• Locations

• Countries / Regions

• Companies

• Industries

• Acronyms

• Chemical Structures

• Dates

• Other custom entities
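A few of the entity types listed above (dates, acronyms, companies) lend themselves to simple rule-based extraction. The regular expressions below are illustrative assumptions, not production grammars: German-style dates, runs of two or more capitals for acronyms, and capitalized words followed by a legal form (GmbH or AG) for companies. Persons, locations and chemical structures would need dictionaries or trained models.

```python
import re

# Illustrative patterns only; real extractors combine dictionaries and models.
PATTERNS = {
    "date": re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b"),           # e.g. 05.10.2009
    "company": re.compile(r"\b(?:[A-Z][\w-]*\s)+(?:GmbH|AG)\b"),  # words + legal form
    "acronym": re.compile(r"\b[A-Z]{2,}\b"),                       # e.g. DMS, CMS
}


def extract_entities(text: str) -> dict[str, list[str]]:
    """Return {entity_type: unique matches in order of appearance}."""
    found = {}
    for label, pattern in PATTERNS.items():
        matches = list(dict.fromkeys(m.group(0) for m in pattern.finditer(text)))
        if matches:
            found[label] = matches
    return found
```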


Navigators + Tag Clouds

Automatically generated navigators and clouds for the most common topics

Let users get an overview of the content and result list, understand it and navigate through it

Automatic search by relevant words or pairs of words
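A tag cloud over a result list can be sketched by counting single words and adjacent word pairs after stopword removal. The stopword list is a tiny hypothetical one, the tokenizer keeps only ASCII letters (so umlauts would be split off), and pairs are formed from words that are adjacent after stopwords are dropped.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real systems use per-language lists.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "for", "to", "is"}


def tag_cloud(docs: list[str], top_k: int = 10) -> list[tuple[str, int]]:
    """Most frequent words and adjacent word pairs across a result list."""
    counts: Counter = Counter()
    for doc in docs:
        words = [w for w in re.findall(r"[a-z]+", doc.lower()) if w not in STOPWORDS]
        counts.update(words)                                         # single words
        counts.update(f"{a} {b}" for a, b in zip(words, words[1:]))  # word pairs
    return counts.most_common(top_k)
```

Each returned term can then serve as a one-click query, which is the "automatic search by relevant words or pairs of words" described above.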


Similar news: offering similar content, based on topic-sensitive matching techniques

Real-time provision of related content (Find, Refine, Exclude, Custom Logic)
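The slide does not specify its "topic-sensitive matching techniques", so this sketch swaps in a standard stand-in: TF-IDF vectors with a smoothed idf and cosine similarity. Whitespace tokenization and the function names are assumptions for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """TF-IDF with smoothed idf = ln((1 + n) / (1 + df)) + 1."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def most_similar(index: int, docs: list[str], k: int = 3) -> list[int]:
    """Indices of up to k documents most similar to docs[index], excluding itself."""
    vecs = tfidf_vectors(docs)
    scores = [(cosine(vecs[index], v), i) for i, v in enumerate(vecs) if i != index]
    scores.sort(key=lambda s: (-s[0], s[1]))
    # drop documents with no shared terms at all
    return [i for score, i in scores[:k] if score > 0]
```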


Document thumbnailing

Creates thumbnails from many document types in different sizes

Gives users a quick look without opening the native application

Allows visual, page-level navigation between text and images


Content Analysis

On-the-fly multidimensional cross-tab content analysis

Discover trends, knowledge or content relations in structured or unstructured content

e.g. sales per region, experts for products, relations between persons and locations
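A cross-tab such as "relations between persons and locations" can be sketched as a co-occurrence count over documents whose entities have already been extracted. The document shape (a dict from dimension name to a list of values) is an assumption; each (row, column) pair counts at most once per document.

```python
from collections import Counter
from itertools import product


def cross_tab(docs: list[dict[str, list[str]]], row_dim: str, col_dim: str) -> Counter:
    """Count co-occurrences of row_dim x col_dim values across documents."""
    table: Counter = Counter()
    for doc in docs:
        # every (row value, column value) pair found in the same document counts once
        for row, col in product(set(doc.get(row_dim, [])), set(doc.get(col_dim, []))):
            table[(row, col)] += 1
    return table
```

Swapping the dimension names (for example region and product) reuses the same function for other cross-tab views.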


User-generated content

Comment on any content item

Comment lists show the latest comments or the most-commented content

Let users rate your content

Use ratings to boost or de-boost content in the result list
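Rating-based boosting can be sketched as a multiplier on the relevance score. The 1 to 5 star scale, the neutral point of 3 stars and the `weight` parameter are assumptions for illustration: with `weight=0.5`, a mean of 5 stars multiplies the score by 1.5, a mean of 1 star by 0.5, and unrated content is left unchanged.

```python
def boost_factor(ratings: list[int], weight: float = 0.5) -> float:
    """Map the mean 1-5 star rating to a multiplier in [1 - weight, 1 + weight].

    3 stars (or no ratings) is neutral; 5 stars boosts, 1 star de-boosts.
    """
    if not ratings:
        return 1.0
    mean = sum(ratings) / len(ratings)
    return 1.0 + weight * (mean - 3.0) / 2.0


def rerank(results: list[tuple[str, float, list[int]]], weight: float = 0.5) -> list[str]:
    """results: (doc_id, relevance score, user ratings) -> doc_ids by boosted score."""
    boosted = [(score * boost_factor(ratings, weight), doc_id) for doc_id, score, ratings in results]
    return [doc_id for _, doc_id in sorted(boosted, key=lambda x: -x[0])]
```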


The Information Access Layer can combine different kinds of data silos

Integrate content once and use it in different scenarios from different perspectives

Full security and access control support

Seamless integration of different text mining products
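One common way to honour access control in a shared search layer is to filter hits against the user's groups before they are shown. This sketch post-filters a result list and assumes each hit carries an `acl` list of group names (a hypothetical field); hits without an ACL are hidden, i.e. deny by default.

```python
def security_trim(results: list[dict], user_groups: set[str]) -> list[dict]:
    """Drop hits whose ACL shares no group with the user (deny by default)."""
    return [hit for hit in results if user_groups & set(hit.get("acl", []))]
```

Post-filtering is simple but can shorten result pages; indexing the ACL as a field and filtering inside the query avoids that at the cost of re-indexing when permissions change.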


Summary

Thank you

B-S-S Business Software Solutions GmbH
Wartburgstrasse 1, 99817 Eisenach/Germany
Tel. +49 3691 709000
thomas.herbst@b-s-s.de
www.b-s-s.de
