Text Mining in Combination with Enterprise Search
Thomas Herbst, CEO, B-S-S GmbH
7th Fraunhofer Symposium on Text Mining, 5./6. October 2009
Today's Challenge: Information Overload
CMS
KM
Search
DMS
Apps
WWW
30% of working time is spent searching for relevant information.
85% of all relevant data is unstructured.
The amount of unstructured information doubles approximately every 8 months.
Users need to get information in combined form.
Users lack a 360° view of all relevant content.
21.04.23 2B-S-S Business Software Solutions GmbH
What Customers ask for...
...provide a dynamic holistic view
of all information
in a proper context.
Today's information system architecture
Classic information architecture
Portal
News App KM Intranet
DMS
WCMS KM MOSS
... siloed content that can't be used in a combined context.
DMS DB
Enterprise Search Today
Enterprise search
... finds most of the content, but only links to the content silos.
Today's KM + Search Infrastructure
KM Search 1 Search 2
Information Worker
Web Search
Google Oracle Web Lucene
• Search or KM systems often address only one specific need or purpose
• Data must be transferred and transformed between the systems
  • Time consuming
  • Information gets lost
• A holistic view cannot be created because every system is a new data silo that can't be combined with the others
• Users must learn the query language of every system
Enterprise Search + Text Mining
based on an
Information Access Layer
Information access layer
CMS
KM
Search
DMS
Apps
WWW
IAL (Information Access Layer)
Create Virtual Datasources
CMS DB App Search DMS
Portal 1
Portal 2
App 1 App 2
Marketing
Healthcare
Brand Protection
MarketWatch
Intranet
Conversion, Language, Company, Geography, People
Lemmas
Ontology, Plug-in
Speech tagger
Alert, Search
Taxonomy, Sentiment, Entities
Pipeline = Extract + Enrich
<pages>
<page id="1"><abstract id="0"><sentence id="0">dpa-afx <location country="Germany" long="46225533" lat="13452345">FRANKFURT</location>. </sentence><sentence id="1">"Wir werden weiter profitabel wachsen, die Qualität verbessern und die operative Marge vergrößern", sagte Vorstandschef <person typ="male" class="economy">Wolfgang Mayrhuber</person> am Donnerstag in <location country="Germany" long="46225533" lat="13452345">Frankfurt</location>. </sentence>...
</page>
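Enriched output of this kind can be read back by any downstream consumer, for example an alerting or search component. The following is only a minimal sketch, assuming the pipeline emits well-formed XML with straight quotes (the slide sample is abbreviated). The function name `extract_entities` and the trimmed sample text are illustrative assumptions, not part of the presented product.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed variant of the slide sample (assumption: the real
# pipeline output closes every tag and uses straight quotes).
SAMPLE = """<pages>
<page id="1"><abstract id="0">
<sentence id="0">dpa-afx <location country="Germany" long="46225533" lat="13452345">FRANKFURT</location>.</sentence>
<sentence id="1">sagte Vorstandschef <person typ="male" class="economy">Wolfgang Mayrhuber</person> am Donnerstag in <location country="Germany" long="46225533" lat="13452345">Frankfurt</location>.</sentence>
</abstract></page>
</pages>"""

# Tags treated as entity annotations (taken from the sample above).
ENTITY_TAGS = {"location", "person"}


def extract_entities(xml_text):
    """Return one record per annotated entity, in document order."""
    root = ET.fromstring(xml_text)
    entities = []
    for sentence in root.iter("sentence"):
        for child in sentence:
            if child.tag in ENTITY_TAGS:
                entities.append({
                    "sentence": sentence.get("id"),
                    "type": child.tag,
                    "text": (child.text or "").strip(),
                    "attrs": dict(child.attrib),
                })
    return entities


entities = extract_entities(SAMPLE)
for entity in entities:
    print(entity["sentence"], entity["type"], entity["text"])
```

Keeping the annotations inline, as the sample does, means the same document can feed both full-text search and entity-based navigation.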
Cerebral infarct
Cerebral infarcts, Apoplexy, Apoplectic insult, Stroke
"Cerebral infarct", Cerebral infarkt, Serebral infarct, Cetebral ingarct
Cerebral disease, Infarction
Cerebral infarct / medicine, Cerebral infarct / biology
Cerebral infarct / conferences
Infarctus cérébral
Phrasing
Doc type classification
Spellchecking – Phonetic match
Synonymy
Thesaurus support
Refinement
Character normalization
Lemmatization
Topic classification
Ambiguous queries
Advanced Linguistics
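Two of the query-side techniques named on this slide, character normalization and spellchecking, can be illustrated with the Python standard library. This is a sketch under assumptions: the vocabulary is made up for the example, and plain string similarity from `difflib` stands in for the phonetic matching mentioned on the slide, whose actual algorithm is not described.

```python
import difflib
import unicodedata


def normalize(text):
    """Strip accents and case so that 'cérébral' and 'cerebral' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Hypothetical index vocabulary, already normalized.
VOCABULARY = ["cerebral infarct", "cerebral disease", "infarction", "stroke"]


def suggest(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary entry, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(normalize(query), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for misspelled in ["Cerebral infarkt", "Serebral infarct", "Cetebral ingarct"]:
    print(misspelled, "->", suggest(misspelled))
```

A query that matches nothing closely enough yields `None`, so the caller can fall back to the unmodified query instead of forcing a correction.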
Architecture Overview
• Intuitive generation of dynamic applications and portals
• Enablement of search-driven portals
• Highly flexible to modify, adapt and update
• Rank-based content delivery (popularity, expected sales, confidence)
CMS DB App Search DMS
Portal Frontend
• Building a real information layer
• Integrate all needed content
• Convert to one common access layer
• Combine all content into virtual datasources
• WITHOUT INFLUENCING THE EXISTING INFRASTRUCTURE
Information Access Layer
e.g.
Portal 1
Portal 2
App 1 App 2
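The idea of combining silos into a virtual datasource without touching the existing systems can be pictured as a read-only view over a set of connectors. The sketch below is an illustration only: `Document`, `VirtualDatasource` and the two connector functions are hypothetical names, and a real access layer would index the content rather than scan it on every query.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class Document:
    source: str   # name of the silo the record came from
    doc_id: str
    title: str


# A connector is any callable that yields documents from one silo (read-only).
Connector = Callable[[], Iterable[Document]]


class VirtualDatasource:
    """Read-only view that merges several silos behind one interface."""

    def __init__(self, connectors: Dict[str, Connector]):
        self._connectors = connectors

    def documents(self) -> Iterator[Document]:
        # Iterate silos in name order so results are deterministic.
        for name in sorted(self._connectors):
            yield from self._connectors[name]()

    def search(self, term: str) -> List[Document]:
        needle = term.casefold()
        return [d for d in self.documents() if needle in d.title.casefold()]


def cms_connector() -> List[Document]:
    return [Document("cms", "c1", "Product launch news")]


def dms_connector() -> List[Document]:
    return [Document("dms", "d7", "Product manual"),
            Document("dms", "d8", "Travel policy")]


portal_view = VirtualDatasource({"cms": cms_connector, "dms": dms_connector})
hits = portal_view.search("product")
print([(d.source, d.doc_id) for d in hits])
```

Because the view only reads through connectors, several portals or applications can each define their own combination of silos while the source systems stay unchanged.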
Dynamic Content networking
Portal
Boulevard
Sport
Gallery
Events
Automatic cross-linking of content based on user context, content context or extracted entities
• A sports article about "Tiger Woods" links to galleries and boulevard news about him
• A boulevard article also offers upcoming events
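Cross-linking by extracted entities can be sketched as an inverted index from entity to article. The article records below are invented for illustration (including the second entity), and the function names are assumptions. Linking by user context or content context would need further signals that are not modeled here.

```python
from collections import defaultdict

# Hypothetical articles with the entities a text mining pipeline extracted.
ARTICLES = [
    {"id": "sport-1", "section": "Sport", "entities": {"Tiger Woods"}},
    {"id": "gallery-4", "section": "Gallery", "entities": {"Tiger Woods"}},
    {"id": "boulevard-9", "section": "Boulevard", "entities": {"Tiger Woods", "Las Vegas"}},
    {"id": "events-2", "section": "Events", "entities": {"Las Vegas"}},
]


def build_entity_index(articles):
    """Map each entity to the ids of all articles that mention it."""
    index = defaultdict(set)
    for article in articles:
        for entity in article["entities"]:
            index[entity].add(article["id"])
    return index


def related_articles(article, index):
    """Ids of other articles sharing at least one entity, sorted by id."""
    related = set()
    for entity in article["entities"]:
        related |= index.get(entity, set())
    related.discard(article["id"])
    return sorted(related)


INDEX = build_entity_index(ARTICLES)
sport_links = related_articles(ARTICLES[0], INDEX)
boulevard_links = related_articles(ARTICLES[2], INDEX)
print(sport_links)
print(boulevard_links)
```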
Automatic content linking
• Paragraphs
• Persons
• Locations
• Countries / Regions
• Companies
• Branches
• Acronyms
• Chemical Structures
• Dates
• Other custom entities
Navigators + Tag clouds
Automatically generated navigators and clouds for the most common topics
Give the user an overview of the content and results, and help them understand and navigate through them
Automatic search by relevant words or word pairs
Offering similar news
Offers similar content, based on topic-sensitive matching techniques
Real-time provision of related content (Find, Refine, Exclude, Custom Logic)
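The slides do not say which matching technique is used. As a stand-in, the sketch below ranks documents by cosine similarity over plain word counts, a common baseline for "more like this" features. The corpus and function names are invented for the example.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words vector: lowercase word -> number of occurrences."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two term vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_text, corpus):
    """Return document ids ordered from most to least similar."""
    query_vector = term_vector(query_text)
    scored = [(cosine(query_vector, term_vector(text)), doc_id)
              for doc_id, text in corpus.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


CORPUS = {
    "a": "airline reports growing operating margin",
    "b": "golf tournament results",
    "c": "airline opens new route",
}
ranking = rank_similar("airline margin outlook", CORPUS)
print(ranking)
```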
Document thumbnailing
Creates thumbnails from many document types in different sizes
Gives a user a quick look without opening a native application
Allows visual navigation on page level between text and images
Content Analysis
On-the-fly multidimensional cross-tab content analysis
Discover trends, knowledge or content relations in structured or unstructured content
e.g. sales per region, experts for products, relations between persons and locations
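A cross-tab such as "sales per region" is an aggregation over two fields of the indexed records. A minimal sketch with invented records; the field names and the `cross_tab` function are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical records, e.g. drawn from a combined datasource.
RECORDS = [
    {"region": "North", "product": "A", "sales": 10},
    {"region": "North", "product": "B", "sales": 5},
    {"region": "South", "product": "A", "sales": 7},
    {"region": "North", "product": "A", "sales": 3},
]


def cross_tab(records, row_key, col_key, value_key):
    """Sum `value_key` for every (row, column) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for record in records:
        table[record[row_key]][record[col_key]] += record[value_key]
    # Convert to plain dicts so the result prints and compares cleanly.
    return {row: dict(cols) for row, cols in table.items()}


sales_by_region = cross_tab(RECORDS, "region", "product", "sales")
print(sales_by_region)
```

The same function covers the other examples on the slide if extracted entities are used as keys, for instance persons as rows and locations as columns with a count of 1 per co-occurrence.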
User generated content
Users can comment on any piece of content
Comment lists show the latest comments or the content with the most comments
Let users rate your content
Use the rating to boost or deboost content in the result list
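How ratings feed into ranking is not specified in the slides. One simple possibility, shown purely as an assumption, is to scale the relevance score up or down depending on how far the average rating lies from a neutral value. The formula, the parameter values and the sample results are all illustrative.

```python
def boosted_score(relevance, avg_rating, neutral=3.0, weight=0.1):
    """Scale relevance up for ratings above `neutral`, down for ratings below.

    Assumed formula: relevance * (1 + weight * (avg_rating - neutral)).
    """
    return relevance * (1.0 + weight * (avg_rating - neutral))


def rerank(results):
    """Order results by boosted score, highest first."""
    return sorted(
        results,
        key=lambda r: boosted_score(r["relevance"], r["avg_rating"]),
        reverse=True,
    )


# Hypothetical search results with a 1-to-5 star average rating.
RESULTS = [
    {"id": "x", "relevance": 0.80, "avg_rating": 2.0},
    {"id": "y", "relevance": 0.75, "avg_rating": 5.0},
]
reranked_ids = [r["id"] for r in rerank(RESULTS)]
print(reranked_ids)
```

With these numbers the well-rated document overtakes the one that matched the query slightly better, which is the boost and deboost effect described above.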
Summary
The Information Access Layer can combine different kinds of data silos
Integrate content once and use it in different scenarios from different perspectives
Full security and access control support
Seamless integration of different text mining products
Thank you
B-S-S Business Software Solutions GmbH
Wartburgstrasse 1
99817 Eisenach/Germany
Tel. +49 3691 709000
thomas.herbst@b-s-s.de
www.b-s-s.de