diego-tosato
Introduction - idea
• For many companies, and for SMEs (Small and Medium Enterprises) in particular, enterprise search on small data is much more important than web search on big data. [Liu et al. IR 2014]
• Recent advances in enterprise search focus on extracting concepts or entities from enterprise data. [Brauer WWW 2010, Liu et al. IR 2014, Meij et al. WSDM 2014, Graus et al. WSDM 2016]
• Among entities, Enterprise Resource Planning (ERP) entities (such as orders, invoices, estimates, etc.) play a key role for enterprises. [Nazemi et al. IJAMT 2012]
Introduction - idea
• ERP systems are used by organizations to collect, store, manage and interpret data from many business activities.
• They are typically composed of several modules.
• We build a graph knowledge base on top of the ERP data, which we call the Entity Graph (EG).
• We model the main ERP entity types (33) and the links between them (70).
(Figure: ERP modules: Sales, Production, Purchasing, …, Finance)
Our approach – Entity Graph (EG)
• EG is a directed graph
• A configuration file specifies the queries that extract the relations, the direction of each relation, and the weight of each relation type.
• EG improves enterprise search with an exploration experience complementary to faceted navigation and full text search.
(Figure: EG is defined by a set of nodes, a set of edges, and a set of edge weights.)
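As a minimal sketch of the structure described above (a directed graph whose relation directions and weights come from a configuration), the following Python fragment may help. The configuration format, relation names, entity identifiers, and the `EntityGraph` class are all hypothetical; the actual SeNSE configuration also holds the extraction queries, which are not shown in the slides.

```python
# Hypothetical configuration: one entry per relation type, giving its
# direction (source type -> target type) and its weight. Invented values.
RELATION_CONFIG = {
    "estimate_to_order": {"source": "estimate", "target": "order", "weight": 0.5},
    "order_to_invoice": {"source": "order", "target": "invoice", "weight": 1.0},
}


class EntityGraph:
    """Directed graph of typed entities with weighted edges."""

    def __init__(self, config):
        self.config = config
        self.nodes = {}   # entity id -> entity type
        self.edges = {}   # source id -> {target id: weight}

    def add_entity(self, entity_id, entity_type):
        self.nodes[entity_id] = entity_type

    def add_link(self, relation, source_id, target_id):
        rule = self.config[relation]
        # Enforce the direction declared in the configuration.
        if (self.nodes[source_id], self.nodes[target_id]) != (rule["source"], rule["target"]):
            raise ValueError(f"{relation} cannot link {source_id} -> {target_id}")
        self.edges.setdefault(source_id, {})[target_id] = rule["weight"]

    def neighbours(self, entity_id):
        """Outgoing links of an entity, heaviest first (for exploration)."""
        links = self.edges.get(entity_id, {})
        return sorted(links.items(), key=lambda item: item[1], reverse=True)


graph = EntityGraph(RELATION_CONFIG)
graph.add_entity("EST-1", "estimate")
graph.add_entity("ORD-1", "order")
graph.add_entity("INV-1", "invoice")
graph.add_link("estimate_to_order", "EST-1", "ORD-1")
graph.add_link("order_to_invoice", "ORD-1", "INV-1")
print(graph.neighbours("ORD-1"))
```

Keeping direction and weight in the configuration rather than in code means a new relation type only needs a new configuration entry, which is one plausible reason for the design the slide describes.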
Our approach – Entity Ranking
• We follow the idea proposed by Turney et al. JAIR 2010, which led us to design a pipeline of components.
• The final rank of the results is (formula not preserved), defined in terms of: the entity, the set of scores, the set of weights, and the number of components.
• We instantiate the model with four components: date score, TF-IDF score, EG score, and PageRank score.
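The ranking formula itself is not preserved here. Assuming it is a weighted linear combination of the component scores (an assumption, suggested by the legend of entity, scores, weights, and number of components), a minimal sketch could look as follows. The component keys, weight values, and score values are invented for illustration.

```python
def final_score(scores, weights):
    """Weighted sum of component scores for one entity (assumed form)."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same components")
    return sum(weights[name] * scores[name] for name in weights)


def rank(candidates, weights):
    """Order entity ids by decreasing final score."""
    return sorted(
        candidates,
        key=lambda entity_id: final_score(candidates[entity_id], weights),
        reverse=True,
    )


# Hypothetical weights for the four components named on the slide.
WEIGHTS = {"date": 0.1, "tfidf": 0.5, "eg": 0.2, "pagerank": 0.2}

# Hypothetical per-entity component scores, each already scaled to [0, 1].
candidates = {
    "INV-1": {"date": 0.9, "tfidf": 0.2, "eg": 0.5, "pagerank": 0.1},
    "ORD-1": {"date": 0.4, "tfidf": 0.8, "eg": 0.6, "pagerank": 0.3},
}

ranking = rank(candidates, WEIGHTS)
print(ranking)
```

With these invented numbers, ORD-1 ranks above INV-1 because its higher TF-IDF score carries the largest weight.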
Our approach – Prototype
• SeNSE (Skyline eNterprise Search Engine) is the name of the prototype.
– Back-end technology
– Front-end technology
Experiments
• We built three different enterprise datasets with real (small) data
– 1 million entities and 10 million entity links.
• How effective is the method?
– We computed the precision on a test set of 100 user information needs.
– Relevance judgments are obtained by merging the users' rankings of the top 5 entities.
– We assigned the component weights with a grid search. The TF-IDF score is the most important contribution.
Method                     Precision
TF-IDF score (baseline)    54%
Our approach               69%
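The weight assignment by grid search can be sketched as below. This is an illustration under stated assumptions, not the procedure used in the experiments: the scoring is assumed to be a weighted sum, precision is taken as precision at k, and the grid values, component names, and toy queries are invented.

```python
from itertools import product


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are judged relevant."""
    return sum(1 for entity_id in ranked[:k] if entity_id in relevant) / k


def mean_precision(queries, weights, k):
    """Average precision at k over (candidates, relevant set) pairs."""
    total = 0.0
    for candidates, relevant in queries:
        ranked = sorted(
            candidates,
            key=lambda e: sum(weights[c] * candidates[e][c] for c in weights),
            reverse=True,
        )
        total += precision_at_k(ranked, relevant, k)
    return total / len(queries)


def grid_search(queries, components, steps=(0.0, 0.5, 1.0), k=5):
    """Try every weight combination on the grid and keep the best one."""
    best_weights, best_precision = None, -1.0
    for values in product(steps, repeat=len(components)):
        if not any(values):
            continue  # all-zero weights cannot rank anything
        weights = dict(zip(components, values))
        precision = mean_precision(queries, weights, k)
        if precision > best_precision:
            best_weights, best_precision = weights, precision
    return best_weights, best_precision


# Two toy information needs with two components each (invented data).
queries = [
    ({"A": {"tfidf": 0.9, "eg": 0.1}, "B": {"tfidf": 0.2, "eg": 0.8}}, {"A"}),
    ({"C": {"tfidf": 0.7, "eg": 0.2}, "D": {"tfidf": 0.3, "eg": 0.9}}, {"C"}),
]
best_weights, best_precision = grid_search(queries, ["tfidf", "eg"], k=1)
print(best_weights, best_precision)
```

The cost grows exponentially with the number of components, which is one reason an automatic weighting method is attractive once the pipeline gains further components.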
Open issues
• By analyzing the links of EG, we found huge hub nodes, which arise from certain types of entities. This is a problem for PageRank because it ranks hubs highly even when they are not relevant to a given enterprise information need.
• SeNSE needs different representations of an entity to provide its services. This is not only a scalability issue but also a modeling one.
The extension of the search pipeline with further components could introduce novel representations for the entities.
(Figure: Entity = ? + ? + . . .)
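The hub problem noted above for PageRank can be reproduced on a toy graph. The sketch below is a plain power-iteration PageRank, not the SeNSE implementation; the entity identifiers and the choice of a customer record as the hub are hypothetical.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank; every node must appear as a key."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in out_links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: five orders all point to the same customer record,
# and only one of them points to the invoice a user might be looking for.
out_links = {
    "ORD-1": ["CUSTOMER-1", "INV-1"],
    "ORD-2": ["CUSTOMER-1"],
    "ORD-3": ["CUSTOMER-1"],
    "ORD-4": ["CUSTOMER-1"],
    "ORD-5": ["CUSTOMER-1"],
    "CUSTOMER-1": [],
    "INV-1": [],
}
scores = pagerank(out_links)
top_entity = max(scores, key=scores.get)
print(top_entity)
```

The hub receives the highest score regardless of the query, which illustrates why a query-independent PageRank component can promote entities that are not relevant to a specific information need.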
Open issues
• Another tricky problem concerns updating the indexed entities, because an enterprise search engine should process updates in near real time.
• The system has to deal with all types of updates. In particular, it has to manage the deletion of entities, which is the most difficult case.
• SeNSE implements three update policies: batch full, batch delta, and real time.
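To make the difference between the three policies concrete, here is a minimal in-memory sketch. The `EntityIndex` class, its method names, and the event format are hypothetical and stand in for whatever index SeNSE actually uses; the point is only how each policy handles deleted entities.

```python
class EntityIndex:
    """Toy index mapping entity ids to their indexed payloads."""

    def __init__(self):
        self.documents = {}

    def batch_full(self, snapshot):
        """Rebuild from a complete snapshot; deletions are implicit."""
        self.documents = dict(snapshot)

    def batch_delta(self, upserts, deleted_ids):
        """Apply changes accumulated since the last run; deletions must be listed."""
        self.documents.update(upserts)
        for entity_id in deleted_ids:
            self.documents.pop(entity_id, None)

    def real_time(self, event):
        """Apply a single change event as soon as it arrives."""
        kind, entity_id, payload = event
        if kind == "upsert":
            self.documents[entity_id] = payload
        elif kind == "delete":
            self.documents.pop(entity_id, None)
        else:
            raise ValueError(f"unknown event kind: {kind}")


index = EntityIndex()
index.batch_full({"ORD-1": "order 1", "INV-1": "invoice 1"})
index.batch_delta({"ORD-2": "order 2"}, deleted_ids=["INV-1"])
index.real_time(("upsert", "ORD-1", "order 1 (revised)"))
index.real_time(("delete", "ORD-2", None))
print(index.documents)
```

A full batch never needs to be told about deletions, whereas the delta and real-time policies only work if the source system reports them, which is consistent with deletion being the hardest case.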
Conclusions and Future Work
• We presented an enterprise search model that exploits ERP entities to enhance the enterprise search experience and its implementation: SeNSE.
• We presented the open issues arising from our industrial experience.
• In future work, we aim to clarify the benefit of each contribution to entity ranking.
• We will implement an automatic method to compute the weights for those contributions.