Exploiting ERP Systems in Enterprise Search

Diego Tosato

Outline

• Introduction

• Our approach

• Experiments

• Open issues

• Conclusions

Introduction - idea

• Enterprise search on small data is much more important than web search on big data for many companies. [Liu et al. IR 2014]

This holds in particular for SMEs (Small and Medium Enterprises).

• Recent advances in enterprise search focus on the extraction of concepts or entities from enterprise data. [Brauer WWW 2010, Liu et al. IR 2014, Meij et al. WSDM 2014, Graus et al. WSDM 2016]

• Among entities, Enterprise Resource Planning (ERP) entities (such as orders, invoices, and estimates) play a key role for enterprises. [Nazemi et al. IJAMT 2012]

Introduction - idea

• ERP systems are used by organizations to collect, store, manage and interpret data from many business activities.

• They are typically composed of several modules.

• We build a graph knowledge base that we call Entity Graph (EG).

• We model the main entity types of the ERP (33) and the related entity links (70).

[Figure: an ERP system and its modules: Sales, Production, Purchasing, …, Finance]

Our approach – system design

Our approach – Entity Graph (EG)

• EG is a directed graph

• A configuration file determines the queries to extract the relations, their direction, and the weights of each type of relation.

• EG improves enterprise search with an exploration experience complementary to faceted navigation and full text search.

• EG consists of a set of nodes, a set of edges, and a set of edge weights.
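The configuration-driven graph described on this slide could be sketched as follows. This is a minimal illustration, not the SeNSE implementation: the configuration layout, the relation names, the SQL strings, and the `EntityGraph` class are all hypothetical, and the extraction queries are stored but not executed.

```python
from collections import defaultdict

# Hypothetical configuration: one entry per relation type, giving the
# extraction query, the direction (source -> target), and the weight.
# The real configuration format is not shown in the slides.
CONFIG = {
    "estimate_to_order": {
        "query": "SELECT estimate_id, order_id FROM order_estimates",
        "source": "estimate",
        "target": "order",
        "weight": 0.5,
    },
    "order_to_invoice": {
        "query": "SELECT order_id, invoice_id FROM invoice_orders",
        "source": "order",
        "target": "invoice",
        "weight": 0.8,
    },
}


class EntityGraph:
    """Directed graph: a set of nodes, a set of edges, and edge weights."""

    def __init__(self, config):
        self.config = config
        self.nodes = set()
        self.edges = defaultdict(dict)  # source node -> {target node: weight}

    def add_link(self, relation, source_id, target_id):
        """Add one directed, weighted edge for a row returned by a relation query."""
        rel = self.config[relation]
        source = (rel["source"], source_id)
        target = (rel["target"], target_id)
        self.nodes.update((source, target))
        self.edges[source][target] = rel["weight"]

    def neighbors(self, node):
        """Return outgoing neighbors, highest edge weight first."""
        return sorted(self.edges.get(node, {}).items(), key=lambda kv: -kv[1])


eg = EntityGraph(CONFIG)
eg.add_link("estimate_to_order", "E1", "O1")
eg.add_link("order_to_invoice", "O1", "I1")
eg.add_link("order_to_invoice", "O1", "I2")
```

Keeping direction and weight in the configuration rather than in code means a new relation type only needs a new configuration entry, which matches the role the slide assigns to the configuration file.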

Our approach – Entity Ranking

• We follow the idea proposed by Turney et al. JAIR 2010, which led us to design a pipeline of components.

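• The final rank of the results is computed for each entity from a set of scores and a set of weights, one per pipeline component. The formula is defined over:

– the entity

– the set of scores

– the set of weights

– the number of components

• We instantiate the model with four component scores: date score, TF-IDF score, EG score, and PageRank score.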
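One plausible way to combine the four component scores is a weighted sum. The sketch below assumes that form; the slides name the ingredients (entity, scores, weights, number of components) but the exact combination, the weight values, and the entity names used here are assumptions for illustration only.

```python
def final_rank(scores, weights):
    """Weighted sum of component scores for one entity (assumed combination)."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same components")
    return sum(weights[name] * scores[name] for name in scores)


# Hypothetical weights for the four components named on the slide.
WEIGHTS = {"tfidf": 0.5, "eg": 0.2, "pagerank": 0.2, "date": 0.1}

# Hypothetical per-entity component scores, each already scaled to [0, 1].
candidates = {
    "invoice-17": {"tfidf": 0.9, "eg": 0.1, "pagerank": 0.2, "date": 0.5},
    "order-42": {"tfidf": 0.6, "eg": 0.8, "pagerank": 0.7, "date": 0.8},
}

# Sort entities by descending final rank.
ranked = sorted(
    candidates,
    key=lambda entity: final_rank(candidates[entity], WEIGHTS),
    reverse=True,
)
```

In this toy data the entity with the best TF-IDF score does not win, because the graph-based and date scores of the other entity outweigh the difference.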

Our approach – Prototype

• SeNSE (Skyline eNterprise Search Engine) is the name of the prototype.

– Back-end technology

– Front-end technology

Our approach – User Experience

• The prototype of our SERP (Search Engine Results Page).

Experiments

• We built three different enterprise datasets with real (small) data

– 1 million entities and 10 million entity links.

• How effective is the method?

– We computed precision on a test set of 100 user information needs.

– Relevance judgments were obtained by merging the users' rankings of the top 5 entities.

– We assigned the weights by performing a grid search. The TF-IDF score is the most important contribution.

Method                     Precision
TF-IDF score (baseline)    54%
Our approach               69%
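The weight selection by grid search can be sketched as below. This is an illustration under assumptions, not the evaluation code behind the numbers above: it assumes precision is measured at a cutoff k, that component scores are combined by a weighted sum, and it uses a tiny made-up query with two components. All function names are hypothetical.

```python
from itertools import product


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked entities that are judged relevant."""
    return sum(1 for entity in ranked[:k] if entity in relevant) / k


def rank_entities(candidates, weights):
    """Rank entities by the weighted sum of their component scores."""
    def score(entity):
        return sum(w * s for w, s in zip(weights, candidates[entity]))
    return sorted(candidates, key=score, reverse=True)


def grid_search(queries, grid, n_components, k):
    """Try every weight combination on the grid and keep the best mean precision."""
    best_weights, best_precision = None, -1.0
    for weights in product(grid, repeat=n_components):
        if sum(weights) == 0:
            continue  # all-zero weights cannot rank anything
        mean_precision = sum(
            precision_at_k(rank_entities(cands, weights), relevant, k)
            for cands, relevant in queries
        ) / len(queries)
        if mean_precision > best_precision:
            best_weights, best_precision = weights, mean_precision
    return best_weights, best_precision


# Hypothetical toy query: scores are (tfidf, eg) pairs; only "b" is relevant.
toy_candidates = {"a": (0.9, 0.0), "b": (0.5, 1.0), "c": (0.6, 0.1)}
toy_queries = [(toy_candidates, {"b"})]

best_weights, best_precision = grid_search(
    toy_queries, grid=[0.0, 0.5, 1.0], n_components=2, k=1
)
```

The search is exhaustive, so its cost grows exponentially with the number of components; that is acceptable for four components and a coarse grid, and it is one reason an automatic weighting method is attractive.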

Open issues

• By analyzing the links of EG, we found huge hub nodes, which arise from certain types of entities. This is a problem for PageRank, because it gives a higher rank to hubs that are not necessarily relevant to every enterprise information need.

• SeNSE needs different representations of an entity to provide its services. This is not only a scalability issue but also a modeling one.

The extension of the search pipeline with further components could introduce novel representations for the entities.

[Figure: Entity = ? A combination of several entity representations]
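The hub effect mentioned in the first open issue can be reproduced on a toy graph. The sketch below is a plain power-iteration PageRank written for illustration; it is not the SeNSE code, and the graph (twenty invoices pointing at one customer node) is a hypothetical example of a hub-producing entity type.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of target nodes."""
    nodes = set(out_links) | {t for targets in out_links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links.get(node, [])
            if targets:
                # Split this node's rank evenly among its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: many invoices link to the same customer node.
links = {f"invoice-{i}": ["customer-hub"] for i in range(20)}
links["order-1"] = ["invoice-0"]

ranks = pagerank(links)
top_entity = max(ranks, key=ranks.get)
```

The hub collects rank from every invoice regardless of the query, so a query-independent PageRank score pushes it up for all information needs, which is the behavior the open issue describes.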

Open issues

• Another tricky problem concerns the update of the indexed entities, because in enterprise search engines updates should be processed in near real time.

• The system has to deal with all types of updates; in particular, it has to manage the deletion of entities, which is the most difficult case.

• SeNSE implements three update policies: batch full, batch delta, and real time.
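The three update policies could look roughly like the sketch below. The slides only name the policies, so the semantics shown here (full rebuild from a snapshot, applying a set of changes, applying a single change event) are an assumed reading, and the `EntityIndex` class and its method names are hypothetical.

```python
class EntityIndex:
    """Toy in-memory index mapping entity ids to indexed content."""

    def __init__(self):
        self.docs = {}

    def batch_full(self, snapshot):
        """Rebuild from a complete snapshot; entities missing from it disappear."""
        self.docs = dict(snapshot)

    def batch_delta(self, upserts, deleted_ids):
        """Apply only the changes since the last run, including deletions."""
        self.docs.update(upserts)
        for entity_id in deleted_ids:
            self.docs.pop(entity_id, None)

    def real_time(self, event):
        """Apply a single change event as soon as it arrives."""
        kind, entity_id, payload = event
        if kind == "upsert":
            self.docs[entity_id] = payload
        elif kind == "delete":
            self.docs.pop(entity_id, None)
        else:
            raise ValueError(f"unknown event kind: {kind}")


index = EntityIndex()
index.batch_full({"order-1": "Order 1", "invoice-1": "Invoice 1"})
index.batch_delta({"order-2": "Order 2"}, deleted_ids=["invoice-1"])
index.real_time(("delete", "order-1", None))
```

In this reading, a full batch handles deletions implicitly, while the delta and real-time policies need the source system to report deleted ids explicitly, which is one way to see why deletion is the hardest case.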

Conclusions and Future Work

• We presented an enterprise search model that exploits ERP entities to enhance the enterprise search experience and its implementation: SeNSE.

• We presented the open issues arising from our industrial experience.

• In future work, we aim to clarify the benefit given by each contribution to entity ranking.

• We will implement an automatic method to compute the weights for those contributions.

www.freewayskyline.com/demosense


Software
