

Interactive Search Profiles as a Means of Personalisation
Maram Barifah, Monica Landoni
Università della Svizzera italiana (USI)
Faculty of Informatics, Lugano, Switzerland
maram.barifah,[email protected]

ABSTRACT
User profiling (USP) is a common practice in the Interactive Information Retrieval (IIR) domain. It has been widely used to overcome the problems of information overload and the retrieval of non-relevant documents. By looking into the details of how people actually use a system instead of who they are, various implicit signals about the users can be extracted. Motivated by moving beyond a "one-size-fits-all" approach, we explore individual search experiences with minimal user involvement and lower expense by investigating the usage patterns (UPs) found in log files (LFs). Ultimately, in this paper, we propose interactive search profiles (ISPs) as a means to personalise the interface of a digital library system in order to provide better search experiences.

KEYWORDS
personalisation, interactive profile, digital library

1 INTRODUCTION AND RELATED WORKS
LF analysis is considered an informative and inexpensive source for revealing valuable information about users, as LFs record natural user interactions uninfluenced by experimenters or observers. Researchers utilise LF analysis for different purposes; constructing USPs is one example.

A USP is "a digital representation of the unique data concerning a particular user" [12], in which information about individuals or communities is presented [11].

Generally, USP data can be collected explicitly, using qualitative tools such as forms or questionnaires; implicitly, by analysing LFs, eye tracking, and query histories; or in a collaborative mode, which extracts data from correlations among user groups. Profiles can be static, where the same information is kept over time, or dynamic, where the content can be changed or modified [6].

Different USP construction tools have been proposed. For example, [3] classify user profiling constructions into: (i) data mining and machine learning methods, referring to intelligent user profiling, where different techniques are used, e.g. Bayesian Networks and Case-Based Reasoning. Such profiles are obtained from the observation of the users' actions without considering other explicit data [11]. (ii) Traditional profiling methods, including the content-based method, where user profiles are built based on content similarity, and the collaborative method, which assumes that users who share similar attributes have similar profiles [3].

In digital libraries, [10] propose two methods: a metadata profiling method, which is explicitly built from the user's ratings of items, and a content profiling method, which is extracted implicitly from relevance judgements on previous pages. Similarly, [13] propose a combination of content-based and citation-based user profiling for personalising digital libraries. Also, [9] introduce a profiling method based on terms extracted from clicked documents. Similarly, [4] propose specific elements, including queries, topics and content, to generate a human-readable user profile.

Conference'17, July 2017, Washington, DC, USA
© 2019 Copyright held by the owner/author(s).

[1] suggest the following features as components of a USP: personal data; gathering data, which refers to the document content (i.e. aboutness and language), document structure (e.g. format, type), and document source (e.g. URL, publishers, series, and authors); delivering data, which indicates the delivery means (e.g. e-mail, fax) and delivery time; actions data, which contains user-system interactions and navigation data; and security data, which refers to the users' privacy.

Such efforts help digital libraries to be more proactive in offering and tailoring information for individuals or communities of users.

Little attention has been paid to user interactions with the systems and what influences them. Users tend to search a digital library (DL) to fulfil information needs, which are addressed by the available contents and affected by the interface design and the familiarity of the users with the system. Considering such factors in user profiling will produce more reliable user representations [5]. Analysing the digital information footprints of users enables researchers to trace the information searching process. LF analysis has been used extensively in the field of IIR due to its ability to capture invisible, hidden and real-life behaviour [2, 8]. In our research, we aim to investigate how informative the LF is in revealing the searching habits of users, and how such information can support designers in producing better personalised applications.

2 METHODOLOGY
The dataset was obtained from RERO Doc¹, a Swiss digital library connecting libraries of Western Switzerland. The library offers free access to its contents and services. The dataset contains eight months of records (May 2017-January 2018), with more than 6M sessions consisting of 24M records (20 GB). The long period covers different seasons, i.e. before and after exams, annual holidays, and during the semesters. The extraction of the UPs from the LFs went through the following phases:

2.1 Data preparation
The raw data was prepared as follows:
• Data cleaning: the data was cleaned, and only the human interaction sessions were included.
• Data parsing: the text of the logs was divided into meaningful parts, e.g. user IP, time, and URL.

¹ https://doc.rero.ch


Figure 1: RERO Doc Interface

• Sessions determination: the sessions had to be determined, as they were not automatically available.

Due to user privacy concerns, we considered the session as the unit of analysis.
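The three preparation steps can be illustrated with a minimal Python sketch. The paper does not give the RERO Doc log layout, the cleaning rules, or the session boundary rule, so everything specific here is an assumption: an Apache-style combined log line, a user-agent keyword heuristic for removing non-human traffic, and a 30-minute inactivity gap for splitting sessions. The names `parse_line` and `build_sessions` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed Apache-style combined log layout; the real RERO Doc format may differ.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"\S+ (?P<url>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
BOT_HINTS = ("bot", "crawler", "spider")  # assumed heuristic for non-human traffic
SESSION_GAP = timedelta(minutes=30)       # assumed inactivity timeout


def parse_line(line):
    """Split one raw log line into user IP, time, and URL; None if unusable."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    if any(hint in match["agent"].lower() for hint in BOT_HINTS):
        return None  # data cleaning: keep human interactions only
    when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    return {"ip": match["ip"], "time": when, "url": match["url"]}


def build_sessions(records):
    """Group records per IP and open a new session after a long inactivity gap."""
    sessions = []
    open_session = {}  # ip -> index of that IP's most recent session
    for rec in sorted(records, key=lambda r: r["time"]):
        idx = open_session.get(rec["ip"])
        if idx is None or rec["time"] - sessions[idx][-1]["time"] > SESSION_GAP:
            sessions.append([])
            idx = len(sessions) - 1
            open_session[rec["ip"]] = idx
        sessions[idx].append(rec)
    return sessions
```

Because the session is the unit of analysis, the IP address is only needed while grouping and could be dropped from each record once the sessions have been formed.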

2.2 Interface analysis
We analysed the interface with the aim of understanding the functionality of the system. Figure 1 shows the interface of the library.

2.3 Building actions typology
A hierarchical taxonomy of RERO Doc was built to identify the potential users' actions from the LF. The actions were then coded.

2.4 Data Sampling
Given the size of the data, the sessions were randomly selected over the available 8-month period to support the generalisability of the UPs. The sampling was as follows:
• Four datasets of different sizes (10%, 5%, 2%, and 1%) were created from the population by random selection without replacement.
• The samples were built by randomly drawing sessions across all the months.
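The sampling step can be sketched as follows. This is an illustration only: the function name `draw_samples`, the fixed seed, and the choice to draw the four samples independently of each other from the full pool of session identifiers are assumptions, not details given in the paper.

```python
import random

FRACTIONS = (0.10, 0.05, 0.02, 0.01)  # sample sizes named in the text


def draw_samples(session_ids, fractions=FRACTIONS, seed=0):
    """Return {fraction: sampled ids}; each sample is drawn without replacement."""
    rng = random.Random(seed)          # seeded so the draw is reproducible
    population = list(session_ids)     # sessions from all months pooled together
    samples = {}
    for fraction in fractions:
        size = round(len(population) * fraction)
        samples[fraction] = rng.sample(population, size)
    return samples
```

Pooling the sessions of all months before drawing means every month can contribute to each sample, which matches the stated aim of covering the whole period.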

2.5 Analytical Techniques Selection
After processing and splitting the dataset, two unsupervised machine learning techniques were implemented with the aim of exploring the UPs: K-means and agglomerative clustering. K-means performed better than agglomerative clustering in terms of the consistency and quality of the clusters.
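For readers unfamiliar with K-means, the sketch below shows the basic procedure (Lloyd's algorithm) in plain Python. It is not the implementation used in the study, which is not described; the numeric feature vectors, the random initialisation, and the iteration cap are assumptions made for illustration, and the agglomerative alternative is not shown.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels
```

In practice each session would first be turned into a numeric vector, and the number of clusters would be chosen by comparing cluster quality across several values of k.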

2.6 Features Examination
For the purpose of grouping the users according to their similarity in the usage of the library, we identified the most relevant features or attributes that might result in meaningful UPs. The significant features are:
• Session duration (D): refers to the time required to conduct a task. Four session duration ranges were found in the data: short (10-60 seconds), average (60-300 seconds), long (900-1800 seconds), and longer (1800-2700 seconds) sessions.
• Action type (A): categorises the UPs according to their content discovery styles, e.g. searching or navigating.
• Access point (AC): describes the session starting points. Users can reach RERO Doc contents from search engines, by clicking a link in an email, or from the RERO Doc home page.
• Function used (F): investigates the type of functions utilised (if any) during the interactions, e.g. filtering the results by facets, including full-text only, or sorting the results.
• Termination points (T): describes how the users finish a session, e.g. View results list (VRL), Snippet view (SV), Display item (DI), Download item (DO), Click similar record (SR), and Add to personal list (PL).
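As an illustration of how the session duration feature could be encoded, the sketch below maps a duration in seconds to one of the four ranges listed above. The function name is hypothetical, and two behaviours are assumptions: a value on a shared boundary (e.g. 60 or 1800 seconds) is assigned to the lower range, and a value that no listed range covers (e.g. between 300 and 900 seconds) is returned as `None`, since the text does not say how such sessions were treated.

```python
# Duration ranges in seconds, as listed in the text.
DURATION_BANDS = (
    ("short", 10, 60),
    ("average", 60, 300),
    ("long", 900, 1800),
    ("longer", 1800, 2700),
)


def duration_band(seconds):
    """Return the range name for a session length, or None if no range covers it."""
    for name, low, high in DURATION_BANDS:
        if low <= seconds <= high:
            return name  # first match wins, so boundaries fall in the lower range
    return None
```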

2.7 UPs extraction
Following the previous phases, three main UPs were found along with their sub-patterns. Table 1 shows the patterns along with their characteristics.

3 USER MODELING
According to [7], a user model is "a data structure that characterises a user U at a certain moment in time", whereas user modeling is "the process of creating and updating a user model by deriving user characteristics from user data". Such data can be explicitly provided by the users or inferred from the raw data. Different information about the users can be presented in the user model, including demographic information, user goals, tasks, background knowledge, interests, skills and capabilities, and traits. In the context of this research, the user interactions can be modelled as:

$U_c$: $[A, F, T, D]$

where
(1) A represents the action type. From our analysis, we found that there are three main actions:
• View items: users seek authorised items.
• Navigate: users navigate RERO Doc by collection, institution, content, or press.
• Search: users submit queries through the simple or advanced search functions.

(2) F refers to the functions available on RERO Doc, whether or not the users take advantage of them during their interactions. The available functions are:
• Filter results by facets: Doc type (FT), Institution (FI), Domain (FD), Collection (FC), Author (FA), Keyword (FK), and Language (FL), or search in full-text.
• Sort the results by: Ascending (SA), Descending (SD), Date (Default), Title (ST), Author (SU).

(3) T represents the termination action. The available termination actions in RERO Doc are:
• Snippet view (SV)
• Display item (DI)
• Download item (DO)
• Click similar record (SR)
• Add to personal list (PL)

(4) D expresses the session duration.

The components described above distinguish one segment of users from the others.
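The four-component model can be sketched as a small data structure. This is an illustrative sketch, not the authors' implementation: the class name `UserModel`, the field names, and the validation are assumptions, and only the options that the text gives a code for are included (full-text search and the default date sort carry no code in the text).

```python
from dataclasses import dataclass
from typing import Tuple

ACTIONS = {"view", "navigate", "search"}                 # A: three main actions
FUNCTIONS = {"FT", "FI", "FD", "FC", "FA", "FK", "FL",   # F: facet filters
             "SA", "SD", "ST", "SU"}                     # F: sort options
TERMINATIONS = {"SV", "DI", "DO", "SR", "PL"}            # T: termination actions


@dataclass(frozen=True)
class UserModel:
    """One session described as [A, F, T, D]."""
    action: str                  # A
    functions: Tuple[str, ...]   # F, empty when no function was used
    termination: str             # T
    duration: float              # D, in seconds

    def __post_init__(self):
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action}")
        unknown = set(self.functions) - FUNCTIONS
        if unknown:
            raise ValueError(f"unknown functions: {sorted(unknown)}")
        if self.termination not in TERMINATIONS:
            raise ValueError(f"unknown termination: {self.termination}")
        if self.duration < 0:
            raise ValueError("duration must be non-negative")
```

For example, `UserModel("search", ("FA", "FK"), "DI", 45.0)` would describe a short search session that used the author and keyword facets and ended by displaying an item.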


Table 1: Summarisation of the Characteristics of UPs

| UPs | Sub UPs | General characteristics |
|---|---|---|
| Item seeker (IS) | Satisfied IS (SIS); Multivio IS (MIS); Average (AIS); Advanced (DIS) | ISs are the main UPs; they were redirected from search engines or emails. They seek authorised items. Their session durations vary between short (60), average (60-300), and advanced (900-1800) seconds. Their sessions are characterised by a single action: downloading or viewing items. |
| Navigator (N) | Light (LN); Average (AN); Advanced (DN); Press (PN) | They navigate RERO Doc by collection, institution, content, or press. Some of the Ns filter and sort the search results by applying different facets. Their session durations vary between short (60), average (60-300), and long (900-1800) seconds. Their termination actions are VRL, DI, DO, and PL. |
| Searcher (S) | Known item (KS); Simple (SS); Average (AS); Familiar average (FAS); Advanced (DS); Familiar advanced (FDS); Sophisticated (PS) | The Ss interact with RERO Doc by submitting queries through the simple or advanced search functions. They vary in terms of session durations, filtering or sorting of results, and termination actions. Their session durations vary between short (60), average (60-300), and long (900-2700) seconds. Some of the Ss filter and sort the search results by applying different facets. Their termination actions are VRL, DI, DO, and PL. For some of the Ss, searching went through many iterations, including query reformulations. |

$U_i = \{u_{i1}, u_{i2}, \ldots, u_{in}\}$

The components of the user model help us to generate different scenarios to be presented in the ISPs. Accordingly, the interface can be personalised. The next section describes the ISPs and gives examples of their use.

3.1 ISP as a design tool
An ISP can be defined as a type of data-driven profile that is constructed by extracting real data from LFs. It differs from traditional user profiles, which mostly contain demographic data and do not include naturalistic interactions. The ISP is a dynamic tool representing various interaction experiences of heterogeneous users. Such profiles may enhance the user descriptions by looking into how users interact with the system. The components of such profiles are attributes used to generate specific instances inside the same UPs. The suggested applications of ISPs are:
• Interface personalisation, where the system may present a variation of the interface according to the user model. For example, in the case of the known item searcher (KS), the screen can be simplified by removing other facets and highlighting the author or keyword facets. Another example is the light navigators (LNs), who might represent the come-and-leave visitors. The interface may display a pop-up window showing the available navigation functions, i.e. collection, institution, content, or press, as a reminder to assist the LNs.
• An evaluation tool to test alternative interface designs. For instance, in our study, we could evaluate which functions are valuable for each segment of users. We found that the author and keyword facets were utilised more frequently by the KS searchers than by other users. Knowing that the side bar of functions has an impact on one or more UPs, designers can provide more personalised interface options. For example, instead of providing a long and static list of unneeded facets, designers may provide a drop-down list of different facets.
• A tool that reveals the learnability level of the system. For example, ISPs can inform designers about the ease of use of the system across the UPs found. Different levels of expertise with the system may require following different paths. Such information is valuable, as designers might try to improve unsuccessful searching experiences. Thus, ISPs can be used as tools to test hypotheses and allow designers to explore new interaction paradigms, for example by testing different interface parameters on a segment of users.
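The two personalisation examples given above (KS and LN) can be expressed as simple rules. The sketch below is hypothetical: the function name `personalise`, the layout dictionary, and the default of showing all facets to every other pattern are assumptions introduced for illustration, not a description of an existing RERO Doc feature.

```python
DEFAULT_FACETS = ["FT", "FI", "FD", "FC", "FA", "FK", "FL"]
NAVIGATION_OPTIONS = ["collection", "institution", "content", "press"]


def personalise(pattern):
    """Return an interface layout for a usage pattern code such as 'KS' or 'LN'."""
    layout = {"facets": list(DEFAULT_FACETS), "popup": None}
    if pattern == "KS":
        # Known item searchers: keep only the author and keyword facets.
        layout["facets"] = ["FA", "FK"]
    elif pattern == "LN":
        # Light navigators: remind them of the available navigation functions.
        layout["popup"] = list(NAVIGATION_OPTIONS)
    return layout
```

Keeping the rules in one function makes it straightforward to try an alternative layout for a single segment of users, which is the kind of comparison the evaluation use case calls for.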

Referring to such profiles when redesigning IIR systems can result in systems that are consistent with and reflect naturalistic interactions. Such profiles serve as valuable input into the redesign and refinement of the interface and, in general, of the functionality of the system. Besides, ISPs serve the need for iterative evaluations, accounting for new users arriving and new material being added to a DL.

4 CONCLUSION
Searching for information is a self-directed activity. It is possible to identify and extract from LFs representative features of user interactions. The initial findings of this study point towards the usefulness of a novel tool, the ISP. Such profiles are: (a) data-driven profiles based on significant attributes extracted from a valuable source, (b) real representations of the potential users of a system (new and existing users), and (c) a tool that can be used to guide designers to provide more usable interfaces. As a future step, in order to validate our ISPs, a user study will be conducted involving experts in DL design and development. Ultimately, ISPs can be exploited to build rich representations for modeling users and interactions.


REFERENCES
[1] Giuseppe Amato and Umberto Straccia. 1999. User profile modeling and applications to digital libraries. Research and Advanced Technology for Digital Libraries (1999).
[2] Christiane Behnert and Dirk Lewandowski. 2017. A framework for designing retrieval effectiveness studies of library information systems using human relevance assessments. Journal of Documentation 73, 3 (2017), 509–527.
[3] Ayse Cufoglu, Mahi Lohi, and Colin Everiss. 2012. Weighted instance based learner (WIBL) for user profiling. In Applied Machine Intelligence and Informatics.
[4] Carsten Eickhoff, Kevyn Collins-Thompson, Paul N Bennett, and Susan Dumais. 2013. Personalizing atypical web search sessions. In Proceedings of the sixth ACM international conference on Web search and data mining.
[5] Enrique Frias-Martinez, Sherry Y Chen, Robert D Macredie, and Xiaohui Liu. 2007. The role of human factors in stereotyping behavior and perception of digital library users: a robust clustering approach. User Modeling and User-Adapted Interaction 17, 3 (2007), 305–337.
[6] Susan Gauch, Mirco Speretta, Aravind Chandramouli, and Alessandro Micarelli. 2007. User Profiles for Personalized Information Access. In The adaptive web. Springer.
[7] Eelco Herder. 2016. User Modeling and Personalization 2. (April 2016). Lecture slides.
[8] Diane Kelly. 2009. Methods for evaluating interactive information retrieval systems with users. Foundations and Trends in Information Retrieval (2009).
[9] Nikolaos Nanas, Victoria Uren, and Anne De Roeck. 2003. Building and applying a concept hierarchy representation of a user profile. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval.
[10] U Rohini and Vamshi Ambati. 2005. A collaborative filtering based re-ranking strategy for search in digital libraries. International Conference on Asian Digital Libraries (2005).
[11] Silvia Schiaffino and Analía Amandi. 2009. Intelligent User Profiling. In Artificial Intelligence An International Perspective. Springer, 193–216.
[12] Kerry-Louise Skillen, Liming Chen, Chris D Nugent, Mark P Donnelly, William Burns, and Ivar Solheim. 2012. Ontological User Profile Modeling for Context-Aware Application Personalization. In Ubiquitous Computing and Ambient Intelligence. Springer, 261–268.
[13] Thanh-Trung Van and Michel Beigbeder. 2008. Hybrid method for personalized search in scientific digital libraries. In International Conference on Intelligent Text Processing and Computational Linguistics.
