LogCLEF 2009 Log Analysis for Digital Societies (LADS) Thomas Mandl, Maristella Agosti, Giorgio Maria Di Nunzio , Alexander Yeh, Inderjeet Mani, Christine Doran, Julia Maria Schulz LogCLEF Overview LADS Task October 1, 2009, Corfu, GR CLEF Workshop




Overview

• Task

• Data

• Participants

• Results


The LADS Task

• The aim of the LADS task is to analyze user behavior with a focus on multilingual search.

• User interaction with the portal at query time

– e.g. how users interact with the search interface, what kind of search they perform

– how many of them reformulate queries, browse results, leave the portal to follow the search in a national library.


The LADS Task

• LADS deals with logs from The European Library (TEL)

• TEL is a free service that offers access to the resources of 48 national libraries of Europe in 35 languages.

• Resources can be both digital (e.g. books, posters, maps, sound recordings, videos) and bibliographical.

• Quality and reliability are guaranteed by the 48 collaborating national libraries of Europe.


Goals

• The task was open to diverse approaches, in particular data mining techniques to extract knowledge from the data and find interesting user patterns:

 – user session reconstruction (necessary)

– user interaction with the portal at query time

– multilinguality and query reformulation

– user context and user profile
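
Session reconstruction, the first item in the list above, can be approached in several ways. The following is a minimal sketch that assumes sessions are split on a period of inactivity. The 30-minute threshold, the `(user_key, timestamp)` input shape, and the name `reconstruct_sessions` are illustrative assumptions; none of them is prescribed by the task.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity threshold: a common heuristic, not prescribed by the task.
SESSION_GAP = timedelta(minutes=30)


def reconstruct_sessions(actions, gap=SESSION_GAP):
    """Group (user_key, timestamp) pairs into per-user sessions.

    A new session starts whenever the time since the user's previous
    action exceeds `gap`. Returns {user_key: [[timestamp, ...], ...]}.
    """
    by_user = defaultdict(list)
    for user_key, ts in actions:
        by_user[user_key].append(ts)

    sessions = {}
    for user_key, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > gap:
                user_sessions.append([cur])      # gap exceeded: open a new session
            else:
                user_sessions[-1].append(cur)    # still within the current session
        sessions[user_key] = user_sessions
    return sessions
```

The user key could be an IP address or a registered user id; which one is appropriate depends on how much the two disagree in the data.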


TEL Environment


Data

• The data used for the LADS task are search (“action”) logs of The European Library portal.

• All the actions are logged and stored by TEL in a relational table

– each record represents a user action.

• The most significant columns of the table are:

– A numeric id for identifying registered users, or “guest” otherwise;

– User’s IP address;

– An automatically generated alphanumeric identifier for sequential actions of the same user (sessions);

– Query contents;

– Name of the action that a user performed;

– The corresponding collection’s alphanumeric id;

– Date and time of the action’s occurrence.
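
The columns listed above could be modelled as one record per logged action. This is a minimal sketch only: the field names, the CSV header names (`user_id`, `ip`, `session_id`, `query`, `action`, `collection_id`, `timestamp`), and the timestamp format are assumptions made for illustration and are not taken from the TEL schema.

```python
import csv
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Action:
    user_id: str         # numeric id for registered users, or "guest"
    ip: str              # user's IP address
    session_id: str      # generated alphanumeric session identifier
    query: str           # query contents
    action: str          # name of the action performed
    collection_id: str   # alphanumeric id of the collection
    timestamp: datetime  # date and time of the action


def read_actions(lines):
    """Parse CSV lines (with the assumed header names) into Action records."""
    for row in csv.DictReader(lines):
        yield Action(
            user_id=row["user_id"],
            ip=row["ip"],
            session_id=row["session_id"],
            query=row["query"],
            action=row["action"],
            collection_id=row["collection_id"],
            # Assumed timestamp layout; adjust to the actual export format.
            timestamp=datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S"),
        )


def group_by_session(actions):
    """Return {session_id: [Action, ...]} with each list in time order."""
    sessions = defaultdict(list)
    for a in actions:
        sessions[a.session_id].append(a)
    for acts in sessions.values():
        acts.sort(key=lambda a: a.timestamp)
    return dict(sessions)
```

`read_actions` accepts any iterable of text lines, so an open file handle can be passed directly.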


Data

• Action logs distributed to the participants of the task cover the period from 1st January 2007 until 30th June 2008.

– 1,866,330 records

• PostgreSQL table, csv file

• Description of the collection


Participants

• About 20 participants registered

• 4 participants submitted results

– University of Sunderland

– Trinity College Dublin

– University of Hildesheim

– CELI Research, Torino


Results

• CELI: identify translations of search queries.

– The result is a list of pairs of queries in two languages.

– Combined with session information, it is possible to check whether users translate their query within a session.

• University of Sunderland: users rarely switch the query language during their sessions.

– They also found that queries are typically submitted in the language of the interface which the user selects.

• Trinity College Dublin: thorough analysis of query reformulation, query length and activity sequence.

– understanding of the behavior of users from different linguistic or cultural backgrounds.

• University of Hildesheim: sequences of interactions within the log file.

– Visualized in an interactive user interface which allows the exploration of the sequences.

• University of Amsterdam: gain more context information

– limited knowledge about the user which is inherent in log files needs to be tackled

– semantic enrichment of the queries by linking them to digital objects [7].
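
Combining a list of translation pairs with session information, as described for the CELI result above, might look like the following. This is a minimal sketch under stated assumptions, not the participants' method: the pair list, the session layout (session id mapped to its queries), and the name `sessions_with_translation` are hypothetical, and queries are matched after simple trimming and lowercasing.

```python
def sessions_with_translation(sessions, translation_pairs):
    """Return ids of sessions in which both queries of a known pair occur.

    sessions: {session_id: [query, ...]}
    translation_pairs: iterable of (query_a, query_b) in two languages
    """
    def norm(q):
        return q.strip().lower()

    # Store each pair as an unordered set; skip pairs that collapse to one string.
    pairs = {
        frozenset((norm(a), norm(b)))
        for a, b in translation_pairs
        if norm(a) != norm(b)
    }

    hits = []
    for session_id, queries in sessions.items():
        seen = {norm(q) for q in queries}
        # A session matches if it contains both sides of at least one pair.
        if any(pair <= seen for pair in pairs):
            hits.append(session_id)
    return hits
```

Exact string matching will miss spelling variants and partial reformulations, so a real analysis would likely need a looser comparison.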


Conclusions

• LogCLEF has provided an evaluation resource with log files of user activities in multilingual search environments:

– the Tumba! search engine and

– The European Library (TEL) Web site.

• The results and approaches of the participants to the 2009 campaign will be helpful to define a more formal task in the next LogCLEF.

• Advertise better!

– Workshop on Query Log Analysis (TrebleCLEF 2009)

– Workshop on Understanding the User: Logging and Interpreting User Interactions in Information Search and Retrieval (SIGIR 2009)

• Sharing resources and knowledge about log files, Collaborative User Log Analysis Pool

– Mailing list

– Web site
