Studying SFX Logs to Better Understand User Behavior

Bennett Claire Ponsford, Digital Services Librarian
Anne L. Highsmith, Consortia Systems Coordinator
Texas A&M University Libraries


Texas A&M University

• 46,000-plus undergraduate and 8,500 graduate students
• 250 degree programs in 10 colleges
• 2,800 faculty in a research-intensive environment
• Branch campuses in Galveston, Texas and Doha, Qatar

University Libraries

• Member of ARL
• Main library with 5 branches
– 3 in College Station
– 1 in Galveston, TX
– 1 in Doha, Qatar
• Spending over $7 million per year on electronic resources
• Redesigning our website anyway

Our Current SFX Implementation

• Went live with SFX 2 in May 2004
• SFX 3 in January 2006
• A-Z List in January 2006
• A-Z List ver. 3 in January 2008
• Current SFX menu design unchanged since we went live

Why study users?

To see how your users search when you’re not watching

To resolve internal disagreements over default features to include, etc.

To see whether changes to SFX menus really improved results

As a counterpoint to focus groups and task-based user testing

What do our users say?

• Hated all the pop-up windows
– Pop-up windows in general
– Hijacking previous content
• SFX menus too busy and confusing
– Did not understand the 3 catalog links
– Never used the FAQ or Ulrich’s link
• All they want is full text anyway

What do the numbers say?

• Sample from the SFX logs
– 14 days per year
– 3 years: FY 2006 through as much of FY 2008 as available
• Sample from the Apache logs
– Data available only for FY 2008

Where are they coming from?

Fiscal Year 08                    Count   Percent
From Catalog to API                5860     8.54%
From AZ list to just SFX Menu      1228     1.79%
From AZ list to Full Text         15995    23.32%
From MetaLib                      14095    20.55%
From Databases                    31134    45.39%
Total Sample Requests             68585   100.00%

Where do they go?

FY08 Overall                      Count   Percent
Total Sample Requests             68585   100.00%
Requests with Full Text           30626    44.65%
- Clicked On                      26679    87.11%
- Not Clicked on                   5946    19.41%
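The percentages in this table use two different bases: the "Requests with Full Text" row is a share of all sampled requests, while the "Clicked On" row is a share of the full-text requests only. A minimal sketch of that arithmetic, using the counts from the table (the helper name `pct` is just for illustration):

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


total_requests = 68585   # Total Sample Requests (FY08)
with_full_text = 30626   # Requests with Full Text
clicked = 26679          # full-text requests where a full-text link was clicked

# Base: all sampled requests
share_with_full_text = pct(with_full_text, total_requests)

# Base: only the requests that offered full text
share_clicked = pct(clicked, with_full_text)

print(share_with_full_text, share_clicked)
```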

What if there is no full text?

FY08 Overall                          Count   Percent
ILL only                               2438    16.81%
LibCat (main catalog) only             2299    15.85%
Search Google only                      449     3.10%
LibCat & ILL                            287     1.98%
LibCat & Chiron (medical catalog)       251     1.73%
Chiron only                             195     1.34%
All 3 catalogs                           87     0.60%
All 3 catalogs and ILL                   70     0.48%
ILL for inadequate metadata              57     0.39%
LibCat & Search Google                   56     0.39%
LibCat, Chiron, & ILL                    40     0.28%

Full Text from the Catalogs

Where do they go?

FY08 from Catalog to API                     Count   Percent
Sample requests from Catalog to API           5860
No Full Text Found                            3987    68.04%
Full Text                                     1873    31.96%
- Clicked on Full Text (once or more)         1701    90.82%
- Clicked on Full Text (more than once)         92     4.91%
- Clicked on Full Text (total clicks)         1802
- User Did Not Click on Full Text button       476    25.41%

Requests over Time

Full Text Availability and Actions

No Full Text Actions

Public/Library Usage

Catalog SFX

Overall Requests: Public/Library

Clicking on Full Text: Public/Library

No Full Text Behavior: Public/Library

Requests from AZ List : Public/Library

Review of Apache Logs

Sample of titles                        Type
administrative science quarterly        Journal title
adobe                                   Subject?
adolescence                             Journal title
adolescent literature                   Subject?
adolescents                             Subject?
Adopotion of biotechnology              Subject?
adult children home                     Subject?
adult children with parents at home     Subject?
adult education                         Journal title
Adult Learning                          Journal title
adv exp med biol                        Journal title

What Next: SFX Menus?

• Redesign SFX Menus using simplified menus
– Just display full text, if available, in basic section
– Decrease all the verbiage
– Reduce duplicate listings with display logic
– Display catalog links only if holdings available
– Experiment with direct link banner option
• User test changes

What Next: Home page?

• Search box from Libraries home page
– Review Apache logs re: size of problem
– Wording changes/help text

Technical section -- outline

• Characteristics of stat tables
• How statistics are gathered and stored
• Characteristics of Apache logs
• Modifications to Apache logging to facilitate stats
• Statistical sample
• How you can do this too

Characteristics of stat tables (1)

• 3 stat tables (& offline equivalents)
– stat_object
– stat_target_service
– stat_repeatables
• Request has 1 entry in stat_object table
• Tables join on request_id
• Request has multiple entries in stat_target_service table
– 1 entry (row) for each link on menu

Characteristics of stat tables (2)

• Exceptions to “request in stat_object table has several corresponding records in stat_target_service”
– API requests: 0 entries in stat_target_service
– Click on any type of link where direct linking occurs: 1 entry in stat_target_service
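The join on request_id, including the API exception with no matching rows, can be sketched with an in-memory SQLite database. This is only an illustration: the column names `source`, `target`, `service`, and `clicks` follow the slide headings rather than a confirmed SFX schema, and every data row below is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stat_object (request_id INTEGER PRIMARY KEY, source TEXT);
    CREATE TABLE stat_target_service (
        request_id INTEGER, target TEXT, service TEXT, clicks INTEGER
    );
""")

# Invented sample data; request 4 stands in for an API request (no menu rows).
conn.executemany("INSERT INTO stat_object VALUES (?, ?)",
                 [(1, "database"), (2, "catalog"), (3, "metalib"), (4, "api")])
conn.executemany("INSERT INTO stat_target_service VALUES (?, ?, ?, ?)", [
    (1, "INFORMAWORLD_JOURNALS", "getFullTxt", 1),
    (1, "AM_VOYAGER", "getHolding", 0),
    (2, "AM_VOYAGER", "getHolding", 1),
    (3, "METAPRESS_ROUTLEDGE", "getFullTxt", 0),
    (3, "AM_VOYAGER", "getHolding", 0),
])

# LEFT JOIN keeps requests that have no rows in stat_target_service.
query = """
    SELECT o.request_id,
           COUNT(t.request_id) AS menu_links,
           SUM(CASE WHEN t.service = 'getFullTxt' THEN t.clicks ELSE 0 END)
               AS fulltext_clicks
    FROM stat_object AS o
    LEFT JOIN stat_target_service AS t ON t.request_id = o.request_id
    GROUP BY o.request_id
    ORDER BY o.request_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

An inner join would silently drop request 4, so the outer join matters whenever API traffic should still be counted in the totals.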

Stat object data elements (1)

Name
-----------------------
REQUEST_ID
ISSN
ISBN
LCCN
LOCAL
TITLE
ATITLE
JTITLE
BTITLE
CTITLE
SERIES
PUBLISHER
PLACE_OF_PUBLICATION
OBJECT

Stat object data elements (2)

Name
-----------------------
SUBCATEGORY
STATUS
DOI
REQ_DATE
TIME
SOURCE
IP
OBJECT_TYPE
INSTITUTE
USER_GROUP
FACULTY
HAS_FULLTXT
DATE_OF_PUBLICATION
EPAGE
SPAGE
PRESENTATION_FORMAT
SESSION_ID
OPEN_URL

Rows in target service table

Target                   Clicks   Service
INFORMAWORLD_JOURNALS    1        getFullTxt
METAPRESS_ROUTLEDGE      1        getFullTxt
AM_VOYAGER               1        getHolding
MS_VOYAGER               0        getHolding
GA_VOYAGER               0        getHolding
WWW_SEARCH_ENGINES       0        getWebSearch
AM_PROBLEM_REPORT        0        getWebService
AM_SFX_FAQ               0        getWebService
ULRICHSWEB_COM           1        getCitedJournal
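Because each menu link gets its own row, the rows for one request show both what was offered and what was used. A small sketch, using the sample rows from this slide as plain tuples (the tuple layout is an assumption for illustration, not the actual table structure):

```python
# (target, clicks, service) for a single request, copied from the slide
rows = [
    ("INFORMAWORLD_JOURNALS", 1, "getFullTxt"),
    ("METAPRESS_ROUTLEDGE", 1, "getFullTxt"),
    ("AM_VOYAGER", 1, "getHolding"),
    ("MS_VOYAGER", 0, "getHolding"),
    ("GA_VOYAGER", 0, "getHolding"),
    ("WWW_SEARCH_ENGINES", 0, "getWebSearch"),
    ("AM_PROBLEM_REPORT", 0, "getWebService"),
    ("AM_SFX_FAQ", 0, "getWebService"),
    ("ULRICHSWEB_COM", 1, "getCitedJournal"),
]

# Links the user followed versus links that were displayed but ignored
clicked = [target for target, clicks, _ in rows if clicks > 0]
ignored = [target for target, clicks, _ in rows if clicks == 0]

# Did the user reach full text at least once on this menu?
used_full_text = any(
    service == "getFullTxt" and clicks > 0 for _, clicks, service in rows
)

print(clicked)
print(ignored)
print(used_full_text)
```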

How stats are gathered & stored (1)

• Run online to offline daily
• Run export_tab.pl monthly
– Embedded in special Perl script that copies monthly cumulations to report server
• Copy stat_object_offline & stat_target_service_offline, but not stat_repeatables_offline

How stats are gathered & stored (2)

• Copy these tables in their entirety, except for some open_urls in stat_object

• Perl script on report server loads data into Oracle tables

• Create separate tables by academic year because of size; academic 2008 to date:
– 1.3M+ requests
– 8.4M+ target service links

Characteristics of Apache logs (1)

• “Real” IP associated with request available only from reverse Apache log
• Logs that span long time period
– Beware of differences between v2 and v3 a-z lists
– Note if you have changed display options, e.g. between brief and detail view

Characteristics of Apache logs (2)

• Certain data available only from logs, because it doesn’t generate a “request” or hasn’t generated a request yet.
– Category search
– Auxiliary functions
  • Use of info button on az list
  • Push to Metalib myspace from az list
  • Opening SFX az list from within Metalib

Modifications to Apache logging

• Set up custom logging statement in httpd.conf
– Write the data in a single record
– Store tabs between data elements within record
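One record per request with tabs between elements makes the log easy to split. The slides do not show the actual log format, so the sketch below assumes a hypothetical field order (`timestamp`, `client_ip`, `method`, `url`, `referer`), a made-up A-Z list URL, and an invented query parameter named `pattern`:

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit

# Hypothetical field order; the real one depends on the custom log format.
FIELDS = ["timestamp", "client_ip", "method", "url", "referer"]


def parse_record(line: str) -> Dict[str, str]:
    """Split one tab-delimited log record into named fields."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


# Invented sample record
sample = ("2008-03-14 10:15:02\t192.0.2.10\tGET\t"
          "/sfx/azlist?pattern=adult+education\thttp://example.org/\n")

record = parse_record(sample)

# Pull the search string out of the request URL's query string
query = parse_qs(urlsplit(record["url"]).query)
search_term = query.get("pattern", [""])[0]

print(record["client_ip"], search_term)
```

Rejecting records with the wrong field count is a cheap guard against lines written before the custom format was enabled.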

Samples

• STAT table samples
– 2 weeks' worth of data across 3 years
– Same day of week / same week of month
  • 1st Thursday in January, 2nd Friday in March
• Apache logs
– Academic 2008 was only year available with sufficient data elements
– Selected representative days from that year
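Picking the same day of week in the same week of the month for each year can be automated. A minimal sketch using Python's standard `calendar` module (the function name `nth_weekday` is illustrative, not part of the original toolset):

```python
import calendar
from datetime import date


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th given weekday (0 = Monday) of a month."""
    first_weekday, days_in_month = calendar.monthrange(year, month)
    # Days from the 1st of the month to the first occurrence of `weekday`
    offset = (weekday - first_weekday) % 7
    day = 1 + offset + 7 * (n - 1)
    if day > days_in_month:
        raise ValueError("month has no such weekday occurrence")
    return date(year, month, day)


# The two sample days named on the slide, for calendar years 2006 to 2008
for year in (2006, 2007, 2008):
    print(nth_weekday(year, 1, calendar.THURSDAY, 1),
          nth_weekday(year, 3, calendar.FRIDAY, 2))
```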

DIY instructions (1)

• Run online2offline job (server_admin_util)
• Run /exlibris/sfx_ver/sfx_version_3/[instance]/admin/database/export_tab.pl
– By default writes *.exp file to scratch directory
– Must run once for each table
• Download *.exp files & load into MS Access

DIY instructions (2)

• Go to http://lib.tamu.edu/directory/ahighsmi and click on link for this presentation to download zip file.

• Zip file contains
– Copy of this presentation
– Sample MS Access databases
– Perl program to parse Apache log entries
– Custom Apache log format for httpd.conf

Contact info

• Bennett Claire Ponsford
– [email protected]
– http://lib.tamu.edu/directory/bponsfor
• Anne L. Highsmith
– [email protected]
– http://lib.tamu.edu/directory/ahighsmi