ANNE L. HIGHSMITH DIRECTOR, CONSORTIA SYSTEMS TEXAS A&M UNIVERSITY [email protected] HTTP://LIBRARY.TAMU.EDU/DIRECTORY/HISMITH Do It Yourself Primo Statistics The Art of the (Relatively) Painless Extraction


Our Environment


Our Primo Environment
• Texas A&M University is a hosted, Direct customer, in production since June 2012.
• As a hosted customer, we have a staging system as well as production. All program development for these extracts has been done on the production system.
• We are currently on release 4.4.1.


Our Reporting Environment
• Report server with an Oracle database
• Oracle is separately licensed, so we can do development on it
• Contains SFX/MetaLib extracts and statistics and a full copy of the Voyager database, rebuilt nightly from backup


Viewing the Views


How to see what's available
• Log in as primo user
• Execute: s+ RPT00
• Execute: SELECT VIEW_NAME FROM ALL_VIEWS WHERE OWNER LIKE '%RPT00'
    CLICK_EVENTS
    SEARCH_STATISTICS
    SEARCH_STRINGS
• To see view definition, execute: SELECT TEXT FROM ALL_VIEWS WHERE VIEW_NAME = 'CLICK_EVENTS'


SELECT ID,
       SUMMARY_TIMESTAMP EVENT_DATE,
       CLICK_TYPE EVENT_TYPE,
       CASE WHEN CLICK_VALUE='N/A' THEN '' ELSE CLICK_VALUE END CLICK_VALUE,
       CLICK_COUNT,
       SOURCE_VIEW,
       SOURCE_INSTITUTION,
       SOURCE_ON_CAMPUS,
       SOURCE_USER_GROUP
FROM P41_PRM00.S_CLICK_SUMMARIES
WHERE CLICK_TYPE NOT IN ('File System', 'DB Listener', 'Load', 'Indexes', 'Table Space', 'Search Problem', 'IO Wait', 'Memory')


View Definitions
• All stats views seem to be based on the S_SEARCH_SUMMARIES & S_CLICK_SUMMARIES tables
• Notice that CLICK_EVENTS excludes some system-type stats
• SEARCH_STATISTICS is a subset of S_SEARCH_SUMMARIES, where SUMMARY_TYPE = 'SEARCH_COUNT'
• SEARCH_STRINGS is a subset of S_SEARCH_SUMMARIES, where SUMMARY_TYPE = 'TOP_SEARCHES_SUMMARY'


Pop quiz #1
• In 1745, settlers from the English colonies, assisted by the British fleet, invaded and captured the capital of one of the provinces of New France. Which one?
• (Will accept the name of the French province, the modern Canadian province of which it is a part, the fortress, or the place name.)


Data Anomalies


SQL vs. BIRT Reports
• Replicate BIRT report for Click Events

Event type            BIRT      SQL
Display details tab   5         49,629
DS                    444,539   89,791
GetIt!Link2           1         20,682


SQL Selection Criteria Issues
• Some tables contain “junk”
• Out of 10M rows in the CLICK_EVENTS view, 36% had no institution name
• Myriad variations in INSTITUTION_NAME


Basic Selection Criteria
SELECT event_type, click_value, click_count, institution, \"VIEW\" AS view_name, on_campus, user_group
FROM p41_rpt00.click_events
WHERE to_char(event_date,'YYYYMM') = '$previous_month'
AND institution is not null
AND lower(institution) not like 'primo%'
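The `$previous_month` placeholder and the backslash-escaped `\"VIEW\"` suggest that this query is embedded in a double-quoted Perl string. The following is a minimal Python sketch, not the author's Perl, of how the YYYYMM value might be computed and passed to the same query. The function names and the `:month` bind variable are assumptions made for illustration.

```python
from datetime import date


def previous_month(today):
    """Return the month before `today` as a 'YYYYMM' string."""
    year, month = today.year, today.month - 1
    if month == 0:  # January rolls back to December of the prior year
        year, month = year - 1, 12
    return f"{year:04d}{month:02d}"


def click_events_query(month):
    """Return (sql, params) for the basic selection criteria.

    The month is passed as a named bind variable (:month, an assumption)
    instead of being interpolated into the SQL text.
    """
    sql = (
        "SELECT event_type, click_value, click_count, institution, "
        '"VIEW" AS view_name, on_campus, user_group '
        "FROM p41_rpt00.click_events "
        "WHERE to_char(event_date, 'YYYYMM') = :month "
        "AND institution IS NOT NULL "
        "AND lower(institution) NOT LIKE 'primo%'"
    )
    return sql, {"month": month}


if __name__ == "__main__":
    sql, params = click_events_query(previous_month(date.today()))
    print(params["month"])
    print(sql)
```

A bind variable keeps the month value out of the SQL text, so the statement stays the same from run to run.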


Scope Names
• Hoped that SCOPE_NAME would be equivalent to the Search Scope Name as it appears on the Search Scope List in the Primo Back Office.
• Current default SCOPE_NAME appears as:
    scope:("MSL"),scope:(libguides),scope:(archon),scope:(AMDB_VOYAGER),scope:(TAMU-SFX ),scope:(EVANS),scope:(tamu_dspace_qdc),primo_central_multiple_fe
• Collected all known SCOPE_NAME values in a Perl module, TAMU_Primo.pm
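The author's approach was to keep a list of whole SCOPE_NAME values. As a different, purely illustrative technique, the Python sketch below splits one SCOPE_NAME string into its component scopes. It assumes that components are comma-separated, that individual scope names never contain commas, and that the `scope:(...)` wrapper and optional double quotes are the only decoration. The function name is hypothetical.

```python
import re

# Matches one component of the form scope:(NAME) and captures NAME.
_SCOPE_WRAPPER = re.compile(r"^scope:\((.*)\)$")


def split_scope_name(scope_name):
    """Split a raw SCOPE_NAME value into a list of bare scope names."""
    parts = []
    for piece in scope_name.split(","):
        piece = piece.strip()
        match = _SCOPE_WRAPPER.match(piece)
        if match:
            piece = match.group(1)
        # Drop stray whitespace and surrounding double quotes.
        parts.append(piece.strip().strip('"'))
    return parts


if __name__ == "__main__":
    raw = (
        'scope:("MSL"),scope:(libguides),scope:(archon),scope:(AMDB_VOYAGER),'
        "scope:(TAMU-SFX ),scope:(EVANS),scope:(tamu_dspace_qdc),"
        "primo_central_multiple_fe"
    )
    for name in split_scope_name(raw):
        print(name)
```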


Scope Types
• SEARCH_STATISTICS and SEARCH_STRINGS views contain an element called SCOPE_TYPE
• SCOPE_TYPE in SEARCH_STRINGS should be limited to LOCAL/REMOTE
• SCOPE_TYPE in SEARCH_STATISTICS should be limited to LOCAL/REMOTE/DS


Scope Types (Continued)
• SEARCH_STATISTICS: 16% of SCOPE_TYPE values are something other than LOCAL/REMOTE
• SEARCH_STRINGS: 12% of SCOPE_TYPE values are something other than LOCAL/REMOTE/DS
• If the retrieved value didn’t match the list of defined values, I set it to null.
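The "set it to null" rule can be sketched in a few lines. This is a Python illustration rather than the author's Perl, and the function name is hypothetical. The allowed values per view follow the first Scope Types slide, which is an assumption, since the two slides pair the value lists with the views differently.

```python
# Allowed SCOPE_TYPE values per view, taken from the first Scope Types slide.
VALID_SCOPE_TYPES = {
    "SEARCH_STRINGS": {"LOCAL", "REMOTE"},
    "SEARCH_STATISTICS": {"LOCAL", "REMOTE", "DS"},
}


def clean_scope_type(view, value):
    """Return `value` if it is defined for `view`, otherwise None (null)."""
    if value in VALID_SCOPE_TYPES[view]:
        return value
    return None


if __name__ == "__main__":
    for view, value in [("SEARCH_STATISTICS", "DS"), ("SEARCH_STRINGS", "DS")]:
        print(view, value, "->", clean_scope_type(view, value))
```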


Data I Can’t Make Sense of
• SEARCH_STRINGS has only 149,127 rows in the view
• Are these unique strings?
• If yes, why does the same string appear in different rows?
• What do the numbers, such as AVERAGE_RESULTS and SEARCH_COUNT, really mean?


Example
• “Fluid mechanics” appears as a search string in the default scope 5 times in the period 1/18/2014-3/5/2014.
• AVERAGE_RESULTS by date:
    18-Jan-14   210677
    31-Mar-14   150528
    27-Feb-14   58544
    5-Mar-14    58576
    5-Mar-14    74119


Pop quiz #2
• Which city in western Canada was the birthplace/hometown of the following personalities:
    Deanna Durbin, actress
    Anna Paquin, actress
    Doug Henning, magician and entertainer
    Sir William Stephenson, AKA Intrepid, spy
    Guy Gavriel Kay, novelist and poet
    Brett Hull, professional hockey player
    Marshall McLuhan, media guru
    Fred Turner, musician, Bachman-Turner Overdrive
    Monty Hall, host of Let’s Make a Deal


Perl Extract Programs


Generalities
• The extract and processing programs for the TAMU report server are written in Perl; the front end is written in PHP
• The Primo stats extract programs I have written live on the production Primo server; they sftp output to the report server
• The Perl programs use a local symlink from /exlibris/product/perl-5.8.9/bin/perl to /exlibris/primo/scripts/perl


Generalities (Continued)
• The Primo group consists of 5 Perl programs and 1 module
    click_extract.pl, click_compile.pl, facets.pl, search_statistics.pl, search_strings.pl, TAMU_Primo.pm
• click_extract.pl extracts data from the CLICK_EVENTS view and stores it in output files, which are mined by click_compile.pl & facets.pl to create useful output.
• search_statistics.pl & search_strings.pl extract data from their corresponding views to an output file


Generalities (Continued)
• Programs are designed to run monthly from a cron job and cumulate the previous month’s data. They can also be run from the command line with parameters that let you select other, earlier months.
• The programs that create output files also have a step to sftp the output to a different server. But you have to do the sftp setup between servers yourself.
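The "previous month by default, another month on request" behavior might look like the following Python sketch. It is not the author's Perl; the `--month` option name and the `resolve_month` function are hypothetical.

```python
import argparse
import re
from datetime import date, timedelta


def resolve_month(argv, today):
    """Return the month to extract as 'YYYYMM'.

    With no --month argument (the cron case), fall back to the month
    before `today`; otherwise validate and return the requested month.
    """
    parser = argparse.ArgumentParser(description="Monthly statistics extract (sketch)")
    parser.add_argument("--month", help="month to extract, as YYYYMM")
    args = parser.parse_args(argv)
    if args.month is None:
        # Step back one day from the first of this month to land in the previous month.
        last_of_previous = today.replace(day=1) - timedelta(days=1)
        return last_of_previous.strftime("%Y%m")
    if not re.fullmatch(r"\d{4}(0[1-9]|1[0-2])", args.month):
        parser.error("--month must look like YYYYMM, for example 201402")
    return args.month


if __name__ == "__main__":
    import sys
    print(resolve_month(sys.argv[1:], date.today()))
```

Run without arguments, as a scheduled job would, the sketch selects the previous month; run with `--month 201312`, it selects December 2013.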


A Few Specifics
• facets.pl creates 2 sets of output files: one set which cumulates all facet requests and a second one that provides detail about certain facet types
• If it’s a domain, language, library, resource type, or top-level facet, it cumulates the individual values under each of those types. So you would know how many times the facet for English language was applied or the facet for Thesis resource type.
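The two levels of cumulation described above can be sketched with counters. This is a Python illustration, not the facets.pl code. The input layout, rows of (facet type, facet value, click count), and the facet type labels are assumptions, since the slides do not show how facet clicks are encoded in the extract files.

```python
from collections import Counter, defaultdict

# Facet types that get a per-value breakdown (labels are assumed spellings).
DETAILED_FACET_TYPES = {"domain", "language", "library", "resource type", "top-level"}


def cumulate_facets(rows):
    """Cumulate facet clicks overall and, for selected types, per value.

    rows: iterable of (facet_type, facet_value, click_count) tuples.
    Returns (totals, details): totals maps facet_type to clicks, and
    details maps facet_type to a Counter of clicks per facet_value.
    """
    totals = Counter()
    details = defaultdict(Counter)
    for facet_type, facet_value, click_count in rows:
        totals[facet_type] += click_count
        if facet_type in DETAILED_FACET_TYPES:
            details[facet_type][facet_value] += click_count
    return totals, details


if __name__ == "__main__":
    sample = [
        ("language", "English", 3),
        ("language", "English", 2),
        ("resource type", "Thesis", 4),
    ]
    totals, details = cumulate_facets(sample)
    print(dict(totals))
    print({facet_type: dict(values) for facet_type, values in details.items()})
```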


Normalization
• Contained in TAMU_Primo.pm
• Defines variations in the institution value, code versus spelled out name, and normalizes them all to the codes
• Defines a list of valid view names
• Normalizes the user groups.
• Defines a long list of valid scope_names
• search_statistics.pl collects undefined scope_names and emails the list to a designated email account so that the list can be updated
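A lookup-table approach to this kind of normalization might look like the Python sketch below. It does not reproduce TAMU_Primo.pm: the institution variants, the `TAMU` code, the single known scope name, and the function names are all placeholders chosen for illustration, and the emailing step is left out.

```python
# Hypothetical variants of the institution value, keyed in lowercase.
INSTITUTION_CODES = {
    "tamu": "TAMU",
    "texas a&m university": "TAMU",
}

# Hypothetical list of scope names already defined as valid.
KNOWN_SCOPE_NAMES = {"primo_central_multiple_fe"}


def normalize_institution(raw):
    """Map a raw institution value to its code, or None if unrecognized."""
    if raw is None:
        return None
    return INSTITUTION_CODES.get(raw.strip().lower())


def undefined_scope_names(seen, known=KNOWN_SCOPE_NAMES):
    """Return the sorted, de-duplicated scope names that are not yet defined."""
    return sorted(set(seen) - set(known))


if __name__ == "__main__":
    print(normalize_institution(" Texas A&M University "))
    print(undefined_scope_names(["b_scope", "primo_central_multiple_fe", "a_scope"]))
```

The list returned by `undefined_scope_names` is what would be sent to the maintainer so the table of valid names can be updated.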


Pop quiz #3
• In 1993, the National Hockey League changed its conference and division names to boring stuff like “Eastern Conference” and “Pacific Division.” Before that, the historic conference and division names were based on people who had something to do with hockey and the history of the NHL. Give one of those old conference or division names.