Upload
theodore-andrews
View
217
Download
0
Embed Size (px)
DESCRIPTION
Our Primo Environment Texas A&M University is a hosted, Direct customer, in production since June As a hosted customer, we have a staging system as well as production. All program development for these extracts has been done on the production system. We are currently on release 4.4.1
Citation preview
ANNE L. HIGHSMITHDIRECTOR, CONSORTIA SYSTEMSTEXAS A&M [email protected]://LIBRARY.TAMU.EDU/DIRECTORY/HISMITH
Do It Yourself Primo StatisticsThe Art of the (Relatively) Painless Extraction
Our Environment
Our Primo Environment Texas A&M University is a hosted, Direct
customer, in production since June 2012. As a hosted customer, we have a staging
system as well as production. All program development for these extracts has been done on the production system.
We are currently on release 4.4.1
Our Reporting Environment Report server with an Oracle database Oracle is separately licensed, so we can
do development on it Contains SFX/MetaLib extracts and
statistics and a full copy of the Voyager database, rebuilt nightly from backup
Viewing the Views
How to see what’s available Log in as primo user Execute: s+ RPT00 Execute: SELECT VIEW_NAME FROM
ALL_VIEWS WHERE OWNER LIKE ‘%RPT00’ CLICK_EVENTS SEARCH_STATISTICS SEARCH_STRINGS
To see view definition, execute: SELECT TEXT FROM ALL_VIEWS WHERE VIEW_NAME = ‘CLICK_EVENTS’
SELECT ID,SUMMARY_TIMESTAMP EVENT_DATE,CLICK_TYPE EVENT_TYPE,CASE WHEN CLICK_VALUE='N/A' THEN '' ELSE CLICK_VALUE END CLICK_VALUE,CLICK_COUNT,SOURCE_VIEW,SOURCE_INSTITUTION,SOURCE_ON_CAMPUS,SOURCE_USER_GROUPfrom P41_PRM00.S_CLICK_SUMMARIESWHERE CLICK_TYPE NOT IN ('File System', 'DB Listener', 'Load', 'Indexes', 'Table Space', 'Search Problem', 'IO Wait', 'Memory')
View Definitions All stats views seem to be based on
S_SEARCH_SUMMARIES & S_CLICK_SUMMARIES tables Notice that CLICK_EVENTS excludes some
system-type stats SEARCH_STATISTICS is a subset of
S_SEARCH_SUMMARIES, where SUMMARY_TYPE='SEARCH_COUNT‘
SEARCH_STRINGS is a subset of S_SEARCH_SUMMARIES, where SUMMARY_TYPE = 'TOP_SEARCHES_SUMMARY'
Pop quiz #1 In 1745, settlers from the English
colonies, assisted by the British fleet, invaded and captured the capital of one of the provinces of New France. Which one?
(Will accept the name of the French province, the modern Canadian province of which it is a part, the fortress, or the place name.)
Data Anomalies
SQL vs. BIRT Reports Replicate BIRT report for Click EventsEvent type
BIRT SQL
Display details tab
5 49,629
DS 444,539 89,791GetIt!Link2
1 20,682
SQL Selection Criteria Issues Some tables contain “junk” Out of 10M rows in the
CLICK_EVENTS view, 36% had no institution name
Myriad variations in INSTITUTION_NAME
Basic Selection CriteriaSELECT event_type, click_value, click_count, institution, \"VIEW\" AS view_name, on_campus, user_groupFROM p41_rpt00.click_eventsWHERE to_char(event_date,'YYYYMM') = '$previous_month'AND institution is not nullAND lower(institution) not like 'primo%'
Scope Names Hoped that SCOPE_NAME would be
equivalent to the Search Scope Name as it appears on the Search Scope List in the Primo Back Office.
Current default SCOPE_NAME appears as: scope:("MSL"),scope:(libguides),scope:
(archon),scope:(AMDB_VOYAGER),scope:(TAMU-SFX ),scope:(EVANS),scope:(tamu_dspace_qdc),primo_central_multiple_fe
Collected all known SCOPE_NAME values in a Perl module, TAMU_Primo.pm
Scope Types SEARCH_STATISTICS and
SEARCH_STRINGS views contain an element called SCOPE_TYPE
SCOPE_TYPE in SEARCH_STRINGS should be limited to LOCAL/REMOTE
SCOPE_TYPE IN SEARCH_STATISTICS should be limited to LOCAL/REMOTE/DS
Scope Types (Continued) SEARCH_STATISTICS – 16% of
SCOPE_TYPE values are something other than LOCAL/REMOTE
SEARCH_STRINGS – 12% of SCOPE_TYPE values are something other than LOCAL/REMOTE/DS
If the retrieved value didn’t match the list of defined values, I set it to null.
Data I Can’t Make Sense of SEARCH_STRINGS has only 149,127 rows
in the view Are these unique strings? If yes, why does the same string appear
in different rows? What do the numbers, such as
AVERAGE_RESULTS and SEARCH_COUNT, really mean?
Example “Fluid mechanics” appears as a search
string in the default scope 5 times in the period 1/18/2014-3/5/2014.
AVERAGE_RESULTS by date 18-Jan-14 210677 31-Mar-14 150528 27-Feb-14 58544 5-Mar-14 58576 5-Mar-14 74119
Pop quiz #2 Which city in western Canada was the
birthplace/hometown of the following personalities: Deanna Durbin, actress Anna Pacquin, actress Doug Henning, magician and entertainer Sir William Stephenson, AKA Intrepid, spy Guy Gavriel Kay, novelist and poet Brett Hull, professional hockey player Marshall McLuhan, media guru Fred Turner, musician, Bachman-Turner Overdrive Monty Hall, host of Let’s Make a Deal
Perl Extract Programs
Generalities The extract and processing programs for
the TAMU report server are written in Perl; the front end is written in PHP
The Primo stats extract programs I have written live on the production Primo server; they sftp output to the report server
The perl programs use a local symlink from /exlibris/product/perl-5.8.9/bin/perl to /exlibris/primo/scripts/perl
Generalities (Continued) The Primo group consists of 5 Perl programs
and 1 module click_extract.pl, click_compile.pl, facets.pl,
search_statistics.pl, search_strings.pl, TAMU_Primo.pm
click_extract.pl extracts data from the CLICK_EVENTS view and stores it in output files, which are mined by click_compile.pl & facets.pl to create useful output.
search_statistics.pl & search_strings.pl extract data from their corresponding views to an output file
Generalities (Continued) Programs are designed to be run on a
monthly basis, to be put into a cronjob and cumulate the previous month’s data. But they can also be run from the command line with parameters that let you select other months earlier in the calendar.
The programs that create output files also have a step to sftp the output to a different server. But you have to do the sftp setup between servers yourself.
A Few Specifics Facets.pl creates 2 sets of output files –
one set which cumulates all facet requests and a second one that provides detail about certain facet types
If it’s a domain, language, library, resource type, or top-level facet, it cumulates the individual values under each of those types. So you would know how many times the facet for English language was applied or the facet for Thesis resource type.
Normalization Contained in TAMU_Primo.pm
Defines variations in the institution value, code versus spelled out name, and normalizes them all to the codes
Defines a list of valid view names Normalizes the user groups.
Defines a long list of valid scope_names Search_statistics.pl collects undefined
scope_names and emails the list to a designated email account so that the list can be updated
Pop quiz #3 In 1993, the National Hockey League
changed its conference and division names to boring stuff like “Eastern Conference” and “Pacific Division”. Before that, the historic conference and division names were based on people who had something to do with hockey and the history of the NHL. Give one of those old conference or division names.