March 20, 2008 Electronic Resources and Libraries
College Center for Library Automation
Tallahassee, FL
• Susan B. Campbell ([email protected])
• Jim McGill
automating retrieval and reporting of database usage statistics for a consortium
• CCLA provides and maintains the Library Information Network for 28 Community Colleges (LINCC) for Florida's 65+ community college libraries.
• db statistics we’re collecting and reporting
  • 19 vendors
  • over 200 databases
  • monthly reports by database, campus, statewide
  • on demand
• customers for monthly reports
  • 28 community colleges in Florida
  • internal reports
• problem
  • what we were doing and why it doesn’t work
• solution
  • the pieces, the parts and how they fit together
• future
  • what we’ve learned and our expectations
the problem
• excel excess
• vendor variety
  repeat 28 times or more for each vendor (and sometimes each database)
the solution
• automating
  • maintenance utilities
  • handling retrieved data
  • reporting in multiple formats
  • retrieval of vendor data
intranet web interface
Vendor not responding
reporting
creating retrieval scripts: “nuts and bolts”
One-time, 4-step process
• Process Trace File (ParseHTTPTrace.pl)
• Generic Web Page Retrieval (GetWebPage_VENDOR.pl)
• Manual Edits (for testing and first cleanup: remove everything that isn’t in the table. This is iterative and run from the command prompt until a satisfactory file is returned.)
Automated process
• Web Interface
• Queue
• Automated Web Page Retrieval (GetWebPage_VENDOR.pl)
• Web Page Code (GetWebPage_VENDOR.html)
• Parse Web Page Information (ProcessVENDOR.pl)
• ProcessVENDOR.sql
• SQL Server Express (Parameters, Statistics)
step 1. capture HTTP headers
Process Trace File (ParseHTTPTrace.pl)
Generic Web Page Retrieval (GetWebPage_VENDOR.pl)
This is a manual process to create the Perl script that will accept variables and create GetWebPage_VENDOR.pl.
step 2. modify Perl script to accept command line variables
Automated Web Page Retrieval (GetWebPage_VENDOR.pl)
    $Period=$ARGV[0];        # YYYYMM - our DB format
    $ScopeCustID=$ARGV[1];   # vendor specific scope customer ID
    $UserName=$ARGV[2];
    $Password=$ARGV[3];

    # remarks - unremarked for testing
    #$ScopeCustID="bcc";
    #$Period="200701";

    # reformat standard YYYYMM format to two separate variables: MM and YYYY for URL
    $yr=substr($Period,0,4);
    $mon=substr($Period,4,2);
    if ($mon < 10) {$mon=~s/^0//;};   # strip the leading zero: "01" becomes "1"
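The period split in step 2 could also validate its input before building the URL. A minimal sketch, assuming the vendor wants an unpadded month; split_period is a hypothetical helper name, not part of the original scripts.

```perl
use strict;
use warnings;

# Split a YYYYMM period (the database format) into a year and an
# unpadded month. split_period is a hypothetical helper name.
sub split_period {
    my ($period) = @_;
    my ($yr, $mon) = $period =~ /^(\d{4})(0[1-9]|1[0-2])$/
        or die "Period must be YYYYMM, got '$period'\n";
    return ($yr, $mon + 0);    # adding 0 drops the leading zero
}

my ($yr, $mon) = split_period('200701');
print "year=$yr month=$mon\n";    # prints "year=2007 month=1"
```

Rejecting a malformed period early keeps a bad command line argument from turning into a request for the wrong month.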
step 3. modify script with command line variables and parse runtime variables
Automated Web Page Retrieval (GetWebPage_VENDOR.pl)
    ... iodFromMonth=' . $mon . '&timePeriodFromYear=' . $yr . '&timeP ...
    $content0=$resp5->content;

    # SECURITY CODES
    # some codes are session based & must be parsed out to pass to subsequent pages
    $pos=index($content0,"VIEWSTATE")+13;
    $pos2=substr($content0,$pos,5000);
    $pos3=index($pos2,"value")+7;
    $pos4=index($pos2,"\/>");
    $VIEWSTATE=substr($pos2,$pos3,$pos4-$pos3-2);
    # URL-encode the characters that would break the next request
    $VIEWSTATE=~s/\//\%2F/gi;
    $VIEWSTATE=~s/\+/\%2B/gi;
    $VIEWSTATE=~s/\=/\%3D/gi;

    $pos=index($content0,"EVENTVALIDATION")+13;
    $pos2=substr($content0,$pos,2000);
    $pos3=index($pos2,"value")+7;
    $pos4=index($pos2,"\/>");
    $EVENTVALIDATION=substr($pos2,$pos3,$pos4-$pos3-2);
    $EVENTVALIDATION=~s/\//\%2F/gi;
    $EVENTVALIDATION=~s/\+/\%2B/gi;
    $EVENTVALIDATION=~s/\=/\%3D/gi;
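The fixed offsets in step 3 depend on one vendor's page layout. A sketch of a layout-tolerant alternative, assuming the codes are hidden input fields (as ASP.NET's __VIEWSTATE and __EVENTVALIDATION are) with name before value in the tag; hidden_field and form_encode are hypothetical helper names.

```perl
use strict;
use warnings;

# Return the value of a named hidden <input>, or undef if it is missing.
# Assumes the name attribute comes before value inside the same tag.
sub hidden_field {
    my ($html, $name) = @_;
    my ($value) = $html =~ /<input\b[^>]*\bname="\Q$name\E"[^>]*\bvalue="([^"]*)"/i;
    return $value;
}

# Percent-encode everything except unreserved characters so the value can
# be sent back in a query string or POST body. Assumes ASCII input, which
# holds for base64-style session codes.
sub form_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

my $page = '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDw+U=" />';
my $viewstate = hidden_field($page, '__VIEWSTATE');
print form_encode($viewstate), "\n";    # prints "%2FwEPDw%2BU%3D"
```

Encoding every reserved character, rather than only "/", "+" and "=", costs nothing and covers codes that contain other symbols.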
step 4. create page parser (part 1)
Parse Web Page Information (ProcessVENDOR.pl)
creating ProcessVENDOR.pl script
    $col=$ARGV[0];              # college name - when needed
    $vendor="vendorname";       # anonymized (for this presentation) vendor name
    $VDBSuffix="VENDOR";
    $jumpin="<b>Site:";         # point to begin processing file
    $jumpout="Grand Total";     # point to stop processing file
    require ("../VDBProcs.pl"); # include file with needed subroutines
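How the $jumpin and $jumpout markers bound the parse can be sketched as follows. This is an illustration only: the real logic lives in VDBProcs.pl, which the slides do not show, and rows_between is a hypothetical name. It assumes processing stops just before the end marker.

```perl
use strict;
use warnings;

# Keep only the text between the jump markers and return one string per
# table row, with carriage returns, linefeeds, tabs and tags removed.
sub rows_between {
    my ($html, $jumpin, $jumpout) = @_;
    my $start = index($html, $jumpin);
    return () if $start < 0;                   # start marker not found
    my $stop = index($html, $jumpout, $start);
    $stop = length $html if $stop < 0;         # no end marker: read to the end
    my $body = substr($html, $start, $stop - $start);
    $body =~ s/[\r\n\t\f]+/ /g;                # flatten line breaks and tabs
    my @rows;
    while ($body =~ m{<tr\b[^>]*>(.*?)</tr>}gis) {
        my $row = $1;
        $row =~ s/<[^>]+>/ /g;                 # strip remaining tags
        $row =~ s/^\s+|\s+$//g;                # trim both ends
        $row =~ s/\s{2,}/ /g;                  # collapse runs of spaces
        push @rows, $row if length $row;       # blank rows are not output
    }
    return @rows;
}

my $html = "<p>menu</p><b>Site:</b> Example\n<table>\n"
         . "<tr><td>Database A</td>\n\t<td>12</td></tr>\n"
         . "<tr><td></td></tr>\n"
         . "<tr><td>Grand Total</td><td>12</td></tr></table>";
print "$_\n" for rows_between($html, '<b>Site:', 'Grand Total');    # prints "Database A 12"
```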
step 4. create page parser (part 2)
Parse Web Page Information (ProcessVENDOR.pl)
procedures called from common include file
VDBProcs.pl
• htmlclean()
• htmltotxt()
• getperiod()
• writestats()
• validation()
Vendor.pl
SQL log file
After processing, each table row is on one line with all carriage returns, linefeeds, and tabs removed. Blank lines and page feeds are not output, and code outside the jump* markers is ignored. Period, college name, and other variables are passed from the database by the VDBProcs.pl file.
Validation is run on the SQL log file to look for error messages and write them to the log. Entries are made for no data, a change from the previously retrieved value for a period, or other potential problems.
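The three validation checks described above (error messages, no data, changed values) might look like this. Everything here is an assumption for illustration: validate_run, its argument names, the message wording, and the sample log lines are hypothetical, not taken from VDBProcs.pl.

```perl
use strict;
use warnings;

# Return a list of problems found for one retrieval run.
sub validate_run {
    my (%arg) = @_;
    my @problems;

    # 1. Error messages written by the SQL load.
    for my $line (@{ $arg{log_lines} }) {
        push @problems, "SQL error: $line" if $line =~ /\b(?:error|Msg \d+)\b/i;
    }

    # 2. Nothing was retrieved for the period.
    push @problems, 'no data retrieved' unless @{ $arg{new_rows} };

    # 3. A value differs from what was stored earlier for the same period.
    for my $row (@{ $arg{new_rows} }) {
        my ($key, $quantity) = @$row;
        my $old = $arg{previous}{$key};
        push @problems, "$key changed from $old to $quantity"
            if defined $old && $old != $quantity;
    }
    return @problems;
}

my @problems = validate_run(
    log_lines => ['(1 rows affected)', 'Msg 208, Level 16: Invalid object name'],
    new_rows  => [ ['Sessions', 4348] ],
    previous  => { Sessions => 4300 },
);
print "$_\n" for @problems;
```

Returning the problems as a list leaves the caller free to write them to a log file or surface them in the web interface.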
automated process
• Web Interface
• Queue
• Automated Web Page Retrieval (GetWebPage_VENDOR.pl)
• Web Page Code (GetWebPage_VENDOR.html)
• Parse Web Page Information (ProcessVENDOR.pl)
• ProcessVENDOR.sql
• SQL Server Express (Parameters, Statistics)
handling retrieved data
ProcessVENDOR.sql
    delete from VDBStatistics
      where vendor='VENDOR' and college='VALENCIA COMM COLLEGE'
        and datasource='SOME VENDOR DATABASE' and datatype='Sessions'
        and subdatatype='0' and period='200802'
    insert into VDBStatistics
      ( sourcefile, vendor, college, period, datatype, subdatatype, datasource, quantity )
      values ('ProcessVENDOR.sql','VENDOR','VALENCIA COMM COLLEGE','200802','Sessions','0','SOME VENDOR DATABASE','4348')
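Pairing each insert with a delete on the same key means a period can be reloaded without duplicating rows. A sketch of generating that pair, with quoting added for values that contain apostrophes; stat_sql and sql_quote are hypothetical stand-ins, since the slides do not show how writestats() builds its output.

```perl
use strict;
use warnings;

# Quote a value as a SQL string literal by doubling embedded single quotes.
sub sql_quote {
    my ($value) = @_;
    $value =~ s/'/''/g;
    return "'$value'";
}

# Build the delete-then-insert pair for one statistic.
sub stat_sql {
    my (%s) = @_;
    my @key  = qw(vendor college datasource datatype subdatatype period);
    my @cols = qw(sourcefile vendor college period datatype subdatatype datasource quantity);
    my $where  = join ' and ', map { $_ . '=' . sql_quote($s{$_}) } @key;
    my $values = join ',', map { sql_quote($s{$_}) } @cols;
    return "delete from VDBStatistics where $where\n"
         . 'insert into VDBStatistics (' . join(', ', @cols) . ") values ($values)\n";
}

print stat_sql(
    sourcefile => 'ProcessVENDOR.sql', vendor      => 'VENDOR',
    college    => 'VALENCIA COMM COLLEGE', period  => '200802',
    datatype   => 'Sessions',          subdatatype => '0',
    datasource => 'SOME VENDOR DATABASE', quantity => '4348',
);
```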
handling retrieved data
• where/how we store what we retrieve
• daily backup of database via Windows scheduler*
* SQL Server Express does not support SQL Server Agent
tools
software used
• retrieval of data – free
  • Internet Explorer
  • Perl
  • LWP library (libwww-perl, the World-Wide Web library for Perl)
  • ieHTTPHeaders
  • ParseHTTPTrace.pl
  • SQLExpress and manager
  • Intranet Site (IIS, .asp, vbscript, java)
• reporting – some cost
  • EZView (low cost)
  • Crystal Reports (had it)
structure
• environment
  • each vendor has its own working directory
  • each vendor has several files in this directory
    • batch file (called from SQL Server)
    • Perl script (gets web page)
    • Perl script (makes sql to load data)
    • log files (troubleshoot)
retrieval of vendor data
• ActivePerl 5.8.6 build 811 to download webpages
  • run from command prompt in development and testing
• ieHTTPHeaders - an add-on for IE that displays HTTP headers
  http://www.blunck.se/iehttpheaders/iehttpheaders.html
• once trace file is captured with ieHTTPHeaders add-on, use ParseHTTPTrace.pl to create GetWebPage_VENDOR.pl file
  http://www.codeproject.com/KB/perl/webautomaton.aspx
what have we learned?
• large change in service requires staffing and support
• project name should be closely related to the service
• administration understanding of needs
  • assignment of priorities
  • proof-of-concept
  • need for ongoing support – vendor changes, local needs
• moving from proof-of-concept is NOT trivial
  • data checking/revisions/data checking/revisions
  • handoff from development to maintenance
expectations
• future use
  • until SUSHI is widespread OR
  • until data collection and reporting in ERM products is mature OR
  • until existing automated systems have reasonable consortial pricing
• future plans
  • customer/college interface
  • hope…
Thank you
College Center for Library Automation
1753 W. Paul Dirac Drive
Tallahassee, Florida 32310
Susan Campbell [email protected]
Jim McGill [email protected]