Batch-Load Points Counter (MARCEdit project)
Amelia C. VanGundy
The University of Virginia’s College at Wise
Virginia SirsiDynix Library Users Group Meeting
Nov. 14, 2012
2
John Cook Wyllie Library http://library.uvawise.edu/
• Ebook titles in OPAC & Ebook packages on web in finding aids
• Rate of e-book acquisition increased
– netLibrary: 3k titles per year
– EBSCOhost Ebook Academic Collection: 65k titles initial load, 5-10k additional titles every quarter
3
Batch Loading Problems
• Existing procedures were difficult to follow
• Procedures were inconsistent
– especially for different vendors
• Didn't take advantage of MARCEdit Tools
• 949 holdings field now includes $a class#
– previously, files loaded with AUTO “call#”
4
Solution? Wish list?
• Determine quality of MARC records
– OCLC files vs. other vendor files
• Determine editing priorities
– required (001/949), recommended, optional
• Learn to construct Regular Expression Strings
– Batch Editing Tools & Find/Replace
• Streamlined format
– needed both an outline & more detailed info
• Make available on-line/web-page
5
MARCEdit proficiency
• Beginner
• Advanced Beginner
– Uses MARCEditor Tools window (Add/Delete field, Edit Subfield Data, Sort by...)
– Can apply Regular Expression Strings
• Intermediate
– Uses MARC Tools wizard (Extract Selected Records, MARCSplit)
– Can construct Regular Expressions
• Expert
6
Batch-Load Points Counter (BLPC) people.uvawise.edu/acv6d/
7
Batch-Load Points Counter (BLPC) Webpage & Project link
people.uvawise.edu/acv6d/
1. Introduction
– project concept & desired outcomes
2. Checklist #
– outlines the batch-load procedures & steps
– points counter: “what to do” & “when to stop”
3. Processing Guidelines #
– procedures & how-tos & copy/paste info
4. 949 processing
8
BLPC Introduction & Outcomes
• Validation
– determine integrity of the file
• Processing
– determine quality of the records
• Statistics
– track vendor pkgs, record counts, 001 prefixes
• Points
– max. points = 150 (2.5 hours)
• STOP & contact vendor (request corrected file)
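The point budget above works out to one point per minute (150 points over 2.5 hours). A minimal Python sketch of that bookkeeping follows. The step names and point values in `tally` are hypothetical, and treating "reaching 150" as the stop condition is an assumption, since the slide only states the maximum.

```python
MAX_POINTS = 150   # from the slide: maximum points per file
MAX_HOURS = 2.5    # from the slide: time the maximum corresponds to

def minutes_per_point(max_points=MAX_POINTS, max_hours=MAX_HOURS):
    """Return how many minutes of work one point represents."""
    return max_hours * 60 / max_points

def should_stop(points_so_far, max_points=MAX_POINTS):
    """Assumed rule: stop and contact the vendor once the maximum is reached."""
    return points_so_far >= max_points

# Hypothetical tally for one vendor file (values are illustrative only).
tally = {"validation": 20, "required edits": 60, "optional edits": 45}
total_points = sum(tally.values())
```

With this tally the file is still under budget; a file that reached 150 points would be sent back to the vendor rather than edited further.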
9
BLPC CheckList w/Time estimates
• Step 1 & 2: Preparation & validation
– number of records in file
– integrity of file
– valid URL links
• Step 3-4: Review & processing
– quality of records
– lists all processing/edits possible
• Step 5: 949 holdings
Print on one page (2 p. per sheet / front&back)
10
BLPC Processing Guidelines (Procedures)
• Gives details for CheckList
– Steps 1-2, Steps 3-4, Step 5
• Gives the regular expression strings (copy/paste)
– Finding/Replacing/Deleting
– MARCEditor Tools & MARCEdit Tools
• Always use along with Checklist
– includes information to process every field, BUT
– not every field needs processing
Do not print out
11
BLPC Step 1: Preparation & Reports
• MARC Validator
– Identify Invalid Records
– Validate Record (copy/paste into text file)
• Material Type Report
• Field Count
– verify vendor count against MARCEditor count (LDR/000)
– count early / count often
• Deduplicate (See Addt’l Instruct.)
12
Reports/MARC Validator: Identify Invalid Records
13
Reports/MARC Validator: Validate Records
14
Reports/Material Type
15
BLPC Step 2: Verify Field Counts
• Reports/FieldCount for error checking
– first field listed is 000 (corresponds to =LDR)
– last field listed is “numeric”
– 245 count
• Reports/MARCValidator errors
– open text file created in Step 1
– look for specific errors in error file
• Check URL links to make sure they work
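The count check above can be sketched outside MarcEdit as well. The following Python is only an illustration, not the Field Count report itself: it assumes the file has been saved in MarcEdit's mnemonic text form, where every field starts with `=` plus a three-character tag, and it files `=LDR` under `000` as the slide describes. The sample records and the names `field_counts` and `matches_vendor_count` are hypothetical.

```python
from collections import Counter

def field_counts(mrk_text):
    """Count how many times each tag occurs in mnemonic-format text."""
    counts = Counter()
    for line in mrk_text.splitlines():
        if line.startswith("=") and len(line) >= 4:
            tag = line[1:4]
            # The leader is listed as 000 in the report, per the slide.
            counts["000" if tag == "LDR" else tag] += 1
    return counts

def matches_vendor_count(counts, vendor_count):
    """One leader per record, so the 000 count should equal the vendor's count."""
    return counts["000"] == vendor_count

# Two made-up records, for illustration only.
SAMPLE = "\n".join([
    "=LDR  00000nam  2200000Ia 4500",
    "=001  ocn000000001",
    r"=245  10$aFirst sample title$h[electronic resource]",
    r"=650  \0$aLibraries.",
    "",
    "=LDR  00000nam  2200000Ia 4500",
    "=001  ocn000000002",
    r"=245  10$aSecond sample title$h[electronic resource]",
    r"=650  \0$aCataloging.",
    r"=650  \7$aCOMPUTERS / General.$2bisacsh",
])
counts = field_counts(SAMPLE)
```

Comparing `counts["245"]` with `counts["000"]` gives a quick second check, since each record is expected to carry exactly one title field.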
16
Reports/Field Count (vendor count = 8556)
17
Field Count Error & "bad field tag" (vendor count = 694)
18
Reports/Field Count: Detail (highlight field & right-click)
19
Review Validate Records report (saved as text file in Step 1.B)
20
BLPC: Review for processing
Checklist Step 3 workflow
• Check field counts
• Mark-up notes on the Checklist
– Track/count fields that need processing
• Track points for fields that need processing
• Track points for fields that need manual editing
• Each record to fix means extra points
• Rule of thumb: for more than 12 manual edits
– Treat as separate post-load maintenance project
21
BLPC Checklist Step 3: Review Fields
Examples of required processing
• Examine first record & check field count
– Title control# – 001 (prefer OCLC#)
– If lacking: use info. from 035 or create local 001
• Check field counts / subfield counts
– Title/GMD – 245 $h
– URL – 856 $3 $y $u
• Check Validate Record text file for errors
– “Invalid field format” / “Subfield cannot repeat”
• Check field counts / indicator counts
– Subject – 650 Ind2 = 4/7 or 5/6/8
22
BLPC Checklist Step 4: Review fields
Examples of optional processing
• Check field count & delete if present
– 029 / 583 / 584 / 938
• Check field data and delete
– Other vendor pkg names (netLibrary/ebrary/myiLibrary/24x7/Ebsco)
• Check field data & ignore/defer
– 300 lacks phrase: (1 electronic resource)
23
BLPC Checklist with mark-ups
24
BLPC Processing workflow
Step 3 - Step 4
• Review Field Count
• Review Field data
– Use Find/Sort window and review first/last field
• Add/Delete/Edit field
• Review Field data
– look at field in first record or Find/Sort window
– Mistake? Typo? – use the Edit/SpecialUndo
• Review FieldCount
• Save edited file / SaveAs new filename
25
• MARCEditor Tools window
– adding/editing/deleting fields
– adding/editing/deleting subfields
• MARCEditor Edit/Find window
– editing/replacing field data
– displays sortable list
• MARCEdit Tools wizard
– for select & extract records
– extract tab-delimited records for Excel
MARCEditor / MARCEdit Tools
BLPC Checklist identifies fields to process
26
BLPC Processing: Add std. Phrase
506 => Step 3.S
• Check Field Count for presence of 506
• Delete existing 506 field (if present)
• Consult Step 3.S in BLPC Procedures
– Determine that AddField Tool is needed for processing
– Copy Std.phrase from Step 3.S notes
– Paste into AddField Tool window and submit
• Review 506 data in first record
• Check field count
• Save file
27
MARCEditor Tools: Add std. Phrase
506 => Step 3.S
28
BLPC Processing: Delete specific fields
650 Ind2= 5/6/8 (non-LC) => Step 3.V
• Check Field Count for Presence of 650 Ind2=5/6/8
• Consult Step 3.V in BLPC Procedures
– Optional Review – FindAll(RegEx) instructions
– Determine that Tools/DeleteField tool is needed
– Copy RegEx pattern from Step 3.V
– Paste into Tools/DeleteField window
– Use Regular Expressions radio button option
– Submit using Delete button
• Check Field Count & Indicator count
• Save file
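The effect of this delete step can be previewed on a few lines of text. The sketch below is not the DeleteField tool; it is a Python stand-in that assumes mnemonic-format lines (tag, two spaces, two indicator characters, with `\` marking a blank indicator) and uses the deck's 650 Ind2=5/6/8 pattern. The sample lines and the name `delete_non_lc_subjects` are hypothetical.

```python
import re

# The deck's pattern for non-LC subjects, written with the two spaces
# that follow the tag in mnemonic format.
NON_LC_650 = re.compile(r"(=650  )(.[568])(\$a)(.+)")

def delete_non_lc_subjects(mrk_lines):
    """Return the lines with 650 fields whose Ind2 is 5, 6 or 8 removed."""
    return [line for line in mrk_lines if not NON_LC_650.match(line)]

# Made-up fields from one record, for illustration only.
record = [
    r"=245  10$aSample title$h[electronic resource]",
    r"=650  \0$aLibraries$xAutomation.",
    r"=650  \6$aBibliotheques$xAutomatisation.",
    r"=650  \7$aCOMPUTERS / General.$2bisacsh",
]
kept = delete_non_lc_subjects(record)
```

Only the Ind2=6 heading is dropped here; the Ind2=0 and Ind2=7 headings survive, which is what the follow-up indicator count should confirm.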
29
MARCEditor: Delete specific fields 650 Ind2= 5/6/8 (non-LC) => Step 3.V
30
Regular expressions (RegEx)
• Finding/Editing patterns in strings (letters/numbers)
– Like learning another language
• Parentheses are used to group data
– Forces the computer to "store" data in "chunks"
– Data “chunks” are numbered for recall/retrieval/use
– Helps the programmer "read" the pattern
• Optional functionality, and not necessary
• Some punctuation is "reserved" (has a special meaning)
• BLPC uses consistent format for RegEx patterns
31
Reading RegEx Patterns
650 Ind2= 5/6/8 (non-LC)
Pattern: (=650  )(.[568])(\$a)(.+)
(=650  ) look for 650 fields with two blank spaces
(.[568]) look for any Ind1 & listed Ind2 numbers
(\$a) look for subfield $a (used as "anchor chunk")
(.+) one or more of any character, to the end of the field
Use Edit/FindAll(RegEx) to verify pattern
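The same reading can be checked by letting a regex engine report each "chunk". This is a Python sketch, not MarcEdit's FindAll; it assumes a mnemonic-format line in which a blank first indicator appears as `\`, and the sample heading is made up.

```python
import re

# Same pattern as on the slide; the tag is followed by two spaces.
PATTERN = re.compile(r"(=650  )(.[568])(\$a)(.+)")

line = r"=650  \6$aBibliotheques$xAutomatisation."
match = PATTERN.match(line)

# groups() returns the numbered chunks in order: chunk 1, 2, 3, 4.
chunks = match.groups()

# An LC heading (Ind2 = 0) is not in the [568] list, so it does not match.
lc_match = PATTERN.match(r"=650  \0$aLibraries$xAutomation.")
```

Here `chunks[1]` holds the two indicator characters and `chunks[3]` holds everything after `$a`, which mirrors the line-by-line reading of the pattern.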
32
Interpreting RegEx punctuation
Pattern: (=650  )(.[568])(\$a)(.+)
( ) Parentheses for data “chunks”
. Period for any single character (letter, number, or punctuation)
[ ] Square brackets for a list using “OR”
\ Backslash before “reserved” punctuation
esp.: $ \ ( ) [ ]
+ Plus sign for one or more of the preceding item
“Chunks” are stored as: $1$2$3$4
33
Creating RegEx patterns
• Start with known pattern:
For non-LC Subjects: (=650  )(.[568])(\$a)(.+)
• FindAll(RegEx) for “local” Subjects (Ind2 = 4/7)
(=650  )(.[47])(\$a)(.+)
• FindAll(RegEx) for “local” Genres (Ind2 = 4/7)
(=655  )(.[47])(\$a)(.+)
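All three patterns differ only in the tag and the list of second indicators, so a new one can be derived by swapping those two parts. A small Python sketch of that idea follows; the helper name `field_pattern` is hypothetical and is not part of MarcEdit or the BLPC procedures.

```python
import re

def field_pattern(tag, ind2_values):
    """Build a BLPC-style pattern for a tag and a list of Ind2 values.

    tag is a three-character field tag such as "650"; ind2_values is a
    string of allowed second indicators such as "568".
    """
    return r"(={}  )(.[{}])(\$a)(.+)".format(tag, ind2_values)

non_lc_subjects = field_pattern("650", "568")
local_subjects = field_pattern("650", "47")
local_genres = field_pattern("655", "47")

# Quick check against a made-up local genre heading.
sample = r"=655  \7$aElectronic books.$2local"
is_local_genre = re.match(local_genres, sample) is not None
```

The strings produced this way can be pasted into a find window exactly like the hand-written versions.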
34
Editing with RegEx string pattern
650 BISAC subjects => 690
• Start with known pattern: (=650  )(.[568])(\$a)(.+)
• Use Edit/Replace(RegEx): Change 650 to 690
• Identify “BISAC” subjects: Ind2=7 & $2 = bisacsh
• Determine which “chunks” change/stay the same
Find(RegEx): (=650  )(.[7])(\$a)(.+)(\$2bisacsh)
Replace(RegEx): (=690  )$2$3$4$5
35
Reading RegEx Patterns
650 BISAC subjects => 690
Pattern: (=650  )(.[7])(\$a)(.+)(\$2bisacsh)
(=650  ) look for 650 fields with two blank spaces
(.[7]) look for any Ind1 & Ind2 =7
(\$a) look for subfield $a (optional “anchor” text)
(.+) any letter/number to the next “chunk”
(\$2bisacsh) look for subfield & data at end of field
Can be shortened (which makes the pattern look complicated):
Find(RegEx): (=650)(.+\$2bisacsh)
Replace(RegEx): (=690)$2
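The 650-to-690 edit can be rehearsed on sample lines before running it on a real file. This Python sketch is an approximation of the Replace(RegEx) step, not MarcEdit itself: Python writes group references as `\g<2>` rather than `$2`, and the new tag is written as plain replacement text. The sample headings and the name `retag_bisac` are hypothetical.

```python
import re

# Find pattern from the slide, with two spaces after the tag.
FIND = re.compile(r"(=650  )(.[7])(\$a)(.+)(\$2bisacsh)")

def retag_bisac(line):
    """Change a BISAC 650 (Ind2 = 7, $2 bisacsh) into a 690; leave other lines alone."""
    # Chunk 1 (the old tag) is replaced; chunks 2-5 are carried over unchanged.
    return FIND.sub(r"=690  \g<2>\g<3>\g<4>\g<5>", line)

bisac = retag_bisac(r"=650  \7$aCOMPUTERS / General.$2bisacsh")
lc = retag_bisac(r"=650  \0$aComputers.")
fast = retag_bisac(r"=650  \7$aComputers.$2fast")
```

Only the heading coded `$2bisacsh` is retagged; the LC heading and the Ind2=7 heading from another source pass through untouched.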
36
MARCEditor: FindAll(RegEx) Testing the pattern: 650 BISAC subjects
37
MARCEditor: Replace(RegEx) 650 BISAC subjects => 690
38
BLPC Step 5: 949 processing
Required processing
Policy: Include Class# in Unicorn Item record
949
$a -- Pull the call# from the 050
$a -- Insert the standard phrase: ' INTERNET'
$v -- Pull the 001/OCLC# as a unique no.
$w $h $t $x $z -- Add standard holdings data
• See Addt'l instruct.
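How the $a and $v pieces fit together can be sketched in Python. This is an illustration under several assumptions, not the BLPC procedure: it reads mnemonic-format lines, takes only 050 $a as the class number (whether $b should follow is not stated on the slide), uses blank indicators on the 949, and leaves out $w $h $t $x $z because their standard values are not given here. The name `build_949` and the sample record are hypothetical.

```python
import re

def build_949(record_lines):
    """Build a 949 line from one record; return None if 001 or 050 $a is missing."""
    control_no = None
    class_no = None
    for line in record_lines:
        if line.startswith("=001  "):
            control_no = line[6:].strip()
        elif line.startswith("=050  "):
            found = re.search(r"\$a([^$]+)", line)  # text of $a up to the next subfield
            if found:
                class_no = found.group(1).strip()
    if control_no is None or class_no is None:
        return None
    # "\\\\" is two literal backslashes: blank Ind1 and Ind2 (assumed).
    return "=949  \\\\$a{} INTERNET$v{}".format(class_no, control_no)

# Made-up record, for illustration only.
record = [
    "=LDR  00000nam  2200000Ia 4500",
    "=001  ocn123456789",
    r"=050  \4$aZ678.9$b.A1",
    r"=245  10$aSample title$h[electronic resource]",
]
holdings = build_949(record)
no_class = build_949(["=001  ocn123456789"])
```

A record that returns `None` is one that would end up on the post-load list of 949 fields lacking a call number.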
39
Batch-loading
• MARCEdit with files no larger than 10k records
– MARCEdit/Tool MARCSplit
• MARCEditor/File: Compile File into MARC
• Unicorn batch load rpt uses 001 match point
– 'o' for OCLC# o & 'g' for local vendor key
• Unicorn batch load rpt settings
– create new bibliographic records only
• Date cataloged -- back dated to prev. month
– prevents interference w/scheduled Authority reports
– max. load two files a day
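The file-size and files-per-day limits make the loading schedule easy to estimate. The Python sketch below only illustrates that arithmetic and is not how MARCSplit works internally; it uses the 65k-record initial load mentioned earlier in the deck, and the function names are hypothetical.

```python
import math

MAX_RECORDS_PER_FILE = 10_000  # from the slide: files no larger than 10k records
MAX_FILES_PER_DAY = 2          # from the slide: max. load two files a day

def split_records(records, max_per_file=MAX_RECORDS_PER_FILE):
    """Split a list of records into consecutive files of at most max_per_file."""
    return [records[i:i + max_per_file]
            for i in range(0, len(records), max_per_file)]

def days_to_load(file_count, files_per_day=MAX_FILES_PER_DAY):
    """Number of days needed to load file_count files at the daily limit."""
    return math.ceil(file_count / files_per_day)

# Placeholder records standing in for a 65k-title initial load.
records = ["record {}".format(n) for n in range(65_000)]
files = split_records(records)
load_days = days_to_load(len(files))
```

A 65k load therefore becomes seven files, the last holding 5,000 records, and takes four days at two files a day.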
40
Identifying records for Cleanup
Checklist finds problems to correct post-load
• Item maintenance projects
– 949 lacks call#
• Bibliographic record maintenance projects
– 245 lacks $h (if more than 5-12 records)
– URLs lacking
• Record reload/overlay project
– Record already in OPAC (P-N duplicates)
41
MARCEdit Tools: Select/Extract selected records
Step 3.F: 245 lacks $h
42
MARCEdit Tools: Export Tab Delimited records
43
Help!
• MarcEdit Help
http://people.oregonstate.edu/~reeset/marcedit/html/help.html
– Click thru the Contents menu:
Contents / Using MARCEdit / Using the MARCEditor / Editing Functions / Using Regular Expressions.
• RegularExpressions.info http://www.regular-expressions.info/
• MARCEDIT-L list http://metis3.gmu.edu/cgi-bin/wa?A0=MARCEDIT-L
• BATCH list http://listserv.vt.edu/cgi-bin/wa?A0=batch
44
Amelia C. VanGundy
The University of Virginia's College at Wise
John Cook Wyllie Library
276-328-0154
acv6d@uvawise.edu
http://people.uvawise.edu/acv6d/
Virginia SirsiDynix Library Users Group Meeting
Nov. 14, 2012
45
BLPC Project
Presentation revisions
Originally presented Nov. 14, 2012
• Additional Slides:
– BLPC Project web-page
– MARCEditor: FindAll(RegEx)
– MARCEdit Tools: Export Tab Delimited records
– BLPC Project: Presentation revisions