CREATING A DATABASE OF
SHIP CITATIONS: THE CHALLENGES ENCOUNTERED
IN SHIPINDEX.ORG
The Charleston Conference, 3 Nov 2010
Peter McCracken
Co-Founder & Director of Content
and Business Development, ShipIndex.org
What kinds of ships are these?
Bark (or barque); Ship; Brigantine; Barquentine; Topsail Schooner; Schooner
Serials :: Ships
Publication pattern (or format?) :: Vessel type
Serial title :: Ship name
ISSN :: IMO
Ship research :: Any other historical research
Ships :: Other historical research
Problems with ships are the same as problems
with personal names, geographic descriptors,
etc.
Can also apply to concepts, as well as things
Also ‘non-unique’ items, like a car model
Data challenges – personal names
Innumerable works by “Anonymous”
Names are often shortened
Pablo Picasso’s full name was Pablo Diego José
Francisco de Paula Juan Nepomuceno María de
los Remedios Cipriano de la Santísima Trinidad
Ruiz y Picasso
Names have strange limitations
Some must be unique – Consider Michael J. Fox
Some are very common – Consider Adam Smith
Data challenges – geographic names
Numerous variations: Köln; Cologne; Keulen;
Colonia; Colònia; Kolín nad Rýnem; Cwlen;
Κολωνία; Kolonjo; كولونيا; Кьолн; Ķelne; Кёльн
Name changes
Hot Springs, NM -> Truth or Consequences, NM
Halfway, OR -> Half.com, OR
Clark, TX -> DISH, TX
St. Petersburg -> Petrograd -> Leningrad ->
St. Petersburg (“Petersburg,” or “Piter”)
A “meaning-less” identifier
Regardless of the topic, some meaning-less
identifier can provide significant assistance
“Meaning-less” in the sense of a one-to-many
relationship between the identifier and the
data
The identifier doesn’t change, but the data can
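The idea on this slide can be sketched in a few lines of Python. This is only an illustration, not anything ShipIndex.org is known to run: the `Entity` class, the `registry` dictionary, and the identifiers `P-000001` and `P-000002` are all hypothetical. It shows one opaque identifier staying fixed while the names attached to it accumulate, and one name pointing at more than one identifier.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One real-world thing, keyed by an identifier that carries no meaning."""
    identifier: str
    names: list[str] = field(default_factory=list)


# Hypothetical in-memory registry: identifier -> entity.
registry: dict[str, Entity] = {}


def add_name(identifier: str, name: str) -> None:
    """Attach another name to an identifier; the identifier itself never changes."""
    entity = registry.setdefault(identifier, Entity(identifier))
    if name not in entity.names:
        entity.names.append(name)


def find_by_name(name: str) -> list[str]:
    """Return every identifier that has carried this name."""
    return [e.identifier for e in registry.values() if name in e.names]


# One city, several names over time (identifier is made up for illustration).
for city_name in ["St. Petersburg", "Petrograd", "Leningrad", "Piter"]:
    add_name("P-000001", city_name)

# A different place that shares a name (St. Petersburg, Florida).
add_name("P-000002", "St. Petersburg")

print(find_by_name("Leningrad"))
print(find_by_name("St. Petersburg"))
```

A search on "Leningrad" leads to a single identifier, while "St. Petersburg" leads to two, which is exactly the ambiguity the identifier is meant to resolve.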
Overview of ShipIndex.org
A database of citations –
>1.42 million citations, from >200 resources
>140,000 citations are freely available
Changes how one does maritime research
Far more content can be researched more quickly
Opens up maritime research to everyone
No need for inside knowledge on where to start searching
Uncovers many hidden resources
Locates free, but hidden, web resources
Maritime access points
Vessel name
Vessel number
IMO numbers are new; hull numbers change
Captain name
Captains change between voyages, and die during them
Rig or vessel type
Ships are rebuilt; definitions change; “ship” is both a generic term and a specific rig
ALSO: Port of registration; crew members; others
Vessel names – this is easy!
“What does the stern say?”
1872, American Lloyd’s Register of American and Foreign
Shipping
1867, American Lloyd’s Register of American and Foreign
Shipping
Sources of errors – primary sources
Mistakes in primary sources are very common,
and forgivable
Digitized version of Lloyd’s List of 1812
Ships called “Adolph & Fredericka”
Sources of errors – transcribers, indexers, OCR operators, etc.
Transcription errors are very easy to make –
whether through incorrect assumptions, or
just mistakes
“Earnets” for “Earnest”; “Elizaneth” for
“Elizabeth”, etc.
Some files are much tougher to manage than
others
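One possible way to catch slips like the ones above is to compare each transcribed name against a list of names already known to the index. The sketch below uses `difflib.get_close_matches` from the Python standard library. The list `KNOWN_NAMES`, the helper `suggest_names`, and the 0.8 cutoff are assumptions chosen for illustration, not a description of how ShipIndex.org cleans its data.

```python
from difflib import get_close_matches

# Hypothetical authority list of names already present in the index.
KNOWN_NAMES = ["Earnest", "Elizabeth", "Mary", "Maria", "Union"]


def suggest_names(transcribed: str, known=KNOWN_NAMES, cutoff: float = 0.8):
    """Return up to three known names that closely resemble the transcribed one.

    The cutoff is a similarity ratio between 0 and 1; 0.8 is an arbitrary
    starting point and would need tuning against real data.
    """
    return get_close_matches(transcribed, known, n=3, cutoff=cutoff)


print(suggest_names("Earnets"))
print(suggest_names("Elizaneth"))
```

Suggestions like these are best treated as prompts for a human to review, since two genuinely different ships can also have very similar names.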
More challenges
How do we locate Elizabeth? Or Mary?
Elizabeth = 1899 citations
Mary = 2614 citations
Top ten ship names, for no good reason: Mary, Maria, Elizabeth, Anna, Union, Victoria, Hope, Flora, Emma, America
Try to limit result sets?
by time period
by vessel rig (maybe?)
by location(?)
by nationality
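The limits listed above could be sketched as optional filters over a list of citations. Everything here is hypothetical: the `Citation` fields, the `narrow` function, and the sample rows are invented for illustration. One design choice is assumed and worth questioning: a citation with no recorded year, rig, or nationality is kept rather than dropped, because many sources simply do not record those details.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Citation:
    ship_name: str
    source: str
    year: Optional[int] = None
    rig: Optional[str] = None
    nationality: Optional[str] = None


def _matches(value, wanted) -> bool:
    # No filter requested, or value unknown in the source: keep the citation.
    return wanted is None or value is None or value == wanted


def narrow(citations, ship_name: str,
           years: Optional[Tuple[int, int]] = None,
           rig: Optional[str] = None,
           nationality: Optional[str] = None):
    """Return citations for ship_name that are not ruled out by the filters."""
    results = []
    for c in citations:
        if c.ship_name != ship_name:
            continue
        if years is not None and c.year is not None:
            if not (years[0] <= c.year <= years[1]):
                continue
        if not _matches(c.rig, rig) or not _matches(c.nationality, nationality):
            continue
        results.append(c)
    return results


# Invented sample data, not real citations.
SAMPLE = [
    Citation("Mary", "Register A", 1867, "bark", "US"),
    Citation("Mary", "Register B", 1872, "schooner", "US"),
    Citation("Mary", "Newspaper index", 1812, None, "GB"),
    Citation("Mary", "Undated card file"),
    Citation("Elizabeth", "Register A", 1867, "bark", "US"),
]

hits = narrow(SAMPLE, "Mary", years=(1860, 1880), rig="bark")
print([c.source for c in hits])
```

Filtering narrows the list but cannot say which "Mary" is which; that still needs a unique identifier.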
Changing vessel names
What do we do when a vessel changes its
name?
A person researching a vessel wants to know the
life of a ship; at present they need to know its
previous or subsequent names
This can only be done when we have unique
vessel identifiers – otherwise, how do you know
which Elizabeth became Hogwarts Belle?
Existing vessel identifiers
Hull Identification Number – Only US; any powered boat
USCG Documentation Number – Only US; >5 net tons
IMO Number – Assigned by Lloyd’s/Fairplay; international; passenger ships >100 gross tons, and cargo ships >300 gross tons; mandatory from 1996
Naval Identifiers – e.g., PT-109, CV-42, BB-18, DD-793, D118, etc.
Lloyd’s numbers, and many more…
Unique historical vessel identifiers
Need an easy way to differentiate between
“Mary,” “Mary,” and “Mary”
Needs to be unique and unchanging (unlike
name, naval identifier, etc.)
Identifier itself has no meaning – no
indication within it of size, nationality, etc.
Identifier is quickly & automatically assigned
Identification is coordinated with multiple
organizations
Creating an identifier
Could be done through a standards-creation
process, via NISO or another organization
Or informally, with publicly-defined
guidelines, such as (just as examples):
Nine-digit number; ddd-ddddd-c (c=check digit)
Allow individuals to easily request identifiers for
their vessels or their citations
Need ability to easily combine/split/modify
User-managed is likely most cost-effective solution
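To make the example format concrete, here is a minimal sketch of generating and validating a `ddd-ddddd-c` identifier. The slide does not say how the check digit would be computed; the Luhn algorithm is assumed here only because it is widely used and catches most single-digit typing errors. The function names are hypothetical.

```python
import re

_PATTERN = re.compile(r"^\d{3}-\d{5}-\d$")


def luhn_check_digit(payload: str) -> int:
    """Compute a Luhn check digit for a string of digits (assumed scheme)."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:          # double every second digit, starting at the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def format_identifier(serial: int) -> str:
    """Turn a sequence number into ddd-ddddd-c; the number itself has no meaning."""
    if not 0 <= serial < 10**8:
        raise ValueError("serial must fit in eight digits")
    payload = f"{serial:08d}"
    return f"{payload[:3]}-{payload[3:]}-{luhn_check_digit(payload)}"


def is_valid(identifier: str) -> bool:
    """Check both the ddd-ddddd-c layout and the check digit."""
    if not _PATTERN.match(identifier):
        return False
    digits = identifier.replace("-", "")
    return luhn_check_digit(digits[:8]) == int(digits[8])


print(format_identifier(1))
print(is_valid("000-00001-8"), is_valid("000-00010-8"))
```

Because identifiers are just sequence numbers plus a check digit, they can be assigned quickly and automatically, and nothing about the vessel's size or nationality leaks into them.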
Creating an identifier
Must have buy-in from many groups
Should be easy to implement
Should be easy to use; available to many
individuals and resources
Pre-populate as much as possible, open
editing to all
Maintain advisory group to address concerns,
disagreements, etc.
Defining <ShipIdentifier>
<OtherIdentifiers>
<IdentifierType>
<IdentifierNumber>
<ShipName>
<DateNameStartedInUse>
<DateNameEndedInUse>
<PreviousShipName>
<SubsequentShipName>
<RigType> - defined list of types, & “other”
<VoyageIdentifier> - multiple
More <ShipIdentifier>
<MilitaryUsage?> - yes/no/unclear
<Nationality>
<ServiceBranch>
<HullIdentifier>
<VesselMeasurements>
<MeasurementType> - list of options
<MeasurementValue>
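The elements on these two slides could be modeled roughly as follows. This is a sketch under several assumptions: the Python class and field names are adaptations of the slide's element names, dates are reduced to years, and the identifier, years, and tonnage in the example are invented (the ship names come from the earlier "Hogwarts Belle" example).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NamePeriod:
    ship_name: str
    started: Optional[int] = None   # DateNameStartedInUse, simplified to a year
    ended: Optional[int] = None     # DateNameEndedInUse, None if still in use


@dataclass
class ShipRecord:
    ship_identifier: str
    names: list[NamePeriod] = field(default_factory=list)
    other_identifiers: dict[str, str] = field(default_factory=dict)  # type -> number
    rig_type: str = "other"
    military_usage: str = "unclear"          # yes / no / unclear
    nationality: Optional[str] = None
    measurements: dict[str, float] = field(default_factory=dict)     # type -> value

    def name_in(self, year: int) -> Optional[str]:
        """Return the name the vessel carried in a given year, if known."""
        for period in self.names:
            after_start = period.started is None or period.started <= year
            before_end = period.ended is None or year <= period.ended
            if after_start and before_end:
                return period.ship_name
        return None

    def all_names(self) -> list[str]:
        """Previous and subsequent names fall out of the ordered name list."""
        return [p.ship_name for p in self.names]


# Invented example record.
ship = ShipRecord(
    ship_identifier="000-00001-8",
    names=[NamePeriod("Elizabeth", 1850, 1871),
           NamePeriod("Hogwarts Belle", 1872, None)],
    rig_type="bark",
    measurements={"net tons": 412.0},
)

print(ship.name_in(1860), "|", ship.name_in(1880))
```

Storing names as dated periods under one identifier means `<PreviousShipName>` and `<SubsequentShipName>` need not be stored separately in this sketch; they can be read from the list order.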
Defining <VoyageIdentifier>
<ShipIdentifier>
<Captain>
<Crew> - multiple positions, multiple names
<CrewPosition>
<CrewmemberName>
<OtherVoyageIdentifiers>
<OtherVoyageDatabase>
<OtherVoyageDbId>
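A voyage record that points back to a ship identifier might look like the sketch below. The class names, the `captains_of` helper, and all sample people, identifiers, and database names are hypothetical. The point is only that voyages link to the unchanging ship identifier rather than to a ship name or a captain, both of which change.

```python
from dataclasses import dataclass, field


@dataclass
class CrewMember:
    position: str   # CrewPosition
    name: str       # CrewmemberName


@dataclass
class Voyage:
    voyage_identifier: str
    ship_identifier: str                     # link back to the ship record
    captain: str
    crew: list[CrewMember] = field(default_factory=list)
    other_voyage_ids: dict[str, str] = field(default_factory=dict)  # database -> id


def voyages_for_ship(voyages, ship_identifier: str):
    """All voyages of one vessel, whatever name it sailed under at the time."""
    return [v for v in voyages if v.ship_identifier == ship_identifier]


def captains_of(voyages, ship_identifier: str):
    """Distinct captains in voyage order; captains change between voyages."""
    seen = []
    for v in voyages_for_ship(voyages, ship_identifier):
        if v.captain not in seen:
            seen.append(v.captain)
    return seen


# Invented sample voyages.
VOYAGES = [
    Voyage("V-1", "000-00001-8", "A. Example",
           crew=[CrewMember("mate", "B. Sample")]),
    Voyage("V-2", "000-00001-8", "C. Placeholder",
           other_voyage_ids={"Hypothetical Voyages DB": "12345"}),
    Voyage("V-3", "000-00002-6", "A. Example"),
]

print(captains_of(VOYAGES, "000-00001-8"))
```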
Expanding to other fields
Makes discovery more manageable
Makes linking possible
Use the same concept for other areas of
research, linking everything together
People
Places
Manufactured items
Artwork
Everything