CREATING A DATABASE OF SHIP CITATIONS: THE CHALLENGES ENCOUNTERED IN SHIPINDEX.ORG

The Charleston Conference, 3 Nov 2010

Peter McCracken, Co-Founder & Director of Content and Business Development, ShipIndex.org



Page 2: Creating A Database of Ship Citations

What kinds of ships are these?

Bark (or barque); Ship; Brigantine; Barquentine; Topsail Schooner; Schooner

Page 3: Creating A Database of Ship Citations

Serials :: Ships

Publication pattern (or format?) :: Vessel type

Serial title :: Ship name

ISSN :: IMO

Ship research :: Any other historical research

Page 4: Creating A Database of Ship Citations

Ships :: Other historical research

Problems with ships are the same as problems with personal names, geographic descriptors, etc.

Can also apply to concepts, as well as things

Also ‘non-unique’ items, like a car model

Page 5: Creating A Database of Ship Citations

Data challenges – personal names

Innumerable works by “Anonymous”

Names are often shortened

Pablo Picasso’s full name was Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso

Names have strange limitations

Some must be unique – Consider Michael J. Fox

Some are very common – Consider Adam Smith

Page 6: Creating A Database of Ship Citations

Data challenges – geographic names

Numerous variations: Köln; Cologne; Keulen; Colonia; Colònia; Kolín nad Rýnem; Cwlen; Κολωνία; Kolonjo; كولونيا; Кьолн; Ķelne; Кёльн

Name changes

Hot Springs, NM -> Truth or Consequences, NM

Halfway, OR -> Half.com, OR

Clark, TX -> DISH, TX

St. Petersburg -> Petrograd -> Leningrad -> St. Petersburg (“Petersburg,” or “Piter”)

Page 7: Creating A Database of Ship Citations

A “meaning-less” identifier

Regardless of the topic, some meaning-less identifier can provide significant assistance

“Meaning-less” in the sense of a one-to-many relationship between the identifier and the data

The identifier doesn’t change, but the data can

Page 8: Creating A Database of Ship Citations

Overview of ShipIndex.org

A database of citations –

>1.42 million citations, from >200 resources

>140,000 citations are freely available

Changes how one does maritime research

Far more content can be researched more quickly

Opens up maritime research to everyone

No need for inside knowledge on where to start searching

Uncovers many hidden resources

Locates free, but hidden, web resources

Page 9: Creating A Database of Ship Citations

Maritime access points

Vessel name

Vessel number

IMO numbers are new; hull numbers change

Captain name

They change between voyages, and die during them

Rig or vessel type

Ships are rebuilt; definitions change; “ship”

ALSO: Port of registration; crew members; others

Page 12: Creating A Database of Ship Citations

Sources of errors – transcribers, indexers, OCR operators, etc.

Transcription errors are very easy to make – whether through incorrect assumptions, or just mistakes

“Earnets” for “Earnest”; “Elizaneth” for “Elizabeth”, etc.

Some files are much tougher to manage than others
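
One way such slips could be flagged for review is approximate string matching against names already in the database. The sketch below uses Python's standard `difflib`; the `known_names` list, the `suggest_corrections` helper, and the 0.8 cutoff are illustrative assumptions, not a description of how ShipIndex.org handles errors.

```python
from difflib import get_close_matches

# Hypothetical authority list of vessel names already in the database.
known_names = ["Earnest", "Elizabeth", "Mary", "Maria"]


def suggest_corrections(transcribed: str, cutoff: float = 0.8) -> list:
    """Return known names that closely resemble a possibly mistyped name."""
    return get_close_matches(transcribed, known_names, n=3, cutoff=cutoff)


print(suggest_corrections("Earnets"))    # ['Earnest']
print(suggest_corrections("Elizaneth"))  # ['Elizabeth']
```

A suggestion like this is only a prompt for a human check: two genuinely different vessels can have names one letter apart.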

Page 13: Creating A Database of Ship Citations

More challenges

How do we locate Elizabeth? Or Mary?

Elizabeth = 1899 citations

Mary = 2614 citations

Top ten ship names, for no good reason: Mary, Maria, Elizabeth, Anna, Union, Victoria, Hope, Flora, Emma, America

Try to limit result sets?

by time period

by vessel rig (maybe?)

by location(?)

by nationality
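
A minimal sketch of that kind of narrowing, assuming each citation carries optional year, rig, and nationality fields. The `Citation` class, the `narrow` function, and the sample rows are invented for illustration. Citations that lack a field are kept rather than dropped, since many sources omit these details.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    ship_name: str
    year: Optional[int] = None          # None when the source gives no date
    rig: Optional[str] = None
    nationality: Optional[str] = None


def narrow(citations, name, year_range=None, rig=None, nationality=None):
    """Keep citations for `name` that do not contradict the given limits."""
    results = []
    for c in citations:
        if c.ship_name != name:
            continue
        if year_range and c.year is not None and not (year_range[0] <= c.year <= year_range[1]):
            continue
        if rig and c.rig is not None and c.rig != rig:
            continue
        if nationality and c.nationality is not None and c.nationality != nationality:
            continue
        results.append(c)
    return results


# Invented sample data, for illustration only.
citations = [
    Citation("Mary", 1805, "brig", "British"),
    Citation("Mary", 1890, "schooner", "American"),
    Citation("Mary"),                   # no date, rig, or nationality recorded
    Citation("Maria", 1805),
]
early_marys = narrow(citations, "Mary", year_range=(1800, 1820))
```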

Page 14: Creating A Database of Ship Citations

Changing vessel names

What do we do when a vessel changes its name?

A person researching a vessel wants to know the life of a ship; at present they need to know its previous or subsequent names

This can only be done when we have unique vessel identifiers – otherwise, how do you know which Elizabeth became Hogwarts Belle?
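
To make the point concrete, here is a small sketch in which each period of name use is tied to an opaque vessel identifier. The identifiers `V1` and `V2`, the years, and the helper names are all invented. With the identifier in place, the question of which Elizabeth became Hogwarts Belle has a definite answer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NameSpan:
    vessel_id: str              # opaque identifier (hypothetical values below)
    name: str
    start: int                  # first year the name was in use
    end: Optional[int] = None   # None means no end date is recorded


# Invented sample data: two different vessels named Elizabeth.
spans = [
    NameSpan("V1", "Elizabeth", 1850, 1872),
    NameSpan("V1", "Hogwarts Belle", 1872),
    NameSpan("V2", "Elizabeth", 1861, 1890),
]


def vessels_named(name):
    """Identifiers of every vessel that ever carried `name`."""
    return sorted({s.vessel_id for s in spans if s.name == name})


def all_names(vessel_id):
    """Every name one vessel carried, in chronological order."""
    ordered = sorted(spans, key=lambda s: s.start)
    return [s.name for s in ordered if s.vessel_id == vessel_id]


print(vessels_named("Elizabeth"))   # ['V1', 'V2']
print(all_names("V1"))              # ['Elizabeth', 'Hogwarts Belle']
```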

Page 15: Creating A Database of Ship Citations

Existing vessel identifiers

Hull Identification Number – Only US; any powered boat

USCG Documentation Number – Only US; >5 net tons

IMO Number – Assigned by Lloyd’s/Fairplay; international; passenger ships >100 gross tons, and cargo ships >300 gross tons; mandatory from 1996

Naval Identifiers – eg, PT-109, CV-42, BB-18, DD-793, D118, etc.

Lloyd’s numbers, and many more…

Page 16: Creating A Database of Ship Citations

Unique historical vessel identifiers

Need an easy way to differentiate between “Mary,” “Mary,” and “Mary”

Needs to be unique and unchanging (unlike name, naval identifier, etc.)

Identifier itself has no meaning – no indication within it of size, nationality, etc.

Identifier is quickly & automatically assigned

Identification is coordinated with multiple organizations

Page 17: Creating A Database of Ship Citations

Creating an identifier

Could be done through a standards-creation process, via NISO or another organization

Or informally, with publicly-defined guidelines, such as (just as examples):

Nine-digit number; ddd-ddddd-c (c=check digit)

Allow individuals to easily request identifiers for their vessels or their citations

Need ability to easily combine/split/modify

User-managed is likely most cost-effective solution
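
The slide gives only the shape ddd-ddddd-c and does not say how the check digit would be computed. As one possible illustration, the sketch below uses a weighted sum modulo 10, similar in spirit to the check digit in IMO numbers. The weights, function names, and sequential assignment are assumptions, not part of any proposal.

```python
import re

WEIGHTS = range(9, 1, -1)  # 9, 8, ..., 2: one weight per payload digit (assumed scheme)


def check_digit(payload: str) -> int:
    """Weighted sum of the eight payload digits, modulo 10."""
    if not re.fullmatch(r"\d{8}", payload):
        raise ValueError("payload must be exactly eight digits")
    return sum(int(d) * w for d, w in zip(payload, WEIGHTS)) % 10


def format_identifier(sequence: int) -> str:
    """Render a sequence number in the ddd-ddddd-c pattern."""
    payload = f"{sequence:08d}"
    return f"{payload[:3]}-{payload[3:]}-{check_digit(payload)}"


def is_valid(identifier: str) -> bool:
    """True when the identifier has the right shape and a matching check digit."""
    m = re.fullmatch(r"(\d{3})-(\d{5})-(\d)", identifier)
    return bool(m) and check_digit(m.group(1) + m.group(2)) == int(m.group(3))


print(format_identifier(12345678))   # 123-45678-6
print(is_valid("123-45687-6"))       # False: two digits were transposed
```

Because the payload is just a counter, the identifier stays free of meaning, and the check digit catches many single-digit and transposition errors at entry time.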

Page 18: Creating A Database of Ship Citations

Creating an identifier

Must have buy-in from many groups

Should be easy to implement

Should be easy to use; available to many individuals and resources

Pre-populate as much as possible, open editing to all

Maintain advisory group to address concerns, disagreements, etc.

Page 19: Creating A Database of Ship Citations

Defining <ShipIdentifier>

<OtherIdentifiers>

<IdentifierType>

<IdentifierNumber>

<ShipName>

<DateNameStartedInUse>

<DateNameEndedInUse>

<PreviousShipName>

<SubsequentShipName>

<RigType> - defined list of types, & “other”

<VoyageIdentifier> - multiple

Page 20: Creating A Database of Ship Citations

More <ShipIdentifier>

<MilitaryUsage?> - yes/no/unclear

<Nationality>

<ServiceBranch>

<HullIdentifier>

<VesselMeasurements>

<MeasurementType> - list of options

<MeasurementValue>
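
As a rough illustration of how such a record might be serialized, the sketch below builds a `<ShipIdentifier>` element with Python's standard `xml.etree.ElementTree`. The element names come from the slides; the nesting, the `id` attribute, the `ship_record` helper, and all sample values are assumptions, since the slides list the fields without showing a full document.

```python
import xml.etree.ElementTree as ET


def ship_record(identifier, name, started, ended=None, rig="other", measurements=()):
    """Build a <ShipIdentifier> element; the nesting is an assumed reading."""
    ship = ET.Element("ShipIdentifier", {"id": identifier})
    ET.SubElement(ship, "ShipName").text = name
    ET.SubElement(ship, "DateNameStartedInUse").text = started
    if ended:
        ET.SubElement(ship, "DateNameEndedInUse").text = ended
    ET.SubElement(ship, "RigType").text = rig
    block = ET.SubElement(ship, "VesselMeasurements")
    for kind, value in measurements:
        ET.SubElement(block, "MeasurementType").text = kind
        ET.SubElement(block, "MeasurementValue").text = str(value)
    return ship


# Made-up values; the identifier only follows the ddd-ddddd-c pattern.
record = ship_record("123-45678-6", "Elizabeth", "1850", "1872", rig="bark",
                     measurements=[("length_ft", 112)])
print(ET.tostring(record, encoding="unicode"))
```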

Page 21: Creating A Database of Ship Citations

Defining <VoyageIdentifier>

<ShipIdentifier>

<Captain>

<Crew> - multiple positions, multiple names

<CrewPosition>

<CrewmemberName>

<OtherVoyageIdentifiers>

<OtherVoyageDatabase>

<OtherVoyageDbId>
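
The voyage fields could be modeled along these lines. This is a sketch only: the class layout, the `voyage_id` field, the helper function, and the sample voyages are hypothetical, and the comments note which slide element each attribute is meant to mirror.

```python
from dataclasses import dataclass, field


@dataclass
class CrewMember:
    position: str   # <CrewPosition>
    name: str       # <CrewmemberName>


@dataclass
class Voyage:
    voyage_id: str                                  # the voyage's own identifier (assumed)
    ship_id: str                                    # <ShipIdentifier> of the vessel
    captain: str                                    # <Captain>
    crew: list = field(default_factory=list)        # <Crew>: CrewMember entries
    other_ids: dict = field(default_factory=dict)   # <OtherVoyageDatabase> -> <OtherVoyageDbId>


def captains_of(voyages, ship_id):
    """Distinct captains recorded across all voyages of one vessel."""
    return sorted({v.captain for v in voyages if v.ship_id == ship_id})


# Invented sample data, for illustration only.
voyages = [
    Voyage("VOY-1", "V1", "A. Smith", [CrewMember("mate", "J. Doe")]),
    Voyage("VOY-2", "V1", "B. Jones", other_ids={"ExampleVoyageDb": "4411"}),
    Voyage("VOY-3", "V2", "A. Smith"),
]
print(captains_of(voyages, "V1"))   # ['A. Smith', 'B. Jones']
```

Keeping the captain on the voyage rather than on the ship reflects the earlier point that captains change between voyages.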

Page 22: Creating A Database of Ship Citations

Expanding to other fields

Makes discovery more manageable

Makes linking possible

Use the same concept for other areas of research, linking everything together

People

Places

Manufactured items

Artwork

Everything

Page 23: Creating A Database of Ship Citations

Thoughts, questions, more?

Thank you –

Peter McCracken

[email protected]