
Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)


Wikimedia/British Library map mapping project

– review and latest update

How to find 50,000 maps in a haystack of 1,000,000 images; geolocate them, and categorise them

... on a budget of no not many euros.

James Heald, Wikimedia volunteer

(User:Jheald)

Kimberly Kowal, British Library

[email protected]

1,000,000 images

Fantastic, but …

Very limited metadata

Commons said no bulk upload

Volunteer response…

Create a subject index by book…

… encouraging images to be uploaded by the book (20,000 so far – majority by one user)

… however, manual categorisation of images is very, very time-consuming.

Could anything be done more automatically…?

Maps: natural classification, given co-ordinates

So: find the maps on Flickr, and tag them…

… using the index to drive the process

(Snapshots dated 31 Oct, 03 Nov, 17 Dec, 19 Dec)

But how many maps were there?

(Snapshots dated Oct 31, Nov 2, Nov 7, Nov 14, Dec 1, Dec 10, Dec 17, Dec 28)

-- including 20,000 found independently by @Quasimondo, machine-assisted using his own pattern recognition methods

50,000 maps in all:

                 totals   classmark index   detailed index
                 ------   ---------------   --------------
misc              16074             14091             1983
Europe            13136              6254             6882
British Isles      7191               269             6922
North America      6758              1524             5234
  USA              5782              1209             4573
Asia               2736              1280             1456
Africa             2300              1075             1225
South America       895               659              236

Next step:

Geo-location, using the Klokan/BL Georeferencer

(Free alternatives are also available)

10x more images than the BL has ever attempted before

Success allows the old map to be laid over the top of a modern one

Pilot run of 3,000 completed

Now characterised by location …

... and scale
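
One rough way to put a number on "scale" is to measure the ground distance across a georeferenced map's bounding box. The sketch below is only an illustration: the function names, the size thresholds and the class labels are assumptions made for the example, not something taken from the project.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bbox_diagonal_km(south, west, north, east):
    """Ground distance from the south-west to the north-east corner."""
    return haversine_km(south, west, north, east)


def rough_scale_class(diagonal_km, city_max_km=100.0, country_max_km=3000.0):
    """Bucket a map by extent. The thresholds are illustrative guesses."""
    if diagonal_km < city_max_km:
        return "city"
    if diagonal_km < country_max_km:
        return "country or region"
    return "continent"
```

For example, `rough_scale_class(bbox_diagonal_km(51.3, -0.5, 51.7, 0.3))` puts a box of roughly 70 km across into the "city" bucket.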

All that is needed to identify individual continents …

… countries …

… nation …

… nations …

… cities …

… and beyond.

Ready to be uploaded to Commons…

… almost

To do list:

Better subject identification

Reasonable Commons categorisation

To do/1: Subject identification

Current: OSM Nominatim, 4 votes out of 5
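
The slide gives only "4 votes out of 5". One plausible reading, assumed here, is that five points on each map are reverse-geocoded and a place name is accepted when at least four of the five results agree. The sketch takes the looked-up names as plain input, so it makes no claim about how Nominatim is actually called; `vote_on_subject` and `min_votes` are hypothetical names.

```python
from collections import Counter


def vote_on_subject(candidate_names, min_votes=4):
    """Return the name that wins at least `min_votes` votes, else None.

    `candidate_names` holds one looked-up place name per sample point;
    failed lookups can be passed as None and are ignored.
    """
    votes = Counter(name for name in candidate_names if name)
    if not votes:
        return None
    name, count = votes.most_common(1)[0]
    return name if count >= min_votes else None
```

With `["France", "France", "France", "France", "Spain"]` the function returns `"France"`; a 3-to-2 split returns `None`, leaving the map for a human to look at.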

Small features: Look up on Wikidata, find plausible candidate

Large features: can be over-cautious. Need a better idea of the size of candidate features…

… so compare with typical existing maps
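
A minimal sketch of that size comparison, assuming each candidate feature comes with a "typical extent" figure taken from existing maps of it. The function name, the `max_ratio` cut-off and the numbers in the usage note are all made up for illustration.

```python
import math


def best_size_match(map_extent_km, typical_extent_km, max_ratio=3.0):
    """Pick the candidate whose typical map extent is closest to this map's.

    `typical_extent_km` maps candidate name -> typical extent in km.
    Closeness is measured on a log scale, so twice as big and half as big
    count as equally far off. Returns None if even the best candidate
    differs by more than `max_ratio`.
    """
    best_name, best_score = None, None
    for name, extent in typical_extent_km.items():
        score = abs(math.log(map_extent_km / extent))
        if best_score is None or score < best_score:
            best_name, best_score = name, score
    if best_score is None or best_score > math.log(max_ratio):
        return None
    return best_name
```

With invented figures, `best_size_match(900, {"France": 1000, "Europe": 4000, "Paris": 20})` picks `"France"`, while a 200 km map offered only a 4000 km candidate gets no match.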

To do/2: Categorisation

Principle on Commons is to refine into groups of 'human manageable' size.

~ 4 to 40 images (larger for series)

Good for humans, less good for machines: wildly different categorisation depths and naming

Routine upload and management categories ... straightforward enough.

Maps from collection uploaded on <date>
Maps from collection uploaded on <date> with categorisation to confirm
Images from <book>

but then ...

Countries:
  Old maps of <country>
  Old maps of part of <country>

Cities:
  Old maps of <city>
  Old maps of cities in <country>
  Old maps of cities in <part of country>
  + "<city>" itself ?

Features (i.e. buildings, castles, cathedrals, battlefields, etc.):
  <Feature> / Plans of <Feature>
  Plans of <feature-type>s in <place>
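
The naming patterns above can be turned into candidate category names with simple templates. This is a sketch only: `candidate_categories` and its parameters are hypothetical, the plural is formed by naively adding "s" as in the pattern, and nothing here checks that a category actually exists on Commons.

```python
def candidate_categories(kind, name, parent=None, feature_type=None):
    """Build candidate Commons category names from the slide's patterns.

    kind: "country", "city" or "feature"
    parent: enclosing country or place, if known
    feature_type: e.g. "castle" (only used for features)
    """
    if kind == "country":
        return [f"Old maps of {name}"]
    if kind == "city":
        cats = [f"Old maps of {name}"]
        if parent:
            cats.append(f"Old maps of cities in {parent}")
        return cats
    if kind == "feature":
        cats = [f"Plans of {name}"]
        if feature_type and parent:
            # Naive plural; irregular plurals would need special handling.
            cats.append(f"Plans of {feature_type}s in {parent}")
        return cats
    raise ValueError(f"unknown kind: {kind!r}")
```

For instance, `candidate_categories("city", "Lyon", parent="France")` yields `["Old maps of Lyon", "Old maps of cities in France"]`. Because real category names and depths vary so much on Commons, each candidate would still need checking against what is already there.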

To do/3: Strengthening Wikidata

<feature-type> should be given by P31 ("instance of") -> church, castle, cathedral, battlefield, etc.

But data often not yet there... Need to supply: WP category mining (care needed: "category spillage"), databases (if PD), etc.
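
As an illustration of reading a feature type from P31, the sketch below builds a SPARQL query string for one Wikidata item. It assumes the query will be sent to a SPARQL endpoint that understands the usual Wikidata prefixes (`wd:`, `wdt:`) and label service; `build_p31_query` is a hypothetical helper, and no request is actually made.

```python
import re


def build_p31_query(qid, language="en"):
    """Return a SPARQL query listing the P31 ("instance of") values of an item."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    return (
        "SELECT ?type ?typeLabel WHERE {\n"
        f"  wd:{qid} wdt:P31 ?type .\n"
        "  SERVICE wikibase:label { "
        f'bd:serviceParam wikibase:language "{language}" . }}\n'
        "}"
    )
```

An empty result for an item is exactly the "data not yet there" case, where the type would have to be supplied from another source.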

To do list

There is work to do…

But with some work (and some human mop-up), automated upload + reasonable categorisation should be possible.

State of play

Georeferencing is underway

Index pages now have “to georef” templates.

Main progress page is live

Conclusions:

Tiered levels of wiki-pages leading to image searches can be used to drive large projects

Even ad-hoc rough indexes are useful

Commons's own old maps should be next (~ 60,000)

Georeferencing is fun -- come and give it a try