Digitisation Mick Eadie Visual Arts Data Service


Page 1: Digitisation Mick Eadie Visual Arts Data Service

Digitisation

Mick Eadie

Visual Arts Data Service

Page 2: Digitisation Mick Eadie Visual Arts Data Service

The ‘input channels’ of digitisation (keyboard, scanner, etc.) are narrow and can only capture a partial representation of the original source

Source – Digitisation – Resource

• Identify sources to digitise
• Choose digitisation method
• Digital objects
• Choose data model

Page 3: Digitisation Mick Eadie Visual Arts Data Service

Digitisation Pathways

[Diagram. Labels: Original Source; Copy of Source (Photocopy, Photograph, Recording); Item to Digitise; Sound, Moving image; Digital Object (2D Image, 3D Model, Digital audio/movie recording); Digital Resource. Methods shown: Scan, Digital Camera, 3D Scan; OCR, Line tracing.]

Page 4: Digitisation Mick Eadie Visual Arts Data Service

Elements of a Digital Resource

• Users: knowledge, experience, culture
• Environment: hardware, software (OS), (network)
• Digital objects: binary data
• Data models: relationships

The environment of a digital resource often receives the most attention, but it is the users and the digital objects that matter most.

Hardware and software selection should be based on the needs of the users and the types of digital objects to be used.

Fit for purpose: digital objects must be created with their intended use as the paramount consideration.

Page 5: Digitisation Mick Eadie Visual Arts Data Service

Digital Objects

• Text
  – Data stored as a stream of characters (numbers, letters, etc.)
• Image
  – Data primarily understood as a spatial pattern or shape
  – Bitmap and vector images; raster (bitmap) and vector spatial data
• Time
  – Data primarily understood as a sequence through time
  – Audio and/or video (multimedia)

Page 6: Digitisation Mick Eadie Visual Arts Data Service

Text

• Essentially, numeric codes used by the computer to represent specific characters
  – Fonts must be designed to provide a visual image for each code
  – Software must be designed to interpret the codes
• ASCII is the best-known text encoding scheme
  – 7 bits per character = 128 codes, primarily the Latin alphabet; 8-bit extensions (1 byte per character) give 256 codes
  – Other characters are handled by having multiple code pages
  – Each code page uses the same codes to represent different characters
• Unicode is the replacement for ASCII
  – Originally 2 bytes per character = 65,536 codes; the standard now allows over a million, stored using encodings such as UTF-8 and UTF-16
  – Can represent characters from different alphabets simultaneously as each character has a unique code
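The code page problem and the Unicode solution can be seen in a few lines of Python. This is a minimal sketch: the byte value 0xE9 and the two code pages (`latin-1` and `cp1251`) are chosen here purely for illustration and are not taken from the slides.

```python
# The same byte value means different characters under different code pages.
raw = bytes([0xE9])

as_western = raw.decode("latin-1")   # ISO 8859-1 (Western European)
as_cyrillic = raw.decode("cp1251")   # Windows-1251 (Cyrillic)

# Unicode gives each character its own code point,
# so both characters can sit side by side in one string.
mixed = as_western + as_cyrillic
code_points = [hex(ord(ch)) for ch in mixed]

# UTF-8 stores each code point in one to four bytes.
utf8_lengths = [len(ch.encode("utf-8")) for ch in "A\u00e9\u20ac"]

print(as_western, as_cyrillic, code_points, utf8_lengths)
```

Without a record of which code page was used, the byte alone cannot be interpreted reliably, which is why the encoding should be documented alongside the text.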

Page 7: Digitisation Mick Eadie Visual Arts Data Service

Text Transcription

• Advantages:
  – Low overhead to start transcription: person, keyboard, document
  – Hand-written documents can be transcribed
  – A transcriber can follow complex, disorganised documents
• Issues:
  – Slow and expensive
  – Human error
• Good practice:
  – Double entry (two transcribers both enter the same document and the transcriptions are checked for differences)
  – Keep copies of originals with transcriptions (preferably as digital images, as this makes post-transcription checking simple and quick)
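The double-entry check can be automated by comparing the two transcriptions line by line. The sketch below is one possible approach, not a prescribed tool; the function name `compare_transcriptions` and the sample text are invented for illustration.

```python
from itertools import zip_longest


def compare_transcriptions(first, second):
    """Return (line_number, first_text, second_text) for every line that differs."""
    differences = []
    pairs = zip_longest(first.splitlines(), second.splitlines(), fillvalue="")
    for number, (a, b) in enumerate(pairs, start=1):
        # Ignore leading/trailing whitespace, which is rarely a real disagreement.
        if a.strip() != b.strip():
            differences.append((number, a, b))
    return differences


# Invented sample text: the second transcriber mistyped one letter on line 2.
entry_one = "In the yeare 1642\nthe parish of St Mary\npaid ten shillings"
entry_two = "In the yeare 1642\nthe parish of St Marv\npaid ten shillings"

for number, a, b in compare_transcriptions(entry_one, entry_two):
    print(f"line {number}: {a!r} vs {b!r}")
```

Each flagged line still needs a person to check it against the original (or its digital image) to decide which transcription is right.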

Page 8: Digitisation Mick Eadie Visual Arts Data Service

Optical Character Recognition

• Advantages:
  – Automatic, suitable for digitising large numbers of documents
  – Highly accurate for clean, clear typewritten documents
• Issues:
  – Current technology performs very poorly on handwriting
  – Complex document layout can become scrambled
• Good practice:
  – Proof-read and spell-check OCR output for errors
  – Provide an image of the page with the text so users can check the text themselves
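A first pass at spell-checking OCR output can be as simple as flagging every word that is not in a known-word list. This is a rough sketch only: the function `flag_suspect_words`, the tiny word list, and the sample OCR errors are all made up for illustration, and a real project would use a full dictionary suited to the language and period of the source.

```python
import re


def flag_suspect_words(ocr_text, known_words):
    """Return the words in the OCR text that are not in the known-word list."""
    words = re.findall(r"\w+", ocr_text)
    return [word for word in words if word.lower() not in known_words]


# Hypothetical word list and OCR output ("rn" read for "m", "1" read for "l").
known_words = {"the", "document", "is", "clear"}
ocr_output = "The docurnent is c1ear"

suspects = flag_suspect_words(ocr_output, known_words)
print(suspects)
```

Flagged words are only candidates for proof-reading; unusual names and archaic spellings will be flagged too, so the output supports a human check rather than replacing it.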

Page 9: Digitisation Mick Eadie Visual Arts Data Service

Bitmap (Raster) Images

• The image is made up of many pixels
• Each pixel stores information about its colour
• The standard archival file format is uncompressed TIFF

Page 10: Digitisation Mick Eadie Visual Arts Data Service

Resolution

• Resolution is often expressed as dots per inch (dpi)
• More accurately, pixels per inch (ppi)
• The ‘frequency’ at which samples are taken by the capture device from the original source

Common misconceptions about ppi
• On its own, ppi is not an indicator of image size or quality, unless we know the size (inches, cm) of the original
• A better guide to digital image size is pixel dimensions, e.g. 2000 x 3000 pixels, which allows us to work out the size of the image we will output to monitor or printer
• Number of pixels ÷ output resolution (ppi) = output size (inches)
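The output-size rule can be worked through with the 2000 x 3000 pixel example. In this sketch the function names and the output resolutions (300 ppi for print, 100 ppi for a screen) are illustrative assumptions, not values from the slides.

```python
def output_size_inches(width_px, height_px, output_ppi):
    """Output size = number of pixels / output resolution."""
    return (width_px / output_ppi, height_px / output_ppi)


def pixels_captured(width_in, height_in, capture_ppi):
    """Pixel dimensions produced when an original is sampled at capture_ppi."""
    return (round(width_in * capture_ppi), round(height_in * capture_ppi))


# The same 2000 x 3000 pixel image at two assumed output resolutions.
print(output_size_inches(2000, 3000, 300))  # assumed print resolution
print(output_size_inches(2000, 3000, 100))  # assumed screen resolution

# Going the other way: an 8 x 10 inch original scanned at 300 ppi.
print(pixels_captured(8, 10, 300))
```

The image is the same set of pixels in both cases; only the physical size at which it is rendered changes, which is why pixel dimensions are the more useful figure to record.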

Page 11: Digitisation Mick Eadie Visual Arts Data Service

Scanners and Digital Cameras

• Advantages:
  – Accurate(?) visual representation of the source
• Issues:
  – The text and logical structure of a document are not captured (they can be recovered through OCR or line tracing)
• Good practice:
  – Capture master images at appropriate resolution and bit depth
  – Check the optical resolution of the scanner (avoid interpolated resolution)
  – Check the colour resolution (bit depth)
  – Check scanning time
  – Record details of scanner settings and any image editing done afterwards
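Resolution and bit depth together determine how large an uncompressed master image will be, which helps when judging what is "appropriate". The following is a back-of-envelope sketch: the function name and the example bit depths are assumptions, and file-format overhead such as TIFF headers is ignored.

```python
def uncompressed_size_bytes(width_px, height_px, bit_depth):
    """Pixels multiplied by bits per pixel, rounded up to whole bytes."""
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8


# A 2000 x 3000 pixel capture at three illustrative bit depths.
for label, depth in [("1-bit black and white", 1),
                     ("8-bit greyscale", 8),
                     ("24-bit colour", 24)]:
    size = uncompressed_size_bytes(2000, 3000, depth)
    print(f"{label}: {size / 1_000_000:.2f} MB")
```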

Page 12: Digitisation Mick Eadie Visual Arts Data Service

Vectors

• A point represents an exact location in two or three dimensional space

• Two points define a line

• A series of connected lines defines an area

Points are recorded as co-ordinates: x,y (two dimensions) or x,y,z (three dimensions)

Page 13: Digitisation Mick Eadie Visual Arts Data Service

Vector Data

• Advantages:
  – Can be zoomed without loss of detail (cf. bitmap images)
  – Allows spatial analysis (spatial statistics, network analysis)
• Issues:
  – Precision versus accuracy (detail versus truthfulness)
  – Scale versus resolution
• Good practice:
  – Ensure polygon topology (the polygons each line belongs to) is stored
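Because vector data is stored as co-ordinates rather than pixels, zooming is just multiplication, and simple spatial measurements can be computed directly from the points. This is a minimal sketch: the rectangle `plot` and the helper names are invented, and the area calculation uses the standard shoelace formula for a simple (non-self-intersecting) polygon.

```python
def scale(points, factor):
    """Zoom a shape by multiplying every co-ordinate by the same factor."""
    return [(x * factor, y * factor) for x, y in points]


def polygon_area(points):
    """Area of a simple polygon from its corner points (shoelace formula)."""
    total = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]  # wrap round to close the shape
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# A hypothetical 4 x 3 rectangular plot, stored as four x,y points.
plot = [(0, 0), (4, 0), (4, 3), (0, 3)]

print(polygon_area(plot))
print(polygon_area(scale(plot, 10)))
```

Scaling by 10 leaves the shape exactly defined by four points; a bitmap enlarged by the same factor would have to invent the extra pixels.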

Page 14: Digitisation Mick Eadie Visual Arts Data Service

Digital Audio

• Human hearing
  – Frequency (pitch): 20 Hz to 20,000 Hz (20 kHz)
  – Intensity (loudness): 0 to 120 dB
• Full sound reproduction requires digitisation at more than 40,000 samples a second (44,100 is a common standard)
  – Nyquist rate: for lossless digitisation, the sampling rate should be at least twice the maximum audio frequency
• One second of good quality uncompressed digital sound is equivalent to ¼ of the complete plays of Shakespeare
  – MP3 offers good quality compressed (lossy) files
• MIDI: not a digital recording of actual sounds, but a set of performance instructions (notes, timing, instrument) played back through a digital sample ‘library’ of how musical instruments sound
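The Nyquist rule and the storage cost of uncompressed sound are both simple arithmetic. In the sketch below the function names are illustrative, and the 16-bit stereo settings are an assumption (they match CD audio) rather than figures from the slides.

```python
def nyquist_rate(max_frequency_hz):
    """Minimum sampling rate: twice the highest frequency to be captured."""
    return 2 * max_frequency_hz


def uncompressed_audio_bytes(seconds, sample_rate_hz, bits_per_sample, channels):
    """Storage needed for raw samples, ignoring any file-format overhead."""
    bytes_per_sample = bits_per_sample // 8
    return seconds * sample_rate_hz * bytes_per_sample * channels


# Upper limit of human hearing taken as 20,000 Hz.
print(nyquist_rate(20_000))

# One second and one minute at 44,100 samples a second, assuming 16-bit stereo.
print(uncompressed_audio_bytes(1, 44_100, 16, 2))
print(uncompressed_audio_bytes(60, 44_100, 16, 2))
```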

Page 15: Digitisation Mick Eadie Visual Arts Data Service

Digital Moving Images

• 1 second of uncompressed good quality digital video (without sound) is equivalent to about ¾ of the complete plays of Shakespeare
• MPEG: the Moving Picture Experts Group standards are the most popular compression standards
  – The three standards: MPEG-1, MPEG-2, MPEG-4
• Compression basically works by selecting key frames and only recording changes between the frames (but it gets a lot more complicated!)
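The key-frame idea can be illustrated with a toy example in which each frame is a short list of pixel values. This is not how MPEG is implemented (real codecs add motion compensation, transforms, and lossy quantisation); it is only a sketch of "store one full frame, then store the changes", and the names `encode` and `decode` are invented here.

```python
def encode(frames):
    """Store the first frame whole, then only the pixels that changed."""
    key_frame = list(frames[0])
    deltas = []
    previous = key_frame
    for frame in frames[1:]:
        changes = {
            index: new
            for index, (old, new) in enumerate(zip(previous, frame))
            if old != new
        }
        deltas.append(changes)
        previous = frame
    return key_frame, deltas


def decode(key_frame, deltas):
    """Rebuild every frame by applying each set of changes in turn."""
    frames = [list(key_frame)]
    for changes in deltas:
        frame = list(frames[-1])
        for index, value in changes.items():
            frame[index] = value
        frames.append(frame)
    return frames


# Three tiny 'frames': a bright pixel (9) moves across a dark background (0).
frames = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0],
]

key_frame, deltas = encode(frames)
print(key_frame, deltas)
```

When little changes between frames, the change records are far smaller than the frames themselves, which is where the saving comes from.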

Page 16: Digitisation Mick Eadie Visual Arts Data Service

Data Models

• A data model is a set of rules that defines a particular way of organising a collection of digital objects

• List, one item follows another

• Tree, each item can have several children

• Sets, items belong to one or more groups

• Geography/geometry, items are located using a co-ordinate system
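The four data models listed above map onto familiar programming structures. The sketch below uses plain Python types; all item names, group names, and co-ordinates are invented for illustration.

```python
import math

# List: one item follows another.
pages = ["page 1", "page 2", "page 3"]

# Tree: each item can have several children.
book = {"title": "Book", "children": [
    {"title": "Chapter 1", "children": [
        {"title": "Section 1.1", "children": []},
    ]},
    {"title": "Chapter 2", "children": []},
]}


def count_items(node):
    """Count a node and everything beneath it."""
    return 1 + sum(count_items(child) for child in node["children"])


# Sets: items belong to one or more groups.
groups = {
    "paintings": {"item A", "item B"},
    "portraits": {"item B", "item C"},
}
in_both = groups["paintings"] & groups["portraits"]

# Geography/geometry: items are located using a co-ordinate system.
locations = {"item A": (10.0, 20.0), "item B": (13.0, 24.0)}


def distance(a, b):
    """Straight-line distance between two located items."""
    (x1, y1), (x2, y2) = locations[a], locations[b]
    return math.hypot(x2 - x1, y2 - y1)


print(pages[1], count_items(book), in_both, distance("item A", "item B"))
```

Each structure makes a different question easy to ask: what comes next, what is contained in what, what belongs together, and what is near what.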

Page 17: Digitisation Mick Eadie Visual Arts Data Service

Selecting a Data Model

• To be useful, digital objects must be:
  – Arranged according to the rules of an appropriate data model
  – Stored in a file format that can represent the data model
  – Accessed with software that understands the file format and the data model, and can present the data in an appropriate way
• When selecting a data model:
  – Consider the ‘natural’ organisation of your source
  – Consider what method of organisation will be familiar to your users
  – Consider the method of organisation that best fits your purposes
• Then seek specialist advice if you need it!

Page 18: Digitisation Mick Eadie Visual Arts Data Service

Selecting Software

• Selecting the right data model is more important than selecting a particular piece of software
• Pick software that works with your preferred data model (can perform the right tasks)
  – Don’t use a webpage editor as a database
  – Don’t use a word processor as a spreadsheet
• Avoid little-used software with proprietary features
• Look for software with lots of export and import options
• Look for software that supports important standards
  – Trees: markup, XML (SGML)
  – Sets: relational databases, SQL
  – Coordinates: CAD or GIS; standards are less clear, so use file formats like DXF or ESRI shapefiles
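The pairing of data models with standards can be tried out using Python's standard library alone: `xml.etree.ElementTree` for a tree written as XML, and `sqlite3` for sets held in a relational database and queried with SQL. The element names, table name, and file names below are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Tree data model written out as XML markup.
collection = ET.Element("collection")
album = ET.SubElement(collection, "album", name="Sketchbook")
ET.SubElement(album, "image", file="page01.tif")
ET.SubElement(album, "image", file="page02.tif")
xml_text = ET.tostring(collection, encoding="unicode")

# Set data model held in a relational database and queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image_group (image TEXT, grp TEXT)")
conn.executemany(
    "INSERT INTO image_group VALUES (?, ?)",
    [
        ("page01.tif", "portraits"),
        ("page01.tif", "drawings"),
        ("page02.tif", "portraits"),
        ("page03.tif", "drawings"),
    ],
)
rows = conn.execute(
    "SELECT image FROM image_group WHERE grp = ? ORDER BY image",
    ("portraits",),
).fetchall()
conn.close()

print(xml_text)
print(rows)
```

Because XML and SQL are widely supported standards, data held this way can be moved to other software later, which is the point of preferring them over proprietary features.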

Page 19: Digitisation Mick Eadie Visual Arts Data Service

Digitisation: a Balancing Act

• Successful digitisation involves several trade-offs:
  – Amount and detail versus time and cost of digitisation
  – Complexity of the digital resource versus ease of use
  – Flexibility of the digital resource versus suitability for a specific use
  – Digitisation with current technology versus future possibilities
• Your project should be guided by a firm understanding of the source and the intended purpose of the digital resource
  – Do not exceed available support (financial, technical, labour)
  – Minimise the loss of information from the original during the digitisation process
  – Keep information that tracks the origin and history of the digital resource with the digital resource

Page 20: Digitisation Mick Eadie Visual Arts Data Service

Where to get more advice

AHDS Guides to Good Practice series
http://vads.ahds.ac.uk/guides/index.html

Technical Advisory Service for Images (TASI)
http://www.tasi.ac.uk

Text Encoding Workshops
http://www.ota.ahds.ac.uk

BUFVC Workshops
http://www.bufvc.ac.uk