LIS654 lecture 2 the csv plugin, images Thomas Krichel 2012-02-05

LIS654 lecture 2

the csv plugin, images

Thomas Krichel 2012-02-05

plugin example

• I am demonstrating the CSV import plugin.

• I do this because I hate interacting with a computer.

CSV

• Comma-separated values are a simple text format to represent a table.

• The table is a sequence of lines.

• The first line contains the field names.

• Each following line contains the field values of one record.

comma separation

• The separation of fields is done by commas.

• If a value contains a comma, it has to be surrounded by double quotes.

• Example table of economists:

Name, Birthday
"Krichel, Thomas", 1965-06-05
"Marx, Karl Heinrich", 1818-05-05
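The quoting rule can be tried out with a few lines of Python. This is only a sketch using Python's standard csv module, not the code of the Omeka plugin. The name read_table is made up for this example, and the skipinitialspace setting is an assumption needed because the example table puts a blank after each separating comma.

```python
import csv
import io

# The economists table from the slide, with straight double quotes.
TABLE = '''Name, Birthday
"Krichel, Thomas", 1965-06-05
"Marx, Karl Heinrich", 1818-05-05
'''

def read_table(text):
    """Return the rows of a CSV table as a list of dictionaries."""
    # skipinitialspace=True ignores the blank after each separating comma,
    # so the quotes are recognized and the values carry no leading blank.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return list(reader)

rows = read_table(TABLE)
for row in rows:
    # The comma inside the quoted name does not split the field.
    print(row["Name"], "was born on", row["Birthday"])
```

Each row comes back with exactly two fields, because the commas inside the quoted names are not treated as separators.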

CSV plugin

• The CSV plugin allows you to upload a CSV file.

• That file describes the resources you want to include in your collection.

• The resources can then be included in bulk into your Omeka installation.

install

• Local computer way
– Download the plugin from its URL, say URL, to your local machine.
– Upload the unzipped plugin directory to ssh://[email protected]/omeka/plugins

• On wotan
– cd omeka/plugins
– GET URL > csv.zip
– unzip csv.zip
– rm csv.zip

activate plugin

• From the main menu, look for “Manage Plugins”.

• Look for “CSV Import”. If this is not there, you have not put the plugin files into the right place.

• Click “install” next to it.

• Accept the defaults on the next screen.

• You now see the “CSV Import” option at the top.

compose and upload the csv file

• A sample CSV file is at http://wotan.liu.edu/home/krichel/courses/lis654/examples/csv/manhattan.csv

• You have to place it into the omeka/plugins/CsvImport/csv_files/ folder of your home directory. Delete test.csv.

• I don’t think that the name matters, but avoid blanks and other exotic characters in the name and give it the ending “.csv”.

step 1

• Step 1: Select File and Item Settings
– CSV File
– Item Type
– Collection

• After that step, the CSV file is read and checked to have the proper format.

• If that check fails you have to edit the file.

step 2

• The names of your columns have been recognized and you are asked to match them to the Omeka information.

• You have three options
– match to a metadata element
– match to tags
– match to a file

map to element

• “Map To Element”
– Dublin Core (common)
– Item-type dependent fields

• “Use HTML” means interpret the value as HTML.

• “File” means don’t map to metadata, interpret the values as files.

• You can match to several metadata fields at once.

match to tags

• “Tags” Don’t map to element, interpret values as tags.

• Note that tags have to be separated by the tag separator that you have fixed in your general settings. Otherwise a tag string cannot be parsed into separate tags.
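Why the separator matters can be seen in a small Python sketch. It is not Omeka's own code. The function name split_tags and the comma as default separator are assumptions made for this illustration.

```python
def split_tags(tag_string, separator=","):
    """Split a tag string into a list of clean, non-empty tags."""
    tags = [tag.strip() for tag in tag_string.split(separator)]
    return [tag for tag in tags if tag]

# The separator matches the one used in the CSV column: three tags.
print(split_tags("bridge, harbor, Manhattan"))

# The separator does not match: the whole string stays one single tag.
print(split_tags("bridge, harbor, Manhattan", separator=";"))
```

With the wrong separator the whole string ends up as a single tag, which is the failure the slide warns about.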

match to file for uploads

• “File” values have to be URLs starting with http.
– Pointers to files on wotan are not supported.
– No big deal because Thomas has given you the web site.

• Omeka will act like a browser and fetch the file from the web.

successful import

• If the import is successful,
– go to the public interface with another web browser window
– check that you like the result

• If you don’t like what you see, hit the undo link in the CSV Import screen. That link may not be available later.

structure

• basics about vision

• encoding visual information
– vector graphics
– raster graphics

• choice between formats

• raster file formats
– TIFF
– PNG
– JPEG

human culture: color

• Color is the visual perceptual property corresponding in humans to the categories called violet, pink, ochre etc.

• Color derives from the spectrum of light (distribution of light power versus wavelength) interacting in the eye with the spectral sensitivities of the light receptors.

physics: light

• Visible light is a kind of electromagnetic wave.

• It is the one that the eye can perceive.

• The wavelength of that radiation is roughly between 380nm and 700nm.

• Ultraviolet is below 380nm.

• Infrared is above 700nm.

the eye

• The eye has two types of cells to see light.

• “Rods” are used to perceive the intensity of light.

• “Cones” are used to perceive colors.

greyscale

• The rod cells in the eye perceive a full spectrum of natural light as white and the absence of light as black.

• In between, they see various shades of grey.

• An image that only has shades of grey is called a greyscale image.

trichromatic theory

• James Clerk Maxwell (1831–1879) is credited with laying the foundation of the trichromatic theory.

• It states that any color can be obtained by combining, in various proportions, three conveniently chosen primary colors.

• This is more a property of the human eye than a property of physics.

biological foundation

• There are three types of cones, “short”, “medium” and “long”. They differ in how sensitive they are to different wavelengths of light.

• The fact that we have three types of cones links the three primary colors back to basic color perception.

color mixing

• When we are mixing primary colors to produce others, there are two approaches.
– Subtractive color mixing uses light filters to remove the unnecessary colors present in the incoming (white) radiation.
– Additive color mixing uses several light sources of different colors to create the sensation of another color.

subtractive color details

• There are two types of light filters.

• Transmissive filters stop some of the wavelengths from the incoming light. An example is stained glass.

• Reflective filters absorb the unwanted wavelengths, reflecting back only some of the colors. This accounts for the vast bulk of color perception. Most physical objects have color because they absorb some light.

additive color detail

• There are three methods.

• Optical additive mixing combines two or more colors on the same surface at the same time. You see this in the cinema.

• Spatial color mixing relies on the finite resolution of the human eye. CRT monitors do this.

• Temporal additive mixing combines colors on the same surface, but at different moments in time. Modern monitors do this.

color model

• A color model is an abstract mathematical model describing the way colors can be represented as groups of numbers.
– three for a mix of basic colors
– a fourth one if we want to express the transparency of an image

• A color model has to be mapped to a space of colors. Otherwise it is an arbitrary abstract construction.

the RGB color model

• The RGB color model is an additive color model in which red, green, and blue light are added together in various ways to reproduce a broad array of colors.

• Which precise shades of these colors are best suited can be found by experiment.

• Strictly speaking, RGB is a group of color spaces.
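Additive mixing in RGB can be sketched in Python as adding light channel by channel. The sketch assumes the common convention of one byte per channel, so values run from 0 to 255 and sums are clipped at 255. The function name add_light is made up for this example.

```python
def add_light(*colors):
    """Add RGB light sources channel by channel, clipping at 255."""
    # zip(*colors) groups all red values, all green values, all blue values.
    return tuple(min(255, sum(channel)) for channel in zip(*colors))

RED = (255, 0, 0)
GREEN = (0, 255, 0)
BLUE = (0, 0, 255)

print(add_light(RED, GREEN))        # red and green light give yellow
print(add_light(RED, GREEN, BLUE))  # all three primaries give white
```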

raster images

• Raster images are rectangular sets of pixels.

• Each pixel is a small rectangle that has a certain color.

• Since the pixels are small, the illusion of a non-pixelated image is created.

• The smaller the pixels, the smaller the image.

• Modern raster images will look bad on future devices.

vector graphics

• A vector graphic is a set of instructions on how to draw shapes that make up an image.

• Contrary to raster images, vector graphics are resolution-independent. On a device with small pixels, they look better than on a device with large pixels.

• Modern vector graphics will look good on future devices.

vector graphics usage

• Vector graphics are used in some parts of digital information.
– SVG is a vector graphics W3C recommendation for the web. Browser support is still sketchy.
– Outline fonts (based on vector graphics) are now in common use.

• Transforming scanned material into a vector format does not appear to be economically feasible.

scanning

• Printed information material can be scanned.

• The scanning process creates an image of the print.

• If there are character data, they can be recovered through optical character recognition. It’s error prone. It’s usually the only way to create a device-independent representation of the information.

raster images

• They are here to stay for better or worse.

• There are many raster image formats. Each format has strengths and weaknesses.

• The main distinction is between formats that are lossy and formats that are lossless.

• |we now turn to basic considerations related to image format |

grayscale

• A grayscale digital image is an image in which the value of each pixel is a single sample, that is, it carries only intensity information. Images of this sort, also known as black-and-white, are composed exclusively of shades of gray, varying from black at the weakest intensity to white at the strongest.

lossy vs lossless

• Lossy formats remove information from the image that the format designers think the eye will not notice.

• This results in smaller files and quicker downloads.

• If an image in a lossy format is manipulated (loaded and then saved), the loss is progressive.

• Lossy formats are bad for archival storage.

usability of standards

• Before deciding to use a format for encoding information, it is imperative to examine how the format is maintained
– is the format patent free?
– how concentrated is its ownership?
– is the documentation freely available?
– is there open source software implementing the format?

bit depth

• The bit depth is the amount of information that is retained for every pixel about the color of the pixel.

• The higher the bit depth, the more colors can be represented.
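The link between bit depth and the number of colors is simple arithmetic: n bits can tell 2 to the power of n values apart. A Python sketch follows. The function names are made up for this example, and the size estimate assumes an uncompressed image without any header or row padding.

```python
def color_count(bit_depth):
    """Number of different colors a pixel with this bit depth can take."""
    return 2 ** bit_depth

def raw_size_bytes(width, height, bit_depth):
    """Size of the bare pixel data, rounded up to whole bytes."""
    return (width * height * bit_depth + 7) // 8

for bits in (1, 8, 24):
    print(bits, "bits per pixel:", color_count(bits), "colors")

print("640x480 at 24 bits:", raw_size_bytes(640, 480, 24), "bytes")
```

The same arithmetic shows the price of a higher bit depth: the pixel data grows in proportion to it.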

palette / indexed color

• An alternative to giving the color information in every pixel is to store a color index number in each pixel.

• In a second stage, a palette relates the index number to a color number from a color model.
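The two stages can be sketched in Python. The palette and the tiny image below are invented for illustration. Real formats store the palette inside the file, but the lookup works the same way.

```python
# The palette relates each index number to an RGB color.
PALETTE = [
    (0, 0, 0),        # index 0: black
    (255, 255, 255),  # index 1: white
    (255, 0, 0),      # index 2: red
]

# An indexed image of 2 rows with 3 pixels each.
# Every pixel stores an index number, not a color.
INDEXED_IMAGE = [
    [0, 1, 2],
    [2, 1, 0],
]

def to_rgb(indexed_image, palette):
    """Replace every index number by the color the palette assigns to it."""
    return [[palette[index] for index in row] for row in indexed_image]

rgb_image = to_rgb(INDEXED_IMAGE, PALETTE)
print(rgb_image[0])
```

Changing one palette entry recolors every pixel that uses that index, without touching the pixel data.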

classic image file types

• GIF |not further discussed|

• TIFF

• PNG

• JPEG

TIFF

• TIFF stands for Tagged Image File Format.

• It is a standard lossless file format used for archival purposes.

• In fact it is the de facto standard in the archival community.

• It is a 24-bit depth, i.e. “full-color” format.

origin

• It was originally created by Microsoft and a software company called Aldus. The latter held the copyright.

• Aldus released the first complete specification in 1986.

• Its aim was to create a standard format for the desktop scanners of the 80s.

• Initially it only supported a single bit depth.

status

• Aldus was acquired by Adobe, Inc., which now holds the copyright.

• Thus this is a proprietary format.

• Use of the format requires no license fees.

• The last major update was in 1992.

tagged…

• The TIFF file stores its information in fields called tags. These store things like
– image dimensions
– copyright information

• The format allows for proprietary tags you can create yourself.

• It was initially derided as “thousands of incompatible file formats”.

baseline TIFF

• Baseline TIFF is a tag set that every reading and writing software must support.

• Beyond that, there can be optional tags that a given piece of software may or may not support.

• Finally, there is the possibility to create private tags.

requirements of baseline TIFF

• Multiple images may be in the same file.

• Support for two compression schemes
– CCITT Group 3 1-Dimensional Modified Huffman RLE
– PackBits compression - a form of run-length encoding

• Support for
– bilevel
– grayscale
– palette-color
– RGB full-color
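The idea behind run-length encoding can be sketched in Python. This is a generic illustration of the principle only. It does not reproduce the byte layout that PackBits actually writes, and the function names are made up for this example.

```python
def rle_encode(data):
    """Encode bytes as a list of (run length, byte value) pairs."""
    runs = []
    for value in data:
        if runs and runs[-1][1] == value:
            # Same value as the previous byte: make the current run longer.
            runs[-1] = (runs[-1][0] + 1, value)
        else:
            # A different value: start a new run.
            runs.append((1, value))
    return runs

def rle_decode(runs):
    """Expand (run length, byte value) pairs back into the original bytes."""
    return bytes(value for count, value in runs for _ in range(count))

# One line of a scanned page: mostly white (255) with a short black stroke (0).
scanline = bytes([255] * 6 + [0] * 3 + [255])
encoded = rle_encode(scanline)
print(encoded)

# The round trip returns exactly the original data: the scheme is lossless.
assert rle_decode(encoded) == scanline
```

Long runs of identical pixels, as in scanned text pages, shrink a lot. Noisy photographs hardly shrink at all.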

problems

• Adobe also owns PSD, the format for its Photoshop application. They have neglected TIFF.
– no tags to specify the relationship between pages.
– no standards for vector graphics and text drawings.

• There is a size limitation of 4GB.

TIFF since its release

• There is no public process for specification revisions.

• Adobe added a “draft” supplement in 1995 as a first “technical note”.

• A “JPEG-in-TIFF” specification was released by 2002 as a second technical note.

PNG history

• In 1977 and 1978, Jacob Ziv and Abraham Lempel published a pair of papers on a new class of lossless data-compression algorithms, now known as “LZ77” and “LZ78”.

• In 1983, Terry Welch of Sperry developed a very fast variant of LZ78 called LZW. Sperry filed for a patent, granted in December 1985.

• In 1986, Sperry was taken over by Burroughs to form UniSys.

GIF

• In 1987, CompuServe’s Bob Berry issued the GIF compressed image format.

• It uses LZW for compression.

• Berry and CompuServe claimed they did not know that UniSys had a patent.

• CompuServe became aware of the patent after the GIF spec was public and widely used.

• UniSys appeared to ignore the infringing use.

birth of PNG

• In 1993, UniSys has financial problems.

• It negotiates with CompuServe that they would jointly collect royalties for use of LZW in GIF manipulating software.

• This was announced on 28 December 1994.

• There is a huge outcry on the Internet.

• An informal group around Thomas Boutell works on producing a free alternative to GIF.

start and end of work

• On 4 January 1995 Boutell releases “PBF”, for Portable Bitmap Format.

• Within one week, others add other features and suggest a name change. Most of the work is done.

• On 1 October 1996 PNG 1.0 becomes the first W3C Recommendation.

• On 14 October 1996 image/png becomes a MIME type.

• On 15 January 1997 the PNG spec 1.0 is released as RFC 2083.

features |1|

• non-patented and completely lossless compression that is better than the compression in GIF, but only by 5%-20%

• Multiple cyclic redundancy checks (CRCs), so that file integrity can be checked without viewing the image

• It has a magic signature that can detect the most common types of file corruption.
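Both checks can be sketched in Python with the standard zlib and struct modules. A PNG file starts with a fixed 8-byte signature, and each chunk carries a CRC computed over its type and data. The helper names below are made up for this example, and the chunk is built in memory rather than read from a real file.

```python
import struct
import zlib

# The 8 bytes every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(chunk_type, data):
    """Build a PNG chunk: length, type, data, then a CRC over type and data."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def has_png_signature(blob):
    """Check whether the data starts with the PNG signature."""
    return blob[:8] == PNG_SIGNATURE

def chunk_is_intact(chunk):
    """Recompute the CRC of a chunk and compare it with the stored one."""
    (length,) = struct.unpack(">I", chunk[:4])
    body = chunk[4:8 + length]  # chunk type followed by chunk data
    (stored_crc,) = struct.unpack(">I", chunk[8 + length:12 + length])
    return zlib.crc32(body) == stored_crc

chunk = make_chunk(b"tEXt", b"Copyright\x00example")
damaged = chunk[:-5] + b"X" + chunk[-4:]  # overwrite the last data byte

print(chunk_is_intact(chunk))
print(chunk_is_intact(damaged))

# A transfer that rewrites line endings breaks the signature.
mangled = PNG_SIGNATURE.replace(b"\r\n", b"\n")
print(has_png_signature(PNG_SIGNATURE + chunk), has_png_signature(mangled + chunk))
```

The signature contains a carriage return and line feed pair on purpose, so a transfer that converts line endings is caught by looking at the first 8 bytes alone.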

features |2|

• two-dimensional interlacing scheme |+

• 1-, 2-, 4- and 8-bit palette support (like GIF)

• 1-, 2-, 4-, 8- and 16-bit grayscale support

• 24- and 48-bit truecolor support

• full alpha transparency in 8- and 16-bit modes, not just simple on-off transparency like GIF |+

features |3|

• “palette-alpha” mode, effectively transforming normal RGB palette into RGBA

• gamma correction for cross-platform “brightness” control

• both compressed and uncompressed text chunks for copyright and other info

PNG interlacing

• Interlaced GIF images display their data in 4 passes, each one filling in more detail from top to bottom. The first pass begins after 1/8 of the image has downloaded.

• PNG begins rendering the image after only 1/64 of the data has arrived. Its “7 pass” scheme then fills in the image step by step.

• Interlacing increases the file size.

gamma correction

• Gamma correction is the ability to correct for differences in how computer systems interpret color values.
– Macintosh-generated images tend to look too dark on PCs
– PC-generated images tend to look too light on Macs.

• PNG format allows the gamma value of the computer which created an image to be embedded into the image file.

the alpha

• The alpha value of a pixel indicates how transparent the pixel is.

• In a PNG, the alpha value can be represented by a byte, allowing for 256 settings from 0 to 255.

• If the alpha value is 0, the pixel is transparent.

• If the alpha value is 255, the pixel is opaque.
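What the alpha value does when a pixel is drawn over a background can be sketched in Python. The sketch assumes an alpha value stored in one byte and uses a plain weighted average with integer division. The function name blend is made up for this example, and real renderers may round differently.

```python
def blend(foreground, background, alpha):
    """Draw a pixel over a background; alpha 0 is transparent, 255 is opaque."""
    return tuple(
        (alpha * fg + (255 - alpha) * bg) // 255
        for fg, bg in zip(foreground, background)
    )

RED = (255, 0, 0)
WHITE = (255, 255, 255)

print(blend(RED, WHITE, 0))    # fully transparent: only the background shows
print(blend(RED, WHITE, 255))  # fully opaque: only the pixel shows
print(blend(RED, WHITE, 128))  # in between: a mix of both
```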

JPEG

• There are two standards, the original JPEG and JPEG2000.

• We need to worry about this because JPEG2000 has an option for lossless manipulation of the image.

• JPEG does not have this.

• We will assume JPEG is always a lossy format.

compression

• Neighboring pixels in raster images often have similar colors.

• This is the basic premise on which compression in a JPEG is based.

• JPEG lossy compression does not reduce the number of pixels, but it makes neighboring pixels more similar.

• This is done in a complex way. Some implementations take shortcuts.

compression scales

• Usually JPEG programs give you a scale by which you can set the amount of lossy compression.

• The numbers they use are not absolute, they depend on the way each software handles JPEG.

• Using different software and/or different settings will aggravate the loss from consecutive manipulation of JPEG.

rotation

• Rotation is the most common manipulation of images that we know of.

• There are software packages that allow for lossless rotation of JPEG images. A $0 example for the MS Windows operating system is IrfanView.

JPEG2000

• JPEG2000 is a different standard than JPEG.

• It has a different MIME type, image/jp2.

• It supports a lossless format.

• Its support in graphics manipulation programs is good but its web support is very poor at this time.

• There are controversies about its free status.

http://openlib.org/home/krichel

Please shut down the computers when you are done.

Thank you for your attention!

antialiasing

• In order to reduce the impression of pixelated jaggedness along a diagonal shape, some formats support antialiasing.

• This is a technique by which pixels at the border are given an intermediate color.

layers

• There may be several layers to an image.

• For example, image manipulation programs can allow you to draw something over an existing image, or add layers together.

palette

• A palette is a given, finite set of colors for the management of digital images.

• It’s to images what the character set is to text.

• When talking about a screen it may also mean the total number of colors that the screen can show. Not all of them may be visible at high resolutions due to video card memory limits. |look at screen properties|

gamut

• A gamut is a range of colors.

• An output device can have a gamut, meaning the set of all colors that it can show.

• A color model may also have a gamut, meaning all the colors that can be shown in the model.
