Multimedia Retrieval: Introduction
Remco Veltkamp
Multimedia
What is multimedia? Two viewpoints:
• “Multimedia? As far as I’m concerned, it’s reading with the radio on” – Rory Bremner
• “The medium is the massage” – Marshall McLuhan
– “The medium is the message” – popular misquotation
– “The medium is the mass-age”
The medium is the message
Explanation by son Eric McLuhan:
“Actually, the title was a mistake. When the book came back from the typesetter, it had on the cover "Massage" as it still does. The title should have read "The Medium is the Message" but the typesetter had made an error. When Marshall McLuhan saw the typo he exclaimed, "Leave it alone! It's great, and right on target!" Now there are four possible readings for the last word of the title, all of them accurate: "Message" and "Mess Age," "Massage" and "Mass Age."”
http://www.brushstroke.tv/week03_35.html
Multimedia
• Hypermedia: software applications that are interactive, non-linear, including digital assets like images and sound (hyper means ‘beyond’)
• New media: trendier term, but equally ambiguous
• Unstable (instable) media: term used in art
Multimedia
What is media anyway?
• An intervening substance through which something else is transmitted or carried on – The American Heritage Dictionary of the English Language, 4th ed., 2000
Media
• Mass media: link between info producers and consumers: newspapers, TV, radio
• Transmission media: physical means of transmitting signals: wires, optical cables, microwaves
• Storage media: physical means of storing data: ticker tape, magnetic tape, floppy/diskette, CD/DVD
Media
Etymology:
Ancient country of W Asia whose actual boundaries cannot be defined, occupying generally what is now W Iran and S Azerbaijan. It extended from the Caspian Sea to the Zagros Mts. The Medes were an Indo-European people who spoke an Iranian language closely akin to old Persian. Some scholars claim they were an Aryanized people from Turan. Since there are no Median records, Assyrian and Greek sources must be relied upon for Median history – The Columbia Encyclopedia, 6th ed., 2001.
Medea, Medes, Media
In the Greek myth of Jason and the Argonauts, Medea was for a time Jason's wife. Due to various misdeeds and intrigues, Medea flees Athens and is forced to settle in a nether region, beyond Persia. The kingdom she established there in exile, Media, is named for her son, Medes. It was a middle kingdom between the Hellenic, Egyptian, and Babylonian world and the ancient cultures to the east. It was a region of transit and exchange between them.
First use
• The term multimedia seems to have been first used in 1962, simply meaning the use of several media – Merriam-Webster’s Collegiate Dictionary, 10th ed., 1998
• Used in the 1960s to describe presentations combining photographic slides and audio tapes
Multimedia
Definition:
Any combination of two or more media, represented in a digital form, sufficiently well integrated to be presented via a single interface, or manipulated by a single computer program.
Loosely:
Multiple media: images, video, sound, 3D scenes
Trends
• Increase in the use of Internet
• Increase in volume of digital photography (e.g. DV, Web-cams)
• Increase of mobile services (MMS, UMTS phones)
• Digital TV
Trends in senses: sound
Media: talking heads (beatnik.com)
Web browsing: walls of sound (a la Phil Spector)
Advertising: nowhere to hide
Trends in senses: smell
Shopping: stinking out loud
Gaming: the smell of victory
Entertainment: read and sniff
Television: getting to nose you
Television: tilling the vast wasteland
Film and video: MPEG4
Surveillance: eye spy, eye in the sky
Trends in senses: vision
Visual information… is the future
Visual sense dominant
Large part of brain does vision
Optic nerve fibres from the eyes terminate at two bodies in the thalamus (a structure in the middle of the brain) known as the Lateral Geniculate Nuclei. One LGN lies in the left hemisphere and the other in the right. Much of the primate cortex is devoted to visual processing. In the macaque monkey at least 50% of the neocortex appears to be directly involved in vision, with over twenty distinct areas.
Trends in senses: haptic, tactile
• Force feedback
• Joystick, gamepad, wheel
• Data glove
Trends in senses: taste
• ?
MM aspects:
Production
• Authoring tools
• Specification languages
• Media synchronization
• SMIL (pronounced as ‘smile’):
– Synchronized Multimedia Integration Language
– 3GPP MMS uses subset of SMIL 2.0 as format of the scene description
MM aspects:
Delivery
• Off-line:
– books
– CD-ROM
– DVD
• On-line:
– Internet, Intranet
• WWW
• Gopher
• FTP
MM aspects: delivery
Internet
• Killer applications, giga hits:
– Email (phone: Short Message Service, Multimedia Messaging Service)
– WWW
– Search engines
• Because, respectively:
– Personal communication
– Personal showing off
– Personal information retrieval
MM aspects:
Database issues
• Standard database organization like relational and hierarchical schemes not sufficient
• Standard database object types not adequate: there are blobs (binary large objects), but images, video, music, 3D models are not first-class citizens
• Standard query languages not sufficient
MM aspects:
Throughput
• Network bandwidth
• Framebuffer, videocard, audio synchronization
• Disk access rate
• Disk storage capacity
• Operating system requirements
MM aspects:
Retrieval
• Search, find, fetch, recover, restore, return ⇒ getting back
• Traditional ‘information retrieval’: text
• MM retrieval: searching in large collections of images, video, sound, 3D scenes (the 5th wave in web searching)
• Quaero ergo sum
MM Retrieval
Hip, hype and hope:
MM techniques help in personalizing retrieval so as to cope with information overload
Retrieval aspects
• Feature extraction
• Feature indexing
• Query formulation
• Feature matching
• Result visualization
• Feedback loop
Retrieval aspects:
Framework
• media: images, sound, video, 3D scenes
• features: color, texture, shape
• indexing: feature space, object space
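[Figure: retrieval framework. Images, music, video and 3D models pass through feature extraction; the features go to index building, which produces the index structure. In the UI, query formulation (browsing, direct query, query by example) yields an example, whose feature extraction gives the query features. Matching against the index structure returns document ids; fetching returns the result documents for visualization.]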
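The framework above can be sketched end to end in a few lines. This is a minimal illustration, not a system from the lecture: the mean-color feature, the dictionary used as "index structure", and the names `extract_features`, `build_index` and `match` are all assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]           # (R, G, B), each 0..255
Feature = Tuple[float, float, float]   # assumed toy feature: mean color


def extract_features(pixels: List[Pixel]) -> Feature:
    """Feature extraction: the mean R, G and B value of a non-empty image."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def build_index(documents: Dict[str, List[Pixel]]) -> Dict[str, Feature]:
    """Index building: map each document id to its feature vector."""
    return {doc_id: extract_features(px) for doc_id, px in documents.items()}


def match(index: Dict[str, Feature], query: Feature, k: int = 2) -> List[str]:
    """Matching: ids of the k feature vectors closest to the query features."""
    return sorted(index, key=lambda d: math.dist(index[d], query))[:k]


# Hypothetical collection: each "image" is just a flat list of pixels.
documents = {
    "sunset": [(230, 120, 40)] * 4,
    "forest": [(30, 140, 50)] * 4,
    "ocean": [(20, 80, 200)] * 4,
}
index = build_index(documents)

# Query by example: extract features from the example, match, then fetch.
example = [(220, 110, 60)] * 4
ids = match(index, extract_features(example))
results = [documents[i] for i in ids]
print(ids)
```

A real system would replace the dictionary scan with a spatial index over feature space; the linear scan here only keeps the data flow visible.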
Query formulation:
Browsing
manually specifying image id's
Query formulation:
Direct feature querying
• external features: "get all glyphs with the keyword 'goose'"
• internal features: "get all pictures with attribute 'red' equal to 30%"
• CBIR (content-based image retrieval) uses internal features
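As a rough sketch of a direct query on an internal feature, the "red equal to 30%" example could look as follows. The rule for what counts as a red pixel, the tolerance, and the names `red_fraction` and `query_red` are assumptions chosen for illustration only.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def red_fraction(pixels: List[Pixel]) -> float:
    """Fraction of pixels that count as 'red' under a simple, assumed rule."""
    if not pixels:
        return 0.0
    red = sum(1 for r, g, b in pixels if r > 150 and g < 100 and b < 100)
    return red / len(pixels)


def query_red(collection: Dict[str, List[Pixel]],
              target: float, tolerance: float = 0.05) -> List[str]:
    """Ids of pictures whose red fraction lies within tolerance of target."""
    return sorted(pid for pid, px in collection.items()
                  if abs(red_fraction(px) - target) <= tolerance)


# Hypothetical collection of three tiny pictures.
collection = {
    "flag": [(200, 30, 30)] * 3 + [(250, 250, 250)] * 7,   # 3 of 10 pixels red
    "sky": [(40, 90, 220)] * 10,                            # no red pixels
    "rose": [(220, 20, 40)] * 8 + [(30, 120, 40)] * 2,      # 8 of 10 pixels red
}
print(query_red(collection, target=0.30))
```

The tolerance is there because an exact "equal to 30%" test would almost never succeed on real images.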
Query formulation:
Query by Example
[Figure: query image and target image]
Query formulation:
Query by Sketch
Key-word Based Retrieval
Convert content to keywords:
• at insertion?
• at run-time?
• once the query is given?
Questions:
• expressiveness
• ambiguity in language
Key-words
Key-words
In Voskuil’s "Het Bureau", Maarten Koning wants to search for cradles in historic art:
CBIR!
Content-Based Image Retrieval
Applications
• Logo retrieval
• CAD searching
• Product catalogues
• Museum collections
• Photo archives
• Music selection
• Medical imaging
• Crime investigation, law enforcement
• Video searching
• Encyclopedia search
• Copyright protection
Application
Searching news archives
Application
6800 hieroglyphs, 72000 polylines from Center for Computer-aided Egyptological Research:
Domains
WWW                       general pictures     unknown conditions
Trademark assessment      2D of 2D pictures    known camera
Trademark in public       2D of 3D pictures    unknown conditions
Stolen goods retrieval    general pictures     well-behaved camera
Video databases           general pictures     unknown camera
Product database          limited domain       known camera
Product retrieval         limited domain       well-behaved camera
Stamp collections         2D of 2D pictures    narrow domains
Features: Color
color signature: count pixels of dominant colors
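One possible reading of "count pixels of dominant colors" is sketched below: quantize each pixel to a coarse color bin, count the pixels per bin, and keep the most frequent bins. The number of quantization levels, the value of `k`, and the function names are assumptions, not details given in the slides.

```python
from collections import Counter
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def quantize(pixel: Pixel, levels: int = 4) -> Pixel:
    """Map each 0..255 channel to one of `levels` coarse bins."""
    step = 256 // levels
    r, g, b = pixel
    return (r // step, g // step, b // step)


def color_signature(pixels: List[Pixel], k: int = 3,
                    levels: int = 4) -> List[Tuple[Pixel, float]]:
    """The k most frequent quantized colors, each with its pixel fraction."""
    counts = Counter(quantize(p, levels) for p in pixels)
    total = len(pixels)
    return [(color, n / total) for color, n in counts.most_common(k)]


# Hypothetical image: eight reddish pixels and two bluish ones.
image = [(250, 10, 10)] * 6 + [(245, 20, 5)] * 2 + [(10, 10, 240)] * 2
print(color_signature(image, k=2))
```

Quantizing first means the two slightly different reds fall into the same bin and are counted as one dominant color.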
Features: Texture
some pattern of color or intensity changes
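A very simple way to turn "intensity changes" into a number is to average the absolute differences between neighbouring pixels. The sketch below does only that; it is an assumed toy descriptor (named `texture_energy` here), far cruder than the texture features used in actual CBIR systems, and it assumes a rectangular grayscale image.

```python
from typing import List

Gray = List[List[int]]  # rows of intensities 0..255, all rows equally long


def texture_energy(img: Gray) -> float:
    """Mean absolute intensity change between horizontal and vertical neighbours."""
    diffs = []
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            if x + 1 < len(row):          # right-hand neighbour exists
                diffs.append(abs(v - row[x + 1]))
            if y + 1 < len(img):          # neighbour below exists
                diffs.append(abs(v - img[y + 1][x]))
    return sum(diffs) / len(diffs) if diffs else 0.0


# A uniform patch versus a checkerboard of black and white pixels.
flat = [[100] * 4 for _ in range(4)]
checker = [[255 * ((x + y) % 2) for x in range(4)] for y in range(4)]
print(texture_energy(flat), texture_energy(checker))
```

The uniform patch scores zero and the checkerboard scores the maximum, which is the kind of separation a texture feature is meant to provide.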
Texture Segmentation
under-segmentation:
Shape
Here: shape is geometry
Matching
Given two images/objects/features A, B
• measure dissimilarity, distance d(f(A),B)
• using some distance function d (often called similarity rather than dissimilarity)
• under some transformation f
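To make d(f(A),B) concrete, the sketch below matches two small point sets. It assumes, purely for illustration, that d is the symmetric Hausdorff distance and that the transformations f are translations drawn from a small grid; the slides do not fix either choice, and the helper names are hypothetical.

```python
import math
from typing import Callable, Iterable, List, Tuple

Point = Tuple[float, float]
Transform = Callable[[List[Point]], List[Point]]


def hausdorff(a: List[Point], b: List[Point]) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p: List[Point], q: List[Point]) -> float:
        return max(min(math.dist(u, v) for v in q) for u in p)
    return max(directed(a, b), directed(b, a))


def translate(dx: float, dy: float) -> Transform:
    """A transformation f that shifts every point by (dx, dy)."""
    return lambda pts: [(x + dx, y + dy) for x, y in pts]


def match(a: List[Point], b: List[Point],
          transforms: Iterable[Transform]) -> float:
    """Smallest d(f(A), B) over the candidate transformations f."""
    return min(hausdorff(f(a), b) for f in transforms)


# The same triangle at two positions: far apart as given, identical after a shift.
A = [(0, 0), (1, 0), (0, 1)]
B = [(5, 5), (6, 5), (5, 6)]
shifts = [translate(dx, dy) for dx in range(-6, 7) for dy in range(-6, 7)]
print(hausdorff(A, B), match(A, B, shifts))
```

Without a transformation the two triangles look dissimilar; minimizing over translations makes the distance drop to zero, which is what "under some transformation f" buys.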
CBIR Systems
• ADL
• AltaVista Photofinder
• Amore
• Blobworld
• CANDID
• C-bird
• Chabot
• CBVQ
• Digital Library Project
• DrawSearch
• Excalibur
• FIR
• FOCUS
• ImageFinder
• ImageMiner
• ImageRETRO
• ImageRover
• ImageSearch
• Jacob
• LCPD
• MARS
• MetaSEEk
• MIR
• NETRA
• Photobook
• Picasso
• PicHunter
• QBIC
• SQUID
• SurfImage
• SaFe
• SYNAPSE
• TODAI
• VIR image engine
• VisualSEEk
• VP IRS
• WebSEEk
• WebSeer
• WISE
• Zomax
Systems' Features
• 56 systems in the table
• 46 use any kind of color features
• 38 use texture
• 29 use shape
• 20 use layout
• 5 use face detection
• http://give-lab.cs.uu.nl/cbirsurvey/
Level of Content
• Level 1: primitive features: color, texture, shape, layout
• Level 2: objects, scenes: table, mountain
• Level 3: abstract concepts: dancing, democracy!
Level 1: Logo Retrieval
• Services: search, watch
• Vienna classification code: up to 30,000 hits
• Visual inspection:
– 3000 per hour in morning
– 2000 per hour in afternoon
• ⇒ automatic retrieval on the basis of shape and layout
Level 1: Logo Retrieval
Vienna Classification Code
“castle”:
category 7: constructions, structures for advertisement, gates or barriers
division 7.1: dwellings, buildings, advertisement hoardings or pillars, cages or kennels for animals
section 7.1.1: castles, fortresses, crenellated walls, palaces
Level 1: Logo Retrieval
Perceptual Grouping
Identify which shape elements belong together, for example on the basis of Gestalt principles:
Level 1: Logo Retrieval
Perceptual grouping
original logo:
alternative human segmentations:
Level 1: Logo Retrieval
Perceptual Matching
two geometrical partial matches:
confusingly similar
not confusingly similar
Level 2: scene classification
Snow Rock
Sky
Classify scenes on the basis of material semantics
Level 2: object classification
Classify animals on the basis of body plans
Level 3: Abstract Concepts
Relevance Feedback
changing the query and changing the similarity
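Both kinds of change can be sketched on plain feature vectors. The example below swaps in two well-known techniques that the slides do not name: a Rocchio-style update for changing the query, and inverse-variance feature weighting for changing the similarity. The parameter values and function names are assumptions.

```python
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equally long vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def update_query(query: Vector, relevant: Sequence[Vector],
                 non_relevant: Sequence[Vector],
                 alpha: float = 1.0, beta: float = 0.75,
                 gamma: float = 0.25) -> Vector:
    """Changing the query (Rocchio-style): move towards results marked
    relevant and away from results marked non-relevant."""
    new = [alpha * q for q in query]
    if relevant:
        new = [n + beta * r for n, r in zip(new, mean(relevant))]
    if non_relevant:
        new = [n - gamma * s for n, s in zip(new, mean(non_relevant))]
    return new


def update_weights(relevant: Sequence[Vector], eps: float = 1e-6) -> Vector:
    """Changing the similarity: features on which the relevant results agree
    (low variance) receive a high weight. Weights are normalized to sum to 1."""
    m = mean(relevant)
    var = [sum((v[i] - m[i]) ** 2 for v in relevant) / len(relevant)
           for i in range(len(m))]
    raw = [1.0 / (x + eps) for x in var]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_distance(a: Vector, b: Vector, weights: Vector) -> float:
    """Weighted Euclidean distance used for the next retrieval round."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5


# Hypothetical feedback round with two-dimensional features.
query = [0.5, 0.5]
relevant = [[0.9, 0.2], [0.9, 0.8]]
non_relevant = [[0.1, 0.5]]
new_query = update_query(query, relevant, non_relevant)
weights = update_weights(relevant)
print(new_query, weights)
```

In this toy round the relevant results agree on the first feature and disagree on the second, so the first feature ends up dominating the weighted distance.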
Enabling technologies
• Image and signal processing
• Computer graphics
• Pattern recognition
• Geometric algorithms
Place in the master's programme GMT (Game and Media Technology)
Where are we now?
1. How stupid a computer still is in seeing.
2. The language / picture barrier: learning to see.
3. The role of invariance and what features to use.
4. Multimedia integrated approach.
5. Features and similarity have to become perceptual.
6. Interaction needs are extreme.
7. Compute and algorithmic power needs for the big databases.
Scientific Future
1. Scalability
2. Multimodal (text, picture, speech)
3. Invariance and perception
4. Feedback and learning
5. Benchmarking
Market Opportunities
• Internet
• Digital photography (e.g. DV, Web-cams)
• Mobile services (GPRS/UMTS phones)
• Digital TV
Retrieval aspects:
Framework
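[Figure: retrieval framework. Images, music, video and 3D models pass through feature extraction; the features go to index building, which produces the index structure. In the UI, query formulation (browsing, direct query, query by example) yields an example, whose feature extraction gives the query features. Matching against the index structure returns document ids; fetching returns the result documents for visualization.]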