MediaView -- Towards a “Semantic” Multimedia Database Model
Qing Li
Dept of Computer Science
City University of Hong Kong
Outline
  Motivation & Introduction
  Modeling Constructs
  Logical Implementation
  Real-World Applications
  Conclusion
State-of-the-art
Multimedia Systems and Applications
  an explosive growth in recent years
  demand on managing multimedia using databases
Database techniques for multimedia data
  modeling
  indexing
  query processing
  presentation & synchronization
“Semantic Gap”
[Figure: semantics-intensive multimedia systems & applications require the semantic meaning of the data, whereas non-semantic multimedia data models model raw data and primitive properties (size, format, etc). The mismatch between the two is the semantic gap.]
Semantic modeling of multimedia -- Why hard?
Context-dependency
  Semantics is not a static and intrinsic property
  The semantics of an object often depends on:
    the application/user who manipulates the object
    the role that the object plays
    other objects in the same “context”
Example:
[Figure: an object shown in two contexts, “Van Gogh’s paintings” and “flower”]
Why hard? (cont.)
Modality-independency
  Media objects of different modalities may suggest similar or related semantic meanings.
Example:
Harry Potter has never been the star of a Quidditch team, scoring points while riding a broom far above the ground. He knows no spells, has never helped to hatch a dragon, and has never worn a cloak of invisibility.
Query:
Results:
image video text
MediaView – A “Semantic Bridge”
An object-oriented view mechanism that bridges the semantic gap between multimedia systems and databases
Core concept – media view (MV)
  a customized context for semantic interpretation of media objects (text docs, images, video, etc)
  media views collectively constitute the conceptual infrastructure of a multimedia system & application
Architecture
[Figure: architecture diagram with the labels External Schema, Conceptual Schema and Internal Schema; Multimedia Systems; MediaView Mechanism (mediaview 1, mediaview 2, ..., mediaview n); Object-oriented Database]
Basic Concepts
A media view MVi can be represented as a triple: MVi = <Mi, Pi, Ri>
where:
  Mi - a set of objects that are included in MVi as its members. Each object o ∈ Mi belongs to a certain source class, and different members of MVi may belong to different source classes.
  Pi - a set of properties (attributes and methods) applied either to MVi itself (Piv) or to all the members (Pim).
  Ri - a set of relationships, where each r ∈ Ri is of the form <oj, ok, t> and denotes a relationship of type t between members oj and ok in MVi; Ri itself may form a “graph”.
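The triple can be sketched as a small data structure. This is only an illustration, assuming Python dataclasses; the class and field names (`MediaObject`, `MediaView`, `view_props`, `member_props`) are hypothetical and not part of the MediaView system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MediaObject:
    oid: str            # object identity
    source_class: str   # e.g. "JPEGImage", "TextDocument"


@dataclass
class MediaView:
    name: str
    members: set = field(default_factory=set)          # Mi
    view_props: dict = field(default_factory=dict)     # Piv: properties of the view itself
    member_props: dict = field(default_factory=dict)   # Pim values, keyed by member oid
    relationships: set = field(default_factory=set)    # Ri: (oj, ok, t) triples

    def add_member(self, obj, **props):
        # Members may come from different source classes.
        self.members.add(obj)
        self.member_props[obj.oid] = dict(props)

    def relate(self, oj, ok, rel_type):
        # A relationship <oj, ok, t> is only allowed between members of this view.
        if oj not in self.members or ok not in self.members:
            raise ValueError("both objects must be members of the view")
        self.relationships.add((oj.oid, ok.oid, rel_type))
```

A view built this way can hold an image and a text document side by side, which is the heterogeneous membership that separates a media view from a class.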
Basic Concepts
An example…
[Figure: (a) base classes: MultimediaObject, TextDocument, AudioClip, VideoClip, Image, BitmapImage, JPEGImage, Song, Speech, connected by subclass edges and by keyframe and audiotrack links; (b) media views: Artworks, RealisticArtworks, ImpressionisticArtworks, Post-modernArtworks, ImpressionisticPaintings, Impressionistic Sculptures, connected by subview edges; (c) base schema; (d) view schema. Attributes shown: Name, Artist, Type, Style, Wavelet-Texture, Dominant-Shape, Color-Histogram.]
Basic Concepts
Semantics-based data reorganization via media views
text
audio
video
image
media view
Basic Concepts
Definition 5: The semantic graph (SG) is an undirected graph G = {V, E}, where V is a finite set of vertices and E is a finite set of edges. Each element Vi ∈ V corresponds to a multimedia object Oi in the database. E is a ternary relation defined on V×V×N. Each e = <Vi, Vj, n> ∈ E represents a semantic link of degree n between objects Oi and Oj, where n is the number of media views to which both objects belong. We define n as the correlation factor between Oi and Oj.
Basic Concepts
Definition 6: The correlation matrix M=[Mij] is an adjacency matrix of the semantic graph. Specifically, each element Mij contains the correlation factor between Oi and Oj, with all the diagonal elements set to be zero.
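Definitions 5 and 6 translate directly into a counting procedure. The sketch below assumes each media view is given as a set of object identifiers; the function name and the sample memberships are hypothetical and are not taken from the slides.

```python
from itertools import combinations


def correlation_matrix(objects, media_views):
    """Return M where M[i][j] counts the media views shared by objects i and j."""
    index = {o: i for i, o in enumerate(objects)}
    n = len(objects)
    matrix = [[0] * n for _ in range(n)]   # diagonal stays zero (Definition 6)
    for extent in media_views.values():
        # Every pair of co-members gains one unit of correlation.
        for a, b in combinations(sorted(extent), 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1              # undirected graph, so M is symmetric
    return matrix


# Hypothetical memberships, for illustration only.
objects = ["O1", "O2", "O3", "O4", "O5"]
views = {
    "gallery": {"O1", "O2", "O3", "O5"},
    "impressionistic": {"O1", "O2", "O4"},
}
M = correlation_matrix(objects, views)
```

Here `M[0][1]` is 2 because O1 and O2 share both views, while `M[2][3]` is 0 because O3 and O4 share none.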
Basic Concepts
Semantic Graph Model
[Figure: (a) semantic graph and (b) correlation matrix over five objects O1 to O5, grouped by the media views “van Gogh’s Gallery” and “Impressionistic Artworks”. Objects shown: “Sunflower” (by van Gogh), “Potato Eaters” (by van Gogh), a biography of van Gogh, another impressionistic artwork, and an audio guide.]
View Operators
A set of operators that take media views and view instances as operands.
Our intention is not to come up with a complete set of operators, but to focus on those that are indispensable in supporting queries and navigation over multimedia objects.
View Operators: type-level
V-overlap
  syntax: <boolean> := v-overlap(<media view1>, <media view2>)
  semantics: true, if and only if (∃o ∈ O)(o ∈ extent(<media view1>) and o ∈ extent(<media view2>))
Cross
  syntax: {<object>} := cross(<media view1>, <media view2>)
  semantics: {<object>} := {o ∈ O | o ∈ extent(<media view1>) and o ∈ extent(<media view2>)}
Sum
  syntax: {<object>} := sum(<media view1>, <media view2>)
  semantics: {<object>} := {o ∈ O | o ∈ extent(<media view1>) or o ∈ extent(<media view2>)}
Subtract
  syntax: {<object>} := subtract(<media view1>, <media view2>)
  semantics: {<object>} := {o ∈ O | o ∈ extent(<media view1>) and o ∉ extent(<media view2>)}
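The four type-level operators are ordinary set operations on view extents. A minimal sketch, assuming each extent is a Python set of object identifiers; the function names mirror the slide, with `mv_sum` renamed to avoid shadowing the built-in `sum`.

```python
def v_overlap(extent1, extent2):
    # True iff at least one object belongs to both views.
    return not extent1.isdisjoint(extent2)


def cross(extent1, extent2):
    # Objects that belong to both views.
    return extent1 & extent2


def mv_sum(extent1, extent2):
    # Objects that belong to either view.
    return extent1 | extent2


def subtract(extent1, extent2):
    # Objects in the first view that are not in the second.
    return extent1 - extent2
```

For example, with extents `{"O1", "O2", "O3"}` and `{"O2", "O4"}`, `cross` gives `{"O2"}` and `subtract` gives `{"O1", "O3"}`.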
View Operators: instance-level
Class
  syntax: <base class> := class(<view instance>)
  semantics: <view instance> is an instance of <base class>
Components
  syntax: {<object>} := components(<view instance>)
  semantics: {<object>} := {o ∈ O | o is a component (direct or indirect) of <view instance>}
I-overlap
  syntax: <boolean> := i-overlap(<view instance1>, <view instance2>)
  semantics: true, if and only if (∃o ∈ O)(o ∈ components(<view instance1>) and o ∈ components(<view instance2>))
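Because components may be direct or indirect, `components` is a transitive closure over the part-of structure. The sketch below assumes that structure is available as a dictionary from an instance to its direct components; this `parts` mapping and the sample identifiers are hypothetical.

```python
def components(instance, parts):
    """All direct and indirect components of an instance."""
    seen = set()
    stack = list(parts.get(instance, ()))
    while stack:
        obj = stack.pop()
        if obj not in seen:          # guards against cycles and repeats
            seen.add(obj)
            stack.extend(parts.get(obj, ()))
    return seen


def i_overlap(instance1, instance2, parts):
    """True iff the two instances share at least one component."""
    return not components(instance1, parts).isdisjoint(components(instance2, parts))


# Hypothetical part-of structure: a video with a keyframe and an audio track.
parts = {
    "video1": ["keyframe1", "audiotrack1"],
    "keyframe1": ["region1"],
    "slideshow1": ["keyframe1"],
}
```

With this data, `components("video1", parts)` includes `region1` even though it is only an indirect component.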
View Algebra
Functions -- derivation of new MVs from existing MVs
Heuristic Enumeration
  1. Blind enumeration
  2. Content-based enumeration
  3. Semantics-based enumeration
View Algebra
Algebra Operators
  select from src-MV where <predicate>
  project <property-list> from src-MV
  intersect (src-MV1, src-MV2)
  union (src-MV1, src-MV2)
  difference (src-MV1, src-MV2)
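The `select` and `project` operators can be sketched over an in-memory view. This assumes each member is a dictionary of property values, which is a simplification made for illustration; the sample members and property names are hypothetical.

```python
def select(src_mv, predicate):
    # select from src-MV where <predicate>: keep members that satisfy the predicate.
    return [member for member in src_mv if predicate(member)]


def project(src_mv, property_list):
    # project <property-list> from src-MV: keep only the named properties.
    return [{p: member[p] for p in property_list if p in member} for member in src_mv]


# Hypothetical members of an artworks view.
artworks = [
    {"name": "Sunflower", "artist": "van Gogh", "type": "image"},
    {"name": "Potato Eaters", "artist": "van Gogh", "type": "image"},
    {"name": "Audio guide", "type": "audio"},
]
images = select(artworks, lambda m: m["type"] == "image")
names = project(images, ["name"])
```

Both operators return a new list and leave the source view unchanged, so derived views can be chained.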
Comparison (vs. class)
                   | media view | object class
membership         | heterogeneous objects | uniform objects
member acquisition | dynamic inclusion/exclusion of existing objects of other classes | creating new objects
mapping            | one object can belong to multiple media views | one object has exactly one class
relationship       | inter-member semantic relationship | N/A
Comparison (vs. traditional object view)
                   | media view | object view
membership         | heterogeneous objects | uniform objects
relationship       | inter-member semantic relationship | N/A
member properties  | instance-level properties (user-defined) | inherited or derived properties (for view instances)
global properties  | MV-level properties (user-defined) | N/A
Logical Implementation
  MediaView Construction
  MediaView Customization
  MediaView Evolution
MediaViews Construction
Work with CBIR systems to acquire the knowledge from queries
  Learn from previously performed queries
  A multi-system approach to support multi-modality of media objects
Organize the semantics by following WordNet
Why WordNet?
  Different queries may vary greatly, given the liberty of choosing query keywords
  We need an approach to organize that knowledge into a logical structure
    A simple “context”: a concept in WordNet
    Common media views: correspond to simple contexts
    We provide all common media views, based on which users can build complex ones.
Navigating the Multimedia Database
Navigating via semantic relationships of WordNet
Semantic Relationship   | Examples
Synonymy (similar)      | pipe, tube
Antonymy (opposite)     | fast, slow
Hyponymy (subordinate)  | tree, plant
Meronymy (part)         | chimney, house
Troponymy (manner)      | march, walk
Entailment              | drive, ride
Navigating the Multimedia Database
[Figure: the user browses the multimedia database through MediaView 1 to MediaView 4, which are linked by semantic relationships in WordNet]
MediaViews Construction
[Figure: users issue queries to CBIR systems (video, image, text, ...) over the multimedia database and receive results; user feedback and system feedback are passed to the MediaView engine]
MediaView Customization
Two-level MediaView Framework
  Basic MediaView: Simple Context
  Customized MediaView: Advanced Context
MediaView Customization
Dynamically construct complex-context-based media views based on simple ones
  An example complex context: “the Grand Hall in City University”
Several user-level operators are devised to support more complex/advanced contexts, besides the basic operators
User-level Operators
  INHERIT_MV(N: mv-name, NS: set-of-mv-refs, VP: set-of-property-ref, MP: set-of-property-ref): mv-ref
  UNION_MV(N: mv-name, NS: set-of-mv-refs): mv-ref
  INTERSECTION_MV(N: mv-name, NS: set-of-mv-refs): mv-ref
  DIFFERENCE_MV(N1: mv-ref, N2: mv-ref): mv-ref
Build a MediaView in Run-time
Example: find out info about "Van Gogh"
  Who is "Van Gogh"?
  What is his work?
  Know more about his whole life.
  Know more about his country.
  See his famous painting "sunflower"
[Figure: Build MediaView. Legend: Multimedia Document; Media View 1; Text, Sound, Image, Video; Topic 1, Topic 2, Topic 3]
Build a MediaView in Run-time
Who is “Van Gogh”?
  Set vg = INHERIT_MV(“V. Gogh”, {<painter>}, name=“Van Gogh”, );
What is his work?
  Set vg_work = INTERSECTION_MV(“work”, {<painting>, vg});
Know more about his whole life.
  INTERSECTION_MV(“life”, {<biography>, vg});
Know more about his country.
  INTERSECTION_MV(“country”, {<country>, vg});
See his famous painting “sunflower”
  Set sunflower = INTERSECTION_MV(“sunflower”, {<sunflower>, <painting>});
  Set vg_sunflower = INTERSECTION_MV(“vg_sunflower”, {vg_work, sunflower});
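The set-based user-level operators can be sketched as methods on a small registry of named views. This is only an assumption about how they might behave: `MediaViewStore`, `define` and the sample extents are hypothetical, `INHERIT_MV` is left out because its semantics are not spelled out here, and the name that `difference_mv` gives its result is invented.

```python
from functools import reduce


class MediaViewStore:
    """Hypothetical registry mapping a media view name to its extent."""

    def __init__(self):
        self.views = {}

    def define(self, name, extent):
        self.views[name] = frozenset(extent)
        return name                      # the name doubles as the mv-ref

    def union_mv(self, name, refs):
        extents = [self.views[r] for r in refs]
        return self.define(name, reduce(frozenset.union, extents))

    def intersection_mv(self, name, refs):
        extents = [self.views[r] for r in refs]
        return self.define(name, reduce(frozenset.intersection, extents))

    def difference_mv(self, ref1, ref2):
        # DIFFERENCE_MV takes no name argument, so one is made up here.
        return self.define(f"{ref1}-minus-{ref2}", self.views[ref1] - self.views[ref2])


# Last step of the run-time example, with made-up extents.
store = MediaViewStore()
store.define("sunflower_topic", {"img1", "img2", "txt1"})
store.define("painting", {"img1", "img3"})
store.define("vg_work", {"img1", "img3"})
sunflower = store.intersection_mv("sunflower", ["sunflower_topic", "painting"])
vg_sunflower = store.intersection_mv("vg_sunflower", ["vg_work", sunflower])
```

Returning the view name as the reference lets the result of one operator be passed straight into the next, as in the two chained intersections.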
Authoring Scenario
  Creates a new media view named after the subject
    All multimedia materials used in the document would be put into this MediaView for further reference.
  To collect the most relevant materials for authoring, the user performs the MediaView building process.
    Import suitable media objects by browsing media views
    Reference the manner and style of authoring, to find other media views with similar topics.
  Drag & Drop
  “learning-from-references”
Interface of Our Authoring System
System Features
A Dynamic Environment
  Helps a user select materials from the database to incorporate into the document
  Query other similar media views for referencing the manner and/or style of authoring
Real-World Applications
A Multimedia Recipe Database
  Modeling basis
  Personalized (context-aware) manipulation
Cross-media indexing and retrieval system
  Novel way of annotating and retrieving media objects
  Lead to new indexing strategies
A Personalized Recipe Database System
People cannot live without food
Existing recipe websites provide huge numbers of recipes from throughout the world
  Fail to give support on analyzing and comparing recipes (What are important cooking principles & skills; what makes two dishes’ taste so different, etc.)
  Unable to help users find similar recipes in a comprehensive manner (only keyword-based search on recipe names)
  Fail to adapt recipes to meet the real-world situation (e.g. due to lack of ingredients or user preference)
A Personalized Recipe Database System -- Our Contributions
Propose a recipe model which encompasses static attributes as well as dynamic behaviours (e.g. cooking procedures and constraints)
Present a novel perspective of evaluating the “quality” of a recipe by constructing and analysing its cooking graph (capture both action flows and data/ingredient flows)
Provide a promising way to address the problem of recipe adaptation heuristically (with flexible and feasible solutions)
Recipe on the Web
Bang Bang Chicken
Category: Region --> Sichuan; Cooking Method --> Poached; Ingredient --> Chicken
Ingredients:
  Chicken Thighs 250 g
  Scallions 10 g
  Sesame Paste 2 tsp.
  Sugar 1 tsp.
  Soybean Sauce 2 tsp.
  Sesame Oil ½ tsp.
  ...
Steps
  1. Use chicken thighs and cut away skin and fat
  2. Poach the chicken. Drain and cool.
  3. Mix the sesame paste, sugar, soybean sauce and sesame oil
  4. Cub the chicken lightly till soft and shred. Put to a plate.
  5. Put shredded scallion around the chicken and pour the sauce over the chicken.
  ...
Other page elements: Step Illustration, Video Clip, Final Look, Users’ Rating and Comments
Sample Recipe -- The Cooking Procedure of “Triple Cheese Pasta Primavera”
Step number | Recipe cooking procedure in steps
1 | Dice bell peppers. Slice squash and mushrooms.
2 | Cook pasta according to package directions in unsalted water.
3 | Meanwhile, in a large skillet melt butter. Add bell peppers; cook and stir occasionally until barely tender, about three minutes.
4 | Add squash and mushrooms; cook and stir occasionally until barely tender, about four minutes.
5 | Drain pasta; toss with vegetables in skillet.
6 | In the saucepot in which the spaghetti was cooked, combine ricotta, mozzarella, milk, Parmesan, Italian seasoning, salt and black pepper. Over a medium-low heat cook and stir cheese mixture just until hot, about 1 minute.
7 | Add reserved pasta and vegetables; toss to coat; remove to a serving platter.
Sample Recipe
[Figure: parsing the cooking procedure of “Triple Cheese Pasta Primavera” at three levels. Recipe level: the recipe crawled from the web page. Composite level: the p steps in the web page (1: Dice bell peppers. Slice squash and mushrooms. 2: Cook pasta according to package directions in unsalted water. 3: Meanwhile, in a large skillet melt butter. ... 4: ...). Primitive level: the steps divided into n actions (action 1: dice, action 2: slice, action 3: cook, ..., action i: stir, ..., action n: remove).]
Recipe Model
A recipe R is modeled and represented by a tuple of three elements: R = <M, RP, SP>
where
(a) M = {Mi | i = 1..m} is a set of ingredients. An ingredient Mi is either a basic ingredient or a set of ingredients:
  Mi = <MID, MP>, where MID is a unique identity and MP is a set of member-level properties (and functions) such as the name, quantity and image
  An ingredient Mi belongs to one of three classes: Main, Minor and Seasoning;
(b) RP is a set of recipe-level properties (and functions) applied on R itself, such as the main cooking style, region, nutrition and images of the dish of the recipe;
Recipe Model
(c) SP = (V, E, Cons, Ingr) is a labeled directed “Cooking Graph”.
  V = {vi | i = 1..n} is a set of nodes; each vi is a cooking action.
    “cooking action constraints”: Cons(vi) are the associated constraint conditions that should be satisfied when the action of vi takes place, e.g. conditions on temperature and duration.
  E is a set of directed edges on V giving the temporal execution flow of the cooking actions, named “action flows”. An edge <vi, vj> means that vj should take place after vi.
    “cooking transition constraints”: Cons(vi, vj) are the conditions that should be satisfied for the flow to take place.
  Ingr(vi) are the ingredients that should be added into vi.
  O(vi) are the output ingredients of vi.
  These inputs and outputs for the nodes are called “ingredient flows”.
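The recipe tuple R = <M, RP, SP> maps naturally onto a few record types. The sketch below is one possible encoding, not the prototype's actual schema: the class names, the node identifiers, the free-text constraints and the classification of butter as a Minor ingredient are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Ingredient:
    mid: str             # MID: unique identity
    name: str            # member-level property
    kind: str            # "Main", "Minor" or "Seasoning"
    quantity: str = ""   # member-level property


@dataclass
class CookingGraph:
    actions: dict = field(default_factory=dict)   # V: node id -> cooking action
    flows: set = field(default_factory=set)       # E: (vi, vj) action flows
    cons: dict = field(default_factory=dict)      # Cons(vi) and Cons(vi, vj)
    ingr: dict = field(default_factory=dict)      # Ingr(vi): ingredient ids added at vi

    def add_action(self, vid, action, ingredients=(), cons=None):
        self.actions[vid] = action
        self.ingr[vid] = list(ingredients)
        if cons:
            self.cons[vid] = cons

    def add_flow(self, vi, vj, cons=None):
        # An edge <vi, vj> means vj takes place after vi.
        if vi not in self.actions or vj not in self.actions:
            raise KeyError("both actions must exist before adding a flow")
        self.flows.add((vi, vj))
        if cons:
            self.cons[(vi, vj)] = cons


@dataclass
class Recipe:
    ingredients: dict    # M, keyed by MID
    props: dict          # RP: recipe-level properties
    graph: CookingGraph  # SP


# A fragment of step 3 of the sample recipe, with made-up node ids.
butter = Ingredient("M1", "butter", "Minor")
peppers = Ingredient("M2", "bell peppers", "Main")
graph = CookingGraph()
graph.add_action("v1", "melt", ["M1"])
graph.add_action("v2", "cook", ["M2"], cons="until barely tender, about three minutes")
graph.add_flow("v1", "v2")
recipe = Recipe({"M1": butter, "M2": peppers}, {"name": "Triple Cheese Pasta Primavera"}, graph)
```

Keeping node and edge constraints in one `cons` dictionary, keyed by either a node id or an edge pair, mirrors the Cons(vi) and Cons(vi, vj) notation.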
Cooking Graph
[Figure: the cooking graph of “Triple Cheese Pasta Primavera”, SP = (V, E, Cons, Ingr), running from a Start Node to an End Node. Action nodes v1 to v17 carry actions such as dice, slice, melt, add, cook, stir, drain, toss, combine and remove. Ingredient nodes M1 to M12 include bell peppers, squash, mushrooms, pasta, butter, ricotta, mozzarella, milk, Parmesan, Italian seasoning, salt and black pepper. Control structures shown: Sequential, Fork, Join, Loop. Constraints shown: Cons(v3), Cons(v4), Cons(v12), Cons(v13), Cons(v7,v6), Cons(v7,v8), Cons(v10,v9), Cons(v10,v12). Legend: action node (V), ingredient M (Ingr), action flow (E), ingredient flow, constraint Cons( ).]
Basic Properties
Definition 1. (Reachability) A cooking graph is defined as “reachable” if each of its nodes is “reachable”; a node is “reachable” if it is on a directed path from a starting node to the end node.
Definition 2. (Consistency) A cooking graph is defined to be “consistent” if the conditions for each node/edge are consistent (i.e. there exists an assignment of variables that makes the conditions true).
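Definition 1 can be checked with two graph traversals: a node lies on a directed path from a start node to the end node exactly when it can be reached forward from a start node and the end node can be reached from it. A minimal sketch, assuming the action flows are given as (vi, vj) pairs; the function names are hypothetical.

```python
def reachable_nodes(edges, starts, end):
    """Nodes lying on some directed path from a start node to the end node."""
    succ, pred = {}, {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        pred.setdefault(v, set()).add(u)

    def closure(seeds, adjacency):
        seen = set(seeds)
        stack = list(seeds)
        while stack:
            node = stack.pop()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    forward = closure(starts, succ)    # reachable from a start node
    backward = closure([end], pred)    # able to reach the end node
    return forward & backward


def is_reachable_graph(nodes, edges, starts, end):
    # The graph is "reachable" if every node is reachable.
    return set(nodes) <= reachable_nodes(edges, starts, end)
```

An action that is started but never feeds into the end node, for instance after a user deletes a later step, fails the backward test and makes the adapted graph infeasible.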
Constraints and Rules
Definition 3. (Constraint) A constraint is a predicate followed by one or more terms, enclosed in parentheses and separated by commas; a term is either a constant, variable or function expression.
  Constraints specify all kinds of conditions or restrictions in the recipe model;
  Three categories: intra-recipe constraints, inter-recipe constraints and outer-recipe constraints.
  Incompatible(Spinach, Tofu) says spinach and tofu are incompatible and should not be cooked together.
Constraints and Rules
Definition 4. (Rule) A rule is a logical implication of the form “If Ф Then Ψ” (or, Ф → Ψ), where Ф and Ψ are sentences.
  Validate the correctness of a recipe through a reasoning and recognition process.
  Handle complex situations, such as making the necessary adjustment or compensation once an improper cooking action occurs.
  Describe cooking skills that have been widely accepted and commonly used.
  Over_Put(salt) → Add(vinegar|water) says that if too much salt has been put into a dish, then neutralize the salty taste by adding either vinegar or water.
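The two examples, Incompatible(Spinach, Tofu) and Over_Put(salt) → Add(vinegar|water), can be encoded as plain data and checked with lookups. This is a deliberately simple sketch rather than a reasoning engine; the table and function names are hypothetical.

```python
# Constraint: pairs of ingredients that should not be cooked together.
INCOMPATIBLE = {frozenset({"spinach", "tofu"})}

# Rule: a fact (predicate, term) implies an action with alternative terms.
RULES = {("Over_Put", "salt"): ("Add", ("vinegar", "water"))}


def violated_constraints(ingredients):
    """Return the incompatible pairs fully contained in the ingredient list."""
    present = {name.lower() for name in ingredients}
    return [pair for pair in INCOMPATIBLE if pair <= present]


def apply_rules(facts):
    """Return the consequent of every rule whose antecedent is among the facts."""
    return [RULES[fact] for fact in facts if fact in RULES]
```

Calling `apply_rules([("Over_Put", "salt")])` yields the suggestion to add either vinegar or water, matching the rule on the slide.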
Recipe Cooking Graph Mining
Pattern — Some subgraphs occur in one or more cooking graphs and they have certain influence on the cooking effects (e.g. taste, appearance).
Find patterns for a set of recipes
  What’s usually done and what’s usually put in the cooking procedure (one action, a series of actions, an ingredient, a set of ingredients, actions combined with ingredients)
Cooking graphs of different recipes may share the same pattern
Distinct subgraphs that determine the cooking effect (e.g. taste) should be identified
Sample Patterns
[Figure: four sample patterns. Marinating: main ingredient(s) (e.g. pork, chicken) and seasoning ingredient(s) (e.g. salt, sauce, garlic, scallion) feed a marinate action. Coating: main ingredient(s) (e.g. pork, chicken) and seasoning ingredient(s) (e.g. starch, water, egg) feed a coat action. Passing Oil: heat oil, add ingredient(s), fry/stir-fry/deep-fry, remove from oil. Blanching: boil water (boiling/cold), add ingredient(s), simmer briefly, remove.]
Sample Cooking Style
Cooking Style    | Pattern with Dominating Action
Soft Deep-frying | Coating + Passing Oil + deep-fry
Dry Deep-frying  | Marinating + Coating + deep-fry
Cooked-frying    | Passing Oil/Blanching/Steaming + stir-fry (+ Thickening)
Slip-frying      | (Marinating + Coating) + Passing Oil + stir-fry + Thickening
Soft Stirring    | Blanching/Steaming + stir + Thickening
Braising         | Passing Oil/Blanching/Steaming + simmer in sauce (+ Thickening)
Simmering        | Blanching + simmer in water/broth
Generally describe how a recipe is cooked in a Pattern Combination or in Graph Abstraction.
User Adaptation
Usually a user wants to make a dish that has the same cooking result (e.g. taste, appearance) as the recipe exhibits.
Unfortunately, the user is very likely to get a slightly or even totally different dish as he/she modifies the cooking procedure.
  Objective reasons: e.g. lack of some ingredients
  Subjective reasons: e.g. wrong cooking actions through carelessness or personal preference
User Adaptation
  When the user makes an adaptation, the system will check if the modified cooking graph is feasible.
  If not, a set of feasible templates is provided.
  The remaining subgraph is replaced by the user-selected one.
[Figure: Template Selection and Instantiation. What was originally one recipe is split into an adapted subgraph and the remaining original subgraph; a property check (Reachability, Consistency) is applied; templates are offered for user selection; the user-selected template is instantiated with substantial ingredients & constraints.]
Prototype System
[Figure: Global System vs. User Space. The global area holds conventional recipes in structure; the user area (users Linda, Tom, Mary) holds adopted & adapted recipes in user-organized structure, with Import and Export between the two. Example actions: export a recipe “Steamed Chicken”; search “Spicy Bean Curd”, “West Lake Fish”, “…”; comment on a recipe “Carp Soup”; add a favorite recipe “Stir-Fried Prawns”; try a popular recipe “Eight Precious Rice”; prepare a party menu.]
Prototype System – Recipe Browser
Prototype System – Cooking Pattern Miner
[Screenshot: interface elements include Select Recipe, Select Cooking Style, Name of Recipe, Cooking Graph of Selected Recipe, Show All Patterns in Cooking Graph, Revert Recipe, Find Common Patterns for Recipes of Selected Cooking Style, Recipe List Containing Selected Pattern, Common Pattern List, Selected Cooking Pattern]
Prototype System – Similarity Calculator
[Screenshot: interface elements include Recipe 1, Recipe 2, Name of Recipe 1, Similarity Ranking List, Cooking Graph of Recipe 1, Cooking Graph of Recipe 2, Revert Recipe 1, Revert Recipe 2, Find Common Subgraphs for Recipes 1 & 2, Apply Selected Subgraph to Recipe 1 & 2, Graph Similarity, Common Subgraph List, Selected Common Subgraph]
Summary
  Proposed a data model to represent a recipe
  Advocated cooking graph mining to find frequently used patterns (actions, ingredients)
  Attempted to solve the recipe adaptation problem by using patterns as templates
  Developed a prototype system, RecipeView
  Further work includes:
    Discover patterns of cooking graphs
    Refine and strengthen the algorithm of recipe adaptation
Application Scenario
[Figure: retrieval loop with the stages Seeds, Candidates and Results, connected to the Users by the actions Designate, Discover, Refine, Present and Feedback (adjust)]
Application Scenario
Advantages (vs. traditional retrieval techniques)
Easy-to-compose query
  By browsing (to get “seed” objects of arbitrary modalities)
  By subject (simply keyword) at various abstraction levels
Multi-modal results
  a collection of images, text docs, videos, etc vs. a single type of media
Semantically relevant results
  natural outcome of exploring previously learnt knowledge vs. a set of specifically chosen features
Advantages (cont’d)
“Hill-climbing” Effect – retrieval performance grows as more user interactions are conducted
[Figure: cycle linking user interactions, learning, materialized knowledge, exploration and the retrieval process, with an arrow labelled “encourage”]
Conclusion
MediaView – a semantic multimedia database modeling mechanism to bridge the semantic gap between conventional database and semantics-intensive multimedia applications
A set of user-level operators to accommodate the specialization/generalization relationships among the media views
Conclusion
MediaView promises more effective access to the content of media databases
  Users could get the right stuff and tailor it to the context of their application easily.
Providing the most relevant content from pre-learnt semantic links between media and context
  high performance database browsing and multimedia authoring tools can enable more comprehensive applications to the user
Conclusion
Users could customize specific media view according to their tasks, by using user-level operators
The effectiveness of using MediaView in the experimental problem domains
  Multimedia recipe database
  Cross-media indexing and retrieval
Further Issues
The development and transition of MediaView to a fully-fledged multimedia database system supporting “declarative” queries
Intensive and extensive performance studies
Advanced semantic relations (e.g. temporal and spatial ones) can also be incorporated in combining individual media views