The Future of Multimedia Databases and Multimedia Data Mining
Prof. Dr. Thomas Seidl, Data Management and Data Exploration, RWTH Aachen University, Germany
MMKM 2008, Milton Keynes, Feb. 14th, 2008


Page 1: The Future of Multimedia Databases and Multimedia Data Mining

Data Management and Data Exploration, Prof. Dr. Thomas Seidl

Thomas Seidl, RWTH Aachen University, Germany

MMKM 2008, Milton Keynes, Feb. 14th, 2008

Page 2: Overview

• Content-based similarity search
  – Complex models: quadratic forms, Earth Mover's Distance
  – Efficient algorithms: approximations and indexing

• New interaction models (change of use)
  – Incremental search, relevance feedback, anytime querying

• From retrieval to new data mining tasks
  – Subspace clustering, outlier detection, stream data mining


Page 4: Content-based Similarity Search

Retrieval task: Which images in the archive are similar to the example?

[Figure: example image and hits numbered 1 to 4]

Justification for hits: Similar color frequencies

Page 5: Example: Mosaic Poster

Image: Hendrik Brixius

Page 6: Result: Mosaic made from 2,500 Images


Page 8: Description of Image Contents

• Categories: colors, textures, shapes, etc.
• Description by histograms

[Figure: table with columns Image, Color histogram, Texture histogram, …; rows for images #123673, #543643, #363273, …]

• Specification of similarity by formal models
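
The "description by histograms" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `color_histogram`, the uniform quantization into 4 levels per RGB channel (64 bins), and the toy pixel list are not taken from the slides.

```python
from collections import Counter

def color_histogram(pixels, levels=4):
    """Return a normalized color histogram for a list of (r, g, b) pixels.

    Each channel value (0..255) is quantized into `levels` buckets, so the
    histogram has levels**3 bins. The bin layout is an illustrative choice.
    """
    if not pixels:
        raise ValueError("need at least one pixel")
    counts = Counter()
    for r, g, b in pixels:
        # Quantize each channel, then flatten the triple into one bin index.
        rq, gq, bq = (v * levels // 256 for v in (r, g, b))
        counts[(rq * levels + gq) * levels + bq] += 1
    total = len(pixels)
    return [counts[i] / total for i in range(levels ** 3)]

# Toy "image" (assumed data): three reddish pixels and one blue pixel.
pixels = [(250, 10, 10), (240, 20, 5), (255, 0, 0), (0, 0, 255)]
hist = color_histogram(pixels)
```

Two images can then be compared through their histograms with one of the distance functions discussed on the following slide.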

Page 9: Adaptable Similarity Models

• Euclidean Distance
  Neglects cross-bin similarities

• Quadratic Forms
  Linear Algebra

\[
d_A(p,q) = \sqrt{(p-q) \cdot A \cdot (p-q)^t}
\]

• Earth Mover's Distance
  Linear Programming

\[
EMD_C(p,q) = \min \left\{ \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{c_{ij}}{m} f_{ij} \;\middle|\; \forall i\,\forall j: f_{ij} \ge 0,\ \forall i: \sum_{j=1}^{n} f_{ij} = p_i,\ \forall j: \sum_{i=1}^{n} f_{ij} = q_j \right\}
\]
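
To make the difference between the three models concrete, here is a small Python sketch. Several things are assumptions for illustration only: the similarity matrix `A` with entries 1 - |i - j| / 2, the three-bin toy histograms, and the helper `emd_1d`. The latter is not a general EMD solver: it uses the closed form (sum of absolute prefix-sum differences) that holds only for equal-mass histograms whose bins lie on a line with ground distance |i - j| and m = 1; the general case needs a linear programming solver.

```python
import math

def euclidean(p, q):
    """Bin-by-bin Euclidean distance; ignores how similar two bins are."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def quadratic_form(p, q, A):
    """d_A(p, q) = sqrt((p - q) * A * (p - q)^t) for a similarity matrix A."""
    diff = [a - b for a, b in zip(p, q)]
    n = len(diff)
    total = sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(total, 0.0))  # guard against tiny negative rounding

def emd_1d(p, q):
    """EMD special case: bins on a line, cost |i - j|, equal total mass."""
    work, carry = 0.0, 0.0
    for a, b in zip(p, q):
        carry += a - b       # mass that still has to cross to the next bin
        work += abs(carry)
    return work

# Assumed toy data: all mass in bin 0, bin 1, or bin 2.
q = [1.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0]
far = [0.0, 0.0, 1.0]
A = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
```

With these inputs the Euclidean distance cannot tell `near` from `far`, while the quadratic form and the EMD both rank `near` as more similar to `q`, which is the cross-bin effect the slide refers to.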

Page 10: Mosaic Videos Require Faster Retrieval …

Demo: AttentionAttractor [Assent, Krieger, Seidl: ICDE 2007]

Page 11: Acceleration by Filter-Refine Architectures

Usage of filters
– Filter step for fast pruning
– Refinement step for exact result
[GEMINI: Faloutsos 1996; KNOP: Seidl & Kriegel 1998]

Quality of filters: ICES [Assent, Wenning, Seidl: ICDE 2006]
– Index-enabled: Filter is supported by index structures
– Complete: Filter does not miss qualifying objects
– Efficient: Filter distance is calculated efficiently
– Selective: Filter yields only small candidate set
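
The filter-refine idea can be sketched for a range query as follows. The function name, the toy 2D points, and the choice of filter (the difference in the first coordinate, which never exceeds the Euclidean distance) are assumptions made for illustration; they are not the filters used in the cited papers.

```python
import math

def filter_refine_range_query(db, query, eps, lower_bound, exact_distance):
    """Two-step range query: cheap filter first, exact distance second.

    `lower_bound` must never exceed `exact_distance` (completeness), so
    dropping objects with lower_bound > eps cannot lose a true result.
    """
    # Filter step: fast pruning with the cheap lower-bounding distance.
    candidates = [o for o in db if lower_bound(o, query) <= eps]
    # Refinement step: exact distance only for the surviving candidates.
    results = [o for o in candidates if exact_distance(o, query) <= eps]
    return results, len(candidates)

def first_coordinate_gap(o, q):
    """Assumed toy filter: |o[0] - q[0]| lower-bounds the Euclidean distance."""
    return abs(o[0] - q[0])

def euclid(o, q):
    return math.dist(o, q)

db = [(0, 0), (1, 1), (5, 0), (0, 5), (2, 0)]
results, n_candidates = filter_refine_range_query(
    db, (0, 0), 2.0, first_coordinate_gap, euclid)
```

Here the filter passes four of the five points and the refinement step discards one false candidate; a more selective filter would pass fewer candidates, which is the "S" in ICES.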

Page 12: A Filter for the EMD

Instead of

\[
EMD_C(x,y) = \min \left\{ \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{c_{ij}}{m} f_{ij} \;\middle|\; \forall i\,\forall j: f_{ij} \ge 0,\ \forall i: \sum_{j=1}^{n} f_{ij} = x_i,\ \forall j: \sum_{i=1}^{n} f_{ij} = y_j \right\}
\]

use the lower bound obtained by constraint relaxation (EMD_C ≥ LB_IM, because the feasible set of the EMD ⊆ the feasible set of LB_IM):

\[
LB_{IM}(x,y) = \min \left\{ \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{c_{ij}}{m} f_{ij} \;\middle|\; \forall i\,\forall j: f_{ij} \ge 0,\ \forall i: \sum_{j=1}^{n} f_{ij} = x_i,\ \forall i\,\forall j: f_{ij} \le y_j \right\}
= \sum_{i=1}^{n} \min \left\{ \sum_{j=1}^{n} \frac{c_{ij}}{m} f_{ij} \;\middle|\; \forall j: f_{ij} \ge 0,\ \sum_{j=1}^{n} f_{ij} = x_i,\ \forall j: f_{ij} \le y_j \right\}
\]

ICES rating: I +++, C +++, E +++, S +++

Filter: [Assent, Wenning, Seidl: ICDE 2006]
Index: [Assent, Wichterich, Meisen, Seidl: ICDE 2008]
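
Because the relaxed problem decomposes into one small minimization per source bin i, each of them can be solved greedily: send the mass x_i to the cheapest target bins first, with at most y_j going to bin j. The sketch below follows that reading of the formula; the function name `lb_im`, the default m = 1, and the toy histograms are assumptions, and equal total mass of x and y is assumed as well.

```python
def lb_im(x, y, cost, m=1.0):
    """Independent-minimization lower bound of the EMD (illustrative sketch).

    For every source bin i, the mass x[i] is sent to the cheapest target
    bins first, with at most y[j] units going to bin j. Capacities are NOT
    shared between different source bins; that is the relaxation.
    """
    total = 0.0
    for i, mass in enumerate(x):
        remaining = mass
        for j in sorted(range(len(y)), key=lambda j: cost[i][j]):
            if remaining <= 0:
                break
            flow = min(remaining, y[j])
            total += cost[i][j] / m * flow
            remaining -= flow
    return total

# Assumed toy data: three bins on a line with ground distance |i - j|.
x = [0.5, 0.5, 0.0]
y = [0.0, 0.5, 0.5]
cost = [[abs(i - j) for j in range(3)] for i in range(3)]
bound = lb_im(x, y, cost)
```

For this pair the bound is 0.5, while the exact EMD is 1.0 (two portions of 0.5 each have to move by one bin), so the filter distance stays below the exact distance as completeness requires.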

Page 13: Scalability w.r.t. Database Size

[Figure: Selectivity plot (series LB_Max, LB_Man, LB_Avg, LB_IM) and Response Time plot (series LB_Max, LB_Man, LB_IM, LB_Avg 3D, LB_Man+LB_IM 3D, LB_Avg+LB_IM 3D), both over database sizes from 25k to 200k]

Databases with 25,000 to 200,000 images, 64d color histograms, 10NN, log. scales

Page 14: Scalability w.r.t. Dimensionality

[Figure: Selectivity plot (series LB_Man, LB_Avg, LB_IM) and Response Time plot (series EMD, LB_Man, LB_IM, LB_Avg 3D, LB_Man+LB_IM 3D, LB_Avg+LB_IM 3D), both over dimensionalities 16 to 64]

Database with 200,000 images, color histograms of dimensionalities 16 to 64, 10NN, log. scales


Page 16: Change of Interaction Models

• Incremental Retrieval
  – From ranges via nearest neighbors to incremental retrieval

• Relevance Feedback
  – From weights via covariances to ground distances

• Anytime Search
  – From blind waiting via progress estimation to progress monitoring

Page 17: Incremental Nearest Neighbor Search

Range Queries
\[
\{ o \in DB \mid d(o, q) \le \varepsilon \}
\]
Problem: no results or too many results

k-nn Queries
k-nearest neighbors: \( r_q(\Im_k) \)

Ranking
\[
r_q : \Im \to DB, \qquad i_1 \le i_2 \Rightarrow d(r_q(i_1), q) \le d(r_q(i_2), q)
\]

Incremental Search: "Give me more"
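
The "give me more" interaction maps naturally onto a Python generator that hands out objects in ascending distance. This is a minimal sketch with assumed names and toy data; it computes all distances up front and only defers the ordering, whereas a database system would traverse an index to avoid touching every object.

```python
import heapq
from itertools import islice

def ranking(db, query, dist):
    """Yield (distance, object) pairs in ascending order of distance."""
    # The position index breaks ties so that objects are never compared.
    heap = [(dist(o, query), idx, o) for idx, o in enumerate(db)]
    heapq.heapify(heap)
    while heap:
        d, _, o = heapq.heappop(heap)
        yield d, o

def abs_diff(a, b):
    return abs(a - b)

db = [7, 1, 4, 10, 3]               # assumed toy database of numbers
stream = ranking(db, 5, abs_diff)
first_two = list(islice(stream, 2))  # a 2-nn query is a prefix of the ranking
one_more = next(stream)              # "give me more" without restarting
```

A k-nn query is then simply the first k elements of the ranking, and the user does not have to choose k or a range ε in advance.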

Page 18: Relevance Feedback: MindReader

Regard cross-bin similarities (covariance matrix)

Y. Ishikawa, R. Subramanya, C. Faloutsos: MindReader: Querying Databases Through Multiple Examples. Proc. 24th Int. Conf. on Very Large Data Bases, 1998.

Page 19: Relevance Feedback: History

Incorporate history of feedback
Exponential aging of former feedback influence
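
One way to read "exponential aging" is that feedback given t rounds ago enters the current query with weight decay^t. The slide does not say how the history is actually aggregated, so the following is only a sketch under that assumption; the function name, the decay factor 0.5, and the representation of each round by a centroid of the relevant examples are all hypothetical.

```python
def aged_query_point(feedback_rounds, decay=0.5):
    """Weighted mean of per-round feedback centroids (oldest round first).

    A round that lies t steps in the past gets weight decay**t, so older
    feedback fades out exponentially but is never dropped abruptly.
    """
    newest = len(feedback_rounds) - 1
    dim = len(feedback_rounds[0])
    weighted = [0.0] * dim
    weight_sum = 0.0
    for t, centroid in enumerate(feedback_rounds):
        w = decay ** (newest - t)
        weight_sum += w
        for k in range(dim):
            weighted[k] += w * centroid[k]
    return [v / weight_sum for v in weighted]

# Assumed toy history: three feedback rounds in a 2D feature space.
rounds = [[0.0, 0.0], [4.0, 0.0], [4.0, 4.0]]
point = aged_query_point(rounds)
```

With decay = 0 only the latest round counts, and with decay = 1 all rounds count equally; values in between trade responsiveness against stability.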

Page 20: Relevance Feedback: Foresight

Two Phases
1. Approach fast to relevant area
2. Adjust similarity model at destination area

Page 21: Relevance Feedback: Earth Mover's Distance

From weights via covariances (QF) to ground distances (EMD)

Page 22: Anytime Data Mining (example: classification)

[Figure legend: Algor. 1, Algor. 2, Algor. 3, Algor. 4]

• No predefined budget, but query can be interrupted at any time
• Expectation: increasing classification accuracy
  – The later the interruption, the better the classification accuracy should be

Yang, Y., G.I. Webb, K. Korb, and K-M. Ting (2007). Classifying under Computational Resource Constraints: Anytime Classification Using Probabilistic Estimators. Machine Learning 69(1). Netherlands: Springer, pp. 35-53.
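
The anytime contract (a usable answer at every moment, refined as long as time allows) can be sketched with a generator. Note that this toy example uses a nearest-neighbor scan, not the probabilistic estimators of the cited paper; all names, the training data, and the `budget` parameter that simulates the interruption are assumptions.

```python
def anytime_nn_classifier(training, query, dist):
    """Yield the best label found so far after each inspected training object."""
    best_dist, best_label = float("inf"), None
    for features, label in training:
        d = dist(features, query)
        if d < best_dist:
            best_dist, best_label = d, label
        yield best_label  # an answer is available after every single step

def classify_with_budget(training, query, dist, budget):
    """Simulate an interruption after `budget` refinement steps."""
    label = None
    for steps, label in enumerate(
            anytime_nn_classifier(training, query, dist), start=1):
        if steps >= budget:
            break
    return label

def abs_diff(a, b):
    return abs(a[0] - b[0])

# Assumed toy training set: (feature vector, class label).
training = [((0.0,), "cat"), ((9.0,), "dog"), ((6.0,), "dog")]
early = classify_with_budget(training, (7.0,), abs_diff, budget=1)
late = classify_with_budget(training, (7.0,), abs_diff, budget=3)
```

An early interruption returns the label of the only object seen so far, while a later one has had the chance to find closer neighbors.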


Page 24: From Information Retrieval to Data Mining

• Multimedia Information Retrieval
  – Driven by queries ("query by example")
  – Adaptable models for content-based similarity (QF, EMD)
  – Interactive usage: incremental search, relevance feedback

• Multimedia Data Mining
  – No query objects (Which to submit? Does browsing help?)
  – Reveal patterns which are hidden in vast amounts of data: regularities, irregularities
  – Typical tasks: clustering, mining association rules, aggregation / generalization of data

Page 25: Subspace Clustering

• Cluster structures are hidden by noisy dimensions
• Separate relevant and irrelevant dimensions locally
• Identify clusters and their respective subspaces
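
The effect of noisy dimensions can be shown with a simple grid-based density count per subspace. This is a generic illustration and not one of the algorithms from the publication list; the function name, cell width, density threshold, and the toy points are assumptions.

```python
from collections import Counter
from itertools import combinations

def dense_cells(points, dims, width, min_points):
    """Grid cells of the subspace `dims` that hold at least `min_points` points."""
    counts = Counter(tuple(int(p[d] // width) for d in dims) for p in points)
    return {cell: n for cell, n in counts.items() if n >= min_points}

# Assumed toy data: four points agree in dimension 0, dimension 1 is noise.
points = [(1.1, 0.5), (1.2, 3.7), (1.3, 7.9), (1.4, 5.2), (8.0, 2.1), (4.5, 9.3)]

# Enumerate all subspaces and report where dense regions exist.
found = {
    dims: dense_cells(points, dims, width=1.0, min_points=3)
    for size in (1, 2)
    for dims in combinations(range(2), size)
}
```

The cluster is visible only in the subspace consisting of dimension 0; in the full space the noisy second dimension spreads the same points over different cells.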

Page 26: Subspace Outlier Detection

• New challenge: how to define outliers w.r.t. subspace clusters?
• Definition: clusters = dense areas, outliers = singularities
• Question: Outliers = noise … or outliers = objects of interest?
• Task: Outlier detection = complementary task to clustering
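
Reading "outliers = singularities" as "objects with hardly any neighbors in a given subspace" leads to the following sketch. It is only one possible formalization and is not taken from the slides or the cited papers; the function name, the radius, the neighbor threshold, and the toy points are assumptions.

```python
def subspace_outliers(points, dims, radius, min_neighbors):
    """Points with fewer than `min_neighbors` other points within `radius`,
    where closeness is measured only in the dimensions listed in `dims`."""
    def close(p, q):
        return all(abs(p[d] - q[d]) <= radius for d in dims)

    outliers = []
    for i, p in enumerate(points):
        neighbors = sum(1 for j, q in enumerate(points) if j != i and close(p, q))
        if neighbors < min_neighbors:
            outliers.append(p)
    return outliers

# Assumed toy data: a dense group in dimension 0 plus two isolated objects.
points = [(1.1, 0.5), (1.2, 3.7), (1.3, 7.9), (1.4, 5.2), (8.0, 2.1), (4.5, 9.3)]
in_dim0 = subspace_outliers(points, (0,), radius=0.5, min_neighbors=2)
in_full_space = subspace_outliers(points, (0, 1), radius=0.5, min_neighbors=2)
```

In dimension 0 only the two isolated objects are flagged, whereas in the full space every object looks like an outlier, which illustrates why the outlier definition has to refer to subspaces.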

Page 27: Stream Aggregation and Generalization

[Figure: data sources feeding the aggregation steps aggr1, aggr2, aggr3 (Data Aggregation), followed by Data Integration]

Limitations: battery power; bandwidth; monitoring attention
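
Aggregating close to the data source reduces what has to be transmitted and monitored. A minimal sketch, assuming fixed-size windows and a (count, min, max, mean) summary; the function name, the window size, and the sensor readings are hypothetical.

```python
def aggregate_stream(readings, window):
    """Yield one (count, min, max, mean) summary per `window` readings."""
    buffer = []
    for value in readings:
        buffer.append(value)
        if len(buffer) == window:
            yield len(buffer), min(buffer), max(buffer), sum(buffer) / len(buffer)
            buffer = []
    if buffer:  # flush the last, possibly incomplete window
        yield len(buffer), min(buffer), max(buffer), sum(buffer) / len(buffer)

# Assumed toy sensor readings.
readings = [20, 22, 21, 30, 31, 29, 25]
summaries = list(aggregate_stream(readings, window=3))
```

Seven readings are reduced to three summaries here; larger windows save more bandwidth and battery power at the cost of coarser information.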

Page 28: Conclusion

• Content-based similarity search
  – Complex models: quadratic forms, Earth Mover's Distance
  – Efficient algorithms: approximations and indexing

• New interaction models (change of use)
  – Incremental search, relevance feedback, anytime querying

• From retrieval to new data mining tasks
  – Subspace clustering, outlier detection, stream data mining

Page 29: Selected Publications

• Assent I., Krieger R., Glavic B., Seidl T.: Clustering of Multidimensional Spatial Sequences. In: Int. Journal on Knowledge and Information Systems (KAIS), 2008.
• Seidl T.: Nearest Neighbor Classification. In: Liu L., Özsu M. T. (Eds.): Encyclopedia of Database Systems. Springer, 2008. (to appear)
• Wichterich M., Assent I., Kranen P., Seidl T.: Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction. Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD), Vancouver, Canada, 2008.
• Assent I., Wichterich M., Meisen T., Seidl T.: Efficient similarity search using the Earth Mover's Distance for large multimedia databases. Proc. IEEE Int. Conf. on Data Engineering (ICDE), Cancun, Mexico, 2008.
• Assent I., Krieger R., Afschari F., Seidl T.: The TS-Tree: Efficient Time Series Search and Retrieval. Proc. Int. Conf. on Extending Data Base Technology (EDBT), Nantes, France, 2008.
• Assent I., Krieger R., Welter P., Herbers J., Seidl T.: SubClass: Classification of Multidimensional Noisy Data Using Subspace Clusters. Proc. Int. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Osaka, Japan, Springer LNCS-LNAI, 2008.
• Wichterich M., Beecks C., Seidl T.: Ranking Multimedia Databases via Relevance Feedback with History and Foresight Support. Proc. 2nd Int. Workshop on Ranking in Databases (DBRank) in conjunction with IEEE 24th Int. Conf. on Data Engineering (ICDE), Cancun, Mexico, 2008.
• Müller E., Assent I., Steinhausen U., Seidl T.: OutRank: ranking outliers in high dimensional data. Proc. 2nd Int. Workshop on Ranking in Databases (DBRank) in conjunction with IEEE 24th Int. Conf. on Data Engineering (ICDE), Cancun, Mexico, 2008.
• Zaki M.J., Peters M., Assent I., Seidl T.: Clicks: An effective algorithm for mining subspace clusters in categorical datasets. Data & Knowledge Engineering (DKE) 60 (1): 51-70, 2007.
• Assent I., Krieger R., Müller E., Seidl T.: VISA: Visual Subspace Clustering Analysis. In: ACM SIGKDD Explorations Special Issue on Visual Analytics, Vol. 9(2), Dec. 2007.
• Assent I., Krieger R., Müller E., Seidl T.: DUSC: Dimensionality Unbiased Subspace Clustering. Proc. IEEE Int. Conf. on Data Mining (ICDM), Omaha, Nebraska, USA, 2007.
• Assent I., Krieger R., Seidl T.: AttentionAttractor: efficient video stream similarity query processing in real time (Demo). Proc. IEEE 23rd Int. Conf. on Data Engineering (ICDE), Istanbul, Turkey, 2007.
• Assent I., Wenning A., Seidl T.: Approximation Techniques for Indexing the Earth Mover's Distance in Multimedia Databases. Proc. IEEE Int. Conf. on Data Engineering (ICDE), Atlanta, Georgia, USA, 2006.
• Assent I., Wichterich M., Seidl T.: Adaptable Distance Functions for Similarity-based Multimedia Retrieval. In: Datenbank-Spektrum Nr. 19: 23-31, 2006.

Page 30: Abstract

Multimedia data archives, databases, and web services grow from day to day at high speed. In order to cope with the huge amount of data, new exploration models and scalable algorithms need to be developed. The expected changes of use also demand changes on the technology level. Starting with content-based retrieval based on complex distance functions including quadratic forms and the Earth Mover's Distance, interaction models such as incremental search and relevance feedback are discussed. New data mining approaches including subspace clustering, outlier mining, stream data mining, and anytime classification will also be applied to multimedia databases in the future. Following this trend, the underlying retrieval and mining technologies need to be extended. Future developments will yield novel approximations, indexing techniques, and multi-step query processing in order to provide efficiency and scalability.

Page 31: Bio of Thomas Seidl

Thomas Seidl is a full professor for Computer Science and head of the Data Management and Data Exploration group at RWTH Aachen University, Germany, where he currently advises eight PhD students.

His research interests include data mining and data management in multimedia and spatio-temporal databases for applications from computational biology, medical imaging, mechanical engineering, computer graphics, etc., with a focus on content, shape or structure of complex objects in large databases. Current projects aim at fast content-based multimedia retrieval, relevance feedback, subspace clustering, outlier detection, stream data mining, and anytime mining algorithms. His research in the field of relational indexing aims at exploiting the robustness and high performance of relational database systems for complex indexing tasks.

Having finished his MS in 1992 at the Technische Universität München, Thomas received his Ph.D. in 1997 and his venia legendi in 2001 from the University of Munich, Germany. In 2001, he was a guest lecturer at the University of Augsburg and from 2001 to 2002, he held a substitute professorship for Databases, Data Mining, and Visualization at the University of Constance, Germany.