A Fast System for Dropcap Image Retrieval

Mathieu Delalandre and Jean-Marc Ogier

L3i, La Rochelle University, France

[email protected]

Madonne Talk (Tours University), 7th November 2006


Short CV


Personal information
- Mathieu Delalandre, 32 years old, married

Academic degrees
- 1995-1998: Lic.Sc. in Industrial Computing, Rouen University, France
- 1998-2001: M.Sc. in Computer Science, Rouen University, France

Research experience (5 years, graphics recognition)
- 04/01-09/01: Master, PSI Laboratory (Rouen, France)
- 10/01-04/05: PhD, PSI Laboratory (Rouen, France)
- 05/05-09/05: Post-doc, SCSIT (Nottingham, England)
- 10/05-10/06: Post-doc, L3i Laboratory (La Rochelle, France)
- 11/06-12/06: Post-doc, PSI Laboratory (Rouen, France)
- 01/07-12/09: Post-doc, CVC (Barcelona, Spain)


Introduction

- Old books

- Old graphics retrieval

- Our problem

Introduction: Old books

Old books of the 15th and 16th centuries

Samples
- Bartolomeo (1534)
- Alciati (1511)
- Laurens (1621)

[Figure: sample pages with annotated elements: figure, dropcap, headline]

Example of digitized database (BVH, CESR Tours)
- Books: 46
- Pages: 1385
- Graphics: 4755 (3.4 per page)
- Foreground pixels: 63% textual, 37% graphical
- Graphics types: 41% dropcaps, 59% others

Old graphics

[Chart: graphics per book. x-axis: books (1 to 46); y-axis: number of graphics (0 to 700)]

Introduction: Old graphics retrieval

System overview: general architecture

[Diagram labels: Image Database, Query, Extraction, Comparison, Index, Manual Index; two phases: Indexing and Retrieval]

Samples
- Pareti'05: graphics style, Zipf law
- Uttama'05: document layout, MST
- Baudrier'05: sub-image, Hausdorff distance
- Bigun'96: stroke image, orientation radiograms

Retrieval criterion
- letter (c)
- topic (plant)
- pattern (cross)

Introduction: Our problem (1/2)

Context
- MAsse de DOnnées issues de la Numérisation du patrimoiNE (MADONNE) project
- Bibliothèques Virtuelles Humanistes (BVH) of the Centre d'Etudes Supérieures de la Renaissance (CESR)

[Figure: printing with a wood plug (bottom view); dropcap samples labelled Class 1, Class 2, Class 3; examples: Vascosan 1555, Marnef 1576]

Wood plug tracking

[Diagram: wood plugs (stamps) linked to printing houses by exchange or copy; periods shown: 1531-1548, 1511-1542, 1555-1578, 1497-1507]

Introduction: Our problem (2/2)

Problem features
- Not scaled, not rotated
- Noise
- Offset
- Complexity
- Accuracy
- Scalability

Descriptors: fast and local versus complex and global

Descriptor choice
- To scalar [Loncaric'98]: Hough, Radon, Zernike, Hu, Fourier; scale and orientation invariant, fast, local
- To image [Gesu'99]: template matching, Hausdorff distance; not scale or orientation invariant, global (scene)

[Diagram: system pipeline with stages Formatting, Compression, Centering and Comparison; inputs: Query and Image Database; results: R1, R2, R3]

Our system

- Formatting

- Compression

- Centering and comparison

Our system: Formatting

Digitization problems [Lawrence'00]

Problem sources
- Several image providers
- Several digitization tools
- Length of process
- Human supervised
- …

QUEID: « QUery Engine on Image Database »

[Diagram labels: Base, QUEID, Format, Diagnostic, Expertise; flows: query, charts, analysis]

OLDB (Ornamental Letters Database)

              Before                   After
Compression   Packbits and Jpeg        Uncompressed
Resolutions   ?; from 72 to 450 dpi    250 to 350
Formats       Jpeg and Tiff            Tiff
Model         gray and colour          gray
Size          377.7 Mp                 279.7 Mp
Files         2803                     2038

Our system: Compression

Run-based compression: Run Length Encoding (RLE)

Compression rate

$$t_c = 1 - \frac{n_{run}}{n_{pixel}}, \qquad n_{run} \in [1, n_{pixel}], \qquad t_c \in [0, 1[$$

RLE types (for an image): foreground, background, both
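As a minimal sketch of the run-based compression named on this slide, the Python below encodes each row of a binary image as runs covering both foreground and background, and computes a compression rate as one minus the ratio of runs to pixels. The function names and the `(value, start, length)` run layout are illustrative assumptions, not the system's actual data structures.

```python
from typing import List, Sequence, Tuple

Run = Tuple[int, int, int]  # (pixel value, start abscissa, length) -- assumed layout


def rle_encode_line(line: Sequence[int]) -> List[Run]:
    """Encode one binary row as consecutive runs of identical pixels."""
    runs: List[Run] = []
    start = 0
    for x in range(1, len(line) + 1):
        # Close the current run at the end of the row or when the value changes.
        if x == len(line) or line[x] != line[start]:
            runs.append((line[start], start, x - start))
            start = x
    return runs


def compression_rate(image: Sequence[Sequence[int]]) -> float:
    """Return 1 - n_run / n_pixel for a binary image given as a list of rows."""
    n_pixel = sum(len(row) for row in image)
    if n_pixel == 0:
        raise ValueError("empty image")
    n_run = sum(len(rle_encode_line(row)) for row in image)
    return 1.0 - n_run / n_pixel
```

With this definition, an image where every pixel starts a new run gets a rate of 0, and large uniform areas push the rate towards 1.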

OLDB results
- Fixed threshold binarisation
- Both RLE

[Chart: compression rate per dropcap. x-axis: dropcap (1 to 2001); y-axis: compression rate (0.7 to 1); marked values: 0.75, 0.88, 0.95]

Our system: Centering and comparison

[Diagram: run abscissas x1 of line (y) in image 1 and x2 of line (y+dy) in image 2, read through a pointer from a stack of x values]

while x2 ≤ x1: handle image 2

while x1 ≤ x2: handle image 1
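The two loops on this slide suggest a merge-style walk over the run boundaries of two lines. The sketch below is one possible reading, not the authors' implementation: each line is assumed to be a sorted list of foreground runs `(start, end)` with `end` exclusive, and the function counts the pixels on which the two lines disagree by always advancing the line whose next boundary comes first. The name `line_difference` and the run layout are hypothetical.

```python
from typing import List, Tuple

FgRun = Tuple[int, int]  # (start, end) of a foreground run, end exclusive -- assumed layout


def line_difference(runs1: List[FgRun], runs2: List[FgRun]) -> int:
    """Count differing pixels between two binary lines given as foreground runs."""
    # Flatten runs into sorted boundary abscissas; the pixel value flips at each one.
    t1 = [x for run in runs1 for x in run]
    t2 = [x for run in runs2 for x in run]
    i = j = 0          # pointers into the two boundary lists
    v1 = v2 = 0        # current pixel value on each line (0 = background)
    prev = 0           # abscissa of the previous boundary
    diff = 0
    while i < len(t1) or j < len(t2):
        x1 = t1[i] if i < len(t1) else float("inf")
        x2 = t2[j] if j < len(t2) else float("inf")
        x = min(x1, x2)
        if v1 != v2:
            diff += x - prev   # the lines disagreed on [prev, x)
        if x1 == x:            # image 1 has the next boundary: handle image 1
            v1 ^= 1
            i += 1
        if x2 == x:            # image 2 has the next boundary: handle image 2
            v2 ^= 1
            j += 1
        prev = x
    return diff
```

The cost of this walk depends on the number of runs rather than the number of pixels. Comparing line y of one image with line y+dy of the other would apply the same function to the two corresponding run lists.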

OLDB results

Raster sizes

[Chart: raster size per dropcap. x-axis: dropcap (1 to 2001); y-axis: size in k.pixel (0 to 600)]

        Time (s)   Size (k.pixel)
Max     903.62     600.8
Mean    337.06     137.7
Min     176.67     7.74

Run sizes

[Chart: run size per dropcap. x-axis: dropcap (1 to 2001); y-axis: size in k.run (0 to 600)]

        Time (s)   Size (k.run)
Max     137.06     87.8
Mean    41.68      15.5
Min     22.32      1.1

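Centering

$$g = (g_1, g_2, \ldots, g_k), \qquad h = (h_1, h_2, \ldots, h_l), \qquad l \ge k$$

$$d_{x,y} = \min_{0 \le j \le l-k} \sum_{i=1}^{k} \left| g_i - h_{i+j} \right|$$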
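The centering step appears to slide a shorter sequence g of length k along a longer sequence h of length l and keep the offset with the smallest sum of absolute differences. The sketch below illustrates that search under this assumption. Treating g and h as pixel-count profiles (for example row or column sums of the two images) is also an assumption, and `best_offset` is a hypothetical name.

```python
from typing import Sequence, Tuple


def best_offset(g: Sequence[float], h: Sequence[float]) -> Tuple[int, float]:
    """Return (j, d): the offset j in [0, l - k] minimizing sum_i |g[i] - h[i + j]|."""
    k, l = len(g), len(h)
    if k > l:
        raise ValueError("g must not be longer than h")
    best_j, best_d = 0, float("inf")
    for j in range(l - k + 1):
        # Sum of absolute differences with g placed at offset j inside h.
        d = sum(abs(g[i] - h[i + j]) for i in range(k))
        if d < best_d:
            best_j, best_d = j, d
    return best_j, best_d
```

Running the search once on horizontal profiles and once on vertical profiles would give a horizontal and a vertical offset for aligning two images before they are compared. Ties keep the smallest offset.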

Comparison

[Diagram: the query image is compared with the images of the image database]

[Equation: time t expressed from the per-image times t_i, i = 1, ..., n]

In progress

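[Equation: distance d between two feature pairs (u1, v1) and (u2, v2), built from the differences between u1 and u2 and between v1 and v2, normalized by max(u1, u2) and max(v1, v2); used with a cluster threshold]

[Diagram: query passed through a 1st level, then a 2nd level]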

Our problem
- Current time: 40 s
- Target time: < 4 s

Key idea
- To use a lossless compression
- To use a system approach

First system
- Level 1: image sizes
- Level 2: black and white pixels
- Level 3: RLE comparison

[Diagram axes: depth, speed]
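The three levels can be read as a coarse-to-fine filter: cheap global features discard most of the database before the costly RLE comparison runs. The sketch below illustrates that idea under several assumptions. The `Entry` record, the averaged normalized distance in `pair_distance`, and the fixed thresholds are all hypothetical, and fixed thresholds are a simplification that stands in for the selection algorithm of the slides. `fine_distance` is any caller-supplied function that performs the expensive comparison.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Entry:
    """Cheap global features of one dropcap image (hypothetical layout)."""
    name: str
    width: int
    height: int
    black: int  # number of foreground pixels
    white: int  # number of background pixels


def pair_distance(u1: float, v1: float, u2: float, v2: float) -> float:
    """Mean of two differences, each normalized by the larger value (assumed form)."""
    du = abs(u1 - u2) / max(u1, u2, 1)
    dv = abs(v1 - v2) / max(v1, v2, 1)
    return (du + dv) / 2.0


def top_down_search(
    query: Entry,
    database: Sequence[Entry],
    thresholds: Tuple[float, float],
    fine_distance: Callable[[Entry, Entry], float],
) -> List[Entry]:
    """Filter by sizes, then by pixel counts, then rank survivors by fine_distance."""
    th_size, th_density = thresholds
    # Level 1: image sizes.
    level1 = [e for e in database
              if pair_distance(query.width, query.height, e.width, e.height) <= th_size]
    # Level 2: black and white pixel counts.
    level2 = [e for e in level1
              if pair_distance(query.black, query.white, e.black, e.white) <= th_density]
    # Level 3: the costly comparison runs only on the remaining candidates.
    return sorted(level2, key=lambda e: fine_distance(query, e))
```

The gain comes from calling the expensive comparison on a fraction of the database only. The price is that a true match rejected at a cheap level is never recovered, so the thresholds trade speed against accuracy.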

Selection algorithm

[Chart: distance curve. x-axis: dropcap (1 to 1993); y-axis: distance (0 to 0.8); two marked values, 1 and 2]

if 1 - 2 < 0

push x, cluster

while 1 - 2 < 0

next

[Charts: two panels with axes from 0 to 40, labelled "Depth level"]

OLDB results

[Chart: depth per dropcap. x-axis: dropcap (1 to 1941); y-axis: depth (0% to 100%); series: sizes, densities]

        Depth (%)
Max     59%
Mean    24%
Min     4%

- To decrease variability
- To work on selection
- To add a level
- Run-based signature

Query example

[Figure: a query dropcap and its retrieved results with their distances: 0.1947, 0.2517, 0.3485, 0.3616, 0.3819, 0.4064, 0.4109, 0.4209; results are labelled "same plug" or "next plug"]

Performance evaluation

Benchmark system

[Diagram labels: Base, GUI (IHM), Retrieve engine, Labels; actions: control, display, retrieve, driven labelling; to produce: Bench1, Bench2, Bench2]

Criterion?
- Scalability
- Accuracy
- Processing time

Conclusions and perspectives

Conclusions
- Dropcap image retrieval: « wood tracking »
- Formatting of the image database (QUEID)
- Fast approach, two features
  - RLE comparison (7 to 9)
  - Top-down strategy (2 to 20)

Results
- 10 s for 2000 images (300 MB)

Perspectives
- Working on the RLE signature
- Benchmark system for performance evaluation

Bibliography

1. J. Bigun, S. Bhattacharjee, and S. Michel. Orientation radiograms for image retrieval: An alternative to segmentation. In International Conference on Pattern Recognition (ICPR), volume 3, pages 346-350, 1996.

2. V. D. Gesu and V. Starovoitov. Distance based function for image comparison. Pattern Recognition Letters (PRL), 20(2):207-214, 1999.

3. S. Loncaric. A survey of shape analysis techniques. Pattern Recognition (PR), 31(8):983-1001, 1998.

4. R. Pareti and N. Vincent. Global discrimination of graphics styles. In Workshop on Graphics Recognition (GREC), pages 120-128, 2005.

5. S. Uttama, M. Hammoud, C. Garrido, P. Franco, and J. Ogier. Ancient graphic documents characterization. In Workshop on Graphics Recognition (GREC), pages 97-105, 2005.

6. E. Baudrier, G. Millon, F. Nicolier, and S. Ruan. A fast binary-image comparison method with local-dissimilarity quantification. In International Conference on Pattern Recognition (ICPR), volume 3, pages 216-219, 2006.


Thanks …