
Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Page 1: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Institut für Print- und Medientechnik der TU Chemnitz [Institute for Print and Media Technology • Chemnitz University of Technology]

Director: Prof. Dr. Arved C. Hübler • Reichenhainer Str. 70 • 09126 Chemnitz • Germany

http://www.tu-chemnitz.de/pm • [email protected] • Tel: +49-371-531-2364 • Fax: -3780

Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Stefan Pletschacher; Marcel Eckert; Arved C. Hübler

Page 2: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

2 Pletschacher • Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing • ELPUB 2006

Digitization of Historical Documents

GEB1150

Page 3: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Alphabet and Font Extraction

[Diagram: an XML instance made up of two parts. The alphabet and font definition lists the glyphs (glyph ID1, glyph ID2, ...). The content refers to those glyphs by ID (ID1 ID2 ID3 ID3 ID4).]

Page 4: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Vectorization - Raster to Vector Conversion

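[Diagram: three representations, namely encoded text (e.g. ASCII, 41 hex), vector font, and bitmap graphic. The transitions between them are labelled font assignment, RIP, OCR, and Vectorization.]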

Page 5: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

DIA System and Workflow

[Diagram: document image → region-based segmentation → blocks → block classification → textual blocks, image blocks, and structural information. Textual blocks → segmentation → text lines → segmentation → words → segmentation → characters.]

[Example page layout: 1. text (headline), 2. bitmap image, 3. text block, 4. text block]

Page 6: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

DIA System and Workflow

[Diagram: character images → clustering → set of prototypes → classification of vectorisable glyphs → vectorisable glyphs and non-vectorisable images.]

[Non-vectorisable images → ID assignment → set of bitmap symbols.]

[Vectorisable glyphs → vectorisation → set of vectorised paths → transformation to SVG → set of SVG glyph descriptions → assignment of private Unicode code points (e.g. &#xE000;) → document-specific SVG font.]
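The code point assignment step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the glyph IDs are hypothetical, and it assumes that glyphs are numbered consecutively from U+E000, the first code point of the Private Use Area in the Basic Multilingual Plane (U+E000 to U+F8FF).

```python
PUA_START = 0xE000  # first private-use code point in the Basic Multilingual Plane
PUA_END = 0xF8FF    # last private-use code point in the Basic Multilingual Plane


def assign_private_code_points(glyph_ids):
    """Map each glyph ID to a distinct private-use character, in input order."""
    if len(glyph_ids) > PUA_END - PUA_START + 1:
        raise ValueError("more glyphs than private-use code points")
    return {gid: chr(PUA_START + i) for i, gid in enumerate(glyph_ids)}


def to_character_reference(ch):
    """Render a character as an XML numeric character reference."""
    return f"&#x{ord(ch):X};"


# Hypothetical glyph IDs, named after the labels on the earlier slide.
mapping = assign_private_code_points(["ID1", "ID2", "ID3"])
references = [to_character_reference(ch) for ch in mapping.values()]
```

The document content can then be encoded as a sequence of such character references, while the document-specific font supplies the matching glyph outlines.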

Page 7: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

DIA System and Workflow

[Diagram: image blocks, structural information, the set of bitmap symbols, and the document-specific SVG font are combined by encoding into an XML + SVG encoded document that holds references to them. Further labels: OCR, XML, layout modification by means of XSLT, specific output formats.]

[Example page layout: 1. text (headline), 2. bitmap image, 3. text block, 4. text block]

Page 8: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Vectorization Approaches

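• Contour based

$$\mathrm{Core} := \{\, x \in \mathrm{Comp} \mid N(x) \subseteq \mathrm{Comp} \,\}, \qquad \mathrm{Cont} := \{\, x \in \mathrm{Comp} \mid x \notin \mathrm{Core} \,\}$$

• Skeleton based

$$S := \{\, x \in \mathrm{Comp} \mid \exists\, y, z \in \mathrm{Cont}(\mathrm{Comp}),\ y \neq z :\ \mathrm{dist}(x, \mathrm{Cont}(\mathrm{Comp})) = |x - y| = |x - z| \,\}$$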

Page 9: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Applied Algorithms

• Pre-processing
  - Finding connected components (Region Growing)
  - Contour extraction (Contour following)

• Polygonal Approximation Based on Relaxation
  - Phase 1: Clustering of polygonal points
  - Phase 2: Relaxation (Error correction)

• Automatic Parameter Control
  - Rasterization of the resulting glyph images
  - Ascertaining a weighted error (Ground Truth)
  - Selecting appropriate vectorization parameters

Page 10: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Finding Connected Components

Ü Ö Ä % “ !

Page 11: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Region Growing
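A minimal sketch of finding connected components by region growing follows. It assumes a binary image given as a list of rows with 1 for black and 0 for white, and it assumes 8-connectivity. The slides specify neither, so both are illustrative choices, as is the function name.

```python
from collections import deque


def connected_components(image):
    """Return the 8-connected components of black pixels (value 1).

    Each component is a list of (x, y) pixel coordinates.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Grow a new region from this seed pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            region = []
            while queue:
                cx, cy = queue.popleft()
                region.append((cx, cy))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            components.append(region)
    return components


# An "i"-like shape: a dot above a stem gives two components.
letter_i = [
    [0, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
]
parts = connected_components(letter_i)
```

Characters such as Ü, Ö, Ä, %, or ! consist of several components, so a later step has to group the components that belong to one glyph.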

Page 12: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Contour Following

[Figure legend: white pixel, black pixel, starting point, examination order]
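Contour following can be sketched with Moore-neighbour tracing. The slide does not name the exact variant, so this is an assumption. The start pixel is the first black pixel in raster-scan order, the eight neighbours are examined clockwise, and tracing stops when the first move is about to be repeated. Only the outer contour of the component containing the start pixel is traced.

```python
# Moore neighbourhood in clockwise order on screen (y grows downwards),
# starting with the western neighbour.
DIRECTIONS = [(-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1)]


def is_black(image, x, y):
    """True if (x, y) lies inside the image and is a black pixel (value 1)."""
    return 0 <= y < len(image) and 0 <= x < len(image[y]) and image[y][x] == 1


def find_start(image):
    """Return the first black pixel in raster-scan order, or None."""
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value == 1:
                return (x, y)
    return None


def follow_contour(image):
    """Trace the outer contour of the component that contains the start pixel."""
    start = find_start(image)
    if start is None:
        return []
    contour = []
    current, back = start, 0  # the western neighbour of the start pixel is white
    first_move = None
    while True:
        step = None
        for i in range(1, 9):
            d = (back + i) % 8
            nx, ny = current[0] + DIRECTIONS[d][0], current[1] + DIRECTIONS[d][1]
            if is_black(image, nx, ny):
                # The neighbour examined just before (nx, ny) is white and
                # becomes the backtrack pixel for the next step.
                px = current[0] + DIRECTIONS[(d - 1) % 8][0]
                py = current[1] + DIRECTIONS[(d - 1) % 8][1]
                step = ((nx, ny), DIRECTIONS.index((px - nx, py - ny)))
                break
        if step is None:
            return [start]  # isolated pixel
        move = (current, step[0])
        if move == first_move:
            return contour  # the trace has closed
        if first_move is None:
            first_move = move
        contour.append(current)
        current, back = step


square = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
outline = follow_contour(square)
```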

Page 13: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Clustering of Polygonal Points

Page 14: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Relaxation
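The slides do not spell out the clustering and relaxation phases, so the following sketch does not reproduce them. It substitutes the well-known Ramer-Douglas-Peucker algorithm, purely to illustrate how a tolerance parameter (written `epsilon` here, by analogy with the vectorization parameter ε) trades the number of polygon points against fidelity.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, epsilon):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    index, worst = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > worst:
            index, worst = i, d
    if worst <= epsilon:
        return [points[0], points[-1]]
    left = simplify(points[:index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


polyline = [(0, 0), (1, 0.1), (2, 0), (3, 0), (3, 3)]
simplified = simplify(polyline, 0.5)
```

A larger tolerance yields fewer line segments and a smaller glyph description, but a coarser outline.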

Page 15: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

SVG Representation
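One way to express vectorised contours as an SVG font is sketched below with Python's standard `xml.etree.ElementTree` module. The `font`, `font-face`, and `glyph` elements and their attributes come from the SVG 1.1 fonts module. The family name, the units-per-em value, and the sample triangle are hypothetical, and the sketch ignores the fact that SVG font coordinates have the y axis pointing upwards.

```python
import xml.etree.ElementTree as ET


def polygon_to_path(points):
    """Turn one closed polygon into SVG path data using linear primitives only."""
    head, *tail = points
    commands = [f"M{head[0]} {head[1]}"] + [f"L{x} {y}" for x, y in tail]
    return " ".join(commands) + " Z"


def build_svg_font(glyphs, family="DocumentFont", units_per_em=1000):
    """Build a <font> element from a mapping of character to list of polygons."""
    font = ET.Element("font", {"id": family, "horiz-adv-x": str(units_per_em)})
    ET.SubElement(font, "font-face",
                  {"font-family": family, "units-per-em": str(units_per_em)})
    for char, contours in glyphs.items():
        path_data = " ".join(polygon_to_path(c) for c in contours)
        ET.SubElement(font, "glyph", {"unicode": char, "d": path_data})
    return font


# A hypothetical triangular glyph assigned to the private code point U+E000.
font = build_svg_font({"\ue000": [[(0, 0), (500, 1000), (1000, 0)]]})
glyph = font.find("glyph")
```

Calling `ET.tostring(font, encoding="unicode")` serialises the element so that it can be placed inside the `<defs>` section of an SVG document.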

Page 16: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Visual Quality

Page 17: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Formal Quality Measurement - Ground Truth

Error function
  - absolute number of wrong pixels
  - weighted by the distance to the next true component
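The slide gives only the idea of the error function, so the sketch below is one possible reading rather than the authors' definition. A pixel is wrong when the rasterised glyph and the ground truth disagree, and each wrong pixel is weighted by its Euclidean distance to the nearest black pixel of the other image. Images are lists of rows with 1 for black, and all names are hypothetical.

```python
import math


def black_pixels(image):
    """List the (x, y) coordinates of all black pixels (value 1)."""
    return [(x, y) for y, row in enumerate(image)
            for x, value in enumerate(row) if value == 1]


def nearest_distance(pixel, targets):
    """Euclidean distance from pixel to the closest pixel in targets."""
    return min(math.hypot(pixel[0] - tx, pixel[1] - ty) for tx, ty in targets)


def weighted_error(truth, rendered):
    """Return (number of wrong pixels, distance-weighted error)."""
    truth_black = black_pixels(truth)
    rendered_black = black_pixels(rendered)
    if not truth_black or not rendered_black:
        raise ValueError("both images need at least one black pixel")
    truth_set, rendered_set = set(truth_black), set(rendered_black)
    extra = rendered_set - truth_set    # black in the rendering only
    missing = truth_set - rendered_set  # black in the ground truth only
    weighted = sum(nearest_distance(p, truth_black) for p in extra)
    weighted += sum(nearest_distance(p, rendered_black) for p in missing)
    return len(extra) + len(missing), weighted


truth = [[1, 1, 0, 0]]
rendered = [[1, 1, 0, 1]]
count, weighted = weighted_error(truth, rendered)
```

Under this reading, a stray pixel far from the glyph costs more than a one-pixel deviation along its edge.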

Page 18: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Results

Page 19: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Adaptive Parameter Control

[Plot 1: accuracy (0.3 to 1) against the vectorization parameter ε (0 to 1.2); series H, K, d]

[Plot 2: accuracy gradient (-5 to 2) against the vectorization parameter ε (0 to 1.2); series H, K, d]
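The selection rule itself is not stated on the slide. The sketch below assumes a simple one: scan ε upwards and keep the largest value reached before the accuracy gradient first drops below a threshold. The function name, the threshold, and the sample numbers are illustrative and are not taken from the plots.

```python
def select_epsilon(epsilons, accuracies, min_gradient=-1.0):
    """Return the largest epsilon reached before accuracy first falls
    faster than min_gradient (accuracy change per unit of epsilon)."""
    chosen = epsilons[0]
    for i in range(1, len(epsilons)):
        gradient = ((accuracies[i] - accuracies[i - 1])
                    / (epsilons[i] - epsilons[i - 1]))
        if gradient < min_gradient:
            break
        chosen = epsilons[i]
    return chosen


# Made-up measurements: accuracy stays high at first and then collapses.
epsilons = [0.0, 0.2, 0.4, 0.6, 0.8]
accuracies = [1.0, 0.98, 0.96, 0.70, 0.50]
best = select_epsilon(epsilons, accuracies)
```

In a complete pipeline, each accuracy value would come from rasterising the vectorised glyph and comparing it with the original bitmap.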

Page 20: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Compression rates

Page 21: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Conclusions

• Linear primitives alone already give good vectorization results

• High compression rates can be achieved

• Extracted fonts can be easily scaled and further formatted

• Known vectorization methods have been extended into an adaptive system with automatic parameter control

• These methods can be applied to the preservation and handling of unknown typefaces in digitized documents

• Originals may be re-encoded using a document-specific alphabet and font

• Direct integration into XML/SVG-based processes is possible

• Various output formats can be supported by means of XSL transformations

Page 22: Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Thank you very much!

[email protected]

Questions