Institut für Print- und Medientechnik der TU Chemnitz [Institute for Print and Media Technology • Chemnitz University of Technology]
Director: Prof. Dr. Arved C. Hübler • Reichenhainer Str. 70 • 09126 Chemnitz • Germany
http://www.tu-chemnitz.de/pm • [email protected] • Tel: +49-371-531-2364 • Fax: -3780
Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing Stefan Pletschacher; Marcel Eckert; Arved C. Hübler
2 Pletschacher • Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing • ELPUB 2006
Digitization of Historical Documents
GEB1150
Alphabet and Font Extraction
[Figure: an XML instance with two parts. The alphabet and font definition lists the glyphs (glyph ID1, glyph ID2, ...); the content refers to them by ID (ID1 ID2 ID3 ID3 ID4 ...).]
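The split into an alphabet and font definition and a content part that refers to glyphs by ID could look roughly like the following. This is a minimal sketch: the element names `document`, `alphabet`, `glyph`, `content` and `char` and the attribute `ref` are hypothetical and are not taken from the slides.

```python
import xml.etree.ElementTree as ET


def build_instance(glyph_ids, content_ids):
    """Build a toy XML instance: glyph definitions plus content referencing them.

    All element and attribute names are illustrative assumptions.
    """
    root = ET.Element("document")
    alphabet = ET.SubElement(root, "alphabet")
    for glyph_id in glyph_ids:
        ET.SubElement(alphabet, "glyph", id=glyph_id)
    content = ET.SubElement(root, "content")
    for glyph_id in content_ids:
        # The content stores only a reference, never the glyph shape itself.
        ET.SubElement(content, "char", ref=glyph_id)
    return root


def unresolved_references(root):
    """Return every referenced ID that has no glyph definition."""
    defined = {g.get("id") for g in root.find("alphabet")}
    return [c.get("ref") for c in root.find("content") if c.get("ref") not in defined]


instance = build_instance(["ID1", "ID2", "ID3", "ID4"],
                          ["ID1", "ID2", "ID3", "ID3", "ID4"])
print(ET.tostring(instance, encoding="unicode"))
```

Because a repeated character such as ID3 is stored once as a glyph and many times as a short reference, the content part stays small.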
Vectorization - Raster to Vector Conversion
[Figure: diagram relating a bitmap graphic, OCR, encoded text (e.g. ASCII, 41 hex), vectorization, a vector font, font assignment and a RIP.]
DIA System and Workflow
[Figure: document image → region-based segmentation → blocks → block classification → textual blocks, image blocks and structural information. Textual blocks → segmentation → text lines → segmentation → words → segmentation → characters.]
Structural information (example): 1. text (headline), 2. bitmap image, 3. text block, 4. text block, …
DIA System and Workflow
[Figure: character images → clustering → set of prototypes → classification of vectorisable glyphs. Vectorisable glyphs → vectorisation → set of vectorised paths → transformation to SVG → set of SVG glyph descriptions → assignment of private Unicode code points → document-specific SVG font. Non-vectorisable images → ID assignment → set of bitmap symbols.]

DIA System and Workflow
[Figure: image blocks, structural information, the set of bitmap symbols, the document-specific SVG font and the OCR output (encoding, references) are combined into an XML + SVG encoded document; layout modification by means of XSLT yields specific output formats.]
Structural information (example): 1. text (headline), 2. bitmap image, 3. text block, 4. text block, …
Vectorization Approaches
• Contour based
• Skeleton based
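$$\mathrm{Core}(\mathrm{Comp}) := \{\, x \in \mathrm{Comp} : N(x) \subseteq \mathrm{Comp} \,\}$$
$$\mathrm{Cont}(\mathrm{Comp}) := \{\, x \in \mathrm{Comp} : x \notin \mathrm{Core}(\mathrm{Comp}) \,\}$$
$$S(\mathrm{Comp}) := \{\, x \in \mathrm{Comp} : \exists\, y, z \in \mathrm{Cont}(\mathrm{Comp}),\ y \neq z,\ \operatorname{dist}(x, \mathrm{Cont}(\mathrm{Comp})) = \lVert x - y \rVert = \lVert x - z \rVert \,\}$$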
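For the contour based approach, the core of a component can be read as the pixels whose whole neighbourhood N(x) lies inside the component, and the contour as the remaining pixels of the component. The following is a minimal sketch of that reading. The slides do not say which neighbourhood is used, so the 4-neighbourhood below is an assumption, as are the function and variable names.

```python
# Offsets of the 4-neighbourhood (an assumption; the slides do not define N).
N4 = ((-1, 0), (1, 0), (0, -1), (0, 1))


def core_and_contour(comp, offsets=N4):
    """Split a component (a set of (row, col) pixels) into core and contour.

    core:    pixels whose complete neighbourhood belongs to the component
    contour: all other pixels of the component
    """
    core = {
        (r, c)
        for (r, c) in comp
        if all((r + dr, c + dc) in comp for dr, dc in offsets)
    }
    return core, comp - core


square = {(r, c) for r in range(3) for c in range(3)}
core, contour = core_and_contour(square)
```

For a solid 3 x 3 square only the centre pixel has all four neighbours inside the component, so it forms the core and the eight border pixels form the contour.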
Applied Algorithms
• Pre-processing
  - Finding connected components (Region Growing)
  - Contour extraction (Contour following)
• Polygonal Approximation Based on Relaxation
  - Phase 1: Clustering of polygonal points
  - Phase 2: Relaxation (Error correction)
• Automatic Parameter Control
  - Rasterization of the resulting glyph images
  - Ascertaining a weighted error (Ground Truth)
  - Selecting appropriate vectorization parameters
Finding Connected Components
Ü Ö Ä % “ !
Region Growing
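Finding connected components by region growing can be sketched as a flood fill that starts at an unlabelled black pixel and grows over its black neighbours. This is an illustrative sketch, not the authors' implementation: the grid encoding (1 = black, 0 = white), the 8-connectivity and all names are assumptions.

```python
from collections import deque


def label_components(grid):
    """Label the 8-connected black regions of a binary grid (1 = black).

    Returns (number_of_components, label_grid); label 0 means background.
    """
    height, width = len(grid), len(grid[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if grid[r][c] != 1 or labels[r][c] != 0:
                continue
            # New seed pixel: grow the region over all connected black pixels.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and grid[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return count, labels


# A crude "!" glyph: a stroke and a separate dot.
exclamation = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
]
count, labels = label_components(exclamation)
```

Characters such as Ü or ! consist of more than one connected component, so a later step has to decide which components belong to the same glyph.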
Contour Following
[Figure legend: white pixel, black pixel, starting point, examination order]
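A contour following step with a starting point and a fixed examination order of the neighbours can be sketched with Moore-neighbour tracing. This is a stand-in technique chosen for illustration; the slides do not state which tracing variant or examination order the authors use. The grid encoding (1 = black), the clockwise order and all names are assumptions.

```python
# Eight neighbours in clockwise order, starting with the western one.
NEIGHBOURS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]


def trace_contour(grid):
    """Trace the outer contour of the first black region found in raster order."""
    height, width = len(grid), len(grid[0])

    def is_black(r, c):
        return 0 <= r < height and 0 <= c < width and grid[r][c] == 1

    start = next(((r, c) for r in range(height) for c in range(width)
                  if grid[r][c] == 1), None)
    if start is None:
        return []

    contour = [start]
    # Raster order guarantees that the western neighbour of the start is white.
    current, back = start, 0
    first_step = None
    for _ in range(4 * height * width + 8):  # safety bound against endless loops
        step = None
        for i in range(1, 8):  # examine neighbours clockwise after the white one
            d = (back + i) % 8
            cand = (current[0] + NEIGHBOURS[d][0], current[1] + NEIGHBOURS[d][1])
            if is_black(*cand):
                step = (cand, d)
                break
        if step is None:
            break  # isolated pixel
        nxt, d = step
        if current == start and nxt == first_step:
            break  # about to repeat the very first move: contour is closed
        if first_step is None:
            first_step = nxt
        # The neighbour examined just before nxt is white; restart from it.
        wr, wc = NEIGHBOURS[(d - 1) % 8]
        white = (current[0] + wr, current[1] + wc)
        back = NEIGHBOURS.index((white[0] - nxt[0], white[1] - nxt[1]))
        contour.append(nxt)
        current = nxt
    if len(contour) > 1 and contour[-1] == start:
        contour.pop()  # drop the repeated start pixel
    return contour


square = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
outline = trace_contour(square)
```

The sketch only follows the outer contour of one region. Inner contours (holes, as in "d" or "Ö") would need an additional pass.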
Clustering of Polygonal Points
Relaxation
SVG Representation
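A vectorised outline made of linear primitives maps naturally onto an SVG path, and SVG 1.1 fonts allow such paths to be stored as `glyph` elements. The sketch below shows one possible encoding. The font id, the advance width, the units per em and the choice of U+E000 as the first private code point are assumptions, and the coordinates are written as given, without converting from image to font coordinates.

```python
import xml.etree.ElementTree as ET


def polygon_to_path(points):
    """Return SVG path data for a closed polygon given as (x, y) points."""
    head, *tail = points
    parts = [f"M {head[0]} {head[1]}"]
    parts += [f"L {x} {y}" for x, y in tail]
    parts.append("Z")  # close the outline
    return " ".join(parts)


def build_svg_font(glyph_polygons, font_id="docfont", first_code_point=0xE000):
    """Wrap polygons as glyphs of an SVG font, one private code point each."""
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg")
    defs = ET.SubElement(svg, "defs")
    font = ET.SubElement(defs, "font", {"id": font_id, "horiz-adv-x": "1000"})
    ET.SubElement(font, "font-face", {"font-family": font_id, "units-per-em": "1000"})
    for offset, polygon in enumerate(glyph_polygons):
        ET.SubElement(font, "glyph", {
            # Private Use Area code point assigned to this glyph
            "unicode": chr(first_code_point + offset),
            "d": polygon_to_path(polygon),
        })
    return svg


triangle = [(0, 0), (10, 0), (10, 10)]
box = [(0, 0), (5, 0), (5, 5), (0, 5)]
font_svg = build_svg_font([triangle, box])
print(ET.tostring(font_svg, encoding="unicode"))
```

Text encoded with these code points can then be rendered with the document-specific font by any consumer that supports SVG fonts.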
Visual Quality
Formal Quality Measurement - Ground Truth
Error function
- absolute number of wrong pixels
- weighted by the distance to the nearest true component
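One way to turn this error function into code is to compare the original glyph bitmap (the ground truth) with the rasterised vector glyph and to weight each wrong pixel by a distance. The slides do not spell out the weighting, so the sketch below makes assumptions: a spurious pixel is weighted by its Euclidean distance to the nearest ground-truth pixel, a missing pixel by its distance to the nearest rendered pixel, and a fallback weight of 1.0 is used when no reference pixel exists.

```python
import math


def pixel_errors(truth, rendered):
    """Compare two glyph bitmaps given as sets of black (row, col) pixels.

    Returns (absolute number of wrong pixels, distance-weighted error).
    The weighting scheme is an assumption made for illustration.
    """
    spurious = rendered - truth  # black in the rendering only
    missing = truth - rendered   # black in the ground truth only

    def nearest(pixel, targets):
        # Fallback weight 1.0 when there is nothing to measure against.
        return min((math.dist(pixel, t) for t in targets), default=1.0)

    weighted = (sum(nearest(p, truth) for p in spurious)
                + sum(nearest(p, rendered) for p in missing))
    return len(spurious) + len(missing), weighted


truth = {(0, 0), (0, 1)}
rendered = {(0, 0), (0, 1), (0, 3)}
wrong, weighted = pixel_errors(truth, rendered)
```

With this weighting, a stray pixel far away from the glyph counts more than a pixel that is merely shifted by one position along the outline.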
Results
Adaptive Parameter Control
[Figure: accuracy (axis range 0.3 to 1) plotted against the vectorization parameter ε (0 to 1.2) for the glyphs H, K and d.]
[Figure: accuracy gradient (axis range -5 to 2) plotted against the vectorization parameter ε (0 to 1.2) for the glyphs H, K and d.]
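The selection of a vectorization parameter from measured accuracies can be illustrated with a simple rule: take the largest ε whose accuracy still reaches a target value. The slides do not describe the actual selection rule, so this rule, the threshold of 0.9, the sample values and all names are assumptions. The idea behind preferring a large ε is that a coarser approximation usually needs fewer points.

```python
def choose_epsilon(candidates, accuracy, min_accuracy=0.9):
    """Return the largest candidate whose accuracy reaches min_accuracy.

    `accuracy` is any callable mapping a parameter value to a score in [0, 1].
    Returns None if no candidate is good enough.
    """
    best = None
    for eps in sorted(candidates):
        if accuracy(eps) >= min_accuracy:
            best = eps  # keep the largest acceptable value seen so far
    return best


# Made-up accuracy measurements for one glyph, for illustration only.
measured = {0.1: 0.99, 0.3: 0.95, 0.5: 0.85, 0.7: 0.60}
selected = choose_epsilon(measured, measured.get)
```

In a full pipeline the accuracy callable would vectorise the glyph with the given ε, rasterise the result and compare it with the original bitmap.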
Compression rates
Conclusions
• Good vectorization results already with linear primitives
• High compression rates can be achieved
• Extracted fonts can be easily scaled and further formatted
• Known vectorization methods have been extended towards an adaptive system for automatic parameter control
• These methods can be applied for preservation and handling of unknown typefaces in digitized documents
• Originals may be re-encoded using a document-specific alphabet and font
• Direct integration into XML/SVG-based processes possible
• Various output formats can be supported by means of XSL transformations
Thank you very much!
Questions