IAEA International Atomic Energy Agency OCR at INIS Database Production & Imaging Group Yves Reynaud Y.Reynaud-Pulido @ iaea.org INIS Training Seminar 14-16 November 2011, Vienna, Austria

Page 1

IAEA International Atomic Energy Agency

OCR at INIS Database Production & Imaging Group

Yves Reynaud, Y.Reynaud-Pulido@iaea.org

INIS Training Seminar, 14-16 November 2011, Vienna, Austria

Page 2

Some OCR features

We can find the needle in the haystack

• OCR makes basic search possible in an unstructured document.

• OCR brings your digitized collection to life.

• OCR adds extra value to your images.
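
As a minimal sketch of the basic search mentioned above: once OCR has produced text for each page image, finding a term reduces to scanning strings. The page texts and the `find_pages` helper are hypothetical and only illustrate the idea; they are not part of any INIS tool.

```python
def find_pages(pages, term):
    """Return the 1-based numbers of pages whose OCR text contains term."""
    needle = term.casefold()  # case-insensitive comparison
    return [number for number, text in enumerate(pages, start=1)
            if needle in text.casefold()]

# Hypothetical OCR output, one string per scanned page.
pages = [
    "Annual report of the laboratory",
    "Safety of research reactors",
    "Index of authors",
]

print(find_pages(pages, "REACTORS"))
```

Without the OCR text layer, the same pages would be pixels only and no such lookup would be possible.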

Page 3

OCR is computer software that

• Translates images of handwritten or typewritten text into machine-editable text.

• Translates pictures of characters into a standard encoding scheme that represents them (e.g. ASCII or Unicode).
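
To make "a standard encoding scheme" concrete, the short sketch below shows how recognized characters are represented once they are text rather than pixels. It uses only Python built-ins; the sample strings are chosen for illustration.

```python
text = "OCR"

# Unicode code point of each recognized character.
code_points = [ord(ch) for ch in text]

# The same characters serialized as bytes in ASCII.
ascii_bytes = text.encode("ascii")

# Characters outside ASCII need an encoding such as UTF-8,
# which uses several bytes per character here.
utf8_bytes = "滤器".encode("utf-8")

print(code_points)
print(ascii_bytes)
print(len(utf8_bytes))
```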

Page 4

• Scanned image (from paper or micrographic media)

• Vector image (created from a native application), shown here as a raster image for the sake of comparison

Page 5

“Do not see the trees (letters); try to see the forest (sentences).”

F0R 488UR1N6 7H3 L0N63V17Y 0F 1NF0RM4710N, P3RH4P8 7H3 M087 1MP0R74N7 R0L3 1N 7H3 0P3R4710N 0F 4 D16174L 4RCH1V3 18 M4N461N6 7H3 1D3N717Y, 1N736R17Y 4ND QU4L17Y 0F 7H3 4RCH1V38 1783LF 48 4 7RU873D 80URC3 0F 7H3 CUL7UR4L R3C0RD.
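
A human reader resolves the passage above from context. A program needs the substitution spelled out. The sketch below assumes a fixed digit-to-letter mapping, inferred by comparing the passage with its plain-text version on the following page; the `decode` helper is hypothetical.

```python
# Digit-to-letter substitutions used in the sample passage:
# 0->O, 1->I, 3->E, 4->A, 6->G, 7->T, 8->S
DIGIT_TO_LETTER = str.maketrans("0134678", "OIEAGTS")

def decode(text):
    """Replace each substituted digit with the letter it stands for."""
    return text.translate(DIGIT_TO_LETTER)

print(decode("7H3 L0N63V17Y 0F 1NF0RM4710N"))
```

A blanket mapping like this would also corrupt genuine numbers in a document, which is one reason context matters.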

Page 6

Verdana

FOR ASSURING THE LONGEVITY OF INFORMATION, PERHAPS THE MOST IMPORTANT ROLE IN THE OPERATION OF A DIGITAL ARCHIVE IS MANAGING THE IDENTITY, INTEGRITY AND QUALITY OF THE ARCHIVES ITSELF AS A TRUSTED SOURCE OF THE CULTURAL RECORD.

Page 7

Brush Script MT (Windows Font)

FOR ASSURING THE LONGEVITY OF INFORMATION, PERHAPS THE MOST IMPORTANT ROLE IN THE OPERATION OF A DIGITAL ARCHIVE IS MANAGING THE IDENTITY, INTEGRITY AND QUALITY OF THE ARCHIVES ITSELF AS A TRUSTED SOURCE OF THE CULTURAL RECORD.

Page 8

PCs ≠ Humans

• OCR compares patterns and selects the closest match; it can be constrained to a specific context, but this requires customization.

• People adapt to circumstances and can read past misspellings if the context is clear.
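
The pattern comparison described above can be sketched as nearest-template matching. This is a toy illustration, not how any particular OCR product works: the 3x5 glyph templates, the `distance` and `classify` helpers, and the noisy sample are all assumptions made for the example. Restricting the candidate set stands in for forcing a specific context.

```python
# Hypothetical 3x5 bitmap templates ('#' = ink, '.' = background).
TEMPLATES = {
    "7": ("###", "..#", ".#.", ".#.", ".#."),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
    "L": ("#..", "#..", "#..", "#..", "###"),
}

def distance(a, b):
    """Count the pixels at which two same-sized glyphs differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def classify(glyph, allowed=None):
    """Return the template character closest to glyph.

    allowed optionally restricts the candidates (the 'context').
    """
    candidates = TEMPLATES if allowed is None else allowed
    return min(candidates, key=lambda ch: distance(glyph, TEMPLATES[ch]))

# A '7' with one stray pixel in the bottom row.
noisy = ("###", "..#", ".#.", ".#.", ".##")

print(classify(noisy))                # closest of all templates
print(classify(noisy, allowed="TL"))  # letters-only context
```

With all templates available the glyph matches "7"; once the context is limited to letters, the closest remaining match is "T".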

Page 9

True or false

Usually, an image is adequately sampled if each letter stroke is at least two pixels thick.
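
One way to check the two-pixel criterion for a given scan is to convert a stroke width into pixels at the scanning resolution. The 0.2 mm stroke width below is an assumed value for illustration, and `stroke_pixels` is a hypothetical helper.

```python
MM_PER_INCH = 25.4

def stroke_pixels(stroke_mm, dpi):
    """Thickness in pixels of a stroke of stroke_mm scanned at dpi."""
    return stroke_mm / MM_PER_INCH * dpi

# Assumed stroke width of 0.2 mm at three common resolutions.
for dpi in (150, 300, 600):
    print(dpi, round(stroke_pixels(0.2, dpi), 2))
```

Under this assumption the stroke is thinner than two pixels at 150 dpi and thicker than two pixels at 300 dpi.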

Page 10

Zoom in

Page 11

Zoom in

Page 12

Results from OCR

It is in this context that I…

… and an additional protocol on the basis…

Page 13

Chinese in pixels

Page 14

Chinese vector images from OCR

滤器

Page 15

Arabic in pixels

Page 16

Arabic vector images from OCR

ا هذوشملت

Page 17

(12)

where a . The indices now range from 1 to 5. The bosonic fields obey the commutation rules

(13)

InftyReader - an OCR System for Math Documents

Page 18

Thank you