13
Project Title Lorem Ipsum Second level Third level Fourth level » Fifth level Introduction to Optical Character Recognition (OCR) Software D-Lab workshop, February 28, 2017 Quinn Dombrowski, DH Coordinator, Research IT, [email protected]

Introduction to Optical Character Recognition (OCR) Software · 2019. 12. 31. · Project Title • Lorem Ipsum – Second level • Third level – Fourth level » Fifth level Introduction

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduction to Optical Character Recognition (OCR) Software · 2019. 12. 31. · Project Title • Lorem Ipsum – Second level • Third level – Fourth level » Fifth level Introduction

Project Title •  Lorem Ipsum

–  Second level •  Third level

–  Fourth level »  Fifth level

Introduction to Optical Character Recognition

(OCR) Software

D-Lab workshop, February 28, 2017

Quinn Dombrowski, DH Coordinator, Research IT, [email protected]

Page 2: Introduction to Optical Character Recognition (OCR) Software · 2019. 12. 31. · Project Title • Lorem Ipsum – Second level • Third level – Fourth level » Fifth level Introduction

Berkeley Research Computing

Page 3: Introduction to Optical Character Recognition (OCR) Software · 2019. 12. 31. · Project Title • Lorem Ipsum – Second level • Third level – Fourth level » Fifth level Introduction

Berkeley Research Computing + DH

Page 4: Introduction to Optical Character Recognition (OCR) Software · 2019. 12. 31. · Project Title • Lorem Ipsum – Second level • Third level – Fourth level » Fifth level Introduction

Optical Character Recognition (OCR)

Page 5: Introduction to Optical Character Recognition (OCR) Software · 2019. 12. 31. · Project Title • Lorem Ipsum – Second level • Third level – Fourth level » Fifth level Introduction

Optical Character Recognition (OCR)

•  Complex layout

•  Non-English (includes Chinese, Japanese, Korean, Cyrillic, Greek, Arabic, Hebrew)

•  Multilingual

•  Precision

Page 6: Introduction to Optical Character Recognition (OCR) Software · 2019. 12. 31. · Project Title • Lorem Ipsum – Second level • Third level – Fourth level » Fifth level Introduction

When precision matters

Page 7: Introduction to Optical Character Recognition (OCR) Software · 2019. 12. 31. · Project Title • Lorem Ipsum – Second level • Third level – Fourth level » Fifth level Introduction

Google Calendar + Docs + Forms

Page 8: Introduction to Optical Character Recognition (OCR) Software · 2019. 12. 31. · Project Title • Lorem Ipsum – Second level • Third level – Fourth level » Fifth level Introduction

OCR at scale

+

Postdoc Adam Anderson

Page 9: Introduction to Optical Character Recognition (OCR) Software · 2019. 12. 31. · Project Title • Lorem Ipsum – Second level • Third level – Fourth level » Fifth level Introduction

OCR at scale

Page 10: Introduction to Optical Character Recognition (OCR) Software · 2019. 12. 31. · Project Title • Lorem Ipsum – Second level • Third level – Fourth level » Fifth level Introduction

OCR at scale

Page 11: Introduction to Optical Character Recognition (OCR) Software · 2019. 12. 31. · Project Title • Lorem Ipsum – Second level • Third level – Fourth level » Fifth level Introduction

Tesseract with complex layouts

ified  that  rabbits  could  make  up  no  more  than  five  percent  of  her  sales  volume.    Then  last  summer,  Crymes  con;nues,  “I  was  called  to  the  office.  [The  Rouse  leasing  agent]  said  to  me,  ‘We  are  ;red  of  bears.  We  want  you  to  get  rid  of  all  your  bears  by  October  l.‘  Isaid,  ‘Excuse  me,  but  this  isJulv  28.  That’s  a  liMle  sudden.’  They  said,  We,“  we  think  bears  are  passe,  and  your  sales  have  been  down.’  I  have  to  report  [sales  figures  to  man-­‐  agement]  every  month,  and  sales  were  down,  but  they  were  down  for  a  lot  of  people  tn  Har-­‐  borplace.  It  was  not  just  me,  although  at  the  ;me  I  didn’t  really  know  it.  ”    

a;  mi  "‘  U;  ,_  wt  «’0'  Ia‘mfamwfi-­‐ww“  I.  4!;  i.  '1  W1»,  1,  3  i  .3.  ‘  

Page 12: Introduction to Optical Character Recognition (OCR) Software · 2019. 12. 31. · Project Title • Lorem Ipsum – Second level • Third level – Fourth level » Fifth level Introduction

Early modern OCR

ïll)  h  mgm  ne  m  adii:iïot  bigg  in  meGXWa-­‐'-­‐  owne  a  monster'  kind  more  ’'  z  ;  be  aeth  n  hischrefl':ſ  ety  ;ſh  lm  lil    twal  m  abi?  id  an  '  which  if  muchm  dw-­‐  ldo  not  deceiue  meas  -­‐  i  -­‐'Worth)f  to  bc  a  ſan?1ngl:'?a  gallater  offendpr  T  his  ay  p  fgx:!?tjhëcauſe!M    may  hc  enartſo;ontoſaysh?mn  becaulëitw-­‐  ;bet-­‐uerſo.-­‐  Readëz:Mhell  atjymmdlahm?sesstrlhefflllgës  your  good  -­‐  melaminst  l  lind-­‐hamamde  mt,but  ap  e  at.Apd-­‐  lea  solon  sapheMerrstuli,thanlasln  a,  a  e  r  stmp  gia  a'm  fsathars'eymu  willen-­‐on;-­‐  ue  tolouethe  V  and  mostë  

http://emop.tamu.edu/

Page 13: Introduction to Optical Character Recognition (OCR) Software · 2019. 12. 31. · Project Title • Lorem Ipsum – Second level • Third level – Fourth level » Fifth level Introduction

Thanks. Questions?

For more information visit

research-it.berkeley.edu/ocr Email: [email protected]