ST22 revision proposal June-2006 WIPO-SDWG meeting Geneva

Agenda

• Reasons for the revision of ST.22
  – Age of current standard
  – Expected benefits
  – PCT International Bureau experience
  – Examples of pages difficult to OCR
  – Conclusion

• Discussion / Questions

Age of current standard

• Inadequate title: “Recommendation for the presentation of patent applications typed in optical character recognition (OCR) format”

• Contains valid recommendations, but expressed in outdated terminology (ribbons, typewriters, …). Some recommendations need to be made more precise.

• A few new recommendations should be added to take into account the progress in OCR technology over the last 10 years.

• Not followed closely enough by agents/applicants: some promotion is required.

Expected benefits

• Experience shows that if documents follow simple layout rules, automatic OCR procedures are effective enough to yield satisfactory results for full-text search purposes (i.e. an average accuracy above 98.5%).

• An updated ST.22 standard would lead to:
  – Significant cost reductions in the OCR procedures performed by the regional/national IP offices and the IB
  – Better quality in the full-text documents published from OCR output
  – More efficient and precise search procedures for the IP community
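
The slides do not say how the 98.5% average accuracy is measured. A common measure is character accuracy: one minus the edit (Levenshtein) distance between the OCR output and a manually verified reference text, divided by the reference length. The Python sketch below assumes that definition and averages it per page; the function names and the per-page averaging are our own assumptions, not a description of the PCT system.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Assumed metric: 1 - edit distance / reference length, floored at 0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = levenshtein(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


ACCURACY_TARGET = 0.985  # the 98.5% average quoted on the slide


def meets_target(pages: list[tuple[str, str]]) -> bool:
    """Average the per-page accuracy of (reference, OCR output) pairs
    and compare it with the target."""
    if not pages:
        raise ValueError("need at least one page")
    scores = [character_accuracy(ref, out) for ref, out in pages]
    return sum(scores) / len(scores) >= ACCURACY_TARGET
```

For example, a 32-character line with two misread characters ("c1aims", "invent1on") scores 1 − 2/32 = 93.75%. The 98.5% average leaves room for only about 15 misread characters per 1,000, which is why layout features that trigger systematic misreads matter so much.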

PCT International Bureau experience

• An internal automatic OCR system and a Quality Checking system have been developed by the PCT

• The system was tested for 6 months and then put into production. It has been in operation since 1 January 2006 and OCRs the pamphlets published weekly by the PCT.

Internal OCR: key points

• Use an off-the-shelf commercial product and adapt it to the PCT needs

• Build a generic, scalable service so that the OCR function can be used from different applications (online or batch) and meet future PCT needs

• Operate the service in-house to reduce costs and gain flexibility in the publication process (discontinue the outsourcing contract)
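
One way to read "generic and scalable service" is a single entry point used by both online callers and the weekly batch run, with the commercial OCR product hidden behind an adapter. The Python sketch below illustrates that shape only; `OcrEngine`, `OcrService`, `PageResult` and the stand-in `EchoEngine` are hypothetical names, and running the engine on a thread pool assumes the underlying product tolerates concurrent calls.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass
class PageResult:
    page_id: str
    text: str
    confidence: float  # engine-reported confidence in [0, 1]


class OcrEngine(Protocol):
    """Hypothetical adapter interface around an off-the-shelf OCR product."""
    def recognize(self, page_id: str, image: bytes) -> PageResult: ...


class OcrService:
    """One entry point shared by online applications and batch jobs."""

    def __init__(self, engine: OcrEngine, workers: int = 4) -> None:
        self.engine = engine
        self.workers = workers

    def recognize_page(self, page_id: str, image: bytes) -> PageResult:
        # Online use: one page, answered synchronously.
        return self.engine.recognize(page_id, image)

    def recognize_batch(self, pages: Iterable[tuple[str, bytes]]) -> list[PageResult]:
        # Batch use: the same per-page call fanned out over a worker pool;
        # pool.map returns results in input order.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(lambda p: self.recognize_page(*p), pages))


class EchoEngine:
    """Stand-in engine for trying the service: 'recognizes' ASCII bytes."""
    def recognize(self, page_id: str, image: bytes) -> PageResult:
        return PageResult(page_id, image.decode("ascii"), 1.0)
```

Keeping the vendor product behind the adapter makes replacing or upgrading it a local change, and because batch mode simply maps the online call, the two paths cannot drift apart.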

Internal OCR: key points

• OCR the description and claims sections of the PCT pamphlets published each week (circa 50’000 pages)

• Provide the results as ST.36 XML files, which feed the indexing engines of the Patentscope site and the espacenet site (see http://www.wipo.int/pctdb/en/browse.jsp)

• Enrich the PCT electronic products with the results of the OCR (searchable PDFs added to the rule 87 DVD)
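
To illustrate the ST.36 output step, the sketch below wraps OCR'd description paragraphs and claims in ST.36-style `<description>`/`<p>` and `<claims>`/`<claim>`/`<claim-text>` elements using Python's standard `xml.etree.ElementTree`. It is a simplified sketch under our own assumptions: the `ocr-result` wrapper element and the zero-padded numbering are hypothetical, and the output has not been validated against the ST.36 DTD.

```python
import xml.etree.ElementTree as ET


def to_st36_like_xml(description_paras: list[str], claims: list[str]) -> str:
    """Wrap OCR text in a simplified, ST.36-style description/claims tree."""
    root = ET.Element("ocr-result")  # hypothetical wrapper, not an ST.36 root
    desc = ET.SubElement(root, "description")
    for i, text in enumerate(description_paras, start=1):
        # Paragraph numbering scheme is an assumption.
        ET.SubElement(desc, "p", {"num": f"{i:04d}"}).text = text
    claims_el = ET.SubElement(root, "claims")
    for i, text in enumerate(claims, start=1):
        claim = ET.SubElement(claims_el, "claim", {"num": f"{i:04d}"})
        ET.SubElement(claim, "claim-text").text = text
    # ElementTree escapes '&', '<' and '>' found in the OCR text.
    return ET.tostring(root, encoding="unicode")
```

Escaping matters here because OCR output from chemical and mathematical passages regularly contains characters such as `&` and `<` that would otherwise break the XML fed to the indexing engines.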

Internal OCR: some figures

• With our hardware configuration, OCR of a complete publication week takes around 16 hours (it runs over the weekend).

• Five staff members perform part-time quality checking every Monday (around 3 to 4 person-days per week in total) to correct the worst cases.
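
A quick check of what these figures imply, assuming the circa 50’000 weekly pages from the previous slide are all processed in the 16-hour run:

$$
\frac{50\,000\ \text{pages}}{16\ \text{h}} \approx 3\,125\ \text{pages/h} \approx 52\ \text{pages/min},
\qquad
\frac{3\,600\ \text{s/h}}{3\,125\ \text{pages/h}} \approx 1.15\ \text{s/page}
$$

At roughly one page per second the run fits comfortably in a weekend; if the weekend window is taken as 48 hours (our assumption), the run uses about a third of it.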

Quality Checking system

Some examples of difficult pages, submitted on paper or in image form, that the revised ST.22 standard should discourage:

Narrow fonts, justified paragraphs

Underline, italic, bold text

Subscripts too small

Mathematical formulae embedded in text

Handwritten text or cursive fonts

Grey or coloured backgrounds

Conclusion

We invite the SDWG:

(a) to consider the proposal to revise WIPO Standard ST.22; and

(b) to consider establishing a task for the revision of WIPO Standard ST.22 and to set up a Task Force to handle such revision.

Agenda

• Reasons for the review of ST.22
  – Age of current standard
  – Expected benefits
  – PCT International Bureau experience
  – Examples of applications difficult to OCR
  – Conclusion

• Discussion / Questions