Quality of Classification

Quality of Classification. Optimum: All documents pertaining to specific technical area (concept) are found by classification search What to achieve ?


What to achieve ?

Optimum: All documents pertaining to a specific technical area (concept) are found by classification search

Recall = (# retrieved relevant documents) / (# existing relevant documents) = 1

For concepts defined in IPC:

Priority 1: documents have all appropriate symbols

Priority 2 (efficiency): documents have no inappropriate symbols
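The recall target can be made concrete with a small calculation. The sketch below is illustrative only: the document identifiers are hypothetical, and reading "efficiency" as precision (the share of retrieved documents that are relevant) is an assumption, not something the slide states.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Share of the existing relevant documents that the search retrieved."""
    if not relevant:
        raise ValueError("no relevant documents defined")
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Share of the retrieved documents that are relevant (assumed 'efficiency')."""
    if not retrieved:
        raise ValueError("search retrieved nothing")
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical document identifiers, for illustration only.
relevant = {"D1", "D2", "D3", "D4"}
retrieved = {"D1", "D2", "D3", "D9"}

print(recall(retrieved, relevant))     # 0.75: one relevant document was missed
print(precision(retrieved, relevant))  # 0.75: one retrieved document is noise
```

A missing symbol lowers recall, while an inappropriate symbol lowers precision, which matches the order of the two priorities.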


Phenomenology of quality issues

document is unclassified
has wrong / inappropriate classification
has outdated / invalid classification
non-exhaustive / incomplete classification
  > appropriate symbols are missing
  > given symbols are not specific enough
varying classifications of family members
excessive classification


Different aspects

individual document / publication
- classification by publishing IPO
- and by other IPOs, e.g. EPO > ECLA, DPMA > "ICP", JPO, … ?
> examiners create their own search files

different publication levels:
- unexamined (unsearched) applications
- granted patents

families: in MCD reclassification at family level

data in different databases


Unclassified documents

Published before 1.1.2006:

many documents in MCD still unclassified / not reclassified:

92% of all documents in MCD*

87% of all documents of EPO members

Published after 1.1.2006:

97% of all documents in MCD

91% of all WO

each week 6 - 8% of WO publications are not classified at all

*cf IPC/CE/40/4


Unclassified WO documents

[Figure: percentage of unclassified WO documents per week (axis 0.0% to 12.0%) and number of unclassified WO documents per week (axis 0 to 400), by publication week from 06.07.06 to 17.01.08]


Unclassified WO documents

Publication week 50 (13.12.2007): 260 of 3272 (7.9%)

ISA:
EP 218 (84%)
KR 27 (10%)
AU 5
US 5
RU 2
SE 2
CA 1

Receiving Office:
US 177
IB 31
EP 26
GB 9
KR 3
DE 2
FR 2
IL 2
:

Lesson : There are still many documents without any valid classification

> Top priority: All documents should have at least one valid classification
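The week-50 figures can be re-derived from the per-ISA counts. The snippet below is a minimal sketch that uses only the numbers quoted on this slide; the variable names are illustrative.

```python
# Counts of unclassified WO documents per ISA, publication week 50 (13.12.2007).
total_wo = 3272
by_isa = {"EP": 218, "KR": 27, "AU": 5, "US": 5, "RU": 2, "SE": 2, "CA": 1}

unclassified = sum(by_isa.values())
print(f"{unclassified} of {total_wo} ({unclassified / total_wo:.1%})")

# Share of each ISA among the unclassified documents, largest first.
for isa, count in sorted(by_isa.items(), key=lambda kv: -kv[1]):
    print(f"{isa}: {count} ({count / unclassified:.0%})")
```

The first line printed is `260 of 3272 (7.9%)`, and the EP and KR shares come out at 84% and 10%, in line with the slide.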


Wrong classification

A61N 1/00 Electrotherapy; Circuits therefor

courtesy of M. Meier (Audi)


Wrong classification

B60K Arrangement or mounting of propulsion units or of transmissions in vehicles

Lesson : Completely wrong classifications do occur

courtesy of M. Meier (Audi)


Wrong classification

Lesson : Typos may occur; flaws of concordance tables

Example: WO2007126503

ISR: G01L 19/02

Espacenet: G10L 19/02

Wrong classifications are difficult to investigate because they are difficult to find; feedback by users is needed
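Typos of this kind can be screened for automatically when the same document carries symbols from two sources. The sketch below flags pairs that differ only by two swapped neighbouring characters; the function names are hypothetical, and this check covers only one kind of typo.

```python
def normalize(symbol: str) -> str:
    """Remove whitespace and upper-case, so 'G01L 19/02' equals 'g01l19/02'."""
    return "".join(symbol.split()).upper()


def is_adjacent_transposition(a: str, b: str) -> bool:
    """True if b equals a with exactly one pair of neighbouring characters swapped."""
    a, b = normalize(a), normalize(b)
    if len(a) != len(b) or a == b:
        return False
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (
        len(diff) == 2
        and diff[1] == diff[0] + 1
        and a[diff[0]] == b[diff[1]]
        and a[diff[1]] == b[diff[0]]
    )


# Symbols quoted on the slide for WO2007126503 (ISR vs. Espacenet).
print(is_adjacent_transposition("G01L 19/02", "G10L 19/02"))  # True
```

A hit is only a hint that one of the two records may contain a typo; it does not say which record is correct.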


Outdated / invalid classification

Business methods: G06F 17/60 > G06Q [2006.01]

in Espacenet: 0 WO docs with a:G06F17/60

in Patentscope: 1506 WO docs with G06F17/60
- e.g. WO2007004271 reclassified in Espacenet only to ECLA

Lesson : Reclassification following revision is still incomplete

Lesson : Classification data may be different in different databases

in Espacenet: many non-PCT-minimum documents are not reclassified
- e.g. CZ, UY, NZ, AR

not all PCT minimum documentation is reclassified
- e.g. only 678 of 14543 KR docs reclassified in ECLA/IPC
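Records that still carry a superseded symbol can be found with a simple lookup. The sketch below is an assumption-laden illustration: the table holds only the G06F 17/60 entry named on this slide, the record is hypothetical, and a real check would load a complete concordance.

```python
# Superseded symbol -> area that replaced it. Only the entry named on this
# slide is listed; a full concordance table is assumed to exist elsewhere.
SUPERSEDED = {"G06F17/60": "G06Q"}


def normalize(symbol: str) -> str:
    """Remove whitespace and upper-case, so 'G06F 17/60' matches 'G06F17/60'."""
    return "".join(symbol.split()).upper()


def find_outdated(symbols: list[str]) -> dict[str, str]:
    """Map each outdated symbol in `symbols` to the area that replaced it."""
    return {
        s: SUPERSEDED[normalize(s)]
        for s in symbols
        if normalize(s) in SUPERSEDED
    }


# Hypothetical record, for illustration only.
print(find_outdated(["G06F 17/60", "G06F 19/00"]))  # {'G06F 17/60': 'G06Q'}
```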


Outdated / invalid classification

Traditional medicine: A61K 35/78 > A61K 36/.. [2006.01]

in Espacenet: 10413 docs still have 35/78 as ECLA
only 7412 thereof have 36/..

Lesson : Reclassification to valid IPC incomplete

Further example WO1998039019
in Espacenet: A61K 36/02 as IPC-AL
              A61K 35/80 as ECLA
Patentscope: A61K 35/80 as IPC

Lesson : Classification data may be different in different databases
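Differences between databases can be listed by comparing the symbol sets recorded for the same document. The sketch below uses the IPC data quoted for WO1998039019; the function name and the normalization rule are assumptions made for illustration.

```python
def normalize(symbol: str) -> str:
    """Remove whitespace and upper-case so formatting differences do not count."""
    return "".join(symbol.split()).upper()


def compare(a: set[str], b: set[str]) -> tuple[list[str], list[str]]:
    """Return (symbols only in a, symbols only in b), normalized and sorted."""
    na = {normalize(s) for s in a}
    nb = {normalize(s) for s in b}
    return sorted(na - nb), sorted(nb - na)


# IPC data for WO1998039019 as quoted on the slide.
espacenet_ipc = {"A61K 36/02"}
patentscope_ipc = {"A61K 35/80"}

only_espacenet, only_patentscope = compare(espacenet_ipc, patentscope_ipc)
print(only_espacenet)    # ['A61K36/02']
print(only_patentscope)  # ['A61K35/80']
```

Two empty lists would mean that both databases agree on the document.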


Varying classifications in family

Example: Aircraft cargo loading logistics system

US 2005246132 A1 (3.11.2005)
US 7100827 B2 (5.9.2006)
DE 102005019194 A1 (24.11.2005)
FR 2871269 A1 (9.12.2005)

Classification data on front page

US A1 US B2 DE A1 FR A1

B64C 1/22 G06F 19/00 G06F 17/60 G06F 19/00

G06K 15/00 G07C 11/00 G06F 17/60

Lesson : Classification of granted patents may be very different

Lesson : Assessment of main classification varies


Varying classifications in family

US A1 US B2 DE A1 FR A1 Espace IPC Espace ECLA Depatis PatFT

B64C 1/20 X X X

B64C 1/22 X X X

B64D 9/00 X X X

B64D 9/00A X

G06K 15/00 X X

G06Q 10/00

G06Q 10/00D X

G06F 17/60 X X X

G06F 19/00 X X X X X

G07C 11/00 X X X

Lesson : classification data from subsequent publications may not be in MCD

Lesson : some reclassification data may not be in MCD; exist as ECLA only


Varying classifications of single document

Example: WO2007126503

ECLA: G01L 19/00B (roll up to IPC: G01L 19/00)

IPC: G01L 19/02

Lesson : different views of different classifiers

US7258017 B1 (granted family member)

IPC: G01L 19/04

Lesson : classification of granted patents may be different
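The "roll up" from ECLA to IPC can be sketched as dropping the ECLA extension. This is a simplification that assumes an ECLA symbol is an IPC group followed by an extension beginning with a letter, as in the examples in this deck; it cannot recognise ECLA subdivisions that look like ordinary IPC subgroups. The function name is hypothetical.

```python
import re

# Subclass (e.g. G01L), main group, subgroup digits, then any ECLA extension.
_SYMBOL = re.compile(r"^([A-H]\d{2}[A-Z])\s*(\d+)/(\d+)(.*)$")


def roll_up(ecla: str) -> str:
    """Return the IPC-style group obtained by stripping the ECLA extension."""
    match = _SYMBOL.match(ecla.strip().upper())
    if match is None:
        raise ValueError(f"not a recognisable symbol: {ecla!r}")
    subclass, main_group, subgroup, _extension = match.groups()
    return f"{subclass} {main_group}/{subgroup}"


print(roll_up("G01L 19/00B"))  # G01L 19/00
print(roll_up("G06Q 10/00D"))  # G06Q 10/00
```

For WO2007126503 the rolled-up ECLA symbol (G01L 19/00) still differs from the IPC symbol given (G01L 19/02), which is the discrepancy the slide points out.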


Current problems in classification (I): IPC consistency

• KR20070005367 A (Prio.: KR20050060661)
  Multifocal lens and manufacture method thereof
  IPC (AL): G02B3/10

• JP2007017937 A (Prio.: KR20050060661)
  Multifocal lens and method for manufacturing the same
  IPC (AL): G02F1/13; G02B3/14; G02F1/1334

• US2007008599 A (Prio.: KR20050060661)
  Multifocal lens and method for manufacturing the same
  IPC (AL): G02B5/32

• CN1892258 A (Prio.: KR20050060661)
  Multifocal lens and method for manufacturing the same
  IPC (AL): G02B3/10

• EP1742100 A1 (Prio.: KR20050060661)
  Multifocal lens and method for manufacturing the same
  IPC (AL): G02F1/1334

Lesson : classifiers may have different views of subject matter to be classified or interpret IPC groups differently

by courtesy of H. Wongel
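The spread within this family can be quantified by pooling the symbols of all members and counting how many members assign each one. The sketch below uses the five family members listed on this slide; the variable names are illustrative.

```python
from collections import Counter

# IPC (AL) symbols of the family with priority KR20050060661, as listed above.
family = {
    "KR20070005367 A": {"G02B3/10"},
    "JP2007017937 A": {"G02F1/13", "G02B3/14", "G02F1/1334"},
    "US2007008599 A": {"G02B5/32"},
    "CN1892258 A": {"G02B3/10"},
    "EP1742100 A1": {"G02F1/1334"},
}

# How many family members carry each symbol.
counts = Counter(s for symbols in family.values() for s in symbols)
# Symbols shared by every family member.
common = set.intersection(*family.values())

for symbol, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{symbol}: {n} of {len(family)} members")
print("shared by all members:", sorted(common))
```

No symbol is shared by all five members, and no symbol is assigned by more than two of them.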


Non-exhaustive classification

Example: Secondary scheme A01P [2006.01]

"Biocidal, pest repellant, … activity of chemical compounds"

Espacenet:

not in ECLA !

         A01P     EP            A01N     EP
total    43361    1054 (2%)     99994    23330 (24%)
2007     2104     114 (5%)      10328    1040 (10%)

Lesson : incompatibility of IPC and ECLA may cause non-exhaustive classification


Non-exhaustive classification

Example: EP1881839

ECLA: A61K 36/487

IPC: A61K 36/00

Lesson : classifications could be more specific

Lesson : relevant classifications may not be given / available as IPC

Example: A61K 36/..

ECLA: 22440 documents

IPC: only 17847 thereof have a:A61K 36/..

Example: C12Q 1/68

Espacenet: > 100,000 docs

ECLA: > 40 subgroups

IPC: 0 subgroups


Causes/sources for deficiencies

"wrong" or varying intellectual classification:
- rules too complicated
- drawbacks of classification scheme (too much overlap)
- interpretation of subject matter
- differing national practice
- lack of expertise or diligence; time pressure

granted claims may differ

incompatibility ECLA - IPC; USPC concordance tables

lack or delay of reclassification:
- insufficient resources for intellectual reclassification

data exchange / management problems

data input (typos)


Options for improvement

on IPO level:
- allocate resources
- adapt / harmonize classification practice / training
- develop classification assistance tools

on user level:
- knowing deficiencies > adapt search strategies

on IPC level:
- improve user-friendliness (e.g. definitions)
- simplify IPC scheme, rules

More liberal approach when classifying ?
One more symbol better than one symbol missing ?
Do we need to be worried about varying classifications ?


Options for improvement

On MCD / database level:

crosscheck content of databases

pooling / compiling of classification data (in one searchable field / on family level ?) of
- classification data of family members
- subsequent publications
- other sources (DE: ICP, …)

processing such compilations of classifications of different origin, e.g.:

compare classification of subsequent publications (A, B, ..)

> create "trusted" classifications (e.g. class (A) = class (B)) ?
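One possible reading of class (A) = class (B) is a per-symbol comparison: symbols given on both the A and the B publication are treated as trusted, and the rest are queued for review. The sketch below follows that reading; the rule, the function name and the sample symbols are all assumptions made for illustration.

```python
def split_trusted(class_a: set[str], class_b: set[str]) -> tuple[set[str], set[str]]:
    """Return (trusted, to_review) for an A publication and its B publication."""
    trusted = class_a & class_b    # given on both publication levels
    to_review = class_a ^ class_b  # given on only one of them
    return trusted, to_review


# Hypothetical symbol sets, for illustration only.
a_publication = {"G06F 19/00", "G07C 11/00"}
b_publication = {"G06F 19/00", "G06K 15/00"}

trusted, to_review = split_trusted(a_publication, b_publication)
print(sorted(trusted))    # ['G06F 19/00']
print(sorted(to_review))  # ['G06K 15/00', 'G07C 11/00']
```

The same comparison could be extended to more than two sources, for example by requiring agreement between a minimum number of family members.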


Learn from / go WEB 2.0 ?

"Folksonomy", "social tagging", "cooperative, collaborative classification"

> include broader user community ?

e.g. any searcher ?

> implement feedback channels ?


Are you satisfied with classification in A61N 1/00 ? Yes / No

Would you like to suggest further classifications: .....................................................................

Submit

Click opens


Learn from / go WEB 2.0 ?

"Folksonomy", "social tagging", "cooperative, collaborative classification"

> include broader user community

> compile varying views, i.e. classifications

process such data; create "trusted" classifications

broader participation in scheme development, in particular definitions ? Tagging of IPC entries ?

Thank you


More liberal approach when classifying ?

One more symbol better than one symbol missing ?

Do we need to be worried about varying classifications ?

Include broader user community ? e.g. any searcher ?

Implement feedback channels ?

Create "trusted" classifications (e.g. class (A) = class (B)) ?

Top priority: all documents should have at least one valid classification

Priority 1: documents have all appropriate symbols

Priority 2: documents have no inappropriate symbols