Quality of Classification
Optimum:
All documents pertaining to specific technical area (concept) are found by classification search
What to achieve ?
Recall = (# retrieved relevant documents) / (# existing relevant documents) = 1
For concepts defined in IPC:
Priority 1: documents have all appropriate symbols
Priority 2: < > Efficiency: documents have no inappropriate symbols
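The recall target can be illustrated with a minimal sketch. The document identifiers are hypothetical, and reading "efficiency" as what is usually called precision is an assumption, not something the slides state.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Share of the existing relevant documents that the search retrieved."""
    if not relevant:
        raise ValueError("recall is undefined when no relevant documents exist")
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Share of the retrieved documents that are relevant.

    Assumption: this is used here as a stand-in for the slide's "efficiency".
    """
    if not retrieved:
        raise ValueError("precision is undefined when nothing was retrieved")
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical document identifiers, for illustration only.
relevant = {"D1", "D2", "D3", "D4"}   # all documents pertaining to the concept
retrieved = {"D1", "D2", "D3", "D5"}  # documents found by classification search

print(f"recall    = {recall(retrieved, relevant):.2f}")
print(f"precision = {precision(retrieved, relevant):.2f}")
```

A missing symbol (D4 not found) lowers recall; an inappropriate symbol (D5 found) lowers precision, which matches the ordering of Priority 1 before Priority 2.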
document is unclassified
has wrong / inappropriate classification
has outdated / invalid classification
non-exhaustive / incomplete classification
> appropriate symbols are missing
> given symbols are not specific enough
varying classifications of family members
excessive classification
Phenomenology of quality issues
Different aspects
individual document / publication
- classification by publishing IPO
- and by other IPOs, e.g. EPO > ECLA, DPMA > "ICP", JPO,… ?
> examiners create their own search files
different publication levels:
- unexamined (unsearched) applications
- granted patents
families: in MCD reclassification at family level
data in different databases
Unclassified documents
Published before 1.1.2006:
many documents in MCD still unclassified / not reclassified:
92% of all documents in MCD*
87% of all documents of EPO members
Published after 1.1.2006:
97% of all documents in MCD
91% of all WO
each week 6 - 8% of WO publications are not classified at all
*cf IPC/CE/40/4
[Chart: unclassified WO documents per publication week, 06.07.06 to 17.01.08. Axes: % unclassified WO docs / week (0.0% to 12.0%); number of unclassified WO / week (0 to 400). Series: percentage unclassified, number unclassified.]
Unclassified WO documents
Publication week 50 (13.12.2007): 260 of 3272 (7.9%)
By ISA: EP 218 (84%), KR 27 (10%), AU 5, US 5, RU 2, SE 2, CA 1
By Receiving Office: US 177, IB 31, EP 26, GB 9, KR 3, DE 2, FR 2, IL 2, ...
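The percentages for publication week 50 can be re-derived from the counts quoted above. This is only a small arithmetic check; the variable names are illustrative.

```python
# Counts for publication week 50 (13.12.2007), as quoted on the slide.
total_wo = 3272
by_isa = {"EP": 218, "KR": 27, "AU": 5, "US": 5, "RU": 2, "SE": 2, "CA": 1}

unclassified = sum(by_isa.values())
share = unclassified / total_wo
print(f"{unclassified} of {total_wo} ({share:.1%})")

# Share of the unclassified documents per International Searching Authority.
for isa, count in by_isa.items():
    print(f"{isa}: {count} ({count / unclassified:.0%})")
```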
Unclassified WO documents
Lesson : There are still many documents without any valid classification
> Top priority: All documents should have at least one valid classification
Wrong classification
A61N 1/00 Electrotherapy; Circuits therefor
courtesy of M. Meier (Audi)
Wrong classification
B60K Arrangement or mounting of propulsion units or of transmissions in vehicles
Lesson : Completely wrong classifications do occur
courtesy of M. Meier (Audi)
Wrong classification
Lesson : Typos may occur; flaws of concordance tables
Example: WO2007126503
ISR: G01L 19/02
Espacenet: G10L 19/02
Wrong classifications: difficult to investigate because difficult to find; feedback by users needed
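One way such typos could be flagged automatically is to test whether two symbols recorded for the same document differ only by a swap of neighbouring characters, as G01L and G10L do. The sketch below is an illustration under that assumption; the function names are hypothetical and it does not describe any existing tool.

```python
def normalise(symbol: str) -> str:
    """Remove whitespace and upper-case a classification symbol."""
    return "".join(symbol.split()).upper()


def is_adjacent_transposition(a: str, b: str) -> bool:
    """True if b equals a with exactly one pair of neighbouring characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (
        len(diff) == 2
        and diff[1] == diff[0] + 1
        and a[diff[0]] == b[diff[1]]
        and a[diff[1]] == b[diff[0]]
    )


# Symbols for WO2007126503 as quoted on the slide.
isr = normalise("G01L 19/02")
espacenet = normalise("G10L 19/02")
print(is_adjacent_transposition(isr, espacenet))
```

A positive result only marks a candidate for manual review; both symbols may be valid IPC entries, so the check cannot decide which one is correct.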
Outdated / invalid classification
Business methods: G06F 17/60 > G06Q [2006.01]
in Espacenet: 0 WO docs with a:G06F17/60
in Patentscope: 1506 WO docs with G06F17/60
- e.g. WO2007004271 reclassified in Espacenet only to ECLA
Lesson : Reclassification following revision is still incomplete
Lesson : Classification data may be different in different databases
in Espacenet: many non-PCT-minimum documents are not reclassified
- e.g. CZ, UY, NZ, AR
not all PCT minimum documentation is reclassified
- e.g. only 678 of 14543 KR docs reclassified in ECLA/IPC
Outdated / invalid classification
Traditional medicine: A61K 35/78 > A61K 36/.. [2006.01]
in Espacenet: 10413 docs still have 35/78 as ECLA
only 7412 thereof have 36/..
Lesson : Reclassification to valid IPC incomplete
Further example WO1998039019
in Espacenet: A61K 36/02 as IPC-AL
A61K 35/80 as ECLA
Patentscope: A61K 35/80 as IPC
Lesson : Classification data may be different in different databases
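A cross-check between databases can be sketched as a set comparison of the IPC symbols each database holds for one document. The values are those quoted for WO1998039019; the data structure and function name are assumptions made for illustration.

```python
def compare_symbols(first: set[str], second: set[str]) -> dict[str, list[str]]:
    """Split two symbol sets into shared and database-specific symbols."""
    return {
        "common": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# IPC symbols recorded for WO1998039019, as quoted on the slide.
ipc_by_database = {
    "Espacenet": {"A61K 36/02"},
    "Patentscope": {"A61K 35/80"},
}

result = compare_symbols(ipc_by_database["Espacenet"], ipc_by_database["Patentscope"])
for key, symbols in result.items():
    print(key, symbols)
```

An empty "common" list, as in this case, indicates that a search by IPC symbol returns the document in one database but not in the other.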
Example: Aircraft cargo loading logistics system
US 2005246132 A1 (3.11.2005)
US 7100827 B2 (5.9.2006)
DE 102005019194 A1 (24.11.2005)
FR 2871269 A1 (9.12.2005)
Classification data on front page
US A1 US B2 DE A1 FR A1
B64C 1/22 G06F 19/00 G06F 17/60 G06F 19/00
G06K 15/00 G07C 11/00 G06F 17/60
Lesson : Classification of granted patents may be very different
Lesson : Assessment of main classification varies
Varying classifications in family
US A1 | US B2 | DE A1 | FR A1 | Espace IPC | Espace ECLA | Depatis | PatFT
B64C 1/20 X X X
B64C 1/22 X X X
B64D 9/00 X X X
B64D 9/00A X
G06K 15/00 X X
G06Q 10/00
G06Q 10/00D X
G06F 17/60 X X X
G06F 19/00 X X X X X
G07C 11/00 X X X
Lesson : classification data from subsequent publications may not be in MCD
Lesson : some reclassification data may not be in MCD; exist as ECLA only
Varying classifications in family
Varying classifications of single document
Example: WO2007126503
ECLA: G01L 19/00B (roll up to IPC: G01L 19/00)
IPC: G01L 19/02
Lesson : different views of different classifiers
US7258017 B1 (granted family member)
IPC: G01L 19/04
Lesson : classification of granted patents may be different
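The roll-up of an ECLA symbol to its parent IPC group can be sketched as stripping the ECLA extension. This assumes the extension always starts with a letter appended to an IPC group number, as in the examples on these slides; an actual roll-up would rely on official concordance data, and the function name is hypothetical.

```python
import re

# Subclass (e.g. G01L), main group, and the digits of the subgroup.
# Assumption: anything after those digits is an ECLA extension.
_IPC_PREFIX = re.compile(r"^([A-H]\d{2}[A-Z])\s*(\d{1,4})/(\d{2,6})")


def roll_up_to_ipc(ecla_symbol: str) -> str:
    """Return the IPC group obtained by dropping the ECLA extension."""
    match = _IPC_PREFIX.match(ecla_symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a recognisable symbol: {ecla_symbol!r}")
    subclass, main_group, subgroup = match.groups()
    return f"{subclass} {main_group}/{subgroup}"


for ecla in ("G01L 19/00B", "B64D 9/00A", "G06Q 10/00D"):
    print(ecla, ">", roll_up_to_ipc(ecla))
```

For WO2007126503 the rolled-up ECLA symbol is G01L 19/00, while the IPC symbol given is G01L 19/02, so the two classifications differ even after roll-up.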
Current problems in classification (I): IPC consistency
• KR20070005367 A (Prio.: KR20050060661)
  Multifocal lens and manufacture method thereof
  IPC (AL): G02B3/10
• JP2007017937 A (Prio.: KR20050060661)
  Multifocal lens and method for manufacturing the same
  IPC (AL): G02F1/13; G02B3/14; G02F1/1334
• US2007008599 A (Prio.: KR20050060661)
  Multifocal lens and method for manufacturing the same
  IPC (AL): G02B5/32
• CN1892258 A (Prio.: KR20050060661)
  Multifocal lens and method for manufacturing the same
  IPC (AL): G02B3/10
• EP1742100 A1 (Prio.: KR20050060661)
  Multifocal lens and method for manufacturing the same
  IPC (AL): G02F1/1334
Lesson : classifiers may have different views of subject matter to be classified or interpret IPC groups differently
by courtesy of H. Wongel
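The inconsistency within this family can be quantified with a short sketch that pools the IPC (AL) symbols of all five members sharing priority KR20050060661. The symbols are those listed above; the pooling approach itself is only an illustration.

```python
from collections import Counter

# IPC (AL) symbols per family member, as quoted on the slide.
family = {
    "KR20070005367 A": {"G02B3/10"},
    "JP2007017937 A": {"G02F1/13", "G02B3/14", "G02F1/1334"},
    "US2007008599 A": {"G02B5/32"},
    "CN1892258 A": {"G02B3/10"},
    "EP1742100 A1": {"G02F1/1334"},
}

pooled = set().union(*family.values())          # every symbol given to any member
shared = set.intersection(*family.values())     # symbols given to all members
support = Counter(s for symbols in family.values() for s in symbols)

print("pooled symbols:", sorted(pooled))
print("shared by all members:", sorted(shared))
for symbol, count in support.most_common():
    print(f"{symbol}: given by {count} of {len(family)} members")
```

No symbol is shared by all five members, and no symbol is given by more than two of them, so a search on any single symbol misses most of the family.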
Non-exhaustive classification
Example: Secondary scheme A01P [2006.01]
"Biocidal, pest repellant, … activity of chemical compounds"
Espacenet:
not in ECLA !
        A01P     EP           A01N     EP
total   43361    1054 (2%)    99994    23330 (24%)
2007    2104     114 (5%)     10328    1040 (10%)
Lesson : incompatibility of IPC and ECLA may cause non-exhaustive classification
Non-exhaustive classification
Example: EP1881839
ECLA: A61K 36/487
IPC: A61K 36/00
Lesson : classifications could be more specific
Lesson : relevant classifications may not be given / available as IPC
Example: A61K 36/..
ECLA: 22440 documents
IPC: only 17847 thereof have a:A61K 36/..
Example: C12Q 1/68
Espacenet: > 100.000 docs
ECLA: > 40 subgroups
IPC: 0 subgroups
Causes / sources of deficiencies
"wrong" or varying intellectual classification:
- rules too complicated
- drawbacks of classification scheme (too much overlap)
- interpretation of subject matter
- differing national practice
- lack of expertise or diligence; time pressure
granted claims may differ
incompatibility ECLA - IPC; USPC concordance tables
lack or delay of reclassification:
- insufficient resources for intellectual reclassification
data exchange / management problems
data input (typos)
Options for improvement
on IPO level:
- allocate resources
- adapt / harmonize classification practice / training
- develop classification assistance tools
on user level:
- knowing deficiencies > adapt search strategies
on IPC level:
- improve user-friendliness (e.g. definitions)
- simplify IPC scheme, rules
More liberal approach when classifying ?
One more symbol better than one symbol missing ?
Do we need to be worried about varying classifications ?
Options for improvement
On MCD / database level:
crosscheck content of databases
pooling / compiling of classification data (in one searchable field / on family level ?) of
- classification data of family members
- subsequent publications
- other sources (DE: ICP,…)
processing such compilations of classifications of different origin, e.g.:
compare classification of subsequent publications (A, B, ..)
> create "trusted" classifications (e.g. class (A) = class (B)) ?
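One possible reading of a "trusted" classification is a symbol that appears on both the A and the B publication of the same application. The sketch below follows that reading; the symbol sets are hypothetical examples, and the function name is an assumption.

```python
def trusted_symbols(class_a: set[str], class_b: set[str]) -> set[str]:
    """Symbols confirmed by both publication levels (class (A) = class (B))."""
    return class_a & class_b


# Hypothetical symbol sets for one application, for illustration only.
class_a = {"G06F 19/00", "B64C 1/22"}   # unexamined application (A)
class_b = {"G06F 19/00", "G07C 11/00"}  # granted patent (B)

trusted = trusted_symbols(class_a, class_b)
untrusted = (class_a | class_b) - trusted

print("trusted:", sorted(trusted))
print("needs review:", sorted(untrusted))
```

Symbols outside the intersection are not necessarily wrong, since granted claims may differ from the application; they are simply not confirmed by a second classifier.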
Learn from / go WEB 2.0 ?
"Folksonomy", "social tagging", "cooperative, collaborative classification"
> include broader user community ? e.g. any searcher ?
> implement feedback channels ?
Example feedback form (opened by a click):
Are you satisfied with classification in A61N 1/00 ? Yes / No
Would you like to suggest further classifications: .....................................................................
[Submit]
Learn from / go WEB 2.0 ?
"Folksonomy", "social tagging", "cooperative, collaborative classification"
> include broader user community
> compile varying views, i.e. classifications
process such data; create "trusted" classifications
broader participation in scheme development, in particular definitions ?
Tagging of IPC entries ?
Thank you
More liberal approach when classifying ?
One more symbol better than one symbol missing ?
Do we need to be worried about varying classifications ?
Include broader user community ? e.g. any searcher ?
Implement feedback channels ?
Create "trusted" classifications (e.g. class (A) = class (B)) ?
Top priority: all documents should have at least one valid classification
Priority 1: documents have all appropriate symbols
Priority 2: documents have no inappropriate symbols