Inventor Mobility Index Thorsten Doherr Zentrum für Europäische Wirtschaftsforschung Center of Economic Research, Mannheim Germany

  • Slide 1
  • Inventor Mobility Index. Thorsten Doherr, Zentrum für Europäische Wirtschaftsforschung (Center of Economic Research), Mannheim, Germany
  • Slide 2
  • Mission
      • Mission: You have to decide whether two patents from inventors with the same name are actually from the same person or from different persons who share the same name.
      • Problem: Two inventors with the same name are not necessarily the same person. Defining an inventor only by name results in too much false mobility, especially for inventors with common names. Restricting the definition too much (e.g., name and home address) will cancel any mobility.
      • Tools: The complete patent data.
  • Slide 3
  • Plausibility Rules
      • Inventor: A single inventor entry in a patent document.
      • Person: All inventors with a specific name that are linked by at least one plausibility rule.
      • Two inventors with the same name are the same person
          • if they are inventing for the same applicant
          • if they have the same home address
          • if they are working with the same co-inventors
          • if one is citing the other
          • if they have patents in the same area of technology (ipc)
  • Slide 4
  • Harmonization of Applicants: SearchEngine
      • The SearchEngine is an in-house developed software package specialized in company address matching. It implements the following steps:
      • Normalizing the search fields (company name, address fields) by transforming them to uppercase, replacing special letters with their common (phonetic) representation (e.g., Ü → UE, ß → SS), compressing abbreviations (e.g., S.P.A. → SPA) and replacing special characters with blanks.
      • Creating a dictionary containing all the words of the search fields along with their occurrence. To preserve the context, every search field has its own chapter. The occurrence is the basis for the heuristic search algorithm. There are also supporting tables that link the dictionary entries back to the company table.
      • The search algorithm separates a search term into words. Each word is associated with the occurrence counter of the appropriate dictionary entry. The occurrence reflects the identification potential of the word: a word with a low occurrence has a high identity, because the resulting list of potential hits is small.
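The normalization and dictionary steps can be sketched roughly as follows. This is a minimal Python illustration, not the SearchEngine itself (which the deck says was developed in Visual FoxPro): the function names, the `SPECIAL_LETTERS` table, and the choice to count each word once per record are assumptions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical replacement table; the slide only gives UE and SS as examples.
# str.upper() already turns "ß" into "SS", so only the umlauts are listed.
SPECIAL_LETTERS = {"Ä": "AE", "Ö": "OE", "Ü": "UE"}


def normalize(text):
    """Uppercase, transliterate special letters, compress dotted
    abbreviations (S.P.A. -> SPA), and blank out special characters."""
    text = text.upper()
    for letter, replacement in SPECIAL_LETTERS.items():
        text = text.replace(letter, replacement)
    # Compress runs of single letters separated by dots, e.g. "S.P.A."
    text = re.sub(r"\b(?:[A-Z]\.){2,}",
                  lambda m: m.group(0).replace(".", ""), text)
    # Replace every remaining non-alphanumeric run with one blank
    return re.sub(r"[^A-Z0-9]+", " ", text).strip()


def build_dictionary(records, fields):
    """Build one dictionary chapter per search field.

    Returns (chapters, postings):
      chapters[field][word] -> number of records containing the word
      postings[field][word] -> ids of those records (the supporting table)
    Counting a word once per record is an assumption of this sketch.
    """
    chapters = {f: Counter() for f in fields}
    postings = {f: defaultdict(set) for f in fields}
    for rec_id, rec in records.items():
        for f in fields:
            for word in set(normalize(rec.get(f, "")).split()):
                chapters[f][word] += 1
                postings[f][word].add(rec_id)
    return chapters, postings


records = {
    1: {"name": "Lear Corporation ITALIA S.p.A."},
    2: {"name": "Lear Corporation Italia S.r.l."},
    3: {"name": "LEAR ITALIA SEATING S.p.A."},
}
chapters, postings = build_dictionary(records, ["name"])
print(normalize(records[1]["name"]))
print(chapters["name"].most_common())
```

Keeping a separate `Counter` per field mirrors the "chapter" idea: the same word can be rare in a company name but common in an address field.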
  • Slide 5
  • Harmonization of Applicants: Example of the SearchEngine Algorithm
      DICTIONARY - Chapter: APPLICANT_NAME
        ENTRY         OCCURS   IDENTITY
        CORPORATION       16   1/16   = 0.062500
        ITALIA           491   1/491  = 0.002037
        LEAR               4   1/4    = 0.250000
        SPA             6119   1/6119 = 0.000163
      Searching for: Lear Corporation ITALIA S.p.A.
        WORD          IDENTITY   SHARE
        LEAR          0.250000   79.441%
        CORPORATION   0.062500   19.860%
        ITALIA        0.002037    0.647%
        SPA           0.000163    0.052%
        SUM           0.314700  100%
      Result
        NAME                             IDENTITY
        LEAR CORPORATION ITALIA S.p.A.   100.000%
        Lear Corporation Italia S.r.l.    99.947%
        LEAR ITALIA SEATING S.p.A.        80.139%
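The arithmetic of this example can be retraced with a few lines of Python. The occurrence counts are the ones shown on the slide; the function names and the assumption that a candidate scores the summed shares of the search words it contains are illustrative, inferred from the numbers rather than taken from the SearchEngine source.

```python
# Occurrence counts from the APPLICANT_NAME chapter on the slide
OCCURS = {"CORPORATION": 16, "ITALIA": 491, "LEAR": 4, "SPA": 6119}

# Normalized search term "Lear Corporation ITALIA S.p.A."
SEARCH = ["LEAR", "CORPORATION", "ITALIA", "SPA"]


def word_weights(search_words, occurs):
    """Share of each search word in the summed identity (1 / occurrence)."""
    identity = {w: 1.0 / occurs[w] for w in search_words}
    total = sum(identity.values())
    return {w: identity[w] / total for w in search_words}


def match_identity(search_words, candidate_words, occurs):
    """Sum the shares of the search words found in the candidate
    (assumed scoring rule, inferred from the slide's numbers)."""
    weights = word_weights(search_words, occurs)
    return sum(share for w, share in weights.items() if w in candidate_words)


# Candidates, already normalized into word sets
candidates = {
    "LEAR CORPORATION ITALIA S.p.A.": {"LEAR", "CORPORATION", "ITALIA", "SPA"},
    "Lear Corporation Italia S.r.l.": {"LEAR", "CORPORATION", "ITALIA", "SRL"},
    "LEAR ITALIA SEATING S.p.A.": {"LEAR", "ITALIA", "SEATING", "SPA"},
}
for name, words in candidates.items():
    print(f"{name:32s} {match_identity(SEARCH, words, OCCURS):8.3%}")
```

Under these assumptions the first candidate scores 100% and the other two land within about 0.001 percentage points of the slide's 99.947% and 80.139%; the small gap presumably comes from rounding in the original tool. The example also shows why the rare word LEAR dominates the score while the very common SPA barely matters.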
  • Slide 6
  • Harmonization of Applicants: Finalization
      • A cutoff limit for the identity is applied to filter all results (e.g., 90%).
      • The resulting list of matching pairs is not symmetric: A can be linked to B, but B is not necessarily linked to A.
      • Linked pairs create a network.
      • Network analysis: if A is linked to B and B is linked to C, the analysis identifies the group A, B, C.
      • The network analysis is re-iterated for groups that are too large, with an increased cutoff limit for their members.
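A compact way to picture the finalization step is a union-find over the pairs that pass the cutoff, repeated with a stricter cutoff for oversized groups. The sketch below is an illustration under stated assumptions: `max_size` and `step` are hypothetical parameters (the slides do not say what counts as "too large" or by how much the cutoff is raised), and link direction is ignored when forming groups.

```python
def connected_groups(nodes, pairs):
    """Group nodes that are linked directly or transitively.
    Link direction is ignored, so A -> B suffices to join A and B."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def harmonize(scored_pairs, cutoff=0.90, max_size=3, step=0.03):
    """scored_pairs: iterable of (a, b, identity) with identity in [0, 1].
    max_size and step are hypothetical tuning parameters."""
    scored_pairs = list(scored_pairs)
    all_nodes = {n for a, b, _ in scored_pairs for n in (a, b)}
    result, pending = [], [(all_nodes, cutoff)]
    while pending:
        members, limit = pending.pop()
        links = [(a, b) for a, b, s in scored_pairs
                 if s >= limit and a in members and b in members]
        for group in connected_groups(members, links):
            if len(group) > max_size and limit + step <= 1.0:
                # Too large: analyse this group again with a stricter cutoff
                pending.append((group, limit + step))
            else:
                result.append(group)
    return result


pairs = [("A", "B", 0.95), ("B", "C", 0.91), ("C", "D", 0.92),
         ("E", "F", 0.99), ("G", "H", 0.50)]
print(sorted(sorted(g) for g in harmonize(pairs)))
```

In this toy input the chain A-B-C-D forms one group of four at the 90% cutoff; re-running it at 93% keeps only the A-B link, so the weakly chained members fall apart again.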
  • Slide 7
  • Harmonization of Inventor Names
      • The SearchEngine is of limited use here because it is most efficient with search terms consisting of multiple words; for names, the main problem is typing errors and misspellings.
      • Creating phonetic representations of the name using the Metaphone algorithm by Lawrence Philips, 1990
          • Phonetic algorithms map similar-sounding words (names) to the same representation, which can be indexed for direct database access.
          • Originally, their results were validated manually because of their strong tendency toward false positives; automated matching requires an automated validation process.
      • Automated comparison of the retrieved names with the searched name
          • The function is based on the least relative character position deltas and requires two words as parameters, so it cannot be used for index-based direct access.
          • It needs phonetic indexing to quickly generate a list of potential candidates.
          • Tolerance for typing errors increases with the length of the words, because longer words are more prone to typing errors.
  • Slide 8
  • Harmonization of Inventor Names: Example for the Metaphone Search
      Phonetic key: MR (first name) + BRTN (last name)
        FIRST NAME   LAST NAME
        MAURO        BARATONI
        MARIO        BERRETTONI
        MARIO        BERTINI
        MARIO        BERTON
        MAURO        BERTONI
        MAURO        BORDIN
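To show how a phonetic key turns candidate retrieval into a simple index lookup, here is a small sketch. It deliberately does not implement Metaphone: `phonetic_key` is a crude stand-in (collapse repeated letters, map D to T, drop vowels after the first letter) that happens to produce the same keys as the slide for these Italian names. All function names are hypothetical.

```python
import re
from collections import defaultdict


def phonetic_key(word):
    """Crude consonant skeleton, NOT the Metaphone algorithm:
    uppercase, D -> T, collapse repeated letters, drop vowels
    after the first letter."""
    word = re.sub(r"[^A-Z]", "", word.upper()).replace("D", "T")
    word = re.sub(r"(.)\1+", r"\1", word)  # BERRETTONI -> BERETONI
    if not word:
        return ""
    return word[0] + re.sub(r"[AEIOUY]", "", word[1:])


def name_key(first_name, last_name):
    """Combined key: first-name key followed by last-name key."""
    return phonetic_key(first_name) + phonetic_key(last_name)


def build_index(names):
    """Map each phonetic key to the names sharing it, for direct lookup."""
    index = defaultdict(list)
    for first, last in names:
        index[name_key(first, last)].append((first, last))
    return index


names = [("MAURO", "BARATONI"), ("MARIO", "BERRETTONI"), ("MARIO", "BERTINI"),
         ("MARIO", "BERTON"), ("MAURO", "BERTONI"), ("MAURO", "BORDIN"),
         ("THORSTEN", "DOHERR")]
index = build_index(names)
print(name_key("MAURO", "BERTONI"))
print(index[name_key("MAURO", "BERTONI")])
```

The lookup returns all six names from the slide for the key MRBRTN, which illustrates both the benefit (fast candidate generation) and the drawback (false positives such as MAURO BORDIN for MARIO BERTINI) that make a second, automated validation step necessary.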
  • Slide 9
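  • Harmonization of Inventor Names: Example for the Least Relative Character Position Deltas [Figure: CZARNITZKI compared with CHARNIZKI on a relative position scale from 0 to 1.0, with ticks at 0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875 and 1.0; the terms shown in the figure sum to 1.875.]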
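The slides do not spell out the comparison function, so the following is only one plausible reading of "least relative character position deltas", not the author's implementation. Each character gets a relative position between 0 and 1 (for the nine-letter CHARNIZKI this yields the 0.125 steps of the scale in the figure); for every character, the sketch looks for the same character in the other word at the nearest relative position, and charges a full delta of 1.0 if there is none. The normalization into a similarity score is an assumption and is not meant to reproduce the 1.875 of the figure.

```python
def relative_positions(word):
    """Spread the characters of a word evenly over the interval [0, 1]."""
    if len(word) == 1:
        return [0.0]
    return [i / (len(word) - 1) for i in range(len(word))]


def least_deltas(a, b):
    """For each character of a: smallest relative position distance to the
    same character in b, or 1.0 if b does not contain it (assumed penalty)."""
    pos_a, pos_b = relative_positions(a), relative_positions(b)
    deltas = []
    for ch, p in zip(a, pos_a):
        distances = [abs(p - q) for other, q in zip(b, pos_b) if other == ch]
        deltas.append(min(distances, default=1.0))
    return deltas


def similarity(a, b):
    """Hypothetical score: 1 minus the mean delta over both directions."""
    if not a or not b:
        return 1.0 if a == b else 0.0
    deltas = least_deltas(a, b) + least_deltas(b, a)
    return 1.0 - sum(deltas) / len(deltas)


print(relative_positions("CHARNIZKI"))
print(round(similarity("CZARNITZKI", "CHARNIZKI"), 3))
print(round(similarity("CZARNITZKI", "DOHERR"), 3))
```

Because positions are relative, a single inserted or dropped letter shifts the remaining characters less in a long word than in a short one, which is consistent with the remark that tolerance for typing errors grows with word length.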
  • Slide 10
  • Plausibility Rules
      • Inventor: A single inventor entry in a patent document.
      • Person: All inventors with a specific name that are linked by at least one plausibility rule.
      • Two inventors with the same name are the same person
          • if they are inventing for the same applicant.
          • if they have the same home address.
          • if they are working with the same co-inventors.
          • if one is citing the other.
          • if they have patents in the same area of technology (ipc).
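The five rules translate naturally into pairwise predicates over inventor entries that share a harmonized name. The sketch below is a simplified illustration: the `Inventor` record layout and all field names are hypothetical, and "same area of technology" is reduced to sharing at least one IPC code, which the slides do not specify.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Inventor:
    """One inventor entry of one patent document (hypothetical layout)."""
    patent: str
    name: str
    applicant: str = ""
    address: str = ""
    coinventors: frozenset = frozenset()
    cites: frozenset = frozenset()   # patents cited by this patent
    ipc: frozenset = frozenset()     # IPC codes of this patent


# Each rule compares two inventor entries that share the same name
RULES = {
    "same applicant": lambda a, b: bool(a.applicant) and a.applicant == b.applicant,
    "same home address": lambda a, b: bool(a.address) and a.address == b.address,
    "co-inventor": lambda a, b: bool(a.coinventors & b.coinventors),
    "citation": lambda a, b: b.patent in a.cites or a.patent in b.cites,
    "ipc": lambda a, b: bool(a.ipc & b.ipc),
}


def plausible_links(inventors, rules=RULES):
    """Return (patent_a, patent_b, matched_rules) for every pair of
    same-name inventor entries linked by at least one rule."""
    links = []
    for a, b in combinations(inventors, 2):
        if a.name != b.name:
            continue
        matched = [label for label, rule in rules.items() if rule(a, b)]
        if matched:
            links.append((a.patent, b.patent, matched))
    return links


inventors = [
    Inventor("P1", "MARIO ROSSI", applicant="X", address="Via A"),
    Inventor("P2", "MARIO ROSSI", applicant="X", address="Via B"),
    Inventor("P3", "MARIO ROSSI", applicant="Y", address="Via C",
             cites=frozenset({"P1"})),
    Inventor("P4", "LUCA BIANCHI", applicant="X", address="Via A"),
]
for link in plausible_links(inventors):
    print(link)
```

Here P1 and P2 are linked by the same applicant and P1 and P3 by a citation, so all three MARIO ROSSI entries end up as one person once the links are grouped by the network analysis, while P4 is never compared because the name differs.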
  • Slide 11
  • All Patents of an Inventor Name [Figure: patents numbered 1 to 22]
  • Slide 12
  • The Same Applicant Rule [Figure: patents numbered 1 to 22]
  • Slide 13
  • The Same Home Address Rule [Figure: patents numbered 1 to 22]
  • Slide 14
  • The Co-Inventor Rule [Figure: patents numbered 1 to 22]
  • Slide 15
  • The Citation Rule [Figure: patents numbered 1 to 22]
  • Slide 16
  • The IPC Rule [Figure: patents numbered 1 to 22]
  • Slide 17
  • Italian Inventor Mobility Index
      • Reported quantities: patents from Italian applicants and inventors; different harmonized inventor names; nodes after applying the same applicant rule; nodes after applying the same home address rule (53316); nodes after applying the co-inventor rule; nodes after applying the citation rule; nodes after applying the ipc rule.
      • Remaining counts as listed on the slide: 123356, 49101, 60268, 53572, 52504, 50276.
      • Main Database and Citations: Espace Bulletin (March 2010), EPO Patstat (September 2010), OECD.
      • Development: Microsoft Visual FoxPro 9.0
  • Slide 18
  • Traversal of a Network Table [Figure: nodes numbered 1 to 8]
      Network table
        FROM   TO
        1      2
        1      5
        2      1
        2      5
        2      7
        5      1
        5      2
        6      7
        7      2
        7      6
      Result
        GROUP   MEMBER
        1       1
        1       2
        1       5
        1       7
        1       6
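The traversal on this slide can be retraced with a breadth-first search over the FROM/TO table. The sketch below is an illustration in Python rather than the original FoxPro code; the function name and the convention of naming each group after its first member are assumptions. It follows links only in the stored FROM to TO direction, which is sufficient here because the table lists every link in both directions.

```python
from collections import deque

# FROM/TO rows of the network table shown on the slide
LINKS = [(1, 2), (1, 5), (2, 1), (2, 5), (2, 7),
         (5, 1), (5, 2), (6, 7), (7, 2), (7, 6)]


def traverse(nodes, links):
    """Breadth-first traversal of a FROM/TO table.
    Returns (group, member) rows; a group is named after its first member."""
    neighbours = {n: [] for n in nodes}
    for src, dst in links:
        neighbours[src].append(dst)

    rows, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            rows.append((start, node))
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return rows


rows = traverse(range(1, 9), LINKS)
for group, member in rows:
    print(group, member)
```

Starting from node 1, the search visits 2 and 5, then 7 via node 2, then 6 via node 7, giving the member order 1, 2, 5, 7, 6 of the GROUP/MEMBER table; the unlinked nodes 3, 4 and 8 each form a group of their own.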