Bio-REGNET Developing an Ontology for the U.S. Patent System Siddharth Taduri, Hang Yu, Gloria T. Lau, Kincho H. Law, Jay P. Kesan Stanford University University of Illinois Urbana-Champaign 06/13/2011


Page 1: Bio-REGNET Developing an Ontology for the U.S. Patent System

Bio-REGNET
Developing an Ontology for the U.S. Patent System

Siddharth Taduri, Hang Yu, Gloria T. Lau, Kincho H. Law, Jay P. Kesan
Stanford University
University of Illinois Urbana-Champaign
06/13/2011

Page 2: Bio-REGNET Developing an Ontology for the U.S. Patent System

Problem Statement

• Patent validity and enforcement questions involve analysis of documents in various domains – world-wide patents, PTO file wrappers, scientific publications and court documents

• The information is siloed into several diverse information sources

[Diagram: Issued Patents and Applications, Court Cases, File Wrappers, Technical Publications, Regulations and Laws]

Page 3: Bio-REGNET Developing an Ontology for the U.S. Patent System

Problem Statement

• The sources are diverse in structure, formats, semantics and syntax

• How to develop a comprehensive knowledge of patents in a particular technological space?

[Diagram: Issued Patents and Applications, Court Cases, File Wrappers, Technical Publications, and Regulations and Laws, all relating to a Specific Technical Domain]

Page 4: Bio-REGNET Developing an Ontology for the U.S. Patent System

Patent Documents

Over 7 million U.S. patents

In 2009, 485,312 patent applications were filed

Information is contained in various sections of the documents; a full-text search alone is not sufficient – other metrics such as classification, citations etc. need to be considered

Documents are available in HTML Format and can be easily parsed


Page 5: Bio-REGNET Developing an Ontology for the U.S. Patent System

Court Cases

Court Cases are not very well structured!

Comparatively more difficult to parse information

PACER – an electronic system to access databases for U.S. Courts - requires one to know party/assignee name, case number/type, etc. which may not be known

927 F.2d 1200 (1991)

AMGEN, INC., Plaintiff/Cross-Appellant,
v.
CHUGAI PHARMACEUTICAL CO., LTD., and Genetics Institute, Inc., Defendants-Appellants.

Nos. 90-1273, 90-1275.
United States Court of Appeals, Federal Circuit.

March 5, 1991.

Suggestion for Rehearing Declined May 20, 1991.
… …

Before MARKEY, LOURIE and CLEVENGER, Circuit Judges.
…
THE PATENTS
On June 30, 1987, the United States Patent and Trademark Office (PTO) issued to Dr. Rodney Hewick U.S. Patent 4,677,195, entitled "Method for the Purification of Erythropoietin and Erythropoietin Compositions" (the '195 patent). The patent claims both homogeneous EPO and compositions thereof and a method for purifying human EPO using reverse phase high performance liquid chromatography. The method claims are not before us. The relevant claims of the '195 patent are:

1. Homogeneous erythropoietin characterized by a molecular weight of about 34,000 daltons on SDS PAGE, movement as a single peak on reverse phase high performance liquid chromatography and a specific activity of at least 160,000 IU per absorbance unit at 280 nanometers.

* * * * * *

3. A pharmaceutical composition for the treatment of anemia comprising a therapeutically effective amount of the homogeneous erythropoietin of claim 1 in a pharmaceutically acceptable vehicle.

4. Homogeneous erythropoietin characterized by a molecular weight of about 34,000 daltons on SDS PAGE, movement as a single peak on reverse phase high performance liquid chromatography and a specific activity of at least about 160,000 IU per absorbance unit at 280 nanometers.

Page 6: Bio-REGNET Developing an Ontology for the U.S. Patent System

Patent File Wrappers

File Wrappers are folders which contain all documents exchanged between a patent applicant and the patent office

Every File Wrapper is different! No standardized ordering of events

The relevant information is embedded within lots of irrelevant text

File Wrappers are available as images, requiring additional processing in order to extract text

[Figure: file wrapper events and text]

Page 7: Bio-REGNET Developing an Ontology for the U.S. Patent System

Cross-Referencing

There are many aspects of these documents which can be utilized, especially the cross-referencing between the documents

PATENT

United States Patent 5,955,422
September 21, 1999

Production of erythropoietin

Abstract: Disclosed are novel polypeptides possessing part or all of the primary structural conformation and one or more of the biological properties of mammalian erythropoietin ("EPO") …

Inventors: Lin; Fu-Kuen (Thousand Oaks, CA)
Assignee: Kirin-Amgen, Inc. (Thousand Oaks, CA)
Appl. No.: 08/100,197
Filed: August 2, 1993

COURT CASE

314 F.3d 1313 (2003)
AMGEN INC., Plaintiff-Cross Appellant v. HOECHST MARION ROUSSEL, INC. (now known as Aventis Pharmaceuticals, Inc.) and Transkaryotic Therapies, Inc., Defendants-Appellants.

…Plaintiff-Cross Appellant Amgen Inc. is the owner of numerous patents directed to the production of erythropoietin ("EPO"), …alleging that TKT's Investigational New Drug Application ("INDA") infringed United States Patent Nos. 5,547,933; 5,618,698; and 5,621,080. The complaint was amended in October 1999 to include United States Patent Nos. 5,756,349 and 5,955,422, which issued after suit was filed.

FILE WRAPPER

U.S. Patent 5,955,422

Claims 61-63 are rejected under 35 U.S.C. § 103 as being unpatentable over any one of Miyake et al., 1977 (R)

…In accordance with the provisions of 37 C.F.R. §1.607, the present continuation is being filed for the purpose of

PUBLICATION DATABASE

REGULATIONS: U.S. Code Title 35, C.F.R. Title 37, M.P.E.P. …

BIOPORTAL: DOMAIN KNOWLEDGE
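One way to picture how a cross-reference from a court case to patents could be recovered is a small pattern match over the opinion text. The following is only a sketch under stated assumptions: the function name `cited_patents` and the pattern are illustrative and are not taken from the authors' scripts, and the sample sentence is the court-case excerpt quoted on this slide.

```python
import re

# Assumption: U.S. patent numbers appear in the "d,ddd,ddd" style used in the
# excerpt. Any other number written with two comma groups (for example a
# dollar amount in the millions) would also match, so a real parser would
# need more context than this sketch uses.
PATENT_NUMBER = re.compile(r"\b\d{1,2},\d{3},\d{3}\b")


def cited_patents(text):
    """Return the patent numbers mentioned in text, with commas removed."""
    return [number.replace(",", "") for number in PATENT_NUMBER.findall(text)]


excerpt = (
    "alleging that TKT's Investigational New Drug Application (\"INDA\") "
    "infringed United States Patent Nos. 5,547,933; 5,618,698; and 5,621,080. "
    "The complaint was amended in October 1999 to include United States "
    "Patent Nos. 5,756,349 and 5,955,422, which issued after suit was filed."
)
patents = cited_patents(excerpt)
```

Each extracted number could then be linked to the matching patent document, which is the kind of court-case-to-patent relation the later queries rely on.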

Page 8: Bio-REGNET Developing an Ontology for the U.S. Patent System

Basis for Developing a Patent System Ontology

Established semantics allow us to reason over the classes, properties and instances to infer new facts

Documents can be connected to form a network similar to citation networks. Only now we have not just citations, but other metadata such as co-inventorships, technological classification and other cross-domain relevancy metrics between documents (e.g., patents occurring in court cases)

Allows us to perform link analysis using algorithms such as PageRank to establish importance

Can develop rules to perform additional inferences over the knowledge
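To make the link-analysis idea concrete, here is a minimal power-iteration PageRank over a tiny document network. It is a sketch, not the authors' implementation: the damping factor, the iteration count and the node names (`patent_A`, `case_1`, and so on) are all hypothetical, and an edge simply means "this document refers to that one" (a citation, or a patent appearing in a court case).

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over {node: [nodes it links to]}."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (no outgoing links): spread its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mini network: two patents cite patent_C, and one court case
# involves patent_A and patent_C.
links = {
    "patent_A": ["patent_C"],
    "patent_B": ["patent_C"],
    "case_1": ["patent_A", "patent_C"],
    "patent_C": [],
}
ranks = pagerank(links)
most_important = max(ranks, key=ranks.get)
```

In this toy network `patent_C` collects links from every other document, so it ends up with the highest score.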

Page 9: Bio-REGNET Developing an Ontology for the U.S. Patent System

Competency Questions

Single domain:
• Return all patent documents which contain the keyword “erythropoietin” in the “claims”
• Return all court cases which involve “Amgen_Inc” as the plaintiff, the defendant, or both, and which come from the court “courtA”

Multi-domain:
• Return all patents which contain the keyword “erythropoietin” in the “claims” and which have been challenged in the courts

The complexity of the queries depends on the user’s requirements

In general, the ontology should be able to answer:
1. Textual queries
2. Metadata queries, with numeric filters
3. Multi-source queries

Page 10: Bio-REGNET Developing an Ontology for the U.S. Patent System

Class Hierarchy - I


Page 11: Bio-REGNET Developing an Ontology for the U.S. Patent System

Class Hierarchy - II


Page 12: Bio-REGNET Developing an Ontology for the U.S. Patent System

Class Hierarchy - III


Page 13: Bio-REGNET Developing an Ontology for the U.S. Patent System

Parsing the document to instantiate the Ontology

Documents are automatically parsed using a regular-expression-based script

Separate scripts are needed for each document domain

Ontology is automatically instantiated using the Protégé-OWL API

[Diagram: Case 1 hasPlaintiff Amgen ..; Case 1 hasDefendant Chugai ..]
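As a rough illustration of regular-expression-based parsing, the sketch below pulls the parties out of a court-case caption and emits `hasPlaintiff` / `hasDefendant` statements. It is not the authors' script: the pattern, the helper names and the `Case_1` identifier are assumptions, and plain Python tuples stand in for the individuals that the deck creates through the Protégé-OWL API.

```python
import re

# Assumed caption layout: "<plaintiff>, Plaintiff...,  v.  <defendants>, Defendants..."
CAPTION = re.compile(
    r"(?P<plaintiff>.+?),\s*Plaintiff[^,]*,\s*v\.\s*"
    r"(?P<defendants>.+?),\s*Defendants?",
    re.DOTALL,
)


def normalize(text):
    """Collapse line breaks and repeated spaces into single spaces."""
    return " ".join(text.split())


def parse_caption(case_id, caption):
    """Return (subject, property, value) triples for the parties of a case."""
    match = CAPTION.search(caption)
    if match is None:
        return []
    triples = [(case_id, "hasPlaintiff", normalize(match.group("plaintiff")))]
    # Splitting on "and" is a simplification; a party whose own name
    # contains " and " would be split incorrectly.
    for name in re.split(r",?\s+and\s+", match.group("defendants")):
        triples.append((case_id, "hasDefendant", normalize(name)))
    return triples


caption = """AMGEN, INC., Plaintiff/Cross-Appellant,
v.
CHUGAI PHARMACEUTICAL CO., LTD., and Genetics Institute, Inc., Defendants-Appellants."""
triples = parse_caption("Case_1", caption)
```

Captions in other layouts would need their own patterns, which is consistent with the point that each document domain requires a separate script.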

Page 14: Bio-REGNET Developing an Ontology for the U.S. Patent System

What can you ask the Patent Ontology?

Simple questions can be answered by currently existing systems:
• Return all patents by the inventor “Fu-Kuen Lin”
• Return all court cases prior to yyyy-mm-dd
• Return all the patent documents which contain the keyword “erythropoietin” in the claims and are assigned to “Amgen_Inc”

The Patent System Ontology is intended to answer simple queries as well as complex queries which span more than a single information domain:
• Return a court case which involves 3 or more patents
• From a file wrapper, identify the patents involved in an interference, and display information about the inventor, assignee, and claims of each patent. Further, list the other patents the inventor owns, if any.

Note: The patent system ontology allows inferring details about one document type (patents), based on the information from other document types (file wrappers)

Page 15: Bio-REGNET Developing an Ontology for the U.S. Patent System

Example Query

Return all the patent documents which contain the keyword “erythropoietin” in the Claims and Assigned to “Amgen_Inc”. What technology classes do these patent documents belong to?

SPARQL Query:

SELECT DISTINCT ?patent ?inventor
FROM <http://localhost:8890/PatentOntologyInferred>
WHERE {
  ?patent a ont:Patent .
  ?patent ont:hasAbstract ?abs .
  ?abs ont:resourceVal ?val .
  ?val bif:contains "erythropoietin" .
  ?patent ont:hasAssignee ont:Amgen_Inc .
  ?patent ont:hasInventor ?inventor
}
LIMIT 10

Results:

Patent    Inventor
5856298   Strickland_Thomas_W
5885574   Elliott_Steven_G
7304150   Egrie_Joan_C
7304150   Elliott_Steven_G
7304150   Browne_Jeffrey_K
7304150   Sitney_Karen_C
7217689   Elliott_Steven_G
7217689   Byrne_Thomas_E
6319499   Elliott_Steven_G
5756349   Lin_Fu-Kuen

Page 16: Bio-REGNET Developing an Ontology for the U.S. Patent System

So Far …

54 Classes, 40 Properties and over 15,000 individuals from 1150 patents, 30 court cases and one partially instantiated file wrapper

Used Protégé-OWL to edit the ontology and Protégé-OWL API to programmatically instantiate physical documents

Can query any SPARQL endpoint such as Protégé or Virtuoso’s Triple Store

Can also use SWRL to query (We haven’t developed SWRL query rules)

Page 17: Bio-REGNET Developing an Ontology for the U.S. Patent System

Use-Case: Erythropoietin

5 Core patents – U.S. Patents 5,621,080, 5,756,349, 5,955,422, 5,547,933, 5,618,698

135 directly related patents (through citations) form our gold standard for computing formal measures such as Precision and Recall

Total patent corpus of 1150 patents

Identified over 3000 related publications through citations. These are available on PubMed and can be accessed through Entrez, a tool that provides a search interface to the PubMed database

Around 30 court cases, patent litigation involving major companies including Amgen, Hoechst Marion Roussel, Inc., Transkaryotic Therapies, Inc.

Current Corpus : experimental platform to test the overall effectiveness of the framework
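For reference, precision and recall against a gold standard can be computed as in the sketch below. The gold set here is a toy stand-in: it holds only the five core patent numbers from this slide rather than the 135 directly related patents, and the retrieved list is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy gold standard (assumption): just the five core patents.
gold = {"5621080", "5756349", "5955422", "5547933", "5618698"}
# Hypothetical search result: three core patents plus one other patent.
retrieved = ["5621080", "5756349", "5955422", "4677195"]

precision, recall = precision_recall(retrieved, gold)
# precision is 3/4 and recall is 3/5 for this toy example
```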


Page 18: Bio-REGNET Developing an Ontology for the U.S. Patent System

Querying BioPortal to Extract Concepts and Terms


Page 19: Bio-REGNET Developing an Ontology for the U.S. Patent System

Expanded Query

Original Term: Erythropoietin

Synonyms: Erythropoietin, Recombinant Erythropoietin, erythropoietin receptor binding, Hematopoietin, Recombinant EPO, Erythrocyte Colony Stimulating Factor, Epoetin, EPO …

Children: Darbepoetin Alfa, Epoetin Alfa, Epoetin Beta …

Parents: Colony Stimulating Factors, cytokine receptor binding, recombinant hematopoietic growth factors …

Grand-Parents: hematopoietic growth factor, receptor binding, recombinant growth factor …

An appropriate ranking function is to be applied to balance the more general terms. Heuristically, we assign a higher weight to synonyms, and a lower weight as we traverse away from the concept node

Resulting Query: “original term” OR [synonyms]^weight OR [children]^weight OR ….
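The expansion pattern above can be sketched as a small string builder. The weights below (0.8 for synonyms down to 0.2 for grand-parents) are purely hypothetical values chosen to follow the stated heuristic; the slides do not give the actual numbers, and the function name `expand_query` is an assumption.

```python
def expand_query(term, related, weights):
    """Build '"term" OR "synonym"^w OR "child"^w ...' from related terms.

    related maps a relation level (e.g. "synonyms") to a list of terms;
    weights maps the same levels to a boost value.
    """
    clauses = ['"%s"' % term]
    for level, terms in related.items():
        for related_term in terms:
            # The original term may reappear among its own synonyms; skip it.
            if related_term.lower() == term.lower():
                continue
            clauses.append('"%s"^%s' % (related_term, weights[level]))
    return " OR ".join(clauses)


# Hypothetical weights: lower as we move away from the concept node.
weights = {"synonyms": 0.8, "children": 0.6, "parents": 0.4, "grandparents": 0.2}
related = {
    "synonyms": ["Erythropoietin", "Epoetin", "EPO"],
    "children": ["Epoetin Alfa"],
    "parents": ["Colony Stimulating Factors"],
}
expanded = expand_query("Erythropoietin", related, weights)
```

For this input the result is `"Erythropoietin" OR "Epoetin"^0.8 OR "EPO"^0.8 OR "Epoetin Alfa"^0.6 OR "Colony Stimulating Factors"^0.4`.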

Page 20: Bio-REGNET Developing an Ontology for the U.S. Patent System

Current prototype framework

1. Use bio-ontologies to expand user’s query, covering broader terms and concepts
2. Search document domain using expanded query
3. Use patent system ontology’s properties to relate documents (from all document domains)
4. Support user feedback to ensure search progresses in the right direction

[Diagram: Patent System Ontology]

Page 21: Bio-REGNET Developing an Ontology for the U.S. Patent System

Querying with SPARQL

SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }

[Annotations: SELECT is the operation; ?subject ?predicate ?object are the variables; the WHERE block contains the triples]

SPARQL is a query language for RDF

Syntactically very similar to SQL – for relational databases

Any number of variables can be specified

Many triples can be used in conjunction to form more complex queries

We will use Virtuoso’s triple store to query the ontology
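To show what "many triples used in conjunction" means, here is a toy matcher that evaluates a list of triple patterns against an in-memory list of triples. It only illustrates the idea of shared variables joining patterns; it is not how Virtuoso or any SPARQL engine is implemented, and the data and function names are hypothetical.

```python
def match_pattern(pattern, triple, bindings):
    """Unify one triple pattern with one triple; return new bindings or None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(patterns, triples):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical data in the spirit of the ontology's properties.
triples = [
    ("Case_1", "a", "CourtCase"),
    ("Case_1", "patentsInvolved", "P5955422"),
    ("P5955422", "a", "Patent"),
    ("P5955422", "hasInventor", "Lin_Fu-Kuen"),
]
patterns = [
    ("?case", "a", "CourtCase"),
    ("?case", "patentsInvolved", "?patent"),
    ("?patent", "hasInventor", "?inv"),
]
solutions = query(patterns, triples)
```

Because `?case` and `?patent` are shared across patterns, the three patterns together walk from a court case to the inventor of a patent involved in it.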

Page 22: Bio-REGNET Developing an Ontology for the U.S. Patent System

Court Cases with “Erythropoietin”

SELECT DISTINCT ?cases
WHERE {
  ?cases a :CourtCase .
  ?cases :hasBody ?caseBody .
  ?caseBody :resourceVal ?comment .
  FILTER REGEX (?comment, "erythropoietin", "i") .
}

Results (30 cases retrieved):
Case_4: Amgen v/s Chugai …
Case_5: Amgen v/s Genetics …
Case_2: Amgen v/s Chugai …
Case_3: Amgen v/s F. Hoffma…
….

Page 23: Bio-REGNET Developing an Ontology for the U.S. Patent System

Patents Involved in the Court Cases

SELECT DISTINCT ?patents
WHERE {
  ?cases a :CourtCase .
  ?cases :hasBody ?caseBody .
  ?caseBody :resourceVal ?comment .
  FILTER REGEX (?comment, "erythropoietin", "i") .
  ?cases :patentsInvolved ?patents .
}

Results (* = core patent):
5411868
5621080: Production of Erythropoietin *
5547933: Production of Erythropoietin *
5618698: Production of Erythropoietin *
5756349: Production of Erythropoietin *
5955422: Production of Erythropoietin *
5441868
4703008
4677195
5322837

Page 24: Bio-REGNET Developing an Ontology for the U.S. Patent System

List of Events in the File Wrapper

SELECT DISTINCT ?doc
WHERE {
  :FileWrapper_5955422 :contains ?doc .
  ?doc :hasDate ?date
}
ORDER BY ?date

Results:
07_609741
07_609741_Amendment_1
07_609741_Interference_1
07_609741_Rejection_1
07_957073_Amendment_1
…
P5955422 (Issued Patent)

Page 25: Bio-REGNET Developing an Ontology for the U.S. Patent System

Initial Claims of File Wrapper

SELECT DISTINCT ?claim
WHERE {
  :07_609741 :hasClaim ?claim .
}
ORDER BY ?claim

Results:
07_609741_claim_1
07_609741_claim_2
07_609741_claim_3
…
07_609741_claim_60

Example claim text: A purified and isolated polypeptide having part or all of the primary structural conformation and one or more of the biological properties of naturally occurring erythropoietin and characterized by being the product of procaryotic or eucaryotic expression of an exogenous DNA sequence.

Page 26: Bio-REGNET Developing an Ontology for the U.S. Patent System

Summary of Interference Record

SELECT DISTINCT ?claim
WHERE {
  :07_609741_Interference_1 :InterferingClaims ?claimInt .
  :07_609741_Interference_1 :affectedClaims ?claim .
}
ORDER BY ?claim

Interfering claims:
P4879272_claim_2
P4879272_claim_3

An erythropoietin-containing, pharmaceutically-acceptable composition wherein human serum albumin is mixed with erythropoietin either during the preparation of said composition or just before administration thereof.

Affected claims:
07_609741_claim_60
07_609741_claim_61
07_609741_claim_62

An erythropoietin-containing, pharmaceutically-acceptable preparation wherein human serum albumin is mixed with erythropoietin.

Page 27: Bio-REGNET Developing an Ontology for the U.S. Patent System

Current Limitations

One needs to know SPARQL in order to query

One needs to know the semantics of the ontology, such as the relations and the domain and range restrictions

Manual querying can be very time-consuming. Automation is needed

Domain-specific semantics need to be separately integrated

Probabilistic weighting (ranking inventors, assignees, patents etc.) is not possible using the SPARQL endpoint

We are developing a user-friendly automated tool to search the patent system

Page 28: Bio-REGNET Developing an Ontology for the U.S. Patent System

Future Work

Include other information sources – publications, regulations, laws

Develop automated tool and search framework (currently under development)

Experiment with more use cases outside of the biomedical domain

Page 29: Bio-REGNET Developing an Ontology for the U.S. Patent System

Tool Snapshot


Page 30: Bio-REGNET Developing an Ontology for the U.S. Patent System

Acknowledgement


This research is partially supported by NSF Grant Number 0811975 awarded to the University of Illinois at Urbana-Champaign and NSF Grant Number 0811460 to Stanford University. Any opinions and findings are those of the authors, and do not necessarily reflect the views of the National Science Foundation.

Page 31: Bio-REGNET Developing an Ontology for the U.S. Patent System

Please Visit the System Demonstration

Thank You!
Questions?

Page 32: Bio-REGNET Developing an Ontology for the U.S. Patent System

Extra Slides


Page 33: Bio-REGNET Developing an Ontology for the U.S. Patent System

Common US Classes, Inventors and Assignee

SELECT DISTINCT ?inv ?class ?assignee
WHERE {
  ?cases a :CourtCase .
  ?cases :hasBody ?caseBody .
  ?caseBody :resourceVal ?comment .
  FILTER REGEX (?comment, "erythropoietin", "i") .
  ?cases :patentsInvolved ?patents .
  ?patents :hasInventor ?inv .
  ?patents :hasUSClass ?class .
  ?patents :hasAssignee ?assignee .
}

Results:

?inv
Lin_Fu-Kuen
Hewick_Rodney_M
Seehra_Jasbir_S

?class
USPC 530/380
USPC 530/399
USPC 530/397
USPC 514/8
USPC 435/69_6
USPC 530/835
USPC 530/388_7
…

?assignee
Kirin-Amgen_Inc
Genetics_Institute_Inc
…

Page 34: Bio-REGNET Developing an Ontology for the U.S. Patent System

Extracting Citations

SELECT DISTINCT ?forw ?backw
WHERE {
  ?cases a :CourtCase .
  ?cases :hasBody ?caseBody .
  ?caseBody :resourceVal ?comment .
  FILTER REGEX (?comment, "erythropoietin", "i") .
  ?cases :patentsInvolved ?patents .
  ?patents :hasCitation ?forw .
  ?backw :hasCitation ?patents .
}

Results
6541033
4710473
4358535
4558005
4465624
4757006
4399216
4558006
3865801
3033753

Page 35: Bio-REGNET Developing an Ontology for the U.S. Patent System

Generated Results

Around 30 court cases

Several patents, including core patents and forward/backward citations

Can search patents by the inventors, assignees and/or US classes identified

What’s more? Can search court cases again with new keywords or information gathered

Gathered Results
Case_4: Amgen v/s Chugai …
Case_5: Amgen v/s Genetics In.
Case_2: Amgen v/s Chugai …
….
5621080: Production of Erythropoietin
5547933: Production of Erythropoietin
5618698: Production of Erythropoietin
5756349: Production of Erythropoietin
5955422: Production of Erythropoietin
…
5441868
4703008
4677195
5322837
…
Patents with Inventor: Lin_Fu-Kuen
Patents owned by Genetics_Inc
…