Uniting Patent Data Sources with BigQuery May 2018

  • Uniting Patent Data Sources with BigQuery

    May 2018

  • A lot of data – from a lot of sources

    © 2017

    • R&D Data – Lab Results
    • Trials & Test Results
    • Sales & Market Data
    • Private Corporate Data
    • Private Vendor Data
    • Public Data Providers
    • Collaborators & Partners

  • Patent Data on BigQuery

    © 2018

    Not really “search”. Instead: SQL query / join.

    Google BigQuery:
    • Free Database – Google/IFI Public Patents
    • Free Database – EBI Data
    • Free Database – USPTO PAIR/PEDS
    • Paid Table – IFI Data Enrichments & others
    • Private Personal Data
    • Private Corporate Data

    Reports based on connecting data from multiple tables together.

  • What is BigQuery?

    • Enterprise Cloud Data Warehouse
      • BigQuery is Google's fully managed, petabyte-scale, low-cost enterprise data warehouse.
      • Low cost, but not free.

    • A powerful Big Data analytics platform
      • Analyze large datasets to find meaningful insights using familiar SQL
      • Join public, private, free and paid datasets, including patent data

  • Example – Full Text Search

    Source: IFI Global Patent Database (IFI CLAIMS Direct)

    Search: "VEGF receptor kinase inhibitor" (VEGF = vascular endothelial growth factor)

    2,294 results

    Assign a “Relevance Score” to each result and load the list into BigQuery as a private table.
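The hand-off described on this slide, from a scored result list to a private table, could be prepared roughly as in the following minimal Python sketch. It assumes the search export is already available as (publication number, score) pairs. The function name `write_relevance_csv` is hypothetical; the column names `lc_number` and `lc_score` follow the LCPatents table shown on the next slides. The resulting CSV could then be uploaded to a private BigQuery dataset, for example through the console or the `bq load` command.

```python
import csv
import io


def write_relevance_csv(scored_results, out):
    """Write (publication_number, score) pairs as lc_number,lc_score rows.

    Hypothetical helper: the input order is preserved as given.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["lc_number", "lc_score"])  # header row for the private table
    for publication_number, score in scored_results:
        writer.writerow([publication_number, score])


# Sample values taken from the LCPatents slide.
scored = [
    ("US-20130053409-A1", 112),
    ("WO-2009053737-A2", 116),
    ("EP-2269603-A1", 146),
]
buffer = io.StringIO()
write_relevance_csv(scored, buffer)
csv_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch free of file paths; in practice `out` would be an open file handle.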

  • VEGF_Receptor.LCPatents – private table, ordered by my private relevance field

    Table: LCPatents (private)

    lc_number          lc_score
    US-20130053409-A1  112
    WO-2009053737-A2   116
    US-20030144298-A1  126
    US-20030055006-A1  130
    JP-2002536414-A    132
    CN-103702990-A     137
    WO-2008078091-A1   142
    EP-2269603-A1      146
    EP-2783686-B1      146
    US-8410131-B2      146

  • LCPatents (private) – Public Data, ordered by my private relevance field

    Tables joined: LCPatents (private), patents-public-data.patents.publications

    Row  lc_number         lc_score  text
    1    US-8778962-B2     148       Treatment of solid tumors with rapamycin derivatives
    2    WO-2013014448-A1  148       2 - (2, 4, 5 - substituted -anilino) pyrimidine derivatives as egfr modulators useful for treating cancer
    3    EP-2269604-A1     147       Treatment of solid tumours with rapamycin derivatives
    4    RU-2325906-C2     147       Cancer medical treatment
    5    EP-2269603-B1     146       Treatment of breast tumors with a rapamycin derivative in combination with exemestane
    6    EP-2783686-A1     146       Combination of a rapamycin derivative and letrozole for treating breast cancer
    7    EP-2269604-B1     146       Treatment of solid kidney tumours with a rapamycin derivative
    8    US-8877771-B2     146       Treatment of solid tumors with rapamycin derivatives
    9    EP-2269603-A1     146       Treatment of solid tumours with rapamycin derivatives

  • LCPatents (private) – Public Data, ordered by my private relevance field (query)

    SELECT lc.lc_number, lc.lc_score, ttl.text
    FROM `patents-public-data.patents.publications` AS ppd,
      UNNEST(title_localized) AS ttl
    JOIN `civil-dolphin-136720.VEGF_Receptor.LCPatents` AS lc
      ON lc.lc_number = ppd.publication_number
    WHERE ttl.language = "en"
    ORDER BY lc.lc_score DESC
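For readers less familiar with `UNNEST`, the logic of this query can be mirrored in plain Python. This is only an illustrative sketch under stated assumptions, not how BigQuery executes the query: `join_titles` is a hypothetical name, each publication is modelled as a dict with a repeated `title_localized` field, `lc_number` is assumed to be unique in the private list, and the `XX-0000000-A1` row and the German title entry are invented placeholders.

```python
def join_titles(publications, lc_patents, language="en"):
    """Mimic: UNNEST(title_localized), inner JOIN on publication number,
    filter by language, ORDER BY lc_score DESC."""
    scores = {row["lc_number"]: row["lc_score"] for row in lc_patents}
    rows = []
    for pub in publications:
        number = pub["publication_number"]
        if number not in scores:
            continue  # inner join: drop publications missing from the private list
        for title in pub["title_localized"]:  # UNNEST: one output row per title
            if title["language"] == language:
                rows.append((number, scores[number], title["text"]))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows


publications = [
    {"publication_number": "US-8778962-B2",
     "title_localized": [
         {"text": "Treatment of solid tumors with rapamycin derivatives", "language": "en"}]},
    {"publication_number": "EP-2269604-A1",
     "title_localized": [
         {"text": "Treatment of solid tumours with rapamycin derivatives", "language": "en"},
         {"text": "(placeholder non-English title)", "language": "de"}]},  # invented entry
    {"publication_number": "XX-0000000-A1",  # invented, not in the private list
     "title_localized": [{"text": "Not in the private list", "language": "en"}]},
]
lc_patents = [
    {"lc_number": "EP-2269604-A1", "lc_score": 147},
    {"lc_number": "US-8778962-B2", "lc_score": 148},
]
result = join_titles(publications, lc_patents)
```

The non-English title and the unlisted publication both drop out, which is what the `WHERE` clause and the inner `JOIN` do in the SQL.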

  • LCPatents - IFI Private Data: COUNT

    Tables joined: LCPatents, IFI Data Enrichments

    Row  assignee                         Total
    1    AstraZeneca AB                   102
    2    Astex Therapeutics Ltd           84
    3    Cancer Research Technology Ltd   33
    4    NeuPharma Inc                    32
    5    Novartis AG                      25
    6    ForSight Vision4 Inc             22
    7    Merck Sharp & Dohme Corp         21
    8    Kinex Pharmaceuticals LLC        19
    9    Eisai R&D Management Co Ltd      16
    10   Novartis Pharma GmbH             15
    11   University of Chicago            11
    12   Medimmune Ltd                    10

  • LCPatents - IFI Private Data: COUNT (query)

    SELECT assignee, COUNT(IFI.publication_number) AS Total
    FROM `striking-joy-185312.IFIDataEnrichments.IFIDataEnrichments` AS IFI,
      UNNEST(original_assignee) AS assignee
    JOIN `civil-dolphin-136720.VEGF_Receptor.LCPatents` AS lc
      ON IFI.publication_number = lc.lc_number
    GROUP BY assignee ORDER BY Total DESC
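The aggregation in this query (flatten the repeated `original_assignee` field, keep only publications in the private list, count per assignee) can be sketched in Python as follows. The function name `count_by_assignee` and the `PUB-n` identifiers are hypothetical placeholders, the counts are illustrative rather than real results, and the private list is assumed to hold each publication number once.

```python
from collections import Counter


def count_by_assignee(ifi_rows, lc_numbers):
    """Mimic: UNNEST(original_assignee), JOIN on publication number,
    GROUP BY assignee, ORDER BY Total DESC."""
    totals = Counter()
    for row in ifi_rows:
        if row["publication_number"] in lc_numbers:  # inner join with the private list
            for assignee in row["original_assignee"]:  # one count per unnested assignee
                totals[assignee] += 1
    return totals.most_common()  # sorted by count, highest first


ifi_rows = [
    {"publication_number": "PUB-1", "original_assignee": ["Novartis AG"]},
    {"publication_number": "PUB-2", "original_assignee": ["Novartis AG", "AstraZeneca AB"]},
    {"publication_number": "PUB-3", "original_assignee": ["Medimmune Ltd"]},
]
lc_numbers = {"PUB-1", "PUB-2"}  # PUB-3 is not in the private list
totals = count_by_assignee(ifi_rows, lc_numbers)
```

A publication with two assignees contributes to both totals, which matches what the comma join with `UNNEST` does in the SQL.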

  • LCPatents (private) – Public Data – IFI Paid Data

    Tables joined: LCPatents, patents-public-data, IFI Data Enrichments

    Row  lc_number         lc_score  family_id  priority_date  current_assignee                   legal_status
    1    JP-2004525899-A   152       26245731   20010219                                          Granted
    2    WO-2013014448-A1  148       46875901   20110727
    3    US-8778962-B2     148       26245731   20010219       Novartis Pharmaceuticals Corp      Active
    4    EP-2269604-A1     147       26245731   20010219       Novartis AG                        Granted
    5    RU-2325906-C2     147       26245731   20010219
    6    CA-2438504-A1     146       26245731   20010219       Novartis AG                        Granted
    7    CA-2438504-C      146       26245731   20010219       Novartis AG                        Active
    8    EP-2764865-A2     146       26245731   20010219       Novartis Pharma GmbH; Novartis AG  Withdrawn
    9    EP-2762140-A1     146       26245731   20010219       Novartis AG                        Granted

  • LCPatents (private) – Public Data – IFI Paid Data (query)

    SELECT lc.lc_number, lc.lc_score, ppd.family_id, ppd.priority_date,
      IFI.current_assignee, IFI.legal_status
    FROM `striking-joy-185312.IFIDataEnrichments.IFIDataEnrichments` AS IFI
    JOIN `civil-dolphin-136720.VEGF_Receptor.LCPatents` AS lc
      ON IFI.publication_number = lc.lc_number
    JOIN `patents-public-data.patents.publications` AS ppd
      ON lc.lc_number = ppd.publication_number
    ORDER BY lc.lc_score DESC

  • SureChEMBL for LCPatents

    Tables joined: LCPatents, ebi_surechembl

    Row  lc_number          schembl_id     smiles                          inchi_key                    field
    1    US-20100092474-A1  SCHEMBL8755    COC1=CC=C(CN)C=C1               IDPURXSQCKYKIJ-UHFFFAOYSA-N  5
    2    WO-2008044041-A1   SCHEMBL8755    COC1=CC=C(CN)C=C1               IDPURXSQCKYKIJ-UHFFFAOYSA-N  5
    3    WO-2008044045-A1   SCHEMBL8755    COC1=CC=C(CN)C=C1               IDPURXSQCKYKIJ-UHFFFAOYSA-N  5
    4    US-20090306079-A1  SCHEMBL8755    COC1=CC=C(CN)C=C1               IDPURXSQCKYKIJ-UHFFFAOYSA-N  5
    5    US-20070021494-A1  SCHEMBL8755    COC1=CC=C(CN)C=C1               IDPURXSQCKYKIJ-UHFFFAOYSA-N  5
    6    US-7329660-B2      SCHEMBL104340  CC(C)C1=CC(N)=CC=C1             XCCNRBCNYGWTQX-UHFFFAOYSA-N  5
    7    WO-2008002674-A2   SCHEMBL133876  COC1=CC(O)=C(C=O)C=C1           WZUODJNEIXSNEU-UHFFFAOYSA-N  5
    8    WO-2014037750-A1   SCHEMBL309636  CCOC1=CC(Br)=CC=C1[N+]([O-])=O  SVFZXFVVGNPTEF-UHFFFAOYSA-N  5
    9    US-20100092474-A1  SCHEMBL383820  COC1=CC(C(O)=O)=C(C=C1)C(O)=O   JKZSIEDAEHZAHQ-UHFFFAOYSA-N  5
    10   WO-2008044041-A1   SCHEMBL383820  COC1=CC(C(O)=O)=C(C=C1)C(O)=O   JKZSIEDAEHZAHQ-UHFFFAOYSA-N  5

  • SureChEMBL for LCPatents (query)

    SELECT v.lc_number, ebi.schembl_id, ebi.smiles, ebi.inchi_key, ebi.field
    FROM `patents-public-data.ebi_surechembl.map` AS ebi
    JOIN `civil-dolphin-136720.VEGF_Receptor.LCPatents` AS v
      ON v.lc_number = ebi.patent_id
    WHERE ebi.field = "5"
    LIMIT 100

  • ChEMBL Compound, Site, Target

    Row  standard_inchi_key           compound_name  site_name                                                      target_type     pref_name
    61   HUMNYLRZRPPJDN-UHFFFAOYSA-N  Benzaldehyde   Tyrosinase, Tyrosinase domain                                  SINGLE PROTEIN  Tyrosinase
    62   HUMNYLRZRPPJDN-UHFFFAOYSA-N  Benzaldehyde   Tyrosinase, Tyrosinase domain                                  SINGLE PROTEIN  Tyrosinase
    63   HUMNYLRZRPPJDN-UHFFFAOYSA-N  Benzaldehyde   Tyrosinase, Tyrosinase domain                                  SINGLE PROTEIN  Tyrosinase
    64   WGQKYBSKWIADBV-UHFFFAOYSA-N  Benzyl amine   Phenylethanolamine N-methyltransferase, NNMT_PNMT_TEMT domain  SINGLE PROTEIN  Phenylethanolamine N-methyltransferase
    65   WGQKYBSKWIADBV-UHFFFAOYSA-N  Benzyl amine   Phenylethanolamine N-methyltransferase, NNMT_PNMT_TEMT domain  SINGLE PROTEIN  Phenylethanolamine N-methyltransferase
    66   WGQKYBSKWIADBV-UHFFFAOYSA-N  Benzyl amine   Phenylethanolamine N-methyltransferase, NNMT_PNMT_TEMT domain  SINGLE PROTEIN  Phenylethanolamine N-methyltransferase
    67   WGQKYBSKWIADBV-UHFFFAOYSA-N  Benzyl amine   Phenylethanolamine N-methyltransferase, NNMT_PNMT_TEMT domain  SINGLE PROTEIN  Phenylethanolamine N-methyltransferase
    68   XKJCHHZQLQNZHY-UHFFFAOYSA-N  SID144208998   Monoamine oxidase A, Amino_oxidase domain                      SINGLE PROTEIN  Monoamine oxidase A
    69   XKJCHHZQLQNZHY-UHFFFAOYSA-N  SID144208998   Monoamine oxidase B, Amino_oxidase domain                      SINGLE PROTEIN  Monoamine oxidase B

  • ChEMBL Compound, Site, Target (query)

    SELECT cs.standard_inchi_key, cr.compound_name, bs.site_name, td.target_type, td.pref_name
    FROM `patents-public-data.ebi_chembl.target_dictionary_23` AS td
    JOIN `patents-public-data.ebi_chembl.binding_sites_23` AS bs
      ON td.tid = bs.tid
    JOIN `patents-public-data.ebi_chembl.predicted_binding_domains_23` AS pbd
      ON bs.site_id = pbd.site_id
    JOIN `patents-public-data.ebi_chembl.activities_23` AS act
      ON pbd.activity_id = act.activity_id
    JOIN `patents-public-data.ebi_chembl.compound_structures_23` AS cs
      ON act.molregno = cs.molregno
    JOIN `patents-public-data.ebi_chembl.compound_records_23` AS cr
      ON cr.molregno = cs.molregno
    WHERE cs.standard_inchi_key IN (
      "JOXIMZWYDAKGHI-UHFFFAOYSA-N", "XKJCHHZQLQNZHY-UHFFFAOYSA-N",
      "RMVRSNDYEFQCLF-UHFFFAOYSA-N", "WVDDGKGOMKODPV-UHFFFAOYSA-N",
      "VODUKXHGDCJEOZ-YUMQZZPRSA-N", "LGRFSURHDFAFJT-UHFFFAOYSA-N",
      "WGQKYBSKWIADBV-UHFFFAOYSA-N", "VOLRSQPSJGXRNJ-UHFFFAOYSA-N",
      "WFQDTOYDVUWQMS-UHFFFAOYSA-N", "HUMNYLRZRPPJDN-UHFFFAOYSA-N",
      "RWZYAGGXGHYGMB-UHFFFAOYSA-N", "DGJKKXAFDOWIQI-UHFFFAOYSA-N",
      "KHBQMWCZKVMBLN-UHFFFAOYSA-N", "KWOLFJPFCHCOCG-UHFFFAOYSA-N")

  • EBI – European Bioinformatics Institute on BigQuery

    ebi_chembl:
    • Activities
    • Assays
    • Components
    • Compounds
    • Drug Indications
    • Drug Mechanisms
    • Molecules
    • Proteins
    • Targets

    ebi_surechembl:
    • Smiles
    • Inchi_Key
    • Patent_ID

  • BigQuery Console


  • LCPatents – USPTO OCE Office Actions

    Tables joined: LCPatents (private), uspto_oce_office_actions

    1 of 1495 rows shown

    Row  publication_number  app_id    action_type                    claim_numbers
    1    US-8298578-B2       13252942  103                            1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
    2    US-9556120-B2       14554495  nonstatutory double patenting  6,7,8,9,10,11,12,13,14,15,16,17
    3    US-9572800-B2       15086485  103                            1,2,3,4,5,6,7,8,9,10,11,12,13
    4    US-9737544-B2       15000304  nonstatutory double patenting  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
    5    US-9707202-B2       15044424  nonstatutory double patenting  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38
    6    US-9616050-B2       14840342  nonstatutory double patenting  23,24,25,26,27,28,29,30
    7    US-9707248-B2       15279361  nonstatutory double patenting  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24
    8    US-8642610-B2       13366726  112                            9,11,12,13,14,15,16,17,18,19,20,21,35,48,49,50,51,52,53,112
    9    US-8673906-B2       13765850  nonstatutory double patenting  1,2,3,4,5,6,7,8,9,10,11,12
    10   US-8277830-B2       13252998  103                            1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29

  • USPTO OCE Office Actions (query)

    SELECT patents.publication_number, oa.app_id, oa.action_type,
      oa.claim_numbers
    FROM `patents-public-data.uspto_oce_office_actions.rejections` AS oa
    JOIN `patents-public-data.uspto_oce_office_actions.match_app` AS match
      ON oa.app_id = match.app_id
    JOIN `patents-public-data.patents.publications` AS patents
      ON match.application_number = patents.application_number
    JOIN `civil-dolphin-136720.VEGF_Receptor.LCPatents` AS VEGF
      ON VEGF.lc_number = patents.publication_number

  • USPTO PTAB Cases

    SELECT ptab.PatentOwnerName, ptab.PatentNumber, ptab.PetitionerPartyName,
      ptab.TrialNumber, ptab.ProsecutionStatus, IFI.original_assignee
    FROM `patents-public-data.uspto_ptab.trials` AS ptab
    JOIN `patents-public-data.uspto_ptab.match` AS ptab_match
      ON ptab.ApplicationNumber = ptab_match.ApplicationNumber
    JOIN `patents-public-data.patents.publications` AS PUBLIC
      ON PUBLIC.application_number = ptab_match.application_number
    JOIN `striking-joy-185312.IFIDataEnrichments.IFIDataEnrichments` AS IFI
      ON IFI.publication_number = PUBLIC.publication_number
    JOIN `civil-dolphin-136720.VEGF_Receptor.LCPatents` AS VEGF
      ON VEGF.lc_number = IFI.publication_number

    Row  PatentOwnerName  PatentNumber  PetitionerPartyName             TrialNumber    ProsecutionStatus               original_assignee
    1    Lane et al       8410131       Breckenridge Pharmaceutial, Inc.  IPR2017-01592  Notice OF Filing Date Accorded  Novartis Pharmaceuticals Corp

  • Patent Data on BigQuery Enhances your Search Results

    Start with a search result, a portfolio or any list, then JOIN it with:
    • Public Patent Data (free)
    • SureChEMBL (free)
    • IFI Data Enhancements (paid)
    • USPTO Office Actions (free)
    • USPTO PAIR, PTAB (free)
    • Private, on-premise data (e.g., docket)

    Result: a better search report.

  • Collaboration

    Share with Google accounts or groups.

  • What this means to you

    • BigQuery does not replace your text-, semantic- or structure-based search tools

    • BigQuery does let you make your search results more useful for:
      • Your Legal Team
      • Your Business Sponsors
      • Your Research Partners

  • SECRET Data Fields!

    VEGF patents with a secret data code.

    The secret code must not be visible to Google (not even to the Google Enterprise Cloud).

  • Tableau Desktop: Local Excel to BigQuery


  • Tableau Desktop + BigQuery

    BigQuery SQL query ⟷ (pub number) ⟷ Excel file on desktop

    The data join is created using publication_number. The “Secret Code” is never transmitted to Google.
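The idea behind this slide, joining on the client so that locally held fields never leave the desktop, can be illustrated with a small Python sketch. This is not how Tableau is implemented; it only shows the principle under simple assumptions: the BigQuery rows have already been fetched by a separate query, the local rows stand in for the Excel sheet, and `local_join`, `secret_code` and the code value are hypothetical names and data.

```python
def local_join(bigquery_rows, local_rows, key="publication_number"):
    """Inner-join rows fetched from BigQuery with locally held rows.

    The join runs entirely on the client, so fields that exist only in
    local_rows (such as a secret code) are never sent anywhere.
    """
    local_by_key = {row[key]: row for row in local_rows}
    joined = []
    for row in bigquery_rows:
        extra = local_by_key.get(row[key])
        if extra is not None:
            joined.append({**row, **extra})  # merge public and private fields
    return joined


# Rows as they might come back from a BigQuery query (values from the slides).
bigquery_rows = [
    {"publication_number": "US-8778962-B2", "lc_score": 148},
    {"publication_number": "EP-2269604-A1", "lc_score": 147},
]
# Rows as they might be read from the local Excel file (invented secret code).
local_rows = [
    {"publication_number": "US-8778962-B2", "secret_code": "HYPOTHETICAL-CODE-1"},
]
joined = local_join(bigquery_rows, local_rows)
```

Only `publication_number` is shared between the two sides; the secret field is attached after the cloud query has returned.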

  • Tableau Visualization


  • Resources

    • https://cloud.google.com/ - Google Cloud Platform > Launcher for Google Patents Public Data
    • https://cloud.google.com/blog/big-data/2017/10/google-patents-public-datasets-connecting-public-paid-and-private-patent-data - Google Announcement
    • https://github.com/google/patents-public-data - GitHub Home for Google Patents
    • https://www.ificlaims.com/news/view/blog-posts/public-patent-data-now.htm - Public Patent Data Now Available on Google BigQuery (IFI blog post on BigQuery, with examples)
    • https://www.ificlaims.com/bigquery.htm - IFI Data Enrichments (information on IFI’s paid data enrichments)
    • https://www.w3schools.com/sql/sql_quickref.asp - W3 Schools SQL (SQL reference)

    Support comes with an IFI Data Enrichments subscription!

  • Thank You!

    Janice Stevenson

    EVP – Client Services
    IFI CLAIMS Patent Services
    [email protected]