Dr. Philipp Daumke
Analyze Text, Gain Answers
ABOUT AVERBIS
Founded: 2007
Location: Freiburg im Breisgau
Team: Domain- & IT-Experts
Focus: Leverage structured & unstructured information
Current Sectors: Pharma, Health, Automotive, Publishers & Libraries
PORTFOLIO
PRODUCTS:
CORE TECHNOLOGIES:
CHALLENGE
Exponential growth of data
• need for data-driven decisions
• limited human resources for analysis
New analytics tools needed for
• Semantic search and discovery
• Competitor analysis
• Identification of market trends
• IP landscaping
• Portfolio analysis
• …
Patent applications:
Medline articles:
• (Semi-)automate patent categorization with high precision
• Learning system: imitates the behavior of IP professionals
• Semantic search: search for meanings, not just keywords
PATENT ANALYTICS
Terminologies, Text Mining Rules
Text Mining Machine Learning
Patent Collection
TERMINOLOGY MANAGEMENT
Define the "semantic space" of your technology fields
• Keywords
• Categories
• Hierarchies
• …
Include relevant word lists from your company
• Products
• Devices
• Companies
• Components
• Indications
• …
Reuse already existing terminologies on the market
TEXT MINING
Lung metastasis:
lung metastasis
lung metastases
metastases in the lung
metastases in the lower lobe of the lung
pulmonal metastases
pulmonal relapse of a metastasis
pulmonal filia
pulmonal filiae
lung filiae
lower lobe filiae
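All of the variants above express one concept. A minimal sketch of terminology-based normalization, assuming a hand-built variant list; the `VARIANTS` table, the concept label and the function name are illustrative assumptions, not the Averbis implementation:

```python
import re

# Hypothetical terminology entry: several surface variants map to one concept label.
VARIANTS = {
    "LUNG_METASTASIS": [
        "lung metastasis", "lung metastases", "metastases in the lung",
        "pulmonal filia", "pulmonal filiae", "lung filiae",
    ],
}

def find_concepts(text):
    """Return the set of concept labels whose variants occur in text."""
    # Lowercase and collapse whitespace so layout differences do not matter.
    normalized = re.sub(r"\s+", " ", text.lower())
    found = set()
    for concept, variants in VARIANTS.items():
        for variant in variants:
            # Word boundaries avoid matching inside longer words.
            if re.search(r"\b" + re.escape(variant) + r"\b", normalized):
                found.add(concept)
                break
    return found
```

A plain lookup like this only finds variants that are listed. Phrases such as "lower lobe filiae" would additionally need anatomical knowledge (the lower lobe is part of the lung), which this sketch does not model.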
TEXT MINING
tumors:
tumour
cancer
carcinoma
lymphoma
endometrioma
astrocytoma
glioblastoma
seminoma
ALL
leukemia
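A search for "tumors" should also find the narrower terms above. A small sketch of query expansion over a term hierarchy; the `HIERARCHY` fragment and the function name `expand` are assumptions made for illustration only:

```python
# Hypothetical fragment of a terminology hierarchy: broader term -> narrower terms.
HIERARCHY = {
    "tumors": ["cancer", "carcinoma", "lymphoma", "leukemia", "seminoma"],
    "leukemia": ["ALL"],
}

def expand(term, hierarchy):
    """Return the term followed by all of its narrower terms (depth-first)."""
    result = [term]
    for child in hierarchy.get(term, []):
        result.extend(expand(child, hierarchy))
    return result
```

The expanded list can then be used as the set of search terms, so a query for the broad concept also retrieves documents that only mention a specific subtype.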
PATENT CLASSIFICATION – MACHINE LEARNING
System learns how to fine-classify patents
• Observes and imitates human decision making
Advantages
• No explicit externalization of knowledge needed
• No rule-writing
• Better results
• System generalizes (higher recall)
• Statistical model can handle "noise" better than rules
• Ambiguity and textual variations are handled better
THE PROCESS OF MACHINE LEARNING
Labeling
• Up to 100 categories
• ~10-50 patents per category
• Hierarchical categories
• Multi-labeling
Learning
• Learn characteristic patterns in labeled data
• Lots of different classification algorithms
Prediction & Review
• Automatically map new patents to categories
• Confidence value for each category
• Different selection criteria
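The three steps above (labeling, learning, prediction with a confidence per category) can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not name the algorithm, so cosine similarity against per-category term counts stands in for it, and the function names and the `threshold` selection criterion are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train(labeled_docs):
    """labeled_docs: list of (text, set_of_categories). Returns category -> term counts."""
    centroids = defaultdict(Counter)
    for text, categories in labeled_docs:
        # Multi-labeling: one document contributes to every category it carries.
        for category in categories:
            centroids[category].update(tokenize(text))
    return centroids

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def predict(text, centroids, threshold=0.3):
    """Return {category: confidence} for all categories at or above the threshold."""
    vec = Counter(tokenize(text))
    scores = {c: cosine(vec, centroid) for c, centroid in centroids.items()}
    return {c: s for c, s in scores.items() if s >= threshold}
```

A fixed threshold is one possible selection criterion; taking the top-k categories, or routing low-confidence patents to manual review, would be alternatives.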
POWERFUL FRONTEND
Linguistic full text search
Linguistic filters
Patent Summary
Additional info, e.g. picture
Multilabel Classification
USE CASE 1: LARGE-SCALE PATENT LANDSCAPING
• Goal: semi-automatically categorize patents into the company's technology landscape
• Technology landscape: 35 classes (8 main classes, 27 subclasses)
• 7,000 patents, 10 competitors
• Evaluation
– agreement between automated judgement and expert judgement
– agreement between two expert judgements (inter-rater agreement)
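Both comparisons reduce to measuring agreement between two label sequences. A minimal sketch, assuming one label per patent for simplicity (the actual setup is multi-label); Cohen's kappa is added here as a common chance-corrected measure and is not mentioned in the slides:

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of items on which two judges (or judge and system) assign the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

Running the same function once on system versus expert and once on expert versus expert gives two figures that can be compared directly.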
USE CASE 1: LARGE-SCALE PATENT LANDSCAPING
CONFUSION MATRIX
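A confusion matrix counts how often each expert category was mapped to each system category. A short sketch with hypothetical labels; the function name and the sparse `Counter` representation are illustrative choices:

```python
from collections import Counter

def confusion_matrix(expert, system):
    """Count (expert_label, system_label) pairs; diagonal entries are agreements."""
    if len(expert) != len(system):
        raise ValueError("label sequences must have equal length")
    return Counter(zip(expert, system))
```

Large off-diagonal counts point to pairs of classes that are hard to separate and may need more labeled examples or a clearer definition.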
USE CASE 1: LARGE-SCALE PATENT LANDSCAPING
Results                     Accuracy   Time Savings
Automated, Scenario I       85%        70%
Automated, Scenario II      82%        80%
Manual (2 expert judges)    80%
Averbis Patent Analytics saves up to 80% of time, with accuracy on par with manual judges.
USE CASE 2: RESEARCH LITERATURE RELEVANCY
• Goal: automatically identify literature relevant to the company
• Rule set:
– Mentions of the company's indications, products, etc.
– Competitor products and indications
– "Testosterone, but only given externally"
– "Products shall not be found in an enumeration"
– …
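The enumeration rule can be approximated with a simple heuristic. The sketch below is an assumption about how such a rule might look, not the rule engine used in the product: a product mention is ignored when it is one of several short items separated by commas, "and" or "or". The function name and the `min_items` and `max_words` parameters are hypothetical.

```python
import re

def in_enumeration(sentence, product, min_items=3, max_words=3):
    """Heuristic: True if product is one of at least min_items short list items."""
    parts = re.split(r",|\band\b|\bor\b", sentence.rstrip("."))
    items = [p.strip() for p in parts if p.strip()]
    # Only short fragments count as list items; long clauses are ordinary prose.
    short = [i for i in items if len(i.split()) <= max_words]
    hit = any(product.lower() == i.lower() for i in short)
    return hit and len(short) >= min_items
```

For example, `in_enumeration("Patients received aspirin, verapamil, digoxin and warfarin.", "verapamil")` flags the mention as part of a list, while a sentence that discusses the product on its own is not flagged.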
PATENT ANALYTICS
Rule Set
Text Mining, Machine Learning
Search, Analysis
Medline, Embase
VERAPAMIL
USE CASE 2: RESEARCH LITERATURE RELEVANCY
Rule: Testosterone, but only given externally
USE CASE 2: RESEARCH LITERATURE RELEVANCY
Rule: Ignore products listed in enumerations
USE CASE 3: SOCIAL MEDIA ANALYTICS
Main challenge: what is positive, what is negative?
– "Could somebody please remove the dead bird from the balcony?"
– "From the breadcrumbs lying under the bed one could live for ages"
– "The hotel is situated in the most crowded party district of the town"
– "The toilets were so big that I couldn't sit down for …"
USE CASE 4: PATIENT RECRUITMENT/DIAGNOSIS SUPPORT
Disease Profiles
Inclusion/Exclusion Criteria
Categorization
Visualization
Electronic Health Records