Deep Distillation from Text
Naveen Ashish
University of Southern California & Cognie Inc., March 18th 2014



This is about …

§ “DEEP TEXT DISTILLATION”
§ The hard nut of having computers “understand” natural language (text) …
§ Pushing the boundaries of what we can achieve …

"It's (the problem of computers understanding natural language) ambitious ... in fact there's no more important project than understanding intelligence and recreating it." - Ray Kurzweil (2013)

"Alan Turing based the Turing Test entirely on written language …. To really master natural language … that's the key to the Turing Test–to a human requires the full scope of human intelligence. … So the point is that natural language is a very profound domain to do artificial intelligence in." - Ray Kurzweil (2013)


Why …

§ The problem is far from solved!

search

text analytics

big data analytics

health informatics

social-media intelligence


Introduction

§ About myself
  § Associate Professor (Informatics), Keck School of Medicine, University of Southern California
  § Cognie Inc.
§ Work leverages
  § Information extraction work and systems developed at UC Irvine
    § XAR, UCI-PEP
§ Advisory consulting engagements with several companies and start-ups


Outline

§ Deep distillation: What it is and why
§ State-of-the-art
§ Fundamentals
§ Approach
§ Details
  § Expressions, Entities, Sentiment
§ Case studies
  § Retail, Health, Risk assessment
§ Conclusions


What is “Deep” text distillation ?


Deep Distillation

§ The abstract, not explicitly mentioned!
§ What falls in this category
  § Expressions
  § Contextual sentiment
  § Aspect classification

I think you need better chefs → SUGGESTION
The mocha is too sweet → NEGATIVE
I used to take Lipitor for … → PERSONAL EXPERIENCE
The dim lights have a cozy effect … → AMBIENCE


A Common Intersection

§ Distill at sentence level
  § Aggregate to entire feedback, post, comment or thread
§ Three primary elements
  § Expression/Intent
  § Entities/Aspects (and Classes)
  § Sentiment
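The sentence-level record and its roll-up can be sketched as below. This is a minimal illustration, not the COGNIE data model: the class name, field names and the `aggregate` helper are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SentenceDistillation:
    """Distilled elements for one sentence (field names are illustrative)."""
    text: str
    expression: str                               # e.g. "COMPLAINT", "SUGGESTION"
    aspects: list = field(default_factory=list)   # e.g. ["FOOD"]
    sentiment: str = "NEUTRAL"                    # e.g. "NEGATIVE"


def aggregate(sentences):
    """Roll sentence-level results up to the whole post, comment or thread."""
    return {
        "expressions": Counter(s.expression for s in sentences),
        "aspects": Counter(a for s in sentences for a in s.aspects),
        "sentiment": Counter(s.sentiment for s in sentences),
    }
```

Counting per element keeps the aggregate simple enough to turn into statements such as "complaints are N% of feedback".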


Why Deeper?

§ Goal: Get actionable insights from data!
§ Hypothesis: Deeper extraction → Better insights!

The top advice items for skin rash are aloe vera, vitamin E oil and oatmeal

Complaints comprise 36% of the overall feedback with top issues being slow service, drinks and coffee


Context

§ COGNIE™: A PLATFORM for text analytics

[Diagram labels: COGNIE™; XAR, UCI-PEP; SHIP Survey Analytics; Retail Analytics; Risk Assessment]


Expressions

§ Beyond entities and sentiment: EXPRESSIONS
§ EXPRESSIONS
  § Introduced in [Ashish et al, 2011]


Expressions

HEALTH

You should try Vitamin E oil … → ADVICE
… I have had arthritis since 1991 … → EXPERIENCE
… for me Lipitor worked like a charm … → OUTCOME


Expressions

RETAIL/ENTERPRISE

… showers had no hot water! … → COMPLAINT
… you should have more veggie options … → SUGGESTION
… meats on special this weekend … → ANNOUNCEMENT
… this is the best store on the west side … → ADVOCACY

RISK ASSESSMENT

There is hardly any evidence to suggest a link between salt and diabetes → -
These results confirm that high intake of salt leads to increase in BP → +


The Landscape


Text Analytics Spectrum

§ Wide offering of
  § Text analytics engines
  § Text analysis tools – many open-source
§ Largely still for “spotting things”
  § entities, concepts, sentiment, topics, emotions …
§ Going deeper
  § Luminoso
  § Attensity (Intents)
§ Deep Learning for Sentiment
  § Stanford
    § Recursive Neural Networks


Approach


Approach  

natural language processing

machine learning

semantics


Architecture: COGNIE™ Platform

[Architecture diagram with three layers: NLP, Machine Learning, Knowledge Engineering. Components shown: Segmentation, POS Tagging, Entity extraction, Anaphora, Parsing, Gram analysis; Existing (DMOZ, SNOMED, UMLS), Creation, Declarative; Naïve-Bayes, MaxEnt, TFIDF, CRF, RNN Deep Learning, ENSEMBLE]


The Indicators: “Give Aways”

§ A combination of multiple types of elements!

… showers had no hot water! … → COMPLAINT
(You) should have more veggie options … → SUGGESTION
… i have been on lipitor … → EXPERIENCE
… this is the best store on the west side … → ADVOCACY


Approach: Given Indicators

§ NLP
  § Identification of individual elements
    § Unsupervised
  § Relationships between elements
§ Semantics
  § Identification of individual elements
    § Knowledge driven
§ Machine Learning Classification
  § Combine elements → classify


Natural Language Processing

§ UIMA and GATE
§ Stanford NLP Tools
§ POS tagging
§ Parsing
§ NE Recognizer
§ Geo-tagger
§ …


Natural Language Processing

§ Text Segmentation
  § In many cases the “unit” of distillation is a sentence
§ Segmentation
  § UIMA (or GATE)
  § Custom
§ Complex sentence segmentation
  § Break up into individual clauses
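A rough sketch of the two segmentation steps, assuming simple punctuation rules rather than the UIMA, GATE or custom segmenters named above. The regular expressions and function names are illustrative only, and real feedback text would need more careful handling (abbreviations, missing punctuation).

```python
import re

# Assumption: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Assumption: clauses are separated by a comma, a semicolon, or the word "but".
CLAUSE_SPLIT = re.compile(r"\s*(?:[,;]|\bbut\b)\s*")


def split_sentences(text):
    """Split raw feedback into sentences, the usual unit of distillation."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def split_clauses(sentence):
    """Break a complex sentence into individual clauses."""
    return [c.strip() for c in CLAUSE_SPLIT.split(sentence) if c.strip()]
```

Splitting "food was awesome, service needs improvement" into two clauses lets each clause receive its own aspect and sentiment.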


NLP

§ Part-of-speech tags are key indicators
  § Expression distillation
§ Entity extraction
  § Names, Locations, Organizations
§ Parsing
  § If required
§ Anaphora


NGram Analysis

§ Unigram and Bigram analysis
§ Obtain
  § Grams
  § Frequency
  § Entropy
§ Grams of tokens as well as POS Patterns
  § VB VBD
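One way to compute gram frequency and entropy is sketched below. The slide does not define "entropy"; the sketch assumes it means the entropy of the expression-label distribution over sentences containing the gram, so a low value marks a gram as a strong indicator. The function names are hypothetical. Because the input is just a list of strings per sentence, the same code applies to POS tag sequences such as VB VBD.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token (or POS tag) list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def gram_stats(labeled_sentences, n=1):
    """Map each gram to (frequency, entropy in bits of its label distribution)."""
    label_counts = defaultdict(Counter)
    for tokens, label in labeled_sentences:
        for gram in ngrams(tokens, n):
            label_counts[gram][label] += 1
    stats = {}
    for gram, counts in label_counts.items():
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        stats[gram] = (total, entropy)
    return stats
```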


Before Automated Classification: Manual Patterns

§ SoL: Sequences of Labels
§ Labels
  § LEX-FOODADJ
    § spicy
  § LEX-EXCESS
    § too, very
  § ONT-FOOD
  § POS-NOUN
§ Sequences (Patterns)
  § ANY LEX-EXCESS LEX-FOODADJ ANY →
  § POS-VB POS-MD …
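A minimal sketch of matching a sequence-of-labels pattern against a sentence. It assumes that ANY matches zero or more tokens and that every other label matches exactly one token; the slide does not spell out these semantics. Only the two lexicon labels listed above are modeled (ONT- and POS- labels would need an ontology and a tagger), and the matcher only reports whether the pattern fires, since the class the pattern maps to is not given on the slide.

```python
# Lexicon entries are the ones listed on the slide.
LEXICONS = {
    "LEX-FOODADJ": {"spicy"},
    "LEX-EXCESS": {"too", "very"},
}


def labels_for(token):
    """Return the set of lexicon labels that apply to a single token."""
    return {name for name, words in LEXICONS.items() if token.lower() in words}


def matches(pattern, tokens):
    """Check whether the whole token list matches the label pattern."""
    if not pattern:
        return not tokens
    head, rest = pattern[0], pattern[1:]
    if head == "ANY":
        # Let ANY absorb 0..len(tokens) tokens and try the rest of the pattern.
        return any(matches(rest, tokens[i:]) for i in range(len(tokens) + 1))
    return bool(tokens) and head in labels_for(tokens[0]) and matches(rest, tokens[1:])
```

For example, `matches(["ANY", "LEX-EXCESS", "LEX-FOODADJ", "ANY"], "The curry is too spicy".split())` fires, while the same sentence without "too" does not.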


Classification: Machine Learning

§ Classification tasks
  § Expression
  § (Contextual) Sentiment
  § Aspect category
§ Frameworks
  § Weka
  § Mallet


Baseline Classifiers

§ Mallet and Weka
  § NaiveBayes
  § MaxEnt
  § CRF
§ Gram-based
  § Uni, Bi and Trigram features
§ Baseline
  § ~ 10% accuracy
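To make the gram-based baseline concrete, here is a small multinomial Naive Bayes over unigram features with add-one smoothing. It is a pure-Python stand-in written for illustration; it is not the Mallet or Weka API, and the class name and training sentences are assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over lowercase unigrams (illustrative baseline)."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(sentence.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, sentence):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n_docs / total_docs)
            for word in sentence.lower().split():
                if word in self.vocab:                      # skip unseen words
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Bigram and trigram features could be added by feeding joined grams instead of single words into the same counters.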


Expression Classification: Features

§ Features
  § Polar words
  § Punctuations
  § Ngrams
  § POS patterns
  § Length!
  § Beginning
  § Ontology
  § …
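A sketch of turning one sentence into a feature dictionary covering most of the listed feature types. The polar-word lexicon and ontology entries are tiny hypothetical stand-ins, the feature names are invented for the example, and POS patterns are left out because they would need a tagger.

```python
import re

POLAR_WORDS = {"best", "awesome", "slow", "worst"}            # hypothetical lexicon
ONTOLOGY = {"veggie": "FOOD", "mocha": "FOOD", "service": "SERVICE"}  # hypothetical


def extract_features(sentence):
    """Build a feature dict from polar words, punctuation, grams, length, etc."""
    tokens = re.findall(r"[a-z']+|[!?]", sentence.lower())
    words = [t for t in tokens if t not in {"!", "?"}]
    features = {
        "length": len(words),
        "has_exclamation": "!" in tokens,
        "has_question": "?" in tokens,
        "polar_count": sum(w in POLAR_WORDS for w in words),
    }
    if words:
        features["first_word=" + words[0]] = True             # sentence beginning
    for w in words:
        features["unigram=" + w] = True
        if w in ONTOLOGY:
            features["ont=" + ONTOLOGY[w]] = True
    for a, b in zip(words, words[1:]):
        features["bigram=" + a + "_" + b] = True
    return features
```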


Classifiers

§ Trees
  § Decision Tree (J48)
§ Functions
  § Logistic Regression
  § SVM
§ Sequence Tagging
  § CRF: Conditional Random Fields


Expression Classification: Results

§ Have achieved 75% precision and recall for all expressions considered
§ Factors
  § Feature engineering
  § Classifier selection
  § Knowledge engineering


Contextual Sentiment

§ (Just) polar words can be misleading!
§ Polar words may not be present at all!
§ Combination of elements

The mocha is too sweet

Wait time is over an hour

Aisles are too narrow

Service is slow


Semantics: Ontologies

§ Health
  § Drugs
  § Conditions
  § Procedures
  § Symptoms
  § …
§ Retail (Dining)
  § Food/Entrees
  § Service
  § Ambience
  § …
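A toy illustration of how an ontology can drive aspect classification: each class holds a set of terms, and a sentence gets every class whose terms it mentions. The term lists are hypothetical examples for the dining domain, not an actual ontology, and the OTHER fallback is an assumption.

```python
import re

# Hypothetical mini ontology: class name -> terms that belong to it.
ONTOLOGY = {
    "FOOD": {"food", "mocha", "coffee", "veggie", "meats"},
    "SERVICE": {"service", "wait"},
    "AMBIENCE": {"lights", "aisles"},
}


def aspect_classes(sentence):
    """Return the sorted ontology classes mentioned in a sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    found = sorted(cls for cls, terms in ONTOLOGY.items() if words & terms)
    return found or ["OTHER"]
```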


Leverage Existing Knowledge Sources

§ Health informatics
  § UMLS
    § NCI Thesaurus
  § SNOMED
§ Retail
  § DMOZ
§ Many other
  § Freebase
  § Wikipedia, DBPedia
§ OpenData
  § data.gov


Knowledge Engineering Tools

§ “Mini” ontology creation
§ API access
  § Freebase
  § BioPortal
§ Wrappers
  § DMOZ, …


Practical Requirements

§ Confidence Measures
  § Below threshold routed to manual transcription teams
§ Polarity
§ Snippets
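The confidence requirement amounts to a simple routing rule, sketched below. The function name, the tuple layout and the 0.8 threshold are assumptions for illustration; the slides do not state an actual threshold.

```python
def route(predictions, threshold=0.8):
    """Split (text, label, confidence) tuples into automatic and manual queues."""
    automatic, manual = [], []
    for text, label, confidence in predictions:
        if confidence >= threshold:
            automatic.append((text, label, confidence))
        else:
            manual.append((text, label, confidence))   # goes to human review
    return automatic, manual
```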


Open-Source Leverage


COGNIE™: Open Source Tools

§ Framework
  § UIMA
§ Classification
  § Weka
  § Mallet
§ NLP
  § Stanford tools
§ Indexing
  § Lucene
§ Databases
  § MySQL, MongoDB
§ Knowledge Engineering
  § Protégé


Select Case Studies


Case Study: Health Informatics


Insights from, for, by Patients


Distillation


Case Study: Retail & Survey Analytics

§ Feedback
  § Direct, device collected
  § Social-media
§ Typically short, few sentences
§ Strong requirement for aspect classification
  § [Food, Service, Ambience, Pricing, Other]
§ Negative: “Immediate” vs “Long Term” classification

…food was awesome, service needs improvement ….

you need to be open longer !


Case Study: Risk Assessment

§ Biomedical Literature Abstracts
  § Correlation direction (+ -)
  § Subject
  § Article type
§ Features
  § Clauses
  § Negation and Triggers
  § Semantic Heterogeneity
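How negation and trigger features could combine into a correlation direction is sketched below as a plain rule. The cue lists are hypothetical and far smaller than anything usable on real abstracts; the deck treats these cues as features for classification rather than as a fixed rule.

```python
import re

NEGATION_CUES = {"hardly", "no", "not", "never"}        # hypothetical cue words
TRIGGERS = ("link", "leads to", "associated with")      # hypothetical trigger phrases


def correlation_direction(sentence):
    """Return "+" or "-" when a trigger is present, otherwise None."""
    text = sentence.lower()
    if not any(trigger in text for trigger in TRIGGERS):
        return None
    words = set(re.findall(r"[a-z]+", text))
    return "-" if words & NEGATION_CUES else "+"
```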


Performance


MapReduce

§ Throughput can be an issue
  § Complex language processing algorithms
  § Large ontologies in some cases
§ Hadoop MapReduce
  § [Kahn and Ashish, 2014]
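The map and reduce roles can be illustrated in plain Python, with no Hadoop involved. This is only a sketch of the pattern, not the system described in [Kahn and Ashish, 2014]: `classify` is a trivial hypothetical stand-in for the expensive distillation step, and `run` simulates the shuffle in memory.

```python
from collections import defaultdict


def classify(sentence):
    """Stand-in for the costly distillation step (hypothetical rule)."""
    return "SUGGESTION" if "should" in sentence.lower() else "OTHER"


def mapper(sentence):
    """Map: emit one (label, 1) pair per sentence."""
    yield classify(sentence), 1


def reducer(label, counts):
    """Reduce: total the counts for one label."""
    return label, sum(counts)


def run(sentences):
    """Simulate the shuffle by grouping mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for sentence in sentences:
        for key, value in mapper(sentence):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())
```

Because each sentence is classified independently, the map step parallelizes cleanly, which is where the throughput gain would come from.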


Conclusions


Conclusions

§ Deeper distillation from text is important
§ Can be achieved by
  § Detecting and combining multiple elements in text
    § Feature engineering
    § Knowledge engineering
    § Classifier selection
§ Does not have to be perfect
§ Every domain, dataset has its nuances


thank you ! [email protected]