
The Truth about Sentiment & Natural Language Processing (NLP) by Synthesio


Sentiment analysis is a topic that has been examined repeatedly in the social media monitoring sphere, with strong advocates of both automatic and human analysis. Synthesio, which traditionally relies on human analysts but can also incorporate automatic analysis, looked at the pros and cons of text analytics and interviewed Seth Grimes, an expert in the field.


Synthesio – The Truth About Natural Language Processing – March 2011

The Truth About Sentiment & Natural Language Processing
By Synthesio

Contents

Introduction
Artificial Intelligence’s difficulty with sentiment
Human analysis is an obligatory step when analyzing web content
Current technological advances
The future of semantic technology
Conclusion


Introduction

The web has made it possible for brands to discover what people are saying about them online, whether in mainstream media such as online newspapers and magazines or on social media. Consumers now search for opinions online before, during, and after a purchase. The next step for brands is finding out whether people are talking positively or negatively about their brand, and why. Some online ratings provide a number but not the reasoning behind it, so they may tell only half of the story. Some companies have been working on text mining for close to 30 years, which means sentiment analysis is not a new area, but social media has made it a hot topic. Social media monitoring companies, PR practitioners, and digital marketers in general have debated whether sentiment should be analyzed by humans or by machines. Synthesio currently uses human analysts for sentiment analysis but can add natural language processing capabilities on a case-by-case basis. Technology is quickly closing its gap with human analysis as we advance toward what is referred to as the singularity. Even so, the best option currently seems to be combining machine and human analysis.


Artificial Intelligence’s difficulty with sentiment

One way that researchers have attempted to classify sentiment is by creating a “sentiment lexicon”

Some people may be tempted to think that sentiment is analyzed by artificial intelligence. It is not. It is analyzed through a systematic process built on a sentiment lexicon. The lexicon assigns each word, taken by itself, a degree of positivity or negativity. These word scores are then combined to give meaning to the article as a whole. The approach therefore assumes that every word someone might use to talk about your business or products carries an inherent positivity or negativity. For example, “happy”, “like”, and “love” would be deemed positive words, while words like “hate” and “dislike” sit at the opposite end of the spectrum.

There are two problems with this methodology, however. The first is that it scores a word without the context around it, and very few words carry the same positive or negative sentiment in every expression. The second is that researchers may assign different degrees of positivity to the same word. With ambiguous expressions in particular, one researcher may rate a word as more or less positive than another would.
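The lexicon approach described above can be sketched in a few lines of Python. The word list and weights are illustrative assumptions, not a published lexicon. The point is that each word is scored on its own, with no view of its neighbors.

```python
import re

# Illustrative toy lexicon (an assumption, not a published resource):
# each word carries a fixed, context-free polarity.
SENTIMENT_LEXICON = {
    "happy": 1.0,
    "love": 1.0,
    "like": 0.5,
    "dislike": -0.5,
    "hate": -1.0,
}


def lexicon_score(text: str) -> float:
    """Sum the context-free polarity of every lexicon word found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(SENTIMENT_LEXICON.get(token, 0.0) for token in tokens)


print(lexicon_score("I love this phone and I am happy with it."))  # 2.0
print(lexicon_score("I don't like waiting, and I hate it."))  # -0.5
```

The second call shows the first problem in action: “don’t like” still adds +0.5, because the lexicon never sees the negation. For the same reason, the preposition in “it sounds like a great plot” would also be credited as positive.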

Text categorization classifies articles by topic [1]

Classic text categorization does not look at the various features mentioned within one article. Sentiment analysis, likewise, has traditionally been performed with technology that evaluates an article at a global level. Within one text, however, the topic may not be linked to the descriptors. Take, for example, the passage: “This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.” Counting its positive descriptors, the passage should be positive. Only at the end can a human identify the final judgment, which is negative overall.

The dictionaries used are developed by analyzing several factors:
- sentiment polarity and degrees of positivity (“like” vs. “dislike”; relatedness of topics);
- which parts of a document contain subjective content (subjectivity detection and opinion identification);
- which parts of a document concern the same subject, identified before analysis (joint topic-sentiment analysis);
- the political orientation of a text (viewpoints and perspectives).

Other non-factual information in the text can also be taken into account. For example, a text may be analyzed for the six “universal” emotions (anger, disgust, fear, happiness, sadness, and surprise) [2]. It may also be analyzed for term presence, term frequency, syntax, and negation.
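Of the features listed above, term presence, term frequency, and negation are the simplest to illustrate. The sketch below uses a common trick from the sentiment literature: it prefixes NOT_ to every word between a negation cue and the next punctuation mark, so that “good” and “not good” become different features. The cue list is an assumption and far from complete.

```python
import re
from collections import Counter

# Assumed, non-exhaustive list of negation cues.
NEGATION_CUES = {"not", "no", "never", "can't", "don't", "isn't", "won't"}
PUNCTUATION = {".", ",", "!", "?", ";"}


def tokenize(text: str) -> list[str]:
    """Lowercase words plus clause punctuation, kept so negation scope can end there."""
    return re.findall(r"[a-z']+|[.,!?;]", text.lower())


def mark_negation(tokens: list[str]) -> list[str]:
    """Prefix NOT_ to each word between a negation cue and the next punctuation mark."""
    marked, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False
        elif tok in NEGATION_CUES:
            negating = True
        elif negating:
            tok = "NOT_" + tok
        marked.append(tok)
    return marked


def features(text: str, use_frequency: bool = False) -> dict[str, int]:
    """Bag-of-words features: term presence (0/1) by default, term frequency if asked."""
    words = [t for t in mark_negation(tokenize(text)) if t not in PUNCTUATION]
    counts = Counter(words)
    return dict(counts) if use_frequency else {w: 1 for w in counts}


review = "The plot is good, the cast is good, but the ending is not good."
print(features(review)["good"], features(review, use_frequency=True)["good"])  # 1 2
print(mark_negation(tokenize("However, it can't hold up.")))
```

The last line yields NOT_hold and NOT_up, which no lexicon would score. Recognizing that this final clause overturns the positive descriptors before it still requires the kind of reading the film example describes.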

The majority of sentiment analysis literature has focused on text written in English

As a result, most resources for automatic sentiment analysis have so far been developed in English and for the English language. We discuss this with text analytics expert Seth Grimes in an exclusive interview later in this document. Traditionally, there have been two types of solutions. The first is to use bilingual dictionaries to transfer the English resources, that is, to find equivalents for all of the rules that were applied to English texts. The second is to apply sentiment analysis to a translated version of the text, although accuracy rates may be questionable.
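The first solution, transferring an English lexicon through a bilingual dictionary, can be sketched as follows. Both dictionaries are hypothetical and tiny. The example shows the two typical weaknesses: words with no dictionary entry drop out, and distinct English words can collapse onto one target word. French “aimer” covers both “like” and “love”.

```python
# Hypothetical English lexicon and English-to-French bilingual dictionary,
# both made up for illustration.
ENGLISH_LEXICON = {"happy": 1.0, "love": 1.0, "like": 0.5, "hate": -1.0, "boring": -0.5}
EN_TO_FR = {
    "happy": ["heureux", "content"],
    "love": ["aimer", "adorer"],
    "like": ["aimer"],
    "hate": ["détester"],
    # No entry for "boring": a coverage gap in the bilingual dictionary.
}


def transfer_lexicon(lexicon, bilingual_dict):
    """Copy each source word's score onto its translations.

    Returns the target-language lexicon and the source words that could not be
    transferred. When two source words share a translation, the first score wins
    (one simple policy among many).
    """
    target, missing = {}, []
    for word, score in lexicon.items():
        translations = bilingual_dict.get(word, [])
        if not translations:
            missing.append(word)
        for translation in translations:
            target.setdefault(translation, score)
    return target, missing


french, untranslated = transfer_lexicon(ENGLISH_LEXICON, EN_TO_FR)
print(french["aimer"], untranslated)  # 1.0 ['boring']
```

Because “love” is processed first, “aimer” inherits the strong score, and the milder “like” reading is lost.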

[1] Opinion mining and sentiment analysis, 2008
[2] Idem


Seth Grimes, expert in NLP

Some companies offer sentiment analysis in one language (typically English), while others offer analysis in 10 different languages. Linguistic approaches (lexicons and dictionaries) may be used for several languages, but they have incomplete sentiment capabilities in most of them. Translating French or Chinese content, for example, can’t possibly offer the best results.


Human analysis is an obligatory step when analyzing web content

Machines are capable of deciphering meaning from large amounts of information

One advantage of automating text analysis is that computers can process large bodies of text much more quickly than a human ever could, provided the text is homogeneous in form and written in one language. Just as macros in Excel speed up a human’s work, having algorithms process information can accelerate sentiment analysis. To obtain high levels of accuracy, however, the text must use a specific vocabulary with very little variability.

Collocations and complex syntactic patterns have been found to be useful in detecting subjectivity [3]

Some technology experts have defined syntactic relations within feature sets. These are then tested on text corpora to “train” the software to detect subjective expressions. The method runs syntactic templates through a training corpus and generates an extraction pattern every time a template matches. For example, the pattern “<x> pleased me” should match any time the expression “pleased me” appears, whatever fills the <x> slot. The technique has limitations, because the software searches for specific syntactic expressions rather than exact sentences. When analyzing for sentiment, then, it is only the first step: identifying whether sentiment is present at all.
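The template idea can be sketched with regular expressions. This is a simplification. The section describes syntactic templates, whereas the sketch below matches surface words and assumes a one-word slot. It only shows how a template run over a training corpus yields matches that could become extraction patterns.

```python
import re


def compile_template(template: str) -> re.Pattern:
    """Turn a template such as '<x> pleased me' into a regex with a named slot.

    Simplification: this matches surface words, not parsed syntax,
    and the slot is assumed to be a single word.
    """
    before, after = template.split("<x>")
    return re.compile(re.escape(before) + r"(?P<x>\w+)" + re.escape(after), re.IGNORECASE)


def extract(templates: list[str], corpus: list[str]) -> dict[str, list[str]]:
    """For each template, collect the slot fillers it matches across the corpus."""
    matches = {}
    for template in templates:
        pattern = compile_template(template)
        matches[template] = [
            m.group("x").lower() for sentence in corpus for m in pattern.finditer(sentence)
        ]
    return matches


corpus = [
    "The service pleased me.",
    "Honestly the food pleased me a lot",
    "The battery lasts two hours.",
]
print(extract(["<x> pleased me", "<x> annoyed me"], corpus))
# {'<x> pleased me': ['service', 'food'], '<x> annoyed me': []}
```

A template that fires often in subjective sentences and rarely in objective ones, like the battery sentence here, is a candidate subjectivity pattern. Deciding its polarity is a separate step.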

Online reviews are where NLP has had the most success

“Opinion-oriented information extraction” is making progress in identifying the subjects of a text and their relationship with the surrounding words that give them context [4]. Nouns in online reviews most likely (but not always) refer to the product or service being reviewed. The surrounding context is similarly most likely (but not always) the reviewer’s opinion of that product or service. Other online media, like blogs, may express various opinions throughout one post, each with positive or negative sentiment attached. Online reviews, by contrast, typically focus on a single subject. One heuristic for NLP software has therefore been to detect adjectives that appear in the same sentence as the feature, product, or service being evaluated. These adjectives can then be analyzed with manual or semi-manual rules or lexicons.

Public relations specialist KD Paine explains:

“Computers can do a lot of things well, but differentiating between positive and negative comments in consumer-generated media isn’t one of them. The problem with consumer-generated media is that it is filled with irony, sarcasm and non-traditional ways of expressing sentiment. That’s why we recommend a hybrid solution. Let computers do the heavy lifting, and let humans provide the judgment.” – KD Paine
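The review heuristic described above attaches adjectives to the feature named in the same sentence. It can be sketched as follows. The adjective list is a hypothetical stand-in for a part-of-speech tagger plus lexicon. The last sentence of the sample review shows why the heuristic needs the human judgment Paine describes.

```python
import re

# Hypothetical mini-lexicon of opinion adjectives and their polarity.
# A real system would find adjectives with a part-of-speech tagger instead.
OPINION_ADJECTIVES = {"great": 1, "sharp": 1, "excellent": 1, "poor": -1, "noisy": -1, "slow": -1}


def feature_opinions(review: str, features: list[str]) -> dict[str, list[tuple[str, int]]]:
    """Attach every opinion adjective to each feature named in the same sentence."""
    results: dict[str, list[tuple[str, int]]] = {}
    for sentence in re.split(r"[.!?]+", review.lower()):
        words = re.findall(r"[a-z]+", sentence)
        adjectives = [(w, OPINION_ADJECTIVES[w]) for w in words if w in OPINION_ADJECTIVES]
        for feature in features:
            if feature in words and adjectives:
                results.setdefault(feature, []).extend(adjectives)
    return results


review = "The screen is sharp. The battery is poor. Delivery was slow but the box was great."
print(feature_opinions(review, ["screen", "battery", "delivery"]))
```

“Great” describes the box, yet the same-sentence rule credits it to delivery. Even in focused reviews, sentence co-occurrence is only an approximation.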

   

[3] Learning Extraction Patterns for Subjective Expressions
[4] Opinion mining and sentiment analysis, 2008


Current technological advances

Technology is continually progressing

Mao and Lebanon are two researchers who proposed using “isotonic conditional random fields” to analyze sentiment at the sentence level [5]. They built a mathematical model of sentiment in which certain words are strongly positive or negative and therefore shift the “local sentiment” positively or negatively. Models like these could be used to program machines to determine sentiment within certain probabilities, while also taking the author into account. Such uses are interesting because human reviewers do not always agree, either.

“I, for one, welcome our new computer overlords” – Ken Jennings, Jeopardy contestant

Watson, a question-answering computer developed by IBM, made history this year on Jeopardy, an American game show renowned for its difficult question-and-answer format, by competing against two of the show’s top champions. Contestants typically study volumes of encyclopedias to reach the final round, but IBM put its supercomputer Watson to the test, and he won. Watson was programmed to buzz in according to the level of certainty he had for each question. He was also trained to answer in the form of a question and to decipher the complex language that goes into a game of Jeopardy, where the category names, as well as the “answers” (which serve as questions), are often puns [6]. IBM proved that its technology has advanced to the point where it can intelligently parse language and weigh different parts of a phrase. Researchers loaded some 200 million pages of content (the equivalent of about one million books) into the system but were unable to teach it to avoid every trap. During the practice session, one wrong answer led to a string of wrong answers as the machine veered off in the wrong direction.

The web is made up of many different types of media, both mainstream and social

Some online media are more “fact-based,” such as newspapers or general news, while others, like Twitter, Facebook, and forums, are inherently more “opinion-based.” Still others, like blogs, may be either. All of this makes it difficult for automated sentiment analysis technology to differentiate between subjective and objective information. For example, compare “the battery lasts 2 hours” with “the battery only lasts 2 hours”: the second sentence implies a sentiment that the first does not.

Social media has also engendered new forms of expression, such as “SMS-like” writing, that make text analysis more complicated. Emoticons may or may not help. Social media posts also more commonly contain slang, misspellings, bad grammar, and poor syntax such as missing or added characters. Take, for example, “oh my gooooooood WTF did you see Biebur’s concert? It was aewsome! I lved it.” New forms of association and ways of expressing negative sentiment have also arisen, including ironic or sarcastic phrasing. Examples include “Another winner from the almighty Microsoft,” or, most recently, “Charlie Sheen is a winner.”
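A common first step with SMS-like writing is normalization before any sentiment scoring. The sketch below lowercases the text, collapses elongated words (runs of three or more identical characters), and expands entries from a slang dictionary. The dictionary is a hypothetical stand-in for the much larger resources real systems maintain.

```python
import re

# Hypothetical slang/misspelling dictionary; real systems maintain much larger ones.
SLANG = {"lved": "loved", "aewsome": "awesome", "u": "you", "gr8": "great"}


def normalize(text: str) -> tuple[str, bool]:
    """Lowercase, squeeze elongated words, and expand known slang.

    Returns the normalized text and whether any elongation was seen,
    which a downstream model might treat as a sign of emphasis.
    """
    tokens = re.findall(r"[\w']+|[^\w\s]", text.lower())
    normalized, elongated = [], False
    for tok in tokens:
        # Collapse runs of 3+ identical characters to 2: "gooooood" -> "good".
        squeezed = re.sub(r"(\w)\1{2,}", r"\1\1", tok)
        elongated = elongated or squeezed != tok
        normalized.append(SLANG.get(squeezed, squeezed))
    return " ".join(normalized), elongated


text, emphatic = normalize("oh my gooooooood WTF did you see Biebur's concert? It was aewsome! I lved it.")
print(text)
print(emphatic)  # True
```

The output also shows the limits of such rules. “gooooooood” becomes “good”, although the writer almost certainly meant “god”. “Biebur’s” stays misspelled because it is not in the dictionary.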

[5] Isotonic Conditional Random Fields and Local Sentiment Flow
[6] IBM and the Jeopardy! Challenge - Video - Wired


Automated sentiment analysis cannot understand sentiment in the context of your business goals

One challenge that proponents of automated analysis have struggled to address is analyzing text in the context of a business. For example, “Nike reports that profits rose” and “Adidas reports that profits rose” are both positive sentences when evaluated without context. If Nike is the firm listening to social media, however, the second sentence is suddenly not as positive. Whether it is “good” or “bad” depends on whether the client is Nike or Adidas. Automated text analysis has a further limit beyond judging whether information is positive or negative for a client. It may extract information that the company already knows or does not wish to focus on. For example, if a machine is told to analyze the top trends around a brand, its results may largely repeat what the brand already knows.
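The Nike/Adidas example can be made concrete with a sketch that scores a sentence from the listening client’s point of view. The verb polarities and the rule “flip the polarity of a competitor’s news” are hypothetical and deliberately crude. A competitor’s good news is not always bad news for the client, which is exactly the kind of judgment the section says machines lack.

```python
# Hypothetical generic polarity for a few financial verbs.
GENERIC_POLARITY = {"rose": 1, "grew": 1, "fell": -1, "dropped": -1}


def client_sentiment(sentence: str, client: str, competitors: set[str]) -> int:
    """Score a sentence from the listening client's point of view.

    Assumed policy: news about the client keeps its generic polarity, news about a
    named competitor has its polarity flipped, and anything else counts as neutral.
    """
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    generic = sum(GENERIC_POLARITY.get(w, 0) for w in words)
    if client.lower() in words:
        return generic
    if any(c.lower() in words for c in competitors):
        return -generic
    return 0


print(client_sentiment("Nike reports that profits rose", "Nike", {"Adidas"}))  # 1
print(client_sentiment("Adidas reports that profits rose", "Nike", {"Adidas"}))  # -1
```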

Automated analysis is limited in analyzing sentiment for several topics within an article

Only now are certain technologies emerging that can analyze sentiment at the feature level. In general, automated sentiment analysis technology has difficulty telling the sentiment attached to one topic from that attached to another, particularly when more than one topic is mentioned in the same sentence. A blog post may be positive in its first sentence and negative in its second. Alternatively, the post may carry one overall sentiment while its comments are both positive and negative. “Much work on analyzing sentiment and opinions in politically oriented text focuses on general attitudes expressed through texts that are not necessarily targeted at a particular issue or narrow subject.” [7] A blogger, for example, may compare two or more products within the same post. Posts on a forum are often responses to earlier posts, and this lack of context makes it difficult for machines to decipher whether a post agrees or disagrees.
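When two products are compared in one sentence, a whole-sentence score can cancel out to neutral. A minimal sketch of one workaround splits the sentence at contrast markers and scores each topic only within its own clause. The lexicon and the marker list are assumptions, and this is a heuristic, not a parser.

```python
import re

# Hypothetical polarity lexicon.
LEXICON = {"great": 1, "sharp": 1, "terrible": -1, "blurry": -1}
# Contrast markers used to split a sentence into clauses (an assumed, incomplete list).
CLAUSE_BREAK = re.compile(r",?\s+\b(?:but|whereas|while)\b\s+|;\s*")


def topic_sentiment(sentence: str, topics: list[str]) -> dict[str, int]:
    """Score each topic using only the clause in which it is mentioned."""
    scores: dict[str, int] = {}
    for clause in CLAUSE_BREAK.split(sentence.lower()):
        words = re.findall(r"[a-z]+", clause)
        polarity = sum(LEXICON.get(w, 0) for w in words)
        for topic in topics:
            if topic in words:
                scores[topic] = scores.get(topic, 0) + polarity
    return scores


sentence = "The phone is great, but the tablet is terrible."
print(topic_sentiment(sentence, ["phone", "tablet"]))  # {'phone': 1, 'tablet': -1}
```

Scoring the same sentence as a whole would give 1 + (-1) = 0, hiding two clear and opposite opinions.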

[7] Opinion mining and sentiment analysis, 2008


The future of semantic technology

An interview with Seth Grimes, an “Analytics visionary”

“Watson,” the IBM computer that won on the game show Jeopardy, created a huge buzz around “his” technology. Why do you think there was so much buzz?

Getting a computer to play Jeopardy was a great stunt. IBM made the technology do something that everyone can understand. It was a “stunt,” however, because the ability to win Jeopardy is not in high demand in business or society. Nonetheless, Watson’s Jeopardy playing helps the non-technologist public understand the potential and the reality of the technology. Question-answering systems are already out there, automating responses to business questions with no need for a live person on the line, for instance in contact-center support, customer inquiries, and online commerce. Right now Watson is focused on extracting factual information, but the technology could work on sentiment via a sentiment “annotator.” Then we won’t be limited to asking questions about facts. We’ll be able to ask about opinions and emotions.

(An annotator analyzes text and marks it up with meaning, that is, with attributes or features found in the text. For example, a named-entity annotator finds geographic locations and “marks them up,” capturing their semantic meaning. Pattern-based annotators can find addresses and identify location numbers by looking for patterns, and other annotators can mark up other parts of the text.)

How accurate can this technology be?

Accuracy goals, and the amount of work you put into meeting them, should be decided in light of the business problem. Some problems will be solvable even with low levels of precision (e.g., positive versus negative sentiment classification), while you might need higher precision for other applications. “Recall,” the ability to identify all applicable cases, is also factored into accuracy measurements.
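Precision and recall, the two measures Grimes mentions, are easy to compute once human “gold” labels exist. Precision is the share of items the system tagged with a label that truly carry it. Recall is the share of items truly carrying the label that the system found. The labels below are made up for illustration.

```python
def precision_recall(predicted: list[str], gold: list[str], label: str) -> tuple[float, float]:
    """Precision and recall for one label, given system output and human gold labels."""
    true_pos = sum(1 for p, g in zip(predicted, gold) if p == label and g == label)
    predicted_count = sum(1 for p in predicted if p == label)
    actual_count = sum(1 for g in gold if g == label)
    precision = true_pos / predicted_count if predicted_count else 0.0
    recall = true_pos / actual_count if actual_count else 0.0
    return precision, recall


# Made-up labels for five messages.
gold = ["neg", "neg", "pos", "neg", "pos"]
predicted = ["neg", "pos", "pos", "neutral", "neg"]
print(precision_recall(predicted, gold, "neg"))  # (0.5, 0.3333333333333333)
```

Here half of the messages flagged as negative really are negative, but only one of the three negative messages was found. A single “accuracy” figure would hide that difference.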
My impression is that most sentiment tools that extract entities have an out-of-the-box accuracy (without training) of something like 40-50%. They can, however, be “trained” to reach above the 80% level, by having humans create marked-up samples or language rules, or correct the tool. I saw one claim of 98% accuracy, which is laughable and ludicrous. The only way you can achieve that is by highly restricting the problem, tailoring the solution, and being more lenient about what counts as accurate.

What matters most is first identifying whether sentiment is present at all, without yet determining whether it is positive or negative. The material can then be passed on for human or machine classification. For certain problems, machine filtering combined with human analysis can yield high levels of accuracy. If you really want the machine to do everything, you need to do a lot more work, or you will get much lower accuracy overall. Again, though, decisions should be based on business needs and on the nature of the source materials.

Let me add that tools that analyze only at the message or document level may be accurate, but the results they produce will often be far less than useful. Think about it. Say you’re running a hotel group with 4,200 hotels. It might be helpful to know that (making up numbers) 77% of reviews were positive overall, 17% neutral, and 6% negative. Wouldn’t it be far more helpful to know the details of opinion, hotel by hotel? You want to know when a reviewer found that room cleanliness and staff friendliness were exemplary but that noise was a problem.
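Grimes’s two-step workflow can be sketched as a simple triage. First detect whether a message carries sentiment at all. Then let the machine label the clear-cut cases and route mixed ones to a human analyst. The hotel lexicon and the routing rule are assumptions made for illustration.

```python
import re

# Hypothetical hotel-review lexicon.
LEXICON = {"great": 1, "clean": 1, "friendly": 1, "dirty": -1, "noisy": -1, "rude": -1}


def triage(messages: list[str]) -> dict[str, list]:
    """Route messages: no sentiment found, confident machine label, or human review.

    Assumed policy: messages whose lexicon hits all share one sign are labeled
    automatically; messages with mixed signals go to a human analyst.
    """
    routed: dict[str, list] = {"no_sentiment": [], "auto": [], "human": []}
    for msg in messages:
        polarities = [LEXICON[w] for w in re.findall(r"[a-z]+", msg.lower()) if w in LEXICON]
        if not polarities:
            routed["no_sentiment"].append(msg)
        elif all(p > 0 for p in polarities) or all(p < 0 for p in polarities):
            routed["auto"].append((msg, "positive" if polarities[0] > 0 else "negative"))
        else:
            routed["human"].append(msg)
    return routed


result = triage(["Check-in is at 3pm.", "Great staff, very friendly.", "Clean room but a noisy street."])
print(result)
```

The mixed message is exactly the hotel case Grimes describes: a clean room but a noise problem. It goes to a person instead of being forced into a single net label.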
The details in a net positive review are not typically going to be all positive. Only by knowing sentiment at a detailed, “feature,” level can you reinforce what’s great and correct what’s not. By the way, let’s not overstate the accuracy of human sentiment analysis. The best study of accuracy I’ve seen was done at the University of Pittsburgh in 2005. The researchers found only 82% human agreement in annotating for sentiment. Agreement jumped to over 90% when they removed uncertain cases, that is, cases where the annotators said they weren’t sure.
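The effect Grimes describes can be reproduced on made-up data with a simple percent-agreement measure. The `exclude_unsure` option drops items that either annotator marked as unsure before comparing. The numbers are illustrative and are not from the Pittsburgh study.

```python
def percent_agreement(labels_a, labels_b, exclude_unsure=False, unsure="unsure"):
    """Share of items two annotators labeled identically.

    With exclude_unsure=True, items that either annotator marked as unsure
    are dropped before comparing.
    """
    pairs = list(zip(labels_a, labels_b))
    if exclude_unsure:
        pairs = [(a, b) for a, b in pairs if unsure not in (a, b)]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


# Made-up annotations, not data from the Pittsburgh study.
annotator_a = ["pos", "neg", "pos", "unsure", "neg"]
annotator_b = ["pos", "neg", "neg", "pos", "neg"]
print(percent_agreement(annotator_a, annotator_b))  # 0.6
print(percent_agreement(annotator_a, annotator_b, exclude_unsure=True))  # 0.75
```

Raw percent agreement does not correct for agreement expected by chance. Studies usually also report a chance-corrected statistic such as Cohen’s kappa.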


Are there certain online channels (among forums, blogs, Twitter, etc.) that are easier to analyze using text mining than others?

To really do it well you have to go to the feature level (to the individual item), and you need strong natural language processing (NLP) to do that right. Twitter is interesting because it is very hard to express more than one idea in a given tweet. Most tweets focus on a single idea, which, in theory, should make them easy to analyze. The problem is that people use a lot of slang and abbreviations, which makes tweets harder to analyze than a blog post or article. Also, a tweet is often part of a conversation. Very few tweets stand on their own; many include an article link or are responses to someone, for example. Others are part of multi-way conversations, and you very often need to understand the whole conversation to get the context. Most of the tools out there don’t do that; they don’t reach “through the tweet” to take into account the threaded nature of Twitter conversations. The more text there is, the easier it is to analyze, but at the same time, the shorter the text, the more focused it’s going to be.

But let’s move from ease of analysis to business value delivered. Applications like Synthesio’s get a lot of visibility because so many people use social media. Customer service, however, is the sentiment-analysis application that has probably delivered the clearest business benefits and the greatest business value. Contact centers and surveys provide important data that is more focused than material out on the web and that is tied to actual customers and transactions.
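“Reaching through the tweet” means reconstructing the reply chain before judging a message. A minimal sketch follows. The `Tweet` record and its `in_reply_to` field are a hypothetical data model, not any platform’s actual API.

```python
from typing import NamedTuple, Optional


class Tweet(NamedTuple):
    # Hypothetical minimal data model; real platform APIs name these fields differently.
    tweet_id: str
    text: str
    in_reply_to: Optional[str]


def conversation_context(tweet_id: str, tweets: dict[str, Tweet], max_depth: int = 20) -> list[str]:
    """Walk up the reply chain and return the conversation, oldest message first."""
    chain = []
    current = tweets.get(tweet_id)
    while current is not None and len(chain) < max_depth:
        chain.append(current.text)
        current = tweets.get(current.in_reply_to) if current.in_reply_to else None
    return list(reversed(chain))


tweets = {
    "1": Tweet("1", "Is the new phone worth the upgrade?", None),
    "2": Tweet("2", "Absolutely not, the battery is awful.", "1"),
    "3": Tweet("3", "Agreed!", "2"),
}
print(conversation_context("3", tweets))
```

On its own, “Agreed!” reads as positive. In context it endorses a negative opinion about the battery, which is the information a sentiment tool would need.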
You’ll get greater benefit from tying customer feedback to social media data than from spending your funds broadly listening to people who are expressing opinions in a void, without context. There’s no denying the potential benefit of broad social-media monitoring and engagement, however. People will tell you what they like (or don’t) about your product. They will also post things that can be analyzed and shown to be indicators of their intent: to buy, to complain, to cancel their service, etc. This information can be used to fix problems: the customer-service scenario. Answering a customer to make that person happy can turn him or her into a “net promoter,” and the information can be used to improve quality so the problems don’t happen to other people. Companies can also use posted and analyzed information to identify and act on opportunities, including intent signals that go beyond polarity (positive/negative). This is engagement that does more than react to particular comments about products and services. It’s engagement that proactively creates new and higher-value customers.

What recent advances have you seen in sentiment analysis technology?

The latest advances in analysis do go beyond “polarity” or “valence” (positive, negative, neutral). I don’t just mean rating sentiment on a scale from -10 to +10 to capture “intensity”: that is an advance, but we can do more. For example, you might look at sentiment about a hotel service in terms of emotional categories such as “angry,” “sad,” or “happy.” I’m sure we can all think of ways that automated understanding of emotional tone can be useful in business contexts.
Then there are the “intent signals” I was just discussing: sentiment as an indicator of plans or actions. You’re going to get the most flexibility in creating business-suited categorizations via statistical approaches. That is, the analyst sets up categories that make sense and drags and drops documents into the different categories for “training” purposes. The machine uses statistical similarity measures to discover what the items in each category have in common in order to automate classification.

Further, the market is beginning to understand that influence is best measured by the ability to affect business. Influence is certainly correlated with the number of Facebook friends, Twitter followers, and retweets. What should interest us far more, however, is how those measures translate into inquiries, sales, and monetizable perceptions. A person is truly influential if he or she drives business transactions.

And the market is understanding just how shallow many of the listening tools are. They treat social media as a silo, completely unlinked to enterprise systems and actual business transactions. They use simple keyword lists for sentiment classification. And they apply sentiment analysis only at the message, article, or document level. These tools can and should do better, including by joining the abilities of humans with the power of machines. Humans judge and discern. Machines are fast, work 24 hours a day, and can tap huge volumes of social, online, and enterprise information that are beyond human analysis regardless of cost.
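The train-by-example workflow Grimes describes can be sketched as a nearest-centroid classifier. Each category’s example documents are pooled into one bag-of-words vector, and a new document goes to the category with the highest cosine similarity. This is one simple statistical similarity method among many. The category names and training texts are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled: dict[str, list[str]]) -> dict[str, Counter]:
    """Pool each category's example documents into one summed count vector."""
    centroids = {}
    for category, docs in labeled.items():
        total = Counter()
        for doc in docs:
            total.update(vectorize(doc))
        centroids[category] = total  # cosine ignores scale, so a sum works like a mean
    return centroids


def classify(text: str, centroids: dict[str, Counter]) -> str:
    """Assign the category whose centroid is most similar to the text."""
    vec = vectorize(text)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))


# Hypothetical analyst-sorted training examples.
centroids = train({
    "cancel_intent": ["I am going to cancel my subscription", "Thinking of switching to another provider"],
    "purchase_intent": ["I want to buy the new model", "Where can I order this phone?"],
})
print(classify("I will cancel my contract", centroids))  # cancel_intent
print(classify("Can I order the new model?", centroids))  # purchase_intent
```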


Conclusion

No social media monitoring vendor would dare to pretend that technology can accurately (or even near-accurately) assess sentiment on a specific topic. At the subtopic level (the level at which Synthesio works), it is completely impossible. However, NLP can at least help identify trends at a macro level, such as hot topics or aggregate changes in sentiment over time. The theory is that the sentiment marking may be inaccurate, even by an order of magnitude. Tracking and trending it over time still lets us watch the pattern for changes, because we assume that the level of inaccuracy will remain consistent over time. There is no proof of this yet, however.
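The assumption behind trend tracking can be made concrete with a small, made-up series. If the measurement error is a constant multiplicative factor, period-to-period relative changes are unaffected. If the error drifts, the apparent trend can even reverse. The sketch only illustrates why the assumption matters. It does not show that real sentiment errors are constant.

```python
def period_changes(series: list[float]) -> list[float]:
    """Relative change from each period to the next."""
    return [(b - a) / a for a, b in zip(series, series[1:])]


# Made-up weekly counts of negative mentions.
true_counts = [100, 120, 90, 150]
# Consistently off by an order of magnitude: the assumption the conclusion relies on.
constant_bias = [x * 10 for x in true_counts]
# Error that drifts from week to week: the case nobody has ruled out yet.
drifting_bias = [x * f for x, f in zip(true_counts, [10, 8, 12, 5])]

print(period_changes(true_counts))
print(period_changes(constant_bias))
print(period_changes(drifting_bias))
```

With a constant bias, the first two series of changes match. With the drifting bias, the first week shows a 4% decline where the true count rose by 20%.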

About Synthesio

Synthesio is a global, multilingual social media monitoring and research company. It uses a powerful hybrid of technology and human monitoring services to help brands and agencies collect and analyze consumer conversations online. The result is actionable analytics and insights that provide an accurate snapshot of a brand and help answer the ultimate questions: how are we really doing right now, and how can we make it better? Founded in 2006, the company has grown to include analysts who provide native-language monitoring and analytic services in over 30 languages worldwide. Brands such as Toyota, Microsoft, Sanofi, Accor Hotels, Orange Telecom, and many other well-known companies turn to Synthesio for the data they need. They use it to engage with their markets, anticipate and prepare for emerging crisis situations, and prepare for new product or campaign launches.

 WWW.SYNTHESIO.COM