Watson presentation

Joseph Orilogbon, Luis Lasierra, Bin Shen

5/12/14 Semantic Technologies in IBM Watson

Discovering why Topics are Trending on Twitter

*We set out to explain why topics are trending on Twitter

*Our main approach was summarization

*News breaks on Twitter

*Twitter is a prominent way of expressing opinions on the Internet

*Why people are talking about a particular topic in a given location

*Commercial interest

*Summarization of trending topics on Twitter

*Categorization of Topics; and

*Named-Entity Extraction for Trending topics

*http://whytrend.intelworx.com

*Speech Act Guided Summarization

*Phrase Ranking using MLE

*Phrase Extraction using POS filtering

*Salience Score of Extracted Phrases

*Summary generation using templates

*Speech Acts include : Statement [sta], Question [que], Comment [com], Suggestion [sug] and Miscellaneous [mis]

*Speech Act classification is a multiclass problem

*A K-Nearest Neighbors approach was used for classification
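A K-Nearest Neighbors classifier over the five speech-act labels can be sketched as follows. The slides do not say which features or distance measure were used, so the bag-of-words vectors, the cosine similarity, the function names, and the toy training examples below are all assumptions made for illustration.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Count lowercased whitespace tokens (assumed feature representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, training, k=3):
    """Return the majority label among the k most similar training tweets."""
    query = bag_of_words(text)
    ranked = sorted(
        training,
        key=lambda example: cosine(query, bag_of_words(example[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled tweets; not taken from the real training set.
TOY_TRAINING = [
    ("what time does the game start", "que"),
    ("who is playing tonight", "que"),
    ("why is this trending", "que"),
    ("you should watch the game tonight", "sug"),
    ("you should try the new update", "sug"),
    ("the game starts at nine", "sta"),
    ("the new update is out today", "sta"),
]

print(knn_classify("you should watch this", TOY_TRAINING, k=3))
```

With real data, the choice of `k` and of the feature set would need tuning.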

*Extracted phrases were ranked using the following equation:

*$\mathrm{Score}(P) = \log \dfrac{L(\text{words in } P \text{ are independent})}{L(\text{words in } P \text{ are dependent})}$

*Dependence/independence was measured against a background Twitter corpus built from 550,000 tweets

*For lengths 1 to L, we extract the top 50 phrases.

*L is a model parameter for maximum phrase length
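For two-word phrases, one standard way to compute a ratio of this form is the binomial likelihood-ratio test for collocations. The sketch below assumes that formulation. The slides do not state how counts were derived or how longer phrases were handled, so the function names and the counts `c1`, `c2`, `c12`, and `n` (presumably taken from the background corpus) are illustrative assumptions.

```python
import math


def log_binomial_likelihood(k, n, p):
    """log of p**k * (1 - p)**(n - k), treating 0 * log(0) as 0."""
    def xlogy(x, y):
        return 0.0 if x == 0 else x * math.log(y)
    return xlogy(k, p) + xlogy(n - k, 1.0 - p)


def log_likelihood_ratio(c1, c2, c12, n):
    """log [ L(independent) / L(dependent) ] for the bigram (w1, w2).

    c1  -- occurrences of w1
    c2  -- occurrences of w2
    c12 -- occurrences of w1 immediately followed by w2
    n   -- total number of tokens (requires c1 < n)
    """
    p = c2 / n                    # independence: P(w2 | w1) = P(w2 | not w1)
    p1 = c12 / c1                 # dependence: P(w2 | w1)
    p2 = (c2 - c12) / (n - c1)    # dependence: P(w2 | not w1)
    independent = (log_binomial_likelihood(c12, c1, p)
                   + log_binomial_likelihood(c2 - c12, n - c1, p))
    dependent = (log_binomial_likelihood(c12, c1, p1)
                 + log_binomial_likelihood(c2 - c12, n - c1, p2))
    return independent - dependent


# Hypothetical counts: a strongly associated pair and a weakly associated one.
print(log_likelihood_ratio(100, 100, 90, 1000))
print(log_likelihood_ratio(100, 100, 30, 1000))
```

With independence in the numerator the value is never positive, and a more negative value indicates a stronger association, so under this assumption phrases would be ranked by the negated ratio.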

*Extracted N-Grams are only useful if they are:

*Nouns or Noun Phrases

*Verbs or a Verb-Centered Phrase

*After Extracting N-Grams, those not matching the required patterns were filtered out using RegEx on their POS Tag Pattern

*Tagging was done before extracting N-Grams to give the tagger the proper context.

*Different patterns are suitable for different Speech Acts
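The POS filtering step can be sketched as follows: tag the tweet first, then extract N-Grams and keep only those whose tag sequence matches a regular expression. The actual patterns are not given in the slides, so the two patterns here are hypothetical. The single-character tags (`N` common noun, `^` proper noun, `V` verb, `A` adjective, `D` determiner) are assumed to follow the coarse tagset of the Twitter NLP tagger.

```python
import re

# Hypothetical patterns over single-character POS tags.
NOUN_PHRASE = re.compile(r"A*[N^]+")         # optional adjectives, then nouns
VERB_PHRASE = re.compile(r"V(D?A*[N^]+)?")   # verb, optionally with an object


def ngrams(tagged, n):
    """All contiguous runs of n (word, tag) pairs."""
    return [tagged[i:i + n] for i in range(len(tagged) - n + 1)]


def filter_ngrams(tagged, max_len, patterns):
    """Keep N-Grams (length 1..max_len) whose tag string fully matches a pattern."""
    kept = []
    for n in range(1, max_len + 1):
        for gram in ngrams(tagged, n):
            # One character per tag, so the joined string is unambiguous.
            tag_string = "".join(tag for _, tag in gram)
            if any(p.fullmatch(tag_string) for p in patterns):
                kept.append(" ".join(word for word, _ in gram))
    return kept


# The tweet is tagged as a whole before N-Grams are cut from it.
tagged_tweet = [("apple", "^"), ("unveils", "V"), ("the", "D"),
                ("new", "A"), ("iphone", "N")]

print(filter_ngrams(tagged_tweet, 4, [NOUN_PHRASE, VERB_PHRASE]))
# ['apple', 'unveils', 'iphone', 'new iphone', 'unveils the new iphone']
```

Passing a different `patterns` list per speech act is one way to make the patterns depend on the speech act.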

*A second round of ranking orders phrases by how “salient” they are within the given topic

*Salience Score is given as $SS(Ng_i) = GS(Ng_i) \times N_i$

*$N_i$ is the length of N-Gram $Ng_i$

*$GS(Ng_i)$ is a graph score obtained by iterating over a graph $G = (V, E)$, where $V$ is the set of N-Grams and $E$ is a set of edges weighted by the number of times the N-Grams co-occur.
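The salience computation (graph score multiplied by N-Gram length) can be sketched as follows. The slides do not specify the iteration, so a weighted PageRank-style update is assumed here. The damping factor, the iteration count, counting co-occurrence per tweet, and the substring test for phrase presence are also assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(tweets, phrases):
    """Edge weight = number of tweets in which both phrases appear (assumed)."""
    weights = defaultdict(float)
    for tweet in tweets:
        present = [p for p in phrases if p in tweet]  # naive substring match
        for a, b in combinations(present, 2):
            weights[(a, b)] += 1
            weights[(b, a)] += 1
    return weights


def graph_scores(phrases, weights, damping=0.85, iterations=50):
    """Weighted PageRank-style iteration over the phrase graph (assumed)."""
    score = {p: 1.0 / len(phrases) for p in phrases}
    out_weight = {p: sum(weights.get((p, q), 0.0) for q in phrases)
                  for p in phrases}
    for _ in range(iterations):
        updated = {}
        for p in phrases:
            incoming = sum(score[q] * weights.get((q, p), 0.0) / out_weight[q]
                           for q in phrases if out_weight[q] > 0)
            updated[p] = (1 - damping) / len(phrases) + damping * incoming
        score = updated
    return score


def salience(phrases, weights):
    """Salience = graph score times phrase length in words."""
    gs = graph_scores(phrases, weights)
    return {p: gs[p] * len(p.split()) for p in phrases}


# Hypothetical tweets and candidate phrases.
tweets = ["apple unveils new iphone", "new iphone looks great",
          "apple event today"]
phrases = ["apple", "new iphone", "event"]
print(salience(phrases, cooccurrence_graph(tweets, phrases)))
```

In this toy graph "new iphone" and "event" receive the same graph score, so the length factor alone makes the two-word phrase twice as salient.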

*A greedy strategy was used to select the most salient phrases

*Phrases were used to fill templates

*Speech acts were used to describe how people are talking about the salient phrases.

*Redundant phrases were detected using a Jaccard coefficient threshold of 0.275
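Greedy selection with Jaccard-based redundancy detection can be sketched as follows. Only the 0.275 value comes from the slides. Computing the coefficient over word sets, treating a value at or above the threshold as redundant, and the function names are assumptions.

```python
def jaccard(a, b):
    """Jaccard coefficient between the word sets of two phrases."""
    set_a, set_b = set(a.split()), set(b.split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def greedy_select(scored, limit, threshold=0.275):
    """Take phrases in descending score order, skipping redundant ones."""
    selected = []
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    for phrase, _ in ranked:
        if len(selected) >= limit:
            break
        # Redundant if too similar to any phrase already chosen.
        if all(jaccard(phrase, chosen) < threshold for chosen in selected):
            selected.append(phrase)
    return selected


# Hypothetical salience scores.
scored = {"new iphone": 0.9, "the new iphone": 0.8,
          "apple event": 0.7, "tim cook": 0.6}

print(greedy_select(scored, limit=2))
# ['new iphone', 'apple event']
```

Here "the new iphone" is skipped because its Jaccard coefficient with "new iphone" is 2/3, well above the threshold.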

*The main reference is Zhang et al., 2013

*Speech Acts were not used for filtering out tweets

*Two rounds of POS filtering were done, as opposed to one in the original paper

*A greedy strategy was used, as opposed to the round-robin strategy used in the original paper

*Representative tweets were also presented to give the user some sense of context.

*Speech Act Training Data Set (Liu et al.), for speech act classification

*Sentiment 140 dataset, for background corpus

*TweetMotif dataset (O’Connor et al., 2010), for background corpus

*Twitter NLP (Gimpel et al.), for POS tagging

*Tweets collected via Twitter API for testing summarization model, see examples on site.

*Entity Extraction

*Preprocessing, proper noun extraction

*Google Knowledge Graph: Freebase

*Categorization

*uClassify API

*Extract highest ranking category

*Front end

*Auto-detection/manual selection of location

*Displays trending topics

*Sends requests to server to analyze topics

*Back end

*Tweet retrieval

*Analysis using the summarization model

*Sends results to Freebase and uClassify APIs

*Caches results

*Front end: HTML5, JS, Google Maps API, AngularJS, jQuery

*Backend: Java / Play framework and MySQL database

*Hosted on AWS

*Asked users to provide feedback on results

*Questions covered all 3 parts of the project

*Received 19 responses as of the time of making this slide

*Average ratings shown on the survey result charts: 3.89, 4.00, 4.21, 3.84, 4.16

* Liu, Fei, Yang Liu, and Fuliang Weng. "Why is SXSW trending?: exploring multiple text sources for Twitter topic summarization." 2011. 66--75.

* O'Connor, Brendan, Michel Krieger, and David Ahn. "TweetMotif: Exploratory Search and Topic Summarization for Twitter." 2010.

* Zhang, Renxian, Wenjie Li, Dehong Gao, and You Ouyang. "Automatic Twitter Topic Summarization With Speech Acts." Audio, Speech, and Language Processing, IEEE Transactions on (IEEE) 21 (2013): 649--658.

* Gimpel, Kevin, Nathan Schneider, Brendan O'Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith. "Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments." In Proceedings of ACL 2011.

* Abeel, Thomas, Yves Van de Peer, and Yvan Saeys. "Java-ML: A Machine Learning Library." Journal of Machine Learning Research 10 (2009): 931--934.

*Tweets under a topic are loosely grouped together and sometimes share little in common.

*Low performance with Speech-Act Classification

*Detection of Main entity

*Normalization of tweets sometimes produced odd results

*Twitter API rate limit: 180 search queries per user per application per 15 minutes

*Real-time indexing of tweets before they start trending, using Lucene/ES or other full-text engines.

*Detection of sentence overlap in the selected phrases

*Detecting redundancies semantically.

*Different templates for various topic categories.
