Dandelion: semantic text analytics as a service

Ugo Scaiella, R&D Team Lead @ Smarter Engagement – Milano, 20.05.2016

The bag-of-words paradigm

The Mona Lisa is a 16th century oil on canvas painted by Leonardo. It's held at the Louvre in Paris.

Term      Freq
the       2
mona      1
leonardo  1
century   1
oil       1
Paris     1
Lisa      1
By        1
painted   1
at        1
canvas    1
...       ...
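As a rough illustration of the term/frequency table, the counting step can be sketched in a few lines of Python. The tokenizer here is an assumption (lowercasing plus a simple regular expression), not something taken from the slides; real systems usually add stop-word removal and stemming.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it into word tokens and count each token."""
    # Assumed tokenizer: runs of letters, digits and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


text = (
    "The Mona Lisa is a 16th century oil on canvas painted by Leonardo. "
    "It's held at the Louvre in Paris."
)
counts = bag_of_words(text)

# Print the terms sorted by decreasing frequency.
for term, freq in counts.most_common():
    print(f"{term:10} {freq}")
```

Word order and sentence structure are discarded: only the counts survive, which is exactly the limitation the following slides address.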

Classic NLP pipeline

Segmentation → Tokenization → PoS Tagging → Chunking → Dependency Parsing

1   The       the       DT   O       3   det
2   Mona      Mona      NNP  O       3   compound
3   Lisa      Lisa      NNP  O       8   nsubj
4   is        be        VBZ  O       8   cop
5   a         a         DT   O       8   det
6   16th      16th      JJ   DATE    8   amod
7   century   century   NN   DATE    8   compound
8   oil       oil       NN   O       0   ROOT
9   on        on        IN   O       10  case
10  canvas    canvas    NN   O       8   nmod
11  painted   paint     VBN  O       10  acl
12  by        by        IN   O       13  case
13  Leonardo  Leonardo  NNP  PERSON  11  nmod
14  .         .         .    O       _   _

1   It        it        PRP  O       3   nsubjpass
2   's        be        VBZ  O       3   auxpass
3   held      hold      VBN  O       0   ROOT
...
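The pipeline output above can be read programmatically. The sketch below assumes the seven columns are token index, word form, lemma, PoS tag, named-entity tag, head index and dependency relation; the slides do not name the columns, so the field names are an interpretation.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    # Field names are assumed; the slide shows the columns without headers.
    index: int
    form: str
    lemma: str
    pos: str
    ner: str
    head: Optional[int]
    deprel: Optional[str]


def parse_row(row: str) -> Token:
    """Turn one whitespace-separated row into a Token ("_" means missing)."""
    idx, form, lemma, pos, ner, head, deprel = row.split()
    return Token(
        int(idx), form, lemma, pos, ner,
        None if head == "_" else int(head),
        None if deprel == "_" else deprel,
    )


rows = [
    "1 The the DT O 3 det",
    "2 Mona Mona NNP O 3 compound",
    "3 Lisa Lisa NNP O 8 nsubj",
    "4 is be VBZ O 8 cop",
    "5 a a DT O 8 det",
    "6 16th 16th JJ DATE 8 amod",
    "7 century century NN DATE 8 compound",
    "8 oil oil NN O 0 ROOT",
    "9 on on IN O 10 case",
    "10 canvas canvas NN O 8 nmod",
    "11 painted paint VBN O 10 acl",
    "12 by by IN O 13 case",
    "13 Leonardo Leonardo NNP PERSON 11 nmod",
    "14 . . . O _ _",
]
tokens = [parse_row(r) for r in rows]

# The root of the dependency tree is the token whose head index is 0.
root = next(t for t in tokens if t.head == 0)
# Tokens tagged with anything other than "O" belong to a named entity.
entities = [(t.form, t.ner) for t in tokens if t.ner != "O"]
print(root.form, entities)
```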

Limitations

The book is on the table

Training: expensive, hard

The graph of concepts

The Mona Lisa is a 16th century oil on canvas painted by Leonardo. It's held at the Louvre in Paris.

PERSON: birthDate, birthPlace, deathDate, authorOf, ...

CONCEPT: ...

WORK: ...

PLACE: coords, capitalOf, population, ...

BUILDING: coords, ...

Spots (aka mentions, surface forms):

paris
leonardo
oil on canvas
mona lisa

Concepts:

Oil painting
Paris (mythology)
Mona Lisa (painting)
Mona Lisa (movie)
Paris (city)
Leonardo da Vinci
Leonardo do Nascimento
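To make the spot-to-concept step concrete, here is a toy disambiguation sketch. It is not Dandelion's algorithm: the candidate table and the context keywords are invented for illustration, and the scoring (keyword overlap with the surrounding text) stands in for the graph-based relatedness a real engine would use.

```python
import re

# Hypothetical candidate table: spot -> {concept: context keywords}.
# Both the keywords and the table layout are assumptions for this sketch.
CANDIDATES = {
    "mona lisa": {
        "Mona Lisa (painting)": {"oil", "canvas", "painted", "louvre", "leonardo"},
        "Mona Lisa (movie)": {"film", "director", "cast", "starring"},
    },
    "leonardo": {
        "Leonardo da Vinci": {"painted", "canvas", "oil", "renaissance"},
        "Leonardo do Nascimento": {"football", "club", "goal", "coach"},
    },
    "paris": {
        "Paris (city)": {"louvre", "france", "capital", "museum"},
        "Paris (mythology)": {"troy", "helen", "trojan", "myth"},
    },
    "oil on canvas": {
        "Oil painting": {"painted", "canvas", "oil"},
    },
}


def link(text: str) -> dict:
    """Map each spot found in the text to its best-scoring candidate concept."""
    lowered = text.lower()
    context = set(re.findall(r"[a-z0-9]+", lowered))
    result = {}
    for spot, candidates in CANDIDATES.items():
        if spot in lowered:
            # Score = number of candidate keywords present in the text.
            result[spot] = max(
                candidates, key=lambda c: len(candidates[c] & context)
            )
    return result


text = (
    "The Mona Lisa is a 16th century oil on canvas painted by Leonardo. "
    "It's held at the Louvre in Paris."
)
links = link(text)
for spot, concept in links.items():
    print(f"{spot} -> {concept}")
```

The point of the sketch is that each ambiguous spot is resolved by the other words around it: "Louvre" pulls "paris" toward the city, and "painted" pulls "leonardo" toward Leonardo da Vinci.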

Advantages

• Less training
• Speed
• Customization
• Robustness to syntax
• … but still (may) use classic NLP to improve results

Applications

• Entity Extraction
• Classification
• Similarity & clustering
… basically any IR task

Applications: an example

Cameron wins the Oscar

Cameron wins general elections

All nominees for the Academy Awards
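One way to read this example is as a similarity task. The sketch below compares the three headlines with Jaccard similarity, first on raw words and then on concept sets. The concept annotations are hand-written assumptions standing in for what an entity-extraction step might return; they are not actual engine output.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def words(text: str) -> set:
    """Bag-of-words view: the set of lowercased whitespace-separated tokens."""
    return set(text.lower().split())


h1 = "Cameron wins the Oscar"
h2 = "Cameron wins general elections"
h3 = "All nominees for the Academy Awards"

# Hypothetical concept annotations, written by hand for illustration only.
concepts = {
    h1: {"James Cameron", "Academy Awards"},
    h2: {"David Cameron", "General election"},
    h3: {"Academy Awards"},
}

word_sim_12 = jaccard(words(h1), words(h2))
word_sim_13 = jaccard(words(h1), words(h3))
concept_sim_12 = jaccard(concepts[h1], concepts[h2])
concept_sim_13 = jaccard(concepts[h1], concepts[h3])

print(f"words:    h1~h2={word_sim_12:.2f}  h1~h3={word_sim_13:.2f}")
print(f"concepts: h1~h2={concept_sim_12:.2f}  h1~h3={concept_sim_13:.2f}")
```

On words alone the first two headlines look closest because they share "Cameron wins"; on concepts the first and third are closest because both refer to the Academy Awards.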

See more on https://dandelion.eu

Real World Use Cases

Use case #1: Lawful interception

Identify potential terrorism threats on social networks and message boards

Customized domain-specific taxonomy

Use case #2: Website tagging

Profile a company by looking at its website
• Entity extraction: products, locations
• People & Roles

Sales intelligence for lead generation

http://atoka.io

Use case #3: News stream monitoring

News stream of 70k articles per day
• BI vertical of semantic engine
• Entity extraction: companies, people
• Business signals extraction

Use case #4: Social media analysis

• Entity extraction, sentiment analysis
• Dashboard, tag-cloud

Use case #5: Travel recommendation

Crawl the web and understand people’s behavior
Display travel offers that match user preferences

Use case #6: E-Commerce Optimization

Collect and annotate customer reviews from e-commerce websites

Dashboard for product ratings analysis

Thanks!

scaiella@spaziodati.eu
@ugoscaiella
linkedin.com/in/ugoscaiella
dandelion.eu

Q&A
