Your Grandmother Doesn’t Like Surprises A case study of ANM’s Travel Site Jeffrey Catlin – Lexalytics, Inc. Bob Pierce – Fast Search & Transfer May 3, 2005

Your Grandmother Doesn’t Like Surprises

A case study of ANM’s Travel Site

Jeffrey Catlin – Lexalytics, Inc.
Bob Pierce – Fast Search & Transfer

May 3, 2005

Overview

Project Overview
  Project Goals
  Technology Elements
Site Features
  Improved Search
  Automated Processing of Hotel Reviews
Knowledge Management in Action
  Sentiment / Tone capability is unique and fully automated
  Improvement over 1 to 5 star ratings
Customer Reaction and Futures
  Go Live for this site
  Other sites utilizing this technology

Contacts: Jeff Catlin (Lexalytics, Inc.), Bob Pierce (Fast Search & Transfer)

Project Overview

ANM (Associated News Media) is a UK publisher that is leveraging its content to reach into Internet applications such as travel

Project Goals:
  Improve stickiness of the site, which is key to generating more ad dollars
  Improve and simplify the search features of the site, including sorting by a variety of field types and making search available throughout the site
  Expose and automate user reviews. Providing accurate and ready access to user reviews improves stickiness and acceptance of the site
  Reduce the cost of utilizing user reviews
  Dramatically increase the breadth of coverage of user reviews

Project Overview

Technology Elements:
ANM
  Custom Application interface
  Utilizing FAST ESP for search features
FAST Marketrac:
  FAST ESP provides Application Search Features
  FAST Content Processing Pipeline and web spider for reviews
Lexalytics:
  Salience Server for scoring hotel and travel reviews
  Sentiment Toolkit: build out a travel-focused sentiment/tone database

Site Features

Site Features – 4 Star: NYC (the best)

Site Features – 4 Star: NYC (the worst)

Knowledge Management in Action

Trustworthy user reviews are a key to the stickiness of the site
Reviews are obtained through feeds and spidering:
  Feeds: IgoUgo & Fodors
  Spidering: tripadvisor.com & virtualtourist.com
Reviews are monitored and updated continuously and processed through the FAST Content Processing Pipeline
Automated reviews are more consistent, trusted and up to date than star ratings
  Unique feature
  Totally automated and more consistent than human ratings

Knowledge Management in Action

How does it all work?
  Lexalytics provides out-of-the-box sentiment tone analysis
  Toolkit to build scoring databases for verticals like travel, finance, security
  System builds up a dictionary of scored phrases that indicate good or bad depending on the vertical it’s used for
  Phrase scores are determined using a training set and MSN Search
  Scores measure the nearness of phrases to good and/or bad terms
  Results in a phrase dictionary with phrases like:
    Sunny Day: 1.2706
    Unsafe food: -0.7634
The Lexalytics Salience Server is embedded within FAST’s Marketrac product, so integration of sentiment/tone is very straightforward
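The slide says phrase scores measure how near a phrase sits to good and bad terms, but it does not give the formula. One well-known way to turn that idea into a number is a log-ratio of co-occurrence counts (a PMI-style "semantic orientation"). The sketch below assumes that approach; it is not confirmed to be what the Lexalytics toolkit does, and the function name, parameters, and counts are all hypothetical.

```python
import math


def phrase_score(near_good, near_bad, good_total, bad_total, smoothing=0.01):
    """PMI-style orientation score for one phrase (illustrative assumption).

    near_good / near_bad: how often the phrase appears near known-good or
        known-bad seed terms (e.g. counts from a training set or search hits).
    good_total / bad_total: how often the seed terms appear overall.
    smoothing: small constant that avoids division by zero and log(0).
    Positive result: phrase leans good. Negative result: phrase leans bad.
    """
    numerator = (near_good + smoothing) * (bad_total + smoothing)
    denominator = (near_bad + smoothing) * (good_total + smoothing)
    return math.log2(numerator / denominator)


# Made-up counts, purely to show the sign of the result.
print(phrase_score(near_good=80, near_bad=10, good_total=1000, bad_total=1000))
print(phrase_score(near_good=5, near_bad=60, good_total=1000, bad_total=1000))
```

With equal seed-term totals, swapping the good and bad counts flips the sign of the score, which is the behavior one would want from a good/bad dictionary.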

Knowledge Management in Action

Let’s drill in to see how reviews are scored

Knowledge Management in Action

Let’s score this review

Knowledge Management in Action

Looking at the scoring of an individual review:
  Review for Marriott Marquis
  “Great stay, no elevator problems”

Reviews are scored, averaged and displayed on a 1 to 10 scale
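As a rough illustration of the scoring flow described here, the sketch below looks up dictionary phrases in a review, averages the matches, averages across reviews, and maps the result onto 1 to 10. Only the "sunny day" and "unsafe food" scores come from the slides; the other dictionary entries, the function names, and the linear mapping from an assumed raw range of -2 to 2 are hypothetical choices made for the example.

```python
PHRASE_SCORES = {
    "sunny day": 1.2706,          # from the slide
    "unsafe food": -0.7634,       # from the slide
    "great stay": 1.0,            # hypothetical score
    "no elevator problems": 0.5,  # hypothetical score
}


def score_review(text, phrase_scores=PHRASE_SCORES):
    """Average the scores of all dictionary phrases found in the review."""
    lowered = text.lower()
    hits = [score for phrase, score in phrase_scores.items() if phrase in lowered]
    return sum(hits) / len(hits) if hits else None


def to_ten_point(raw, lo=-2.0, hi=2.0):
    """Clamp a raw score to [lo, hi] (assumed range) and map it linearly to 1..10."""
    clamped = max(lo, min(hi, raw))
    return 1.0 + 9.0 * (clamped - lo) / (hi - lo)


def hotel_rating(reviews):
    """Average the per-review scores, then convert to the 1..10 display scale."""
    raw = [s for s in (score_review(r) for r in reviews) if s is not None]
    if not raw:
        return None
    return to_ten_point(sum(raw) / len(raw))


print(hotel_rating(["Great stay, no elevator problems"]))
```

Reviews that match no dictionary phrase are skipped rather than counted as neutral; that is a design choice of this sketch, not something the slides specify.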

Customer Feedback

Customer is pleased with the site
Goes live today (5/3/05)
Tuning of the hotel scoring has allowed the customer to put their own touch on the system, giving them a unique offering
Combination of information discovery features and integrated booking should allow ANM to compete with any of the well-known travel sites.

Information Intelligence Examples

Financial news and market analysis
  Market intelligence portal and alerts for brokers
Pharmaceutical competitive analysis
  Tracking molecules, drugs and companies “in the rear-view mirror”
Intellectual property protection
  Content similarity analysis and alerting
Illegal e-commerce
  Contraband trafficking and the “whack-a-mole” problem
Cracking pornography rings
  Automated image analysis
  Chat room monitoring and alerting
Threat detection and analysis

Market Intelligence in Financial Services

Leading European financial services group
  Capital markets, insurance, real estate, asset management, securities
  Goal: Trade more competitively, create better analyst reports
Leveraged FAST ESP and FAST Marketrac
Collect actionable information ahead of general market availability
  Premium sources, blogs, local web sites, research reports, etc.
Real-time, personalized analysis
  Search domains selected by individual analysts
  Correlate price movements with related news
  Analyze news flow for market-moving potential
Communicate and act
  Minimal latency
  Profile-based SMS/e-mail alerting
  Automated “morning reports”

Because Timing is Money: First-mover Advantage in Markets

Speech by CEO at Copenhagen Business School quoted by Danish news site.

When Reuters published hours later, stock moved 2%. But these traders were already done with the trades....

Accelerate The Decision Cycle

[Figure: decision-cycle timelines, Before and After. Before stages: Search/Gather, Analyze, Decide, Act. After stages: Gather, Discover, Analyze, Decide, Act. Markers: Identify, Decision point, Impact point. Axis: Time.]

BETTER Decisions, FASTER!

Futures

Text analysis software has matured to the point where powerful applications can be deployed at a reasonable expense and with a high degree of confidence
Search and Text Analysis will play an increasingly important part in Business Intelligence, High Volume Storage and Consumer Electronics
  Entity extraction is relatively mature and fairly high-quality
  Classification (subject and tone) is being deployed in real-world apps
  Relationships between content elements are on the short-term horizon

Intellectual Property Protection
When Information, Time are the Assets

[Diagram: similarity-detection pipeline. Components: WWW, Crawler, Seed URL DB, Target site profile, Real-time index, API - Similarity, IP Database, Document, Similar doc. Flows: Similarity queries, Similarity results, Check similarity, Detailed similarity check.]

Example text: “...The Wimbledon and U.S. Open champion, seeded second, breezed past...”

Similarity vector: <[wimbledon, 1][USopen, 0.7][champion, 0.6]…>
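The slide shows a document reduced to a vector of weighted terms, but not how two such vectors are compared. Cosine similarity is a common choice for that comparison, so the sketch below assumes it; the function name, the dictionary representation, and the candidate vector are hypothetical and not taken from the FAST product.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Weights for the protected document as shown on the slide.
protected = {"wimbledon": 1.0, "usopen": 0.7, "champion": 0.6}
# Hypothetical vector for a crawled page.
candidate = {"wimbledon": 0.9, "usopen": 0.8, "seeded": 0.4}

print(cosine_similarity(protected, candidate))
```

A cheap check like this could serve as the first-pass "check similarity" step, leaving the more expensive sequential comparison for candidates that score above a chosen threshold.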

Real-Time Content Analysis

Sequential analysis compares longest common subsequence and maximum overlap.

Detected matches

1) Article extraction from websites
2) Computation of similarity primitives

Validate content and determine changes

Notification, Enforcement
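The sequential analysis step names longest common subsequence (LCS) as one of its measures. A minimal word-level sketch follows, using the standard dynamic-programming algorithm; the whitespace tokenization, the function names, and the idea of normalizing by the original's length are assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(original, candidate):
    """Fraction of the original's words that survive, in order, in the candidate."""
    a = original.lower().split()
    b = candidate.lower().split()
    if not a:
        return 0.0
    return lcs_length(a, b) / len(a)


original = "The Wimbledon and U.S. Open champion, seeded second, breezed past"
edited = "The Wimbledon and U.S. Open champion breezed past"
print(lcs_ratio(original, edited))
```

Because LCS tolerates insertions and deletions, a lightly edited copy still scores high, which is what makes it useful for spotting reused articles.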