Today's applications work across many different data assets - documents stored in Amazon S3, metadata stored in NoSQL data stores, catalogs and orders stored in relational database systems, raw files in filesystems, etc. Building a great search experience across all these disparate datasets and contexts can be daunting. Amazon CloudSearch provides simple, low-cost search, enabling your users to find the information they are looking for. In this session, we will show you how to integrate search with your application, including key areas such as data preparation, domain creation and configuration, data upload, integration of search UI, search performance and relevance tuning. We will cover search applications that are deployed for both desktop and mobile devices.
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Enrich Search User Experience for Different Parts
of Your Application Using Amazon CloudSearch
Jon Handler, CloudSearch Solution Architect
November 15, 2013
Agenda
• Sourcing your documents
• Retrieval and ranking
• Search user interface
• Performance and scale
• Developer example:
Peter Simpkin, Solution Architect, Elsevier
Architecting with Amazon CloudSearch
Hands-Off Operation
[Diagram: a grid of search instances, each holding one copy of one index partition. Index partitions 1 to n are added as document quantity and size grow; copies 1 to n are added as search request volume and complexity grow.]
MovieMate Application
• Multiple sources
• Multiple functions
Mobile Experience
[Screenshot: mobile search for "Iron Man" in the MovieMate app, with tabs Movies, Search, Social, Account, Nearby. Results:]
• Iron Man (2008): When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil.
• Iron Man 2 (2010): Tony Stark has declared himself Iron Man and installed world peace... or so he thinks. He soon realizes that not only is there a mad man...
• Iron Man 3 (2013): When Tony Stark's world is torn apart by a formidable terrorist called the Mandarin, he starts an odyssey of rebuilding and retribution.
• The Man With The Iron Fists (2012): On the hunt for a fabled treasure of gold, a band of warriors, assassins, and a rogue British soldier descend upon a village in feudal China, where a humble blacksmith...
Sourcing your documents
Amazon CloudSearch Documents
• Unique identifier
• Version
• Fields
  – Indexed according to configuration
  – Source of matches
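As a rough illustration of the three parts listed above, the sketch below builds a single "add" operation in the JSON document-batch shape used by the 2011-02-01 API that this deck targets. The helper name `make_add_document`, the id value, and the field values are assumptions made for the example.

```python
import json

def make_add_document(doc_id, version, fields, lang="en"):
    """Build one 'add' operation for a document batch."""
    return {
        "type": "add",
        "id": doc_id,        # unique identifier
        "version": version,  # a higher version replaces a lower one
        "lang": lang,
        "fields": fields,    # indexed according to the domain configuration
    }

doc = make_add_document(
    "tt0371746",  # hypothetical id, chosen for the example
    1,
    {"title": "Iron Man", "description": "When wealthy industrialist Tony Stark ..."},
)

# A batch is a JSON array of operations.
batch_json = json.dumps([doc])
```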
Application Content
• Amazon RDS: movie data, theater data, user reviews, lists, etc.
• DynamoDB: user actions
• Amazon S3: help files, media (clips, images), articles
Bootstrap Strategy
[Diagram: source system -> processing script -> queuing -> batching -> Amazon CloudSearch, built on Amazon EC2 and Amazon SQS]
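The batching step in the pipeline above could look roughly like the following sketch, which groups already-built documents into JSON arrays that stay under a size cap. The 5 MB default is an assumption about the service limit and should be checked against the current documentation; `batch_documents` is a hypothetical helper name.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # assumed per-batch limit; verify before relying on it

def batch_documents(docs, max_bytes=MAX_BATCH_BYTES):
    """Yield JSON array strings that each fit within max_bytes.

    A single document larger than max_bytes is still emitted on its own,
    so callers should validate document sizes separately.
    """
    current, size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for doc in docs:
        encoded = json.dumps(doc)
        doc_size = len(encoded.encode("utf-8")) + 1  # +1 for the separating comma
        if current and size + doc_size > max_bytes:
            yield "[" + ",".join(current) + "]"
            current, size = [], 2
        current.append(encoded)
        size += doc_size
    if current:
        yield "[" + ",".join(current) + "]"
```

Each yielded string can then be queued or posted as one upload, which keeps the number of requests low during a bootstrap.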
Document Construction
• One source will be the master

  for each record
      determine doc id and version
      create fields
      for each auxiliary source
          gather additional data
      send or queue the document
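The loop above can be sketched in Python as follows. The record keys (`id`, `updated_at`, `title`, `description`), the use of a timestamp as the version, and the `ratings_lookup` stand-in are all assumptions made for the example, not details from the talk.

```python
def build_document(record, auxiliary_sources):
    """Assemble one search document from a master record plus auxiliary data."""
    doc = {
        "type": "add",
        "id": str(record["id"]),
        # Assumption: an epoch-seconds timestamp serves as the version.
        "version": int(record["updated_at"]),
        "lang": "en",
        "fields": {
            "title": record["title"],
            "description": record["description"],
        },
    }
    # Each auxiliary source is a callable returning extra fields for the record.
    for lookup in auxiliary_sources:
        doc["fields"].update(lookup(record["id"]))
    return doc

def ratings_lookup(movie_id):
    # Stand-in for a query against an auxiliary store.
    return {"user_rating": 8}

def bootstrap(records, auxiliary_sources, send):
    """Build a document for every master record and send or queue it."""
    for record in records:
        send(build_document(record, auxiliary_sources))
```

Passing `send` as a callable keeps the loop the same whether documents are posted directly or pushed to a queue.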
Relational DB
Table       Columns
Movie       Title, Description, TheaterID
Theater     Name, AddressesID, ShowtimesID
Addresses   Street, City, State
Showtimes   Date, Time, State
Amazon S3
• Clips, images, reviews
• Apache Tika to extract content
• Amazon S3 Metadata for additional fields
Amazon DynamoDB
DynamoDB    CloudSearch
Table       Domain
Item        Document
Attribute   Field
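To make the mapping concrete, here is a minimal sketch that turns one DynamoDB item, in the low-level attribute-value form (`{"S": ...}`, `{"N": ...}`), into a search document. The function name, the lower-casing of attribute names, and the choice to truncate numbers to integers are assumptions for the example.

```python
def item_to_document(item, id_attribute, version):
    """Map a DynamoDB item to a search document: item -> document, attribute -> field."""
    fields = {}
    for name, typed_value in item.items():
        (type_code, value), = typed_value.items()  # e.g. {"S": "Iron Man"}
        if type_code == "S":
            fields[name.lower()] = value
        elif type_code == "N":
            # DynamoDB numbers arrive as strings; truncated to int here.
            fields[name.lower()] = int(float(value))
        elif type_code == "SS":
            fields[name.lower()] = list(value)
        # Other attribute types are skipped in this sketch.
    doc_id = str(fields.pop(id_attribute))  # id_attribute is given in lower case
    return {"type": "add", "id": doc_id, "version": version, "lang": "en", "fields": fields}
```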
Searching Show Times
id  title     description  t_name  t_street  date   time
1   Iron Man  ...          Galaxy  Main      11/11  12:30pm
2   Iron Man  ...          Galaxy  Main      11/11  1:15pm
3   Iron Man  ...          Galaxy  Main      11/11  2:45pm
4   Iron Man  ...          Galaxy  Main      11/11  6:00pm
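The table above is a denormalized view: one document per show time, with the movie and theater columns repeated. A minimal sketch of producing such rows from nested data follows; the input shapes (`movie` and `theaters` dictionaries) are assumptions for the example.

```python
def showtime_documents(movie, theaters):
    """Yield one flat row per (movie, theater, show time) combination."""
    doc_id = 0
    for theater in theaters:
        for date, time in theater["showtimes"]:
            doc_id += 1
            yield {
                "id": doc_id,
                "title": movie["title"],
                "description": movie["description"],
                "t_name": theater["name"],
                "t_street": theater["street"],
                "date": date,
                "time": time,
            }
```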
Heterogeneous Data
Multi Domain
Updating CloudSearch
[Diagram, components shown: Users, Web Server, Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon SQS, Update Processor, Amazon EC2 (two instances), Amazon CloudSearch]
Section Summary
• Multiple sources
• Bootstrap / Update
• Heterogeneous data
Retrieval and ranking
Correct Matches
[Screenshot: the "Iron Man" search results from the Mobile Experience slide: Iron Man (2008), Iron Man 2 (2010), Iron Man 3 (2013), The Man With The Iron Fists (2012)]
The Search Algorithm
• Locate documents that satisfy Boolean constraints
  – Usually intersection
• Relevance rank those documents
  – Differentiated from databases by relevance
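The two steps above can be shown with a toy in-memory index. This is only an illustration of "match by intersection, then rank"; it is not how CloudSearch is implemented, and the scoring (summed term frequency) is a deliberately simple stand-in for real relevance.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Map each word to a Counter of {doc_id: term frequency}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word][doc_id] += 1
    return index

def search(index, query):
    terms = query.lower().split()
    if not terms:
        return []
    # Step 1: Boolean match, here the intersection of the terms' posting sets.
    matches = set.intersection(*(set(index.get(term, ())) for term in terms))

    # Step 2: rank the matches; higher summed term frequency first.
    def score(doc_id):
        return sum(index[term][doc_id] for term in terms)

    return sorted(matches, key=lambda doc_id: (-score(doc_id), doc_id))
```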
Document Structure
Movie
title
description
user_rating
likes
release_date
latitude
longitude
Configuring for Search
• Text fields for individual word search
  – User-generated and external text
  – Titles, descriptions
• Literal fields for exact matches
  – Application-generated text like facets
• Integer fields for range searching and ranking
Searching Text
http(s)://<endpoint>/2011-02-01/search?
• Simple searches
  – q=<text>
• Filtering
  – bq=(and title:'iron man' genre:'Action')
• Filtering with integer ranges
  – bq=(and 'iron man' year:..2010)
• Geo filtering
  – bq=(and 'iron man' latitude:12700..12900 longitude:5700..5800)
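The requests above need URL encoding before they are sent. A minimal sketch using only the Python standard library follows; `SEARCH_ENDPOINT` is a placeholder, not a real domain endpoint, and `search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://search-example.invalid"  # placeholder for <endpoint>

def search_url(**params):
    """Return a 2011-02-01 search URL with properly encoded parameters."""
    return f"{SEARCH_ENDPOINT}/2011-02-01/search?{urlencode(params)}"

simple = search_url(q="iron man")
filtered = search_url(bq="(and title:'iron man' genre:'Action')")
by_year = search_url(bq="(and 'iron man' year:..2010)")
geo = search_url(bq="(and 'iron man' latitude:12700..12900 longitude:5700..5800)")
```

Letting `urlencode` handle quotes, colons, and parentheses avoids hand-escaping the Boolean query syntax.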
Search Results
{
  "rank": "-text_relevance",
  "match-expr": "(label 'iron man')",
  "hits": {
    "found": 204,
    "start": 0,
    "hit": [
      { "id": "sontsst12cf5f88b42" },
      { "id": "sopvopr12ab017f082" },
      { "id": "sorzrpw12ac468a13b" }
    ]
  },
  ...
}
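A response of this shape can be unpacked with a few lines of Python. The sketch below assumes the body has already been fetched as text; `hit_ids` is a hypothetical helper and the sample string reuses the ids shown on the slide.

```python
import json

def hit_ids(response_text):
    """Return (total matches found, ids of the hits on this page)."""
    hits = json.loads(response_text)["hits"]
    return hits["found"], [hit["id"] for hit in hits["hit"]]

sample = """
{"rank": "-text_relevance",
 "match-expr": "(label 'iron man')",
 "hits": {"found": 204, "start": 0,
          "hit": [{"id": "sontsst12cf5f88b42"},
                  {"id": "sopvopr12ab017f082"},
                  {"id": "sorzrpw12ac468a13b"}]}}
"""

found, ids = hit_ids(sample)
```

Note that `found` counts every match, while `hit` holds only the current page, so the two numbers usually differ.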
Relevant Results
[Screenshot: the "Iron Man" search results from the Mobile Experience slide: Iron Man (2008), Iron Man 2 (2010), Iron Man 3 (2013), The Man With The Iron Fists (2012)]
Customizing Ranking
• text_relevance and cs.text_relevance
• Rank expressions
  – Compute a score for each document
  – &rank=<function>
• Defined in the console
• Defined at query-time
  – &q='iron-man'&rank-recency=text_relevance + year&rank=recency
Field Weighting
• Adjust relative importance of fields
• &rank-title=cs.text_relevance({"weights":{"title":4.0},"default_weight":1})
Popularity
• Convert floating point to integer
• Weight by the number of ranks
• rank-pop=text_relevance + (user-rating - 2) * log10(number-user-ranks) * 10 + metascore * 3
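To get a feel for how the popularity expression behaves, the same arithmetic can be evaluated offline. This is a sketch for experimenting with the formula, not the service's evaluator; the guard against fewer than one rating is an addition that is not on the slide.

```python
import math

def popularity_rank(text_relevance, user_rating, number_user_ranks, metascore):
    """Evaluate text_relevance + (user_rating - 2) * log10(ranks) * 10 + metascore * 3."""
    # Guard added for this sketch: log10 is undefined for zero ratings.
    votes_weight = math.log10(max(number_user_ranks, 1))
    return text_relevance + (user_rating - 2) * votes_weight * 10 + metascore * 3
```

With these inputs, a rating of 8 backed by 1,000 votes contributes 180 points, while the same rating backed by 10 votes contributes only 60, which is the point of weighting by the number of ranks.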
Freshness
• Exponential decay function
• &rank-decay=text_relevance + 200*Math.exp(-0.1*days_ago)
r = c e^{-\lambda t}
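Reading the decay term against the expression above, c corresponds to 200, the decay rate to 0.1, and t to days_ago. A small sketch for checking how quickly the boost fades follows; the function name and default arguments simply mirror those constants.

```python
import math

def freshness_boost(days_ago, c=200.0, decay=0.1):
    """Exponential decay: c * e^(-decay * days_ago)."""
    return c * math.exp(-decay * days_ago)

def decay_rank(text_relevance, days_ago):
    """Evaluate text_relevance + 200 * exp(-0.1 * days_ago) offline."""
    return text_relevance + freshness_boost(days_ago)
```

A document published today gets the full boost of 200; after ten days the boost has dropped to 200/e, roughly 74.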
Location Sort
• Latitude and longitude expressed as integers
• Denormalized for particular theaters with locations
• Cartesian distance function
• &rank-geo=sqrt(pow(latitude - lat, 2) + pow(longitude - lon, 2))
• &rank=-geo
\sqrt{(lat - lat_{user})^2 + (lon - lon_{user})^2}
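The sketch below shows one way coordinates might be stored as unsigned integers and compared with the Cartesian distance above. The encoding (shift by 90 or 180 degrees, then multiply by 100) is an assumption chosen for illustration; the talk does not state which encoding was used.

```python
import math

def to_uint(degrees, offset, scale=100):
    """Shift and scale a coordinate into a non-negative integer (assumed encoding)."""
    return int(round((degrees + offset) * scale))

def encode_position(lat, lon):
    """Return (latitude, longitude) as integers suitable for integer fields."""
    return to_uint(lat, 90), to_uint(lon, 180)

def cartesian_distance(lat_a, lon_a, lat_b, lon_b):
    """Straight-line distance in encoded units; ignores the Earth's curvature."""
    return math.sqrt((lat_a - lat_b) ** 2 + (lon_a - lon_b) ** 2)
```

Cartesian distance is only an approximation, but over the span of a city it is usually close enough to order theaters by nearness.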
Rank Expressions: Combined
• &rank-combined=text_relevance + 2.0 * geo + 0.5 * popularity + 0.3 * freshness
• &rank=combined
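One practical detail when sending such an expression at query time: a literal "+" in a query string decodes as a space, so the expression has to be percent-encoded. A minimal sketch with the standard library follows; `ranked_search_query` is a hypothetical helper.

```python
from urllib.parse import urlencode

def ranked_search_query(text, rank_name, expression):
    """Encode a search that defines a rank expression and sorts by it."""
    params = {
        "q": text,
        f"rank-{rank_name}": expression,  # query-time definition
        "rank": rank_name,                # sort by the expression just defined
    }
    return urlencode(params)

expression = "text_relevance + 2.0 * geo + 0.5 * popularity + 0.3 * freshness"
query = ranked_search_query("iron man", "combined", expression)
```

In the encoded string each "+" of the expression becomes "%2B", while spaces become "+".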
Section Summary
• Search API basics
• Customizing ranking – Field weighting, popularity, freshness, GEO, combined
• Rank expression comparison tool
Search user interface
Facets
Simple Faceting: Document
Movie
title
description
genre
Simple Faceting: Configuration
Simple Faceting: Query
q=iron+man&facet=genre
{
  "rank": "-text_relevance",
  "match-expr": "(label 'star wars')",
  "hits": { "found": 7, "start": 0, "hit": [] },
  "facets": {
    "genre": {
      "constraints": [
        { "value": "Family", "count": 62 },
        { "value": "Action/Adventure", "count": 21 },
        { "value": "Drama", "count": 5 },
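Pulling the facet values and counts out of such a response takes only a few lines. The sketch below assumes a complete JSON body; `facet_counts` is a hypothetical helper and the sample reuses the genre counts from the slide.

```python
import json

def facet_counts(response_text, facet_name):
    """Return [(value, count), ...] for one facet, or [] if it is absent."""
    body = json.loads(response_text)
    constraints = body.get("facets", {}).get(facet_name, {}).get("constraints", [])
    return [(c["value"], c["count"]) for c in constraints]

sample = """
{"hits": {"found": 7, "start": 0, "hit": []},
 "facets": {"genre": {"constraints": [
     {"value": "Family", "count": 62},
     {"value": "Action/Adventure", "count": 21},
     {"value": "Drama", "count": 5}]}}}
"""

genres = facet_counts(sample, "genre")
```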
Simple Faceting: UI
<div class='facet'>
  <ul class='facet_list'>
<?php
  // Each constraint is an object with ->value (the facet label) and ->count.
  $genres = $resultsObj->facets->genre->constraints;
  for ($i = 0; $i < count($genres); $i++) {
    $curGenre = $genres[$i]->value;
    $curCount = $genres[$i]->count;
?>
    <li class='facet_item'>
      <div class='facet_name'><?= htmlspecialchars($curGenre) ?></div>
      <div class='facet_count'><?= $curCount ?></div>
    </li>
<?php } ?>
  </ul>
</div>
Facets
Document
• title: Lincoln
• description: ...
• oscar1: Awards
• oscar2: Awards/Best Actor
• oscar3: Awards/Best Actor/Daniel Day Lewis
Movie
title
description
oscar1
oscar2
oscar3
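The oscar1, oscar2, and oscar3 fields each hold a progressively longer prefix of the same path. A minimal sketch for generating them from a single path string follows; `hierarchy_fields` is a hypothetical helper.

```python
def hierarchy_fields(prefix, path, separator="/"):
    """Return one field per level, each holding the path up to that level."""
    parts = path.split(separator)
    return {
        f"{prefix}{depth}": separator.join(parts[:depth])
        for depth in range(1, len(parts) + 1)
    }

fields = hierarchy_fields("oscar", "Awards/Best Actor/Daniel Day Lewis")
```

Storing the full prefix at every level keeps values unambiguous: "Awards/Best Actor" cannot be confused with "Nominations/Best Actor".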
Query
&q=lincoln&facet=oscar1,oscar2,oscar3
{
  "rank": "-text_relevance",
  "hits": {...},
  "facets": {
    "oscar1": {
      "constraints": [
        {"value": "Awards", "count": 23},
        {"value": "Nominations", "count": 124}]},
    "oscar2": {
      "constraints": [
        {"value": "Awards/Best Actor", "count": 6},
        {"value": "Awards/Best Actress", "count": 3}...]},
    "oscar3": {
      "constraints": [
        {"value": "Awards/Best Actor/Daniel Day Lewis", "count": 1},
        {"value": "Awards/Best Actor/Denzel Washington", "count": 2}...]},
Drilldown
• bq=oscar1:'Awards'
• bq=oscar2:'Awards/Best Actor'
• bq=oscar3:'Awards/Best Actor/Daniel Day Lewis'
• bq=(and 'star' oscar2:'Awards/Best Actor')
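When a user clicks a facet value, the UI has to turn that value into one of the bq constraints above. The sketch below builds such strings; escaping quotes with a backslash is an assumption about the query syntax and should be checked against the developer guide, and both function names are hypothetical.

```python
def quote(value):
    """Wrap a value in single quotes, backslash-escaping quotes (assumed syntax)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

def drilldown_bq(field, value, text=None):
    """Build a drilldown constraint, optionally combined with a text search."""
    constraint = f"{field}:{quote(value)}"
    if text is None:
        return constraint
    return f"(and {quote(text)} {constraint})"

level_two = drilldown_bq("oscar2", "Awards/Best Actor")
with_text = drilldown_bq("oscar2", "Awards/Best Actor", text="star")
```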
Section Summary
• Simple faceting
• Hierarchical faceting
• Hierarchical data handling
Performance and scale
Performance Best Practices
• Match set size
• Text queries perform better than integer queries
• Complex relevance functions
Optimizing Index Size
• Trade off literal and uint for cost/performance
• Result fields matter most
• Enabling faceting increases size
Wrap Up
• Sourcing documents from various locations
• Building queries and ranking
• UI Components for faceting
• Getting the most out of your index
Developer example
Peter Simpkin, Solution Architect, Elsevier
Agenda
• Elsevier Intro
• Search Problem Statement
• Enterprise Content Search
• Hints and Tips
• Amazon CloudSearch Observations
• 7,000+ employees in 26 countries
• 2,200 journals / article market share 25%
• $3B revenue
• Scientific, Technical & Medical
Customers
• Academic research institutions
• Government & health
• Corporate research labs
• Individual researchers
Products
Content Challenges:
• No central place for consumers to discover content
• It is not currently possible to search and retrieve atomic assets
• Assets are not reusable across products
[Diagram labels: Content Systems, Consumer Platforms]
Enterprise Content Search Engine
Search Opportunities:
• Create a comprehensive inventory to easily discover content Elsevier owns
• Provide access to the granular / modular content consumers want, at will
• Assets must be uniquely addressable
Empower our product development partners
Enterprise Content Search eco-system
[Diagram, components shown: Federated Content Warehouse, Product Platform Data center, E.U. Corporate Data center, U.S. Corporate Data center, Amazon S3, Amazon DynamoDB, Amazon SWF, Amazon CloudSearch (SDF metadata), Simple Search UI]
Elsevier Technical Drivers & Approach
• Fully managed, full-featured search service in the cloud
• Automatically scales for data & traffic
• Easy to set up and use
• PoC created in days
• Search engine as a service
• Pay-as-you-go pricing model
Hints & Tips
(and issn:'0022-1694'
(and type:'1.2'
(and (not action:'D')
(or (and pubstartdate:..2013176 pubenddate:2005002..)
(or (and pubstartdate:2005001
(and pubstarttime:0.. pubstarttime:..235959))
(or (and pubstartdate:2013177 pubstarttime:..235959)
(or (and pubenddate:2005001 pubendtime:0..)
(and pubenddate:2013177
(and pubendtime:..235959 pubendtime:0..)))))))))
• Query Response Time = 5 seconds
Optimising Nested Queries
(and issn:'0022-1694' type:'1.2'
(not action:'D')
(or (and pubstartdate:..2013176 pubenddate:2005002..)
(and pubstartdate:2005001 pubstarttime:0..235959)
(and pubstartdate:2013177 pubstarttime:0..235959)
(and pubenddate:2005001 pubendtime:0..)
(and pubenddate:2013177 pubendtime:0..235959)))
• Response Time = 2.5 seconds
Optimised Nested Query
(and (not action:'D')
  (or (and issn:'0022-1694' type:'1.2'
        pubstartdate:..2013176 pubenddate:2005002..)
      (and issn:'0022-1694' type:'1.2'
        pubstartdate:2005001 pubstarttime:0..235959)
      (and issn:'0022-1694' type:'1.2'
        pubstartdate:2013177 pubstarttime:0..235959)
      (and issn:'0022-1694' type:'1.2'
        pubenddate:2005001 pubendtime:0..)
      (and issn:'0022-1694' type:'1.2'
        pubenddate:2013177 pubendtime:0..235959)))
• Response Time = 0.17ms
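The final rewrite repeats the selective terms (issn and type) inside every branch of the "or". Generating that form by hand is error-prone, so a small builder can help; the sketch below is one possible approach, with `flat_query` and its arguments being hypothetical names rather than anything from the talk.

```python
def flat_query(common_terms, branches, excluded):
    """Distribute the common terms into each OR branch of a Boolean query."""
    shared = " ".join(common_terms)
    expanded = [f"(and {shared} {' '.join(branch)})" for branch in branches]
    return f"(and (not {excluded}) (or {' '.join(expanded)}))"

bq = flat_query(
    common_terms=["issn:'0022-1694'", "type:'1.2'"],
    branches=[
        ["pubstartdate:..2013176", "pubenddate:2005002.."],
        ["pubstartdate:2005001", "pubstarttime:0..235959"],
        ["pubstartdate:2013177", "pubstarttime:0..235959"],
        ["pubenddate:2005001", "pubendtime:0.."],
        ["pubenddate:2013177", "pubendtime:0..235959"],
    ],
    excluded="action:'D'",
)
```

Whether this shape is faster for a given domain is worth measuring; the timings on these slides come from Elsevier's data set.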
Amazon CloudSearch Observations
• Facilitates knowledge sharing on content matters across Elsevier’s product platforms
• Ability to leverage content infrastructure and capabilities across Elsevier’s divisions
• Easy to integrate with existing on-premise content systems
• Speed to market: allows developers to focus on building other core content strategy components
• Need to spend time optimising queries to maximise performance
Resources
• Amazon CloudSearch Overview Page http://aws.amazon.com/cloudsearch/
– Developer Guide
– FAQs, Articles
– Community Forum
– Tutorial
• Free 30-day trial
• Contact: [email protected]
SVC302