André de Oliveira | @arbocombr
Intelligent Information Discovery
Machine Driven Search
André de Oliveira
Search Engineering Lead, Liferay Inc.
The explosion of content
What is your information discovery strategy?
▪ 1st Generation: Browse
What is your discovery strategy?
▪ Customer "Journey" - with a handrail...
▪ Breadcrumbs...
▪ Product catalog with category tree...
▪ Early "search" with SQL queries...
- Click oriented information discovery -
(Still the strategy of many legacy applications)
1st Generation: Browsing for content
Discovery gateway: "The Navigation Menu"
▪ 1st Generation: Browse
▪ 2nd Generation: Search
What is your discovery strategy?
▪ Search engines; full text search; facets
▪ Customer Journey: freedom from "navigation"
▪ User → Application: trial and error until found
▪ Results "in the now" - missed new content
- Keyword oriented information discovery -
(Strategy of choice for modern applications)
2nd Generation: Searching for content
Discovery gateway: "The Search Bar"
▪ 1st Generation: Browse
▪ 2nd Generation: Search
▪ Next Generation: Predict
What is your discovery strategy?
▪ Every user action is an indication of interest
▪ Searches + Browsing paths + Purchase history...
▪ New content, matching interest? Show it now
▪ Application → User: find for you - "search-less"
▪ Customer Journey continually improves itself
- Interest oriented information discovery -
(Dominant discovery strategy of the future)
Next Generation: Predicting relevant content
Discovery gateway: "Everywhere"
Intelligent information discovery
▪ Beyond the database
▪ More than filters: scores
▪ Information retrieval
▪ Full text queries
▪ Ranking and relevance algorithms
It all starts with Search...
How to use a search engine to predict relevant content?
Input = Keywords
Output = Scored predictions
Predicting with a search engine
Using...              Calculate score of...
Autocomplete          User input against document titles
Did you mean…?        User input against spellcheck dictionary
Suggest as you type   User input against popular queries
More like this        Whole result documents against other documents
Percolators           Whole new documents against predefined queries
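A minimal sketch of the first two rows of the table, in plain Python rather than a real search engine. The title list, the dictionary and the function names are hypothetical and exist only for illustration; a search engine would return ranked scores instead of simple lists.

```python
from difflib import get_close_matches

# Hypothetical data, for illustration only.
TITLES = ["Search Bar", "Search Results", "Site Map", "Commerce Catalog"]
DICTIONARY = ["search", "results", "commerce", "catalog"]


def autocomplete(user_input, titles=TITLES):
    """Autocomplete: user input against document titles (prefix match)."""
    prefix = user_input.lower()
    return [title for title in titles if title.lower().startswith(prefix)]


def did_you_mean(user_input, dictionary=DICTIONARY):
    """Did you mean...?: user input against a spellcheck dictionary.

    get_close_matches ranks candidates by a similarity ratio and drops
    anything below the cutoff.
    """
    return get_close_matches(user_input.lower(), dictionary, n=3, cutoff=0.6)


print(autocomplete("sea"))
print(did_you_mean("serch"))
```

In both cases the input is a handful of keywords and the output is a ranked list of candidates, which is the "scored prediction" the slide refers to.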
▪ Data Science and Machine Learning
▪ Neural networks can be trained to make predictions
▪ A scored guess that best matches prior known results
▪ Universal, reusable mathematical algorithms
▪ Regression, Classification, Clustering...
▪ A trained neural network is like an API
▪ As long as you can feed it numerical input
Input = Numbers
Output = Scored predictions
Rise of the machines
Modeling your input universe as meaningful numbers
The essential challenge in Machine Learning
[Diagram: Picture? / Voice? / Search query? / User indication of interest? ⇒ Scored prediction]
Pictures ⇒ Pixels ⇒ Numbers
Image classification
Sound waves ⇒ Frequencies and amplitudes ⇒ Numbers
Speech recognition
Facial landmarks ⇒ Measurements ⇒ Numbers
Face detection
Universe ⇒ Numerical model ⇒ Algorithms ⇒ Scored predictions
Predicting with Deep Learning: Confidence
[Diagram: Picture? / Voice? / Search query? / User indication of interest? ⇒ Scored prediction]
#LRDEVCON 2017 Highlight
"Machine Learning and DXP - The best of 2 worlds"
Carlos Hernandez & Filipe Afonso, Senior Consultants, Liferay
▪ A similarity is a scoring / ranking model
▪ Leverage information retrieval algorithms
▪ BM25
▪ Divergence from randomness
▪ Divergence from independence
▪ Information based...
▪ Can be mapped per field - fine tuning
▪ Some models suit shorter fields better (BM25)
▪ Elasticsearch Similarity Module
Predicting with Search: Similarity
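To make the point about field length concrete, here is a minimal sketch of the textbook Okapi BM25 score for a single term. The function name and the sample numbers are assumptions for illustration, and a real engine may differ in details such as the exact IDF variant; `k1` and `b` are the usual tuning parameters with their common defaults.

```python
import math


def bm25_term_score(freq, doc_freq, num_docs, field_len, avg_field_len,
                    k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one field.

    freq          - occurrences of the term in this field
    doc_freq      - number of documents containing the term
    num_docs      - total number of documents
    field_len     - length of this field, in terms
    avg_field_len - average length of the field across the index
    """
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: longer-than-average fields are penalized.
    length_norm = k1 * (1 - b + b * field_len / avg_field_len)
    return idf * freq * (k1 + 1) / (freq + length_norm)


# Same term, same frequency, different field lengths (hypothetical values).
short_field = bm25_term_score(1, 4, 108, field_len=3, avg_field_len=20)
long_field = bm25_term_score(1, 4, 108, field_len=200, avg_field_len=20)
print(short_field > long_field)
```

A match in a short field such as a title scores higher than the same match in a long body field, and repeated occurrences add less and less to the score.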
"value" : 2.7051764,
"description" : "score(doc=0,freq=1.0), product of:",
"details" : [
  {
    "value" : 0.66422296,
    "description" : "queryWeight, product of:",
    "details" : [
      { "value" : 4.0726933, "description" : "idf(docFreq=4, maxDocs=108)", "details" : [ ] },
      { "value" : 0.16309182, "description" : "queryNorm", "details" : [ ] }
    ]
  },
  {
    "value" : 4.0726933,
    "description" : "fieldWeight in 0, product of:",
    "details" : [
      {
        "value" : 1.0,
        "description" : "tf(freq=1.0), with freq of:",
        "details" : [
          { "value" : 1.0, "description" : "termFreq=1.0", "details" : [ ] }
        ]
      },
      { "value" : 4.0726933, "description" : "idf(docFreq=4, maxDocs=108)", "details" : [ ] },
      {
        "value" : 1.0,
        "description" : "fieldNorm(doc=0)",
        "details" : [ ]
Elasticsearch Explain API
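The numbers in the Explain API output are consistent with Lucene's classic TF–IDF similarity, where idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq). The sketch below recomputes the score from the values in the output, under that assumption. The function names are made up for this example, and queryNorm and fieldNorm are copied from the output rather than derived.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency in the classic TF-IDF similarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    """Term frequency factor in the classic TF-IDF similarity."""
    return math.sqrt(freq)


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """score = queryWeight * fieldWeight, as in the explain tree."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                       # idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm    # tf * idf * fieldNorm
    return query_weight * field_weight


score = classic_score(freq=1.0, doc_freq=4, max_docs=108,
                      query_norm=0.16309182, field_norm=1.0)
print(round(score, 4))  # 2.7052
```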
▪ Numerical representation of a search
▪ TF–IDF = Term Frequency–Inverse Document Frequency
▪ Tokenize the text
▪ Create appropriate term vectors
▪ Prepare a TF–IDF matrix
Textual analysis
Document 1
Term      Term Count
this      1
is        1
a         2
sample    1

Document 2
Term      Term Count
this      1
is        1
another   2
example   3
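The term counts for Document 1 and Document 2 can be turned into a small TF–IDF matrix. This is a sketch using one common textbook definition (tf = count / document length, idf = log10(N / documents containing the term)); search engines use their own variants, so the absolute numbers are illustrative only.

```python
import math

# Term counts taken from the two sample documents.
doc1 = {"this": 1, "is": 1, "a": 2, "sample": 1}
doc2 = {"this": 1, "is": 1, "another": 2, "example": 3}
docs = [doc1, doc2]


def tf(term, doc):
    """Term frequency: share of the document taken up by the term."""
    return doc.get(term, 0) / sum(doc.values())


def idf(term, docs):
    """Inverse document frequency: rarer terms get a larger weight."""
    containing = sum(1 for d in docs if term in d)
    return math.log10(len(docs) / containing)


def tf_idf_matrix(docs):
    """Map each term to its TF-IDF value in every document."""
    vocabulary = sorted({term for d in docs for term in d})
    return {t: [tf(t, d) * idf(t, docs) for d in docs] for t in vocabulary}


matrix = tf_idf_matrix(docs)
print(matrix["this"])                  # [0.0, 0.0]
print(round(matrix["example"][1], 3))  # 0.129
```

"this" appears in both documents, so it carries no weight at all, while "example" is frequent in Document 2 and absent from Document 1, so it becomes the most distinctive term.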
TF–IDF (Term Frequency–Inverse Document Frequency)
Factor: In question form... (+ Score increases...)
Term Frequency: How often does a term appear in a field? (+ When the term appears many times in the text)
Inverse Document Frequency: How rare is the term in the whole index? (+ When the term is found in this document and not many others)
Field-length Norm: How short is the field where the term is? (+ When there isn't much else in the same field, like a title)
▪ Sanitize out "stopwords"
▪ Irrelevant words and phrases
▪ Spell check aggressively
▪ "Search for … instead"
▪ Predict non text fields as well
▪ Serial numbers (and how to analyze them)
▪ More than one way to say the same thing
▪ Synonyms; alternate spellings; separate v. combined words
▪ Contextualize (every application is different)
▪ The "adwords" effect - search for this, always show that
Improving textual analysis for better predictions
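A few of these ideas can be sketched as a tiny analysis function. The stopword list, the synonym map and the tokenization rule (keeping hyphenated serial numbers as a single token) are all assumptions made for this example; a search engine would configure the same steps through its analyzers and token filters.

```python
import re

# Hypothetical configuration, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "for", "and"}
SYNONYMS = {"notebook": "laptop", "e-mail": "email"}

# Letters/digits, optionally joined by hyphens, so "SN-12345-AB" stays whole.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def analyze(text):
    """Lowercase, tokenize, normalize synonyms, then drop stopwords."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    tokens = [SYNONYMS.get(token, token) for token in tokens]
    return [token for token in tokens if token not in STOPWORDS]


print(analyze("The Notebook for E-mail"))   # ['laptop', 'email']
print(analyze("Serial SN-12345-AB"))        # ['serial', 'sn-12345-ab']
```

Applying the same function to both indexed content and user input is what lets "notebook" in a query match "laptop" in a document.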
Machine driven search
▪ Predict relatable searches
▪ Given a user initiated search,
▪ in a universe of searches from all customers,
▪ classify by similar interest group,
▪ suggest and push predicted relatable results
▪ A user search is an indication of interest in itself
▪ Algorithms / recommender systems
▪ Smart content delivery
Machine Driven Search
▪ Autocomplete: User input → Document titles
▪ Suggest as you type: User input → Popular queries
▪ Base case is quickly exhausted; be creative
▪ Primary field (e.g. title)
▪ Combine multiple queries for score - match, prefix, phrase
▪ Multiple fields
▪ More matches, and may be what they're really looking for
▪ Multiple target entities
▪ Why limit to one kind of content?
▪ Organize and render drop list accordingly
Predict with Search: Instant results in search bar
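One possible way to combine match, prefix and phrase queries over several fields is an Elasticsearch bool query with multiple "should" clauses. The sketch below only builds the request body as a Python dictionary; the helper name, the field names, the boost value and the result size are assumptions, not settings from the talk.

```python
def instant_results_query(user_input, fields=("title", "description")):
    """Build a bool query that scores match, phrase and prefix per field."""
    should = []
    for field in fields:
        # Plain match: any of the typed words.
        should.append({"match": {field: {"query": user_input}}})
        # Phrase match: words in the same order score higher (assumed boost).
        should.append(
            {"match_phrase": {field: {"query": user_input, "boost": 2.0}}}
        )
        # Phrase prefix: the last word may still be incomplete.
        should.append({"match_phrase_prefix": {field: {"query": user_input}}})
    return {
        "size": 5,  # a short drop list
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
    }


body = instant_results_query("search ba")
print(len(body["query"]["bool"]["should"]))  # 6
```

Each matching clause adds to the document's score, so a title that matches the phrase and the prefix ranks above one that matches a single word.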
▪ Find content similar to a previous existing document
▪ and/or additional user input
▪ and/or arbitrary content
▪ MLT on specific target fields
▪ “title”, “description”, “content”
▪ TF–IDF is key
▪ Input is analyzed same as target fields
▪ TF–IDF is calculated for all terms
▪ Top terms with highest TF–IDF are selected
▪ Combined "OR" query with top terms only
Predict with Search: More Like This
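The More Like This steps (analyze the input, score every term with TF–IDF, keep the top terms, OR them together) can be sketched in a few lines. This is a simplified illustration, not the engine's implementation: the whitespace tokenizer, the idf variant, the helper names and the sample document frequencies are all assumptions.

```python
import math
from collections import Counter


def top_terms(text, doc_freqs, num_docs, max_query_terms=3):
    """Select the terms of `text` with the highest TF-IDF."""
    counts = Counter(text.lower().split())
    scored = {
        term: freq * (1.0 + math.log(num_docs / (doc_freqs.get(term, 0) + 1)))
        for term, freq in counts.items()
    }
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scored, key=lambda term: (-scored[term], term))
    return ranked[:max_query_terms]


def more_like_this_query(text, field, doc_freqs, num_docs):
    """Combine the top terms into one OR (bool/should) query."""
    terms = top_terms(text, doc_freqs, num_docs)
    return {
        "bool": {
            "should": [{"term": {field: term}} for term in terms],
            "minimum_should_match": 1,
        }
    }


# Hypothetical index statistics: 100 documents in total.
doc_freqs = {"the": 100, "search": 10, "percolator": 1, "engine": 20}
print(top_terms("the percolator search engine the", doc_freqs, 100))
# ['percolator', 'search', 'engine']
```

"the" occurs twice in the input but appears in every document, so it drops out, while the rare "percolator" leads the query.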
▪ “Classify” content from users, given any number of rules
▪ Rules are registered as “percolator queries”
▪ Submit incoming documents to rule set
▪ “Would this doc match this query?”
▪ Response indicates rules that matched
▪ Content can then be “classified” accordingly
Predict with Search: Percolate
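The idea of percolation, running a new document against stored queries instead of a query against stored documents, can be shown with a small in-memory sketch. Here each "rule" is simply a set of terms that must all be present; the rule names and terms are hypothetical, and a real percolator accepts full search queries rather than term sets.

```python
# Hypothetical rule set: rule name -> terms that must all appear.
PERCOLATOR_QUERIES = {
    "commerce": {"product", "price"},
    "search": {"elasticsearch"},
}


def percolate(document_text, queries=PERCOLATOR_QUERIES):
    """Return the names of all registered rules the document matches."""
    tokens = set(document_text.lower().split())
    return sorted(
        name for name, required in queries.items() if required <= tokens
    )


print(percolate("New product with a lower price"))    # ['commerce']
print(percolate("Elasticsearch product price list"))  # ['commerce', 'search']
```

The returned rule names can then be used as categories or tags for the incoming content.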
#LRDEVCON 2017 Highlight
"Going in reverse to move forward: How reverse querying gives you fully automated publishing"
Jan Verweij, Sales Engineer, Liferay
▪ Store “successful” queries for users
▪ Definition of “successful” according to your Customer Journey
▪ Indication of interest
▪ Cluster directly visited content with successful result hits
▪ To further refine content relevancy
▪ Cluster successful queries from users with a similar journey
▪ “Users belonging to customer profile X also search for Y”
▪ Data Science and Single Customer View
Predict with Search: Recommended for YOU
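As a rough sketch of "users belonging to customer profile X also search for Y", stored successful queries can be grouped by profile and ranked by how often they occur. The profile labels, the query log and the function name are invented for this example; how a query is judged "successful" and how users are assigned to profiles is left to the application.

```python
from collections import Counter, defaultdict


def also_searched_for(successful_queries, profile, exclude=(), top_n=3):
    """Rank the successful queries of one customer profile.

    successful_queries - iterable of (profile, query) pairs
    exclude            - queries the current user has already run
    """
    by_profile = defaultdict(Counter)
    for query_profile, query in successful_queries:
        by_profile[query_profile][query] += 1
    ranked = by_profile[profile].most_common()
    return [query for query, _ in ranked if query not in exclude][:top_n]


# Hypothetical log of successful queries.
log = [
    ("developer", "rest api"),
    ("developer", "rest api"),
    ("developer", "osgi module"),
    ("marketer", "campaign report"),
    ("developer", "rest api"),
    ("developer", "osgi module"),
    ("developer", "search portlet"),
]
print(also_searched_for(log, "developer", exclude={"rest api"}))
# ['osgi module', 'search portlet']
```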
#LRDEVCON 2017 Highlight
"Single Customer View Demystified"
Jonathan Lee, Product Manager, Liferay
▪ Discovery gateway: everywhere
▪ Front page (Since your last visit)
▪ Search bar (As you type)
▪ Search results (Also looked for)
▪ Content view (You may also like)
▪ Push notifications (Never miss another)
▪ Match new content against previous "successful" actions from the user
▪ Anticipate and influence
▪ Ever-improving Intelligent Information Discovery
Smart Content Delivery
What’s next for Liferay Search
Elasticsearch 6
▪ Rethinking Indexer architecture for extensibility
▪ Composition over inheritance
▪ Small, single-purpose components
▪ New extension points
▪ Reusable
▪ Easier to test
Modular Search infrastructure
▪ Building custom Search experiences
▪ Reusable and flexible
▪ Search Page Templates
▪ Search Bar
▪ Search Results
▪ Multi Selection Facets
▪ Insights
▪ Map
▪ … and your own Search-aware Portlets
New Search Components
▪ Powered by Search, end to end
▪ Improved User Experience
▪ Data driven - and faster than the database
▪ Ready for Intelligent Information Discovery
Liferay Commerce
#LRDEVCON 2017 Highlight
"Liferay Commerce: A Preview of Our Upcoming Features"
Marco Leo, Software Architect, Liferay
Intelligent Information Discovery
Liferay DXP
Search engine (Elasticsearch 6)
- Autocomplete
- Did you mean…?
- Suggest as you type
- More like this
- Percolators
- Similarity
AIaaS (APIs)
- Image classification
- Speech recognition
- Face detection
Data Driven application infrastructure (WeDeploy)
- Search classification
- Customer interest prediction
- Machine learning trained models
- Recommendation engines
Thank you
and many discoveries at #LRDEVCON