Solr pattern

DESCRIPTION

A pattern for implementing search where search is a business problem, not a technological problem, especially if you've chosen Solr!

  • A PATTERN FOR IMPLEMENTING SOLR

    1

  • BOTTOM LINE UP FRONT

    Migrating from an existing search architecture to the Solr platform is less an exercise in technology and coding, and more an exercise in project management, metrics, and managing expectations.

    2

  • Typically smart people, fed into the search migration project meat grinder, produce hamburger quality results. Okay search, with okay relevance, and an okay project. But if you apply this pattern, you'll get back steak! - Arin Sime

    3

  • Project definition (We Start Here)

    Precursor Work

    Prototype (Typical starting point for a technology-driven team)

    Implementation

    Testing/QA (repeats!)

    Deployment

    Ongoing Tuning (Forgotten phase for a technology-driven team)

    I want feedback!

    4

  • PROGRAMMERS DOMINATE

    We dive right into writing indexers and building queries

    We skip the first two phases!

    We don't plan for the last phase!

    5

  • NEED HETEROGENEOUS SKILLS

    More so than a regular development project, we need multiple skills:

    Business Analysts

    Developers

    QA/Testers

    Report Writers

    Big Brain Scientists

    Content Folks (Writers)

    End Users

    UX Experts

    Ops Team

    Librarians!

    6

  • PHASE 1: PROJECT DEFINITION

    Well-understood part of any project, right?

    objectives, key success criteria, evaluated risks

    Leads to a Project Charter:

    structure, team membership, acceptable tradeoffs

    7

  • CHALLENGES

    Competing business stakeholders:

    Tester: When I search for lamp shades, I used to see these documents; now I see a different set.

    Business Owner: How do I know that the new search engine is better?

    User: My pet feature, "search within these results," works differently.

    Marketing Guy: I want to control the results so the current marketing push for toilet paper brand X always shows up at the top.

    8

  • CHALLENGES

    Stakeholders want a better search implementation, but perversely often want it to work the exact same way. Getting all the stakeholders to agree on the project vision and on the metrics is a challenge.

    9

  • CHALLENGES

    Can be difficult to bring non-technical folks onto the Search Team.

    Have a content-driven site? You need them to provide the right kind of content to fit into your search implementation!

    10

  • ENSURING SKILLS NEEDED

    Search is something everybody uses daily, but it is its own specialized domain.

    Solr does pass the 15-minute rule, but don't get overconfident!

    11

  • PERFECT SOLR PERSON WOULD BE ALL OF

    Mathematician

    Librarian

    UX Expert

    Writer

    Programmer

    Business Analyst

    Systems Engineer

    Geographer!

    Psychologist

    12

  • KNOWLEDGE TRANSFER

    If you don't have the perfect team already, bring in experts and do domain knowledge transfer.

    Learn the vocabulary of search to communicate better together:

    auto-complete vs. auto-suggest

    Do "Solr for the Content Team" brown-bag sessions!

    13

  • HAVE A COOL PROJECT NAME!

    15

  • PROJECT LIMELIGHT

    Putting our content in the limelight

    16

  • PHASE 2: PRECURSOR WORK

    A somewhat tenuous phase, this is making sure that we can measure the goals defined in the project definition.

    Do we have tools to track increased conversions through search?

    In a greenfield search, we don't have any previous relevancy/recall to measure against, but in a brownfield migration project we can do some apples-to-(apples? oranges?) comparisons.

    17

  • METRICS

    18

  • DATA COLLECTION

    Have we been collecting enough data about current search patterns to measure success against?

    Often folks have logs that record search queries but are missing crucial data like the number of results returned per query!

    19
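The data-collection point above translates into a few lines of log mining; a sketch, assuming a hypothetical log format where the crucial `numFound=` hit count was actually captured:

```python
import re

# Hypothetical log format -- adapt the regex to your own logs.
# The crucial field to capture is the hit count (numFound) per query.
LOG_PATTERN = re.compile(r'q=(?P<query>[^&\s]+).*numFound=(?P<hits>\d+)')

def zero_result_rate(log_lines):
    """Count total parsed queries and those returning zero results."""
    total = zero = 0
    for line in log_lines:
        m = LOG_PATTERN.search(line)
        if not m:
            continue
        total += 1
        if int(m.group('hits')) == 0:
            zero += 1
    return total, zero

sample = [
    'INFO: q=lamp+shades&rows=10 numFound=42',
    'INFO: q=toilet+paper numFound=0',
    'INFO: some unrelated log line',
]
total, zero = zero_result_rate(sample)
```

If your current logs can't support even this, that gap belongs in the precursor work.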

  • RELEVANCY

    Do we have any defined relevancy metrics?

    Relevancy is like porn...

    20

  • I KNOW IT WHEN I SEE IT!

    http://en.wikipedia.org/wiki/Les_Amants

    21

  • MEASURE USER BEHAVIOR

    Are we trying to solve user interaction issues with existing search?

    Do we have the analytics in place? Google Analytics? Omniture?

    23

  • POGOSTICKING (image from http://searchpatterns.org/)

    24

  • THRASHING (image from http://searchpatterns.org/)

    25

  • BROAD BASE OF SKILLS

    Not your normal "I am a developer, I crank out code" type of tasks!

    26

  • INVENTORY USERS

    Search often permeates multiple systems... "I can just leverage your search to power my content area."

    Do you know which third-party systems are actually accessing your existing search?

    Have a plan for cutting the cord on the existing search platform!

    Users as in Systems!

    27

  • PHASE 3: PROTOTYPE

    The fun part!

  • GOING FROM QUESTIONS TO ANSWERS

    29

  • INDEXING: PUSH ME PULL ME

    Are we in a pull environment?

    DIH (DataImportHandler)

    Crawlers

    Scheduled Indexers

    Are we in a push environment?

    Sunspot

    30
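Push-style indexing can be as small as POSTing JSON to Solr's update endpoint. A sketch that only builds the request, assuming a hypothetical local core named `products`:

```python
import json

# Assumed local Solr with a core named "products"; the /update
# endpoint and a JSON array-of-docs payload are standard Solr.
SOLR_UPDATE = 'http://localhost:8983/solr/products/update'

def build_update_request(docs, commit=False):
    """Build (url, headers, body) for a JSON add request."""
    url = SOLR_UPDATE + ('?commit=true' if commit else '')
    headers = {'Content-Type': 'application/json'}
    body = json.dumps(docs)  # a JSON array of documents is an "add"
    return url, headers, body

url, headers, body = build_update_request(
    [{'id': '598', 'title_t': 'Lamp shade, linen'}], commit=True)
```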

  • VERIFY INDEXING STRATEGY

    Use the complete dataset, not a partial load!

    Is indexing time performance acceptable?

    Quality of indexed data? Duplicates? Odd characters?

    31
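The duplicate check above can be sketched over a sample of indexed docs (in practice you would page through the full index, or facet on the unique key; the `id` field name is an assumption):

```python
from collections import Counter

def find_duplicate_ids(docs, key='id'):
    """Return the unique-key values that occur more than once."""
    counts = Counter(d[key] for d in docs)
    return sorted(k for k, n in counts.items() if n > 1)

# Sample pulled from the index -- stand-in data for illustration
docs = [{'id': '1'}, {'id': '2'}, {'id': '1'}]
dupes = find_duplicate_ids(docs)
```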

  • WHERE IS SEARCH BUSINESS LOGIC?

    Does it go Solr side in request handlers (solrconfig.xml?)

    Is it specified as lots of URL parameters?

    Do you have a frontend library like Sunspot that provides a layer of abstraction/DSL?

    32
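A sketch of the first option: business logic captured as defaults in a solrconfig.xml request handler, so clients only send `q`. The handler name and field names below are placeholders:

```xml
<!-- Sketch only: handler name and field names are placeholders. -->
<requestHandler name="/products" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title_t^2.0 body_t</str>
    <str name="rows">10</str>
    <str name="fl">id,title_t,score</str>
  </lst>
</requestHandler>
```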

  • HOOKING SOLR UP TO FRONTEND

    The first integration tool may not be the right one!

    A simple query/result is very easy to do.

    A highly relevant query/result is very difficult to do.

    33
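The "simple query is easy" point in miniature: building a select URL is one line of plumbing, while everything that makes the results relevant lives elsewhere. The core URL is an assumption:

```python
from urllib.parse import urlencode

SOLR_SELECT = 'http://localhost:8983/solr/products/select'  # assumed core

def build_query_url(q, rows=10):
    """One line of plumbing: the easy part of search."""
    return SOLR_SELECT + '?' + urlencode({'q': q, 'rows': rows, 'wt': 'json'})

url = build_query_url('lamp shades')
```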

  • PART OF PROTOTYPING IS DEPLOYMENT

    Make sure that when you are demoing the prototype Solr, it's been deployed into an environment like QA.

    Running Solr by hand on a developer's laptop is NOT enough.

    Figuring out deployment (configuration management, environment, 1-click deploy) needs to at least be looked at.

    34

  • PHASE 4: IMPLEMENTATION

    Back on familiar ground! We are extending the data being indexed, enhancing search queries, adding features.

    Apply all the patterns of any experienced development team.

    Just don't forget to involve your non-techies in defining approaches!

    35

  • INDEXERS PROLIFERATE!

    Make sure you have strong patterns for indexers

    A good topic for a code review!

    36

  • PHASE 5: TESTING/QA

    Most typical testing patterns apply, EXCEPT:

    Can be tough to automate testing if data is changing rapidly.

    You want the full dataset at your fingertips.

    You can still do it!

    37

  • WATCH OUT FOR RELEVANCY!

    Sometimes it seems like once you validate one search, the previous one starts failing.

    How do you empirically measure this?

    You need production-like data sets during QA.

    Don't get tied up in "doc id 598 is the third result." Be happy 598 shows up in the first 10 results!

    38
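The "top 10, not position 3" advice translates directly into a test helper; a sketch against hypothetical result docs standing in for a live Solr response:

```python
def assert_in_top(doc_id, results, n=10):
    """Pass if doc_id is anywhere in the first n results; rank is not pinned."""
    top = [r['id'] for r in results[:n]]
    assert doc_id in top, f'{doc_id} not in top {n}: {top}'

# Hypothetical result docs standing in for a real response
results = [{'id': '12'}, {'id': '598'}, {'id': '7'}]
assert_in_top('598', results)  # rank 2 is fine; pinning rank 3 would be brittle
```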

  • EXPLORATORY TESTING?

    ...simultaneous learning, test design and test execution

    Requires the tester to understand the corpus of data indexed

    behave like a user

    http://en.wikipedia.org/wiki/Exploratory_testing

    James Bach

    39

  • STUMP THE CHUMP

    You can always write a crazy search query that Solr will barf on... Is that what your users are typing in?

    40
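One defensive answer to stump-the-chump input is escaping the Lucene query syntax's special characters before the query reaches Solr; a sketch:

```python
import re

# Lucene query syntax special characters (&& and || are two-char operators).
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')

def escape_query(user_input):
    """Backslash-escape query syntax characters in raw user input."""
    return _SPECIAL.sub(r'\\\1', user_input)

escaped = escape_query('what?! (exactly)')
```

Whether to escape or reject depends on whether your users legitimately use the query syntax.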

  • DOES SOLR ADMIN WORK?

    Do searches via Solr Admin reflect what the front end does? If not, provide your own test harness!

    Make ad hoc searches by QA really, really easy.

    "Just type these 15 URL params in!" is not an answer!

    41

  • PHASE 6: DEPLOYMENT

    Similar to any large-scale system:

    Network plumbing tasks, multiple servers, IP addresses

    Hopefully all environment variables are external to Solr configurations?

    Think about monitoring: replication, query load!

    42

  • DO YOU NEED UPTIME THROUGH RELEASE?

    Solr is code, configuration, and data! Do you have to reindex your data?

    Can you reindex your data from someplace else?

    43

  • PRACTICE THIS PROCESS!

    Mapping out the steps to back up cores, redeploy new ones, and update master and slave servers is fairly straightforward if done ahead of time.

    These steps are a great thing to involve your Ops team in

    45

  • PHASE 7: ONGOING TUNING

    The part we forget to budget for!

    Many knobs and dials are available in Solr; you need to keep tweaking them as:

    the data set being indexed changes

    the behavior of users changes

    46

  • HAVE REGULAR CHECKINS WITH CONTENT PROVIDERS

    Have an editorial calendar of content? Evaluate what synonyms you are using based on the content.

    Can you better highlight content using Query Elevation to boost certain documents?

    47
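Query elevation is configured in elevate.xml for the Query Elevation Component; a minimal sketch, with placeholder query text and doc id:

```xml
<!-- Sketch only: query text and doc id are placeholders. -->
<elevate>
  <query text="lamp shades">
    <doc id="598"/>
  </query>
</elevate>
```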

  • QUERY TRENDS

    Look at queries returning 0 results

    Are queries getting slower/faster?

    Are users leveraging all the features available to them?

    Do your analytics highlight negative behaviors such as pogosticking or thrashing?

    AUTOMATE THESE REPORTS!

    48
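The zero-results report above can be automated in a few lines; a sketch, assuming (query, hit count) pairs pulled from your search logs:

```python
from collections import Counter

def top_zero_result_queries(events, n=5):
    """Most frequent queries that returned nothing, from (query, hits) pairs."""
    return Counter(q for q, hits in events if hits == 0).most_common(n)

# Hypothetical (query, hit count) pairs pulled from search logs
events = [('lamp shades', 42), ('brand x', 0), ('brand x', 0), ('widget', 0)]
report = top_zero_result_queries(events)
```

Run it on a schedule and mail the output; a report nobody has to generate is a report that actually gets read.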

  • Query response time distribution (chart):

    Less than 0.5s: 69%

    0.5-1.0s: 20%

    1.0-1.5s: 6%

    1.5-2.0s: 2%

    2.0-2.5s: 1%

    > 2.5s: 2%