Demystifying Endeca’s search results ranking
Kristina Spurgin, with input & support from Ben Pennell & Jeff Campbell
UNC Libraries
Endeca details
• Search Configuration and Relevance Ranking
  – The supported search methods and details on how results are ranked for each
• TRLN Endeca Data Model
  – The major field groups, with brief descriptions of their use, and indexing and display properties
• Endeca Extract and Mappings Spreadsheet
  – Details on how MARC fields get mapped into Endeca fields
TRLN Endeca Search Interfaces
• Words anywhere (i.e., Keyword)
• Author
• Title
• Journal title
• Subject
• ISBN/ISSN
• (Publisher)
How to think about RelRank
Spotting the relevancy strata
• Subject search relevancy strategy
  – An exact phrase match, starting from the beginning of a single field, is the gold-standard match
  – Example subject heading search: commonplace book
PubDateSort = 1700
No pub date!
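The "strata" can be read as a lexicographic ordering: a record's best kind of match decides which stratum it lands in, and only records in the same stratum are compared on anything else. A minimal sketch of that idea for a subject search, assuming three hypothetical tiers (phrase at the start of a single heading, phrase elsewhere in a single heading, all words in one heading but not as a phrase) and made-up records; it is not TRLN's actual Endeca configuration.

```python
from typing import Dict, List

# Hypothetical strata for a subject search, best first (lower sorts earlier).
EXACT_AT_START = 0    # query phrase begins a single subject heading
PHRASE_ELSEWHERE = 1  # query phrase appears later within a single heading
WORDS_ONLY = 2        # every query word is in one heading, but not as a phrase
NO_MATCH = 3


def subject_tier(query: str, headings: List[str]) -> int:
    """Return the best stratum reached by any single subject heading."""
    q = query.lower().split()
    best = NO_MATCH
    for heading in headings:
        h = heading.lower().split()
        if h[:len(q)] == q:
            return EXACT_AT_START  # nothing can beat this, so stop looking
        if any(h[i:i + len(q)] == q for i in range(len(h) - len(q) + 1)):
            best = min(best, PHRASE_ELSEWHERE)
        elif all(word in h for word in q):
            best = min(best, WORDS_ONLY)
    return best


def rank(query: str, records: List[Dict]) -> List[Dict]:
    # sorted() is stable: records in the same stratum keep their incoming order,
    # which is where a real engine would apply its next tie-breaking criterion.
    return sorted(records, key=lambda r: subject_tier(query, r["subjects"]))


records = [
    {"id": "a", "subjects": ["History of commonplace book collecting"]},
    {"id": "b", "subjects": ["Commonplace book", "Manuscripts"]},
    {"id": "c", "subjects": ["Book collecting -- Commonplace"]},
]
print([r["id"] for r in rank("commonplace book", records)])  # ['b', 'a', 'c']
```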
A more complex search: keyword (AKA “Words anywhere”)
“Searches all indexed fields, but only uses some fields to rank results.” -- Search Configuration and Relevance Ranking
What fields are indexed?
• The Guide to the TRLN Endeca Data Model gives some info
• The Endeca Extract and Mappings Spreadsheet gives the detailed info
More on keyword search (AKA “Words anywhere”)
“Matches in the main title, subject headings, and main author fields will be given the highest ranking.” -- Search Configuration and Relevance Ranking
“Queries that match as a phrase are ranked higher than those which do not.” -- Search Configuration and Relevance Ranking
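In a keyword search the query words can be satisfied by different fields of one record, such as one word in the title and another in a subject heading. A minimal sketch of the phrase criterion, assuming records are hypothetical dicts of field name to text and that a "phrase match" means the query words appear adjacent and in order within a single field; it illustrates the quoted rule and is not Endeca's matching code.

```python
import re
from typing import Dict, List


def tokens(text: str) -> List[str]:
    # Lowercase and keep runs of word characters; a stand-in for real tokenization.
    return re.findall(r"\w+", text.lower())


def is_phrase_match(query: str, record: Dict[str, str]) -> bool:
    """True if the query words appear adjacent and in order inside one field."""
    q = tokens(query)
    for text in record.values():
        t = tokens(text)
        if any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1)):
            return True
    return False


def all_words_match(query: str, record: Dict[str, str]) -> bool:
    """True if every query word appears somewhere in the record, in any field."""
    words = set()
    for text in record.values():
        words.update(tokens(text))
    return all(w in words for w in tokens(query))


query = "commonplace book"
split_rec = {"title": "Commonplace thoughts", "subject": "Book design"}
phrase_rec = {"title": "A commonplace book of verse"}

# Both records match the query; only one matches it as a phrase.
candidates = [("split", split_rec), ("phrase", phrase_rec)]
ranked = sorted(candidates, key=lambda c: not is_phrase_match(query, c[1]))
print([name for name, _ in ranked])  # ['phrase', 'split']
```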
“Exact term matches are ranked higher than those returned because of spell correction, stemming, and thesaurus lookups.” -- Search Configuration and Relevance Ranking
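A minimal sketch of that rule, assuming a record is only as good as its weakest-matching query word, a toy plural-stripping stemmer, and a one-entry hypothetical thesaurus; spelling correction is left out. The quote ranks stemmed and thesaurus matches below exact ones but does not order them against each other, so the sketch puts them in one shared lower tier.

```python
from typing import Dict, Optional, Set

EXACT, INEXACT = 0, 1  # lower sorts earlier

# Hypothetical thesaurus entry; a real one would be configured by the site.
THESAURUS: Dict[str, Set[str]] = {"journal": {"diary"}}


def crude_stem(word: str) -> str:
    # Toy stemmer that only drops a plural "s"; a stand-in for real stemming.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def word_match(word: str, field_words: Set[str]) -> Optional[int]:
    if word in field_words:
        return EXACT
    # Stemmed and thesaurus matches share one lower tier (see lead-in).
    if crude_stem(word) in {crude_stem(w) for w in field_words}:
        return INEXACT
    if THESAURUS.get(word, set()) & field_words:
        return INEXACT
    return None  # this word does not match at all


def record_tier(query: str, text: str) -> Optional[int]:
    """Tier of the weakest-matching query word; None if any word is unmatched."""
    field_words = set(text.lower().split())
    kinds = [word_match(w, field_words) for w in query.lower().split()]
    if any(k is None for k in kinds):
        return None
    return max(kinds, default=EXACT)


query = "travel journal"
titles = ["travel diary", "travel journals", "travel journal"]
hits = [t for t in titles if record_tier(query, t) is not None]
print(sorted(hits, key=lambda t: record_tier(query, t)))
# ['travel journal', 'travel diary', 'travel journals']
```

The two inexact hits tie, so they keep their incoming order; a real engine would break that tie with further criteria.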
“Matches in tables of contents, summaries, or selected EAD elements are not used to determine ranking.” -- Search Configuration and Relevance Ranking
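Together with the earlier quote ("Searches all indexed fields, but only uses some fields to rank results"), this implies two separate steps: deciding whether a record matches at all, which can use any indexed field, and deciding where it ranks, which uses only a subset. A minimal sketch of that split, with hypothetical field names (`title`, `subjects`, `author`, `contents`, `summary`) and a deliberately coarse two-level ranking; it is not TRLN's configuration.

```python
from typing import Dict, List

# Hypothetical field names standing in for the real TRLN/Endeca properties.
SEARCHED_FIELDS = ["title", "subjects", "author", "contents", "summary"]
RANKING_FIELDS = ["title", "subjects", "author"]  # contents/summary deliberately absent


def has_all_words(text: str, query: str) -> bool:
    words = set(text.lower().split())
    return all(w in words for w in query.lower().split())


def keyword_search(query: str, records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    # Step 1, matching: any searched field can make a record a hit.
    hits = [r for r in records
            if any(has_all_words(r.get(f, ""), query) for f in SEARCHED_FIELDS)]

    # Step 2, ranking: only ranking fields count. A hit found solely through
    # contents or summary is still returned, but below every ranked hit.
    def key(r: Dict[str, str]) -> int:
        ranked = any(has_all_words(r.get(f, ""), query) for f in RANKING_FIELDS)
        return 0 if ranked else 1

    return sorted(hits, key=key)


records = [
    {"id": "toc-only", "title": "Essays", "contents": "On keeping a commonplace book"},
    {"id": "no-match", "title": "Poems"},
    {"id": "subject", "title": "Notes", "subjects": "Commonplace book"},
]
print([r["id"] for r in keyword_search("commonplace book", records)])  # ['subject', 'toc-only']
```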
An aside on keyword search (AKA “Words anywhere”)
Fields used to rank Keyword results (most important to least)
• Main Title
• Main Title Normalized
• Title Vernacular
• Title Vernacular Segmented
• Subject Headings
• Subjects Normalized
• Subjects Vernacular Segmented
• Main Author
• Main Author Normalized
• Main Author Vernacular
• Main Author Vernacular Segmented
• Company
• Varying Titles
• Varying Titles Vernacular Segmented
• Other Authors
• Other Author Translation
• Authors Normalized
• Main Uniform Title
• Main Uniform Title Vernacular
• Main Uniform Title Vernacular Segmented
• Uniform Title
• Uniform Title Vernacular
• Uniform Title Vernacular Segmented
• Title Index
• Earlier Title
• Later Title
• Host Item Linking
• Uncontrolled Subject
• Other Titles
• Other Title Translation
• Translated as Linking
• Translation of Linking
• Series Title Index
• Series Statement
• Series Normalized
• Series Statement Vernacular
• Series Statement Vernacular Segmented
• Publisher
• Publisher Normalized
• Sound Recording Imprint
• Director
• Performer Credits
• Production Credits
• Biographical Sketch
• Related Collections
• Digital Collection
• Genre
• Product
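A field-priority list like this can be applied by scoring each record by the most important ranking field that contains the query. A minimal sketch, assuming records are hypothetical dicts keyed by the slide's display names (not necessarily the real Endeca property names) and that a field "matches" when it contains every query word. The slides do not say how field priority combines with the phrase and exact-match rules, so this sketch ranks on field priority alone.

```python
from typing import Dict, List

# Keyword ranking fields as listed on the slide, most important first.
# These are display names from the slide, not necessarily Endeca property names.
KEYWORD_RANK_FIELDS: List[str] = [
    "Main Title", "Main Title Normalized", "Title Vernacular",
    "Title Vernacular Segmented", "Subject Headings", "Subjects Normalized",
    "Subjects Vernacular Segmented", "Main Author", "Main Author Normalized",
    "Main Author Vernacular", "Main Author Vernacular Segmented", "Company",
    "Varying Titles", "Varying Titles Vernacular Segmented", "Other Authors",
    "Other Author Translation", "Authors Normalized", "Main Uniform Title",
    "Main Uniform Title Vernacular", "Main Uniform Title Vernacular Segmented",
    "Uniform Title", "Uniform Title Vernacular",
    "Uniform Title Vernacular Segmented", "Title Index", "Earlier Title",
    "Later Title", "Host Item Linking", "Uncontrolled Subject", "Other Titles",
    "Other Title Translation", "Translated as Linking", "Translation of Linking",
    "Series Title Index", "Series Statement", "Series Normalized",
    "Series Statement Vernacular", "Series Statement Vernacular Segmented",
    "Publisher", "Publisher Normalized", "Sound Recording Imprint", "Director",
    "Performer Credits", "Production Credits", "Biographical Sketch",
    "Related Collections", "Digital Collection", "Genre", "Product",
]


def field_rank(query: str, record: Dict[str, str]) -> int:
    """Position of the most important ranking field containing every query word.

    Returns len(KEYWORD_RANK_FIELDS) when no ranking field matches, so such a
    record sorts after every record matched in a ranking field.
    """
    words = query.lower().split()
    for pos, field in enumerate(KEYWORD_RANK_FIELDS):
        field_words = set(record.get(field, "").lower().split())
        if all(w in field_words for w in words):
            return pos
    return len(KEYWORD_RANK_FIELDS)


records = {
    "by publisher": {"Publisher": "Commonplace Book Company"},
    "by subject": {"Subject Headings": "Commonplace book"},
    "by title": {"Main Title": "My commonplace book"},
}
ranked = sorted(records, key=lambda name: field_rank("commonplace book", records[name]))
print(ranked)  # ['by title', 'by subject', 'by publisher']
```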
Fields used to rank Title results (most important to least)
• Title1
• Title2
• Title3
• Title4
• Main Title
• Main Title Normalized
• Journal Title Index
• Title Vernacular
• Title Vernacular Segmented
• Varying Titles
• Titles Normalized
• Varying Titles Vernacular Segmented
• Main Uniform Title
• Main Uniform Title Vernacular
• Main Uniform Title Vernacular Segmented
1-word titles
2-word titles
3-word titles
Fields used to rank Journal Title results (most important to least)
• Journal Title Index
• Journal Uniform Title
• Journal Title Abbreviation
• Journal Later Title
• Journal Earlier Title
Fields used to rank Author results (most important to least)
• Main Author
• Main Author Normalized
• Main Author Vernacular
• Main Author Vernacular Segmented
• Director
• Performer Credits
• Production Credits
• Author
Fields used to rank Subject results (most important to least)
• Subject Headings
• Subjects Vernacular Segmented
• Subjects Normalized
• Genre
What is irrelevant to relevancy?
• Many aspects of the record are NOT considered in relevancy ranking
• FORMAT seems to be the biggest surprise
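One way to see why format has no effect: if relevance is computed only from text-match criteria, two records that differ only in format get identical keys and land in the same stratum. A toy sketch, with a hypothetical two-level key (whole title equals the query, then the query appears in the title) that never reads the hypothetical `format` field.

```python
from typing import Dict, List, Tuple


def relevance_key(query: str, record: Dict[str, str]) -> Tuple[int, int]:
    """Toy stratified key built only from text-match criteria.

    Note what it does NOT read: record["format"] never enters the key.
    """
    q = query.lower()
    title = record.get("title", "").lower()
    whole_title = 0 if title == q else 1  # title is exactly the query
    phrase = 0 if q in title else 1       # crude substring stand-in for a phrase match
    return (whole_title, phrase)


records: List[Dict[str, str]] = [
    {"id": "video", "title": "Hamlet", "format": "Video"},
    {"id": "book", "title": "Hamlet", "format": "Book"},
    {"id": "study", "title": "Hamlet and his problems", "format": "Book"},
]
ranked = sorted(records, key=lambda r: relevance_key("hamlet", r))
print([r["id"] for r in ranked])  # ['video', 'book', 'study']
```

The video and the book tie, so their relative order comes from whatever preceded the sort (here, list order), not from their format.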
And, with that whirlwind tour…