Smart Search in Kentico 6.0

1/18/2012 Miro Remias, Solution Architect



Agenda

• Smart Search: How It Works
• Index Types
• Analyzer Types
• Related Scheduled Tasks & Keys
• Example - Searching In Content Of Media Files


How It Works

Definition: Smart Search is index-based searching through the content of websites or other objects within the system. It is built on the third-party library Lucene.Net (v2.1.0).

Where are the index files? In the file system: /App_Data/CMSModules/SmartSearch/<Index code name>

Customization: the location can be changed with the CMSSearchIndexPath key (default: "App_Data\\CMSModules\\SmartSearch\\").

How can an index file be analyzed? With Luke - Lucene Index Toolbox (http://www.getopt.org/luke/).

Note: Don't forget to grant write permission on the App_Data folder!


How It Works

“When a search request is sent to the system by a user, it is the index file that gets searched, which results in significantly better performance compared to linear SQL query search.”

Life cycle of a document/object in the index file:

A) When a document/object is created, updated, or deleted, a new indexing task is logged in the database.

B) The database (CMS_SearchTask table) is automatically checked on a regular basis for pending indexing tasks.

C) Each task is processed, and the document/object is added to, updated in, or deleted from the index file.
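The three steps can be modeled in a few lines of Python. This is a hypothetical, in-memory sketch: `TaskLog` stands in for the CMS_SearchTask table and `InvertedIndex` for the Lucene index file; neither is part of the Kentico API. It shows why a change is not searchable until its task has been processed.

```python
from collections import defaultdict, deque


class InvertedIndex:
    """Hypothetical stand-in for the Lucene index file: term -> set of document IDs."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}  # document ID -> indexed text

    def upsert(self, doc_id, text):
        self.delete(doc_id)  # an update is modeled as delete + add
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old is not None:
            for term in old.lower().split():
                self.postings[term].discard(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


class TaskLog:
    """Hypothetical stand-in for the CMS_SearchTask table."""

    def __init__(self):
        self.pending = deque()

    # A) a create/update/delete only logs a task; the index is not touched yet
    def log(self, task_type, doc_id, text=None):
        self.pending.append((task_type, doc_id, text))

    # B) + C) pending tasks are picked up and applied to the index in order
    def process(self, index):
        while self.pending:
            task_type, doc_id, text = self.pending.popleft()
            if task_type == "Update":
                index.upsert(doc_id, text)
            elif task_type == "Delete":
                index.delete(doc_id)


log, index = TaskLog(), InvertedIndex()
log.log("Update", 1, "Smart Search in Kentico")
log.log("Update", 2, "Kentico scheduled tasks")
log.log("Delete", 1)
print(index.search("kentico"))  # [] - tasks not processed yet
log.process(index)
print(index.search("kentico"))  # [2]
```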


How It Works - Database

• CMS_SearchIndex
• CMS_SearchIndexCulture
• CMS_SearchIndexSite
• CMS_SearchTask (API: SearchTaskInfo)

CMS_SearchTask columns:
• SearchTaskType (nvarchar) - SearchTaskTypeEnum: Update, Delete, Rebuild, Optimize, Process
• SearchTaskObjectType (nvarchar) - PredefinedObjectType: ABTEST, ACCOUNT, BIZFORM, etc.
• SearchTaskField (nvarchar) - usually the name of the ID field
• SearchTaskValue (nvarchar) - usually the object/document ID
• SearchTaskServerName (nvarchar) - server name, used when web farms are in place
• SearchTaskStatus - SearchTaskStatusEnum: Ready, InProgress
• SearchTaskPriority (int) - a higher value means a higher priority
• SearchTaskCreated (datetime) - task creation date
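A CMS_SearchTask row can be mirrored as a Python dataclass to show how the columns drive processing order. The sketch assumes two things the slide does not state: that a web-farm server only picks up tasks carrying its own SearchTaskServerName, and that ties in SearchTaskPriority are broken by SearchTaskCreated (oldest first). The object type string `media.file`, the field name `FileID`, and the `next_tasks` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class SearchTaskType(Enum):
    UPDATE = "Update"
    DELETE = "Delete"
    REBUILD = "Rebuild"
    OPTIMIZE = "Optimize"
    PROCESS = "Process"


class SearchTaskStatus(Enum):
    READY = "Ready"
    IN_PROGRESS = "InProgress"


@dataclass
class SearchTask:
    task_type: SearchTaskType   # SearchTaskType
    object_type: str            # SearchTaskObjectType (hypothetical value below)
    field: str                  # SearchTaskField, usually the ID field name
    value: str                  # SearchTaskValue, usually the object/document ID
    server_name: str            # SearchTaskServerName (web farms)
    status: SearchTaskStatus    # SearchTaskStatus
    priority: int               # SearchTaskPriority, higher = more urgent
    created: datetime           # SearchTaskCreated


def next_tasks(tasks, server_name):
    """Ready tasks for one server, highest priority first; ties oldest first (assumption)."""
    ready = [t for t in tasks
             if t.status is SearchTaskStatus.READY and t.server_name == server_name]
    return sorted(ready, key=lambda t: (-t.priority, t.created))


tasks = [
    SearchTask(SearchTaskType.UPDATE, "media.file", "FileID", "10", "WEB1",
               SearchTaskStatus.READY, 0, datetime(2012, 1, 18, 9, 0)),
    SearchTask(SearchTaskType.DELETE, "media.file", "FileID", "11", "WEB1",
               SearchTaskStatus.READY, 5, datetime(2012, 1, 18, 9, 5)),
    SearchTask(SearchTaskType.UPDATE, "media.file", "FileID", "12", "WEB1",
               SearchTaskStatus.IN_PROGRESS, 9, datetime(2012, 1, 18, 9, 1)),
]
print([t.value for t in next_tasks(tasks, "WEB1")])  # ['11', '10']
```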


Index Types

1) Custom index - indexes any kind of data, depending on its implementation.
2) Custom tables - indexes records in custom tables.
3) Documents - indexes the content of documents in the content tree.
4) Documents crawler - indexes the content of the HTML output generated by documents in the content tree.
   Customization options:
   • CMS.SiteProvider.SearchHelper.OnHtmlToPlainText - triggered when the HTML output is processed by a crawler
   • CMS.SiteProvider.SearchHelper.HtmlToPlainText() - converts HTML to plain text (body part)
   • CMS.SiteProvider.SearchHelper.DownloadHtmlContent(url) - returns the complete HTML code of the page at the provided URL
   • CMS.TreeEngine.TreeNode.GetSearchDocument() - returns a Lucene Document object
5) Forums - indexes the content of discussion forums.
6) General - indexes objects of a specified type. Any object within the CMS can be searched this way.
7) Users - indexes details about system users.
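A crawler index works on rendered HTML, so the markup has to be stripped before the text is indexed. The function below is an illustrative stand-in for what SearchHelper.HtmlToPlainText() is described as doing (keeping the body text); it uses Python's standard `html.parser` and is not Kentico's implementation. Skipping `<script>` and `<style>` content is an assumption of this sketch.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects text inside <body>, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and not self.skip_depth:
            self.parts.append(data)


def html_to_plain_text(html):
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    # join the fragments and collapse whitespace runs into single spaces
    return " ".join(" ".join(parser.parts).split())


page = ("<html><head><title>Home</title></head>"
        "<body><h1>Smart Search</h1><script>var x = 1;</script>"
        "<p>Indexes the <b>HTML</b> output.</p></body></html>")
print(html_to_plain_text(page))  # Smart Search Indexes the HTML output.
```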


Analyzer Types

Tokenized field - indicates whether the content of the field should be processed by the analyzer during indexing. As a general rule, use it for Content fields and not for Searchable fields.

1) Custom - performs tokenization according to your particular requirements.
2) Keyword - tokenizes the entire stream as a single token. This is useful for data like zip codes, IDs, and some product names.
3) Simple - divides text at non-letter characters.
4) Standard - grammar-based analyzer (stop words, abbreviations, ...); very efficient for English, but may not produce satisfactory results with other languages.
5) Starts with - tokenizes all prefixes contained in words, which allows searching for words that start with the entered string. Text is divided at whitespace characters. Example: abc => a ab abc
6) Stop - contains a collection of stop words at which text is divided.
7) Subset - tokenizes all substrings in words, which allows searching for words that contain the entered string. Text is divided at whitespace characters. Example: abc => abc ab bc a b c
8) White space - divides text at whitespace characters.

Note: Stop words are a dictionary of words omitted from indexing (e.g. 'and', 'or', ...) when the Stop or Standard analyzer is used (~\App_Data\CMSModules\SmartSearch\_StopWords). The Starts with and Subset analyzers use the CMS.SiteProvider.SubSetAnalyzer class.
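The tokenization rules above can be compared side by side with plain Python functions. These are simplified sketches modeled on the descriptions (and on Lucene's habit of lower-casing letter runs in the Simple and Stop analyzers), not the Lucene.Net or CMS.SiteProvider.SubSetAnalyzer classes; the stop-word set is a hypothetical sample, and the grammar-based Standard analyzer is omitted.

```python
import re

# Hypothetical sample; Kentico reads its list from the _StopWords folder.
STOP_WORDS = {"and", "or", "the", "a"}


def keyword(text):
    # the entire input is a single token
    return [text]


def simple(text):
    # divide at non-letter characters ([^\W\d_] matches letters only), lower-case
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def whitespace(text):
    # divide at whitespace characters
    return text.split()


def stop(text):
    # simple tokenization, then drop stop words
    return [t for t in simple(text) if t not in STOP_WORDS]


def starts_with(text):
    # every prefix of every whitespace-separated word: abc -> a ab abc
    return [w[:i] for w in text.split() for i in range(1, len(w) + 1)]


def subset(text):
    # every substring of every word, longest first: abc -> abc ab bc a b c
    tokens = []
    for w in text.split():
        for length in range(len(w), 0, -1):
            for start in range(len(w) - length + 1):
                tokens.append(w[start:start + length])
    return tokens


print(starts_with("abc"))        # ['a', 'ab', 'abc']
print(subset("abc"))             # ['abc', 'ab', 'bc', 'a', 'b', 'c']
print(stop("Search and index"))  # ['search', 'index']
print(simple("ZIP-12345 code"))  # ['zip', 'code']
print(keyword("ZIP-12345"))      # ['ZIP-12345']
```

For an n-letter word, Starts with produces n tokens while Subset produces n(n+1)/2, so Subset indexes can be expected to grow considerably larger.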


Related Scheduled Tasks & Keys

Scheduled tasks:
• Optimize search indexes (Search.IndexOptimizer) [Enabled] - performs index optimization (defragmentation resulting in better performance, particularly for large indexes). By default, executed once per week.

• Execute search tasks (Search.TaskExecutor) [Enabled] - executes indexing tasks (created and executed automatically when the indexed content changes) that were not completed successfully on their automatic execution. By default, performed every 4 hours.
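Lucene stores an index as several segments, and deleted documents are only flagged until segments are merged; optimization merges them into one, which is the "defragmentation" mentioned above. The toy model below (segments as dicts, a set of flagged IDs) illustrates that idea only; it is not Lucene.Net's merge code, and `optimize` is a hypothetical name.

```python
def optimize(segments, deleted_ids):
    """Merge every segment into one, dropping documents flagged as deleted.

    segments: list of dicts (document ID -> stored text), oldest first.
    deleted_ids: IDs flagged as deleted but still physically present.
    """
    merged = {}
    for segment in segments:
        for doc_id, text in segment.items():
            if doc_id not in deleted_ids:
                merged[doc_id] = text
    return [merged]


fragmented = [{1: "smart search"}, {2: "old draft"}, {3: "search tasks"}]
optimized = optimize(fragmented, deleted_ids={2})
print(len(fragmented), "->", len(optimized))  # 3 -> 1
print(optimized[0])  # {1: 'smart search', 3: 'search tasks'}
```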

Web.config keys:
• CMSRemoveDiacriticsForIndexField (true) - indicates whether diacritics should be removed for the index field.
• CMSSearchStoreContentField (false) - indicates whether the content field should be stored in the index.
• CMSSmartSearchIndexCategories (false) - indicates whether document categories should be indexed.
• CMSSearchContentXpathValue ("//property[@name='text' or @name='contentbefore' or @name='contentafter']") - XPath expression selecting the web part fields that should be added to the search document content.
• CMSProcessSearchTasksByScheduler (false) - if true, smart search tasks are processed by the scheduler.
• CMSCreateTemplateSearchTasks (true) - any change made to a page template automatically triggers an update of all documents based on that template in the appropriate smart search indexes.
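Removing diacritics (CMSRemoveDiacriticsForIndexField) lets a query typed without accents match accented text. A common way to do this is Unicode canonical decomposition (NFD) followed by dropping the combining marks; the sketch below shows that technique with Python's standard `unicodedata` module and is not Kentico's implementation.

```python
import unicodedata


def remove_diacritics(text):
    # decompose characters (é -> e + combining acute accent), then drop the combining marks;
    # characters without a canonical decomposition (e.g. ø) are left unchanged
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_diacritics("Café Škoda"))  # Cafe Skoda
```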


Example - Searching In Content Of Media Files

Requirements:
• Be able to search in the content of media files.
• Whenever a media file is inserted, updated, or deleted, the smart search index should be updated as well.

Process:
• Create a custom index - the Rebuild operation needs to be able to read the media library file definitions [media_file table] plus their physical content on the file system, and index them afterwards.
• Create a custom analyzer (optional).
• Use global events to react to insert/delete/update events of media files in order to create indexing tasks (CMS_SearchTask).
• Create a scheduled task for processing these indexing tasks.


Example - Creating Custom Index

Steps:

A.) Create a class file that implements the CMS.SiteProvider.ICustomSearchIndex interface.
B.) Implement the Rebuild method.
C.) Register the custom index in the CMS.
D.) Rebuild the index and test.

Note: Step-by-step guide available here: http://devnet.kentico.com/docs/devguide/smart_search_defining_custom_index_content.htm
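The Rebuild step of the media-file index boils down to reading each media_file record and the file it points to. The Python sketch below shows only that logic; in Kentico it would live in a C# class implementing ICustomSearchIndex. `MediaFileRecord`, `rebuild`, and the dict used as an index are hypothetical, and only plain-text files are read (binary formats such as PDF would need a text extractor).

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class MediaFileRecord:
    """Hypothetical subset of a media_file row."""
    file_id: int
    title: str
    path: str  # path relative to the media library root


def rebuild(records, library_root, index):
    """Hypothetical Rebuild: clear the index, then index metadata plus file content."""
    index.clear()
    for rec in records:
        full_path = os.path.join(library_root, rec.path)
        try:
            with open(full_path, encoding="utf-8", errors="ignore") as f:
                content = f.read()
        except OSError:
            content = ""  # missing or unreadable file: index the metadata only
        index[rec.file_id] = f"{rec.title} {content}".strip()
    return index


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("quarterly report")
    records = [MediaFileRecord(1, "Notes", "notes.txt"),
               MediaFileRecord(2, "Missing", "gone.txt")]
    idx = rebuild(records, root, {})
print(idx)  # {1: 'Notes quarterly report', 2: 'Missing'}
```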


Example - Creating Custom Analyzer

Steps:

A.) Create a class file that inherits from the Lucene.Net.Analysis.Analyzer class.
B.) Implement the TokenStream method.
C.) Create a class file that inherits from the Lucene.Net.Analysis.Tokenizer class.
D.) Implement the Next method.
E.) Register the custom analyzer for the index in the CMS.
F.) Rebuild the index and test.

Note: Step-by-step guide available here: http://devnet.kentico.com/docs/devguide/smart_search_using_a_custom_analyzer.htm
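A custom analyzer is essentially a factory for a tokenizer whose Next method returns one token per call. The Python analog below mirrors that contract, including Lucene 2.x's convention of returning null (here `None`) at the end of the stream; the comma-splitting rule for a `tags` field and all class names are hypothetical, chosen only for illustration.

```python
import io


class Token:
    def __init__(self, text, start, end):
        self.text, self.start, self.end = text, start, end


class CommaTokenizer:
    """Hypothetical tokenizer: splits on commas, trims, lower-cases; next() returns None at the end."""

    def __init__(self, reader):
        self.text = reader.read()
        self.pos = 0

    def next(self):
        while self.pos < len(self.text):
            end = self.text.find(",", self.pos)
            if end == -1:
                end = len(self.text)
            raw = self.text[self.pos:end]
            start = self.pos + (len(raw) - len(raw.lstrip()))  # offset of first non-space char
            self.pos = end + 1
            word = raw.strip()
            if word:  # skip empty entries such as ",,"
                return Token(word.lower(), start, start + len(word))
        return None


class CommaAnalyzer:
    """Hypothetical analyzer: token_stream() plays the role of Analyzer.TokenStream()."""

    def token_stream(self, field_name, reader):
        return CommaTokenizer(reader)


def tokens(analyzer, field_name, text):
    stream = analyzer.token_stream(field_name, io.StringIO(text))
    out = []
    token = stream.next()
    while token is not None:
        out.append(token.text)
        token = stream.next()
    return out


print(tokens(CommaAnalyzer(), "tags", "Red, Dark Blue,,green"))
# ['red', 'dark blue', 'green']
```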


Example - Creating Smart Search Task

Steps:

A.) Register for the Insert, Update, and Delete events of the MediaFileInfo object with global events.
B.) Create a SearchTaskInfo object (record in CMS_SearchTask):
• Delete task - SearchTaskInfoProvider.CreateTask()
• Update/Insert task - SearchTaskInfo
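Step A amounts to subscribing one handler per event and logging a task from each. In the hypothetical sketch below, `GlobalEvents` stands in for Kentico's global events and a list stands in for CMS_SearchTask; inserts are logged as Update tasks because SearchTaskTypeEnum has no Insert value. The `media.file` object type and `FileID` field are placeholders.

```python
from collections import defaultdict


class GlobalEvents:
    """Hypothetical event hub standing in for the CMS global events."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def register(self, event, handler):
        self.handlers[event].append(handler)

    def raise_event(self, event, obj):
        for handler in self.handlers[event]:
            handler(obj)


search_tasks = []  # stands in for the CMS_SearchTask table


def log_task(task_type):
    """Build a handler that logs one search task per media file event."""
    def handler(media_file):
        search_tasks.append({"type": task_type, "object_type": "media.file",
                             "field": "FileID", "value": str(media_file["FileID"])})
    return handler


events = GlobalEvents()
events.register("insert", log_task("Update"))  # an insert is indexed as an update
events.register("update", log_task("Update"))
events.register("delete", log_task("Delete"))

events.raise_event("insert", {"FileID": 7})
events.raise_event("delete", {"FileID": 7})
print([(t["type"], t["value"]) for t in search_tasks])
# [('Update', '7'), ('Delete', '7')]
```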


Example - Creating Smart Search Task Processor

Steps:

A.) Create a scheduled task class file that implements the ITask interface.
B.) Implement the Execute method.
C.) Register the scheduled task in the CMS.

Note: Step-by-step guide available here: http://devnet.kentico.com/docs/devguide/scheduling_a_custom_code.htm
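The scheduled task's Execute method drains the logged tasks and applies them to the index. This Python class is only an analog of a class implementing ITask; the task dictionaries, the dict-based index, and the summary string it returns are assumptions of the sketch. Failed tasks are kept so that the next run can retry them, similar in spirit to the built-in Execute search tasks scheduled task.

```python
class MediaSearchTaskProcessor:
    """Hypothetical analog of an ITask implementation for media-file search tasks."""

    def __init__(self, task_queue, index):
        self.task_queue = task_queue  # list of task dicts (stands in for CMS_SearchTask)
        self.index = index            # dict: file ID -> indexed text

    def execute(self, task_info=None):
        """Process every pending task; failed tasks stay queued for the next run."""
        processed, remaining = 0, []
        for task in self.task_queue:
            try:
                if task["type"] == "Delete":
                    self.index.pop(task["value"], None)
                else:  # Update (covers inserts as well)
                    self.index[task["value"]] = task["text"]
                processed += 1
            except KeyError:  # malformed task, e.g. an Update without text
                remaining.append(task)
        self.task_queue[:] = remaining
        return f"processed {processed}, failed {len(remaining)}"


queue = [
    {"type": "Update", "value": "7", "text": "quarterly report"},
    {"type": "Update", "value": "8"},  # missing text: fails and stays queued
    {"type": "Delete", "value": "7"},
]
index = {}
result = MediaSearchTaskProcessor(queue, index).execute()
print(result)              # processed 2, failed 1
print(index, len(queue))   # {} 1
```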


Tips

• Documents with the "Exclude this document from search" property enabled are not indexed. This property can be configured by selecting a document in the content tree in CMS Desk and going to Content -> Edit -> Properties -> General.

• If Smart Search is not working, check the event log for exceptions or errors (CMS Site Manager -> Administration -> Event log).

• Remember: a Smart Search condition is not a SQL WHERE condition!
• The Smart Search result DataSet contains these columns: id, type, score, position, title, content, created, image (_customurl in the case of a custom index).
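Assuming the condition is written in Lucene's query syntax (the library Smart Search builds on), unprefixed clauses are optional and only `+` (required) and `-` (prohibited) clauses are strict. The simplified evaluator below assumes exact field:value matches (real Lucene matches analyzed terms) and uses hypothetical field names; it shows how two clauses that would be ANDed in SQL behave differently here.

```python
def matches(doc, condition):
    """Evaluate a simplified Lucene-style condition against a dict of field values.

    +field:value  must match
    -field:value  must not match
    field:value   optional; with no + clause, at least one optional clause must match
    """
    must, should, must_not = [], [], []
    for clause in condition.split():
        if clause.startswith("+"):
            must.append(clause[1:])
        elif clause.startswith("-"):
            must_not.append(clause[1:])
        else:
            should.append(clause)

    def hit(clause):
        field, _, value = clause.partition(":")
        return str(doc.get(field, "")).lower() == value.lower()

    if any(hit(c) for c in must_not) or not all(hit(c) for c in must):
        return False
    return bool(must) or any(hit(c) for c in should)


doc = {"classname": "cms.news", "culture": "en-us"}
print(matches(doc, "classname:cms.article culture:en-us"))   # True (a SQL AND would be false)
print(matches(doc, "+classname:cms.article +culture:en-us"))  # False
print(matches(doc, "+classname:cms.news -culture:de-de"))     # True
print(matches(doc, "-culture:de-de"))                         # False: purely negative
```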


Questions

?


Sources

• Developer’s Guide: http://devnet.kentico.com/docs/devguide/smart_search_overview.htm


Contact

Miro Remias
• consulting: http://www.kentico.com/Support/Consulting/Overview