Apache Solr Advanced use cases Artem Sylchuk

DrupalTour. Lviv - Apache Solr. Advanced use cases (Artem Sylchuk, InternetDevels)

Apache Solr. Advanced use cases

Artem Sylchuk

Putting a Search Engine on Your Website

● Installing Your Own Search Engine Script
● Using a Free or Commercial Third Party Hosted Search Engine Service
● Using the Major Search Engines

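// Naive search: a substring match with LIKE. $letter must be escaped before it is concatenated into the query.
$sql = "SELECT `ID`, `FirstName`, `LastName` FROM `Contacts` WHERE `FirstName` LIKE '%" . $letter . "%' OR `LastName` LIKE '%" . $letter . "%'";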
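The same kind of substring search can be tried out end to end. This is only a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory table; the sample rows and the `search_contacts` helper are made up for illustration, and the search term is bound as a parameter instead of being concatenated into the SQL.

```python
import sqlite3

def search_contacts(conn, letter):
    """Substring search over first and last names (hypothetical helper)."""
    pattern = f"%{letter}%"
    # Bind the pattern as a parameter instead of concatenating it into the SQL.
    rows = conn.execute(
        "SELECT ID, FirstName, LastName FROM Contacts "
        "WHERE FirstName LIKE ? OR LastName LIKE ? ORDER BY ID",
        (pattern, pattern),
    )
    return rows.fetchall()

# In-memory database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Contacts (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
)
conn.executemany(
    "INSERT INTO Contacts (FirstName, LastName) VALUES (?, ?)",
    [("Anna", "Example"), ("Olga", "Sample"), ("Ivan", "Demo")],
)

results = search_contacts(conn, "am")
print(results)
```

A pattern with a leading wildcard generally cannot use an ordinary index, and the matches come back without any relevance ranking.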

Search in Drupal core

1. Builds index from strings
2. Parses tags
3. Does trims and cleanups
4. Calculates word scores
5. Handles links between nodes and users
6. Easily extendable
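The first four steps can be pictured with a toy indexer. This is not Drupal's actual implementation, only a sketch of the idea: the `index_text` function and the tag weights below are invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical weights for this sketch: words inside these tags count for more.
TAG_WEIGHTS = {"h1": 5, "h2": 3, "strong": 2}

def index_text(html):
    """Return {word: score} for a small HTML fragment."""
    scores = defaultdict(int)
    weight_stack = [1]  # weight applied to plain text
    # Split into tags and text runs, keeping the tags in the result.
    for part in re.split(r"(<[^>]+>)", html):
        if not part:
            continue
        tag = re.fullmatch(r"<\s*(/?)\s*([a-zA-Z0-9]+)[^>]*>", part)
        if tag:
            closing, name = tag.group(1), tag.group(2).lower()
            if name in TAG_WEIGHTS:
                if closing and len(weight_stack) > 1:
                    weight_stack.pop()
                elif not closing:
                    weight_stack.append(TAG_WEIGHTS[name])
            continue
        # Clean up: lowercase and keep only word characters.
        for word in re.findall(r"\w+", part.lower()):
            scores[word] += weight_stack[-1]
    return dict(scores)

scores = index_text(
    "<h1>Solr search</h1><p>Search with <strong>Solr</strong> is fast.</p>"
)
print(scores)
```

Words that appear in a heading end up with a higher score than the same words in body text, which is the point of parsing tags before scoring.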

https://www.acquia.com/blog/drupal-search-how-indexing-works

Intro to Apache Solr

1. Advanced Full-Text Search Capabilities
2. Optimized for High Volume Traffic
3. Standards Based Open Interfaces - XML, JSON and HTTP
4. Comprehensive Administration Interfaces
5. Easy Monitoring
6. Highly Scalable and Fault Tolerant
7. Flexible and Adaptable with easy configuration
8. Near Real-Time Indexing
9. Extensible Plugin Architecture
10. Schema when you want, schemaless when you don't
11. Powerful Extensions
12. Faceted Search and Filtering
13. Geospatial Search
14. Advanced Configurable Text Analysis
15. Highly Configurable and User Extensible Caching
16. Performance Optimizations
17. External Configuration via XML
18. Advanced Storage Options
19. Monitorable Logging
20. Query Suggestions, Spelling and More
21. Your Data, Your Way!
22. Rich Document Parsing
23. Apache UIMA
24. Multiple search indices

Lucene, Elastic

Apache Lucene™ is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. Apache Lucene is an open source project available for free download.

Elastic:
- Schemaless
- True REST via JSON
- Live updates
- Fast Indexing

Features

Search

● Fulltext search
● Fuzzy search
● Stemmer
● Transliteration
● Tokenizer
● Stopwords
● Highlighting
● Spellcheck
● Suggestions
● Excerpt
● Facets
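A few of these features (tokenizer, stopwords, stemmer, fuzzy search) can be illustrated with a toy text-analysis chain. This is a sketch only: the stopword list, the suffix-stripping `stem` function and the helper names are invented for the example and are much cruder than what a real search engine ships with.

```python
import re

STOPWORDS = {"a", "an", "the", "of", "in", "and", "for"}  # tiny illustrative list

def stem(token):
    """Very naive suffix stripping; real stemmers are far more careful."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyze(text):
    """Tokenize, lowercase, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]

def edit_distance(a, b):
    """Levenshtein distance, one common basis for fuzzy matching."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

terms = analyze("Searching the houses for sale in Lviv")
print(terms)
print(edit_distance("hause", "house"))
```

The same chain has to run on both the indexed text and the query, otherwise "houses" in a document and "house" in a query would never meet.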

Geospatial

● Geoclustering
● Search within a distance (proximity search)
● Search by polygon
● Support for spatial fields
● Bounding-box filter
● Facet by distance
● Boosting closest results
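The idea behind "search within a distance" can be sketched in plain Python with the haversine formula. This only illustrates the principle, not how Solr implements it; the document structure, the sample coordinates and the `within_distance` helper are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_distance(docs, lat, lon, max_km):
    """Keep docs within max_km of the point, closest first."""
    hits = [(haversine_km(lat, lon, d["lat"], d["lon"]), d["id"]) for d in docs]
    return [doc_id for dist, doc_id in sorted(hits) if dist <= max_km]

# Made-up sample documents with coordinates.
docs = [
    {"id": "center", "lat": 49.84, "lon": 24.03},
    {"id": "nearby", "lat": 49.85, "lon": 24.05},
    {"id": "far", "lat": 50.45, "lon": 30.52},
]
print(within_distance(docs, 49.84, 24.03, 5))
```

Sorting by the computed distance is also the simplest form of "boosting closest results".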

Analytics

Faceting

● Field Faceting
● Support for int, long, float, double, date, string fields
● Support for multi-value fields
● Support for limit, offset and mincount
● Support for sorting of stats-facets by any statistic (i.e. sort by mean)
● Support for range faceting of stats (numeric types and dates)
● Support for query faceting of stats
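What field faceting with limit, offset and mincount computes can be shown with a small counter. This is a sketch of the concept, not Solr code; the `field_facet` function and the sample documents are hypothetical.

```python
from collections import Counter

def field_facet(docs, field, limit=10, offset=0, mincount=1):
    """Count values of `field` across docs, multi-valued fields included."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    # Sort by count descending, then by value for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    ranked = [kv for kv in ranked if kv[1] >= mincount]
    return ranked[offset : offset + limit]

# Made-up documents: "beds" is single-valued, "terms" is multi-valued.
docs = [
    {"beds": 2, "terms": ["cash", "loan"]},
    {"beds": 3, "terms": ["cash"]},
    {"beds": 2, "terms": []},
    {"beds": 2},
]
print(field_facet(docs, "beds"))
print(field_facet(docs, "terms", mincount=2))
```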

Statistical Expressions

● All stats provided by StatsComponent (min, max, count, stddev, sum, sumofsquares, mean)
● Unique count
● Median value
● Percentiles (e.g. 90th percentile)
● Statistics on a combination of fields (e.g. mean of (field A * field B))
● Expressions of statistics (e.g. (mean of field A) * (mean of field B))
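The difference between the last two items is easy to miss, so here is a small sketch with Python's standard statistics module. The `price` and `area` fields and the `basic_stats` helper are made up for the example; stddev and percentiles are left out because their exact definitions vary between tools.

```python
import statistics

def basic_stats(values):
    """A subset of the stats listed above, computed over one field."""
    return {
        "min": min(values),
        "max": max(values),
        "count": len(values),
        "sum": sum(values),
        "sumofsquares": sum(v * v for v in values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

# Hypothetical field values for four documents.
price = [100, 200, 300, 400]
area = [1, 2, 3, 4]

stats = basic_stats(price)
# Statistic on a combination of fields: mean of (price * area).
mean_of_product = statistics.mean([p * a for p, a in zip(price, area)])
# Expression of statistics: (mean of price) * (mean of area).
product_of_means = statistics.mean(price) * statistics.mean(area)
print(stats, mean_of_product, product_of_means)
```

The two results differ (750 versus 625 here), which is why the two capabilities are listed separately.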

Setting up the Search

1. Download
2. Untar
3. Run ./bin/solr start
4. Enjoy Searching
5. …
6. Go back and fix XML configs
7. Define the field schema
8. Have a lot of pain and go to 6

Query example

GET /solr/core0/select?fl=%2A%2Cscore&start=0&rows=25&sort=ds_field_entry_date%20desc&facet=true&facet.sort=count&facet.limit=10&facet.mincount=1&facet.missing=false&facet.field=%7B%21ex%3Dfacet%3Afield_beds%7Dis_field_beds&facet.field=im_field_related_transit_route&facet.field=%7B%21ex%3Dfacet%3Afield_fulls_baths%7Dis_field_fulls_baths&facet.field=%7B%21ex%3Dfacet%3Afield_terms_considered%7Dsm_field_terms_considered&facet.field=ss_field_short_sale&facet.field=ss_field_reo&facet.field=ss_field_property_type&facet.field=%7B%21ex%3Dfacet%3Afield_status%7Dss_field_status&facet.field=%7B%21ex%3Dfacet%3Afield_pets_allowed%7Dss_field_pets_allowed&facet.field=%7B%21ex%3Dfacet%3Afield_related_nhood%7Dim_field_related_nhood&facet.field=im_field_related_transportation_sys&facet.field=%7B%21ex%3Dfacet%3Afield_waterfront_description%7Dsm_field_waterfront_description&facet.field=%7B%21ex%3Dfacet%3Afield_type_of_property%7Dsm_field_type_of_property&f.is_field_beds.facet.limit=-1&f.is_field_beds.facet.missing=true&f.im_field_related_transit_route.facet.limit=-1&f.is_field_fulls_baths.facet.limit=-1&f.sm_field_terms_considered.facet.limit=50&f.ss_field_short_sale.facet.limit=-1&f.ss_field_reo.facet.limit=-1&f.ss_field_property_type.facet.limit=-1&f.ss_field_status.facet.limit=50&f.ss_field_pets_allowed.facet.limit=-1&f.im_field_related_nhood.facet.limit=-1&f.im_field_related_transportation_sys.facet.limit=-1&f.sm_field_waterfront_description.facet.limit=-1&f.sm_field_type_of_property.facet.limit=-1&wt=json&json.nl=map&q.alt=%28is_active%3A%221%22%29%20%28index_id%3A%22mls%22%29%20%28hash%3A6s9h6d%29 HTTP/1.0
Authorization: Basic YWRtaW46a2V4eXNyMjU=
User-Agent: Drupal (+http://drupal.org/)
Host: localhost:8983
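The request is easier to read once the query string is URL-decoded. The sketch below uses Python's standard urllib.parse on a shortened excerpt of the parameters from the request above (the excerpt is a selection made for readability, not the full query).

```python
from urllib.parse import parse_qs

# A shortened excerpt of the query string from the request above.
query = (
    "fl=%2A%2Cscore&start=0&rows=25&sort=ds_field_entry_date%20desc"
    "&facet=true&facet.mincount=1"
    "&facet.field=%7B%21ex%3Dfacet%3Afield_beds%7Dis_field_beds"
    "&facet.field=ss_field_property_type"
    "&f.is_field_beds.facet.limit=-1&wt=json"
)

# parse_qs decodes percent-escapes and groups repeated keys into lists.
params = parse_qs(query)
for name, values in params.items():
    print(name, "=", values)
```

Decoded, `fl` is `*,score`, the sort is `ds_field_entry_date desc`, and the first facet field reads `{!ex=facet:field_beds}is_field_beds`. The `{!ex=...}` prefix is a Solr local parameter that excludes a tagged filter when counting that facet, and the `f.<field>.facet.limit` parameters override the global `facet.limit` for a single field.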

Performance & Scalability

1. It is fast.
2. It is scalable.

Drupal Integration. Search API Solr

Questions?