Finding the right stuff
Michael Reinsch
an intro to Elasticsearch with Ruby/Rails
at Ruby User Group Berlin, Feb 2016
How does it fit into my app?
Blackbox with REST API
elasticsearch
Update API: your app pushes updates (updates are fast, but asynchronous)
Search API: returns search results
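Since Elasticsearch sits behind a REST API, both arrows in the picture are plain HTTP calls. A minimal sketch, assuming a node at localhost:9200 and an index "events" with type "event" (both are assumptions for illustration, as are the helper names); the requests are only built here, not sent:

```ruby
require 'net/http'
require 'json'
require 'uri'

# Update API: the app pushes a document (PUT /<index>/<type>/<id>).
# The base URL, index and type are assumed values, not taken from the talk.
def index_request(id, document, base: 'http://localhost:9200')
  req = Net::HTTP::Put.new(URI("#{base}/events/event/#{id}"))
  req['Content-Type'] = 'application/json'
  req.body = JSON.generate(document)
  req
end

# Search API: the app asks for matching documents (POST /<index>/_search).
def search_request(text, base: 'http://localhost:9200')
  req = Net::HTTP::Post.new(URI("#{base}/events/_search"))
  req['Content-Type'] = 'application/json'
  req.body = JSON.generate({ query: { simple_query_string: { query: text } } })
  req
end

update = index_request(1, { title: 'Drop in Ruby' })
search = search_request('tokyo rubyist')
puts "#{update.method} #{update.path}"
puts "#{search.method} #{search.path}"
```

In a Rails app the gems listed next wrap these calls, so the requests are rarely built by hand.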
For Ruby / Rails
• https://github.com/elastic/elasticsearch-rails
• gems for Rails:
  • elasticsearch-model & elasticsearch-rails
• without Rails / AR:
  • elasticsearch-persistence
class Event < ActiveRecord::Base
  include Elasticsearch::Model

  # the document that is sent to Elasticsearch for each event
  def as_indexed_json(options={})
    {
      title: title,
      description: description,
      starts_at: starts_at.iso8601,
      featured: group.featured?
    }
  end

  # explicit mapping; fields not listed here are not indexed
  settings do
    mapping dynamic: 'false' do
      indexes :title, type: 'string'
      indexes :description, type: 'string'
      indexes :starts_at, type: 'date'
      indexes :featured, type: 'boolean'
    end
  end
end
Event.import
Elasticsearch cluster
  Index: events
    Type: event (doc 1)
  Index: creations
    Type: creation (doc 1)
    Type: activity (doc 1, doc 2)
Documents, not relationships
compose documents with all relevant data
➜ "denormalize" your data
class Event < ActiveRecord::Base
  include Elasticsearch::Model

  # related records are embedded in the event document
  def as_indexed_json(options={})
    {
      titles: [ title1, title2 ],
      locations: locs.map(&:as_indexed_json)
    }
  end

  settings do
    mapping dynamic: 'false' do
      indexes :titles, type: 'string'
      indexes :locations, type: 'nested' do
        indexes :name, type: 'string'
        indexes :address, type: 'string'
        indexes :location, type: 'geo_point'
      end
    end
  end
end
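To make the denormalized shape concrete, here is a plain-Ruby sketch of the document such a model would produce. The Struct classes stand in for the ActiveRecord models, and the names, sample values and lat/lon layout are assumptions for illustration:

```ruby
require 'json'

# Stand-in for a Location model (hypothetical, not the talk's code).
SketchLocation = Struct.new(:name, :address, :lat, :lon) do
  def as_indexed_json
    # a geo_point can be given as an object with lat and lon
    { name: name, address: address, location: { lat: lat, lon: lon } }
  end
end

# Stand-in for the Event model: the locations are copied into the document.
SketchEvent = Struct.new(:title1, :title2, :locs) do
  def as_indexed_json
    { titles: [title1, title2], locations: locs.map(&:as_indexed_json) }
  end
end

venue = SketchLocation.new('Example Space', '1-2-3 Example St, Tokyo', 35.68, 139.76)
event = SketchEvent.new('Drop in Ruby', 'Ruby drop-in', [venue])
doc = event.as_indexed_json
puts JSON.pretty_generate(doc)
```

The consequence of this design is that a change to a location has to trigger a reindex of every event that embeds it.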
Event.search 'tokyo rubyist'
response = Event.search 'tokyo rubyist'
response.took # => 28
response.results.total # => 2075
response.results.first._score # => 0.921177
response.results.first._source.title # => "Drop in Ruby"
response.page(2).results # => second page of results
# pagination supports kaminari / will_paginate
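Paging comes down to the Search API's from and size parameters. A small sketch of that arithmetic; the helper name and the page size of 25 are assumptions for illustration, not necessarily what the gems use:

```ruby
# Hypothetical helper: translates a 1-based page number into from/size.
def page_window(page, per_page: 25)
  raise ArgumentError, 'page must be >= 1' if page < 1

  { from: (page - 1) * per_page, size: per_page }
end

window = page_window(2)
puts "from=#{window[:from]} size=#{window[:size]}"
```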
response = Event.search 'tokyo rubyist'
response.records.to_a # => [#<Event id: 12409, ...>, ...]
response.page(2).records # => second page of result records
response.records.each_with_hit do |rec,hit|
  puts "* #{rec.title}: #{hit._score}"
end
# * Drop in Ruby: 0.9205564
# * Javascript meets Ruby in Kamakura: 0.8947
# * Meetup at EC Navi: 0.8766844
# * Pair Programming Session #3: 0.8603562
# * Kickoff Party: 0.8265461
Event.search 'tokyo rubyist'
only upcoming events?
sorted by start date?
Event.search query: {
  filtered: {
    # our query
    query: {
      simple_query_string: {
        query: 'tokyo rubyist',
        default_operator: 'and'
      }
    },
    # filtered by conditions
    filter: {
      and: [
        { range: { starts_at: { gte: 'now' } } },
        { term: { featured: true } }
      ]
    }
  }
},
# sorted by start time
sort: { starts_at: { order: 'asc' } }
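The filtered query and the and filter match the Elasticsearch versions current at the time of the talk. The filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0, where a bool query with a filter clause takes its place. A sketch of the same search in that form, written as a plain hash builder (the method name is made up for this example):

```ruby
require 'json'

# Hypothetical helper: same search as above, expressed as bool + filter.
def upcoming_featured_search(text)
  {
    query: {
      bool: {
        # scored part: the user's search terms
        must: { simple_query_string: { query: text, default_operator: 'and' } },
        # unscored part: all conditions must hold
        filter: [
          { range: { starts_at: { gte: 'now' } } },
          { term: { featured: true } }
        ]
      }
    },
    sort: { starts_at: { order: 'asc' } }
  }
end

body = upcoming_featured_search('tokyo rubyist')
puts JSON.pretty_generate(body)
```

With elasticsearch-model the resulting hash would be passed to Event.search in the same way as the hash above.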
Query DSL
query: { <query_type>: <arguments> } ➜ scores matching documents
filter: { <filter_type>: <arguments> } ➜ filters documents
valid arguments depend on query / filter type
Queries: Match Query, Multi Match Query, Bool Query, Boosting Query, Common Terms Query, Constant Score Query, Dis Max Query, Filtered Query, Fuzzy Like This Query, Fuzzy Like This Field Query, Function Score Query, Fuzzy Query, GeoShape Query, Has Child Query, Has Parent Query, Ids Query, Indices Query, Match All Query, More Like This Query, Nested Query, Prefix Query, Query String Query, Simple Query String Query, Range Query, Regexp Query, Span First Query, Span Multi Term Query, Span Near Query, Span Not Query, Span Or Query, Span Term Query, Term Query, Terms Query, Top Children Query, Wildcard Query, Minimum Should Match, Multi Term Query Rewrite, Template Query.
Filters: And Filter, Bool Filter, Exists Filter, Geo Bounding Box Filter, Geo Distance Filter, Geo Distance Range Filter, Geo Polygon Filter, GeoShape Filter, Geohash Cell Filter, Has Child Filter, Has Parent Filter, Ids Filter, Indices Filter, Limit Filter, Match All Filter, Missing Filter, Nested Filter, Not Filter, Or Filter, Prefix Filter, Query Filter, Range Filter, Regexp Filter, Script Filter, Term Filter, Terms Filter, Type Filter.
Event.search query: {
  bool: {
    should: [
      { simple_query_string: { query: "tokyo rubyist", default_operator: "and" } },
      {
        function_score: {
          filter: {
            and: [
              { range: { starts_at: { lte: 'now' } } },
              { term: { featured: true } }
            ]
          },
          gauss: {
            starts_at: { origin: 'now', scale: '10d', decay: 0.5 }
          },
          boost_mode: "sum"
        }
      }
    ],
    minimum_should_match: 2
  }
}
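The gauss entry makes the score fall off with distance from origin. A worked sketch of the decay formula as given in the Elasticsearch reference for function_score, using the scale of 10 days and decay of 0.5 from the query above; the function name and the use of plain day counts instead of dates are simplifications made here:

```ruby
# Gaussian decay:
#   sigma^2 = -scale^2 / (2 * ln(decay))
#   score   = exp(-max(0, |distance| - offset)^2 / (2 * sigma^2))
# so a document exactly `scale` away from the origin scores `decay`.
def gauss_decay(distance, scale:, decay: 0.5, offset: 0.0)
  sigma_sq = -(scale**2) / (2.0 * Math.log(decay))
  d = [0.0, distance.abs - offset].max
  Math.exp(-(d**2) / (2.0 * sigma_sq))
end

[0, 5, 10, 20].each do |days|
  puts format('%2d days from origin: %.4f', days, gauss_decay(days, scale: 10))
end
```

An event starting right now gets the full value of 1, one that is 10 days away gets 0.5, and the value shrinks quickly beyond that.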
Create service objects
class EventSearch
  def initialize
    @filters = []
  end

  # each filter method returns self, so calls can be chained
  def starting_after(time)
    tap { @filters << { range: { starts_at: { gte: time } } } }
  end

  def featured
    tap { @filters << { term: { featured: true } } }
  end

  def in_group(group_id)
    tap { @filters << { term: { group_id: group_id } } }
  end
end
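The slide shows only the filter-collecting part of the service object. One possible way to finish it, as a sketch: the class name, the text argument to the constructor and the to_h method are assumptions added here, and the request body follows the filtered-query style used earlier in the talk.

```ruby
require 'json'

class EventSearchSketch
  # Assumption: the search text is passed in when the object is created.
  def initialize(text)
    @text = text
    @filters = []
  end

  def starting_after(time)
    tap { @filters << { range: { starts_at: { gte: time } } } }
  end

  def featured
    tap { @filters << { term: { featured: true } } }
  end

  def in_group(group_id)
    tap { @filters << { term: { group_id: group_id } } }
  end

  # Hypothetical: builds the request body from the collected filters.
  def to_h
    text_query = { simple_query_string: { query: @text, default_operator: 'and' } }
    # without filters, send the text query on its own
    return { query: text_query } if @filters.empty?

    { query: { filtered: { query: text_query, filter: { and: @filters } } } }
  end
end

body = EventSearchSketch.new('tokyo rubyist').starting_after('now').featured.to_h
puts JSON.pretty_generate(body)
```

A controller could then hand the result to the model, for example Event.search(body), and stay free of query DSL details.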
Event.search '東京rubyist'
Dealing with different languages
built in analysers for arabic, armenian, basque, brazilian, bulgarian, catalan, cjk, czech, danish, dutch, english, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, latvian, lithuanian, norwegian, persian, portuguese, romanian, russian, sorani, spanish, swedish, turkish, thai.
class Event < ActiveRecord::Base
  include Elasticsearch::Model

  def as_indexed_json(options={})
    {
      title: { en: title_en, de: title_de, ja: title_ja },
      description: { en: desc_en, de: desc_de, ja: desc_ja },
      starts_at: starts_at.iso8601,
      featured: group.featured?
    }
  end

  # one sub-field per language, each with its own analyzer
  settings do
    mapping dynamic: 'false' do
      indexes :title do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :de, type: 'string', analyzer: 'german'
        indexes :ja, type: 'string', analyzer: 'cjk'
      end
      indexes :description do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :de, type: 'string', analyzer: 'german'
        indexes :ja, type: 'string', analyzer: 'cjk'
      end
      indexes :starts_at, type: 'date'
      indexes :featured, type: 'boolean'
    end
  end
end
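With one sub-field per language, a search has to name the fields it should look at. A sketch that builds a multi_match query over all language variants; the helper name and its defaults are assumptions, and the talk does not show which query type was actually used:

```ruby
require 'json'

# Hypothetical helper: searches every language variant of title and description.
def multilingual_query(text, languages: %w[en de ja])
  fields = %w[title description].flat_map do |field|
    languages.map { |lang| "#{field}.#{lang}" }
  end
  { query: { multi_match: { query: text, fields: fields } } }
end

body = multilingual_query('東京rubyist')
puts JSON.generate(body)
```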
Changes to mappings?
⚠ can't change field types / analysers ⚠
but: we can add new field mappings
class AddCreatedAtToES < ActiveRecord::Migration
  def up
    client = Elasticsearch::Client.new
    client.indices.put_mapping(
      index: Event.index_name,
      type: Event.document_type,
      body: { properties: { created_at: { type: 'date' } } }
    )
    Event.__elasticsearch__.import
  end

  def down
  end
end
Automated tests
Index names with environment
class Event < ActiveRecord::Base
  include Elasticsearch::Model

  index_name "drkpr_#{Rails.env}_events"
end
Test helpers
• everything is asynchronous!
• Helpers:
  • wait_for_elasticsearch
  • wait_for_elasticsearch_removal
  • clear_elasticsearch!
  ➜ https://gist.github.com/mreinsch/094dc9cf63362314cef4
• specs: Tag tests which require elasticsearch
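Because indexing is asynchronous, a spec has to wait until a change is visible before it asserts on search results. The helpers from the gist are not reproduced here; the following is only a generic polling sketch, and the name wait_until and its parameters are made up for this example:

```ruby
# Hypothetical helper: calls the block until it returns a truthy value,
# and raises if that does not happen within the timeout.
def wait_until(timeout: 5.0, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout}s"
    end
    sleep interval
  end
end

# Stand-in for "the document became searchable": true from the third check on.
checks = 0
found = wait_until(timeout: 1.0) { (checks += 1) >= 3 }
puts "visible after #{checks} checks"
```

In a real spec the block would run a search instead, for example checking that Event.search('ruby').results.total is greater than zero.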
Production ready?
• use elastic.co/found or AWS ES
• use two clustered instances for redundancy
• Elasticsearch could go away
• keep impact at a minimum!
• update Elasticsearch from background worker
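To keep the impact of an outage small, index updates can run in a job that treats Elasticsearch errors as non-fatal. A sketch with an injected client so it runs without a cluster: the class names and the fake clients are assumptions for illustration, this is not the indexer job from the gist, and the index/delete calls mirror the argument style of the elasticsearch-ruby client.

```ruby
# Sketch of a background job that updates the index and swallows failures.
class IndexerJobSketch
  def initialize(client, index:, type:)
    @client = client
    @index = index
    @type = type
  end

  # Returns true on success, false if Elasticsearch could not be updated.
  def perform(operation, id, document = nil)
    unless %i[index delete].include?(operation)
      raise ArgumentError, "unknown operation: #{operation}"
    end

    begin
      if operation == :index
        @client.index(index: @index, type: @type, id: id, body: document)
      else
        @client.delete(index: @index, type: @type, id: id)
      end
      true
    rescue StandardError => e
      warn "Elasticsearch update failed (#{e.class}): #{e.message}"
      false
    end
  end
end

# Fake clients for the demo (hypothetical).
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def index(arguments = {})
    @calls << [:index, arguments]
  end

  def delete(arguments = {})
    @calls << [:delete, arguments]
  end
end

class DownClient
  def index(_arguments = {})
    raise IOError, 'connection refused'
  end
  alias delete index
end

recorder = RecordingClient.new
indexed = IndexerJobSketch.new(recorder, index: 'events', type: 'event')
                          .perform(:index, 1, { title: 'Drop in Ruby' })
failed = IndexerJobSketch.new(DownClient.new, index: 'events', type: 'event')
                         .perform(:delete, 1)
puts "indexed: #{indexed}, failed update reported: #{failed == false}"
```

A real job would more likely re-raise or reschedule itself so the queue retries the update once the cluster is back.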
Questions?
Resources:
Elastic Docs https://www.elastic.co/guide/index.html
Ruby Gem Docs https://github.com/elastic/elasticsearch-rails
Elasticsearch rspec helpers https://gist.github.com/mreinsch/094dc9cf63362314cef4
Elasticsearch indexer job example https://gist.github.com/mreinsch/acb2f6c58891e5cd4f13
or ask me later:
@mreinsch