NEAR REALTIME (NRT)
➤ Elasticsearch is a near-realtime search engine
➤ There is only a small delay between the time a document is indexed and the time it becomes searchable
➤ The delay is usually about one second
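
The gap between indexing and searchability can be pictured with a toy model. This is only a sketch: the class `ToyIndex` and its methods are hypothetical names, and the real engine does far more than move documents between two lists.

```python
class ToyIndex:
    """Toy model of near-realtime search: writes stay invisible until a refresh."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet searchable
        self._searchable = []  # visible to searches

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        # Assumption: stands in for the periodic refresh (about once per second
        # by default) that makes newly indexed documents searchable.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, field, value):
        return [d for d in self._searchable if d.get(field) == value]


idx = ToyIndex()
idx.index({"title": "Alien"})
before = idx.search("title", "Alien")   # nothing yet: no refresh has happened
idx.refresh()
after = idx.search("title", "Alien")    # the document is now visible
```
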
CLUSTER
➤ A cluster is a collection of nodes (servers)
➤ Consists of one or more nodes, depending on the scale
  ➤ Can contain as many nodes as you want
➤ Together, these nodes contain all data
➤ A cluster provides indexing and search capability across all nodes
➤ Identified by a unique name (defaults to "elasticsearch")
NODE
➤ A single server that is part of a cluster
➤ Stores searchable data
  ➤ Stores all data if there is only one node in the cluster, or part of the data if there are multiple nodes
➤ Participates in a cluster's indexing and search capabilities
➤ Identified by a name (defaults to a random Marvel character)
➤ A node joins a cluster named "elasticsearch" by default
➤ Starting a single node on a network will by default create a new single-node cluster named "elasticsearch"
INDEX
➤ A collection of documents (e.g. product, account, movie)
  ➤ Each of the above examples would be a type
➤ Corresponds to a database within a relational database system
➤ Identified by a name, which must be lowercase
  ➤ The name is used when indexing, searching, updating and deleting documents within the index
➤ You can define as many indexes as you want within a cluster
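
As a rough illustration of how the index name is used to address documents, the sketch below builds a REST-style path of the form /index/type/id, which is the layout used by the Elasticsearch generation these slides describe. The helper `document_path` is a hypothetical name, and it checks only the lowercase rule, not every naming restriction.

```python
def document_path(index: str, doc_type: str, doc_id: str) -> str:
    """Build the path that addresses one document within an index."""
    # Only the lowercase rule from the slide is enforced here.
    if index != index.lower():
        raise ValueError(f"index name must be lowercase: {index!r}")
    return f"/{index}/{doc_type}/{doc_id}"


path = document_path("products", "product", "1")
print(path)
```

The same path would be the target for indexing, fetching, updating and deleting that document; only the HTTP verb changes.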
TYPE
➤ Represents a class/category of similar documents, e.g. "user"
➤ Consists of a name and a mapping
➤ Simplified, you can think of a type as a table within a relational database
➤ An index can have one or more types defined, each with their own mapping
➤ Stored within a metadata field named _type because Lucene has no concept of document types
➤ Searching for specific document types applies a filter on this field
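
The idea that a type is just a filter on the _type field can be sketched in memory. This is an illustration only: `documents` and `search` are hypothetical, and nothing here talks to a real cluster.

```python
# All documents of an index live together; the type is only a field value.
documents = [
    {"_type": "user", "name": "Jane"},
    {"_type": "order", "total": 42},
    {"_type": "user", "name": "Bob"},
]


def search(docs, doc_type=None, **criteria):
    """Return matching documents, optionally restricted to one type."""
    hits = []
    for doc in docs:
        # Restricting the search to a type is just a filter on _type.
        if doc_type is not None and doc.get("_type") != doc_type:
            continue
        if all(doc.get(key) == value for key, value in criteria.items()):
            hits.append(doc)
    return hits


users = search(documents, doc_type="user")
bob = search(documents, doc_type="user", name="Bob")
```
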
MAPPING
➤ Similar to a database schema for a table in a relational database
➤ Describes the fields that a document of a given type may have
  ➤ Includes the data type for each field, e.g. string, integer, date, ...
➤ Also includes information on how fields should be indexed and stored by Lucene
➤ Dynamic mapping means that it is optional to define a mapping explicitly
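
An explicit mapping might look roughly like the request body below. It is a sketch written for the Elasticsearch generation the slides describe (named types, "string" fields); later releases changed both. The type name "user" and all field names are made up for illustration.

```python
import json

# Hypothetical mapping for a "user" type: one entry per field, each with a
# data type and, optionally, a hint about how the field should be indexed.
mapping_body = {
    "mappings": {
        "user": {
            "properties": {
                "name": {"type": "string"},
                # "not_analyzed" asks for the exact value to be indexed as-is
                "email": {"type": "string", "index": "not_analyzed"},
                "age": {"type": "integer"},
                "signed_up": {"type": "date"},
            }
        }
    }
}

mapping_json = json.dumps(mapping_body, indent=2)
print(mapping_json)
```

With dynamic mapping, none of this has to be written up front; field types are inferred from the first documents that are indexed.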
DOCUMENT
➤ A basic unit of information that can be indexed
➤ Consists of fields, which are key/value pairs
  ➤ A value can be a string, date, object, etc.
➤ Corresponds to an object in an object-oriented programming language
  ➤ A document can be a single user, order, product, etc.
➤ Documents are expressed in JSON
➤ You can store as many documents within an index as you want
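
The correspondence between an object and a JSON document can be shown with a small example. The classes `User` and `Address` and their fields are hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Address:
    city: str
    country: str


@dataclass
class User:
    name: str
    signed_up: str    # JSON has no date type, so the date travels as a string
    address: Address  # a nested object becomes a nested JSON object


user = User("Jane Doe", "2015-06-01", Address("Oslo", "Norway"))

# The object's fields become the document's key/value pairs.
document = json.dumps(asdict(user), indent=2)
print(document)
```
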
SHARDS
➤ An index can be divided into multiple pieces called shards
  ➤ Useful if an index contains more data than the hardware of a node can store (e.g. 1 TB data on a 500 GB disk)
➤ A shard is a fully functional and independent index
➤ Can be stored on any node in a cluster
➤ The number of shards can be specified when creating an index
➤ Allows horizontal scaling by content volume (index space)
➤ Allows operations to be distributed and parallelized across shards, which increases performance
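
How documents end up spread across shards can be sketched with the usual hash-then-modulo rule: the shard number is a hash of the routing value (by default the document ID) modulo the number of primary shards. The CRC32 hash below is a stand-in chosen for the example, not the hash Elasticsearch uses, and the function names are hypothetical.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a shard as hash(routing) modulo the number of primary shards."""
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # Assumption: CRC32 stands in for the engine's real hash function.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard each one is routed to."""
    shards = {number: [] for number in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_primary_shards)].append(doc_id)
    return shards


doc_ids = [f"doc-{i}" for i in range(100)]
shards = distribute(doc_ids, 5)
for number, ids in shards.items():
    print(number, len(ids))
```

Because the result depends on the number of primary shards, the same document would land elsewhere if that number changed, which is one reason the shard count is fixed when the index is created.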
REPLICAS
➤ A replica is a copy of a shard
➤ Provides high availability in case a shard or node fails
➤ A replica never resides on the same node as the original shard
➤ Allows scaling search volume, because search queries can be executed on all replicas in parallel
➤ By default, Elasticsearch gives each index 5 primary shards and 1 replica (one copy of each primary shard)
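
The rule that a replica never shares a node with its primary can be sketched with a simple round-robin placement. This is not the real allocation algorithm; the function `allocate` and the node names are hypothetical.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Place shard copies so no node holds two copies of the same shard."""
    placement = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        # Copy 0 is the primary (P); copies 1..num_replicas are replicas (R).
        for copy in range(num_replicas + 1):
            label = f"P{shard}" if copy == 0 else f"R{shard}"
            if copy < len(nodes):
                # Offsetting by the copy number puts each copy on a different node.
                node = nodes[(shard + copy) % len(nodes)]
                placement[node].append(label)
            else:
                # Not enough nodes: the copy cannot be placed anywhere.
                unassigned.append(label)
    return placement, unassigned


# The defaults from the slide: 5 primaries with 1 replica each, 10 copies in total.
two_nodes, left_over = allocate(5, 1, ["node-a", "node-b"])
one_node, waiting = allocate(5, 1, ["node-a"])
print(two_nodes)
print(waiting)
```

With a single node, the replicas have nowhere to go and stay unassigned until a second node joins the cluster.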
THANK YOU FOR WATCHING!