Elasticsearch concepts

What Elasticsearch is

  • Elasticsearch is a distributed search engine with a REST interface, built on the Lucene library.
  • Indexed documents are available for search in near real-time.
  • Official documentation

Cluster

  • A cluster consists of one or more nodes which share the same cluster name.
  • Each cluster has a single master node, which the cluster elects automatically. If the current master node fails, the cluster elects a replacement.

Node

A node is a running instance of Elasticsearch. There are at least two types of node: master nodes and data nodes.

Shard

Each document is stored in a single primary shard. When you index a document, it is indexed first on the primary shard, then on all replicas of the primary shard.
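
As a rough illustration of why a document lands on exactly one primary shard: the shard number is derived from a hash of the routing value (the document ID by default), taken modulo the number of primary shards. The sketch below is a simplification and makes some assumptions: `pick_shard` is a hypothetical helper name, and `zlib.crc32` stands in for the hash Elasticsearch really uses (Murmur3), so the shard numbers it produces will not match a real cluster.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to one shard number.

    Stand-in hash: Elasticsearch itself uses Murmur3, not CRC32.
    """
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# The same ID always maps to the same shard, so each document lives in one place.
shards = [pick_shard(doc_id, 3) for doc_id in ("user-1", "user-2", "user-1")]
```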

Replica

  • A replica is a copy of a primary shard. It serves two purposes:
  • Failover: a replica shard can be promoted to a primary shard if the primary fails.
  • Performance: get and search requests can be handled by either primary or replica shards.
  • By default, each primary shard has one replica, but the number of replicas can be changed dynamically on an existing index.
  • A replica shard is never allocated to the same node as its primary shard.
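
Changing the number of replicas on an existing index goes through the index settings API. A minimal sketch, assuming a node at http://localhost:9200 and an index called index_name (both placeholders); `replica_settings_request` is a hypothetical helper that only builds the request and does not send it:

```python
import json
import urllib.request


def replica_settings_request(base_url: str, index: str, replicas: int) -> urllib.request.Request:
    """Build (but do not send) a PUT request to <index>/_settings."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = replica_settings_request("http://localhost:9200", "index_name", 2)
# To apply it against a running cluster: urllib.request.urlopen(request)
```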

Field

A field is the smallest unit of data in Elasticsearch.

Document

  • A document is a JSON object stored in Elasticsearch. It is comparable to a row in a table in a relational database.
  • Each document has its data in fields.
  • The _source field contains the original JSON document body that was passed at index time. The _source field itself is not indexed (and thus is not searchable), but it is stored so that it can be returned when executing get and search requests.

Index

  • An index can be thought of as an optimized collection of documents and each document is a collection of fields, which are the key-value pairs that contain your data.
  • By default, Elasticsearch indexes all data in every field and each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees. The ability to use the per-field data structures to assemble and return search results is what makes Elasticsearch so fast.
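
To make the inverted index idea concrete, here is a toy version in Python. It is only a sketch of the concept (each term points to the documents that contain it), not how Lucene stores its data; `build_inverted_index` and the sample documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercased, whitespace-separated term to the IDs of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


docs = {1: "quick brown fox", 2: "lazy brown dog"}
inverted = build_inverted_index(docs)
brown_docs = sorted(inverted["brown"])  # IDs of documents containing "brown"
```

Looking up a term is a single dictionary access, so no document has to be scanned at search time.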
  • The hidden .monitoring-es index stores monitoring data about the cluster, which makes it possible to track RPS, memory, CPU, etc.

Alias

  • An alias is a secondary name for a group of data streams or indices. Most Elasticsearch APIs accept an alias in place of a data stream or index name.
  • You can change the data streams or indices of an alias at any time. If you use aliases in your application’s Elasticsearch requests, you can reindex data with no downtime or changes to your app’s code.
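
A zero-downtime switch after a reindex is usually done by removing the alias from the old index and adding it to the new one in a single request to the aliases API. The sketch below only builds the request body; the alias and index names are hypothetical, as is the helper `alias_swap_actions`.

```python
import json


def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = alias_swap_actions("products", "products_v1", "products_v2")
payload = json.dumps(body)  # send as the JSON body of POST /_aliases
```

Because both actions travel in one request, the application keeps querying "products" and never sees a moment where the alias points nowhere.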

Analyzer

An analyzer is applied to a field. An analyzer consists of the following three units, applied in order:

  1. zero or more character filters. A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. 
  2. one tokenizer. A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens.
  3. zero or more token filters. A token filter receives the token stream and may add, remove (stop token filter), or change (lowercase or synonym token filter) tokens.

Elasticsearch uses the field's ‘analyzer’ at index time and its ‘search_analyzer’ at search time; when no ‘search_analyzer’ is set, the ‘analyzer’ is used for both. If the two differ, query terms may not match the indexed terms, which can cause unexpected results.
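
The character filter, tokenizer, and token filter stages can be imitated in a few lines of Python. This is a toy model under stated assumptions, not Elasticsearch code: the function names, the regular expressions, and the tiny stop-word list are all invented for the example.

```python
import re

STOP_WORDS = {"the", "a", "an", "and"}  # tiny illustrative list


def strip_html(text: str) -> str:
    """Character filter: replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text: str) -> list[str]:
    """Tokenizer: break the character stream into word tokens."""
    return re.findall(r"\w+", text)


def lowercase(tokens: list[str]) -> list[str]:
    """Token filter that changes tokens."""
    return [t.lower() for t in tokens]


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Token filter that removes tokens."""
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text: str) -> list[str]:
    """Run the stages in order: character filter, tokenizer, token filters."""
    tokens = tokenize(strip_html(text))
    for token_filter in (lowercase, remove_stop_words):
        tokens = token_filter(tokens)
    return tokens


tokens = analyze("<p>The Quick and the Lazy</p>")  # ["quick", "lazy"]
```

The order of the token filters matters here: the stop-word filter only matches "The" because lowercasing runs first.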

Mapping

  • Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.
  • Each document is a collection of fields, which each have their own data type. When mapping your data, you create a mapping definition, which contains a list of fields that are pertinent to the document. A mapping definition also includes metadata fields, like the _source field, which customize how a document’s associated metadata is handled.

Field data type

Each field has a field data type, or field type. This type indicates the kind of data the field contains, such as strings or boolean values, and its intended use. For example, you can index strings to both text and keyword fields. However, text field values are analyzed for full-text search while keyword strings are left as-is for filtering and sorting.
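
A mapping definition that indexes the same string both ways might look like the sketch below. It assumes Elasticsearch 7 or later (where mappings no longer carry a type name); the field names "title" and "in_stock" and the sub-field name "raw" are hypothetical. The dictionary is the body you would send when creating the index.

```python
import json

# Hypothetical field names; this is the JSON body for creating an index (PUT /index_name).
index_body = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",  # analyzed for full-text search
                "fields": {
                    "raw": {"type": "keyword"}  # left as-is for filtering and sorting
                },
            },
            "in_stock": {"type": "boolean"},
        }
    }
}

payload = json.dumps(index_body, indent=2)
```

With this mapping, full-text queries would target "title", while sorting and exact-match filters would target "title.raw".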

Index template

An index template is a way to tell Elasticsearch how to configure an index when it is created. For data streams, the index template configures the stream’s backing indices as they are created. Templates are configured prior to index creation. When an index is created - either manually or through indexing a document - the template settings are used as a basis for creating the index.
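
A sketch of what a template body can look like, assuming the composable index template API (PUT /_index_template/<name>, available since Elasticsearch 7.8). The helper name, the "logs-*" pattern, the "@timestamp" field, and the priority value are illustrative assumptions, not values from this page.

```python
import json


def index_template_body(pattern: str, shards: int, replicas: int) -> dict:
    """Body for PUT /_index_template/<name>; applies to new indices matching `pattern`."""
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            },
            "mappings": {
                "properties": {"@timestamp": {"type": "date"}}
            },
        },
        "priority": 100,  # a higher priority wins when several templates match
    }


template = index_template_body("logs-*", shards=1, replicas=1)
payload = json.dumps(template)
```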

Endpoints

  1. Show the list of indices: http://localhost:9200/_cat/indices
  2. Show an index's content: http://localhost:9200/index_name/_search
  3. Show an index's mapping: http://localhost:9200/index_name/_mapping
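
The same endpoints can be called from a script. A minimal sketch using only the Python standard library, assuming a node is reachable at http://localhost:9200; `endpoint_urls` and `fetch_json` are hypothetical helper names, and `format=json` is added to the _cat call so that it returns JSON instead of a text table.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node


def endpoint_urls(index: str, base_url: str = BASE_URL) -> dict[str, str]:
    """Return the three URLs listed above for a given index name."""
    return {
        "indices": f"{base_url}/_cat/indices?format=json",
        "search": f"{base_url}/{index}/_search",
        "mapping": f"{base_url}/{index}/_mapping",
    }


def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the JSON response (needs a reachable cluster)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


urls = endpoint_urls("index_name")
# With a node running locally: fetch_json(urls["mapping"])
```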

Ports

  • Port 9200 serves the REST API.
  • Port 9300 is used for communication between nodes.
  • Any node can accept requests, but searches are executed on data nodes. When a client sends a request to another node, that node forwards it to the data nodes, then gathers and post-processes the search results.