[[modules-node]]
=== Node

Any time that you start an instance of {es}, you are starting a _node_. A
collection of connected nodes is called a <<modules-cluster,cluster>>. If you
are running a single node of {es}, then you have a cluster of one node.

Every node in the cluster can handle <<modules-network,HTTP and transport>>
traffic by default. The transport layer is used exclusively for communication
between nodes; the HTTP layer is used by REST clients.
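
For reference, the two layers listen on separate ports that you can configure
in `elasticsearch.yml`. The following is a minimal sketch that pins each layer
to the first port of its default range; by default {es} binds the first
available port in `9200-9300` for HTTP and `9300-9400` for transport:

[source,yaml]
----
http.port: 9200       # HTTP layer, used by REST clients
transport.port: 9300  # transport layer, used for node-to-node communication
----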

[[modules-node-description]]
// tag::modules-node-description-tag[]
All nodes know about all the other nodes in the cluster and can forward client
requests to the appropriate node.
// end::modules-node-description-tag[]

[[node-roles]]
==== Node roles

You define a node's roles by setting `node.roles` in `elasticsearch.yml`. If
you set `node.roles`, the node is only assigned the roles you specify (see the
example following this list). If you don't set `node.roles`, the node is
assigned the following roles:

* `master`
* `data`
* `data_content`
* `data_hot`
* `data_warm`
* `data_cold`
* `data_frozen`
* `ingest`
* `ml`
* `remote_cluster_client`
* `transform`
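
For example, the following sketch explicitly limits a node to three roles; any
role omitted from the list is disabled on that node:

[source,yaml]
----
node.roles: [ master, data, ingest ]
----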

[IMPORTANT]
====
If you set `node.roles`, ensure you specify every node role your cluster needs.
Every cluster requires the following node roles:

* `master`
* {blank}
+
--
`data_content` and `data_hot` +
OR +
`data`
--

Some {stack} features also require specific node roles:

- {ccs-cap} and {ccr} require the `remote_cluster_client` role.
- {stack-monitor-app} and ingest pipelines require the `ingest` role.
- {fleet}, the {security-app}, and {transforms} require the `transform` role.
The `remote_cluster_client` role is also required to use {ccs} with these
features.
- {ml-cap} features, such as {anomaly-detect}, require the `ml` role.
====
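
For example, either of the following role sets satisfies both cluster-wide
requirements above on a small cluster's single master-eligible data node (a
sketch; use one or the other):

[source,yaml]
----
node.roles: [ master, data ]
# or, using the specialized data roles instead of the generic `data` role:
# node.roles: [ master, data_content, data_hot ]
----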

As the cluster grows and in particular if you have large {ml} jobs or
{ctransforms}, consider separating dedicated master-eligible nodes from
dedicated data nodes, {ml} nodes, and {transform} nodes.

<<master-node,Master-eligible node>>::
A node that has the `master` role, which makes it eligible to be
<<modules-discovery,elected as the _master_ node>>, which controls the cluster.

<<data-node,Data node>>::
A node that has the `data` role. Data nodes hold data and perform data
related operations such as CRUD, search, and aggregations. A node with the
`data` role can fill any of the specialized data node roles.

<<node-ingest-node,Ingest node>>::
A node that has the `ingest` role. Ingest nodes are able to apply an
<<ingest,ingest pipeline>> to a document in order to transform and enrich the
document before indexing. With a heavy ingest load, it makes sense to use
dedicated ingest nodes and to exclude the `ingest` role from nodes that have
the `master` or `data` roles.

<<remote-node,Remote-eligible node>>::
A node that has the `remote_cluster_client` role, which makes it eligible to
act as a remote client.

<<ml-node,Machine learning node>>::
A node that has the `ml` role. If you want to use {ml-features}, there must be
at least one {ml} node in your cluster. For more information, see
<<ml-settings>> and {ml-docs}/index.html[Machine learning in the {stack}].

<<transform-node,{transform-cap} node>>::
A node that has the `transform` role. If you want to use {transforms}, there
must be at least one {transform} node in your cluster. For more information,
see <<transform-settings>> and <<transforms>>.

[NOTE]
[[coordinating-node]]
.Coordinating node
===============================================
Requests like search requests or bulk-indexing requests may involve data held
on different data nodes. A search request, for example, is executed in two
phases which are coordinated by the node which receives the client request --
the _coordinating node_.

In the _scatter_ phase, the coordinating node forwards the request to the data
nodes which hold the data. Each data node executes the request locally and
returns its results to the coordinating node. In the _gather_ phase, the
coordinating node reduces each data node's results into a single global
result set.

Every node is implicitly a coordinating node, and this behavior cannot be
disabled. A node with an explicitly empty list of roles in `node.roles` acts
_only_ as a coordinating node. Such a node needs enough memory and CPU to
handle the gather phase.
===============================================

[[master-node]]
==== Master-eligible node

The master node is responsible for lightweight cluster-wide actions such as
creating or deleting an index, tracking which nodes are part of the cluster,
and deciding which shards to allocate to which nodes. It is important for
cluster health to have a stable master node.

Any master-eligible node that is not a <<voting-only-node,voting-only node>>
may be elected to become the master node by the <<modules-discovery,master
election process>>.

IMPORTANT: Master nodes must have a `path.data` directory whose contents
persist across restarts, just like data nodes, because this is where the
cluster metadata is stored. The cluster metadata describes how to read the
data stored on the data nodes, so if it is lost then the data stored on the
data nodes cannot be read.

[[dedicated-master-node]]
===== Dedicated master-eligible node

It is important for the health of the cluster that the elected master node has
the resources it needs to fulfill its responsibilities. If the elected master
node is overloaded with other tasks then the cluster will not operate well.
The most reliable way to avoid overloading the master with other tasks is to
configure all the master-eligible nodes to be _dedicated master-eligible
nodes_ which only have the `master` role, allowing them to focus on managing
the cluster. Master-eligible nodes will still also behave as
<<coordinating-node,coordinating nodes>> that route requests from clients to
the other nodes in the cluster, but you should _not_ use dedicated master
nodes for this purpose.

A small or lightly-loaded cluster may operate well if its master-eligible
nodes have other roles and responsibilities, but once your cluster comprises
more than a handful of nodes it usually makes sense to use dedicated
master-eligible nodes.

To create a dedicated master-eligible node, set:

[source,yaml]
-------------------
node.roles: [ master ]
-------------------

[[voting-only-node]]
===== Voting-only master-eligible node

A voting-only master-eligible node is a node that participates in
<<modules-discovery,master elections>> but which will not act as the cluster's
elected master node. In particular, a voting-only node can serve as a
tiebreaker in elections.

It may seem confusing to use the term "master-eligible" to describe a
voting-only node since such a node is not actually eligible to become the
master at all. This terminology is an unfortunate consequence of history:
master-eligible nodes are those nodes that participate in elections and
perform certain tasks during cluster state publications, and voting-only
nodes have the same responsibilities even if they can never become the
elected master.

To configure a master-eligible node as a voting-only node, include `master`
and `voting_only` in the list of roles. For example, to create a voting-only
data node:

[source,yaml]
-------------------
node.roles: [ data, master, voting_only ]
-------------------

IMPORTANT: Only nodes with the `master` role can be marked as having the
`voting_only` role.

High availability (HA) clusters require at least three master-eligible nodes,
at least two of which are not voting-only nodes. Such a cluster will be able
to elect a master node even if one of the nodes fails.

Voting-only master-eligible nodes may also fill other roles in your cluster.
For instance, a node may be both a data node and a voting-only master-eligible
node. A _dedicated_ voting-only master-eligible node is a voting-only
master-eligible node that fills no other roles in the cluster. To create a
dedicated voting-only master-eligible node, set:

[source,yaml]
-------------------
node.roles: [ master, voting_only ]
-------------------

Since dedicated voting-only nodes never act as the cluster's elected master,
they may require less heap and a less powerful CPU than the true master
nodes. However all master-eligible nodes, including voting-only nodes, are on
the critical path for <<cluster-state-publishing,publishing cluster state
updates>>. Cluster state updates are usually independent of
performance-critical workloads such as indexing or searches, but they are
involved in management activities such as index creation and rollover,
mapping updates, and recovery after a failure. The performance
characteristics of these activities are a function of the speed of the
storage on each master-eligible node, as well as the reliability and latency
of the network interconnections between the elected master node and the other
nodes in the cluster. You must therefore ensure that the storage and
networking available to the nodes in your cluster are good enough to meet
your performance goals.

[[data-node]]
==== Data node

Data nodes hold the shards that contain the documents you have indexed. Data
nodes handle data related operations like CRUD, search, and aggregations.
These operations are I/O-, memory-, and CPU-intensive. It is important to
monitor these resources and to add more data nodes if they are overloaded.

The main benefit of having dedicated data nodes is the separation of the
master and data roles.

To create a dedicated data node, set:

[source,yaml]
----
node.roles: [ data ]
----

In a multi-tier deployment architecture, you use specialized data roles to
assign data nodes to specific tiers: `data_content`, `data_hot`, `data_warm`,
`data_cold`, or `data_frozen`. A node can belong to multiple tiers, but a node
that has one of the specialized data roles cannot have the generic `data`
role.
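
For example, a sketch of a node that belongs to both the content and hot
tiers:

[source,yaml]
----
node.roles: [ data_content, data_hot ]
----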
- [role="xpack"]
- [[data-content-node]]
- ==== Content data node
- Content data nodes are part of the content tier.
- include::{es-repo-dir}/datatiers.asciidoc[tag=content-tier]
- To create a dedicated content node, set:
- [source,yaml]
- ----
- node.roles: [ data_content ]
- ----
- [role="xpack"]
- [[data-hot-node]]
- ==== Hot data node
- Hot data nodes are part of the hot tier.
- include::{es-repo-dir}/datatiers.asciidoc[tag=hot-tier]
- To create a dedicated hot node, set:
- [source,yaml]
- ----
- node.roles: [ data_hot ]
- ----
- [role="xpack"]
- [[data-warm-node]]
- ==== Warm data node
- Warm data nodes are part of the warm tier.
- include::{es-repo-dir}/datatiers.asciidoc[tag=warm-tier]
- To create a dedicated warm node, set:
- [source,yaml]
- ----
- node.roles: [ data_warm ]
- ----
- [role="xpack"]
- [[data-cold-node]]
- ==== Cold data node
- Cold data nodes are part of the cold tier.
- include::{es-repo-dir}/datatiers.asciidoc[tag=cold-tier]
- To create a dedicated cold node, set:
- [source,yaml]
- ----
- node.roles: [ data_cold ]
- ----
- [role="xpack"]
- [[data-frozen-node]]
- ==== Frozen data node
- Frozen data nodes are part of the frozen tier.
- include::{es-repo-dir}/datatiers.asciidoc[tag=frozen-tier]
- To create a dedicated frozen node, set:
- [source,yaml]
- ----
- node.roles: [ data_frozen ]
- ----

[[node-ingest-node]]
==== Ingest node

Ingest nodes can execute pre-processing pipelines, composed of one or more
ingest processors. Depending on the type of operations performed by the
ingest processors and the required resources, it may make sense to have
dedicated ingest nodes that only perform this specific task.

To create a dedicated ingest node, set:

[source,yaml]
----
node.roles: [ ingest ]
----

[[coordinating-only-node]]
==== Coordinating only node

If you take away the ability to handle master duties, to hold data, and to
pre-process documents, then you are left with a _coordinating_ node that can
only route requests, handle the search reduce phase, and distribute bulk
indexing. Essentially, coordinating only nodes behave as smart load
balancers.

Coordinating only nodes can benefit large clusters by offloading the
coordinating node role from data and master-eligible nodes. They join the
cluster and receive the full <<cluster-state,cluster state>>, like every other
node, and they use the cluster state to route requests directly to the
appropriate place(s).

WARNING: Adding too many coordinating only nodes to a cluster can increase the
burden on the entire cluster because the elected master node must await
acknowledgement of cluster state updates from every node! The benefit of
coordinating only nodes should not be overstated -- data nodes can happily
serve the same purpose.

To create a dedicated coordinating node, set:

[source,yaml]
----
node.roles: [ ]
----

[[remote-node]]
==== Remote-eligible node

A remote-eligible node acts as a cross-cluster client and connects to
<<remote-clusters,remote clusters>>. Once connected, you can search
remote clusters using <<modules-cross-cluster-search,{ccs}>>. You can also
sync data between clusters using <<xpack-ccr,{ccr}>>.

To create a dedicated remote-eligible node, set:

[source,yaml]
----
node.roles: [ remote_cluster_client ]
----

[[ml-node]]
==== [xpack]#Machine learning node#

{ml-cap} nodes run jobs and handle {ml} API requests. For more information,
see <<ml-settings>>.

To create a dedicated {ml} node, set:

[source,yaml]
----
node.roles: [ ml, remote_cluster_client ]
----

The `remote_cluster_client` role is optional but strongly recommended.
Otherwise, {ccs} fails when used in {ml} jobs or {dfeeds}. If you use {ccs} in
your {anomaly-jobs}, the `remote_cluster_client` role is also required on all
master-eligible nodes. Otherwise, the {dfeed} cannot start. See
<<remote-node>>.

[[transform-node]]
==== [xpack]#{transform-cap} node#

{transform-cap} nodes run {transforms} and handle {transform} API requests.
For more information, see <<transform-settings>>.

To create a dedicated {transform} node, set:

[source,yaml]
----
node.roles: [ transform, remote_cluster_client ]
----

The `remote_cluster_client` role is optional but strongly recommended.
Otherwise, {ccs} fails when used in {transforms}. See <<remote-node>>.

[[change-node-role]]
==== Changing the role of a node

Each data node maintains the following data on disk:

* the shard data for every shard allocated to that node,
* the index metadata corresponding with every shard allocated to that node, and
* the cluster-wide metadata, such as settings and index templates.

Similarly, each master-eligible node maintains the following data on disk:

* the index metadata for every index in the cluster, and
* the cluster-wide metadata, such as settings and index templates.

Each node checks the contents of its data path at startup. If it discovers
unexpected data then it will refuse to start. This is to avoid importing
unwanted <<modules-gateway-dangling-indices,dangling indices>> which can lead
to a red cluster health. To be more precise, nodes without the `data` role
will refuse to start if they find any shard data on disk at startup, and
nodes with neither the `master` nor the `data` role will refuse to start if
they have any index metadata on disk at startup.

It is possible to change the roles of a node by adjusting its
`elasticsearch.yml` file and restarting it. This is known as _repurposing_ a
node. In order to satisfy the checks for unexpected data described above, you
must perform some extra steps to prepare a node for repurposing when starting
the node without the `data` or `master` roles.

* If you want to repurpose a data node by removing the `data` role then you
should first use an <<allocation-filtering,allocation filter>> to safely
migrate all the shard data onto other nodes in the cluster.

* If you want to repurpose a node to have neither the `data` nor `master`
roles then it is simplest to start a brand-new node with an empty data path
and the desired roles. You may find it safest to use an
<<allocation-filtering,allocation filter>> to migrate the shard data
elsewhere in the cluster first.

If it is not possible to follow these extra steps then you may be able to use
the <<node-tool-repurpose,`elasticsearch-node repurpose`>> tool to delete any
excess data that prevents a node from starting.
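
As an illustrative sketch, repurposing a combined master-and-data node into a
dedicated master-eligible node amounts to shrinking its role list and
restarting it once its shard data has been migrated away:

[source,yaml]
----
# This node previously had: node.roles: [ master, data ]
# After migrating all of its shards to other nodes, for example with an
# allocation filter, restart it with only the master role:
node.roles: [ master ]
----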

[discrete]
=== Node data path settings

[[data-path]]
==== `path.data`

Every data and master-eligible node requires access to a data directory where
shards and index and cluster metadata will be stored. The `path.data` setting
defaults to `$ES_HOME/data` but can be configured in the `elasticsearch.yml`
config file as either an absolute path or a path relative to `$ES_HOME`, as
follows:

[source,yaml]
----
path.data: /var/elasticsearch/data
----

Like all node settings, it can also be specified on the command line as:

[source,sh]
----
./bin/elasticsearch -Epath.data=/var/elasticsearch/data
----

The contents of the `path.data` directory must persist across restarts,
because this is where your data is stored. {es} requires the filesystem to
act as if it were backed by a local disk, but this means that it will work
correctly on properly-configured remote block devices (e.g. a SAN) and remote
filesystems (e.g. NFS) as long as the remote storage behaves no differently
from local storage. You can run multiple {es} nodes on the same filesystem,
but each {es} node must have its own data path.
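
For instance, two nodes sharing one host's filesystem might be given separate
data directories; a sketch with hypothetical paths, one per node's own
`elasticsearch.yml`:

[source,yaml]
----
# first node's elasticsearch.yml
path.data: /var/elasticsearch/node-1/data

# second node's elasticsearch.yml (a separate file):
# path.data: /var/elasticsearch/node-2/data
----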

The performance of an {es} cluster is often limited by the performance of the
underlying storage, so you must ensure that your storage supports acceptable
performance. Some remote storage performs very poorly, especially under the
kind of load that {es} imposes, so make sure to benchmark your system
carefully before committing to a particular storage architecture.

TIP: When using the `.zip` or `.tar.gz` distributions, the `path.data`
setting should be configured to locate the data directory outside the {es}
home directory, so that the home directory can be deleted without deleting
your data! The RPM and Debian distributions do this for you already.

// tag::modules-node-data-path-warning-tag[]
WARNING: Don't modify anything within the data directory or run processes
that might interfere with its contents. If something other than {es} modifies
the contents of the data directory, then {es} may fail, reporting corruption
or other data inconsistencies, or may appear to work correctly having
silently lost some of your data. Don't attempt to take filesystem backups of
the data directory; there is no supported way to restore such a backup.
Instead, use <<snapshot-restore>> to take backups safely. Don't run virus
scanners on the data directory. A virus scanner can prevent {es} from working
correctly and may modify the contents of the data directory. The data
directory contains no executables so a virus scan will only find false
positives.
// end::modules-node-data-path-warning-tag[]

[discrete]
[[other-node-settings]]
=== Other node settings

More node settings can be found in <<settings>> and <<important-settings>>,
including:

* <<cluster-name,`cluster.name`>>
* <<node-name,`node.name`>>
* <<modules-network,network settings>>
|