@@ -1,32 +1,271 @@
[[modules-node]]
== Node

-*elasticsearch* allows to configure a node to either be allowed to store
-data locally or not. Storing data locally basically means that shards of
-different indices are allowed to be allocated on that node. By default,
-each node is considered to be a data node, and it can be turned off by
-setting `node.data` to `false`.
-
-This is a powerful setting allowing to simply create smart load
-balancers that take part in some of different API processing. Lets take
-an example:
-
-We can start a whole cluster of data nodes which do not even start an
-HTTP transport by setting `http.enabled` to `false`. Such nodes will
-communicate with one another using the
-<<modules-transport,transport>> module. In front
-of the cluster we can start one or more "non data" nodes which will
-start with HTTP enabled. All HTTP communication will be performed
-through these "non data" nodes.
-
-The benefit of using that is first the ability to create smart load
-balancers. These "non data" nodes are still part of the cluster, and
-they redirect operations exactly to the node that holds the relevant
-data. The other benefit is the fact that for scatter / gather based
-operations (such as search), these nodes will take part of the
-processing since they will start the scatter process, and perform the
-actual gather processing.
-
-This relieves the data nodes to do the heavy duty of indexing and
-searching, without needing to process HTTP requests (parsing), overload
-the network, or perform the gather processing.
+Any time that you start an instance of Elasticsearch, you are starting a
+_node_. A collection of connected nodes is called a
+<<modules-cluster,cluster>>. If you are running a single node of Elasticsearch,
+then you have a cluster of one node.
+
+Every node in the cluster can handle <<modules-http,HTTP>> and
+<<modules-transport,Transport>> traffic by default. The transport layer
+is used exclusively for communication between nodes and between nodes and the
+{javaclient}/transport-client.html[Java `TransportClient`]; the HTTP layer is
+used only by external REST clients.
+
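+By default the two layers listen on separate ports. The following snippet only
+spells out the usual defaults for orientation; there is normally no need to
+set these values yourself:
+
+[source,yaml]
+-------------------
+http.port: 9200           <1>
+transport.tcp.port: 9300  <2>
+-------------------
+<1> HTTP traffic from REST clients. By default the node binds to the first free port in the `9200-9300` range.
+<2> Transport traffic between nodes and from the Java `TransportClient`. By default the node binds to the first free port in the `9300-9400` range.
+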
+All nodes know about all the other nodes in the cluster and can forward client
+requests to the appropriate node. Besides that, each node serves one or more
+purposes:
+
+<<master-node,Master-eligible node>>::
+
+A node that has `node.master` set to `true` (default), which makes it eligible
+to be <<modules-discovery-zen,elected as the _master_ node>>, which controls
+the cluster.
+
+<<data-node,Data node>>::
+
+A node that has `node.data` set to `true` (default). Data nodes hold data and
+perform data-related operations such as CRUD, search, and aggregations.
+
+<<client-node,Client node>>::
+
+A client node has both `node.master` and `node.data` set to `false`. It can
+neither hold data nor become the master node. It behaves as a ``smart
+router'' and is used to forward cluster-level requests to the master node and
+data-related requests (such as search) to the appropriate data nodes.
+
+<<modules-tribe,Tribe node>>::
+
+A tribe node, configured via the `tribe.*` settings, is a special type of
+client node that can connect to multiple clusters and perform search and other
+operations across all connected clusters.
+
+By default a node is both a master-eligible node and a data node. This is very
+convenient for small clusters but, as the cluster grows, it becomes important
+to consider separating dedicated master-eligible nodes from dedicated data
+nodes.
+
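+Spelled out explicitly, the default roles correspond to the following settings
+(shown for reference only; both are already `true` out of the box):
+
+[source,yaml]
+-------------------
+node.master: true
+node.data: true
+-------------------
+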
+[NOTE]
+[[coordinating-node]]
+.Coordinating node
+===============================================
+
+Requests like search requests or bulk-indexing requests may involve data held
+on different data nodes. A search request, for example, is executed in two
+phases which are coordinated by the node which receives the client request --
+the _coordinating node_.
+
+In the _scatter_ phase, the coordinating node forwards the request to the data
+nodes which hold the data. Each data node executes the request locally and
+returns its results to the coordinating node. In the _gather_ phase, the
+coordinating node reduces each data node's results into a single global
+result set.
+
+This means that a _client_ node needs to have enough memory and CPU in order to
+deal with the gather phase.
+
+===============================================
+
+[float]
+[[master-node]]
+=== Master Eligible Node
+
+The master node is responsible for lightweight cluster-wide actions such as
+creating or deleting an index, tracking which nodes are part of the cluster,
+and deciding which shards to allocate to which nodes. It is important for
+cluster health to have a stable master node.
+
+Any master-eligible node (all nodes by default) may be elected to become the
+master node by the <<modules-discovery-zen,master election process>>.
+
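+To see which node is currently the elected master, one option is the cat
+master API (an illustrative request; the output columns vary by version):
+
+[source,js]
+----------------------------
+GET /_cat/master?v
+----------------------------
+// AUTOSENSE
+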
+Indexing and searching your data is CPU-, memory-, and I/O-intensive work
+which can put pressure on a node's resources. To ensure that your master
+node is stable and not under pressure, it is a good idea in a bigger
+cluster to split the roles between dedicated master-eligible nodes and
+dedicated data nodes.
+
+While master nodes can also behave as <<coordinating-node,coordinating nodes>>
+and route search and indexing requests from clients to data nodes, it is
+better _not_ to use dedicated master nodes for this purpose. It is important
+for the stability of the cluster that master-eligible nodes do as little work
+as possible.
+
+To create a dedicated master-eligible node, set:
+
+[source,yaml]
+-------------------
+node.master: true <1>
+node.data: false <2>
+-------------------
+<1> The `node.master` role is enabled by default.
+<2> Disable the `node.data` role (enabled by default).
+
+[float]
+[[split-brain]]
+==== Avoiding split brain with `minimum_master_nodes`
+
+To prevent data loss, it is vital to configure the
+`discovery.zen.minimum_master_nodes` setting (which defaults to `1`) so that
+each master-eligible node knows the _minimum number of master-eligible nodes_
+that must be visible in order to form a cluster.
+
+To explain, imagine that you have a cluster consisting of two master-eligible
+nodes. A network failure breaks communication between these two nodes. Each
+node sees one master-eligible node... itself. With `minimum_master_nodes` set
+to the default of `1`, this is sufficient to form a cluster. Each node elects
+itself as the new master (thinking that the other master-eligible node has
+died) and the result is two clusters, or a _split brain_. These two nodes
+will never rejoin until one node is restarted. Any data that has been written
+to the restarted node will be lost.
+
+Now imagine that you have a cluster with three master-eligible nodes, and
+`minimum_master_nodes` set to `2`. If a network split separates one node from
+the other two nodes, the side with one node cannot see enough master-eligible
+nodes and will realise that it cannot elect itself as master. The side with
+two nodes will elect a new master (if needed) and continue functioning
+correctly. As soon as the network split is resolved, the single node will
+rejoin the cluster and start serving requests again.
+
+This setting should be set to a _quorum_ of master-eligible nodes:
+
+    (master_eligible_nodes / 2) + 1
+
+In other words, if there are three master-eligible nodes, then minimum master
+nodes should be set to `(3 / 2) + 1`, rounded down to `2`:
+
+[source,yaml]
+----------------------------
+discovery.zen.minimum_master_nodes: 2 <1>
+----------------------------
+<1> Defaults to `1`.
+
+This setting can also be changed dynamically on a live cluster with the
+<<cluster-update-settings,cluster update settings API>>:
+
+[source,js]
+----------------------------
+PUT _cluster/settings
+{
+  "transient": {
+    "discovery.zen.minimum_master_nodes": 2
+  }
+}
+----------------------------
+// AUTOSENSE
+
+TIP: An advantage of splitting the master and data roles between dedicated
+nodes is that you can have just three master-eligible nodes and set
+`minimum_master_nodes` to `2`. You never have to change this setting, no
+matter how many dedicated data nodes you add to the cluster.
+
+
+[float]
+[[data-node]]
+=== Data Node
+
+Data nodes hold the shards that contain the documents you have indexed. Data
+nodes handle data-related operations like CRUD, search, and aggregations.
+These operations are I/O-, memory-, and CPU-intensive. It is important to
+monitor these resources and to add more data nodes if they are overloaded.
+
+The main benefit of having dedicated data nodes is the separation of the
+master and data roles.
+
+To create a dedicated data node, set:
+
+[source,yaml]
+-------------------
+node.master: false <1>
+node.data: true <2>
+-------------------
+<1> Disable the `node.master` role (enabled by default).
+<2> The `node.data` role is enabled by default.
+
+[float]
+[[client-node]]
+=== Client Node
+
+If you take away the ability to handle master duties and take away the ability
+to hold data, then you are left with a _client_ node that can only route
+requests, handle the search reduce phase, and distribute bulk indexing.
+Essentially, client nodes behave as smart load balancers.
+
+Standalone client nodes can benefit large clusters by offloading the
+coordinating node role from data and master-eligible nodes. Client nodes join
+the cluster and receive the full <<cluster-state,cluster state>>, like every
+other node, and they use the cluster state to route requests directly to the
+appropriate place(s).
+
+WARNING: Adding too many client nodes to a cluster can increase the burden on
+the entire cluster because the elected master node must await acknowledgement
+of cluster state updates from every node! The benefit of client nodes should
+not be overstated -- data nodes can happily serve the same purpose as client
+nodes.
+
+To create a dedicated client node, set:
+
+[source,yaml]
+-------------------
+node.master: false <1>
+node.data: false <2>
+-------------------
+<1> Disable the `node.master` role (enabled by default).
+<2> Disable the `node.data` role (enabled by default).
+
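+Once the nodes have joined the cluster, the role of each one can be checked
+with the cat nodes API. The column names passed to `h` below are assumptions
+about the available cat headers; `GET /_cat/nodes?help` lists the headers that
+your version supports:
+
+[source,js]
+----------------------------
+GET /_cat/nodes?v&h=name,master,node.role
+----------------------------
+// AUTOSENSE
+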
+[float]
+== Node data path settings
+
+[float]
+[[data-path]]
+=== `path.data`
+
+Every data and master-eligible node requires access to a data directory where
+shards and index and cluster metadata will be stored. The `path.data` setting
+defaults to `$ES_HOME/data` but can be configured in the `elasticsearch.yml`
+config file with an absolute path or a path relative to `$ES_HOME` as follows:
+
+[source,yaml]
+-----------------------
+path.data: /var/elasticsearch/data
+-----------------------
+
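+If needed, `path.data` can list more than one location, in which case the node
+spreads its shards across the configured paths (the directories below are
+purely illustrative):
+
+[source,yaml]
+-----------------------
+path.data: /mnt/data_1,/mnt/data_2
+-----------------------
+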
+Like all node settings, it can also be specified on the command line as:
+
+[source,sh]
+-----------------------
+./bin/elasticsearch --path.data /var/elasticsearch/data
+-----------------------
+
+TIP: When using the `.zip` or `.tar.gz` distributions, the `path.data` setting
+should be configured to locate the data directory outside the Elasticsearch
+home directory, so that the home directory can be deleted without deleting
+your data! The RPM and Debian distributions do this for you already.
+
+
+[float]
+[[max-local-storage-nodes]]
+=== `node.max_local_storage_nodes`
+
+The <<data-path,data path>> can be shared by multiple nodes, even by nodes
+from different clusters. This is very useful for testing failover and
+different configurations on your development machine. In production, however,
+it is recommended to run only one node of Elasticsearch per server.
+
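+For example, while testing on a development machine, a second node can be
+started from the same installation (in a separate terminal) and will share the
+same data path by default; the node names here are purely illustrative:
+
+[source,sh]
+-----------------------
+./bin/elasticsearch --node.name node_1
+./bin/elasticsearch --node.name node_2
+-----------------------
+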
+To prevent more than one node from sharing the same data path, add this
+setting to the `elasticsearch.yml` config file:
+
+[source,yaml]
+------------------------------
+node.max_local_storage_nodes: 1
+------------------------------
+
+WARNING: Never run different node types (i.e. master, data, client) from the
+same data directory. This can lead to unexpected data loss.
+
+[float]
+== Other node settings
+
+More node settings can be found in <<modules,Modules>>. Of particular note are
+the <<cluster-name,`cluster.name`>>, the <<node-name,`node.name`>> and the
+<<modules-network,network settings>>.
+