4 years ago · f8666b930f
--- a/docs/reference/how-to.asciidoc
+++ b/docs/reference/how-to.asciidoc
@@ -25,6 +25,8 @@ include::how-to/search-speed.asciidoc[]
 
				 
			
 
				 include::how-to/disk-usage.asciidoc[]
			
 
				 
			
 
				+include::how-to/fix-common-cluster-issues.asciidoc[]
			
 
				+
			
 
				 include::how-to/size-your-shards.asciidoc[]
			
 
				 
			
 
				 include::how-to/use-elasticsearch-for-time-series-data.asciidoc[]
			
--- a/docs/reference/how-to/fix-common-cluster-issues.asciidoc
+++ b/docs/reference/how-to/fix-common-cluster-issues.asciidoc
@@ -0,0 +1,440 @@
 
				+[[fix-common-cluster-issues]]
			
 
				+== Fix common cluster issues
			
 
				+
			
 
				+This guide describes how to fix common problems with {es} clusters.
			
 
				+
			
 
				+[discrete]
			
 
				+[[circuit-breaker-errors]]
			
 
				+=== Circuit breaker errors
			
 
				+
			
 
				+{es} uses <<circuit-breaker,circuit breakers>> to prevent nodes from running out
			
 
+of JVM heap memory. If {es} estimates an operation would exceed a
+circuit breaker's limit, it stops the operation and returns an error.
			
 
				+
			
 
				+By default, the <<parent-circuit-breaker,parent circuit breaker>> triggers at
			
 
				+95% JVM memory usage. To prevent errors, we recommend taking steps to reduce
			
 
				+memory pressure if usage consistently exceeds 85%.
			
 
				+
			
 
				+[discrete]
			
 
				+[[diagnose-circuit-breaker-errors]]
			
 
				+==== Diagnose circuit breaker errors
			
 
				+
			
 
				+**Error messages**
			
 
				+
			
 
				+If a request triggers a circuit breaker, {es} returns an error.
			
 
				+
			
 
				+[source,js]
			
 
				+----
			
 
				+{
			
 
+  "error": {
+    "type": "circuit_breaking_exception",
+    "reason": "[parent] Data too large, data for [<http_request>] would be [123848638/118.1mb], which is larger than the limit of [123273216/117.5mb], real usage: [120182112/114.6mb], new bytes reserved: [3666526/3.4mb]",
+    "bytes_wanted": 123848638,
+    "bytes_limit": 123273216,
+    "durability": "TRANSIENT"
+  },
+  "status": 429
			
 
				+}
			
 
				+----
			
 
				+// NOTCONSOLE
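If client code handles these errors, it can read the structured fields instead of parsing the `reason` string. The following Python sketch assumes a response body like the one above (with `reason` omitted for brevity); `summarize_breaker_error` is a hypothetical helper, not part of any {es} client. It treats a `TRANSIENT` durability as a hint that a retry with backoff may succeed, whereas a `PERMANENT` breaker needs intervention.

```python
import json

# Sample body based on the error above; "reason" is omitted for brevity.
SAMPLE_RESPONSE = """
{
  "error": {
    "type": "circuit_breaking_exception",
    "bytes_wanted": 123848638,
    "bytes_limit": 123273216,
    "durability": "TRANSIENT"
  },
  "status": 429
}
"""


def summarize_breaker_error(body):
    """Summarize a circuit_breaking_exception response, or return None."""
    response = json.loads(body)
    error = response.get("error", {})
    if error.get("type") != "circuit_breaking_exception":
        return None
    return {
        "status": response.get("status"),
        # How far the request would have pushed usage past the breaker limit.
        "overshoot_bytes": error["bytes_wanted"] - error["bytes_limit"],
        # TRANSIENT breakers can clear on their own, so retrying with backoff
        # may succeed; PERMANENT ones need manual intervention.
        "retryable": error.get("durability") == "TRANSIENT",
    }


print(summarize_breaker_error(SAMPLE_RESPONSE))
```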
			
 
				+
			
 
				+{es} also writes circuit breaker errors to <<logging,`elasticsearch.log`>>. This
			
 
				+is helpful when automated processes, such as allocation, trigger a circuit
			
 
				+breaker.
			
 
				+
			
 
				+[source,txt]
			
 
				+----
			
 
				+Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [num/numGB], which is larger than the limit of [num/numGB], usages [request=0/0b, fielddata=num/numKB, in_flight_requests=num/numGB, accounting=num/numGB]
			
 
				+----
			
 
				+
			
 
				+**Check JVM memory usage**
			
 
				+
			
 
				+If you've enabled Stack Monitoring, you can view JVM memory usage in {kib}. In
			
 
				+the main menu, click **Stack Monitoring**. On the Stack Monitoring **Overview**
			
 
				+page, click **Nodes**. The **JVM Heap** column lists the current memory usage
			
 
				+for each node.
			
 
				+
			
 
				+You can also use the <<cat-nodes,cat nodes API>> to get the current
			
 
				+`heap.percent` for each node.
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+GET _cat/nodes?v=true&h=name,node*,heap*
			
 
				+----
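Because the recommendation is to act when usage *consistently* exceeds 85%, a single snapshot can mislead. A minimal Python sketch, assuming you poll `GET _cat/nodes?h=name,heap.percent&format=json` several times (with `format=json`, each node is returned as an object keyed by column name). `consistently_high_heap` is a hypothetical helper and the readings are made-up sample data.

```python
def consistently_high_heap(samples, threshold=85):
    """Return names of nodes whose heap.percent exceeds threshold in every sample.

    Each sample is the parsed JSON from one _cat/nodes call with
    h=name,heap.percent&format=json.
    """
    flagged = None
    for rows in samples:
        # cat APIs return values as strings in JSON output, so convert first.
        high = {row["name"] for row in rows if int(row["heap.percent"]) > threshold}
        flagged = high if flagged is None else flagged & high
    return sorted(flagged or set())


# Hypothetical readings from two polls.
samples = [
    [{"name": "node-1", "heap.percent": "91"}, {"name": "node-2", "heap.percent": "88"}],
    [{"name": "node-1", "heap.percent": "87"}, {"name": "node-2", "heap.percent": "62"}],
]
print(consistently_high_heap(samples))
```

In practice you would take samples over several minutes; a node that spikes once, like `node-2` here, is not flagged.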
			
 
				+
			
 
				+To get the JVM memory usage for each circuit breaker, use the
			
 
+<<cluster-nodes-stats,nodes stats API>>.
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+GET _nodes/stats/breaker
			
 
				+----
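The response reports each breaker per node, including `limit_size_in_bytes`, `estimated_size_in_bytes`, and `tripped`. A short Python sketch that turns that response into usage ratios sorted from highest to lowest; `breaker_usage` is a hypothetical helper and the sample response is invented and trimmed to those fields.

```python
def breaker_usage(stats):
    """Return (node, breaker, usage_ratio, tripped) tuples, highest usage first."""
    rows = []
    for node in stats["nodes"].values():
        for name, breaker in node["breakers"].items():
            limit = breaker["limit_size_in_bytes"]
            if limit <= 0:  # skip breakers without a positive limit
                continue
            ratio = breaker["estimated_size_in_bytes"] / limit
            rows.append((node["name"], name, ratio, breaker["tripped"]))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical, trimmed GET _nodes/stats/breaker response.
stats = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "breakers": {
                "parent": {"limit_size_in_bytes": 1000, "estimated_size_in_bytes": 900, "tripped": 3},
                "fielddata": {"limit_size_in_bytes": 400, "estimated_size_in_bytes": 100, "tripped": 0},
            },
        }
    }
}
for row in breaker_usage(stats):
    print(row)
```

A non-zero `tripped` count on a breaker that also sits near its limit is a good place to start investigating.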
			
 
				+
			
 
				+[discrete]
			
 
				+[[prevent-circuit-breaker-errors]]
			
 
				+==== Prevent circuit breaker errors
			
 
				+
			
 
				+**Reduce JVM memory pressure**
			
 
				+
			
 
				+High JVM memory pressure often causes circuit breaker errors. See
			
 
				+<<high-jvm-memory-pressure>>.
			
 
				+
			
 
				+**Avoid using fielddata on `text` fields**
			
 
				+
			
 
				+For high-cardinality `text` fields, fielddata can use a large amount of JVM
			
 
				+memory. To avoid this, {es} disables fielddata on `text` fields by default. If
			
 
				+you've enabled fielddata and triggered the <<fielddata-circuit-breaker,fielddata
			
 
				+circuit breaker>>, consider disabling it and using a `keyword` field instead.
			
 
				+See <<fielddata>>.
			
 
				+
			
 
+**Clear the fielddata cache**
			
 
				+
			
 
				+If you've triggered the fielddata circuit breaker and can't disable fielddata,
			
 
				+use the <<indices-clearcache,clear cache API>> to clear the fielddata cache.
			
 
				+This may disrupt any in-flight searches that use fielddata.
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+POST _all/_cache/clear?fielddata=true
			
 
				+----
			
 
				+// TEST[s/^/PUT my-index\n/]
			
 
				+
			
 
				+[discrete]
			
 
				+[[high-jvm-memory-pressure]]
			
 
				+=== High JVM memory pressure
			
 
				+
			
 
				+High JVM memory usage can degrade cluster performance and trigger
			
 
				+<<circuit-breaker-errors,circuit breaker errors>>. To prevent this, we recommend
			
 
				+taking steps to reduce memory pressure if a node's JVM memory usage consistently
			
 
				+exceeds 85%.
			
 
				+
			
 
				+[discrete]
			
 
				+[[diagnose-high-jvm-memory-pressure]]
			
 
				+==== Diagnose high JVM memory pressure
			
 
				+
			
 
				+**Check JVM memory pressure**
			
 
				+
			
 
				+include::{es-repo-dir}/tab-widgets/code.asciidoc[]
			
 
				+include::{es-repo-dir}/tab-widgets/jvm-memory-pressure-widget.asciidoc[]
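The memory pressure formula (`used_in_bytes` / `max_in_bytes` of the old generation pool) can be applied to every node in one pass. A minimal Python sketch, assuming the response to `GET _nodes/stats?filter_path=nodes.*.jvm.mem.pools.old`; because that `filter_path` drops node names, the result is keyed by node ID. `jvm_memory_pressure` is a hypothetical helper and the values are sample data.

```python
def jvm_memory_pressure(stats):
    """Map node ID -> old-generation used_in_bytes / max_in_bytes."""
    pressure = {}
    for node_id, node in stats["nodes"].items():
        old_gen = node["jvm"]["mem"]["pools"]["old"]
        pressure[node_id] = old_gen["used_in_bytes"] / old_gen["max_in_bytes"]
    return pressure


# Hypothetical response: 3 GiB used out of a 4 GiB old generation.
stats = {
    "nodes": {
        "abc123": {
            "jvm": {"mem": {"pools": {"old": {
                "used_in_bytes": 3 * 1024**3,
                "max_in_bytes": 4 * 1024**3,
            }}}}
        }
    }
}
for node_id, value in jvm_memory_pressure(stats).items():
    print(f"{node_id}: {value:.0%}")
```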
			
 
				+
			
 
				+**Check garbage collection logs**
			
 
				+
			
 
				+As memory usage increases, garbage collection becomes more frequent and takes
			
 
				+longer. You can track the frequency and length of garbage collection events in
			
 
				+<<logging,`elasticsearch.log`>>. For example, the following event states {es}
			
 
				+spent more than 50% (21 seconds) of the last 40 seconds performing garbage
			
 
				+collection.
			
 
				+
			
 
				+[source,log]
			
 
				+----
			
 
				+[timestamp_short_interval_from_last][INFO ][o.e.m.j.JvmGcMonitorService] [node_id] [gc][number] overhead, spent [21s] collecting in the last [40s]
			
 
				+----
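To track this trend over time, you can extract the overhead ratio from each such log line. A minimal Python sketch, assuming the message format shown above and durations written as a number followed by `ms`, `s`, `m`, or `h`; a different unit or a changed log format would need a different pattern. `gc_overhead_ratio` is a hypothetical helper.

```python
import re

# Matches e.g. "spent [21s] collecting in the last [40s]".
GC_OVERHEAD = re.compile(
    r"spent \[(?P<spent>[\d.]+)(?P<spent_unit>ms|s|m|h)\] "
    r"collecting in the last \[(?P<window>[\d.]+)(?P<window_unit>ms|s|m|h)\]"
)
SECONDS_PER_UNIT = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}


def gc_overhead_ratio(line):
    """Return the fraction of the window spent in GC, or None if no match."""
    match = GC_OVERHEAD.search(line)
    if match is None:
        return None
    spent = float(match["spent"]) * SECONDS_PER_UNIT[match["spent_unit"]]
    window = float(match["window"]) * SECONDS_PER_UNIT[match["window_unit"]]
    return spent / window


line = ("[timestamp_short_interval_from_last][INFO ][o.e.m.j.JvmGcMonitorService] "
        "[node_id] [gc][number] overhead, spent [21s] collecting in the last [40s]")
print(gc_overhead_ratio(line))
```

For the example line, the ratio is 21 / 40, or 52.5% of the interval spent in garbage collection.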
			
 
				+
			
 
				+[discrete]
			
 
				+[[reduce-jvm-memory-pressure]]
			
 
				+==== Reduce JVM memory pressure
			
 
				+
			
 
				+**Reduce your shard count**
			
 
				+
			
 
				+Every shard uses memory. In most cases, a small set of large shards uses fewer
			
 
				+resources than many small shards. For tips on reducing your shard count, see
			
 
				+<<size-your-shards>>.
			
 
				+
			
 
				+**Avoid expensive searches**
			
 
				+
			
 
				+Expensive searches can use large amounts of memory. To better track expensive
			
 
				+searches on your cluster, enable <<index-modules-slowlog,slow logs>>.
			
 
				+
			
 
				+Expensive searches may have a large <<paginate-search-results,`size` argument>>,
			
 
				+use aggregations with a large number of buckets, or include
			
 
				+<<query-dsl-allow-expensive-queries,expensive queries>>. To prevent expensive
			
 
				+searches, consider the following setting changes:
			
 
				+
			
 
				+* Lower the `size` limit using the
			
 
				+<<index-max-result-window,`index.max_result_window`>> index setting.
			
 
				+
			
 
				+* Decrease the maximum number of allowed aggregation buckets using the
			
 
+<<search-settings-max-buckets,`search.max_buckets`>> cluster setting.
			
 
				+
			
 
				+* Disable expensive queries using the
			
 
				+<<query-dsl-allow-expensive-queries,`search.allow_expensive_queries`>> cluster
			
 
				+setting.
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+PUT _all/_settings
			
 
				+{
			
 
				+  "index.max_result_window": 5000
			
 
				+}
			
 
				+
			
 
				+PUT _cluster/settings
			
 
				+{
			
 
				+  "persistent": {
			
 
				+    "search.max_buckets": 20000,
			
 
				+    "search.allow_expensive_queries": false
			
 
				+  }
			
 
				+}
			
 
				+----
			
 
				+// TEST[s/^/PUT my-index\n/]
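`index.max_result_window` caps `from + size` for a search, so clients that page with `from` and `size` should stop at the window. A minimal Python sketch of client-side page planning under the 5000 limit used above; `plan_pages` is a hypothetical helper, not an {es} API. To page past the window, {es} recommends the `search_after` parameter instead.

```python
def plan_pages(total_hits, page_size, max_result_window=5000):
    """Yield (from, size) pairs whose from + size never exceeds the window."""
    start = 0
    while start < total_hits:
        size = min(page_size, total_hits - start, max_result_window - start)
        if size <= 0:
            break  # deeper pages would be rejected; switch to search_after
        yield start, size
        start += size


print(list(plan_pages(total_hits=12000, page_size=2000)))
```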
			
 
				+
			
 
				+**Prevent mapping explosions**
			
 
				+
			
 
				+Defining too many fields or nesting fields too deeply can lead to
			
 
				+<<mapping-limit-settings,mapping explosions>> that use large amounts of memory.
			
 
				+To prevent mapping explosions, use the <<mapping-settings-limit,mapping limit
			
 
				+settings>> to limit the number of field mappings.
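To see how close an index is to these limits, you can walk the `properties` object from a `GET my-index/_mapping` response. A rough Python sketch: `count_mappings` (a hypothetical helper) counts fields, object fields, and multi-fields, and measures depth the way `index.mapping.depth.limit` does, where fields at the root are depth 1. It is only an approximation of how {es} counts mappings. Compare the results with `index.mapping.total_fields.limit` (default `1000`) and `index.mapping.depth.limit` (default `20`).

```python
def count_mappings(properties, depth=1):
    """Return (mapping_count, max_depth) for a mapping's "properties" object."""
    count, max_depth = 0, depth
    for spec in properties.values():
        count += 1
        # Multi-fields, such as a .keyword sub-field, are extra field mappings.
        count += len(spec.get("fields", {}))
        if "properties" in spec:  # object or nested field: recurse one level down
            sub_count, sub_depth = count_mappings(spec["properties"], depth + 1)
            count += sub_count
            max_depth = max(max_depth, sub_depth)
    return count, max_depth


# Hypothetical "properties" from GET my-index/_mapping.
properties = {
    "message": {"type": "text"},
    "user": {
        "properties": {
            "name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
            "id": {"type": "keyword"},
        }
    },
}
print(count_mappings(properties))
```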
			
 
				+
			
 
				+**Spread out bulk requests**
			
 
				+
			
 
				+While more efficient than individual requests, large <<docs-bulk,bulk indexing>>
			
 
				+or <<search-multi-search,multi-search>> requests can still create high JVM
			
 
				+memory pressure. If possible, submit smaller requests and allow more time
			
 
				+between them.
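One way to spread out indexing is to split documents into fixed-size `_bulk` bodies and pause between them. A minimal Python sketch; the batch size, the pause, and the `send` callable (which would POST each body to `_bulk` with the `application/x-ndjson` content type) are assumptions to tune for your cluster, and both functions are hypothetical helpers.

```python
import json
import time


def bulk_bodies(docs, index, batch_size=500):
    """Yield newline-delimited _bulk bodies with at most batch_size docs each."""
    for start in range(0, len(docs), batch_size):
        lines = []
        for doc in docs[start:start + batch_size]:
            lines.append(json.dumps({"index": {"_index": index}}))  # action line
            lines.append(json.dumps(doc))  # document source line
        # The bulk API requires the body to end with a newline.
        yield "\n".join(lines) + "\n"


def send_spread_out(bodies, send, pause_seconds=1.0):
    """Call send(body) for each body, sleeping between requests."""
    for i, body in enumerate(bodies):
        if i:
            time.sleep(pause_seconds)
        send(body)


sent = []
docs = [{"message": f"event {n}"} for n in range(5)]
send_spread_out(bulk_bodies(docs, "my-index", batch_size=2), sent.append, pause_seconds=0)
print(len(sent))  # 3 requests: 2 + 2 + 1 documents
```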
			
 
				+
			
 
				+**Upgrade node memory**
			
 
				+
			
 
				+Heavy indexing and search loads can cause high JVM memory pressure. To better
			
 
				+handle heavy workloads, upgrade your nodes to increase their memory capacity.
			
 
				+
			
 
				+[discrete]
			
 
				+[[red-yellow-cluster-status]]
			
 
				+=== Red or yellow cluster status
			
 
				+
			
 
				+A red or yellow cluster status indicates one or more shards are missing or
			
 
				+unallocated. These unassigned shards increase your risk of data loss and can
			
 
				+degrade cluster performance.
			
 
				+
			
 
				+[discrete]
			
 
				+[[diagnose-cluster-status]]
			
 
				+==== Diagnose your cluster status
			
 
				+
			
 
				+**Check your cluster status**
			
 
				+
			
 
				+Use the <<cluster-health,cluster health API>>.
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+GET _cluster/health?filter_path=status,*_shards
			
 
				+----
			
 
				+
			
 
				+A healthy cluster has a green `status` and zero `unassigned_shards`. A yellow
			
 
				+status means only replicas are unassigned. A red status means one or
			
 
				+more primary shards are unassigned.
			
 
				+
			
 
				+**View unassigned shards**
			
 
				+
			
 
				+To view unassigned shards, use the <<cat-shards,cat shards API>>.
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+GET _cat/shards?v=true&h=index,shard,prirep,state,node,unassigned.reason&s=state
			
 
				+----
			
 
				+
			
 
				+Unassigned shards have a `state` of `UNASSIGNED`. The `prirep` value is `p` for
			
 
				+primary shards and `r` for replicas. The `unassigned.reason` describes why the
			
 
				+shard remains unassigned.
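When many shards are unassigned, grouping them by type and reason helps find the dominant cause. A short Python sketch, assuming the same cat shards request with `format=json` added so that each row is an object keyed by column name; `summarize_unassigned` is a hypothetical helper and the rows are sample data.

```python
from collections import Counter


def summarize_unassigned(rows):
    """Count UNASSIGNED shards by (primary/replica, unassigned.reason)."""
    counts = Counter()
    for row in rows:
        if row["state"] != "UNASSIGNED":
            continue
        kind = "primary" if row["prirep"] == "p" else "replica"
        counts[(kind, row["unassigned.reason"])] += 1
    return counts


# Hypothetical rows from GET _cat/shards?...&format=json
rows = [
    {"index": "my-index", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-1", "unassigned.reason": None},
    {"index": "my-index", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "node": None, "unassigned.reason": "NODE_LEFT"},
    {"index": "my-index", "shard": "1", "prirep": "r", "state": "UNASSIGNED", "node": None, "unassigned.reason": "NODE_LEFT"},
]
for (kind, reason), n in summarize_unassigned(rows).most_common():
    print(kind, reason, n)
```

Here only replicas are unassigned, which matches a yellow cluster status.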
			
 
				+
			
 
				+To get a more in-depth explanation of an unassigned shard's allocation status,
			
 
+use the <<cluster-allocation-explain,cluster allocation explain API>>. You
			
 
				+can often use details in the response to resolve the issue.
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+GET _cluster/allocation/explain?filter_path=index,node_allocation_decisions.node_name,node_allocation_decisions.deciders.*
			
 
				+{
			
 
				+  "index": "my-index",
			
 
				+  "shard": 0,
			
 
				+  "primary": false,
			
 
				+  "current_node": "my-node"
			
 
				+}
			
 
				+----
			
 
				+// TEST[s/^/PUT my-index\n/]
			
 
				+// TEST[s/"primary": false,/"primary": false/]
			
 
				+// TEST[s/"current_node": "my-node"//]
			
 
				+
			
 
				+[discrete]
			
 
				+[[fix-red-yellow-cluster-status]]
			
 
				+==== Fix a red or yellow cluster status
			
 
				+
			
 
				+A shard can become unassigned for several reasons. The following tips outline the
			
 
				+most common causes and their solutions.
			
 
				+
			
 
				+**Re-enable shard allocation**
			
 
				+
			
 
				+You typically disable allocation during a <<restart-cluster,restart>> or other
			
 
				+cluster maintenance. If you forgot to re-enable allocation afterward, {es} will
			
 
				+be unable to assign shards. To re-enable allocation, reset the
			
 
				+`cluster.routing.allocation.enable` cluster setting.
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+PUT _cluster/settings
			
 
				+{
			
 
				+  "persistent" : {
			
 
				+    "cluster.routing.allocation.enable" : null
			
 
				+  }
			
 
				+}
			
 
				+----
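If you script this step, note that sending JSON `null` is what resets the setting. A minimal Python sketch using only the standard library; `reset_allocation_request` is a hypothetical helper, the `http://localhost:9200` address is an assumption, and a secured cluster would also need TLS and credentials.

```python
import json
import urllib.request


def reset_allocation_request(base_url="http://localhost:9200"):
    """Build (but do not send) the request that resets allocation.enable."""
    # Python's None serializes to JSON null, which resets the setting.
    body = {"persistent": {"cluster.routing.allocation.enable": None}}
    return urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = reset_allocation_request()
print(request.get_method(), request.full_url, request.data.decode())
# To send it: urllib.request.urlopen(request)
```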
			
 
				+
			
 
				+**Recover lost nodes**
			
 
				+
			
 
				+Shards often become unassigned when a data node leaves the cluster. This can
			
 
				+occur for several reasons, ranging from connectivity issues to hardware failure.
			
 
				+After you resolve the issue and recover the node, it will rejoin the cluster.
			
 
				+{es} will then automatically allocate any unassigned shards.
			
 
				+
			
 
				+To avoid wasting resources on temporary issues, {es} <<delayed-allocation,delays
			
 
+allocation>> by one minute by default. If you've recovered a node and don't want
			
 
				+to wait for the delay period, you can call the <<cluster-reroute,cluster reroute
			
 
				+API>> with no arguments to start the allocation process. The process runs
			
 
				+asynchronously in the background.
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+POST _cluster/reroute
			
 
				+----
			
 
				+
			
 
				+**Fix allocation settings**
			
 
				+
			
 
				+Misconfigured allocation settings can result in an unassigned primary shard.
			
 
				+These settings include:
			
 
				+
			
 
				+* <<shard-allocation-filtering,Shard allocation>> index settings
			
 
				+* <<cluster-shard-allocation-filtering,Allocation filtering>> cluster settings
			
 
				+* <<shard-allocation-awareness,Allocation awareness>> cluster settings
			
 
				+
			
 
				+To review your allocation settings, use the <<indices-get-settings,get index
			
 
				+settings>> and <<cluster-get-settings,get cluster settings>> APIs.
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+GET my-index/_settings?flat_settings=true&include_defaults=true
			
 
				+
			
 
				+GET _cluster/settings?flat_settings=true&include_defaults=true
			
 
				+----
			
 
				+// TEST[s/^/PUT my-index\n/]
			
 
				+
			
 
				+You can change the settings using the <<indices-update-settings,update index
			
 
				+settings>> and <<cluster-update-settings,update cluster settings>> APIs.
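Misconfigured filters are easier to spot if you list only the allocation settings that were set explicitly. A Python sketch, assuming the two `flat_settings=true` responses from the requests above; it skips the `defaults` sections, which contain many allocation settings you never set. `explicit_allocation_settings` is a hypothetical helper and the sample responses are invented.

```python
def explicit_allocation_settings(index_response, cluster_response):
    """Return explicitly set index and cluster allocation settings."""
    found = {}
    for index, body in index_response.items():
        for key, value in body.get("settings", {}).items():
            if key.startswith("index.routing.allocation."):
                found[f"{index}: {key}"] = value
    for scope in ("persistent", "transient"):
        for key, value in cluster_response.get(scope, {}).items():
            if key.startswith("cluster.routing.allocation."):
                found[f"{scope}: {key}"] = value
    return found


# Hypothetical, trimmed responses.
index_response = {
    "my-index": {
        "settings": {
            "index.number_of_shards": "1",
            "index.routing.allocation.require.box_type": "hot",
        },
        "defaults": {"index.routing.allocation.total_shards_per_node": "-1"},
    }
}
cluster_response = {
    "persistent": {"cluster.routing.allocation.awareness.attributes": "zone"},
    "transient": {},
    "defaults": {"cluster.routing.allocation.enable": "all"},
}
for key, value in explicit_allocation_settings(index_response, cluster_response).items():
    print(key, "=", value)
```

A `require` filter such as `box_type: hot` leaves shards unassigned if no data node carries a matching attribute.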
			
 
				+
			
 
				+**Allocate or reduce replicas**
			
 
				+
			
 
				+To protect against hardware failure, {es} will not assign a replica to the same
			
 
				+node as its primary shard. If no other data nodes are available to host the
			
 
				+replica, it remains unassigned. To fix this, you can:
			
 
				+
			
 
				+* Add a data node to the same tier to host the replica.
			
 
				+
			
 
				+* Change the `index.number_of_replicas` index setting to reduce the number of
			
 
				+replicas for each primary shard. We recommend keeping at least one replica per
			
 
				+primary.
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+PUT _all/_settings
			
 
				+{
			
 
				+  "index.number_of_replicas": 1
			
 
				+}
			
 
				+----
			
 
				+// TEST[s/^/PUT my-index\n/]
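Because {es} never places two copies of the same shard on one node, a tier with N data nodes can hold at most N - 1 replicas of each primary. A small Python sketch of that arithmetic, which may help when choosing `index.number_of_replicas`; `unassigned_replicas_per_primary` is a hypothetical helper and it ignores allocation filtering, awareness, and disk watermarks.

```python
def unassigned_replicas_per_primary(data_nodes_in_tier, number_of_replicas):
    """Replicas per primary that cannot be assigned, ignoring other constraints.

    Each copy of a shard (the primary and every replica) needs its own node,
    so at most data_nodes_in_tier - 1 replicas can be assigned.
    """
    assignable = max(data_nodes_in_tier - 1, 0)
    return max(number_of_replicas - assignable, 0)


for nodes, replicas in [(1, 1), (2, 1), (2, 2), (3, 1)]:
    print(f"{nodes} node(s), {replicas} replica(s): "
          f"{unassigned_replicas_per_primary(nodes, replicas)} unassigned")
```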
			
 
				+
			
 
				+**Free up or increase disk space**
			
 
				+
			
 
				+{es} uses a <<disk-based-shard-allocation,low disk watermark>> to ensure data
			
 
				+nodes have enough disk space for incoming shards. By default, {es} does not
			
 
				+allocate shards to nodes using more than 85% of disk space.
			
 
				+
			
 
				+To check the current disk space of your nodes, use the <<cat-allocation,cat
			
 
				+allocation API>>.
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+GET _cat/allocation?v=true&h=node,shards,disk.*
			
 
				+----
			
 
				+
			
 
				+If your nodes are running low on disk space, you have a few options:
			
 
				+
			
 
				+* Upgrade your nodes to increase disk space.
			
 
				+
			
 
				+* Delete unneeded indices to free up space. If you use {ilm-init}, you can
			
 
				+update your lifecycle policy to use <<ilm-searchable-snapshot,searchable
			
 
				+snapshots>> or add a delete phase. If you no longer need to search the data, you
			
 
				+can use a <<snapshot-restore,snapshot>> to store it off-cluster.
			
 
				+
			
 
				+* If you no longer write to an index, use the <<indices-forcemerge,force merge
			
 
				+API>> or {ilm-init}'s <<ilm-forcemerge,force merge action>> to merge its
			
 
				+segments into larger ones.
			
 
				++
			
 
				+[source,console]
			
 
				+----
			
 
				+POST my-index/_forcemerge
			
 
				+----
			
 
				+// TEST[s/^/PUT my-index\n/]
			
 
				+
			
 
				+* If an index is read-only, use the <<indices-shrink-index,shrink index API>> or
			
 
				+{ilm-init}'s <<ilm-shrink,shrink action>> to reduce its primary shard count.
			
 
				++
			
 
				+[source,console]
			
 
				+----
			
 
				+POST my-index/_shrink/my-shrunken-index
			
 
				+----
			
 
				+// TEST[s/^/PUT my-index\n{"settings":{"index.number_of_shards":2,"blocks.write":true}}\n/]
			
 
				+
			
 
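+* If your node has a large disk capacity, you can increase the low disk
+watermark or set it to an explicit byte value. A byte value sets the minimum
+amount of free disk space a node needs to receive shards, rather than a
+maximum percentage of disk used.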
			
 
				++
			
 
				+[source,console]
			
 
				+----
			
 
				+PUT _cluster/settings
			
 
				+{
			
 
				+  "persistent": {
			
 
				+    "cluster.routing.allocation.disk.watermark.low": "30gb"
			
 
				+  }
			
 
				+}
			
 
				+----
			
 
				+// TEST[s/"30gb"/null/]
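The low watermark check can be expressed in a few lines, which may help when choosing between a percentage and a byte value. A simplified Python sketch, assuming the `disk.percent` and `disk.avail` columns from the cat allocation API (for example, with `format=json`) and a watermark written either as a percentage or as a byte value such as `30gb`. Both functions are hypothetical helpers, and the check ignores the high and flood-stage watermarks and every other allocation decider.

```python
import re

# Binary multiples, as used by Elasticsearch byte size units.
BYTE_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3,
              "tb": 1024**4, "pb": 1024**5}


def parse_bytes(value):
    """Parse a byte size such as "30gb" or "12.5kb" into bytes."""
    match = re.fullmatch(r"([\d.]+)\s*([kmgtp]?b)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized byte size: {value!r}")
    return float(match.group(1)) * BYTE_UNITS[match.group(2)]


def passes_low_watermark(disk_percent, disk_avail, low="85%"):
    """True if the low watermark alone would still let the node receive shards."""
    if low.endswith("%"):
        # Percentage watermark: a maximum share of disk used.
        return float(disk_percent) <= float(low[:-1])
    # Byte-value watermark: a minimum amount of free disk space.
    return parse_bytes(disk_avail) >= parse_bytes(low)


print(passes_low_watermark("90", "100gb"))              # False: over 85% used
print(passes_low_watermark("90", "100gb", low="30gb"))  # True: 100gb free
```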
			
 
				+
			
 
				+**Reduce JVM memory pressure**
			
 
				+
			
 
				+Shard allocation requires JVM heap memory. High JVM memory pressure can trigger
			
 
				+<<circuit-breaker,circuit breakers>> that stop allocation and leave shards
			
 
				+unassigned. See <<high-jvm-memory-pressure>>.
			
 
				+
			
 
				+**Recover data for a lost primary shard**
			
 
				+
			
 
				+If a node containing a primary shard is lost, {es} can typically replace it
			
 
				+using a replica on another node. If you can't recover the node and replicas
			
 
				+don't exist or are irrecoverable, you'll need to re-add the missing data from a
			
 
				+<<snapshot-restore,snapshot>> or the original data source.
			
 
				+
			
 
				+WARNING: Only use this option if node recovery is no longer possible. This
			
 
				+process allocates an empty primary shard. If the node later rejoins the cluster,
			
 
				+{es} will overwrite its primary shard with data from this newer empty shard,
			
 
				+resulting in data loss.
			
 
				+
			
 
				+Use the <<cluster-reroute,cluster reroute API>> to manually allocate the
			
 
				+unassigned primary shard to another data node in the same tier. Set
			
 
				+`accept_data_loss` to `true`.
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+POST _cluster/reroute
			
 
				+{
			
 
				+  "commands": [
			
 
				+    {
			
 
				+      "allocate_empty_primary": {
			
 
				+        "index": "my-index",
			
 
				+        "shard": 0,
			
 
				+        "node": "my-node",
			
 
+        "accept_data_loss": true
			
 
				+      }
			
 
				+    }
			
 
				+  ]
			
 
				+}
			
 
				+----
			
 
				+// TEST[s/^/PUT my-index\n/]
			
 
				+// TEST[catch:bad_request]
			
 
				+
			
 
				+If you backed up the missing index data to a snapshot, use the
			
 
				+<<restore-snapshot-api,restore snapshot API>> to restore the individual index.
			
 
				+Alternatively, you can index the missing data from the original data source.
			
--- a/docs/reference/tab-widgets/jvm-memory-pressure-widget.asciidoc
+++ b/docs/reference/tab-widgets/jvm-memory-pressure-widget.asciidoc
@@ -0,0 +1,40 @@
 
				+++++
			
 
				+<div class="tabs" data-tab-group="host">
			
 
				+  <div role="tablist" aria-label="Check JVM memory pressure">
			
 
				+    <button role="tab"
			
 
				+            aria-selected="true"
			
 
				+            aria-controls="cloud-tab"
			
 
				+            id="cloud-jvm">
			
 
				+      Elasticsearch Service
			
 
				+    </button>
			
 
				+    <button role="tab"
			
 
				+            aria-selected="false"
			
 
				+            aria-controls="self-managed-tab"
			
 
				+            id="self-managed-jvm"
			
 
				+            tabindex="-1">
			
 
				+      Self-managed
			
 
				+    </button>
			
 
				+  </div>
			
 
				+  <div tabindex="0"
			
 
				+       role="tabpanel"
			
 
				+       id="cloud-tab"
			
 
				+       aria-labelledby="cloud-jvm">
			
 
				+++++
			
 
				+
			
 
				+include::jvm-memory-pressure.asciidoc[tag=cloud]
			
 
				+
			
 
				+++++
			
 
				+  </div>
			
 
				+  <div tabindex="0"
			
 
				+       role="tabpanel"
			
 
				+       id="self-managed-tab"
			
 
				+       aria-labelledby="self-managed-jvm"
			
 
				+       hidden="">
			
 
				+++++
			
 
				+
			
 
				+include::jvm-memory-pressure.asciidoc[tag=self-managed]
			
 
				+
			
 
				+++++
			
 
				+  </div>
			
 
				+</div>
			
 
				+++++
			
--- a/docs/reference/tab-widgets/jvm-memory-pressure.asciidoc
+++ b/docs/reference/tab-widgets/jvm-memory-pressure.asciidoc
@@ -0,0 +1,28 @@
 
				+// tag::cloud[]
			
 
				+From your deployment menu, click **Elasticsearch**. Under **Instances**, each
			
 
				+instance displays a **JVM memory pressure** indicator. When the JVM memory
			
 
				+pressure reaches 75%, the indicator turns red.
			
 
				+
			
 
				+You can also use the <<cluster-nodes-stats,nodes stats API>> to calculate the
			
 
				+current JVM memory pressure for each node.
			
 
				+
			
 
				+// tag::jvm-memory-cat-nodes[]
			
 
				+[source,console]
			
 
				+----
			
 
				+GET _nodes/stats?filter_path=nodes.*.jvm.mem.pools.old
			
 
				+----
			
 
				+
			
 
				+Use the response to calculate memory pressure as follows:
			
 
				+
			
 
				+JVM Memory Pressure = `used_in_bytes` / `max_in_bytes`
			
 
				+// end::jvm-memory-cat-nodes[]
			
 
				+
			
 
				+// end::cloud[]
			
 
				+
			
 
				+// tag::self-managed[]
			
 
				+To calculate the current JVM memory pressure for each node, use the
			
 
				+<<cluster-nodes-stats,nodes stats API>>.
			
 
				+
			
 
				+include::jvm-memory-pressure.asciidoc[tag=jvm-memory-cat-nodes]
			
 
				+
			
 
				+// end::self-managed[]