
Add docs for filesystem health checks (#59134)

Documents the feature and settings introduced in #52680.

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
David Turner 5 years ago
commit c661a40083

+ 19 - 0
docs/reference/modules/discovery/discovery-settings.asciidoc

@@ -227,3 +227,22 @@ APIs are not be blocked and can run on any available node.
 ===============================
 
 --
+
+`monitor.fs.health.enabled`::
+
+    (<<cluster-update-settings,Dynamic>>, boolean) If `true`, the node runs
+    periodic <<cluster-fault-detection-filesystem-health,filesystem health
+    checks>>. Defaults to `true`.
+
+`monitor.fs.health.refresh_interval`::
+
+    (<<time-units, Time value>>) Interval between successive
+    <<cluster-fault-detection-filesystem-health,filesystem health checks>>.
+    Defaults to `2m`.
+
+`monitor.fs.health.slow_path_logging_threshold`::
+
+    (<<time-units, Time value>>) If a
+    <<cluster-fault-detection-filesystem-health,filesystem health check>>
+    takes longer than this threshold then {es} logs a warning. Defaults to
+    `5s`.

+ 7 - 0
docs/reference/modules/discovery/fault-detection.asciidoc

@@ -18,3 +18,10 @@ Similarly, if a node detects that the elected master has disconnected, this
 situation is treated as an immediate failure. The node bypasses the timeout and
 retry settings and restarts its discovery phase to try and find or elect a new
 master.
+
+[[cluster-fault-detection-filesystem-health]]
+Additionally, each node periodically verifies that its data path is healthy by
+writing a small file to disk and then deleting it again. If a node discovers
+its data path is unhealthy then it is removed from the cluster until the data
+path recovers. You can control this behavior with the
+<<modules-discovery-settings,`monitor.fs.health` settings>>.
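
Note: the write-then-delete probe described in the fault-detection hunk above can be sketched roughly as follows. This is an illustrative Python sketch only, not the Elasticsearch implementation: the function name `check_data_path`, the `.fs_health_` file prefix, and the return shape are hypothetical, and the 5-second default merely mirrors the documented `monitor.fs.health.slow_path_logging_threshold` default.

```python
import os
import tempfile
import time


def check_data_path(data_path, slow_threshold_s=5.0):
    """Probe data_path once; return (healthy, elapsed_seconds, slow).

    Hypothetical helper for illustration; names and return shape are assumptions.
    """
    start = time.monotonic()
    try:
        # Create a small temporary file inside the data path.
        fd, probe = tempfile.mkstemp(prefix=".fs_health_", dir=data_path)
        try:
            os.write(fd, b"probe")
            os.fsync(fd)  # force the write through to the disk
        finally:
            os.close(fd)
            os.remove(probe)  # delete the probe file again
        healthy = True
    except OSError:
        # Any I/O failure (missing path, read-only disk, ...) marks the path unhealthy.
        healthy = False
    elapsed = time.monotonic() - start
    # "slow" corresponds to the case where a warning would be logged.
    return healthy, elapsed, elapsed > slow_threshold_s
```

A caller would run such a probe on a timer (the documented default interval is `2m`) and treat `healthy == False` as the signal that the node should leave the cluster until a later probe succeeds.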