
[[fix-watermark-errors]]
=== Fix watermark errors
++++
<titleabbrev>Watermark errors</titleabbrev>
++++
:keywords: {es}, high watermark, low watermark, full disk, flood stage watermark

When a data node is critically low on disk space and has reached the
<<cluster-routing-flood-stage,flood-stage disk usage watermark>>, the following
error is logged: `Error: disk usage exceeded flood-stage watermark, index has read-only-allow-delete block`.

To prevent a full disk, when a node reaches this watermark, {es} <<index-block-settings,blocks writes>>
to any index with a shard on the node. If the block affects related system
indices, {kib} and other {stack} features may become unavailable. For example,
this could induce {kib}'s `Kibana Server is not Ready yet`
{kibana-ref}/access.html#not-ready[error message].

{es} will automatically remove the write block when the affected node's disk
usage falls below the <<cluster-routing-watermark-high,high disk watermark>>.
To achieve this, {es} attempts to rebalance some of the affected node's shards
to other nodes in the same data tier.

****
If you're using Elastic Cloud Hosted, you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, and real-time issue detection and resolution paths. For more information, refer to https://www.elastic.co/guide/en/cloud/current/ec-autoops.html[Monitor with AutoOps].
****

[[fix-watermark-errors-rebalance]]
==== Monitor rebalancing

To verify that shards are moving off the affected node until it falls below the high
watermark, use the <<cat-shards,cat shards API>> and <<cat-recovery,cat recovery API>>:

[source,console]
----
GET _cat/shards?v=true
GET _cat/recovery?v=true&active_only=true
----
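
To also check each node's disk usage against the watermarks while shards rebalance, you can use the <<cat-allocation,cat allocation API>>:

[source,console]
----
GET _cat/allocation?v=true
----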

If shards remain on the node, keeping it above the high watermark, use the
<<cluster-allocation-explain,cluster allocation explanation API>> to get an
explanation of their allocation status:

[source,console]
----
GET _cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": false
}
----
// TEST[s/^/PUT my-index\n/]
// TEST[s/"primary": false,/"primary": false/]

[[fix-watermark-errors-temporary]]
==== Temporary relief

To immediately restore write operations, you can temporarily increase the
<<disk-based-shard-allocation,disk watermarks>> and remove the
<<index-block-settings,write block>>:

[source,console]
----
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.low.max_headroom": "100GB",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.high.max_headroom": "20GB",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": "5GB",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": "5GB"
  }
}

PUT */_settings?expand_wildcards=all
{
  "index.blocks.read_only_allow_delete": null
}
----
// TEST[s/^/PUT my-index\n/]

When a long-term solution is in place, reset or reconfigure the disk watermarks:

[source,console]
----
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": null,
    "cluster.routing.allocation.disk.watermark.low.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.high": null,
    "cluster.routing.allocation.disk.watermark.high.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": null
  }
}
----

[[fix-watermark-errors-resolve]]
==== Resolve

To resolve watermark errors permanently, perform one of the following actions:

* Horizontally scale nodes of the affected <<data-tiers,data tiers>>.
* Vertically scale existing nodes to increase disk space.
* Delete indices using the <<indices-delete-index,delete index API>>, either
permanently if the index isn't needed, or temporarily to later
<<snapshots-restore-snapshot,restore>>.
* Update the related <<index-lifecycle-management,ILM policy>> to push indices
through to later <<data-tiers,data tiers>>.
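
For example, assuming the affected index is already captured in a snapshot, a delete-then-restore flow might look like the following. The index, repository, and snapshot names here are placeholders:

[source,console]
----
DELETE my-index

POST _snapshot/my_repository/my_snapshot/_restore
{
  "indices": "my-index"
}
----

Run the restore only after disk space has been freed or expanded, or the restored shards will hit the same watermark.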

TIP: On {ess} and {ece}, indices may need to be temporarily deleted via
the {cloud}/ec-api-console.html[Elasticsearch API Console] and later
restored from a <<snapshots-restore-snapshot,snapshot>> in order to resolve a
<<cluster-health,cluster health>> `status:red`, which blocks
{cloud}/ec-activity-page.html[attempted changes]. If you experience issues
with this resolution flow on {ess}, contact
https://support.elastic.co[Elastic Support] for assistance.

[discrete]
[[fix-watermark-errors-prevent]]
=== Prevent watermark errors

To avoid watermark errors in the future, perform one of the following actions:

* If you're using {ess}, {ece}, or {eck}: Enable <<xpack-autoscaling,autoscaling>>.
* Set up {kibana-ref}/kibana-alerts.html[stack monitoring alerts] on top of
<<monitor-elasticsearch-cluster,{es} monitoring>> to be notified before
the flood-stage watermark is reached.
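
You can also confirm which watermark settings are currently in effect, including defaults, with the <<cluster-get-settings,cluster get settings API>>:

[source,console]
----
GET _cluster/settings?flat_settings=true&include_defaults=true&filter_path=*.cluster.routing.allocation.disk.watermark*
----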