[[cat-health]]
== cat health

`health` is a terse, one-line representation of the same information
from `/_cluster/health`.

[source,js]
--------------------------------------------------
GET /_cat/health?v
--------------------------------------------------
// CONSOLE
// TEST[s/^/PUT twitter\n{"settings":{"number_of_replicas": 0}}\n/]

[source,txt]
--------------------------------------------------
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1475871424 16:17:04 elasticsearch green 1 1 1 1 0 0 0 0 - 100.0%
--------------------------------------------------
// TESTRESPONSE[s/1475871424 16:17:04/\\d+ \\d+:\\d+:\\d+/]
// TESTRESPONSE[s/elasticsearch/[^ ]+/ s/0 -/\\d+ (-|\\d+(\\.\\d+)?[ms]+)/ _cat]
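
Because the `v` parameter adds a header row, the output can be turned into
named fields without hard-coding column positions. The following is a minimal
sketch, not part of Elasticsearch: `parse_cat_health` is a hypothetical helper,
and it assumes that no column value (for example the cluster name) contains
whitespace.

```python
def parse_cat_health(text):
    """Parse `GET /_cat/health?v` output into a list of dicts, one per data row.

    Hypothetical helper: assumes whitespace-separated columns and a header row.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        values = line.split()
        if len(values) != len(header):
            raise ValueError(
                f"expected {len(header)} columns, got {len(values)}: {line!r}"
            )
        rows.append(dict(zip(header, values)))
    return rows


# Sample text copied from the response shown above.
sample = """\
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1475871424 16:17:04 elasticsearch green 1 1 1 1 0 0 0 0 - 100.0%
"""

health = parse_cat_health(sample)[0]
print(health["status"], health["unassign"], health["active_shards_percent"])
```

Since the field names come from the header, the same sketch should also work
on output that omits the timestamp columns.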

It has one option, `ts`, which disables the timestamp columns when set to `false`:

[source,js]
--------------------------------------------------
GET /_cat/health?v&ts=false
--------------------------------------------------
// CONSOLE
// TEST[s/^/PUT twitter\n{"settings":{"number_of_replicas": 0}}\n/]

which looks like:

[source,txt]
--------------------------------------------------
cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
elasticsearch green 1 1 1 1 0 0 0 0 - 100.0%
--------------------------------------------------
// TESTRESPONSE[s/elasticsearch/[^ ]+/ s/0 -/\\d+ (-|\\d+(\\.\\d+)?[ms]+)/ _cat]

A common use of this command is to verify the health is consistent
across nodes:

[source,sh]
--------------------------------------------------
% pssh -i -h list.of.cluster.hosts curl -s localhost:9200/_cat/health
[1] 20:20:52 [SUCCESS] es3.vm
1384309218 18:20:18 foo green 3 3 3 3 0 0 0 0
[2] 20:20:52 [SUCCESS] es1.vm
1384309218 18:20:18 foo green 3 3 3 3 0 0 0 0
[3] 20:20:52 [SUCCESS] es2.vm
1384309218 18:20:18 foo green 3 3 3 3 0 0 0 0
--------------------------------------------------
// NOTCONSOLE
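
With more than a handful of nodes, comparing the lines by eye gets tedious.
The check can be sketched as follows. This is an illustration only:
`health_views_agree` is a hypothetical helper, the host names and lines are
taken from the sample above, and it assumes the default output in which the
first two columns are the epoch and the timestamp. Those two columns are
ignored so that a one-second difference between nodes does not count as a
mismatch.

```python
def health_views_agree(lines_by_host):
    """Return (agree, views) for headerless `_cat/health` lines keyed by host.

    Hypothetical helper: drops the first two columns (epoch and timestamp,
    assuming the default output) and compares the remaining fields.
    """
    views = {
        host: tuple(line.split()[2:]) for host, line in lines_by_host.items()
    }
    agree = len(set(views.values())) <= 1
    return agree, views


# One line per node, as collected in the pssh example above.
outputs = {
    "es3.vm": "1384309218 18:20:18 foo green 3 3 3 3 0 0 0 0",
    "es1.vm": "1384309218 18:20:18 foo green 3 3 3 3 0 0 0 0",
    "es2.vm": "1384309218 18:20:18 foo green 3 3 3 3 0 0 0 0",
}

agree, views = health_views_agree(outputs)
print("consistent" if agree else "inconsistent")
```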

A less obvious use is to track recovery of a large cluster over
time. With enough shards, starting a cluster, or even recovering after
losing a node, can take time (depending on your network and disks). One way
to track its progress is to run this command in a delayed loop:

[source,sh]
--------------------------------------------------
% while true; do curl localhost:9200/_cat/health; sleep 120; done
1384309446 18:24:06 foo red 3 3 20 20 0 0 1812 0
1384309566 18:26:06 foo yellow 3 3 950 916 0 12 870 0
1384309686 18:28:06 foo yellow 3 3 1328 916 0 12 492 0
1384309806 18:30:06 foo green 3 3 1832 916 4 0 0
^C
--------------------------------------------------
// NOTCONSOLE

In this scenario, we can tell that recovery took roughly four to six minutes:
the cluster was `red` in the 18:24:06 sample, still `yellow` at 18:28:06, and
`green` by 18:30:06. If this were going on for hours, we would be able to
watch the number of unassigned shards (the `unassign` column) drop
precipitously. If that number remained static, it would suggest that there
is a problem.
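
The "remained static" check can also be automated. The sketch below is one
possible approach, not something Elasticsearch provides: `unassigned_series`
and `is_stalled` are hypothetical helpers, the sample lines are the first three
from the loop above, and the column index of `10` assumes the default
headerless output with both timestamp columns present.

```python
def unassigned_series(lines, column=10):
    """Extract (epoch, unassigned) pairs from headerless `_cat/health` lines.

    Assumption: column 10 is `unassign`, which holds for the default layout
    shown above (epoch and timestamp included).
    """
    series = []
    for line in lines:
        fields = line.split()
        series.append((int(fields[0]), int(fields[column])))
    return series


def is_stalled(series, window=3):
    """True when shards are still unassigned and the count has not dropped
    between the first and last of the most recent `window` samples."""
    if len(series) < window:
        return False
    recent = [count for _, count in series[-window:]]
    return recent[-1] > 0 and recent[-1] >= recent[0]


samples = [
    "1384309446 18:24:06 foo red 3 3 20 20 0 0 1812 0",
    "1384309566 18:26:06 foo yellow 3 3 950 916 0 12 870 0",
    "1384309686 18:28:06 foo yellow 3 3 1328 916 0 12 492 0",
]

series = unassigned_series(samples)
print(series[-1], "stalled" if is_stalled(series) else "progressing")
```

The `window` of three samples is an arbitrary choice; with the 120-second
sleep used above it corresponds to about four minutes without progress.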

[float]
[[timestamp]]
=== Why the timestamp?

You typically use the `health` command when a cluster is
malfunctioning. During this period, it's extremely important to
correlate activities across log files, alerting systems, etc.

The timestamp is given in two forms. The `HH:MM:SS` output is simply for quick
human consumption. The epoch time retains more information, including the
date, and is machine sortable if your recovery spans days.
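
To correlate a health line with timestamped log entries, the epoch column can
be converted to a full date. A minimal sketch using only the Python standard
library follows; `epoch_to_utc` is a hypothetical helper, and rendering in UTC
is an assumption made for illustration, since your logs may use a different
time zone.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch):
    """Convert the `epoch` column (seconds since 1970-01-01 UTC) to a datetime."""
    return datetime.fromtimestamp(int(epoch), tz=timezone.utc)


# Epoch values copied from the recovery loop earlier on this page,
# deliberately out of order to show that they sort numerically.
epochs = ["1384309686", "1384309446", "1384309566"]

for epoch in sorted(epochs, key=int):
    print(epoch, epoch_to_utc(epoch).isoformat())
```

The UTC rendering need not match the `HH:MM:SS` column of the same line, which
in the samples above appears to have been produced in a different time zone.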