
[[cat-health]]
== cat health

`health` is a terse, one-line representation of the same information
as `/_cluster/health`. It has one option, `ts`, to disable the
timestamping.

[source,sh]
--------------------------------------------------
% curl 192.168.56.10:9200/_cat/health
1384308967 18:16:07 foo green 3 3 3 3 0 0 0
% curl '192.168.56.10:9200/_cat/health?v&ts=0'
cluster status nodeTotal nodeData shards pri relo init unassign tasks
foo     green          3        3      3   3    0    0        0     0
--------------------------------------------------

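Because each line is just whitespace-separated columns, the output is easy to consume from scripts. The following is a minimal sketch, not part of Elasticsearch: a hypothetical `health_field` shell function that reads `?v` output on stdin and prints one column, looked up by its header name.

[source,sh]
--------------------------------------------------
# Hypothetical helper: print the value of one named column from
# `_cat/health?v` output read on stdin. The first line is treated as
# the header row; the column is located by name, not by position.
health_field() {
  awk -v want="$1" '
    NR == 1 {
      # Map each header name to its column number.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!(want in col)) { print "unknown column: " want > "/dev/stderr"; exit 1 }
      next
    }
    # Every later line is a data row: print the requested column.
    { print $(col[want]) }
  '
}

# Example against a live node (address taken from the examples above):
# curl -s '192.168.56.10:9200/_cat/health?v&ts=0' | health_field status
--------------------------------------------------

Looking the column up by name rather than by position means the helper keeps working if columns are added, at the cost of requiring the `v` header row.
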
A common use of this command is to verify that health is consistent
across nodes:

[source,sh]
--------------------------------------------------
% pssh -i -h list.of.cluster.hosts curl -s localhost:9200/_cat/health
[1] 20:20:52 [SUCCESS] es3.vm
1384309218 18:20:18 foo green 3 3 3 3 0 0 0 0
[2] 20:20:52 [SUCCESS] es1.vm
1384309218 18:20:18 foo green 3 3 3 3 0 0 0 0
[3] 20:20:52 [SUCCESS] es2.vm
1384309218 18:20:18 foo green 3 3 3 3 0 0 0 0
--------------------------------------------------

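This comparison can also be scripted. The sketch below assumes `pssh -i` output in the format shown above, and the `health_consistent` function is hypothetical. It skips pssh's `[n] ...` banner lines, drops the two timestamp columns (the requests may reach different nodes a second or so apart), and succeeds only if every remaining line is identical.

[source,sh]
--------------------------------------------------
# Hypothetical helper: read `pssh -i` output of `_cat/health` on stdin
# and exit 0 only if every node returned the same health line.
health_consistent() {
  awk '
    /^\[[0-9]+]/ { next }          # pssh banner line, e.g. "[1] ... es3.vm"
    NF == 0      { next }          # ignore blank lines
    {
      $1 = ""; $2 = ""             # drop epoch and HH:MM:SS columns
      sub(/^ +/, "")               # trim separators left behind
      seen[$0]++                   # count each distinct health line
    }
    END {
      n = 0
      for (k in seen) n++
      exit (n == 1 ? 0 : 1)        # consistent only if one distinct answer
    }
  '
}

# Usage:
# pssh -i -h list.of.cluster.hosts curl -s localhost:9200/_cat/health \
#   | health_consistent && echo consistent
--------------------------------------------------

Only replies that were actually received are compared; a host that fails to answer shows up as a pssh `[FAILURE]` banner, which this sketch skips, so check for those separately.
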
A less obvious use is to track the recovery of a large cluster over
time. With enough shards, starting a cluster, or even recovering after
losing a node, can take time (depending on your network and disk). One
way to track its progress is to run this command in a delayed loop:

[source,sh]
--------------------------------------------------
% while true; do curl 192.168.56.10:9200/_cat/health; sleep 120; done
1384309446 18:24:06 foo red 3 3 20 20 0 0 1812 0
1384309566 18:26:06 foo yellow 3 3 950 916 0 12 870 0
1384309686 18:28:06 foo yellow 3 3 1328 916 0 12 492 0
1384309806 18:30:06 foo green 3 3 1832 916 4 0 0
^C
--------------------------------------------------

In this scenario, recovery finished somewhere between four and six
minutes after the first sample: the cluster was still `yellow` at
18:28:06 and `green` by 18:30:06. If this were going on for hours, we
would be able to watch the number of unassigned shards (the `unassign`
column) drop. If that number remained static, it would suggest that
there is a problem.

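Spotting a static count can be automated. This is a minimal sketch with a hypothetical `watch_unassigned` function; it assumes timestamped output in the layout above, where `unassign` is the 11th column, and prints a warning whenever shards remain unassigned but the count has not dropped since the previous sample.

[source,sh]
--------------------------------------------------
# Hypothetical helper: read timestamped `_cat/health` lines (one per
# poll) on stdin and warn when the unassigned count stops falling.
watch_unassigned() {
  awk '
    NF < 11 { next }               # skip ^C or other short lines
    {
      cur = $11 + 0                # unassign column (with ts columns)
      if (have && cur > 0 && cur >= prev)
        printf "%s %s: unassign stuck at %d (previous sample %d)\n", $1, $2, cur, prev
      prev = cur; have = 1         # remember this sample for the next line
    }
  '
}
--------------------------------------------------

For example, capture the loop's output with `tee health.log` and run `watch_unassigned < health.log` (the file name is hypothetical). A single flat sample can be normal, for instance while large shards are still initializing, so treat a warning as a prompt to investigate rather than proof of a fault.
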
[float]
[[timestamp]]
=== Why the timestamp?

You typically use the `health` command when a cluster is
malfunctioning. During this period, it's extremely important to
correlate activities across log files, alerting systems, and so on.

There are two timestamp columns. The `HH:MM:SS` column is simply for
quick human consumption. The epoch time (seconds since the Unix epoch)
retains more information, including the date, and is machine-sortable
if your recovery spans days.
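
For example, the epoch column makes it simple to merge samples captured on several machines into one timeline. The sketch below is hypothetical (`health_timeline` is not an Elasticsearch tool) and assumes GNU coreutils `date`, which accepts `-d @EPOCH`; on BSD or macOS, `date -r EPOCH` is the equivalent.

[source,sh]
--------------------------------------------------
# Hypothetical helper: sort timestamped `_cat/health` lines by the
# epoch column and prefix each one with its full UTC date.
# Assumes GNU `date` (for `-d @EPOCH`).
health_timeline() {
  sort -n -k1,1 | while read -r epoch rest; do
    printf '%s %s %s\n' "$(date -u -d "@$epoch" '+%Y-%m-%dT%H:%M:%SZ')" "$epoch" "$rest"
  done
}

# Usage (file names hypothetical):
# cat node1-health.log node2-health.log | health_timeline
--------------------------------------------------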