[[diagnostic]]
== Capturing diagnostics
++++
<titleabbrev>Capture diagnostics</titleabbrev>
++++
:keywords: Elasticsearch diagnostic, diagnostics

The {es} https://github.com/elastic/support-diagnostics[Support Diagnostic] tool captures a point-in-time snapshot of cluster statistics and most settings.
It works against all {es} versions.

This information can be used to troubleshoot problems with your cluster. For examples of issues that you can troubleshoot using Support Diagnostic tool output, refer to https://www.elastic.co/blog/why-does-elastic-support-keep-asking-for-diagnostic-files[the Elastic blog].

You can generate diagnostic information using this tool before you contact https://support.elastic.co[Elastic Support] or
https://discuss.elastic.co[Elastic Discuss] to minimize turnaround time.

See https://www.youtube.com/watch?v=Bb6SaqhqYHw[this video] for a walkthrough of capturing an {es} diagnostic.

[discrete]
[[diagnostic-tool-requirements]]
=== Requirements

- Java Runtime Environment or Java Development Kit v1.8 or higher
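
You can check this requirement before you download the tool. The following is a minimal sketch and not part of the tool itself: the helper names `java_major` and `java_ok` are hypothetical, and the parsing assumes the usual `java -version` output, which reports Java 8 as `1.8.x` and later releases as `9`, `11.0.x`, and so on.

[source,sh]
----
#!/bin/sh
# Extract the major Java version from a version string such as
# "1.8.0_292" (legacy scheme, major is 8) or "17.0.2" (major is 17).
java_major() {
  ver=$1
  case "$ver" in
    1.*) ver=${ver#1.} ;;            # legacy "1.x" scheme: drop the leading "1."
  esac
  printf '%s\n' "${ver%%[!0-9]*}"    # keep only the leading digits
}

# Succeed when the given version string satisfies the v1.8 minimum.
java_ok() {
  major=$(java_major "$1")
  [ -n "$major" ] && [ "$major" -ge 8 ]
}

# Typical use: read the version that `java -version` prints to stderr.
if command -v java >/dev/null 2>&1; then
  installed=$(java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)
  if java_ok "$installed"; then
    echo "Java $installed meets the requirement"
  else
    echo "Java $installed does not meet the v1.8 minimum"
  fi
else
  echo "No java executable found on PATH"
fi
----
// NOTCONSOLE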

[discrete]
[[diagnostic-tool-access]]
=== Access the tool

The Support Diagnostic tool is included as a sub-library in some Elastic deployments:

* {ece}: Located under **{ece}** > **Deployment** > **Operations** >
**Prepare Bundle** > **{es}**.
* {eck}: Run as https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-take-eck-dump.html[`eck-diagnostics`].

You can also directly download the `diagnostics-X.X.X-dist.zip` file for the latest Support Diagnostic release
from https://github.com/elastic/support-diagnostics/releases/latest[the `support-diagnostics` repo].

[discrete]
[[diagnostic-capture]]
=== Capture diagnostic information

To capture an {es} diagnostic:

. In a terminal, verify that your network and user permissions are sufficient to connect to your {es}
cluster by polling the cluster's <<cluster-health,health>>.
+
For example, with the parameters `host:localhost`, `port:9200`, and `username:elastic`, you'd use the following curl request, which prompts for the password:
+
[source,sh]
----
curl -X GET -k -u elastic https://localhost:9200/_cluster/health
----
// NOTCONSOLE
+
If you receive an HTTP 200 `OK` response, then you can proceed to the next step. If you receive a different
response code, then <<diagnostic-non-200,diagnose the issue>> before proceeding.
. Using the same environment parameters, run the diagnostic tool script.
+
For information about the parameters that you can pass to the tool, refer to the https://github.com/elastic/support-diagnostics#standard-options[diagnostic parameter reference].
+
The following command options are recommended:
+
**Unix-based systems**
+
[source,sh]
----
sudo ./diagnostics.sh --type local --host localhost --port 9200 -u elastic -p --bypassDiagVerify --ssl --noVerify
----
+
**Windows**
+
[source,sh]
----
.\diagnostics.bat --type local --host localhost --port 9200 -u elastic -p --bypassDiagVerify --ssl --noVerify
----
+
[TIP]
.Script execution modes
====
You can execute the script in three https://github.com/elastic/support-diagnostics#diagnostic-types[modes]:

* `local` (default, recommended): Polls the <<rest-apis,{es} API>>,
gathers operating system info, and captures cluster and GC logs.
* `remote`: Establishes an SSH session
to the applicable target server to pull the same information as `local`.
* `api`: Polls the <<rest-apis,{es} API>>. All other data must be
collected manually.
====
. When the script has completed, verify that no errors were logged to `diagnostic.log`.
If the log file contains errors, then refer to <<diagnostic-log-errors,Diagnose errors in `diagnostic.log`>>.
. If the script completed without errors, then an archive with the format `<diagnostic type>-diagnostics-<DateTimeStamp>.zip` is created in the working directory or in the output directory that you specified. You can review or share the diagnostic archive as needed.
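
The last two steps can be scripted. The following sketch is illustrative only: `check_diag_run` is a hypothetical helper, it assumes that `diagnostic.log` and the archive are in the same directory, and it treats any line containing the word `error` (in any letter case) as a logged error, which is a heuristic rather than the tool's own definition.

[source,sh]
----
#!/bin/sh
# Check a finished diagnostic run in directory $1 (default: current directory).
# Prints the newest archive on success. Returns non-zero if the log contains
# errors or if no archive is found.
check_diag_run() {
  dir=${1:-.}
  log="$dir/diagnostic.log"   # assumed location; adjust to where your run wrote it

  if [ -f "$log" ] && grep -qi 'error' "$log"; then
    echo "Errors found in $log:" >&2
    grep -ni 'error' "$log" >&2
    return 1
  fi

  # Archive names follow <diagnostic type>-diagnostics-<DateTimeStamp>.zip.
  newest=$(ls -t "$dir"/*-diagnostics-*.zip 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "No diagnostic archive found in $dir" >&2
    return 1
  fi
  echo "$newest"
}
----
// NOTCONSOLE

For example, `check_diag_run /path/to/output` prints the path of the most recent archive in that directory, or lists the matching log lines and returns a non-zero status.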

[discrete]
[[diagnostic-non-200]]
=== Diagnose a non-200 cluster health response

When you poll your cluster health, if you receive any response other than `200 OK`, then the diagnostic tool
might not work as intended. The following are possible error codes and their resolutions:

HTTP 401 `UNAUTHENTICATED`::
Additional information in the error will usually indicate either
that your `username:password` pair is invalid, or that your `.security`
index is unavailable and you need to set up a temporary
<<file-realm,file-based realm>> user with `role:superuser` to authenticate.

HTTP 403 `UNAUTHORIZED`::
Your `username` is recognized but
has insufficient permissions to run the diagnostic. Either use a different
username or elevate the user's privileges.

HTTP 429 `TOO_MANY_REQUESTS` (for example, `circuit_breaking_exception`)::
Your username was authenticated and authorized, but the cluster is under
sufficiently high strain that it's not responding to API calls. These
responses are usually intermittent. You can proceed with running the diagnostic,
but the diagnostic results might be incomplete.

HTTP 504 `GATEWAY_TIMEOUT`::
Your network is experiencing issues reaching the cluster. You might be using a proxy or firewall.
Consider running the diagnostic tool from a different location, confirming your port, or using an IP
instead of a URL domain.

HTTP 503 `SERVICE_UNAVAILABLE` (for example, `master_not_discovered_exception`)::
Your cluster does not currently have an elected master node, which is
required for it to be API-responsive. This might be temporary while the master
node rotates. If the issue persists, then <<cluster-fault-detection,investigate the cause>>
before proceeding.
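
If you script the health check, the resolutions above can be expressed as a lookup on the status code. This is a sketch under stated assumptions: `explain_health_status` is a hypothetical helper, and the messages are condensed from the list above. It returns success only for the codes where this section says you can proceed.

[source,sh]
----
#!/bin/sh
# Map an HTTP status code from the cluster health request to a next step.
# Returns 0 when it is reasonable to run the diagnostic, 1 otherwise.
explain_health_status() {
  case "$1" in
    200) echo "OK: proceed with the diagnostic" ;;
    401) echo "Check the username:password pair, or use a temporary file-based realm superuser"; return 1 ;;
    403) echo "Use a different username or elevate the user's privileges"; return 1 ;;
    429) echo "Cluster under strain: you can proceed, but results might be incomplete" ;;
    503) echo "No elected master node: retry, and investigate if it persists"; return 1 ;;
    504) echo "Network issue: try another location, confirm the port, or use an IP"; return 1 ;;
    *)   echo "Unexpected status $1: resolve before running the diagnostic"; return 1 ;;
  esac
}

# One way to obtain the code is curl's --write-out option, for example:
#   code=$(curl -s -o /dev/null -w '%{http_code}' -k -u elastic https://localhost:9200/_cluster/health)
#   explain_health_status "$code"
explain_health_status 429
----
// NOTCONSOLE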

[discrete]
[[diagnostic-log-errors]]
=== Diagnose errors in `diagnostic.log`

The following are common errors that you might encounter when running the diagnostic tool:

* `Error: Could not find or load main class com.elastic.support.diagnostics.DiagnosticApp`
+
This indicates that you accidentally downloaded the source code file
instead of `diagnostics-X.X.X-dist.zip` from the releases page.

* `Could not retrieve the Elasticsearch version due to a system or network error - unable to continue.`
+
This indicates that the diagnostic couldn't run commands against the cluster.
Poll the cluster's health again, and ensure that you're using the same parameters
when you run the diagnostic batch or shell file.

* A `security_exception` that includes `is unauthorized for user`:
+
The provided user has insufficient admin permissions to run the diagnostic tool. Use another
user, or grant the user `role:superuser` privileges.
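
To triage a log quickly, you can search it for the three messages listed above. The following is a minimal sketch: `diagnose_log` is a hypothetical helper that matches fixed substrings of those messages and prints a condensed form of the matching resolution. It does not recognize any other errors.

[source,sh]
----
#!/bin/sh
# Print a hint for each known error found in the given log file.
# Returns 0 if at least one known error matched, 1 otherwise.
diagnose_log() {
  log=$1
  found=1
  if grep -qF 'Could not find or load main class' "$log"; then
    echo "Download diagnostics-X.X.X-dist.zip from the releases page, not the source code"
    found=0
  fi
  if grep -qF 'Could not retrieve the Elasticsearch version' "$log"; then
    echo "Poll the cluster health again and reuse the same connection parameters"
    found=0
  fi
  if grep -qF 'is unauthorized for user' "$log"; then
    echo "Use another user, or grant the user superuser privileges"
    found=0
  fi
  return "$found"
}
----
// NOTCONSOLE

For example, `diagnose_log diagnostic.log` prints one hint per matched message and returns a non-zero status when none of the known messages appear.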