tag::troubleshooting-network-timeouts-gc-vm[]
* GC pauses are recorded in the GC logs that {es} emits by default, and also
usually by the `JvmMonitorService` in the main node logs. Use these logs to
confirm whether or not GC is resulting in delays.
* VM pauses also affect other processes on the same host. A VM pause also
typically causes a discontinuity in the system clock, which {es} will report in
its logs.
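+
For example, the following is a minimal sketch of how you might check for
reported pauses. It assumes a default package installation with logs under
`/var/log/elasticsearch`; adjust the paths for your environment.
+
[source,sh]
----
# Look for pause reports from the JVM monitor service in the main node log.
grep "MonitorService" /var/log/elasticsearch/*.log

# Review the GC log that Elasticsearch writes by default.
less /var/log/elasticsearch/gc.log
----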
end::troubleshooting-network-timeouts-gc-vm[]
tag::troubleshooting-network-timeouts-packet-capture-elections[]
* Packet captures will reveal system-level and network-level faults, especially
if you capture the network traffic simultaneously at all relevant nodes. You
should be able to observe any retransmissions, packet loss, or other delays on
the connections between the nodes.
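+
For example, `tcpdump` is one way to record the transport traffic on each node
while the problem reproduces. The interface, output file, and port below are
assumptions (9300 is the default transport port); adjust them for your
environment.
+
[source,sh]
----
# Capture transport-layer traffic on this node until interrupted.
tcpdump -i any -w /tmp/es-transport.pcap tcp port 9300
----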
end::troubleshooting-network-timeouts-packet-capture-elections[]
tag::troubleshooting-network-timeouts-packet-capture-fault-detection[]
* Packet captures will reveal system-level and network-level faults, especially
if you capture the network traffic simultaneously at the elected master and the
faulty node. The connection used for follower checks is not used for any other
traffic so it can be easily identified from the flow pattern alone, even if TLS
is in use: almost exactly every second there will be a few hundred bytes sent
each way, first the request by the master and then the response by the
follower. You should be able to observe any retransmissions, packet loss, or
other delays on such a connection.
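+
For example, a capture such as the following, run on both the elected master
and the faulty node at the same time, records only the traffic exchanged
between them. The address and port are assumptions (`10.0.0.5` is a
hypothetical peer address and 9300 is the default transport port); adjust them
for your environment.
+
[source,sh]
----
# Capture only the transport traffic exchanged with the other node of interest.
tcpdump -i any -w /tmp/es-fault-detection.pcap tcp port 9300 and host 10.0.0.5
----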
end::troubleshooting-network-timeouts-packet-capture-fault-detection[]
tag::troubleshooting-network-timeouts-threads[]
* Long waits for particular threads to be available can be identified by taking
stack dumps (for example, using `jstack`) or a profiling trace (for example,
using Java Flight Recorder) in the few seconds leading up to the relevant log
message.
+
The <<cluster-nodes-hot-threads>> API sometimes yields useful information, but
bear in mind that this API also requires a number of `transport_worker` and
`generic` threads across all the nodes in the cluster. The API may be affected
by the very problem you're trying to diagnose. `jstack` is much more reliable
since it doesn't require any JVM threads.
+
The threads involved in discovery and cluster membership are mainly
`transport_worker` and `cluster_coordination` threads, for which there should
never be a long wait. There may also be evidence of long waits for threads in
the {es} logs. See <<modules-network-threading-model>> for more information.
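+
For example, a loop such as the following collects a series of stack dumps
around the time of interest. `$ES_PID` is a placeholder for the process ID of
the {es} JVM; substitute the real value (for example, as reported by `jps`).
+
[source,sh]
----
# Take a stack dump of the Elasticsearch JVM once a second for ten seconds.
for i in $(seq 1 10); do
  jstack "$ES_PID" > "jstack-$(date +%s).txt"
  sleep 1
done
----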
end::troubleshooting-network-timeouts-threads[]