
[[modules-network-threading-model]]
==== Networking threading model

This section describes the threading model used by the networking subsystem in
{es}. This information isn't required to use {es}, but it may be useful to
advanced users who are diagnosing network problems in a cluster.

{es} nodes communicate over a collection of TCP channels that together form a
transport connection. {es} clients communicate with the cluster over HTTP,
which also uses one or more TCP channels. Each of these TCP channels is owned
by exactly one of the `transport_worker` threads in the node. This owning
thread is chosen when the channel is opened and remains the same for the
lifetime of the channel.

Each `transport_worker` thread has sole responsibility for sending and
receiving data over the channels it owns. One of the `transport_worker`
threads is also responsible for accepting new incoming transport connections,
and one is responsible for accepting new HTTP connections.

If a thread in {es} wants to send data over a particular channel, it passes
the data to the owning `transport_worker` thread for the actual transmission.
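
Sketched below is a minimal Java model of this handoff (hypothetical types,
not Elasticsearch's actual implementation, which is built on Netty event
loops): each channel keeps a reference to the single-threaded executor that
owns it, and every write, from whichever thread it originates, runs on that
owner.

[source,java]
----
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of channel ownership: one thread performs all the I/O
// for the channels assigned to it.
class OwnedChannel {
    private final ExecutorService owner = Executors.newSingleThreadExecutor(
        r -> new Thread(r, "transport_worker[T#1]"));

    void send(byte[] data) {
        // Any thread may call send(), but the actual transmission always
        // happens on the owning transport_worker thread.
        owner.execute(() -> transmit(data));
    }

    private void transmit(byte[] data) {
        System.out.println(Thread.currentThread().getName()
            + " sending " + data.length + " bytes");
    }
}
----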

Normally the `transport_worker` threads will not completely handle the messages
they receive. Instead, they will do a small amount of preliminary processing
and then dispatch (hand off) each message to a different
<<modules-threadpool,threadpool>> for the rest of its handling. For instance,
bulk messages are dispatched to the `write` threadpool, searches are dispatched
to one of the `search` threadpools, and requests for statistics and other
management tasks are mostly dispatched to the `management` threadpool. However,
in some cases the processing of a message is expected to be so quick that {es}
will do all of the processing on the `transport_worker` thread rather than
incur the overhead of dispatching it elsewhere.
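
The decision has roughly the shape of the following sketch (hypothetical names
such as `Message` and `isCheap`, not the real {es} dispatch code): the
`transport_worker` thread decodes enough of the message to choose a
destination, then either handles it inline or hands it off.

[source,java]
----
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of the dispatch decision made on a transport_worker
// thread after it reads a message from one of its channels.
class MessageDispatch {
    private final ExecutorService writePool = Executors.newFixedThreadPool(8);

    interface Message {
        boolean isCheap();

        void process();
    }

    void handleInbound(Message message) {
        if (message.isCheap()) {
            // Fast path: processing is so quick that dispatching elsewhere
            // would cost more than doing the work right here.
            message.process();
        } else {
            // Hand off to a dedicated threadpool, e.g. a bulk request going
            // to the write pool, freeing the transport_worker immediately.
            writePool.execute(message::process);
        }
    }
}
----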

By default, there is one `transport_worker` thread per CPU. In contrast, there
may sometimes be tens of thousands of TCP channels. If data arrives on a TCP
channel and its owning `transport_worker` thread is busy, the data isn't
processed until the thread finishes whatever it is doing. Similarly, outgoing
data are not sent over a channel until the owning `transport_worker` thread is
free. This means that we require every `transport_worker` thread to be idle
frequently. An idle `transport_worker` looks something like this in a stack
dump:

[source,text]
----
"elasticsearch[instance-0000000004][transport_worker][T#1]" #32 daemon prio=5 os_prio=0 cpu=9645.94ms elapsed=501.63s tid=0x00007fb83b6307f0 nid=0x1c4 runnable [0x00007fb7b8ffe000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPoll.wait(java.base@17.0.2/Native Method)
        at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@17.0.2/EPollSelectorImpl.java:118)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@17.0.2/SelectorImpl.java:129)
        - locked <0x00000000c443c518> (a sun.nio.ch.Util$2)
        - locked <0x00000000c38f7700> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(java.base@17.0.2/SelectorImpl.java:146)
        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:813)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.lang.Thread.run(java.base@17.0.2/Thread.java:833)
----

In the <<cluster-nodes-hot-threads>> API an idle `transport_worker` thread is
reported like this:

[source,text]
----
   100.0% [cpu=0.0%, other=100.0%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000004][transport_worker][T#1]'
     10/10 snapshots sharing following 9 elements
       java.base@17.0.2/sun.nio.ch.EPoll.wait(Native Method)
       java.base@17.0.2/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:118)
       java.base@17.0.2/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)
       java.base@17.0.2/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)
       io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:813)
       io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
       io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
       io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
       java.base@17.0.2/java.lang.Thread.run(Thread.java:833)
----

Note that `transport_worker` threads should always be in state `RUNNABLE`, even
when waiting for input, because they block in the native `EPoll#wait` method.
This means the hot threads API will report these threads at 100% overall
utilisation. This is normal, and the breakdown of time into `cpu=` and `other=`
fractions shows how much time the thread spent running and waiting for input
respectively.
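
The following minimal selector loop (plain Java NIO, not Elasticsearch code)
shows why: `Selector#select` blocks in native `epoll_wait` on Linux, and the
JVM reports a thread parked in native code as `RUNNABLE`, not `WAITING`.

[source,java]
----
import java.io.IOException;
import java.nio.channels.Selector;

// A bare-bones event loop. While this thread waits inside select(), a stack
// dump shows it RUNNABLE in sun.nio.ch.EPoll.wait, just like an idle
// transport_worker.
public class SelectorLoop {
    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            while (!Thread.currentThread().isInterrupted()) {
                int ready = selector.select(1000); // blocks up to 1s in epoll_wait
                System.out.println(ready + " channels ready");
            }
        }
    }
}
----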

If a `transport_worker` thread is not frequently idle, it may build up a
backlog of work. This can cause delays in processing messages on the channels
that it owns. It's hard to predict exactly which work will be delayed:

* There are many more channels than threads. If work related to one channel is
causing delays to its worker thread, all other channels owned by that thread
will also suffer delays.

* The mapping from TCP channels to worker threads is fixed but arbitrary. Each
channel is assigned an owning thread in a round-robin fashion when the channel
is opened. Each worker thread is responsible for many different kinds of
channel.

* There are many channels open between each pair of nodes. For each request,
{es} will choose from the appropriate channels in a round-robin fashion, as
sketched after this list. Some requests may end up on a channel owned by a
delayed worker while other identical requests will be sent on a channel that's
working smoothly.
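
A round-robin pick over the channels open to a node might look like this
sketch (hypothetical names, not the real {es} implementation); note that the
picker is oblivious to whether a channel's owning worker is backed up.

[source,java]
----
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: choose one of the channels open to a target node in
// round-robin order, regardless of how busy each channel's owner is.
class ChannelPicker {
    private final List<String> channelsToNode;
    private final AtomicInteger counter = new AtomicInteger();

    ChannelPicker(List<String> channelsToNode) {
        this.channelsToNode = channelsToNode;
    }

    String pick() {
        // floorMod keeps the index non-negative even after counter overflow.
        return channelsToNode.get(
            Math.floorMod(counter.getAndIncrement(), channelsToNode.size()));
    }
}
----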

If the backlog builds up too far, some messages may be delayed by many seconds.
The node might even <<cluster-fault-detection,fail its health checks>> and be
removed from the cluster. Sometimes, you can find evidence of busy
`transport_worker` threads using the <<cluster-nodes-hot-threads>> API.
However, this API itself sends network messages, so it may not work correctly
if the `transport_worker` threads are too busy. It is more reliable to use
`jstack` to obtain stack dumps or use Java Flight Recorder to obtain a
profiling trace. These tools are independent of any work the JVM is performing.