[[modules-network-threading-model]]
==== Networking threading model

This section describes the threading model used by the networking subsystem in
{es}. This information isn't required to use {es}, but it may be useful to
advanced users who are diagnosing network problems in a cluster.

{es} nodes communicate over a collection of TCP channels that together form a
transport connection. {es} clients communicate with the cluster over HTTP,
which also uses one or more TCP channels. Each of these TCP channels is owned
by exactly one of the `transport_worker` threads in the node. This owning
thread is chosen when the channel is opened and remains the same for the
lifetime of the channel.
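
This fixed channel-to-thread ownership mirrors the event-loop model of Netty,
the networking library visible in the stack traces below. The following
minimal sketch is an illustration of that model, not {es}'s actual code: it
uses Netty's `NioEventLoopGroup` to show how each new channel is bound to one
event loop, which then owns it until it closes.

[source,java]
----
import io.netty.channel.EventLoop;
import io.netty.channel.nio.NioEventLoopGroup;

public class WorkerOwnershipSketch {
    public static void main(String[] args) throws Exception {
        // Four event loops stand in for a node's transport_worker threads.
        NioEventLoopGroup workers = new NioEventLoopGroup(4);
        // next() hands out event loops in round-robin order; once a channel
        // is registered with a loop it stays on that loop until it closes.
        for (int channel = 0; channel < 8; channel++) {
            EventLoop owner = workers.next();
            System.out.println("channel " + channel + " -> " + owner);
        }
        workers.shutdownGracefully().sync();
    }
}
----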

Each `transport_worker` thread has sole responsibility for sending and
receiving data over the channels it owns. Additionally, each HTTP and transport
server socket is assigned to one of the `transport_worker` threads. That worker
has the responsibility of accepting new incoming connections to the server
socket it owns.

If a thread in {es} wants to send data over a particular channel, it passes the
data to the owning `transport_worker` thread for the actual transmission.
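
In Netty terms this hand-off can be pictured as scheduling the write on the
channel's owning event loop. The sketch below is illustrative only; in real
Netty code `writeAndFlush` already performs this hand-off internally when
called from a thread other than the owner.

[source,java]
----
import io.netty.buffer.ByteBuf;
import io.netty.channel.Channel;

final class SendSketch {
    // Any thread may call this. If the caller isn't the channel's owning
    // event loop, the write is queued for that loop to perform instead.
    static void send(Channel channel, ByteBuf message) {
        if (channel.eventLoop().inEventLoop()) {
            channel.writeAndFlush(message); // already on the owner thread
        } else {
            channel.eventLoop().execute(() -> channel.writeAndFlush(message));
        }
    }
}
----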

Normally the `transport_worker` threads will not completely handle the messages
they receive. Instead, they will do a small amount of preliminary processing
and then dispatch (hand off) each message to a different
<<modules-threadpool,threadpool>> for the rest of its handling. For instance,
bulk messages are dispatched to the `write` threadpool, searches are dispatched
to one of the `search` threadpools, and requests for statistics and other
management tasks are mostly dispatched to the `management` threadpool. However,
in some cases the processing of a message is expected to be so quick that {es}
will do all of the processing on the `transport_worker` thread rather than
incur the overhead of dispatching it elsewhere.
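
The pattern is essentially "do cheap preliminary work on the network thread,
execute everything else elsewhere". Here is a minimal, self-contained sketch of
that idea using plain `java.util.concurrent` executors; the pool sizes, the
`Request` type, and the handler are hypothetical stand-ins, not {es} internals.

[source,java]
----
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DispatchSketch {
    enum Kind { BULK, STATS, PING }
    record Request(Kind kind, String payload) {}

    // Stand-ins for the node's `write` and `management` threadpools.
    private final ExecutorService writePool = Executors.newFixedThreadPool(8);
    private final ExecutorService managementPool = Executors.newFixedThreadPool(2);

    // Called on a transport_worker thread when a message arrives: do only
    // cheap preliminary work here, then hand off to the appropriate pool.
    void onMessage(Request request) {
        switch (request.kind()) {
            case BULK -> writePool.execute(() -> handle(request));
            case STATS -> managementPool.execute(() -> handle(request));
            // Work expected to be very quick stays on the worker thread.
            case PING -> handle(request);
        }
    }

    private void handle(Request request) {
        System.out.println(Thread.currentThread().getName() + " handled " + request);
    }
}
----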

By default, there is one `transport_worker` thread per CPU. In contrast, there
may sometimes be tens of thousands of TCP channels. If data arrives on a TCP
channel and its owning `transport_worker` thread is busy, the data isn't
processed until the thread finishes whatever it is doing. Similarly, outgoing
data isn't sent over a channel until the owning `transport_worker` thread is
free. This means that every `transport_worker` thread must frequently be idle.
An idle `transport_worker` looks something like this in a stack dump:

[source,text]
----
"elasticsearch[instance-0000000004][transport_worker][T#1]" #32 daemon prio=5 os_prio=0 cpu=9645.94ms elapsed=501.63s tid=0x00007fb83b6307f0 nid=0x1c4 runnable [0x00007fb7b8ffe000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPoll.wait(java.base@17.0.2/Native Method)
        at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@17.0.2/EPollSelectorImpl.java:118)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@17.0.2/SelectorImpl.java:129)
        - locked <0x00000000c443c518> (a sun.nio.ch.Util$2)
        - locked <0x00000000c38f7700> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(java.base@17.0.2/SelectorImpl.java:146)
        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:813)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.lang.Thread.run(java.base@17.0.2/Thread.java:833)
----

In the <<cluster-nodes-hot-threads>> API an idle `transport_worker` thread is
reported like this:

[source,text]
----
   0.0% [cpu=0.0%, idle=100.0%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000004][transport_worker][T#1]'
     10/10 snapshots sharing following 9 elements
       java.base@17.0.2/sun.nio.ch.EPoll.wait(Native Method)
       java.base@17.0.2/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:118)
       java.base@17.0.2/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)
       java.base@17.0.2/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)
       io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:813)
       io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
       io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
       io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
       java.base@17.0.2/java.lang.Thread.run(Thread.java:833)
----

Note that `transport_worker` threads should always be in state `RUNNABLE`, even
when waiting for input, because they block in the native `EPoll#wait` method.
The `idle=` time reports the proportion of time the thread spent waiting for
input, whereas the `cpu=` time reports the proportion of time the thread spent
processing input it has received.
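
If you want to capture this output programmatically, the hot threads API can be
called over HTTP like any other {es} API. The sketch below uses the JDK's
built-in `java.net.http` client and assumes a node reachable at
`http://localhost:9200` with security disabled; a real deployment would need
the correct address and authentication.

[source,java]
----
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FetchHotThreads {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Request hot threads for all nodes. Assumes an unauthenticated
        // local node, which is not how production clusters are configured.
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:9200/_nodes/hot_threads"))
            .GET()
            .build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
----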

If a `transport_worker` thread is not frequently idle, it may build up a
backlog of work. This can cause delays in processing messages on the channels
that it owns. It's hard to predict exactly which work will be delayed:

* There are many more channels than threads. If work related to one channel is
causing delays to its worker thread, all other channels owned by that thread
will also suffer delays, as the sketch after this list demonstrates.
* The mapping from TCP channels to worker threads is fixed but arbitrary. Each
channel is assigned an owning thread in a round-robin fashion when the channel
is opened. Each worker thread is responsible for many different kinds of
channel.
* There are many channels open between each pair of nodes. For each request,
{es} will choose from the appropriate channels in a round-robin fashion. Some
requests may end up on a channel owned by a delayed worker while other
identical requests will be sent on a channel that's working smoothly.
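
The following self-contained sketch simulates the first point above, with a
plain single-threaded executor standing in for a `transport_worker` thread:
slow handling of a message on "channel A" delays an unrelated message on
"channel B" simply because both channels share an owning thread.

[source,java]
----
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BacklogSketch {
    public static void main(String[] args) {
        // A single-threaded executor stands in for one transport_worker.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();

        // Slow handling of a message on channel A blocks the worker...
        worker.execute(() -> sleepMillis(2000));

        // ...so this unrelated message on channel B, owned by the same
        // worker, waits roughly two seconds before it is processed.
        worker.execute(() -> System.out.printf(
            "channel B message handled after %dms%n",
            TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)));

        worker.shutdown();
    }

    private static void sleepMillis(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
----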

If the backlog builds up too far, some messages may be delayed by many seconds.
The node might even <<cluster-fault-detection,fail its health checks>> and be
removed from the cluster. Sometimes you can find evidence of busy
`transport_worker` threads using the <<cluster-nodes-hot-threads>> API.
However, this API itself sends network messages, so it may not work correctly
if the `transport_worker` threads are too busy. It is more reliable to use
`jstack` to obtain stack dumps or Java Flight Recorder to obtain a profiling
trace. These tools are independent of any work the JVM is performing.

It may also be possible to identify some reasons for delays from the server
logs, particularly by looking at warnings from
`org.elasticsearch.transport.InboundHandler` and
`org.elasticsearch.transport.OutboundHandler`. Warnings about long processing
times from the `InboundHandler` are particularly indicative of incorrect
threading behaviour, whereas the transmission time reported by the
`OutboundHandler` includes time spent waiting for network congestion, during
which the `transport_worker` thread is free to do other work.