|
@@ -1,10 +1,10 @@
|
|
|
[[task-queue-backlog]]
|
|
|
=== Task queue backlog
|
|
|
|
|
|
-A backlogged task queue can prevent tasks from completing and
|
|
|
-put the cluster into an unhealthy state.
|
|
|
-Resource constraints, a large number of tasks being triggered at once,
|
|
|
-and long running tasks can all contribute to a backlogged task queue.
|
|
|
+A backlogged task queue can prevent tasks from completing and put the cluster
|
|
|
+into an unhealthy state. Resource constraints, a large number of tasks being
|
|
|
+triggered at once, and long running tasks can all contribute to a backlogged
|
|
|
+task queue.
|
|
|
|
|
|
[discrete]
|
|
|
[[diagnose-task-queue-backlog]]
|
|
@@ -12,39 +12,77 @@ and long running tasks can all contribute to a backlogged task queue.
|
|
|
|
|
|
**Check the thread pool status**
|
|
|
|
|
|
-A <<high-cpu-usage,depleted thread pool>> can result in <<rejected-requests,rejected requests>>.
|
|
|
+A <<high-cpu-usage,depleted thread pool>> can result in
|
|
|
+<<rejected-requests,rejected requests>>.
|
|
|
|
|
|
-You can use the <<cat-thread-pool,cat thread pool API>> to
|
|
|
-see the number of active threads in each thread pool and
|
|
|
-how many tasks are queued, how many have been rejected, and how many have completed.
|
|
|
+Thread pool depletion might be restricted to a specific <<data-tiers,data tier>>. If <<hotspotting,hot spotting>> is occuring, one node might experience depletion faster than other nodes, leading to performance issues and a growing task backlog.
|
|
|
+
|
|
|
+You can use the <<cat-thread-pool,cat thread pool API>> to see the number of
|
|
|
+active threads in each thread pool and how many tasks are queued, how many
|
|
|
+have been rejected, and how many have completed.
|
|
|
|
|
|
[source,console]
|
|
|
----
|
|
|
GET /_cat/thread_pool?v&s=t,n&h=type,name,node_name,active,queue,rejected,completed
|
|
|
----
|
|
|
|
|
|
+The `active` and `queue` statistics are instantaneous while the `rejected` and
|
|
|
+`completed` statistics are cumulative from node startup.
|
|
|
+
|
|
|
**Inspect the hot threads on each node**
|
|
|
|
|
|
-If a particular thread pool queue is backed up,
|
|
|
-you can periodically poll the <<cluster-nodes-hot-threads,Nodes hot threads>> API
|
|
|
-to determine if the thread has sufficient
|
|
|
-resources to progress and gauge how quickly it is progressing.
|
|
|
+If a particular thread pool queue is backed up, you can periodically poll the
|
|
|
+<<cluster-nodes-hot-threads,Nodes hot threads>> API to determine if the thread
|
|
|
+has sufficient resources to progress and gauge how quickly it is progressing.
|
|
|
|
|
|
[source,console]
|
|
|
----
|
|
|
GET /_nodes/hot_threads
|
|
|
----
|
|
|
|
|
|
-**Look for long running tasks**
|
|
|
+**Look for long running node tasks**
|
|
|
+
|
|
|
+Long-running tasks can also cause a backlog. You can use the <<tasks,task
|
|
|
+management>> API to get information about the node tasks that are running.
|
|
|
+Check the `running_time_in_nanos` to identify tasks that are taking an
|
|
|
+excessive amount of time to complete.
|
|
|
+
|
|
|
+[source,console]
|
|
|
+----
|
|
|
+GET /_tasks?pretty=true&human=true&detailed=true
|
|
|
+----
|
|
|
|
|
|
-Long-running tasks can also cause a backlog.
|
|
|
-You can use the <<tasks,task management>> API to get information about the tasks that are running.
|
|
|
-Check the `running_time_in_nanos` to identify tasks that are taking an excessive amount of time to complete.
|
|
|
+If a particular `action` is suspected, you can filter the tasks further. The most common long-running tasks are <<docs-bulk,bulk index>>- or search-related.
|
|
|
|
|
|
+* Filter for <<docs-bulk,bulk index>> actions:
|
|
|
++
|
|
|
[source,console]
|
|
|
----
|
|
|
-GET /_tasks?filter_path=nodes.*.tasks
|
|
|
+GET /_tasks?human&detailed&actions=indices:data/write/bulk
|
|
|
+----
|
|
|
+
|
|
|
+* Filter for search actions:
|
|
|
++
|
|
|
+[source,console]
|
|
|
----
|
|
|
+GET /_tasks?human&detailed&actions=indices:data/write/search
|
|
|
+----
|
|
|
+
|
|
|
+The API response may contain additional tasks columns, including `description` and `header`, which provides the task parameters, target, and requestor. You can use this information to perform further diagnosis.
|
|
|
+
|
|
|
+**Look for long running cluster tasks**
|
|
|
+
|
|
|
+A task backlog might also appear as a delay in synchronizing the cluster state. You
|
|
|
+can use the <<cluster-pending,cluster pending tasks API>> to get information
|
|
|
+about the pending cluster state sync tasks that are running.
|
|
|
+
|
|
|
+[source,console]
|
|
|
+----
|
|
|
+GET /_cluster/pending_tasks
|
|
|
+----
|
|
|
+
|
|
|
+Check the `timeInQueue` to identify tasks that are taking an excessive amount
|
|
|
+of time to complete.
|
|
|
|
|
|
[discrete]
|
|
|
[[resolve-task-queue-backlog]]
|