[[task-queue-backlog]]
=== Task queue backlog

A backlogged task queue can prevent tasks from completing and
put the cluster into an unhealthy state.
Resource constraints, a large number of tasks being triggered at once,
and long-running tasks can all contribute to a backlogged task queue.

[discrete]
[[diagnose-task-queue-backlog]]
==== Diagnose a task queue backlog

**Check the thread pool status**

A <<high-cpu-usage,depleted thread pool>> can result in <<rejected-requests,rejected requests>>.

You can use the <<cat-thread-pool,cat thread pool API>> to
see the number of active threads in each thread pool,
how many tasks are queued, how many have been rejected, and how many have completed.

[source,console]
----
GET /_cat/thread_pool?v&s=t,n&h=type,name,node_name,active,queue,rejected,completed
----
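
To pick out the backed-up pools from that output, the request can be made with `format=json` and the rows filtered in a script. The following is a minimal sketch, not an official tool: the helper name `backed_up_pools` and the sample rows are hypothetical, and it assumes the JSON form of the cat API returns each column value as a string.

```python
from typing import Dict, List


def backed_up_pools(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return rows with a non-zero queue or rejected count, largest queue first."""
    # Assumption: cat API JSON values arrive as strings, so convert before comparing.
    flagged = [r for r in rows if int(r["queue"]) > 0 or int(r["rejected"]) > 0]
    return sorted(flagged, key=lambda r: int(r["queue"]), reverse=True)


# Hypothetical sample rows for illustration only, not real cluster output.
sample_rows = [
    {"type": "fixed", "name": "write", "node_name": "node-1",
     "active": "8", "queue": "150", "rejected": "12", "completed": "90431"},
    {"type": "fixed", "name": "search", "node_name": "node-1",
     "active": "2", "queue": "0", "rejected": "0", "completed": "51200"},
    {"type": "fixed", "name": "force_merge", "node_name": "node-2",
     "active": "1", "queue": "3", "rejected": "0", "completed": "17"},
]

for row in backed_up_pools(sample_rows):
    print(row["node_name"], row["name"],
          "queue=" + row["queue"], "rejected=" + row["rejected"])
```

A pool that shows a growing `queue` or `rejected` value across repeated calls is a likely candidate for closer inspection.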

**Inspect the hot threads on each node**

If a particular thread pool queue is backed up,
you can periodically poll the <<cluster-nodes-hot-threads,nodes hot threads API>>
to determine whether the thread has sufficient
resources to progress and to gauge how quickly it is progressing.

[source,console]
----
GET /_nodes/hot_threads
----

**Look for long-running tasks**

Long-running tasks can also cause a backlog.
You can use the <<tasks,task management API>> to get information about the tasks that are running.
Check the `running_time_in_nanos` value to identify tasks that are taking an excessive amount of time to complete.

[source,console]
----
GET /_tasks?filter_path=nodes.*.tasks
----
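
The response groups tasks by node, so finding the slowest ones means walking every node's `tasks` map. As a rough sketch of that check, assuming the response shape `nodes.<node_id>.tasks.<task_id>`, the hypothetical helper `slowest_tasks` below converts `running_time_in_nanos` to seconds and keeps tasks above a threshold you choose. The sample response is made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def slowest_tasks(response: Dict[str, Any],
                  min_seconds: float) -> List[Tuple[str, str, float]]:
    """Return (task_id, action, seconds) for tasks at or above min_seconds, longest first."""
    found = []
    for node in response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            seconds = task["running_time_in_nanos"] / 1e9  # nanoseconds to seconds
            if seconds >= min_seconds:
                found.append((task_id, task["action"], seconds))
    return sorted(found, key=lambda t: t[2], reverse=True)


# Hypothetical response for illustration only; IDs and timings are invented.
sample_response = {
    "nodes": {
        "nodeA": {
            "tasks": {
                "nodeA:101": {"action": "indices:data/write/bulk",
                              "running_time_in_nanos": 120_000_000_000},
                "nodeA:102": {"action": "cluster:monitor/tasks/lists",
                              "running_time_in_nanos": 47_402},
            }
        },
        "nodeB": {
            "tasks": {
                "nodeB:7": {"action": "indices:admin/forcemerge",
                            "running_time_in_nanos": 5_400_000_000_000},
            }
        },
    }
}

for task_id, action, seconds in slowest_tasks(sample_response, min_seconds=60):
    print(f"{task_id} {action} {seconds:.0f}s")
```

What counts as excessive depends on the task type, so the threshold is left as a parameter rather than fixed in the helper.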

[discrete]
[[resolve-task-queue-backlog]]
==== Resolve a task queue backlog

**Increase available resources**

If tasks are progressing slowly and the queue is backing up,
you might need to take steps to <<reduce-cpu-usage>>.

In some cases, increasing the thread pool size might help.
For example, the `force_merge` thread pool defaults to a single thread.
Increasing the size to 2 might help reduce a backlog of force merge requests.

**Cancel stuck tasks**

If you find that the active task's hot thread isn't progressing and there's a backlog,
consider canceling the task.
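
Tasks are canceled through the task management API with `POST /_tasks/<task_id>/_cancel`, and only tasks reported as `cancellable` can be stopped this way. The sketch below, using a hypothetical helper `cancel_requests` and an invented sample response, builds the request lines for a set of task IDs you have already judged to be stuck. It does not send anything to a cluster.

```python
from typing import Any, Dict, List


def cancel_requests(response: Dict[str, Any], task_ids: List[str]) -> List[str]:
    """Build cancel request lines for the given task IDs, skipping non-cancellable tasks."""
    wanted = set(task_ids)
    requests = []
    for node in response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            # Only tasks that report cancellable: true can be canceled.
            if task_id in wanted and task.get("cancellable", False):
                requests.append(f"POST /_tasks/{task_id}/_cancel")
    return requests


# Hypothetical response for illustration only; IDs are invented.
sample_response = {
    "nodes": {
        "nodeA": {
            "tasks": {
                "nodeA:101": {"action": "indices:data/write/bulk",
                              "cancellable": True},
                "nodeA:102": {"action": "cluster:monitor/tasks/lists",
                              "cancellable": False},
            }
        }
    }
}

for line in cancel_requests(sample_response, ["nodeA:101", "nodeA:102"]):
    print(line)
```

Reviewing the generated lines before running them keeps the decision to cancel a manual one.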