
[[high-cpu-usage]]
=== High CPU usage

{es} uses <<modules-threadpool,thread pools>> to manage CPU resources for
concurrent operations. High CPU usage typically means one or more thread pools
are running low.

If a thread pool is depleted, {es} will <<rejected-requests,reject requests>>
related to the thread pool. For example, if the `search` thread pool is
depleted, {es} will reject search requests until more threads are available.

You might experience high CPU usage if a <<data-tiers,data tier>>, and therefore the nodes assigned to that tier, is experiencing more traffic than other tiers. This imbalance in resource utilization is also known as <<hotspotting,hot spotting>>.

****
If you're using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to https://www.elastic.co/guide/en/cloud/current/ec-autoops.html[Monitor with AutoOps].
****

[discrete]
[[diagnose-high-cpu-usage]]
==== Diagnose high CPU usage

**Check CPU usage**

You can check the CPU usage per node using the <<cat-nodes,cat nodes API>>:

// tag::cpu-usage-cat-nodes[]
[source,console]
----
GET _cat/nodes?v=true&s=cpu:desc
----

The response's `cpu` column contains the current CPU usage as a percentage.
The `name` column contains the node's name. Elevated but transient CPU usage is
normal. However, if CPU usage is elevated for an extended duration, it should be
investigated.

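
If you collect the cat nodes output from a script, a small helper can flag the nodes worth a closer look. The following is a minimal sketch, not part of {es}: the function name, the 85% threshold, and the sample text are illustrative assumptions, and it expects the plain-text response with the header row that `v=true` produces.

```python
def nodes_over_cpu_threshold(cat_nodes_text, threshold=85):
    """Return (name, cpu) pairs for nodes whose `cpu` column is >= threshold.

    Assumes whitespace-separated columns with a header row (v=true) and
    no empty cells; rows that do not line up with the header are skipped.
    """
    lines = [line for line in cat_nodes_text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    cpu_idx = header.index("cpu")
    name_idx = header.index("name")
    hot = []
    for line in lines[1:]:
        cells = line.split()
        if len(cells) != len(header):
            continue
        cpu = int(cells[cpu_idx])
        if cpu >= threshold:
            hot.append((cells[name_idx], cpu))
    return hot


# Illustrative sample only; real node names and values will differ.
SAMPLE = """\
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.0.0.2            61          97  93    4.12    3.80     3.10 cdfhilmrstw -      node-2
10.0.0.1            45          88  12    0.40    0.35     0.30 cdfhilmrstw *      node-1
"""

print(nodes_over_cpu_threshold(SAMPLE))  # [('node-2', 93)]
```

A single reading above the threshold is not a problem on its own. Run the check repeatedly and act only on nodes that stay elevated.
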

To track CPU usage over time, we recommend enabling monitoring:

include::{es-ref-dir}/tab-widgets/cpu-usage-widget.asciidoc[]

**Check hot threads**

If a node has high CPU usage, use the <<cluster-nodes-hot-threads,nodes hot
threads API>> to check for resource-intensive threads running on the node.

[source,console]
----
GET _nodes/hot_threads
----
// TEST[s/\/my-node,my-other-node//]

This API returns a plain-text breakdown of any hot threads. High CPU usage
frequently correlates with <<task-queue-backlog,a long-running task, or a
backlog of tasks>>.

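
Because the hot threads response is plain text, summarizing it takes a little parsing. The sketch below groups reported CPU percentages by thread pool. It rests on two assumptions: that each thread summary line contains `cpu usage by thread '<thread name>'`, and that thread names look like `elasticsearch[<node>][<pool>][T#<n>]`. The exact layout varies between versions, so treat the patterns and the abbreviated sample as illustrative.

```python
import re
from collections import Counter

# Assumed shape of a thread summary line (layout varies by version):
#   50.0% [cpu=50.0%, other=0.0%] (250ms out of 500ms) cpu usage by thread '...'
THREAD_LINE = re.compile(r"^\s*([\d.]+)%.*cpu usage by thread '([^']+)'")
# Assumed thread naming: elasticsearch[<node>][<pool>][T#<n>]
POOL_NAME = re.compile(r"^elasticsearch\[[^\]]*\]\[([^\]]+)\]")


def cpu_by_thread_pool(hot_threads_text):
    """Sum the leading percentage of each hot thread, grouped by pool name."""
    totals = Counter()
    for line in hot_threads_text.splitlines():
        match = THREAD_LINE.match(line)
        if not match:
            continue  # stack frames and headers are ignored
        percent = float(match.group(1))
        pool = POOL_NAME.match(match.group(2))
        totals[pool.group(1) if pool else "other"] += percent
    return totals.most_common()


# Abbreviated, illustrative sample; not real output.
SAMPLE = """\
   50.0% [cpu=50.0%, other=0.0%] (250ms out of 500ms) cpu usage by thread 'elasticsearch[my-node][search][T#1]'
     org.example.Frame.run(Frame.java:1)
   25.5% [cpu=25.5%, other=0.0%] (127.5ms out of 500ms) cpu usage by thread 'elasticsearch[my-node][search][T#2]'
   10.0% [cpu=10.0%, other=0.0%] (50ms out of 500ms) cpu usage by thread 'elasticsearch[my-node][write][T#1]'
"""

print(cpu_by_thread_pool(SAMPLE))
```

A pool that dominates the summary, such as `search` in the sample, points to which of the remedies in this guide to try first.
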

[discrete]
[[reduce-cpu-usage]]
==== Reduce CPU usage

The following tips outline the most common causes of high CPU usage and their
solutions.

**Scale your cluster**

Heavy indexing and search loads can deplete smaller thread pools. To better
handle heavy workloads, add more nodes to your cluster or upgrade your existing
nodes to increase capacity.

**Spread out bulk requests**

While more efficient than individual requests, large <<docs-bulk,bulk indexing>>
or <<search-multi-search,multi-search>> requests still require CPU resources. If
possible, submit smaller requests and allow more time between them.

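
On the client side, spreading out requests can be as simple as splitting the work into smaller batches and pausing between them. The following is a minimal sketch under stated assumptions: `send` is a hypothetical callable that submits one batch (for example, a wrapper around your client's bulk call), and the batch size and pause are placeholder values to tune for your workload.

```python
import time
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def send_in_batches(docs, send, batch_size=500, pause_seconds=1.0, sleep=time.sleep):
    """Submit docs in small batches, pausing between consecutive batches.

    `send` is a hypothetical callable that submits a single batch.
    Returns the number of documents handed to `send`.
    """
    sent = 0
    for index, batch in enumerate(batched(docs, batch_size)):
        if index > 0:
            sleep(pause_seconds)  # give the cluster time between requests
        send(batch)
        sent += len(batch)
    return sent


# Demonstration with a stand-in `send` that only records each batch.
batches = []
total = send_in_batches(range(1200), batches.append, batch_size=500, pause_seconds=0)
print(total, [len(b) for b in batches])  # 1200 [500, 500, 200]
```
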

**Cancel long-running searches**

Long-running searches can block threads in the `search` thread pool. To check
for these searches, use the <<tasks,task management API>>.

[source,console]
----
GET _tasks?actions=*search&detailed
----

The response's `description` contains the search request and its queries.
`running_time_in_nanos` shows how long the search has been running.

[source,console-result]
----
{
  "nodes" : {
    "oTUltX4IQMOUUVeiohTt8A" : {
      "name" : "my-node",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1:9300",
      "tasks" : {
        "oTUltX4IQMOUUVeiohTt8A:464" : {
          "node" : "oTUltX4IQMOUUVeiohTt8A",
          "id" : 464,
          "type" : "transport",
          "action" : "indices:data/read/search",
          "description" : "indices[my-index], search_type[QUERY_THEN_FETCH], source[{\"query\":...}]",
          "start_time_in_millis" : 4081771730000,
          "running_time_in_nanos" : 13991383,
          "cancellable" : true
        }
      }
    }
  }
}
----
// TESTRESPONSE[skip: no way to get tasks]

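
When the response lists many tasks, it helps to filter for the ones that have run longest. The sketch below walks a parsed response with the shape shown above and returns cancellable tasks that exceed a running-time threshold. The function name, the 30-second default, and the second task in the sample are illustrative assumptions.

```python
def long_running_tasks(tasks_response, min_seconds=30.0):
    """Return (task_id, seconds, description) for long-running cancellable tasks.

    Expects the parsed JSON of a detailed task list, keyed by node and
    then by task ID. Results are sorted with the longest-running first.
    """
    found = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            seconds = task["running_time_in_nanos"] / 1e9
            if task.get("cancellable") and seconds >= min_seconds:
                found.append((task_id, seconds, task.get("description", "")))
    return sorted(found, key=lambda item: item[1], reverse=True)


# Trimmed sample: task 464 mirrors the response above; task 465 is hypothetical.
SAMPLE = {
    "nodes": {
        "oTUltX4IQMOUUVeiohTt8A": {
            "name": "my-node",
            "tasks": {
                "oTUltX4IQMOUUVeiohTt8A:464": {
                    "action": "indices:data/read/search",
                    "description": "indices[my-index], search_type[QUERY_THEN_FETCH]",
                    "running_time_in_nanos": 13991383,
                    "cancellable": True,
                },
                "oTUltX4IQMOUUVeiohTt8A:465": {
                    "action": "indices:data/read/search",
                    "description": "indices[my-other-index], search_type[QUERY_THEN_FETCH]",
                    "running_time_in_nanos": 45_000_000_000,
                    "cancellable": True,
                },
            },
        }
    }
}

for task_id, seconds, description in long_running_tasks(SAMPLE):
    print(f"{task_id} has been running for {seconds:.1f}s: {description}")
```

Each returned task ID can be passed to the cancel request. Review the `description` first so you only cancel searches you intend to stop.
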

To cancel a search and free up resources, use the API's `_cancel` endpoint.

[source,console]
----
POST _tasks/oTUltX4IQMOUUVeiohTt8A:464/_cancel
----

For additional tips on how to track and avoid resource-intensive searches, see
<<avoid-expensive-searches,Avoid expensive searches>>.