
[[point-in-time-api]]
=== Point in time API
++++
<titleabbrev>Point in time</titleabbrev>
++++

A search request by default executes against the most recent visible data of
the target indices, which is called point in time. An {es} point in time (PIT)
is a lightweight view into the state of the data as it existed when it was initiated.
In some cases, it's preferred to perform multiple search requests using
the same point in time. For example, if <<indices-refresh,refreshes>> happen between
`search_after` requests, then the results of those requests might not be consistent, as
changes happening between searches are only visible to the more recent point in time.

[[point-in-time-api-prereqs]]
==== {api-prereq-title}

* If the {es} {security-features} are enabled, you must have the `read`
<<privileges-list-indices,index privilege>> for the target data stream, index,
or alias.
+
To search a <<point-in-time-api,point in time (PIT)>> for an alias, you
must have the `read` index privilege for the alias's data streams or indices.

[[point-in-time-api-example]]
==== {api-examples-title}

A point in time must be opened explicitly before being used in search requests. The
`keep_alive` parameter tells {es} how long it should keep the point in time alive,
e.g. `?keep_alive=5m`.

[source,console]
--------------------------------------------------
POST /my-index-000001/_pit?keep_alive=1m
--------------------------------------------------
// TEST[setup:my_index]

The result from the above request includes an `id`, which should
be passed to the `id` of the `pit` parameter of a search request.

[source,console]
--------------------------------------------------
POST /_search <1>
{
  "size": 100, <2>
  "query": {
    "match" : {
      "title" : "elasticsearch"
    }
  },
  "pit": {
    "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", <3>
    "keep_alive": "1m" <4>
  }
}
--------------------------------------------------
// TEST[catch:unavailable]
<1> A search request with the `pit` parameter must not specify `index`, `routing`,
or <<search-preference,`preference`>>,
as these parameters are copied from the point in time.
<2> Just like regular searches, you can <<paginate-search-results,use `from` and
`size` to page through search results>>, up to the first 10,000 hits. If you
want to retrieve more hits, use PIT with <<search-after,`search_after`>>.
<3> The `id` parameter tells {es} to execute the request using contexts
from this point in time.
<4> The `keep_alive` parameter tells {es} how long it should extend
the time to live of the point in time.

IMPORTANT: The open point in time request and each subsequent search request can
return different `id` values; thus always use the most recently received `id` for the
next search request.
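The loop implied by the note above — search, capture the latest `id`, page forward with `search_after`, then close the PIT — can be sketched in Python. `StubClient` is a hypothetical in-memory stand-in for a real client (its method names mirror the official Python client's `open_point_in_time`, `search`, and `close_point_in_time`, but the implementation here only simulates paging); the shape of the loop is the point.

```python
# Sketch of paging through a PIT with search_after.
# StubClient is a fake: it serves a fixed document list in pages and
# hands back a fresh PIT id on every response, as a real cluster may.
class StubClient:
    def __init__(self, docs, page_size):
        self.docs = docs
        self.page_size = page_size
        self.pit_counter = 0

    def open_point_in_time(self, index, keep_alive):
        self.pit_counter += 1
        return {"id": f"pit-{self.pit_counter}"}

    def search(self, pit, sort, search_after=None):
        start = 0 if search_after is None else search_after[0] + 1
        hits = [{"_source": d, "sort": [i]}
                for i, d in enumerate(self.docs)][start:start + self.page_size]
        self.pit_counter += 1  # each response may carry a different PIT id
        return {"pit_id": f"pit-{self.pit_counter}", "hits": {"hits": hits}}

    def close_point_in_time(self, id):
        return {"succeeded": True, "num_freed": 1}

def scan_all(client, index):
    """Collect every document, one page at a time, through a single PIT."""
    pit_id = client.open_point_in_time(index=index, keep_alive="1m")["id"]
    search_after, results = None, []
    while True:
        resp = client.search(pit={"id": pit_id, "keep_alive": "1m"},
                             sort=["_shard_doc"], search_after=search_after)
        pit_id = resp["pit_id"]          # always reuse the most recent id
        hits = resp["hits"]["hits"]
        if not hits:
            break
        results.extend(h["_source"] for h in hits)
        search_after = hits[-1]["sort"]  # sort values of the last hit
    client.close_point_in_time(id=pit_id)
    return results

docs = [f"doc-{n}" for n in range(7)]
print(scan_all(StubClient(docs, page_size=3), "my-index-000001"))
```

Against a live cluster the same loop applies unchanged: only the client object differs.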

[[point-in-time-keep-alive]]
==== Keeping point in time alive

The `keep_alive` parameter, which is passed to an open point in time request and
search request, extends the time to live of the corresponding point in time.
The value (e.g. `1m`, see <<time-units>>) does not need to be long enough to
process all data -- it just needs to be long enough for the next request.
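As an illustration of the duration syntax these values use, here is a small parser for `keep_alive`-style strings; the helper name is ours, and the unit table follows the suffixes listed under <<time-units>> (`d`, `h`, `m`, `s`, `ms`, `micros`, `nanos`).

```python
# Illustrative parser for keep_alive-style duration strings ("1m", "30s").
def keep_alive_to_seconds(value: str) -> float:
    units = {"d": 86400, "h": 3600, "m": 60, "s": 1,
             "ms": 1e-3, "micros": 1e-6, "nanos": 1e-9}
    # Try longer suffixes first so "5ms" is not misread as "s".
    for suffix in sorted(units, key=len, reverse=True):
        if value.endswith(suffix):
            return float(value[:-len(suffix)]) * units[suffix]
    raise ValueError(f"unrecognized time value: {value!r}")

print(keep_alive_to_seconds("1m"))   # 60.0
print(keep_alive_to_seconds("30s"))  # 30.0
```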

Normally, the background merge process optimizes the index by merging together
smaller segments to create new, bigger segments. Once the smaller segments are
no longer needed they are deleted. However, open point-in-times prevent the
old segments from being deleted since they are still in use.

TIP: Keeping older segments alive means that more disk space and file handles
are needed. Ensure that you have configured your nodes to have ample free file
handles. See <<file-descriptors>>.

Additionally, if a segment contains deleted or updated documents then the
point in time must keep track of whether each document in the segment was live at
the time of the initial search request. Ensure that your nodes have sufficient heap
space if you have many open point-in-times on an index that is subject to ongoing
deletes or updates. Note that a point-in-time doesn't prevent its associated indices
from being deleted.

You can check how many point-in-times (i.e., search contexts) are open with the
<<cluster-nodes-stats,nodes stats API>>:

[source,console]
---------------------------------------
GET /_nodes/stats/indices/search
---------------------------------------

[[close-point-in-time-api]]
==== Close point in time API

A point-in-time is automatically closed when its `keep_alive` has
elapsed. However, keeping point-in-times has a cost, as discussed in the
<<point-in-time-keep-alive,previous section>>. Point-in-times should be closed
as soon as they are no longer used in search requests.

[source,console]
---------------------------------------
DELETE /_pit
{
  "id" : "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=="
}
---------------------------------------
// TEST[catch:missing]

The API returns the following response:

[source,console-result]
--------------------------------------------------
{
  "succeeded": true, <1>
  "num_freed": 3 <2>
}
--------------------------------------------------
// TESTRESPONSE[s/"succeeded": true/"succeeded": $body.succeeded/]
// TESTRESPONSE[s/"num_freed": 3/"num_freed": $body.num_freed/]
<1> If `true`, all search contexts associated with the point-in-time id were successfully closed
<2> The number of search contexts that were successfully closed

[discrete]
[[search-slicing]]
=== Search slicing

When paging through a large number of documents, it can be helpful to split the search into multiple slices
to consume them independently:

[source,console]
--------------------------------------------------
GET /_search
{
  "slice": {
    "id": 0, <1>
    "max": 2 <2>
  },
  "query": {
    "match": {
      "message": "foo"
    }
  },
  "pit": {
    "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=="
  }
}

GET /_search
{
  "slice": {
    "id": 1,
    "max": 2
  },
  "pit": {
    "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=="
  },
  "query": {
    "match": {
      "message": "foo"
    }
  }
}
--------------------------------------------------
// TEST[skip:both calls will throw errors]
<1> The id of the slice
<2> The maximum number of slices

The result from the first request returns documents belonging to the first slice (id: 0) and the
result from the second request returns documents in the second slice. Since the maximum number of
slices is set to 2, the union of the results of the two requests is equivalent to the results of a
point-in-time search without slicing. By default the splitting is done first on the shards, then
locally on each shard. The local splitting partitions the shard into contiguous ranges based on
Lucene document IDs.

For instance, if the number of shards is equal to 2 and the user requested 4 slices, then
slices 0 and 2 are assigned to the first shard and slices 1 and 3 are assigned to the second shard.
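Read as a rule, the example above is a simple modulo assignment: slice `id` maps to shard `id % num_shards`. A minimal sketch, assuming that reading (the function name is ours, and "first shard" is shard 0):

```python
# Which slices land on which shard, assuming slice_id % num_shards
# (consistent with the example: 2 shards, 4 slices ->
# slices 0 and 2 on shard 0, slices 1 and 3 on shard 1).
def shards_for_slices(num_slices: int, num_shards: int) -> dict:
    assignment = {shard: [] for shard in range(num_shards)}
    for slice_id in range(num_slices):
        assignment[slice_id % num_shards].append(slice_id)
    return assignment

print(shards_for_slices(num_slices=4, num_shards=2))  # {0: [0, 2], 1: [1, 3]}
```

When `num_slices` exceeds the shard count, as here, the slices that share a shard are further split locally by Lucene document ID ranges.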

IMPORTANT: The same point-in-time ID should be used for all slices. If different PIT IDs are used,
then slices can overlap and miss documents. This is because the splitting criterion is based on
Lucene document IDs, which are not stable across changes to the index.