[[point-in-time-api]]
=== Point in time API

++++
<titleabbrev>Point in time</titleabbrev>
++++

A search request by default executes against the most recent visible data of
the target indices, which is called point in time. An Elasticsearch PIT (point in
time) is a lightweight view into the state of the data as it existed when it was
initiated. In some cases, it's preferable to perform multiple search requests using
the same point in time. For example, if <<indices-refresh,refreshes>> happen between
`search_after` requests, then the results of those requests might not be consistent as
changes happening between searches are only visible to the more recent point in time.

[[point-in-time-api-prereqs]]
==== {api-prereq-title}

* If the {es} {security-features} are enabled, you must have the `read`
<<privileges-list-indices,index privilege>> for the target data stream, index,
or alias.
+
To search a <<point-in-time-api,point in time (PIT)>> for an alias, you
must have the `read` index privilege for the alias's data streams or indices.

[[point-in-time-api-example]]
==== {api-examples-title}

A point in time must be opened explicitly before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep the point in time
alive, e.g. `?keep_alive=5m`.

[source,console]
--------------------------------------------------
POST /my-index-000001/_pit?keep_alive=1m
--------------------------------------------------
// TEST[setup:my_index]

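As a rough sketch of the response (the real `id` value is a much longer opaque
string, and later versions may include additional fields):

[source,js]
--------------------------------------------------
{
  "id": "46ToAwMDaWR5..."
}
--------------------------------------------------
// NOTCONSOLE
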
The `id` in the response should be passed to the `id` field of the `pit`
parameter of a search request.

[source,console]
--------------------------------------------------
POST /_search <1>
{
  "size": 100, <2>
  "query": {
    "match" : {
      "title" : "elasticsearch"
    }
  },
  "pit": {
    "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", <3>
    "keep_alive": "1m" <4>
  }
}
--------------------------------------------------
// TEST[catch:unavailable]
<1> A search request with the `pit` parameter must not specify `index`, `routing`,
or <<search-preference,`preference`>>,
as these parameters are copied from the point in time.
<2> Just like regular searches, you can <<paginate-search-results,use `from` and
`size` to page through search results>>, up to the first 10,000 hits. If you
want to retrieve more hits, use PIT with <<search-after,`search_after`>>, as in
the sketch below.
<3> The `id` parameter tells Elasticsearch to execute the request using contexts
from this point in time.
<4> The `keep_alive` parameter tells Elasticsearch how long it should extend
the time to live of the point in time.

IMPORTANT: The open point in time request and each subsequent search request can
return a different `id`; always use the most recently received `id` for the
next search request.

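To page beyond the first 10,000 hits, you can combine the point in time with
`search_after`. The following is an illustrative sketch rather than a tested
request: the PIT ID is truncated and the `search_after` value is a placeholder.
Sorting on `_shard_doc` relies on the tiebreaker that is available when searching
with a PIT.

[source,console]
--------------------------------------------------
POST /_search
{
  "size": 100,
  "query": {
    "match" : {
      "title" : "elasticsearch"
    }
  },
  "pit": {
    "id": "46ToAwMDaWR5...", <1>
    "keep_alive": "1m"
  },
  "sort": [
    { "_shard_doc": "asc" }
  ],
  "search_after": [ 4294967298 ] <2>
}
--------------------------------------------------
// TEST[skip:placeholder PIT ID and sort value]
<1> Use the `pit_id` returned by the previous search response (truncated here).
<2> Placeholder for the `_shard_doc` sort value of the last hit on the previous page.
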
[[point-in-time-keep-alive]]
==== Keeping point in time alive

The `keep_alive` parameter, which is passed to an open point in time request and
search request, extends the time to live of the corresponding point in time.
The value (e.g. `1m`, see <<time-units>>) does not need to be long enough to
process all data -- it just needs to be long enough for the next request.

Normally, the background merge process optimizes the index by merging together
smaller segments to create new, bigger segments. Once the smaller segments are
no longer needed they are deleted. However, open point-in-times prevent the
old segments from being deleted since they are still in use.

TIP: Keeping older segments alive means that more disk space and file handles
are needed. Ensure that you have configured your nodes to have ample free file
handles. See <<file-descriptors>>.

Additionally, if a segment contains deleted or updated documents then the
point in time must keep track of whether each document in the segment was live at
the time of the initial search request. Ensure that your nodes have sufficient heap
space if you have many open point-in-times on an index that is subject to ongoing
deletes or updates. Note that a point-in-time doesn't prevent its associated indices
from being deleted.

You can check how many point-in-times (i.e., search contexts) are open with the
<<cluster-nodes-stats,nodes stats API>>:

[source,console]
---------------------------------------
GET /_nodes/stats/indices/search
---------------------------------------

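In the response, the number of open search contexts is reported per node under
`indices.search.open_contexts`. A rough sketch of where to look (the node ID and
count are placeholders, and unrelated fields are omitted):

[source,js]
--------------------------------------------------
{
  "nodes": {
    "some_node_id": {
      "indices": {
        "search": {
          "open_contexts": 4
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
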
[[close-point-in-time-api]]
==== Close point in time API

A point in time is automatically closed when its `keep_alive` period has
elapsed. However, keeping point-in-times open has a cost, as discussed in the
<<point-in-time-keep-alive,previous section>>. Point-in-times should be closed
as soon as they are no longer used in search requests.

[source,console]
---------------------------------------
DELETE /_pit
{
  "id" : "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=="
}
---------------------------------------
// TEST[catch:missing]

The API returns the following response:

[source,console-result]
--------------------------------------------------
{
  "succeeded": true, <1>
  "num_freed": 3 <2>
}
--------------------------------------------------
// TESTRESPONSE[s/"succeeded": true/"succeeded": $body.succeeded/]
// TESTRESPONSE[s/"num_freed": 3/"num_freed": $body.num_freed/]
<1> If `true`, all search contexts associated with the point-in-time ID were successfully closed.
<2> The number of search contexts that were successfully closed.

[discrete]
[[search-slicing]]
=== Search slicing

When paging through a large number of documents, it can be helpful to split the search into multiple slices
to consume them independently:

[source,console]
--------------------------------------------------
GET /_search
{
  "slice": {
    "id": 0, <1>
    "max": 2 <2>
  },
  "query": {
    "match": {
      "message": "foo"
    }
  },
  "pit": {
    "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=="
  }
}

GET /_search
{
  "slice": {
    "id": 1,
    "max": 2
  },
  "pit": {
    "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=="
  },
  "query": {
    "match": {
      "message": "foo"
    }
  }
}
--------------------------------------------------
// TEST[skip:both calls will throw errors]
<1> The id of the slice
<2> The maximum number of slices

The result from the first request returns documents belonging to the first slice (id: 0) and the
result from the second request returns documents in the second slice. Since the maximum number of
slices is set to 2, the union of the results of the two requests is equivalent to the results of a
point-in-time search without slicing. By default the splitting is done first on the shards, then
locally on each shard. The local splitting partitions the shard into contiguous ranges based on
Lucene document IDs.

For instance, if the number of shards is equal to 2 and the user requested 4 slices, then slices
0 and 2 are assigned to the first shard and slices 1 and 3 are assigned to the second shard.

IMPORTANT: The same point-in-time ID should be used for all slices. If different PIT IDs are used,
then slices can overlap and miss documents. This is because the splitting criterion is based on
Lucene document IDs, which are not stable across changes to the index.