123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743 |
- [[docs-update-by-query]]
- === Update By Query API
- ++++
- <titleabbrev>Update by query</titleabbrev>
- ++++
- Updates documents that match the specified query.
- If no query is specified, performs an update on every document in the index without
- modifying the source, which is useful for picking up mapping changes.
- [source,console]
- --------------------------------------------------
- POST twitter/_update_by_query?conflicts=proceed
- --------------------------------------------------
- // TEST[setup:big_twitter]
- ////
- [source,console-result]
- --------------------------------------------------
- {
- "took" : 147,
- "timed_out": false,
- "updated": 120,
- "deleted": 0,
- "batches": 1,
- "version_conflicts": 0,
- "noops": 0,
- "retries": {
- "bulk": 0,
- "search": 0
- },
- "throttled_millis": 0,
- "requests_per_second": -1.0,
- "throttled_until_millis": 0,
- "total": 120,
- "failures" : [ ]
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took" : 147/"took" : "$body.took"/]
- ////
- [[docs-update-by-query-api-request]]
- ==== {api-request-title}
- `POST /<index>/_update_by_query`
- [[docs-update-by-query-api-desc]]
- ==== {api-description-title}
- You can specify the query criteria in the request URI or the request body
- using the same syntax as the <<search-search,Search API>>.
- When you submit an update by query request, {es} gets a snapshot of the index
- when it begins processing the request and updates matching documents using
- `internal` versioning.
- When the versions match, the document is updated and the version number is incremented.
- If a document changes between the time that the snapshot is taken and
- the update operation is processed, it results in a version conflict and the operation fails.
- You can opt to count version conflicts instead of halting and returning by
- setting `conflicts` to `proceed`.
- NOTE: Documents with a version equal to 0 cannot be updated using update by
- query because `internal` versioning does not support 0 as a valid
- version number.
- While processing an update by query request, {es} performs multiple search
- requests sequentially to find all of the matching documents.
- A bulk update request is performed for each batch of matching documents.
- Any query or update failures cause the update by query request to fail and
- the failures are shown in the response.
- Any update requests that completed successfully still stick, they are not rolled back.
- ===== Refreshing shards
- Specifying the `refresh` parameter refreshes all shards once the request completes.
- This is different than the update API#8217;s `refresh` parameter, which causes just the shard
- that received the request to be refreshed. Unlike the update API, it does not support
- `wait_for`.
- [[docs-update-by-query-task-api]]
- ===== Running update by query asynchronously
- If the request contains `wait_for_completion=false`, {es}
- performs some preflight checks, launches the request, and returns a
- <<tasks,`task`>> you can use to cancel or get the status of the task.
- {es} creates a record of this task as a document at `.tasks/task/${taskId}`.
- When you are done with a task, you should delete the task document so
- {es} can reclaim the space.
- ===== Waiting for active shards
- `wait_for_active_shards` controls how many copies of a shard must be active
- before proceeding with the request. See <<index-wait-for-active-shards>>
- for details. `timeout` controls how long each write request waits for unavailable
- shards to become available. Both work exactly the way they work in the
- <<docs-bulk,Bulk API>>. Update by query uses scrolled searches, so you can also
- specify the `scroll` parameter to control how long it keeps the search context
- alive, for example `?scroll=10m`. The default is 5 minutes.
- ===== Throttling update requests
- To control the rate at which update by query issues batches of update operations,
- you can set `requests_per_second` to any positive decimal number. This pads each
- batch with a wait time to throttle the rate. Set `requests_per_second` to `-1`
- to disable throttling.
- Throttling uses a wait time between batches so that the internal scroll requests
- can be given a timeout that takes the request padding into account. The padding
- time is the difference between the batch size divided by the
- `requests_per_second` and the time spent writing. By default the batch size is
- `1000`, so if `requests_per_second` is set to `500`:
- [source,txt]
- --------------------------------------------------
- target_time = 1000 / 500 per second = 2 seconds
- wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
- --------------------------------------------------
- Since the batch is issued as a single `_bulk` request, large batch sizes
- cause {es} to create many requests and wait before starting the next set.
- This is "bursty" instead of "smooth".
- [[docs-update-by-query-slice]]
- ===== Slicing
- Update by query supports <<sliced-scroll, sliced scroll>> to parallelize the
- update process. This can improve efficiency and provide a
- convenient way to break the request down into smaller parts.
- Setting `slices` to `auto` chooses a reasonable number for most indices.
- If you're slicing manually or otherwise tuning automatic slicing, keep in mind
- that:
- * Query performance is most efficient when the number of `slices` is equal to
- the number of shards in the index. If that number is large (for example,
- 500), choose a lower number as too many `slices` hurts performance. Setting
- `slices` higher than the number of shards generally does not improve efficiency
- and adds overhead.
- * Update performance scales linearly across available resources with the
- number of slices.
- Whether query or update performance dominates the runtime depends on the
- documents being reindexed and cluster resources.
- [[docs-update-by-query-api-path-params]]
- ==== {api-path-parms-title}
- `<index>`::
- (Optional, string) A comma-separated list of index names to search. Use `_all`
- or omit to search all indices.
- [[docs-update-by-query-api-query-params]]
- ==== {api-query-parms-title}
- include::{docdir}/rest-api/common-parms.asciidoc[tag=allow-no-indices]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=analyzer]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=analyze_wildcard]
-
- `conflicts`::
- (Optional, string) What to do if delete by query hits version conflicts:
- `abort` or `proceed`. Defaults to `abort`.
- include::{docdir}/rest-api/common-parms.asciidoc[tag=default_operator]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=df]
-
- include::{docdir}/rest-api/common-parms.asciidoc[tag=expand-wildcards]
- +
- Defaults to `open`.
- include::{docdir}/rest-api/common-parms.asciidoc[tag=from]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=index-ignore-unavailable]
-
- include::{docdir}/rest-api/common-parms.asciidoc[tag=lenient]
-
- include::{docdir}/rest-api/common-parms.asciidoc[tag=max_docs]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=pipeline]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=preference]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=search-q]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=request_cache]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=refresh]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=requests_per_second]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=scroll]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=scroll_size]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=search_type]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=search_timeout]
-
- include::{docdir}/rest-api/common-parms.asciidoc[tag=slices]
-
- include::{docdir}/rest-api/common-parms.asciidoc[tag=sort]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=source]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=source_excludes]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=source_includes]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=stats]
-
- include::{docdir}/rest-api/common-parms.asciidoc[tag=terminate_after]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=timeout]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=version]
- include::{docdir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards]
- [[docs-update-by-query-api-request-body]]
- ==== {api-request-body-title}
- `query`::
- (Optional, <<query-dsl,query object>>) Specifies the documents to update
- using the <<query-dsl,Query DSL>>.
-
- [[docs-update-by-query-api-response-body]]
- ==== Response body
- `took`::
- The number of milliseconds from start to end of the whole operation.
- `timed_out`::
- This flag is set to `true` if any of the requests executed during the
- update by query execution has timed out.
- `total`::
- The number of documents that were successfully processed.
- `updated`::
- The number of documents that were successfully updated.
- `deleted`::
- The number of documents that were successfully deleted.
- `batches`::
- The number of scroll responses pulled back by the update by query.
- `version_conflicts`::
- The number of version conflicts that the update by query hit.
- `noops`::
- The number of documents that were ignored because the script used for
- the update by query returned a `noop` value for `ctx.op`.
- `retries`::
- The number of retries attempted by update by query. `bulk` is the number of bulk
- actions retried, and `search` is the number of search actions retried.
- `throttled_millis`::
- Number of milliseconds the request slept to conform to `requests_per_second`.
- `requests_per_second`::
- The number of requests per second effectively executed during the update by query.
- `throttled_until_millis`::
- This field should always be equal to zero in an `_update_by_query` response. It only
- has meaning when using the <<docs-update-by-query-task-api, Task API>>, where it
- indicates the next time (in milliseconds since epoch) a throttled request will be
- executed again in order to conform to `requests_per_second`.
- `failures`::
- Array of failures if there were any unrecoverable errors during the process. If
- this is non-empty then the request aborted because of those failures.
- Update by query is implemented using batches. Any failure causes the entire
- process to abort, but all failures in the current batch are collected into the
- array. You can use the `conflicts` option to prevent reindex from aborting on
- version conflicts.
- [[docs-update-by-query-api-example]]
- ==== {api-examples-title}
- The simplest usage of `_update_by_query` just performs an update on every
- document in the index without changing the source. This is useful to
- <<picking-up-a-new-property,pick up a new property>> or some other online
- mapping change.
- To update selected documents, specify a query in the request body:
- [source,console]
- --------------------------------------------------
- POST twitter/_update_by_query?conflicts=proceed
- {
- "query": { <1>
- "term": {
- "user": "kimchy"
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:twitter]
- <1> The query must be passed as a value to the `query` key, in the same
- way as the <<search-search,Search API>>. You can also use the `q`
- parameter in the same way as the search API.
- Update documents in multiple indices:
- [source,console]
- --------------------------------------------------
- POST twitter,blog/_update_by_query
- --------------------------------------------------
- // TEST[s/^/PUT twitter\nPUT blog\n/]
- Limit the update by query operation to shards that a particular routing value:
- [source,console]
- --------------------------------------------------
- POST twitter/_update_by_query?routing=1
- --------------------------------------------------
- // TEST[setup:twitter]
- By default update by query uses scroll batches of 1000.
- You can change the batch size with the `scroll_size` parameter:
- [source,console]
- --------------------------------------------------
- POST twitter/_update_by_query?scroll_size=100
- --------------------------------------------------
- // TEST[setup:twitter]
- [[docs-update-by-query-api-source]]
- ===== Update the document source
- Update by query supports scripts to update the document source.
- For example, the following request increments the likes field for all of kimchy’s tweets:
- [source,console]
- --------------------------------------------------
- POST twitter/_update_by_query
- {
- "script": {
- "source": "ctx._source.likes++",
- "lang": "painless"
- },
- "query": {
- "term": {
- "user": "kimchy"
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:twitter]
- Note that `conflicts=proceed` is not specified in this example. In this case, a
- version conflict should halt the process so you can handle the failure.
- As with the <<docs-update,Update API>>, you can set `ctx.op` to change the
- operation that is performed:
- [horizontal]
- `noop`::
- Set `ctx.op = "noop"` if your script decides that it doesn't have to make any changes.
- The update by query operation skips updating the document and increments the `noop` counter.
- `delete`::
- Set `ctx.op = "delete"` if your script decides that the document should be deleted.
- The update by query operation deletes the document and increments the `deleted` counter.
- Update by query only supports `update`, `noop`, and `delete`.
- Setting `ctx.op` to anything else is an error. Setting any other field in `ctx` is an error.
- This API only enables you to modify the source of matching documents, you cannot move them.
- [[docs-update-by-query-api-ingest-pipeline]]
- ===== Update documents using an ingest pipeline
- Update by query can use the <<ingest>> feature by specifying a `pipeline`:
- [source,console]
- --------------------------------------------------
- PUT _ingest/pipeline/set-foo
- {
- "description" : "sets foo",
- "processors" : [ {
- "set" : {
- "field": "foo",
- "value": "bar"
- }
- } ]
- }
- POST twitter/_update_by_query?pipeline=set-foo
- --------------------------------------------------
- // TEST[setup:twitter]
- [float]
- [[docs-update-by-query-fetch-tasks]]
- ===== Get the status of update by query operations
- You can fetch the status of all running update by query requests with the
- <<tasks,Task API>>:
- [source,console]
- --------------------------------------------------
- GET _tasks?detailed=true&actions=*byquery
- --------------------------------------------------
- // TEST[skip:No tasks to retrieve]
- The responses looks like:
- [source,console-result]
- --------------------------------------------------
- {
- "nodes" : {
- "r1A2WoRbTwKZ516z6NEs5A" : {
- "name" : "r1A2WoR",
- "transport_address" : "127.0.0.1:9300",
- "host" : "127.0.0.1",
- "ip" : "127.0.0.1:9300",
- "attributes" : {
- "testattr" : "test",
- "portsfile" : "true"
- },
- "tasks" : {
- "r1A2WoRbTwKZ516z6NEs5A:36619" : {
- "node" : "r1A2WoRbTwKZ516z6NEs5A",
- "id" : 36619,
- "type" : "transport",
- "action" : "indices:data/write/update/byquery",
- "status" : { <1>
- "total" : 6154,
- "updated" : 3500,
- "created" : 0,
- "deleted" : 0,
- "batches" : 4,
- "version_conflicts" : 0,
- "noops" : 0,
- "retries": {
- "bulk": 0,
- "search": 0
- },
- "throttled_millis": 0
- },
- "description" : ""
- }
- }
- }
- }
- }
- --------------------------------------------------
- <1> This object contains the actual status. It is just like the response JSON
- with the important addition of the `total` field. `total` is the total number
- of operations that the reindex expects to perform. You can estimate the
- progress by adding the `updated`, `created`, and `deleted` fields. The request
- will finish when their sum is equal to the `total` field.
- With the task id you can look up the task directly. The following example
- retrieves information about task `r1A2WoRbTwKZ516z6NEs5A:36619`:
- [source,console]
- --------------------------------------------------
- GET /_tasks/r1A2WoRbTwKZ516z6NEs5A:36619
- --------------------------------------------------
- // TEST[catch:missing]
- The advantage of this API is that it integrates with `wait_for_completion=false`
- to transparently return the status of completed tasks. If the task is completed
- and `wait_for_completion=false` was set on it, then it'll come back with a
- `results` or an `error` field. The cost of this feature is the document that
- `wait_for_completion=false` creates at `.tasks/task/${taskId}`. It is up to
- you to delete that document.
- [float]
- [[docs-update-by-query-cancel-task-api]]
- ===== Cancel an update by query operation
- Any update by query can be cancelled using the <<tasks,Task Cancel API>>:
- [source,console]
- --------------------------------------------------
- POST _tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel
- --------------------------------------------------
- The task ID can be found using the <<tasks,tasks API>>.
- Cancellation should happen quickly but might take a few seconds. The task status
- API above will continue to list the update by query task until this task checks
- that it has been cancelled and terminates itself.
- [float]
- [[docs-update-by-query-rethrottle]]
- ===== Change throttling for a request
- The value of `requests_per_second` can be changed on a running update by query
- using the `_rethrottle` API:
- [source,console]
- --------------------------------------------------
- POST _update_by_query/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1
- --------------------------------------------------
- The task ID can be found using the <<tasks, tasks API>>.
- Just like when setting it on the `_update_by_query` API, `requests_per_second`
- can be either `-1` to disable throttling or any decimal number
- like `1.7` or `12` to throttle to that level. Rethrottling that speeds up the
- query takes effect immediately, but rethrotting that slows down the query will
- take effect after completing the current batch. This prevents scroll
- timeouts.
- [float]
- [[docs-update-by-query-manual-slice]]
- ===== Slice manually
- Slice an update by query manually by providing a slice id and total number of
- slices to each request:
- [source,console]
- ----------------------------------------------------------------
- POST twitter/_update_by_query
- {
- "slice": {
- "id": 0,
- "max": 2
- },
- "script": {
- "source": "ctx._source['extra'] = 'test'"
- }
- }
- POST twitter/_update_by_query
- {
- "slice": {
- "id": 1,
- "max": 2
- },
- "script": {
- "source": "ctx._source['extra'] = 'test'"
- }
- }
- ----------------------------------------------------------------
- // TEST[setup:big_twitter]
- Which you can verify works with:
- [source,console]
- ----------------------------------------------------------------
- GET _refresh
- POST twitter/_search?size=0&q=extra:test&filter_path=hits.total
- ----------------------------------------------------------------
- // TEST[continued]
- Which results in a sensible `total` like this one:
- [source,console-result]
- ----------------------------------------------------------------
- {
- "hits": {
- "total": {
- "value": 120,
- "relation": "eq"
- }
- }
- }
- ----------------------------------------------------------------
- [float]
- [[docs-update-by-query-automatic-slice]]
- ===== Use automatic slicing
- You can also let update by query automatically parallelize using
- <<sliced-scroll>> to slice on `_id`. Use `slices` to specify the number of
- slices to use:
- [source,console]
- ----------------------------------------------------------------
- POST twitter/_update_by_query?refresh&slices=5
- {
- "script": {
- "source": "ctx._source['extra'] = 'test'"
- }
- }
- ----------------------------------------------------------------
- // TEST[setup:big_twitter]
- Which you also can verify works with:
- [source,console]
- ----------------------------------------------------------------
- POST twitter/_search?size=0&q=extra:test&filter_path=hits.total
- ----------------------------------------------------------------
- // TEST[continued]
- Which results in a sensible `total` like this one:
- [source,console-result]
- ----------------------------------------------------------------
- {
- "hits": {
- "total": {
- "value": 120,
- "relation": "eq"
- }
- }
- }
- ----------------------------------------------------------------
- Setting `slices` to `auto` will let Elasticsearch choose the number of slices
- to use. This setting will use one slice per shard, up to a certain limit. If
- there are multiple source indices, it will choose the number of slices based
- on the index with the smallest number of shards.
- Adding `slices` to `_update_by_query` just automates the manual process used in
- the section above, creating sub-requests which means it has some quirks:
- * You can see these requests in the
- <<docs-update-by-query-task-api,Tasks APIs>>. These sub-requests are "child"
- tasks of the task for the request with `slices`.
- * Fetching the status of the task for the request with `slices` only contains
- the status of completed slices.
- * These sub-requests are individually addressable for things like cancellation
- and rethrottling.
- * Rethrottling the request with `slices` will rethrottle the unfinished
- sub-request proportionally.
- * Canceling the request with `slices` will cancel each sub-request.
- * Due to the nature of `slices` each sub-request won't get a perfectly even
- portion of the documents. All documents will be addressed, but some slices may
- be larger than others. Expect larger slices to have a more even distribution.
- * Parameters like `requests_per_second` and `max_docs` on a request with
- `slices` are distributed proportionally to each sub-request. Combine that with
- the point above about distribution being uneven and you should conclude that
- using `max_docs` with `slices` might not result in exactly `max_docs` documents
- being updated.
- * Each sub-request gets a slightly different snapshot of the source index
- though these are all taken at approximately the same time.
- [float]
- [[picking-up-a-new-property]]
- ===== Pick up a new property
- Say you created an index without dynamic mapping, filled it with data, and then
- added a mapping value to pick up more fields from the data:
- [source,console]
- --------------------------------------------------
- PUT test
- {
- "mappings": {
- "dynamic": false, <1>
- "properties": {
- "text": {"type": "text"}
- }
- }
- }
- POST test/_doc?refresh
- {
- "text": "words words",
- "flag": "bar"
- }
- POST test/_doc?refresh
- {
- "text": "words words",
- "flag": "foo"
- }
- PUT test/_mapping <2>
- {
- "properties": {
- "text": {"type": "text"},
- "flag": {"type": "text", "analyzer": "keyword"}
- }
- }
- --------------------------------------------------
- <1> This means that new fields won't be indexed, just stored in `_source`.
- <2> This updates the mapping to add the new `flag` field. To pick up the new
- field you have to reindex all documents with it.
- Searching for the data won't find anything:
- [source,console]
- --------------------------------------------------
- POST test/_search?filter_path=hits.total
- {
- "query": {
- "match": {
- "flag": "foo"
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- [source,console-result]
- --------------------------------------------------
- {
- "hits" : {
- "total": {
- "value": 0,
- "relation": "eq"
- }
- }
- }
- --------------------------------------------------
- But you can issue an `_update_by_query` request to pick up the new mapping:
- [source,console]
- --------------------------------------------------
- POST test/_update_by_query?refresh&conflicts=proceed
- POST test/_search?filter_path=hits.total
- {
- "query": {
- "match": {
- "flag": "foo"
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- [source,console-result]
- --------------------------------------------------
- {
- "hits" : {
- "total": {
- "value": 1,
- "relation": "eq"
- }
- }
- }
- --------------------------------------------------
- You can do the exact same thing when adding a field to a multifield.
|