
[DOCS] Document delete/update by query for data streams (#58679)

James Rodewig 5 years ago
commit 55e2ec6248

+ 15 - 13
docs/reference/data-streams/data-streams-overview.asciidoc

@@ -119,28 +119,30 @@ manually perform a rollover. See <<manually-roll-over-a-data-stream>>.
 === Append-only
 
 For most time-series use cases, existing data is rarely, if ever, updated.
-Because of this, data streams are designed to be append-only. This means you can
-send indexing requests for new documents directly to a data stream. However, you
-cannot send update or deletion requests for existing documents to a data stream.
+Because of this, data streams are designed to be append-only.
 
-To update or delete specific documents in a data stream, submit one of the
-following requests to the backing index containing the document:
+You can send <<add-documents-to-a-data-stream,indexing requests for new
+documents>> directly to a data stream. However, you cannot send the following
+requests for existing documents directly to a data stream:
 
 * An <<docs-index_,index API>> request with an
-  <<docs-index-api-op_type,`op_type`>> of `index`.
-  These requests must include valid <<optimistic-concurrency-control,`if_seq_no`
-  and `if_primary_term`>> arguments.
+  <<docs-index-api-op_type,`op_type`>> of `index`. The `op_type` parameter
+  defaults to `index` for existing documents.
 
 * A <<docs-bulk,bulk API>> request using the `delete`, `index`, or `update`
-  action. If the action type is `index`, the action must include valid
-  <<bulk-optimistic-concurrency-control,`if_seq_no` and `if_primary_term`>>
-  arguments.
+  action.
 
 * A <<docs-delete,delete API>> request
 
-See <<update-delete-docs-in-a-data-stream>>.
+Instead, you can use the <<docs-update-by-query,update by query>> and
+<<docs-delete-by-query,delete by query>> APIs to update or delete existing
+documents in a data stream. See <<update-delete-docs-in-a-data-stream>>.
+
+Alternatively, you can update or delete a document by submitting requests to the
+backing index containing the document. See
+<<update-delete-docs-in-a-backing-index>>.
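To make the restriction concrete, a direct write like the following would be
rejected by a data stream. This is an illustrative sketch only, assuming a
data stream named `logs`; the exact error response varies by version:

[source,console]
----
PUT /logs/_doc/1
{
  "@timestamp": "2020-12-07T11:06:07.000Z",
  "message": "Login attempt failed"
}
----
// TEST[skip:hypothetical example of a rejected request]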
 
 TIP: If you frequently update or delete existing documents,
 we recommend using an <<indices-add-alias,index alias>> and
 <<indices-templates,index template>> instead of a data stream. You can still
-use <<index-lifecycle-management,{ilm-init}>> to manage indices for the alias.
+use <<index-lifecycle-management,{ilm-init}>> to manage indices for the alias.

+ 4 - 5
docs/reference/data-streams/set-up-a-data-stream.asciidoc

@@ -26,11 +26,10 @@ TIP: Data streams work well with most common log formats. While no schema is
 required to use data streams, we recommend the {ecs-ref}[Elastic Common Schema
 (ECS)].
 
-* Data streams are designed to be <<data-streams-append-only,append-only>>.
-While you can index new documents directly to a data stream, you cannot use a
-data stream to directly update or delete individual documents. To update or
-delete specific documents in a data stream, submit a <<docs-delete,delete>> or
-<<docs-update,update>> API request to the backing index containing the document.
+* Data streams are best suited for time-based,
+<<data-streams-append-only,append-only>> use cases. If you frequently need to
+update or delete existing documents, we recommend using an index alias and an
+index template instead.
 
 
 [discrete]

+ 99 - 8
docs/reference/data-streams/use-a-data-stream.asciidoc

@@ -10,6 +10,7 @@ the following:
 * <<open-closed-backing-indices>>
 * <<reindex-with-a-data-stream>>
 * <<update-delete-docs-in-a-data-stream>>
+* <<update-delete-docs-in-a-backing-index>>
 
 ////
 [source,console]
@@ -67,6 +68,10 @@ POST /logs/_doc/
 ----
 // TEST[continued]
 ====
+
+IMPORTANT: You cannot add new documents to a data stream using the index API's
+`PUT /<target>/_doc/<_id>` request format. Use the `PUT /<target>/_create/<_id>`
+format instead.
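A minimal sketch of the `_create` format, assuming a data stream named `logs`
(illustrative only; the document ID `1` is arbitrary):

[source,console]
----
PUT /logs/_create/1
{
  "@timestamp": "2020-12-07T11:06:07.000Z",
  "message": "Login successful"
}
----
// TEST[skip:hypothetical example]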
 --
 
 * A <<docs-bulk,bulk API>> request using the `create` action. Specify the data
@@ -426,12 +431,96 @@ POST /_reindex
 [[update-delete-docs-in-a-data-stream]]
 === Update or delete documents in a data stream
 
-Data streams are designed to be <<data-streams-append-only,append-only>>. This
-means you cannot send update or deletion requests for existing documents to a
-data stream. However, you can send update or deletion requests to the backing
-index containing the document.
+You can update or delete documents in a data stream using the following
+requests:
+
+* An <<docs-update-by-query,update by query API>> request
++
+.*Example*
+[%collapsible]
+====
+The following update by query API request updates documents in the `logs` data
+stream with a `user.id` of `i96BP1mA`. The request uses a
+<<modules-scripting-using,script>> to assign matching documents a new `user.id`
+value of `XgdX0NoX`.
+
+////
+[source,console]
+----
+PUT /logs/_create/2?refresh=wait_for
+{
+  "@timestamp": "2020-12-07T11:06:07.000Z",
+  "user": {
+    "id": "i96BP1mA"
+  }
+}
+----
+// TEST[continued]
+////
+
+[source,console]
+----
+POST /logs/_update_by_query
+{
+  "query": {
+    "match": {
+      "user.id": "i96BP1mA"
+    }
+  },
+  "script": {
+    "source": "ctx._source.user.id = params.new_id",
+    "params": {
+      "new_id": "XgdX0NoX"
+    }
+  }
+}
+----
+// TEST[continued]
+====
+
+* A <<docs-delete-by-query,delete by query API>> request
++
+.*Example*
+[%collapsible]
+====
+The following delete by query API request deletes documents in the `logs` data
+stream with a `user.id` of `zVZMamUM`.
+
+////
+[source,console]
+----
+PUT /logs/_create/1?refresh=wait_for
+{
+  "@timestamp": "2020-12-07T11:06:07.000Z",
+  "user": {
+    "id": "zVZMamUM"
+  }
+}
+----
+// TEST[continued]
+////
+
+[source,console]
+----
+POST /logs/_delete_by_query
+{
+  "query": {
+    "match": {
+      "user.id": "zVZMamUM"
+    }
+  }
+}
+----
+// TEST[continued]
+====
+
+[discrete]
+[[update-delete-docs-in-a-backing-index]]
+=== Update or delete documents in a backing index
 
-To delete or update a document in a data stream, you first need to get:
+Alternatively, you can update or delete documents in a data stream by sending
+the update or deletion request to the backing index containing the document. To
+do this, you first need to get:
 
 * The <<mapping-id-field,document ID>>
 * The name of the backing index that contains the document
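Both are returned by a search. The following is a sketch of such a request,
using the search API's `seq_no_primary_term` parameter to include the sequence
number and primary term of each hit; the `logs` data stream and `user.id`
value are assumed from the surrounding examples:

[source,console]
----
GET /logs/_search
{
  "seq_no_primary_term": true,
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  }
}
----
// TEST[skip:hypothetical example]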
@@ -506,7 +595,7 @@ information for any documents matching the search.
       {
         "_index": ".ds-logs-000003",                <1>
         "_id": "bfspvnIBr7VVZlfp2lqX",              <2>
-        "_seq_no": 4,                               <3>
+        "_seq_no": 8,                               <3>
         "_primary_term": 1,                         <4>
         "_score": 0.2876821,
         "_source": {
@@ -522,6 +611,8 @@ information for any documents matching the search.
 }
 ----
 // TESTRESPONSE[s/"took": 20/"took": $body.took/]
+// TESTRESPONSE[s/"max_score": 0.2876821/"max_score": $body.hits.max_score/]
+// TESTRESPONSE[s/"_score": 0.2876821/"_score": $body.hits.hits.0._score/]
 
 <1> Backing index containing the matching document
 <2> Document ID for the document
@@ -546,7 +637,7 @@ contains a new JSON source for the document.
 
 [source,console]
 ----
-PUT /.ds-logs-000003/_doc/bfspvnIBr7VVZlfp2lqX?if_seq_no=4&if_primary_term=1
+PUT /.ds-logs-000003/_doc/bfspvnIBr7VVZlfp2lqX?if_seq_no=8&if_primary_term=1
 {
   "@timestamp": "2020-12-07T11:06:07.000Z",
   "user": {
@@ -611,7 +702,7 @@ parameters.
 [source,console]
 ----
 PUT /_bulk?refresh
-{ "index": { "_index": ".ds-logs-000003", "_id": "bfspvnIBr7VVZlfp2lqX", "if_seq_no": 4, "if_primary_term": 1 } }
+{ "index": { "_index": ".ds-logs-000003", "_id": "bfspvnIBr7VVZlfp2lqX", "if_seq_no": 8, "if_primary_term": 1 } }
 { "@timestamp": "2020-12-07T11:06:07.000Z", "user": { "id": "8a4f500d" }, "message": "Login successful" }
 ----
 // TEST[continued]

+ 18 - 13
docs/reference/docs/delete-by-query.asciidoc

@@ -47,7 +47,7 @@ POST /twitter/_delete_by_query
 [[docs-delete-by-query-api-request]]
 ==== {api-request-title}
 
-`POST /<index>/_delete_by_query`
+`POST /<target>/_delete_by_query`
 
 [[docs-delete-by-query-api-desc]]
 ==== {api-description-title}
@@ -55,7 +55,7 @@ POST /twitter/_delete_by_query
 You can specify the query criteria in the request URI or the request body
 using the same syntax as the  <<search-search,Search API>>. 
 
-When you submit a delete by query request, {es} gets a snapshot of the index
+When you submit a delete by query request, {es} gets a snapshot of the data stream or index
 when it begins processing the request and deletes matching documents using
 `internal` versioning. If a document changes between the time that the
 snapshot is taken and the delete operation is processed, it results in a version
@@ -134,12 +134,12 @@ Delete by query supports <<sliced-scroll, sliced scroll>> to parallelize the
 delete process. This can improve efficiency and provide a
 convenient way to break the request down into smaller parts.
 
-Setting `slices` to `auto` chooses a reasonable number for most indices. 
+Setting `slices` to `auto` chooses a reasonable number for most data streams and indices. 
 If you're slicing manually or otherwise tuning automatic slicing, keep in mind 
 that:
 
 * Query performance is most efficient when the number of `slices` is equal to 
-the number of shards in the index. If that number is large (for example,
+the number of shards in the index or backing index. If that number is large (for example,
 500), choose a lower number as too many `slices` hurts performance. Setting
 `slices` higher than the number of shards generally does not improve efficiency
 and adds overhead.
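For illustration, a manually sliced request might look like the following
sketch; the `twitter` index matches the examples on this page, but the `likes`
field is an assumption. A second, otherwise identical request with `"id": 1`
would cover the other slice:

[source,console]
----
POST /twitter/_delete_by_query
{
  "slice": {
    "id": 0,
    "max": 2
  },
  "query": {
    "range": {
      "likes": {
        "lt": 10
      }
    }
  }
}
----
// TEST[skip:hypothetical example]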
@@ -153,9 +153,11 @@ documents being reindexed and cluster resources.
 [[docs-delete-by-query-api-path-params]]
 ==== {api-path-parms-title}
 
-`<index>`::
-(Optional, string) A comma-separated list of index names to search. Use `_all`
-or omit to search all indices.
+`<target>`::
+(Optional, string)
+A comma-separated list of data streams, indices, and index aliases to search.
+Wildcard (`*`) expressions are supported. To search all data streams or indices
+in a cluster, omit this parameter or use `_all` or `*`.
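For example, the following sketch targets a data stream and a wildcard index
pattern in one request (both names are hypothetical):

[source,console]
----
POST /logs,twitter-*/_delete_by_query
{
  "query": {
    "match": {
      "message": "login failure"
    }
  }
}
----
// TEST[skip:hypothetical example]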
 
 [[docs-delete-by-query-api-query-params]]
 ==== {api-query-parms-title}
@@ -200,7 +202,10 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=requests_per_second]
 
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]
 
-include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll]
+`scroll`::
+(Optional, <<time-units,time value>>)
+Period to retain the <<scroll-search-context,search context>> for scrolling. See
+<<request-body-search-scroll>>.
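For example, the following sketch retains the search context for ten minutes
(the query itself is a placeholder):

[source,console]
----
POST /twitter/_delete_by_query?scroll=10m
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
----
// TEST[skip:hypothetical example]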
 
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll_size]
 
@@ -343,7 +348,7 @@ version conflicts.
 [[docs-delete-by-query-api-example]]
 ==== {api-examples-title}
 
-Delete all tweets from the `twitter` index:
+Delete all tweets from the `twitter` data stream or index:
 
 [source,console]
 --------------------------------------------------
@@ -356,7 +361,7 @@ POST twitter/_delete_by_query?conflicts=proceed
 --------------------------------------------------
 // TEST[setup:twitter]
 
-Delete documents from multiple indices:
+Delete documents from multiple data streams or indices:
 
 [source,console]
 --------------------------------------------------
@@ -531,8 +536,8 @@ Which results in a sensible `total` like this one:
 
 Setting `slices` to `auto` will let {es} choose the number of slices
 to use. This setting will use one slice per shard, up to a certain limit. If
-there are multiple source indices, it will choose the number of slices based
-on the index with the smallest number of shards.
+there are multiple source data streams or indices, it will choose the number of slices based
+on the index or backing index with the smallest number of shards.
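A sketch of the automatic form, reusing the `twitter` example index from this
page (the `user` field value is an assumption):

[source,console]
----
POST /twitter/_delete_by_query?slices=auto
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
----
// TEST[skip:hypothetical example]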
 
 Adding `slices` to `_delete_by_query` just automates the manual process used in
 the section above, creating sub-requests which means it has some quirks:
@@ -555,7 +560,7 @@ slices` are distributed proportionally to each sub-request. Combine that with
 the point above about distribution being uneven and you should conclude that
 using `max_docs` with `slices` might not result in exactly `max_docs` documents
 being deleted.
-* Each sub-request gets a slightly different snapshot of the source index
+* Each sub-request gets a slightly different snapshot of the source data stream or index
 though these are all taken at approximately the same time.
 
 [float]

+ 20 - 15
docs/reference/docs/update-by-query.asciidoc

@@ -5,7 +5,7 @@
 ++++
 
 Updates documents that match the specified query. 
-If no query is specified, performs an update on every document in the index without
+If no query is specified, performs an update on every document in the data stream or index without
 modifying the source, which is useful for picking up mapping changes.
 
 [source,console]
@@ -44,7 +44,7 @@ POST twitter/_update_by_query?conflicts=proceed
 [[docs-update-by-query-api-request]]
 ==== {api-request-title}
 
-`POST /<index>/_update_by_query`
+`POST /<target>/_update_by_query`
 
 [[docs-update-by-query-api-desc]]
 ==== {api-description-title}
@@ -52,7 +52,7 @@ POST twitter/_update_by_query?conflicts=proceed
 You can specify the query criteria in the request URI or the request body
 using the same syntax as the  <<search-search,Search API>>. 
 
-When you submit an update by query request, {es} gets a snapshot of the index
+When you submit an update by query request, {es} gets a snapshot of the data stream or index
 when it begins processing the request and updates matching documents using
 `internal` versioning. 
 When the versions match, the document is updated and the version number is incremented. 
@@ -75,7 +75,7 @@ Any update requests that completed successfully still stick, they are not rolled
 ===== Refreshing shards
 
 Specifying the `refresh` parameter refreshes all shards once the request completes. 
-This is different than the update API&#8217;s `refresh` parameter, which causes just the shard
+This is different than the update API's `refresh` parameter, which causes just the shard
 that received the request to be refreshed. Unlike the update API, it does not support 
 `wait_for`.
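For example, the following sketch refreshes all shards once the request
completes; the `twitter` index and the `conflicts=proceed` parameter match the
examples elsewhere on this page:

[source,console]
----
POST /twitter/_update_by_query?refresh&conflicts=proceed
----
// TEST[skip:hypothetical example]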
 
@@ -129,12 +129,12 @@ Update by query supports <<sliced-scroll, sliced scroll>> to parallelize the
 update process. This can improve efficiency and provide a
 convenient way to break the request down into smaller parts.
 
-Setting `slices` to `auto` chooses a reasonable number for most indices. 
+Setting `slices` to `auto` chooses a reasonable number for most data streams and indices. 
 If you're slicing manually or otherwise tuning automatic slicing, keep in mind 
 that:
 
 * Query performance is most efficient when the number of `slices` is equal to 
-the number of shards in the index. If that number is large (for example,
+the number of shards in the index or backing index. If that number is large (for example,
 500), choose a lower number as too many `slices` hurts performance. Setting
 `slices` higher than the number of shards generally does not improve efficiency
 and adds overhead.
@@ -148,9 +148,11 @@ documents being reindexed and cluster resources.
 [[docs-update-by-query-api-path-params]]
 ==== {api-path-parms-title}
 
-`<index>`::
-(Optional, string) A comma-separated list of index names to search. Use `_all`
-or omit to search all indices.
+`<target>`::
+(Optional, string)
+A comma-separated list of data streams, indices, and index aliases to search.
+Wildcard (`*`) expressions are supported. To search all data streams or indices
+in a cluster, omit this parameter or use `_all` or `*`.
 
 [[docs-update-by-query-api-query-params]]
 ==== {api-query-parms-title}
@@ -197,7 +199,10 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=requests_per_second]
 
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]
 
-include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll]
+`scroll`::
+(Optional, <<time-units,time value>>)
+Period to retain the <<scroll-search-context,search context>> for scrolling. See
+<<request-body-search-scroll>>.
 
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll_size]
 
@@ -290,7 +295,7 @@ version conflicts.
 ==== {api-examples-title}
 
 The simplest usage of `_update_by_query` just performs an update on every
-document in the index without changing the source. This is useful to
+document in the data stream or index without changing the source. This is useful to
 <<picking-up-a-new-property,pick up a new property>> or some other online
 mapping change.
 
@@ -313,7 +318,7 @@ POST twitter/_update_by_query?conflicts=proceed
 way as the <<search-search,Search API>>. You can also use the `q`
 parameter in the same way as the search API.
 
-Update documents in multiple indices:
+Update documents in multiple data streams or indices:
 
 [source,console]
 --------------------------------------------------
@@ -617,8 +622,8 @@ Which results in a sensible `total` like this one:
 
 Setting `slices` to `auto` will let Elasticsearch choose the number of slices
 to use. This setting will use one slice per shard, up to a certain limit. If
-there are multiple source indices, it will choose the number of slices based
-on the index with the smallest number of shards.
+there are multiple source data streams or indices, it will choose the number of slices based
+on the index or backing index with the smallest number of shards.
 
 Adding `slices` to `_update_by_query` just automates the manual process used in
 the section above, creating sub-requests which means it has some quirks:
@@ -641,7 +646,7 @@ be larger than others. Expect larger slices to have a more even distribution.
 the point above about distribution being uneven and you should conclude that
 using `max_docs` with `slices` might not result in exactly `max_docs` documents
 being updated.
-* Each sub-request gets a slightly different snapshot of the source index
+* Each sub-request gets a slightly different snapshot of the source data stream or index
 though these are all taken at approximately the same time.
 
 [float]