
Clarify data stream recommendations and best practices (#107233)

* Clarify data stream recommendations and best practices

Our documentation on data streams versus aliases could be read as saying that anyone doing *any* updates needs an alias with indices instead of a data stream. This commit makes the guidance for choosing the correct abstraction more concrete. It also clarifies that data streams still allow updates through the backing indices, and that the key difference is first-write-wins (data streams) versus last-write-wins (aliases).
Lee Hinman 1 year ago
parent
commit
91bdfb84a0

+ 31 - 9
docs/reference/data-streams/data-streams.asciidoc

@@ -18,6 +18,28 @@ automate the management of these backing indices. For example, you can use
 hardware and delete unneeded indices. {ilm-init} can help you reduce costs and
 overhead as your data grows.
 
+
+[discrete]
+[[should-you-use-a-data-stream]]
+== Should you use a data stream?
+
+To decide whether to use a data stream, consider the format of your data and how you expect to
+interact with it. A good candidate for a data stream matches the following
+criteria:
+
+* Your data contains a timestamp field, or one could be automatically generated.
+* You mostly perform indexing requests, with occasional updates and deletes.
+* You index documents without an `_id`, or when indexing documents with an explicit `_id` you expect first-write-wins behavior.
+
+For most time series use cases, a data stream is a good fit. However, if your data doesn't meet
+these criteria (for example, if you frequently send multiple documents
+using the same `_id` expecting last-write-wins), you may want to use an index alias with a write
+index instead. See <<manage-time-series-data-without-data-streams,managing time
+series data without a data stream>> for more information.
+
+Keep in mind that some features such as <<tsds,Time Series Data Streams (TSDS)>> and
+<<data-stream-lifecycle,data stream lifecycles>> require a data stream.
+
 [discrete]
 [[backing-indices]]
 == Backing indices
@@ -116,19 +138,19 @@ You should not derive any intelligence from the backing indices names.
 
 [discrete]
 [[data-streams-append-only]]
-== Append-only
+== Append-only (mostly)
 
-Data streams are designed for use cases where existing data is rarely,
-if ever, updated. You cannot send update or deletion requests for existing
-documents directly to a data stream. Instead, use the
+Data streams are designed for use cases where existing data is rarely updated. You cannot send
+update or deletion requests for existing documents directly to a data stream. However, you can still
+<<update-delete-docs-in-a-backing-index,update or delete documents>> in a data stream by submitting
+requests directly to the document's backing index.
+
+If you need to update or delete a large number of documents in a data stream, you can use the
 <<update-docs-in-a-data-stream-by-query,update by query>> and
 <<delete-docs-in-a-data-stream-by-query,delete by query>> APIs.
 
-If needed, you can <<update-delete-docs-in-a-backing-index,update or delete
-documents>> by submitting requests directly to the document's backing index.
-
-TIP: If you frequently update or delete existing time series data, use an index
-alias with a write index instead of a data stream. See
+TIP: If you frequently send multiple documents using the same `_id` expecting last-write-wins, you
+may want to use an index alias with a write index instead. See
 <<manage-time-series-data-without-data-streams>>.
 
 include::set-up-a-data-stream.asciidoc[]
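
Note on the first-write-wins versus last-write-wins distinction in the hunks above: it can be illustrated with a small in-memory model. This is only a sketch, not Elasticsearch client code. `ToyIndex`, `VersionConflict`, and `replay` are hypothetical names, and the model assumes that writes to a data stream behave like `create` operations that are rejected when the `_id` already exists in the target index, while writes through an alias's write index can use `index` operations that overwrite. It covers a single index only and ignores rollover.

```python
class VersionConflict(Exception):
    """Raised when a create operation targets an _id that already exists."""


class ToyIndex:
    """In-memory stand-in for a single index; not an Elasticsearch client."""

    def __init__(self):
        self.docs = {}

    def create(self, doc_id, source):
        # Models a create operation: the write is rejected if the _id is
        # already present, so the first document under an _id survives.
        if doc_id in self.docs:
            raise VersionConflict(doc_id)
        self.docs[doc_id] = source

    def index(self, doc_id, source):
        # Models an index operation: an existing document with the same _id
        # is replaced, so the last document under an _id survives.
        self.docs[doc_id] = source


def replay(writes, op):
    """Apply (doc_id, source) writes with the named operation, skipping conflicts."""
    target = ToyIndex()
    for doc_id, source in writes:
        try:
            getattr(target, op)(doc_id, source)
        except VersionConflict:
            pass  # a real client would see this as a rejected write
    return target.docs


# Two writes that reuse the same (hypothetical) _id.
writes = [("evt-1", {"status": "started"}), ("evt-1", {"status": "finished"})]
first_wins = replay(writes, "create")  # data-stream-like behavior
last_wins = replay(writes, "index")    # alias-with-write-index behavior
```

If a workload depends on the `last_wins` result for repeated `_id` values, that is the case where the revised TIP points toward an index alias with a write index.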

+ 9 - 8
docs/reference/ilm/ilm-tutorial.asciidoc

@@ -282,14 +282,15 @@ DELETE /_index_template/timeseries_template
 [[manage-time-series-data-without-data-streams]]
 === Manage time series data without data streams
 
-Even though <<data-streams, data streams>> are a convenient way to scale
-and manage time series data, they are designed to be append-only. We recognise there
-might be use-cases where data needs to be updated or deleted in place and the
-data streams don't support delete and update requests directly,
-so the index APIs would need to be used directly on the data stream's backing indices.
-
-In these cases, you can use an index alias to manage indices containing the time series data
-and periodically roll over to a new index.
+Even though <<data-streams, data streams>> are a convenient way to scale and manage time series
+data, they are designed to be append-only. Some use cases require data to be updated or deleted in
+place. Data streams don't support update and delete requests directly, so in those cases you must
+send index API requests directly to the data stream's backing indices. Even so, we still recommend
+using a data stream in these cases.
+
+If you frequently send multiple documents using the same `_id` expecting last-write-wins, you can
+use an index alias instead of a data stream to manage indices containing the time series data and
+periodically roll over to a new index.
 
 To automate rollover and management of time series indices with {ilm-init} using an index
 alias, you:

+ 5 - 2
docs/reference/ilm/set-up-lifecycle-policy.asciidoc

@@ -81,6 +81,8 @@ To use a policy that triggers the rollover action,
 you need to configure the policy in the index template used to create each new index.
 You specify the name of the policy and the alias used to reference the rolling indices.
 
+TIP: The `index.lifecycle.rollover_alias` setting is required only when using {ilm} with an alias. It is unnecessary when using <<data-streams,data streams>>.
+
 You can use the {kib} Create template wizard to create a template. To access the
 wizard, open the menu and go to *Stack Management > Index Management*. In the
 *Index Templates* tab, click *Create template*.
@@ -128,8 +130,9 @@ DELETE _index_template/my_template
 [[create-initial-index]]
 ==== Create an initial managed index
 
-When you set up policies for your own rolling indices, you need to manually create the first index 
-managed by a policy and designate it as the write index.
+When you set up policies for your own rolling indices and are not using the recommended
+<<data-streams,data streams>>, you need to manually create the first index managed by a policy and
+designate it as the write index.
 
 IMPORTANT: When you enable {ilm} for {beats} or the {ls} {es} output plugin, 
 the necessary policies and configuration changes are applied automatically.
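
Note on the two setups contrasted in this file's changes: the difference can be sketched as request bodies, written here as Python dictionaries. This is an illustration under assumptions, not the exact requests from the docs. The policy name, alias name, and index patterns are hypothetical, and the bodies would still need to be sent to the index template and create index APIs by a client of your choice.

```python
POLICY = "timeseries_policy"  # hypothetical ILM policy name
ALIAS = "timeseries"          # hypothetical rollover alias

# Index template for a data stream: naming the policy is enough,
# and no rollover alias setting is needed.
data_stream_template = {
    "index_patterns": ["timeseries"],
    "data_stream": {},
    "template": {"settings": {"index.lifecycle.name": POLICY}},
}

# Index template for alias-managed rolling indices: the policy also
# needs to know which alias to roll over.
alias_template = {
    "index_patterns": ["timeseries-*"],
    "template": {
        "settings": {
            "index.lifecycle.name": POLICY,
            "index.lifecycle.rollover_alias": ALIAS,
        }
    },
}

# Body for manually creating the first managed index (for example,
# one named timeseries-000001) and designating it as the write index.
bootstrap_index_body = {"aliases": {ALIAS: {"is_write_index": True}}}
```

Only the alias-based setup needs `bootstrap_index_body`. With a data stream, the first backing index is created for you.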