123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161 |
- [role="xpack"]
- [[data-streams]]
- = Data streams
- ++++
- <titleabbrev>Data streams</titleabbrev>
- ++++
- A data stream lets you store append-only time series
- data across multiple indices while giving you a single named resource for
- requests. Data streams are well-suited for logs, events, metrics, and other
- continuously generated data.
- You can submit indexing and search requests directly to a data stream. The
- stream automatically routes the request to backing indices that store the
- stream's data. You can use <<index-lifecycle-management,{ilm} ({ilm-init})>> to
- automate the management of these backing indices. For example, you can use
- {ilm-init} to automatically move older backing indices to less expensive
- hardware and delete unneeded indices. {ilm-init} can help you reduce costs and
- overhead as your data grows.
- [discrete]
- [[should-you-use-a-data-stream]]
- == Should you use a data stream?
- To determine whether you should use a data stream for your data, you should consider the format of
- the data, and your expected interaction. A good candidate for using a data stream will match the
- following criteria:
- * Your data contains a timestamp field, or one could be automatically generated.
- * You mostly perform indexing requests, with occasional updates and deletes.
- * You index documents without an `_id`, or when indexing documents with an explicit `_id` you expect first-write-wins behavior.
- For most time series data use-cases, a data stream will be a good fit. However, if you find that
- your data doesn't fit into these categories (for example, if you frequently send multiple documents
- using the same `_id` expecting last-write-wins), you may want to use an index alias with a write
- index instead. See documentation for <<manage-time-series-data-without-data-streams,managing time
- series data without a data stream>> for more information.
- Keep in mind that some features such as <<tsds,Time Series Data Streams (TSDS)>> and
- <<data-stream-lifecycle,data stream lifecycles>> require a data stream.
- [discrete]
- [[backing-indices]]
- == Backing indices
- A data stream consists of one or more <<index-hidden,hidden>>, auto-generated
- backing indices.
- image::images/data-streams/data-streams-diagram.svg[align="center"]
- A data stream requires a matching <<index-templates,index template>>. The
- template contains the mappings and settings used to configure the stream's
- backing indices.
- // tag::timestamp-reqs[]
- Every document indexed to a data stream must contain a `@timestamp` field,
- mapped as a <<date,`date`>> or <<date_nanos,`date_nanos`>> field type. If the
- index template doesn't specify a mapping for the `@timestamp` field, {es} maps
- `@timestamp` as a `date` field with default options.
- // end::timestamp-reqs[]
- The same index template can be used for multiple data streams. You cannot
- delete an index template in use by a data stream.
- The name pattern for the backing indices is an implementation detail and no
- intelligence should be derived from it. The only invariant the holds is that
- each data stream generation index will have a unique name.
- [discrete]
- [[data-stream-read-requests]]
- == Read requests
- When you submit a read request to a data stream, the stream routes the request
- to all its backing indices.
- image::images/data-streams/data-streams-search-request.svg[align="center"]
- [discrete]
- [[data-stream-write-index]]
- == Write index
- The most recently created backing index is the data stream’s write index.
- The stream adds new documents to this index only.
- image::images/data-streams/data-streams-index-request.svg[align="center"]
- You cannot add new documents to other backing indices, even by sending requests
- directly to the index.
- You also cannot perform operations on a write index that may hinder indexing,
- such as:
- * <<indices-clone-index,Clone>>
- * <<indices-delete-index,Delete>>
- * <<indices-shrink-index,Shrink>>
- * <<indices-split-index,Split>>
- [discrete]
- [[data-streams-rollover]]
- == Rollover
- A <<indices-rollover-index,rollover>> creates a new backing index that becomes
- the stream's new write index.
- We recommend using <<index-lifecycle-management,{ilm-init}>> to automatically
- roll over data streams when the write index reaches a specified age or size.
- If needed, you can also <<manually-roll-over-a-data-stream,manually roll over>>
- a data stream.
- [discrete]
- [[data-streams-generation]]
- == Generation
- Each data stream tracks its generation: a six-digit, zero-padded integer starting at `000001`.
- When a backing index is created, the index is named using the following
- convention:
- [source,text]
- ----
- .ds-<data-stream>-<yyyy.MM.dd>-<generation>
- ----
- `<yyyy.MM.dd>` is the backing index's creation date. Backing indices with a
- higher generation contain more recent data. For example, the `web-server-logs`
- data stream has a generation of `34`. The stream's most recent backing index,
- created on 7 March 2099, is named `.ds-web-server-logs-2099.03.07-000034`.
- Some operations, such as a <<indices-shrink-index,shrink>> or
- <<snapshots-restore-snapshot,restore>>, can change a backing index's name.
- These name changes do not remove a backing index from its data stream.
- The generation of the data stream can change without a new index being added to
- the data stream (e.g. when an existing backing index is shrunk). This means the
- backing indices for some generations will never exist.
- You should not derive any intelligence from the backing indices names.
- [discrete]
- [[data-streams-append-only]]
- == Append-only (mostly)
- Data streams are designed for use cases where existing data is rarely updated. You cannot send
- update or deletion requests for existing documents directly to a data stream. However, you can still
- <<update-delete-docs-in-a-backing-index,update or delete documents>> in a data stream by submitting
- requests directly to the document's backing index.
- If you need to update a larger number of documents in a data stream, you can use the
- <<update-docs-in-a-data-stream-by-query,update by query>> and
- <<delete-docs-in-a-data-stream-by-query,delete by query>> APIs.
- TIP: If you frequently send multiple documents using the same `_id` expecting last-write-wins, you
- may want to use an index alias with a write index instead. See
- <<manage-time-series-data-without-data-streams>>.
- include::set-up-a-data-stream.asciidoc[]
- include::use-a-data-stream.asciidoc[]
- include::change-mappings-and-settings.asciidoc[]
- include::tsds.asciidoc[]
- include::logs.asciidoc[]
- include::lifecycle/index.asciidoc[]
|