
[role="xpack"]
[[data-streams]]
= Data streams
++++
<titleabbrev>Data streams</titleabbrev>
++++

A _data stream_ is a convenient, scalable way to ingest, search, and manage
continuously generated time-series data.

Time-series data, such as logs, tends to grow over time. While storing an entire
time series in a single {es} index is simpler, it is often more efficient and
cost-effective to store large volumes of data across multiple, time-based
indices. Multiple indices let you move indices containing older, less frequently
queried data to less expensive hardware and delete indices when they're no
longer needed, reducing overhead and storage costs.

A data stream is designed to give you the best of both worlds:

* The simplicity of a single named resource you can use for requests
* The storage, scalability, and cost-saving benefits of multiple indices

You can submit indexing and search requests directly to a data stream. The
stream automatically routes the requests to a collection of hidden
_backing indices_ that store the stream's data.

You can use <<index-lifecycle-management,{ilm} ({ilm-init})>> to automate the
management of these backing indices. {ilm-init} lets you automatically spin up
new backing indices, allocate indices to different hardware, delete old indices,
and take other automatic actions based on age or size criteria you set. Use data
streams and {ilm-init} to seamlessly scale your data storage based on your
budget, performance, resiliency, and retention needs.

[discrete]
[[when-to-use-data-streams]]
== When to use data streams

We recommend using data streams if you:

* Use {es} to ingest, search, and manage large volumes of time-series data
* Want to scale and reduce costs by using {ilm-init} to automate the management
  of your indices
* Index large volumes of time-series data in {es} but rarely delete or update
  individual documents

[discrete]
[[backing-indices]]
== Backing indices

A data stream consists of one or more _backing indices_. Backing indices are
<<index-hidden,hidden>>, auto-generated indices used to store a stream's
documents.

image::images/data-streams/data-streams-diagram.svg[align="center"]

To create backing indices, each data stream requires a matching
<<indices-templates,index template>>. This template acts as a blueprint for the
stream's backing indices. It specifies:

* One or more wildcard (`*`) patterns that match the name of the stream.
* The mappings and settings for the stream's backing indices.
* That the template is used exclusively for data streams.

Every document indexed to a data stream must have a `@timestamp` field. This
field can be mapped as a <<date,`date`>> or <<date_nanos,`date_nanos`>> field
data type by the stream's index template. If the template does not specify a
mapping, the `@timestamp` field is mapped as a `date` field with default
options.

The same index template can be used to create multiple data streams.

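
As a rough illustration of the template requirements above, the following
Python sketch builds a request body for such a template. It assumes the
composable index template format, in which an empty `data_stream` object marks
the template as used only for data streams. The function name, the
`web_server_logs*` pattern, and the `priority` value are hypothetical. Check the
index template API for your {es} version before relying on this shape.

[source,python]
----
import json


def data_stream_template(patterns, timestamp_type="date"):
    """Build a hypothetical index template body for a data stream."""
    # The @timestamp field may only be mapped as date or date_nanos.
    if timestamp_type not in ("date", "date_nanos"):
        raise ValueError("@timestamp must be mapped as date or date_nanos")
    return {
        # Wildcard patterns that match the data stream's name.
        "index_patterns": list(patterns),
        # Assumption: an empty object marks the template as data-stream-only.
        "data_stream": {},
        # Hypothetical priority value, chosen only for illustration.
        "priority": 200,
        # Mappings and settings applied to each backing index.
        "template": {
            "mappings": {
                "properties": {"@timestamp": {"type": timestamp_type}}
            }
        },
    }


body = data_stream_template(["web_server_logs*"])
print(json.dumps(body, indent=2))
----

Because the pattern ends in a wildcard, the same body would match several
stream names, such as `web_server_logs` and `web_server_logs_eu`.
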
[discrete]
[[data-streams-generation]]
== Generation

Each data stream tracks its _generation_: a six-digit, zero-padded integer
that acts as a cumulative count of the data stream's backing indices. This count
includes any deleted indices for the stream. The generation is incremented
whenever a new backing index is added to the stream.

When a backing index is created, the index is named using the following
convention:

[source,text]
----
.ds-<data-stream>-<generation>
----

For example, the `web_server_logs` data stream has a generation of `34`. The
most recently created backing index for this data stream is named
`.ds-web_server_logs-000034`.

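
The naming convention can be expressed as a small helper. This is a minimal
sketch that follows only the convention stated above. The function name
`backing_index_name` is hypothetical and is not part of any {es} client.

[source,python]
----
def backing_index_name(data_stream: str, generation: int) -> str:
    """Return the backing index name for a stream and generation."""
    # The first backing index of a stream has generation 1.
    if generation < 1:
        raise ValueError("generation must be a positive integer")
    # Zero-pad the generation to six digits, as in the convention above.
    return f".ds-{data_stream}-{generation:06d}"


print(backing_index_name("web_server_logs", 34))
# prints .ds-web_server_logs-000034
----
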
Because the generation increments with each new backing index, backing indices
with a higher generation contain more recent data. Backing indices with a lower
generation contain older data.

A backing index's name can change after its creation due to a
<<indices-shrink-index,shrink>>, <<snapshots-restore-snapshot,restore>>, or
other operation. However, renaming a backing index does not detach it from a
data stream.

[discrete]
[[data-stream-read-requests]]
== Read requests

When a read request is sent to a data stream, the stream routes the request to
all its backing indices. For example, a search request sent to a data stream
queries all its backing indices.

image::images/data-streams/data-streams-search-request.svg[align="center"]

[discrete]
[[data-stream-write-index]]
== Write index

The most recently created backing index is the data stream's only
_write index_. The data stream routes all indexing requests for new documents to
this index.

image::images/data-streams/data-streams-index-request.svg[align="center"]

You cannot add new documents to a stream's other backing indices, even by
sending requests directly to the index.

Because the write index is the only index capable of ingesting new documents,
you cannot perform operations on it that might hinder indexing. These
prohibited operations include:

* <<indices-clone-index,Clone>>
* <<indices-close,Close>>
* <<indices-delete-index,Delete>>
* <<freeze-index-api,Freeze>>
* <<indices-shrink-index,Shrink>>
* <<indices-split-index,Split>>

[discrete]
[[data-streams-rollover]]
== Rollover

When a data stream is created, one backing index is automatically created.
Because this single index is also the most recently created backing index, it
acts as the stream's write index.

A <<indices-rollover-index,rollover>> creates a new backing index for a data
stream. This new backing index becomes the stream's write index, replacing
the current one, and increments the stream's generation.

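
To make the interplay between rollover, generation, and the write index
concrete, here is a toy in-memory model in Python. It is only a sketch of the
behavior described in this document, not {es} code: the class
`DataStreamModel` and its methods are hypothetical, and the model ignores
documents, shards, and renamed indices.

[source,python]
----
class DataStreamModel:
    """Toy model of a data stream's backing indices (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.generation = 0
        self.backing_indices = []
        # Creating a stream automatically creates its first backing index.
        self.rollover()

    def rollover(self):
        """Add a new backing index, which becomes the write index."""
        self.generation += 1
        index = f".ds-{self.name}-{self.generation:06d}"
        self.backing_indices.append(index)
        return index

    @property
    def write_index(self):
        # The most recently created backing index is the only write index.
        return self.backing_indices[-1]

    def delete_backing_index(self, index):
        # Deleting the write index is a prohibited operation.
        if index == self.write_index:
            raise ValueError("cannot delete the write index")
        self.backing_indices.remove(index)


stream = DataStreamModel("web_server_logs")
stream.rollover()
stream.rollover()
stream.delete_backing_index(".ds-web_server_logs-000001")
stream.rollover()
print(stream.generation)   # prints 4
print(stream.write_index)  # prints .ds-web_server_logs-000004
----

The generation ends at `4` even though only three backing indices remain,
because the count includes deleted indices.
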
In most cases, we recommend using <<index-lifecycle-management,{ilm}
({ilm-init})>> to automate rollovers for data streams. This lets you
automatically roll over the current write index when it meets specified
criteria, such as a maximum age or size.

However, you can also use the <<indices-rollover-index,rollover API>> to
manually perform a rollover. See <<manually-roll-over-a-data-stream>>.

[discrete]
[[data-streams-append-only]]
== Append-only

For most time-series use cases, existing data is rarely, if ever, updated.
Because of this, data streams are designed to be append-only.

You can send <<add-documents-to-a-data-stream,indexing requests for new
documents>> directly to a data stream. However, you cannot send update or
deletion requests for existing documents directly to a data stream.

Instead, you can use the <<docs-update-by-query,update by query>> and
<<docs-delete-by-query,delete by query>> APIs to update or delete existing
documents in a data stream. See <<update-docs-in-a-data-stream-by-query>> and
<<delete-docs-in-a-data-stream-by-query>>.

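
As a hedged sketch of what such a request might look like, the Python helper
below assembles the method, path, and body for an update by query or delete by
query request that targets a data stream. It only builds the request and does
not send it. The helper name `by_query_request`, the `user.id` field, and the
example values are hypothetical.

[source,python]
----
import json


def by_query_request(data_stream, action, query, script=None):
    """Return (method, path, body) for an update or delete by query."""
    if action not in ("update", "delete"):
        raise ValueError("action must be 'update' or 'delete'")
    # Both APIs select the affected documents with a query.
    body = {"query": query}
    if action == "update":
        # In this sketch, an update must say how to change matching documents.
        if script is None:
            raise ValueError("an update by query needs a script")
        body["script"] = script
    return "POST", f"/{data_stream}/_{action}_by_query", body


method, path, body = by_query_request(
    "web_server_logs",
    "update",
    query={"match": {"user.id": "l7gk7f82"}},  # hypothetical field and value
    script={
        "source": "ctx._source.user.id = params.new_id",
        "params": {"new_id": "XgdX0NoX"},  # hypothetical replacement value
    },
)
print(method, path)
print(json.dumps(body, indent=2))
----

The request targets the stream name rather than an individual backing index,
so the by-query API is responsible for finding the matching documents.
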
If needed, you can update or delete a document by submitting requests to the
backing index containing the document. See
<<update-delete-docs-in-a-backing-index>>.

TIP: If you frequently update or delete existing documents,
we recommend using an <<indices-add-alias,index alias>> and
<<indices-templates,index template>> instead of a data stream. You can still
use <<index-lifecycle-management,{ilm-init}>> to manage indices for the alias.

include::set-up-a-data-stream.asciidoc[]
include::use-a-data-stream.asciidoc[]
include::change-mappings-and-settings.asciidoc[]