[role="xpack"]
[[data-tiers]]
== Data tiers

A _data tier_ is a collection of <<modules-node,nodes>> within a cluster that share the same
<<node-roles,data node role>>, and a hardware profile that's appropriately sized for the role. Elastic recommends that nodes in the same tier share the same
hardware profile to avoid <<hotspotting,hot spotting>>.

The data tiers that you use, and the way that you use them, depend on the data's <<data-management,category>>.

The following data tiers can be used with each data category:

Content data:

* <<content-tier,Content tier>> nodes handle the indexing and query load for non-time-series
indices, such as a product catalog.

Time series data:

* <<hot-tier,Hot tier>> nodes handle the indexing load for time series data,
such as logs or metrics. They hold your most recent, most frequently accessed data.
* <<warm-tier,Warm tier>> nodes hold time series data that is accessed less frequently
and rarely needs to be updated.
* <<cold-tier,Cold tier>> nodes hold time series data that is accessed
infrequently and not normally updated. To save space, you can keep
<<fully-mounted,fully mounted indices>> of
<<ilm-searchable-snapshot,{search-snaps}>> on the cold tier. These fully mounted
indices eliminate the need for replicas, reducing required disk space by
approximately 50% compared to regular indices.
* <<frozen-tier,Frozen tier>> nodes hold time series data that is accessed
rarely and never updated. The frozen tier stores <<partially-mounted,partially
mounted indices>> of <<ilm-searchable-snapshot,{search-snaps}>> exclusively.
This extends the storage capacity even further, by up to 20 times compared to
the warm tier.

TIP: The performance of an {es} node is often limited by the performance of the underlying storage and hardware profile.
For examples of hardware profiles, refer to Elastic Cloud's {cloud}/ec-reference-hardware.html[instance configurations].
Review our recommendations for optimizing your storage for <<indexing-use-faster-hardware,indexing>> and <<search-use-faster-hardware,search>>.

IMPORTANT: {es} generally expects nodes within a data tier to share the same
hardware profile. Variations not following this recommendation should be
carefully architected to avoid <<hotspotting,hot spotting>>.

The way data tiers are used often depends on the data's category:

- Content data remains on the <<content-tier,content tier>> for its entire
data lifecycle.

- Time series data may progress through the
descending temperature data tiers (hot, warm, cold, and frozen) according to your
performance, resiliency, and data retention requirements.
+
You can automate these lifecycle transitions using the <<data-streams,data stream lifecycle>>, or custom <<index-lifecycle-management,{ilm}>>.

[discrete]
[[available-tier]]
=== Available data tiers

Learn more about each data tier, including when and how it should be used.

[discrete]
[[content-tier]]
==== Content tier

// tag::content-tier[]
Data stored in the content tier is generally a collection of items such as a product catalog or article archive.
Unlike time series data, the value of the content remains relatively constant over time,
so it doesn't make sense to move it to a tier with different performance characteristics as it ages.
Content data typically has long data retention requirements, and you want to be able to retrieve
items quickly regardless of how old they are.

Content tier nodes are usually optimized for query performance. They prioritize processing power over IO throughput
so they can process complex searches and aggregations and return results quickly.
While they are also responsible for indexing, content data is generally not ingested at as high a rate
as time series data such as logs and metrics. From a resiliency perspective, the indices in this
tier should be configured to use one or more replicas.

The content tier is required and is often deployed within the same node
grouping as the hot tier. System indices and other indices that aren't part
of a data stream are automatically allocated to the content tier.
// end::content-tier[]

[discrete]
[[hot-tier]]
==== Hot tier

// tag::hot-tier[]
The hot tier is the {es} entry point for time series data and holds your most recent,
most frequently searched time series data.
Nodes in the hot tier need to be fast for both reads and writes,
which requires more hardware resources and faster storage (SSDs).
For resiliency, indices in the hot tier should be configured to use one or more replicas.

The hot tier is required. New indices that are part of a <<data-streams,
data stream>> are automatically allocated to the hot tier.
// end::hot-tier[]

[discrete]
[[warm-tier]]
==== Warm tier

// tag::warm-tier[]
Time series data can move to the warm tier once it is being queried less frequently
than the recently indexed data in the hot tier.
The warm tier typically holds data from recent weeks.
Updates are still allowed, but likely infrequent.
Nodes in the warm tier generally don't need to be as fast as those in the hot tier.
For resiliency, indices in the warm tier should be configured to use one or more replicas.
// end::warm-tier[]

[discrete]
[[cold-tier]]
==== Cold tier

// tag::cold-tier[]
When you no longer need to search time series data regularly, it can move from
the warm tier to the cold tier. While still searchable, this tier is typically
optimized for lower storage costs rather than search speed.

For better storage savings, you can keep <<fully-mounted,fully mounted indices>>
of <<ilm-searchable-snapshot,{search-snaps}>> on the cold tier. Unlike regular
indices, these fully mounted indices don't require replicas for reliability. In
the event of a failure, they can recover data from the underlying snapshot
instead. This potentially halves the local storage needed for the data. A
snapshot repository is required to use fully mounted indices in the cold tier.
Fully mounted indices are read-only.

Alternatively, you can use the cold tier to store regular indices with replicas instead
of using {search-snaps}. This lets you store older data on less expensive hardware
but doesn't reduce required disk space compared to the warm tier.
// end::cold-tier[]

[discrete]
[[frozen-tier]]
==== Frozen tier

// tag::frozen-tier[]
Once data is no longer being queried, or being queried rarely, it may move from
the cold tier to the frozen tier where it stays for the rest of its life.

The frozen tier requires a snapshot repository.
The frozen tier uses <<partially-mounted,partially mounted indices>> to store
and load data from a snapshot repository. This reduces local storage and
operating costs while still letting you search frozen data. Because {es} must
sometimes fetch frozen data from the snapshot repository, searches on the frozen
tier are typically slower than on the cold tier.
// end::frozen-tier[]

[discrete]
[[configure-data-tiers]]
=== Configure data tiers

Follow the instructions for your deployment type to configure data tiers.

[discrete]
[[configure-data-tiers-cloud]]
==== {ess} or {ece}

The default configuration for an {ecloud} deployment includes a shared tier for
hot and content data. This tier is required and can't be removed.

To add a warm, cold, or frozen tier when you create a deployment:

. On the **Create deployment** page, click **Advanced Settings**.

. Click **+ Add capacity** for any data tiers to add.

. Click **Create deployment** at the bottom of the page to save your changes.

[role="screenshot"]
image::images/data-tiers/ess-advanced-config-data-tiers.png[{ecloud}'s deployment Advanced configuration page,align=center]

To add a data tier to an existing deployment:

. Log in to the {ess-console}[{ecloud} console].

. On the **Deployments** page, select your deployment.

. In your deployment menu, select **Edit**.

. Click **+ Add capacity** for any data tiers to add.

. Click **Save** at the bottom of the page to save your changes.

To remove a data tier, refer to {cloud}/ec-disable-data-tier.html[Disable a data
tier].

[discrete]
[[configure-data-tiers-on-premise]]
==== Self-managed deployments

For self-managed deployments, each node's <<data-node-role,data role>> is configured
in `elasticsearch.yml`. For example, the highest-performance nodes in a cluster
might be assigned to both the hot and content tiers:

[source,yaml]
----
node.roles: ["data_hot", "data_content"]
----

NOTE: We recommend you use <<data-frozen-node,dedicated nodes>> in the frozen
tier.

[discrete]
[[data-tier-allocation]]
=== Data tier index allocation

The <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>> setting determines which tier the index should be allocated to.

When you create an index, by default {es} sets the `_tier_preference`
to `data_content` to automatically allocate the index shards to the content tier.

When {es} creates an index as part of a <<data-streams, data stream>>,
by default {es} sets the `_tier_preference`
to `data_hot` to automatically allocate the index shards to the hot tier.

At the time of index creation, you can override the default setting by explicitly setting
the preferred value in one of two ways:

- Using an <<index-templates,index template>>. Refer to <<getting-started-index-lifecycle-management,Automate rollover with ILM>> for details.
- Within the <<indices-create-index,create index>> request body.
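
As a minimal sketch of these two options, the following requests set an explicit tier preference at creation time. The template name `my-template`, the index pattern `my-index-*`, the index name `my-index-000002`, and the value `data_warm,data_hot` are placeholders chosen for illustration, not recommended defaults.

[source,console]
----
# Option 1: an index template applies the preference to matching new indices
PUT _index_template/my-template
{
  "index_patterns": ["my-index-*"],
  "template": {
    "settings": {
      "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
    }
  }
}

# Option 2: set the preference directly in the create index request body
PUT /my-index-000002
{
  "settings": {
    "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
  }
}
----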

You can override this
setting after index creation by <<indices-update-settings,updating the index setting>> to the preferred
value.
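
For an existing index, a request along these lines would change the preference. This is a sketch only: `my-index-000001` and the value `data_warm,data_hot` are assumed for illustration.

[source,console]
----
# Update the tier preference on an index that already exists
PUT /my-index-000001/_settings
{
  "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
}
----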

This setting also accepts multiple tiers in order of preference. This prevents indices from remaining unallocated if no nodes are available in the preferred tier. For example, when {ilm} migrates an index to the cold phase, it sets the index `_tier_preference` to `data_cold,data_warm,data_hot`.

To remove the data tier preference
setting, set the `_tier_preference` value to `null`. This allows the index to allocate to any data node within the cluster. Setting the `_tier_preference` to `null` does not restore the default value. Note that, in the case of managed indices, a <<ilm-migrate,migrate>> action might apply a new value in its place.
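
Removing the preference might look like the following sketch, where `my-index-000001` is a placeholder index name. As described above, this clears the setting rather than restoring the default.

[source,console]
----
# Clear the tier preference so the index can allocate to any data node
PUT /my-index-000001/_settings
{
  "index.routing.allocation.include._tier_preference": null
}
----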

[discrete]
[[data-tier-allocation-value]]
==== Determine the current data tier preference

You can check an existing index's data tier preference by <<indices-get-settings,polling its
settings>> for `index.routing.allocation.include._tier_preference`:

[source,console]
--------------------------------------------------
GET /my-index-000001/_settings?filter_path=*.settings.index.routing.allocation.include._tier_preference
--------------------------------------------------
// TEST[setup:my_index]

[discrete]
[[data-tier-allocation-troubleshooting]]
==== Troubleshooting

The `_tier_preference` setting might conflict with other allocation settings. This conflict might prevent the shard from allocating. A conflict might occur when a cluster has not yet been completely <<troubleshoot-migrate-to-tiers,migrated
to data tiers>>.

This setting will not unallocate a currently allocated shard, but might prevent it from migrating from its current location to its designated data tier. To troubleshoot, call the <<cluster-allocation-explain,cluster allocation explain API>> and specify the suspected problematic shard.
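
A request of roughly this shape asks {es} to explain the allocation of one shard. The index name, shard number `0`, and the choice of the primary copy are assumptions for illustration; substitute the shard you suspect.

[source,console]
----
# Explain why a specific shard is (or is not) allocated where it is
GET _cluster/allocation/explain
{
  "index": "my-index-000001",
  "shard": 0,
  "primary": true
}
----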

[discrete]
[[data-tier-migration]]
==== Automatic data tier migration

{ilm-init} automatically transitions managed
indices through the available data tiers using the <<ilm-migrate, migrate>> action.
By default, this action is automatically injected in every phase.
You can explicitly specify the migrate action with `"enabled": false` to <<ilm-disable-migrate-ex,disable automatic migration>>,
for example, if you're using the <<ilm-allocate, allocate action>> to manually
specify allocation rules.
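
The following policy is a sketch of that combination: it disables the migrate action in the warm phase and relies on an allocate action instead. The policy name `my-policy` and the custom node attribute `rack_id` with its values are hypothetical and would need to match attributes actually defined on your nodes.

[source,console]
----
PUT _ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "migrate": {
            "enabled": false
          },
          "allocate": {
            "include": {
              "rack_id": "one,two"
            }
          }
        }
      }
    }
  }
}
----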