
[role="xpack"]
[[data-tiers]]
== Data tiers

A _data tier_ is a collection of nodes with the same data role that
typically share the same hardware profile:

* <<content-tier, Content tier>> nodes handle the indexing and query load for content such as a product catalog.
* <<hot-tier, Hot tier>> nodes handle the indexing load for time series data such as logs or metrics
and hold your most recent, most frequently accessed data.
* <<warm-tier, Warm tier>> nodes hold time series data that is accessed less frequently
and rarely needs to be updated.
* <<cold-tier, Cold tier>> nodes hold time series data that is accessed infrequently and not normally updated.
* <<frozen-tier, Frozen tier>> nodes hold time series data that is accessed rarely, never updated, and kept in searchable snapshots.

When you index documents directly to a specific index, they remain on content tier nodes indefinitely.

When you index documents to a data stream, they initially reside on hot tier nodes.
You can configure <<index-lifecycle-management, {ilm}>> ({ilm-init}) policies
to automatically transition your time series data through the hot, warm, and cold tiers
according to your performance, resiliency, and data retention requirements.
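
As a rough illustration of such a policy, the following Python sketch builds a request body for the {ilm-init} put lifecycle policy API (`PUT _ilm/policy/<name>`) without sending it.
The policy name, the ages, and the choice of actions in each phase are assumptions made for the example, not recommendations.

```python
import json

# Hypothetical policy name; the ages below are illustrative only.
POLICY_NAME = "timeseries-policy"

policy_body = {
    "policy": {
        "phases": {
            # Hot: roll over to a new backing index after 7 days.
            "hot": {"actions": {"rollover": {"max_age": "7d"}}},
            # Warm: entered 7 days after rollover; merge down to one segment.
            "warm": {
                "min_age": "7d",
                "actions": {"forcemerge": {"max_num_segments": 1}},
            },
            # Cold: entered 30 days after rollover; lower recovery priority.
            "cold": {
                "min_age": "30d",
                "actions": {"set_priority": {"priority": 0}},
            },
            # Delete: remove the index 90 days after rollover.
            "delete": {"min_age": "90d", "actions": {"delete": {}}},
        }
    }
}

request_line = f"PUT _ilm/policy/{POLICY_NAME}"
print(request_line)
print(json.dumps(policy_body, indent=2))
```

The sketch only prints the request. Sending it to a cluster requires an HTTP client or an {es} client library, which is left out here.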

A node's <<data-node, data role>> is configured in `elasticsearch.yml`.
For example, the highest-performance nodes in a cluster might be assigned to both the hot and content tiers:

[source,yaml]
--------------------------------------------------
node.roles: ["data_hot", "data_content"]
--------------------------------------------------
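
To make the relationship between roles and tiers concrete, here is a minimal Python sketch that maps a `node.roles` list to the tiers the node joins.
The helper `tiers_for` is hypothetical and only considers the five tier-specific roles. It deliberately ignores the generic `data` role and every non-data role.

```python
# Tier-specific data roles and the tier each one places a node in.
TIER_ROLES = {
    "data_content": "content",
    "data_hot": "hot",
    "data_warm": "warm",
    "data_cold": "cold",
    "data_frozen": "frozen",
}


def tiers_for(node_roles):
    """Return the tiers a node joins, in the order its roles are listed."""
    return [TIER_ROLES[role] for role in node_roles if role in TIER_ROLES]


print(tiers_for(["data_hot", "data_content"]))  # ['hot', 'content']
print(tiers_for(["master", "data_warm"]))       # ['warm']
```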

[discrete]
[[content-tier]]
=== Content tier

Data stored in the content tier is generally a collection of items such as a product catalog or article archive.
Unlike time series data, the value of the content remains relatively constant over time,
so it doesn't make sense to move it to a tier with different performance characteristics as it ages.
Content data typically has long data retention requirements, and you want to be able to retrieve
items quickly regardless of how old they are.

Content tier nodes are usually optimized for query performance: they prioritize processing power over IO throughput
so they can process complex searches and aggregations and return results quickly.
While they are also responsible for indexing, content data is generally not ingested at as high a rate
as time series data such as logs and metrics. For resiliency, indices in this
tier should be configured to use one or more replicas.

The content tier is required. System indices and other indices that aren't part
of a data stream are automatically allocated to the content tier.

[discrete]
[[hot-tier]]
=== Hot tier

The hot tier is the {es} entry point for time series data and holds your most recent,
most frequently searched time series data.
Nodes in the hot tier need to be fast for both reads and writes,
which requires more hardware resources and faster storage (SSDs).
For resiliency, indices in the hot tier should be configured to use one or more replicas.

The hot tier is required. New indices that are part of a <<data-streams,
data stream>> are automatically allocated to the hot tier.
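
The following Python sketch shows one way a data stream's backing indices could be given a replica: it builds a body for the put index template API (`PUT _index_template/<name>`) without sending it.
The template name, index pattern, and priority are assumptions chosen for the example.

```python
import json

# Hypothetical template name and index pattern.
TEMPLATE_NAME = "my-data-stream-template"

template_body = {
    "index_patterns": ["my-data-stream*"],
    # An empty data_stream object marks this as a data stream template.
    "data_stream": {},
    # Illustrative priority, so this template wins over lower-priority matches.
    "priority": 500,
    "template": {
        "settings": {
            # One replica per primary shard, for resiliency on the hot tier.
            "index.number_of_replicas": 1
        }
    },
}

request_line = f"PUT _index_template/{TEMPLATE_NAME}"
print(request_line)
print(json.dumps(template_body, indent=2))
```

The template sets no tier preference, so backing indices created from it keep the default allocation to the hot tier.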

[discrete]
[[warm-tier]]
=== Warm tier

Time series data can move to the warm tier once it is being queried less frequently
than the recently indexed data in the hot tier.
The warm tier typically holds data from recent weeks.
Updates are still allowed, but are likely to be infrequent.
Nodes in the warm tier generally don't need to be as fast as those in the hot tier.
For resiliency, indices in the warm tier should be configured to use one or more replicas.

[discrete]
[[cold-tier]]
=== Cold tier

When data is no longer being updated, it can move from the warm tier to the cold
tier, where it stays while being queried infrequently. As data transitions into
the cold tier, it can be compressed and shrunk. Instead of replicas, the
cold tier can rely on <<fully-mounted,fully mounted indices>> of
<<ilm-searchable-snapshot,{search-snaps}>> for resiliency. The cold tier is
still a responsive query tier, but data becomes read-only.

[discrete]
[[frozen-tier]]
=== Frozen tier

Once data is queried rarely or not at all, it may move from
the cold tier to the frozen tier, where it stays for the rest of its life.

The frozen tier uses <<partially-mounted,partially mounted indices>> to store
and load data from a snapshot repository. This reduces local storage and
operating costs while still letting you search frozen data. Because {es} must
sometimes fetch frozen data from the snapshot repository, searches on the frozen
tier are typically slower than on the cold tier.

NOTE: We recommend you use <<data-frozen-node,dedicated nodes>> in the frozen
tier.
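
One way to back the cold and frozen tiers with {search-snaps} is the {ilm-init} `searchable_snapshot` action, which mounts a fully mounted index in the cold phase and a partially mounted index in the frozen phase.
The Python sketch below builds the two phase definitions. The repository name, the ages, and the helper `snapshot_phase` are assumptions for illustration.

```python
import json

# Hypothetical snapshot repository; it must already be registered in the cluster.
REPOSITORY = "my-snapshot-repo"


def snapshot_phase(min_age, repository):
    """Build an ILM phase that mounts the index as a searchable snapshot."""
    return {
        "min_age": min_age,
        "actions": {
            "searchable_snapshot": {"snapshot_repository": repository}
        },
    }


# Illustrative ages; tune them to your own retention requirements.
phases = {
    "cold": snapshot_phase("30d", REPOSITORY),
    "frozen": snapshot_phase("90d", REPOSITORY),
}

print(json.dumps({"policy": {"phases": phases}}, indent=2))
```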

[discrete]
[[data-tier-allocation]]
=== Data tier index allocation

When you create an index, by default {es} sets
<<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
to `data_content` to automatically allocate the index shards to the content tier.

When {es} creates an index as part of a <<data-streams, data stream>>,
by default {es} sets
<<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
to `data_hot` to automatically allocate the index shards to the hot tier.

You can explicitly set `index.routing.allocation.include._tier_preference`
to opt out of the default tier-based allocation.
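
The tier preference can also be a comma-separated list, in which case the index is assigned to the first listed tier that has an available node.
The Python sketch below models that rule. `resolve_tier` is a hypothetical helper that simplifies the behavior, not the {es} implementation, and the settings body shows an illustrative explicit value.

```python
def resolve_tier(tier_preference, tiers_with_nodes):
    """Return the first preferred tier that has an available node, else None."""
    for tier in tier_preference.split(","):
        tier = tier.strip()
        if tier in tiers_with_nodes:
            return tier
    return None


# Settings body that sets the preference explicitly (illustrative value).
settings_body = {
    "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
}

preference = settings_body["index.routing.allocation.include._tier_preference"]

# No warm nodes in this cluster, so the index falls back to the hot tier.
print(resolve_tier(preference, {"data_hot", "data_content"}))  # data_hot
# Warm nodes exist, so the first preference is honored.
print(resolve_tier(preference, {"data_hot", "data_warm"}))     # data_warm
```

Listing a fallback tier prevents an index from remaining unallocated when the preferred tier has no nodes.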

[discrete]
[[data-tier-migration]]
=== Automatic data tier migration

{ilm-init} automatically transitions managed
indices through the available data tiers using the <<ilm-migrate, migrate>> action.
By default, this action is automatically injected in every phase.
To disable automatic migration, explicitly include the migrate action with `"enabled": false`,
for example, if you're using the <<ilm-allocate, allocate action>> to manually
specify allocation rules.
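
As a sketch of that combination, the Python snippet below builds a warm phase that disables the migrate action and uses the allocate action instead.
The `rack_id` node attribute and its values are hypothetical. They stand in for whatever custom attribute your nodes define.

```python
import json

warm_phase = {
    "actions": {
        # Turn off the automatically injected tier migration for this phase.
        "migrate": {"enabled": False},
        # Allocate by a custom node attribute instead (hypothetical attribute).
        "allocate": {"include": {"rack_id": "one,two"}},
    }
}

policy_body = {"policy": {"phases": {"warm": warm_phase}}}

# json.dumps renders Python's False as JSON false.
print(json.dumps(policy_body, indent=2))
```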