- [role="xpack"]
- [[data-tiers]]
- == Data tiers
- A _data tier_ is a collection of nodes with the same data role that
- typically share the same hardware profile:
- * <<content-tier, Content tier>> nodes handle the indexing and query load for content such as a product catalog.
- * <<hot-tier, Hot tier>> nodes handle the indexing load for time series data such as logs or metrics
- and hold your most recent, most frequently accessed data.
- * <<warm-tier, Warm tier>> nodes hold time series data that is accessed less frequently
- and rarely needs to be updated.
- * <<cold-tier,Cold tier>> nodes hold time series data that is accessed
- infrequently and not normally updated. To save space, you can keep
- <<fully-mounted,fully mounted indices>> of
- <<ilm-searchable-snapshot,{search-snaps}>> on the cold tier. These fully mounted
- indices eliminate the need for replicas, reducing required disk space by
- approximately 50% compared to regular indices.
- * <<frozen-tier, Frozen tier>> nodes hold time series data that is accessed
- rarely and never updated. The frozen tier stores <<partially-mounted,partially
- mounted indices>> of <<ilm-searchable-snapshot,{search-snaps}>> exclusively.
- This extends the storage capacity even further, by up to 20 times compared to
- the warm tier.
- IMPORTANT: {es} generally expects nodes within a data tier to share the same
- hardware profile. Variations not following this recommendation should be
- carefully architected to avoid <<hotspotting,hot spotting>>.
- When you index documents directly to a specific index, they remain on content tier nodes indefinitely.
- When you index documents to a data stream, they initially reside on hot tier nodes.
- You can configure <<index-lifecycle-management, {ilm}>> ({ilm-init}) policies
- to automatically transition your time series data through the hot, warm, and cold tiers
- according to your performance, resiliency, and data retention requirements.
- [discrete]
- [[content-tier]]
- === Content tier
- // tag::content-tier[]
- Data stored in the content tier is generally a collection of items such as a product catalog or article archive.
- Unlike time series data, the value of the content remains relatively constant over time,
- so it doesn't make sense to move it to a tier with different performance characteristics as it ages.
- Content data typically has long data retention requirements, and you want to be able to retrieve
- items quickly regardless of how old they are.
- Content tier nodes are usually optimized for query performance--they prioritize processing power over IO throughput
- so they can process complex searches and aggregations and return results quickly.
- While they are also responsible for indexing, content data is generally not ingested at as high a rate
- as time series data such as logs and metrics. From a resiliency perspective, the indices in this
- tier should be configured to use one or more replicas.
- The content tier is required. System indices and other indices that aren't part
- of a data stream are automatically allocated to the content tier.
- // end::content-tier[]
- [discrete]
- [[hot-tier]]
- === Hot tier
- // tag::hot-tier[]
- The hot tier is the {es} entry point for time series data and holds your most recent,
- most frequently searched time series data.
- Nodes in the hot tier need to be fast for both reads and writes,
- which requires more hardware resources and faster storage (SSDs).
- For resiliency, indices in the hot tier should be configured to use one or more replicas.
- The hot tier is required. New indices that are part of a <<data-streams,
- data stream>> are automatically allocated to the hot tier.
- // end::hot-tier[]
- [discrete]
- [[warm-tier]]
- === Warm tier
- // tag::warm-tier[]
- Time series data can move to the warm tier once it is being queried less frequently
- than the recently indexed data in the hot tier.
- The warm tier typically holds data from recent weeks.
- Updates are still allowed, but likely infrequent.
- Nodes in the warm tier generally don't need to be as fast as those in the hot tier.
- For resiliency, indices in the warm tier should be configured to use one or more replicas.
- // end::warm-tier[]
- [discrete]
- [[cold-tier]]
- === Cold tier
- // tag::cold-tier[]
- When you no longer need to search time series data regularly, it can move from
- the warm tier to the cold tier. While still searchable, this tier is typically
- optimized for lower storage costs rather than search speed.
- For better storage savings, you can keep <<fully-mounted,fully mounted indices>>
- of <<ilm-searchable-snapshot,{search-snaps}>> on the cold tier. Unlike regular
- indices, these fully mounted indices don't require replicas for reliability. In
- the event of a failure, they can recover data from the underlying snapshot
- instead. This potentially halves the local storage needed for the data. A
- snapshot repository is required to use fully mounted indices in the cold tier.
- Fully mounted indices are read-only.
- Alternatively, you can use the cold tier to store regular indices with replicas instead
- of using {search-snaps}. This lets you store older data on less expensive hardware
- but doesn't reduce required disk space compared to the warm tier.
- // end::cold-tier[]
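The storage claim for the cold tier can be checked with simple arithmetic. The following is a minimal sketch, not an {es} API: the function name `local_storage_gb` and the 500 GB figure are hypothetical, and the model assumes a fully mounted index keeps exactly one local copy while a regular index keeps the primary plus each replica. It ignores the space the snapshot itself occupies in the repository.

```python
def local_storage_gb(primary_gb: float, replicas: int, fully_mounted: bool = False) -> float:
    """Estimate the local disk used by one index, in GB (illustrative model only)."""
    if primary_gb < 0 or replicas < 0:
        raise ValueError("primary_gb and replicas must be non-negative")
    # Assumption: a fully mounted index needs no replicas, because lost
    # shards can be recovered from the underlying snapshot instead.
    copies = 1 if fully_mounted else 1 + replicas
    return primary_gb * copies


regular = local_storage_gb(500, replicas=1)
mounted = local_storage_gb(500, replicas=1, fully_mounted=True)
savings = 1 - mounted / regular
print(f"{regular:.0f} GB -> {mounted:.0f} GB ({savings:.0%} less local disk)")
# prints: 1000 GB -> 500 GB (50% less local disk)
```

With a single replica the saving is one half, which matches the "potentially halves" wording above. With more replicas configured on the regular index, the same model gives a larger relative saving.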
- [discrete]
- [[frozen-tier]]
- === Frozen tier
- // tag::frozen-tier[]
- Once data is no longer being queried, or is queried only rarely, it may move from
- the cold tier to the frozen tier, where it stays for the rest of its life.
- The frozen tier requires a snapshot repository.
- The frozen tier uses <<partially-mounted,partially mounted indices>> to store
- and load data from a snapshot repository. This reduces local storage and
- operating costs while still letting you search frozen data. Because {es} must
- sometimes fetch frozen data from the snapshot repository, searches on the frozen
- tier are typically slower than on the cold tier.
- // end::frozen-tier[]
- [discrete]
- [[configure-data-tiers-cloud]]
- === Configure data tiers on {ess} or {ece}
- The default configuration for an {ecloud} deployment includes a shared tier for
- hot and content data. This tier is required and can't be removed.
- To add a warm, cold, or frozen tier when you create a deployment:
- . On the **Create deployment** page, click **Advanced Settings**.
- . Click **+ Add capacity** for each data tier you want to add.
- . Click **Create deployment** at the bottom of the page to save your changes.
- [role="screenshot"]
- image::images/data-tiers/ess-advanced-config-data-tiers.png[{ecloud}'s deployment Advanced configuration page,align=center]
- To add a data tier to an existing deployment:
- . Log in to the {ess-console}[{ecloud} console].
- . On the **Deployments** page, select your deployment.
- . In your deployment menu, select **Edit**.
- . Click **+ Add capacity** for each data tier you want to add.
- . Click **Save** at the bottom of the page to save your changes.
- To remove a data tier, refer to {cloud}/ec-disable-data-tier.html[Disable a data
- tier].
- [discrete]
- [[configure-data-tiers-on-premise]]
- === Configure data tiers for self-managed deployments
- For self-managed deployments, each node's <<data-node,data role>> is configured
- in `elasticsearch.yml`. For example, the highest-performance nodes in a cluster
- might be assigned to both the hot and content tiers:
- [source,yaml]
- ----
- node.roles: ["data_hot", "data_content"]
- ----
- NOTE: We recommend you use <<data-frozen-node,dedicated nodes>> in the frozen
- tier.
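To make the relationship between `node.roles` and tiers concrete, here is a small sketch that maps the tier-specific data roles to tier names and checks the dedicated-frozen recommendation. The helper names `tiers_for` and `frozen_is_dedicated` are hypothetical and not part of {es}. The sketch covers only the five tier-specific roles and deliberately ignores the generic `data` role and non-data roles.

```python
# Tier-specific data roles and the tier each one assigns a node to.
TIER_ROLES = {
    "data_content": "content",
    "data_hot": "hot",
    "data_warm": "warm",
    "data_cold": "cold",
    "data_frozen": "frozen",
}


def tiers_for(node_roles):
    """Return the tiers a node belongs to, in the order its roles are listed."""
    return [TIER_ROLES[role] for role in node_roles if role in TIER_ROLES]


def frozen_is_dedicated(node_roles):
    """True unless the node mixes the frozen tier with another data tier."""
    tiers = tiers_for(node_roles)
    return "frozen" not in tiers or tiers == ["frozen"]


print(tiers_for(["data_hot", "data_content"]))            # ['hot', 'content']
print(frozen_is_dedicated(["data_frozen"]))                # True
print(frozen_is_dedicated(["data_frozen", "data_cold"]))   # False
```

A check like this could run over a planned cluster layout before the `elasticsearch.yml` files are written. It is a planning aid under the stated assumptions, not a substitute for the node role documentation.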
- [discrete]
- [[data-tier-allocation]]
- === Data tier index allocation
- When you create an index, by default {es} sets
- <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
- to `data_content` to automatically allocate the index shards to the content tier.
- When {es} creates an index as part of a <<data-streams, data stream>>,
- by default {es} sets
- <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
- to `data_hot` to automatically allocate the index shards to the hot tier.
- You can explicitly set `index.routing.allocation.include._tier_preference`
- to opt out of the default tier-based allocation.
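As an illustration of setting the tier preference explicitly, the sketch below builds the settings body you might send when creating an index or updating its settings. The helper `tier_preference_settings` is hypothetical. It assumes the setting takes a comma-separated list of tier roles in order of preference, and it only constructs the JSON body without contacting a cluster.

```python
import json

TIER_PREFERENCE_SETTING = "index.routing.allocation.include._tier_preference"
KNOWN_TIER_ROLES = {"data_content", "data_hot", "data_warm", "data_cold", "data_frozen"}


def tier_preference_settings(*tiers: str) -> dict:
    """Build an index settings body that lists tiers in order of preference."""
    if not tiers:
        raise ValueError("at least one tier is required")
    unknown = [tier for tier in tiers if tier not in KNOWN_TIER_ROLES]
    if unknown:
        raise ValueError(f"unknown tier role(s): {unknown}")
    # Earlier entries are preferred; later entries act as fallbacks.
    return {TIER_PREFERENCE_SETTING: ",".join(tiers)}


body = tier_preference_settings("data_warm", "data_hot")
print(json.dumps(body, indent=2))
```

In this example the index would prefer warm tier nodes and fall back to hot tier nodes, assuming the cluster interprets the list as described above.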
- [discrete]
- [[data-tier-migration]]
- === Automatic data tier migration
- {ilm-init} automatically transitions managed
- indices through the available data tiers using the <<ilm-migrate, migrate>> action.
- By default, this action is automatically injected in every phase.
- You can explicitly specify the migrate action with `"enabled": false` to disable automatic migration,
- for example, if you're using the <<ilm-allocate, allocate action>> to manually
- specify allocation rules.
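The following sketch shows what a policy body with automatic migration disabled might look like. It builds a warm phase that turns off the migrate action and uses the allocate action's `include` rules instead. The function name `warm_phase_without_migrate` and the custom node attribute `rack_id` are hypothetical, and the body is only constructed here, not sent to a cluster.

```python
import json


def warm_phase_without_migrate(include_attrs: dict) -> dict:
    """Build an ILM policy body whose warm phase disables automatic migration."""
    if not include_attrs:
        raise ValueError("provide at least one allocation attribute")
    return {
        "policy": {
            "phases": {
                "warm": {
                    "actions": {
                        # Turn off the migrate action that is otherwise injected.
                        "migrate": {"enabled": False},
                        # Manual allocation rules take its place.
                        "allocate": {"include": dict(include_attrs)},
                    }
                }
            }
        }
    }


policy = warm_phase_without_migrate({"rack_id": "one,two"})
print(json.dumps(policy, indent=2))
```

`json.dumps` serializes Python's `False` as `false`, so the output contains `"enabled": false` as the text above describes.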