@@ -1,64 +1,125 @@
[[indices-shrink-index]]
== Shrink Index

-The shrink index API allows to shrink an existing index into a new index with a single shard.
-In order to shrink an index, all its shards must be allocated on a single node in the cluster.
-This is required since the shrink command will copy all shards index files into the target index
-data folder when the primary of the target index is initially allocated.
+The shrink index API allows you to shrink an existing index into a new index
+with a single primary shard. Before shrinking, a (primary or replica) copy of
+every shard in the index must be present on the same node.

-When an index is shrunk no write operations should happen to the source index. Elasticsearch will
-enforce the `read-only` property when the shrink command is executed. All operations necessary to shrink the
-source index are executed during initial primary recovery. Once the target index primary shard is started the
-shrink operation has successfully finished. To monitor status and progress use <<cat-recovery>>
+Shrinking works as follows:

+* First, it creates a new target index with the same definition as the source
+  index, but with a single primary shard.

-To shrink and index all shards of that index must be allocated on a single node.
+* Then it hard-links segments from the source index into the target index. (If
+  the file system doesn't support hard-linking, then all segments are copied
+  into the new index, which is a much more time-consuming process.)
+
+* Finally, it recovers the target index as though it were a closed index which
+  had just been re-opened.
+
+[float]
+=== Preparing an index for shrinking
+
+In order to shrink an index, the index must be marked as read-only, and a
+(primary or replica) copy of every shard in the index must be relocated to the
+same node and have <<cluster-health,health>> `green`.
+
+These two conditions can be achieved with the following request:

[source,js]
--------------------------------------------------
-$ curl -XPUT 'http://localhost:9200/logs/_settings' -d '{
-  "settings" : {
-    "index.routing.allocation.require._name" : "shrink_node_name", <1>
-    "index.blocks.write" : true <2>
-  }
-}'
+PUT /my_source_index/_settings
+{
+  "settings": {
+    "index.routing.allocation.require._name": "shrink_node_name", <1>
+    "index.blocks.write": true <2>
+  }
+}
--------------------------------------------------
-<1> Forces the relocation of all of the indices shards to the node `shrink_node_name`
-<2> Prevents write operations to this index while still allowing metadata changes like deleting the index.
+<1> Forces the relocation of a copy of each shard to the node with name
+    `shrink_node_name`. See <<shard-allocation-filtering>> for more options.
+
+<2> Prevents write operations to this index while still allowing metadata
+    changes like deleting the index.

-The above second curl example shows how an index called `logs` can be
-forced to allocate at least one copy of each shard on a specific node in the cluster.
+It can take a while to relocate the source index. Progress can be tracked
+with the <<cat-recovery,`_cat recovery` API>>, or the <<cluster-health,
+`cluster health` API>> can be used to wait until all shards have relocated
+with the `wait_for_relocating_shards` parameter.
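+
+For example, the following request waits until no shards are relocating (the
+`0` value and the timeout shown here are illustrative):
+
+[source,js]
+--------------------------------------------------
+GET _cluster/health?wait_for_relocating_shards=0&timeout=60s
+--------------------------------------------------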

-The `_shrink` API is similar to <<indices-create-index>> and accepts `settings` and `aliases` for the target index.
+[float]
+=== Shrinking an index
+
+To shrink `my_source_index` into a new index called `my_target_index`, issue
+the following request:

[source,js]
--------------------------------------------------
-$ curl -XPUT 'http://localhost:9200/logs/_shrink/logs_single_shard' -d '{
-  "settings" : {
-    "index.codec" : "best_compression", <1>
-  }
-}'
+POST my_source_index/_shrink/my_target_index
--------------------------------------------------
-<1> Enables `best_compression` codec on the target index

-The API call above returns immediately once the target index is created but doesn't wait
-for the shrink operation to start. Once the target indices primary shard moves to state `initializing`
-the shrink operation has started.
+The above request returns immediately once the target index has been added to
+the cluster state -- it doesn't wait for the shrink operation to start.

-[float]
-[[shrink-index-limitations]]
-=== Limitations
+[IMPORTANT]
+=====================================

-Indices can only be shrunk into a single shard if they fully the following requirements:
+Indices can only be shrunk into a single shard if they satisfy the following requirements:

- * an instance of all of the indices shards must be allocated on a single node
- * the index must not contain more than `2.14 billion` documents (`2147483519`) in total (sum of all shards)
-   This is the maximum shard size elasticsearch can support.
- * the index must have more than one shard
- * the index must be `read-only`, ie. have a cluster block set `index.blocks.write=true`
* the target index must not exist
- * all `index.analysis.*` and `index.similarity.*` settings passed to the `_shrink` call will be overwritten with the
-   source indices settings.
- * if the target index can't be allocated on the shrink node, due to throttling or other allocation deciders,
-   its primary shard will stay `unassigned` until it can be allocated on that node
+
+* The index must have more than one primary shard.
+
+* The index must not contain more than `2,147,483,519` documents in total
+  across all shards as this is the maximum number of docs that can fit into a
+  single shard.
+
+* The node handling the shrink process must have sufficient free disk space to
+  accommodate a second copy of the existing index.
+
+=====================================
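+
+The total document count of the source index can be verified against the
+limit above with the count API; for example:
+
+[source,js]
+--------------------------------------------------
+GET my_source_index/_count
+--------------------------------------------------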
+
+The `_shrink` API is similar to the <<indices-create-index, `create index` API>>
+and accepts `settings` and `aliases` parameters for the target index:
+
+[source,js]
+--------------------------------------------------
+POST my_source_index/_shrink/my_target_index
+{
+  "settings": {
+    "index.number_of_replicas": 1,
+    "index.codec": "best_compression" <1>
+  },
+  "aliases": {
+    "my_search_indices": {}
+  }
+}
+--------------------------------------------------
+
+<1> Best compression will only take effect when new writes are made to the
+    index, such as when <<indices-forcemerge,force-merging>> the shard to a
+    single segment.
+
+NOTE: Mappings may not be specified in the `_shrink` request, and all
+`index.analysis.*` and `index.similarity.*` settings will be overwritten with
+the settings from the source index.
+
+[float]
+=== Monitoring the shrink process
+
+The shrink process can be monitored with the <<cat-recovery,`_cat recovery`
+API>>, or the <<cluster-health, `cluster health` API>> can be used to wait
+until all primary shards have been allocated by setting the `wait_for_status`
+parameter to `yellow`.
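+
+For example (the timeout value is illustrative):
+
+[source,js]
+--------------------------------------------------
+GET _cluster/health/my_target_index?wait_for_status=yellow&timeout=60s
+--------------------------------------------------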
+
+The `_shrink` API returns as soon as the target index has been added to the
+cluster state, before any shards have been allocated. At this point, all
+shards are in the state `unassigned`. If, for any reason, the target index
+can't be allocated on the shrink node, its primary shard will remain
+`unassigned` until it can be allocated on that node.
+
+Once the primary shard is allocated, it moves to state `initializing`, and the
+shrink process begins. When the shrink operation completes, the shard will
+become `active`. At that point, Elasticsearch will try to allocate any
+replicas and may decide to relocate the primary shard to another node.