浏览代码

[DOCS] document tail merging and create tutorial for migrating to DSL (#101117)

This documents tail merging, the enabled flag, and
adds a tutorial to migrate a data stream from ILM to DSL.
Andrei Dan 1 年之前
父节点
当前提交
74ea04fb2d

+ 6 - 0
docs/reference/data-streams/lifecycle/apis/put-lifecycle.asciidoc

@@ -53,6 +53,12 @@ Defaults to `open`.
 (Optional, string)
 If defined, every document added to this data stream will be stored at least for this time frame. Any time after this
 duration the document could be deleted. When empty, every document in this data stream will be stored indefinitely.
+
+`enabled`::
+(Optional, boolean)
+If defined, it turns data streqm lifecycle on/off (`true`/`false`) for this data stream.
+A data stream lifecycle that's disabled (`enabled: false`) will have no effect on the
+data stream. Defaults to `true`.
 ====
 
 [[data-streams-put-lifecycle-example]]

+ 12 - 2
docs/reference/data-streams/lifecycle/index.asciidoc

@@ -28,7 +28,14 @@ each data stream and performs the following steps:
 1. Checks if the data stream has a data stream lifecycle configured, skipping any indices not part of a managed data stream.
 2. Rolls over the write index of the data stream, if it fulfills the conditions defined by
 <<cluster-lifecycle-default-rollover,`cluster.lifecycle.default.rollover`>>.
-3. Applies retention to the remaining backing indices. This means deleting the backing indices whose
+3. After an index is not the write index anymore (i.e. the data stream has been rolled over),
+automatically tail merges the index. Data stream lifecycle executes a merge operation that only targets
+the long tail of small segments instead of the whole shard. As the segments are organised
+into tiers of exponential sizes, merging the long tail of small segments is only a 
+fraction of the cost of force mergeing to a single segment. The small segments would usually
+hold the most recent data so tail mergeing will focus the merging resources on the higher-value
+data that is most likely to keep being queried.
+4. Applies retention to the remaining backing indices. This means deleting the backing indices whose
 `generation_time` is longer than the configured retention period. The `generation_time` is only applicable to rolled over backing
 indices and it is either the time since the backing index got rolled over, or the time optionally configured in the
 <<index-data-stream-lifecycle-origination-date,`index.lifecycle.origination_date`>> setting.
@@ -37,7 +44,7 @@ IMPORTANT: We use the `generation_time` instead of the creation time because thi
 index have passed the retention period. As a result, the retention period is not the exact time data gets deleted, but
 the minimum time data will be stored.
 
-NOTE: The steps `2` and `3` apply only to backing indices that are not already managed by {ilm-init}, meaning that these indices either do
+NOTE: Steps `2-4` apply only to backing indices that are not already managed by {ilm-init}, meaning that these indices either do
 not have an {ilm-init} policy defined, or if they do, they have <<index-lifecycle-prefer-ilm,`index.lifecycle.prefer_ilm`>>
 set to `false`.
 
@@ -55,6 +62,7 @@ that matches the name of your data stream (see <<tutorial-manage-new-data-stream
 with the name of your data stream reaches {es} then the data stream will be created with the respective data stream lifecycle.
 * To update the lifecycle of an existing data stream you need to use the <<data-stream-lifecycle-api, data stream lifecycle APIs>>
 to edit the lifecycle on the data stream itself (see <<tutorial-manage-existing-data-stream>>).
+* Migrate an existing {ilm-init} managed data stream to Data stream lifecycle using <<tutorial-migrate-data-stream-from-ilm-to-dsl>>.
 
 NOTE: Updating the data stream lifecycle of an existing data stream is different from updating the settings or the mapping,
 because it is applied on the data stream level and not on the individual backing indices.
@@ -62,3 +70,5 @@ because it is applied on the data stream level and not on the individual backing
 include::tutorial-manage-new-data-stream.asciidoc[]
 
 include::tutorial-manage-existing-data-stream.asciidoc[]
+
+include::tutorial-migrate-data-stream-from-ilm-to-dsl.asciidoc[]

+ 460 - 0
docs/reference/data-streams/lifecycle/tutorial-migrate-data-stream-from-ilm-to-dsl.asciidoc

@@ -0,0 +1,460 @@
+[role="xpack"]
+[[tutorial-migrate-data-stream-from-ilm-to-dsl]]
+=== Tutorial: Migrate ILM managed data stream to Data stream lifecycle 
+
+preview:[]
+
+In this tutorial we'll look at migrating an existing data stream from {ilm-init} to 
+Data stream lifecycle. The existing {ilm-init} managed backing indices will continue 
+to be managed by {ilm-init} until they age out and get deleted by {ilm-init}; however,
+the new backing indices will be managed by Data stream lifecycle. 
+This way, a data stream is gradually migrated away from being managed by {ilm-cap} to 
+being managed by Data stream lifecycle. As we'll see, {ilm-cap} and Data stream lifecycle
+can co-manage a data stream; however, an index can only be managed by one system at 
+a time.
+
+Let's first create a data stream with two backing indices managed by {ilm-cap}.
+We first create an {ilm-cap} policy:
+
+[source,console]
+----
+PUT _ilm/policy/pre-dsl-ilm-policy
+{
+  "policy": {
+    "phases": {
+      "hot": {
+        "actions": {
+          "rollover": {
+            "max_primary_shard_size": "50gb"
+          }
+        }
+      },
+      "delete": {
+        "min_age": "7d",
+        "actions": {
+          "delete": {}
+        }
+      }
+    }
+  }
+}
+----
+
+And let's create an index template that'll back the data stream and configures {ilm-cap}:
+
+[source,console]
+----
+PUT _index_template/dsl-data-stream-template
+{
+  "index_patterns": ["dsl-data-stream*"],
+  "data_stream": { },
+  "priority": 500,
+  "template": {
+    "settings": {
+      "index.lifecycle.name": "pre-dsl-ilm-policy"
+    }
+  }
+}
+----
+// TEST[continued]
+
+We'll now index a document targetting `dsl-data-stream` to create the data stream
+and we'll also manually rollover the data stream to have another generation index created:
+
+[source,console]
+----
+POST dsl-data-stream/_doc?
+{
+  "@timestamp": "2023-10-18T16:21:15.000Z",
+  "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"
+}
+----
+// TEST[continued]
+
+[source,console]
+----
+POST dsl-data-stream/_rollover
+----
+// TEST[continued]
+
+We'll use the <<indices-get-data-stream, GET _data_stream>> API to inspect the state of 
+the data stream:
+
+[source,console]
+--------------------------------------------------
+GET _data_stream/dsl-data-stream
+--------------------------------------------------
+// TEST[continued]
+
+Inspecting the response we'll see that both backing indices are managed by {ilm-init}
+and that the next generation index will also be managed by {ilm-init}: 
+
+[source,console-result]
+----
+{
+  "data_streams": [
+    {
+      "name": "dsl-data-stream",
+      "timestamp_field": {
+        "name": "@timestamp"
+      },
+      "indices": [
+        {
+          "index_name": ".ds-dsl-data-stream-2023.10.19-000001",    <1> 
+          "index_uuid": "xCEhwsp8Tey0-FLNFYVwSg",
+          "prefer_ilm": true,                                       <2>
+          "ilm_policy": "pre-dsl-ilm-policy",                       <3>
+          "managed_by": "Index Lifecycle Management"                <4>
+        },
+        {
+          "index_name": ".ds-dsl-data-stream-2023.10.19-000002",
+          "index_uuid": "PA_JquKGSiKcAKBA8DJ5gw",
+          "prefer_ilm": true,
+          "ilm_policy": "pre-dsl-ilm-policy",
+          "managed_by": "Index Lifecycle Management"
+        }
+      ],
+      "generation": 2,
+      "status": "GREEN",
+      "template": "dsl-data-stream-template",
+      "next_generation_managed_by": "Index Lifecycle Management",   <5>
+      "prefer_ilm": true,                                           <6>
+      "ilm_policy": "pre-dsl-ilm-policy",                           <7>
+      "hidden": false,
+      "system": false,
+      "allow_custom_routing": false,
+      "replicated": false
+    }
+  ]
+}
+----
+// TESTRESPONSE[s/"index_name": ".ds-dsl-data-stream-2023.10.19-000001"/"index_name": $body.data_streams.0.indices.0.index_name/]
+// TESTRESPONSE[s/"index_uuid": "xCEhwsp8Tey0-FLNFYVwSg"/"index_uuid": $body.data_streams.0.indices.0.index_uuid/]
+// TESTRESPONSE[s/"index_name": ".ds-dsl-data-stream-2023.10.19-000002"/"index_name": $body.data_streams.0.indices.1.index_name/]
+// TESTRESPONSE[s/"index_uuid": "PA_JquKGSiKcAKBA8DJ5gw"/"index_uuid": $body.data_streams.0.indices.1.index_uuid/]
+// TESTRESPONSE[s/"status": "GREEN"/"status": "YELLOW"/]
+
+<1> The name of the backing index.
+<2> For each backing index we display the value of the <<index-lifecycle-prefer-ilm, prefer_ilm>> 
+configuration which will indicate if {ilm-init} takes precedence over data stream lifecycle in case
+both systems are configured for an index.
+<3> The {ilm-ini} policy configured for this index.
+<4> The system that manages this index (possible values are "Index Lifecycle Management", 
+"Data stream lifecycle", or "Unmanaged")
+<5> The system that will manage the next generation index (the new write index of this 
+data stream, once the data stream is rolled over). The possible values are 
+"Index Lifecycle Management", "Data stream lifecycle", or "Unmanaged".
+<6> The <<index-lifecycle-prefer-ilm, prefer_ilm>> value configured in the index template
+that's backing the data stream. This value will be configured for all the new backing indices.
+If it's not configured in the index template the backing indices will receive the `true`
+default value ({ilm-init} takes precedence over data stream lifecycle by default as it's 
+currently richer in features).
+<7> The {ilm-init} policy configured in the index template that's backing this data 
+stream (which will be configured on all the new backing indices, as long as it exists 
+in the index template).
+
+To migrate the `dsl-data-stream` to data stream lifecycle we'll have to execute
+two steps:
+
+1. Update the index template that's backing the index template to configure <<index-lifecycle-prefer-ilm, prefer_ilm>> 
+to `false`, and to configure data stream lifecycle. 
+2. Configure the data stream lifecycle for the _existing_ `dsl-data-stream` using
+the <<data-streams-put-lifecycle, lifecycle API>>.
+
+IMPORTANT: The data stream lifecycle configuration that's added to the index template,
+being a data stream configuration, will only apply to **new** data streams. 
+Our data stream exists already, so even though we added a data stream lifecycle
+configuration in the index template it will not be applied to `dsl-data-stream`.
+
+
+[[update-index-template-for-dsl]]
+Let's update the index template:
+
+[source,console]
+----
+PUT _index_template/dsl-data-stream-template
+{
+  "index_patterns": ["dsl-data-stream*"],
+  "data_stream": { },
+  "priority": 500,
+  "template": {
+    "settings": {
+      "index.lifecycle.name": "pre-dsl-ilm-policy",
+      "index.lifecycle.prefer_ilm": false             <1>
+    },
+    "lifecycle": {
+      "data_retention": "7d"                          <2>
+    }
+  }
+}
+----
+// TEST[continued]
+
+<1> The `prefer_ilm` setting will now be configured on the **new** backing indices
+(created by rolling over the data stream) such that {ilm-init} does _not_ take 
+precedence over Data stream lifecycle.
+<2> We're configuring the data stream lifecycle so _new_ data streams will be 
+managed by Data stream lifecycle.
+
+We've now make sure that new data streams will be managed by Data stream lifecycle.
+
+Let's update our existing `dsl-data-stream` and configure Data stream lifecycle:
+
+[source,console]
+----
+PUT _data_stream/dsl-data-stream/_lifecycle
+{
+    "data_retention": "7d"
+}
+----
+// TEST[continued]
+
+We can inspect the data stream to check that the next generation will indeed be 
+managed by Data stream lifecycle:
+
+[source,console]
+--------------------------------------------------
+GET _data_stream/dsl-data-stream
+--------------------------------------------------
+// TEST[continued]
+
+[source,console-result]
+----
+{
+  "data_streams": [
+    {
+      "name": "dsl-data-stream",
+      "timestamp_field": {
+        "name": "@timestamp"
+      },
+      "indices": [
+        {
+          "index_name": ".ds-dsl-data-stream-2023.10.19-000001",     
+          "index_uuid": "xCEhwsp8Tey0-FLNFYVwSg",
+          "prefer_ilm": true,                                       
+          "ilm_policy": "pre-dsl-ilm-policy",                       
+          "managed_by": "Index Lifecycle Management"                <1>
+        },
+        {
+          "index_name": ".ds-dsl-data-stream-2023.10.19-000002",
+          "index_uuid": "PA_JquKGSiKcAKBA8DJ5gw",
+          "prefer_ilm": true,
+          "ilm_policy": "pre-dsl-ilm-policy",
+          "managed_by": "Index Lifecycle Management"                <2>
+        }
+      ],
+      "generation": 2,
+      "status": "GREEN",
+      "template": "dsl-data-stream-template",
+      "lifecycle": {
+        "enabled": true,
+        "data_retention": "7d"
+      },
+      "ilm_policy": "pre-dsl-ilm-policy",                           
+      "next_generation_managed_by": "Data stream lifecycle",         <3>
+      "prefer_ilm": false,                                           <4>
+      "hidden": false,
+      "system": false,
+      "allow_custom_routing": false,
+      "replicated": false
+    }
+  ]
+}
+----
+// TESTRESPONSE[s/"index_name": ".ds-dsl-data-stream-2023.10.19-000001"/"index_name": $body.data_streams.0.indices.0.index_name/]
+// TESTRESPONSE[s/"index_uuid": "xCEhwsp8Tey0-FLNFYVwSg"/"index_uuid": $body.data_streams.0.indices.0.index_uuid/]
+// TESTRESPONSE[s/"index_name": ".ds-dsl-data-stream-2023.10.19-000002"/"index_name": $body.data_streams.0.indices.1.index_name/]
+// TESTRESPONSE[s/"index_uuid": "PA_JquKGSiKcAKBA8DJ5gw"/"index_uuid": $body.data_streams.0.indices.1.index_uuid/]
+// TESTRESPONSE[s/"status": "GREEN"/"status": "YELLOW"/]
+
+<1> The existing backing index will continue to be managed by {ilm-init}
+<2> The existing backing index will continue to be managed by {ilm-init}
+<3> The next generation index will be managed by Data stream lifecycle
+<4> The `prefer_ilm` setting value we configured in the index template is reflected
+and will be configured accordingly for new backing indices.
+
+We'll now rollover the data stream to see the new generation index being managed by 
+Data stream lifecycle:
+
+[source,console]
+----
+POST dsl-data-stream/_rollover
+----
+// TEST[continued]
+
+[source,console]
+----
+GET _data_stream/dsl-data-stream
+----
+// TEST[continued]
+
+[source,console-result]
+----
+{
+  "data_streams": [
+    {
+      "name": "dsl-data-stream",
+      "timestamp_field": {
+        "name": "@timestamp"
+      },
+      "indices": [
+        {
+          "index_name": ".ds-dsl-data-stream-2023.10.19-000001",     
+          "index_uuid": "xCEhwsp8Tey0-FLNFYVwSg",
+          "prefer_ilm": true,                                       
+          "ilm_policy": "pre-dsl-ilm-policy",                       
+          "managed_by": "Index Lifecycle Management"                <1> 
+        },
+        {
+          "index_name": ".ds-dsl-data-stream-2023.10.19-000002",
+          "index_uuid": "PA_JquKGSiKcAKBA8DJ5gw",
+          "prefer_ilm": true,
+          "ilm_policy": "pre-dsl-ilm-policy",
+          "managed_by": "Index Lifecycle Management"                <2>
+        },
+        {
+          "index_name": ".ds-dsl-data-stream-2023.10.19-000003",
+          "index_uuid": "PA_JquKGSiKcAKBA8abcd1",
+          "prefer_ilm": false,                                      <3>
+          "ilm_policy": "pre-dsl-ilm-policy",
+          "managed_by": "Data stream lifecycle"                     <4>
+        }
+      ],
+      "generation": 3,
+      "status": "GREEN",
+      "template": "dsl-data-stream-template",
+      "lifecycle": {
+        "enabled": true,
+        "data_retention": "7d"
+      },
+      "ilm_policy": "pre-dsl-ilm-policy",                           
+      "next_generation_managed_by": "Data stream lifecycle",        
+      "prefer_ilm": false,                                          
+      "hidden": false,
+      "system": false,
+      "allow_custom_routing": false,
+      "replicated": false
+    }
+  ]
+}
+----
+// TESTRESPONSE[s/"index_name": ".ds-dsl-data-stream-2023.10.19-000001"/"index_name": $body.data_streams.0.indices.0.index_name/]
+// TESTRESPONSE[s/"index_uuid": "xCEhwsp8Tey0-FLNFYVwSg"/"index_uuid": $body.data_streams.0.indices.0.index_uuid/]
+// TESTRESPONSE[s/"index_name": ".ds-dsl-data-stream-2023.10.19-000002"/"index_name": $body.data_streams.0.indices.1.index_name/]
+// TESTRESPONSE[s/"index_uuid": "PA_JquKGSiKcAKBA8DJ5gw"/"index_uuid": $body.data_streams.0.indices.1.index_uuid/]
+// TESTRESPONSE[s/"index_name": ".ds-dsl-data-stream-2023.10.19-000003"/"index_name": $body.data_streams.0.indices.2.index_name/]
+// TESTRESPONSE[s/"index_uuid": "PA_JquKGSiKcAKBA8abcd1"/"index_uuid": $body.data_streams.0.indices.2.index_uuid/]
+// TESTRESPONSE[s/"status": "GREEN"/"status": "YELLOW"/]
+
+<1> The backing indices that existed before rollover will continue to be managed by {ilm-init}
+<2> The backing indices that existed before rollover will continue to be managed by {ilm-init}
+<3> The new write index received the `false` value for the `prefer_ilm` setting, as we configured
+in the index template
+<4> The new write index is managed by `Data stream lifecycle`
+
+We can easily change this data stream to be managed by {ilm-cap} because we didn't remove 
+the {ilm-cap} policy when we <<update-index-template-for-dsl, updated 
+the index template>>. 
+
+We can achieve this in two ways:
+
+1. <<data-streams-delete-lifecycle, Delete the lifecycle>> from the data streams
+2. Disable Data stream lifecycle by configured the `enabled` flag to `false`.
+
+Let's implement option 2 and disable the data stream lifecycle: 
+
+[source,console]
+----
+PUT _data_stream/dsl-data-stream/_lifecycle
+{
+    "data_retention": "7d",
+    "enabled": false <1> 
+}
+----
+// TEST[continued]
+<1> The `enabled` flag can be ommitted and defaults to `true` however, here we 
+explicitly configure it to `false`
+Let's check the state of the data stream:
+
+[source,console]
+----
+GET _data_stream/dsl-data-stream
+----
+// TEST[continued]
+
+[source,console-result]
+----
+{
+  "data_streams": [
+    {
+      "name": "dsl-data-stream",
+      "timestamp_field": {
+        "name": "@timestamp"
+      },
+      "indices": [
+        {
+          "index_name": ".ds-dsl-data-stream-2023.10.19-000001",     
+          "index_uuid": "xCEhwsp8Tey0-FLNFYVwSg",
+          "prefer_ilm": true,                                       
+          "ilm_policy": "pre-dsl-ilm-policy",                       
+          "managed_by": "Index Lifecycle Management"                
+        },
+        {
+          "index_name": ".ds-dsl-data-stream-2023.10.19-000002",
+          "index_uuid": "PA_JquKGSiKcAKBA8DJ5gw",
+          "prefer_ilm": true,
+          "ilm_policy": "pre-dsl-ilm-policy",
+          "managed_by": "Index Lifecycle Management"                
+        },
+        {
+          "index_name": ".ds-dsl-data-stream-2023.10.19-000003",
+          "index_uuid": "PA_JquKGSiKcAKBA8abcd1",
+          "prefer_ilm": false,                                      
+          "ilm_policy": "pre-dsl-ilm-policy",
+          "managed_by": "Index Lifecycle Management"                <1>
+        }
+      ],
+      "generation": 3,
+      "status": "GREEN",
+      "template": "dsl-data-stream-template",
+      "lifecycle": {
+        "enabled": false,                                          <2>
+        "data_retention": "7d"
+      },
+      "ilm_policy": "pre-dsl-ilm-policy",                           
+      "next_generation_managed_by": "Index Lifecycle Management",  <3>        
+      "prefer_ilm": false,                                          
+      "hidden": false,
+      "system": false,
+      "allow_custom_routing": false,
+      "replicated": false
+    }
+  ]
+}
+----
+// TESTRESPONSE[s/"index_name": ".ds-dsl-data-stream-2023.10.19-000001"/"index_name": $body.data_streams.0.indices.0.index_name/]
+// TESTRESPONSE[s/"index_uuid": "xCEhwsp8Tey0-FLNFYVwSg"/"index_uuid": $body.data_streams.0.indices.0.index_uuid/]
+// TESTRESPONSE[s/"index_name": ".ds-dsl-data-stream-2023.10.19-000002"/"index_name": $body.data_streams.0.indices.1.index_name/]
+// TESTRESPONSE[s/"index_uuid": "PA_JquKGSiKcAKBA8DJ5gw"/"index_uuid": $body.data_streams.0.indices.1.index_uuid/]
+// TESTRESPONSE[s/"index_name": ".ds-dsl-data-stream-2023.10.19-000003"/"index_name": $body.data_streams.0.indices.2.index_name/]
+// TESTRESPONSE[s/"index_uuid": "PA_JquKGSiKcAKBA8abcd1"/"index_uuid": $body.data_streams.0.indices.2.index_uuid/]
+// TESTRESPONSE[s/"status": "GREEN"/"status": "YELLOW"/]
+<1> The write index is now managed by {ilm-cap}
+<2> The `lifecycle` configured on the data stream is now disabled. 
+<3> The next write index will be managed by {ilm-cap}
+
+Had we removed the {ilm-cap} policy from the index template when we <<update-index-template-for-dsl, updated>>
+it, the write index of the data stream will now be `Unmanaged` because the index 
+wouldn't have the {ilm-cap} policy configured to fallback onto.
+
+//////////////////////////
+[source,console]
+--------------------------------------------------
+DELETE _data_stream/dsl-data-stream
+DELETE _index_template/dsl-data-stream-template
+DELETE _ilm/policy/pre-dsl-ilm-policy
+--------------------------------------------------
+// TEST[continued]
+
+//////////////////////////
+

+ 21 - 0
docs/reference/settings/data-stream-lifecycle-settings.asciidoc

@@ -30,6 +30,27 @@ this means that your data stream will rollover if any of the following condition
 * or the index reaches a certain age which depends on the retention time of your data stream,
 * **and** has at least one document.
 
+[[data-streams-lifecycle-target-merge-factor]]
+`data_streams.lifecycle.target.merge.policy.merge_factor`::
+(<<dynamic-cluster-setting,Dynamic>>, integer)
+Data stream lifecycle implements <<data-streams-lifecycle-how-it-works, tail merging>> by
+updating the lucene merge policy factor for the target backing index. The merge factor 
+is both the number of segments that should be merged together, and the maximum number 
+of segments that we expect to find on a given tier.
+This setting controls what value does <<data-stream-lifecycle, Data stream lifecycle>>
+configures on the target index. It defaults to `16`. 
+The value will be visible under the `index.merge.policy.merge_factor` index setting 
+on the target index.
+
+[[data-streams-lifecycle-target-floor-segment]]
+`data_streams.lifecycle.target.merge.policy.floor_segment`::
+(<<dynamic-cluster-setting,Dynamic>>)
+Data stream lifecycle implements <<data-streams-lifecycle-how-it-works, tail merging>> by
+updating the lucene merge policy floor segment for the target backing index. This floor 
+segment size is a way to prevent indices from having a long tail of very small segments. 
+This setting controls what value does <<data-stream-lifecycle, data stream lifecycle>>
+configures on the target index. It defaults to `100MB`.
+
 ==== Index level settings
 The following index-level settings are typically configured on the backing indices of a data stream.