|
@@ -49,25 +49,25 @@ POST /_flush
|
|
|
[[indices-synced-flush]]
|
|
|
=== Synced Flush
|
|
|
|
|
|
-Elasticsearch tracks the indexing activity of each shards. Shards that have not
|
|
|
-received any indexing operations for 30 minutes (configurable) are automatically marked as inactive. This presents
|
|
|
+Elasticsearch tracks the indexing activity of each shard. Shards that have not
|
|
|
+received any indexing operations for 30 minutes are automatically marked as inactive. This presents
|
|
|
an opportunity for Elasticsearch to reduce shard resources and also perform
|
|
|
-a special kind of flush, called `synced flush`. A synced flush performs normal
|
|
|
-flushing and adds a special uniquely generated marker (`sync_id`) to all shards.
|
|
|
+a special kind of flush, called `synced flush`. A synced flush performs a normal flush, then adds
|
|
|
+a generated unique marker (`sync_id`) to all shards.
|
|
|
|
|
|
Since the sync id marker was added when there were no ongoing indexing operations, it can
|
|
|
-be used as a quick way to check if two shards indices are identical. This quick sync id
|
|
|
+be used as a quick way to check whether two shards' Lucene indices are identical. This quick sync id
|
|
|
comparison (if present) is used during recovery or restarts to skip the first and
|
|
|
most costly phase of the process. In that case, no segment files need to be copied and
|
|
|
the transaction log replay phase of the recovery can start immediately. Note that since the sync id
|
|
|
-marker was applied together with a flush, it is highly likely that the transaction log will be empty,
|
|
|
+marker was applied together with a flush, it is very likely that the transaction log will be empty,
|
|
|
speeding up recoveries even more.
|
|
|
|
|
|
This is particularly useful for use cases having lots of indices which are
|
|
|
never or very rarely updated, such as time based data. This use case typically generates lots of indices whose
|
|
|
recovery without the synced flush marker would take a long time.
|
|
|
|
|
|
-To check whether a shard has a marker or not, one can use the `commit` section of shard stats returned by
|
|
|
+To check whether a shard has a marker or not, look for the `commit` section of shard stats returned by
|
|
|
the <<indices-stats,indices stats>> API:
|
|
|
|
|
|
[source,bash]
|
|
@@ -76,23 +76,64 @@ GET /twitter/_stats/commit?level=shards
|
|
|
--------------------------------------------------
|
|
|
// AUTOSENSE
|
|
|
|
|
|
+
|
|
|
+which returns something similar to:
|
|
|
+
|
|
|
+[source,js]
|
|
|
+--------------------------------------------------
|
|
|
+{
|
|
|
+ ...
|
|
|
+ "indices": {
|
|
|
+ "twitter": {
|
|
|
+ "primaries": {},
|
|
|
+ "total": {},
|
|
|
+ "shards": {
|
|
|
+ "0": [
|
|
|
+ {
|
|
|
+ "routing": {
|
|
|
+ ...
|
|
|
+ },
|
|
|
+ "commit": {
|
|
|
+ "id": "te7zF7C4UsirqvL6jp/vUg==",
|
|
|
+ "generation": 2,
|
|
|
+ "user_data": {
|
|
|
+ "sync_id": "AU2VU0meX-VX2aNbEUsD", <1>
|
|
|
+ ...
|
|
|
+ },
|
|
|
+ "num_docs": 0
|
|
|
+ }
|
|
|
+ }
|
|
|
+ ...
|
|
|
+ ],
|
|
|
+ ...
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+<1> The `sync_id` marker
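
As a rough illustration of how the response above might be inspected, the following Python sketch collects the `sync_id` of every shard copy and reports whether all copies of a shard carry the same marker. The helper names (`sync_ids_by_shard`, `fully_synced`) and the `sample` payload are hypothetical. Only the `indices` / `shards` / `commit` / `user_data` / `sync_id` nesting is taken from the response shown above, and the second copy without a marker is invented for the example.

```python
from typing import Dict, List, Optional


def sync_ids_by_shard(stats: dict, index: str) -> Dict[str, List[Optional[str]]]:
    """Map each shard number to the sync_id of every copy (None if absent)."""
    shards = stats.get("indices", {}).get(index, {}).get("shards", {})
    result: Dict[str, List[Optional[str]]] = {}
    for shard_num, copies in shards.items():
        result[shard_num] = [
            copy.get("commit", {}).get("user_data", {}).get("sync_id")
            for copy in copies
        ]
    return result


def fully_synced(ids: List[Optional[str]]) -> bool:
    """True when every copy has a marker and all markers are identical."""
    return bool(ids) and ids[0] is not None and all(i == ids[0] for i in ids)


# Hypothetical, trimmed-down stats response: shard 0 has two copies with the
# same marker, shard 1 has one copy whose marker is missing.
sample = {
    "indices": {
        "twitter": {
            "shards": {
                "0": [
                    {"commit": {"user_data": {"sync_id": "AU2VU0meX-VX2aNbEUsD"}}},
                    {"commit": {"user_data": {"sync_id": "AU2VU0meX-VX2aNbEUsD"}}},
                ],
                "1": [
                    {"commit": {"user_data": {"sync_id": "AU2VU0meX-VX2aNbEUsD"}}},
                    {"commit": {"user_data": {}}},
                ],
            }
        }
    }
}

ids = sync_ids_by_shard(sample, "twitter")
for shard_num, copy_ids in sorted(ids.items()):
    print(shard_num, "synced" if fully_synced(copy_ids) else "not synced")
```

A shard reported as "not synced" here would simply fall back to the regular recovery path, so this kind of check is informational only.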
|
|
|
+
|
|
|
[float]
|
|
|
=== Synced Flush API
|
|
|
|
|
|
The Synced Flush API allows an administrator to initiate a synced flush manually. This can be particularly useful for
|
|
|
-a planned (rolling) cluster restart where one can stop indexing and doesn't want to wait for the default 30 minutes to pass
|
|
|
-when the synced flush will be performed automatically.
|
|
|
+a planned (rolling) cluster restart where you can stop indexing and don't want to wait the default 30 minutes for
|
|
|
+idle indices to be sync-flushed automatically.
|
|
|
|
|
|
While handy, there are a couple of caveats for this API:
|
|
|
|
|
|
1. Synced flush is a best effort operation. Any ongoing indexing operations will cause
|
|
|
-the synced flush to fail. This means that some shards may be synced flushed while others aren't. See below for more.
|
|
|
+the synced flush to fail on that shard. This means that some shards may be synced flushed while others aren't. See below for more.
|
|
|
2. The `sync_id` marker is removed as soon as the shard is flushed again. That is because a flush replaces the low level
|
|
|
lucene commit point where the marker is stored. Uncommitted operations in the transaction log do not remove the marker.
|
|
|
In practice, one should consider any indexing operation on an index as removing the marker as a flush can be triggered by Elasticsearch
|
|
|
at any time.
|
|
|
|
|
|
|
|
|
+NOTE: It is harmless to request a synced flush while there is ongoing indexing. Shards that are idle will succeed and shards
|
|
|
+ that are not will fail. Any shards that succeeded will have faster recovery times.
|
|
|
+
|
|
|
+
|
|
|
[source,bash]
|
|
|
--------------------------------------------------
|
|
|
POST /twitter/_flush/synced
|
|
@@ -145,6 +186,8 @@ Here is what it looks like when one shard group failed due to pending operations
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
|
|
|
+NOTE: The above error is shown when the synced flush fails due to concurrent indexing operations. The HTTP
|
|
|
+status code in that case will be `409 CONFLICT`.
|
|
|
|
|
|
Sometimes the failures are specific to a shard copy. The copies that failed will not be eligible for
|
|
|
fast recovery but those that succeeded still will be. This case is reported as follows:
|
|
@@ -180,6 +223,8 @@ fast recovery but those that succeeded still will be. This case is reported as f
|
|
|
--------------------------------------------------
|
|
|
|
|
|
|
|
|
+NOTE: When a shard copy fails to sync-flush, the HTTP status code returned will be `409 CONFLICT`.
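
A client script could branch on the status code described in the notes above. The sketch below is only an illustration: the function name `describe_synced_flush` is hypothetical, and treating `200` as "every shard copy was synced" is an assumption, since the text here only states the `409 CONFLICT` case.

```python
def describe_synced_flush(status_code: int) -> str:
    """Classify the HTTP status of a synced flush request."""
    if status_code == 200:
        # Assumption: a plain 200 means no shard copy reported a failure.
        return "synced"
    if status_code == 409:
        # Stated above: returned when a synced flush fails on some shard
        # or shard copy, for example due to concurrent indexing.
        return "partial"
    # Anything else is outside what the notes above describe.
    return "unexpected"


for code in (200, 409, 500):
    print(code, describe_synced_flush(code))
```

Because copies that did succeed still benefit from fast recovery, a "partial" result does not necessarily need to be retried before a restart.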
|
|
|
+
|
|
|
The synced flush API can be applied to more than one index with a single call,
|
|
|
or even on `_all` the indices.
|
|
|
|