[[indices-flush]]
=== Flush

Flushing an index is the process of making sure that any data that is currently
only stored in the <<index-modules-translog,transaction log>> is also
permanently stored in the Lucene index. When restarting, {es} replays any
unflushed operations from the transaction log into the Lucene index to bring it
back into the state that it was in before the restart. {es} automatically
triggers flushes as needed, using heuristics that trade off the size of the
unflushed transaction log against the cost of performing each flush.

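For instance, the main size-based heuristic is controlled by the dynamic
`index.translog.flush_threshold_size` setting described in
<<index-modules-translog>>. The following sketch lowers it for a single index
(the `twitter` index name and the value are illustrative):

[source,console]
--------------------------------------------------
PUT twitter/_settings
{
  "index.translog.flush_threshold_size": "256mb"
}
--------------------------------------------------
// TEST[setup:twitter]
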
Once each operation has been flushed it is permanently stored in the Lucene
index. This may mean that there is no need to maintain an additional copy of it
in the transaction log, unless <<index-modules-translog-retention,it is retained
for some other reason>>. The transaction log is made up of multiple files,
called _generations_, and {es} will delete any generation files once they are no
longer needed, freeing up disk space.

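To see how much data is currently held only in the transaction log and has not
yet been flushed, you can inspect the translog section of the
<<indices-stats,indices stats>>, for example (the `twitter` index name is
illustrative):

[source,console]
--------------------------------------------------
GET twitter/_stats/translog?filter_path=**.translog
--------------------------------------------------
// TEST[setup:twitter]

Fields such as `uncommitted_operations` and `uncommitted_size_in_bytes` in the
response indicate how much work the next flush will have to persist.
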
It is also possible to trigger a flush on one or more indices using the flush
API, although it is rare for users to need to call this API directly. If you
call the flush API after indexing some documents then a successful response
indicates that {es} has flushed all the documents that were indexed before the
flush API was called.

[source,console]
--------------------------------------------------
POST twitter/_flush
--------------------------------------------------
// TEST[setup:twitter]

[float]
[[flush-parameters]]
==== Request Parameters

The flush API accepts the following request parameters:

[horizontal]
`wait_if_ongoing`:: If set to `true` the flush operation will block until the
flush can be executed if another flush operation is already executing. If set to
`false` then an exception will be thrown at the shard level if another flush
operation is already running. Defaults to `true`.

`force`:: Whether a flush should be forced even if it is not necessarily needed,
i.e. if no changes will be committed to the index. This can be used to force
the generation number of the transaction log to be incremented even if no
uncommitted changes are present. This parameter should be considered internal.

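As a sketch of how these parameters are passed, the following request forces a
new commit even if nothing is pending, waiting for any ongoing flush to
complete first (as noted above, `force` is normally only used internally):

[source,console]
--------------------------------------------------
POST twitter/_flush?force=true&wait_if_ongoing=true
--------------------------------------------------
// TEST[setup:twitter]
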
[float]
[[flush-multi-index]]
==== Multi Index

The flush API can be applied to more than one index with a single call, or even
on `_all` the indices.

[source,console]
--------------------------------------------------
POST kimchy,elasticsearch/_flush

POST _flush
--------------------------------------------------
// TEST[s/^/PUT kimchy\nPUT elasticsearch\n/]

[[synced-flush-api]]
==== Synced Flush

{es} keeps track of which shards have received indexing activity recently, and
considers shards that have not received any indexing operations for 5 minutes to
be inactive. When a shard becomes inactive {es} performs a special kind of flush
known as a _synced flush_. A synced flush performs a normal
<<indices-flush,flush>> on each copy of the shard, and then adds a marker known
as the `sync_id` to each copy to indicate that these copies have identical
Lucene indices. Comparing the `sync_id` markers of two shard copies is a very
efficient way to check whether they have identical contents.

When allocating shard copies, {es} must ensure that each replica contains the
same data as the primary. If the shard copies have been synced-flushed and the
replica shares a `sync_id` with the primary then {es} knows that the two copies
have identical contents. This means there is no need to copy any segment files
from the primary to the replica, which saves a good deal of time during
recoveries and restarts.

This is particularly useful for clusters containing many indices that are
rarely updated, such as time-based indices. Without the synced flush marker,
recovery of this kind of cluster would be much slower.

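To observe how much data a recovery actually copied, the
<<cat-recovery,cat recovery>> API reports per-shard file and byte counts; a
recovery of a sync-flushed copy should show few or no files transferred. A
minimal sketch (the column selection is illustrative):

[source,console]
--------------------------------------------------
GET _cat/recovery?v&h=index,shard,type,stage,files,files_percent,bytes,bytes_percent
--------------------------------------------------
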
To check whether a shard has a `sync_id` marker or not, look for the `commit`
section of the shard stats returned by the <<indices-stats,indices stats>> API:

[source,console]
--------------------------------------------------
GET twitter/_stats?filter_path=**.commit&level=shards <1>
--------------------------------------------------
// TEST[s/^/PUT twitter\nPOST twitter\/_flush\/synced\n/]
<1> `filter_path` is used to reduce the verbosity of the response, but is entirely optional

which returns something similar to:

[source,console-result]
--------------------------------------------------
{
  "indices": {
    "twitter": {
      "shards": {
        "0": [
          {
            "commit" : {
              "id" : "3M3zkw2GHMo2Y4h4/KFKCg==",
              "generation" : 3,
              "user_data" : {
                "translog_uuid" : "hnOG3xFcTDeoI_kvvvOdNA",
                "history_uuid" : "XP7KDJGiS1a2fHYiFL5TXQ",
                "local_checkpoint" : "-1",
                "translog_generation" : "2",
                "max_seq_no" : "-1",
                "sync_id" : "AVvFY-071siAOuFGEO9P", <1>
                "max_unsafe_auto_id_timestamp" : "-1",
                "min_retained_seq_no" : "0"
              },
              "num_docs" : 0
            }
          }
        ]
      }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"id" : "3M3zkw2GHMo2Y4h4\/KFKCg=="/"id": $body.indices.twitter.shards.0.0.commit.id/]
// TESTRESPONSE[s/"translog_uuid" : "hnOG3xFcTDeoI_kvvvOdNA"/"translog_uuid": $body.indices.twitter.shards.0.0.commit.user_data.translog_uuid/]
// TESTRESPONSE[s/"history_uuid" : "XP7KDJGiS1a2fHYiFL5TXQ"/"history_uuid": $body.indices.twitter.shards.0.0.commit.user_data.history_uuid/]
// TESTRESPONSE[s/"sync_id" : "AVvFY-071siAOuFGEO9P"/"sync_id": $body.indices.twitter.shards.0.0.commit.user_data.sync_id/]
<1> the `sync_id` marker

NOTE: The `sync_id` marker is removed as soon as the shard is flushed again, and
{es} may trigger an automatic flush of a shard at any time if there are
unflushed operations in the shard's translog. In practice this means that one
should consider any indexing operation on an index as having removed its
`sync_id` markers.

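To illustrate this (a sketch reusing the `twitter` index from above), indexing
a single document and flushing creates a new commit whose `user_data` no longer
contains a `sync_id` entry:

[source,console]
--------------------------------------------------
POST twitter/_doc
{
  "user": "kimchy"
}

POST twitter/_flush

GET twitter/_stats?filter_path=**.commit&level=shards
--------------------------------------------------
// TEST[setup:twitter]
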
[float]
==== Synced Flush API

The Synced Flush API allows an administrator to initiate a synced flush
manually. This can be particularly useful for a planned cluster restart where
you can stop indexing but don't want to wait for 5 minutes until all indices
are marked as inactive and automatically sync-flushed.

You can request a synced flush even if there is ongoing indexing activity, and
{es} will perform the synced flush on a "best-effort" basis: shards that do not
have any ongoing indexing activity will be successfully sync-flushed, and other
shards will fail to sync-flush. The successfully sync-flushed shards will have
faster recovery times as long as the `sync_id` marker is not removed by a
subsequent flush.

[source,console]
--------------------------------------------------
POST twitter/_flush/synced
--------------------------------------------------
// TEST[setup:twitter]

The response contains details about how many shards were successfully
sync-flushed and information about any failure.

Here is what it looks like when all shards of an index with two shards and one
replica were successfully sync-flushed:

[source,console-result]
--------------------------------------------------
{
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "twitter": {
    "total": 2,
    "successful": 2,
    "failed": 0
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"successful": 2/"successful": 1/]

Here is what it looks like when one shard group failed due to pending
operations:

[source,js]
--------------------------------------------------
{
  "_shards": {
    "total": 4,
    "successful": 2,
    "failed": 2
  },
  "twitter": {
    "total": 4,
    "successful": 2,
    "failed": 2,
    "failures": [
      {
        "shard": 1,
        "reason": "[2] ongoing operations on primary"
      }
    ]
  }
}
--------------------------------------------------
// NOTCONSOLE

NOTE: The above error is shown when the synced flush fails due to concurrent
indexing operations. The HTTP status code in that case will be `409 Conflict`.

Sometimes the failures are specific to a shard copy. The copies that failed
will not be eligible for fast recovery but those that succeeded still will be.
This case is reported as follows:

[source,js]
--------------------------------------------------
{
  "_shards": {
    "total": 4,
    "successful": 1,
    "failed": 1
  },
  "twitter": {
    "total": 4,
    "successful": 3,
    "failed": 1,
    "failures": [
      {
        "shard": 1,
        "reason": "unexpected error",
        "routing": {
          "state": "STARTED",
          "primary": false,
          "node": "SZNr2J_ORxKTLUCydGX4zA",
          "relocating_node": null,
          "shard": 1,
          "index": "twitter"
        }
      }
    ]
  }
}
--------------------------------------------------
// NOTCONSOLE

NOTE: When a shard copy fails to sync-flush, the HTTP status code returned will
be `409 Conflict`.

The synced flush API can be applied to more than one index with a single call,
or even on `_all` the indices.

[source,console]
--------------------------------------------------
POST kimchy,elasticsearch/_flush/synced

POST _flush/synced
--------------------------------------------------