|
- [[indices-flush]]
- == Flush
- The flush API allows you to flush one or more indices. Flushing an index
- frees memory by writing data to the index storage and clearing the internal
- <<index-modules-translog,transaction log>>. By
- default, Elasticsearch uses memory heuristics to automatically
- trigger flush operations as required in order to clear memory.
- [source,js]
- --------------------------------------------------
- POST /twitter/_flush
- --------------------------------------------------
- // AUTOSENSE
- [float]
- [[flush-parameters]]
- === Request Parameters
- The flush API accepts the following request parameters:
- [horizontal]
- `wait_if_ongoing`:: If set to `true`, the flush operation blocks until the
- flush can be executed if another flush operation is already executing.
- The default is `false`, which causes an exception to be thrown at
- the shard level if another flush operation is already running.
- `force`:: Whether a flush should be forced even if it is not necessarily needed, i.e.
- if no changes will be committed to the index. This is useful if transaction log IDs
- should be incremented even if no uncommitted changes are present.
- (This setting can be considered internal.)
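As a rough illustration of how these parameters end up on the request line, the following Python sketch builds the path and query string for a flush call. The helper name `flush_path` is hypothetical and not part of any Elasticsearch client; only the `_flush` endpoint and the two parameter names come from this page.

```python
from urllib.parse import urlencode


def flush_path(index, wait_if_ongoing=None, force=None):
    """Build the path and query string for a flush request on one index.

    Parameters left as None are omitted, so the server defaults apply.
    This is a hypothetical helper, not an Elasticsearch client function.
    """
    params = {}
    if wait_if_ongoing is not None:
        params["wait_if_ongoing"] = "true" if wait_if_ongoing else "false"
    if force is not None:
        params["force"] = "true" if force else "false"
    query = "?" + urlencode(params) if params else ""
    return f"/{index}/_flush{query}"


# Block until a concurrent flush has finished instead of failing on the shard.
print(flush_path("twitter", wait_if_ongoing=True))
```

The returned string would be sent with `POST`, exactly like the request shown above.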
- [float]
- [[flush-multi-index]]
- === Multi Index
- The flush API can be applied to more than one index with a single call,
- or even to all indices by using `_all` or omitting the index name.
- [source,js]
- --------------------------------------------------
- POST /kimchy,elasticsearch/_flush
- POST /_flush
- --------------------------------------------------
- // AUTOSENSE
- [[indices-synced-flush]]
- === Synced Flush
- Elasticsearch tracks the indexing activity of each shard. Shards that have not
- received any indexing operations for 5 minutes are automatically marked as inactive. This presents
- an opportunity for Elasticsearch to reduce shard resources and also perform
- a special kind of flush, called a `synced flush`. A synced flush performs a normal flush, then adds
- a generated unique marker (`sync_id`) to all shards.
- Since the `sync_id` marker was added when there were no ongoing indexing operations, it can
- be used as a quick way to check whether two shards' Lucene indices are identical. This quick `sync_id`
- comparison (if the marker is present) is used during recovery or restarts to skip the first and
- most costly phase of the process. In that case, no segment files need to be copied and
- the transaction log replay phase of the recovery can start immediately. Note that since the `sync_id`
- marker was applied together with a flush, it is very likely that the transaction log will be empty,
- speeding up recoveries even more.
- This is particularly useful for use cases with many indices that are
- never or very rarely updated, such as time-based data. Such use cases typically generate many indices whose
- recovery without the synced flush marker would take a long time.
- To check whether a shard has a marker, look at the `commit` section of the shard stats returned by
- the <<indices-stats,indices stats>> API:
- [source,sh]
- --------------------------------------------------
- GET /twitter/_stats/commit?level=shards
- --------------------------------------------------
- // AUTOSENSE
- which returns something similar to:
- [source,js]
- --------------------------------------------------
- {
- ...
- "indices": {
- "twitter": {
- "primaries": {},
- "total": {},
- "shards": {
- "0": [
- {
- "routing": {
- ...
- },
- "commit": {
- "id": "te7zF7C4UsirqvL6jp/vUg==",
- "generation": 2,
- "user_data": {
- "sync_id": "AU2VU0meX-VX2aNbEUsD" <1>,
- ...
- },
- "num_docs": 0
- }
- }
- ...
- ],
- ...
- }
- }
- }
- }
- --------------------------------------------------
- <1> The `sync_id` marker
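To check many shards at once, the stats response can be walked programmatically. The Python sketch below assumes the response has already been parsed into a dictionary with the layout shown above; the function names `collect_sync_ids` and `is_synced` are hypothetical, and treating "all copies share one marker" as the success condition is an interpretation of the comparison described earlier, not a documented API.

```python
def collect_sync_ids(stats):
    """Map (index, shard number) to the sync_id of each shard copy.

    A copy whose commit carries no marker contributes None.
    """
    result = {}
    for index_name, index_stats in stats.get("indices", {}).items():
        for shard_num, copies in index_stats.get("shards", {}).items():
            result[(index_name, shard_num)] = [
                copy.get("commit", {}).get("user_data", {}).get("sync_id")
                for copy in copies
            ]
    return result


def is_synced(sync_ids):
    """True if every copy carries the same, non-empty marker."""
    return bool(sync_ids) and sync_ids[0] is not None and len(set(sync_ids)) == 1
```

A shard group for which `is_synced` returns `False` either has no marker on at least one copy or has copies with differing markers.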
- [float]
- === Synced Flush API
- The Synced Flush API allows an administrator to initiate a synced flush manually. This can be particularly useful for
- a planned (rolling) cluster restart where you can stop indexing and don't want to wait the default 5 minutes for
- idle indices to be sync-flushed automatically.
- While handy, there are a couple of caveats for this API:
- 1. Synced flush is a best-effort operation. Any ongoing indexing operations will cause
- the synced flush to fail on that shard. This means that some shards may be sync-flushed while others aren't. See below for more.
- 2. The `sync_id` marker is removed as soon as the shard is flushed again. That is because a flush replaces the low-level
- Lucene commit point where the marker is stored. Uncommitted operations in the transaction log do not remove the marker.
- In practice, any indexing operation on an index should be considered as removing the marker, because Elasticsearch can
- trigger a flush at any time.
- NOTE: It is harmless to request a synced flush while there is ongoing indexing. Shards that are idle will succeed and shards
- that are not will fail. Any shards that succeeded will have faster recovery times.
- [source,sh]
- --------------------------------------------------
- POST /twitter/_flush/synced
- --------------------------------------------------
- // AUTOSENSE
- The response contains details about how many shards were successfully sync-flushed and information about any failure.
- Here is what it looks like when all shards of an index with two shards and one replica are successfully
- sync-flushed:
- [source,js]
- --------------------------------------------------
- {
- "_shards": {
- "total": 4,
- "successful": 4,
- "failed": 0
- },
- "twitter": {
- "total": 4,
- "successful": 4,
- "failed": 0
- }
- }
- --------------------------------------------------
- Here is what it looks like when one shard group failed due to pending operations:
- [source,js]
- --------------------------------------------------
- {
- "_shards": {
- "total": 4,
- "successful": 2,
- "failed": 2
- },
- "twitter": {
- "total": 4,
- "successful": 2,
- "failed": 2,
- "failures": [
- {
- "shard": 1,
- "reason": "[2] ongoing operations on primary"
- }
- ]
- }
- }
- --------------------------------------------------
- NOTE: The above error is shown when the synced flush fails due to concurrent indexing operations. The HTTP
- status code in that case will be `409 CONFLICT`.
- Sometimes the failures are specific to a shard copy. The copies that failed will not be eligible for
- fast recovery but those that succeeded still will be. This case is reported as follows:
- [source,js]
- --------------------------------------------------
- {
- "_shards": {
- "total": 4,
- "successful": 3,
- "failed": 1
- },
- "twitter": {
- "total": 4,
- "successful": 3,
- "failed": 1,
- "failures": [
- {
- "shard": 1,
- "reason": "unexpected error",
- "routing": {
- "state": "STARTED",
- "primary": false,
- "node": "SZNr2J_ORxKTLUCydGX4zA",
- "relocating_node": null,
- "shard": 1,
- "index": "twitter"
- }
- }
- ]
- }
- }
- --------------------------------------------------
- NOTE: When a shard copy fails to sync-flush, the HTTP status code returned will be `409 CONFLICT`.
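The responses above can be summarized with a few lines of code. The Python sketch below assumes the response body has been parsed into a dictionary; the function name `failed_shards` is hypothetical. It treats a failure entry that includes a `routing` object as specific to one shard copy, which is an inference from the two example responses rather than a documented rule.

```python
def failed_shards(response):
    """Return {index: [(shard, reason, copy_specific), ...]} for all failures.

    copy_specific is True when the failure entry names a single shard copy
    through a "routing" object (an assumption based on the examples above).
    """
    summary = {}
    for name, body in response.items():
        if name == "_shards":
            continue  # cluster-wide totals, not a per-index section
        failures = body.get("failures", [])
        if failures:
            summary[name] = [
                (f["shard"], f["reason"], "routing" in f) for f in failures
            ]
    return summary
```

An empty result means every shard copy was sync-flushed; any index listed in the result has at least one shard that will not benefit from the fast recovery path.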
- The synced flush API can be applied to more than one index with a single call,
- or even to all indices by using `_all` or omitting the index name.
- [source,js]
- --------------------------------------------------
- POST /kimchy,elasticsearch/_flush/synced
- POST /_flush/synced
- --------------------------------------------------
- // AUTOSENSE
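Because a synced flush is best effort, a script that prepares a planned restart may want to repeat the request a few times once indexing has stopped. The Python sketch below shows one possible shape for that loop. The callable `send_request` is hypothetical: it stands for whatever code issues `POST /_flush/synced` and returns the parsed JSON body, and the retry count and delay are arbitrary illustration values.

```python
import time


def synced_flush_with_retry(send_request, attempts=3, delay_seconds=1.0):
    """Call send_request() until the response reports no failed shards.

    send_request is a hypothetical callable that issues the synced flush
    and returns the parsed response. The last response is returned, so the
    caller can still inspect any remaining failures.
    """
    response = None
    for attempt in range(attempts):
        response = send_request()
        if response["_shards"]["failed"] == 0:
            break
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # give in-flight operations time to finish
    return response
```

If failures remain after the last attempt, the affected shards simply recover through the regular, slower path.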
|