@@ -136,7 +136,7 @@ POST _reindex
// TEST[setup:twitter]

You can limit the documents by adding a type to the `source` or by adding a
-query. This will only copy ++tweet++'s made by `kimchy` into `new_twitter`:
+query. This will only copy tweets made by `kimchy` into `new_twitter`:

[source,js]
--------------------------------------------------
@@ -161,11 +161,13 @@ POST _reindex

`index` and `type` in `source` can both be lists, allowing you to copy from
lots of sources in one request. This will copy documents from the `_doc` and
-`post` types in the `twitter` and `blog` index. It'd include the `post` type in
-the `twitter` index and the `_doc` type in the `blog` index. If you want to be
-more specific you'll need to use the `query`. It also makes no effort to handle
-ID collisions. The target index will remain valid but it's not easy to predict
-which document will survive because the iteration order isn't well defined.
+`post` types in the `twitter` and `blog` index. The copied documents would include the
+`post` type in the `twitter` index and the `_doc` type in the `blog` index. For more
+specific parameters, you can use `query`.
+
+The Reindex API makes no effort to handle ID collisions. For such issues, the target index
+will remain valid, but it's not easy to predict which document will survive because
+the iteration order isn't well defined.

[source,js]
--------------------------------------------------
@@ -203,8 +205,8 @@ POST _reindex
// CONSOLE
// TEST[setup:twitter]

-If you want a particular set of documents from the twitter index you'll
-need to sort. Sorting makes the scroll less efficient but in some contexts
+If you want a particular set of documents from the `twitter` index you'll
+need to use `sort`. Sorting makes the scroll less efficient but in some contexts
it's worth it. If possible, prefer a more selective query to `size` and `sort`.
This will copy 10000 documents from `twitter` into `new_twitter`:
@@ -226,8 +228,8 @@ POST _reindex
// TEST[setup:twitter]

The `source` section supports all the elements that are supported in a
-<<search-request-body,search request>>. For instance only a subset of the
-fields from the original documents can be reindexed using source filtering
+<<search-request-body,search request>>. For instance, only a subset of the
+fields from the original documents can be reindexed using `source` filtering
as follows:

[source,js]
@@ -286,10 +288,10 @@ Set `ctx.op = "delete"` if your script decides that the document must be
deleted from the destination index. The deletion will be reported in the
`deleted` counter in the <<docs-reindex-response-body, response body>>.

-Setting `ctx.op` to anything else is an error. Setting any
-other field in `ctx` is an error.
+Setting `ctx.op` to anything else will return an error, as will setting any
+other field in `ctx`.

-Think of the possibilities! Just be careful! With great power.... You can
+Think of the possibilities! Just be careful; you are able to
change:

* `_id`
@@ -299,7 +301,7 @@ change:
* `_routing`

Setting `_version` to `null` or clearing it from the `ctx` map is just like not
-sending the version in an indexing request. It will cause that document to be
+sending the version in an indexing request; it will cause the document to be
overwritten in the target index regardless of the version on the target or the
version type you use in the `_reindex` request.
@@ -310,11 +312,11 @@ preserved unless it's changed by the script. You can set `routing` on the
`keep`::

Sets the routing on the bulk request sent for each match to the routing on
-the match. The default.
+the match. This is the default value.

`discard`::

-Sets the routing on the bulk request sent for each match to null.
+Sets the routing on the bulk request sent for each match to `null`.

`=<some text>`::
@@ -422,7 +424,7 @@ POST _reindex

The `host` parameter must contain a scheme, host, and port (e.g.
`https://otherhost:9200`). The `username` and `password` parameters are
-optional and when they are present reindex will connect to the remote
+optional, and when they are present `_reindex` will connect to the remote
Elasticsearch node using basic auth. Be sure to use `https` when using
basic auth or the password will be sent in plain text.
@@ -446,7 +448,7 @@ NOTE: Reindexing from remote clusters does not support

Reindexing from a remote server uses an on-heap buffer that defaults to a
maximum size of 100mb. If the remote index includes very large documents you'll
-need to use a smaller batch size. The example below sets the batch size `10`
+need to use a smaller batch size. The example below sets the batch size to `10`
which is very, very small.

[source,js]
@@ -477,8 +479,8 @@ POST _reindex

It is also possible to set the socket read timeout on the remote connection
with the `socket_timeout` field and the connection timeout with the
-`connect_timeout` field. Both default to thirty seconds. This example
-sets the socket read timeout to one minute and the connection timeout to ten
+`connect_timeout` field. Both default to 30 seconds. This example
+sets the socket read timeout to one minute and the connection timeout to 10
seconds:

[source,js]
@@ -533,14 +535,14 @@ for details. `timeout` controls how long each write request waits for unavailabl
shards to become available. Both work exactly how they work in the
<<docs-bulk,Bulk API>>. As `_reindex` uses scroll search, you can also specify
the `scroll` parameter to control how long it keeps the "search context" alive,
-eg `?scroll=10m`, by default it's 5 minutes.
+(e.g. `?scroll=10m`). The default value is 5 minutes.

`requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles rate at which reindex issues batches of index
+`1000`, etc) and throttles the rate at which `_reindex` issues batches of index
operations by padding each batch with a wait time. The throttling can be
disabled by setting `requests_per_second` to `-1`.

-The throttling is done by waiting between batches so that scroll that reindex
+The throttling is done by waiting between batches so that the `scroll` which `_reindex`
uses internally can be given a timeout that takes into account the padding.
The padding time is the difference between the batch size divided by the
`requests_per_second` and the time spent writing. By default the batch size is
@@ -552,9 +554,9 @@ target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
--------------------------------------------------

-Since the batch is issued as a single `_bulk` request large batch sizes will
+Since the batch is issued as a single `_bulk` request, large batch sizes will
cause Elasticsearch to create many requests and then wait for a while before
-starting the next set. This is "bursty" instead of "smooth". The default value is `-1`.
+starting the next set. This is "bursty" instead of "smooth". The default value is `-1`.
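The padding arithmetic above can be sketched in plain Python (an illustrative helper, not Elasticsearch's own code; the numbers reuse the worked example):

```python
def wait_time(batch_size, requests_per_second, write_time):
    """Padding inserted between batches to honor requests_per_second.
    A sketch of the formula described above, not part of Elasticsearch."""
    if requests_per_second <= 0:  # -1 disables throttling entirely
        return 0.0
    target_time = batch_size / requests_per_second
    return max(target_time - write_time, 0.0)

# The worked example: a batch of 1000 at 500 requests per second,
# with 0.5 seconds spent writing
print(wait_time(1000, 500, 0.5))  # 1.5
```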

[float]
[[docs-reindex-response-body]]
@@ -606,12 +608,12 @@ The JSON response looks like this:

`took`::

-The number of milliseconds from start to end of the whole operation.
+The total number of milliseconds the entire operation took.

`timed_out`::

This flag is set to `true` if any of the requests executed during the
-reindex has timed out.
+reindex timed out.

`total`::
@@ -657,7 +659,7 @@ The number of requests per second effectively executed during the reindex.

`throttled_until_millis`::

-This field should always be equal to zero in a delete by query response. It only
+This field should always be equal to zero in a `_reindex` response. It only
has meaning when using the <<docs-reindex-task-api, Task API>>, where it
indicates the next time (in milliseconds since epoch) a throttled request will be
executed again in order to conform to `requests_per_second`.
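As a sketch of how a client might interpret this field (a hypothetical helper; only the field's documented meaning is assumed):

```python
import time

def seconds_until_retry(throttled_until_millis):
    """Seconds until a throttled request is executed again.
    throttled_until_millis is milliseconds since epoch; 0 means not throttled."""
    if throttled_until_millis == 0:
        return 0.0
    return max(throttled_until_millis / 1000.0 - time.time(), 0.0)

print(seconds_until_retry(0))  # 0.0
```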
@@ -681,7 +683,7 @@ GET _tasks?detailed=true&actions=*reindex
--------------------------------------------------
// CONSOLE

-The responses looks like:
+The response looks like:

[source,js]
--------------------------------------------------
@@ -726,9 +728,9 @@ The responses looks like:
// NOTCONSOLE
// We can't test tasks output

-<1> this object contains the actual status. It is just like the response json
-with the important addition of the `total` field. `total` is the total number
-of operations that the reindex expects to perform. You can estimate the
+<1> this object contains the actual status. It is identical to the response JSON
+except for the important addition of the `total` field. `total` is the total number
+of operations that `_reindex` expects to perform. You can estimate the
progress by adding the `updated`, `created`, and `deleted` fields. The request
will finish when their sum is equal to the `total` field.
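That estimate can be sketched as follows (an illustrative helper over the status object; the field names are the documented ones, the snapshot values are made up):

```python
def estimate_progress(status):
    """Approximate fraction of the reindex completed, per the task status."""
    done = status["updated"] + status["created"] + status["deleted"]
    return done / status["total"]

# A hypothetical status snapshot: 1500 of 2000 operations performed so far
status = {"total": 2000, "updated": 1000, "created": 500, "deleted": 0}
print(estimate_progress(status))  # 0.75
```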
@@ -743,7 +745,7 @@ GET /_tasks/taskId:1

The advantage of this API is that it integrates with `wait_for_completion=false`
to transparently return the status of completed tasks. If the task is completed
-and `wait_for_completion=false` was set on it them it'll come back with a
+and `wait_for_completion=false` was set, it will return a
`results` or an `error` field. The cost of this feature is the document that
`wait_for_completion=false` creates at `.tasks/task/${taskId}`. It is up to
you to delete that document.
@@ -761,10 +763,10 @@ POST _tasks/task_id:1/_cancel
--------------------------------------------------
// CONSOLE

-The `task_id` can be found using the tasks API above.
+The `task_id` can be found using the Tasks API.

-Cancelation should happen quickly but might take a few seconds. The task status
-API above will continue to list the task until it is wakes to cancel itself.
+Cancelation should happen quickly but might take a few seconds. The Tasks
+API will continue to list the task until it wakes to cancel itself.


[float]
@@ -780,9 +782,9 @@ POST _reindex/task_id:1/_rethrottle?requests_per_second=-1
--------------------------------------------------
// CONSOLE

-The `task_id` can be found using the tasks API above.
+The `task_id` can be found using the Tasks API above.

-Just like when setting it on the `_reindex` API `requests_per_second`
+Just like when setting it on the Reindex API, `requests_per_second`
can be either `-1` to disable throttling or any decimal number
like `1.7` or `12` to throttle to that level. Rethrottling that speeds up the
query takes effect immediately but rethrotting that slows down the query will
@@ -806,7 +808,7 @@ POST test/_doc/1?refresh
--------------------------------------------------
// CONSOLE

-But you don't like the name `flag` and want to replace it with `tag`.
+but you don't like the name `flag` and want to replace it with `tag`.
`_reindex` can create the other index for you:

[source,js]
@@ -836,7 +838,7 @@ GET test2/_doc/1
// CONSOLE
// TEST[continued]

-and it'll look like:
+which will return:

[source,js]
--------------------------------------------------
@@ -854,8 +856,6 @@ and it'll look like:
--------------------------------------------------
// TESTRESPONSE

-Or you can search by `tag` or whatever you want.
-
[float]
[[docs-reindex-slice]]
=== Slicing
@@ -902,7 +902,7 @@ POST _reindex
// CONSOLE
// TEST[setup:big_twitter]

-Which you can verify works with:
+You can verify this works with:

[source,js]
----------------------------------------------------------------
@@ -912,7 +912,7 @@ POST new_twitter/_search?size=0&filter_path=hits.total
// CONSOLE
// TEST[continued]

-Which results in a sensible `total` like this one:
+which results in a sensible `total` like this one:

[source,js]
----------------------------------------------------------------
@@ -928,7 +928,7 @@ Which results in a sensible `total` like this one:
[[docs-reindex-automatic-slice]]
==== Automatic slicing

-You can also let reindex automatically parallelize using <<sliced-scroll>> to
+You can also let `_reindex` automatically parallelize using <<sliced-scroll>> to
slice on `_uid`. Use `slices` to specify the number of slices to use:

[source,js]
@@ -946,7 +946,7 @@ POST _reindex?slices=5&refresh
// CONSOLE
// TEST[setup:big_twitter]

-Which you also can verify works with:
+You can also verify this works with:

[source,js]
----------------------------------------------------------------
@@ -955,7 +955,7 @@ POST new_twitter/_search?size=0&filter_path=hits.total
// CONSOLE
// TEST[continued]

-Which results in a sensible `total` like this one:
+which results in a sensible `total` like this one:

[source,js]
----------------------------------------------------------------
@@ -979,7 +979,7 @@ section above, creating sub-requests which means it has some quirks:
sub-requests are "child" tasks of the task for the request with `slices`.
* Fetching the status of the task for the request with `slices` only contains
the status of completed slices.
-* These sub-requests are individually addressable for things like cancellation
+* These sub-requests are individually addressable for things like cancelation
and rethrottling.
* Rethrottling the request with `slices` will rethrottle the unfinished
sub-request proportionally.
@@ -992,7 +992,7 @@ are distributed proportionally to each sub-request. Combine that with the point
above about distribution being uneven and you should conclude that the using
`size` with `slices` might not result in exactly `size` documents being
`_reindex`ed.
-* Each sub-requests gets a slightly different snapshot of the source index
+* Each sub-request gets a slightly different snapshot of the source index,
though these are all taken at approximately the same time.

[float]
@@ -1000,12 +1000,12 @@ though these are all taken at approximately the same time.
===== Picking the number of slices

If slicing automatically, setting `slices` to `auto` will choose a reasonable
-number for most indices. If you're slicing manually or otherwise tuning
+number for most indices. If slicing manually or otherwise tuning
automatic slicing, use these guidelines.

Query performance is most efficient when the number of `slices` is equal to the
-number of shards in the index. If that number is large, (for example,
-500) choose a lower number as too many `slices` will hurt performance. Setting
+number of shards in the index. If that number is large (e.g. 500),
+choose a lower number as too many `slices` will hurt performance. Setting
`slices` higher than the number of shards generally does not improve efficiency
and adds overhead.
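The guideline might be sketched as follows (a hypothetical heuristic; the cap of 20 is an assumed tuning value, not a figure from the text):

```python
def pick_slices(shard_count, cap=20):
    """Match the shard count for best query performance, but stay below a cap,
    since very large slice counts hurt performance. The cap is an assumption."""
    return min(shard_count, cap)

print(pick_slices(5))    # 5  -> one slice per shard
print(pick_slices(500))  # 20 -> capped well below a 500-shard index
```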
@@ -1018,10 +1018,10 @@ documents being reindexed and cluster resources.
[float]
=== Reindex daily indices

-You can use `_reindex` in combination with <<modules-scripting-painless, Painless>>
- to reindex daily indices to apply a new template to the existing documents.
+You can use `_reindex` in combination with <<modules-scripting-painless, Painless>>
+to reindex daily indices to apply a new template to the existing documents.

-Assuming you have indices consisting of documents as following:
+Assuming you have indices consisting of documents as follows:

[source,js]
----------------------------------------------------------------
@@ -1032,12 +1032,12 @@ PUT metricbeat-2016.05.31/_doc/1?refresh
----------------------------------------------------------------
// CONSOLE

-The new template for the `metricbeat-*` indices is already loaded into Elasticsearch
+The new template for the `metricbeat-*` indices is already loaded into Elasticsearch,
but it applies only to the newly created indices. Painless can be used to reindex
the existing documents and apply the new template.

The script below extracts the date from the index name and creates a new index
-with `-1` appended. All data from `metricbeat-2016.05.31` will be reindex
+with `-1` appended. All data from `metricbeat-2016.05.31` will be reindexed
into `metricbeat-2016.05.31-1`.

[source,js]
@@ -1059,7 +1059,7 @@ POST _reindex
// CONSOLE
// TEST[continued]

-All documents from the previous metricbeat indices now can be found in the `*-1` indices.
+All documents from the previous metricbeat indices can now be found in the `*-1` indices.

[source,js]
----------------------------------------------------------------
@@ -1069,13 +1069,13 @@ GET metricbeat-2016.05.31-1/_doc/1
// CONSOLE
// TEST[continued]

-The previous method can also be used in combination with <<docs-reindex-change-name, change the name of a field>>
-to only load the existing data into the new index, but also rename fields if needed.
+The previous method can also be used in conjunction with <<docs-reindex-change-name, change the name of a field>>
+to load only the existing data into the new index and rename any fields if needed.

[float]
=== Extracting a random subset of an index

-Reindex can be used to extract a random subset of an index for testing:
+`_reindex` can be used to extract a random subset of an index for testing:

[source,js]
----------------------------------------------------------------
@@ -1100,5 +1100,5 @@ POST _reindex
// CONSOLE
// TEST[setup:big_twitter]

-<1> Reindex defaults to sorting by `_doc` so `random_score` won't have any
+<1> `_reindex` defaults to sorting by `_doc` so `random_score` will not have any
effect unless you override the sort to `_score`.
|