|
- [[java-rest-high-document-reindex]]
- === Reindex API
- [[java-rest-high-document-reindex-request]]
- ==== Reindex Request
- A `ReindexRequest` can be used to copy documents from one or more indexes into a destination index.
- It requires an existing source index and a target index, which may or may not exist before the request is made. Reindex
- does not attempt to set up the destination index and does not copy the settings of the source index. You should set up
- the destination index before running a `_reindex` action, including its mappings, shard counts, replicas, etc.
- The simplest form of a `ReindexRequest` looks like this:
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request]
- --------------------------------------------------
- <1> Creates the `ReindexRequest`
- <2> Adds a list of sources to copy from
- <3> Adds the destination index
- The `dest` element can be configured like the index API to control optimistic concurrency. Leaving out
- `versionType` (as above) or setting it to `internal` will cause Elasticsearch to blindly dump documents into the target.
- Setting `versionType` to `external` will cause Elasticsearch to preserve the version from the source, create any documents
- that are missing, and update any documents that have an older version in the destination index than they do in the
- source index.
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-versionType]
- --------------------------------------------------
- <1> Set the versionType to `EXTERNAL`
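
The difference between the two version types can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Elasticsearch code: the class `VersionTypeSketch`, the method `apply`, and the use of a plain map from document id to version number are all hypothetical, and the internal case is modelled as a simple version bump.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of destination-side version handling; hypothetical, not Elasticsearch code. */
final class VersionTypeSketch {

    /**
     * Applies one incoming document to the destination, tracked as id -> version.
     * Returns true if the document was written, false on a simulated version conflict.
     */
    static boolean apply(Map<String, Long> destVersions, String id, long sourceVersion, boolean external) {
        Long existing = destVersions.get(id);
        if (!external) {
            // Internal (assumption): overwrite unconditionally and bump the destination version.
            destVersions.put(id, existing == null ? 1L : existing + 1);
            return true;
        }
        // External: keep the source version, but only if the destination copy is missing or older.
        if (existing == null || existing < sourceVersion) {
            destVersions.put(id, sourceVersion);
            return true;
        }
        return false; // destination already holds the same or a newer version
    }

    public static void main(String[] args) {
        Map<String, Long> dest = new LinkedHashMap<>();
        dest.put("1", 5L);
        System.out.println(apply(dest, "1", 3L, true));  // false: destination version 5 is newer
        System.out.println(apply(dest, "2", 3L, true));  // true: document was missing
        System.out.println(apply(dest, "1", 3L, false)); // true: internal overwrites blindly
    }
}
```

In this model, the external case is the only one that can report a conflict, which matches the description above of internal versioning writing every document regardless of what the destination holds.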
- Setting `opType` to `create` will cause `_reindex` to only create missing documents in the target index. All existing
- documents will cause a version conflict. The default `opType` is `index`.
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-opType]
- --------------------------------------------------
- <1> Set the opType to `create`
- By default, version conflicts abort the `_reindex` process, but you can just count them instead by setting `conflicts`
- to `proceed` in the request body.
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-conflicts]
- --------------------------------------------------
- <1> Set `proceed` on version conflict
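
How `opType` set to `create` interacts with the conflict setting can be sketched with plain collections. The class `ConflictPolicySketch`, its `Result` counters, and the method `copyCreateOnly` are hypothetical names for illustration only. The sketch also assumes that an aborting run stops at the first conflict, which simplifies away the batch-by-batch processing of a real reindex.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of create-only copying with abort/proceed conflict handling; hypothetical. */
final class ConflictPolicySketch {

    /** Outcome counters, loosely modelled on the created and version-conflict counts of the response. */
    static final class Result {
        int created;
        int versionConflicts;
        boolean aborted;
    }

    /**
     * Copies ids into dest with create-only semantics.
     * When proceed is false, the first conflict stops the run (a simplifying assumption).
     */
    static Result copyCreateOnly(List<String> sourceIds, Set<String> dest, boolean proceed) {
        Result result = new Result();
        for (String id : sourceIds) {
            if (dest.add(id)) { // Set.add returns false when the id already exists
                result.created++;
            } else {
                result.versionConflicts++;
                if (!proceed) {
                    result.aborted = true;
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> source = List.of("a", "b", "c", "d");
        Result abort = copyCreateOnly(source, new HashSet<>(Set.of("b", "c")), false);
        Result proceed = copyCreateOnly(source, new HashSet<>(Set.of("b", "c")), true);
        System.out.println(abort.created + " created, " + abort.versionConflicts + " conflicts, aborted=" + abort.aborted);
        System.out.println(proceed.created + " created, " + proceed.versionConflicts + " conflicts, aborted=" + proceed.aborted);
    }
}
```

With `proceed`, the run finishes and the conflicts are only counted; without it, documents that come after the first conflict are never attempted in this model.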
- You can limit the documents by adding a type to the source or by adding a query.
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-typeOrQuery]
- --------------------------------------------------
- <1> Only copy `doc` type
- <2> Only copy documents which have field `user` set to `kimchy`
- It’s also possible to limit the number of processed documents by setting `size`.
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-size]
- --------------------------------------------------
- <1> Only copy 10 documents
- By default `_reindex` uses batches of 1000. You can change the batch size with `sourceBatchSize`.
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-sourceSize]
- --------------------------------------------------
- <1> Use batches of 100 documents
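
The relationship between the document limit and the batch size is simple arithmetic, sketched below. The class `BatchCountSketch` and the method `batches` are hypothetical, and the sketch assumes a single, unsliced run in which every batch except the last is full; the batch count reported by a real request may differ.

```java
/** Back-of-the-envelope batch count for a single unsliced run; hypothetical helper. */
final class BatchCountSketch {

    /**
     * Returns how many batches are needed to process the documents.
     * A negative maxDocs means "no limit" in this sketch.
     */
    static long batches(long matchingDocs, long maxDocs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long toProcess = maxDocs < 0 ? matchingDocs : Math.min(matchingDocs, maxDocs);
        return (toProcess + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(batches(2500, -1, 1000)); // 3 batches with the default batch size
        System.out.println(batches(2500, -1, 100));  // 25 batches with batches of 100
        System.out.println(batches(2500, 10, 1000)); // 1 batch when only 10 documents are copied
    }
}
```

Smaller batches mean more round trips for the same number of documents, which is the trade-off to keep in mind when lowering the batch size.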
- Reindex can also use the ingest feature by specifying a `pipeline`.
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-pipeline]
- --------------------------------------------------
- <1> Set the pipeline to `my_pipeline`
- If you want a particular set of documents from the source index you’ll need to use sort. If possible, prefer a more
- selective query to size and sort.
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-sort]
- --------------------------------------------------
- <1> Add a descending sort on `field1`
- <2> Add an ascending sort on `field2`
- `ReindexRequest` also supports a `script` that modifies the document, including its metadata, as the following
- example illustrates.
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-script]
- --------------------------------------------------
- <1> `setScript` to increment the `likes` field on all documents with user `kimchy`.
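
The effect of the script described in the callout above can be expressed in plain Java for readers who want to reason about it without a cluster. This is a sketch only: the class `ScriptEffectSketch` and the method `incrementLikesForKimchy` are hypothetical, the document source is modelled as a `Map`, and treating a missing `likes` field as 0 is an assumption of the sketch rather than documented script behaviour.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java equivalent of "increment likes for user kimchy"; hypothetical. */
final class ScriptEffectSketch {

    /** Mutates the document source in place when the user field equals "kimchy". */
    static void incrementLikesForKimchy(Map<String, Object> source) {
        if ("kimchy".equals(source.get("user"))) {
            // Assumption: a missing likes field counts as 0.
            int likes = ((Number) source.getOrDefault("likes", 0)).intValue();
            source.put("likes", likes + 1);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> kimchyDoc = new HashMap<>();
        kimchyDoc.put("user", "kimchy");
        kimchyDoc.put("likes", 4);

        Map<String, Object> otherDoc = new HashMap<>();
        otherDoc.put("user", "someone_else");
        otherDoc.put("likes", 4);

        incrementLikesForKimchy(kimchyDoc);
        incrementLikesForKimchy(otherDoc);
        System.out.println(kimchyDoc.get("likes")); // 5
        System.out.println(otherDoc.get("likes"));  // 4
    }
}
```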
- `ReindexRequest` supports reindexing from a remote Elasticsearch cluster. When using a remote cluster, the query should be
- specified inside the `RemoteInfo` object and not with `setSourceQuery`. Setting both the remote info and the source query
- results in a validation error during the request. The reason is that the remote Elasticsearch may not
- understand queries built by the modern query builders. Remote cluster support works all the way back to Elasticsearch
- 0.90, and the query language has changed since then. When reaching older versions, it is safer to write the query by hand
- in JSON.
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-remote]
- --------------------------------------------------
- <1> Set the remote Elasticsearch cluster
- `ReindexRequest` can also parallelize the process automatically, using `sliced-scroll` to
- slice on `_uid`. Use `setSlices` to specify the number of slices to use.
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-slices]
- --------------------------------------------------
- <1> Set the number of slices to use
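
The idea behind slicing can be sketched as partitioning documents by a hash of their identifier, so that each slice can be processed independently. The class `SliceSketch` and its methods are hypothetical, and `String.hashCode` is only a stand-in: this sketch makes no claim about the hash function Elasticsearch actually uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy partitioning of document ids into slices; hypothetical, hash function is a stand-in. */
final class SliceSketch {

    /** Maps an id to a slice number in the range [0, slices). */
    static int sliceFor(String id, int slices) {
        return Math.floorMod(id.hashCode(), slices); // floorMod keeps negative hashes in range
    }

    /** Splits the ids into the requested number of disjoint slices. */
    static List<List<String>> partition(List<String> ids, int slices) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < slices; i++) {
            result.add(new ArrayList<>());
        }
        for (String id : ids) {
            result.get(sliceFor(id, slices)).add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> slices = partition(List.of("1", "2", "3", "4", "5"), 2);
        for (int i = 0; i < slices.size(); i++) {
            System.out.println("slice " + i + ": " + slices.get(i));
        }
    }
}
```

Every id lands in exactly one slice, so the slices together cover the source without overlap, which is what makes running them in parallel safe in this model.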
- `ReindexRequest` uses the `scroll` parameter to control how long it keeps the "search context" alive.
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-scroll]
- --------------------------------------------------
- <1> Set the scroll time
- ==== Optional arguments
- In addition to the options above, the following arguments can optionally be provided:
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-timeout]
- --------------------------------------------------
- <1> Timeout to wait for the reindex request to be performed as a `TimeValue`
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-refresh]
- --------------------------------------------------
- <1> Refresh index after calling reindex
- [[java-rest-high-document-reindex-sync]]
- ==== Synchronous Execution
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-execute]
- --------------------------------------------------
- [[java-rest-high-document-reindex-async]]
- ==== Asynchronous Execution
- The asynchronous execution of a reindex request requires both the `ReindexRequest`
- instance and an `ActionListener` instance to be passed to the asynchronous
- method:
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-execute-async]
- --------------------------------------------------
- <1> The `ReindexRequest` to execute and the `ActionListener` to use when
- the execution completes
- The asynchronous method does not block and returns immediately. Once it is
- completed the `ActionListener` is called back using the `onResponse` method
- if the execution successfully completed or using the `onFailure` method if
- it failed.
- A typical listener for `BulkByScrollResponse` looks like:
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-execute-listener]
- --------------------------------------------------
- <1> Called when the execution is successfully completed. The response is
- provided as an argument and contains summary counts for the executed operations
- together with any search or bulk failures. Note that one or more operations
- might have failed while the others have been successfully executed.
- <2> Called when the whole `ReindexRequest` fails. In this case the raised
- exception is provided as an argument and no operation has been executed.
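
The callback pattern described above can be tried out without a cluster using a small stand-in. In this sketch the `Listener` interface is a hypothetical substitute that only mirrors the two callbacks named above (`onResponse` and `onFailure`), and `executeAsync` is an illustrative helper built on `CompletableFuture`, not the client's asynchronous method.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;

/** Minimal stand-in for the listener callback pattern; hypothetical, not the client API. */
final class ListenerSketch {

    /** Hypothetical interface with the same two callbacks as described above. */
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    /** Runs the task on another thread and reports the outcome through the listener. */
    static <T> CompletableFuture<Void> executeAsync(Callable<T> task, Listener<T> listener) {
        return CompletableFuture.runAsync(() -> {
            T result;
            try {
                result = task.call();
            } catch (Exception e) {
                listener.onFailure(e); // the whole task failed
                return;
            }
            listener.onResponse(result); // the task completed
        });
    }

    public static void main(String[] args) {
        Listener<String> printing = new Listener<>() {
            @Override
            public void onResponse(String response) {
                System.out.println("done: " + response);
            }

            @Override
            public void onFailure(Exception e) {
                System.out.println("failed: " + e.getMessage());
            }
        };
        // The call returns immediately; join() is used here only so the demo waits for the callbacks.
        executeAsync(() -> "42 docs", printing).join();
        executeAsync(() -> { throw new IllegalStateException("boom"); }, printing).join();
    }
}
```

Exactly one of the two callbacks fires per call, so any follow-up work that depends on the outcome belongs inside the listener rather than after the asynchronous call.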
- [[java-rest-high-document-reindex-response]]
- ==== Reindex Response
- The returned `BulkByScrollResponse` contains information about the executed operations, which can be
- retrieved as follows:
- ["source","java",subs="attributes,callouts,macros"]
- --------------------------------------------------
- include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-response]
- --------------------------------------------------
- <1> Get total time taken
- <2> Check if the request timed out
- <3> Get total number of docs processed
- <4> Number of docs that were updated
- <5> Number of docs that were created
- <6> Number of docs that were deleted
- <7> Number of batches that were executed
- <8> Number of skipped docs
- <9> Number of version conflicts
- <10> Number of times the request had to retry bulk index operations
- <11> Number of times the request had to retry search operations
- <12> The total time this request has throttled itself, not including the current throttle time if it is currently sleeping
- <13> Remaining delay of any current throttle sleep, or 0 if not sleeping
- <14> Failures during search phase
- <15> Failures during bulk index operation
|