[[java-rest-high-document-reindex]]
=== Reindex API

[[java-rest-high-document-reindex-request]]
==== Reindex Request

A `ReindexRequest` can be used to copy documents from one or more indexes into a destination index.

It requires an existing source index and a target index, which may or may not exist before the request. Reindex does not
attempt to set up the destination index, and it does not copy the settings of the source index. You should set up the
destination index before running a `_reindex` action, including its mappings, shard counts, replicas, etc.

The simplest form of a `ReindexRequest` looks like this:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request]
--------------------------------------------------
<1> Creates the `ReindexRequest`
<2> Adds a list of sources to copy from
<3> Adds the destination index

The `dest` element can be configured like the index API to control optimistic concurrency control. Leaving out
`versionType` (as above) or setting it to `internal` causes Elasticsearch to blindly dump documents into the target,
overwriting any that have the same type and ID. Setting `versionType` to `external` causes Elasticsearch to preserve the
version from the source, create any documents that are missing, and update any documents that have an older version in
the destination index than in the source index.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-versionType]
--------------------------------------------------
<1> Set the `versionType` to `EXTERNAL`
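
To make the difference between the two version types concrete, here is a toy model in plain Java. It is not the
Elasticsearch implementation or the client API: it assumes each document can be reduced to an ID mapped to a version
number, and `VersionTypeModel` and its `Mode` enum are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of how the destination reacts to internal vs external versioning.
// Illustration only: documents are reduced to ID -> version, no real indexing happens.
class VersionTypeModel {
    enum Mode { INTERNAL, EXTERNAL }

    // Copies every source document into dest and returns the number of version conflicts.
    static int reindex(Map<String, Long> source, Map<String, Long> dest, Mode mode) {
        int conflicts = 0;
        for (Map.Entry<String, Long> doc : source.entrySet()) {
            String id = doc.getKey();
            long sourceVersion = doc.getValue();
            if (mode == Mode.INTERNAL) {
                // Internal: overwrite blindly; the destination assigns its own version (1, or old + 1).
                dest.merge(id, 1L, Long::sum);
            } else if (!dest.containsKey(id) || dest.get(id) < sourceVersion) {
                // External: create missing documents and update older ones, keeping the source version.
                dest.put(id, sourceVersion);
            } else {
                // External: the destination is as new or newer, so this is a version conflict.
                conflicts++;
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        Map<String, Long> source = Map.of("1", 5L, "2", 2L, "3", 1L);
        Map<String, Long> dest = new HashMap<>(Map.of("1", 3L, "2", 7L));
        int conflicts = reindex(source, dest, Mode.EXTERNAL);
        System.out.println(dest + " conflicts=" + conflicts);
    }
}
```

With `EXTERNAL`, document `2` is left untouched because the destination already holds a newer version, and it is
counted as a conflict; with `INTERNAL`, every document would be overwritten and its destination version incremented.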

Setting `opType` to `create` causes `_reindex` to create only the documents that are missing from the target index. All
existing documents cause a version conflict. The default `opType` is `index`.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-opType]
--------------------------------------------------
<1> Set the `opType` to `create`
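
The effect of `opType` can be sketched the same way. The hypothetical `OpTypeModel` below (plain Java, not part of the
client) treats documents as ID-to-content entries and counts what a run would create, overwrite, or reject.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of opType "index" vs "create" (illustration only, not Elasticsearch code).
class OpTypeModel {
    // Returns the counters {created, updated, conflicts} for one simulated run.
    static int[] reindex(Map<String, String> source, Map<String, String> dest, String opType) {
        int created = 0, updated = 0, conflicts = 0;
        for (Map.Entry<String, String> doc : source.entrySet()) {
            boolean exists = dest.containsKey(doc.getKey());
            if (exists && "create".equals(opType)) {
                conflicts++; // "create" never overwrites: an existing ID is a version conflict
            } else {
                dest.put(doc.getKey(), doc.getValue()); // "index" creates or overwrites
                if (exists) {
                    updated++;
                } else {
                    created++;
                }
            }
        }
        return new int[] {created, updated, conflicts};
    }

    public static void main(String[] args) {
        Map<String, String> source = Map.of("1", "new-1", "2", "new-2");
        Map<String, String> dest = new HashMap<>(Map.of("1", "old-1"));
        int[] counts = reindex(source, dest, "create");
        // prints "created=1 updated=0 conflicts=1"
        System.out.println("created=" + counts[0] + " updated=" + counts[1] + " conflicts=" + counts[2]);
    }
}
```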

By default, version conflicts abort the `_reindex` process, but you can count them instead by setting `conflicts` to
`proceed`:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-conflicts]
--------------------------------------------------
<1> Set `proceed` on version conflict
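
A minimal sketch of the two conflict policies, again as a hypothetical plain-Java model rather than the real
implementation. It simplifies by stopping at the first conflicting document, whereas the real `_reindex` works in
batches, so treat the `written` count as illustrative. In the model, documents written before an abort stay written.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of conflicts=abort (the default) vs conflicts=proceed; not Elasticsearch code.
class ConflictsModel {
    static final class Outcome {
        final int written;
        final int versionConflicts;
        final boolean aborted;

        Outcome(int written, int versionConflicts, boolean aborted) {
            this.written = written;
            this.versionConflicts = versionConflicts;
            this.aborted = aborted;
        }
    }

    // existingIds plays the role of the destination under opType=create:
    // copying an ID that is already present is a version conflict.
    static Outcome run(List<String> sourceIds, Set<String> existingIds, boolean proceed) {
        int written = 0;
        int conflicts = 0;
        for (String id : sourceIds) {
            if (!existingIds.add(id)) { // add() returns false when the ID already exists
                conflicts++;
                if (!proceed) {
                    return new Outcome(written, conflicts, true); // abort: stop the whole run
                }
                continue; // proceed: count the conflict and move on
            }
            written++;
        }
        return new Outcome(written, conflicts, false);
    }

    public static void main(String[] args) {
        List<String> source = List.of("a", "b", "c");
        Outcome abort = run(source, new HashSet<>(Set.of("b")), false);
        Outcome proceed = run(source, new HashSet<>(Set.of("b")), true);
        System.out.println("abort: written=" + abort.written + " aborted=" + abort.aborted);
        System.out.println("proceed: written=" + proceed.written + " conflicts=" + proceed.versionConflicts);
    }
}
```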

You can limit the documents by adding a type to the source or by adding a query:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-typeOrQuery]
--------------------------------------------------
<1> Only copy `doc` type
<2> Only copy documents which have field `user` set to `kimchy`
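
Both restrictions apply together: a document is copied only if it has the requested type and also matches the query.
The hypothetical `SourceFilterModel` below mirrors that in plain Java. It treats the term query as an exact match on a
stored field value, which is a simplification of how Elasticsearch matches indexed terms.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model of restricting the reindex source by type and by a term query (not Elasticsearch code).
class SourceFilterModel {
    static final class Doc {
        final String id;
        final String type;
        final Map<String, Object> source;

        Doc(String id, String type, Map<String, Object> source) {
            this.id = id;
            this.type = type;
            this.source = source;
        }
    }

    // Returns the IDs that would be copied: the type must match AND the field must equal the value.
    static List<String> selectIds(List<Doc> docs, String type, String field, Object value) {
        return docs.stream()
                .filter(d -> d.type.equals(type))
                .filter(d -> value.equals(d.source.get(field)))
                .map(d -> d.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("1", "doc", Map.of("user", "kimchy")),
                new Doc("2", "doc", Map.of("user", "someone")),
                new Doc("3", "other", Map.of("user", "kimchy")));
        System.out.println(selectIds(docs, "doc", "user", "kimchy")); // prints [1]
    }
}
```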

It’s also possible to limit the number of processed documents by setting `size`:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-size]
--------------------------------------------------
<1> Only copy 10 documents

By default, `_reindex` uses batches of 1000 documents. You can change the batch size with `sourceBatchSize`:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-sourceSize]
--------------------------------------------------
<1> Use batches of 100 documents
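
Each batch is read from the source and then written to the destination, so the batch size mainly determines how many
round trips a run needs. The helper below is a hypothetical, idealized estimate (it ignores retries and any documents
excluded by `size` or a query), computed with plain ceiling division:

```java
// Idealized estimate of how many batches a reindex needs (illustration only).
class BatchEstimate {
    static final int DEFAULT_BATCH_SIZE = 1000; // default batch size stated above

    static long batches(long documents, int batchSize) {
        if (documents < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("documents must be >= 0 and batchSize > 0");
        }
        return (documents + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(batches(25_000, DEFAULT_BATCH_SIZE)); // prints 25
        System.out.println(batches(25_000, 100));                // prints 250
    }
}
```

Going from 1000 to 100 multiplies the number of round trips by ten, in exchange for smaller individual requests.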

Reindex can also use the ingest feature by specifying a `pipeline`:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-pipeline]
--------------------------------------------------
<1> Set the pipeline to `my_pipeline`

If you want a particular set of documents from the source index, you’ll need to use sort. If possible, prefer a more
selective query to size and sort.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-sort]
--------------------------------------------------
<1> Add a descending sort on `field1`
<2> Add an ascending sort on `field2`
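
Sort matters in combination with `size` because, without it, which documents make up the first N is not guaranteed.
The hypothetical `SortSizeModel` below applies the two sorts from the example (descending `field1`, then ascending
`field2` as a tie-breaker) before keeping the first `size` documents. It models integer fields only and is not the
search implementation.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Toy model of "sort, then take the first `size` documents" (not the search implementation).
class SortSizeModel {
    static final class Doc {
        final String id;
        final int field1;
        final int field2;

        Doc(String id, int field1, int field2) {
            this.id = id;
            this.field1 = field1;
            this.field2 = field2;
        }
    }

    // Descending on field1, ties broken by ascending field2, then keep the first `size` IDs.
    static List<String> pick(List<Doc> docs, int size) {
        Comparator<Doc> order = Comparator.comparingInt((Doc d) -> d.field1).reversed()
                .thenComparingInt(d -> d.field2);
        return docs.stream().sorted(order).limit(size).map(d -> d.id).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc("a", 1, 5), new Doc("b", 3, 2), new Doc("c", 3, 1), new Doc("d", 2, 9));
        System.out.println(pick(docs, 2)); // prints [c, b]
    }
}
```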

`ReindexRequest` also supports a `script` that modifies the document. It also allows you to change the document's
metadata. The following example illustrates this:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-script]
--------------------------------------------------
<1> `setScript` to increment the `likes` field on all documents with user `kimchy`.
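
The script runs once per document, against that document's `_source`. The plain-Java stand-in below applies the logic
the callout describes (increment `likes` when `user` is `kimchy`) to in-memory maps. `ScriptModel` and
`INCREMENT_LIKES` are hypothetical names; the real script runs server-side, and treating a missing `likes` as 0 is an
assumption of this sketch that may not match the real script's behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Plain-Java stand-in for a per-document reindex script (not Painless, not Elasticsearch code).
class ScriptModel {
    // Increments "likes" on documents whose "user" is "kimchy"; a missing "likes" counts as 0 (assumption).
    static final Consumer<Map<String, Object>> INCREMENT_LIKES = source -> {
        if ("kimchy".equals(source.get("user"))) {
            int likes = ((Number) source.getOrDefault("likes", 0)).intValue();
            source.put("likes", likes + 1);
        }
    };

    // Applies the script to every document's source, in place.
    static void apply(List<Map<String, Object>> sources, Consumer<Map<String, Object>> script) {
        sources.forEach(script);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(new HashMap<>(Map.of("user", "kimchy", "likes", 1)));
        docs.add(new HashMap<>(Map.of("user", "someone", "likes", 4)));
        apply(docs, INCREMENT_LIKES);
        System.out.println(docs.get(0).get("likes") + " " + docs.get(1).get("likes")); // prints "2 4"
    }
}
```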

`ReindexRequest` supports reindexing from a remote Elasticsearch cluster. When using a remote cluster, the query should
be specified inside the `RemoteInfo` object rather than with `setSourceQuery`. If both the remote info and the source
query are set, the request fails with a validation error. This is because the remote cluster may not understand queries
built by the modern query builders: remote reindexing works with clusters as old as Elasticsearch 0.90, and the query
language has changed since then. When targeting older versions, it is safer to write the query by hand in JSON.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-remote]
--------------------------------------------------
<1> Set the remote Elasticsearch cluster
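
The validation rule above can be pictured with a small hypothetical model: the remote side carries its query as a
hand-written JSON string, and also setting a local query is rejected. `RemoteSourceModel`, its fields, the host
`http://otherhost:9200`, and the error text are illustrative assumptions, not the client's `RemoteInfo` class or its
actual error message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the remote-reindex validation rule (not the Elasticsearch client).
class RemoteSourceModel {
    String remoteHost;      // null when reindexing from the local cluster
    String remoteQueryJson; // hand-written JSON query sent as-is to the remote cluster
    Object localQuery;      // stands in for a query set through setSourceQuery

    List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (remoteHost != null && localQuery != null) {
            // Illustrative wording only; the real client's message may differ.
            errors.add("remote reindex must put its query in the remote info, not in the source query");
        }
        return errors;
    }

    public static void main(String[] args) {
        RemoteSourceModel request = new RemoteSourceModel();
        request.remoteHost = "http://otherhost:9200";
        request.remoteQueryJson = "{\"term\": {\"user\": \"kimchy\"}}";
        System.out.println(request.validate().isEmpty()); // prints true
        request.localQuery = new Object();                // now both are set
        System.out.println(request.validate().size());    // prints 1
    }
}
```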

`ReindexRequest` can also parallelize automatically, using sliced scroll to slice on `_uid`. Use `setSlices` to specify
the number of slices to use.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-slices]
--------------------------------------------------
<1> Set the number of slices to use
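
Conceptually, slicing partitions the documents into disjoint groups that can be processed independently and in
parallel. The hypothetical `SliceModel` below assigns each ID to a slice with `Math.floorMod(id.hashCode(), slices)`;
Elasticsearch uses its own hashing of `_uid`, so only the idea carries over, not the exact assignment.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of slicing documents by a hash of their ID (not Elasticsearch's hashing).
class SliceModel {
    static List<List<String>> slice(List<String> ids, int slices) {
        if (slices < 1) {
            throw new IllegalArgumentException("slices must be >= 1");
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < slices; i++) {
            result.add(new ArrayList<>());
        }
        for (String id : ids) {
            // floorMod keeps the slice index non-negative even when hashCode() is negative
            result.get(Math.floorMod(id.hashCode(), slices)).add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("1", "2", "3", "4", "5", "6", "7");
        List<List<String>> slices = slice(ids, 2);
        // The slices are disjoint, so each one could be copied by a separate worker.
        slices.parallelStream().forEach(s -> System.out.println(Thread.currentThread().getName() + " " + s));
    }
}
```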

`ReindexRequest` uses the `scroll` parameter to control how long it keeps the "search context" alive.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-scroll]
--------------------------------------------------
<1> Set the scroll time

==== Optional arguments
In addition to the options above, the following arguments can also optionally be provided:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-timeout]
--------------------------------------------------
<1> Timeout to wait for the reindex request to be performed as a `TimeValue`

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-request-refresh]
--------------------------------------------------
<1> Refresh the index after calling reindex

[[java-rest-high-document-reindex-sync]]
==== Synchronous Execution

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-execute]
--------------------------------------------------

[[java-rest-high-document-reindex-async]]
==== Asynchronous Execution

The asynchronous execution of a reindex request requires both the `ReindexRequest`
instance and an `ActionListener` instance to be passed to the asynchronous
method:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-execute-async]
--------------------------------------------------
<1> The `ReindexRequest` to execute and the `ActionListener` to use when
the execution completes

The asynchronous method does not block and returns immediately. Once the
execution completes, the `ActionListener` is called back using the `onResponse`
method if it completed successfully, or using the `onFailure` method if it
failed.

A typical listener for `BulkByScrollResponse` looks like:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-execute-listener]
--------------------------------------------------
<1> Called when the execution is successfully completed. The response is
provided as an argument and contains the overall counters and any search or
bulk failures. Note that some documents might have failed while others have
been successfully copied.
<2> Called when the whole `ReindexRequest` fails. In this case the raised
exception is provided as an argument and no operation has been executed.
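
The callback contract described above (the call returns at once, and later exactly one of `onResponse` or `onFailure`
runs) can be sketched with the JDK's `CompletableFuture`. The `Listener` interface and `executeAsync` helper are
hypothetical stand-ins for `ActionListener` and the client's asynchronous method, not their real definitions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Sketch of a non-blocking call that reports back through a two-method listener.
class AsyncModel {
    // Hypothetical listener with the same shape as the one described above.
    interface Listener<T> {
        void onResponse(T response);

        void onFailure(Exception e);
    }

    // Starts the work on the pool and returns immediately; exactly one callback runs later.
    static <T> CompletableFuture<Void> executeAsync(Supplier<T> work, Listener<T> listener,
                                                    ExecutorService pool) {
        return CompletableFuture.supplyAsync(work, pool).handle((response, error) -> {
            if (error == null) {
                listener.onResponse(response);
            } else {
                // supplyAsync wraps the original exception in a CompletionException
                Throwable cause = error instanceof CompletionException && error.getCause() != null
                        ? error.getCause() : error;
                listener.onFailure(cause instanceof Exception ? (Exception) cause : new RuntimeException(cause));
            }
            return null;
        });
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CompletableFuture<Void> done = executeAsync(() -> 42L, new Listener<Long>() {
            @Override
            public void onResponse(Long created) {
                System.out.println("created " + created);
            }

            @Override
            public void onFailure(Exception e) {
                System.out.println("failed: " + e.getMessage());
            }
        }, pool);
        done.join(); // only so the demo does not exit before the callback runs
        pool.shutdown();
    }
}
```

A real caller would not `join()`; the point of the asynchronous variant is that the calling thread carries on while
the listener handles the outcome.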

[[java-rest-high-document-reindex-response]]
==== Reindex Response

The returned `BulkByScrollResponse` contains information about the executed operations, which can be retrieved as
follows:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[reindex-response]
--------------------------------------------------
<1> Get total time taken
<2> Check if the request timed out
<3> Get total number of docs processed
<4> Number of docs that were updated
<5> Number of docs that were created
<6> Number of docs that were deleted
<7> Number of batches that were executed
<8> Number of skipped docs
<9> Number of version conflicts
<10> Number of times request had to retry bulk index operations
<11> Number of times request had to retry search operations
<12> The total time this request has throttled itself, not including the current throttle time if it is currently sleeping
<13> Remaining delay of any current throttle sleep, or 0 if not sleeping
<14> Failures during search phase
<15> Failures during bulk index operation