--
:api: reindex
:request: ReindexRequest
:response: BulkByScrollResponse
--

[id="{upid}-{api}"]
=== Reindex API

[id="{upid}-{api}-request"]
==== Reindex Request

A +{request}+ can be used to copy documents from one or more indexes into a
destination index.

It requires an existing source index and a destination index, which may or may not exist before the request. Reindex does
not attempt to set up the destination index, and it does not copy the settings of the source index. You should set up the
destination index before running a `_reindex` action, including mappings, shard counts, replicas, etc.

The simplest form of a +{request}+ looks like this:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request]
--------------------------------------------------
<1> Creates the +{request}+
<2> Adds a list of sources to copy from
<3> Adds the destination index

The `dest` element can be configured like the index API to control optimistic concurrency. Leaving out `versionType`
(as above) or setting it to `internal` causes Elasticsearch to blindly dump documents into the destination index,
overwriting any existing document with the same ID. Setting `versionType` to `external` causes Elasticsearch to preserve
the version from the source, create any documents that are missing, and update any documents that have an older version
in the destination index than they do in the source index.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-versionType]
--------------------------------------------------
<1> Set the versionType to `EXTERNAL`
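
The difference between the two version types can be sketched in plain Java. This is a simplified model of the rule
described above, not client code: the class `VersionTypeSketch`, its `Mode` enum and the `shouldWrite` method are
hypothetical names introduced only for illustration.

```java
// Hypothetical, simplified model of the destination version check.
// It is NOT part of the Elasticsearch client; all names here are illustrative assumptions.
class VersionTypeSketch {
    enum Mode { INTERNAL, EXTERNAL }

    /**
     * Decides whether a source document is written to the destination.
     * destVersion is null when the destination has no document with that ID.
     */
    static boolean shouldWrite(Mode mode, Long destVersion, long sourceVersion) {
        if (mode == Mode.INTERNAL) {
            return true; // always written, overwriting any existing copy
        }
        // EXTERNAL: create missing documents, update only older copies
        return destVersion == null || destVersion < sourceVersion;
    }

    public static void main(String[] args) {
        System.out.println(shouldWrite(Mode.INTERNAL, 5L, 3L));   // true
        System.out.println(shouldWrite(Mode.EXTERNAL, null, 3L)); // true
        System.out.println(shouldWrite(Mode.EXTERNAL, 2L, 3L));   // true
        System.out.println(shouldWrite(Mode.EXTERNAL, 5L, 3L));   // false
    }
}
```

In this model, the last call corresponds to a version conflict: the destination copy is newer than the source copy, so
it is left untouched.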

Setting `opType` to `create` causes `_reindex` to create only documents that are missing from the destination index.
Every document that already exists causes a version conflict. The default `opType` is `index`.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-opType]
--------------------------------------------------
<1> Set the opType to `create`

By default, version conflicts abort the `_reindex` process, but you can just count
them instead with:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-conflicts]
--------------------------------------------------
<1> Set `proceed` on version conflict
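
How `opType` set to `create` interacts with the conflicts setting can be illustrated with a small plain-Java model. It
is a sketch under simplifying assumptions (documents are reduced to their IDs, and everything runs in memory); the
class `CreateConflictsSketch` and its `run` method are hypothetical and do not exist in the client.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of opType=create combined with the conflicts setting.
// It is NOT the client API; documents are reduced to their IDs for illustration.
class CreateConflictsSketch {
    /** Returns {created, versionConflicts}; throws on the first conflict unless proceed is true. */
    static int[] run(List<String> sourceIds, Set<String> destIds, boolean proceed) {
        Set<String> dest = new HashSet<>(destIds); // work on a copy of the destination
        int created = 0;
        int conflicts = 0;
        for (String id : sourceIds) {
            if (dest.add(id)) {
                created++; // document was missing, so it is created
            } else if (proceed) {
                conflicts++; // existing document: count the conflict and continue
            } else {
                throw new IllegalStateException("version conflict on document " + id);
            }
        }
        return new int[] { created, conflicts };
    }

    public static void main(String[] args) {
        int[] result = run(List.of("1", "2", "3"), Set.of("2"), true);
        System.out.println(result[0] + " created, " + result[1] + " conflicts"); // 2 created, 1 conflicts
    }
}
```

Calling `run` with `proceed` set to `false` on the same input throws instead of counting, which mirrors the default
abort behavior described above.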

You can limit the documents by adding a query.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-query]
--------------------------------------------------
<1> Only copy documents which have field `user` set to `kimchy`

It’s also possible to limit the number of processed documents by setting `maxDocs`.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-maxDocs]
--------------------------------------------------
<1> Only copy 10 documents

By default `_reindex` uses batches of 1000. You can change the batch size with `sourceBatchSize`.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-sourceSize]
--------------------------------------------------
<1> Use batches of 100 documents
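
As a rough back-of-the-envelope aid, the effect of `maxDocs` and `sourceBatchSize` on the amount of work can be
estimated with simple arithmetic. The sketch below assumes that every batch except the last one is full, which may not
match what a real cluster reports; the class `BatchSketch` and its methods are hypothetical helpers, not client code.

```java
// Hypothetical arithmetic helper, NOT part of the Elasticsearch client.
// Assumption: every batch except the last one is completely filled.
class BatchSketch {
    /** Documents processed when maxDocs caps the matching documents; a negative maxDocs means "no limit". */
    static long docsProcessed(long matchingDocs, long maxDocs) {
        return maxDocs < 0 ? matchingDocs : Math.min(matchingDocs, maxDocs);
    }

    /** Number of batches needed for the given number of documents (ceiling division). */
    static long batches(long docs, int batchSize) {
        return (docs + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long matching = 2500;
        System.out.println(batches(docsProcessed(matching, -1), 1000)); // 3
        System.out.println(batches(docsProcessed(matching, -1), 100));  // 25
        System.out.println(batches(docsProcessed(matching, 10), 100));  // 1
    }
}
```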

Reindex can also use the ingest feature by specifying a `pipeline`.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-pipeline]
--------------------------------------------------
<1> Set pipeline to `my_pipeline`

If you want a particular set of documents from the source index, you’ll need to use sort. If possible, prefer a more
selective query to `maxDocs` and sort.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-sort]
--------------------------------------------------
<1> Add descending sort to `field1`
<2> Add ascending sort to `field2`

+{request}+ also supports a `script` that modifies the document. The script can also
change the document's metadata, as the following example illustrates.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-script]
--------------------------------------------------
<1> `setScript` to increment the `likes` field on all documents with user `kimchy`.

+{request}+ supports reindexing from a remote Elasticsearch cluster. When using a remote cluster, the query should be
specified inside the `RemoteInfo` object and not with `setSourceQuery`. If both the remote info and the source query are
set, the request fails with a validation error. The reason is that the remote Elasticsearch may not
understand queries built by the modern query builders. The remote cluster support works all the way back to Elasticsearch
0.90, and the query language has changed since then. When reindexing from older versions, it is safer to write the query
by hand in JSON.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-remote]
--------------------------------------------------
<1> Set the remote Elasticsearch cluster

+{request}+ can also parallelize the operation automatically, using `sliced-scroll` to
slice on `_uid`. Use `setSlices` to specify the number of slices to use.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-slices]
--------------------------------------------------
<1> Set the number of slices to use
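
The idea behind slicing can be sketched as partitioning documents by a hash of their identifier, so that every document
belongs to exactly one slice and the slices can be processed in parallel. The following plain-Java model only
illustrates that idea: it uses `String.hashCode()` as a stand-in and does not reproduce the hash function that
Elasticsearch actually uses. The class `SliceSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of hash-based slicing, NOT the Elasticsearch implementation.
// Assumption: String.hashCode() stands in for the real hash function.
class SliceSketch {
    /** Maps a document ID to a slice number in the range [0, slices). */
    static int sliceFor(String docId, int slices) {
        return Math.floorMod(docId.hashCode(), slices);
    }

    /** Splits the IDs into one list per slice; each ID lands in exactly one list. */
    static List<List<String>> partition(List<String> docIds, int slices) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < slices; i++) {
            result.add(new ArrayList<>());
        }
        for (String id : docIds) {
            result.get(sliceFor(id, slices)).add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> parts = partition(List.of("doc-1", "doc-2", "doc-3", "doc-4"), 2);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("slice " + i + ": " + parts.get(i));
        }
    }
}
```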

+{request}+ uses the `scroll` parameter to control how long it keeps the
"search context" alive.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-scroll]
--------------------------------------------------
<1> Set the scroll time

==== Optional arguments

In addition to the options above, the following arguments can optionally be provided:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-timeout]
--------------------------------------------------
<1> Timeout to wait for the reindex request to be performed as a `TimeValue`

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-refresh]
--------------------------------------------------
<1> Refresh index after calling reindex

include::../execution.asciidoc[]

[id="{upid}-{api}-task-submission"]
==== Reindex task submission

It is also possible to submit a +{request}+ without waiting for its completion by using the Task API. This is
equivalent to a REST request with the `wait_for_completion` flag set to `false`.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{hlrc-tests}/ReindexIT.java[submit-reindex-task]
--------------------------------------------------
<1> A +{request}+ is constructed the same way as for the synchronous method
<2> The submit method returns a `TaskSubmissionResponse`, which contains a task identifier.
<3> The task identifier can be used to get the response from a completed task.

[id="{upid}-{api}-response"]
==== Reindex Response

The returned +{response}+ contains information about the executed operations and
allows you to iterate over each result as follows:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-response]
--------------------------------------------------
<1> Get total time taken
<2> Check if the request timed out
<3> Get total number of docs processed
<4> Number of docs that were updated
<5> Number of docs that were created
<6> Number of docs that were deleted
<7> Number of batches that were executed
<8> Number of skipped docs
<9> Number of version conflicts
<10> Number of times the request had to retry bulk index operations
<11> Number of times the request had to retry search operations
<12> The total time this request has throttled itself, not including the current throttle time if it is currently sleeping
<13> Remaining delay of any current throttle sleep, or 0 if not sleeping
<14> Failures during the search phase
<15> Failures during bulk index operations