  1. [[docs-bulk]]
  2. === Bulk API
  3. ++++
  4. <titleabbrev>Bulk</titleabbrev>
  5. ++++
  6. Performs multiple indexing or delete operations in a single API call.
  7. This reduces overhead and can greatly increase indexing speed.
[source,console]
--------------------------------------------------
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
--------------------------------------------------
  19. [[docs-bulk-api-request]]
  20. ==== {api-request-title}
  21. `POST /_bulk`
  22. `POST /<target>/_bulk`
  23. [[docs-bulk-api-desc]]
  24. ==== {api-description-title}
  25. Provides a way to perform multiple `index`, `create`, `delete`, and `update` actions in a single request.
The actions are specified in the request body using a newline-delimited JSON (NDJSON) structure:
[source,js]
--------------------------------------------------
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n
--------------------------------------------------
// NOTCONSOLE
  38. The `index` and `create` actions expect a source on the next line,
  39. and have the same semantics as the `op_type` parameter in the standard index API:
  40. `create` fails if a document with the same ID already exists in the target,
  41. `index` adds or replaces a document as necessary.
  42. NOTE: <<data-streams,Data streams>> support only the `create` action. To update
  43. or delete a document in a data stream, you must target the backing index
  44. containing the document. See <<update-delete-docs-in-a-backing-index>>.
`update` expects the partial document, upsert, or script and its options
to be specified on the next line.
  47. `delete` does not expect a source on the next line and
  48. has the same semantics as the standard delete API.
  49. [NOTE]
  50. ====
  51. The final line of data must end with a newline character `\n`.
  52. Each newline character may be preceded by a carriage return `\r`.
  53. When sending requests to the `_bulk` endpoint,
  54. the `Content-Type` header should be set to `application/x-ndjson`.
  55. ====
Because this format uses literal newline characters (`\n`) as delimiters,
make sure that the JSON actions and sources are not pretty-printed.
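The formatting rules above can be sketched in a few lines of Python. This is a minimal illustration, not an official client helper: the function name `build_bulk_body` and the shape of its input (a list of action and source pairs) are assumptions made for this example.

```python
import json


def build_bulk_body(operations):
    """Serialize (action_and_meta_data, optional_source) pairs as NDJSON.

    Hypothetical helper: pass None as the source for actions that do not
    take one, such as `delete`.
    """
    lines = []
    for action, source in operations:
        # json.dumps without `indent` keeps each object on a single line,
        # so the output is never pretty-printed.
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    # Every line, including the final one, ends with a newline character.
    return "".join(line + "\n" for line in lines)


body = build_bulk_body([
    ({"index": {"_index": "test", "_id": "1"}}, {"field1": "value1"}),
    ({"delete": {"_index": "test", "_id": "2"}}, None),
])
print(body, end="")
```

Building the body from serialized objects, rather than concatenating hand-written strings, makes it harder to introduce a stray newline inside an action or source line.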
  58. If you provide a `<target>` in the request path,
  59. it is used for any actions that don't explicitly specify an `_index` argument.
A note on the format: it is designed to make processing as fast as possible.
Because some of the actions are redirected to other shards on other nodes,
only the `action_and_meta_data` line is parsed on the receiving node.
Client libraries using this protocol should do something similar on the
client side and reduce buffering as much as possible.
  67. There is no "correct" number of actions to perform in a single bulk request.
  68. Experiment with different settings to find the optimal size for your particular workload.
  69. When using the HTTP API, make sure that the client does not send HTTP chunks,
  70. as this will slow things down.
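Because there is no single correct request size, it can help to make the batch size a tunable parameter while experimenting. The following sketch splits a stream of operations into batches bounded by an action count and an approximate byte size. The function name `chunk_operations` and both default limits are arbitrary assumptions for illustration, not recommended values.

```python
import json


def chunk_operations(operations, max_actions=500, max_bytes=5 * 1024 * 1024):
    """Yield lists of (action, source) pairs that stay within both limits.

    Hypothetical helper: the defaults are placeholders to tune per workload.
    """
    batch, batch_bytes = [], 0
    for action, source in operations:
        # Approximate the serialized size, counting one newline per line.
        size = len(json.dumps(action).encode("utf-8")) + 1
        if source is not None:
            size += len(json.dumps(source).encode("utf-8")) + 1
        # Close the current batch if this operation would exceed a limit.
        # A single oversized operation still gets a batch of its own.
        if batch and (len(batch) >= max_actions or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append((action, source))
        batch_bytes += size
    if batch:
        yield batch


operations = [
    ({"index": {"_index": "test", "_id": str(i)}}, {"field1": f"value{i}"})
    for i in range(5)
]
batch_sizes = [len(batch) for batch in chunk_operations(operations, max_actions=2)]
print(batch_sizes)
```

Each yielded batch can then be serialized and sent as one bulk request, which makes it straightforward to compare throughput across different limits.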
  71. [discrete]
  72. [[bulk-clients]]
  73. ===== Client support for bulk requests
  74. Some of the officially supported clients provide helpers to assist with
  75. bulk requests and reindexing:
  76. Go::
  77. See https://github.com/elastic/go-elasticsearch/tree/master/_examples/bulk#indexergo[esutil.BulkIndexer]
  78. Perl::
  79. See https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk[Search::Elasticsearch::Client::5_0::Bulk]
  80. and https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll[Search::Elasticsearch::Client::5_0::Scroll]
  81. Python::
  82. See https://elasticsearch-py.readthedocs.org/en/master/helpers.html[elasticsearch.helpers.*]
  83. JavaScript::
  84. See {jsclient-current}/client-helpers.html[client.helpers.*]
  85. .NET::
  86. See https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/indexing-documents.html#bulkall-observable[`BulkAllObservable`]
  87. [discrete]
  88. [[bulk-curl]]
  89. ===== Submitting bulk requests with cURL
  90. If you're providing text file input to `curl`, you *must* use the
  91. `--data-binary` flag instead of plain `-d`. The latter doesn't preserve
  92. newlines. Example:
[source,js]
--------------------------------------------------
$ cat requests
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
$ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
{"took":7, "errors": false, "items":[{"index":{"_index":"test","_id":"1","_version":1,"result":"created","forced_refresh":false}}]}
--------------------------------------------------
// NOTCONSOLE
// Not converting to console because this shows how curl works
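The same request can be prepared without cURL. The sketch below uses only the Python standard library and builds the request without sending it. The function name `make_bulk_request` is hypothetical, and the base URL assumes an unsecured node on `localhost:9200`, as in the cURL example.

```python
import urllib.request


def make_bulk_request(body, base_url="http://localhost:9200", target=None):
    """Build (but do not send) a POST request for the bulk endpoint.

    Hypothetical helper: `target` is the optional data stream, index,
    or alias from the request path.
    """
    path = f"/{target}/_bulk" if target else "/_bulk"
    return urllib.request.Request(
        base_url + path,
        # The encoded bytes are sent unchanged, so newlines are preserved.
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


body = (
    '{ "index" : { "_index" : "test", "_id" : "1" } }\n'
    '{ "field1" : "value1" }\n'
)
request = make_bulk_request(body)
print(request.get_method(), request.full_url)
# To send the request: urllib.request.urlopen(request).read()
```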
  103. [discrete]
  104. [[bulk-optimistic-concurrency-control]]
  105. ===== Optimistic Concurrency Control
  106. Each `index` and `delete` action within a bulk API call may include the
  107. `if_seq_no` and `if_primary_term` parameters in their respective action
  108. and meta data lines. The `if_seq_no` and `if_primary_term` parameters control
  109. how operations are executed, based on the last modification to existing
  110. documents. See <<optimistic-concurrency-control>> for more details.
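As an illustration, the following sketch produces the two lines of a conditional `index` action. The helper name `conditional_index` and the sequence number and primary term values are made up for this example. In practice, both values come from an earlier response for the same document.

```python
import json


def conditional_index(index, doc_id, source, seq_no, primary_term):
    """Return the NDJSON lines for an `index` action that carries
    `if_seq_no` and `if_primary_term` in its action and meta data line.

    Hypothetical helper for illustration only.
    """
    action = {
        "index": {
            "_index": index,
            "_id": doc_id,
            "if_seq_no": seq_no,
            "if_primary_term": primary_term,
        }
    }
    return json.dumps(action) + "\n" + json.dumps(source) + "\n"


lines = conditional_index("test", "1", {"field1": "value1"}, seq_no=3, primary_term=1)
print(lines, end="")
```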
  111. [discrete]
  112. [[bulk-versioning]]
  113. ===== Versioning
Each bulk item can include a version value using the
`version` field. It automatically follows the behavior of the
index or delete operation based on the `_version` mapping. It also
supports `version_type` (see <<index-versioning, versioning>>).
  118. [discrete]
  119. [[bulk-routing]]
  120. ===== Routing
  121. Each bulk item can include the routing value using the
  122. `routing` field. It automatically follows the behavior of the
  123. index / delete operation based on the `_routing` mapping.
  124. NOTE: Data streams do not support custom routing. Instead, target the
  125. appropriate backing index for the stream.
  126. [discrete]
  127. [[bulk-wait-for-active-shards]]
  128. ===== Wait For Active Shards
When making bulk calls, you can set the `wait_for_active_shards`
parameter to require a minimum number of shard copies to be active
before starting to process the bulk request. See
<<index-wait-for-active-shards,Wait for active shards>> for further details
and a usage example.
  134. [discrete]
  135. [[bulk-refresh]]
  136. ===== Refresh
  137. Control when the changes made by this request are visible to search. See
  138. <<docs-refresh,refresh>>.
  139. NOTE: Only the shards that receive the bulk request will be affected by
  140. `refresh`. Imagine a `_bulk?refresh=wait_for` request with three
  141. documents in it that happen to be routed to different shards in an index
  142. with five shards. The request will only wait for those three shards to
  143. refresh. The other two shards that make up the index do not
  144. participate in the `_bulk` request at all.
  145. [discrete]
  146. [[bulk-security]]
  147. ===== Security
  148. See <<url-access-control>>.
  149. [[docs-bulk-api-path-params]]
  150. ==== {api-path-parms-title}
  151. `<target>`::
  152. (Optional, string)
  153. Name of the data stream, index, or index alias to perform bulk actions
  154. on.
  155. [[docs-bulk-api-query-params]]
  156. ==== {api-query-parms-title}
  157. include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=pipeline]
  158. include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=refresh]
  159. include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]
  160. include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=source]
  161. include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=source_excludes]
  162. include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=source_includes]
  163. include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=timeout]
  164. include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards]
  165. [[bulk-api-request-body]]
  166. ==== {api-request-body-title}
  167. The request body contains a newline-delimited list of `create`, `delete`, `index`,
  168. and `update` actions and their associated source data.
  169. `create`::
  170. (Optional, string)
  171. Indexes the specified document if it does not already exist.
  172. The following line must contain the source data to be indexed.
  173. +
  174. --
  175. include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=bulk-index]
  176. include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=bulk-id]
  177. --
  178. `delete`::
  179. (Optional, string)
  180. Removes the specified document from the index.
  181. +
  182. --
  183. include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=bulk-index]
  184. `_id`::
  185. (Required, string)
  186. The document ID.
  187. --
  188. `index`::
  189. (Optional, string)
  190. Indexes the specified document.
  191. If the document exists, replaces the document and increments the version.
  192. The following line must contain the source data to be indexed.
  193. +
  194. --
  195. include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=bulk-index]
  196. include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=bulk-id]
  197. --
  198. `update`::
  199. (Optional, string)
  200. Performs a partial document update.
  201. The following line must contain the partial document and update options.
  202. +
  203. --
  204. include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=bulk-index]
  205. include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=bulk-id]
  206. --
  207. `doc`::
  208. (Optional, object)
  209. The partial document to index.
  210. Required for `update` operations.
  211. `<fields>`::
  212. (Optional, object)
  213. The document source to index.
  214. Required for `create` and `index` operations.
  215. [role="child_attributes"]
  216. [[bulk-api-response-body]]
  217. ==== {api-response-body-title}
  218. The bulk API's response contains the individual results of each operation in the
  219. request, returned in the order submitted. The success or failure of an
  220. individual operation does not affect other operations in the request.
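Because results are returned in the order submitted, a client can match each result to the operation that produced it by position. The sketch below assumes the operations are kept as a list of action and source pairs and that the response has already been parsed into a dictionary. The name `pair_with_results` is hypothetical.

```python
def pair_with_results(operations, response):
    """Match each submitted operation with its entry in `items` by position.

    Hypothetical helper: returns a list of (action_line, status) tuples.
    """
    items = response["items"]
    if len(items) != len(operations):
        raise ValueError("expected exactly one result per submitted operation")
    paired = []
    for (action_line, _source), item in zip(operations, items):
        # Each item is an object with a single key naming the action.
        _action, result = next(iter(item.items()))
        paired.append((action_line, result["status"]))
    return paired


operations = [
    ({"update": {"_id": "5", "_index": "index1"}}, {"doc": {"my_field": "foo"}}),
    ({"create": {"_id": "7", "_index": "index1"}}, {"my_field": "foo"}),
]
response = {
    "errors": True,
    "items": [
        {"update": {"_index": "index1", "_id": "5", "status": 404}},
        {"create": {"_index": "index1", "_id": "7", "status": 201}},
    ],
}
paired = pair_with_results(operations, response)
for action_line, status in paired:
    print(status, action_line)
```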
  221. [[bulk-partial-responses]]
  222. .Partial responses
  223. ****
  224. To ensure fast responses, the bulk API will respond with partial results if one
  225. or more shards fail. See <<shard-failures, Shard failures>> for more
  226. information.
  227. ****
  228. `took`::
  229. (integer)
  230. How long, in milliseconds, it took to process the bulk request.
  231. `errors`::
  232. (boolean)
  233. If `true`, one or more of the operations in the bulk request did not complete
  234. successfully.
  235. `items`::
  236. (array of objects)
  237. Contains the result of each operation in the bulk request, in the order they
  238. were submitted.
  239. +
  240. .Properties of `items` objects
  241. [%collapsible%open]
  242. ====
  243. <action>::
  244. (object)
  245. The parameter name is an action associated with the operation. Possible values
  246. are `create`, `delete`, `index`, and `update`.
  247. +
  248. The parameter value is an object that contains information for the associated
  249. operation.
  250. +
  251. .Properties of `<action>`
  252. [%collapsible%open]
  253. =====
  254. `_index`::
  255. (string)
  256. Name of the index associated with the operation. If the operation targeted a
  257. data stream, this is the backing index into which the document was written.
  258. `_id`::
(string)
  260. The document ID associated with the operation.
  261. `_version`::
  262. (integer)
  263. The document version associated with the operation. The document version is
  264. incremented each time the document is updated.
  265. +
  266. This parameter is only returned for successful actions.
  267. `result`::
  268. (string)
  269. Result of the operation. Successful values are `created`, `deleted`, and
  270. `updated`.
  271. +
  272. This parameter is only returned for successful operations.
  273. `_shards`::
  274. (object)
  275. Contains shard information for the operation.
  276. +
  277. This parameter is only returned for successful operations.
  278. +
  279. .Properties of `_shards`
  280. [%collapsible%open]
  281. ======
  282. `total`::
  283. (integer)
  284. Number of shards the operation attempted to execute on.
  285. `successful`::
  286. (integer)
  287. Number of shards the operation succeeded on.
  288. `failed`::
  289. (integer)
  290. Number of shards the operation attempted to execute on but failed.
  291. ======
  292. `_seq_no`::
  293. (integer)
  294. The sequence number assigned to the document for the operation.
  295. Sequence numbers are used to ensure an older version of a document
  296. doesn’t overwrite a newer version. See <<optimistic-concurrency-control-index>>.
  297. +
  298. This parameter is only returned for successful operations.
  299. `_primary_term`::
  300. (integer)
  301. The primary term assigned to the document for the operation.
  302. See <<optimistic-concurrency-control-index>>.
  303. +
  304. This parameter is only returned for successful operations.
  305. `status`::
  306. (integer)
  307. HTTP status code returned for the operation.
  308. `error`::
  309. (object)
  310. Contains additional information about the failed operation.
  311. +
  312. The parameter is only returned for failed operations.
  313. +
  314. .Properties of `error`
  315. [%collapsible%open]
  316. ======
  317. `type`::
  318. (string)
  319. Error type for the operation.
  320. `reason`::
  321. (string)
  322. Reason for the failed operation.
  323. `index_uuid`::
  324. (string)
  325. The universally unique identifier (UUID) of the index associated with the failed
  326. operation.
  327. `shard`::
  328. (string)
  329. ID of the shard associated with the failed operation.
  330. `index`::
  331. (string)
Name of the index associated with the failed operation. If the operation
targeted a data stream, this is the backing index that the operation
attempted to write the document to.
  335. ======
  336. =====
  337. ====
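The response structure described above lends itself to a short error check. The following sketch scans a parsed response for failed operations. The function name `failed_items` is an assumption, and the sample response is abbreviated for this example.

```python
def failed_items(response):
    """Return (position, action, status, error) for each failed operation.

    Hypothetical helper: `response` is the parsed JSON response body.
    """
    # `errors` is false when every operation completed successfully.
    if not response.get("errors"):
        return []
    failures = []
    for position, item in enumerate(response["items"]):
        # Each item has a single key: create, delete, index, or update.
        action, result = next(iter(item.items()))
        # The `error` object is only present for failed operations.
        if "error" in result:
            failures.append((position, action, result["status"], result["error"]))
    return failures


response = {
    "took": 486,
    "errors": True,
    "items": [
        {"update": {"_index": "index1", "_id": "5", "status": 404,
                    "error": {"type": "document_missing_exception",
                              "reason": "[5]: document missing",
                              "index_uuid": "aAsFqTI0Tc2W0LCWgPNrOA",
                              "shard": "0",
                              "index": "index1"}}},
        {"create": {"_index": "index1", "_id": "7", "_version": 1,
                    "result": "created", "status": 201}},
    ],
}
failures = failed_items(response)
for position, action, status, error in failures:
    print(position, action, status, error["type"], error["reason"])
```

Checking the top-level `errors` flag first avoids walking `items` when every operation succeeded.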
  338. [[docs-bulk-api-example]]
  339. ==== {api-examples-title}
[source,console]
--------------------------------------------------
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
--------------------------------------------------
  351. The API returns the following result:
[source,console-result]
--------------------------------------------------
{
  "took": 30,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "test",
        "_id": "1",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "status": 201,
        "_seq_no" : 0,
        "_primary_term": 1
      }
    },
    {
      "delete": {
        "_index": "test",
        "_id": "2",
        "_version": 1,
        "result": "not_found",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "status": 404,
        "_seq_no" : 1,
        "_primary_term" : 2
      }
    },
    {
      "create": {
        "_index": "test",
        "_id": "3",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "status": 201,
        "_seq_no" : 2,
        "_primary_term" : 3
      }
    },
    {
      "update": {
        "_index": "test",
        "_id": "1",
        "_version": 2,
        "result": "updated",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "status": 200,
        "_seq_no" : 3,
        "_primary_term" : 4
      }
    }
  ]
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 30/"took": $body.took/]
// TESTRESPONSE[s/"index_uuid": .../"index_uuid": $body.items.3.update.error.index_uuid/]
// TESTRESPONSE[s/"_seq_no" : 0/"_seq_no" : $body.items.0.index._seq_no/]
// TESTRESPONSE[s/"_primary_term" : 1/"_primary_term" : $body.items.0.index._primary_term/]
// TESTRESPONSE[s/"_seq_no" : 1/"_seq_no" : $body.items.1.delete._seq_no/]
// TESTRESPONSE[s/"_primary_term" : 2/"_primary_term" : $body.items.1.delete._primary_term/]
// TESTRESPONSE[s/"_seq_no" : 2/"_seq_no" : $body.items.2.create._seq_no/]
// TESTRESPONSE[s/"_primary_term" : 3/"_primary_term" : $body.items.2.create._primary_term/]
// TESTRESPONSE[s/"_seq_no" : 3/"_seq_no" : $body.items.3.update._seq_no/]
// TESTRESPONSE[s/"_primary_term" : 4/"_primary_term" : $body.items.3.update._primary_term/]
  435. [discrete]
  436. [[bulk-update]]
  437. ===== Bulk update example
When using the `update` action, you can set `retry_on_conflict` as a field in
the action itself (not in the extra payload line) to specify how many
times an update should be retried in the case of a version conflict.
The `update` action payload supports the following options: `doc`
(partial document), `upsert`, `doc_as_upsert`, `script`, `params` (for
script), `lang` (for script), and `_source`. See the update API documentation
for details on these options. Example with update actions:
[source,console]
--------------------------------------------------
POST _bulk
{ "update" : {"_id" : "1", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_index" : "index1", "retry_on_conflict" : 3} }
{ "script" : { "source": "ctx._source.counter += params.param1", "lang" : "painless", "params" : {"param1" : 1}}, "upsert" : {"counter" : 1}}
{ "update" : {"_id" : "2", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
{ "update" : {"_id" : "3", "_index" : "index1", "_source" : true} }
{ "doc" : {"field" : "value"} }
{ "update" : {"_id" : "4", "_index" : "index1"} }
{ "doc" : {"field" : "value"}, "_source": true}
--------------------------------------------------
  459. [discrete]
  460. [[bulk-failures-ex]]
  461. ===== Example with failed actions
  462. The following bulk API request includes operations that update non-existent
  463. documents.
[source,console]
----
POST /_bulk
{ "update": {"_id": "5", "_index": "index1"} }
{ "doc": {"my_field": "foo"} }
{ "update": {"_id": "6", "_index": "index1"} }
{ "doc": {"my_field": "foo"} }
{ "create": {"_id": "7", "_index": "index1"} }
{ "my_field": "foo" }
----
Because the `update` operations cannot complete successfully, the API returns a
response with an `errors` flag of `true`.
  476. The response also includes an `error` object for any failed operations. The
  477. `error` object contains additional information about the failure, such as the
  478. error type and reason.
[source,console-result]
----
{
  "took": 486,
  "errors": true,
  "items": [
    {
      "update": {
        "_index": "index1",
        "_id": "5",
        "status": 404,
        "error": {
          "type": "document_missing_exception",
          "reason": "[5]: document missing",
          "index_uuid": "aAsFqTI0Tc2W0LCWgPNrOA",
          "shard": "0",
          "index": "index1"
        }
      }
    },
    {
      "update": {
        "_index": "index1",
        "_id": "6",
        "status": 404,
        "error": {
          "type": "document_missing_exception",
          "reason": "[6]: document missing",
          "index_uuid": "aAsFqTI0Tc2W0LCWgPNrOA",
          "shard": "0",
          "index": "index1"
        }
      }
    },
    {
      "create": {
        "_index": "index1",
        "_id": "7",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}
----
// TESTRESPONSE[s/"took": 486/"took": $body.took/]
// TESTRESPONSE[s/"_seq_no": 0/"_seq_no": $body.items.2.create._seq_no/]
// TESTRESPONSE[s/"index_uuid": "aAsFqTI0Tc2W0LCWgPNrOA"/"index_uuid": $body.$_path/]
  535. To return only information about failed operations, use the
  536. <<common-options-response-filtering,`filter_path`>> query parameter with an
  537. argument of `items.*.error`.
[source,console]
----
POST /_bulk?filter_path=items.*.error
{ "update": {"_id": "5", "_index": "index1"} }
{ "doc": {"my_field": "baz"} }
{ "update": {"_id": "6", "_index": "index1"} }
{ "doc": {"my_field": "baz"} }
{ "update": {"_id": "7", "_index": "index1"} }
{ "doc": {"my_field": "baz"} }
----
// TEST[continued]
  549. The API returns the following result.
[source,console-result]
----
{
  "items": [
    {
      "update": {
        "error": {
          "type": "document_missing_exception",
          "reason": "[5]: document missing",
          "index_uuid": "aAsFqTI0Tc2W0LCWgPNrOA",
          "shard": "0",
          "index": "index1"
        }
      }
    },
    {
      "update": {
        "error": {
          "type": "document_missing_exception",
          "reason": "[6]: document missing",
          "index_uuid": "aAsFqTI0Tc2W0LCWgPNrOA",
          "shard": "0",
          "index": "index1"
        }
      }
    }
  ]
}
----
// TESTRESPONSE[s/"index_uuid": "aAsFqTI0Tc2W0LCWgPNrOA"/"index_uuid": $body.$_path/]