[[docs-index_]]
== Index API

The index API adds or updates a typed JSON document in a specific index,
making it searchable. The following example inserts the JSON document
into the "twitter" index, under a type called "tweet" with an id of 1:

[source,js]
--------------------------------------------------
PUT twitter/tweet/1
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
--------------------------------------------------
// CONSOLE

The result of the above index operation is:

[source,js]
--------------------------------------------------
{
    "_shards" : {
        "total" : 2,
        "failed" : 0,
        "successful" : 2
    },
    "_index" : "twitter",
    "_type" : "tweet",
    "_id" : "1",
    "_version" : 1,
    "created" : true
}
--------------------------------------------------
// TESTRESPONSE[s/"successful" : 2/"successful" : 1/]

The `_shards` header provides information about the replication process of the index operation.

* `total` - Indicates how many shard copies (primary and replica shards) the index operation should be executed on.
* `successful` - Indicates the number of shard copies the index operation succeeded on.
* `failures` - An array that contains replication-related errors in the case an index operation failed on a replica shard.

The index operation is successful if `successful` is at least 1.

NOTE: Replica shards may not all be started when an indexing operation successfully returns (by default, a quorum is
required). In that case, `total` will be equal to the total number of shards based on the index replica settings and
`successful` will be equal to the number of shards started (primary plus replicas). As there were no failures,
`failed` will be 0.
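
To make the `_shards` arithmetic concrete, here is a minimal Python sketch that reads the header from a response body like the one above. `summarize_shards` is a hypothetical client-side helper, not part of Elasticsearch; it assumes, as the note describes, that copies counted in `total` but neither in `successful` nor in `failed` are copies that were not started.

[source,python]
--------------------------------------------------
import json

# Response body copied from the example above.
RESPONSE = """
{
    "_shards" : {"total" : 2, "failed" : 0, "successful" : 2},
    "_index" : "twitter",
    "_type" : "tweet",
    "_id" : "1",
    "_version" : 1,
    "created" : true
}
"""

def summarize_shards(body):
    """Hypothetical helper: return (succeeded, not_started) for an index response.

    succeeded:   True when at least one shard copy accepted the write.
    not_started: copies counted in `total` that neither succeeded nor failed.
    """
    shards = json.loads(body)["_shards"]
    succeeded = shards["successful"] >= 1
    not_started = shards["total"] - shards["successful"] - shards["failed"]
    return succeeded, not_started

print(summarize_shards(RESPONSE))  # (True, 0)
--------------------------------------------------

With one replica that has not started yet, the same helper would report `(True, 1)`: the write still counts as successful.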

[float]
[[index-creation]]
=== Automatic Index Creation

The index operation automatically creates an index if it has not been
created before (check out the
<<indices-create-index,create index API>> for manually
creating an index), and also automatically creates a
dynamic type mapping for the specific type if one has not yet been
created (check out the <<indices-put-mapping,put mapping>>
API for manually creating a type mapping).

The mapping itself is very flexible and is schema-free. New fields and
objects will automatically be added to the mapping definition of the
type specified. Check out the <<mapping,mapping>>
section for more information on mapping definitions.

Automatic index creation can be disabled by setting
`action.auto_create_index` to `false` in the config files of all nodes.
Automatic mapping creation can be disabled by setting
`index.mapper.dynamic` to `false` in the config files of all nodes (or
on the specific index settings).

Automatic index creation can include a pattern based white/black list,
for example, set `action.auto_create_index` to `+aaa*,-bbb*,+ccc*,-*` (`+`
meaning allowed, and `-` meaning disallowed).
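
The white/black list can be illustrated with a small Python sketch that decides whether a given index name may be auto-created. It is a sketch under stated assumptions, not Elasticsearch's implementation: patterns are assumed to be checked in order with the first match winning, `*` is treated as the only wildcard, a pattern without a prefix counts as allowed, and a name that matches no pattern is rejected. `may_auto_create` is a hypothetical helper.

[source,python]
--------------------------------------------------
import re

def _wildcard_to_regex(pattern):
    # '*' matches any run of characters; everything else is matched literally.
    return re.compile("^" + ".*".join(re.escape(part) for part in pattern.split("*")) + "$")

def may_auto_create(index, setting):
    """Hypothetical helper: evaluate an auto_create_index-style pattern list."""
    for raw in setting.split(","):
        raw = raw.strip()
        allowed = not raw.startswith("-")
        pattern = raw[1:] if raw.startswith(("+", "-")) else raw
        if _wildcard_to_regex(pattern).match(index):
            return allowed  # assumption: the first matching pattern decides
    return False  # assumption: names matching no pattern are rejected

SETTING = "+aaa*,-bbb*,+ccc*,-*"
for name in ["aaa1", "bbb1", "ccc1", "ddd1"]:
    print(name, may_auto_create(name, SETTING))
# aaa1 True, bbb1 False, ccc1 True, ddd1 False
--------------------------------------------------

The trailing `-*` in the example setting is what turns the list into a whitelist: any name not explicitly allowed by an earlier pattern falls through to it and is rejected.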

[float]
[[index-versioning]]
=== Versioning

Each indexed document is given a version number. The associated
`version` number is returned as part of the response to the index API
request. The index API optionally allows for
http://en.wikipedia.org/wiki/Optimistic_concurrency_control[optimistic
concurrency control] when the `version` parameter is specified. This
controls which version of the document the operation is intended to be
executed against. A good example of a use case for versioning is
performing a transactional read-then-update. Specifying a `version` from
the document initially read ensures no changes have happened in the
meantime (when reading in order to update, it is recommended to set
`preference` to `_primary`). For example:

[source,js]
--------------------------------------------------
PUT twitter/tweet/1?version=2
{
    "message" : "elasticsearch now has versioning support, double cool!"
}
--------------------------------------------------
// CONSOLE
// TEST[catch: conflict]

*NOTE:* versioning is completely real time, and is not affected by the
near real time aspects of search operations. If no version is provided,
then the operation is executed without any version checks.

By default, internal versioning is used that starts at 1 and increments
with each update, deletes included. Optionally, the version number can be
supplemented with an external value (for example, if maintained in a
database). To enable this functionality, `version_type` should be set to
`external`. The value provided must be a numeric, long value greater than or equal to 0,
and less than around 9.2e+18. When using the external version type, instead
of checking for a matching version number, the system checks to see if
the version number passed to the index request is greater than the
version of the currently stored document. If true, the document will be
indexed and the new version number used. If the value provided is less
than or equal to the stored document's version number, a version
conflict will occur and the index operation will fail.

A nice side effect is that there is no need to maintain strict ordering
of async indexing operations executed as a result of changes to a source
database, as long as version numbers from the source database are used.
Even the simple case of updating the Elasticsearch index using data from
a database is simplified if external versioning is used, as only the
latest version will be used if the index operations are out of order for
whatever reason.
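
The out-of-order scenario can be simulated with a toy in-memory store. This is a hedged illustration of the `external` rule described above (index only if the supplied version is strictly greater than the stored one, or if no document exists yet), not Elasticsearch code; `ToyIndex` and `VersionConflict` are hypothetical names.

[source,python]
--------------------------------------------------
class VersionConflict(Exception):
    """Raised by the toy store when an external version is not newer."""

class ToyIndex:
    """Hypothetical in-memory store applying version_type=external checks."""

    def __init__(self):
        self.docs = {}  # id -> (version, source)

    def index_external(self, doc_id, version, source):
        if version < 0:
            raise ValueError("external versions must be non-negative")
        current = self.docs.get(doc_id)
        if current is not None and version <= current[0]:
            raise VersionConflict(f"{doc_id}: {version} <= stored {current[0]}")
        self.docs[doc_id] = (version, source)

# Changes from a source database, delivered out of order.
updates = [(3, {"message": "third"}), (1, {"message": "first"}), (2, {"message": "second"})]
index = ToyIndex()
for version, source in updates:
    try:
        index.index_external("1", version, source)
    except VersionConflict as err:
        print("ignored stale update:", err)

print(index.docs["1"])  # (3, {'message': 'third'})
--------------------------------------------------

The stale updates are rejected as conflicts, so the stored document ends up at the latest source version regardless of delivery order.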

[float]
==== Version types

In addition to the `internal` and `external` version types explained above, Elasticsearch
also supports other types for specific use cases. Here is an overview of
the different version types and their semantics.

`internal`:: only index the document if the given version is identical to the version
of the stored document.

`external` or `external_gt`:: only index the document if the given version is strictly higher
than the version of the stored document *or* if there is no existing document. The given
version will be used as the new version and will be stored with the new document. The supplied
version must be a non-negative long number.

`external_gte`:: only index the document if the given version is *equal to* or higher
than the version of the stored document. If there is no existing document,
the operation will succeed as well. The given version will be used as the new version
and will be stored with the new document. The supplied version must be a non-negative long number.

`force`:: the document will be indexed regardless of the version of the stored document, or even if there
is no existing document. The given version will be used as the new version and will be stored
with the new document. This version type is typically used for correcting errors.

*NOTE*: The `external_gte` & `force` version types are meant for special use cases and should be used
with care. If used incorrectly, they can result in loss of data.
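
The table above can be condensed into a single decision function. This Python sketch only mirrors the documented rules; `should_index` is a hypothetical helper, and `stored=None` stands for "no existing document". It assumes, in line with the `internal` description, that an internal version check cannot succeed when no document is stored.

[source,python]
--------------------------------------------------
def should_index(version_type, given, stored):
    """Hypothetical helper: would an index request with this version proceed?

    given:  version supplied with the request.
    stored: version of the existing document, or None if there is none.
    """
    if version_type in ("external", "external_gt", "external_gte") and given < 0:
        raise ValueError("external versions must be non-negative")
    if version_type == "internal":
        return stored is not None and given == stored
    if version_type in ("external", "external_gt"):
        return stored is None or given > stored
    if version_type == "external_gte":
        return stored is None or given >= stored
    if version_type == "force":
        return True
    raise ValueError(f"unknown version type: {version_type}")

for vt in ["internal", "external_gt", "external_gte", "force"]:
    print(vt, should_index(vt, 5, 5))
--------------------------------------------------

Calling it with equal versions highlights the practical difference: `internal`, `external_gte` and `force` accept the write, while `external`/`external_gt` reject it.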

[float]
[[operation-type]]
=== Operation Type

The index operation also accepts an `op_type` that can be used to force
a `create` operation, allowing for "put-if-absent" behavior. When
`create` is used, the index operation will fail if a document with that id
already exists in the index.

Here is an example of using the `op_type` parameter:

[source,js]
--------------------------------------------------
PUT twitter/tweet/1?op_type=create
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
--------------------------------------------------
// CONSOLE

Another option to specify `create` is to use the following URI:

[source,js]
--------------------------------------------------
PUT twitter/tweet/1/_create
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
--------------------------------------------------
// CONSOLE
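
The put-if-absent semantics can be contrasted with the default behavior in a few lines of Python. This is a toy model over a plain dictionary, not Elasticsearch code; `index_doc` and `DocumentAlreadyExists` are hypothetical names, and the boolean return value is meant to resemble the `created` flag in the index response.

[source,python]
--------------------------------------------------
class DocumentAlreadyExists(Exception):
    """Raised by the toy store when op_type=create finds an existing id."""

def index_doc(store, doc_id, source, op_type="index"):
    """Hypothetical helper contrasting op_type=index and op_type=create.

    Returns True when a new document was created and False when an
    existing one was replaced.
    """
    if op_type not in ("index", "create"):
        raise ValueError(f"unknown op_type: {op_type}")
    exists = doc_id in store
    if op_type == "create" and exists:
        raise DocumentAlreadyExists(doc_id)
    store[doc_id] = source
    return not exists

store = {}
print(index_doc(store, "1", {"user": "kimchy"}, op_type="create"))  # True
print(index_doc(store, "1", {"user": "kimchy", "v": 2}))            # False (replaced)
try:
    index_doc(store, "1", {"user": "someone else"}, op_type="create")
except DocumentAlreadyExists as err:
    print("create rejected for id", err)
--------------------------------------------------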

[float]
=== Automatic ID Generation

The index operation can be executed without specifying the id. In such a
case, an id will be generated automatically. In addition, the `op_type`
will automatically be set to `create`. Here is an example (note the
*POST* used instead of *PUT*):

[source,js]
--------------------------------------------------
POST twitter/tweet/
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
--------------------------------------------------
// CONSOLE

The result of the above index operation is:

[source,js]
--------------------------------------------------
{
    "_shards" : {
        "total" : 2,
        "failed" : 0,
        "successful" : 2
    },
    "_index" : "twitter",
    "_type" : "tweet",
    "_id" : "6a8ca01c-7896-48e9-81cc-9f70661fcb32",
    "_version" : 1,
    "created" : true
}
--------------------------------------------------
// TESTRESPONSE[s/6a8ca01c-7896-48e9-81cc-9f70661fcb32/$body._id/ s/"successful" : 2/"successful" : 1/]

[float]
[[index-routing]]
=== Routing

By default, shard placement (or `routing`) is controlled by using a
hash of the document's id value. For more explicit control, the value
fed into the hash function used by the router can be directly specified
on a per-operation basis using the `routing` parameter. For example:

[source,js]
--------------------------------------------------
POST twitter/tweet?routing=kimchy
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
--------------------------------------------------
// CONSOLE

In the example above, the "tweet" document is routed to a shard based on
the `routing` parameter provided: "kimchy".

When setting up explicit mapping, the `_routing` field can be optionally
used to direct the index operation to extract the routing value from the
document itself. This does come at the (very minimal) cost of an
additional document parsing pass. If the `_routing` mapping is defined
and set to be `required`, the index operation will fail if no routing
value is provided or extracted.
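
Conceptually, the routing value (the document id by default) is hashed and reduced modulo the number of primary shards. The Python sketch below shows only that idea: Elasticsearch uses its own hash function (Murmur3 in recent versions), while this sketch substitutes `zlib.crc32` purely for illustration, so the shard numbers it produces will not match a real cluster. `pick_shard` is a hypothetical helper.

[source,python]
--------------------------------------------------
import zlib

def pick_shard(doc_id, num_primary_shards, routing=None):
    """Hypothetical helper: choose a primary shard for a document.

    The routing value defaults to the document id. crc32 is a stand-in
    for Elasticsearch's real hash function, so results differ from a cluster.
    """
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_primary_shards

# Documents sharing a routing value land on the same shard, whatever their ids.
shards = {pick_shard(doc_id, 5, routing="kimchy") for doc_id in ["1", "2", "3"]}
print(shards)  # a set containing a single shard number
--------------------------------------------------

Because the result depends on the number of primary shards, the same routing value would map to a different shard in an index with a different primary shard count.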

[float]
[[parent-children]]
=== Parents & Children

A child document can be indexed by specifying its parent when indexing.
For example:

[source,js]
--------------------------------------------------
PUT blogs
{
  "mappings": {
    "tag_parent": {},
    "blog_tag": {
      "_parent": {
        "type": "tag_parent"
      }
    }
  }
}

PUT blogs/blog_tag/1122?parent=1111
{
    "tag" : "something"
}
--------------------------------------------------
// CONSOLE

When indexing a child document, the routing value is automatically set
to be the same as its parent, unless the routing value is explicitly
specified using the `routing` parameter.
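
The precedence between an explicit `routing` parameter, a parent id and the document's own id can be summarized in a small Python sketch. `effective_routing` is a hypothetical helper written from the rule stated above, not an Elasticsearch function.

[source,python]
--------------------------------------------------
def effective_routing(doc_id, parent=None, routing=None):
    """Hypothetical helper: the value fed to the shard hash for a document."""
    if routing is not None:
        return routing  # an explicit routing parameter always wins
    if parent is not None:
        return parent   # child documents follow their parent by default
    return doc_id       # otherwise the document's own id is used

print(effective_routing("1122", parent="1111"))                   # 1111
print(effective_routing("1122", parent="1111", routing="other"))  # other
print(effective_routing("1"))                                     # 1
--------------------------------------------------

Routing the child by its parent id is what places the child on the same shard as the parent.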

[float]
[[index-timestamp]]
=== Timestamp

deprecated[2.0.0,The `_timestamp` field is deprecated. Instead, use a normal <<date,`date`>> field and set its value explicitly]

A document can be indexed with a `timestamp` associated with it. The
`timestamp` value of a document can be set using the `timestamp`
parameter. For example:

[source,js]
--------------------------------------------------
PUT twitter/tweet/1?timestamp=2009-11-15T14:12:12
{
    "user" : "kimchy",
    "message" : "trying out Elasticsearch"
}
--------------------------------------------------
// CONSOLE

If the `timestamp` value is not provided externally or in the `_source`,
the `timestamp` will be automatically set to the date the document was
processed by the indexing chain. More information can be found on the
<<mapping-timestamp-field,_timestamp mapping page>>.

[float]
[[index-ttl]]
=== TTL

deprecated[2.0.0,The current `_ttl` implementation is deprecated and will be replaced with a different implementation in a future version]

A document can be indexed with a `ttl` (time to live) associated with
it. Expired documents will be expunged automatically. The expiration
date that will be set for a document with a provided `ttl` is relative
to the `timestamp` of the document, meaning it can be based on the time
of indexing or on any time provided. The provided `ttl` must be strictly
positive and can be a number (in milliseconds) or any valid time value
as shown in the following examples:

[source,js]
--------------------------------------------------
PUT twitter/tweet/1?ttl=86400000ms
{
    "user": "kimchy",
    "message": "Trying out elasticsearch, so far so good?"
}
--------------------------------------------------
// CONSOLE

[source,js]
--------------------------------------------------
PUT twitter/tweet/1?ttl=1d
{
    "user": "kimchy",
    "message": "Trying out elasticsearch, so far so good?"
}
--------------------------------------------------
// CONSOLE

More information can be found on the
<<mapping-ttl-field,_ttl mapping page>>.
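
The expiration rule can be sketched in Python: parse the `ttl`, then add it to the document's timestamp. The sketch accepts a bare number of milliseconds or a number followed by one of a small, assumed subset of time units (`ms`, `s`, `m`, `h`, `d`); Elasticsearch's own time-value parser accepts more formats. `ttl_to_ms` and `expiry` are hypothetical helpers.

[source,python]
--------------------------------------------------
from datetime import datetime, timedelta, timezone

# Assumed subset of time units, in milliseconds; a bare number means milliseconds.
_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}

def ttl_to_ms(value):
    """Hypothetical helper: convert a ttl such as 86400000, "86400000ms" or "1d" to ms."""
    text = str(value).strip()
    if text.isdigit():
        ms = int(text)
    else:
        # Check longer suffixes first so "ms" is not mistaken for "s" or "m".
        for unit in sorted(_UNIT_MS, key=len, reverse=True):
            number = text[: -len(unit)]
            if text.endswith(unit) and number.isdigit():
                ms = int(number) * _UNIT_MS[unit]
                break
        else:
            raise ValueError(f"unsupported ttl: {value!r}")
    if ms <= 0:
        raise ValueError("ttl must be strictly positive")
    return ms

def expiry(timestamp, ttl):
    """Hypothetical helper: expiration date relative to the document timestamp."""
    return timestamp + timedelta(milliseconds=ttl_to_ms(ttl))

ts = datetime(2009, 11, 15, 14, 12, 12, tzinfo=timezone.utc)
print(ttl_to_ms("86400000ms") == ttl_to_ms("1d"))  # True
print(expiry(ts, "1d"))  # 2009-11-16 14:12:12+00:00
--------------------------------------------------

Both example requests above therefore set the same expiration: one day after the document's `timestamp`.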

[float]
[[index-distributed]]
=== Distributed

The index operation is directed to the primary shard based on its route
(see the Routing section above) and performed on the actual node
containing this shard. After the primary shard completes the operation,
if needed, the update is distributed to applicable replicas.

[float]
[[index-consistency]]
=== Write Consistency

To prevent writes from taking place on the "wrong" side of a network
partition, by default, index operations only succeed if a quorum
(>replicas/2+1) of active shards are available. This default can be
overridden on a node-by-node basis using the `action.write_consistency`
setting. To alter this behavior per operation, the `consistency` request
parameter can be used.

Valid write consistency values are `one`, `quorum`, and `all`.

Note that when the number of replicas is 1 (a total of 2 copies
of the data), the default behavior is to succeed if 1 copy (the primary)
can perform the write.

The index operation only returns after all *active* shards within the
replication group have indexed the document (sync replication).
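
The following Python sketch tabulates how many active shard copies each consistency level would require before a write is attempted. It is an interpretation, not Elasticsearch code: it assumes the quorum is counted over all shard copies (primary plus replicas) as `copies/2 + 1` with integer division, and it applies the one-replica special case from the note above. `required_active_copies` is a hypothetical helper.

[source,python]
--------------------------------------------------
def required_active_copies(consistency, number_of_replicas):
    """Hypothetical helper: active copies needed for a given consistency level."""
    copies = 1 + number_of_replicas  # primary plus replicas
    if consistency == "one":
        return 1
    if consistency == "all":
        return copies
    if consistency == "quorum":
        # Assumption: with 1 or 2 copies in total, the primary alone is enough.
        return copies // 2 + 1 if copies > 2 else 1
    raise ValueError(f"unknown consistency level: {consistency}")

for replicas in range(4):
    print(replicas, required_active_copies("quorum", replicas))
# 0 1 / 1 1 / 2 2 / 3 3
--------------------------------------------------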

[float]
[[index-refresh]]
=== Refresh

To refresh the shard (not the whole index) immediately after the operation
occurs, so that the document appears in search results immediately, the
`refresh` parameter can be set to `true`. Setting this option to `true` should
*ONLY* be done after careful thought and verification that it does not lead to
poor performance, both from an indexing and a search standpoint. Note that getting
a document using the get API is completely real time and does not require a
refresh.

[float]
[[index-noop]]
=== Noop Updates

When updating a document using the index API, a new version of the document is
always created even if the document hasn't changed. If this isn't acceptable,
use the `_update` API with `detect_noop` set to `true`. This option isn't
available on the index API because the index API doesn't fetch the old source
and isn't able to compare it against the new source.

There isn't a hard and fast rule about when noop updates aren't acceptable.
It's a combination of lots of factors like how frequently your data source
sends updates that are actually noops and how many queries per second
Elasticsearch runs on the shard receiving the updates.
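
Why `detect_noop` needs the old source can be seen in a conceptual Python sketch: the update path merges the partial document into the existing source and only bumps the version if the result differs. This is not the actual implementation; it assumes partial documents are merged recursively, and `merge`, `apply_update` and the `"noop"`/`"updated"` labels are hypothetical.

[source,python]
--------------------------------------------------
def merge(old, partial):
    """Hypothetical helper: recursively merge a partial document into a source."""
    merged = dict(old)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def apply_update(old_source, old_version, partial, detect_noop=True):
    """Hypothetical helper: return (source, version, result) after an update."""
    new_source = merge(old_source, partial)
    if detect_noop and new_source == old_source:
        return old_source, old_version, "noop"  # nothing changed: keep the version
    return new_source, old_version + 1, "updated"

doc = {"user": "kimchy", "message": "trying out Elasticsearch"}
print(apply_update(doc, 1, {"user": "kimchy"})[1:])      # (1, 'noop')
print(apply_update(doc, 1, {"message": "updated"})[1:])  # (2, 'updated')
--------------------------------------------------

The index API, by contrast, only sees the new source, so it has nothing to compare against and always produces a new version.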

[float]
[[timeout]]
=== Timeout

The primary shard assigned to perform the index operation might not be
available when the index operation is executed. Some reasons for this
might be that the primary shard is currently recovering from a gateway
or undergoing relocation. By default, the index operation will wait on
the primary shard to become available for up to 1 minute before failing
and responding with an error. The `timeout` parameter can be used to
explicitly specify how long it waits. Here is an example of setting it
to 5 minutes:

[source,js]
--------------------------------------------------
PUT twitter/tweet/1?timeout=5m
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
--------------------------------------------------
// CONSOLE