=== Mapping changes

A number of changes have been made to mappings to remove ambiguity and to
ensure that conflicting mappings cannot be created.

One major change is that dynamically added fields must have their mapping
confirmed by the master node before indexing continues. This is to avoid a
problem where different shards in the same index dynamically add different
mappings for the same field. These conflicting mappings can silently return
incorrect results and can lead to index corruption.

This change can make indexing slower when frequently adding many new fields.
We are looking at ways of optimising this process but we chose safety over
performance for this extreme use case.

==== Conflicting field mappings

Fields with the same name, in the same index, in different types, must have
the same mapping, with the exception of the <<copy-to>>, <<dynamic>>,
<<enabled>>, <<ignore-above>>, <<include-in-all>>, and <<properties>>
parameters, which may have different settings per field.

[source,js]
---------------
PUT my_index
{
  "mappings": {
    "type_one": {
      "properties": {
        "name": { <1>
          "type": "string"
        }
      }
    },
    "type_two": {
      "properties": {
        "name": { <1>
          "type": "string",
          "analyzer": "english"
        }
      }
    }
  }
}
---------------
<1> The two `name` fields have conflicting mappings and will prevent Elasticsearch
    from starting.

Elasticsearch will not start in the presence of conflicting field mappings.
These indices must be deleted or reindexed using a new mapping.

The `ignore_conflicts` option of the put mappings API has been removed.
Conflicts can't be ignored anymore.
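
By contrast, the following sketch shows two types whose `name` fields differ
only in `include_in_all`, one of the parameters listed above as allowed to
vary. The index, type, and field names are illustrative only:

[source,js]
---------------
PUT my_index
{
  "mappings": {
    "type_one": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "english",
          "include_in_all": true
        }
      }
    },
    "type_two": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "english",
          "include_in_all": false
        }
      }
    }
  }
}
---------------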

==== Fields cannot be referenced by short name

A field can no longer be referenced using its short name. Instead, the full
path to the field is required. For instance:

[source,js]
---------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": { "type": "string" }, <1>
        "name": {
          "properties": {
            "title": { "type": "string" }, <2>
            "first": { "type": "string" },
            "last":  { "type": "string" }
          }
        }
      }
    }
  }
}
---------------
<1> This field is referred to as `title`.
<2> This field is referred to as `name.title`.

Previously, the two `title` fields in the example above could have been
confused with each other when using the short name `title`.
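
As a minimal sketch, a query against the nested `title` field from the mapping
above would use the full path. The search text is a made-up value:

[source,js]
---------------
GET my_index/_search
{
  "query": {
    "match": {
      "name.title": "professor"
    }
  }
}
---------------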

==== Type name prefix removed

Previously, two fields with the same name in two different types could
sometimes be disambiguated by prepending the type name. As a side effect, it
would add a filter on the type name to the relevant query. This feature was
ambiguous (a type name could be confused with a field name) and didn't
work everywhere, e.g. in aggregations.

Instead, fields should be specified with the full path, but without a type
name prefix. If you wish to filter by the `_type` field, either specify the
type in the URL or add an explicit filter.

The following example query in 1.x:

[source,js]
----------------------------
GET my_index/_search
{
  "query": {
    "match": {
      "my_type.some_field": "quick brown fox"
    }
  }
}
----------------------------

would be rewritten in 2.0 as:

[source,js]
----------------------------
GET my_index/my_type/_search <1>
{
  "query": {
    "match": {
      "some_field": "quick brown fox" <2>
    }
  }
}
----------------------------
<1> The type name can be specified in the URL to act as a filter.
<2> The field name should be specified without the type prefix.
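
The alternative mentioned above, an explicit filter on `_type`, could look
roughly like this. This is a sketch that assumes the `bool` query's `filter`
clause is used to hold the type restriction:

[source,js]
----------------------------
GET my_index/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "some_field": "quick brown fox"
        }
      },
      "filter": {
        "term": {
          "_type": "my_type"
        }
      }
    }
  }
}
----------------------------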

==== Field names may not contain dots

In 1.x, it was possible to create fields with dots in their name, for
instance:

[source,js]
----------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "foo.bar": { <1>
          "type": "string"
        },
        "foo": {
          "properties": {
            "bar": { <1>
              "type": "string"
            }
          }
        }
      }
    }
  }
}
----------------------------
<1> These two fields cannot be distinguished as both are referred to as `foo.bar`.

You can no longer create fields with dots in the name.
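
One way to keep the `foo.bar` path without a dotted field name is to send the
value as a nested object, which corresponds to the second form in the mapping
above. The document ID and value here are placeholders:

[source,js]
----------------------------
PUT my_index/my_type/1
{
  "foo": {
    "bar": "some value"
  }
}
----------------------------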

==== Type names may not start with a dot

In 1.x, Elasticsearch would issue a warning if a type name included a dot,
e.g. `my.type`. Now that type names are no longer used to distinguish between
fields in different types, this warning has been relaxed: type names may now
contain dots, but they may not *begin* with a dot. The only exception to this
is the special `.percolator` type.

==== Types may no longer be deleted

In 1.x it was possible to delete a type mapping, along with all of the
documents of that type, using the delete mapping API. This is no longer
supported, because remnants of the fields in the type could remain in the
index, causing corruption later on.

Instead, if you need to delete a type mapping, you should reindex to a new
index which does not contain the mapping. If you just need to delete the
documents that belong to that type, then use the delete-by-query plugin
instead.
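
A request to remove every document of one type might look like the following
sketch. It assumes the delete-by-query plugin is installed and exposes a
`_query` endpoint on the type, so check the plugin's own documentation before
relying on it:

[source,js]
----------------------------
DELETE my_index/my_type/_query
{
  "query": {
    "match_all": {}
  }
}
----------------------------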

[[migration-meta-fields]]
==== Type meta-fields

Several of the <<mapping-fields,meta-fields>> associated with types have had
configuration options removed, to make them more reliable:

* `_id` configuration can no longer be changed. If you need to sort, use the <<mapping-uid-field,`_uid`>> field instead.
* `_type` configuration can no longer be changed.
* `_index` configuration can no longer be changed.
* `_routing` configuration is limited to marking routing as required.
* `_field_names` configuration is limited to disabling the field.
* `_size` configuration is limited to enabling the field.
* `_timestamp` configuration is limited to enabling the field, setting format and default value.
* `_boost` has been removed.
* `_analyzer` has been removed.

Importantly, *meta-fields can no longer be specified as part of the document
body.* Instead, they must be specified in the query string parameters. For
instance, in 1.x, the `routing` could be specified as follows:

[source,json]
-----------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "_routing": {
        "path": "group" <1>
      },
      "properties": {
        "group": { <1>
          "type": "string"
        }
      }
    }
  }
}

PUT my_index/my_type/1 <2>
{
  "group": "foo"
}
-----------------------------
<1> This 1.x mapping tells Elasticsearch to extract the `routing` value from the `group` field in the document body.
<2> This indexing request uses a `routing` value of `foo`.

In 2.0, the routing must be specified explicitly:

[source,json]
-----------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "_routing": {
        "required": true <1>
      },
      "properties": {
        "group": {
          "type": "string"
        }
      }
    }
  }
}

PUT my_index/my_type/1?routing=bar <2>
{
  "group": "foo"
}
-----------------------------
<1> Routing can be marked as required to ensure it is not forgotten during indexing.
<2> This indexing request uses a `routing` value of `bar`.

==== Analyzer mappings

Previously, `index_analyzer` and `search_analyzer` could be set separately,
while the `analyzer` setting would set both. The `index_analyzer` setting has
been removed in favour of just using the `analyzer` setting.

If just the `analyzer` is set, it will be used at index time and at search
time. To use a different analyzer at search time, specify both the `analyzer`
and a `search_analyzer`.

The `index_analyzer`, `search_analyzer`, and `analyzer` type-level settings
have also been removed, as it is no longer possible to select fields based on
the type name.

The `_analyzer` meta-field, which allowed setting an analyzer per document, has
also been removed. It will be ignored on older indices.
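
As an illustration, a field that is analyzed with `english` at index time but
with `standard` at search time could be mapped as follows. The field name and
the choice of analyzers are examples only:

[source,js]
---------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "english",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
---------------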

==== Date fields and Unix timestamps

Previously, `date` fields would first try to parse values as a Unix timestamp
(milliseconds since the epoch) before trying to use their defined date
`format`. This meant that formats like `yyyyMMdd` could never work, as values
would be interpreted as timestamps.

In 2.0, we have added two formats: `epoch_millis` and `epoch_second`. Only
date fields that use these formats will be able to parse timestamps.

These formats cannot be used in dynamic templates, because they are
indistinguishable from long values.
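
The sketch below maps one field that accepts millisecond timestamps and another
that accepts `yyyyMMdd` values, then indexes a document using both. The field
names are hypothetical, and the two values denote the same day
(2015-01-01 UTC):

[source,js]
---------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "created": {
          "type": "date",
          "format": "epoch_millis"
        },
        "day": {
          "type": "date",
          "format": "yyyyMMdd"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "created": 1420070400000,
  "day": "20150101"
}
---------------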

==== Default date format

The default date format has changed from `date_optional_time` to
`strict_date_optional_time`, which expects a 4 digit year and a 2 digit month
and day (and, optionally, a 2 digit hour, minute, and second).

A dynamically added date field, by default, includes the `epoch_millis`
format to support timestamp parsing. For instance:

[source,js]
-------------------------
PUT my_index/my_type/1
{
  "date_one": "2015-01-01" <1>
}
-------------------------
<1> Has `format`: `"strict_date_optional_time||epoch_millis"`.
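
If existing data depends on the previous, less strict parsing, one option is to
set the format explicitly instead of relying on the default. This is only a
sketch, assuming that the old `date_optional_time` format is still wanted for
the `date_one` field:

[source,js]
-------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date_one": {
          "type": "date",
          "format": "date_optional_time||epoch_millis"
        }
      }
    }
  }
}
-------------------------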

[[migration-bool-fields]]
==== Boolean fields

Boolean fields used to have a string fielddata with `F` meaning `false` and `T`
meaning `true`. They have been refactored to use numeric fielddata, with `0`
for `false` and `1` for `true`. As a consequence, the format of the responses of
the following APIs changed when applied to boolean fields: `0`/`1` is returned
instead of `F`/`T`:

* <<search-request-fielddata-fields,fielddata fields>>
* <<search-request-sort,sort values>>
* <<search-aggregations-bucket-terms-aggregation,terms aggregations>>

In addition, terms aggregations use a custom formatter for boolean (like for
dates and ip addresses, which are also backed by numbers) in order to return
the user-friendly representation of boolean fields: `false`/`true`:

[source,js]
---------------
"buckets": [
  {
    "key": 0,
    "key_as_string": "false",
    "doc_count": 42
  },
  {
    "key": 1,
    "key_as_string": "true",
    "doc_count": 12
  }
]
---------------
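
Buckets of this shape would come from a terms aggregation on a boolean field,
for example a request along these lines. The aggregation name and the
`is_published` field are hypothetical:

[source,js]
---------------
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "published_counts": {
      "terms": {
        "field": "is_published"
      }
    }
  }
}
---------------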

==== `index_name` and `path` removed

The `index_name` setting was used to change the name of the Lucene field,
and the `path` setting was used on `object` fields to determine whether the
Lucene field should use the full path (including parent object fields), or
just the final `name`.

These settings have been removed as their purpose is better served with the
<<copy-to>> parameter.
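
A minimal sketch of the `copy_to` approach: two fields copy their values into a
third field, which can then be queried under a single name. All field names
here are examples:

[source,js]
---------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "first_name": {
          "type": "string",
          "copy_to": "full_name"
        },
        "last_name": {
          "type": "string",
          "copy_to": "full_name"
        },
        "full_name": {
          "type": "string"
        }
      }
    }
  }
}
---------------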

==== Murmur3 Fields

Fields of type `murmur3` can no longer change their `doc_values` or `index`
settings. They are always mapped as follows:

[source,js]
---------------------
{
  "type": "murmur3",
  "index": "no",
  "doc_values": true
}
---------------------

==== Mappings in config files not supported

The ability to specify mappings in configuration files has been removed. To
specify default mappings that apply to multiple indexes, use
<<indices-templates,index templates>> instead.

Along with this change, the following settings have been removed:

* `index.mapper.default_mapping_location`
* `index.mapper.default_percolator_mapping_location`
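
For example, a default mapping for all indices matching a name pattern could be
registered as an index template roughly as follows. The template name, the
`logs-*` pattern, and the `message` field are placeholders:

[source,js]
---------------
PUT _template/my_template
{
  "template": "logs-*",
  "mappings": {
    "_default_": {
      "properties": {
        "message": {
          "type": "string"
        }
      }
    }
  }
}
---------------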

==== Posting and doc-values codecs

It is no longer possible to specify per-field postings and doc values formats
in the mappings. This setting will be ignored on indices created before 2.0
and will cause mapping parsing to fail on indices created on or after 2.0. For
old indices, this means that new segments will be written with the default
postings and doc values formats of the current codec.

It is still possible to change the whole codec by using the `index.codec`
setting. Please however note that using a non-default codec is discouraged as
it could prevent future versions of Elasticsearch from being able to read the
index.

==== Compress and compress threshold

The `compress` and `compress_threshold` options have been removed from the
`_source` field and fields of type `binary`. These fields are compressed by
default. If you would like to increase compression levels, use the new
<<index-codec,`index.codec: best_compression`>> setting instead.
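
As a sketch, the setting could be supplied when the index is created. The index
name is a placeholder:

[source,js]
---------------
PUT my_index
{
  "settings": {
    "index.codec": "best_compression"
  }
}
---------------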

==== position_offset_gap

The default `position_offset_gap` is now 100. Indexes created in Elasticsearch
2.0.0 will default to using 100 and indexes created before that will continue
to use the old default of 0. This was done to prevent phrase queries from
matching across different values of the same term unexpectedly. Specifically,
100 was chosen to cause phrase queries with slops up to 99 to match only within
a single value of a field.
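
If a new index needs the old behaviour for a particular field, the gap can
presumably still be set explicitly in the mapping. The sketch below uses the
parameter name given in this section, with a hypothetical `tags` field:

[source,js]
---------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "tags": {
          "type": "string",
          "position_offset_gap": 0
        }
      }
    }
  }
}
---------------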