[[ignore-above]]
=== `ignore_above`

Strings longer than the `ignore_above` setting will not be indexed or stored.
For arrays of strings, `ignore_above` is applied to each element separately, and elements longer than `ignore_above` will not be indexed or stored.

NOTE: All strings and array elements will still be present in the `_source` field, provided `_source` is enabled, which is the default in Elasticsearch.
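
The per-element rule can be illustrated with a small Python model. This is only a sketch of the behaviour described above, not Elasticsearch's implementation: the helper `indexable_values` is a hypothetical name, document 3 is an invented example of an array value, and Python's `len` is assumed to be a close enough stand-in for the character count.

```python
def indexable_values(value, ignore_above):
    """Toy model: return the strings that would still be indexed.

    A single string is treated as a one-element array, so the same
    length check is applied to each element separately.
    """
    values = value if isinstance(value, list) else [value]
    return [v for v in values if len(v) <= ignore_above]


# Stand-ins for the _source of three documents (document 3 is hypothetical).
docs = {
    1: {"message": "Syntax error"},
    2: {"message": "Syntax error with some long stacktrace"},
    3: {"message": ["short", "a considerably longer array element"]},
}

for doc_id, source in docs.items():
    kept = indexable_values(source["message"], ignore_above=20)
    # The full value stays in `source`; only `kept` would be indexed.
    print(doc_id, kept)
```

In this model the original value is never modified, which mirrors the point that `_source` keeps every string even when some of them are skipped at index time.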

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 20 <1>
        }
      }
    }
  }
}

PUT my_index/_doc/1 <2>
{
  "message": "Syntax error"
}

PUT my_index/_doc/2 <3>
{
  "message": "Syntax error with some long stacktrace"
}

GET _search <4>
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

<1> This field will ignore any string longer than 20 characters.
<2> This document is indexed successfully.
<3> This document will be indexed, but without indexing the `message` field.
<4> Search returns both documents, but only the first is present in the terms aggregation.

TIP: The value of `ignore_above` may differ between
fields of the same name in the same index. It can be updated on
existing fields using the <<indices-put-mapping,PUT mapping API>>.

This option is also useful for protecting against Lucene's term byte-length
limit of `32766`.

NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
set the limit to `32766 / 4 = 8191` (rounded down), since a UTF-8 character may occupy at most
4 bytes.
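
The arithmetic behind that suggestion can be checked with a short Python sketch. The function names `safe_ignore_above` and `utf8_byte_length` are hypothetical helpers introduced for illustration, and the sample strings are made up. Python's `len` counts Unicode code points, which is assumed here to approximate the character count that `ignore_above` uses.

```python
LUCENE_MAX_TERM_BYTES = 32766  # term byte-length limit quoted above
UTF8_MAX_BYTES_PER_CHAR = 4    # worst case for a single UTF-8 character


def safe_ignore_above(max_term_bytes=LUCENE_MAX_TERM_BYTES,
                      max_bytes_per_char=UTF8_MAX_BYTES_PER_CHAR):
    """Largest character count whose worst-case UTF-8 size fits the limit."""
    return max_term_bytes // max_bytes_per_char


def utf8_byte_length(text):
    """Number of bytes the string occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


limit = safe_ignore_above()
print(limit)                            # 8191
print(limit * UTF8_MAX_BYTES_PER_CHAR)  # 32764, which fits under 32766

# Character count and byte count diverge as soon as non-ASCII text appears.
for sample in ["Syntax error", "Syntaxfehler: ungültig", "構文エラー"]:
    print(sample, len(sample), utf8_byte_length(sample))
```

For purely ASCII data the two counts are equal, so a larger `ignore_above` would also stay within the byte limit. The division by four is the conservative choice for arbitrary UTF-8 text.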