lqb
/
elasticsearch
mirror of https://gitee.com/mirrors/elasticsearch.git


			
							123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263
							[[ignore-above]]
=== `ignore_above`

Strings longer than the `ignore_above` setting will not be indexed or stored.
For arrays of strings, `ignore_above` will be applied for each array element separately and string elements longer than `ignore_above` will not be indexed or stored.

NOTE: All strings/array elements will still be present in the `_source` field, if the latter is enabled which is the default in Elasticsearch.

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 20 <1>
        }
      }
    }
  }
}

PUT my_index/_doc/1 <2>
{
  "message": "Syntax error"
}

PUT my_index/_doc/2 <3>
{
  "message": "Syntax error with some long stacktrace"
}

GET _search <4>
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
<1> This field will ignore any string longer than 20 characters.
<2> This document is indexed successfully.
<3> This document will be indexed, but without indexing the `message` field.
<4> Search returns both documents, but only the first is present in the terms aggregation.

TIP: The `ignore_above` setting is allowed to have different settings for
fields of the same name in the same index.  Its value can be updated on
existing fields using the <<indices-put-mapping,PUT mapping API>>.


This option is also useful for protecting against Lucene's term byte-length
limit of `32766`.

NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
set the limit to `32766 / 4 = 8191` since UTF-8 characters may occupy at most
4 bytes.