- [[ignore-above]]
- === `ignore_above`
- Strings longer than the `ignore_above` setting will not be processed by the
- <<analyzer,analyzer>> and will not be indexed. This is mainly useful for
- <<mapping-index,`not_analyzed`>> string fields, which are typically used for
- filtering, aggregations, and sorting. These are structured fields and it
- doesn't usually make sense to allow very long terms to be indexed in these
- fields.
- [source,js]
- --------------------------------------------------
- PUT my_index
- {
-   "mappings": {
-     "my_type": {
-       "properties": {
-         "message": {
-           "type": "string",
-           "index": "not_analyzed",
-           "ignore_above": 20 <1>
-         }
-       }
-     }
-   }
- }
- PUT my_index/my_type/1 <2>
- {
-   "message": "Syntax error"
- }
- PUT my_index/my_type/2 <3>
- {
-   "message": "Syntax error with some long stacktrace"
- }
- GET _search <4>
- {
-   "aggs": {
-     "messages": {
-       "terms": {
-         "field": "message"
-       }
-     }
-   }
- }
- --------------------------------------------------
- // AUTOSENSE
- <1> This field will ignore any string longer than 20 characters.
- <2> This document is indexed successfully.
- <3> This document will be indexed, but without indexing the `message` field.
- <4> Search returns both documents, but only the first is present in the terms aggregation.
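The effect described in the callouts can be mimicked outside Elasticsearch. The following Python sketch is only an illustration, not how Elasticsearch implements the setting: the helper `indexed_terms` is hypothetical, and it assumes that length is measured with Python's `len()` (Unicode code points).

```python
from collections import Counter

IGNORE_ABOVE = 20  # same limit as in the mapping example


def indexed_terms(values, ignore_above=IGNORE_ABOVE):
    """Return the values that would become terms (hypothetical helper).

    Assumption: a value is skipped only when it is strictly longer
    than `ignore_above`, measured in code points via len().
    """
    return [v for v in values if len(v) <= ignore_above]


# The two example documents, keyed by document ID.
docs = {
    1: "Syntax error",
    2: "Syntax error with some long stacktrace",
}

# Both documents exist, but only the short value produces a term,
# so only it can show up as a bucket in a terms aggregation.
buckets = Counter(indexed_terms(docs.values()))
print(dict(buckets))
```

Under these assumptions, document 2 is still stored and returned by a search; its `message` value simply never reaches the term dictionary that the aggregation reads.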
- TIP: Fields of the same name in the same index may use different
- `ignore_above` values. The value can be updated on existing fields using
- the <<indices-put-mapping,PUT mapping API>>.
- This option is also useful for protecting against Lucene's term byte-length
- limit of `32766`.
- NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
- bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
- set the limit to `32766 / 4 = 8191` since UTF-8 characters may occupy at most
- 4 bytes.
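The gap between character count and byte count can be checked directly. This Python sketch is a rough aid for choosing a limit and is not part of Elasticsearch: the helper `safe_ignore_above` is hypothetical, and Python's `len()` counts code points, which may not match exactly how Elasticsearch counts characters.

```python
LUCENE_MAX_TERM_BYTES = 32766  # term byte-length limit quoted above


def safe_ignore_above(max_bytes_per_char):
    """Largest character count whose worst-case encoding still fits.

    Hypothetical helper: integer-divides the byte limit by the
    worst-case number of UTF-8 bytes per character in your data.
    """
    return LUCENE_MAX_TERM_BYTES // max_bytes_per_char


# Characters versus UTF-8 bytes for a few sample strings.
for text in ["a", "é", "€", "😀"]:
    print(repr(text), len(text), len(text.encode("utf-8")))

# Conservative limits for ASCII-only data and for wider characters.
for width in (1, 2, 3, 4):
    print(width, safe_ignore_above(width))
```

If the data is known to be ASCII only, the full `32766` can be used; a smaller divisor-based limit only matters when multi-byte characters are expected.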