[[query-dsl-intervals-query]]
=== Intervals query
++++
<titleabbrev>Intervals</titleabbrev>
++++

Returns documents based on the order and proximity of matching terms.

The `intervals` query uses *matching rules*, constructed from a small set of
definitions. These rules are then applied to terms from a specified `field`.

The definitions produce sequences of minimal intervals that span terms in a
body of text. These intervals can be further combined and filtered by
parent sources.

[[intervals-query-ex-request]]
==== Example request

The following `intervals` search returns documents containing `my
favorite food` without any gap, followed by `hot water` or `cold porridge` in
the `my_text` field.

This search would match a `my_text` value of `my favorite food is cold
porridge` but not `when it's cold my favorite food is porridge`.

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : true,
          "intervals" : [
            {
              "match" : {
                "query" : "my favorite food",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "any_of" : {
                "intervals" : [
                  { "match" : { "query" : "hot water" } },
                  { "match" : { "query" : "cold porridge" } }
                ]
              }
            }
          ]
        }
      }
    }
  }
}
--------------------------------------------------

[[intervals-top-level-params]]
==== Top-level parameters for `intervals`
[[intervals-rules]]
`<field>`::
+
--
(Required, rule object) Field you wish to search.

The value of this parameter is a rule object used to match documents
based on matching terms, order, and proximity.

Valid rules include:

* <<intervals-match,`match`>>
* <<intervals-prefix,`prefix`>>
* <<intervals-wildcard,`wildcard`>>
* <<intervals-fuzzy,`fuzzy`>>
* <<intervals-all_of,`all_of`>>
* <<intervals-any_of,`any_of`>>
--

[[intervals-match]]
==== `match` rule parameters

The `match` rule matches analyzed text.

`query`::
(Required, string) Text you wish to find in the provided `<field>`.

`max_gaps`::
+
--
(Optional, integer) Maximum number of positions between the matching terms.
Terms further apart than this are not considered matches. Defaults to
`-1`.

If unspecified or set to `-1`, there is no width restriction on the match. If
set to `0`, the terms must appear next to each other.
--

`ordered`::
(Optional, boolean)
If `true`, matching terms must appear in their specified order. Defaults to
`false`.

`analyzer`::
(Optional, string) <<analysis, analyzer>> used to analyze terms in the `query`.
Defaults to the top-level `<field>`'s analyzer.

`filter`::
(Optional, <<interval_filter,interval filter>> rule object) An optional interval
filter.

`use_field`::
(Optional, string) If specified, then match intervals from this
field rather than the top-level `<field>`. Terms are analyzed using the
search analyzer from this field. This allows you to search across multiple
fields as if they were all the same field; for example, you could index the same
text into stemmed and unstemmed fields, and search for stemmed tokens near
unstemmed ones.

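As a rough sketch of the stemmed/unstemmed scenario that `use_field` allows,
the following search assumes the same text is indexed into `my_text` and into a
hypothetical stemmed field named `my_text_stemmed`. Neither that field name nor
its analyzer comes from this page. The search looks for a stemmed match on
`cooking` within two positions of the unstemmed term `porridge`:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "max_gaps" : 2,
          "intervals" : [
            {
              "match" : {
                "query" : "cooking",
                "use_field" : "my_text_stemmed"
              }
            },
            { "match" : { "query" : "porridge" } }
          ]
        }
      }
    }
  }
}
--------------------------------------------------

Because the first rule is analyzed with the search analyzer of
`my_text_stemmed`, the text `cooking` would be stemmed before matching,
assuming that field is mapped with a stemming analyzer.
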
[[intervals-prefix]]
==== `prefix` rule parameters

The `prefix` rule matches terms that start with a specified set of characters.
This prefix can expand to match at most 128 terms. If the prefix matches more
than 128 terms, {es} returns an error. You can use the
<<index-prefixes,`index_prefixes`>> option in the field mapping to avoid this
limit.

`prefix`::
(Required, string) Beginning characters of terms you wish to find in the
top-level `<field>`.

`analyzer`::
(Optional, string) <<analysis, analyzer>> used to normalize the `prefix`.
Defaults to the top-level `<field>`'s analyzer.

`use_field`::
+
--
(Optional, string) If specified, then match intervals from this field rather
than the top-level `<field>`.

The `prefix` is normalized using the search analyzer from this field, unless a
separate `analyzer` is specified.
--

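For illustration, a minimal `prefix` search might look like the following. The
prefix value `por` is an arbitrary example; it would be expected to match terms
such as `porridge` or `port`, assuming the field's analyzer indexes them
unchanged:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "prefix" : {
          "prefix" : "por"
        }
      }
    }
  }
}
--------------------------------------------------
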
[[intervals-wildcard]]
==== `wildcard` rule parameters

The `wildcard` rule matches terms using a wildcard pattern. This pattern can
expand to match at most 128 terms. If the pattern matches more than 128 terms,
{es} returns an error.

`pattern`::
(Required, string) Wildcard pattern used to find matching terms.
+
--
This parameter supports two wildcard operators:

* `?`, which matches any single character
* `*`, which can match zero or more characters, including an empty one

WARNING: Avoid beginning patterns with `*` or `?`. This can increase
the iterations needed to find matching terms and slow search performance.
--

`analyzer`::
(Optional, string) <<analysis, analyzer>> used to normalize the `pattern`.
Defaults to the top-level `<field>`'s analyzer.

`use_field`::
+
--
(Optional, string) If specified, match intervals from this field rather than the
top-level `<field>`.

The `pattern` is normalized using the search analyzer from this field, unless
`analyzer` is specified separately.
--

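A minimal sketch of a `wildcard` rule, using an arbitrary example pattern. Here
`p*ridge` would be expected to match terms such as `porridge` or `partridge`,
assuming those terms are indexed unchanged:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "wildcard" : {
          "pattern" : "p*ridge"
        }
      }
    }
  }
}
--------------------------------------------------
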
[[intervals-fuzzy]]
==== `fuzzy` rule parameters

The `fuzzy` rule matches terms that are similar to the provided term, within an
edit distance defined by <<fuzziness>>. If the fuzzy expansion matches more than
128 terms, {es} returns an error.

`term`::
(Required, string) The term to match.

`prefix_length`::
(Optional, integer) Number of beginning characters left unchanged when creating
expansions. Defaults to `0`.

`transpositions`::
(Optional, boolean) Indicates whether edits include transpositions of two
adjacent characters (ab → ba). Defaults to `true`.

`fuzziness`::
(Optional, string) Maximum edit distance allowed for matching. See <<fuzziness>>
for valid values and more information. Defaults to `auto`.

`analyzer`::
(Optional, string) <<analysis, analyzer>> used to normalize the `term`.
Defaults to the top-level `<field>`'s analyzer.

`use_field`::
+
--
(Optional, string) If specified, match intervals from this field rather than the
top-level `<field>`.

The `term` is normalized using the search analyzer from this field, unless
`analyzer` is specified separately.
--

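As a small sketch, the following `fuzzy` rule searches for the misspelled term
`porrige` with an edit distance of `1`. The term and the `fuzziness` value are
illustrative choices only. The indexed term `porridge` is one edit away (an
inserted `d`), so it would be expected to fall within the expansion:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "fuzzy" : {
          "term" : "porrige",
          "fuzziness" : "1"
        }
      }
    }
  }
}
--------------------------------------------------
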
[[intervals-all_of]]
==== `all_of` rule parameters

The `all_of` rule returns matches that span a combination of other rules.

`intervals`::
(Required, array of rule objects) An array of rules to combine. All rules must
produce a match in a document for the overall source to match.

`max_gaps`::
+
--
(Optional, integer) Maximum number of positions between the matching terms.
Intervals produced by the rules further apart than this are not considered
matches. Defaults to `-1`.

If unspecified or set to `-1`, there is no width restriction on the match. If
set to `0`, the terms must appear next to each other.
--

`ordered`::
(Optional, boolean) If `true`, intervals produced by the rules should appear in
the order in which they are specified. Defaults to `false`.

`filter`::
(Optional, <<interval_filter,interval filter>> rule object) Rule used to filter
returned intervals.

[[intervals-any_of]]
==== `any_of` rule parameters

The `any_of` rule returns intervals produced by any of its sub-rules.

`intervals`::
(Required, array of rule objects) An array of rules to match.

`filter`::
(Optional, <<interval_filter,interval filter>> rule object) Rule used to filter
returned intervals.

[[interval_filter]]
==== `filter` rule parameters

The `filter` rule returns intervals based on a query. See
<<interval-filter-rule-ex>> for an example.

`after`::
(Optional, query object) Query used to return intervals that follow an interval
from the `filter` rule.

`before`::
(Optional, query object) Query used to return intervals that occur before an
interval from the `filter` rule.

`contained_by`::
(Optional, query object) Query used to return intervals contained by an interval
from the `filter` rule.

`containing`::
(Optional, query object) Query used to return intervals that contain an interval
from the `filter` rule.

`not_contained_by`::
(Optional, query object) Query used to return intervals that are *not*
contained by an interval from the `filter` rule.

`not_containing`::
(Optional, query object) Query used to return intervals that do *not* contain
an interval from the `filter` rule.

`not_overlapping`::
(Optional, query object) Query used to return intervals that do *not* overlap
with an interval from the `filter` rule.

`overlapping`::
(Optional, query object) Query used to return intervals that overlap with an
interval from the `filter` rule.

`script`::
(Optional, <<modules-scripting-using, script object>>) Script used to return
matching documents. This script must return a boolean value, `true` or `false`.
See <<interval-script-filter>> for an example.

[[intervals-query-note]]
==== Notes

[[interval-filter-rule-ex]]
===== Filter example

The following search includes a `filter` rule. It returns documents that have
the words `hot` and `porridge` within 10 positions of each other, without the
word `salty` in between:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot porridge",
          "max_gaps" : 10,
          "filter" : {
            "not_containing" : {
              "match" : {
                "query" : "salty"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------

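The positional filters share the same shape. As a further sketch, not taken
from the original examples, the following search would keep only intervals for
`hot` that occur before an interval for `porridge`:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot",
          "filter" : {
            "before" : {
              "match" : {
                "query" : "porridge"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
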
[[interval-script-filter]]
===== Script filters

You can use a script to filter intervals based on their start position, end
position, and internal gap count. The following `filter` script uses the
`interval` variable with the `start`, `end`, and `gaps` methods:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot porridge",
          "filter" : {
            "script" : {
              "source" : "interval.start > 10 && interval.end < 20 && interval.gaps == 0"
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------

[[interval-minimization]]
===== Minimization

The intervals query always minimizes intervals, to ensure that queries can
run in linear time. This can sometimes cause surprising results, particularly
when using `max_gaps` restrictions or filters. For example, take the
following query, searching for `salty` contained within the phrase `hot
porridge`:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "salty",
          "filter" : {
            "contained_by" : {
              "match" : {
                "query" : "hot porridge"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------

This query does *not* match a document containing the phrase `hot porridge is
salty porridge`, because the intervals returned by the match query for `hot
porridge` only cover the initial two terms in this document, and these do not
overlap the intervals covering `salty`.

Another restriction to be aware of is the case of `any_of` rules that contain
sub-rules which overlap. In particular, if one of the rules is a strict
prefix of the other, then the longer rule can never match, which can
cause surprises when used in combination with `max_gaps`. Consider the
following query, searching for `the` immediately followed by `big` or `big bad`,
immediately followed by `wolf`:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "intervals" : [
            { "match" : { "query" : "the" } },
            { "any_of" : {
                "intervals" : [
                    { "match" : { "query" : "big" } },
                    { "match" : { "query" : "big bad" } }
                ] } },
            { "match" : { "query" : "wolf" } }
          ],
          "max_gaps" : 0,
          "ordered" : true
        }
      }
    }
  }
}
--------------------------------------------------

Counter-intuitively, this query does *not* match the document `the big bad
wolf`, because the `any_of` rule in the middle only produces intervals
for `big`. Intervals for `big bad` start at the same position as those for
`big` but are longer, so they are minimized away. In these cases,
it's better to rewrite the query so that all of the options are explicitly
laid out at the top level:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "any_of" : {
          "intervals" : [
            { "match" : {
                "query" : "the big bad wolf",
                "ordered" : true,
                "max_gaps" : 0 } },
            { "match" : {
                "query" : "the big wolf",
                "ordered" : true,
                "max_gaps" : 0 } }
           ]
        }
      }
    }
  }
}
--------------------------------------------------