123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300 |
- [[query-dsl-intervals-query]]
- === Intervals query
- An `intervals` query allows fine-grained control over the order and proximity of
- matching terms. Matching rules are constructed from a small set of definitions,
- and the rules are then applied to terms from a particular `field`.
- The definitions produce sequences of minimal intervals that span terms in a
- body of text. These intervals can be further combined and filtered by
- parent sources.
- The example below will search for the phrase `my favourite food` appearing
- before the terms `hot` and `water` or `cold` and `porridge` in any order, in
- the field `my_text`
- [source,js]
- --------------------------------------------------
- POST _search
- {
- "query": {
- "intervals" : {
- "my_text" : {
- "all_of" : {
- "ordered" : true,
- "intervals" : [
- {
- "match" : {
- "query" : "my favourite food",
- "max_gaps" : 0,
- "ordered" : true
- }
- },
- {
- "any_of" : {
- "intervals" : [
- { "match" : { "query" : "hot water" } },
- { "match" : { "query" : "cold porridge" } }
- ]
- }
- }
- ]
- },
- "_name" : "favourite_food"
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- In the above example, the text `my favourite food is cold porridge` would
- match because the two intervals matching `my favourite food` and `cold
- porridge` appear in the correct order, but the text `when it's cold my
- favourite food is porridge` would not match, because the interval matching
- `cold porridge` starts before the interval matching `my favourite food`.
- [[intervals-match]]
- ==== `match`
- The `match` rule matches analyzed text, and takes the following parameters:
- [horizontal]
- `query`::
- The text to match.
- `max_gaps`::
- Specify a maximum number of gaps between the terms in the text. Terms that
- appear further apart than this will not match. If unspecified, or set to -1,
- then there is no width restriction on the match. If set to 0 then the terms
- must appear next to each other.
- `ordered`::
- Whether or not the terms must appear in their specified order. Defaults to
- `false`
- `analyzer`::
- Which analyzer should be used to analyze terms in the `query`. By
- default, the search analyzer of the top-level field will be used.
- `filter`::
- An optional <<interval_filter,interval filter>>
- `use_field`::
- If specified, then match intervals from this field rather than the top-level field.
- Terms will be analyzed using the search analyzer from this field. This allows you
- to search across multiple fields as if they were all the same field; for example,
- you could index the same text into stemmed and unstemmed fields, and search for
- stemmed tokens near unstemmed ones.
- [[intervals-all_of]]
- ==== `all_of`
- `all_of` returns returns matches that span a combination of other rules.
- [horizontal]
- `intervals`::
- An array of rules to combine. All rules must produce a match in a
- document for the overall source to match.
- `max_gaps`::
- Specify a maximum number of gaps between the rules. Combinations that match
- across a distance greater than this will not match. If set to -1 or
- unspecified, there is no restriction on this distance. If set to 0, then the
- matches produced by the rules must all appear immediately next to each other.
- `ordered`::
- Whether the intervals produced by the rules should appear in the order in
- which they are specified. Defaults to `false`
- `filter`::
- An optional <<interval_filter,interval filter>>
- [[intervals-any_of]]
- ==== `any_of`
- The `any_of` rule emits intervals produced by any of its sub-rules.
- [horizontal]
- `intervals`::
- An array of rules to match
- `filter`::
- An optional <<interval_filter,interval filter>>
- [[interval_filter]]
- ==== filters
- You can filter intervals produced by any rules by their relation to the
- intervals produced by another rule. The following example will return
- documents that have the words `hot` and `porridge` within 10 positions
- of each other, without the word `salty` in between:
- [source,js]
- --------------------------------------------------
- POST _search
- {
- "query": {
- "intervals" : {
- "my_text" : {
- "match" : {
- "query" : "hot porridge",
- "max_gaps" : 10,
- "filter" : {
- "not_containing" : {
- "match" : {
- "query" : "salty"
- }
- }
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- The following filters are available:
- [horizontal]
- `containing`::
- Produces intervals that contain an interval from the filter rule
- `contained_by`::
- Produces intervals that are contained by an interval from the filter rule
- `not_containing`::
- Produces intervals that do not contain an interval from the filter rule
- `not_contained_by`::
- Produces intervals that are not contained by an interval from the filter rule
- `overlapping`::
- Produces intervals that overlap with an interval from the filter rule
- `not_overlapping`::
- Produces intervals that do not overlap with an interval from the filter rule
- `before`::
- Produces intervals that appear before an interval from the filter role
- `after`::
- Produces intervals that appear after an interval from the filter role
- [[interval-script-filter]]
- ==== Script filters
- You can also filter intervals based on their start position, end position and
- internal gap count, using a script. The script has access to an `interval`
- variable, with `start`, `end` and `gaps` methods:
- [source,js]
- --------------------------------------------------
- POST _search
- {
- "query": {
- "intervals" : {
- "my_text" : {
- "match" : {
- "query" : "hot porridge",
- "filter" : {
- "script" : {
- "source" : "interval.start > 10 && interval.end < 20 && interval.gaps == 0"
- }
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- [[interval-minimization]]
- ==== Minimization
- The intervals query always minimizes intervals, to ensure that queries can
- run in linear time. This can sometimes cause surprising results, particularly
- when using `max_gaps` restrictions or filters. For example, take the
- following query, searching for `salty` contained within the phrase `hot
- porridge`:
- [source,js]
- --------------------------------------------------
- POST _search
- {
- "query": {
- "intervals" : {
- "my_text" : {
- "match" : {
- "query" : "salty",
- "filter" : {
- "contained_by" : {
- "match" : {
- "query" : "hot porridge"
- }
- }
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- This query will *not* match a document containing the phrase `hot porridge is
- salty porridge`, because the intervals returned by the match query for `hot
- porridge` only cover the initial two terms in this document, and these do not
- overlap the intervals covering `salty`.
- Another restriction to be aware of is the case of `any_of` rules that contain
- sub-rules which overlap. In particular, if one of the rules is a strict
- prefix of the other, then the longer rule will never be matched, which can
- cause surprises when used in combination with `max_gaps`. Consider the
- following query, searching for `the` immediately followed by `big` or `big bad`,
- immediately followed by `wolf`:
- [source,js]
- --------------------------------------------------
- POST _search
- {
- "query": {
- "intervals" : {
- "my_text" : {
- "all_of" : {
- "intervals" : [
- { "match" : { "query" : "the" } },
- { "any_of" : {
- "intervals" : [
- { "match" : { "query" : "big" } },
- { "match" : { "query" : "big bad" } }
- ] } },
- { "match" : { "query" : "wolf" } }
- ],
- "max_gaps" : 0,
- "ordered" : true
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- Counter-intuitively, this query *will not* match the document `the big bad
- wolf`, because the `any_of` rule in the middle will only produce intervals
- for `big` - intervals for `big bad` being longer than those for `big`, while
- starting at the same position, and so being minimized away. In these cases,
- it's better to rewrite the query so that all of the options are explicitly
- laid out at the top level:
- [source,js]
- --------------------------------------------------
- POST _search
- {
- "query": {
- "intervals" : {
- "my_text" : {
- "any_of" : {
- "intervals" : [
- { "match" : {
- "query" : "the big bad wolf",
- "ordered" : true,
- "max_gaps" : 0 } },
- { "match" : {
- "query" : "the big wolf",
- "ordered" : true,
- "max_gaps" : 0 } }
- ]
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
|