[[query-dsl-intervals-query]]
=== Intervals query

An `intervals` query allows fine-grained control over the order and proximity of
matching terms. Matching rules are constructed from a small set of definitions,
and the rules are then applied to terms from a particular `field`.

The definitions produce sequences of minimal intervals that span terms in a
body of text. These intervals can be further combined and filtered by
parent sources.

The example below will search for the phrase `my favourite food` appearing
before either the terms `hot` and `water` or the terms `cold` and `porridge`,
each pair in any order, in the field `my_text`:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : true,
          "intervals" : [
            {
              "match" : {
                "query" : "my favourite food",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "any_of" : {
                "intervals" : [
                  { "match" : { "query" : "hot water" } },
                  { "match" : { "query" : "cold porridge" } }
                ]
              }
            }
          ]
        },
        "boost" : 2.0,
        "_name" : "favourite_food"
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

In the above example, the text `my favourite food is cold porridge` would
match because the two intervals matching `my favourite food` and `cold
porridge` appear in the correct order, but the text `when it's cold my
favourite food is porridge` would not match, because the interval matching
`cold porridge` starts before the interval matching `my favourite food`.

[[intervals-match]]
==== `match`

The `match` rule matches analyzed text, and takes the following parameters:

[horizontal]
`query`::
The text to match.
`max_gaps`::
Specify a maximum number of gaps between the terms in the text. Terms that
appear further apart than this will not match. If unspecified, or set to -1,
then there is no width restriction on the match. If set to 0 then the terms
must appear next to each other.
`ordered`::
Whether or not the terms must appear in their specified order. Defaults to
`false`.
`analyzer`::
Which analyzer should be used to analyze terms in the `query`. By
default, the search analyzer of the top-level field will be used.
`filter`::
An optional <<interval_filter,interval filter>>.
`use_field`::
If specified, then match intervals from this field rather than the top-level field.
Terms will be analyzed using the search analyzer from this field. This allows you
to search across multiple fields as if they were all the same field; for example,
you could index the same text into stemmed and unstemmed fields, and search for
stemmed tokens near unstemmed ones.

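
As a rough illustration of `use_field`, the request below looks for the
unstemmed term `cold` with at most two gaps between it and `porridge` matched
against a stemmed variant of the same text. The sub-field name
`my_text.stemmed` is hypothetical, and the sketch assumes the same text has
been indexed into both fields so that their positions line up:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "max_gaps" : 2,
          "intervals" : [
            { "match" : { "query" : "cold" } },
            {
              "match" : {
                "query" : "porridge",
                "use_field" : "my_text.stemmed"
              }
            }
          ]
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
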
[[intervals-all_of]]
==== `all_of`

`all_of` returns matches that span a combination of other rules.

[horizontal]
`intervals`::
An array of rules to combine. All rules must produce a match in a
document for the overall source to match.
`max_gaps`::
Specify a maximum number of gaps between the rules. Combinations that match
across a distance greater than this will not match. If set to -1 or
unspecified, there is no restriction on this distance. If set to 0, then the
matches produced by the rules must all appear immediately next to each other.
`ordered`::
Whether the intervals produced by the rules should appear in the order in
which they are specified. Defaults to `false`.
`filter`::
An optional <<interval_filter,interval filter>>.

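
For instance, a request along the following lines should match text in which
the phrases `hot water` and `cold porridge` both occur, in either order, with
at most five gaps between the two matches. The field name `my_text` is carried
over from the earlier example, and the gap limit of 5 is an arbitrary value
chosen for illustration:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : false,
          "max_gaps" : 5,
          "intervals" : [
            { "match" : { "query" : "hot water", "max_gaps" : 0, "ordered" : true } },
            { "match" : { "query" : "cold porridge", "max_gaps" : 0, "ordered" : true } }
          ]
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
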
[[intervals-any_of]]
==== `any_of`

The `any_of` rule emits intervals produced by any of its sub-rules.

[horizontal]
`intervals`::
An array of rules to match.
`filter`::
An optional <<interval_filter,interval filter>>.

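
As an illustrative sketch, the request below attaches a `filter` to an
`any_of` rule, so that an interval from either sub-rule is kept only if the
word `salty` does not fall inside it. The field `my_text` and the query terms
are reused from the other examples on this page, and the gap limit of 3 is
arbitrary:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "any_of" : {
          "intervals" : [
            { "match" : { "query" : "hot water", "max_gaps" : 3 } },
            { "match" : { "query" : "cold porridge", "max_gaps" : 3 } }
          ],
          "filter" : {
            "not_containing" : {
              "match" : {
                "query" : "salty"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
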
[[interval_filter]]
==== Filters

You can filter intervals produced by any rule by their relation to the
intervals produced by another rule. The following example will return
documents that have the words `hot` and `porridge` within 10 positions
of each other, without the word `salty` in between:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot porridge",
          "max_gaps" : 10,
          "filter" : {
            "not_containing" : {
              "match" : {
                "query" : "salty"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

The following filters are available:

[horizontal]
`containing`::
Produces intervals that contain an interval from the filter rule.
`contained_by`::
Produces intervals that are contained by an interval from the filter rule.
`not_containing`::
Produces intervals that do not contain an interval from the filter rule.
`not_contained_by`::
Produces intervals that are not contained by an interval from the filter rule.
`overlapping`::
Produces intervals that overlap with an interval from the filter rule.
`not_overlapping`::
Produces intervals that do not overlap with an interval from the filter rule.
`before`::
Produces intervals that appear before an interval from the filter rule.
`after`::
Produces intervals that appear after an interval from the filter rule.

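
The positional filters take the same shape as the containment filters. As a
minimal sketch, reusing the `my_text` field and terms from the other examples,
the request below keeps only those intervals for `hot` that appear before an
interval for `porridge`:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot",
          "filter" : {
            "before" : {
              "match" : {
                "query" : "porridge"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
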
[[interval-script-filter]]
==== Script filters

You can also filter intervals based on their start position, end position and
internal gap count, using a script. The script has access to an `interval`
variable, with `start`, `end` and `gaps` methods:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot porridge",
          "filter" : {
            "script" : {
              "source" : "interval.start > 10 && interval.end < 20 && interval.gaps == 0"
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

[[interval-minimization]]
==== Minimization

The intervals query always minimizes intervals, to ensure that queries can
run in linear time. This can sometimes cause surprising results, particularly
when using `max_gaps` restrictions or filters. For example, take the
following query, searching for `salty` contained within the phrase `hot
porridge`:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "salty",
          "filter" : {
            "contained_by" : {
              "match" : {
                "query" : "hot porridge"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

This query will *not* match a document containing the phrase `hot porridge is
salty porridge`, because the intervals returned by the match query for `hot
porridge` only cover the initial two terms in this document, and these do not
overlap the intervals covering `salty`.

Another restriction to be aware of is the case of `any_of` rules that contain
sub-rules which overlap. In particular, if one of the rules is a strict
prefix of the other, then the longer rule will never be matched, which can
cause surprises when used in combination with `max_gaps`. Consider the
following query, searching for `the` immediately followed by `big` or `big bad`,
immediately followed by `wolf`:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "intervals" : [
            { "match" : { "query" : "the" } },
            { "any_of" : {
                "intervals" : [
                  { "match" : { "query" : "big" } },
                  { "match" : { "query" : "big bad" } }
                ] } },
            { "match" : { "query" : "wolf" } }
          ],
          "max_gaps" : 0,
          "ordered" : true
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

Counter-intuitively, this query *will not* match the document `the big bad
wolf`, because the `any_of` rule in the middle will only produce intervals
for `big`. The intervals for `big bad` start at the same position as those
for `big` but are longer, and so they are minimized away. With only `big`
available, `wolf` no longer follows immediately and the `max_gaps`
restriction fails. In these cases, it's better to rewrite the query so that
all of the options are explicitly laid out at the top level:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "any_of" : {
          "intervals" : [
            { "match" : {
                "query" : "the big bad wolf",
                "ordered" : true,
                "max_gaps" : 0 } },
            { "match" : {
                "query" : "the big wolf",
                "ordered" : true,
                "max_gaps" : 0 } }
          ]
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE