[[query-dsl-intervals-query]]
=== Intervals query
++++
<titleabbrev>Intervals</titleabbrev>
++++

An `intervals` query allows fine-grained control over the order and proximity of
matching terms. Matching rules are constructed from a small set of definitions,
and the rules are then applied to terms from a particular `field`.

The definitions produce sequences of minimal intervals that span terms in a
body of text. These intervals can be further combined and filtered by
parent sources.

The example below searches for the phrase `my favourite food` appearing
before the terms `hot` and `water` or `cold` and `porridge` in any order, in
the field `my_text`:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : true,
          "intervals" : [
            {
              "match" : {
                "query" : "my favourite food",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "any_of" : {
                "intervals" : [
                  { "match" : { "query" : "hot water" } },
                  { "match" : { "query" : "cold porridge" } }
                ]
              }
            }
          ]
        },
        "_name" : "favourite_food"
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

In the above example, the text `my favourite food is cold porridge` would
match because the two intervals matching `my favourite food` and `cold
porridge` appear in the correct order, but the text `when it's cold my
favourite food is porridge` would not match, because the interval matching
`cold porridge` starts before the interval matching `my favourite food`.
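
The ordering rule in this example can be traced with the toy model below. It is plain JavaScript, not the Lucene implementation that Elasticsearch uses, and every helper in it (`termIntervals`, `minimize`, `combine`, `matchRule`, `anyOf`, `favouriteFood`) is hypothetical. It assumes a lowercasing whitespace tokenizer and represents an interval as an inclusive `{start, end}` range of token positions, so it only approximates the real analysis and minimization behaviour.

[source,js]
--------------------------------------------------
// Toy model only, not the Lucene implementation. Assumes a lowercasing
// whitespace tokenizer; an interval is an inclusive {start, end} range of
// token positions.
const tokenize = (text) => text.toLowerCase().split(/\s+/).filter(Boolean);

// One single-position interval per occurrence of `term`.
function termIntervals(tokens, term) {
  const out = [];
  tokens.forEach((t, i) => {
    if (t === term) out.push({ start: i, end: i });
  });
  return out;
}

// Keep only minimal intervals: drop duplicates, then drop any interval that
// contains a different interval.
function minimize(intervals) {
  const uniq = intervals.filter((a, i) =>
    intervals.findIndex((b) => b.start === a.start && b.end === a.end) === i);
  return uniq
    .filter((a) => !uniq.some((b) => b !== a && a.start <= b.start && b.end <= a.end))
    .sort((a, b) => a.start - b.start || a.end - b.end);
}

// Pick one interval from each source. Toy simplification: picks may not
// overlap, and with `ordered` they must follow source order. `gaps` counts
// unused positions between consecutive picks; maxGaps -1 means no limit.
function combine(sources, { ordered = false, maxGaps = -1 } = {}) {
  const results = [];
  const pick = (i, chosen) => {
    if (i < sources.length) {
      for (const iv of sources[i]) pick(i + 1, [...chosen, iv]);
      return;
    }
    const seq = ordered ? chosen : [...chosen].sort((a, b) => a.start - b.start);
    let gaps = 0;
    for (let k = 1; k < seq.length; k++) {
      if (seq[k].start <= seq[k - 1].end) return; // overlapping or out of order
      gaps += seq[k].start - seq[k - 1].end - 1;
    }
    if (maxGaps >= 0 && gaps > maxGaps) return;
    results.push({ start: seq[0].start, end: seq[seq.length - 1].end, gaps });
  };
  pick(0, []);
  return minimize(results);
}

// `match`: one source per query term. `any_of`: union, then minimize.
const matchRule = (tokens, query, opts) =>
  combine(tokenize(query).map((term) => termIntervals(tokens, term)), opts);
const anyOf = (lists) => minimize(lists.flat());

// The example query: the exact phrase, followed by either food pairing.
function favouriteFood(text) {
  const tokens = tokenize(text);
  const phrase = matchRule(tokens, "my favourite food", { ordered: true, maxGaps: 0 });
  const food = anyOf([matchRule(tokens, "hot water"), matchRule(tokens, "cold porridge")]);
  return combine([phrase, food], { ordered: true }).length > 0;
}

console.log(favouriteFood("my favourite food is cold porridge"));          // true
console.log(favouriteFood("when it's cold my favourite food is porridge")); // false
--------------------------------------------------
// NOTCONSOLE

In this sketch `combine` stands in for both a `match` rule (one source per query term) and an `all_of` rule (one source per sub-rule), a simplification made for brevity.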

[[intervals-match]]
==== `match`

The `match` rule matches analyzed text, and takes the following parameters:

[horizontal]
`query`::
The text to match.
`max_gaps`::
Specify a maximum number of gaps between the terms in the text. Terms that
appear further apart than this will not match. If unspecified, or set to -1,
then there is no width restriction on the match. If set to 0, the terms
must appear next to each other.
`ordered`::
Whether or not the terms must appear in their specified order. Defaults to
`false`.
`analyzer`::
Which analyzer should be used to analyze terms in the `query`. By
default, the search analyzer of the top-level field will be used.
`filter`::
An optional <<interval_filter,interval filter>>.
`use_field`::
If specified, then match intervals from this field rather than the top-level field.
Terms will be analyzed using the search analyzer from this field. This allows you
to search across multiple fields as if they were all the same field; for example,
you could index the same text into stemmed and unstemmed fields, and search for
stemmed tokens near unstemmed ones.
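
To illustrate what `max_gaps` and `ordered` check, the hypothetical `checkCandidate` function below judges a single candidate match, given the position of each query term in query order. It is a sketch, not Elasticsearch code, and it assumes the gap count is the number of positions inside the matched span that are not occupied by a query term, which is why `max_gaps: 0` requires adjacent terms.

[source,js]
--------------------------------------------------
// Toy check for one candidate match (hypothetical helper, not Elasticsearch code).
// `positions[i]` is where the i-th query term was found in the text.
function checkCandidate(positions, { ordered = false, maxGaps = -1 } = {}) {
  // Each term must occupy its own position.
  if (new Set(positions).size !== positions.length) return null;
  // With ordered: true, positions must increase in query order.
  if (ordered && positions.some((p, i) => i > 0 && p <= positions[i - 1])) return null;
  const start = Math.min(...positions);
  const end = Math.max(...positions);
  // Gaps: positions inside the span that no matched term occupies.
  const gaps = end - start + 1 - positions.length;
  // maxGaps of -1 (the default here) means no restriction.
  if (maxGaps >= 0 && gaps > maxGaps) return null;
  return { start, end, gaps };
}

// Text "porridge is hot", query "hot porridge": hot at 2, porridge at 0.
console.log(checkCandidate([2, 0]));                    // { start: 0, end: 2, gaps: 1 }
console.log(checkCandidate([2, 0], { ordered: true })); // null
console.log(checkCandidate([2, 0], { maxGaps: 0 }));    // null
--------------------------------------------------
// NOTCONSOLE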

[[intervals-prefix]]
==== `prefix`

The `prefix` rule finds terms that start with a specified prefix. The prefix will
expand to match at most 128 terms; if there are more matching terms in the index,
then an error will be returned. To avoid this limit, enable the
<<index-prefixes,`index_prefixes`>> option on the field being searched.

[horizontal]
`prefix`::
Match terms starting with this prefix.
`analyzer`::
Which analyzer should be used to normalize the `prefix`. By default, the
search analyzer of the top-level field will be used.
`use_field`::
If specified, then match intervals from this field rather than the top-level field.
The `prefix` will be normalized using the search analyzer from this field, unless
`analyzer` is specified separately.
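
The expansion limit can be pictured with the toy model below. It is not Elasticsearch code: it assumes the index's terms are available as a plain array (the `termDictionary` contents are invented), and the names `MAX_EXPANSIONS` and `expandPrefix` are hypothetical. It throws once a prefix would expand to more than 128 terms, mirroring the error described above.

[source,js]
--------------------------------------------------
// Toy model of prefix expansion (hypothetical names, not Elasticsearch code).
const MAX_EXPANSIONS = 128; // limit stated for the `prefix` rule

function expandPrefix(termDictionary, prefix) {
  const matches = termDictionary.filter((term) => term.startsWith(prefix));
  if (matches.length > MAX_EXPANSIONS) {
    throw new Error(`prefix [${prefix}] expands to more than ${MAX_EXPANSIONS} terms`);
  }
  return matches;
}

// Invented dictionary: three words plus 200 generated terms pot0 ... pot199.
const termDictionary = ["hot", "porridge", "salty"].concat(
  Array.from({ length: 200 }, (_, i) => `pot${i}`)
);

console.log(expandPrefix(termDictionary, "porr")); // [ 'porridge' ]
try {
  expandPrefix(termDictionary, "po"); // 201 matching terms
} catch (e) {
  console.log(e.message); // prefix [po] expands to more than 128 terms
}
--------------------------------------------------
// NOTCONSOLE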

[[intervals-wildcard]]
==== `wildcard`

The `wildcard` rule finds terms that match a wildcard pattern. The pattern will
expand to match at most 128 terms; if there are more matching terms in the index,
then an error will be returned.

[horizontal]
`pattern`::
Find terms matching this pattern.
+
--
This parameter supports two wildcard operators:

* `?`, which matches any single character
* `*`, which can match zero or more characters, including an empty one

WARNING: Avoid beginning patterns with `*` or `?`. This can increase
the iterations needed to find matching terms and slow search performance.
--
`analyzer`::
Which analyzer should be used to normalize the `pattern`. By default, the
search analyzer of the top-level field will be used.
`use_field`::
If specified, then match intervals from this field rather than the top-level field.
The `pattern` will be normalized using the search analyzer from this field, unless
`analyzer` is specified separately.
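
As a rough model of how a `pattern` is interpreted, the hypothetical `wildcardToRegExp` function below treats `?` as exactly one character, `*` as any run of characters (including none), and everything else literally. The hypothetical `expandWildcard` then expands the pattern against a small invented term list with the same 128-term limit. This is a sketch for illustration, not the code Elasticsearch runs.

[source,js]
--------------------------------------------------
// Toy model: translate a wildcard pattern into an anchored RegExp.
// Only `?` (exactly one character) and `*` (zero or more) are special.
function wildcardToRegExp(pattern) {
  let source = "";
  for (const ch of pattern) {
    if (ch === "?") source += ".";
    else if (ch === "*") source += ".*";
    else source += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // literal character
  }
  return new RegExp(`^${source}$`, "s");
}

// Expand against a term list, failing past the limit (hypothetical helper).
function expandWildcard(terms, pattern, limit = 128) {
  const re = wildcardToRegExp(pattern);
  const matches = terms.filter((t) => re.test(t));
  if (matches.length > limit) throw new Error(`pattern [${pattern}] matches too many terms`);
  return matches;
}

const terms = ["porridge", "pottage", "port", "hot", "shot"];
console.log(expandWildcard(terms, "po?t*")); // [ 'pottage', 'port' ]
console.log(expandWildcard(terms, "*ot"));   // [ 'hot', 'shot' ]
--------------------------------------------------
// NOTCONSOLE

This toy version scans every term for every pattern. A real term dictionary can use a literal prefix such as `po` to narrow down the candidate terms, and a leading `*` or `?` removes that shortcut, which is the cost the warning above refers to.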

[[intervals-all_of]]
==== `all_of`

The `all_of` rule returns matches that span a combination of other rules.

[horizontal]
`intervals`::
An array of rules to combine. All rules must produce a match in a
document for the overall source to match.
`max_gaps`::
Specify a maximum number of gaps between the rules. Combinations that match
across a distance greater than this will not match. If set to -1 or
unspecified, there is no restriction on this distance. If set to 0, then the
matches produced by the rules must all appear immediately next to each other.
`ordered`::
Whether the intervals produced by the rules should appear in the order in
which they are specified. Defaults to `false`.
`filter`::
An optional <<interval_filter,interval filter>>.
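
The gap arithmetic for an ordered `all_of` can be sketched as follows. The hypothetical `orderedAllOf` function takes one `{start, end}` interval per rule, in rule order, and assumes the gap count is the number of unused positions between consecutive intervals. The unordered case is deliberately not modelled, and this is not the Lucene implementation.

[source,js]
--------------------------------------------------
// Toy model: judge one combination for an `all_of` with ordered: true.
// `intervals` holds one {start, end} interval per rule, in rule order.
function orderedAllOf(intervals, maxGaps = -1) {
  let gaps = 0;
  for (let k = 1; k < intervals.length; k++) {
    // Each interval must start after the previous one ends.
    if (intervals[k].start <= intervals[k - 1].end) return null;
    gaps += intervals[k].start - intervals[k - 1].end - 1; // unused positions
  }
  // maxGaps of -1 means no restriction, as for the `max_gaps` parameter.
  if (maxGaps >= 0 && gaps > maxGaps) return null;
  return { start: intervals[0].start, end: intervals[intervals.length - 1].end, gaps };
}

// "my favourite food is cold porridge": phrase at [0, 2], `cold porridge` at [4, 5].
const phrase = { start: 0, end: 2 };
const food = { start: 4, end: 5 };
console.log(orderedAllOf([phrase, food]));    // { start: 0, end: 5, gaps: 1 }
console.log(orderedAllOf([phrase, food], 0)); // null
console.log(orderedAllOf([food, phrase]));    // null
--------------------------------------------------
// NOTCONSOLE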

[[intervals-any_of]]
==== `any_of`

The `any_of` rule emits intervals produced by any of its sub-rules.

[horizontal]
`intervals`::
An array of rules to match.
`filter`::
An optional <<interval_filter,interval filter>>.

[[interval_filter]]
==== Filters

You can filter intervals produced by any rule by their relation to the
intervals produced by another rule. The following example returns
documents that have the words `hot` and `porridge` with at most 10 other
positions between them, and without the word `salty` in between:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot porridge",
          "max_gaps" : 10,
          "filter" : {
            "not_containing" : {
              "match" : {
                "query" : "salty"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

The following filters are available:

[horizontal]
`containing`::
Produces intervals that contain an interval from the filter rule.
`contained_by`::
Produces intervals that are contained by an interval from the filter rule.
`not_containing`::
Produces intervals that do not contain an interval from the filter rule.
`not_contained_by`::
Produces intervals that are not contained by an interval from the filter rule.
`overlapping`::
Produces intervals that overlap with an interval from the filter rule.
`not_overlapping`::
Produces intervals that do not overlap with an interval from the filter rule.
`before`::
Produces intervals that appear before an interval from the filter rule.
`after`::
Produces intervals that appear after an interval from the filter rule.
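
The relations above can be written as simple predicates over intervals. The sketch below is a simplified reading of the list, with hypothetical names (`relations`, `applyFilter`); in particular it assumes that `before` and `after` mean the interval ends strictly before, or starts strictly after, the filter interval. Lucene's exact definitions may differ in edge cases.

[source,js]
--------------------------------------------------
// Toy model of the interval filters. Intervals are inclusive {start, end}
// ranges of token positions; `a` is a candidate, `f` a filter interval.
const relations = {
  containing: (a, f) => a.start <= f.start && f.end <= a.end,
  contained_by: (a, f) => f.start <= a.start && a.end <= f.end,
  overlapping: (a, f) => a.start <= f.end && f.start <= a.end,
  before: (a, f) => a.end < f.start, // assumption: strictly before
  after: (a, f) => a.start > f.end,  // assumption: strictly after
};

// Keep source intervals that relate to at least one filter interval;
// the `not_` variants keep those that relate to none.
function applyFilter(source, kind, filterIntervals) {
  const negate = kind.startsWith("not_");
  const test = relations[negate ? kind.slice(4) : kind];
  return source.filter((a) => filterIntervals.some((f) => test(a, f)) !== negate);
}

// "hot salty porridge": `hot porridge` matches [0, 2]; `salty` is at [1, 1].
const hotPorridge = [{ start: 0, end: 2 }];
const salty = [{ start: 1, end: 1 }];
console.log(applyFilter(hotPorridge, "not_containing", salty).length); // 0
console.log(applyFilter(salty, "contained_by", hotPorridge).length);   // 1
--------------------------------------------------
// NOTCONSOLE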

[[interval-script-filter]]
==== Script filters

You can also filter intervals based on their start position, end position and
internal gap count, using a script. The script has access to an `interval`
variable, with `start`, `end` and `gaps` methods:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot porridge",
          "filter" : {
            "script" : {
              "source" : "interval.start > 10 && interval.end < 20 && interval.gaps == 0"
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
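
The values a script filter sees can be approximated with the hypothetical `toInterval` helper below, which derives `start`, `end` and `gaps` from matched term positions. The JavaScript predicate mirrors the condition in the Painless script above; it models the interval as a plain object, which is an assumption rather than the object Elasticsearch actually passes to the script.

[source,js]
--------------------------------------------------
// Toy model only: a plain object stands in for the script's `interval`.
// Hypothetical helper: the span of the matched term positions, with `gaps`
// counting the positions inside it that no matched term occupies.
function toInterval(positions) {
  const start = Math.min(...positions);
  const end = Math.max(...positions);
  return { start, end, gaps: end - start + 1 - positions.length };
}

// Same condition as the Painless script in the request above.
const scriptFilter = (interval) =>
  interval.start > 10 && interval.end < 20 && interval.gaps === 0;

// Candidate intervals for `hot porridge` at different places in a document.
const candidates = [
  [3, 4],   // rejected: starts too early
  [12, 13], // kept
  [12, 14], // rejected: one gap between the two terms
  [25, 26], // rejected: ends too late
].map(toInterval);

console.log(candidates.filter(scriptFilter)); // [ { start: 12, end: 13, gaps: 0 } ]
--------------------------------------------------
// NOTCONSOLE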

[[interval-minimization]]
==== Minimization

The intervals query always minimizes intervals, to ensure that queries can
run in linear time. This can sometimes cause surprising results, particularly
when using `max_gaps` restrictions or filters. For example, take the
following query, searching for `salty` contained within an interval matching
`hot porridge`:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "salty",
          "filter" : {
            "contained_by" : {
              "match" : {
                "query" : "hot porridge"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

This query will *not* match a document containing the phrase `hot porridge is
salty porridge`. The only minimal interval that the `hot porridge` match rule
produces for this document covers the initial two terms: the longer interval
from `hot` to the second `porridge` contains it and is minimized away. The
remaining interval does not contain the interval covering `salty`.
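
The sketch below reproduces that reasoning with a toy model (hypothetical helpers, whitespace tokenization assumed, not Lucene's algorithm): it pairs every `hot` with every `porridge`, drops any interval that contains another one, and then applies the `contained_by` check.

[source,js]
--------------------------------------------------
// Toy model (not Lucene): minimal intervals for an unordered two-term match,
// followed by the contained_by filter from the query above.
const tokenize = (text) => text.toLowerCase().split(/\s+/).filter(Boolean);
const positionsOf = (tokens, term) =>
  tokens.flatMap((t, i) => (t === term ? [i] : []));

// Pair every `a` position with every `b` position, then minimize: drop any
// interval that contains a different one.
function minimalPairs(tokens, a, b) {
  const all = [];
  for (const i of positionsOf(tokens, a)) {
    for (const j of positionsOf(tokens, b)) {
      if (i !== j) all.push({ start: Math.min(i, j), end: Math.max(i, j) });
    }
  }
  return all.filter((x) => !all.some((y) =>
    (y.start !== x.start || y.end !== x.end) && x.start <= y.start && y.end <= x.end));
}

const tokens = tokenize("hot porridge is salty porridge");
const hotPorridge = minimalPairs(tokens, "hot", "porridge");
const salty = positionsOf(tokens, "salty").map((p) => ({ start: p, end: p }));
const matches = salty.filter((s) =>
  hotPorridge.some((h) => h.start <= s.start && s.end <= h.end));

console.log(hotPorridge); // [ { start: 0, end: 1 } ]
console.log(matches);     // []
--------------------------------------------------
// NOTCONSOLE

Without the minimization step, pairing `hot` with the second `porridge` would also produce the interval `[0, 4]`, which does contain `salty` at position 3.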

Another restriction to be aware of is the case of `any_of` rules that contain
sub-rules which overlap. In particular, if one of the rules is a strict
prefix of the other, then the longer rule will never be matched, which can
cause surprises when used in combination with `max_gaps`. Consider the
following query, searching for `the` immediately followed by `big` or `big bad`,
immediately followed by `wolf`:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "intervals" : [
            { "match" : { "query" : "the" } },
            { "any_of" : {
                "intervals" : [
                  { "match" : { "query" : "big" } },
                  { "match" : { "query" : "big bad" } }
                ] } },
            { "match" : { "query" : "wolf" } }
          ],
          "max_gaps" : 0,
          "ordered" : true
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

Counter-intuitively, this query *will not* match the document `the big bad
wolf`, because the `any_of` rule in the middle will only produce intervals
for `big`: the intervals for `big bad` start at the same position as those for
`big` but are longer, so they are minimized away. In these cases, it's better
to rewrite the query so that all of the options are explicitly laid out at
the top level:

[source,js]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "any_of" : {
          "intervals" : [
            { "match" : {
                "query" : "the big bad wolf",
                "ordered" : true,
                "max_gaps" : 0 } },
            { "match" : {
                "query" : "the big wolf",
                "ordered" : true,
                "max_gaps" : 0 } }
          ]
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
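
Both wolf queries can be compared with a small toy model. It assumes whitespace tokenization and inclusive `{start, end}` intervals, and the helpers (`phraseIntervals`, `anyOf`, `adjacentSequence`) are hypothetical simplifications rather than Lucene's interval iterators.

[source,js]
--------------------------------------------------
// Toy model (not Lucene) of the two wolf queries above.
const tokenize = (text) => text.toLowerCase().split(/\s+/).filter(Boolean);

// Exact-phrase intervals (ordered, max_gaps 0) for `phrase` in `tokens`.
function phraseIntervals(tokens, phrase) {
  const words = tokenize(phrase);
  const out = [];
  for (let i = 0; i + words.length <= tokens.length; i++) {
    if (words.every((w, k) => tokens[i + k] === w)) {
      out.push({ start: i, end: i + words.length - 1 });
    }
  }
  return out;
}

// any_of: union of sub-rule intervals, minimized (an interval that contains
// a different interval is dropped).
const anyOf = (lists) => {
  const all = lists.flat();
  return all.filter((x) => !all.some((y) =>
    (y.start !== x.start || y.end !== x.end) && x.start <= y.start && y.end <= x.end));
};

// all_of with ordered: true and max_gaps: 0: each chosen interval must start
// immediately after the previous one ends.
function adjacentSequence(lists) {
  let current = lists[0];
  for (const next of lists.slice(1)) {
    current = current.flatMap((c) =>
      next.filter((n) => n.start === c.end + 1).map((n) => ({ start: c.start, end: n.end })));
  }
  return current;
}

const tokens = tokenize("the big bad wolf");

// Original query: the, any_of(big, big bad), wolf.
const middle = anyOf([phraseIntervals(tokens, "big"), phraseIntervals(tokens, "big bad")]);
const original = adjacentSequence([
  phraseIntervals(tokens, "the"), middle, phraseIntervals(tokens, "wolf"),
]);

// Rewritten query: any_of over the two complete phrases.
const rewritten = anyOf([
  phraseIntervals(tokens, "the big bad wolf"),
  phraseIntervals(tokens, "the big wolf"),
]);

console.log(middle);          // [ { start: 1, end: 1 } ]
console.log(original.length); // 0
console.log(rewritten);       // [ { start: 0, end: 3 } ]
--------------------------------------------------
// NOTCONSOLE

If `anyOf` kept every interval instead of minimizing, the `[1, 2]` interval for `big bad` would survive and the original query would match in this model too.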