[[query-dsl-intervals-query]]
=== Intervals query
++++
<titleabbrev>Intervals</titleabbrev>
++++

Returns documents based on the order and proximity of matching terms.

The `intervals` query uses *matching rules*, constructed from a small set of
definitions. These rules are then applied to terms from a specified `field`.

The definitions produce sequences of minimal intervals that span terms in a
body of text. These intervals can be further combined and filtered by
parent sources.

[[intervals-query-ex-request]]
==== Example request

The following `intervals` search returns documents containing `my
favorite food` without any gap, followed by `hot water` or `cold porridge` in
the `my_text` field. Other terms may appear between the two parts, because the
outer `all_of` rule sets no `max_gaps` limit.

This search would match a `my_text` value of `my favorite food is cold
porridge` but not `when it's cold my favorite food is porridge`.

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : true,
          "intervals" : [
            {
              "match" : {
                "query" : "my favorite food",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "any_of" : {
                "intervals" : [
                  { "match" : { "query" : "hot water" } },
                  { "match" : { "query" : "cold porridge" } }
                ]
              }
            }
          ]
        }
      }
    }
  }
}
--------------------------------------------------

[[intervals-top-level-params]]
==== Top-level parameters for `intervals`
[[intervals-rules]]
`<field>`::
+
--
(Required, rule object) Field you wish to search.

The value of this parameter is a rule object used to match documents
based on matching terms, order, and proximity.

Valid rules include:

* <<intervals-match,`match`>>
* <<intervals-prefix,`prefix`>>
* <<intervals-wildcard,`wildcard`>>
* <<intervals-fuzzy,`fuzzy`>>
* <<intervals-all_of,`all_of`>>
* <<intervals-any_of,`any_of`>>
--

[[intervals-match]]
==== `match` rule parameters

The `match` rule matches analyzed text.

`query`::
(Required, string) Text you wish to find in the provided `<field>`.

`max_gaps`::
+
--
(Optional, integer) Maximum number of positions between the matching terms.
Terms further apart than this are not considered matches. Defaults to
`-1`.

If unspecified or set to `-1`, there is no width restriction on the match. If
set to `0`, the terms must appear next to each other.
--

`ordered`::
(Optional, boolean)
If `true`, matching terms must appear in their specified order. Defaults to
`false`.

`analyzer`::
(Optional, string) <<analysis, analyzer>> used to analyze terms in the `query`.
Defaults to the top-level `<field>`'s analyzer.

`filter`::
(Optional, <<interval_filter,interval filter>> rule object) An optional interval
filter.

`use_field`::
(Optional, string) If specified, then match intervals from this
field rather than the top-level `<field>`. Terms are analyzed using the
search analyzer from this field. This allows you to search across multiple
fields as if they were all the same field; for example, you could index the same
text into stemmed and unstemmed fields, and search for stemmed tokens near
unstemmed ones.
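
As a rough sketch of the `use_field` scenario described above, the following
search looks for a stemmed token within three positions of an unstemmed one.
The `my_text.stemmed` sub-field is hypothetical: it assumes the mapping indexes
the same text a second time with a stemming analyzer, which this page does not
define.

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "max_gaps" : 3,
          "intervals" : [
            {
              "match" : {
                "query" : "cooking",
                "use_field" : "my_text.stemmed"
              }
            },
            { "match" : { "query" : "porridge" } }
          ]
        }
      }
    }
  }
}
--------------------------------------------------

Here `cooking` is analyzed with the search analyzer of the assumed
`my_text.stemmed` field, while `porridge` uses the analyzer of `my_text`.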

[[intervals-prefix]]
==== `prefix` rule parameters

The `prefix` rule matches terms that start with a specified set of characters.
This prefix can expand to match at most 128 terms. If the prefix matches more
than 128 terms, {es} returns an error. You can use the
<<index-prefixes,`index-prefixes`>> option in the field mapping to avoid this
limit.

`prefix`::
(Required, string) Beginning characters of terms you wish to find in the
top-level `<field>`.

`analyzer`::
(Optional, string) <<analysis, analyzer>> used to normalize the `prefix`.
Defaults to the top-level `<field>`'s analyzer.

`use_field`::
+
--
(Optional, string) If specified, then match intervals from this field rather
than the top-level `<field>`.

The `prefix` is normalized using the search analyzer from this field, unless a
separate `analyzer` is specified.
--
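
The parameters above can be combined with other rules. As an illustrative
sketch, the following search looks for `hot` immediately followed by any term
that starts with `por` in the `my_text` field. The field name and the query
text are only examples.

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : true,
          "max_gaps" : 0,
          "intervals" : [
            { "match" : { "query" : "hot" } },
            { "prefix" : { "prefix" : "por" } }
          ]
        }
      }
    }
  }
}
--------------------------------------------------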

[[intervals-wildcard]]
==== `wildcard` rule parameters

The `wildcard` rule matches terms using a wildcard pattern. This pattern can
expand to match at most 128 terms. If the pattern matches more than 128 terms,
{es} returns an error.

`pattern`::
(Required, string) Wildcard pattern used to find matching terms.
+
--
This parameter supports two wildcard operators:

* `?`, which matches any single character
* `*`, which can match zero or more characters, including an empty one

WARNING: Avoid beginning patterns with `*` or `?`. This can increase
the iterations needed to find matching terms and slow search performance.
--

`analyzer`::
(Optional, string) <<analysis, analyzer>> used to normalize the `pattern`.
Defaults to the top-level `<field>`'s analyzer.

`use_field`::
+
--
(Optional, string) If specified, match intervals from this field rather than the
top-level `<field>`.

The `pattern` is normalized using the search analyzer from this field, unless
`analyzer` is specified separately.
--
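
A minimal sketch of a `wildcard` rule, assuming the same `my_text` field used
elsewhere on this page. The pattern `porr*ge` is only an example. It avoids a
leading wildcard, in line with the warning above.

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "wildcard" : {
          "pattern" : "porr*ge"
        }
      }
    }
  }
}
--------------------------------------------------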

[[intervals-fuzzy]]
==== `fuzzy` rule parameters

The `fuzzy` rule matches terms that are similar to the provided term, within an
edit distance defined by <<fuzziness>>. If the fuzzy expansion matches more than
128 terms, {es} returns an error.

`term`::
(Required, string) The term to match.

`prefix_length`::
(Optional, integer) Number of beginning characters left unchanged when creating
expansions. Defaults to `0`.

`transpositions`::
(Optional, boolean) Indicates whether edits include transpositions of two
adjacent characters (ab → ba). Defaults to `true`.

`fuzziness`::
(Optional, string) Maximum edit distance allowed for matching. See <<fuzziness>>
for valid values and more information. Defaults to `auto`.

`analyzer`::
(Optional, string) <<analysis, analyzer>> used to normalize the `term`.
Defaults to the top-level `<field>`'s analyzer.

`use_field`::
+
--
(Optional, string) If specified, match intervals from this field rather than the
top-level `<field>`.

The `term` is normalized using the search analyzer from this field, unless
`analyzer` is specified separately.
--
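
As a sketch of how these parameters fit together, the following search uses a
`fuzzy` rule with a deliberately misspelled term. The term `porrige`, the edit
distance of `1`, and the `prefix_length` of `2` are illustrative values, not
recommendations.

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "fuzzy" : {
          "term" : "porrige",
          "fuzziness" : "1",
          "prefix_length" : 2
        }
      }
    }
  }
}
--------------------------------------------------

With `prefix_length` set to `2`, only expansions that keep the leading `po`
are considered.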

[[intervals-all_of]]
==== `all_of` rule parameters

The `all_of` rule returns matches that span a combination of other rules.

`intervals`::
(Required, array of rule objects) An array of rules to combine. All rules must
produce a match in a document for the overall source to match.

`max_gaps`::
+
--
(Optional, integer) Maximum number of positions between the matching terms.
Intervals produced by the rules further apart than this are not considered
matches. Defaults to `-1`.

If unspecified or set to `-1`, there is no width restriction on the match. If
set to `0`, the terms must appear next to each other.
--

`ordered`::
(Optional, boolean) If `true`, intervals produced by the rules should appear in
the order in which they are specified. Defaults to `false`.

`filter`::
(Optional, <<interval_filter,interval filter>> rule object) Rule used to filter
returned intervals.
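
To illustrate the effect of leaving `ordered` at its default, here is a sketch
of an unordered `all_of` rule. It asks for `hot` and `porridge` in either
order, with at most five positions between them. The field name and the
`max_gaps` value are example choices.

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "max_gaps" : 5,
          "intervals" : [
            { "match" : { "query" : "hot" } },
            { "match" : { "query" : "porridge" } }
          ]
        }
      }
    }
  }
}
--------------------------------------------------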

[[intervals-any_of]]
==== `any_of` rule parameters

The `any_of` rule returns intervals produced by any of its sub-rules.

`intervals`::
(Required, array of rule objects) An array of rules to match.

`filter`::
(Optional, <<interval_filter,interval filter>> rule object) Rule used to filter
returned intervals.

[[interval_filter]]
==== `filter` rule parameters

The `filter` rule returns intervals based on a query. See
<<interval-filter-rule-ex>> for an example.

`after`::
(Optional, query object) Query used to return intervals that follow an interval
from the `filter` rule.

`before`::
(Optional, query object) Query used to return intervals that occur before an
interval from the `filter` rule.

`contained_by`::
(Optional, query object) Query used to return intervals contained by an interval
from the `filter` rule.

`containing`::
(Optional, query object) Query used to return intervals that contain an interval
from the `filter` rule.

`not_contained_by`::
(Optional, query object) Query used to return intervals that are *not*
contained by an interval from the `filter` rule.

`not_containing`::
(Optional, query object) Query used to return intervals that do *not* contain
an interval from the `filter` rule.

`not_overlapping`::
(Optional, query object) Query used to return intervals that do *not* overlap
with an interval from the `filter` rule.

`overlapping`::
(Optional, query object) Query used to return intervals that overlap with an
interval from the `filter` rule.

`script`::
(Optional, <<modules-scripting-using, script object>>) Script used to return
matching documents. This script must return a boolean value, `true` or `false`.
See <<interval-script-filter>> for an example.
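
As a small sketch of one positional filter, the following search keeps only
intervals for `hot` that occur before an interval for `porridge`. It uses the
`before` parameter described above. The field name and terms are examples.

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot",
          "filter" : {
            "before" : {
              "match" : {
                "query" : "porridge"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------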

[[intervals-query-note]]
==== Notes

[[interval-filter-rule-ex]]
===== Filter example

The following search includes a `filter` rule. It returns documents that have
the words `hot` and `porridge` within 10 positions of each other, without the
word `salty` in between:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot porridge",
          "max_gaps" : 10,
          "filter" : {
            "not_containing" : {
              "match" : {
                "query" : "salty"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------

[[interval-script-filter]]
===== Script filters

You can use a script to filter intervals based on their start position, end
position, and internal gap count. The following `filter` script uses the
`interval` variable with the `start`, `end`, and `gaps` methods:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot porridge",
          "filter" : {
            "script" : {
              "source" : "interval.start > 10 && interval.end < 20 && interval.gaps == 0"
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------

[[interval-minimization]]
===== Minimization

The intervals query always minimizes intervals, to ensure that queries can
run in linear time. This can sometimes cause surprising results, particularly
when using `max_gaps` restrictions or filters. For example, take the
following query, searching for `salty` contained within the phrase `hot
porridge`:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "salty",
          "filter" : {
            "contained_by" : {
              "match" : {
                "query" : "hot porridge"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------

This query does *not* match a document containing the phrase `hot porridge is
salty porridge`, because the intervals returned by the match query for `hot
porridge` only cover the initial two terms in this document, and these do not
overlap the intervals covering `salty`.

Another restriction to be aware of is the case of `any_of` rules that contain
sub-rules which overlap. In particular, if one of the rules is a strict
prefix of the other, then the longer rule can never match, which can
cause surprises when used in combination with `max_gaps`. Consider the
following query, searching for `the` immediately followed by `big` or `big bad`,
immediately followed by `wolf`:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "intervals" : [
            { "match" : { "query" : "the" } },
            { "any_of" : {
                "intervals" : [
                    { "match" : { "query" : "big" } },
                    { "match" : { "query" : "big bad" } }
                ] } },
            { "match" : { "query" : "wolf" } }
          ],
          "max_gaps" : 0,
          "ordered" : true
        }
      }
    }
  }
}
--------------------------------------------------

Counter-intuitively, this query does *not* match the document `the big bad
wolf`, because the `any_of` rule in the middle only produces intervals
for `big`. Intervals for `big bad` start at the same position as those for
`big` but are longer, so they are minimized away. In these cases,
it's better to rewrite the query so that all of the options are explicitly
laid out at the top level:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "any_of" : {
          "intervals" : [
            { "match" : {
                "query" : "the big bad wolf",
                "ordered" : true,
                "max_gaps" : 0 } },
            { "match" : {
                "query" : "the big wolf",
                "ordered" : true,
                "max_gaps" : 0 } }
           ]
        }
      }
    }
  }
}
--------------------------------------------------