- [[query-dsl-percolate-query]]
- === Percolate query
- ++++
- <titleabbrev>Percolate</titleabbrev>
- ++++
- The `percolate` query matches a document against queries
- stored in an index. The `percolate` query itself
- contains the document that is matched against
- the stored queries.
- [float]
- === Sample Usage
- Create an index with two fields:
- [source,console]
- --------------------------------------------------
- PUT /my-index
- {
- "mappings": {
- "properties": {
- "message": {
- "type": "text"
- },
- "query": {
- "type": "percolator"
- }
- }
- }
- }
- --------------------------------------------------
- The `message` field is used to preprocess the document defined in
- the `percolate` query before it gets indexed into a temporary index.
- The `query` field is used for indexing the query documents. It holds a
- JSON object that represents an actual Elasticsearch query. The `query` field
- has been configured to use the <<percolator,percolator field type>>. This field
- type understands the Query DSL and stores the query in such a way that it can be
- used later on to match documents defined in the `percolate` query.
- Register a query in the percolator:
- [source,console]
- --------------------------------------------------
- PUT /my-index/_doc/1?refresh
- {
- "query" : {
- "match" : {
- "message" : "bonsai tree"
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- Match a document to the registered percolator queries:
- [source,console]
- --------------------------------------------------
- GET /my-index/_search
- {
- "query" : {
- "percolate" : {
- "field" : "query",
- "document" : {
- "message" : "A new bonsai tree in the office"
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- The above request will yield the following response:
- [source,console-result]
- --------------------------------------------------
- {
- "took": 13,
- "timed_out": false,
- "_shards": {
- "total": 1,
- "successful": 1,
- "skipped" : 0,
- "failed": 0
- },
- "hits": {
- "total" : {
- "value": 1,
- "relation": "eq"
- },
- "max_score": 0.26152915,
- "hits": [
- { <1>
- "_index": "my-index",
- "_type": "_doc",
- "_id": "1",
- "_score": 0.26152915,
- "_source": {
- "query": {
- "match": {
- "message": "bonsai tree"
- }
- }
- },
- "fields" : {
- "_percolator_document_slot" : [0] <2>
- }
- }
- ]
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 13,/"took": "$body.took",/]
- <1> The query with id `1` matches our document.
- <2> The `_percolator_document_slot` field indicates which document has matched this query.
- This is useful when percolating multiple documents simultaneously.
- TIP: To provide a simple example, this documentation uses one index `my-index` for both the percolate queries and documents.
- This set-up can work well when there are just a few percolate queries registered. However, with heavier usage it is recommended
- to store queries and documents in separate indices. Please see <<how-it-works, How it Works Under the Hood>> for more details.
- [float]
- ==== Parameters
- The following parameters can be used when percolating a document:
- [horizontal]
- `field`:: The field of type `percolator` that holds the indexed queries. This is a required parameter.
- `name`:: The suffix to be used for the `_percolator_document_slot` field in case multiple `percolate` queries have been specified.
- This is an optional parameter.
- `document`:: The source of the document being percolated.
- `documents`:: Like the `document` parameter, but accepts multiple documents via a JSON array.
- `document_type`:: The type / mapping of the document being percolated. This parameter is deprecated and will be removed in Elasticsearch 8.0.
- Instead of specifying the source of the document being percolated, the source can also be retrieved from an already
- stored document. The `percolate` query will then internally execute a get request to fetch that document.
- In that case the `document` parameter can be substituted with the following parameters:
- [horizontal]
- `index`:: The index the document resides in. This is a required parameter.
- `type`:: The type of the document to fetch. This parameter is deprecated and will be removed in Elasticsearch 8.0.
- `id`:: The id of the document to fetch. This is a required parameter.
- `routing`:: Optionally, the routing to be used to fetch the document to percolate.
- `preference`:: Optionally, the preference to be used to fetch the document to percolate.
- `version`:: Optionally, the expected version of the document to be fetched.
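- As an illustration, the following sketch percolates an already stored document using the optional fetch parameters.
- It assumes that a document with id `2` exists in `my-index` and was indexed with the routing value `user-1`.
- Both the routing value and the `_local` preference are hypothetical choices made for this example, not requirements:
- [source,console]
- --------------------------------------------------
- GET /my-index/_search
- {
- "query" : {
- "percolate" : {
- "field" : "query",
- "index" : "my-index",
- "id" : "2",
- "routing" : "user-1",
- "preference" : "_local"
- }
- }
- }
- --------------------------------------------------
- If the stored document was indexed without a custom routing value, the `routing` parameter can simply be left out.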
- [float]
- ==== Percolating in a filter context
- If you are not interested in the score, you can expect better performance by wrapping
- the `percolate` query in a `bool` query's filter clause or in a `constant_score` query:
- [source,console]
- --------------------------------------------------
- GET /my-index/_search
- {
- "query" : {
- "constant_score": {
- "filter": {
- "percolate" : {
- "field" : "query",
- "document" : {
- "message" : "A new bonsai tree in the office"
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- At index time, terms are extracted from the percolator query, and the percolator
- can often determine whether a query matches just by looking at those extracted
- terms. However, computing scores requires deserializing each matching query
- and running it against the percolated document, which is a much more expensive
- operation. Hence, if computing scores is not required, the `percolate` query
- should be wrapped in a `constant_score` query or a `bool` query's filter clause.
- Note that the `percolate` query never gets cached by the query cache.
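- The `bool` variant of this approach can be sketched as follows. This is a minimal example that reuses the `my-index`
- index, the `query` field and the sample document from the earlier requests. Apart from the wrapping clause it is the same request
- as the `constant_score` example:
- [source,console]
- --------------------------------------------------
- GET /my-index/_search
- {
- "query" : {
- "bool" : {
- "filter" : {
- "percolate" : {
- "field" : "query",
- "document" : {
- "message" : "A new bonsai tree in the office"
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- Because the `percolate` query runs in a filter clause here, the matching percolator queries are returned without a relevance score being computed for them.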
- [float]
- ==== Percolating multiple documents
- The `percolate` query can match multiple documents simultaneously with the indexed percolator queries.
- Percolating multiple documents in a single request can improve performance as queries only need to be parsed and
- matched once instead of multiple times.
- The `_percolator_document_slot` field that is returned with each matched percolator query is important when percolating
- multiple documents simultaneously. It indicates which documents matched a particular percolator query. The numbers
- correspond to the zero-based positions in the `documents` array specified in the `percolate` query.
- [source,console]
- --------------------------------------------------
- GET /my-index/_search
- {
- "query" : {
- "percolate" : {
- "field" : "query",
- "documents" : [ <1>
- {
- "message" : "bonsai tree"
- },
- {
- "message" : "new tree"
- },
- {
- "message" : "the office"
- },
- {
- "message" : "office tree"
- }
- ]
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- <1> The documents array contains 4 documents that are going to be percolated at the same time.
- [source,console-result]
- --------------------------------------------------
- {
- "took": 13,
- "timed_out": false,
- "_shards": {
- "total": 1,
- "successful": 1,
- "skipped" : 0,
- "failed": 0
- },
- "hits": {
- "total" : {
- "value": 1,
- "relation": "eq"
- },
- "max_score": 0.7093853,
- "hits": [
- {
- "_index": "my-index",
- "_type": "_doc",
- "_id": "1",
- "_score": 0.7093853,
- "_source": {
- "query": {
- "match": {
- "message": "bonsai tree"
- }
- }
- },
- "fields" : {
- "_percolator_document_slot" : [0, 1, 3] <1>
- }
- }
- ]
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 13,/"took": "$body.took",/]
- <1> The `_percolator_document_slot` indicates that the first, second and last documents specified in the `percolate` query
- match this query.
- [float]
- ==== Percolating an Existing Document
- The `percolate` query can also be used to percolate a newly indexed document. Based on the response
- from an index request, the `_id` and other metadata can be used to immediately percolate the newly added
- document.
- [float]
- ===== Example
- Based on the previous example.
- Index the document we want to percolate:
- [source,console]
- --------------------------------------------------
- PUT /my-index/_doc/2
- {
- "message" : "A new bonsai tree in the office"
- }
- --------------------------------------------------
- // TEST[continued]
- Index response:
- [source,console-result]
- --------------------------------------------------
- {
- "_index": "my-index",
- "_type": "_doc",
- "_id": "2",
- "_version": 1,
- "_shards": {
- "total": 2,
- "successful": 1,
- "failed": 0
- },
- "result": "created",
- "_seq_no" : 1,
- "_primary_term" : 1
- }
- --------------------------------------------------
- Percolate the existing document, using the index response as a basis to build a new search request:
- [source,console]
- --------------------------------------------------
- GET /my-index/_search
- {
- "query" : {
- "percolate" : {
- "field": "query",
- "index" : "my-index",
- "id" : "2",
- "version" : 1 <1>
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- <1> The version is optional, but useful in certain cases. It ensures that we are percolating
- the document we have just indexed. If the document has been changed after we indexed it,
- the search request fails with a version conflict error.
- The search response returned is identical to the one in the previous example.
- [float]
- ==== Percolate query and highlighting
- The `percolate` query is handled in a special way when it comes to highlighting. The matching stored queries are used
- to highlight the document that is provided in the `percolate` query, whereas with regular highlighting the query in
- the search request is used to highlight the hits.
- [float]
- ===== Example
- This example is based on the mapping of the first example.
- Save a query:
- [source,console]
- --------------------------------------------------
- PUT /my-index/_doc/3?refresh
- {
- "query" : {
- "match" : {
- "message" : "brown fox"
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- Save another query:
- [source,console]
- --------------------------------------------------
- PUT /my-index/_doc/4?refresh
- {
- "query" : {
- "match" : {
- "message" : "lazy dog"
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- Execute a search request with the `percolate` query and highlighting enabled:
- [source,console]
- --------------------------------------------------
- GET /my-index/_search
- {
- "query" : {
- "percolate" : {
- "field": "query",
- "document" : {
- "message" : "The quick brown fox jumps over the lazy dog"
- }
- }
- },
- "highlight": {
- "fields": {
- "message": {}
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- This will yield the following response.
- [source,console-result]
- --------------------------------------------------
- {
- "took": 7,
- "timed_out": false,
- "_shards": {
- "total": 1,
- "successful": 1,
- "skipped" : 0,
- "failed": 0
- },
- "hits": {
- "total" : {
- "value": 2,
- "relation": "eq"
- },
- "max_score": 0.26152915,
- "hits": [
- {
- "_index": "my-index",
- "_type": "_doc",
- "_id": "3",
- "_score": 0.26152915,
- "_source": {
- "query": {
- "match": {
- "message": "brown fox"
- }
- }
- },
- "highlight": {
- "message": [
- "The quick <em>brown</em> <em>fox</em> jumps over the lazy dog" <1>
- ]
- },
- "fields" : {
- "_percolator_document_slot" : [0]
- }
- },
- {
- "_index": "my-index",
- "_type": "_doc",
- "_id": "4",
- "_score": 0.26152915,
- "_source": {
- "query": {
- "match": {
- "message": "lazy dog"
- }
- }
- },
- "highlight": {
- "message": [
- "The quick brown fox jumps over the <em>lazy</em> <em>dog</em>" <1>
- ]
- },
- "fields" : {
- "_percolator_document_slot" : [0]
- }
- }
- ]
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 7,/"took": "$body.took",/]
- <1> The terms from each query have been highlighted in the document.
- Instead of the query in the search request highlighting the percolator hits, the percolator queries are highlighting
- the document defined in the `percolate` query.
- When percolating multiple documents at the same time, as in the request below, the highlight response is different:
- [source,console]
- --------------------------------------------------
- GET /my-index/_search
- {
- "query" : {
- "percolate" : {
- "field": "query",
- "documents" : [
- {
- "message" : "bonsai tree"
- },
- {
- "message" : "new tree"
- },
- {
- "message" : "the office"
- },
- {
- "message" : "office tree"
- }
- ]
- }
- },
- "highlight": {
- "fields": {
- "message": {}
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- The slightly different response:
- [source,console-result]
- --------------------------------------------------
- {
- "took": 13,
- "timed_out": false,
- "_shards": {
- "total": 1,
- "successful": 1,
- "skipped" : 0,
- "failed": 0
- },
- "hits": {
- "total" : {
- "value": 1,
- "relation": "eq"
- },
- "max_score": 0.7093853,
- "hits": [
- {
- "_index": "my-index",
- "_type": "_doc",
- "_id": "1",
- "_score": 0.7093853,
- "_source": {
- "query": {
- "match": {
- "message": "bonsai tree"
- }
- }
- },
- "fields" : {
- "_percolator_document_slot" : [0, 1, 3]
- },
- "highlight" : { <1>
- "0_message" : [
- "<em>bonsai</em> <em>tree</em>"
- ],
- "3_message" : [
- "office <em>tree</em>"
- ],
- "1_message" : [
- "new <em>tree</em>"
- ]
- }
- }
- ]
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 13,/"took": "$body.took",/]
- <1> The highlight fields have been prefixed with the document slot they belong to,
- so that it is clear which highlight field belongs to which document.
- [float]
- ==== Specifying multiple percolate queries
- It is possible to specify multiple `percolate` queries in a single search request:
- [source,console]
- --------------------------------------------------
- GET /my-index/_search
- {
- "query" : {
- "bool" : {
- "should" : [
- {
- "percolate" : {
- "field" : "query",
- "document" : {
- "message" : "bonsai tree"
- },
- "name": "query1" <1>
- }
- },
- {
- "percolate" : {
- "field" : "query",
- "document" : {
- "message" : "tulip flower"
- },
- "name": "query2" <1>
- }
- }
- ]
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- <1> The `name` parameter is used to identify which percolator document slots belong to which `percolate` query.
- The `_percolator_document_slot` field name is suffixed with the value specified in the `name` parameter.
- If it isn't specified, the `field` parameter is used, which in this case would result in ambiguity.
- The above search request returns a response similar to this:
- [source,console-result]
- --------------------------------------------------
- {
- "took": 13,
- "timed_out": false,
- "_shards": {
- "total": 1,
- "successful": 1,
- "skipped" : 0,
- "failed": 0
- },
- "hits": {
- "total" : {
- "value": 1,
- "relation": "eq"
- },
- "max_score": 0.26152915,
- "hits": [
- {
- "_index": "my-index",
- "_type": "_doc",
- "_id": "1",
- "_score": 0.26152915,
- "_source": {
- "query": {
- "match": {
- "message": "bonsai tree"
- }
- }
- },
- "fields" : {
- "_percolator_document_slot_query1" : [0] <1>
- }
- }
- ]
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 13,/"took": "$body.took",/]
- <1> The `_percolator_document_slot_query1` percolator slot field indicates that these matched slots are from the `percolate`
- query with the `name` parameter set to `query1`.
- [float]
- [[how-it-works]]
- ==== How it Works Under the Hood
- When indexing a document into an index that has the <<percolator,percolator field type>> mapping configured, the query
- part of the document gets parsed into a Lucene query and is stored in the Lucene index. A binary representation
- of the query gets stored, and the query's terms are also analyzed and stored in an indexed field.
- At search time, the document specified in the request gets parsed into a Lucene document and is stored in an in-memory
- temporary Lucene index. This in-memory index can hold just this one document and is optimized for that. After this,
- a special query is built based on the terms in the in-memory index that selects candidate percolator queries based on
- their indexed query terms. These candidate queries are then evaluated against the in-memory index to check whether they actually match.
- Selecting candidate percolator queries is an important performance optimization during the execution
- of the `percolate` query, as it can significantly reduce the number of candidate matches the in-memory index needs to
- evaluate. The `percolate` query can do this because, when percolator queries are indexed, their query
- terms are extracted and indexed with the percolator query. Unfortunately, the percolator cannot extract terms from
- all queries (for example, the `wildcard` or `geo_shape` query), and as a result, in certain cases the percolator
- can't apply this selection optimization (for example, if an unsupported query is defined in a required clause of a boolean query
- or the unsupported query is the only query in the percolator document). These queries are marked by the percolator and
- can be found by running the following search:
- [source,console]
- ---------------------------------------------------
- GET /_search
- {
- "query": {
- "term" : {
- "query.extraction_result" : "failed"
- }
- }
- }
- ---------------------------------------------------
- NOTE: The above example assumes that there is a `query` field of type
- `percolator` in the mappings.
- Given the design of percolation, it often makes sense to use separate indices for the percolate queries and the documents
- being percolated, as opposed to a single index as we do in the examples. There are a few benefits to this approach:
- - Because percolate queries contain a different set of fields from the percolated documents, using two separate indices
- allows for fields to be stored in a denser, more efficient way.
- - Percolate queries do not scale in the same way as other queries, so percolation performance may benefit from using
- a different index configuration, like the number of primary shards.
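- A two-index setup along these lines could look like the following sketch. The index names `my-queries` and `my-documents`
- and the single primary shard for the queries index are assumptions made for this example, not recommendations. The stored
- document is percolated through the `index` and `id` parameters described earlier:
- [source,console]
- --------------------------------------------------
- PUT /my-queries
- {
- "settings": {
- "number_of_shards": 1
- },
- "mappings": {
- "properties": {
- "message": {
- "type": "text"
- },
- "query": {
- "type": "percolator"
- }
- }
- }
- }
- PUT /my-documents
- {
- "mappings": {
- "properties": {
- "message": {
- "type": "text"
- }
- }
- }
- }
- PUT /my-queries/_doc/1?refresh
- {
- "query" : {
- "match" : {
- "message" : "bonsai tree"
- }
- }
- }
- PUT /my-documents/_doc/1?refresh
- {
- "message" : "A new bonsai tree in the office"
- }
- GET /my-queries/_search
- {
- "query" : {
- "percolate" : {
- "field" : "query",
- "index" : "my-documents",
- "id" : "1"
- }
- }
- }
- --------------------------------------------------
- The queries index still maps the `message` field, because fields referenced by a stored query must exist in the mapping
- of the index that holds the `percolator` field. The search runs against `my-queries`, while the document to percolate is fetched from `my-documents`.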