- [[query-dsl-percolator-query]]
- === Percolator Query
- Traditionally you design documents based on your data, store them in an index, and then define queries via the search API
- in order to retrieve these documents. The percolator works in the opposite direction. First you store queries in an
- index and then you use the `percolator` query to search for the queries that match a specified document (or documents).
- The reason that queries can be stored comes from the fact that in Elasticsearch both documents and queries are defined in
- JSON. This allows you to embed queries into documents via the index API. Elasticsearch can extract the query from a
- document and make it available for search via the `percolator` query. Since documents are also defined as JSON,
- you can define a document in the `percolator` query.
- [IMPORTANT]
- =====================================
- Fields referred to in a percolator query must *already* exist in the mapping
- associated with the index used for percolation. In order to make sure these fields exist,
- add or update a mapping via the <<indices-create-index,create index>> or <<indices-put-mapping,put mapping>> APIs.
- =====================================
- [float]
- === Sample Usage
- Create an index with a mapping for the field `message`:
- [source,js]
- --------------------------------------------------
- curl -XPUT 'localhost:9200/my-index' -d '{
- "mappings": {
- "my-type": {
- "properties": {
- "message": {
- "type": "string"
- }
- }
- }
- }
- }'
- --------------------------------------------------
- Register a query in the percolator:
- [source,js]
- --------------------------------------------------
- curl -XPUT 'localhost:9200/my-index/.percolator/1' -d '{
- "query" : {
- "match" : {
- "message" : "bonsai tree"
- }
- }
- }'
- --------------------------------------------------
- Match a document to the registered percolator queries:
- [source,js]
- --------------------------------------------------
- curl -XGET 'localhost:9200/my-index/_search' -d '{
- "query" : {
- "percolator" : {
- "document_type" : "my-type",
- "document" : {
- "message" : "A new bonsai tree in the office"
- }
- }
- }
- }'
- --------------------------------------------------
- The above request will yield the following response:
- [source,js]
- --------------------------------------------------
- {
- "took": 5,
- "timed_out": false,
- "_shards": {
- "total": 5,
- "successful": 5,
- "failed": 0
- },
- "hits": {
- "total": 1,
- "max_score": 0,
- "hits": [
- { <1>
- "_index": "my-index",
- "_type": ".percolator",
- "_id": "1",
- "_score": 0,
- "_source": {
- "query": {
- "match": {
- "message": "bonsai tree"
- }
- }
- }
- }
- ]
- }
- }
- --------------------------------------------------
- <1> The percolate query with id `1` matches our document.
- [float]
- === Indexing Percolator Queries
- Percolate queries are stored as documents in a specific format and in an arbitrary index under a reserved type with the
- name `.percolator`. The query itself is placed as is in a JSON object under the top level field `query`.
- [source,js]
- --------------------------------------------------
- {
- "query" : {
- "match" : {
- "field" : "value"
- }
- }
- }
- --------------------------------------------------
- Since this is just an ordinary document, any field can be added to it. This can be useful later on to percolate
- documents against only a specific subset of the queries.
- [source,js]
- --------------------------------------------------
- {
- "query" : {
- "match" : {
- "field" : "value"
- }
- },
- "priority" : "high"
- }
- --------------------------------------------------
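As a rough illustration of restricting percolation to a subset of the registered queries, the sketch below builds a search body that wraps the `percolator` query in a `bool` query and filters on the extra `priority` field. The helper name `percolate_with_priority` is hypothetical, and the sketch assumes that `priority` is indexed in a way that lets a `term` filter match the value `high`.

```python
import json


def percolate_with_priority(document_type, document, priority):
    """Build a search body that percolates `document` against only the
    registered queries whose `priority` field equals `priority`.

    Hypothetical helper: combining `percolator` with a `term` filter in a
    `bool` query is an assumption about how the extra field would be used.
    """
    return {
        "query": {
            "bool": {
                "must": {
                    "percolator": {
                        "document_type": document_type,
                        "document": document,
                    }
                },
                # Only queries tagged with this priority are considered.
                "filter": {"term": {"priority": priority}},
            }
        }
    }


body = percolate_with_priority(
    "my-type", {"message": "A new bonsai tree in the office"}, "high"
)
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a `_search` request in the same way as the earlier `curl` examples.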
- Just as with any other type, the `.percolator` type has a mapping, which you can configure via the mappings APIs.
- The default percolate mapping doesn't index the query field, only stores it.
- By default the following mapping is active:
- [source,js]
- --------------------------------------------------
- {
- ".percolator" : {
- "properties" : {
- "query" : {
- "type" : "percolator"
- }
- }
- }
- }
- --------------------------------------------------
- If needed, this mapping can be modified with the update mapping API.
- In order to unregister a percolate query, the delete API can be used. So if the previously added query needs to be deleted,
- the following delete request needs to be executed:
- [source,js]
- --------------------------------------------------
- curl -XDELETE localhost:9200/my-index/.percolator/1
- --------------------------------------------------
- [float]
- ==== Parameters
- The following parameters are required when percolating a document:
- [horizontal]
- `document_type`:: The type / mapping of the document being percolated. This parameter is always required.
- `document`:: The source of the document being percolated.
- Instead of specifying the source of the document being percolated, the source can also be retrieved from an already
- stored document. The `percolator` query will then internally execute a get request to fetch that document.
- In that case the `document` parameter can be substituted with the following parameters:
- [horizontal]
- `index`:: The index the document resides in. This is a required parameter.
- `type`:: The type of the document to fetch. This is a required parameter.
- `id`:: The id of the document to fetch. This is a required parameter.
- `routing`:: Optionally, the routing to be used to fetch the document to percolate.
- `preference`:: Optionally, the preference to be used to fetch the document to percolate.
- `version`:: Optionally, the expected version of the document to be fetched.
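The two parameter sets above can be sketched as a single request builder. The function `percolator_query` and its keyword names `doc_type` and `doc_id` are hypothetical (they avoid shadowing Python built-ins and map to the `type` and `id` parameters); rejecting a call that mixes both forms is a choice made for this sketch, not documented behaviour.

```python
def percolator_query(document_type, document=None, *, index=None,
                     doc_type=None, doc_id=None, routing=None,
                     preference=None, version=None):
    """Build a search body around a `percolator` query, either from an
    inline document or from a reference to an already stored document."""
    query = {"document_type": document_type}
    stored_ref = (index, doc_type, doc_id)

    if document is not None:
        # Inline form: the document source is part of the request.
        if any(value is not None for value in stored_ref):
            raise ValueError("use either `document` or index/type/id, not both")
        query["document"] = document
    else:
        # Stored form: index, type and id are all required.
        if any(value is None for value in stored_ref):
            raise ValueError("index, type and id are required without `document`")
        query.update({"index": index, "type": doc_type, "id": doc_id})
        optional = {"routing": routing, "preference": preference,
                    "version": version}
        # Only include the optional parameters that were actually given.
        query.update({k: v for k, v in optional.items() if v is not None})

    return {"query": {"percolator": query}}


inline = percolator_query("my-type", {"message": "A new bonsai tree"})
stored = percolator_query("my-type", index="my-index", doc_type="message",
                          doc_id="1", version=1)
print(inline)
print(stored)
```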
- [float]
- ==== Dedicated Percolator Index
- Percolate queries can be added to any index. Instead of adding percolate queries to the index the data resides in,
- these queries can also be added to a dedicated index. The advantage of this is that this dedicated percolator index
- can have its own index settings (for example, the number of primary and replica shards). If you choose to have a dedicated
- percolate index, you need to make sure that the mappings from the normal index are also available on the percolate index.
- Otherwise percolate queries can be parsed incorrectly.
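A minimal sketch of preparing a dedicated percolator index: it copies the mappings of the data index into a create-index body and gives the new index its own shard settings. The helper name `dedicated_percolator_index_body` and the shard counts are illustrative assumptions; in practice the mappings would be fetched from the data index rather than written inline.

```python
import copy
import json


def dedicated_percolator_index_body(data_mappings, shards=1, replicas=0):
    """Build a create-index body for a dedicated percolator index that
    reuses the mappings of the data index but has its own settings."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        # Deep copy so later edits to the body do not touch the original.
        "mappings": copy.deepcopy(data_mappings),
    }


# Mapping from the sample usage above.
data_mappings = {
    "my-type": {"properties": {"message": {"type": "string"}}}
}
body = dedicated_percolator_index_body(data_mappings)
print(json.dumps(body, indent=2))
```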
- [float]
- ==== Percolating an Existing Document
- In order to percolate a newly indexed document, the `percolator` query can be used. Based on the response
- from an index request, the `_id` and other meta information can be used to immediately percolate the newly added
- document.
- [float]
- ===== Example
- This example builds on the previous one.
- Index the document we want to percolate:
- [source,js]
- --------------------------------------------------
- curl -XPUT "http://localhost:9200/my-index/message/1" -d'
- {
- "message" : "A new bonsai tree in the office"
- }'
- --------------------------------------------------
- Index response:
- [source,js]
- --------------------------------------------------
- {
- "_index": "my-index",
- "_type": "message",
- "_id": "1",
- "_version": 1,
- "_shards": {
- "total": 2,
- "successful": 1,
- "failed": 0
- },
- "created": true
- }
- --------------------------------------------------
- Percolate the existing document, using the index response as the basis for a new search request:
- [source,js]
- --------------------------------------------------
- curl -XGET "http://localhost:9200/my-index/_search" -d'
- {
- "query" : {
- "percolator" : {
- "document_type" : "my-type",
- "index" : "my-index",
- "type" : "message",
- "id" : "1",
- "version" : 1 <1>
- }
- }
- }'
- --------------------------------------------------
- <1> The version is optional, but useful in certain cases. It ensures that we percolate
- the document we have just indexed. If the document has been changed after we indexed it,
- the search request fails with a version conflict error.
- The search response returned is identical to the one in the previous example.
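The step from index response to search request can be sketched as follows. The function name `percolate_indexed_document` is hypothetical; it only reads the `_index`, `_type`, `_id` and `_version` fields that appear in the index response shown above.

```python
def percolate_indexed_document(index_response, document_type):
    """Turn an index response into a search body that percolates the
    document that was just indexed, pinned to the returned version."""
    return {
        "query": {
            "percolator": {
                "document_type": document_type,
                "index": index_response["_index"],
                "type": index_response["_type"],
                "id": index_response["_id"],
                # Pinning the version makes the request fail if the
                # document changed after it was indexed.
                "version": index_response["_version"],
            }
        }
    }


index_response = {
    "_index": "my-index",
    "_type": "message",
    "_id": "1",
    "_version": 1,
    "created": True,
}
search_body = percolate_indexed_document(index_response, "my-type")
print(search_body)
```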
- [float]
- ==== Percolator and highlighting
- The percolator query is handled in a special way when it comes to highlighting. The hits of the percolator queries are used
- to highlight the document that is provided in the `percolator` query, whereas with regular highlighting the query in
- the search request is used to highlight the hits.
- [float]
- ===== Example
- This example is based on the mapping of the first example.
- Add a percolator query:
- [source,js]
- --------------------------------------------------
- curl -XPUT "http://localhost:9200/my-index/.percolator/1" -d'
- {
- "query" : {
- "match" : {
- "message" : "brown fox"
- }
- }
- }'
- --------------------------------------------------
- Add another percolator query:
- [source,js]
- --------------------------------------------------
- curl -XPUT "http://localhost:9200/my-index/.percolator/2" -d'
- {
- "query" : {
- "match" : {
- "message" : "lazy dog"
- }
- }
- }'
- --------------------------------------------------
- Execute a search request with `percolator` and highlighting enabled:
- [source,js]
- --------------------------------------------------
- curl -XGET "http://localhost:9200/my-index/_search" -d'
- {
- "query" : {
- "percolator" : {
- "document_type" : "my-type",
- "document" : {
- "message" : "The quick brown fox jumps over the lazy dog"
- }
- }
- },
- "highlight": {
- "fields": {
- "message": {}
- }
- }
- }'
- --------------------------------------------------
- This will yield the following response:
- [source,js]
- --------------------------------------------------
- {
- "took": 14,
- "timed_out": false,
- "_shards": {
- "total": 5,
- "successful": 5,
- "failed": 0
- },
- "hits": {
- "total": 2,
- "max_score": 0,
- "hits": [
- {
- "_index": "my-index",
- "_type": ".percolator",
- "_id": "2",
- "_score": 0,
- "_source": {
- "query": {
- "match": {
- "message": "lazy dog"
- }
- }
- },
- "highlight": {
- "message": [
- "The quick brown fox jumps over the <em>lazy</em> <em>dog</em>" <1>
- ]
- }
- },
- {
- "_index": "my-index",
- "_type": ".percolator",
- "_id": "1",
- "_score": 0,
- "_source": {
- "query": {
- "match": {
- "message": "brown fox"
- }
- }
- },
- "highlight": {
- "message": [
- "The quick <em>brown</em> <em>fox</em> jumps over the lazy dog" <1>
- ]
- }
- }
- ]
- }
- }
- --------------------------------------------------
- <1> Instead of the query in the search request highlighting the percolator hits, the percolator queries are highlighting
- the document defined in the `percolator` query.
- [float]
- ==== How it Works Under the Hood
- When a document that contains a query is indexed under the `.percolator` type, the query part of the document gets
- parsed into a Lucene query and is kept in memory until that percolator document is removed or the index containing the
- `.percolator` type gets removed. So, all the active percolator queries are kept in memory.
- At search time, the document specified in the request gets parsed into a Lucene document and is stored in an in-memory
- Lucene index. This in-memory index can hold just this one document and is optimized for that. Then all the queries
- that are registered to the index that the search request targets are executed on this single-document
- in-memory index. This happens on each shard the search request needs to execute on.
- By using `routing` or additional queries, the number of percolator queries that need to be executed can be reduced, and thus
- the time the search API needs to run can be decreased.
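The flow described above can be modelled with a small toy in plain Python. This is only a conceptual sketch, not how Lucene or Elasticsearch implement it: the class `ToyPercolator` is hypothetical, queries are reduced to "any of these terms in this field" (roughly a `match` query with its default OR behaviour), and analysis is reduced to lowercasing and splitting on word characters.

```python
import re


def tokenize(text):
    """Very rough stand-in for analysis: lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


class ToyPercolator:
    def __init__(self):
        # Registered queries stay in memory: id -> (field, set of terms).
        self.queries = {}

    def register(self, query_id, field, text):
        self.queries[query_id] = (field, tokenize(text))

    def unregister(self, query_id):
        self.queries.pop(query_id, None)

    def percolate(self, document):
        # Build a throwaway single-document "index": field -> tokens.
        index = {field: tokenize(value) for field, value in document.items()}
        # Run every registered query against that one document.
        return sorted(
            query_id
            for query_id, (field, terms) in self.queries.items()
            if terms & index.get(field, set())
        )


percolator = ToyPercolator()
percolator.register("1", "message", "brown fox")
percolator.register("2", "message", "lazy dog")
percolator.register("3", "message", "bonsai tree")
matches = percolator.percolate(
    {"message": "The quick brown fox jumps over the lazy dog"}
)
print(matches)  # ['1', '2']
```

Because every registered query is evaluated against the document, the cost of one `percolate` call in this toy grows with the number of registered queries, which is why narrowing the candidate set helps.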
- [float]
- ==== Important Notes
- Because the percolator query is processing one document at a time, it doesn't support queries and filters that run
- against child documents such as `has_child` and `has_parent`.
- The percolator doesn't work with queries like `template` and `geo_shape` queries when these queries fetch documents
- to substitute parts of the query. The reason is that the percolator stores the query terms during indexing in order to
- speed up percolating in certain cases, and this doesn't work if part of the query is defined in another document.
- There is no way for the percolator to know whether an external document has changed, and even if there were, the
- percolator query would have to be reindexed.
- The `wildcard` and `regexp` queries natively use a lot of memory, and because the percolator keeps the queries in memory,
- this can easily take up the available memory in the heap space. If possible, try to use a `prefix` query or ngramming to
- achieve the same result (with far less memory being used).
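One way to follow this advice when registering queries is to rewrite simple trailing-wildcard patterns as `prefix` queries before they are stored. The helper `prefer_prefix` below is a hypothetical sketch; it only handles the case where the sole wildcard character is a single trailing `*`, and leaves every other pattern as a `wildcard` query.

```python
def prefer_prefix(field, pattern):
    """Return a `prefix` query for patterns like `bons*`, and fall back to
    a `wildcard` query for anything more complex."""
    stem = pattern[:-1]
    is_simple_prefix = (
        len(pattern) > 1
        and pattern.endswith("*")
        and "*" not in stem
        and "?" not in stem
    )
    if is_simple_prefix:
        return {"prefix": {field: stem}}
    return {"wildcard": {field: pattern}}


print(prefer_prefix("message", "bons*"))   # {'prefix': {'message': 'bons'}}
print(prefer_prefix("message", "b?ns*i"))  # {'wildcard': {'message': 'b?ns*i'}}
```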
- [float]
- ==== Forcing Unmapped Fields to be Handled as Strings
- In certain cases it is unknown what kind of percolator queries will be registered, and if no field mapping exists for fields
- that are referred to by percolator queries, then adding a percolator query fails. This means the mapping needs to be updated
- to have the field with the appropriate settings, and then the percolator query can be added. But sometimes it is sufficient
- if all unmapped fields are handled as if they were default string fields. In those cases one can configure the
- `index.percolator.map_unmapped_fields_as_string` setting to `true` (defaults to `false`). Then, if a field referred to in
- a percolator query does not exist, it is handled as a default string field, so that adding the percolator query doesn't
- fail.
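As a small sketch, the setting could be included in the body of a create-index request. The helper name `create_index_body` is hypothetical, and supplying the setting at index creation time is an assumption of this example.

```python
import json


def create_index_body(map_unmapped_fields_as_string=True):
    """Build a create-index body that enables (or disables) treating
    unmapped fields in percolator queries as default string fields."""
    return {
        "settings": {
            "index.percolator.map_unmapped_fields_as_string":
                map_unmapped_fields_as_string
        }
    }


settings_body = create_index_body()
print(json.dumps(settings_body, indent=2))
```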