|
@@ -1,27 +1,27 @@
|
|
|
[[search-percolate]]
|
|
|
== Percolator
|
|
|
|
|
|
-Traditionally you design documents based on your data and store them into an index and then define queries via the search api
|
|
|
-in order to retrieve these documents. The percolator works in the opposite direction, first you store queries into an
|
|
|
-index and then via the percolate api you define documents in order to retrieve these queries.
|
|
|
+Traditionally you design documents based on your data, store them into an index, and then define queries via the search API
|
|
|
+in order to retrieve these documents. The percolator works in the opposite direction. First you store queries into an
|
|
|
+index and then, via the percolate API, you define documents in order to retrieve these queries.
|
|
|
|
|
|
The reason that queries can be stored comes from the fact that in Elasticsearch both documents and queries are defined in
|
|
|
-JSON. This allows you to embed queries into documents via the index api. Elasticsearch can extract the query from a
|
|
|
-document and make it available to the percolate api. Since documents are also defined as json, you can define a document
|
|
|
-in a request to the percolate api.
|
|
|
+JSON. This allows you to embed queries into documents via the index API. Elasticsearch can extract the query from a
|
|
|
+document and make it available to the percolate API. Since documents are also defined as JSON, you can define a document
|
|
|
+in a request to the percolate API.
|
|
|
|
|
|
The percolator and most of its features work in realtime, so once a percolate query is indexed it can immediately be used
|
|
|
-in the percolate api.
|
|
|
+in the percolate API.
|
|
|
|
|
|
[IMPORTANT]
|
|
|
=====================================
|
|
|
|
|
|
-Field referred to in a percolator query must *already* exist in the mapping
|
|
|
+Fields referred to in a percolator query must *already* exist in the mapping
|
|
|
associated with the index used for percolation.
|
|
|
There are two ways to make sure that a field mapping exists:
|
|
|
|
|
|
* Add or update a mapping via the <<indices-create-index,create index>> or
|
|
|
- <<indices-put-mapping,put mapping>> apis.
|
|
|
+ <<indices-put-mapping,put mapping>> APIs.
|
|
|
* Percolate a document before registering a query. Percolating a document can
|
|
|
add field mappings dynamically, in the same way as happens when indexing a
|
|
|
document.
|
|
@@ -29,7 +29,7 @@ There are two ways to make sure that a field mapping exist:
|
|
|
=====================================
|
|
|
|
|
|
[float]
|
|
|
-=== Sample usage
|
|
|
+=== Sample Usage
|
|
|
|
|
|
Create an index with a mapping for the field `message`:
|
|
|
|
|
@@ -96,10 +96,10 @@ The above request will yield the following response:
|
|
|
<1> The percolate query with id `1` matches our document.
|
|
|
|
|
|
[float]
|
|
|
-=== Indexing percolator queries
|
|
|
+=== Indexing Percolator Queries
|
|
|
|
|
|
Percolate queries are stored as documents in a specific format and in an arbitrary index under a reserved type with the
|
|
|
-name `.percolator`. The query itself is placed as is in a json object under the top level field `query`.
|
|
|
+name `.percolator`. The query itself is placed as-is in a JSON object under the top-level field `query`.
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
@@ -127,11 +127,11 @@ percolate documents by specific queries.
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
|
|
|
-On top of this also a mapping type can be associated with this query. This allows to control how certain queries
|
|
|
-like range queries, shape filters and other query & filters that rely on mapping settings get constructed. This is
|
|
|
+On top of this, a mapping type can also be associated with this query. This allows you to control how certain queries
|
|
|
+like range queries, shape filters, and other queries and filters that rely on mapping settings get constructed. This is
|
|
|
important since the percolate queries are indexed into the `.percolator` type, and the queries / filters that rely on
|
|
|
-mapping settings would yield unexpected behaviour. Note by default field names do get resolved in a smart manner,
|
|
|
-but in certain cases with multiple types this can lead to unexpected behaviour, so being explicit about it will help.
|
|
|
+mapping settings would yield unexpected behavior. Note: By default, field names do get resolved in a smart manner,
|
|
|
+but in certain cases with multiple types this can lead to unexpected behavior, so being explicit about it will help.
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
@@ -149,11 +149,11 @@ but in certain cases with multiple types this can lead to unexpected behaviour,
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
|
|
|
-In the above example the range query gets really parsed into a Lucene numeric range query, based on the settings for
|
|
|
+In the above example the range query really gets parsed into a Lucene numeric range query, based on the settings for
|
|
|
the field `created_at` in the type `tweet`.
|
|
|
|
|
|
-Just as with any other type, the `.percolator` type has a mapping, which you can configure via the mappings apis.
|
|
|
-The default percolate mapping doesn't index the query field and only stores it.
|
|
|
+Just as with any other type, the `.percolator` type has a mapping, which you can configure via the mappings APIs.
|
|
|
+The default percolate mapping doesn't index the query field, only stores it.
|
|
|
|
|
|
Because `.percolator` is a type it also has a mapping. By default the following mapping is active:
|
|
|
|
|
@@ -171,9 +171,9 @@ Because `.percolate` is a type it also has a mapping. By default the following m
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
|
|
|
-If needed this mapping can be modified with the update mapping api.
|
|
|
+If needed, this mapping can be modified with the update mapping API.
|
|
|
|
|
|
-In order to un-register a percolate query the delete api can be used. So if the previous added query needs to be deleted
|
|
|
+In order to unregister a percolate query, the delete API can be used. So if the previously added query needs to be deleted
|
|
|
the following delete request needs to be executed:
|
|
|
|
|
|
[source,js]
|
|
@@ -182,14 +182,14 @@ curl -XDELETE localhost:9200/my-index/.percolator/1
|
|
|
--------------------------------------------------
|
|
|
|
|
|
[float]
|
|
|
-=== Percolate api
|
|
|
+=== Percolate API
|
|
|
|
|
|
-The percolate api executes in a distributed manner, meaning it executes on all shards an index points to.
|
|
|
+The percolate API executes in a distributed manner, meaning it executes on all shards an index points to.
|
|
|
|
|
|
.Required options
|
|
|
* `index` - The index that contains the `.percolator` type. This can also be an alias.
|
|
|
* `type` - The type of the document to be percolated. The mapping of that type is used to parse the document.
|
|
|
-* `doc` - The actual document to percolate. Unlike the other two options this needs to be specified in the request body. Note this isn't required when percolating an existing document.
|
|
|
+* `doc` - The actual document to percolate. Unlike the other two options this needs to be specified in the request body. Note: This isn't required when percolating an existing document.
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
@@ -206,8 +206,8 @@ curl -XGET 'localhost:9200/twitter/tweet/_percolate' -d '{
|
|
|
that the percolate request only gets executed on the shard where the routing value is partitioned to. This means that
|
|
|
the percolate request only gets executed on one shard instead of all shards. Multiple values can be specified as a
|
|
|
comma-separated string; in that case the request can be executed on more than one shard.
|
|
|
-* `preference` - Controls which shard replicas are preferred to execute the request on. Works the same as in the search api.
|
|
|
-* `ignore_unavailable` - Controls if missing concrete indices should silently be ignored. Same as is in the search api.
|
|
|
+* `preference` - Controls which shard replicas are preferred to execute the request on. Works the same as in the search API.
|
|
|
+* `ignore_unavailable` - Controls if missing concrete indices should silently be ignored. Same as in the search API.
|
|
|
* `percolate_format` - If `ids` is specified then the matches array in the percolate response will contain a string
|
|
|
array of the matching ids instead of an array of objects. This can be useful to reduce the amount of data being sent
|
|
|
back to the client. Obviously, if there are two percolator queries with the same id from different indices, there is no way
|
|
@@ -223,7 +223,7 @@ occurred for the filter to included the latest percolate queries.
|
|
|
* `track_scores` - Whether the `_score` is included for each match. The `_score` is based on the query and represents
|
|
|
how the query matched the *percolate query's metadata*, *not* how the document (that is being percolated) matched
|
|
|
the query. The `query` option is required for this option. Defaults to `false`.
|
|
|
-* `sort` - Define a sort specification like in the search api. Currently only sorting `_score` reverse (default relevancy)
|
|
|
+* `sort` - Define a sort specification like in the search API. Currently only sorting `_score` reverse (default relevancy)
|
|
|
is supported. Other sort fields will throw an exception. The `size` and `query` option are required for this setting. Like
|
|
|
`track_scores`, the score is based on the query and represents how the query matched to the percolate query's metadata
|
|
|
and *not* how the document being percolated matched to the query.
|
|
@@ -232,23 +232,23 @@ look at the aggregation documentation on how to define aggregations.
|
|
|
* `highlight` - Allows highlight definitions to be included. The document being percolated is being highlighted for each
|
|
|
matching query. This allows you to see how each match is highlighting the document being percolated. See highlight
|
|
|
documentation on how to define highlights. The `size` option is required for highlighting; the performance of highlighting
|
|
|
- in the percolate api depends of how many matches are being highlighted.
|
|
|
+ in the percolate API depends on how many matches are being highlighted.
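
The option list above states several dependencies between options: `track_scores` needs `query`, `sort` needs both `size` and `query`, and `highlight` needs `size`. As a rough illustration, these rules could be checked on the client before sending a request. The helper name `check_percolate_options` and the sample request body are hypothetical and not part of Elasticsearch; this is only a sketch of the rules as described in this section.

```python
def check_percolate_options(body, existing_document=False):
    """Return a list of problems found in a percolate request body (a dict).

    Hypothetical client-side helper: it only encodes the option
    dependencies described in the list above, nothing more.
    """
    problems = []
    # `doc` must be in the body unless an existing document is percolated.
    if not existing_document and "doc" not in body:
        problems.append("`doc` is required")
    # `track_scores` scores matches against `query`, so `query` must be set.
    if body.get("track_scores") and "query" not in body:
        problems.append("`track_scores` requires `query`")
    # `sort` needs both `size` and `query`.
    if "sort" in body:
        for needed in ("size", "query"):
            if needed not in body:
                problems.append("`sort` requires `%s`" % needed)
    # `highlight` needs `size`.
    if "highlight" in body and "size" not in body:
        problems.append("`highlight` requires `size`")
    return problems


# Illustrative request body (field values are made up for this example).
request = {
    "doc": {"message": "some text"},
    "sort": "_score",
    "highlight": {"fields": {"message": {}}},
}
problems = check_percolate_options(request)
for problem in problems:
    print(problem)
```

The server remains the authority on what is valid; a check like this only surfaces the documented dependencies earlier.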
|
|
|
|
|
|
[float]
|
|
|
-=== Dedicated percolator index
|
|
|
+=== Dedicated Percolator Index
|
|
|
|
|
|
Percolate queries can be added to any index. Instead of adding percolate queries to the index the data resides in,
|
|
|
these queries can also be added to a dedicated index. The advantage of this is that this dedicated percolator index
|
|
|
-can have its own index settings (For example the number of primary and replicas shards). If you choose to have a dedicated
|
|
|
+can have its own index settings (for example, the number of primary and replica shards). If you choose to have a dedicated
|
|
|
percolate index, you need to make sure that the mappings from the normal index are also available on the percolate index.
|
|
|
Otherwise percolate queries can be parsed incorrectly.
|
|
|
|
|
|
[float]
|
|
|
=== Filtering Executed Queries
|
|
|
|
|
|
-Filtering allows to reduce the number of queries, any filter that the search api supports, (expect the ones mentioned in important notes)
|
|
|
-can also be used in the percolate api. The filter only works on the metadata fields. The `query` field isn't indexed by
|
|
|
-default. Based on the query we indexed before the following filter can be defined:
|
|
|
+Filtering allows you to reduce the number of queries. Any filter that the search API supports (except the ones mentioned in important notes)
|
|
|
+can also be used in the percolate API. The filter only works on the metadata fields. The `query` field isn't indexed by
|
|
|
+default. Based on the query we indexed before, the following filter can be defined:
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
@@ -265,9 +265,9 @@ curl -XGET localhost:9200/test/type1/_percolate -d '{
|
|
|
--------------------------------------------------
|
|
|
|
|
|
[float]
|
|
|
-=== Percolator count api
|
|
|
+=== Percolator Count API
|
|
|
|
|
|
-The count percolate api, only keeps track of the number of matches and doesn't keep track of the actual matches
|
|
|
+The count percolate API only keeps track of the number of matches and doesn't keep track of the actual matches.
|
|
|
Example:
|
|
|
|
|
|
[source,js]
|
|
@@ -291,10 +291,10 @@ Response:
|
|
|
|
|
|
|
|
|
[float]
|
|
|
-=== Percolating an existing document
|
|
|
+=== Percolating an Existing Document
|
|
|
|
|
|
-In order to percolate in newly indexed document, the percolate existing document can be used. Based on the response
|
|
|
-from an index request the `_id` and other meta information can be used to immediately percolate the newly added
|
|
|
+In order to percolate a newly indexed document, the percolate existing document feature can be used. Based on the response
|
|
|
+from an index request, the `_id` and other meta information can be used to immediately percolate the newly added
|
|
|
document.
|
|
|
|
|
|
.Supported options for percolating an existing document on top of existing percolator options:
|
|
@@ -307,8 +307,8 @@ document.
|
|
|
* `percolate_preference` - Which shard to prefer when executing the percolate request.
|
|
|
* `version` - Enables a version check. If the fetched document's version isn't equal to the specified version then the request fails with a version conflict and the percolation request is aborted.
|
|
|
|
|
|
-Internally the percolate api will issue a get request for fetching the`_source` of the document to percolate.
|
|
|
-For this feature to work the `_source` for documents to be percolated need to be stored.
|
|
|
+Internally the percolate API will issue a GET request for fetching the `_source` of the document to percolate.
|
|
|
+For this feature to work, the `_source` for documents to be percolated needs to be stored.
|
|
|
|
|
|
[float]
|
|
|
==== Example
|
|
@@ -326,20 +326,20 @@ Index response:
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
|
|
|
-Percolating an existing document:
|
|
|
+Percolating an existing document:
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
curl -XGET 'localhost:9200/my-index1/message/1/_percolate'
|
|
|
--------------------------------------------------
|
|
|
|
|
|
-The response is the same as with the regular percolate api.
|
|
|
+The response is the same as with the regular percolate API.
|
|
|
|
|
|
[float]
|
|
|
-=== Multi percolate api
|
|
|
+=== Multi Percolate API
|
|
|
|
|
|
-The multi percolate api allows to bundle multiple percolate requests into a single request, similar to what the multi
|
|
|
-search api does to search requests. The request body format is line based. Each percolate request item takes two lines,
|
|
|
+The multi percolate API allows you to bundle multiple percolate requests into a single request, similar to what the multi
|
|
|
+search API does to search requests. The request body format is line based. Each percolate request item takes two lines:
|
|
|
the first line is the header and the second line is the body.
|
|
|
|
|
|
The header can contain any parameter that normally would be set via the request path or query string parameters.
|
|
@@ -369,7 +369,7 @@ Request:
|
|
|
curl -XGET 'localhost:9200/twitter/tweet/_mpercolate' --data-binary @requests.txt; echo
|
|
|
--------------------------------------------------
|
|
|
|
|
|
-The index twitter is the default index and the type tweet is the default type and will be used in the case a header
|
|
|
+The index `twitter` is the default index and the type `tweet` is the default type; they will be used when a header
|
|
|
doesn't specify an index or type.
|
|
|
|
|
|
requests.txt:
|
|
@@ -388,7 +388,7 @@ requests.txt:
|
|
|
{}
|
|
|
--------------------------------------------------
|
|
|
|
|
|
-For a percolate existing document item (headers with the `id` field), the response can be an empty json object.
|
|
|
+For a percolate existing document item (headers with the `id` field), the request body can be an empty JSON object.
|
|
|
All the required options are set in the header.
|
|
|
|
|
|
Response:
|
|
@@ -473,22 +473,22 @@ Each item represents a percolate response, the order of the items maps to the or
|
|
|
were specified. In case a percolate request failed, the item response is substituted with an error message.
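
The line-based request format described in this section (one header line followed by one body line per item) can be generated programmatically. The sketch below is one way to do that. The function name `build_mpercolate_body` is hypothetical, and the header shape `{"percolate": {...}}` is an assumption about the action name, since this excerpt does not show it.

```python
import json


def build_mpercolate_body(items):
    """Serialize (header, body) pairs into a line-based request body.

    Each item becomes two lines: the header, then the body. The result
    ends with a newline so the last line is terminated.
    """
    lines = []
    for header, body in items:
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"


items = [
    # Regular item: the document to percolate goes in the body line.
    # The "percolate" action key is an assumption, not shown in this text.
    ({"percolate": {"index": "twitter", "type": "tweet"}},
     {"doc": {"message": "some text"}}),
    # Existing-document item: the header carries the `id`, so the body
    # line can be an empty JSON object.
    ({"percolate": {"index": "twitter", "type": "tweet", "id": "1"}},
     {}),
]
payload = build_mpercolate_body(items)
print(payload, end="")
```

The resulting text could be written to a file such as `requests.txt` and sent with `--data-binary`, as in the request shown earlier in this section.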
|
|
|
|
|
|
[float]
|
|
|
-=== How it works under the hood
|
|
|
+=== How It Works Under the Hood
|
|
|
|
|
|
-When indexing a document that contains a query in an index and the `.percolator` type the query part of the documents gets
|
|
|
+When indexing a document that contains a query in an index and the `.percolator` type, the query part of the document gets
|
|
|
parsed into a Lucene query and is kept in memory until that percolator document is removed or the index containing the
|
|
|
-`.percolator` type get removed. So all the active percolator queries are kept in memory.
|
|
|
+`.percolator` type gets removed. So, all the active percolator queries are kept in memory.
|
|
|
|
|
|
-At percolate time the document specified in the request gets parsed into a Lucene document and is stored in a in-memory
|
|
|
+At percolate time, the document specified in the request gets parsed into a Lucene document and is stored in an in-memory
|
|
|
Lucene index. This in-memory index can just hold this one document and it is optimized for that. Then all the queries
|
|
|
-that are registered to the index that the percolate request is targeted for are going to be executed on this single document
|
|
|
+that are registered to the index that the percolate request is targeted for are going to be executed on this single document
|
|
|
in-memory index. This happens on each shard the percolate request needs to execute.
|
|
|
|
|
|
By using the `routing`, `filter` or `query` features, the number of queries that need to be executed can be reduced and thus
|
|
|
-the time the percolate api needs to run can be decreased.
|
|
|
+the time the percolate API needs to run can be decreased.
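
The flow described above (queries kept in memory, each one run against a single incoming document, with an optional metadata filter to skip queries) can be pictured with a small toy model. This is not how Elasticsearch or Lucene implement it. The class `ToyPercolator`, its method names, and the use of plain Python predicates in place of parsed Lucene queries are all assumptions made for illustration.

```python
class ToyPercolator:
    """Toy in-memory model of percolation; not the real implementation."""

    def __init__(self):
        # query id -> (predicate over a document, metadata dict)
        self.queries = {}

    def register(self, query_id, predicate, metadata=None):
        # Registered queries stay in memory until they are unregistered.
        self.queries[query_id] = (predicate, metadata or {})

    def unregister(self, query_id):
        self.queries.pop(query_id, None)

    def percolate(self, doc, metadata_filter=None):
        """Run every registered query against the single document `doc`."""
        matches = []
        for query_id, (predicate, metadata) in self.queries.items():
            # A metadata filter reduces the number of queries executed.
            if metadata_filter is not None and not metadata_filter(metadata):
                continue
            if predicate(doc):
                matches.append(query_id)
        return matches


percolator = ToyPercolator()
percolator.register("1", lambda d: "bonsai" in d["message"].split(), {"color": "blue"})
percolator.register("2", lambda d: "maple" in d["message"].split(), {"color": "green"})

doc = {"message": "A new bonsai tree in the office"}
all_matches = percolator.percolate(doc)
green_only = percolator.percolate(doc, lambda m: m.get("color") == "green")
print(all_matches, green_only)
```

In this model the cost of a percolate call grows with the number of registered queries, which is why skipping queries through a filter shortens the run.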
|
|
|
|
|
|
[float]
|
|
|
-=== Important notes
|
|
|
+=== Important Notes
|
|
|
|
|
|
Because the percolator API is processing one document at a time, it doesn't support queries and filters that run
|
|
|
against child documents such as `has_child`, `has_parent` and `top_children`.
|
|
@@ -497,16 +497,16 @@ The `wildcard` and `regexp` query natively use a lot of memory and because the p
|
|
|
this can easily take up the available memory in the heap space. If possible try to use a `prefix` query or ngramming to
|
|
|
achieve the same result (with way less memory being used).
|
|
|
|
|
|
-The delete-by-query api doesn't work to unregister a query, it only deletes the percolate documents from disk. In order
|
|
|
+The delete-by-query API doesn't work to unregister a query; it only deletes the percolate documents from disk. In order
|
|
|
to update the registered queries in memory, the index needs to be closed and reopened.
|
|
|
|
|
|
[float]
|
|
|
-=== Forcing unmapped fields to be handled as string
|
|
|
+=== Forcing Unmapped Fields to be Handled as Strings
|
|
|
|
|
|
-In certain cases it is unknown what kind of percolator queries do get registered and if no field mapping exist for fields
|
|
|
+In certain cases it is unknown what kind of percolator queries do get registered, and if no field mapping exists for fields
|
|
|
that are referred to by percolator queries, then adding a percolator query fails. This means the mapping needs to be updated
|
|
|
-to have the field with the appropriate settings and then the percolator query can be added. But sometimes it is sufficient
|
|
|
+to have the field with the appropriate settings, and then the percolator query can be added. But sometimes it is sufficient
|
|
|
if all unmapped fields are handled as if these were default string fields. In those cases one can configure the
|
|
|
`index.percolator.map_unmapped_fields_as_string` setting to `true` (defaults to `false`), and then if a field referred to in
|
|
|
-a percolator query does not exist, it will be handled as a default string field, so adding the percolator query doesn't
|
|
|
+a percolator query does not exist, it will be handled as a default string field so that adding the percolator query doesn't
|
|
|
fail.
|