123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298 |
- [[query-dsl-sparse-vector-query]]
- === Sparse vector query
- ++++
- <titleabbrev>Sparse vector</titleabbrev>
- ++++
- The sparse vector query executes a query consisting of sparse vectors, such as built by a learned sparse retrieval model.
- This can be achieved with one of two strategies:
- - Using an {nlp} model to convert query text into a list of token-weight pairs
- - Sending in precalculated token-weight pairs as query vectors
- These token-weight pairs are then used in a query against a <<sparse-vector,sparse vector>>
- or a <<semantic-text, semantic_text>> field with a compatible sparse inference model.
- At query time, query vectors are calculated using the same inference model that was used to create the tokens.
- When querying, these query vectors are ORed together with their respective weights, which means scoring is effectively a <<vector-functions-dot-product,dot product>> calculation between stored dimensions and query dimensions.
- For example, a stored vector `{"feature_0": 0.12, "feature_1": 1.2, "feature_2": 3.0}` with query vector `{"feature_0": 2.5, "feature_2": 0.2}` would score the document `_score = 0.12*2.5 + 3.0*0.2 = 0.9`
- [discrete]
- [[sparse-vector-query-ex-request]]
- ==== Example request using an {nlp} model
- [source,console]
- ----
- GET _search
- {
- "query":{
- "sparse_vector": {
- "field": "ml.tokens",
- "inference_id": "the inference ID to produce the token weights",
- "query": "the query string"
- }
- }
- }
- ----
- // TEST[skip: Requires inference]
- [discrete]
- ==== Example request using precomputed vectors
- [source,console]
- ----
- GET _search
- {
- "query":{
- "sparse_vector": {
- "field": "ml.tokens",
- "query_vector": { "token1": 0.5, "token2": 0.3, "token3": 0.2 }
- }
- }
- }
- ----
- // TEST[skip: TBD]
- [discrete]
- [[sparse-vector-field-params]]
- ==== Top level parameters for `sparse_vector`
- `field`::
- (Required, string) The name of the field that contains the token-weight pairs to be searched against.
- `inference_id`::
- (Optional, string)
- The <<inference-apis,inference ID>> to use to convert the query text into token-weight pairs.
- It must be the same inference ID that was used to create the tokens from the input text.
- Only one of `inference_id` and `query_vector` is allowed.
- If `inference_id` is specified, `query` must also be specified.
- If all queried fields are of type <<semantic-text, semantic_text>>, the inference ID associated with the `semantic_text` field will be inferred.
- You can reference a `deployment_id` of a {ml} trained model deployment as an `inference_id`.
- For example, if you download and deploy the ELSER model in the {ml-cap} trained models UI in {kib}, you can use the `deployment_id` of that deployment as the `inference_id`.
- `query`::
- (Optional, string) The query text you want to use for search.
- If `inference_id` is specified, `query` must also be specified.
- If `query_vector` is specified, `query` must not be specified.
- `query_vector`::
- (Optional, dictionary) A dictionary of token-weight pairs representing the precomputed query vector to search.
- Searching using this query vector will bypass additional inference.
- Only one of `inference_id` and `query_vector` is allowed.
- `prune` ::
- (Optional, boolean)
- preview:[]
- Whether to perform pruning, omitting the non-significant tokens from the query to improve query performance.
- If `prune` is true but the `pruning_config` is not specified, pruning will occur but default values will be used.
- Default: false.
- `pruning_config` ::
- (Optional, object)
- preview:[]
- Optional pruning configuration.
- If enabled, this will omit non-significant tokens from the query in order to improve query performance.
- This is only used if `prune` is set to `true`.
- If `prune` is set to `true` but `pruning_config` is not specified, default values will be used.
- +
- --
- Parameters for `pruning_config` are:
- `tokens_freq_ratio_threshold`::
- (Optional, integer)
- preview:[]
- Tokens whose frequency is more than `tokens_freq_ratio_threshold` times the average frequency of all tokens in the specified field are considered outliers and pruned.
- This value must between 1 and 100.
- Default: `5`.
- `tokens_weight_threshold`::
- (Optional, float)
- preview:[]
- Tokens whose weight is less than `tokens_weight_threshold` are considered insignificant and pruned.
- This value must be between 0 and 1.
- Default: `0.4`.
- `only_score_pruned_tokens`::
- (Optional, boolean)
- preview:[]
- If `true` we only input pruned tokens into scoring, and discard non-pruned tokens.
- It is strongly recommended to set this to `false` for the main query, but this can be set to `true` for a rescore query to get more relevant results.
- Default: `false`.
- NOTE: The default values for `tokens_freq_ratio_threshold` and `tokens_weight_threshold` were chosen based on tests using ELSERv2 that provided the most optimal results.
- --
- [discrete]
- [[sparse-vector-query-example]]
- ==== Example ELSER query
- The following is an example of the `sparse_vector` query that references the ELSER model to perform semantic search.
- For a more detailed description of how to perform semantic search by using ELSER and the `sparse_vector` query, refer to <<semantic-search-elser,this tutorial>>.
- [source,console]
- ----
- GET my-index/_search
- {
- "query":{
- "sparse_vector": {
- "field": "ml.tokens",
- "inference_id": "my-elser-model",
- "query": "How is the weather in Jamaica?"
- }
- }
- }
- ----
- // TEST[skip: Requires inference]
- Multiple `sparse_vector` queries can be combined with each other or other query types.
- This can be achieved by wrapping them in <<query-dsl-bool-query, boolean query clauses>> and using linear boosting:
- [source,console]
- ----
- GET my-index/_search
- {
- "query": {
- "bool": {
- "should": [
- {
- "sparse_vector": {
- "field": "ml.inference.title_expanded.predicted_value",
- "inference_id": "my-elser-model",
- "query": "How is the weather in Jamaica?",
- "boost": 1
- }
- },
- {
- "sparse_vector": {
- "field": "ml.inference.description_expanded.predicted_value",
- "inference_id": "my-elser-model",
- "query": "How is the weather in Jamaica?",
- "boost": 1
- }
- },
- {
- "multi_match": {
- "query": "How is the weather in Jamaica?",
- "fields": [
- "title",
- "description"
- ],
- "boost": 4
- }
- }
- ]
- }
- }
- }
- ----
- // TEST[skip: Requires inference]
- This can also be achieved using <<rrf, reciprocal rank fusion (RRF)>>, through an <<rrf-retriever, `rrf` retriever>> with multiple
- <<standard-retriever, `standard` retrievers>>.
- [source,console]
- ----
- GET my-index/_search
- {
- "retriever": {
- "rrf": {
- "retrievers": [
- {
- "standard": {
- "query": {
- "multi_match": {
- "query": "How is the weather in Jamaica?",
- "fields": [
- "title",
- "description"
- ]
- }
- }
- }
- },
- {
- "standard": {
- "query": {
- "sparse_vector": {
- "field": "ml.inference.title_expanded.predicted_value",
- "inference_id": "my-elser-model",
- "query": "How is the weather in Jamaica?",
- "boost": 1
- }
- }
- }
- },
- {
- "standard": {
- "query": {
- "sparse_vector": {
- "field": "ml.inference.description_expanded.predicted_value",
- "inference_id": "my-elser-model",
- "query": "How is the weather in Jamaica?",
- "boost": 1
- }
- }
- }
- }
- ],
- "window_size": 10,
- "rank_constant": 20
- }
- }
- }
- ----
- // TEST[skip: Requires inference]
- [discrete]
- [[sparse-vector-query-with-pruning-config-and-rescore-example]]
- ==== Example ELSER query with pruning configuration and rescore
- The following is an extension to the above example that adds a preview:[] pruning configuration to the `sparse_vector` query.
- The pruning configuration identifies non-significant tokens to prune from the query in order to improve query performance.
- Token pruning happens at the shard level.
- While this should result in the same tokens being labeled as insignificant across shards, this is not guaranteed based on the composition of each shard.
- Therefore, if you are running `sparse_vector` with a `pruning_config` on a multi-shard index, we strongly recommend adding a <<rescore>> function with the tokens that were originally pruned from the query.
- This will help mitigate any shard-level inconsistency with pruned tokens and provide better relevance overall.
- [source,console]
- ----
- GET my-index/_search
- {
- "query":{
- "sparse_vector":{
- "field": "ml.tokens",
- "inference_id": "my-elser-model",
- "query":"How is the weather in Jamaica?",
- "prune": true,
- "pruning_config": {
- "tokens_freq_ratio_threshold": 5,
- "tokens_weight_threshold": 0.4,
- "only_score_pruned_tokens": false
- }
- }
- },
- "rescore": {
- "window_size": 100,
- "query": {
- "rescore_query": {
- "sparse_vector": {
- "field": "ml.tokens",
- "inference_id": "my-elser-model",
- "query": "How is the weather in Jamaica?",
- "prune": true,
- "pruning_config": {
- "tokens_freq_ratio_threshold": 5,
- "tokens_weight_threshold": 0.4,
- "only_score_pruned_tokens": true
- }
- }
- }
- }
- }
- }
- ----
- //TEST[skip: Requires inference]
- NOTE: When performing <<modules-cross-cluster-search, cross-cluster search>>, inference is performed on the local cluster.
|