123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235 |
- [[query-dsl-feature-query]]
- === Feature Query
- The `feature` query is a specialized query that only works on
- <<feature,`feature`>> fields and <<feature-vector,`feature_vector`>> fields.
- Its goal is to boost the score of documents based on the values of numeric
- features. It is typically put in a `should` clause of a
- <<query-dsl-bool-query,`bool`>> query so that its score is added to the score
- of the query.
- Compared to using <<query-dsl-function-score-query,`function_score`>> or other
- ways to modify the score, this query has the benefit of being able to
- efficiently skip non-competitive hits when
- <<search-uri-request,`track_total_hits`>> is set to `false`. Speedups may be
- spectacular.
- Here is an example that indexes various features:
- - https://en.wikipedia.org/wiki/PageRank[`pagerank`], a measure of the
- importance of a website,
- - `url_length`, the length of the url, which typically correlates negatively
- with relevance,
- - `topics`, which associates a list of topics with every document alongside a
- measure of how well the document is connected to this topic.
- Then the example includes an example query that searches for `"2016"` and boosts
- based or `pagerank`, `url_length` and the `sports` topic.
- [source,js]
- --------------------------------------------------
- PUT test
- {
- "mappings": {
- "_doc": {
- "properties": {
- "pagerank": {
- "type": "feature"
- },
- "url_length": {
- "type": "feature",
- "positive_score_impact": false
- },
- "topics": {
- "type": "feature_vector"
- }
- }
- }
- }
- }
- PUT test/_doc/1
- {
- "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
- "content": "Rio 2016",
- "pagerank": 50.3,
- "url_length": 42,
- "topics": {
- "sports": 50,
- "brazil": 30
- }
- }
- PUT test/_doc/2
- {
- "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
- "content": "Formula One motor race held on 13 November 2016 at the Autódromo José Carlos Pace in São Paulo, Brazil",
- "pagerank": 50.3,
- "url_length": 47,
- "topics": {
- "sports": 35,
- "formula one": 65,
- "brazil": 20
- }
- }
- PUT test/_doc/3
- {
- "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
- "content": "Deadpool is a 2016 American superhero film",
- "pagerank": 50.3,
- "url_length": 37,
- "topics": {
- "movies": 60,
- "super hero": 65
- }
- }
- POST test/_refresh
- GET test/_search
- {
- "query": {
- "bool": {
- "must": [
- {
- "match": {
- "content": "2016"
- }
- }
- ],
- "should": [
- {
- "feature": {
- "field": "pagerank"
- }
- },
- {
- "feature": {
- "field": "url_length",
- "boost": 0.1
- }
- },
- {
- "feature": {
- "field": "topics.sports",
- "boost": 0.4
- }
- }
- ]
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- [float]
- === Supported functions
- The `feature` query supports 3 functions in order to boost scores using the
- values of features. If you do not know where to start, we recommend that you
- start with the `saturation` function, which is the default when no function is
- provided.
- [float]
- ==== Saturation
- This function gives a score that is equal to `S / (S + pivot)` where `S` is the
- value of the feature and `pivot` is a configurable pivot value so that the
- result will be less than +0.5+ if `S` is less than pivot and greater than +0.5+
- otherwise. Scores are always is +(0, 1)+.
- If the feature has a negative score impact then the function will be computed as
- `pivot / (S + pivot)`, which decreases when `S` increases.
- [source,js]
- --------------------------------------------------
- GET test/_search
- {
- "query": {
- "feature": {
- "field": "pagerank",
- "saturation": {
- "pivot": 8
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
- If +pivot+ is not supplied then Elasticsearch will compute a default value that
- will be approximately equal to the geometric mean of all feature values that
- exist in the index. We recommend this if you haven't had the opportunity to
- train a good pivot value.
- [source,js]
- --------------------------------------------------
- GET test/_search
- {
- "query": {
- "feature": {
- "field": "pagerank",
- "saturation": {}
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
- [float]
- ==== Logarithm
- This function gives a score that is equal to `log(scaling_factor + S)` where
- `S` is the value of the feature and `scaling_factor` is a configurable scaling
- factor. Scores are unbounded.
- This function only supports features that have a positive score impact.
- [source,js]
- --------------------------------------------------
- GET test/_search
- {
- "query": {
- "feature": {
- "field": "pagerank",
- "log": {
- "scaling_factor": 4
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
- [float]
- ==== Sigmoid
- This function is an extension of `saturation` which adds a configurable
- exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. Like for the
- `saturation` function, `pivot` is the value of `S` that gives a score of +0.5+
- and scores are in +(0, 1)+.
- `exponent` must be positive, but is typically in +[0.5, 1]+. A good value should
- be computed via traning. If you don't have the opportunity to do so, we recommend
- that you stick to the `saturation` function instead.
- [source,js]
- --------------------------------------------------
- GET test/_search
- {
- "query": {
- "feature": {
- "field": "pagerank",
- "sigmoid": {
- "pivot": 7,
- "exponent": 0.6
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
|