- [[query-dsl-function-score-query]]
- === Function Score Query
- The `function_score` query allows you to modify the score of documents that are
- retrieved by a query. This can be useful if, for example, a score
- function is computationally expensive and it is sufficient to compute
- the score on a filtered set of documents.
- To use `function_score`, the user has to define a query and one or
- more functions that compute a new score for each document returned
- by the query.
- `function_score` can be used with only one function like this:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "function_score": {
- "query": {},
- "boost": "5",
- "random_score": {}, <1>
- "boost_mode":"multiply"
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- <1> See <<score-functions>> for a list of supported functions.
- Furthermore, several functions can be combined. In this case one can
- optionally choose to apply the function only if a document matches a
- given filtering query:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "function_score": {
- "query": {},
- "boost": "5", <1>
- "functions": [
- {
- "filter": {},
- "random_score": {}, <2>
- "weight": 23
- },
- {
- "filter": {},
- "weight": 42
- }
- ],
- "max_boost": 42,
- "score_mode": "max",
- "boost_mode": "multiply",
- "min_score" : 42
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- <1> Boost for the whole query.
- <2> See <<score-functions>> for a list of supported functions.
- NOTE: The scores produced by the filtering query of each function do not matter.
- If no filter is given with a function, this is equivalent to specifying
- `"match_all": {}`.
- First, each document is scored by the defined functions. The parameter
- `score_mode` specifies how the computed scores are combined:
- [horizontal]
- `multiply`:: scores are multiplied (default)
- `sum`:: scores are summed
- `avg`:: scores are averaged
- `first`:: the first function that has a matching filter
- is applied
- `max`:: maximum score is used
- `min`:: minimum score is used
- Because scores can be on different scales (for example, between 0 and 1 for decay functions but arbitrary for `field_value_factor`) and also because sometimes a different impact of functions on the score is desirable, the score of each function can be adjusted with a user-defined `weight`. The `weight` can be defined per function in the `functions` array (example above) and is multiplied with the score computed by the respective function.
- If weight is given without any other function declaration, `weight` acts as a function that simply returns the `weight`.
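The steps described so far can be modeled with a short JavaScript sketch: each function is applied only when its filter matches, its result is multiplied by its `weight`, and the values are combined according to `score_mode`. The names `combineFunctionScores`, `fn.filter`, and `fn.score` are hypothetical and exist only for this illustration. Two points are assumptions rather than documented behavior: `avg` is treated as a plain arithmetic mean, and a neutral score of 1 is returned when no filter matches.

```javascript
// Hypothetical model of how per-function scores are combined.
// Each function is { filter?: doc => boolean, score?: doc => number, weight?: number }.
function combineFunctionScores(doc, functions, scoreMode = "multiply") {
  const scores = functions
    // A function without a filter behaves as if it used match_all.
    .filter((fn) => (fn.filter ? fn.filter(doc) : true))
    // A function that only declares a weight simply returns that weight.
    .map((fn) => (fn.score ? fn.score(doc) : 1) * (fn.weight ?? 1));

  if (scores.length === 0) return 1; // assumption: neutral score when nothing matches

  switch (scoreMode) {
    case "multiply": return scores.reduce((a, b) => a * b, 1);
    case "sum": return scores.reduce((a, b) => a + b, 0);
    case "avg": return scores.reduce((a, b) => a + b, 0) / scores.length; // assumption: unweighted mean
    case "first": return scores[0];
    case "max": return Math.max(...scores);
    case "min": return Math.min(...scores);
    default: throw new Error(`unknown score_mode: ${scoreMode}`);
  }
}

const functions = [
  { filter: (doc) => doc.inStock, score: () => 0.5, weight: 23 },
  { weight: 42 }, // weight-only function without a filter
];
console.log(combineFunctionScores({ inStock: true }, functions, "max")); // 42
```

With `score_mode` set to `first` and a document whose `inStock` flag is false, only the second function matches, so the sketch returns its weight.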
- The new score can be restricted to not exceed a certain limit by setting
- the `max_boost` parameter. The default for `max_boost` is FLT_MAX.
- The newly computed score is combined with the score of the
- query. The parameter `boost_mode` defines how:
- [horizontal]
- `multiply`:: query score and function score are multiplied (default)
- `replace`:: only function score is used, the query score is ignored
- `sum`:: query score and function score are added
- `avg`:: average
- `max`:: max of query score and function score
- `min`:: min of query score and function score
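As a minimal sketch of this last step, the function below first caps the function score at `max_boost` and then merges it with the query score according to `boost_mode`. The name `applyBoostMode` is hypothetical, and `Infinity` stands in for the `FLT_MAX` default; the exact order of operations inside Elasticsearch is assumed here, not confirmed.

```javascript
// Hypothetical model: cap the function score, then merge it with the query score.
function applyBoostMode(queryScore, functionScore, boostMode = "multiply", maxBoost = Infinity) {
  // max_boost limits the function score (Infinity stands in for FLT_MAX).
  const capped = Math.min(functionScore, maxBoost);
  switch (boostMode) {
    case "multiply": return queryScore * capped;
    case "replace": return capped; // the query score is ignored
    case "sum": return queryScore + capped;
    case "avg": return (queryScore + capped) / 2;
    case "max": return Math.max(queryScore, capped);
    case "min": return Math.min(queryScore, capped);
    default: throw new Error(`unknown boost_mode: ${boostMode}`);
  }
}

console.log(applyBoostMode(2, 100, "multiply", 42)); // 84, because 100 is capped at 42
```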
- By default, modifying the score does not change which documents match. To exclude
- documents that do not meet a certain score threshold the `min_score` parameter can be set to the desired score threshold.
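The effect of `min_score` can be pictured as a filter over the final scores. The sketch below uses a hypothetical `applyMinScore` helper and made-up hits, and assumes that a document whose score equals the threshold is kept.

```javascript
// Hypothetical model: drop every hit whose final score is below the threshold.
function applyMinScore(hits, minScore) {
  return hits.filter((hit) => hit.score >= minScore);
}

const hits = [
  { id: "a", score: 50 },
  { id: "b", score: 41.9 },
  { id: "c", score: 42 },
];
console.log(applyMinScore(hits, 42).map((hit) => hit.id)); // [ 'a', 'c' ]
```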
- [[score-functions]]
- The `function_score` query provides several types of score functions.
- * <<function-script-score,`script_score`>>
- * <<function-weight,`weight`>>
- * <<function-random,`random_score`>>
- * <<function-field-value-factor,`field_value_factor`>>
- * <<function-decay,decay functions>>: `gauss`, `linear`, `exp`
- [[function-script-score]]
- ==== Script score
- The `script_score` function allows you to wrap another query and optionally
- customize its scoring with a computation derived from other numeric
- field values in the document, using a script expression. Here is a
- simple example:
- [source,js]
- --------------------------------------------------
- "script_score" : {
- "script" : "_score * doc['my_numeric_field'].value"
- }
- --------------------------------------------------
- In addition to field values and expressions, the
- `_score` script parameter can be used to retrieve the score of the
- wrapped query.
- Scripts are cached for faster execution. If the script has parameters
- that it needs to take into account, it is preferable to reuse the same
- script, and provide parameters to it:
- [source,js]
- --------------------------------------------------
- "script_score": {
- "script": {
- "lang": "lang",
- "params": {
- "param1": value1,
- "param2": value2
- },
- "inline": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
- }
- }
- --------------------------------------------------
- Note that unlike the `custom_score` query, the
- score of the query is multiplied with the result of the script scoring. If
- you wish to inhibit this, set `"boost_mode": "replace"`.
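To make the parameterized script concrete, here is a JavaScript stand-in for the expression `_score * doc['my_numeric_field'].value / pow(param1, param2)`. The function `scriptScore` and the sample values are hypothetical; the point is only that the same script body can be reused while the `params` object changes.

```javascript
// Hypothetical stand-in for the parameterized script shown above.
function scriptScore(queryScore, doc, params) {
  return (queryScore * doc.my_numeric_field) / Math.pow(params.param1, params.param2);
}

// The script stays the same; only the parameters vary between requests.
const doc = { my_numeric_field: 16 };
console.log(scriptScore(1.5, doc, { param1: 2, param2: 3 })); // 3
console.log(scriptScore(1.5, doc, { param1: 2, param2: 2 })); // 6
```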
- [[function-weight]]
- ==== Weight
- The `weight` score allows you to multiply the score by the provided
- `weight`. This can sometimes be desired since boost value set on
- specific queries gets normalized, while for this score function it does
- not. The number value is of type float.
- [source,js]
- --------------------------------------------------
- "weight" : number
- --------------------------------------------------
- [[function-random]]
- ==== Random
- The `random_score` generates scores using a hash of the `_uid` field,
- with a `seed` for variation. If `seed` is not specified, the current
- time is used.
- NOTE: Using this feature will load field data for `_uid`, which can
- be a memory intensive operation since the values are unique.
- [source,js]
- --------------------------------------------------
- "random_score": {
- "seed" : number
- }
- --------------------------------------------------
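The idea of a seeded, repeatable pseudo-random score can be sketched with any hash function. The example below uses a 32-bit FNV-1a hash purely as a stand-in; it is not the hash Elasticsearch uses, and `randomScore` is a hypothetical name. The same `seed` and `_uid` always yield the same score, while a different `seed` reshuffles the ordering.

```javascript
// 32-bit FNV-1a hash of a string; a stand-in, not Elasticsearch's hash.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map (seed, uid) to a score in [0, 1). Without a seed, the current time is used.
function randomScore(uid, seed = Date.now()) {
  return fnv1a(`${seed}:${uid}`) / 2 ** 32;
}

console.log(randomScore("doc-1", 42) === randomScore("doc-1", 42)); // true
```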
- [[function-field-value-factor]]
- ==== Field value factor
- The `field_value_factor` function allows you to use a field from a document to
- influence the score. It's similar to using the `script_score` function, however,
- it avoids the overhead of scripting. If used on a multi-valued field, only the
- first value of the field is used in calculations.
- As an example, imagine you have a document indexed with a numeric `popularity`
- field and wish to influence the score of a document with this field. An example
- doing so would look like:
- [source,js]
- --------------------------------------------------
- "field_value_factor": {
- "field": "popularity",
- "factor": 1.2,
- "modifier": "sqrt",
- "missing": 1
- }
- --------------------------------------------------
- Which will translate into the following formula for scoring:
- `sqrt(1.2 * doc['popularity'].value)`
- There are a number of options for the `field_value_factor` function:
- [horizontal]
- `field`::
- Field to be extracted from the document.
- `factor`::
- Optional factor to multiply the field value with, defaults to `1`.
- `modifier`::
- Modifier to apply to the field value, can be one of: `none`, `log`,
- `log1p`, `log2p`, `ln`, `ln1p`, `ln2p`, `square`, `sqrt`, or `reciprocal`.
- Defaults to `none`.
- [cols="<,<",options="header",]
- |=======================================================================
- | Modifier | Meaning
- | `none` | Do not apply any multiplier to the field value
- | `log` | Take the https://en.wikipedia.org/wiki/Common_logarithm[common logarithm] of the field value
- | `log1p` | Add 1 to the field value and take the common logarithm
- | `log2p` | Add 2 to the field value and take the common logarithm
- | `ln` | Take the https://en.wikipedia.org/wiki/Natural_logarithm[natural logarithm] of the field value
- | `ln1p` | Add 1 to the field value and take the natural logarithm
- | `ln2p` | Add 2 to the field value and take the natural logarithm
- | `square` | Square the field value (multiply it by itself)
- | `sqrt` | Take the https://en.wikipedia.org/wiki/Square_root[square root] of the field value
- | `reciprocal` | https://en.wikipedia.org/wiki/Multiplicative_inverse[Reciprocate] the field value, same as `1/x` where `x` is the field's value
- |=======================================================================
- `missing`::
- Value used if the document doesn't have that field. The modifier
- and factor are still applied to it as though it were read from the document.
- Keep in mind that taking the log() of 0, or the square root of a negative number
- is an illegal operation, and an exception will be thrown. Be sure to limit the
- values of the field with a range filter to avoid this, or use `log1p` and
- `ln1p`.
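The options above can be summarized in a small JavaScript model that computes `modifier(factor * value)`. The names `MODIFIERS` and `fieldValueFactor` are hypothetical. The sketch assumes that the `log` family is base 10, and that a missing field without a `missing` value is an error; both are assumptions of this illustration.

```javascript
// Hypothetical model of the field_value_factor modifiers.
const MODIFIERS = {
  none: (x) => x,
  log: (x) => Math.log10(x), // assumption: base-10 logarithm
  log1p: (x) => Math.log10(x + 1),
  log2p: (x) => Math.log10(x + 2),
  ln: (x) => Math.log(x),
  ln1p: (x) => Math.log(x + 1),
  ln2p: (x) => Math.log(x + 2),
  square: (x) => x * x,
  sqrt: (x) => Math.sqrt(x),
  reciprocal: (x) => 1 / x,
};

function fieldValueFactor(doc, { field, factor = 1, modifier = "none", missing }) {
  // Fall back to `missing` when the document has no value for the field.
  const raw = doc[field] ?? missing;
  if (raw === undefined) {
    throw new Error(`field "${field}" is absent and no "missing" value was given`);
  }
  // The factor is applied first, then the modifier.
  const result = MODIFIERS[modifier](factor * raw);
  // log(0) and sqrt(negative) produce -Infinity or NaN; report them as errors.
  if (!Number.isFinite(result)) {
    throw new Error(`illegal operation: ${modifier}(${factor * raw})`);
  }
  return result;
}

const options = { field: "popularity", factor: 1.2, modifier: "sqrt", missing: 1 };
console.log(fieldValueFactor({ popularity: 30 }, options)); // sqrt(1.2 * 30)
console.log(fieldValueFactor({}, options));                 // sqrt(1.2 * 1)
```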
- [[function-decay]]
- ==== Decay functions
- Decay functions score a document with a function that decays depending
- on the distance of a numeric field value of the document from a user
- given origin. This is similar to a range query, but with smooth edges
- instead of boxes.
- To use distance scoring on a query that has numerical fields, the user
- has to define an `origin` and a `scale` for each field. The `origin`
- is needed to define the ``central point'' from which the distance
- is calculated, and the `scale` to define the rate of decay. The
- decay function is specified as:
- [source,js]
- --------------------------------------------------
- "DECAY_FUNCTION": { <1>
- "FIELD_NAME": { <2>
- "origin": "11, 12",
- "scale": "2km",
- "offset": "0km",
- "decay": 0.33
- }
- }
- --------------------------------------------------
- <1> The `DECAY_FUNCTION` should be one of `linear`, `exp`, or `gauss`.
- <2> The specified field must be a numeric, date, or geo-point field.
- In the above example, the field is a <<geo-point,`geo_point`>> and origin can be provided in geo format. `scale` and `offset` must be given with a unit in this case. If your field is a date field, you can set `scale` and `offset` as days, weeks, and so on. Example:
- [source,js]
- --------------------------------------------------
- "gauss": {
- "date": {
- "origin": "2013-09-17", <1>
- "scale": "10d",
- "offset": "5d", <2>
- "decay" : 0.5 <2>
- }
- }
- --------------------------------------------------
- <1> The date format of the origin depends on the <<mapping-date-format,`format`>> defined in
- your mapping. If you do not define the origin, the current time is used.
- <2> The `offset` and `decay` parameters are optional.
- [horizontal]
- `origin`::
- The point of origin used for calculating distance. Must be given as a
- number for numeric fields, a date for date fields, and a geo point for geo fields.
- Required for geo and numeric fields. For date fields the default is `now`. Date
- math (for example `now-1h`) is supported for origin.
- `scale`::
- Required for all types. Defines the distance from origin at which the computed
- score will equal the `decay` parameter. For geo fields: can be defined as number+unit (1km, 12m,...).
- Default unit is meters. For date fields: can be defined as number+unit ("1h", "10d",...).
- Default unit is milliseconds. For numeric fields: any number.
- `offset`::
- If an `offset` is defined, the decay function will only be computed
- for documents with a distance greater than the defined
- `offset`. The default is 0.
- `decay`::
- The `decay` parameter defines how documents are scored at the distance
- given at `scale`. If no `decay` is defined, documents at the distance
- `scale` will be scored 0.5.
- In the first example, your documents might represent hotels and contain a geo
- location field. You want to compute a decay function depending on how
- far the hotel is from a given location. You might not immediately see
- what scale to choose for the gauss function, but you can say something
- like: "At a distance of 2km from the desired location, the score should
- be reduced to one third."
- The parameter "scale" will then be adjusted automatically to assure that
- the score function computes a score of 0.33 for hotels that are 2km away
- from the desired location.
- In the second example, documents with a field value between 2013-09-12 and 2013-09-22 would get a weight of 1.0 and documents which are 15 days from that date a weight of 0.5.
- ===== Supported decay functions
- The `DECAY_FUNCTION` determines the shape of the decay:
- `gauss`::
- +
- --
- Normal decay, computed as:
- image:images/Gaussian.png[]
- where image:images/sigma.png[] is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset`
- image:images/sigma_calc.png[]
- See <<gauss-decay>> for graphs demonstrating the curve generated by the `gauss` function.
- --
- `exp`::
- +
- --
- Exponential decay, computed as:
- image:images/Exponential.png[]
- where again the parameter image:images/lambda.png[] is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset`
- image:images/lambda_calc.png[]
- See <<exp-decay>> for graphs demonstrating the curve generated by the `exp` function.
- --
- `linear`::
- +
- --
- Linear decay, computed as:
- image:images/Linear.png[]
- where again the parameter `s` is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset`
- image:images/s_calc.png[]
- In contrast to the normal and exponential decay, this function actually
- sets the score to 0 if the field value exceeds twice the user given
- scale value.
- --
- For single functions the three decay functions together with their parameters can be visualized like this (the field in this example is called "age"):
- image:images/decay_2d.png[width=600]
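For readers who prefer code to plots, the three shapes can be sketched in JavaScript. The formulas below are written from the constraint stated above (the score equals `decay` at distance `scale` beyond `offset`); the function names are hypothetical and the formula images in this section remain the reference.

```javascript
// Distance that counts toward the decay: zero while within `offset` of the origin.
function decayDistance(value, origin, offset) {
  return Math.max(0, Math.abs(value - origin) - offset);
}

function gaussDecay(value, { origin, scale, offset = 0, decay = 0.5 }) {
  // sigma^2 is chosen so that the score equals `decay` at distance `scale`.
  const sigmaSquared = -(scale * scale) / (2 * Math.log(decay));
  const d = decayDistance(value, origin, offset);
  return Math.exp(-(d * d) / (2 * sigmaSquared));
}

function expDecay(value, { origin, scale, offset = 0, decay = 0.5 }) {
  // lambda is negative, so the score shrinks as the distance grows.
  const lambda = Math.log(decay) / scale;
  return Math.exp(lambda * decayDistance(value, origin, offset));
}

function linearDecay(value, { origin, scale, offset = 0, decay = 0.5 }) {
  // s is the distance at which the score reaches 0.
  const s = scale / (1 - decay);
  return Math.max((s - decayDistance(value, origin, offset)) / s, 0);
}

// The date example, with dates expressed as day numbers relative to the origin.
const dateParams = { origin: 0, scale: 10, offset: 5, decay: 0.5 };
console.log(gaussDecay(3, dateParams));             // 1, still within the offset
console.log(gaussDecay(15, dateParams).toFixed(2)); // "0.50", at offset + scale
```

With the default `decay` of 0.5, `linearDecay` reaches 0 at twice the `scale` beyond the offset, which matches the remark about the linear function above.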
- ===== Multi-values fields
- If a field used for computing the decay contains multiple values, by default the value closest to the origin is chosen for determining the distance.
- This can be changed by setting `multi_value_mode`.
- [horizontal]
- `min`:: Distance is the minimum distance
- `max`:: Distance is the maximum distance
- `avg`:: Distance is the average distance
- `sum`:: Distance is the sum of all distances
- Example:
- [source,js]
- --------------------------------------------------
- "DECAY_FUNCTION": {
- "FIELD_NAME": {
- "origin": ...,
- "scale": ...
- },
- "multi_value_mode": "avg"
- }
- --------------------------------------------------
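A short sketch of how the four modes reduce several distances to one. The helper `reduceDistances` and the sample values are hypothetical; the distances are the absolute differences between each field value and the origin.

```javascript
// Hypothetical model of multi_value_mode: reduce per-value distances to one distance.
function reduceDistances(distances, mode = "min") {
  const sum = distances.reduce((a, b) => a + b, 0);
  switch (mode) {
    case "min": return Math.min(...distances);
    case "max": return Math.max(...distances);
    case "avg": return sum / distances.length;
    case "sum": return sum;
    default: throw new Error(`unknown multi_value_mode: ${mode}`);
  }
}

// Field values 3, 8 and 20 with origin 10 give the distances 7, 2 and 10.
const origin = 10;
const distances = [3, 8, 20].map((value) => Math.abs(value - origin));
console.log(reduceDistances(distances));        // 2 (default: closest value)
console.log(reduceDistances(distances, "sum")); // 19
```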
- ==== Detailed example
- Suppose you are searching for a hotel in a certain town. Your budget is
- limited. Also, you would like the hotel to be close to the town center,
- so the farther the hotel is from the desired location the less likely
- you are to check in.
- You would like the query results that match your criterion (for
- example, "hotel, Nancy, non-smoker") to be scored with respect to
- distance to the town center and also the price.
- Intuitively, you would like to define the town center as the origin and
- maybe you are willing to walk 2km to the town center from the hotel. +
- In this case your *origin* for the location field is the town center
- and the *scale* is ~2km.
- If your budget is low, you would probably prefer something cheap above
- something expensive. For the price field, the *origin* would be 0 Euros
- and the *scale* depends on how much you are willing to pay, for example 20 Euros.
- In this example, the fields might be called "price" for the price of the
- hotel and "location" for the coordinates of this hotel.
- The function for `price` in this case would be
- [source,js]
- --------------------------------------------------
- "gauss": { <1>
- "price": {
- "origin": "0",
- "scale": "20"
- }
- }
- --------------------------------------------------
- <1> This decay function could also be `linear` or `exp`.
- and for `location`:
- [source,js]
- --------------------------------------------------
- "gauss": { <1>
- "location": {
- "origin": "11, 12",
- "scale": "2km"
- }
- }
- --------------------------------------------------
- <1> This decay function could also be `linear` or `exp`.
- Suppose you want to multiply these two functions on the original score,
- the request would look like this:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "function_score": {
- "functions": [
- {
- "gauss": {
- "price": {
- "origin": "0",
- "scale": "20"
- }
- }
- },
- {
- "gauss": {
- "location": {
- "origin": "11, 12",
- "scale": "2km"
- }
- }
- }
- ],
- "query": {
- "match": {
- "properties": "balcony"
- }
- },
- "score_mode": "multiply"
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
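To get a feeling for the numbers this request produces, the following sketch computes the combined multiplier for a single, made-up hotel. It assumes the default `decay` of 0.5 and no `offset`, and it takes the distance to the town center as already known in kilometers; `gaussDecay` and the hotel values are hypothetical.

```javascript
// Gaussian decay with the default decay of 0.5 and no offset (assumed defaults).
function gaussDecay(distance, scale, decay = 0.5) {
  const sigmaSquared = -(scale * scale) / (2 * Math.log(decay));
  return Math.exp(-(distance * distance) / (2 * sigmaSquared));
}

// Hypothetical hotel: 10 Euros per night, 1 km from the town center.
const hotel = { price: 10, distanceKm: 1 };

const priceFactor = gaussDecay(hotel.price - 0, 20);    // origin 0, scale 20
const locationFactor = gaussDecay(hotel.distanceKm, 2); // scale 2km

// "score_mode": "multiply" multiplies the two function results.
const multiplier = priceFactor * locationFactor;
console.log(multiplier.toFixed(3)); // "0.707"
```

With the default `boost_mode`, this multiplier would then be multiplied with the score of the `match` query.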
- Next, we show what the computed score looks like for each of the three
- possible decay functions.
- [[gauss-decay]]
- ===== Normal decay, keyword `gauss`
- When choosing `gauss` as the decay function in the above example, the
- contour and surface plots of the multiplier look like this:
- image::https://f.cloud.github.com/assets/4320215/768157/cd0e18a6-e898-11e2-9b3c-f0145078bd6f.png[width="700px"]
- image::https://f.cloud.github.com/assets/4320215/768160/ec43c928-e898-11e2-8e0d-f3c4519dbd89.png[width="700px"]
- Suppose your original search results match three hotels:
- * "Backpack Nap"
- * "Drink n Drive"
- * "BnB Bellevue".
- "Drink n Drive" is pretty far from your defined location (nearly 2 km)
- and is not too cheap (about 13 Euros) so it gets a low factor
- of 0.56. "BnB Bellevue" and "Backpack Nap" are both pretty close to the
- defined location but "BnB Bellevue" is cheaper, so it gets a multiplier
- of 0.86 whereas "Backpack Nap" gets a value of 0.66.
- [[exp-decay]]
- ===== Exponential decay, keyword `exp`
- When choosing `exp` as the decay function in the above example, the
- contour and surface plots of the multiplier look like this:
- image::https://f.cloud.github.com/assets/4320215/768161/082975c0-e899-11e2-86f7-174c3a729d64.png[width="700px"]
- image::https://f.cloud.github.com/assets/4320215/768162/0b606884-e899-11e2-907b-aefc77eefef6.png[width="700px"]
- [[linear-decay]]
- ===== Linear decay, keyword `linear`
- When choosing `linear` as the decay function in the above example, the
- contour and surface plots of the multiplier look like this:
- image::https://f.cloud.github.com/assets/4320215/768164/1775b0ca-e899-11e2-9f4a-776b406305c6.png[width="700px"]
- image::https://f.cloud.github.com/assets/4320215/768165/19d8b1aa-e899-11e2-91bc-6b0553e8d722.png[width="700px"]
- ==== Supported fields for decay functions
- Only numeric, date, and geo-point fields are supported.
- ==== What if a field is missing?
- If the numeric field is missing in the document, the function will
- return 1.