[[similarity]]
=== `similarity`

Elasticsearch allows you to configure a scoring algorithm or _similarity_ per
field. The `similarity` setting provides a simple way of choosing a similarity
algorithm other than the default `BM25`, such as `TF/IDF`.

Similarities are mostly useful for <<text,`text`>> fields, but can also apply
to other field types.

Custom similarities can be configured by tuning the parameters of the built-in
similarities. For more details about these expert options, see the
<<index-modules-similarity,similarity module>>.

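
As a rough sketch of what such tuning can look like, the following request
defines a custom similarity based on `BM25` in the index settings and then
refers to it by name from a field mapping. The index name `my_tuned_index`, the
similarity name `my_bm25`, the field name `tuned_field`, and the `k1` and `b`
values are all illustrative assumptions, not recommendations:

[source,js]
--------------------------------------------------
PUT my_tuned_index
{
  "settings": {
    "index": {
      "similarity": {
        "my_bm25": { <1>
          "type": "BM25",
          "k1": 1.3, <2>
          "b": 0.6
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "tuned_field": {
        "type": "text",
        "similarity": "my_bm25" <3>
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

<1> `my_bm25` is a hypothetical name chosen for this example.
<2> Example values only. The `BM25` defaults are `k1` = `1.2` and `b` = `0.75`.
<3> The field refers to the custom similarity by the name defined in the index settings.
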
The only similarities that can be used out of the box, without any further
configuration, are:

`BM25`::
    The Okapi BM25 algorithm, used by default in Elasticsearch and Lucene.
    See {defguide}/pluggable-similarites.html[Pluggable Similarity Algorithms]
    for more information.

`classic`::
    The TF/IDF algorithm, which used to be the default in Elasticsearch and
    Lucene. See {defguide}/practical-scoring-function.html[Lucene’s Practical Scoring Function]
    for more information.

`boolean`::
    A simple boolean similarity, which is used when full-text ranking is not
    needed and the score should be based only on whether the query terms match.
    Boolean similarity gives terms a score equal to their query boost.

The `similarity` can be set on the field level when a field is first created,
as follows:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "properties": {
      "default_field": { <1>
        "type": "text"
      },
      "boolean_sim_field": {
        "type": "text",
        "similarity": "boolean" <2>
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

<1> The `default_field` uses the `BM25` similarity.
<2> The `boolean_sim_field` uses the `boolean` similarity.
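
To see the effect of the `boolean` similarity, one option is to index a
document into `my_index` and search the `boolean_sim_field` with an explicit
boost. The following is a minimal sketch: the document ID, the field text, the
query term, and the boost value are all assumptions made for illustration.

[source,js]
--------------------------------------------------
PUT my_index/_doc/1?refresh
{
  "boolean_sim_field": "quick brown fox"
}

GET my_index/_search
{
  "query": {
    "match": {
      "boolean_sim_field": {
        "query": "fox",
        "boost": 2.0 <1>
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

<1> An arbitrary example boost. Because the `boolean` similarity scores each matching term with its query boost, the hit's `_score` would be expected to equal this value rather than a value derived from term and document statistics.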