[[analyzer]]
=== `analyzer`

The values of <<mapping-index,`analyzed`>> string fields are passed through an
<<analysis,analyzer>> to convert the string into a stream of _tokens_ or
_terms_. For instance, the string `"The quick Brown Foxes."` may, depending
on which analyzer is used, be analyzed to the tokens: `quick`, `brown`,
`fox`. These are the actual terms that are indexed for the field, which makes
it possible to search efficiently for individual words _within_ big blobs of
text.

This analysis process needs to happen not just at index time, but also at
query time: the query string needs to be passed through the same (or a
similar) analyzer so that the terms that it tries to find are in the same
format as those that exist in the index.

Elasticsearch ships with a number of <<analysis-analyzers,pre-defined analyzers>>,
which can be used without further configuration. It also ships with many
<<analysis-charfilters,character filters>>, <<analysis-tokenizers,tokenizers>>,
and <<analysis-tokenfilters,token filters>>, which can be combined to configure
custom analyzers per index.

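
As a minimal sketch of how these building blocks combine, the request below
defines a custom analyzer in the index settings. The index name `custom_index`
and the analyzer name `my_custom_analyzer` are hypothetical, and the particular
choice of character filter, tokenizer, and token filters is only an
illustration:

[source,js]
--------------------------------------------------
PUT custom_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip" ],
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  }
}
--------------------------------------------------

A field mapping in this index could then refer to the analyzer by name with
`"analyzer": "my_custom_analyzer"`.
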
Analyzers can be specified per-query, per-field or per-index. At index time,
Elasticsearch will look for an analyzer in this order:

* The `analyzer` defined in the field mapping.
* An analyzer named `default` in the index settings.
* The <<analysis-standard-analyzer,`standard`>> analyzer.

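
The second step in the list above can be sketched as follows, assuming a
hypothetical index named `default_index`. An analyzer registered under the name
`default` should apply to any analyzed string field whose mapping does not
specify its own `analyzer`; the choice of `english` here is only an example:

[source,js]
--------------------------------------------------
PUT default_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "english"
        }
      }
    }
  }
}
--------------------------------------------------
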
At query time, there are a few more layers:

* The `analyzer` defined in a <<full-text-queries,full-text query>>.
* The `search_analyzer` defined in the field mapping.
* The `analyzer` defined in the field mapping.
* An analyzer named `default_search` in the index settings.
* An analyzer named `default` in the index settings.
* The <<analysis-standard-analyzer,`standard`>> analyzer.

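
To illustrate the first two layers of the query-time order, the sketch below
uses a hypothetical index `search_index` with a hypothetical `title` field. The
mapping sets both `analyzer` and `search_analyzer`, and the `match` query then
overrides them with its own `analyzer` parameter. The analyzer choices are
arbitrary and only meant to show where each setting lives:

[source,js]
--------------------------------------------------
PUT search_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "english",
          "search_analyzer": "standard"
        }
      }
    }
  }
}

GET search_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Quick Foxes",
        "analyzer": "simple"
      }
    }
  }
}
--------------------------------------------------

Without the `analyzer` parameter in the query, the query string would be
analyzed with the field's `search_analyzer` instead.
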
The easiest way to specify an analyzer for a particular field is to define it
in the field mapping, as follows:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": { <1>
          "type": "string",
          "fields": {
            "english": { <2>
              "type": "string",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}

GET my_index/_analyze?field=text <3>
{
  "text": "The quick Brown Foxes."
}

GET my_index/_analyze?field=text.english <4>
{
  "text": "The quick Brown Foxes."
}
--------------------------------------------------
// AUTOSENSE

<1> The `text` field uses the default `standard` analyzer.
<2> The `text.english` <<multi-fields,multi-field>> uses the `english` analyzer, which removes stop words and applies stemming.
<3> This returns the tokens: [ `the`, `quick`, `brown`, `foxes` ].
<4> This returns the tokens: [ `quick`, `brown`, `fox` ].