
[[analysis]]
= Analysis

[partintro]
--
The index analysis module acts as a configurable registry of Analyzers
that can be used both to break indexed (analyzed) fields when a
document is indexed and to process query strings. It maps to the Lucene
`Analyzer`.

Analyzers are composed of a single <<analysis-tokenizers,Tokenizer>>
and zero or more <<analysis-tokenfilters,TokenFilters>>. The tokenizer may
be preceded by one or more <<analysis-charfilters,CharFilters>>. The
analysis module allows you to register `TokenFilters`, `Tokenizers` and
`Analyzers` under logical names that can then be referenced either in
mapping definitions or in certain APIs. The analysis module
automatically registers (*if not explicitly defined*) built-in
analyzers, token filters, and tokenizers.

Here is a sample configuration:
[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            standard :
                type : standard
                stopwords : [stop1, stop2]
            myAnalyzer1 :
                type : standard
                stopwords : [stop1, stop2, stop3]
                max_token_length : 500

            # configure a custom analyzer which is
            # exactly like the default standard analyzer
            myAnalyzer2 :
                tokenizer : standard
                filter : [standard, lowercase, stop]
        tokenizer :
            myTokenizer1 :
                type : standard
                max_token_length : 900
            myTokenizer2 :
                type : keyword
                buffer_size : 512
        filter :
            myTokenFilter1 :
                type : stop
                stopwords : [stop1, stop2, stop3, stop4]
            myTokenFilter2 :
                type : length
                min : 0
                max : 2000
--------------------------------------------------
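The logical names registered above can then be used wherever an analyzer is selected. As a minimal sketch of referencing one from a mapping definition (the `article` type and `title` field here are illustrative, not part of the configuration above), a put-mapping body might select `myAnalyzer2` like this:

[source,js]
--------------------------------------------------
{
    "article" : {
        "properties" : {
            "title" : {
                "type" : "string",
                "analyzer" : "myAnalyzer2"
            }
        }
    }
}
--------------------------------------------------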
[float]
[[backwards-compatibility]]
=== Backwards compatibility

All analyzers, tokenizers, and token filters can be configured with a
`version` parameter to control which Lucene version behavior they should
use. Possible values are: `3.0` - `3.6`, `4.0` - `4.3` (the highest
version number is the default option).
--
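
Continuing the style of the sample configuration above, a token filter pinned to an older Lucene behavior might be declared like this (the `myOldStopFilter` name is illustrative):

[source,js]
--------------------------------------------------
index :
    analysis :
        filter :
            myOldStopFilter :
                type : stop
                version : 3.6
                stopwords : [stop1, stop2]
--------------------------------------------------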
include::analysis/analyzers.asciidoc[]
include::analysis/tokenizers.asciidoc[]
include::analysis/tokenfilters.asciidoc[]
include::analysis/charfilters.asciidoc[]
include::analysis/icu-plugin.asciidoc[]