
[[analysis-custom-analyzer]]
=== Custom Analyzer

An analyzer of type `custom` combines a `Tokenizer` with zero or more
`Token Filters` and zero or more `Char Filters`. The custom analyzer
accepts the logical/registered name of the tokenizer to use, and a
list of logical/registered names of token filters. The name of a
custom analyzer must not start with `_`.

The following settings can be set for a `custom` analyzer type:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`tokenizer` |The logical/registered name of the tokenizer to use.

|`filter` |An optional list of logical/registered names of token
filters.

|`char_filter` |An optional list of logical/registered names of char
filters.

|`position_offset_gap` |An optional number of positions to increment
between each field value of a field using this analyzer.
|=======================================================================
Here is an example:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            myAnalyzer2 :
                type : custom
                tokenizer : myTokenizer1
                filter : [myTokenFilter1, myTokenFilter2]
                char_filter : [my_html]
                position_offset_gap : 256
        tokenizer :
            myTokenizer1 :
                type : standard
                max_token_length : 900
        filter :
            myTokenFilter1 :
                type : stop
                stopwords : [stop1, stop2, stop3, stop4]
            myTokenFilter2 :
                type : length
                min : 0
                max : 2000
        char_filter :
            my_html :
                type : html_strip
                escaped_tags : [xxx, yyy]
                read_ahead : 1024
--------------------------------------------------
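
The same analysis settings can also be supplied as JSON in the body of
a create-index request over the REST API. The sketch below mirrors the
analyzer portion of the example above; the index name `test` and the
host/port are assumptions for illustration:

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/test' -d '{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "myAnalyzer2" : {
                    "type" : "custom",
                    "tokenizer" : "myTokenizer1",
                    "filter" : ["myTokenFilter1", "myTokenFilter2"],
                    "char_filter" : ["my_html"],
                    "position_offset_gap" : 256
                }
            },
            "tokenizer" : {
                "myTokenizer1" : {
                    "type" : "standard",
                    "max_token_length" : 900
                }
            },
            "filter" : {
                "myTokenFilter1" : {
                    "type" : "stop",
                    "stopwords" : ["stop1", "stop2", "stop3", "stop4"]
                },
                "myTokenFilter2" : {
                    "type" : "length",
                    "min" : 0,
                    "max" : 2000
                }
            },
            "char_filter" : {
                "my_html" : {
                    "type" : "html_strip",
                    "escaped_tags" : ["xxx", "yyy"],
                    "read_ahead" : 1024
                }
            }
        }
    }
}'
--------------------------------------------------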