[[analysis-custom-analyzer]]
=== Custom Analyzer

An analyzer of type `custom` combines a `Tokenizer` with zero or more
`Token Filters` and zero or more `Char Filters`. The custom analyzer
accepts the logical/registered name of the tokenizer to use, and lists
of logical/registered names of token filters and char filters.
The name of the custom analyzer must not start with "_".

The following settings can be set for a `custom` analyzer type:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`tokenizer` |The logical / registered name of the tokenizer to use.

|`filter` |An optional list of logical / registered names of token
filters.

|`char_filter` |An optional list of logical / registered names of char
filters.

|`position_offset_gap` |An optional number of positions to increment
between each field value of a field using this analyzer. Defaults to 100.
100 was chosen because it prevents phrase queries with reasonably large
slops (less than 100) from matching terms across field values.
|=======================================================================
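
The effect of `position_offset_gap` can be illustrated with a small Python
sketch. It is a simplified model, not the actual indexing code: it assumes
whitespace tokenization, one position per token, and in-order phrase matching
only. The helper names `assign_positions` and `phrase_matches` are
hypothetical and exist only for this illustration.

```python
def assign_positions(values, gap):
    """Assign a position to every token of a multi-valued field.

    Simplified model (an assumption, not the real indexing code): tokens
    are split on whitespace, each token advances the position by 1, and
    `gap` extra positions are skipped between consecutive field values.
    """
    positions = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # the gap applied between two field values
        for token in value.lower().split():
            pos += 1
            positions.append((token, pos))
    return positions


def phrase_matches(positions, first, second, slop):
    """Return True if `second` follows `first` with at most `slop` positions between them."""
    firsts = [p for t, p in positions if t == first]
    seconds = [p for t, p in positions if t == second]
    return any(0 <= s - f - 1 <= slop for f in firsts for s in seconds)


# Two values of the same field: "abraham" ends one value, "lincoln" starts the next.
values = ["John Abraham", "Lincoln Smith"]
with_gap = assign_positions(values, gap=100)
no_gap = assign_positions(values, gap=0)
```

In this model, with a gap of 0 the phrase "abraham lincoln" matches across the
two values even with a slop of 0. With a gap of 100 it only matches once the
slop reaches 100, which is the behavior the default is meant to provide.
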
Here is an example:

[source,yaml]
--------------------------------------------------
index :
    analysis :
        analyzer :
            myAnalyzer2 :
                type : custom
                tokenizer : myTokenizer1
                filter : [myTokenFilter1, myTokenFilter2]
                char_filter : [my_html]
                position_offset_gap: 256
        tokenizer :
            myTokenizer1 :
                type : standard
                max_token_length : 900
        filter :
            myTokenFilter1 :
                type : stop
                stopwords : [stop1, stop2, stop3, stop4]
            myTokenFilter2 :
                type : length
                min : 0
                max : 2000
        char_filter :
            my_html :
                type : html_strip
                escaped_tags : [xxx, yyy]
                read_ahead : 1024
--------------------------------------------------
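
The same definitions can also be written as a nested structure and checked
before they are sent to the cluster. The following Python sketch mirrors the
example above as a dict and verifies the rules described in this section:
the analyzer name must not start with "_", and every referenced tokenizer,
token filter, and char filter should resolve to a definition. The helper
`check_custom_analyzers` is hypothetical and is not part of any client
library. It only looks at names defined in the same settings block, so it
would also flag references to built-in components.

```python
import json

# The example settings from this section, expressed as a Python dict.
analysis = {
    "analyzer": {
        "myAnalyzer2": {
            "type": "custom",
            "tokenizer": "myTokenizer1",
            "filter": ["myTokenFilter1", "myTokenFilter2"],
            "char_filter": ["my_html"],
            "position_offset_gap": 256,
        }
    },
    "tokenizer": {
        "myTokenizer1": {"type": "standard", "max_token_length": 900}
    },
    "filter": {
        "myTokenFilter1": {
            "type": "stop",
            "stopwords": ["stop1", "stop2", "stop3", "stop4"],
        },
        "myTokenFilter2": {"type": "length", "min": 0, "max": 2000},
    },
    "char_filter": {
        "my_html": {
            "type": "html_strip",
            "escaped_tags": ["xxx", "yyy"],
            "read_ahead": 1024,
        }
    },
}


def check_custom_analyzers(analysis):
    """Hypothetical helper: list problems found in custom analyzer definitions."""
    problems = []
    for name, spec in analysis.get("analyzer", {}).items():
        if spec.get("type") != "custom":
            continue
        if name.startswith("_"):
            problems.append(f"{name}: name must not start with '_'")
        tokenizer = spec.get("tokenizer")
        if tokenizer is None:
            problems.append(f"{name}: missing tokenizer")
        elif tokenizer not in analysis.get("tokenizer", {}):
            problems.append(f"{name}: tokenizer {tokenizer!r} is not defined here")
        # Token filters and char filters are optional lists of names.
        for kind in ("filter", "char_filter"):
            for ref in spec.get(kind, []):
                if ref not in analysis.get(kind, {}):
                    problems.append(f"{name}: {kind} {ref!r} is not defined here")
    return problems


problems = check_custom_analyzers(analysis)
# JSON form of the same settings, nested under "index" as in the YAML example.
settings_json = json.dumps({"index": {"analysis": analysis}}, indent=2)
```

For the example settings, `problems` is an empty list, since every name used
by `myAnalyzer2` is defined in the same block.
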