== Testing analyzers

The <<indices-analyze,`analyze` API>> is an invaluable tool for viewing the
terms produced by an analyzer. A built-in analyzer (or a combination of a
built-in tokenizer, token filters, and character filters) can be specified
inline in the request:

[source,console]
-------------------------------------
POST _analyze
{
  "analyzer": "whitespace",
  "text": "The quick brown fox."
}

POST _analyze
{
  "tokenizer": "standard",
  "filter": [ "lowercase", "asciifolding" ],
  "text": "Is this déjà vu?"
}
-------------------------------------
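
The inline form also accepts character filters through the `char_filter`
parameter. The request below is a minimal sketch; the `html_strip` character
filter and the sample text are illustrative choices and are not part of the
original example:

[source,console]
-------------------------------------
POST _analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": "standard",
  "filter": [ "lowercase" ],
  "text": "<p>The <b>quick</b> brown fox.</p>"
}
-------------------------------------
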
.Positions and character offsets
*********************************************************
As can be seen from the output of the `analyze` API, analyzers not only
convert words into terms, they also record the order or relative _positions_
of each term (used for phrase queries or word proximity queries), and the
start and end _character offsets_ of each term in the original text (used for
highlighting search snippets).
*********************************************************
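
To make positions and offsets concrete, the response to the `whitespace`
request shown earlier should look roughly like the following. This is a
sketch of the expected shape rather than captured output; the exact fields
may differ between versions:

[source,console-result]
-------------------------------------
{
  "tokens": [
    { "token": "The",   "start_offset": 0,  "end_offset": 3,  "type": "word", "position": 0 },
    { "token": "quick", "start_offset": 4,  "end_offset": 9,  "type": "word", "position": 1 },
    { "token": "brown", "start_offset": 10, "end_offset": 15, "type": "word", "position": 2 },
    { "token": "fox.",  "start_offset": 16, "end_offset": 20, "type": "word", "position": 3 }
  ]
}
-------------------------------------

Each `position` gives the term's order in the token stream, while
`start_offset` and `end_offset` point back into the original text.
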
Alternatively, a <<analysis-custom-analyzer,`custom` analyzer>> can be
referred to when running the `analyze` API on a specific index:

[source,console]
-------------------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_folded": { <1>
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_text": {
        "type": "text",
        "analyzer": "std_folded" <2>
      }
    }
  }
}

GET my_index/_analyze <3>
{
  "analyzer": "std_folded", <4>
  "text": "Is this déjà vu?"
}

GET my_index/_analyze <3>
{
  "field": "my_text", <5>
  "text": "Is this déjà vu?"
}
-------------------------------------
<1> Define a `custom` analyzer called `std_folded`.
<2> The field `my_text` uses the `std_folded` analyzer.
<3> To refer to this analyzer, the `analyze` API must specify the index name.
<4> Refer to the analyzer by name.
<5> Refer to the analyzer used by field `my_text`.
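
Both `GET` requests run the same analysis chain, so they should return the
same terms. As a sketch of the expected response (assuming the accented
letters in the text are single precomposed characters, and not captured from
a live cluster):

[source,console-result]
-------------------------------------
{
  "tokens": [
    { "token": "is",   "start_offset": 0,  "end_offset": 2,  "type": "<ALPHANUM>", "position": 0 },
    { "token": "this", "start_offset": 3,  "end_offset": 7,  "type": "<ALPHANUM>", "position": 1 },
    { "token": "deja", "start_offset": 8,  "end_offset": 12, "type": "<ALPHANUM>", "position": 2 },
    { "token": "vu",   "start_offset": 13, "end_offset": 15, "type": "<ALPHANUM>", "position": 3 }
  ]
}
-------------------------------------

The `lowercase` filter turns `Is` into `is`, and `asciifolding` turns `déjà`
into `deja`; the offsets still refer to the original, unfolded text.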