== Testing analyzers

The <<indices-analyze,`analyze` API>> is an invaluable tool for viewing the
terms produced by an analyzer. A built-in analyzer (or a combination of a
built-in tokenizer, token filters, and character filters) can be specified
inline in the request:

[source,js]
-------------------------------------
POST _analyze
{
  "analyzer": "whitespace",
  "text":     "The quick brown fox."
}

POST _analyze
{
  "tokenizer": "standard",
  "filter":  [ "lowercase", "asciifolding" ],
  "text":      "Is this déjà vu?"
}
-------------------------------------
// CONSOLE

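
To make the tokenizer and token filter chain in the second request more concrete, the following is a rough pure-Python model of it. It is only a sketch: the helper names `fold_to_ascii` and `analyze` are hypothetical, the regular expression merely stands in for the `standard` tokenizer (which follows Unicode text segmentation rules), and the accent stripping approximates only part of what the `asciifolding` filter does.

```python
import re
import unicodedata


def fold_to_ascii(term):
    """Approximate `asciifolding`: decompose characters, drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Toy model of: standard tokenizer -> lowercase -> asciifolding."""
    # Assumption: runs of word characters stand in for the standard tokenizer.
    terms = re.findall(r"\w+", text)
    # Token filters are applied in the order they are listed in the request.
    terms = [term.lower() for term in terms]
    return [fold_to_ascii(term) for term in terms]


print(analyze("Is this déjà vu?"))
```

The point of the sketch is the ordering: the tokenizer splits the text first, then each token filter transforms the resulting terms in sequence.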
.Positions and character offsets
*********************************************************

As can be seen from the output of the `analyze` API, analyzers not only
convert words into terms, they also record the order, or relative _positions_,
of the terms (used for phrase queries or word proximity queries), and the
start and end _character offsets_ of each term in the original text (used for
highlighting search snippets).

*********************************************************

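
As a hedged illustration of positions and character offsets, the sketch below mimics what a whitespace-based analysis would record for the first example text. The function name `whitespace_analyze` is hypothetical, and the dictionary keys are chosen to resemble the fields in the `analyze` API output; this is a model of the idea, not the Elasticsearch implementation.

```python
import re


def whitespace_analyze(text):
    """Split on whitespace, recording each term's position and offsets."""
    tokens = []
    # Each run of non-whitespace characters becomes one term.
    for position, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "token": match.group(),
            "start_offset": match.start(),  # index of the term's first character
            "end_offset": match.end(),      # index just past the last character
            "position": position,           # order of the term in the stream
        })
    return tokens


for token in whitespace_analyze("The quick brown fox."):
    print(token)
```

In this model the last term is `fox.` (the trailing period is kept because only whitespace separates terms), at position 3 with offsets 16 to 20. Positions are what a phrase query would compare, while offsets point back into the original text for highlighting.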
Alternatively, a <<analysis-custom-analyzer,`custom` analyzer>> can be
referred to when running the `analyze` API on a specific index:

[source,js]
-------------------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_folded": { <1>
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "my_text": {
          "type": "text",
          "analyzer": "std_folded" <2>
        }
      }
    }
  }
}

GET _cluster/health?wait_for_status=yellow

GET my_index/_analyze <3>
{
  "analyzer": "std_folded", <4>
  "text":     "Is this déjà vu?"
}

GET my_index/_analyze <3>
{
  "field": "my_text", <5>
  "text":  "Is this déjà vu?"
}
-------------------------------------
// CONSOLE

<1> Define a `custom` analyzer called `std_folded`.
<2> The field `my_text` uses the `std_folded` analyzer.
<3> To refer to this analyzer, the `analyze` API must specify the index name.
<4> Refer to the analyzer by name.
<5> Refer to the analyzer used by field `my_text`.
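
The two ways of referring to the analyzer, by name and by field, can be pictured with a small toy model. The sketch below copies the index definition above into a Python dictionary and uses a hypothetical `resolve_analyzer` helper to look the analyzer up either way. It is an illustration only: it ignores built-in analyzers and default-analyzer fallbacks, and it is not how Elasticsearch resolves analyzers internally.

```python
# Index definition mirroring the `PUT my_index` request above.
index_config = {
    "settings": {"analysis": {"analyzer": {
        "std_folded": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "asciifolding"],
        },
    }}},
    "mappings": {"my_type": {"properties": {
        "my_text": {"type": "text", "analyzer": "std_folded"},
    }}},
}


def resolve_analyzer(config, analyzer=None, field=None):
    """Return (name, definition) for an analyzer given by name or by field."""
    analyzers = config["settings"]["analysis"]["analyzer"]
    if field is not None:
        # Find the field in the mappings and use the analyzer it declares.
        for mapping in config["mappings"].values():
            if field in mapping["properties"]:
                analyzer = mapping["properties"][field]["analyzer"]
                break
        else:
            raise KeyError(f"unknown field: {field}")
    if analyzer not in analyzers:
        raise KeyError(f"unknown analyzer: {analyzer}")
    return analyzer, analyzers[analyzer]


print(resolve_analyzer(index_config, analyzer="std_folded"))
print(resolve_analyzer(index_config, field="my_text"))
```

Both lookups end at the same `std_folded` definition. Because that definition lives in the settings of `my_index`, the lookup needs the index as its starting point, which is why the `analyze` requests above include the index name.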