== Testing analyzers

The <<indices-analyze,`analyze` API>> is an invaluable tool for viewing the
terms produced by an analyzer. A built-in analyzer (or a combination of a
built-in tokenizer, token filters, and character filters) can be specified
inline in the request:

[source,js]
-------------------------------------
POST _analyze
{
  "analyzer": "whitespace",
  "text": "The quick brown fox."
}

POST _analyze
{
  "tokenizer": "standard",
  "filter": [ "lowercase", "asciifolding" ],
  "text": "Is this déjà vu?"
}
-------------------------------------
// CONSOLE
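
The same requests can also be issued from a script. The following is a minimal Python sketch that uses only the standard library. The `build_analyze_request` and `analyze` helpers are hypothetical names, and the base URL `http://localhost:9200` assumes a local node without authentication; neither comes from this page.

```python
import json
import urllib.request


def build_analyze_request(body, index=None, base_url="http://localhost:9200"):
    """Build a POST request for the analyze API.

    base_url is an assumption (a local, unsecured node). When index is
    given, the request is scoped to that index.
    """
    path = f"/{index}/_analyze" if index else "/_analyze"
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(body, index=None, base_url="http://localhost:9200"):
    """Send the request and return the token objects from the response."""
    request = build_analyze_request(body, index=index, base_url=base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["tokens"]


# Building the requests does not require a running cluster.
inline = build_analyze_request(
    {"analyzer": "whitespace", "text": "The quick brown fox."}
)
combined = build_analyze_request(
    {
        "tokenizer": "standard",
        "filter": ["lowercase", "asciifolding"],
        "text": "Is this déjà vu?",
    }
)
print(inline.get_method(), inline.full_url)
```

Calling `analyze(...)` with either body sends the request, so it needs a reachable cluster at the assumed address.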

.Positions and character offsets
*********************************************************
As the output of the `analyze` API shows, analyzers not only convert words
into terms but also record the order, or relative _positions_, of the terms
(used for phrase queries and word proximity queries) and the start and end
_character offsets_ of each term in the original text (used for highlighting
search snippets).
*********************************************************
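
To make positions and character offsets concrete, the sketch below imitates a whitespace-style tokenizer in plain Python. It is an illustration only, not Elasticsearch's implementation: `whitespace_tokens` is a hypothetical helper, and the dictionary keys are chosen to resemble the fields reported by the `analyze` API.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace, recording each token's position and offsets."""
    return [
        {
            "token": match.group(),
            "start_offset": match.start(),  # index of the first character
            "end_offset": match.end(),      # index just past the last character
            "position": position,           # order of the token in the stream
        }
        for position, match in enumerate(re.finditer(r"\S+", text))
    ]


for token in whitespace_tokens("The quick brown fox."):
    print(token)
```

For this input the last token is `fox.` at position 3, spanning offsets 16 to 20. The trailing period is kept because only whitespace separates tokens.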

Alternatively, a <<analysis-custom-analyzer,`custom` analyzer>> can be
referred to when running the `analyze` API on a specific index:

[source,js]
-------------------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_folded": { <1>
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "my_text": {
          "type": "text",
          "analyzer": "std_folded" <2>
        }
      }
    }
  }
}

GET my_index/_analyze <3>
{
  "analyzer": "std_folded", <4>
  "text": "Is this déjà vu?"
}

GET my_index/_analyze <3>
{
  "field": "my_text", <5>
  "text": "Is this déjà vu?"
}
-------------------------------------
// CONSOLE
<1> Define a `custom` analyzer called `std_folded`.
<2> The field `my_text` uses the `std_folded` analyzer.
<3> To refer to this analyzer, the `analyze` API must specify the index name.
<4> Refer to the analyzer by name.
<5> Refer to the analyzer used by field `my_text`.
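
Both requests should yield the same terms, because the `my_text` field is mapped to `std_folded`. As a rough local approximation of that analyzer chain, the Python sketch below tokenizes on word characters, lowercases, and strips accents. It rests on simplifying assumptions: the regular expression only approximates the `standard` tokenizer, accent stripping through Unicode decomposition covers fewer characters than the `asciifolding` filter, and `fold_to_ascii` and `std_folded_terms` are hypothetical helper names.

```python
import re
import unicodedata


def fold_to_ascii(term):
    """Drop combining marks after decomposition, e.g. 'déjà' -> 'deja'."""
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def std_folded_terms(text):
    """Approximate std_folded: word tokens, lowercased, then accent-folded."""
    # Normalize to NFC first so accented letters are single word characters.
    words = re.findall(r"\w+", unicodedata.normalize("NFC", text))
    return [fold_to_ascii(word.lower()) for word in words]


print(std_folded_terms("Is this déjà vu?"))
# prints ['is', 'this', 'deja', 'vu']
```

Comparing such a local approximation with the terms returned by `my_index/_analyze` is a quick way to check that the custom analyzer is configured as intended.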