- [[test-analyzer]]
- === Test an analyzer
- The <<indices-analyze,`analyze` API>> shows the terms that an analyzer
- produces for a given text. To test a built-in analyzer, specify it inline in
- the request:
- [source,console]
- -------------------------------------
- POST _analyze
- {
- "analyzer": "whitespace",
- "text": "The quick brown fox."
- }
- -------------------------------------
- The API returns the following response:
- [source,console-result]
- -------------------------------------
- {
- "tokens": [
- {
- "token": "The",
- "start_offset": 0,
- "end_offset": 3,
- "type": "word",
- "position": 0
- },
- {
- "token": "quick",
- "start_offset": 4,
- "end_offset": 9,
- "type": "word",
- "position": 1
- },
- {
- "token": "brown",
- "start_offset": 10,
- "end_offset": 15,
- "type": "word",
- "position": 2
- },
- {
- "token": "fox.",
- "start_offset": 16,
- "end_offset": 20,
- "type": "word",
- "position": 3
- }
- ]
- }
- -------------------------------------
- You can also test combinations of:
- * A tokenizer
- * Zero or more token filters
- * Zero or more character filters
- [source,console]
- -------------------------------------
- POST _analyze
- {
- "tokenizer": "standard",
- "filter": [ "lowercase", "asciifolding" ],
- "text": "Is this déjà vu?"
- }
- -------------------------------------
- The API returns the following response:
- [source,console-result]
- -------------------------------------
- {
- "tokens": [
- {
- "token": "is",
- "start_offset": 0,
- "end_offset": 2,
- "type": "<ALPHANUM>",
- "position": 0
- },
- {
- "token": "this",
- "start_offset": 3,
- "end_offset": 7,
- "type": "<ALPHANUM>",
- "position": 1
- },
- {
- "token": "deja",
- "start_offset": 8,
- "end_offset": 12,
- "type": "<ALPHANUM>",
- "position": 2
- },
- {
- "token": "vu",
- "start_offset": 13,
- "end_offset": 15,
- "type": "<ALPHANUM>",
- "position": 3
- }
- ]
- }
- -------------------------------------
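The way a tokenizer and a chain of token filters combine can be sketched outside Elasticsearch. The following Python example is only an approximation under stated assumptions: the regular expression `\w+` stands in for the `standard` tokenizer, `str.lower` for the `lowercase` filter, and Unicode decomposition for the `asciifolding` filter. The names `analyze_terms` and `fold_to_ascii` are hypothetical and are not part of any Elasticsearch client.

```python
import re
import unicodedata


def fold_to_ascii(term):
    # Rough stand-in for `asciifolding`: decompose accented characters
    # (NFKD) and drop the combining marks that carry the accents.
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze_terms(text, token_filters):
    # Rough stand-in for the `standard` tokenizer: runs of word characters.
    terms = re.findall(r"\w+", text)
    # Apply each token filter in order, as the `filter` array does.
    for token_filter in token_filters:
        terms = [token_filter(term) for term in terms]
    return terms


# "d\u00e9j\u00e0" is "déjà" written with escapes so the accents are precomposed.
terms = analyze_terms("Is this d\u00e9j\u00e0 vu?", [str.lower, fold_to_ascii])
print(terms)
```

The filters run in the order listed, so each one receives the output of the previous one. The real `standard` tokenizer and `asciifolding` filter handle many more cases than this sketch does.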
- .Positions and character offsets
- *********************************************************
- The output of the `analyze` API shows that analyzers do more than convert
- words into terms. They also record the order, or relative _positions_, of
- the terms (used for phrase queries and word proximity queries), and the
- start and end _character offsets_ of each term in the original text (used for
- highlighting search snippets).
- *********************************************************
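To make positions and character offsets concrete, here is a minimal Python sketch that mimics the shape of the `whitespace` analyzer response for the text `The quick brown fox.`. It assumes that splitting on runs of non-whitespace characters is a fair stand-in for the `whitespace` tokenizer. The function name `whitespace_tokens` is hypothetical, and this is an illustration, not the Elasticsearch implementation.

```python
import re


def whitespace_tokens(text):
    tokens = []
    # Each run of non-whitespace characters becomes one token.
    for position, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "token": match.group(),
            # Offsets index into the original text; end_offset is exclusive.
            "start_offset": match.start(),
            "end_offset": match.end(),
            "type": "word",
            # Position is the token's order in the stream, starting at 0.
            "position": position,
        })
    return tokens


for token in whitespace_tokens("The quick brown fox."):
    print(token)
```

Slicing the original text with a token's offsets, for example `text[16:20]`, recovers the exact characters to highlight, while the positions give the order that phrase and proximity queries rely on.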
- Alternatively, you can refer to a <<analysis-custom-analyzer,`custom` analyzer>>
- by running the `analyze` API on the index that defines it:
- [source,console]
- -------------------------------------
- PUT my-index-000001
- {
- "settings": {
- "analysis": {
- "analyzer": {
- "std_folded": { <1>
- "type": "custom",
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "asciifolding"
- ]
- }
- }
- }
- },
- "mappings": {
- "properties": {
- "my_text": {
- "type": "text",
- "analyzer": "std_folded" <2>
- }
- }
- }
- }
- GET my-index-000001/_analyze <3>
- {
- "analyzer": "std_folded", <4>
- "text": "Is this déjà vu?"
- }
- GET my-index-000001/_analyze <3>
- {
- "field": "my_text", <5>
- "text": "Is this déjà vu?"
- }
- -------------------------------------
- Both requests return the following response:
- [source,console-result]
- -------------------------------------
- {
- "tokens": [
- {
- "token": "is",
- "start_offset": 0,
- "end_offset": 2,
- "type": "<ALPHANUM>",
- "position": 0
- },
- {
- "token": "this",
- "start_offset": 3,
- "end_offset": 7,
- "type": "<ALPHANUM>",
- "position": 1
- },
- {
- "token": "deja",
- "start_offset": 8,
- "end_offset": 12,
- "type": "<ALPHANUM>",
- "position": 2
- },
- {
- "token": "vu",
- "start_offset": 13,
- "end_offset": 15,
- "type": "<ALPHANUM>",
- "position": 3
- }
- ]
- }
- -------------------------------------
- <1> Define a `custom` analyzer called `std_folded`.
- <2> The field `my_text` uses the `std_folded` analyzer.
- <3> To refer to this analyzer, the `analyze` API must specify the index name.
- <4> Refer to the analyzer by name.
- <5> Refer to the analyzer used by field `my_text`.