[[test-analyzer]]
=== Test an analyzer

The <<indices-analyze,`analyze` API>> is an invaluable tool for viewing the
terms produced by an analyzer. A built-in analyzer can be specified inline in
the request:

[source,console]
-------------------------------------
POST _analyze
{
  "analyzer": "whitespace",
  "text":     "The quick brown fox."
}
-------------------------------------

The API returns the following response:

[source,console-result]
-------------------------------------
{
  "tokens": [
    {
      "token": "The",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "quick",
      "start_offset": 4,
      "end_offset": 9,
      "type": "word",
      "position": 1
    },
    {
      "token": "brown",
      "start_offset": 10,
      "end_offset": 15,
      "type": "word",
      "position": 2
    },
    {
      "token": "fox.",
      "start_offset": 16,
      "end_offset": 20,
      "type": "word",
      "position": 3
    }
  ]
}
-------------------------------------

You can also test combinations of:

* A tokenizer
* Zero or more token filters
* Zero or more character filters

[source,console]
-------------------------------------
POST _analyze
{
  "tokenizer": "standard",
  "filter":  [ "lowercase", "asciifolding" ],
  "text":      "Is this déjà vu?"
}
-------------------------------------

The API returns the following response:

[source,console-result]
-------------------------------------
{
  "tokens": [
    {
      "token": "is",
      "start_offset": 0,
      "end_offset": 2,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "this",
      "start_offset": 3,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "deja",
      "start_offset": 8,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "vu",
      "start_offset": 13,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
-------------------------------------

.Positions and character offsets
*********************************************************

As can be seen from the output of the `analyze` API, analyzers not only
convert words into terms, they also record the order or relative _positions_
of each term (used for phrase queries or word proximity queries), and the
start and end _character offsets_ of each term in the original text (used for
highlighting search snippets).

*********************************************************

Alternatively, a <<analysis-custom-analyzer,`custom` analyzer>> can be
referred to when running the `analyze` API on a specific index:

[source,console]
-------------------------------------
PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_folded": { <1>
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_text": {
        "type": "text",
        "analyzer": "std_folded" <2>
      }
    }
  }
}

GET my-index-000001/_analyze <3>
{
  "analyzer": "std_folded", <4>
  "text":     "Is this déjà vu?"
}

GET my-index-000001/_analyze <3>
{
  "field": "my_text", <5>
  "text":  "Is this déjà vu?"
}
-------------------------------------

Both `_analyze` requests return the following response:

[source,console-result]
-------------------------------------
{
  "tokens": [
    {
      "token": "is",
      "start_offset": 0,
      "end_offset": 2,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "this",
      "start_offset": 3,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "deja",
      "start_offset": 8,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "vu",
      "start_offset": 13,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
-------------------------------------

<1> Define a `custom` analyzer called `std_folded`.
<2> The field `my_text` uses the `std_folded` analyzer.
<3> To refer to this analyzer, the `analyze` API must specify the index name.
<4> Refer to the analyzer by name.
<5> Refer to the analyzer used by field `my_text`.
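
The role of the character offsets can be illustrated offline, without a running cluster. The following Python sketch is not part of the `analyze` API: it copies the tokens from the responses shown above into plain dictionaries and uses `start_offset` and `end_offset` to slice the original text. The helper names `slice_by_offsets` and `highlight` are hypothetical, and the sketch assumes the offsets can be used directly as Python string indices, which holds for the sample texts here.

```python
# Tokens copied by hand from the `whitespace` response shown above.
whitespace_text = "The quick brown fox."
whitespace_tokens = [
    {"token": "The", "start_offset": 0, "end_offset": 3, "position": 0},
    {"token": "quick", "start_offset": 4, "end_offset": 9, "position": 1},
    {"token": "brown", "start_offset": 10, "end_offset": 15, "position": 2},
    {"token": "fox.", "start_offset": 16, "end_offset": 20, "position": 3},
]

# Tokens copied by hand from the `std_folded` response shown above.
folded_text = "Is this déjà vu?"
folded_tokens = [
    {"token": "is", "start_offset": 0, "end_offset": 2, "position": 0},
    {"token": "this", "start_offset": 3, "end_offset": 7, "position": 1},
    {"token": "deja", "start_offset": 8, "end_offset": 12, "position": 2},
    {"token": "vu", "start_offset": 13, "end_offset": 15, "position": 3},
]


def slice_by_offsets(text, tokens):
    """Hypothetical helper: pair each term with the original text it came from."""
    return [
        (t["position"], t["token"], text[t["start_offset"]:t["end_offset"]])
        for t in tokens
    ]


def highlight(text, token, open_tag="<em>", close_tag="</em>"):
    """Hypothetical helper: wrap the original span of one token in tags."""
    start, end = token["start_offset"], token["end_offset"]
    return text[:start] + open_tag + text[start:end] + close_tag + text[end:]


whitespace_spans = slice_by_offsets(whitespace_text, whitespace_tokens)
folded_spans = slice_by_offsets(folded_text, folded_tokens)

# The folded term is "deja", but the offsets still point at "déjà" in the input.
for position, term, original in folded_spans:
    print(position, term, repr(original))

print(highlight(folded_text, folded_tokens[2]))
```

The term and the original span differ for the folded token, which is why a highlighter works from the offsets rather than from the term itself.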