[[analysis-edgengram-tokenizer]]
=== Edge NGram Tokenizer

The `edge_ngram` tokenizer first breaks text down into words whenever it
encounters one of a list of specified characters, then it emits
https://en.wikipedia.org/wiki/N-gram[N-grams] of each word where the start of
the N-gram is anchored to the beginning of the word.

Edge N-grams are useful for _search-as-you-type_ queries.

TIP: When you need _search-as-you-type_ for text which has a widely known
order, such as movie or song titles, the
<<completion-suggester,completion suggester>> is a much more efficient
choice than edge N-grams.  Edge N-grams have the advantage when trying to
autocomplete words that can appear in any order.

[float]
=== Example output

With the default settings, the `edge_ngram` tokenizer treats the initial text as a
single token and produces N-grams with minimum length `1` and maximum length
`2`:

[source,console]
---------------------------
POST _analyze
{
  "tokenizer": "edge_ngram",
  "text": "Quick Fox"
}
---------------------------

/////////////////////

[source,console-result]
----------------------------
{
  "tokens": [
    {
      "token": "Q",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "Qu",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 1
    }
  ]
}
----------------------------

/////////////////////

The above sentence would produce the following terms:

[source,text]
---------------------------
[ Q, Qu ]
---------------------------

NOTE: These default gram lengths are almost entirely useless.  You need to
configure the `edge_ngram` tokenizer before using it.

[float]
=== Configuration

The `edge_ngram` tokenizer accepts the following parameters:

[horizontal]
`min_gram`::

    Minimum length of a gram, in characters.  Defaults to `1`.

`max_gram`::

    Maximum length of a gram, in characters.  Defaults to `2`.

`token_chars`::

    Character classes that should be included in a token.  Elasticsearch
    will split on characters that don't belong to the classes specified.
    Defaults to `[]` (keep all characters).
+
Character classes may be any of the following:
+
* `letter` --      for example `a`, `b`, `ï` or `京`
* `digit` --       for example `3` or `7`
* `whitespace` --  for example `" "` or `"\n"`
* `punctuation` -- for example `!` or `"`
* `symbol` --      for example `$` or `√`
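As a quick illustration of how `token_chars` controls splitting, here is a
minimal sketch (the sample text is illustrative, and it assumes a version of
Elasticsearch that accepts inline tokenizer definitions in the `_analyze`
API).  Because only `letter` is included, the digit `2` acts purely as a
separator and produces no grams of its own:

[source,console]
---------------------------
POST _analyze
{
  "tokenizer": {
    "type": "edge_ngram",
    "min_gram": 2,
    "max_gram": 3,
    "token_chars": [ "letter" ]
  },
  "text": "2 Quick"
}
---------------------------

This would produce only the terms `[ Qu, Qui ]`.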
[float]
=== Example configuration

In this example, we configure the `edge_ngram` tokenizer to treat letters and
digits as tokens, and to produce grams with minimum length `2` and maximum
length `10`:

[source,console]
----------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "2 Quick Foxes."
}
----------------------------

/////////////////////

[source,console-result]
----------------------------
{
  "tokens": [
    {
      "token": "Qu",
      "start_offset": 2,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "Qui",
      "start_offset": 2,
      "end_offset": 5,
      "type": "word",
      "position": 1
    },
    {
      "token": "Quic",
      "start_offset": 2,
      "end_offset": 6,
      "type": "word",
      "position": 2
    },
    {
      "token": "Quick",
      "start_offset": 2,
      "end_offset": 7,
      "type": "word",
      "position": 3
    },
    {
      "token": "Fo",
      "start_offset": 8,
      "end_offset": 10,
      "type": "word",
      "position": 4
    },
    {
      "token": "Fox",
      "start_offset": 8,
      "end_offset": 11,
      "type": "word",
      "position": 5
    },
    {
      "token": "Foxe",
      "start_offset": 8,
      "end_offset": 12,
      "type": "word",
      "position": 6
    },
    {
      "token": "Foxes",
      "start_offset": 8,
      "end_offset": 13,
      "type": "word",
      "position": 7
    }
  ]
}
----------------------------

/////////////////////

The above example produces the following terms:

[source,text]
---------------------------
[ Qu, Qui, Quic, Quick, Fo, Fox, Foxe, Foxes ]
---------------------------
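Note that grams are never longer than `max_gram` characters, so a word that
exceeds `max_gram` is only ever indexed as its truncated prefixes.  As a
sketch using the `my_analyzer` defined above (the sixteen-letter word is
illustrative):

[source,console]
----------------------------
POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Disestablishment"
}
----------------------------

The longest term produced is the ten-character prefix `Disestabli`; the full
word is never emitted as a term.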
Usually we recommend using the same `analyzer` at index time and at search
time. In the case of the `edge_ngram` tokenizer, the advice is different.  It
only makes sense to use the `edge_ngram` tokenizer at index time, to ensure
that partial words are available for matching in the index.  At search time,
just search for the terms the user has typed in, for instance: `Quick Fo`.

Below is an example of how to set up a field for _search-as-you-type_:

[source,console]
-----------------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}

PUT my_index/_doc/1
{
  "title": "Quick Foxes" <1>
}

POST my_index/_refresh

GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Quick Fo", <2>
        "operator": "and"
      }
    }
  }
}
-----------------------------------
<1> The `autocomplete` analyzer indexes the terms `[qu, qui, quic, quick, fo, fox, foxe, foxes]`.
<2> The `autocomplete_search` analyzer searches for the terms `[quick, fo]`, both of which appear in the index.

/////////////////////

[source,console-result]
----------------------------
{
  "took": $body.took,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total" : {
        "value": 1,
        "relation": "eq"
    },
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "title": "Quick Foxes"
        }
      }
    ]
  }
}
----------------------------
// TESTRESPONSE[s/"took".*/"took": "$body.took",/]

/////////////////////
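As mentioned in the tip at the top of this page, edge N-grams also allow
users to type words in any order.  Here is an illustrative query against the
same index (the reversed query string is just an example): the
`autocomplete_search` analyzer produces the terms `[fo, qui]`, both of which
were indexed for `Quick Foxes`, so the document still matches:

[source,console]
-----------------------------------
GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Fo Qui",
        "operator": "and"
      }
    }
  }
}
-----------------------------------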