[[analysis-pathhierarchy-tokenizer]]
=== Path Hierarchy Tokenizer

The `path_hierarchy` tokenizer takes a hierarchical value like a filesystem
path, splits on the path separator, and emits a term for each component in the
tree.

[float]
=== Example output

[source,console]
---------------------------
POST _analyze
{
  "tokenizer": "path_hierarchy",
  "text": "/one/two/three"
}
---------------------------

/////////////////////

[source,console-result]
----------------------------
{
  "tokens": [
    {
      "token": "/one",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "/one/two",
      "start_offset": 0,
      "end_offset": 8,
      "type": "word",
      "position": 0
    },
    {
      "token": "/one/two/three",
      "start_offset": 0,
      "end_offset": 14,
      "type": "word",
      "position": 0
    }
  ]
}
----------------------------

/////////////////////

The above text would produce the following terms:

[source,text]
---------------------------
[ /one, /one/two, /one/two/three ]
---------------------------

[float]
=== Configuration

The `path_hierarchy` tokenizer accepts the following parameters:

[horizontal]
`delimiter`::
    The character to use as the path separator. Defaults to `/`.

`replacement`::
    An optional replacement character to use for the delimiter.
    Defaults to the `delimiter`.

`buffer_size`::
    The number of characters read into the term buffer in a single pass.
    Defaults to `1024`. The term buffer will grow by this size until all the
    text has been consumed. It is advisable not to change this setting.

`reverse`::
    If set to `true`, emits the tokens in reverse order. Defaults to `false`.

`skip`::
    The number of initial tokens to skip. Defaults to `0`.

[float]
=== Example configuration

In this example, we configure the `path_hierarchy` tokenizer to split on `-`
characters, and to replace them with `/`.
The first two tokens are skipped:

[source,console]
----------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "-",
          "replacement": "/",
          "skip": 2
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "one-two-three-four-five"
}
----------------------------

/////////////////////

[source,console-result]
----------------------------
{
  "tokens": [
    {
      "token": "/three",
      "start_offset": 7,
      "end_offset": 13,
      "type": "word",
      "position": 0
    },
    {
      "token": "/three/four",
      "start_offset": 7,
      "end_offset": 18,
      "type": "word",
      "position": 0
    },
    {
      "token": "/three/four/five",
      "start_offset": 7,
      "end_offset": 23,
      "type": "word",
      "position": 0
    }
  ]
}
----------------------------

/////////////////////

The above example produces the following terms:

[source,text]
---------------------------
[ /three, /three/four, /three/four/five ]
---------------------------

If we were to set `reverse` to `true`, it would produce the following:

[source,text]
---------------------------
[ one/two/three/, two/three/, three/ ]
---------------------------
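As a minimal sketch of how that might be configured, the tokenizer definition
above would simply gain a `reverse` flag. This snippet is illustrative only and
is not part of the tested examples; the index name `my_index_reverse` is a
placeholder:

[source,console]
----------------------------
PUT my_index_reverse
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "-",
          "replacement": "/",
          "skip": 2,
          "reverse": true <1>
        }
      }
    }
  }
}
----------------------------
<1> Emit the tokens in reverse order.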
[float]
=== Detailed Examples

See <<analysis-pathhierarchy-tokenizer-examples, detailed examples here>>.