---
navigation_title: "Delimited payload"
mapped_pages:
---
::::{warning}
The older name `delimited_payload_filter` is deprecated and should not be used with new indices. Use `delimited_payload` instead.
::::
Separates a token stream into tokens and payloads based on a specified delimiter. For example, you can use the `delimited_payload` filter with a `|` delimiter to split `the|1 quick|2 fox|3` into the tokens `the`, `quick`, and `fox` with respective payloads of `1`, `2`, and `3`.

This filter uses Lucene's `DelimitedPayloadTokenFilter`.
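The token/payload split can be pictured with a short Python sketch. This is an illustrative model of the delimiter logic only, not the Lucene implementation; the real filter operates on an analyzer's token stream:

```python
# Illustrative model of how a delimited_payload-style filter splits
# whitespace-separated tokens into (token, payload) pairs.
# Not the Lucene implementation.
def split_payloads(text, delimiter="|"):
    pairs = []
    for token in text.split():
        # partition() splits on the first delimiter only.
        term, _, payload = token.partition(delimiter)
        pairs.append((term, payload or None))
    return pairs

print(split_payloads("the|1 quick|2 fox|3"))
# [('the', '1'), ('quick', '2'), ('fox', '3')]
```

Tokens without a delimiter pass through with no payload attached.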
::::{admonition} Payloads
:class: note

A payload is user-defined binary data associated with a token position and stored as base64-encoded bytes.

{{es}} does not store token payloads by default. To store payloads, you must:

* Set the `term_vector` mapping parameter to `with_positions_payloads` or `with_positions_offsets_payloads` for any field storing payloads.
* Use an index analyzer that includes the `delimited_payload` filter.

You can view stored payloads using the term vectors API.
::::
The following analyze API request uses the `delimited_payload` filter with the default `|` delimiter to split `the|0 brown|10 fox|5 is|0 quick|10` into tokens and payloads.
```console
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["delimited_payload"],
  "text": "the|0 brown|10 fox|5 is|0 quick|10"
}
```
The filter produces the following tokens:
```text
[ the, brown, fox, is, quick ]
```
Note that the analyze API does not return stored payloads. For an example that includes returned payloads, see Return stored payloads.
The following create index API request uses the `delimited_payload` filter to configure a new custom analyzer.
```console
PUT delimited_payload
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_delimited_payload": {
          "tokenizer": "whitespace",
          "filter": [ "delimited_payload" ]
        }
      }
    }
  }
}
```
`delimiter`
: (Optional, string) Character used to separate tokens from payloads. Defaults to `|`.

`encoding`
: (Optional, string) Data type for the stored payload. Valid values are:

    `float`
    : (Default) Float

    `identity`
    : Characters

    `int`
    : Integer
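The effect of each encoding on the stored bytes can be sketched in Python. This follows the behavior of Lucene's payload encoders (4-byte big-endian values for `float` and `int`); the charset used for `identity` is an assumption here:

```python
import base64
import struct

# Sketch of how each encoding turns the text after the delimiter into
# payload bytes, modeled on Lucene's payload encoders.
def encode_payload(value, encoding="float"):
    if encoding == "float":
        return struct.pack(">f", float(value))  # 4 bytes, big-endian IEEE 754
    if encoding == "int":
        return struct.pack(">i", int(value))    # 4 bytes, big-endian integer
    if encoding == "identity":
        return value.encode("utf-8")            # raw characters (charset assumed)
    raise ValueError(f"unknown encoding: {encoding}")

# The term vectors API returns payloads base64-encoded, so the payload
# for brown|3 under the default float encoding appears as:
print(base64.b64encode(encode_payload("3", "float")).decode())
# QEAAAA==
```

This is why the payload strings in term vector responses look like `QEAAAA==` rather than the original numbers.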
To customize the `delimited_payload` filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.
For example, the following create index API request uses a custom `delimited_payload` filter to configure a new custom analyzer. The custom `delimited_payload` filter uses the `+` delimiter to separate tokens from payloads. Payloads are encoded as integers.
```console
PUT delimited_payload_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_plus_delimited": {
          "tokenizer": "whitespace",
          "filter": [ "plus_delimited" ]
        }
      },
      "filter": {
        "plus_delimited": {
          "type": "delimited_payload",
          "delimiter": "+",
          "encoding": "int"
        }
      }
    }
  }
}
```
Use the create index API to create an index that:

* Uses a custom index analyzer with the `delimited_payload` filter.
* Stores term vectors with positions and payloads for the `text` field, using the `term_vector` mapping parameter.
```console
PUT text_payloads
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "term_vector": "with_positions_payloads",
        "analyzer": "payload_delimiter"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "payload_delimiter": {
          "tokenizer": "whitespace",
          "filter": [ "delimited_payload" ]
        }
      }
    }
  }
}
```
Add a document containing payloads to the index.
```console
POST text_payloads/_doc/1
{
  "text": "the|0 brown|3 fox|4 is|0 quick|10"
}
```
Use the term vectors API to return the document’s tokens and base64-encoded payloads.
```console
GET text_payloads/_termvectors/1
{
  "fields": [ "text" ],
  "payloads": true
}
```
The API returns the following response:
```console-result
{
  "_index": "text_payloads",
  "_id": "1",
  "_version": 1,
  "found": true,
  "took": 8,
  "term_vectors": {
    "text": {
      "field_statistics": {
        "sum_doc_freq": 5,
        "doc_count": 1,
        "sum_ttf": 5
      },
      "terms": {
        "brown": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 1,
              "payload": "QEAAAA=="
            }
          ]
        },
        "fox": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 2,
              "payload": "QIAAAA=="
            }
          ]
        },
        "is": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 3,
              "payload": "AAAAAA=="
            }
          ]
        },
        "quick": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 4,
              "payload": "QSAAAA=="
            }
          ]
        },
        "the": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "payload": "AAAAAA=="
            }
          ]
        }
      }
    }
  }
}
```
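The base64 payloads in the response above can be decoded back to the original numbers. With the default `float` encoding, each payload is a 4-byte big-endian IEEE 754 float, so a small decoding sketch looks like this (term names and payload strings are taken from the response above):

```python
import base64
import struct

# Payloads from the term vectors response above (default float encoding).
payloads = {
    "the": "AAAAAA==",
    "brown": "QEAAAA==",
    "fox": "QIAAAA==",
    "is": "AAAAAA==",
    "quick": "QSAAAA==",
}

for term, b64 in payloads.items():
    # Each payload is a 4-byte big-endian IEEE 754 float.
    value = struct.unpack(">f", base64.b64decode(b64))[0]
    print(f"{term}: {value}")
```

Running this recovers the numbers from the indexed document `the|0 brown|3 fox|4 is|0 quick|10`.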