Add more docs for multivalue fields (ESQL-1252)

Describes how we fetch multivalued fields by default and return them as
JSON arrays, how their internal sort order is not guaranteed, how most
functions turn them into null, and how some field types remove
duplicates on save.

---------

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
Nik Everett 2 years ago
Parent
Current commit
74c97dd75b
3 files changed, 264 insertions and 22 deletions
  1. docs/reference/esql/esql-syntax.asciidoc (+12 -12)
  2. docs/reference/esql/index.asciidoc (+12 -10)
  3. docs/reference/esql/multivalued-fields.asciidoc (+240 -0)

+ 12 - 12
docs/reference/esql/esql-syntax.asciidoc

@@ -17,8 +17,8 @@ separated by a pipe character: `|`. For example:
 
 [source,esql]
 ----
-source-command 
-| processing-command1 
+source-command
+| processing-command1
 | processing-command2
 ----
 
@@ -36,7 +36,7 @@ source-command | processing-command1 | processing-command2
 [discrete]
 [[esql-comments]]
 === Comments
-ESQL uses C++ style comments: 
+ESQL uses C++ style comments:
 
 * double slash `//` for single line comments
 * `/*` and `*/` for block comments
@@ -57,9 +57,9 @@ FROM /* Query the employees index */ employees
 [source,esql]
 ----
 FROM employees
-/* Query the 
-*  employees
-*  index */
+/* Query the
+ * employees
+ * index */
 | WHERE height > 2
 ----
 
@@ -90,13 +90,13 @@ For string comparison using wildcards or regular expressions, use `LIKE` or
 are supported:
 +
 --
-** `*` matches zero or more characters. 
-** `?` matches one character. 
+** `*` matches zero or more characters.
+** `?` matches one character.
 
 [source,esql]
 ----
-FROM employees 
-| WHERE first_name LIKE "?b*" 
+FROM employees
+| WHERE first_name LIKE "?b*"
 | PROJECT first_name, last_name
 ----
 --
@@ -105,8 +105,8 @@ FROM employees
 +
 [source,esql]
 ----
-FROM employees 
-| WHERE first_name RLIKE ".leja.*" 
+FROM employees
+| WHERE first_name RLIKE ".leja.*"
 | PROJECT first_name, last_name
 ----
 

+ 12 - 10
docs/reference/esql/index.asciidoc

@@ -90,7 +90,7 @@ POST /_esql?format=txt
   """
 }
 ----
-// TEST[continued]
+// TEST[setup:library]
 
 [discrete]
 ==== {kib}
@@ -108,15 +108,15 @@ with the time filter.
 
 ESQL currently supports the following <<mapping-types,field types>>:
 
-- alias
-- boolean
-- dates
-- ip
-- keyword family (strings)
-- double/float/half_float
-- long/int/short/byte
-- version
-
+- `alias`
+- `boolean`
+- `date`
+- `ip`
+- `keyword` family (`keyword`, `constant_keyword`, and `wildcard`)
+- `double`/`float`/`half_float` (represented as `double`)
+- `long`
+- `int`/`short`/`byte` (represented as `int`)
+- `version`
 --
 
 include::esql-get-started.asciidoc[]
@@ -129,5 +129,7 @@ include::esql-processing-commands.asciidoc[]
 
 include::esql-functions.asciidoc[]
 
+include::multivalued-fields.asciidoc[]
+
 :esql-tests!:
 :esql-specs!:

+ 240 - 0
docs/reference/esql/multivalued-fields.asciidoc

@@ -0,0 +1,240 @@
+[[esql-multivalued-fields]]
+== ESQL multivalued fields
+
+++++
+<titleabbrev>Multivalued fields</titleabbrev>
+++++
+
+ESQL can read from multivalued fields:
+
+[source,console,id=esql-multivalued-fields-reorders]
+----
+POST /mv/_bulk?refresh
+{ "index" : {} }
+{ "a": 1, "b": [2, 1] }
+{ "index" : {} }
+{ "a": 2, "b": 3 }
+
+POST /_esql
+{
+  "query": "FROM mv"
+}
+----
+
+Multivalued fields come back as a JSON array:
+
+[source,console-result]
+----
+{
+  "columns": [
+    { "name": "a", "type": "long"},
+    { "name": "b", "type": "long"}
+  ],
+  "values": [
+    [1, [1, 2]],
+    [2,      3]
+  ]
+}
+----
+
+The relative order of values in a multivalued field is undefined. They'll frequently be in
+ascending order but don't rely on that.
+
+[discrete]
+[[esql-multivalued-fields-dups]]
+==== Duplicate values
+
+Some field types, like <<keyword-field-type,`keyword`>>, remove duplicate values on write:
+
+[source,console,id=esql-multivalued-fields-kwdups]
+----
+PUT /mv
+{
+  "mappings": {
+    "properties": {
+      "b": {"type": "keyword"}
+    }
+  }
+}
+
+POST /mv/_bulk?refresh
+{ "index" : {} }
+{ "a": 1, "b": ["foo", "foo", "bar"] }
+{ "index" : {} }
+{ "a": 2, "b": ["bar", "bar"] }
+
+POST /_esql
+{
+  "query": "FROM mv"
+}
+----
+
+And ESQL sees that removal:
+
+[source,console-result]
+----
+{
+  "columns": [
+    { "name": "a", "type": "long"},
+    { "name": "b", "type": "keyword"}
+  ],
+  "values": [
+    [1, ["bar", "foo"]],
+    [2,          "bar"]
+  ]
+}
+----
+
+But other types, like `long`, don't remove duplicates:
+
+[source,console,id=esql-multivalued-fields-longdups]
+----
+PUT /mv
+{
+  "mappings": {
+    "properties": {
+      "b": {"type": "long"}
+    }
+  }
+}
+
+POST /mv/_bulk?refresh
+{ "index" : {} }
+{ "a": 1, "b": [2, 2, 1] }
+{ "index" : {} }
+{ "a": 2, "b": [1, 1] }
+
+POST /_esql
+{
+  "query": "FROM mv"
+}
+----
+
+And ESQL also sees that:
+
+[source,console-result]
+----
+{
+  "columns": [
+    { "name": "a", "type": "long"},
+    { "name": "b", "type": "long"}
+  ],
+  "values": [
+    [1, [1, 2, 2]],
+    [2,    [1, 1]]
+  ]
+}
+----
+
+This all happens at the storage layer. If you store duplicate `long`s and then
+convert them to strings, the duplicates will stay:
+
+[source,console,id=esql-multivalued-fields-longdups-tostring]
+----
+PUT /mv
+{
+  "mappings": {
+    "properties": {
+      "b": {"type": "long"}
+    }
+  }
+}
+
+POST /mv/_bulk?refresh
+{ "index" : {} }
+{ "a": 1, "b": [2, 2, 1] }
+{ "index" : {} }
+{ "a": 2, "b": [1, 1] }
+
+POST /_esql
+{
+  "query": "FROM mv | EVAL b=TO_STRING(b)"
+}
+----
+
+[source,console-result]
+----
+{
+  "columns": [
+    { "name": "a", "type": "long"},
+    { "name": "b", "type": "keyword"}
+  ],
+  "values": [
+    [1, ["1", "2", "2"]],
+    [2,      ["1", "1"]]
+  ]
+}
+----
+
+[discrete]
+[[esql-multivalued-fields-functions]]
+==== Functions
+
+Unless otherwise documented, functions will return `null` when applied to a multivalued
+field. This behavior may change in a later version.
+
+[source,console,id=esql-multivalued-fields-mv-into-null]
+----
+POST /mv/_bulk?refresh
+{ "index" : {} }
+{ "a": 1, "b": [2, 1] }
+{ "index" : {} }
+{ "a": 2, "b": 3 }
+
+POST /_esql
+{
+  "query": "FROM mv | EVAL b + 2, a + b"
+}
+----
+
+[source,console-result]
+----
+{
+  "columns": [
+    { "name": "a",   "type": "long"},
+    { "name": "b",   "type": "long"},
+    { "name": "b+2", "type": "long"},
+    { "name": "a+b", "type": "long"}
+  ],
+  "values": [
+    [1, [1, 2], null, null],
+    [2,      3,    5,    5]
+  ]
+}
+----
+
+Work around this limitation by converting the field to a single value with one of:
+
+* <<esql-mv_avg>>
+* <<esql-mv_concat>>
+* <<esql-mv_count>>
+* <<esql-mv_max>>
+* <<esql-mv_median>>
+* <<esql-mv_min>>
+* <<esql-mv_sum>>
+
+[source,console,esql-multivalued-fields-mv-into-null]
+----
+POST /_esql
+{
+  "query": "FROM mv | EVAL b=MV_MIN(b) | EVAL b + 2, a + b"
+}
+----
+// TEST[continued]
+
+[source,console-result]
+----
+{
+  "columns": [
+    { "name": "a",   "type": "long"},
+    { "name": "b",   "type": "long"},
+    { "name": "b+2", "type": "long"},
+    { "name": "a+b", "type": "long"}
+  ],
+  "values": [
+    [1, 1, 3, 2],
+    [2, 3, 5, 5]
+  ]
+}
+----
+
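The null behavior documented in the new `multivalued-fields.asciidoc` file can be modeled in a few lines of Python. This is only an illustrative sketch, not how Elasticsearch implements ESQL: the helper names `apply_scalar`, `mv_min`, and `evaluate` are hypothetical, a multivalued field is represented as a Python list, and `null` as `None`. It reproduces the two result tables from the `EVAL b + 2, a + b` examples above, except that the list keeps its input order (the docs note that the order of values within a multivalued field is undefined).

```python
from typing import Callable, List, Optional, Union

# A field value is a single number, a list of numbers (multivalued), or None (null).
Value = Union[int, List[int], None]


def apply_scalar(fn: Callable[..., int], *args: Value) -> Optional[int]:
    """Model of a scalar function: any null or multivalued argument yields null."""
    if any(arg is None or isinstance(arg, list) for arg in args):
        return None
    return fn(*args)


def mv_min(value: Value) -> Optional[int]:
    """Model of MV_MIN: collapse a multivalued field to its smallest value."""
    if isinstance(value, list):
        return min(value)
    return value  # single values and nulls pass through unchanged


def evaluate(rows: List[dict], collapse: Optional[Callable[[Value], Value]] = None) -> List[list]:
    """Model of `EVAL b + 2, a + b`, optionally preceded by `EVAL b=<collapse>(b)`."""
    out = []
    for row in rows:
        a = row["a"]
        b = collapse(row["b"]) if collapse else row["b"]
        out.append([
            a,
            b,
            apply_scalar(lambda x: x + 2, b),
            apply_scalar(lambda x, y: x + y, a, b),
        ])
    return out


rows = [{"a": 1, "b": [2, 1]}, {"a": 2, "b": 3}]

print(evaluate(rows))
# [[1, [2, 1], None, None], [2, 3, 5, 5]]
print(evaluate(rows, collapse=mv_min))
# [[1, 1, 3, 2], [2, 3, 5, 5]]
```

Swapping `mv_min` for a different reducer (for example `max` or `sum` over the list) would model the other `MV_*` functions listed in the docs in the same way.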