
Add documentation for passthrough field type (#114720) (#114809)

* Guard second doc parsing pass with index setting

* add test

* updates

* updates

* merge

* Add documentation for passthrough field type

* Apply suggestions from code review

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* updates

* updates

* Update docs/reference/mapping/types/passthrough.asciidoc

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* address comment

* address comment

* Update docs/reference/mapping/types/passthrough.asciidoc

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* address comment

---------

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
(cherry picked from commit 4d775cba4ff5708e1e895bdf89b648e609b08b89)
Kostas Krikellas 1 year ago
Parent
Current commit
79580869a8

+ 10 - 11
docs/reference/data-streams/set-up-tsds.asciidoc

@@ -121,7 +121,8 @@ naming scheme].
 * Specify a mapping that defines your dimensions and metrics:
 
 ** One or more <<time-series-dimension,dimension fields>> with a `time_series_dimension` value of `true`.
-   At least one of these dimensions must be a plain `keyword` field.
+   Alternatively, one or more <<passthrough-dimensions, pass-through>> fields configured as dimension containers,
+   provided that they will contain at least one sub-field (mapped statically or dynamically).
 
 ** One or more <<time-series-metric,metric fields>>, marked using the `time_series_metric` mapping parameter.
 
@@ -203,10 +204,9 @@ DELETE _ilm/policy/my-weather-sensor-lifecycle-policy
 Documents in a TSDS must include:
 
 * A `@timestamp` field
-* One or more dimension fields. At least one dimension must be a `keyword` field
-that matches the `index.routing_path` index setting, if specified. If not specified
-explicitly, `index.routing_path` is set automatically to whichever mappings have
- `time_series_dimension` set to `true`.
+* One or more dimension fields. At least one dimension must match the `index.routing_path` index setting,
+if specified. If not specified explicitly, `index.routing_path` is set automatically to whichever mappings have
+`time_series_dimension` set to `true`.
 
 To automatically create your TSDS, submit an indexing request that
 targets the TSDS's name. This name must match one of your index template's
@@ -285,13 +285,12 @@ POST metrics-weather_sensors-dev/_rollover
 
 Configuring a TSDS via an index template that uses component templates is a bit more complicated.
 Typically with component templates mappings and settings get scattered across multiple component templates.
-When configuring the `index.mode` setting in a component template, the `index.routing_path` setting needs to
-be defined in the same component template. Additionally the fields mentioned in the `index.routing_path`
-also need to be defined in the same component template with the `time_series_dimension` attribute enabled.
+If the `index.routing_path` is defined, the fields it references need to be defined in the same component
+template with the `time_series_dimension` attribute enabled.
 
-The reasons for this is that each component template needs to be valid on its own and the time series index mode
-requires the `index.routing_path` setting. When configuring the  `index.mode` setting in an index template, the `index.routing_path` setting is configured automatically. It is derived  from
-the field mappings with `time_series_dimension` attribute enabled.
+The reason for this is that each component template needs to be valid on its own. When configuring the
+`index.mode` setting in an index template, the `index.routing_path` setting is configured automatically.
+It is derived from the field mappings with the `time_series_dimension` attribute enabled.
 
 [discrete]
 [[set-up-tsds-whats-next]]

+ 9 - 3
docs/reference/data-streams/tsds.asciidoc

@@ -109,7 +109,10 @@ parameter:
 * <<number,`unsigned_long`>>
 * <<boolean,`boolean`>>
 
-For a flattened field, use the `time_series_dimensions` parameter to configure an array of fields as dimensions. For details refer to <<flattened-params,`flattened`>>.
+For a flattened field, use the `time_series_dimensions` parameter to configure an array of fields as dimensions.
+For details refer to <<flattened-params,`flattened`>>.
+
+Dimension definitions can be simplified through <<passthrough-dimensions, pass-through>> fields.
 
 [discrete]
 [[time-series-metric]]
@@ -294,12 +297,15 @@ When you create the matching index template for a TSDS, you must specify one or
 more dimensions in the `index.routing_path` setting. Each document in a TSDS
 must contain one or more dimensions that match the `index.routing_path` setting.
 
-Dimensions in the `index.routing_path` setting must be plain `keyword` fields.
 The `index.routing_path` setting accepts wildcard patterns (for example `dim.*`)
 and can dynamically match new fields. However, {es} will reject any mapping
-updates that add scripted, runtime, or non-dimension, non-`keyword` fields that
+updates that add scripted, runtime, or non-dimension fields that
 match the `index.routing_path` value.
 
+<<passthrough-dimensions, pass-through>> fields may be configured
+as dimension containers. In this case, their sub-fields get included in the
+routing path automatically.
+
 TSDS documents don't support a custom `_routing` value. Similarly, you can't
 require a `_routing` value in mappings for a TSDS.
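
The dimension-container behavior described in this hunk can be illustrated with a minimal Python sketch that walks a mapping dictionary and lists the statically mapped fields that would count as dimensions. This is a toy model under stated assumptions, not how {es} derives `index.routing_path`: the function name `dimension_paths` is hypothetical, and dynamically mapped sub-fields (which only appear once documents are indexed) are not covered.

```python
def dimension_paths(properties, prefix="", inherited=False):
    """Collect full paths of leaf fields that act as dimensions.

    `properties` is the "properties" dict of a mapping. `inherited` is True
    when an enclosing pass-through object is a dimension container.
    Hypothetical helper for illustration only.
    """
    paths = []
    for name, spec in properties.items():
        path = prefix + name
        flagged = spec.get("time_series_dimension", False)
        if "properties" in spec:
            # A pass-through container with the flag set propagates it to its sub-fields.
            propagate = inherited or (spec.get("type") == "passthrough" and flagged)
            paths.extend(dimension_paths(spec["properties"], path + ".", propagate))
        elif inherited or flagged:
            paths.append(path)
    return paths


# Mapping modeled on the TSDS example in passthrough.asciidoc.
example_properties = {
    "attributes": {
        "type": "passthrough",
        "priority": 10,
        "time_series_dimension": True,
        "properties": {"host.name": {"type": "keyword"}},
    },
    "cpu": {"type": "integer", "time_series_metric": "counter"},
}
print(dimension_paths(example_properties))
```

In this model, `attributes.host.name` is reported as a dimension even though it carries no `time_series_dimension` flag of its own, while `cpu` is not.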
 
 

+ 1 - 0
docs/reference/mapping/params/subobjects.asciidoc

@@ -111,6 +111,7 @@ PUT my-index-000001/_doc/metric_1
 
 The `subobjects` setting for existing fields and the top-level mapping definition cannot be updated.
 
+[[subobjects-auto-flattening]]
 ==== Auto-flattening object mappings
 
 It is generally recommended to define the properties of an object that is configured with `subobjects: false` with dotted field names

+ 9 - 6
docs/reference/mapping/types.asciidoc

@@ -35,12 +35,13 @@ Dates::                 Date types, including <<date,`date`>> and
 [[object-types]]
 ==== Objects and relational types
 
-<<object,`object`>>::         A JSON object.
-<<flattened,`flattened`>>::   An entire JSON object as a single field value.
-<<nested,`nested`>>::         A JSON object that preserves the relationship
-                              between its subfields.
-<<parent-join,`join`>>::      Defines a parent/child relationship for documents
-                              in the same index.
+<<object,`object`>>::           A JSON object.
+<<flattened,`flattened`>>::     An entire JSON object as a single field value.
+<<nested,`nested`>>::           A JSON object that preserves the relationship
+                                between its subfields.
+<<parent-join,`join`>>::        Defines a parent/child relationship for documents
+                                in the same index.
+<<passthrough,`passthrough`>>:: Provides aliases for sub-fields at the same level.
 
 
 [discrete]
@@ -167,6 +168,8 @@ include::types/numeric.asciidoc[]
 
 include::types/object.asciidoc[]
 
+include::types/passthrough.asciidoc[]
+
 include::types/percolator.asciidoc[]
 
 include::types/point.asciidoc[]

+ 218 - 0
docs/reference/mapping/types/passthrough.asciidoc

@@ -0,0 +1,218 @@
+[[passthrough]]
+=== Pass-through object field type
+++++
+<titleabbrev>Pass-through object</titleabbrev>
+++++
+
+Pass-through objects extend the functionality of <<object, objects>> by allowing access to
+their subfields without including the name of the pass-through object as a prefix. For instance:
+
+[source,console]
+--------------------------------------------------
+PUT my-index-000001
+{
+  "mappings": {
+    "properties": {
+      "attributes": {
+        "type": "passthrough", <1>
+        "priority": 10,
+        "properties": {
+          "id": {
+            "type": "keyword"
+          }
+        }
+      }
+    }
+  }
+}
+
+PUT my-index-000001/_doc/1
+{
+  "attributes" : {  <2>
+    "id": "foo",
+    "zone": 10
+  }
+}
+
+GET my-index-000001/_search
+{
+  "query": {
+    "bool": {
+      "must": [
+        { "match": { "id": "foo" }},  <3>
+        { "match": { "zone": 10 }}
+      ]
+    }
+  }
+}
+
+GET my-index-000001/_search
+{
+  "query": {
+    "bool": {
+      "must": [
+        { "match": { "attributes.id": "foo" }}, <4>
+        { "match": { "attributes.zone": 10 }}
+      ]
+    }
+  }
+}
+
+--------------------------------------------------
+
+<1> An object is defined as pass-through. Its priority (required) is used for conflict resolution.
+<2> Object contents get indexed as usual, including dynamic mappings.
+<3> Sub-fields can be referenced in queries as if they're defined at the root level.
+<4> Sub-fields can also be referenced including the object name as prefix.
+
+[[passthrough-conflicts]]
+==== Conflict resolution
+
+It's possible for conflicting names to arise, for fields that are defined within different scopes:
+
+  1. A pass-through object is defined next to a field that has the same name as one of the pass-through object
+     sub-fields, e.g.
+
+[source,console]
+--------------------------------------------------
+PUT my-index-000001/_doc/1
+{
+  "attributes" : {
+    "id": "foo"
+  },
+  "id": "bar"
+}
+--------------------------------------------------
+
+     In this case, references to `id` point to the field at the root level, while field `attributes.id`
+     can only be accessed using the full path.
+
+  1. Two (or more) pass-through objects are defined within the same object and contain fields with the same name, e.g.
+
+[source,console]
+--------------------------------------------------
+PUT my-index-000002
+{
+  "mappings": {
+    "properties": {
+      "attributes": {
+        "type": "passthrough",
+        "priority": 10,
+        "properties": {
+          "id": {
+            "type": "keyword"
+          }
+        }
+      },
+      "resource.attributes": {
+        "type": "passthrough",
+        "priority": 20,
+        "properties": {
+          "id": {
+            "type": "keyword"
+          }
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
+
+In this case, the `priority` parameter is used for conflict resolution, with higher values taking precedence. In the
+example above, `resource.attributes` has higher priority than `attributes`, so references to `id` point to the field
+within `resource.attributes`. `attributes.id` can still be accessed using its full path.
+
+[[passthrough-dimensions]]
+==== Defining sub-fields as time-series dimensions
+
+It is possible to configure a pass-through field as a container for <<time-series-dimension,time-series dimensions>>.
+In this case, all sub-fields get annotated with the same parameter under the covers, and they're also
+included in <<dimension-based-routing, routing path>> and <<tsid, tsid>> calculations, thus simplifying
+the <<tsds,TSDS>> setup:
+
+[source,console]
+--------------------------------------------------
+PUT _index_template/my-metrics
+{
+  "index_patterns": ["metrics-mymetrics-*"],
+  "priority": 200,
+  "data_stream": { },
+  "template": {
+    "settings": {
+      "index.mode": "time_series"
+    },
+    "mappings": {
+      "properties": {
+        "attributes": {
+          "type": "passthrough",
+          "priority": 10,
+          "time_series_dimension": true,
+          "properties": {
+            "host.name": {
+              "type": "keyword"
+            }
+          }
+        },
+        "cpu": {
+          "type": "integer",
+          "time_series_metric": "counter"
+        }
+      }
+    }
+  }
+}
+
+POST metrics-mymetrics-test/_doc
+{
+  "@timestamp": "2020-01-01T00:00:00.000Z",
+  "attributes" : {
+    "host.name": "foo",
+    "zone": "bar"
+  },
+  "cpu": 10
+}
+--------------------------------------------------
+// TEST[skip: The @timestamp value won't match an accepted range in the TSDS]
+
+In the example above, `attributes` is defined as a dimension container. Its sub-fields `host.name` (static) and `zone`
+(dynamic) get included in the routing path and tsid, and can be referenced in queries without the `attributes.` prefix.
+
+[[passthrough-flattening]]
+==== Sub-field auto-flattening
+
+Pass-through fields apply <<subobjects-auto-flattening, auto-flattening>> to sub-fields by default, to reduce dynamic
+mapping conflicts. As a consequence, no sub-object definitions are allowed within pass-through fields.
+
+[[passthrough-params]]
+==== Parameters for `passthrough` fields
+
+The following parameters are accepted by `passthrough` fields:
+
+[horizontal]
+
+<<passthrough-conflicts,`priority`>>::
+
+    (Required) Used for naming conflict resolution between pass-through fields. The field with the highest value wins.
+    Accepts non-negative integer values.
+
+<<passthrough-dimensions,`time_series_dimension`>>::
+
+    Whether or not to treat sub-fields as <<time-series-dimension,time-series dimensions>>.
+    Accepts `false` (default) or `true`.
+
+<<dynamic,`dynamic`>>::
+
+    Whether or not new `properties` should be added dynamically to an existing object.
+    Accepts `true` (default), `runtime`, `false` and `strict`.
+
+<<enabled,`enabled`>>::
+
+    Whether the JSON value given for the object field should be parsed and indexed (`true`, default)
+    or completely ignored (`false`).
+
+<<properties,`properties`>>::
+
+    The fields within the object, which can be of any <<mapping-types,data type>>, including `object`.
+    New properties may be added to an existing object.
+
+IMPORTANT: If you need to index arrays of objects instead of single objects, read <<nested>> first.
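
The two conflict-resolution rules documented in the file above (a root-level field shadows a pass-through alias, and among pass-through objects the higher `priority` wins) can be modeled with a short Python sketch. This is an illustration of the documented behavior only: the function `resolve_alias` and the tuple layout are hypothetical and do not mirror the internals of `PassThroughObjectMapper`. Equal priorities are not modeled, since the documentation does not describe that case.

```python
def resolve_alias(field, root_fields, passthrough_objects):
    """Return the full path an unprefixed field name refers to, or None.

    root_fields: set of field names defined at the root level.
    passthrough_objects: list of (object_name, priority, sub_field_names).
    Hypothetical toy model, not Elasticsearch code.
    """
    # Rule 1: a field defined at the root level takes precedence over any alias.
    if field in root_fields:
        return field
    # Rule 2: among pass-through objects containing the sub-field,
    # the one with the highest priority value wins.
    candidates = [
        (priority, name)
        for name, priority, sub_fields in passthrough_objects
        if field in sub_fields
    ]
    if not candidates:
        return None
    _, winner = max(candidates, key=lambda c: c[0])
    return f"{winner}.{field}"


# Objects modeled on the my-index-000002 example.
objects = [
    ("attributes", 10, {"id"}),
    ("resource.attributes", 20, {"id"}),
]
print(resolve_alias("id", set(), objects))
print(resolve_alias("id", {"id"}, objects))
```

With the objects above, the unprefixed `id` resolves to the sub-field of `resource.attributes`, and to the root-level field once one named `id` exists. The lower-priority `attributes.id` remains reachable only through its full path.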

+ 0 - 3
server/src/main/java/org/elasticsearch/index/mapper/PassThroughObjectMapper.java

@@ -34,9 +34,6 @@ import static org.elasticsearch.common.xcontent.support.XContentMapValues.nodeIn
  * In case different pass-through objects contain subfields with the same name (excluding the pass-through prefix), their aliases conflict.
  * To resolve this, the pass-through spec specifies which object takes precedence through required parameter "priority"; non-negative
  * integer values are accepted, with the highest priority value winning in case of conflicting aliases.
- *
- * Note that this is an experimental, undocumented mapper type, currently intended for prototyping purposes only.
- * It has not been vetted for use in production systems.
  */
 public class PassThroughObjectMapper extends ObjectMapper {
     public static final String CONTENT_TYPE = "passthrough";