Add documentation for passthrough field type (#114720) (#114809)

* Guard second doc parsing pass with index setting

* add test

* updates

* updates

* merge

* Add documentation for passthrough field type

* Apply suggestions from code review

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* updates

* updates

* Update docs/reference/mapping/types/passthrough.asciidoc

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* address comment

* address comment

* Update docs/reference/mapping/types/passthrough.asciidoc

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* address comment

---------

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
(cherry picked from commit 4d775cba4ff5708e1e895bdf89b648e609b08b89)
Kostas Krikellas 1 year ago
parent
commit
79580869a8

+ 10 - 11
docs/reference/data-streams/set-up-tsds.asciidoc

@@ -121,7 +121,8 @@ naming scheme].
 * Specify a mapping that defines your dimensions and metrics:
 
 ** One or more <<time-series-dimension,dimension fields>> with a `time_series_dimension` value of `true`.
-   At least one of these dimensions must be a plain `keyword` field.
+   Alternatively, one or more <<passthrough-dimensions, pass-through>> fields configured as dimension containers,
+   provided that they will contain at least one sub-field (mapped statically or dynamically).
 
 ** One or more <<time-series-metric,metric fields>>, marked using the `time_series_metric` mapping parameter.
 
@@ -203,10 +204,9 @@ DELETE _ilm/policy/my-weather-sensor-lifecycle-policy
 Documents in a TSDS must include:
 
 * A `@timestamp` field
-* One or more dimension fields. At least one dimension must be a `keyword` field
-that matches the `index.routing_path` index setting, if specified. If not specified
-explicitly, `index.routing_path` is set automatically to whichever mappings have
- `time_series_dimension` set to `true`.
+* One or more dimension fields. At least one dimension must match the `index.routing_path` index setting,
+if specified. If not specified explicitly, `index.routing_path` is set automatically to whichever mappings have
+`time_series_dimension` set to `true`.
 
 To automatically create your TSDS, submit an indexing request that
 targets the TSDS's name. This name must match one of your index template's
@@ -285,13 +285,12 @@ POST metrics-weather_sensors-dev/_rollover
 
 Configuring a TSDS via an index template that uses component templates is a bit more complicated.
 Typically with component templates mappings and settings get scattered across multiple component templates.
-When configuring the `index.mode` setting in a component template, the `index.routing_path` setting needs to
-be defined in the same component template. Additionally the fields mentioned in the `index.routing_path`
-also need to be defined in the same component template with the `time_series_dimension` attribute enabled.
+If the `index.routing_path` is defined, the fields it references need to be defined in the same component
+template with the `time_series_dimension` attribute enabled.
 
-The reasons for this is that each component template needs to be valid on its own and the time series index mode
-requires the `index.routing_path` setting. When configuring the  `index.mode` setting in an index template, the `index.routing_path` setting is configured automatically. It is derived  from
-the field mappings with `time_series_dimension` attribute enabled.
+The reason for this is that each component template needs to be valid on its own. When configuring the
+`index.mode` setting in an index template, the `index.routing_path` setting is configured automatically.
+It is derived from the field mappings with `time_series_dimension` attribute enabled.
 
 [discrete]
 [[set-up-tsds-whats-next]]

+ 9 - 3
docs/reference/data-streams/tsds.asciidoc

@@ -109,7 +109,10 @@ parameter:
 * <<number,`unsigned_long`>>
 * <<boolean,`boolean`>>
 
-For a flattened field, use the `time_series_dimensions` parameter to configure an array of fields as dimensions. For details refer to <<flattened-params,`flattened`>>.
+For a flattened field, use the `time_series_dimensions` parameter to configure an array of fields as dimensions.
+For details refer to <<flattened-params,`flattened`>>.
+
+Dimension definitions can be simplified through <<passthrough-dimensions, pass-through>> fields.
 
 [discrete]
 [[time-series-metric]]
@@ -294,12 +297,15 @@ When you create the matching index template for a TSDS, you must specify one or
 more dimensions in the `index.routing_path` setting. Each document in a TSDS
 must contain one or more dimensions that match the `index.routing_path` setting.
 
-Dimensions in the `index.routing_path` setting must be plain `keyword` fields.
 The `index.routing_path` setting accepts wildcard patterns (for example `dim.*`)
 and can dynamically match new fields. However, {es} will reject any mapping
-updates that add scripted, runtime, or non-dimension, non-`keyword` fields that
+updates that add scripted, runtime, or non-dimension fields that
 match the `index.routing_path` value.
 
+<<passthrough-dimensions, pass-through>> fields may be configured
+as dimension containers. In this case, their sub-fields are included in the
+routing path automatically.
+
 TSDS documents don't support a custom `_routing` value. Similarly, you can't
 require a `_routing` value in mappings for a TSDS.
 

+ 1 - 0
docs/reference/mapping/params/subobjects.asciidoc

@@ -111,6 +111,7 @@ PUT my-index-000001/_doc/metric_1
 
 The `subobjects` setting for existing fields and the top-level mapping definition cannot be updated.
 
+[[subobjects-auto-flattening]]
 ==== Auto-flattening object mappings
 
 It is generally recommended to define the properties of an object that is configured with `subobjects: false` with dotted field names

+ 9 - 6
docs/reference/mapping/types.asciidoc

@@ -35,12 +35,13 @@ Dates::                 Date types, including <<date,`date`>> and
 [[object-types]]
 ==== Objects and relational types
 
-<<object,`object`>>::         A JSON object.
-<<flattened,`flattened`>>::   An entire JSON object as a single field value.
-<<nested,`nested`>>::         A JSON object that preserves the relationship
-                              between its subfields.
-<<parent-join,`join`>>::      Defines a parent/child relationship for documents
-                              in the same index.
+<<object,`object`>>::           A JSON object.
+<<flattened,`flattened`>>::     An entire JSON object as a single field value.
+<<nested,`nested`>>::           A JSON object that preserves the relationship
+                                between its subfields.
+<<parent-join,`join`>>::        Defines a parent/child relationship for documents
+                                in the same index.
+<<passthrough,`passthrough`>>:: Provides aliases for sub-fields at the same level.
 
 
 [discrete]
@@ -167,6 +168,8 @@ include::types/numeric.asciidoc[]
 
 include::types/object.asciidoc[]
 
+include::types/passthrough.asciidoc[]
+
 include::types/percolator.asciidoc[]
 
 include::types/point.asciidoc[]

+ 218 - 0
docs/reference/mapping/types/passthrough.asciidoc

@@ -0,0 +1,218 @@
+[[passthrough]]
+=== Pass-through object field type
+++++
+<titleabbrev>Pass-through object</titleabbrev>
+++++
+
+Pass-through objects extend the functionality of <<object, objects>> by allowing access to
+their subfields without including the name of the pass-through object as a prefix. For instance:
+
+[source,console]
+--------------------------------------------------
+PUT my-index-000001
+{
+  "mappings": {
+    "properties": {
+      "attributes": {
+        "type": "passthrough", <1>
+        "priority": 10,
+        "properties": {
+          "id": {
+            "type": "keyword"
+          }
+        }
+      }
+    }
+  }
+}
+
+PUT my-index-000001/_doc/1
+{
+  "attributes" : {  <2>
+    "id": "foo",
+    "zone": 10
+  }
+}
+
+GET my-index-000001/_search
+{
+  "query": {
+    "bool": {
+      "must": [
+        { "match": { "id": "foo" }},  <3>
+        { "match": { "zone": 10 }}
+      ]
+    }
+  }
+}
+
+GET my-index-000001/_search
+{
+  "query": {
+    "bool": {
+      "must": [
+        { "match": { "attributes.id": "foo" }}, <4>
+        { "match": { "attributes.zone": 10 }}
+      ]
+    }
+  }
+}
+
+--------------------------------------------------
+
+<1> An object is defined as pass-through. Its priority (required) is used for conflict resolution.
+<2> Object contents get indexed as usual, including dynamic mappings.
+<3> Sub-fields can be referenced in queries as if they're defined at the root level.
+<4> Sub-fields can also be referenced including the object name as prefix.
+
+[[passthrough-conflicts]]
+==== Conflict resolution
+
+It's possible for conflicting names to arise, for fields that are defined within different scopes:
+
+  1. A pass-through object is defined next to a field that has the same name as one of the pass-through object
+     sub-fields, e.g.
+
+[source,console]
+--------------------------------------------------
+PUT my-index-000001/_doc/1
+{
+  "attributes" : {
+    "id": "foo"
+  },
+  "id": "bar"
+}
+--------------------------------------------------
+
+     In this case, references to `id` point to the field at the root level, while field `attributes.id`
+     can only be accessed using the full path.
+
+  1. Two (or more) pass-through objects are defined within the same object and contain fields with the same name, e.g.
+
+[source,console]
+--------------------------------------------------
+PUT my-index-000002
+{
+  "mappings": {
+    "properties": {
+      "attributes": {
+        "type": "passthrough",
+        "priority": 10,
+        "properties": {
+          "id": {
+            "type": "keyword"
+          }
+        }
+      },
+      "resource.attributes": {
+        "type": "passthrough",
+        "priority": 20,
+        "properties": {
+          "id": {
+            "type": "keyword"
+          }
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
+
+In this case, param `priority` is used for conflict resolution, with the higher values taking precedence. In the
+example above, `resource.attributes` has higher priority than `attributes`, so references to `id` point to the field
+within `resource.attributes`. `attributes.id` can still be accessed using its full path.
+
+[[passthrough-dimensions]]
+==== Defining sub-fields as time-series dimensions
+
+It is possible to configure a pass-through field as a container for <<time-series-dimension,time-series dimensions>>.
+In this case, all sub-fields get annotated with the same parameter under the covers, and they're also
+included in <<dimension-based-routing, routing path>> and <<tsid, tsid>> calculations, thus simplifying
+the <<tsds,TSDS>> setup:
+
+[source,console]
+--------------------------------------------------
+PUT _index_template/my-metrics
+{
+  "index_patterns": ["metrics-mymetrics-*"],
+  "priority": 200,
+  "data_stream": { },
+  "template": {
+    "settings": {
+      "index.mode": "time_series"
+    },
+    "mappings": {
+      "properties": {
+        "attributes": {
+          "type": "passthrough",
+          "priority": 10,
+          "time_series_dimension": true,
+          "properties": {
+            "host.name": {
+              "type": "keyword"
+            }
+          }
+        },
+        "cpu": {
+          "type": "integer",
+          "time_series_metric": "counter"
+        }
+      }
+    }
+  }
+}
+
+POST metrics-mymetrics-test/_doc
+{
+  "@timestamp": "2020-01-01T00:00:00.000Z",
+  "attributes" : {
+    "host.name": "foo",
+    "zone": "bar"
+  },
+  "cpu": 10
+}
+--------------------------------------------------
+// TEST[skip: The @timestamp value won't match an accepted range in the TSDS]
+
+In the example above, `attributes` is defined as a dimension container. Its sub-fields `host.name` (static) and `zone`
+(dynamic) get included in the routing path and tsid, and can be referenced in queries without the `attributes.` prefix.
+
+[[passthrough-flattening]]
+==== Sub-field auto-flattening
+
+Pass-through fields apply <<subobjects-auto-flattening, auto-flattening>> to sub-fields by default, to reduce dynamic
+mapping conflicts. As a consequence, no sub-object definitions are allowed within pass-through fields.
+
+[[passthrough-params]]
+==== Parameters for `passthrough` fields
+
+The following parameters are accepted by `passthrough` fields:
+
+[horizontal]
+
+<<passthrough-conflicts,`priority`>>::
+
+    (Required) Used to resolve naming conflicts between pass-through fields. The field with the highest value wins.
+    Accepts non-negative integer values.
+
+<<passthrough-dimensions,`time_series_dimension`>>::
+
+    Whether or not to treat sub-fields as <<time-series-dimension,time-series dimensions>>.
+    Accepts `false` (default) or `true`.
+
+<<dynamic,`dynamic`>>::
+
+    Whether or not new `properties` should be added dynamically to an existing object.
+    Accepts `true` (default), `runtime`, `false` and `strict`.
+
+<<enabled,`enabled`>>::
+
+    Whether the JSON value given for the object field should be parsed and indexed (`true`, default)
+    or completely ignored (`false`).
+
+<<properties,`properties`>>::
+
+    The fields within the object, which can be of any <<mapping-types,data type>>, including `object`.
+    New properties may be added to an existing object.
+
+IMPORTANT: If you need to index arrays of objects instead of single objects, read <<nested>> first.

+ 0 - 3
server/src/main/java/org/elasticsearch/index/mapper/PassThroughObjectMapper.java

@@ -34,9 +34,6 @@ import static org.elasticsearch.common.xcontent.support.XContentMapValues.nodeIn
  * In case different pass-through objects contain subfields with the same name (excluding the pass-through prefix), their aliases conflict.
  * To resolve this, the pass-through spec specifies which object takes precedence through required parameter "priority"; non-negative
  * integer values are accepted, with the highest priority value winning in case of conflicting aliases.
- *
- * Note that this is an experimental, undocumented mapper type, currently intended for prototyping purposes only.
- * It has not been vetted for use in production systems.
  */
 public class PassThroughObjectMapper extends ObjectMapper {
     public static final String CONTENT_TYPE = "passthrough";
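
As a reading aid for the conflict-resolution rules documented in this commit (a field at the root level wins over any pass-through alias, and among pass-through objects the highest `priority` wins), a minimal Java sketch follows. It is not the `PassThroughObjectMapper` implementation: `PassThroughAliasSketch`, `PassThrough`, and `resolve` are hypothetical names introduced for illustration, and the handling of equal priorities is an assumption, since the documentation above does not cover that case.

```java
import java.util.List;
import java.util.Set;

// Illustrative only: NOT Elasticsearch's PassThroughObjectMapper. All names are hypothetical.
final class PassThroughAliasSketch {

    /** Hypothetical stand-in for a pass-through object: full path, priority, sub-field names. */
    static final class PassThrough {
        final String path;
        final int priority;
        final Set<String> subFields;

        PassThrough(String path, int priority, Set<String> subFields) {
            this.path = path;
            this.priority = priority;
            this.subFields = subFields;
        }
    }

    /**
     * Resolves an unprefixed field name to a full field path, or returns null if nothing matches.
     * Rule 1: a field defined at the root level takes precedence over any alias.
     * Rule 2: otherwise, the pass-through object with the highest priority wins.
     */
    static String resolve(String name, Set<String> rootFields, List<PassThrough> objects) {
        if (rootFields.contains(name)) {
            return name;
        }
        PassThrough winner = null;
        for (PassThrough candidate : objects) {
            boolean hasField = candidate.subFields.contains(name);
            // Assumption: on equal priorities, the first object seen is kept.
            if (hasField && (winner == null || candidate.priority > winner.priority)) {
                winner = candidate;
            }
        }
        return winner == null ? null : winner.path + "." + name;
    }

    public static void main(String[] args) {
        // Mirrors the my-index-000002 mapping from the documentation in this commit.
        List<PassThrough> objects = List.of(
            new PassThrough("attributes", 10, Set.of("id")),
            new PassThrough("resource.attributes", 20, Set.of("id")));

        System.out.println(resolve("id", Set.of(), objects));     // resource.attributes.id
        System.out.println(resolve("id", Set.of("id"), objects)); // id
    }
}
```

With the `my-index-000002` mapping, an unprefixed `id` resolves to `resource.attributes.id`; once a root-level `id` field exists, the same reference resolves to the root field, and `attributes.id` stays reachable only through its full path.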