
Reflect latest changes in synthetic source documentation (#109501)

Oleksandr Kolomiiets 1 year ago
parent
commit
276ae121c2

+ 14 - 0
docs/changelog/109501.yaml

@@ -0,0 +1,14 @@
+pr: 109501
+summary: Reflect latest changes in synthetic source documentation
+area: Mapping
+type: enhancement
+issues: []
+highlight:
+  title: Synthetic `_source` improvements
+  body: |-
+    There are multiple improvements to synthetic `_source` functionality:
+
+    * Synthetic `_source` is now supported for all field types, including `nested` and `object`. `object` fields with `enabled` set to `false` are also supported.
+
+    * Synthetic `_source` can be enabled together with `ignore_malformed` and `ignore_above` parameters for all field types that support them.
+  notable: false

+ 2 - 1
docs/reference/data-streams/tsds.asciidoc

@@ -53,8 +53,9 @@ shard segments by `_tsid` and `@timestamp`.
 documents, the document `_id` is a hash of the document's dimensions and
 `@timestamp`. A TSDS doesn't support custom document `_id` values.
 
+
 * A TSDS uses <<synthetic-source,synthetic `_source`>>, and as a result is
-subject to a number of <<synthetic-source-restrictions,restrictions>>.
+subject to some <<synthetic-source-restrictions,restrictions>> and <<synthetic-source-modifications,modifications>> applied to the `_source` field.
 
 NOTE: A time series index can contain fields other than dimensions or metrics.
 

+ 6 - 6
docs/reference/mapping/fields/source-field.asciidoc

@@ -6,11 +6,11 @@ at index time. The `_source` field itself is not indexed (and thus is not
 searchable), but it is stored so that it can be returned when executing
 _fetch_ requests, like <<docs-get,get>> or <<search-search,search>>.
 
-If disk usage is important to you then have a look at
-<<synthetic-source,synthetic `_source`>> which shrinks disk usage at the cost of
-only supporting a subset of mappings and slower fetches or (not recommended)
-<<disable-source-field,disabling the `_source` field>> which also shrinks disk
-usage but disables many features.
+If disk usage is important to you, then consider the following options:
+
+- Using <<synthetic-source,synthetic `_source`>>, which reconstructs source content at the time of retrieval instead of storing it on disk. This shrinks disk usage, at the cost of slower access to `_source` in <<docs-get,Get>> and <<search-search,Search>> queries.
+- <<disable-source-field,Disabling the `_source` field completely>>. This shrinks disk
+usage but disables features that rely on `_source`.
 
 include::synthetic-source.asciidoc[]
 
@@ -43,7 +43,7 @@ available then a number of features are not supported:
 * The <<docs-update,`update`>>, <<docs-update-by-query,`update_by_query`>>,
 and <<docs-reindex,`reindex`>> APIs.
 
-* In the {kib} link:{kibana-ref}/discover.html[Discover] application, field data will not be displayed. 
+* In the {kib} link:{kibana-ref}/discover.html[Discover] application, field data will not be displayed.
 
 * On the fly <<highlighting,highlighting>>.
 
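The slower `_source` access mentioned in the hunk above only costs anything when a request actually loads `_source`. As a minimal sketch, a search body can switch `_source` off and ask for individual values through the search API's `fields` option instead. The helper name `search_body_without_source` and the field name `kwd` are hypothetical and are not part of this commit.

```python
import json


def search_body_without_source(query: dict, fields: list[str]) -> dict:
    """Build a search request body that does not load `_source`.

    Hypothetical helper for illustration only. `_source: false` and
    `fields` are standard search request options; the field names
    passed in are whatever the index mapping defines.
    """
    return {
        "_source": False,   # skip loading (and reconstructing) _source
        "fields": fields,   # return only the named fields
        "query": query,
    }


body = search_body_without_source({"match_all": {}}, ["kwd"])
print(json.dumps(body))
```

The resulting body would be sent to an index's `_search` endpoint. Whether this helps in practice depends on how many hits are fetched and which fields the mapping contains.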

+ 48 - 35
docs/reference/mapping/fields/synthetic-source.asciidoc

@@ -28,45 +28,22 @@ PUT idx
 
 While this on the fly reconstruction is *generally* slower than saving the source
 documents verbatim and loading them at query time, it saves a lot of storage
-space.
+space. Additional latency can be avoided by not loading the `_source` field in queries that do not need it.
+
+[[synthetic-source-fields]]
+===== Supported fields
+Synthetic `_source` is supported by all field types. Depending on implementation details, field types have different properties when used with synthetic `_source`.
+
+<<synthetic-source-fields-native-list, Most field types>> construct synthetic `_source` using existing data, most commonly <<doc-values,`doc_values`>> and <<stored-fields, stored fields>>. For these field types, no additional space is needed to store the contents of the `_source` field. Due to the storage layout of <<doc-values,`doc_values`>>, the generated `_source` field undergoes <<synthetic-source-modifications, modifications>> compared to the original document.
+
+For all other field types, the original value of the field is stored as is, in the same way as the `_source` field in non-synthetic mode. In this case there are no modifications and field data in `_source` is the same as in the original document. Similarly, malformed values of fields that use <<ignore-malformed,`ignore_malformed`>> or <<ignore-above,`ignore_above`>> need to be stored as is. This approach is less storage efficient since data needed for `_source` reconstruction is stored in addition to other data required to index the field (like `doc_values`).
 
 [[synthetic-source-restrictions]]
 ===== Synthetic `_source` restrictions
 
-There are a couple of restrictions to be aware of:
+Synthetic `_source` cannot be used together with field mappings that use <<copy-to,`copy_to`>>.
 
-* When you retrieve synthetic `_source` content it undergoes minor
-<<synthetic-source-modifications,modifications>> compared to the original JSON.
-* Synthetic `_source` can be used with indices that contain only these field
-types:
-
-** <<aggregate-metric-double-synthetic-source, `aggregate_metric_double`>>
-** {plugins}/mapper-annotated-text-usage.html#annotated-text-synthetic-source[`annotated-text`]
-** <<binary-synthetic-source,`binary`>>
-** <<boolean-synthetic-source,`boolean`>>
-** <<numeric-synthetic-source,`byte`>>
-** <<date-synthetic-source,`date`>>
-** <<date-nanos-synthetic-source,`date_nanos`>>
-** <<dense-vector-synthetic-source,`dense_vector`>>
-** <<numeric-synthetic-source,`double`>>
-** <<flattened-synthetic-source, `flattened`>>
-** <<numeric-synthetic-source,`float`>>
-** <<geo-point-synthetic-source,`geo_point`>>
-** <<geo-shape-synthetic-source,`geo_shape`>>
-** <<numeric-synthetic-source,`half_float`>>
-** <<histogram-synthetic-source,`histogram`>>
-** <<numeric-synthetic-source,`integer`>>
-** <<ip-synthetic-source,`ip`>>
-** <<keyword-synthetic-source,`keyword`>>
-** <<numeric-synthetic-source,`long`>>
-** <<range-synthetic-source,`range` types>>
-** <<numeric-synthetic-source,`scaled_float`>>
-** <<search-as-you-type-synthetic-source,`search_as_you_type`>>
-** <<numeric-synthetic-source,`short`>>
-** <<text-synthetic-source,`text`>>
-** <<token-count-synthetic-source,`token_count`>>
-** <<version-synthetic-source,`version`>>
-** <<wildcard-synthetic-source,`wildcard`>>
+Some field types have additional restrictions. These restrictions are documented in the **synthetic `_source`** section of the field type's <<mapping-types,documentation>>.
 
 [[synthetic-source-modifications]]
 ===== Synthetic `_source` modifications
@@ -178,4 +155,40 @@ that ordering.
 
 [[synthetic-source-modifications-ranges]]
 ====== Representation of ranges
-Range field vales (e.g. `long_range`) are always represented as inclusive on both sides with bounds adjusted accordingly.  See <<range-synthetic-source-inclusive, examples>>.
+Range field values (e.g. `long_range`) are always represented as inclusive on both sides with bounds adjusted accordingly. See <<range-synthetic-source-inclusive, examples>>.
+
+[[synthetic-source-precision-loss-for-point-types]]
+====== Reduced precision of `geo_point` values
+Values of `geo_point` fields are represented in synthetic `_source` with reduced precision. See <<geo-point-synthetic-source, examples>>.
+
+
+[[synthetic-source-fields-native-list]]
+===== Field types that support synthetic source with no storage overhead
+The following field types support synthetic source using data from <<doc-values,`doc_values`>> or <<stored-fields, stored fields>>, and require no additional storage space to construct the `_source` field.
+
+NOTE: If you enable the <<ignore-malformed,`ignore_malformed`>> or <<ignore-above,`ignore_above`>> settings, then additional storage is required to store ignored field values for these types.
+
+** <<aggregate-metric-double-synthetic-source, `aggregate_metric_double`>>
+** {plugins}/mapper-annotated-text-usage.html#annotated-text-synthetic-source[`annotated-text`]
+** <<binary-synthetic-source,`binary`>>
+** <<boolean-synthetic-source,`boolean`>>
+** <<numeric-synthetic-source,`byte`>>
+** <<date-synthetic-source,`date`>>
+** <<date-nanos-synthetic-source,`date_nanos`>>
+** <<dense-vector-synthetic-source,`dense_vector`>>
+** <<numeric-synthetic-source,`double`>>
+** <<flattened-synthetic-source, `flattened`>>
+** <<numeric-synthetic-source,`float`>>
+** <<geo-point-synthetic-source,`geo_point`>>
+** <<numeric-synthetic-source,`half_float`>>
+** <<histogram-synthetic-source,`histogram`>>
+** <<numeric-synthetic-source,`integer`>>
+** <<ip-synthetic-source,`ip`>>
+** <<keyword-synthetic-source,`keyword`>>
+** <<numeric-synthetic-source,`long`>>
+** <<range-synthetic-source,`range` types>>
+** <<numeric-synthetic-source,`scaled_float`>>
+** <<numeric-synthetic-source,`short`>>
+** <<text-synthetic-source,`text`>>
+** <<version-synthetic-source,`version`>>
+** <<wildcard-synthetic-source,`wildcard`>>
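The "Representation of ranges" hunk above says that range values come back inclusive on both sides, with bounds adjusted accordingly. A rough sketch of that adjustment for integer ranges such as `long_range` follows. The function `to_inclusive` is a hypothetical illustration, not Elasticsearch code, and it deliberately does not model open-ended ranges or non-integer range types.

```python
def to_inclusive(bounds: dict) -> dict:
    """Rewrite an integer range so both bounds are inclusive.

    Hypothetical helper: `gt` becomes `gte` + 1 and `lt` becomes
    `lte` - 1. A side that is absent from the input stays absent
    here; how Elasticsearch renders open bounds is not modeled.
    """
    out = {}
    if "gte" in bounds:
        out["gte"] = bounds["gte"]
    elif "gt" in bounds:
        out["gte"] = bounds["gt"] + 1   # exclusive lower bound -> inclusive
    if "lte" in bounds:
        out["lte"] = bounds["lte"]
    elif "lt" in bounds:
        out["lte"] = bounds["lt"] - 1   # exclusive upper bound -> inclusive
    return out


print(to_inclusive({"gt": 200, "lt": 300}))  # {'gte': 201, 'lte': 299}
```

Under these assumptions, a document indexed with exclusive bounds would be returned with the equivalent inclusive bounds, so the synthetic `_source` describes the same set of values in a different notation.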