- [[synthetic-source]]
- ==== Synthetic `_source`
- IMPORTANT: Synthetic `_source` is Generally Available only for TSDB indices
- (indices that have `index.mode` set to `time_series`). For other indices
- synthetic `_source` is in technical preview. Features in technical preview may
- be changed or removed in a future release. Elastic will work to fix
- any issues, but features in technical preview are not subject to the support SLA
- of official GA features.
- Though very handy to have around, the source field takes up a significant amount
- of space on disk. Instead of storing source documents on disk exactly as you
- send them, Elasticsearch can reconstruct source content on the fly upon retrieval.
- Enable this by setting `mode: synthetic` in `_source`:
- [source,console,id=enable-synthetic-source-example]
- ----
- PUT idx
- {
- "mappings": {
- "_source": {
- "mode": "synthetic"
- }
- }
- }
- ----
- // TESTSETUP
- While this on-the-fly reconstruction is *generally* slower than saving the source
- documents verbatim and loading them at query time, it saves a lot of storage
- space.
- [[synthetic-source-restrictions]]
- ===== Synthetic `_source` restrictions
- There are a couple of restrictions to be aware of:
- * When you retrieve synthetic `_source` content it undergoes minor
- <<synthetic-source-modifications,modifications>> compared to the original JSON.
- * Synthetic `_source` can be used with indices that contain only these field
- types:
- ** <<aggregate-metric-double-synthetic-source, `aggregate_metric_double`>>
- ** {plugins}/mapper-annotated-text-usage.html#annotated-text-synthetic-source[`annotated-text`]
- ** <<binary-synthetic-source,`binary`>>
- ** <<boolean-synthetic-source,`boolean`>>
- ** <<numeric-synthetic-source,`byte`>>
- ** <<date-synthetic-source,`date`>>
- ** <<date-nanos-synthetic-source,`date_nanos`>>
- ** <<dense-vector-synthetic-source,`dense_vector`>>
- ** <<numeric-synthetic-source,`double`>>
- ** <<flattened-synthetic-source, `flattened`>>
- ** <<numeric-synthetic-source,`float`>>
- ** <<geo-point-synthetic-source,`geo_point`>>
- ** <<geo-shape-synthetic-source,`geo_shape`>>
- ** <<numeric-synthetic-source,`half_float`>>
- ** <<histogram-synthetic-source,`histogram`>>
- ** <<numeric-synthetic-source,`integer`>>
- ** <<ip-synthetic-source,`ip`>>
- ** <<keyword-synthetic-source,`keyword`>>
- ** <<numeric-synthetic-source,`long`>>
- ** <<range-synthetic-source,`range` types>>
- ** <<numeric-synthetic-source,`scaled_float`>>
- ** <<search-as-you-type-synthetic-source,`search_as_you_type`>>
- ** <<numeric-synthetic-source,`short`>>
- ** <<text-synthetic-source,`text`>>
- ** <<version-synthetic-source,`version`>>
- ** <<wildcard-synthetic-source,`wildcard`>>
- [[synthetic-source-modifications]]
- ===== Synthetic `_source` modifications
- When synthetic `_source` is enabled, retrieved documents undergo some
- modifications compared to the original JSON.
- [[synthetic-source-modifications-leaf-arrays]]
- ====== Arrays moved to leaf fields
- With synthetic `_source`, arrays of objects are moved down to the leaf fields, so each leaf holds an array of its values. For example:
- [source,console,id=synthetic-source-leaf-arrays-example]
- ----
- PUT idx/_doc/1
- {
- "foo": [
- {
- "bar": 1
- },
- {
- "bar": 2
- }
- ]
- }
- ----
- // TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]
- Will become:
- [source,console-result]
- ----
- {
- "foo": {
- "bar": [1, 2]
- }
- }
- ----
- // TEST[s/^/{"_source":/ s/\n$/}/]
- This can cause some arrays to vanish:
- [source,console,id=synthetic-source-leaf-arrays-example-sneaky]
- ----
- PUT idx/_doc/1
- {
- "foo": [
- {
- "bar": 1
- },
- {
- "baz": 2
- }
- ]
- }
- ----
- // TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]
- Will become:
- [source,console-result]
- ----
- {
- "foo": {
- "bar": 1,
- "baz": 2
- }
- }
- ----
- // TEST[s/^/{"_source":/ s/\n$/}/]
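The two examples above can be modeled with a small Python sketch. It is only an illustration of the observable result, not Elasticsearch's implementation: the helper names `collect_leaves` and `move_arrays_to_leaves` are hypothetical, and the sketch ignores any per-field-type behavior (such as value ordering) that the individual field type pages describe.

```python
def collect_leaves(value, path, out):
    """Gather every leaf value under its object path, looking through arrays."""
    if isinstance(value, dict):
        for key, child in value.items():
            collect_leaves(child, path + (key,), out)
    elif isinstance(value, list):
        # An array does not add a path segment; its items share the same path.
        for child in value:
            collect_leaves(child, path, out)
    else:
        out.setdefault(path, []).append(value)


def move_arrays_to_leaves(doc):
    """Rebuild a document so that arrays appear only on leaf fields (hypothetical helper)."""
    leaves = {}
    collect_leaves(doc, (), leaves)
    result = {}
    for path, values in leaves.items():
        node = result
        for key in path[:-1]:
            node = node.setdefault(key, {})
        # A leaf with a single value is emitted as a scalar, so the array "vanishes".
        node[path[-1]] = values[0] if len(values) == 1 else values
    return result


merged = move_arrays_to_leaves({"foo": [{"bar": 1}, {"bar": 2}]})
vanished = move_arrays_to_leaves({"foo": [{"bar": 1}, {"baz": 2}]})
```

Here `merged` matches the first result shown above and `vanished` matches the second, where no array survives because each leaf received only one value.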
- [[synthetic-source-modifications-field-names]]
- ====== Fields named as they are mapped
- Synthetic source names fields as they are named in the mapping. When used
- with <<dynamic,dynamic mapping>>, fields with dots (`.`) in their names are, by
- default, interpreted as multiple objects, while dots in field names are
- preserved within objects that have <<subobjects>> disabled. For example:
- [source,console,id=synthetic-source-objecty-example]
- ----
- PUT idx/_doc/1
- {
- "foo.bar.baz": 1
- }
- ----
- // TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]
- Will become:
- [source,console-result]
- ----
- {
- "foo": {
- "bar": {
- "baz": 1
- }
- }
- }
- ----
- // TEST[s/^/{"_source":/ s/\n$/}/]
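As a rough sketch of the default behavior shown above, the following Python function expands dotted names into nested objects. The name `expand_dotted_names` is hypothetical, and the sketch assumes that subobjects are enabled everywhere and that no two field names overlap (for example `foo.bar` alongside an object `foo`).

```python
def expand_dotted_names(doc):
    """Turn keys such as "foo.bar.baz" into nested objects (hypothetical helper)."""
    result = {}
    for name, value in doc.items():
        if isinstance(value, dict):
            value = expand_dotted_names(value)
        # Every segment except the last names an enclosing object.
        *parents, leaf = name.split(".")
        node = result
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return result


expanded = expand_dotted_names({"foo.bar.baz": 1})
```

With subobjects disabled on an object, the dotted name would be kept as-is inside that object, which this sketch does not model.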
- [[synthetic-source-modifications-alphabetical]]
- ====== Alphabetical sorting
- Synthetic `_source` fields are sorted alphabetically. The
- https://www.rfc-editor.org/rfc/rfc7159.html[JSON RFC] defines objects as
- "an unordered collection of zero or more name/value pairs", so applications
- shouldn't depend on field order. Without synthetic `_source`, however, the
- original ordering is preserved, and some applications may, counter to the
- spec, rely on that ordering.
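One way for a client to stay independent of field order is to compare parsed documents rather than raw JSON text. The Python sketch below uses `json.dumps(..., sort_keys=True)` only as a stand-in for the reordering; it is an assumption for illustration and does not reproduce Elasticsearch's exact ordering rules.

```python
import json

# The document as a client might have sent it.
original_text = '{"zebra": 1, "apple": {"pear": 2, "fig": 3}}'

# Stand-in for what a reordered `_source` could look like.
reordered_text = json.dumps(json.loads(original_text), sort_keys=True)

# Comparing raw strings is order-sensitive and therefore fragile.
texts_match = original_text == reordered_text

# Comparing parsed objects ignores key order, as the JSON RFC intends.
objects_match = json.loads(original_text) == json.loads(reordered_text)
```

In this sketch `texts_match` is false while `objects_match` is true, which is why order-sensitive string comparisons are best avoided.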
- [[synthetic-source-modifications-ranges]]
- ====== Representation of ranges
- Range field values (e.g. `long_range`) are always represented as inclusive on both sides with bounds adjusted accordingly. See <<range-synthetic-source-inclusive, examples>>.
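For integer-valued ranges, the adjustment can be pictured with the following Python sketch. The function `to_inclusive_integer_range` is hypothetical, and it assumes both bounds are present and are whole numbers; unbounded sides and non-integer range types are not modeled here.

```python
def to_inclusive_integer_range(bounds):
    """Rewrite gt/lt bounds as the equivalent gte/lte bounds (hypothetical helper)."""
    # An exclusive lower bound of n admits the same integers as an inclusive n + 1.
    lower = bounds["gte"] if "gte" in bounds else bounds["gt"] + 1
    # An exclusive upper bound of n admits the same integers as an inclusive n - 1.
    upper = bounds["lte"] if "lte" in bounds else bounds["lt"] - 1
    return {"gte": lower, "lte": upper}


adjusted = to_inclusive_integer_range({"gt": 200, "lt": 300})
```

Under these assumptions, `adjusted` covers exactly the same integers as the original range, expressed with inclusive bounds only.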