|
@@ -0,0 +1,120 @@
|
|
|
+[[synthetic-source]]
|
|
|
+==== Synthetic `_source`
|
|
|
+
|
|
|
+Though very handy to have around, the `_source` field takes up a significant amount
|
|
|
+of space on disk. Instead of storing source documents on disk exactly as you
|
|
|
+send them, Elasticsearch can reconstruct source content on the fly upon retrieval.
|
|
|
+Enable this by setting `synthetic: true` in `_source`:
|
|
|
+
|
|
|
+[source,console,id=enable-synthetic-source-example]
|
|
|
+----
|
|
|
+PUT idx
|
|
|
+{
|
|
|
+ "mappings": {
|
|
|
+ "_source": {
|
|
|
+ "synthetic": true
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+----
|
|
|
+// TESTSETUP
|
|
|
+
|
|
|
+While this on-the-fly reconstruction is *generally* slower than saving the source
|
|
|
+documents verbatim and loading them at query time, it saves a lot of storage
|
|
|
+space. There are a couple of restrictions to be aware of:
|
|
|
+
|
|
|
+* When you retrieve synthetic `_source` content, it undergoes minor
|
|
|
+<<synthetic-source-modifications,modifications>> compared to the original JSON.
|
|
|
+* Synthetic `_source` can be used with indices that contain only these field
|
|
|
+types:
|
|
|
+
|
|
|
+** <<boolean-synthetic-source,`boolean`>>
|
|
|
+** <<numeric-synthetic-source,`byte`>>
|
|
|
+** <<numeric-synthetic-source,`double`>>
|
|
|
+** <<numeric-synthetic-source,`float`>>
|
|
|
+** <<geo-point-synthetic-source,`geo_point`>>
|
|
|
+** <<numeric-synthetic-source,`half_float`>>
|
|
|
+** <<numeric-synthetic-source,`integer`>>
|
|
|
+** <<ip-synthetic-source,`ip`>>
|
|
|
+** <<keyword-synthetic-source,`keyword`>>
|
|
|
+** <<numeric-synthetic-source,`long`>>
|
|
|
+** <<numeric-synthetic-source,`scaled_float`>>
|
|
|
+** <<numeric-synthetic-source,`short`>>
|
|
|
+** <<text-synthetic-source,`text`>> (with a `keyword` sub-field)
|
|
|
+
|
|
|
+[[synthetic-source-modifications]]
|
|
|
+===== Synthetic source modifications
|
|
|
+
|
|
|
+When synthetic `_source` is enabled, retrieved documents undergo some
|
|
|
+modifications compared to the original JSON.
|
|
|
+
|
|
|
+[[synthetic-source-modifications-leaf-arrays]]
|
|
|
+====== Arrays moved to leaf fields
|
|
|
+Synthetic `_source` moves arrays to leaf fields. For example:
|
|
|
+
|
|
|
+[source,console,id=synthetic-source-leaf-arrays-example]
|
|
|
+----
|
|
|
+PUT idx/_doc/1
|
|
|
+{
|
|
|
+ "foo": [
|
|
|
+ {
|
|
|
+ "bar": 1
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "bar": 2
|
|
|
+ }
|
|
|
+ ]
|
|
|
+}
|
|
|
+----
|
|
|
+// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]
|
|
|
+
|
|
|
+On retrieval, this becomes:
|
|
|
+
|
|
|
+[source,console-result]
|
|
|
+----
|
|
|
+{
|
|
|
+ "foo": {
|
|
|
+ "bar": [1, 2]
|
|
|
+ }
|
|
|
+}
|
|
|
+----
|
|
|
+// TEST[s/^/{"_source":/ s/\n$/}/]
|
|
|
+
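The effect of moving arrays to leaf fields can be modeled with a short Python sketch. It only illustrates the observable result and is not how Elasticsearch implements synthetic `_source`. The helper names `collect_leaves`, `rebuild`, and `move_arrays_to_leaves` are hypothetical, and the sketch assumes that no field holds both a scalar and an object.

```python
def collect_leaves(node, path=(), leaves=None):
    """Gather every scalar value under the path of field names that leads to it."""
    if leaves is None:
        leaves = {}
    if isinstance(node, dict):
        for key, value in node.items():
            collect_leaves(value, path + (key,), leaves)
    elif isinstance(node, list):
        # Array elements share the path of the array itself.
        for item in node:
            collect_leaves(item, path, leaves)
    else:
        leaves.setdefault(path, []).append(node)
    return leaves


def rebuild(leaves):
    """Rebuild nested objects so that arrays appear only at leaf fields."""
    result = {}
    for path, values in leaves.items():
        target = result
        for key in path[:-1]:
            target = target.setdefault(key, {})
        # Simplification of this sketch: a lone value is emitted as a scalar.
        target[path[-1]] = values[0] if len(values) == 1 else values
    return result


def move_arrays_to_leaves(doc):
    return rebuild(collect_leaves(doc))


original = {"foo": [{"bar": 1}, {"bar": 2}]}
moved = move_arrays_to_leaves(original)
```

With the document from the example, `moved` holds `{"foo": {"bar": [1, 2]}}`, which matches the retrieved `_source` shown above.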
|
|
|
+[[synthetic-source-modifications-field-names]]
|
|
|
+====== Fields named as they are mapped
|
|
|
+Synthetic source names fields as they are named in the mapping. When used
|
|
|
+with <<dynamic,dynamic mapping>>, fields with dots (`.`) in their names are, by
|
|
|
+default, interpreted as multiple objects, while dots in field names are
|
|
|
+preserved within objects that have <<subobjects>> disabled. For example:
|
|
|
+
|
|
|
+[source,console,id=synthetic-source-objecty-example]
|
|
|
+----
|
|
|
+PUT idx/_doc/1
|
|
|
+{
|
|
|
+ "foo.bar.baz": 1
|
|
|
+}
|
|
|
+----
|
|
|
+// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]
|
|
|
+
|
|
|
+On retrieval, this becomes:
|
|
|
+
|
|
|
+[source,console-result]
|
|
|
+----
|
|
|
+{
|
|
|
+ "foo": {
|
|
|
+ "bar": {
|
|
|
+ "baz": 1
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+----
|
|
|
+// TEST[s/^/{"_source":/ s/\n$/}/]
|
|
|
+
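The default behavior, in which each dot introduces a nested object, can be sketched in Python as follows. This is an illustration under stated assumptions rather than Elasticsearch's implementation: the function name `expand_dotted_names` is hypothetical, the sketch assumes that expanded names do not collide with existing scalar fields, and it does not model objects that have `subobjects` disabled.

```python
def expand_dotted_names(doc):
    """Turn field names such as "foo.bar.baz" into nested objects."""
    result = {}
    for name, value in doc.items():
        if isinstance(value, dict):
            value = expand_dotted_names(value)
        # Every part except the last names an enclosing object.
        *parents, leaf = name.split(".")
        target = result
        for parent in parents:
            target = target.setdefault(parent, {})
        target[leaf] = value
    return result


expanded = expand_dotted_names({"foo.bar.baz": 1})
```

Here `expanded` holds `{"foo": {"bar": {"baz": 1}}}`, the same shape as the retrieved `_source` in the example.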
|
|
|
+[[synthetic-source-modifications-alphabetical]]
|
|
|
+====== Alphabetical sorting
|
|
|
+Synthetic `_source` fields are sorted alphabetically. The
|
|
|
+https://www.rfc-editor.org/rfc/rfc7159.html[JSON RFC] defines objects as
|
|
|
+"an unordered collection of zero or more name/value pairs", so applications
|
|
|
+shouldn't care. Without synthetic `_source`, however, the original ordering is
|
|
|
+preserved, and some applications may, counter to the spec, rely on
|
|
|
+that ordering.
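
An application that compares parsed objects is unaffected by the reordering, while one that compares raw JSON text is not. The following Python sketch shows the difference. The helper `sort_fields` is hypothetical and uses Python's default string ordering, which is an assumption here and may not match the exact ordering Elasticsearch applies.

```python
import json


def sort_fields(node):
    """Return a copy of node with the keys of every object sorted by name."""
    if isinstance(node, dict):
        return {key: sort_fields(node[key]) for key in sorted(node)}
    if isinstance(node, list):
        return [sort_fields(item) for item in node]
    return node


original_text = '{"b": 1, "a": {"d": 2, "c": 3}}'
original = json.loads(original_text)

reordered = sort_fields(original)
reordered_text = json.dumps(reordered)

# Parsed objects compare equal regardless of key order.
same_content = original == reordered
# The serialized text differs because the key order changed.
same_text = original_text == reordered_text
```

In this sketch `same_content` is `True` and `same_text` is `False`, so code that hashes or diffs the raw `_source` string is the kind that can notice the change.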
|