@@ -1,13 +1,133 @@
[[mapping-source-field]]
=== `_source`
-The `_source` field is an automatically generated field that stores the
-actual JSON that was used as the indexed document. It is not indexed
-(searchable), just stored. When executing "fetch" requests, like
-<<docs-get,get>> or
-<<search-search,search>>, the `_source` field is
-returned by default.
-
-Many APIs may use the `_source` field. For example, the
-<<docs-update,Update API>>. To minimize the storage cost of
-`_source`, set `index.codec: best_compression` in index settings.
+The `_source` field is an automatically generated field that stores the actual
+JSON that was used as the indexed document. It is not indexed (searchable),
+just stored. When executing "fetch" requests, like <<docs-get,get>> or
+<<search-search,search>>, the `_source` field is returned by default.
+
+==== Disabling source
+
+Though very handy to have around, the source field does incur storage overhead
+within the index. For this reason, it can be disabled as follows:
+
+[source,js]
+--------------------------------------------------
+PUT tweets
+{
+  "mappings": {
+    "tweet": {
+      "_source": {
+        "enabled": false
+      }
+    }
+  }
+}
+--------------------------------------------------
+// AUTOSENSE
+
+[WARNING]
+.Think before disabling the source field
+==================================================
+
+Users often disable the `_source` field without thinking about the
+consequences, and then live to regret it. If the `_source` field isn't
+available, then a number of features are not supported:
+
+* The <<docs-update,`update` API>>.
+
+* On-the-fly <<search-request-highlighting,highlighting>>.
+
+* The ability to reindex from one Elasticsearch index to another, either
+  to change mappings or analysis, or to upgrade an index to a new major
+  version.
+
+* The ability to debug queries or aggregations by viewing the original
+  document used at index time.
+
+* Potentially in the future, the ability to repair index corruption
+  automatically.
+
+If disk space is a concern, increase the
+<<index-codec,compression level>> rather than disabling the `_source` field.
+==================================================
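To make the first bullet concrete: an update is essentially "fetch the stored document, merge the change, index the result", so it has nothing to work with when the original JSON was never stored. The toy class below is only a sketch of that dependency. `TinyIndex`, `store_source`, and the sample fields are hypothetical names made up for illustration; none of this is Elasticsearch code.

```python
import copy


class TinyIndex:
    """Toy in-memory stand-in for an index (hypothetical, not Elasticsearch)."""

    def __init__(self, store_source=True):
        self.store_source = store_source
        self._sources = {}  # doc id -> original JSON, playing the role of _source

    def index(self, doc_id, doc):
        # A real index would also build its searchable structures here.
        if self.store_source:
            self._sources[doc_id] = copy.deepcopy(doc)

    def update(self, doc_id, partial_doc):
        # An update is get + merge + reindex, so it needs the original JSON.
        if doc_id not in self._sources:
            raise LookupError("cannot update %r: no stored source" % doc_id)
        merged = copy.deepcopy(self._sources[doc_id])
        merged.update(partial_doc)  # shallow merge, kept simple on purpose
        self.index(doc_id, merged)
        return merged


with_source = TinyIndex(store_source=True)
with_source.index("1", {"user": "alice", "likes": 1})
print(with_source.update("1", {"likes": 2}))

without_source = TinyIndex(store_source=False)
without_source.index("1", {"user": "alice", "likes": 1})
try:
    without_source.update("1", {"likes": 2})
except LookupError as err:
    print(err)
```

The same reasoning applies to reindexing: copying documents into a new index requires the original JSON to read from.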
+
+.The metrics use case
+**************************************************
+
+The _metrics_ use case is distinct from other time-based or logging use cases
+in that there are many small documents which consist only of numbers, dates,
+or keywords. There are no updates, no highlighting requests, and the data
+ages quickly so there is no need to reindex. Search requests typically use
+simple queries to filter the dataset by date or tags, and the results are
+returned as aggregations.
+
+In this case, disabling the `_source` field will save space and reduce I/O.
+It is also advisable to disable the <<mapping-all-field,`_all` field>> in the
+metrics case.
+
+**************************************************
+
+
+[[include-exclude]]
+==== Including / Excluding fields from source
+
+An expert-only feature is the ability to prune the contents of the `_source`
+field after the document has been indexed, but before the `_source` field is
+stored. The `includes`/`excludes` parameters (which also accept wildcards)
+can be used as follows:
+
+[source,js]
+--------------------------------------------------
+PUT logs
+{
+  "mappings": {
+    "event": {
+      "_source": {
+        "includes": [
+          "*.count",
+          "meta.*"
+        ],
+        "excludes": [
+          "meta.description",
+          "meta.other.*"
+        ]
+      }
+    }
+  }
+}
+
+PUT logs/event/1
+{
+  "requests": {
+    "count": 10,
+    "foo": "bar" <1>
+  },
+  "meta": {
+    "name": "Some metric",
+    "description": "Some metric description", <1>
+    "other": {
+      "foo": "one", <1>
+      "baz": "two" <1>
+    }
+  }
+}
+
+GET logs/event/_search
+{
+  "query": {
+    "match": {
+      "meta.other.foo": "one" <2>
+    }
+  }
+}
+--------------------------------------------------
+// AUTOSENSE
+
+<1> These fields will be removed from the stored `_source` field.
+<2> We can still search on this field, even though it is not in the stored `_source`.
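
To see why exactly those four fields are dropped, the pruning can be approximated outside Elasticsearch. The sketch below assumes that patterns are matched against dotted leaf paths and that `*` may span dots; it uses Python's `fnmatch` as a stand-in matcher and ignores edge cases such as field names that themselves contain dots. `flatten` and `filter_source` are hypothetical helper names, not part of any Elasticsearch API.

```python
from fnmatch import fnmatchcase


def flatten(doc, prefix=""):
    """Yield (dotted_path, value) for every leaf of a nested dict."""
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def filter_source(doc, includes, excludes):
    """Keep leaves that match an include pattern and no exclude pattern."""
    kept = {}
    for path, value in flatten(doc):
        included = not includes or any(fnmatchcase(path, p) for p in includes)
        excluded = any(fnmatchcase(path, p) for p in excludes)
        if included and not excluded:
            # Rebuild the nested structure for the surviving leaf.
            *parents, leaf = path.split(".")
            node = kept
            for part in parents:
                node = node.setdefault(part, {})
            node[leaf] = value
    return kept


doc = {
    "requests": {"count": 10, "foo": "bar"},
    "meta": {
        "name": "Some metric",
        "description": "Some metric description",
        "other": {"foo": "one", "baz": "two"},
    },
}

stored = filter_source(
    doc,
    includes=["*.count", "meta.*"],
    excludes=["meta.description", "meta.other.*"],
)
print(stored)
# {'requests': {'count': 10}, 'meta': {'name': 'Some metric'}}
```

Under these assumptions, `requests.foo` is dropped because no include pattern matches it, while `meta.description` and everything under `meta.other` match `meta.*` but are then removed by the excludes.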
+
+WARNING: Removing fields from the `_source` has similar downsides to disabling
+`_source`, especially the fact that you cannot reindex documents from one
+Elasticsearch index to another. Consider using
+<<search-request-source-filtering,source filtering>> or a
+<<mapping-transform,transform script>> instead.