瀏覽代碼

Docs: Updated the source field docs to remove deprecation of includes/excludes

Also provide warnings about why disabling source is probably something
you don't want to do

Closes #12141
Clinton Gormley 10 年之前
父節點
當前提交
6c0badd0b3
共有 2 個文件被更改,包括 131 次插入11 次删除
  1. 1 1
      docs/reference/index-modules.asciidoc
  2. 130 10
      docs/reference/mapping/fields/source-field.asciidoc

+ 1 - 1
docs/reference/index-modules.asciidoc

@@ -92,7 +92,7 @@ specific index module:
     index visible to search.  Defaults to `1s`.  Can be set to `-1` to disable
     refresh.
 
-`index.codec`::
+[[index-codec]] `index.codec`::
 
     experimental[] The `default` value compresses stored data with LZ4
     compression, but this can be set to `best_compression` for a higher

+ 130 - 10
docs/reference/mapping/fields/source-field.asciidoc

@@ -1,13 +1,133 @@
 [[mapping-source-field]]
 === `_source`
 
-The `_source` field is an automatically generated field that stores the
-actual JSON that was used as the indexed document. It is not indexed
-(searchable), just stored. When executing "fetch" requests, like
-<<docs-get,get>> or
-<<search-search,search>>, the `_source` field is
-returned by default.
-
-Many APIs may use the `_source` field. For example, the
-<<docs-update,Update API>>. To minimize the storage cost of
-`_source`, set `index.codec: best_compression` in index settings.
+The `_source` field is an automatically generated field that stores the actual
+JSON that was used as the indexed document. It is not indexed (searchable),
+just stored. When executing "fetch" requests, like <<docs-get,get>> or
+<<search-search,search>>, the `_source` field is returned by default.
+
+==== Disabling source
+
+Though very handy to have around, the source field does incur storage overhead
+within the index. For this reason, it can be disabled as follows:
+
+[source,js]
+--------------------------------------------------
+PUT tweets
+{
+  "mappings": {},
+  "tweet": {
+    "_source": {
+      "enabled": false
+    }
+  }
+}
+--------------------------------------------------
+// AUTOSENSE
+
+[WARNING]
+.Think before disabling the source field
+==================================================
+
+Users often disable the `_source` field without thinking about the
+consequences, and then live to regret it.  If the `_source` field isn't
+available then a number of features are not supported:
+
+* The <<docs-update,`update` API>>.
+
+* On the fly <<search-request-highlighting,highlighting>>.
+
+* The ability to reindex from one Elasticsearch index to another, either
+  to change mappings or analysis, or to upgrade an index to a new major
+  version.
+
+* The ability to debug queries or aggregations by viewing the original
+  document used at index time.
+
+* Potentially in the future, the ability to repair index corruption
+  automatically.
+
+If disk space is a concern, rather increase the
+<<index-codec,compression level>> instead of disabling the `_source`.
+==================================================
+
+.The metrics use case
+**************************************************
+
+The _metrics_ use case is distinct from other time-based or logging use cases
+in that there are many small documents which consist only of numbers, dates,
+or keywords.  There are no updates, no highlighting requests, and the data
+ages quickly so there is no need to reindex.  Search requests typically use
+simple queries to filter the dataset by date or tags, and the results are
+returned as aggregations.
+
+In this case, disabling the `_source` field will save space and reduce I/O.
+It is also advisable to disable the <<mapping-all-field,`_all` field>> in the
+metrics case.
+
+**************************************************
+
+
+[[include-exclude]]
+==== Including / Excluding fields from source
+
+An expert-only feature is the ability to prune the contents of the `_source`
+field after the document has been indexed, but before the `_source` field is
+stored.  The `includes`/`excludes` parameters (which also accept wildcards)
+can be used as follows:
+
+[source,js]
+--------------------------------------------------
+PUT logs
+{
+  "mappings": {
+    "event": {
+      "_source": {
+        "includes": [
+          "*.count",
+          "meta.*"
+        ],
+        "excludes": [
+          "meta.description",
+          "meta.other.*"
+        ]
+      }
+    }
+  }
+}
+
+PUT logs/event/1
+{
+  "requests": {
+    "count": 10,
+    "foo": "bar" <1>
+  },
+  "meta": {
+    "name": "Some metric",
+    "description": "Some metric description", <1>
+    "other": {
+      "foo": "one", <1>
+      "baz": "two" <1>
+    }
+  }
+}
+
+GET logs/event/_search
+{
+  "query": {
+    "match": {
+      "meta.other.foo": "one" <2>
+    }
+  }
+}
+--------------------------------------------------
+// AUTOSENSE
+
+<1> These fields will be removed from the stored `_source` field.
+<2> We can still search on this field, even though it is not in the stored `_source`.
+
+WARNING: Removing fields from the `_source` has similar downsides to disabling
+`_source`, especially the fact that you cannot reindex documents from one
+Elasticsearch index to another. Consider using
+<<search-request-source-filtering,source filtering>> or a
+<<mapping-transform,transform script>> instead.