
[DOCS] Focus retrieving selected fields on fields parameter (#71506)

* [DOCS] Focus retrieving selected fields on fields parameter

* Incorporating changes from reviews

* Adding clarifications from review feedback

* Slight wording revisions.

* Clarify language around format parameter and move text out of callout.
Adam Locke 4 years ago
parent
commit
6dfd92c46f

+ 10 - 3
docs/reference/eql/eql.asciidoc

@@ -441,15 +441,22 @@ GET /my-data-stream/_eql/search?filter_path=-hits.events._source
     "event.type",
     "process.*",                <1>
     {
-      "field": "@timestamp",    <2>
-      "format": "epoch_millis"
+      "field": "@timestamp",
+      "format": "epoch_millis"  <2>
     }
   ]
 }
 ----
 // TEST[setup:sec_logs]
 
-include::{es-repo-dir}/search/search-your-data/retrieve-selected-fields.asciidoc[tag=fields-param-callouts]
+<1> Both full field names and wildcard patterns are accepted.
+<2> Use the `format` parameter to apply a custom format for the field's values.
+    <<date,`date`>> and <<date_nanos, `date_nanos`>> fields accept a
+    <<mapping-date-format,date format>>. <<spatial_datatypes, Spatial fields>>
+    accept either `geojson` for http://www.geojson.org[GeoJSON] (the default)
+    or `wkt` for
+    {wikipedia}/Well-known_text_representation_of_geometry[Well Known Text].
+    Other field types do not support the `format` parameter.
 
 The values are returned as a flat list in the `fields` section of each hit:
 

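The callouts above describe two ways to write an entry in the `fields` array (a plain name or pattern, or object notation carrying a `format`) and note that values come back as a flat list in the `fields` section of each hit. The following is a minimal Python sketch of both sides, not code from this commit. The helper names `field_spec` and `first_values` and the contents of `sample_hit` are hypothetical, chosen only for illustration.

```python
import json


def field_spec(name, fmt=None):
    """Build one `fields` entry: a plain string, or object notation if a format is given."""
    if fmt is None:
        return name
    return {"field": name, "format": fmt}


# Only the `fields` part of the request body shown in the diff above.
body = {
    "fields": [
        field_spec("event.type"),
        field_spec("process.*"),                   # wildcard pattern
        field_spec("@timestamp", "epoch_millis"),  # object notation with a format
    ]
}


def first_values(hit):
    """Map each field in a hit's `fields` section to the first item of its value list."""
    return {name: values[0] for name, values in hit.get("fields", {}).items() if values}


# Hypothetical hit, invented for illustration (not real output).
sample_hit = {
    "fields": {
        "event.type": ["start"],
        "@timestamp": ["4098435132000"],
    }
}

print(json.dumps(body, indent=2))
print(first_values(sample_hit))
```

Taking only the first item is a simplification that suits single-valued fields. A field holding several values would need the whole list.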
+ 203 - 209
docs/reference/search/search-your-data/retrieve-selected-fields.asciidoc

@@ -6,72 +6,57 @@
 
 By default, each hit in the search response includes the document
 <<mapping-source-field,`_source`>>, which is the entire JSON object that was
-provided when indexing the document. To retrieve specific fields in the search
-response, you can use the `fields` parameter:
+provided when indexing the document. There are two recommended methods to
+retrieve selected fields from a search query:
 
-[source,console]
-----
-POST my-index-000001/_search
-{
-  "query": {
-    "match": {
-      "message": "foo"
-    }
-  },
-  "fields": ["user.id", "@timestamp"],
-  "_source": false
-}
-----
-// TEST[setup:my_index]
+* Use the <<search-fields-param,`fields` option>> to extract the values of
+fields present in the index mapping
+* Use the <<source-filtering,`_source` option>> if you need to access the original data that was passed at index time
 
-The `fields` parameter consults both a document's `_source` and the index
-mappings to load and return values. Because it makes use of the mappings,
-`fields` has some advantages over referencing the `_source` directly: it
-accepts <<multi-fields, multi-fields>> and <<alias, field aliases>>, and
-also formats field values like dates in a consistent way.
-
-A document's `_source` is stored as a single field in Lucene. So the whole
-`_source` object must be loaded and parsed even if only a small number of
-fields are requested. To avoid this limitation, you can try another option for
-loading fields:
-
-* Use the <<docvalue-fields, `docvalue_fields`>>
-parameter to get values for selected fields. This can be a good
-choice when returning a fairly small number of fields that support doc values,
-such as keywords and dates.
-* Use the <<request-body-search-stored-fields, `stored_fields`>> parameter to
-get the values for specific stored fields (fields that use the
-<<mapping-store,`store`>> mapping option).
-
-You can also use the <<script-fields,`script_field`>> parameter to transform
-field values in the response using a script.
-
-You can find more detailed information on each of these methods in the
-following sections:
-
-* <<search-fields-param>>
-* <<docvalue-fields>>
-* <<stored-fields>>
-* <<source-filtering>>
-* <<script-fields>>
+You can use both of these methods, though the `fields` option is preferred
+because it consults both the document data and index mappings. In some
+instances, you might want to use <<field-retrieval-methods,other methods>> of
+retrieving data.
 
 [discrete]
 [[search-fields-param]]
-=== Fields
-
+=== The `fields` option
 // tag::fields-param-desc[]
-The `fields` parameter allows for retrieving a list of document fields in
-the search response. It consults both the document `_source` and the index
-mappings to return each value in a standardized way that matches its mapping
-type. By default, date fields are formatted according to the
-<<mapping-date-format,date format>> parameter in their mappings. You can also
-use the `fields` parameter to retrieve <<runtime-retrieving-fields,runtime field
-values>>.
+To retrieve specific fields in the search response, use the `fields` parameter.
+Because it consults the index mappings, the `fields` parameter provides several
+advantages over referencing the `_source` directly. Specifically, the `fields`
+parameter:
+
+* Returns each value in a standardized way that matches its mapping type
+* Accepts <<multi-fields,multi-fields>> and <<alias,field aliases>>
+* Formats dates and spatial data types
+* Retrieves <<runtime-retrieving-fields,runtime field values>>
+* Returns fields calculated by a script at index time
 // end::fields-param-desc[]
 
+Other mapping options are also respected, including
+<<ignore-above,`ignore_above`>>, <<ignore-malformed,`ignore_malformed`>>, and
+<<null-value,`null_value`>>.
+
+The `fields` option returns values in the way that matches how {es} indexes
+them. For standard fields, this means that the `fields` option looks in
+`_source` to find the values, then parses and formats them using the mappings.
+
+[discrete]
+[[search-fields-request]]
+==== Search for specific fields
 The following search request uses the `fields` parameter to retrieve values
 for the `user.id` field, all fields starting with `http.response.`, and the
-`@timestamp` field:
+`@timestamp` field.
+
+Using object notation, you can pass a `format` parameter for certain fields to
+apply a custom format for the field's values:
+
+* <<date,`date`>> and <<date_nanos,`date_nanos`>> fields accept a <<mapping-date-format,date format>>
+* <<spatial_datatypes, Spatial fields>> accept either `geojson` for http://www.geojson.org[GeoJSON] (the default) or `wkt` for
+{wikipedia}/Well-known_text_representation_of_geometry[Well Known Text]
+
+Other field types do not support the `format` parameter.
 
 [source,console]
 ----
@@ -84,7 +69,7 @@ POST my-index-000001/_search
   },
   "fields": [
     "user.id",
-    "http.response.*", <1>
+    "http.response.*",         <1>
     {
       "field": "@timestamp",
       "format": "epoch_millis" <2>
@@ -94,32 +79,28 @@ POST my-index-000001/_search
 }
 ----
 // TEST[setup:my_index]
+// TEST[s/_search/_search\?filter_path=hits/]
 
-// tag::fields-param-callouts[]
 <1> Both full field names and wildcard patterns are accepted.
-<2> Using object notation, you can pass a `format` parameter to apply a custom
-    format for the field's values.
-    <<date,`date`>> and <<date_nanos, `date_nanos`>> fields accept a
-    <<mapping-date-format,date format>>. <<spatial_datatypes, Spatial fields>>
-    accept either `geojson` for http://www.geojson.org[GeoJSON] (the default)
-    or `wkt` for
-    {wikipedia}/Well-known_text_representation_of_geometry[Well Known Text].
-    Other field types do not support the `format` parameter.
-// end::fields-param-callouts[]
+<2> Use the `format` parameter to apply a custom format for the field's values.
+
+[discrete]
+[[search-fields-response]]
+==== Response always returns an array
+The `fields` response always returns an array of values for each field,
+even when there is a single value in the `_source`. This is because {es} has
+no dedicated array type, and any field could contain multiple values. The
+`fields` parameter also does not guarantee that array values are returned in
+a specific order. See the mapping documentation on <<array,arrays>> for more
+background.
 
-The values are returned as a flat list in the `fields` section in each hit:
+The response includes values as a flat list in the `fields` section for each
+hit. Because the `fields` parameter doesn't fetch entire objects, only leaf
+fields are returned.
 
 [source,console-result]
 ----
 {
-  "took" : 2,
-  "timed_out" : false,
-  "_shards" : {
-    "total" : 1,
-    "successful" : 1,
-    "skipped" : 0,
-    "failed" : 0
-  },
   "hits" : {
     "total" : {
       "value" : 1,
@@ -150,29 +131,12 @@ The values are returned as a flat list in the `fields` section in each hit:
   }
 }
 ----
-// TESTRESPONSE[s/"took" : 2/"took": $body.took/]
 // TESTRESPONSE[s/"max_score" : 1.0/"max_score" : $body.hits.max_score/]
 // TESTRESPONSE[s/"_score" : 1.0/"_score" : $body.hits.hits.0._score/]
 
-Only leaf fields are returned -- `fields` does not allow for fetching entire
-objects.
-
-The `fields` parameter handles field types like <<alias, field aliases>> and
-<<constant-keyword-field-type, `constant_keyword`>> whose values aren't always present in
-the `_source`. Other mapping options are also respected, including
-<<ignore-above, `ignore_above`>>, <<ignore-malformed, `ignore_malformed`>> and
-<<null-value, `null_value`>>.
-
-NOTE: The `fields` response always returns an array of values for each field,
-even when there is a single value in the `_source`. This is because {es} has
-no dedicated array type, and any field could contain multiple values. The
-`fields` parameter also does not guarantee that array values are returned in
-a specific order. See the mapping documentation on <<array, arrays>> for more
-background.
-
 [discrete]
 [[search-fields-nested]]
-==== Handling of nested fields
+==== Retrieve nested fields
 
 The `fields` response for <<nested,`nested` fields>> is slightly different from that
 of regular object fields. While leaf values inside regular `object` fields are
@@ -225,7 +189,7 @@ POST my-index-000001/_search
 }
 --------------------------------------------------
 
-the response will group `first` and `last` name instead of
+The response will group `first` and `last` name instead of
 returning them as a flat list.
 
 [source,console-result]
@@ -269,8 +233,9 @@ returning them as a flat list.
 // TESTRESPONSE[s/"max_score" : 1.0/"max_score" : $body.hits.max_score/]
 // TESTRESPONSE[s/"_score" : 1.0/"_score" : $body.hits.hits.0._score/]
 
-Nested fields will be grouped by their nested paths, no matter the pattern used to retrieve them.
-For example, querying only for the `user.first` field in the example above:
+Nested fields will be grouped by their nested paths, no matter the pattern used
+to retrieve them. For example, if you query only for the `user.first` field from
+the previous example:
 
 [source,console]
 --------------------------------------------------
@@ -282,7 +247,8 @@ POST my-index-000001/_search
 --------------------------------------------------
 // TEST[continued]
 
-will return only the users first name but still maintain the structure of the nested `user` array:
+The response returns only the user's first name, but still maintains the
+structure of the nested `user` array:
 
 [source,console-result]
 ----
@@ -323,19 +289,19 @@ will return only the users first name but still maintain the structure of the ne
 // TESTRESPONSE[s/"_score" : 1.0/"_score" : $body.hits.hits.0._score/]
 
 However, when the `fields` pattern targets the nested `user` field directly, no
-values will be returned since the pattern doesn't match any leaf fields.
+values will be returned because the pattern doesn't match any leaf fields.
 
 [discrete]
 [[retrieve-unmapped-fields]]
-==== Retrieving unmapped fields
-
-By default, the `fields` parameter returns only values of mapped fields. However,
-Elasticsearch allows storing fields in `_source` that are unmapped, for example by
-setting <<dynamic-field-mapping,Dynamic field mapping>> to `false` or by using an
-object field with `enabled: false`, thereby disabling parsing and indexing of its content.
+==== Retrieve unmapped fields
+By default, the `fields` parameter returns only values of mapped fields.
+However, {es} allows storing fields in `_source` that are unmapped, such as by
+setting <<dynamic-field-mapping,dynamic field mapping>> to `false` or by using
+an object field with `enabled: false`. These options disable parsing and
+indexing of the object content.
 
-Fields in such an object can be retrieved from `_source` using the `include_unmapped` option
-in the `fields` section:
+To retrieve unmapped fields in an object from `_source`, use the
+`include_unmapped` option in the `fields` section:
 
 [source,console]
 ----
@@ -372,9 +338,10 @@ POST my-index-000001/_search
 <1> Disable all mappings.
 <2> Include unmapped fields matching this field pattern.
 
-The response will contain fields results under the  `session_data.object.*` path even if the
-fields are unmapped, but will not contain `user_id` since it is unmapped but the `include_unmapped`
-flag hasn't been set to `true` for that field pattern.
+The response will contain field results under the `session_data.object.*` path,
+even if the fields are unmapped. The `user_id` field is also unmapped, but it
+won't be included in the response because `include_unmapped` isn't set to
+`true` for that field pattern.
 
 [source,console-result]
 ----
@@ -412,9 +379,122 @@ flag hasn't been set to `true` for that field pattern.
 // TESTRESPONSE[s/"max_score" : 1.0/"max_score" : $body.hits.max_score/]
 // TESTRESPONSE[s/"_score" : 1.0/"_score" : $body.hits.hits.0._score/]
 
+[discrete]
+[[source-filtering]]
+=== The `_source` option
+You can use the `_source` parameter to select what fields of the source are
+returned. This is called _source filtering_.
+
+The following search API request sets the `_source` request body parameter to
+`false`. The document source is not included in the response.
+
+[source,console]
+----
+GET /_search
+{
+  "_source": false,
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+----
+
+To return only a subset of source fields, specify a wildcard (`*`) pattern in
+the `_source` parameter. The following search API request returns the source for
+only the `obj` field and its properties.
+
+[source,console]
+----
+GET /_search
+{
+  "_source": "obj.*",
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+----
+
+You can also specify an array of wildcard patterns in the `_source` field. The
+following search API request returns the source for only the `obj1` and
+`obj2` fields and their properties.
+
+[source,console]
+----
+GET /_search
+{
+  "_source": [ "obj1.*", "obj2.*" ],
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+----
+
+For finer control, you can specify an object containing arrays of `includes` and
+`excludes` patterns in the `_source` parameter.
+
+If the `includes` property is specified, only source fields that match one of
+its patterns are returned. You can exclude fields from this subset using the
+`excludes` property.
+
+If the `includes` property is not specified, the entire document source is
+returned, excluding any fields that match a pattern in the `excludes` property.
+
+The following search API request returns the source for only the `obj1` and
+`obj2` fields and their properties, excluding any child `description` fields.
+
+[source,console]
+----
+GET /_search
+{
+  "_source": {
+    "includes": [ "obj1.*", "obj2.*" ],
+    "excludes": [ "*.description" ]
+  },
+  "query": {
+    "term": {
+      "user.id": "kimchy"
+    }
+  }
+}
+----
+
+[discrete]
+[[field-retrieval-methods]]
+=== Other methods of retrieving data
+
+.Using `fields` is typically better
+****
+These options are usually not required. Using the `fields` option is typically
+the better choice, unless you absolutely need to force loading a stored or
+`docvalue_fields`.
+****
+
+A document's `_source` is stored as a single field in Lucene. This structure
+means that the whole `_source` object must be loaded and parsed even if you're
+only requesting part of it. To avoid this limitation, you can try other options
+for loading fields:
+
+* Use the <<docvalue-fields,`docvalue_fields`>>
+parameter to get values for selected fields. This can be a good
+choice when returning a fairly small number of fields that support doc values,
+such as keywords and dates.
+* Use the <<request-body-search-stored-fields, `stored_fields`>> parameter to
+get the values for specific stored fields (fields that use the
+<<mapping-store,`store`>> mapping option).
+
+{es} always attempts to load values from `_source`. This behavior has the same
+implications as source filtering, where {es} needs to load and parse the entire
+`_source` to retrieve just one field.
+
 [discrete]
 [[docvalue-fields]]
-=== Doc value fields
+==== Doc value fields
 
 You can use the <<docvalue-fields,`docvalue_fields`>> parameter to return
 <<doc-values,doc values>> for one or more fields in the search response.
@@ -468,7 +548,7 @@ property.
 
 [discrete]
 [[stored-fields]]
-=== Stored fields
+==== Stored fields
 
 It's also possible to store an individual field's values by using the
 <<mapping-store,`store`>> mapping option. You can use the
@@ -522,7 +602,7 @@ must be used within an <<inner-hits, `inner_hits`>> block.
 
 [discrete]
 [[disable-stored-fields]]
-==== Disable stored fields
+===== Disable stored fields
 
 To disable the stored fields (and metadata fields) entirely use: `_none_`:
 
@@ -539,95 +619,9 @@ GET /_search
 
 NOTE: <<source-filtering,`_source`>> and <<request-body-search-version, `version`>> parameters cannot be activated if `_none_` is used.
 
-[discrete]
-[[source-filtering]]
-=== Source filtering
-
-You can use the `_source` parameter to select what fields of the source are
-returned. This is called _source filtering_.
-
-The following search API request sets the `_source` request body parameter to
-`false`. The document source is not included in the response.
-
-[source,console]
-----
-GET /_search
-{
-  "_source": false,
-  "query": {
-    "match": {
-      "user.id": "kimchy"
-    }
-  }
-}
-----
-
-To return only a subset of source fields, specify a wildcard (`*`) pattern in
-the `_source` parameter. The following search API request returns the source for
-only the `obj` field and its properties.
-
-[source,console]
-----
-GET /_search
-{
-  "_source": "obj.*",
-  "query": {
-    "match": {
-      "user.id": "kimchy"
-    }
-  }
-}
-----
-
-You can also specify an array of wildcard patterns in the `_source` field. The
-following search API request returns the source for only the `obj1` and
-`obj2` fields and their properties.
-
-[source,console]
-----
-GET /_search
-{
-  "_source": [ "obj1.*", "obj2.*" ],
-  "query": {
-    "match": {
-      "user.id": "kimchy"
-    }
-  }
-}
-----
-
-For finer control, you can specify an object containing arrays of `includes` and
-`excludes` patterns in the `_source` parameter.
-
-If the `includes` property is specified, only source fields that match one of
-its patterns are returned. You can exclude fields from this subset using the
-`excludes` property.
-
-If the `includes` property is not specified, the entire document source is
-returned, excluding any fields that match a pattern in the `excludes` property.
-
-The following search API request returns the source for only the `obj1` and
-`obj2` fields and their properties, excluding any child `description` fields.
-
-[source,console]
-----
-GET /_search
-{
-  "_source": {
-    "includes": [ "obj1.*", "obj2.*" ],
-    "excludes": [ "*.description" ]
-  },
-  "query": {
-    "term": {
-      "user.id": "kimchy"
-    }
-  }
-}
-----
-
 [discrete]
 [[script-fields]]
-=== Script fields
+==== Script fields
 
 You can use the `script_fields` parameter to retrieve a <<modules-scripting,script
 evaluation>> (based on different fields) for each hit. For example:
@@ -671,16 +665,16 @@ Here is an example:
 [source,console]
 --------------------------------------------------
 GET /_search
-    {
-        "query" : {
-            "match_all": {}
-        },
-        "script_fields" : {
-            "test1" : {
-                "script" : "params['_source']['message']"
-            }
-        }
+{
+  "query": {
+    "match_all": {}
+  },
+  "script_fields": {
+    "test1": {
+      "script": "params['_source']['message']"
     }
+  }
+}
 --------------------------------------------------
 // TEST[setup:my_index]
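The `_source` section moved by this commit describes how `includes` and `excludes` patterns combine: with `includes` set, only matching source fields are kept, and `excludes` then removes fields from that subset. Without `includes`, everything is kept except the excluded fields. The rule can be sketched in Python as follows. This is an approximation for illustration only and not how {es} implements source filtering. It assumes patterns behave like `fnmatch` globs applied to dotted leaf paths, and the names `flatten`, `filter_source`, and `doc` are hypothetical.

```python
from fnmatch import fnmatchcase


def flatten(obj, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf in a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def filter_source(source, includes=None, excludes=None):
    """Return the leaf fields that survive the includes/excludes rule, keyed by dotted path."""
    includes = includes or []
    excludes = excludes or []
    kept = {}
    for path, value in flatten(source):
        # With `includes`, a field must match at least one include pattern.
        if includes and not any(fnmatchcase(path, p) for p in includes):
            continue
        # `excludes` always removes matching fields.
        if any(fnmatchcase(path, p) for p in excludes):
            continue
        kept[path] = value
    return kept


# Hypothetical document, invented for illustration.
doc = {
    "obj1": {"name": "a", "description": "x"},
    "obj2": {"name": "b"},
    "obj3": {"name": "c"},
}

print(filter_source(doc, includes=["obj1.*", "obj2.*"], excludes=["*.description"]))
print(filter_source(doc, excludes=["*.description"]))
```

The sketch returns flat dotted paths rather than rebuilding the nested source object, which keeps it short. A real response preserves the original nesting.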