9 years ago · 05271d58ca
--- a/docs/reference/mapping/params/fielddata.asciidoc
+++ b/docs/reference/mapping/params/fielddata.asciidoc
@@ -2,42 +2,105 @@
 
				 === `fielddata`
			
 
				 
			
 
				 Most fields are <<mapping-index,indexed>> by default, which makes them
			
 
				-searchable. The inverted index allows queries to look up the search term in
			
 
				-unique sorted list of terms, and from that immediately have access to the list
			
 
				-of documents that contain the term.
			
 
				+searchable. Sorting, aggregations, and accessing field values in scripts,
			
 
				+however, require a different access pattern from search.
			
 
				 
			
 
				-Sorting, aggregations, and access to field values in scripts requires a
			
 
				-different data access pattern.  Instead of lookup up the term and finding
			
 
				-documents, we need to be able to look up the document and find the terms that
			
 
				-it has in a field.
			
 
				+Search needs to answer the question _"Which documents contain this term?"_,
			
 
				+while sorting and aggregations need to answer a different question: _"What is
			
 
				+the value of this field for **this** document?"_.
			
 
				 
			
 
				-Most fields can use index-time, on-disk <<doc-values,`doc_values`>> to support
			
 
				-this type of data access pattern, but `text` fields do not support `doc_values`.
			
 
				+Most fields can use index-time, on-disk <<doc-values,`doc_values`>> for this
			
 
				+data access pattern, but <<text,`text`>> fields do not support `doc_values`.
			
 
				 
			
 
				-Instead, `text` strings use a query-time data structure called
			
 
				+Instead, `text` fields use a query-time *in-memory* data structure called
			
 
				 `fielddata`.  This data structure is built on demand the first time that a
			
 
				-field is used for aggregations, sorting, or is accessed in a script.  It is built
			
 
				-by reading the entire inverted index for each segment from disk, inverting the
			
 
				-term ↔︎ document relationship, and storing the result in memory, in the
			
 
				-JVM heap.
			
 
				-
			
 
				-Loading fielddata is an expensive process so it is disabled by default. Also,
			
 
				-when enabled, once it has been loaded, it remains in memory for the lifetime of
			
 
				-the segment.
			
 
				-
			
 
				-[WARNING]
			
 
				-.Fielddata can fill up your heap space
			
 
				-==============================================================================
			
 
				-Fielddata can consume a lot of heap space, especially when loading high
			
 
				-cardinality `text` fields.  Most of the time, it doesn't make sense
			
 
				-to sort or aggregate on `text` fields (with the notable exception
			
 
				-of the
			
 
				-<<search-aggregations-bucket-significantterms-aggregation,`significant_terms`>>
			
 
				-aggregation).  Always think about whether a <<keyword,`keyword`>> field (which can
			
 
				-use `doc_values`) would be  a better fit for your use case.
			
 
				-==============================================================================
			
 
				-
			
 
				-TIP: The `fielddata.*` settings must have the same settings for fields of the
			
 
				+field is used for aggregations, sorting, or in a script.  It is built by
			
 
				+reading the entire inverted index for each segment from disk, inverting the
			
 
				+term ↔︎ document relationship, and storing the result in memory, in the JVM
			
 
				+heap.
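Annotation (not part of this commit): the "uninverting" step described above can be pictured with a minimal Python sketch. It only illustrates the idea and does not reflect how Lucene or Elasticsearch actually build fielddata. The `uninvert` function and the sample index are hypothetical.

```python
from collections import defaultdict


def uninvert(inverted_index):
    """Turn a term -> doc IDs mapping into a doc ID -> sorted terms mapping."""
    doc_to_terms = defaultdict(list)
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            doc_to_terms[doc_id].append(term)
    # Sort each document's terms so the result is deterministic.
    return {doc_id: sorted(terms) for doc_id, terms in doc_to_terms.items()}


# Hypothetical segment: the inverted index answers
# "which documents contain this term?"
inverted_index = {
    "new": [1, 2],
    "york": [1],
    "jersey": [2],
}

# The uninverted structure answers
# "which terms does this document have in the field?"
fielddata = uninvert(inverted_index)
print(fielddata)
```

Every term of the field has to be visited to build the per-document view, which is consistent with the text's point that loading fielddata is expensive for high-cardinality fields.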
			
 
				+
			
 
				+==== Fielddata is disabled on `text` fields by default
			
 
				+
			
 
				+Fielddata can consume a *lot* of heap space, especially when loading high
			
 
				+cardinality `text` fields.  Once fielddata has been loaded into the heap, it
			
 
				+remains there for the lifetime of the segment. Also, loading fielddata is an
			
 
				+expensive process which can cause users to experience latency hits.  This is
			
 
				+why fielddata is disabled by default.
			
 
				+
			
 
				+If you try to sort, aggregate, or access values from a script on a `text`
			
 
				+field, you will see this exception:
			
 
				+
			
 
				+[quote]
			
 
				+--
			
 
				+Fielddata is disabled on text fields by default.  Set `fielddata=true` on
			
 
				+[`your_field_name`] in order to load  fielddata in memory by uninverting the
			
 
				+inverted index. Note that this can however use significant memory.
			
 
				+--
			
 
				+
			
 
				+[[before-enabling-fielddata]]
			
 
				+==== Before enabling fielddata
			
 
				+
			
 
				+Before you enable fielddata, consider why you are using a `text` field for
			
 
				+aggregations, sorting, or in a script.  It usually doesn't make sense to do
			
 
				+so.
			
 
				+
			
 
				+A `text` field is analyzed before indexing so that a value like
			
 
				+`New York` can be found by searching for `new` or for `york`.  A `terms`
			
 
				+aggregation on this field will return a `new` bucket and a `york` bucket, when
			
 
				+you probably want a single bucket called `New York`.
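Annotation (not part of this commit): a small Python sketch of why aggregating on analyzed terms splits `New York` into two buckets. The `analyze` function is a stand-in that lowercases and splits on whitespace, which is an assumption made for illustration and not the behavior of any specific Elasticsearch analyzer. The sample values are hypothetical.

```python
from collections import Counter


def analyze(value):
    # Stand-in analyzer (assumption): lowercase, then split on whitespace.
    return value.lower().split()


# Hypothetical field values from three documents.
values = ["New York", "New York", "New Jersey"]

# Bucketing on analyzed terms: one bucket per term,
# counting each term at most once per document.
text_buckets = Counter(term for v in values for term in set(analyze(v)))

# Bucketing on the unanalyzed value: one bucket per whole string.
keyword_buckets = Counter(values)

print(text_buckets)
print(keyword_buckets)
```

The first counter has separate `new`, `york`, and `jersey` entries, while the second keeps `New York` and `New Jersey` intact, which is the behavior the text recommends for aggregations.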
			
 
				+
			
 
				+Instead, you should have a `text` field for full text searches, and an
			
 
				+unanalyzed <<keyword,`keyword`>> field with <<doc-values,`doc_values`>>
			
 
				+enabled for aggregations, as follows:
			
 
				+
			
 
				+[source,js]
			
 
				+---------------------------------
			
 
				+PUT my_index
			
 
				+{
			
 
				+  "mappings": {
			
 
				+    "my_type": {
			
 
				+      "properties": {
			
 
				+        "my_field": { <1>
			
 
				+          "type": "text",
			
 
				+          "fields": {
			
 
				+            "keyword": { <2>
			
 
				+              "type": "keyword"
			
 
				+            }
			
 
				+          }
			
 
				+        }
			
 
				+      }
			
 
				+    }
			
 
				+  }
			
 
				+}
			
 
				+---------------------------------
			
 
				+// CONSOLE
			
 
				+<1> Use the `my_field` field for searches.
			
 
				+<2> Use the `my_field.keyword` field for aggregations, sorting, or in scripts.
			
 
				+
			
 
				+==== Enabling fielddata on `text` fields
			
 
				+
			
 
				+You can enable fielddata on an existing `text` field using the
			
 
				+<<indices-put-mapping,PUT mapping API>> as follows:
			
 
				+
			
 
				+[source,js]
			
 
				+-----------------------------------
			
 
				+PUT my_index/_mapping/my_type
			
 
				+{
			
 
				+  "properties": {
			
 
				+    "my_field": { <1>
			
 
				+      "type":     "text",
			
 
				+      "fielddata": true
			
 
				+    }
			
 
				+  }
			
 
				+}
			
 
				+-----------------------------------
			
 
				+// CONSOLE
			
 
				+// TEST[continued]
			
 
				+
			
 
				+<1> The mapping that you specify for `my_field` should consist of the existing
			
 
				+    mapping for that field, plus the `fielddata` parameter.
			
 
				+
			
 
				+TIP: The `fielddata.*` parameter must have the same settings for fields of the
			
 
				 same name in the same index.  Its value can be updated on existing fields
			
 
				 using the <<indices-put-mapping,PUT mapping API>>.
			
 
				 
			
@@ -49,12 +112,13 @@ using the <<indices-put-mapping,PUT mapping API>>.
 
				 Global ordinals is a data-structure on top of fielddata and doc values, that
			
 
				 maintains an incremental numbering for each unique term in a lexicographic
			
 
				 order. Each term has a unique number and the number of term 'A' is lower than
			
 
				-the number of term 'B'. Global ordinals are only supported on string fields.
			
 
				+the number of term 'B'. Global ordinals are only supported on <<text,`text`>>
			
 
				+and <<keyword,`keyword`>> fields.
			
 
				 
			
 
				-Fielddata and doc values also have ordinals, which is a unique numbering for all terms
			
 
				-in a particular segment and field. Global ordinals just build on top of this,
			
 
				-by providing a mapping between the segment ordinals and the global ordinals,
			
 
				-the latter being unique across the entire shard.
			
 
				+Fielddata and doc values also have ordinals, which is a unique numbering for
			
 
				+all terms in a particular segment and field. Global ordinals just build on top
			
 
				+of this, by providing a mapping between the segment ordinals and the global
			
 
				+ordinals, the latter being unique across the entire shard.
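Annotation (not part of this commit): the relationship between segment ordinals and global ordinals can be sketched in Python as follows. This is a simplified model under stated assumptions, not the actual Lucene data structure: each segment is represented as a sorted list of its unique terms, and the position of a term in that list is its segment ordinal. The function name `build_global_ordinals` is hypothetical.

```python
def build_global_ordinals(segments):
    """Build shard-wide ordinals from per-segment sorted term lists.

    Returns (global_terms, mappings), where mappings[i][segment_ordinal]
    gives the global ordinal of that term in segment i.
    """
    # Global ordinals number every unique term in the shard in sorted order.
    global_terms = sorted({term for seg in segments for term in seg})
    global_ord = {term: i for i, term in enumerate(global_terms)}
    # For each segment, map its local ordinal (list index) to the global one.
    mappings = [[global_ord[term] for term in seg] for seg in segments]
    return global_terms, mappings


# Hypothetical shard with two segments.
segments = [["apple", "cherry"], ["banana", "cherry"]]
global_terms, mappings = build_global_ordinals(segments)
print(global_terms)
print(mappings)
```

In this model `cherry` has segment ordinal 1 in both segments and the same global ordinal in each mapping, so per-segment results can be compared or merged by number without looking at the term bytes.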
			
 
				 
			
 
				 Global ordinals are used for features that use segment ordinals, such as
			
 
				 sorting and the terms aggregation, to improve the execution time. A terms
			
@@ -68,10 +132,11 @@ which is different than for field data for a specific field which is tied to a
 
				 single segment. For this reason global ordinals need to be entirely rebuilt
			
 
				 whenever a new segment becomes visible.
			
 
				 
			
 
				-The loading time of global ordinals depends on the number of terms in a field, but in general
			
 
				-it is low, since it source field data has already been loaded. The memory overhead of global
			
 
				-ordinals is a small because it is very efficiently compressed. Eager loading of global ordinals
			
 
				-can move the loading time from the first search request, to the refresh itself.
			
 
				+The loading time of global ordinals depends on the number of terms in a field,
			
 
				+but in general it is low, since its source field data has already been loaded.
			
 
				+The memory overhead of global ordinals is small because it is very
			
 
				+efficiently compressed. Eager loading of global ordinals can move the loading
			
 
				+time from the first search request, to the refresh itself.
			
 
				 
			
 
				 *****************************************