|  | @@ -2,42 +2,105 @@
 | 
	
		
			
				|  |  |  === `fielddata`
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  |  Most fields are <<mapping-index,indexed>> by default, which makes them
 | 
	
		
			
				|  |  | -searchable. The inverted index allows queries to look up the search term in
 | 
	
		
			
				|  |  | -unique sorted list of terms, and from that immediately have access to the list
 | 
	
		
			
				|  |  | -of documents that contain the term.
 | 
	
		
			
				|  |  | +searchable. Sorting, aggregations, and accessing field values in scripts,
 | 
	
		
			
				|  |  | +however, require a different access pattern from search.
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  | -Sorting, aggregations, and access to field values in scripts requires a
 | 
	
		
			
				|  |  | -different data access pattern.  Instead of lookup up the term and finding
 | 
	
		
			
				|  |  | -documents, we need to be able to look up the document and find the terms that
 | 
	
		
			
				|  |  | -it has in a field.
 | 
	
		
			
				|  |  | +Search needs to answer the question _"Which documents contain this term?"_,
 | 
	
		
			
				|  |  | +while sorting and aggregations need to answer a different question: _"What is
 | 
	
		
			
				|  |  | +the value of this field for **this** document?"_.
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  | -Most fields can use index-time, on-disk <<doc-values,`doc_values`>> to support
 | 
	
		
			
				|  |  | -this type of data access pattern, but `text` fields do not support `doc_values`.
 | 
	
		
			
				|  |  | +Most fields can use index-time, on-disk <<doc-values,`doc_values`>> for this
 | 
	
		
			
				|  |  | +data access pattern, but <<text,`text`>> fields do not support `doc_values`.
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  | -Instead, `text` strings use a query-time data structure called
 | 
	
		
			
				|  |  | +Instead, `text` fields use a query-time *in-memory* data structure called
 | 
	
		
			
				|  |  |  `fielddata`.  This data structure is built on demand the first time that a
 | 
	
		
			
				|  |  | -field is used for aggregations, sorting, or is accessed in a script.  It is built
 | 
	
		
			
				|  |  | -by reading the entire inverted index for each segment from disk, inverting the
 | 
	
		
			
				|  |  | -term ↔︎ document relationship, and storing the result in memory, in the
 | 
	
		
			
				|  |  | -JVM heap.
 | 
	
		
			
				|  |  | -
 | 
	
		
			
				|  |  | -Loading fielddata is an expensive process so it is disabled by default. Also,
 | 
	
		
			
				|  |  | -when enabled, once it has been loaded, it remains in memory for the lifetime of
 | 
	
		
			
				|  |  | -the segment.
 | 
	
		
			
				|  |  | -
 | 
	
		
			
				|  |  | -[WARNING]
 | 
	
		
			
				|  |  | -.Fielddata can fill up your heap space
 | 
	
		
			
				|  |  | -==============================================================================
 | 
	
		
			
				|  |  | -Fielddata can consume a lot of heap space, especially when loading high
 | 
	
		
			
				|  |  | -cardinality `text` fields.  Most of the time, it doesn't make sense
 | 
	
		
			
				|  |  | -to sort or aggregate on `text` fields (with the notable exception
 | 
	
		
			
				|  |  | -of the
 | 
	
		
			
				|  |  | -<<search-aggregations-bucket-significantterms-aggregation,`significant_terms`>>
 | 
	
		
			
				|  |  | -aggregation).  Always think about whether a <<keyword,`keyword`>> field (which can
 | 
	
		
			
				|  |  | -use `doc_values`) would be  a better fit for your use case.
 | 
	
		
			
				|  |  | -==============================================================================
 | 
	
		
			
				|  |  | -
 | 
	
		
			
				|  |  | -TIP: The `fielddata.*` settings must have the same settings for fields of the
 | 
	
		
			
				|  |  | +field is used for aggregations, sorting, or in a script.  It is built by
 | 
	
		
			
				|  |  | +reading the entire inverted index for each segment from disk, inverting the
 | 
	
		
			
				|  |  | +term ↔︎ document relationship, and storing the result in memory, in the JVM
 | 
	
		
			
				|  |  | +heap.
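The uninverting step described above can be illustrated with a small sketch. This is not how Lucene or Elasticsearch implement fielddata; the toy `inverted_index` and the `uninvert` helper are hypothetical names used only to show how a term → documents structure is turned into a document → terms structure held in memory.

```python
from collections import defaultdict

# Hypothetical toy inverted index for a single segment:
# each term maps to the sorted list of document IDs that contain it.
inverted_index = {
    "brown": [0, 2],
    "fox": [0, 1],
    "quick": [1, 2],
}


def uninvert(index):
    """Build a doc -> terms mapping by reading every posting list once."""
    doc_to_terms = defaultdict(list)
    for term in sorted(index):  # visit terms in sorted order
        for doc_id in index[term]:
            doc_to_terms[doc_id].append(term)
    return dict(doc_to_terms)


# The whole index has to be read before any single document can be answered,
# which is why building this structure is expensive.
fielddata = uninvert(inverted_index)
```

Note that the result holds an entry for every document and every term of the field, which is the reason the memory cost grows with the cardinality of the field.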
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +==== Fielddata is disabled on `text` fields by default
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +Fielddata can consume a *lot* of heap space, especially when loading high
 | 
	
		
			
				|  |  | +cardinality `text` fields.  Once fielddata has been loaded into the heap, it
 | 
	
		
			
				|  |  | +remains there for the lifetime of the segment. Also, loading fielddata is an
 | 
	
		
			
				|  |  | +expensive process that can cause users to experience latency hits.  This is
 | 
	
		
			
				|  |  | +why fielddata is disabled by default.
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +If you try to sort, aggregate, or access values from a script on a `text`
 | 
	
		
			
				|  |  | +field, you will see this exception:
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +[quote]
 | 
	
		
			
				|  |  | +--
 | 
	
		
			
				|  |  | +Fielddata is disabled on text fields by default.  Set `fielddata=true` on
 | 
	
		
			
				|  |  | +[`your_field_name`] in order to load  fielddata in memory by uninverting the
 | 
	
		
			
				|  |  | +inverted index. Note that this can however use significant memory.
 | 
	
		
			
				|  |  | +--
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +[[before-enabling-fielddata]]
 | 
	
		
			
				|  |  | +==== Before enabling fielddata
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +Before you enable fielddata, consider why you are using a `text` field for
 | 
	
		
			
				|  |  | +aggregations, sorting, or in a script.  It usually doesn't make sense to do
 | 
	
		
			
				|  |  | +so.
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +A `text` field is analyzed before indexing so that a value like
 | 
	
		
			
				|  |  | +`New York` can be found by searching for `new` or for `york`.  A `terms`
 | 
	
		
			
				|  |  | +aggregation on this field will return a `new` bucket and a `york` bucket, when
 | 
	
		
			
				|  |  | +you probably want a single bucket called `New York`.
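The bucket behaviour described above can be sketched as follows. The `analyze` function is a rough, hypothetical stand-in for an analyzer (lowercase, split on whitespace), and the sample values are invented for illustration; a real `terms` aggregation runs inside Elasticsearch, not in client code.

```python
from collections import Counter

# Hypothetical field values for three documents.
values = ["New York", "New York", "New Jersey"]


def analyze(text):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


# Aggregating on the analyzed terms: each document counts once per term.
analyzed_buckets = Counter(term for v in values for term in set(analyze(v)))

# Aggregating on the unanalyzed value: one bucket per original string.
keyword_buckets = Counter(values)
```

Under these assumptions the analyzed field produces `new`, `york`, and `jersey` buckets, while the unanalyzed values produce one bucket per original string.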
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +Instead, you should have a `text` field for full text searches, and an
 | 
	
		
			
				|  |  | +unanalyzed <<keyword,`keyword`>> field with <<doc-values,`doc_values`>>
 | 
	
		
			
				|  |  | +enabled for aggregations, as follows:
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +[source,js]
 | 
	
		
			
				|  |  | +---------------------------------
 | 
	
		
			
				|  |  | +PUT my_index
 | 
	
		
			
				|  |  | +{
 | 
	
		
			
				|  |  | +  "mappings": {
 | 
	
		
			
				|  |  | +    "my_type": {
 | 
	
		
			
				|  |  | +      "properties": {
 | 
	
		
			
				|  |  | +        "my_field": { <1>
 | 
	
		
			
				|  |  | +          "type": "text",
 | 
	
		
			
				|  |  | +          "fields": {
 | 
	
		
			
				|  |  | +            "keyword": { <2>
 | 
	
		
			
				|  |  | +              "type": "keyword"
 | 
	
		
			
				|  |  | +            }
 | 
	
		
			
				|  |  | +          }
 | 
	
		
			
				|  |  | +        }
 | 
	
		
			
				|  |  | +      }
 | 
	
		
			
				|  |  | +    }
 | 
	
		
			
				|  |  | +  }
 | 
	
		
			
				|  |  | +}
 | 
	
		
			
				|  |  | +---------------------------------
 | 
	
		
			
				|  |  | +// CONSOLE
 | 
	
		
			
				|  |  | +<1> Use the `my_field` field for searches.
 | 
	
		
			
				|  |  | +<2> Use the `my_field.keyword` field for aggregations, sorting, or in scripts.
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +==== Enabling fielddata on `text` fields
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +You can enable fielddata on an existing `text` field using the
 | 
	
		
			
				|  |  | +<<indices-put-mapping,PUT mapping API>> as follows:
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +[source,js]
 | 
	
		
			
				|  |  | +-----------------------------------
 | 
	
		
			
				|  |  | +PUT my_index/_mapping/my_type
 | 
	
		
			
				|  |  | +{
 | 
	
		
			
				|  |  | +  "properties": {
 | 
	
		
			
				|  |  | +    "my_field": { <1>
 | 
	
		
			
				|  |  | +      "type":     "text",
 | 
	
		
			
				|  |  | +      "fielddata": true
 | 
	
		
			
				|  |  | +    }
 | 
	
		
			
				|  |  | +  }
 | 
	
		
			
				|  |  | +}
 | 
	
		
			
				|  |  | +-----------------------------------
 | 
	
		
			
				|  |  | +// CONSOLE
 | 
	
		
			
				|  |  | +// TEST[continued]
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +<1> The mapping that you specify for `my_field` should consist of the existing
 | 
	
		
			
				|  |  | +    mapping for that field, plus the `fielddata` parameter.
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +TIP: The `fielddata.*` parameter must have the same settings for fields of the
 | 
	
		
			
				|  |  |  same name in the same index.  Its value can be updated on existing fields
 | 
	
		
			
				|  |  |  using the <<indices-put-mapping,PUT mapping API>>.
 | 
	
		
			
				|  |  |  
 | 
	
	
		
			
				|  | @@ -49,12 +112,13 @@ using the <<indices-put-mapping,PUT mapping API>>.
 | 
	
		
			
				|  |  |  Global ordinals is a data structure on top of fielddata and doc values that
 | 
	
		
			
				|  |  |  maintains an incremental numbering for each unique term in lexicographic
 | 
	
		
			
				|  |  |  order. Each term has a unique number and the number of term 'A' is lower than
 | 
	
		
			
				|  |  | -the number of term 'B'. Global ordinals are only supported on string fields.
 | 
	
		
			
				|  |  | +the number of term 'B'. Global ordinals are only supported on <<text,`text`>>
 | 
	
		
			
				|  |  | +and <<keyword,`keyword`>> fields.
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  | -Fielddata and doc values also have ordinals, which is a unique numbering for all terms
 | 
	
		
			
				|  |  | -in a particular segment and field. Global ordinals just build on top of this,
 | 
	
		
			
				|  |  | -by providing a mapping between the segment ordinals and the global ordinals,
 | 
	
		
			
				|  |  | -the latter being unique across the entire shard.
 | 
	
		
			
				|  |  | +Fielddata and doc values also have ordinals, which are a unique numbering for
 | 
	
		
			
				|  |  | +all terms in a particular segment and field. Global ordinals just build on top
 | 
	
		
			
				|  |  | +of this, by providing a mapping between the segment ordinals and the global
 | 
	
		
			
				|  |  | +ordinals, the latter being unique across the entire shard.
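The relationship between segment ordinals and global ordinals can be pictured with a minimal sketch. The two-segment term lists and the `build_global_ordinals` helper are hypothetical, and the real data structure is compressed and built differently; the sketch only shows the mapping idea.

```python
# Hypothetical sorted term dictionaries for two segments of one shard.
# A term's segment ordinal is its position in its own segment's list.
segment_terms = [
    ["apple", "cherry"],            # segment 0: apple=0, cherry=1
    ["banana", "cherry", "date"],   # segment 1: banana=0, cherry=1, date=2
]


def build_global_ordinals(segments):
    """Number all unique terms across segments in lexicographic order and
    map every segment ordinal to its shard-wide (global) ordinal."""
    global_terms = sorted(set(t for seg in segments for t in seg))
    global_ord = {term: i for i, term in enumerate(global_terms)}
    mapping = [[global_ord[t] for t in seg] for seg in segments]
    return global_terms, mapping


global_terms, seg_to_global = build_global_ordinals(segment_terms)
```

In this sketch `cherry` has segment ordinal 1 in both segments, and both map to the same global ordinal, so per-segment results can be merged by number without comparing the term strings.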
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  |  Global ordinals are used for features that use segment ordinals, such as
 | 
	
		
			
				|  |  |  sorting and the terms aggregation, to improve the execution time. A terms
 | 
	
	
		
			
				|  | @@ -68,10 +132,11 @@ which is different than for field data for a specific field which is tied to a
 | 
	
		
			
				|  |  |  single segment. For this reason global ordinals need to be entirely rebuilt
 | 
	
		
			
				|  |  |  whenever a new segment becomes visible.
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  | -The loading time of global ordinals depends on the number of terms in a field, but in general
 | 
	
		
			
				|  |  | -it is low, since it source field data has already been loaded. The memory overhead of global
 | 
	
		
			
				|  |  | -ordinals is a small because it is very efficiently compressed. Eager loading of global ordinals
 | 
	
		
			
				|  |  | -can move the loading time from the first search request, to the refresh itself.
 | 
	
		
			
				|  |  | +The loading time of global ordinals depends on the number of terms in a field,
 | 
	
		
			
				|  |  | +but in general it is low, since its source field data has already been loaded.
 | 
	
		
			
				|  |  | +The memory overhead of global ordinals is small because it is very
 | 
	
		
			
				|  |  | +efficiently compressed. Eager loading of global ordinals can move the loading
 | 
	
		
			
				|  |  | +time from the first search request to the refresh itself.
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  |  *****************************************
 | 
	
		
			
				|  |  |  
 |