|
@@ -1,37 +1,52 @@
|
|
|
[[eager-global-ordinals]]
|
|
|
=== `eager_global_ordinals`
|
|
|
|
|
|
-Global ordinals is a data-structure on top of doc values, that maintains an
|
|
|
-incremental numbering for each unique term in a lexicographic order. Each
|
|
|
-term has a unique number and the number of term 'A' is lower than the
|
|
|
-number of term 'B'. Global ordinals are only supported with
|
|
|
-<<keyword,`keyword`>> and <<text,`text`>> fields. In `keyword` fields, they
|
|
|
-are available by default but `text` fields can only use them when `fielddata`,
|
|
|
-with all of its associated baggage, is enabled.
|
|
|
-
|
|
|
-Doc values (and fielddata) also have ordinals, which is a unique numbering for
|
|
|
-all terms in a particular segment and field. Global ordinals just build on top
|
|
|
-of this, by providing a mapping between the segment ordinals and the global
|
|
|
-ordinals, the latter being unique across the entire shard. Given that global
|
|
|
-ordinals for a specific field are tied to _all the segments of a shard_, they
|
|
|
-need to be entirely rebuilt whenever a once new segment becomes visible.
|
|
|
-
|
|
|
-Global ordinals are used for features that use segment ordinals, such as
|
|
|
-the <<search-aggregations-bucket-terms-aggregation,`terms` aggregation>>,
|
|
|
-to improve the execution time. A terms aggregation relies purely on global
|
|
|
-ordinals to perform the aggregation at the shard level, then converts global
|
|
|
-ordinals to the real term only for the final reduce phase, which combines
|
|
|
-results from different shards.
|
|
|
-
|
|
|
-The loading time of global ordinals depends on the number of terms in a field,
|
|
|
-but in general it is low, since it source field data has already been loaded.
|
|
|
-The memory overhead of global ordinals is a small because it is very
|
|
|
-efficiently compressed.
|
|
|
-
|
|
|
-By default, global ordinals are loaded at search-time, which is the right
|
|
|
-trade-off if you are optimizing for indexing speed. However, if you are more
|
|
|
-interested in search speed, it could be beneficial to set
|
|
|
-`eager_global_ordinals: true` on fields that you plan to use in terms
|
|
|
+==== What are global ordinals?
|
|
|
+
|
|
|
+To support aggregations and other operations that require looking up field
|
|
|
+values on a per-document basis, Elasticsearch uses a data structure called
|
|
|
+<<doc-values, doc values>>. Term-based field types such as `keyword` store
|
|
|
+their doc values using an ordinal mapping for a more compact representation.
|
|
|
+This mapping works by assigning each term an incremental integer or 'ordinal'
|
|
|
+based on its lexicographic order. The field's doc values store only the
|
|
|
+ordinals for each document instead of the original terms, with a separate
|
|
|
+lookup structure to convert between ordinals and terms.
|
|
|
+
|
|
|
+When used during aggregations, ordinals can greatly improve performance. As an
|
|
|
+example, the `terms` aggregation relies only on ordinals to collect documents
|
|
|
+into buckets at the shard-level, then converts the ordinals back to their
|
|
|
+original term values when combining results across shards.
|
|
|
+
|
|
|
+Each index segment defines its own ordinal mapping, but aggregations collect
|
|
|
+data across an entire shard. So to be able to use ordinals for shard-level
|
|
|
+operations like aggregations, Elasticsearch creates a unified mapping called
|
|
|
+'global ordinals'. The global ordinal mapping is built on top of segment
|
|
|
+ordinals, and works by maintaining a map from global ordinal to the local
|
|
|
+ordinal for each segment.
|
|
|
+
|
|
|
+Global ordinals are used if a search contains any of the following components:
|
|
|
+
|
|
|
+* Bucket aggregations on `keyword` and `flattened` fields. This includes
|
|
|
+`terms` aggregations as mentioned above, as well as `composite`, `sampler`,
|
|
|
+and `significant_terms`.
|
|
|
+* Bucket aggregations on `text` fields that require <<fielddata, `fielddata`>>
|
|
|
+to be enabled.
|
|
|
+* Operations on parent and child documents from a `join` field, including
|
|
|
+`has_child` queries and `parent` aggregations.
|
|
|
+
|
|
|
+NOTE: The global ordinal mapping is an on-heap data structure. When measuring
|
|
|
+memory usage, Elasticsearch counts the memory from global ordinals as
|
|
|
+'fielddata'. Global ordinals memory is included in the
|
|
|
+<<fielddata-circuit-breaker, fielddata circuit breaker>>, and is returned
|
|
|
+under `fielddata` in the <<cluster-nodes-stats, node stats>> response.
|
|
|
+
|
|
|
+==== Loading global ordinals
|
|
|
+
|
|
|
+The global ordinal mapping must be built before ordinals can be used during a
|
|
|
+search. By default, the mapping is loaded during search on the first time that
|
|
|
+global ordinals are needed. This is is the right approach if you are optimizing
|
|
|
+for indexing speed, but if search performance is a priority, it's recommended
|
|
|
+to eagerly load global ordinals eagerly on fields that will be used in
|
|
|
aggregations:
|
|
|
|
|
|
[source,console]
|
|
@@ -48,29 +63,14 @@ PUT my_index/_mapping
|
|
|
------------
|
|
|
// TEST[s/^/PUT my_index\n/]
|
|
|
|
|
|
-This will shift the cost of building the global ordinals from search-time to
|
|
|
-refresh-time. Elasticsearch will make sure that global ordinals are built
|
|
|
-before exposing to searches any changes to the content of the index.
|
|
|
-Elasticsearch will also eagerly build global ordinals when starting a new copy
|
|
|
-of a shard, such as when increasing the number of replicas or when relocating a
|
|
|
-shard onto a new node.
|
|
|
-
|
|
|
-If a shard has been <<indices-forcemerge,force-merged>> down to a single
|
|
|
-segment then its global ordinals are identical to the ordinals for its unique
|
|
|
-segment, which means there is no extra cost for using global ordinals on such a
|
|
|
-shard. Note that for performance reasons you should only force-merge an index
|
|
|
-to which you will never write again.
|
|
|
-
|
|
|
-On a <<frozen-indices,frozen index>>, global ordinals are discarded after each
|
|
|
-search and rebuilt again on the next search if needed or if
|
|
|
-`eager_global_ordinals` is set. This means `eager_global_ordinals` should not
|
|
|
-be used on frozen indices. Instead, force-merge an index to a single segment
|
|
|
-before freezing it so that global ordinals need not be built separately on each
|
|
|
-search.
|
|
|
-
|
|
|
-If you ever decide that you do not need to run `terms` aggregations on this
|
|
|
-field anymore, then you can disable eager loading of global ordinals at any
|
|
|
-time:
|
|
|
+When `eager_global_ordinals` is enabled, global ordinals are built when a shard
|
|
|
+is <<indices-refresh, refreshed>> -- Elasticsearch always loads them before
|
|
|
+exposing changes to the content of the index. This shifts the cost of building
|
|
|
+global ordinals from search to index-time. Elasticsearch will also eagerly
|
|
|
+build global ordinals when creating a new copy of a shard, as can occur when
|
|
|
+increasing the number of replicas or relocating a shard onto a new node.
|
|
|
+
|
|
|
+Eager loading can be disabled at any time by updating the `eager_global_ordinals` setting:
|
|
|
|
|
|
[source,console]
|
|
|
------------
|
|
@@ -85,3 +85,33 @@ PUT my_index/_mapping
|
|
|
}
|
|
|
------------
|
|
|
// TEST[continued]
|
|
|
+
|
|
|
+IMPORTANT: On a <<frozen-indices,frozen index>>, global ordinals are discarded
|
|
|
+after each search and rebuilt again when they're requested. This means that
|
|
|
+`eager_global_ordinals` should not be used on frozen indices: it would
|
|
|
+cause global ordinals to be reloaded on every search. Instead, the index should
|
|
|
+be force-merged to a single segment before being frozen. This avoids building
|
|
|
+global ordinals altogether (more details can be found in the next section).
|
|
|
+
|
|
|
+==== Avoiding global ordinal loading
|
|
|
+
|
|
|
+Usually, global ordinals do not present a large overhead in terms of their
|
|
|
+loading time and memory usage. However, loading global ordinals can be
|
|
|
+expensive on indices with large shards, or if the fields contain a large
|
|
|
+number of unique term values. Because global ordinals provide a unified mapping
|
|
|
+for all segments on the shard, they also need to be rebuilt entirely when a new
|
|
|
+segment becomes visible.
|
|
|
+
|
|
|
+In some cases it is possible to avoid global ordinal loading altogether:
|
|
|
+
|
|
|
+* The `terms`, `sampler`, and `significant_terms` aggregations support a
|
|
|
+parameter
|
|
|
+<<search-aggregations-bucket-terms-aggregation-execution-hint, `execution_hint`>>
|
|
|
+that helps control how buckets are collected. It defaults to `global_ordinals`,
|
|
|
+but can be set to `map` to instead use the term values directly.
|
|
|
+* If a shard has been <<indices-forcemerge,force-merged>> down to a single
|
|
|
+segment, then its segment ordinals are already 'global' to the shard. In this
|
|
|
+case, Elasticsearch does not need to build a global ordinal mapping and there
|
|
|
+is no additional overhead from using global ordinals. Note that for performance
|
|
|
+reasons you should only force-merge an index to which you will never write to
|
|
|
+again.
|