Add note on per-segment field name overhead (#89152)

We encountered a case where a substantial fraction of the heap usage was
due to per-segment-per-field `FieldInfo` objects, particularly
`FieldInfo#name`. This commit adds a note to the sizing docs about this
overhead.
David Turner 3 years ago
commit 546a2e2898
1 changed file with 10 additions and 3 deletions

+ 10 - 3
docs/reference/how-to/size-your-shards.asciidoc

@@ -55,14 +55,14 @@ thread pool>>. This can result in low throughput and slow search speeds.
 
 [discrete]
 [[each-shard-has-overhead]]
-==== Each index, shard and field has overhead
+==== Each index, shard, segment and field has overhead
 
 Every index and every shard requires some memory and CPU resources. In most
 cases, a small set of large shards uses fewer resources than many small shards.
 
 Segments play a big role in a shard's resource usage. Most shards contain
-several segments, which store its index data. {es} keeps segment metadata in
-JVM heap memory so it can be quickly retrieved for searches. As a shard grows,
+several segments, which store its index data. {es} keeps some segment metadata
+in heap memory so it can be quickly retrieved for searches. As a shard grows,
 its segments are <<index-modules-merge,merged>> into fewer, larger segments.
 This decreases the number of segments, which means less metadata is kept in
 heap memory.
@@ -72,6 +72,13 @@ space. By default {es} will automatically create a mapping for every field in
 every document it indexes, but you can switch off this behaviour to
 <<explicit-mapping,take control of your mappings>>.
 
+Moreover every segment requires a small amount of heap memory for each mapped
+field. This per-segment-per-field heap overhead includes a copy of the field
+name, encoded using ISO-8859-1 if applicable or UTF-16 otherwise. Usually this
+is not noticeable, but you may need to account for this overhead if your shards
+have high segment counts and the corresponding mappings contain high field
+counts and/or very long field names.
+
 [discrete]
 [[shard-auto-balance]]
 ==== {es} automatically balances shards within a data tier
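
To give a feel for the overhead the new paragraph describes, here is a rough back-of-the-envelope sketch in Python. It is not how {es} or Lucene measures heap usage; it only counts the bytes of the field name characters, assuming 1 byte per character when the whole name fits ISO-8859-1 and 2 bytes per UTF-16 code unit otherwise, as the added note states. The function names (`field_name_bytes`, `estimate_name_overhead`) are hypothetical, and per-object headers and the other `FieldInfo` fields are deliberately ignored, so treat the result as a lower bound.

```python
from typing import Iterable


def field_name_bytes(name: str) -> int:
    """Approximate bytes used by the characters of one field name.

    Assumption: 1 byte per character if every character fits ISO-8859-1,
    otherwise 2 bytes per UTF-16 code unit. Object headers are not counted.
    """
    try:
        return len(name.encode("iso-8859-1"))
    except UnicodeEncodeError:
        # utf-16-le avoids the 2-byte byte-order mark that "utf-16" prepends.
        return len(name.encode("utf-16-le"))


def estimate_name_overhead(field_names: Iterable[str], segment_count: int) -> int:
    """Lower-bound estimate of heap bytes spent on field-name copies in one
    shard, assuming every segment holds a copy of every mapped field name."""
    per_segment = sum(field_name_bytes(n) for n in field_names)
    return segment_count * per_segment


if __name__ == "__main__":
    # Hypothetical shard: 1,000 mapped fields with 50-character names, 200 segments.
    names = [f"{i:04d}".ljust(50, "x") for i in range(1000)]
    print(estimate_name_overhead(names, segment_count=200), "bytes")
```

Under these assumptions the cost scales with the product of segment count, field count, and name length, which is why the note only flags shards that are high on several of those at once.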