
Add note on per-segment field name overhead (#89152)

We encountered a case where a substantial fraction of the heap usage was
due to per-segment-per-field `FieldInfo` objects, particularly
`FieldInfo#name`. This commit adds a note to the sizing docs about this
overhead.
David Turner, 3 years ago
parent
commit 546a2e2898
1 changed file with 10 additions and 3 deletions
  docs/reference/how-to/size-your-shards.asciidoc (+10 -3)

+ 10 - 3
docs/reference/how-to/size-your-shards.asciidoc

@@ -55,14 +55,14 @@ thread pool>>. This can result in low throughput and slow search speeds.
 
 [discrete]
 [[each-shard-has-overhead]]
-==== Each index, shard and field has overhead
+==== Each index, shard, segment and field has overhead
 
 Every index and every shard requires some memory and CPU resources. In most
 cases, a small set of large shards uses fewer resources than many small shards.
 
 Segments play a big role in a shard's resource usage. Most shards contain
-several segments, which store its index data. {es} keeps segment metadata in
-JVM heap memory so it can be quickly retrieved for searches. As a shard grows,
+several segments, which store its index data. {es} keeps some segment metadata
+in heap memory so it can be quickly retrieved for searches. As a shard grows,
 its segments are <<index-modules-merge,merged>> into fewer, larger segments.
 This decreases the number of segments, which means less metadata is kept in
 heap memory.
@@ -72,6 +72,13 @@ space. By default {es} will automatically create a mapping for every field in
 every document it indexes, but you can switch off this behaviour to
 <<explicit-mapping,take control of your mappings>>.
 
+Moreover every segment requires a small amount of heap memory for each mapped
+field. This per-segment-per-field heap overhead includes a copy of the field
+name, encoded using ISO-8859-1 if applicable or UTF-16 otherwise. Usually this
+is not noticeable, but you may need to account for this overhead if your shards
+have high segment counts and the corresponding mappings contain high field
+counts and/or very long field names.
+
 [discrete]
 [[shard-auto-balance]]
 ==== {es} automatically balances shards within a data tier
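To get a feel for the scale of the per-segment-per-field overhead described in the new paragraph, the name portion can be estimated with a little arithmetic. The Java sketch below is a back-of-the-envelope model, not code from Elasticsearch or Lucene. The class and method names are hypothetical. It makes three assumptions. First, compact strings are enabled, which is the JVM default since JDK 9, so a name costs 1 byte per char when every char fits in ISO-8859-1 and 2 bytes per char (UTF-16) otherwise. Second, every mapped field appears in every segment, which makes the estimate an upper bound. Third, only the name's characters are counted, and object headers and the other `FieldInfo` state are ignored.

```java
import java.util.List;

// Hypothetical helper, not part of Elasticsearch or Lucene: a rough estimate of
// the heap taken by field-name characters across all segments.
public class FieldNameOverheadEstimate {

    // Bytes for the characters of one name, assuming compact strings are enabled:
    // 1 byte per char if every char is <= 0xFF (ISO-8859-1), else 2 bytes per char
    // (UTF-16). String/array headers and other FieldInfo fields are ignored.
    static long nameBytes(String name) {
        for (int i = 0; i < name.length(); i++) {
            if (name.charAt(i) > 0xFF) {
                return 2L * name.length();
            }
        }
        return name.length();
    }

    // Assumes (pessimistically) that every mapped field appears in every segment,
    // so each segment holds its own copy of every field name.
    static long estimateBytes(List<String> fieldNames, long segmentCount) {
        long perSegment = 0;
        for (String name : fieldNames) {
            perSegment += nameBytes(name);
        }
        return perSegment * segmentCount;
    }

    public static void main(String[] args) {
        // "caf\u00e9" is Latin-1 (4 bytes); "\u540d\u524d" needs UTF-16 (4 bytes).
        List<String> fields = List.of("user.name", "message", "caf\u00e9", "\u540d\u524d");
        System.out.println(estimateBytes(fields, 1000)); // prints 24000
    }
}
```

The individual names are small, but the product grows quickly. For example, a hypothetical mapping with 10,000 fields averaging 30 Latin-1 characters, spread across 200 segments, gives 10,000 × 30 × 200 = 60,000,000 bytes (about 60 MB) of name characters alone, before any per-object overhead.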