Clarify the resiliency trade-off of disabling replicas to speed up indexing. (#52714)

We should be more explicit about the downsides of disabling replicas and
explain that users should be ready to re-do the entire load in case of
issues mid-way.
Adrien Grand 5 years ago
parent
commit
b2ff78dec7
1 changed files with 12 additions and 9 deletions
+ 12 - 9
docs/reference/how-to/indexing-speed.asciidoc

@@ -58,15 +58,18 @@ gets indexed and when it becomes visible, increasing the
 `30s`, might help improve indexing speed.
 
 [float]
-=== Disable refresh and replicas for initial loads
-
-If you need to load a large amount of data at once, you should disable refresh
-by setting `index.refresh_interval` to `-1` and set `index.number_of_replicas`
-to `0`. This will temporarily put your index at risk since the loss of any shard
-will cause data loss, but at the same time indexing will be faster since
-documents will be indexed only once. Once the initial loading is finished, you
-can set `index.refresh_interval` and `index.number_of_replicas` back to their
-original values.
+=== Disable replicas for initial loads
+
+If you have a large amount of data that you want to load all at once into
+Elasticsearch, it may be beneficial to set `index.number_of_replicas` to `0` in
+order to speed up indexing. Having no replicas means that losing a single node
+may incur data loss, so it is important that the data lives elsewhere so that
+this initial load can be retried in case of an issue. Once the initial load is
+finished, you can set `index.number_of_replicas` back to its original value.
+
+If `index.refresh_interval` is configured in the index settings, it may further
+help to unset it during this initial load and set it back to its original
+value once the initial load is finished.
 
 [float]
 === Disable swapping
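
Note on the added "Disable replicas for initial loads" text: the settings changes it describes could be sketched roughly as follows. This is only an illustration and not part of the commit. It builds the JSON bodies one might send to the update index settings API (`PUT /<index>/_settings`) before and after the load. The helper name `initial_load_settings`, the index name `my-index`, and the example values `1` and `"30s"` are assumptions made for this sketch. No request is actually sent.

```python
import json


def initial_load_settings(original_replicas, original_refresh_interval=None):
    """Build (before, after) settings bodies for an initial bulk load.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Before the load: drop replicas so each document is indexed only once.
    before = {"index": {"number_of_replicas": 0}}
    # After the load: restore the original replica count.
    after = {"index": {"number_of_replicas": original_replicas}}

    if original_refresh_interval is not None:
        # A JSON null unsets an explicitly configured setting.
        before["index"]["refresh_interval"] = None
        after["index"]["refresh_interval"] = original_refresh_interval

    return before, after


# Example values (assumed): one replica and a 30s refresh interval originally.
before, after = initial_load_settings(1, "30s")
print("PUT /my-index/_settings", json.dumps(before))
print("PUT /my-index/_settings", json.dumps(after))
```

Because losing a node while replicas are disabled can lose data, a load driven this way should be restartable from the source data, as the commit message stresses.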