
Add docs about linearizable registers in repositories (#102050)

We don't mention linearizable registers in the snapshot/restore docs
today, but the repository analysis API verifies their correctness, and
some users with incorrect repository implementations struggle to
understand the verification errors. This commit adds docs that describe
linearizable registers and their various implementations.
David Turner 1 year ago
parent
commit
20a1503001

+ 14 - 0
docs/reference/snapshot-restore/apis/repo-analysis-api.asciidoc

@@ -178,6 +178,20 @@ complete. In this case it still instructs some of the other nodes in the
 cluster to attempt to read the blob, but all of these reads must fail to find
 the blob.
 
+Linearizable registers are special blobs that {es} manipulates using an atomic
+compare-and-exchange operation. This operation ensures correct and
+strongly-consistent behavior even when the blob is accessed by multiple nodes
+at the same time. The detailed implementation of the compare-and-exchange
+operation on linearizable registers varies by repository type. Repository
+analysis verifies that uncontended compare-and-exchange operations on a
+linearizable register blob always succeed. Repository analysis also verifies
+that contended operations either succeed or report the contention but do not
+return incorrect results. If an operation fails due to contention, {es} retries
+the operation until it succeeds. Most of the compare-and-exchange operations
+performed by repository analysis atomically increment a counter which is
+represented as an 8-byte blob. Some operations also verify the behavior on
+small blobs with sizes other than 8 bytes.
+
 [[repo-analysis-api-path-params]]
 ==== {api-path-parms-title}
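
Annotation on the hunk above: the retry-until-success counter increment it describes can be sketched as follows. This is a minimal in-memory model, not the {es} implementation. `InMemoryRegister` and `increment` are hypothetical names, and the big-endian encoding of the 8-byte counter is an assumption.

```python
import struct
import threading


class InMemoryRegister:
    """Hypothetical in-memory stand-in for a linearizable register blob."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # 8-byte counter starting at zero; big-endian is an assumption.
        self._value = struct.pack(">q", 0)

    def read(self) -> bytes:
        with self._lock:
            return self._value

    def compare_and_exchange(self, expected: bytes, updated: bytes) -> bool:
        """Store `updated` only if the register still holds `expected`."""
        with self._lock:
            if self._value != expected:
                return False  # contention: another writer got there first
            self._value = updated
            return True


def increment(register: InMemoryRegister) -> int:
    """Retry the compare-and-exchange until it succeeds; return the new count."""
    while True:
        current = register.read()
        (count,) = struct.unpack(">q", current)
        if register.compare_and_exchange(current, struct.pack(">q", count + 1)):
            return count + 1
```

A failed exchange only reports contention. It never stores a wrong value, so concurrent callers of `increment` cannot lose an update.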
 

+ 13 - 0
docs/reference/snapshot-restore/repository-azure.asciidoc

@@ -257,3 +257,16 @@ following naming rules:
 permitted in container names.
 * All letters in a container name must be lowercase.
 * Container names must be from 3 through 63 characters long.
+
+[[repository-azure-linearizable-registers]]
+[discrete]
+==== Linearizable register implementation
+
+The linearizable register implementation for Azure repositories is based on
+Azure's support for strongly consistent leases. Each lease may only be held by
+a single node at any time. The node presents its lease when performing a read
+or write operation on a protected blob. Lease-protected operations fail if the
+lease is invalid or expired. To perform a compare-and-exchange operation on a
+register, {es} first obtains a lease on the blob, then reads the blob contents
+under the lease, and finally uploads the updated blob under the same lease.
+This process ensures that the read and write operations happen atomically.
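
Annotation on the hunk above: the lease, read, write sequence can be sketched with an in-memory model. `LeasedBlob` and `LeaseError` are hypothetical names and do not call the Azure SDK. Lease expiry is left out, and releasing the lease at the end is an assumption.

```python
import uuid
from typing import Optional


class LeaseError(Exception):
    """Raised when a lease is missing, invalid, or already held."""


class LeasedBlob:
    """Hypothetical in-memory model of a lease-protected blob (no expiry)."""

    def __init__(self, data: bytes = b"") -> None:
        self._data = data
        self._lease_id: Optional[str] = None

    def acquire_lease(self) -> str:
        if self._lease_id is not None:
            raise LeaseError("lease already held")
        self._lease_id = uuid.uuid4().hex
        return self._lease_id

    def release_lease(self, lease_id: str) -> None:
        self._check(lease_id)
        self._lease_id = None

    def read(self, lease_id: str) -> bytes:
        self._check(lease_id)
        return self._data

    def write(self, lease_id: str, data: bytes) -> None:
        self._check(lease_id)
        self._data = data

    def _check(self, lease_id: str) -> None:
        if lease_id != self._lease_id:
            raise LeaseError("invalid or expired lease")


def compare_and_exchange(blob: LeasedBlob, expected: bytes, updated: bytes) -> bool:
    """Read and write under one lease so no other holder can interleave."""
    lease_id = blob.acquire_lease()  # raises LeaseError under contention
    try:
        if blob.read(lease_id) != expected:
            return False
        blob.write(lease_id, updated)
        return True
    finally:
        blob.release_lease(lease_id)  # assumption: release once finished
```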

+ 11 - 0
docs/reference/snapshot-restore/repository-gcs.asciidoc

@@ -275,3 +275,14 @@ The service account used to access the bucket must have the "Writer" access to t
 3. Go to the https://console.cloud.google.com/storage/browser[Storage Browser].
 4. Select the bucket and "Edit bucket permission".
 5. The service account must be configured as a "User" with "Writer" access.
+
+[[repository-gcs-linearizable-registers]]
+[discrete]
+==== Linearizable register implementation
+
+The linearizable register implementation for GCS repositories is based on GCS's
+support for strongly consistent preconditions on put-blob operations. To
+perform a compare-and-exchange operation on a register, {es} retrieves the
+register blob and its current generation, and then uploads the updated blob
+using the observed generation as its precondition. The precondition ensures
+that the generation has not changed in the meantime.
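
Annotation on the hunk above: the generation-precondition flow can be sketched with an in-memory bucket. `FakeBucket`, `PreconditionFailed`, and the `if_generation_match` parameter are hypothetical names here and do not call the GCS client library. Generation 0 is used to mean "the object does not exist".

```python
from typing import Dict, Tuple


class PreconditionFailed(Exception):
    """Raised when the stored generation differs from the expected one."""


class FakeBucket:
    """Hypothetical in-memory bucket that tracks a generation per object."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, int]] = {}
        self._next_generation = 1

    def get(self, name: str) -> Tuple[bytes, int]:
        # Generation 0 stands for "object does not exist".
        return self._objects.get(name, (b"", 0))

    def put(self, name: str, data: bytes, if_generation_match: int) -> None:
        _, generation = self.get(name)
        if generation != if_generation_match:
            raise PreconditionFailed(name)
        self._objects[name] = (data, self._next_generation)
        self._next_generation += 1


def compare_and_exchange(
    bucket: FakeBucket, name: str, expected: bytes, updated: bytes
) -> bool:
    """Upload `updated` only if the observed generation is still current."""
    current, generation = bucket.get(name)
    if current != expected:
        return False
    try:
        bucket.put(name, updated, if_generation_match=generation)
    except PreconditionFailed:
        return False  # the blob changed between the read and the upload
    return True
```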

+ 11 - 0
docs/reference/snapshot-restore/repository-s3.asciidoc

@@ -557,3 +557,14 @@ IMPORTANT: The symlink must be created on all data and master eligible nodes and
 by the `elasticsearch` user. By default, {es} runs as user `elasticsearch` using uid:gid `1000:0`.
 
 If the symlink exists, it will be used by default by all S3 repositories that don't have explicit `client` credentials.
+
+[[repository-s3-linearizable-registers]]
+[discrete]
+==== Linearizable register implementation
+
+The linearizable register implementation for S3 repositories is based on the
+strongly consistent semantics of the multipart upload API. {es} first creates a
+multipart upload to indicate its intention to perform a linearizable register
+operation. {es} then lists and cancels all other multipart uploads for the same
+register. {es} then attempts to complete the upload. If the upload completes
+successfully then the compare-and-exchange operation was atomic.
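
Annotation on the hunk above: the create, list-and-cancel, complete sequence can be sketched with an in-memory store. `FakeStore` and `NoSuchUpload` are hypothetical names and do not call an S3 client. The hunk does not say when the current register value is compared, so checking it just before completing is an assumption.

```python
import itertools
from typing import Dict, List


class NoSuchUpload(Exception):
    """Raised when completing an upload that has been cancelled."""


class FakeStore:
    """Hypothetical in-memory model of multipart uploads for register blobs."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._uploads: Dict[str, Dict[str, bytes]] = {}
        self._ids = itertools.count(1)

    def get(self, key: str) -> bytes:
        return self._objects.get(key, b"")

    def create_multipart_upload(self, key: str, data: bytes) -> str:
        upload_id = f"upload-{next(self._ids)}"
        self._uploads.setdefault(key, {})[upload_id] = data
        return upload_id

    def list_multipart_uploads(self, key: str) -> List[str]:
        return list(self._uploads.get(key, {}))

    def abort_multipart_upload(self, key: str, upload_id: str) -> None:
        self._uploads.get(key, {}).pop(upload_id, None)

    def complete_multipart_upload(self, key: str, upload_id: str) -> None:
        try:
            data = self._uploads[key].pop(upload_id)
        except KeyError:
            raise NoSuchUpload(upload_id) from None
        self._objects[key] = data


def compare_and_exchange(
    store: FakeStore, key: str, expected: bytes, updated: bytes
) -> bool:
    """Announce intent, cancel competitors, then try to complete the upload."""
    upload_id = store.create_multipart_upload(key, updated)
    for other in store.list_multipart_uploads(key):
        if other != upload_id:
            store.abort_multipart_upload(key, other)
    if store.get(key) != expected:  # assumption: compare before completing
        store.abort_multipart_upload(key, upload_id)
        return False
    try:
        store.complete_multipart_upload(key, upload_id)
    except NoSuchUpload:
        return False  # a competing writer cancelled this upload
    return True
```

In this model, two writers that cancel each other both fail. The contention is reported and no incorrect value is stored.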

+ 10 - 0
docs/reference/snapshot-restore/repository-shared-file-system.asciidoc

@@ -84,3 +84,13 @@ each node, but for these accounts to have different numeric user or group IDs.
 If your shared file system uses NFS then ensure that every node is running with
 the same numeric UID and GID, or else update your NFS configuration to account
 for the variance in numeric IDs across nodes.
+
+[[repository-fs-linearizable-registers]]
+[discrete]
+==== Linearizable register implementation
+
+The linearizable register implementation for shared filesystem repositories is
+based around file locking. To perform a compare-and-exchange operation on a
+register, {es} first locks the underlying file and then writes the updated
+contents under the same lock. This ensures that the file has not changed in the
+meantime.
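
Annotation on the hunk above: the lock-then-write sequence can be sketched with POSIX advisory locks. This is not the {es} implementation. The function name and the use of `fcntl.flock` are assumptions, the read under the lock is added here to perform the comparison, and the code runs only on Unix-like systems.

```python
import fcntl
import os


def compare_and_exchange(path: str, expected: bytes, updated: bytes) -> bool:
    """Replace the file contents with `updated` if they equal `expected`."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the exclusive lock is held
        size = os.fstat(fd).st_size
        current = os.pread(fd, size, 0)  # read while holding the lock
        if current != expected:
            return False
        os.ftruncate(fd, 0)
        os.pwrite(fd, updated, 0)  # write under the same lock
        os.fsync(fd)
        return True
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

A missing file is created empty, so the first exchange expects `b""`.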