
repository-s3 also works with S3-compatibles (#38524)

- Notes that you can adjust the `s3.client.*.endpoint` setting to point to a
  repository held on an S3-compatible service.
- Notes that the default is `s3.amazonaws.com` and that the endpoint is not
  auto-detected.
- Reformats docs to width.

Closes #35925
David Turner 6 years ago
parent commit 87bb5d0872
1 changed file with 133 additions and 85 deletions

docs/plugins/repository-s3.asciidoc (+133 -85)

@@ -1,21 +1,25 @@
 [[repository-s3]]
 === S3 Repository Plugin
 
-The S3 repository plugin adds support for using S3 as a repository for
+The S3 repository plugin adds support for using AWS S3 as a repository for
 {ref}/modules-snapshots.html[Snapshot/Restore].
 
-*If you are looking for a hosted solution of Elasticsearch on AWS, please visit http://www.elastic.co/cloud.*
+*If you are looking for a hosted solution of Elasticsearch on AWS, please visit
+http://www.elastic.co/cloud.*
 
 :plugin_name: repository-s3
 include::install_remove.asciidoc[]
 
 [[repository-s3-usage]]
-==== Getting started with AWS
+==== Getting Started
 
-The plugin provides a repository type named `s3` which may be used when creating a repository.
-The repository defaults to using https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html[ECS IAM Role] or
-http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html[EC2 IAM Role]
-credentials for authentication. The only mandatory setting is the bucket name:
+The plugin provides a repository type named `s3` which may be used when creating
+a repository. The repository defaults to using
+https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html[ECS
+IAM Role] or
+http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html[EC2
+IAM Role] credentials for authentication. The only mandatory setting is the
+bucket name:
 
 [source,js]
 ----
@@ -34,10 +38,10 @@ PUT _snapshot/my_s3_repository
 [[repository-s3-client]]
 ==== Client Settings
 
-The client that you use to connect to S3 has a number of settings available. The
-settings have the form `s3.client.CLIENT_NAME.SETTING_NAME`. The default client
-name that is looked up by an `s3` repository is `default`. It can be modified
-using the <<repository-s3-repository,repository setting>> `client`. For example:
+The client that you use to connect to S3 has a number of settings available.
+The settings have the form `s3.client.CLIENT_NAME.SETTING_NAME`. By default,
+`s3` repositories use a client named `default`, but this can be modified using
+the <<repository-s3-repository,repository setting>> `client`. For example:
 
 [source,js]
 ----
@@ -51,7 +55,7 @@ PUT _snapshot/my_s3_repository
 }
 ----
 // CONSOLE
-// TEST[skip:we don't have s3 setup while testing this]
+// TEST[skip:we don't have S3 setup while testing this]
 
 Most client settings can be added to the `elasticsearch.yml` configuration file
 with the exception of the secure settings, which you add to the {es} keystore.
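+
+For example, the secure client settings can be added to the keystore with the
+`elasticsearch-keystore` tool before starting the node (substitute your own
+client name if it is not `default`; the tool prompts for each value):
+
+[source,sh]
+----
+bin/elasticsearch-keystore add s3.client.default.access_key
+bin/elasticsearch-keystore add s3.client.default.secret_key
+----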
@@ -74,9 +78,9 @@ contents, will utilize the latest settings from the keystore. Any existing `s3`
 repositories, as well as any newly created ones, will pick up the new values
 stored in the keystore.
 
-NOTE: In progress snapshot/restore tasks will not be preempted by a *reload*
-of the client's secure settings. The task will complete using the client as it
-was built when the operation started.
+NOTE: In-progress snapshot/restore tasks will not be preempted by a *reload* of
+the client's secure settings. The task will complete using the client as it was
+built when the operation started.
 
 The following list contains the available client settings. Those that must be
 stored in the keystore are marked as "secure" and are *reloadable*; the other
@@ -84,61 +88,86 @@ settings belong in the `elasticsearch.yml` file.
 
 `access_key` ({ref}/secure-settings.html[Secure])::
 
-    An s3 access key. The `secret_key` setting must also be specified. 
+    An S3 access key. The `secret_key` setting must also be specified.
 
 `secret_key` ({ref}/secure-settings.html[Secure])::
 
-    An s3 secret key. The `access_key` setting must also be specified. 
+    An S3 secret key. The `access_key` setting must also be specified.
 
 `session_token`::
-    An s3 session token. The `access_key` and `secret_key` settings must also
-    be specified. (Secure)
+
+    An S3 session token. The `access_key` and `secret_key` settings must also be
+    specified. (Secure)
 
 `endpoint`::
 
-    The s3 service endpoint to connect to. This will be automatically
-    figured out by the s3 client based on the bucket location, but
-    can be specified explicitly. See http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region.
+    The S3 service endpoint to connect to. This defaults to `s3.amazonaws.com`,
+    but the
+    http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region[AWS
+    documentation] lists alternative S3 endpoints. If you are using an
+    <<repository-s3-compatible-services,S3-compatible service>> then you should
+    set this to the service's endpoint.
 
 `protocol`::
 
-    The protocol to use to connect to s3. Valid values are either `http`
-    or `https`. Defaults to `https`. 
+    The protocol to use to connect to S3. Valid values are either `http` or
+    `https`. Defaults to `https`.
 
 `proxy.host`::
 
-    The host name of a proxy to connect to s3 through.
+    The host name of a proxy to connect to S3 through.
 
 `proxy.port`::
 
-    The port of a proxy to connect to s3 through.
+    The port of a proxy to connect to S3 through.
 
 `proxy.username` ({ref}/secure-settings.html[Secure])::
 
-    The username to connect to the `proxy.host` with.    
+    The username to connect to the `proxy.host` with.
 
 `proxy.password` ({ref}/secure-settings.html[Secure])::
 
-    The password to connect to the `proxy.host` with.   
+    The password to connect to the `proxy.host` with.
 
 `read_timeout`::
 
-    The socket timeout for connecting to s3. The value should specify the unit. For example,
-    a value of `5s` specifies a 5 second timeout. The default value is 50 seconds.
+    The socket timeout for connecting to S3. The value should specify the unit.
+    For example, a value of `5s` specifies a 5 second timeout. The default value
+    is 50 seconds.
 
 `max_retries`::
 
-    The number of retries to use when an s3 request fails. The default value is 3.
+    The number of retries to use when an S3 request fails. The default value is
+    `3`.
 
 `use_throttle_retries`::
 
-    Whether retries should be throttled (ie use backoff). Must be `true` or `false`. Defaults to `true`.
+    Whether retries should be throttled (i.e. should back off). Must be `true`
+    or `false`. Defaults to `true`.
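+
+For example, a sketch of a non-secure client configuration in
+`elasticsearch.yml` using the settings above; the endpoint and values shown
+are illustrative only:
+
+[source,yaml]
+----
+s3.client.default.endpoint: s3.eu-west-1.amazonaws.com
+s3.client.default.protocol: https
+s3.client.default.read_timeout: 50s
+s3.client.default.max_retries: 3
+s3.client.default.use_throttle_retries: true
+----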
+
+[float]
+[[repository-s3-compatible-services]]
+===== S3-compatible services
+
+There are a number of storage systems that provide an S3-compatible API, and
+the `repository-s3` plugin allows you to use these systems in place of AWS S3.
+To do so, you should set the `s3.client.CLIENT_NAME.endpoint` setting to the
+system's endpoint. This setting accepts IP addresses and hostnames and may
+include a port. For example, the endpoint may be `172.17.0.2` or
+`172.17.0.2:9000`. You may also need to set `s3.client.CLIENT_NAME.protocol` to
+`http` if the endpoint does not support HTTPS.
+
+https://minio.io[Minio] is an example of a storage system that provides an
+S3-compatible API. The `repository-s3` plugin allows {es} to work with
+Minio-backed repositories as well as repositories stored on AWS S3. Other
+S3-compatible storage systems may also work with {es}, but these are not tested
+or supported.
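+
+For example, a minimal sketch of the client settings for a Minio server
+listening on a hypothetical address `172.17.0.2:9000` without TLS:
+
+[source,yaml]
+----
+s3.client.default.endpoint: "172.17.0.2:9000"
+s3.client.default.protocol: http
+----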
 
 [[repository-s3-repository]]
 ==== Repository Settings
 
-The `s3` repository type supports a number of settings to customize how data is stored in S3.
-These can be specified when creating the repository. For example:
+The `s3` repository type supports a number of settings to customize how data is
+stored in S3. These can be specified when creating the repository. For example:
 
 [source,js]
 ----
@@ -152,7 +181,7 @@ PUT _snapshot/my_s3_repository
 }
 ----
 // CONSOLE
-// TEST[skip:we don't have s3 set up while testing this]
+// TEST[skip:we don't have S3 set up while testing this]
 
 The following settings are supported:
 
@@ -162,21 +191,21 @@ The following settings are supported:
 
 `client`::
 
-    The name of the s3 client to use to connect to S3. Defaults to `default`.
+    The name of the <<repository-s3-client,S3 client>> to use to connect to S3.
+    Defaults to `default`.
 
 `base_path`::
 
-    Specifies the path within bucket to repository data. Defaults to
-    value of `repositories.s3.base_path` or to root directory if not set.
-    Previously, the base_path could take a leading `/` (forward slash).
-    However, this has been deprecated and setting the base_path now should
-    omit the leading `/`.
+    Specifies the path within the bucket to the repository data. Defaults to
+    the value of `repositories.s3.base_path`, or to the root directory if not
+    set. Previously, the base_path could take a leading `/` (forward slash);
+    however, this has been deprecated and the base_path should now omit the
+    leading `/`.
 
 `chunk_size`::
 
-    Big files can be broken down into chunks during snapshotting if needed.
-    The chunk size can be specified in bytes or by using size value notation,
-    i.e. `1gb`, `10mb`, `5kb`. Defaults to `1gb`.
+    Big files can be broken down into chunks during snapshotting if needed. The
+    chunk size can be specified in bytes or by using size value notation, e.g.
+    `1gb`, `10mb`, `5kb`. Defaults to `1gb`.
 
 `compress`::
 
@@ -191,41 +220,49 @@ The following settings are supported:
 
 `buffer_size`::
 
-    Minimum threshold below which the chunk is uploaded using a single
-    request. Beyond this threshold, the S3 repository will use the
-    http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html[AWS Multipart Upload API]
-    to split the chunk into several parts, each of `buffer_size` length, and
-    to upload each part in its own request. Note that setting a buffer
-    size lower than `5mb` is not allowed since it will prevent the use of the
-    Multipart API and may result in upload errors. It is also not possible to
-    set a buffer size greater than `5gb` as it is the maximum upload size
-    allowed by S3. Defaults to the minimum between `100mb` and `5%` of the heap size.
+    Minimum threshold below which the chunk is uploaded using a single request.
+    Beyond this threshold, the S3 repository will use the
+    http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html[AWS
+    Multipart Upload API] to split the chunk into several parts, each of
+    `buffer_size` length, and to upload each part in its own request. Note that
+    setting a buffer size lower than `5mb` is not allowed since it will prevent
+    the use of the Multipart API and may result in upload errors. It is also not
+    possible to set a buffer size greater than `5gb` as it is the maximum upload
+    size allowed by S3. Defaults to the minimum between `100mb` and `5%` of the
+    heap size.
 
 `canned_acl`::
 
-    The S3 repository supports all http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl[S3 canned ACLs]
-    : `private`, `public-read`, `public-read-write`, `authenticated-read`, `log-delivery-write`,
-    `bucket-owner-read`, `bucket-owner-full-control`. Defaults to `private`.
-    You could specify a canned ACL using the `canned_acl` setting. When the S3 repository
-    creates buckets and objects, it adds the canned ACL into the buckets and objects.
+    The S3 repository supports all
+    http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl[S3
+    canned ACLs]: `private`, `public-read`, `public-read-write`,
+    `authenticated-read`, `log-delivery-write`, `bucket-owner-read`,
+    `bucket-owner-full-control`. Defaults to `private`. You can specify a
+    canned ACL using the `canned_acl` setting. When the S3 repository creates
+    buckets and objects, it adds the canned ACL to the buckets and objects.
 
 `storage_class`::
 
     Sets the S3 storage class for objects stored in the snapshot repository.
-    Values may be `standard`, `reduced_redundancy`, `standard_ia`.
-    Defaults to `standard`. Changing this setting on an existing repository
-    only affects the storage class for newly created objects, resulting in a
-    mixed usage of storage classes. Additionally, S3 Lifecycle Policies can
-    be used to manage the storage class of existing objects.
-    Due to the extra complexity with the Glacier class lifecycle, it is not
-    currently supported by the plugin. For more information about the
-    different classes, see http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html[AWS Storage Classes Guide]
-
-NOTE: The option of defining client settings in the repository settings as documented below is considered deprecated:
-
-In addition to the above settings, you may also specify all non-secure client settings in the repository settings.
-In this case, the client settings found in the repository settings will be merged with those of the named client used by the repository.
-Conflicts between client and repository settings are resolved by the repository settings taking precedence over client settings.
+    Values may be `standard`, `reduced_redundancy`, or `standard_ia`. Defaults
+    to `standard`. Changing this setting on an existing repository only affects
+    the storage class for newly created objects, resulting in a mixed usage of
+    storage classes. Additionally, S3 Lifecycle Policies can be used to manage
+    the storage class of existing objects. Due to the extra complexity of the
+    Glacier class lifecycle, it is not currently supported by the plugin. For
+    more information about the different classes, see the
+    http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html[AWS
+    Storage Classes Guide].
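+
+For example, the following request creates a repository combining several of
+the settings described above; the bucket name and all values are illustrative
+only:
+
+[source,js]
+----
+PUT _snapshot/my_s3_repository
+{
+  "type": "s3",
+  "settings": {
+    "bucket": "my_bucket_name",
+    "base_path": "snapshots/production",
+    "chunk_size": "500mb",
+    "buffer_size": "200mb",
+    "storage_class": "standard_ia",
+    "canned_acl": "bucket-owner-full-control"
+  }
+}
+----
+// CONSOLE
+// TEST[skip:we don't have S3 set up while testing this]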
+
+NOTE: The option of defining client settings in the repository settings as
+documented below is considered deprecated, and will be removed in a future
+version.
+
+In addition to the above settings, you may also specify all non-secure client
+settings in the repository settings. In this case, the client settings found in
+the repository settings will be merged with those of the named client used by
+the repository. Conflicts between client and repository settings are resolved
+by the repository settings taking precedence over client settings.
 
 For example:
 
@@ -244,16 +281,19 @@ PUT _snapshot/my_s3_repository
 // CONSOLE
 // TEST[skip:we don't have s3 set up while testing this]
 
-This sets up a repository that uses all client settings from the client `my_client_named` except for the `endpoint` that is overridden
-to `my.s3.endpoint` by the repository settings.
+This sets up a repository that uses all client settings from the client
+`my_client_name` except for the `endpoint` that is overridden to
+`my.s3.endpoint` by the repository settings.
 
 [[repository-s3-permissions]]
 ===== Recommended S3 Permissions
 
-In order to restrict the Elasticsearch snapshot process to the minimum required resources, we recommend using Amazon
-IAM in conjunction with pre-existing S3 buckets. Here is an example policy which will allow the snapshot access to an
- S3 bucket named "snaps.example.com". This may be configured through the AWS IAM console, by creating a Custom Policy,
- and using a Policy Document similar to this (changing snaps.example.com to your bucket name).
+In order to restrict the Elasticsearch snapshot process to the minimum required
+resources, we recommend using Amazon IAM in conjunction with pre-existing S3
+buckets. Here is an example policy which will allow snapshot access to an S3
+bucket named `snaps.example.com`. This may be configured through the AWS IAM
+console, by creating a Custom Policy, and using a Policy Document similar to
+this (changing `snaps.example.com` to your bucket name).
 
 [source,js]
 ----
@@ -290,7 +330,8 @@ IAM in conjunction with pre-existing S3 buckets. Here is an example policy which
 ----
 // NOTCONSOLE
 
-You may further restrict the permissions by specifying a prefix within the bucket, in this example, named "foo".
+You may further restrict the permissions by specifying a prefix within the
+bucket, in this example named `foo`.
 
 [source,js]
 ----
@@ -334,16 +375,23 @@ You may further restrict the permissions by specifying a prefix within the bucke
 ----
 // NOTCONSOLE
 
-The bucket needs to exist to register a repository for snapshots. If you did not create the bucket then the repository
-registration will fail.
+The bucket needs to exist to register a repository for snapshots. If you did not
+create the bucket then the repository registration will fail.
 
-Note: Starting in version 7.0, all bucket operations are using the path style access pattern. In previous versions the decision to use virtual hosted style
-or path style access was made by the AWS Java SDK.
+NOTE: Starting in version 7.0, all bucket operations use the path style access
+pattern. In previous versions the decision to use virtual hosted style or path
+style access was made by the AWS Java SDK.
 
 [[repository-s3-aws-vpc]]
 [float]
 ==== AWS VPC Bandwidth Settings
 
-AWS instances resolve S3 endpoints to a public IP. If the Elasticsearch instances reside in a private subnet in an AWS VPC then all traffic to S3 will go through that VPC's NAT instance. If your VPC's NAT instance is a smaller instance size (e.g. a t1.micro) or is handling a high volume of network traffic your bandwidth to S3 may be limited by that NAT instance's networking bandwidth limitations.
+AWS instances resolve S3 endpoints to a public IP. If the Elasticsearch
+instances reside in a private subnet in an AWS VPC then all traffic to S3 will
+go through that VPC's NAT instance. If your VPC's NAT instance is a smaller
+instance size (e.g. a t1.micro) or is handling a high volume of network
+traffic, your bandwidth to S3 may be limited by that NAT instance's networking
+bandwidth limitations.
 
-Instances residing in a public subnet in an AWS VPC will connect to S3 via the VPC's internet gateway and not be bandwidth limited by the VPC's NAT instance.
+Instances residing in a public subnet in an AWS VPC will connect to S3 via the
+VPC's internet gateway and not be bandwidth limited by the VPC's NAT instance.