[[repository-s3]]
=== S3 Repository Plugin

The S3 repository plugin adds support for using AWS S3 as a repository for
{ref}/modules-snapshots.html[Snapshot/Restore].

*If you are looking for a hosted solution of Elasticsearch on AWS, please visit
https://www.elastic.co/cloud/.*

:plugin_name: repository-s3
include::install_remove.asciidoc[]

[[repository-s3-usage]]
==== Getting Started

The plugin provides a repository type named `s3` which may be used when creating
a repository. The repository defaults to using
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html[ECS
IAM Role] or
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html[EC2
IAM Role] credentials for authentication. The only mandatory setting is the
bucket name:

[source,console]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket"
  }
}
----
// TEST[skip:we don't have S3 set up while testing this]
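
Once the repository is registered, you can confirm that all nodes can reach the
bucket with the verify repository API. For example, against the repository
created above:

[source,console]
----
POST _snapshot/my_s3_repository/_verify
----
// TEST[skip:we don't have S3 set up while testing this]
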
[[repository-s3-client]]
==== Client Settings

The client that you use to connect to S3 has a number of settings available.
The settings have the form `s3.client.CLIENT_NAME.SETTING_NAME`. By default,
`s3` repositories use a client named `default`, but this can be modified using
the <<repository-s3-repository,repository setting>> `client`. For example:

[source,console]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket",
    "client": "my_alternate_client"
  }
}
----
// TEST[skip:we don't have S3 set up while testing this]

Most client settings can be added to the `elasticsearch.yml` configuration file
with the exception of the secure settings, which you add to the {es} keystore.
For more information about creating and updating the {es} keystore, see
{ref}/secure-settings.html[Secure settings].

For example, if you want to use specific credentials to access S3 then run the
following commands to add these credentials to the keystore:

[source,sh]
----
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key
# a session token is optional so the following command may not be needed
bin/elasticsearch-keystore add s3.client.default.session_token
----

If instead you want to use the instance role or container role to access S3
then you should leave these settings unset. You can switch from using specific
credentials back to the default of using the instance role or container role by
removing these settings from the keystore as follows:

[source,sh]
----
bin/elasticsearch-keystore remove s3.client.default.access_key
bin/elasticsearch-keystore remove s3.client.default.secret_key
# a session token is optional so the following command may not be needed
bin/elasticsearch-keystore remove s3.client.default.session_token
----

*All* client secure settings of this plugin are
{ref}/secure-settings.html#reloadable-secure-settings[reloadable]. After you
reload the settings, the internal `s3` clients, used to transfer the snapshot
contents, will utilize the latest settings from the keystore. Any existing `s3`
repositories, as well as any newly created ones, will pick up the new values
stored in the keystore.

NOTE: In-progress snapshot/restore tasks will not be preempted by a *reload* of
the client's secure settings. The task will complete using the client as it was
built when the operation started.
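
For example, after adding or removing credentials in the keystore you can make
every node pick up the change, without a restart, by calling the reload secure
settings API:

[source,console]
----
POST _nodes/reload_secure_settings
----
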
The following list contains the available client settings. Those that must be
stored in the keystore are marked as "secure" and are *reloadable*; the other
settings belong in the `elasticsearch.yml` file.

`access_key` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::

    An S3 access key. If set, the `secret_key` setting must also be specified.
    If unset, the client will use the instance or container role instead.

`secret_key` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::

    An S3 secret key. If set, the `access_key` setting must also be specified.

`session_token` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::

    An S3 session token. If set, the `access_key` and `secret_key` settings
    must also be specified.

`endpoint`::

    The S3 service endpoint to connect to. This defaults to `s3.amazonaws.com`
    but the
    https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region[AWS
    documentation] lists alternative S3 endpoints. If you are using an
    <<repository-s3-compatible-services,S3-compatible service>> then you should
    set this to the service's endpoint.

`protocol`::

    The protocol to use to connect to S3. Valid values are either `http` or
    `https`. Defaults to `https`.

`proxy.host`::

    The host name of a proxy to connect to S3 through.

`proxy.port`::

    The port of a proxy to connect to S3 through.

`proxy.username` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::

    The username to connect to the `proxy.host` with.

`proxy.password` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::

    The password to connect to the `proxy.host` with.

`read_timeout`::

    The socket timeout for connecting to S3. The value should specify the unit.
    For example, a value of `5s` specifies a 5 second timeout. The default value
    is 50 seconds.

`max_retries`::

    The number of retries to use when an S3 request fails. The default value is
    `3`.

`use_throttle_retries`::

    Whether retries should be throttled (i.e. should back off). Must be `true`
    or `false`. Defaults to `true`.

`path_style_access`::

    Whether to force the use of the path style access pattern. If `true`, the
    path style access pattern will be used. If `false`, the access pattern will
    be automatically determined by the AWS Java SDK (see the
    https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#setPathStyleAccessEnabled-java.lang.Boolean-[AWS
    documentation] for details). Defaults to `false`.

[[repository-s3-path-style-deprecation]]
NOTE: In versions `7.0`, `7.1`, `7.2` and `7.3` all bucket operations used the
https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/[now-deprecated]
path style access pattern. If your deployment requires the path style access
pattern then you should set this setting to `true` when upgrading.

`disable_chunked_encoding`::

    Whether chunked encoding should be disabled or not. If `false`, chunked
    encoding is enabled and will be used where appropriate. If `true`, chunked
    encoding is disabled and will not be used, which may mean that snapshot
    operations consume more resources and take longer to complete. It should
    only be set to `true` if you are using a storage service that does not
    support chunked encoding. See the
    https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#disableChunkedEncoding--[AWS
    Java SDK documentation] for details. Defaults to `false`.

`region`::

    Allows specifying the signing region to use. Specifying this setting
    manually should not be necessary for most use cases. Generally, the SDK
    will correctly guess the signing region to use. It should be considered an
    expert-level setting to support S3-compatible APIs that require
    https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html[v4
    signatures] and use a region other than the default `us-east-1`. Defaults
    to the empty string, which means that the SDK will try to automatically
    determine the correct signing region.

`signer_override`::

    Allows specifying the name of the signature algorithm to use for signing
    requests by the S3 client. Specifying this setting should not be necessary
    for most use cases. It should be considered an expert-level setting to
    support S3-compatible APIs that do not support the signing algorithm that
    the SDK automatically determines for them. See the
    https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/ClientConfiguration.html#setSignerOverride-java.lang.String-[AWS
    Java SDK documentation] for details. Defaults to the empty string, which
    means that no signing algorithm override will be used.
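
For example, a client named `my_alternate_client` could be configured in
`elasticsearch.yml` along the following lines. This is only a sketch: the
endpoint and proxy values are illustrative placeholders, and the secure
settings (such as `access_key`) must still go into the keystore rather than
this file.

[source,yaml]
----
s3.client.my_alternate_client.endpoint: "s3.eu-central-1.amazonaws.com"
s3.client.my_alternate_client.protocol: https
s3.client.my_alternate_client.proxy.host: "proxy.example.com"   # placeholder proxy host
s3.client.my_alternate_client.proxy.port: 8080                  # placeholder proxy port
s3.client.my_alternate_client.read_timeout: 50s
s3.client.my_alternate_client.max_retries: 3
----
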
[discrete]
[[repository-s3-compatible-services]]
===== S3-compatible services

There are a number of storage systems that provide an S3-compatible API, and
the `repository-s3` plugin allows you to use these systems in place of AWS S3.
To do so, you should set the `s3.client.CLIENT_NAME.endpoint` setting to the
system's endpoint. This setting accepts IP addresses and hostnames and may
include a port. For example, the endpoint may be `172.17.0.2` or
`172.17.0.2:9000`. You may also need to set `s3.client.CLIENT_NAME.protocol` to
`http` if the endpoint does not support HTTPS.

https://minio.io[Minio] is an example of a storage system that provides an
S3-compatible API. The `repository-s3` plugin allows {es} to work with
Minio-backed repositories as well as repositories stored on AWS S3. Other
S3-compatible storage systems may also work with {es}, but these are not tested
or supported.
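
For example, if you have configured a client named `minio` whose `endpoint`
points at your Minio deployment, registering a repository that uses it might
look like the following sketch (the client and bucket names here are
illustrative):

[source,console]
----
PUT _snapshot/my_minio_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket",
    "client": "minio"
  }
}
----
// TEST[skip:we don't have a Minio deployment while testing this]
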
[[repository-s3-repository]]
==== Repository Settings

The `s3` repository type supports a number of settings to customize how data is
stored in S3. These can be specified when creating the repository. For example:

[source,console]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket_name",
    "another_setting": "setting_value"
  }
}
----
// TEST[skip:we don't have S3 set up while testing this]

The following settings are supported:

`bucket`::

    The name of the bucket to be used for snapshots. (Mandatory)

`client`::

    The name of the <<repository-s3-client,S3 client>> to use to connect to S3.
    Defaults to `default`.

`base_path`::

    Specifies the path to the repository data within the bucket. Defaults to
    the value of `repositories.s3.base_path`, or to the root directory if not
    set. Previously, the `base_path` could take a leading `/` (forward slash).
    However, this has been deprecated and the `base_path` should now omit the
    leading `/`.

`chunk_size`::

    Big files can be broken down into chunks during snapshotting if needed.
    Specify the chunk size as a value and unit, for example:
    `1GB`, `10MB`, `5KB`, `500B`. Defaults to `1GB`.

`compress`::

    When set to `true` metadata files are stored in compressed format. This
    setting doesn't affect index files that are already compressed by default.
    Defaults to `true`.

include::repository-shared-settings.asciidoc[]

`server_side_encryption`::

    When set to `true` files are encrypted on the server side using the AES256
    algorithm. Defaults to `false`.

`buffer_size`::

    Minimum threshold below which the chunk is uploaded using a single request.
    Beyond this threshold, the S3 repository will use the
    https://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html[AWS
    Multipart Upload API] to split the chunk into several parts, each of
    `buffer_size` length, and to upload each part in its own request. Note that
    setting a buffer size lower than `5mb` is not allowed since it will prevent
    the use of the Multipart API and may result in upload errors. It is also not
    possible to set a buffer size greater than `5gb` as it is the maximum upload
    size allowed by S3. Defaults to the minimum between `100mb` and `5%` of the
    heap size.

`canned_acl`::

    The S3 repository supports all
    https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl[S3
    canned ACLs]: `private`, `public-read`, `public-read-write`,
    `authenticated-read`, `log-delivery-write`, `bucket-owner-read`,
    `bucket-owner-full-control`. Defaults to `private`. You can specify a
    canned ACL using the `canned_acl` setting. When the S3 repository creates
    buckets and objects, it adds the canned ACL to the buckets and objects.

`storage_class`::

    Sets the S3 storage class for objects stored in the snapshot repository.
    Values may be `standard`, `reduced_redundancy`, `standard_ia`, `onezone_ia`
    and `intelligent_tiering`. Defaults to `standard`.
    Changing this setting on an existing repository only affects the
    storage class for newly created objects, resulting in a mixed usage of
    storage classes. Additionally, S3 Lifecycle Policies can be used to manage
    the storage class of existing objects. Due to the extra complexity with the
    Glacier class lifecycle, it is not currently supported by the plugin. For
    more information about the different classes, see the
    https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html[AWS
    Storage Classes Guide].
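
Putting several of these settings together, a repository registration might
look like the following sketch (all values are illustrative):

[source,console]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket_name",
    "base_path": "snapshots/production",
    "chunk_size": "1GB",
    "compress": true,
    "server_side_encryption": true,
    "storage_class": "standard_ia",
    "canned_acl": "bucket-owner-full-control"
  }
}
----
// TEST[skip:we don't have S3 set up while testing this]
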
NOTE: The option of defining client settings in the repository settings as
documented below is considered deprecated, and will be removed in a future
version.

In addition to the above settings, you may also specify all non-secure client
settings in the repository settings. In this case, the client settings found in
the repository settings will be merged with those of the named client used by
the repository. Conflicts between client and repository settings are resolved
by the repository settings taking precedence over client settings.

For example:

[source,console]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "client": "my_client_name",
    "bucket": "my_bucket_name",
    "endpoint": "my.s3.endpoint"
  }
}
----
// TEST[skip:we don't have S3 set up while testing this]

This sets up a repository that uses all client settings from the client
`my_client_name` except for the `endpoint` that is overridden to
`my.s3.endpoint` by the repository settings.

[[repository-s3-permissions]]
===== Recommended S3 Permissions

In order to restrict the Elasticsearch snapshot process to the minimum required
resources, we recommend using Amazon IAM in conjunction with pre-existing S3
buckets. Here is an example policy which will allow the snapshot access to an S3
bucket named "snaps.example.com". This may be configured through the AWS IAM
console, by creating a Custom Policy, and using a Policy Document similar to
this (changing snaps.example.com to your bucket name).

[source,js]
----
{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
----
// NOTCONSOLE

You may further restrict the permissions by specifying a prefix within the
bucket, in this example, named "foo".

[source,js]
----
{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "foo/*"
          ]
        }
      },
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com/foo/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
----
// NOTCONSOLE
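
If you use a prefix-restricted policy like the one above, the repository's
`base_path` must fall under that prefix for snapshot operations to be
authorized. For example, a repository matching the "foo" prefix might be
registered as follows (a sketch; the bucket name is the one used in the
example policy):

[source,console]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "snaps.example.com",
    "base_path": "foo"
  }
}
----
// TEST[skip:we don't have S3 set up while testing this]
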
The bucket needs to exist to register a repository for snapshots. If you did not
create the bucket then the repository registration will fail.

[[repository-s3-aws-vpc]]
[discrete]
==== AWS VPC Bandwidth Settings

AWS instances resolve S3 endpoints to a public IP. If the Elasticsearch
instances reside in a private subnet in an AWS VPC then all traffic to S3 will
go through that VPC's NAT instance. If your VPC's NAT instance is a smaller
instance size (e.g. a t1.micro) or is handling a high volume of network
traffic, your bandwidth to S3 may be limited by that NAT instance's networking
bandwidth limitations.

Instances residing in a public subnet in an AWS VPC will connect to S3 via the
VPC's internet gateway and not be bandwidth limited by the VPC's NAT instance.