[[repository-s3]]
=== S3 Repository Plugin

The S3 repository plugin adds support for using S3 as a repository for
{ref}/modules-snapshots.html[Snapshot/Restore].

*If you are looking for a hosted solution of Elasticsearch on AWS, please visit http://www.elastic.co/cloud.*

:plugin_name: repository-s3
include::install_remove.asciidoc[]

[[repository-s3-usage]]
==== Getting started with AWS

The plugin provides a repository type named `s3` which may be used when creating a repository.
The repository defaults to using
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html[ECS IAM Role] or
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html[EC2 IAM Role]
credentials for authentication. The only mandatory setting is the bucket name:

[source,js]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket"
  }
}
----
// CONSOLE
// TEST[skip:we don't have s3 setup while testing this]
[[repository-s3-client]]
==== Client Settings

The client used to connect to S3 has a number of settings available. Client setting names are of
the form `s3.client.CLIENT_NAME.SETTING_NAME`. The default client name, which is looked up by
an `s3` repository, is called `default`. It can be modified using the
<<repository-s3-repository,repository setting>> `client`. For example:

[source,js]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket",
    "client": "my_alternate_client"
  }
}
----
// CONSOLE
// TEST[skip:we don't have s3 setup while testing this]

Most client settings are specified inside `elasticsearch.yml`, but some are
sensitive and must be stored in the {ref}/secure-settings.html[elasticsearch keystore].

For example, before you start the node, run these commands to add AWS access
key settings to the keystore:

[source,sh]
----
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key
----
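
If you use temporary credentials issued by AWS STS (an assumption about your
deployment, not a plugin requirement), the session token is likewise a secure
setting and is added in the same way:

[source,sh]
----
bin/elasticsearch-keystore add s3.client.default.session_token
----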
*All* client secure settings of this plugin are
{ref}/secure-settings.html#reloadable-secure-settings[reloadable]. After you
reload the settings, the internal `s3` clients, used to transfer the snapshot
contents, will utilize the latest settings from the keystore. Any existing `s3`
repositories, as well as any newly created ones, will pick up the new values
stored in the keystore.

NOTE: In-progress snapshot/restore tasks will not be preempted by a *reload*
of the client's secure settings. The task will complete using the client as it
was built when the operation started.
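
After you update the keystore, the new values can be applied without restarting
any nodes by calling the
{ref}/secure-settings.html#reloadable-secure-settings[reload secure settings API].
A minimal call (assuming the keystore is not password protected) looks like this:

[source,js]
----
POST _nodes/reload_secure_settings
----
// CONSOLE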
The following list describes all the available client settings.
Those that must be stored in the keystore are marked as `Secure` and are *reloadable*.

`access_key`::

    An S3 access key. The `secret_key` setting must also be specified. (Secure)

`secret_key`::

    An S3 secret key. The `access_key` setting must also be specified. (Secure)

`session_token`::

    An S3 session token. The `access_key` and `secret_key` settings must also
    be specified. (Secure)

`endpoint`::

    The S3 service endpoint to connect to. This is normally determined
    automatically by the S3 client based on the bucket location, but it
    can be specified explicitly. See http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region.

`protocol`::

    The protocol to use to connect to S3. Valid values are either `http`
    or `https`. Defaults to `https`.

`proxy.host`::

    The host name of a proxy to connect to S3 through.

`proxy.port`::

    The port of a proxy to connect to S3 through.

`proxy.username`::

    The username to connect to the `proxy.host` with. (Secure)

`proxy.password`::

    The password to connect to the `proxy.host` with. (Secure)

`read_timeout`::

    The socket timeout for connecting to S3. The value should specify the unit. For example,
    a value of `5s` specifies a 5-second timeout. Defaults to 50 seconds.

`max_retries`::

    The number of retries to use when an S3 request fails. Defaults to `3`.

`use_throttle_retries`::

    Whether retries should be throttled (i.e. use backoff). Must be `true` or `false`. Defaults to `true`.
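
As a concrete illustration, the following `elasticsearch.yml` snippet configures
the non-secure settings of a client named `my_alternate_client`; the endpoint,
proxy host, and port are placeholder values for this sketch, not recommendations:

[source,yaml]
----
s3.client.my_alternate_client.endpoint: "s3.eu-central-1.amazonaws.com"
s3.client.my_alternate_client.protocol: https
s3.client.my_alternate_client.proxy.host: "proxy.example.com"
s3.client.my_alternate_client.proxy.port: 8080
s3.client.my_alternate_client.read_timeout: 50s
s3.client.my_alternate_client.max_retries: 3
----

The secure settings for this client go into the keystore under the same prefix,
e.g. `bin/elasticsearch-keystore add s3.client.my_alternate_client.access_key`.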
[[repository-s3-repository]]
==== Repository Settings

The `s3` repository type supports a number of settings to customize how data is stored in S3.
These can be specified when creating the repository. For example:

[source,js]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket_name",
    "another_setting": "setting_value"
  }
}
----
// CONSOLE
// TEST[skip:we don't have s3 set up while testing this]
The following settings are supported:

`bucket`::

    The name of the bucket to be used for snapshots. (Mandatory)

`client`::

    The name of the S3 client to use to connect to S3. Defaults to `default`.

`base_path`::

    Specifies the path within the bucket to the repository data. Defaults to the
    value of `repositories.s3.base_path` or to the root directory if not set.
    Previously, `base_path` could take a leading `/` (forward slash).
    However, this has been deprecated and the `base_path` should now
    omit the leading `/`.

`chunk_size`::

    Big files can be broken down into chunks during snapshotting if needed.
    The chunk size can be specified in bytes or by using size value notation,
    i.e. `1gb`, `10mb`, `5kb`. Defaults to `1gb`.

`compress`::

    When set to `true`, metadata files are stored in compressed format. This
    setting doesn't affect index files that are already compressed by default.
    Defaults to `false`.

`server_side_encryption`::

    When set to `true`, files are encrypted on the server side using the AES256
    algorithm. Defaults to `false`.

`buffer_size`::

    Minimum threshold below which the chunk is uploaded using a single
    request. Beyond this threshold, the S3 repository will use the
    http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html[AWS Multipart Upload API]
    to split the chunk into several parts, each of `buffer_size` length, and
    to upload each part in its own request. Note that setting a buffer
    size lower than `5mb` is not allowed since it will prevent the use of the
    Multipart API and may result in upload errors. It is also not possible to
    set a buffer size greater than `5gb` as that is the maximum upload size
    allowed by S3. Defaults to the minimum of `100mb` and `5%` of the heap size.

`canned_acl`::

    The S3 repository supports all
    http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl[S3 canned ACLs]:
    `private`, `public-read`, `public-read-write`, `authenticated-read`, `log-delivery-write`,
    `bucket-owner-read`, `bucket-owner-full-control`. Defaults to `private`.
    You can specify a canned ACL using the `canned_acl` setting. When the S3 repository
    creates buckets and objects, it adds the canned ACL to the buckets and objects.

`storage_class`::

    Sets the S3 storage class for the backup files. Values may be
    `standard`, `reduced_redundancy`, or `standard_ia`. Defaults to `standard`.
    Due to the extra complexity with the Glacier class lifecycle, it is not
    currently supported by the plugin. For more information about the
    different classes, see the
    http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html[AWS Storage Classes Guide].
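
As an illustrative sketch (the bucket name and base path are placeholders), a
repository combining several of these settings might be registered like this:

[source,js]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket",
    "base_path": "snapshots/production",
    "compress": true,
    "server_side_encryption": true,
    "storage_class": "standard_ia",
    "canned_acl": "bucket-owner-full-control"
  }
}
----
// CONSOLE
// TEST[skip:we don't have s3 set up while testing this]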
[[repository-s3-permissions]]
===== Recommended S3 Permissions

In order to restrict the Elasticsearch snapshot process to the minimum required resources, we recommend using Amazon
IAM in conjunction with pre-existing S3 buckets. Here is an example policy which allows the snapshot process access to an
S3 bucket named "snaps.example.com". This may be configured through the AWS IAM console, by creating a Custom Policy,
and using a Policy Document similar to this (changing snaps.example.com to your bucket name).

[source,js]
----
{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
----
// NOTCONSOLE
You may further restrict the permissions by specifying a prefix within the bucket, in this example named "foo".

[source,js]
----
{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "foo/*"
          ]
        }
      },
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com/foo/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
----
// NOTCONSOLE
The bucket needs to exist before you can register a repository for snapshots. If you did not create the bucket,
the repository registration will fail.
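
Once the bucket exists and the repository has been registered, you can confirm
that the nodes can write to the bucket with the verify repository API, for example:

[source,js]
----
POST _snapshot/my_s3_repository/_verify
----
// CONSOLE
// TEST[skip:we don't have s3 set up while testing this]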
[[repository-s3-aws-vpc]]
[float]
==== AWS VPC Bandwidth Settings

AWS instances resolve S3 endpoints to a public IP. If the Elasticsearch instances reside in a private subnet in an AWS VPC, then all traffic to S3 will go through the VPC's NAT instance. If your VPC's NAT instance is a smaller instance size (e.g. a t1.micro) or is handling a high volume of network traffic, your bandwidth to S3 may be limited by that NAT instance's networking bandwidth limitations.

Instances residing in a public subnet in an AWS VPC will connect to S3 via the VPC's internet gateway and not be bandwidth limited by the VPC's NAT instance.