123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277 |
- [[repository-gcs]]
- === Google Cloud Storage repository
- You can use the https://cloud.google.com/storage/[Google Cloud Storage]
- service as a repository for {ref}/modules-snapshots.html[Snapshot/Restore].
- [[repository-gcs-usage]]
- ==== Getting started
- This repository type uses the https://github.com/GoogleCloudPlatform/google-cloud-java/tree/master/google-cloud-clients/google-cloud-storage[Google Cloud Java Client for Storage]
- to connect to the Storage service. If you are using
- https://cloud.google.com/storage/[Google Cloud Storage] for the first time, you
- must connect to the https://console.cloud.google.com/[Google Cloud Platform Console]
- and create a new project. After your project is created, you must enable the
- Cloud Storage Service for your project.
- [[repository-gcs-creating-bucket]]
- ===== Creating a bucket
- The Google Cloud Storage service uses the concept of a
- https://cloud.google.com/storage/docs/key-terms[bucket] as a container for all
- the data. Buckets are usually created using the
- https://console.cloud.google.com/[Google Cloud Platform Console]. This
- repository type does not automatically create buckets.
- To create a new bucket:
- 1. Connect to the https://console.cloud.google.com/[Google Cloud Platform Console].
- 2. Select your project.
- 3. Go to the https://console.cloud.google.com/storage/browser[Storage Browser].
- 4. Click the *Create Bucket* button.
- 5. Enter the name of the new bucket.
- 6. Select a storage class.
- 7. Select a location.
- 8. Click the *Create* button.
- For more detailed instructions, see the
- https://cloud.google.com/storage/docs/quickstart-console#create_a_bucket[Google Cloud documentation].
- [[repository-gcs-service-authentication]]
- ===== Service authentication
- The repository must authenticate the requests it makes to the Google Cloud Storage
- service. It is common for Google client libraries to employ a strategy named https://cloud.google.com/docs/authentication/production#providing_credentials_to_your_application[application default credentials].
- However, that strategy is only **partially supported** by Elasticsearch. The
- repository operates under the Elasticsearch process, which runs with the security
- manager enabled. The security manager obstructs the "automatic" credential discovery
- when the environment variable `GOOGLE_APPLICATION_CREDENTIALS` is used to point to a
- local file on disk. It can, however, retrieve the service account that is attached to
- the resource that is running Elasticsearch, or fall back to the default service
- account that Compute Engine, Kubernetes Engine or App Engine provide.
- Alternatively, you must configure <<repository-gcs-using-service-account,service account>>
- credentials if you are using an environment that does not support automatic
- credential discovery.
- [[repository-gcs-using-service-account]]
- ===== Using a service account
- You have to obtain and provide https://cloud.google.com/iam/docs/overview#service_account[service account credentials]
- manually.
- For detailed information about generating JSON service account files, see the https://cloud.google.com/storage/docs/authentication?hl=en#service_accounts[Google Cloud documentation].
- Note that the PKCS12 format is not supported by this repository type.
- Here is a summary of the steps:
- 1. Connect to the https://console.cloud.google.com/[Google Cloud Platform Console].
- 2. Select your project.
- 3. Select the https://console.cloud.google.com/iam-admin/serviceaccounts[Service Accounts] tab.
- 4. Click *Create service account*.
- 5. After the account is created, select it and go to *Keys*.
- 6. Select *Add Key* and then *Create new key*.
- 7. Select Key Type *JSON* as P12 is unsupported.
- A JSON service account file looks like this:
- [source,js]
- ----
- {
- "type": "service_account",
- "project_id": "your-project-id",
- "private_key_id": "...",
- "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
- "client_email": "service-account-for-your-repository@your-project-id.iam.gserviceaccount.com",
- "client_id": "...",
- "auth_uri": "https://accounts.google.com/o/oauth2/auth",
- "token_uri": "https://accounts.google.com/o/oauth2/token",
- "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
- "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/your-bucket@your-project-id.iam.gserviceaccount.com"
- }
- ----
- // NOTCONSOLE
- To provide this file to the repository, it must be stored in the {ref}/secure-settings.html[Elasticsearch keystore]. You must
- add a `file` setting with the name `gcs.client.NAME.credentials_file` using the `add-file` subcommand.
- `NAME` is the name of the client configuration for the repository. The implicit client
- name is `default`, but a different client name can be specified in the
- repository settings with the `client` key.
- NOTE: Passing the file path via the GOOGLE_APPLICATION_CREDENTIALS environment
- variable is **not** supported.
- For example, if you added a `gcs.client.my_alternate_client.credentials_file`
- setting in the keystore, you can configure a repository to use those credentials
- like this:
- [source,console]
- ----
- PUT _snapshot/my_gcs_repository
- {
- "type": "gcs",
- "settings": {
- "bucket": "my_bucket",
- "client": "my_alternate_client"
- }
- }
- ----
- // TEST[skip:we don't have gcs setup while testing this]
- The `credentials_file` settings are {ref}/secure-settings.html#reloadable-secure-settings[reloadable].
- After you reload the settings, the internal `gcs` clients, which are used to
- transfer the snapshot contents, utilize the latest settings from the keystore.
- NOTE: Snapshot or restore jobs that are in progress are not preempted by a *reload*
- of the client's `credentials_file` settings. They complete using the client as
- it was built when the operation started.
- [[repository-gcs-client]]
- ==== Client settings
- The client used to connect to Google Cloud Storage has a number of settings available.
- Client setting names are of the form `gcs.client.CLIENT_NAME.SETTING_NAME` and are specified
- inside `elasticsearch.yml`. The default client name looked up by a `gcs` repository is
- called `default`, but can be customized with the repository setting `client`.
- For example:
- [source,console]
- ----
- PUT _snapshot/my_gcs_repository
- {
- "type": "gcs",
- "settings": {
- "bucket": "my_bucket",
- "client": "my_alternate_client"
- }
- }
- ----
- // TEST[skip:we don't have gcs setup while testing this]
- Some settings are sensitive and must be stored in the
- {ref}/secure-settings.html[Elasticsearch keystore]. This is the case for the service account file:
- [source,sh]
- ----
- bin/elasticsearch-keystore add-file gcs.client.default.credentials_file /path/service-account.json
- ----
- The following are the available client settings. Those that must be stored in the keystore
- are marked as `Secure`.
- `credentials_file` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::
- The service account file that is used to authenticate to the Google Cloud Storage service.
- `endpoint`::
- The Google Cloud Storage service endpoint to connect to. This will be automatically
- determined by the Google Cloud Storage client but can be specified explicitly.
- `connect_timeout`::
- The timeout to establish a connection to the Google Cloud Storage service. The value should
- specify the unit. For example, a value of `5s` specifies a 5 second timeout. The value of `-1`
- corresponds to an infinite timeout. The default value is 20 seconds.
- `read_timeout`::
- The timeout to read data from an established connection. The value should
- specify the unit. For example, a value of `5s` specifies a 5 second timeout. The value of `-1`
- corresponds to an infinite timeout. The default value is 20 seconds.
- `application_name`::
- Name used by the client when it uses the Google Cloud Storage service. Setting
- a custom name can be useful to authenticate your cluster when requests
- statistics are logged in the Google Cloud Platform. Default to `repository-gcs`
- `project_id`::
- The Google Cloud project id. This will be automatically inferred from the credentials file but
- can be specified explicitly. For example, it can be used to switch between projects when the
- same credentials are usable for both the production and the development projects.
- `proxy.host`::
- Host name of a proxy to connect to the Google Cloud Storage through.
- `proxy.port`::
- Port of a proxy to connect to the Google Cloud Storage through.
- `proxy.type`::
- Proxy type for the client. Supported values are `direct` (no proxy),
- `http`, and `socks`. Defaults to `direct`.
- [[repository-gcs-repository]]
- ==== Repository settings
- The `gcs` repository type supports a number of settings to customize how data
- is stored in Google Cloud Storage.
- These can be specified when creating the repository. For example:
- [source,console]
- ----
- PUT _snapshot/my_gcs_repository
- {
- "type": "gcs",
- "settings": {
- "bucket": "my_other_bucket",
- "base_path": "dev"
- }
- }
- ----
- // TEST[skip:we don't have gcs set up while testing this]
- The following settings are supported:
- `bucket`::
- The name of the bucket to be used for snapshots. (Mandatory)
- `client`::
- The name of the client to use to connect to Google Cloud Storage.
- Defaults to `default`.
- `base_path`::
- Specifies the path within bucket to repository data. Defaults to
- the root of the bucket.
- +
- NOTE: Don't set `base_path` when configuring a snapshot repository for {ECE}.
- {ECE} automatically generates the `base_path` for each deployment so that
- multiple deployments may share the same bucket.
- `chunk_size`::
- Big files can be broken down into multiple smaller blobs in the blob store during snapshotting.
- It is not recommended to change this value from its default unless there is an explicit reason for limiting the
- size of blobs in the repository. Setting a value lower than the default can result in an increased number of API
- calls to the Google Cloud Storage Service during snapshot create as well as restore operations compared to using
- the default value and thus make both operations slower as well as more costly.
- Specify the chunk size as a value and unit, for example:
- `10MB`, `5KB`, `500B`. Defaults to the maximum size of a blob in the Google Cloud Storage Service which is `5TB`.
- `compress`::
- When set to `true` metadata files are stored in compressed format. This
- setting doesn't affect index files that are already compressed by default.
- Defaults to `true`.
- include::repository-shared-settings.asciidoc[]
- `application_name`::
- deprecated:[6.3.0, "This setting is now defined in the <<repository-gcs-client, client settings>>."]
- Name used by the client when it uses the Google Cloud Storage service.
- [[repository-gcs-bucket-permission]]
- ===== Recommended bucket permission
- The service account used to access the bucket must have the "Writer" access to the bucket:
- 1. Connect to the https://console.cloud.google.com/[Google Cloud Platform Console].
- 2. Select your project.
- 3. Go to the https://console.cloud.google.com/storage/browser[Storage Browser].
- 4. Select the bucket and "Edit bucket permission".
- 5. The service account must be configured as a "User" with "Writer" access.
|