123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431 |
- [[modules-cross-cluster-search]]
- == Cross Cluster Search
- The _cross cluster search_ feature allows any node to act as a federated client across
- multiple clusters. A cross cluster search node won't join the remote cluster, instead
- it connects to a remote cluster in a light fashion in order to execute
- federated search requests.
- Cross cluster search works by configuring a remote cluster in the cluster state and connecting only to a
- limited number of nodes in the remote cluster. Each remote cluster is referenced by a name and a list of seed nodes.
- When a remote cluster is registered, its cluster state is retrieved from one of the seed nodes so that up to 3
- _gateway nodes_ are selected to be connected to as part of upcoming cross cluster search requests.
- Cross cluster search requests consist of uni-directional connections from the coordinating node to the previously
- selected remote nodes only. It is possible to tag which nodes should be selected through
- node attributes (see <<cross-cluster-search-settings>>).
- Each node in a cluster that has remote clusters configured connects to one or more _gateway nodes_ and uses
- them to federate search requests to the remote cluster.
- [float]
- === Configuring Cross Cluster Search
- Remote clusters can be specified globally using <<cluster-update-settings,cluster settings>>
- (which can be updated dynamically), or local to individual nodes using the
- `elasticsearch.yml` file.
- If a remote cluster is configured via `elasticsearch.yml` only the nodes with
- that configuration will be able to connect to the remote cluster. In other
- words, federated search requests will have to be sent specifically to those
- nodes. Remote clusters set via the <<cluster-update-settings,cluster settings API>>
- will be available on every node in the cluster.
- [WARNING]
- This feature was added as Beta in Elasticsearch `v5.3` with further improvements made in 5.4 and 5.5. It requires gateway eligible nodes to be on `v5.5` onwards.
- The `elasticsearch.yml` config file for a _cross cluster search_ node just needs to list the
- remote clusters that should be connected to, for instance:
- [source,yaml]
- --------------------------------
- search:
- remote:
- cluster_one: <1>
- seeds: 127.0.0.1:9300
- cluster_two: <1>
- seeds: 127.0.0.1:9301
- --------------------------------
- <1> `cluster_one` and `cluster_two` are arbitrary cluster aliases representing the connection to each cluster.
- These names are subsequently used to distinguish between local and remote indices.
- The equivalent example using the <<cluster-update-settings,cluster settings API>>
- to add remote clusters to all nodes in the cluster would look like the
- following:
- [source,js]
- --------------------------------
- PUT _cluster/settings
- {
- "persistent": {
- "search": {
- "remote": {
- "cluster_one": {
- "seeds": [
- "127.0.0.1:9300"
- ]
- },
- "cluster_two": {
- "seeds": [
- "127.0.0.1:9301"
- ]
- },
- "cluster_three": {
- "seeds": [
- "127.0.0.1:9302"
- ]
- }
- }
- }
- }
- }
- --------------------------------
- // CONSOLE
- // TEST[setup:host]
- // TEST[s/127.0.0.1:9300/\${transport_host}/]
- //////////////////////////
- We want to be sure that settings have been updated,
- because we'll use them later.
- [source,js]
- --------------------------------------------------
- {
- "acknowledged" : true,
- "persistent": {
- "search": {
- "remote": {
- "cluster_one": {
- "seeds": [
- "127.0.0.1:9300"
- ]
- },
- "cluster_two": {
- "seeds": [
- "127.0.0.1:9301"
- ]
- },
- "cluster_three": {
- "seeds": [
- "127.0.0.1:9302"
- ]
- }
- }
- }
- },
- "transient" : {}
- }
- --------------------------------------------------
- // TESTRESPONSE[s/127.0.0.1:9300/\${transport_host}/]
- //////////////////////////
- A remote cluster can be deleted from the cluster settings by setting its seeds to `null`:
- [source,js]
- --------------------------------
- PUT _cluster/settings
- {
- "persistent": {
- "search": {
- "remote": {
- "cluster_three": {
- "seeds": null <1>
- }
- }
- }
- }
- }
- --------------------------------
- // CONSOLE
- // TEST[continued]
- <1> `cluster_three` would be removed from the cluster settings, leaving `cluster_one` and `cluster_two` intact.
- //////////////////////////
- We want to be sure that settings have been updated,
- because we'll use them later.
- [source,js]
- --------------------------------------------------
- {
- "acknowledged" : true,
- "persistent" : {},
- "transient" : {}
- }
- --------------------------------------------------
- // TESTRESPONSE
- //////////////////////////
- [float]
- === Using cross cluster search
- To search the `twitter` index on remote cluster `cluster_one` the index name
- must be prefixed with the cluster alias separated by a `:` character:
- [source,js]
- --------------------------------------------------
- GET /cluster_one:twitter/_search
- {
- "query": {
- "match": {
- "user": "kimchy"
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
- // TEST[setup:twitter]
- [source,js]
- --------------------------------------------------
- {
- "took": 150,
- "timed_out": false,
- "_shards": {
- "total": 1,
- "successful": 1,
- "failed": 0,
- "skipped": 0
- },
- "_clusters": {
- "total": 1,
- "successful": 1,
- "skipped": 0
- },
- "hits": {
- "total": 1,
- "max_score": 1,
- "hits": [
- {
- "_index": "cluster_one:twitter",
- "_type": "_doc",
- "_id": "0",
- "_score": 1,
- "_source": {
- "user": "kimchy",
- "date": "2009-11-15T14:12:12",
- "message": "trying out Elasticsearch",
- "likes": 0
- }
- }
- ]
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
- // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
- // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
- Indices can also be searched with the same name on different clusters:
- [source,js]
- --------------------------------------------------
- GET /cluster_one:twitter,twitter/_search
- {
- "query": {
- "match": {
- "user": "kimchy"
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
- Search results are disambiguated the same way as the indices are disambiguated in the request. Even if index names are
- identical these indices will be treated as different indices when results are merged. All results retrieved from a
- remote index
- will be prefixed with their remote cluster name:
- [source,js]
- --------------------------------------------------
- {
- "took": 150,
- "timed_out": false,
- "_shards": {
- "total": 2,
- "successful": 2,
- "failed": 0,
- "skipped": 0
- },
- "_clusters": {
- "total": 2,
- "successful": 2,
- "skipped": 0
- },
- "hits": {
- "total": 2,
- "max_score": 1,
- "hits": [
- {
- "_index": "cluster_one:twitter",
- "_type": "_doc",
- "_id": "0",
- "_score": 1,
- "_source": {
- "user": "kimchy",
- "date": "2009-11-15T14:12:12",
- "message": "trying out Elasticsearch",
- "likes": 0
- }
- },
- {
- "_index": "twitter",
- "_type": "_doc",
- "_id": "0",
- "_score": 2,
- "_source": {
- "user": "kimchy",
- "date": "2009-11-15T14:12:12",
- "message": "trying out Elasticsearch",
- "likes": 0
- }
- }
- ]
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
- // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
- // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
- // TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
- [float]
- === Skipping disconnected clusters
- By default all remote clusters that are searched via Cross Cluster Search need to be available when
- the search request is executed, otherwise the whole request fails and no search results are returned
- despite some of the clusters are available. Remote clusters can be made optional through the
- boolean `skip_unavailable` setting, set to `false` by default.
- [source,js]
- --------------------------------
- PUT _cluster/settings
- {
- "persistent": {
- "search.remote.cluster_two.skip_unavailable": true <1>
- }
- }
- --------------------------------
- // CONSOLE
- // TEST[continued]
- <1> `cluster_two` is made optional
- [source,js]
- --------------------------------------------------
- GET /cluster_one:twitter,cluster_two:twitter,twitter/_search <1>
- {
- "query": {
- "match": {
- "user": "kimchy"
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
- <1> Search against the `twitter` index in `cluster_one`, `cluster_two` and also locally
- [source,js]
- --------------------------------------------------
- {
- "took": 150,
- "timed_out": false,
- "_shards": {
- "total": 2,
- "successful": 2,
- "failed": 0,
- "skipped": 0
- },
- "_clusters": { <1>
- "total": 3,
- "successful": 2,
- "skipped": 1
- },
- "hits": {
- "total": 2,
- "max_score": 1,
- "hits": [
- {
- "_index": "cluster_one:twitter",
- "_type": "_doc",
- "_id": "0",
- "_score": 1,
- "_source": {
- "user": "kimchy",
- "date": "2009-11-15T14:12:12",
- "message": "trying out Elasticsearch",
- "likes": 0
- }
- },
- {
- "_index": "twitter",
- "_type": "_doc",
- "_id": "0",
- "_score": 2,
- "_source": {
- "user": "kimchy",
- "date": "2009-11-15T14:12:12",
- "message": "trying out Elasticsearch",
- "likes": 0
- }
- }
- ]
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
- // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
- // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
- // TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
- <1> The `clusters` section indicates that one cluster was unavailable and got skipped
- [float]
- [[cross-cluster-search-settings]]
- === Cross cluster search settings
- `search.remote.connections_per_cluster`::
- The number of nodes to connect to per remote cluster. The default is `3`.
- `search.remote.initial_connect_timeout`::
- The time to wait for remote connections to be established when the node starts. The default is `30s`.
- `search.remote.node.attr`::
- A node attribute to filter out nodes that are eligible as a gateway node in
- the remote cluster. For instance a node can have a node attribute
- `node.attr.gateway: true` such that only nodes with this attribute will be
- connected to if `search.remote.node.attr` is set to `gateway`.
- `search.remote.connect`::
- By default, any node in the cluster can act as a cross-cluster client and
- connect to remote clusters. The `search.remote.connect` setting can be set
- to `false` (defaults to `true`) to prevent certain nodes from connecting to
- remote clusters. Cross-cluster search requests must be sent to a node that
- is allowed to act as a cross-cluster client.
- `search.remote.${cluster_alias}.skip_unavailable`::
- Per cluster boolean setting that allows to skip specific clusters when no
- nodes belonging to them are available and they are searched as part of a
- cross cluster search request. Default is `false`, meaning that all clusters
- are mandatory by default, but they can selectively be made optional by
- setting this setting to `true`.
- [float]
- [[retrieve-remote-clusters-info]]
- === Retrieving remote clusters info
- The <<cluster-remote-info, Remote Cluster Info API>> allows to retrieve
- information about the configured remote clusters, as well as the remote
- nodes that the Cross Cluster Search node is connected to.
|