[[modules-cross-cluster-search]]
== Cross Cluster Search

The _cross cluster search_ feature allows any node to act as a federated client across
multiple clusters. A cross cluster search node won't join the remote cluster; instead,
it connects to remote clusters in a lightweight fashion in order to execute
federated search requests.

[float]
=== Using cross cluster search

Cross-cluster search requires <<modules-remote-clusters,configuring remote clusters>>.

[source,js]
--------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": [
            "127.0.0.1:9300"
          ]
        },
        "cluster_two": {
          "seeds": [
            "127.0.0.1:9301"
          ]
        },
        "cluster_three": {
          "seeds": [
            "127.0.0.1:9302"
          ]
        }
      }
    }
  }
}
--------------------------------
// CONSOLE
// TEST[setup:host]
// TEST[s/127.0.0.1:9300/\${transport_host}/]

To search the `twitter` index on remote cluster `cluster_one`, the index name
must be prefixed with the cluster alias, separated by a `:` character:

[source,js]
--------------------------------------------------
GET /cluster_one:twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
// TEST[setup:twitter]

[source,js]
--------------------------------------------------
{
  "took": 150,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 1,
    "successful": 1,
    "skipped": 0
  },
  "hits": {
    "total" : {
        "value": 1,
        "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]

Indices can also be searched with the same name on different clusters:

[source,js]
--------------------------------------------------
GET /cluster_one:twitter,twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
Search results are disambiguated the same way as the indices are disambiguated in the
request. Even if the index names are identical, these indices are treated as different
indices when results are merged. All results retrieved from a remote index are prefixed
with their remote cluster name:

[source,js]
--------------------------------------------------
{
  "took": 150,
  "timed_out": false,
  "num_reduce_phases": 3,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 2,
    "successful": 2,
    "skipped": 0
  },
  "hits": {
    "total" : {
        "value": 2,
        "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 2,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]

[float]
=== Skipping disconnected clusters

By default, all remote clusters that are searched via cross cluster search need to be
available when the search request is executed, otherwise the whole request fails and no
search results are returned, even if some of the clusters are available. Remote clusters
can be made optional through the boolean `skip_unavailable` setting, which is set to
`false` by default.

[source,js]
--------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.cluster_two.skip_unavailable": true <1>
  }
}
--------------------------------
// CONSOLE
// TEST[continued]

<1> `cluster_two` is made optional

[source,js]
--------------------------------------------------
GET /cluster_one:twitter,cluster_two:twitter,twitter/_search <1>
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

<1> Search against the `twitter` index in `cluster_one`, `cluster_two` and also locally

[source,js]
--------------------------------------------------
{
  "took": 150,
  "timed_out": false,
  "num_reduce_phases": 3,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": { <1>
    "total": 3,
    "successful": 2,
    "skipped": 1
  },
  "hits": {
    "total" : {
        "value": 2,
        "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 2,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]

<1> The `_clusters` section indicates that one cluster was unavailable and got skipped
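Whether a remote cluster is currently connected, and whether it is marked as optional,
can be checked with the remote cluster info API. The response shown here is an
abbreviated sketch for the `cluster_two` configuration above; additional fields are
omitted and actual values depend on the state of the cluster:

[source,js]
--------------------------------
GET /_remote/info
--------------------------------
// CONSOLE
// TEST[continued]

[source,js]
--------------------------------
{
  "cluster_two": {
    "seeds": ["127.0.0.1:9301"],
    "connected": true,
    "num_nodes_connected": 1,
    "skip_unavailable": true
  }
}
--------------------------------
// NOTCONSOLE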
TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]<1> The `clusters` section indicates that one cluster was unavailable and got skipped[float][[ccs-reduction]]=== CCS reduction phaseCross-cluster search requests can be executed in two ways:- the CCS coordinating node minimizes network round-trips by sending one searchrequest to each cluster. Each cluster performs the search independently,reducing and fetching results. Once the CCS node has received all theresponses, it performs another reduction and returns the relevant results backto the user. This strategy is beneficial when there is network latency betweenthe CCS coordinating node and the remote clusters involved, which is typicallythe case. A single request is sent to each remote cluster, at the cost ofretrieving `from` + `size` already fetched results. This is the defaultstrategy, used whenever possible. In case a scroll is provided, or inner hitsare requested as part of field collapsing, this strategy is not supported hencenetwork round-trips cannot be minimized and the following strategy is usedinstead.- the CCS coordinating node sends a <<search-shards,search shards>> request toeach remote cluster, in order to collect information about their correspondingremote indices involved in the search request and the shards where their datais located. Once each cluster has responded to such request, the searchexecutes as if all shards were part of the same cluster. The coordinating nodesends one request to each shard involved, each shard executes the query andreturns its own results which are then reduced (and fetched, depending on the<<search-request-search-type, search type>>) by the CCS coordinating node.This strategy may be beneficial whenever there is very low network latencybetween the CCS coordinating node and the remote clusters involved, as ittreats all shards the same, at the cost of sending many requests to each remotecluster, which is problematic in presence of network latency.The <<search-request-body, search API>> supports the `ccs_minimize_roundtrips`parameter, which defaults to `true` and can be set to `false` in caseminimizing network round-trips is not desirable.Note that all the communication between the nodes, regardless of which clusterthey belong to and the selected reduce mode, happens through the<<modules-transport,transport layer>>.
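For instance, the following request forces the second strategy by disabling round-trip
minimization. This is an illustrative request against the clusters configured above; the
parameter is passed on the URL and the request body is unchanged:

[source,js]
--------------------------------------------------
GET /cluster_one:twitter,twitter/_search?ccs_minimize_roundtrips=false
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]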