123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317 |
- [[modules-cross-cluster-search]]
- == {ccs-cap}
- The _{ccs}_ feature allows any node to act as a federated client across
- multiple clusters. A {ccs} node won't join the remote cluster, instead
- it connects to a remote cluster in a light fashion in order to execute
- federated search requests. For details on communication and compatibility
- between different clusters, see <<modules-remote-clusters>>.
- [float]
- === Using {ccs}
- {ccs-cap} requires <<modules-remote-clusters,configuring remote clusters>>.
- [source,js]
- --------------------------------
- PUT _cluster/settings
- {
- "persistent": {
- "cluster": {
- "remote": {
- "cluster_one": {
- "seeds": [
- "127.0.0.1:9300"
- ]
- },
- "cluster_two": {
- "seeds": [
- "127.0.0.1:9301"
- ]
- },
- "cluster_three": {
- "seeds": [
- "127.0.0.1:9302"
- ]
- }
- }
- }
- }
- }
- --------------------------------
- // CONSOLE
- // TEST[setup:host]
- // TEST[s/127.0.0.1:9300/\${transport_host}/]
- To search the `twitter` index on remote cluster `cluster_one` the index name
- must be prefixed with the alias of the remote cluster followed by the `:`
- character:
- [source,js]
- --------------------------------------------------
- GET /cluster_one:twitter/_search
- {
- "query": {
- "match": {
- "user": "kimchy"
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
- // TEST[setup:twitter]
- [source,js]
- --------------------------------------------------
- {
- "took": 150,
- "timed_out": false,
- "_shards": {
- "total": 1,
- "successful": 1,
- "failed": 0,
- "skipped": 0
- },
- "_clusters": {
- "total": 1,
- "successful": 1,
- "skipped": 0
- },
- "hits": {
- "total" : {
- "value": 1,
- "relation": "eq"
- },
- "max_score": 1,
- "hits": [
- {
- "_index": "cluster_one:twitter",
- "_type": "_doc",
- "_id": "0",
- "_score": 1,
- "_source": {
- "user": "kimchy",
- "date": "2009-11-15T14:12:12",
- "message": "trying out Elasticsearch",
- "likes": 0
- }
- }
- ]
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
- // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
- // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
- Indices with the same name on different clusters can also be searched:
- [source,js]
- --------------------------------------------------
- GET /cluster_one:twitter,twitter/_search
- {
- "query": {
- "match": {
- "user": "kimchy"
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
- Search results are disambiguated the same way as the indices are disambiguated in the request.
- Indices with same names are treated as different indices when results are merged. All results
- retrieved from an index located in a remote cluster are prefixed with their corresponding
- cluster alias:
- [source,js]
- --------------------------------------------------
- {
- "took": 150,
- "timed_out": false,
- "num_reduce_phases": 3,
- "_shards": {
- "total": 2,
- "successful": 2,
- "failed": 0,
- "skipped": 0
- },
- "_clusters": {
- "total": 2,
- "successful": 2,
- "skipped": 0
- },
- "hits": {
- "total" : {
- "value": 2,
- "relation": "eq"
- },
- "max_score": 1,
- "hits": [
- {
- "_index": "twitter",
- "_type": "_doc",
- "_id": "0",
- "_score": 2,
- "_source": {
- "user": "kimchy",
- "date": "2009-11-15T14:12:12",
- "message": "trying out Elasticsearch",
- "likes": 0
- }
- },
- {
- "_index": "cluster_one:twitter",
- "_type": "_doc",
- "_id": "0",
- "_score": 1,
- "_source": {
- "user": "kimchy",
- "date": "2009-11-15T14:12:12",
- "message": "trying out Elasticsearch",
- "likes": 0
- }
- }
- ]
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
- // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
- // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
- // TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
- [float]
- === Skipping disconnected clusters
- By default, all remote clusters that are searched via {ccs} need to be
- available when the search request is executed. Otherwise, the whole request
- fails; even if some of the clusters are available, no search results are
- returned. You can use the boolean `skip_unavailable` setting to make remote
- clusters optional. By default, it is set to `false`.
- [source,js]
- --------------------------------
- PUT _cluster/settings
- {
- "persistent": {
- "cluster.remote.cluster_two.skip_unavailable": true <1>
- }
- }
- --------------------------------
- // CONSOLE
- // TEST[continued]
- <1> `cluster_two` is made optional
- [source,js]
- --------------------------------------------------
- GET /cluster_one:twitter,cluster_two:twitter,twitter/_search <1>
- {
- "query": {
- "match": {
- "user": "kimchy"
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
- <1> Search against the `twitter` index in `cluster_one`, `cluster_two` and also locally
- [source,js]
- --------------------------------------------------
- {
- "took": 150,
- "timed_out": false,
- "num_reduce_phases": 3,
- "_shards": {
- "total": 2,
- "successful": 2,
- "failed": 0,
- "skipped": 0
- },
- "_clusters": { <1>
- "total": 3,
- "successful": 2,
- "skipped": 1
- },
- "hits": {
- "total" : {
- "value": 2,
- "relation": "eq"
- },
- "max_score": 1,
- "hits": [
- {
- "_index": "twitter",
- "_type": "_doc",
- "_id": "0",
- "_score": 2,
- "_source": {
- "user": "kimchy",
- "date": "2009-11-15T14:12:12",
- "message": "trying out Elasticsearch",
- "likes": 0
- }
- },
- {
- "_index": "cluster_one:twitter",
- "_type": "_doc",
- "_id": "0",
- "_score": 1,
- "_source": {
- "user": "kimchy",
- "date": "2009-11-15T14:12:12",
- "message": "trying out Elasticsearch",
- "likes": 0
- }
- }
- ]
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
- // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
- // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
- // TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
- <1> The `clusters` section indicates that one cluster was unavailable and got skipped
- [float]
- [[ccs-reduction]]
- === {ccs-cap} reduction phase
- {ccs-cap} (CCS) requests can be executed in two ways:
- - the CCS coordinating node minimizes network round-trips by sending one search
- request to each cluster. Each cluster performs the search independently,
- reducing and fetching results. Once the CCS node has received all the
- responses, it performs another reduction and returns the relevant results back
- to the user. This strategy is beneficial when there is network latency between
- the CCS coordinating node and the remote clusters involved, which is typically
- the case. A single request is sent to each remote cluster, at the cost of
- retrieving `from` + `size` already fetched results. This is the default
- strategy, used whenever possible. In case a scroll is provided, or inner hits
- are requested as part of field collapsing, this strategy is not supported hence
- network round-trips cannot be minimized and the following strategy is used
- instead.
- - the CCS coordinating node sends a <<search-shards,search shards>> request to
- each remote cluster, in order to collect information about their corresponding
- remote indices involved in the search request and the shards where their data
- is located. Once each cluster has responded to such request, the search
- executes as if all shards were part of the same cluster. The coordinating node
- sends one request to each shard involved, each shard executes the query and
- returns its own results which are then reduced (and fetched, depending on the
- <<request-body-search-search-type, search type>>) by the CCS coordinating node.
- This strategy may be beneficial whenever there is very low network latency
- between the CCS coordinating node and the remote clusters involved, as it
- treats all shards the same, at the cost of sending many requests to each remote
- cluster, which is problematic in presence of network latency.
- The <<search-request-body, search API>> supports the `ccs_minimize_roundtrips`
- parameter, which defaults to `true` and can be set to `false` in case
- minimizing network round-trips is not desirable.
|