cross-cluster-search.asciidoc 8.6 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317
  1. [[modules-cross-cluster-search]]
  2. == {ccs-cap}
  3. The _{ccs}_ feature allows any node to act as a federated client across
  4. multiple clusters. A {ccs} node won't join the remote cluster, instead
  5. it connects to a remote cluster in a light fashion in order to execute
  6. federated search requests. For details on communication and compatibility
  7. between different clusters, see <<modules-remote-clusters>>.
  8. [float]
  9. === Using {ccs}
  10. {ccs-cap} requires <<modules-remote-clusters,configuring remote clusters>>.
  11. [source,js]
  12. --------------------------------
  13. PUT _cluster/settings
  14. {
  15. "persistent": {
  16. "cluster": {
  17. "remote": {
  18. "cluster_one": {
  19. "seeds": [
  20. "127.0.0.1:9300"
  21. ]
  22. },
  23. "cluster_two": {
  24. "seeds": [
  25. "127.0.0.1:9301"
  26. ]
  27. },
  28. "cluster_three": {
  29. "seeds": [
  30. "127.0.0.1:9302"
  31. ]
  32. }
  33. }
  34. }
  35. }
  36. }
  37. --------------------------------
  38. // CONSOLE
  39. // TEST[setup:host]
  40. // TEST[s/127.0.0.1:9300/\${transport_host}/]
  41. To search the `twitter` index on remote cluster `cluster_one` the index name
  42. must be prefixed with the alias of the remote cluster followed by the `:`
  43. character:
  44. [source,js]
  45. --------------------------------------------------
  46. GET /cluster_one:twitter/_search
  47. {
  48. "query": {
  49. "match": {
  50. "user": "kimchy"
  51. }
  52. }
  53. }
  54. --------------------------------------------------
  55. // CONSOLE
  56. // TEST[continued]
  57. // TEST[setup:twitter]
  58. [source,js]
  59. --------------------------------------------------
  60. {
  61. "took": 150,
  62. "timed_out": false,
  63. "_shards": {
  64. "total": 1,
  65. "successful": 1,
  66. "failed": 0,
  67. "skipped": 0
  68. },
  69. "_clusters": {
  70. "total": 1,
  71. "successful": 1,
  72. "skipped": 0
  73. },
  74. "hits": {
  75. "total" : {
  76. "value": 1,
  77. "relation": "eq"
  78. },
  79. "max_score": 1,
  80. "hits": [
  81. {
  82. "_index": "cluster_one:twitter",
  83. "_type": "_doc",
  84. "_id": "0",
  85. "_score": 1,
  86. "_source": {
  87. "user": "kimchy",
  88. "date": "2009-11-15T14:12:12",
  89. "message": "trying out Elasticsearch",
  90. "likes": 0
  91. }
  92. }
  93. ]
  94. }
  95. }
  96. --------------------------------------------------
  97. // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
  98. // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
  99. // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
  100. Indices with the same name on different clusters can also be searched:
  101. [source,js]
  102. --------------------------------------------------
  103. GET /cluster_one:twitter,twitter/_search
  104. {
  105. "query": {
  106. "match": {
  107. "user": "kimchy"
  108. }
  109. }
  110. }
  111. --------------------------------------------------
  112. // CONSOLE
  113. // TEST[continued]
  114. Search results are disambiguated the same way as the indices are disambiguated in the request.
  115. Indices with same names are treated as different indices when results are merged. All results
  116. retrieved from an index located in a remote cluster are prefixed with their corresponding
  117. cluster alias:
  118. [source,js]
  119. --------------------------------------------------
  120. {
  121. "took": 150,
  122. "timed_out": false,
  123. "num_reduce_phases": 3,
  124. "_shards": {
  125. "total": 2,
  126. "successful": 2,
  127. "failed": 0,
  128. "skipped": 0
  129. },
  130. "_clusters": {
  131. "total": 2,
  132. "successful": 2,
  133. "skipped": 0
  134. },
  135. "hits": {
  136. "total" : {
  137. "value": 2,
  138. "relation": "eq"
  139. },
  140. "max_score": 1,
  141. "hits": [
  142. {
  143. "_index": "twitter",
  144. "_type": "_doc",
  145. "_id": "0",
  146. "_score": 2,
  147. "_source": {
  148. "user": "kimchy",
  149. "date": "2009-11-15T14:12:12",
  150. "message": "trying out Elasticsearch",
  151. "likes": 0
  152. }
  153. },
  154. {
  155. "_index": "cluster_one:twitter",
  156. "_type": "_doc",
  157. "_id": "0",
  158. "_score": 1,
  159. "_source": {
  160. "user": "kimchy",
  161. "date": "2009-11-15T14:12:12",
  162. "message": "trying out Elasticsearch",
  163. "likes": 0
  164. }
  165. }
  166. ]
  167. }
  168. }
  169. --------------------------------------------------
  170. // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
  171. // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
  172. // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
  173. // TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
  174. [float]
  175. === Skipping disconnected clusters
  176. By default, all remote clusters that are searched via {ccs} need to be
  177. available when the search request is executed. Otherwise, the whole request
  178. fails; even if some of the clusters are available, no search results are
  179. returned. You can use the boolean `skip_unavailable` setting to make remote
  180. clusters optional. By default, it is set to `false`.
  181. [source,js]
  182. --------------------------------
  183. PUT _cluster/settings
  184. {
  185. "persistent": {
  186. "cluster.remote.cluster_two.skip_unavailable": true <1>
  187. }
  188. }
  189. --------------------------------
  190. // CONSOLE
  191. // TEST[continued]
  192. <1> `cluster_two` is made optional
  193. [source,js]
  194. --------------------------------------------------
  195. GET /cluster_one:twitter,cluster_two:twitter,twitter/_search <1>
  196. {
  197. "query": {
  198. "match": {
  199. "user": "kimchy"
  200. }
  201. }
  202. }
  203. --------------------------------------------------
  204. // CONSOLE
  205. // TEST[continued]
  206. <1> Search against the `twitter` index in `cluster_one`, `cluster_two` and also locally
  207. [source,js]
  208. --------------------------------------------------
  209. {
  210. "took": 150,
  211. "timed_out": false,
  212. "num_reduce_phases": 3,
  213. "_shards": {
  214. "total": 2,
  215. "successful": 2,
  216. "failed": 0,
  217. "skipped": 0
  218. },
  219. "_clusters": { <1>
  220. "total": 3,
  221. "successful": 2,
  222. "skipped": 1
  223. },
  224. "hits": {
  225. "total" : {
  226. "value": 2,
  227. "relation": "eq"
  228. },
  229. "max_score": 1,
  230. "hits": [
  231. {
  232. "_index": "twitter",
  233. "_type": "_doc",
  234. "_id": "0",
  235. "_score": 2,
  236. "_source": {
  237. "user": "kimchy",
  238. "date": "2009-11-15T14:12:12",
  239. "message": "trying out Elasticsearch",
  240. "likes": 0
  241. }
  242. },
  243. {
  244. "_index": "cluster_one:twitter",
  245. "_type": "_doc",
  246. "_id": "0",
  247. "_score": 1,
  248. "_source": {
  249. "user": "kimchy",
  250. "date": "2009-11-15T14:12:12",
  251. "message": "trying out Elasticsearch",
  252. "likes": 0
  253. }
  254. }
  255. ]
  256. }
  257. }
  258. --------------------------------------------------
  259. // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
  260. // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
  261. // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
  262. // TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
  263. <1> The `clusters` section indicates that one cluster was unavailable and got skipped
  264. [float]
  265. [[ccs-reduction]]
  266. === {ccs-cap} reduction phase
  267. {ccs-cap} (CCS) requests can be executed in two ways:
  268. - the CCS coordinating node minimizes network round-trips by sending one search
  269. request to each cluster. Each cluster performs the search independently,
  270. reducing and fetching results. Once the CCS node has received all the
  271. responses, it performs another reduction and returns the relevant results back
  272. to the user. This strategy is beneficial when there is network latency between
  273. the CCS coordinating node and the remote clusters involved, which is typically
  274. the case. A single request is sent to each remote cluster, at the cost of
  275. retrieving `from` + `size` already fetched results. This is the default
  276. strategy, used whenever possible. In case a scroll is provided, or inner hits
  277. are requested as part of field collapsing, this strategy is not supported hence
  278. network round-trips cannot be minimized and the following strategy is used
  279. instead.
  280. - the CCS coordinating node sends a <<search-shards,search shards>> request to
  281. each remote cluster, in order to collect information about their corresponding
  282. remote indices involved in the search request and the shards where their data
  283. is located. Once each cluster has responded to such request, the search
  284. executes as if all shards were part of the same cluster. The coordinating node
  285. sends one request to each shard involved, each shard executes the query and
  286. returns its own results which are then reduced (and fetched, depending on the
  287. <<request-body-search-search-type, search type>>) by the CCS coordinating node.
  288. This strategy may be beneficial whenever there is very low network latency
  289. between the CCS coordinating node and the remote clusters involved, as it
  290. treats all shards the same, at the cost of sending many requests to each remote
  291. cluster, which is problematic in presence of network latency.
  292. The <<search-request-body, search API>> supports the `ccs_minimize_roundtrips`
  293. parameter, which defaults to `true` and can be set to `false` in case
  294. minimizing network round-trips is not desirable.