[[modules-cross-cluster-search]]
== {ccs-cap}

The _{ccs}_ feature allows any node to act as a federated client across
multiple clusters. A {ccs} node doesn't join the remote cluster. Instead, it
opens a lightweight connection to the remote cluster to execute federated
search requests. For details on communication and compatibility between
different clusters, see <<modules-remote-clusters>>.

[float]
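=== Using {ccs}

{ccs-cap} requires <<modules-remote-clusters,configuring remote clusters>>.
The following request configures three remote clusters with the aliases
`cluster_one`, `cluster_two`, and `cluster_three`:

[source,js]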
--------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": [
            "127.0.0.1:9300"
          ]
        },
        "cluster_two": {
          "seeds": [
            "127.0.0.1:9301"
          ]
        },
        "cluster_three": {
          "seeds": [
            "127.0.0.1:9302"
          ]
        }
      }
    }
  }
}
--------------------------------
// CONSOLE
// TEST[setup:host]
// TEST[s/127.0.0.1:9300/\${transport_host}/]

To search the `twitter` index on the remote cluster `cluster_one`, prefix the
index name with the alias of the remote cluster followed by the `:` character:

[source,js]
--------------------------------------------------
GET /cluster_one:twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
// TEST[setup:twitter]

The API returns the following response:

[source,js]
--------------------------------------------------
{
  "took": 150,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 1,
    "successful": 1,
    "skipped": 0
  },
  "hits": {
    "total" : {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]

You can also search indices that have the same name on different clusters. The
following request searches the `twitter` index on both `cluster_one` and the
local cluster:

[source,js]
--------------------------------------------------
GET /cluster_one:twitter,twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

Search results are disambiguated the same way as indices are disambiguated in
the request. Indices with the same name are treated as different indices when
results are merged. The `_index` value of every result retrieved from a remote
cluster is prefixed with the corresponding cluster alias:

[source,js]
--------------------------------------------------
{
  "took": 150,
  "timed_out": false,
  "num_reduce_phases": 3,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 2,
    "successful": 2,
    "skipped": 0
  },
  "hits": {
    "total" : {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 2,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]

[float]
=== Skipping disconnected clusters

By default, every remote cluster searched via {ccs} must be available when the
search request is executed. Otherwise, the whole request fails and no search
results are returned, even if some of the clusters are available. To make a
remote cluster optional, set its boolean `skip_unavailable` setting to `true`.
The setting defaults to `false`.

[source,js]
--------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.cluster_two.skip_unavailable": true <1>
  }
}
--------------------------------
// CONSOLE
// TEST[continued]
<1> `cluster_two` is made optional

[source,js]
--------------------------------------------------
GET /cluster_one:twitter,cluster_two:twitter,twitter/_search <1>
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
<1> Searches the `twitter` index in `cluster_one`, in `cluster_two`, and locally

[source,js]
--------------------------------------------------
{
  "took": 150,
  "timed_out": false,
  "num_reduce_phases": 3,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": { <1>
    "total": 3,
    "successful": 2,
    "skipped": 1
  },
  "hits": {
    "total" : {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 2,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
<1> The `_clusters` section indicates that one cluster was unavailable and was skipped

[float]
[[ccs-works]]
=== How {ccs} works

Because {ccs} involves sending requests to remote clusters, any network delays
can impact search speed. To avoid slow searches, {ccs} offers two options for
handling network delays:

<<ccs-min-roundtrips,Minimize network roundtrips>>::
By default, {es} reduces the number of network roundtrips between the local
cluster and remote clusters. This reduces the impact of network delays on
search speed. However, {es} can't reduce network roundtrips for large search
requests, such as those including a <<request-body-search-scroll, scroll>> or
<<request-body-search-inner-hits,inner hits>>.
+
See <<ccs-min-roundtrips>> to learn how this option works.

<<ccs-unmin-roundtrips, Don't minimize network roundtrips>>::
For search requests that include a scroll or inner hits, {es} exchanges
multiple requests and responses with each remote cluster. You can also choose
this option by setting the <<search,search>> API's
<<ccs-minimize-roundtrips,`ccs_minimize_roundtrips`>> parameter to `false`.
While typically slower, this approach may work well for networks with low
latency.
+
See <<ccs-unmin-roundtrips>> to learn how this option works.
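
As a minimal illustration of the second option, the following request sets
`ccs_minimize_roundtrips` to `false` as a query parameter. It is only a sketch
and assumes the `cluster_one` alias and the `twitter` indices used in the
earlier examples on this page.

[source,js]
--------------------------------------------------
GET /cluster_one:twitter,twitter/_search?ccs_minimize_roundtrips=false
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

Omitting the parameter leaves the default behavior in place, in which {es}
minimizes network roundtrips.
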

[float]
[[ccs-min-roundtrips]]
==== Minimize network roundtrips

Here's how {ccs} works when you minimize network roundtrips.

. You send a {ccs} request to your local cluster. A coordinating node in that
cluster receives and parses the request.
+
image:images/ccs/ccs-min-roundtrip-client-request.png[]

. The coordinating node sends a single search request to each cluster, including
its own. Each cluster performs the search request independently.
+
image:images/ccs/ccs-min-roundtrip-cluster-search.png[]

. Each remote cluster sends its search results back to the coordinating node.
+
image:images/ccs/ccs-min-roundtrip-cluster-results.png[]

. After collecting results from each cluster, the coordinating node returns the
final results in the {ccs} response.
+
image:images/ccs/ccs-min-roundtrip-client-response.png[]

[float]
[[ccs-unmin-roundtrips]]
==== Don't minimize network roundtrips

Here's how {ccs} works when you don't minimize network roundtrips.

. You send a {ccs} request to your local cluster. A coordinating node in that
cluster receives and parses the request.
+
image:images/ccs/ccs-min-roundtrip-client-request.png[]

. The coordinating node sends a <<search-shards,search shards>> API request to
each remote cluster.
+
image:images/ccs/ccs-min-roundtrip-cluster-search.png[]

. Each remote cluster sends its response back to the coordinating node.
This response contains information about the indices and shards the {ccs}
request will be executed on.
+
image:images/ccs/ccs-min-roundtrip-cluster-results.png[]

. The coordinating node sends a search request to each shard, including those in
its own cluster. Each shard performs the search request independently.
+
image:images/ccs/ccs-dont-min-roundtrip-shard-search.png[]

. Each shard sends its search results back to the coordinating node.
+
image:images/ccs/ccs-dont-min-roundtrip-shard-results.png[]

. After collecting results from each cluster, the coordinating node returns the
final results in the {ccs} response.
+
image:images/ccs/ccs-min-roundtrip-client-response.png[]
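
To get a feel for the kind of index and shard information exchanged in the
second and third steps, you can call the <<search-shards,search shards>> API
yourself. The request below is only a sketch: it assumes it is run directly on
a cluster that holds the `twitter` index, and it is not necessarily the exact
request the coordinating node sends to a remote cluster during {ccs}.

[source,js]
--------------------------------------------------
GET /twitter/_search_shards
--------------------------------------------------
// CONSOLE
// TEST[continued]

The response lists the nodes, indices, and shards that a search against
`twitter` would be executed on.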