cross-cluster-search.asciidoc 9.6 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352
  1. [chapter]
  2. [[modules-cross-cluster-search]]
  3. = Search across clusters
  4. *{ccs-cap}* lets you run a single search request against one or more
  5. <<modules-remote-clusters,remote clusters>>. For example, you can use a {ccs} to
  6. filter and analyze log data stored on clusters in different data centers.
  7. IMPORTANT: {ccs-cap} requires <<modules-remote-clusters, remote clusters>>.
  8. [float]
  9. [[ccs-example]]
  10. == {ccs-cap} examples
  11. [float]
  12. [[ccs-remote-cluster-setup]]
  13. === Remote cluster setup
  14. To perform a {ccs}, you must have at least one remote cluster configured.
  15. The following <<cluster-update-settings,cluster update settings>> API request
  16. adds three remote clusters:`cluster_one`, `cluster_two`, and `cluster_three`.
  17. [source,console]
  18. --------------------------------
  19. PUT _cluster/settings
  20. {
  21. "persistent": {
  22. "cluster": {
  23. "remote": {
  24. "cluster_one": {
  25. "seeds": [
  26. "127.0.0.1:9300"
  27. ]
  28. },
  29. "cluster_two": {
  30. "seeds": [
  31. "127.0.0.1:9301"
  32. ]
  33. },
  34. "cluster_three": {
  35. "seeds": [
  36. "127.0.0.1:9302"
  37. ]
  38. }
  39. }
  40. }
  41. }
  42. }
  43. --------------------------------
  44. // TEST[setup:host]
  45. // TEST[s/127.0.0.1:930\d+/\${transport_host}/]
  46. [float]
  47. [[ccs-search-remote-cluster]]
  48. === Search a single remote cluster
  49. The following <<search,search>> API request searches the
  50. `twitter` index on a single remote cluster, `cluster_one`.
  51. [source,console]
  52. --------------------------------------------------
  53. GET /cluster_one:twitter/_search
  54. {
  55. "query": {
  56. "match": {
  57. "user": "kimchy"
  58. }
  59. }
  60. }
  61. --------------------------------------------------
  62. // TEST[continued]
  63. // TEST[setup:twitter]
  64. The API returns the following response:
  65. [source,console-result]
  66. --------------------------------------------------
  67. {
  68. "took": 150,
  69. "timed_out": false,
  70. "_shards": {
  71. "total": 1,
  72. "successful": 1,
  73. "failed": 0,
  74. "skipped": 0
  75. },
  76. "_clusters": {
  77. "total": 1,
  78. "successful": 1,
  79. "skipped": 0
  80. },
  81. "hits": {
  82. "total" : {
  83. "value": 1,
  84. "relation": "eq"
  85. },
  86. "max_score": 1,
  87. "hits": [
  88. {
  89. "_index": "cluster_one:twitter", <1>
  90. "_id": "0",
  91. "_score": 1,
  92. "_source": {
  93. "user": "kimchy",
  94. "date": "2009-11-15T14:12:12",
  95. "message": "trying out Elasticsearch",
  96. "likes": 0
  97. }
  98. }
  99. ]
  100. }
  101. }
  102. --------------------------------------------------
  103. // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
  104. // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
  105. // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
  106. <1> The search response body includes the name of the remote cluster in the
  107. `_index` parameter.
  108. [float]
  109. [[ccs-search-multi-remote-cluster]]
  110. === Search multiple remote clusters
  111. The following <<search,search>> API request searches the `twitter` index on
  112. three clusters:
  113. * Your local cluster
  114. * Two remote clusters, `cluster_one` and `cluster_two`
  115. [source,console]
  116. --------------------------------------------------
  117. GET /twitter,cluster_one:twitter,cluster_two:twitter/_search
  118. {
  119. "query": {
  120. "match": {
  121. "user": "kimchy"
  122. }
  123. }
  124. }
  125. --------------------------------------------------
  126. // TEST[continued]
  127. The API returns the following response:
  128. [source,console-result]
  129. --------------------------------------------------
  130. {
  131. "took": 150,
  132. "timed_out": false,
  133. "num_reduce_phases": 4,
  134. "_shards": {
  135. "total": 3,
  136. "successful": 3,
  137. "failed": 0,
  138. "skipped": 0
  139. },
  140. "_clusters": {
  141. "total": 3,
  142. "successful": 3,
  143. "skipped": 0
  144. },
  145. "hits": {
  146. "total" : {
  147. "value": 3,
  148. "relation": "eq"
  149. },
  150. "max_score": 1,
  151. "hits": [
  152. {
  153. "_index": "twitter", <1>
  154. "_id": "0",
  155. "_score": 2,
  156. "_source": {
  157. "user": "kimchy",
  158. "date": "2009-11-15T14:12:12",
  159. "message": "trying out Elasticsearch",
  160. "likes": 0
  161. }
  162. },
  163. {
  164. "_index": "cluster_one:twitter", <2>
  165. "_id": "0",
  166. "_score": 1,
  167. "_source": {
  168. "user": "kimchy",
  169. "date": "2009-11-15T14:12:12",
  170. "message": "trying out Elasticsearch",
  171. "likes": 0
  172. }
  173. },
  174. {
  175. "_index": "cluster_two:twitter", <3>
  176. "_id": "0",
  177. "_score": 1,
  178. "_source": {
  179. "user": "kimchy",
  180. "date": "2009-11-15T14:12:12",
  181. "message": "trying out Elasticsearch",
  182. "likes": 0
  183. }
  184. }
  185. ]
  186. }
  187. }
  188. --------------------------------------------------
  189. // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
  190. // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
  191. // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
  192. // TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
  193. <1> This document's `_index` parameter doesn't include a cluster name. This
  194. means the document came from the local cluster.
  195. <2> This document came from `cluster_one`.
  196. <3> This document came from `cluster_two`.
  197. [float]
  198. [[skip-unavailable-clusters]]
  199. == Skip unavailable clusters
  200. By default, a {ccs} returns an error if *any* cluster in the request is
  201. unavailable.
  202. To skip an unavailable cluster during a {ccs}, set the
  203. <<skip-unavailable,`skip_unavailable`>> cluster setting to `true`.
  204. The following <<cluster-update-settings,cluster update settings>> API request
  205. changes `cluster_two`'s `skip_unavailable` setting to `true`.
  206. [source,console]
  207. --------------------------------
  208. PUT _cluster/settings
  209. {
  210. "persistent": {
  211. "cluster.remote.cluster_two.skip_unavailable": true
  212. }
  213. }
  214. --------------------------------
  215. // TEST[continued]
  216. If `cluster_two` is disconnected or unavailable during a {ccs}, {es} won't
  217. include matching documents from that cluster in the final results.
  218. [discrete]
  219. [[ccs-works]]
  220. == How {ccs} works
  221. include::./remote-clusters.asciidoc[tag=how-remote-clusters-work]
  222. [discrete]
  223. [[ccs-gateway-seed-nodes]]
  224. === Selecting gateway and seed nodes
  225. Gateway and seed nodes need to be accessible from the local cluster via your
  226. network.
  227. By default, any master-ineligible node can act as a gateway node. If wanted,
  228. you can define the gateway nodes for a cluster by setting
  229. `cluster.remote.node.attr.gateway` to `true`.
  230. For {ccs}, we recommend you use gateway nodes that are capable of serving as
  231. <<coordinating-node,coordinating nodes>> for search requests. If
  232. wanted, the seed nodes for a cluster can be a subset of these gateway nodes.
  233. [discrete]
  234. [[ccs-network-delays]]
  235. === How {ccs} handles network delays
  236. Because {ccs} involves sending requests to remote clusters, any network delays
  237. can impact search speed. To avoid slow searches, {ccs} offers two options for
  238. handling network delays:
  239. <<ccs-min-roundtrips,Minimize network roundtrips>>::
  240. By default, {es} reduces the number of network roundtrips between remote
  241. clusters. This reduces the impact of network delays on search speed. However,
  242. {es} can't reduce network roundtrips for large search requests, such as those
  243. including a <<request-body-search-scroll, scroll>> or
  244. <<request-body-search-inner-hits,inner hits>>.
  245. +
  246. See <<ccs-min-roundtrips>> to learn how this option works.
  247. <<ccs-unmin-roundtrips, Don't minimize network roundtrips>>::
  248. For search requests that include a scroll or inner hits, {es} sends multiple
  249. outgoing and ingoing requests to each remote cluster. You can also choose this
  250. option by setting the <<search,search>> API's
  251. <<ccs-minimize-roundtrips,`ccs_minimize_roundtrips`>> parameter to `false`.
  252. While typically slower, this approach may work well for networks with low
  253. latency.
  254. +
  255. See <<ccs-unmin-roundtrips>> to learn how this option works.
  256. [float]
  257. [[ccs-min-roundtrips]]
  258. ==== Minimize network roundtrips
  259. Here's how {ccs} works when you minimize network roundtrips.
  260. . You send a {ccs} request to your local cluster. A coordinating node in that
  261. cluster receives and parses the request.
  262. +
  263. image:images/ccs/ccs-min-roundtrip-client-request.svg[]
  264. . The coordinating node sends a single search request to each cluster, including
  265. its own. Each cluster performs the search request independently.
  266. +
  267. image:images/ccs/ccs-min-roundtrip-cluster-search.svg[]
  268. . Each remote cluster sends its search results back to the coordinating node.
  269. +
  270. image:images/ccs/ccs-min-roundtrip-cluster-results.svg[]
  271. . After collecting results from each cluster, the coordinating node returns the
  272. final results in the {ccs} response.
  273. +
  274. image:images/ccs/ccs-min-roundtrip-client-response.svg[]
  275. [float]
  276. [[ccs-unmin-roundtrips]]
  277. ==== Don't minimize network roundtrips
  278. Here's how {ccs} works when you don't minimize network roundtrips.
  279. . You send a {ccs} request to your local cluster. A coordinating node in that
  280. cluster receives and parses the request.
  281. +
  282. image:images/ccs/ccs-min-roundtrip-client-request.svg[]
  283. . The coordinating node sends a <<search-shards,search shards>> API request to
  284. each remote cluster.
  285. +
  286. image:images/ccs/ccs-min-roundtrip-cluster-search.svg[]
  287. . Each remote cluster sends its response back to the coordinating node.
  288. This response contains information about the indices and shards the {ccs}
  289. request will be executed on.
  290. +
  291. image:images/ccs/ccs-min-roundtrip-cluster-results.svg[]
  292. . The coordinating node sends a search request to each shard, including those in
  293. its own cluster. Each shard performs the search request independently.
  294. +
  295. image:images/ccs/ccs-dont-min-roundtrip-shard-search.svg[]
  296. . Each shard sends its search results back to the coordinating node.
  297. +
  298. image:images/ccs/ccs-dont-min-roundtrip-shard-results.svg[]
  299. . After collecting results from each cluster, the coordinating node returns the
  300. final results in the {ccs} response.
  301. +
  302. image:images/ccs/ccs-min-roundtrip-client-response.svg[]