cross-cluster-search.asciidoc 11 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375
  1. [chapter]
  2. [[modules-cross-cluster-search]]
  3. = Search across clusters
  4. *{ccs-cap}* lets you run a single search request against one or more
  5. <<modules-remote-clusters,remote clusters>>. For example, you can use a {ccs} to
  6. filter and analyze log data stored on clusters in different data centers.
  7. IMPORTANT: {ccs-cap} requires <<modules-remote-clusters, remote clusters>>.
  8. [float]
  9. [[ccs-supported-apis]]
  10. == Supported APIs
  11. The following APIs support {ccs}:
  12. * <<search-search,Search>>
  13. * <<search-multi-search,Multi search>>
  14. * <<search-template,Search template>>
  15. * <<multi-search-template,Multi search template>>
  16. [float]
  17. [[ccs-example]]
  18. == {ccs-cap} examples
  19. [float]
  20. [[ccs-remote-cluster-setup]]
  21. === Remote cluster setup
  22. To perform a {ccs}, you must have at least one remote cluster configured.
  23. The following <<cluster-update-settings,cluster update settings>> API request
  24. adds three remote clusters:`cluster_one`, `cluster_two`, and `cluster_three`.
  25. [source,console]
  26. --------------------------------
  27. PUT _cluster/settings
  28. {
  29. "persistent": {
  30. "cluster": {
  31. "remote": {
  32. "cluster_one": {
  33. "seeds": [
  34. "127.0.0.1:9300"
  35. ]
  36. },
  37. "cluster_two": {
  38. "seeds": [
  39. "127.0.0.1:9301"
  40. ]
  41. },
  42. "cluster_three": {
  43. "seeds": [
  44. "127.0.0.1:9302"
  45. ]
  46. }
  47. }
  48. }
  49. }
  50. }
  51. --------------------------------
  52. // TEST[setup:host]
  53. // TEST[s/127.0.0.1:930\d+/\${transport_host}/]
  54. [float]
  55. [[ccs-search-remote-cluster]]
  56. === Search a single remote cluster
  57. The following <<search-search,search>> API request searches the
  58. `twitter` index on a single remote cluster, `cluster_one`.
  59. [source,console]
  60. --------------------------------------------------
  61. GET /cluster_one:twitter/_search
  62. {
  63. "query": {
  64. "match": {
  65. "user": "kimchy"
  66. }
  67. }
  68. }
  69. --------------------------------------------------
  70. // TEST[continued]
  71. // TEST[setup:twitter]
  72. The API returns the following response:
  73. [source,console-result]
  74. --------------------------------------------------
  75. {
  76. "took": 150,
  77. "timed_out": false,
  78. "_shards": {
  79. "total": 1,
  80. "successful": 1,
  81. "failed": 0,
  82. "skipped": 0
  83. },
  84. "_clusters": {
  85. "total": 1,
  86. "successful": 1,
  87. "skipped": 0
  88. },
  89. "hits": {
  90. "total" : {
  91. "value": 1,
  92. "relation": "eq"
  93. },
  94. "max_score": 1,
  95. "hits": [
  96. {
  97. "_index": "cluster_one:twitter", <1>
  98. "_id": "0",
  99. "_score": 1,
  100. "_source": {
  101. "user": "kimchy",
  102. "date": "2009-11-15T14:12:12",
  103. "message": "trying out Elasticsearch",
  104. "likes": 0
  105. }
  106. }
  107. ]
  108. }
  109. }
  110. --------------------------------------------------
  111. // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
  112. // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
  113. // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
  114. <1> The search response body includes the name of the remote cluster in the
  115. `_index` parameter.
  116. [float]
  117. [[ccs-search-multi-remote-cluster]]
  118. === Search multiple remote clusters
  119. The following <<search,search>> API request searches the `twitter` index on
  120. three clusters:
  121. * Your local cluster
  122. * Two remote clusters, `cluster_one` and `cluster_two`
  123. [source,console]
  124. --------------------------------------------------
  125. GET /twitter,cluster_one:twitter,cluster_two:twitter/_search
  126. {
  127. "query": {
  128. "match": {
  129. "user": "kimchy"
  130. }
  131. }
  132. }
  133. --------------------------------------------------
  134. // TEST[continued]
  135. The API returns the following response:
  136. [source,console-result]
  137. --------------------------------------------------
  138. {
  139. "took": 150,
  140. "timed_out": false,
  141. "num_reduce_phases": 4,
  142. "_shards": {
  143. "total": 3,
  144. "successful": 3,
  145. "failed": 0,
  146. "skipped": 0
  147. },
  148. "_clusters": {
  149. "total": 3,
  150. "successful": 3,
  151. "skipped": 0
  152. },
  153. "hits": {
  154. "total" : {
  155. "value": 3,
  156. "relation": "eq"
  157. },
  158. "max_score": 1,
  159. "hits": [
  160. {
  161. "_index": "twitter", <1>
  162. "_id": "0",
  163. "_score": 2,
  164. "_source": {
  165. "user": "kimchy",
  166. "date": "2009-11-15T14:12:12",
  167. "message": "trying out Elasticsearch",
  168. "likes": 0
  169. }
  170. },
  171. {
  172. "_index": "cluster_one:twitter", <2>
  173. "_id": "0",
  174. "_score": 1,
  175. "_source": {
  176. "user": "kimchy",
  177. "date": "2009-11-15T14:12:12",
  178. "message": "trying out Elasticsearch",
  179. "likes": 0
  180. }
  181. },
  182. {
  183. "_index": "cluster_two:twitter", <3>
  184. "_id": "0",
  185. "_score": 1,
  186. "_source": {
  187. "user": "kimchy",
  188. "date": "2009-11-15T14:12:12",
  189. "message": "trying out Elasticsearch",
  190. "likes": 0
  191. }
  192. }
  193. ]
  194. }
  195. }
  196. --------------------------------------------------
  197. // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
  198. // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
  199. // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
  200. // TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
  201. <1> This document's `_index` parameter doesn't include a cluster name. This
  202. means the document came from the local cluster.
  203. <2> This document came from `cluster_one`.
  204. <3> This document came from `cluster_two`.
  205. [float]
  206. [[skip-unavailable-clusters]]
  207. == Skip unavailable clusters
  208. By default, a {ccs} returns an error if *any* cluster in the request is
  209. unavailable.
  210. To skip an unavailable cluster during a {ccs}, set the
  211. <<skip-unavailable,`skip_unavailable`>> cluster setting to `true`.
  212. The following <<cluster-update-settings,cluster update settings>> API request
  213. changes `cluster_two`'s `skip_unavailable` setting to `true`.
  214. [source,console]
  215. --------------------------------
  216. PUT _cluster/settings
  217. {
  218. "persistent": {
  219. "cluster.remote.cluster_two.skip_unavailable": true
  220. }
  221. }
  222. --------------------------------
  223. // TEST[continued]
  224. If `cluster_two` is disconnected or unavailable during a {ccs}, {es} won't
  225. include matching documents from that cluster in the final results.
  226. [discrete]
  227. [[ccs-gateway-seed-nodes]]
  228. == Selecting gateway and seed nodes in sniff mode
  229. For remote clusters using the <<sniff-mode,sniff connection>> mode, gateway and
  230. seed nodes need to be accessible from the local cluster via your network.
  231. By default, any non-<<master-node,master-eligible>> node can act as a
  232. gateway node. If wanted, you can define the gateway nodes for a cluster by
  233. setting `cluster.remote.node.attr.gateway` to `true`.
  234. For {ccs}, we recommend you use gateway nodes that are capable of serving as
  235. <<coordinating-node,coordinating nodes>> for search requests. If
  236. wanted, the seed nodes for a cluster can be a subset of these gateway nodes.
  237. [discrete]
  238. [[ccs-proxy-mode]]
  239. == {ccs} in proxy mode
  240. <<proxy-mode,Proxy mode>> remote cluster connections support {ccs}. All remote
  241. connections connect to the configured `proxy_address`. Any desired connection
  242. routing to gateway or <<coordinating-node,coordinating nodes>> must
  243. be implemented by the intermediate proxy at this configured address.
  244. [discrete]
  245. [[ccs-network-delays]]
  246. == How {ccs} handles network delays
  247. Because {ccs} involves sending requests to remote clusters, any network delays
  248. can impact search speed. To avoid slow searches, {ccs} offers two options for
  249. handling network delays:
  250. <<ccs-min-roundtrips,Minimize network roundtrips>>::
  251. By default, {es} reduces the number of network roundtrips between remote
  252. clusters. This reduces the impact of network delays on search speed. However,
  253. {es} can't reduce network roundtrips for large search requests, such as those
  254. including a <<request-body-search-scroll, scroll>> or
  255. <<request-body-search-inner-hits,inner hits>>.
  256. +
  257. See <<ccs-min-roundtrips>> to learn how this option works.
  258. <<ccs-unmin-roundtrips, Don't minimize network roundtrips>>:: For search
  259. requests that include a scroll or inner hits, {es} sends multiple outgoing and
  260. ingoing requests to each remote cluster. You can also choose this option by
  261. setting the <<ccs-minimize-roundtrips,`ccs_minimize_roundtrips`>> parameter to
  262. `false`. While typically slower, this approach may work well for networks with
  263. low latency.
  264. +
  265. See <<ccs-unmin-roundtrips>> to learn how this option works.
  266. [discrete]
  267. [[ccs-min-roundtrips]]
  268. === Minimize network roundtrips
  269. Here's how {ccs} works when you minimize network roundtrips.
  270. . You send a {ccs} request to your local cluster. A coordinating node in that
  271. cluster receives and parses the request.
  272. +
  273. image:images/ccs/ccs-min-roundtrip-client-request.svg[]
  274. . The coordinating node sends a single search request to each cluster, including
  275. the local cluster. Each cluster performs the search request independently,
  276. applying its own cluster-level settings to the request.
  277. +
  278. image:images/ccs/ccs-min-roundtrip-cluster-search.svg[]
  279. . Each remote cluster sends its search results back to the coordinating node.
  280. +
  281. image:images/ccs/ccs-min-roundtrip-cluster-results.svg[]
  282. . After collecting results from each cluster, the coordinating node returns the
  283. final results in the {ccs} response.
  284. +
  285. image:images/ccs/ccs-min-roundtrip-client-response.svg[]
  286. [discrete]
  287. [[ccs-unmin-roundtrips]]
  288. === Don't minimize network roundtrips
  289. Here's how {ccs} works when you don't minimize network roundtrips.
  290. . You send a {ccs} request to your local cluster. A coordinating node in that
  291. cluster receives and parses the request.
  292. +
  293. image:images/ccs/ccs-min-roundtrip-client-request.svg[]
  294. . The coordinating node sends a <<search-shards,search shards>> API request to
  295. each remote cluster.
  296. +
  297. image:images/ccs/ccs-min-roundtrip-cluster-search.svg[]
  298. . Each remote cluster sends its response back to the coordinating node.
  299. This response contains information about the indices and shards the {ccs}
  300. request will be executed on.
  301. +
  302. image:images/ccs/ccs-min-roundtrip-cluster-results.svg[]
  303. . The coordinating node sends a search request to each shard, including those in
  304. its own cluster. Each shard performs the search request independently.
  305. +
  306. [WARNING]
  307. ====
  308. When network roundtrips aren't minimized, the search is executed as if all data
  309. were in the coordinating node's cluster. We recommend updating cluster-level
  310. settings that limit searches, such as `action.search.shard_count.limit`,
  311. `pre_filter_shard_size`, and `max_concurrent_shard_requests`, to account for
  312. this. If these limits are too low, the search may be rejected.
  313. ====
  314. +
  315. image:images/ccs/ccs-dont-min-roundtrip-shard-search.svg[]
  316. . Each shard sends its search results back to the coordinating node.
  317. +
  318. image:images/ccs/ccs-dont-min-roundtrip-shard-results.svg[]
  319. . After collecting results from each cluster, the coordinating node returns the
  320. final results in the {ccs} response.
  321. +
  322. image:images/ccs/ccs-min-roundtrip-client-response.svg[]