cross-cluster-search.asciidoc 11 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390
  1. [[modules-cross-cluster-search]]
  2. == Search across clusters
  3. *{ccs-cap}* lets you run a single search request against one or more
  4. <<modules-remote-clusters,remote clusters>>. For example, you can use a {ccs} to
  5. filter and analyze log data stored on clusters in different data centers.
  6. IMPORTANT: {ccs-cap} requires <<modules-remote-clusters, remote clusters>>.
  7. [discrete]
  8. [[ccs-supported-apis]]
  9. === Supported APIs
  10. The following APIs support {ccs}:
  11. * <<search-search,Search>>
  12. * <<search-multi-search,Multi search>>
  13. * <<search-template,Search template>>
  14. * <<multi-search-template,Multi search template>>
  15. [discrete]
  16. [[ccs-example]]
  17. === {ccs-cap} examples
  18. [discrete]
  19. [[ccs-remote-cluster-setup]]
  20. ==== Remote cluster setup
  21. To perform a {ccs}, you must have at least one remote cluster configured.
  22. The following <<cluster-update-settings,cluster update settings>> API request
  23. adds three remote clusters:`cluster_one`, `cluster_two`, and `cluster_three`.
  24. [source,console]
  25. --------------------------------
  26. PUT _cluster/settings
  27. {
  28. "persistent": {
  29. "cluster": {
  30. "remote": {
  31. "cluster_one": {
  32. "seeds": [
  33. "127.0.0.1:9300"
  34. ]
  35. },
  36. "cluster_two": {
  37. "seeds": [
  38. "127.0.0.1:9301"
  39. ]
  40. },
  41. "cluster_three": {
  42. "seeds": [
  43. "127.0.0.1:9302"
  44. ]
  45. }
  46. }
  47. }
  48. }
  49. }
  50. --------------------------------
  51. // TEST[setup:host]
  52. // TEST[s/127.0.0.1:930\d+/\${transport_host}/]
  53. [discrete]
  54. [[ccs-search-remote-cluster]]
  55. ==== Search a single remote cluster
  56. The following <<search-search,search>> API request searches the
  57. `twitter` index on a single remote cluster, `cluster_one`.
  58. [source,console]
  59. --------------------------------------------------
  60. GET /cluster_one:twitter/_search
  61. {
  62. "query": {
  63. "match": {
  64. "user": "kimchy"
  65. }
  66. },
  67. "_source": ["user", "message", "likes"]
  68. }
  69. --------------------------------------------------
  70. // TEST[continued]
  71. // TEST[setup:twitter]
  72. The API returns the following response:
  73. [source,console-result]
  74. --------------------------------------------------
  75. {
  76. "took": 150,
  77. "timed_out": false,
  78. "_shards": {
  79. "total": 1,
  80. "successful": 1,
  81. "failed": 0,
  82. "skipped": 0
  83. },
  84. "_clusters": {
  85. "total": 1,
  86. "successful": 1,
  87. "skipped": 0
  88. },
  89. "hits": {
  90. "total" : {
  91. "value": 1,
  92. "relation": "eq"
  93. },
  94. "max_score": 1,
  95. "hits": [
  96. {
  97. "_index": "cluster_one:twitter", <1>
  98. "_id": "0",
  99. "_score": 1,
  100. "_source": {
  101. "user": "kimchy",
  102. "message": "trying out Elasticsearch",
  103. "likes": 0
  104. }
  105. }
  106. ]
  107. }
  108. }
  109. --------------------------------------------------
  110. // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
  111. // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
  112. // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
  113. <1> The search response body includes the name of the remote cluster in the
  114. `_index` parameter.
  115. [discrete]
  116. [[ccs-search-multi-remote-cluster]]
  117. ==== Search multiple remote clusters
  118. The following <<search,search>> API request searches the `twitter` index on
  119. three clusters:
  120. * Your local cluster
  121. * Two remote clusters, `cluster_one` and `cluster_two`
  122. [source,console]
  123. --------------------------------------------------
  124. GET /twitter,cluster_one:twitter,cluster_two:twitter/_search
  125. {
  126. "query": {
  127. "match": {
  128. "user": "kimchy"
  129. }
  130. },
  131. "_source": ["user", "message", "likes"]
  132. }
  133. --------------------------------------------------
  134. // TEST[continued]
  135. The API returns the following response:
  136. [source,console-result]
  137. --------------------------------------------------
  138. {
  139. "took": 150,
  140. "timed_out": false,
  141. "num_reduce_phases": 4,
  142. "_shards": {
  143. "total": 3,
  144. "successful": 3,
  145. "failed": 0,
  146. "skipped": 0
  147. },
  148. "_clusters": {
  149. "total": 3,
  150. "successful": 3,
  151. "skipped": 0
  152. },
  153. "hits": {
  154. "total" : {
  155. "value": 3,
  156. "relation": "eq"
  157. },
  158. "max_score": 1,
  159. "hits": [
  160. {
  161. "_index": "twitter", <1>
  162. "_id": "0",
  163. "_score": 2,
  164. "_source": {
  165. "user": "kimchy",
  166. "message": "trying out Elasticsearch",
  167. "likes": 0
  168. }
  169. },
  170. {
  171. "_index": "cluster_one:twitter", <2>
  172. "_id": "0",
  173. "_score": 1,
  174. "_source": {
  175. "user": "kimchy",
  176. "message": "trying out Elasticsearch",
  177. "likes": 0
  178. }
  179. },
  180. {
  181. "_index": "cluster_two:twitter", <3>
  182. "_id": "0",
  183. "_score": 1,
  184. "_source": {
  185. "user": "kimchy",
  186. "message": "trying out Elasticsearch",
  187. "likes": 0
  188. }
  189. }
  190. ]
  191. }
  192. }
  193. --------------------------------------------------
  194. // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
  195. // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
  196. // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
  197. // TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
  198. <1> This document's `_index` parameter doesn't include a cluster name. This
  199. means the document came from the local cluster.
  200. <2> This document came from `cluster_one`.
  201. <3> This document came from `cluster_two`.
  202. [discrete]
  203. [[skip-unavailable-clusters]]
  204. === Skip unavailable clusters
  205. By default, a {ccs} returns an error if *any* cluster in the request is
  206. unavailable.
  207. To skip an unavailable cluster during a {ccs}, set the
  208. <<skip-unavailable,`skip_unavailable`>> cluster setting to `true`.
  209. The following <<cluster-update-settings,cluster update settings>> API request
  210. changes `cluster_two`'s `skip_unavailable` setting to `true`.
  211. [source,console]
  212. --------------------------------
  213. PUT _cluster/settings
  214. {
  215. "persistent": {
  216. "cluster.remote.cluster_two.skip_unavailable": true
  217. }
  218. }
  219. --------------------------------
  220. // TEST[continued]
  221. If `cluster_two` is disconnected or unavailable during a {ccs}, {es} won't
  222. include matching documents from that cluster in the final results.
  223. [discrete]
  224. [[ccs-gateway-seed-nodes]]
  225. === Selecting gateway and seed nodes in sniff mode
  226. For remote clusters using the <<sniff-mode,sniff connection>> mode, gateway and
  227. seed nodes need to be accessible from the local cluster via your network.
  228. By default, any non-<<master-node,master-eligible>> node can act as a
  229. gateway node. If wanted, you can define the gateway nodes for a cluster by
  230. setting `cluster.remote.node.attr.gateway` to `true`.
  231. For {ccs}, we recommend you use gateway nodes that are capable of serving as
  232. <<coordinating-node,coordinating nodes>> for search requests. If
  233. wanted, the seed nodes for a cluster can be a subset of these gateway nodes.
  234. [discrete]
  235. [[ccs-proxy-mode]]
  236. === {ccs-cap} in proxy mode
  237. <<proxy-mode,Proxy mode>> remote cluster connections support {ccs}. All remote
  238. connections connect to the configured `proxy_address`. Any desired connection
  239. routing to gateway or <<coordinating-node,coordinating nodes>> must
  240. be implemented by the intermediate proxy at this configured address.
  241. [discrete]
  242. [[ccs-network-delays]]
  243. === How {ccs} handles network delays
  244. Because {ccs} involves sending requests to remote clusters, any network delays
  245. can impact search speed. To avoid slow searches, {ccs} offers two options for
  246. handling network delays:
  247. <<ccs-min-roundtrips,Minimize network roundtrips>>::
  248. By default, {es} reduces the number of network roundtrips between remote
  249. clusters. This reduces the impact of network delays on search speed. However,
  250. {es} can't reduce network roundtrips for large search requests, such as those
  251. including a <<scroll-search-results, scroll>> or
  252. <<request-body-search-inner-hits,inner hits>>.
  253. +
  254. See <<ccs-min-roundtrips>> to learn how this option works.
  255. <<ccs-unmin-roundtrips, Don't minimize network roundtrips>>:: For search
  256. requests that include a scroll or inner hits, {es} sends multiple outgoing and
  257. ingoing requests to each remote cluster. You can also choose this option by
  258. setting the <<ccs-minimize-roundtrips,`ccs_minimize_roundtrips`>> parameter to
  259. `false`. While typically slower, this approach may work well for networks with
  260. low latency.
  261. +
  262. See <<ccs-unmin-roundtrips>> to learn how this option works.
  263. [discrete]
  264. [[ccs-min-roundtrips]]
  265. ==== Minimize network roundtrips
  266. Here's how {ccs} works when you minimize network roundtrips.
  267. . You send a {ccs} request to your local cluster. A coordinating node in that
  268. cluster receives and parses the request.
  269. +
  270. image:images/ccs/ccs-min-roundtrip-client-request.svg[]
  271. . The coordinating node sends a single search request to each cluster, including
  272. the local cluster. Each cluster performs the search request independently,
  273. applying its own cluster-level settings to the request.
  274. +
  275. image:images/ccs/ccs-min-roundtrip-cluster-search.svg[]
  276. . Each remote cluster sends its search results back to the coordinating node.
  277. +
  278. image:images/ccs/ccs-min-roundtrip-cluster-results.svg[]
  279. . After collecting results from each cluster, the coordinating node returns the
  280. final results in the {ccs} response.
  281. +
  282. image:images/ccs/ccs-min-roundtrip-client-response.svg[]
  283. [discrete]
  284. [[ccs-unmin-roundtrips]]
  285. ==== Don't minimize network roundtrips
  286. Here's how {ccs} works when you don't minimize network roundtrips.
  287. . You send a {ccs} request to your local cluster. A coordinating node in that
  288. cluster receives and parses the request.
  289. +
  290. image:images/ccs/ccs-min-roundtrip-client-request.svg[]
  291. . The coordinating node sends a <<search-shards,search shards>> API request to
  292. each remote cluster.
  293. +
  294. image:images/ccs/ccs-min-roundtrip-cluster-search.svg[]
  295. . Each remote cluster sends its response back to the coordinating node.
  296. This response contains information about the indices and shards the {ccs}
  297. request will be executed on.
  298. +
  299. image:images/ccs/ccs-min-roundtrip-cluster-results.svg[]
  300. . The coordinating node sends a search request to each shard, including those in
  301. its own cluster. Each shard performs the search request independently.
  302. +
  303. [WARNING]
  304. ====
  305. When network roundtrips aren't minimized, the search is executed as if all data
  306. were in the coordinating node's cluster. We recommend updating cluster-level
  307. settings that limit searches, such as `action.search.shard_count.limit`,
  308. `pre_filter_shard_size`, and `max_concurrent_shard_requests`, to account for
  309. this. If these limits are too low, the search may be rejected.
  310. ====
  311. +
  312. image:images/ccs/ccs-dont-min-roundtrip-shard-search.svg[]
  313. . Each shard sends its search results back to the coordinating node.
  314. +
  315. image:images/ccs/ccs-dont-min-roundtrip-shard-results.svg[]
  316. . After collecting results from each cluster, the coordinating node returns the
  317. final results in the {ccs} response.
  318. +
  319. image:images/ccs/ccs-min-roundtrip-client-response.svg[]
  320. [discrete]
  321. [[ccs-supported-configurations]]
  322. === Supported configurations
  323. Generally, <<gateway-nodes-selection, cross cluster search>> can search remote
  324. clusters that are one major version ahead or behind the coordinating node's
  325. version. Cross cluster search can also search remote clusters that are being
  326. <<rolling-upgrades, upgraded>> so long as both the "upgrade from" and
  327. "upgrade to" version are compatible with the gateway node.
  328. For example, a coordinating node running {es} 5.6 can search a remote cluster
  329. running {es} 6.8, but that cluster can not be upgraded to 7.1. In this case
  330. you should first upgrade the coordinating node to 7.1 and then upgrade remote
  331. cluster.
  332. WARNING: Running multiple versions of {es} in the same cluster beyond the
  333. duration of an upgrade is not supported.