[[modules-cross-cluster-search]]
== Search across clusters

*{ccs-cap}* lets you run a single search request against one or more
<<modules-remote-clusters,remote clusters>>. For example, you can use a {ccs} to
filter and analyze log data stored on clusters in different data centers.

IMPORTANT: {ccs-cap} requires <<modules-remote-clusters, remote clusters>>.

[float]
[[ccs-supported-apis]]
=== Supported APIs

The following APIs support {ccs}:

* <<search-search,Search>>
* <<search-multi-search,Multi search>>
* <<search-template,Search template>>
* <<multi-search-template,Multi search template>>

[float]
[[ccs-example]]
=== {ccs-cap} examples

[float]
[[ccs-remote-cluster-setup]]
==== Remote cluster setup

To perform a {ccs}, you must have at least one remote cluster configured.

The following <<cluster-update-settings,cluster update settings>> API request
adds three remote clusters: `cluster_one`, `cluster_two`, and `cluster_three`.

[source,console]
--------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": [
            "127.0.0.1:9300"
          ]
        },
        "cluster_two": {
          "seeds": [
            "127.0.0.1:9301"
          ]
        },
        "cluster_three": {
          "seeds": [
            "127.0.0.1:9302"
          ]
        }
      }
    }
  }
}
--------------------------------
// TEST[setup:host]
// TEST[s/127.0.0.1:930\d+/\${transport_host}/]
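
If you generate this request body from a script, a minimal sketch might look like the following. The helper name `remote_cluster_settings` is hypothetical, and the snippet only builds the JSON body; it does not send the request.

```python
import json


def remote_cluster_settings(seeds_by_alias):
    """Build a cluster update settings body that registers remote clusters.

    seeds_by_alias maps a cluster alias to a list of "host:port" seed addresses.
    This helper is illustrative only and is not part of any client library.
    """
    remote = {
        alias: {"seeds": list(seeds)}
        for alias, seeds in seeds_by_alias.items()
    }
    return {"persistent": {"cluster": {"remote": remote}}}


body = remote_cluster_settings({
    "cluster_one": ["127.0.0.1:9300"],
    "cluster_two": ["127.0.0.1:9301"],
    "cluster_three": ["127.0.0.1:9302"],
})

# Print the body you would send with PUT _cluster/settings.
print(json.dumps(body, indent=2))
```

The resulting dictionary has the same nested structure as the request above, so it can be passed to whichever HTTP client you already use.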

[float]
[[ccs-search-remote-cluster]]
==== Search a single remote cluster

The following <<search-search,search>> API request searches the
`twitter` index on a single remote cluster, `cluster_one`.

[source,console]
--------------------------------------------------
GET /cluster_one:twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// TEST[continued]
// TEST[setup:twitter]

The API returns the following response:

[source,console-result]
--------------------------------------------------
{
  "took": 150,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 1,
    "successful": 1,
    "skipped": 0
  },
  "hits": {
    "total" : {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter", <1>
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]

<1> The search response body includes the name of the remote cluster in the
`_index` parameter.
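
Client code that needs to know which cluster a hit came from can split the `_index` value on the first colon. The following is a minimal sketch, not an official client feature. The function name `split_index` is hypothetical, and the sketch assumes index names themselves contain no colon.

```python
def split_index(index_field):
    """Split a hit's `_index` value into (cluster_alias, index_name).

    Returns None as the alias when the value has no cluster prefix.
    Assumption: index names themselves contain no colon.
    """
    alias, separator, name = index_field.partition(":")
    if not separator:
        # No "cluster:" prefix was present.
        return None, index_field
    return alias, name


hit = {"_index": "cluster_one:twitter", "_id": "0"}
print(split_index(hit["_index"]))
```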

[float]
[[ccs-search-multi-remote-cluster]]
==== Search multiple remote clusters

The following <<search,search>> API request searches the `twitter` index on
three clusters:

* Your local cluster
* Two remote clusters, `cluster_one` and `cluster_two`

[source,console]
--------------------------------------------------
GET /twitter,cluster_one:twitter,cluster_two:twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// TEST[continued]

The API returns the following response:

[source,console-result]
--------------------------------------------------
{
  "took": 150,
  "timed_out": false,
  "num_reduce_phases": 4,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 3,
    "successful": 3,
    "skipped": 0
  },
  "hits": {
    "total" : {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "twitter", <1>
        "_id": "0",
        "_score": 2,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "cluster_one:twitter", <2>
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "cluster_two:twitter", <3>
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]

<1> This document's `_index` parameter doesn't include a cluster name. This
means the document came from the local cluster.
<2> This document came from `cluster_one`.
<3> This document came from `cluster_two`.
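
The request path above combines local and remote targets in one comma-separated list. As a rough illustration, the following sketch assembles that path from a list of targets. The function name `ccs_index_expression` and the use of `None` to mark the local cluster are conventions chosen for this example only.

```python
def ccs_index_expression(targets):
    """Join (cluster_alias, index_name) pairs into one index expression.

    Use None as the alias for an index on the local cluster.
    """
    parts = []
    for alias, index in targets:
        # Remote indices are prefixed with "<alias>:"; local ones are not.
        parts.append(index if alias is None else f"{alias}:{index}")
    return ",".join(parts)


expression = ccs_index_expression([
    (None, "twitter"),
    ("cluster_one", "twitter"),
    ("cluster_two", "twitter"),
])
path = f"/{expression}/_search"
print(path)
```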

[float]
[[skip-unavailable-clusters]]
=== Skip unavailable clusters

By default, a {ccs} returns an error if *any* cluster in the request is
unavailable.

To skip an unavailable cluster during a {ccs}, set the
<<skip-unavailable,`skip_unavailable`>> cluster setting to `true`.

The following <<cluster-update-settings,cluster update settings>> API request
changes `cluster_two`'s `skip_unavailable` setting to `true`.

[source,console]
--------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.cluster_two.skip_unavailable": true
  }
}
--------------------------------
// TEST[continued]

If `cluster_two` is disconnected or unavailable during a {ccs}, {es} won't
include matching documents from that cluster in the final results.
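
Because a skipped cluster silently contributes no hits, client code may want to check whether a response is complete. The sketch below builds the settings body shown above and inspects the `_clusters` section of a response. It assumes that a skipped cluster is counted in `_clusters.skipped`, and the `partial` response is a hand-written illustration, not captured output. Both function names are hypothetical.

```python
def skip_unavailable_setting(alias, skip=True):
    """Build a cluster update settings body for one remote cluster."""
    return {"persistent": {f"cluster.remote.{alias}.skip_unavailable": skip}}


def is_partial(response):
    """Return True if at least one cluster was skipped.

    Assumption: skipped clusters are counted in `_clusters.skipped`.
    """
    return response.get("_clusters", {}).get("skipped", 0) > 0


settings_body = skip_unavailable_setting("cluster_two")

# `_clusters` section copied from the multi-cluster example response.
complete = {"_clusters": {"total": 3, "successful": 3, "skipped": 0}}
# Hand-written illustration of a search where one cluster was skipped.
partial = {"_clusters": {"total": 3, "successful": 2, "skipped": 1}}

print(is_partial(complete), is_partial(partial))
```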

[discrete]
[[ccs-gateway-seed-nodes]]
=== Selecting gateway and seed nodes in sniff mode

For remote clusters using the <<sniff-mode,sniff connection>> mode, gateway and
seed nodes need to be accessible from the local cluster via your network.

By default, any non-<<master-node,master-eligible>> node can act as a
gateway node. Optionally, you can define the gateway nodes for a cluster by
setting `cluster.remote.node.attr.gateway` to `true`.

For {ccs}, we recommend you use gateway nodes that are capable of serving as
<<coordinating-node,coordinating nodes>> for search requests. The seed nodes
for a cluster can be a subset of these gateway nodes.

[discrete]
[[ccs-proxy-mode]]
=== {ccs-cap} in proxy mode

<<proxy-mode,Proxy mode>> remote cluster connections support {ccs}. All remote
connections connect to the configured `proxy_address`. Any desired connection
routing to gateway or <<coordinating-node,coordinating nodes>> must
be implemented by the intermediate proxy at this configured address.

[discrete]
[[ccs-network-delays]]
=== How {ccs} handles network delays

Because {ccs} involves sending requests to remote clusters, any network delays
can impact search speed. To avoid slow searches, {ccs} offers two options for
handling network delays:

<<ccs-min-roundtrips,Minimize network roundtrips>>::
By default, {es} reduces the number of network roundtrips between remote
clusters. This reduces the impact of network delays on search speed. However,
{es} can't reduce network roundtrips for large search requests, such as those
including a <<request-body-search-scroll, scroll>> or
<<request-body-search-inner-hits,inner hits>>.
+
See <<ccs-min-roundtrips>> to learn how this option works.

<<ccs-unmin-roundtrips, Don't minimize network roundtrips>>:: For search
requests that include a scroll or inner hits, {es} sends multiple outgoing and
incoming requests to each remote cluster. You can also choose this option by
setting the <<ccs-minimize-roundtrips,`ccs_minimize_roundtrips`>> parameter to
`false`. While typically slower, this approach may work well for networks with
low latency.
+
See <<ccs-unmin-roundtrips>> to learn how this option works.
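
As an illustration of the second option, the following sketch builds a search path that sets `ccs_minimize_roundtrips` as a query string parameter. The function name `ccs_search_path` is hypothetical, and the snippet only builds the path; it does not contact a cluster.

```python
from urllib.parse import urlencode


def ccs_search_path(index_expression, minimize_roundtrips=True):
    """Build a search path with an explicit `ccs_minimize_roundtrips` value."""
    params = {
        "ccs_minimize_roundtrips": "true" if minimize_roundtrips else "false"
    }
    return f"/{index_expression}/_search?{urlencode(params)}"


# Ask for the mode that does not minimize network roundtrips.
path = ccs_search_path("cluster_one:twitter", minimize_roundtrips=False)
print(path)
```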

[discrete]
[[ccs-min-roundtrips]]
==== Minimize network roundtrips

Here's how {ccs} works when you minimize network roundtrips.

. You send a {ccs} request to your local cluster. A coordinating node in that
cluster receives and parses the request.
+
image:images/ccs/ccs-min-roundtrip-client-request.svg[]

. The coordinating node sends a single search request to each cluster, including
the local cluster. Each cluster performs the search request independently,
applying its own cluster-level settings to the request.
+
image:images/ccs/ccs-min-roundtrip-cluster-search.svg[]

. Each remote cluster sends its search results back to the coordinating node.
+
image:images/ccs/ccs-min-roundtrip-cluster-results.svg[]

. After collecting results from each cluster, the coordinating node returns the
final results in the {ccs} response.
+
image:images/ccs/ccs-min-roundtrip-client-response.svg[]

[discrete]
[[ccs-unmin-roundtrips]]
==== Don't minimize network roundtrips

Here's how {ccs} works when you don't minimize network roundtrips.

. You send a {ccs} request to your local cluster. A coordinating node in that
cluster receives and parses the request.
+
image:images/ccs/ccs-min-roundtrip-client-request.svg[]

. The coordinating node sends a <<search-shards,search shards>> API request to
each remote cluster.
+
image:images/ccs/ccs-min-roundtrip-cluster-search.svg[]

. Each remote cluster sends its response back to the coordinating node.
This response contains information about the indices and shards the {ccs}
request will be executed on.
+
image:images/ccs/ccs-min-roundtrip-cluster-results.svg[]

. The coordinating node sends a search request to each shard, including those in
its own cluster. Each shard performs the search request independently.
+
[WARNING]
====
When network roundtrips aren't minimized, the search is executed as if all data
were in the coordinating node's cluster. We recommend updating cluster-level
settings that limit searches, such as `action.search.shard_count.limit`,
`pre_filter_shard_size`, and `max_concurrent_shard_requests`, to account for
this. If these limits are too low, the search may be rejected.
====
+
image:images/ccs/ccs-dont-min-roundtrip-shard-search.svg[]

. Each shard sends its search results back to the coordinating node.
+
image:images/ccs/ccs-dont-min-roundtrip-shard-results.svg[]

. After collecting results from each cluster, the coordinating node returns the
final results in the {ccs} response.
+
image:images/ccs/ccs-min-roundtrip-client-response.svg[]
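
To get a feel for why the second mode is more sensitive to latency, the toy model below counts the requests the coordinating node sends to remote clusters in each mode. It counts only the requests named in the steps above and ignores the local cluster and any additional phases a real search may involve. The shard counts are hypothetical.

```python
def requests_minimized(remote_shards):
    """Minimized mode: one search request per remote cluster."""
    return len(remote_shards)


def requests_not_minimized(remote_shards):
    """Unminimized mode: one search shards request per remote cluster,
    followed by one search request per remote shard."""
    return len(remote_shards) + sum(remote_shards.values())


# Hypothetical number of shards searched on each remote cluster.
remote_shards = {"cluster_one": 5, "cluster_two": 10}

print(requests_minimized(remote_shards))      # 2
print(requests_not_minimized(remote_shards))  # 17
```

In this model the minimized count grows with the number of remote clusters, while the unminimized count also grows with the number of remote shards, and each of those requests pays the cross-cluster network delay.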