cross-cluster-search.asciidoc 9.0 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340
  1. [chapter]
  2. [[modules-cross-cluster-search]]
  3. = Search across clusters
  4. *{ccs-cap}* lets you run a single search request against one or more
  5. <<modules-remote-clusters,remote clusters>>. For example, you can use a {ccs} to
  6. filter and analyze log data stored on clusters in different data centers.
  7. IMPORTANT: {ccs-cap} requires <<modules-remote-clusters, remote clusters>>.
  8. [float]
  9. [[ccs-example]]
  10. == {ccs-cap} examples
  11. [float]
  12. [[ccs-remote-cluster-setup]]
  13. === Remote cluster setup
  14. To perform a {ccs}, you must have at least one remote cluster configured.
  15. The following <<cluster-update-settings,cluster update settings>> API request
  16. adds three remote clusters:`cluster_one`, `cluster_two`, and `cluster_three`.
  17. [source,js]
  18. --------------------------------
  19. PUT _cluster/settings
  20. {
  21. "persistent": {
  22. "cluster": {
  23. "remote": {
  24. "cluster_one": {
  25. "seeds": [
  26. "127.0.0.1:9300"
  27. ]
  28. },
  29. "cluster_two": {
  30. "seeds": [
  31. "127.0.0.1:9301"
  32. ]
  33. },
  34. "cluster_three": {
  35. "seeds": [
  36. "127.0.0.1:9302"
  37. ]
  38. }
  39. }
  40. }
  41. }
  42. }
  43. --------------------------------
  44. // CONSOLE
  45. // TEST[setup:host]
  46. // TEST[s/127.0.0.1:930\d+/\${transport_host}/]
  47. [float]
  48. [[ccs-search-remote-cluster]]
  49. === Search a single remote cluster
  50. The following <<search,search>> API request searches the
  51. `twitter` index on a single remote cluster, `cluster_one`.
  52. [source,js]
  53. --------------------------------------------------
  54. GET /cluster_one:twitter/_search
  55. {
  56. "query": {
  57. "match": {
  58. "user": "kimchy"
  59. }
  60. }
  61. }
  62. --------------------------------------------------
  63. // CONSOLE
  64. // TEST[continued]
  65. // TEST[setup:twitter]
  66. The API returns the following response:
  67. [source,js]
  68. --------------------------------------------------
  69. {
  70. "took": 150,
  71. "timed_out": false,
  72. "_shards": {
  73. "total": 1,
  74. "successful": 1,
  75. "failed": 0,
  76. "skipped": 0
  77. },
  78. "_clusters": {
  79. "total": 1,
  80. "successful": 1,
  81. "skipped": 0
  82. },
  83. "hits": {
  84. "total" : {
  85. "value": 1,
  86. "relation": "eq"
  87. },
  88. "max_score": 1,
  89. "hits": [
  90. {
  91. "_index": "cluster_one:twitter", <1>
  92. "_type": "_doc",
  93. "_id": "0",
  94. "_score": 1,
  95. "_source": {
  96. "user": "kimchy",
  97. "date": "2009-11-15T14:12:12",
  98. "message": "trying out Elasticsearch",
  99. "likes": 0
  100. }
  101. }
  102. ]
  103. }
  104. }
  105. --------------------------------------------------
  106. // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
  107. // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
  108. // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
  109. <1> The search response body includes the name of the remote cluster in the
  110. `_index` parameter.
  111. [float]
  112. [[ccs-search-multi-remote-cluster]]
  113. === Search multiple remote clusters
  114. The following <<search,search>> API request searches the `twitter` index on
  115. three clusters:
  116. * Your local cluster
  117. * Two remote clusters, `cluster_one` and `cluster_two`
  118. [source,js]
  119. --------------------------------------------------
  120. GET /twitter,cluster_one:twitter,cluster_two:twitter/_search
  121. {
  122. "query": {
  123. "match": {
  124. "user": "kimchy"
  125. }
  126. }
  127. }
  128. --------------------------------------------------
  129. // CONSOLE
  130. // TEST[continued]
  131. The API returns the following response:
  132. [source,js]
  133. --------------------------------------------------
  134. {
  135. "took": 150,
  136. "timed_out": false,
  137. "num_reduce_phases": 4,
  138. "_shards": {
  139. "total": 3,
  140. "successful": 3,
  141. "failed": 0,
  142. "skipped": 0
  143. },
  144. "_clusters": {
  145. "total": 3,
  146. "successful": 3,
  147. "skipped": 0
  148. },
  149. "hits": {
  150. "total" : {
  151. "value": 3,
  152. "relation": "eq"
  153. },
  154. "max_score": 1,
  155. "hits": [
  156. {
  157. "_index": "twitter", <1>
  158. "_type": "_doc",
  159. "_id": "0",
  160. "_score": 2,
  161. "_source": {
  162. "user": "kimchy",
  163. "date": "2009-11-15T14:12:12",
  164. "message": "trying out Elasticsearch",
  165. "likes": 0
  166. }
  167. },
  168. {
  169. "_index": "cluster_one:twitter", <2>
  170. "_type": "_doc",
  171. "_id": "0",
  172. "_score": 1,
  173. "_source": {
  174. "user": "kimchy",
  175. "date": "2009-11-15T14:12:12",
  176. "message": "trying out Elasticsearch",
  177. "likes": 0
  178. }
  179. },
  180. {
  181. "_index": "cluster_two:twitter", <3>
  182. "_type": "_doc",
  183. "_id": "0",
  184. "_score": 1,
  185. "_source": {
  186. "user": "kimchy",
  187. "date": "2009-11-15T14:12:12",
  188. "message": "trying out Elasticsearch",
  189. "likes": 0
  190. }
  191. }
  192. ]
  193. }
  194. }
  195. --------------------------------------------------
  196. // TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
  197. // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
  198. // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
  199. // TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
  200. <1> This document's `_index` parameter doesn't include a cluster name. This
  201. means the document came from the local cluster.
  202. <2> This document came from `cluster_one`.
  203. <3> This document came from `cluster_two`.
  204. [float]
  205. [[skip-unavailable-clusters]]
  206. == Skip unavailable clusters
  207. By default, a {ccs} returns an error if *any* cluster in the request is
  208. unavailable.
  209. To skip an unavailable cluster during a {ccs}, set the
  210. <<skip-unavailable,`skip_unavailable`>> cluster setting to `true`.
  211. The following <<cluster-update-settings,cluster update settings>> API request
  212. changes `cluster_two`'s `skip_unavailable` setting to `true`.
  213. [source,js]
  214. --------------------------------
  215. PUT _cluster/settings
  216. {
  217. "persistent": {
  218. "cluster.remote.cluster_two.skip_unavailable": true
  219. }
  220. }
  221. --------------------------------
  222. // CONSOLE
  223. // TEST[continued]
  224. If `cluster_two` is disconnected or unavailable during a {ccs}, {es} won't
  225. include matching documents from that cluster in the final results.
  226. [float]
  227. [[ccs-works]]
  228. == How {ccs} works
  229. Because {ccs} involves sending requests to remote clusters, any network delays
  230. can impact search speed. To avoid slow searches, {ccs} offers two options for
  231. handling network delays:
  232. <<ccs-min-roundtrips,Minimize network roundtrips>>::
  233. By default, {es} reduces the number of network roundtrips between remote
  234. clusters. This reduces the impact of network delays on search speed. However,
  235. {es} can't reduce network roundtrips for large search requests, such as those
  236. including a <<request-body-search-scroll, scroll>> or
  237. <<request-body-search-inner-hits,inner hits>>.
  238. +
  239. See <<ccs-min-roundtrips>> to learn how this option works.
  240. <<ccs-unmin-roundtrips, Don't minimize network roundtrips>>::
  241. For search requests that include a scroll or inner hits, {es} sends multiple
  242. outgoing and ingoing requests to each remote cluster. You can also choose this
  243. option by setting the <<search,search>> API's
  244. <<ccs-minimize-roundtrips,`ccs_minimize_roundtrips`>> parameter to `false`.
  245. While typically slower, this approach may work well for networks with low
  246. latency.
  247. +
  248. See <<ccs-unmin-roundtrips>> to learn how this option works.
  249. [float]
  250. [[ccs-min-roundtrips]]
  251. === Minimize network roundtrips
  252. Here's how {ccs} works when you minimize network roundtrips.
  253. . You send a {ccs} request to your local cluster. A coordinating node in that
  254. cluster receives and parses the request.
  255. +
  256. image:images/ccs/ccs-min-roundtrip-client-request.png[]
  257. . The coordinating node sends a single search request to each cluster, including
  258. its own. Each cluster performs the search request independently.
  259. +
  260. image:images/ccs/ccs-min-roundtrip-cluster-search.png[]
  261. . Each remote cluster sends its search results back to the coordinating node.
  262. +
  263. image:images/ccs/ccs-min-roundtrip-cluster-results.png[]
  264. . After collecting results from each cluster, the coordinating node returns the
  265. final results in the {ccs} response.
  266. +
  267. image:images/ccs/ccs-min-roundtrip-client-response.png[]
  268. [float]
  269. [[ccs-unmin-roundtrips]]
  270. === Don't minimize network roundtrips
  271. Here's how {ccs} works when you don't minimize network roundtrips.
  272. . You send a {ccs} request to your local cluster. A coordinating node in that
  273. cluster receives and parses the request.
  274. +
  275. image:images/ccs/ccs-min-roundtrip-client-request.png[]
  276. . The coordinating node sends a <<search-shards,search shards>> API request to
  277. each remote cluster.
  278. +
  279. image:images/ccs/ccs-min-roundtrip-cluster-search.png[]
  280. . Each remote cluster sends its response back to the coordinating node.
  281. This response contains information about the indices and shards the {ccs}
  282. request will be executed on.
  283. +
  284. image:images/ccs/ccs-min-roundtrip-cluster-results.png[]
  285. . The coordinating node sends a search request to each shard, including those in
  286. its own cluster. Each shard performs the search request independently.
  287. +
  288. image:images/ccs/ccs-dont-min-roundtrip-shard-search.png[]
  289. . Each shard sends its search results back to the coordinating node.
  290. +
  291. image:images/ccs/ccs-dont-min-roundtrip-shard-results.png[]
  292. . After collecting results from each cluster, the coordinating node returns the
  293. final results in the {ccs} response.
  294. +
  295. image:images/ccs/ccs-min-roundtrip-client-response.png[]