[chapter]
[[modules-cross-cluster-search]]
= Search across clusters

*{ccs-cap}* lets you run a single search request against one or more
<<modules-remote-clusters,remote clusters>>. For example, you can use a {ccs} to
filter and analyze log data stored on clusters in different data centers.

IMPORTANT: {ccs-cap} requires <<modules-remote-clusters, remote clusters>>.

[float]
[[ccs-example]]
== {ccs-cap} examples

[float]
[[ccs-remote-cluster-setup]]
=== Remote cluster setup

To perform a {ccs}, you must have at least one remote cluster configured.

The following <<cluster-update-settings,cluster update settings>> API request
adds three remote clusters: `cluster_one`, `cluster_two`, and `cluster_three`.

[source,console]
--------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": [
            "127.0.0.1:9300"
          ]
        },
        "cluster_two": {
          "seeds": [
            "127.0.0.1:9301"
          ]
        },
        "cluster_three": {
          "seeds": [
            "127.0.0.1:9302"
          ]
        }
      }
    }
  }
}
--------------------------------
// TEST[setup:host]
// TEST[s/127.0.0.1:930\d+/\${transport_host}/]
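
If you manage many remote clusters, you may prefer to generate the settings body
rather than write it by hand. The following Python sketch builds the same
request body as the example above from a mapping of cluster aliases to seed
addresses. The `remote_cluster_settings` helper is hypothetical and only
produces JSON; sending it to the cluster update settings API is left to your
HTTP client of choice.

```python
import json


def remote_cluster_settings(seeds_by_alias):
    """Build a persistent cluster-settings body that registers remote clusters.

    seeds_by_alias maps a cluster alias to a list of "host:port" seed addresses.
    (Hypothetical helper; not part of any Elasticsearch client.)
    """
    remote = {alias: {"seeds": list(seeds)} for alias, seeds in seeds_by_alias.items()}
    return {"persistent": {"cluster": {"remote": remote}}}


body = remote_cluster_settings({
    "cluster_one": ["127.0.0.1:9300"],
    "cluster_two": ["127.0.0.1:9301"],
    "cluster_three": ["127.0.0.1:9302"],
})

# Print the JSON body to send with `PUT _cluster/settings`.
print(json.dumps(body, indent=2))
```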

[float]
[[ccs-search-remote-cluster]]
=== Search a single remote cluster

The following <<search,search>> API request searches the
`twitter` index on a single remote cluster, `cluster_one`.

[source,console]
--------------------------------------------------
GET /cluster_one:twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// TEST[continued]
// TEST[setup:twitter]

The API returns the following response:

[source,console-result]
--------------------------------------------------
{
  "took": 150,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 1,
    "successful": 1,
    "skipped": 0
  },
  "hits": {
    "total" : {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter", <1>
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]

<1> The search response body includes the name of the remote cluster in the
`_index` parameter.
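
Client code that processes hits may need to separate the cluster name from the
index name. The following Python sketch shows one way to do that, assuming the
first colon in `_index` separates the two. The `split_index` function is a
hypothetical helper, not part of any {es} client.

```python
def split_index(qualified_index):
    """Split a hit's `_index` value into (cluster_alias, index_name).

    cluster_alias is None when the value has no "cluster:" prefix.
    Assumption: the first colon separates the cluster alias from the index.
    """
    cluster, sep, index = qualified_index.partition(":")
    if not sep:
        return None, qualified_index
    return cluster, index


print(split_index("cluster_one:twitter"))
print(split_index("twitter"))
```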

[float]
[[ccs-search-multi-remote-cluster]]
=== Search multiple remote clusters

The following <<search,search>> API request searches the `twitter` index on
three clusters:

* Your local cluster
* Two remote clusters, `cluster_one` and `cluster_two`

[source,console]
--------------------------------------------------
GET /twitter,cluster_one:twitter,cluster_two:twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// TEST[continued]

The API returns the following response:

[source,console-result]
--------------------------------------------------
{
  "took": 150,
  "timed_out": false,
  "num_reduce_phases": 4,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 3,
    "successful": 3,
    "skipped": 0
  },
  "hits": {
    "total" : {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "twitter", <1>
        "_type": "_doc",
        "_id": "0",
        "_score": 2,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "cluster_one:twitter", <2>
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "cluster_two:twitter", <3>
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]

<1> This document's `_index` parameter doesn't include a cluster name. This
means the document came from the local cluster.
<2> This document came from `cluster_one`.
<3> This document came from `cluster_two`.
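
The request path and the per-cluster breakdown of the response can both be
derived mechanically. The following Python sketch builds the comma-separated
target list used in the request above and counts hits per cluster in a parsed
response. Both function names and the `(local)` label are hypothetical choices
made for this illustration.

```python
from collections import Counter


def ccs_search_path(index, remote_aliases, include_local=True):
    """Build a search path that targets `index` locally and on each remote cluster."""
    targets = [index] if include_local else []
    targets += [f"{alias}:{index}" for alias in remote_aliases]
    return "/" + ",".join(targets) + "/_search"


def hits_per_cluster(response):
    """Count hits by the cluster prefix of `_index`; unprefixed hits are local."""
    counts = Counter()
    for hit in response["hits"]["hits"]:
        cluster, sep, _ = hit["_index"].partition(":")
        counts[cluster if sep else "(local)"] += 1
    return dict(counts)


path = ccs_search_path("twitter", ["cluster_one", "cluster_two"])
print(path)

# A trimmed-down response containing only the fields this sketch reads.
sample_response = {"hits": {"hits": [
    {"_index": "twitter"},
    {"_index": "cluster_one:twitter"},
    {"_index": "cluster_two:twitter"},
]}}
print(hits_per_cluster(sample_response))
```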

[float]
[[skip-unavailable-clusters]]
== Skip unavailable clusters

By default, a {ccs} returns an error if *any* cluster in the request is
unavailable.

To skip an unavailable cluster during a {ccs}, set the
<<skip-unavailable,`skip_unavailable`>> cluster setting to `true`.

The following <<cluster-update-settings,cluster update settings>> API request
changes `cluster_two`'s `skip_unavailable` setting to `true`.

[source,console]
--------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.cluster_two.skip_unavailable": true
  }
}
--------------------------------
// TEST[continued]

If `cluster_two` is disconnected or unavailable during a {ccs}, {es} won't
include matching documents from that cluster in the final results.
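
The decision described above can be modeled in a few lines. The following
Python sketch is a simplified illustration, not {es} code: the
`clusters_to_search` function, the `ClusterUnavailableError` exception, and the
`available` flag are all hypothetical. It fails the whole search when an
unavailable cluster has `skip_unavailable` unset or `false`, and otherwise
leaves that cluster out.

```python
class ClusterUnavailableError(Exception):
    """Raised when an unavailable cluster is not marked as skippable."""


def clusters_to_search(clusters):
    """Return (searched, skipped) aliases for a cross-cluster search.

    `clusters` maps an alias to a dict with an "available" flag and an
    optional "skip_unavailable" flag (treated as False when missing).
    """
    searched, skipped = [], []
    for alias, state in clusters.items():
        if state["available"]:
            searched.append(alias)
        elif state.get("skip_unavailable", False):
            skipped.append(alias)  # results from this cluster are left out
        else:
            raise ClusterUnavailableError(f"{alias} is unavailable")
    return searched, skipped


state = {
    "cluster_one": {"available": True},
    "cluster_two": {"available": False, "skip_unavailable": True},
}
print(clusters_to_search(state))
```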

[float]
[[ccs-works]]
== How {ccs} works

Because {ccs} involves sending requests to remote clusters, any network delays
can impact search speed. To avoid slow searches, {ccs} offers two options for
handling network delays:

<<ccs-min-roundtrips,Minimize network roundtrips>>::
By default, {es} reduces the number of network roundtrips between remote
clusters. This reduces the impact of network delays on search speed. However,
{es} can't reduce network roundtrips for large search requests, such as those
including a <<request-body-search-scroll, scroll>> or
<<request-body-search-inner-hits,inner hits>>.
+
See <<ccs-min-roundtrips>> to learn how this option works.

<<ccs-unmin-roundtrips, Don't minimize network roundtrips>>::
For search requests that include a scroll or inner hits, {es} exchanges multiple
requests and responses with each remote cluster. You can also choose this
option by setting the <<search,search>> API's
<<ccs-minimize-roundtrips,`ccs_minimize_roundtrips`>> parameter to `false`.
While typically slower, this approach may work well for networks with low
latency.
+
See <<ccs-unmin-roundtrips>> to learn how this option works.
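
As a rough illustration of the second option, the following Python sketch
builds a search path that sets `ccs_minimize_roundtrips` to `false` as a query
string parameter. The `search_url` helper is hypothetical, and the sketch
assumes the parameter is passed in the URL of the search request.

```python
from urllib.parse import urlencode


def search_url(targets, minimize_roundtrips=True):
    """Build a search path for `targets`, optionally disabling roundtrip minimization."""
    path = "/" + ",".join(targets) + "/_search"
    if minimize_roundtrips:
        return path  # rely on the default behavior
    return path + "?" + urlencode({"ccs_minimize_roundtrips": "false"})


url = search_url(["twitter", "cluster_one:twitter"], minimize_roundtrips=False)
print(url)
```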

[float]
[[ccs-min-roundtrips]]
=== Minimize network roundtrips

Here's how {ccs} works when you minimize network roundtrips.

. You send a {ccs} request to your local cluster. A coordinating node in that
cluster receives and parses the request.
+
image:images/ccs/ccs-min-roundtrip-client-request.png[]

. The coordinating node sends a single search request to each cluster, including
its own. Each cluster performs the search request independently.
+
image:images/ccs/ccs-min-roundtrip-cluster-search.png[]

. Each remote cluster sends its search results back to the coordinating node.
+
image:images/ccs/ccs-min-roundtrip-cluster-results.png[]

. After collecting results from each cluster, the coordinating node returns the
final results in the {ccs} response.
+
image:images/ccs/ccs-min-roundtrip-client-response.png[]
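
The steps above can be modeled with a small in-memory simulation. This is only
an illustrative sketch and doesn't reflect {es} internals: the
`search_minimized` function, the layout of `clusters`, and the `score` field
are hypothetical. Each cluster receives one request, searches its own shards,
and returns its top hits, which the coordinating node then merges.

```python
def search_minimized(clusters, user, size=10):
    """Simulate a cross-cluster search with one request per cluster.

    `clusters` maps an alias to a list of shards; each shard is a list of
    documents shaped like {"user": str, "score": float}.
    Returns (merged_hits, requests_sent).
    """
    requests_sent = 0
    merged = []
    for alias, shards in clusters.items():
        requests_sent += 1  # step 2: a single search request per cluster
        # The cluster searches all of its shards and ranks its own hits.
        local_hits = [doc for shard in shards for doc in shard if doc["user"] == user]
        local_hits.sort(key=lambda doc: doc["score"], reverse=True)
        # Step 3: the cluster returns only its top hits.
        merged.extend((alias, doc) for doc in local_hits[:size])
    # Step 4: the coordinating node merges the per-cluster results.
    merged.sort(key=lambda pair: pair[1]["score"], reverse=True)
    return merged[:size], requests_sent


sample_clusters = {
    "local": [[{"user": "kimchy", "score": 2.0}], [{"user": "other", "score": 1.0}]],
    "cluster_one": [[{"user": "kimchy", "score": 1.0}]],
}
hits, requests_sent = search_minimized(sample_clusters, "kimchy")
print(requests_sent, [alias for alias, _ in hits])
```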

[float]
[[ccs-unmin-roundtrips]]
=== Don't minimize network roundtrips

Here's how {ccs} works when you don't minimize network roundtrips.

. You send a {ccs} request to your local cluster. A coordinating node in that
cluster receives and parses the request.
+
image:images/ccs/ccs-min-roundtrip-client-request.png[]

. The coordinating node sends a <<search-shards,search shards>> API request to
each remote cluster.
+
image:images/ccs/ccs-min-roundtrip-cluster-search.png[]

. Each remote cluster sends its response back to the coordinating node.
This response contains information about the indices and shards the {ccs}
request will be executed on.
+
image:images/ccs/ccs-min-roundtrip-cluster-results.png[]

. The coordinating node sends a search request to each shard, including those in
its own cluster. Each shard performs the search request independently.
+
image:images/ccs/ccs-dont-min-roundtrip-shard-search.png[]

. Each shard sends its search results back to the coordinating node.
+
image:images/ccs/ccs-dont-min-roundtrip-shard-results.png[]

. After collecting results from each cluster, the coordinating node returns the
final results in the {ccs} response.
+
image:images/ccs/ccs-min-roundtrip-client-response.png[]
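
To get a feel for why this approach is typically slower, you can count the
requests the coordinating node sends under each option. The following Python
sketch is a deliberately simplified model based only on the steps described on
this page. It counts one search shards request per remote cluster plus one
search request per shard, and it ignores any follow-up requests. The function
names and the `local` alias are hypothetical.

```python
def requests_unminimized(shards_per_cluster, local_alias="local"):
    """Count requests when roundtrips are not minimized.

    `shards_per_cluster` maps a cluster alias to its number of target shards.
    """
    # One search shards request per remote cluster.
    search_shards_requests = sum(1 for alias in shards_per_cluster if alias != local_alias)
    # One search request per shard, including shards in the local cluster.
    shard_search_requests = sum(shards_per_cluster.values())
    return search_shards_requests + shard_search_requests


def requests_minimized(shards_per_cluster):
    """Count requests when roundtrips are minimized: one per cluster."""
    return len(shards_per_cluster)


layout = {"local": 2, "cluster_one": 3, "cluster_two": 5}
print(requests_minimized(layout), requests_unminimized(layout))
```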