esql-multi-index.asciidoc 6.6 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175
  1. [[esql-multi-index]]
  2. === Using {esql} to query multiple indices
  3. ++++
  4. <titleabbrev>Using {esql} to query multiple indices</titleabbrev>
  5. ++++
  6. With {esql}, you can execute a single query across multiple indices, data streams, or aliases.
  7. To do so, use wildcards and date arithmetic. The following example uses a comma-separated list and a wildcard:
  8. [source,esql]
  9. ----
  10. FROM employees-00001,other-employees-*
  11. ----
  12. Use the format `<remote_cluster_name>:<target>` to <<esql-cross-clusters, query data streams and indices
  13. on remote clusters>>:
  14. [source,esql]
  15. ----
  16. FROM cluster_one:employees-00001,cluster_two:other-employees-*
  17. ----
  18. [discrete]
  19. [[esql-multi-index-invalid-mapping]]
  20. === Field type mismatches
  21. When querying multiple indices, data streams, or aliases, you might find that the same field is mapped to multiple different types.
  22. For example, consider the two indices with the following field mappings:
  23. *index: events_ip*
  24. ```
  25. {
  26. "mappings": {
  27. "properties": {
  28. "@timestamp": { "type": "date" },
  29. "client_ip": { "type": "ip" },
  30. "event_duration": { "type": "long" },
  31. "message": { "type": "keyword" }
  32. }
  33. }
  34. }
  35. ```
  36. *index: events_keyword*
  37. ```
  38. {
  39. "mappings": {
  40. "properties": {
  41. "@timestamp": { "type": "date" },
  42. "client_ip": { "type": "keyword" },
  43. "event_duration": { "type": "long" },
  44. "message": { "type": "keyword" }
  45. }
  46. }
  47. }
  48. ```
  49. When you query each of these individually with a simple query like `FROM events_ip`, the results are provided with type-specific columns:
  50. [source.merge.styled,esql]
  51. ----
  52. FROM events_ip
  53. | SORT @timestamp DESC
  54. ----
  55. [%header.monospaced.styled,format=dsv,separator=|]
  56. |===
  57. @timestamp:date | client_ip:ip | event_duration:long | message:keyword
  58. 2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1
  59. 2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error
  60. 2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error
  61. |===
  62. Note how the `client_ip` column is correctly identified as type `ip`, and all values are displayed.
  63. However, if instead the query sources two conflicting indices with `FROM events_*`, the type of the `client_ip` column cannot be determined
  64. and is reported as `unsupported` with all values returned as `null`.
  65. [[query-unsupported]]
  66. [source.merge.styled,esql]
  67. ----
  68. FROM events_*
  69. | SORT @timestamp DESC
  70. ----
  71. [%header.monospaced.styled,format=dsv,separator=|]
  72. |===
  73. @timestamp:date | client_ip:unsupported | event_duration:long | message:keyword
  74. 2023-10-23T13:55:01.543Z | null | 1756467 | Connected to 10.1.0.1
  75. 2023-10-23T13:53:55.832Z | null | 5033755 | Connection error
  76. 2023-10-23T13:52:55.015Z | null | 8268153 | Connection error
  77. 2023-10-23T13:51:54.732Z | null | 725448 | Connection error
  78. 2023-10-23T13:33:34.937Z | null | 1232382 | Disconnected
  79. 2023-10-23T12:27:28.948Z | null | 2764889 | Connected to 10.1.0.2
  80. 2023-10-23T12:15:03.360Z | null | 3450233 | Connected to 10.1.0.3
  81. |===
  82. In addition, if the query refers to this unsupported field directly, the query fails:
  83. [source.merge.styled,esql]
  84. ----
  85. FROM events_*
  86. | KEEP @timestamp, client_ip, event_duration, message
  87. | SORT @timestamp DESC
  88. ----
  89. [source,bash]
  90. ----
  91. Cannot use field [client_ip] due to ambiguities being mapped as
  92. [2] incompatible types:
  93. [ip] in [events_ip],
  94. [keyword] in [events_keyword]
  95. ----
  96. [discrete]
  97. [[esql-multi-index-union-types]]
  98. === Union types
  99. {esql} has a way to handle <<esql-multi-index-invalid-mapping, field type mismatches>>. When the same field is mapped to multiple types in multiple indices,
  100. the type of the field is understood to be a _union_ of the various types in the index mappings.
  101. As seen in the preceding examples, this _union type_ cannot be used in the results,
  102. and cannot be referred to by the query
  103. -- except when it's passed to a type conversion function that accepts all the types in the _union_ and converts the field
  104. to a single type. {esql} offers a suite of <<esql-type-conversion-functions,type conversion functions>> to achieve this.
  105. In the above examples, the query can use a command like `EVAL client_ip = TO_IP(client_ip)` to resolve
  106. the union of `ip` and `keyword` to just `ip`.
  107. You can also use the type-conversion syntax `EVAL client_ip = client_ip::IP`.
  108. Alternatively, the query could use <<esql-to_string,`TO_STRING`>> to convert all supported types into `KEYWORD`.
  109. For example, the <<query-unsupported,query>> that returned `client_ip:unsupported` with `null` values can be improved using the `TO_IP` function or the equivalent `field::ip` syntax.
  110. These changes also resolve the error message.
  111. As long as the only reference to the original field is to pass it to a conversion function that resolves the type ambiguity, no error results.
  112. [source.merge.styled,esql]
  113. ----
  114. FROM events_*
  115. | EVAL client_ip = TO_IP(client_ip)
  116. | KEEP @timestamp, client_ip, event_duration, message
  117. | SORT @timestamp DESC
  118. ----
  119. [%header.monospaced.styled,format=dsv,separator=|]
  120. |===
  121. @timestamp:date | client_ip:ip | event_duration:long | message:keyword
  122. 2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1
  123. 2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error
  124. 2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error
  125. 2023-10-23T13:51:54.732Z | 172.21.3.15 | 725448 | Connection error
  126. 2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected
  127. 2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2
  128. 2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3
  129. |===
  130. [discrete]
  131. [[esql-multi-index-index-metadata]]
  132. === Index metadata
  133. It can be helpful to know the particular index from which each row is sourced.
  134. To get this information, use the <<esql-metadata-fields,`METADATA`>> option on the <<esql-from,`FROM`>> command.
  135. [source.merge.styled,esql]
  136. ----
  137. FROM events_* METADATA _index
  138. | EVAL client_ip = TO_IP(client_ip)
  139. | KEEP _index, @timestamp, client_ip, event_duration, message
  140. | SORT @timestamp DESC
  141. ----
  142. [%header.monospaced.styled,format=dsv,separator=|]
  143. |===
  144. _index:keyword | @timestamp:date | client_ip:ip | event_duration:long | message:keyword
  145. events_ip | 2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1
  146. events_ip | 2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error
  147. events_ip | 2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error
  148. events_keyword | 2023-10-23T13:51:54.732Z | 172.21.3.15 | 725448 | Connection error
  149. events_keyword | 2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected
  150. events_keyword | 2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2
  151. events_keyword | 2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3
  152. |===