=== LOOKUP JOIN

++++
<titleabbrev>Correlate data with LOOKUP JOIN</titleabbrev>
++++

The {esql} <<esql-lookup-join,`LOOKUP JOIN`>>
processing command combines data from your {esql} query results
table with matching records from a specified lookup index. It adds
fields from the lookup index as new columns to your results table based
on matching values in the join field.

Teams often have data scattered across multiple indices, such as logs,
IPs, user IDs, hosts, and employees. Without a direct way to enrich or
correlate each event with reference data, root-cause analysis, security
checks, and operational insights become time-consuming.

For example, you can use `LOOKUP JOIN` to:

* Retrieve environment or ownership details for each host to correlate
with your metrics data.
* Quickly see if any source IPs match known malicious addresses.
* Tag logs with the owning team or escalation info for faster triage and
incident response.
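
As a minimal sketch of the second use case, the query below checks log events against a list of known malicious addresses. The index names `firewall_logs` and `threat_list` and the `threat_level` field are hypothetical; the sketch assumes that `threat_list` is a lookup index and that both indices contain a `source.ip` field.

[source,esql]
----
FROM firewall_logs
| LOOKUP JOIN threat_list ON source.ip
// Rows without a match have null in the added columns, so this keeps only the hits
| WHERE threat_level IS NOT NULL
| KEEP source.ip, threat_level
----
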

<<esql-lookup-join,`LOOKUP JOIN`>> is similar to <<esql-enrich-data,`ENRICH`>>
in that both help you join data together. Use `LOOKUP JOIN` when:

* Your enrichment data changes frequently
* You want to avoid index-time processing
* You're working with regular indices
* You need to preserve distinct matches
* You need to match on any field in a lookup index
* You use document or field level security
* You want to restrict users to specific lookup indices that they can
use

[discrete]
[[esql-how-lookup-join-works]]
==== How the `LOOKUP JOIN` command works

The `LOOKUP JOIN` command adds new columns to a table, with data from
{es} indices.

image::images/esql/esql-lookup-join.png[align="center"]

[[esql-lookup-join-lookup-index]]
lookup_index::
The name of the lookup index. This must
be a specific index name; wildcards, aliases, and remote cluster
references are not supported.

[[esql-lookup-join-field-name]]
field_name::
The field to join on. This field must exist
in both your current query results and in the lookup index. If the field
contains multi-valued entries, those entries will not match anything
(the added fields will contain `null` for those rows).
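
The two parameters fit together roughly as follows. This is a schematic form rather than a runnable query: `<source_index>` is a placeholder introduced here for whichever index the query reads from, and `<lookup_index>` and `<field_name>` stand for the parameters described above.

[source,esql]
----
FROM <source_index>
| LOOKUP JOIN <lookup_index> ON <field_name>
----
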

[discrete]
[[esql-lookup-join-example]]
==== Example

`LOOKUP JOIN` has left-join behavior. If no rows match in the lookup index, `LOOKUP JOIN` retains the incoming row and adds `null`s. If many rows in the lookup index match, `LOOKUP JOIN` adds one row per match.

In this example, we have two sample tables:

*employees*

[cols=",,,,,",options="header",]
|===
|birth++_++date |emp++_++no |first++_++name |gender |hire++_++date |language
|1955-10-04T00:00:00Z |10091 |Amabile |M |1992-11-18T00:00:00Z |3
|1964-10-18T00:00:00Z |10092 |Valdiodio |F |1989-09-22T00:00:00Z |1
|1964-06-11T00:00:00Z |10093 |Sailaja |M |1996-11-05T00:00:00Z |3
|1957-05-25T00:00:00Z |10094 |Arumugam |F |1987-04-18T00:00:00Z |5
|1965-01-03T00:00:00Z |10095 |Hilari |M |1986-07-15T00:00:00Z |4
|===

*languages++_++non++_++unique++_++key*

[cols=",,",options="header",]
|===
|language++_++code |language++_++name |country
|1 |English |Canada
|1 |English |
|1 | |United Kingdom
|1 |English |United States of America
|2 |German |++[++Germany{vbar}Austria++]++
|2 |German |Switzerland
|2 |German |
|4 |Quenya |
|5 | |Atlantis
|++[++6{vbar}7++]++ |Mv-Lang |Mv-Land
|++[++7{vbar}8++]++ |Mv-Lang2 |Mv-Land2
| |Null-Lang |Null-Land
| |Null-Lang2 |Null-Land2
|===

Running the following query would provide the results shown below.

[source,esql]
----
FROM employees
// Derive the join key from the last digit of emp_no
| EVAL language_code = emp_no % 10
| LOOKUP JOIN languages_lookup_non_unique_key ON language_code
| WHERE emp_no > 10090 AND emp_no < 10096
| SORT emp_no, country
| KEEP emp_no, language_code, language_name, country
----

[cols=",,,",options="header",]
|===
|emp++_++no |language++_++code |language++_++name |country
|10091 |1 |English |Canada
|10091 |1 |null |United Kingdom
|10091 |1 |English |United States of America
|10091 |1 |English |null
|10092 |2 |German |++[++Germany, Austria++]++
|10092 |2 |German |Switzerland
|10092 |2 |German |null
|10093 |3 |null |null
|10094 |4 |Quenya |null
|10095 |5 |null |Atlantis
|===

[IMPORTANT]
====
`LOOKUP JOIN` does not guarantee any particular output order. If you
need a specific order, add a
<<esql-sort,`SORT`>> somewhere after the `LOOKUP JOIN`.
====
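
Because of the left-join behavior, rows without a match can be found by filtering on `null` in the added columns. The following sketch reuses the sample query. A `null` can also come from a matching lookup row with an empty field, so the filter checks both added columns; that choice suits this sample data and is an assumption rather than a general rule.

[source,esql]
----
FROM employees
| EVAL language_code = emp_no % 10
| LOOKUP JOIN languages_lookup_non_unique_key ON language_code
| WHERE emp_no > 10090 AND emp_no < 10096
// Keep only rows where every added column is null
| WHERE language_name IS NULL AND country IS NULL
| KEEP emp_no, language_code
----

With the sample tables, only employee 10093 (language code 3) should remain, since no row in the lookup table has that code.
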

[discrete]
[[esql-lookup-join-prereqs]]
==== Prerequisites

To use `LOOKUP JOIN`, the following requirements must be met:

* *Compatible data types*: The join key and join field in the lookup
index must have compatible data types. This means:
** The data types must either be identical or be internally represented
as the same type in Elasticsearch's type system
** Numeric types follow these compatibility rules:
*** `short` and `byte` are compatible with `integer` (all represented as
`int`)
*** `float`, `half_float`, and `scaled_float` are compatible
with `double` (all represented as `double`)
** For text fields: You can use text fields on the left-hand side of the
join only if they have a `.keyword` subfield

For a complete list of supported data types and their internal
representations, see the <<esql-supported-types,Supported Field Types documentation>>.
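
When the join key in your query results has an incompatible type, one option is to convert it before the join. The sketch below assumes a hypothetical `events` index whose `lang_code` field is mapped as `keyword`, while `language_code` in the lookup index is an `integer`. It uses the {esql} `TO_INTEGER` function to produce a compatible key.

[source,esql]
----
FROM events
// lang_code is a keyword here; convert it to match the integer key in the lookup index
| EVAL language_code = TO_INTEGER(lang_code)
| LOOKUP JOIN languages_lookup_non_unique_key ON language_code
| KEEP lang_code, language_code, language_name, country
----

Values that cannot be converted become `null`, so it is worth checking the added columns for unexpected `null`s after the join.
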

[discrete]
[[esql-lookup-join-limitations]]
==== Limitations

The following are the current limitations of `LOOKUP JOIN`:

* `LOOKUP JOIN` will be successful if the join field in the lookup index
is a `KEYWORD` type. If the main index's join field is `TEXT` type, it
must have an exact `.keyword` subfield that can be matched with the
lookup index's `KEYWORD` field.
* Indices in <<index-mode-setting,lookup>> mode are always single-sharded.
* Cross-cluster search is unsupported. Both source and lookup indices
must be local.
* `LOOKUP JOIN` can only use a single match field and a single index.
Wildcards, aliases, date math, and data streams are not supported.
* The name of the match field in
`LOOKUP JOIN lu++_++idx ON match++_++field` must match an existing field
in the query. This may require a `RENAME` or `EVAL` to achieve.
* The query will circuit break if there are too many matching documents
in the lookup index, or if the documents are too large. More precisely,
`LOOKUP JOIN` normally works in batches of about 10,000 rows; a large
amount of heap space is needed if the matching documents from the lookup
index for a batch are multiple megabytes or larger. This is roughly the
same as for `ENRICH`.
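
The match-field naming limitation can be worked around with `RENAME`. As a sketch, assume you want to join on the `language` column of the *employees* sample table rather than on a derived value. The lookup index calls that key `language_code`, so the column is renamed first.

[source,esql]
----
FROM employees
// Give the column the same name as the join field in the lookup index
| RENAME language AS language_code
| LOOKUP JOIN languages_lookup_non_unique_key ON language_code
| SORT emp_no, country
| KEEP emp_no, language_code, language_name, country
----

Use `EVAL language_code = language` instead if you also want to keep the original column.
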