123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277 |
- === LOOKUP JOIN
- ++++
- <titleabbrev>Correlate data with LOOKUP JOIN</titleabbrev>
- ++++
- // hack because page didn't have explicit id originally we could link to using internal link syntax
- [[esql-lookup-join-landing-page]]
- [WARNING]
- ====
- This functionality is in technical preview and may be
- changed or removed in a future release. Elastic will work to fix any
- issues, but features in technical preview are not subject to the support
- SLA of official GA features.
- ====
- The {esql} <<esql-lookup-join,LOOKUP join>>
- processing command combines data from your {esql} query results
- table with matching records from a specified lookup index. It adds
- fields from the lookup index as new columns to your results table based
- on matching values in the join field.
- Teams often have data scattered across multiple indices – like logs,
- IPs, user IDs, hosts, employees etc. Without a direct way to enrich or
- correlate each event with reference data, root-cause analysis, security
- checks, and operational insights become time-consuming.
- For example, you can use `LOOKUP JOIN` to:
- * Retrieve environment or ownership details for each host to correlate
- your metrics data.
- * Quickly see if any source IPs match known malicious addresses.
- * Tag logs with the owning team or escalation info for faster triage and
- incident response.
- [discrete]
- [[esql-compare-with-enrich]]
- ==== Compare with ENRICH
- <<esql-lookup-join,LOOKUP join>> is similar to <<esql-enrich-data,ENRICH>>
- in the fact that they both help you join data together. You should use
- `LOOKUP JOIN` when:
- * Your enrichment data changes frequently
- * You want to avoid index-time processing
- * You want SQL-like behavior, so that multiple matches result in multiple rows
- * You need to match on any field in a lookup index
- * You use document or field level security
- * You want to restrict users to use only specific lookup indices
- * You do not need to match using ranges or spatial relations
- [discrete]
- [[esql-how-lookup-join-works]]
- ==== How the command works
- The `LOOKUP JOIN` command adds fields from the lookup index as new columns
- to your results table based on matching values in the join field.
- [source,esql]
- ----
- LOOKUP JOIN <lookup_index> ON <field_name>
- ----
- The command requires two parameters:
- [[esql-lookup-join-lookup-index]]
- lookup_index::
- The name of the lookup index. This must
- be a specific index name - wildcards, aliases, and remote cluster
- references are not supported. Indices used for lookups must be configured with the <<index-mode-setting,`lookup` mode>>.
- [[esql-lookup-join-field-name]]
- field_name::
- The field to join on. This field must exist
- in both your current query results and in the lookup index. If the field
- contains multi-valued entries, those entries will not match anything
- (the added fields will contain `null` for those rows).
- image::images/esql/esql-lookup-join.png[align="center"]
- If you're familiar with SQL, `LOOKUP JOIN` has left-join behavior. This means that
- if no rows match in the lookup index, the incoming row is retained and `null` values are added. If many rows in the lookup index match, `LOOKUP JOIN` adds one row per match.
- [discrete]
- [[esql-lookup-join-example]]
- ==== Example
- You can run this example for yourself to see how it works by setting up the indices and adding sample data. Otherwise, you just inspect the query and response.
- [discrete]
- [[esql-lookup-join-example-setup-sample-data]]
- ===== Sample data
- .*Expand for setup instructions*
- [%collapsible]
- ==============
- **Set up indices**
- First, let's create two indices with mappings: `threat_list` and `firewall_logs`.
- [source,console]
- ----
- PUT threat_list
- {
- "settings": {
- "index.mode": "lookup" <1>
- },
- "mappings": {
- "properties": {
- "source.ip": { "type": "ip" },
- "threat_level": { "type": "keyword" },
- "threat_type": { "type": "keyword" },
- "last_updated": { "type": "date" }
- }
- }
- }
- ----
- <1> The lookup index must be set up using this mode
- [source,console]
- ----
- PUT firewall_logs
- {
- "mappings": {
- "properties": {
- "timestamp": { "type": "date" },
- "source.ip": { "type": "ip" },
- "destination.ip": { "type": "ip" },
- "action": { "type": "keyword" },
- "bytes_transferred": { "type": "long" }
- }
- }
- }
- ----
- *Add sample data*
- Next, let's add some sample data to both indices. The `threat_list` index contains known malicious IPs, while the `firewall_logs` index contains logs of network traffic.
- [source,console]
- ----
- POST threat_list/_bulk
- {"index":{}}
- {"source.ip":"203.0.113.5","threat_level":"high","threat_type":"C2_SERVER","last_updated":"2025-04-22"}
- {"index":{}}
- {"source.ip":"198.51.100.2","threat_level":"medium","threat_type":"SCANNER","last_updated":"2025-04-23"}
- ----
- [source,console]
- ----
- POST firewall_logs/_bulk
- {"index":{}}
- {"timestamp":"2025-04-23T10:00:01Z","source.ip":"192.0.2.1","destination.ip":"10.0.0.100","action":"allow","bytes_transferred":1024}
- {"index":{}}
- {"timestamp":"2025-04-23T10:00:05Z","source.ip":"203.0.113.5","destination.ip":"10.0.0.55","action":"allow","bytes_transferred":2048}
- {"index":{}}
- {"timestamp":"2025-04-23T10:00:08Z","source.ip":"198.51.100.2","destination.ip":"10.0.0.200","action":"block","bytes_transferred":0}
- {"index":{}}
- {"timestamp":"2025-04-23T10:00:15Z","source.ip":"203.0.113.5","destination.ip":"10.0.0.44","action":"allow","bytes_transferred":4096}
- {"index":{}}
- {"timestamp":"2025-04-23T10:00:30Z","source.ip":"192.0.2.1","destination.ip":"10.0.0.100","action":"allow","bytes_transferred":512}
- ----
- ==============
- [discrete]
- [[esql-lookup-join-example-query]]
- ===== Query the Data
- [source,esql]
- ----
- FROM firewall_logs <1>
- | LOOKUP JOIN threat_list ON source.ip <2>
- | WHERE threat_level IS NOT NULL <3>
- | SORT timestamp <4>
- | KEEP source.ip, action, threat_level, threat_type <5>
- | LIMIT 10 <6>
- ----
- <1> The source index
- <2> The lookup index and join field
- <3> Filter for rows with non-null threat levels
- <4> LOOKUP JOIN does not guarantee output order, so you must explicitly sort
- <5> Keep only relevant fields
- <6> Limit the output to 10 rows
- [discrete]
- [[esql-lookup-join-example-response]]
- ===== Response
- A successful query will output a table like this:
- [cols="4*",options="header"]
- |===
- |source.ip |action |threat_type |threat_level
- |203.0.113.5 |allow |C2_SERVER |high
- |198.51.100.2 |block |SCANNER |medium
- |203.0.113.5 |allow |C2_SERVER |high
- |===
- In this example, you can see that the `source.ip` field from the `firewall_logs` index is matched with the `source.ip` field in the `threat_list` index, and the corresponding `threat_level` and `threat_type` fields are added to the output.
- [discrete]
- [[esql-lookup-join-additional-examples]]
- ===== Additional examples
- Refer to the examples section of the <<esql-lookup-join,LOOKUP JOIN>> command reference for more examples.
- [discrete]
- [[esql-lookup-join-prereqs]]
- ==== Prerequisites
- To use `LOOKUP JOIN`, the following requirements must be met:
- * Indices used for lookups must be configured with the <<index-mode-setting,`lookup` mode>>
- * *Compatible data types*: The join key and join field in the lookup
- index must have compatible data types. This means:
- ** The data types must either be identical or be internally represented
- as the same type in {esql}
- ** Numeric types follow these compatibility rules:
- *** `short` and `byte` are compatible with `integer` (all represented as
- `int`)
- *** `float`, `half_float`, and `scaled_float` are compatible
- with `double` (all represented as `double`)
- ** For text fields: You can only use text fields as the join key on the
- left-hand side of the join and only if they have a `.keyword` subfield
- To obtain a join key with a compatible type, use a
- <<esql-type-conversion-functions,conversion function>> if needed.
- For a complete list of supported data types and their internal
- representations, see the <<esql-supported-types,Supported Field Types documentation>>.
- [discrete]
- [[esql-lookup-join-usage-notes]]
- ==== Usage notes
- This section covers important details about `LOOKUP JOIN` that impact query behavior and results. Review these details to ensure your queries work as expected and to troubleshoot unexpected results.
- [discrete]
- [[esql-lookup-join-usage-notes-name-collisions]]
- ===== Handling name collisions
- When fields from the lookup index match existing column names, the new columns override the existing ones.
- Before the `LOOKUP JOIN` command, preserve columns by either:
- * Using `RENAME` to assign non-conflicting names
- * Using `EVAL` to create new columns with different names
- [discrete]
- [[esql-lookup-join-usage-notes-sorting]]
- ===== Sorting behavior
- The output rows produced by `LOOKUP JOIN` can be in any order and may not
- respect preceding `SORT` commands. To guarantee a certain ordering, place a `SORT` after any `LOOKUP JOIN` commands.
- [discrete]
- [[esql-lookup-join-limitations]]
- ==== Limitations
- The following are the current limitations with `LOOKUP JOIN`
- * Indices in <<index-mode-setting,`lookup`>> mode are always single-sharded.
- * Cross cluster search is unsupported initially. Both source and lookup indices
- must be local.
- * Currently, only matching on equality is supported.
- * `LOOKUP JOIN` can only use a single match field and a single index.
- Wildcards, aliases, datemath, and datastreams are not supported.
- * The name of the match field in
- `LOOKUP JOIN lu++_++idx ON match++_++field` must match an existing field
- in the query. This may require renames or evals to achieve.
- * The query will circuit break if there are too many matching documents
- in the lookup index, or if the documents are too large. More precisely,
- `LOOKUP JOIN` works in batches of, normally, about 10,000 rows; a large
- amount of heap space is needed if the matching documents from the lookup
- index for a batch are multiple megabytes or larger. This is roughly the
- same as for `ENRICH`.
|