123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277 |
- [[processors]]
- == Ingest processor reference
- ++++
- <titleabbrev>Processor reference</titleabbrev>
- ++++
- An <<ingest,ingest pipeline>> is made up of a sequence of processors that are applied to documents as they are ingested into an index.
- Each processor performs a specific task, such as filtering, transforming, or enriching data.
- Each successive processor depends on the output of the previous processor, so the order of processors is important.
- The modified documents are indexed into {es} after all processors are applied.
- {es} includes over 40 configurable processors.
- The subpages in this section contain reference documentation for each processor.
- To get a list of available
- processors, use the <<cluster-nodes-info,nodes info>> API.
- [source,console]
- ----
- GET _nodes/ingest?filter_path=nodes.*.ingest.processors
- ----
- [discrete]
- [[ingest-processors-categories]]
- === Ingest processors by category
- We've categorized the available processors on this page and summarized their functions.
- This will help you find the right processor for your use case.
- * <<ingest-process-category-data-enrichment>>
- * <<ingest-process-category-data-transformation>>
- * <<ingest-process-category-data-filtering>>
- * <<ingest-process-category-pipeline-handling>>
- * <<ingest-process-category-array-json-handling>>
- [discrete]
- [[ingest-process-category-data-enrichment]]
- === Data enrichment processors
- [discrete]
- [[ingest-process-category-data-enrichment-general]]
- ==== General outcomes
- <<append-processor, `append` processor>>::
- Appends a value to a field.
- <<date-index-name-processor, `date_index_name` processor>>::
- Points documents to the right time-based index based on a date or timestamp field.
- <<enrich-processor, `enrich` processor>>::
- Enriches documents with data from another index.
- [TIP]
- ====
- Refer to <<ingest-enriching-data, Enrich your data>> for detailed examples of how to use the `enrich` processor to add data from your existing indices to incoming documents during ingest.
- ====
- <<inference-processor, `inference` processor>>::
- Uses {ml} to classify and tag text fields.
- [discrete]
- [[ingest-process-category-data-enrichment-specific]]
- ==== Specific outcomes
- <<attachment, `attachment` processor>>::
- Parses and indexes binary data, such as PDFs and Word documents.
- <<ingest-circle-processor, `circle` processor>>::
- Converts a location field to a Geo-Point field.
- <<community-id-processor, `community_id` processor>>::
- Computes the Community ID for network flow data.
- <<fingerprint-processor, `fingerprint` processor>>::
- Computes a hash of the document’s content.
- <<ingest-geo-grid-processor, `geo_grid` processor>>::
- Converts geo-grid definitions of grid tiles or cells to regular bounding boxes or polygons which describe their shape.
- <<geoip-processor, `geoip` processor>>::
- Adds information about the geographical location of an IPv4 or IPv6 address from a Maxmind database.
- <<ip-location-processor, `ip_location` processor>>::
- Adds information about the geographical location of an IPv4 or IPv6 address from an ip geolocation database.
- <<network-direction-processor, `network_direction` processor>>::
- Calculates the network direction given a source IP address, destination IP address, and a list of internal networks.
- <<normalize-for-stream-processor, `normalize_for_stream` processor>>::
- Normalizes non-OpenTelemetry documents to be OpenTelemetry-compliant.
- <<registered-domain-processor, `registered_domain` processor>>::
- Extracts the registered domain (also known as the effective top-level domain or eTLD), sub-domain, and top-level domain from a fully qualified domain name (FQDN).
- <<ingest-node-set-security-user-processor, `set_security_user` processor>>::
- Sets user-related details (such as `username`, `roles`, `email`, `full_name`,`metadata`, `api_key`, `realm` and `authentication_type`) from the current authenticated user.
- <<uri-parts-processor, `uri_parts` processor>>::
- Parses a Uniform Resource Identifier (URI) string and extracts its components as an object.
- <<urldecode-processor, `urldecode` processor>>::
- URL-decodes a string.
- <<user-agent-processor, `user_agent` processor>>::
- Parses user-agent strings to extract information about web clients.
- [discrete]
- [[ingest-process-category-data-transformation]]
- === Data transformation processors
- [discrete]
- [[ingest-process-category-data-transformation-general]]
- ==== General outcomes
- <<convert-processor, `convert` processor>>::
- Converts a field in the currently ingested document to a different type, such as converting a string to an integer.
- <<dissect-processor, `dissect` processor>>::
- Extracts structured fields out of a single text field within a document.
- Unlike the <<grok-processor,grok processor>>, dissect does not use regular expressions.
- This makes the dissect's a simpler and often faster alternative.
- <<grok-processor, `grok` processor>>::
- Extracts structured fields out of a single text field within a document, using the <<grok, Grok>> regular expression dialect that supports reusable aliased expressions.
- <<gsub-processor, `gsub` processor>>::
- Converts a string field by applying a regular expression and a replacement.
- <<redact-processor, `redact` processor>>::
- Uses the <<grok, Grok>> rules engine to obscure text in the input document matching the given Grok patterns.
- <<rename-processor, `rename` processor>>::
- Renames an existing field.
- <<set-processor, `set` processor>>::
- Sets a value on a field.
- [discrete]
- [[ingest-process-category-data-transformation-specific]]
- ==== Specific outcomes
- <<bytes-processor, `bytes` processor>>::
- Converts a human-readable byte value to its value in bytes (for example `1kb` becomes `1024`).
- <<csv-processor, `csv` processor>>::
- Extracts a single line of CSV data from a text field.
- <<date-processor, `date` processor>>::
- Extracts and converts date fields.
- <<dot-expand-processor, `dot_expand` processor>>::
- Expands a field with dots into an object field.
- <<htmlstrip-processor, `html_strip` processor>>::
- Removes HTML tags from a field.
- <<join-processor, `join` processor>>::
- Joins each element of an array into a single string using a separator character between each element.
- <<kv-processor, `kv` processor>>::
- Parse messages (or specific event fields) containing key-value pairs.
- <<lowercase-processor, `lowercase` processor>> and <<uppercase-processor, `uppercase` processor>>::
- Converts a string field to lowercase or uppercase.
- <<split-processor, `split` processor>>::
- Splits a field into an array of values.
- <<trim-processor, `trim` processor>>::
- Trims whitespace from field.
- [discrete]
- [[ingest-process-category-data-filtering]]
- === Data filtering processors
- <<drop-processor, `drop` processor>>::
- Drops the document without raising any errors.
- <<remove-processor, `remove` processor>>::
- Removes fields from documents.
- [discrete]
- [[ingest-process-category-pipeline-handling]]
- === Pipeline handling processors
- <<fail-processor, `fail` processor>>::
- Raises an exception. Useful for when you expect a pipeline to fail and want to relay a specific message to the requester.
- <<pipeline-processor, `pipeline` processor>>::
- Executes another pipeline.
- <<reroute-processor, `reroute` processor>>::
- Reroutes documents to another target index or data stream.
- <<terminate-processor, `terminate` processor>>::
- Terminates the current ingest pipeline, causing no further processors to be run.
- [discrete]
- [[ingest-process-category-array-json-handling]]
- === Array/JSON handling processors
- <<foreach-processor, `for_each` processor>>::
- Runs an ingest processor on each element of an array or object.
- <<json-processor, `json` processor>>::
- Converts a JSON string into a structured JSON object.
- <<script-processor, `script` processor>>::
- Runs an inline or stored <<modules-scripting, script>> on incoming documents.
- The script runs in the {painless}/painless-ingest-processor-context.html[painless `ingest` context].
- <<sort-processor, `sort` processor>>::
- Sorts the elements of an array in ascending or descending order.
- [discrete]
- [[ingest-process-plugins]]
- === Add additional processors
- You can install additional processors as {plugins}/ingest.html[plugins].
- You must install any plugin processors on all nodes in your cluster. Otherwise,
- {es} will fail to create pipelines containing the processor.
- Mark a plugin as mandatory by setting `plugin.mandatory` in
- `elasticsearch.yml`. A node will fail to start if a mandatory plugin is not
- installed.
- [source,yaml]
- ----
- plugin.mandatory: my-ingest-plugin
- ----
- include::processors/append.asciidoc[]
- include::processors/attachment.asciidoc[]
- include::processors/bytes.asciidoc[]
- include::processors/circle.asciidoc[]
- include::processors/community-id.asciidoc[]
- include::processors/convert.asciidoc[]
- include::processors/csv.asciidoc[]
- include::processors/date.asciidoc[]
- include::processors/date-index-name.asciidoc[]
- include::processors/dissect.asciidoc[]
- include::processors/dot-expand.asciidoc[]
- include::processors/drop.asciidoc[]
- include::processors/enrich.asciidoc[]
- include::processors/fail.asciidoc[]
- include::processors/fingerprint.asciidoc[]
- include::processors/foreach.asciidoc[]
- include::processors/geo-grid.asciidoc[]
- include::processors/geoip.asciidoc[]
- include::processors/grok.asciidoc[]
- include::processors/gsub.asciidoc[]
- include::processors/html_strip.asciidoc[]
- include::processors/inference.asciidoc[]
- include::processors/ip-location.asciidoc[]
- include::processors/join.asciidoc[]
- include::processors/json.asciidoc[]
- include::processors/kv.asciidoc[]
- include::processors/lowercase.asciidoc[]
- include::processors/network-direction.asciidoc[]
- include::processors/normalize-for-stream.asciidoc[]
- include::processors/pipeline.asciidoc[]
- include::processors/redact.asciidoc[]
- include::processors/registered-domain.asciidoc[]
- include::processors/remove.asciidoc[]
- include::processors/rename.asciidoc[]
- include::processors/reroute.asciidoc[]
- include::processors/script.asciidoc[]
- include::processors/set.asciidoc[]
- include::processors/set-security-user.asciidoc[]
- include::processors/sort.asciidoc[]
- include::processors/split.asciidoc[]
- include::processors/terminate.asciidoc[]
- include::processors/trim.asciidoc[]
- include::processors/uppercase.asciidoc[]
- include::processors/url-decode.asciidoc[]
- include::processors/uri-parts.asciidoc[]
- include::processors/user-agent.asciidoc[]
|