123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230 |
- [[redact-processor]]
- === Redact processor
- ++++
- <titleabbrev>Redact</titleabbrev>
- ++++
- The Redact processor obscures portions of text in the input document
- matching the given Grok patterns. A Grok pattern is like a regular
- expression with a named capturing group: text that matches the regular
- expression is replaced with the capture name. {es} comes packaged with
- a number of useful predefined {es-repo}blob/{branch}/libs/grok/src/main/resources/patterns/ecs-v1[patterns].
- If one of those does not suit your needs, create a new pattern with a
- custom pattern definition. The Redact processor replaces every occurrence
- of a match. If there are multiple matches all will be replaced with the
- capture name.
- The Redact processor is compatible with {ecs-ref}/ecs-field-reference.html[Elastic Common Schema (ECS)]
- patterns. Legacy Grok patterns are not supported.
- [[using-redact]]
- ==== Using the Redact processor in a pipeline
- [[redact-options]]
- .Redact Options
- [options="header"]
- |======
- | Name | Required | Default | Description
- | `field` | yes | - | The field to be redacted
- | `patterns` | yes | - | A list of grok expressions to match and redact named captures with
- | `pattern_definitions` | no | - | A map of pattern-name and pattern tuples defining custom patterns to be used by the processor. Patterns matching existing names will override the pre-existing definition
- | `prefix` | no | < | Start a redacted section with this token
- | `suffix` | no | > | End a redacted section with this token
- | `ignore_missing` | no | false | If `true` and `field` does not exist or is `null`, the processor quietly exits without modifying the document
- include::common-options.asciidoc[]
- |======
- In this example the predefined `IP` Grok pattern is used to match
- and redact an IP addresses from the `message` text field. The pipeline
- is tested using the Simulate API.
- [source,console]
- --------------------------------------------------
- POST _ingest/pipeline/_simulate
- {
- "pipeline": {
- "description" : "Hide my IP",
- "processors": [
- {
- "redact": {
- "field": "message",
- "patterns": ["%{IP:client}"]
- }
- }
- ]
- },
- "docs":[
- {
- "_source": {
- "message": "55.3.244.1 GET /index.html 15824 0.043"
- }
- }
- ]
- }
- --------------------------------------------------
- The document in the response still contains the `message` field
- but now the IP address `55.3.244.1` is replaced by the text `<client>`.
- [source,console-result]
- --------------------------------------------------
- {
- "docs": [
- {
- "doc": {
- "_index": "_index",
- "_id": "_id",
- "_version": "-3",
- "_source": {
- "message": "<client> GET /index.html 15824 0.043"
- },
- "_ingest": {
- "timestamp": "2023-02-01T16:08:39.419056008Z"
- }
- }
- }
- ]
- }
- --------------------------------------------------
- // TESTRESPONSE[s/2023-02-01T16:08:39.419056008Z/$body.docs.0.doc._ingest.timestamp/]
- The IP address is replaced with the word `client` because that is what is
- specified in the Grok pattern `%{IP:client}`. The `<` and `>` tokens which
- surround the pattern name are configurable using the `prefix` and `suffix` options.
- The next example defines multiple patterns both of which are replaced
- with the word `REDACTED` and the prefix and suffix tokens are set to `*`
- [source,console]
- --------------------------------------------------
- POST _ingest/pipeline/_simulate
- {
- "pipeline": {
- "description": "Hide my IP",
- "processors": [
- {
- "redact": {
- "field": "message",
- "patterns": [
- "%{IP:REDACTED}",
- "%{EMAILADDRESS:REDACTED}"
- ],
- "prefix": "*",
- "suffix": "*"
- }
- }
- ]
- },
- "docs": [
- {
- "_source": {
- "message": "55.3.244.1 GET /index.html 15824 0.043 test@elastic.co"
- }
- }
- ]
- }
- --------------------------------------------------
- In the response both the IP `55.3.244.1` and email address `test@elastic.co`
- have been replaced by `*REDACTED*`.
- [source,console-result]
- --------------------------------------------------
- {
- "docs": [
- {
- "doc": {
- "_index": "_index",
- "_id": "_id",
- "_version": "-3",
- "_source": {
- "message": "*REDACTED* GET /index.html 15824 0.043 *REDACTED*"
- },
- "_ingest": {
- "timestamp": "2023-02-01T16:53:14.560005377Z"
- }
- }
- }
- ]
- }
- --------------------------------------------------
- // TESTRESPONSE[s/2023-02-01T16:53:14.560005377Z/$body.docs.0.doc._ingest.timestamp/]
- [[redact-custom-patterns]]
- ==== Custom patterns
- If one of the existing Grok {es-repo}blob/{branch}/libs/grok/src/main/resources/patterns/ecs-v1[patterns]
- does not fit your requirements custom patterns can be added with the
- `pattern_definitions` option. New patterns definitions are composed of
- a pattern name and the pattern itself. The pattern may be a regular
- expression or reference existing Grok patterns.
- This example defines the custom pattern `GITHUB_NAME` to match
- GitHub usernames. The pattern definition uses the existing
- `USERNAME` Grok https://github.com/elastic/elasticsearch/blob/{branch}/libs/grok/src/main/resources/patterns/ecs-v1/grok-patterns[pattern] prefixed by the
- literal `@`.
- NOTE: The {kibana-ref}/xpack-grokdebugger.html[Grok Debugger] is a really
- useful tool for building custom patterns.
- [source,console]
- --------------------------------------------------
- POST _ingest/pipeline/_simulate
- {
- "pipeline": {
- "processors": [
- {
- "redact": {
- "field": "message",
- "patterns": [
- "%{GITHUB_NAME:GITHUB_NAME}"
- ],
- "pattern_definitions": {
- "GITHUB_NAME": "@%{USERNAME}"
- }
- }
- }
- ]
- },
- "docs": [
- {
- "_source": {
- "message": "@elastic-data-management the PR is ready for review"
- }
- }
- ]
- }
- --------------------------------------------------
- The username is redacted in the response.
- [source,console-result]
- --------------------------------------------------
- {
- "docs": [
- {
- "doc": {
- "_index": "_index",
- "_id": "_id",
- "_version": "-3",
- "_source": {
- "message": "<GITHUB_NAME> the PR is ready for review"
- },
- "_ingest": {
- "timestamp": "2023-02-01T16:53:14.560005377Z"
- }
- }
- }
- ]
- }
- --------------------------------------------------
- // TESTRESPONSE[s/2023-02-01T16:53:14.560005377Z/$body.docs.0.doc._ingest.timestamp/]
- [[grok-watchdog-redact]]
- ==== Grok watchdog
- The watchdog interrupts expressions that take too long to execute.
- When interrupted, the Redact processor fails with an error.
- The same <<grok-watchdog-options, settings>> that control the
- Grok Watchdog timeout also apply to the Redact processor.
|