[[redact-processor]]
=== Redact processor
++++
<titleabbrev>Redact</titleabbrev>
++++

The Redact processor obscures portions of text in the input document
that match the given Grok patterns. A Grok pattern is like a regular
expression with a named capturing group: text that matches the regular
expression is replaced with the capture name. {es} comes packaged with
a number of useful predefined {es-repo}blob/{branch}/libs/grok/src/main/resources/patterns/ecs-v1[patterns].
If none of those suits your needs, create a new pattern with a
custom pattern definition. The Redact processor replaces every occurrence
of a match: if there are multiple matches, all of them are replaced with
the capture name.

The Redact processor is compatible with {ecs-ref}/ecs-field-reference.html[Elastic Common Schema (ECS)]
patterns. Legacy Grok patterns are not supported.

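
The replacement step can be illustrated with an ordinary regular expression. The Python sketch below is a simplified model, not the processor's implementation: it assumes that a plain regex with a single named group, here `client`, stands in for an expanded Grok pattern such as `%{IP:client}`. The names `redact_text` and `SIMPLE_IPV4` are hypothetical and introduced only for illustration.

[source,python]
--------------------------------------------------
import re


def redact_text(text: str, pattern: str, prefix: str = "<", suffix: str = ">") -> str:
    """Replace every match of `pattern` with prefix + capture name + suffix."""
    compiled = re.compile(pattern)
    names = list(compiled.groupindex)
    if len(names) != 1:
        raise ValueError("expected exactly one named capture group")
    token = prefix + names[0] + suffix
    # re.sub replaces every non-overlapping match; a function replacement
    # inserts the token literally, so backslashes in prefix or suffix are safe
    return compiled.sub(lambda _m: token, text)


# Deliberately simplified IPv4 regex, much looser than the real IP Grok pattern
SIMPLE_IPV4 = r"(?P<client>\b(?:\d{1,3}\.){3}\d{1,3}\b)"

print(redact_text("55.3.244.1 GET /index.html 15824 0.043", SIMPLE_IPV4))
# <client> GET /index.html 15824 0.043
print(redact_text("10.0.0.1 -> 10.0.0.2", SIMPLE_IPV4))
# <client> -> <client>
--------------------------------------------------

Because `re.sub` replaces every non-overlapping match, both addresses in the second string become `<client>`.
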
[[using-redact]]
==== Using the Redact processor in a pipeline

[[redact-options]]
.Redact Options
[options="header"]
|======
| Name                  | Required | Default | Description
| `field`               | yes      | -       | The field to be redacted
| `patterns`            | yes      | -       | A list of Grok patterns to match. Each match is replaced with the pattern's capture name
| `pattern_definitions` | no       | -       | A map of pattern names to custom pattern definitions used by the processor. A definition with the same name as an existing pattern overrides the predefined one
| `prefix`              | no       | `<`     | The token that starts each redacted section
| `suffix`              | no       | `>`     | The token that ends each redacted section
| `ignore_missing`      | no       | `false` | If `true` and `field` does not exist or is `null`, the processor quietly exits without modifying the document
include::common-options.asciidoc[]
|======

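
To show how these options fit together, the following Python sketch applies a list of patterns to one top-level field of a document stored as a dict. It is a rough model under stated assumptions: `redact_document` and `IP` are hypothetical names, each pattern is a plain regex with exactly one named group rather than a Grok expression, dotted field paths and the common processor options are ignored, and raising `KeyError` when the field is missing and `ignore_missing` is `false` is only a stand-in for the processor's real failure behaviour.

[source,python]
--------------------------------------------------
import re
from typing import Any, Dict, List


def redact_document(
    doc: Dict[str, Any],
    field: str,
    patterns: List[str],
    prefix: str = "<",
    suffix: str = ">",
    ignore_missing: bool = False,
) -> Dict[str, Any]:
    """Hypothetical model of the Redact options; returns a redacted copy of doc."""
    value = doc.get(field)
    if value is None:  # field absent or null
        if ignore_missing:
            return dict(doc)  # exit quietly, document unchanged
        raise KeyError(f"field [{field}] is missing or null")  # assumed failure behaviour
    redacted = str(value)
    for pattern in patterns:
        compiled = re.compile(pattern)
        (name,) = compiled.groupindex  # assumes exactly one named group per pattern
        token = prefix + name + suffix
        # A function replacement inserts the token literally
        redacted = compiled.sub(lambda _m: token, redacted)
    result = dict(doc)
    result[field] = redacted
    return result


IP = r"(?P<client>\b(?:\d{1,3}\.){3}\d{1,3}\b)"  # simplified stand-in for %{IP:client}
doc = {"message": "55.3.244.1 GET /index.html 15824 0.043"}
print(redact_document(doc, "message", [IP], prefix="*", suffix="*"))
# {'message': '*client* GET /index.html 15824 0.043'}
--------------------------------------------------

With `ignore_missing=True`, calling `redact_document({}, "message", [IP], ignore_missing=True)` returns an empty dict instead of raising.
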
In this example, the predefined `IP` Grok pattern is used to match
and redact an IP address in the `message` text field. The pipeline
is tested using the Simulate API.

[source,console]
--------------------------------------------------
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description" : "Hide my IP",
    "processors": [
      {
        "redact": {
          "field": "message",
          "patterns": ["%{IP:client}"]
        }
      }
    ]
  },
  "docs":[
    {
      "_source": {
        "message": "55.3.244.1 GET /index.html 15824 0.043"
      }
    }
  ]
}
--------------------------------------------------

The document in the response still contains the `message` field,
but now the IP address `55.3.244.1` is replaced by the text `<client>`.

[source,console-result]
--------------------------------------------------
{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "message": "<client> GET /index.html 15824 0.043"
        },
        "_ingest": {
          "timestamp": "2023-02-01T16:08:39.419056008Z"
        }
      }
    }
  ]
}
--------------------------------------------------
// TESTRESPONSE[s/2023-02-01T16:08:39.419056008Z/$body.docs.0.doc._ingest.timestamp/]

The IP address is replaced with the word `client` because that is the
capture name specified in the Grok pattern `%{IP:client}`. The `<` and `>` tokens that
surround the capture name are configurable using the `prefix` and `suffix` options.

The next example defines two patterns, both of which replace matches with
the word `REDACTED`, and sets the prefix and suffix tokens to `*`:

[source,console]
--------------------------------------------------
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "Hide my IP",
    "processors": [
      {
        "redact": {
          "field": "message",
          "patterns": [
            "%{IP:REDACTED}",
            "%{EMAILADDRESS:REDACTED}"
          ],
          "prefix": "*",
          "suffix": "*"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "55.3.244.1 GET /index.html 15824 0.043 test@elastic.co"
      }
    }
  ]
}
--------------------------------------------------

In the response, both the IP address `55.3.244.1` and the email address `test@elastic.co`
have been replaced by `*REDACTED*`.

[source,console-result]
--------------------------------------------------
{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "message": "*REDACTED* GET /index.html 15824 0.043 *REDACTED*"
        },
        "_ingest": {
          "timestamp": "2023-02-01T16:53:14.560005377Z"
        }
      }
    }
  ]
}
--------------------------------------------------
// TESTRESPONSE[s/2023-02-01T16:53:14.560005377Z/$body.docs.0.doc._ingest.timestamp/]

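
One way to apply several patterns in a single pass over the text is to join them into one alternation and look up which alternative matched. The Python sketch below takes that approach. It is an illustrative assumption, not a description of how {es} combines patterns, and `IPV4` and `EMAIL` are simplified stand-ins for the `IP` and `EMAILADDRESS` Grok patterns. Because Python does not allow the same group name twice in one regex, each alternative gets a hypothetical internal group name (`g0`, `g1`, ...) that is mapped back to its capture name.

[source,python]
--------------------------------------------------
import re
from typing import List, Tuple


def redact_all(
    text: str,
    patterns: List[Tuple[str, str]],
    prefix: str = "<",
    suffix: str = ">",
) -> str:
    """Redact matches of several (capture_name, regex) pairs in one left-to-right pass."""
    # Wrap each regex in its own internal group so two patterns may share a capture name
    combined = re.compile(
        "|".join(f"(?P<g{i}>{regex})" for i, (_, regex) in enumerate(patterns))
    )
    capture_names = {f"g{i}": name for i, (name, _) in enumerate(patterns)}

    def to_token(match: "re.Match[str]") -> str:
        # lastgroup names the outer group of the alternative that matched
        return prefix + capture_names[match.lastgroup] + suffix

    return combined.sub(to_token, text)


# Simplified stand-ins for the IP and EMAILADDRESS Grok patterns
IPV4 = r"\b(?:\d{1,3}\.){3}\d{1,3}\b"
EMAIL = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

message = "55.3.244.1 GET /index.html 15824 0.043 test@elastic.co"
print(redact_all(message, [("REDACTED", IPV4), ("REDACTED", EMAIL)], prefix="*", suffix="*"))
# *REDACTED* GET /index.html 15824 0.043 *REDACTED*
--------------------------------------------------

Scanning once also means a token inserted for one pattern is never rescanned by a later pattern, which could happen if the patterns were applied one after another.
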
[[redact-custom-patterns]]
==== Custom patterns

If none of the existing Grok {es-repo}blob/{branch}/libs/grok/src/main/resources/patterns/ecs-v1[patterns]
fits your requirements, custom patterns can be added with the
`pattern_definitions` option. A new pattern definition is composed of
a pattern name and the pattern itself. The pattern can be a regular
expression, can reference existing Grok patterns, or can combine the two.

This example defines the custom pattern `GITHUB_NAME` to match
GitHub usernames. The pattern definition uses the existing
`USERNAME` Grok https://github.com/elastic/elasticsearch/blob/{branch}/libs/grok/src/main/resources/patterns/ecs-v1/grok-patterns[pattern] prefixed by the
literal `@`.

NOTE: The {kibana-ref}/xpack-grokdebugger.html[Grok Debugger] is a
useful tool for building and testing custom patterns.

[source,console]
--------------------------------------------------
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "redact": {
          "field": "message",
          "patterns": [
            "%{GITHUB_NAME:GITHUB_NAME}"
          ],
          "pattern_definitions": {
            "GITHUB_NAME": "@%{USERNAME}"
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "@elastic-data-management the PR is ready for review"
      }
    }
  ]
}
--------------------------------------------------

The username is redacted in the response.

[source,console-result]
--------------------------------------------------
{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "message": "<GITHUB_NAME> the PR is ready for review"
        },
        "_ingest": {
          "timestamp": "2023-02-01T16:53:14.560005377Z"
        }
      }
    }
  ]
}
--------------------------------------------------
// TESTRESPONSE[s/2023-02-01T16:53:14.560005377Z/$body.docs.0.doc._ingest.timestamp/]

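
Custom definitions compose because each `%{NAME}` or `%{NAME:capture}` reference is expanded into the regex it names before matching. The following Python sketch models that expansion under simplifying assumptions: `expand_grok` is a hypothetical helper that only knows the definitions passed to it, the `USERNAME` regex shown is an assumed form of the predefined pattern, a reference without a capture name becomes a non-capturing group, and recursive definitions are caught only by a depth limit.

[source,python]
--------------------------------------------------
import re
from typing import Dict

# Matches %{NAME} and %{NAME:capture}
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand_grok(pattern: str, definitions: Dict[str, str], depth: int = 0) -> str:
    """Expand Grok-style references into a plain Python regex (hypothetical helper)."""
    if depth > 20:
        raise ValueError("pattern nesting too deep (possible recursive definition)")

    def replace(match: "re.Match[str]") -> str:
        name, capture = match.group(1), match.group(2)
        if name not in definitions:
            raise KeyError(f"unknown pattern {name}")
        body = expand_grok(definitions[name], definitions, depth + 1)
        # A capture name produces a named group; a bare reference is only grouped
        return f"(?P<{capture}>{body})" if capture else f"(?:{body})"

    # A function replacement is inserted literally, so regex backslashes survive
    return REFERENCE.sub(replace, pattern)


definitions = {
    "USERNAME": r"[a-zA-Z0-9._-]+",  # assumed form of the predefined USERNAME pattern
    "GITHUB_NAME": r"@%{USERNAME}",  # the custom definition from the example above
}
regex = expand_grok("%{GITHUB_NAME:GITHUB_NAME}", definitions)
print(regex)
# (?P<GITHUB_NAME>@(?:[a-zA-Z0-9._-]+))
message = "@elastic-data-management the PR is ready for review"
print(re.sub(regex, "<GITHUB_NAME>", message))
# <GITHUB_NAME> the PR is ready for review
--------------------------------------------------
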
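[[grok-watchdog-redact]]
==== Grok watchdog

The Grok watchdog interrupts expressions that take too long to execute.
When an expression is interrupted, the Redact processor fails with an error.
The same <<grok-watchdog-options, settings>> that control the
Grok watchdog timeout also apply to the Redact processor.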
  205. When interrupted, the Redact processor fails with an error.
  206. The same <<grok-watchdog-options, settings>> that control the
  207. Grok Watchdog timeout also apply to the Redact processor.