
[DOCS] Refactor ingest pipeline docs (#70253)

James Rodewig, 4 years ago
commit 010a973018

+ 0 - 16
docs/plugins/ingest.asciidoc

@@ -13,20 +13,4 @@ The core ingest plugins are:
 The ingest attachment plugin lets Elasticsearch extract file attachments in common formats (such as PPT, XLS, and PDF) by
 using the Apache text extraction library https://tika.apache.org/[Tika].
 
-<<ingest-geoip>>::
-
-The `geoip` processor adds information about the geographical location of IP
-addresses, based on data from the Maxmind databases. This processor adds this
-information by default under the `geoip` field. The `geoip` processor is no
-longer distributed as a plugin, but is now a module distributed by default with
-Elasticsearch. See {ref}/geoip-processor.html[GeoIP processor] for more
-details.
-
-<<ingest-user-agent>>::
-
-A processor that extracts details from the User-Agent header value. The
-`user_agent` processor is no longer distributed as a plugin, but is now a module
-distributed by default with Elasticsearch. See
-{ref}/user-agent-processor.html[User Agent processor] for more details.
-
 include::ingest-attachment.asciidoc[]

BIN
docs/reference/images/ingest/ingest-pipeline-list.png


BIN
docs/reference/images/ingest/ingest-pipeline-processor.png


BIN
docs/reference/images/ingest/test-a-pipeline.png


+ 2 - 0
docs/reference/index-modules.asciidoc

@@ -273,6 +273,7 @@ are ignored for this index.
     The length of time that a <<delete-versioning,deleted document's version number>> remains available for <<index-versioning,further versioned operations>>.
     Defaults to `60s`.
 
+[[index-default-pipeline]]
  `index.default_pipeline`::
 
     The default <<ingest,ingest node>> pipeline for this index. Index requests will fail
@@ -280,6 +281,7 @@ are ignored for this index.
     overridden using the `pipeline` parameter. The special pipeline name `_none` indicates
     no ingest pipeline should be run.
 
+[[index-final-pipeline]]
  `index.final_pipeline`::
     The final <<ingest,ingest node>> pipeline for this index. Index requests
     will fail if the final pipeline is set and the pipeline does not exist.

+ 648 - 57
docs/reference/ingest.asciidoc

@@ -1,91 +1,682 @@
 [[ingest]]
-= Ingest node
+= Ingest pipelines
 
-[partintro]
---
-Use an ingest node to pre-process documents before the actual document indexing happens.
-The ingest node intercepts bulk and index requests, it applies transformations, and it then
-passes the documents back to the index or bulk APIs.
+Ingest pipelines let you perform common transformations on your data before
+indexing. For example, you can use pipelines to remove fields, extract values
+from text, and enrich your data.
 
-All nodes enable ingest by default, so any node can handle ingest tasks. To
-create a dedicated ingest node, configure the <<modules-node,`node.roles`>>
-setting in `elasticsearch.yml` as follows:
+A pipeline consists of a series of configurable tasks called
+<<processors,processors>>. Each processor runs sequentially, making specific
+changes to incoming documents. After the processors have run, {es} adds the
+transformed documents to your data stream or index.
 
-[source,yaml]
+image::images/ingest/ingest-process.svg[Ingest pipeline diagram,align="center"]
+
+You can create and manage ingest pipelines using {kib}'s **Ingest Node
+Pipelines** feature or the <<ingest-apis,ingest APIs>>. {es} stores pipelines in
+the <<cluster-state,cluster state>>.
+
+[discrete]
+[[ingest-prerequisites]]
+=== Prerequisites
+
+* Nodes with the <<node-ingest-node,`ingest`>> node role handle pipeline
+processing. To use ingest pipelines, your cluster must have at least one node
+with the `ingest` role. For heavy ingest loads, we recommend creating
+<<node-ingest-node,dedicated ingest nodes>> (see the example after this list).
+
+* If the {es} security features are enabled, you must have the `manage_pipeline`
+<<privileges-list-cluster,cluster privilege>> to manage ingest pipelines. To use
+{kib}'s **Ingest Node Pipelines** feature, you also need the
+`cluster:monitor/nodes/info` cluster privilege.
+
+* Pipelines including the `enrich` processor require additional setup. See
+<<ingest-enriching-data>>.
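+
+To create a dedicated ingest node, set the node's roles in `elasticsearch.yml`.
+The following sketch gives the node only the `ingest` role; adjust the list to
+include any other roles the node should keep.
+
+[source,yaml]
+----
+node.roles: [ ingest ]
+----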
+
+[discrete]
+[[create-manage-ingest-pipelines]]
+== Create and manage pipelines
+
+In {kib}, open the main menu and click **Stack Management** > **Ingest Node
+Pipelines**. From the list view, you can:
+
+* View a list of your pipelines and drill down into details
+* Edit or clone existing pipelines
+* Delete pipelines
+
+To create a new pipeline, click **Create a pipeline**. For an example tutorial,
+see <<common-log-format-example>>.
+
+[role="screenshot"]
+image::images/ingest/ingest-pipeline-list.png[Kibana's Ingest Node Pipelines list view,align="center"]
+
+You can also use the <<ingest-apis,ingest APIs>> to create and manage pipelines.
+The following <<put-pipeline-api,create pipeline API>> request creates
+a pipeline containing two <<set-processor,`set`>> processors followed by a
+<<lowercase-processor,`lowercase`>> processor. The processors run sequentially
+in the order specified.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "description": "My pipeline description",
+  "processors": [
+    {
+      "set": {
+        "field": "my-long-field",
+        "value": 10
+      }
+    },
+    {
+      "set": {
+        "field": "my-boolean-field",
+        "value": true
+      }
+    },
+    {
+      "lowercase": {
+        "field": "my-keyword-field"
+      }
+    }
+  ]
+}
+----
+// TESTSETUP
+
+[discrete]
+[[test-pipeline]]
+=== Test a pipeline
+
+Before using a pipeline in production, we recommend you test it using sample
+documents. When creating or editing a pipeline in {kib}, click **Add
+documents**. In the **Documents** tab, provide sample documents and click **Run
+the pipeline**.
+
+[role="screenshot"]
+image::images/ingest/test-a-pipeline.png[Test a pipeline in Kibana,align="center"]
+
+You can also test pipelines using the <<simulate-pipeline-api,simulate pipeline
+API>>.
+
+[source,console]
+----
+POST _ingest/pipeline/my-pipeline/_simulate
+{
+  "docs": [
+    {
+      "_source": {
+        "my-keyword-field": "FOO"
+      }
+    },
+    {
+      "_source": {
+        "my-keyword-field": "BAR"
+      }
+    }
+  ]
+}
 ----
-node.roles: [ ingest ]
+
+The API returns transformed documents:
+
+[source,console-result]
+----
+{
+  "docs": [
+    {
+      "doc": {
+        "_index": "_index",
+        "_id": "_id",
+        "_source": {
+          "my-long-field": 10,
+          "my-boolean-field": true,
+          "my-keyword-field": "foo"
+        },
+        "_ingest": {
+          "timestamp": "2099-02-30T22:30:03.187Z"
+        }
+      }
+    },
+    {
+      "doc": {
+        "_index": "_index",
+        "_id": "_id",
+        "_source": {
+          "my-long-field": 10,
+          "my-boolean-field": true,
+          "my-keyword-field": "bar"
+        },
+        "_ingest": {
+          "timestamp": "2099-02-30T22:30:03.188Z"
+        }
+      }
+    }
+  ]
+}
 ----
+// TESTRESPONSE[s/"2099-02-30T22:30:03.187Z"/$body.docs.0.doc._ingest.timestamp/]
+// TESTRESPONSE[s/"2099-02-30T22:30:03.188Z"/$body.docs.1.doc._ingest.timestamp/]
 
-To disable ingest for a node, specify the `node.roles` setting and exclude
-`ingest` from the listed roles.
+[discrete]
+[[add-pipeline-to-indexing-request]]
+=== Add a pipeline to an indexing request
 
-To pre-process documents before indexing, <<pipeline,define a pipeline>> that specifies a series of
-<<ingest-processors,processors>>. Each processor transforms the document in some specific way. For example, a
-pipeline might have one processor that removes a field from the document, followed by
-another processor that renames a field. The <<cluster-state,cluster state>> then stores
-the configured pipelines.
+Use the `pipeline` query parameter to apply a pipeline to documents in
+<<docs-index_,individual>> or <<docs-bulk,bulk>> indexing requests.
 
-To use a pipeline, simply specify the `pipeline` parameter on an index or bulk request. This
-way, the ingest node knows which pipeline to use.
+[source,console]
+----
+POST my-data-stream/_doc?pipeline=my-pipeline
+{
+  "@timestamp": "2099-03-07T11:04:05.000Z",
+  "my-keyword-field": "foo"
+}
 
-For example:
-Create a pipeline
+PUT my-data-stream/_bulk?pipeline=my-pipeline
+{ "create":{ } }
+{ "@timestamp": "2099-03-08T11:04:05.000Z", "my-keyword-field" : "foo" }
+{ "create":{ } }
+{ "@timestamp": "2099-03-08T11:06:07.000Z", "my-keyword-field" : "bar" }
+----
+
+You can also use the `pipeline` parameter with the <<docs-update-by-query,update
+by query>> or <<docs-reindex,reindex>> APIs.
 
 [source,console]
---------------------------------------------------
-PUT _ingest/pipeline/my_pipeline_id
+----
+POST my-data-stream/_update_by_query?pipeline=my-pipeline
+
+POST _reindex
+{
+  "source": {
+    "index": "my-data-stream"
+  },
+  "dest": {
+    "index": "my-new-data-stream",
+    "op_type": "create",
+    "pipeline": "my-pipeline"
+  }
+}
+----
+// TEST[continued]
+
+[discrete]
+[[set-default-pipeline]]
+=== Set a default pipeline
+
+Use the <<index-default-pipeline,`index.default_pipeline`>> index setting to set
+a default pipeline. {es} applies this pipeline if no `pipeline` parameter
+is specified.
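+
+For example, the following sketch creates a hypothetical `my-index` index that
+uses `my-pipeline` (created earlier on this page) as its default pipeline.
+
+[source,console]
+----
+PUT my-index
+{
+  "settings": {
+    "index.default_pipeline": "my-pipeline"
+  }
+}
+----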
+
+[discrete]
+[[set-final-pipeline]]
+=== Set a final pipeline
+
+Use the <<index-final-pipeline,`index.final_pipeline`>> index setting to set a
+final pipeline. {es} applies this pipeline after the request or default
+pipeline, even if neither is specified.
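+
+Because `index.final_pipeline` is a <<dynamic-index-settings,dynamic index
+setting>>, you can also apply it to an existing index. The following sketch
+assumes a hypothetical existing `my-index` index.
+
+[source,console]
+----
+PUT my-index/_settings
+{
+  "index.final_pipeline": "my-pipeline"
+}
+----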
+
+[discrete]
+[[access-source-fields]]
+=== Access source fields in a processor
+
+Processors have read and write access to an incoming document's source fields.
+To access a field key in a processor, use its field name. The following `set`
+processor accesses `my-long-field`.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "set": {
+        "field": "my-long-field",
+        "value": 10
+      }
+    }
+  ]
+}
+----
+
+You can also prepend the `_source` prefix.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "set": {
+        "field": "_source.my-long-field",
+        "value": 10
+      }
+    }
+  ]
+}
+----
+
+Use dot notation to access object fields.
+
+IMPORTANT: If your document contains flattened objects, use the
+<<dot-expand-processor,`dot_expander`>> processor to expand them first. Other
+ingest processors cannot access flattened objects.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "dot_expander": {
+        "field": "my-object-field.my-property"
+      }
+    },
+    {
+      "set": {
+        "field": "my-object-field.my-property",
+        "value": 10
+      }
+    }
+  ]
+}
+----
+
+[[template-snippets]]
+To access field values, enclose the field name in double curly brackets `{{ }}`
+to create a https://mustache.github.io[Mustache] template snippet. You can use
+template snippets to dynamically set field names. The following processor
+creates a field named after the `service` field's value and sets it to the
+`code` field's value.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "set": {
+        "field": "{{service}}",
+        "value": "{{code}}"
+      }
+    }
+  ]
+}
+----
+
+[discrete]
+[[access-metadata-fields]]
+=== Access metadata fields in a processor
+
+Processors can access the following metadata fields by name:
+
+* `_index`
+* `_id`
+* `_routing`
+
+For example, the following `set` processor sets the document's routing value to
+the `geoip.country_iso_code` field value.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
 {
-  "description" : "describe pipeline",
   "processors" : [
     {
       "set" : {
+        "field": "_routing",
+        "value": "{{geoip.country_iso_code}}"
+      }
+    }
+  ]
+}
+----
+
+Use a Mustache template snippet to access metadata field values. For example,
+`{{_routing}}` retrieves a document's routing value.
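+
+For example, the following sketch copies the document's index name into a
+hypothetical `copied-index-name` field.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "set": {
+        "field": "copied-index-name",
+        "value": "{{_index}}"
+      }
+    }
+  ]
+}
+----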
+
+WARNING: If you <<create-document-ids-automatically,automatically generate>>
+document IDs, you cannot use `{{_id}}` in a processor. {es} assigns
+auto-generated `_id` values after ingest.
+
+[discrete]
+[[access-ingest-metadata]]
+=== Access ingest metadata in a processor
+
+Ingest processors can add and access ingest metadata using the `_ingest` key.
+
+Unlike source and metadata fields, {es} does not index ingest metadata fields by
+default. {es} also allows source fields that start with an `_ingest` key. If
+your data includes such source fields, use `_source._ingest` to access them.
+
+Pipelines only create the `_ingest.timestamp` ingest metadata field by default.
+This field contains a timestamp of when {es} received the document's indexing
+request. To index `_ingest.timestamp` or other ingest metadata fields, use the
+`set` processor.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "set": {
+        "field": "received",
+        "value": "{{_ingest.timestamp}}"
+      }
+    }
+  ]
+}
+----
+
+[discrete]
+[[handling-pipeline-failures]]
+=== Handling pipeline failures
+
+A pipeline's processors run sequentially. By default, pipeline processing stops
+when one of these processors fails or encounters an error.
+
+To ignore a processor failure and run the pipeline's remaining processors, set
+`ignore_failure` to `true`.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "rename": {
         "field": "foo",
-        "value": "new"
+        "target_field": "bar",
+        "ignore_failure": true
       }
     }
   ]
 }
---------------------------------------------------
+----
 
-Index with defined pipeline
+Use the `on_failure` parameter to specify a list of processors to run
+immediately after a processor failure. If you specify `on_failure`, {es} still
+runs the pipeline's remaining processors after the failure, even if the
+`on_failure` configuration is empty.
 
 [source,console]
---------------------------------------------------
-PUT my-index-000001/_doc/my-id?pipeline=my_pipeline_id
+----
+PUT _ingest/pipeline/my-pipeline
 {
-  "foo": "bar"
+  "processors": [
+    {
+      "rename": {
+        "field": "foo",
+        "target_field": "bar",
+        "on_failure": [
+          {
+            "set": {
+              "field": "error.message",
+              "value": "field \"foo\" does not exist, cannot rename to \"bar\"",
+              "override": false
+            }
+          }
+        ]
+      }
+    }
+  ]
 }
---------------------------------------------------
-// TEST[continued]
+----
 
-Response:
+Nest a list of `on_failure` processors for nested error handling.
 
-[source,console-result]
---------------------------------------------------
-{
-  "_index" : "my-index-000001",
-  "_id" : "my-id",
-  "_version" : 1,
-  "result" : "created",
-  "_shards" : {
-    "total" : 2,
-    "successful" : 2,
-    "failed" : 0
-  },
-  "_seq_no" : 0,
-  "_primary_term" : 1
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "rename": {
+        "field": "foo",
+        "target_field": "bar",
+        "on_failure": [
+          {
+            "set": {
+              "field": "error.message",
+              "value": "field \"foo\" does not exist, cannot rename to \"bar\"",
+              "override": false,
+              "on_failure": [
+                {
+                  "set": {
+                    "field": "error.message.multi",
+                    "value": "Document encountered multiple ingest errors",
+                    "override": true
+                  }
+                }
+              ]
+            }
+          }
+        ]
+      }
+    }
+  ]
+}
+----
+
+You can also specify `on_failure` for a pipeline.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [ ... ],
+  "on_failure": [
+    {
+      "set": {
+        "field": "_index",
+        "value": "failed-{{ _index }}"
+      }
+    }
+  ]
+}
+----
+// TEST[s/\.\.\./{"lowercase": {"field":"my-keyword-field"}}/]
+
+[discrete]
+[[conditionally-run-processor]]
+=== Conditionally run a processor
+
+Each processor supports an optional `if` condition, written as a
+{painless}/painless-guide.html[Painless script]. If provided, the processor only
+runs when the `if` condition is `true`.
+
+IMPORTANT: `if` condition scripts run in Painless's
+{painless}/painless-ingest-processor-context.html[ingest processor context]. In
+`if` conditions, `ctx` values are read-only.
+
+The following <<drop-processor,`drop`>> processor uses an `if` condition to drop
+documents with a `network_name` of `Guest`.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "drop": {
+        "if": "ctx?.network_name == 'Guest'"
+      }
+    }
+  ]
+}
+----
+
+If the static `script.painless.regex.enabled` cluster setting is enabled, you
+can use regular expressions in your `if` condition scripts. For supported
+syntax, see the {painless}/painless-regexes.html[Painless regexes]
+documentation.
+
+TIP: If possible, avoid using regular expressions. Expensive regular expressions
+can slow indexing speeds.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "set": {
+        "if": "ctx.href?.url =~ /^http[^s]/",
+        "field": "href.insecure",
+        "value": true
+      }
+    }
+  ]
+}
+----
+
+You must specify `if` conditions as valid JSON on a single line. However, you
+can use the {kibana-ref}/console-kibana.html#configuring-console[{kib}
+console]'s triple quote syntax to write and debug larger scripts.
+
+TIP: If possible, avoid using complex or expensive `if` condition scripts.
+Expensive condition scripts can slow indexing speeds.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "drop": {
+        "if": """
+            Collection tags = ctx.tags;
+            if(tags != null){
+              for (String tag : tags) {
+                if (tag.toLowerCase().contains('prod')) {
+                  return false;
+                }
+              }
+            }
+            return true;
+        """
+      }
+    }
+  ]
 }
---------------------------------------------------
-// TESTRESPONSE[s/"successful" : 2/"successful" : 1/]
+----
 
-An index may also declare a <<dynamic-index-settings,default pipeline>> that will be used in the
-absence of the `pipeline` parameter.
+You can also specify a <<modules-scripting-stored-scripts,stored script>> as the
+`if` condition.
+
+[source,console]
+----
+PUT _scripts/my-stored-script
+{
+  "script": {
+    "lang": "painless",
+    "source": """
+      Collection tags = ctx.tags;
+      if(tags != null){
+        for (String tag : tags) {
+          if (tag.toLowerCase().contains('prod')) {
+            return false;
+          }
+        }
+      }
+      return true;
+    """
+  }
+}
+
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "drop": {
+        "if": { "id": "my-stored-script" }
+      }
+    }
+  ]
+}
+----
 
-Finally, an index may also declare a <<dynamic-index-settings,final pipeline>>
-that will be executed after any request or default pipeline (if any).
+Incoming documents often contain object fields. If a processor script attempts
+to access a field whose parent object does not exist, {es} returns a
+`NullPointerException`. To avoid these exceptions, use
+{painless}/painless-operators-reference.html#null-safe-operator[null safe
+operators], such as `?.`, and write your scripts to be null safe.
 
-See <<ingest-apis,Ingest APIs>> for more information about creating, adding, and deleting pipelines.
+For example, `ctx.network?.name.equalsIgnoreCase('Guest')` is not null safe.
+`ctx.network?.name` can return null. Rewrite the script as
+`'Guest'.equalsIgnoreCase(ctx.network?.name)`, which is null safe because
+`Guest` is always non-null.
 
---
+If you can't rewrite a script to be null safe, include an explicit null check.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "drop": {
+        "if": "ctx.network?.name != null && ctx.network.name.contains('Guest')"
+      }
+    }
+  ]
+}
+----
+
+[discrete]
+[[conditionally-apply-pipelines]]
+=== Conditionally apply pipelines
+
+Combine an `if` condition with the <<pipeline-processor,`pipeline`>> processor
+to apply other pipelines to documents based on your criteria. You can use this
+pipeline as the <<set-default-pipeline,default pipeline>> in an
+<<index-templates,index template>> used to configure multiple data streams or
+indices.
+
+The following pipeline applies different pipelines to incoming documents based
+on the `service.name` field value.
+
+[source,console]
+----
+PUT _ingest/pipeline/one-pipeline-to-rule-them-all
+{
+  "processors": [
+    {
+      "pipeline": {
+        "if": "ctx.service?.name == 'apache_httpd'",
+        "name": "httpd_pipeline"
+      }
+    },
+    {
+      "pipeline": {
+        "if": "ctx.service?.name == 'syslog'",
+        "name": "syslog_pipeline"
+      }
+    },
+    {
+      "fail": {
+        "if": "ctx.service?.name != 'apache_httpd' && ctx.service?.name != 'syslog'",
+        "message": "This pipeline requires service.name to be either `syslog` or `apache_httpd`"
+      }
+    }
+  ]
+}
+----
+
+[discrete]
+[[get-pipeline-usage-stats]]
+=== Get pipeline usage statistics
+
+Use the <<cluster-nodes-stats,node stats>> API to get global and per-pipeline
+ingest statistics. Use these stats to determine which pipelines run most
+frequently or spend the most time processing.
+
+[source,console]
+----
+GET _nodes/stats/ingest?filter_path=nodes.*.ingest
+----
 
-include::ingest/ingest-node.asciidoc[]
+include::ingest/common-log-format-example.asciidoc[]
+include::ingest/enrich.asciidoc[]
+include::ingest/processors.asciidoc[]

+ 3 - 4
docs/reference/ingest/apis/put-pipeline.asciidoc

@@ -54,20 +54,19 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=master-timeout]
 ==== {api-response-body-title}
 
 `description`::
-(Required, string)
+(Optional, string)
 Description of the ingest pipeline.
 
 `processors`::
 +
 --
-(Required, array of <<ingest-processors,processor objects>>)
+(Required, array of <<processors,processor objects>>)
 Array of processors used to pre-process documents
 before indexing.
 
 Processors are executed in the order provided.
 
-See <<ingest-processors>> for processor object definitions
-and a list of built-in processors.
+See <<processors>>.
 --
 
 `version`::

+ 2 - 2
docs/reference/ingest/apis/simulate-pipeline.asciidoc

@@ -108,13 +108,13 @@ Description of the ingest pipeline.
 `processors`::
 +
 --
-(Optional, array of <<ingest-processors,processor objects>>)
+(Optional, array of <<processors,processor objects>>)
 Array of processors used to pre-process documents
 during ingest.
 
 Processors are executed in the order provided.
 
-See <<ingest-processors>> for processor object definitions
+See <<processors>> for processor object definitions
 and a list of built-in processors.
 --
 

+ 196 - 0
docs/reference/ingest/common-log-format-example.asciidoc

@@ -0,0 +1,196 @@
+[[common-log-format-example]]
+== Example: Parse logs in the Common Log Format
+++++
+<titleabbrev>Example: Parse logs</titleabbrev>
+++++
+
+In this example tutorial, you’ll use an <<ingest,ingest pipeline>> to parse
+server logs in the {wikipedia}/Common_Log_Format[Common Log Format] before
+indexing. Before starting, check the <<ingest-prerequisites,prerequisites>> for
+ingest pipelines.
+
+The logs you want to parse look similar to this:
+
+[source,js]
+----
+212.87.37.154 - - [30/May/2099:16:21:15 +0000] \"GET /favicon.ico HTTP/1.1\"
+200 3638 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6)
+AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\"
+----
+// NOTCONSOLE
+
+These logs contain an IP address, timestamp, and user agent. You want to give
+these three items their own field in {es} for faster searches and
+visualizations. You also want to know where the request is coming from.
+
+. In {kib}, open the main menu and click **Stack Management** > **Ingest Node
+Pipelines**.
++
+[role="screenshot"]
+image::images/ingest/ingest-pipeline-list.png[Kibana's Ingest Node Pipelines list view,align="center"]
+
+. Click **Create a pipeline**.
+. Provide a name and description for the pipeline.
+. Add a <<grok-processor,grok processor>> to parse the log message:
+
+.. Click **Add a processor** and select the **Grok** processor type.
+.. Set the field input to `message` and enter the following <<grok-basics,grok
+pattern>>:
++
+[source,js]
+----
+%{IPORHOST:client.ip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:@timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:user_agent}
+----
+// NOTCONSOLE
++
+.. Click **Add** to save the processor.
+
+. Add processors to map the date, IP, and user agent fields. Map the appropriate
+field to each processor type:
++
+--
+* <<date-processor,**Date**>>: `@timestamp`
+* <<geoip-processor,**GeoIP**>>: `client.ip`
+* <<user-agent-processor,**User agent**>>: `user_agent`
+
+In the **Date** processor, specify the date format you want to use:
+`dd/MMM/YYYY:HH:mm:ss Z`.
+--
+Your form should look similar to this:
++
+[role="screenshot"]
+image::images/ingest/ingest-pipeline-processor.png[Processors for Ingest Node Pipelines,align="center"]
++
+The four processors will run sequentially: +
+Grok > Date > GeoIP > User agent +
+You can reorder processors using the arrow icons.
++
+Alternatively, you can click the **Import processors** link and define the
+processors as JSON:
++
+[source,console]
+----
+{
+  "processors": [
+    {
+      "grok": {
+        "field": "message",
+        "patterns": ["%{IPORHOST:client.ip} %{USER:ident} %{USER:auth} \\[%{HTTPDATE:@timestamp}\\] \"%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:user_agent}"]
+      }
+    },
+    {
+      "date": {
+        "field": "@timestamp",
+        "formats": [ "dd/MMM/YYYY:HH:mm:ss Z" ]
+      }
+    },
+    {
+      "geoip": {
+        "field": "client.ip"
+      }
+    },
+    {
+      "user_agent": {
+        "field": "user_agent"
+      }
+    }
+  ]
+}
+----
+// TEST[s/^/PUT _ingest\/pipeline\/my-pipeline\n/]
+
+. To test the pipeline, click **Add documents**.
+
+. In the **Documents** tab, provide a sample document for testing:
++
+[source,js]
+----
+[
+  {
+    "_source": {
+      "message": "212.87.37.154 - - [05/May/2099:16:21:15 +0000] \"GET /favicon.ico HTTP/1.1\" 200 3638 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\""
+    }
+  }
+]
+----
+// NOTCONSOLE
+
+. Click **Run the pipeline** and verify the pipeline worked as expected.
+
+. If everything looks correct, close the panel, and then click **Create
+pipeline**.
++
+You’re now ready to load the logs data using the <<docs-index_,index API>>.
+
+. Index a document with the pipeline you created.
++
+[source,console]
+----
+PUT my-index/_doc/1?pipeline=my-pipeline
+{
+  "message": "212.87.37.154 - - [05/May/2099:16:21:15 +0000] \"GET /favicon.ico HTTP/1.1\" 200 3638 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\""
+}
+----
+// TEST[continued]
+
+. To verify, run:
++
+[source,console]
+----
+GET my-index/_doc/1
+----
+// TEST[continued]
+
+////
+[source,console-result]
+----
+{
+  "_index": "my-index",
+  "_id": "1",
+  "_version": 1,
+  "_seq_no": 0,
+  "_primary_term": 1,
+  "found": true,
+  "_source": {
+    "request": "/favicon.ico",
+    "geoip": {
+      "continent_name": "Europe",
+      "region_iso_code": "DE-BE",
+      "city_name": "Berlin",
+      "country_iso_code": "DE",
+      "country_name": "Germany",
+      "region_name": "Land Berlin",
+      "location": {
+        "lon": 13.4978,
+        "lat": 52.411
+      }
+    },
+    "auth": "-",
+    "ident": "-",
+    "verb": "GET",
+    "message": "212.87.37.154 - - [05/May/2099:16:21:15 +0000] \"GET /favicon.ico HTTP/1.1\" 200 3638 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\"",
+    "referrer": "\"-\"",
+    "@timestamp": "2098-12-29T16:21:15.000Z",
+    "response": 200,
+    "bytes": 3638,
+    "client": {
+      "ip": "212.87.37.154"
+    },
+    "httpversion": "1.1",
+    "user_agent": {
+      "original": "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\"",
+      "os": {
+        "name": "Mac OS X",
+        "version": "10.11.6",
+        "full": "Mac OS X 10.11.6"
+      },
+      "name": "Chrome",
+      "device": {
+        "name": "Mac"
+      },
+      "version": "52.0.2743.116"
+    }
+  }
+}
+----
+////

+ 5 - 10
docs/reference/ingest/enrich.asciidoc

@@ -17,18 +17,13 @@ For example, you can use the enrich processor to:
 [[how-enrich-works]]
 === How the enrich processor works
 
-An <<ingest,ingest pipeline>> changes documents before they are actually
-indexed. You can think of an ingest pipeline as an assembly line made up of a
-series of workers, called <<ingest-processors,processors>>. Each processor makes
-specific changes, like lowercasing field values, to incoming documents before
-moving on to the next. When all the processors in a pipeline are done, the
-finished document is added to the target index.
+Most processors are self-contained and only change _existing_ data in incoming
+documents.
 
 image::images/ingest/ingest-process.svg[align="center"]
 
-Most processors are self-contained and only change _existing_ data in incoming
-documents. But the enrich processor adds _new_ data to incoming documents
-and requires a few special components:
+The enrich processor adds _new_ data to incoming documents and requires a few
+special components:
 
 image::images/ingest/enrich/enrich-process.svg[align="center"]
 
@@ -193,7 +188,7 @@ added as an array.
 
 See <<enrich-processor>> for a full list of configuration options.
 
-You also can add other <<ingest-processors,processors>> to your ingest pipeline.
+You also can add other <<processors,processors>> to your ingest pipeline.
 
 [[ingest-enrich-docs]]
 ==== Ingest and enrich documents

+ 0 - 907
docs/reference/ingest/ingest-node.asciidoc

@@ -1,907 +0,0 @@
-[[pipeline]]
-== Pipeline Definition
-
-A pipeline is a definition of  a series of <<ingest-processors, processors>> that are to be executed
-in the same order as they are declared. A pipeline consists of two main fields: a `description`
-and a list of `processors`:
-
-[source,js]
---------------------------------------------------
-{
-  "description" : "...",
-  "processors" : [ ... ]
-}
---------------------------------------------------
-// NOTCONSOLE
-
-The `description` is a special field to store a helpful description of
-what the pipeline does.
-
-The `processors` parameter defines a list of processors to be executed in
-order.
-
-[[accessing-data-in-pipelines]]
-== Accessing Data in Pipelines
-
-The processors in a pipeline have read and write access to documents that pass through the pipeline.
-The processors can access fields in the source of a document and the document's metadata fields.
-
-[discrete]
-[[accessing-source-fields]]
-=== Accessing Fields in the Source
-Accessing a field in the source is straightforward. You simply refer to fields by
-their name. For example:
-
-[source,js]
---------------------------------------------------
-{
-  "set": {
-    "field": "my_field",
-    "value": 582.1
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-
-On top of this, fields from the source are always accessible via the `_source` prefix:
-
-[source,js]
---------------------------------------------------
-{
-  "set": {
-    "field": "_source.my_field",
-    "value": 582.1
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-
-[discrete]
-[[accessing-metadata-fields]]
-=== Accessing Metadata Fields
-You can access metadata fields in the same way that you access fields in the source. This
-is possible because Elasticsearch doesn't allow fields in the source that have the
-same name as metadata fields.
-
-The following metadata fields are accessible by a processor:
-
-* `_index`
-* `_id`
-* `_routing`
-
-The following example sets the `_id` metadata field of a document to `1`:
-
-[source,js]
---------------------------------------------------
-{
-  "set": {
-    "field": "_id",
-    "value": "1"
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-
-You can access a metadata field's value by surrounding it in double
-curly brackets `"{{ }}"`. For example, `{{_index}}` retrieves the name of a
-document's index.
-
-WARNING: If you <<create-document-ids-automatically,automatically generate>>
-document IDs, you cannot use the `{{_id}}` value in an ingest processor. {es}
-assigns auto-generated `_id` values after ingest.
-
-[discrete]
-[[accessing-ingest-metadata]]
-=== Accessing Ingest Metadata Fields
-Beyond metadata fields and source fields, ingest also adds ingest metadata to the documents that it processes.
-These metadata properties are accessible under the `_ingest` key. Currently ingest adds the ingest timestamp
-under the `_ingest.timestamp` key of the ingest metadata. The ingest timestamp is the time when Elasticsearch
-received the index or bulk request to pre-process the document.
-
-Any processor can add ingest-related metadata during document processing. Ingest metadata is transient
-and is lost after a document has been processed by the pipeline. Therefore, ingest metadata won't be indexed.
-
-The following example adds a field with the name `received`. The value is the ingest timestamp:
-
-[source,js]
---------------------------------------------------
-{
-  "set": {
-    "field": "received",
-    "value": "{{_ingest.timestamp}}"
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-
-Unlike Elasticsearch metadata fields, the ingest metadata field name `_ingest` can be used as a valid field name
-in the source of a document. Use `_source._ingest` to refer to the field in the source document. Otherwise, `_ingest`
-will be interpreted as an ingest metadata field.
-
-[discrete]
-[[accessing-template-fields]]
-=== Accessing Fields and Metafields in Templates
-A number of processor settings also support templating. Settings that support templating can have zero or more
-template snippets. A template snippet begins with `{{` and ends with `}}`.
-Accessing fields and metafields in templates is exactly the same as via regular processor field settings.
-
-The following example adds a field named `field_c`. Its value is a concatenation of
-the values of `field_a` and `field_b`.
-
-[source,js]
---------------------------------------------------
-{
-  "set": {
-    "field": "field_c",
-    "value": "{{field_a}} {{field_b}}"
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-
-The following example uses the value of the `geoip.country_iso_code` field in the source
-to set the index that the document will be indexed into:
-
-[source,js]
---------------------------------------------------
-{
-  "set": {
-    "field": "_index",
-    "value": "{{geoip.country_iso_code}}"
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-
-Dynamic field names are also supported. This example sets the field named after the
-value of `service` to the value of the field `code`:
-
-[source,js]
---------------------------------------------------
-{
-  "set": {
-    "field": "{{service}}",
-    "value": "{{code}}"
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-
-[[ingest-conditionals]]
-== Conditional Execution in Pipelines
-
-Each processor allows for an optional `if` condition to determine if that
-processor should be executed or skipped. The value of the `if` is a
-<<modules-scripting-painless, Painless>> script that needs to evaluate
-to `true` or `false`.
-
-For example the following processor will <<drop-processor,drop>> the document
-(i.e. not index it) if the input document has a field named `network_name`
-and it is equal to `Guest`.
-
-[source,console]
---------------------------------------------------
-PUT _ingest/pipeline/drop_guests_network
-{
-  "processors": [
-    {
-      "drop": {
-        "if": "ctx.network_name == 'Guest'"
-      }
-    }
-  ]
-}
---------------------------------------------------
-
-Using that pipeline for an index request:
-
-[source,console]
---------------------------------------------------
-POST test/_doc/1?pipeline=drop_guests_network
-{
-  "network_name" : "Guest"
-}
---------------------------------------------------
-// TEST[continued]
-
-Results in nothing indexed since the conditional evaluated to `true`.
-
-[source,console-result]
---------------------------------------------------
-{
-  "_index": "test",
-  "_id": "1",
-  "_version": -3,
-  "result": "noop",
-  "_shards": {
-    "total": 0,
-    "successful": 0,
-    "failed": 0
-  }
-}
---------------------------------------------------
-
-
-[[ingest-conditional-nullcheck]]
-=== Handling Nested Fields in Conditionals
-
-Source documents often contain nested fields. Care should be taken
-to avoid NullPointerExceptions if the parent object does not exist
-in the document. For example `ctx.a.b.c` can throw an NullPointerExceptions
-if the source document does not have top level `a` object, or a second
-level `b` object.
-
-To help protect against NullPointerExceptions, null safe operations should be used.
-Fortunately, Painless makes {painless}/painless-operators-reference.html#null-safe-operator[null safe]
-operations easy with the `?.` operator.
-
-[source,console]
---------------------------------------------------
-PUT _ingest/pipeline/drop_guests_network
-{
-  "processors": [
-    {
-      "drop": {
-        "if": "ctx.network?.name == 'Guest'"
-      }
-    }
-  ]
-}
---------------------------------------------------
-
-The following document will get <<drop-processor,dropped>> correctly:
-
-[source,console]
---------------------------------------------------
-POST test/_doc/1?pipeline=drop_guests_network
-{
-  "network": {
-    "name": "Guest"
-  }
-}
---------------------------------------------------
-// TEST[continued]
-
-Thanks to the `?.` operator the following document will not throw an error.
-If the pipeline used a `.` the following document would throw a NullPointerException
-since the `network` object is not part of the source document.
-
-[source,console]
---------------------------------------------------
-POST test/_doc/2?pipeline=drop_guests_network
-{
-  "foo" : "bar"
-}
---------------------------------------------------
-// TEST[continued]
-
-////
-Hidden example assertion:
-[source,console]
---------------------------------------------------
-GET test/_doc/2
---------------------------------------------------
-// TEST[continued]
-
-[source,console-result]
---------------------------------------------------
-{
-  "_index": "test",
-  "_id": "2",
-  "_version": 1,
-  "_seq_no": 22,
-  "_primary_term": 1,
-  "found": true,
-  "_source": {
-    "foo": "bar"
-  }
-}
---------------------------------------------------
-// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term": 1/"_primary_term" : $body._primary_term/]
-////
-
-The source document can also use dot delimited fields to represent nested fields.
-
-For example instead the source document defining the fields nested:
-
-[source,js]
---------------------------------------------------
-{
-  "network": {
-    "name": "Guest"
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-
-The source document may have the nested fields flattened as such:
-[source,js]
---------------------------------------------------
-{
-  "network.name": "Guest"
-}
---------------------------------------------------
-// NOTCONSOLE
-
-If this is the case, use the <<dot-expand-processor, Dot Expand Processor>>
-so that the nested fields may be used in a conditional.
-
-[source,console]
---------------------------------------------------
-PUT _ingest/pipeline/drop_guests_network
-{
-  "processors": [
-    {
-      "dot_expander": {
-        "field": "network.name"
-      }
-    },
-    {
-      "drop": {
-        "if": "ctx.network?.name == 'Guest'"
-      }
-    }
-  ]
-}
---------------------------------------------------
-
-Now the following input document can be used with a conditional in the pipeline.
-
-[source,console]
---------------------------------------------------
-POST test/_doc/3?pipeline=drop_guests_network
-{
-  "network.name": "Guest"
-}
---------------------------------------------------
-// TEST[continued]
-
-The `?.` operators works well for use in the `if` conditional
-because the {painless}/painless-operators-reference.html#null-safe-operator[null safe operator]
-returns null if the object is null and `==` is null safe (as well as many other
-{painless}/painless-operators.html[painless operators]).
-
-However, calling a method such as `.equalsIgnoreCase` is not null safe
-and can result in a NullPointerException.
-
-Some situations allow for the same functionality but done so in a null safe manner.
-For example: `'Guest'.equalsIgnoreCase(ctx.network?.name)` is null safe because
-`Guest` is always non null, but `ctx.network?.name.equalsIgnoreCase('Guest')` is not null safe
-since `ctx.network?.name` can return null.
-
-Some situations require an explicit null check. In the following example there
-is not null safe alternative, so an explicit null check is needed.
-
-[source,js]
---------------------------------------------------
-{
-  "drop": {
-    "if": "ctx.network?.name != null && ctx.network.name.contains('Guest')"
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-
-[[ingest-conditional-complex]]
-=== Complex Conditionals
-The `if` condition can be more complex than a simple equality check.
-The full power of the <<modules-scripting-painless, Painless Scripting Language>> is available and
-running in the {painless}/painless-ingest-processor-context.html[ingest processor context].
-
-IMPORTANT: The value of ctx is read-only in `if` conditions.
-
-A more complex `if` condition that drops the document (i.e. not index it)
-unless it has a multi-valued tag field with at least one value that contains the characters
-`prod` (case insensitive).
-
-[source,console]
---------------------------------------------------
-PUT _ingest/pipeline/not_prod_dropper
-{
-  "processors": [
-    {
-      "drop": {
-        "if": "Collection tags = ctx.tags;if(tags != null){for (String tag : tags) {if (tag.toLowerCase().contains('prod')) { return false;}}} return true;"
-      }
-    }
-  ]
-}
---------------------------------------------------
-
-The conditional needs to be all on one line since JSON does not
-support new line characters. However, Kibana's console supports
-a triple quote syntax to help with writing and debugging
-scripts like these.
-
-[source,console]
---------------------------------------------------
-PUT _ingest/pipeline/not_prod_dropper
-{
-  "processors": [
-    {
-      "drop": {
-        "if": """
-            Collection tags = ctx.tags;
-            if(tags != null){
-              for (String tag : tags) {
-                  if (tag.toLowerCase().contains('prod')) {
-                      return false;
-                  }
-              }
-            }
-            return true;
-        """
-      }
-    }
-  ]
-}
---------------------------------------------------
-// TEST[continued]
-
-or it can be built with a stored script:
-
-[source,console]
---------------------------------------------------
-PUT _scripts/not_prod
-{
-  "script": {
-    "lang": "painless",
-    "source": """
-        Collection tags = ctx.tags;
-        if(tags != null){
-          for (String tag : tags) {
-              if (tag.toLowerCase().contains('prod')) {
-                  return false;
-              }
-          }
-        }
-        return true;
-    """
-  }
-}
-PUT _ingest/pipeline/not_prod_dropper
-{
-  "processors": [
-    {
-      "drop": {
-        "if": { "id": "not_prod" }
-      }
-    }
-  ]
-}
---------------------------------------------------
-// TEST[continued]
-
-Either way, you can run it with:
-
-[source,console]
---------------------------------------------------
-POST test/_doc/1?pipeline=not_prod_dropper
-{
-  "tags": ["application:myapp", "env:Stage"]
-}
---------------------------------------------------
-// TEST[continued]
-
-The document is <<drop-processor,dropped>> since `prod` (case insensitive)
-is not found in the tags.
-
-The following document is indexed (i.e. not dropped) since
-`prod` (case insensitive) is found in the tags.
-
-[source,console]
---------------------------------------------------
-POST test/_doc/2?pipeline=not_prod_dropper
-{
-  "tags": ["application:myapp", "env:Production"]
-}
---------------------------------------------------
-// TEST[continued]
-
-////
-Hidden example assertion:
-[source,console]
---------------------------------------------------
-GET test/_doc/2
---------------------------------------------------
-// TEST[continued]
-
-[source,console-result]
---------------------------------------------------
-{
-  "_index": "test",
-  "_id": "2",
-  "_version": 1,
-  "_seq_no": 34,
-  "_primary_term": 1,
-  "found": true,
-  "_source": {
-    "tags": [
-      "application:myapp",
-      "env:Production"
-    ]
-  }
-}
---------------------------------------------------
-// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]
-////
-
-
-
-The <<simulate-pipeline-api>> with verbose can be used to help build out
-complex conditionals. If the conditional evaluates to false it will be
-omitted from the verbose results of the simulation since the document will not change.
-
-Care should be taken to avoid overly complex or expensive conditional checks
-since the condition needs to be checked for each and every document.
-
-[[conditionals-with-multiple-pipelines]]
-=== Conditionals with the Pipeline Processor
-The combination of the `if` conditional and the <<pipeline-processor>> can result in a simple,
-yet powerful means to process heterogeneous input. For example, you can define a single pipeline
-that delegates to other pipelines based on some criteria.
-
-[source,console]
---------------------------------------------------
-PUT _ingest/pipeline/logs_pipeline
-{
-  "description": "A pipeline of pipelines for log files",
-  "version": 1,
-  "processors": [
-    {
-      "pipeline": {
-        "if": "ctx.service?.name == 'apache_httpd'",
-        "name": "httpd_pipeline"
-      }
-    },
-    {
-      "pipeline": {
-        "if": "ctx.service?.name == 'syslog'",
-        "name": "syslog_pipeline"
-      }
-    },
-    {
-      "fail": {
-        "if": "ctx.service?.name != 'apache_httpd' && ctx.service?.name != 'syslog'",
-        "message": "This pipeline requires service.name to be either `syslog` or `apache_httpd`"
-      }
-    }
-  ]
-}
---------------------------------------------------
-
-The above example allows consumers to point to a single pipeline for all log based index requests.
-Based on the conditional, the correct pipeline will be called to process that type of data.
-
-This pattern works well with a <<dynamic-index-settings, default pipeline>> defined in an index mapping
-template for all indexes that hold data that needs pre-index processing.
-
-[[conditionals-with-regex]]
-=== Conditionals with the Regular Expressions
-The `if` conditional is implemented as a Painless script, which requires
-{painless}//painless-regexes.html[explicit support for regular expressions].
-
-`script.painless.regex.enabled: true` must be set in `elasticsearch.yml` to use regular
-expressions in the `if` condition.
-
-If regular expressions are enabled, operators such as `=~` can be used against a `/pattern/` for conditions.
-
-For example:
-
-[source,console]
---------------------------------------------------
-PUT _ingest/pipeline/check_url
-{
-  "processors": [
-    {
-      "set": {
-        "if": "ctx.href?.url =~ /^http[^s]/",
-        "field": "href.insecure",
-        "value": true
-      }
-    }
-  ]
-}
---------------------------------------------------
-
-[source,console]
---------------------------------------------------
-POST test/_doc/1?pipeline=check_url
-{
-  "href": {
-    "url": "http://www.elastic.co/"
-  }
-}
---------------------------------------------------
-// TEST[continued]
-
-Results in:
-
-////
-Hidden example assertion:
-[source,console]
---------------------------------------------------
-GET test/_doc/1
---------------------------------------------------
-// TEST[continued]
-////
-
-[source,console-result]
---------------------------------------------------
-{
-  "_index": "test",
-  "_id": "1",
-  "_version": 1,
-  "_seq_no": 60,
-  "_primary_term": 1,
-  "found": true,
-  "_source": {
-    "href": {
-      "insecure": true,
-      "url": "http://www.elastic.co/"
-    }
-  }
-}
---------------------------------------------------
-// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]
-
-
-Regular expressions can be expensive and should be avoided if viable
-alternatives exist.
-
-For example in this case `startsWith` can be used to get the same result
-without using a regular expression:
-
-[source,console]
---------------------------------------------------
-PUT _ingest/pipeline/check_url
-{
-  "processors": [
-    {
-      "set": {
-        "if": "ctx.href?.url != null && ctx.href.url.startsWith('http://')",
-        "field": "href.insecure",
-        "value": true
-      }
-    }
-  ]
-}
---------------------------------------------------
-
-[[handling-failure-in-pipelines]]
-== Handling Failures in Pipelines
-
-In its simplest use case, a pipeline defines a list of processors that
-are executed sequentially, and processing halts at the first exception. This
-behavior may not be desirable when failures are expected. For example, you may have logs
-that don't match the specified grok expression. Instead of halting execution, you may
-want to index such documents into a separate index.
-
-To enable this behavior, you can use the `on_failure` parameter. The `on_failure` parameter
-defines a list of processors to be executed immediately following the failed processor.
-You can specify this parameter at the pipeline level, as well as at the processor
-level. If a processor specifies an `on_failure` configuration, whether
-it is empty or not, any exceptions that are thrown by the processor are caught, and the
-pipeline continues executing the remaining processors. Because you can define further processors
-within the scope of an `on_failure` statement, you can nest failure handling.
-
-The following example defines a pipeline that renames the `foo` field in
-the processed document to `bar`. If the document does not contain the `foo` field, the processor
-attaches an error message to the document for later analysis within
-Elasticsearch.
-
-[source,js]
---------------------------------------------------
-{
-  "description" : "my first pipeline with handled exceptions",
-  "processors" : [
-    {
-      "rename" : {
-        "field" : "foo",
-        "target_field" : "bar",
-        "on_failure" : [
-          {
-            "set" : {
-              "field" : "error.message",
-              "value" : "field \"foo\" does not exist, cannot rename to \"bar\""
-            }
-          }
-        ]
-      }
-    }
-  ]
-}
---------------------------------------------------
-// NOTCONSOLE
-
-The following example defines an `on_failure` block on a whole pipeline to change
-the index to which failed documents get sent.
-
-[source,js]
---------------------------------------------------
-{
-  "description" : "my first pipeline with handled exceptions",
-  "processors" : [ ... ],
-  "on_failure" : [
-    {
-      "set" : {
-        "field" : "_index",
-        "value" : "failed-{{ _index }}"
-      }
-    }
-  ]
-}
---------------------------------------------------
-// NOTCONSOLE
-
-Alternatively instead of defining behaviour in case of processor failure, it is also possible
-to ignore a failure and continue with the next processor by specifying the `ignore_failure` setting.
-
-In case in the example below the field `foo` doesn't exist the failure will be caught and the pipeline
-continues to execute, which in this case means that the pipeline does nothing.
-
-[source,js]
---------------------------------------------------
-{
-  "description" : "my first pipeline with handled exceptions",
-  "processors" : [
-    {
-      "rename" : {
-        "field" : "foo",
-        "target_field" : "bar",
-        "ignore_failure" : true
-      }
-    }
-  ]
-}
---------------------------------------------------
-// NOTCONSOLE
-
-The `ignore_failure` can be set on any processor and defaults to `false`.
-
-[discrete]
-[[accessing-error-metadata]]
-=== Accessing Error Metadata From Processors Handling Exceptions
-
-You may want to retrieve the actual error message that was thrown
-by a failed processor. To do so you can access metadata fields called
-`on_failure_message`, `on_failure_processor_type`, `on_failure_processor_tag` and
-`on_failure_pipeline` (in case an error occurred inside a pipeline processor).
-These fields are only accessible from within the context of an `on_failure` block.
-
-Here is an updated version of the example that you
-saw earlier. But instead of setting the error message manually, the example leverages the `on_failure_message`
-metadata field to provide the error message.
-
-[source,js]
---------------------------------------------------
-{
-  "description" : "my first pipeline with handled exceptions",
-  "processors" : [
-    {
-      "rename" : {
-        "field" : "foo",
-        "to" : "bar",
-        "on_failure" : [
-          {
-            "set" : {
-              "field" : "error.message",
-              "value" : "{{ _ingest.on_failure_message }}"
-            }
-          }
-        ]
-      }
-    }
-  ]
-}
---------------------------------------------------
-// NOTCONSOLE
-
-
-include::enrich.asciidoc[]
-
-
-[[ingest-processors]]
-== Processors
-
-All processors are defined in the following way within a pipeline definition:
-
-[source,js]
---------------------------------------------------
-{
-  "PROCESSOR_NAME" : {
-    ... processor configuration options ...
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-
-Each processor defines its own configuration parameters, but all processors have
-the ability to declare `tag`, `on_failure` and `if` fields. These fields are optional.
-
-A `tag` is simply a string identifier of the specific instantiation of a certain
-processor in a pipeline. The `tag` field does not affect the processor's behavior,
-but is very useful for bookkeeping and tracing errors to specific processors.
-
-The `if` field must contain a script that returns a boolean value. If the script evaluates to `true`
-then the processor will be executed for the given document otherwise it will be skipped.
-The `if` field takes an object with the script fields defined in <<script-processor, script-options>>
-and accesses a read only version of the document via the same `ctx` variable used by scripts in the
-<<script-processor>>.
-
-[source,js]
---------------------------------------------------
-{
-  "set": {
-    "if": "ctx.foo == 'someValue'",
-    "field": "found",
-    "value": true
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-
-See <<ingest-conditionals>> to learn more about the `if` field and conditional execution.
-
-See <<handling-failure-in-pipelines>> to learn more about the `on_failure` field and error handling in pipelines.
-
-The <<cluster-nodes-info,node info API>> will provide a per node list of what processors are available.
-
-Custom processors must be installed on all nodes. The put pipeline API will fail if a processor specified in a pipeline
-doesn't exist on all nodes. If you rely on custom processor plugins make sure to mark these plugins as mandatory by adding
-`plugin.mandatory` setting to the `config/elasticsearch.yml` file, for example:
-
-[source,yaml]
---------------------------------------------------
-plugin.mandatory: ingest-attachment
---------------------------------------------------
-
-A node will not start if this plugin is not available.
-
-The <<cluster-nodes-stats,node stats API>> can be used to fetch ingest usage statistics, globally and on a per
-pipeline basis. Useful to find out which pipelines are used the most or spent the most time on preprocessing.
-
-[discrete]
-=== Ingest Processor Plugins
-
-Additional ingest processors can be implemented and installed as Elasticsearch {plugins}/intro.html[plugins].
-See {plugins}/ingest.html[Ingest plugins] for information about the available ingest plugins.
-
-include::processors/append.asciidoc[]
-include::processors/bytes.asciidoc[]
-include::processors/circle.asciidoc[]
-include::processors/community-id.asciidoc[]
-include::processors/convert.asciidoc[]
-include::processors/csv.asciidoc[]
-include::processors/date.asciidoc[]
-include::processors/date-index-name.asciidoc[]
-include::processors/dissect.asciidoc[]
-include::processors/dot-expand.asciidoc[]
-include::processors/drop.asciidoc[]
-include::processors/enrich.asciidoc[]
-include::processors/fail.asciidoc[]
-include::processors/fingerprint.asciidoc[]
-include::processors/foreach.asciidoc[]
-include::processors/geoip.asciidoc[]
-include::processors/grok.asciidoc[]
-include::processors/gsub.asciidoc[]
-include::processors/html_strip.asciidoc[]
-include::processors/inference.asciidoc[]
-include::processors/join.asciidoc[]
-include::processors/json.asciidoc[]
-include::processors/kv.asciidoc[]
-include::processors/lowercase.asciidoc[]
-include::processors/network-direction.asciidoc[]
-include::processors/pipeline.asciidoc[]
-include::processors/remove.asciidoc[]
-include::processors/rename.asciidoc[]
-include::processors/script.asciidoc[]
-include::processors/set.asciidoc[]
-include::processors/set-security-user.asciidoc[]
-include::processors/sort.asciidoc[]
-include::processors/split.asciidoc[]
-include::processors/trim.asciidoc[]
-include::processors/uppercase.asciidoc[]
-include::processors/url-decode.asciidoc[]
-include::processors/uri-parts.asciidoc[]
-include::processors/user-agent.asciidoc[]

+ 72 - 0
docs/reference/ingest/processors.asciidoc

@@ -0,0 +1,72 @@
+[[processors]]
+== Ingest processor reference
+++++
+<titleabbrev>Processor reference</titleabbrev>
+++++
+
+{es} includes several configurable processors. To get a list of available
+processors, use the <<cluster-nodes-info,nodes info>> API.
+
+[source,console]
+----
+GET _nodes/ingest?filter_path=nodes.*.ingest.processors
+----
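
For illustration, a trimmed response to that request might look like the
following sketch (the node ID is a placeholder and the processor list depends
on the modules and plugins installed on each node):

[source,js]
----
{
  "nodes": {
    "aQri9sgKTY2nHCP9gNTv7g": {
      "ingest": {
        "processors": [
          { "type": "append" },
          { "type": "convert" },
          { "type": "set" }
        ]
      }
    }
  }
}
----
// NOTCONSOLE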
+
+The pages in this section contain reference documentation for each processor.
+
+[discrete]
+[[ingest-process-plugins]]
+=== Processor plugins
+
+You can install additional processors as {plugins}/ingest.html[plugins].
+
+You must install any plugin processors on all nodes in your cluster. Otherwise,
+{es} will fail to create pipelines containing the processor.
+
+Mark a plugin as mandatory by setting `plugin.mandatory` in
+`elasticsearch.yml`. A node will fail to start if a mandatory plugin is not
+installed.
+
+[source,yaml]
+----
+plugin.mandatory: ingest-attachment
+----
+
+include::processors/append.asciidoc[]
+include::processors/bytes.asciidoc[]
+include::processors/circle.asciidoc[]
+include::processors/community-id.asciidoc[]
+include::processors/convert.asciidoc[]
+include::processors/csv.asciidoc[]
+include::processors/date.asciidoc[]
+include::processors/date-index-name.asciidoc[]
+include::processors/dissect.asciidoc[]
+include::processors/dot-expand.asciidoc[]
+include::processors/drop.asciidoc[]
+include::processors/enrich.asciidoc[]
+include::processors/fail.asciidoc[]
+include::processors/fingerprint.asciidoc[]
+include::processors/foreach.asciidoc[]
+include::processors/geoip.asciidoc[]
+include::processors/grok.asciidoc[]
+include::processors/gsub.asciidoc[]
+include::processors/html_strip.asciidoc[]
+include::processors/inference.asciidoc[]
+include::processors/join.asciidoc[]
+include::processors/json.asciidoc[]
+include::processors/kv.asciidoc[]
+include::processors/lowercase.asciidoc[]
+include::processors/network-direction.asciidoc[]
+include::processors/pipeline.asciidoc[]
+include::processors/remove.asciidoc[]
+include::processors/rename.asciidoc[]
+include::processors/script.asciidoc[]
+include::processors/set.asciidoc[]
+include::processors/set-security-user.asciidoc[]
+include::processors/sort.asciidoc[]
+include::processors/split.asciidoc[]
+include::processors/trim.asciidoc[]
+include::processors/uppercase.asciidoc[]
+include::processors/url-decode.asciidoc[]
+include::processors/uri-parts.asciidoc[]
+include::processors/user-agent.asciidoc[]

+ 2 - 2
docs/reference/ingest/processors/append.asciidoc

@@ -15,8 +15,8 @@ Accepts a single value or an array of values.
 [options="header"]
 |======
 | Name      | Required  | Default  | Description
-| `field`  | yes       | -        | The field to be appended to. Supports <<accessing-template-fields,template snippets>>.
-| `value`  | yes       | -        | The value to be appended. Supports <<accessing-template-fields,template snippets>>.
+| `field`  | yes       | -        | The field to be appended to. Supports <<template-snippets,template snippets>>.
+| `value`  | yes       | -        | The value to be appended. Supports <<template-snippets,template snippets>>.
 | `allow_duplicates` | no | true  | If `false`, the processor does not append
 values already present in the field.
 include::common-options.asciidoc[]

+ 3 - 3
docs/reference/ingest/processors/date-index-name.asciidoc

@@ -133,11 +133,11 @@ understands this to mean `2016-04-01` as is explained in the <<date-math-index-n
 |======
 | Name                   | Required  | Default                      | Description
 | `field`                | yes       | -                            | The field to get the date or timestamp from.
-| `index_name_prefix`    | no        | -                            | A prefix of the index name to be prepended before the printed date. Supports <<accessing-template-fields,template snippets>>.
-| `date_rounding`        | yes       | -                            | How to round the date when formatting the date into the index name. Valid values are: `y` (year), `M` (month), `w` (week), `d` (day), `h` (hour), `m` (minute) and `s` (second). Supports <<accessing-template-fields,template snippets>>.
+| `index_name_prefix`    | no        | -                            | A prefix of the index name to be prepended before the printed date. Supports <<template-snippets,template snippets>>.
+| `date_rounding`        | yes       | -                            | How to round the date when formatting the date into the index name. Valid values are: `y` (year), `M` (month), `w` (week), `d` (day), `h` (hour), `m` (minute) and `s` (second). Supports <<template-snippets,template snippets>>.
 | `date_formats`         | no        | yyyy-MM-dd+++'T'+++HH:mm:ss.SSSXX  | An array of the expected date formats for parsing dates / timestamps in the document being preprocessed. Can be a java time pattern or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N.
 | `timezone`             | no        | UTC                          | The timezone to use when parsing the date and when date math resolves index name expressions into concrete index names.
 | `locale`               | no        | ENGLISH                      | The locale to use when parsing the date from the document being preprocessed, relevant when parsing month names or week days.
-| `index_name_format`    | no        | yyyy-MM-dd                   | The format to be used when printing the parsed date into the index name. A valid java time pattern is expected here. Supports <<accessing-template-fields,template snippets>>.
+| `index_name_format`    | no        | yyyy-MM-dd                   | The format to be used when printing the parsed date into the index name. A valid java time pattern is expected here. Supports <<template-snippets,template snippets>>.
 include::common-options.asciidoc[]
 |======
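
As a hedged illustration of the options above, a `date_index_name` processor
that prepends a prefix and rounds the date to the month could be configured
along these lines (the field and prefix names are placeholders):

[source,js]
----
{
  "date_index_name": {
    "field": "date1",
    "index_name_prefix": "my-index-",
    "date_rounding": "M"
  }
}
----
// NOTCONSOLE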

+ 2 - 2
docs/reference/ingest/processors/date.asciidoc

@@ -18,8 +18,8 @@ in the same order they were defined as part of the processor definition.
 | `field`                | yes       | -                   | The field to get the date from.
 | `target_field`         | no        | @timestamp          | The field that will hold the parsed date.
 | `formats`              | yes       | -                   | An array of the expected date formats. Can be a <<mapping-date-format,java time pattern>> or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N.
-| `timezone`             | no        | UTC                 | The timezone to use when parsing the date. Supports <<accessing-template-fields,template snippets>>.
-| `locale`               | no        | ENGLISH             | The locale to use when parsing the date, relevant when parsing month names or week days. Supports <<accessing-template-fields,template snippets>>.
+| `timezone`             | no        | UTC                 | The timezone to use when parsing the date. Supports <<template-snippets,template snippets>>.
+| `locale`               | no        | ENGLISH             | The locale to use when parsing the date, relevant when parsing month names or week days. Supports <<template-snippets,template snippets>>.
 | `output_format`        | no        | `yyyy-MM-dd'T'HH:mm:ss.SSSXXX` | The format to use when writing the date to `target_field`. Can be a <<mapping-date-format,java time pattern>> or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N.
 include::common-options.asciidoc[]
 |======
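
To show the templated `timezone` option in context, here is a hedged sketch of
a `date` processor that reads the timezone from a field of the incoming
document (the `my_timezone` field is an assumption):

[source,js]
----
{
  "date": {
    "field": "initial_date",
    "target_field": "timestamp",
    "formats": ["dd/MM/yyyy HH:mm:ss"],
    "timezone": "{{my_timezone}}"
  }
}
----
// NOTCONSOLE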

+ 1 - 1
docs/reference/ingest/processors/dot-expand.asciidoc

@@ -6,7 +6,7 @@
 
 Expands a field with dots into an object field. This processor allows fields
 with dots in the name to be accessible by other processors in the pipeline.
-Otherwise these <<accessing-data-in-pipelines,fields>> can't be accessed by any processor.
+Otherwise these fields can't be accessed by any processor.
 
 [[dot-expander-options]]
 .Dot Expand Options

+ 2 - 2
docs/reference/ingest/processors/enrich.asciidoc

@@ -15,8 +15,8 @@ See <<ingest-enriching-data,enrich data>> section for more information about how
 |======
 | Name               | Required  | Default              | Description
 | `policy_name`      | yes       | -                    | The name of the enrich policy to use.
-| `field`            | yes       | -                    | The field in the input document that matches the policies match_field used to retrieve the enrichment data. Supports <<accessing-template-fields,template snippets>>.
-| `target_field`     | yes       | -                    | Field added to incoming documents to contain enrich data. This field contains both the `match_field` and `enrich_fields` specified in the <<enrich-policy-definition,enrich policy>>. Supports <<accessing-template-fields,template snippets>>.
+| `field`            | yes       | -                    | The field in the input document that matches the policy's `match_field` used to retrieve the enrichment data. Supports <<template-snippets,template snippets>>.
+| `target_field`     | yes       | -                    | Field added to incoming documents to contain enrich data. This field contains both the `match_field` and `enrich_fields` specified in the <<enrich-policy-definition,enrich policy>>. Supports <<template-snippets,template snippets>>.
 | `ignore_missing`   | no        | false                | If `true` and `field` does not exist, the processor quietly exits without modifying the document
 | `override`         | no        | true                 | If processor will update fields with pre-existing non-null-valued field. When set to `false`, such fields will not be touched.
 | `max_matches`      | no        | 1                    | The maximum number of matched documents to include under the configured target field. The `target_field` will be turned into a json array if `max_matches` is higher than 1, otherwise `target_field` will become a json object. In order to avoid documents getting too large, the maximum allowed value is 128.

+ 1 - 1
docs/reference/ingest/processors/fail.asciidoc

@@ -13,7 +13,7 @@ to the requester.
 [options="header"]
 |======
 | Name       | Required  | Default  | Description
-| `message`  | yes       | -        | The error message thrown by the processor. Supports <<accessing-template-fields,template snippets>>.
+| `message`  | yes       | -        | The error message thrown by the processor. Supports <<template-snippets,template snippets>>.
 include::common-options.asciidoc[]
 |======
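
For example, a `fail` processor with a templated `message` might be sketched as
follows (the `tags` field and the condition are illustrative only):

[source,js]
----
{
  "fail": {
    "if": "ctx.tags.contains('production') != true",
    "message": "The production tag is not present, found tags: {{tags}}"
  }
}
----
// NOTCONSOLE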
 

+ 1 - 1
docs/reference/ingest/processors/pipeline.asciidoc

@@ -11,7 +11,7 @@ Executes another pipeline.
 [options="header"]
 |======
 | Name             | Required  | Default  | Description
-| `name`           | yes       | -        | The name of the pipeline to execute. Supports <<accessing-template-fields,template snippets>>.
+| `name`           | yes       | -        | The name of the pipeline to execute. Supports <<template-snippets,template snippets>>.
 include::common-options.asciidoc[]
 |======
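
A hedged sketch of a templated pipeline name, routing each document to a
pipeline derived from one of its own fields (the `service.name` field is an
assumption):

[source,js]
----
{
  "pipeline": {
    "name": "{{service.name}}-pipeline"
  }
}
----
// NOTCONSOLE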
 

+ 1 - 1
docs/reference/ingest/processors/remove.asciidoc

@@ -11,7 +11,7 @@ Removes existing fields. If one field doesn't exist, an exception will be thrown
 [options="header"]
 |======
 | Name             | Required  | Default  | Description
-| `field`          | yes       | -        | Fields to be removed. Supports <<accessing-template-fields,template snippets>>.
+| `field`          | yes       | -        | Fields to be removed. Supports <<template-snippets,template snippets>>.
 | `ignore_missing` | no        | `false`  | If `true` and `field` does not exist or is `null`, the processor quietly exits without modifying the document
 include::common-options.asciidoc[]
 |======

+ 2 - 2
docs/reference/ingest/processors/rename.asciidoc

@@ -11,8 +11,8 @@ Renames an existing field. If the field doesn't exist or the new name is already
 [options="header"]
 |======
 | Name              | Required  | Default  | Description
-| `field`           | yes       | -        | The field to be renamed. Supports <<accessing-template-fields,template snippets>>.
-| `target_field`    | yes       | -        | The new name of the field. Supports <<accessing-template-fields,template snippets>>.
+| `field`           | yes       | -        | The field to be renamed. Supports <<template-snippets,template snippets>>.
+| `target_field`    | yes       | -        | The new name of the field. Supports <<template-snippets,template snippets>>.
 | `ignore_missing`  | no        | `false`  | If `true` and `field` does not exist, the processor quietly exits without modifying the document
 include::common-options.asciidoc[]
 |======
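
As a minimal sketch, a `rename` processor that moves a field to a new,
namespaced name (the field names are illustrative):

[source,js]
----
{
  "rename": {
    "field": "provider",
    "target_field": "cloud.provider"
  }
}
----
// NOTCONSOLE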

+ 4 - 4
docs/reference/ingest/processors/set.asciidoc

@@ -12,12 +12,12 @@ its value will be replaced with the provided one.
 [options="header"]
 |======
 | Name        | Required | Default | Description
-| `field`     | yes      | -       | The field to insert, upsert, or update. Supports <<accessing-template-fields,template snippets>>.
-| `value`     | yes*     | -       | The value to be set for the field. Supports <<accessing-template-fields,template snippets>>. May specify only one of `value` or `copy_from`.
+| `field`     | yes      | -       | The field to insert, upsert, or update. Supports <<template-snippets,template snippets>>.
+| `value`     | yes*     | -       | The value to be set for the field. Supports <<template-snippets,template snippets>>. May specify only one of `value` or `copy_from`.
 | `copy_from` | no       | -       | The origin field which will be copied to `field`, cannot set `value` simultaneously. Supported data types are `boolean`, `number`, `array`, `object`, `string`, `date`, etc.
 | `override`  | no       | `true`  | If processor will update fields with pre-existing non-null-valued field. When set to `false`, such fields will not be touched.
-| `ignore_empty_value` | no | `false` | If `true` and `value` is a <<accessing-template-fields,template snippet>> that evaluates to `null` or the empty string, the processor quietly exits without modifying the document
-| `media_type` | no       | `application/json` | The media type for encoding `value`. Applies only when `value` is a <<accessing-template-fields,template snippet>>. Must be one of `application/json`, `text/plain`, or `application/x-www-form-urlencoded`.
+| `ignore_empty_value` | no | `false` | If `true` and `value` is a <<template-snippets,template snippet>> that evaluates to `null` or the empty string, the processor quietly exits without modifying the document
+| `media_type` | no       | `application/json` | The media type for encoding `value`. Applies only when `value` is a <<template-snippets,template snippet>>. Must be one of `application/json`, `text/plain`, or `application/x-www-form-urlencoded`.
 include::common-options.asciidoc[]
 |======
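
To make `value`, `copy_from`, and template snippets concrete, here is a hedged
sketch of two `set` processors: the first writes a templated value, the second
copies an existing field (all field names are assumptions):

[source,js]
----
{
  "processors": [
    {
      "set": {
        "field": "host.os.name",
        "value": "{{os}}"
      }
    },
    {
      "set": {
        "field": "event.original",
        "copy_from": "message"
      }
    }
  ]
}
----
// NOTCONSOLE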
 

+ 46 - 0
docs/reference/redirects.asciidoc

@@ -1392,3 +1392,49 @@ include::redirects.asciidoc[tag=legacy-rollup-redirect]
 
 include::redirects.asciidoc[tag=legacy-rollup-redirect]
 endif::[]
+
+[role="exclude",id="pipeline"]
+=== Pipeline definition
+
+See <<ingest>>.
+
+[role="exclude",id="accessing-data-in-pipelines"]
+=== Accessing data in pipelines
+
+See <<access-source-fields>>, <<access-metadata-fields>>, and
+<<access-ingest-metadata>>.
+
+[role="exclude",id="ingest-conditionals"]
+=== Conditional execution in pipelines
+
+See <<conditionally-run-processor>>.
+
+[role="exclude",id="ingest-conditional-nullcheck"]
+=== Handling nested fields in conditionals
+
+See <<conditionally-run-processor>>.
+
+[role="exclude",id="ingest-conditional-complex"]
+=== Complex conditionals
+
+See <<conditionally-run-processor>>.
+
+[role="exclude",id="conditionals-with-multiple-pipelines"]
+=== Conditionals with the pipeline processor
+
+See <<conditionally-apply-pipelines>>.
+
+[role="exclude",id="conditionals-with-regex"]
+=== Conditionals with regular expressions
+
+See <<conditionally-run-processor>>.
+
+[role="exclude",id="handling-failure-in-pipelines"]
+=== Handling failures in pipelines
+
+See <<handling-pipeline-failures>>.
+
+[role="exclude",id="ingest-processors"]
+=== Ingest processors
+
+See <<processors>>.

+ 1 - 1
docs/reference/transform/apis/put-transform.asciidoc

@@ -247,7 +247,7 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=sync-time-field]
 +
 --
 TIP: In general, it’s a good idea to use a field that contains the
-<<accessing-ingest-metadata,ingest timestamp>>. If you use a different field,
+<<access-ingest-metadata,ingest timestamp>>. If you use a different field,
 you might need to set the `delay` such that it accounts for data transmission
 delays.
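
As a sketch of that recommendation, an ingest pipeline attached to the source
indices could record the ingest timestamp with a `set` processor, and the
resulting field (here the hypothetical `event.ingested`) can then serve as the
sync time field:

[source,js]
----
{
  "set": {
    "field": "event.ingested",
    "value": "{{_ingest.timestamp}}"
  }
}
----
// NOTCONSOLE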
 

+ 1 - 1
docs/reference/transform/apis/update-transform.asciidoc

@@ -195,7 +195,7 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=sync-time-field]
 +
 --
 TIP: In general, it’s a good idea to use a field that contains the
-<<accessing-ingest-metadata,ingest timestamp>>. If you use a different field,
+<<access-ingest-metadata,ingest timestamp>>. If you use a different field,
 you might need to set the `delay` such that it accounts for data transmission
 delays.