ingest-node.asciidoc
[[pipe-line]]
== Pipeline Definition

A pipeline is a definition of a series of processors that are executed in
the same sequential order in which they are declared.

[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors" : [ ... ]
}
--------------------------------------------------

The `description` is a special field that stores a helpful description of
what the pipeline is intended to achieve.

The `processors` parameter defines a list of processors to be executed in
order.

== Processors

All processors are defined in the following way within a pipeline definition:

[source,js]
--------------------------------------------------
{
  "PROCESSOR_NAME" : {
    ... processor configuration options ...
  }
}
--------------------------------------------------

Each processor defines its own configuration parameters, but all processors
can also declare the optional `tag` and `on_failure` fields.

A `tag` is simply a string identifier for a specific instantiation of a
processor in a pipeline. The `tag` field does not affect a processor's behavior,
but it is very useful for bookkeeping and for tracing errors back to specific
processors.

See <<handling-failure-in-pipelines>> to learn more about the `on_failure` field and error handling in pipelines.
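For example, a processor instance can be tagged so that any error it raises can be traced back to it (the tag value and field names below are purely illustrative):

[source,js]
--------------------------------------------------
{
  "set": {
    "tag": "set-visitor-count",
    "field": "visitor_count",
    "value": 0
  }
}
--------------------------------------------------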
=== Set processor
Sets one field and associates it with the specified value. If the field already exists,
its value will be replaced with the provided one.

[[set-options]]
.Set Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to insert, upsert, or update
| `value` | yes | - | The value to be set for the field
|======

[source,js]
--------------------------------------------------
{
  "set": {
    "field": "field1",
    "value": 582.1
  }
}
--------------------------------------------------
=== Append processor
Appends one or more values to an existing array if the field already exists and it is an array.
Converts a scalar to an array and appends one or more values to it if the field exists and it is a scalar.
Creates an array containing the provided values if the field doesn't exist.
Accepts a single value or an array of values.

[[append-options]]
.Append Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to be appended to
| `value` | yes | - | The value to be appended
|======

[source,js]
--------------------------------------------------
{
  "append": {
    "field": "field1",
    "value": ["item2", "item3", "item4"]
  }
}
--------------------------------------------------
=== Remove processor
Removes an existing field. If the field doesn't exist, an exception will be thrown.

[[remove-options]]
.Remove Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to be removed
|======

[source,js]
--------------------------------------------------
{
  "remove": {
    "field": "foo"
  }
}
--------------------------------------------------
=== Rename processor
Renames an existing field. If the field doesn't exist, an exception will be thrown. Additionally, the new field
name must not already exist.

[[rename-options]]
.Rename Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to be renamed
| `to` | yes | - | The new name of the field
|======

[source,js]
--------------------------------------------------
{
  "rename": {
    "field": "foo",
    "to": "foobar"
  }
}
--------------------------------------------------
=== Convert processor
Converts an existing field's value to a different type, such as turning a string into an integer.
If the field value is an array, all members will be converted.

The supported types are `integer`, `float`, `string`, and `boolean`.

`boolean` sets the field to `true` if its string value equals `true` (ignoring case), and to
`false` if its string value equals `false` (ignoring case); otherwise an exception is thrown.

[[convert-options]]
.Convert Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field whose value is to be converted
| `type` | yes | - | The type to convert the existing value to
|======

[source,js]
--------------------------------------------------
{
  "convert": {
    "field" : "foo",
    "type": "integer"
  }
}
--------------------------------------------------
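As a sketch of the boolean case described above, converting a hypothetical string field `is_active` would look like:

[source,js]
--------------------------------------------------
{
  "convert": {
    "field" : "is_active",
    "type": "boolean"
  }
}
--------------------------------------------------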
=== Gsub processor
Converts a string field by applying a regular expression and a replacement.
If the field is not a string, the processor will throw an exception.

[[gsub-options]]
.Gsub Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to apply the replacement to
| `pattern` | yes | - | The pattern to be replaced
| `replacement` | yes | - | The string to replace the matching patterns with
|======

[source,js]
--------------------------------------------------
{
  "gsub": {
    "field": "field1",
    "pattern": "\\.",
    "replacement": "-"
  }
}
--------------------------------------------------
=== Join processor
Joins each element of an array into a single string, using a separator character between each element.
An exception is thrown when the field is not an array.

[[join-options]]
.Join Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The array field whose elements are to be joined
| `separator` | yes | - | The separator character
|======

[source,js]
--------------------------------------------------
{
  "join": {
    "field": "joined_array_field",
    "separator": "-"
  }
}
--------------------------------------------------
=== Split processor
Splits a field into an array using a separator character. Only works on string fields.

[[split-options]]
.Split Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to split
| `separator` | yes | - | The separator character to split on
|======

[source,js]
--------------------------------------------------
{
  "split": {
    "field": "my_field",
    "separator": ","
  }
}
--------------------------------------------------
=== Lowercase processor
Converts a string to its lowercase equivalent.

[[lowercase-options]]
.Lowercase Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to lowercase
|======

[source,js]
--------------------------------------------------
{
  "lowercase": {
    "field": "foo"
  }
}
--------------------------------------------------
=== Uppercase processor
Converts a string to its uppercase equivalent.

[[uppercase-options]]
.Uppercase Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to uppercase
|======

[source,js]
--------------------------------------------------
{
  "uppercase": {
    "field": "foo"
  }
}
--------------------------------------------------
=== Trim processor
Trims whitespace from a field. NOTE: this only removes leading and trailing whitespace.

[[trim-options]]
.Trim Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The string-valued field to trim whitespace from
|======

[source,js]
--------------------------------------------------
{
  "trim": {
    "field": "foo"
  }
}
--------------------------------------------------
=== Grok Processor

The Grok Processor extracts structured fields out of a single text field within a document. You choose which field to
extract matched fields from, as well as the grok pattern you expect will match. A grok pattern is like a regular
expression that supports aliased expressions that can be reused.

This tool is perfect for syslog logs, Apache and other webserver logs, MySQL logs, and in general, any log format
that is written for humans rather than for computer consumption.

The processor comes packaged with over 120 reusable patterns that are located at `$ES_HOME/config/ingest/grok/patterns`.
Here, you can add your own custom grok pattern files with custom grok expressions to be used by the processor.

If you need help building patterns to match your logs, you will find the http://grokdebug.herokuapp.com and
http://grokconstructor.appspot.com/ applications quite useful!

==== Grok Basics

Grok sits on top of regular expressions, so any regular expression is valid in grok as well.
The regular expression library is Oniguruma, and you can see the full supported regexp syntax
https://github.com/kkos/oniguruma/blob/master/doc/RE[on the Oniguruma site].

Grok works by leveraging this regular expression language to allow naming existing patterns and combining them into more
complex patterns that match your fields.

The syntax for reusing a grok pattern comes in three forms: `%{SYNTAX:SEMANTIC}`, `%{SYNTAX}`, and `%{SYNTAX:SEMANTIC:TYPE}`.

The `SYNTAX` is the name of the pattern that will match your text. For example, `3.44` will be matched by the `NUMBER`
pattern and `55.3.244.1` will be matched by the `IP` pattern. The syntax is how you match. `NUMBER` and `IP` are both
patterns that are provided within the default patterns set.

The `SEMANTIC` is the identifier you give to the piece of text being matched. For example, `3.44` could be the
duration of an event, so you could call it simply `duration`. Further, a string `55.3.244.1` might identify
the `client` making a request.

The `TYPE` is the type you wish to cast your named field to. `int` and `float` are currently the only types supported for coercion.

For example, here is a grok pattern that matches the above examples. We would like to match text with the following
contents:

[source,js]
--------------------------------------------------
3.44 55.3.244.1
--------------------------------------------------

We know that the above message is a number followed by an IP address. We can match this text with the following
grok expression:

[source,js]
--------------------------------------------------
%{NUMBER:duration} %{IP:client}
--------------------------------------------------
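To also coerce a captured value at extraction time, the third form described above can be used. For instance, an illustrative variant of the same expression that casts the duration to a float would be:

[source,js]
--------------------------------------------------
%{NUMBER:duration:float} %{IP:client}
--------------------------------------------------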
==== Custom Patterns and Pattern Files

The Grok Processor comes pre-packaged with a base set of pattern files. These patterns may not always have
what you are looking for. The pattern files have a very basic format. Each line describes a named pattern with
the following format:

[source,js]
--------------------------------------------------
NAME ' '+ PATTERN '\n'
--------------------------------------------------

You can add this pattern to an existing file, or add your own file in the patterns directory here: `$ES_HOME/config/ingest/grok/patterns`.
The ingest plugin will pick up files in this directory and load them into the grok processor's known patterns. These patterns are loaded
at startup, so you will need to restart your ingest node if you wish to update these files while running.

Example snippet of pattern definitions found in the `grok-patterns` patterns file:

[source,js]
--------------------------------------------------
YEAR (?>\d\d){1,2}
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
--------------------------------------------------
==== Using Grok Processor in a Pipeline

[[grok-options]]
.Grok Options
[options="header"]
|======
| Name | Required | Default | Description
| `match_field` | yes | - | The field to use for grok expression parsing
| `match_pattern` | yes | - | The grok expression to match and extract named captures with
| `pattern_definitions` | no | - | A map of pattern-name and pattern tuples defining custom patterns to be used by the current processor. Patterns matching existing names will override the pre-existing definition.
|======

Here is an example of using the provided patterns to extract and name structured fields from a string field in
a document:

[source,js]
--------------------------------------------------
{
  "message": "55.3.244.1 GET /index.html 15824 0.043"
}
--------------------------------------------------

The pattern for this could be:

[source,js]
--------------------------------------------------
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
--------------------------------------------------
An example pipeline for processing the above document using grok:

[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors": [
    {
      "grok": {
        "match_field": "message",
        "match_pattern": "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
      }
    }
  ]
}
--------------------------------------------------

This pipeline will insert these named captures as new fields within the document, like so:

[source,js]
--------------------------------------------------
{
  "message": "55.3.244.1 GET /index.html 15824 0.043",
  "client": "55.3.244.1",
  "method": "GET",
  "request": "/index.html",
  "bytes": 15824,
  "duration": "0.043"
}
--------------------------------------------------
An example of a pipeline specifying custom pattern definitions:

[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors": [
    {
      "grok": {
        "match_field": "message",
        "match_pattern": "my %{FAVORITE_DOG:dog} is colored %{RGB:color}",
        "pattern_definitions" : {
          "FAVORITE_DOG" : "beagle",
          "RGB" : "RED|GREEN|BLUE"
        }
      }
    }
  ]
}
--------------------------------------------------
=== Date processor

The date processor is used for parsing dates from fields, and then using that date or timestamp as the timestamp for the document.
By default, the date processor adds the parsed date as a new field called `@timestamp`; a different field can be chosen by setting the
`target_field` configuration parameter. Multiple date formats are supported as part of the same date processor definition. They will be used
sequentially to attempt to parse the date field, in the same order in which they were defined as part of the processor definition.

[[date-options]]
.Date options
[options="header"]
|======
| Name | Required | Default | Description
| `match_field` | yes | - | The field to get the date from.
| `target_field` | no | @timestamp | The field that will hold the parsed date.
| `match_formats` | yes | - | An array of the expected date formats. Can be a Joda pattern or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N.
| `timezone` | no | UTC | The timezone to use when parsing the date.
| `locale` | no | ENGLISH | The locale to use when parsing the date. Relevant when parsing month names or week days.
|======

An example that adds the parsed date to the `timestamp` field based on the `initial_date` field:

[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors" : [
    {
      "date" : {
        "match_field" : "initial_date",
        "target_field" : "timestamp",
        "match_formats" : ["dd/MM/yyyy hh:mm:ss"],
        "timezone" : "Europe/Amsterdam"
      }
    }
  ]
}
--------------------------------------------------
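Since `match_formats` is an array, several formats can be tried in the order they are listed. As an illustrative sketch (the field name is hypothetical), a processor that first tries a Joda pattern and then falls back to ISO8601 and UNIX timestamps could look like:

[source,js]
--------------------------------------------------
{
  "date" : {
    "match_field" : "initial_date",
    "match_formats" : ["dd/MM/yyyy hh:mm:ss", "ISO8601", "UNIX"]
  }
}
--------------------------------------------------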
=== Fail processor

The Fail Processor is used to raise an exception. This is useful when
a user expects a pipeline to fail and wishes to relay a specific message
to the requester.

[[fail-options]]
.Fail Options
[options="header"]
|======
| Name | Required | Default | Description
| `message` | yes | - | The error message of the `FailException` thrown by the processor
|======

[source,js]
--------------------------------------------------
{
  "fail": {
    "message": "an error message"
  }
}
--------------------------------------------------
=== Foreach processor

All processors can operate on elements inside an array, but if all elements of an array need to
be processed in the same way, defining a processor for each element becomes cumbersome and tricky
because the number of elements in an array is often unknown. For this reason the `foreach`
processor exists. By specifying the field holding the array elements and a list of processors that
define what should happen to each element, an array field can easily be preprocessed.

Processors inside the foreach processor work in a different context, where the only valid top-level
field is `_value`, which holds the array element value. Other fields may exist under this field.

If the `foreach` processor fails to process an element inside the array and no `on_failure` processor has been specified,
then it aborts the execution and leaves the array unmodified.

[[foreach-options]]
.Foreach Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The array field
| `processors` | yes | - | The processors
|======
Assume the following document:

[source,js]
--------------------------------------------------
{
  "values" : ["foo", "bar", "baz"]
}
--------------------------------------------------

When this `foreach` processor operates on this sample document:

[source,js]
--------------------------------------------------
{
  "foreach" : {
    "field" : "values",
    "processors" : [
      {
        "uppercase" : {
          "field" : "_value"
        }
      }
    ]
  }
}
--------------------------------------------------

Then the document will look like this after preprocessing:

[source,js]
--------------------------------------------------
{
  "values" : ["FOO", "BAR", "BAZ"]
}
--------------------------------------------------
Let's take a look at another example:

[source,js]
--------------------------------------------------
{
  "persons" : [
    {
      "id" : "1",
      "name" : "John Doe"
    },
    {
      "id" : "2",
      "name" : "Jane Doe"
    }
  ]
}
--------------------------------------------------

If the `id` field needs to be removed from each element,
then the following `foreach` processor can be used:

[source,js]
--------------------------------------------------
{
  "foreach" : {
    "field" : "persons",
    "processors" : [
      {
        "remove" : {
          "field" : "_value.id"
        }
      }
    ]
  }
}
--------------------------------------------------
After preprocessing the result is:

[source,js]
--------------------------------------------------
{
  "persons" : [
    {
      "name" : "John Doe"
    },
    {
      "name" : "Jane Doe"
    }
  ]
}
--------------------------------------------------
As with any processor, `on_failure` processors can also be defined
on processors that are wrapped inside the `foreach` processor.

For example, the `id` field may not exist in all person objects, and
instead of failing the index request, the document will be sent to
the `failure_index` index for later inspection:

[source,js]
--------------------------------------------------
{
  "foreach" : {
    "field" : "persons",
    "processors" : [
      {
        "remove" : {
          "field" : "_value.id",
          "on_failure" : [
            {
              "set" : {
                "field" : "_index",
                "value" : "failure_index"
              }
            }
          ]
        }
      }
    ]
  }
}
--------------------------------------------------

In this example, if the `remove` processor fails, the array elements that have been
processed thus far will still be updated.
== Accessing data in pipelines

Processors in pipelines have read and write access to documents that pass through the pipeline.
The fields in the source of a document and its metadata fields are accessible.

Accessing a field in the source is straightforward: you can refer to fields by
their name. For example:

[source,js]
--------------------------------------------------
{
  "set": {
    "field": "my_field",
    "value": 582.1
  }
}
--------------------------------------------------
On top of this, fields from the source are always accessible via the `_source` prefix:

[source,js]
--------------------------------------------------
{
  "set": {
    "field": "_source.my_field",
    "value": 582.1
  }
}
--------------------------------------------------

Metadata fields can also be accessed in the same way as fields from the source. This
is possible because Elasticsearch doesn't allow fields in the source that have the
same name as metadata fields.

The following example sets the id of a document to `1`:

[source,js]
--------------------------------------------------
{
  "set": {
    "field": "_id",
    "value": "1"
  }
}
--------------------------------------------------
The following metadata fields are accessible by a processor: `_index`, `_type`, `_id`, `_routing`, `_parent`,
`_timestamp`, and `_ttl`.

Beyond metadata fields and source fields, ingest also adds ingest metadata to documents being processed.
These metadata properties are accessible under the `_ingest` key. Currently ingest adds the ingest timestamp
under the `_ingest.timestamp` key, which is the time Elasticsearch received the index or bulk
request to pre-process the document. Any processor is free to add more ingest-related metadata to it. Ingest metadata is transient
and is lost after a document has been processed by the pipeline; ingest metadata therefore won't be indexed.

The following example adds a field with the name `received`, whose value is the ingest timestamp:

[source,js]
--------------------------------------------------
{
  "set": {
    "field": "received",
    "value": "{{_ingest.timestamp}}"
  }
}
--------------------------------------------------

As opposed to Elasticsearch metadata fields, the ingest metadata field name `_ingest` can be used as a valid field name
in the source of a document. Use `_source._ingest` to refer to such a source field; otherwise `_ingest` will be interpreted as
ingest metadata.
A number of processor settings also support templating. Settings that support templating can have zero or more
template snippets. A template snippet begins with `{{` and ends with `}}`.
Accessing fields and metafields in templates works exactly the same as in regular processor field settings.

In this example, a field by the name `field_c` is added, and its value is a concatenation of
the values of `field_a` and `field_b`:

[source,js]
--------------------------------------------------
{
  "set": {
    "field": "field_c",
    "value": "{{field_a}} {{field_b}}"
  }
}
--------------------------------------------------

The following example changes the index into which a document will be indexed. The index the document is redirected
to depends on the `geoip.country_iso_code` field in the source:

[source,js]
--------------------------------------------------
{
  "set": {
    "field": "_index",
    "value": "{{geoip.country_iso_code}}"
  }
}
--------------------------------------------------
[[handling-failure-in-pipelines]]
=== Handling Failure in Pipelines

In its simplest form, a pipeline describes a list of processors that
are executed sequentially, and processing halts at the first exception. This
may not be desirable when failures are expected. For example, not all your logs
may match a certain grok expression, and you may wish to index such documents into
a separate index.

To enable this behavior, you can utilize the `on_failure` parameter. `on_failure`
defines a list of processors to be executed immediately following the failed processor.
This parameter can be supplied at the pipeline level, as well as at the processor
level. If a processor has an `on_failure` configuration option provided, whether
it is empty or not, any exceptions that are thrown by it will be caught, and the
pipeline will continue executing the remaining processors it defines. Since further processors
can be defined within the scope of an `on_failure` statement, failure handling can be nested.

Example: in the following example we define a pipeline that attempts to rename the
field `foo` to `bar`. If a document does not contain the `foo` field, we
attach an error message to the document for later analysis within
Elasticsearch.

[source,js]
--------------------------------------------------
{
  "description" : "my first pipeline with handled exceptions",
  "processors" : [
    {
      "rename" : {
        "field" : "foo",
        "to" : "bar",
        "on_failure" : [
          {
            "set" : {
              "field" : "error",
              "value" : "field \"foo\" does not exist, cannot rename to \"bar\""
            }
          }
        ]
      }
    }
  ]
}
--------------------------------------------------
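Because processors inside an `on_failure` block may declare `on_failure` blocks of their own, failure handling can be nested. A minimal illustrative sketch (the field and index names here are hypothetical): if the error-recording `set` processor were itself to fail, the document would be redirected to another index:

[source,js]
--------------------------------------------------
{
  "rename" : {
    "field" : "foo",
    "to" : "bar",
    "on_failure" : [
      {
        "set" : {
          "field" : "error",
          "value" : "rename failed",
          "on_failure" : [
            {
              "set" : {
                "field" : "_index",
                "value" : "failed-documents"
              }
            }
          ]
        }
      }
    ]
  }
}
--------------------------------------------------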
Example: here we define an `on_failure` block on a whole pipeline to change
the index to which failed documents get sent:

[source,js]
--------------------------------------------------
{
  "description" : "my first pipeline with handled exceptions",
  "processors" : [ ... ],
  "on_failure" : [
    {
      "set" : {
        "field" : "_index",
        "value" : "failed-{{ _index }}"
      }
    }
  ]
}
--------------------------------------------------
==== Accessing Error Metadata From Processors Handling Exceptions

Sometimes you may want to retrieve the actual error message that was thrown
by a failed processor. To do so, you can access the metadata fields
`on_failure_message`, `on_failure_processor_type`, and `on_failure_processor_tag`.
These fields are only accessible from within the context of an `on_failure`
block. Here is an updated version of our first example, which leverages these
fields to provide the error message instead of setting it manually.
[source,js]
--------------------------------------------------
{
  "description" : "my first pipeline with handled exceptions",
  "processors" : [
    {
      "rename" : {
        "field" : "foo",
        "to" : "bar",
        "on_failure" : [
          {
            "set" : {
              "field" : "error",
              "value" : "{{ _ingest.on_failure_message }}"
            }
          }
        ]
      }
    }
  ]
}
--------------------------------------------------
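
The other metadata fields can be captured the same way. A sketch that records
the failing processor's type and tag alongside the message (the `error_*` field
names here are arbitrary, chosen for illustration):

[source,js]
--------------------------------------------------
"on_failure" : [
  {
    "set" : {
      "field" : "error_message",
      "value" : "{{ _ingest.on_failure_message }}"
    }
  },
  {
    "set" : {
      "field" : "error_processor_type",
      "value" : "{{ _ingest.on_failure_processor_type }}"
    }
  },
  {
    "set" : {
      "field" : "error_processor_tag",
      "value" : "{{ _ingest.on_failure_processor_tag }}"
    }
  }
]
--------------------------------------------------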
== Ingest APIs

=== Put pipeline API

The put pipeline API adds new pipelines and updates existing pipelines in the cluster.

[source,js]
--------------------------------------------------
PUT _ingest/pipeline/my-pipeline-id
{
  "description" : "describe pipeline",
  "processors" : [
    {
      "simple" : {
        // settings
      }
    },
    // other processors
  ]
}
--------------------------------------------------
// AUTOSENSE

NOTE: The put pipeline API also instructs all ingest nodes to reload their in-memory representation of pipelines, so that
pipeline changes take effect immediately.
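
On success, the API returns an acknowledgement. The response typically looks
like this (a sketch, not captured from a live cluster):

[source,js]
--------------------------------------------------
{
  "acknowledged": true
}
--------------------------------------------------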
=== Get pipeline API

The get pipeline API returns pipelines by ID. This API always returns a local reference of the pipeline.

[source,js]
--------------------------------------------------
GET _ingest/pipeline/my-pipeline-id
--------------------------------------------------
// AUTOSENSE

Example response:

[source,js]
--------------------------------------------------
{
  "my-pipeline-id": {
    "_source" : {
      "description": "describe pipeline",
      "processors": [
        {
          "simple" : {
            // settings
          }
        },
        // other processors
      ]
    },
    "_version" : 0
  }
}
--------------------------------------------------

For each returned pipeline, the source and the version are returned.
The version is useful for knowing which version of the pipeline the node has.
Multiple IDs can be provided at the same time, and wildcards are also supported.
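
For illustration, assuming two hypothetical pipelines whose IDs share a common
prefix, both forms look like this:

[source,js]
--------------------------------------------------
GET _ingest/pipeline/my-pipeline-id,my-other-pipeline-id
GET _ingest/pipeline/my-*
--------------------------------------------------
// AUTOSENSE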
=== Delete pipeline API

The delete pipeline API deletes pipelines by ID.

[source,js]
--------------------------------------------------
DELETE _ingest/pipeline/my-pipeline-id
--------------------------------------------------
// AUTOSENSE
=== Simulate pipeline API

The simulate pipeline API executes a specific pipeline against
the set of documents provided in the body of the request.
A simulate request may reference an existing pipeline to execute
against the provided documents, or supply a pipeline definition in
the body of the request.

Here is the structure of a simulate request with a provided pipeline:

[source,js]
--------------------------------------------------
POST _ingest/pipeline/_simulate
{
  "pipeline" : {
    // pipeline definition here
  },
  "docs" : [
    { /** first document **/ },
    { /** second document **/ },
    // ...
  ]
}
--------------------------------------------------

Here is the structure of a simulate request against a pre-existing pipeline:

[source,js]
--------------------------------------------------
POST _ingest/pipeline/my-pipeline-id/_simulate
{
  "docs" : [
    { /** first document **/ },
    { /** second document **/ },
    // ...
  ]
}
--------------------------------------------------
Here is an example simulate request with a provided pipeline and its response:

[source,js]
--------------------------------------------------
POST _ingest/pipeline/_simulate
{
  "pipeline" : {
    "description": "_description",
    "processors": [
      {
        "set" : {
          "field" : "field2",
          "value" : "_value"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_type": "type",
      "_id": "id",
      "_source": {
        "foo": "bar"
      }
    },
    {
      "_index": "index",
      "_type": "type",
      "_id": "id",
      "_source": {
        "foo": "rab"
      }
    }
  ]
}
--------------------------------------------------
// AUTOSENSE
Response:

[source,js]
--------------------------------------------------
{
  "docs": [
    {
      "doc": {
        "_id": "id",
        "_ttl": null,
        "_parent": null,
        "_index": "index",
        "_routing": null,
        "_type": "type",
        "_timestamp": null,
        "_source": {
          "field2": "_value",
          "foo": "bar"
        },
        "_ingest": {
          "timestamp": "2016-01-04T23:53:27.186+0000"
        }
      }
    },
    {
      "doc": {
        "_id": "id",
        "_ttl": null,
        "_parent": null,
        "_index": "index",
        "_routing": null,
        "_type": "type",
        "_timestamp": null,
        "_source": {
          "field2": "_value",
          "foo": "rab"
        },
        "_ingest": {
          "timestamp": "2016-01-04T23:53:27.186+0000"
        }
      }
    }
  ]
}
--------------------------------------------------
It is often useful to see how each processor affects the ingest document
as it is passed through the pipeline. To see the intermediate results of
each processor in the simulate request, a `verbose` parameter may be added
to the request.

Here is an example verbose request and its response:

[source,js]
--------------------------------------------------
POST _ingest/pipeline/_simulate?verbose
{
  "pipeline" : {
    "description": "_description",
    "processors": [
      {
        "set" : {
          "field" : "field2",
          "value" : "_value2"
        }
      },
      {
        "set" : {
          "field" : "field3",
          "value" : "_value3"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_type": "type",
      "_id": "id",
      "_source": {
        "foo": "bar"
      }
    },
    {
      "_index": "index",
      "_type": "type",
      "_id": "id",
      "_source": {
        "foo": "rab"
      }
    }
  ]
}
--------------------------------------------------
// AUTOSENSE
  935. // AUTOSENSE
  936. response:
  937. [source,js]
  938. --------------------------------------------------
  939. {
  940. "docs": [
  941. {
  942. "processor_results": [
  943. {
  944. "tag": "processor[set]-0",
  945. "doc": {
  946. "_id": "id",
  947. "_ttl": null,
  948. "_parent": null,
  949. "_index": "index",
  950. "_routing": null,
  951. "_type": "type",
  952. "_timestamp": null,
  953. "_source": {
  954. "field2": "_value2",
  955. "foo": "bar"
  956. },
  957. "_ingest": {
  958. "timestamp": "2016-01-05T00:02:51.383+0000"
  959. }
  960. }
  961. },
  962. {
  963. "tag": "processor[set]-1",
  964. "doc": {
  965. "_id": "id",
  966. "_ttl": null,
  967. "_parent": null,
  968. "_index": "index",
  969. "_routing": null,
  970. "_type": "type",
  971. "_timestamp": null,
  972. "_source": {
  973. "field3": "_value3",
  974. "field2": "_value2",
  975. "foo": "bar"
  976. },
  977. "_ingest": {
  978. "timestamp": "2016-01-05T00:02:51.383+0000"
  979. }
  980. }
  981. }
  982. ]
  983. },
  984. {
  985. "processor_results": [
  986. {
  987. "tag": "processor[set]-0",
  988. "doc": {
  989. "_id": "id",
  990. "_ttl": null,
  991. "_parent": null,
  992. "_index": "index",
  993. "_routing": null,
  994. "_type": "type",
  995. "_timestamp": null,
  996. "_source": {
  997. "field2": "_value2",
  998. "foo": "rab"
  999. },
  1000. "_ingest": {
  1001. "timestamp": "2016-01-05T00:02:51.384+0000"
  1002. }
  1003. }
  1004. },
  1005. {
  1006. "tag": "processor[set]-1",
  1007. "doc": {
  1008. "_id": "id",
  1009. "_ttl": null,
  1010. "_parent": null,
  1011. "_index": "index",
  1012. "_routing": null,
  1013. "_type": "type",
  1014. "_timestamp": null,
  1015. "_source": {
  1016. "field3": "_value3",
  1017. "field2": "_value2",
  1018. "foo": "rab"
  1019. },
  1020. "_ingest": {
  1021. "timestamp": "2016-01-05T00:02:51.384+0000"
  1022. }
  1023. }
  1024. }
  1025. ]
  1026. }
  1027. ]
  1028. }
  1029. --------------------------------------------------