  1. [role="xpack"]
  2. [testenv="basic"]
  3. [[ingest-enriching-data]]
  4. == Enrich your data
  5. You can use the <<enrich-processor,enrich processor>> to add data from your
  6. existing indices to incoming documents during ingest.
  7. For example, you can use the enrich processor to:
  8. * Identify web services or vendors based on known IP addresses
  9. * Add product information to retail orders based on product IDs
  10. * Supplement contact information based on an email address
  11. * Add postal codes based on user coordinates
  12. [discrete]
  13. [[how-enrich-works]]
  14. === How the enrich processor works
  15. An <<ingest,ingest pipeline>> changes documents before they are actually
  16. indexed. You can think of an ingest pipeline as an assembly line made up of a
  17. series of workers, called <<ingest-processors,processors>>. Each processor makes
  18. specific changes, like lowercasing field values, to incoming documents before
  19. moving on to the next. When all the processors in a pipeline are done, the
  20. finished document is added to the target index.
  21. image::images/ingest/ingest-process.svg[align="center"]
  22. Most processors are self-contained and only change _existing_ data in incoming
  23. documents. But the enrich processor adds _new_ data to incoming documents
  24. and requires a few special components:
  25. image::images/ingest/enrich/enrich-process.svg[align="center"]
  26. [[enrich-policy]]
  27. enrich policy::
  28. +
  29. --
  30. A set of configuration options used to add the right enrich data to the right
  31. incoming documents.
  32. An enrich policy contains:
  33. // tag::enrich-policy-fields[]
  34. * A list of one or more _source indices_ which store enrich data as documents
  35. * The _policy type_ which determines how the processor matches the enrich data
  36. to incoming documents
  37. * A _match field_ from the source indices used to match incoming documents
  38. * _Enrich fields_ containing enrich data from the source indices you want to add
  39. to incoming documents
  40. // end::enrich-policy-fields[]
  41. Before it can be used with an enrich processor, an enrich policy must be
  42. <<execute-enrich-policy-api,executed>>. When executed, an enrich policy uses
  43. enrich data from the policy's source indices to create a streamlined system
  44. index called the _enrich index_. The processor uses this index to match and
  45. enrich incoming documents.
  46. See <<enrich-policy-definition>> for a full list of enrich policy types and
  47. configuration options.
  48. --
  49. [[source-index]]
  50. source index::
  51. An index which stores enrich data you'd like to add to incoming documents. You
  52. can create and manage these indices just like a regular {es} index. You can use
  53. multiple source indices in an enrich policy. You also can use the same source
  54. index in multiple enrich policies.
  55. [[enrich-index]]
  56. enrich index::
  57. +
  58. --
  59. A special system index tied to a specific enrich policy.
  60. Directly matching incoming documents to documents in source indices could be
  61. slow and resource intensive. To speed things up, the enrich processor uses an
  62. enrich index.
  63. Enrich indices contain enrich data from source indices but have a few special
  64. properties to help streamline them:
  65. * They are system indices, meaning they're managed internally by {es} and only
  66. intended for use with enrich processors.
  67. * They always begin with `.enrich-*`.
  68. * They are read-only, meaning you can't directly change them.
  69. * They are <<indices-forcemerge,force merged>> for fast retrieval.
  70. --
  71. [role="xpack"]
  72. [testenv="basic"]
  73. [[enrich-setup]]
  74. === Set up an enrich processor
  75. To set up an enrich processor, follow these steps:
  76. . Check the <<enrich-prereqs, prerequisites>>.
  77. . <<create-enrich-source-index>>.
  78. . <<create-enrich-policy>>.
  79. . <<execute-enrich-policy>>.
  80. . <<add-enrich-processor>>.
  81. . <<ingest-enrich-docs>>.
  82. Once you have an enrich processor set up,
  83. you can <<update-enrich-data,update your enrich data>>
  84. and <<update-enrich-policies, update your enrich policies>>.
  85. [IMPORTANT]
  86. ====
  87. The enrich processor performs several operations
  88. and may impact the speed of your <<pipeline,ingest pipeline>>.
  89. We strongly recommend testing and benchmarking your enrich processors
  90. before deploying them in production.
  91. We do not recommend using the enrich processor to append real-time data.
  92. The enrich processor works best with reference data
  93. that doesn't change frequently.
  94. ====
  95. [float]
  96. [[enrich-prereqs]]
  97. ==== Prerequisites
  98. include::{es-repo-dir}/ingest/apis/enrich/put-enrich-policy.asciidoc[tag=enrich-policy-api-prereqs]
  99. [[create-enrich-source-index]]
  100. ==== Add enrich data
  101. To begin, add documents to one or more source indices. These documents should
  102. contain the enrich data you eventually want to add to incoming documents.
  103. You can manage source indices just like regular {es} indices using the
  104. <<docs,document>> and <<indices,index>> APIs.
You also can set up {beats-ref}/getting-started.html[{beats}], such as
{filebeat-ref}/filebeat-installation-configuration.html[{filebeat}], to
automatically send and index documents to your source indices. See
{beats-ref}/getting-started.html[Getting started with {beats}].
  109. [[create-enrich-policy]]
  110. ==== Create an enrich policy
  111. After adding enrich data to your source indices, you can
  112. <<enrich-policy-definition,define an enrich policy>>. When defining the enrich
  113. policy, you should include at least the following:
  114. include::enrich.asciidoc[tag=enrich-policy-fields]
  115. You can use this definition to create the enrich policy with the
  116. <<put-enrich-policy-api,put enrich policy API>>.
  117. [WARNING]
  118. ====
You can't update or change an enrich policy after it is created.
  120. See <<update-enrich-policies>>.
  121. ====
  122. [[execute-enrich-policy]]
  123. ==== Execute the enrich policy
  124. Once the enrich policy is created, you can execute it using the
  125. <<execute-enrich-policy-api,execute enrich policy API>> to create an
  126. <<enrich-index,enrich index>>.
  127. image::images/ingest/enrich/enrich-policy-index.svg[align="center"]
  128. include::apis/enrich/execute-enrich-policy.asciidoc[tag=execute-enrich-policy-def]
  129. [[add-enrich-processor]]
  130. ==== Add an enrich processor to an ingest pipeline
  131. Once you have source indices, an enrich policy, and the related enrich index in
  132. place, you can set up an ingest pipeline that includes an enrich processor for
  133. your policy.
  134. image::images/ingest/enrich/enrich-processor.svg[align="center"]
  135. Define an <<enrich-processor,enrich processor>> and add it to an ingest
  136. pipeline using the <<put-pipeline-api,put pipeline API>>.
  137. When defining the enrich processor, you must include at least the following:
  138. * The enrich policy to use.
  139. * The field used to match incoming documents to the documents in your enrich index.
  140. * The target field to add to incoming documents. This target field contains the
  141. match and enrich fields specified in your enrich policy.
  142. You also can use the `max_matches` option to set the number of enrich documents
  143. an incoming document can match. If set to the default of `1`, data is added to
  144. an incoming document's target field as a JSON object. Otherwise, the data is
  145. added as an array.
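
As a minimal sketch of this option, the following request defines an enrich
processor with `max_matches` set above `1`. The pipeline name
`user_lookup_multi`, the `users` target field, and the value `3` are
illustrative assumptions. The `users-policy` policy is the one created in the
`match` example later in this guide.

[source,console]
----
PUT /_ingest/pipeline/user_lookup_multi
{
  "description": "Hypothetical pipeline that allows several enrich matches",
  "processors": [
    {
      "enrich": {
        "policy_name": "users-policy",
        "field": "email",
        "target_field": "users",
        "max_matches": 3
      }
    }
  ]
}
----
// TEST[skip:illustrative sketch with hypothetical names]

With this setting, the `users` target field is added as an array of matching
enrich documents rather than as a single JSON object.
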
  146. See <<enrich-processor>> for a full list of configuration options.
  147. You also can add other <<ingest-processors,processors>> to your ingest pipeline.
  148. [[ingest-enrich-docs]]
  149. ==== Ingest and enrich documents
  150. You can now use your ingest pipeline to enrich and index documents.
  151. image::images/ingest/enrich/enrich-process.svg[align="center"]
  152. Before implementing the pipeline in production, we recommend indexing a few test
  153. documents first and verifying enrich data was added correctly using the
  154. <<docs-get,get API>>.
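
Another way to check a pipeline, without indexing any documents, is the
simulate pipeline API. The sketch below assumes the `user_lookup` pipeline and
the sample email address from the `match` example in this guide. It is an
optional extra step, not a replacement for the check described above.

[source,console]
----
POST /_ingest/pipeline/user_lookup/_simulate
{
  "docs": [
    {
      "_source": {
        "email": "mardy.brown@asciidocsmith.com"
      }
    }
  ]
}
----
// TEST[skip:illustrative sketch that depends on a later example]

If the enrich policy has been executed, the response should show each test
document as it would look after the enrich processor runs.
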
  155. [[update-enrich-data]]
  156. ==== Update an enrich index
  157. include::{es-repo-dir}/ingest/apis/enrich/execute-enrich-policy.asciidoc[tag=update-enrich-index]
If needed, you can <<docs-reindex,reindex>>
  159. or <<docs-update-by-query,update>> any already ingested documents
  160. using your ingest pipeline.
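
The two options can be sketched as follows. Both requests assume the
`user_lookup` pipeline and the `my_index` index from the `match` example in
this guide. The destination index name `my_index_enriched` is hypothetical.

[source,console]
----
POST /my_index/_update_by_query?pipeline=user_lookup

POST /_reindex
{
  "source": {
    "index": "my_index"
  },
  "dest": {
    "index": "my_index_enriched",
    "pipeline": "user_lookup"
  }
}
----
// TEST[skip:illustrative sketch with hypothetical names]

The first request re-runs the pipeline on documents in place. The second
copies them to a new index and applies the pipeline along the way.
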
  161. [[update-enrich-policies]]
  162. ==== Update an enrich policy
  163. // tag::update-enrich-policy[]
You can't update or change an enrich policy after it is created.
  165. Instead, you can:
  166. . Create and <<execute-enrich-policy-api,execute>> a new enrich policy.
  167. . Replace the previous enrich policy
  168. with the new enrich policy
  169. in any in-use enrich processors.
  170. . Use the <<delete-enrich-policy-api, delete enrich policy>> API
  171. to delete the previous enrich policy.
  172. // end::update-enrich-policy[]
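
The replacement steps can be sketched as the following sequence of requests.
It assumes the `users-policy` policy and `user_lookup` pipeline from the
`match` example in this guide. The new policy name `users-policy-v2` and the
added `web` enrich field are illustrative assumptions.

[source,console]
----
PUT /_enrich/policy/users-policy-v2
{
  "match": {
    "indices": "users",
    "match_field": "email",
    "enrich_fields": ["first_name", "last_name", "city", "zip", "state", "web"]
  }
}

POST /_enrich/policy/users-policy-v2/_execute

PUT /_ingest/pipeline/user_lookup
{
  "description": "Enriching user details to messages",
  "processors": [
    {
      "enrich": {
        "policy_name": "users-policy-v2",
        "field": "email",
        "target_field": "user"
      }
    }
  ]
}

DELETE /_enrich/policy/users-policy
----
// TEST[skip:illustrative sketch with hypothetical names]

The delete request comes last so that no pipeline still refers to the previous
policy when it is removed.
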
  173. [role="xpack"]
  174. [testenv="basic"]
  175. [[enrich-policy-definition]]
  176. === Enrich policy definition
  177. <<enrich-policy,Enrich policies>> are defined as JSON objects like the
  178. following:
  179. [source,js]
  180. ----
{
  "<enrich_policy_type>": {
    "indices": ["..."],
    "match_field": "...",
    "enrich_fields": ["..."],
    "query": {...}
  }
}
  190. ----
  191. // NOTCONSOLE
  192. [[enrich-policy-parms]]
  193. ==== Parameters
  194. `<enrich_policy_type>`::
  195. +
  196. --
  197. (Required, enrich policy object)
  198. The enrich policy type determines how enrich data is matched to incoming
  199. documents.
  200. Supported enrich policy types include:
  201. <<geo-match-enrich-policy-type,`geo_match`>>:::
  202. Matches enrich data to incoming documents based on a geographic location using
  203. a <<query-dsl-geo-shape-query,`geo_shape` query>>. For an example, see
  204. <<geo-match-enrich-policy-type>>.
  205. <<match-enrich-policy-type,`match`>>:::
  206. Matches enrich data to incoming documents based on a precise value, such as an
  207. email address or ID, using a <<query-dsl-term-query,`term` query>>. For an
  208. example, see <<match-enrich-policy-type>>.
  209. --
  210. `indices`::
  211. +
  212. --
(Required, string or array of strings)
  214. Source indices used to create the enrich index.
  215. If multiple indices are provided, they must share a common `match_field`, which
  216. the enrich processor can use to match incoming documents.
  217. --
  218. `match_field`::
  219. (Required, string)
  220. Field in the source indices used to match incoming documents.
  221. `enrich_fields`::
(Required, array of strings)
  223. Fields to add to matching incoming documents. These fields must be present in
  224. the source indices.
  225. `query`::
  226. (Optional, <<query-dsl,Query DSL query object>>)
  227. Query used to filter documents in the enrich index for matching. Defaults to
  228. a <<query-dsl-match-all-query,`match_all`>> query.
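
As a minimal sketch of how these parameters fit together, the following
request creates a `match` policy that reads from two source indices and uses
the optional `query` parameter. The policy name `la-users-policy`, the
`contractors` index, and the filter on `state` are hypothetical. The sketch
assumes both indices contain an `email` field.

[source,console]
----
PUT /_enrich/policy/la-users-policy
{
  "match": {
    "indices": ["users", "contractors"],
    "match_field": "email",
    "enrich_fields": ["first_name", "last_name"],
    "query": {
      "match": {
        "state": "LA"
      }
    }
  }
}
----
// TEST[skip:illustrative sketch with hypothetical names]
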
  229. [role="xpack"]
  230. [testenv="basic"]
  231. [[geo-match-enrich-policy-type]]
  232. === Example: Enrich your data based on geolocation
  233. `geo_match` <<enrich-policy,enrich policies>> match enrich data to incoming
  234. documents based on a geographic location, using a
  235. <<query-dsl-geo-shape-query,`geo_shape` query>>.
  236. The following example creates a `geo_match` enrich policy that adds postal
  237. codes to incoming documents based on a set of coordinates. It then adds the
  238. `geo_match` enrich policy to a processor in an ingest pipeline.
  239. Use the <<indices-create-index,create index API>> to create a source index
  240. containing at least one `geo_shape` field.
  241. [source,console]
  242. ----
  243. PUT /postal_codes
  244. {
  245. "mappings": {
  246. "properties": {
  247. "location": {
  248. "type": "geo_shape"
  249. },
  250. "postal_code": {
  251. "type": "keyword"
  252. }
  253. }
  254. }
  255. }
  256. ----
  257. Use the <<docs-index_,index API>> to index enrich data to this source index.
  258. [source,console]
  259. ----
  260. PUT /postal_codes/_doc/1?refresh=wait_for
  261. {
  262. "location": {
  263. "type": "envelope",
  264. "coordinates": [[13.0, 53.0], [14.0, 52.0]]
  265. },
  266. "postal_code": "96598"
  267. }
  268. ----
  269. // TEST[continued]
  270. Use the <<put-enrich-policy-api,put enrich policy API>> to create an enrich
  271. policy with the `geo_match` policy type. This policy must include:
  272. * One or more source indices
  273. * A `match_field`,
  274. the `geo_shape` field from the source indices used to match incoming documents
  275. * Enrich fields from the source indices you'd like to append to incoming
  276. documents
  277. [source,console]
  278. ----
  279. PUT /_enrich/policy/postal_policy
  280. {
  281. "geo_match": {
  282. "indices": "postal_codes",
  283. "match_field": "location",
  284. "enrich_fields": ["location","postal_code"]
  285. }
  286. }
  287. ----
  288. // TEST[continued]
  289. Use the <<execute-enrich-policy-api,execute enrich policy API>> to create an
  290. enrich index for the policy.
  291. [source,console]
  292. ----
  293. POST /_enrich/policy/postal_policy/_execute
  294. ----
  295. // TEST[continued]
  296. Use the <<put-pipeline-api,put pipeline API>> to create an ingest pipeline. In
  297. the pipeline, add an <<enrich-processor,enrich processor>> that includes:
  298. * Your enrich policy.
  299. * The `field` of incoming documents used to match the geo_shape of documents
  300. from the enrich index.
  301. * The `target_field` used to store appended enrich data for incoming documents.
  302. This field contains the `match_field` and `enrich_fields` specified in your
  303. enrich policy.
  304. * The `shape_relation`, which indicates how the processor matches geo_shapes in
  305. incoming documents to geo_shapes in documents from the enrich index. See
  306. <<_spatial_relations>> for valid options and more information.
  307. [source,console]
  308. ----
  309. PUT /_ingest/pipeline/postal_lookup
  310. {
  311. "description": "Enrich postal codes",
  312. "processors": [
  313. {
  314. "enrich": {
  315. "policy_name": "postal_policy",
  316. "field": "geo_location",
  317. "target_field": "geo_data",
  318. "shape_relation": "INTERSECTS"
  319. }
  320. }
  321. ]
  322. }
  323. ----
  324. // TEST[continued]
  325. Use the ingest pipeline to index a document. The incoming document should
  326. include the `field` specified in your enrich processor.
  327. [source,console]
  328. ----
  329. PUT /users/_doc/0?pipeline=postal_lookup
  330. {
  331. "first_name": "Mardy",
  332. "last_name": "Brown",
  333. "geo_location": "POINT (13.5 52.5)"
  334. }
  335. ----
  336. // TEST[continued]
  337. To verify the enrich processor matched and appended the appropriate field data,
  338. use the <<docs-get,get API>> to view the indexed document.
  339. [source,console]
  340. ----
  341. GET /users/_doc/0
  342. ----
  343. // TEST[continued]
  344. The API returns the following response:
  345. [source,console-result]
  346. ----
  347. {
  348. "found": true,
  349. "_index": "users",
  350. "_id": "0",
  351. "_version": 1,
  352. "_seq_no": 55,
  353. "_primary_term": 1,
  354. "_source": {
  355. "geo_data": {
  356. "location": {
  357. "type": "envelope",
  358. "coordinates": [[13.0, 53.0], [14.0, 52.0]]
  359. },
  360. "postal_code": "96598"
  361. },
  362. "first_name": "Mardy",
  363. "last_name": "Brown",
  364. "geo_location": "POINT (13.5 52.5)"
  365. }
  366. }
  367. ----
  368. // TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term":1/"_primary_term" : $body._primary_term/]
  369. ////
  370. [source,console]
  371. --------------------------------------------------
  372. DELETE /_ingest/pipeline/postal_lookup
  373. DELETE /_enrich/policy/postal_policy
  374. --------------------------------------------------
  375. // TEST[continued]
  376. ////
  377. [role="xpack"]
  378. [testenv="basic"]
  379. [[match-enrich-policy-type]]
  380. === Example: Enrich your data based on exact values
  381. `match` <<enrich-policy,enrich policies>> match enrich data to incoming
documents based on an exact value, such as an email address or ID, using a
  383. <<query-dsl-term-query,`term` query>>.
  384. The following example creates a `match` enrich policy that adds user name and
  385. contact information to incoming documents based on an email address. It then
  386. adds the `match` enrich policy to a processor in an ingest pipeline.
  387. Use the <<indices-create-index, create index API>> or <<docs-index_,index
  388. API>> to create a source index.
  389. The following index API request creates a source index and indexes a
  390. new document to that index.
  391. [source,console]
  392. ----
  393. PUT /users/_doc/1?refresh=wait_for
  394. {
  395. "email": "mardy.brown@asciidocsmith.com",
  396. "first_name": "Mardy",
  397. "last_name": "Brown",
  398. "city": "New Orleans",
  399. "county": "Orleans",
  400. "state": "LA",
  401. "zip": 70116,
  402. "web": "mardy.asciidocsmith.com"
  403. }
  404. ----
  405. Use the put enrich policy API to create an enrich policy with the `match`
  406. policy type. This policy must include:
  407. * One or more source indices
  408. * A `match_field`,
  409. the field from the source indices used to match incoming documents
  410. * Enrich fields from the source indices you'd like to append to incoming
  411. documents
  412. [source,console]
  413. ----
  414. PUT /_enrich/policy/users-policy
  415. {
  416. "match": {
  417. "indices": "users",
  418. "match_field": "email",
  419. "enrich_fields": ["first_name", "last_name", "city", "zip", "state"]
  420. }
  421. }
  422. ----
  423. // TEST[continued]
  424. Use the <<execute-enrich-policy-api,execute enrich policy API>> to create an
  425. enrich index for the policy.
  426. [source,console]
  427. ----
  428. POST /_enrich/policy/users-policy/_execute
  429. ----
  430. // TEST[continued]
  431. Use the <<put-pipeline-api,put pipeline API>> to create an ingest pipeline. In
  432. the pipeline, add an <<enrich-processor,enrich processor>> that includes:
  433. * Your enrich policy.
  434. * The `field` of incoming documents used to match documents
  435. from the enrich index.
  436. * The `target_field` used to store appended enrich data for incoming documents.
  437. This field contains the `match_field` and `enrich_fields` specified in your
  438. enrich policy.
  439. [source,console]
  440. ----
  441. PUT /_ingest/pipeline/user_lookup
  442. {
  443. "description" : "Enriching user details to messages",
  444. "processors" : [
  445. {
  446. "enrich" : {
  447. "policy_name": "users-policy",
  448. "field" : "email",
  449. "target_field": "user",
"max_matches": 1
  451. }
  452. }
  453. ]
  454. }
  455. ----
  456. // TEST[continued]
  457. Use the ingest pipeline to index a document. The incoming document should
  458. include the `field` specified in your enrich processor.
  459. [source,console]
  460. ----
  461. PUT /my_index/_doc/my_id?pipeline=user_lookup
  462. {
  463. "email": "mardy.brown@asciidocsmith.com"
  464. }
  465. ----
  466. // TEST[continued]
  467. To verify the enrich processor matched and appended the appropriate field data,
  468. use the <<docs-get,get API>> to view the indexed document.
  469. [source,console]
  470. ----
  471. GET /my_index/_doc/my_id
  472. ----
  473. // TEST[continued]
  474. The API returns the following response:
  475. [source,console-result]
  476. ----
  477. {
  478. "found": true,
  479. "_index": "my_index",
  480. "_id": "my_id",
  481. "_version": 1,
  482. "_seq_no": 55,
  483. "_primary_term": 1,
  484. "_source": {
  485. "user": {
  486. "email": "mardy.brown@asciidocsmith.com",
  487. "first_name": "Mardy",
  488. "last_name": "Brown",
  489. "zip": 70116,
  490. "city": "New Orleans",
  491. "state": "LA"
  492. },
  493. "email": "mardy.brown@asciidocsmith.com"
  494. }
  495. }
  496. ----
  497. // TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term":1/"_primary_term" : $body._primary_term/]
  498. ////
  499. [source,console]
  500. --------------------------------------------------
  501. DELETE /_ingest/pipeline/user_lookup
  502. DELETE /_enrich/policy/users-policy
  503. --------------------------------------------------
  504. // TEST[continued]
  505. ////