[role="xpack"]
[testenv="basic"]
[[ingest-enriching-data]]
== Enrich your data

You can use the <<enrich-processor,enrich processor>> to add data from your
existing indices to incoming documents during ingest.

For example, you can use the enrich processor to:

* Identify web services or vendors based on known IP addresses
* Add product information to retail orders based on product IDs
* Supplement contact information based on an email address
* Add postal codes based on user coordinates

[discrete]
[[how-enrich-works]]
=== How the enrich processor works

Most processors are self-contained and only change _existing_ data in incoming
documents.

image::images/ingest/ingest-process.svg[align="center"]

The enrich processor adds _new_ data to incoming documents and requires a few
special components:

image::images/ingest/enrich/enrich-process.svg[align="center"]

[[enrich-policy]]
enrich policy::
+
--
A set of configuration options used to add the right enrich data to the right
incoming documents.

An enrich policy contains:

// tag::enrich-policy-fields[]
* A list of one or more _source indices_ which store enrich data as documents
* The _policy type_ which determines how the processor matches the enrich data
  to incoming documents
* A _match field_ from the source indices used to match incoming documents
* _Enrich fields_ containing enrich data from the source indices you want to add
  to incoming documents
// end::enrich-policy-fields[]

Before it can be used with an enrich processor, an enrich policy must be
<<execute-enrich-policy-api,executed>>. When executed, an enrich policy uses
enrich data from the policy's source indices to create a streamlined system
index called the _enrich index_. The processor uses this index to match and
enrich incoming documents.

See <<enrich-policy-definition>> for a full list of enrich policy types and
configuration options.
--

[[source-index]]
source index::
An index which stores enrich data you'd like to add to incoming documents. You
can create and manage these indices just like a regular {es} index. You can use
multiple source indices in an enrich policy. You also can use the same source
index in multiple enrich policies.

[[enrich-index]]
enrich index::
+
--
A special system index tied to a specific enrich policy.

Directly matching incoming documents to documents in source indices could be
slow and resource intensive. To speed things up, the enrich processor uses an
enrich index.

Enrich indices contain enrich data from source indices but have a few special
properties to help streamline them:

* They are system indices, meaning they're managed internally by {es} and only
  intended for use with enrich processors.
* They always begin with `.enrich-*`.
* They are read-only, meaning you can't directly change them.
* They are <<indices-forcemerge,force merged>> for fast retrieval.
--

[role="xpack"]
[testenv="basic"]
[[enrich-setup]]
=== Set up an enrich processor

To set up an enrich processor, follow these steps:

. Check the <<enrich-prereqs, prerequisites>>.
. <<create-enrich-source-index>>.
. <<create-enrich-policy>>.
. <<execute-enrich-policy>>.
. <<add-enrich-processor>>.
. <<ingest-enrich-docs>>.

Once you have an enrich processor set up,
you can <<update-enrich-data,update your enrich data>>
and <<update-enrich-policies, update your enrich policies>>.

[IMPORTANT]
====
The enrich processor performs several operations and may impact the speed of
your ingest pipeline.

We strongly recommend testing and benchmarking your enrich processors
before deploying them in production.

We do not recommend using the enrich processor to append real-time data.
The enrich processor works best with reference data
that doesn't change frequently.
====

[discrete]
[[enrich-prereqs]]
==== Prerequisites

include::{es-repo-dir}/ingest/apis/enrich/put-enrich-policy.asciidoc[tag=enrich-policy-api-prereqs]

[[create-enrich-source-index]]
==== Add enrich data

To begin, add documents to one or more source indices. These documents should
contain the enrich data you eventually want to add to incoming documents.

You can manage source indices just like regular {es} indices using the
<<docs,document>> and <<indices,index>> APIs.

You also can set up {beats-ref}/getting-started.html[{beats}], such as
{filebeat-ref}/filebeat-installation-configuration.html[{filebeat}], to
automatically send and index documents to your source indices. See
{beats-ref}/getting-started.html[Getting started with {beats}].
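
As a minimal sketch of this step, the following request uses the index API to
add one enrich document to a source index. The index name `my-source-index` and
the `product_id`, `product_name`, and `vendor` fields are hypothetical
placeholders chosen for illustration.

[source,console]
----
# Hypothetical source index and fields, for illustration only
PUT /my-source-index/_doc/1?refresh=wait_for
{
  "product_id": "A-1001",
  "product_name": "Widget",
  "vendor": "Example Corp"
}
----
// TEST[skip:illustrative sketch with hypothetical names]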

[[create-enrich-policy]]
==== Create an enrich policy

After adding enrich data to your source indices, you can
<<enrich-policy-definition,define an enrich policy>>. When defining the enrich
policy, you should include at least the following:

include::enrich.asciidoc[tag=enrich-policy-fields]

You can use this definition to create the enrich policy with the
<<put-enrich-policy-api,create or update enrich policy API>>.

[WARNING]
====
Once created, you can't update or change an enrich policy.
See <<update-enrich-policies>>.
====
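
For illustration, a `match` policy covering the items listed above might look
like the following sketch. The policy name `my-enrich-policy`, the source index
`my-source-index`, and all field names are hypothetical.

[source,console]
----
# Hypothetical policy, index, and field names
PUT /_enrich/policy/my-enrich-policy
{
  "match": {
    "indices": "my-source-index",
    "match_field": "product_id",
    "enrich_fields": [ "product_name", "vendor" ]
  }
}
----
// TEST[skip:illustrative sketch with hypothetical names]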

[[execute-enrich-policy]]
==== Execute the enrich policy

Once the enrich policy is created, you can execute it using the
<<execute-enrich-policy-api,execute enrich policy API>> to create an
<<enrich-index,enrich index>>.

image::images/ingest/enrich/enrich-policy-index.svg[align="center"]

include::apis/enrich/execute-enrich-policy.asciidoc[tag=execute-enrich-policy-def]
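
A minimal sketch of the execute request, assuming a policy with the
hypothetical name `my-enrich-policy` already exists:

[source,console]
----
# Builds the enrich index for the hypothetical policy my-enrich-policy
POST /_enrich/policy/my-enrich-policy/_execute
----
// TEST[skip:illustrative sketch with hypothetical names]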

[[add-enrich-processor]]
==== Add an enrich processor to an ingest pipeline

Once you have source indices, an enrich policy, and the related enrich index in
place, you can set up an ingest pipeline that includes an enrich processor for
your policy.

image::images/ingest/enrich/enrich-processor.svg[align="center"]

Define an <<enrich-processor,enrich processor>> and add it to an ingest
pipeline using the <<put-pipeline-api,create or update pipeline API>>.

When defining the enrich processor, you must include at least the following:

* The enrich policy to use.
* The field used to match incoming documents to the documents in your enrich index.
* The target field to add to incoming documents. This target field contains the
  match and enrich fields specified in your enrich policy.

You also can use the `max_matches` option to set the number of enrich documents
an incoming document can match. If set to the default of `1`, data is added to
an incoming document's target field as a JSON object. Otherwise, the data is
added as an array.

See <<enrich-processor>> for a full list of configuration options.

You also can add other <<processors,processors>> to your ingest pipeline.
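
The following sketch shows a pipeline whose enrich processor sets the three
required options plus `max_matches`. The pipeline name `my-enrich-pipeline`,
the policy name `my-enrich-policy`, and the `product_id` and `product` fields
are hypothetical. Because `max_matches` is greater than `1` here, matching
enrich data would be added to the `product` target field as an array.

[source,console]
----
# Hypothetical pipeline, policy, and field names
PUT /_ingest/pipeline/my-enrich-pipeline
{
  "description": "Sketch: add product details to incoming documents",
  "processors": [
    {
      "enrich": {
        "policy_name": "my-enrich-policy",
        "field": "product_id",
        "target_field": "product",
        "max_matches": 3
      }
    }
  ]
}
----
// TEST[skip:illustrative sketch with hypothetical names]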

[[ingest-enrich-docs]]
==== Ingest and enrich documents

You can now use your ingest pipeline to enrich and index documents.

image::images/ingest/enrich/enrich-process.svg[align="center"]

Before implementing the pipeline in production, we recommend indexing a few test
documents first and verifying enrich data was added correctly using the
<<docs-get,get API>>.
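
As a complement to indexing test documents, you may be able to preview the
result without indexing anything by using the simulate pipeline API. This is a
different technique from the get API check described above. The sketch below
assumes a pipeline with the hypothetical name `my-enrich-pipeline` whose enrich
policy has already been executed, and a hypothetical `product_id` match field.

[source,console]
----
# Hypothetical pipeline name and document field
POST /_ingest/pipeline/my-enrich-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "product_id": "A-1001"
      }
    }
  ]
}
----
// TEST[skip:illustrative sketch with hypothetical names]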

[[update-enrich-data]]
==== Update an enrich index

include::{es-repo-dir}/ingest/apis/enrich/execute-enrich-policy.asciidoc[tag=update-enrich-index]

If needed, you can <<docs-reindex,reindex>>
or <<docs-update-by-query,update>> any already ingested documents
using your ingest pipeline.
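
One way this could look, as a sketch: execute the policy again so the enrich
index reflects the current source data, then pass already ingested documents
through the pipeline with an update by query request. The policy name
`my-enrich-policy`, the pipeline name `my-enrich-pipeline`, and the target
index `my-index-000001` are assumptions for illustration.

[source,console]
----
# Rebuild the enrich index from the current source indices (hypothetical policy name)
POST /_enrich/policy/my-enrich-policy/_execute

# Re-run existing documents through the pipeline (hypothetical index and pipeline names)
POST /my-index-000001/_update_by_query?pipeline=my-enrich-pipeline
----
// TEST[skip:illustrative sketch with hypothetical names]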

[[update-enrich-policies]]
==== Update an enrich policy

// tag::update-enrich-policy[]
Once created, you can't update or change an enrich policy.
Instead, you can:

. Create and <<execute-enrich-policy-api,execute>> a new enrich policy.

. Replace the previous enrich policy
  with the new enrich policy
  in any in-use enrich processors.

. Use the <<delete-enrich-policy-api, delete enrich policy>> API
  to delete the previous enrich policy.
// end::update-enrich-policy[]
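
The three steps above can be sketched as the following sequence of requests.
All names are hypothetical: `my-enrich-policy` stands for the previous policy,
`my-enrich-policy-v2` for its replacement, and `my-enrich-pipeline` for a
pipeline that uses the policy.

[source,console]
----
# Step 1: create and execute the replacement policy (hypothetical names and fields)
PUT /_enrich/policy/my-enrich-policy-v2
{
  "match": {
    "indices": "my-source-index",
    "match_field": "product_id",
    "enrich_fields": [ "product_name", "vendor", "category" ]
  }
}

POST /_enrich/policy/my-enrich-policy-v2/_execute

# Step 2: point the enrich processor at the replacement policy
PUT /_ingest/pipeline/my-enrich-pipeline
{
  "processors": [
    {
      "enrich": {
        "policy_name": "my-enrich-policy-v2",
        "field": "product_id",
        "target_field": "product"
      }
    }
  ]
}

# Step 3: delete the previous policy once no processor references it
DELETE /_enrich/policy/my-enrich-policy
----
// TEST[skip:illustrative sketch with hypothetical names]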

[role="xpack"]
[testenv="basic"]
[[enrich-policy-definition]]
=== Enrich policy definition

<<enrich-policy,Enrich policies>> are defined as JSON objects like the
following:

[source,js]
----
{
  "<enrich_policy_type>": {
    "indices": [ "..." ],
    "match_field": "...",
    "enrich_fields": [ "..." ],
    "query": {... }
  }
}
----
// NOTCONSOLE

[[enrich-policy-parms]]
==== Parameters

`<enrich_policy_type>`::
+
--
(Required, enrich policy object)
The enrich policy type determines how enrich data is matched to incoming
documents.

Supported enrich policy types include:

<<geo-match-enrich-policy-type,`geo_match`>>:::
Matches enrich data to incoming documents based on a geographic location using
a <<query-dsl-geo-shape-query,`geo_shape` query>>. For an example, see
<<geo-match-enrich-policy-type>>.

<<match-enrich-policy-type,`match`>>:::
Matches enrich data to incoming documents based on a precise value, such as an
email address or ID, using a <<query-dsl-term-query,`term` query>>. For an
example, see <<match-enrich-policy-type>>.
--

`indices`::
+
--
(Required, string or array of strings)
Source indices used to create the enrich index.

If multiple indices are provided, they must share a common `match_field`, which
the enrich processor can use to match incoming documents.
--

`match_field`::
(Required, string)
Field in the source indices used to match incoming documents.

`enrich_fields`::
(Required, array of strings)
Fields to add to matching incoming documents. These fields must be present in
the source indices.

`query`::
(Optional, <<query-dsl,Query DSL query object>>)
Query used to filter documents in the enrich index for matching. Defaults to
a <<query-dsl-match-all-query,`match_all`>> query.
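
To show how these parameters fit together, here is a sketch of a `match`
policy that draws on two source indices and sets the optional `query`. The
policy name, both index names, and the `product_id`, `product_name`, `vendor`,
and `active` fields are hypothetical. Both indices are assumed to contain the
`product_id` match field.

[source,console]
----
# Hypothetical policy, indices, and fields
PUT /_enrich/policy/my-filtered-policy
{
  "match": {
    "indices": [ "my-source-index-a", "my-source-index-b" ],
    "match_field": "product_id",
    "enrich_fields": [ "product_name", "vendor" ],
    "query": {
      "term": { "active": true }
    }
  }
}
----
// TEST[skip:illustrative sketch with hypothetical names]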

[role="xpack"]
[testenv="basic"]
[[geo-match-enrich-policy-type]]
=== Example: Enrich your data based on geolocation

`geo_match` <<enrich-policy,enrich policies>> match enrich data to incoming
documents based on a geographic location, using a
<<query-dsl-geo-shape-query,`geo_shape` query>>.

The following example creates a `geo_match` enrich policy that adds postal
codes to incoming documents based on a set of coordinates. It then adds the
`geo_match` enrich policy to a processor in an ingest pipeline.

Use the <<indices-create-index,create index API>> to create a source index
containing at least one `geo_shape` field.

[source,console]
----
PUT /postal_codes
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_shape"
      },
      "postal_code": {
        "type": "keyword"
      }
    }
  }
}
----

Use the <<docs-index_,index API>> to index enrich data to this source index.

[source,console]
----
PUT /postal_codes/_doc/1?refresh=wait_for
{
  "location": {
    "type": "envelope",
    "coordinates": [ [ 13.0, 53.0 ], [ 14.0, 52.0 ] ]
  },
  "postal_code": "96598"
}
----
// TEST[continued]

Use the <<put-enrich-policy-api,create or update enrich policy API>> to create
an enrich policy with the `geo_match` policy type. This policy must include:

* One or more source indices
* A `match_field`,
  the `geo_shape` field from the source indices used to match incoming documents
* Enrich fields from the source indices you'd like to append to incoming
  documents

[source,console]
----
PUT /_enrich/policy/postal_policy
{
  "geo_match": {
    "indices": "postal_codes",
    "match_field": "location",
    "enrich_fields": [ "location", "postal_code" ]
  }
}
----
// TEST[continued]

Use the <<execute-enrich-policy-api,execute enrich policy API>> to create an
enrich index for the policy.

[source,console]
----
POST /_enrich/policy/postal_policy/_execute
----
// TEST[continued]

Use the <<put-pipeline-api,create or update pipeline API>> to create an ingest
pipeline. In the pipeline, add an <<enrich-processor,enrich processor>> that
includes:

* Your enrich policy.
* The `field` of incoming documents used to match the geo_shape of documents
  from the enrich index.
* The `target_field` used to store appended enrich data for incoming documents.
  This field contains the `match_field` and `enrich_fields` specified in your
  enrich policy.
* The `shape_relation`, which indicates how the processor matches geo_shapes in
  incoming documents to geo_shapes in documents from the enrich index. See
  <<_spatial_relations>> for valid options and more information.

[source,console]
----
PUT /_ingest/pipeline/postal_lookup
{
  "description": "Enrich postal codes",
  "processors": [
    {
      "enrich": {
        "policy_name": "postal_policy",
        "field": "geo_location",
        "target_field": "geo_data",
        "shape_relation": "INTERSECTS"
      }
    }
  ]
}
----
// TEST[continued]

Use the ingest pipeline to index a document. The incoming document should
include the `field` specified in your enrich processor.

[source,console]
----
PUT /users/_doc/0?pipeline=postal_lookup
{
  "first_name": "Mardy",
  "last_name": "Brown",
  "geo_location": "POINT (13.5 52.5)"
}
----
// TEST[continued]

To verify the enrich processor matched and appended the appropriate field data,
use the <<docs-get,get API>> to view the indexed document.

[source,console]
----
GET /users/_doc/0
----
// TEST[continued]

The API returns the following response:

[source,console-result]
----
{
  "found": true,
  "_index": "users",
  "_id": "0",
  "_version": 1,
  "_seq_no": 55,
  "_primary_term": 1,
  "_source": {
    "geo_data": {
      "location": {
        "type": "envelope",
        "coordinates": [[13.0, 53.0], [14.0, 52.0]]
      },
      "postal_code": "96598"
    },
    "first_name": "Mardy",
    "last_name": "Brown",
    "geo_location": "POINT (13.5 52.5)"
  }
}
----
// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term":1/"_primary_term" : $body._primary_term/]

////
[source,console]
--------------------------------------------------
DELETE /_ingest/pipeline/postal_lookup
DELETE /_enrich/policy/postal_policy
--------------------------------------------------
// TEST[continued]
////

[role="xpack"]
[testenv="basic"]
[[match-enrich-policy-type]]
=== Example: Enrich your data based on exact values

`match` <<enrich-policy,enrich policies>> match enrich data to incoming
documents based on an exact value, such as an email address or ID, using a
<<query-dsl-term-query,`term` query>>.

The following example creates a `match` enrich policy that adds user name and
contact information to incoming documents based on an email address. It then
adds the `match` enrich policy to a processor in an ingest pipeline.

Use the <<indices-create-index, create index API>> or <<docs-index_,index
API>> to create a source index.

The following index API request creates a source index and indexes a
new document to that index.

[source,console]
----
PUT /users/_doc/1?refresh=wait_for
{
  "email": "mardy.brown@asciidocsmith.com",
  "first_name": "Mardy",
  "last_name": "Brown",
  "city": "New Orleans",
  "county": "Orleans",
  "state": "LA",
  "zip": 70116,
  "web": "mardy.asciidocsmith.com"
}
----

Use the <<put-enrich-policy-api,create or update enrich policy API>> to create
an enrich policy with the `match` policy type. This policy must include:

* One or more source indices
* A `match_field`,
  the field from the source indices used to match incoming documents
* Enrich fields from the source indices you'd like to append to incoming
  documents

[source,console]
----
PUT /_enrich/policy/users-policy
{
  "match": {
    "indices": "users",
    "match_field": "email",
    "enrich_fields": ["first_name", "last_name", "city", "zip", "state"]
  }
}
----
// TEST[continued]

Use the <<execute-enrich-policy-api,execute enrich policy API>> to create an
enrich index for the policy.

[source,console]
----
POST /_enrich/policy/users-policy/_execute
----
// TEST[continued]

Use the <<put-pipeline-api,create or update pipeline API>> to create an ingest
pipeline. In the pipeline, add an <<enrich-processor,enrich processor>> that
includes:

* Your enrich policy.
* The `field` of incoming documents used to match documents
  from the enrich index.
* The `target_field` used to store appended enrich data for incoming documents.
  This field contains the `match_field` and `enrich_fields` specified in your
  enrich policy.

[source,console]
----
PUT /_ingest/pipeline/user_lookup
{
  "description": "Enriching user details to messages",
  "processors": [
    {
      "enrich": {
        "policy_name": "users-policy",
        "field": "email",
        "target_field": "user",
        "max_matches": "1"
      }
    }
  ]
}
----
// TEST[continued]

Use the ingest pipeline to index a document. The incoming document should
include the `field` specified in your enrich processor.

[source,console]
----
PUT /my-index-000001/_doc/my_id?pipeline=user_lookup
{
  "email": "mardy.brown@asciidocsmith.com"
}
----
// TEST[continued]

To verify the enrich processor matched and appended the appropriate field data,
use the <<docs-get,get API>> to view the indexed document.

[source,console]
----
GET /my-index-000001/_doc/my_id
----
// TEST[continued]

The API returns the following response:

[source,console-result]
----
{
  "found": true,
  "_index": "my-index-000001",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 55,
  "_primary_term": 1,
  "_source": {
    "user": {
      "email": "mardy.brown@asciidocsmith.com",
      "first_name": "Mardy",
      "last_name": "Brown",
      "zip": 70116,
      "city": "New Orleans",
      "state": "LA"
    },
    "email": "mardy.brown@asciidocsmith.com"
  }
}
----
// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term":1/"_primary_term" : $body._primary_term/]

////
[source,console]
--------------------------------------------------
DELETE /_ingest/pipeline/user_lookup
DELETE /_enrich/policy/users-policy
--------------------------------------------------
// TEST[continued]
////