[role="xpack"]
[[transform-examples]]
= {transform-cap} examples

++++
<titleabbrev>Examples</titleabbrev>
++++

These examples demonstrate how to use {transforms} to derive useful insights
from your data. All the examples use one of the
{kibana-ref}/add-sample-data.html[{kib} sample datasets]. For a more detailed,
step-by-step example, see <<ecommerce-transforms>>.

* <<example-best-customers>>
* <<example-airline>>
* <<example-clientips>>
* <<example-last-log>>
* <<example-bytes>>
* <<example-customer-names>>
[[example-best-customers]]
== Finding your best customers

This example uses the eCommerce orders sample data set to find the customers
who spent the most in a hypothetical webshop. Let's use the `pivot` type of
{transform} such that the destination index contains the number of orders, the
total price of the orders, the number of unique products, the average price
per order, and the total number of ordered products for each customer.

[role="screenshot"]
image::images/transform-ex1-1.jpg["Finding your best customers with {transforms} in {kib}"]
Alternatively, you can use the <<preview-transform,preview {transform}>> and
the <<put-transform,create {transform}>> APIs.

.API example
[%collapsible]
====
[source,console]
----------------------------------
POST _transform/_preview
{
  "source": {
    "index": "kibana_sample_data_ecommerce"
  },
  "dest": { <1>
    "index": "sample_ecommerce_orders_by_customer"
  },
  "pivot": {
    "group_by": { <2>
      "user": { "terms": { "field": "user" }},
      "customer_id": { "terms": { "field": "customer_id" }}
    },
    "aggregations": {
      "order_count": { "value_count": { "field": "order_id" }},
      "total_order_amt": { "sum": { "field": "taxful_total_price" }},
      "avg_amt_per_order": { "avg": { "field": "taxful_total_price" }},
      "avg_unique_products_per_order": { "avg": { "field": "total_unique_products" }},
      "total_unique_products": { "cardinality": { "field": "products.product_id" }}
    }
  }
}
----------------------------------
// TEST[skip:setup kibana sample data]

<1> The destination index for the {transform}. It is ignored by `_preview`.
<2> Two `group_by` fields are selected. This means the {transform} contains a
unique row per `user` and `customer_id` combination. Within this data set, both
these fields are unique. Including both in the {transform} gives more context
to the final results.

NOTE: In the example above, condensed JSON formatting is used for easier
readability of the pivot object.
The preview {transforms} API enables you to see the layout of the {transform}
in advance, populated with some sample values. For example:

[source,js]
----------------------------------
{
  "preview" : [
    {
      "total_order_amt" : 3946.9765625,
      "order_count" : 59.0,
      "total_unique_products" : 116.0,
      "avg_unique_products_per_order" : 2.0,
      "customer_id" : "10",
      "user" : "recip",
      "avg_amt_per_order" : 66.89790783898304
    },
    ...
  ]
}
----------------------------------
// NOTCONSOLE
====
This {transform} makes it easier to answer questions such as:

* Which customers spend the most?
* Which customers spend the most per order?
* Which customers order most often?
* Which customers ordered the least number of different products?

It's possible to answer these questions using aggregations alone, however
{transforms} allow us to persist this data as a customer-centric index. This
enables us to analyze data at scale and gives more flexibility to explore and
navigate data from a customer-centric perspective. In some cases, it can even
make creating visualizations much simpler.
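To make the pivot mechanics concrete, here is a minimal Python sketch of what
`group_by` plus the aggregations above do conceptually. It uses hypothetical
in-memory order documents, not the actual sample data set or the {es} API:

```python
from collections import defaultdict

# Hypothetical order documents; field names mirror the eCommerce sample data.
orders = [
    {"user": "recip", "customer_id": "10", "order_id": 1,
     "taxful_total_price": 50.0, "products": ["a", "b"]},
    {"user": "recip", "customer_id": "10", "order_id": 2,
     "taxful_total_price": 30.0, "products": ["a", "c"]},
    {"user": "fitz", "customer_id": "11", "order_id": 3,
     "taxful_total_price": 20.0, "products": ["b"]},
]

# group_by: one bucket per unique (user, customer_id) combination.
buckets = defaultdict(list)
for doc in orders:
    buckets[(doc["user"], doc["customer_id"])].append(doc)

# aggregations: condense each bucket into one row of the destination index.
preview = []
for (user, customer_id), docs in buckets.items():
    prices = [d["taxful_total_price"] for d in docs]
    preview.append({
        "user": user,
        "customer_id": customer_id,
        "order_count": len(docs),                        # value_count
        "total_order_amt": sum(prices),                  # sum
        "avg_amt_per_order": sum(prices) / len(prices),  # avg
        "total_unique_products":                         # cardinality
            len({p for d in docs for p in d["products"]}),
    })
```

The real {transform} performs the same kind of condensation, but server-side,
at scale, and persisted into the destination index.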
[[example-airline]]
== Finding air carriers with the most delays

This example uses the Flights sample data set to find out which air carrier
had the most delays. First, filter the source data such that it excludes all
the cancelled flights by using a query filter. Then transform the data to
contain the distinct number of flights, the sum of delayed minutes, and the sum
of the flight minutes by air carrier. Finally, use a
<<search-aggregations-pipeline-bucket-script-aggregation,`bucket_script`>>
to determine what percentage of the flight time was spent in delays.
[source,console]
----------------------------------
POST _transform/_preview
{
  "source": {
    "index": "kibana_sample_data_flights",
    "query": { <1>
      "bool": {
        "filter": [
          { "term": { "Cancelled": false } }
        ]
      }
    }
  },
  "dest": { <2>
    "index": "sample_flight_delays_by_carrier"
  },
  "pivot": {
    "group_by": { <3>
      "carrier": { "terms": { "field": "Carrier" }}
    },
    "aggregations": {
      "flights_count": { "value_count": { "field": "FlightNum" }},
      "delay_mins_total": { "sum": { "field": "FlightDelayMin" }},
      "flight_mins_total": { "sum": { "field": "FlightTimeMin" }},
      "delay_time_percentage": { <4>
        "bucket_script": {
          "buckets_path": {
            "delay_time": "delay_mins_total.value",
            "flight_time": "flight_mins_total.value"
          },
          "script": "(params.delay_time / params.flight_time) * 100"
        }
      }
    }
  }
}
----------------------------------
// TEST[skip:setup kibana sample data]

<1> Filter the source data to select only flights that are not cancelled.
<2> The destination index for the {transform}. It is ignored by `_preview`.
<3> The data is grouped by the `Carrier` field, which contains the airline name.
<4> This `bucket_script` performs calculations on the results that are returned
by the aggregation. In this particular example, it calculates what percentage of
travel time was taken up by delays.
The preview shows you that the new index would contain data like this for each
carrier:

[source,js]
----------------------------------
{
  "preview" : [
    {
      "carrier" : "ES-Air",
      "flights_count" : 2802.0,
      "flight_mins_total" : 1436927.5130677223,
      "delay_time_percentage" : 9.335543983955839,
      "delay_mins_total" : 134145.0
    },
    ...
  ]
}
----------------------------------
// NOTCONSOLE

This {transform} makes it easier to answer questions such as:

* Which air carrier has the most delays as a percentage of flight time?

NOTE: This data is fictional and does not reflect actual delays or flight stats
for any of the featured destination or origin airports.
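The `bucket_script` is plain arithmetic over the other aggregation results.
Plugging in the `ES-Air` values from the preview above reproduces its
`delay_time_percentage`:

```python
# Same formula as the transform's bucket_script:
# (params.delay_time / params.flight_time) * 100
# The inputs are the ES-Air sums from the preview response.
delay_mins_total = 134145.0
flight_mins_total = 1436927.5130677223

delay_time_percentage = (delay_mins_total / flight_mins_total) * 100
# ≈ 9.3355, i.e. roughly 9% of ES-Air flight time was spent in delays
```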
[[example-clientips]]
== Finding suspicious client IPs

This example uses the web log sample data set to identify suspicious client
IPs. It transforms the data such that the new index contains the sum of bytes
and the number of distinct URLs, agents, incoming requests by location, and
geographic destinations for each client IP. It also uses filter aggregations to
count the specific types of HTTP responses that each client IP receives.
Ultimately, the example below transforms web log data into an entity-centric
index where the entity is `clientip`.
[source,console]
----------------------------------
PUT _transform/suspicious_client_ips
{
  "source": {
    "index": "kibana_sample_data_logs"
  },
  "dest": { <1>
    "index": "sample_weblogs_by_clientip"
  },
  "sync": { <2>
    "time": {
      "field": "timestamp",
      "delay": "60s"
    }
  },
  "pivot": {
    "group_by": { <3>
      "clientip": { "terms": { "field": "clientip" } }
    },
    "aggregations": {
      "url_dc": { "cardinality": { "field": "url.keyword" }},
      "bytes_sum": { "sum": { "field": "bytes" }},
      "geo.src_dc": { "cardinality": { "field": "geo.src" }},
      "agent_dc": { "cardinality": { "field": "agent.keyword" }},
      "geo.dest_dc": { "cardinality": { "field": "geo.dest" }},
      "responses.total": { "value_count": { "field": "timestamp" }},
      "success": { <4>
        "filter": {
          "term": { "response": "200" }
        }
      },
      "error404": {
        "filter": {
          "term": { "response": "404" }
        }
      },
      "error5xx": {
        "filter": {
          "range": { "response": { "gte": 500, "lt": 600 } }
        }
      },
      "timestamp.min": { "min": { "field": "timestamp" }},
      "timestamp.max": { "max": { "field": "timestamp" }},
      "timestamp.duration_ms": { <5>
        "bucket_script": {
          "buckets_path": {
            "min_time": "timestamp.min.value",
            "max_time": "timestamp.max.value"
          },
          "script": "(params.max_time - params.min_time)"
        }
      }
    }
  }
}
----------------------------------
// TEST[skip:setup kibana sample data]

<1> The destination index for the {transform}.
<2> Configures the {transform} to run continuously. It uses the `timestamp`
field to synchronize the source and destination indices. The worst case
ingestion delay is 60 seconds.
<3> The data is grouped by the `clientip` field.
<4> Filter aggregation that counts the occurrences of successful (`200`)
responses in the `response` field. The following two aggregations (`error404`
and `error5xx`) count the error responses by error codes, matching an exact
value or a range of response codes.
<5> This `bucket_script` calculates the duration of the `clientip` access based
on the results of the aggregation.
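The filter aggregations and the duration `bucket_script` can be sketched in
plain Python. The documents are hypothetical; the timestamps mirror the search
output shown further below:

```python
from datetime import datetime

# Hypothetical web log documents for one client IP.
docs = [
    {"response": 200, "timestamp": "2020-03-16T07:51:57.333Z"},
    {"response": 200, "timestamp": "2020-03-18T10:00:00.000Z"},
    {"response": 503, "timestamp": "2020-03-22T08:50:34.313Z"},
]

def parse(ts):
    # Python's fromisoformat does not accept a trailing "Z" before 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

# Filter aggregations: count documents matching a term or a range.
success = sum(1 for d in docs if d["response"] == 200)
error404 = sum(1 for d in docs if d["response"] == 404)
error5xx = sum(1 for d in docs if 500 <= d["response"] < 600)

# bucket_script equivalent: access duration from the min/max timestamps,
# in milliseconds (Elasticsearch date values are epoch millis).
times = [parse(d["timestamp"]) for d in docs]
duration_ms = (max(times) - min(times)).total_seconds() * 1000
```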
After you create the {transform}, you must start it:

[source,console]
----------------------------------
POST _transform/suspicious_client_ips/_start
----------------------------------
// TEST[skip:setup kibana sample data]

Shortly thereafter, the first results should be available in the destination
index:

[source,console]
----------------------------------
GET sample_weblogs_by_clientip/_search
----------------------------------
// TEST[skip:setup kibana sample data]
The search result shows you data like this for each client IP:

[source,js]
----------------------------------
"hits" : [
  {
    "_index" : "sample_weblogs_by_clientip",
    "_id" : "MOeHH_cUL5urmartKj-b5UQAAAAAAAAA",
    "_score" : 1.0,
    "_source" : {
      "geo" : {
        "src_dc" : 2.0,
        "dest_dc" : 2.0
      },
      "success" : 2,
      "error404" : 0,
      "error5xx" : 0,
      "clientip" : "0.72.176.46",
      "agent_dc" : 2.0,
      "bytes_sum" : 4422.0,
      "responses" : {
        "total" : 2.0
      },
      "url_dc" : 2.0,
      "timestamp" : {
        "duration_ms" : 5.2191698E8,
        "min" : "2020-03-16T07:51:57.333Z",
        "max" : "2020-03-22T08:50:34.313Z"
      }
    }
  }
]
----------------------------------
// NOTCONSOLE
NOTE: Like other {kib} sample data sets, the web log sample data set contains
timestamps relative to when you installed it, including timestamps in the
future. The {ctransform} will pick up the data points once they are in the
past. If you installed the web log sample data set some time ago, you can
uninstall and reinstall it and the timestamps will change.

This {transform} makes it easier to answer questions such as:

* Which client IPs are transferring the most data?
* Which client IPs are interacting with a high number of different URLs?
* Which client IPs have high error rates?
* Which client IPs are interacting with a high number of destination countries?
[[example-last-log]]
== Finding the last log event for each IP address

This example uses the web log sample data set to find the last log from an IP
address. Let's use the `latest` type of {transform} in continuous mode. It
copies the most recent document for each unique key from the source index to
the destination index and updates the destination index as new data comes into
the source index.

Pick the `clientip` field as the unique key; the data is grouped by this field.
Select `timestamp` as the date field that sorts the data chronologically. For
continuous mode, specify a date field that is used to identify new documents,
and an interval between checks for changes in the source index.
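The semantics of a `latest` {transform} can be sketched in a few lines of
Python with hypothetical documents: for each unique key, keep only the
document with the greatest sort-field value:

```python
# Hypothetical log documents; two share the same client IP.
docs = [
    {"clientip": "0.72.176.46", "timestamp": "2021-05-04T06:31:00Z", "response": 200},
    {"clientip": "0.72.176.46", "timestamp": "2021-05-01T12:00:00Z", "response": 404},
    {"clientip": "223.87.60.27", "timestamp": "2021-05-03T09:15:00Z", "response": 200},
]

latest = {}
for doc in docs:
    key = doc["clientip"]  # unique_key
    # ISO-8601 timestamps in the same format compare correctly as strings.
    if key not in latest or doc["timestamp"] > latest[key]["timestamp"]:
        latest[key] = doc  # keep only the newest document per key
```

In continuous mode the {transform} repeats this whenever new documents arrive,
so the destination index always holds the most recent document per client IP.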
[role="screenshot"]
image::images/transform-ex4-1.jpg["Finding the last log event for each IP address with {transforms} in {kib}"]

Let's assume that we're interested in retaining documents only for IP addresses
that appeared recently in the log. You can define a retention policy and
specify a date field that is used to calculate the age of a document. This
example uses the same date field that is used to sort the data. Then set the
maximum age of a document; documents that are older than the value you set will
be removed from the destination index.

[role="screenshot"]
image::images/transform-ex4-2.jpg["Defining retention policy for {transforms} in {kib}"]

This {transform} creates a destination index that contains the latest log
event for each client IP. As the {transform} runs in continuous mode, the
destination index is updated as new data comes into the source index. Finally,
every document that is older than 30 days is removed from the destination
index due to the applied retention policy.
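A retention policy amounts to dropping every document whose date field is
older than `max_age` relative to the current time. A Python sketch with
hypothetical timestamps:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical "now" and destination-index documents.
now = datetime(2021, 5, 10, tzinfo=timezone.utc)
max_age = timedelta(days=30)  # the retention policy's max_age

index = [
    {"clientip": "0.72.176.46", "timestamp": datetime(2021, 5, 4, tzinfo=timezone.utc)},
    {"clientip": "99.74.118.237", "timestamp": datetime(2021, 3, 1, tzinfo=timezone.utc)},
]

# Keep only documents younger than max_age; older ones are removed.
retained = [doc for doc in index if now - doc["timestamp"] <= max_age]
```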
.API example
[%collapsible]
====
[source,console]
----------------------------------
PUT _transform/last-log-from-clientip
{
  "source": {
    "index": [
      "kibana_sample_data_logs"
    ]
  },
  "latest": {
    "unique_key": [ <1>
      "clientip"
    ],
    "sort": "timestamp" <2>
  },
  "frequency": "1m", <3>
  "dest": {
    "index": "last-log-from-clientip"
  },
  "sync": { <4>
    "time": {
      "field": "timestamp",
      "delay": "60s"
    }
  },
  "retention_policy": { <5>
    "time": {
      "field": "timestamp",
      "max_age": "30d"
    }
  },
  "settings": {
    "max_page_search_size": 500
  }
}
----------------------------------
// TEST[skip:setup kibana sample data]

<1> Specifies the field for grouping the data.
<2> Specifies the date field that is used for sorting the data.
<3> Sets the interval for the {transform} to check for changes in the source
index.
<4> Contains the time field and delay settings used to synchronize the source
and destination indices.
<5> Specifies the retention policy for the {transform}. Documents that are
older than the configured value are removed from the destination index.

After you create the {transform}, start it:

[source,console]
----------------------------------
POST _transform/last-log-from-clientip/_start
----------------------------------
// TEST[skip:setup kibana sample data]
====
After the {transform} processes the data, search the destination index:

[source,console]
----------------------------------
GET last-log-from-clientip/_search
----------------------------------
// TEST[skip:setup kibana sample data]

The search result shows you data like this for each client IP:

[source,js]
----------------------------------
{
  "_index" : "last-log-from-clientip",
  "_id" : "MOeHH_cUL5urmartKj-b5UQAAAAAAAAA",
  "_score" : 1.0,
  "_source" : {
    "referer" : "http://twitter.com/error/don-lind",
    "request" : "/elasticsearch",
    "agent" : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
    "extension" : "",
    "memory" : null,
    "ip" : "0.72.176.46",
    "index" : "kibana_sample_data_logs",
    "message" : "0.72.176.46 - - [2018-09-18T06:31:00.572Z] \"GET /elasticsearch HTTP/1.1\" 200 7065 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)\"",
    "url" : "https://www.elastic.co/downloads/elasticsearch",
    "tags" : [
      "success",
      "info"
    ],
    "geo" : {
      "srcdest" : "IN:PH",
      "src" : "IN",
      "coordinates" : {
        "lon" : -124.1127917,
        "lat" : 40.80338889
      },
      "dest" : "PH"
    },
    "utc_time" : "2021-05-04T06:31:00.572Z",
    "bytes" : 7065,
    "machine" : {
      "os" : "ios",
      "ram" : 12884901888
    },
    "response" : 200,
    "clientip" : "0.72.176.46",
    "host" : "www.elastic.co",
    "event" : {
      "dataset" : "sample_web_logs"
    },
    "phpmemory" : null,
    "timestamp" : "2021-05-04T06:31:00.572Z"
  }
}
----------------------------------
// NOTCONSOLE
This {transform} makes it easier to answer questions such as:

* What was the most recent log event associated with a specific IP address?

[[example-bytes]]
== Finding client IPs that sent the most bytes to the server

This example uses the web log sample data set to find the client IP that sent
the most bytes to the server in every hour. The example uses a `pivot`
{transform} with a <<search-aggregations-metrics-top-metrics,`top_metrics`>>
aggregation.

Group the data by a <<_date_histogram,date histogram>> on the time field with
an interval of one hour. Use a
<<search-aggregations-metrics-max-aggregation,max aggregation>> on the `bytes`
field to get the maximum amount of data that is sent to the server. Without the
`max` aggregation, the API call still returns the client IP that sent the most
bytes; however, the number of bytes that it sent is not returned. In the
`top_metrics` property, specify `clientip` and `geo.src`, then sort them by the
`bytes` field in descending order. The {transform} returns the client IP that
sent the most data and the two-letter ISO code of the corresponding location.
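Before looking at the API call, the combination of a one-hour `date_histogram`
and a `top_metrics` aggregation can be sketched in Python. The log documents
are hypothetical, and truncating an ISO timestamp to the hour stands in for
the `fixed_interval: "1h"` bucketing:

```python
from collections import defaultdict

# Hypothetical log documents.
docs = [
    {"timestamp": "2021-04-25T00:05:00Z", "bytes": 6219,
     "clientip": "223.87.60.27", "geo_src": "IN"},
    {"timestamp": "2021-04-25T00:30:00Z", "bytes": 1000,
     "clientip": "99.74.118.237", "geo_src": "LK"},
    {"timestamp": "2021-04-25T03:10:00Z", "bytes": 14113,
     "clientip": "99.74.118.237", "geo_src": "LK"},
]

# date_histogram with a 1h fixed_interval: bucket by the hour.
buckets = defaultdict(list)
for doc in docs:
    hour = doc["timestamp"][:13] + ":00:00Z"  # truncate to the hour
    buckets[hour].append(doc)

preview = []
for hour, hits in sorted(buckets.items()):
    top = max(hits, key=lambda d: d["bytes"])  # top_metrics, sort bytes desc
    preview.append({
        "timestamp": hour,
        "bytes.max": max(d["bytes"] for d in hits),  # max aggregation
        "top": {"clientip": top["clientip"], "geo.src": top["geo_src"]},
    })
```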
[source,console]
----------------------------------
POST _transform/_preview
{
  "source": {
    "index": "kibana_sample_data_logs"
  },
  "pivot": {
    "group_by": { <1>
      "timestamp": {
        "date_histogram": {
          "field": "timestamp",
          "fixed_interval": "1h"
        }
      }
    },
    "aggregations": {
      "bytes.max": { <2>
        "max": {
          "field": "bytes"
        }
      },
      "top": {
        "top_metrics": { <3>
          "metrics": [
            {
              "field": "clientip"
            },
            {
              "field": "geo.src"
            }
          ],
          "sort": {
            "bytes": "desc"
          }
        }
      }
    }
  }
}
----------------------------------
// TEST[skip:setup kibana sample data]

<1> The data is grouped by a date histogram of the time field with a one-hour
interval.
<2> Calculates the maximum value of the `bytes` field.
<3> Specifies the fields (`clientip` and `geo.src`) of the top document to
return and the sorting method (document with the highest `bytes` value).

The API call above returns a response similar to this:

[source,js]
----------------------------------
{
  "preview" : [
    {
      "top" : {
        "clientip" : "223.87.60.27",
        "geo.src" : "IN"
      },
      "bytes" : {
        "max" : 6219
      },
      "timestamp" : "2021-04-25T00:00:00.000Z"
    },
    {
      "top" : {
        "clientip" : "99.74.118.237",
        "geo.src" : "LK"
      },
      "bytes" : {
        "max" : 14113
      },
      "timestamp" : "2021-04-25T03:00:00.000Z"
    },
    {
      "top" : {
        "clientip" : "218.148.135.12",
        "geo.src" : "BR"
      },
      "bytes" : {
        "max" : 4531
      },
      "timestamp" : "2021-04-25T04:00:00.000Z"
    },
    ...
  ]
}
----------------------------------
// NOTCONSOLE
[[example-customer-names]]
== Getting customer name and email address by customer ID

This example uses the eCommerce sample data set to create an entity-centric
index based on customer ID, and to get the customer name and email address by
using the `top_metrics` aggregation.

Group the data by `customer_id`, then add a `top_metrics` aggregation where the
`metrics` are the `email`, the `customer_first_name.keyword`, and the
`customer_last_name.keyword` fields. Sort the `top_metrics` by `order_date` in
descending order. The API call looks like this:
[source,console]
----------------------------------
POST _transform/_preview
{
  "source": {
    "index": "kibana_sample_data_ecommerce"
  },
  "pivot": {
    "group_by": { <1>
      "customer_id": {
        "terms": {
          "field": "customer_id"
        }
      }
    },
    "aggregations": {
      "last": {
        "top_metrics": { <2>
          "metrics": [
            {
              "field": "email"
            },
            {
              "field": "customer_first_name.keyword"
            },
            {
              "field": "customer_last_name.keyword"
            }
          ],
          "sort": {
            "order_date": "desc"
          }
        }
      }
    }
  }
}
----------------------------------
// TEST[skip:setup kibana sample data]

<1> The data is grouped by a `terms` aggregation on the `customer_id` field.
<2> Specifies the fields to return (email and name fields), sorted in
descending order by the order date.
The API returns a response that is similar to this:

[source,js]
----------------------------------
{
  "preview" : [
    {
      "last" : {
        "customer_last_name.keyword" : "Long",
        "customer_first_name.keyword" : "Recip",
        "email" : "recip@long-family.zzz"
      },
      "customer_id" : "10"
    },
    {
      "last" : {
        "customer_last_name.keyword" : "Jackson",
        "customer_first_name.keyword" : "Fitzgerald",
        "email" : "fitzgerald@jackson-family.zzz"
      },
      "customer_id" : "11"
    },
    {
      "last" : {
        "customer_last_name.keyword" : "Cross",
        "customer_first_name.keyword" : "Brigitte",
        "email" : "brigitte@cross-family.zzz"
      },
      "customer_id" : "12"
    },
    ...
  ]
}
----------------------------------
// NOTCONSOLE