[role="xpack"]
[testenv="basic"]
[[transform-painless-examples]]
=== Painless examples for {transforms}
++++
<titleabbrev>Painless examples for {transforms}</titleabbrev>
++++

These examples demonstrate how to use Painless in {transforms}. You can learn
more about the Painless scripting language in the
{painless}/painless-guide.html[Painless guide].

* <<painless-top-hits>>
* <<painless-time-features>>
* <<painless-group-by>>
* <<painless-bucket-script>>
* <<painless-count-http>>
* <<painless-compare>>
* <<painless-web-session>>

NOTE: While the context of the following examples is the {transform} use case,
the Painless scripts in the snippets below can be used in other {es} search
aggregations, too.

[discrete]
[[painless-top-hits]]
==== Getting top hits by using scripted metric aggregation

This snippet shows how to find the latest document, in other words the document
with the most recent timestamp. From a technical perspective, it reproduces the
function of a <<search-aggregations-metrics-top-hits-aggregation>> by using a
scripted metric aggregation in a {transform}, which provides a metric output.

[source,js]
--------------------------------------------------
"aggregations": {
  "latest_doc": {
    "scripted_metric": {
      "init_script": "state.timestamp_latest = 0L; state.last_doc = ''", <1>
      "map_script": """ <2>
        def current_date = doc['@timestamp'].getValue().toInstant().toEpochMilli();
        if (current_date > state.timestamp_latest)
        {state.timestamp_latest = current_date;
        state.last_doc = new HashMap(params['_source']);}
      """,
      "combine_script": "return state", <3>
      "reduce_script": """ <4>
        def last_doc = '';
        def timestamp_latest = 0L;
        for (s in states) {if (s.timestamp_latest > (timestamp_latest))
        {timestamp_latest = s.timestamp_latest; last_doc = s.last_doc;}}
        return last_doc
      """
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

<1> The `init_script` creates a long type `timestamp_latest` and a string type
`last_doc` in the `state` object.
<2> The `map_script` defines `current_date` based on the timestamp of the
document, then compares `current_date` with `state.timestamp_latest`. If the
document is newer, the script updates `state.timestamp_latest` and stores a copy
of the source document in `state.last_doc`. By using `new HashMap(...)` you copy
the source document; this is important whenever you want to pass the full source
object from one phase to the next.
<3> The `combine_script` returns `state` from each shard.
<4> The `reduce_script` iterates through the states returned by each shard,
compares their `timestamp_latest` values, and returns the document with the
latest timestamp (`last_doc`). In the response, the top hit (in other words, the
`latest_doc`) is nested below the `latest_doc` field.

Check the
<<scripted-metric-aggregation-scope,scope of scripts>>
for a detailed explanation of the respective scripts.

You can retrieve the last value in a similar way:

[source,js]
--------------------------------------------------
"aggregations": {
  "latest_value": {
    "scripted_metric": {
      "init_script": "state.timestamp_latest = 0L; state.last_value = ''",
      "map_script": """
        def current_date = doc['date'].getValue().toInstant().toEpochMilli();
        if (current_date > state.timestamp_latest)
        {state.timestamp_latest = current_date;
        state.last_value = params['_source']['value'];}
      """,
      "combine_script": "return state",
      "reduce_script": """
        def last_value = '';
        def timestamp_latest = 0L;
        for (s in states) {if (s.timestamp_latest > (timestamp_latest))
        {timestamp_latest = s.timestamp_latest; last_value = s.last_value;}}
        return last_value
      """
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
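
To trace how the four scripts of the `latest_doc` aggregation cooperate, the
following is a minimal sketch in plain Python, not Painless. It assumes two
hypothetical shards that hold made-up documents, and the helper names
(`init_state`, `map_doc`, `reduce_states`) are illustrative only. In {es}, the
real scripts run once per shard and the reduce phase runs on the coordinating
node.

```python
# Plain-Python stand-in for the scripted metric phases; this is not Painless.

def init_state():
    # Mirrors init_script: start with timestamp 0 and an empty document.
    return {"timestamp_latest": 0, "last_doc": ""}

def map_doc(state, source):
    # Mirrors map_script: keep a copy of the newest document seen so far.
    current_date = source["@timestamp"]
    if current_date > state["timestamp_latest"]:
        state["timestamp_latest"] = current_date
        state["last_doc"] = dict(source)  # copy, like new HashMap(...)

def reduce_states(states):
    # Mirrors reduce_script: pick the newest document across all shards.
    last_doc = ""
    timestamp_latest = 0
    for s in states:
        if s["timestamp_latest"] > timestamp_latest:
            timestamp_latest = s["timestamp_latest"]
            last_doc = s["last_doc"]
    return last_doc

# Hypothetical documents spread over two shards (epoch milliseconds).
shards = [
    [{"@timestamp": 1000, "value": "a"}, {"@timestamp": 3000, "value": "c"}],
    [{"@timestamp": 2000, "value": "b"}],
]

states = []
for shard_docs in shards:
    state = init_state()
    for source in shard_docs:
        map_doc(state, source)
    states.append(state)  # combine_script simply returns the shard state

latest_doc = reduce_states(states)
print(latest_doc)
```

Each shard only knows its own newest document, so the reduce step is what
guarantees that the overall latest document is returned.
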
[discrete]
[[painless-time-features]]
==== Getting time features by using aggregations

This snippet shows how to extract time-based features by using Painless in a
{transform}. The snippet uses an index where `@timestamp` is defined as a `date`
type field.

[source,js]
--------------------------------------------------
"aggregations": {
  "avg_hour_of_day": { <1>
    "avg":{
      "script": { <2>
        "source": """
          ZonedDateTime date = doc['@timestamp'].value; <3>
          return date.getHour(); <4>
        """
      }
    }
  },
  "avg_month_of_year": { <5>
    "avg":{
      "script": { <6>
        "source": """
          ZonedDateTime date = doc['@timestamp'].value; <7>
          return date.getMonthValue(); <8>
        """
      }
    }
  },
  ...
}
--------------------------------------------------
// NOTCONSOLE

<1> Name of the aggregation.
<2> Contains the Painless script that returns the hour of the day.
<3> Sets `date` based on the timestamp of the document.
<4> Returns the hour value from `date`.
<5> Name of the aggregation.
<6> Contains the Painless script that returns the month of the year.
<7> Sets `date` based on the timestamp of the document.
<8> Returns the month value from `date`.
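
As a rough illustration of what the two `avg` aggregations compute, here is a
plain-Python sketch, not Painless. The three timestamps are hypothetical epoch
millisecond values, and `to_utc` is an illustrative helper name. The sketch
extracts the hour and month of each timestamp in UTC and then averages them,
which is what the aggregations do with the script results.

```python
from datetime import datetime, timezone

# Hypothetical @timestamp values in epoch milliseconds (UTC).
timestamps_ms = [1484053499256, 1484083372982, 1484067459937]

def to_utc(epoch_ms):
    # Integer division avoids floating-point rounding; milliseconds are dropped.
    return datetime.fromtimestamp(epoch_ms // 1000, tz=timezone.utc)

hours = [to_utc(ts).hour for ts in timestamps_ms]    # like date.getHour()
months = [to_utc(ts).month for ts in timestamps_ms]  # like date.getMonthValue()

# The avg aggregation averages the value the script returns for each document.
avg_hour_of_day = sum(hours) / len(hours)
avg_month_of_year = sum(months) / len(months)
print(hours, months)
```
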
[discrete]
[[painless-group-by]]
==== Using Painless in `group_by`

It is possible to base the `group_by` property of a {transform} on the output of
a script. The following example uses the {kib} sample web logs dataset. The goal
here is to make the {transform} output easier to understand by normalizing the
values of the field that the data is grouped by.

[source,console]
--------------------------------------------------
POST _transform/_preview
{
  "source": {
    "index": [ <1>
      "kibana_sample_data_logs"
    ]
  },
  "pivot": {
    "group_by": {
      "agent": {
        "terms": {
          "script": { <2>
            "source": """String agent = doc['agent.keyword'].value;
            if (agent.contains("MSIE")) {
              return "internet explorer";
            } else if (agent.contains("AppleWebKit")) {
              return "safari";
            } else if (agent.contains('Firefox')) {
              return "firefox";
            } else { return agent }""",
            "lang": "painless"
          }
        }
      }
    },
    "aggregations": { <3>
      "200": {
        "filter": {
          "term": {
            "response": "200"
          }
        }
      },
      "404": {
        "filter": {
          "term": {
            "response": "404"
          }
        }
      },
      "503": {
        "filter": {
          "term": {
            "response": "503"
          }
        }
      }
    }
  },
  "dest": { <4>
    "index": "pivot_logs"
  }
}
--------------------------------------------------
// TEST[skip:setup kibana sample data]

<1> Specifies the source index or indices.
<2> The script defines an `agent` string based on the `agent` field of the
document, then checks its value. If the value contains "MSIE", the script
returns "internet explorer". If it contains "AppleWebKit", it returns "safari".
It returns "firefox" if the value contains "Firefox". In every other case, the
script returns the value of the field unchanged.
<3> The aggregations object contains filters that narrow down the results to
documents that contain `200`, `404`, or `503` values in the `response` field.
<4> Specifies the destination index of the {transform}.

The API returns the following result:

[source,js]
--------------------------------------------------
{
  "preview" : [
    {
      "agent" : "firefox",
      "200" : 4931,
      "404" : 259,
      "503" : 172
    },
    {
      "agent" : "internet explorer",
      "200" : 3674,
      "404" : 210,
      "503" : 126
    },
    {
      "agent" : "safari",
      "200" : 4227,
      "404" : 332,
      "503" : 143
    }
  ],
  "mappings" : {
    "properties" : {
      "200" : {
        "type" : "long"
      },
      "agent" : {
        "type" : "keyword"
      },
      "404" : {
        "type" : "long"
      },
      "503" : {
        "type" : "long"
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

You can see that the `agent` values are simplified so it is easier to interpret
them. The table below shows how normalization modifies the output of the
{transform} in our example compared to the non-normalized values.

[width="50%"]
|===
| Non-normalized `agent` value | Normalized `agent` value

| "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" | "internet explorer"
| "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24" | "safari"
| "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1" | "firefox"
|===
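
The normalization logic can be checked outside {es} with a plain-Python sketch,
not Painless. The function name `normalize_agent` and the `curl/7.64.1` value
are illustrative assumptions; the three user agent strings are the ones from the
table above. Note that the order of the checks matters: the Chrome user agent
also contains `AppleWebKit`, so it is reported as "safari".

```python
def normalize_agent(agent):
    # Same checks, in the same order, as the Painless script in group_by.
    if "MSIE" in agent:
        return "internet explorer"
    elif "AppleWebKit" in agent:
        return "safari"
    elif "Firefox" in agent:
        return "firefox"
    # Any other value is returned unchanged.
    return agent

msie = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
webkit = ("Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 "
          "(KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24")
firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1"

for agent in (msie, webkit, firefox, "curl/7.64.1"):
    print(normalize_agent(agent))
```
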
[discrete]
[[painless-bucket-script]]
==== Getting duration by using bucket script

This example shows you how to get the duration of a session by client IP from a
data log by using
{ref}/search-aggregations-pipeline-bucket-script-aggregation.html[bucket script].
The example uses the {kib} sample web logs dataset.

[source,console]
--------------------------------------------------
PUT _transform/data_log
{
  "source": {
    "index": "kibana_sample_data_logs"
  },
  "dest": {
    "index": "data-logs-by-client"
  },
  "pivot": {
    "group_by": {
      "machine.os": {"terms": {"field": "machine.os.keyword"}},
      "machine.ip": {"terms": {"field": "clientip"}}
    },
    "aggregations": {
      "time_frame.lte": {
        "max": {
          "field": "timestamp"
        }
      },
      "time_frame.gte": {
        "min": {
          "field": "timestamp"
        }
      },
      "time_length": { <1>
        "bucket_script": {
          "buckets_path": { <2>
            "min": "time_frame.gte.value",
            "max": "time_frame.lte.value"
          },
          "script": "params.max - params.min" <3>
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:setup kibana sample data]

<1> To define the length of the sessions, we use a bucket script.
<2> The bucket path is a map of script variables and their associated path to
the buckets you want to use for the variable. In this particular case, `min` and
`max` are variables mapped to `time_frame.gte.value` and `time_frame.lte.value`.
<3> Finally, the script subtracts the start date of the session from the end
date, which results in the duration of the session.
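
The way `buckets_path` feeds the script can be pictured with a plain-Python
sketch, not Painless. The bucket values are hypothetical epoch millisecond
timestamps, and the `resolve` helper is a simplification introduced for
illustration; it is not how {es} parses bucket paths internally.

```python
# One hypothetical bucket holding the results of the min and max aggregations
# as epoch milliseconds.
bucket = {
    "time_frame.gte": {"value": 1577836800000},
    "time_frame.lte": {"value": 1577840400000},
}

# Same mapping as the buckets_path object in the transform.
buckets_path = {"min": "time_frame.gte.value", "max": "time_frame.lte.value"}

def resolve(bucket, path):
    # Simplification: treat the last dot-separated part as the metric name.
    agg_name, _, metric = path.rpartition(".")
    return bucket[agg_name][metric]

# The script receives the resolved values as params.min and params.max.
params = {name: resolve(bucket, path) for name, path in buckets_path.items()}
time_length = params["max"] - params["min"]
print(time_length)
```

Because the `min` and `max` aggregations run on a date field, the difference is
a duration in milliseconds.
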
[discrete]
[[painless-count-http]]
==== Counting HTTP responses by using scripted metric aggregation

You can count the different HTTP response types in a web log data set by using
scripted metric aggregation as part of the {transform}. The example below
assumes that the HTTP response codes are stored as keywords in the `response`
field of the documents.

[source,js]
--------------------------------------------------
"aggregations": { <1>
  "responses.counts": { <2>
    "scripted_metric": { <3>
      "init_script": "state.responses = ['error':0L,'success':0L,'other':0L]", <4>
      "map_script": """ <5>
        def code = doc['response.keyword'].value;
        if (code.startsWith('5') || code.startsWith('4')) {
          state.responses.error += 1 ;
        } else if(code.startsWith('2')) {
          state.responses.success += 1;
        } else {
          state.responses.other += 1;
        }
      """,
      "combine_script": "state.responses", <6>
      "reduce_script": """ <7>
        def counts = ['error': 0L, 'success': 0L, 'other': 0L];
        for (responses in states) {
          counts.error += responses['error'];
          counts.success += responses['success'];
          counts.other += responses['other'];
        }
        return counts;
      """
    }
  },
  ...
}
--------------------------------------------------
// NOTCONSOLE

<1> The `aggregations` object of the {transform} that contains all aggregations.
<2> Object of the `scripted_metric` aggregation.
<3> This `scripted_metric` performs a distributed operation on the web log data
to count specific types of HTTP responses (error, success, and other).
<4> The `init_script` creates a `responses` map in the `state` object with
three keys (`error`, `success`, `other`), each holding a long value.
<5> The `map_script` defines `code` based on the `response.keyword` value of the
document, then it counts the errors, successes, and other responses based on the
first digit of the response code.
<6> The `combine_script` returns `state.responses` from each shard.
<7> The `reduce_script` creates a `counts` map with the `error`, `success`,
and `other` keys, then iterates through the `responses` maps returned by each
shard and adds each count to the matching key of `counts`: error responses to
the error count, success responses to the success count, and other responses to
the other count. Finally, it returns the `counts` map with the response counts.
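
The counting logic can be traced with a plain-Python sketch, not Painless. The
two shards and their response codes are hypothetical, and the helper names
(`new_counts`, `map_doc`, `reduce_states`) are illustrative only.

```python
def new_counts():
    # Mirrors init_script: one counter per response category.
    return {"error": 0, "success": 0, "other": 0}

def map_doc(responses, code):
    # Mirrors map_script: classify by the first digit of the response code.
    if code.startswith("5") or code.startswith("4"):
        responses["error"] += 1
    elif code.startswith("2"):
        responses["success"] += 1
    else:
        responses["other"] += 1

def reduce_states(states):
    # Mirrors reduce_script: sum the per-shard counters.
    counts = new_counts()
    for responses in states:
        counts["error"] += responses["error"]
        counts["success"] += responses["success"]
        counts["other"] += responses["other"]
    return counts

# Hypothetical response codes stored on two shards.
shards = [["200", "404", "200"], ["503", "301"]]

states = []
for shard_codes in shards:
    responses = new_counts()
    for code in shard_codes:
        map_doc(responses, code)
    states.append(responses)  # combine_script returns state.responses

counts = reduce_states(states)
print(counts)
```
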
[discrete]
[[painless-compare]]
==== Comparing indices by using scripted metric aggregations

This example shows how to compare the content of two indices by a {transform}
that uses a scripted metric aggregation.

[source,console]
--------------------------------------------------
POST _transform/_preview
{
  "id" : "index_compare",
  "source" : { <1>
    "index" : [
      "index1",
      "index2"
    ],
    "query" : {
      "match_all" : { }
    }
  },
  "dest" : { <2>
    "index" : "compare"
  },
  "pivot" : {
    "group_by" : {
      "unique-id" : {
        "terms" : {
          "field" : "<unique-id-field>" <3>
        }
      }
    },
    "aggregations" : {
      "compare" : { <4>
        "scripted_metric" : {
          "init_script" : "",
          "map_script" : "state.doc = new HashMap(params['_source'])", <5>
          "combine_script" : "return state", <6>
          "reduce_script" : """ <7>
            if (states.size() != 2) {
              return "count_mismatch"
            }
            if (states.get(0).equals(states.get(1))) {
              return "match"
            } else {
              return "mismatch"
            }
            """
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:setup kibana sample data]

<1> The indices referenced in the `source` object are compared to each other.
<2> The `dest` index contains the results of the comparison.
<3> The `group_by` field needs to be a unique identifier for each document.
<4> Object of the `scripted_metric` aggregation.
<5> The `map_script` defines `doc` in the state object. By using
`new HashMap(...)` you copy the source document; this is important whenever you
want to pass the full source object from one phase to the next.
<6> The `combine_script` returns `state` from each shard.
<7> The `reduce_script` checks whether exactly two states were returned for the
unique identifier. If not, it reports back a `count_mismatch`. Otherwise, it
compares the two states. If they are equal, it returns a `match`, otherwise it
returns a `mismatch`.
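
The three possible outcomes of the `reduce_script` can be illustrated with a
plain-Python sketch, not Painless. The documents are hypothetical, and each
state stands for what one shard returned for a single unique identifier.

```python
def reduce_states(states):
    # Mirrors reduce_script: expect exactly two states, then compare them.
    if len(states) != 2:
        return "count_mismatch"
    if states[0] == states[1]:
        return "match"
    return "mismatch"

# Hypothetical states for one unique identifier.
same_a = {"doc": {"id": "1", "value": 42}}
same_b = {"doc": {"id": "1", "value": 42}}
different = {"doc": {"id": "1", "value": 7}}

print(reduce_states([same_a, same_b]))     # both states hold the same document
print(reduce_states([same_a, different]))  # the documents differ
print(reduce_states([same_a]))             # only one state was returned
```
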
[discrete]
[[painless-web-session]]
==== Getting web session details by using scripted metric aggregation

This example shows how to derive multiple features from a single transaction.
Let's take a look at an example source document from the data:

.Source document
[%collapsible%open]
=====
[source,js]
--------------------------------------------------
{
  "_index":"apache-sessions",
  "_type":"_doc",
  "_id":"KvzSeGoB4bgw0KGbE3wP",
  "_score":1.0,
  "_source":{
    "@timestamp":1484053499256,
    "apache":{
      "access":{
        "sessionid":"571604f2b2b0c7b346dc685eeb0e2306774a63c2",
        "url":"http://www.leroymerlin.fr/v3/search/search.do?keyword=Carrelage%20salle%20de%20bain",
        "path":"/v3/search/search.do",
        "query":"keyword=Carrelage%20salle%20de%20bain",
        "referrer":"http://www.leroymerlin.fr/v3/p/produits/carrelage-parquet-sol-souple/carrelage-sol-et-mur/decor-listel-et-accessoires-carrelage-mural-l1308217717?resultOffset=0&resultLimit=51&resultListShape=MOSAIC&priceStyle=SALEUNIT_PRICE",
        "user_agent":{
          "original":"Mobile Safari 10.0 Mac OS X (iPad) Apple Inc.",
          "os_name":"Mac OS X (iPad)"
        },
        "remote_ip":"0337b1fa-5ed4-af81-9ef4-0ec53be0f45d",
        "geoip":{
          "country_iso_code":"FR",
          "location":{
            "lat":48.86,
            "lon":2.35
          }
        },
        "response_code":200,
        "method":"GET"
      }
    }
  }
}
...
--------------------------------------------------
// NOTCONSOLE
=====

By using the `sessionid` as a group-by field, you are able to enumerate events
through the session and get more details of the session by using scripted metric
aggregation.

[source,js]
--------------------------------------------------
POST _transform/_preview
{
  "source": {
    "index": "apache-sessions"
  },
  "pivot": {
    "group_by": {
      "sessionid": { <1>
        "terms": {
          "field": "apache.access.sessionid"
        }
      }
    },
    "aggregations": { <2>
      "distinct_paths": {
        "cardinality": {
          "field": "apache.access.path"
        }
      },
      "num_pages_viewed": {
        "value_count": {
          "field": "apache.access.url"
        }
      },
      "session_details": {
        "scripted_metric": {
          "init_script": "state.docs = []", <3>
          "map_script": """ <4>
            Map span = [
              '@timestamp':doc['@timestamp'].value,
              'url':doc['apache.access.url'].value,
              'referrer':doc['apache.access.referrer'].value
            ];
            state.docs.add(span)
          """,
          "combine_script": "return state.docs;", <5>
          "reduce_script": """ <6>
            def all_docs = [];
            for (s in states) {
              for (span in s) {
                all_docs.add(span);
              }
            }
            all_docs.sort((HashMap o1, HashMap o2)->o1['@timestamp'].millis.compareTo(o2['@timestamp'].millis));
            def size = all_docs.size();
            def min_time = all_docs[0]['@timestamp'];
            def max_time = all_docs[size-1]['@timestamp'];
            def duration = max_time.millis - min_time.millis;
            def entry_page = all_docs[0]['url'];
            def exit_path = all_docs[size-1]['url'];
            def first_referrer = all_docs[0]['referrer'];
            def ret = new HashMap();
            ret['first_time'] = min_time;
            ret['last_time'] = max_time;
            ret['duration'] = duration;
            ret['entry_page'] = entry_page;
            ret['exit_path'] = exit_path;
            ret['first_referrer'] = first_referrer;
            return ret;
          """
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

<1> The data is grouped by `sessionid`.
<2> The aggregations count the distinct paths and the pages viewed during the
session.
<3> The `init_script` creates an empty `docs` list in the `state` object.
<4> The `map_script` defines a `span` map with a timestamp, a URL, and a
referrer value, which are based on the corresponding values of the document,
then adds `span` to the `state.docs` list.
<5> The `combine_script` returns `state.docs` from each shard.
<6> The `reduce_script` collects the spans from all shards and sorts them by
timestamp, then defines various objects like `min_time`, `max_time`, and
`duration` based on the first and last entries. Next, the script declares a
`ret` object by using `new HashMap()`, sets `first_time`, `last_time`,
`duration`, and other fields inside `ret` based on the corresponding objects
defined earlier, and finally returns `ret`.

The API call results in a similar response:

[source,js]
--------------------------------------------------
{
  "num_pages_viewed" : 2.0,
  "session_details" : {
    "duration" : 131374,
    "first_referrer" : "https://www.bing.com/",
    "entry_page" : "http://www.leroymerlin.fr/v3/p/produits/materiaux-menuiserie/porte-coulissante-porte-interieure-escalier-et-rambarde/barriere-de-securite-l1308218463",
    "first_time" : "2017-01-10T21:22:52.982Z",
    "last_time" : "2017-01-10T21:25:04.356Z",
    "exit_path" : "http://www.leroymerlin.fr/v3/p/produits/materiaux-menuiserie/porte-coulissante-porte-interieure-escalier-et-rambarde/barriere-de-securite-l1308218463?__result-wrapper?pageTemplate=Famille%2FMat%C3%A9riaux+et+menuiserie&resultOffset=0&resultLimit=50&resultListShape=PLAIN&nomenclatureId=17942&priceStyle=SALEUNIT_PRICE&fcr=1&*4294718806=4294718806&*14072=14072&*4294718593=4294718593&*17942=17942"
  },
  "distinct_paths" : 1.0,
  "sessionid" : "000046f8154a80fd89849369c984b8cc9d795814"
},
{
  "num_pages_viewed" : 10.0,
  "session_details" : {
    "duration" : 343112,
    "first_referrer" : "https://www.google.fr/",
    "entry_page" : "http://www.leroymerlin.fr/",
    "first_time" : "2017-01-10T16:57:39.937Z",
    "last_time" : "2017-01-10T17:03:23.049Z",
    "exit_path" : "http://www.leroymerlin.fr/v3/p/produits/porte-de-douche-coulissante-adena-e168578"
  },
  "distinct_paths" : 8.0,
  "sessionid" : "000087e825da1d87a332b8f15fa76116c7467da6"
}
...
--------------------------------------------------
// NOTCONSOLE
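
The reduce phase of `session_details` can be traced with a plain-Python sketch,
not Painless. The two spans and their `example.com` URLs are hypothetical, and
timestamps are plain epoch millisecond integers here instead of date objects.
The two timestamp values correspond to the `first_time` and `last_time` of the
first session in the response above, so the computed duration matches its
`duration` value.

```python
def reduce_states(states):
    # Flatten the per-shard span lists, as the nested loops in reduce_script do.
    all_docs = [span for shard_docs in states for span in shard_docs]
    # Sort by timestamp so the first and last entries bound the session.
    all_docs.sort(key=lambda span: span["@timestamp"])
    first, last = all_docs[0], all_docs[-1]
    return {
        "first_time": first["@timestamp"],
        "last_time": last["@timestamp"],
        "duration": last["@timestamp"] - first["@timestamp"],
        "entry_page": first["url"],
        "exit_path": last["url"],
        "first_referrer": first["referrer"],
    }

# Hypothetical spans returned by two shards, deliberately out of order.
states = [
    [{"@timestamp": 1484083504356,
      "url": "http://example.com/b",
      "referrer": "http://example.com/a"}],
    [{"@timestamp": 1484083372982,
      "url": "http://example.com/a",
      "referrer": "https://www.bing.com/"}],
]

session = reduce_states(states)
print(session["duration"])
```

Sorting in the reduce phase is necessary because the spans arrive grouped by
shard, not in chronological order.
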