[role="xpack"]
[testenv="basic"]
[[transform-painless-examples]]
=== Painless examples for {transforms}
++++
<titleabbrev>Painless examples for {transforms}</titleabbrev>
++++

These examples demonstrate how to use Painless in {transforms}. You can learn
more about the Painless scripting language in the
{painless}/painless-guide.html[Painless guide].

* <<painless-top-hits>>
* <<painless-time-features>>
* <<painless-group-by>>
* <<painless-bucket-script>>
* <<painless-count-http>>

NOTE: While the context of the following examples is the {transform} use case,
the Painless scripts in the snippets below can be used in other {es} search
aggregations, too.

[discrete]
[[painless-top-hits]]
==== Getting top hits by using scripted metric aggregation

This snippet shows how to find the latest document, in other words the document
with the most recent timestamp. From a technical perspective, it achieves the
function of a <<search-aggregations-metrics-top-hits-aggregation>> by using a
scripted metric aggregation in a {transform}, which provides a metric output.

[source,js]
--------------------------------------------------
"aggregations": {
  "latest_doc": {
    "scripted_metric": {
      "init_script": "state.timestamp_latest = 0L; state.last_doc = ''", <1>
      "map_script": """ <2>
        def current_date = doc['@timestamp'].getValue().toInstant().toEpochMilli();
        if (current_date > state.timestamp_latest) {
          state.timestamp_latest = current_date;
          state.last_doc = new HashMap(params['_source']);
        }
      """,
      "combine_script": "return state", <3>
      "reduce_script": """ <4>
        def last_doc = '';
        def timestamp_latest = 0L;
        for (s in states) {
          if (s.timestamp_latest > timestamp_latest) {
            timestamp_latest = s.timestamp_latest;
            last_doc = s.last_doc;
          }
        }
        return last_doc
      """
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
<1> The `init_script` creates a long type `timestamp_latest` and a string type
`last_doc` in the `state` object.
<2> The `map_script` defines `current_date` based on the timestamp of the
document, then compares `current_date` with `state.timestamp_latest`. If the
document is newer, the script updates `state.timestamp_latest` and stores a copy
of the document source in `state.last_doc`. Copying the source with
`new HashMap(...)` is important whenever you want to pass the full source object
from one phase to the next.
<3> The `combine_script` returns `state` from each shard.
<4> The `reduce_script` iterates through the value of `s.timestamp_latest`
returned by each shard and returns the document with the latest timestamp
(`last_doc`). In the response, the top hit (in other words, the `latest_doc`) is
nested below the `latest_doc` field.

Check the
<<scripted-metric-aggregation-scope,scope of scripts>>
for a detailed explanation of the respective scripts.
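
The flow through the four scripts can be mimicked outside {es}. The following is
a minimal Python sketch, not Painless and not an {es} API: the function names,
the shard contents, and the `value` field are hypothetical, and timestamps are
assumed to be epoch milliseconds already.

```python
def init_state():
    # Mirrors init_script: one fresh state object per shard.
    return {"timestamp_latest": 0, "last_doc": ""}


def map_doc(state, doc):
    # Mirrors map_script: remember a copy of the newest document seen so far.
    current_date = doc["@timestamp"]  # assumption: already epoch milliseconds
    if current_date > state["timestamp_latest"]:
        state["timestamp_latest"] = current_date
        state["last_doc"] = dict(doc)  # copy, like new HashMap(...)


def reduce_states(states):
    # Mirrors reduce_script: pick the newest document across all shard states.
    last_doc = ""
    timestamp_latest = 0
    for s in states:
        if s["timestamp_latest"] > timestamp_latest:
            timestamp_latest = s["timestamp_latest"]
            last_doc = s["last_doc"]
    return last_doc


def latest_doc(shards):
    states = []
    for docs in shards:
        state = init_state()
        for doc in docs:
            map_doc(state, doc)
        states.append(state)  # combine_script simply returns the state
    return reduce_states(states)


# Hypothetical data: two shards holding three documents in total.
shards = [
    [{"@timestamp": 1000, "value": "a"}, {"@timestamp": 3000, "value": "c"}],
    [{"@timestamp": 2000, "value": "b"}],
]
result = latest_doc(shards)
```

In this sketch, `result` is the document with `@timestamp` 3000, because its
shard state carries the highest `timestamp_latest` into the reduce step.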

You can retrieve the last value in a similar way:

[source,js]
--------------------------------------------------
"aggregations": {
  "latest_value": {
    "scripted_metric": {
      "init_script": "state.timestamp_latest = 0L; state.last_value = ''",
      "map_script": """
        def current_date = doc['date'].getValue().toInstant().toEpochMilli();
        if (current_date > state.timestamp_latest) {
          state.timestamp_latest = current_date;
          state.last_value = params['_source']['value'];
        }
      """,
      "combine_script": "return state",
      "reduce_script": """
        def last_value = '';
        def timestamp_latest = 0L;
        for (s in states) {
          if (s.timestamp_latest > timestamp_latest) {
            timestamp_latest = s.timestamp_latest;
            last_value = s.last_value;
          }
        }
        return last_value
      """
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

[discrete]
[[painless-time-features]]
==== Getting time features as scripted fields

This snippet shows how to extract time-based features by using Painless in a
{transform}. The snippet uses an index where `@timestamp` is defined as a `date`
type field.

[source,js]
--------------------------------------------------
"aggregations": {
  "script_fields": {
    "hour_of_day": { <1>
      "script": {
        "lang": "painless",
        "source": """
          ZonedDateTime date = doc['@timestamp'].value; <2>
          return date.getHour(); <3>
        """
      }
    },
    "month_of_year": { <4>
      "script": {
        "lang": "painless",
        "source": """
          ZonedDateTime date = doc['@timestamp'].value; <5>
          return date.getMonthValue(); <6>
        """
      }
    }
  },
  ...
}
--------------------------------------------------
// NOTCONSOLE
<1> Contains the Painless script that returns the hour of the day.
<2> Sets `date` based on the timestamp of the document.
<3> Returns the hour value from `date`.
<4> Contains the Painless script that returns the month of the year.
<5> Sets `date` based on the timestamp of the document.
<6> Returns the month value from `date`.
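
To illustrate what the two scripts compute, here is a rough Python equivalent.
It is only a sketch under stated assumptions: the `time_features` helper and the
sample timestamp are hypothetical, and the timestamp is interpreted in UTC.

```python
from datetime import datetime, timezone


def time_features(epoch_millis):
    # Interpret the timestamp in UTC (an assumption of this sketch).
    date = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return {
        "hour_of_day": date.hour,     # 0-23, comparable to getHour()
        "month_of_year": date.month,  # 1-12, comparable to getMonthValue()
    }


# Hypothetical sample: 15 March 2021, 13:45 UTC, as epoch milliseconds.
sample_millis = int(
    datetime(2021, 3, 15, 13, 45, tzinfo=timezone.utc).timestamp() * 1000
)
features = time_features(sample_millis)
```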

[discrete]
[[painless-group-by]]
==== Using Painless in `group_by`

It is possible to base the `group_by` property of a {transform} on the output of
a script. The following example uses the {kib} sample web logs dataset. The goal
here is to make the {transform} output easier to understand by normalizing
the value of the fields that the data is grouped by.

[source,console]
--------------------------------------------------
POST _transform/_preview
{
  "source": {
    "index": [ <1>
      "kibana_sample_data_logs"
    ]
  },
  "pivot": {
    "group_by": {
      "agent": {
        "terms": {
          "script": { <2>
            "source": """String agent = doc['agent.keyword'].value;
            if (agent.contains("MSIE")) {
              return "internet explorer";
            } else if (agent.contains("AppleWebKit")) {
              return "safari";
            } else if (agent.contains('Firefox')) {
              return "firefox";
            } else { return agent }""",
            "lang": "painless"
          }
        }
      }
    },
    "aggregations": { <3>
      "200": {
        "filter": {
          "term": {
            "response": "200"
          }
        }
      },
      "404": {
        "filter": {
          "term": {
            "response": "404"
          }
        }
      },
      "503": {
        "filter": {
          "term": {
            "response": "503"
          }
        }
      }
    }
  },
  "dest": { <4>
    "index": "pivot_logs"
  }
}
--------------------------------------------------
// TEST[skip:setup kibana sample data]
<1> Specifies the source index or indices.
<2> The script defines an `agent` string based on the `agent` field of the
documents, then checks its value. If the `agent` field contains "MSIE", then
the script returns "internet explorer". If it contains "AppleWebKit", it returns
"safari". It returns "firefox" if the field value contains "Firefox". Finally,
in every other case, the value of the field is returned.
<3> The aggregations object contains filters that narrow down the results to
documents that contain `200`, `404`, or `503` values in the `response` field.
<4> Specifies the destination index of the {transform}.

The API returns the following result:

[source,js]
--------------------------------------------------
{
  "preview" : [
    {
      "agent" : "firefox",
      "200" : 4931,
      "404" : 259,
      "503" : 172
    },
    {
      "agent" : "internet explorer",
      "200" : 3674,
      "404" : 210,
      "503" : 126
    },
    {
      "agent" : "safari",
      "200" : 4227,
      "404" : 332,
      "503" : 143
    }
  ],
  "mappings" : {
    "properties" : {
      "200" : {
        "type" : "long"
      },
      "agent" : {
        "type" : "keyword"
      },
      "404" : {
        "type" : "long"
      },
      "503" : {
        "type" : "long"
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

You can see that the `agent` values are simplified so it is easier to interpret
them. The table below shows how normalization modifies the output of the
{transform} in our example compared to the non-normalized values.

[width="50%"]
|===
| Non-normalized `agent` value | Normalized `agent` value

| "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" | "internet explorer"
| "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24" | "safari"
| "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1" | "firefox"
|===
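
The normalization logic can be checked outside {es} against the table rows. The
following Python function is a sketch that mirrors the branch order of the
Painless script; the `normalize_agent` name and the `curl` sample string are
hypothetical.

```python
def normalize_agent(agent):
    # Same substring checks, in the same order, as the Painless script.
    if "MSIE" in agent:
        return "internet explorer"
    elif "AppleWebKit" in agent:
        return "safari"
    elif "Firefox" in agent:
        return "firefox"
    # Any other value is passed through unchanged.
    return agent


# The three user agent strings from the table above.
msie = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
webkit = (
    "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) "
    "Chrome/11.0.696.50 Safari/534.24"
)
firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1"

normalized = [normalize_agent(a) for a in (msie, webkit, firefox)]
```

Because the checks are substring matches evaluated in order, any agent string
that contains `AppleWebKit` is grouped as "safari", including the Chrome user
agent in the second table row.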

[discrete]
[[painless-bucket-script]]
==== Getting duration by using bucket script

This example shows you how to get the duration of a session by client IP from a
data log by using
{ref}/search-aggregations-pipeline-bucket-script-aggregation.html[bucket script].
The example uses the {kib} sample web logs dataset.

[source,console]
--------------------------------------------------
PUT _transform/data_log
{
  "source": {
    "index": "kibana_sample_data_logs"
  },
  "dest": {
    "index": "data-logs-by-client"
  },
  "pivot": {
    "group_by": {
      "machine.os": {"terms": {"field": "machine.os.keyword"}},
      "machine.ip": {"terms": {"field": "clientip"}}
    },
    "aggregations": {
      "time_frame.lte": {
        "max": {
          "field": "timestamp"
        }
      },
      "time_frame.gte": {
        "min": {
          "field": "timestamp"
        }
      },
      "time_length": { <1>
        "bucket_script": {
          "buckets_path": { <2>
            "min": "time_frame.gte.value",
            "max": "time_frame.lte.value"
          },
          "script": "params.max - params.min" <3>
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:setup kibana sample data]
<1> To define the length of the sessions, we use a bucket script.
<2> The bucket path is a map of script variables and their associated path to
the buckets you want to use for the variable. In this particular case, `min` and
`max` are variables mapped to `time_frame.gte.value` and `time_frame.lte.value`.
<3> Finally, the script subtracts the start date of the session from the end
date, which results in the duration of the session.
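
As a rough illustration of what the pivot computes, the following Python sketch
groups events by operating system and client IP, then subtracts the earliest
timestamp from the latest one in each group. The `session_durations` helper and
the sample events are hypothetical, and timestamps are assumed to be epoch
milliseconds.

```python
from collections import defaultdict


def session_durations(events):
    # Group timestamps by (machine.os, clientip), like the two group_by terms.
    buckets = defaultdict(list)
    for event in events:
        key = (event["machine.os"], event["clientip"])
        buckets[key].append(event["timestamp"])
    # Per bucket: max - min, like "params.max - params.min".
    return {key: max(ts) - min(ts) for key, ts in buckets.items()}


# Hypothetical events with timestamps in epoch milliseconds.
events = [
    {"machine.os": "osx", "clientip": "10.0.0.1", "timestamp": 1000},
    {"machine.os": "osx", "clientip": "10.0.0.1", "timestamp": 61000},
    {"machine.os": "win 8", "clientip": "10.0.0.2", "timestamp": 5000},
]
durations = session_durations(events)
```

In this sketch, a group with a single event has a duration of zero because its
minimum and maximum timestamps are the same.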

[discrete]
[[painless-count-http]]
==== Counting HTTP responses by using scripted metric aggregation

You can count the different HTTP response types in a web log data set by using
scripted metric aggregation as part of the {transform}. The example below
assumes that the HTTP response codes are stored as keywords in the `response`
field of the documents.

[source,js]
--------------------------------------------------
"aggregations": { <1>
  "responses.counts": { <2>
    "scripted_metric": { <3>
      "init_script": "state.responses = ['error':0L,'success':0L,'other':0L]", <4>
      "map_script": """ <5>
        def code = doc['response.keyword'].value;
        if (code.startsWith('5') || code.startsWith('4')) {
          state.responses.error += 1;
        } else if (code.startsWith('2')) {
          state.responses.success += 1;
        } else {
          state.responses.other += 1;
        }
      """,
      "combine_script": "state.responses", <6>
      "reduce_script": """ <7>
        def counts = ['error': 0L, 'success': 0L, 'other': 0L];
        for (responses in states) {
          counts.error += responses['error'];
          counts.success += responses['success'];
          counts.other += responses['other'];
        }
        return counts;
      """
    }
  },
  ...
}
--------------------------------------------------
// NOTCONSOLE
<1> The `aggregations` object of the {transform} that contains all aggregations.
<2> Object of the `scripted_metric` aggregation.
<3> This `scripted_metric` performs a distributed operation on the web log data
to count specific types of HTTP responses (error, success, and other).
<4> The `init_script` creates a `responses` map in the `state` object with
three keys (`error`, `success`, `other`), each holding a long value.
<5> The `map_script` defines `code` based on the `response.keyword` value of the
document, then it counts the errors, successes, and other responses based on the
first digit of the response code.
<6> The `combine_script` returns `state.responses` from each shard.
<7> The `reduce_script` creates a `counts` map with the `error`, `success`,
and `other` keys, then iterates through the `responses` maps returned by each
shard and adds each shard's error, success, and other counts to the matching
entry of `counts`. Finally, it returns the `counts` map with the response
counts.
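
The per-shard counting and the final merge can be sketched in Python as follows.
This is an illustration only, not Painless: the function names and the shard
contents are hypothetical, and each document is reduced to its response code
string.

```python
def map_shard(codes):
    # init_script + map_script: count response classes on a single shard.
    responses = {"error": 0, "success": 0, "other": 0}
    for code in codes:
        if code.startswith("5") or code.startswith("4"):
            responses["error"] += 1
        elif code.startswith("2"):
            responses["success"] += 1
        else:
            responses["other"] += 1
    return responses  # combine_script hands this map to the reduce step


def reduce_counts(states):
    # reduce_script: add up the per-shard maps into one result.
    counts = {"error": 0, "success": 0, "other": 0}
    for responses in states:
        counts["error"] += responses["error"]
        counts["success"] += responses["success"]
        counts["other"] += responses["other"]
    return counts


# Hypothetical data: response codes of the documents on two shards.
shards = [["200", "404", "503"], ["200", "301"]]
counts = reduce_counts([map_shard(codes) for codes in shards])
```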