[role="xpack"]
[testenv="basic"]
[[transform-painless-examples]]
=== Painless examples for {transforms}
++++
<titleabbrev>Painless examples for {transforms}</titleabbrev>
++++

These examples demonstrate how to use Painless in {transforms}. You can learn
more about the Painless scripting language in the
{painless}/painless-guide.html[Painless guide].

* <<painless-top-hits>>
* <<painless-time-features>>
* <<painless-group-by>>
* <<painless-bucket-script>>

[discrete]
[[painless-top-hits]]
==== Getting top hits by using scripted metric

This snippet shows how to find the latest document, in other words the document
with the latest timestamp. From a technical perspective, it achieves the
function of a <<search-aggregations-metrics-top-hits-aggregation>> by using a
scripted metric aggregation, which provides a metric output.

[source,js]
--------------------------------------------------
"latest_doc": {
  "scripted_metric": {
    "init_script": "state.timestamp_latest = 0L; state.last_doc = ''", <1>
    "map_script": """ <2>
      def current_date = doc['@timestamp'].getValue().toInstant().toEpochMilli();
      if (current_date > state.timestamp_latest) {
        state.timestamp_latest = current_date;
        state.last_doc = new HashMap(params['_source']);
      }
    """,
    "combine_script": "return state", <3>
    "reduce_script": """ <4>
      def last_doc = '';
      def timestamp_latest = 0L;
      for (s in states) {
        if (s.timestamp_latest > timestamp_latest) {
          timestamp_latest = s.timestamp_latest;
          last_doc = s.last_doc;
        }
      }
      return last_doc
    """
  }
}
--------------------------------------------------
// NOTCONSOLE

<1> The `init_script` creates a long type `timestamp_latest` and a string type
`last_doc` in the `state` object.
<2> The `map_script` defines `current_date` based on the timestamp of the
document and compares it with `state.timestamp_latest`. If the document is
newer, the script updates `state.timestamp_latest` and stores a copy of the
document source in `state.last_doc`. Copying the source with `new HashMap(...)`
is important whenever you want to pass the full source object from one phase to
the next.
<3> The `combine_script` returns `state` from each shard.
<4> The `reduce_script` iterates through the `timestamp_latest` value returned
by each shard and returns the document with the latest timestamp (`last_doc`).
In the response, the top hit is nested below the `latest_doc` field.

Check the
<<scripted-metric-aggregation-scope,scope of scripts>>
for a detailed explanation of the respective scripts.

You can retrieve the last value in a similar way:

[source,js]
--------------------------------------------------
"latest_value": {
  "scripted_metric": {
    "init_script": "state.timestamp_latest = 0L; state.last_value = ''",
    "map_script": """
      def current_date = doc['date'].getValue().toInstant().toEpochMilli();
      if (current_date > state.timestamp_latest) {
        state.timestamp_latest = current_date;
        state.last_value = params['_source']['value'];
      }
    """,
    "combine_script": "return state",
    "reduce_script": """
      def last_value = '';
      def timestamp_latest = 0L;
      for (s in states) {
        if (s.timestamp_latest > timestamp_latest) {
          timestamp_latest = s.timestamp_latest;
          last_value = s.last_value;
        }
      }
      return last_value
    """
  }
}
--------------------------------------------------
// NOTCONSOLE
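
The snippets above show only the aggregation itself. In a {transform}, a
scripted metric belongs in the `aggregations` object of the `pivot`. The
following is a minimal sketch of a preview request that computes `latest_value`
for each host. The index name `my-index` and the `host` keyword field used for
grouping are hypothetical; the `date` and `value` fields are the ones the
snippet above already assumes.

[source,console]
--------------------------------------------------
POST _transform/_preview
{
  "source": {
    "index": "my-index" <1>
  },
  "pivot": {
    "group_by": {
      "host": {
        "terms": {
          "field": "host" <2>
        }
      }
    },
    "aggregations": {
      "latest_value": {
        "scripted_metric": {
          "init_script": "state.timestamp_latest = 0L; state.last_value = ''",
          "map_script": """
            def current_date = doc['date'].getValue().toInstant().toEpochMilli();
            if (current_date > state.timestamp_latest) {
              state.timestamp_latest = current_date;
              state.last_value = params['_source']['value'];
            }
          """,
          "combine_script": "return state",
          "reduce_script": """
            def last_value = '';
            def timestamp_latest = 0L;
            for (s in states) {
              if (s.timestamp_latest > timestamp_latest) {
                timestamp_latest = s.timestamp_latest;
                last_value = s.last_value;
              }
            }
            return last_value;
          """
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:hypothetical index and fields]

<1> Hypothetical index name, used only for illustration.
<2> Hypothetical keyword field. Each distinct value becomes one bucket, and the
scripted metric runs once per bucket.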

[discrete]
[[painless-time-features]]
==== Getting time features as scripted fields

This snippet shows how to extract time-based features by using Painless. The
snippet uses an index where `@timestamp` is defined as a `date` type field.

[source,js]
--------------------------------------------------
"script_fields": {
  "hour_of_day": { <1>
    "script": {
      "lang": "painless",
      "source": """
        ZonedDateTime date = doc['@timestamp'].value; <2>
        return date.getHour(); <3>
      """
    }
  },
  "month_of_year": { <4>
    "script": {
      "lang": "painless",
      "source": """
        ZonedDateTime date = doc['@timestamp'].value; <5>
        return date.getMonthValue(); <6>
      """
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

<1> Contains the Painless script that returns the hour of the day.
<2> Sets `date` based on the timestamp of the document.
<3> Returns the hour value from `date`.
<4> Contains the Painless script that returns the month of the year.
<5> Sets `date` based on the timestamp of the document.
<6> Returns the month value from `date`.
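
Script fields are evaluated per document at search time. To use the same
feature extraction inside the `pivot` of a {transform}, one option is to pass
the script to a metric aggregation. The following sketch is an assumption about
how the snippet above might be adapted, not part of the original example. The
aggregation names `avg_hour_of_day` and `avg_month_of_year` are illustrative.

[source,js]
--------------------------------------------------
"aggregations": {
  "avg_hour_of_day": { <1>
    "avg": {
      "script": { <2>
        "source": """
          ZonedDateTime date = doc['@timestamp'].value;
          return date.getHour();
        """
      }
    }
  },
  "avg_month_of_year": {
    "avg": {
      "script": {
        "source": """
          ZonedDateTime date = doc['@timestamp'].value;
          return date.getMonthValue();
        """
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

<1> Illustrative aggregation name. It becomes a field in the destination index.
<2> The `avg` aggregation averages the value that the script returns for each
document in the bucket.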

[discrete]
[[painless-group-by]]
==== Using Painless in `group_by`

It is possible to base the `group_by` property of a {transform} on the output of
a script. The following example uses the {kib} sample web logs dataset. The goal
here is to make the {transform} output easier to understand by normalizing the
values of the fields that the data is grouped by.

[source,console]
--------------------------------------------------
POST _transform/_preview
{
  "source": {
    "index": [ <1>
      "kibana_sample_data_logs"
    ]
  },
  "pivot": {
    "group_by": {
      "agent": {
        "terms": {
          "script": { <2>
            "source": """String agent = doc['agent.keyword'].value;
            if (agent.contains("MSIE")) {
              return "internet explorer";
            } else if (agent.contains("AppleWebKit")) {
              return "safari";
            } else if (agent.contains('Firefox')) {
              return "firefox";
            } else { return agent }""",
            "lang": "painless"
          }
        }
      }
    },
    "aggregations": { <3>
      "200": {
        "filter": {
          "term": {
            "response": "200"
          }
        }
      },
      "404": {
        "filter": {
          "term": {
            "response": "404"
          }
        }
      },
      "503": {
        "filter": {
          "term": {
            "response": "503"
          }
        }
      }
    }
  },
  "dest": { <4>
    "index": "pivot_logs"
  }
}
--------------------------------------------------
// TEST[skip:setup kibana sample data]

<1> Specifies the source index or indices.
<2> The script defines an `agent` string based on the `agent` field of the
documents, then checks which browser identifier it contains. If the value
contains "MSIE", the script returns "internet explorer". If it contains
"AppleWebKit", it returns "safari". If it contains "Firefox", it returns
"firefox". In every other case, the script returns the value of the field
unchanged.
<3> The aggregations object contains filters that narrow down the results to
documents that contain `200`, `404`, or `503` values in the `response` field.
<4> Specifies the destination index of the {transform}.

The API returns the following result:

[source,js]
--------------------------------------------------
{
  "preview" : [
    {
      "agent" : "firefox",
      "200" : 4931,
      "404" : 259,
      "503" : 172
    },
    {
      "agent" : "internet explorer",
      "200" : 3674,
      "404" : 210,
      "503" : 126
    },
    {
      "agent" : "safari",
      "200" : 4227,
      "404" : 332,
      "503" : 143
    }
  ],
  "mappings" : {
    "properties" : {
      "200" : {
        "type" : "long"
      },
      "agent" : {
        "type" : "keyword"
      },
      "404" : {
        "type" : "long"
      },
      "503" : {
        "type" : "long"
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

You can see that the `agent` values are simplified so it is easier to interpret
them. The table below shows how normalization modifies the output of the
{transform} in our example compared to the non-normalized values.

[width="50%"]
|===
| Non-normalized `agent` value | Normalized `agent` value

| "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" | "internet explorer"
| "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24" | "safari"
| "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1" | "firefox"
|===
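
The script in the request reads `doc['agent.keyword'].value` directly, which
raises an error for any document that has no value in that field. The following
is a minimal sketch of a more defensive variant of the same `group_by` entry.
The fallback bucket name `unknown` is an assumption chosen for illustration and
does not come from the sample dataset.

[source,js]
--------------------------------------------------
"agent": {
  "terms": {
    "script": {
      "source": """
        if (doc['agent.keyword'].size() == 0) { <1>
          return "unknown"; <2>
        }
        String agent = doc['agent.keyword'].value;
        if (agent.contains("MSIE")) {
          return "internet explorer";
        } else if (agent.contains("AppleWebKit")) {
          return "safari";
        } else if (agent.contains("Firefox")) {
          return "firefox";
        }
        return agent;
      """,
      "lang": "painless"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

<1> Checks whether the document has a value in the field before reading it.
<2> Hypothetical bucket name for documents without an `agent` value.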

[discrete]
[[painless-bucket-script]]
==== Getting duration by using bucket script

This example shows you how to get the duration of a session by client IP from a
data log by using
{ref}/search-aggregations-pipeline-bucket-script-aggregation.html[bucket script].
The example uses the {kib} sample web logs dataset.

[source,console]
--------------------------------------------------
PUT _transform/data_log
{
  "source": {
    "index": "kibana_sample_data_logs"
  },
  "dest": {
    "index": "data-logs-by-client"
  },
  "pivot": {
    "group_by": {
      "machine.os": {"terms": {"field": "machine.os.keyword"}},
      "machine.ip": {"terms": {"field": "clientip"}}
    },
    "aggregations": {
      "time_frame.lte": {
        "max": {
          "field": "timestamp"
        }
      },
      "time_frame.gte": {
        "min": {
          "field": "timestamp"
        }
      },
      "time_length": { <1>
        "bucket_script": {
          "buckets_path": { <2>
            "min": "time_frame.gte.value",
            "max": "time_frame.lte.value"
          },
          "script": "params.max - params.min" <3>
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:setup kibana sample data]

<1> To define the length of the sessions, we use a bucket script.
<2> The bucket path is a map of script variables and their associated paths to
the buckets you want to use for the variables. In this particular case, `min`
and `max` are variables mapped to `time_frame.gte.value` and
`time_frame.lte.value`.
<3> Finally, the script subtracts the start date of the session from the end
date, which results in the duration of the session.
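
The `min` and `max` aggregations on a date field return epoch milliseconds, so
the duration computed above is expressed in milliseconds. If seconds are easier
to work with, a variant along the following lines could be added to the same
`aggregations` object. This is a sketch only, and the aggregation name
`time_length_seconds` is hypothetical.

[source,js]
--------------------------------------------------
"time_length_seconds": { <1>
  "bucket_script": {
    "buckets_path": {
      "min": "time_frame.gte.value",
      "max": "time_frame.lte.value"
    },
    "script": "(params.max - params.min) / 1000" <2>
  }
}
--------------------------------------------------
// NOTCONSOLE

<1> Hypothetical aggregation name, used only for illustration.
<2> Divides the millisecond difference by 1000 to get the duration in seconds.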