[[modules-scripting]]
== Scripting

The scripting module lets you use scripts to evaluate custom
expressions. For example, scripts can return "script fields"
as part of a search request, or compute a custom score
for a query.

The scripting module uses http://groovy-lang.org/[groovy] by default
(previously http://mvel.codehaus.org/[mvel] in 1.3.x and earlier) as the
scripting language, with some extensions. Groovy is used because it is
fast and simple to use.

.Groovy dynamic scripting off by default from v1.4.3
[IMPORTANT]
===================================================

Groovy dynamic scripting is off by default, preventing dynamic Groovy scripts
from being accepted as part of a request or retrieved from the special
`.scripts` index. You will still be able to use Groovy scripts stored in files
in the `config/scripts/` directory on every node.

To convert an inline script to a file, take this simple script
as an example:

[source,js]
-----------------------------------
GET /_search
{
  "script_fields": {
    "my_field": {
      "inline": "1 + my_var",
      "params": {
        "my_var": 2
      }
    }
  }
}
-----------------------------------

Save the contents of the `inline` field as a file called `config/scripts/my_script.groovy`
on every data node in the cluster:

[source,js]
-----------------------------------
1 + my_var
-----------------------------------

Now you can access the script by file name (without the extension):

[source,js]
-----------------------------------
GET /_search
{
  "script_fields": {
    "my_field": {
      "file": "my_script",
      "params": {
        "my_var": 2
      }
    }
  }
}
-----------------------------------

===================================================

Additional `lang` plugins let you execute scripts in different languages.
Wherever a script can be used, a `lang` parameter can be provided to define
the language of the script. The following scripting languages are supported:

[cols="<,<,<",options="header",]
|=======================================================================
|Language |Sandboxed |Required plugin
|groovy |no |built-in
|expression |yes |built-in
|mustache |yes |built-in
|mvel |no |https://github.com/elastic/elasticsearch-lang-mvel[elasticsearch-lang-mvel]
|javascript |no |https://github.com/elastic/elasticsearch-lang-javascript[elasticsearch-lang-javascript]
|python |no |https://github.com/elastic/elasticsearch-lang-python[elasticsearch-lang-python]
|=======================================================================

To increase security, Elasticsearch does not allow you to specify scripts for
non-sandboxed languages with a request. Instead, scripts must be placed in the
`scripts` directory inside the configuration directory (the directory where
elasticsearch.yml is). The default location of this `scripts` directory can be
changed by setting `path.scripts` in elasticsearch.yml. Scripts placed in
this directory are picked up automatically and become available for use.
Once a script has been placed in this directory, it can be referenced by name.
For example, a script called `calculate-score.groovy` can be referenced in a
request like this:

[source,sh]
--------------------------------------------------
$ tree config
config
├── elasticsearch.yml
├── logging.yml
└── scripts
    └── calculate-score.groovy
--------------------------------------------------

[source,sh]
--------------------------------------------------
$ cat config/scripts/calculate-score.groovy
log(_score * 2) + my_modifier
--------------------------------------------------

[source,js]
--------------------------------------------------
curl -XPOST localhost:9200/_search -d '{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "foo"
        }
      },
      "functions": [
        {
          "script_score": {
            "lang": "groovy",
            "file": "calculate-score",
            "params": {
              "my_modifier": 8
            }
          }
        }
      ]
    }
  }
}'
--------------------------------------------------

The name of the script is derived from the hierarchy of directories it
exists under, and the file name without the lang extension. For example,
a script placed under `config/scripts/group1/group2/test.py` will be
named `group1_group2_test`.

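The naming rule above can be sketched in a few lines of Java. This is only an illustration of the rule as described, not the code Elasticsearch uses; the class and method names are hypothetical, and the path is assumed to be relative to the `scripts` directory with `/` as the separator.

```java
/** Illustrative only: derives a script name from a path relative to the scripts directory. */
final class ScriptNameSketch {

    // Drops the lang extension from the file name and joins the path elements with underscores.
    static String scriptName(String relativePath) {
        int dot = relativePath.lastIndexOf('.');
        int slash = relativePath.lastIndexOf('/');
        // Only treat the dot as an extension separator if it belongs to the file name.
        String withoutExtension = dot > slash ? relativePath.substring(0, dot) : relativePath;
        return withoutExtension.replace('/', '_');
    }

    public static void main(String[] args) {
        System.out.println(scriptName("group1/group2/test.py"));  // group1_group2_test
        System.out.println(scriptName("calculate-score.groovy")); // calculate-score
    }
}
```

The resulting name is the value you would pass as the `file` parameter in a request.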

[float]
=== Indexed Scripts

Elasticsearch allows you to store scripts in an internal index known as
`.scripts` and reference them by id. Indexed scripts are managed through
REST endpoints of the following form:

[source,js]
-----------------------------------
/_scripts/{lang}/{id}
-----------------------------------

Here `lang` is the language the script is written in and `id` is the id
of the script. In the `.scripts` index the type of the document will be set to the `lang`.

[source,js]
-----------------------------------
curl -XPOST localhost:9200/_scripts/groovy/indexedCalculateScore -d '{
  "script": "log(_score * 2) + my_modifier"
}'
-----------------------------------

This will create a document with id `indexedCalculateScore` and type `groovy` in the
`.scripts` index.

This script can be accessed at query time by using the `id` script parameter and passing
the script id:

[source,js]
--------------------------------------------------
curl -XPOST localhost:9200/_search -d '{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "foo"
        }
      },
      "functions": [
        {
          "script_score": {
            "id": "indexedCalculateScore",
            "lang" : "groovy",
            "params": {
              "my_modifier": 8
            }
          }
        }
      ]
    }
  }
}'
--------------------------------------------------

The script can be viewed by:

[source,js]
-----------------------------------
curl -XGET localhost:9200/_scripts/groovy/indexedCalculateScore
-----------------------------------

This is rendered as:

[source,js]
-----------------------------------
'{
  "script": "log(_score * 2) + my_modifier"
}'
-----------------------------------

Indexed scripts can be deleted by:

[source,js]
-----------------------------------
curl -XDELETE localhost:9200/_scripts/groovy/indexedCalculateScore
-----------------------------------

[float]
[[enable-dynamic-scripting]]
=== Enabling dynamic scripting

We recommend running Elasticsearch behind an application or proxy, which
protects Elasticsearch from the outside world. If users are allowed to run
inline scripts (even in a search request) or indexed scripts, then they have
the same access to your box as the user that Elasticsearch is running as. For
this reason dynamic scripting is allowed only for sandboxed languages by default.

First, you should not run Elasticsearch as the `root` user, as this would allow
a script to access or do *anything* on your server, without limitations. Second,
you should not expose Elasticsearch directly to users, but instead have a proxy
application in between. If you *do* intend to expose Elasticsearch directly to
your users, then you have to decide whether you trust them enough to run scripts
on your box or not.

Scripts can be enabled based on their source, for every script engine, by
adding the following settings to the `config/elasticsearch.yml` file on every node.

[source,yaml]
-----------------------------------
script.inline: on
script.indexed: on
-----------------------------------

While this still allows execution of named scripts provided in the config, or
_native_ Java scripts registered through plugins, it also allows users to run
arbitrary scripts via the API. Instead of sending the name of the file as the
script, the body of the script can be sent in the request or retrieved from the
`.scripts` index if previously stored.

There are three possible configuration values for any of the fine-grained
script settings:

[cols="<,<",options="header",]
|=======================================================================
|Value |Description
| `off` |scripting is turned off completely, in the context of the setting being set.
| `on` |scripting is turned on, in the context of the setting being set.
| `sandbox` |scripts may be executed only for languages that are sandboxed
|=======================================================================

The default values are the following:

[source,yaml]
-----------------------------------
script.inline: sandbox
script.indexed: sandbox
script.file: on
-----------------------------------

NOTE: Global scripting settings affect the `mustache` scripting language.
<<search-template,Search templates>> internally use the `mustache` language,
and will still be enabled by default as the `mustache` engine is sandboxed,
but they will be enabled/disabled according to fine-grained settings
specified in `elasticsearch.yml`.

It is also possible to control which operations can execute scripts. The
supported operations are:

[cols="<,<",options="header",]
|=======================================================================
|Value |Description
| `aggs` |Aggregations (wherever they may be used)
| `search` |Search api, Percolator api and Suggester api (e.g. filters, script_fields)
| `update` |Update api
| `plugin` |Any plugin that makes use of scripts under the generic `plugin` category
|=======================================================================

Plugins can also define custom operations that they use scripts for instead
of using the generic `plugin` category. Those operations can be referred to
in the following form: `${pluginName}_${operation}`.

The following example disables scripting for `update` and `mapping` operations,
regardless of the script source, for any engine. Scripts in sandboxed
languages can still be executed as part of `aggs`, `search` and `plugin`
operations, because the defaults above still apply to them.

[source,yaml]
-----------------------------------
script.update: off
script.mapping: off
-----------------------------------

Generic settings are applied in order: operation-based settings take precedence
over source-based ones. Language-specific settings are supported too. They
need to be prefixed with the `script.engine.<engine>` prefix and take
precedence over any generic settings.

[source,yaml]
-----------------------------------
script.engine.groovy.file.aggs: on
script.engine.groovy.file.mapping: on
script.engine.groovy.file.search: on
script.engine.groovy.file.update: on
script.engine.groovy.file.plugin: on
script.engine.groovy.indexed.aggs: on
script.engine.groovy.indexed.mapping: off
script.engine.groovy.indexed.search: on
script.engine.groovy.indexed.update: off
script.engine.groovy.indexed.plugin: off
script.engine.groovy.inline.aggs: on
script.engine.groovy.inline.mapping: off
script.engine.groovy.inline.search: off
script.engine.groovy.inline.update: off
script.engine.groovy.inline.plugin: off
-----------------------------------

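To make the precedence rules concrete, here is a minimal Java model of how a decision could be reached from the settings described in this section. It is a sketch under stated assumptions, not Elasticsearch's implementation: the class and method names are hypothetical, settings are treated as plain strings, and the lookup order (engine-specific, then operation-based, then source-based, then the documented default) is taken directly from the text.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model of the precedence rules; not Elasticsearch's actual implementation. */
final class ScriptSettingsSketch {

    // Documented defaults: file scripts are on, inline and indexed scripts are sandbox-only.
    static String defaultFor(String source) {
        return source.equals("file") ? "on" : "sandbox";
    }

    // The most specific setting wins: engine-specific, then operation-based, then source-based.
    static String resolve(Map<String, String> settings, String lang, String source, String operation) {
        String engineKey = "script.engine." + lang + "." + source + "." + operation;
        if (settings.containsKey(engineKey)) {
            return settings.get(engineKey);
        }
        if (settings.containsKey("script." + operation)) {
            return settings.get("script." + operation);
        }
        if (settings.containsKey("script." + source)) {
            return settings.get("script." + source);
        }
        return defaultFor(source);
    }

    // "on" always allows, "sandbox" allows only sandboxed languages, "off" never allows.
    static boolean canExecute(Map<String, String> settings, String lang, boolean sandboxed,
                              String source, String operation) {
        String mode = resolve(settings, lang, source, operation);
        return mode.equals("on") || (mode.equals("sandbox") && sandboxed);
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put("script.update", "off");
        settings.put("script.engine.groovy.inline.aggs", "on");

        System.out.println(canExecute(settings, "groovy", false, "inline", "aggs"));      // true
        System.out.println(canExecute(settings, "groovy", false, "inline", "search"));    // false
        System.out.println(canExecute(settings, "expression", true, "inline", "update")); // false
        System.out.println(canExecute(settings, "groovy", false, "file", "search"));      // true
    }
}
```

In this model, the engine-specific `inline.aggs` entry lets non-sandboxed Groovy run inline in aggregations, while `script.update: off` blocks even a sandboxed language for updates.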

[float]
=== Default Scripting Language

The default scripting language (assuming no `lang` parameter is provided) is
`groovy`. To change it, set `script.default_lang` to the
appropriate language.

[float]
=== Automatic Script Reloading

The `config/scripts` directory is scanned periodically for changes.
New and changed scripts are reloaded, and deleted scripts are removed
from the preloaded scripts cache. The reload frequency can be specified
using the `resource.reload.interval` setting, which defaults to `60s`.
To disable script reloading completely, set `script.auto_reload_enabled`
to `false`.

[[native-java-scripts]]
[float]
=== Native (Java) Scripts

Even though `groovy` is fast, you can also register native Java-based
scripts for faster execution.

To provide a native script, implement a `NativeScriptFactory` that
constructs the script to be executed. There are
two main types, one that extends `AbstractExecutableScript` and one that
extends `AbstractSearchScript` (probably the one most users will extend,
with additional helper classes in `AbstractLongSearchScript`,
`AbstractDoubleSearchScript`, and `AbstractFloatSearchScript`).

Scripts can be registered through settings, for example:
`script.native.my.type` set to `sample.MyNativeScriptFactory` will
register a script named `my`. Alternatively, a plugin can access
`ScriptModule` and call `registerScript` on it.

A native script is executed by specifying the `lang` as `native`, and
the name of the script as the `script`.

Note that the scripts need to be in the classpath of Elasticsearch. One
simple way to do this is to create a directory under plugins (choose a
descriptive name), and place the jar / classes files there. They will be
automatically loaded.

[float]
=== Lucene Expressions Scripts

experimental[The Lucene expressions module is undergoing significant development and the exposed functionality is likely to change in the future]

Lucene's expressions module provides a mechanism to compile a
`javascript` expression to bytecode. This allows very fast execution,
as if you had written a `native` script. Expression scripts can be
used in `script_score`, `script_fields`, sort scripts and numeric aggregation scripts.

See the link:http://lucene.apache.org/core/4_9_0/expressions/index.html?org/apache/lucene/expressions/js/package-summary.html[expressions module documentation]
for details on what operators and functions are available.

`expression` scripts can access the following variables:

* Single valued document fields, e.g. `doc['myfield'].value`
* Single valued document fields can also be accessed without `.value` e.g. `doc['myfield']`
* Parameters passed into the script, e.g. `mymodifier`
* The current document's score, `_score` (only available when used in a `script_score`)

Variables in `expression` scripts that are of type `date` may use the following member methods:

* getYear()
* getMonth()
* getDayOfMonth()
* getHourOfDay()
* getMinutes()
* getSeconds()

The following example shows the difference in years between the `date` fields `date0` and `date1`:

`doc['date1'].getYear() - doc['date0'].getYear()`

There are a few limitations relative to other script languages:

* Only numeric fields may be accessed
* Stored fields are not available
* If a field is sparse (only some documents contain a value), documents missing the field will have a value of `0`

[float]
=== Score

In all scripts that can be used in aggregations, the current
document's score is accessible in `_score`.

[float]
=== Computing scores based on terms in scripts

See the <<modules-advanced-scripting, advanced scripting documentation>>.

[float]
=== Document Fields

Most scripting revolves around the use of specific document field data.
The `doc['field_name']` notation can be used to access specific field data within
a document (the document in question is usually determined by the context
in which the script is used). Document fields are very fast to access since they
end up being loaded into memory (all the relevant field values/tokens
are loaded to memory). Note, however, that the `doc[...]` notation only
allows for simple valued fields (it can't return a JSON object)
and makes sense only on non-analyzed or single term based fields.

The following data can be extracted from a field:

[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].value` |The native value of the field. For example,
if it's a short type, it will be short.

|`doc['field_name'].values` |The native array values of the field. For
example, if it's a short type, it will be short[]. Remember, a field can
have several values within a single doc. Returns an empty array if the
field has no values.

|`doc['field_name'].empty` |A boolean indicating if the field has no
values within the doc.

|`doc['field_name'].multiValued` |A boolean indicating that the field
has several values within the corpus.

|`doc['field_name'].lat` |The latitude of a geo point type.

|`doc['field_name'].lon` |The longitude of a geo point type.

|`doc['field_name'].lats` |The latitudes of a geo point type.

|`doc['field_name'].lons` |The longitudes of a geo point type.

|`doc['field_name'].distance(lat, lon)` |The `plane` distance (in meters)
of this geo point field from the provided lat/lon.

|`doc['field_name'].distanceWithDefault(lat, lon, default)` |The `plane` distance (in meters)
of this geo point field from the provided lat/lon with a default value.

|`doc['field_name'].distanceInMiles(lat, lon)` |The `plane` distance (in
miles) of this geo point field from the provided lat/lon.

|`doc['field_name'].distanceInMilesWithDefault(lat, lon, default)` |The `plane` distance (in
miles) of this geo point field from the provided lat/lon with a default value.

|`doc['field_name'].distanceInKm(lat, lon)` |The `plane` distance (in
km) of this geo point field from the provided lat/lon.

|`doc['field_name'].distanceInKmWithDefault(lat, lon, default)` |The `plane` distance (in
km) of this geo point field from the provided lat/lon with a default value.

|`doc['field_name'].arcDistance(lat, lon)` |The `arc` distance (in
meters) of this geo point field from the provided lat/lon.

|`doc['field_name'].arcDistanceWithDefault(lat, lon, default)` |The `arc` distance (in
meters) of this geo point field from the provided lat/lon with a default value.

|`doc['field_name'].arcDistanceInMiles(lat, lon)` |The `arc` distance (in
miles) of this geo point field from the provided lat/lon.

|`doc['field_name'].arcDistanceInMilesWithDefault(lat, lon, default)` |The `arc` distance (in
miles) of this geo point field from the provided lat/lon with a default value.

|`doc['field_name'].arcDistanceInKm(lat, lon)` |The `arc` distance (in
km) of this geo point field from the provided lat/lon.

|`doc['field_name'].arcDistanceInKmWithDefault(lat, lon, default)` |The `arc` distance (in
km) of this geo point field from the provided lat/lon with a default value.

|`doc['field_name'].factorDistance(lat, lon)` |The distance factor of this geo point field from the provided lat/lon.

|`doc['field_name'].factorDistance(lat, lon, default)` |The distance factor of this geo point field from the provided lat/lon with a default value.

|`doc['field_name'].geohashDistance(geohash)` |The `arc` distance (in meters)
of this geo point field from the provided geohash.

|`doc['field_name'].geohashDistanceInKm(geohash)` |The `arc` distance (in km)
of this geo point field from the provided geohash.

|`doc['field_name'].geohashDistanceInMiles(geohash)` |The `arc` distance (in
miles) of this geo point field from the provided geohash.
|=======================================================================

[float]
=== Stored Fields

Stored fields can also be accessed when executing a script. Note that they
are much slower to access than document fields, as they are not
loaded into memory. They can be accessed using
`_fields['my_field_name'].value` or `_fields['my_field_name'].values`.

[float]
=== Accessing the score of a document within a script

When using scripting for calculating the score of a document (for instance, with
the `function_score` query), you can access the score using the `_score`
variable inside of a Groovy script.

[float]
=== Source Field

The source field can also be accessed when executing a script. The
source field is loaded per doc, parsed, and then provided to the script
for evaluation. The `_source` forms the context under which the source
field can be accessed, for example `_source.obj2.obj1.field3`.

Accessing `_source` is much slower than using `doc`,
but the data is not loaded into memory. For a single field access, `_fields` may be
faster than using `_source` due to the extra overhead of potentially parsing large documents.
However, `_source` may be faster if you access multiple fields or if the source has already been
loaded for other purposes.

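As a rough mental model of a dotted access such as `_source.obj2.obj1.field3`, the parsed source can be thought of as nested maps that are walked one key at a time. The Java sketch below illustrates only that idea; the class name, the `resolve` helper and the sample document are hypothetical and do not reflect how Elasticsearch exposes `_source` internally.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: walks a parsed source document (nested maps) along a dotted path. */
final class SourcePathSketch {

    // Returns the value at the dotted path, or null if any step along the way is missing.
    static Object resolve(Map<String, Object> source, String dottedPath) {
        Object current = source;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // cannot descend into a missing or non-object value
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical parsed document: {"obj2": {"obj1": {"field3": 42}}}
        Map<String, Object> obj1 = new HashMap<>();
        obj1.put("field3", 42);
        Map<String, Object> obj2 = new HashMap<>();
        obj2.put("obj1", obj1);
        Map<String, Object> source = new HashMap<>();
        source.put("obj2", obj2);

        System.out.println(resolve(source, "obj2.obj1.field3")); // 42
        System.out.println(resolve(source, "obj2.missing.x"));   // null
    }
}
```

Because the whole document has to be parsed before any path can be walked, this model also suggests why a single `_source` lookup on a large document costs more than a `doc` or `_fields` access.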

[float]
=== Groovy Built In Functions

There are several built in functions that can be used within scripts.
They include:

[cols="<,<",options="header",]
|=======================================================================
|Function |Description
|`sin(a)` |Returns the trigonometric sine of an angle.

|`cos(a)` |Returns the trigonometric cosine of an angle.

|`tan(a)` |Returns the trigonometric tangent of an angle.

|`asin(a)` |Returns the arc sine of a value.

|`acos(a)` |Returns the arc cosine of a value.

|`atan(a)` |Returns the arc tangent of a value.

|`toRadians(angdeg)` |Converts an angle measured in degrees to an
approximately equivalent angle measured in radians.

|`toDegrees(angrad)` |Converts an angle measured in radians to an
approximately equivalent angle measured in degrees.

|`exp(a)` |Returns Euler's number _e_ raised to the power of value.

|`log(a)` |Returns the natural logarithm (base _e_) of a value.

|`log10(a)` |Returns the base 10 logarithm of a value.

|`sqrt(a)` |Returns the correctly rounded positive square root of a
value.

|`cbrt(a)` |Returns the cube root of a double value.

|`IEEEremainder(f1, f2)` |Computes the remainder operation on two
arguments as prescribed by the IEEE 754 standard.

|`ceil(a)` |Returns the smallest (closest to negative infinity) value
that is greater than or equal to the argument and is equal to a
mathematical integer.

|`floor(a)` |Returns the largest (closest to positive infinity) value
that is less than or equal to the argument and is equal to a
mathematical integer.

|`rint(a)` |Returns the value that is closest in value to the argument
and is equal to a mathematical integer.

|`atan2(y, x)` |Returns the angle _theta_ from the conversion of
rectangular coordinates (_x_, _y_) to polar coordinates (_r_, _theta_).

|`pow(a, b)` |Returns the value of the first argument raised to the
power of the second argument.

|`round(a)` |Returns the closest _int_ to the argument.

|`random()` |Returns a random _double_ value.

|`abs(a)` |Returns the absolute value of a value.

|`max(a, b)` |Returns the greater of two values.

|`min(a, b)` |Returns the smaller of two values.

|`ulp(d)` |Returns the size of an ulp of the argument.

|`signum(d)` |Returns the signum function of the argument.

|`sinh(x)` |Returns the hyperbolic sine of a value.

|`cosh(x)` |Returns the hyperbolic cosine of a value.

|`tanh(x)` |Returns the hyperbolic tangent of a value.

|`hypot(x, y)` |Returns sqrt(_x_^2^ + _y_^2^) without intermediate overflow
or underflow.
|=======================================================================

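The names and descriptions in this table match the static methods of `java.lang.Math`. Assuming the built-in functions behave the same way (an assumption, since the table does not say so explicitly), the following plain Java snippet shows a few of the less obvious cases; the class name is hypothetical.

```java
/** Illustrative only: the java.lang.Math methods whose names match the table above. */
final class MathFunctionsSketch {

    public static void main(String[] args) {
        System.out.println(Math.ceil(-1.5));              // -1.0
        System.out.println(Math.floor(-1.5));             // -2.0
        System.out.println(Math.rint(2.5));               // 2.0 (ties go to the even integer)
        System.out.println(Math.round(2.5));              // 3 (ties round up)
        System.out.println(Math.IEEEremainder(5.0, 3.0)); // -1.0 (5 - 3 * 2)
        System.out.println(Math.hypot(3.0, 4.0));         // 5.0
    }
}
```

Note the difference between `rint` and `round` on ties, and that `IEEEremainder` can return a negative result for positive arguments, unlike the `%` operator.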

[float]
=== Arithmetic precision in MVEL

When dividing two numbers in MVEL based scripts, the engine
follows the default behaviour of Java. This means that if you
divide two integers (you might have configured the fields as integer in
the mapping), the result will also be an integer. For example, if a
calculation like `1/num` happens in your scripts and `num` is an
integer with the value of `8`, the result is `0` even though you were
expecting it to be `0.125`. You may need to enforce precision by
explicitly using a double, as in `1.0/num`, to get the expected
result.

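The Java behaviour referred to above can be reproduced directly. The snippet below is plain Java rather than an MVEL script, and the class and method names are hypothetical; it only shows the integer versus double division that the paragraph describes.

```java
/** Illustrative only: integer division truncates, double division keeps the fraction. */
final class IntegerDivisionSketch {

    // Both operands are ints, so the result is truncated toward zero.
    static int integerInverse(int num) {
        return 1 / num;
    }

    // The double literal 1.0 promotes the division to floating point.
    static double doubleInverse(int num) {
        return 1.0 / num;
    }

    public static void main(String[] args) {
        System.out.println(integerInverse(8)); // 0
        System.out.println(doubleInverse(8));  // 0.125
    }
}
```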