[[example-text-analysis-plugin]]
==== Example text analysis plugin

This example shows how to create a simple "Hello world" text analysis plugin
using the stable plugin API. The plugin provides a custom Lucene token filter
that strips all tokens except for "hello" and "world".

Elastic provides a Gradle plugin, `elasticsearch.stable-esplugin`, that makes it
easier to develop and package stable plugins. The steps in this guide assume you
use this plugin. However, you don't need Gradle to create plugins.

. Create a new directory for your project.
. In this example, the source code is organized under the `main` and
`test` directories. In your project's home directory, create `src/`, `src/main/`,
and `src/test/` directories.
. Create the following `build.gradle` build script in your project's home
directory:
+
[source,gradle]
----
ext.pluginApiVersion = '8.7.0-SNAPSHOT'
ext.luceneVersion = '9.5.0-snapshot-d19c3e2e0ed'

buildscript {
  ext.pluginApiVersion = '8.7.0-SNAPSHOT'
  repositories {
    maven {
      url = 'https://snapshots.elastic.co/maven/'
    }
    mavenCentral()
  }
  dependencies {
    classpath "org.elasticsearch.gradle:build-tools:${pluginApiVersion}"
  }
}

apply plugin: 'elasticsearch.stable-esplugin'
apply plugin: 'elasticsearch.yaml-rest-test'

esplugin {
  name 'my-plugin'
  description 'My analysis plugin'
}

group 'org.example'
version '1.0-SNAPSHOT'

repositories {
  maven {
    url = "https://s3.amazonaws.com/download.elasticsearch.org/lucenesnapshots/d19c3e2e0ed/"
  }
  maven {
    url = 'https://snapshots.elastic.co/maven/'
  }
  mavenLocal()
  mavenCentral()
}

dependencies {
  //TODO transitive dependency off and plugin-api dependency?
  compileOnly "org.elasticsearch.plugin:elasticsearch-plugin-api:${pluginApiVersion}"
  compileOnly "org.elasticsearch.plugin:elasticsearch-plugin-analysis-api:${pluginApiVersion}"
  compileOnly "org.apache.lucene:lucene-analysis-common:${luceneVersion}"

  //TODO for testing this also have to be declared
  testImplementation "org.elasticsearch.plugin:elasticsearch-plugin-api:${pluginApiVersion}"
  testImplementation "org.elasticsearch.plugin:elasticsearch-plugin-analysis-api:${pluginApiVersion}"
  testImplementation "org.apache.lucene:lucene-analysis-common:${luceneVersion}"

  testImplementation ('junit:junit:4.13.2'){
    exclude group: 'org.hamcrest'
  }
  testImplementation 'org.mockito:mockito-core:4.4.0'
  testImplementation 'org.hamcrest:hamcrest:2.2'
}
----
. In `src/main/java/org/example/`, create `HelloWorldTokenFilter.java`. This
file provides the code for a token filter that strips all tokens except for
"hello" and "world":
+
[source,java]
----
package org.example;

import org.apache.lucene.analysis.FilteringTokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

import java.util.Arrays;

public class HelloWorldTokenFilter extends FilteringTokenFilter {
    private final CharTermAttribute term = addAttribute(CharTermAttribute.class);

    public HelloWorldTokenFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean accept() {
        // Both accepted terms are exactly five characters long.
        if (term.length() != 5) return false;
        // The end index of the range is exclusive, so 0..5 compares all five
        // characters. The term buffer may be longer than the term itself.
        return Arrays.equals(term.buffer(), 0, 5, "hello".toCharArray(), 0, 5)
                || Arrays.equals(term.buffer(), 0, 5, "world".toCharArray(), 0, 5);
    }
}
----
. This filter can be provided to Elasticsearch using the following
`HelloWorldTokenFilterFactory.java` factory class. The `@NamedComponent`
annotation gives the filter the `hello_world` name. This is the name you use to
refer to the filter once the plugin has been deployed.
+
[source,java]
----
package org.example;

import org.apache.lucene.analysis.TokenStream;
import org.elasticsearch.plugin.analysis.TokenFilterFactory;
import org.elasticsearch.plugin.NamedComponent;

@NamedComponent(value = "hello_world")
public class HelloWorldTokenFilterFactory implements TokenFilterFactory {

    @Override
    public TokenStream create(TokenStream tokenStream) {
        return new HelloWorldTokenFilter(tokenStream);
    }

}
----
. Unit tests may go under the `src/test` directory. You will have to add
dependencies for your preferred testing framework.
. Run:
+
[source,sh]
----
gradle bundlePlugin
----
+
This builds the JAR file, generates the metadata files, and bundles them into a
plugin ZIP file. The resulting ZIP file will be written to the
`build/distributions` directory.
. <<plugin-management,Install the plugin>>.
. You can use the `_analyze` API to verify that the `hello_world` token filter
works as expected:
+
[source,console]
----
GET /_analyze
{
  "text": "hello to everyone except the world",
  "tokenizer": "standard",
  "filter": ["hello_world"]
}
----
// TEST[skip:would require this plugin to be installed]
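
To reason about what the `_analyze` request should return without a running
cluster, the filter's acceptance rule can be mimicked in plain Java. This is a
rough sketch, not the plugin code: the `HelloWorldAcceptSketch` class and its
methods are hypothetical, no Lucene classes are involved, and splitting on
whitespace only stands in for the `standard` tokenizer (an assumption that
happens to hold for this sample sentence).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, stdlib-only stand-in for the filter's accept() logic.
public class HelloWorldAcceptSketch {
    private static final char[] HELLO = "hello".toCharArray();
    private static final char[] WORLD = "world".toCharArray();

    // The buffer may be longer than the term, so only the first
    // `length` characters are meaningful.
    static boolean accept(char[] buffer, int length) {
        if (length != 5) {
            return false;
        }
        // The end index of the range form of Arrays.equals is exclusive,
        // so 0..5 covers all five characters.
        return Arrays.equals(buffer, 0, 5, HELLO, 0, 5)
                || Arrays.equals(buffer, 0, 5, WORLD, 0, 5);
    }

    // Assumption: whitespace splitting approximates the tokenizer here.
    static List<String> filter(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            // Simulate an oversized, reusable term buffer.
            char[] buffer = Arrays.copyOf(token.toCharArray(), Math.max(16, token.length()));
            if (accept(buffer, token.length())) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filter("hello to everyone except the world"));
    }
}
```

Running `main` prints `[hello, world]`, which corresponds to the two tokens the
request is expected to return. A five-character near miss such as `hells` is
rejected because the comparison covers all five characters.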

[discrete]
=== YAML REST tests

If you are using the `elasticsearch.stable-esplugin` plugin for Gradle, you can
use {es}'s YAML REST test framework. This framework allows you to load your
plugin in a running test cluster and issue real REST API queries against it. The
full syntax for this framework is beyond the scope of this tutorial, but there
are many examples in the {es} repository. For one of them, refer to the
{es-repo}tree/main/plugins/examples/stable-analysis[example analysis plugin] in
the {es} GitHub repository.

. Create a `yamlRestTest` directory in the `src` directory.
. Under the `yamlRestTest` directory, create a `java` folder for Java sources
and a `resources` folder.
. In `src/yamlRestTest/java/org/example/`, create
`HelloWorldPluginClientYamlTestSuiteIT.java`. This class extends
`ESClientYamlSuiteTestCase`.
+
[source,java]
----
package org.example;

import com.carrotsearch.randomizedtesting.annotations.Name;
import com.carrotsearch.randomizedtesting.annotations.ParametersFactory;
import org.elasticsearch.test.rest.yaml.ClientYamlTestCandidate;
import org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase;

public class HelloWorldPluginClientYamlTestSuiteIT extends ESClientYamlSuiteTestCase {

    public HelloWorldPluginClientYamlTestSuiteIT(
            @Name("yaml") ClientYamlTestCandidate testCandidate
    ) {
        super(testCandidate);
    }

    // Supplies one test candidate per YAML test found on the test classpath.
    @ParametersFactory
    public static Iterable<Object[]> parameters() throws Exception {
        return ESClientYamlSuiteTestCase.createParameters();
    }
}
----
. In `src/yamlRestTest/resources/rest-api-spec/test/plugin`, create the
`10_token_filter.yml` YAML file:
+
[source,yaml]
----
## Sample rest test
---
"Hello world plugin test - removes all tokens except hello and world":
  - do:
      indices.analyze:
        body:
          text: hello to everyone except the world
          tokenizer: standard
          filter:
            - type: "hello_world"
  - length: { tokens: 2 }
  - match: { tokens.0.token: "hello" }
  - match: { tokens.1.token: "world" }
----
. Run the test with:
+
[source,sh]
----
gradle yamlRestTest
----
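
Both the YAML test and the `_analyze` request refer to the filter by the
`hello_world` name declared with `@NamedComponent`. As a loose illustration of
annotation-based naming in general, the following stdlib-only sketch builds a
name-to-class index from an annotation and instantiates a factory by name.
Everything in it is hypothetical: `Named`, `Factory`, and `NamedLookupSketch`
are made-up stand-ins, and {es}'s own discovery mechanism is not described in
this guide and may work quite differently.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NamedLookupSketch {
    // Hypothetical stand-in for a naming annotation; not the Elasticsearch one.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Named {
        String value();
    }

    // Hypothetical factory interface, reduced to a single method.
    interface Factory {
        String describe();
    }

    @Named("hello_world")
    static class HelloWorldFactory implements Factory {
        @Override
        public String describe() {
            return "keeps only hello and world";
        }
    }

    // Build a name -> class index from the annotation on each candidate class.
    static Map<String, Class<? extends Factory>> index(List<Class<? extends Factory>> candidates) {
        Map<String, Class<? extends Factory>> byName = new HashMap<>();
        for (Class<? extends Factory> candidate : candidates) {
            Named named = candidate.getAnnotation(Named.class);
            if (named != null) {
                byName.put(named.value(), candidate);
            }
        }
        return byName;
    }

    // Look up a factory class by its declared name and instantiate it.
    static Factory create(String name) {
        Class<? extends Factory> type = index(List.of(HelloWorldFactory.class)).get(name);
        if (type == null) {
            throw new IllegalArgumentException("unknown component: " + name);
        }
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(create("hello_world").describe());
    }
}
```

In this sketch, asking for a name that no class declares fails with an
`IllegalArgumentException`, so a mismatch between the annotation value and the
requested name surfaces immediately.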