[DOCS] Reformat trim token filter docs (#51649)

Makes the following changes to the `trim` token filter docs:

* Updates description
* Adds a link to the related Lucene filter
* Adds tip about removing whitespace using tokenizers
* Adds detailed analyze snippets
* Adds custom analyzer snippet

James Rodewig committed 996ec0def7 5 years ago

1 changed file with 104 additions and 1 deletion:
docs/reference/analysis/tokenfilters/trim-tokenfilter.asciidoc (+104 -1)

@@ -4,4 +4,107 @@
 <titleabbrev>Trim</titleabbrev>
 ++++
 
-The `trim` token filter trims the whitespace surrounding a token.
+Removes leading and trailing whitespace from each token in a stream.
+
+The `trim` filter uses Lucene's
+https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html[TrimFilter].
+
+[TIP]
+====
+Many commonly used tokenizers, such as the
+<<analysis-standard-tokenizer,`standard`>> or
+<<analysis-whitespace-tokenizer,`whitespace`>> tokenizer, remove whitespace by
+default. When using these tokenizers, you don't need to add a separate `trim`
+filter.
+====
+
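+For example, an <<indices-analyze,analyze API>> request like the following
+should show that the `standard` tokenizer already strips the surrounding
+whitespace from `" fox "`, returning a `fox` token with no `trim` filter
+needed:
+
+[source,console]
+----
+GET _analyze
+{
+  "tokenizer" : "standard",
+  "text" : " fox "
+}
+----
+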
+[[analysis-trim-tokenfilter-analyze-ex]]
+==== Example
+
+To see how the `trim` filter works, you first need to produce a token
+containing whitespace.
+
+The following <<indices-analyze,analyze API>> request uses the
+<<analysis-keyword-tokenizer,`keyword`>> tokenizer to produce a token for 
+`" fox "`.
+
+[source,console]
+----
+GET _analyze
+{
+  "tokenizer" : "keyword",
+  "text" : " fox "
+}
+----
+
+The API returns the following response. Note that the `" fox "` token
+retains the whitespace from the original text.
+
+[source,console-result]
+----
+{
+  "tokens": [
+    {
+      "token": " fox ",
+      "start_offset": 0,
+      "end_offset": 5,
+      "type": "word",
+      "position": 0
+    }
+  ]
+}
+----
+
+To remove the whitespace, add the `trim` filter to the previous analyze API
+request.
+
+[source,console]
+----
+GET _analyze
+{
+  "tokenizer" : "keyword",
+  "filter" : ["trim"],
+  "text" : " fox "
+}
+----
+
+The API returns the following response. The returned `fox` token does not
+include any leading or trailing whitespace.
+
+[source,console-result]
+----
+{
+  "tokens": [
+    {
+      "token": "fox",
+      "start_offset": 0,
+      "end_offset": 5,
+      "type": "word",
+      "position": 0
+    }
+  ]
+}
+----
+
+[[analysis-trim-tokenfilter-analyzer-ex]]
+==== Add to an analyzer
+
+The following <<indices-create-index,create index API>> request uses the `trim`
+filter to configure a new <<analysis-custom-analyzer,custom analyzer>>.
+
+[source,console]
+----
+PUT trim_example
+{
+  "settings": {
+    "analysis": {
+      "analyzer": {
+        "keyword_trim": {
+          "tokenizer": "keyword",
+          "filter": [ "trim" ]
+        }
+      }
+    }
+  }
+}
+----
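+
+To test the new analyzer, you can reference it by name in an analyze API
+request against the index. A minimal check, assuming the `trim_example`
+index above has been created, might look like this and should return a
+single `fox` token with the surrounding whitespace removed:
+
+[source,console]
+----
+GET trim_example/_analyze
+{
+  "analyzer": "keyword_trim",
+  "text": " fox "
+}
+----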