|
@@ -4,4 +4,107 @@
|
|
|
<titleabbrev>Trim</titleabbrev>
|
|
|
++++
|
|
|
|
|
|
-The `trim` token filter trims the whitespace surrounding a token.
|
|
|
+Removes leading and trailing whitespace from each token in a stream.
|
|
|
+
|
|
|
+The `trim` filter uses Lucene's
|
|
|
+https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html[TrimFilter].
|
|
|
+
|
|
|
+[TIP]
|
|
|
+====
|
|
|
+Many commonly used tokenizers, such as the
|
|
|
+<<analysis-standard-tokenizer,`standard`>> or
|
|
|
+<<analysis-whitespace-tokenizer,`whitespace`>> tokenizer, remove whitespace by
|
|
|
+default. When using these tokenizers, you don't need to add a separate `trim`
|
|
|
+filter.
|
|
|
+====
|
|
|
+
|
|
|
+[[analysis-trim-tokenfilter-analyze-ex]]
|
|
|
+==== Example
|
|
|
+
|
|
|
+To see how the `trim` filter works, you first need to produce a token
|
|
|
+containing whitespace.
|
|
|
+
|
|
|
+The following <<indices-analyze,analyze API>> request uses the
|
|
|
+<<analysis-keyword-tokenizer,`keyword`>> tokenizer to produce a token for
|
|
|
+`" fox "`.
|
|
|
+
|
|
|
+[source,console]
|
|
|
+----
|
|
|
+GET _analyze
|
|
|
+{
|
|
|
+ "tokenizer" : "keyword",
|
|
|
+ "text" : " fox "
|
|
|
+}
|
|
|
+----
|
|
|
+
|
|
|
+The API returns the following response. Note the `" fox "` token contains
|
|
|
+the original text's whitespace.
|
|
|
+
|
|
|
+[source,console-result]
|
|
|
+----
|
|
|
+{
|
|
|
+ "tokens": [
|
|
|
+ {
|
|
|
+ "token": " fox ",
|
|
|
+ "start_offset": 0,
|
|
|
+ "end_offset": 5,
|
|
|
+ "type": "word",
|
|
|
+ "position": 0
|
|
|
+ }
|
|
|
+ ]
|
|
|
+}
|
|
|
+----
|
|
|
+
|
|
|
+To remove the whitespace, add the `trim` filter to the previous analyze API
|
|
|
+request.
|
|
|
+
|
|
|
+[source,console]
|
|
|
+----
|
|
|
+GET _analyze
|
|
|
+{
|
|
|
+ "tokenizer" : "keyword",
|
|
|
+ "filter" : ["trim"],
|
|
|
+ "text" : " fox "
|
|
|
+}
|
|
|
+----
|
|
|
+
|
|
|
+The API returns the following response. The returned `fox` token does not
|
|
|
+include any leading or trailing whitespace.
|
|
|
+
|
|
|
+[source,console-result]
|
|
|
+----
|
|
|
+{
|
|
|
+ "tokens": [
|
|
|
+ {
|
|
|
+ "token": "fox",
|
|
|
+ "start_offset": 0,
|
|
|
+ "end_offset": 5,
|
|
|
+ "type": "word",
|
|
|
+ "position": 0
|
|
|
+ }
|
|
|
+ ]
|
|
|
+}
|
|
|
+----
|
|
|
+
|
|
|
+[[analysis-trim-tokenfilter-analyzer-ex]]
|
|
|
+==== Add to an analyzer
|
|
|
+
|
|
|
+The following <<indices-create-index,create index API>> request uses the `trim`
|
|
|
+filter to configure a new <<analysis-custom-analyzer,custom analyzer>>.
|
|
|
+
|
|
|
+[source,console]
|
|
|
+----
|
|
|
+PUT trim_example
|
|
|
+{
|
|
|
+ "settings": {
|
|
|
+ "analysis": {
|
|
|
+ "analyzer": {
|
|
|
+ "keyword_trim": {
|
|
|
+ "tokenizer": "keyword",
|
|
|
+ "filter": [ "trim" ]
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+----
|