[DOCS] Reformat unique token filter docs (#50748)

* Updates the description
* Adds analyze, custom analyzer, and custom filter snippets
* Adds parameter documentation
James Rodewig 5 years ago
parent commit c99a0e9a5e

1 changed file with 144 additions and 4 deletions:
docs/reference/analysis/tokenfilters/unique-tokenfilter.asciidoc

@@ -4,7 +4,147 @@
 <titleabbrev>Unique</titleabbrev>
 ++++
 
-The `unique` token filter can be used to only index unique tokens during
-analysis. By default it is applied on all the token stream. If
-`only_on_same_position` is set to `true`, it will only remove duplicate
-tokens on the same position.
+Removes duplicate tokens from a stream. For example, you can use the `unique`
+filter to change `the lazy lazy dog` to `the lazy dog`.
+
+If the `only_on_same_position` parameter is set to `true`, the `unique` filter
+removes only duplicate tokens _in the same position_.
+
+[NOTE]
+====
+When `only_on_same_position` is `true`, the `unique` filter works the same as
+the <<analysis-remove-duplicates-tokenfilter,`remove_duplicates`>> filter.
+====
+
+[[analysis-unique-tokenfilter-analyze-ex]]
+==== Example
+
+The following <<indices-analyze,analyze API>> request uses the `unique` filter
+to remove duplicate tokens from `the quick fox jumps the lazy fox`:
+
+[source,console]
+--------------------------------------------------
+GET _analyze
+{
+  "tokenizer" : "whitespace",
+  "filter" : ["unique"],
+  "text" : "the quick fox jumps the lazy fox"
+}
+--------------------------------------------------
+
+The filter removes the duplicate `the` and `fox` tokens, producing the
+following output:
+
+[source,text]
+--------------------------------------------------
+[ the, quick, fox, jumps, lazy ]
+--------------------------------------------------
+
+/////////////////////
+[source,console-result]
+--------------------------------------------------
+{
+  "tokens" : [
+    {
+      "token" : "the",
+      "start_offset" : 0,
+      "end_offset" : 3,
+      "type" : "word",
+      "position" : 0
+    },
+    {
+      "token" : "quick",
+      "start_offset" : 4,
+      "end_offset" : 9,
+      "type" : "word",
+      "position" : 1
+    },
+    {
+      "token" : "fox",
+      "start_offset" : 10,
+      "end_offset" : 13,
+      "type" : "word",
+      "position" : 2
+    },
+    {
+      "token" : "jumps",
+      "start_offset" : 14,
+      "end_offset" : 19,
+      "type" : "word",
+      "position" : 3
+    },
+    {
+      "token" : "lazy",
+      "start_offset" : 24,
+      "end_offset" : 28,
+      "type" : "word",
+      "position" : 4
+    }
+  ]
+}
+--------------------------------------------------
+/////////////////////
+
+[[analysis-unique-tokenfilter-analyzer-ex]]
+==== Add to an analyzer
+
+The following <<indices-create-index,create index API>> request uses the
+`unique` filter to configure a new <<analysis-custom-analyzer,custom analyzer>>.
+
+[source,console]
+--------------------------------------------------
+PUT custom_unique_example
+{
+  "settings" : {
+    "analysis" : {
+      "analyzer" : {
+        "standard_truncate" : {
+        "tokenizer" : "standard",
+        "filter" : ["unique"]
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
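+
+You can test the new analyzer with an <<indices-analyze,analyze API>> request
+against the index. The following request is an illustrative sketch, assuming
+the `custom_unique_example` index and `standard_unique` analyzer created above:
+
+[source,console]
+--------------------------------------------------
+GET custom_unique_example/_analyze
+{
+  "analyzer" : "standard_unique",
+  "text" : "the lazy lazy dog"
+}
+--------------------------------------------------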
+
+[[analysis-unique-tokenfilter-configure-parms]]
+==== Configurable parameters
+
+`only_on_same_position`::
+(Optional, boolean)
+If `true`, only remove duplicate tokens in the same position.
+Defaults to `false`.
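+
+To see this parameter in action, you can pair the `unique` filter with the
+built-in `keyword_repeat` filter, which emits each token twice in the same
+position. The following request is a minimal sketch; it defines the `unique`
+filter inline with `only_on_same_position` set to `true`, so the repeated
+tokens are removed:
+
+[source,console]
+--------------------------------------------------
+GET _analyze
+{
+  "tokenizer" : "whitespace",
+  "filter" : [
+    "keyword_repeat",
+    {
+      "type" : "unique",
+      "only_on_same_position" : true
+    }
+  ],
+  "text" : "jumping foxes"
+}
+--------------------------------------------------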
+
+[[analysis-unique-tokenfilter-customize]]
+==== Customize
+
+To customize the `unique` filter, duplicate it to create the basis
+for a new custom token filter. You can modify the filter using its configurable
+parameters.
+
+For example, the following request creates a custom `unique` filter with
+`only_on_same_position` set to `true`.
+
+[source,console]
+--------------------------------------------------
+PUT letter_unique_pos_example
+{
+  "settings": {
+    "analysis": {
+      "analyzer": {
+        "letter_unique_pos": {
+          "tokenizer": "letter",
+          "filter": [ "unique_pos" ]
+        }
+      },
+      "filter": {
+        "unique_pos": {
+          "type": "unique",
+          "only_on_same_position": true
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
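+
+As a quick check (a hypothetical request against the index created above),
+analyzing text with this analyzer shows that duplicate tokens in _different_
+positions are left in place. For example, both `lazy` tokens below remain
+because they occupy separate positions:
+
+[source,console]
+--------------------------------------------------
+GET letter_unique_pos_example/_analyze
+{
+  "analyzer" : "letter_unique_pos",
+  "text" : "the lazy lazy dog"
+}
+--------------------------------------------------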