
Docs: Updated stop token filter docs

Clinton Gormley 11 years ago
commit cf059378d1
1 changed file with 69 additions and 26 deletions

+ 69 - 26
docs/reference/analysis/tokenfilters/stop-tokenfilter.asciidoc

@@ -7,29 +7,72 @@ streams.
 The following are settings that can be set for a `stop` token filter
 type:
 
-[cols="<,<",options="header",]
-|=======================================================================
-|Setting |Description
-|`stopwords` |A list of stop words to use. Defaults to english stop
-words.
-
-|`stopwords_path` |A path (either relative to `config` location, or
-absolute) to a stopwords file configuration. Each stop word should be in
-its own "line" (separated by a line break). The file must be UTF-8
-encoded.
-
-|`ignore_case` |Set to `true` to lower case all words first. Defaults to
-`false`.
-
-|`remove_trailing` |Set to `false` in order to not ignore the last term of
-a search if it is a stop word. This is very useful for the completion
-suggester as a query like `green a` can be extended to `green apple` even
-though you remove stop words in general. Defaults to `true`.
-|=======================================================================
-
-The `stopwords` setting allows for custom language-specific expansion of
-the default stopwords. It follows the `_lang_` notation and supports: arabic,
-armenian, basque, brazilian, bulgarian, catalan, czech, danish, dutch,
-english, finnish, french, galician, german, greek, hindi, hungarian,
-indonesian, italian, norwegian, persian, portuguese, romanian, russian,
-spanish, swedish, turkish.
+[horizontal]
+`stopwords`::
+
+    A list of stop words to use. Defaults to `_english_` stop words.
+
+`stopwords_path`::
+
+    A path (either relative to the `config` location, or absolute) to a
+    stopwords file. Each stop word should be on its own line (separated by
+    a line break). The file must be UTF-8 encoded.
+
+`ignore_case`::
+
+    Set to `true` to lowercase all words first. Defaults to `false`.
+
+`remove_trailing`::
+
+    Set to `false` to keep the last term of a search even if it is a stop
+    word. This is useful for the completion suggester, as a query like
+    `green a` can then be extended to `green apple` even though stop words
+    are removed in general. Defaults to `true`.
+
+The `stopwords` parameter accepts either an array of stopwords:
+
+[source,json]
+------------------------------------
+PUT /my_index
+{
+    "settings": {
+        "analysis": {
+            "filter": {
+                "my_stop": {
+                    "type":       "stop",
+                    "stopwords": ["and", "is", "the"]
+                }
+            }
+        }
+    }
+}
+------------------------------------
+
+or a predefined language-specific list:
+
+[source,json]
+------------------------------------
+PUT /my_index
+{
+    "settings": {
+        "analysis": {
+            "filter": {
+                "my_stop": {
+                    "type":       "stop",
+                    "stopwords":  "_english_"
+                }
+            }
+        }
+    }
+}
+------------------------------------
+
+Elasticsearch provides predefined stop word lists for the following languages:
+
+`_arabic_`, `_armenian_`, `_basque_`, `_brazilian_`, `_bulgarian_`,
+`_catalan_`, `_czech_`, `_danish_`, `_dutch_`, `_english_`, `_finnish_`,
+`_french_`, `_galician_`, `_german_`, `_greek_`, `_hindi_`, `_hungarian_`,
+`_indonesian_`, `_italian_`, `_norwegian_`, `_persian_`, `_portuguese_`,
+`_romanian_`, `_russian_`, `_spanish_`, `_swedish_`, `_turkish_`.
+
+To use an empty stopwords list (that is, to disable stop word removal), use `_none_`.
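The `stopwords_path`, `ignore_case`, and `remove_trailing` settings described in the diff are configured in the same way as the `stopwords` examples. The following request is an illustrative sketch, not part of the commit: the index name `my_index`, the filter names `my_file_stop` and `my_suggest_stop`, and the file `analysis/stopwords.txt` are hypothetical. The file is assumed to exist relative to the `config` directory, with one UTF-8 encoded stop word per line.

[source,json]
------------------------------------
PUT /my_index
{
    "settings": {
        "analysis": {
            "filter": {
                "my_file_stop": {
                    "type":            "stop",
                    "stopwords_path":  "analysis/stopwords.txt",
                    "ignore_case":     true
                },
                "my_suggest_stop": {
                    "type":            "stop",
                    "stopwords":       "_english_",
                    "remove_trailing": false
                }
            }
        }
    }
}
------------------------------------

In this sketch, `my_file_stop` reads its stop words from the file and lowercases words first, while `my_suggest_stop` uses the predefined English list but keeps a trailing stop word, so a completion query such as `green a` still ends in `a`.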