12 years ago · ea05f4538c
--- a/docs/reference/analysis/icu-plugin.asciidoc
+++ b/docs/reference/analysis/icu-plugin.asciidoc
@@ -39,7 +39,7 @@ Here is a sample settings:
 
				 === ICU Folding
			
 
				 
			
 
				 Folding of unicode characters based on `UTR#30`. It registers itself
			
 
				-under `icu_folding` and `icuFolding` names.  
			
 
				+under `icu_folding` and `icuFolding` names.
			
 
				 The filter also does lowercasing, which means the lowercase filter can
			
 
				 normally be left out. Sample setting:
			
 
				 
			
@@ -70,7 +70,7 @@ primary letters in a specific language is wanted. See syntax for the
 
				 UnicodeSet
			
 
				 http://icu-project.org/apiref/icu4j/com/ibm/icu/text/UnicodeSet.html[here].
			
 
				 
			
 
				-The Following example excempt Swedish characters from the folding. Note
			
 
				+The following example exempts Swedish characters from folding. Note
			
 
				 that the filtered characters are NOT lowercased which is why we add that
			
 
				 filter below.
			
 
				 
			
@@ -148,5 +148,73 @@ And here is a sample of custom collation:
 
				             }
			
 
				         }
			
 
				     }
			
 
				-}    
			
 
				+}
			
 
				 --------------------------------------------------
			
 
				+
			
 
				+[float]
			
 
				+==== Options
			
 
				+
			
 
				+[horizontal]
			
 
				+`strength`::
			
 
				+    The strength property determines the minimum level of difference considered significant during comparison.
			
 
				+     The default strength for the Collator is `tertiary`, unless specified otherwise by the locale used to create the Collator.
			
 
				+     Possible values: `primary`, `secondary`, `tertiary`, `quaternary` or `identical`.
			
 
				+ +
			
 
				+ See http://icu-project.org/apiref/icu4j/com/ibm/icu/text/Collator.html[ICU Collation] documentation for a more detailed
			
 
				+ explanation of the specific values.
			
 
				+
			
 
				+`decomposition`::
			
 
				+    Possible values: `no` or `canonical`. Defaults to `no`. Setting this decomposition property to
			
 
				+    `canonical` allows the Collator to handle un-normalized text properly, producing the same results as if the text were
			
 
				+    normalized. If `no` is set, it is the user's responsibility to ensure that all text is already in the appropriate form
			
 
				+    before a comparison or before getting a CollationKey. Adjusting decomposition mode allows the user to select between
			
 
				+    faster and more complete collation behavior. Since a great many of the world's languages do not require text
			
 
				+    normalization, most locales set `no` as the default decomposition mode.
			
 
				+
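				+For illustration only, a collation filter that sets both options might look like the sketch below. It assumes the
				+`icu_collation` filter type and the `language` setting used in the collation samples above; the filter name
				+`exampleCollator` is hypothetical.
				+
				+[source,js]
				+--------------------------------------------------
				+{
				+    "index" : {
				+        "analysis" : {
				+            "filter" : {
				+                "exampleCollator" : {
				+                    "type" : "icu_collation",
				+                    "language" : "en",
				+                    "strength" : "primary",
				+                    "decomposition" : "canonical"
				+                }
				+            }
				+        }
				+    }
				+}
				+--------------------------------------------------
				+
				+With `strength` set to `primary`, only base-letter differences are treated as significant, and `canonical`
				+decomposition lets the Collator accept text that has not been normalized beforehand.
				+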
			
 
				+[float]
			
 
				+==== Expert options
			
 
				+
			
 
				+[horizontal]
			
 
				+`alternate`::
			
 
				+     Possible values: `shifted` or `non-ignorable`. Sets the alternate handling for strength `quaternary`
			
 
				+     to be either shifted or non-ignorable. In practice, this controls whether punctuation and whitespace are ignored.
			
 
				+
			
 
				+`caseLevel`::
			
 
				+    Possible values: `true` or `false`. Default is `false`. Whether case level sorting is required. When
			
 
				+     strength is set to `primary`, accent differences are ignored while case is still considered.
			
 
				+
			
 
				+`caseFirst`::
			
 
				+    Possible values: `lower` or `upper`. Useful to control which case is sorted first when case is not ignored
			
 
				+    for strength `tertiary`.
			
 
				+
			
 
				+`numeric`::
			
 
				+    Possible values: `true` or `false`. Whether digits are sorted according to numeric representation. For
			
 
				+    example, with `numeric` set to `true` the value `egg-9` is sorted before the value `egg-21`. Defaults to `false`.
			
 
				+
			
 
				+`variableTop`::
			
 
				+    Single character or contraction. Controls what is variable for `alternate`.
			
 
				+
			
 
				+`hiraganaQuaternaryMode`::
			
 
				+    Possible values: `true` or `false`. Defaults to `false`. Whether to distinguish between Katakana and
				+    Hiragana characters at `quaternary` strength.
			
 
				+
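				+As a hedged sketch, the expert options can be set on a collation filter in the same way as the basic ones. The
				+example assumes the `icu_collation` filter type and the `language` setting from the collation samples above; the
				+filter name `exampleNumericCollator` is hypothetical.
				+
				+[source,js]
				+--------------------------------------------------
				+{
				+    "index" : {
				+        "analysis" : {
				+            "filter" : {
				+                "exampleNumericCollator" : {
				+                    "type" : "icu_collation",
				+                    "language" : "en",
				+                    "numeric" : true,
				+                    "alternate" : "shifted"
				+                }
				+            }
				+        }
				+    }
				+}
				+--------------------------------------------------
				+
				+Here `numeric` makes digit sequences compare by numeric value, so `egg-9` sorts before `egg-21`, and `alternate`
				+selects `shifted` handling for punctuation and whitespace.
				+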
			
 
				+[float]
			
 
				+=== ICU Tokenizer
			
 
				+
			
 
				+Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
			
 
				+
			
 
				+[source,js]
			
 
				+--------------------------------------------------
			
 
				+{
			
 
				+    "index" : {
			
 
				+        "analysis" : {
			
 
				+            "analyzer" : {
			
 
				+                "collation" : {
			
 
				+                    "tokenizer" : "icu_tokenizer"
			
 
				+                }
			
 
				+            }
			
 
				+        }
			
 
				+    }
			
 
				+}
			
 
				+--------------------------------------------------
			
 
				+