@@ -218,3 +218,29 @@ Breaks text into words according to UAX #29: Unicode Text Segmentation ((http://
}
--------------------------------------------------
+
+[float]
+=== ICU Normalization CharFilter
+
+Normalizes characters as explained http://userguide.icu-project.org/transforms/normalization[here].
+It registers itself by default under `icu_normalizer` or `icuNormalizer` using the default settings.
+The `name` parameter selects the normalization form and accepts one of `nfc`, `nfkc`, or `nfkc_cf`.
+The `mode` parameter accepts either `compose` or `decompose`.
+Use `decompose` with `nfc` or `nfkc` to get `nfd` or `nfkd`, respectively.
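To make the `name`/`mode` combinations above concrete, here is a minimal sketch in plain JavaScript. It does not call the plugin or ICU; it assumes the four documented combinations correspond to the standard Unicode forms NFC, NFD, NFKC and NFKD, and reproduces them with the built-in `String.prototype.normalize`. The `FORMS` map and `normalizeLike` helper are hypothetical names for illustration only, and `nfkc_cf` is left out because JavaScript has no built-in case-folding normalization form. Results could also differ slightly if the JavaScript engine ships a different Unicode version than the ICU build used by the plugin.

```javascript
// Hypothetical helper, not part of Elasticsearch or ICU: it maps the char
// filter's documented `name`/`mode` combinations onto the normalization
// forms built into JavaScript's String.prototype.normalize().
const FORMS = new Map([
  ["nfc/compose", "NFC"],
  ["nfc/decompose", "NFD"],
  ["nfkc/compose", "NFKC"],
  ["nfkc/decompose", "NFKD"],
]);

function normalizeLike(name, mode, text) {
  const form = FORMS.get(`${name}/${mode}`);
  if (form === undefined) {
    // Unsupported combinations, including nfkc_cf (NFKC plus case folding),
    // have no built-in JavaScript form, so this sketch rejects them.
    throw new Error(`no built-in form for name=${name}, mode=${mode}`);
  }
  return text.normalize(form);
}

// U+FB01 LATIN SMALL LIGATURE FI survives canonical composition (nfc)...
console.log(normalizeLike("nfc", "compose", "\ufb01"));  // prints ﬁ
// ...but compatibility normalization (nfkc) expands it to two letters.
console.log(normalizeLike("nfkc", "compose", "\ufb01")); // prints fi
// decompose splits precomposed U+00E9 into "e" + U+0301 COMBINING ACUTE ACCENT.
console.log(normalizeLike("nfc", "decompose", "\u00e9").length); // prints 2
```

The difference between the `nfc` and `nfkc` lines is the practical reason to choose one over the other: compatibility forms fold ligatures, full-width letters and similar variants into their plain equivalents, which usually helps matching at the cost of discarding some formatting distinctions.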
+Here is a sample configuration:
+
+[source,js]
+--------------------------------------------------
+{
+    "index" : {
+        "analysis" : {
+            "analyzer" : {
+                "collation" : {
+                    "tokenizer" : "keyword",
+                    "char_filter" : ["icu_normalizer"]
+                }
+            }
+        }
+    }
+}
+--------------------------------------------------
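The sample above relies on the default settings. To pass `name` and `mode` explicitly, a custom char filter can presumably be declared under `analysis.char_filter` and then referenced by name from an analyzer. The fragment below is a sketch under two assumptions: that the char filter type is `icu_normalizer` (one of the registration names given above), and that this plugin version reads `name` and `mode` from a custom definition. The names `nfd_normalizer` and `nfd_keyword` are hypothetical, chosen only for illustration. Per the note above, `nfc` combined with `decompose` should yield NFD.

```js
{
    "index" : {
        "analysis" : {
            "char_filter" : {
                "nfd_normalizer" : {
                    "type" : "icu_normalizer",
                    "name" : "nfc",
                    "mode" : "decompose"
                }
            },
            "analyzer" : {
                "nfd_keyword" : {
                    "tokenizer" : "keyword",
                    "char_filter" : ["nfd_normalizer"]
                }
            }
        }
    }
}
```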