@@ -48,7 +48,7 @@ convert `nfc` to `nfd` or `nfkc` to `nfkd` respectively:
 Here are two examples, the default usage and a customised character filter:

-[source,json]
+[source,js]
 --------------------------------------------------
 PUT icu_sample
 {
@@ -96,7 +96,7 @@ but adds better support for some Asian languages by using a dictionary-based
 approach to identify words in Thai, Lao, Chinese, Japanese, and Korean, and
 using custom rules to break Myanmar and Khmer text into syllables.

-[source,json]
+[source,js]
 --------------------------------------------------
 PUT icu_sample
 {
@@ -137,7 +137,7 @@ As a demonstration of how the rule files can be used, save the following user fi

 Then create an analyzer to use this rule file as follows:

-[source,json]
+[source,js]
 --------------------------------------------------
 PUT icu_sample
 {
@@ -167,7 +167,7 @@ POST icu_sample/_analyze?analyzer=my_analyzer&text=Elasticsearch. Wow!

 The above `analyze` request returns the following:

-[source,json]
+[source,js]
 --------------------------------------------------
 # Result
 {
@@ -198,7 +198,7 @@ You should probably prefer the <<analysis-icu-normalization-charfilter,Normaliza

 Here are two examples, the default usage and a customised token filter:

-[source,json]
+[source,js]
 --------------------------------------------------
 PUT icu_sample
 {
@@ -244,7 +244,7 @@ Case folding of Unicode characters based on `UTR#30`, like the
 on steroids. It registers itself as the `icu_folding` token filter and is
 available to all indices:

-[source,json]
+[source,js]
 --------------------------------------------------
 PUT icu_sample
 {
@@ -278,7 +278,7 @@ to note that both upper and lowercase forms should be specified, and that
 these filtered character are not lowercased which is why we add the
 `lowercase` filter as well:

-[source,json]
+[source,js]
 --------------------------------------------------
 PUT icu_sample
 {
@@ -319,7 +319,7 @@ which is a best-effort attempt at language-neutral sorting.
 Below is an example of how to set up a field for sorting German names in
 ``phonebook'' order:

-[source,json]
+[source,js]
 --------------------------------------------------
 PUT /my_index
 {
@@ -452,7 +452,7 @@ rulesets are not yet supported.

 For example:

-[source,json]
+[source,js]
 --------------------------------------------------
 PUT icu_sample
 {