- [[analysis-lang-analyzer]]
- === Language Analyzers
- A set of analyzers aimed at analyzing text in a specific language. The
- following types are supported:
- <<arabic-analyzer,`arabic`>>,
- <<armenian-analyzer,`armenian`>>,
- <<basque-analyzer,`basque`>>,
- <<brazilian-analyzer,`brazilian`>>,
- <<bulgarian-analyzer,`bulgarian`>>,
- <<catalan-analyzer,`catalan`>>,
- <<cjk-analyzer,`cjk`>>,
- <<czech-analyzer,`czech`>>,
- <<danish-analyzer,`danish`>>,
- <<dutch-analyzer,`dutch`>>,
- <<english-analyzer,`english`>>,
- <<finnish-analyzer,`finnish`>>,
- <<french-analyzer,`french`>>,
- <<galician-analyzer,`galician`>>,
- <<german-analyzer,`german`>>,
- <<greek-analyzer,`greek`>>,
- <<hindi-analyzer,`hindi`>>,
- <<hungarian-analyzer,`hungarian`>>,
- <<indonesian-analyzer,`indonesian`>>,
- <<irish-analyzer,`irish`>>,
- <<italian-analyzer,`italian`>>,
- <<latvian-analyzer,`latvian`>>,
- <<lithuanian-analyzer,`lithuanian`>>,
- <<norwegian-analyzer,`norwegian`>>,
- <<persian-analyzer,`persian`>>,
- <<portuguese-analyzer,`portuguese`>>,
- <<romanian-analyzer,`romanian`>>,
- <<russian-analyzer,`russian`>>,
- <<sorani-analyzer,`sorani`>>,
- <<spanish-analyzer,`spanish`>>,
- <<swedish-analyzer,`swedish`>>,
- <<turkish-analyzer,`turkish`>>,
- <<thai-analyzer,`thai`>>.
- ==== Configuring language analyzers
- ===== Stopwords
- All analyzers support setting custom `stopwords`, either inline in the
- configuration or from an external stopwords file specified with
- `stopwords_path`. See <<analysis-stop-analyzer,Stop Analyzer>> for
- more details.
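- For example, the following minimal sketch overrides the stopwords of the built-in
- `english` analyzer with an inline list. The index name `stopwords_example`, the
- analyzer name `my_english` and the word list are illustrative placeholders. To load
- the list from a file instead, replace `stopwords` with `stopwords_path`. The path
- is typically given relative to the Elasticsearch `config` directory, and the file
- must be present on every node.
- [source,js]
- ----------------------------------------------------
- PUT /stopwords_example
- {
-   "settings": {
-     "analysis": {
-       "analyzer": {
-         "my_english": {
-           "type": "english",
-           "stopwords": ["a", "an", "and", "the"]
-         }
-       }
-     }
-   }
- }
- ----------------------------------------------------
- // CONSOLE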
- ===== Excluding words from stemming
- The `stem_exclusion` parameter allows you to specify an array
- of lowercase words that should not be stemmed. Internally, this
- functionality is implemented by adding the
- <<analysis-keyword-marker-tokenfilter,`keyword_marker` token filter>>
- with the `keywords` set to the value of the `stem_exclusion` parameter.
- The following analyzers support setting a custom `stem_exclusion` list:
- `arabic`, `armenian`, `basque`, `bulgarian`, `catalan`, `czech`,
- `dutch`, `english`, `finnish`, `french`, `galician`,
- `german`, `hindi`, `hungarian`, `indonesian`, `irish`, `italian`, `latvian`,
- `lithuanian`, `norwegian`, `portuguese`, `romanian`, `russian`, `sorani`,
- `spanish`, `swedish`, `turkish`.
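- As an illustration, the sketch below keeps `organization` and `organizations`
- unstemmed in the built-in `english` analyzer. It then inspects the result with the
- `_analyze` API, where words outside the list are still stemmed as usual. The index
- name, analyzer name and sample text are placeholders.
- [source,js]
- ----------------------------------------------------
- PUT /stem_exclusion_example
- {
-   "settings": {
-     "analysis": {
-       "analyzer": {
-         "my_english": {
-           "type": "english",
-           "stem_exclusion": ["organization", "organizations"]
-         }
-       }
-     }
-   }
- }
- GET /stem_exclusion_example/_analyze
- {
-   "analyzer": "my_english",
-   "text": "The organizations were organized"
- }
- ----------------------------------------------------
- // CONSOLE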
- ==== Reimplementing language analyzers
- The built-in language analyzers can be reimplemented as `custom` analyzers
- (as described below) in order to customize their behaviour.
- NOTE: If you do not intend to exclude words from being stemmed (the
- equivalent of the `stem_exclusion` parameter above), then you should remove
- the `keyword_marker` token filter from the custom analyzer configuration.
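- One way to check that a reimplementation behaves like the original is to send
- the same text through the built-in analyzer with the `_analyze` API. You can then
- repeat the request against an index that defines the `custom` version and compare
- the tokens. An example of such an index is `english_example` from the
- <<english-analyzer,`english` analyzer>> section. The sample text below is arbitrary.
- [source,js]
- ----------------------------------------------------
- GET /_analyze
- {
-   "analyzer": "english",
-   "text": "The quick brown foxes jumped over John's fence"
- }
- ----------------------------------------------------
- // CONSOLE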
- [[arabic-analyzer]]
- ===== `arabic` analyzer
- The `arabic` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /arabic_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "arabic_stop": {
- "type": "stop",
- "stopwords": "_arabic_" <1>
- },
- "arabic_keywords": {
- "type": "keyword_marker",
- "keywords": ["مثال"] <2>
- },
- "arabic_stemmer": {
- "type": "stemmer",
- "language": "arabic"
- }
- },
- "analyzer": {
- "arabic": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "arabic_stop",
- "arabic_normalization",
- "arabic_keywords",
- "arabic_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
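- To see which filter in the chain changes each token, you can set the `explain`
- option of the `_analyze` API. The sketch below assumes the `arabic_example` index
- above has been created. The word `مثال` is used only because it is the placeholder
- in `arabic_keywords`, so the stemmer should leave it unchanged.
- [source,js]
- ----------------------------------------------------
- GET /arabic_example/_analyze
- {
-   "analyzer": "arabic",
-   "text": "مثال",
-   "explain": true
- }
- ----------------------------------------------------
- // CONSOLE
- // TEST[continued]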
- [[armenian-analyzer]]
- ===== `armenian` analyzer
- The `armenian` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /armenian_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "armenian_stop": {
- "type": "stop",
- "stopwords": "_armenian_" <1>
- },
- "armenian_keywords": {
- "type": "keyword_marker",
- "keywords": ["օրինակ"] <2>
- },
- "armenian_stemmer": {
- "type": "stemmer",
- "language": "armenian"
- }
- },
- "analyzer": {
- "armenian": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "armenian_stop",
- "armenian_keywords",
- "armenian_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[basque-analyzer]]
- ===== `basque` analyzer
- The `basque` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /basque_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "basque_stop": {
- "type": "stop",
- "stopwords": "_basque_" <1>
- },
- "basque_keywords": {
- "type": "keyword_marker",
- "keywords": ["adibidez"] <2>
- },
- "basque_stemmer": {
- "type": "stemmer",
- "language": "basque"
- }
- },
- "analyzer": {
- "basque": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "basque_stop",
- "basque_keywords",
- "basque_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[brazilian-analyzer]]
- ===== `brazilian` analyzer
- The `brazilian` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /brazilian_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "brazilian_stop": {
- "type": "stop",
- "stopwords": "_brazilian_" <1>
- },
- "brazilian_keywords": {
- "type": "keyword_marker",
- "keywords": ["exemplo"] <2>
- },
- "brazilian_stemmer": {
- "type": "stemmer",
- "language": "brazilian"
- }
- },
- "analyzer": {
- "brazilian": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "brazilian_stop",
- "brazilian_keywords",
- "brazilian_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[bulgarian-analyzer]]
- ===== `bulgarian` analyzer
- The `bulgarian` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /bulgarian_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "bulgarian_stop": {
- "type": "stop",
- "stopwords": "_bulgarian_" <1>
- },
- "bulgarian_keywords": {
- "type": "keyword_marker",
- "keywords": ["пример"] <2>
- },
- "bulgarian_stemmer": {
- "type": "stemmer",
- "language": "bulgarian"
- }
- },
- "analyzer": {
- "bulgarian": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "bulgarian_stop",
- "bulgarian_keywords",
- "bulgarian_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[catalan-analyzer]]
- ===== `catalan` analyzer
- The `catalan` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /catalan_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "catalan_elision": {
- "type": "elision",
- "articles": [ "d", "l", "m", "n", "s", "t"]
- },
- "catalan_stop": {
- "type": "stop",
- "stopwords": "_catalan_" <1>
- },
- "catalan_keywords": {
- "type": "keyword_marker",
- "keywords": ["exemple"] <2>
- },
- "catalan_stemmer": {
- "type": "stemmer",
- "language": "catalan"
- }
- },
- "analyzer": {
- "catalan": {
- "tokenizer": "standard",
- "filter": [
- "catalan_elision",
- "lowercase",
- "catalan_stop",
- "catalan_keywords",
- "catalan_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
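- The elision step can be checked with a request like the one below, assuming the
- `catalan_example` index above exists. The article prefixed with an apostrophe in
- `l'exemple` is the kind of input `catalan_elision` is meant to strip before
- stopword removal and stemming. The sample text is illustrative.
- [source,js]
- ----------------------------------------------------
- GET /catalan_example/_analyze
- {
-   "analyzer": "catalan",
-   "text": "l'exemple"
- }
- ----------------------------------------------------
- // CONSOLE
- // TEST[continued]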
- [[cjk-analyzer]]
- ===== `cjk` analyzer
- The `cjk` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /cjk_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "english_stop": {
- "type": "stop",
- "stopwords": "_english_" <1>
- }
- },
- "analyzer": {
- "cjk": {
- "tokenizer": "standard",
- "filter": [
- "cjk_width",
- "lowercase",
- "cjk_bigram",
- "english_stop"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
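- The `cjk_bigram` filter emits overlapping pairs of adjacent CJK characters rather
- than dictionary words, so it can help to inspect its output directly. The sketch
- below assumes the `cjk_example` index above exists. The sample text is arbitrary.
- [source,js]
- ----------------------------------------------------
- GET /cjk_example/_analyze
- {
-   "analyzer": "cjk",
-   "text": "東京都に住んでいます"
- }
- ----------------------------------------------------
- // CONSOLE
- // TEST[continued]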
- [[czech-analyzer]]
- ===== `czech` analyzer
- The `czech` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /czech_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "czech_stop": {
- "type": "stop",
- "stopwords": "_czech_" <1>
- },
- "czech_keywords": {
- "type": "keyword_marker",
- "keywords": ["příklad"] <2>
- },
- "czech_stemmer": {
- "type": "stemmer",
- "language": "czech"
- }
- },
- "analyzer": {
- "czech": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "czech_stop",
- "czech_keywords",
- "czech_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[danish-analyzer]]
- ===== `danish` analyzer
- The `danish` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /danish_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "danish_stop": {
- "type": "stop",
- "stopwords": "_danish_" <1>
- },
- "danish_keywords": {
- "type": "keyword_marker",
- "keywords": ["eksempel"] <2>
- },
- "danish_stemmer": {
- "type": "stemmer",
- "language": "danish"
- }
- },
- "analyzer": {
- "danish": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "danish_stop",
- "danish_keywords",
- "danish_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[dutch-analyzer]]
- ===== `dutch` analyzer
- The `dutch` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /dutch_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "dutch_stop": {
- "type": "stop",
- "stopwords": "_dutch_" <1>
- },
- "dutch_keywords": {
- "type": "keyword_marker",
- "keywords": ["voorbeeld"] <2>
- },
- "dutch_stemmer": {
- "type": "stemmer",
- "language": "dutch"
- },
- "dutch_override": {
- "type": "stemmer_override",
- "rules": [
- "fiets=>fiets",
- "bromfiets=>bromfiets",
- "ei=>eier",
- "kind=>kinder"
- ]
- }
- },
- "analyzer": {
- "dutch": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "dutch_stop",
- "dutch_keywords",
- "dutch_override",
- "dutch_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
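- The `stemmer_override` filter runs before `dutch_stemmer`. Terms that match one of
- its rules, such as `fiets`, are rewritten and, as far as Lucene's implementation
- goes, marked as keywords so that the stemmer skips them. The sketch below runs
- such terms through the built-in `dutch` analyzer. Repeating the request against
- the index above and comparing the tokens shows whether the reimplementation
- agrees. The sample text is illustrative.
- [source,js]
- ----------------------------------------------------
- GET /_analyze
- {
-   "analyzer": "dutch",
-   "text": "fiets bromfiets kind"
- }
- ----------------------------------------------------
- // CONSOLE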
- [[english-analyzer]]
- ===== `english` analyzer
- The `english` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /english_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "english_stop": {
- "type": "stop",
- "stopwords": "_english_" <1>
- },
- "english_keywords": {
- "type": "keyword_marker",
- "keywords": ["example"] <2>
- },
- "english_stemmer": {
- "type": "stemmer",
- "language": "english"
- },
- "english_possessive_stemmer": {
- "type": "stemmer",
- "language": "possessive_english"
- }
- },
- "analyzer": {
- "english": {
- "tokenizer": "standard",
- "filter": [
- "english_possessive_stemmer",
- "lowercase",
- "english_stop",
- "english_keywords",
- "english_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
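- `english_possessive_stemmer` is placed first in the chain so that a trailing `'s`
- is removed before the other filters run. The text below exercises both the
- possessive stemmer and the `example` keyword. It assumes the `english_example`
- index above exists, and the sample text is arbitrary.
- [source,js]
- ----------------------------------------------------
- GET /english_example/_analyze
- {
-   "analyzer": "english",
-   "text": "John's example and other examples"
- }
- ----------------------------------------------------
- // CONSOLE
- // TEST[continued]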
- [[finnish-analyzer]]
- ===== `finnish` analyzer
- The `finnish` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /finnish_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "finnish_stop": {
- "type": "stop",
- "stopwords": "_finnish_" <1>
- },
- "finnish_keywords": {
- "type": "keyword_marker",
- "keywords": ["esimerkki"] <2>
- },
- "finnish_stemmer": {
- "type": "stemmer",
- "language": "finnish"
- }
- },
- "analyzer": {
- "finnish": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "finnish_stop",
- "finnish_keywords",
- "finnish_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[french-analyzer]]
- ===== `french` analyzer
- The `french` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /french_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "french_elision": {
- "type": "elision",
- "articles_case": true,
- "articles": [
- "l", "m", "t", "qu", "n", "s",
- "j", "d", "c", "jusqu", "quoiqu",
- "lorsqu", "puisqu"
- ]
- },
- "french_stop": {
- "type": "stop",
- "stopwords": "_french_" <1>
- },
- "french_keywords": {
- "type": "keyword_marker",
- "keywords": ["exemple"] <2>
- },
- "french_stemmer": {
- "type": "stemmer",
- "language": "light_french"
- }
- },
- "analyzer": {
- "french": {
- "tokenizer": "standard",
- "filter": [
- "french_elision",
- "lowercase",
- "french_stop",
- "french_keywords",
- "french_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
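- `french_elision` sets `articles_case` to `true`, so capitalized articles such as
- `L'` are also matched. A quick check, assuming the `french_example` index above
- exists (the sample text is arbitrary):
- [source,js]
- ----------------------------------------------------
- GET /french_example/_analyze
- {
-   "analyzer": "french",
-   "text": "L'avion qu'il pilote"
- }
- ----------------------------------------------------
- // CONSOLE
- // TEST[continued]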
- [[galician-analyzer]]
- ===== `galician` analyzer
- The `galician` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /galician_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "galician_stop": {
- "type": "stop",
- "stopwords": "_galician_" <1>
- },
- "galician_keywords": {
- "type": "keyword_marker",
- "keywords": ["exemplo"] <2>
- },
- "galician_stemmer": {
- "type": "stemmer",
- "language": "galician"
- }
- },
- "analyzer": {
- "galician": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "galician_stop",
- "galician_keywords",
- "galician_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[german-analyzer]]
- ===== `german` analyzer
- The `german` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /german_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "german_stop": {
- "type": "stop",
- "stopwords": "_german_" <1>
- },
- "german_keywords": {
- "type": "keyword_marker",
- "keywords": ["beispiel"] <2>
- },
- "german_stemmer": {
- "type": "stemmer",
- "language": "light_german"
- }
- },
- "analyzer": {
- "german": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "german_stop",
- "german_keywords",
- "german_normalization",
- "german_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
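- `german_normalization` runs before the light stemmer and folds characters such as
- umlauts and `ß` into simpler forms. That way, spelling variants can end up with
- the same term. The sketch below assumes the `german_example` index above exists,
- and the sample text is arbitrary.
- [source,js]
- ----------------------------------------------------
- GET /german_example/_analyze
- {
-   "analyzer": "german",
-   "text": "Häuser und Straßen"
- }
- ----------------------------------------------------
- // CONSOLE
- // TEST[continued]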
- [[greek-analyzer]]
- ===== `greek` analyzer
- The `greek` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /greek_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "greek_stop": {
- "type": "stop",
- "stopwords": "_greek_" <1>
- },
- "greek_lowercase": {
- "type": "lowercase",
- "language": "greek"
- },
- "greek_keywords": {
- "type": "keyword_marker",
- "keywords": ["παράδειγμα"] <2>
- },
- "greek_stemmer": {
- "type": "stemmer",
- "language": "greek"
- }
- },
- "analyzer": {
- "greek": {
- "tokenizer": "standard",
- "filter": [
- "greek_lowercase",
- "greek_stop",
- "greek_keywords",
- "greek_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
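- The Greek-specific `lowercase` filter does more than lowercase the text: Lucene's
- implementation also removes some Greek diacritics and standardizes final sigma.
- The sketch below assumes the `greek_example` index above exists. The input is the
- `greek_keywords` placeholder written in capitals with an accent.
- [source,js]
- ----------------------------------------------------
- GET /greek_example/_analyze
- {
-   "analyzer": "greek",
-   "text": "ΠΑΡΆΔΕΙΓΜΑ"
- }
- ----------------------------------------------------
- // CONSOLE
- // TEST[continued]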
- [[hindi-analyzer]]
- ===== `hindi` analyzer
- The `hindi` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /hindi_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "hindi_stop": {
- "type": "stop",
- "stopwords": "_hindi_" <1>
- },
- "hindi_keywords": {
- "type": "keyword_marker",
- "keywords": ["उदाहरण"] <2>
- },
- "hindi_stemmer": {
- "type": "stemmer",
- "language": "hindi"
- }
- },
- "analyzer": {
- "hindi": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "indic_normalization",
- "hindi_normalization",
- "hindi_stop",
- "hindi_keywords",
- "hindi_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[hungarian-analyzer]]
- ===== `hungarian` analyzer
- The `hungarian` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /hungarian_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "hungarian_stop": {
- "type": "stop",
- "stopwords": "_hungarian_" <1>
- },
- "hungarian_keywords": {
- "type": "keyword_marker",
- "keywords": ["példa"] <2>
- },
- "hungarian_stemmer": {
- "type": "stemmer",
- "language": "hungarian"
- }
- },
- "analyzer": {
- "hungarian": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "hungarian_stop",
- "hungarian_keywords",
- "hungarian_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[indonesian-analyzer]]
- ===== `indonesian` analyzer
- The `indonesian` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /indonesian_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "indonesian_stop": {
- "type": "stop",
- "stopwords": "_indonesian_" <1>
- },
- "indonesian_keywords": {
- "type": "keyword_marker",
- "keywords": ["contoh"] <2>
- },
- "indonesian_stemmer": {
- "type": "stemmer",
- "language": "indonesian"
- }
- },
- "analyzer": {
- "indonesian": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "indonesian_stop",
- "indonesian_keywords",
- "indonesian_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[irish-analyzer]]
- ===== `irish` analyzer
- The `irish` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /irish_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "irish_hyphenation": {
- "type": "stop",
- "stopwords": [ "h", "n", "t" ],
- "ignore_case": true
- },
- "irish_elision": {
- "type": "elision",
- "articles": [ "d", "m", "b" ],
- "articles_case": true
- },
- "irish_stop": {
- "type": "stop",
- "stopwords": "_irish_" <1>
- },
- "irish_lowercase": {
- "type": "lowercase",
- "language": "irish"
- },
- "irish_keywords": {
- "type": "keyword_marker",
- "keywords": ["sampla"] <2>
- },
- "irish_stemmer": {
- "type": "stemmer",
- "language": "irish"
- }
- },
- "analyzer": {
- "irish": {
- "tokenizer": "standard",
- "filter": [
- "irish_hyphenation",
- "irish_elision",
- "irish_lowercase",
- "irish_stop",
- "irish_keywords",
- "irish_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
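- Irish text contains initial mutations written with a hyphen, such as the `t-` in
- `t-uisce`, and elided forms written with an apostrophe, such as `d'`. The request
- below is a sketch for checking how the chain above handles both. It assumes the
- `irish_example` index exists, and the sample text is illustrative.
- [source,js]
- ----------------------------------------------------
- GET /irish_example/_analyze
- {
-   "analyzer": "irish",
-   "text": "an t-uisce agus d'athair"
- }
- ----------------------------------------------------
- // CONSOLE
- // TEST[continued]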
- [[italian-analyzer]]
- ===== `italian` analyzer
- The `italian` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /italian_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "italian_elision": {
- "type": "elision",
- "articles": [
- "c", "l", "all", "dall", "dell",
- "nell", "sull", "coll", "pell",
- "gl", "agl", "dagl", "degl", "negl",
- "sugl", "un", "m", "t", "s", "v", "d"
- ]
- },
- "italian_stop": {
- "type": "stop",
- "stopwords": "_italian_" <1>
- },
- "italian_keywords": {
- "type": "keyword_marker",
- "keywords": ["esempio"] <2>
- },
- "italian_stemmer": {
- "type": "stemmer",
- "language": "light_italian"
- }
- },
- "analyzer": {
- "italian": {
- "tokenizer": "standard",
- "filter": [
- "italian_elision",
- "lowercase",
- "italian_stop",
- "italian_keywords",
- "italian_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
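- `italian_elision` runs before `lowercase` and does not set `articles_case`, so only
- lowercase articles such as `dell'` and `l'` are stripped. The sketch below assumes
- the `italian_example` index above exists. The sample text is arbitrary.
- [source,js]
- ----------------------------------------------------
- GET /italian_example/_analyze
- {
-   "analyzer": "italian",
-   "text": "dell'esempio e l'amico"
- }
- ----------------------------------------------------
- // CONSOLE
- // TEST[continued]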
- [[latvian-analyzer]]
- ===== `latvian` analyzer
- The `latvian` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /latvian_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "latvian_stop": {
- "type": "stop",
- "stopwords": "_latvian_" <1>
- },
- "latvian_keywords": {
- "type": "keyword_marker",
- "keywords": ["piemērs"] <2>
- },
- "latvian_stemmer": {
- "type": "stemmer",
- "language": "latvian"
- }
- },
- "analyzer": {
- "latvian": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "latvian_stop",
- "latvian_keywords",
- "latvian_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[lithuanian-analyzer]]
- ===== `lithuanian` analyzer
- The `lithuanian` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /lithuanian_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "lithuanian_stop": {
- "type": "stop",
- "stopwords": "_lithuanian_" <1>
- },
- "lithuanian_keywords": {
- "type": "keyword_marker",
- "keywords": ["pavyzdys"] <2>
- },
- "lithuanian_stemmer": {
- "type": "stemmer",
- "language": "lithuanian"
- }
- },
- "analyzer": {
- "lithuanian": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "lithuanian_stop",
- "lithuanian_keywords",
- "lithuanian_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[norwegian-analyzer]]
- ===== `norwegian` analyzer
- The `norwegian` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /norwegian_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "norwegian_stop": {
- "type": "stop",
- "stopwords": "_norwegian_" <1>
- },
- "norwegian_keywords": {
- "type": "keyword_marker",
- "keywords": ["eksempel"] <2>
- },
- "norwegian_stemmer": {
- "type": "stemmer",
- "language": "norwegian"
- }
- },
- "analyzer": {
- "norwegian": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "norwegian_stop",
- "norwegian_keywords",
- "norwegian_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[persian-analyzer]]
- ===== `persian` analyzer
- The `persian` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /persian_example
- {
- "settings": {
- "analysis": {
- "char_filter": {
- "zero_width_spaces": {
- "type": "mapping",
- "mappings": [ "\\u200C=>\\u0020"] <1>
- }
- },
- "filter": {
- "persian_stop": {
- "type": "stop",
- "stopwords": "_persian_" <2>
- }
- },
- "analyzer": {
- "persian": {
- "tokenizer": "standard",
- "char_filter": [ "zero_width_spaces" ],
- "filter": [
- "lowercase",
- "arabic_normalization",
- "persian_normalization",
- "persian_stop"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> Replaces zero-width non-joiners with an ASCII space.
- <2> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
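
The `zero_width_spaces` char filter runs before the tokenizer. In Persian text the zero-width non-joiner (U+200C) often sits inside a single written word, for example after the prefix `می` in `می‌خواهم` ("I want"). The sketch below mimics the mapping in JavaScript. The whitespace split standing in for the `standard` tokenizer is an assumption: the real tokenizer follows Unicode word-boundary rules rather than splitting on whitespace.

```javascript
// Sketch of the "zero_width_spaces" mapping char filter: replace every
// U+200C ZERO WIDTH NON-JOINER with U+0020 SPACE before tokenization.
const ZWNJ = "\u200C";

function replaceZeroWidthNonJoiners(text) {
  return text.split(ZWNJ).join(" ");
}

// Assumption: a whitespace split stands in for the standard tokenizer.
// JavaScript's \s does not match U+200C, so the joined form stays one token here.
function tokens(text) {
  return text.split(/\s+/).filter((t) => t.length > 0);
}

const word = "می" + ZWNJ + "خواهم";
console.log(tokens(word).length);                             // 1
console.log(tokens(replaceZeroWidthNonJoiners(word)).length); // 2
```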
- [[portuguese-analyzer]]
- ===== `portuguese` analyzer
- The `portuguese` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /portuguese_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "portuguese_stop": {
- "type": "stop",
- "stopwords": "_portuguese_" <1>
- },
- "portuguese_keywords": {
- "type": "keyword_marker",
- "keywords": ["exemplo"] <2>
- },
- "portuguese_stemmer": {
- "type": "stemmer",
- "language": "light_portuguese"
- }
- },
- "analyzer": {
- "portuguese": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "portuguese_stop",
- "portuguese_keywords",
- "portuguese_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
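
The default stop list can be replaced with an inline array, as in the sketch below. The index name, the analyzer name and the four stop words are illustrative choices, not a recommended Portuguese list, and the `keyword_marker` filter is left out because there are no words to protect. To load the words from a file instead, use `stopwords_path` with a path relative to the Elasticsearch config directory, or an absolute path.

[source,js]
----------------------------------------------------
PUT /portuguese_custom_stop_example
{
  "settings": {
    "analysis": {
      "filter": {
        "portuguese_stop": {
          "type": "stop",
          "stopwords": [ "a", "o", "e", "de" ]
        },
        "portuguese_stemmer": {
          "type": "stemmer",
          "language": "light_portuguese"
        }
      },
      "analyzer": {
        "portuguese_custom_stop": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "portuguese_stop",
            "portuguese_stemmer"
          ]
        }
      }
    }
  }
}
----------------------------------------------------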
- [[romanian-analyzer]]
- ===== `romanian` analyzer
- The `romanian` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /romanian_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "romanian_stop": {
- "type": "stop",
- "stopwords": "_romanian_" <1>
- },
- "romanian_keywords": {
- "type": "keyword_marker",
- "keywords": ["exemplu"] <2>
- },
- "romanian_stemmer": {
- "type": "stemmer",
- "language": "romanian"
- }
- },
- "analyzer": {
- "romanian": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "romanian_stop",
- "romanian_keywords",
- "romanian_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[russian-analyzer]]
- ===== `russian` analyzer
- The `russian` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /russian_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "russian_stop": {
- "type": "stop",
- "stopwords": "_russian_" <1>
- },
- "russian_keywords": {
- "type": "keyword_marker",
- "keywords": ["пример"] <2>
- },
- "russian_stemmer": {
- "type": "stemmer",
- "language": "russian"
- }
- },
- "analyzer": {
- "russian": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "russian_stop",
- "russian_keywords",
- "russian_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[sorani-analyzer]]
- ===== `sorani` analyzer
- The `sorani` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /sorani_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "sorani_stop": {
- "type": "stop",
- "stopwords": "_sorani_" <1>
- },
- "sorani_keywords": {
- "type": "keyword_marker",
- "keywords": ["mînak"] <2>
- },
- "sorani_stemmer": {
- "type": "stemmer",
- "language": "sorani"
- }
- },
- "analyzer": {
- "sorani": {
- "tokenizer": "standard",
- "filter": [
- "sorani_normalization",
- "lowercase",
- "sorani_stop",
- "sorani_keywords",
- "sorani_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[spanish-analyzer]]
- ===== `spanish` analyzer
- The `spanish` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /spanish_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "spanish_stop": {
- "type": "stop",
- "stopwords": "_spanish_" <1>
- },
- "spanish_keywords": {
- "type": "keyword_marker",
- "keywords": ["ejemplo"] <2>
- },
- "spanish_stemmer": {
- "type": "stemmer",
- "language": "light_spanish"
- }
- },
- "analyzer": {
- "spanish": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "spanish_stop",
- "spanish_keywords",
- "spanish_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
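
To check that a reimplemented analyzer behaves as intended, the `_analyze` API can run it against sample text; the text below is only an example. Because `ejemplo` is listed in `spanish_keywords`, it should come back unstemmed, while stop words such as `un` and `de` from the default `_spanish_` list should be removed.

[source,js]
----------------------------------------------------
GET /spanish_example/_analyze
{
  "analyzer": "spanish",
  "text": "Un ejemplo de análisis de textos"
}
----------------------------------------------------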
- [[swedish-analyzer]]
- ===== `swedish` analyzer
- The `swedish` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /swedish_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "swedish_stop": {
- "type": "stop",
- "stopwords": "_swedish_" <1>
- },
- "swedish_keywords": {
- "type": "keyword_marker",
- "keywords": ["exempel"] <2>
- },
- "swedish_stemmer": {
- "type": "stemmer",
- "language": "swedish"
- }
- },
- "analyzer": {
- "swedish": {
- "tokenizer": "standard",
- "filter": [
- "lowercase",
- "swedish_stop",
- "swedish_keywords",
- "swedish_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
- [[turkish-analyzer]]
- ===== `turkish` analyzer
- The `turkish` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /turkish_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "turkish_stop": {
- "type": "stop",
- "stopwords": "_turkish_" <1>
- },
- "turkish_lowercase": {
- "type": "lowercase",
- "language": "turkish"
- },
- "turkish_keywords": {
- "type": "keyword_marker",
- "keywords": ["örnek"] <2>
- },
- "turkish_stemmer": {
- "type": "stemmer",
- "language": "turkish"
- }
- },
- "analyzer": {
- "turkish": {
- "tokenizer": "standard",
- "filter": [
- "apostrophe",
- "turkish_lowercase",
- "turkish_stop",
- "turkish_keywords",
- "turkish_stemmer"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
- <2> This filter should be removed unless there are words which should
- be excluded from stemming.
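
Two steps in this chain are specific to Turkish. `apostrophe` strips the apostrophe and everything after it, which removes suffixes such as the locative `'de` in `Türkiye'de`. `turkish_lowercase` maps dotless `I` to `ı` and dotted `İ` to `i`, whereas locale-neutral lowercasing turns `I` into `i`. The JavaScript sketch below illustrates both; `toLocaleLowerCase("tr")` is only an analogy for the Lucene filter, and the sketch handles only the ASCII apostrophe.

```javascript
// apostrophe: keep only the text before the first apostrophe.
// (This sketch handles the ASCII apostrophe U+0027 only.)
function apostrophe(token) {
  const i = token.indexOf("'");
  return i === -1 ? token : token.slice(0, i);
}

// Analogy for turkish_lowercase: locale-aware lowercasing with the "tr" locale.
function turkishLowercase(token) {
  return token.toLocaleLowerCase("tr");
}

// Same order as the analyzer: apostrophe first, then turkish_lowercase.
const sample = ["Türkiye'de", "IRMAK", "İSTANBUL"];
console.log(sample.map((t) => turkishLowercase(apostrophe(t)))); // [ 'türkiye', 'ırmak', 'istanbul' ]
console.log("IRMAK".toLowerCase()); // 'irmak': locale-neutral lowercasing loses the dotless ı
```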
- [[thai-analyzer]]
- ===== `thai` analyzer
- The `thai` analyzer could be reimplemented as a `custom` analyzer as follows:
- [source,js]
- ----------------------------------------------------
- PUT /thai_example
- {
- "settings": {
- "analysis": {
- "filter": {
- "thai_stop": {
- "type": "stop",
- "stopwords": "_thai_" <1>
- }
- },
- "analyzer": {
- "thai": {
- "tokenizer": "thai",
- "filter": [
- "lowercase",
- "thai_stop"
- ]
- }
- }
- }
- }
- }
- ----------------------------------------------------
- // CONSOLE
- <1> The default stopwords can be overridden with the `stopwords`
- or `stopwords_path` parameters.
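
Unlike the other analyzers in this section, `thai` replaces the `standard` tokenizer with the `thai` tokenizer, because Thai is written without spaces between words and word boundaries have to be inferred. As an analogy only, the JavaScript below uses `Intl.Segmenter` (available in recent Node.js and browser runtimes) to split Thai text into words. It is not the tokenizer Elasticsearch uses, and its word boundaries may differ.

```javascript
// Analogy only: Intl.Segmenter is not the Elasticsearch thai tokenizer, but it
// shows the same idea of finding word boundaries in text written without spaces.
// Requires a runtime with Intl.Segmenter (for example, Node.js 16 or later).
function thaiWords(text) {
  const segmenter = new Intl.Segmenter("th", { granularity: "word" });
  return Array.from(segmenter.segment(text))
    .map((s) => s.segment)
    .filter((w) => w.trim().length > 0); // drop whitespace-only segments
}

console.log(thaiWords("ภาษาไทยง่ายนิดเดียว"));
```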
|