The `kuromoji_tokenizer` accepts the following settings:
`mode`
: The tokenization mode determines how the tokenizer handles compound and unknown words. It can be set to:
`normal`
: Normal segmentation, no decomposition for compounds. Example output:
```
関西国際空港
アブラカダブラ
```
`search`
: Segmentation geared towards search. This includes a decompounding process for long nouns, and also includes the full compound token as a synonym. Example output:
```
関西, 関西国際空港, 国際, 空港
アブラカダブラ
```
`extended`
: Extended mode outputs unigrams for unknown words. Example output:
```
関西, 関西国際空港, 国際, 空港
ア, ブ, ラ, カ, ダ, ブ, ラ
```
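To compare the modes yourself, you can create a test index and run an `_analyze` request against it. The following is a minimal sketch; the index, tokenizer, and analyzer names are placeholders:

```console
PUT kuromoji_mode_sample
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "kuromoji_search_mode": {
          "type": "kuromoji_tokenizer",
          "mode": "search"
        }
      },
      "analyzer": {
        "search_mode_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji_search_mode"
        }
      }
    }
  }
}

GET kuromoji_mode_sample/_analyze
{
  "analyzer": "search_mode_analyzer",
  "text": "関西国際空港"
}
```

Changing `mode` to `normal` or `extended` and re-creating the index should reproduce the other outputs shown above.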
`discard_punctuation`
: Whether punctuation should be discarded from the output. Defaults to `true`.
`lenient`
: Whether the `user_dictionary` should be deduplicated on the provided text. Defaults to `false`, in which case duplicate entries generate an error.
`user_dictionary`
: The Kuromoji tokenizer uses the MeCab-IPADIC dictionary by default. A `user_dictionary` may be appended to the default dictionary. The dictionary should have the following CSV format:

```
<text>,<token 1> ... <token n>,<reading 1> ... <reading n>,<part-of-speech tag>
```
As a demonstration of how the user dictionary can be used, save the following dictionary to `$ES_HOME/config/userdict_ja.txt`:

```
東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞
```
You can also inline the rules directly in the tokenizer definition using the `user_dictionary_rules` option:
```console
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "tokenizer": {
          "kuromoji_user_dict": {
            "type": "kuromoji_tokenizer",
            "mode": "extended",
            "user_dictionary_rules": ["東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞"]
          }
        },
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "tokenizer": "kuromoji_user_dict"
          }
        }
      }
    }
  }
}
```
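You can verify the inline rules with an `_analyze` request; with the rule above, it should produce the same 東京 / スカイツリー split shown for the file-based dictionary below:

```console
GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "東京スカイツリー"
}
```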
`nbest_cost`/`nbest_examples`
: The expert parameters `nbest_cost` and `nbest_examples` can be used to include additional tokens that are most likely according to the statistical model. If both parameters are set, the larger of the two resulting cost values is applied.
`nbest_cost`
: The `nbest_cost` parameter specifies an additional Viterbi cost. The tokenizer will include all tokens in Viterbi paths that are within the `nbest_cost` value of the best path.
`nbest_examples`
: The `nbest_examples` parameter can be used to find a `nbest_cost` value based on examples. For example, a value of `/箱根山-箱根/成田空港-成田/` indicates that for the texts 箱根山 (Mt. Hakone) and 成田空港 (Narita Airport), we would like a cost that gives us 箱根 (Hakone) and 成田 (Narita).
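As an illustration of how these parameters are wired into a tokenizer definition, here is a minimal sketch; the index and tokenizer names, and the `nbest_cost` value of 1000, are placeholders rather than recommendations:

```console
PUT kuromoji_nbest_sample
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "kuromoji_nbest": {
          "type": "kuromoji_tokenizer",
          "nbest_examples": "/箱根山-箱根/成田空港-成田/",
          "nbest_cost": 1000
        }
      },
      "analyzer": {
        "nbest_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji_nbest"
        }
      }
    }
  }
}
```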
To use the dictionary file saved earlier, create an analyzer as follows:
```console
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "tokenizer": {
          "kuromoji_user_dict": {
            "type": "kuromoji_tokenizer",
            "mode": "extended",
            "discard_punctuation": "false",
            "user_dictionary": "userdict_ja.txt",
            "lenient": "true"
          }
        },
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "tokenizer": "kuromoji_user_dict"
          }
        }
      }
    }
  }
}
```
```console
GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "東京スカイツリー"
}
```
The above `_analyze` request returns the following:
```console-result
{
  "tokens" : [ {
    "token" : "東京",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "スカイツリー",
    "start_offset" : 2,
    "end_offset" : 8,
    "type" : "word",
    "position" : 1
  } ]
}
```
`discard_compound_token`
: Whether original compound tokens should be discarded from the output when using `search` mode. Defaults to `false`. Example output with `search` or `extended` mode and this option set to `true`:
```
関西, 国際, 空港
```
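A minimal sketch of a tokenizer that drops the compound token in `search` mode follows; the index, tokenizer, and analyzer names are placeholders:

```console
PUT kuromoji_discard_sample
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "kuromoji_no_compound": {
          "type": "kuromoji_tokenizer",
          "mode": "search",
          "discard_compound_token": true
        }
      },
      "analyzer": {
        "no_compound_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji_no_compound"
        }
      }
    }
  }
}

GET kuromoji_discard_sample/_analyze
{
  "analyzer": "no_compound_analyzer",
  "text": "関西国際空港"
}
```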
::::{note}
If a text contains full-width characters, the `kuromoji_tokenizer` tokenizer can produce unexpected tokens. To avoid this, add the `icu_normalizer` character filter to your analyzer. See Normalize full-width characters.
::::
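For example, assuming the `analysis-icu` plugin is installed, an analyzer that normalizes full-width characters before tokenization might look like this sketch; the index and analyzer names are placeholders:

```console
PUT kuromoji_icu_sample
{
  "settings": {
    "analysis": {
      "analyzer": {
        "kuromoji_normalized": {
          "type": "custom",
          "char_filter": ["icu_normalizer"],
          "tokenizer": "kuromoji_tokenizer"
        }
      }
    }
  }
}
```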