@@ -393,17 +393,10 @@ This might cause many (globally) high frequent terms to be missing in the final
the candidate lists. To avoid this, the `shard_size` parameter can be increased to allow more candidate terms on the shards.
However, this increases memory consumption and network traffic.
-`shard_min_doc_count` parameter
-
-The parameter `shard_min_doc_count` regulates the _certainty_ a shard has if the term should actually be added to the candidate list or
-not with respect to the `min_doc_count`. Terms will only be considered if their local shard frequency within the set is higher than the
-`shard_min_doc_count`. If your dictionary contains many low frequent words and you are not interested in these (for example misspellings),
-then you can set the `shard_min_doc_count` parameter to filter out candidate terms on a shard level that will with a reasonable certainty
-not reach the required `min_doc_count` even after merging the local frequencies. `shard_min_doc_count` is set to `1` per default and has
-no effect unless you explicitly set it.
-
-
+[[search-aggregations-bucket-significanttext-shard-min-doc-count]]
+====== `shard_min_doc_count`
+include::terms-aggregation.asciidoc[tag=min-doc-count]
WARNING: Setting `min_doc_count` to `1` is generally not advised as it tends to return terms that
are typos or other bizarre curiosities. Finding more than one instance of a term helps