
Docs: Clean doc for agg parameter (#70675)

This adds a heading for `shard_min_doc_count` and merges the duplicated
paragraphs describing it into a single shared snippet. I wanted to link to
this section earlier today, but it wasn't a "real" section, so I couldn't.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Nik Everett 4 years ago
Parent
Current commit
2b9ed7d36f

+ 3 - 4
docs/reference/aggregations/bucket/significantterms-aggregation.asciidoc

@@ -486,12 +486,11 @@ The above aggregation would only return tags which have been found in 10 hits or
 
 Terms that score highly will be collected on a shard level and merged with the terms collected from other shards in a second step. However, the shard does not have the information about the global term frequencies available. The decision if a term is added to a candidate list depends only on the score computed on the shard using local shard frequencies, not the global frequencies of the word. The `min_doc_count` criterion is only applied after merging local terms statistics of all shards. In a way the decision to add the term as a candidate is made without being very _certain_ about if the term will actually reach the required `min_doc_count`. This might cause many (globally) high frequent terms to be missing in the final result if low frequent but high scoring terms populated the candidate lists. To avoid this, the `shard_size` parameter can be increased to allow more candidate terms on the shards. However, this increases memory consumption and network traffic.
 
-`shard_min_doc_count` parameter
-
-The parameter `shard_min_doc_count` regulates the _certainty_ a shard has if the term should actually be added to the candidate list or not with respect to the `min_doc_count`. Terms will only be considered if their local shard frequency within the set is higher than the `shard_min_doc_count`. If your dictionary contains many low frequent words and you are not interested in these (for example misspellings), then you can set the `shard_min_doc_count` parameter to filter out candidate terms on a shard level that will with a reasonable certainty not reach the required `min_doc_count` even after merging the local frequencies. `shard_min_doc_count` is set to `1` per default and has no effect unless you explicitly set it.
-
 
+[[search-aggregations-bucket-significantterms-shard-min-doc-count]]
+===== `shard_min_doc_count`
 
+include::terms-aggregation.asciidoc[tag=min-doc-count]
 
 WARNING: Setting `min_doc_count` to `1` is generally not advised as it tends to return terms that
          are typos or other bizarre curiosities. Finding more than one instance of a term helps

+ 3 - 10
docs/reference/aggregations/bucket/significanttext-aggregation.asciidoc

@@ -393,17 +393,10 @@ This might cause many (globally) high frequent terms to be missing in the final
 the candidate lists. To avoid this, the `shard_size` parameter can be increased to allow more candidate terms on the shards. 
 However, this increases memory consumption and network traffic.
 
-`shard_min_doc_count` parameter
-
-The parameter `shard_min_doc_count` regulates the _certainty_ a shard has if the term should actually be added to the candidate list or 
-not with respect to the `min_doc_count`. Terms will only be considered if their local shard frequency within the set is higher than the 
-`shard_min_doc_count`. If your dictionary contains many low frequent words and you are not interested in these (for example misspellings), 
-then you can set the `shard_min_doc_count` parameter to filter out candidate terms on a shard level that will with a reasonable certainty 
-not reach the required `min_doc_count` even after merging the local frequencies. `shard_min_doc_count` is set to `1` per default and has 
-no effect unless you explicitly set it.
-
-
+[[search-aggregations-bucket-significanttext-shard-min-doc-count]]
+====== `shard_min_doc_count`
 
+include::terms-aggregation.asciidoc[tag=min-doc-count]
 
 WARNING: Setting `min_doc_count` to `1` is generally not advised as it tends to return terms that
          are typos or other bizarre curiosities. Finding more than one instance of a term helps

+ 4 - 2
docs/reference/aggregations/bucket/terms-aggregation.asciidoc

@@ -386,10 +386,12 @@ The above aggregation would only return tags which have been found in 10 hits or
 
 Terms are collected and ordered on a shard level and merged with the terms collected from other shards in a second step. However, the shard does not have the information about the global document count available. The decision if a term is added to a candidate list depends only on the order computed on the shard using local shard frequencies. The `min_doc_count` criterion is only applied after merging local terms statistics of all shards. In a way the decision to add the term as a candidate is made without being very _certain_ about if the term will actually reach the required `min_doc_count`. This might cause many (globally) high frequent terms to be missing in the final result if low frequent terms populated the candidate lists. To avoid this, the `shard_size` parameter can be increased to allow more candidate terms on the shards. However, this increases memory consumption and network traffic.
 
-`shard_min_doc_count` parameter
+[[search-aggregations-bucket-terms-shard-min-doc-count]]
+===== `shard_min_doc_count`
 
+// tag::min-doc-count[]
 The parameter `shard_min_doc_count` regulates the _certainty_ a shard has if the term should actually be added to the candidate list or not with respect to the `min_doc_count`. Terms will only be considered if their local shard frequency within the set is higher than the `shard_min_doc_count`. If your dictionary contains many low frequent terms and you are not interested in those (for example misspellings), then you can set the `shard_min_doc_count` parameter to filter out candidate terms on a shard level that will with a reasonable certainty not reach the required `min_doc_count` even after merging the local counts. `shard_min_doc_count` is set to `0` per default and has no effect unless you explicitly set it.
-
+// end::min-doc-count[]
 
 
 NOTE:    Setting `min_doc_count`=`0` will also return buckets for terms that didn't match any hit. However, some of
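The candidate selection described in these hunks (local ranking per shard, `shard_min_doc_count` as a shard-level pre-filter, `min_doc_count` checked only after merging) can be modeled in a few lines. This is a simplified Python sketch, not Elasticsearch's implementation. `shard_candidates`, `reduce_shards`, the per-term `(local_doc_count, local_score)` tuples, and the sample terms are all hypothetical. Scores are taken as precomputed, the merge simply sums local counts (global re-scoring is not modeled), and the inclusive `>=` comparison against `shard_min_doc_count` is an assumption.

```python
from collections import Counter


def shard_candidates(local_stats, shard_size, shard_min_doc_count):
    """Select one shard's candidate terms using only local statistics.

    local_stats maps term -> (local_doc_count, local_score); both values
    are hypothetical inputs for this sketch.
    """
    # Pre-filter: drop terms whose local count is below shard_min_doc_count.
    # The inclusive comparison is an assumption of this sketch.
    eligible = {
        term: (count, score)
        for term, (count, score) in local_stats.items()
        if count >= shard_min_doc_count
    }
    # Keep the shard_size highest-scoring terms (ties broken by term name).
    ranked = sorted(eligible.items(), key=lambda item: (-item[1][1], item[0]))
    return {term: count for term, (count, _score) in ranked[:shard_size]}


def reduce_shards(shards, shard_size, min_doc_count, shard_min_doc_count=0):
    """Merge per-shard candidates, then apply min_doc_count globally."""
    merged = Counter()
    for local_stats in shards:
        # Counter.update adds the candidate counts from this shard.
        merged.update(shard_candidates(local_stats, shard_size, shard_min_doc_count))
    # min_doc_count is only checked after the local counts are summed.
    return {term: count for term, count in merged.items() if count >= min_doc_count}


shards = [
    {"teh": (1, 9.0), "kiwi": (2, 5.0)},  # rare typo scores highest on shard A
    {"qwe": (1, 8.0), "kiwi": (2, 4.0)},  # rare typo scores highest on shard B
]
print(reduce_shards(shards, shard_size=1, min_doc_count=3))
print(reduce_shards(shards, shard_size=1, min_doc_count=3, shard_min_doc_count=2))
```

With the default of `0`, each shard nominates its rare, high-scoring typo, neither reaches `min_doc_count` after merging, and `kiwi` (four hits globally) is lost. Setting `shard_min_doc_count` to `2` filters the typos out on the shards, so `kiwi` takes the single candidate slot on both and survives the merge.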