@@ -7,7 +7,7 @@ An aggregation that returns interesting or unusual occurrences of terms in a set
* Suggesting "H5N1" when users search for "bird flu" in text
* Identifying the merchant that is the "common point of compromise" from the transaction history of credit card owners reporting loss
* Suggesting keywords relating to stock symbol $ATI for an automated news classifier
-* Spotting the fraudulent doctor who is diagnosing more than his fair share of whiplash injuries
+* Spotting the fraudulent doctor who is diagnosing more than their fair share of whiplash injuries
* Spotting the tire manufacturer who has a disproportionate number of blow-outs

In all these cases the terms being selected are not simply the most popular terms in a set.
@@ -388,7 +388,7 @@ By default this produces a score greater than zero and less than one.

The benefit of this heuristic is that the scoring logic is simple to explain to anyone familiar with a "per capita" statistic. However, for fields with high cardinality there is a tendency for this heuristic to select the rarest terms such as typos that occur only once because they score 1/1 = 100%.
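
To make the arithmetic concrete, here is a minimal Python sketch. It assumes the percentage score is a term's foreground document count divided by its background document count, as the "per capita" description suggests. The term names, the counts, and the `percentage_score` helper are invented for illustration and are not part of any Elasticsearch API.

```python
# Hypothetical per-term document counts: (foreground_count, background_count).
# All names and numbers below are made up for illustration.
term_counts = {
    "whiplash": (80, 200),    # frequent term, strongly concentrated in the foreground
    "headache": (50, 1000),   # frequent term, weakly concentrated in the foreground
    "whiplahs": (1, 1),       # a typo that occurs in exactly one document
}


def percentage_score(foreground_count: int, background_count: int) -> float:
    """Fraction of a term's background documents that are in the foreground set."""
    return foreground_count / background_count


scores = {
    term: percentage_score(fg, bg) for term, (fg, bg) in term_counts.items()
}

# Rank terms from highest to lowest score.
ranked = sorted(scores, key=scores.get, reverse=True)

for term in ranked:
    print(f"{term}: {scores[term]:.0%}")
```

In this made-up data the one-off typo ranks first with 1/1 = 100%, ahead of a term backed by 80 foreground documents. This is the high-cardinality failure mode described above.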

-It would be hard for a seasoned boxer to win a championship if the prize was awarded purely on the basis of percentage of fights won - by these rules a newcomer with only one fight under his belt would be impossible to beat.
+It would be hard for a seasoned boxer to win a championship if the prize was awarded purely on the basis of percentage of fights won - by these rules a newcomer with only one fight under their belt would be impossible to beat.
Multiple observations are typically required to reinforce a view so it is recommended in these cases to set both `min_doc_count` and `shard_min_doc_count` to a higher value such as 10 in order to filter out the low-frequency terms that otherwise take precedence.

[source,js]