@@ -146,6 +146,16 @@ the percentiles. It is effectively trading accuracy for memory savings. The
exact level of inaccuracy is difficult to generalize, since it depends on your
data distribution and volume of data being aggregated

+The following chart shows the relative error on a uniform distribution depending
+on the number of collected values and the requested percentile:
+
+image:images/percentiles_error.png[]
+
+It shows that precision is better for extreme percentiles. The error diminishes
+for a large number of values because the law of large numbers makes the distribution of
+values more and more uniform, so the t-digest tree can do a better job at summarizing
+it. This would not be the case for more skewed distributions.
+
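One way to get a feel for this behaviour is to measure the relative error yourself. The sketch below is only an illustration and is not the t-digest algorithm: it assumes a much simpler stand-in summary (a fixed number of equal-width bins over [0, 1)) and compares its estimate with the exact percentile on uniformly distributed values. The function names `exact_percentile`, `histogram_percentile` and `relative_error` are hypothetical helpers made up for this example.

```python
import random


def exact_percentile(values, p):
    """Exact percentile: sort everything and read the value at rank p%."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(p / 100.0 * len(ordered)))
    return ordered[index]


def histogram_percentile(values, p, bins=100):
    """Approximate percentile from a fixed number of equal-width bins on [0, 1).

    Assumption: this is a simple bounded-memory stand-in, NOT t-digest.
    """
    counts = [0] * bins
    for v in values:
        counts[min(bins - 1, int(v * bins))] += 1

    target = p / 100.0 * len(values)  # rank we are looking for
    seen = 0
    for i, count in enumerate(counts):
        if count and seen + count >= target:
            # Assume values are spread evenly inside the bin and interpolate.
            fraction = (target - seen) / count
            return (i + fraction) / bins
        seen += count
    return 1.0


def relative_error(values, p, bins=100):
    """Relative error of the binned estimate against the exact percentile."""
    exact = exact_percentile(values, p)
    return abs(histogram_percentile(values, p, bins) - exact) / exact


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    for n in (1_000, 10_000, 100_000):
        values = [rng.random() for _ in range(n)]
        errors = {p: relative_error(values, p) for p in (1, 50, 99)}
        print(n, errors)
```

The summary keeps a fixed amount of state no matter how many values it sees, which is the same trade-off described above. On uniform data the "evenly spread inside a bin" assumption tends to hold better as more values are collected. On a skewed distribution it would not, so this stand-in should not be expected to reproduce the chart.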
==== Compression

Approximate algorithms must balance memory utilization with estimation accuracy.