[role="xpack"]
[testenv="basic"]
[[search-aggregations-pipeline-cumulative-cardinality-aggregation]]
=== Cumulative cardinality aggregation
++++
<titleabbrev>Cumulative cardinality</titleabbrev>
++++

A parent pipeline aggregation that calculates the cumulative cardinality in a parent `histogram` (or `date_histogram`)
aggregation. The specified metric must be a cardinality aggregation and the enclosing histogram
must have `min_doc_count` set to `0` (default for `histogram` aggregations).

The `cumulative_cardinality` aggregation is useful for finding "total new items", such as the number of new visitors to your
website each day. A regular cardinality aggregation tells you how many unique visitors came each day, but does not
differentiate between "new" and "repeat" visitors. The cumulative cardinality aggregation can be used to determine
how many of each day's unique visitors are "new".
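
To make the distinction between daily and cumulative counts concrete, here is a minimal Python sketch of the idea. It is not how Elasticsearch computes the result: the visitor names are hypothetical, and exact sets stand in for the approximate counts that the `cardinality` aggregation returns.

```python
# Hypothetical visitor IDs per day (illustrative data, not part of the docs).
daily_visitors = {
    "2019-01-01": {"alice", "bob"},
    "2019-01-02": {"bob", "carol"},
    "2019-01-03": {"alice", "carol", "dave"},
}


def cumulative_cardinality(days):
    """Return (day, distinct_that_day, distinct_so_far) tuples in date order."""
    seen = set()  # every visitor observed up to and including the current day
    rows = []
    for day in sorted(days):
        seen |= days[day]
        rows.append((day, len(days[day]), len(seen)))
    return rows


rows = cumulative_cardinality(daily_visitors)
for day, distinct, total in rows:
    print(day, distinct, total)
```

The second number in each row corresponds to what a plain cardinality aggregation reports per bucket. The third is the running distinct count, which only grows when a visitor appears for the first time.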

==== Syntax

A `cumulative_cardinality` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "cumulative_cardinality": {
    "buckets_path": "my_cardinality_agg"
  }
}
--------------------------------------------------
// NOTCONSOLE

[[cumulative-cardinality-params]]
.`cumulative_cardinality` Parameters
[options="header"]
|===
|Parameter Name |Description |Required |Default Value
|`buckets_path` |The path to the cardinality aggregation we wish to find the cumulative cardinality for (see <<buckets-path-syntax>> for more
 details) |Required |
|`format` |Format to apply to the output value of this aggregation |Optional |`null`
|===

The following snippet calculates the cumulative cardinality of the total daily `users`:

[source,console]
--------------------------------------------------
GET /user_hits/_search
{
  "size": 0,
  "aggs": {
    "users_per_day": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "distinct_users": {
          "cardinality": {
            "field": "user_id"
          }
        },
        "total_new_users": {
          "cumulative_cardinality": {
            "buckets_path": "distinct_users" <1>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:user_hits]

<1> `buckets_path` instructs this aggregation to use the output of the `distinct_users` aggregation for the cumulative cardinality.

And the following may be the response:

[source,console-result]
--------------------------------------------------
{
  "took": 11,
  "timed_out": false,
  "_shards": ...,
  "hits": ...,
  "aggregations": {
    "users_per_day": {
      "buckets": [
        {
          "key_as_string": "2019-01-01T00:00:00.000Z",
          "key": 1546300800000,
          "doc_count": 2,
          "distinct_users": {
            "value": 2
          },
          "total_new_users": {
            "value": 2
          }
        },
        {
          "key_as_string": "2019-01-02T00:00:00.000Z",
          "key": 1546387200000,
          "doc_count": 2,
          "distinct_users": {
            "value": 2
          },
          "total_new_users": {
            "value": 3
          }
        },
        {
          "key_as_string": "2019-01-03T00:00:00.000Z",
          "key": 1546473600000,
          "doc_count": 3,
          "distinct_users": {
            "value": 3
          },
          "total_new_users": {
            "value": 4
          }
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 11/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]

Note how the second day, `2019-01-02`, has two distinct users but the `total_new_users` metric generated by the
cumulative pipeline aggregation only increments to three. This means that only one of the two users that day was
new; the other had already been seen on the previous day. This happens again on the third day, where only
one of the three users is completely new.
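
The reasoning above can be reproduced client-side. The following Python sketch is illustrative only: the helper name `split_new_and_returning` is hypothetical, and the buckets are copied from the example response and trimmed to the fields used here. It assumes the buckets arrive in chronological order and treats every user in the first bucket as new.

```python
# Buckets copied from the example response, trimmed to the fields used here.
buckets = [
    {
        "key_as_string": "2019-01-01T00:00:00.000Z",
        "distinct_users": {"value": 2},
        "total_new_users": {"value": 2},
    },
    {
        "key_as_string": "2019-01-02T00:00:00.000Z",
        "distinct_users": {"value": 2},
        "total_new_users": {"value": 3},
    },
    {
        "key_as_string": "2019-01-03T00:00:00.000Z",
        "distinct_users": {"value": 3},
        "total_new_users": {"value": 4},
    },
]


def split_new_and_returning(buckets):
    """Return (day, new_users, returning_users) for each bucket."""
    previous_total = 0  # assumption: nobody was seen before the first bucket
    result = []
    for bucket in buckets:
        total = bucket["total_new_users"]["value"]
        new = total - previous_total
        returning = bucket["distinct_users"]["value"] - new
        result.append((bucket["key_as_string"][:10], new, returning))
        previous_total = total
    return result


breakdown = split_new_and_returning(buckets)
for day, new, returning in breakdown:
    print(f"{day}: {new} new, {returning} returning")
```

Because cardinality values are approximate, a breakdown like this is an estimate on real data rather than an exact count.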

==== Incremental cumulative cardinality

The `cumulative_cardinality` aggregation shows the total distinct count since the beginning of the time period
being queried. Sometimes, however, it is useful to see the "incremental" count: how many new users
are added each day, rather than the total cumulative count.

This can be accomplished by adding a `derivative` aggregation to our query:

[source,console]
--------------------------------------------------
GET /user_hits/_search
{
  "size": 0,
  "aggs": {
    "users_per_day": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "distinct_users": {
          "cardinality": {
            "field": "user_id"
          }
        },
        "total_new_users": {
          "cumulative_cardinality": {
            "buckets_path": "distinct_users"
          }
        },
        "incremental_new_users": {
          "derivative": {
            "buckets_path": "total_new_users"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:user_hits]

And the following may be the response:

[source,console-result]
--------------------------------------------------
{
  "took": 11,
  "timed_out": false,
  "_shards": ...,
  "hits": ...,
  "aggregations": {
    "users_per_day": {
      "buckets": [
        {
          "key_as_string": "2019-01-01T00:00:00.000Z",
          "key": 1546300800000,
          "doc_count": 2,
          "distinct_users": {
            "value": 2
          },
          "total_new_users": {
            "value": 2
          }
        },
        {
          "key_as_string": "2019-01-02T00:00:00.000Z",
          "key": 1546387200000,
          "doc_count": 2,
          "distinct_users": {
            "value": 2
          },
          "total_new_users": {
            "value": 3
          },
          "incremental_new_users": {
            "value": 1.0
          }
        },
        {
          "key_as_string": "2019-01-03T00:00:00.000Z",
          "key": 1546473600000,
          "doc_count": 3,
          "distinct_users": {
            "value": 3
          },
          "total_new_users": {
            "value": 4
          },
          "incremental_new_users": {
            "value": 1.0
          }
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 11/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
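
In the response above, the first bucket has no `incremental_new_users` entry, since there is no earlier bucket to take a derivative against. A client reading the results therefore needs to handle the missing key. The Python sketch below shows one way to do so. The helper name `incremental_by_day` is hypothetical, and falling back to the cumulative total for the first bucket is an assumption that fits this example, not documented behavior.

```python
# Buckets copied from the example response, trimmed to the fields used here.
buckets = [
    {
        "key_as_string": "2019-01-01T00:00:00.000Z",
        "total_new_users": {"value": 2},
    },
    {
        "key_as_string": "2019-01-02T00:00:00.000Z",
        "total_new_users": {"value": 3},
        "incremental_new_users": {"value": 1.0},
    },
    {
        "key_as_string": "2019-01-03T00:00:00.000Z",
        "total_new_users": {"value": 4},
        "incremental_new_users": {"value": 1.0},
    },
]


def incremental_by_day(buckets):
    """Map each day to its count of new users, tolerating a missing derivative."""
    result = {}
    for bucket in buckets:
        day = bucket["key_as_string"][:10]
        if "incremental_new_users" in bucket:
            result[day] = bucket["incremental_new_users"]["value"]
        else:
            # Assumption: with no earlier bucket, every user counted so far is new.
            result[day] = float(bucket["total_new_users"]["value"])
    return result


new_users = incremental_by_day(buckets)
print(new_users)
```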