[[search-aggregations-bucket-adjacency-matrix-aggregation]]
=== Adjacency matrix aggregation
++++
<titleabbrev>Adjacency matrix</titleabbrev>
++++

A bucket aggregation returning a form of {wikipedia}/Adjacency_matrix[adjacency matrix].
The request provides a collection of named filter expressions, similar to the `filters` aggregation
request.
Each bucket in the response represents a non-empty cell in the matrix of intersecting filters.

Given filters named `A`, `B` and `C` the response would return buckets with the following names:

[options="header"]
|=======================
|  h|A h|B  h|C
h|A |A |A&B |A&C
h|B |  |B   |B&C
h|C |  |    |C
|=======================

The intersecting buckets, e.g., `A&C`, are labelled using a combination of the two filter names with a default separator
of `&`. Note that the response does not also include a `C&A` bucket, as this would be the
same set of documents as `A&C`. The matrix is said to be _symmetric_, so we only return half of it. To do this, we sort
the filter name strings and always use the lower of a pair as the value to the left of the separator.
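
The labelling rule can be illustrated with a short Python sketch. It is not Elasticsearch's implementation: `bucket_keys` is a hypothetical helper used only for illustration, and it lists every key the matrix could contain, whereas a real response omits empty buckets.

```python
from itertools import combinations


def bucket_keys(filter_names, separator="&"):
    """Return every key the matrix could contain for the given filter names.

    Hypothetical helper for illustration only, not an Elasticsearch API.
    """
    names = sorted(filter_names)  # sort so the lower name always lands on the left
    keys = list(names)  # single-filter buckets: A, B, C
    # Each unordered pair appears exactly once, so A&C is produced but C&A is not.
    keys += [f"{left}{separator}{right}" for left, right in combinations(names, 2)]
    return sorted(keys)  # sorted here only to make the output easy to read


print(bucket_keys(["C", "A", "B"]))
# ['A', 'A&B', 'A&C', 'B', 'B&C', 'C']
```

Passing a different `separator` argument to the sketch mirrors the effect of the aggregation's `separator` parameter on the keys.
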

[[adjacency-matrix-agg-ex]]
==== Example

The following `interactions` aggregation uses `adjacency_matrix` to determine
which groups of individuals exchanged emails.

[source,console,id=adjacency-matrix-aggregation-example]
--------------------------------------------------
PUT emails/_bulk?refresh
{ "index" : { "_id" : 1 } }
{ "accounts" : ["hillary", "sidney"]}
{ "index" : { "_id" : 2 } }
{ "accounts" : ["hillary", "donald"]}
{ "index" : { "_id" : 3 } }
{ "accounts" : ["vladimir", "donald"]}

GET emails/_search
{
  "size": 0,
  "aggs" : {
    "interactions" : {
      "adjacency_matrix" : {
        "filters" : {
          "grpA" : { "terms" : { "accounts" : ["hillary", "sidney"] }},
          "grpB" : { "terms" : { "accounts" : ["donald", "mitt"] }},
          "grpC" : { "terms" : { "accounts" : ["vladimir", "nigel"] }}
        }
      }
    }
  }
}
--------------------------------------------------

The response contains buckets with document counts for each filter and
combination of filters. Buckets with no matching documents are excluded from the
response.

[source,console-result]
--------------------------------------------------
{
  "took": 9,
  "timed_out": false,
  "_shards": ...,
  "hits": ...,
  "aggregations": {
    "interactions": {
      "buckets": [
        {
          "key":"grpA",
          "doc_count": 2
        },
        {
          "key":"grpA&grpB",
          "doc_count": 1
        },
        {
          "key":"grpB",
          "doc_count": 2
        },
        {
          "key":"grpB&grpC",
          "doc_count": 1
        },
        {
          "key":"grpC",
          "doc_count": 1
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 9/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
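
To see where these counts come from, the following Python sketch recomputes them in memory from the three example documents. It is an illustration only and does not reflect how Elasticsearch evaluates the aggregation; `matches` and `adjacency_counts` are hypothetical helper names, and each `terms` filter is modelled as "the field contains at least one of the listed values".

```python
from itertools import combinations

# Documents and filters copied from the example request.
docs = [
    {"accounts": ["hillary", "sidney"]},
    {"accounts": ["hillary", "donald"]},
    {"accounts": ["vladimir", "donald"]},
]
filters = {
    "grpA": {"hillary", "sidney"},
    "grpB": {"donald", "mitt"},
    "grpC": {"vladimir", "nigel"},
}


def matches(doc, terms):
    # Modelled `terms` filter: true when the field holds at least one listed value.
    return bool(terms & set(doc["accounts"]))


def adjacency_counts(docs, filters, separator="&"):
    counts = {}
    names = sorted(filters)
    for name in names:  # cells on the diagonal: one filter each
        counts[name] = sum(matches(d, filters[name]) for d in docs)
    for left, right in combinations(names, 2):  # cells above the diagonal
        counts[f"{left}{separator}{right}"] = sum(
            matches(d, filters[left]) and matches(d, filters[right]) for d in docs
        )
    # Empty buckets are dropped, as in the response.
    return {key: n for key, n in sorted(counts.items()) if n > 0}


print(adjacency_counts(docs, filters))
# {'grpA': 2, 'grpA&grpB': 1, 'grpB': 2, 'grpB&grpC': 1, 'grpC': 1}
```

No document matches both `grpA` and `grpC`, so the `grpA&grpC` cell is empty and does not appear.
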

[role="child_attributes"]
[[adjacency-matrix-agg-params]]
==== Parameters

`filters`::
(Required, object)
Filters used to create buckets.
+
.Properties of `filters`
[%collapsible%open]
====
`<filter>`::
(Required, <<query-dsl,Query DSL object>>)
Query used to filter documents. The key is the filter name.
+
At least one filter is required. The total number of filters cannot exceed the
<<indices-query-bool-max-clause-count,`indices.query.bool.max_clause_count`>>
setting. See <<adjacency-matrix-agg-filter-limits>>.
====

`separator`::
(Optional, string)
Separator used to concatenate filter names. Defaults to `&`.

[[adjacency-matrix-agg-response]]
==== Response body

`key`::
(string)
Filters for the bucket. If the bucket uses multiple filters, filter names are
concatenated using a `separator`.

`doc_count`::
(integer)
Number of documents matching the bucket's filters.

[[adjacency-matrix-agg-usage]]
==== Usage

On its own, this aggregation can provide all of the data required to create an undirected weighted graph.
However, when used with child aggregations such as a `date_histogram`, the results can provide the
additional levels of data required to perform {wikipedia}/Dynamic_network_analysis[dynamic network analysis],
where examining interactions _over time_ becomes important.
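
As a rough illustration of the graph interpretation, the Python sketch below turns response buckets into node and edge weights. `buckets_to_graph` is a hypothetical helper, not part of any Elasticsearch client, and it assumes that no filter name contains the separator.

```python
def buckets_to_graph(buckets, separator="&"):
    """Split response buckets into node weights and undirected edge weights.

    Assumes filter names never contain the separator character.
    """
    nodes, edges = {}, {}
    for bucket in buckets:
        parts = bucket["key"].split(separator)
        if len(parts) == 1:
            # Single-filter bucket: the weight of one node.
            nodes[parts[0]] = bucket["doc_count"]
        else:
            # Two-filter bucket: the weight of the edge joining the two nodes.
            edges[(parts[0], parts[1])] = bucket["doc_count"]
    return nodes, edges


# Buckets copied from the example response.
buckets = [
    {"key": "grpA", "doc_count": 2},
    {"key": "grpA&grpB", "doc_count": 1},
    {"key": "grpB", "doc_count": 2},
    {"key": "grpB&grpC", "doc_count": 1},
    {"key": "grpC", "doc_count": 1},
]

nodes, edges = buckets_to_graph(buckets)
print(nodes)  # {'grpA': 2, 'grpB': 2, 'grpC': 1}
print(edges)  # {('grpA', 'grpB'): 1, ('grpB', 'grpC'): 1}
```

Because each pair is reported only once, with the lower name on the left, every edge key is already in a canonical order and needs no deduplication.
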

[[adjacency-matrix-agg-filter-limits]]
==== Filter limits

For N filters, the matrix of buckets produced can contain roughly N²/2 buckets, which can be costly.
The circuit breaker settings prevent results from producing too many buckets. To avoid excessive disk seeks,
the `indices.query.bool.max_clause_count` setting is used to limit the number of filters.
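
The growth behind the N²/2 estimate can be checked with a small Python sketch. It assumes, as the matrix layout implies, at most one bucket per filter plus one per unordered pair of filters, which gives N(N+1)/2 buckets in the worst case; `max_buckets` is a hypothetical helper name.

```python
def max_buckets(n_filters):
    """Upper bound on buckets: one per filter plus one per unordered pair."""
    pairs = n_filters * (n_filters - 1) // 2
    return n_filters + pairs


for n in (3, 10, 100):
    print(n, max_buckets(n))
# 3 6
# 10 55
# 100 5050
```

For 100 filters the bound is 5,050 buckets, close to 100²/2 = 5,000, and each bucket is multiplied again by any child aggregation nested under it.
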