[[search-aggregations-bucket-adjacency-matrix-aggregation]]
=== Adjacency Matrix Aggregation

A bucket aggregation returning a form of https://en.wikipedia.org/wiki/Adjacency_matrix[adjacency matrix].
The request provides a collection of named filter expressions, similar to the `filters` aggregation
request.
Each bucket in the response represents a non-empty cell in the matrix of intersecting filters.

beta[The `adjacency_matrix` aggregation is a new feature and we may evolve its design as we get feedback on its use. As a result, the API for this feature may change in non-backwards-compatible ways.]

Given filters named `A`, `B` and `C`, the response would return buckets with the following names:

[options="header"]
|=======================
|  h|A h|B    h|C
h|A |A |A&B  |A&C
h|B |  |B    |B&C
h|C |  |     |C
|=======================

The intersecting buckets, e.g. `A&C`, are labelled using a combination of the two filter names separated by
the ampersand character. Note that the response does not also include a "C&A" bucket, as this would be the
same set of documents as "A&C". The matrix is said to be _symmetric_, so we only return half of it. To do this, we sort
the filter name strings and always use the lower of a pair as the value to the left of the "&" separator.

Clients that want a separator string other than the default ampersand can pass it in the `separator`
parameter of the request.

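
The naming rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Elasticsearch implementation: `bucket_keys` is a hypothetical helper, it relies on Python's default string ordering for the sort, and it lists every cell of the upper triangle, whereas a real response omits cells that match no documents.

[source,python]
--------------------------------------------------
from itertools import combinations


def bucket_keys(filter_names, separator="&"):
    """Hypothetical helper: list the key of every upper-triangle cell.

    Names are sorted first, so in each pair the lower name ends up on the
    left of the separator and a mirrored key such as "C&A" is never built.
    """
    names = sorted(filter_names)
    keys = list(names)  # diagonal cells: one bucket per filter
    # combinations() keeps elements in input order, so a sorts before b
    keys += [f"{a}{separator}{b}" for a, b in combinations(names, 2)]
    return keys


print(bucket_keys(["C", "A", "B"]))
# ['A', 'B', 'C', 'A&B', 'A&C', 'B&C']
print(bucket_keys(["C", "A", "B"], separator="|"))
# ['A', 'B', 'C', 'A|B', 'A|C', 'B|C']
--------------------------------------------------

The six default-separator keys correspond to the six filled cells of the table for filters `A`, `B` and `C`.
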
Example:

[source,js]
--------------------------------------------------
PUT /emails/message/_bulk?refresh
{ "index" : { "_id" : 1 } }
{ "accounts" : ["hillary", "sidney"]}
{ "index" : { "_id" : 2 } }
{ "accounts" : ["hillary", "donald"]}
{ "index" : { "_id" : 3 } }
{ "accounts" : ["vladimir", "donald"]}

GET emails/message/_search
{
  "size": 0,
  "aggs" : {
    "interactions" : {
      "adjacency_matrix" : {
        "filters" : {
          "grpA" : { "terms" : { "accounts" : ["hillary", "sidney"] }},
          "grpB" : { "terms" : { "accounts" : ["donald", "mitt"] }},
          "grpC" : { "terms" : { "accounts" : ["vladimir", "nigel"] }}
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

In the above example, we analyse email messages to see which groups of individuals
have exchanged messages.
The response contains a count of messages for each group individually and also a count for each pair
of groups that have recorded interactions.

Response:

[source,js]
--------------------------------------------------
{
  "took": 9,
  "timed_out": false,
  "_shards": ...,
  "hits": ...,
  "aggregations": {
    "interactions": {
      "buckets": [
        {
          "key":"grpA",
          "doc_count": 2
        },
        {
          "key":"grpA&grpB",
          "doc_count": 1
        },
        {
          "key":"grpB",
          "doc_count": 2
        },
        {
          "key":"grpB&grpC",
          "doc_count": 1
        },
        {
          "key":"grpC",
          "doc_count": 1
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 9/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]

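
To see where these counts come from, the following sketch evaluates the three filters against the three indexed documents in plain Python. It assumes that a `terms` filter matches a document when at least one of the listed values appears in its `accounts` field; `adjacency_matrix`, `DOCS` and `FILTERS` are hypothetical names used only for this illustration and are not part of any Elasticsearch client.

[source,python]
--------------------------------------------------
from itertools import combinations

# The three documents from the bulk request: doc ID -> accounts
DOCS = {
    1: {"hillary", "sidney"},
    2: {"hillary", "donald"},
    3: {"vladimir", "donald"},
}

# Each named filter, modelled as the set of terms it accepts
FILTERS = {
    "grpA": {"hillary", "sidney"},
    "grpB": {"donald", "mitt"},
    "grpC": {"vladimir", "nigel"},
}


def adjacency_matrix(docs, filters, separator="&"):
    """Hypothetical helper: count docs per filter and per pair of filters."""
    # Doc IDs matched by each filter (a match = at least one shared account)
    matches = {
        name: {doc_id for doc_id, accounts in docs.items() if accounts & terms}
        for name, terms in filters.items()
    }
    buckets = {}
    names = sorted(matches)
    for name in names:
        if matches[name]:  # non-empty diagonal cell
            buckets[name] = len(matches[name])
    for a, b in combinations(names, 2):
        both = matches[a] & matches[b]  # docs matching both filters
        if both:  # empty cells are left out, as in the response
            buckets[f"{a}{separator}{b}"] = len(both)
    return buckets


for key, doc_count in sorted(adjacency_matrix(DOCS, FILTERS).items()):
    print(key, doc_count)
--------------------------------------------------

This prints `grpA 2`, `grpA&grpB 1`, `grpB 2`, `grpB&grpC 1` and `grpC 1`, the same counts as the response. There is no `grpA&grpC` bucket because no document matches both `grpA` and `grpC`.
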
==== Usage

On its own, this aggregation can provide all of the data required to create an undirected weighted graph.
However, when used with child aggregations such as `date_histogram`, the results can provide the
additional levels of data required to perform https://en.wikipedia.org/wiki/Dynamic_network_analysis[dynamic network analysis],
where examining interactions _over time_ becomes important.

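
As a sketch of building the undirected weighted graph mentioned above, the Python function below turns a `buckets` array like the one in the example response into node and edge weights: single-filter buckets become nodes and pair buckets become edges. `buckets_to_graph` is a hypothetical helper; it assumes the default `&` separator and that no filter name itself contains the separator string.

[source,python]
--------------------------------------------------
def buckets_to_graph(buckets, separator="&"):
    """Hypothetical helper: split buckets into node and edge weights."""
    nodes, edges = {}, {}
    for bucket in buckets:
        key, doc_count = bucket["key"], bucket["doc_count"]
        if separator in key:
            # Pair bucket, e.g. "grpA&grpB": one undirected edge
            left, right = key.split(separator, 1)
            edges[(left, right)] = doc_count
        else:
            # Single-filter bucket: a node weighted by its document count
            nodes[key] = doc_count
    return nodes, edges


# Buckets copied from the example response
buckets = [
    {"key": "grpA", "doc_count": 2},
    {"key": "grpA&grpB", "doc_count": 1},
    {"key": "grpB", "doc_count": 2},
    {"key": "grpB&grpC", "doc_count": 1},
    {"key": "grpC", "doc_count": 1},
]
nodes, edges = buckets_to_graph(buckets)
print(nodes)  # {'grpA': 2, 'grpB': 2, 'grpC': 1}
print(edges)  # {('grpA', 'grpB'): 1, ('grpB', 'grpC'): 1}
--------------------------------------------------

Because the matrix is symmetric and each pair is reported only once, every edge appears exactly once and needs no de-duplication.
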
==== Limitations

For N filters, the matrix can contain up to N(N+1)/2 buckets (N single-filter buckets plus one bucket per
distinct pair of filters), which is roughly N²/2. To keep this bounded, a default maximum of 100 filters is
imposed. This limit can be changed using the `index.max_adjacency_matrix_filters` index-level setting.
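
The N²/2 figure is an approximation: counting N diagonal cells plus one cell per distinct pair gives an exact upper bound of N(N+1)/2. A small Python check of that arithmetic follows; `max_buckets` is an illustrative name, and a real response can contain fewer buckets because empty cells are omitted.

[source,python]
--------------------------------------------------
def max_buckets(n_filters: int) -> int:
    """Upper bound on buckets for n_filters named filters.

    n single-filter buckets (the diagonal) plus n*(n-1)/2 distinct pairs,
    which simplifies to n*(n+1)/2.
    """
    return n_filters * (n_filters + 1) // 2


for n in (3, 10, 100):
    print(n, max_buckets(n))  # 3 6, then 10 55, then 100 5050
--------------------------------------------------

For the three filters in the example this gives 6, matching the table, and at the default limit of 100 filters it gives 5,050.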