[[search-aggregations-bucket-adjacency-matrix-aggregation]]
=== Adjacency Matrix Aggregation

A bucket aggregation returning a form of https://en.wikipedia.org/wiki/Adjacency_matrix[adjacency matrix].
The request provides a collection of named filter expressions, similar to the `filters` aggregation
request.
Each bucket in the response represents a non-empty cell in the matrix of intersecting filters.

Given filters named `A`, `B` and `C`, the response would return buckets with the following names:

[options="header"]
|=======================
| h|A h|B h|C
h|A |A |A&B |A&C
h|B | |B |B&C
h|C | | |C
|=======================

The intersecting buckets, e.g. `A&C`, are labelled using a combination of the two filter names separated by
the ampersand character. Note that the response does not also include a `C&A` bucket, as this would be the
same set of documents as `A&C`. The matrix is said to be _symmetric_, so we only return half of it. To do this, we sort
the filter name strings and always use the lower of each pair as the value to the left of the `&` separator.

A `separator` parameter can be passed in the request if clients wish to use a separator string
other than the default ampersand.
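
The following is a minimal sketch of this naming rule. The Python function `bucket_keys` is hypothetical and not part of Elasticsearch. It lists every cell the matrix could contain, ignoring document counts. A real response would omit empty cells, so this sketch lists more keys than a response would. It assumes filter names are ordered with Python's default string comparison.

```python
# Hypothetical sketch of the bucket-naming rule; not Elasticsearch code.
def bucket_keys(filter_names, separator="&"):
    """List every cell key an adjacency matrix over these filters could contain."""
    names = sorted(filter_names)  # assumption: Python's default string ordering
    keys = []
    for i, name in enumerate(names):
        keys.append(name)  # diagonal cell: all documents matching this filter
        for other in names[i + 1:]:
            # Upper-triangle cell: the lower name always goes left of the
            # separator, so "C&A" is never produced when "A&C" is.
            keys.append(f"{name}{separator}{other}")
    return keys


print(bucket_keys(["C", "A", "B"]))
# ['A', 'A&B', 'A&C', 'B', 'B&C', 'C']
print(bucket_keys(["C", "A", "B"], separator="|"))
# ['A', 'A|B', 'A|C', 'B', 'B|C', 'C']
```

Emitting only pairs where the first name sorts lower is what halves the symmetric matrix. Passing a different `separator` changes only how the pair keys are joined.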

Example:

[source,console]
--------------------------------------------------
PUT /emails/_bulk?refresh
{ "index" : { "_id" : 1 } }
{ "accounts" : ["hillary", "sidney"]}
{ "index" : { "_id" : 2 } }
{ "accounts" : ["hillary", "donald"]}
{ "index" : { "_id" : 3 } }
{ "accounts" : ["vladimir", "donald"]}

GET emails/_search
{
  "size": 0,
  "aggs" : {
    "interactions" : {
      "adjacency_matrix" : {
        "filters" : {
          "grpA" : { "terms" : { "accounts" : ["hillary", "sidney"] }},
          "grpB" : { "terms" : { "accounts" : ["donald", "mitt"] }},
          "grpC" : { "terms" : { "accounts" : ["vladimir", "nigel"] }}
        }
      }
    }
  }
}
--------------------------------------------------

In the above example, we analyse email messages to see which groups of individuals
have exchanged messages.
We get a count for each group individually and also a count of messages for each pair
of groups that has recorded interactions.

Response:

[source,js]
--------------------------------------------------
{
  "took": 9,
  "timed_out": false,
  "_shards": ...,
  "hits": ...,
  "aggregations": {
    "interactions": {
      "buckets": [
        {
          "key":"grpA",
          "doc_count": 2
        },
        {
          "key":"grpA&grpB",
          "doc_count": 1
        },
        {
          "key":"grpB",
          "doc_count": 2
        },
        {
          "key":"grpB&grpC",
          "doc_count": 1
        },
        {
          "key":"grpC",
          "doc_count": 1
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 9/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
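
To show where these counts come from, the following Python sketch recomputes the buckets from the three example documents. It is a simplified model, not Elasticsearch's implementation. Each `terms` filter is treated as plain set membership on the `accounts` values. The hypothetical `adjacency_matrix` function drops any cell with no matching documents, which is why no `grpA&grpC` bucket appears.

```python
# Hypothetical model of the example above; real matching is done by Elasticsearch queries.
docs = [
    {"accounts": ["hillary", "sidney"]},    # _id 1
    {"accounts": ["hillary", "donald"]},    # _id 2
    {"accounts": ["vladimir", "donald"]},   # _id 3
]

# Assumption: each `terms` filter matches a document if any account is in its set.
filters = {
    "grpA": {"hillary", "sidney"},
    "grpB": {"donald", "mitt"},
    "grpC": {"vladimir", "nigel"},
}


def adjacency_matrix(docs, filters, separator="&"):
    """Count documents per filter and per pair of filters, dropping empty cells."""
    names = sorted(filters)
    # For each filter, the set of document positions it matches.
    matches = {
        name: {i for i, doc in enumerate(docs) if filters[name] & set(doc["accounts"])}
        for name in names
    }
    buckets = []
    for i, a in enumerate(names):
        cells = [(a, matches[a])]
        cells += [(f"{a}{separator}{b}", matches[a] & matches[b]) for b in names[i + 1:]]
        for key, hits in cells:
            if hits:  # only non-empty cells become buckets
                buckets.append({"key": key, "doc_count": len(hits)})
    return buckets


for bucket in adjacency_matrix(docs, filters):
    print(bucket)
```

Document 2 matches both `grpA` and `grpB`, and document 3 matches both `grpB` and `grpC`. No document matches `grpA` and `grpC` together, so that cell is empty and is left out.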

==== Usage

On its own, this aggregation can provide all of the data required to create an undirected weighted graph.
However, when used with child aggregations such as a `date_histogram`, the results can provide the
additional levels of data required to perform https://en.wikipedia.org/wiki/Dynamic_network_analysis[dynamic network analysis],
where examining interactions _over time_ becomes important.
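
The following sketch illustrates the graph use case. The Python helper `to_weighted_graph` is hypothetical. It splits the buckets from the example response into node weights and undirected edge weights. Single-filter buckets become nodes, and intersection buckets become edges. It assumes no filter name contains the separator string; otherwise the keys could not be split unambiguously.

```python
# Hypothetical helper: turn adjacency_matrix buckets into a weighted, undirected graph.
def to_weighted_graph(buckets, separator="&"):
    nodes, edges = {}, {}
    for bucket in buckets:
        key, count = bucket["key"], bucket["doc_count"]
        if separator in key:
            # Intersection bucket: an edge weighted by the shared document count.
            # Keys already put the lower name first, so (a, b) is a canonical
            # undirected edge. Assumes filter names never contain the separator.
            a, b = key.split(separator, 1)
            edges[(a, b)] = count
        else:
            # Single-filter bucket: a node weighted by its document count.
            nodes[key] = count
    return nodes, edges


# Buckets copied from the example response above.
buckets = [
    {"key": "grpA", "doc_count": 2},
    {"key": "grpA&grpB", "doc_count": 1},
    {"key": "grpB", "doc_count": 2},
    {"key": "grpB&grpC", "doc_count": 1},
    {"key": "grpC", "doc_count": 1},
]
nodes, edges = to_weighted_graph(buckets)
print(nodes)  # {'grpA': 2, 'grpB': 2, 'grpC': 1}
print(edges)  # {('grpA', 'grpB'): 1, ('grpB', 'grpC'): 1}
```

This sketch covers only the single-level case. With a child aggregation such as `date_histogram`, each bucket would carry its own per-interval counts, and the same split could be applied per interval.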

==== Limitations

For N filters, the matrix of buckets produced can contain up to N(N+1)/2 buckets, roughly N²/2, so a default maximum
of 100 filters is imposed. This limit can be changed using the `index.max_adjacency_matrix_filters` index-level setting.
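
The bucket bound follows from simple arithmetic. N filters give N single-filter cells on the diagonal, plus N(N-1)/2 distinct pairs, for N(N+1)/2 cells in total. The Python helper `max_buckets` is hypothetical. It computes this worst case, in which every cell is non-empty.

```python
# Hypothetical helper: worst-case number of adjacency_matrix buckets for n filters.
def max_buckets(n_filters):
    diagonal = n_filters                         # one bucket per filter
    pairs = n_filters * (n_filters - 1) // 2     # one bucket per unordered pair
    return diagonal + pairs                      # equals n * (n + 1) / 2


print(max_buckets(3))    # 6
print(max_buckets(100))  # 5050
```

At the default limit of 100 filters, a single request can therefore produce up to 5050 buckets.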