[[search-aggregations-bucket-adjacency-matrix-aggregation]]
=== Adjacency Matrix Aggregation

A bucket aggregation returning a form of https://en.wikipedia.org/wiki/Adjacency_matrix[adjacency matrix].
The request provides a collection of named filter expressions, similar to the `filters` aggregation
request.
Each bucket in the response represents a non-empty cell in the matrix of intersecting filters.

Given filters named `A`, `B` and `C`, the response would return buckets with the following names:

[options="header"]
|=======================
|  h|A   h|B   h|C
h|A |A   |A&B  |A&C
h|B |    |B    |B&C
h|C |    |     |C
|=======================

The intersecting buckets, e.g. `A&C`, are labelled using a combination of the two filter names separated by
the ampersand character. Note that the response does not also include a `C&A` bucket, as this would be the
same set of documents as `A&C`. Because the matrix is _symmetric_, only half of it is returned. To achieve this,
the filter names are sorted and the lower name of each pair is always placed to the left of the `&` separator.

The optional `separator` parameter can be passed in the request if clients wish to use a separator string
other than the default ampersand.
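
The naming rule above can be sketched in Python. This is an illustrative model of the key layout only, not Elasticsearch's implementation: the function name `adjacency_bucket_keys` is hypothetical, and it lists every possible cell, whereas a real response omits empty buckets.

```python
def adjacency_bucket_keys(filter_names, separator="&"):
    """List every key an adjacency matrix over filter_names could produce.

    Hypothetical helper: models the naming rule only. Names are sorted so the
    lower name of each pair is always on the left of the separator.
    """
    names = sorted(filter_names)
    keys = []
    for i, left in enumerate(names):
        keys.append(left)  # diagonal cell: the filter on its own
        for right in names[i + 1:]:
            # upper-triangle cell; the mirrored "right&left" key is never built
            keys.append(f"{left}{separator}{right}")
    return keys


print(adjacency_bucket_keys(["C", "A", "B"]))
# ['A', 'A&B', 'A&C', 'B', 'B&C', 'C']
print(adjacency_bucket_keys(["C", "A", "B"], separator="^"))
# ['A', 'A^B', 'A^C', 'B', 'B^C', 'C']
```

Sorting the names once up front is what guarantees that `C&A` can never appear alongside `A&C`.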

Example:

[source,js]
--------------------------------------------------
PUT /emails/_doc/_bulk?refresh
{ "index" : { "_id" : 1 } }
{ "accounts" : ["hillary", "sidney"]}
{ "index" : { "_id" : 2 } }
{ "accounts" : ["hillary", "donald"]}
{ "index" : { "_id" : 3 } }
{ "accounts" : ["vladimir", "donald"]}

GET emails/_search
{
  "size": 0,
  "aggs" : {
    "interactions" : {
      "adjacency_matrix" : {
        "filters" : {
          "grpA" : { "terms" : { "accounts" : ["hillary", "sidney"] }},
          "grpB" : { "terms" : { "accounts" : ["donald", "mitt"] }},
          "grpC" : { "terms" : { "accounts" : ["vladimir", "nigel"] }}
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

In the above example, we analyse email messages to see which groups of individuals
have exchanged messages.
The response contains a count of messages for each group individually, and also a count of messages
for each pair of groups that have recorded interactions (messages matching both groups' filters).

Response:

[source,js]
--------------------------------------------------
{
  "took": 9,
  "timed_out": false,
  "_shards": ...,
  "hits": ...,
  "aggregations": {
    "interactions": {
      "buckets": [
        {
          "key":"grpA",
          "doc_count": 2
        },
        {
          "key":"grpA&grpB",
          "doc_count": 1
        },
        {
          "key":"grpB",
          "doc_count": 2
        },
        {
          "key":"grpB&grpC",
          "doc_count": 1
        },
        {
          "key":"grpC",
          "doc_count": 1
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 9/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
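
To see where these counts come from, the following in-memory Python sketch replays the example: it models each `terms` filter as "the document contains at least one listed account", intersects the matching document sets pairwise, and drops empty cells. The function `adjacency_matrix` is a hypothetical illustration of the semantics, not how Elasticsearch computes the aggregation across shards.

```python
from itertools import combinations

# The three documents indexed by the bulk request (only the "accounts" field).
docs = {
    1: {"hillary", "sidney"},
    2: {"hillary", "donald"},
    3: {"vladimir", "donald"},
}

# Assumption: a `terms` filter matches if any listed account is present.
filters = {
    "grpA": {"hillary", "sidney"},
    "grpB": {"donald", "mitt"},
    "grpC": {"vladimir", "nigel"},
}


def adjacency_matrix(docs, filters, separator="&"):
    """Hypothetical model: count docs per filter and per pair of filters."""
    # Matching doc ids for each filter (the matrix diagonal).
    matches = {
        name: {doc_id for doc_id, accounts in docs.items() if accounts & terms}
        for name, terms in filters.items()
    }
    counts = {name: len(ids) for name, ids in matches.items()}
    # Sorted names, so the lower name of each pair ends up on the left.
    for a, b in combinations(sorted(matches), 2):
        counts[f"{a}{separator}{b}"] = len(matches[a] & matches[b])
    # Keep only non-empty cells, ordered by key as in the response above.
    return {key: n for key, n in sorted(counts.items()) if n > 0}


print(adjacency_matrix(docs, filters))
# {'grpA': 2, 'grpA&grpB': 1, 'grpB': 2, 'grpB&grpC': 1, 'grpC': 1}
```

No `grpA&grpC` bucket appears because no document matches both filters, so that cell is empty.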

==== Usage

On its own, this aggregation provides all of the data required to create an undirected weighted graph.
However, when used with child aggregations such as a `date_histogram`, the results can provide the
additional levels of data required to perform https://en.wikipedia.org/wiki/Dynamic_network_analysis[dynamic network analysis],
where examining interactions _over time_ becomes important.
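
As a sketch of the graph interpretation, the Python function below splits the returned buckets into node weights (single-filter keys) and undirected edge weights (pair keys). The name `buckets_to_graph` is hypothetical, and it assumes no filter name contains the separator string, since otherwise a key could not be split unambiguously.

```python
def buckets_to_graph(buckets, separator="&"):
    """Hypothetical helper: turn adjacency_matrix buckets into a weighted graph.

    Assumes filter names never contain `separator`.
    """
    nodes, edges = {}, {}
    for bucket in buckets:
        key, count = bucket["key"], bucket["doc_count"]
        if separator in key:
            # Pair keys already have the lower name on the left, so the
            # (a, b) tuple is a canonical key for the undirected edge.
            a, b = key.split(separator, 1)
            edges[(a, b)] = count
        else:
            nodes[key] = count
    return nodes, edges


# Buckets copied from the example response above.
buckets = [
    {"key": "grpA", "doc_count": 2},
    {"key": "grpA&grpB", "doc_count": 1},
    {"key": "grpB", "doc_count": 2},
    {"key": "grpB&grpC", "doc_count": 1},
    {"key": "grpC", "doc_count": 1},
]
nodes, edges = buckets_to_graph(buckets)
print(nodes)  # {'grpA': 2, 'grpB': 2, 'grpC': 1}
print(edges)  # {('grpA', 'grpB'): 1, ('grpB', 'grpC'): 1}
```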

==== Limitations

For N filters, the matrix can contain up to N(N+1)/2 buckets (N single-filter buckets plus N(N-1)/2
pairwise intersections), which grows roughly as N²/2. For this reason a default maximum of 100 filters
is imposed. This limit can be changed using the `index.max_adjacency_matrix_filters` index-level setting.
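
The bucket bound is easy to check numerically. The hypothetical helper below computes the worst case, assuming every cell is non-empty; real responses are usually smaller because empty cells are omitted.

```python
def max_adjacency_buckets(n_filters):
    """Upper bound on buckets: n diagonal cells plus n*(n-1)/2 distinct pairs."""
    return n_filters + n_filters * (n_filters - 1) // 2


for n in (3, 10, 100):
    print(n, max_adjacency_buckets(n))
# 3 6
# 10 55
# 100 5050
```

At the default limit of 100 filters, a single response can therefore hold up to 5050 buckets.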