123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116 |
- [[search-aggregations-bucket-adjacency-matrix-aggregation]]
- === Adjacency Matrix Aggregation
- A bucket aggregation returning a form of https://en.wikipedia.org/wiki/Adjacency_matrix[adjacency matrix].
- The request provides a collection of named filter expressions, similar to the `filters` aggregation
- request.
- Each bucket in the response represents a non-empty cell in the matrix of intersecting filters.
- beta[The `adjacency_matrix` aggregation is a new feature and we may evolve its design as we get feedback on its use. As a result, the API for this feature may change in non-backwards compatible ways]
- Given filters named `A`, `B` and `C` the response would return buckets with the following names:
- [options="header"]
- |=======================
- | h|A h|B h|C
- h|A |A |A&B |A&C
- h|B | |B |B&C
- h|C | | |C
- |=======================
- The intersecting buckets e.g `A&C` are labelled using a combination of the two filter names separated by
- the ampersand character. Note that the response does not also include a "C&A" bucket as this would be the
- same set of documents as "A&C". The matrix is said to be _symmetric_ so we only return half of it. To do this we sort
- the filter name strings and always use the lowest of a pair as the value to the left of the "&" separator.
- An alternative `separator` parameter can be passed in the request if clients wish to use a separator string
- other than the default of the ampersand.
- Example:
- [source,js]
- --------------------------------------------------
- PUT /emails/message/_bulk?refresh
- { "index" : { "_id" : 1 } }
- { "accounts" : ["hillary", "sidney"]}
- { "index" : { "_id" : 2 } }
- { "accounts" : ["hillary", "donald"]}
- { "index" : { "_id" : 3 } }
- { "accounts" : ["vladimir", "donald"]}
- GET emails/message/_search
- {
- "size": 0,
- "aggs" : {
- "interactions" : {
- "adjacency_matrix" : {
- "filters" : {
- "grpA" : { "terms" : { "accounts" : ["hillary", "sidney"] }},
- "grpB" : { "terms" : { "accounts" : ["donald", "mitt"] }},
- "grpC" : { "terms" : { "accounts" : ["vladimir", "nigel"] }}
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- In the above example, we analyse email messages to see which groups of individuals
- have exchanged messages.
- We will get counts for each group individually and also a count of messages for pairs
- of groups that have recorded interactions.
- Response:
- [source,js]
- --------------------------------------------------
- {
- "took": 9,
- "timed_out": false,
- "_shards": ...,
- "hits": ...,
- "aggregations": {
- "interactions": {
- "buckets": [
- {
- "key":"grpA",
- "doc_count": 2
- },
- {
- "key":"grpA&grpB",
- "doc_count": 1
- },
- {
- "key":"grpB",
- "doc_count": 2
- },
- {
- "key":"grpB&grpC",
- "doc_count": 1
- },
- {
- "key":"grpC",
- "doc_count": 1
- }
- ]
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 9/"took": $body.took/]
- // TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
- // TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
- ==== Usage
- On its own this aggregation can provide all of the data required to create an undirected weighted graph.
- However, when used with child aggregations such as a `date_histogram` the results can provide the
- additional levels of data required to perform https://en.wikipedia.org/wiki/Dynamic_network_analysis[dynamic network analysis]
- where examining interactions _over time_ becomes important.
- ==== Limitations
- For N filters the matrix of buckets produced can be N²/2 and so there is a default maximum
- imposed of 100 filters . This setting can be changed using the `index.max_adjacency_matrix_filters` index-level setting.
|