- [[search-aggregations-bucket-adjacency-matrix-aggregation]]
- === Adjacency matrix aggregation
- ++++
- <titleabbrev>Adjacency matrix</titleabbrev>
- ++++
- A bucket aggregation returning a form of {wikipedia}/Adjacency_matrix[adjacency matrix].
- The request provides a collection of named filter expressions, similar to the `filters` aggregation
- request.
- Each bucket in the response represents a non-empty cell in the matrix of intersecting filters.
- Given filters named `A`, `B` and `C` the response would return buckets with the following names:
- [options="header"]
- |=======================
- | h|A h|B h|C
- h|A |A |A&B |A&C
- h|B | |B |B&C
- h|C | | |C
- |=======================
- The intersecting buckets, such as `A&C`, are labelled using a combination of the two filter names with a default separator
- of `&`. The response does not include a `C&A` bucket, because it would contain the
- same set of documents as `A&C`. The matrix is _symmetric_, so only half of it is returned. To do this, the filter
- name strings are sorted and the lower of each pair is always used as the value to the left of the separator.
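The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not the aggregation's actual implementation; the function name `matrix_keys` is hypothetical, and the sketch lists every possible cell, whereas the response only contains non-empty ones.

```python
from itertools import combinations


def matrix_keys(filter_names, separator="&"):
    """Return every possible bucket key for the given filter names.

    Hypothetical helper: sorts the names so that the lower name of each
    pair always ends up to the left of the separator.
    """
    names = sorted(filter_names)
    keys = list(names)  # diagonal cells: A, B, C
    # combinations() on a sorted list yields each unordered pair once,
    # already in (lower, higher) order, so C&A is never produced.
    keys += [f"{a}{separator}{b}" for a, b in combinations(names, 2)]
    return sorted(keys)


print(matrix_keys(["C", "A", "B"]))
```

Passing a different `separator` value changes only the string placed between the two names.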
- [[adjacency-matrix-agg-ex]]
- ==== Example
- The following `interactions` aggregation uses `adjacency_matrix` to determine
- which groups of individuals exchanged emails.
- [source,console,id=adjacency-matrix-aggregation-example]
- --------------------------------------------------
- PUT emails/_bulk?refresh
- { "index" : { "_id" : 1 } }
- { "accounts" : ["hillary", "sidney"]}
- { "index" : { "_id" : 2 } }
- { "accounts" : ["hillary", "donald"]}
- { "index" : { "_id" : 3 } }
- { "accounts" : ["vladimir", "donald"]}
- GET emails/_search
- {
-   "size": 0,
-   "aggs" : {
-     "interactions" : {
-       "adjacency_matrix" : {
-         "filters" : {
-           "grpA" : { "terms" : { "accounts" : ["hillary", "sidney"] }},
-           "grpB" : { "terms" : { "accounts" : ["donald", "mitt"] }},
-           "grpC" : { "terms" : { "accounts" : ["vladimir", "nigel"] }}
-         }
-       }
-     }
-   }
- }
- --------------------------------------------------
- The response contains buckets with document counts for each filter and
- combination of filters. Buckets with no matching documents are excluded from the
- response.
- [source,console-result]
- --------------------------------------------------
- {
-   "took": 9,
-   "timed_out": false,
-   "_shards": ...,
-   "hits": ...,
-   "aggregations": {
-     "interactions": {
-       "buckets": [
-         {
-           "key":"grpA",
-           "doc_count": 2
-         },
-         {
-           "key":"grpA&grpB",
-           "doc_count": 1
-         },
-         {
-           "key":"grpB",
-           "doc_count": 2
-         },
-         {
-           "key":"grpB&grpC",
-           "doc_count": 1
-         },
-         {
-           "key":"grpC",
-           "doc_count": 1
-         }
-       ]
-     }
-   }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 9/"took": $body.took/]
- // TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
- // TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
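To make the counts in the response easier to verify by hand, the following Python sketch recomputes them locally from the three sample documents. It is an assumption-laden model rather than how Elasticsearch evaluates the aggregation: each `terms` filter is treated as a simple set-overlap test, and the name `adjacency_counts` is hypothetical.

```python
from itertools import combinations

# The three documents indexed by the bulk request above.
docs = [
    {"accounts": ["hillary", "sidney"]},
    {"accounts": ["hillary", "donald"]},
    {"accounts": ["vladimir", "donald"]},
]

# Each filter is modelled as the set of account names it accepts.
groups = {
    "grpA": {"hillary", "sidney"},
    "grpB": {"donald", "mitt"},
    "grpC": {"vladimir", "nigel"},
}


def adjacency_counts(docs, groups, separator="&"):
    """Count documents per filter and per pair of filters (hypothetical helper)."""
    counts = {}
    for doc in docs:
        accounts = set(doc["accounts"])
        # A document matches a filter if it shares at least one account with it.
        matched = sorted(name for name, members in groups.items() if accounts & members)
        for name in matched:
            counts[name] = counts.get(name, 0) + 1
        for a, b in combinations(matched, 2):
            key = f"{a}{separator}{b}"
            counts[key] = counts.get(key, 0) + 1
    # Cells that never matched are simply absent, as in the response.
    return dict(sorted(counts.items()))


print(adjacency_counts(docs, groups))
```

For the sample data this yields the same five buckets and counts as the response above; `grpA&grpC` is absent because no document matches both filters.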
- [role="child_attributes"]
- [[adjacency-matrix-agg-params]]
- ==== Parameters
- `filters`::
- (Required, object)
- Filters used to create buckets.
- +
- .Properties of `filters`
- [%collapsible%open]
- ====
- `<filter>`::
- (Required, <<query-dsl,Query DSL object>>)
- Query used to filter documents. The key is the filter name.
- +
- At least one filter is required. The total number of filters cannot exceed the
- <<indices-query-bool-max-clause-count,`indices.query.bool.max_clause_count`>>
- setting. See <<adjacency-matrix-agg-filter-limits>>.
- ====
- `separator`::
- (Optional, string)
- Separator used to concatenate filter names. Defaults to `&`.
- [[adjacency-matrix-agg-response]]
- ==== Response body
- `key`::
- (string)
- Filters for the bucket. If the bucket uses multiple filters, filter names are
- concatenated using a `separator`.
- `doc_count`::
- (integer)
- Number of documents matching the bucket's filters.
- [[adjacency-matrix-agg-usage]]
- ==== Usage
- On its own this aggregation can provide all of the data required to create an undirected weighted graph.
- However, when used with child aggregations such as a `date_histogram` the results can provide the
- additional levels of data required to perform {wikipedia}/Dynamic_network_analysis[dynamic network analysis]
- where examining interactions _over time_ becomes important.
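As a rough illustration of the graph use case, the sketch below turns the `buckets` array of a response into node and edge weights. It assumes that filter names never contain the separator, so that a key splits cleanly into one or two names; the function name `buckets_to_graph` is hypothetical and not part of any client library.

```python
def buckets_to_graph(buckets, separator="&"):
    """Split buckets into node weights and undirected edge weights.

    Hypothetical helper. Assumes filter names do not contain `separator`.
    """
    nodes, edges = {}, {}
    for bucket in buckets:
        parts = bucket["key"].split(separator)
        if len(parts) == 1:
            # Single-filter bucket: weight of the node itself.
            nodes[parts[0]] = bucket["doc_count"]
        elif len(parts) == 2:
            # Two-filter bucket: weight of the undirected edge between them.
            edges[(parts[0], parts[1])] = bucket["doc_count"]
        else:
            raise ValueError(f"ambiguous bucket key: {bucket['key']!r}")
    return nodes, edges


# Buckets copied from the example response.
buckets = [
    {"key": "grpA", "doc_count": 2},
    {"key": "grpA&grpB", "doc_count": 1},
    {"key": "grpB", "doc_count": 2},
    {"key": "grpB&grpC", "doc_count": 1},
    {"key": "grpC", "doc_count": 1},
]
nodes, edges = buckets_to_graph(buckets)
print(nodes)
print(edges)
```

Because each pair appears only once in the response, every edge is stored under a single `(lower, higher)` tuple, which fits an undirected graph.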
- [[adjacency-matrix-agg-filter-limits]]
- ==== Filter limits
- For N filters, the matrix can contain up to N(N+1)/2 buckets (roughly N²/2), which can be costly to compute.
- The circuit breaker settings prevent results from producing too many buckets. To avoid excessive disk seeks,
- the `indices.query.bool.max_clause_count` setting is used to limit the number of filters.
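To get a feel for how quickly the matrix grows, the following sketch counts the cells on and above the diagonal of an N-by-N symmetric matrix, which is the most buckets a single request could return. The function name `max_buckets` is hypothetical; actual responses are usually smaller because empty cells are omitted.

```python
def max_buckets(n_filters):
    """Upper bound on buckets for n_filters filters (hypothetical helper).

    Counts N single-filter cells plus N*(N-1)/2 distinct pairs,
    which simplifies to N*(N+1)/2.
    """
    return n_filters * (n_filters + 1) // 2


for n in (3, 10, 100):
    print(n, max_buckets(n))
```

With the three filters `A`, `B` and `C` from the table at the top of this page, the bound is six, matching the six named cells.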