[role="xpack"]
[testenv="basic"]
[[rollup-search-limitations]]
=== {rollup-cap} search limitations

experimental[]

While we feel the Rollup function is extremely flexible, the nature of summarizing data means there will be some limitations. Once
live data is thrown away, you will always lose some flexibility.

This page highlights the major limitations so that you are aware of them.

[float]
==== Only one {rollup} index per search

When using the <<rollup-search>> endpoint, the `index` parameter accepts one or more indices. These can be a mix of regular, non-rollup
indices and rollup indices. However, only one rollup index can be specified. The exact rules for the `index` parameter are as
follows:

- At least one index/index-pattern must be specified. This can be either a rollup or non-rollup index. Omitting the index parameter,
or using `_all`, is not permitted.
- Multiple non-rollup indices may be specified.
- Only one rollup index may be specified. If more than one is supplied, an exception will be thrown.
- Index patterns may be used, but if they match more than one rollup index, an exception will be thrown.
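
As a sketch of the rules above, the following request searches one live index together with one rollup index. The `sensor-1` index name and the `max` metric on `temperature` are assumptions made for illustration; the request only succeeds if a job in `sensor_rollup` actually stored that metric:

[source,console]
--------------------------------------------------
GET sensor-1,sensor_rollup/_rollup_search
{
  "size": 0,
  "aggregations": {
    "max_temperature": {
      "max": {
        "field": "temperature"
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative sketch, index names are assumptions]

Adding a second rollup index to the list, or using a pattern that matches more than one rollup index, would cause the request to be rejected.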

This limitation is driven by the logic that decides which jobs are the "best" for any given query. If you have ten jobs stored in a single
index, which cover the source data with varying degrees of completeness and different intervals, the query needs to determine which set
of jobs to actually search. Incorrect decisions can lead to inaccurate aggregation results (e.g. over-counting doc counts, or bad metrics).
Needless to say, this is a technically challenging piece of code.

To help simplify the problem, we have limited search to just one rollup index at a time (which may contain multiple jobs). In the future we
may be able to open this up to multiple rollup indices.

[float]
[[aggregate-stored-only]]
==== Can only aggregate what's been stored

A perhaps obvious limitation, but rollups can only aggregate on data that has been stored in the rollups. If you don't configure the
rollup job to store metrics about the `price` field, you won't be able to use the `price` field in any query or aggregation.

For example, the `temperature` field in the following query has been stored in a rollup job, but not with an `avg` metric, so
using `avg` here is not allowed:

[source,console]
--------------------------------------------------
GET sensor_rollup/_rollup_search
{
  "size": 0,
  "aggregations": {
    "avg_temperature": {
      "avg": {
        "field": "temperature"
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:sensor_prefab_data]
// TEST[catch:/illegal_argument_exception/]

The response indicates that this combination of field and aggregation is not possible, because no rollup job was found that contains it:

[source,console-result]
----
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "There is not a rollup job that has a [avg] agg with name [avg_temperature] which also satisfies all requirements of query.",
        "stack_trace": ...
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "There is not a rollup job that has a [avg] agg with name [avg_temperature] which also satisfies all requirements of query.",
    "stack_trace": ...
  },
  "status": 400
}
----
// TESTRESPONSE[s/"stack_trace": \.\.\./"stack_trace": $body.$_path/]
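
One way to make a query like the one above valid is to have a job store the `avg` metric for `temperature`. The following is a minimal sketch, not the configuration used on this page: the job name `sensor_with_avg`, the `sensor-*` index pattern, the `timestamp` field, the cron schedule, and the page size are all assumptions, and older releases spell `fixed_interval` as `interval`:

[source,console]
--------------------------------------------------
PUT _rollup/job/sensor_with_avg
{
  "index_pattern": "sensor-*",
  "rollup_index": "sensor_rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "timestamp",
      "fixed_interval": "1h"
    }
  },
  "metrics": [
    {
      "field": "temperature",
      "metrics": [ "min", "max", "avg" ]
    }
  ]
}
--------------------------------------------------
// TEST[skip:illustrative sketch, job and index names are assumptions]

Only aggregations listed under `metrics` for a field, here `min`, `max`, and `avg`, would then be available for that field in a rollup search.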

[float]
==== Interval granularity

Rollups are stored at a certain granularity, as defined by the `date_histogram` group in the configuration. This means you
can only search/aggregate the rollup data with an interval that is greater than or equal to the configured rollup interval.

For example, if data is rolled up at hourly intervals, the <<rollup-search>> API can aggregate on any time interval
hourly or greater. Intervals that are less than an hour will throw an exception, since the data simply doesn't
exist for finer granularities.
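
To illustrate, assume that the job behind `sensor_rollup` was configured with an hourly `date_histogram` on a `timestamp` field. Both of those are assumptions, as is the `fixed_interval` parameter name, which older releases spell `interval`. Under those assumptions, a request for six-hour buckets should be accepted, whereas the same request with a `30m` interval would throw an exception:

[source,console]
--------------------------------------------------
GET sensor_rollup/_rollup_search
{
  "size": 0,
  "aggregations": {
    "per_six_hours": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "6h"
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative sketch, field name and job interval are assumptions]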

[[rollup-search-limitations-intervals]]
.Requests must be multiples of the config
**********************************
Perhaps not immediately apparent, but the interval specified in an aggregation request must be a whole
multiple of the configured interval. If the job was configured to roll up on `3d` intervals, you can only
query and aggregate on multiples of three (`3d`, `6d`, `9d`, etc.).

A non-multiple wouldn't work, since the rolled up data wouldn't cleanly "overlap" with the buckets generated
by the aggregation, leading to incorrect results.

For that reason, an error is thrown if a whole multiple of the configured interval isn't found.
**********************************

Because the RollupSearch endpoint can "upsample" intervals (that is, combine stored buckets into coarser ones), there is no need to
configure jobs with multiple intervals (hourly, daily, etc.). It's recommended to configure a single job with the smallest granularity
that is needed, and allow the search endpoint to upsample as needed.

That said, if multiple jobs are present in a single rollup index with varying intervals, the search endpoint will identify and use the job(s)
with the largest interval to satisfy the search request.

[float]
==== Limited querying components

The Rollup functionality allows a `query` in the search request, but with a limited subset of components. The queries currently allowed are:

- Term Query
- Terms Query
- Range Query
- MatchAll Query
- Any compound query (Boolean, Boosting, ConstantScore, etc.)

Furthermore, these queries can only use fields that were also saved in the rollup job as a `group`.
If you wish to filter on a keyword `hostname` field, that field must have been configured in the rollup job under a `terms` grouping.

If you attempt to use an unsupported query, or the query references a field that wasn't configured in the rollup job, an exception will be
thrown. We expect the list of supported queries to grow over time as more are implemented.
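
The following sketch combines two of the allowed query types inside a compound `bool` query. It assumes that the job stored `hostname` under a `terms` group, `timestamp` under the `date_histogram` group, and a `max` metric for `temperature`; the value `host-a` is also made up for illustration:

[source,console]
--------------------------------------------------
GET sensor_rollup/_rollup_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "hostname": "host-a" } },
        { "range": { "timestamp": { "gte": "now-7d" } } }
      ]
    }
  },
  "aggregations": {
    "max_temperature": {
      "max": {
        "field": "temperature"
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative sketch, field names and values are assumptions]

If `hostname` had not been configured as a group, or if the `term` clause were replaced by an unsupported query type, the request would throw an exception.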

[float]
==== Timezones

Rollup documents are stored in the timezone of the `date_histogram` group configuration in the job. If no timezone is specified, the default
is to roll up timestamps in `UTC`.
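
As a sketch, the timezone is set with the `time_zone` parameter of the `date_histogram` group in the job configuration. The fragment below shows only that group; the `timestamp` field, the hourly interval, and the `America/New_York` zone are assumptions chosen for illustration:

[source,js]
--------------------------------------------------
"groups": {
  "date_histogram": {
    "field": "timestamp",
    "fixed_interval": "1h",
    "time_zone": "America/New_York"
  }
}
--------------------------------------------------
// NOTCONSOLE

Omitting `time_zone` from the fragment leaves the job at the `UTC` default.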