[role="xpack"]
[testenv="basic"]
[[rollup-put-job]]
=== Create {rollup-jobs} API
[subs="attributes"]
++++
<titleabbrev>Create {rollup-jobs}</titleabbrev>
++++

Creates a {rollup-job}.

experimental[]

[[rollup-put-job-api-request]]
==== {api-request-title}

`PUT _rollup/job/<job_id>`
[[rollup-put-job-api-prereqs]]
==== {api-prereq-title}

* If the {es} {security-features} are enabled, you must have `manage` or
`manage_rollup` cluster privileges to use this API. For more information, see
<<security-privileges>>.

[[rollup-put-job-api-desc]]
==== {api-description-title}

The {rollup-job} configuration contains all the details about how the job should
run, when it indexes documents, and what future queries will be able to execute
against the rollup index.

There are three main sections to the job configuration: the logistical details
about the job (cron schedule, etc.), the fields that are used for grouping, and
what metrics to collect for each group.

Jobs are created in a `STOPPED` state. You can start them with the
<<rollup-start-job,start {rollup-jobs} API>>.
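For instance, a job with the hypothetical ID `sensor` could be started with a
single call once it has been created:

[source,console]
----
POST _rollup/job/sensor/_start
----
// TEST[skip:requires an existing rollup job]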
[[rollup-put-job-api-path-params]]
==== {api-path-parms-title}

`<job_id>`::
(Required, string) Identifier for the {rollup-job}. This can be any
alphanumeric string and uniquely identifies the data that is associated with
the {rollup-job}. The ID is persistent; it is stored with the rolled up data.
If you create a job, let it run for a while, then delete the job, the data
that the job rolled up is still associated with this job ID. You cannot
create a new job with the same ID since that could lead to problems with
mismatched job configurations.
[[rollup-put-job-api-request-body]]
==== {api-request-body-title}

`cron`::
(Required, string) A cron string which defines the intervals when the
{rollup-job} should be executed. When the interval triggers, the indexer
attempts to roll up the data in the index pattern. The cron pattern is
unrelated to the time interval of the data being rolled up. For example, you
may wish to create hourly rollups of your documents but only run the indexer
on a daily basis at midnight, as defined by the cron. The cron pattern is
defined just like a {watcher} cron schedule.
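For instance, this fragment (shown in isolation, not a complete request) runs
the indexer once a day at midnight, even if the job itself produces hourly
rollups:

[source,js]
----
"cron": "0 0 0 * * ?"
----
// NOTCONSOLE

The six fields are seconds, minutes, hours, day of month, month, and day of
week, so `0 0 0 * * ?` fires at 00:00:00 every day.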
[[rollup-groups-config]]
`groups`::
(Required, object) Defines the grouping fields and aggregations for this
{rollup-job}. These fields will then be available later for aggregating into
buckets.
+
--
These aggs and fields can be used in any combination. Think of the `groups`
configuration as defining a set of tools that can later be used in aggregations
to partition the data. Unlike raw data, we have to think ahead to which fields
and aggregations might be used. Rollups provide enough flexibility that you
simply need to determine _which_ fields are needed, not _in what order_ they are
needed.

There are three types of groupings currently available:
--
`date_histogram`:::
(Required, object) A date histogram group aggregates a `date` field into
time-based buckets. This group is *mandatory*; you currently cannot roll up
documents without a timestamp and a `date_histogram` group. The
`date_histogram` group has several parameters:

`field`::::
(Required, string) The date field that is to be rolled up.

`calendar_interval` or `fixed_interval`::::
(Required, <<time-units,time units>>) The interval of time buckets to be
generated when rolling up. For example, `60m` produces 60 minute (hourly)
rollups. This follows standard time formatting syntax as used elsewhere in
{es}. The interval defines only the _minimum_ interval that can be aggregated.
If hourly (`60m`) intervals are configured, <<rollup-search,rollup search>>
can execute aggregations with 60m or greater (weekly, monthly, etc.) intervals.
So define the interval as the smallest unit that you wish to later query. For
more information about the difference between calendar and fixed time
intervals, see <<rollup-understanding-group-intervals>>.
+
--
NOTE: Smaller, more granular intervals take up proportionally more space.
--
`delay`::::
(Optional, <<time-units,time units>>) How long to wait before rolling up new
documents. By default, the indexer attempts to roll up all data that is
available. However, it is not uncommon for data to arrive out of order,
sometimes even a few days late. The indexer is unable to deal with data that
arrives after a time-span has been rolled up. That is to say, there is no
provision to update already-existing rollups.
+
--
Instead, you should specify a `delay` that matches the longest period of time
you expect out-of-order data to arrive. For example, a `delay` of `1d`
instructs the indexer to roll up documents up to `now - 1d`, which provides
a day of buffer time for out-of-order documents to arrive.
--
`time_zone`::::
(Optional, string) Defines the time zone in which the rollup documents are
stored. Unlike raw data, which can shift time zones on the fly, rolled
documents have to be stored with a specific time zone. By default, rollup
documents are stored in `UTC`.
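Taken together, a `date_histogram` group that rolls an illustrative `timestamp`
field into hourly buckets with a one-day delay might look like this fragment
(not a complete request):

[source,js]
----
"date_histogram": {
  "field": "timestamp",
  "fixed_interval": "60m",
  "delay": "1d",
  "time_zone": "UTC"
}
----
// NOTCONSOLE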
`terms`:::
(Optional, object) The terms group can be used on `keyword` or numeric fields
to allow bucketing via the `terms` aggregation at a later point. The indexer
enumerates and stores _all_ values of a field for each time-period. This can
be potentially costly for high-cardinality groups such as IP addresses,
especially if the time-bucket is particularly sparse.
+
--
TIP: While it is unlikely that a rollup will ever be larger in size than the raw
data, defining `terms` groups on multiple high-cardinality fields can
effectively reduce the compression of a rollup to a large extent. You should be
judicious about which high-cardinality fields are included for that reason.

The `terms` group has a single parameter:
--

`fields`::::
(Required, array) The set of fields that you wish to collect terms for. This
array can contain fields that are both `keyword` and numerics. Order does not
matter.

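As a sketch, a `terms` group over two illustrative fields might look like this
fragment (not a complete request):

[source,js]
----
"terms": {
  "fields": ["hostname", "datacenter"]
}
----
// NOTCONSOLE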
`histogram`:::
(Optional, object) The histogram group aggregates one or more numeric fields
into numeric histogram intervals.
+
--
The `histogram` group has two parameters:
--

`fields`::::
(Required, array) The set of fields that you wish to build histograms for. All
fields specified must be numeric. Order does not matter.

`interval`::::
(Required, integer) The interval of histogram buckets to be generated when
rolling up. For example, a value of `5` creates buckets that are five units
wide (`0-5`, `5-10`, etc.). Note that only one interval can be specified in the
`histogram` group, meaning that all fields being grouped via the histogram
must share the same interval.
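For example, the following fragment (with illustrative field names, not a
complete request) buckets two numeric fields into five-unit-wide buckets; both
fields share the single `interval`:

[source,js]
----
"histogram": {
  "fields": ["response_size", "load_time"],
  "interval": 5
}
----
// NOTCONSOLE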
`index_pattern`::
(Required, string) The index or index pattern to roll up. Supports
wildcard-style patterns (`logstash-*`). The job will attempt to roll up the
entire index or index pattern.
+
--
NOTE: The `index_pattern` cannot be a pattern that would also match the
destination `rollup_index`. For example, the pattern `foo-*` would match the
rollup index `foo-rollup`. This situation would cause problems because the
{rollup-job} would attempt to roll up its own data at runtime. If you attempt to
configure a pattern that matches the `rollup_index`, an exception occurs to
prevent this behavior.
--
[[rollup-metrics-config]]
`metrics`::
(Optional, object) Defines the metrics to collect for each grouping tuple. By
default, only `doc_counts` are collected for each group. To make rollup useful,
you will often add metrics like averages, mins, maxes, etc. Metrics are defined
on a per-field basis, and for each field you configure which metrics should be
collected.
+
--
The `metrics` configuration accepts an array of objects, where each object has
two parameters:
--

`field`:::
(Required, string) The field to collect metrics for. This must be a numeric
field of some kind.

`metrics`:::
(Required, array) An array of metrics to collect for the field. At least one
metric must be configured. Acceptable metrics are `min`, `max`, `sum`, `avg`,
and `value_count`.
`page_size`::
(Required, integer) The number of bucket results that are processed on each
iteration of the rollup indexer. A larger value tends to execute faster, but
requires more memory during processing. This value has no effect on how the
data is rolled up; it is merely used for tweaking the speed or memory cost of
the indexer.

`rollup_index`::
(Required, string) The index that contains the rollup results. The index can
be shared with other {rollup-jobs}. The data is stored so that it doesn't
interfere with unrelated jobs.
[[rollup-put-job-api-example]]
==== {api-example-title}

The following example creates a {rollup-job} named `sensor`, targeting the
`sensor-*` index pattern:

[source,console]
--------------------------------------------------
PUT _rollup/job/sensor
{
  "index_pattern": "sensor-*",
  "rollup_index": "sensor_rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": { <1>
    "date_histogram": {
      "field": "timestamp",
      "fixed_interval": "1h",
      "delay": "7d"
    },
    "terms": {
      "fields": ["node"]
    }
  },
  "metrics": [ <2>
    {
      "field": "temperature",
      "metrics": ["min", "max", "sum"]
    },
    {
      "field": "voltage",
      "metrics": ["avg"]
    }
  ]
}
--------------------------------------------------
// TEST[setup:sensor_index]
<1> This configuration enables date histograms to be used on the `timestamp`
field and `terms` aggregations to be used on the `node` field.
<2> This configuration defines metrics over two fields: `temperature` and
`voltage`. For the `temperature` field, we are collecting the min, max, and
sum of the temperature. For `voltage`, we are collecting the average.

When the job is created, you receive the following results:

[source,console-result]
----
{
  "acknowledged": true
}
----
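If you want to confirm the stored configuration, you can retrieve it with the
get {rollup-jobs} API:

[source,console]
----
GET _rollup/job/sensor
----
// TEST[continued]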