[[downsampling-manual]]
=== Run downsampling manually
++++
<titleabbrev>Run downsampling manually</titleabbrev>
++++

preview::[]
This simplified example shows you quickly how <<downsampling,downsampling>>
works to reduce the storage size of a time series index. The example uses
typical Kubernetes cluster monitoring data. To try out downsampling, follow
these steps:

. Check the <<downsampling-manual-prereqs,prerequisites>>.
. <<downsampling-manual-create-index>>.
. <<downsampling-manual-ingest-data>>.
. <<downsampling-manual-run>>.
. <<downsampling-manual-view-results>>.
[discrete]
[[downsampling-manual-prereqs]]
==== Prerequisites

Refer to <<tsds-prereqs,time series data stream prerequisites>>.

The example also requires a sample data file. Download the file from
https://static-www.elastic.co/v3/assets/bltefdd0b53724fa2ce/bltf2fe7a300c3c59f7/631b4bc5cc56115de2f58e8c/sample-k8s-metrics.json[this link]
and save it in the local directory where you're running {es}.
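
If you prefer to fetch the file from the command line, one way to do it
(assuming `curl` is available locally) is:

[source,sh]
----
curl -s -o sample-k8s-metrics.json https://static-www.elastic.co/v3/assets/bltefdd0b53724fa2ce/bltf2fe7a300c3c59f7/631b4bc5cc56115de2f58e8c/sample-k8s-metrics.json
----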
[discrete]
[[downsampling-manual-create-index]]
==== Create a time series index

The following request creates the time series index. The available index
parameters are described in detail in <<set-up-a-data-stream,Set up a time
series data stream>>.

The time series boundaries are set so that sampling data for the index begins
at `2022-06-10T00:00:00Z` and ends at `2022-06-30T23:59:59Z`.

For simplicity, all `time_series_metric` parameters in the time series mapping
are set to type `gauge`, but <<time-series-metric,other values>> such as
`counter` and `histogram` may also be used. The `time_series_metric` values
determine the kind of statistical representations that are used during
downsampling.
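
For instance, a cumulative measurement such as total transmitted bytes would be
marked as a `counter` rather than a `gauge`. A mapping fragment for such a
field might look like the following (illustrative only; `tx_bytes` is a
hypothetical field that is not part of this example's mapping):

[source,js]
----
"tx_bytes": {
  "type": "long",
  "time_series_metric": "counter"
}
----
// NOTCONSOLE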
The index mapping includes a set of static
<<time-series-dimension,time series dimensions>>: `host`, `namespace`,
`node`, and `pod`. The time series dimensions are not changed by the
downsampling process.
[source,console]
----
PUT /sample-01
{
  "settings": {
    "index": {
      "mode": "time_series",
      "time_series": {
        "start_time": "2022-06-10T00:00:00Z",
        "end_time": "2022-06-30T23:59:59Z"
      },
      "routing_path": [
        "kubernetes.namespace",
        "kubernetes.host",
        "kubernetes.node",
        "kubernetes.pod"
      ],
      "number_of_replicas": 0,
      "number_of_shards": 2
    }
  },
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "kubernetes": {
        "properties": {
          "container": {
            "properties": {
              "cpu": {
                "properties": {
                  "usage": {
                    "properties": {
                      "core": {
                        "properties": {
                          "ns": {
                            "type": "long"
                          }
                        }
                      },
                      "limit": {
                        "properties": {
                          "pct": {
                            "type": "float"
                          }
                        }
                      },
                      "nanocores": {
                        "type": "long",
                        "time_series_metric": "gauge"
                      },
                      "node": {
                        "properties": {
                          "pct": {
                            "type": "float"
                          }
                        }
                      }
                    }
                  }
                }
              },
              "memory": {
                "properties": {
                  "available": {
                    "properties": {
                      "bytes": {
                        "type": "long",
                        "time_series_metric": "gauge"
                      }
                    }
                  },
                  "majorpagefaults": {
                    "type": "long"
                  },
                  "pagefaults": {
                    "type": "long",
                    "time_series_metric": "gauge"
                  },
                  "rss": {
                    "properties": {
                      "bytes": {
                        "type": "long",
                        "time_series_metric": "gauge"
                      }
                    }
                  },
                  "usage": {
                    "properties": {
                      "bytes": {
                        "type": "long",
                        "time_series_metric": "gauge"
                      },
                      "limit": {
                        "properties": {
                          "pct": {
                            "type": "float"
                          }
                        }
                      },
                      "node": {
                        "properties": {
                          "pct": {
                            "type": "float"
                          }
                        }
                      }
                    }
                  },
                  "workingset": {
                    "properties": {
                      "bytes": {
                        "type": "long",
                        "time_series_metric": "gauge"
                      }
                    }
                  }
                }
              },
              "name": {
                "type": "keyword"
              },
              "start_time": {
                "type": "date"
              }
            }
          },
          "host": {
            "type": "keyword",
            "time_series_dimension": true
          },
          "namespace": {
            "type": "keyword",
            "time_series_dimension": true
          },
          "node": {
            "type": "keyword",
            "time_series_dimension": true
          },
          "pod": {
            "type": "keyword",
            "time_series_dimension": true
          }
        }
      }
    }
  }
}
----
// TEST
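
To confirm that the index was created in time series mode, you can inspect its
settings (an optional check that isn't part of the original steps):

[source,console]
----
GET /sample-01/_settings?filter_path=*.settings.index.mode
----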
[discrete]
[[downsampling-manual-ingest-data]]
==== Ingest time series data

In a terminal window with {es} running, run the following curl command to load
the documents from the downloaded sample data file:

[source,sh]
----
curl -s -H "Content-Type: application/json" \
   -XPOST http://<elasticsearch-node>/sample-01/_bulk?pretty \
   --data-binary @sample-k8s-metrics.json
----
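
As an optional sanity check, not part of the original steps, you can verify how
many documents were ingested with the count API:

[source,console]
----
GET /sample-01/_count
----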
Approximately 18,000 documents are added. Check the search results for the
newly ingested data:

[source,console]
----
GET /sample-01*/_search
----
// TEST[continued]

The query should return the first 10,000 hits. In each document you can see the
time series dimensions (`host`, `namespace`, `node`, and `pod`) as well as the
various CPU and memory time series metrics.
----
"hits": {
  "total": {
    "value": 10000,
    "relation": "gte"
  },
  "max_score": 1,
  "hits": [
    {
      "_index": "sample-01",
      "_id": "WyHN6N6AwdaJByQWAAABgYOOweA",
      "_score": 1,
      "_source": {
        "@timestamp": "2022-06-20T23:59:40Z",
        "kubernetes": {
          "host": "gke-apps-0",
          "node": "gke-apps-0-1",
          "pod": "gke-apps-0-1-0",
          "container": {
            "cpu": {
              "usage": {
                "nanocores": 80037,
                "core": {
                  "ns": 12828317850
                },
                "node": {
                  "pct": 0.0000277905
                },
                "limit": {
                  "pct": 0.0000277905
                }
              }
            },
            "memory": {
              "available": {
                "bytes": 790830121
              },
              "usage": {
                "bytes": 139548672,
                "node": {
                  "pct": 0.01770037710617187
                },
                "limit": {
                  "pct": 0.00009923134671484496
                }
              },
              "workingset": {
                "bytes": 2248540
              },
              "rss": {
                "bytes": 289260
              },
              "pagefaults": 74843,
              "majorpagefaults": 0
            },
            "start_time": "2021-03-30T07:59:06Z",
            "name": "container-name-44"
          },
          "namespace": "namespace26"
        }
      }
    }
...
----
Next, run a terms aggregation on the set of time series dimensions (`_tsid`),
with a date histogram sub-aggregation on a fixed interval of one day:
[source,console]
----
GET /sample-01*/_search
{
  "size": 0,
  "aggs": {
    "tsid": {
      "terms": {
        "field": "_tsid"
      },
      "aggs": {
        "over_time": {
          "date_histogram": {
            "field": "@timestamp",
            "fixed_interval": "1d"
          },
          "aggs": {
            "min": {
              "min": {
                "field": "kubernetes.container.memory.usage.bytes"
              }
            },
            "max": {
              "max": {
                "field": "kubernetes.container.memory.usage.bytes"
              }
            },
            "avg": {
              "avg": {
                "field": "kubernetes.container.memory.usage.bytes"
              }
            }
          }
        }
      }
    }
  }
}
----
// TEST[continued]
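
If you only want the aggregation buckets back, the `filter_path` query
parameter can trim away the hits metadata. This is an optional refinement,
not part of the original example; a minimal form of the request might look
like this:

[source,console]
----
GET /sample-01*/_search?filter_path=aggregations&size=0
{
  "aggs": {
    "tsid": {
      "terms": {
        "field": "_tsid"
      }
    }
  }
}
----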
Re-run your search query to view the aggregated time series data:

[source,console]
----
GET /sample-01*/_search
----
// TEST[continued]
[discrete]
[[downsampling-manual-run]]
==== Run downsampling for the index

Before running downsampling, the index needs to be set to read-only mode:

[source,console]
----
PUT /sample-01/_block/write
----
// TEST[continued]
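
If you want to confirm that the write block took effect (an optional check),
the block appears in the index settings:

[source,console]
----
GET /sample-01/_settings?filter_path=*.settings.index.blocks
----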
Now you can use the <<indices-downsample-data-stream,downsample API>> to
downsample the index, setting the time series interval to one hour:

[source,console]
----
POST /sample-01/_downsample/sample-01-downsample
{
  "fixed_interval": "1h"
}
----
// TEST[continued]
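
At this point the original and the downsampled index exist side by side, so you
can compare their document counts and on-disk sizes (an optional check, not
part of the original steps):

[source,console]
----
GET /_cat/indices/sample-01*?v&h=index,docs.count,store.size
----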
Finally, delete the original index:

[source,console]
----
DELETE /sample-01
----
// TEST[continued]
[discrete]
[[downsampling-manual-view-results]]
==== View the results

Now, re-run your search query:

[source,console]
----
GET /sample-01*/_search
----
// TEST[continued]

In the query results, notice that the number of hits has been reduced to only
288 documents. Also, for each time series metric, statistical representations
have been calculated: `min`, `max`, `sum`, and `value_count`.
----
"hits": {
  "total": {
    "value": 288,
    "relation": "eq"
  },
  "max_score": 1,
  "hits": [
    {
      "_index": "sample-01-downsample",
      "_id": "WyHN6N6AwdaJByQWAAABgYNYIYA",
      "_score": 1,
      "_source": {
        "@timestamp": "2022-06-20T23:00:00.000Z",
        "_doc_count": 81,
        "kubernetes.host": "gke-apps-0",
        "kubernetes.namespace": "namespace26",
        "kubernetes.node": "gke-apps-0-1",
        "kubernetes.pod": "gke-apps-0-1-0",
        "kubernetes.container.cpu.usage.nanocores": {
          "min": 23344,
          "max": 163408,
          "sum": 7488985,
          "value_count": 81
        },
        "kubernetes.container.memory.available.bytes": {
          "min": 167751844,
          "max": 1182251090,
          "sum": 58169948901,
          "value_count": 81
        },
        "kubernetes.container.memory.rss.bytes": {
          "min": 54067,
          "max": 391987,
          "sum": 17550215,
          "value_count": 81
        },
        "kubernetes.container.memory.pagefaults": {
          "min": 69086,
          "max": 428910,
          "sum": 20239365,
          "value_count": 81
        },
        "kubernetes.container.memory.workingset.bytes": {
          "min": 323420,
          "max": 2279342,
          "sum": 104233700,
          "value_count": 81
        },
        "kubernetes.container.memory.usage.bytes": {
          "min": 61401416,
          "max": 413064069,
          "sum": 18557182404,
          "value_count": 81
        }
      }
    },
...
----
This example demonstrates how downsampling can dramatically reduce the number
of records stored for time series data, within whatever time boundaries you
choose. It's also possible to downsample already downsampled data, to further
reduce storage and associated costs as the time series data ages and the data
resolution becomes less critical.
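
As a sketch of that second pass (assuming the hour-level index created above
still exists; the target index name here is hypothetical), you would write-block
the downsampled index and downsample it again to a coarser interval that is a
multiple of the first:

[source,console]
----
PUT /sample-01-downsample/_block/write

POST /sample-01-downsample/_downsample/sample-01-downsample-1d
{
  "fixed_interval": "1d"
}
----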
Downsampling can be easily integrated within an ILM policy. To learn more, try
the <<downsampling-ilm,Run downsampling with ILM>> example.