123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244 |
- [[logs-data-stream]]
- == Logs data stream
- IMPORTANT: The {es} `logsdb` index mode is generally available in Elastic Cloud Hosted
- and self-managed Elasticsearch as of version 8.17, and is enabled by default for
- logs in https://www.elastic.co/elasticsearch/serverless[{serverless-full}].
- A logs data stream is a data stream type that stores log data more efficiently.
- In benchmarks, log data stored in a logs data stream used ~2.5 times less disk space than a regular data
- stream. The exact impact varies by data set.
- [discrete]
- [[how-to-use-logsds]]
- === Create a logs data stream
- IMPORTANT: Fleet integrations use <<index-templates,index templates>> managed by Elastic. To modify these backing templates, update their {observability-guide}/logs-index-template.html#custom-logs-template-edit[composite `custom` templates].
- To create a logs data stream, set your <<index-templates,template>> `index.mode` to `logsdb`:
- [source,console]
- ----
- PUT _index_template/my-index-template
- {
- "index_patterns": ["my-datastream-*"],
- "data_stream": { },
- "template": {
- "settings": {
- "index.mode": "logsdb" <1>
- }
- },
- "priority": 101 <2>
- }
- ----
- // TEST
- <1> The index mode setting.
- <2> The index template priority. By default, Elasticsearch ships with a `logs-*-*` index template with a priority of 100. To make sure your index template takes priority over the default `logs-*-*` template, set its `priority` to a number higher than 100. For more information, see <<avoid-index-pattern-collisions,Avoid index pattern collisions>>.
- After the index template is created, new indices that use the template will be configured as a logs data stream. You can start indexing data and <<use-a-data-stream,using the data stream>>.
- You can also set the index mode and adjust other template settings in <<index-mgmt,the Elastic UI>>.
- ////
- [source,console]
- ----
- DELETE _index_template/my-index-template
- ----
- // TEST[continued]
- ////
- [[logsdb-default-settings]]
- [discrete]
- [[logsdb-synthetic-source]]
- === Synthetic source
- If you have the required https://www.elastic.co/subscriptions[subscription], `logsdb` index mode uses <<synthetic-source,synthetic `_source`>>, which omits storing the original `_source`
- field. Instead, the document source is synthesized from doc values or stored fields upon document retrieval.
- If you don't have the required https://www.elastic.co/subscriptions[subscription], `logsdb` mode uses the original `_source` field.
- Before using synthetic source, make sure to review the <<synthetic-source-restrictions,restrictions>>.
- When working with multi-value fields, the `index.mapping.synthetic_source_keep` setting controls how field values
- are preserved for <<synthetic-source,synthetic source>> reconstruction. In `logsdb`, the default value is `arrays`,
- which retains both duplicate values and the order of entries. However, the exact structure of
- array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some
- log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events.
- [discrete]
- [[logsdb-sort-settings]]
- === Index sort settings
- In `logsdb` index mode, indices are sorted by the fields `host.name` and `@timestamp` by default.
- * If the `@timestamp` field is not present, it is automatically injected.
- * If the `host.name` field is not present, it is automatically injected as a `keyword` field, if possible.
- ** If `host.name` can't be injected (for example, `host` is a keyword field) or can't be used for sorting
- (for example, its value is an IP address), only the `@timestamp` is used for sorting.
- ** If `host.name` is injected and `subobjects` is set to `true` (default), the `host` field is mapped as
- an object field named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`,
- a single `host.name` field is mapped as a `keyword` field.
- * To prioritize the latest data, `host.name` is sorted in ascending order and `@timestamp` is sorted in
- descending order.
- You can override the default sort settings by manually configuring `index.sort.field`
- and `index.sort.order`. For more details, see <<index-modules-index-sorting>>.
- To modify the sort configuration of an existing data stream, update the data stream's
- component templates, and then perform or wait for a <<data-streams-rollover,rollover>>.
- NOTE: If you apply custom sort settings, the `@timestamp` field is injected into the mappings but is not
- automatically added to the list of sort fields. For best results, include it manually as the last sort
- field, with `desc` ordering.
- [discrete]
- [[logsdb-host-name]]
- ==== Existing data streams
- If you're enabling `logsdb` index mode on a data stream that already exists, make sure to check mappings and sorting. The `logsdb` mode automatically maps `host.name` as a keyword if it's included in the sort settings. If a `host.name` field already exists but has a different type, mapping errors might occur, preventing `logsdb` mode from being fully applied.
- To avoid mapping conflicts, consider these options:
- * **Adjust mappings:** Check your existing mappings to ensure that `host.name` is mapped as a keyword.
- * **Change sorting:** If needed, you can remove `host.name` from the sort settings and use a different set of fields. Sorting by `@timestamp` can be a good fallback.
- * **Switch to a different <<index-mode-setting,index mode>>**: If resolving `host.name` mapping conflicts is not feasible, you can choose not to use `logsdb` mode.
- IMPORTANT: On existing data streams, `logsdb` mode is applied on <<data-streams-rollover,rollover>> (automatic or manual).
- [discrete]
- [[logsdb-sort-routing]]
- ==== Optimized routing on sort fields
- If you have the required [subscription](https://www.elastic.co/subscriptions), you can enable routing optimizations
- to reduce the storage footprint of `logsdb` indexes. A routing optimization uses the fields in the sort configuration
- (except for `@timestamp`) to route documents to shards.
- In benchmarks,
- routing optimizations reduced storage requirements by 20% compared to the default `logsdb` configuration, with a negligible penalty to ingestion
- performance (1-4%). Routing optimizations can benefit data streams that are expected to grow substantially over
- time. Exact results depend on the sort configuration and the nature of the logged data.
- To configure a routing optimization:
- * Include the index setting `[index.logsdb.route_on_sort_fields:true]` in the data stream configuration.
- * <<index-modules-index-sorting, Configure index sorting>> with two or more fields, in addition to `@timestamp`.
- * Make sure the <<mapping-id-field,`_id`>> field is not populated in ingested documents. It should be
- auto-generated instead.
- A custom sort configuration is required, to improve storage efficiency and to minimize hotspots
- from logging spikes that may route documents to a single shard. For best results, use a few sort fields
- that have a relatively low cardinality and don't co-vary (for example, `host.name` and `host.id` are not optimal).
- [discrete]
- [[logsdb-specialized-codecs]]
- === Specialized codecs
- By default, `logsdb` index mode uses the `best_compression` <<index-codec,codec>>, which applies {wikipedia}/Zstd[ZSTD]
- compression to stored fields. You can switch to the `default` codec for faster compression with a slightly larger storage footprint.
- The `logsdb` index mode also automatically applies specialized codecs for numeric doc values, in order to optimize storage usage. Numeric fields are
- encoded using the following sequence of codecs:
- * **Delta encoding**:
- Stores the difference between consecutive values instead of the actual values.
- * **Offset encoding**:
- Stores the difference from a base value rather than between consecutive values.
- * **Greatest Common Divisor (GCD) encoding**:
- Finds the greatest common divisor of a set of values and stores the differences as multiples of the GCD.
- * **Frame Of Reference (FOR) encoding**:
- Determines the smallest number of bits required to encode a block of values and uses
- bit-packing to fit such values into larger 64-bit blocks.
- Each encoding is evaluated according to heuristics determined by the data distribution.
- For example, the algorithm checks whether the data is monotonically non-decreasing or
- non-increasing. If so, delta encoding is applied; otherwise, the process
- continues with the next encoding method (offset).
- Encoding is specific to each Lucene segment and is reapplied when segments are merged. The merged Lucene segment
- might use a different encoding than the original segments, depending on the characteristics of the merged data.
- For keyword fields, **Run Length Encoding (RLE)** is applied to the ordinals, which represent positions in the Lucene
- segment-level keyword dictionary. This compression is used when multiple consecutive documents share the same keyword.
- [discrete]
- [[logsdb-ignored-settings]]
- === `ignore` settings
- The `logsdb` index mode uses the following `ignore` settings. You can override these settings as needed.
- [discrete]
- [[logsdb-ignore-malformed]]
- ==== `ignore_malformed`
- By default, `logsdb` index mode sets `ignore_malformed` to `true`. With this setting, documents with malformed fields
- can be indexed without causing ingestion failures.
- [discrete]
- [[logs-db-ignore-above]]
- ==== `ignore_above`
- In `logsdb` index mode, the `index.mapping.ignore_above` setting is applied by default at the index level to ensure
- efficient storage and indexing of large keyword fields. This applies to all members of the keyword type family (keyword,
- constant_keyword, and wildcard). The index-level default for `ignore_above` is 8191 _characters._ Using UTF-8 encoding,
- this results in a limit of 32764 bytes, depending on character encoding.
- The mapping-level `ignore_above` setting takes precedence. If a specific field has an `ignore_above` value
- defined in its mapping, that value overrides the index-level `index.mapping.ignore_above` value. This default
- behavior helps to optimize indexing performance by preventing excessively large string values from being indexed.
- If you need to customize the limit, you can override it at the mapping level or change the index level default.
- [discrete]
- [[logs-db-ignore-limit]]
- ==== `ignore_dynamic_beyond_limit`
- In `logsdb` index mode, the setting `index.mapping.total_fields.ignore_dynamic_beyond_limit` is set to `true` by
- default. This setting allows dynamically mapped fields to be added on top of statically defined fields, even when the total number of fields exceeds the `index.mapping.total_fields.limit`. Instead of triggering an index failure, additional dynamically mapped fields are ignored so that ingestion can continue.
- NOTE: When automatically injected, `host.name` and `@timestamp` count toward the limit of mapped fields. If `host.name` is mapped with `subobjects: true`, it has two fields. When mapped with `subobjects: false`, `host.name` has only one field.
- [discrete]
- [[logsdb-nodocvalue-fields]]
- === Fields without `doc_values`
- When the `logsdb` index mode uses synthetic `_source` and `doc_values` are disabled for a field in the mapping,
- {es} might set the `store` setting to `true` for that field. This ensures that the field's
- data remains accessible for reconstructing the document's source when using
- <<synthetic-source,synthetic source>>.
- For example, this adjustment occurs with text fields when `store` is `false` and no suitable multi-field is available for
- reconstructing the original value.
- [discrete]
- [[logsdb-settings-summary]]
- === Settings reference
- The `logsdb` index mode uses the following settings:
- * **`"index.mode"`**: `"logsdb"`
- * **`"index.mapping.synthetic_source_keep"`**: `"arrays"`
- * **`"index.sort.field"`**: `["host.name", "@timestamp"]`
- * **`"index.sort.order"`**: `["desc", "desc"]`
- * **`"index.sort.mode"`**: `["min", "min"]`
- * **`"index.sort.missing"`**: `["_first", "_first"]`
- * **`"index.codec"`**: `"best_compression"`
- * **`"index.mapping.ignore_malformed"`**: `true`
- * **`"index.mapping.ignore_above"`**: `8191`
- * **`"index.mapping.total_fields.ignore_dynamic_beyond_limit"`**: `true`
|