- [[tune-for-disk-usage]]
- == Tune for disk usage
- [float]
- === Disable the features you do not need
- By default, Elasticsearch indexes and adds doc values to most fields so that they
- can be searched and aggregated out of the box. For instance, if you have a numeric
- field called `foo` that you need to run histograms on but never need to filter
- on, you can safely disable indexing on this field in your
- <<mappings,mappings>>:
- [source,js]
- --------------------------------------------------
- PUT index
- {
-   "mappings": {
-     "type": {
-       "properties": {
-         "foo": {
-           "type": "integer",
-           "index": false
-         }
-       }
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE
- <<text,`text`>> fields store normalization factors in the index in order to be
- able to score documents. If you only need matching capabilities on a `text`
- field and do not care about the scores it produces, you can configure
- Elasticsearch not to write norms to the index:
- [source,js]
- --------------------------------------------------
- PUT index
- {
-   "mappings": {
-     "type": {
-       "properties": {
-         "foo": {
-           "type": "text",
-           "norms": false
-         }
-       }
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE
- <<text,`text`>> fields also store frequencies and positions in the index by
- default. Frequencies are used to compute scores and positions are used to run
- phrase queries. If you do not need to run phrase queries, you can tell
- Elasticsearch not to index positions:
- [source,js]
- --------------------------------------------------
- PUT index
- {
-   "mappings": {
-     "type": {
-       "properties": {
-         "foo": {
-           "type": "text",
-           "index_options": "freqs"
-         }
-       }
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE
- Furthermore, if you do not care about scoring either, you can configure
- Elasticsearch to index only the matching documents for every term. You will
- still be able to search on this field, but phrase queries will raise errors
- and scoring will assume that terms appear only once in every document.
- [source,js]
- --------------------------------------------------
- PUT index
- {
-   "mappings": {
-     "type": {
-       "properties": {
-         "foo": {
-           "type": "text",
-           "norms": false,
-           "index_options": "docs"
-         }
-       }
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE
- [float]
- === Don't use default dynamic string mappings
- The default <<dynamic-mapping,dynamic string mappings>> will index string fields
- both as <<text,`text`>> and <<keyword,`keyword`>>. This is wasteful if you only
- need one of them. Typically an `id` field will only need to be indexed as a
- `keyword` while a `body` field will only need to be indexed as a `text` field.
- You can avoid the double indexing either by configuring explicit mappings on
- string fields or by setting up dynamic templates that map string fields as
- either `text` or `keyword`.
- For instance, the following template maps string fields as `keyword` only:
- [source,js]
- --------------------------------------------------
- PUT index
- {
-   "mappings": {
-     "type": {
-       "dynamic_templates": [
-         {
-           "strings": {
-             "match_mapping_type": "string",
-             "mapping": {
-               "type": "keyword"
-             }
-           }
-         }
-       ]
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE
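- The explicit-mapping alternative can be sketched in the same way. The request
- below is only an illustration: it assumes the `id` and `body` fields mentioned
- above, and reuses the placeholder index name `index` and mapping type `type`
- from the other examples.
- [source,js]
- --------------------------------------------------
- PUT index
- {
-   "mappings": {
-     "type": {
-       "properties": {
-         "id": {
-           "type": "keyword"
-         },
-         "body": {
-           "type": "text"
-         }
-       }
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE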
- [float]
- === Disable `_all`
- The <<mapping-all-field,`_all`>> field indexes the value of all fields of a
- document and can use significant space. If you never need to search against all
- fields at the same time, it can be disabled.
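- As a minimal sketch, assuming the placeholder index name `index` and mapping
- type `type` used elsewhere on this page, and a version where `_all` is still
- configurable in the mappings, it can be disabled when the index is created:
- [source,js]
- --------------------------------------------------
- PUT index
- {
-   "mappings": {
-     "type": {
-       "_all": {
-         "enabled": false
-       }
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE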
- [float]
- === Use `best_compression`
- The `_source` and stored fields can easily take a non-negligible amount of disk
- space. They can be compressed more aggressively by using the `best_compression`
- <<index-codec,codec>>.
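- For illustration, and again assuming a placeholder index named `index`, the
- codec can be selected through the `index.codec` setting when the index is
- created:
- [source,js]
- --------------------------------------------------
- PUT index
- {
-   "settings": {
-     "index": {
-       "codec": "best_compression"
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE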
- [float]
- === Use the smallest numeric type that is sufficient
- The type that you pick for <<number,numeric data>> can have a significant impact
- on disk usage. In particular, integers should be stored using an integer type
- (`byte`, `short`, `integer` or `long`) and floating-point numbers should either
- be stored in a `scaled_float` if appropriate or in the smallest type that fits
- the use case: using `float` over `double`, or `half_float` over `float`, will
- help save storage.
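- The mapping below sketches what this can look like. The field names
- (`status_code`, `price`, `temperature`) and the `scaling_factor` of `100` are
- hypothetical and chosen only for illustration; pick types and a scaling factor
- that match the range and precision of your own data.
- [source,js]
- --------------------------------------------------
- PUT index
- {
-   "mappings": {
-     "type": {
-       "properties": {
-         "status_code": {
-           "type": "short"
-         },
-         "price": {
-           "type": "scaled_float",
-           "scaling_factor": 100
-         },
-         "temperature": {
-           "type": "half_float"
-         }
-       }
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE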