[[index-modules-codec]]
== Codec module

Codecs define how documents are written to disk and read from disk. The
postings format is the part of the codec that is responsible for reading
and writing the term dictionary, postings lists and positions, as well as
the payloads and offsets stored in the postings list. The doc values format
is responsible for reading and writing column-stride storage for a field and
is typically used for sorting or faceting. When a field doesn't have doc
values enabled, it is still possible to sort or facet by loading field
values from the inverted index into main memory.

Configuring custom postings or doc values formats is an expert feature, and
the built-in formats will most likely suit your needs, as described in the
<<mapping-core-types,mapping section>>.

**********************************
Only the default codec, postings format and doc values format are supported:
other formats may break backward compatibility between minor versions of
Elasticsearch, requiring data to be reindexed.
**********************************

[float]
[[custom-postings]]
=== Configuring a custom postings format

A custom postings format can be defined in the index settings in the
`codec` part.
The `codec` part can be configured when creating an index
or updating index settings. An example of how to define your custom
postings format:

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/twitter/' -d '{
    "settings" : {
        "index" : {
            "codec" : {
                "postings_format" : {
                    "my_format" : {
                        "type" : "pulsing",
                        "freq_cut_off" : "5"
                    }
                }
            }
        }
    }
}'
--------------------------------------------------

Then, when defining your mapping, you can use the `my_format` name in the
`postings_format` option, as the example below illustrates:

[source,js]
--------------------------------------------------
{
  "person" : {
     "properties" : {
         "second_person_id" : {"type" : "string", "postings_format" : "my_format"}
     }
  }
}
--------------------------------------------------

[float]
=== Available postings formats

[float]
[[direct-postings]]
==== Direct postings format

Wraps the default postings format for on-disk storage, but at read time
loads and stores all terms & postings directly in RAM. This postings format
makes no effort to compress terms or postings lists and is therefore memory
intensive; in exchange, it gives a substantial increase in search
performance. Because all term bytes are held in a single `byte[]`, you
cannot have more than 2.1GB worth of terms in a single segment.

This postings format offers the following parameters:

`min_skip_count`::
    The minimum number of terms with a shared prefix required to
    allow a skip pointer to be written. The default is *8*.

`low_freq_cutoff`::
    Terms with a document frequency lower than this value use a
    single array object representation for postings and positions. The
    default is *32*.

Type name: `direct`

[float]
[[memory-postings]]
==== Memory postings format

A postings format that stores terms & postings (docs, positions,
payloads) in RAM, using an FST.
This postings format does write to disk,
but loads everything into memory. The memory postings format has the
following options:

`pack_fst`::
    A boolean option that defines whether the in-memory structure
    should be packed once it is built. Packing reduces the size of the
    in-memory data structure but requires more memory while building it.
    Default is *false*.

`acceptable_overhead_ratio`::
    The compression ratio, specified as a float, that is used to compress
    internal structures. Example ratios are `0` (compact: no memory overhead
    at all, but the selected implementation may be slow), `0.5` (fast: at
    most 50% memory overhead, always selects a reasonably fast
    implementation) and `7` (fastest: at most 700% memory overhead, no
    compression). Default is `0.2`.

Type name: `memory`

[float]
[[bloom-postings]]
==== Bloom filter postings format

The bloom filter postings format wraps a delegate postings format and, on
top of it, creates a bloom filter that is written to disk. When the index
is opened, this bloom filter is loaded into memory and used to offer
"fast-fail" reads. This postings format is useful for low document-frequency
fields such as primary keys. The bloom filter postings format has the
following options:

`delegate`::
    The name of the configured postings format that the
    bloom filter postings format will wrap.

`fpp`::
    The desired false positive probability, specified as a
    floating point number between 0 and 1.0. The `fpp` can be configured for
    multiple expected insertions. Example expression: *10k=0.01,1m=0.03*. If
    the number of docs per index segment is larger than *1m*, *0.03* is used
    as the fpp; if the number of docs per segment is larger than *10k*,
    *0.01* is used. The last fallback value is always *0.03*. This example
    expression is also the default.

Type name: `bloom`

[[codec-bloom-load]]
[TIP]
==================================================
It can sometimes make sense to disable bloom filters.
For instance, if you are
logging to one index per day and you have thousands of indices, the bloom
filters can take up a sizable amount of memory. For most queries you are only
interested in recent indices, so you don't mind CRUD operations on older
indices taking slightly longer.

In these cases you can disable loading of the bloom filter on a per-index
basis by updating the index settings:

[source,js]
--------------------------------------------------
PUT /old_index/_settings?index.codec.bloom.load=false
--------------------------------------------------

This setting, which defaults to `true`, can be updated on a live index. Note,
however, that changing the value will cause the index to be reopened, which
will invalidate any existing caches.
==================================================

[float]
[[pulsing-postings]]
==== Pulsing postings format

The pulsing implementation in-lines the postings lists of very low-frequency
terms into the term dictionary. This improves lookup performance for
low-frequency terms. This postings format offers the following parameters:

`min_block_size`::
    The minimum block size the default Lucene term
    dictionary uses to encode on-disk blocks. Defaults to *25*.

`max_block_size`::
    The maximum block size the default Lucene term
    dictionary uses to encode on-disk blocks. Defaults to *48*.

`freq_cut_off`::
    The document frequency cutoff at which pulsing
    in-lines postings lists into the term dictionary. Terms with a document
    frequency less than or equal to the cutoff will be in-lined. The default
    is *1*.

Type name: `pulsing`

[float]
[[default-postings]]
==== Default postings format

The default postings format has the following options:

`min_block_size`::
    The minimum block size the default Lucene term
    dictionary uses to encode on-disk blocks. Defaults to *25*.

`max_block_size`::
    The maximum block size the default Lucene term
    dictionary uses to encode on-disk blocks.
    Defaults to *48*.

Type name: `default`

[float]
=== Configuring a custom doc values format

A custom doc values format can be defined in the index settings in the
`codec` part. The `codec` part can be configured when creating an index
or updating index settings. An example of how to define your custom
doc values format:

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/twitter/' -d '{
    "settings" : {
        "index" : {
            "codec" : {
                "doc_values_format" : {
                    "my_format" : {
                        "type" : "disk"
                    }
                }
            }
        }
    }
}'
--------------------------------------------------

Then, when defining your mapping, you can use the `my_format` name in the
`doc_values_format` option, as the example below illustrates:

[source,js]
--------------------------------------------------
{
  "product" : {
     "properties" : {
         "price" : {"type" : "integer", "doc_values_format" : "my_format"}
     }
  }
}
--------------------------------------------------

[float]
=== Available doc values formats

[float]
==== Memory doc values format

A doc values format that stores all values in an FST in RAM. This format does
write to disk, but the whole data structure is loaded into memory when reading
the index. The memory doc values format has no options.

Type name: `memory`

[float]
==== Disk doc values format

A doc values format that stores and reads everything from disk. Although it may
be slightly slower than the default doc values format, this doc values format
requires almost no memory from the JVM. The disk doc values format has no
options.

Type name: `disk`

[float]
==== Default doc values format

The default doc values format tries to strike a good compromise between speed
and memory usage by loading into memory only the data structures that matter
for performance. This makes it a good fit for most use cases. The default doc
values format has no options.

Type name: `default`
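Custom doc values formats are configured in two steps: a format is named under `index.codec.doc_values_format` in the index settings, and a field then references that name in its `doc_values_format` option. If you generate these request bodies programmatically, it can help to check that the two stay consistent. The following is a minimal sketch under stated assumptions. `index_settings` and `field_mapping` are hypothetical helpers, not part of Elasticsearch or any of its clients. The accepted type names are taken from the formats listed above.

```python
import json

# Type names of the doc values formats listed in this section.
KNOWN_TYPES = {"default", "disk", "memory"}


def index_settings(formats):
    """Build an index-creation body that defines named doc values formats.

    Hypothetical helper: `formats` maps a custom name (e.g. "my_format")
    to one of the type names in KNOWN_TYPES.
    """
    for name, fmt_type in formats.items():
        if fmt_type not in KNOWN_TYPES:
            raise ValueError(f"{name}: unknown doc values format type {fmt_type!r}")
    defined = {name: {"type": fmt_type} for name, fmt_type in formats.items()}
    return {"settings": {"index": {"codec": {"doc_values_format": defined}}}}


def field_mapping(type_name, fields, settings):
    """Build a mapping whose fields reference the named formats.

    Hypothetical helper: `fields` maps a field name to a
    (field type, format name) pair. This sketch only accepts format names
    that were defined in `settings`.
    """
    defined = settings["settings"]["index"]["codec"]["doc_values_format"]
    properties = {}
    for field, (field_type, fmt_name) in fields.items():
        if fmt_name not in defined:
            raise KeyError(f"{field}: doc values format {fmt_name!r} is not defined")
        properties[field] = {"type": field_type, "doc_values_format": fmt_name}
    return {type_name: {"properties": properties}}


settings = index_settings({"my_format": "disk"})
mapping = field_mapping("product", {"price": ("integer", "my_format")}, settings)

# Print the two request bodies that correspond to the examples above.
print(json.dumps(settings, indent=2))
print(json.dumps(mapping, indent=2))
```

With this approach, a typo in a format name fails locally before any request is sent to the cluster. The sketch deliberately rejects anything that was not defined in the settings. Whether Elasticsearch also accepts a built-in type name directly in `doc_values_format` is outside what it covers.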