11 years ago · a242a63817
--- a/docs/reference/index-modules.asciidoc
+++ b/docs/reference/index-modules.asciidoc
@@ -76,8 +76,6 @@ include::index-modules/query-cache.asciidoc[]
 
 include::index-modules/fielddata.asciidoc[]
 
-include::index-modules/codec.asciidoc[]
-
 include::index-modules/similarity.asciidoc[]
 
 
--- a/docs/reference/index-modules/codec.asciidoc
+++ b/docs/reference/index-modules/codec.asciidoc
@@ -1,277 +0,0 @@
-[[index-modules-codec]]
-== Codec module
-
-Codecs define how documents are written to disk and read from disk. The
-postings format is the part of the codec that is responsible for reading
-and writing the term dictionary, postings lists and positions, as well as the payloads
-and offsets stored in the postings list. The doc values format is
-responsible for reading column-stride storage for a field and is typically
-used for sorting or faceting. When a field doesn't have doc values enabled,
-it is still possible to sort or facet by loading field values from the
-inverted index into main memory.
-
-Configuring custom postings or doc values formats is an expert feature and
-most likely using the builtin formats will suit your needs as is described
-in the <<mapping-core-types,mapping section>>.
-
-[WARNING]
-Only the default codec, postings format and doc values format are supported:
-other formats may break backward compatibility between minor versions of
-Elasticsearch, requiring data to be reindexed.
-
-
-[float]
-[[custom-postings]]
-=== Configuring a custom postings format
-
-A custom postings format can be defined in the index settings in the
-`codec` part. The `codec` part can be configured when creating an index
-or updating index settings. An example on how to define your custom
-postings format:
-
-[source,js]
---------------------------------------------------
-curl -XPUT 'http://localhost:9200/twitter/' -d '{
-    "settings" : {
-        "index" : {
-            "codec" : {
-          "postings_format" : {
-             "my_format" : {
-                "type" : "pulsing",
-                "freq_cut_off" : "5"
-             }
-          }
-       }
-        }
-    }
-}'
---------------------------------------------------
-
-Then when defining your mapping you can use the `my_format` name in the
-`postings_format` option as the example below illustrates:
-
-[source,js]
---------------------------------------------------
-{
-  "person" : {
-     "properties" : {
-         "second_person_id" : {"type" : "string", "postings_format" : "my_format"}
-     }
-  }
-}
---------------------------------------------------
-
-[float]
-=== Available postings formats
-
-[float]
-[[direct-postings]]
-==== Direct postings format
-
-Wraps the default postings format for on-disk storage, but then at read
-time loads and stores all terms & postings directly in RAM. This
-postings format makes no effort to compress the terms and posting list
-and therefore is memory intensive, but because of this it gives a
-substantial increase in search performance. Because this holds all term
-bytes as a single byte[], you cannot have more than 2.1GB worth of terms
-in a single segment.
-
-This postings format offers the following parameters:
-
-`min_skip_count`::
-    The minimum number terms with a shared prefix to
-    allow a skip pointer to be written. The default is *8*.
-
-`low_freq_cutoff`::
-    Terms with a lower document frequency use a
-    single array object representation for postings and positions. The
-    default is *32*.
-
-Type name: `direct`
-
-[float]
-[[memory-postings]]
-==== Memory postings format
-
-A postings format that stores terms & postings (docs, positions,
-payloads) in RAM, using an FST. This postings format does write to disk,
-but loads everything into memory. The memory postings format has the
-following options:
-
-`pack_fst`::
-    A boolean option that defines if the in memory structure
-    should be packed once its build. Packed will reduce the size for the
-    data-structure in memory but requires more memory during building.
-    Default is *false*.
-
-`acceptable_overhead_ratio`::
-    The compression ratio specified as a
-    float, that is used to compress internal structures. Example ratios `0`
-    (Compact, no memory overhead at all, but the returned implementation may
-    be slow), `0.5` (Fast, at most 50% memory overhead, always select a
-    reasonably fast implementation), `7` (Fastest, at most 700% memory
-    overhead, no compression). Default is `0.2`.
-
-Type name: `memory`
-
-[float]
-[[bloom-postings]]
-==== Bloom filter posting format
-
-The bloom filter postings format wraps a delegate postings format and on
-top of this creates a bloom filter that is written to disk. During
-opening this bloom filter is loaded into memory and used to offer
-"fast-fail" reads. This postings format is useful for low doc-frequency
-fields such as primary keys. The bloom filter postings format has the
-following options:
-
-`delegate`::
-    The name of the configured postings format that the
-    bloom filter postings format will wrap.
-
-`fpp`::
-    The desired false positive probability specified as a
-    floating point number between 0 and 1.0. The `fpp` can be configured for
-    multiple expected insertions. Example expression: *10k=0.01,1m=0.03*. If
-    number docs per index segment is larger than *1m* then use *0.03* as fpp
-    and if number of docs per segment is larger than *10k* use *0.01* as
-    fpp. The last fallback value is always *0.03*. This example expression
-    is also the default.
-
-Type name: `bloom`
-
-[[codec-bloom-load]]
-[TIP]
-==================================================
-
-As of 1.4, the bloom filters are no longer loaded at search time by
-default: they consume RAM in proportion to the number of unique terms,
-which can quickly add up for certain use cases, and separate
-performance improvements have made the performance gains with bloom
-filters very small.
-
-You can enable loading of the bloom filter at search time on a
-per-index basis by updating the index settings:
-
-[source,js]
---------------------------------------------------
-PUT /old_index/_settings?index.codec.bloom.load=true
---------------------------------------------------
-
-This setting, which defaults to `false`, can be updated on a live index. Note,
-however, that changing the value will cause the index to be reopened, which
-will invalidate any existing caches.
-
-==================================================
-
-[float]
-[[pulsing-postings]]
-==== Pulsing postings format
-
-The pulsing implementation in-lines the posting lists for very low
-frequent terms in the term dictionary. This is useful to improve lookup
-performance for low-frequent terms. This postings format offers the
-following parameters:
-
-`min_block_size`::
-    The minimum block size the default Lucene term
-    dictionary uses to encode on-disk blocks. Defaults to *25*.
-
-`max_block_size`::
-    The maximum block size the default Lucene term
-    dictionary uses to encode on-disk blocks. Defaults to *48*.
-
-`freq_cut_off`::
-    The document frequency cut off where pulsing
-    in-lines posting lists into the term dictionary. Terms with a document
-    frequency less or equal to the cutoff will be in-lined. The default is
-    *1*.
-
-Type name: `pulsing`
-
-[float]
-[[default-postings]]
-==== Default postings format
-
-The default postings format has the following options:
-
-`min_block_size`::
-    The minimum block size the default Lucene term
-    dictionary uses to encode on-disk blocks. Defaults to *25*.
-
-`max_block_size`::
-    The maximum block size the default Lucene term
-    dictionary uses to encode on-disk blocks. Defaults to *48*.
-
-Type name: `default`
-
-[float]
-=== Configuring a custom doc values format
-
-Custom doc values format can be defined in the index settings in the
-`codec` part. The `codec` part can be configured when creating an index
-or updating index settings. An example on how to define your custom
-doc values format:
-
-[source,js]
---------------------------------------------------
-curl -XPUT 'http://localhost:9200/twitter/' -d '{
-    "settings" : {
-        "index" : {
-            "codec" : {
-                "doc_values_format" : {
-                    "my_format" : {
-                        "type" : "disk"
-                    }
-                }
-            }
-        }
-    }
-}'
---------------------------------------------------
-
-Then we defining your mapping your can use the `my_format` name in the
-`doc_values_format` option as the example below illustrates:
-
-[source,js]
---------------------------------------------------
-{
-  "product" : {
-     "properties" : {
-         "price" : {"type" : "integer", "doc_values_format" : "my_format"}
-     }
-  }
-}
---------------------------------------------------
-
-[float]
-=== Available doc values formats
-
-[float]
-==== Memory doc values format
-
-A doc values format that stores all values in a FST in RAM. This format does
-write to disk but the whole data-structure is loaded into memory when reading
-the index. The memory postings format has no options.
-
-Type name: `memory`
-
-[float]
-==== Disk doc values format
-
-A doc values format that stores and reads everything from disk. This is
-generally not a good idea to use it as it saves very little memory compared
-to the default doc values format although it can be significantly slower.
-The disk doc values format has no options.
-
-Type name: `disk`
-
-[float]
-==== Default doc values format
-
-The default doc values format tries to make a good compromise between speed and
-memory usage by only loading into memory data-structures that matter for
-performance. This makes this doc values format a good fit for most use-cases.
-The default doc values format has no options.
-
-Type name: `default`
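The removed `fpp` entry packs a small lookup rule into one expression. As a reading aid only, here is a minimal bash sketch of that rule as the deleted text states it. It assumes entries are `<count>=<fpp>` pairs whose `k`/`m` suffixes mean thousands/millions. The entry with the largest count that the segment's document count exceeds wins, and 0.03 is the fallback. `pick_fpp` is a hypothetical helper, not the Elasticsearch parser, whose actual handling may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the removed `fpp` description; this is not
# Elasticsearch code. Usage: pick_fpp EXPRESSION DOCS_PER_SEGMENT
pick_fpp() {
  local expr=$1 docs=$2
  local best_count=-1 best_fpp=0.03   # documented fallback value
  local -a entries
  local entry count fpp n
  # Split "10k=0.01,1m=0.03" into its comma-separated entries.
  IFS=',' read -ra entries <<< "$expr"
  for entry in "${entries[@]}"; do
    count=${entry%%=*}   # text before '='
    fpp=${entry#*=}      # text after '='
    # Assumed suffixes: k = thousand, m = million.
    case $count in
      *k) n=$(( ${count%k} * 1000 )) ;;
      *m) n=$(( ${count%m} * 1000000 )) ;;
      *)  n=$count ;;
    esac
    # Keep the entry with the largest threshold that the doc count exceeds.
    if (( docs > n && n > best_count )); then
      best_count=$n
      best_fpp=$fpp
    fi
  done
  printf '%s\n' "$best_fpp"
}
```

For example, `pick_fpp '10k=0.01,1m=0.03' 50000` prints `0.01`. With 2,000,000 or 500 documents it prints `0.03`: the first exceeds the `1m` threshold, and the second falls through to the fallback. Picking the largest exceeded threshold makes the result independent of the order in which the entries are written.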
--- a/docs/reference/indices/update-settings.asciidoc
+++ b/docs/reference/indices/update-settings.asciidoc
@@ -57,12 +57,9 @@ settings API:
 `index.index_concurrency`::
     Defaults to `8`.
 
-`index.codec`::
-    Codec. Default to `default`.
-
 `index.codec.bloom.load`::
-    Whether to load the bloom filter. Defaults to `true`.
-    See <<bloom-postings>>.
+    Whether to load the bloom filter. Defaults to `false`.
+    See <<codec-bloom-load>>.
 
 `index.fail_on_merge_failure`::
     Default to `true`.
@@ -225,3 +222,35 @@ curl -XPUT 'localhost:9200/myindex/_settings' -d '{
 
 curl -XPOST 'localhost:9200/myindex/_open'
 --------------------------------------------------
+
+[float]
+[[codec-bloom-load]]
+=== Bloom filters
+
+Up to version 1.3, Elasticsearch used to generate bloom filters for the `_uid`
+field at indexing time and to load them at search time in order to speed-up
+primary-key lookups by savings disk seeks.
+
+As of 1.4, bloom filters are still generated at indexing time, but they are
+no longer loaded at search time by default: they consume RAM in proportion to
+the number of unique terms, which can quickly add up for certain use cases,
+and separate performance improvements have made the performance gains with
+bloom filters very small.
+
+[TIP]
+==================================================
+
+You can enable loading of the bloom filter at search time on a
+per-index basis by updating the index settings:
+
+[source,js]
+--------------------------------------------------
+PUT /old_index/_settings?index.codec.bloom.load=true
+--------------------------------------------------
+
+This setting, which defaults to `false`, can be updated on a live index. Note,
+however, that changing the value will cause the index to be reopened, which
+will invalidate any existing caches.
+
+==================================================
+
--- a/docs/reference/mapping/types/core-types.asciidoc
+++ b/docs/reference/mapping/types/core-types.asciidoc
@@ -511,116 +511,6 @@ effect the next time the fielddata for a segment is loaded. Use the
 <<indices-clearcache,Clear Cache>> API
 to reload the fielddata using the new filters.
 
-[float]
-[[postings]]
-==== Postings format
-
-Posting formats define how fields are written into the index and how
-fields are represented into memory. Posting formats can be defined per
-field via the `postings_format` option. Postings format are configurable.
-Elasticsearch has several builtin formats:
-
-`direct`::
-        A postings format that uses disk-based storage but loads
-        its terms and postings directly into memory. Note this postings format
-        is very memory intensive and has certain limitation that don't allow
-        segments to grow beyond 2.1GB see \{@link DirectPostingsFormat} for
-        details.
-
-`memory`::
-        A postings format that stores its entire terms, postings,
-        positions and payloads in a finite state transducer. This format should
-        only be used for primary keys or with fields where each term is
-        contained in a very low number of documents.
-
-`pulsing`::
-        A postings format that in-lines the posting lists for very low
-        frequent terms in the term dictionary. This is useful to improve lookup
-        performance for low-frequent terms.
-
-`bloom_default`::
-        A postings format that uses a bloom filter to
-        improve term lookup performance. This is useful for primary keys or
-        fields that are used as a delete key.
-
-`bloom_pulsing`::
-        A postings format that combines the advantages of
-        *bloom* and *pulsing* to further improve lookup performance.
-
-`default`::
-        The default Elasticsearch postings format offering best
-        general purpose performance. This format is used if no postings format
-        is specified in the field mapping.
-
-[float]
-===== Postings format example
-
-On all field types it possible to configure a `postings_format`
-attribute:
-
-[source,js]
---------------------------------------------------
-{
-  "person" : {
-     "properties" : {
-         "second_person_id" : {"type" : "string", "postings_format" : "pulsing"}
-     }
-  }
-}
---------------------------------------------------
-
-On top of using the built-in posting formats it is possible define
-custom postings format. See
-<<index-modules-codec,codec module>> for more
-information.
-
-[float]
-==== Doc values format
-
-Doc values formats define how fields are written into column-stride storage in
-the index for the purpose of sorting or faceting. Fields that have doc values
-enabled will have special field data instances, which will not be uninverted
-from the inverted index, but directly read from disk. This makes _refresh faster
-and ultimately allows for having field data stored on disk depending on the
-configured doc values format.
-
-Doc values formats are configurable. Elasticsearch has several builtin formats:
-
-`memory`::
-        A doc values format which stores data in memory. Compared to the default
-        field data implementations, using doc values with this format will have
-        similar performance but will be faster to load, making '_refresh' less
-        time-consuming.
-
-`disk`::
-        A doc values format which stores all data on disk, requiring almost no
-        memory from the JVM at the cost of a slight performance degradation.
-
-`default`::
-        The default Elasticsearch doc values format, offering good performance
-        with low memory usage. This format is used if no format is specified in
-        the field mapping.
-
-[float]
-===== Doc values format example
-
-On all field types, it is possible to configure a `doc_values_format` attribute:
-
-[source,js]
---------------------------------------------------
-{
-  "product" : {
-     "properties" : {
-         "price" : {"type" : "integer", "doc_values_format" : "memory"}
-     }
-  }
-}
---------------------------------------------------
-
-On top of using the built-in doc values formats it is possible to define
-custom doc values formats. See
-<<index-modules-codec,codec module>> for more information.
-
 [float]
 ==== Similarity