9 years ago · f295a218a0
--- a/docs/reference/how-to.asciidoc
+++ b/docs/reference/how-to.asciidoc
@@ -15,6 +15,8 @@ This section provides guidance about which changes should and shouldn't be
 made.
 --
 
+include::how-to/general.asciidoc[]
+
 include::how-to/indexing-speed.asciidoc[]
 
 include::how-to/search-speed.asciidoc[]
--- a/docs/reference/how-to/general.asciidoc
+++ b/docs/reference/how-to/general.asciidoc
@@ -0,0 +1,104 @@
+[[general-recommendations]]
+== General recommendations
+
+[float]
+[[large-size]]
+=== Don't return large result sets
+
+Elasticsearch is designed as a search engine, which makes it very good at
+getting back the top documents that match a query. However, it is not as good
+for workloads that fall into the database domain, such as retrieving all
+documents that match a particular query. If you need to do this, make sure to
+use the <<search-request-scroll,Scroll>> API.
+
+[float]
+[[sparsity]]
+=== Avoid sparsity
+
+The data structures behind Lucene, which Elasticsearch relies on in order to
+index and store data, work best with dense data, i.e. when all documents have the
+same fields. This is especially true for fields that have norms enabled (which
+is the case for `text` fields by default) or doc values enabled (which is the
+case for numerics, `date`, `ip` and `keyword` by default).
+
+The reason is that Lucene internally identifies documents with so-called doc
+ids, which are integers from 0 up to (but excluding) the total number of
+documents in the index. These doc ids are used for communication between the
+internal APIs of Lucene: for instance, searching on a term with a `match` query
+produces an iterator of doc ids, and these doc ids are then used to retrieve the
+value of the `norm` in order to compute a score for these documents. This `norm`
+lookup is currently implemented by reserving one byte for each document. The
+`norm` value for a given doc id can then be retrieved by reading the byte at
+index `doc_id`. While this is very efficient and gives Lucene fast access to
+the `norm` values of every document, it has the drawback that documents that
+do not have a value also require one byte of storage.
+
+In practice, this means that if an index has `M` documents, norms will require
+`M` bytes of storage *per field*, even for fields that only appear in a small
+fraction of the documents of the index. The situation is slightly more complex
+with doc values, since they can be encoded in multiple ways depending on the
+type of field and on the actual data that the field stores, but the problem is
+very similar. In case you are wondering: `fielddata`, which was used in
+Elasticsearch before 2.0 until it was replaced with doc values, also suffered
+from this issue, except that the impact was only on the memory footprint since
+`fielddata` was not explicitly materialized on disk.
+
+Note that even though the most notable impact of sparsity is on storage
+requirements, it also has an impact on indexing speed and search speed since
+these bytes for documents that do not have a field still need to be written
+at index time and skipped over at search time.
+
+It is totally fine to have a minority of sparse fields in an index. But beware
+that if sparsity becomes the rule rather than the exception, then the index
+will not be as efficient as it could be.
+
+This section focuses mostly on `norms` and `doc values` because those are the
+two features that are most affected by sparsity. Sparsity also affects the
+efficiency of the inverted index (used to index `text`/`keyword` fields) and
+of dimensional points (used to index `geo_point` and numerics), but to a lesser
+extent.
+
+Here are some recommendations that can help avoid sparsity:
+
+[float]
+==== Avoid putting unrelated data in the same index
+
+You should avoid putting documents that have totally different structures into
+the same index in order to avoid sparsity. It is often better to put these
+documents into different indices. You could also consider giving fewer shards
+to these smaller indices since they will contain fewer documents overall.
+
+Note that this advice does not apply if you need to use parent/child
+relations between your documents, since this feature is only supported on
+documents that live in the same index.
+
+[float]
+==== Normalize document structures
+
+Even if you really need to put different kinds of documents in the same index,
+there may still be opportunities to reduce sparsity. For instance, if all
+documents in the index have a timestamp field but some call it `timestamp` and
+others call it `creation_date`, it would help to rename it so that all
+documents use the same field name for the same data.
+
+[float]
+==== Avoid types
+
+Types might sound like a good way to store multiple tenants in a single index.
+They are not: given that types store everything in a single index, having
+multiple types that have different fields in a single index will also cause
+problems due to sparsity as described above. If your types do not have very
+similar mappings, you might want to consider moving them to a dedicated index.
+
+[float]
+==== Disable `norms` and `doc_values` on sparse fields
+
+If none of the above recommendations apply in your case, you might want to
+check whether you actually need `norms` and `doc_values` on your sparse fields.
+`norms` can be disabled if producing scores is not necessary on a field; this is
+typically true for fields that are only used for filtering. `doc_values` can be
+disabled on fields that are used neither for sorting nor for aggregations.
+Beware that this decision should not be made lightly since these parameters
+cannot be changed on a live index, so you would have to reindex if you realize
+that you need `norms` or `doc_values`.
+
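The "Don't return large result sets" section recommends the Scroll API for retrieving every document that matches a query. The loop below is a minimal sketch of that pattern: open a scroll context with `POST /<index>/_search?scroll=1m`, keep calling `POST /_search/scroll` with the latest `scroll_id` until a page comes back empty, then clear the scroll. The HTTP layer is abstracted behind a hypothetical `send(method, path, body)` callable (an assumption, not part of any Elasticsearch client), and `scroll_all`, `page_size` and `keep_alive` are illustrative names; verify the request paths and bodies against the scroll documentation for your Elasticsearch version.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport (not a real client API): send(method, path, body)
# performs one HTTP request against the cluster and returns the parsed JSON.
Send = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def scroll_all(send: Send, index: str, query: Dict[str, Any],
               page_size: int = 1000, keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit that matches `query`, page by page, using the Scroll API."""
    # The initial search opens a scroll context and returns the first page.
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"query": query, "size": page_size})
    scroll_id = resp.get("_scroll_id")
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:  # an empty page means every match has been returned
                return
            yield from hits
            # Continue with the most recent scroll id and renew the keep-alive.
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side context rather than waiting for it to expire.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})
```

Making `scroll_all` a generator keeps memory bounded to one page at a time, and the `finally` clause also clears the scroll if the caller stops iterating early.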
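The "Avoid sparsity" section explains that norms are stored by reserving one byte per document and reading the byte at index `doc_id`, so a field costs `M` bytes even when few documents have it. The toy model below illustrates that layout; it is not Lucene's code, and the class name `DenseNorms` and the use of `0` as a "no value" marker are assumptions made for illustration.

```python
class DenseNorms:
    """Toy model (not Lucene's implementation) of one field's dense norms array."""

    def __init__(self, max_doc: int) -> None:
        # One byte is reserved for every doc id in [0, max_doc), whether or not
        # the document has this field. Here 0 stands in for "no value".
        self.values = bytearray(max_doc)

    def set_norm(self, doc_id: int, norm: int) -> None:
        # The norm must fit in a single byte (0..255).
        self.values[doc_id] = norm

    def get_norm(self, doc_id: int) -> int:
        # Constant-time lookup: read the byte at index doc_id.
        return self.values[doc_id]

    def storage_bytes(self) -> int:
        # Storage depends only on the number of documents, not on how many have the field.
        return len(self.values)
```

For example, with 1,000,000 documents of which only 10 carry the field, `storage_bytes()` still reports 1,000,000; fifty such sparse fields would add up to roughly 50 MB of norms in this model.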
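Before applying the recommendations in "Avoid sparsity", it can help to measure how sparse each field actually is. The sketch below computes, for a sample of documents, the fraction that contains each top-level field and lists those under a cutoff. It assumes flat documents represented as Python dicts (nested objects are not flattened), and the `0.1` default threshold is an arbitrary, hypothetical choice.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def field_density(docs: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Return, for each top-level field, the fraction of documents that contain it."""
    counts: Counter = Counter()
    total = 0
    for doc in docs:
        total += 1
        counts.update(doc.keys())
    if total == 0:
        return {}
    return {field: n / total for field, n in counts.items()}


def sparse_fields(docs: Iterable[Mapping[str, object]], threshold: float = 0.1) -> List[str]:
    """List fields present in fewer than `threshold` of the documents (hypothetical cutoff)."""
    density = field_density(docs)
    return sorted(field for field, d in density.items() if d < threshold)
```

A handful of fields in this list is normal; if most fields show up, sparsity has become the rule rather than the exception.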
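"Avoid putting unrelated data in the same index" and "Avoid types" both hinge on whether document kinds have similar structures. One possible heuristic, an assumption rather than anything prescribed above, is the Jaccard similarity of the two kinds' field names: shared fields divided by all fields. A score near 1 means mostly shared fields, while a low score (the cutoff is up to you) suggests the kinds would each leave many fields empty for the other and may belong in dedicated indices. `mapping_overlap` is a hypothetical helper name.

```python
from typing import Iterable


def mapping_overlap(fields_a: Iterable[str], fields_b: Iterable[str]) -> float:
    """Jaccard similarity of two field-name sets: len(A & B) / len(A | B)."""
    a, b = set(fields_a), set(fields_b)
    if not a and not b:
        # Two empty mappings are trivially identical.
        return 1.0
    return len(a & b) / len(a | b)
```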
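The "Normalize document structures" section suggests renaming fields so the same data always uses the same name. A minimal client-side sketch, run on each document before it is indexed, assuming a hypothetical alias table that maps `creation_date` to `timestamp`:

```python
from typing import Dict, Mapping

# Hypothetical alias table: variant field name -> canonical field name.
FIELD_ALIASES: Dict[str, str] = {"creation_date": "timestamp"}


def normalize(doc: Mapping[str, object],
              aliases: Mapping[str, str] = FIELD_ALIASES) -> Dict[str, object]:
    """Return a copy of `doc` with aliased top-level fields renamed to their canonical name."""
    out: Dict[str, object] = {}
    for field, value in doc.items():
        canonical = aliases.get(field, field)
        if canonical in out:
            # Refuse to silently overwrite data when both spellings are present.
            raise ValueError(f"field {canonical!r} appears more than once after renaming")
        out[canonical] = value
    return out
```

Raising on a collision, rather than letting one value win, is a deliberate choice in this sketch: a document carrying both names probably needs a human decision.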
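The "Disable `norms` and `doc_values` on sparse fields" section maps to two mapping parameters, `norms` and `doc_values`. The sketch below builds field mappings as Python dicts: `norms: false` on a `text` field that is never scored, and `doc_values: false` on a doc-values-by-default field that is never sorted or aggregated. The helper names, the field names in the test, and the `DOC_VALUES_BY_DEFAULT` subset of types are assumptions. Only the `properties` object is produced, because where it nests in the index mapping depends on the Elasticsearch version.

```python
import json
from typing import Any, Dict

# Assumed subset of types that have doc values enabled by default (not exhaustive).
DOC_VALUES_BY_DEFAULT = {"keyword", "date", "ip", "long", "integer", "short", "byte", "double", "float"}


def field_mapping(field_type: str, *, scored: bool, sorted_or_aggregated: bool) -> Dict[str, Any]:
    """Build one field mapping, turning off norms/doc_values when they are not needed."""
    mapping: Dict[str, Any] = {"type": field_type}
    # Norms only matter for scoring; `text` fields have them enabled by default.
    if field_type == "text" and not scored:
        mapping["norms"] = False
    # Doc values back sorting and aggregations.
    if field_type in DOC_VALUES_BY_DEFAULT and not sorted_or_aggregated:
        mapping["doc_values"] = False
    return mapping


def properties_json(fields: Dict[str, Dict[str, Any]]) -> str:
    """Serialize field mappings as a `properties` object."""
    return json.dumps({"properties": fields}, indent=2)
```

Since these parameters cannot be changed on a live index, it is worth reviewing the generated mapping before creating the index.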