[[general-recommendations]]
== General recommendations

[discrete]
[[large-size]]
=== Don't return large result sets

Elasticsearch is designed as a search engine, which makes it very good at
getting back the top documents that match a query. However, it is not as good
for workloads that fall into the database domain, such as retrieving all
documents that match a particular query. If you need to do this, make sure to
use the <<scroll-search-results,Scroll>> API.

[discrete]
[[maximum-document-size]]
=== Avoid large documents

Given that the default <<modules-http,`http.max_content_length`>> is set to
100MB, Elasticsearch will refuse to index any document that is larger than
that. You might decide to increase that particular setting, but Lucene still
has a limit of about 2GB.

Even without considering hard limits, large documents are usually not
practical. Large documents put more stress on network, memory usage and disk,
even for search requests that do not request the `_source`, since Elasticsearch
needs to fetch the `_id` of the document in all cases, and the cost of getting
this field is bigger for large documents due to how the filesystem cache works.
Indexing such a document can use an amount of memory that is a multiple of the
original size of the document. Proximity search (phrase queries for instance)
and <<highlighting,highlighting>> also become more expensive,
since their cost directly depends on the size of the original document.

It is sometimes useful to reconsider what the unit of information should be.
For instance, the fact that you want to make books searchable doesn't
necessarily mean that a document should consist of a whole book. It might be a
better idea to use chapters or even paragraphs as documents, and then have a
property in these documents that identifies which book they belong to. This not
only avoids the issues with large documents, it also makes the search
experience better. For instance, if a user searches for two words `foo` and
`bar`, a match across different chapters is probably very poor, while a match
within the same paragraph is likely good.
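
The idea of indexing paragraphs rather than whole books can be sketched as
follows. This is a minimal illustration, not a prescribed approach: the field
names (`book_id`, `book_title`, `chapter`, `paragraph`, `text`), the index name
`book-paragraphs`, and the helper functions are hypothetical, and paragraphs
are assumed to be separated by blank lines. The sketch only builds a
newline-delimited body of the kind the bulk API accepts; it does not send any
request.

```python
import json
import re


def split_book(book_id, title, chapters):
    """Yield one small document per paragraph instead of one per book.

    `chapters` is a list of chapter texts. Paragraphs are assumed to be
    separated by blank lines (an assumption of this sketch).
    """
    for chapter_no, chapter_text in enumerate(chapters, start=1):
        paragraphs = [p.strip() for p in re.split(r"\n\s*\n", chapter_text)]
        # Skip empty fragments so that paragraph numbers stay contiguous.
        for paragraph_no, paragraph in enumerate(filter(None, paragraphs), start=1):
            yield {
                "book_id": book_id,  # property that ties the paragraph to its book
                "book_title": title,
                "chapter": chapter_no,
                "paragraph": paragraph_no,
                "text": paragraph,
            }


def to_bulk_body(index_name, docs):
    """Render documents as newline-delimited JSON: an action line, then the source."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # A bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample data: two chapters, three paragraphs in total.
chapters = [
    "foo appears here.\n\nbar appears in another paragraph.",
    "foo and bar appear together here.",
]
docs = list(split_book("book-1", "An Example Book", chapters))
bulk_body = to_bulk_body("book-paragraphs", docs)
```

With documents shaped like this, a query for `foo` and `bar` can match a single
paragraph, and the `book_id` property can be used to group or filter hits by
book.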