[[general-recommendations]]
== General recommendations

[discrete]
[[large-size]]
=== Don't return large result sets

Elasticsearch is designed as a search engine, which makes it very good at
returning the top documents that match a query. However, it is less well suited
to workloads that fall into the database domain, such as retrieving all
documents that match a particular query. If you need to do this, make sure to
use the <<scroll-search-results,Scroll>> API.
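
As a rough illustration of retrieving every match with the Scroll API, the
following Python sketch pages through results and then clears the scroll
context. The `send` callable is a hypothetical transport (it is not part of any
Elasticsearch client) that sends a JSON body to a path and returns the parsed
response; `scroll_all`, `page_size` and `keep_alive` are likewise names chosen
for this example.

```python
from typing import Callable, Iterator

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
Send = Callable[[str, str, dict], dict]


def scroll_all(send: Send, index: str, query: dict,
               page_size: int = 1000, keep_alive: str = "1m") -> Iterator[dict]:
    """Yield every hit matching `query`, one scroll page at a time."""
    # The first request opens the scroll context and returns the first page.
    page = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query})
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            # Later pages are requested with the scroll ID only.
            page = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the search context instead of waiting for it to expire.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Because the function is a generator, the caller only ever holds one page of
hits in memory, which is the point of scrolling rather than asking for one very
large page.
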

[discrete]
[[maximum-document-size]]
=== Avoid large documents

Given that the default <<http-settings,`http.max_content_length`>> is set to
100MB, Elasticsearch will refuse to index any document that is larger than
that. You might decide to increase that particular setting, but Lucene still
has a limit of about 2GB.
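
A client can check the size of a document before sending it. The Python sketch
below measures the serialized JSON size and compares it with a limit. The
constant and function names are made up for this example, and the limit assumes
that 100MB means 100 * 1024 * 1024 bytes and that the setting is still at its
default on the target cluster.

```python
import json

# Assumption: mirrors the default `http.max_content_length` of 100MB.
DEFAULT_MAX_CONTENT_LENGTH = 100 * 1024 * 1024


def json_size_in_bytes(document: dict) -> int:
    """Return the size of the document once serialized as UTF-8 JSON."""
    return len(json.dumps(document, ensure_ascii=False).encode("utf-8"))


def fits_in_request(document: dict,
                    limit: int = DEFAULT_MAX_CONTENT_LENGTH) -> bool:
    """Return True if the serialized document is no larger than `limit`."""
    return json_size_in_bytes(document) <= limit
```

This only estimates the body of a single-document request. A request that
carries several documents is subject to the same limit as a whole.
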

Even without considering hard limits, large documents are usually not
practical. Large documents put more stress on network, memory usage and disk,
even for search requests that do not request the `_source`, since Elasticsearch
needs to fetch the `_id` of the document in all cases, and the cost of getting
this field is higher for large documents due to how the filesystem cache works.
Indexing such a document can use an amount of memory that is a multiple of the
original size of the document. Proximity search (phrase queries for instance)
and <<highlighting,highlighting>> also become more expensive
since their cost directly depends on the size of the original document.

It is sometimes useful to reconsider what the unit of information should be.
For instance, the fact that you want to make books searchable doesn't
necessarily mean that a document should consist of a whole book. It might be a
better idea to use chapters or even paragraphs as documents, and then have a
property in these documents that identifies which book they belong to. This not
only avoids the issues with large documents, it also makes the search
experience better. For instance, if a user searches for two words `foo` and
`bar`, a match across different chapters is probably very poor, while a match
within the same paragraph is likely good.
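
The book example could be prepared on the client side along these lines. This
is a minimal Python sketch, not a prescribed data model: the field names
(`book_id`, `chapter`, `chapter_title`, `paragraph`, `text`) and the rule that
blank lines separate paragraphs are assumptions made for illustration.

```python
import re
from typing import Iterator, List, Tuple


def paragraphs_as_documents(book_id: str,
                            chapters: List[Tuple[str, str]]) -> Iterator[dict]:
    """Yield one small document per paragraph, each tagged with its book."""
    for chapter_number, (chapter_title, text) in enumerate(chapters, start=1):
        # Treat one or more blank lines as a paragraph boundary.
        paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
        for paragraph_number, paragraph in enumerate(paragraphs, start=1):
            yield {
                "book_id": book_id,  # identifies which book the paragraph belongs to
                "chapter": chapter_number,
                "chapter_title": chapter_title,
                "paragraph": paragraph_number,
                "text": paragraph,
            }
```

With documents shaped like this, a query for `foo` and `bar` only matches
paragraphs that contain both words, and the `book_id` property can still be
used to filter or group the results by book.
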