- [[tune-for-search-speed]]
- == Tune for search speed
- [float]
- === Give memory to the filesystem cache
- Elasticsearch relies heavily on the filesystem cache to make search fast. In
- general, you should make sure that at least half of the available memory goes
- to the filesystem cache so that Elasticsearch can keep hot regions of the index
- in physical memory.
- [float]
- === Use faster hardware
- If your search is I/O bound, you should investigate giving more memory to the
- filesystem cache (see above) or buying faster drives. In particular, SSDs are
- known to perform better than spinning disks. Always use local storage; remote
- filesystems such as `NFS` or `SMB` should be avoided. Also beware of
- virtualized storage such as Amazon's `Elastic Block Store`. Virtualized storage
- works very well with Elasticsearch, and it is appealing because it is fast and
- simple to set up, but it is also inherently slower on an ongoing basis than
- dedicated local storage. If you put an index on `EBS`, be sure to use
- provisioned IOPS, otherwise operations could be quickly throttled.
- If your search is CPU-bound, you should investigate buying faster CPUs.
- [float]
- === Document modeling
- Documents should be modeled so that search-time operations are as cheap as possible.
- In particular, joins should be avoided. <<nested,`nested`>> fields can make queries
- several times slower, and <<mapping-parent-field,parent-child>> relations can make
- queries hundreds of times slower. If the same questions can be answered without
- joins by denormalizing documents, significant speedups can be expected.
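- As a sketch of denormalization, assume a hypothetical data set in which each
- product references a manufacturer. Rather than modeling the manufacturer as a
- parent document, the fields that queries need could be copied into each product
- document. The `manufacturer` object and its values are illustrative
- assumptions:
- [source,js]
- --------------------------------------------------
- PUT index/type/1
- {
-   "designation": "spoon",
-   "manufacturer": {
-     "name": "Acme",
-     "country": "FR"
-   }
- }
- --------------------------------------------------
- // CONSOLE
- Queries can then filter on `manufacturer.country` directly, at the cost of
- reindexing the affected products whenever a manufacturer changes.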
- [float]
- === Pre-index data
- You should leverage patterns in your queries to optimize the way data is indexed.
- For instance, if all your documents have a `price` field and most queries run
- <<search-aggregations-bucket-range-aggregation,`range`>> aggregations on a fixed
- list of ranges, you could make this aggregation faster by pre-indexing the ranges
- into the index and using a <<search-aggregations-bucket-terms-aggregation,`terms`>>
- aggregation.
- For instance, if documents look like:
- [source,js]
- --------------------------------------------------
- PUT index/type/1
- {
-   "designation": "spoon",
-   "price": 13
- }
- --------------------------------------------------
- // CONSOLE
- and search requests look like:
- [source,js]
- --------------------------------------------------
- GET index/_search
- {
-   "aggs": {
-     "price_ranges": {
-       "range": {
-         "field": "price",
-         "ranges": [
-           { "to": 10 },
-           { "from": 10, "to": 100 },
-           { "from": 100 }
-         ]
-       }
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
- Then documents could be enriched by a `price_range` field at index time, which
- should be mapped as a <<keyword,`keyword`>>:
- [source,js]
- --------------------------------------------------
- PUT index
- {
-   "mappings": {
-     "type": {
-       "properties": {
-         "price_range": {
-           "type": "keyword"
-         }
-       }
-     }
-   }
- }
- PUT index/type/1
- {
-   "designation": "spoon",
-   "price": 13,
-   "price_range": "10-100"
- }
- --------------------------------------------------
- // CONSOLE
- Search requests could then aggregate on this new field rather than running a
- `range` aggregation on the `price` field:
- [source,js]
- --------------------------------------------------
- GET index/_search
- {
-   "aggs": {
-     "price_ranges": {
-       "terms": {
-         "field": "price_range"
-       }
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
- [float]
- === Mappings
- The fact that some data is numeric does not mean it should always be mapped as a
- <<number,numeric field>>. Typically, fields storing identifiers, such as an `ISBN`
- or any number identifying a record from another database, might benefit from
- being mapped as <<keyword,`keyword`>> rather than `integer` or `long`. Such
- fields are usually looked up by exact value rather than by range, which is the
- access pattern that `keyword` fields are optimized for.
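- As a minimal sketch, a numeric identifier could be mapped as follows. The
- `isbn` field name is an assumption chosen for illustration:
- [source,js]
- --------------------------------------------------
- PUT index
- {
-   "mappings": {
-     "type": {
-       "properties": {
-         "isbn": {
-           "type": "keyword"
-         }
-       }
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE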
- [float]
- === Avoid scripts
- In general, scripts should be avoided. If they are absolutely needed, you
- should prefer the `painless` and `expressions` engines.
- [float]
- === Force-merge read-only indices
- Indices that are read-only would benefit from being
- <<indices-forcemerge,merged down to a single segment>>. This is typically the
- case with time-based indices: only the index for the current time frame is
- getting new documents while older indices are read-only.
- IMPORTANT: Don't force-merge indices that are still being written to -- leave
- merging to the background merge process.
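- As a sketch, assuming `index` is an older index that no longer receives
- writes, a force merge down to a single segment could be requested like this:
- [source,js]
- --------------------------------------------------
- POST index/_forcemerge?max_num_segments=1
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
- Force-merging can be an expensive operation, so it is probably best run when
- the cluster is otherwise lightly loaded.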
- [float]
- === Warm up global ordinals
- Global ordinals are a data structure used to run
- <<search-aggregations-bucket-terms-aggregation,`terms`>> aggregations on
- <<keyword,`keyword`>> fields. They are loaded lazily into memory because
- Elasticsearch does not know which fields will be used in `terms` aggregations
- and which fields won't. You can tell Elasticsearch to load global ordinals
- eagerly at refresh time by configuring mappings as described below:
- [source,js]
- --------------------------------------------------
- PUT index
- {
-   "mappings": {
-     "type": {
-       "properties": {
-         "foo": {
-           "type": "keyword",
-           "eager_global_ordinals": true
-         }
-       }
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE
- [float]
- === Warm up the filesystem cache
- If the machine running Elasticsearch is restarted, the filesystem cache will be
- empty, so it will take some time before the operating system loads hot regions
- of the index into memory and search operations become fast again. You can
- explicitly tell the operating system which files should be loaded into memory
- eagerly, based on their file extension, using the
- <<file-system,`index.store.preload`>> setting.
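- As a sketch, the request below creates an index that eagerly loads norms
- (`nvd`) and doc values (`dvd`). The choice of extensions is an assumption and
- should reflect the structures your searches actually read. Because
- `index.store.preload` is a static setting, it is supplied here at index
- creation time:
- [source,js]
- --------------------------------------------------
- PUT index
- {
-   "settings": {
-     "index.store.preload": ["nvd", "dvd"]
-   }
- }
- --------------------------------------------------
- // CONSOLE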
- WARNING: Loading data into the filesystem cache eagerly on too many indices or
- too many files will make search _slower_ if the filesystem cache is not large
- enough to hold all the data. Use with caution.