|
@@ -0,0 +1,194 @@
|
|
|
+[[tune-for-search-speed]]
|
|
|
+== Tune for search speed
|
|
|
+
|
|
|
+[float]
|
|
|
+=== Give memory to the filesystem cache
|
|
|
+
|
|
|
+Elasticsearch relies heavily on the filesystem cache in order to make search
|
|
|
+fast. In general, you should make sure that at least half the available memory
|
|
|
+goes to the filesystem cache so that Elasticsearch can keep hot regions of the
|
|
|
+index in physical memory.
|
|
|
+
|
|
|
+[float]
|
|
|
+=== Use faster hardware
|
|
|
+
|
|
|
+If your search is I/O-bound, you should investigate giving more memory to the
|
|
|
+filesystem cache (see above) or buying faster drives. In particular, SSD drives
|
|
|
+are known to perform better than spinning disks. Always use local storage;
|
|
|
+remote filesystems such as `NFS` or `SMB` should be avoided. Also beware of
|
|
|
+virtualized storage such as Amazon's `Elastic Block Store`. Virtualized
|
|
|
+storage works very well with Elasticsearch, and it is appealing since it is so
|
|
|
+fast and simple to set up, but it is also unfortunately inherently slower on an
|
|
|
+ongoing basis when compared to dedicated local storage. If you put an index on
|
|
|
+`EBS`, be sure to use provisioned IOPS, otherwise operations could be quickly
|
|
|
+throttled.
|
|
|
+
|
|
|
+If your search is CPU-bound, you should investigate buying faster CPUs.
|
|
|
+
|
|
|
+[float]
|
|
|
+=== Document modeling
|
|
|
+
|
|
|
+Documents should be modeled so that search-time operations are as cheap as possible.
|
|
|
+
|
|
|
+In particular, joins should be avoided. <<nested,`nested`>> can make queries
|
|
|
+several times slower and <<mapping-parent-field,parent-child>> relations can make
|
|
|
+queries hundreds of times slower. So if the same questions can be answered without
|
|
|
+joins by denormalizing documents, significant speedups can be expected.
|
|
|
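+
+As a minimal sketch, assuming a hypothetical blog index in which each post has
+a single author, the author's name can be copied into every post rather than
+stored in a separate parent document. The `title` and `author_name` fields are
+illustrative only:
+
+[source,js]
+--------------------------------------------------
+PUT index/type/1
+{
+  "title": "Tuning for search speed",
+  "author_name": "Jane Doe"
+}
+--------------------------------------------------
+// CONSOLE
+
+Queries can then filter or aggregate on `author_name` directly, at the cost of
+reindexing the affected posts whenever an author's name changes.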
+
|
|
|
+[float]
|
|
|
+=== Pre-index data
|
|
|
+
|
|
|
+You should leverage patterns in your queries to optimize the way data is indexed.
|
|
|
+For instance, if all your documents have a `price` field and most queries run
|
|
|
+<<search-aggregations-bucket-range-aggregation,`range`>> aggregations on a fixed
|
|
|
+list of ranges, you could make this aggregation faster by pre-indexing the ranges
|
|
|
+into the index and using a <<search-aggregations-bucket-terms-aggregation,`terms`>>
|
|
|
+aggregation.
|
|
|
+
|
|
|
+For instance, if documents look like:
|
|
|
+
|
|
|
+[source,js]
|
|
|
+--------------------------------------------------
|
|
|
+PUT index/type/1
|
|
|
+{
|
|
|
+ "designation": "spoon",
|
|
|
+ "price": 13
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
+
|
|
|
+and search requests look like:
|
|
|
+
|
|
|
+[source,js]
|
|
|
+--------------------------------------------------
|
|
|
+GET index/_search
|
|
|
+{
|
|
|
+ "aggs": {
|
|
|
+ "price_ranges": {
|
|
|
+ "range": {
|
|
|
+ "field": "price",
|
|
|
+ "ranges": [
|
|
|
+ { "to": 10 },
|
|
|
+ { "from": 10, "to": 100 },
|
|
|
+ { "from": 100 }
|
|
|
+ ]
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
+// TEST[continued]
|
|
|
+
|
|
|
+Then documents could be enriched by a `price_range` field at index time, which
|
|
|
+should be mapped as a <<keyword,`keyword`>>:
|
|
|
+
|
|
|
+[source,js]
|
|
|
+--------------------------------------------------
|
|
|
+PUT index
|
|
|
+{
|
|
|
+ "mappings": {
|
|
|
+ "type": {
|
|
|
+ "properties": {
|
|
|
+ "price_range": {
|
|
|
+ "type": "keyword"
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+
|
|
|
+PUT index/type/1
|
|
|
+{
|
|
|
+ "designation": "spoon",
|
|
|
+ "price": 13,
|
|
|
+ "price_range": "10-100"
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
+
|
|
|
+And then search requests could aggregate this new field rather than running a
|
|
|
+`range` aggregation on the `price` field.
|
|
|
+
|
|
|
+[source,js]
|
|
|
+--------------------------------------------------
|
|
|
+GET index/_search
|
|
|
+{
|
|
|
+ "aggs": {
|
|
|
+ "price_ranges": {
|
|
|
+ "terms": {
|
|
|
+ "field": "price_range"
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
+// TEST[continued]
|
|
|
+
|
|
|
+[float]
|
|
|
+=== Mappings
|
|
|
+
|
|
|
+The fact that some data is numeric does not mean it should always be mapped as a
|
|
|
+<<number,numeric field>>. Typically, fields storing identifiers such as an `ISBN`
|
|
|
+or any number identifying a record from another database might benefit from
|
|
|
+being mapped as <<keyword,`keyword`>> rather than `integer` or `long`.
|
|
|
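+
+For instance, assuming a hypothetical `isbn` field that is only ever looked up
+by exact value, it could be mapped as follows:
+
+[source,js]
+--------------------------------------------------
+PUT index
+{
+  "mappings": {
+    "type": {
+      "properties": {
+        "isbn": {
+          "type": "keyword"
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
+// CONSOLE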
+
|
|
|
+[float]
|
|
|
+=== Avoid scripts
|
|
|
+
|
|
|
+In general, scripts should be avoided. If they are absolutely needed, you
|
|
|
+should prefer the `painless` and `expressions` engines.
|
|
|
+
|
|
|
+[float]
|
|
|
+=== Force-merge read-only indices
|
|
|
+
|
|
|
+Indices that are read-only would benefit from being
|
|
|
+<<indices-forcemerge,merged down to a single segment>>. This is typically the
|
|
|
+case with time-based indices: only the index for the current time frame is
|
|
|
+getting new documents while older indices are read-only.
|
|
|
+
|
|
|
+IMPORTANT: Don't force-merge indices that are still being written to -- leave
|
|
|
+merging to the background merge process.
|
|
|
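+
+For example, assuming an index named `index` that no longer receives writes, a
+request along these lines merges each of its shards down to a single segment:
+
+[source,js]
+--------------------------------------------------
+POST index/_forcemerge?max_num_segments=1
+--------------------------------------------------
+// CONSOLE
+// TEST[continued]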
+
|
|
|
+[float]
|
|
|
+=== Warm up global ordinals
|
|
|
+
|
|
|
+Global ordinals are a data structure that is used in order to run
|
|
|
+<<search-aggregations-bucket-terms-aggregation,`terms`>> aggregations on
|
|
|
+<<keyword,`keyword`>> fields. They are loaded lazily in memory because
|
|
|
+Elasticsearch does not know which fields will be used in `terms` aggregations
|
|
|
+and which fields won't. You can tell Elasticsearch to load global ordinals
|
|
|
+eagerly at refresh-time by configuring mappings as described below:
|
|
|
+
|
|
|
+[source,js]
|
|
|
+--------------------------------------------------
|
|
|
+PUT index
|
|
|
+{
|
|
|
+ "mappings": {
|
|
|
+ "type": {
|
|
|
+ "properties": {
|
|
|
+ "foo": {
|
|
|
+ "type": "keyword",
|
|
|
+ "eager_global_ordinals": true
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
+
|
|
|
+[float]
|
|
|
+=== Warm up the filesystem cache
|
|
|
+
|
|
|
+If the machine running Elasticsearch is restarted, the filesystem cache will be
|
|
|
+empty, so it will take some time before the operating system loads hot regions
|
|
|
+of the index into memory so that search operations are fast. You can explicitly
|
|
|
+tell the operating system which files should be loaded into memory eagerly
|
|
|
+depending on the file extension using the <<file-system,`index.store.preload`>>
|
|
|
+setting.
|
|
|
+
|
|
|
+WARNING: Loading data into the filesystem cache eagerly on too many indices or
|
|
|
+too many files will make search _slower_ if the filesystem cache is not large
|
|
|
+enough to hold all the data. Use with caution.
|
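+
+As a sketch, the following request creates an index that eagerly loads norms
+(`nvd`) and doc values (`dvd`) files. The choice of extensions is only an
+illustration and should be adapted to your data. Note that
+`index.store.preload` is a static setting, so it has to be provided when the
+index is created (or in the node configuration) rather than updated on a live
+index:
+
+[source,js]
+--------------------------------------------------
+PUT index
+{
+  "settings": {
+    "index.store.preload": ["nvd", "dvd"]
+  }
+}
+--------------------------------------------------
+// CONSOLE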