Browse Source

[Docs] Improve documentation of the new caching policy for filters.

Adrien Grand 10 years ago
parent
commit
fb6c3b7c29
1 changed files with 61 additions and 22 deletions
  1. 61 22
      docs/reference/query-dsl/filters.asciidoc

+ 61 - 22
docs/reference/query-dsl/filters.asciidoc

@@ -15,37 +15,76 @@ filter does not require a lot of memory, and will cause other queries
 executing against the same filter (same parameters) to be blazingly
 fast.
 
-Some filters already produce a result that is easily cacheable, and the
-difference between caching and not caching them is the act of placing
-the result in the cache or not. These filters, which include the
-<<query-dsl-term-filter,term>>,
+However the cost of caching is not the same for all filters. For
+instance some filters are already fast out of the box while caching could
+add significant overhead, and some filters produce results that are already
+cacheable so caching them is just a matter of putting the result in the
+cache.
+
+The default caching policy, `_cache: auto`, tracks the 1000 most recently
+used filters on a per-index basis and makes decisions based on their
+frequency.
+
+[float]
+==== Filters that read directly the index structure
+
+Some filters can directly read the index structure and potentially jump
+over large sequences of documents that are not worth evaluating (for
+instance when these documents do not match the query). Caching these
+filters introduces overhead given that all documents that the filter
+matches need to be consumed in order to be loaded into the cache.
+
+These filters, which include the <<query-dsl-term-filter,term>> and
+<<query-dsl-term-query,query>> filters, are only cached after they
+appear 5 times or more in the history of the 1000 most recently used
+filters.
+
+[float]
+==== Filters that produce results that are already cacheable
+
+Some filters produce results that are already cacheable, and the difference
+between caching and not caching them is the act of placing the result in
+the cache or not. These filters, which include the
 <<query-dsl-terms-filter,terms>>,
 <<query-dsl-prefix-filter,prefix>>, and
-<<query-dsl-range-filter,range>> filters, are by
-default cached and are recommended to use (compared to the equivalent
-query version) when the same filter (same parameters) will be used
-across multiple different queries (for example, a range filter with age
-higher than 10).
-
-Other filters, usually already working with the field data loaded into
-memory, are not cached by default. Those filters are already very fast,
-and the process of caching them requires extra processing in order to
-allow the filter result to be used with different queries than the one
-executed. These filters, including the geo,
-and <<query-dsl-script-filter,script>> filters
-are not cached by default.
-
-The last type of filters are those working with other filters. The
+<<query-dsl-range-filter,range>> filters, are by default cached after they
+appear twice or more in the history of the most 1000 recently used filters.
+
+[float]
+==== Computational filters
+
+Some filters need to run some computation in order to figure out whether
+a given document matches a filter. These filters, which include the geo and
+<<query-dsl-script-filter,script>> filters, but also the
+<<query-dsl-terms-filter,terms>>  and <<query-dsl-range-filter,range>>
+filters when using the `fielddata` execution mode are never cached by default,
+as it would require to evaluate the filter on all documents in your indices
+while they can otherwise be only evaluated on documents that match the query.
+
+[float]
+==== Compound filters
+
+The last type of filters are those working with other filters, and includes
+the <<query-dsl-bool-filter,bool>>,
 <<query-dsl-and-filter,and>>,
 <<query-dsl-not-filter,not>> and
-<<query-dsl-or-filter,or>> filters are not
-cached as they basically just manipulate the internal filters.
+<<query-dsl-or-filter,or>> filters.
+
+There is no general rule about these filters. Depending on the filters that
+they wrap, they will sometimes return a filter that dynamically evaluates the
+sub filters and sometimes evaluate the sub filters eagerly in order to return
+a result that is already cacheable, so depending on the case, these filters
+will be cached after they appear 2+ or 5+ times in the history of the most
+1000 recently used filters.
+
+[float]
+==== Overriding the default behaviour
 
 All filters allow to set `_cache` element on them to explicitly control
 caching. It accepts 3 values: `true` in order to cache the filter, `false`
 to make sure that the filter will not be cached, and `auto`, which is the
 default and will decide on whether to cache the filter based on the cost
-to cache the filter and how often the filter has been used.
+to cache it and how often it has been used as explained above.
 
 Filters also allow to set `_cache_key` which will be used as the
 caching key for that filter. This can be handy when using very large